Peak oxygen uptake (peak ^{−1}·min^{−1}). More appropriate means of accommodating body size were well established in the general biological sciences (eg, 19) before the initial studies of children’s aerobic fitness but were either not considered, or noted and ignored, by the pioneers of pediatric exercise physiology. This set a trend which has continued to the present day and which has not only clouded our current understanding of the growth and development of youth aerobic fitness (and other size-related physiological variables) but also misrepresented the role of aerobic fitness in subsequent analyses, for example, in the generation of fitness norms for young people’s health and well-being and in the assessment of fitness in young people with serious long-term health conditions.

Robinson’s (28) laboratory-based study of physical fitness in 6- to 91-year-old males, in 1938, was the first to include boys. In accord with earlier studies of men, peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ was expressed in L·min^{−1} before, without a rationale, being “referred to body weight” (p. 280) and presented as a ratio in mL·kg^{−1}·min^{−1}. Robinson compared the ratio-scaled data of the men in his study with his own unpublished data and with values he calculated from individual values of peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ (in L·min^{−1}) and body mass (in kg) published in classical papers which had themselves not ratio-scaled their data (16,17,29). In the second laboratory-based investigation of boys’ peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$, data were expressed in L·min^{−1} but, without an underpinning rationale or statistical justification, only reported in ratio with body mass (25).

In 1952, the first study to include girls, Åstrand (10) presented and discussed his data in both absolute (L·min^{−1}) and ratio-scaled (mL·kg^{−1}·min^{−1}) terms but, insightfully, expressed reservations about whether this approach was appropriate with children. Specifically, he noted that group comparisons of peak ^{0.67}.

The influential Saskatchewan Growth and Development study of the 1970s–1980s was the first longitudinal study to evaluate physiological parameters in both boys and girls over an extended period (24). By the time of the study, Tanner’s (36) seminal paper, subsequently reexplored with specific reference to oxygen uptake (20) had unequivocally established that expressing physiological functions such as ^{−1} and mL·kg^{−1}·min^{−1}.

In his landmark book, Bar-Or (12) stated that, “Although theoretically not the method of choice, the most common way of expressing maximal O_{2} uptake for comparative purposes has been per kilogram body weight” (p. 4) but then proceeded to discuss ratio-scaled data in relation to age. Over the next 20 years, regular critical reviews and empirical reports (eg, 6,42,43,49) clearly demonstrated the inadequacy of ratio scaling for interpreting physiological data in relation to body size, and new textbooks of pediatric exercise physiology devoted whole chapters to the issue (eg, 5,31,39,48). In the second edition of Bar-Or’s text, coauthored with Rowland and now including an appendix reviewing “Scaling for size differences,” it was reiterated that, “ratio scaling is not the ideal way to compare the maximal aerobic power of people who differ in body size” (13, p. 6) but noted that as, “the most convenient and traditionally accepted way … a majority of studies still express maximal O_{2} uptake per kilogram body mass” (p. 7). In the 15 years since this publication, incredibly, nothing appears to have changed. Second editions and new textbooks (eg, 30,38,40,41) have updated tutorial chapters which clearly demonstrate the fallacy of ratio scaling and advocate the use of alternative methodology. The vast majority of authors, however, persist in failing to present a scientific evidence-based rationale and/or statistical justification for the use of ratio scaling but “conveniently” report peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ in mL·kg^{−1}·min^{−1} as the primary (or often the only) descriptor of youth aerobic fitness regardless of context. We have even been advised ourselves by both reviewers and editors of reputable journals to erroneously ratio scale our data and present them in this manner, “if it is to be considered further for publication”!

With reference to interpreting growth-related data, we agree with Nobel laureate Andre Gide who allegedly began many of his lectures with, “Everything has already been said, but since nobody was listening, we have to start again.” The objective of this paper is therefore to question the “convenient and traditional” use of mL·kg^{−1}·min^{−1} to describe youth peak

## Sources of Data for This Paper

Over the 20-year period between the first laboratory measurements of children’s peak

In this paper, we draw on all of our cross-sectional data from this period totaling 958 TM peak

Table 1. Summary of Age Range, Mean Values for Peak Oxygen Uptake Expressed in Absolute, Relative, and Allometric Terms and the Correlation Coefficients Between These and Body Mass in 20 Individual Study Groups

Study (Reference) | Age, y | Sex | n | Peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ (L·min^{−1}) | Correlation coefficients for peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ (L·min^{−1}) and body mass | Peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ (mL·kg^{−1}·min^{−1}) | Correlation coefficients for peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ (mL·kg^{−1}·min^{−1}) and body mass | Allometric (b) exponent ± standard error | Correlation coefficients for peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ (mL·kg^{−b}·min^{−1}) and body mass |
---|---|---|---|---|---|---|---|---|---|
1 (8) | 11–16 | M | 113 | 2.31 (0.60) | .84, P < .01 | 49 (7) | ns | b = 0.94 ± 0.06 | ns |
2 (4) | 9–10 | M | 17 | 1.94 (0.24) | .83, P < .01 | 62 (5) | −.63, P < .01 | b = 0.64 ± 0.11* | ns |
3 (35) | 9–10 | M | 19 | 1.78 (0.12) | .71, P < .01 | 58 (8) | −.87, P < .01 | b = 0.37 ± 0.09* | ns |
4 (47) | 9–10 | M | 16 | 1.68 (0.22) | .61, P < .01 | 57 (7) | −.51, P < .05 | b = 0.53 ± 0.20* | ns |
5 (45) | 9–10 | M | 45 | 1.85 (0.25) | .70, P < .01 | 53 (7) | −.71, P < .01 | b = 0.47 ± 0.08* | ns |
6 (7) | 10–11 | M | 125 | 1.81 (0.26) | .70, P < .01 | 50 (6) | −.54, P < .01 | b = 0.62 ± 0.05* | ns |
7 (3) | 11–12 | M | 36 | 2.06 (0.36) | .74, P < .01 | 53 (7) | −.55, P < .01 | b = 0.63 ± 0.11* | ns |
8 (3) | 13–14 | M | 36 | 2.76 (0.51) | .75, P < .01 | 54 (6) | −.43, P < .01 | b = 0.76 ± 0.10* | ns |
9 (3) | 16–18 | M | 19 | 3.99 (0.80) | .69, P < .01 | 54 (9) | ns | b = 0.79 ± 0.19 | ns |
10 (8) | 11–16 | F | 107 | 1.90 (0.36) | .66, P < .01 | 41 (6) | −.50, P < .01 | b = 0.61 ± 0.06* | ns |
11 (35) | 9–10 | F | 17 | 1.53 (0.20) | .71, P < .01 | 49 (6) | −.62, P < .01 | b = 0.54 ± 0.12* | ns |
12 (4) | 9–10 | F | 17 | 1.80 (0.18) | .80, P < .01 | 52 (6) | −.85, P < .01 | b = 0.45 ± 0.09* | ns |
13 (44) | 9–10 | F | 54 | 1.68 (0.27) | .77, P < .01 | 49 (6) | −.57, P < .05 | b = 0.65 ± 0.07* | ns |
14 (23) | 9–10 | F | 30 | 1.43 (0.22) | .79, P < .01 | 46 (5) | −.37, P < .04 | b = 0.76 ± 0.10* | ns |
15 (7) | 10–11 | F | 128 | 1.62 (0.27) | .83, P < .01 | 43 (5) | −.50, P < .01 | b = 0.72 ± 0.04* | ns |
16 (3) | 11–12 | F | 35 | 1.97 (0.31) | .85, P < .01 | 47 (5) | −.61, P < .01 | b = 0.68 ± 0.07* | ns |
17 (3) | 12–13 | F | 47 | 2.20 (0.35) | .83, P < .01 | 47 (5) | −.69, P < .01 | b = 0.64 ± 0.06* | ns |
18 (34) | 13–14 | F | 38 | 2.32 (0.34) | .69, P < .01 | 41 (5) | −.39, P < .02 | b = 0.68 ± 0.12* | ns |
19 (3) | 13–14 | F | 29 | 2.35 (0.28) | .58, P < .01 | 46 (5) | ns | b = 0.75 ± 0.19 | ns |
20 (3) | 16–18 | F | 25 | 2.59 (0.32) | .69, P < .01 | 44 (5) | −.48, P < .02 | b = 0.61 ± 0.15* | ns |

*Exponent is significantly (*P* < .05) different from *b* = 1.0. ns signifies not significant (*P* > .05).

Although in this paper we focus on the interpretation of TM-determined peak

For complete details of our exercise protocols and measurement procedures over this period, interested readers are referred to previous publications (eg, 6–8). To briefly summarize, all TM peak

We have collated and reanalyzed data from all of these studies to examine and illustrate the limitations of simple ratio scaling, but no additional data “cleaning” has been undertaken. If we accepted a child’s effort as maximal in the original study and entered the data into the Centre’s database, those data were included in the reanalyses presented here.

## Why Do We Need to Scale?

Measures of exercise performance such as peak ^{−1}) and body mass for boys and girls respectively. Within the 20 individual cross-sectional groups of young people aged 9–18 years, we identified significant (*P* < .05) correlation coefficients (Pearson’s *r*) ranging from .58 to .84 with mean values of *r* = .73 for boys and *r* = .75 for girls. Individual group correlation coefficients are also summarized in Table 1.

The data in Figure 1 confirm that the relationship between body mass and peak

## When Can the Ratio Standard Be Used: Testing Assumptions

Simple division by body mass is the usual approach to normalizing data for body size. However, this measure is crude at best. We know that additional factors affect peak

As both Tanner (36) and Katch (20) highlighted, and as contemporary authors have subsequently reiterated (26,38), application of the ratio standard rests on a set of specific statistical assumptions. We have seen that assumption 1, that of a perfect correlation (*r* = 1.0) between variables, is not valid, and we go on to examine further assumptions in the following sections.

### Assumption 2

Expressing peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ in mL·kg^{−1}·min^{−1} removes the influence of body mass to enable valid comparison of individuals or groups.

### Evidence

If the simple ratio removes the influence of body mass, peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ in mL·kg^{−1}·min^{−1} should not remain significantly correlated with body mass. Table 1 includes correlation coefficients describing this relationship for the individual study groups. In all but 3 of the 20 studies analyzed, a statistically significant (*P* < .05) negative relationship remained with coefficients ranging from *r* = −.37 to *r* = −.87. Computing this relationship offers the researcher a simple technique by which to verify whether ratio scaling has removed the influence of body mass.
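The check described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis: the data are synthetic (generated here so that peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ scales with body mass raised to 0.65), and the names `pearson_r`, `mass_kg`, and `vo2_l_min` are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Synthetic illustration only (NOT data from the studies in Table 1):
# peak VO2 is generated to scale with body mass**0.65 plus multiplicative noise.
rng = random.Random(1)
mass_kg = [rng.uniform(25.0, 58.0) for _ in range(54)]
vo2_l_min = [0.14 * m ** 0.65 * math.exp(rng.gauss(0.0, 0.08)) for m in mass_kg]

# Simple ratio standard: mL per kg per min.
ratio_ml_kg_min = [1000.0 * v / m for v, m in zip(vo2_l_min, mass_kg)]

r_absolute = pearson_r(mass_kg, vo2_l_min)
r_ratio = pearson_r(mass_kg, ratio_ml_kg_min)
print(f"r(body mass, L/min)     = {r_absolute:+.2f}")
print(f"r(body mass, mL/kg/min) = {r_ratio:+.2f}")
```

If ratio scaling had removed the influence of body mass, the second coefficient would be close to zero. A clearly negative value indicates that the ratio overcorrects, penalizing heavier individuals.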

### Assumption 3

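The relationship between peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ and body mass is described by a straight line through the origin, peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ = *b*·body mass.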

### Evidence

In statistical terms, when we express peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ in mL·kg^{−1}·min^{−1} we are making the, usually untested, assumption that the line of best fit is one that extends from 0 (the origin) through the point at which the mean values for body mass and peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ intersect, that is, peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ = *b* · body mass, where *b* is the slope of the line.

We illustrate this in Figure 2 using as an example the data from study number 13 in Table 1 (44), comprising n = 54, 9- to 10-year old girls. The dashed line is the line assumed by the ratio standard but a visual inspection of the data suggests that this is not the best fit, and the data would be better represented by a line which *intercepts with the y axis somewhat above 0*. Rather than assuming the existence of a statistical relationship, computing a linear regression with peak

Although this analysis allows for individual mass-regressed scores for peak

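This phenomenon is relatively straightforward to rectify by fitting the linear regression line to data after they have been natural log-transformed:

$$\ln\left(\text{peak } \dot{\mathrm{V}}{\mathrm{O}}_{2}\right) = \ln a + b \cdot \ln\left(\text{body mass}\right)$$

This log-linear model is equivalent to the allometric relationship peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ = *a* · body mass^{*b*} and includes the ratio standard, peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ in mL·kg^{−1}·min^{−1}, as a special case, because the per-body-mass ratio is an allometric relationship where the *b* exponent equals 1.0, that is, peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ = *a* · body mass^{1.0}.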
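A log-linear fit of this kind can be sketched with ordinary least squares on the log-transformed values. The function name `loglinear_fit` and the example data are hypothetical, and the sketch assumes a single predictor (body mass) with no covariates such as age or skinfold thickness.

```python
import math
import random


def loglinear_fit(mass, vo2):
    """Ordinary least squares of ln(vo2) on ln(mass).

    Returns (a, b, se_b) for the allometric model vo2 = a * mass**b,
    where se_b is the standard error of the mass exponent b.
    """
    x = [math.log(m) for m in mass]
    y = [math.log(v) for v in vo2]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    ln_a = mean_y - b * mean_x
    residual_ss = sum((yi - (ln_a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(residual_ss / (n - 2) / sxx)
    return math.exp(ln_a), b, se_b


# Synthetic illustration only: data generated with a true exponent of 0.65.
rng = random.Random(7)
mass_kg = [rng.uniform(25.0, 58.0) for _ in range(54)]
vo2_l_min = [0.14 * m ** 0.65 * math.exp(rng.gauss(0.0, 0.08)) for m in mass_kg]

a, b, se_b = loglinear_fit(mass_kg, vo2_l_min)
print(f"a = {a:.3f}, b = {b:.2f} (SE {se_b:.2f})")
```

Working on the log scale turns the multiplicative spread that is typical of size-related data into an additive error, which is what least squares assumes.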

### Assumption 4

The relationship between body mass and peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ is one in which the mass exponent (*b*) = 1.0 (and ratio scaling applies).

### Evidence

Table 1 shows the values for the mass exponents plus or minus SE computed for the individual groups that comprise our cross-sectional data set. We also indicate the extent of 2 × SE approximating the 95% confidence intervals. If the confidence intervals exclude 1.0, we can state that the exponent is significantly different from that assumed by the ratio standard. Of the 20 data sets analyzed, in only 3 cases was the mass exponent not significantly different from 1.0. Therefore, in 20 years of measuring peak *not* support the assumption that the relationship between body mass and peak
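The 2 × SE rule can be applied directly to the exponents and standard errors reported in Table 1. The sketch below transcribes those values and lists the studies whose approximate 95% confidence interval includes 1.0; the names `TABLE_1_EXPONENTS` and `interval_excludes` are hypothetical.

```python
# (study number, mass exponent b, standard error) transcribed from Table 1.
TABLE_1_EXPONENTS = [
    (1, 0.94, 0.06), (2, 0.64, 0.11), (3, 0.37, 0.09), (4, 0.53, 0.20),
    (5, 0.47, 0.08), (6, 0.62, 0.05), (7, 0.63, 0.11), (8, 0.76, 0.10),
    (9, 0.79, 0.19), (10, 0.61, 0.06), (11, 0.54, 0.12), (12, 0.45, 0.09),
    (13, 0.65, 0.07), (14, 0.76, 0.10), (15, 0.72, 0.04), (16, 0.68, 0.07),
    (17, 0.64, 0.06), (18, 0.68, 0.12), (19, 0.75, 0.19), (20, 0.61, 0.15),
]


def interval_excludes(b, se, value, width=2.0):
    """True if value lies outside b ± width·SE (approximate 95% CI for width=2)."""
    lower = b - width * se
    upper = b + width * se
    return not (lower <= value <= upper)


# Studies where the exponent is NOT significantly different from 1.0.
not_different = [
    study for study, b, se in TABLE_1_EXPONENTS
    if not interval_excludes(b, se, 1.0)
]
print(not_different)  # → [1, 9, 19]
```

These are the three groups without an asterisk in Table 1, so the approximation reproduces the significance flags reported there.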

As we described previously, a simple way to check whether a computed ratio has removed the influence of the size variable is to correlate that ratio with the size variable. The mass exponent derived in the log-linear regression described previously can be used to compute a ratio by dividing peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ by body mass raised to the power of the *b* exponent; for example, for study 2 in Table 1 (4) this would be mass^{0.64}. The resulting ratio would be in mL·kg^{−0.64}·min^{−1}. We computed these allometrically adjusted ratios using the group-specific mass exponents and correlated them with body mass. The resulting coefficients are presented in Table 1. In all cases, these were nonsignificant (*P* > .05), confirming that this form of adjustment successfully removed the influence of body mass.
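The allometric adjustment and its verification can be sketched as follows. The data are synthetic, the exponent is estimated from those data rather than taken from any study in Table 1, and the function names (`fit_mass_exponent`, `allometric_ratio`, `pearson_r`) are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_mass_exponent(mass, vo2):
    """Slope of the least-squares line of ln(vo2) on ln(mass)."""
    x = [math.log(m) for m in mass]
    y = [math.log(v) for v in vo2]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def allometric_ratio(vo2_l_min, mass_kg, b):
    """Peak VO2 in mL·kg^-b·min^-1; b = 1.0 gives the simple ratio standard."""
    return [1000.0 * v / m ** b for v, m in zip(vo2_l_min, mass_kg)]


# Synthetic illustration only (NOT data from the studies in Table 1).
rng = random.Random(3)
mass_kg = [rng.uniform(25.0, 58.0) for _ in range(54)]
vo2_l_min = [0.14 * m ** 0.65 * math.exp(rng.gauss(0.0, 0.08)) for m in mass_kg]

b = fit_mass_exponent(mass_kg, vo2_l_min)
r_ratio = pearson_r(mass_kg, allometric_ratio(vo2_l_min, mass_kg, 1.0))
r_allometric = pearson_r(mass_kg, allometric_ratio(vo2_l_min, mass_kg, b))
print(f"b = {b:.2f}")
print(f"r(body mass, mL/kg/min)     = {r_ratio:+.2f}")
print(f"r(body mass, mL/kg^b/min)   = {r_allometric:+.2f}")
```

Because the exponent is fitted to the same sample, the adjusted ratio is close to uncorrelated with body mass by construction. That is why the exponent is sample specific and should be recomputed for each data set rather than borrowed from another group.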

To summarize so far, using our data sets as originally collected, we have demonstrated that:

- (1) Peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ is significantly correlated with body mass with the strength of that relationship, in groups of similar chronological age, comparable to the strength of the relationship observed in adults (*r* ∼ .75).
- (2) The traditional means of controlling for individual and group differences in body mass by computing the simple ratio standard (mL·kg^{−1}·min^{−1}) does not remove the influence of body mass if the ratio remains significantly, negatively correlated with body mass.
- (3) Data are not best fit by the simple linear relationship that underpins ratio scaling, but a log-linear regression is needed to accommodate the spread that is typically evident in size-related exercise data. Using this analysis, mass exponents were, in all but 3 cases, significantly different from the value of 1.0 assumed by the ratio standard.
- (4) Ratios computed using the mass exponents derived from the log-linear regression analyses successfully remove the effect of body mass as evidenced by no residual correlation with body mass.

## Variation in Mass Exponents

Table 1 summarizes a large series of cross-sectional data on TM-determined peak

We would not wish to overinterpret the individual values for exponents obtained, but it is worth highlighting what factors might underlie the variation observed. For example, in the groups of 9- to 10-year-old boys (studies 2–5 in Table 1 [4,35,45,47]), mass exponents ranged from 0.37 to 0.64. Sample size and composition are likely to play key roles in this variation, particularly as they were volunteer groups. Clearly, there are differences in fitness between the groups, and this is underpinned by a large spread in body size even within this predominantly prepubertal population, a fact which Tanner (36) specifically highlighted when cautioning against the uncritical adoption of ratio standards for basal metabolism in children: “… the variability of weight and surface area for some single-year age groups considerably exceeds that of adults” (p. 9). In study 5 (45) in Table 1, for example, body mass ranged from under 25 to 58 kg and included 4 overweight (15) boys. This variation in body size, unsurprisingly, becomes more marked in the age groups associated with peak pubertal growth. It is likely that a combination of these factors means that mass exponents will, as demonstrated in our data set, be sample specific.

It is interesting to consider in more detail one example where the log-linear derived mass exponent appeared to offer no significant improvement to traditional ratio scaling (study 1 in Table 1 [8]). In 1991, in what was our initial venture into alternative scaling methods, we presented data on 10- and 15-year-old boys showing a positive effect for age or maturation on peak *b* = 0.74, SE = 0.07, a value now significantly different from 1.0.

In a longitudinal study of peak *P* > .05, a significant effect of fatness is observed (−.291, SE = 0.04) and the exponent for mass is increased to 1.08 (SE = 0.05).

Interestingly, an inflated exponent was not observed for the girls with a similar within-study age range (study 10, Table 1 [8]), where the mass exponent was computed as *b* = 0.61. When added to the log-linear regression analysis here, and in contrast to the boys, age did not add significantly to the model; however, the addition of skinfold thickness did yield a significant, negative exponent as also observed in our recent longitudinal analysis of peak

## Is There a Universal Alternative to Ratio Scaling?

Perhaps one of the drivers underpinning the continued use of the ratio standard is the absence of a universally applicable alternative. As we and others have discussed, the mass exponent most often suggested for adoption is *b* = 0.67, a value derived from theories of geometric similarity (see 11,31). We have observed values very close to this in large sample longitudinal data, but only after adjusting for other significant covariates such as age, maturity status, and skinfold thickness (7). We could generalize from our cross-sectional data presented here and say that the computed mass exponents are closer to 0.67 than they are to 1.0 (indeed, the value of 0.67 falls within the 95% confidence intervals of all but 3 of the 20 exponents computed in Table 1), but we do not recommend the universal adoption of this value.

We have seen that sample size and composition affect the value of computed exponents, and we know that the effects of age and maturity status may need to be considered, particularly in boys. In addition, from our own and others’ analyses, we know that peak

## Does it Really Matter?

Over 30 years ago we raised concerns with the use of field tests, notably the 20-m shuttle run, to predict children’s aerobic fitness (9). Recent years, however, have seen an explosion in the use of this test with data reported on literally hundreds of thousands of children, including for those as young as 3 years old. Such tests have been recommended and accepted for inclusion in fitness test batteries for children for population level surveillance (21,33) and as the basis for classifying youth fitness, with levels of 42 and 35 mL·kg^{−1}·min^{−1} for boys and girls, respectively, identified as “Clinical Red Flags”—potentially warranting intervention (21,33).

While this explosion stems from a desire to promote youth fitness for the prevention of current and future disease, based upon relationships between ratio-scaled measures of fitness and a range of health indicators (32), our genuine concerns lie with the spurious relationships and recommendations that might potentially emerge from laboratory or field tests that *predict peak $\dot{\mathrm{V}}{\mathrm{O}}_{2}$ in mL·kg^{−1}·min^{−1} and adversely or inappropriately impact on the promotion of young people’s health and well-being.*

We also have serious concerns regarding the increasing use of maximal exercise testing with ratio scaling of data in the fitness assessment of children with a range of severe or potentially life-limiting diseases of, for example, the lungs (22,27), muscles (14), or heart (1,18). As we have already mentioned, not only did Tanner (36) explicitly explain why caution should be applied to the generation of per body mass standards in children, he also demonstrated that the use of such standards would lead to inappropriate conclusions when subsequently used for investigating relationships with other lifestyle, physiological, or behavioral factors. Although the statistical analyses within which the ratio-scaled and/or predicted values are being used may be somewhat more sophisticated these days, the underlying principles remain: If the use of mL·kg^{−1}·min^{−1} cannot be demonstrated to appropriately describe children’s aerobic fitness, then any comparisons, conclusions, or recommendations based upon per body mass standards are likely to be spurious.

## Conclusion

Interpretation of peak ^{−1}·min^{−1} cannot be sanctioned. We know of no other scientific discipline where an assumed relationship albeit “convenient and traditional” has become the acceptable alternative to rigorous scientific justification.

The issues we raise are not new, having been eloquently and completely described by visionary scientists since 1949, and which we initially applied to children’s data in 1991 and have continued to champion for almost 30 years. We strongly urge exercise scientists, medics, and population health experts to consider the issues raised in this paper and to use the simple statistical techniques suggested to check whether a simple per body mass ratio is justified in all studies of children’s fitness. Tanner’s “tongue in cheek” comment from nearly 70 years ago still resonates today … rather than raising a “Clinical Red Flag” perhaps children with levels of fitness below the international standards recommended above should be investigated not for cardiovascular risk but for “no more formidable a disease than statistical artefact” (36, p. 3).

The authors gratefully acknowledge the support of the

## References

- 2. Armstrong N, Welsman J. Sex-specific longitudinal modeling of youth peak oxygen uptake. Pediatr Exerc Sci. 2019;31. doi:10.1123/pes.2018-0175
- 3. Armstrong N, Welsman J. Unpublished Studies of Peak Oxygen Uptake. Exeter, UK: Children’s Health and Exercise Research Centre, University of Exeter; 1986–2006.
- 5. Armstrong N, Welsman JR. Young People and Physical Activity. Oxford, UK: Oxford University Press; 1997:6–55.
- 8. Armstrong N, Williams J, Balding J, Gentle P, Kirby B. The peak oxygen uptake of British children with reference to age, sex and sexual maturity. Eur J Appl Physiol Occup Physiol. 1991;62:369–75. doi:10.1007/BF00634975
- 9. Armstrong N, Williams J, Ringham D. Peak oxygen uptake and progressive shuttle-run performance in boys aged 11–14 years. Br J Phys Educ Res Suppl. 1988;4:10–11.
- 10. Åstrand PO. Experimental Studies of Physical Working Capacity in Relation to Sex and Age. Copenhagen, Denmark: Munksgaard; 1952.
- 16. Henderson Y, Haggard HW. The maximum of human power and its fuel. Am J Physiol. 1925;72:264–82. doi:10.1152/ajplegacy.1925.72.2.264
- 17. Hill AV, Lupton H. Muscular exercise, lactic acid, and the supply and utilization of oxygen. Q J Med. 1923;os-16:135–71. doi:10.1093/qjmed/os-16.62.135
- 19. Huxley JS. On the relation between egg-weight and body-weight in birds. Zool J Linn Soc. 1927;36:457–66. doi:10.1111/j.1096-3642.1927.tb02180.x
- 22. Madsen A, Green K, Buchvald F, Hanel B, Neilsen KG. Aerobic fitness in children and young adults with primary ciliary dyskinesia. PLoS ONE. 2013;8(8):71409. doi:10.1371/journal.pone.0071409
- 27. Radke T, Nevitt SJ, Hebestreit H, Kriemler S. Physical exercise training for cystic fibrosis. Cochrane Database Syst Rev. 2017;11:CD002768. doi:10.1002/14651858.CD002768.pub4
- 28. Robinson S. Experimental studies of physical fitness in relation to age. Arbeitsphysiologie. 1938;10:251–323.
- 29. Robinson S, Edwards HT, Dill DB. New records in human power. Science. 1937;85:409–10. doi:10.1126/science.85.2208.409
- 33. Ruiz JR, Cavero-Redondo I, Ortega FB, Welk GJ, Andersen LB, Martinez-Vizcaino V. Cardiorespiratory fitness cut points to avoid cardiovascular disease risk in children and adolescents; what level of fitness should raise a red flag? A systematic review and meta-analysis. Br J Sports Med. 2016;50:1451–8. doi:10.1136/bjsports-2015-095903
- 35. Sutton NC. The Assessment of Children’s Anaerobic Performance. [Unpublished PhD thesis]. Exeter, UK: University of Exeter; 1999.
- 38. Welsman JR, Armstrong N. Interpreting exercise performance data in relation to body size. In: Armstrong N, van Mechelen W, eds. Paediatric Exercise Science and Medicine. 2nd ed. Oxford, UK: Oxford University Press; 2008:13–21.
- 39. Welsman JR, Armstrong N. Interpreting exercise performance data in relation to body size. In: Armstrong N, van Mechelen W, eds. Textbook of Paediatric Exercise Science and Medicine. Oxford, UK: Oxford University Press; 2000:3–9.
- 40. Welsman JR, Armstrong N. Interpreting performance in relation to body size. In: Armstrong N, ed. Paediatric Exercise Physiology. Edinburgh, UK: Churchill Livingstone; 2007:27–46.
- 41. Welsman JR, Armstrong N. Scaling for size: relevance to understanding the effects of growth on performance. In: Hebestreit H, Bar-Or O, eds. The Young Athlete. Oxford, UK: Blackwell; 2008:50–62.
- 42. Welsman JR, Armstrong N. Statistical techniques for interpreting body size-related exercise performance during growth. Pediatr Exerc Sci. 2000;12:112–27. doi:10.1123/pes.12.2.112
- 44. Welsman JR, Armstrong N, Withers S. Responses of young girls to two modes of aerobic training. Br J Sports Med. 1997;31:139–42. doi:10.1136/bjsm.31.2.139
- 46. Williams JR, Armstrong N, Winter EM, Crichton N. Changes in peak oxygen uptake with age and sexual maturation in boys: physiological fact or statistical anomaly? In: Coudert J, van Praagh E, eds. Children and Exercise XVI. Paris, France: Masson; 1992:35–7.
- 47. Winsley R, Armstrong N, Welsman J. Leg volume is not related to peak oxygen uptake in 9-year-old boys. In: Ring FJ, ed. Children in Sport. Bath, UK: Centre for Continuing Education; 1995:70–6.
- 48. Winter EM. Importance and principles of scaling for size differences. In: Bar-Or O, ed. The Child and Adolescent Athlete. Oxford, UK: Blackwell Science; 1996:673–9.
- 49. Winter EM. Scaling: partitioning out differences in size. Pediatr Exerc Sci. 1992;4:296–301. doi:10.1123/pes.4.4.296