Utilities for computing pediatric BMI percentiles, Z-scores, and related tools

The CDC provides a SAS macro for computing BMI percentiles and Z-scores. The function ext_bmiz(), included in growthcleanr, provides equivalent functionality. ext_bmiz() calculates sigma (the scale parameter of the half-normal distribution), the extended BMI percentile, the extended BMIz, and the CDC LMS z-scores for weight, height, and BMI. Note that for BMIs ≤ 95th percentile of the CDC growth charts, the extended values for BMI are equal to the LMS values. The extended values differ only for children who have a BMI > 95th percentile.
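For reference, the LMS z-scores named above follow the standard LMS transformation, and a percentile is the standard normal CDF of the z-score. The sketch below is illustrative only: lms_z() is a hypothetical helper, not a growthcleanr function, and the L, M, and S values are made up rather than taken from the CDC reference tables.

```r
# Hypothetical helper (not part of growthcleanr): standard LMS z-score.
# x is the measurement; l, m, s are the Box-Cox power, the median, and the
# coefficient of variation for the child's sex and age.
lms_z <- function(x, l, m, s) {
  if (l == 0) log(x / m) / s else ((x / m)^l - 1) / (l * s)
}

# Made-up parameters for illustration only (NOT CDC reference values)
l <- -2.0
m <- 16.5
s <- 0.11

z_at_median <- lms_z(16.5, l, m, s)  # a measurement at the median gives z = 0
z_above     <- lms_z(19.0, l, m, s)  # a larger measurement gives z > 0

# A z-score converts to a percentile through the standard normal CDF
pct_at_median <- 100 * pnorm(z_at_median)
```

The same z-to-percentile relationship can be seen in the example output later on this page, where a bmiz of about 0.32 corresponds to a bmip of about 62.7.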

The function assumes a variable ‘sex’ and variables for age in months, weight (kg), height (cm), and BMI (weight/height^2). Please be careful with age: the units should be months, and you should use the most accurate information available (e.g., 23.4928 months).

The extended BMIz is the inverse cumulative distribution function (CDF) of the extended BMI percentile. If the extended percentile is very close to 100, the qnorm() function in R produces an infinite value. This occurs only if the extended BMI percentile is > 99.99999999999999. Such values are infrequent (for example, a 48-month-old with a BMI > 39), and it is likely that these BMIs represent data entry errors. For these cases, extended BMIz is set to 8.21, a value that is slightly greater than the largest value that can be calculated.
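The capping behaviour can be reproduced with base R. This is a minimal sketch of the idea described above, not the package's actual code; the variable names and input percentiles are illustrative.

```r
# Extended BMI percentiles on a 0-100 scale (illustrative values)
ext_bmip <- c(50, 97.5, 99.999, 100)

# Inverse normal CDF; a percentile indistinguishable from 100 yields Inf
ext_bmiz_raw <- qnorm(ext_bmip / 100)

# Cap positive infinite results at 8.21, as described above
ext_bmiz_capped <- ifelse(
  is.infinite(ext_bmiz_raw) & ext_bmiz_raw > 0,
  8.21,
  ext_bmiz_raw
)

# Largest finite z-score qnorm() can return: the input is the largest
# double-precision number below 1
largest_finite_z <- qnorm(1 - 2^(-53))
```

Here largest_finite_z is finite and just below 8.21, which is why 8.21 serves as the cap.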

The longwide() function provides a convenient way to prepare data for use with ext_bmiz(). The example in the next section shows a potential workflow for taking the “long” (one observation per row) data from cleangrowth() and converting it to the “wide” (height and weight on one line) format required by ext_bmiz(). An additional function recode_sex() supports recoding coded values for sex from one value set to another.

The CDC SAS macro was updated in December 2022, according to the findings of this NCHS report. The ext_bmiz() function has been updated to match it as of growthcleanr v2.1.0.

Converting long growthcleanr data to wide format w/BMI

Because ext_bmiz() performs cross-sectional analysis of BMI, observation data must be in a wide format, i.e. with height and weight information on the same row. This is distinct from cleangrowth(), which performs longitudinal analysis on all observations for each subject, presented in a long format with one observation per row. To facilitate use of both functions, growthcleanr includes utility functions to transform data used with cleangrowth() for use with ext_bmiz(). They are optimized to move data directly from the output of cleangrowth() into input for ext_bmiz(), but have options to support independent use as well.
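To make the long versus wide distinction concrete, the following base-R sketch pairs height and weight rows by subject and age and then computes BMI. It is only an illustration of the reshaping idea under simplified assumptions (toy data, exact matches on age in days); it is not how longwide() or simple_bmi() are implemented.

```r
# Toy long-format data: one measurement per row (values are illustrative)
long_df <- data.frame(
  subjid = c("a", "a", "a", "a", "b"),
  agedays = c(889, 889, 1071, 1071, 900),
  param = c("HEIGHTCM", "WEIGHTKG", "HEIGHTCM", "WEIGHTKG", "HEIGHTCM"),
  measurement = c(89.06, 13.10, 92.50, 14.70, 88.00)
)

# Split by parameter and rename the measurement column
keep <- c("subjid", "agedays", "measurement")
ht <- long_df[long_df$param == "HEIGHTCM", keep]
wt <- long_df[long_df$param == "WEIGHTKG", keep]
names(ht)[3] <- "ht"
names(wt)[3] <- "wt"

# Inner join: keep only subject/age pairs that have both height and weight
wide_df <- merge(ht, wt, by = c("subjid", "agedays"))

# BMI = weight (kg) / height (m)^2, with height supplied in cm
wide_df$bmi <- wide_df$wt / (wide_df$ht / 100)^2
```

Subject "b" has a height but no matching weight, so it drops out of the wide result, which is one reason a wide dataset has fewer rows than its long source.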

Using the syngrowth example dataset, to convert the data after it has been cleaned by cleangrowth() for use with ext_bmiz(), use longwide() and simple_bmi():

# Use the built-in utility function to convert the input observations to wide
# format for BMI calculation
cleaned_data_wide <- longwide(cleaned_data)

# Compute simple BMI values (adds column "bmi")
cleaned_data_bmi <- simple_bmi(cleaned_data_wide)

# Compute Z-scores and percentiles
cleaned_data_bmiz <- ext_bmiz(cleaned_data_bmi)

Note that this assumes that cleaned_data has the same structure as described in Quickstart - Data preparation:

names(cleaned_data)
[1] "id"          "subjid"      "sex"         "agedays"     "param"       "measurement" "gcr_result"

The wide dataset cleaned_data_wide will include rows with aligned height and weight measurements drawn from the observations in cleaned_data marked by cleangrowth() for inclusion. As such, it will be a shorter dataset (fewer rows) based on fewer observations.

dim(cleaned_data)
[1] 85728     7

dim(cleaned_data_wide)
[1] 26701     9

head(cleaned_data_wide)
                                subjid    agey     agem sex   wt wt_id    ht ht_id  agedays
1 002986c5-354d-bb9d-c180-4ce26813ca28 56.0964 673.1568   2 71.7 83331 151.1 83330 20489.22
2 002986c5-354d-bb9d-c180-4ce26813ca28 57.1122 685.3464   2 73.2 83333 151.1 83332 20860.22
3 002986c5-354d-bb9d-c180-4ce26813ca28 58.1279 697.5348   2 74.6 83336 151.1 83335 21231.22
4 002986c5-354d-bb9d-c180-4ce26813ca28 59.1437 709.7244   2 72.8 83338 151.1 83337 21602.22
5 002986c5-354d-bb9d-c180-4ce26813ca28 59.2012 710.4144   2 72.4 83340 151.1 83339 21623.22
6 002986c5-354d-bb9d-c180-4ce26813ca28 60.1594 721.9128   2 69.4 83343 151.1 83342 21973.22

In this example, subject identifiers remain in the subjid column, and the identifiers of the individual weight and height observations are carried over as wt_id and ht_id.

longwide() can be called with name mapping parameters if your input set uses different column names. For example, if my_cleaned_data specifies age in days as aged and parameter type as type, specify each, with quotes:

head(my_cleaned_data)
     id subjid sex    aged     type measurement                  gcr_result
1: 1510 775155   0     889 HEIGHTCM       84.90 Exclude-Extraneous-Same-Day
2: 1511 775155   0     889 HEIGHTCM       89.06                     Include
3: 1518 775155   0     889 WEIGHTKG       13.10                     Include
4: 1512 775155   0    1071 HEIGHTCM       92.50                     Include
5: 1519 775155   0    1071 WEIGHTKG       14.70                     Include
6: 1513 775155   0    1253 HEIGHTCM       96.20                     Include

longwide(my_cleaned_data, agedays = "aged", param = "type")

By default, longwide() will only transform records flagged by cleangrowth() for inclusion. To include more categories assigned by cleangrowth(), use the inclusion_types option. For example, to include carried forward values along with included records for the BMI calculation:

cleaned_data_wide_cf <- longwide(
  long_df = cleaned_data,
  inclusion_types = c("Include", "Exclude-Carried-Forward")
)

Another option, include_all, is set to FALSE by default; when set to TRUE, it includes all observations in the transformation regardless of their cleangrowth() result. Additional options provide flexibility to preserve additional columns and unmatched observation rows.
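The effect of these two options can be pictured as a row filter on gcr_result. The sketch below uses a hypothetical helper, select_rows(), and toy data; it mirrors the idea only and is not taken from the longwide() source.

```r
# Toy cleaned data with a gcr_result column (values are illustrative)
cleaned <- data.frame(
  id = 1:4,
  gcr_result = c("Include", "Exclude-Extraneous-Same-Day",
                 "Exclude-Carried-Forward", "Include")
)

# Hypothetical helper mirroring the idea of inclusion_types / include_all
select_rows <- function(df, inclusion_types = "Include", include_all = FALSE) {
  if (include_all) {
    return(df)  # keep every observation, whatever its result code
  }
  df[df$gcr_result %in% inclusion_types, , drop = FALSE]
}

default_rows <- select_rows(cleaned)
with_cf <- select_rows(cleaned, c("Include", "Exclude-Carried-Forward"))
all_rows <- select_rows(cleaned, include_all = TRUE)
```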

See longwide() for full details.

With wide data in hand, output taken directly from longwide() can have BMI added with simple_bmi(), and then the output can be passed to ext_bmiz(), as shown in the simple example above. Alternatively, you can provide a similarly formatted data frame directly to ext_bmiz().

Recoding sex values

Note that ext_bmiz() allows the sex variable to be coded using a range of possible values, but not the same 0 and 1 values as cleangrowth(). This difference from the growthcleanr data preparation specification maintains compatibility with the CDC SAS macro. The longwide() function handles the conversion from growthcleanr’s 0 (male) or 1 (female), but not from other coded values.

If you are using input data with different value codes for sex with ext_bmiz(), use recode_sex() to ensure your values are recoded first. For example, if you have data in the PCORnet CDM format (using M and F), and want to prepare it for ext_bmiz():

recode_sex(
  input_data = cdm_formatted,
  sourcecol = "sex",
  sourcem = "M",
  sourcef = "F",
  targetm = 1L,
  targetf = 2L
)

recode_sex() can also be used for other purposes, such as recoding values in preparation for cleaning with cleangrowth(), or transforming growthcleanr output to match external specifications.
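Conceptually, recoding is a value-for-value substitution. The base-R sketch below illustrates this with a hypothetical function, recode_sex_sketch(), operating on a plain vector. It is not the growthcleanr implementation, and its handling of unmatched codes (left as NA) is an assumption of this sketch.

```r
# Hypothetical stand-in for the recoding idea; not the growthcleanr function
recode_sex_sketch <- function(x, sourcem, sourcef, targetm, targetf) {
  out <- rep(NA_integer_, length(x))
  out[which(x == sourcem)] <- targetm  # male source code -> male target code
  out[which(x == sourcef)] <- targetf  # female source code -> female target code
  out                                  # anything else stays NA in this sketch
}

# PCORnet-style codes, plus one unrecognized value
cdm_sex <- c("M", "F", "F", "M", "U")
recoded <- recode_sex_sketch(cdm_sex, "M", "F", 1L, 2L)
```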

Computing BMI percentiles and Z-scores

With data in wide format with BMI, and with sex values properly coded (as any of ‘1’, ‘b’, ‘B’, ‘Boys’, ‘m’, ‘M’, ‘male’, or ‘Male’ for male subjects and any of ‘2’, ‘g’, ‘G’, ‘Girls’, ‘f’, ‘F’, ‘female’, or ‘Female’ for female subjects), ext_bmiz() can be called:
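Before calling ext_bmiz(), it can be worth confirming that every sex value is one of the accepted codes listed above. This is a small base-R check written for illustration; the vector names are hypothetical and the check is not part of growthcleanr.

```r
# Accepted codes, copied from the list above
male_codes <- c("1", "b", "B", "Boys", "m", "M", "male", "Male")
female_codes <- c("2", "g", "G", "Girls", "f", "F", "female", "Female")

# Illustrative input; c() coerces the numeric 2 to the string "2"
sex <- c("M", "Girls", 2, "boy")

# TRUE where the value is an accepted code, FALSE otherwise
sex_ok <- as.character(sex) %in% c(male_codes, female_codes)

# Values that would need recoding first
needs_recoding <- sex[!sex_ok]
```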

cleaned_data_bmiz <- ext_bmiz(cleaned_data_bmi)
head(cleaned_data_bmiz)
                                 subjid    agey      age   sex    wt wt_id    ht ht_id agedays      bmi      bmiz
                                 <char>   <num>    <num> <int> <num> <int> <num> <int>   <int>    <num>     <num>
1: 001aa16d-bf0e-a077-3b3d-5ab8b58545ad 10.0233 120.2796     2  35.4    17 141.6    15    3661 17.65537 0.3236612
2: 001aa16d-bf0e-a077-3b3d-5ab8b58545ad 11.0390 132.4680     2  39.2    19 147.9    18    4032 17.92048 0.1734315
3: 001aa16d-bf0e-a077-3b3d-5ab8b58545ad 12.0548 144.6576     2  44.8    21 155.1    20    4403 18.62320 0.1832443
4: 001aa16d-bf0e-a077-3b3d-5ab8b58545ad 12.5914 151.0968     2  47.8    23 158.7    22    4599 18.97903 0.1829183
5: 001aa16d-bf0e-a077-3b3d-5ab8b58545ad 13.0705 156.8460     2  50.5    26 160.8    24    4774 19.53077 0.2586449
6: 001aa16d-bf0e-a077-3b3d-5ab8b58545ad  3.9288  47.1456     2  16.6     2 102.6     1    1435 15.76933 0.3453978
       bmip       waz       wp       haz       hp      p95      p97   bmip95   mod_bmiz   mod_waz   mod_haz
      <num>     <num>    <num>     <num>    <num>    <num>    <num>    <num>      <num>     <num>     <num>
1: 62.69027 0.3498817 63.67862 0.5140553 69.63933 22.96109 24.57902 76.89254 0.18485501 0.2399201 0.5008944
2: 56.88438 0.2311645 59.14065 0.5002225 69.15408 24.13836 25.90525 74.24067 0.09543668 0.1556259 0.4955022
3: 57.26968 0.3237374 62.69316 0.4803298 68.45036 25.26981 27.17179 73.69745 0.10065274 0.2201655 0.4855100
4: 57.25689 0.3812700 64.84986 0.5153244 69.68368 25.83904 27.80781 73.45100 0.10011666 0.2596479 0.5212281
5: 60.20454 0.4440465 67.14955 0.4849566 68.61464 26.32757 28.35405 74.18371 0.14353771 0.3024342 0.4885546
6: 63.51023 0.4369963 66.89430 0.5348018 70.36065 18.02950 18.60078 87.46409 0.24462581 0.3332011 0.5224816
      sigma original_bmip original_bmiz sev_obese obese
      <num>         <num>         <num>     <int> <int>
1: 4.443536      62.69027     0.3236612         0     0
2: 4.797031      56.88438     0.1734315         0     0
3: 5.148292      57.26968     0.1832443         0     0
4: 5.332930      57.25689     0.1829183         0     0
5: 5.497248      60.20454     0.2586449         0     0
6: 2.274792      63.51023     0.3453978         0     0

The output columns include:

variable	description
bmi	BMI
bmiz, bmip	LMS / Extended z-score and percentile
waz, wp	LMS Weight-for-sex/age z-score and percentile
haz, hp	LMS Height-for-sex/age z-score and percentile
p95, p97	95th and 97th percentile of BMI in growth charts
bmip95	BMI expressed as a percentage of the 95th percentile. A value ≥ 120 is widely used as the cut point for severe obesity.
mod_bmiz, mod_waz, mod_haz	Modified BMI-for-age, Weight-for-age, and Height-for-age z-scores for identifying outliers (see the information in the CDC SAS growth charts program website)
sigma	Scale parameter for half-normal distribution
original_bmiz, original_bmip	LMS BMI-for-sex/age z-score and percentile
sev_obese	BMI >= 120% of 95th percentile (0/1)
obese	BMI >= 95th percentile (0/1)

For convenience, these labels are available on the output of ext_bmiz(), e.g., when viewed in RStudio with View(cleaned_data_bmiz).
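The relationships among bmi, p95, bmip95, obese, and sev_obese follow directly from the definitions in the table. The sketch below recomputes them in base R; the first two rows reuse bmi and p95 values from the example output above, and the third row is a made-up case above the 95th percentile.

```r
# bmi and p95 for three observations (third row is hypothetical)
bmi <- c(17.65537, 15.76933, 30.0)
p95 <- c(22.96109, 18.02950, 24.0)

# BMI expressed as a percentage of the 95th percentile
bmip95 <- 100 * bmi / p95

# Flags as defined in the table: 0/1 integers
obese <- as.integer(bmi >= p95)
sev_obese <- as.integer(bmip95 >= 120)
```

The first two rows give bmip95 values of roughly 76.9 and 87.5 with both flags at 0, in line with the example output; the hypothetical third row sets both flags to 1.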

Like longwide(), ext_bmiz() also includes options for mapping alternate column names, for age, weight, height, and BMI. The default column names are the same as the output from longwide() for convenience. If you have different column names, specify the column names without quotes. For example, for a dataset using “heightcm” and “weightkg” instead of “ht” and “wt”:

my_cleaned_data_bmiz <- ext_bmiz(my_cleaned_data_wide_bmi, ht = heightcm, wt = weightkg)

For ext_bmiz(), use the most precise age in months available. If an input dataset only has age in months as integer values, by default ext_bmiz() will automatically convert these to double values and add 0.5 to account for the distribution of actual ages over the range of days within a month. This is enabled with the option adjust.integer.age, set to TRUE by default. Specify FALSE to disable.

my_cleaned_data_bmiz <- ext_bmiz(my_cleaned_data_wide_bmi, adjust.integer.age = FALSE)
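The integer-age adjustment can be sketched in base R as follows. adjust_integer_age() is a hypothetical helper written for illustration, and the rule it uses (shift only when every age is a whole number) is an assumption about the behaviour described above, not a copy of the ext_bmiz() source.

```r
# Hypothetical helper: add 0.5 months when all ages are whole numbers
adjust_integer_age <- function(age_months, adjust = TRUE) {
  all_whole <- all(age_months == round(age_months), na.rm = TRUE)
  if (adjust && all_whole) {
    as.numeric(age_months) + 0.5  # centre each age within its month
  } else {
    as.numeric(age_months)        # precise or unadjusted ages pass through
  }
}

shifted <- adjust_integer_age(c(24L, 36L, 120L))
untouched <- adjust_integer_age(c(23.4928, 36))
disabled <- adjust_integer_age(c(24L, 36L), adjust = FALSE)
```

With whole-month input, shifted becomes 24.5, 36.5, and 120.5, while input that already contains fractional months is left unchanged.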

ext_bmiz() uses reference data provided by the CDC, included in the growthcleanr package as inst/extdata/CDCref_d.csv. This file is automatically loaded and used by default. If you are working with a different reference dataset or developing the growthcleanr package, specify an alternate path to this file with ref.data.path, as for cleangrowth().

Related tools

The CDC provides a SAS Program for the 2000 CDC Growth Charts, which can also be used to identify biologically implausible values using a different approach; this approach is also implemented in growthcleanr in the function ext_bmiz(), described above. The SAS program was updated in December 2022, according to the findings of this NCHS report, and ext_bmiz() has been updated to match it as of growthcleanr v2.1.0.

GrowthViz provides insights into how growthcleanr assesses data, packaged in a Jupyter notebook. It ships with the same syngrowth synthetic example dataset as growthcleanr, with cleaning results included.

2026-02-20
