NHANES: Reliability Standards

Introduction

This tutorial reproduces the key tables from the Flegal et al. (2016) article. The analysis uses the same NHANES data and aims to replicate the unweighted sample size counts from Table 1 and the weighted logistic regression models from Table 3. We incorporate NCHS/CDC reliability standards to ensure estimates are statistically defensible.

Setup and Data Preparation

This first section prepares the data for analysis. The key steps are:

  • Load Data: The Flegal2016.RData file containing the necessary NHANES data is loaded.
  • Define Analytic Sample: The sample is restricted to non-pregnant adults aged 20 years or older with a measured BMI (N = 5,455), matching the total number of participants reported in the paper for the 2013–2014 cycle.
  • Recode Variables: All variables needed for the analysis (e.g., Age, race, obese, smoking, education) are created and categorized to match the definitions used by Flegal et al.
  • Create Survey Design Object: A svydesign object is created for the full dataset to account for the survey design (weights, strata, clusters); the analytic design (analytic_design) is then obtained by subsetting it to the analytic sample.
# --- Load Libraries ---
library(survey)
library(dplyr)
library(knitr)
library(car)
library(tidyr)
library(Publish)
library(stringr)
library(kableExtra) 
# install.packages("devtools")
# devtools::install_github("ehsanx/svyTable1", 
#                          build_vignettes = TRUE)
library(svyTable1)

# --- Load Data ---
load("Data/surveydata/Flegal2016.RData")

# --- Create Analytic Sample (N=5,455) ---
dat.analytic <- subset(dat.full, RIDAGEYR >= 20)
dat.analytic <- subset(dat.analytic, !is.na(BMXBMI))
dat.analytic <- subset(dat.analytic, is.na(RIDEXPRG) | RIDEXPRG != "Yes, positive lab pregnancy test")

# --- Create ALL Analysis Variables in the Full Dataset ---
dat.full <- dat.full %>%
  mutate(
    Age = cut(RIDAGEYR,
              breaks = c(20, 40, 60, Inf),
              right = FALSE,
              labels = c("20-39", "40-59", ">=60")),
    
    race = case_when(
      RIDRETH3 == "Non-Hispanic White" ~ "Non-Hispanic white",
      RIDRETH3 == "Non-Hispanic Black" ~ "Non-Hispanic black",
      RIDRETH3 == "Non-Hispanic Asian" ~ "Non-Hispanic Asian",
      RIDRETH3 %in% c("Mexican American", "Other Hispanic") ~ "Hispanic",
      RIDRETH3 == "Other Race - Including Multi-Rac" ~ "Other",
      TRUE ~ NA_character_
    ),
    race = factor(race, levels = c("Non-Hispanic white", "Non-Hispanic black", "Non-Hispanic Asian", "Hispanic", "Other")),
    
    gender = factor(RIAGENDR, labels = c("Male", "Female")),
    
    obese = factor(ifelse(BMXBMI >= 30, "Yes", "No"), levels = c("No", "Yes")),
    obese_class3 = factor(ifelse(BMXBMI >= 40, "Yes", "No"), levels = c("No", "Yes")),
    
    smoking = case_when(
      SMQ020 == "No" ~ "Never smoker",
      SMQ020 == "Yes" & SMQ040 == "Not at all" ~ "Former smoker",
      SMQ020 == "Yes" ~ "Current smoker",
      TRUE ~ NA_character_
    ),
    smoking = factor(smoking, levels = c("Never smoker", "Former smoker", "Current smoker")),

    education = case_when(
        DMDEDUC2 %in% c("Less than 9th grade", "9-11th grade (Includes 12th grad)") ~ "<High school",
        DMDEDUC2 == "High school graduate/GED or equi" ~ "High school",
        DMDEDUC2 %in% c("Some college or AA degree", "College graduate or above") ~ ">High school",
        TRUE ~ NA_character_
    ),
    education = factor(education, levels = c("High school", "<High school", ">High school"))
  )

# --- Create Survey Design Object ---
options(survey.lonely.psu = "adjust")
svy.design.full <- svydesign(id = ~SDMVPSU, 
                             strata = ~SDMVSTRA, 
                             weights = ~WTINT2YR, 
                             nest = TRUE, 
                             data = dat.full)
analytic_design <- subset(svy.design.full, SEQN %in% dat.analytic$SEQN)
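
As an optional sanity check (not part of the original code), you can confirm that the analytic subset matches the N = 5,455 reported in the paper:

# --- Optional sanity check: both should report 5,455 participants ---
nrow(dat.analytic)
nrow(analytic_design)   # survey design objects have dim()/nrow() methods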

Reproducing Table 1

This section reproduces the unweighted sample sizes shown in Flegal et al.’s Table 1. The code first generates separate summary tables for all participants, men, and women. It then performs several formatting steps to combine these into a single table.

What svytable1 Does

The svytable1 function creates a descriptive summary table—commonly referred to as a “Table 1”—from complex survey data. It is specifically designed to produce publication-ready results that align with NCHS Data Presentation Standards for reliability.

Key svytable1 Operations

When you call svytable1, it performs the following steps for each analysis (for example, for all participants, men, and women):

  1. Calculates Proportions
    It summarizes categorical variables (like Age) by calculating the proportion of participants in each category (e.g., 20–39, 40–59, ≥60).

  2. Stratifies by a Grouping Variable
    Results are calculated separately by levels of the strata_var (for example, race), creating side-by-side columns.

  3. Displays Mixed Mode Results
    Because mode = "mixed", each cell shows both:

    • The unweighted sample count (N)
    • The weighted percentage (%), which represents the population estimate.
  4. Performs Reliability Checks
    When reliability_checks = TRUE, the function evaluates each estimate against NCHS Data Presentation Standards. These checks prevent publication of unstable or statistically unreliable estimates.

What the Asterisk (*) Means

An asterisk (*) in the table output indicates suppression: the estimate was determined to be statistically unreliable, and the function hides the value to avoid misinterpretation.

NCHS reliability rules (for proportions)

Estimates of proportions are suppressed if they fail one or more of the following NCHS reliability rules:

  • fail_n_30: The unweighted sample size (n) is fewer than 30 participants.
  • fail_eff_n_30: The effective sample size (adjusted for design effects) is less than 30.
  • fail_df_8: The design degrees of freedom (df) are fewer than 8.
  • fail_ciw_30: The absolute confidence interval width is ≥ 30 percentage points.
  • fail_rciw_130: The relative CI width is greater than 130%.

Each of these flags indicates limited precision or instability in the estimate.

In the output, the asterisks appear in the “Other” race column for certain age groups (such as “40–59” and “≥60”).
This happens because the number of participants in those cells is very small, producing unstable or wide confidence intervals. Thus, the function correctly replaces the unreliable estimates with *, ensuring your published results remain statistically defensible and transparent.
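
To make the checks concrete, here is a minimal hand-rolled sketch (independent of svytable1) of how one such cell, the ">=60" age group within the "Other" race column, could be evaluated against several of these rules using the survey package. The object and variable names are those created above; the effective-sample-size rule needs the design effect and is omitted for brevity.

# --- Hand-check one cell against the NCHS proportion rules (illustrative only) ---
other_design <- subset(analytic_design, race == "Other")
n_unw  <- sum(other_design$variables$Age == ">=60", na.rm = TRUE)    # unweighted n (rule: >= 30)
df_des <- degf(other_design)                                         # design df (rule: >= 8)
ci   <- svyciprop(~ I(Age == ">=60"), other_design, method = "beta") # Korn-Graubard CI
p    <- as.numeric(ci)
ciw  <- diff(as.vector(attr(ci, "ci")))                              # absolute CI width (rule: < 0.30)
rciw <- 100 * ciw / p                                                # relative CI width (rule: <= 130%)
round(c(n = n_unw, design_df = df_des, ci_width = ciw, rel_ci_width = rciw), 2)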

Reliability Metrics Table

In addition to the detailed checks for proportions, the svytable1 function also assesses the reliability of means for numeric variables. For these estimates, it applies the standard NCHS recommendation, which uses the Relative Standard Error (RSE). If a mean’s RSE is 30% or greater, it is considered statistically unreliable and will be suppressed with an asterisk (*) in the formatted table.
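
As a small illustration (using BMXBMI, the BMI variable already in the design), the same rule can be computed directly with the survey package:

# --- RSE check for a mean, done by hand: flag if 100 * SE / mean >= 30 ---
m   <- svymean(~BMXBMI, analytic_design, na.rm = TRUE)
rse <- 100 * as.numeric(SE(m)) / as.numeric(coef(m))
c(mean = round(as.numeric(coef(m)), 2),
  RSE_percent = round(rse, 2),
  fail_rse_30 = rse >= 30)   # logical shown as 0/1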

If you set return_metrics = TRUE, the output also includes a $reliability_metrics table with a row for each mean, reporting the calculated RSE and the outcome of this check in the fail_rse_30 column.

# View reliability_metrics
table1_svy <- svytable1(
  design = analytic_design, strata_var = "race", 
  table_vars = "Age", mode = "mixed",
  reliability_checks = TRUE,
  return_metrics = TRUE
)
table1_svy$reliability_metrics[table1_svy$reliability_metrics$suppressed == TRUE, ]

Summary tables for each group

# --- Create the summary tables for each group ---
table1_svy_all <- svytable1(
  design = analytic_design, strata_var = "race", 
  table_vars = "Age", mode = "mixed",
  reliability_checks = TRUE
) %>%
  mutate(Variable = dplyr::recode(Variable, "Age" = "Age Groups"))
kable(table1_svy_all)

| Variable | Level | Overall | Non-Hispanic white | Non-Hispanic black | Non-Hispanic Asian | Hispanic | Other |
|---|---|---|---|---|---|---|---|
| n | | 5,455 | 2,343 | 1,115 | 623 | 1,214 | 160 |
| Age Groups | | | | | | | |
| | 20-39 | 1,810 (35.5%) | 734 (30.8%) | 362 (38.9%) | 216 (40.2%) | 412 (49.6%) | 86 (49.2%) |
| | 40-59 | 1,896 (37.5%) | 759 (37.5%) | 383 (39.3%) | 251 (38.5%) | 449 (35.6%) | * |
| | >=60 | 1,749 (27.0%) | 850 (31.7%) | 370 (21.8%) | 156 (21.2%) | 353 (14.8%) | * |

male_design <- subset(analytic_design, gender == "Male")
table1_svy_men <- svytable1(
  design = male_design, strata_var = "race", 
  table_vars = "Age", mode = "mixed",
  reliability_checks = TRUE
) %>%
  mutate(Variable = dplyr::recode(Variable, "Age" = "Age group"))
kable(table1_svy_men)

| Variable | Level | Overall | Non-Hispanic white | Non-Hispanic black | Non-Hispanic Asian | Hispanic | Other |
|---|---|---|---|---|---|---|---|
| n | | 2,638 | 1,130 | 556 | 300 | 573 | 79 |
| Age group | | | | | | | |
| | 20-39 | 909 (37.2%) | 386 (32.3%) | 182 (41.3%) | 106 (41.7%) | 189 (51.6%) | 46 (52.9%) |
| | 40-59 | 897 (37.6%) | 360 (37.9%) | 179 (39.0%) | 120 (38.5%) | 215 (35.1%) | * |
| | >=60 | 832 (25.1%) | 384 (29.8%) | 195 (19.7%) | 74 (19.8%) | 169 (13.2%) | * |

female_design <- subset(analytic_design, gender == "Female")
table1_svy_women <- svytable1(
  design = female_design, strata_var = "race", 
  table_vars = "Age", mode = "mixed",
  reliability_checks = TRUE
) %>%
  mutate(Variable = dplyr::recode(Variable, "Age" = "Age group"))
kable(table1_svy_women)

| Variable | Level | Overall | Non-Hispanic white | Non-Hispanic black | Non-Hispanic Asian | Hispanic | Other |
|---|---|---|---|---|---|---|---|
| n | | 2,817 | 1,213 | 559 | 323 | 641 | 81 |
| Age group | | | | | | | |
| | 20-39 | 901 (33.8%) | 348 (29.4%) | 180 (36.9%) | 110 (39.0%) | 223 (47.6%) | * |
| | 40-59 | 999 (37.4%) | 399 (37.1%) | 204 (39.5%) | 131 (38.6%) | 234 (36.0%) | * |
| | >=60 | 917 (28.8%) | 466 (33.5%) | 175 (23.6%) | 82 (22.5%) | 184 (16.4%) | * |

Format and combine the tables

# --- Format and combine the tables ---
table_all_formatted <- table1_svy_all %>%
  select(
    `Age Groups,y` = Level, `All Groups` = Overall,
    `White` = `Non-Hispanic white`, `Black` = `Non-Hispanic black`,
    `Asian` = `Non-Hispanic Asian`, `Hispanic`
  ) %>%
  slice(3:5) 

table_men_formatted <- table1_svy_men %>%
  select(
    `Age Groups,y` = Level, `All Groups` = Overall,
    `White` = `Non-Hispanic white`, `Black` = `Non-Hispanic black`,
    `Asian` = `Non-Hispanic Asian`, `Hispanic`
  ) %>%
  slice(3:5)

table_women_formatted <- table1_svy_women %>%
  select(
    `Age Groups,y` = Level, `All Groups` = Overall,
    `White` = `Non-Hispanic white`, `Black` = `Non-Hispanic black`,
    `Asian` = `Non-Hispanic Asian`, `Hispanic`
  ) %>%
  slice(3:5) 

final_table_data <- bind_rows(
  table_all_formatted,
  table_men_formatted,
  table_women_formatted
)

# --- Render the final, publication-quality table ---
final_table_data %>%
  kable(caption = "Characteristics by Group and Race/Hispanic Origin", align = "lrrrrr") %>%
  kable_styling(bootstrap_options = c("striped", "bordered"), full_width = FALSE) %>%
  add_header_above(c(" " = 1, "All Groups" = 1, "Race/Ethnicity" = 4)) %>%
  kableExtra::group_rows(index = c("All participants" = 3,
                                   "Men" = 3,
                                   "Women" = 3))
Characteristics by Group and Race/Hispanic Origin (the White, Black, Asian, and Hispanic columns are the race/Hispanic-origin groups)

| Age Groups, y | All Groups | White | Black | Asian | Hispanic |
|---|---|---|---|---|---|
| All participants | | | | | |
| 20-39 | 1,810 (35.5%) | 734 (30.8%) | 362 (38.9%) | 216 (40.2%) | 412 (49.6%) |
| 40-59 | 1,896 (37.5%) | 759 (37.5%) | 383 (39.3%) | 251 (38.5%) | 449 (35.6%) |
| >=60 | 1,749 (27.0%) | 850 (31.7%) | 370 (21.8%) | 156 (21.2%) | 353 (14.8%) |
| Men | | | | | |
| 20-39 | 909 (37.2%) | 386 (32.3%) | 182 (41.3%) | 106 (41.7%) | 189 (51.6%) |
| 40-59 | 897 (37.6%) | 360 (37.9%) | 179 (39.0%) | 120 (38.5%) | 215 (35.1%) |
| >=60 | 832 (25.1%) | 384 (29.8%) | 195 (19.7%) | 74 (19.8%) | 169 (13.2%) |
| Women | | | | | |
| 20-39 | 901 (33.8%) | 348 (29.4%) | 180 (36.9%) | 110 (39.0%) | 223 (47.6%) |
| 40-59 | 999 (37.4%) | 399 (37.1%) | 204 (39.5%) | 131 (38.6%) | 234 (36.0%) |
| >=60 | 917 (28.8%) | 466 (33.5%) | 175 (23.6%) | 82 (22.5%) | 184 (16.4%) |

Reproducing Table 3

This section reproduces the weighted logistic regression models from Flegal et al.’s Table 3. Four separate models are fit: one for obesity (BMI ≥ 30) and one for class 3 obesity (BMI ≥ 40), each stratified by gender. We extend the replication by examining the reliability of the regression coefficients. Because NCHS/CDC do not publish a reliability standard for regression coefficients, we report the relative standard error (RSE) alongside standard errors, p-values, and confidence intervals, and use these diagnostics to show why a simple RSE ≥ 30% rule can be misleading in this setting.

# --- Helper function to format the output of publish() ---
format_publish_output <- function(publish_object, new_col_name) {
  
  # Extract the clean regression table from the publish object
  df <- publish_object$regressionTable
  
  # Combine OR and CI into one string, handling reference groups
  df$result <- ifelse(
    df$OddsRatio == "Ref", 
    "1 [Reference]",
    # Combine the OR and CI, removing the brackets from the CI string and replacing the semicolon
    paste0(df$OddsRatio, " (", str_replace(str_remove_all(df$CI.95, "\\[|\\]"), ";", "-"), ")")
  )
  
  # Return a clean tibble with just the variable level and the formatted result
  tibble(term = df$Units, !!new_col_name := df$result)
}

# --- Create complete-case survey designs for each gender ---
male_design_complete <- subset(analytic_design, gender == "Male" & !is.na(smoking) & !is.na(education))
female_design_complete <- subset(analytic_design, gender == "Female" & !is.na(smoking) & !is.na(education))

Fit the Models

# --- Fit all four models ---
fit_men_obese <- svyglm(I(obese == "Yes") ~ Age + race + smoking + education, design = male_design_complete, family = binomial())
# print(summary(fit_men_obese))
fit_men_obese3 <- svyglm(I(obese_class3 == "Yes") ~ Age + race + smoking + education, design = male_design_complete, family = binomial())
fit_women_obese <- svyglm(I(obese == "Yes") ~ Age + race + smoking + education, design = female_design_complete, family = binomial())
fit_women_obese3 <- svyglm(I(obese_class3 == "Yes") ~ Age + race + smoking + education, design = female_design_complete, family = binomial())
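
Before any table formatting, it can be helpful to glance at one model's odds ratios and 95% confidence intervals directly. This optional check (not part of the original workflow) uses base extractors on the fitted svyglm object:

# --- Optional check: odds ratios and 95% CIs for the men / obesity (all grades) model ---
round(exp(cbind(OR = coef(fit_men_obese), confint(fit_men_obese))), 2)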

Reliability check for regression coefficients

The National Center for Health Statistics (NCHS) and the Centers for Disease Control and Prevention (CDC) do not recommend using the relative standard error (RSE) as a primary reliability check for regression coefficients. Their guidelines focus on other types of estimates.

  • For means and totals, the RSE is a commonly used metric.
  • For proportions, while the RSE was used in the past, the NCHS has transitioned to a more sophisticated, multi-step approach. This newer method considers a minimum number of events or a minimum denominator sample size, and the width of the confidence interval around the proportion. This change was made because the RSE can be misleading for proportions, especially those close to 0% or 100%.
  • There are no well-established NCHS or CDC guidelines that advocate using the RSE to judge the reliability of regression coefficients. The statistical properties of regression coefficients are more complex than those of means or proportions: a coefficient’s reliability depends on the sample size, the variance of the independent variable, and the overall fit of the model, among other factors. A simple RSE threshold is therefore not an adequate measure of reliability for regression coefficients.

Instead, we assess reliability by examining several key metrics that, taken together, give a complete picture of a coefficient’s stability and precision. The main tool for this is the svyglmdiag() function in the svyTable1 package, which we use to look at:
  1. The Standard Error (SE): A direct measure of the coefficient’s precision. A smaller SE relative to its coefficient suggests a more reliable estimate.

  2. The p-value: Tells you if the coefficient is statistically distinguishable from zero. A non-significant p-value (e.g., p > 0.05) means we cannot be confident the predictor has any association with the outcome.

  3. The Confidence Interval (CI): Provides a plausible range for the true value of the coefficient. A very wide CI indicates a high degree of uncertainty and, therefore, low reliability. For logistic regression, if the CI for the odds ratio contains 1.0, the result is not statistically significant.

We will also calculate the RSE to demonstrate why it can be misleading. Finally, we’ll run a quick check for multicollinearity using the Variance Inflation Factor (VIF), as this is a common cause of unstable (unreliable) coefficients.

# Let's examine one model in detail: fit_men_obese
# The same process applies to the other three models.
# Get the standard model summary and confidence intervals
diagnostics_table <- svyglmdiag(fit_men_obese)
knitr::kable(
  diagnostics_table,
  caption = "Table 3: Reliability Diagnostics for Obesity Model Coefficients",
  digits = 3
)
Table 3: Reliability Diagnostics for Obesity Model Coefficients

| Term | Estimate | SE | p.value | is_significant | CI_Lower | CI_Upper | CI_Width | RSE_percent | is_rse_high |
|---|---|---|---|---|---|---|---|---|---|
| (Intercept) | -0.745 | 0.167 | 0.007 | TRUE | -1.175 | -0.315 | 0.860 | 22.451 | FALSE |
| Age40-59 | 0.204 | 0.174 | 0.294 | FALSE | -0.243 | 0.651 | 0.895 | 85.311 | TRUE |
| Age>=60 | 0.159 | 0.218 | 0.498 | FALSE | -0.401 | 0.719 | 1.119 | 136.779 | TRUE |
| raceNon-Hispanic black | 0.279 | 0.136 | 0.096 | FALSE | -0.071 | 0.630 | 0.701 | 48.796 | TRUE |
| raceNon-Hispanic Asian | -1.277 | 0.164 | 0.001 | TRUE | -1.697 | -0.856 | 0.841 | 12.811 | FALSE |
| raceHispanic | 0.264 | 0.190 | 0.224 | FALSE | -0.225 | 0.752 | 0.977 | 72.060 | TRUE |
| raceOther | 0.158 | 0.322 | 0.644 | FALSE | -0.670 | 0.987 | 1.657 | 203.734 | TRUE |
| smokingFormer smoker | 0.207 | 0.140 | 0.198 | FALSE | -0.152 | 0.567 | 0.719 | 67.436 | TRUE |
| smokingCurrent smoker | -0.250 | 0.149 | 0.153 | FALSE | -0.632 | 0.132 | 0.764 | 59.346 | TRUE |
| education<High school | -0.528 | 0.154 | 0.019 | TRUE | -0.924 | -0.132 | 0.792 | 29.164 | FALSE |
| education>High school | -0.013 | 0.147 | 0.932 | FALSE | -0.390 | 0.364 | 0.754 | 1109.146 | TRUE |

Generally, the model shows limited reliability and predictive power. Most of the predictor variables, such as Age and smoking status, are not statistically significant (their p.value is high). This indicates that, for men in this dataset, these factors don’t have a clear, reliable association with obesity.

The few significant predictors are raceNon-Hispanic Asian and education<High school; these coefficients are comparatively stable and precise. The unreliability of the other terms is not caused by correlation among the predictors, as the multicollinearity check below shows.

The RSE Can Be Misleading for Regression: Notice that some statistically non-significant coefficients (like Age40-59 and raceHispanic) have high RSEs, which is expected. However, consider education>High school. Its p-value of 0.932 already tells you that the coefficient is not reliably different from zero, yet its RSE is also flagged as unreliable. The RSE is calculated as 100 × |SE / estimate|, here roughly 100 × 0.147 / 0.013 ≈ 1109% (using the unrounded values). The extremely high RSE is not the result of a large standard error but of the coefficient estimate being very close to zero. The inflated RSE adds no information beyond the p-value; it simply reflects that the coefficient itself is minuscule. This is a good example of why RSE is not a primary tool for regression coefficients: it can be inflated by estimates close to zero, regardless of their precision.
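
To see where those RSE values come from, here is a minimal sketch that computes them directly from the model summary (the column names are those returned by summary() for svyglm fits):

# --- RSE by hand: 100 * |SE / estimate|; near-zero estimates blow this up ---
coefs <- summary(fit_men_obese)$coefficients
rse_percent <- 100 * abs(coefs[, "Std. Error"] / coefs[, "Estimate"])
round(sort(rse_percent, decreasing = TRUE), 1)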

Check for Multicollinearity

Multicollinearity occurs when predictor variables in a model are highly correlated with each other. This can inflate the standard errors and make your coefficient estimates unstable. The VIF is used to detect this issue.

GVIF^(1/(2*Df)) is a scaled version of the Generalized Variance Inflation Factor (GVIF) used to assess multicollinearity in regression models with categorical predictors. It adjusts for the number of dummy variables created for a categorical variable, making its value directly comparable to the traditional VIF used for continuous predictors within the same model.

Why GVIF^(1/(2×Df)) is Necessary for Categorical Variables

A categorical variable with k levels (e.g., race) is typically represented in a regression model by k − 1 dummy variables. Dummy variables are inherently correlated because they all describe the same categorical feature. This intrinsic relationship would lead to very high, but misleading, GVIF values if the overall GVIF were interpreted directly.

The adjustment GVIF^(1/(2×Df)) standardizes the GVIF value. It reduces the measure from a hypervolume of confidence for multiple coefficients down to a linear measure, making it comparable to the single VIF value used for continuous predictors. Here, Df is the degrees of freedom for the categorical term, which equals k − 1, the number of dummy variables.
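
As a worked example of the adjustment, take the race term from the vif() output shown later in this section: GVIF = 16.392 with Df = 4 dummy-coded contrasts.

# --- GVIF^(1/(2*Df)) for the race term ---
16.392^(1 / (2 * 4))   # approximately 1.418, matching the GVIF^(1/(2*Df)) column below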

Acceptable Ranges and Interpretation

The interpretation of GVIF^(1/(2×Df)) follows the same guidelines as the standard VIF:

| GVIF^(1/(2×Df)) range | Interpretation |
|---|---|
| 1 | No correlation among predictors. |
| 1 – 2.5 | Low to moderate correlation; generally acceptable (typical for most well-specified models). |
| 2.5 – 5 | Moderate to high correlation; may warrant further investigation. |
| > 5 | Potentially severe multicollinearity: the predictor may overlap strongly with others, obscuring its effect on the outcome. A more conservative cutoff of 4 is sometimes used. |

vif_values <- vif(fit_men_obese)
print(vif_values)
#>                GVIF Df GVIF^(1/(2*Df))
#> Age        8.137621  2        1.688979
#> race      16.391999  4        1.418499
#> smoking    3.435829  2        1.361469
#> education  6.381028  2        1.589361

The key values in the GVIF^(1/(2*Df)) column are all low (below 2.5). This confirms that the predictor variables are independent enough from one another and are not artificially inflating each other’s standard errors. The lack of precision in the model comes from other sources, not from multicollinearity.

Formatting the Table

# --- Use the helper function to format results from each model ---
men_obese_res <- format_publish_output(publish(fit_men_obese),
                                       "Men, Obese, All Grades")
#>   Variable              Units OddsRatio       CI.95    p-value 
#>        Age              20-39       Ref                        
#>                         40-59      1.23 [0.87;1.72]   0.293928 
#>                          >=60      1.17 [0.77;1.80]   0.497521 
#>       race Non-Hispanic white       Ref                        
#>            Non-Hispanic black      1.32 [1.01;1.73]   0.095724 
#>            Non-Hispanic Asian      0.28 [0.20;0.38]   0.000553 
#>                      Hispanic      1.30 [0.90;1.89]   0.223882 
#>                         Other      1.17 [0.62;2.20]   0.644324 
#>    smoking       Never smoker       Ref                        
#>                 Former smoker      1.23 [0.94;1.62]   0.198208 
#>                Current smoker      0.78 [0.58;1.04]   0.152799 
#>  education        High school       Ref                        
#>                  <High school      0.59 [0.44;0.80]   0.018658 
#>                  >High school      0.99 [0.74;1.32]   0.931661
men_obese3_res <- format_publish_output(publish(fit_men_obese3),
                                        "Men, Class 3 Obesity")
#>   Variable              Units OddsRatio       CI.95   p-value 
#>        Age              20-39       Ref                       
#>                         40-59      0.85 [0.54;1.35]   0.52065 
#>                          >=60      0.88 [0.33;2.31]   0.80102 
#>       race Non-Hispanic white       Ref                       
#>            Non-Hispanic black      1.52 [0.85;2.72]   0.21824 
#>            Non-Hispanic Asian      0.08 [0.01;0.53]   0.04797 
#>                      Hispanic      1.32 [0.57;3.06]   0.54578 
#>                         Other      0.72 [0.28;1.82]   0.51598 
#>    smoking       Never smoker       Ref                       
#>                 Former smoker      1.08 [0.63;1.85]   0.78547 
#>                Current smoker      0.92 [0.47;1.79]   0.82227 
#>  education        High school       Ref                       
#>                  <High school      0.52 [0.27;1.03]   0.11800 
#>                  >High school      0.93 [0.65;1.33]   0.70887
women_obese_res <- format_publish_output(publish(fit_women_obese), 
                                         "Women, Obese, All Grades")
#>   Variable              Units OddsRatio       CI.95     p-value 
#>        Age              20-39       Ref                         
#>                         40-59      1.39 [1.08;1.78]   0.0486143 
#>                          >=60      1.08 [0.86;1.36]   0.5252155 
#>       race Non-Hispanic white       Ref                         
#>            Non-Hispanic black      2.17 [1.88;2.50]   0.0001332 
#>            Non-Hispanic Asian      0.23 [0.16;0.35]   0.0008139 
#>                      Hispanic      1.30 [0.92;1.84]   0.2016538 
#>                         Other      0.97 [0.57;1.65]   0.9169417 
#>    smoking       Never smoker       Ref                         
#>                 Former smoker      1.33 [1.00;1.76]   0.1067193 
#>                Current smoker      0.94 [0.59;1.49]   0.7987621 
#>  education        High school       Ref                         
#>                  <High school      0.96 [0.61;1.51]   0.8623856 
#>                  >High school      0.68 [0.55;0.86]   0.0215212
women_obese3_res <- format_publish_output(publish(fit_women_obese3), 
                                          "Women, Class 3 Obesity")
#>   Variable              Units OddsRatio       CI.95   p-value 
#>        Age              20-39       Ref                       
#>                         40-59      1.23 [0.80;1.91]   0.38944 
#>                          >=60      0.57 [0.39;0.85]   0.03880 
#>       race Non-Hispanic white       Ref                       
#>            Non-Hispanic black      1.92 [1.39;2.64]   0.01054 
#>            Non-Hispanic Asian      0.02 [0.00;0.16]   0.01235 
#>                      Hispanic      1.01 [0.56;1.82]   0.98433 
#>                         Other      1.65 [0.89;3.05]   0.17137 
#>    smoking       Never smoker       Ref                       
#>                 Former smoker      1.57 [1.05;2.35]   0.07838 
#>                Current smoker      1.02 [0.78;1.33]   0.88571 
#>  education        High school       Ref                       
#>                  <High school      1.25 [0.62;2.50]   0.55965 
#>                  >High school      0.89 [0.59;1.35]   0.61174

# --- Create a template for the final table structure ---
final_table_structure <- tibble(
  Group = c(
    "Age, y", "", "",
    "Race", "", "", "", "",
    "Smoking", "", "",
    "Education", "", ""
  ),
  term = c(
    "20-39", "40-59", ">=60",
    "Non-Hispanic white", "Non-Hispanic black", "Non-Hispanic Asian", "Hispanic", "Other",
    "Never smoker", "Former smoker", "Current smoker",
    "High school", "<High school", ">High school"
  )
)

# --- Join all formatted results onto the template ---
final_table <- final_table_structure %>%
  left_join(men_obese_res, by = "term") %>%
  left_join(men_obese3_res, by = "term") %>%
  left_join(women_obese_res, by = "term") %>%
  left_join(women_obese3_res, by = "term")

# --- Display the final, publication-quality table ---
kable(final_table, caption = "Weighted Logistic Regression Models for Obesity", 
      col.names = c("", "", 
                    "Men, Obese, All Grades", 
                    "Men, Class 3 Obesity", 
                    "Women, Obese, All Grades", 
                    "Women, Class 3 Obesity"))
Weighted Logistic Regression Models for Obesity

| Characteristic | Level | Men, Obese, All Grades | Men, Class 3 Obesity | Women, Obese, All Grades | Women, Class 3 Obesity |
|---|---|---|---|---|---|
| Age, y | 20-39 | 1 [Reference] | 1 [Reference] | 1 [Reference] | 1 [Reference] |
| | 40-59 | 1.23 (0.87-1.72) | 0.85 (0.54-1.35) | 1.39 (1.08-1.78) | 1.23 (0.80-1.91) |
| | >=60 | 1.17 (0.77-1.80) | 0.88 (0.33-2.31) | 1.08 (0.86-1.36) | 0.57 (0.39-0.85) |
| Race | Non-Hispanic white | 1 [Reference] | 1 [Reference] | 1 [Reference] | 1 [Reference] |
| | Non-Hispanic black | 1.32 (1.01-1.73) | 1.52 (0.85-2.72) | 2.17 (1.88-2.50) | 1.92 (1.39-2.64) |
| | Non-Hispanic Asian | 0.28 (0.20-0.38) | 0.08 (0.01-0.53) | 0.23 (0.16-0.35) | 0.02 (0.00-0.16) |
| | Hispanic | 1.30 (0.90-1.89) | 1.32 (0.57-3.06) | 1.30 (0.92-1.84) | 1.01 (0.56-1.82) |
| | Other | 1.17 (0.62-2.20) | 0.72 (0.28-1.82) | 0.97 (0.57-1.65) | 1.65 (0.89-3.05) |
| Smoking | Never smoker | 1 [Reference] | 1 [Reference] | 1 [Reference] | 1 [Reference] |
| | Former smoker | 1.23 (0.94-1.62) | 1.08 (0.63-1.85) | 1.33 (1.00-1.76) | 1.57 (1.05-2.35) |
| | Current smoker | 0.78 (0.58-1.04) | 0.92 (0.47-1.79) | 0.94 (0.59-1.49) | 1.02 (0.78-1.33) |
| Education | High school | 1 [Reference] | 1 [Reference] | 1 [Reference] | 1 [Reference] |
| | <High school | 0.59 (0.44-0.80) | 0.52 (0.27-1.03) | 0.96 (0.61-1.51) | 1.25 (0.62-2.50) |
| | >High school | 0.99 (0.74-1.32) | 0.93 (0.65-1.33) | 0.68 (0.55-0.86) | 0.89 (0.59-1.35) |

Differences from the Original Paper

While the ‘Statistical Analyses’ section of Flegal et al. (2016) details their models, it does not explicitly state the method used to handle missing data for covariates. Our replication employs a complete-case analysis, which excludes participants with missing smoking or education data from the models. This difference is the most likely reason for the minor discrepancies between our results and those published in the original paper.
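
To quantify this difference, a quick optional tabulation (using the variables already in the analytic design) shows how many participants the complete-case restriction removes from each gender-specific model:

# --- Participants with complete smoking and education data, by gender ---
with(analytic_design$variables,
     table(gender, complete = !is.na(smoking) & !is.na(education)))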