NHANES: Reliability Standards

Introduction

This tutorial reproduces the key tables from the Flegal et al. (2016) article. The analysis uses the same NHANES data and aims to replicate the unweighted sample size counts from Table 1 and the weighted logistic regression models from Table 3. We incorporate NCHS/CDC reliability standards to ensure estimates are statistically defensible.

Setup and Data Preparation

This first section prepares the data for analysis. The key steps are:

  • Load Data: The Flegal2016.RData file containing the necessary NHANES data is loaded.
  • Define Analytic Sample: The sample is restricted to non-pregnant adults aged 20 years or older with a measured BMI (N = 5,455), matching the total number of participants reported in the paper for the 2013–2014 cycle.
  • Recode Variables: All variables needed for the analysis (e.g., Age, race, obese, smoking, education) are created and categorized to match the definitions used by Flegal et al.
  • Create Survey Design Object: A svydesign object is created for the full dataset to account for the survey design (weights, strata, clusters); the analytic design (analytic_design) is then obtained by subsetting it to the analytic sample.
# --- Load Libraries ---
library(survey)
library(dplyr)
library(knitr)
library(car)
library(tidyr)
library(Publish)
library(stringr)
library(kableExtra) 
# install.packages("devtools")
# devtools::install_github("ehsanx/svyTable1", 
#                          build_vignettes = TRUE)
library(svyTable1)

# --- Load Data ---
load("Data/surveydata/Flegal2016.RData")

# --- Create Analytic Sample (N=5,455) ---
dat.analytic <- subset(dat.full, RIDAGEYR >= 20)
dat.analytic <- subset(dat.analytic, !is.na(BMXBMI))
dat.analytic <- subset(dat.analytic, is.na(RIDEXPRG) | RIDEXPRG != "Yes, positive lab pregnancy test")

# --- Create ALL Analysis Variables in the Full Dataset ---
dat.full <- dat.full %>%
  mutate(
    Age = cut(RIDAGEYR,
              breaks = c(20, 40, 60, Inf),
              right = FALSE,
              labels = c("20-39", "40-59", ">=60")),
    
    race = case_when(
      RIDRETH3 == "Non-Hispanic White" ~ "Non-Hispanic white",
      RIDRETH3 == "Non-Hispanic Black" ~ "Non-Hispanic black",
      RIDRETH3 == "Non-Hispanic Asian" ~ "Non-Hispanic Asian",
      RIDRETH3 %in% c("Mexican American", "Other Hispanic") ~ "Hispanic",
      RIDRETH3 == "Other Race - Including Multi-Rac" ~ "Other",
      TRUE ~ NA_character_
    ),
    race = factor(race, levels = c("Non-Hispanic white", "Non-Hispanic black", "Non-Hispanic Asian", "Hispanic", "Other")),
    
    gender = factor(RIAGENDR, labels = c("Male", "Female")),
    
    obese = factor(ifelse(BMXBMI >= 30, "Yes", "No"), levels = c("No", "Yes")),
    obese_class3 = factor(ifelse(BMXBMI >= 40, "Yes", "No"), levels = c("No", "Yes")),
    
    smoking = case_when(
      SMQ020 == "No" ~ "Never smoker",
      SMQ020 == "Yes" & SMQ040 == "Not at all" ~ "Former smoker",
      SMQ020 == "Yes" ~ "Current smoker",
      TRUE ~ NA_character_
    ),
    smoking = factor(smoking, levels = c("Never smoker", "Former smoker", "Current smoker")),

    education = case_when(
        DMDEDUC2 %in% c("Less than 9th grade", "9-11th grade (Includes 12th grad)") ~ "<High school",
        DMDEDUC2 == "High school graduate/GED or equi" ~ "High school",
        DMDEDUC2 %in% c("Some college or AA degree", "College graduate or above") ~ ">High school",
        TRUE ~ NA_character_
    ),
    education = factor(education, levels = c("High school", "<High school", ">High school"))
  )

# --- Create Survey Design Object ---
options(survey.lonely.psu = "adjust")
svy.design.full <- svydesign(id = ~SDMVPSU, 
                             strata = ~SDMVSTRA, 
                             weights = ~WTINT2YR, 
                             nest = TRUE, 
                             data = dat.full)
analytic_design <- subset(svy.design.full, SEQN %in% dat.analytic$SEQN)
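
As an optional sanity check (not part of the original code), you can confirm that the analytic subset matches the N = 5,455 reported in the paper:

# --- Optional sanity check: both should report 5,455 participants ---
nrow(dat.analytic)
nrow(analytic_design)   # survey design objects have dim()/nrow() methods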

Reproducing Table 1

This section reproduces the unweighted sample sizes shown in Flegal et al.’s Table 1. The code first generates separate summary tables for all participants, men, and women. It then performs several formatting steps to combine these into a single table.

What svytable1 Does

The svytable1 function creates a descriptive summary table—commonly referred to as a “Table 1”—from complex survey data. It is specifically designed to produce publication-ready results that align with NCHS Data Presentation Standards for reliability.

Key svytable1 Operations

When you call svytable1, it performs the following steps for each analysis (for example, for all participants, men, and women):

  1. Calculates Proportions
    It summarizes categorical variables (like Age) by calculating the proportion of participants in each category (e.g., 20–39, 40–59, ≥60).

  2. Stratifies by a Grouping Variable
    Results are calculated separately by levels of the strata_var (for example, race), creating side-by-side columns.

  3. Displays Mixed Mode Results
    Because mode = "mixed", each cell shows both:

    • The unweighted sample count (N)
    • The weighted percentage (%), which represents the population estimate.
  4. Performs Reliability Checks
    When reliability_checks = TRUE, the function evaluates each estimate against NCHS Data Presentation Standards. These checks prevent publication of unstable or statistically unreliable estimates.

What the Asterisk (*) Means

An asterisk (*) in the table output indicates suppression: the estimate was determined to be statistically unreliable, and the function hides the value to avoid misinterpretation.

NCHS reliability rules (for proportions)

Estimates of proportions are suppressed if they fail one or more of the following NCHS reliability rules:

  • fail_n_30: The unweighted sample size (n) is fewer than 30 participants.
  • fail_eff_n_30: The effective sample size (adjusted for design effects) is less than 30.
  • fail_df_8: The design degrees of freedom (df) are fewer than 8.
  • fail_ciw_30: The absolute confidence interval width is ≥ 30 percentage points.
  • fail_rciw_130: The relative CI width is greater than 130%.

Each of these flags indicates limited precision or instability in the estimate.

In the output, the asterisks appear in the “Other” race column for certain age groups (such as “40–59” and “≥60”).
This happens because the number of participants in those cells is very small, producing unstable or wide confidence intervals. Thus, the function correctly replaces the unreliable estimates with *, ensuring your published results remain statistically defensible and transparent.
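
To make the checks concrete, here is a minimal hand-rolled sketch (independent of svytable1) of how one such cell, the ">=60" age group within the "Other" race column, could be evaluated against several of these rules using the survey package. The object and variable names are those created above; the effective-sample-size rule needs the design effect and is omitted for brevity.

# --- Hand-check one cell against the NCHS proportion rules (illustrative only) ---
other_design <- subset(analytic_design, race == "Other")
n_unw  <- sum(other_design$variables$Age == ">=60", na.rm = TRUE)    # unweighted n (rule: >= 30)
df_des <- degf(other_design)                                         # design df (rule: >= 8)
ci   <- svyciprop(~ I(Age == ">=60"), other_design, method = "beta") # Korn-Graubard CI
p    <- as.numeric(ci)
ciw  <- diff(as.vector(attr(ci, "ci")))                              # absolute CI width (rule: < 0.30)
rciw <- 100 * ciw / p                                                # relative CI width (rule: <= 130%)
round(c(n = n_unw, design_df = df_des, ci_width = ciw, rel_ci_width = rciw), 2)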

Reliability Metrics Table

In addition to the detailed checks for proportions, the svytable1 function also assesses the reliability of means for numeric variables. For these estimates, it applies the standard NCHS recommendation, which uses the Relative Standard Error (RSE). If a mean’s RSE is 30% or greater, it is considered statistically unreliable and will be suppressed with an asterisk (*) in the formatted table.
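
As a small illustration (using BMXBMI, the BMI variable already in the design), the same rule can be computed directly with the survey package:

# --- RSE check for a mean, done by hand: flag if 100 * SE / mean >= 30 ---
m   <- svymean(~BMXBMI, analytic_design, na.rm = TRUE)
rse <- 100 * as.numeric(SE(m)) / as.numeric(coef(m))
c(mean = round(as.numeric(coef(m)), 2),
  RSE_percent = round(rse, 2),
  fail_rse_30 = rse >= 30)   # logical shown as 0/1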

If you set return_metrics = TRUE, the output also includes a $reliability_metrics table with a row for each mean, reporting the calculated RSE and the outcome of this check in the fail_rse_30 column.

# View reliability_metrics
table1_svy <- svytable1(
  design = analytic_design, strata_var = "race", 
  table_vars = "Age", mode = "mixed",
  reliability_checks = TRUE,
  return_metrics = TRUE
)
table1_svy$reliability_metrics[table1_svy$reliability_metrics$suppressed == TRUE, ]

Summary tables for each group

# --- Create the summary tables for each group ---
table1_svy_all <- svytable1(
  design = analytic_design, strata_var = "race", 
  table_vars = "Age", mode = "mixed",
  reliability_checks = TRUE
) %>%
  mutate(Variable = dplyr::recode(Variable, "Age" = "Age Groups"))
kable(table1_svy_all)

| Variable | Level | Overall | Non-Hispanic white | Non-Hispanic black | Non-Hispanic Asian | Hispanic | Other |
|---|---|---|---|---|---|---|---|
| n | | 5,455 | 2,343 | 1,115 | 623 | 1,214 | 160 |
| Age Groups | | | | | | | |
| | 20-39 | 1,810 (35.5%) | 734 (30.8%) | 362 (38.9%) | 216 (40.2%) | 412 (49.6%) | 86 (49.2%) |
| | 40-59 | 1,896 (37.5%) | 759 (37.5%) | 383 (39.3%) | 251 (38.5%) | 449 (35.6%) | * |
| | >=60 | 1,749 (27.0%) | 850 (31.7%) | 370 (21.8%) | 156 (21.2%) | 353 (14.8%) | * |

male_design <- subset(analytic_design, gender == "Male")
table1_svy_men <- svytable1(
  design = male_design, strata_var = "race", 
  table_vars = "Age", mode = "mixed",
  reliability_checks = TRUE
) %>%
  mutate(Variable = dplyr::recode(Variable, "Age" = "Age group"))
kable(table1_svy_men)

| Variable | Level | Overall | Non-Hispanic white | Non-Hispanic black | Non-Hispanic Asian | Hispanic | Other |
|---|---|---|---|---|---|---|---|
| n | | 2,638 | 1,130 | 556 | 300 | 573 | 79 |
| Age group | | | | | | | |
| | 20-39 | 909 (37.2%) | 386 (32.3%) | 182 (41.3%) | 106 (41.7%) | 189 (51.6%) | 46 (52.9%) |
| | 40-59 | 897 (37.6%) | 360 (37.9%) | 179 (39.0%) | 120 (38.5%) | 215 (35.1%) | * |
| | >=60 | 832 (25.1%) | 384 (29.8%) | 195 (19.7%) | 74 (19.8%) | 169 (13.2%) | * |

female_design <- subset(analytic_design, gender == "Female")
table1_svy_women <- svytable1(
  design = female_design, strata_var = "race", 
  table_vars = "Age", mode = "mixed",
  reliability_checks = TRUE
) %>%
  mutate(Variable = dplyr::recode(Variable, "Age" = "Age group"))
kable(table1_svy_women)

| Variable | Level | Overall | Non-Hispanic white | Non-Hispanic black | Non-Hispanic Asian | Hispanic | Other |
|---|---|---|---|---|---|---|---|
| n | | 2,817 | 1,213 | 559 | 323 | 641 | 81 |
| Age group | | | | | | | |
| | 20-39 | 901 (33.8%) | 348 (29.4%) | 180 (36.9%) | 110 (39.0%) | 223 (47.6%) | * |
| | 40-59 | 999 (37.4%) | 399 (37.1%) | 204 (39.5%) | 131 (38.6%) | 234 (36.0%) | * |
| | >=60 | 917 (28.8%) | 466 (33.5%) | 175 (23.6%) | 82 (22.5%) | 184 (16.4%) | * |

Format and combine the tables

# --- Format and combine the tables ---
table_all_formatted <- table1_svy_all %>%
  select(
    `Age Groups,y` = Level, `All Groups` = Overall,
    `White` = `Non-Hispanic white`, `Black` = `Non-Hispanic black`,
    `Asian` = `Non-Hispanic Asian`, `Hispanic`
  ) %>%
  slice(3:5) 

table_men_formatted <- table1_svy_men %>%
  select(
    `Age Groups,y` = Level, `All Groups` = Overall,
    `White` = `Non-Hispanic white`, `Black` = `Non-Hispanic black`,
    `Asian` = `Non-Hispanic Asian`, `Hispanic`
  ) %>%
  slice(3:5)

table_women_formatted <- table1_svy_women %>%
  select(
    `Age Groups,y` = Level, `All Groups` = Overall,
    `White` = `Non-Hispanic white`, `Black` = `Non-Hispanic black`,
    `Asian` = `Non-Hispanic Asian`, `Hispanic`
  ) %>%
  slice(3:5) 

final_table_data <- bind_rows(
  table_all_formatted,
  table_men_formatted,
  table_women_formatted
)

# --- Render the final, publication-quality table ---
final_table_data %>%
  kable(caption = "Characteristics by Group and Race/Hispanic Origin", align = "lrrrrr") %>%
  kable_styling(bootstrap_options = c("striped", "bordered"), full_width = FALSE) %>%
  add_header_above(c(" " = 1, "All Groups" = 1, "Race/Ethnicity" = 4)) %>%
  kableExtra::group_rows(index = c("All participants" = 3,
                                   "Men" = 3,
                                   "Women" = 3))
Characteristics by Group and Race/Hispanic Origin (the White, Black, Asian, and Hispanic columns are the race/Hispanic-origin groups)

| Age Groups, y | All Groups | White | Black | Asian | Hispanic |
|---|---|---|---|---|---|
| All participants | | | | | |
| 20-39 | 1,810 (35.5%) | 734 (30.8%) | 362 (38.9%) | 216 (40.2%) | 412 (49.6%) |
| 40-59 | 1,896 (37.5%) | 759 (37.5%) | 383 (39.3%) | 251 (38.5%) | 449 (35.6%) |
| >=60 | 1,749 (27.0%) | 850 (31.7%) | 370 (21.8%) | 156 (21.2%) | 353 (14.8%) |
| Men | | | | | |
| 20-39 | 909 (37.2%) | 386 (32.3%) | 182 (41.3%) | 106 (41.7%) | 189 (51.6%) |
| 40-59 | 897 (37.6%) | 360 (37.9%) | 179 (39.0%) | 120 (38.5%) | 215 (35.1%) |
| >=60 | 832 (25.1%) | 384 (29.8%) | 195 (19.7%) | 74 (19.8%) | 169 (13.2%) |
| Women | | | | | |
| 20-39 | 901 (33.8%) | 348 (29.4%) | 180 (36.9%) | 110 (39.0%) | 223 (47.6%) |
| 40-59 | 999 (37.4%) | 399 (37.1%) | 204 (39.5%) | 131 (38.6%) | 234 (36.0%) |
| >=60 | 917 (28.8%) | 466 (33.5%) | 175 (23.6%) | 82 (22.5%) | 184 (16.4%) |

Reproducing Table 3

This section reproduces the weighted logistic regression models from Flegal et al.’s Table 3. Four separate models are fit: one for obesity (BMI ≥ 30) and one for class 3 obesity (BMI ≥ 40), each stratified by gender. We extend the replication by examining the reliability of the regression coefficients. Because NCHS/CDC do not publish a reliability standard for regression coefficients, we report the relative standard error (RSE) alongside standard errors, p-values, and confidence intervals, and use these diagnostics to show why a simple RSE ≥ 30% rule can be misleading in this setting.

# --- Helper function to format the output of publish() ---
format_publish_output <- function(publish_object, new_col_name) {
  
  # Extract the clean regression table from the publish object
  df <- publish_object$regressionTable
  
  # Combine OR and CI into one string, handling reference groups
  df$result <- ifelse(
    df$OddsRatio == "Ref", 
    "1 [Reference]",
    # Combine the OR and CI, removing the brackets from the CI string and replacing the semicolon
    paste0(df$OddsRatio, " (", str_replace(str_remove_all(df$CI.95, "\\[|\\]"), ";", "-"), ")")
  )
  
  # Return a clean tibble with just the variable level and the formatted result
  tibble(term = df$Units, !!new_col_name := df$result)
}

# --- Create complete-case survey designs for each gender ---
male_design_complete <- subset(analytic_design, gender == "Male" & !is.na(smoking) & !is.na(education))
female_design_complete <- subset(analytic_design, gender == "Female" & !is.na(smoking) & !is.na(education))

Fit the Models

# --- Fit all four models ---
fit_men_obese <- svyglm(I(obese == "Yes") ~ Age + race + smoking + education, design = male_design_complete, family = binomial())
# print(summary(fit_men_obese))
fit_men_obese3 <- svyglm(I(obese_class3 == "Yes") ~ Age + race + smoking + education, design = male_design_complete, family = binomial())
fit_women_obese <- svyglm(I(obese == "Yes") ~ Age + race + smoking + education, design = female_design_complete, family = binomial())
fit_women_obese3 <- svyglm(I(obese_class3 == "Yes") ~ Age + race + smoking + education, design = female_design_complete, family = binomial())
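
Before any table formatting, it can be helpful to glance at one model's odds ratios and 95% confidence intervals directly. This optional check (not part of the original workflow) uses base extractors on the fitted svyglm object:

# --- Optional check: odds ratios and 95% CIs for the men / obesity (all grades) model ---
round(exp(cbind(OR = coef(fit_men_obese), confint(fit_men_obese))), 2)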

Reliability check for regression coefficients

The National Center for Health Statistics (NCHS) and the Centers for Disease Control and Prevention (CDC) do not recommend using the relative standard error (RSE) as a primary reliability check for regression coefficients. Their guidelines focus on other types of estimates.

  • For means and totals, the RSE is a commonly used metric.
  • For proportions, while the RSE was used in the past, the NCHS has transitioned to a more sophisticated, multi-step approach. This newer method considers a minimum number of events or a minimum denominator sample size, and the width of the confidence interval around the proportion. This change was made because the RSE can be misleading for proportions, especially those close to 0% or 100%.
  • There are no well-established NCHS or CDC guidelines that advocate using the RSE to judge the reliability of regression coefficients. The statistical properties of regression coefficients are more complex than those of means or proportions: a coefficient’s reliability depends on the sample size, the variance of the independent variable, and the overall fit of the model, among other factors. A simple RSE threshold is therefore not an adequate measure of reliability for regression coefficients.

Instead, we assess reliability by examining several key metrics that, taken together, give a complete picture of a coefficient’s stability and precision. The main tool for this is the svyglmdiag() function in the svyTable1 package, which we use to look at:
  1. The Standard Error (SE): A direct measure of the coefficient’s precision. A smaller SE relative to its coefficient suggests a more reliable estimate.

  2. The p-value: Tells you if the coefficient is statistically distinguishable from zero. A non-significant p-value (e.g., p > 0.05) means we cannot be confident the predictor has any association with the outcome.

  3. The Confidence Interval (CI): Provides a plausible range for the true value of the coefficient. A very wide CI indicates a high degree of uncertainty and, therefore, low reliability. For logistic regression, if the CI for the odds ratio contains 1.0, the result is not statistically significant.

We will also calculate the RSE to demonstrate why it can be misleading. Finally, we’ll run a quick check for multicollinearity using the Variance Inflation Factor (VIF), as this is a common cause of unstable (unreliable) coefficients.

# Let's examine one model in detail: fit_men_obese
# The same process applies to the other three models.
# Get the standard model summary and confidence intervals
diagnostics_table <- svyglmdiag(fit_men_obese)
knitr::kable(
  diagnostics_table,
  caption = "Table 3: Reliability Diagnostics for Obesity Model Coefficients",
  digits = 3
)
Table 3: Reliability Diagnostics for Obesity Model Coefficients

| Term | Estimate | SE | p.value | is_significant | CI_Lower | CI_Upper | CI_Width | RSE_percent | is_rse_high |
|---|---|---|---|---|---|---|---|---|---|
| (Intercept) | -0.745 | 0.167 | 0.007 | TRUE | -1.175 | -0.315 | 0.860 | 22.451 | FALSE |
| Age40-59 | 0.204 | 0.174 | 0.294 | FALSE | -0.243 | 0.651 | 0.895 | 85.311 | TRUE |
| Age>=60 | 0.159 | 0.218 | 0.498 | FALSE | -0.401 | 0.719 | 1.119 | 136.779 | TRUE |
| raceNon-Hispanic black | 0.279 | 0.136 | 0.096 | FALSE | -0.071 | 0.630 | 0.701 | 48.796 | TRUE |
| raceNon-Hispanic Asian | -1.277 | 0.164 | 0.001 | TRUE | -1.697 | -0.856 | 0.841 | 12.811 | FALSE |
| raceHispanic | 0.264 | 0.190 | 0.224 | FALSE | -0.225 | 0.752 | 0.977 | 72.060 | TRUE |
| raceOther | 0.158 | 0.322 | 0.644 | FALSE | -0.670 | 0.987 | 1.657 | 203.734 | TRUE |
| smokingFormer smoker | 0.207 | 0.140 | 0.198 | FALSE | -0.152 | 0.567 | 0.719 | 67.436 | TRUE |
| smokingCurrent smoker | -0.250 | 0.149 | 0.153 | FALSE | -0.632 | 0.132 | 0.764 | 59.346 | TRUE |
| education<High school | -0.528 | 0.154 | 0.019 | TRUE | -0.924 | -0.132 | 0.792 | 29.164 | FALSE |
| education>High school | -0.013 | 0.147 | 0.932 | FALSE | -0.390 | 0.364 | 0.754 | 1109.146 | TRUE |

Generally, the model shows limited reliability and predictive power. Most of the predictor variables, such as Age and smoking status, are not statistically significant (their p.value is high). This indicates that, for men in this dataset, these factors don’t have a clear, reliable association with obesity.

The few significant predictors are raceNon-Hispanic Asian and education<High school; these coefficients are comparatively stable and precise. The unreliability of the other terms is not caused by correlation among the predictors, as the multicollinearity check below shows.

The RSE Can Be Misleading for Regression: Notice that some statistically non-significant coefficients (like Age40-59 and raceHispanic) have high RSEs, which is expected. However, consider education>High school. Its p-value of 0.932 already tells you that the coefficient is not reliably different from zero, yet its RSE is also flagged as unreliable. The RSE is calculated as 100 × |SE / estimate|, here roughly 100 × 0.147 / 0.013 ≈ 1109% (using the unrounded values). The extremely high RSE is not the result of a large standard error but of the coefficient estimate being very close to zero. The inflated RSE adds no information beyond the p-value; it simply reflects that the coefficient itself is minuscule. This is a good example of why RSE is not a primary tool for regression coefficients: it can be inflated by estimates close to zero, regardless of their precision.
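
To see where those RSE values come from, here is a minimal sketch that computes them directly from the model summary (the column names are those returned by summary() for svyglm fits):

# --- RSE by hand: 100 * |SE / estimate|; near-zero estimates blow this up ---
coefs <- summary(fit_men_obese)$coefficients
rse_percent <- 100 * abs(coefs[, "Std. Error"] / coefs[, "Estimate"])
round(sort(rse_percent, decreasing = TRUE), 1)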

Check for Multicollinearity

Multicollinearity occurs when predictor variables in a model are highly correlated with each other. This can inflate the standard errors and make your coefficient estimates unstable. The VIF is used to detect this issue.

GVIF^(1/(2*Df)) is a scaled version of the Generalized Variance Inflation Factor (GVIF) used to assess multicollinearity in regression models with categorical predictors. It adjusts for the number of dummy variables created for a categorical variable, making its value directly comparable to the traditional VIF used for continuous predictors within the same model.

Why GVIF^(1/(2×Df)) is Necessary for Categorical Variables

A categorical variable with k levels (e.g., race) is typically represented in a regression model by k − 1 dummy variables. Dummy variables are inherently correlated because they all describe the same categorical feature. This intrinsic relationship would lead to very high, but misleading, GVIF values if the overall GVIF were interpreted directly.

The adjustment GVIF^(1/(2×Df)) standardizes the GVIF value. It reduces the measure from a hypervolume of confidence for multiple coefficients down to a linear measure, making it comparable to the single VIF value used for continuous predictors. Here, Df is the degrees of freedom for the categorical term, which equals k − 1, the number of dummy variables.
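
As a worked example of the adjustment, take the race term from the vif() output shown later in this section: GVIF = 16.392 with Df = 4 dummy-coded contrasts.

# --- GVIF^(1/(2*Df)) for the race term ---
16.392^(1 / (2 * 4))   # approximately 1.418, matching the GVIF^(1/(2*Df)) column below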

Acceptable Ranges and Interpretation

The interpretation of GVIF^(1/(2×Df)) follows the same guidelines as the standard VIF:

| GVIF^(1/(2×Df)) range | Interpretation |
|---|---|
| 1 | No correlation among predictors. |
| 1 – 2.5 | Low to moderate correlation; generally acceptable (typical for most well-specified models). |
| 2.5 – 5 | Moderate to high correlation; may warrant further investigation. |
| > 5 | Potentially severe multicollinearity: the predictor may overlap strongly with others, obscuring its effect on the outcome. A more conservative cutoff of 4 is sometimes used. |

vif_values <- vif(fit_men_obese)
print(vif_values)
#>                GVIF Df GVIF^(1/(2*Df))
#> Age        8.137621  2        1.688979
#> race      16.391999  4        1.418499
#> smoking    3.435829  2        1.361469
#> education  6.381028  2        1.589361

The key values in the GVIF^(1/(2*Df)) column are all low (below 2.5). This confirms that the predictor variables are independent enough from one another and are not artificially inflating each other’s standard errors. The lack of precision in the model comes from other sources, not from multicollinearity.

Formatting the Table

# --- Use the helper function to format results from each model ---
men_obese_res <- format_publish_output(publish(fit_men_obese),
                                       "Men, Obese, All Grades")
#>   Variable              Units OddsRatio       CI.95    p-value 
#>        Age              20-39       Ref                        
#>                         40-59      1.23 [0.87;1.72]   0.293928 
#>                          >=60      1.17 [0.77;1.80]   0.497521 
#>       race Non-Hispanic white       Ref                        
#>            Non-Hispanic black      1.32 [1.01;1.73]   0.095724 
#>            Non-Hispanic Asian      0.28 [0.20;0.38]   0.000553 
#>                      Hispanic      1.30 [0.90;1.89]   0.223882 
#>                         Other      1.17 [0.62;2.20]   0.644324 
#>    smoking       Never smoker       Ref                        
#>                 Former smoker      1.23 [0.94;1.62]   0.198208 
#>                Current smoker      0.78 [0.58;1.04]   0.152799 
#>  education        High school       Ref                        
#>                  <High school      0.59 [0.44;0.80]   0.018658 
#>                  >High school      0.99 [0.74;1.32]   0.931661
men_obese3_res <- format_publish_output(publish(fit_men_obese3),
                                        "Men, Class 3 Obesity")
#>   Variable              Units OddsRatio       CI.95   p-value 
#>        Age              20-39       Ref                       
#>                         40-59      0.85 [0.54;1.35]   0.52065 
#>                          >=60      0.88 [0.33;2.31]   0.80102 
#>       race Non-Hispanic white       Ref                       
#>            Non-Hispanic black      1.52 [0.85;2.72]   0.21824 
#>            Non-Hispanic Asian      0.08 [0.01;0.53]   0.04797 
#>                      Hispanic      1.32 [0.57;3.06]   0.54578 
#>                         Other      0.72 [0.28;1.82]   0.51598 
#>    smoking       Never smoker       Ref                       
#>                 Former smoker      1.08 [0.63;1.85]   0.78547 
#>                Current smoker      0.92 [0.47;1.79]   0.82227 
#>  education        High school       Ref                       
#>                  <High school      0.52 [0.27;1.03]   0.11800 
#>                  >High school      0.93 [0.65;1.33]   0.70887
women_obese_res <- format_publish_output(publish(fit_women_obese), 
                                         "Women, Obese, All Grades")
#>   Variable              Units OddsRatio       CI.95     p-value 
#>        Age              20-39       Ref                         
#>                         40-59      1.39 [1.08;1.78]   0.0486143 
#>                          >=60      1.08 [0.86;1.36]   0.5252155 
#>       race Non-Hispanic white       Ref                         
#>            Non-Hispanic black      2.17 [1.88;2.50]   0.0001332 
#>            Non-Hispanic Asian      0.23 [0.16;0.35]   0.0008139 
#>                      Hispanic      1.30 [0.92;1.84]   0.2016538 
#>                         Other      0.97 [0.57;1.65]   0.9169417 
#>    smoking       Never smoker       Ref                         
#>                 Former smoker      1.33 [1.00;1.76]   0.1067193 
#>                Current smoker      0.94 [0.59;1.49]   0.7987621 
#>  education        High school       Ref                         
#>                  <High school      0.96 [0.61;1.51]   0.8623856 
#>                  >High school      0.68 [0.55;0.86]   0.0215212
women_obese3_res <- format_publish_output(publish(fit_women_obese3), 
                                          "Women, Class 3 Obesity")
#>   Variable              Units OddsRatio       CI.95   p-value 
#>        Age              20-39       Ref                       
#>                         40-59      1.23 [0.80;1.91]   0.38944 
#>                          >=60      0.57 [0.39;0.85]   0.03880 
#>       race Non-Hispanic white       Ref                       
#>            Non-Hispanic black      1.92 [1.39;2.64]   0.01054 
#>            Non-Hispanic Asian      0.02 [0.00;0.16]   0.01235 
#>                      Hispanic      1.01 [0.56;1.82]   0.98433 
#>                         Other      1.65 [0.89;3.05]   0.17137 
#>    smoking       Never smoker       Ref                       
#>                 Former smoker      1.57 [1.05;2.35]   0.07838 
#>                Current smoker      1.02 [0.78;1.33]   0.88571 
#>  education        High school       Ref                       
#>                  <High school      1.25 [0.62;2.50]   0.55965 
#>                  >High school      0.89 [0.59;1.35]   0.61174

# --- Create a template for the final table structure ---
final_table_structure <- tibble(
  Group = c(
    "Age, y", "", "",
    "Race", "", "", "", "",
    "Smoking", "", "",
    "Education", "", ""
  ),
  term = c(
    "20-39", "40-59", ">=60",
    "Non-Hispanic white", "Non-Hispanic black", "Non-Hispanic Asian", "Hispanic", "Other",
    "Never smoker", "Former smoker", "Current smoker",
    "High school", "<High school", ">High school"
  )
)

# --- Join all formatted results onto the template ---
final_table <- final_table_structure %>%
  left_join(men_obese_res, by = "term") %>%
  left_join(men_obese3_res, by = "term") %>%
  left_join(women_obese_res, by = "term") %>%
  left_join(women_obese3_res, by = "term")

# --- Display the final, publication-quality table ---
kable(final_table, caption = "Weighted Logistic Regression Models for Obesity", 
      col.names = c("", "", 
                    "Men, Obese, All Grades", 
                    "Men, Class 3 Obesity", 
                    "Women, Obese, All Grades", 
                    "Women, Class 3 Obesity"))
Weighted Logistic Regression Models for Obesity

| Characteristic | Level | Men, Obese, All Grades | Men, Class 3 Obesity | Women, Obese, All Grades | Women, Class 3 Obesity |
|---|---|---|---|---|---|
| Age, y | 20-39 | 1 [Reference] | 1 [Reference] | 1 [Reference] | 1 [Reference] |
| | 40-59 | 1.23 (0.87-1.72) | 0.85 (0.54-1.35) | 1.39 (1.08-1.78) | 1.23 (0.80-1.91) |
| | >=60 | 1.17 (0.77-1.80) | 0.88 (0.33-2.31) | 1.08 (0.86-1.36) | 0.57 (0.39-0.85) |
| Race | Non-Hispanic white | 1 [Reference] | 1 [Reference] | 1 [Reference] | 1 [Reference] |
| | Non-Hispanic black | 1.32 (1.01-1.73) | 1.52 (0.85-2.72) | 2.17 (1.88-2.50) | 1.92 (1.39-2.64) |
| | Non-Hispanic Asian | 0.28 (0.20-0.38) | 0.08 (0.01-0.53) | 0.23 (0.16-0.35) | 0.02 (0.00-0.16) |
| | Hispanic | 1.30 (0.90-1.89) | 1.32 (0.57-3.06) | 1.30 (0.92-1.84) | 1.01 (0.56-1.82) |
| | Other | 1.17 (0.62-2.20) | 0.72 (0.28-1.82) | 0.97 (0.57-1.65) | 1.65 (0.89-3.05) |
| Smoking | Never smoker | 1 [Reference] | 1 [Reference] | 1 [Reference] | 1 [Reference] |
| | Former smoker | 1.23 (0.94-1.62) | 1.08 (0.63-1.85) | 1.33 (1.00-1.76) | 1.57 (1.05-2.35) |
| | Current smoker | 0.78 (0.58-1.04) | 0.92 (0.47-1.79) | 0.94 (0.59-1.49) | 1.02 (0.78-1.33) |
| Education | High school | 1 [Reference] | 1 [Reference] | 1 [Reference] | 1 [Reference] |
| | <High school | 0.59 (0.44-0.80) | 0.52 (0.27-1.03) | 0.96 (0.61-1.51) | 1.25 (0.62-2.50) |
| | >High school | 0.99 (0.74-1.32) | 0.93 (0.65-1.33) | 0.68 (0.55-0.86) | 0.89 (0.59-1.35) |

Differences from the Original Paper

While the ‘Statistical Analyses’ section of Flegal et al. (2016) details their models, it does not explicitly state the method used to handle missing data for covariates. Our replication employs a complete-case analysis, which excludes participants with missing smoking or education data from the models. This difference is the most likely reason for the minor discrepancies between our results and those published in the original paper.
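
To quantify this difference, a quick optional tabulation (using the variables already in the analytic design) shows how many participants the complete-case restriction removes from each gender-specific model:

# --- Participants with complete smoking and education data, by gender ---
with(analytic_design$variables,
     table(gender, complete = !is.na(smoking) & !is.na(education)))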