The svyTable1 package provides a suite of functions for creating publication-ready tables from complex analytical results. This vignette focuses on its core capability: generating descriptive “Table 1” summaries using the svytable1() function.

Describing Complex Survey Data

The svytable1() function defaults to mode = "mixed". This approach is considered a best practice in high-quality survey research because it promotes transparency.

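When you present a table in mixed mode, each level of a categorical variable shows two types of information:

  • The unweighted count (n), which is the number of respondents actually observed.

  • The weighted percentage, which estimates the share of the target population.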

This structure prevents common reporting problems such as showing weighted counts (which do not represent real sample sizes) or hiding sparse categories that contribute instability.
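
To make the distinction concrete, here is a minimal base R sketch on made-up data (the groups and weights are hypothetical, not taken from NHANES, and this is not svytable1()'s internal code). It shows how the unweighted n and the weighted percentage can differ when one group is oversampled.

```r
# Toy sample: group "B" is oversampled, so each "B" respondent
# carries a smaller weight than each "A" respondent (hypothetical values).
toy <- data.frame(
  group  = c("A", "A", "B", "B", "B", "B"),
  weight = c(300, 300, 100, 100, 100, 100)
)

# Unweighted n: how many respondents were actually observed
n_unweighted <- table(toy$group)

# Weighted percentage: each group's share of the summed weights
w_total      <- tapply(toy$weight, toy$group, sum)
pct_weighted <- 100 * as.numeric(w_total) / sum(w_total)

# Mixed-mode style cell: "n (weighted %)"
mixed <- sprintf("%d (%.1f%%)", as.integer(n_unweighted), pct_weighted)
names(mixed) <- names(n_unweighted)
mixed
# A: "2 (60.0%)", B: "4 (40.0%)"
```

Group B supplies two thirds of the respondents but only 40% of the weighted population, which is why reporting both numbers side by side is informative.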

For continuous variables, the function reports weighted means, standard errors, and confidence intervals. It also offers optional reliability checks derived from National Center for Health Statistics (NCHS) guidance, including standard error and relative standard error thresholds.

Example: Using NHANES Data (2009–2012)

The examples below use the NHANESraw dataset from the NHANES package to show how to prepare a survey design and generate descriptive tables. Example A keeps variables that contain missing values; Example B restricts the data to complete cases.

1. Data Preparation

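# Packages used throughout this vignette
library(svyTable1)
library(survey)
library(NHANES)   # provides NHANESraw
library(dplyr)
library(tidyr)    # drop_na(), used in Example B

data(NHANESraw)

# Keep adults (20+) and derive obesity status from BMI.
# ifelse() returns NA when BMI is missing, so ObeseStatus keeps those NAs.
nhanes_adults_with_na <- NHANESraw %>%
  dplyr::filter(Age >= 20) %>%
  mutate(
    ObeseStatus = factor(
      ifelse(BMI >= 30, "Obese", "Not Obese"),
      levels = c("Not Obese", "Obese")
    )
  )

# Declare the complex design: PSUs nested within strata, MEC exam weights
adult_design_with_na <- svydesign(
  id = ~SDMVPSU,
  strata = ~SDMVSTRA,
  weights = ~WTMEC2YR,
  nest = TRUE,
  data = nhanes_adults_with_na
)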
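
One detail of the preparation step is worth checking: ifelse() propagates missing values, so respondents without a BMI measurement end up with a missing ObeseStatus rather than being classed as "Not Obese". A small base R sketch with hypothetical BMI values illustrates this:

```r
# Hypothetical BMI values, including one missing measurement
bmi <- c(22.5, 31.0, NA, 30.0)

status <- factor(
  ifelse(bmi >= 30, "Obese", "Not Obese"),
  levels = c("Not Obese", "Obese")
)

# useNA = "ifany" makes the missing group visible in the tally
table(status, useNA = "ifany")
```

The tally shows one "Not Obese", two "Obese", and one NA. Running the same table() call on nhanes_adults_with_na$ObeseStatus is a quick way to see how many adults fall into the missing group before building tables.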

2. Generating Tables

Example A: Handling Variables with Missing Data

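Variables in large health surveys often contain missing values. svytable1() identifies missingness automatically and reports it explicitly: categorical variables gain a “Missing” level, and numeric variables gain a “Missing, n (%)” row. The counts are unweighted Ns, so they reflect real denominators. In the output below, the weighted percentages within a categorical variable sum to about 100% with the “Missing” level included (for SmokeNow overall, 24.2% + 20.1% + 55.7%), so missing responses count toward the denominator. The “Missing” column holds the 547 adults whose ObeseStatus is NA because BMI was not recorded.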

vars_with_missing <- c(
  "Age", "Gender", "Race1", "Education", "HHIncome",
  "TotChol", "SleepHrsNight", "SmokeNow"
)

table_with_missing <- svytable1(
  design = adult_design_with_na,
  strata_var = "ObeseStatus",
  table_vars = vars_with_missing
)

knitr::kable(
  table_with_missing,
  caption = "Table 1: Summarizing Variables with Missing Data"
)

Table 1: Summarizing Variables with Missing Data

| Variable | Level | Overall | Missing | Not Obese | Obese |
|:---|:---|:---|:---|:---|:---|
| n | | 11,778 | 547 | 7,073 | 4,158 |
| Age | | | | | |
| | Mean (SD) | 47.18 (16.89) | 56.29 (19.15) | 46.45 (17.32) | 48.25 (15.87) |
| Gender | | | | | |
| | female | 6,032 (51.9%) | 275 (52.3%) | 3,443 (51.2%) | 2,314 (53.2%) |
| | male | 5,746 (48.1%) | 272 (47.7%) | 3,630 (48.8%) | 1,844 (46.8%) |
| Race1 | | | | | |
| | Black | 2,577 (11.4%) | 108 (12.1%) | 1,296 (9.1%) | 1,173 (15.8%) |
| | Hispanic | 1,210 (5.8%) | 62 (2.9%) | 714 (5.7%) | 434 (6.0%) |
| | Mexican | 1,680 (8.2%) | 75 (9.4%) | 920 (7.3%) | 685 (9.8%) |
| | White | 5,017 (67.2%) | 235 (69.6%) | 3,114 (68.6%) | 1,668 (64.4%) |
| | Other | 1,294 (7.4%) | 67 (6.0%) | 1,029 (9.3%) | 198 (4.0%) |
| Education | | | | | |
| | 8th Grade | 1,321 (6.1%) | 79 (9.8%) | 770 (5.9%) | 472 (6.3%) |
| | 9 - 11th Grade | 1,787 (11.8%) | 84 (16.8%) | 1,021 (11.3%) | 682 (12.5%) |
| | High School | 2,595 (21.5%) | 121 (21.1%) | 1,496 (20.2%) | 978 (23.8%) |
| | Some College | 3,399 (31.3%) | 144 (33.4%) | 1,968 (29.4%) | 1,287 (34.5%) |
| | College Grad | 2,656 (29.3%) | 116 (18.7%) | 1,805 (33.0%) | 735 (22.8%) |
| | Missing | 20 (0.1%) | 3 (0.2%) | 13 (0.1%) | 4 (0.0%) |
| HHIncome | | | | | |
| | 0-4999 | 270 (1.6%) | 16 (1.2%) | 159 (1.6%) | 95 (1.6%) |
| | 10000-14999 | 915 (5.5%) | 57 (11.3%) | 501 (5.0%) | 357 (6.1%) |
| | 15000-19999 | 831 (5.0%) | 46 (3.8%) | 484 (4.8%) | 301 (5.4%) |
| | 20000-24999 | 911 (5.8%) | 42 (9.5%) | 522 (5.5%) | 347 (6.4%) |
| | 25000-34999 | 1,352 (9.7%) | 64 (14.3%) | 750 (8.7%) | 538 (11.3%) |
| | 35000-44999 | 1,054 (8.9%) | 38 (10.1%) | 637 (8.6%) | 379 (9.4%) |
| | 45000-54999 | 830 (7.9%) | 22 (1.6%) | 502 (8.0%) | 306 (8.0%) |
| | 5000-9999 | 503 (2.6%) | 26 (6.6%) | 275 (2.3%) | 202 (3.0%) |
| | 55000-64999 | 612 (6.1%) | 20 (1.5%) | 380 (6.1%) | 212 (6.3%) |
| | 65000-74999 | 515 (5.7%) | 19 (3.0%) | 299 (5.3%) | 197 (6.6%) |
| | 75000-99999 | 993 (11.0%) | 25 (5.3%) | 607 (10.8%) | 361 (11.6%) |
| | more 99999 | 1,710 (22.2%) | 63 (19.4%) | 1,169 (25.1%) | 478 (16.9%) |
| | Missing | 1,282 (8.0%) | 109 (12.3%) | 788 (8.2%) | 385 (7.4%) |
| TotChol | | | | | |
| | Mean (SD) | 5.07 (1.07) | 5.00 (1.42) | 5.07 (1.08) | 5.06 (1.04) |
| | Missing, n (%) | 1,169 (5.6%) | 426 (15.5%) | 480 (5.5%) | 263 (5.4%) |
| SleepHrsNight | | | | | |
| | Mean (SD) | 6.89 (1.34) | 6.84 (1.88) | 6.95 (1.30) | 6.78 (1.39) |
| | Missing, n (%) | 26 (0.2%) | 6 (1.4%) | 10 (0.2%) | 10 (0.2%) |
| SmokeNow | | | | | |
| | No | 2,779 (24.2%) | 142 (29.1%) | 1,580 (23.2%) | 1,057 (26.0%) |
| | Yes | 2,454 (20.1%) | 102 (19.5%) | 1,594 (21.4%) | 758 (17.6%) |
| | Missing | 6,545 (55.7%) | 303 (51.4%) | 3,899 (55.4%) | 2,343 (56.4%) |
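
The “Missing” rows in the table above behave like an ordinary factor level with its own unweighted count and weighted share. The following base R sketch shows that idea on hypothetical answers and weights; it is an illustration only and not svytable1()'s internal code.

```r
# Hypothetical answers with two non-responses and made-up weights
smoke  <- factor(c("Yes", "No", NA, "Yes", NA), levels = c("No", "Yes"))
weight <- c(100, 200, 300, 100, 300)

# Promote NA to an explicit "Missing" level
smoke_explicit <- addNA(smoke)
levels(smoke_explicit)[is.na(levels(smoke_explicit))] <- "Missing"

# Unweighted counts and weighted percentages per level
n_by_level   <- table(smoke_explicit)
pct_by_level <- 100 * tapply(weight, smoke_explicit, sum) / sum(weight)

n_by_level
round(pct_by_level, 1)
# No: 20, Yes: 20, Missing: 60 (the three shares sum to 100)
```

Because “Missing” is treated as a level, its share counts toward the 100% total, which matches the pattern seen for SmokeNow and HHIncome above.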

Example B: Summarizing Complete Data

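If you prefer to restrict the analysis to respondents with complete data for a chosen set of variables, you can build a complete-case dataset and pass it to a new svydesign object. The resulting table has no “Missing” rows. BMI is included in the list of variables that must be complete even though it is not displayed in the table: dropping rows with missing BMI removes the missing values of ObeseStatus, so the “Missing” column disappears as well.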

vars_for_complete_table <- c(
  "Age", "Gender", "Race1", "BPSysAve", "Pulse", "BMI"
)

nhanes_adults_complete <- nhanes_adults_with_na %>%
  drop_na(all_of(vars_for_complete_table))

adult_design_complete <- svydesign(
  id = ~SDMVPSU,
  strata = ~SDMVSTRA,
  weights = ~WTMEC2YR,
  nest = TRUE,
  data = nhanes_adults_complete
)

table_without_missing <- svytable1(
  design = adult_design_complete,
  strata_var = "ObeseStatus",
  table_vars = c("Age", "Gender", "Race1", "BPSysAve", "Pulse")
)

knitr::kable(
  table_without_missing,
  caption = "Table 2: Summarizing Variables with No Missing Data"
)

Table 2: Summarizing Variables with No Missing Data

| Variable | Level | Overall | Not Obese | Obese |
|:---|:---|:---|:---|:---|
| n | | 10,736 | 6,756 | 3,980 |
| Age | | | | |
| | Mean (SD) | 47.21 (16.84) | 46.52 (17.30) | 48.47 (15.89) |
| Gender | | | | |
| | female | 5,455 (51.5%) | 3,261 (50.9%) | 2,194 (52.7%) |
| | male | 5,281 (48.5%) | 3,495 (49.1%) | 1,786 (47.3%) |
| Race1 | | | | |
| | Black | 2,342 (11.3%) | 1,230 (8.9%) | 1,112 (15.5%) |
| | Hispanic | 1,089 (5.7%) | 678 (5.6%) | 411 (5.9%) |
| | Mexican | 1,547 (8.2%) | 889 (7.3%) | 658 (9.7%) |
| | White | 4,607 (67.6%) | 2,998 (69.1%) | 1,609 (64.8%) |
| | Other | 1,151 (7.2%) | 961 (9.0%) | 190 (4.0%) |
| BPSysAve | | | | |
| | Mean (SD) | 120.94 (17.15) | 119.50 (17.23) | 123.56 (16.71) |
| Pulse | | | | |
| | Mean (SD) | 72.62 (12.09) | 71.61 (11.81) | 74.46 (12.37) |
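
Example B filters the data frame before calling svydesign(). An alternative that is often recommended with the survey package is to declare the design on the full sample first and then call subset() on the design object, so that variance estimation still has access to the full design. The sketch below illustrates that pattern with the api example data that ships with survey rather than with NHANES; the object names full_design and elem_design are illustrative.

```r
library(survey)
data(api)  # example datasets shipped with the survey package

# Declare the design on the full sample first
full_design <- svydesign(
  id = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1
)

# Then restrict to the analytic subpopulation; subset() on a design
# object keeps the design information from the full sample
elem_design <- subset(full_design, stype == "E")

# Weighted mean and standard error within the subpopulation
svymean(~api00, elem_design)
```

Whether the two approaches give noticeably different standard errors depends on the data and on how many records are dropped, so this is offered as an option to consider rather than a correction to Example B.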

3. Checking Estimate Reliability for Proportions (NCHS Standards)

Federal statistical agencies discourage reporting proportions that are unstable or based on very small samples. When you set reliability_checks = TRUE and return_metrics = TRUE, svytable1() returns a list with two elements:

  • formatted_table: the formatted table, with unreliable estimates suppressed.

  • reliability_metrics: a companion metrics table showing which estimates failed reliability standards.

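These checks follow NCHS guidance for evaluating proportions in complex surveys. Estimates that fail the checks are replaced with an asterisk (*) in the formatted table. The metrics table records each criterion in its own fail_* column; as the column names indicate, these cover sample size, effective sample size, degrees of freedom, confidence interval width, relative confidence interval width, and relative standard error.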

results_list <- svytable1(
  design = adult_design_with_na,
  strata_var = "ObeseStatus",
  table_vars = vars_with_missing,
  reliability_checks = TRUE,
  return_metrics = TRUE
)

knitr::kable(
  results_list$formatted_table,
  caption = "Table 3: Table with NCHS Reliability Checks Applied"
)

Table 3: Table with NCHS Reliability Checks Applied

| Variable | Level | Overall | Missing | Not Obese | Obese |
|:---|:---|:---|:---|:---|:---|
| n | | 11,778 | 547 | 7,073 | 4,158 |
| Age | | | | | |
| | Mean (SD) | 47.18 (16.89) | 56.29 (19.15) | 46.45 (17.32) | 48.25 (15.87) |
| Gender | | | | | |
| | female | 6,032 (51.9%) | 275 (52.3%) | 3,443 (51.2%) | 2,314 (53.2%) |
| | male | 5,746 (48.1%) | 272 (47.7%) | 3,630 (48.8%) | 1,844 (46.8%) |
| Race1 | | | | | |
| | Black | 2,577 (11.4%) | 108 (12.1%) | 1,296 (9.1%) | 1,173 (15.8%) |
| | Hispanic | 1,210 (5.8%) | * | 714 (5.7%) | 434 (6.0%) |
| | Mexican | 1,680 (8.2%) | * | 920 (7.3%) | 685 (9.8%) |
| | White | 5,017 (67.2%) | 235 (69.6%) | 3,114 (68.6%) | 1,668 (64.4%) |
| | Other | 1,294 (7.4%) | 67 (6.0%) | 1,029 (9.3%) | 198 (4.0%) |
| Education | | | | | |
| | 8th Grade | 1,321 (6.1%) | * | 770 (5.9%) | 472 (6.3%) |
| | 9 - 11th Grade | 1,787 (11.8%) | 84 (16.8%) | 1,021 (11.3%) | 682 (12.5%) |
| | High School | 2,595 (21.5%) | 121 (21.1%) | 1,496 (20.2%) | 978 (23.8%) |
| | Some College | 3,399 (31.3%) | 144 (33.4%) | 1,968 (29.4%) | 1,287 (34.5%) |
| | College Grad | 2,656 (29.3%) | * | 1,805 (33.0%) | 735 (22.8%) |
| | Missing | 20 (0.1%) | * | * | * |
| HHIncome | | | | | |
| | 0-4999 | 270 (1.6%) | * | 159 (1.6%) | 95 (1.6%) |
| | 10000-14999 | 915 (5.5%) | * | 501 (5.0%) | 357 (6.1%) |
| | 15000-19999 | 831 (5.0%) | * | 484 (4.8%) | 301 (5.4%) |
| | 20000-24999 | 911 (5.8%) | 42 (9.5%) | 522 (5.5%) | 347 (6.4%) |
| | 25000-34999 | 1,352 (9.7%) | 64 (14.3%) | 750 (8.7%) | 538 (11.3%) |
| | 35000-44999 | 1,054 (8.9%) | * | 637 (8.6%) | 379 (9.4%) |
| | 45000-54999 | 830 (7.9%) | * | 502 (8.0%) | 306 (8.0%) |
| | 5000-9999 | 503 (2.6%) | * | 275 (2.3%) | 202 (3.0%) |
| | 55000-64999 | 612 (6.1%) | * | 380 (6.1%) | 212 (6.3%) |
| | 65000-74999 | 515 (5.7%) | * | 299 (5.3%) | 197 (6.6%) |
| | 75000-99999 | 993 (11.0%) | * | 607 (10.8%) | 361 (11.6%) |
| | more 99999 | 1,710 (22.2%) | * | 1,169 (25.1%) | 478 (16.9%) |
| | Missing | 1,282 (8.0%) | 109 (12.3%) | 788 (8.2%) | 385 (7.4%) |
| TotChol | | | | | |
| | Mean (SD) | 5.07 (1.07) | 5.00 (1.42) | 5.07 (1.08) | 5.06 (1.04) |
| | Missing, n (%) | 1,169 (5.6%) | 426 (15.5%) | 480 (5.5%) | 263 (5.4%) |
| SleepHrsNight | | | | | |
| | Mean (SD) | 6.89 (1.34) | 6.84 (1.88) | 6.95 (1.30) | 6.78 (1.39) |
| | Missing, n (%) | 26 (0.2%) | 6 (1.4%) | 10 (0.2%) | 10 (0.2%) |
| SmokeNow | | | | | |
| | No | 2,779 (24.2%) | 142 (29.1%) | 1,580 (23.2%) | 1,057 (26.0%) |
| | Yes | 2,454 (20.1%) | 102 (19.5%) | 1,594 (21.4%) | 758 (17.6%) |
| | Missing | 6,545 (55.7%) | 303 (51.4%) | 3,899 (55.4%) | 2,343 (56.4%) |

knitr::kable(
  results_list$reliability_metrics[
    results_list$reliability_metrics$suppressed == TRUE, ],
  caption = "Reliability Metrics for Suppressed Estimates"
)

Reliability Metrics for Suppressed Estimates

| | stratum | variable | level | n | df | deff | effective_n | ci_low | ci_high | rse | suppressed | fail_n_30 | fail_eff_n_30 | fail_df_8 | fail_ciw_30 | fail_rciw_130 | fail_rse_30 |
|:---|:---|:---|:---|---:|---:|---:|---:|---:|---:|---:|:---|:---|:---|:---|:---|:---|:---|
| Race1Hispanic | Missing | Race1 | Hispanic | 62 | 26 | 1.77 | 35.04 | 0.00 | 0.10 | 63.22 | TRUE | FALSE | FALSE | FALSE | FALSE | TRUE | TRUE |
| Race1Mexican | Missing | Race1 | Mexican | 75 | 26 | 2.18 | 34.47 | 0.03 | 0.20 | 38.01 | TRUE | FALSE | FALSE | FALSE | FALSE | TRUE | TRUE |
| Education8th Grade | Missing | Education | 8th Grade | 79 | 26 | 1.81 | 43.74 | 0.04 | 0.19 | 33.69 | TRUE | FALSE | FALSE | FALSE | FALSE | TRUE | TRUE |
| EducationCollege Grad | Missing | Education | College Grad | 116 | 26 | 3.61 | 32.16 | 0.08 | 0.35 | 32.76 | TRUE | FALSE | FALSE | FALSE | FALSE | TRUE | TRUE |
| EducationMissing | Missing | Education | Missing | 3 | 26 | 0.29 | 3.00 | 0.00 | 0.01 | 102.95 | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | TRUE |
| EducationMissing1 | Not Obese | Education | Missing | 13 | 33 | 1.20 | 10.86 | 0.00 | 0.00 | 34.32 | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | TRUE |
| EducationMissing2 | Obese | Education | Missing | 4 | 33 | 0.54 | 4.00 | 0.00 | 0.00 | 52.87 | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | TRUE |
| HHIncome0-4999 | Missing | HHIncome | 0-4999 | 16 | 26 | 1.24 | 12.93 | 0.00 | 0.05 | 84.76 | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE |
| HHIncome10000-14999 | Missing | HHIncome | 10000-14999 | 57 | 26 | 1.63 | 34.95 | 0.05 | 0.20 | 29.54 | TRUE | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE |
| HHIncome15000-19999 | Missing | HHIncome | 15000-19999 | 46 | 26 | 0.63 | 46.00 | 0.02 | 0.07 | 32.80 | TRUE | FALSE | FALSE | FALSE | FALSE | TRUE | TRUE |
| HHIncome35000-44999 | Missing | HHIncome | 35000-44999 | 38 | 26 | 3.85 | 9.87 | 0.03 | 0.25 | 48.41 | TRUE | FALSE | TRUE | FALSE | FALSE | TRUE | TRUE |
| HHIncome45000-54999 | Missing | HHIncome | 45000-54999 | 22 | 26 | 1.48 | 14.89 | 0.00 | 0.07 | 79.51 | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE |
| HHIncome5000-9999 | Missing | HHIncome | 5000-9999 | 26 | 26 | 1.28 | 20.37 | 0.03 | 0.13 | 35.23 | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE |
| HHIncome55000-64999 | Missing | HHIncome | 55000-64999 | 20 | 26 | 0.65 | 20.00 | 0.00 | 0.04 | 54.45 | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | TRUE |
| HHIncome65000-74999 | Missing | HHIncome | 65000-74999 | 19 | 26 | 3.31 | 5.74 | 0.00 | 0.14 | 85.07 | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE |
| HHIncome75000-99999 | Missing | HHIncome | 75000-99999 | 25 | 26 | 1.09 | 22.92 | 0.02 | 0.11 | 36.68 | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE |
| HHIncomemore 99999 | Missing | HHIncome | more 99999 | 63 | 26 | 3.04 | 20.75 | 0.09 | 0.34 | 29.37 | TRUE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE |
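
To read the flag columns, it can help to recompute a few of them by hand. The base R sketch below copies four rows from the metrics table above and applies three rules inferred from the column names: n below 30, effective n below 30, and RSE of at least 30. These rules are an assumption based on the names and the printed values, not a statement of how svytable1() computes the flags internally.

```r
# Four rows copied from the metrics table above
metrics <- data.frame(
  level         = c("Hispanic", "10000-14999", "more 99999", "Missing"),
  n             = c(62, 57, 63, 3),
  effective_n   = c(35.04, 34.95, 20.75, 3.00),
  rse           = c(63.22, 29.54, 29.37, 102.95),
  fail_n_30     = c(FALSE, FALSE, FALSE, TRUE),
  fail_eff_n_30 = c(FALSE, FALSE, TRUE, TRUE),
  fail_rse_30   = c(TRUE, FALSE, FALSE, TRUE)
)

# Assumed rules, inferred from the column names
recomputed <- data.frame(
  fail_n_30     = metrics$n < 30,
  fail_eff_n_30 = metrics$effective_n < 30,
  fail_rse_30   = metrics$rse >= 30
)

# TRUE when the assumed rules reproduce the reported flags
flags_match <- all(recomputed == metrics[, names(recomputed)])
flags_match
```

For these rows the assumed rules reproduce the reported flags. The remaining columns (fail_df_8, fail_ciw_30, fail_rciw_130) are not recomputed here because the exact interval definitions are not shown in this vignette.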

4. Checking Estimate Reliability for Means (NCHS Standards)

For numeric variables, svytable1() performs an additional check:

  • fail_rse_30: Flags means whose relative standard error (RSE) is at least 30 percent.

RSE is calculated as:

RSE = (Standard Error / Estimate) * 100

An RSE ≥ 30 percent indicates an unreliable estimate based on NCHS guidance.
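
As a worked illustration of the formula, the base R sketch below computes the RSE for two hypothetical means and applies the 30 percent rule. The estimates and standard errors are invented for the example, and the helper name rse() is illustrative rather than part of svyTable1.

```r
# Relative standard error in percent.
# abs() keeps the RSE positive for negative estimates; that safeguard
# is an addition here and is not part of the formula stated above.
rse <- function(estimate, se) 100 * se / abs(estimate)

# Hypothetical means and standard errors (not taken from the tables above)
estimates <- c(stable = 5.07, unstable = 0.40)
ses       <- c(stable = 0.02, unstable = 0.15)

rse_values  <- rse(estimates, ses)
fail_rse_30 <- rse_values >= 30

round(rse_values, 2)
# stable: 0.39, unstable: 37.50
fail_rse_30
# stable: FALSE, unstable: TRUE
```

The second estimate is flagged because its standard error is large relative to the estimate itself, even though the standard error is small in absolute terms.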

References

National Center for Health Statistics (US). 1994. Plan and Operation of the Third National Health and Nutrition Examination Survey, 1988-94. Vol. 32. US Government Printing Office.
Seidenberg, Andrew B, Richard P Moser, and Brady T West. 2023. “Preferred Reporting Items for Complex Sample Survey Analysis (PRICSSA).” Journal of Survey Statistics and Methodology 11 (4): 743–57.