Creating Descriptive Table 1 Summaries with svyTable1
The svyTable1 package provides a suite of functions for
creating publication-ready tables from complex analytical results. This
vignette focuses on its core capability: generating descriptive “Table
1” summaries using the svytable1() function.
Describing Complex Survey Data
The svytable1() function defaults to
mode = "mixed". This approach is considered a best practice
in high-quality survey research because it promotes transparency.
When you present a table in mixed mode, two types of information are shown for categorical variables:
- Unweighted N: The actual number of survey respondents in that category. It shows how much data supports each estimate and therefore how precise the estimate is likely to be (Seidenberg, Moser, and West 2023).
- Weighted Percentage: The proportion of the target population estimated to fall in that category, accounting for survey design (Health Statistics (US) 1994).
This structure prevents common reporting problems such as showing weighted counts (which do not represent real sample sizes) or hiding sparse categories that contribute instability.
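To make the mixed-mode idea concrete, here is a minimal base R sketch. The toy data frame, its column names, and the weights are hypothetical and are not part of svyTable1; the sketch only illustrates how an unweighted count and a weighted percentage can be combined into one table cell.

```r
# Hypothetical respondents with sampling weights (not real survey data)
toy <- data.frame(
  group  = c("A", "A", "B", "B", "B"),
  weight = c(100, 300, 50, 25, 25)
)

# Unweighted N: how many respondents fall in each category
unweighted_n <- table(toy$group)

# Weighted percentage: share of the total weight carried by each category
weighted_total <- tapply(toy$weight, toy$group, sum)
weighted_pct <- as.numeric(100 * weighted_total / sum(weighted_total))

# Mixed-mode cell: "unweighted N (weighted %)"
mixed <- sprintf("%d (%.1f%%)", as.integer(unweighted_n), weighted_pct)
names(mixed) <- names(unweighted_n)
mixed
# A: "2 (80.0%)", B: "3 (20.0%)"
```

Category B has more respondents but represents a smaller share of the population, which is exactly the distinction a mixed-mode cell keeps visible.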
For continuous variables, the tables in this vignette report weighted means with standard deviations, shown as Mean (SD). The function also offers optional reliability checks derived from National Center for Health Statistics (NCHS) guidance, including standard error and relative standard error thresholds.
Example: Using NHANES Data (2009–2012)
The examples below use the NHANESraw dataset (from the NHANES package) to demonstrate how to prepare a survey design and generate descriptive tables. The code illustrates both incomplete data situations and complete case summaries.
1. Data Preparation
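library(svyTable1)
library(survey)  # svydesign()
library(dplyr)   # %>%, filter(), mutate()
library(tidyr)   # drop_na(), used in Example B
library(NHANES)  # provides NHANESraw

data(NHANESraw)

# Keep adults (20+) and derive obesity status from BMI.
# Respondents with a missing BMI get ObeseStatus = NA.
nhanes_adults_with_na <- NHANESraw %>%
  dplyr::filter(Age >= 20) %>%
  mutate(
    ObeseStatus = factor(
      ifelse(BMI >= 30, "Obese", "Not Obese"),
      levels = c("Not Obese", "Obese")
    )
  )

# Declare the complex design: PSUs nested within strata, MEC exam weights.
adult_design_with_na <- svydesign(
  id = ~SDMVPSU,
  strata = ~SDMVSTRA,
  weights = ~WTMEC2YR,
  nest = TRUE,
  data = nhanes_adults_with_na
)

2. Generating Tables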
Example A: Handling Variables with Missing Data
Variables in large health surveys often contain missing values.
svytable1() automatically identifies missingness and
reports it explicitly, so the unweighted Ns reflect real denominators.
Categorical variables gain a "Missing" level and numeric variables gain a
"Missing, n (%)" row. In the output below, the weighted percentages for each
categorical variable sum to 100% across all levels, including "Missing".
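Before building the table, it can help to see where missing values come from. The base R sketch below uses hypothetical toy values whose column names mirror the NHANES variables used in this vignette. It assumes, based on the data preparation step, that the "Missing" stratum column in the output arises because ifelse() returns NA whenever BMI is NA; this is an inference from the code, not a documented statement about svytable1().

```r
# Hypothetical toy values; column names mirror the NHANES variables
toy <- data.frame(
  BMI      = c(22.5, 31.0, NA, 28.4, 35.2),
  SmokeNow = c("No", NA, "Yes", NA, "No")
)

# ifelse() propagates NA, so a missing BMI yields a missing ObeseStatus
toy$ObeseStatus <- factor(
  ifelse(toy$BMI >= 30, "Obese", "Not Obese"),
  levels = c("Not Obese", "Obese")
)

# Unweighted count of missing values per column
missing_n <- colSums(is.na(toy))
missing_n

# Levels of the stratifying variable, with NA shown as its own group
table(toy$ObeseStatus, useNA = "ifany")
```

A quick check like this shows which variables will produce "Missing" rows, and whether the stratifying variable itself will produce a "Missing" column.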
vars_with_missing <- c(
  "Age", "Gender", "Race1", "Education", "HHIncome",
  "TotChol", "SleepHrsNight", "SmokeNow"
)

table_with_missing <- svytable1(
  design = adult_design_with_na,
  strata_var = "ObeseStatus",
  table_vars = vars_with_missing
)

knitr::kable(
  table_with_missing,
  caption = "Table 1: Summarizing Variables with Missing Data"
)

| Variable | Level | Overall | Missing | Not Obese | Obese |
|---|---|---|---|---|---|
| n | | 11,778 | 547 | 7,073 | 4,158 |
| Age | | | | | |
| | Mean (SD) | 47.18 (16.89) | 56.29 (19.15) | 46.45 (17.32) | 48.25 (15.87) |
| Gender | | | | | |
| | female | 6,032 (51.9%) | 275 (52.3%) | 3,443 (51.2%) | 2,314 (53.2%) |
| | male | 5,746 (48.1%) | 272 (47.7%) | 3,630 (48.8%) | 1,844 (46.8%) |
| Race1 | | | | | |
| | Black | 2,577 (11.4%) | 108 (12.1%) | 1,296 (9.1%) | 1,173 (15.8%) |
| | Hispanic | 1,210 (5.8%) | 62 (2.9%) | 714 (5.7%) | 434 (6.0%) |
| | Mexican | 1,680 (8.2%) | 75 (9.4%) | 920 (7.3%) | 685 (9.8%) |
| | White | 5,017 (67.2%) | 235 (69.6%) | 3,114 (68.6%) | 1,668 (64.4%) |
| | Other | 1,294 (7.4%) | 67 (6.0%) | 1,029 (9.3%) | 198 (4.0%) |
| Education | | | | | |
| | 8th Grade | 1,321 (6.1%) | 79 (9.8%) | 770 (5.9%) | 472 (6.3%) |
| | 9 - 11th Grade | 1,787 (11.8%) | 84 (16.8%) | 1,021 (11.3%) | 682 (12.5%) |
| | High School | 2,595 (21.5%) | 121 (21.1%) | 1,496 (20.2%) | 978 (23.8%) |
| | Some College | 3,399 (31.3%) | 144 (33.4%) | 1,968 (29.4%) | 1,287 (34.5%) |
| | College Grad | 2,656 (29.3%) | 116 (18.7%) | 1,805 (33.0%) | 735 (22.8%) |
| | Missing | 20 (0.1%) | 3 (0.2%) | 13 (0.1%) | 4 (0.0%) |
| HHIncome | | | | | |
| | 0-4999 | 270 (1.6%) | 16 (1.2%) | 159 (1.6%) | 95 (1.6%) |
| | 10000-14999 | 915 (5.5%) | 57 (11.3%) | 501 (5.0%) | 357 (6.1%) |
| | 15000-19999 | 831 (5.0%) | 46 (3.8%) | 484 (4.8%) | 301 (5.4%) |
| | 20000-24999 | 911 (5.8%) | 42 (9.5%) | 522 (5.5%) | 347 (6.4%) |
| | 25000-34999 | 1,352 (9.7%) | 64 (14.3%) | 750 (8.7%) | 538 (11.3%) |
| | 35000-44999 | 1,054 (8.9%) | 38 (10.1%) | 637 (8.6%) | 379 (9.4%) |
| | 45000-54999 | 830 (7.9%) | 22 (1.6%) | 502 (8.0%) | 306 (8.0%) |
| | 5000-9999 | 503 (2.6%) | 26 (6.6%) | 275 (2.3%) | 202 (3.0%) |
| | 55000-64999 | 612 (6.1%) | 20 (1.5%) | 380 (6.1%) | 212 (6.3%) |
| | 65000-74999 | 515 (5.7%) | 19 (3.0%) | 299 (5.3%) | 197 (6.6%) |
| | 75000-99999 | 993 (11.0%) | 25 (5.3%) | 607 (10.8%) | 361 (11.6%) |
| | more 99999 | 1,710 (22.2%) | 63 (19.4%) | 1,169 (25.1%) | 478 (16.9%) |
| | Missing | 1,282 (8.0%) | 109 (12.3%) | 788 (8.2%) | 385 (7.4%) |
| TotChol | | | | | |
| | Mean (SD) | 5.07 (1.07) | 5.00 (1.42) | 5.07 (1.08) | 5.06 (1.04) |
| | Missing, n (%) | 1,169 (5.6%) | 426 (15.5%) | 480 (5.5%) | 263 (5.4%) |
| SleepHrsNight | | | | | |
| | Mean (SD) | 6.89 (1.34) | 6.84 (1.88) | 6.95 (1.30) | 6.78 (1.39) |
| | Missing, n (%) | 26 (0.2%) | 6 (1.4%) | 10 (0.2%) | 10 (0.2%) |
| SmokeNow | | | | | |
| | No | 2,779 (24.2%) | 142 (29.1%) | 1,580 (23.2%) | 1,057 (26.0%) |
| | Yes | 2,454 (20.1%) | 102 (19.5%) | 1,594 (21.4%) | 758 (17.6%) |
| | Missing | 6,545 (55.7%) | 303 (51.4%) | 3,899 (55.4%) | 2,343 (56.4%) |
Example B: Summarizing Complete Data
If you prefer to restrict the analysis to respondents with complete
data for a chosen set of variables, you can construct a complete
analytic dataset and supply it to a new svydesign object.
The function then calculates summary statistics without any missing
categories.
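The complete-case step can be pictured with a small base R sketch. The toy data frame is hypothetical, and complete.cases() is used here only as a base R stand-in for the tidyr::drop_na() call in the vignette code; both keep the rows that have no missing values in the chosen columns.

```r
# Hypothetical toy data; only Age and Pulse are in the chosen set
toy <- data.frame(
  id    = 1:5,
  Age   = c(34, 51, NA, 45, 62),
  Pulse = c(70, NA, 66, 80, 74),
  Note  = c(NA, "a", "b", "c", NA)  # NAs here are ignored by the filter
)

vars <- c("Age", "Pulse")

# Keep rows with observed values in every chosen variable
toy_complete <- toy[complete.cases(toy[, vars]), ]

toy_complete$id
# 1 4 5
```

Rows 2 and 3 are dropped because one of the chosen variables is missing, while missing values in columns outside the chosen set do not affect the result.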
# BMI is included so that ObeseStatus, which is derived from it, has no missing values.
vars_for_complete_table <- c(
  "Age", "Gender", "Race1", "BPSysAve", "Pulse", "BMI"
)

nhanes_adults_complete <- nhanes_adults_with_na %>%
  drop_na(all_of(vars_for_complete_table))

adult_design_complete <- svydesign(
  id = ~SDMVPSU,
  strata = ~SDMVSTRA,
  weights = ~WTMEC2YR,
  nest = TRUE,
  data = nhanes_adults_complete
)

table_without_missing <- svytable1(
  design = adult_design_complete,
  strata_var = "ObeseStatus",
  table_vars = c("Age", "Gender", "Race1", "BPSysAve", "Pulse")
)

knitr::kable(
  table_without_missing,
  caption = "Table 2: Summarizing Variables with No Missing Data"
)

| Variable | Level | Overall | Not Obese | Obese |
|---|---|---|---|---|
| n | | 10,736 | 6,756 | 3,980 |
| Age | | | | |
| | Mean (SD) | 47.21 (16.84) | 46.52 (17.30) | 48.47 (15.89) |
| Gender | | | | |
| | female | 5,455 (51.5%) | 3,261 (50.9%) | 2,194 (52.7%) |
| | male | 5,281 (48.5%) | 3,495 (49.1%) | 1,786 (47.3%) |
| Race1 | | | | |
| | Black | 2,342 (11.3%) | 1,230 (8.9%) | 1,112 (15.5%) |
| | Hispanic | 1,089 (5.7%) | 678 (5.6%) | 411 (5.9%) |
| | Mexican | 1,547 (8.2%) | 889 (7.3%) | 658 (9.7%) |
| | White | 4,607 (67.6%) | 2,998 (69.1%) | 1,609 (64.8%) |
| | Other | 1,151 (7.2%) | 961 (9.0%) | 190 (4.0%) |
| BPSysAve | | | | |
| | Mean (SD) | 120.94 (17.15) | 119.50 (17.23) | 123.56 (16.71) |
| Pulse | | | | |
| | Mean (SD) | 72.62 (12.09) | 71.61 (11.81) | 74.46 (12.37) |
3. Checking Estimate Reliability for Proportions (NCHS Standards)
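Federal statistical agencies discourage reporting proportions that
are unstable or based on very small samples. When you set
reliability_checks = TRUE and return_metrics = TRUE, svytable1()
returns a list with two elements:
- formatted_table: The formatted table, with unreliable estimates suppressed (shown as *).
- reliability_metrics: A companion metrics table showing which estimates failed reliability standards.
These checks follow NCHS guidance for the evaluation of proportions in complex surveys. The metrics table reports each criterion in its own column (fail_n_30, fail_eff_n_30, fail_df_8, fail_ciw_30, fail_rciw_130, and fail_rse_30), and its suppressed column marks the estimates that are replaced by * in the formatted table.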
results_list <- svytable1(
  design = adult_design_with_na,
  strata_var = "ObeseStatus",
  table_vars = vars_with_missing,
  reliability_checks = TRUE,
  return_metrics = TRUE
)

knitr::kable(
  results_list$formatted_table,
  caption = "Table 3: Table with NCHS Reliability Checks Applied"
)

| Variable | Level | Overall | Missing | Not Obese | Obese |
|---|---|---|---|---|---|
| n | | 11,778 | 547 | 7,073 | 4,158 |
| Age | | | | | |
| | Mean (SD) | 47.18 (16.89) | 56.29 (19.15) | 46.45 (17.32) | 48.25 (15.87) |
| Gender | | | | | |
| | female | 6,032 (51.9%) | 275 (52.3%) | 3,443 (51.2%) | 2,314 (53.2%) |
| | male | 5,746 (48.1%) | 272 (47.7%) | 3,630 (48.8%) | 1,844 (46.8%) |
| Race1 | | | | | |
| | Black | 2,577 (11.4%) | 108 (12.1%) | 1,296 (9.1%) | 1,173 (15.8%) |
| | Hispanic | 1,210 (5.8%) | * | 714 (5.7%) | 434 (6.0%) |
| | Mexican | 1,680 (8.2%) | * | 920 (7.3%) | 685 (9.8%) |
| | White | 5,017 (67.2%) | 235 (69.6%) | 3,114 (68.6%) | 1,668 (64.4%) |
| | Other | 1,294 (7.4%) | 67 (6.0%) | 1,029 (9.3%) | 198 (4.0%) |
| Education | | | | | |
| | 8th Grade | 1,321 (6.1%) | * | 770 (5.9%) | 472 (6.3%) |
| | 9 - 11th Grade | 1,787 (11.8%) | 84 (16.8%) | 1,021 (11.3%) | 682 (12.5%) |
| | High School | 2,595 (21.5%) | 121 (21.1%) | 1,496 (20.2%) | 978 (23.8%) |
| | Some College | 3,399 (31.3%) | 144 (33.4%) | 1,968 (29.4%) | 1,287 (34.5%) |
| | College Grad | 2,656 (29.3%) | * | 1,805 (33.0%) | 735 (22.8%) |
| | Missing | 20 (0.1%) | * | * | * |
| HHIncome | | | | | |
| | 0-4999 | 270 (1.6%) | * | 159 (1.6%) | 95 (1.6%) |
| | 10000-14999 | 915 (5.5%) | * | 501 (5.0%) | 357 (6.1%) |
| | 15000-19999 | 831 (5.0%) | * | 484 (4.8%) | 301 (5.4%) |
| | 20000-24999 | 911 (5.8%) | 42 (9.5%) | 522 (5.5%) | 347 (6.4%) |
| | 25000-34999 | 1,352 (9.7%) | 64 (14.3%) | 750 (8.7%) | 538 (11.3%) |
| | 35000-44999 | 1,054 (8.9%) | * | 637 (8.6%) | 379 (9.4%) |
| | 45000-54999 | 830 (7.9%) | * | 502 (8.0%) | 306 (8.0%) |
| | 5000-9999 | 503 (2.6%) | * | 275 (2.3%) | 202 (3.0%) |
| | 55000-64999 | 612 (6.1%) | * | 380 (6.1%) | 212 (6.3%) |
| | 65000-74999 | 515 (5.7%) | * | 299 (5.3%) | 197 (6.6%) |
| | 75000-99999 | 993 (11.0%) | * | 607 (10.8%) | 361 (11.6%) |
| | more 99999 | 1,710 (22.2%) | * | 1,169 (25.1%) | 478 (16.9%) |
| | Missing | 1,282 (8.0%) | 109 (12.3%) | 788 (8.2%) | 385 (7.4%) |
| TotChol | | | | | |
| | Mean (SD) | 5.07 (1.07) | 5.00 (1.42) | 5.07 (1.08) | 5.06 (1.04) |
| | Missing, n (%) | 1,169 (5.6%) | 426 (15.5%) | 480 (5.5%) | 263 (5.4%) |
| SleepHrsNight | | | | | |
| | Mean (SD) | 6.89 (1.34) | 6.84 (1.88) | 6.95 (1.30) | 6.78 (1.39) |
| | Missing, n (%) | 26 (0.2%) | 6 (1.4%) | 10 (0.2%) | 10 (0.2%) |
| SmokeNow | | | | | |
| | No | 2,779 (24.2%) | 142 (29.1%) | 1,580 (23.2%) | 1,057 (26.0%) |
| | Yes | 2,454 (20.1%) | 102 (19.5%) | 1,594 (21.4%) | 758 (17.6%) |
| | Missing | 6,545 (55.7%) | 303 (51.4%) | 3,899 (55.4%) | 2,343 (56.4%) |
knitr::kable(
  results_list$reliability_metrics[
    results_list$reliability_metrics$suppressed == TRUE, ],
  caption = "Reliability Metrics for Suppressed Estimates"
)

| | stratum | variable | level | n | df | deff | effective_n | ci_low | ci_high | rse | suppressed | fail_n_30 | fail_eff_n_30 | fail_df_8 | fail_ciw_30 | fail_rciw_130 | fail_rse_30 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Race1Hispanic | Missing | Race1 | Hispanic | 62 | 26 | 1.77 | 35.04 | 0.00 | 0.10 | 63.22 | TRUE | FALSE | FALSE | FALSE | FALSE | TRUE | TRUE |
| Race1Mexican | Missing | Race1 | Mexican | 75 | 26 | 2.18 | 34.47 | 0.03 | 0.20 | 38.01 | TRUE | FALSE | FALSE | FALSE | FALSE | TRUE | TRUE |
| Education8th Grade | Missing | Education | 8th Grade | 79 | 26 | 1.81 | 43.74 | 0.04 | 0.19 | 33.69 | TRUE | FALSE | FALSE | FALSE | FALSE | TRUE | TRUE |
| EducationCollege Grad | Missing | Education | College Grad | 116 | 26 | 3.61 | 32.16 | 0.08 | 0.35 | 32.76 | TRUE | FALSE | FALSE | FALSE | FALSE | TRUE | TRUE |
| EducationMissing | Missing | Education | Missing | 3 | 26 | 0.29 | 3.00 | 0.00 | 0.01 | 102.95 | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | TRUE |
| EducationMissing1 | Not Obese | Education | Missing | 13 | 33 | 1.20 | 10.86 | 0.00 | 0.00 | 34.32 | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | TRUE |
| EducationMissing2 | Obese | Education | Missing | 4 | 33 | 0.54 | 4.00 | 0.00 | 0.00 | 52.87 | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | TRUE |
| HHIncome0-4999 | Missing | HHIncome | 0-4999 | 16 | 26 | 1.24 | 12.93 | 0.00 | 0.05 | 84.76 | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE |
| HHIncome10000-14999 | Missing | HHIncome | 10000-14999 | 57 | 26 | 1.63 | 34.95 | 0.05 | 0.20 | 29.54 | TRUE | FALSE | FALSE | FALSE | FALSE | TRUE | FALSE |
| HHIncome15000-19999 | Missing | HHIncome | 15000-19999 | 46 | 26 | 0.63 | 46.00 | 0.02 | 0.07 | 32.80 | TRUE | FALSE | FALSE | FALSE | FALSE | TRUE | TRUE |
| HHIncome35000-44999 | Missing | HHIncome | 35000-44999 | 38 | 26 | 3.85 | 9.87 | 0.03 | 0.25 | 48.41 | TRUE | FALSE | TRUE | FALSE | FALSE | TRUE | TRUE |
| HHIncome45000-54999 | Missing | HHIncome | 45000-54999 | 22 | 26 | 1.48 | 14.89 | 0.00 | 0.07 | 79.51 | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE |
| HHIncome5000-9999 | Missing | HHIncome | 5000-9999 | 26 | 26 | 1.28 | 20.37 | 0.03 | 0.13 | 35.23 | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE |
| HHIncome55000-64999 | Missing | HHIncome | 55000-64999 | 20 | 26 | 0.65 | 20.00 | 0.00 | 0.04 | 54.45 | TRUE | TRUE | TRUE | FALSE | FALSE | FALSE | TRUE |
| HHIncome65000-74999 | Missing | HHIncome | 65000-74999 | 19 | 26 | 3.31 | 5.74 | 0.00 | 0.14 | 85.07 | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE |
| HHIncome75000-99999 | Missing | HHIncome | 75000-99999 | 25 | 26 | 1.09 | 22.92 | 0.02 | 0.11 | 36.68 | TRUE | TRUE | TRUE | FALSE | FALSE | TRUE | TRUE |
| HHIncomemore 99999 | Missing | HHIncome | more 99999 | 63 | 26 | 3.04 | 20.75 | 0.09 | 0.34 | 29.37 | TRUE | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE |
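As a rough illustration of how some of these flags relate to the reported metrics, the base R sketch below recomputes four of them for three rows copied from the metrics table. The thresholds and formulas are assumptions inferred from the column names and the values shown (n below 30, effective n below 30, df below 8, RSE of at least 30, and effective n taken as n divided by deff and capped at n); they are not taken from the svyTable1 source code, and the confidence interval width checks are left out.

```r
# Three rows copied from the metrics table ("Missing" stratum)
metrics <- data.frame(
  level = c("Hispanic", "35000-44999", "10000-14999"),
  n     = c(62, 38, 57),
  df    = c(26, 26, 26),
  deff  = c(1.77, 3.85, 1.63),
  rse   = c(63.22, 48.41, 29.54)
)

# Assumed definition: effective n = n / design effect, never larger than n
metrics$effective_n <- pmin(metrics$n, metrics$n / metrics$deff)

# Assumed thresholds, inferred from the flag names
metrics$fail_n_30     <- metrics$n < 30
metrics$fail_eff_n_30 <- metrics$effective_n < 30
metrics$fail_df_8     <- metrics$df < 8
metrics$fail_rse_30   <- metrics$rse >= 30

metrics[, c("level", "fail_n_30", "fail_eff_n_30", "fail_df_8", "fail_rse_30")]
```

Under these assumptions the recomputed flags agree with the corresponding rows of the metrics table: only the 35000-44999 row fails the effective sample size check, and only the 10000-14999 row passes the RSE check.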
4. Checking Estimate Reliability for Means (NCHS Standards)
For numeric variables, svytable1() performs an
additional check:
- fail_rse_30: Flags means whose relative standard error (RSE) is at least 30 percent.
RSE is calculated as:
RSE = (Standard Error / Estimate) * 100
An RSE ≥ 30 percent indicates an unreliable estimate based on NCHS guidance.
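The RSE rule can be worked through in a few lines of base R. The helper function rse() and the example estimates are hypothetical and serve only to illustrate the formula above; they are not part of the svyTable1 API.

```r
# RSE as defined above: (standard error / estimate) * 100
rse <- function(estimate, se) {
  (se / estimate) * 100
}

# Hypothetical means and their standard errors
estimates  <- c(mean_a = 5.0, mean_b = 2.0)
std_errors <- c(mean_a = 0.1, mean_b = 0.7)

rse_values <- rse(estimates, std_errors)
rse_values
# mean_a is about 2 percent, mean_b is about 35 percent

# Flag means whose RSE is at least 30 percent
fail_rse_30 <- rse_values >= 30
fail_rse_30
```

Here mean_b is flagged because its standard error is large relative to the estimate itself, even though its absolute standard error would be unremarkable for a larger mean.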