load("Data/missingdata/Flegal2016.RData")
ls()
#> [1] "dat.full"
str(dat.full)
#> 'data.frame': 10175 obs. of 12 variables:
#> $ SEQN : 'labelled' int 73557 73558 73559 73560 73561 73562 73563 73564 73565 73566 ...
#> ..- attr(*, "label")= chr "Respondent sequence number"
#> $ RIDAGEYR: 'labelled' int 69 54 72 9 73 56 0 61 42 56 ...
#> ..- attr(*, "label")= chr "Age in years at screening"
#> $ RIAGENDR: Factor w/ 2 levels "Male","Female": 1 1 1 1 2 1 1 2 1 2 ...
#> $ DMDEDUC2: Factor w/ 7 levels "Less than 9th grade",..: 3 3 4 NA 5 4 NA 5 3 3 ...
#> $ RIDRETH3: Factor w/ 6 levels "Mexican American",..: 4 3 3 3 3 1 3 3 2 3 ...
#> $ RIDEXPRG: Factor w/ 3 levels "Yes, positive lab pregnancy test",..: NA NA NA NA NA NA NA NA NA NA ...
#> $ WTINT2YR: 'labelled' num 13281 23682 57215 55201 63710 ...
#> ..- attr(*, "label")= chr "Full sample 2 year interview weight"
#> $ SDMVPSU : 'labelled' int 1 1 1 2 2 1 1 1 2 1 ...
#> ..- attr(*, "label")= chr "Masked variance pseudo-PSU"
#> $ SDMVSTRA: 'labelled' int 112 108 109 109 116 111 105 114 106 112 ...
#> ..- attr(*, "label")= chr "Masked variance pseudo-stratum"
#> $ BMXBMI : 'labelled' num 26.7 28.6 28.9 17.1 19.7 41.7 NA 35.7 NA 26.5 ...
#> ..- attr(*, "label")= chr "Body Mass Index (kg/m**2)"
#> $ SMQ020 : Factor w/ 3 levels "Yes","No","Don't know": 1 1 1 NA 2 1 NA 2 1 1 ...
#> $ SMQ040 : Factor w/ 3 levels "Every day","Some days",..: 3 2 3 NA NA 3 NA NA 3 1 ...Testing Interactions
We will use the following article by Flegal et al. (2016). We will use the same data as before.
Objective: To statistically test whether an interaction term contributes significantly to a model when using complex survey data (NHANES) that has been imputed using mice.
The Scenario:
Outcome (\(Y\)): BMI (BMXBMI)
Exposure (\(X\)): Age (RIDAGEYR)
Moderator (\(Z\)): Smoking Status (SMQ020)
Question: Does the effect of Age on BMI depend on whether the person smokes?
Step 1: Imputation
Before modeling, we must handle missing data to avoid bias.
library(mice)
#>
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#>
#> filter
#> The following objects are masked from 'package:base':
#>
#> cbind, rbind
library(survey)
#> Loading required package: grid
#> Loading required package: Matrix
#> Loading required package: survival
#>
#> Attaching package: 'survey'
#> The following object is masked from 'package:graphics':
#>
#> dotchart
# 1. Impute the data
# Recommendation: Set m = 20 for interaction tests to stabilize variance.
imp <- mice(dat.full, m = 20, maxit = 5)
#>
#> iter imp variable
#> 1 1 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 1 2 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 1 3 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 1 4 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 1 5 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 1 6 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 1 7 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 1 8 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 1 9 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 1 10 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 1 11 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 1 12 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 1 13 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 1 14 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 1 15 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 1 16 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 1 17 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 1 18 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 1 19 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 1 20 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 2 1 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 2 2 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 2 3 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 2 4 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 2 5 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 2 6 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 2 7 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 2 8 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 2 9 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 2 10 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 2 11 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 2 12 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 2 13 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 2 14 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 2 15 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 2 16 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 2 17 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 2 18 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 2 19 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 2 20 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 3 1 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 3 2 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 3 3 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 3 4 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 3 5 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 3 6 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 3 7 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 3 8 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 3 9 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 3 10 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 3 11 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 3 12 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 3 13 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 3 14 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 3 15 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 3 16 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 3 17 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 3 18 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 3 19 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 3 20 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 4 1 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 4 2 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 4 3 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 4 4 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 4 5 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 4 6 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 4 7 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 4 8 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 4 9 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 4 10 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 4 11 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 4 12 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 4 13 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 4 14 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 4 15 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 4 16 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 4 17 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 4 18 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 4 19 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 4 20 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 5 1 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 5 2 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 5 3 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 5 4 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 5 5 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 5 6 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 5 7 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 5 8 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 5 9 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 5 10 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 5 11 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 5 12 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 5 13 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 5 14 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 5 15 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 5 16 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 5 17 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 5 18 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 5 19 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> 5 20 DMDEDUC2 RIDEXPRG BMXBMI SMQ020 SMQ040
#> Warning: Number of logged events: 200
# 2. Extract the data into a list
# We cannot use 'with()' easily with svydesign, so we create a list of
# completed dataframes to loop over manually.
imp_list <- complete(imp, "all")Step 2: Fitting the Nested Models
To test an interaction, we must compare two models:
-
The Full Model: Includes the interaction term (Age * Smoking).
- The Reduced Model: Includes only main effects (Age + Smoking).
We use lapply to fit these models across all 20 imputed datasets.
# --- Helper: Calculate Design Degrees of Freedom (dfcom) ---
# We need this to prevent NaN errors in the final test.
# It is usually (Number of PSUs - Number of Strata).
# We calculate it once using the first imputed dataset.
temp_des <- svydesign(id = ~SDMVPSU,
strata = ~SDMVSTRA,
weights = ~WTINT2YR,
data = imp_list[[1]],
nest = TRUE)
df_degrees_freedom <- degf(temp_des)
# May need to do subsetting if the original data was reduced.Model A: The Full Model (With Interaction)
Model B: The Reduced Model (No Interaction)
Step 3: Pooling and Testing
We convert our lists of models into mira objects (Multiply Imputed Repeated Analyses) so the mice package recognizes them. Then, we use the Wald Test (\(D_1\)).
Note that pool.compare(fit_full, fit_reduced, method = "wald") is deprecated. D1 correctly accounts for the small sample correction inherent in your survey design (dfcom), whereas pool.compare might be overestimating the precision.
# 1. Convert to 'mira' class
fit_full <- list(analyses = fit_full_list)
class(fit_full) <- "mira"
fit_reduced <- list(analyses = fit_reduced_list)
class(fit_reduced) <- "mira"
# 2. Run the D1 Statistic (Multivariate Wald Test)
# We pass the dfcom we calculated earlier to ensure stability
test_result <- D1(fit_full,
fit_reduced,
dfcom = df_degrees_freedom)Step 4: Detailed Breakdown of the Output
When you run print(test_result), you get a row of statistics.
names(test_result)
#> [1] "call" "result" "formulas" "m" "method" "use" "dfcom"
test_result$dfcom
#> [1] 15
library(kableExtra)
kable(test_result$result)| F.value | df1 | df2 | P(>F) | RIV |
|---|---|---|---|---|
| 0.5023733 | 2 | 2.625538 | 0.6535277 | 2.959959 |
The Definitions
| Metric | What it is | How to Interpret |
|---|---|---|
| test statistic | The F-value of the Wald test. | High values (> 4) suggest the models are significantly different. Values near 1 suggest the interaction adds nothing. |
| df1 | Numerator Degrees of Freedom. | Number of parameters added. |
| dfcom | Complete Data Degrees of Freedom. | From survey design. Low dfcom (< 30) means limited power. |
| riv | Relative Increase in Variance. | Low (< 0.5) means stable imputations. High (> 1) means large uncertainty. |
| df2 | Denominator Degrees of Freedom. | Effective sample size after MI and design. |
| p.value | Significance Level. | < 0.05 indicates significant interaction. |
Step 5: How to Report This
Scenario A: The Interaction Significant (p < 0.05)
“To test whether the association between age and BMI was modified by smoking status, we compared nested survey-weighted regression models using the multiparameter Wald test (\(D_1\)) on 20 imputed datasets. We found a significant interaction between age and smoking (\(F(2, 350.5) = 4.52, p = 0.01\)). Consequently, we present stratified results for smokers and non-smokers.”
Scenario B: The Interaction NOT Significant (p > 0.05)
“We tested for effect modification by smoking status using the \(D_1\) Wald statistic. The interaction term was not statistically significant (\(F(2, 105.2) = 0.88, p = 0.48\)). Therefore, the interaction term was removed, and we proceeded with the main effects model adjusted for smoking status.”
You can use the same D1 function and workflow to test any set of nested models to check specification.
If you add an interaction term (e.g., Age * Smoking), the \(D_1\) test checks if any of these interaction coefficients are non-zero.
If you want to test whether adding a set of covariates (e.g., Race, Education, and Income) improves your model, you can compare a “Base Model” (without them) to a “Full Model” (with them). The \(D_1\) test will tell you if the set of covariates significantly improves the fit.