PS Weighting (US)
Propensity analysis problem
In this chapter, we will use propensity score weighting (using SMD cut-point 0.2, may adjust for imbalanced and/or all covariates in the outcome model, if any) analysis as per the following recommendations - (Zanutto 2006) - (DuGoff, Schuler, and Stuart 2014) - (Austin, Jembere, and Chiu 2018)
Dataset
- The following modified NHANES dataset
- NHANES15lab5.RData
 
 - use the data set “analytic.with.miss” within this file.
- for obtaining the final treatment effect estimates, you can omit missing values, but only after creating the design (e.g., subset the design, not the data itself directly).
 
 - The same dataset was used for propensity score weighting
 
Variables
- Outcome: diabetes
- ‘No’ as the reference category
 
 - Exposure: bmi
- convert to binary with >25 vs. <= 25,
 - with > 25 as the reference category
 
 - Confounder list:
- gender
 - age
- assume continuous
 
 - race
 - income
 - education
 - married
 - cholesterol
 - diastolicBP
 - systolicBP
 
 - Mediator:
- physical.work
- ‘No’ as the reference category
 
 
 - physical.work
 - Survey features
- psu
 - strata
 - weight
 
 
Pre-processing
Load data
Variable summary
# Full data
dat.full <- analytic.with.miss
# Exposure
dat.full$bmi <- with(dat.full, ifelse(bmi>25, "Overweight", 
                                      ifelse(bmi<=25, "Not overweight", NA)))
dat.full$bmi <- as.factor(dat.full$bmi)
dat.full$bmi <- relevel(dat.full$bmi, ref = "Overweight")
# Drop unnecessary variables 
dat.full$born <- NULL
dat.full$physical.work <- NULL
# Rename the weight variable into interview.weight
names(dat.full)[names(dat.full) == "weight"] <- "interview.weight"Complete case data
We will use the complete case data to perform the analysis.
Reproducibility
Approach by Zanutto (2006)
Step 1
Step 2
For the second step, we will calculate the both unstabilized and stabilized weights. However, stabilized inverse probability weight is often recommended to prevent from extreme weights (Hernán and Robins 2006).
# Propensity scores
ps.fit <- glm(ps.formula, data = analytic.data, family = binomial("logit"))
analytic.data$ps <- predict(ps.fit, type = "response", newdata = analytic.data)
# Unstabilized weight
analytic.data$usweight <- with(analytic.data, ifelse(I(bmi=="Not overweight"), 
                                                     1/ps, 1/(1-ps)))
# Unstabilized weight summary
round(summary(analytic.data$usweight), 2)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    1.01    1.33    1.63    2.10    2.14   62.31
# Stabilized weight
analytic.data$sweight <- with(analytic.data, ifelse(I(bmi=="Not overweight"), 
                                                    mean(I(bmi=="Not overweight"))/ps, 
                                                    (1-mean(I(bmi=="Not overweight")))/(1-ps)))
# Stabilized weight summary
round(summary(analytic.data$sweight), 2)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    0.44    0.70    0.84    1.04    1.08   25.40We can see that the mean of stabilized weights is 1, while it is approximately 2.1 for unstabilized weights. For both unstabilized and stabilized weights, it seems there are extreme weights, particularly for the unstabilized weights. Extreme weights could be dealt with weight truncation, typically truncated at the 1st and 99th percentiles (Cole and Hernán 2008).
Let us truncate the weights at the 1st and 99th percentiles
# Truncating unstabilized weight
analytic.data <- analytic.data %>% 
  mutate(usweight_t = pmin(pmax(usweight, quantile(usweight, 0.01)), 
                           quantile(usweight, 0.99)))
summary(analytic.data$usweight_t)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   1.055   1.335   1.629   2.020   2.140  10.912
# Truncating stabilized weight
analytic.data <- analytic.data %>% 
  mutate(sweight_t = pmin(pmax(sweight, quantile(sweight, 0.01)), 
                          quantile(sweight, 0.99)))
summary(analytic.data$sweight_t)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.4861  0.7035  0.8400  1.0044  1.0833  4.4901Step 3
Now we will check the distribution of the covariates by the exposure status on the pseudo population in terms of pre-specified SMD.
# Covariates
vars <- c("gender", "age", "race", "income", "education", "married", "cholesterol", 
         "diastolicBP", "systolicBP")
# Design with truncated unstabilized weight
design.unstab <- svydesign(ids = ~ID, weights = ~usweight_t, data = analytic.data)
# Design with truncated stabilized weight
design.stab <- svydesign(ids = ~ID, weights = ~sweight_t, data = analytic.data)
# Balance checking with truncated unstabilized weight
tab.unstab <- svyCreateTableOne(vars = vars, strata = "bmi", data = design.unstab, test = F)
print(tab.unstab, smd = T)
#>                          Stratified by bmi
#>                           Overweight      Not overweight  SMD   
#>   n                        6199.3          6559.7               
#>   gender = Male (%)        2999.3 (48.4)   3126.6 (47.7)   0.014
#>   age (mean (SD))           48.38 (17.38)   49.83 (18.11)  0.082
#>   race (%)                                                 0.047
#>      Black                 1304.7 (21.0)   1407.8 (21.5)        
#>      Hispanic              1901.9 (30.7)   1873.7 (28.6)        
#>      Other                  956.4 (15.4)   1029.3 (15.7)        
#>      White                 2036.2 (32.8)   2249.0 (34.3)        
#>   income (%)                                               0.033
#>      <25k                  1664.3 (26.8)   1829.5 (27.9)        
#>      Between.25kto54k      1986.5 (32.0)   2133.5 (32.5)        
#>      Between.55kto99k      1449.1 (23.4)   1466.6 (22.4)        
#>      Over100k              1099.4 (17.7)   1130.0 (17.2)        
#>   education (%)                                            0.019
#>      College               3469.5 (56.0)   3610.4 (55.0)        
#>      High.School           2047.7 (33.0)   2217.9 (33.8)        
#>      School                 682.1 (11.0)    731.4 (11.1)        
#>   married (%)                                              0.040
#>      Married               3728.2 (60.1)   3853.6 (58.7)        
#>      Never.married         1101.9 (17.8)   1147.0 (17.5)        
#>      Previously.married    1369.2 (22.1)   1559.1 (23.8)        
#>   cholesterol (mean (SD))  181.58 (40.93)  183.13 (43.59)  0.037
#>   diastolicBP (mean (SD))   66.31 (14.64)   66.87 (14.78)  0.038
#>   systolicBP (mean (SD))   121.04 (16.61)  123.60 (23.62)  0.125
# Balance checking with truncated stabilized weight
tab.stab <- svyCreateTableOne(vars = vars, strata = "bmi", data = design.stab, test = F)
print(tab.stab, smd = T)
#>                          Stratified by bmi
#>                           Overweight      Not overweight  SMD   
#>   n                        3665.9          2677.9               
#>   gender = Male (%)        1771.7 (48.3)   1276.5 (47.7)   0.013
#>   age (mean (SD))           48.40 (17.38)   49.84 (18.12)  0.081
#>   race (%)                                                 0.048
#>      Black                  772.6 (21.1)    575.0 (21.5)        
#>      Hispanic              1126.3 (30.7)    764.6 (28.6)        
#>      Other                  561.1 (15.3)    420.5 (15.7)        
#>      White                 1206.0 (32.9)    917.9 (34.3)        
#>   income (%)                                               0.033
#>      <25k                   982.8 (26.8)    747.3 (27.9)        
#>      Between.25kto54k      1176.4 (32.1)    870.8 (32.5)        
#>      Between.55kto99k       857.6 (23.4)    598.5 (22.3)        
#>      Over100k               649.2 (17.7)    461.3 (17.2)        
#>   education (%)                                            0.019
#>      College               2051.4 (56.0)   1473.3 (55.0)        
#>      High.School           1210.5 (33.0)    905.9 (33.8)        
#>      School                 404.0 (11.0)    298.7 (11.2)        
#>   married (%)                                              0.040
#>      Married               2205.9 (60.2)   1572.9 (58.7)        
#>      Never.married          650.0 (17.7)    468.0 (17.5)        
#>      Previously.married     810.1 (22.1)    637.1 (23.8)        
#>   cholesterol (mean (SD))  181.62 (40.91)  183.15 (43.61)  0.036
#>   diastolicBP (mean (SD))   66.37 (14.53)   66.87 (14.81)  0.034
#>   systolicBP (mean (SD))   121.06 (16.58)  123.63 (23.65)  0.126As we can see, all SMDs are less than our specified cut-point of 0.2, indicating that there is good covariate balancing. next, we will fit the outcome model on the pseudo population (i.e., weighted data). Note that we must utilize the survey feature as the design for the population-level estimate. For this step, we will multiply propensity score weight and survey weight and create a new weight variable.
Step 4 - with unstabilized weight
require(survey)
require(jtools)
# Create an indicator variable in the full dataset
dat.full$ind <- 0
dat.full$ind[dat.full$ID %in% analytic.data$ID] <- 1
# New weight = interview weight * unstabilized weight 
analytic.data$new.usweight_t <- with(analytic.data, interview.weight * usweight_t)
summary(analytic.data$new.usweight_t)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    6437   26567   42862   77768   88106 1831548
#  New weight variable in the full dataset
dat.full$new.usweight_t <- 0
dat.full$new.usweight_t[dat.full$ID %in% analytic.data$ID] <- 
  analytic.data$new.usweight_t
summary(dat.full$new.usweight_t)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>       0       0   24068   49261   55161 1831548
# Survey setup with full data 
w.design0 <- svydesign(id = ~psu, strata = ~strata, weights = ~new.usweight_t, 
                      data = dat.full, nest = TRUE)
# Subset the design for analytic sample
w.design.s <- subset(w.design0, ind == 1)
# Outcome model
out.formula <- as.formula(I(diabetes == "Yes") ~ bmi)
fit <- svyglm(out.formula, design = w.design.s, family = binomial("logit"))
summ(fit, exp = TRUE, confint = TRUE, digits = 3)| Observations | 6316 | 
| Dependent variable | I(diabetes == "Yes") | 
| Type | Survey-weighted generalized linear model | 
| Family | binomial | 
| Link | logit | 
| Pseudo-R² (Cragg-Uhler) | 0.062 | 
| Pseudo-R² (McFadden) | 0.047 | 
| AIC | 3371.907 | 
| exp(Est.) | 2.5% | 97.5% | t val. | p | |
|---|---|---|---|---|---|
| (Intercept) | 0.164 | 0.141 | 0.190 | -24.213 | 0.000 | 
| bmiNot overweight | 0.280 | 0.217 | 0.361 | -9.814 | 0.000 | 
| Standard errors: Robust | 
Step 4 - with stabilized weight
Similarly, we can fit the outcome model with stabilized weights.
# Create an indicator variable in the full dataset
dat.full$ind <- 0
dat.full$ind[dat.full$ID %in% analytic.data$ID] <- 1
# New weight = interview weight * stabilized weight
analytic.data$new.sweight_t <- with(analytic.data, interview.weight * sweight_t)
summary(analytic.data$new.sweight_t)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    2985   13311   22494   39174   43896  753678
#  New weight variable in the full dataset
dat.full$new.sweight_t <- 0
dat.full$new.sweight_t[dat.full$ID %in% analytic.data$ID] <- analytic.data$new.sweight_t
summary(dat.full$new.sweight_t)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>       0       0   12115   24814   28232  753678
# Survey setup with full data 
w.design0 <- svydesign(id = ~psu, strata = ~strata, weights = ~new.sweight_t, 
                       data = dat.full, nest = TRUE)
# Subset the design for analytic sample
w.design.s2 <- subset(w.design0, ind == 1)
# Outcome model
out.formula <- as.formula(I(diabetes == "Yes") ~ bmi)
fit.stab <- svyglm(out.formula, design = w.design.s2, family = binomial("logit"))
summ(fit.stab, exp = TRUE, confint = TRUE, digits = 3)| Observations | 6316 | 
| Dependent variable | I(diabetes == "Yes") | 
| Type | Survey-weighted generalized linear model | 
| Family | binomial | 
| Link | logit | 
| Pseudo-R² (Cragg-Uhler) | 0.055 | 
| Pseudo-R² (McFadden) | 0.041 | 
| AIC | 3712.451 | 
| exp(Est.) | 2.5% | 97.5% | t val. | p | |
|---|---|---|---|---|---|
| (Intercept) | 0.164 | 0.141 | 0.190 | -24.235 | 0.000 | 
| bmiNot overweight | 0.280 | 0.217 | 0.361 | -9.815 | 0.000 | 
| Standard errors: Robust | 
Double adjustment
library(survey)
# Outcome model with covariates adjustment
fit.DA <- svyglm(I(diabetes == "Yes") ~ bmi + gender + age + race + income + 
                 education + married + cholesterol + diastolicBP + systolicBP, 
               design = w.design.s2, family = binomial("logit"))
summ(fit.DA, exp = TRUE, confint = TRUE, digits = 3)| Observations | 6316 | 
| Dependent variable | I(diabetes == "Yes") | 
| Type | Survey-weighted generalized linear model | 
| Family | binomial | 
| Link | logit | 
| Pseudo-R² (Cragg-Uhler) | 0.194 | 
| Pseudo-R² (McFadden) | 0.149 | 
| AIC | 3331.238 | 
| exp(Est.) | 2.5% | 97.5% | t val. | p | |
|---|---|---|---|---|---|
| (Intercept) | 0.045 | 0.015 | 0.133 | -5.608 | NA | 
| bmiNot overweight | 0.234 | 0.175 | 0.312 | -9.826 | NA | 
| genderMale | 1.402 | 1.190 | 1.652 | 4.038 | NA | 
| age | 1.050 | 1.041 | 1.058 | 11.793 | NA | 
| raceHispanic | 0.790 | 0.600 | 1.040 | -1.677 | NA | 
| raceOther | 0.849 | 0.428 | 1.683 | -0.470 | NA | 
| raceWhite | 0.541 | 0.371 | 0.789 | -3.192 | NA | 
| incomeBetween.25kto54k | 0.723 | 0.478 | 1.093 | -1.540 | NA | 
| incomeBetween.55kto99k | 0.647 | 0.411 | 1.018 | -1.883 | NA | 
| incomeOver100k | 0.476 | 0.309 | 0.733 | -3.371 | NA | 
| educationHigh.School | 0.937 | 0.725 | 1.211 | -0.498 | NA | 
| educationSchool | 0.930 | 0.568 | 1.524 | -0.286 | NA | 
| marriedNever.married | 0.895 | 0.628 | 1.275 | -0.615 | NA | 
| marriedPreviously.married | 0.791 | 0.529 | 1.184 | -1.137 | NA | 
| cholesterol | 0.993 | 0.990 | 0.997 | -3.760 | NA | 
| diastolicBP | 1.007 | 0.999 | 1.014 | 1.744 | NA | 
| systolicBP | 1.002 | 0.993 | 1.012 | 0.529 | NA | 
| Standard errors: Robust | 
# Log odds ratio with p-values
summary(fit.DA, df.resid = degf(w.design.s2))
#> 
#> Call:
#> svyglm(formula = I(diabetes == "Yes") ~ bmi + gender + age + 
#>     race + income + education + married + cholesterol + diastolicBP + 
#>     systolicBP, design = w.design.s2, family = binomial("logit"))
#> 
#> Survey design:
#> subset(w.design0, ind == 1)
#> 
#> Coefficients:
#>                            Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)               -3.095813   0.552073  -5.608 4.99e-05 ***
#> bmiNot overweight         -1.454514   0.148024  -9.826 6.29e-08 ***
#> genderMale                 0.338112   0.083733   4.038  0.00107 ** 
#> age                        0.048468   0.004110  11.793 5.48e-09 ***
#> raceHispanic              -0.235285   0.140287  -1.677  0.11422    
#> raceOther                 -0.164132   0.349277  -0.470  0.64517    
#> raceWhite                 -0.613643   0.192233  -3.192  0.00606 ** 
#> incomeBetween.25kto54k    -0.324911   0.211048  -1.540  0.14451    
#> incomeBetween.55kto99k    -0.435472   0.231288  -1.883  0.07927 .  
#> incomeOver100k            -0.742995   0.220399  -3.371  0.00420 ** 
#> educationHigh.School      -0.065157   0.130721  -0.498  0.62540    
#> educationSchool           -0.072034   0.251806  -0.286  0.77874    
#> marriedNever.married      -0.110932   0.180496  -0.615  0.54803    
#> marriedPreviously.married -0.233875   0.205646  -1.137  0.27327    
#> cholesterol               -0.006873   0.001828  -3.760  0.00189 ** 
#> diastolicBP                0.006728   0.003859   1.744  0.10169    
#> systolicBP                 0.002438   0.004605   0.529  0.60430    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Zero or negative residual df; p-values not defined
#> 
#> (Dispersion parameter for binomial family taken to be 0.9452878)
#> 
#> Number of Fisher Scoring iterations: 6Double adjustment:
As explained in the previous chapter, double adjustment should be applied thoughtfully, with careful consideration of model specification, covariate selection, and underlying assumptions to ensure valid and reliable results. Always consider the specific context of the study and consult statistical guidelines or experts when applying advanced methods like double adjustment in propensity score analysis.
Approach by DuGoff et al. (2014)
Step 1
Step 2
# Propensity scores
ps.fit2 <- glm(ps.formula2, data = analytic.data, family = binomial("logit"))
analytic.data$ps2 <- fitted(ps.fit2)
# Stabilized weight
analytic.data$sweight.dug <- with(analytic.data, ifelse(I(bmi=="Not overweight"), 
                                                    mean(I(bmi=="Not overweight"))/ps2, 
                                                    (1-mean(I(bmi=="Not overweight")))/(1-ps2)))
summary(analytic.data$sweight.dug)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.4349  0.6942  0.8296  1.0307  1.0859 23.0226
# Truncating stabilized weight
analytic.data <- analytic.data %>% 
  mutate(sweight.dug_t = pmin(pmax(sweight.dug, quantile(sweight.dug, 0.01)), 
                          quantile(sweight.dug, 0.99)))
summary(analytic.data$sweight.dug_t)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.4802  0.6942  0.8296  1.0019  1.0859  4.4041Step 3
# Balance checking
cov2 <- c("gender", "age", "race", "income", "education", "married", "cholesterol", 
         "diastolicBP", "systolicBP")
# Design with truncated stabilized weight
design.stab <- svydesign(ids = ~ID, weights = ~sweight.dug_t, data = analytic.data)
# Balance checking with truncated stabilized weight
tab.stab2 <- svyCreateTableOne(vars = cov2, strata = "bmi", data = design.stab, test = F)
print(tab.stab2, smd = T)
#>                          Stratified by bmi
#>                           Overweight      Not overweight  SMD   
#>   n                        3652.9          2675.0               
#>   gender = Male (%)        1766.5 (48.4)   1276.8 (47.7)   0.013
#>   age (mean (SD))           48.47 (17.42)   49.80 (17.99)  0.075
#>   race (%)                                                 0.056
#>      Black                  769.1 (21.1)    568.7 (21.3)        
#>      Hispanic              1121.5 (30.7)    755.4 (28.2)        
#>      Other                  553.0 (15.1)    417.0 (15.6)        
#>      White                 1209.4 (33.1)    933.9 (34.9)        
#>   income (%)                                               0.023
#>      <25k                   982.1 (26.9)    734.0 (27.4)        
#>      Between.25kto54k      1172.1 (32.1)    871.1 (32.6)        
#>      Between.55kto99k       846.1 (23.2)    597.0 (22.3)        
#>      Over100k               652.6 (17.9)    472.9 (17.7)        
#>   education (%)                                            0.012
#>      College               2049.1 (56.1)   1484.2 (55.5)        
#>      High.School           1202.5 (32.9)    893.7 (33.4)        
#>      School                 401.3 (11.0)    297.1 (11.1)        
#>   married (%)                                              0.035
#>      Married               2201.3 (60.3)   1581.2 (59.1)        
#>      Never.married          647.4 (17.7)    465.6 (17.4)        
#>      Previously.married     804.2 (22.0)    628.2 (23.5)        
#>   cholesterol (mean (SD))  181.74 (40.94)  182.97 (43.25)  0.029
#>   diastolicBP (mean (SD))   66.46 (14.42)   66.88 (14.79)  0.029
#>   systolicBP (mean (SD))   121.14 (16.59)  123.39 (23.38)  0.111All SMDs are less than our specified cut-point of 0.2.
Step 4
# Create an indicator variable in the full dataset
dat.full$ind <- 0
dat.full$ind[dat.full$ID %in% analytic.data$ID] <- 1
# New weight = interview weight * stabilized weight
analytic.data$new.sweight.dug_t <- with(analytic.data, interview.weight * sweight.dug_t)
summary(analytic.data$new.sweight.dug_t)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    2996   13343   22260   39436   43914  855144
#  New weight variable in the full dataset
dat.full$new.sweight.dug_t <- 0
dat.full$new.sweight.dug_t[dat.full$ID %in% analytic.data$ID] <- analytic.data$new.sweight.dug_t
summary(dat.full$new.sweight.dug_t)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>       0       0   12058   24980   28225  855144
# Survey setup with full data 
w.design0 <- svydesign(id = ~psu, strata = ~strata, weights = ~new.sweight.dug_t, 
                       data = dat.full, nest = TRUE)
# Subset the design for analytic sample
w.design.s2 <- subset(w.design0, ind == 1)
# Outcome model
out.formula <- as.formula(I(diabetes == "Yes") ~ bmi)
fit.stab.dug <- svyglm(out.formula, design = w.design.s2, family = binomial("logit"))
summ(fit.stab.dug, exp = TRUE, confint = TRUE, digits = 3)| Observations | 6316 | 
| Dependent variable | I(diabetes == "Yes") | 
| Type | Survey-weighted generalized linear model | 
| Family | binomial | 
| Link | logit | 
| Pseudo-R² (Cragg-Uhler) | 0.060 | 
| Pseudo-R² (McFadden) | 0.045 | 
| AIC | 3518.691 | 
| exp(Est.) | 2.5% | 97.5% | t val. | p | |
|---|---|---|---|---|---|
| (Intercept) | 0.161 | 0.140 | 0.185 | -26.027 | 0.000 | 
| bmiNot overweight | 0.272 | 0.203 | 0.364 | -8.727 | 0.000 | 
| Standard errors: Robust | 
Double adjustment
# Outcome model with covariates adjustment
fit2.DA <- svyglm(I(diabetes == "Yes") ~ bmi + gender + age + race + income + 
                 education + married + cholesterol + diastolicBP + systolicBP, 
               design = w.design.s2, family = binomial("logit"))
summ(fit2.DA, exp = TRUE, confint = TRUE, digits = 3)| Observations | 6316 | 
| Dependent variable | I(diabetes == "Yes") | 
| Type | Survey-weighted generalized linear model | 
| Family | binomial | 
| Link | logit | 
| Pseudo-R² (Cragg-Uhler) | 0.196 | 
| Pseudo-R² (McFadden) | 0.152 | 
| AIC | 3154.781 | 
| exp(Est.) | 2.5% | 97.5% | t val. | p | |
|---|---|---|---|---|---|
| (Intercept) | 0.043 | 0.014 | 0.131 | -5.537 | NA | 
| bmiNot overweight | 0.235 | 0.169 | 0.325 | -8.730 | NA | 
| genderMale | 1.393 | 1.150 | 1.688 | 3.381 | NA | 
| age | 1.049 | 1.042 | 1.056 | 13.564 | NA | 
| raceHispanic | 0.813 | 0.624 | 1.061 | -1.524 | NA | 
| raceOther | 0.841 | 0.454 | 1.558 | -0.549 | NA | 
| raceWhite | 0.535 | 0.369 | 0.774 | -3.316 | NA | 
| incomeBetween.25kto54k | 0.720 | 0.492 | 1.053 | -1.693 | NA | 
| incomeBetween.55kto99k | 0.659 | 0.440 | 0.985 | -2.035 | NA | 
| incomeOver100k | 0.491 | 0.338 | 0.713 | -3.735 | NA | 
| educationHigh.School | 0.930 | 0.753 | 1.150 | -0.667 | NA | 
| educationSchool | 0.994 | 0.625 | 1.580 | -0.026 | NA | 
| marriedNever.married | 0.982 | 0.671 | 1.439 | -0.092 | NA | 
| marriedPreviously.married | 0.783 | 0.534 | 1.150 | -1.245 | NA | 
| cholesterol | 0.993 | 0.989 | 0.996 | -3.913 | NA | 
| diastolicBP | 1.006 | 0.998 | 1.013 | 1.426 | NA | 
| systolicBP | 1.004 | 0.995 | 1.012 | 0.853 | NA | 
| Standard errors: Robust | 
# Log odds ratio with p-values
summary(fit2.DA, df.resid = degf(w.design.s2))
#> 
#> Call:
#> svyglm(formula = I(diabetes == "Yes") ~ bmi + gender + age + 
#>     race + income + education + married + cholesterol + diastolicBP + 
#>     systolicBP, design = w.design.s2, family = binomial("logit"))
#> 
#> Survey design:
#> subset(w.design0, ind == 1)
#> 
#> Coefficients:
#>                            Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)               -3.143943   0.567807  -5.537 5.70e-05 ***
#> bmiNot overweight         -1.450047   0.166099  -8.730 2.89e-07 ***
#> genderMale                 0.331561   0.098056   3.381  0.00411 ** 
#> age                        0.047790   0.003523  13.564 7.97e-10 ***
#> raceHispanic              -0.206657   0.135568  -1.524  0.14822    
#> raceOther                 -0.172590   0.314341  -0.549  0.59105    
#> raceWhite                 -0.625821   0.188746  -3.316  0.00471 ** 
#> incomeBetween.25kto54k    -0.328863   0.194223  -1.693  0.11107    
#> incomeBetween.55kto99k    -0.417687   0.205223  -2.035  0.05989 .  
#> incomeOver100k            -0.711500   0.190509  -3.735  0.00199 ** 
#> educationHigh.School      -0.072053   0.108040  -0.667  0.51496    
#> educationSchool           -0.006136   0.236588  -0.026  0.97965    
#> marriedNever.married      -0.017832   0.194660  -0.092  0.92822    
#> marriedPreviously.married -0.244058   0.196041  -1.245  0.23226    
#> cholesterol               -0.007067   0.001806  -3.913  0.00138 ** 
#> diastolicBP                0.005516   0.003869   1.426  0.17440    
#> systolicBP                 0.003739   0.004381   0.853  0.40682    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Zero or negative residual df; p-values not defined
#> 
#> (Dispersion parameter for binomial family taken to be 0.9373219)
#> 
#> Number of Fisher Scoring iterations: 6Approach by Austin et al. (2018)
Step 1
# Specify the PS model to estimate propensity scores
ps.formula3 <- as.formula(I(bmi == "Not overweight") ~ gender + age + race + income + education + 
                           married + cholesterol + diastolicBP + systolicBP)
# Survey design
require(survey)
analytic.design <- svydesign(id = ~psu, weights = ~interview.weight, strata = ~strata,
                             data = analytic.data, nest = TRUE)Step 2
# Propensity scores
ps.fit3 <- svyglm(ps.formula3, design = analytic.design, family = binomial("logit"))
analytic.data$ps3 <- fitted(ps.fit3)
# Stabilized weight
analytic.data$sweight.aus <- with(analytic.data, ifelse(I(bmi=="Not overweight"), 
                                                    mean(I(bmi=="Not overweight"))/ps3, 
                                                    (1-mean(I(bmi=="Not overweight")))/(1-ps3)))
summary(analytic.data$sweight.aus)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.4436  0.7138  0.8410  1.0392  1.0645 26.2776
# Truncating stabilized weight
analytic.data <- analytic.data %>% 
  mutate(sweight.aus_t = pmin(pmax(sweight.aus, quantile(sweight.aus, 0.01)), 
                          quantile(sweight.aus, 0.99)))
summary(analytic.data$sweight.aus_t)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.5198  0.7138  0.8410  1.0071  1.0645  4.8830Step 3
# Balance checking
vars <- c("gender", "age", "race", "income", "education", "married", "cholesterol", 
          "diastolicBP", "systolicBP")
# Design with truncated stabilized weight
design.stab <- svydesign(ids = ~ID, weights = ~sweight.aus_t, data = analytic.data)
# Balance checking with truncated stabilized weight
tab.stab.aus <- svyCreateTableOne(vars = vars, strata = "bmi", data = design.stab, test = F)
print(tab.stab.aus, smd = T)
#>                          Stratified by bmi
#>                           Overweight      Not overweight  SMD   
#>   n                        3434.3          2926.8               
#>   gender = Male (%)        1609.8 (46.9)   1458.6 (49.8)   0.059
#>   age (mean (SD))           48.51 (17.32)   49.95 (18.14)  0.081
#>   race (%)                                                 0.099
#>      Black                  737.1 (21.5)    623.8 (21.3)        
#>      Hispanic              1085.0 (31.6)    826.0 (28.2)        
#>      Other                  471.2 (13.7)    489.7 (16.7)        
#>      White                 1141.0 (33.2)    987.3 (33.7)        
#>   income (%)                                               0.041
#>      <25k                   933.7 (27.2)    813.0 (27.8)        
#>      Between.25kto54k      1108.2 (32.3)    946.8 (32.3)        
#>      Between.55kto99k       776.7 (22.6)    685.4 (23.4)        
#>      Over100k               615.7 (17.9)    481.6 (16.5)        
#>   education (%)                                            0.039
#>      College               1908.2 (55.6)   1601.1 (54.7)        
#>      High.School           1129.9 (32.9)   1011.4 (34.6)        
#>      School                 396.2 (11.5)    314.3 (10.7)        
#>   married (%)                                              0.030
#>      Married               2063.9 (60.1)   1737.6 (59.4)        
#>      Never.married          605.8 (17.6)    501.4 (17.1)        
#>      Previously.married     764.6 (22.3)    687.8 (23.5)        
#>   cholesterol (mean (SD))  182.67 (41.11)  182.70 (43.38)  0.001
#>   diastolicBP (mean (SD))   66.76 (14.28)   66.85 (14.85)  0.006
#>   systolicBP (mean (SD))   121.39 (16.74)  123.98 (23.72)  0.126All SMDs are less than our specified cut-point of 0.2.
Step 4
# Create an indicator variable in the full dataset
dat.full$ind <- 0
dat.full$ind[dat.full$ID %in% analytic.data$ID] <- 1
# New weight = interview weight * stabilized weight
analytic.data$new.sweight.aus_t <- with(analytic.data, interview.weight * sweight.aus_t)
summary(analytic.data$new.sweight.aus_t)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    3255   13604   22527   38953   43663  819622
#  New weight variable in the full dataset
dat.full$new.sweight.aus_t <- 0
dat.full$new.sweight.aus_t[dat.full$ID %in% analytic.data$ID] <- analytic.data$new.sweight.aus_t
summary(dat.full$new.sweight.aus_t)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>       0       0   12356   24675   28169  819622
# Survey setup with full data 
w.design0 <- svydesign(id = ~psu, strata = ~strata, weights = ~new.sweight.aus_t, 
                       data = dat.full, nest = TRUE)
# Subset the design for analytic sample
w.design.s2 <- subset(w.design0, ind == 1)
# Outcome model
out.formula <- as.formula(I(diabetes == "Yes") ~ bmi)
fit.stab.aus <- svyglm(out.formula, design = w.design.s2, family = binomial("logit"))
summ(fit.stab.aus, exp = TRUE, confint = TRUE, digits = 3)| Observations | 6316 | 
| Dependent variable | I(diabetes == "Yes") | 
| Type | Survey-weighted generalized linear model | 
| Family | binomial | 
| Link | logit | 
| Pseudo-R² (Cragg-Uhler) | 0.055 | 
| Pseudo-R² (McFadden) | 0.041 | 
| AIC | 3622.951 | 
| exp(Est.) | 2.5% | 97.5% | t val. | p | |
|---|---|---|---|---|---|
| (Intercept) | 0.162 | 0.140 | 0.187 | -24.912 | 0.000 | 
| bmiNot overweight | 0.291 | 0.225 | 0.377 | -9.397 | 0.000 | 
| Standard errors: Robust | 
Double adjustment
# Outcome model with covariates adjustment
fit3.DA <- svyglm(I(diabetes == "Yes") ~ bmi + gender + age + race + income + 
                 education + married + cholesterol + diastolicBP + systolicBP, 
               design = w.design.s2, family = binomial("logit"))
summ(fit3.DA, exp = TRUE, confint = TRUE, digits = 3)| Observations | 6316 | 
| Dependent variable | I(diabetes == "Yes") | 
| Type | Survey-weighted generalized linear model | 
| Family | binomial | 
| Link | logit | 
| Pseudo-R² (Cragg-Uhler) | 0.193 | 
| Pseudo-R² (McFadden) | 0.149 | 
| AIC | 3247.730 | 
| exp(Est.) | 2.5% | 97.5% | t val. | p | |
|---|---|---|---|---|---|
| (Intercept) | 0.045 | 0.015 | 0.129 | -5.735 | NA | 
| bmiNot overweight | 0.233 | 0.175 | 0.311 | -9.886 | NA | 
| genderMale | 1.416 | 1.186 | 1.690 | 3.850 | NA | 
| age | 1.049 | 1.041 | 1.057 | 12.066 | NA | 
| raceHispanic | 0.773 | 0.585 | 1.022 | -1.808 | NA | 
| raceOther | 0.867 | 0.451 | 1.666 | -0.428 | NA | 
| raceWhite | 0.520 | 0.353 | 0.765 | -3.316 | NA | 
| incomeBetween.25kto54k | 0.700 | 0.461 | 1.062 | -1.677 | NA | 
| incomeBetween.55kto99k | 0.650 | 0.407 | 1.039 | -1.801 | NA | 
| incomeOver100k | 0.467 | 0.303 | 0.721 | -3.437 | NA | 
| educationHigh.School | 0.948 | 0.744 | 1.209 | -0.428 | NA | 
| educationSchool | 0.976 | 0.598 | 1.592 | -0.099 | NA | 
| marriedNever.married | 0.896 | 0.629 | 1.276 | -0.609 | NA | 
| marriedPreviously.married | 0.784 | 0.525 | 1.169 | -1.195 | NA | 
| cholesterol | 0.993 | 0.990 | 0.997 | -3.653 | NA | 
| diastolicBP | 1.005 | 0.998 | 1.013 | 1.412 | NA | 
| systolicBP | 1.004 | 0.995 | 1.012 | 0.826 | NA | 
| Standard errors: Robust | 
# Log odds ratio with p-values
summary(fit3.DA, df.resid = degf(w.design.s2))
#> 
#> Call:
#> svyglm(formula = I(diabetes == "Yes") ~ bmi + gender + age + 
#>     race + income + education + married + cholesterol + diastolicBP + 
#>     systolicBP, design = w.design.s2, family = binomial("logit"))
#> 
#> Survey design:
#> subset(w.design0, ind == 1)
#> 
#> Coefficients:
#>                            Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)               -3.109181   0.542121  -5.735 3.94e-05 ***
#> bmiNot overweight         -1.456869   0.147369  -9.886 5.81e-08 ***
#> genderMale                 0.347655   0.090292   3.850  0.00157 ** 
#> age                        0.047457   0.003933  12.066 4.01e-09 ***
#> raceHispanic              -0.257364   0.142343  -1.808  0.09069 .  
#> raceOther                 -0.142755   0.333175  -0.428  0.67440    
#> raceWhite                 -0.653902   0.197190  -3.316  0.00470 ** 
#> incomeBetween.25kto54k    -0.357122   0.212996  -1.677  0.11432    
#> incomeBetween.55kto99k    -0.430289   0.238964  -1.801  0.09190 .  
#> incomeOver100k            -0.761311   0.221498  -3.437  0.00367 ** 
#> educationHigh.School      -0.052990   0.123816  -0.428  0.67475    
#> educationSchool           -0.024734   0.249884  -0.099  0.92246    
#> marriedNever.married      -0.109989   0.180532  -0.609  0.55148    
#> marriedPreviously.married -0.243725   0.203990  -1.195  0.25072    
#> cholesterol               -0.006605   0.001808  -3.653  0.00235 ** 
#> diastolicBP                0.005253   0.003721   1.412  0.17842    
#> systolicBP                 0.003618   0.004380   0.826  0.42172    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Zero or negative residual df; p-values not defined
#> 
#> (Dispersion parameter for binomial family taken to be 0.9416613)
#> 
#> Number of Fisher Scoring iterations: 6