Perform a propensity score matching (pair matching, without replacement, using SMD cut-point 0.2, may adjust for imbalanced and/or all covariates in the outcome model, if any) analysis as per the following recommendations
See explained in the previous chapter on PSM in OA-CVD (US), there are four steps of the propensity score matching:
Step 1: PS model specification
Step 2: Matching based on the estimated propensity scores
Step 3: Balance checking
Step 4: Outcome modelling
Only Step 1 is different for Zanutto (2006), DuGoff et al. (2014), and Austin et al. (2018) approaches.
Dataset
The following modified NHANES dataset
NHANES15lab5.RData
use the data set “analytic.with.miss” within this file.
for obtaining the final treatment effect estimates, you can omit missing values, but only after creating the design (e.g., subset the design, not the data itself directly).
# Exposureanalytic.with.miss$bmi <-with(analytic.with.miss, ifelse(bmi>25, "Overweight", ifelse(bmi<=25, "Not overweight", NA)))analytic.with.miss$bmi <-as.factor(analytic.with.miss$bmi)analytic.with.miss$bmi <-relevel(analytic.with.miss$bmi, ref ="Overweight")# Drop unnecessary variables analytic.with.miss$born <-NULLanalytic.with.miss$physical.work <-NULL# Rename the weight variable into interview.weightnames(analytic.with.miss)[names(analytic.with.miss) =="weight"] <-"interview.weight"
Reproducibility
set.seed(504)
Approach by Zanutto (2006)
Step 1
# Specify the PS model to estimate propensity scoresps.formula <-as.formula(I(bmi=="Not overweight") ~ gender + age + race + income + education + married + cholesterol + diastolicBP + systolicBP)
Step 2
require(MatchIt)# Complete data for matchinganalytic.data <- analytic.with.miss[complete.cases(analytic.with.miss),]# Create a missing variable in the original datasetanalytic.with.miss$miss <-1analytic.with.miss$miss[analytic.with.miss$ID %in% analytic.data$ID] <-0# Propensity scoresps.fit <-glm(ps.formula, data = analytic.data, family =binomial("logit"))analytic.data$PS <-predict(ps.fit, type ="response", newdata = analytic.data)# Caliper fixing to 0.2*sd(logit of PS)caliper <-0.2*sd(log(analytic.data$PS/(1-analytic.data$PS)))# Match exposed and unexposed subjects match.obj <-matchit(ps.formula, data = analytic.data,distance = analytic.data$PS, method ="nearest", replace =FALSE,caliper = caliper, ratio =1)analytic.data$PS <- match.obj$distance# Extract matched datamatched.data <-match.data(match.obj)
All SMDs are less than your specified cut-point? If yes, that indicates that there is good covariate balancing.
Step 4
# Setup the design with survey featuresanalytic.with.miss$matched2 <-0analytic.with.miss$matched2[analytic.with.miss$ID %in% matched.data2$ID] <-1# Survey setup for full dataw.design02 <-svydesign(strata =~strata, id =~psu, weights =~interview.weight, data = analytic.with.miss, nest =TRUE)# Subset matched dataw.design.m2 <-subset(w.design02, matched2 ==1)# Outcome modelout.formula <-as.formula(I(diabetes =="Yes") ~ bmi)fit2 <-svyglm(out.formula, design = w.design.m2, family =binomial("logit"))summ(fit2, exp =TRUE, confint =TRUE, digits =3)
Observations
4074
Dependent variable
I(diabetes == "Yes")
Type
Survey-weighted generalized linear model
Family
binomial
Link
logit
Pseudo-R² (Cragg-Uhler)
0.083
Pseudo-R² (McFadden)
0.063
AIC
2322.836
exp(Est.)
2.5%
97.5%
t val.
p
(Intercept)
0.174
0.149
0.202
-22.325
0.000
bmiNot overweight
0.221
0.163
0.299
-9.749
0.000
Standard errors: Robust
Double adjustment
# Outcome model with covariates adjustmentfit2.DA <-svyglm(I(diabetes =="Yes") ~ bmi + gender + age + race + income + education + married + cholesterol + diastolicBP + systolicBP,design = w.design.m2, family =binomial("logit"))summ(fit2.DA, exp =TRUE, confint =TRUE, digits =3)
Observations
4074
Dependent variable
I(diabetes == "Yes")
Type
Survey-weighted generalized linear model
Family
binomial
Link
logit
Pseudo-R² (Cragg-Uhler)
0.232
Pseudo-R² (McFadden)
0.183
AIC
2029.087
exp(Est.)
2.5%
97.5%
t val.
p
(Intercept)
0.035
0.006
0.191
-3.877
NA
bmiNot overweight
0.205
0.146
0.287
-9.239
NA
genderMale
1.314
1.046
1.651
2.350
NA
age
1.053
1.044
1.064
10.741
NA
raceHispanic
0.975
0.703
1.352
-0.150
NA
raceOther
1.038
0.594
1.811
0.130
NA
raceWhite
0.813
0.568
1.164
-1.131
NA
incomeBetween.25kto54k
0.784
0.525
1.171
-1.189
NA
incomeBetween.55kto99k
0.741
0.504
1.090
-1.522
NA
incomeOver100k
0.688
0.440
1.075
-1.644
NA
educationHigh.School
0.776
0.558
1.078
-1.514
NA
educationSchool
0.745
0.512
1.083
-1.543
NA
marriedNever.married
0.917
0.540
1.558
-0.320
NA
marriedPreviously.married
0.809
0.545
1.200
-1.055
NA
cholesterol
0.992
0.987
0.998
-2.586
NA
diastolicBP
1.001
0.991
1.012
0.239
NA
systolicBP
1.005
0.993
1.017
0.808
NA
Standard errors: Robust
Tip
Double adjustment:
Double adjustment in PSM can offer an additional layer of control for confounding and enhance the robustness of treatment effect estimates. The primary goal of double adjustment is to further minimize bias in the estimated treatment effect, especially when the propensity score model may not fully balance all covariates across treatment groups. in this approach, The propensity score is estimated, typically using logistic regression, where the treatment assignment is regressed on observed covariates. Units (e.g., individuals) in the treatment group are matched with units in the control group based on their propensity scores. This aims to create comparable groups and balance observed covariates across treatment and control groups. After matching, an outcome model is fitted to estimate the treatment effect while additionally adjusting for the same covariates used in the propensity score model. This second adjustment in the outcome model is what constitutes the “double” in “double adjustment.”
However, it should be applied thoughtfully, with careful consideration of model specification, covariate selection, and underlying assumptions to ensure valid and reliable results. Always consider the specific context of the study and consult statistical guidelines or experts when applying advanced methods like double adjustment in PSM.
Approach by Austin et al. (2018)
Step 1
# Specify the PS model to estimate propensity scoresps.formula3 <-as.formula(I(bmi =="Not overweight") ~ gender + age + race + income + education + married + cholesterol + diastolicBP + systolicBP)# Survey designrequire(survey)analytic.design <-svydesign(id =~psu, weights =~interview.weight, strata =~strata,data = analytic.data, nest =TRUE)
Step 2
# Propensity scoresps.fit3 <-svyglm(ps.formula3, design = analytic.design, family =binomial("logit"))analytic.data$PS3 <-fitted(ps.fit3)# Caliper fixing to 0.2*sd(logit of PS)caliper3 <-0.2*sd(log(analytic.data$PS3/(1-analytic.data$PS3)))# Match exposed and unexposed subjects set.seed(504)match.obj3 <-matchit(ps.formula3, data = analytic.data,distance = analytic.data$PS3,method ="nearest", replace =FALSE,caliper = caliper3, ratio =1)analytic.data$PS3 <- match.obj3$distance# Extract matched datamatched.data3 <-match.data(match.obj3)
All SMDs are less than your specified cut-point? If yes, that indicates that there is good covariate balancing.
Step 4
# Setup the design with survey featuresanalytic.with.miss$matched3 <-0analytic.with.miss$matched3[analytic.with.miss$ID %in% matched.data3$ID] <-1# Survey setup for full dataw.design03 <-svydesign(strata =~strata, id =~psu, weights =~interview.weight, data = analytic.with.miss, nest =TRUE)# Subset matched dataw.design.m3 <-subset(w.design03, matched3 ==1)# Outcome modelout.formula <-as.formula(I(diabetes =="Yes") ~ bmi)fit3 <-svyglm(out.formula, design = w.design.m3, family =binomial("logit"))summ(fit3, exp =TRUE, confint =TRUE, digits =3)
Observations
4058
Dependent variable
I(diabetes == "Yes")
Type
Survey-weighted generalized linear model
Family
binomial
Link
logit
Pseudo-R² (Cragg-Uhler)
0.071
Pseudo-R² (McFadden)
0.054
AIC
2364.398
exp(Est.)
2.5%
97.5%
t val.
p
(Intercept)
0.169
0.142
0.202
-19.891
0.000
bmiNot overweight
0.244
0.178
0.334
-8.825
0.000
Standard errors: Robust
Double adjustment
# Outcome model with covariates adjustmentfit3.DA <-svyglm(I(diabetes =="Yes") ~ bmi + gender + age + race + income + education + married + cholesterol + diastolicBP + systolicBP,design = w.design.m3, family =binomial("logit"))summ(fit3.DA, exp =TRUE, confint =TRUE, digits =3)
Observations
4058
Dependent variable
I(diabetes == "Yes")
Type
Survey-weighted generalized linear model
Family
binomial
Link
logit
Pseudo-R² (Cragg-Uhler)
0.225
Pseudo-R² (McFadden)
0.176
AIC
2063.963
exp(Est.)
2.5%
97.5%
t val.
p
(Intercept)
0.027
0.007
0.107
-5.169
NA
bmiNot overweight
0.226
0.157
0.327
-7.953
NA
genderMale
1.266
0.954
1.679
1.637
NA
age
1.056
1.049
1.063
16.336
NA
raceHispanic
0.993
0.678
1.454
-0.036
NA
raceOther
1.323
0.632
2.771
0.742
NA
raceWhite
0.817
0.522
1.278
-0.886
NA
incomeBetween.25kto54k
0.940
0.544
1.626
-0.221
NA
incomeBetween.55kto99k
0.885
0.551
1.422
-0.505
NA
incomeOver100k
0.837
0.538
1.302
-0.788
NA
educationHigh.School
1.039
0.804
1.344
0.295
NA
educationSchool
0.794
0.447
1.411
-0.787
NA
marriedNever.married
1.139
0.731
1.775
0.575
NA
marriedPreviously.married
0.939
0.592
1.490
-0.265
NA
cholesterol
0.991
0.987
0.995
-4.140
NA
diastolicBP
1.002
0.992
1.012
0.438
NA
systolicBP
1.004
0.995
1.013
0.836
NA
Standard errors: Robust
References
Austin, Peter C, Nathaniel Jembere, and Maria Chiu. 2018. “Propensity Score Matching and Complex Surveys.”Statistical Methods in Medical Research 27 (4): 1240–57.
DuGoff, Eva H, Megan Schuler, and Elizabeth A Stuart. 2014. “Generalizing Observational Study Results: Applying Propensity Score Methods to Complex Surveys.”Health Services Research 49 (1): 284–303.
Zanutto, Elaine L. 2006. “A Comparison of Propensity Score and Linear Regression Analysis of Complex Survey Data.”Journal of Data Science 4 (1): 67–91.