Perform a propensity score matching (pair matching, without replacement, using SMD cut-point 0.2, may adjust for imbalanced and/or all covariates in the outcome model, if any) analysis as per the following recommendations
See explained in the previous chapter on PSM in OA-CVD (US), there are four steps of the propensity score matching:
Step 1: PS model specification
Step 2: Matching based on the estimated propensity scores
Step 3: Balance checking
Step 4: Outcome modelling
Only Step 1 is different for Zanutto (2006), DuGoff et al. (2014), and Austin et al. (2018) approaches.
Dataset
The following modified NHANES dataset
NHANES15lab5.RData
use the data set “analytic.with.miss” within this file.
for obtaining the final treatment effect estimates, you can omit missing values, but only after creating the design (e.g., subset the design, not the data itself directly).
# Exposureanalytic.with.miss$bmi <-with(analytic.with.miss, ifelse(bmi>25, "Overweight", ifelse(bmi<=25, "Not overweight", NA)))analytic.with.miss$bmi <-as.factor(analytic.with.miss$bmi)analytic.with.miss$bmi <-relevel(analytic.with.miss$bmi, ref ="Overweight")# Drop unnecessary variables analytic.with.miss$born <-NULLanalytic.with.miss$physical.work <-NULL# Rename the weight variable into interview.weightnames(analytic.with.miss)[names(analytic.with.miss) =="weight"] <-"interview.weight"
Reproducibility
set.seed(504)
Approach by Zanutto (2006)
Step 1
# Specify the PS model to estimate propensity scoresps.formula <-as.formula(I(bmi=="Not overweight") ~ gender + age + race + income + education + married + cholesterol + diastolicBP + systolicBP)
Step 2
require(MatchIt)# Complete data for matchinganalytic.data <- analytic.with.miss[complete.cases(analytic.with.miss),]# Create a missing variable in the original datasetanalytic.with.miss$miss <-1analytic.with.miss$miss[analytic.with.miss$ID %in% analytic.data$ID] <-0# Propensity scoresps.fit <-glm(ps.formula, data = analytic.data, family =binomial("logit"))analytic.data$PS <-predict(ps.fit, type ="response", newdata = analytic.data)# Caliper fixing to 0.2*sd(logit of PS)caliper <-0.2*sd(log(analytic.data$PS/(1-analytic.data$PS)))# Match exposed and unexposed subjects match.obj <-matchit(ps.formula, data = analytic.data,distance = analytic.data$PS, method ="nearest", replace =FALSE,caliper = caliper, ratio =1)analytic.data$PS <- match.obj$distance# Extract matched datamatched.data <-match.data(match.obj)
All SMDs are less than your specified cut-point? If yes, that indicates that there is good covariate balancing.
Step 4
# Setup the design with survey featuresanalytic.with.miss$matched2 <-0analytic.with.miss$matched2[analytic.with.miss$ID %in% matched.data2$ID] <-1# Survey setup for full dataw.design02 <-svydesign(strata =~strata, id =~psu, weights =~interview.weight, data = analytic.with.miss, nest =TRUE)# Subset matched dataw.design.m2 <-subset(w.design02, matched2 ==1)# Outcome modelout.formula <-as.formula(I(diabetes =="Yes") ~ bmi)fit2 <-svyglm(out.formula, design = w.design.m2, family =binomial("logit"))summ(fit2, exp =TRUE, confint =TRUE, digits =3)
Observations
4074
Dependent variable
I(diabetes == "Yes")
Type
Survey-weighted generalized linear model
Family
binomial
Link
logit
Pseudo-R² (Cragg-Uhler)
0.083
Pseudo-R² (McFadden)
0.063
AIC
2322.836
exp(Est.)
2.5%
97.5%
t val.
p
(Intercept)
0.174
0.149
0.202
-22.325
0.000
bmiNot overweight
0.221
0.163
0.299
-9.749
0.000
Standard errors: Robust
Double adjustment
# Outcome model with covariates adjustmentfit2.DA <-svyglm(I(diabetes =="Yes") ~ bmi + gender + age + race + income + education + married + cholesterol + diastolicBP + systolicBP, design = w.design.m, family =binomial("logit"))summ(fit2.DA, exp =TRUE, confint =TRUE, digits =3)
Observations
4094
Dependent variable
I(diabetes == "Yes")
Type
Survey-weighted generalized linear model
Family
binomial
Link
logit
Pseudo-R² (Cragg-Uhler)
0.212
Pseudo-R² (McFadden)
0.166
AIC
2063.466
exp(Est.)
2.5%
97.5%
t val.
p
(Intercept)
0.045
0.010
0.199
-4.101
NA
bmiNot overweight
0.235
0.159
0.348
-7.219
NA
genderMale
1.317
1.082
1.604
2.740
NA
age
1.051
1.044
1.059
13.280
NA
raceHispanic
0.896
0.649
1.237
-0.665
NA
raceOther
1.158
0.565
2.374
0.401
NA
raceWhite
0.778
0.507
1.193
-1.152
NA
incomeBetween.25kto54k
0.830
0.551
1.251
-0.889
NA
incomeBetween.55kto99k
0.872
0.621
1.224
-0.794
NA
incomeOver100k
0.669
0.418
1.072
-1.669
NA
educationHigh.School
0.948
0.733
1.225
-0.408
NA
educationSchool
0.872
0.448
1.695
-0.405
NA
marriedNever.married
1.055
0.677
1.646
0.237
NA
marriedPreviously.married
0.849
0.534
1.348
-0.694
NA
cholesterol
0.991
0.986
0.995
-3.810
NA
diastolicBP
1.005
0.992
1.018
0.749
NA
systolicBP
1.003
0.993
1.012
0.559
NA
Standard errors: Robust
Tip
Double adjustment:
Double adjustment in PSM can offer an additional layer of control for confounding and enhance the robustness of treatment effect estimates. The primary goal of double adjustment is to further minimize bias in the estimated treatment effect, especially when the propensity score model may not fully balance all covariates across treatment groups. in this approach, The propensity score is estimated, typically using logistic regression, where the treatment assignment is regressed on observed covariates. Units (e.g., individuals) in the treatment group are matched with units in the control group based on their propensity scores. This aims to create comparable groups and balance observed covariates across treatment and control groups. After matching, an outcome model is fitted to estimate the treatment effect while additionally adjusting for the same covariates used in the propensity score model. This second adjustment in the outcome model is what constitutes the “double” in “double adjustment.”
However, it should be applied thoughtfully, with careful consideration of model specification, covariate selection, and underlying assumptions to ensure valid and reliable results. Always consider the specific context of the study and consult statistical guidelines or experts when applying advanced methods like double adjustment in PSM.
Approach by Austin et al. (2018)
Step 1
# Specify the PS model to estimate propensity scoresps.formula3 <-as.formula(I(bmi =="Not overweight") ~ gender + age + race + income + education + married + cholesterol + diastolicBP + systolicBP)# Survey designrequire(survey)analytic.design <-svydesign(id =~psu, weights =~interview.weight, strata =~strata,data = analytic.data, nest =TRUE)
Step 2
# Propensity scoresps.fit3 <-svyglm(ps.formula3, design = analytic.design, family =binomial("logit"))analytic.data$PS3 <-fitted(ps.fit3)# Caliper fixing to 0.2*sd(logit of PS)caliper3 <-0.2*sd(log(analytic.data$PS3/(1-analytic.data$PS3)))# Match exposed and unexposed subjects set.seed(504)match.obj3 <-matchit(ps.formula, data = analytic.data,distance = analytic.data$PS3, method ="nearest", replace =FALSE,caliper = caliper3, ratio =1)analytic.data$PS3 <- match.obj3$distance# Extract matched datamatched.data3 <-match.data(match.obj3)
All SMDs are less than your specified cut-point? If yes, that indicates that there is good covariate balancing.
Step 4
# Setup the design with survey featuresanalytic.with.miss$matched3 <-0analytic.with.miss$matched3[analytic.with.miss$ID %in% matched.data3$ID] <-1# Survey setup for full dataw.design03 <-svydesign(strata =~strata, id =~psu, weights =~interview.weight, data = analytic.with.miss, nest =TRUE)# Subset matched dataw.design.m3 <-subset(w.design03, matched3 ==1)# Outcome modelout.formula <-as.formula(I(diabetes =="Yes") ~ bmi)fit3 <-svyglm(out.formula, design = w.design.m3, family =binomial("logit"))summ(fit3, exp =TRUE, confint =TRUE, digits =3)
Observations
4058
Dependent variable
I(diabetes == "Yes")
Type
Survey-weighted generalized linear model
Family
binomial
Link
logit
Pseudo-R² (Cragg-Uhler)
0.071
Pseudo-R² (McFadden)
0.054
AIC
2364.398
exp(Est.)
2.5%
97.5%
t val.
p
(Intercept)
0.169
0.142
0.202
-19.891
0.000
bmiNot overweight
0.244
0.178
0.334
-8.825
0.000
Standard errors: Robust
Double adjustment
# Outcome model with covariates adjustmentfit3.DA <-svyglm(I(diabetes =="Yes") ~ bmi + gender + age + race + income + education + married + cholesterol + diastolicBP + systolicBP, design = w.design.m, family =binomial("logit"))summ(fit3.DA, exp =TRUE, confint =TRUE, digits =3)
Observations
4094
Dependent variable
I(diabetes == "Yes")
Type
Survey-weighted generalized linear model
Family
binomial
Link
logit
Pseudo-R² (Cragg-Uhler)
0.212
Pseudo-R² (McFadden)
0.166
AIC
2063.466
exp(Est.)
2.5%
97.5%
t val.
p
(Intercept)
0.045
0.010
0.199
-4.101
NA
bmiNot overweight
0.235
0.159
0.348
-7.219
NA
genderMale
1.317
1.082
1.604
2.740
NA
age
1.051
1.044
1.059
13.280
NA
raceHispanic
0.896
0.649
1.237
-0.665
NA
raceOther
1.158
0.565
2.374
0.401
NA
raceWhite
0.778
0.507
1.193
-1.152
NA
incomeBetween.25kto54k
0.830
0.551
1.251
-0.889
NA
incomeBetween.55kto99k
0.872
0.621
1.224
-0.794
NA
incomeOver100k
0.669
0.418
1.072
-1.669
NA
educationHigh.School
0.948
0.733
1.225
-0.408
NA
educationSchool
0.872
0.448
1.695
-0.405
NA
marriedNever.married
1.055
0.677
1.646
0.237
NA
marriedPreviously.married
0.849
0.534
1.348
-0.694
NA
cholesterol
0.991
0.986
0.995
-3.810
NA
diastolicBP
1.005
0.992
1.018
0.749
NA
systolicBP
1.003
0.993
1.012
0.559
NA
Standard errors: Robust
References
Austin, Peter C, Nathaniel Jembere, and Maria Chiu. 2018. “Propensity Score Matching and Complex Surveys.”Statistical Methods in Medical Research 27 (4): 1240–57.
DuGoff, Eva H, Megan Schuler, and Elizabeth A Stuart. 2014. “Generalizing Observational Study Results: Applying Propensity Score Methods to Complex Surveys.”Health Services Research 49 (1): 284–303.
Zanutto, Elaine L. 2006. “A Comparison of Propensity Score and Linear Regression Analysis of Complex Survey Data.”Journal of Data Science 4 (1): 67–91.