Chapter 3 Propensity score
3.1 Motivating problem
\(Y\) : Outcome | Cholesterol levels (high vs. low) |
\(A\) : Exposure | Diabetes |
\(L\) : Known Confounders | gender, age, race, education, married, BMI |
Search literature for the confounder variables, and look for those variables in the data source (NHANES 2017-2018).
- Data recourses:
- All of the data files used in this workshop are available in the GitHub repo.
- If you are interested in how to obtain NHANES datasets and prepare analytic data, here is an example (outside of the scope of the current tutorial).
load(file="data/NHANES17.RData")
require(dplyr)
<- dplyr::select(analytic,
analytic # outcome
cholesterol,
gender, age, race, education, # confounders
married, bmi, # exposure
diabetes) $cholesterol <- ifelse(analytic$cholesterol > 240, 1, 0)
analytic$diabetes <- ifelse(analytic$diabetes == "Yes", 1, 0) analytic
require(summarytools)
dfSummary(analytic)
## Data Frame Summary
## analytic
## Dimensions: 1562 x 8
## Duplicates: 2
##
## -------------------------------------------------------------------------------------------------------------------------------------------------
## No Variable Label Stats / Values Freqs (% of Valid) Graph Valid Missing
## ---- --------------------- --------------------------- ------------------------- --------------------- --------------------- ---------- ---------
## 1 cholesterol Min : 0 0 : 1391 (89.1%) IIIIIIIIIIIIIIIII 1562 0
## [numeric] Mean : 0.1 1 : 171 (10.9%) II (100.0%) (0.0%)
## Max : 1
##
## 2 gender 1. Female 603 (38.6%) IIIIIII 1562 0
## [character] 2. Male 959 (61.4%) IIIIIIIIIIII (100.0%) (0.0%)
##
## 3 age Age in years at screening Mean (sd) : 53.2 (17.2) 61 distinct values : : : 1562 0
## [labelled, integer] min < med < max: . : . . : : : : : (100.0%) (0.0%)
## 20 < 55 < 80 : : : : : : : : : :
## IQR (CV) : 29 (0.3) : : : : : : : : : :
## : : : : : : : : : :
##
## 4 race 1. Black 324 (20.7%) IIII 1562 0
## [character] 2. Hispanic 284 (18.2%) III (100.0%) (0.0%)
## 3. Other 228 (14.6%) II
## 4. White 726 (46.5%) IIIIIIIII
##
## 5 education 1. College 806 (51.6%) IIIIIIIIII 1562 0
## [character] 2. High.School 658 (42.1%) IIIIIIII (100.0%) (0.0%)
## 3. School 98 ( 6.3%) I
##
## 6 married 1. Married 921 (59.0%) IIIIIIIIIII 1562 0
## [character] 2. Never.married 228 (14.6%) II (100.0%) (0.0%)
## 3. Previously.married 413 (26.4%) IIIII
##
## 7 bmi Body Mass Index (kg/m2) Mean (sd) : 30 (7.3) 314 distinct values : 1562 0
## [labelled, numeric] min < med < max: : . (100.0%) (0.0%)
## 14.8 < 28.9 < 64.2 : : :
## IQR (CV) : 8.8 (0.2) : : : :
## . : : : : : .
##
## 8 diabetes Min : 0 0 : 1232 (78.9%) IIIIIIIIIIIIIII 1562 0
## [numeric] Mean : 0.2 1 : 330 (21.1%) IIII (100.0%) (0.0%)
## Max : 1
## -------------------------------------------------------------------------------------------------------------------------------------------------
3.2 Defining Propensity score
- Conditional Probability of getting treatment, given the observed covariates
Prob(treatment: \(A = 1\) | baseline or pre-treatment covariates: \(L\))
Prob(\(A = 1\): Has diabetes | \(L\): gender, age, race, education, married, bmi)
- PS = \(Prob(A=1|L)\)
Condensing multiple variables (in L) into one summary variable (PS).
- Essentially a dimension reduction exercise!
3.2.1 Theoretical result
Rosenbaum and Rubin (1983) showed:
- For potential outcomes \(Y(1), Y(0)\), if you have sufficient observed covariate list \(L\) to reduce confounding (`strong ignoribility’):
- i.e., if \((Y(1), Y(0)) \perp A | L\)
- Note that is this NOT \(Y \perp A | L\)
- then
- \((Y(1), Y(0)) \perp A | PS\) and
- \(A \perp L | PS\)
3.2.2 Assumptions
Conditional Exchangeability | \(Y(1), Y(0) \perp A | L\) | Treatment assignment is independent of the potential outcome, given L |
Positivity | \(0 < P(A=1 | L) < 1\) | Subjects are eligible to receive both treatment, given L |
Consistency | \(Y = Y(a) \forall A=a\) | No multiple version of the treatment |
3.3 PS Matching Steps
Propensity score matching has 4 steps (Austin 2011a)
Step 1 | exposure modelling: \(PS = Prob(A=1|L)\) |
Step 2 | Match by \(PS\) |
Step 3 | Assess balance and overlap (\(PS\) and \(L\)) |
Step 4 | outcome modelling: \(Prob(Y=1|A=1)\) |
References
———. 2011a. “A Tutorial and Case Study in Propensity Score Analysis: An Application to Estimating the Effect of in-Hospital Smoking Cessation Counseling on Mortality.” Multivariate Behavioral Research 46 (1): 119–51.
Rosenbaum, Paul R, and Donald B Rubin. 1983. “The Central Role of the Propensity Score in Observational Studies for Causal Effects.” Biometrika 70 (1): 41–55.