Chapter 7 Step 4: Outcome modelling

  • Some flexibility in choosing outcome model
    • considered independent of exposure modelling
    • some propose double robust approach
    • adjusting imbalanced covariates only?

7.1 Crude outcome model

Estimate the effect of treatment on outcomes using propensity score-matched sample

fit3 <- glm(cholesterol~diabetes,
            family=binomial, data = matched.data)
publish(fit3)
##  Variable Units OddsRatio       CI.95  p-value 
##  diabetes            0.90 [0.54;1.50]   0.6984

7.2 Double-adjustment

Estimate the effect of treatment on outcomes using propensity score-matched sample, and adjust for imbalanced covariate

fit3r <- glm(cholesterol~diabetes + race,
            family=binomial, data = matched.data)
publish(fit3r)
##  Variable    Units OddsRatio       CI.95  p-value 
##  diabetes               0.89 [0.54;1.49]   0.6657 
##      race    Black       Ref                      
##           Hispanic      0.96 [0.46;2.02]   0.9165 
##              Other      1.32 [0.63;2.78]   0.4581 
##              White      0.58 [0.30;1.13]   0.1095

7.3 Adjusted outcome model

Adjust for all covariates, again! (suggested)

out.formula <- as.formula(paste("cholesterol", "~ diabetes +", 
                               paste(baselinevars, 
                                     collapse = "+")))
out.formula
## cholesterol ~ diabetes + gender + age + race + education + married + 
##     bmi
fit3b <- glm(out.formula,
            family=binomial, data = matched.data)
publish(fit3b)
##   Variable              Units OddsRatio       CI.95     p-value 
##   diabetes                         0.86 [0.51;1.46]   0.5794126 
##     gender             Female       Ref                         
##                          Male      0.38 [0.21;0.69]   0.0012767 
##        age                         0.95 [0.93;0.97]     < 1e-04 
##       race              Black       Ref                         
##                      Hispanic      0.72 [0.31;1.65]   0.4346787 
##                         Other      0.77 [0.34;1.73]   0.5224157 
##                         White      0.51 [0.25;1.04]   0.0649791 
##  education            College       Ref                         
##                   High.School      0.70 [0.39;1.24]   0.2215142 
##                        School      0.93 [0.35;2.43]   0.8791455 
##    married            Married       Ref                         
##                 Never.married      0.48 [0.15;1.54]   0.2173180 
##            Previously.married      0.84 [0.45;1.57]   0.5900732 
##        bmi                         0.93 [0.89;0.97]   0.0005547

The above analysis do not take matched pair into consideration while regressing.

7.4 Variance considerations

Literature proposes different strategies:

  • do not control for pairs / clusters
    • use glm as is
  • control for pairs / clusters
    • use cluster option (preferred)
    • use GEE or
    • use conditional logistic

7.4.1 Cluster option

Here is an example using cluster option:

require(jtools)
summ(fit3b, rubust = "HC0", confint = TRUE, digists = 3, 
     cluster = "subclass", model.info = FALSE, 
     model.fit = FALSE, exp = TRUE)
exp(Est.) 2.5% 97.5% z val. p
(Intercept) 100.02 8.74 1144.55 3.70 0.00
diabetes 0.86 0.51 1.46 -0.55 0.58
genderMale 0.38 0.21 0.69 -3.22 0.00
age 0.95 0.93 0.97 -4.47 0.00
raceHispanic 0.72 0.31 1.65 -0.78 0.43
raceOther 0.77 0.34 1.73 -0.64 0.52
raceWhite 0.51 0.25 1.04 -1.85 0.06
educationHigh.School 0.70 0.39 1.24 -1.22 0.22
educationSchool 0.93 0.35 2.43 -0.15 0.88
marriedNever.married 0.48 0.15 1.54 -1.23 0.22
marriedPreviously.married 0.84 0.45 1.57 -0.54 0.59
bmi 0.93 0.89 0.97 -3.45 0.00
Standard errors: MLE

7.4.2 Bootstrap

  • Bootstrap for matched pair for WOR (Austin and Small 2014)
    • straightforward method to estimate SE
    • may not be appropriate for WR
  • Following is an example of block bootstrap; see MatchIt package vignettes for additional details.
# For binary outcomes
# with covariate adjustment and bootstrapping
pair_ids <- levels(matched.data$subclass)
est_fun <- function(pairs, i) {
  # See MatchIt vignettes
  #Compute number of times each pair is present
  numreps <- table(pairs[i])
  
  #For each pair p, copy corresponding matched.data row indices numreps[p] times
  ids <- unlist(lapply(pair_ids[pair_ids %in% names(numreps)],
                       function(p) rep(which(matched.data$subclass == p), 
                                              numreps[p])))
  
  #Subset matched.data with block bootstrapped ids
  matched.data_boot <- matched.data[ids,]
  
  #Fitting outcome the model
  out.formula <- as.formula(paste("cholesterol", "~ diabetes +", 
                               paste(baselinevars, 
                                     collapse = "+")))
  fit_boot <- glm(out.formula,
            family = binomial(link = "logit"), 
            weights = weights,
            data = matched.data_boot)
  
  #Estimate potential outcomes for each unit
  #Under control
  matched.data_boot$diabetes <- 0
  P0 <- weighted.mean(predict(fit_boot, matched.data_boot, type = "response"),
                      w = matched.data_boot$weights)
  Odds0 <- P0 / (1 - P0)
  
  #Under treatment
  matched.data_boot$diabetes <- 1
  P1 <- weighted.mean(predict(fit_boot, matched.data_boot, type = "response"),
                      w = matched.data_boot$weights)
  Odds1 <- P1 / (1 - P1)

  #Return marginal odds ratio
  return(Odds1 / Odds0)
}

boot_est <- boot(pair_ids, est_fun, R = 1000)
# R must be larger than # of data rows
boot_est
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = pair_ids, statistic = est_fun, R = 1000)
## 
## 
## Bootstrap Statistics :
##      original     bias    std. error
## t1* 0.8706933 0.04061083   0.2376491
boot.ci(boot_est, type = "bca")
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = boot_est, type = "bca")
## 
## Intervals : 
## Level       BCa          
## 95%   ( 0.5239,  1.3936 )  
## Calculations and Intervals on Original Scale

7.5 Estimate obtained

  • The example compared diabetic (a treated group; target) vs Not diabetic (untreated).
  • Thc corresponding treatment effect estimate is known as
    • Average Treatment Effects on the Treated (ATT)
  • Other estimates from PS analysis (e.g., PS weighting) are possible that compared the whole population
    • what if everyone treated vs. what if nobody was treated (ATE)

You can see further comparison of results elsewhere.

References

Austin, Peter C, and Dylan S Small. 2014. “The Use of Bootstrapping When Using Propensity-Score Matching Without Replacement: A Simulation Study.” Statistics in Medicine 33 (24): 4306–19.
Nguyen, Tri-Long, Gary S Collins, Jessica Spence, Jean-Pierre Daurès, PJ Devereaux, Paul Landais, and Yannick Le Manach. 2017. “Double-Adjustment in Propensity Score Matching Analysis: Choosing a Threshold for Considering Residual Imbalance.” BMC Medical Research Methodology 17 (1): 1–8.