Chapter 9 PS vs. Doubly robust methods

Understanding sources of bias is important

Bias = (average of our estimates over repeated samples) - (true treatment effect). This overall bias can be decomposed into bias due to finite sample size, bias due to not designing the study properly, and bias due to model misspecification.

The last component is often the least talked about. Nonparametric or machine learning methods, if used properly, can help reduce bias from this source. A full treatment is a topic for a different tutorial, but here we show an example of the implications of model misspecification and of using methods that can guard against it.

9.1 Complex Data Simulation

We use a complex simulation example, so that we know the true parameter \(\theta\).

\(Y\) : Outcome (continuous)
\(A\) : Exposure (binary)
\(C\) (1-4): true confounders (continuous)
\(L\) (1-4): measured and transformed version of true confounders (continuous)

True Exposure Model

\[\begin{equation} \begin{aligned} \pi = P(A=1|C) &= f(C) \\ &= \frac{1}{1+\exp[-(\alpha_0 + \alpha_1 C_1 + \alpha_2 C_2 + \alpha_3 C_3 + \alpha_4 C_4)]}\nonumber \end{aligned} \end{equation}\]

True Outcome Model

\[\begin{equation} \begin{aligned} E(Y|A,C) &= g(A,C)\\ &= \beta_0 + \theta A + \beta_1 C_1 + \beta_2 C_2 + \beta_3 C_3 + \beta_4 C_4 \nonumber \end{aligned} \end{equation}\]

In our example, \(\theta\) = 6, which is our treatment effect.

Outcomes and exposures are complex functions of measured covariates

Using the example from Kang and Schafer (2007), Naimi, Mishler, and Kennedy (2017), and Balzer and Westling (2021), let us assume that we do not observe the \(C\) variables directly, and that we only have access to a set of transformed covariates, the \(L\)s.

\[\begin{equation} \begin{aligned} L_1 &= \exp(C_1/2) \nonumber\\ L_2 &= C_2/(1+\exp(C_1)) + 10\nonumber\\ L_3 &= (C_1 C_3/25 + 0.6)^3 \nonumber\\ L_4 &= (C_2 + C_4 + 20)^2\nonumber \end{aligned} \end{equation}\]

create.data <- function(n, te = 6){
  # original covariates that generate A and Y
  C1 <- rnorm(n,0,1)
  C2 <- rnorm(n,0,1)
  C3 <- rnorm(n,0,1)
  C4 <- rnorm(n,0,1)
  pscore <- plogis(-1 +log(1.75)*(C1+C2+C3+C4) )
  # treatment generation model
  A <- rbinom(n, 1, pscore)
  eps <- rnorm(n,0,6) # error term
  get.Y <- function(A, TE=te) {120+TE*A+3*(C1+C2+C3+C4)+eps}
  Y1 <- get.Y(A=1, TE=te)
  Y0 <- get.Y(A=0, TE=te)
  # potential outcomes
  PO <- data.frame(cbind(pscore, Y1,Y0, C1, C2, C3, C4))
  # outcome generation model
  Y <- get.Y(A=A, TE=te)
  # transformed versions of the covariates (what we actually observe)
  L1 <- exp(C1/2)
  L2 <- C2/(1+exp(C1)) + 10
  L3 <- (C1*C3/25 + 0.6)^3
  L4 <- (C2 + C4 + 20)^2
  obs <- data.frame(cbind(L1,L2,L3,L4,A,Y))
  return(list(observed=obs, 
              Potential.Outcome= PO, 
              ate = mean(PO$Y1)-mean(PO$Y0)))
}

create.data(6) # a small demo dataset of 6 observations
## $observed
##          L1       L2        L3       L4 A        Y
## 1 2.2374436 10.06754 0.1625330 419.8183 1 120.3914
## 2 1.7679300 10.02612 0.1537718 441.0633 1 113.7917
## 3 1.8338882 10.06432 0.2636953 489.8636 0 133.9537
## 4 1.3741692 10.04336 0.1926407 389.0893 0 124.1439
## 5 0.9565624 11.25774 0.2182527 556.0792 1 138.0037
## 6 0.8111285 10.77576 0.2057078 512.2704 0 127.6085
## 
## $Potential.Outcome
##      pscore       Y1       Y0         C1        C2         C3          C4
## 1 0.4265123 120.3914 114.3914  1.6106679 0.4056800 -0.8423000  0.08378755
## 2 0.3564975 113.7917 107.7917  1.1396187 0.1077560 -1.4095507  0.89375158
## 3 0.7938446 139.9537 133.9537  1.2128768 0.2806335  0.8504563  1.85222880
## 4 0.2154750 130.1439 124.1439  0.6356986 0.1252420 -0.8832483 -0.39989653
## 5 0.6518198 138.0037 132.0037 -0.0888186 2.4085868 -0.5850746  1.17274430
## 6 0.6371594 133.6085 127.6085 -0.4186576 1.2861567  0.5783549  1.34723419
## 
## $ate
## [1] 6

9.2 Understanding finite sample bias

First try:

set.seed(1)
result.data <- create.data(100)
round(coef(glm(Y ~ A + L1 + L2 + L3 + L4, family="gaussian", 
               data=result.data$observed)),2)
## (Intercept)           A          L1          L2          L3          L4 
##       91.23        8.46        3.50        0.81      -19.83        0.05

Second try:

set.seed(22)
result.data <- create.data(100)
round(coef(glm(Y ~ A + L1 + L2 + L3 + L4, family="gaussian", 
               data=result.data$observed)),2)
## (Intercept)           A          L1          L2          L3          L4 
##       70.05       10.20        4.48        2.29        9.53        0.05

Third try:

set.seed(33)
result.data <- create.data(100)
round(coef(glm(Y ~ A + L1 + L2 + L3 + L4, family="gaussian", 
               data=result.data$observed)),2)
## (Intercept)           A          L1          L2          L3          L4 
##       97.51        7.13        2.32       -0.40       12.16        0.05
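Each of these single tries at \(n = 100\) fluctuates around the truth. To see the finite sample behavior more directly, we can repeat the sampling many times and average the estimates. Below is a minimal sketch of such a check (the 500 repetitions and the seed are arbitrary choices; it assumes the create.data() function defined above):

set.seed(111)
est <- replicate(500, {
  d <- create.data(100)$observed
  # misspecified outcome regression on the transformed covariates
  coef(glm(Y ~ A + L1 + L2 + L3 + L4, family = "gaussian", data = d))["A"]
})
mean(est)     # average estimate over repeated samples
mean(est) - 6 # approximate bias relative to the true effect of 6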

9.3 Estimation using different methods

We now work with a larger dataset: \(n = 10,000\)

set.seed(123)
result.data <- create.data(10000)

9.3.1 Regression

round(coef(glm(Y ~ A + L1 + L2 + L3 + L4, family="gaussian", 
               data=result.data$observed)),2)
## (Intercept)           A          L1          L2          L3          L4 
##       88.23        7.70        4.40       -0.12       -3.13        0.07
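This estimate is off because the linear model is misspecified when fit on the transformed covariates. As a contrast, a regression on the untransformed confounders \(C_1\)-\(C_4\) (available here only because this is a simulation, via the Potential.Outcome component) is correctly specified and should return an estimate close to 6. This is a hypothetical check, not part of the analysis workflow:

# correctly specified model using the true confounders (simulation-only check)
true.cov <- data.frame(result.data$Potential.Outcome[c("C1","C2","C3","C4")],
                       A = result.data$observed$A,
                       Y = result.data$observed$Y)
round(coef(glm(Y ~ A + C1 + C2 + C3 + C4, family = "gaussian", data = true.cov)), 2)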

9.3.2 Propensity score

Propensity score model fitting:

require(MatchIt)
match.obj <- matchit(A ~ L1 + L2 + L3 + L4, method = "nearest", 
                     data = result.data$observed,
                     distance = 'logit', 
                     caliper = 0.2,
                     replace = FALSE, 
                     ratio = 1)
match.obj
## A matchit object
##  - method: 1:1 nearest neighbor matching without replacement
##  - distance: Propensity score [caliper]
##              - estimated with logistic regression
##  - caliper: <distance> (0.034)
##  - number of obs.: 10000 (original), 5790 (matched)
##  - target estimand: ATT
##  - covariates: L1, L2, L3, L4
matched.data <- match.data(match.obj)
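Before the outcome analysis, it is good practice to check covariate balance and propensity score overlap in the matched sample. A minimal sketch using MatchIt's own summary and plot methods (output not shown):

# standardized mean differences before and after matching
summary(match.obj, standardize = TRUE)
# propensity score overlap in the matched sample
plot(match.obj, type = "jitter")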

Results from step 4 (outcome analysis in the matched data): crude

round(coef(glm(Y ~ A, 
               family="gaussian", 
               data=matched.data)),2)
## (Intercept)           A 
##      121.51        7.93

Results from step 4: double adjustment

round(coef(glm(Y ~ A + L1 + L2 + L3 + L4, 
               family="gaussian", 
               data=matched.data)),2)
## (Intercept)           A          L1          L2          L3          L4 
##       89.56        7.66        4.11       -0.36        8.38        0.07

9.3.3 Double machine learning method

We use double machine learning (DML) with only one candidate learner:

res <- result.data$observed
require(DoubleML)
dd <- DoubleMLData$new(res, 
                       y_col = "Y", 
                       d_cols = "A", 
                       x_cols = c("L1","L2","L3","L4"))
library(mlr3)
library(mlr3learners)
lgr::get_logger("mlr3")$set_threshold("warn")

learner = lrn("regr.ranger", 
              num.trees=500, 
              mtry=floor(sqrt(5)), 
              max.depth=5, 
              min.node.size=2)
ml_g = learner$clone() # outcome modelling
ml_m = learner$clone() # exposure modelling

set.seed(1234)
dml.res <- DoubleMLPLR$new(dd, ml_g=ml_g, ml_m=ml_m)
dml.res$fit()
print(dml.res)
## ================= DoubleMLPLR Object ==================
## 
## 
## ------------------ Data summary      ------------------
## Outcome variable: Y
## Treatment variable(s): A
## Covariates: L1, L2, L3, L4
## Instrument(s): 
## No. Observations: 10000
## 
## ------------------ Score & algorithm ------------------
## Score function: partialling out
## DML algorithm: dml2
## 
## ------------------ Machine learner   ------------------
## ml_g: regr.ranger
## ml_m: regr.ranger
## 
## ------------------ Resampling        ------------------
## No. folds: 5
## No. repeated sample splits: 1
## Apply cross-fitting: TRUE
## 
## ------------------ Fit summary       ------------------
##  Estimates and significance testing of the effect of target variables
##   Estimate. Std. Error t value Pr(>|t|)    
## A    7.0114     0.1532   45.76   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
dml.res$summary()
## Estimates and significance testing of the effect of target variables
##   Estimate. Std. Error t value Pr(>|t|)    
## A    7.0114     0.1532   45.76   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
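If you need the estimate and a confidence interval as R objects rather than printed output, the fitted object exposes them directly; a short sketch, assuming the coef and se fields and the confint() method of the DoubleML object (output not shown):

dml.res$coef      # point estimate for A
dml.res$se        # standard error
dml.res$confint() # 95% confidence interval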

9.3.4 Augmented Inverse probability weighting

# candidate learners for the super learner
SL.lib=c('SL.glm', 'SL.step.interaction', 'SL.earth', 'SL.mean')
require(AIPW)
require(ggplot2)
res <- result.data$observed
aipw_sl <- AIPW$new(Y=res$Y, A=res$A, 
                    W=res[c("L1","L2","L3","L4")],
                    Q.SL.library=SL.lib,g.SL.library=SL.lib,
                    k_split=1,verbose=TRUE)
aipw_sl$fit()
aipw_sl$summary(g.bound = 0.025)
print(aipw_sl$result, digits = 2)
##                  Estimate   SE 95% LCL 95% UCL     N
## Risk of exposure    126.3 0.18   126.0   126.7  3055
## Risk of control     119.9 0.10   119.7   120.1  6945
## Risk Difference       6.4 0.19     6.1     6.8 10000
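Since ggplot2 is already loaded, the AIPW object can also be used for quick propensity score diagnostics; a sketch assuming the plot.p_score() and plot.ip_weights() methods of the AIPW package (output not shown):

# distribution of estimated propensity scores by exposure group
aipw_sl$plot.p_score()
# distribution of the resulting inverse probability weights
aipw_sl$plot.ip_weights()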

9.3.5 Doubly robust method (TMLE)

TMLE with superlearner

require(ltmle)
tmle.res <- ltmle(data=result.data$observed, Anodes='A', Ynodes='Y', abar=list(1,0), 
                SL.library=c('SL.glm', 'SL.step.interaction', 'SL.earth', 'SL.mean'), 
                estimate.time=F,
                SL.cvControl=list(V=10),
                gbounds=c(0.025, 0.975))
summary(tmle.res)
## Estimator:  tmle 
## Call:
## ltmle(data = result.data$observed, Anodes = "A", Ynodes = "Y", 
##     abar = list(1, 0), gbounds = c(0.025, 0.975), SL.library = c("SL.glm", 
##         "SL.step.interaction", "SL.earth", "SL.mean"), SL.cvControl = list(V = 10), 
##     estimate.time = F)
## 
## Treatment Estimate:
##    Parameter Estimate:  126.3 
##     Estimated Std Err:  0.15795 
##               p-value:  <2e-16 
##     95% Conf Interval: (125.99, 126.61) 
## 
## Control Estimate:
##    Parameter Estimate:  119.92 
##     Estimated Std Err:  0.099995 
##               p-value:  <2e-16 
##     95% Conf Interval: (119.73, 120.12) 
## 
## Additive Treatment Effect:
##    Parameter Estimate:  6.382 
##     Estimated Std Err:  0.16961 
##               p-value:  <2e-16 
##     95% Conf Interval: (6.0496, 6.7144)
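To pull the additive treatment effect and its confidence interval out of the fitted object programmatically, you can index into the summary list; a sketch assuming the effect.measures$ATE structure returned by summary() for ltmle fits with abar=list(1,0) (output not shown):

out <- summary(tmle.res)
out$effect.measures$ATE$estimate # point estimate of the ATE
out$effect.measures$ATE$CI       # 95% confidence interval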

You can also try more candidate learners within the super learner:

# Time consuming
# tmle.res1 <- ltmle(data=result.data$observed, Anodes='A', Ynodes='Y', abar=list(1,0), 
#                 SL.library=c('SL.glm', 'SL.step.interaction', 
#                              'SL.gam', 'SL.glmnet', 'SL.cforest',
#                              "SL.ranger", "SL.xgboost",
#                              "SL.rpart", "SL.rpartPrune",    
#                              'SL.earth', 'SL.mean'), 
#                 estimate.time=F,
#                 SL.cvControl=list(V=10),
#                 gbounds=c(0.025, 0.975))
# summary(tmle.res1)

References

Balzer, Laura B, and Ted Westling. 2021. “Demystifying Statistical Inference When Using Machine Learning in Causal Research.” American Journal of Epidemiology.
Kang, Joseph DY, and Joseph L Schafer. 2007. “Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data.” Statistical Science 22 (4): 523–39.
Naimi, Ashley I, Alan E Mishler, and Edward H Kennedy. 2017. “Challenges in Obtaining Valid Causal Effect Estimates with Machine Learning Algorithms.” arXiv Preprint arXiv:1711.07137.