Binary Outcomes
For this example we will be looking at the binary outcome variable death.
TMLE
TMLE works by first constructing an initial outcome model and extracting a crude estimate of the treatment effect. TMLE then refines this initial estimate in the direction of the true value of the parameter of interest through use of the exposure model.
To label the treatment effect given by TMLE as causal, the same conditional exchangeability, positivity, and consistency assumptions must be met as for other modeling strategies (see the introduction to propensity scores). TMLE also assumes that at least one of the exposure and outcome models is correctly specified. If neither model is correctly specified, TMLE does not necessarily produce a consistent estimator.
Luque-Fernandez et al. (2018) discussed the implementation of TMLE and provided a detailed step-by-step guide, primarily focusing on a binary outcome.
The basic steps are:
- Construct initial outcome model & get crude estimate
- Construct exposure model and use propensity scores to update the initial outcome model through a targeted adjustment
- Extract treatment effect estimate
- Estimate confidence interval based on a closed-form formula
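To make these steps concrete, below is a minimal by-hand sketch of the TMLE procedure for a binary outcome, following the step-by-step logic described in Luque-Fernandez et al. (2018). It uses simulated data and plain glm() fits rather than SuperLearner, estimates the ATE on the risk-difference scale, and all object names are illustrative:

set.seed(123)
n <- 1000
W1 <- rnorm(n)                                   # continuous covariate
W2 <- rbinom(n, 1, 0.5)                          # binary covariate
A  <- rbinom(n, 1, plogis(0.3 * W1 + 0.4 * W2))  # exposure
Y  <- rbinom(n, 1, plogis(-1 + A + 0.5 * W1 - 0.3 * W2))  # binary outcome
dat <- data.frame(W1, W2, A, Y)

# Step 1: initial outcome model; predicted risks under observed, treated, untreated
Q.fit <- glm(Y ~ A + W1 + W2, family = binomial, data = dat)
QAW <- predict(Q.fit, type = "response")
Q1W <- predict(Q.fit, newdata = transform(dat, A = 1), type = "response")
Q0W <- predict(Q.fit, newdata = transform(dat, A = 0), type = "response")

# Step 2: exposure model (propensity scores) and the "clever covariate"
g.fit <- glm(A ~ W1 + W2, family = binomial, data = dat)
gW <- predict(g.fit, type = "response")
H  <- dat$A / gW - (1 - dat$A) / (1 - gW)

# Targeted update: fluctuate the initial predictions on the logit scale
eps <- coef(glm(Y ~ -1 + H + offset(qlogis(QAW)), family = binomial, data = dat))
Q1W.star <- plogis(qlogis(Q1W) + eps / gW)        # H = 1 / gW when A = 1
Q0W.star <- plogis(qlogis(Q0W) - eps / (1 - gW))  # H = -1 / (1 - gW) when A = 0
QAW.star <- plogis(qlogis(QAW) + eps * H)

# Step 3: extract the treatment effect estimate
psi <- mean(Q1W.star) - mean(Q0W.star)

# Step 4: closed-form 95% CI based on the efficient influence curve
IC <- H * (dat$Y - QAW.star) + Q1W.star - Q0W.star - psi
se <- sqrt(var(IC) / n)
c(psi - 1.96 * se, psi + 1.96 * se)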
The tmle package implements TMLE for both binary and continuous outcomes, and uses SuperLearner to construct the exposure and outcome models.
The tmle method takes a number of parameters, including:
Term | Description |
---|---|
Y | Outcome vector |
A | Exposure vector |
W | Matrix or data frame containing all covariates |
family | Outcome distribution family ("binomial" for a binary outcome) |
V.Q, V.g | Number of cross-validation folds for the outcome and exposure models, respectively |
Q.SL.library | Set of machine learning methods to use for SuperLearner for outcome modeling |
g.SL.library | Set of machine learning methods to use for SuperLearner for exposure modeling |
Constructing SuperLearner
We will need to specify two SuperLearners: one for the exposure model and one for the outcome model. We will need to consider the characteristics of our sample in order to decide on the number of cross-validation folds and to construct a diverse and computationally feasible library of algorithms.
Number of folds
First, we need to define the number of cross-validation folds to use for each model. This depends on our effective sample size (Phillips et al. 2023).
For a binary outcome, Phillips et al. (2023) define the effective sample size as \(n_{eff} = \min(n, 5 \times n_{rare})\), where \(n_{rare}\) is the number of observations in the rarer of the two outcome classes; the same definition applies to the binary exposure model. Since neither the outcome (death) nor the exposure (RHC use) is rare here, the effective sample size for both models is the same as our sample size, \(n = 5,735\).
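As a quick sketch, this check can be computed directly in R, assuming ObsData contains the Death and RHC.use columns used in the tmle calls below:

n <- nrow(ObsData)
n.rare.Y <- min(table(ObsData$Death))    # size of the rarer outcome class
n.rare.A <- min(table(ObsData$RHC.use))  # size of the rarer exposure class
n.eff.Y <- min(n, 5 * n.rare.Y)          # effective n for the outcome model
n.eff.A <- min(n, 5 * n.rare.A)          # effective n for the exposure model
c(n.eff.Y, n.eff.A)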
Since \(5,000 \leq n_{eff} \leq 10,000\), we should use 5 or more cross-validation folds according to Phillips et al. (2023). For the sake of computational feasibility, we will use 5 folds in this example.
Candidate learners
The second step is to define the library of learners we will feed into SuperLearner as potential options for each model (exposure and outcome). In this example, some of our covariates are continuous variables, such as temperature and blood pressure, so we need to include learners that allow non-linear/monotonic relationships.
Since \(n\) is large (\(>5000\)), we should include as many learners as is computationally feasible in our libraries.
Furthermore, we have 50 covariates. Phillips et al. (2023) consider data high-dimensional when the number of covariates exceeds \(n/20\); here \(5735/20 = 286.75\) and \(50 < 286.75\), so we do not have high-dimensional data and including screeners is optional.
Since the requirements for the exposure and outcome models are the same in this example, we will use the same SuperLearner library for both. Overall, for this example we need to make sure to include:
- Parametric learners
- Highly data-adaptive learners
- Multiple variants of the same learner with different parameter specifications
- Learners that allow non-linear/monotonic relationships
For this example, we will include the following learners:
- Parametric
  - SL.mean: mean only
  - SL.glm: generalized linear model
- Highly data-adaptive
  - SL.glmnet: penalized regression such as lasso
  - SL.xgboost: extreme gradient boosting
- Allowing non-linear/monotonic relationships
  - SL.randomForest: random forest
  - tmle.SL.dbarts2: Bayesian additive regression trees
  - SL.svm: support vector machine
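The tmle calls below refer to this library as SL.library.test. As a sketch, it can be defined as a character vector of SuperLearner wrapper names (tmle.SL.dbarts2 is a wrapper supplied by the tmle package):

SL.library.test <- c("SL.mean", "SL.glm",
                     "SL.glmnet", "SL.xgboost",
                     "SL.randomForest", "tmle.SL.dbarts2", "SL.svm")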
TMLE with SuperLearner
To run TMLE, we need to install the tmle package and load it into the R environment:
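# install.packages("tmle")  # install from CRAN if not already installed
library(tmle)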
We also need to create a data frame containing only the covariates:
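# Keep every column except the outcome (Death) and the exposure (RHC.use);
# this assumes they are the only non-covariate columns in ObsData
ObsData.noYA <- ObsData[, !(names(ObsData) %in% c("Death", "RHC.use"))]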
Then we can run TMLE using the tmle method from the tmle package.
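A sketch of this call, mirroring the default-library code shown below but passing our user-specified library to the Q.SL.library and g.SL.library arguments (the object names tmle.fit, tmle.est.bin, and tmle.ci.bin are illustrative):

set.seed(1444)
tmle.fit <- tmle::tmle(Y = ObsData$Death,
                       A = ObsData$RHC.use,
                       W = ObsData.noYA,
                       family = "binomial",
                       V.Q = 5,
                       V.g = 5,
                       Q.SL.library = SL.library.test,
                       g.SL.library = SL.library.test)
tmle.est.bin <- tmle.fit$estimates$OR$psi  # marginal odds ratio estimate
tmle.ci.bin <- tmle.fit$estimates$OR$CI    # 95% confidence interval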
ATE for binary outcome using user-specified library: 1.29 and 95% CI is 1.2011697, 1.3878896
These results show that those who received RHC had odds of death 1.29 times as high as those who did not receive RHC.
Understanding defaults
We can compare the results using our specified SuperLearner library to the results we would get when using the tmle package's default SuperLearner libraries. To do this, we simply do not specify the Q.SL.library and g.SL.library arguments:
set.seed(1444)
tmle.fit.def <- tmle::tmle(Y = ObsData$Death,
A = ObsData$RHC.use,
W = ObsData.noYA,
family = "binomial",
V.Q = 5,
V.g = 5)
# Q.SL.library = SL.library.test, ## removed this line
# g.SL.library = SL.library.test) ## removed this line
tmle.est.bin.def <- tmle.fit.def$estimates$OR$psi
tmle.ci.bin.def <- tmle.fit.def$estimates$OR$CI
ATE for binary outcome using default library: 1.32 with 95% CI 1.1808107, 1.4657624.
The ATE when using the default SuperLearner library (1.32) is very close to the ATE when using our user-specified SuperLearner library (1.29). However, the confidence interval from TMLE using the default SuperLearner library (1.18, 1.47) is slightly wider than the confidence interval from TMLE using our user-specified SuperLearner library (1.20, 1.39).
Comparison of results
We can also compare these results to those from a basic regression and from the literature.
Connors et al. (1996) conducted a propensity score matching analysis. Table 4 showed that, after propensity score pair (1-to-1) matching, the odds of in-hospital mortality were 39% higher in those who received RHC (OR: 1.39 (1.15, 1.67)).
Method | Estimate | 2.5 % | 97.5 % |
---|---|---|---|
Adjusted regression (log-OR scale) | 0.36 | 0.22 | 0.51 |
TMLE (user-specified SL library) | 1.29 | 1.20 | 1.39 |
TMLE (default SL library) | 1.32 | 1.18 | 1.47 |
Connors et al. (1996) paper | 1.39 | 1.15 | 1.67 |

Note that the adjusted regression estimate is reported on the log-odds scale (\(\exp(0.36) \approx 1.43\) as an odds ratio), while the TMLE and Connors et al. (1996) rows are odds ratios.