Continuous Outcomes
We will now go through an example of using TMLE for a continuous outcome. The SuperLearner setup in this case is similar to that for binary outcomes, so rather than going through the SuperLearner steps again, we will focus on the additional steps needed to run the tmle method on continuous outcomes.
Frank and Karim (2023) extensively discussed the implementation of TMLE for continuous outcomes, providing a detailed step-by-step guide using the openly accessible RHC dataset. In this tutorial, we will revisit the same example with additional explanations.
The outcome variable here is length of stay. Its values are slightly different from those in Table 2 of Connors et al. (1996) (means were 20.5 vs. 25.7, and medians were 16 vs. 17).
Constructing SuperLearner
Just as we did for a binary outcome, we will need to specify two SuperLearners, one for the exposure and one for the outcome model.
The effective sample size for a continuous outcome is just \(n_{eff}=n=5,735\). We calculated the effective sample size for the exposure model earlier, which also turned out to be \(n_{eff}=n=5,735\). So once again we will use 5 folds because \(5,000 \leq n_{eff} \leq 10,000\) (Phillips et al. 2023).
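The fold choice above can be sketched in R with a few small helpers. This is a minimal sketch under stated assumptions: the function names `n_eff_binary()`, `n_eff_continuous()` and `choose_folds()` are hypothetical, and both the binary rule \(n_{eff}=\min(n, 5\,n_{rare})\) (where \(n_{rare}\) is the number of observations in the rarer class) and the fold thresholds outside the 5,000 to 10,000 band are our reading of Phillips et al. (2023), not code from this tutorial.

```r
# Effective sample size and number of cross-validation folds (V).
# Thresholds reflect our reading of Phillips et al. (2023); verify them
# against the paper before relying on them. Function names are hypothetical.

# Binary variable: n_eff is capped at 5 times the size of the rarer class
n_eff_binary <- function(n, n_rare) {
  min(n, 5 * n_rare)
}

# Continuous variable: n_eff is simply the sample size
n_eff_continuous <- function(n) {
  n
}

# Map an effective sample size to a number of folds V
choose_folds <- function(n_eff) {
  if (n_eff <= 30) {
    n_eff            # leave-one-out cross-validation
  } else if (n_eff < 500) {
    20
  } else if (n_eff < 5000) {
    10
  } else if (n_eff <= 10000) {
    5                # the band that applies in this example
  } else {
    2
  }
}

choose_folds(n_eff_continuous(5735))  # 5
```

For the length-of-stay outcome, \(n_{eff}=5,735\) falls in the 5,000 to 10,000 band, so the sketch returns 5 folds, matching the choice above.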
Similarly to our example with the binary outcome, the key considerations for the library of learners are:
We have some continuous covariates, so we should include learners that allow non-linear/monotonic relationships.
We have a large \(n\), so we should include as many learners as is computationally feasible.
We have 49 covariates and 5,735 observations, so we do not have high-dimensional data and including screeners is optional.
Again, the requirements for the exposure and outcome models are the same, so we can use the same library for both models. Even though one model has a binary dependent variable and the other has a continuous one, most of the available learners adapt automatically to either type of dependent variable.
For this example, we will use the same SuperLearner library as for the binary outcome example.
Dealing with continuous outcomes
For this example, we will examine the outcome of length of stay in hospital.
The key difference between running TMLE on a continuous outcome and running it on a binary outcome is that we must transform the outcome to fall within the range 0 to 1, so that the modeled outcomes stay within the range of the outcome’s true distribution (Gruber and Laan 2010).
To transform the outcome, we can use min-max normalization:
\[ Y_{transformed} = \frac{Y-Y_{min}}{Y_{max}-Y_{min}} \]
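A minimal R sketch of this min-max transformation is shown below. The helper name `min_max_scale()` is hypothetical, and the sketch assumes the outcome has no missing values. Note that \(Y_{min}\) and \(Y_{max}\) of the original outcome must be kept, because they are needed later to rescale the estimate.

```r
# Min-max normalization of a continuous outcome onto [0, 1]
# (hypothetical helper; assumes y contains no missing values)
min_max_scale <- function(y) {
  y_min <- min(y)
  y_max <- max(y)
  (y - y_min) / (y_max - y_min)
}

# Toy values only (not the RHC data)
min_max_scale(c(2, 10, 18, 34))  # 0.00 0.25 0.50 1.00
```

Applied to this example, the call would look like `min_max_scale(ObsData$Length.of.Stay)`, assuming the `ObsData` data frame and `Length.of.Stay` column that appear in the regression code later in this section.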
Once we have transformed the outcome to fall within the range 0 to 1, we can run TMLE as before, using the tmle method in the tmle package:
Once the tmle method has run, we still have one step left to get our final estimate: we must transform the average treatment effect generated by the tmle method (\(\widehat{ATE}\)) back to the outcome’s original scale:
\[ \widehat{ATE}_{rescaled} = (Y_{max}-Y_{min})*\widehat{ATE} \]
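Only the range \(Y_{max}-Y_{min}\) appears in this back-transformation because \(Y_{min}\) cancels out. Writing \(Y^{*}\) for the transformed outcome, so that \(Y = Y_{min} + (Y_{max}-Y_{min})\,Y^{*}\), linearity of expectation gives:

\[ \begin{aligned} E[Y(1)] - E[Y(0)] &= \big(Y_{min} + (Y_{max}-Y_{min})\,E[Y^{*}(1)]\big) - \big(Y_{min} + (Y_{max}-Y_{min})\,E[Y^{*}(0)]\big) \\ &= (Y_{max}-Y_{min})\,\big(E[Y^{*}(1)] - E[Y^{*}(0)]\big) \end{aligned} \]

Because this map is linear with a positive slope, confidence limits \((L, U)\) on the transformed scale map to \(\big((Y_{max}-Y_{min})L,\ (Y_{max}-Y_{min})U\big)\) on the original scale.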
We also have to transform the confidence interval back to the original scale:
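Rescaling both the estimate and its interval can be sketched in R as below. The helper `rescale_ate()` is hypothetical. It assumes the transformed-scale estimate and 95% CI have already been extracted from the object returned by tmle() (for example from its `estimates$ATE$psi` and `estimates$ATE$CI` components), and that `y_min` and `y_max` are the minimum and maximum of the untransformed outcome.

```r
# Back-transform an ATE and its CI from the [0, 1] scale to the outcome's
# original scale (hypothetical helper). Min-max scaling is affine, so y_min
# cancels in the difference and only the range (y_max - y_min) multiplies.
rescale_ate <- function(ate, ci, y_min, y_max) {
  y_range <- y_max - y_min
  list(
    ate = y_range * ate,  # rescaled point estimate
    ci  = y_range * ci    # rescaled lower and upper CI limits
  )
}

# Toy numbers only (not results from the RHC data)
res <- rescale_ate(ate = 0.25, ci = c(0.125, 0.375), y_min = 2, y_max = 10)
res$ate  # 2
res$ci   # 1 3
```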
ATE for continuous outcome: 2.9396218, and 95 % CI is 1.959698, 3.9195455.
The results indicate that if all participants had received RHC, the average length of stay in hospital would be 2.94 (1.96, 3.92) days longer than if no participants had received RHC.
Understanding defaults
Transform outcome:
Run TMLE, using the tmle package’s default SuperLearner library:
Transform the average treatment effect generated by the tmle method (\(\widehat{ATE}\)) back to the outcome’s original scale:
\[ \widehat{ATE}_{rescaled} = (Y_{max}-Y_{min})*\widehat{ATE} \]
Transform the confidence interval back to the original scale:
ATE for continuous outcome using default library: 3.0362984, and 95% CI 1.2686301, 4.8039667.
The estimate using the default SuperLearner library (3.04) is similar to the estimate we got using our user-specified SuperLearner library (2.94). However, the confidence interval using the default SuperLearner library (1.27, 4.80) is much wider than that using our user-specified SuperLearner library (1.96, 3.92): about 3.54 days wide versus about 1.96 days.
Comparison of results
Adjusted regression:
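```r
library(Publish)  # provides publish() for formatted regression tables

# Regress length of stay on the exposure (primary interest) plus covariates;
# baselinevars and ObsData are assumed to be defined earlier in the tutorial
baselineVars.LoS <- c(baselinevars, "Death")
out.formula.cont <- as.formula(
  paste("Length.of.Stay ~ RHC.use +",
        paste(baselineVars.LoS, collapse = " + ")))
fit1.cont <- lm(out.formula.cont, data = ObsData)
# Show only row 2 of the formatted regression table
publish(fit1.cont, digits = 1)$regressionTable[2, ]
```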
Connors et al. (1996) conducted a propensity score matching analysis. Their Table 5 showed that, after 1-to-1 propensity score pair matching, the mean lengths of stay (\(Y\)) in the RHC and no-RHC groups (\(A\)) were not significantly different (\(p = 0.14\)).
| Method | Estimate (days) | 95% CI lower | 95% CI upper |
|---|---|---|---|
| Adjusted regression | 3.04 | 1.51 | 4.58 |
| TMLE (user-specified SL library) | 2.94 | 1.96 | 3.92 |
| TMLE (default SL library) | 3.04 | 1.27 | 4.80 |
| Keele and Small (2021) | 2.01 | 0.60 | 3.41 |
Differences among these results are likely due to the use of different SuperLearner libraries, different sets of adjustment variables, or the randomness of the fold assignment in the cross-validation used by the SuperLearner algorithm.