# Load the analytic version of the RHC dataset
ObsData <- readRDS(file =
  "Data/machinelearningCausal/rhcAnalyticTest.RDS")

# Baseline covariates: all variables except the exposure and the outcomes
baselinevars <- names(dplyr::select(ObsData,
                                    !c(RHC.use, Length.of.Stay, Death)))
head(ObsData)
Motivation
When using methods such as propensity score approaches, we make assumptions about the model specification; for example, we must specify any interaction terms ourselves.
With machine learning methods, these assumptions can be relaxed somewhat, as some machine learning methods allow automatic detection of data structures such as interactions.
However, machine learning was developed for prediction modeling, not with causal inference in mind. Statistical inference, such as calculating standard errors and confidence intervals, is not straightforward because the estimator produced by a machine learning method does not generally follow a known statistical distribution. By contrast, the estimators from a standard regression fit by maximum likelihood follow an approximately normal distribution, which makes it easy to calculate standard errors and confidence intervals.
Targeted maximum likelihood estimation (TMLE) is a causal inference method that can incorporate machine learning while still allowing straightforward statistical inference, thanks to theoretical development grounded in semi-parametric theory.
TMLE is a doubly robust method: it uses both an exposure model (i.e., a propensity score model) and an outcome model. As long as at least one of these models is correctly specified, TMLE gives a consistent estimator, meaning the estimate gets closer and closer to the true value as the sample size increases.
Since TMLE uses both the exposure and the outcome model, machine learning can be used in each of these intermediary modeling steps while allowing straightforward statistical inference.
It has been shown that TMLE outperforms singly robust methods that incorporate machine learning, such as inverse probability of treatment weighting (IPTW).
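To make the two modeling steps concrete, below is a minimal sketch of how a doubly robust fit can be expressed in R with the tmle and SuperLearner packages. This is an illustration under assumptions, not the full analysis developed later: it assumes the variable names loaded above (RHC.use as a 0/1 exposure, Death as a 0/1 outcome, baselinevars as covariates), a deliberately small SuperLearner library, and that the covariates are in a form SuperLearner can handle (factor variables may need to be converted to dummy variables first).

library(tmle)
library(SuperLearner)

W <- ObsData[, baselinevars]   # baseline covariates
A <- ObsData$RHC.use           # exposure: RHC use (0/1)
Y <- ObsData$Death             # outcome: death (0/1)

set.seed(123)
tmle.fit <- tmle(Y = Y, A = A, W = W,
                 family = "binomial",
                 # outcome (Q) and exposure (g) models can each be fit
                 # with machine learning via SuperLearner
                 Q.SL.library = c("SL.glm", "SL.mean"),
                 g.SL.library = c("SL.glm", "SL.mean"))

# Point estimate, 95% confidence interval, and p-value for the ATE
tmle.fit$estimates$ATE$psi
tmle.fit$estimates$ATE$CI
tmle.fit$estimates$ATE$pvalue

Because the final targeting step of TMLE is aimed at the parameter of interest, the standard error and confidence interval returned here are derived from semi-parametric theory (the influence curve), which is what allows machine learning in the intermediary models without giving up valid inference.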
Revisiting RHC Data
This tutorial uses the same data as some of the previous tutorials, including those working with a predictive question, machine learning with a continuous outcome, and machine learning with a binary outcome.
Table 1
Shown only for selected demographic and comorbidity variables; it matches Table 1 in Connors et al. (1996).
library(tableone)

# Baseline characteristics stratified by RHC use (selected variables only)
tab0 <- CreateTableOne(vars = c("age", "sex", "race",
                                "Disease.category", "Cancer"),
                       data = ObsData,
                       strata = "RHC.use",
                       test = FALSE)
print(tab0, showAllLevels = FALSE)
#> Stratified by RHC.use
#> 0 1
#> n 3551 2184
#> age (%)
#> [-Inf,50) 884 (24.9) 540 (24.7)
#> [50,60) 546 (15.4) 371 (17.0)
#> [60,70) 812 (22.9) 577 (26.4)
#> [70,80) 809 (22.8) 529 (24.2)
#> [80, Inf) 500 (14.1) 167 ( 7.6)
#> sex = Female (%) 1637 (46.1) 906 (41.5)
#> race (%)
#> white 2753 (77.5) 1707 (78.2)
#> black 585 (16.5) 335 (15.3)
#> other 213 ( 6.0) 142 ( 6.5)
#> Disease.category (%)
#> ARF 1581 (44.5) 909 (41.6)
#> CHF 247 ( 7.0) 209 ( 9.6)
#> Other 955 (26.9) 208 ( 9.5)
#> MOSF 768 (21.6) 858 (39.3)
#> Cancer (%)
#> None 2652 (74.7) 1727 (79.1)
#> Localized (Yes) 638 (18.0) 334 (15.3)
#> Metastatic 261 ( 7.4) 123 ( 5.6)