19  DC-TMLE

Double Cross-Fit TMLE (DC-TMLE) (Mondol and Karim 2024; M. Karim and Mondol 2025) is an extension of TMLE designed to improve robustness and reduce overfitting when flexible, high-dimensional, or machine learning-based nuisance models are used. It splits the data into multiple folds, trains the nuisance models (the propensity score and outcome regressions) on one subset, and then carries out the targeted update and parameter estimation on another; in the double cross-fit variant, the two nuisance models are themselves fit on separate splits. This sample-splitting (cross-fitting) procedure ensures that the estimation step is not biased by the same data used to fit the nuisance models. The splitting and estimation are repeated across rotations of the folds, and the results are averaged to produce a final, stable estimate. DC-TMLE maintains double robustness, meaning it remains consistent if either the treatment or the outcome model is correctly specified, and it provides valid statistical inference even in high-dimensional settings where traditional TMLE may be unstable.
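
A minimal sketch of this cross-fitting logic, assuming a binary treatment a, a binary outcome y, and a numeric covariate matrix X; all function and variable names are illustrative, not the authors' implementation. For brevity, both nuisance models are fit on the same out-of-fold data and plain logistic regressions stand in for the flexible learners; the double cross-fit variant additionally places the two nuisance models on separate splits and averages over split rotations, as described above.

```python
# Hedged sketch of cross-fit TMLE for the risk difference E[Y(1)] - E[Y(0)].
import numpy as np
import statsmodels.api as sm
from scipy.special import expit, logit
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def crossfit_tmle_rd(y, a, X, n_splits=3, seed=1, clip=0.01):
    n = len(y)
    g1 = np.zeros(n)                   # out-of-fold P(A = 1 | X)
    q1, q0 = np.zeros(n), np.zeros(n)  # out-of-fold E[Y | A = a, X]
    for fit_idx, eval_idx in KFold(n_splits, shuffle=True,
                                   random_state=seed).split(X):
        # Nuisance models are trained on one part of the sample ...
        ps = LogisticRegression(max_iter=1000).fit(X[fit_idx], a[fit_idx])
        om = LogisticRegression(max_iter=1000).fit(
            np.column_stack([a[fit_idx], X[fit_idx]]), y[fit_idx])
        # ... and their predictions are used only on the held-out part.
        g1[eval_idx] = np.clip(ps.predict_proba(X[eval_idx])[:, 1],
                               clip, 1 - clip)
        one, zero = np.ones(len(eval_idx)), np.zeros(len(eval_idx))
        q1[eval_idx] = om.predict_proba(
            np.column_stack([one, X[eval_idx]]))[:, 1]
        q0[eval_idx] = om.predict_proba(
            np.column_stack([zero, X[eval_idx]]))[:, 1]
    # Targeting step: one-parameter logistic fluctuation along the clever
    # covariate H(A, X) = A/g1 - (1 - A)/(1 - g1), with logit(Q) as offset.
    qa = np.where(a == 1, q1, q0)
    h = a / g1 - (1 - a) / (1 - g1)
    eps = sm.GLM(y, h.reshape(-1, 1), family=sm.families.Binomial(),
                 offset=logit(qa)).fit().params[0]
    q1_star = expit(logit(q1) + eps / g1)        # targeted predictions
    q0_star = expit(logit(q0) - eps / (1 - g1))
    return np.mean(q1_star - q0_star)            # targeted risk difference
```

Repeating this over several random partitions (seeds) and averaging the estimates mirrors the repetition-and-averaging step described above.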

Figure 1. Double Cross-Fit TMLE (DC-TMLE)

19.1 Background

Residual confounding remains a persistent challenge in observational studies, particularly with high-dimensional data (M. E. Karim and Lei 2025). Recent work evaluates traditional and machine learning-based extensions of high-dimensional propensity score (hdPS) methods, including Super Learner (SL), TMLE, and Double Cross-Fit TMLE (DC-TMLE).
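
Since the comparison centers on hdPS-style proxy selection, the following hedged sketch shows the core prioritization step of the standard hdPS algorithm: ranking candidate binary proxies by the absolute log of the Bross bias multiplier. The names (a, y, C) are illustrative, and the full algorithm's recurrence-coding step (once/sporadic/frequent versions of each proxy) is omitted.

```python
# Hedged sketch of hdPS proxy prioritization via the Bross bias multiplier.
import numpy as np

def bross_rank(a, y, C, eps=1e-6):
    """Order the columns of proxy matrix C by |log Bross bias multiplier|."""
    pc1 = C[a == 1].mean(axis=0)              # proxy prevalence among exposed
    pc0 = C[a == 0].mean(axis=0)              # proxy prevalence among unexposed
    # Crude relative risk of the outcome given each proxy, with small floors.
    p_y_c1 = (C * y[:, None]).sum(0) / np.maximum(C.sum(0), 1)
    p_y_c0 = ((1 - C) * y[:, None]).sum(0) / np.maximum((1 - C).sum(0), 1)
    rr = np.maximum(p_y_c1, eps) / np.maximum(p_y_c0, eps)
    bias_mult = (pc1 * (rr - 1) + 1) / np.maximum(pc0 * (rr - 1) + 1, eps)
    return np.argsort(-np.abs(np.log(np.maximum(bias_mult, eps))))

# The top-ranked columns, e.g. bross_rank(a, y, C)[:100], would then be
# included as empirical covariates in the propensity score model.
```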

19.2 Simulation Design

Element                      Details
Data Source                  NHANES 2013–2018
Sample Size                  3,000 per iteration
Iterations                   500
Exposure/Outcome Prevalence  3 scenarios: (i) Frequent-Frequent, (ii) Rare-Frequent, (iii) Frequent-Rare
True Effect                  OR = 1 (null); RD = 0
Proxies                      142 medication variables: 94 outcome-associated proxies and 48 noise variables
Confounding Simulation       Proxy-derived comorbidity index and complex transformations used to mimic unmeasured confounding
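
A hedged sketch of what one such plasmode-style iteration could look like: covariate rows (including proxies) are resampled from the real NHANES analytic file, while treatment and outcome are simulated so the true effect is exactly null (OR = 1, RD = 0). The linear confounder score and all coefficients below are placeholders standing in for the study's proxy-derived comorbidity index and complex transformations.

```python
# Hedged sketch of one plasmode-style iteration under a null treatment effect.
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(2024)

def make_iteration(cov, beta, n=3000):
    """cov: real covariate matrix; beta: fixed confounding weights."""
    X = cov[rng.choice(len(cov), size=n, replace=True)]  # resample real rows
    u = X @ beta                                         # confounder score
    a = rng.binomial(1, expit(-0.5 + u))                 # simulated treatment
    # The outcome depends on u but not on a, so the true OR is 1 (RD = 0).
    y = rng.binomial(1, expit(-1.0 + u))
    return X, a, y

# 500 iterations, with the weights held fixed across iterations, e.g.:
# beta = rng.normal(0, 0.1, size=cov.shape[1])
# datasets = [make_iteration(cov, beta) for _ in range(500)]
```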

19.3 Methods Compared

Method Group                        Methods                                          Description
TMLE Methods with Proxies           TMLE.ks, hdPS.TMLE, LASSO.TMLE, hdPS.LASSO.TMLE  TMLE with various proxy selection strategies
                                    DC.TMLE                                          Double cross-fit TMLE
Super Learner Methods with Proxies  hdPS.SL, LASSO.SL, hdPS.LASSO.SL, SL.ks          Super Learner with proxy selection options
Standard Methods with Proxies       PS.ks, hdPS, LASSO, hdPS.LASSO                   Propensity score and outcome models with proxy inclusion
No Proxy Methods                    TMLE.u, SL.u, PS.u                               Only measured covariates, no proxies

Super Learner libraries included the following (a stacking analogue is sketched after this list):

  • 1-learner: Logistic regression
  • 3-learners: Logistic regression, LASSO, MARS
  • 4-learners: Above + XGBoost (non-Donsker)
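
As a rough Python analogue of these library specifications (the study itself used Super Learner), the sketch below assembles the corresponding ensembles with scikit-learn's StackingClassifier. Unlike Super Learner, StackingClassifier does not constrain the meta-learner to convex weights; MARS is omitted here for lack of a maintained scikit-learn implementation; and XGBClassifier requires the xgboost package.

```python
# Hedged stacking analogue of the 1-, 3-, and 4-learner libraries above.
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from xgboost import XGBClassifier

logit = ("logit", LogisticRegression(max_iter=1000))
lasso = ("lasso", LogisticRegressionCV(penalty="l1", solver="saga",
                                       max_iter=5000))
xgb = ("xgb", XGBClassifier(n_estimators=200, max_depth=3,
                            eval_metric="logloss"))

lib1 = [logit]              # 1-learner library
lib3 = [logit, lasso]       # 3-learner library (MARS omitted, see above)
lib4 = [logit, lasso, xgb]  # 4-learner library (adds non-Donsker XGBoost)

sl = StackingClassifier(estimators=lib4,
                        final_estimator=LogisticRegression(max_iter=1000),
                        cv=5, stack_method="predict_proba")
# sl.fit(X, a) would then yield a cross-validated ensemble propensity model.
```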

19.4 Simulation Results

Figure 2. Bias across Methods in NHANES Plasmode Simulation

Figure 3. Coverage across Methods in NHANES Plasmode Simulation

Results are fully accessible via a Shiny app:

👉 Interactive Causal Benchmark App

Explore bias, SEs, and coverage metrics across methods and simulation conditions.

19.5 Conclusion

  • Simpler models with structured proxy inclusion (hdPS, LASSO) remain competitive and stable.
  • TMLE is effective for bias reduction but can become unstable in high-dimensional settings when complex learner libraries are used.
  • SL performance is library-sensitive; 1- and 3-learner libraries performed best. Complex learners (e.g., XGBoost) should be used cautiously.