19  DC-TMLE

Double Cross-Fit TMLE (DC-TMLE) (Mondol and Karim 2024; M. Karim and Mondol 2025) is an extension of TMLE designed to improve robustness and reduce overfitting when flexible, high-dimensional, or machine learning-based nuisance models are used. It splits the data into multiple folds, trains the nuisance models (the propensity score and outcome regressions) on one subset, and then carries out the targeted update and parameter estimation on another; in the double cross-fit variant, the two nuisance models are themselves fit on separate splits. This sample-splitting (cross-fitting) procedure ensures that the estimation step is not biased by the same data used to fit the nuisance models. The splitting and estimation are repeated across rotations of the folds, and the results are averaged to produce a final, stable estimate. DC-TMLE maintains double robustness, meaning it remains consistent if either the treatment or the outcome model is correctly specified, and it provides valid statistical inference even in high-dimensional settings where traditional TMLE may be unstable.
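
A minimal sketch of this cross-fitting logic, assuming a binary treatment a, a binary outcome y, and a numeric covariate matrix X; all function and variable names are illustrative, not the authors' implementation. For brevity, both nuisance models are fit on the same out-of-fold data and plain logistic regressions stand in for the flexible learners; the double cross-fit variant additionally places the two nuisance models on separate splits and averages over split rotations, as described above.

```python
# Hedged sketch of cross-fit TMLE for the risk difference E[Y(1)] - E[Y(0)].
import numpy as np
import statsmodels.api as sm
from scipy.special import expit, logit
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def crossfit_tmle_rd(y, a, X, n_splits=3, seed=1, clip=0.01):
    n = len(y)
    g1 = np.zeros(n)                   # out-of-fold P(A = 1 | X)
    q1, q0 = np.zeros(n), np.zeros(n)  # out-of-fold E[Y | A = a, X]
    for fit_idx, eval_idx in KFold(n_splits, shuffle=True,
                                   random_state=seed).split(X):
        # Nuisance models are trained on one part of the sample ...
        ps = LogisticRegression(max_iter=1000).fit(X[fit_idx], a[fit_idx])
        om = LogisticRegression(max_iter=1000).fit(
            np.column_stack([a[fit_idx], X[fit_idx]]), y[fit_idx])
        # ... and their predictions are used only on the held-out part.
        g1[eval_idx] = np.clip(ps.predict_proba(X[eval_idx])[:, 1],
                               clip, 1 - clip)
        one, zero = np.ones(len(eval_idx)), np.zeros(len(eval_idx))
        q1[eval_idx] = om.predict_proba(
            np.column_stack([one, X[eval_idx]]))[:, 1]
        q0[eval_idx] = om.predict_proba(
            np.column_stack([zero, X[eval_idx]]))[:, 1]
    # Targeting step: one-parameter logistic fluctuation along the clever
    # covariate H(A, X) = A/g1 - (1 - A)/(1 - g1), with logit(Q) as offset.
    qa = np.where(a == 1, q1, q0)
    h = a / g1 - (1 - a) / (1 - g1)
    eps = sm.GLM(y, h.reshape(-1, 1), family=sm.families.Binomial(),
                 offset=logit(qa)).fit().params[0]
    q1_star = expit(logit(q1) + eps / g1)        # targeted predictions
    q0_star = expit(logit(q0) - eps / (1 - g1))
    return np.mean(q1_star - q0_star)            # targeted risk difference
```

Repeating this over several random partitions (seeds) and averaging the estimates mirrors the repetition-and-averaging step described above.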

Figure 1. Double Cross-Fit TMLE (DC-TMLE)

19.1 Background

Residual confounding remains a persistent challenge in observational studies, particularly with high-dimensional data (M. E. Karim and Lei 2025). Recent work evaluates traditional and machine learning-based extensions of high-dimensional propensity score (hdPS) methods, including Super Learner (SL), TMLE, and Double Cross-Fit TMLE (DC-TMLE).
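
Since the comparison centers on hdPS-style proxy selection, the following hedged sketch shows the core prioritization step of the standard hdPS algorithm: ranking candidate binary proxies by the absolute log of the Bross bias multiplier. The names (a, y, C) are illustrative, and the full algorithm's recurrence-coding step (once/sporadic/frequent versions of each proxy) is omitted.

```python
# Hedged sketch of hdPS proxy prioritization via the Bross bias multiplier.
import numpy as np

def bross_rank(a, y, C, eps=1e-6):
    """Order the columns of proxy matrix C by |log Bross bias multiplier|."""
    pc1 = C[a == 1].mean(axis=0)              # proxy prevalence among exposed
    pc0 = C[a == 0].mean(axis=0)              # proxy prevalence among unexposed
    # Crude relative risk of the outcome given each proxy, with small floors.
    p_y_c1 = (C * y[:, None]).sum(0) / np.maximum(C.sum(0), 1)
    p_y_c0 = ((1 - C) * y[:, None]).sum(0) / np.maximum((1 - C).sum(0), 1)
    rr = np.maximum(p_y_c1, eps) / np.maximum(p_y_c0, eps)
    bias_mult = (pc1 * (rr - 1) + 1) / np.maximum(pc0 * (rr - 1) + 1, eps)
    return np.argsort(-np.abs(np.log(np.maximum(bias_mult, eps))))

# The top-ranked columns, e.g. bross_rank(a, y, C)[:100], would then be
# included as empirical covariates in the propensity score model.
```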

19.2 Simulation Design

Element                      Details
Data Source                  NHANES 2013–2018
Sample Size                  3,000 per iteration
Iterations                   500
Exposure/Outcome Prevalence  3 scenarios: (i) Frequent-Frequent, (ii) Rare-Frequent, (iii) Frequent-Rare
True Effect                  OR = 1 (null); RD = 0
Proxies                      142 medication variables: 94 outcome-associated proxies and 48 noise variables
Confounding Simulation       Proxy-derived comorbidity index and complex transformations used to mimic unmeasured confounding
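
A hedged sketch of what one such plasmode-style iteration could look like: covariate rows (including proxies) are resampled from the real NHANES analytic file, while treatment and outcome are simulated so the true effect is exactly null (OR = 1, RD = 0). The linear confounder score and all coefficients below are placeholders standing in for the study's proxy-derived comorbidity index and complex transformations.

```python
# Hedged sketch of one plasmode-style iteration under a null treatment effect.
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(2024)

def make_iteration(cov, beta, n=3000):
    """cov: real covariate matrix; beta: fixed confounding weights."""
    X = cov[rng.choice(len(cov), size=n, replace=True)]  # resample real rows
    u = X @ beta                                         # confounder score
    a = rng.binomial(1, expit(-0.5 + u))                 # simulated treatment
    # The outcome depends on u but not on a, so the true OR is 1 (RD = 0).
    y = rng.binomial(1, expit(-1.0 + u))
    return X, a, y

# 500 iterations, with the weights held fixed across iterations, e.g.:
# beta = rng.normal(0, 0.1, size=cov.shape[1])
# datasets = [make_iteration(cov, beta) for _ in range(500)]
```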

19.3 Methods Compared

Method Group                        Methods                                          Description
TMLE Methods with Proxies           TMLE.ks, hdPS.TMLE, LASSO.TMLE, hdPS.LASSO.TMLE  TMLE with various proxy selection strategies
                                    DC.TMLE                                          Double cross-fit TMLE
Super Learner Methods with Proxies  hdPS.SL, LASSO.SL, hdPS.LASSO.SL, SL.ks          Super Learner with proxy selection options
Standard Methods with Proxies       PS.ks, hdPS, LASSO, hdPS.LASSO                   Propensity score and outcome models with proxy inclusion
No Proxy Methods                    TMLE.u, SL.u, PS.u                               Only measured covariates, no proxies

Super Learner libraries included the following (a stacking analogue is sketched after this list):

  • 1-learner: Logistic regression
  • 3-learners: Logistic regression, LASSO, MARS
  • 4-learners: Above + XGBoost (non-Donsker)
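
As a rough Python analogue of these library specifications (the study itself used Super Learner), the sketch below assembles the corresponding ensembles with scikit-learn's StackingClassifier. Unlike Super Learner, StackingClassifier does not constrain the meta-learner to convex weights; MARS is omitted here for lack of a maintained scikit-learn implementation; and XGBClassifier requires the xgboost package.

```python
# Hedged stacking analogue of the 1-, 3-, and 4-learner libraries above.
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from xgboost import XGBClassifier

logit = ("logit", LogisticRegression(max_iter=1000))
lasso = ("lasso", LogisticRegressionCV(penalty="l1", solver="saga",
                                       max_iter=5000))
xgb = ("xgb", XGBClassifier(n_estimators=200, max_depth=3,
                            eval_metric="logloss"))

lib1 = [logit]              # 1-learner library
lib3 = [logit, lasso]       # 3-learner library (MARS omitted, see above)
lib4 = [logit, lasso, xgb]  # 4-learner library (adds non-Donsker XGBoost)

sl = StackingClassifier(estimators=lib4,
                        final_estimator=LogisticRegression(max_iter=1000),
                        cv=5, stack_method="predict_proba")
# sl.fit(X, a) would then yield a cross-validated ensemble propensity model.
```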

19.4 Simulation Results

Figure 2. Bias across Methods in NHANES Plasmode Simulation

Figure 3. Coverage across Methods in NHANES Plasmode Simulation

Results are fully accessible via a Shiny app:

👉 Interactive Causal Benchmark App

Explore bias, SEs, and coverage metrics across methods and simulation conditions.

19.5 Conclusion

  • Simpler models with structured proxy inclusion (hdPS, LASSO) remain competitive and stable.
  • TMLE is effective for bias reduction but can become unstable in high-dimensional settings when complex learner libraries are used.
  • SL performance is library-sensitive; 1- and 3-learner libraries performed best. Complex learners (e.g., XGBoost) should be used cautiously.