flowchart LR
  S(Super Learner) --> l(Logistic regression)
  S --> g(LASSO)
  S --> m(Multivariate Adaptive Regression Splines MARS)
  style S fill:#90EE90;
We show an example of a super learner built from 3 candidate learners: logistic regression, LASSO, and multivariate adaptive regression splines (MARS).
For more background on the super learner, see the other tutorials.
The super learner approach is fundamentally different from the pure ML or LASSO approach discussed earlier. Here, all of the candidate learners are fitted with the exposure as the dependent variable, so the ensemble estimates the propensity score rather than predicting the outcome.
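As a rough sketch of how the three candidate learners are combined, the ensemble propensity score can be written as a weighted average of the candidates' predictions. This assumes the non-negative least squares metalearner that the SuperLearner package uses by default; the symbols $\hat{e}_k$ and $\alpha_k$ are notation introduced here for illustration, and other metalearners combine predictions differently.

```latex
\hat{e}_{\mathrm{SL}}(X) = \sum_{k=1}^{3} \alpha_k \, \hat{e}_k(X),
\qquad \alpha_k \ge 0, \qquad \sum_{k=1}^{3} \alpha_k = 1
```

Here $\hat{e}_k(X)$ is the cross-validated prediction of $P(\text{exposure} = 1 \mid X)$ from candidate learner $k$, and the weights $\alpha_k$ are chosen to minimize cross-validated prediction error.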
proxy.list <- names(out3$autoselected_covariate_df[,-1])
length(proxy.list)
#> [1] 100
covform <- paste0(investigator.specified.covariates, collapse = "+")
proxyform <- paste0(proxy.list, collapse = "+")
rhsformula <- paste0(c(covform, proxyform), collapse = "+")
ps.formula <- as.formula(paste0("exposure", "~", rhsformula))
We work with all proxies.
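The formula construction above can be checked on a small scale. The following is a minimal sketch in base R; the covariate and proxy names are hypothetical stand-ins, not the variables in `hdps.data`.

```r
# Hypothetical stand-ins for the objects used in the tutorial
investigator.specified.covariates <- c("age", "sex")   # assumed names
proxy.list <- c("proxy_1", "proxy_2")                  # assumed names

# Same steps as above: join each set with "+", then join the two sets
covform <- paste0(investigator.specified.covariates, collapse = "+")
proxyform <- paste0(proxy.list, collapse = "+")
rhsformula <- paste0(c(covform, proxyform), collapse = "+")
ps.formula <- as.formula(paste0("exposure", "~", rhsformula))

# List every variable the formula refers to, response first
all.vars(ps.formula)
#> [1] "exposure" "age"      "sex"      "proxy_1"  "proxy_2"
```

Inspecting `all.vars()` is a quick way to confirm that the exposure is on the left-hand side and that every investigator-specified covariate and proxy made it into the model.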
library(WeightIt)
W.out <- weightit(ps.formula,
                  data = hdps.data,
                  estimand = "ATE",
                  method = "super",
                  SL.library = c("SL.glm",
                                 "SL.glmnet",
                                 "SL.earth"))
#> Loading required namespace: glmnet
#> Loading required namespace: earth
The propensity score model is fitted with the super learner algorithm so that the inverse probability weights can be calculated.
summary(W.out$ps)
#>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
#> 0.006352 0.250715 0.430072 0.448121 0.628521 0.982149
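To make the link between these propensity scores and the weights concrete, here is a minimal sketch of the standard ATE inverse probability weight for a binary exposure. The values are toy numbers, not taken from `hdps.data`, and the sketch assumes that this textbook definition is what `weightit()` applies with `estimand = "ATE"`.

```r
# Toy values for illustration only
ps <- c(0.2, 0.5, 0.8)      # estimated P(exposure = 1 | covariates)
exposure <- c(1, 0, 1)      # observed exposure status

# ATE weights: 1 / ps for the exposed, 1 / (1 - ps) for the unexposed
ipw <- ifelse(exposure == 1, 1 / ps, 1 / (1 - ps))
ipw
#> [1] 5.00 2.00 1.25
```

Propensity scores close to 0 or 1 translate into large weights. For instance, a score near the minimum reported above (about 0.006) would give a weight above 150 if that subject were exposed, so the weight distribution is worth inspecting before fitting the outcome model.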
out.formula <- as.formula(paste0("outcome", "~", "exposure"))
fit <- glm(out.formula,
           data = hdps.data,
           weights = W.out$weights,
           family = binomial(link = "logit"))
fit.summary <- summary(fit)$coef["exposure",
                                 c("Estimate",
                                   "Std. Error",
                                   "Pr(>|z|)")]
fit.ci <- confint(fit, level = 0.95)["exposure", ]
fit.summary_with_ci.sl <- c(fit.summary, fit.ci)
round(fit.summary_with_ci.sl,2)
#>   Estimate Std. Error   Pr(>|z|)      2.5 %     97.5 %
#>       0.47       0.04       0.00       0.39       0.54
Summary of results (log-OR).
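Because the results are on the log odds ratio scale, exponentiating them gives the odds ratio and its confidence limits. The sketch below retypes the rounded values from the output above, so it is only approximate; in practice one would exponentiate the unrounded `fit.summary_with_ci.sl` entries. The object name `log.or` is introduced here for illustration.

```r
# Rounded values copied from the printed summary (approximate)
log.or <- c(Estimate = 0.47, `2.5 %` = 0.39, `97.5 %` = 0.54)

# Back-transform from the log-OR scale to the OR scale
or <- round(exp(log.or), 2)
or
```

With these rounded inputs the odds ratio is about 1.6. The standard error and p-value should not be exponentiated, since they refer to the log-OR scale.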