15 Hybrid ML

Instead of all recurrence variables, you start with the hdPS variables chosen by the hdPS algorithm first.

(Karim, Pang, and Platt 2018; Franklin et al. 2015)

15.1 Build model formula based on selected variables

From hdPS variables, we try to identify proxies that are empirically associated with the outcome based on a multivariate LASSO (outcome with all proxies in one model).

length(proxy.list.sel)
#> [1] 100
proxy.list <- names(out3$autoselected_covariate_df[,-1])
covarsTfull <- c(investigator.specified.covariates, proxy.list)
Y.form <- as.formula(paste0(c("outcome~ exposure", 
                              covarsTfull), collapse = "+") )
covar.mat <- model.matrix(Y.form, data = hdps.data)[,-1]
lasso.fit<-glmnet::cv.glmnet(y = hdps.data$outcome, 
                             x = covar.mat, 
                             type.measure='mse',
                             family="binomial",
                             alpha = 1, 
                             nfolds = 5)
coef.fit<-coef(lasso.fit,s='lambda.min',exact=TRUE)
sel.variables<-row.names(coef.fit)[which(as.numeric(coef.fit)!=0)]
proxy.list.sel.hybrid <- proxy.list[proxy.list %in% sel.variables]
length(proxy.list.sel.hybrid)
#> [1] 52
proxyform <- paste0(proxy.list.sel.hybrid, collapse = "+")
rhsformula <- paste0(c(covform, proxyform), collapse = "+")
ps.formula <- as.formula(paste0("exposure", "~", rhsformula))

Build propensity score model based on selected variables based on LASSO.

15.1.1 Fit the PS model

W.out <- weightit(ps.formula, 
                    data = hdps.data, 
                    estimand = "ATE",
                    method = "ps")

Propensity score model fit to be able to calculate the inverse probability weights.

15.1.2 Obtain log-OR from unadjusted outcome model

out.formula <- as.formula(paste0("outcome", "~", "exposure"))
fit <- glm(out.formula,
            data = hdps.data,
            weights = W.out$weights,
            family= binomial(link = "logit"))
fit.summary <- summary(fit)$coef["exposure",
                                 c("Estimate", 
                                   "Std. Error", 
                                   "Pr(>|z|)")]
fit.ci <- confint(fit, level = 0.95)["exposure", ]
fit.summary_with_ci.h <- c(fit.summary, fit.ci)
round(fit.summary_with_ci.h,2) 
#>   Estimate Std. Error   Pr(>|z|)      2.5 %     97.5 % 
#>       0.44       0.04       0.00       0.36       0.51

Summary of results (log-OR).

Alternative process

It is also possible to start with ML selection, and then applying Bross’s formula on top of it (Schneeweiss et al. 2017).