
14 Pure ML
Tip
We show an example using the LASSO approach.
15 Pure ML approach (LASSO)
Start with all recurrence variables (EC in the following equation).
Say 100 proxies (associated with the outcome) were selected by the LASSO approach (ML-hdPS).

15.1 Choose variables associated with outcome
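# Names of the proxies auto-selected in the earlier hdPS step (first column dropped)
proxy.list <- names(out3$autoselected_covariate_df[,-1])
covarsTfull <- c(investigator.specified.covariates, proxy.list)
# Outcome model formula: exposure + investigator-specified covariates + all proxies
Y.form <- as.formula(paste0(c("outcome ~ exposure",
                              covarsTfull), collapse = "+"))
# Design matrix without the intercept column
covar.mat <- model.matrix(Y.form, data = hdps.data)[,-1]
# 5-fold cross-validated LASSO (alpha = 1) logistic regression
lasso.fit <- glmnet::cv.glmnet(y = hdps.data$outcome,
                               x = covar.mat,
                               type.measure = 'mse',
                               family = "binomial",
                               alpha = 1,
                               nfolds = 5)
# Coefficients at the lambda that minimizes the cross-validated error
coef.fit <- coef(lasso.fit, s = 'lambda.min', exact = TRUE)
# Variables with non-zero coefficients
sel.variables <- row.names(coef.fit)[which(as.numeric(coef.fit) != 0)]
# Keep only the proxies among the selected variables
proxy.list.sel.ml <- proxy.list[proxy.list %in% sel.variables]
length(proxy.list.sel.ml)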
#> [1] 61
- From all proxies, we try to identify those that are empirically associated with the outcome, based on a multivariable LASSO (the outcome regressed on the exposure, the investigator-specified covariates and all proxies in one model).
- Note that the LASSO model chooses variables based on their association with the outcome, conditional on the exposure.
- Variable selection only happens for the proxy variables.
- Investigator-specified variables are not subject to variable selection: they are kept regardless of the LASSO result.
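The code in 15.1 gives the exposure and the investigator-specified covariates the same LASSO penalty as the proxies, so in principle they can be shrunk as well. One way to make the LASSO literally condition on them is glmnet's `penalty.factor` argument, where a factor of 0 means the column is never penalized and always stays in the model. This is a variant, not what the code above does. The sketch below runs on simulated data. The data frame `sim` and the names `age`, `sex` and `rec1` to `rec10` are hypothetical stand-ins for `hdps.data` and its columns.

```r
library(glmnet)

# Hypothetical simulated data: exposure, two investigator-specified
# covariates (age, sex) and ten binary proxies rec1..rec10
set.seed(42)
n <- 500
sim <- data.frame(
  exposure = rbinom(n, 1, 0.4),
  age = rnorm(n, 50, 10),
  sex = rbinom(n, 1, 0.5)
)
for (j in 1:10) sim[[paste0("rec", j)]] <- rbinom(n, 1, 0.2)
# Only rec1 and rec2 truly affect the outcome in this simulation
lin <- -1 + 0.5 * sim$exposure + 0.02 * (sim$age - 50) +
  1.2 * sim$rec1 + 0.8 * sim$rec2
sim$outcome <- rbinom(n, 1, plogis(lin))

# Design matrix without the intercept column
x <- model.matrix(outcome ~ ., data = sim)[, -1]

# Penalty factor 0 = never penalized (always kept), 1 = usual LASSO penalty
forced <- c("exposure", "age", "sex")
pf <- ifelse(colnames(x) %in% forced, 0, 1)

cv.fit <- cv.glmnet(x = x, y = sim$outcome, family = "binomial",
                    alpha = 1, nfolds = 5, penalty.factor = pf)

# Coefficients at lambda.min as an ordinary matrix
b <- as.matrix(coef(cv.fit, s = "lambda.min"))
kept <- rownames(b)[b[, 1] != 0]
# Only proxies are subject to selection; forced terms are always kept
proxies.kept <- setdiff(kept, c("(Intercept)", forced))
print(proxies.kept)
```

Because their penalty factor is 0, the coefficients for `exposure`, `age` and `sex` stay non-zero at every lambda. Only the `rec*` columns compete for selection.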
15.2 Build model formula based on selected variables
# Right-hand side: investigator-specified covariates plus LASSO-selected proxies
covform <- paste0(investigator.specified.covariates, collapse = "+")
proxyform <- paste0(proxy.list.sel.ml, collapse = "+")
rhsformula <- paste0(c(covform, proxyform), collapse = "+")
ps.formula <- as.formula(paste0("exposure", "~", rhsformula))
Build the propensity score model formula from the investigator-specified covariates and the proxies selected by LASSO.
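The `paste0()` construction above has one edge case. If LASSO keeps no proxies, `proxyform` is the empty string and the right-hand side ends in a dangling `+`. A minimal sketch of an alternative uses base R's `reformulate()`, which joins the term labels itself. The covariate names here are hypothetical stand-ins for `investigator.specified.covariates` and `proxy.list.sel.ml`.

```r
investigator <- c("age", "sex")    # hypothetical investigator-specified covariates
proxies.sel  <- c("rec1", "rec7")  # hypothetical LASSO-selected proxies
proxies.none <- character(0)       # case where LASSO keeps no proxy

# reformulate() builds "response ~ t1 + t2 + ..." from a character vector
f.sel  <- reformulate(c(investigator, proxies.sel), response = "exposure")
f.none <- reformulate(c(investigator, proxies.none), response = "exposure")
deparse(f.sel)   # "exposure ~ age + sex + rec1 + rec7"
deparse(f.none)  # "exposure ~ age + sex"

# The paste0() pattern leaves a trailing "+" when no proxy survives
rhs <- paste0(c(paste0(investigator, collapse = "+"),
                paste0(proxies.none, collapse = "+")), collapse = "+")
rhs  # "age+sex+"
bad <- try(as.formula(paste0("exposure~", rhs)), silent = TRUE)
inherits(bad, "try-error")  # TRUE: the string does not parse
```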
15.3 Fit the PS model
library(WeightIt)
# Logistic-regression propensity scores and ATE inverse probability weights
W.out <- weightit(ps.formula,
                  data = hdps.data,
                  estimand = "ATE",
                  method = "ps")
Fit the propensity score model to calculate the inverse probability weights.
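To show what the ATE weights are, here is a hand-rolled version on simulated data. It assumes that `weightit()` with `method = "ps"` and `estimand = "ATE"` returns the usual unstabilized weights, 1/ps for the exposed and 1/(1 - ps) for the unexposed. The data frame `dat` and its columns are hypothetical.

```r
# Hypothetical simulated data with confounded exposure
set.seed(1)
n <- 1000
age <- rnorm(n, 50, 10)
sex <- rbinom(n, 1, 0.5)
exposure <- rbinom(n, 1, plogis(-2 + 0.04 * age + 0.3 * sex))
dat <- data.frame(exposure, age, sex)

# Propensity score: estimated P(exposure = 1 | covariates) from logistic regression
ps.fit <- glm(exposure ~ age + sex, data = dat,
              family = binomial(link = "logit"))
ps <- fitted(ps.fit)

# ATE inverse probability weights
w <- ifelse(dat$exposure == 1, 1 / ps, 1 / (1 - ps))
summary(w)
# Total weight per exposure group in the weighted pseudo-population
tapply(w, dat$exposure, sum)
```

Each weight is at least 1. Units with a propensity score close to 0 (if exposed) or 1 (if unexposed) get very large weights, which is worth checking with `summary(w)` before fitting the outcome model.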
15.4 Obtain log-OR from unadjusted outcome model
# Crude outcome model (exposure only) fitted in the IPW-weighted sample
out.formula <- as.formula(paste0("outcome", "~", "exposure"))
fit <- glm(out.formula,
           data = hdps.data,
           weights = W.out$weights,
           family = binomial(link = "logit"))
# Estimate, standard error and p-value for the exposure (log-OR scale)
fit.summary <- summary(fit)$coef["exposure",
                                 c("Estimate",
                                   "Std. Error",
                                   "Pr(>|z|)")]
# 95% profile-likelihood confidence interval for the exposure coefficient
fit.ci <- confint(fit, level = 0.95)["exposure", ]
fit.summary_with_ci <- c(fit.summary, fit.ci)
round(fit.summary_with_ci, 2)
#> Estimate Std. Error Pr(>|z|) 2.5 % 97.5 %
#> 0.41 0.04 0.00 0.34 0.49
Summary of results (log-OR).
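The results above are on the log-odds-ratio scale. Exponentiating the estimate and the confidence limits gives the odds ratio. The sketch below plugs in the rounded values printed above, so its results are approximate. In the actual analysis, `exp(fit.summary_with_ci[c("Estimate", "2.5 %", "97.5 %")])` would use the unrounded values.

```r
# Rounded log-OR estimate and 95% CI copied from the output above
log.or <- c(estimate = 0.41, lower = 0.34, upper = 0.49)
# Odds ratio scale
or <- exp(log.or)
round(or, 2)  # about 1.51, 1.40 and 1.63
```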