14 Pure ML
Tip
We show an example using the LASSO approach.
15 Pure ML approach (LASSO)
Start with all recurrence variables (EC in the following equation)
Say 100 proxies (associated with the outcome) were selected by the LASSO approach (ML-hdPS).
15.1 Choose variables associated with outcome
# Candidate proxies: all columns except the first
proxy.list <- names(out3$autoselected_covariate_df[,-1])
covarsTfull <- c(investigator.specified.covariates, proxy.list)
# Outcome model with exposure, investigator-specified covariates, and all proxies
Y.form <- as.formula(paste0(c("outcome~ exposure",
                              covarsTfull), collapse = "+") )
# Design matrix without the intercept column
covar.mat <- model.matrix(Y.form, data = hdps.data)[,-1]
# Cross-validated LASSO (alpha = 1)
lasso.fit <- glmnet::cv.glmnet(y = hdps.data$outcome,
                               x = covar.mat,
                               type.measure='mse',
                               family="binomial",
                               alpha = 1,
                               nfolds = 5)
coef.fit <- coef(lasso.fit,s='lambda.min',exact=TRUE)
# Terms with non-zero coefficients, then keep only the proxies among them
sel.variables <- row.names(coef.fit)[which(as.numeric(coef.fit)!=0)]
proxy.list.sel.ml <- proxy.list[proxy.list %in% sel.variables]
length(proxy.list.sel.ml)
#> [1] 61
- From all proxies, we try to identify those that are empirically associated with the outcome, based on a multivariable LASSO (outcome regressed on all proxies in one model).
- Note that the LASSO model chooses variables based on their association with the outcome, conditional on the `exposure`.
- Variable selection happens only for the proxy variables.
- Investigator-specified variables are not subject to variable selection.
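The filtering logic in the points above can be illustrated with a small base-R sketch. The variable names and the set of non-zero LASSO terms below are hypothetical toy values, not results from the analysis in this chapter.

```r
# Hypothetical toy inputs (not from the real analysis)
investigator.specified.covariates <- c("age", "sex")
proxy.list <- c("proxy1", "proxy2", "proxy3", "proxy4")

# Suppose these terms had non-zero LASSO coefficients at lambda.min
sel.variables <- c("(Intercept)", "exposure", "age", "proxy2", "proxy4")

# Only proxies pass through the selection filter
proxy.list.sel.ml <- proxy.list[proxy.list %in% sel.variables]

# Investigator-specified covariates are kept regardless of the LASSO result
ps.covariates <- c(investigator.specified.covariates, proxy.list.sel.ml)

print(proxy.list.sel.ml)
print(ps.covariates)
```

In this toy case `sex` has a zero LASSO coefficient but still appears in `ps.covariates`, because only the proxy list is filtered.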
15.2 Build model formula based on selected variables
covform <- paste0(investigator.specified.covariates, collapse = "+")
proxyform <- paste0(proxy.list.sel.ml, collapse = "+")
rhsformula <- paste0(c(covform, proxyform), collapse = "+")
ps.formula <- as.formula(paste0("exposure", "~", rhsformula))
Build the propensity score model formula from the investigator-specified covariates and the proxies selected by LASSO.
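As a minimal sketch of the formula-building step, the example below uses hypothetical toy variable names and compares the paste-based construction with base R's `reformulate()`, which is one possible alternative and not the approach used in this chapter.

```r
# Hypothetical toy inputs (not from the real analysis)
investigator.specified.covariates <- c("age", "sex")
proxy.list.sel.ml <- c("proxy2", "proxy4")

# Paste-based construction: join the terms with "+", then convert to a formula
rhsformula <- paste0(c(investigator.specified.covariates, proxy.list.sel.ml),
                     collapse = "+")
ps.formula <- as.formula(paste0("exposure", "~", rhsformula))

# Alternative: reformulate() builds the formula directly from term labels
ps.formula.alt <- reformulate(
  termlabels = c(investigator.specified.covariates, proxy.list.sel.ml),
  response = "exposure"
)

print(ps.formula)
print(ps.formula.alt)
```

Both objects describe the same model: `exposure` on the left-hand side and the four covariates on the right.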
15.3 Fit the PS model
require(WeightIt)
W.out <- weightit(ps.formula,
                  data = hdps.data,
                  estimand = "ATE",
                  method = "ps")
The propensity score model is fitted so that the inverse probability weights can be calculated.
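To make the weighting step concrete, here is a minimal base-R sketch on simulated toy data. It assumes the conventional unstabilized ATE weights, 1/PS for exposed and 1/(1 - PS) for unexposed subjects, with the propensity score taken from a logistic regression. The data and the single covariate `x` are hypothetical.

```r
# Simulated toy data (hypothetical, for illustration only)
set.seed(123)
n <- 500
x <- rnorm(n)
exposure <- rbinom(n, size = 1, prob = plogis(0.5 * x))
toy <- data.frame(exposure = exposure, x = x)

# Propensity score: predicted probability of exposure from a logistic model
ps.fit <- glm(exposure ~ x, data = toy, family = binomial(link = "logit"))
ps <- fitted(ps.fit)

# Unstabilized ATE weights: 1/PS if exposed, 1/(1 - PS) if unexposed
ipw <- ifelse(toy$exposure == 1, 1 / ps, 1 / (1 - ps))

summary(ipw)
```

Because the propensity score lies strictly between 0 and 1, every weight exceeds 1. Very large weights indicate subjects whose observed exposure was unlikely given their covariates.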
15.4 Obtain log-OR from unadjusted outcome model
out.formula <- as.formula(paste0("outcome", "~", "exposure"))
fit <- glm(out.formula,
           data = hdps.data,
           weights = W.out$weights,
           family= binomial(link = "logit"))
fit.summary <- summary(fit)$coef["exposure",
                                 c("Estimate",
                                   "Std. Error",
                                   "Pr(>|z|)")]
fit.ci <- confint(fit, level = 0.95)["exposure", ]
fit.summary_with_ci <- c(fit.summary, fit.ci)
round(fit.summary_with_ci,2)
#>   Estimate Std. Error   Pr(>|z|)      2.5 %     97.5 %
#>       0.41       0.04       0.00       0.34       0.49
Summary of results (log-OR).
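Since the estimate is on the log-odds scale, exponentiating gives the odds ratio. The short sketch below does this using the rounded values copied from the output above, so the result is approximate.

```r
# Rounded log-OR estimate and 95% CI limits copied from the output above
log.or <- c(Estimate = 0.41, lower = 0.34, upper = 0.49)

# Exponentiate to move from the log-odds scale to the odds ratio scale
or <- exp(log.or)
round(or, 2)
#> Estimate    lower    upper 
#>     1.51     1.40     1.63
```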