25 Time-dependent exposure
25.1 Time-dependent Cox regression
Let us start with an example of exploring the relationship between disease-modifying drugs (DMDs) for multiple sclerosis and long-term mortality. The DMD exposure is a time-dependent variable, and the mortality outcome is a time-to-event outcome. Employing Cox proportional hazards models with time-varying exposure to DMDs can address immortal time bias in this example.
- Time-dependent Cox regression with time-varying exposure can help mitigate immortal time bias.
- The hdPS approach is used to deal with residual confounding with a binary ‘time-fixed’ treatment
- With a ‘time-dependent’ exposure, implementing the hdPS in conjunction with the time-dependent Cox regression presents a methodological and practical challenge.
See the associated article for more details (Hossain et al. 2025).
25.2 Nested case-control (NCC)
The nested case-control (NCC) design is a well-established method for addressing immortal time bias with a time-dependent exposure. The NCC framework provides a robust alternative for addressing immortal time bias, while allowing for the integration of hdPS analysis to minimize the residual confounding bias.
- Match subjects who experienced the event of interest (called cases) to a subset of event-free subjects (called controls) using incidence density sampling
- Some controls could later become cases themselves and also serve as controls for other cases
- Four controls per case has been shown to provide near-optimal statistical efficiency without the need for the full cohort analysis
25.3 hdPS in the NCC framework
The time-dependent exposure status becomes a time-independent exposure variable in the NCC analysis. Hence, we could implement the hdPS technique in the NCC framework to deal with residual confounding bias.
25.3.1 Step 0: Analytic data
To demonstrate the use of hdPS analysis with a time-dependent exposure, we will use a simulated dataset. This example explores the relationship between exposure to disease-modifying drugs (DMDs) for multiple sclerosis and all-cause mortality.
25.3.1.1 Dataset with time-dependent exposure
head(simdat)
25.3.1.2 NCC with 4 control per case
Let us use the nested case-control (NCC) design with 4 controls per case. The ccwc
function from the Epi package is used to create the nested case-control dataset. The ccwc
function requires the following arguments:
origin
: The time origin for the studyentry
: The time of entry into the studyexit
: The follow-up timefail
: The event of interestcontrols
: The number of controls per casematch
: The variables to match on
library(Epi)
set.seed(100)
<- ccwc(
dat.ncc origin = 0,
entry = 0,
exit = follow_up,
fail = mortality_outcome,
controls = 4,
match = list(ses, cci, year),
include = list(id, follow_up, mortality_outcome, anyDMD, yrs_anyDMD,
sex, age),data = simdat,
silent = T
)
# Drop those experienced the event before being exposed
$anyDMD[dat.ncc$yrs_anyDMD > dat.ncc$Time] <- NA
dat.ncc<- dat.ncc[complete.cases(dat.ncc$anyDMD),]
dat.ncc
1:10,] dat.ncc[
# Rows
dim(simdat)
#> [1] 19000 10
dim(dat.ncc)
#> [1] 14370 14
# Mortality status
table(simdat$mortality_outcome)
#>
#> 0 1
#> 15947 3053
table(dat.ncc$Fail)
#>
#> 0 1
#> 11317 3053
25.3.2 Step 1: Proxy sources
In this example, we will use four data dimensions:
- 3-digit diagnostic codes from hospital database (diag)
- 3-digit procedure codes from hospital database (proc)
- 3-digit icd codes from physician claim database (msp)
- DINPIN from drug dispensation database (din)
table(dat.proxy$dim)
#>
#> diag din msp proc
#> 10000 125 22135 158
25.3.3 Step 2: Empirical covariates
library(autoCovariateSelection)
<- simdat$id
id
<- get_candidate_covariates(df = dat.proxy, domainVarname = "dim",
step1 eventCodeVarname = "code",
patientIdVarname = "id",
patientIdVector = id,
n = 1000,
min_num_patients = 20)
<- step1$covars_data
out1 head(out1)
25.3.4 Step 3: Recurrence
Let us generate the binary recurrence covariates.
all.equal(id, step1$patientIds)
#> [1] TRUE
# Assessing recurrence of codes
<- get_recurrence_covariates(df = out1,
step2 eventCodeVarname = "code",
patientIdVarname = "id",
patientIdVector = id)
<- step2$recurrence_data
out2 dim(out2)
#> [1] 19000 454
# Recurrence covariates
<- names(out2)[-1]
vars.empirical head(vars.empirical)
#> [1] "rec_diag_H02_once" "rec_diag_H18_once" "rec_diag_H35_once"
#> [4] "rec_diag_H40_once" "rec_diag_H44_once" "rec_diag_I69_once"
25.3.4.1 Merging all recurrence covariates with the analytic dataset
<- merge(dat.ncc, out2, by = "id", all.x = T)
hdps.data dim(hdps.data)
#> [1] 14370 467
25.3.5 Step 4: Prioritize
We will use Cox-PH with LASSO regularization to prioritize the empirical covariates. The hyperparameter (\(\lambda\)) will be selected using 5-fold cross-validation.
25.3.5.1 Hyperparameter tuning
# Formula with only empirical covariates
<- as.formula(paste("Surv(Time, Fail) ~ ",
formula.out paste(vars.empirical, collapse = " + ")))
# Model matrix for fitting Cox with LASSO regularization
<- model.matrix(formula.out, data = hdps.data)[,-1]
X <- as.matrix(data.frame(time = hdps.data$Time, status = hdps.data$Fail))
Y
# Detect the number of cores
<- parallel::detectCores()
n_cores
# Create a cluster of cores
<- makeCluster(n_cores - 1)
cl
# Register the cluster for parallel processing
registerDoParallel(cl)
# Hyperparameter tuning with 5-fold cross-validation
set.seed(123)
<- cv.glmnet(x = X, y = Y, nfolds = 5, parallel = T, alpha = 1,
fit.lasso family = "cox")
stopCluster(cl)
plot(fit.lasso)
## Best lambda
$lambda.min
fit.lasso#> [1] 0.01042345
25.3.5.2 Variable ranking based on Cox-LASSO
<- coef(fit.lasso, s = fit.lasso$lambda.min)
empvars.lasso <- data.frame(as.matrix(empvars.lasso))
empvars.lasso <- data.frame(vars = rownames(empvars.lasso),
empvars.lasso coef = empvars.lasso)
colnames(empvars.lasso) <- c("vars", "coef")
rownames(empvars.lasso) <- NULL
# Number of non-zero coefficients
table(empvars.lasso$coef != 0)
#>
#> FALSE TRUE
#> 442 11
Since proxies were random and unrelated to the simulated data, LASSO produced only 11 non-zero coefficients. Let choose an arbitrary value as to demonstrate the process of variable selection.
<- coef(fit.lasso, s = exp(-6))
empvars.lasso <- data.frame(as.matrix(empvars.lasso))
empvars.lasso <- data.frame(vars = rownames(empvars.lasso),
empvars.lasso coef = empvars.lasso)
colnames(empvars.lasso) <- c("vars", "coef")
rownames(empvars.lasso) <- NULL
head(empvars.lasso)
# Number of non-zero coefficients
table(empvars.lasso$coef != 0)
#>
#> FALSE TRUE
#> 196 257
25.3.5.3 Rank empirical covariates
Now we will rank the empirical covariates based on absolute value of log hazard ratio.
$coef.abs <- abs(empvars.lasso$coef)
empvars.lasso<- empvars.lasso[order(empvars.lasso$coef.abs, decreasing = T),]
empvars.lasso head(empvars.lasso)
25.3.6 Step 5: Covariates
We used all investigator-specified covariates and the top 200 empirical covariates for the PS model. Again, this is a simplistic scenario where we only consider the main effects of the covariates.
# Investigator-specified covariates
<- c("sex", "age")
investigator.vars
# Top 200 empirical covariates section based on Cox-LASSO
<- empvars.lasso$vars[1:200]
empirical.vars.lasso
# Investigator-specified and empirical covariates
<- c(investigator.vars, empirical.vars.lasso)
vars.hsps head(vars.hsps)
#> [1] "sex" "age" "rec_diag_T44_once"
#> [4] "rec_diag_O41_once" "rec_msp_793_once" "rec_msp_459_once"
25.3.7 Step 6: Propensity score
25.3.7.1 Create propensity score formula
<- as.formula(paste0("I(anyDMD == 'Yes') ~ ",
ps.formula paste(vars.hsps, collapse = "+")))
25.3.7.2 Fit PS model
require(WeightIt)
<- weightit(ps.formula,
W.out data = hdps.data,
estimand = "ATE",
method = "ps",
stabilize = T)
25.3.7.3 Obtain PS
$ps <- W.out$ps hdps.data
25.3.7.4 Obtain weights
$w <- W.out$weights hdps.data
25.3.7.5 Assessing balance
library(survey)
# Balance checking for investigator-specified covariates
<- svydesign(ids = ~id, weights = ~w, data = hdps.data)
design.ipw <- svyCreateTableOne(vars = investigator.vars,
tab.ipw strata = "anyDMD",
data = design.ipw,
test = F)
print(tab.ipw, smd = T) # Age and sex are balanced
#> Stratified by anyDMD
#> No Yes SMD
#> n 11035.4 3324.4
#> sex = Male (%) 3199.3 (29.0) 1007.3 (30.3) 0.029
#> age (mean (SD)) 45.58 (13.75) 45.67 (13.48) 0.007
25.3.8 Step 7: Association
25.3.8.1 Obtain HR
We can fit the Cox-PH model, adjusting for the matched strata.
library(survival)
library(Publish)
<- coxph(Surv(Time, Fail) ~ anyDMD + strata(Set),
fit.hdps weights = w,
data = hdps.data)
publish(fit.hdps, pvalue.method = "robust", confint.method = "robust",
print = F)$regressionTable[1:2,]
25.3.8.2 Obtain HR with conditional logistic
For the NCC analysis, an alternative to the stratified Cox-PH model is to use the conditional logistic regression. The HR and SE from both models should be similar under the proportional hazards assumption.
<- clogit(Fail ~ anyDMD + strata(Set),
fit.hdps1 weights = w,
data = hdps.data,
method = "efron")
publish(fit.hdps1, pvalue.method = "robust", confint.method = "robust",
print = F)$regressionTable[1:2,]