8 Step 5: Covariates
We select 2 types of covariates for the next step (to analyze using propensity score or other alternative approaches):
8.1 Ideal number of prioritised covariates
Based on calculated \(log-absolute-bias\), we select top k
recurrence covariates to be used in the hdPS analyses later. Below is a plot of all of the absolute log of the Bias Multiplier:
We used \(k = 100\) covariates selected by the hdPS algorithm (we call them ‘hdPS covariates’). What should be the cutpoint?
- Absolute log of the Bias Multiplier has a null value of 0. Anything above 0 is an indication of confounding bias adjusted by the adjustment of the associated recurrent covariate.
- For large proxy data sources, \(k = 500\) is suggested (Schneeweiss et al. 2009).
- See Sensitivity Analysis section for an understanding of how to choose a value based on an ad-hoc process.
8.2 Selected hdPS variables (proxies)
<- out3$autoselected_covariate_df
hdps.dim dim(hdps.dim) # id + k
#> [1] 7585 101
head(hdps.dim)[,1:3]
$id <- hdps.dim$idx
hdps.dim$idx <- NULL hdps.dim
8.3 Investigator-specified covariates
\(25\) investigator-specified covariates are selected based on variables in the DAG that are available in the data set.
We should also add necessary interactions of these investigator-specified covariates, or add other useful model-specifications (e.g., polynomials).
- 14 demographic, behavioral, health history related variables/access
- Mostly categorical
- 11 lab variables
- Mostly continuous
<- "obese"
exposure <- "diabetes"
outcome <-
investigator.specified.covariates c(# Demographic
"age.cat", "sex", "education", "race",
"marital", "income", "born", "year",
# health history related variables/access
"diabetes.family.history", "medical.access",
# behavioral
"smoking", "diet.healthy", "physical.activity", "sleep",
# Laboratory
"uric.acid", "protein.total", "bilirubin.total", "phosphorus",
"sodium", "potassium", "globulin", "calcium.total",
"systolicBP", "diastolicBP", "high.cholesterol"
)length(investigator.specified.covariates)
#> [1] 25
8.4 Merged data
load("data/analytic3cycles.RData")
<- merge(data.complete[,c("id",
hdps.data
outcome,
exposure,
investigator.specified.covariates)], by = "id")
hdps.dim, dim(hdps.data)
#> [1] 7585 128
Variable count (128)
- 1 ID variable
- 1 exposure
- 1 outcome
- 25 investigator-specified covariates
- 100 hdPS variables