In causal inference, understanding the role of colliders is crucial. A collider is a variable that is a common effect of two or more variables. Adjusting for a collider can introduce bias into your estimates.
# Load required packages
library(simcausal)
In a DAG, a collider is a variable influenced by two or more other variables. In our case, L is a collider because it is affected by both A (the treatment) and Y (the outcome). When you adjust for a collider like L, you could introduce bias into your estimates, as demonstrated in the examples below.
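The mechanism can be sketched in a few lines of base R. This is a generic illustration, not the simulation used in this section: the variables `x`, `z`, and `cl` are hypothetical, with `x` and `z` generated independently and `cl` defined as their common effect.

```r
set.seed(42)
n <- 100000

x  <- rnorm(n)           # first cause
z  <- rnorm(n)           # second cause, generated independently of x
cl <- x + z + rnorm(n)   # collider: a common effect of x and z

# Without the collider, the coefficient of x should be near zero
b_unadj <- coef(lm(z ~ x))[["x"]]

# Conditioning on the collider induces a spurious negative association
b_adj <- coef(lm(z ~ x + cl))[["x"]]

c(unadjusted = b_unadj, adjusted = b_adj)
```

Under these assumed settings the adjusted coefficient has a population value of -0.5, even though `x` has no effect on `z` at all.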
Let us consider the following variables:
L is a continuous variable
A is a binary treatment/exposure
Y is a continuous outcome
For example, A could be a genetic variable (e.g., skin color), Y could be an environmental variable (e.g., indoor air pollution), and L could be a disease condition (e.g., number of comorbidities) (detailed examples here).
Non-null effect
True treatment effect = 1.3
Data generating process
D <- DAG.empty()
D <- D +
  node("A", distr = "rbern", prob = plogis(-10)) +
  node("Y", distr = "rnorm", mean = 1.3 * A, sd = .1) +
  node("L", distr = "rnorm", mean = 10 * Y + 1.3 * A, sd = 1)
Dset <- set.DAG(D)
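Read as structural equations, the node definitions above amount to the following. The noise terms $\varepsilon_Y$ and $\varepsilon_L$ are notation introduced here for illustration; they do not appear in the code.

```latex
\begin{aligned}
A &\sim \mathrm{Bernoulli}\big(\mathrm{expit}(-10)\big), \\
Y &= 1.3\,A + \varepsilon_Y, & \varepsilon_Y &\sim N(0,\ 0.1^2), \\
L &= 10\,Y + 1.3\,A + \varepsilon_L, & \varepsilon_L &\sim N(0,\ 1).
\end{aligned}
```

Note that $\mathrm{expit}(-10) \approx 4.5 \times 10^{-5}$, so the exposure is very rare, and that L depends on both A and Y, which is what makes it a collider.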
require(simcausal)
Obs.Data <- sim(DAG = Dset, n = 1000000, rndseed = 123)
head(Obs.Data)
Estimate effect
# Not adjusted for L
fit0 <- glm(Y ~ A, family = "gaussian", data = Obs.Data)
round(coef(fit0), 2)
#> (Intercept)           A
#>        0.00        1.29
# Adjusted for L
fit <- glm(Y ~ A + L, family = "gaussian", data = Obs.Data)
round(coef(fit), 2)
#> (Intercept)           A           L
#>        0.00        0.58        0.05
Important
Without adjusting for L, the estimate (1.29) is close to the true effect of 1.3. Adjusting for the collider L biases the estimate to 0.58, less than half the true effect.
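One way to see where the adjusted coefficient comes from is to work out the conditional mean of Y given A and L implied by the simulation settings. This is a sketch derived from the node definitions (Y and L are jointly normal given A), not output from the fitted model.

```latex
\begin{aligned}
\mathbb{E}[L \mid A] &= 10\,(1.3A) + 1.3A = 14.3A, \\
\mathrm{Var}(L \mid A) &= 10^2 (0.1)^2 + 1^2 = 2, \\
\mathrm{Cov}(Y, L \mid A) &= 10\,(0.1)^2 = 0.1, \\
\mathbb{E}[Y \mid A, L] &= 1.3A + \frac{0.1}{2}\,(L - 14.3A) = 0.585\,A + 0.05\,L .
\end{aligned}
```

The implied coefficients, 0.585 for A and 0.05 for L, agree closely with the fitted values, so the gap from 1.3 reflects collider bias rather than sampling noise.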
Null effect
True treatment effect = 0
Data generating process
D <- DAG.empty()
D <- D +
  node("A", distr = "rbern", prob = plogis(-10)) +
  node("Y", distr = "rnorm", mean = 0, sd = .1) +
  node("L", distr = "rnorm", mean = 10 * Y + 1.3 * A, sd = 1)
Dset <- set.DAG(D)
require(simcausal)
Obs.Data <- sim(DAG = Dset, n = 1000000, rndseed = 123)
head(Obs.Data)
Estimate effect
# Not adjusted for L
fit0 <- glm(Y ~ A, family = "gaussian", data = Obs.Data)
round(coef(fit0), 2)
#> (Intercept)           A
#>        0.00       -0.01
# Adjusted for L
fit <- glm(Y ~ A + L, family = "gaussian", data = Obs.Data)
round(coef(fit), 2)
#> (Intercept)           A           L
#>        0.00       -0.07        0.05
Important
When the true effect is null, the estimate without adjusting for L (-0.01) is close to zero. Adjusting for L moves the estimate further from the null value (-0.07), introducing bias.
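The size of this bias can be checked without simulating any data, by solving the population least-squares equations for a regression of Y on A and L. The sketch below assumes the data generating process described in this section; the helper `pop_coefs()` and its argument names are hypothetical and are not part of simcausal.

```r
# Population OLS coefficients of Y on (A, L) for a true effect `beta`,
# assuming Y = beta * A + noise(sd_y) and L = 10 * Y + 1.3 * A + noise(sd_l)
pop_coefs <- function(beta, p = plogis(-10), sd_y = 0.1, sd_l = 1) {
  var_a <- p * (1 - p)      # variance of the binary exposure
  g <- 10 * beta + 1.3      # total dependence of L on A
  # Covariance matrix of the regressors (A, L)
  Sxx <- matrix(c(var_a,     g * var_a,
                  g * var_a, g^2 * var_a + 100 * sd_y^2 + sd_l^2), 2, 2)
  # Covariances of the regressors with Y
  Sxy <- c(beta * var_a, beta * g * var_a + 10 * sd_y^2)
  setNames(solve(Sxx, Sxy), c("A", "L"))
}

null_coefs    <- pop_coefs(0)     # true effect = 0
nonnull_coefs <- pop_coefs(1.3)   # true effect = 1.3
round(rbind(null = null_coefs, nonnull = nonnull_coefs), 3)
```

Under these assumptions the coefficient of A is -0.065 when the true effect is 0 and 0.585 when it is 1.3, which is consistent with the adjusted estimates reported in this section.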
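Even with 1,000,000 observations, the unadjusted estimate (-0.01) does not exactly recover the true null effect, largely because the exposure is very rare (prob = plogis(-10), so only about 45 exposed subjects are expected). The estimate is still close to zero.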
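A rough back-of-the-envelope calculation shows how much sampling noise to expect in the unadjusted estimate. It assumes the simulation settings used in this section (n = 1,000,000, exposure probability plogis(-10), outcome standard deviation 0.1) and uses the usual standard error for a difference in two group means; the variable names are illustrative.

```r
n       <- 1000000
p_treat <- plogis(-10)   # probability of exposure, about 4.5e-05
sd_y    <- 0.1           # standard deviation of Y within each exposure group

# Expected number of exposed subjects
expected_treated <- n * p_treat

# Approximate standard error of the difference in mean Y (exposed vs unexposed)
approx_se <- sd_y * sqrt(1 / expected_treated + 1 / (n - expected_treated))

c(expected_treated = expected_treated, approx_se = approx_se)
```

With roughly 45 exposed subjects, the approximate standard error is about 0.015, so an unadjusted estimate of -0.01 is well within sampling variability of the true value of zero.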