Mediator

Mediators play a crucial role in understanding how a treatment variable affects an outcome. A mediator lies on the causal pathway between the treatment and the outcome and transmits part or all of the treatment's effect. This tutorial expands on the lecture by focusing on the true direct and indirect effects when a mediator is present.

# Load required packages
library(simcausal)

Let us consider a treatment A, an outcome Y, and a mediator M.

An example of A could be receiving the right heart catheterization (RHC) procedure or not, Y could be the length of hospital stay, and M could be the number of comorbidities.

Non-null effect

  • True direct effect = 1.3 (total effect = 1.75, including the path through the mediator)

The true direct effect of A on Y is 1.3, and the effect of the mediator M on Y is 0.5. It is important to distinguish these two effects from each other and from the total effect of A.

Data generating process

require(simcausal)
D <- DAG.empty()
D <- D + 
  node("A", distr = "rbern", prob = plogis(-10)) +               # rare binary treatment
  node("M", distr = "rnorm", mean = 10 + 0.9 * A, sd = 1) +      # A -> M effect: 0.9
  node("Y", distr = "rnorm", mean = 0.5 * M + 1.3 * A, sd = .1)  # M -> Y: 0.5; direct A -> Y: 1.3
Dset <- set.DAG(D)

Generate DAG

plotDAG(Dset, xjitter = 0.1, yjitter = .9,
        edge_attrs = list(width = 0.5, arrow.width = 0.4, arrow.size = 0.7),
        vertex_attrs = list(size = 12, label.cex = 0.8))

As we can see, M is a mediator (mediates the effect of A on Y). When exploring the total effect of A on Y, we should not adjust our model for M.
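To see why the unadjusted model targets the total effect, the data-generating process above can be written as two structural equations, with the mediator then substituted into the outcome equation. The error terms \(\varepsilon_M\) and \(\varepsilon_Y\) are notation introduced here for illustration; they do not appear in the original code.

```latex
\begin{align}
M &= 10 + 0.9\,A + \varepsilon_M, & \varepsilon_M &\sim N(0,\ 1) \\
Y &= 0.5\,M + 1.3\,A + \varepsilon_Y, & \varepsilon_Y &\sim N(0,\ 0.1^2) \\
Y &= 0.5\,(10 + 0.9\,A + \varepsilon_M) + 1.3\,A + \varepsilon_Y \\
  &= 5 + (1.3 + 0.45)\,A + 0.5\,\varepsilon_M + \varepsilon_Y
\end{align}
```

Once M is substituted out, the coefficient on A is the sum of the direct path (1.3) and the path through M (0.9 × 0.5 = 0.45), and the intercept is 5.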

Generate Data

require(simcausal)
Obs.Data <- sim(DAG = Dset, n = 1000000, rndseed = 123)
head(Obs.Data)

Estimate effect

# Not adjusted for M
fit0 <- glm(Y ~ A, family="gaussian", data=Obs.Data)
round(coef(fit0),2)
#> (Intercept)           A 
#>        5.00        1.69

# Adjusted for M
fit <- glm(Y ~ A + M, family="gaussian", data=Obs.Data)
round(coef(fit),2)
#> (Intercept)           A           M 
#>         0.0         1.3         0.5

Important

The total effect of A on Y can differ from its direct effect. Here the direct effect of A on Y (the coefficient on A in the M-adjusted model) is 1.3. The indirect effect of A through M is the product of the A→M effect (0.9) and the M→Y effect (0.5), i.e., \(0.9 \times 0.5 = 0.45\). The total effect is therefore the sum of the direct and indirect effects: \(1.3 + 0.45 = 1.75\).
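The decomposition above can be checked numerically with a product-of-coefficients calculation. The following is a minimal sketch in base R rather than simcausal; it re-simulates the same structural equations but, as an assumption made only for this illustration, uses a treatment probability of 0.5 so that both treatment arms are large. The variable names (`a_path`, `b_path`, and so on) are hypothetical.

```r
# Illustrative re-simulation in base R (not the simcausal objects used in the tutorial).
# Assumption: P(A = 1) = 0.5 instead of plogis(-10), so both arms are well populated.
set.seed(123)
n <- 100000
A <- rbinom(n, size = 1, prob = 0.5)
M <- rnorm(n, mean = 10 + 0.9 * A, sd = 1)
Y <- rnorm(n, mean = 0.5 * M + 1.3 * A, sd = 0.1)

a_path  <- unname(coef(lm(M ~ A))["A"])   # estimated A -> M effect
fit_adj <- lm(Y ~ A + M)                  # mediator-adjusted outcome model
direct  <- unname(coef(fit_adj)["A"])     # estimated direct effect of A
b_path  <- unname(coef(fit_adj)["M"])     # estimated M -> Y effect
total   <- unname(coef(lm(Y ~ A))["A"])   # unadjusted model: total effect

indirect <- a_path * b_path               # product of coefficients

round(c(total = total, direct = direct, indirect = indirect,
        direct_plus_indirect = direct + indirect), 3)
```

With ordinary least squares fitted on the same data, `total` and `direct + indirect` agree up to floating-point error, so the sum in the text is an exact property of the linear fit and not only of the true parameters.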

When you do not adjust for M, the coefficient on A estimates this total effect. The true value is 1.75; the fitted value above is 1.69, a gap that is consistent with sampling variability because very few units are treated when prob = plogis(-10). When you adjust for M, the coefficient on A estimates the direct effect (1.3) and the coefficient on M estimates the M→Y effect (0.5). Note that the M coefficient is the M→Y effect, not the indirect effect itself; the indirect effect is the product (A→M)×(M→Y) = 0.45.
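The unadjusted estimate printed above (1.69) differs slightly from the true total effect of 1.75. One likely reason is that treatment is extremely rare in this data-generating process. The sketch below, in base R, computes the treatment probability implied by `plogis(-10)`, the expected number of treated units among one million, and a rough standard error for the unadjusted coefficient. The standard-error formula is a back-of-the-envelope approximation introduced here (a difference in means in which the control arm is so large that it contributes almost nothing), not something taken from the tutorial.

```r
# How rare is treatment in the tutorial's DGP?
p_treat <- plogis(-10)               # P(A = 1) = 1 / (1 + exp(10))
n <- 1000000
expected_treated <- n * p_treat      # roughly 45 treated units out of one million

# Rough standard error of the unadjusted A coefficient (assumed approximation):
# within an arm, Var(Y | A) = 0.5^2 * Var(M | A) + 0.1^2
sd_Y_given_A <- sqrt(0.5^2 * 1^2 + 0.1^2)
approx_se <- sd_Y_given_A / sqrt(expected_treated)

c(p_treat = p_treat, expected_treated = expected_treated, approx_se = approx_se)
```

Under these assumptions the approximate standard error is somewhat below 0.1, so an estimate of 1.69 against a true value of 1.75 is within the range expected from sampling noise alone.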

This example shows why mediators matter when estimating treatment effects. Adjusting for a mediator isolates the direct effect, while leaving the mediator out of the model targets the total effect. Being clear about which of these a model estimates reduces the risk of drawing incorrect conclusions from your data.

Details on mediation analysis can be found in the Mediation analysis chapter.

Null effect

  • True direct effect = 0 (total effect = 0.45 via the mediator)

Data generating process

require(simcausal)
D <- DAG.empty()
D <- D + 
  node("A", distr = "rbern", prob = plogis(-10)) +           # rare binary treatment
  node("M", distr = "rnorm", mean = 10 + 0.9 * A, sd = 1) +  # A -> M effect: 0.9
  node("Y", distr = "rnorm", mean = 0.5 * M, sd = .1)        # M -> Y: 0.5; no direct A -> Y term
Dset <- set.DAG(D)

Generate DAG

plotDAG(Dset, xjitter = 0.1, yjitter = .9,
        edge_attrs = list(width = 0.5, arrow.width = 0.4, arrow.size = 0.7),
        vertex_attrs = list(size = 12, label.cex = 0.8))

Generate Data

require(simcausal)
Obs.Data <- sim(DAG = Dset, n = 1000000, rndseed = 123)
head(Obs.Data)

Estimate effect

# Not adjusted for M
fit0 <- glm(Y ~ A, family="gaussian", data=Obs.Data)
round(coef(fit0),2)
#> (Intercept)           A 
#>        5.00        0.39

# Adjusted for M
fit <- glm(Y ~ A + M, family="gaussian", data=Obs.Data)
round(coef(fit),2)
#> (Intercept)           A           M 
#>         0.0         0.0         0.5

Important

Direct vs. total effect here: In this DGP the direct effect of A on Y is 0 (Y depends only on M), but A still affects M (A→M = 0.9) and M affects Y (M→Y = 0.5). So the indirect effect of A is \(0.9 \times 0.5 = 0.45\) and the total effect is \(0 + 0.45 = 0.45\). The unadjusted A coefficient estimates this total effect, not zero, even though the direct effect is null. The true value is 0.45; the fitted value above is 0.39, which is consistent with sampling variability given how few units are treated.

Total Effect: If you want to measure the “total effect” of a treatment on an outcome, then you typically don’t adjust for the mediator. The reason is that the total effect captures both the direct effect of the treatment on the outcome and the indirect effect through the mediator.

Direct and Indirect Effects: If you want to separate out the direct and indirect effects, then you would adjust for the mediator. In essence, when you control for the mediator, what remains is the direct effect of the treatment on the outcome (here, 0).

Linearity and Decomposition: In linear models with continuous outcomes, the total effect decomposes directly into the sum of the direct and indirect effects. The mathematics becomes more complicated in non-linear models or when dealing with non-continuous outcomes.
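To illustrate the last point, the following base R sketch repeats the product-of-coefficients calculation with a binary outcome and logistic regression. All coefficients in this simulation (the intercept of -1, the M→Y log-odds coefficient of 2, and the treatment probability of 0.5) are assumptions chosen for this illustration and do not come from the tutorial. The sketch only shows that the simple additive decomposition on the coefficient scale no longer holds; it is not a recommended mediation method for binary outcomes.

```r
# Illustrative simulation in base R; every coefficient here is an assumption
# chosen for this sketch, not a value from the tutorial.
set.seed(123)
n <- 100000
A <- rbinom(n, size = 1, prob = 0.5)
M <- rnorm(n, mean = 0.9 * A, sd = 1)
Y <- rbinom(n, size = 1, prob = plogis(-1 + 1.3 * A + 2 * M))  # binary outcome

a_path  <- unname(coef(lm(M ~ A))["A"])                          # A -> M effect
fit_adj <- glm(Y ~ A + M, family = binomial())                   # adjusted for M
direct  <- unname(coef(fit_adj)["A"])                            # log-odds coefficient on A given M
b_path  <- unname(coef(fit_adj)["M"])                            # log-odds coefficient on M
total   <- unname(coef(glm(Y ~ A, family = binomial()))["A"])    # not adjusted for M

# In a linear model this gap is zero; on the log-odds scale it generally is not.
gap <- (total - direct) - a_path * b_path
round(c(total = total, direct = direct, product = a_path * b_path, gap = gap), 2)
```

A `gap` far from zero indicates that the unadjusted and adjusted logistic coefficients do not differ by the product of the path coefficients, which is why mediation analysis for non-continuous outcomes needs more than the linear-model arithmetic used above.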