| Reporting Component | Guideline for Effect Modification | Guideline for Interaction |
|---|---|---|
| Purpose | To show how the effect of one primary exposure (A) is modified by the strata of another factor (X). | To show the causal, joint effect of two distinct exposures (A and B) acting together. |
| Step 1: Joint Effects | Required. (e.g., ORs for all A/X combinations vs. a single reference, A=0, X=0). | Required. (e.g., ORs for all A/B combinations vs. a single reference, A=0, B=0). |
| Step 2: Stratum-Specific Effects | Required (SUBSET). Show only the effect of A within each stratum of X. | Required (FULL). Show the effect of A within each stratum of B AND the effect of B within each stratum of A. |
| Step 3: Interaction Measures | Required. Report measures for both additive (e.g., RERI) and multiplicative (e.g., ROR) scales, with CIs and p-values. | Required. Report measures for both additive (e.g., RERI) and multiplicative (e.g., ROR) scales, with CIs and p-values. |
| Step 4: Confounder Adjustment | Required. Adjust for confounders of the primary A-D relationship. | Required. Adjust for confounders of *both* the A-D relationship *and* the B-D relationship. |
Concepts (R)
Confounding
Confounding is a pervasive concern in epidemiology, especially in observational studies focusing on causality. Epidemiologists need to carefully select confounders to avoid biased results due to third factors affecting the relationship between exposure and outcome. Commonly used methods for selecting confounders, such as the change-in-estimate approach or sole reliance on p-value-based statistical methods, may be inadequate or even problematic.
Epidemiologists need a more formalized system for confounder selection, incorporating causal diagrams (Greenland, Pearl, and Robins 1999; Tennant et al. 2021) and counterfactual reasoning. This includes an understanding of the underlying causal relationships and the potential impacts of different variables on the observed association. Understanding the temporal order and causal pathways is crucial for accurate confounder control.
However, it is possible that epidemiologists may lack comprehensive knowledge about the causal roles of all variables and hence may need to resort to empirical criteria (VanderWeele 2019) such as the disjunctive cause criterion, or other variable selection methods such as machine learning approaches. While these methods can provide more sophisticated analyses and help address the high dimensionality and complex structures of modern epidemiological data, epidemiologists need to understand how these approaches function, along with their benefits and limitations, to avoid introducing additional bias into the analysis.
Effect modifier
Effect modification and interaction are two distinct concepts in epidemiology (VanderWeele 2009; Bours 2021). Effect modification occurs when the causal effect of an exposure (A) on an outcome (Y) varies based on the levels of a third factor (B).
In this scenario, the association between the exposure and the outcome differs within the strata of a second exposure, which acts as the effect modifier. For instance, the impact of alcohol (A) on oral cancer (Y) might differ based on tobacco smoking (B).
On the other hand, interaction refers to the joint causal effect of two exposures (A and B) on an outcome (Y). It examines how the combination of multiple exposures influences the outcome, such as the combined effect of alcohol (A) and tobacco smoking (B) on oral cancer (Y).
In essence, while effect modification looks at how a third factor influences the relationship between an exposure and an outcome, interaction focuses on the combined effect of two exposures on the outcome.
Table 2 fallacy
The “Table 2 Fallacy” in epidemiology refers to the misleading practice of presenting multiple adjusted effect estimates from a single statistical model in one table, often resulting in misinterpretation. This occurs when researchers report both the primary exposure’s effects and secondary exposures’ (often an adjustment variable for the primary exposure) effects without adequately distinguishing between the types of effects or considering the causal relationships among variables.
This idea highlights the potential for misunderstanding in interpreting the effects of various exposures on an outcome when they are reported together, leading to confusion over the nature and magnitude of the relationships and possibly influencing the design and interpretation of further studies (Westreich and Greenland 2013). The fallacy demonstrates the need for careful consideration of the types of effects estimated and reported in statistical models, urging researchers to be clear about the distinctions and implications of controlled direct effects, total effects, and the presence of confounding or mediating variables.
Reading list
Confounding key reference: (VanderWeele 2019; Tennant et al. 2021)
Effect modification key reference: (VanderWeele 2009; Bours 2021)
Table 2 fallacy key reference: (Westreich and Greenland 2013)
Video Lessons
Before dissecting specific confounder selection techniques, it is crucial to establish the epistemological distinction that governs variable selection: the divergence between predictive and causal inference goals. This distinction is frequently conflated in practice, leading to the misapplication of algorithms designed for one purpose to the problems of the other.
The Goal of Prediction
In predictive modeling, the objective is to minimize the expected loss (e.g., mean squared error) between the predicted and observed outcome values. In this context, a “good” variable is one that is strongly correlated with the outcome, regardless of the direction of causality. A variable that is a consequence of the outcome (a proxy) or a mediator of the exposure can be an excellent predictor. Variable selection methods in this domain, such as standard stepwise regression, Akaike Information Criterion (AIC) minimization, or standard Lasso regularization, are designed to identify a parsimonious set of correlates that maximize model fit and reduce prediction error.
The Goal of Causal Explanation
In causal inference, the objective is to isolate the specific marginal effect of an intervention (exposure) on an outcome. Here, the correlation is only useful if it reflects a structural cause-effect relationship. Including a mediator in the model will increase the \(R^2\) (predictive power) but will bias the estimation of the total causal effect toward the null.
Consequently, variable selection methods optimized for prediction are often mathematically antagonistic to causal inference. Techniques that rely on “goodness-of-fit” or statistical significance can inadvertently select colliders (inducing bias) or drop weak confounders that are critical for validity. The failure to distinguish these goals is a primary source of methodological error in the medical literature, motivating the need for distinct, causally-grounded selection strategies.
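To make this antagonism concrete, here is a minimal R simulation (all variable names and coefficients are hypothetical) in which adjusting for a mediator raises \(R^2\) yet attenuates the estimated total causal effect:

```r
# Minimal simulation (hypothetical coefficients): adjusting for a mediator
# improves prediction (higher R^2) but biases the total effect estimate.
set.seed(42)
n <- 1e4
A <- rbinom(n, 1, 0.5)              # exposure
M <- 0.5 * A + rnorm(n)             # mediator, caused by A
Y <- 0.5 * A + 1.0 * M + rnorm(n)   # true total effect of A = 0.5 + 0.5*1.0 = 1.0

fit_total  <- lm(Y ~ A)      # correct model for the total causal effect
fit_biased <- lm(Y ~ A + M)  # conditioning on M blocks the mediated path

coef(fit_total)["A"]             # ~1.0: total effect recovered
coef(fit_biased)["A"]            # ~0.5: attenuated toward the null
summary(fit_total)$r.squared     # lower R^2
summary(fit_biased)$r.squared    # higher R^2: better prediction, worse causal answer
```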
The Counterfactual Framework for Defining Causality
Defining the Causal Effect: Potential Outcomes
To understand causality, one must first be able to imagine a world that does not exist. The potential outcomes framework formalizes this by defining the causal effect of an exposure in terms of what would have happened under different exposure scenarios. Let us define the key notations:
- A: The exposure status of an individual (e.g., \(A=1\) if a smoker, \(A=0\) if a non-smoker).
- Y: The outcome of interest (e.g., hypertension).
- L: A measured covariate or potential confounder.
- U: An unmeasured variable.
For any individual, we can define two potential outcomes:
- Y(A=1): The outcome that would be observed if the individual were a smoker.
- Y(A=0): The outcome that would be observed if that same individual were a non-smoker at the same point in time.
The Individual Treatment Effect (TE) is the difference between these two potential outcomes for a single person: \(TE = Y(A=1) - Y(A=0)\). For example, if a patient named John smokes (\(A=1\)) and develops hypertension, while he would not have developed hypertension had he not smoked (\(A=0\)), the causal effect of smoking for John is present.
The Fundamental Problem of Causal Inference
The definition of the individual TE immediately presents a profound challenge. For any given individual, we can only ever observe one of their potential outcomes. If John smokes, we observe \(Y(A=1)\), but his counterfactual outcome, \(Y(A=0)\), remains unobserved forever. This is known as the fundamental problem of causal inference; it is a problem of missing data where half the data is always missing for every subject.
Because the individual TE is unobservable, the goal of epidemiology shifts from the individual to the population. We instead seek to estimate the Average Treatment Effect (ATE), defined as the average of the individual effects across all subjects in a population: \(ATE = E[Y(A=1)] - E[Y(A=0)]\).
From Association to Causation: The Role of Confounding
In the real world, we cannot directly observe both potential outcomes for a population. Instead, we observe outcomes in two different groups of people: those who happened to be exposed (smokers) and those who were not (non-smokers). We can calculate the associational difference between these groups: \(E[Y \mid A=1] - E[Y \mid A=0]\). A critical error is to assume this associational difference is equal to the causal ATE.
This difference arises because of confounding. The groups of smokers and non-smokers may differ systematically on factors that also affect the outcome. For instance, individuals with lower socioeconomic status may be more likely to smoke and also have a higher underlying risk of hypertension for reasons unrelated to smoking (e.g., diet, stress). In this case, the observed difference in outcomes is a mixture of the true treatment effect and these pre-existing, systematic differences between the groups.
The Observational Study Solution: Conditional Exchangeability
Randomized Controlled Trials (RCTs) are the gold standard for causal inference because the process of randomization, with a large enough sample size, ensures that the exposed and unexposed groups are, on average, identical (“exchangeable”) on all baseline characteristics, both measured and unmeasured. In an RCT, any systematic differences are eliminated, making the associational difference a valid estimate of the causal ATE.
In observational studies, where randomization is not possible, we cannot achieve this level of exchangeability. Instead, we strive for conditional exchangeability. This is the assumption that, within strata of the measured confounders, the exposed and unexposed groups are exchangeable. By estimating the effect of smoking separately within each level of the confounder(s) \(L\) (e.g., estimating the effect of smoking separately for different age groups) and then averaging these stratum-specific effects, we can aim to reconstruct the causal ATE. This process of stratification, or “adjustment,” is the conceptual basis for controlling for confounding in observational research. However, its validity rests entirely on the critical and untestable assumption that we have successfully identified and measured all important common causes of the exposure and the outcome.
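As a toy illustration of conditional exchangeability, the following R sketch (simulated data with arbitrary parameter values) compares the crude associational difference with a stratification-based (standardized) estimate:

```r
# Toy illustration of conditional exchangeability (all parameters arbitrary):
# a binary confounder L makes the crude contrast biased; stratifying on L
# and averaging over its distribution (standardization) removes the bias.
set.seed(1)
n <- 1e5
L <- rbinom(n, 1, 0.5)                              # confounder, e.g., older age
A <- rbinom(n, 1, plogis(-1 + 2 * L))               # L increases exposure odds
Y <- rbinom(n, 1, plogis(-2 + 0.7 * A + 1.5 * L))   # L also raises outcome risk

crude <- mean(Y[A == 1]) - mean(Y[A == 0])          # biased associational difference

rd_by_stratum <- sapply(0:1, function(l)            # risk difference within L = l
  mean(Y[A == 1 & L == l]) - mean(Y[A == 0 & L == l]))
adjusted <- weighted.mean(rd_by_stratum, w = table(L) / n)

c(crude = crude, adjusted = adjusted)               # adjusted is closer to the truth
```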
What is included in this Video Lesson:
- 0:00 Introduction
- 0:16 Notations
- 2:40 Treatment Effect
- 6:13 Real-world Problem of the counterfactual definition
- 9:44 Real-world Solution in Observational Setting
The timestamps are also included in the YouTube video description.
To properly address confounding, researchers need a tool to translate their subject-matter knowledge and assumptions about the world into a formal structure. Directed Acyclic Graphs (DAGs) serve this purpose, providing a visual language and a set of rigorous rules for identifying sources of bias and guiding statistical analysis.
The Grammar of Causal Diagrams
A DAG is a graphical model of causal relationships between variables. Its components follow a simple grammar:
- Nodes: Represent variables (e.g., smoking, hypertension, age).
- Arrows (Directed Edges): Represent a direct causal effect from one variable to another.
- Directed: The arrows have a single head, indicating the assumed direction of causality.
- Acyclic: A path of arrows cannot form a closed loop. This enforces the principle of temporality: a variable cannot be its own cause.
Crucially, the most powerful assumptions in a DAG are the absent arrows. The absence of an arrow between two variables represents a strong claim of no direct causal effect.
Paths: Causal and Non-Causal
A path is any sequence of arrows connecting two variables, regardless of the direction of the arrowheads. When assessing the relationship between an exposure like smoking (\(A\)) and an outcome like hypertension (\(Y\)), paths can be categorized into two critical types:
- Causal Paths (Front-door paths): These are paths that begin with an arrow originating from \(A\) and moving toward \(Y\) (e.g., \(A \rightarrow \text{Stress} \rightarrow Y\)). These paths transmit the causal effect of \(A\) on \(Y\) that we wish to estimate.
- Non-Causal Paths (Back-door paths): These are paths between \(A\) and \(Y\) that begin with an arrow pointing into \(A\) (e.g., \(A \leftarrow \text{Age} \rightarrow Y\)). These paths are sources of non-causal association (confounding) that can bias our estimate. The goal of adjustment is to “block” these backdoor paths.
The Three Elementary Causal Structures
All complex DAGs are composed of three fundamental building blocks. Understanding how information flows through these structures is the key to using DAGs to identify and control for bias.
- The Fork (Confounding): The structure is \(A \leftarrow L \rightarrow Y\). Here, \(L\) is a common cause of both the exposure \(A\) and the outcome \(Y\).
  - Example: Age (\(L\)) is a common cause of both smoking habits (\(A\)) and hypertension (\(Y\)).
  - Rule: The backdoor path through a common cause is open by default, creating a spurious association. To remove this confounding, one must condition on the confounder \(L\), which blocks the path.
- The Chain (Mediation): The structure is \(A \rightarrow M \rightarrow Y\). Here, \(M\) is a mediator that lies on the causal pathway.
  - Example: Smoking (\(A\)) causes chronic inflammation (\(M\)), which in turn causes hypertension (\(Y\)).
  - Rule: The causal path through a mediator is open by default. To estimate the total effect of \(A\) on \(Y\), one must not condition on the mediator \(M\). Doing so would block this part of the causal effect.
- The Collider (Selection/Collider Bias): The structure is \(A \rightarrow L \leftarrow Y\). Here, \(L\) is a common effect of both \(A\) and \(Y\).
  - Example: Both smoking (\(A\)) and a genetic predisposition (\(Y\)) can lead to a specific biomarker level (\(L\)).
  - Rule: The path through a collider is blocked by default. However, conditioning on the collider \(L\) opens the path, inducing a spurious, non-causal association between \(A\) and \(Y\). Adjusting for a collider is a critical error that introduces bias, as the simulation sketch after this list illustrates.
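The collider rule can be verified with a short R simulation (hypothetical variables; the exposure and outcome are generated independently):

```r
# Collider bias in a few lines of simulation: A and Y are generated
# independently, but conditioning on their common effect C links them.
set.seed(7)
n <- 1e4
A <- rnorm(n)                        # exposure
Y <- rnorm(n)                        # outcome, truly independent of A
C <- A + Y + rnorm(n)                # collider: common effect of A and Y

coef(summary(lm(Y ~ A)))["A", ]      # ~0: no association, as simulated
coef(summary(lm(Y ~ A + C)))["A", ]  # strongly negative: bias induced by C
```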
Applying the Rules with Dagitty
In practice, causal systems can be highly complex. Software such as Dagitty.net automates the application of these path-blocking rules. Given a user-drawn DAG, Dagitty can identify all open backdoor paths and determine the minimal sufficient adjustment sets: the smallest set of covariates that, if conditioned on, will block all backdoor paths and allow for an unbiased estimation of the total causal effect.
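A minimal example of this workflow with the dagitty R package follows; the DAG itself is an illustrative assumption, not a validated causal model:

```r
# Minimal dagitty workflow (requires the 'dagitty' R package).
library(dagitty)

g <- dagitty("dag {
  Age -> Smoking
  Age -> Hypertension
  Smoking -> Stress -> Hypertension
  Smoking -> Hypertension
}")

# Minimal sufficient adjustment set(s) for the total effect of Smoking on
# Hypertension; here the only open backdoor path runs through Age.
adjustmentSets(g, exposure = "Smoking", outcome = "Hypertension",
               effect = "total")
#> { Age }
```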
The video lesson is split into 3 parts.
Example DAG code can be accessed from this GitHub repository folder.
In many practical epidemiological investigations, particularly those involving novel exposures or complex metabolic pathways, the full causal structure is unknown. The uncertainty regarding the presence or direction of arrows makes the strict construction of a DAG impossible. In such “DAG-deficient” scenarios, epidemiologists must resort to pragmatic heuristics or empirical criteria that aim to approximate the Backdoor Criterion with less stringent assumptions.
In the absence of a fully specified DAG, researchers can rely on a set of empirical criteria that require less stringent assumptions.
One of the simplest and most intuitive heuristics is the Pre-treatment Criterion, which dictates adjusting for all covariates measured chronologically before the exposure was administered or assigned.
Rationale: The logic is grounded in temporal causality; a variable occurring before the exposure cannot be a downstream effect (mediator) of the exposure. Therefore, adjusting for pre-treatment variables avoids the error of overadjustment via mediation.
Critique:
- While this criterion successfully avoids adjusting for mediators, it fails to protect against M-bias. A pre-treatment variable can still be a collider if it is caused by two unobserved latent variables—one linked to the exposure and one to the outcome. Adjusting for such a pre-treatment collider introduces bias.
- This “kitchen sink” approach often leads to the inclusion of Instrumental Variables (IVs)—pre-treatment variables that cause the exposure but have no independent effect on the outcome. As discussed later, adjusting for IVs inflates the variance of the estimator and can amplify bias due to residual unmeasured confounding (Z-bias).
Thus, while the Pre-treatment Criterion is a helpful starting point, it is often too crude for high-stakes causal inference.
The Common Cause Criterion refines the selection process by narrowing the adjustment set to variables known (or suspected) to be causes of both the exposure and the outcome.
Rationale: This criterion targets the classical epidemiological definition of a confounder. By restricting selection to common causes, it theoretically avoids colliders (which are effects) and instruments (which are causes of exposure only).
Critique: The major limitation of this approach is its reliance on definitive knowledge. If a researcher is unsure whether a variable causes the outcome, the strict application of this criterion would lead to its exclusion. However, standard bias analysis suggests that omitting a true confounder (due to uncertainty) generally introduces more bias than including a non-confounder. Therefore, the Common Cause Criterion is often viewed as overly conservative, potentially leading to residual confounding in the pursuit of parsimony.
To address the limitations of the Common Cause Criterion, the Disjunctive Cause Criterion is proposed as a pragmatic strategy for confounder selection (VanderWeele 2019).
The Rule: Control for any pre-exposure covariate that is
- a cause of the exposure, OR
- a cause of the outcome, OR
- both.
Mechanism: This union-based approach ensures that all common causes (confounders) are included, as they satisfy the condition of being a cause of both. By including variables that are only causes of the outcome, the method improves the precision of the estimate (reducing standard error) without introducing bias. By including variables that are only causes of the exposure (potential instruments), it risks some variance inflation, but this is often considered an acceptable trade-off to ensure no confounders are missed.
Strength: The primary strength of the Disjunctive Cause Criterion is its robustness to uncertainty regarding the full causal structure. The researcher does not need to know if a variable affects both exposure and outcome; knowing it affects at least one is sufficient for inclusion. This effectively minimizes the risk of unadjusted confounding while generally avoiding colliders (which are effects, not causes).
Refining the Disjunctive Cause Criterion further, the Modified Disjunctive Cause Criterion incorporates specific exclusions and inclusions to optimize both validity and efficiency.
Exclude IVs: Recognizing the variance inflation and Z-bias risks associated with instruments, the modified criterion explicitly removes variables known to affect the exposure but not the outcome. This requires some structural knowledge but yields a more efficient estimator.
Include Proxies: Acknowledging that true confounders are often unmeasured, the modified criterion mandates the inclusion of measured variables that serve as proxies for the unmeasured common causes. Even if a proxy is not a direct cause, adjusting for it partially blocks the backdoor path transmitted through the unobserved parent variable.
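As a toy illustration only, the modified criterion can be expressed as simple set logic in R; every vector below is a hypothetical placeholder for genuine subject-matter knowledge:

```r
# Toy set-logic version of the modified disjunctive cause criterion;
# all variable names are hypothetical.
causes_of_exposure <- c("age", "ses", "peer_smoking")
causes_of_outcome  <- c("age", "ses", "genetics")
known_instruments  <- c("peer_smoking")   # affects exposure only -> exclude
proxies_unmeasured <- c("zip_code")       # proxy for an unmeasured common cause

adjustment_set <- setdiff(
  union(causes_of_exposure, causes_of_outcome),  # disjunctive criterion
  known_instruments                              # modification 1: drop known IVs
)
adjustment_set <- union(adjustment_set, proxies_unmeasured)  # modification 2: add proxies
adjustment_set
#> "age" "ses" "genetics" "zip_code"
```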
Statistical methods can also be used for variable selection, but their application requires careful consideration of the research goal: prediction versus causal inference.
The Change-in-Estimate (CIE) method represents an operationalization of the definition of confounding: if a variable is a confounder, adjusting for it should change the estimated effect of the exposure.
The Procedure: The researcher begins with a “crude” model containing only the exposure and outcome. Potential confounders are added to the model one by one (or removed from a full model). If the regression coefficient for the exposure changes by more than a specified percentage (commonly 10%), the variable is deemed a confounder and retained in the model.
The Non-Collapsibility Trap: A critical flaw of the CIE method arises when using non-collapsible effect measures, such as the odds ratio (OR) or hazard ratio (HR). In logistic regression, the addition of a covariate that is strongly associated with the outcome (but independent of the exposure) will increase the magnitude of the exposure’s OR—driving it further from the null. This occurs not because of confounding bias, but because of a mathematical property known as non-collapsibility. A CIE algorithm would interpret this change as evidence of confounding and select the variable, potentially leading to over-adjustment or misinterpretation of the effect measure. Thus, CIE is safer for risk differences (RDs) or risk ratios (RRs) but hazardous for ORs.
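A short R simulation (arbitrary coefficients) demonstrates the trap: the covariate X below is independent of the exposure, so there is no confounding, yet the adjusted OR moves away from the null by far more than the usual 10% threshold:

```r
# Non-collapsibility demo: X predicts the outcome but is independent of the
# exposure A, so there is NO confounding by X.
set.seed(3)
n <- 1e5
A <- rbinom(n, 1, 0.5)
X <- rbinom(n, 1, 0.5)                          # independent of A
Y <- rbinom(n, 1, plogis(-1 + 1 * A + 2 * X))

exp(coef(glm(Y ~ A,     family = binomial))["A"])  # crude (marginal) OR
exp(coef(glm(Y ~ A + X, family = binomial))["A"])  # conditional OR: larger, ~exp(1)
# The OR changes by well over 10% with no confounding present, so a CIE
# rule would wrongly flag X as a confounder.
```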
Stepwise selection algorithms (forward selection, backward elimination, or bidirectional search) rely on statistical significance (p-values) to determine variable inclusion.
The Procedure: Variables are added to the model if their association with the outcome yields a p-value below a certain threshold (e.g., 0.05) or removed if the p-value exceeds it.
The Confounding vs. Significance Fallacy: The most fundamental critique of this approach is that “confounding is not a significance test.” A variable can be a strong confounder—systematically biasing the effect estimate—even if its association with the outcome fails to reach statistical significance in a specific sample, particularly in small studies. Relying on p-values often leads to under-adjustment and residual confounding.
Post-Selection Inference: Stepwise selection invalidates the statistical theory behind confidence intervals. The final model treats the selected variables as if they were specified a priori, ignoring the immense “data dredging” and multiple testing that occurred during the selection process. This results in standard errors that are systematically too small and confidence intervals that are too narrow, creating a false sense of precision.
Prediction vs. Causation: Ultimately, stepwise algorithms are designed to maximize model fit (prediction). They will happily select a collider or a mediator if it is strongly correlated with the outcome, thereby maximizing \(R^2\) while destroying the validity of the causal coefficient.
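The following R sketch (simulated data) shows AIC-based backward elimination retaining a collider because it improves fit, at the cost of a biased exposure coefficient:

```r
# AIC-driven backward elimination keeps a collider because it improves fit,
# even though doing so biases the exposure coefficient.
set.seed(11)
n <- 5e3
A <- rnorm(n)
Y <- 0.3 * A + rnorm(n)        # true total effect of A on Y is 0.3
C <- A + Y + rnorm(n)          # collider: common effect of A and Y

step_fit <- step(lm(Y ~ A + C), direction = "backward", trace = 0)
coef(step_fit)                 # C is retained; the A coefficient is biased
coef(lm(Y ~ A))["A"]           # ~0.3: the correct total effect
```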
Recognizing the limitations of purely mechanical stepwise regression, a hybrid approach, the “Purposeful Selection” algorithm, was proposed (Hosmer, Lemeshow, and Sturdivant 2013; Bursac et al. 2008) that combines statistical criteria with researcher judgment and confounding checks.
The Algorithm:
- Univariate Screening:
  - Evaluate all covariates individually.
  - Retain any variable with a univariate p-value \(< 0.25\). This relaxed threshold is crucial; it aims to capture potential confounders that may be weak individually but strong jointly, or whose effects are masked in univariate analysis.
- Multivariable Model:
  - Fit a model with all candidates identified in step 1.
  - Remove variables that are not significant at traditional levels (e.g., \(p < 0.05\)).
- Confounding Check: This is the distinguishing feature.
  - Before permanently discarding a variable, the analyst must check if its removal induces a major change (\(>15-20\%\)) in the coefficients of the remaining variables.
  - If it does, the variable is added back into the model as a confounder, regardless of its statistical significance.
- Refinement and Interactions: Excluded variables are added back one by one to check for residual significance. Finally, the model is checked for plausible interactions.
Insight: Purposeful Selection is widely cited in epidemiology because it operationalizes the definition of confounding within the selection process. Unlike rigid stepwise regression, it prioritizes the stability of the exposure coefficient over the parsimony of the outcome model. It forces the analyst to examine the data at each step, acting as a safeguard against the automation of causal errors.
Criticism: Purposeful Selection is now considered outdated and flawed by modern causal inference standards. Its fundamental weakness is that it remains entirely driven by statistical associations within the data rather than by a priori causal structure. The “confounding check” (Step 3), its distinguishing feature, is ironically its most critical flaw. This change-in-estimate (CIE) criterion cannot distinguish true confounders from colliders or mediators. In the case of a collider, adjusting for it induces a spurious association (bias), which causes a large change in the exposure’s coefficient. The algorithm misinterprets this induced bias as a sign of confounding and therefore retains the collider, leading to a biased final estimate. Because it is “causally blind,” it is not a safeguard against causal errors and is superseded by methods like those based on DAGs.
Algorithms such as LASSO and Random Forests are excellent for high-dimensional prediction. Their primary role in causal inference is in developing propensity score (PS) models, since modeling the exposure is itself a prediction task (Karim and Lei 2025). The goal is to create a score that balances measured covariates between the exposed and unexposed groups, mimicking randomization.
Criticism: Variance estimation can be poor depending on the machine learning method used for variable selection, often resulting in poor confidence interval coverage.
- High-Dimensional Propensity Score (hdPS) (Schneeweiss et al. 2009; Karim et al. 2025): designed for healthcare databases. It algorithmically scans thousands of proxy variables (e.g., prior diagnoses, medications) and selects those that are most likely to be confounders to include in the propensity score model.
- Machine learning versions of hdPS (Karim 2025; Karim, Pang, and Platt 2018): These models are excellent at capturing complex, non-linear relationships and interactions among covariates. See external workshop materials here.
- Post-double-selection method (Belloni, Chernozhukov, and Hansen 2014): It formally recognizes that a confounder must be related to both the exposure and the outcome. It uses a machine learning method (e.g., LASSO) to select all covariates that are predictive of the outcome, and then uses LASSO again to select all covariates that are predictive of the exposure. The final set of confounders to adjust for is the union (all variables from both lists), and the final estimate comes from a simple (non-penalized) regression adjusting for that union set. This algorithmically mimics the “Disjunctive Cause Criterion” (adjust for causes of the exposure or the outcome), is robust, and avoids the biases of selecting based only on the outcome. A minimal sketch appears after this list.
- Outcome-Adaptive Lasso (Shortreed and Ertefaie 2017; Baldé, Yang, and Lefebvre 2023): This is a variation of LASSO that essentially performs “double selection” in a single step. It’s a penalized regression (LASSO) for the outcome model, but the penalty for each covariate is adapted (weighted). Covariates that are strongly predictive of the exposure are given a smaller penalty, making them more likely to be kept in the final outcome model, regardless of their association with the outcome.
- Collaborative Targeted Maximum Likelihood Estimation (C-TMLE) (Laan and Gruber 2010): It uses machine learning (often a “Super Learner” that combines many ML algorithms) to build the best possible outcome model. Then, it collaboratively uses information from that model to decide which covariates also need to go into the propensity score model to minimize bias. This is an extension of the TMLE method that we cover later.
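Below is a hedged sketch of post-double-selection using the glmnet package on simulated data; the data-generating model and the lambda.min tuning rule are illustrative choices, not the authors' exact procedure:

```r
# Post-double-selection sketch with 'glmnet' (simulated data, 50 candidates).
library(glmnet)
set.seed(2025)
n <- 1000; p <- 50
X <- matrix(rnorm(n * p), n, p)
A <- rbinom(n, 1, plogis(0.8 * X[, 1] + 0.8 * X[, 2]))  # X1, X2 cause exposure
Y <- 1 * A + 1 * X[, 2] + 1 * X[, 3] + rnorm(n)         # X2, X3 cause outcome

selected <- function(fit) {                  # indices of nonzero lasso coefficients
  b <- as.matrix(coef(fit, s = "lambda.min"))[-1, 1]    # drop the intercept
  which(b != 0)
}
s_out <- selected(cv.glmnet(X, Y, family = "gaussian"))  # predictors of the outcome
s_exp <- selected(cv.glmnet(X, A, family = "binomial"))  # predictors of the exposure

keep <- union(s_out, s_exp)                  # the union set of candidate confounders
fit  <- lm(Y ~ A + X[, keep])                # simple, non-penalized final model
coef(fit)["A"]                               # estimate of the effect of A (truth = 1)
```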
A crucial, and often overlooked, aspect of statistical adjustment is the concept of collapsibility. An effect measure is said to be collapsible if the marginal (crude) measure of association is equal to a weighted average of the stratum-specific measures of association after conditioning on another variable. This property has profound implications for how we interpret adjusted estimates.
In the absence of confounding, some effect measures, like the Risk Difference (RD) and Risk Ratio (RR), are collapsible. This means that if a variable is not a confounder, adjusting for it will not change the effect estimate. However, other common measures, most notably the Odds Ratio (OR), are non-collapsible.
The non-collapsibility of the odds ratio is a mathematical property stemming from the non-linearity of the logistic model’s link function. It means that the adjusted OR can differ from the crude OR even when there is no confounding. This phenomenon, where an association in a population differs from the association within its subgroups, can arise even in the absence of confounding and is closely related to Simpson’s Paradox. This is precisely why the change-in-estimate criterion for confounder selection is invalid when using odds ratios—a change in the OR upon adjustment does not necessarily signal the presence of confounding.
Simpson’s Paradox is a statistical phenomenon where an association observed in a population is different from—and often in the opposite direction of—the associations observed in all of its subgroups. This paradox is a powerful illustration of how failing to account for a key third variable (a confounder or a collider) can lead to completely erroneous conclusions.
A famous example is the “Birthweight Paradox,” where maternal smoking appeared to be protective against infant mortality among low-birthweight infants, a finding that contradicted the known harms of smoking. This occurred because birthweight acted as a collider. Adjusting for it induced a spurious association between smoking and other unmeasured causes of mortality (e.g., birth defects).
The effect of an exposure may not be uniform across a population. A third variable can alter the exposure-outcome relationship, a phenomenon that leads to frequent confusion between two distinct concepts: interaction and effect modification.
Formal Definitions
While often used interchangeably, these terms address different causal questions:
- Effect Modification: This occurs when the causal effect of a single exposure (e.g., smoking) on an outcome (hypertension) differs across strata of a second variable (e.g., education level). The question is: “Is the effect of smoking different for people with high education versus people with low education?” This involves only one intervention (on smoking). The variable ‘education’ is treated as a baseline characteristic defining subgroups.
- Interaction: This refers to the joint causal effect of two exposures (e.g., smoking and low education) on an outcome (hypertension). The question is: “Is the effect of intervening on both smoking and education greater than the sum of the effects of intervening on each one alone?” This involves two distinct interventions and assesses synergy or antagonism.
Implications for Confounding Control
The distinction is critical for analytical strategy:
- To assess Effect Modification: When investigating whether education modifies the effect of smoking on hypertension, a researcher only needs to control for the set of confounders of the `smoking -> hypertension` relationship.
- To assess Interaction: When investigating the causal interaction between smoking and education, a researcher must control for all confounders of the `smoking -> hypertension` relationship AND all confounders of the `education -> hypertension` relationship. This is a much more demanding requirement.
The Role of the Scale: Effect Measure Modification
Whether modification is detected can depend on the statistical scale used (e.g., additive scale for Risk Difference vs. multiplicative scale for Risk Ratio). For this reason, the more precise term is effect measure modification. A statistical finding of interaction is a property of the chosen model and does not necessarily correspond to a specific biological mechanism.
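A small worked example makes the scale dependence explicit: the same four risks (numbers chosen purely for illustration) show no interaction on the multiplicative scale but positive interaction on the additive scale.

```r
# Risks by joint exposure status: r00 = neither, r10 = A only, r01 = B only,
# r11 = both. Multiplicative interaction is absent; additive is present.
r00 <- 0.10; r01 <- 0.20; r10 <- 0.20; r11 <- 0.40

(r11 / r00) - (r10 / r00) * (r01 / r00)    # multiplicative: 4 - 2*2 = 0
(r11 - r00) - (r10 - r00) - (r01 - r00)    # additive: 0.30 - 0.10 - 0.10 = 0.10
```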
Reporting guideline
See Knol and VanderWeele (2012)
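As a minimal sketch of the recommended reporting steps, the interaction measures can be computed from a single logistic model with a product term (simulated exposures; confidence intervals for RERI, which would require the delta method or bootstrap, are omitted for brevity):

```r
# Joint ORs, RERI, and ROR from one logistic model with a product term.
set.seed(9)
n <- 2e4
A <- rbinom(n, 1, 0.5); B <- rbinom(n, 1, 0.5)
Y <- rbinom(n, 1, plogis(-2 + 0.4 * A + 0.6 * B + 0.3 * A * B))

b <- coef(glm(Y ~ A * B, family = binomial))
OR10 <- unname(exp(b["A"]))                      # A alone vs. A=0, B=0
OR01 <- unname(exp(b["B"]))                      # B alone vs. A=0, B=0
OR11 <- unname(exp(b["A"] + b["B"] + b["A:B"]))  # joint effect vs. A=0, B=0

round(c(OR10 = OR10, OR01 = OR01, OR11 = OR11,
        RERI = OR11 - OR10 - OR01 + 1,           # additive-scale measure
        ROR  = unname(exp(b["A:B"]))), 2)        # multiplicative-scale measure
```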
To revisit or deepen your grasp of these two concepts, consider reviewing this external tutorial.
One of the most common errors in reporting observational research is the Table 2 Fallacy. This fallacy is the practice of presenting a single multivariable regression model and interpreting the coefficients for all variables—the primary exposure and all adjustment covariates—as if they are equally valid estimates of the total causal effect of each variable on the outcome.
Why A Single Model Fails: A DAG-Based Explanation
A multivariable regression model is built to answer a single, specific causal question. The adjustment set required to estimate the causal effect of one variable is often different from the set required to estimate the effect of another.
Consider a DAG for the effects of smoking, age, and hypertension:
- Causal Question 1: What is the total effect of Smoking on Hypertension?
  - Assume Age is a common cause of both Smoking and Hypertension. To get an unbiased estimate of the total effect of Smoking, one must adjust for Age. The appropriate model is `Hypertension ~ Smoking + Age`. The coefficient for Smoking can be interpreted as the total causal effect.
- Causal Question 2: What is the total effect of Age on Hypertension?
  - In this same DAG, Smoking may be a mediator of the effect of Age (i.e., `Age -> Smoking -> Hypertension`). To estimate the total effect of Age, one must not adjust for the mediator, Smoking. The model built for Question 1 does adjust for Smoking. Therefore, the coefficient for Age in that first model is not an estimate of the total effect; it is an estimate of the controlled direct effect—the effect of Age on Hypertension that does not operate through the Smoking pathway. A brief simulation after this list illustrates both models.
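The simulation below (coefficients are arbitrary) shows why each causal question needs its own regression:

```r
# "One exposure, one model": each causal question gets its own
# correctly specified regression.
set.seed(5)
n <- 1e4
Age     <- rnorm(n)
Smoking <- rbinom(n, 1, plogis(0.8 * Age))        # Age -> Smoking
Hyper   <- 0.5 * Smoking + 0.3 * Age + rnorm(n)   # both affect hypertension

# Q1: total effect of Smoking -- adjust for the confounder Age
coef(lm(Hyper ~ Smoking + Age))["Smoking"]  # ~0.5

# Q2: total effect of Age -- do NOT adjust for the mediator Smoking
coef(lm(Hyper ~ Age))["Age"]                # > 0.3: includes the mediated part
# In the Q1 model, the Age coefficient (~0.3) is only the controlled direct effect.
```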
Best Practices for Reporting
To avoid the Table 2 Fallacy, analysis and reporting must be driven by a “one exposure, one model” principle:
- Be Explicit: Clearly state the single primary exposure of interest for each model.
- Use Multiple Models: If causal effects are desired for multiple variables, fit a separate, correctly specified model for each one.
- Structure Tables Clearly: The primary results table should only show the effect estimate for the main exposure of interest. The covariates used for adjustment should be listed in a footnote, not in the table with their own effect estimates.
Video Lesson Slides
Confounding
Effect modification
Table 2 fallacy
Links
Confounding
Effect modification
Table 2 fallacy