Notation and glossary

Note

This appendix collects the symbols, estimands, and terminology conventions used throughout the book so that notation stays consistent across modules. Where individual chapters historically used a different word for the same idea, this page states the preferred convention.

Variables and symbols

Symbol	Meaning
\(Y\)	Outcome variable
\(A\)	Exposure (the primary variable whose effect we study). We use exposure throughout for observational studies; treatment is used only where it is part of an established term (e.g., “average treatment effect”).
\(L\), \(C\), \(X\)	Covariates / potential confounders
\(M\)	Mediator
\(Y^{a}\) or \(Y(a)\)	Potential (counterfactual) outcome that would be observed if exposure were set to \(a\)
\(E[\cdot]\)	Expectation. Conditional expectation is written \(E[Y \mid A=a]\); the mean of a potential outcome is \(E[Y(a)]\).
\(\hat{\theta}\)	An estimate of a parameter \(\theta\)

Estimands (what we are trying to estimate)

Term	Symbol	Definition
Average Treatment Effect	ATE	\(E[Y(1) - Y(0)]\): the average effect in the whole target population, on the difference (risk-difference) scale
Average Treatment Effect on the Treated	ATT	\(E[Y(1) - Y(0) \mid A=1]\): the average effect among the exposed
Individual Treatment Effect	ITE	\(Y_i(1) - Y_i(0)\) for a single unit \(i\) (generally not identifiable)
Total Effect	TE	The full effect of \(A\) on \(Y\), i.e. direct + indirect (mediated) effect
Natural Direct Effect	NDE	Effect of \(A\) on \(Y\) not through \(M\), with \(M\) left at the value it would naturally take under the reference exposure level
Natural Indirect Effect	NIE	Effect of \(A\) on \(Y\) operating through \(M\)
Controlled Direct Effect	CDE	Effect of \(A\) on \(Y\) when \(M\) is fixed to a specific value (e.g., \(M=0\))
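
As a rough illustration of how the ITE, ATE, and ATT in the table relate to one another, the Python sketch below uses a hypothetical four-unit dataset in which both potential outcomes are listed for every unit. Real data never provide this (only one potential outcome is observed per unit), and the values and variable names are assumptions made purely for illustration.

```python
from statistics import mean

# Hypothetical toy data (an assumption for illustration only): each unit lists
# its exposure A and BOTH potential outcomes, which real data never provide.
units = [
    {"a": 1, "y0": 0, "y1": 1},
    {"a": 1, "y0": 0, "y1": 1},
    {"a": 0, "y0": 0, "y1": 0},
    {"a": 0, "y0": 1, "y1": 1},
]

# ITE: Y_i(1) - Y_i(0) for each unit i.
ite = [u["y1"] - u["y0"] for u in units]

# ATE: mean ITE over the whole (toy) population.
ate = mean(ite)

# ATT: mean ITE among the exposed units only (A = 1).
att = mean([d for u, d in zip(units, ite) if u["a"] == 1])

print(f"ITE = {ite}, ATE = {ate}, ATT = {att}")
```

In this toy population the ATE and ATT differ because the effect of exposure is not the same among the exposed and the unexposed.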

Tip

Effect terminology. Use ITE for an individual-level effect, TE for the total effect, and CDE vs NDE/NIE to distinguish fixing the mediator from leaving it at its natural value. Earlier drafts occasionally wrote “TE” for an individual effect or “TCE” for the total effect — prefer ITE and TE respectively.
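
One common way to make these distinctions precise uses nested potential outcomes. The notation \(Y(a, m)\) (the outcome if exposure were set to \(a\) and the mediator to \(m\)) and \(M(a)\) (the mediator value if exposure were set to \(a\)) is not defined elsewhere in this appendix and is introduced here only as an assumption for illustration, for a binary exposure with reference level \(0\):

\[
\begin{aligned}
\mathrm{TE} &= E[Y(1, M(1))] - E[Y(0, M(0))] \\
\mathrm{NDE} &= E[Y(1, M(0))] - E[Y(0, M(0))] \\
\mathrm{NIE} &= E[Y(1, M(1))] - E[Y(1, M(0))] \\
\mathrm{CDE}(m) &= E[Y(1, m)] - E[Y(0, m)]
\end{aligned}
\]

Under this convention the term \(E[Y(1, M(0))]\) cancels when the NDE and NIE are added, so \(\mathrm{TE} = \mathrm{NDE} + \mathrm{NIE}\) on the difference scale. The CDE depends on the chosen value \(m\) and is not part of this decomposition in general.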

Measures of association/effect

RD (risk difference), RR (risk ratio), OR (odds ratio), HR (hazard ratio).
Marginal vs conditional. A marginal effect is averaged over the covariate distribution; a conditional effect holds covariates fixed. The OR is non-collapsible, so in non-linear models (e.g., logistic regression) a conditional OR and a marginal OR generally differ even with no confounding. Always state which of the two a reported quantity represents.
Collapsibility. A measure is collapsible if, absent confounding, the marginal measure equals a weighted average of the stratum-specific measures. RD and RR are collapsible; OR and HR are not.
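
The non-collapsibility of the OR can be checked with a small numerical sketch. The risks below are hypothetical values chosen for illustration, and exposure is assumed to be independent of the covariate \(L\) (as in a randomized trial), so any gap between the conditional and marginal OR is not due to confounding.

```python
def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1 - p)

# Hypothetical risks P(Y=1 | A=a, L=l) in two equally sized strata of L.
# A is assumed independent of L, so there is no confounding by L.
risk = {
    0: {"exposed": 0.8, "unexposed": 0.5},  # stratum L = 0
    1: {"exposed": 0.5, "unexposed": 0.2},  # stratum L = 1
}
p_l = {0: 0.5, 1: 0.5}  # P(L = l)

# Conditional (stratum-specific) measures.
cond_or = {l: odds(r["exposed"]) / odds(r["unexposed"]) for l, r in risk.items()}
cond_rd = {l: r["exposed"] - r["unexposed"] for l, r in risk.items()}

# Marginal risks: average the stratum-specific risks over the distribution of L.
p1 = sum(p_l[l] * risk[l]["exposed"] for l in risk)
p0 = sum(p_l[l] * risk[l]["unexposed"] for l in risk)

marg_or = odds(p1) / odds(p0)
marg_rd = p1 - p0

print("conditional OR by stratum:", cond_or)
print("marginal OR:", round(marg_or, 3))
print("conditional RD by stratum:", cond_rd)
print("marginal RD:", round(marg_rd, 3))
```

Both strata share the same conditional OR, yet the marginal OR is closer to 1, while the marginal RD matches the common stratum-specific RD.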

Weights

Different “weights” appear in different modules; they are not interchangeable:

Weight	Where used	Purpose
Survey (sampling) weight	Complex Survey Data (D)	Inverse probability of selection into the sample; makes estimates representative of the target population
IPTW (inverse-probability-of-treatment weight)	Propensity Score (S), Causal ML (C)	Inverse probability of the observed exposure; creates a pseudo-population in which exposure is independent of measured confounders
Matching weight	Propensity Score (S)	Weight induced by a matching scheme (e.g., 1:k matching, matching with replacement)
MI / analysis weight	Missing Data (M)	Combining/aggregating across multiply imputed datasets

When survey weights and IPTW (or matching weights) both apply, they are multiplied to form a combined weight.
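
A minimal Python sketch of the multiplication step follows. The rows, the column names (`ps`, `survey_wt`, `combined_wt`), and the helper `iptw` are hypothetical, and the propensity scores are assumed to have been estimated already; how they should be estimated under a complex survey design is outside the scope of this sketch.

```python
def iptw(a, ps):
    """Unstabilized IPTW: inverse probability of the exposure actually received.

    a  : observed binary exposure (0 or 1)
    ps : propensity score P(A = 1 | L), assumed to lie strictly in (0, 1)
    """
    if not 0 < ps < 1:
        raise ValueError("propensity score must be strictly between 0 and 1")
    return 1 / ps if a == 1 else 1 / (1 - ps)

# Hypothetical rows: exposure, an already-estimated propensity score,
# and a survey (sampling) weight.
rows = [
    {"a": 1, "ps": 0.25, "survey_wt": 100.0},
    {"a": 0, "ps": 0.20, "survey_wt": 50.0},
    {"a": 1, "ps": 0.50, "survey_wt": 80.0},
]

for r in rows:
    r["iptw"] = iptw(r["a"], r["ps"])
    # Combined weight: product of the survey weight and the IPTW.
    r["combined_wt"] = r["survey_wt"] * r["iptw"]

for r in rows:
    print(r)
```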

Key causal assumptions

SUTVA (stable unit treatment value assumption): no interference between units and one version of each exposure level.
Exchangeability / no unmeasured confounding: \(Y(a) \perp A \mid L\).
Positivity: \(0 < P(A=a \mid L=l) < 1\) for every covariate stratum \(l\) that occurs in the target population.
Consistency: the observed outcome under the observed exposure equals the corresponding potential outcome.
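
To show where each assumption enters, here is a sketch of the standard identification argument for \(E[Y(a)]\). It assumes, for simplicity, that \(L\) is discrete; that restriction is an assumption of this sketch rather than of the book.

\[
\begin{aligned}
E[Y(a)] &= \sum_{l} E[Y(a) \mid L=l]\, P(L=l) && \text{(law of total expectation)} \\
        &= \sum_{l} E[Y(a) \mid A=a, L=l]\, P(L=l) && \text{(exchangeability; positivity)} \\
        &= \sum_{l} E[Y \mid A=a, L=l]\, P(L=l) && \text{(consistency)}
\end{aligned}
\]

Positivity guarantees that the conditional expectation given \(A=a, L=l\) is defined in every stratum, and SUTVA is what makes \(Y(a)\) well defined in the first place.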

Missing-data mechanisms

MCAR (missing completely at random): missingness unrelated to observed or unobserved data.
MAR (missing at random): given the observed data, missingness does not depend on unobserved values.
MNAR (missing not at random): missingness depends on unobserved values even after conditioning on the observed data.
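
The three mechanisms can be contrasted with a small exact calculation. The sketch below assumes a hypothetical joint distribution for a fully observed binary covariate `X` and a binary outcome `Y` that may be missing. The distribution, the missingness probabilities, and all function names are illustrative assumptions.

```python
# Hypothetical joint distribution P(X = x, Y = y): X is always observed,
# Y is subject to missingness.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Hypothetical probabilities that Y is missing, one function per mechanism.
def p_miss_mcar(x, y):
    return 0.3             # unrelated to X and Y

def p_miss_mar(x, y):
    return 0.1 + 0.4 * x   # depends only on the observed X

def p_miss_mnar(x, y):
    return 0.1 + 0.4 * y   # depends on the possibly unobserved Y itself

def complete_case_mean(p_miss, x_value=None):
    """Mean of Y among units with Y observed, optionally within X = x_value."""
    num = den = 0.0
    for (x, y), p in joint.items():
        if x_value is not None and x != x_value:
            continue
        p_obs = p * (1 - p_miss(x, y))  # probability of this cell AND Y observed
        num += y * p_obs
        den += p_obs
    return num / den

true_mean = sum(y * p for (x, y), p in joint.items())
true_mean_x1 = joint[(1, 1)] / (joint[(1, 0)] + joint[(1, 1)])

print("true E[Y]:", true_mean, "| true E[Y | X=1]:", true_mean_x1)
for name, f in [("MCAR", p_miss_mcar), ("MAR", p_miss_mar), ("MNAR", p_miss_mnar)]:
    print(name, "overall:", round(complete_case_mean(f), 3),
          "| within X=1:", round(complete_case_mean(f, 1), 3))
```

In this toy setup the complete-case mean recovers the true mean under MCAR, recovers it only within levels of `X` under MAR, and is off even within levels of `X` under MNAR.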

Other recurring terms

MSM (marginal structural model): a model for the marginal mean of the potential outcomes, \(E[Y(a)]\), typically estimated by an IPTW-weighted GLM/GEE. The weighting (not the GEE machinery alone) is what handles time-varying treatment–confounder feedback.
SMD (standardized mean difference): a scale-free balance measure, the between-group difference in a covariate mean divided by a pooled standard deviation. This book uses SMD < 0.2 as the working balance threshold across tutorials and exercise solutions.
R Markdown: written as two words (“R Markdown”) throughout; file extension .Rmd. This book is itself built with Quarto (.qmd), a successor to R Markdown / bookdown.
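
Returning to the SMD entry above, the following Python sketch shows one common way to compute it for a single continuous covariate. The pooled standard deviation \(\sqrt{(s_1^2 + s_0^2)/2}\) is an assumed convention, the function name `smd` and the data are hypothetical, and the book's own tutorials may compute the SMD with a dedicated package instead.

```python
from math import sqrt
from statistics import mean, variance

def smd(exposed, unexposed):
    """Standardized mean difference for one continuous covariate.

    Uses the pooled SD sqrt((s1^2 + s0^2) / 2), one common convention
    (an assumption of this sketch).
    """
    pooled_sd = sqrt((variance(exposed) + variance(unexposed)) / 2)
    return (mean(exposed) - mean(unexposed)) / pooled_sd

# Hypothetical covariate values (e.g., age) in each exposure group.
age_exposed = [50, 60, 70]
age_unexposed = [51, 61, 71]

value = smd(age_exposed, age_unexposed)
# Compare the absolute SMD with the book's working threshold of 0.2.
balanced = abs(value) < 0.2

print(f"SMD = {value:.3f}, balanced = {balanced}")
```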