Chapter 1 Defining Parameter

1.1 Epidemiological research goals

Two common goals for epidemiological research are prediction and causal inference:

Prediction goal: The primary objective of a prediction goal is to forecast the occurrence or risk of an outcome ($Y$) based on one or more risk factors ($A$). The focus of this goal is often on making accurate predictions.

Causal goal: The causal goal focuses on understanding the causal relationship between a risk factor (often a treatment, $A$) and a health outcome ($Y$). Control for confounding factors ($L$) is often a necessary step in understanding such a relationship. The focus of this goal is often on estimating the parameter ‘treatment effect’.

We focus only on estimating the treatment effect today. To do so, let us first define the notation.

1.2 Potential outcome

$A$: Exposure status
- $1$ = takes Rosuvastatin
- $0$ = does not take Rosuvastatin
$Y$: Outcome (total cholesterol level)
- $Y(A=1)$ = potential outcome when exposed
- $Y(A=0)$ = potential outcome when not exposed

The relationship between the observed outcome $Y$ and the potential outcomes $[Y(A=1), Y(A=0)]$ can be expressed as follows: $Y = A \times Y(A=1) + (1-A) \times Y(A=0)$
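The relationship above can be sketched in R. The potential outcomes and treatment indicators below are illustrative values for five hypothetical subjects; in real data only one of `Y1` and `Y0` would be known for each person.

```r
# Illustrative potential outcomes for five hypothetical subjects
Y1 <- c(195, 100, 210, 155, 165)  # total cholesterol if treated
Y0 <- c(245, 160, 270, 210, 230)  # total cholesterol if untreated
A  <- c(0, 1, 0, 1, 0)            # treatment actually received

# Observed outcome: Y(1) is revealed when A = 1, Y(0) when A = 0
Y <- A * Y1 + (1 - A) * Y0
Y
# 245 100 270 155 230
```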

1.3 Parameters of interest

When assessing the effect of an exposure on an outcome, we are interested in the following estimands:

treatment effect for an individual (TE)
average treatment effect (ATE)
average treatment effect on the treated (ATT)

1.3.1 TE

If John takes Rosuvastatin $(A=1)$, his total cholesterol level after 3 months is $Y(A=1) = 195$ mg/dL (milligrams per deciliter).
If John does not take Rosuvastatin $(A=0)$, his total cholesterol level after 3 months is $Y(A=0) = 245$ mg/dL.
The effect of Rosuvastatin on John is then

$TE = Y(A=1) - Y(A=0) = 195 - 245 = -50$ mg/dL

TE is generally not estimable, because we can't observe the same person's outcomes under both treatment conditions.

1.3.2 ATE

library(knitr)       # kable()
library(kableExtra)  # row_spec() and the %>% pipe
# Potential outcomes under treatment (Y1) and under no treatment (Y0)
Person <- c("John", "Jim", "Jake", "Cody", "Luke")
Y1 <- c(195, 100, 210, 155, 165)
Y0 <- c(245, 160, 270, 210, 230)
PotentialOutcomes <- data.frame(Person, Y1, Y0, TE = Y1 - Y0)
# Append a sixth row that holds the column means
mean.values <- c(NA, mean(PotentialOutcomes$Y1),
                 mean(PotentialOutcomes$Y0),
                 mean(PotentialOutcomes$TE))
PotentialOutcomes <- rbind(PotentialOutcomes, mean.values)
kable(PotentialOutcomes, booktabs = TRUE,
      col.names = c("Person", "Y(1)", "Y(0)", "TE")) %>%
  row_spec(6, bold = TRUE, color = "white", background = "#D7261E")

Person	Y(1)	Y(0)	TE
John	195	245	-50
Jim	100	160	-60
Jake	210	270	-60
Cody	155	210	-55
Luke	165	230	-65
	165	223	-58

$ATE = E[Y(A=1)-Y(A=0)]$

# Average over the five subjects only (row 6 holds the column means)
mean(PotentialOutcomes$Y1[1:5] - PotentialOutcomes$Y0[1:5])

## [1] -58
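
Because expectation is linear, the ATE can equivalently be written as the difference between the two mean potential outcomes. A short worked version, using the column means from the table above (165 and 223):

$$
\begin{aligned}
ATE &= E[Y(A=1) - Y(A=0)] \\
    &= E[Y(A=1)] - E[Y(A=0)] \\
    &= 165 - 223 = -58
\end{aligned}
$$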

1.3.3 Interpretation of ATE

This is the average treatment effect for the following hypothetical comparison:

having the entire population treated, vs.
having the entire population untreated.

The entire population is the reference group here.

1.3.4 Identifiability Assumptions

Real-world scenario (the outcomes under both treatments cannot be observed for the same person):

# Observed data: each subject reveals only one potential outcome
Person <- c("John", "Jim", "Jake", "Cody", "Luke")
Y1 <- c(NA, 100, NA, 155, NA)
Y0 <- c(245, NA, 270, NA, 230)
PotentialOutcomes <- data.frame(Person, Y1, Y0, TE = Y1 - Y0)
# Means of the observed values; every individual TE is NA, so the
# last entry is the difference between the two group means
m1 <- mean(Y1, na.rm = TRUE)
m0 <- mean(Y0, na.rm = TRUE)
mean.values <- c(NA, m1, m0, m1 - m0)
PotentialOutcomes <- rbind(PotentialOutcomes, round(mean.values, 1))
kable(PotentialOutcomes, booktabs = TRUE,
      col.names = c("Person", "Y(1)", "Y(0)", "TE")) %>%
  row_spec(6, bold = TRUE, color = "white", background = "#D7261E")

Person	Y(1)	Y(0)	TE
John		245.0
Jim	100.0
Jake		270.0
Cody	155.0
Luke		230.0
	127.5	248.3	-120.8

We can rearrange it as follows:

Person <- c("John","Jim","Jake","Cody","Luke")
A <- c( 0, 1, 0, 1, 0)
Y <- c(245, 100, 270, 155, 230)
RealOutcomes <- data.frame(Person, A, Y)
kable(RealOutcomes, booktabs = TRUE, 
             col.names = c("Person", "A", "Y"))

Person	A	Y
John	0	245
Jim	1	100
Jake	0	270
Cody	1	155
Luke	0	230

If we can compute a causal quantity, such as $ATE = E[Y(A=1)-Y(A=0)]$ (or mean(PotentialOutcomes$Y1 - PotentialOutcomes$Y0)), using a statistical quantity, such as $E[Y|A=1]-E[Y|A=0]$ (or mean(Y[A == 1]) - mean(Y[A == 0])), we say that the causal quantity is identifiable. Identifiability requires the following assumptions:

Assumption	Notation	Interpretation
Exchangeability	$Y(1), Y(0) \perp A$	Treatment assignment is independent of the potential outcomes
Positivity	$0 < P(A=1) < 1$	Subjects are eligible to receive either treatment
Consistency	$Y = Y(a)$ whenever $A = a$	The observed outcome equals the potential outcome under the treatment received; no multiple versions of the treatment
No interference		Treating one patient does not affect the outcomes of others

Note that, from the observed data, the estimate of the average TE is (100+155)/2 - (245+270+230)/3 = -120.8. Alternatively, we can obtain the same value as the beta coefficient associated with $A$ in a linear regression:

round(coef(lm(Y~A)),1)

## (Intercept)           A 
##       248.3      -120.8

Here, the beta coefficient associated with $A$ is -120.8, which differs from the average TE of -58 that we obtained from the potential outcome table above. Part of the difference is finite-sample bias: we have only 5 data points rather than an infinite population, and the two treated subjects (Jim and Cody) happen to have lower potential outcomes than the three untreated subjects. With a large enough sample in which the assumptions above hold, we would expect the estimate to be close to the true average TE.
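
The large-sample claim can be checked with a small simulation. This is only a sketch under stated assumptions: the population size, the normal distribution for `Y0`, the constant effect of -58, and the seed are all made up for illustration, and treatment is randomized so that exchangeability holds by design.

```r
set.seed(123)  # arbitrary seed, for reproducibility only
n  <- 100000   # hypothetical large population

Y0 <- rnorm(n, mean = 223, sd = 40)    # potential outcome if untreated (assumed distribution)
Y1 <- Y0 - 58                          # assumed constant treatment effect of -58
A  <- rbinom(n, size = 1, prob = 0.5)  # randomized treatment, independent of Y0 and Y1
Y  <- A * Y1 + (1 - A) * Y0            # observed outcome (consistency)

true_ate <- mean(Y1 - Y0)                       # causal quantity, known only in a simulation
est_ate  <- mean(Y[A == 1]) - mean(Y[A == 0])   # statistical quantity from observed data
c(true = true_ate, estimated = est_ate)
```

With randomized treatment and this many subjects, the two numbers should agree closely; rerunning with a much smaller `n` shows how far a small sample can drift.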

You can find a more detailed exploration of estimation in a different tutorial that uses real data.

Extending these assumptions when confounders exist:

Assumption	Notation	Interpretation
Conditional Exchangeability	$Y(1), Y(0) \perp A \mid L$	Treatment assignment is independent of the potential outcomes, given $L$
Positivity	$0 < P(A=1 \mid L) < 1$	Subjects are eligible to receive either treatment, given $L$

Here, $L$ denotes a confounder; age could be an example.
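
To illustrate why conditioning on $L$ matters, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: a single binary confounder `L` (for example, older age), treatment probabilities of 0.7 and 0.2, and a constant effect of -58. The adjustment shown is simple stratification on `L` followed by averaging over its distribution, which is one of several possible approaches.

```r
set.seed(42)  # arbitrary seed
n <- 100000   # hypothetical large sample

L  <- rbinom(n, 1, 0.4)                        # binary confounder (assumed)
A  <- rbinom(n, 1, ifelse(L == 1, 0.7, 0.2))   # treatment depends on L; 0 < P(A=1|L) < 1
Y0 <- rnorm(n, mean = 200 + 40 * L, sd = 20)   # L also raises the outcome
Y1 <- Y0 - 58                                  # assumed constant treatment effect
Y  <- A * Y1 + (1 - A) * Y0

# Unadjusted contrast: mixes the treatment effect with the effect of L
naive <- mean(Y[A == 1]) - mean(Y[A == 0])

# Contrast within each level of L, then a weighted average over P(L = l)
diff_by_L <- sapply(0:1, function(l) {
  mean(Y[A == 1 & L == l]) - mean(Y[A == 0 & L == l])
})
adjusted <- sum(diff_by_L * c(mean(L == 0), mean(L == 1)))

c(naive = naive, adjusted = adjusted)
```

Under these assumptions the adjusted contrast should sit near -58, while the naive contrast is pulled toward zero because treated subjects more often have `L = 1`.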

1.3.5 ATT

Assume that the following confounders affect the relationship between Rosuvastatin and cholesterol levels:
- race
- sex
- age
We have 5 Rosuvastatin-treated subjects who are all
- white,
- male,
- 50 years of age
We recruited 5 additional subjects with the same characteristics to the non-Rosuvastatin group.

Treated group:

Person <- c("John","Jim","Jake","Cody","Luke")
Y1 <- c( 195, 100, 210, 155, 165)
Y0 <- rep(NA, length(Y1))
Treated <- data.frame(Person, Y1, Y0, TE = Y1-Y0)
Treated[6,2] <- mean(Treated$Y1)
kable(Treated, booktabs = TRUE, 
             col.names = c("Person", "Y(1)", "Y(0)", "TE"))%>%
  row_spec(6, bold = T, color = "white", background = "#D7261E")

Person	Y(1)	Y(0)	TE
John	195
Jim	100
Jake	210
Cody	155
Luke	165
	165

Untreated group: New folks with characteristics similar to the treated group.

Person <- c( "Jack", "Dustin", "Cole", "Lucas", "Dylan")
Y0 <- c( 245, 160, 270, 210, 165)
Y1 <- rep(NA, length(Y0))
Untreated <- data.frame(Person, Y1, Y0, TE = Y1-Y0)
Untreated[6,3] <- mean(Untreated$Y0)
kable(Untreated, booktabs = TRUE, 
             col.names = c("Person", "Y(1)", "Y(0)", "TE"))%>%
  row_spec(6, bold = T, color = "white", background = "#D7261E")

Person	Y(1)	Y(0)	TE
Jack		245
Dustin		160
Cole		270
Lucas		210
Dylan		165
		210

$ATT = E[Y(A=1)-Y(A=0) | A = 1]$

# Use the five subjects in each group only (row 6 holds the group mean)
mean(Treated$Y1[1:5]) - mean(Untreated$Y0[1:5])

## [1] -45
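
Written out, the ATT contrasts two means within the treated group, and only the first of them is observed. A sketch of the calculation above, assuming that the matched untreated subjects can stand in for the treated subjects' unobserved $Y(A=0)$ (the label "matched" is informal notation introduced here):

$$
\begin{aligned}
ATT &= E[Y(A=1) \mid A=1] - E[Y(A=0) \mid A=1] \\
    &\approx E[Y \mid A=1] - E[Y \mid A=0, \text{matched}] \\
    &= 165 - 210 = -45
\end{aligned}
$$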

1.3.6 Interpretation of ATT

This is the average treatment effect comparing

the treated population (reference group), vs.
an untreated population with characteristics similar to those of the reference group (the treated population).

It is also possible to change the reference population to the untreated population. The estimand is then called the Average Treatment Effect for the Untreated (ATU).

1.3.7 ATT vs. ATE

In an RCT (with large enough n), the ATT and ATE are equivalent.
In an observational study, the ATT and ATE are not necessarily the same.
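
A small simulation can make this contrast concrete. It is a sketch under assumptions that are not part of the examples above: a binary characteristic `L`, a treatment effect that is larger when `L = 1`, and two hypothetical assignment mechanisms (randomized, and dependent on `L`). Because both potential outcomes are simulated, the ATE and ATT can be computed directly rather than estimated.

```r
set.seed(2024)  # arbitrary seed
n  <- 100000    # hypothetical large population

L  <- rbinom(n, 1, 0.5)              # hypothetical binary characteristic
Y0 <- rnorm(n, mean = 220, sd = 30)  # potential outcome if untreated
Y1 <- Y0 - ifelse(L == 1, 80, 40)    # effect is larger when L = 1 (assumed)
TE <- Y1 - Y0                        # individual effects, known only in a simulation

A_rct <- rbinom(n, 1, 0.5)                        # randomized assignment
A_obs <- rbinom(n, 1, ifelse(L == 1, 0.8, 0.2))   # assignment depends on L

ate     <- mean(TE)              # effect in the whole population
att_rct <- mean(TE[A_rct == 1])  # effect among the treated, randomized
att_obs <- mean(TE[A_obs == 1])  # effect among the treated, observational
round(c(ATE = ate, ATT_RCT = att_rct, ATT_observational = att_obs), 1)
```

Under randomization the treated resemble the whole population, so the ATT tracks the ATE. When assignment depends on `L` and the effect varies with `L`, the treated group is dominated by subjects with the larger effect and the two estimands separate.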
