Concepts (P)

Research goals

Epidemiologists identify four types of inferential goals in research:

  1. Prediction: Establishing models to predict future outcomes based on current or past information.
  2. Evaluating an exposure of primary interest: Assessing the effect of one specific exposure variable on the outcome.
  3. Identifying the important independent predictors of an outcome: Determining which variables most significantly affect the outcome and understanding the strength and nature of these relationships.
  4. Descriptive: Describing the data and the relationships within it.

Reading list

Key reference: (Vittinghoff et al. 2011)

Optional reading:

Exercise:

Which type of goal does this article have?

Video Lessons

Inferential goals in an epidemiological study

Prediction, causal, important predictors, descriptive

Goal 1: Prediction model

discrimination, calibration, overfitting, validation, model selection

Goal 2: Causal exploration

outcome vs. exposure of primary interest

Goal 3: Outcome vs. multiple exposures

Centering and scaling

Choosing reference level

Criteria to determine the appropriate reference level for a categorical covariate:

  • If the covariate is at least ordinal and serves primarily as an adjustment variable, select either the lowest or the highest category as the reference level. With the reference at one end of the scale, the estimates for the remaining categories can reveal a potential dose-response relationship.

  • Consider what you wish to emphasize in your interpretation. For instance, if the aim is to describe the effect of an unhealthy diet, choosing the “healthy” category as the reference level matches the causal question being asked, because each estimate then compares a less healthy category against the healthy one.

  • In general, designate the category with the highest frequency as the reference level. This has a statistical advantage, especially when the categories are imbalanced. Avoid choosing a low-frequency category as the reference level: every other category is compared against it, so the regression estimates may become highly unstable.
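
The criteria above can be sketched in a few lines of Python. This is a minimal illustration, not a method from the lesson. The helper names (choose_reference, dummy_code), the ordered_levels argument, and the toy diet data are all assumptions made for the example. The sketch takes the lowest level when an ordering is supplied, otherwise the most frequent category, and then builds the 0/1 indicator columns that a regression would use for the non-reference levels.

```python
from collections import Counter


def choose_reference(values, ordered_levels=None):
    """Pick a reference level for a categorical covariate (hypothetical helper).

    If ordered_levels is supplied (ordinal covariate), return its lowest level.
    Otherwise return the most frequent category in values.
    """
    if ordered_levels is not None:
        return ordered_levels[0]
    counts = Counter(values)
    # most_common(1) returns [(level, count)]; ties go to the first-seen level.
    return counts.most_common(1)[0][0]


def dummy_code(values, reference):
    """Treatment (dummy) coding: one 0/1 indicator per non-reference level."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == lvl else 0 for lvl in levels] for v in values]
    return levels, rows


# Toy data, invented for illustration only.
diet = ["healthy", "unhealthy", "healthy", "moderate", "healthy", "unhealthy"]

ref = choose_reference(diet)          # most frequent category
levels, rows = dummy_code(diet, ref)  # indicators for the remaining levels

print("reference:", ref)
print("indicator columns:", levels)
for value, row in zip(diet, rows):
    print(value, row)
```

Here the most frequent category is also the “healthy” one, so the frequency criterion and the interpretive criterion agree. When they conflict, the choice is a judgment call between stability of the estimates and ease of interpretation.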

Video Lesson Slides

References

Greenland, Sander, and Neil Pearce. 2015. “Statistical Foundations for Model-Based Adjustments.” Annual Review of Public Health 36: 89–108. https://doi.org/10.1146/annurev-publhealth-031914-122531.
Kuhn, Max, and Kjell Johnson. 2013. Applied Predictive Modeling. Vol. 26. Springer.
Vittinghoff, Eric, David V. Glidden, Stephen C. Shiboski, and Charles E. McCulloch. 2011. “Predictor Selection.” In Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models. Springer.
Williamson, Elizabeth J., Alex J. Walker, Krishnan Bhaskaran, Seb Bacon, Chris Bates, Caroline E. Morton, Helen J. Curtis, et al. 2020. “Factors Associated with COVID-19-Related Death Using OpenSAFELY.” Nature 584 (7821): 430–36.