Concepts (M)

Missing Data Analysis

This section covers how to understand, categorize, and address missing data in clinical and epidemiological research. It highlights how common missing data are in these fields, how often complete case analysis is used without considering its implications, and the three types of missingness: Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR, also written NMAR), each requiring different approaches and considerations. The consequences of not properly addressing missing data are bias, incorrect standard errors (and therefore misstated precision), and a substantial loss of power.
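To make the three mechanisms concrete, the following is a minimal simulation sketch, not material from the lessons. The variable names (`age`, `sbp`), the coefficients, and the missingness probabilities are all hypothetical choices made for illustration. It deletes outcome values under each mechanism and compares the complete-case mean with the mean of the full data.

```python
import math
import random
import statistics


def simulate(n=20000, seed=1):
    """Simulate (age, sbp) pairs; all numbers here are illustrative assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.gauss(50, 10)                          # always observed
        sbp = 120 + 0.8 * (age - 50) + rng.gauss(0, 10)  # may go missing
        rows.append((age, sbp))
    return rows


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def complete_case_mean(rows, mechanism, rng):
    """Mean of sbp among rows whose sbp survives the chosen missingness mechanism."""
    observed = []
    for age, sbp in rows:
        if mechanism == "MCAR":
            p_miss = 0.3                          # unrelated to any variable
        elif mechanism == "MAR":
            p_miss = logistic((age - 50) / 5)     # depends only on observed age
        elif mechanism == "MNAR":
            p_miss = logistic((sbp - 120) / 5)    # depends on the unseen value itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if rng.random() >= p_miss:
            observed.append(sbp)
    return statistics.fmean(observed)


rows = simulate()
full_mean = statistics.fmean(sbp for _, sbp in rows)
print(f"full data: {full_mean:.2f}")
for mech in ("MCAR", "MAR", "MNAR"):
    cc = complete_case_mean(rows, mech, random.Random(2))
    print(f"{mech}: complete-case mean {cc:.2f}")
```

In this setup the complete-case mean should stay close to the full-data mean under MCAR and drift away from it under MAR and MNAR. Under MAR the drift can in principle be corrected using the observed `age`, whereas under MNAR the observed data alone do not contain enough information to correct it.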

This section also covers strategies for addressing missing data, focusing on ad-hoc approaches and imputation methods. Ad-hoc approaches, such as ignoring missing data or using a missing category indicator, are generally dismissed as statistically invalid. In contrast, imputation, particularly multiple imputation (MI), is presented as a more robust and statistically sound method. Multiple imputation creates several complete datasets by predicting the missing values, analyzes each dataset separately, and then pools the results so that the final estimates reflect the uncertainty associated with the missing data. The section further discusses the types of imputation, the need to include a sufficient number of predictive variables, and the use of subject-area knowledge in building imputation models.
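The pooling step is commonly done with Rubin's rules. The sketch below assumes each of the m imputed datasets has already been analyzed and has produced one point estimate and one squared standard error. The function name `pool_rubin` and the example numbers are hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool m estimates and their variances (squared SEs) using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates, each with a variance")
    pooled = statistics.fmean(estimates)       # average point estimate
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between     # total variance
    return {
        "estimate": pooled,
        "within": within,
        "between": between,
        "total": total,
        "se": math.sqrt(total),
    }


# Hypothetical results from m = 5 imputed datasets
result = pool_rubin([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5)
print(result)
```

For these made-up inputs the pooled estimate is 1.0, and the total variance is 0.04 + 1.2 × 0.025 = 0.07. The between-imputation term is what carries the extra uncertainty due to the missing values; a single imputed dataset would omit it and report a standard error that is too small.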

The Reporting Guideline section covers the complexities of handling missing data in statistical analysis, primarily through MI methods, especially Multiple Imputation by Chained Equations (MICE). It lays out the missingness assumptions relevant to these methods (MCAR, MAR, MNAR). The guide also details how MICE works, using sequential regression imputation to create multiple imputed datasets, thereby allowing for more accurate and robust statistical inferences. Additionally, it provides instructions on reporting a MICE analysis, including the missingness rates, the reasons for missing data, the assumptions made, and the specifics of the imputation and pooling methods used, to ensure transparency and reproducibility in research.
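The sequential regression idea can be sketched for two variables as follows. This is a deliberately simplified illustration, not the algorithm as implemented in any MICE software: it uses plain linear regression plus random residual noise (stochastic regression imputation) and does not draw the regression parameters from their posterior distribution, as a full implementation would. The function names and the two-column layout are assumptions.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, (rss / (len(xs) - 2)) ** 0.5


def chained_imputation(data, n_iter=10, seed=0):
    """Return one completed copy of `data` (rows [x, y]; None marks missing)."""
    rng = random.Random(seed)
    filled = [row[:] for row in data]
    missing = [[value is None for value in row] for row in data]

    # Step 1: start from the observed column mean for every missing cell.
    for j in range(2):
        col_mean = statistics.fmean(row[j] for row in data if row[j] is not None)
        for i, row in enumerate(filled):
            if missing[i][j]:
                row[j] = col_mean

    # Step 2: cycle through the variables, regressing each on the other
    # and replacing its missing cells with prediction + random noise.
    for _ in range(n_iter):
        for j in range(2):
            k = 1 - j
            fit_rows = [r for i, r in enumerate(filled) if not missing[i][j]]
            a, b, sd = fit_line([r[k] for r in fit_rows], [r[j] for r in fit_rows])
            for i, row in enumerate(filled):
                if missing[i][j]:
                    row[j] = a + b * row[k] + rng.gauss(0, sd)
    return filled


def impute_m_times(data, m=5):
    """Create m completed datasets, one per random seed."""
    return [chained_imputation(data, seed=s) for s in range(m)]
```

Each completed dataset would then be analyzed separately and the results pooled. Because the imputed cells differ across the m copies while the observed cells do not, the spread between copies reflects the uncertainty about the missing values.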

Reading list

Key reference: (Sterne et al. 2009)

Optional reading: (Van Buuren 2018)

Further optional readings: (Lumley 2011; Granger, Sergeant, and Lunt 2019; Hughes et al. 2019)

Video Lessons

Missing Data Analysis
Reporting guidelines when missing data is present

Video Lesson Slides

Missing data

Reporting guideline

References

Granger, Elizabeth, Jamie C. Sergeant, and Mark Lunt. 2019. “Avoiding Pitfalls When Combining Multiple Imputation and Propensity Scores.” Statistics in Medicine 38 (26): 5120–32.
Hughes, Rachael A., Jon Heron, Jonathan A. Sterne, and Kate Tilling. 2019. “Accounting for Missing Data in Statistical Analyses: Multiple Imputation Is Not Always the Answer.” International Journal of Epidemiology 48 (4): 1294–1304.
Lumley, Thomas. 2011. Complex Surveys: A Guide to Analysis Using R. Vol. 565. John Wiley & Sons.
Sterne, Jonathan A., et al. 2009. “Multiple Imputation for Missing Data in Epidemiological and Clinical Research: Potential and Pitfalls.” BMJ 338: b2393.
Van Buuren, Stef. 2018. Flexible Imputation of Missing Data. Chapman and Hall/CRC.