Longitudinal data

Background

The chapter focuses on longitudinal data analysis. The first dives into mixed effects models, highlighting their importance in studying repeated measurements, especially in longitudinal or clustered data. These models comprise two essential components: fixed effects, which represent universal trends across subjects or clusters, and random effects, capturing individual attributes of each subject or cluster. The tutorial explains their application using datasets, emphasizing model selection metrics and validation methods. The second tutorial introduces the generalized estimating equation (GEE), suitable for modeling non-normally distributed longitudinal data. GEE differentiates from mixed-effects models in its distribution assumptions and its capability to handle various correlation structures. The tutorial illustrates GEE’s application, emphasizing its advantages over other regression models for repeated measurements, and ends with a comparison between GEE and mixed models’ random effects.

Important

Datasets:

All of the datasets used in this tutorial can be accessed from this GitHub repository folder

Overview of tutorials

Longitudinal data analysis is specialized for handling data collected over time with repeated measurements on the same subjects. It accounts for within-subject correlation, time trends, and subject-specific effects, making it distinct from traditional regression, propensity score analysis, and machine learning approaches we discussed earlier, which are not designed for the longitudinal aspect of the data. Choosing the appropriate analysis method depends on the specific research question and the nature of the data at hand.

Mixed effects models

This tutorial provides an in-depth look into mixed effects models, which are instrumental in analyzing repeated measurements, especially in longitudinal or clustered data scenarios. Within the context of such models, there are two core components: fixed effects and random effects. Fixed effects depict broad trends applicable universally across subjects or clusters, offering insights such as average student performance across different subjects. In contrast, random effects spotlight the distinct attributes of each subject or cluster, accounting for variables such as the quality of resources or the experience of teachers in schools. Using a dataset, the tutorial demonstrates the application and visualization of linear mixed effects models. Emphasis is also placed on model selection, using metrics like AIC and BIC, and on validating the assumptions of the model using residual plots and QQ-plots.

GEE

The tutorial introduces the generalized estimating equation (GEE) as a method for modeling longitudinal or clustered data that can be non-normally distributed, such as binary, count, or skewed data. Unlike mixed-effects models that assume a normal distribution for error terms and beta coefficients, GEE is distribution-free but assumes specific correlation structures within subjects or clusters. The tutorial exemplifies the application of GEE using respiratory data with a binary response variable. While logistic regression and Poisson regression handle binary and count data, they do not account for repeated measurement structures, potentially leading to incorrect standard error estimates. GEE allows for various correlation structures, and the tutorial details several types, including independent, exchangeable, AR-1 autoregressive, and unstructured correlations. The GEE models produce consistent estimates even if the correlation structure is misspecified. The tutorial concludes by comparing GEE with random effects in mixed models using several code examples.

Tip

Optional Content:

You’ll find that some sections conclude with an optional video walkthrough that demonstrates the code. Keep in mind that the content might have been updated since these videos were recorded. Watching these videos is optional.

Warning

Bug Report:

Fill out this form to report any issues with the tutorial.