Concepts (D)
Survey Data Analysis
Design-based analysis differs from model-based analysis in how it handles survey data. Design-based analysis treats the survey's sampling method and structure as central, focusing on representativeness and on variance estimation that reflects how the data were collected. It accounts for the complexities of the sampling design, e.g., stratification and clustering, so that results are representative of the entire population. Model-based analysis, on the other hand, uses statistical models to understand relationships and patterns, assuming the data come from a specific distribution and often treating the observations as if they were a simple random sample.
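To make the contrast concrete, the following minimal sketch compares a standard error computed as if the data were a simple random sample with one that respects clustering. The toy data, the function names (`naive_se`, `cluster_se`), and the simplifications (a single stratum, equal weights, clusters treated as sampled with replacement) are assumptions made for illustration only. They are not taken from NHANES or from the key reference.

```python
import math
from collections import defaultdict

def naive_se(y):
    """Standard error of the mean, treating observations as independent (SRS)."""
    n = len(y)
    mean = sum(y) / n
    s2 = sum((v - mean) ** 2 for v in y) / (n - 1)
    return math.sqrt(s2 / n)

def cluster_se(y, cluster):
    """Standard error of the mean using between-cluster variation.

    Illustrative assumptions: one stratum, equal weights, and clusters
    treated as if sampled with replacement.
    """
    n = len(y)
    mean = sum(y) / n
    # Sum the linearized values (y - mean) / n within each cluster.
    totals = defaultdict(float)
    for v, c in zip(y, cluster):
        totals[c] += (v - mean) / n
    k = len(totals)
    # The cluster totals sum to zero, so their mean is zero and the
    # between-cluster sum of squares is just the sum of squared totals.
    var = k / (k - 1) * sum(t ** 2 for t in totals.values())
    return math.sqrt(var)

# Hypothetical toy data: three clusters whose members resemble each other.
y = [1, 1, 5, 5, 9, 9]
cluster = ["A", "A", "B", "B", "C", "C"]

print(round(naive_se(y), 2), round(cluster_se(y, cluster), 2))  # 1.46 2.31
```

Because members of the same cluster are alike, the six observations carry roughly the information of three independent ones, and the cluster-aware standard error is larger than the naive one.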
Understanding survey features such as weights, strata, and clusters is crucial in complex survey data analysis. Survey weights adjust for unequal probabilities of selection and for nonresponse, so that the sample represents the population accurately. Stratification improves precision and the representation of subgroups, while clustering, often used for practicality and cost reasons, must be accounted for to avoid underestimating standard errors. These features are vital in design-based analysis for producing unbiased, reliable estimates, and they are what fundamentally distinguish it from model-based approaches, which may not reflect the complexities of the survey structure. NHANES is used as an example to explain these ideas.
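The role of weights can be illustrated with a short sketch. It computes a weighted mean and Kish's approximate design effect due to unequal weighting, n Σw² / (Σw)². The data and the function names (`weighted_mean`, `kish_deff`) are hypothetical. Kish's formula covers only the effect of unequal weights and ignores strata and clusters, so it should be read as a rough approximation rather than a full design-based variance calculation.

```python
def weighted_mean(y, w):
    """Weighted mean: each observation counts in proportion to its weight."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def kish_deff(w):
    """Kish's approximate design effect from unequal weights only."""
    n = len(w)
    return n * sum(wi ** 2 for wi in w) / sum(w) ** 2

# Hypothetical toy data: the last respondent represents five population
# members, while each of the others represents one.
y = [10, 20, 30, 40]
w = [1, 1, 1, 5]

unweighted = sum(y) / len(y)
weighted = weighted_mean(y, w)
deff = kish_deff(w)
effective_n = len(w) / deff

print(unweighted, weighted, deff)  # 25.0 32.5 1.75
```

In this toy example, ignoring the weights shifts the estimate from 32.5 to 25.0, and the unequal weights reduce the effective sample size from 4 to about 2.3.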
Reading list
Key reference: (Steven G. Heeringa, West, and Berglund 2017) (chapters 2 and 3)
Optional reading: (Steven G. Heeringa, West, and Berglund 2014)
Theoretical references (optional):
- F/chi-squared statistic with the Rao-Scott second-order correction (Rao and Scott 1984; Koch, Freeman Jr, and Freeman 1975; Thomas and Rao 1987)
- AIC and BIC for modeling with complex survey data (Lumley and Scott 2015)
- Pseudo-R2 statistics under complex sampling (Lumley 2017)
- Tests for regression models fitted to survey data (Lumley and Scott 2014)
- Goodness-of-fit test for a logistic regression model fitted using survey sample data (Archer and Lemeshow 2006)
Video Lessons
What is included in this Video Lesson:
- reference 00:38
- design-based 1:28
- examples 3:33
- NHANES and sampling 4:54
- weights and other survey features 9:05
- estimate of interest 12:55
- design effect 15:52
- Variance estimation 18:13
- design-based analysis 25:11
- How to make inference 29:33
- inappropriate analysis 32:08
- how useful are sampling weights 36:15
- how useful is PSU/cluster info 37:42
- subpopulation / subsetting 38:57
- missingness collected to weights? 40:45
- Dealing with subpopulation 41:38
The timestamps are also included in the YouTube video description.