Setup: Before you begin
This page is the starting point for running the tutorials yourself. It tells you what software to install, how to install all R packages used across the book in one step, and how the datasets are organized so the code runs end-to-end on a fresh machine.
Software
Installing all R packages in one step
Every package used anywhere in this book is listed in a single script, packages.R, in the repository root. It installs the CRAN packages and the few GitHub-only packages (including simcausal, WeightedROC, and svyTable1).
From a fresh R session, run:
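A minimal sketch of that step, assuming the working directory is the repository root. The source("packages.R") call is the one named in the reproducibility note on this page; the existence check around it is an added convenience, not part of the book's instructions.

```r
script <- "packages.R"  # the install script in the repository root

if (file.exists(script)) {
  source(script)  # installs the CRAN and GitHub-only packages
} else {
  message("Cannot find ", script, " in ", getwd(),
          "; set the working directory to the repository root first.")
}
```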
or, to run it directly from GitHub without cloning:
GitHub-only packages. A few packages are not on CRAN and are installed from GitHub by packages.R via remotes::install_github():
- svyTable1: survey-aware Table 1 / reporting toolkit, used in several survey, confounding, and missing-data chapters.
- simcausal: DAG-based data simulation, used in the Simulation and Causal Roles modules.
- WeightedROC: weighted ROC/AUC, used in the survey and missing-data modules.
Installing GitHub packages requires the remotes package (handled automatically) and, for some packages, a working compiler toolchain (Rtools on Windows, Xcode command-line tools on macOS).
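The install-if-missing pattern described in this section can be sketched as follows. This is not the contents of packages.R, which is not reproduced on this page. The helper names missing_packages() and install_missing() and the "owner/repo" argument format are illustrative assumptions; only requireNamespace(), install.packages(), and remotes::install_github() are existing functions.

```r
# Hypothetical helpers: a sketch of an install-if-missing workflow,
# not the actual contents of packages.R.

# Return the subset of `pkgs` that cannot be loaded in this R installation.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

# Install whatever is missing.
#   cran:   character vector of CRAN package names.
#   github: named character vector; names are package names, values are
#           "owner/repo" strings supplied by the caller (none assumed here).
install_missing <- function(cran = character(), github = character()) {
  need_cran <- missing_packages(cran)
  if (length(need_cran) > 0) install.packages(need_cran)

  need_gh <- missing_packages(names(github))
  if (length(need_gh) > 0) {
    # remotes itself comes from CRAN and is needed for GitHub installs.
    if (length(missing_packages("remotes")) > 0) install.packages("remotes")
    remotes::install_github(unname(github[need_gh]))
  }

  # Report what had to be installed, without printing it by default.
  invisible(c(need_cran, need_gh))
}
```

Calling install_missing() a second time should do nothing, since every package that loaded successfully is skipped. A compile failure on a GitHub package usually points to the missing toolchain mentioned above rather than to the script.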
Datasets and data access
Processed analytic datasets used by the tutorials are committed in the Data/ folder of the repository, organized by module (e.g., Data/wrangling/, Data/propensityscore/). Each chapter’s Overview page links to the relevant data folder.
- NHANES (US CDC) data are publicly available; the relevant cycles are either pre-processed in Data/ or downloaded in-tutorial via the nhanesA / NHANES packages.
- CCHS (Statistics Canada) microdata are not openly redistributable. The book ships the processed analytic objects needed to run the code; reproducing them from raw CCHS files requires institutional (e.g., UBC) access via the Canadian Research Data Centre Network / UBC Abacus.
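As a rough illustration of the two access routes, the sketch below lists the committed files for one module and wraps an on-demand NHANES download. The helper names module_data_files() and fetch_nhanes_table() are hypothetical, and the table code "DEMO_J" (2017-2018 demographics) is only an example; the cycles each tutorial actually uses are given in the chapters themselves. nhanesA::nhanes() is that package's download function and needs an internet connection.

```r
# Hypothetical helper: list the data files committed for one module,
# e.g. module_data_files("wrangling") looks inside Data/wrangling/.
module_data_files <- function(module, root = ".") {
  folder <- file.path(root, "Data", module)
  if (!dir.exists(folder)) {
    stop("No such data folder: ", folder, call. = FALSE)
  }
  list.files(folder)
}

# Hypothetical wrapper around nhanesA::nhanes(); requires the nhanesA
# package and network access. "DEMO_J" is an example table code only.
fetch_nhanes_table <- function(table = "DEMO_J") {
  nhanesA::nhanes(table)
}
```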
Reproducibility standard. The analyses are intended to run end-to-end for another analyst on a fresh machine after running source("packages.R") and obtaining the committed Data/ folder. File paths are case-sensitive on Linux and on case-sensitive macOS volumes, so keep the Data/ folder name capitalized exactly as in the repository.
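One way to check the capitalization requirement before running a tutorial is sketched below. has_exact_data_folder() is a hypothetical helper, not something the repository provides. It compares against the directory listing instead of calling dir.exists("Data"), because on a case-insensitive file system dir.exists() would also accept a folder named data/.

```r
# Hypothetical helper: TRUE only if `root` contains a folder named exactly "Data".
has_exact_data_folder <- function(root = ".") {
  # Names of the immediate subdirectories, as stored on disk.
  folders <- list.dirs(root, full.names = FALSE, recursive = FALSE)
  "Data" %in% folders
}

if (!has_exact_data_folder()) {
  message("No folder named exactly 'Data' in ", getwd())
}
```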
Reporting problems
Found a bug, a broken link, or a package that fails to install? Please report it via this form so we can fix it.