Setup: Before you begin

Note

This page is the starting point for running the tutorials yourself. It tells you what software to install, how to install all R packages used across the book in one step, and how the datasets are organized so the code runs end-to-end on a fresh machine.

Software

R (version 4.2.0 or newer recommended). Download from CRAN.
RStudio Desktop (recommended IDE). Download from posit.co.
Quarto (bundled with recent versions of RStudio) is used to build the book. You need it only if you want to render the book yourself, not to run the tutorial code.
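If you are unsure which version of R you have, a quick check is easy to run. This is a minimal sketch; the helper names meets_min_r() and quarto_on_path() are hypothetical and not part of the book. It compares getRversion() against the recommended 4.2.0 and looks for a quarto executable on the system PATH. RStudio's bundled Quarto may not be on PATH, so a FALSE from the second check does not prove Quarto is absent.

```r
# Hypothetical helpers (not from the book): compare the running R version
# with the recommended minimum and look for a Quarto executable on PATH.
meets_min_r <- function(min_version = "4.2.0") {
  # getRversion() returns a version object that compares against strings
  getRversion() >= min_version
}

quarto_on_path <- function() {
  # Sys.which() returns "" when the program is not found on PATH
  unname(nzchar(Sys.which("quarto")))
}

if (!meets_min_r()) {
  warning("R ", getRversion(), " is older than the recommended 4.2.0; ",
          "consider upgrading from CRAN.")
}
```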

Installing all R packages in one step

Every package used anywhere in this book is listed in a single script, packages.R, in the repository root. It installs the CRAN packages and the few GitHub-only packages (including simcausal, WeightedROC, and svyTable1).

From a fresh R session, run:

# from the root of a local clone of the EpiMethods repository
source("packages.R")

or, to run it directly from GitHub without cloning:

source("https://raw.githubusercontent.com/ehsanx/EpiMethods/main/packages.R")
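One-step installer scripts typically skip packages that are already available, so re-running them is cheap. The sketch below illustrates that idea only; it is NOT the contents of packages.R, and the helper name install_missing() is hypothetical. The installer argument defaults to utils::install.packages and can be swapped out, for example to test the helper without touching the network.

```r
# Hypothetical helper, NOT the book's packages.R: install only the
# packages whose namespace cannot already be loaded.
install_missing <- function(pkgs, installer = utils::install.packages) {
  # TRUE for each package that is already installed and loadable
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]
  if (length(missing) > 0) {
    installer(missing)
  }
  # Return the names that needed installing, without printing them
  invisible(missing)
}

# Illustrative call (package names are examples only):
# install_missing(c("survey", "tableone"))
```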

Important

GitHub-only packages. A few packages are not on CRAN and are installed from GitHub by packages.R via remotes::install_github():

svyTable1 — survey-aware Table 1 / reporting toolkit, used in several survey, confounding, and missing-data chapters.
simcausal — DAG-based data simulation, used in the Simulation and Causal Roles modules.
WeightedROC — weighted ROC/AUC, used in the survey and missing-data modules.

Installing GitHub packages requires the remotes package (handled automatically) and, for some packages, a working compiler toolchain (Rtools on Windows, Xcode command-line tools on macOS).
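Before installing packages that need compiling, you can check whether R can find a compiler toolchain. This sketch assumes the pkgbuild package, whose has_build_tools() function reports whether build tools are available. The wrapper name check_toolchain() is hypothetical, and the check returns NA when pkgbuild itself is not installed.

```r
# Hypothetical pre-flight check before installing GitHub-only packages.
# Uses pkgbuild::has_build_tools() if pkgbuild is installed; otherwise NA.
check_toolchain <- function() {
  if (!requireNamespace("pkgbuild", quietly = TRUE)) {
    message("Install 'pkgbuild' to check for a compiler toolchain.")
    return(NA)
  }
  isTRUE(pkgbuild::has_build_tools(debug = FALSE))
}

if (identical(check_toolchain(), FALSE)) {
  message("No build tools found: install Rtools (Windows) or the ",
          "Xcode command-line tools (macOS) before installing from GitHub.")
}
```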

Datasets and data access

Processed analytic datasets used by the tutorials are committed in the Data/ folder of the repository, organized by module (e.g., Data/wrangling/, Data/propensityscore/). Each chapter’s Overview page links to the relevant data folder.

NHANES (US CDC) data are publicly available; the relevant cycles are either pre-processed in Data/ or obtained within the tutorials via the nhanesA or NHANES packages.
CCHS (Statistics Canada) microdata are not openly redistributable. The book ships the processed analytic objects needed to run the code; reproducing them from the raw CCHS files requires institutional access (e.g., at UBC) through the Canadian Research Data Centre Network or UBC Abacus.

Tip

Reproducibility standard. The analyses are intended to be runnable end-to-end by another analyst on a fresh machine after running source("packages.R") and obtaining the committed Data/ folder. File paths are case-sensitive on Linux (and on case-sensitive macOS volumes), so keep the Data/ folder name capitalized exactly as in the repository; code that happens to work on a case-insensitive system may otherwise fail elsewhere.
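A quick way to confirm the folder name matches exactly is to compare against the directory listing rather than calling dir.exists() alone. On case-insensitive file systems (the Windows default, for example), dir.exists("data") can succeed for a folder named Data. This is a sketch; the helper name has_exact_dir() is hypothetical and not part of the book.

```r
# Hypothetical check: does `root` contain a directory named exactly `name`?
# list.files() returns names with their on-disk capitalization, so the
# %in% test is case-sensitive even on case-insensitive file systems.
has_exact_dir <- function(name = "Data", root = ".") {
  name %in% list.files(root) && dir.exists(file.path(root, name))
}

# Run from the repository root, e.g.:
# has_exact_dir()                       # top-level Data/ folder
# has_exact_dir("wrangling", "Data")    # a module subfolder
```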

Reporting problems

Warning

Found a bug, a broken link, or a package that fails to install? Please report it via this form so we can fix it.