Simulation

Background

This chapter provides a structured introduction to using Monte Carlo simulation for statistical inference and evaluation. Its tutorials guide the reader through the key concepts: first the fundamentals of simulation, then methods for assessing estimator performance, and finally the estimation of causal effects using simulated data. Each tutorial emphasizes how artificial data generated under known conditions can be used to study the behavior of statistical estimators, especially when theoretical solutions are complex or unavailable. Through hands-on examples, the tutorials illustrate how simulation supports robust statistical reasoning, performance evaluation, and causal inference in controlled settings.

Overview of tutorials

Monte Carlo Simulation

This tutorial explains how Monte Carlo simulation estimates the probability of an uncertain event by generating many random samples and counting how often the event occurs. It begins by defining simulations as experiments that create artificial data to test statistical methods when real data are limited or assumptions are unmet. Monte Carlo methods are then introduced as a way to approximate sampling distributions through repeated random sampling. Two examples are provided: one estimates the probability of getting exactly three heads in five coin flips, and the other estimates the probability of rolling a sum of 10 or more at least twice in five rolls of a pair of dice. In both cases, the simulated proportions closely match the theoretical probabilities, illustrating how Monte Carlo methods can approximate complex probabilities when analytical solutions are impractical.
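The two examples described above can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's own code: the function names, the number of simulations, and the seed are hypothetical choices, and the dice example assumes that "five paired dice rolls" means rolling a pair of fair dice five times. The exact values are computed from the binomial distribution for comparison.

```python
import random
from math import comb


def estimate_three_heads(n_sims=100_000, seed=1):
    """Monte Carlo estimate of P(exactly 3 heads in 5 fair coin flips)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        heads = sum(rng.random() < 0.5 for _ in range(5))
        hits += heads == 3
    return hits / n_sims


def estimate_dice_event(n_sims=100_000, seed=1):
    """Monte Carlo estimate of P(sum >= 10 on at least 2 of 5 rolls of two dice)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        high_rolls = sum(
            rng.randint(1, 6) + rng.randint(1, 6) >= 10 for _ in range(5)
        )
        hits += high_rolls >= 2
    return hits / n_sims


# Exact probabilities for comparison.
exact_coin = comb(5, 3) / 2**5                      # binomial(5, 0.5) at k = 3
p_high = 6 / 36                                     # 6 of 36 outcomes sum to >= 10
exact_dice = 1 - (1 - p_high) ** 5 - 5 * p_high * (1 - p_high) ** 4

coin_hat = estimate_three_heads()
dice_hat = estimate_dice_event()
print(f"coin: simulated {coin_hat:.4f}, exact {exact_coin:.4f}")
print(f"dice: simulated {dice_hat:.4f}, exact {exact_dice:.4f}")
```

Increasing `n_sims` tightens the agreement between the simulated and exact values, at the cost of a longer run time.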

Performance Measures

This tutorial extends Monte Carlo simulation by introducing performance measures used to assess the accuracy and reliability of simulation-based estimates. It explains how convergence, bias, relative bias, empirical and model-based standard errors, mean squared error, and coverage can quantify how well a simulation captures the true parameter value. The tutorial revisits a coin-flip example to illustrate these metrics, showing how estimates stabilize over iterations and how numerical measures like bias and standard error can validate simulation accuracy. Ultimately, these performance measures help determine whether a simulation provides reliable and valid inference in settings where analytical solutions are impractical.
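As a rough sketch of how these measures can be computed, the Python example below repeats a small coin-flip simulation many times and summarizes the resulting estimates. The setup is an assumption made for illustration: the target is the probability of exactly three heads in five flips, each repetition uses 200 simulated experiments, the model-based standard error is the usual binomial formula, and coverage is based on a 95% Wald interval. None of these choices is taken from the tutorial itself.

```python
import random
from math import sqrt
from statistics import stdev

TRUE_P = 10 / 32  # exact P(exactly 3 heads in 5 fair flips)
Z_95 = 1.96       # normal quantile used for a 95% Wald interval


def one_repetition(rng, n_trials=200):
    """Estimate TRUE_P from n_trials experiments; return (estimate, model-based SE)."""
    hits = 0
    for _ in range(n_trials):
        heads = sum(rng.random() < 0.5 for _ in range(5))
        hits += heads == 3
    p_hat = hits / n_trials
    return p_hat, sqrt(p_hat * (1 - p_hat) / n_trials)


rng = random.Random(42)  # hypothetical seed, for reproducibility
results = [one_repetition(rng) for _ in range(1000)]
estimates = [p for p, _ in results]
n_reps = len(estimates)

mean_estimate = sum(estimates) / n_reps
bias = mean_estimate - TRUE_P                       # average error
relative_bias = bias / TRUE_P                       # bias as a share of the truth
empirical_se = stdev(estimates)                     # spread of the estimates
model_se = sqrt(sum(se**2 for _, se in results) / n_reps)  # average reported SE
mse = sum((p - TRUE_P) ** 2 for p in estimates) / n_reps   # bias and variance combined
coverage = sum(
    p - Z_95 * se <= TRUE_P <= p + Z_95 * se for p, se in results
) / n_reps                                          # share of intervals containing the truth

# Convergence: the running mean of the estimates should settle near TRUE_P.
running_mean = {k: sum(estimates[:k]) / k for k in (10, 100, 1000)}

print(f"bias={bias:.5f} relative_bias={relative_bias:.4f}")
print(f"empirical_se={empirical_se:.4f} model_se={model_se:.4f}")
print(f"mse={mse:.6f} coverage={coverage:.3f}")
print("running mean:", running_mean)
```

When the simulation behaves well, the bias is close to zero, the empirical and model-based standard errors roughly agree, and the coverage is close to the nominal 95%.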

Estimating Causal Effects

This tutorial demonstrates how to estimate causal effects using regression-based Monte Carlo simulations under a fully specified data-generating process. A simulated dataset is constructed in which age confounds the relationship between diabetes medication and cholesterol levels, and the true treatment effect is fixed in advance so that estimates can be checked against it. By repeatedly generating data and fitting regression models that adjust for age, the simulation estimates the average treatment effect and evaluates estimator performance using the metrics from the previous tutorial. The results confirm that the estimates converge toward the true effect, with minimal bias, consistent variance, and accurate confidence interval coverage, highlighting how simulation can validate causal inference under controlled conditions.
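The following Python sketch shows one way such a simulation could be set up. Every number in it is a hypothetical choice rather than a value from the tutorial: the true effect of -10, the age distribution, the logistic treatment model, the noise level, the sample size, and the number of repetitions. To stay within the standard library, the age-adjusted regression coefficient is computed by residualizing treatment and outcome on age (the Frisch-Waugh-Lovell result), which gives the same coefficient as fitting cholesterol on treatment and age by ordinary least squares. An unadjusted difference in means is included to show the confounding bias.

```python
import random
from math import exp, sqrt
from statistics import stdev

TRUE_EFFECT = -10.0  # hypothetical true effect of medication on cholesterol


def simulate_dataset(rng, n=300):
    """Generate one dataset in which age affects both treatment and outcome."""
    age, treat, chol = [], [], []
    for _ in range(n):
        a = rng.gauss(50, 10)
        p_treat = 1 / (1 + exp(-(a - 50) / 10))  # older people are treated more often
        t = 1.0 if rng.random() < p_treat else 0.0
        y = 150 + 1.0 * a + TRUE_EFFECT * t + rng.gauss(0, 15)
        age.append(a)
        treat.append(t)
        chol.append(y)
    return age, treat, chol


def residualize(v, x):
    """Residuals from a simple linear regression of v on x (with intercept)."""
    mx, mv = sum(x) / len(x), sum(v) / len(v)
    slope = sum((xi - mx) * (vi - mv) for xi, vi in zip(x, v)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return [vi - mv - slope * (xi - mx) for xi, vi in zip(x, v)]


def adjusted_effect(age, treat, chol):
    """OLS coefficient on treatment, adjusting for age, with its standard error."""
    rt, ry = residualize(treat, age), residualize(chol, age)
    ss_t = sum(r * r for r in rt)
    tau = sum(a * b for a, b in zip(rt, ry)) / ss_t
    rss = sum((b - tau * a) ** 2 for a, b in zip(rt, ry))
    se = sqrt(rss / (len(chol) - 3) / ss_t)  # 3 parameters: intercept, age, treatment
    return tau, se


def naive_effect(treat, chol):
    """Unadjusted difference in mean cholesterol, treated minus untreated."""
    y1 = [y for t, y in zip(treat, chol) if t == 1.0]
    y0 = [y for t, y in zip(treat, chol) if t == 0.0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


rng = random.Random(2024)  # hypothetical seed
adjusted, naive = [], []
for _ in range(500):
    age, treat, chol = simulate_dataset(rng)
    adjusted.append(adjusted_effect(age, treat, chol))
    naive.append(naive_effect(treat, chol))

n_reps = len(adjusted)
mean_adjusted = sum(tau for tau, _ in adjusted) / n_reps
mean_naive = sum(naive) / n_reps
bias = mean_adjusted - TRUE_EFFECT
empirical_se = stdev(tau for tau, _ in adjusted)
coverage = sum(
    tau - 1.96 * se <= TRUE_EFFECT <= tau + 1.96 * se for tau, se in adjusted
) / n_reps

print(f"adjusted mean={mean_adjusted:.2f} bias={bias:.2f} se={empirical_se:.2f}")
print(f"coverage={coverage:.3f} naive mean={mean_naive:.2f}")
```

Under these assumptions the adjusted estimates should center on the true effect, while the naive difference in means is pulled toward zero because older, more frequently treated people also have higher cholesterol.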

Note

What is Coming Next:

This chapter on Monte Carlo simulation lays the groundwork for the following chapter on causal roles by introducing key methods for evaluating statistical estimators in controlled settings. Through simulation, it demonstrates how treatment effects can be estimated and validated when the data-generating process is fully known. This foundation directly connects to the causal roles chapter, which builds on these ideas to explore how specific types of variables—such as confounders, mediators, colliders, and instruments—affect causal effect estimation. Together, these chapters form a logical progression: from simulating causal effects under ideal conditions to examining the complexities and biases that arise in real-world causal analysis.
