| Function_name | Package_name | Use |
|---|---|---|
| set.seed | base | Sets a seed for random number generation, ensuring the simulation is reproducible. |
| sample | base | Draws a random sample (with or without replacement); used to simulate coin flips and dice rolls. |
| replicate | base | Repeatedly evaluates an expression a fixed number of times; used to simulate multiple dice rolls per iteration. |
| numeric | base | Creates a numeric vector of a given length, pre-allocated to store results across iterations. |
| for | base | Loops over a sequence of iterations to repeat the random experiment many times (Monte Carlo loop). |
| sum | base | Adds up elements of a vector, e.g., counting the number of heads in a set of coin flips. |
| ifelse | base | Vectorised conditional selection; assigns the observed outcome based on the treatment value. |
| mean | base | Computes the arithmetic mean, e.g., the running average of the estimate across iterations. |
| round | base | Rounds numeric values to a specified number of decimal places for tidy output. |
| print | base | Prints an object's value to the console, e.g., the estimated probability. |
| function | base | Defines a custom function, such as the `simulate_data` data-generating function. |
| require | base | Loads (attaches) an installed package so its functions are available. |
| hist | base/graphics | Computes (and optionally plots) a histogram; used here with `plot = FALSE` to get bin counts. |
| barplot | base/graphics | Draws a bar plot; used to display the manually computed density histogram. |
| diff | base | Computes successive differences of a vector, e.g., bin widths from histogram breaks. |
| density | base/stats | Estimates a kernel density, used for the mirrored cholesterol density plot. |
| data.frame | base | Creates a data frame, e.g., to store estimates and standard errors across iterations. |
| cbind | base | Binds vectors or data frames together column-wise. |
| names | base | Gets or sets the names of an object, e.g., renaming columns of the generated data. |
| order | base | Returns a permutation that sorts a vector; used to order rows by age (L) and ID. |
| match | base | Returns positions of matches; used to attach the confounded exposure by ID. |
| ggplot | ggplot2 | Initializes a ggplot2 plot object for the trace and density plots. |
| geom_line | ggplot2 | Adds a line layer (used for the trace plot of the running estimate). |
| geom_hline | ggplot2 | Adds a horizontal reference line, e.g., the true parameter value. |
| geom_area | ggplot2 | Adds a filled area layer for the mirrored density plot. |
| labs | ggplot2 | Sets plot titles and axis labels. |
| theme_minimal | ggplot2 | Applies the minimal ggplot2 theme for a clean appearance. |
| DAG.empty | simcausal | Initializes an empty DAG object (structural causal model). |
| node | simcausal | Defines a node (variable) in the DAG with its conditional distribution. |
| set.DAG | simcausal | Locks in the DAG definition so data can be simulated from it. |
| action | simcausal | Defines an intervention (action) on a node, e.g., setting A = 1 or A = 0. |
| sim | simcausal | Simulates observational and/or counterfactual data from the DAG. |
| plotDAG | simcausal | Plots the DAG to visualize the assumed causal structure. |
| plogis | base/stats | Inverse-logit (logistic CDF); converts a linear predictor to a probability for the exposure node. |
| rnorm | base/stats | Generates normally distributed values; the distribution used for L and Y nodes. |
| rbern | simcausal | Generates Bernoulli (0/1) values; the distribution used for the exposure node A. |
| glm | base/stats | Fits a generalized linear model, e.g., `glm(Y ~ A + L)` to estimate the treatment effect. |
| summary | base | Summarizes a fitted model, returning coefficients and standard errors. |
| coef | base/stats | Extracts model coefficients (e.g., the row for the exposure A). |
| simsum | rsimsum | Computes simulation performance measures (bias, SE, MSE, coverage, power) from stored estimates. |
| format | base | Formats numbers for display, e.g., switching off scientific notation. |
| kable | knitr | Renders a table in the output document. |
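
The `hist`, `diff`, and `barplot` rows in the table above describe a manually computed density histogram. A minimal sketch of that calculation follows; the data are hypothetical normal draws rather than the chapter's own values, and the seed is arbitrary.

```r
set.seed(123)                        # arbitrary seed, for reproducibility only
x <- rnorm(1000, mean = 5, sd = 1)   # hypothetical data (an assumption, not the chapter's data)

h <- hist(x, plot = FALSE)           # compute the bins only; nothing is drawn
widths <- diff(h$breaks)             # bin widths from successive break points

# Rescale counts so that the bar areas (height * width) sum to 1
dens <- h$counts / (sum(h$counts) * widths)

# The manual heights should agree with the densities hist() computes itself
print(all.equal(dens, h$density))
print(sum(dens * widths))
```

Passing `dens` to `barplot()` (for example with `space = 0`) then draws the bars side by side, which is one way to display the result.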
R Functions (U)
This summary page lists the R functions used for the Monte Carlo simulation tasks covered in this chapter. For each function, the table gives its source package and how it is used in the chapter.
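
To show how several of these functions fit together, here is a minimal Monte Carlo sketch using only base and stats functions. It is not the chapter's own code: the body of `simulate_data`, the coefficient values, the sample size, and the true effect of 2 are all assumptions chosen for illustration, and `rbinom` stands in for simcausal's `rbern` so that no extra package is needed.

```r
set.seed(2024)  # arbitrary seed so the sketch is reproducible

# Hypothetical data-generating function: L confounds the effect of A on Y
simulate_data <- function(n, true_effect = 2) {
  L <- rnorm(n)                                         # confounder
  A <- rbinom(n, size = 1, prob = plogis(0.5 * L))      # exposure (rbinom replaces rbern here)
  Y <- rnorm(n, mean = 1 + true_effect * A + 1.5 * L)   # outcome
  data.frame(L = L, A = A, Y = Y)
}

n_iter <- 200
estimates <- numeric(n_iter)    # pre-allocated storage for the estimates
std_errors <- numeric(n_iter)   # pre-allocated storage for the standard errors

for (i in seq_len(n_iter)) {
  dat <- simulate_data(n = 500)
  fit <- glm(Y ~ A + L, data = dat)        # gaussian family is glm's default
  coefs <- coef(summary(fit))              # coefficient table with standard errors
  estimates[i] <- coefs["A", "Estimate"]
  std_errors[i] <- coefs["A", "Std. Error"]
}

results <- data.frame(iteration = seq_len(n_iter),
                      estimate = estimates,
                      se = std_errors)

# Bias computed by hand; rsimsum's simsum() reports this and other measures
bias <- mean(results$estimate) - 2
print(round(bias, 3))
```

Storing the estimates and standard errors in a data frame keeps them in a shape that can later be passed to a summary tool or plotted as a trace of the running mean.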
To learn more about these functions, readers can:
Use R’s Built-in Help System: For each function, access its documentation by prefixing the function name with a question mark in the R console, e.g., `?set.seed`. This displays the function’s manual page with descriptions, usage, and examples.
Search Websites: Simply Google, or visit the CRAN website to search for specific function documentation. Websites like Stack Overflow and RStudio Community often have discussions related to R functions.
Tutorials and Online Courses: Platforms like DataCamp, Coursera, and edX offer R courses that cover many functions in depth. Dedicated R tutorial websites can also be useful; one example is “Introduction to R for health data analysis” by Ehsan Karim, An Hoang and Qu.
Books: There are numerous R programming books, such as “R for Data Science” by Hadley Wickham and “The Art of R Programming” by Norman Matloff.
Workshops and Webinars: Institutions and organizations occasionally offer R programming workshops or webinars.
When in doubt about a function, start with its help page and then consult the resources listed above.
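
The built-in help route can be sketched as follows. All calls are standard base/utils functions; the particular functions looked up here are just examples taken from the table.

```r
?set.seed                  # open the manual page for set.seed
help("sample")             # the same lookup, written as a function call

print(args(replicate))     # show a function's arguments without opening the help page

hits <- apropos("seed")    # find objects on the search path whose names contain "seed"
print(hits)
```

`apropos()` is handy when only part of a function name is remembered, since it searches names rather than documentation text.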