R Functions (U)

Note

This review page lists the R functions used for the Monte Carlo simulation tasks covered in this chapter. For each function, the table gives its source package and how it is used in the chapter.

To learn more about these functions, readers can:

Use R’s Built-in Help System: For each function, access its documentation by prefixing the function name with a question mark in the R console, e.g., ?set.seed. This displays the function’s manual page with descriptions, usage, and examples.
Search Websites: Use a search engine, or visit the CRAN website to find documentation for a specific package or function. Websites like Stack Overflow and RStudio Community often have discussions related to R functions.
Tutorials and Online Courses: Platforms like DataCamp, Coursera, and edX offer R courses that cover many functions in depth. Dedicated R tutorial websites can also be useful, for example “Introduction to R for health data analysis” by Ehsan Karim, An Hoang and Qu.
Books: There are numerous R programming books, such as “R for Data Science” by Hadley Wickham and Garrett Grolemund and “The Art of R Programming” by Norman Matloff.
Workshops and Webinars: Institutions and organizations occasionally offer R programming workshops or webinars.

Whenever in doubt, exploring existing resources can be highly beneficial.
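
As a small illustration of the built-in help system mentioned above, the sketch below looks up which attached package provides a function and then requests its manual page. The vector name `fns` and the result name `pkg_of` are made up for this example; `find()`, `help()`, `example()` and `help.search()` come from the utils package, which is attached by default in a standard R session.

```r
# Sketch: find a function's package, then open its documentation.
fns <- c("set.seed", "glm", "hist")   # illustrative choice of functions

# find() reports where each name is visible on the search path
pkg_of <- vapply(fns, function(f) utils::find(f)[1], character(1))
print(pkg_of)

# Help pages open in a viewer, so request them only in an interactive session
if (interactive()) {
  help("set.seed")                # same as typing ?set.seed at the console
  example("replicate")            # runs the examples from the manual page
  help.search("kernel density")   # keyword search across installed packages
}
```

The package names returned this way correspond to the Package_name column of the table below, with the prefix `package:` added.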

Function_name	Package_name	Use
set.seed	base	Sets a seed for random number generation, ensuring the simulation is reproducible.
sample	base	Draws a random sample (with or without replacement); used to simulate coin flips and dice rolls.
replicate	base	Repeatedly evaluates an expression a fixed number of times; used to simulate multiple dice rolls per iteration.
numeric	base	Creates a numeric vector of a given length, pre-allocated to store results across iterations.
for	base	Loops over a sequence of iterations to repeat the random experiment many times (Monte Carlo loop).
sum	base	Adds up elements of a vector, e.g., counting the number of heads in a set of coin flips.
ifelse	base	Vectorised conditional selection; assigns the observed outcome based on the treatment value.
mean	base	Computes the arithmetic mean, e.g., the running average of the estimate across iterations.
round	base	Rounds numeric values to a specified number of decimal places for tidy output.
print	base	Prints an object's value to the console, e.g., the estimated probability.
function	base	Defines a custom function, such as the `simulate_data` data-generating function.
require	base	Loads (attaches) an installed package so its functions are available; returns `FALSE` with a warning, rather than an error, if the package is not installed.
hist	base/graphics	Computes (and optionally plots) a histogram; used here with `plot = FALSE` to get bin counts.
barplot	base/graphics	Draws a bar plot; used to display the manually computed density histogram.
diff	base	Computes successive differences of a vector, e.g., bin widths from histogram breaks.
density	base/stats	Estimates a kernel density, used for the mirrored cholesterol density plot.
data.frame	base	Creates a data frame, e.g., to store estimates and standard errors across iterations.
cbind	base	Binds vectors or data frames together column-wise.
names	base	Gets or sets the names of an object, e.g., renaming columns of the generated data.
order	base	Returns a permutation that sorts a vector; used to order rows by age (L) and ID.
match	base	Returns positions of matches; used to attach the confounded exposure by ID.
ggplot	ggplot2	Initializes a ggplot2 plot object for the trace and density plots.
geom_line	ggplot2	Adds a line layer (used for the trace plot of the running estimate).
geom_hline	ggplot2	Adds a horizontal reference line, e.g., the true parameter value.
geom_area	ggplot2	Adds a filled area layer for the mirrored density plot.
labs	ggplot2	Sets plot titles and axis labels.
theme_minimal	ggplot2	Applies the minimal ggplot2 theme for a clean appearance.
DAG.empty	simcausal	Initializes an empty DAG object (structural causal model).
node	simcausal	Defines a node (variable) in the DAG with its conditional distribution.
set.DAG	simcausal	Locks in the DAG definition so data can be simulated from it.
action	simcausal	Defines an intervention (action) on a node, e.g., setting A = 1 or A = 0.
sim	simcausal	Simulates observational and/or counterfactual data from the DAG.
plotDAG	simcausal	Plots the DAG to visualize the assumed causal structure.
plogis	base/stats	Inverse-logit (logistic CDF); converts a linear predictor to a probability for the exposure node.
rnorm	base/stats	Generates normally distributed values; the distribution used for L and Y nodes.
rbern	simcausal	Generates Bernoulli (0/1) values; the distribution used for the exposure node A.
glm	base/stats	Fits a generalized linear model, e.g., `glm(Y ~ A + L)` to estimate the treatment effect.
summary	base	Summarizes a fitted model, returning coefficients and standard errors.
coef	base/stats	Extracts model coefficients; applied to a model summary it returns the coefficient table, from which the row for the exposure A is taken.
simsum	rsimsum	Computes simulation performance measures (bias, SE, MSE, coverage, power) from stored estimates.
format	base	Formats numbers for display, e.g., switching off scientific notation.
kable	knitr	Renders a table in the output document.
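
To show how several of the base functions in the table fit together, here is a minimal sketch of a Monte Carlo loop for coin flips, followed by a manually computed density histogram for the sum of two dice. The iteration count, number of flips, seed, and variable names (`n_iter`, `n_flips`, `prop_heads`, `dice_sums`) are assumptions chosen for illustration and are not taken from the chapter.

```r
set.seed(123)                     # fix the seed so the run is reproducible

n_iter  <- 1000                   # number of Monte Carlo iterations (arbitrary)
n_flips <- 10                     # coin flips per iteration (arbitrary)
prop_heads <- numeric(n_iter)     # pre-allocate storage for the results

for (i in 1:n_iter) {
  flips <- sample(c("H", "T"), size = n_flips, replace = TRUE)
  prop_heads[i] <- sum(flips == "H") / n_flips   # proportion of heads
}

est <- mean(prop_heads)           # Monte Carlo estimate of P(heads)
print(round(est, 3))

# replicate() repeats an expression without an explicit loop:
# here, the sum of two dice, evaluated n_iter times
dice_sums <- replicate(n_iter, sum(sample(1:6, size = 2, replace = TRUE)))

# Manual density histogram: counts from hist(plot = FALSE), widths from diff()
h    <- hist(dice_sums, breaks = seq(1.5, 12.5, by = 1), plot = FALSE)
dens <- h$counts / (sum(h$counts) * diff(h$breaks))

if (!interactive()) pdf(NULL)     # avoid writing a plot file when run as a script
barplot(dens, names.arg = h$mids,
        xlab = "Sum of two dice", ylab = "Density")
```

Pre-allocating with `numeric()` before the loop avoids growing the vector on every iteration, and dividing the counts by the total count times the bin width makes the bar areas sum to one.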
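
The modelling functions in the table can be combined in a similar way. The sketch below is a base R approximation under stated assumptions: the body of `simulate_data`, its coefficients, the sample size, and the true effect of 2 are all hypothetical, `rbinom()` stands in for simcausal's `rbern()`, and the performance measures are computed by hand with `sd()` and `qnorm()` rather than with `simsum()`, so that no extra packages are needed.

```r
set.seed(2024)

true_effect <- 2                  # assumed true effect of A on Y (illustrative)

# Hypothetical data-generating function: L confounds the effect of A on Y
simulate_data <- function(n) {
  L <- rnorm(n)                                       # confounder
  A <- rbinom(n, size = 1, prob = plogis(0.5 * L))    # exposure depends on L
  Y <- rnorm(n, mean = true_effect * A + L)           # outcome depends on A and L
  data.frame(L = L, A = A, Y = Y)
}

n_iter  <- 200
results <- data.frame(est = numeric(n_iter), se = numeric(n_iter))

for (i in 1:n_iter) {
  d   <- simulate_data(n = 500)
  fit <- glm(Y ~ A + L, data = d)       # gaussian family is the default
  row_A <- coef(summary(fit))["A", ]    # coefficient table row for the exposure
  results$est[i] <- row_A["Estimate"]
  results$se[i]  <- row_A["Std. Error"]
}

# Hand-computed performance measures across iterations
bias     <- mean(results$est) - true_effect
emp_se   <- sd(results$est)
lower    <- results$est - qnorm(0.975) * results$se
upper    <- results$est + qnorm(0.975) * results$se
coverage <- mean(lower <= true_effect & true_effect <= upper)

print(round(c(bias = bias, empSE = emp_se, coverage = coverage), 3))
```

Because the model adjusts for L, the bias should be close to zero and the coverage close to the nominal 95%; dropping L from the formula is a quick way to see the effect of confounding on both measures.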