R Functions (U)

Note

This summary page lists the R functions used for the Monte Carlo simulation tasks covered in this chapter. For each function, the table gives its source package and its use in the chapter.

To learn more about these functions, readers can:

  1. Use R’s Built-in Help System: For each function, access its documentation by prefixing the function name with a question mark in the R console, e.g., ?set.seed. This displays the function’s manual page with descriptions, usage, and examples.

  2. Search Websites: Use a web search engine, or visit the CRAN website to find documentation for a specific function. Websites like Stack Overflow and RStudio Community often host discussions of R functions.

  3. Tutorials and Online Courses: Platforms like DataCamp, Coursera, and edX offer R courses that cover many functions in depth. Dedicated R tutorial websites can also be useful; one example is “Introduction to R for health data analysis” by Ehsan Karim, An Hoang and Qu.

  4. Books: There are numerous R programming books, such as “R for Data Science” by Hadley Wickham and “The Art of R Programming” by Norman Matloff.

  5. Workshops and Webinars: Institutions and organizations occasionally offer R programming workshops or webinars.

Whenever in doubt, exploring existing resources can be highly beneficial.
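
The built-in help option in item 1 can be sketched as follows. This is a minimal illustration using only functions that ship with base R; the choice of `set.seed`, `sample`, and `mean` as lookup targets and the variable name `matches` are assumptions made for the example.

```r
# Open the manual page for a function; the two forms are equivalent.
?set.seed
help("set.seed")

# List objects on the search path whose names contain a pattern,
# useful when only part of a function name is remembered.
matches <- apropos("seed")
print(matches)

# Inspect a function's arguments without opening its manual page.
print(args(sample))

# Run the examples section of a manual page.
example("mean")
```

For functions from add-on packages such as simcausal or rsimsum, the package typically has to be installed and attached (for example with `require`) before `?` can find the manual page.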

| Function name | Package name | Use |
|---|---|---|
| set.seed | base | Sets a seed for random number generation, ensuring the simulation is reproducible. |
| sample | base | Draws a random sample (with or without replacement); used to simulate coin flips and dice rolls. |
| replicate | base | Repeatedly evaluates an expression a fixed number of times; used to simulate multiple dice rolls per iteration. |
| numeric | base | Creates a numeric vector of a given length, pre-allocated to store results across iterations. |
| for | base | Loops over a sequence of iterations to repeat the random experiment many times (Monte Carlo loop). |
| sum | base | Adds up elements of a vector, e.g., counting the number of heads in a set of coin flips. |
| ifelse | base | Vectorised conditional selection; assigns the observed outcome based on the treatment value. |
| mean | base | Computes the arithmetic mean, e.g., the running average of the estimate across iterations. |
| round | base | Rounds numeric values to a specified number of decimal places for tidy output. |
| print | base | Prints an object's value to the console, e.g., the estimated probability. |
| function | base | Defines a custom function, such as the `simulate_data` data-generating function. |
| require | base | Loads (attaches) an installed package so its functions are available. |
| hist | base/graphics | Computes (and optionally plots) a histogram; used here with `plot = FALSE` to get bin counts. |
| barplot | base/graphics | Draws a bar plot; used to display the manually computed density histogram. |
| diff | base | Computes successive differences of a vector, e.g., bin widths from histogram breaks. |
| density | base/stats | Estimates a kernel density, used for the mirrored cholesterol density plot. |
| data.frame | base | Creates a data frame, e.g., to store estimates and standard errors across iterations. |
| cbind | base | Binds vectors or data frames together column-wise. |
| names | base | Gets or sets the names of an object, e.g., renaming columns of the generated data. |
| order | base | Returns a permutation that sorts a vector; used to order rows by age (L) and ID. |
| match | base | Returns positions of matches; used to attach the confounded exposure by ID. |
| ggplot | ggplot2 | Initializes a ggplot2 plot object for the trace and density plots. |
| geom_line | ggplot2 | Adds a line layer (used for the trace plot of the running estimate). |
| geom_hline | ggplot2 | Adds a horizontal reference line, e.g., the true parameter value. |
| geom_area | ggplot2 | Adds a filled area layer for the mirrored density plot. |
| labs | ggplot2 | Sets plot titles and axis labels. |
| theme_minimal | ggplot2 | Applies the minimal ggplot2 theme for a clean appearance. |
| DAG.empty | simcausal | Initializes an empty DAG object (structural causal model). |
| node | simcausal | Defines a node (variable) in the DAG with its conditional distribution. |
| set.DAG | simcausal | Locks in the DAG definition so data can be simulated from it. |
| action | simcausal | Defines an intervention (action) on a node, e.g., setting A = 1 or A = 0. |
| sim | simcausal | Simulates observational and/or counterfactual data from the DAG. |
| plotDAG | simcausal | Plots the DAG to visualize the assumed causal structure. |
| plogis | base/stats | Inverse-logit (logistic CDF); converts a linear predictor to a probability for the exposure node. |
| rnorm | base/stats | Generates normally distributed values; the distribution used for L and Y nodes. |
| rbern | simcausal | Generates Bernoulli (0/1) values; the distribution used for the exposure node A. |
| glm | base/stats | Fits a generalized linear model, e.g., `glm(Y ~ A + L)` to estimate the treatment effect. |
| summary | base | Summarizes a fitted model, returning coefficients and standard errors. |
| coef | base/stats | Extracts model coefficients (e.g., the row for the exposure A). |
| simsum | rsimsum | Computes simulation performance measures (bias, SE, MSE, coverage, power) from stored estimates. |
| format | base | Formats numbers for display, e.g., switching off scientific notation. |
| kable | knitr | Renders a table in the output document. |
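
To show how several of the base R functions listed above fit together, here is a minimal two-part sketch. The first part estimates simple probabilities with a Monte Carlo loop; the second repeats a small regression simulation and stores the estimates. This is not the chapter's exact code: the seed, sample sizes, coefficient values, the body of `simulate_data`, and names such as `n_iter`, `prop_heads`, and `results` are all illustrative assumptions, and `rbinom` from stats stands in for simcausal's `rbern` so that the example runs without add-on packages.

```r
set.seed(123)                        # illustrative seed for reproducibility

## Part 1: coin flips and dice rolls
n_iter  <- 1000                      # number of Monte Carlo iterations (assumed)
n_flips <- 10                        # coin flips per iteration (assumed)
prop_heads <- numeric(n_iter)        # pre-allocated storage for results

for (i in 1:n_iter) {
  flips <- sample(c("H", "T"), size = n_flips, replace = TRUE)
  prop_heads[i] <- sum(flips == "H") / n_flips   # share of heads this iteration
}
est <- mean(prop_heads)              # Monte Carlo estimate of P(heads)
print(round(est, 3))

# replicate() repeats an expression without an explicit loop:
# each evaluation rolls two dice and returns their sum.
dice_sums <- replicate(n_iter, sum(sample(1:6, size = 2, replace = TRUE)))
print(round(mean(dice_sums == 7), 3))  # estimated P(sum of two dice equals 7)

## Part 2: repeated regression on simulated data
# Hypothetical data-generating function: L confounds A and Y,
# and the assumed true effect of A on Y is 2.
simulate_data <- function(n) {
  L <- rnorm(n)                                        # confounder
  A <- rbinom(n, size = 1, prob = plogis(0.5 * L))     # binary exposure
  Y <- 1 + 2 * A + 0.5 * L + rnorm(n)                  # continuous outcome
  data.frame(L = L, A = A, Y = Y)
}

n_rep   <- 200                       # number of simulated datasets (assumed)
results <- data.frame(est = numeric(n_rep), se = numeric(n_rep))

for (i in 1:n_rep) {
  fit <- glm(Y ~ A + L, data = simulate_data(200))     # gaussian family by default
  row_A <- coef(summary(fit))["A", ]                   # coefficient row for exposure A
  results$est[i] <- row_A["Estimate"]
  results$se[i]  <- row_A["Std. Error"]
}
print(round(mean(results$est), 3))   # average estimate; compare with the assumed 2
```

Pre-allocating `prop_heads` and `results` before the loops avoids growing objects inside the loop. A data frame of stored estimates and standard errors like `results` is the kind of input from which performance measures such as bias and coverage can be computed.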