Perform Reliability Diagnostics on Survey Regression Models

This function takes a fitted survey regression model object (e.g., from `svyglm` or `svycoxph`) and produces a tibble with key reliability and diagnostic metrics for each coefficient.

Usage

svydiag(fit, p_threshold = 0.05, rse_threshold = 30)

Arguments

fit: A fitted model object from the `survey` package, such as `svyglm` or `svycoxph`.
p_threshold: A numeric value (between 0 and 1) for the significance threshold. Defaults to `0.05`.
rse_threshold: A numeric value, in percent, at or above which a coefficient's Relative Standard Error (RSE) is flagged as high. Defaults to `30`.

Value

A `tibble` containing the following columns:

Term: The name of the regression coefficient.
Estimate: The coefficient's point estimate (e.g., on the log-odds scale for logistic models).
SE: The standard error of the estimate.
p.value: The p-value for the coefficient.
is_significant: A logical flag, `TRUE` if `p.value` is less than `p_threshold`.
CI_Lower: The lower bound of the 95% confidence interval.
CI_Upper: The upper bound of the 95% confidence interval.
CI_Width: The absolute width of the confidence interval (`CI_Upper - CI_Lower`).
RSE_percent: The Relative Standard Error, as a percentage.
is_rse_high: A logical flag, `TRUE` if `RSE_percent` is greater than or equal to `rse_threshold`.
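
The derived columns follow directly from the others. The sketch below recomputes them from three rows of rounded values copied from the example output on this page, so the results match the printed table only up to rounding. It illustrates how the columns relate to each other and is not the function's actual implementation; the data frame name `coefs` is made up for the illustration.

```r
# Rounded values copied from the example output; nothing is re-estimated here.
coefs <- data.frame(
  Term     = c("(Intercept)", "Age", "Race1Mexican"),
  Estimate = c(-0.388, 0.00766, -0.224),
  SE       = c(0.110, 0.00166, 0.0885),
  p.value  = c(1.42e-3, 8.14e-5, 1.73e-2),
  CI_Lower = c(-0.612, 0.00426, -0.405),
  CI_Upper = c(-0.163, 0.0111, -0.0427)
)

# Default thresholds from the Usage section
p_threshold   <- 0.05
rse_threshold <- 30

# Flags and derived metrics, as described in the column list
coefs$is_significant <- coefs$p.value < p_threshold
coefs$CI_Width       <- coefs$CI_Upper - coefs$CI_Lower
coefs$RSE_percent    <- coefs$SE / abs(coefs$Estimate) * 100
coefs$is_rse_high    <- coefs$RSE_percent >= rse_threshold

print(coefs)
```

Only the `Race1Mexican` row crosses the default RSE threshold, which agrees with the `is_rse_high` column in the example table.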

Details

The output provides a comprehensive overview to help assess the stability and precision of each regression coefficient. The metrics include:

Standard Error (SE): A measure of the estimate's precision. Smaller is better.
p-value: The probability, if the true coefficient were zero, of observing a test statistic at least as extreme as the one obtained.
Confidence Interval (CI) Width: A wide CI indicates greater uncertainty.
Relative Standard Error (RSE): Calculated as `(SE / |Estimate|) * 100`.

Note on RSE: While included for comparative purposes, the use of RSE to evaluate the reliability of regression coefficients is not recommended by agencies like NCHS/CDC. Coefficients near zero can have an extremely large RSE even if precisely estimated. It is better to rely on the standard error, p-value, and confidence interval width for reliability assessment.
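
The near-zero problem can be seen with a small numeric sketch. The two coefficients below are hypothetical and share the same standard error; the interval width uses a normal critical value, which is an assumption made for illustration and not necessarily how `svydiag` builds its intervals.

```r
# Hypothetical coefficients: identical SE, very different magnitudes
est <- c(near_zero = 0.002, large = 0.5)
se  <- c(near_zero = 0.01,  large = 0.01)

# RSE as defined above: (SE / |Estimate|) * 100
rse_percent <- se / abs(est) * 100

# Width of a Wald-type 95% interval (normal critical value assumed)
ci_width <- 2 * qnorm(0.975) * se

print(data.frame(est, se, rse_percent, ci_width))
```

Both coefficients are estimated with the same precision and have the same interval width, yet the near-zero one has an RSE of 500% against 2% for the large one. The RSE here reflects the size of the estimate, not the quality of the estimation.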

Examples

# Run the example only if the required packages are installed
if (requireNamespace("survey", quietly = TRUE) &&
    requireNamespace("NHANES", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {

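  # 1. Prepare data using the NHANES example
  data(NHANESraw, package = "NHANES")
  # Use the native pipe (R >= 4.1): `%>%` is unavailable here because
  # dplyr is only checked with requireNamespace(), not attached
  nhanes_adults_with_na <- NHANESraw |>
    dplyr::filter(Age >= 20) |>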
    dplyr::mutate(
      ObeseStatus = factor(ifelse(BMI >= 30, "Obese", "Not Obese"),
                           levels = c("Not Obese", "Obese")),
      Race1 = factor(Race1)
    )

  # Create a complete-case design object for the regression model
  nhanes_complete <- nhanes_adults_with_na[complete.cases(
    nhanes_adults_with_na[, c("ObeseStatus", "Age", "Race1")]
  ), ]

  adult_design_complete <- survey::svydesign(
    id = ~SDMVPSU,
    strata = ~SDMVSTRA,
    weights = ~WTMEC2YR,
    nest = TRUE,
    data = nhanes_complete
  )

  # 2. Fit a survey-weighted logistic regression model
  fit <- survey::svyglm(
    ObeseStatus ~ Age + Race1,
    design = adult_design_complete,
    family = quasibinomial()
  )

  # 3. Get the reliability diagnostics table
  diagnostics_table <- svydiag(fit)

  # Print the resulting table
  print(diagnostics_table)

  # For a publication-ready table, pass the result to knitr::kable()
  if (requireNamespace("knitr", quietly = TRUE)) {
    knitr::kable(diagnostics_table,
                 caption = "Reliability Diagnostics for NHANES Obesity Model",
                 digits = 3)
  }
}
#> # A tibble: 6 × 10
#>   Term       Estimate      SE  p.value is_significant CI_Lower CI_Upper CI_Width
#>   <chr>         <dbl>   <dbl>    <dbl> <lgl>             <dbl>    <dbl>    <dbl>
#> 1 (Intercep… -0.388   0.110   1.42e- 3 TRUE           -0.612    -0.163   0.449  
#> 2 Age         0.00766 0.00166 8.14e- 5 TRUE            0.00426   0.0111  0.00682
#> 3 Race1Hisp… -0.485   0.103   6.09e- 5 TRUE           -0.695    -0.274   0.421  
#> 4 Race1Mexi… -0.224   0.0885  1.73e- 2 TRUE           -0.405    -0.0427  0.363  
#> 5 Race1White -0.654   0.0824  1.18e- 8 TRUE           -0.823    -0.486   0.337  
#> 6 Race1Other -1.39    0.130   2.40e-11 TRUE           -1.65     -1.12    0.534  
#> # ℹ 2 more variables: RSE_percent <dbl>, is_rse_high <lgl>
#> 
#> 
#> Table: Reliability Diagnostics for NHANES Obesity Model
#> 
#> |Term          | Estimate|    SE| p.value|is_significant | CI_Lower| CI_Upper| CI_Width| RSE_percent|is_rse_high |
#> |:-------------|--------:|-----:|-------:|:--------------|--------:|--------:|--------:|-----------:|:-----------|
#> |(Intercept)   |   -0.388| 0.110|   0.001|TRUE           |   -0.612|   -0.163|    0.449|      28.255|FALSE       |
#> |Age           |    0.008| 0.002|   0.000|TRUE           |    0.004|    0.011|    0.007|      21.710|FALSE       |
#> |Race1Hispanic |   -0.485| 0.103|   0.000|TRUE           |   -0.695|   -0.274|    0.421|      21.222|FALSE       |
#> |Race1Mexican  |   -0.224| 0.088|   0.017|TRUE           |   -0.405|   -0.043|    0.363|      39.518|TRUE        |
#> |Race1White    |   -0.654| 0.082|   0.000|TRUE           |   -0.823|   -0.486|    0.337|      12.586|FALSE       |
#> |Race1Other    |   -1.387| 0.130|   0.000|TRUE           |   -1.654|   -1.120|    0.534|       9.398|FALSE       |