Exact Matching (CCHS)

In the following code chunk, we load the necessary R libraries for our analysis. MatchIt is used for matching methods to find comparable control units, tableone for creating Table 1 to describe baseline characteristics, Publish for generating readable output of regression analysis, and survey for analyzing complex survey samples.

# Load required packages
library(MatchIt)
require(tableone)
require(Publish)
require(survey)

Load data

In the following code chunk, we load the CCHS dataset which is related to the Canadian Community Health Survey (CCHS). We then use ls() to list all objects in the workspace and str to display the structure of the data frame, providing a quick overview of the data and checking for any character variables.

load("Data/propensityscore/cchs123b.RData")
ls()
#> [1] "analytic.miss" "analytic2"
str(analytic.miss) # is there any character variable?
#> 'data.frame':    397173 obs. of  22 variables:
#>  $ CVD      : chr  "event" "no event" "no event" "no event" ...
#>  $ age      : chr  "65 years and over" "65 years and over" "30-39 years" "65 years and over" ...
#>  $ sex      : chr  "Female" "Female" "Male" "Female" ...
#>  $ married  : chr  "single" "single" "not single" "single" ...
#>  $ race     : chr  "White" "White" "White" "White" ...
#>  $ edu      : chr  "2nd grad." "Post-2nd grad." "Post-2nd grad." "Post-2nd grad." ...
#>  $ income   : chr  "$29,999 or less" "$29,999 or less" "$80,000 or more" "$29,999 or less" ...
#>  $ bmi      : Factor w/ 3 levels "Underweight",..: NA NA 2 NA 2 NA 3 NA 2 3 ...
#>  $ phyact   : chr  "Inactive" "Inactive" "Inactive" "Inactive" ...
#>  $ doctor   : chr  "Yes" "Yes" "Yes" "Yes" ...
#>  $ stress   : chr  "Not too stressed" "Not too stressed" "stressed" "Not too stressed" ...
#>  $ smoke    : Factor w/ 3 levels "Never smoker",..: 3 1 3 3 2 2 3 1 2 2 ...
#>  $ drink    : Factor w/ 3 levels "Never drank",..: 2 1 2 2 2 2 3 1 2 2 ...
#>  $ fruit    : Factor w/ 3 levels "0-3 daily serving",..: 2 2 2 3 3 3 2 2 2 2 ...
#>  $ bp       : chr  "No" "No" "No" "No" ...
#>  $ diab     : chr  "No" "No" "No" "No" ...
#>  $ province : Factor w/ 2 levels "South","North": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ weight   : num  142.8 71.4 168.3 71.4 196.1 ...
#>  $ cycle    : Factor w/ 3 levels "11","21","31": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ ID       : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ OA       : chr  "Control" "Control" "Control" "Control" ...
#>  $ immigrate: Factor w/ 3 levels "not immigrant",..: 1 1 3 1 1 1 1 1 1 1 ...

Data pre-pocessing

In the following code chunk, we define a vector containing the names of variables of interest that needs to be converted to factor variables. We then convert these variables to factors, ensuring they are treated as categorical in subsequent analyses. We also recode the Osteoarthritis (OA) variable into a numeric binary format and display the frequency table of OA before and after the transformation.

var.names <- c("age", "sex", "stress", "married", "income", "race", 
               "bmi", "phyact", "smoke", "doctor", "drink", "bp", "province",
               "immigrate", "fruit", "diab", "edu", "CVD", "OA")
analytic.miss[var.names] <- lapply(analytic.miss[var.names] , factor)
table(analytic.miss$OA)
#> 
#> Control      OA 
#>  314542   40943
analytic.miss$OA <- as.numeric(analytic.miss$OA=="OA") 
table(analytic.miss$OA)
#> 
#>      0      1 
#> 314542  40943

Identify subjects with missing

In the following code chunk, we create a new variable miss and initially assign all its values to 1 in the full dataset (that contains some missing observations). We then adjust this assignment by setting miss to 0 for observations that are also present in another complete case dataset. That means any row with miss equal to 0 means that row has no missing observations. Finally, we display the frequency table of the miss variable to check the number of missing and non-missing observations.

analytic.miss$miss <- 1
head(analytic.miss$ID) # full data
#> [1] 1 2 3 4 5 6
head(analytic2$ID) # complete case
#> [1]  3  5  7 10 11 13
head(analytic.miss$ID[analytic.miss$ID %in% analytic2$ID])
#> [1]  3  5  7 10 11 13
# if associated with complete case, assign miss <- 0
analytic.miss$miss[analytic.miss$ID %in% analytic2$ID] <- 0
table(analytic.miss$miss)
#> 
#>      0      1 
#> 185613 211560

Setting Design

Unconditional design

In the following code chunk, we explore the summary of the weight variable and establish an unconditional survey design object w.design0 using the svydesign function, which will be used for subsequent survey-weighted analyses. We then explore the summary, standard deviation, and sum of the weights within our design object.

summary(analytic.miss$weight)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    1.17   65.28  126.63  200.09  243.21 7154.95
w.design0 <- svydesign(id=~1, weights=~weight, 
                      data=analytic.miss)
summary(weights(w.design0))
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    1.17   65.28  126.63  200.09  243.21 7154.95
sd(weights(w.design0))
#> [1] 241.0279
sum(weights(w.design0))
#> [1] 79468929

Conditioning the design

In the following code chunk, we create a new survey design object w.design by subsetting w.design0 to only include observations without missing data (miss == 0). We then explore the summary, standard deviation, and sum of the weights within this new design object.

w.design <- subset(w.design0, miss == 0)
summary(weights(w.design))
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    1.17   71.56  137.95  214.61  261.91 7154.95
sd(weights(w.design))
#> [1] 254.9346
sum(weights(w.design))
#> [1] 39835061

Subset data (more!)

We subset the data for fast results (less computation). We will only work with cycle 1.1, and the people from Northern provinces in Canada.

w.design1 <- subset(w.design, cycle == 11 & province == "North")
sum(weights(w.design1))
#> [1] 42786.28

Preliminary analysis

Table 1

In the following code chunk, we define a new variable vector var.names and create a categorical table using svyCreateCatTable to explore the distribution of age and sex across strata of OA within our subsetted design object w.design1. We then print the table with standardized mean differences (SMD) to assess the balance of these variables across strata.

var.names <- c("age", "sex")
tab0 <- svyCreateCatTable(var = var.names, strata= "OA", data=w.design1,test=FALSE)
print(tab0, smd = TRUE)
#>                       Stratified by OA
#>                        0               1              SMD   
#>   n                    40691.2         2095.1               
#>   age (%)                                              1.084
#>      20-29 years       10889.4 (26.8)   120.9 ( 5.8)        
#>      30-39 years       12251.7 (30.1)   237.8 (11.3)        
#>      40-49 years       11094.0 (27.3)   572.7 (27.3)        
#>      50-59 years        5346.6 (13.1)  1092.4 (52.1)        
#>      60-64 years        1109.4 ( 2.7)    71.4 ( 3.4)        
#>      65 years and over     0.0 ( 0.0)     0.0 ( 0.0)        
#>      teen                  0.0 ( 0.0)     0.0 ( 0.0)        
#>   sex = Male (%)       20824.6 (51.2)  1050.8 (50.2)   0.020

Treatment effect

In the following code chunk, we fit a logistic regression model using svyglm to estimate the effect of OA and other covariates on the binary outcome CVD (cardiovascular disease). We then use publish to display the results in a readable format.

fit.outcome <- svyglm(I(CVD=="event") ~ OA + age + sex + stress + married + 
                         income + race + bmi + phyact + smoke +
                         immigrate + fruit + diab + edu,
                   design = w.design1,
                   family = binomial(logit))
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!
publish(fit.outcome)
#>   Variable             Units OddsRatio         CI.95    p-value 
#>         OA                        0.89   [0.17;4.59]   0.887411 
#>        age       20-29 years       Ref                          
#>                  30-39 years      2.62  [0.29;23.43]   0.389521 
#>                  40-49 years      4.89  [0.59;40.73]   0.142280 
#>                  50-59 years     17.95 [2.59;124.68]   0.003550 
#>                  60-64 years     23.95 [3.41;168.27]   0.001439 
#>        sex            Female       Ref                          
#>                         Male      1.32   [0.64;2.71]   0.456222 
#>     stress  Not too stressed       Ref                          
#>                     stressed      0.54   [0.21;1.39]   0.198815 
#>    married        not single       Ref                          
#>                       single      0.75   [0.31;1.80]   0.513807 
#>     income   $29,999 or less       Ref                          
#>              $30,000-$49,999      0.72   [0.24;2.16]   0.556703 
#>              $50,000-$79,999      0.95   [0.27;3.40]   0.939104 
#>              $80,000 or more      0.47   [0.10;2.15]   0.332557 
#>       race         Non-white       Ref                          
#>                        White      0.33   [0.11;0.94]   0.038131 
#>        bmi       Underweight       Ref                          
#>               healthy weight      0.29   [0.03;3.20]   0.310237 
#>                   Overweight      0.44   [0.04;4.77]   0.503130 
#>     phyact            Active       Ref                          
#>                     Inactive      0.84   [0.30;2.40]   0.751345 
#>                     Moderate      1.02   [0.32;3.27]   0.979528 
#>      smoke      Never smoker       Ref                          
#>               Current smoker      0.98   [0.26;3.76]   0.981454 
#>                Former smoker      0.71   [0.18;2.71]   0.612518 
#>  immigrate     not immigrant       Ref                          
#>                   > 10 years      0.14   [0.03;0.78]   0.025010 
#>                       recent      0.00   [0.00;0.00]    < 1e-04 
#>      fruit 0-3 daily serving       Ref                          
#>            4-6 daily serving      1.15   [0.52;2.56]   0.725722 
#>             6+ daily serving      0.68   [0.17;2.71]   0.583752 
#>       diab                No       Ref                          
#>                          Yes      3.08  [0.93;10.23]   0.066677 
#>        edu          < 2ndary       Ref                          
#>                    2nd grad.      4.12  [0.87;19.43]   0.074178 
#>              Other 2nd grad.      3.04  [0.63;14.67]   0.167135 
#>               Post-2nd grad.      3.00  [0.82;10.98]   0.096939

Matching: Estimating treatment effect

Going back to the data (not working on design here while matching)

In the following code chunk, we create a new dataset by omitting NA values from analytic.miss and converting it to a data frame. We then create a subset analytic11n which includes only observations from cycle 1.1 and the Northern provinces. We display the dimensions of this subset, as well as frequency tables of OA and a cross-tabulation of OA and age to understand the distribution of our target variable and a key covariate.

# Create the dataset without design features
analytic2 <- as.data.frame(na.omit(analytic.miss))
analytic11n <- subset(analytic2, cycle == 11 & province == "North")
dim(analytic11n)
#> [1] 1424   23
table(analytic11n$OA)
#> 
#>    0    1 
#> 1357   67
table(analytic11n$OA,analytic11n$age)
#>    
#>     20-29 years 30-39 years 40-49 years 50-59 years 60-64 years
#>   0         345         432         358         177          45
#>   1           4          11          18          31           3
#>    
#>     65 years and over teen
#>   0                 0    0
#>   1                 0    0

Matching by 1 matching variable

In the following code chunk, we perform exact matching using a single variable, age. We define the matching formula and apply the matchit function to create matched sets of treated and control units. The resulting matching.obj object is displayed to summarize the matching results.

match.formula <- as.formula("OA ~ age")
matching.obj <- matchit(match.formula,
                        data = analytic11n,
                        method = "exact")
matching.obj
#> A matchit object
#>  - method: Exact matching
#>  - number of obs.: 1424 (original), 1424 (matched)
#>  - target estimand: ATT
#>  - covariates: age

Matching by 2 matching variables

In the following code chunk, we extend the matching to include two variables, age and sex. We create a new variable var.comb that concatenates these two variables and display its frequency table and the number of unique combinations. We then perform exact matching using both variables and display the resulting object.

var.comb <- do.call('paste0', 
                    analytic11n[, c('age', 'sex')])
table(var.comb)
#> var.comb
#> 20-29 yearsFemale   20-29 yearsMale 30-39 yearsFemale   30-39 yearsMale 
#>               184               165               220               223 
#> 40-49 yearsFemale   40-49 yearsMale 50-59 yearsFemale   50-59 yearsMale 
#>               187               189               101               107 
#> 60-64 yearsFemale   60-64 yearsMale 
#>                24                24
length(table(var.comb))
#> [1] 10
match.formula <- as.formula("OA ~ age + sex")
matching.obj <- matchit(match.formula,
                        data = analytic11n,
                        method = "exact")
matching.obj
#> A matchit object
#>  - method: Exact matching
#>  - number of obs.: 1424 (original), 1424 (matched)
#>  - target estimand: ATT
#>  - covariates: age, sex

Matching by 3 matching variables

In the following code chunk, we further extend the matching to include three variables: age, sex, and stress. We explore the unique combinations of these variables and their distribution across levels of OA. We then perform exact matching using these three variables and display the resulting object.

var.comb <- do.call('paste0', 
                    analytic11n[, c('age', 'sex', 'stress')])
table(var.comb)
#> var.comb
#> 20-29 yearsFemaleNot too stressed         20-29 yearsFemalestressed 
#>                               157                                27 
#>   20-29 yearsMaleNot too stressed           20-29 yearsMalestressed 
#>                               147                                18 
#> 30-39 yearsFemaleNot too stressed         30-39 yearsFemalestressed 
#>                               170                                50 
#>   30-39 yearsMaleNot too stressed           30-39 yearsMalestressed 
#>                               183                                40 
#> 40-49 yearsFemaleNot too stressed         40-49 yearsFemalestressed 
#>                               142                                45 
#>   40-49 yearsMaleNot too stressed           40-49 yearsMalestressed 
#>                               141                                48 
#> 50-59 yearsFemaleNot too stressed         50-59 yearsFemalestressed 
#>                                72                                29 
#>   50-59 yearsMaleNot too stressed           50-59 yearsMalestressed 
#>                                78                                29 
#> 60-64 yearsFemaleNot too stressed         60-64 yearsFemalestressed 
#>                                18                                 6 
#>   60-64 yearsMaleNot too stressed           60-64 yearsMalestressed 
#>                                20                                 4
length(table(var.comb))
#> [1] 20
table(var.comb,analytic11n$OA)
#>                                    
#> var.comb                              0   1
#>   20-29 yearsFemaleNot too stressed 156   1
#>   20-29 yearsFemalestressed          27   0
#>   20-29 yearsMaleNot too stressed   144   3
#>   20-29 yearsMalestressed            18   0
#>   30-39 yearsFemaleNot too stressed 168   2
#>   30-39 yearsFemalestressed          49   1
#>   30-39 yearsMaleNot too stressed   178   5
#>   30-39 yearsMalestressed            37   3
#>   40-49 yearsFemaleNot too stressed 130  12
#>   40-49 yearsFemalestressed          42   3
#>   40-49 yearsMaleNot too stressed   138   3
#>   40-49 yearsMalestressed            48   0
#>   50-59 yearsFemaleNot too stressed  65   7
#>   50-59 yearsFemalestressed          22   7
#>   50-59 yearsMaleNot too stressed    67  11
#>   50-59 yearsMalestressed            23   6
#>   60-64 yearsFemaleNot too stressed  17   1
#>   60-64 yearsFemalestressed           5   1
#>   60-64 yearsMaleNot too stressed    19   1
#>   60-64 yearsMalestressed             4   0
match.formula <- as.formula("OA ~ age + sex + stress")
matching.obj <- matchit(match.formula,
                        data = analytic11n,
                        method = "exact")
matching.obj
#> A matchit object
#>  - method: Exact matching
#>  - number of obs.: 1424 (original), 1327 (matched)
#>  - target estimand: ATT
#>  - covariates: age, sex, stress

Matching by 4 matching variables

The process of matching by 4 variables involves creating combinations of the 4 variables, exploring their distributions, and performing exact matching.

var.comb <- do.call('paste0', 
                    analytic11n[, c('age', 'sex',
                                    'stress','income')])
#table(var.comb)
length(table(var.comb))
#> [1] 76
match.formula <- as.formula("OA ~ age + sex + stress + income")
matching.obj <- matchit(match.formula,
                        data = analytic11n,
                        method = "exact")
matching.obj
#> A matchit object
#>  - method: Exact matching
#>  - number of obs.: 1424 (original), 900 (matched)
#>  - target estimand: ATT
#>  - covariates: age, sex, stress, income

Matching by 5 matching variables

var.comb <- do.call('paste0', 
                    analytic11n[, c('age', 'sex',
                                    'stress','income','race')])
length(table(var.comb))
#> [1] 146
match.formula <- as.formula("OA ~ age + sex + stress + income + race")
matching.obj <- matchit(match.formula,
                        data = analytic11n,
                        method = "exact")
matching.obj
#> A matchit object
#>  - method: Exact matching
#>  - number of obs.: 1424 (original), 616 (matched)
#>  - target estimand: ATT
#>  - covariates: age, sex, stress, income, race

Matching by 6 matching variables

var.comb <- do.call('paste0', 
                    analytic11n[, c('age', 'sex',
                                    'stress','income','race','edu')])
length(table(var.comb))
#> [1] 354
match.formula <- as.formula("OA ~ age + sex + stress + income + race + edu")
matching.obj <- matchit(match.formula,
                        data = analytic11n,
                        method = "exact")
matching.obj
#> A matchit object
#>  - method: Exact matching
#>  - number of obs.: 1424 (original), 399 (matched)
#>  - target estimand: ATT
#>  - covariates: age, sex, stress, income, race, edu
OACVD.match.11n <- match.data(matching.obj)
var.names <- c("age", "sex", "stress", "income", "race", "edu")
tab1 <- CreateCatTable(var = var.names, strata= "OA", data=OACVD.match.11n,test=FALSE)
print(tab1, smd = TRUE)
#>                        Stratified by OA
#>                         0           1          SMD   
#>   n                     337         62               
#>   age (%)                                       0.565
#>      20-29 years         63 (18.7)   4 ( 6.5)        
#>      30-39 years         61 (18.1)  11 (17.7)        
#>      40-49 years        127 (37.7)  17 (27.4)        
#>      50-59 years         82 (24.3)  28 (45.2)        
#>      60-64 years          4 ( 1.2)   2 ( 3.2)        
#>      65 years and over    0 ( 0.0)   0 ( 0.0)        
#>      teen                 0 ( 0.0)   0 ( 0.0)        
#>   sex = Male (%)        211 (62.6)  29 (46.8)   0.322
#>   stress = stressed (%)  42 (12.5)  17 (27.4)   0.381
#>   income (%)                                    0.115
#>      $29,999 or less     69 (20.5)  11 (17.7)        
#>      $30,000-$49,999     57 (16.9)  13 (21.0)        
#>      $50,000-$79,999     69 (20.5)  12 (19.4)        
#>      $80,000 or more    142 (42.1)  26 (41.9)        
#>   race = White (%)      242 (71.8)  43 (69.4)   0.054
#>   edu (%)                                       0.146
#>      < 2ndary            73 (21.7)  11 (17.7)        
#>      2nd grad.            5 ( 1.5)   2 ( 3.2)        
#>      Other 2nd grad.      0 ( 0.0)   0 ( 0.0)        
#>      Post-2nd grad.     259 (76.9)  49 (79.0)

Treatment effect

Convert data to design

In the following code chunk, we create a new variable matched in the analytic.miss dataset to indicate whether an observation was included in the matched dataset OACVD.match.11n. We then create a new survey design object w.design.m that includes only the matched observations for subsequent analyses.

analytic.miss$matched <- 0
length(analytic.miss$ID) # full data
#> [1] 397173
length(OACVD.match.11n$ID) # matched data
#> [1] 399
length(analytic.miss$ID[analytic.miss$ID %in% OACVD.match.11n$ID])
#> [1] 399
analytic.miss$matched[analytic.miss$ID %in% OACVD.match.11n$ID] <- 1
table(analytic.miss$matched)
#> 
#>      0      1 
#> 396774    399
w.design0 <- svydesign(id=~1, weights=~weight, 
                      data=analytic.miss)
w.design.m <- subset(w.design0, matched == 1)

Outcome analysis

The subsequent code chunks involve fitting logistic regression models to estimate the treatment effect, both in a crude and adjusted manner, respectively. The models are fitted using the matched survey design object and the results are displayed in a readable format.

Crude
fit.outcome <- svyglm(I(CVD=="event") ~ OA,
                   design = w.design.m,
                   family = binomial(logit))
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!
publish(fit.outcome)
#>  Variable Units OddsRatio        CI.95  p-value 
#>        OA            3.14 [0.80;12.40]   0.1025
Adjusted
fit.outcome <- svyglm(I(CVD=="event") ~ OA + 
                        age + sex + stress + income + race + edu,
                   design = w.design.m,
                   family = binomial(logit))
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
publish(fit.outcome)
#>  Variable            Units    OddsRatio                        CI.95   p-value 
#>        OA                          2.04                 [0.34;12.16]   0.43593 
#>       age      20-29 years          Ref                                        
#>                30-39 years         0.54                  [0.08;3.51]   0.51962 
#>                40-49 years  30148597.83    [7758796.64;117149345.85]   < 1e-04 
#>                50-59 years  63290825.92   [12589874.04;318170669.03]   < 1e-04 
#>                60-64 years         0.31                  [0.01;9.33]   0.49735 
#>       sex           Female          Ref                                        
#>                       Male         1.58                  [0.29;8.62]   0.59729 
#>    stress Not too stressed          Ref                                        
#>                   stressed         0.15                  [0.01;1.80]   0.13666 
#>    income  $29,999 or less          Ref                                        
#>            $30,000-$49,999         0.20                  [0.01;3.84]   0.28640 
#>            $50,000-$79,999         0.20                  [0.02;1.95]   0.16543 
#>            $80,000 or more         0.08                  [0.01;0.68]   0.02122 
#>      race        Non-white          Ref                                        
#>                      White         1.02                  [0.11;9.45]   0.98723 
#>       edu         < 2ndary          Ref                                        
#>                  2nd grad. 845233463.16 [44642866.24;16002995941.97]   < 1e-04 
#>             Post-2nd grad.  69867459.11    [9660580.05;505296971.72]   < 1e-04

Questions for the students

  • Look at all the ORs. Some of them are VERY high. Why?
  • Look at the CI in the above table. Some of them are Inf. Why?
  • Should we match matching variables in the regression?

Matching by a lot of variables

The code chunks involve performing matching using a large number of variables and estimating the treatment effect using the matched data. The process involves creating matched datasets, converting them to survey design objects, and fitting logistic regression models.

Matching part in data

match.formula <- as.formula("OA ~ age + sex + stress + married + 
                         income + race + bmi + phyact + smoke +
                         immigrate + fruit + diab + edu")
matching.obj2 <- matchit(match.formula,
                        data = analytic11n,
                        method = "exact")
matching.obj2
#> A matchit object
#>  - method: Exact matching
#>  - number of obs.: 1424 (original), 22 (matched)
#>  - target estimand: ATT
#>  - covariates: age, sex, stress, married, income, race, bmi, phyact, smoke, immigrate, fruit, diab, edu
OACVD.match.11n2 <- match.data(matching.obj2)
var.names <- c("age", "sex", "stress", "married", "income", "race", 
               "bmi", "phyact", "smoke", "immigrate", "fruit", "diab", "edu")
tab2 <- CreateCatTable(var = var.names, strata= "OA", data=OACVD.match.11n2,test=FALSE)
print(tab2, smd = TRUE)
#>                        Stratified by OA
#>                         0          1          SMD   
#>   n                     11         11               
#>   age (%)                                     <0.001
#>      20-29 years         3 (27.3)   3 (27.3)        
#>      30-39 years         1 ( 9.1)   1 ( 9.1)        
#>      40-49 years         4 (36.4)   4 (36.4)        
#>      50-59 years         3 (27.3)   3 (27.3)        
#>      60-64 years         0 ( 0.0)   0 ( 0.0)        
#>      65 years and over   0 ( 0.0)   0 ( 0.0)        
#>      teen                0 ( 0.0)   0 ( 0.0)        
#>   sex = Male (%)         6 (54.5)   6 (54.5)  <0.001
#>   stress = stressed (%)  1 ( 9.1)   1 ( 9.1)  <0.001
#>   married = single (%)   3 (27.3)   3 (27.3)  <0.001
#>   income (%)                                  <0.001
#>      $29,999 or less     1 ( 9.1)   1 ( 9.1)        
#>      $30,000-$49,999     2 (18.2)   2 (18.2)        
#>      $50,000-$79,999     2 (18.2)   2 (18.2)        
#>      $80,000 or more     6 (54.5)   6 (54.5)        
#>   race = White (%)      10 (90.9)  10 (90.9)  <0.001
#>   bmi (%)                                     <0.001
#>      Underweight         0 ( 0.0)   0 ( 0.0)        
#>      healthy weight      4 (36.4)   4 (36.4)        
#>      Overweight          7 (63.6)   7 (63.6)        
#>   phyact (%)                                  <0.001
#>      Active              3 (27.3)   3 (27.3)        
#>      Inactive            5 (45.5)   5 (45.5)        
#>      Moderate            3 (27.3)   3 (27.3)        
#>   smoke (%)                                   <0.001
#>      Never smoker        3 (27.3)   3 (27.3)        
#>      Current smoker      2 (18.2)   2 (18.2)        
#>      Former smoker       6 (54.5)   6 (54.5)        
#>   immigrate (%)                               <0.001
#>      not immigrant      10 (90.9)  10 (90.9)        
#>      > 10 years          1 ( 9.1)   1 ( 9.1)        
#>      recent              0 ( 0.0)   0 ( 0.0)        
#>   fruit (%)                                   <0.001
#>      0-3 daily serving   3 (27.3)   3 (27.3)        
#>      4-6 daily serving   6 (54.5)   6 (54.5)        
#>      6+ daily serving    2 (18.2)   2 (18.2)        
#>   diab = Yes (%)         0 ( 0.0)   0 ( 0.0)  <0.001
#>   edu (%)                                     <0.001
#>      < 2ndary            1 ( 9.1)   1 ( 9.1)        
#>      2nd grad.           0 ( 0.0)   0 ( 0.0)        
#>      Other 2nd grad.     0 ( 0.0)   0 ( 0.0)        
#>      Post-2nd grad.     10 (90.9)  10 (90.9)

Treatment effect estimation in design

Create design
analytic.miss$matched2 <- 0
length(analytic.miss$ID) # full data
#> [1] 397173
length(OACVD.match.11n2$ID) # matched data
#> [1] 22
length(analytic.miss$ID[analytic.miss$ID %in% OACVD.match.11n2$ID])
#> [1] 22
analytic.miss$matched2[analytic.miss$ID %in% OACVD.match.11n2$ID] <- 1
table(analytic.miss$matched2)
#> 
#>      0      1 
#> 397151     22
w.design0 <- svydesign(id=~1, weights=~weight, 
                      data=analytic.miss)
w.design.m2 <- subset(w.design0, matched2 == 1)
outcome analysis
fit.outcome <- svyglm(I(CVD=="event") ~ OA + age + sex + stress + married +
                         income + race + bmi + phyact + smoke +
                         immigrate + fruit + diab + edu,
                   design = w.design.m2,
                   family = binomial(logit))
publish(fit.outcome)
# Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
#   contrasts can be applied only to factors with 2 or more levels
Questions for the students
  • Why the above model not fitting?

Save data for later use

save(analytic11n, analytic2, analytic.miss, file="Data/propensityscore/cchs123c.RData")

Video content (optional)

Tip

For those who prefer a video walkthrough, feel free to watch the video below, which offers a description of an earlier version of the above content.