Exact Matching (CCHS)
In the following code chunk, we load the necessary R libraries for our analysis. MatchIt
is used for matching methods to find comparable control units, tableone
for creating Table 1 to describe baseline characteristics, Publish
for generating readable output of regression analysis, and survey
for analyzing complex survey samples.
Load data
In the following code chunk, we load the CCHS dataset which is related to the Canadian Community Health Survey (CCHS). We then use ls()
to list all objects in the workspace and str
to display the structure of the data frame, providing a quick overview of the data and checking for any character variables.
load("Data/propensityscore/cchs123b.RData")
ls()
#> [1] "analytic.miss" "analytic2"
str(analytic.miss) # is there any character variable?
#> 'data.frame': 397173 obs. of 22 variables:
#> $ CVD : chr "event" "no event" "no event" "no event" ...
#> $ age : chr "65 years and over" "65 years and over" "30-39 years" "65 years and over" ...
#> $ sex : chr "Female" "Female" "Male" "Female" ...
#> $ married : chr "single" "single" "not single" "single" ...
#> $ race : chr "White" "White" "White" "White" ...
#> $ edu : chr "2nd grad." "Post-2nd grad." "Post-2nd grad." "Post-2nd grad." ...
#> $ income : chr "$29,999 or less" "$29,999 or less" "$80,000 or more" "$29,999 or less" ...
#> $ bmi : Factor w/ 3 levels "Underweight",..: NA NA 2 NA 2 NA 3 NA 2 3 ...
#> $ phyact : chr "Inactive" "Inactive" "Inactive" "Inactive" ...
#> $ doctor : chr "Yes" "Yes" "Yes" "Yes" ...
#> $ stress : chr "Not too stressed" "Not too stressed" "stressed" "Not too stressed" ...
#> $ smoke : Factor w/ 3 levels "Never smoker",..: 3 1 3 3 2 2 3 1 2 2 ...
#> $ drink : Factor w/ 3 levels "Never drank",..: 2 1 2 2 2 2 3 1 2 2 ...
#> $ fruit : Factor w/ 3 levels "0-3 daily serving",..: 2 2 2 3 3 3 2 2 2 2 ...
#> $ bp : chr "No" "No" "No" "No" ...
#> $ diab : chr "No" "No" "No" "No" ...
#> $ province : Factor w/ 2 levels "South","North": 1 1 1 1 1 1 1 1 1 1 ...
#> $ weight : num 142.8 71.4 168.3 71.4 196.1 ...
#> $ cycle : Factor w/ 3 levels "11","21","31": 1 1 1 1 1 1 1 1 1 1 ...
#> $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
#> $ OA : chr "Control" "Control" "Control" "Control" ...
#> $ immigrate: Factor w/ 3 levels "not immigrant",..: 1 1 3 1 1 1 1 1 1 1 ...
Data pre-pocessing
In the following code chunk, we define a vector containing the names of variables of interest that needs to be converted to factor variables. We then convert these variables to factors, ensuring they are treated as categorical in subsequent analyses. We also recode
the Osteoarthritis (OA) variable into a numeric binary format and display the frequency table of OA before and after the transformation.
var.names <- c("age", "sex", "stress", "married", "income", "race",
"bmi", "phyact", "smoke", "doctor", "drink", "bp", "province",
"immigrate", "fruit", "diab", "edu", "CVD", "OA")
analytic.miss[var.names] <- lapply(analytic.miss[var.names] , factor)
table(analytic.miss$OA)
#>
#> Control OA
#> 314542 40943
analytic.miss$OA <- as.numeric(analytic.miss$OA=="OA")
table(analytic.miss$OA)
#>
#> 0 1
#> 314542 40943
Identify subjects with missing
In the following code chunk, we create a new variable miss
and initially assign all its values to 1 in the full dataset (that contains some missing observations). We then adjust this assignment by setting miss to 0 for observations that are also present in another complete case dataset. That means any row with miss
equal to 0 means that row has no missing observations. Finally, we display the frequency table of the miss variable to check the number of missing and non-missing observations.
analytic.miss$miss <- 1
head(analytic.miss$ID) # full data
#> [1] 1 2 3 4 5 6
head(analytic2$ID) # complete case
#> [1] 3 5 7 10 11 13
head(analytic.miss$ID[analytic.miss$ID %in% analytic2$ID])
#> [1] 3 5 7 10 11 13
# if associated with complete case, assign miss <- 0
analytic.miss$miss[analytic.miss$ID %in% analytic2$ID] <- 0
table(analytic.miss$miss)
#>
#> 0 1
#> 185613 211560
Setting Design
Unconditional design
In the following code chunk, we explore the summary of the weight variable and establish an unconditional survey design object w.design0
using the svydesign
function, which will be used for subsequent survey-weighted analyses. We then explore the summary, standard deviation, and sum of the weights within our design object.
summary(analytic.miss$weight)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 1.17 65.28 126.63 200.09 243.21 7154.95
w.design0 <- svydesign(id=~1, weights=~weight,
data=analytic.miss)
summary(weights(w.design0))
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 1.17 65.28 126.63 200.09 243.21 7154.95
sd(weights(w.design0))
#> [1] 241.0279
sum(weights(w.design0))
#> [1] 79468929
Conditioning the design
In the following code chunk, we create a new survey design object w.design
by subsetting w.design0
to only include observations without missing data (miss == 0)
. We then explore the summary, standard deviation, and sum of the weights within this new design object.
Subset data (more!)
We subset the data for fast results (less computation). We will only work with cycle 1.1, and the people from Northern provinces in Canada.
Preliminary analysis
Table 1
In the following code chunk, we define a new variable vector var.names
and create a categorical table using svyCreateCatTable
to explore the distribution of age
and sex
across strata of OA
within our subsetted design object w.design1
. We then print the table with standardized mean differences (SMD) to assess the balance of these variables across strata.
var.names <- c("age", "sex")
tab0 <- svyCreateCatTable(var = var.names, strata= "OA", data=w.design1,test=FALSE)
print(tab0, smd = TRUE)
#> Stratified by OA
#> 0 1 SMD
#> n 40691.2 2095.1
#> age (%) 1.084
#> 20-29 years 10889.4 (26.8) 120.9 ( 5.8)
#> 30-39 years 12251.7 (30.1) 237.8 (11.3)
#> 40-49 years 11094.0 (27.3) 572.7 (27.3)
#> 50-59 years 5346.6 (13.1) 1092.4 (52.1)
#> 60-64 years 1109.4 ( 2.7) 71.4 ( 3.4)
#> 65 years and over 0.0 ( 0.0) 0.0 ( 0.0)
#> teen 0.0 ( 0.0) 0.0 ( 0.0)
#> sex = Male (%) 20824.6 (51.2) 1050.8 (50.2) 0.020
Treatment effect
In the following code chunk, we fit a logistic regression model using svyglm
to estimate the effect of OA
and other covariates on the binary outcome CVD
(cardiovascular disease). We then use publish
to display the results in a readable format.
fit.outcome <- svyglm(I(CVD=="event") ~ OA + age + sex + stress + married +
income + race + bmi + phyact + smoke +
immigrate + fruit + diab + edu,
design = w.design1,
family = binomial(logit))
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!
publish(fit.outcome)
#> Variable Units OddsRatio CI.95 p-value
#> OA 0.89 [0.17;4.59] 0.887411
#> age 20-29 years Ref
#> 30-39 years 2.62 [0.29;23.43] 0.389521
#> 40-49 years 4.89 [0.59;40.73] 0.142280
#> 50-59 years 17.95 [2.59;124.68] 0.003550
#> 60-64 years 23.95 [3.41;168.27] 0.001439
#> sex Female Ref
#> Male 1.32 [0.64;2.71] 0.456222
#> stress Not too stressed Ref
#> stressed 0.54 [0.21;1.39] 0.198815
#> married not single Ref
#> single 0.75 [0.31;1.80] 0.513807
#> income $29,999 or less Ref
#> $30,000-$49,999 0.72 [0.24;2.16] 0.556703
#> $50,000-$79,999 0.95 [0.27;3.40] 0.939104
#> $80,000 or more 0.47 [0.10;2.15] 0.332557
#> race Non-white Ref
#> White 0.33 [0.11;0.94] 0.038131
#> bmi Underweight Ref
#> healthy weight 0.29 [0.03;3.20] 0.310237
#> Overweight 0.44 [0.04;4.77] 0.503130
#> phyact Active Ref
#> Inactive 0.84 [0.30;2.40] 0.751345
#> Moderate 1.02 [0.32;3.27] 0.979528
#> smoke Never smoker Ref
#> Current smoker 0.98 [0.26;3.76] 0.981454
#> Former smoker 0.71 [0.18;2.71] 0.612518
#> immigrate not immigrant Ref
#> > 10 years 0.14 [0.03;0.78] 0.025010
#> recent 0.00 [0.00;0.00] < 1e-04
#> fruit 0-3 daily serving Ref
#> 4-6 daily serving 1.15 [0.52;2.56] 0.725722
#> 6+ daily serving 0.68 [0.17;2.71] 0.583752
#> diab No Ref
#> Yes 3.08 [0.93;10.23] 0.066677
#> edu < 2ndary Ref
#> 2nd grad. 4.12 [0.87;19.43] 0.074178
#> Other 2nd grad. 3.04 [0.63;14.67] 0.167135
#> Post-2nd grad. 3.00 [0.82;10.98] 0.096939
Matching: Estimating treatment effect
Going back to the data (not working on design here while matching)
In the following code chunk, we create a new dataset by omitting NA
values from analytic.miss
and converting it to a data frame. We then create a subset analytic11n
which includes only observations from cycle 1.1 and the Northern provinces. We display the dimensions of this subset, as well as frequency tables of OA
and a cross-tabulation of OA
and age
to understand the distribution of our target variable and a key covariate.
# Create the dataset without design features
analytic2 <- as.data.frame(na.omit(analytic.miss))
analytic11n <- subset(analytic2, cycle == 11 & province == "North")
dim(analytic11n)
#> [1] 1424 23
table(analytic11n$OA)
#>
#> 0 1
#> 1357 67
table(analytic11n$OA,analytic11n$age)
#>
#> 20-29 years 30-39 years 40-49 years 50-59 years 60-64 years
#> 0 345 432 358 177 45
#> 1 4 11 18 31 3
#>
#> 65 years and over teen
#> 0 0 0
#> 1 0 0
Matching by 1 matching variable
In the following code chunk, we perform exact matching using a single variable, age
. We define the matching formula and apply the matchit
function to create matched sets of treated and control units. The resulting matching.obj
object is displayed to summarize the matching results.
Matching by 2 matching variables
In the following code chunk, we extend the matching to include two variables, age
and sex
. We create a new variable var.comb
that concatenates these two variables and display its frequency table and the number of unique combinations. We then perform exact matching using both variables and display the resulting object.
var.comb <- do.call('paste0',
analytic11n[, c('age', 'sex')])
table(var.comb)
#> var.comb
#> 20-29 yearsFemale 20-29 yearsMale 30-39 yearsFemale 30-39 yearsMale
#> 184 165 220 223
#> 40-49 yearsFemale 40-49 yearsMale 50-59 yearsFemale 50-59 yearsMale
#> 187 189 101 107
#> 60-64 yearsFemale 60-64 yearsMale
#> 24 24
length(table(var.comb))
#> [1] 10
match.formula <- as.formula("OA ~ age + sex")
matching.obj <- matchit(match.formula,
data = analytic11n,
method = "exact")
matching.obj
#> A matchit object
#> - method: Exact matching
#> - number of obs.: 1424 (original), 1424 (matched)
#> - target estimand: ATT
#> - covariates: age, sex
Matching by 3 matching variables
In the following code chunk, we further extend the matching to include three variables: age
, sex
, and stress
. We explore the unique combinations of these variables and their distribution across levels of OA
. We then perform exact matching using these three variables and display the resulting object.
var.comb <- do.call('paste0',
analytic11n[, c('age', 'sex', 'stress')])
table(var.comb)
#> var.comb
#> 20-29 yearsFemaleNot too stressed 20-29 yearsFemalestressed
#> 157 27
#> 20-29 yearsMaleNot too stressed 20-29 yearsMalestressed
#> 147 18
#> 30-39 yearsFemaleNot too stressed 30-39 yearsFemalestressed
#> 170 50
#> 30-39 yearsMaleNot too stressed 30-39 yearsMalestressed
#> 183 40
#> 40-49 yearsFemaleNot too stressed 40-49 yearsFemalestressed
#> 142 45
#> 40-49 yearsMaleNot too stressed 40-49 yearsMalestressed
#> 141 48
#> 50-59 yearsFemaleNot too stressed 50-59 yearsFemalestressed
#> 72 29
#> 50-59 yearsMaleNot too stressed 50-59 yearsMalestressed
#> 78 29
#> 60-64 yearsFemaleNot too stressed 60-64 yearsFemalestressed
#> 18 6
#> 60-64 yearsMaleNot too stressed 60-64 yearsMalestressed
#> 20 4
length(table(var.comb))
#> [1] 20
table(var.comb,analytic11n$OA)
#>
#> var.comb 0 1
#> 20-29 yearsFemaleNot too stressed 156 1
#> 20-29 yearsFemalestressed 27 0
#> 20-29 yearsMaleNot too stressed 144 3
#> 20-29 yearsMalestressed 18 0
#> 30-39 yearsFemaleNot too stressed 168 2
#> 30-39 yearsFemalestressed 49 1
#> 30-39 yearsMaleNot too stressed 178 5
#> 30-39 yearsMalestressed 37 3
#> 40-49 yearsFemaleNot too stressed 130 12
#> 40-49 yearsFemalestressed 42 3
#> 40-49 yearsMaleNot too stressed 138 3
#> 40-49 yearsMalestressed 48 0
#> 50-59 yearsFemaleNot too stressed 65 7
#> 50-59 yearsFemalestressed 22 7
#> 50-59 yearsMaleNot too stressed 67 11
#> 50-59 yearsMalestressed 23 6
#> 60-64 yearsFemaleNot too stressed 17 1
#> 60-64 yearsFemalestressed 5 1
#> 60-64 yearsMaleNot too stressed 19 1
#> 60-64 yearsMalestressed 4 0
match.formula <- as.formula("OA ~ age + sex + stress")
matching.obj <- matchit(match.formula,
data = analytic11n,
method = "exact")
matching.obj
#> A matchit object
#> - method: Exact matching
#> - number of obs.: 1424 (original), 1327 (matched)
#> - target estimand: ATT
#> - covariates: age, sex, stress
Matching by 4 matching variables
The process of matching by 4 variables involves creating combinations of the 4 variables, exploring their distributions, and performing exact matching.
var.comb <- do.call('paste0',
analytic11n[, c('age', 'sex',
'stress','income')])
#table(var.comb)
length(table(var.comb))
#> [1] 76
match.formula <- as.formula("OA ~ age + sex + stress + income")
matching.obj <- matchit(match.formula,
data = analytic11n,
method = "exact")
matching.obj
#> A matchit object
#> - method: Exact matching
#> - number of obs.: 1424 (original), 900 (matched)
#> - target estimand: ATT
#> - covariates: age, sex, stress, income
Matching by 5 matching variables
var.comb <- do.call('paste0',
analytic11n[, c('age', 'sex',
'stress','income','race')])
length(table(var.comb))
#> [1] 146
match.formula <- as.formula("OA ~ age + sex + stress + income + race")
matching.obj <- matchit(match.formula,
data = analytic11n,
method = "exact")
matching.obj
#> A matchit object
#> - method: Exact matching
#> - number of obs.: 1424 (original), 616 (matched)
#> - target estimand: ATT
#> - covariates: age, sex, stress, income, race
Matching by 6 matching variables
var.comb <- do.call('paste0',
analytic11n[, c('age', 'sex',
'stress','income','race','edu')])
length(table(var.comb))
#> [1] 354
match.formula <- as.formula("OA ~ age + sex + stress + income + race + edu")
matching.obj <- matchit(match.formula,
data = analytic11n,
method = "exact")
matching.obj
#> A matchit object
#> - method: Exact matching
#> - number of obs.: 1424 (original), 399 (matched)
#> - target estimand: ATT
#> - covariates: age, sex, stress, income, race, edu
OACVD.match.11n <- match.data(matching.obj)
var.names <- c("age", "sex", "stress", "income", "race", "edu")
tab1 <- CreateCatTable(var = var.names, strata= "OA", data=OACVD.match.11n,test=FALSE)
print(tab1, smd = TRUE)
#> Stratified by OA
#> 0 1 SMD
#> n 337 62
#> age (%) 0.565
#> 20-29 years 63 (18.7) 4 ( 6.5)
#> 30-39 years 61 (18.1) 11 (17.7)
#> 40-49 years 127 (37.7) 17 (27.4)
#> 50-59 years 82 (24.3) 28 (45.2)
#> 60-64 years 4 ( 1.2) 2 ( 3.2)
#> 65 years and over 0 ( 0.0) 0 ( 0.0)
#> teen 0 ( 0.0) 0 ( 0.0)
#> sex = Male (%) 211 (62.6) 29 (46.8) 0.322
#> stress = stressed (%) 42 (12.5) 17 (27.4) 0.381
#> income (%) 0.115
#> $29,999 or less 69 (20.5) 11 (17.7)
#> $30,000-$49,999 57 (16.9) 13 (21.0)
#> $50,000-$79,999 69 (20.5) 12 (19.4)
#> $80,000 or more 142 (42.1) 26 (41.9)
#> race = White (%) 242 (71.8) 43 (69.4) 0.054
#> edu (%) 0.146
#> < 2ndary 73 (21.7) 11 (17.7)
#> 2nd grad. 5 ( 1.5) 2 ( 3.2)
#> Other 2nd grad. 0 ( 0.0) 0 ( 0.0)
#> Post-2nd grad. 259 (76.9) 49 (79.0)
Treatment effect
Convert data to design
In the following code chunk, we create a new variable matched in the analytic.miss
dataset to indicate whether an observation was included in the matched dataset OACVD.match.11n
. We then create a new survey design object w.design.m
that includes only the matched observations for subsequent analyses.
analytic.miss$matched <- 0
length(analytic.miss$ID) # full data
#> [1] 397173
length(OACVD.match.11n$ID) # matched data
#> [1] 399
length(analytic.miss$ID[analytic.miss$ID %in% OACVD.match.11n$ID])
#> [1] 399
analytic.miss$matched[analytic.miss$ID %in% OACVD.match.11n$ID] <- 1
table(analytic.miss$matched)
#>
#> 0 1
#> 396774 399
w.design0 <- svydesign(id=~1, weights=~weight,
data=analytic.miss)
w.design.m <- subset(w.design0, matched == 1)
Outcome analysis
The subsequent code chunks involve fitting logistic regression models to estimate the treatment effect, both in a crude and adjusted manner, respectively. The models are fitted using the matched survey design object and the results are displayed in a readable format.
Crude
Adjusted
fit.outcome <- svyglm(I(CVD=="event") ~ OA +
age + sex + stress + income + race + edu,
design = w.design.m,
family = binomial(logit))
#> Warning in eval(family$initialize): non-integer #successes in a binomial glm!
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
publish(fit.outcome)
#> Variable Units OddsRatio CI.95 p-value
#> OA 2.04 [0.34;12.16] 0.43593
#> age 20-29 years Ref
#> 30-39 years 0.54 [0.08;3.51] 0.51962
#> 40-49 years 30148597.83 [7758796.64;117149345.85] < 1e-04
#> 50-59 years 63290825.92 [12589874.04;318170669.03] < 1e-04
#> 60-64 years 0.31 [0.01;9.33] 0.49735
#> sex Female Ref
#> Male 1.58 [0.29;8.62] 0.59729
#> stress Not too stressed Ref
#> stressed 0.15 [0.01;1.80] 0.13666
#> income $29,999 or less Ref
#> $30,000-$49,999 0.20 [0.01;3.84] 0.28640
#> $50,000-$79,999 0.20 [0.02;1.95] 0.16543
#> $80,000 or more 0.08 [0.01;0.68] 0.02122
#> race Non-white Ref
#> White 1.02 [0.11;9.45] 0.98723
#> edu < 2ndary Ref
#> 2nd grad. 845233463.16 [44642866.24;16002995941.97] < 1e-04
#> Post-2nd grad. 69867459.11 [9660580.05;505296971.72] < 1e-04
Questions for the students
- Look at all the ORs. Some of them are VERY high. Why?
- Look at the CI in the above table. Some of them are Inf. Why?
- Should we match matching variables in the regression?
Matching by a lot of variables
The code chunks involve performing matching using a large number of variables and estimating the treatment effect using the matched data. The process involves creating matched datasets, converting them to survey design objects, and fitting logistic regression models.
Matching part in data
match.formula <- as.formula("OA ~ age + sex + stress + married +
income + race + bmi + phyact + smoke +
immigrate + fruit + diab + edu")
matching.obj2 <- matchit(match.formula,
data = analytic11n,
method = "exact")
matching.obj2
#> A matchit object
#> - method: Exact matching
#> - number of obs.: 1424 (original), 22 (matched)
#> - target estimand: ATT
#> - covariates: age, sex, stress, married, income, race, bmi, phyact, smoke, immigrate, fruit, diab, edu
OACVD.match.11n2 <- match.data(matching.obj2)
var.names <- c("age", "sex", "stress", "married", "income", "race",
"bmi", "phyact", "smoke", "immigrate", "fruit", "diab", "edu")
tab2 <- CreateCatTable(var = var.names, strata= "OA", data=OACVD.match.11n2,test=FALSE)
print(tab2, smd = TRUE)
#> Stratified by OA
#> 0 1 SMD
#> n 11 11
#> age (%) <0.001
#> 20-29 years 3 (27.3) 3 (27.3)
#> 30-39 years 1 ( 9.1) 1 ( 9.1)
#> 40-49 years 4 (36.4) 4 (36.4)
#> 50-59 years 3 (27.3) 3 (27.3)
#> 60-64 years 0 ( 0.0) 0 ( 0.0)
#> 65 years and over 0 ( 0.0) 0 ( 0.0)
#> teen 0 ( 0.0) 0 ( 0.0)
#> sex = Male (%) 6 (54.5) 6 (54.5) <0.001
#> stress = stressed (%) 1 ( 9.1) 1 ( 9.1) <0.001
#> married = single (%) 3 (27.3) 3 (27.3) <0.001
#> income (%) <0.001
#> $29,999 or less 1 ( 9.1) 1 ( 9.1)
#> $30,000-$49,999 2 (18.2) 2 (18.2)
#> $50,000-$79,999 2 (18.2) 2 (18.2)
#> $80,000 or more 6 (54.5) 6 (54.5)
#> race = White (%) 10 (90.9) 10 (90.9) <0.001
#> bmi (%) <0.001
#> Underweight 0 ( 0.0) 0 ( 0.0)
#> healthy weight 4 (36.4) 4 (36.4)
#> Overweight 7 (63.6) 7 (63.6)
#> phyact (%) <0.001
#> Active 3 (27.3) 3 (27.3)
#> Inactive 5 (45.5) 5 (45.5)
#> Moderate 3 (27.3) 3 (27.3)
#> smoke (%) <0.001
#> Never smoker 3 (27.3) 3 (27.3)
#> Current smoker 2 (18.2) 2 (18.2)
#> Former smoker 6 (54.5) 6 (54.5)
#> immigrate (%) <0.001
#> not immigrant 10 (90.9) 10 (90.9)
#> > 10 years 1 ( 9.1) 1 ( 9.1)
#> recent 0 ( 0.0) 0 ( 0.0)
#> fruit (%) <0.001
#> 0-3 daily serving 3 (27.3) 3 (27.3)
#> 4-6 daily serving 6 (54.5) 6 (54.5)
#> 6+ daily serving 2 (18.2) 2 (18.2)
#> diab = Yes (%) 0 ( 0.0) 0 ( 0.0) <0.001
#> edu (%) <0.001
#> < 2ndary 1 ( 9.1) 1 ( 9.1)
#> 2nd grad. 0 ( 0.0) 0 ( 0.0)
#> Other 2nd grad. 0 ( 0.0) 0 ( 0.0)
#> Post-2nd grad. 10 (90.9) 10 (90.9)
Treatment effect estimation in design
Create design
analytic.miss$matched2 <- 0
length(analytic.miss$ID) # full data
#> [1] 397173
length(OACVD.match.11n2$ID) # matched data
#> [1] 22
length(analytic.miss$ID[analytic.miss$ID %in% OACVD.match.11n2$ID])
#> [1] 22
analytic.miss$matched2[analytic.miss$ID %in% OACVD.match.11n2$ID] <- 1
table(analytic.miss$matched2)
#>
#> 0 1
#> 397151 22
w.design0 <- svydesign(id=~1, weights=~weight,
data=analytic.miss)
w.design.m2 <- subset(w.design0, matched2 == 1)
outcome analysis
fit.outcome <- svyglm(I(CVD=="event") ~ OA + age + sex + stress + married +
income + race + bmi + phyact + smoke +
immigrate + fruit + diab + edu,
design = w.design.m2,
family = binomial(logit))
publish(fit.outcome)
# Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
# contrasts can be applied only to factors with 2 or more levels
Questions for the students
- Why the above model not fitting?
Save data for later use
Video content (optional)
For those who prefer a video walkthrough, feel free to watch the video below, which offers a description of an earlier version of the above content.