GEE

In this section, we will learn about generalized estimating equation (GEE). GEE is another popular method for modeling longitudinal or clustered data.

Note

GEE is a population-averaged (e.g., marginal) model, whereas mixed effects models are subject/cluster-specific. For example, we can use GEE when we are interested in exploring the population average effect. On the other hand, when we are interested in individual/cluster-specific effects, we can use the mixed effects models.
Also, in contrast to the mixed effects models where we assume error term and beta coefficients for subject \(i\) both follow normal distribution, GEE is distribution free. However, in GEE, we assume some correlation structures for within subjects/clusters. Hence, we can use GEE for non-normal data such as skewed data, binary data, or count data.

# Load required packages
knitr::opts_chunk$set(echo = TRUE)
require(HSAUR2)
library(gee)
library(MuMIn)
library(geepack)
library("lme4")
require(afex)

Data Description

In addition to BtheB dataset in part 1, we used respiratory data from HSAUR2 package to demonstrate the analysis with non-normal responses:

The response variable in this dataset is status (the respiratory status), which is a binary response
Other covariates are: treatment, age, gender, the study center
The response has been measured at 0, 1, 2, 3, 4 mths for each subject

data("respiratory", package = "HSAUR2")
head(respiratory)

Methods for Non-normal Distributions

ref: (Hothorn and Everitt 2014) Section 13.2
In addition to normally distributed response variables, the response variable can also follow non-normal distributions, such as binary or count responses.
We have learned logistic regression or Poisson regression (glm) to analyze binary and count data but they are not considering the “repeated measurement” structure.
The consequence of ignoring the longitudinal/repeated measurements structure is to have the wrong estimated standard errors for regression coefficients. Hence, our inference will be invalid. However, the glm still gives consistent estimated beta coefficients.
There are many correlation structures in modelling GEE.
With non-normal responses, different assumptions about these correlation structures can lead to different interpretations of the beta coefficients. Here, we will introduce two different GEE models: marginal model and conditional model.

Marginal Models

ref: (Hothorn and Everitt 2014) Section 13.2.1

Repeated measurement and longitudinal data has responses taken at different time points. Therefore, we could simply review them as many series of cross-sectional data. Each cross-sectional data can be analyzed using glm introduced in previous lectures. Then a correlation structure can be assumed to connect different “cross-sections”. The common correlation structures are:

An independent structure

An independent structure which assumes the repeated measures are independent. An example of independent structure is basically is an identity matrix:

\[ \left(\begin{array}{ccc} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{array}\right) \]

An exchangeable correlation

An exchangeable correlation assumes that for the each individual, the correlation between each pair of repeated measurements is the same, i.e., \(Cor(Y_{ij},Y_{ik})=\rho\). In the correlation matrix, only \(\rho\) is unknown. Therefore, it is a single parameter working correlation matrix. An example of exchangeable correlation matrix is:

\[ \left(\begin{array}{ccc} 1 & \rho & \rho\\ \rho & 1 & \rho\\ \rho & \rho & 1 \end{array}\right) \]

An AR-1 autoregressive correlation

Defined as \(Cor(Y_{ij},Y_{ik})=\rho^{|k-j|}\). If two measurements are taken at two closer time points the correlation is higher than these taken at two farther apart time points. It is also a single parameter working correlation matrix. An example of AR-1 correlation matrix is:

\[ \left(\begin{array}{ccc} 1 & \rho & \rho^2\\ \rho & 1 & \rho\\ \rho^2 & \rho & 1 \end{array}\right) \]

An unstructured correlation

Different pairs of observation for each individual have different correlations \(Cor(Y_{ij},Y_{ik})=\rho_{jk}\). Assume that each individual has \(K\) pairs of measurements, it is a \(K(K-1)/2\) parameters working correlation matrix. An example of unstructured correlation matrix is:

\[ \left(\begin{array}{ccc} 1 & \rho_{12} & \rho_{13}\\ \rho_{12} & 1 & \rho_{23}\\ \rho_{13} & \rho_{23} & 1 \end{array}\right) \]

Sometimes, specifying a “right” correlation matrix is hard. However, the marginal model (usually we use GEE) gives us consistent estimated coefficients even with misspecified correlation structure.

Conditional Models

ref: (Hothorn and Everitt 2014) Section 13.2.2
In GEE marginal models, the estimated regression coefficients are marginal (or population-averaged) effects. Therefore, the interpretation are at population-level. It is almost impossible to make inference on any specific individual or cluster.
One solution is to do conditional models. The random effect approach in the part 1 can be extended to non-Gaussian response.

GEE models

After the short introduction of two models, let’s take a look at real examples

ref: (Hothorn and Everitt 2014) Section 13.3
ref: (Faraway 2016) Section 10.2

Binary response

We started with binary response using respiratory dataset:

library(gee)
data("respiratory", package = "HSAUR2")
## Data manipulation
resp <- subset(respiratory, month > "0")
resp$baseline <- rep(subset(respiratory, month == "0")$status, rep(4, 111))
## Change the response to 0 and 1
resp$nstat <- as.numeric(resp$status == "good")
resp$month <- resp$month[, drop = TRUE]

Now we will fit a regular glm, i.e., model without random effect or any correlation structures. For binary outcomes, the estimated coefficients are log odds.

## Regular GLM 
resp_glm <- glm(status ~ centre + treatment + gender + baseline+ age, 
                data = resp, family = "binomial")
summary(resp_glm)
#> 
#> Call:
#> glm(formula = status ~ centre + treatment + gender + baseline + 
#>     age, family = "binomial", data = resp)
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -2.3146  -0.8551   0.4336   0.8953   1.9246  
#> 
#> Coefficients:
#>                     Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)        -0.900171   0.337653  -2.666  0.00768 ** 
#> centre2             0.671601   0.239567   2.803  0.00506 ** 
#> treatmenttreatment  1.299216   0.236841   5.486 4.12e-08 ***
#> gendermale          0.119244   0.294671   0.405  0.68572    
#> baselinegood        1.882029   0.241290   7.800 6.20e-15 ***
#> age                -0.018166   0.008864  -2.049  0.04043 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 608.93  on 443  degrees of freedom
#> Residual deviance: 483.22  on 438  degrees of freedom
#> AIC: 495.22
#> 
#> Number of Fisher Scoring iterations: 4

Now we will fit GEE with independent correlation structure:

## GEE with identity matrix
resp_gee.in <- gee(nstat ~ centre + treatment + gender + baseline + age, 
                 data = resp, family = "binomial", 
                 id = subject,corstr = "independence", 
                 scale.fix = TRUE, scale.value = 1)
#> Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
#> running glm to get initial regression estimate
#>        (Intercept)            centre2 treatmenttreatment         gendermale 
#>        -0.90017133         0.67160098         1.29921589         0.11924365 
#>       baselinegood                age 
#>         1.88202860        -0.01816588
summary(resp_gee.in)
#> 
#>  GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
#>  gee S-function, version 4.13 modified 98/01/27 (1998) 
#> 
#> Model:
#>  Link:                      Logit 
#>  Variance to Mean Relation: Binomial 
#>  Correlation Structure:     Independent 
#> 
#> Call:
#> gee(formula = nstat ~ centre + treatment + gender + baseline + 
#>     age, id = subject, data = resp, family = "binomial", corstr = "independence", 
#>     scale.fix = TRUE, scale.value = 1)
#> 
#> Summary of Residuals:
#>         Min          1Q      Median          3Q         Max 
#> -0.93134415 -0.30623174  0.08973552  0.33018952  0.84307712 
#> 
#> 
#> Coefficients:
#>                       Estimate  Naive S.E.   Naive z Robust S.E.   Robust z
#> (Intercept)        -0.90017133 0.337653052 -2.665965  0.46032700 -1.9555041
#> centre2             0.67160098 0.239566599  2.803400  0.35681913  1.8821889
#> treatmenttreatment  1.29921589 0.236841017  5.485603  0.35077797  3.7038127
#> gendermale          0.11924365 0.294671045  0.404667  0.44320235  0.2690501
#> baselinegood        1.88202860 0.241290221  7.799854  0.35005152  5.3764332
#> age                -0.01816588 0.008864403 -2.049306  0.01300426 -1.3969169
#> 
#> Estimated Scale Parameter:  1
#> Number of Iterations:  1
#> 
#> Working Correlation
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    0    0    0
#> [2,]    0    1    0    0
#> [3,]    0    0    1    0
#> [4,]    0    0    0    1

This model assumes an independent correlation structure, the output will be equal to glm.
The outputs started from a summary of residuals
The estimated coefficients are the same as GLM. For binary outcome, you may still interpret them as log odds. Naive SE and z value are estimated directly from this model. Robust SE and z are sandwich estimates.
The difference between naive and robust indicates that the correlation structure may not be good.
Working Correlation is the correlation structure estimated from the data (identity matrix for independence).

Let fit the GEE model with exchangeable correlation structure:

## GEE with exchangeable matrix
resp_gee.ex <- gee(nstat ~ centre + treatment + gender + baseline+ age, 
                 data = resp, family = "binomial", 
                 id = subject, corstr = "exchangeable", 
                 scale.fix = TRUE, scale.value = 1)
#> Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
#> running glm to get initial regression estimate
#>        (Intercept)            centre2 treatmenttreatment         gendermale 
#>        -0.90017133         0.67160098         1.29921589         0.11924365 
#>       baselinegood                age 
#>         1.88202860        -0.01816588
summary(resp_gee.ex)
#> 
#>  GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
#>  gee S-function, version 4.13 modified 98/01/27 (1998) 
#> 
#> Model:
#>  Link:                      Logit 
#>  Variance to Mean Relation: Binomial 
#>  Correlation Structure:     Exchangeable 
#> 
#> Call:
#> gee(formula = nstat ~ centre + treatment + gender + baseline + 
#>     age, id = subject, data = resp, family = "binomial", corstr = "exchangeable", 
#>     scale.fix = TRUE, scale.value = 1)
#> 
#> Summary of Residuals:
#>         Min          1Q      Median          3Q         Max 
#> -0.93134415 -0.30623174  0.08973552  0.33018952  0.84307712 
#> 
#> 
#> Coefficients:
#>                       Estimate Naive S.E.    Naive z Robust S.E.   Robust z
#> (Intercept)        -0.90017133  0.4784634 -1.8813796  0.46032700 -1.9555041
#> centre2             0.67160098  0.3394723  1.9783676  0.35681913  1.8821889
#> treatmenttreatment  1.29921589  0.3356101  3.8712064  0.35077797  3.7038127
#> gendermale          0.11924365  0.4175568  0.2855747  0.44320235  0.2690501
#> baselinegood        1.88202860  0.3419147  5.5043802  0.35005152  5.3764332
#> age                -0.01816588  0.0125611 -1.4462014  0.01300426 -1.3969169
#> 
#> Estimated Scale Parameter:  1
#> Number of Iterations:  1
#> 
#> Working Correlation
#>           [,1]      [,2]      [,3]      [,4]
#> [1,] 1.0000000 0.3359883 0.3359883 0.3359883
#> [2,] 0.3359883 1.0000000 0.3359883 0.3359883
#> [3,] 0.3359883 0.3359883 1.0000000 0.3359883
#> [4,] 0.3359883 0.3359883 0.3359883 1.0000000

This model assumes an exchangeable correlation structure.
The outputs started from a summary of residuals
The estimated coefficients are the same GLM. For binary outcome, you may still interpret them as log odds. Naive S.E and z value are estimated directly from this model. Robust SE and z are sandwich estimates
The difference between naive and robust is smaller, which indicates that the correlation structure is better specified.
Working Correlation is the correlation structure estimated from the data

Let’s check the estimated coefficients from all three models.

summary(resp_glm)$coefficients
#>                       Estimate  Std. Error    z value     Pr(>|z|)
#> (Intercept)        -0.90017133 0.337652992 -2.6659658 7.676750e-03
#> centre2             0.67160098 0.239566555  2.8034004 5.056684e-03
#> treatmenttreatment  1.29921589 0.236840962  5.4856047 4.120574e-08
#> gendermale          0.11924365 0.294671000  0.4046671 6.857223e-01
#> baselinegood        1.88202860 0.241290163  7.7998563 6.197770e-15
#> age                -0.01816588 0.008864401 -2.0493065 4.043215e-02
summary(resp_gee.in)$coefficients
#>                       Estimate  Naive S.E.   Naive z Robust S.E.   Robust z
#> (Intercept)        -0.90017133 0.337653052 -2.665965  0.46032700 -1.9555041
#> centre2             0.67160098 0.239566599  2.803400  0.35681913  1.8821889
#> treatmenttreatment  1.29921589 0.236841017  5.485603  0.35077797  3.7038127
#> gendermale          0.11924365 0.294671045  0.404667  0.44320235  0.2690501
#> baselinegood        1.88202860 0.241290221  7.799854  0.35005152  5.3764332
#> age                -0.01816588 0.008864403 -2.049306  0.01300426 -1.3969169
summary(resp_gee.ex)$coefficients
#>                       Estimate Naive S.E.    Naive z Robust S.E.   Robust z
#> (Intercept)        -0.90017133  0.4784634 -1.8813796  0.46032700 -1.9555041
#> centre2             0.67160098  0.3394723  1.9783676  0.35681913  1.8821889
#> treatmenttreatment  1.29921589  0.3356101  3.8712064  0.35077797  3.7038127
#> gendermale          0.11924365  0.4175568  0.2855747  0.44320235  0.2690501
#> baselinegood        1.88202860  0.3419147  5.5043802  0.35005152  5.3764332
#> age                -0.01816588  0.0125611 -1.4462014  0.01300426 -1.3969169
# Same estimated coefficients but different SEs

GEE with identity matrix is the same as GLM model. If we change the correlation structure to exchangeable does not change the beta estimates, but the naive SEs are closer to Robust SE, which indicates that the exchangeable correlation structure is a better reflection of the correlation structures.

Gaussian response

GEE can also be applied to Gaussian response

library(gee)
library(HSAUR2)
BtheB$subject <- factor(rownames(BtheB))
nobs <- nrow(BtheB)
BtheB_long <- reshape(BtheB, idvar = "subject", 
                      varying = c("bdi.2m", "bdi.3m", "bdi.5m", "bdi.8m"), 
                      direction = "long")
BtheB_long$time <- rep(c(2, 3, 5, 8), rep(nobs, 4))
osub <- order(as.integer(BtheB_long$subject))
BtheB_long <- BtheB_long[osub,]
btb_gee.ind <- gee(bdi ~ bdi.pre + treatment + length + drug, 
               data = BtheB_long, id = subject, 
               family = gaussian, corstr = "independence")
#> Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
#> running glm to get initial regression estimate
#>    (Intercept)        bdi.pre treatmentBtheB      length>6m        drugYes 
#>      3.5686314      0.5818494     -3.2372285      1.4577182     -3.7412982
summary(btb_gee.ind)
#> 
#>  GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
#>  gee S-function, version 4.13 modified 98/01/27 (1998) 
#> 
#> Model:
#>  Link:                      Identity 
#>  Variance to Mean Relation: Gaussian 
#>  Correlation Structure:     Independent 
#> 
#> Call:
#> gee(formula = bdi ~ bdi.pre + treatment + length + drug, id = subject, 
#>     data = BtheB_long, family = gaussian, corstr = "independence")
#> 
#> Summary of Residuals:
#>         Min          1Q      Median          3Q         Max 
#> -21.6497810  -5.8485100   0.1131663   5.5838383  28.1871039 
#> 
#> 
#> Coefficients:
#>                  Estimate Naive S.E.   Naive z Robust S.E.   Robust z
#> (Intercept)     3.5686314  1.4833349  2.405816  2.26947617  1.5724472
#> bdi.pre         0.5818494  0.0563904 10.318235  0.09156455  6.3545274
#> treatmentBtheB -3.2372285  1.1295569 -2.865928  1.77459534 -1.8242066
#> length>6m       1.4577182  1.1380277  1.280916  1.48255866  0.9832449
#> drugYes        -3.7412982  1.1766321 -3.179667  1.78271179 -2.0986557
#> 
#> Estimated Scale Parameter:  79.25813
#> Number of Iterations:  1
#> 
#> Working Correlation
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    0    0    0
#> [2,]    0    1    0    0
#> [3,]    0    0    1    0
#> [4,]    0    0    0    1
# require(Publish)
# publish(btb_gee.ind)

This model assumes an independent correlation structure.
The outputs started from a summary of residuals
The estimated coefficient will be interpreted the same way as linear model. Naive S.E and z value are estimated directly from this model. Robust SE and z are sandwich estimates
Working Correlation is the correlation struture estimated from the data (identity matrix for indepedence)

With exchangeable correlation matrix:

btb_gee.ex <- gee(bdi ~ bdi.pre + treatment + length + drug,
                data = BtheB_long, id = subject, 
                family = gaussian, corstr = "exchangeable")
#> Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
#> running glm to get initial regression estimate
#>    (Intercept)        bdi.pre treatmentBtheB      length>6m        drugYes 
#>      3.5686314      0.5818494     -3.2372285      1.4577182     -3.7412982
summary(btb_gee.ex)
#> 
#>  GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
#>  gee S-function, version 4.13 modified 98/01/27 (1998) 
#> 
#> Model:
#>  Link:                      Identity 
#>  Variance to Mean Relation: Gaussian 
#>  Correlation Structure:     Exchangeable 
#> 
#> Call:
#> gee(formula = bdi ~ bdi.pre + treatment + length + drug, id = subject, 
#>     data = BtheB_long, family = gaussian, corstr = "exchangeable")
#> 
#> Summary of Residuals:
#>        Min         1Q     Median         3Q        Max 
#> -23.955980  -6.643864  -1.109741   4.257688  25.452310 
#> 
#> 
#> Coefficients:
#>                  Estimate Naive S.E.     Naive z Robust S.E.   Robust z
#> (Intercept)     3.0231602 2.30390185  1.31219140  2.23204410  1.3544357
#> bdi.pre         0.6479276 0.08228567  7.87412417  0.08351405  7.7583066
#> treatmentBtheB -2.1692863 1.76642861 -1.22806339  1.73614385 -1.2494854
#> length>6m      -0.1112910 1.73091679 -0.06429596  1.55092705 -0.0717577
#> drugYes        -2.9995608 1.82569913 -1.64296559  1.73155411 -1.7322940
#> 
#> Estimated Scale Parameter:  81.7349
#> Number of Iterations:  5
#> 
#> Working Correlation
#>           [,1]      [,2]      [,3]      [,4]
#> [1,] 1.0000000 0.6757951 0.6757951 0.6757951
#> [2,] 0.6757951 1.0000000 0.6757951 0.6757951
#> [3,] 0.6757951 0.6757951 1.0000000 0.6757951
#> [4,] 0.6757951 0.6757951 0.6757951 1.0000000
#publish(btb_gee.ex)

The interpretation of estimated coefficients are similar to LM. Naive S.E and z value are estimated directly from this model. Robust SE and z are sandwich estimates
Working Correlation is the correlation structure estimated from the data
When we change the structure of correlation, the estimates and naive SE and z changed. A closer naive SE to robust SE indicates that the correlation structure is better specified.

Compare with Random Effects

ref: (Hothorn and Everitt 2014) Section 13.4 We then use the conditional models (i.e., adding random effect) with non-normal response

Let us compare the GEE models with the mixed effect models.

## Generalized mixed effect model model
library("lme4")
resp_lmer <- glmer(nstat ~ baseline + month + treatment + 
                   gender + age + centre + 
                    (1 | subject), family = binomial(), 
                  data = resp)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> Model failed to converge with max|grad| = 0.197453 (tol = 0.002, component 1)

require(afex)
resp_afex <- mixed(nstat ~ baseline + month + treatment + 
                   gender + age + centre + 
                    (1 | subject), family = binomial(), 
                  data = resp, method = "LRT")
#> Contrasts set to contr.sum for the following variables: baseline, month, treatment, gender, centre, subject
#> Numerical variables NOT centered on 0: age
#> If in interactions, interpretation of lower order (e.g., main) effects difficult.
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> Model failed to converge with max|grad| = 0.00495318 (tol = 0.002, component 1)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> Model failed to converge with max|grad| = 0.0047889 (tol = 0.002, component 1)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> Model failed to converge with max|grad| = 0.0466122 (tol = 0.002, component 1)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> Model failed to converge with max|grad| = 0.0949357 (tol = 0.002, component 1)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> Model failed to converge with max|grad| = 0.041661 (tol = 0.002, component 1)

## GEE model
resp_gee3 <- gee(nstat ~ baseline + month + treatment + 
                   gender + age + centre, 
                 data = resp, family = "binomial", 
                 id = subject, corstr = "exchangeable", 
                 scale.fix = TRUE, scale.value = 1)
#> Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
#> running glm to get initial regression estimate
#>        (Intercept)       baselinegood            month.L            month.Q 
#>        -0.90363465         1.88945124        -0.14372490        -0.02455122 
#>            month.C treatmenttreatment         gendermale                age 
#>        -0.23255161         1.30410916         0.11969528        -0.01823703 
#>            centre2 
#>         0.67417628
library(MuMIn)
library(geepack)
resp_gee4 <- geeglm(nstat ~ baseline + month + treatment + 
                   gender + age + centre, 
                 data = resp, family = "binomial", 
                 id = subject, corstr = "exchangeable", 
                 scale.fix = TRUE)
resp_gee5 <- geeglm(nstat ~ baseline + month + treatment + 
                   gender + age + centre, 
                 data = resp, family = "binomial", 
                 id = subject, corstr = "independence", 
                 scale.fix = TRUE)
resp_gee6 <- geeglm(nstat ~ baseline + month + treatment + 
                   gender + age + centre, 
                 data = resp, family = "binomial", 
                 id = subject, corstr = "ar1", 
                 scale.fix = TRUE)
QIC(resp_gee4)
#>        QIC       QICu  Quasi Lik        CIC     params       QICC 
#>  510.94098  499.71690 -240.85845   14.61204    9.00000  513.14098
QIC(resp_gee5)
#>        QIC       QICu  Quasi Lik        CIC     params       QICC 
#>  511.14981  499.69791 -240.84895   14.72595    9.00000  512.93199
QIC(resp_gee6)
#>        QIC       QICu  Quasi Lik        CIC     params       QICC 
#>  511.81242  500.27657 -241.13829   14.76792    9.00000  514.01242
# Smaller QIC values for correlation structure represents better models
# i.e., "exchangeable" in our case

resp_gee4a <- geeglm(nstat ~ month + treatment + gender + age + centre, 
                 data = resp, family = "binomial", 
                 id = subject, corstr = "exchangeable", 
                 scale.fix = TRUE)

resp_gee4b <- geeglm(nstat ~ treatment + gender + age + centre, 
                 data = resp, family = "binomial", 
                 id = subject, corstr = "exchangeable", 
                 scale.fix = TRUE)

resp_gee4c <- geeglm(nstat ~ gender + age + centre, 
                 data = resp, family = "binomial", 
                 id = subject, corstr = "exchangeable", 
                 scale.fix = TRUE)

resp_gee4d <- geeglm(nstat ~ age + centre, 
                 data = resp, family = "binomial", 
                 id = subject, corstr = "exchangeable", 
                 scale.fix = TRUE)

resp_gee4e <- geeglm(nstat ~ centre, 
                 data = resp, family = "binomial", 
                 id = subject, corstr = "exchangeable", 
                 scale.fix = TRUE)
QIC(resp_gee4)
#>        QIC       QICu  Quasi Lik        CIC     params       QICC 
#>  510.94098  499.71690 -240.85845   14.61204    9.00000  513.14098
QIC(resp_gee4a)[2]
#>   QICu 
#> 567.35
QIC(resp_gee4b)[2]
#>     QICu 
#> 562.6247
QIC(resp_gee4c)[2]
#>     QICu 
#> 585.3099
QIC(resp_gee4d)[2]
#>     QICu 
#> 586.1581
QIC(resp_gee4e)[2]
#>     QICu 
#> 592.3437
QIC(resp_gee4d)[2]
#>     QICu 
#> 586.1581
# Covariates are selected based on the QICu criteria

## compare estimates (conditional vs. marginal)
summary(resp_lmer)$coefficients # Model failed to converge
#>                       Estimate Std. Error    z value     Pr(>|z|)
#> (Intercept)        -1.65464822 0.77621551 -2.1316866 3.303262e-02
#> baselinegood        3.08898642 0.59858787  5.1604561 2.463489e-07
#> month.L            -0.20348428 0.27957512 -0.7278340 4.667152e-01
#> month.Q            -0.02821472 0.27907476 -0.1011009 9.194703e-01
#> month.C            -0.35569321 0.28084908 -1.2664923 2.053369e-01
#> treatmenttreatment  2.16621998 0.55157850  3.9273104 8.590108e-05
#> gendermale          0.23836172 0.66606490  0.3578656 7.204439e-01
#> age                -0.02557199 0.01993989 -1.2824544 1.996833e-01
#> centre2             1.03849559 0.54182606  1.9166586 5.528132e-02
summary(resp_afex)$coefficients # Model failed to converge
#>                Estimate Std. Error    z value     Pr(>|z|)
#> (Intercept)  1.61366355 0.79597814  2.0272712 4.263469e-02
#> baseline1   -1.55321951 0.30467425 -5.0979678 3.433192e-07
#> month1       0.21660822 0.24441639  0.8862263 3.754956e-01
#> month2      -0.17701748 0.24250556 -0.7299523 4.654194e-01
#> month3       0.21559489 0.24440392  0.8821253 3.777090e-01
#> treatment1  -1.09162957 0.28098692 -3.8849836 1.023368e-04
#> gender1     -0.10313396 0.33932808 -0.3039358 7.611768e-01
#> age         -0.02576276 0.02031564 -1.2681244 2.047535e-01
#> centre1     -0.52829986 0.27639119 -1.9114207 5.595053e-02
summary(resp_gee3)$coefficients # Marginalized model
#>                       Estimate Naive S.E.    Naive z Robust S.E.   Robust z
#> (Intercept)        -0.90565507 0.47940039 -1.8891413  0.46042640 -1.9669920
#> baselinegood        1.86665412 0.34220231  5.4548261  0.35023043  5.3297885
#> month.L            -0.14330949 0.18044542 -0.7941986  0.18210027 -0.7869812
#> month.Q            -0.02448267 0.18034926 -0.1357514  0.18329974 -0.1335663
#> month.C            -0.23187407 0.18073662 -1.2829391  0.15929035 -1.4556693
#> treatmenttreatment  1.28311415 0.33592099  3.8196903  0.35055323  3.6602548
#> gendermale          0.11513183 0.41851611  0.2750953  0.44135628  0.2608592
#> age                -0.01785399 0.01258158 -1.4190585  0.01299686 -1.3737157
#> centre2             0.68481139 0.34010911  2.0135050  0.35681635  1.9192265
summary(resp_gee4)$coefficients # Marginalized model

The significance of variables are similar in both variables, but the estimated coefficients are larger in generalized mixed effect model. However, it does not mean the estimated coefficients are inconsistent. Instead, two models are estimating different parameters. Remember, the mixed effect model is conditional on random effects, while the GEE is a marginalized model.

Video content (optional)

Tip

For those who prefer a video walkthrough, feel free to watch the video below, which offers a description of an earlier version of the above content.

Reference

Faraway, Julian J. 2016. Extending the linear model with R: generalized linear, mixed effects and nonparametric regression models. CRC press.

Hothorn, Torsten, and Brian S Everitt. 2014. A Handbook of Statistical Analyses Using r. CRC press.