In this tutorial, we will see how to use Poisson and negative binomial regression. In practice, we use Poisson regression to model a count outcome. Note that we assume the mean is equal to the variance in Poisson. When the variance is greater than what’s assumed by the model, overdispersion occurs. Poisson regression of overdispersed data leads to under-estimated or deflated standard errors, which leads to inflated test statistics and p-values. We can use negative binomial regression to model overdispersed data.
# Design-adjusted Poisson - adjusted for covariatesfit2<-svyglm(fruit.cont ~phyact2 + age + sex + income + race + bmicat + smoke + edu, design=w.design, family=poisson)summ(fit2, confint =TRUE, digits =3, vifs =TRUE)
Observations
21623
Dependent variable
fruit.cont
Type
Survey-weighted generalized linear model
Family
poisson
Link
log
Pseudo-R² (Cragg-Uhler)
0.137
Pseudo-R² (McFadden)
0.076
AIC
97795.011
Est.
2.5%
97.5%
t val.
p
VIF
(Intercept)
1.175
1.113
1.237
37.230
0.000
NA
phyact2Active
0.241
0.210
0.272
15.211
0.000
1.346
phyact2Moderate
0.105
0.074
0.135
6.775
0.000
1.346
age40-49 years
0.041
0.010
0.071
2.623
0.009
1.447
age50-59 years
0.102
0.071
0.133
6.402
0.000
1.447
age60-64 years
0.149
0.096
0.203
5.461
0.000
1.447
sexMale
-0.136
-0.161
-0.111
-10.500
0.000
1.148
income$30,000-$49,999
0.025
-0.015
0.066
1.217
0.224
1.294
income$50,000-$79,999
0.031
-0.008
0.070
1.559
0.119
1.294
income$80,000 or more
0.057
0.018
0.096
2.874
0.004
1.294
raceWhite
0.023
-0.012
0.059
1.289
0.197
1.193
bmicatOverweight
-0.009
-0.035
0.017
-0.708
0.479
1.331
bmicatUnderweight
0.029
-0.033
0.091
0.928
0.353
1.331
smokeFormer smoker
0.094
0.059
0.128
5.352
0.000
1.474
smokeNever smoker
0.140
0.102
0.177
7.290
0.000
1.474
edu2nd grad.
0.026
-0.020
0.072
1.093
0.274
1.302
eduOther 2nd grad.
0.033
-0.024
0.090
1.136
0.256
1.302
eduPost-2nd grad.
0.129
0.085
0.173
5.807
0.000
1.302
Standard errors: Robust
Negative binomial regression
Let’s fit the negative binomial regression model using the glm function. Below, we specify the dispersion parameter (theta) is equal to 1, which suggests that the ratio of mean and variance is assumed to be 1.
require(MASS)analytic2$phyact2=relevel(analytic2$phyact, ref ="Inactive")# Negative binomial regression - crudefit3<-glm(fruit.cont ~ phyact2, data=analytic2, family =negative.binomial(theta =1))round(exp(cbind(coef(fit3), confint(fit3))),2)#> Waiting for profiling to be done...#> 2.5 % 97.5 %#> (Intercept) 3.81 3.76 3.85#> phyact2Active 1.32 1.29 1.34#> phyact2Moderate 1.15 1.12 1.17# Negative binomial regression - adjusted for covariatesfit4<-glm(fruit.cont ~phyact2 + age + sex + income + race + bmicat + smoke + edu, data=analytic2,family =negative.binomial(theta =1))round(exp(cbind(coef(fit4), confint(fit4))),2)#> Waiting for profiling to be done...#> 2.5 % 97.5 %#> (Intercept) 3.17 3.05 3.30#> phyact2Active 1.29 1.26 1.31#> phyact2Moderate 1.12 1.10 1.14#> age40-49 years 1.03 1.01 1.05#> age50-59 years 1.08 1.06 1.11#> age60-64 years 1.14 1.10 1.17#> sexMale 0.84 0.83 0.86#> income$30,000-$49,999 1.05 1.02 1.07#> income$50,000-$79,999 1.06 1.03 1.08#> income$80,000 or more 1.09 1.07 1.12#> raceWhite 0.99 0.97 1.02#> bmicatOverweight 0.98 0.96 1.00#> bmicatUnderweight 0.99 0.95 1.04#> smokeFormer smoker 1.14 1.12 1.16#> smokeNever smoker 1.20 1.17 1.23#> edu2nd grad. 1.04 1.01 1.07#> eduOther 2nd grad. 1.05 1.01 1.09#> eduPost-2nd grad. 1.13 1.10 1.16
Survey weighted negative binomial regression
Now, let’s fit the design-adjusted negative binomial regression model: