The mixpoissonreg package is useful for fitting and analysing regression models with overdispersed count responses. It provides two regression models for overdispersed count data:
Negative-Binomial regression models;
Poisson Inverse Gaussian regression models.
The parameters of the above models may be estimated by the Expectation-Maximization algorithm or by direct maximization of the likelihood function.
There are several functions to aid the user to check the adequacy of the fitted model, perform some inference and study influential observations.
For theoretical details we refer the reader to Barreto-Souza and Simas (2016)
Some reasons are:
glm
-like functions to fit and analyze fitted
models
Useful plots for fitted models available using R
’s
base graphics or ggplot2
Estimation of parameters via EM-algorithm
Easy to do inference on fitted models with help of
lmtest
package
Regression structure for the precision parameter
Several global and local influence measures implemented (along with corresponding plots)
Compatibility with the tidyverse
We will now delve into the reasons above in more detail.
glm
-like functions to fit and analyze fitted
modelsIf you know how to fit glm
models then you already know
how to fit mixpoissonreg
models. You use a
formula
expression for the regression with respect to mean
parameter:
If you do not provide further arguments, the mixpoissonreg
package will fit a Negative-Binomial model with a log
link function for the mean estimated via EM-algorithm. Just like you
would do with a glm
fit, you can just type fit
to get a simplified output of the fit, or get a summary, by typing
summary(fit)
:
fit
#>
#> Negative Binomial Regression - Expectation-Maximization Algorithm
#>
#> Call:
#> mixpoissonreg(formula = daysabs ~ gender + math + prog, data = Attendance)
#>
#> Coefficients modeling the mean (with log link):
#> (Intercept) gendermale math progAcademic progVocational
#> 2.707563981 -0.211066195 -0.006235448 -0.424575877 -1.252812675
#> Coefficients modeling the precision (with identity link):
#> (Intercept)
#> 1.047289
summary(fit)
#>
#> Negative Binomial Regression - Expectation-Maximization Algorithm
#>
#> Call:
#> mixpoissonreg(formula = daysabs ~ gender + math + prog, data = Attendance)
#>
#>
#> Pearson residuals:
#> RSS Min 1Q Median 3Q Max
#> 345.7857 -0.9662 -0.7356 -0.3351 0.3132 5.5512
#>
#> Coefficients modeling the mean (with link):
#> Estimate Std.error z-value Pr(>|z|)
#> (Intercept) 2.707564 0.203069 13.333 < 2e-16 ***
#> gendermale -0.211066 0.122373 -1.725 0.0846 .
#> math -0.006235 0.002491 -2.503 0.0123 *
#> progAcademic -0.424576 0.181772 -2.336 0.0195 *
#> progVocational -1.252813 0.201381 -6.221 4.94e-10 ***
#>
#> Coefficients modeling the precision (with link):
#> Estimate Std.error z-value Pr(>|z|)
#> (Intercept) 1.0473 0.1082 9.675 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Efron's pseudo R-squared: 0.1854615
#> Number of iterations of the EM algorithm = 1
If you want to fit a Poisson Inverse Gaussian regression model, just
set the model
argument to PIG
:
fit_pig <- mixpoissonreg(daysabs ~ gender + math + prog, model = "PIG",
data = Attendance)
summary(fit_pig)
#>
#> Poisson Inverse Gaussian Regression - Expectation-Maximization Algorithm
#>
#> Call:
#> mixpoissonreg(formula = daysabs ~ gender + math + prog, data = Attendance,
#> model = "PIG")
#>
#>
#> Pearson residuals:
#> RSS Min 1Q Median 3Q Max
#> 276.4232 -0.8225 -0.6386 -0.3219 0.3038 5.4969
#>
#> Coefficients modeling the mean (with link):
#> Estimate Std.error z-value Pr(>|z|)
#> (Intercept) 3.060166 0.227621 13.444 < 2e-16 ***
#> gendermale -0.244078 0.129011 -1.892 0.058502 .
#> math -0.007421 0.002641 -2.810 0.004951 **
#> progAcademic -0.719714 0.193040 -3.728 0.000193 ***
#> progVocational -1.591909 0.215375 -7.391 1.45e-13 ***
#>
#> Coefficients modeling the precision (with link):
#> Estimate Std.error z-value Pr(>|z|)
#> (Intercept) 0.7323 0.1121 6.533 6.46e-11 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Efron's pseudo R-squared: 0.1758126
#> Number of iterations of the EM algorithm = 20
There are also several standard methods implemented for
mixpoissonreg
fitted objects: plot
,
residuals
, vcov
, influence
,
cooks.distance
, hatvalues
,
predict
, etc.
A fitted mixpoissonreg
object is compatible with the
lrtest
, waldtest
, coeftest
and
coefci
methods from the lmtest
package, which,
for instance, allows one to easily perform ANOVA-like tests for
mixpoissonreg
models.
We refer the reader to the Analyzing overdispersed count data
with the mixpoissonreg package vignette for further details
on fitting mixpoissonreg
objects.
R
’s base
graphics or ggplot2
Once you have a fitted mixpoissonreg
object, say
fit
, you can easily produce useful plots by typing
plot(fit)
:
You can also have the ggplot2
version of the above plots
by typing autoplot(fit)
:
For further details on plots of mixpoissonreg
objects,
we refer the reader to the Building
and customizing base-R diagnostic plots with the mixpoissonreg
package and Building and
customizing ggplot2-based diagnostic plots with the mixpoissonreg
package vignettes.
The EM-algorithm also tries to find the maximum-likelihood estimate of the parameters. So, why is there advantage on using the EM-algorithm? The reason is purely numerical. Sometimes the log-likelihood function has several local maxima or almost flat, which makes the direct maximization of the likelihood function converge (and making it very difficult not to obtain early convergence) before reaching its maximum. In such situations the EM-algorithm may be able to obtain estimates closer to the maximum-likelihood estimate. However, it has a drawback that it may take much longer to converge on such situations.
We refer the reader to the Supplementary Material of Barreto-Souza and Simas (2016) and to Barreto-Souza and Simas (2017) for numerical studies.
lmtest
packageWe can easily perform ANOVA-like tests using the lmtest
package. Let us perform a likelihood-ratio test against the NULL model
that only contains the intercept:
lmtest::lrtest(fit)
#> Likelihood ratio test
#>
#> Model 1: daysabs ~ gender + math + prog | 1
#> Model 2: daysabs ~ 1 | 1
#> #Df LogLik Df Chisq Pr(>Chisq)
#> 1 6 -864.15
#> 2 2 -896.47 -4 64.638 3.068e-13 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Now let us perform the Wald test against the NULL model that only contains the intercept:
lmtest::waldtest(fit)
#> Wald test
#>
#> Model 1: daysabs ~ gender + math + prog | 1
#> Model 2: daysabs ~ 1 | 1
#> Res.Df Df Chisq Pr(>Chisq)
#> 1 308
#> 2 312 -4 74.257 2.861e-15 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let us now consider a reduced model:
We can perform a likelihood ratio test of full fit
model
against the NULL of the reduced model as:
lmtest::lrtest(fit, fit_red)
#> Likelihood ratio test
#>
#> Model 1: daysabs ~ gender + math + prog | 1
#> Model 2: daysabs ~ math | 1
#> #Df LogLik Df Chisq Pr(>Chisq)
#> 1 6 -864.15
#> 2 3 -888.15 -3 47.999 2.131e-10 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In the same fashion, we can perform a Wald test of full
fit
model against the NULL of the reduced model as:
lmtest::waldtest(fit, fit_red)
#> Wald test
#>
#> Model 1: daysabs ~ gender + math + prog | 1
#> Model 2: daysabs ~ math | 1
#> Res.Df Df Chisq Pr(>Chisq)
#> 1 308
#> 2 311 -3 52.537 2.301e-11 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can obtain confidence intervals for the estimated parameters by
using the coefci
method:
The mixpoissonreg
package allows the regression models
to also have a regression structure on the precision parameter. It is
very simple to fit such models, one simply write the
formula
as formula_1 | formula_2
, where
formula_1
corresponds to the regression structure with
respect to the mean parameter and contains the response variable and
formula_2
corresponds to the regression structure with
respect to the precision parameter and does not contain
the response variable.
Consider the following example:
fit_prec <- mixpoissonreg(daysabs ~ gender + math + prog | gender + math + prog,
data = Attendance)
summary(fit_prec)
#>
#> Negative Binomial Regression - Expectation-Maximization Algorithm
#>
#> Call:
#> mixpoissonreg(formula = daysabs ~ gender + math + prog | gender +
#> math + prog, data = Attendance)
#>
#>
#> Pearson residuals:
#> RSS Min 1Q Median 3Q Max
#> 322.0142 -1.1751 -0.6992 -0.3600 0.3014 4.7178
#>
#> Coefficients modeling the mean (with link):
#> Estimate Std.error z-value Pr(>|z|)
#> (Intercept) 2.746123 0.147412 18.629 < 2e-16 ***
#> gendermale -0.245113 0.117967 -2.078 0.03773 *
#> math -0.006617 0.002317 -2.856 0.00429 **
#> progAcademic -0.425983 0.132144 -3.224 0.00127 **
#> progVocational -1.269755 0.174444 -7.279 3.37e-13 ***
#>
#> Coefficients modeling the precision (with link):
#> Estimate Std.error z-value Pr(>|z|)
#> (Intercept) 1.414227 0.343243 4.120 3.79e-05 ***
#> gendermale -0.208397 0.203692 -1.023 0.306262
#> math -0.005123 0.004181 -1.225 0.220458
#> progAcademic -1.084418 0.305479 -3.550 0.000385 ***
#> progVocational -1.422051 0.343812 -4.136 3.53e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Efron's pseudo R-squared: 0.1860886
#> Number of iterations of the EM algorithm = 11
In the abscence of link.precision
argument, a
log link will be used as link function for the precision
parameter.
There are several global influence measures implemented. See
influence
, hatvalues
,
cooks.distance
and their arguments.
The easiest way to perform global influence analysis on a
mixpoissonreg
fitted object is to obtain the plots related
to global influence:
One can also get the ggplot2
version of the above
plots:
In a similar fashion, there are several local influence measures
implemented in the local_influence
method. In the same
spirit as above, the easiest way to perform local influence analysis on
a mixpoissonreg
fitted object is to obtain plots by using
local_influence_plot
method:
One can also obtain ggplot2
versions of the above plots
by using the local_influence_autoplot
method:
For further details on global and local influence analyses of
mixpoissonreg
objects, we refer the reader to the Global and local influence analysis
with the mixpoissonreg package vignette.
tidyverse
We provide the standard methods from the broom
package,
namely augment
, glance
and tidy
,
for mixpoissonreg
objects:
augment(fit_prec)
#> # A tibble: 314 × 12
#> daysabs gender math prog .fitted .fittedlwrconf .fitteduprconf .resid
#> <dbl> <fct> <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 4 male 63 Academic 5.25 4.14 6.66 -0.200
#> 2 4 male 27 Academic 6.66 5.40 8.22 -0.370
#> 3 2 female 20 Academic 8.92 7.21 11.0 -0.814
#> 4 3 female 16 Academic 9.15 7.34 11.4 -0.713
#> 5 3 female 2 Academic 10.0 7.76 13.0 -0.772
#> 6 13 female 71 Academic 6.36 5.02 8.07 0.956
#> 7 11 female 63 Academic 6.71 5.39 8.34 0.599
#> 8 7 male 3 Academic 7.81 6.04 10.1 -0.102
#> 9 10 male 51 Academic 5.68 4.58 7.05 0.660
#> 10 9 male 49 Vocational 2.48 1.86 3.30 1.86
#> # ℹ 304 more rows
#> # ℹ 4 more variables: .resfit <dbl>, .hat <dbl>, .cooksd <dbl>,
#> # .gencooksd <dbl>
glance(fit_prec)
#> # A tibble: 1 × 9
#> efron.pseudo.r2 df.null logLik AIC BIC df.residual nobs model.type
#> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int> <chr>
#> 1 0.186 312 -853. 1726. 1764. 304 314 NB
#> # ℹ 1 more variable: est.method <chr>
tidy(fit_prec)
#> # A tibble: 10 × 6
#> component term estimate std.error statistic p.value
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 mean (Intercept) 2.75 0.147 18.6 1.87e-77
#> 2 mean gendermale -0.245 0.118 -2.08 3.77e- 2
#> 3 mean math -0.00662 0.00232 -2.86 4.29e- 3
#> 4 mean progAcademic -0.426 0.132 -3.22 1.27e- 3
#> 5 mean progVocational -1.27 0.174 -7.28 3.37e-13
#> 6 precision (Intercept) 1.41 0.343 4.12 3.79e- 5
#> 7 precision gendermale -0.208 0.204 -1.02 3.06e- 1
#> 8 precision math -0.00512 0.00418 -1.23 2.20e- 1
#> 9 precision progAcademic -1.08 0.305 -3.55 3.85e- 4
#> 10 precision progVocational -1.42 0.344 -4.14 3.53e- 5
As seen above, the autoplot
method from
ggplot2
package is also implemented for
mixpoissonreg
objects.
Finally, we provide additional methods related to local influence
analysis: tidy_local_influence
,
local_influence_benchmarks
and
local_influence_autoplot
:
tidy_local_influence(fit_prec)
#> # A tibble: 314 × 5
#> case_weights hidden_variable mean_explanatory precision_explanatory
#> <dbl> <dbl> <dbl> <dbl>
#> 1 0.00146 0.000353 0.000533 0.00217
#> 2 0.000970 0.000268 0.000782 0.00129
#> 3 0.000712 0.000369 0.000535 0.00132
#> 4 0.000350 0.000328 0.000875 0.000130
#> 5 0.000680 0.000562 0.00101 0.000701
#> 6 0.00118 0.00387 0.00494 0.000740
#> 7 0.00125 0.00230 0.00396 0.00139
#> 8 0.00243 0.00127 0.00230 0.00185
#> 9 0.00107 0.00180 0.00266 0.00160
#> 10 0.00249 0.00325 0.000340 0.000788
#> # ℹ 304 more rows
#> # ℹ 1 more variable: simultaneous_explanatory <dbl>
local_influence_benchmarks(fit_prec)
#> # A tibble: 1 × 5
#> case_weights hidden_variable mean_explanatory precision_explanatory
#> <dbl> <dbl> <dbl> <dbl>
#> 1 0.0140 0.0206 0.0182 0.0195
#> # ℹ 1 more variable: simultaneous_explanatory <dbl>
local_influence_autoplot(fit_prec)
For further details on tidyverse
compatibility of
mixpoissonreg
objects and how to use it to produce nice
diagnostics and plots, we refer the reader to the mixpoissonreg in the
tidyverse vignette.
glm
-like
functions to fit and analyze fitted modelsR
’s base graphics
or ggplot2
lmtest
packagetidyverse