American Journal of Mathematics and Statistics
p-ISSN: 2162-948X e-ISSN: 2162-8475
2018; 8(6): 179-183
doi:10.5923/j.ajms.20180806.03

Obubu Maxwell1, Babalola A. Mayowa2, Ikediuwa U. Chinedu1, Amadi E. Peace3
1Department of Statistics, Nnamdi Azikiwe University, Awka, Nigeria
2Department of Statistics, University of Ilorin, Ilorin, Nigeria
3Department of Statistics, Abia State Polytechnic, Aba, Nigeria
Correspondence to: Obubu Maxwell, Department of Statistics, Nnamdi Azikiwe University, Awka, Nigeria.
Copyright © 2018 The Author(s). Published by Scientific & Academic Publishing.
This work is licensed under the Creative Commons Attribution International License (CC BY). 
http://creativecommons.org/licenses/by/4.0/

Count data models allow for regression-type analyses when the dependent variable of interest is a numerical count. They can be used to estimate the effect of a policy intervention either on the average rate or on the probability of no event, a single event, or multiple events. The most commonly used distribution for modeling count data is the Poisson distribution (Horim and Levy, 1981), which assumes equidispersion (the variance is equal to the mean). Since observed count data often exhibit over- or under-dispersion, the Poisson model is often not ideal. To deal with a wide range of dispersion levels, Negative Binomial Regression, Generalized Poisson Regression and, more recently, Conway-Maxwell-Poisson (COM-Poisson) Regression can be used as alternatives to Poisson Regression. We compared the Generalized Poisson Regression with the other three regression models and stated their advantages and usefulness. Data were analyzed using these four methods, and the results were compared using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC); the Generalized Poisson Regression had the smallest AIC and BIC values. The Generalized Poisson Regression Model was therefore considered the better model for analyzing road traffic crashes in the data set considered.
Keywords: Over-dispersion, Count Data, Negative Binomial Regression, Generalized Poisson Regression, Conway-Maxwell Poisson, Akaike Information Criterion, Equidispersion
Cite this paper: Obubu Maxwell, Babalola A. Mayowa, Ikediuwa U. Chinedu, Amadi E. Peace, Modelling Count Data; A Generalized Linear Model Framework, American Journal of Mathematics and Statistics, Vol. 8 No. 6, 2018, pp. 179-183. doi: 10.5923/j.ajms.20180806.03.
For a Poisson random variable $Y$ with mean $\mu > 0$,
$$\Pr(Y = y) = \frac{e^{-\mu}\mu^{y}}{y!}, \qquad y = 0, 1, 2, \ldots$$
The mean and variance of this distribution can be shown to be $E(Y) = \operatorname{var}(Y) = \mu$. Since the mean is equal to the variance, any factor that affects one will also affect the other. Thus, the usual assumption of homoscedasticity would not be appropriate for Poisson data.
Suppose that we have a sample of $n$ observations $y_1, y_2, \ldots, y_n$ which can be treated as realizations of independent Poisson random variables, with $Y_i \sim P(\mu_i)$, and suppose that we want to let the mean $\mu_i$ (and therefore the variance) depend on a vector of explanatory variables $\mathbf{x}_i$ [13-15]. We could entertain a simple linear model of the form
$$\mu_i = \mathbf{x}_i'\boldsymbol{\beta},$$
but this model has the disadvantage that the linear predictor on the right-hand side can assume any real value, whereas the Poisson mean on the left-hand side, which represents an expected count, has to be non-negative. A straightforward solution to this problem is to model the logarithm of the mean using a linear model instead. Thus, we take logs, calculating
$$\eta_i = \log(\mu_i),$$
and assume that the transformed mean follows a linear model
$$\eta_i = \mathbf{x}_i'\boldsymbol{\beta}.$$
Thus, we consider a generalized linear model with a log link. Combining these two steps into one, we can write the log-linear model as
$$\log(\mu_i) = \mathbf{x}_i'\boldsymbol{\beta}.$$
In this model the regression coefficient $\beta_j$ represents the expected change in the log of the mean per unit change in the predictor $x_j$. In other words, increasing $x_j$ by one unit is associated with an increase of $\beta_j$ in the log of the mean. Exponentiating the above equation, we obtain a multiplicative model for the mean itself:
$$\mu_i = \exp(\mathbf{x}_i'\boldsymbol{\beta}).$$
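To make the multiplicative reading concrete, the short Python sketch below evaluates $\mu_i = \exp(\mathbf{x}_i'\boldsymbol{\beta})$ for a hypothetical model with an intercept and one predictor (the coefficients 0.5 and 0.3 and the function name `log_linear_mean` are illustrative assumptions, not estimates from the data). Raising the predictor by one unit multiplies the mean by $\exp(\beta_j)$, and the mean stays positive even when the linear predictor is negative.

```python
import math

def log_linear_mean(x, beta):
    """mu = exp(x'beta): the log link keeps the mean positive."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))  # linear predictor = log(mu)
    return math.exp(eta)

beta = [0.5, 0.3]                              # hypothetical intercept and slope
mu_at_2 = log_linear_mean([1.0, 2.0], beta)
mu_at_3 = log_linear_mean([1.0, 3.0], beta)    # predictor raised by one unit
ratio = mu_at_3 / mu_at_2
print(ratio, math.exp(beta[1]))                # both equal exp(beta_1)

# A negative linear predictor still gives a positive expected count.
print(log_linear_mean([1.0, -10.0], beta))
```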
In this model, an exponentiated regression coefficient $\exp(\beta_j)$ represents a multiplicative effect of the $j$-th predictor on the mean. Increasing $x_j$ by one unit multiplies the mean by a factor $\exp(\beta_j)$. A further advantage of using the log link stems from the empirical observation that with count data the effects of predictors are often multiplicative rather than additive [16]. That is, one typically observes small effects for small counts and large effects for large counts. If the effect is in fact proportional to the count, working on the log scale leads to a much simpler model.
The likelihood function for the Poisson model is
$$L(\boldsymbol{\beta}) = \prod_{i=1}^{n} \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}, \qquad \log L(\boldsymbol{\beta}) = \sum_{i=1}^{n}\left\{ y_i\log(\mu_i) - \mu_i - \log(y_i!) \right\},$$
with $\mu_i = \exp(\mathbf{x}_i'\boldsymbol{\beta})$.
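A minimal sketch of maximum-likelihood estimation for this model, assuming NumPy is available and that the first column of the design matrix is the intercept. It maximizes the log-likelihood above by Newton-Raphson (which coincides with Fisher scoring here because the log link is canonical for the Poisson), using the score $X'(y-\mu)$ and the information $X'WX$ with $W = \mathrm{diag}(\mu_i)$. The function names `poisson_loglik` and `fit_poisson` and the simulated data are hypothetical; this is not the estimation routine used in the paper.

```python
import math
import numpy as np

def poisson_loglik(beta, X, y):
    """log L(beta) = sum_i { y_i*log(mu_i) - mu_i - log(y_i!) }, mu_i = exp(x_i' beta)."""
    eta = X @ beta                                   # linear predictor = log(mu_i)
    log_fact = sum(math.lgamma(float(v) + 1.0) for v in y)
    return float(np.sum(y * eta - np.exp(eta)) - log_fact)

def fit_poisson(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for Poisson regression; assumes column 0 of X is all ones."""
    beta = np.zeros(X.shape[1])
    beta[0] = math.log(y.mean())                     # start from the intercept-only fit
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)                       # gradient of log L
        info = X.T @ (X * mu[:, None])               # X' W X with W = diag(mu)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated (hypothetical) data: one covariate, true coefficients (0.5, 0.3).
rng = np.random.default_rng(42)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

beta_hat = fit_poisson(X, y)
print(beta_hat, poisson_loglik(beta_hat, X, y))
```

At convergence the score is numerically zero, so with an intercept in the model the fitted means sum to the observed total, $\sum_i \hat\mu_i = \sum_i y_i$.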
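For the COM-Poisson distribution, a random variable $X$ takes the value $j = 0, 1, 2, \ldots$ with probability
$$P(X = j) = \frac{\lambda^{j}}{(j!)^{\nu}\, Z(\lambda,\nu)},$$
where $Z(\lambda,\nu)$ is a normalizing constant defined by
$$Z(\lambda,\nu) = \sum_{i=0}^{\infty} \frac{\lambda^{i}}{(i!)^{\nu}}.$$
The domain of admissible parameters for which this defines a probability distribution is $\lambda, \nu > 0$, together with $0 < \lambda < 1$ when $\nu = 0$. The introduction of the second parameter $\nu$ allows for either sub- or super-linear growth of the ratio $P(X = j-1)/P(X = j) = j^{\nu}/\lambda$, and allows $X$ to have variance either less than or greater than its mean. Of course, the mean of $X$ is not, in general, $\lambda$. Clearly, in the case where $\nu = 1$, $X$ has the Poisson distribution $\mathrm{Po}(\lambda)$ and the normalizing constant is $Z(\lambda,1) = e^{\lambda}$. Other choices of $\nu$ also give rise to well-known distributions. For example, in the case where $\nu = 0$ and $0 < \lambda < 1$, $X$ has a geometric distribution, with $Z(\lambda,0) = (1-\lambda)^{-1}$. In the limit $\nu \to \infty$, $X$ converges in distribution to a Bernoulli random variable with mean $\lambda(1+\lambda)^{-1}$, and $\lim_{\nu\to\infty} Z(\lambda,\nu) = 1 + \lambda$. In general, of course, the normalizing constant $Z(\lambda,\nu)$ does not permit such a neat, closed-form expression. Asymptotic results are available, however. Gillispie and Green [17] prove that, for fixed $\nu$,
$$Z(\lambda,\nu) \sim \frac{\exp\{\nu\lambda^{1/\nu}\}}{\lambda^{(\nu-1)/(2\nu)}\,(2\pi)^{(\nu-1)/2}\sqrt{\nu}}\left(1 + O\left(\lambda^{-1/\nu}\right)\right)$$
as $\lambda \to \infty$, confirming a conjecture made by Shmueli et al. [18-19]. This asymptotic result may also be used to obtain asymptotic results for the probability generating function of $X$, since it may easily be seen that
$$E\left[s^{X}\right] = \frac{Z(\lambda s,\nu)}{Z(\lambda,\nu)}.$$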
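The COM-Poisson quantities above can be checked numerically. The sketch below is an illustration under stated assumptions: it approximates $Z(\lambda,\nu)$ by truncating the series at a fixed number of terms (assumed large enough for the parameter values used, which must lie in the admissible domain), works in log space to avoid overflow, and then checks the special cases $\nu = 1$, $\nu = 0$ and large $\nu$, the Gillispie-Green leading term, the probability generating function identity, and the direction of dispersion ($\nu > 1$ gives variance below the mean, $\nu < 1$ above it). All function names are hypothetical.

```python
import math

def log_z(lam, nu, n_terms=500):
    """log Z(lambda, nu) from a truncated series; (lam, nu) must be admissible."""
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(n_terms)]
    m = max(log_terms)                       # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

def com_poisson_pmf(lam, nu, n_terms=500):
    """P(X = j) = lambda^j / ((j!)^nu Z(lambda, nu)) for j = 0, ..., n_terms - 1."""
    lz = log_z(lam, nu, n_terms)
    return [math.exp(j * math.log(lam) - nu * math.lgamma(j + 1) - lz)
            for j in range(n_terms)]

def mean_var(lam, nu):
    p = com_poisson_pmf(lam, nu)
    mean = sum(j * pj for j, pj in enumerate(p))
    var = sum((j - mean) ** 2 * pj for j, pj in enumerate(p))
    return mean, var

def z_asymptotic(lam, nu):
    """Leading term of the Gillispie-Green expansion of Z(lambda, nu)."""
    return math.exp(nu * lam ** (1 / nu)) / (
        lam ** ((nu - 1) / (2 * nu)) * (2 * math.pi) ** ((nu - 1) / 2) * math.sqrt(nu))

print(math.exp(log_z(3.0, 1.0)), math.exp(3.0))    # nu = 1: Z = e^lambda
print(math.exp(log_z(0.5, 0.0)), 1 / (1 - 0.5))    # nu = 0: Z = 1/(1 - lambda)
print(math.exp(log_z(1.5, 50.0)), 1 + 1.5)         # large nu: Z -> 1 + lambda
print(math.exp(log_z(100.0, 2.0)) / z_asymptotic(100.0, 2.0))  # near 1 for large lambda

# Probability generating function: direct sum versus Z(lambda*s, nu)/Z(lambda, nu)
s = 0.7
p = com_poisson_pmf(3.0, 1.5)
pgf_direct = sum(s ** j * pj for j, pj in enumerate(p))
pgf_ratio = math.exp(log_z(3.0 * s, 1.5) - log_z(3.0, 1.5))
print(pgf_direct, pgf_ratio)

print(mean_var(5.0, 2.0), mean_var(2.0, 0.5))      # under- and over-dispersion
```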
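The generalized Poisson distribution can model over-dispersion, $\alpha > 0$, as well as under-dispersion, $\alpha < 0$. Suppose $y_i$ is a count response variable that follows a generalized Poisson distribution. The probability mass function of $y_i$ is given as (Famoye (1993), Wang and Famoye (1997)) [20]
$$f(y_i;\mu_i,\alpha) = \left(\frac{\mu_i}{1+\alpha\mu_i}\right)^{y_i}\frac{(1+\alpha y_i)^{y_i-1}}{y_i!}\exp\left[-\frac{\mu_i(1+\alpha y_i)}{1+\alpha\mu_i}\right], \qquad y_i = 0, 1, 2, \ldots$$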
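where $\mu_i = \mu_i(\mathbf{x}_i) = \exp(\mathbf{x}_i'\boldsymbol{\beta})$, $\mathbf{x}_i$ is a $k$-dimensional vector of covariates (with first element 1 for the intercept) including demographic factors, driving habits and medication use, and $\boldsymbol{\beta}$ is a $k$-dimensional vector of regression parameters. For details on the generalized Poisson regression model, the reader is referred to Famoye (1993) [21]. The mean and variance of $y_i$ are, respectively, given by
$$E(y_i \mid \mathbf{x}_i) = \mu_i$$
and
$$V(y_i \mid \mathbf{x}_i) = \mu_i(1+\alpha\mu_i)^2.$$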
The generalized Poisson regression model above is a generalization of the standard Poisson regression (PR) model. When $\alpha = 0$, the probability function reduces to the PR model, in which the conditional mean $E(y_i \mid \mathbf{x}_i)$ and the conditional variance $V(y_i \mid \mathbf{x}_i)$ of the dependent variable are constrained to be equal for each observation. In practical applications this assumption is questionable, since the variance can be either larger or smaller than the mean. If the variance is not equal to the mean, the estimates of the PR model are still consistent but inefficient, and inference based on the estimated standard errors becomes invalid.
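A small numerical check of the generalized Poisson probability function, written as a sketch that assumes $\alpha \ge 0$ (for $\alpha < 0$ the support has to be truncated, which is not handled here). It sums the probability function over a truncated range of counts (400 terms, assumed sufficient for the illustrative values $\mu = 3$, $\alpha = 0.2$), compares the resulting mean and variance with $\mu_i$ and $\mu_i(1+\alpha\mu_i)^2$, and confirms that $\alpha = 0$ gives the Poisson probability. The function names are hypothetical.

```python
import math

def gp_logpmf(y, mu, alpha):
    """log f(y; mu, alpha) for the generalized Poisson model, assuming alpha >= 0."""
    theta = mu / (1.0 + alpha * mu)
    return (y * math.log(theta)
            + (y - 1) * math.log1p(alpha * y)          # log of (1 + alpha*y)^(y-1)
            - math.lgamma(y + 1)                       # log(y!)
            - mu * (1.0 + alpha * y) / (1.0 + alpha * mu))

def gp_moments(mu, alpha, n_terms=400):
    """Total probability, mean and variance from the truncated pmf."""
    p = [math.exp(gp_logpmf(y, mu, alpha)) for y in range(n_terms)]
    total = sum(p)
    mean = sum(y * py for y, py in enumerate(p))
    var = sum((y - mean) ** 2 * py for y, py in enumerate(p))
    return total, mean, var

mu, alpha = 3.0, 0.2
total, mean, var = gp_moments(mu, alpha)
print(total, mean, var, mu * (1 + alpha * mu) ** 2)

# alpha = 0 recovers the Poisson probability e^{-mu} mu^y / y! (here y = 4)
poisson_log = -mu + 4 * math.log(mu) - math.lgamma(5)
print(gp_logpmf(4, mu, 0.0), poisson_log)
```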
The parameter $\mu$ is the mean incidence rate of $y$ per unit of exposure. Exposure may be time, space, distance, area, volume, or population size. Because exposure is often a period of time, we use the symbol $t_i$ to represent the exposure for a particular observation. When no exposure is given, it is assumed to be one. The parameter $\mu$ may be interpreted as the risk of a new occurrence of the event during a specified exposure period, $t$. The results below make use of the following relationship, derived from the definition of the gamma function:
$$\ln\left(\frac{\Gamma(y+\alpha^{-1})}{\Gamma(\alpha^{-1})}\right) = \sum_{j=0}^{y-1}\ln\left(j+\alpha^{-1}\right), \qquad y = 0, 1, 2, \ldots$$
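The identity follows from applying the recurrence $\Gamma(z+1) = z\,\Gamma(z)$ a total of $y$ times, so the log-likelihood needs only $y$ logarithms rather than gamma functions at large arguments. A quick numerical check in Python, using `math.lgamma` for the left-hand side and the finite sum for the right-hand side; the variable `a` stands for $\alpha^{-1}$, and the test values are arbitrary.

```python
import math

def log_gamma_ratio_direct(y, a):
    """ln(Gamma(y + a) / Gamma(a)) evaluated with lgamma."""
    return math.lgamma(y + a) - math.lgamma(a)

def log_gamma_ratio_sum(y, a):
    """The same quantity as sum_{j=0}^{y-1} ln(j + a), for integer y >= 0."""
    return sum(math.log(j + a) for j in range(y))

for a in (0.1, 0.5, 2.0, 7.3):          # a plays the role of 1/alpha
    for y in (0, 1, 5, 30):
        print(a, y, log_gamma_ratio_direct(y, a), log_gamma_ratio_sum(y, a))
```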
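In negative binomial regression, the mean of $y$ is determined by the exposure time $t$ and a set of $k$ regressor variables (the $x$'s). The expression relating these quantities is
$$\mu_i = \exp\left(\ln(t_i) + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki}\right).$$
Often, $x_{1i} \equiv 1$, in which case $\beta_1$ is called the intercept. The regression coefficients $\beta_1, \beta_2, \ldots, \beta_k$ are unknown parameters that are estimated from a set of data. Their estimates are symbolized as $b_1, b_2, \ldots, b_k$. Using this notation, the fundamental negative binomial regression model for an observation $i$ is written as
$$\Pr(Y = y_i \mid \mu_i, \alpha) = \frac{\Gamma(y_i+\alpha^{-1})}{\Gamma(\alpha^{-1})\,\Gamma(y_i+1)}\left(\frac{1}{1+\alpha\mu_i}\right)^{\alpha^{-1}}\left(\frac{\alpha\mu_i}{1+\alpha\mu_i}\right)^{y_i}.$$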
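A sketch of the negative binomial regression model above in Python, with hypothetical function names: `nb_mean` builds $\mu_i$ with $\ln(t_i)$ as an offset, and `nb_logpmf` evaluates the log of the probability function with `math.lgamma`. The checks use illustrative values ($\mu = 4$, $\alpha = 0.5$, truncation at 400 counts, assumed sufficient) and compare the numerical mean and variance with $\mu_i$ and the standard negative binomial variance $\mu_i + \alpha\mu_i^2$; the coefficients in the exposure example are made up.

```python
import math

def nb_mean(t, x, beta):
    """mu_i = exp(ln(t_i) + b_1 x_1i + ... + b_k x_ki); x[0] = 1 gives an intercept."""
    return math.exp(math.log(t) + sum(xj * bj for xj, bj in zip(x, beta)))

def nb_logpmf(y, mu, alpha):
    """log Pr(Y = y | mu, alpha) for the negative binomial regression model."""
    inv_a = 1.0 / alpha
    return (math.lgamma(y + inv_a) - math.lgamma(inv_a) - math.lgamma(y + 1)
            - inv_a * math.log1p(alpha * mu)                  # (1/(1+alpha*mu))^(1/alpha)
            + y * math.log(alpha * mu / (1.0 + alpha * mu)))  # (alpha*mu/(1+alpha*mu))^y

mu, alpha = 4.0, 0.5
p = [math.exp(nb_logpmf(y, mu, alpha)) for y in range(400)]
total = sum(p)
mean = sum(y * py for y, py in enumerate(p))
var = sum((y - mean) ** 2 * py for y, py in enumerate(p))
print(total, mean, var, mu + alpha * mu ** 2)

# Exposure enters as an offset: doubling t_i doubles mu_i.
beta = [0.2, -0.4]          # hypothetical coefficients
x = [1.0, 1.5]              # x_1i = 1 (intercept) and one covariate
print(nb_mean(2.0, x, beta) / nb_mean(1.0, x, beta))
```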
The variance inflation factor for explanatory variable $j$ is
$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2},$$
where $R_j^2$ is the coefficient of determination of a regression of explanatory variable $j$ on all the other explanatory variables. A VIF value of 10 or above indicates a multicollinearity problem. Table 1 shows that all the variables have VIF values below 10. Thus all the variables can be included in the subsequent analyses and modeling with the Poisson regression, Generalized Poisson regression, and Negative Binomial Regression.
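The VIF screen can be reproduced with ordinary least squares. The sketch below, assuming NumPy, regresses each column on the others plus an intercept and returns $1/(1-R_j^2)$; the function name `vif` and the simulated variables (including a deliberately near-collinear `x3`) are hypothetical and only illustrate how a VIF above 10 arises. It is not the computation behind Table 1.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2) for each column of X (explanatory variables only)."""
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + 0.05 * rng.normal(size=n)     # nearly collinear with x1
v_ok = vif(np.column_stack([x1, x2]))
v_bad = vif(np.column_stack([x1, x2, x3]))
print(v_ok.round(2), v_bad.round(1))
```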
The Akaike Information Criterion is
$$\mathrm{AIC} = -2\ell + 2p,$$
where $\ell$ denotes the fitted log likelihood and $p$ the number of parameters. A relatively small value of AIC is favorable for the fitted model.
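The Bayesian Information Criterion is
$$\mathrm{BIC} = -2\ln(\hat{L}) + k\ln(n),$$
where $\hat{L}$ is the maximized value of the likelihood function of the model, $n$ is the number of data points in the observed data (the number of observations, or equivalently, the sample size), and $k$ is the number of parameters estimated by the model. As with AIC, smaller values are preferred.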
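Both criteria are simple functions of the maximized log-likelihood. In the sketch below the log-likelihoods, parameter counts and sample size are hypothetical numbers chosen only to show that the two criteria can disagree: the BIC penalty $k\ln(n)$ exceeds the AIC penalty $2p$ whenever $n \ge 8$, so BIC leans toward the smaller model.

```python
import math

def aic(loglik, p):
    """AIC = -2*loglik + 2p; smaller is better."""
    return -2.0 * loglik + 2.0 * p

def bic(loglik, k, n):
    """BIC = -2*ln(L_hat) + k*ln(n); smaller is better."""
    return -2.0 * loglik + k * math.log(n)

# Hypothetical fitted models on n = 100 observations: (maximized log-likelihood, parameters)
models = {"A": (-210.0, 3), "B": (-207.0, 5)}
n = 100
for name, (ll, k) in models.items():
    print(name, aic(ll, k), round(bic(ll, k, n), 2))

best_aic = min(models, key=lambda m: aic(*models[m]))
best_bic = min(models, key=lambda m: bic(models[m][0], models[m][1], n))
print(best_aic, best_bic)   # AIC favors the larger model B, BIC the smaller model A
```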