International Journal of Statistics and Applications

p-ISSN: 2168-5193    e-ISSN: 2168-5215

2016;  6(3): 123-137

doi:10.5923/j.statistics.20160603.05

 

Statistical Models for Count Data with Applications to Road Accidents in Ghana

A. Y. Omari-Sasu, Adjei Mensah Isaac, R. K. Boadi

Department of Mathematics, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana

Correspondence to: Adjei Mensah Isaac, Department of Mathematics, Kwame Nkrumah University of Science and Technology, Kumasi, Ghana.


Copyright © 2016 Scientific & Academic Publishing. All Rights Reserved.

This work is licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0/

Abstract

Road accidents in Ghana seem to be on the ascendancy, and the root causes have been attributed to issues such as human error and superstition. Since the occurrences of these accidents are discrete, they are often modelled using count regression models. The purpose of this study is therefore to determine an appropriate count regression model that adequately fits road accident data in Ghana, and to use that model to determine the key predictors of the expected number of persons killed in road accidents. Several models were compared for fitting count data encountered in the field of transportation: the Poisson, Negative Binomial (NB) and Conway-Maxwell-Poisson (CMP) models. To compare the performance of these models, model selection methods such as the Deviance goodness of fit test, Akaike’s Information Criterion (AIC) and the Bayesian Information Criterion (BIC) were employed. Because the values of the Deviance goodness of fit test, AIC and BIC for the NB model were the smallest, the NB model appeared to perform better than the Poisson and CMP models. Based on the selected NB model, the key predictors that contributed significantly and also had a high effect on the expected or mean number of persons killed in road accidents within a particular period were Head-on collision as Collision type, Improper overtaking and Loss of control as Driver errors, Bus/Minibus as Type of vehicle, Fog/Mist as Weather condition and Night with street lights off as Light condition.

Keywords: AIC, BIC, Goodness of fit, Poisson model, Negative binomial model, CMP model, Count Data

Cite this paper: A. Y. Omari-Sasu, Adjei Mensah Isaac, R. K. Boadi, Statistical Models for Count Data with Applications to Road Accidents in Ghana, International Journal of Statistics and Applications, Vol. 6 No. 3, 2016, pp. 123-137. doi: 10.5923/j.statistics.20160603.05.

1. Introduction

A road accident is defined as any event that disrupts the normal trajectory of a moving vehicle in a manner that causes instability in its free flow. Road accidents in Ghana have been a serious concern to many Ghanaians in recent times. Because of the tremendous effect of road accidents on human lives, property and the environment as a whole, numerous researchers have reported on the causes and effects of road accidents and made recommendations for reducing them. These causes include drunk driving, over-speeding and mechanical failure (Sagberg, Fosser and Saetermo (1997), Adams (1982) and National Road Safety Commission (2009)). Despite these efforts, every year the Ghana Statistical Service, the Road Safety Commission and other organizations report an increase in road accidents in the country (Annual Report, National Road Safety Commission, 2009).
Researchers in recent times have been modelling road accidents with crash prevention models in various parts of the world. However, it is difficult simply to apply models that have worked in one area to data obtained from other countries, because the relevant factors vary from country to country (Fletcher et al., 2006). There has not been much statistical research on accidents on urban roads in Ghana, possibly because of the inadequate information available on such accidents and their impact on human lives and the environment in the country.
Salifu (2004) developed a forecasting model for traffic crashes at urban junctions with no road signs, Afukaar and Debrah (2007) modelled traffic crashes at signalized urban junctions in Ghana, and Akaah (2011) modelled traffic crashes on rural highways in the Ashanti region. Road accidents have always been attributed to human errors such as a high blood alcohol content in some drivers, over-speeding and wrongful overtaking, among others. They have also been linked to poor road networks, poor road surfacing, witchcraft and the poor condition of some vehicles that ply the roads. It is surprising, however, that in spite of the numerous factors identified by researchers as causes of road accidents in Ghana, and their consequences for human lives and property, nobody has modelled the causes of accidents on urban roads and their contribution to fatalities in Ghana in order to quantify the contribution of each factor. In light of this, this research is conducted to determine the appropriate count regression model for accident data with respect to the expected number of persons killed in road accidents on urban roads in Ghana, and to use that model to investigate the key predictors of accidents that result in deaths.

2. Related Works

Numerous road accident models have been developed to estimate the expected accident frequency on roads as well as to identify the various factors associated with the occurrence of accidents. It is not possible for regression models to account for each and every factor that affects accident occurrence (Persaud and Dzibik, 1993). Previous researchers have focused on non-behavioural factors such as traffic flow characteristics, road geometry and environmental conditions. Persaud and Dzibik (1993), the first accident modellers to work on multilane roads, investigated the relationship between freeway crash data and accident volumes. Using hourly traffic, their model indicated that a higher accident risk is associated with congestion and the afternoon rush hour. Shankar et al. (1995) modelled the monthly accident frequency of a rural motorway as a function of geometrics, weather conditions and their interaction, and found steep grades and tight horizontal curves to be dangerous in areas with heavy rainfall and snowfall.
Abdel and Radwan (2000) attempted to use the Poisson regression model and then rejected it because the mean and variance of the dependent variable differed, which showed over-dispersion in the accident data. Consequently, the Negative binomial model, also called the Poisson-Gamma mixture model, was adopted as a superior alternative for vehicle accident analysis on rural highways, arterial roadways, urban roads and rural motorways (Shankar et al., 1995; Lord, 2005; Montella, 2008; Pemmanaboina, 2005). Recently, a number of approaches have been proposed in the domain of accident models that seek to improve on the traditional methods discussed earlier. For instance, El-Basyouny and Sayed (2006) proposed a modified Negative binomial regression technique which improved goodness of fit. Caliendo et al. (2007), on the other hand, applied the Negative Multinomial distribution to model multiple observations on the same road section in different years that may not be mutually independent. The generalized estimating equation was also used by researchers such as Abdel and Abdalla (2004) and Lord and Persaud (2000).
In New Zealand, generalized linear models (GLMs) have been used in accident research on different intersection types as well as on two-lane roads. The earlier work usually emphasized the effect of traffic volume. So far, efforts to model accident likelihood on motorways in New Zealand have been minimal, apart from the flow-only models presented by Turner (2011) and the Economic Evaluation Manual (2010).
In order to obtain a valid and accurate model for accident data comparison and prediction, other researchers, upon identifying flaws in previous models, included more factors in the accident data analysis. Livneh and Hakkert (1972) studied road accidents in Israel using employment and population data. Susan and Partyka (1984) likewise modelled road accidents with the help of employment and population data. Andreassen (1985) raised a serious objection to the use of deaths per licensed vehicle for international comparisons of road accident fatalities, on the grounds that the two parameters were not linearly related over time. As a result, he came out with a general formula;
$$D = c \, N^{m_1} P^{m_2} \tag{1}$$
which could be used to predict the number of deaths in road accidents, where $D$ is the number of deaths in road accidents, $N$ is the number of vehicles in use, $P$ is the population of the country, $c$ is a constant term and $m_1$ and $m_2$ are indices. The difficulty with the model by Andreassen was how to determine the constant term and the indices, which might vary from one country to another.
Additionally, Minter (1987) discussed the application of two accident models developed by Wright (1998) for road safety problems and finally came out with a new model for estimating road accidents in the United Kingdom. Pramada and Sarkar (1997) also used road length as an additional parameter and established a model for road accidents. A general model was also presented by Jamal and Jamil (2001) to predict road accident fatalities.
Jovanis and Chang (1986), Joshua and Garber (1990) and Miaou and Lum (1993) demonstrated that conventional linear regression models are not appropriate for modelling vehicle accident events on roadways and that test statistics from these models are often erroneous. They therefore concluded that Poisson and Negative binomial regression models are more appropriate tools for accident modelling. The inadequacy of linear regression models in uncovering the relationship between vehicular accidents and roadway characteristics has therefore led to numerous applications of Poisson and Negative binomial regression models (Shankar, Mannering and Barfield, 1995; Poch and Mannering, 1998).
Shankar, Mannering and Barfield (1995) used both the Poisson model (when the data were not significantly over-dispersed) and the Negative binomial model (when the data were over-dispersed) to explore the frequency of rural freeway accidents with information on roadway geometry and weather-related environmental factors. Separate regressions for specific accident types, as well as for overall accident frequency, were modelled. The estimation results showed that the Negative binomial regression model was the appropriate model for all accident types, with the exception of those involving overturned vehicles. Poch and Mannering (1996) also demonstrated that the Negative binomial regression model was the appropriate model for isolating the traffic and geometric elements that influence accident frequencies. Milton and Mannering (1998), on the other hand, used the Negative binomial regression model as a predictive tool to evaluate the relationship among highway geometry, traffic-related elements and motor-vehicle accident frequencies.
Shankar, Milton and Mannering (1997) argued that the traditional application of Poisson and Negative binomial regression models did not address the possibility of a zero-inflated counting process. They distinguished truly safe road sections (the zero accident state) from unsafe sections (the non-zero accident state, which still has some probability of zero observed accidents) to show that a zero-inflated model structure is often appropriate for estimating the accident frequency of road sections.
Some work has been done by earlier researchers in Ghana with respect to road accidents in the country. For instance, Oppong (2012) used Poisson count regression to model the number of persons killed by road accidents in Ghana. In addition, Salifu (2007) used generalized linear models to predict road accidents at unsignalised urban junctions. Afukaar et al. (2007) and Akaah and Salifu (2011) additionally applied Poisson, Negative Binomial and Log-linear regression models in modelling road accidents in Ghana.

3. Method

This research mainly utilized secondary data obtained from the Building and Road Research Institute (BRRI) of the Council of Scientific and Industrial Research (CSIR). The data were originally collected from police accident reports by the Motor Traffic and Transport Unit (MTTU) of the Ghana Police Service. The research considered data for the five (5) year period from 2009 to 2013. The number of persons killed in individual road accidents over the five-year period was used as the dependent or response variable, and categorical variables such as collision type, weather condition, light condition, type of vehicle and driver error were used as the explanatory or independent variables.

3.1. Regression Models

Count data, such as accident fatalities, are better modelled using Poisson, Negative Binomial and Conway-Maxwell-Poisson regression models. Regardless of whether the assumed model is Poisson, Negative Binomial or Conway-Maxwell-Poisson, it is assumed that the occurrences are independent of each other. The three (3) types of count regression models are briefly explained as follows:
3.1.1. Poisson Regression
The most basic model for event counts is the Poisson regression model. If the variance of the counts approximately equals the mean count, the Poisson regression model is expressed as:
$$P(Y_i = y_i) = \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!}, \quad y_i = 0, 1, 2, \ldots \tag{2}$$
where $y_i$ is the number of counts (persons killed in accidents) for a particular period of time and $\mu_i$ is the expected or mean number of counts (persons killed in road accidents) per period, which can be modelled as;
$$\mu_i = \exp(x_i'\beta) \tag{3}$$
where $x_i$ is the vector of the explanatory or independent variables and $\beta$ is the vector of unknown regression parameters.
Equation 3 indicates that a unit increase in an explanatory variable $x_{ij}$ changes $\mu_i$ by a multiplicative factor of $\exp(\beta_j)$. The main constraint of the Poisson regression model is that the mean and the variance are equal, that is;
$$E(y_i) = \mathrm{Var}(y_i) = \mu_i \tag{4}$$
Consequently, when there is heterogeneity or over-dispersion (the variance is larger than the Poisson regression allows), the Poisson regression model does not work well, and there is a need to fit a parametric model that is more dispersed than the Poisson model. Natural choices are the Negative Binomial and the Conway-Maxwell-Poisson models.
The log-likelihood function of the Poisson model is expressed as;
$$\ln L(\beta) = \sum_{i=1}^{n}\left[y_i \ln\mu_i - \mu_i - \ln(y_i!)\right] \tag{5}$$
By substituting equation 3 into equation 5, we further obtain the log-likelihood function as;
$$\ln L(\beta) = \sum_{i=1}^{n}\left[y_i x_i'\beta - \exp(x_i'\beta) - \ln(y_i!)\right] \tag{6}$$
In order to estimate the regression coefficients by the Maximum Likelihood Estimation (MLE) procedure, the derivative of the log-likelihood with respect to $\beta$ must be set to zero as;
$$\frac{\partial \ln L(\beta)}{\partial \beta} = \sum_{i=1}^{n}\left[y_i - \exp(x_i'\beta)\right]x_i = 0 \tag{7}$$
Equation 7 has no closed-form solution for the regression coefficients, so the Newton-Raphson method is used to estimate the unknown parameters of the model (Anseline, 2002).
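The Newton-Raphson iteration referred to above can be sketched as follows. This is a minimal illustration, assuming a log-linear mean with an intercept and a single covariate, for which the score is the sum of (y_i - mu_i) x_i and the information matrix is the sum of mu_i x_i x_i'. The function name `fit_poisson_newton` and the toy data are hypothetical; they are not the study's data or the routine used by the R software.

```python
import math

def fit_poisson_newton(x, y, iters=25, tol=1e-10):
    """Newton-Raphson for log(mu_i) = b0 + b1 * x_i (illustrative only)."""
    # Start from the intercept-only estimate log(mean(y)).
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: sum of (y_i - mu_i) * (1, x_i)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix: sum of mu_i * (1, x_i)(1, x_i)'
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * step = g
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if abs(s0) + abs(s1) < tol:
            break
    return b0, b1

# Hypothetical toy data: a binary covariate and small counts.
x = [0, 0, 0, 1, 1, 1]
y = [2, 3, 4, 6, 7, 8]
b0, b1 = fit_poisson_newton(x, y)
print(math.exp(b0), math.exp(b0 + b1))  # fitted means for x = 0 and x = 1
```

With a single binary covariate the fitted means coincide with the two group averages, which gives a simple check on the iteration.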
3.1.2. Negative Binomial Model
The Negative Binomial model can be obtained as a mixture of Poisson and Gamma distributions and is expressed as;
$$P(Y_i = y_i) = \frac{\Gamma(y_i + 1/\alpha)}{\Gamma(1/\alpha)\, y_i!}\left(\frac{1}{1+\alpha\mu_i}\right)^{1/\alpha}\left(\frac{\alpha\mu_i}{1+\alpha\mu_i}\right)^{y_i} \tag{8}$$
where $y_i$ is the observed count (the number of persons killed in road accidents) and $\mu_i$ is the mean or expected number of persons killed in road accidents per period, which can be expressed as $\mu_i = \exp(x_i'\beta)$.
The conditional mean and variance of the Negative Binomial distribution are $\mu_i$ and $\mu_i + \alpha\mu_i^2$ respectively. Hence the NB model allows extra variation relative to the traditional Poisson model. It has more desirable properties than the Poisson model (Chin and Quddus, 2003).
For $\alpha > 0$, the variance of the Negative Binomial is greater than the mean.
In the NB model, $\alpha$ represents the dispersion parameter, which indicates the degree of over-dispersion. For instance, as $\alpha \to 0$ the NB model reduces to the traditional Poisson model.
The log-likelihood function of the NB model is obtained from the following equation;
$$\ln L(\alpha, \beta) = \sum_{i=1}^{n}\left[\ln\Gamma(y_i + 1/\alpha) - \ln\Gamma(1/\alpha) - \ln(y_i!) - (y_i + 1/\alpha)\ln(1+\alpha\mu_i) + y_i\ln(\alpha\mu_i)\right] \tag{9}$$
In order to estimate $\alpha$ and $\beta$, as in the Poisson model, the Newton-Raphson iterative procedure is applied (Lee and Mannering, 2002).
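As a numerical illustration of the properties described above, the sketch below evaluates the Negative Binomial probabilities under the common parameterization with mean mu and variance mu + alpha * mu^2 (an assumption about the form used here), and compares them with the Poisson probabilities when alpha is close to zero. The values mu = 4.5 and alpha = 0.5 are hypothetical.

```python
import math

def nb_pmf(y, mu, alpha):
    """NB probability with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def poisson_pmf(y, mu):
    """Poisson probability with mean mu."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

mu, alpha = 4.5, 0.5          # hypothetical values
support = range(400)          # truncation point; the tail beyond it is negligible
mean = sum(y * nb_pmf(y, mu, alpha) for y in support)
var = sum((y - mean) ** 2 * nb_pmf(y, mu, alpha) for y in support)
print(mean, var, mu + alpha * mu ** 2)

# With alpha near zero the NB probabilities approach the Poisson ones.
print(nb_pmf(3, mu, 1e-6), poisson_pmf(3, mu))
```

The computed variance exceeds the mean by alpha * mu^2, which is the extra variation the NB model allows relative to the Poisson model.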
3.1.3. Conway-Maxwell-Poisson Model
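The Conway-Maxwell-Poisson (CMP) model is based on a discrete probability distribution that generalizes the Poisson distribution by adding a parameter to model over-dispersion, and that is also a member of the exponential family. The Conway-Maxwell-Poisson distribution has a probability mass function of the form:
$$P(Y = y) = \frac{\lambda^{y}}{(y!)^{\nu} Z(\lambda, \nu)}, \quad y = 0, 1, 2, \ldots \tag{10}$$
where
$$Z(\lambda, \nu) = \sum_{j=0}^{\infty} \frac{\lambda^{j}}{(j!)^{\nu}}$$
for $\lambda > 0$ and $\nu \ge 0$. The function $Z(\lambda, \nu)$ is an infinite series that converges for $\lambda > 0$ and $\nu > 0$.
The CMP distribution implies a non-linear relationship in the ratio of successive probabilities, as displayed by;
$$\frac{P(Y = y - 1)}{P(Y = y)} = \frac{y^{\nu}}{\lambda} \tag{11}$$
The mean and the variance of the CMP distribution can be represented respectively as;
$$E(Y) = \frac{\partial \ln Z(\lambda, \nu)}{\partial \ln \lambda} \tag{12}$$
$$\mathrm{Var}(Y) = \frac{\partial E(Y)}{\partial \ln \lambda} \tag{13}$$
Since the moments of the distribution do not have a closed form, Shmueli et al. (2005) showed that (12) can be approximated as;
$$E(Y) \approx \lambda^{1/\nu} - \frac{\nu - 1}{2\nu} \tag{14}$$
and Sellers and Shmueli (2008) also showed that (13) can be approximated as;
$$\mathrm{Var}(Y) \approx \frac{1}{\nu}\lambda^{1/\nu} \tag{15}$$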
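The CMP quantities above can be checked numerically. The sketch below is an illustration only: it assumes the probability mass function lambda^y / ((y!)^nu Z(lambda, nu)), truncates the infinite series Z at a fixed number of terms, and compares the resulting mean with the approximation lambda^(1/nu) - (nu - 1)/(2 nu). The function names and parameter values are hypothetical.

```python
import math

def cmp_log_z(lam, nu, terms=500):
    """log Z(lam, nu), with the infinite series truncated at `terms` terms."""
    logs = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(terms)]
    m = max(logs)  # subtract the maximum before exponentiating, for stability
    return m + math.log(sum(math.exp(v - m) for v in logs))

def cmp_pmf(y, lam, nu, terms=500):
    """CMP probability lam**y / ((y!)**nu * Z(lam, nu))."""
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1)
                    - cmp_log_z(lam, nu, terms))

def cmp_mean(lam, nu, terms=500):
    """Mean computed directly from the truncated distribution."""
    log_z = cmp_log_z(lam, nu, terms)
    return sum(j * math.exp(j * math.log(lam) - nu * math.lgamma(j + 1) - log_z)
               for j in range(terms))

def cmp_mean_approx(lam, nu):
    """Approximate mean: lam**(1/nu) - (nu - 1) / (2 * nu)."""
    return lam ** (1.0 / nu) - (nu - 1.0) / (2.0 * nu)

# nu = 1 recovers the Poisson distribution, whose mean is lam.
print(cmp_mean(4.5, 1.0), cmp_mean_approx(4.5, 1.0))
# Hypothetical over-dispersed case (nu < 1): direct sum versus approximation.
print(cmp_mean(3.0, 0.7), cmp_mean_approx(3.0, 0.7))
```

Setting nu = 1 is a convenient sanity check, since both the exact mean and the approximation then reduce to lambda.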
3.1.4. Conway-Maxwell-Poisson Generalized Linear Model
The Conway-Maxwell-Poisson distribution was extended into the classical generalized linear model framework by Sellers and Shmueli (2008), using the fact that the CMP distribution is a member of the exponential family, as displayed by;
$$P(Y_i = y_i) = \exp\left\{y_i \ln\lambda_i - \nu \ln(y_i!) - \ln Z(\lambda_i, \nu)\right\} \tag{16}$$
As in the Poisson case, a logarithmic link function is assumed, $\ln\lambda_i = x_i'\beta$, and $Z(\lambda_i, \nu)$ is the normalizing function. The log-likelihood function is therefore given as;
$$\ln L(\beta, \nu) = \sum_{i=1}^{n}\left[y_i \ln\lambda_i - \nu \ln(y_i!) - \ln Z(\lambda_i, \nu)\right] \tag{17}$$
The log-likelihood function of the CMP model can be maximized in several different ways. Where the Newton-Raphson algorithm is employed, the log-likelihood function must be maximized under the constraint $\nu \ge 0$ (Sellers and Shmueli, 2008).

3.2. Selecting Over-Dispersed Models

In order to test whether the data are over-dispersed, we test the following hypotheses;
$$H_0: \alpha = 0 \quad \text{against} \quad H_1: \alpha > 0$$
where $\alpha$ is a dispersion parameter.
The corresponding test statistic is;
(18)
Under $H_0$ and for a large sample size, $z$ has an approximate standard normal distribution. We therefore reject $H_0$ at the chosen level of significance if $z$ exceeds the corresponding standard normal critical value.
If the null hypothesis is rejected, this indicates that the data are over-dispersed, and hence the Negative Binomial and the CMP models will be more appropriate than the Poisson model.

3.3. Model Specification

The models considered in this research have the mean or expected number of persons killed in road accidents within a particular period of time as a function of the categorical variables collision type, driver error, type of vehicle, weather condition and light condition. Each model parameterizes;
$$\ln(\mu_i) = \beta_0 + \beta_1 CT_i + \beta_2 DE_i + \beta_3 TV_i + \beta_4 LC_i + \beta_5 WC_i \tag{19}$$
where $i = 1, 2, \ldots, n$ indexes individuals, CT denotes the Collision Type, DE represents the Driver Errors, TV is the Type of Vehicle, LC is the Light Condition and WC is the Weather Condition.

3.4. Parameter Estimation

The Maximum Likelihood Estimation (MLE) method has been considered due to the fact that all the models involved in the research work can be estimated using the MLE procedure. To evaluate the models involved in the research work, it is necessary to examine the significance of the variables included in the model. For a better model, the estimated regression coefficients have to be statistically significant.

3.5. Model Selection Methods

3.5.1. Goodness of Fit
After fitting the various models involved in the research to the data, it is necessary to check the overall fit as well as the quality of the fit of the respective models. The quality of the fit between the observed values $y_i$ and the predicted values can be measured by various test statistics; one useful statistic is the deviance goodness of fit statistic, which is defined as;
$$D = 2\sum_{i=1}^{n}\left[y_i \ln\left(\frac{y_i}{\hat{\mu}_i}\right) - (y_i - \hat{\mu}_i)\right] \tag{20}$$
where $y_i$ is the number of events (observed values or counts), $n$ is the number of observations and $\hat{\mu}_i$ represents the fitted means (predicted values) of the model.
For a better model, one expects a smaller value of the deviance. Hence the smaller the deviance of a specific model, the better that model fits the data.
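The deviance calculation can be sketched in a few lines. The version below assumes the Poisson form of the deviance, D = 2 * sum[y_i ln(y_i / mu_i) - (y_i - mu_i)], with the term y_i ln(y_i / mu_i) taken as zero when y_i = 0. The observed and fitted values are hypothetical and serve only to show the arithmetic.

```python
import math

def poisson_deviance(y, mu_hat):
    """Poisson deviance between observed counts y and fitted means mu_hat."""
    total = 0.0
    for yi, mi in zip(y, mu_hat):
        # y * log(y / mu) is taken as 0 when y == 0 (its limiting value).
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total

observed = [0, 2, 5, 3]            # hypothetical counts
fitted = [1.5, 2.0, 4.0, 2.5]      # hypothetical fitted means
print(poisson_deviance(observed, fitted))
print(poisson_deviance(observed[1:], observed[1:]))  # a perfect fit gives zero
```

A deviance of zero corresponds to fitted means that reproduce the observed counts exactly, so smaller values indicate a closer fit.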
3.5.2. Akaike’s Information Criterion (AIC)
Akaike’s Information Criterion (AIC; Akaike, 1973) is a measure of the relative quality of a statistical model for a given set of data. That is, given a collection of models for the data, AIC estimates the quality of each model relative to the other models. Hence, AIC provides a means for model selection. For any statistical model, the AIC value is computed using the relation;
$$\mathrm{AIC} = -2\ln(L) + 2k \tag{21}$$
where $L$ is the maximized value of the likelihood function and $k$ is the number of parameters in the model. The best model with respect to the AIC is the one with the smallest AIC value.
3.5.3. Bayesian Information Criterion (BIC)
The Bayesian Information Criterion (BIC) is a criterion for model selection among a finite set of models. It is based, in part, on the likelihood function and is closely related to Akaike’s Information Criterion (AIC). Mathematically, the BIC is an asymptotic result derived under the assumption that the data distribution is in the exponential family.
Let;
$x$ = the observed data
$n$ = the number of data points in $x$, that is, the number of observations or equivalently the sample size
$k$ = the number of free parameters to be estimated. For instance, if the model under consideration is a linear regression, then $k$ is the number of regressors or independent variables, including the intercept.
$p(x \mid M)$ = the marginal likelihood of the observed data given the model $M$.
$\hat{L}$ = the maximized value of the likelihood function of the model $M$, that is $\hat{L} = p(x \mid \hat{\theta}, M)$, where $\hat{\theta}$ is the parameter value that maximizes the likelihood function.
Then the formula for the Bayesian Information Criterion (BIC) is given by;
$$\mathrm{BIC} = -2\ln(\hat{L}) + k\ln(n) \tag{22}$$
The best model with respect to the BIC is the one with the lowest BIC value.
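Both information criteria are simple functions of the maximized log-likelihood. The sketch below assumes the standard forms AIC = -2 ln L + 2k and BIC = -2 ln L + k ln n; the log-likelihood, number of parameters and sample size used in the example are hypothetical and are not taken from the fitted models.

```python
import math

def aic(log_lik, k):
    """Akaike's Information Criterion from a maximized log-likelihood."""
    return -2.0 * log_lik + 2.0 * k

def bic(log_lik, k, n):
    """Bayesian Information Criterion from a maximized log-likelihood."""
    return -2.0 * log_lik + k * math.log(n)

# Hypothetical values for illustration only.
log_lik, k, n = -100.0, 5, 60
print(aic(log_lik, k), bic(log_lik, k, n))
```

Because ln n exceeds 2 once n is at least 8, the BIC penalizes additional parameters more heavily than the AIC for all but very small samples.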

4. Results and Discussions

A review of the road accident data with respect to the number of persons killed revealed that there were 79,114 road accidents in Ghana from 2009 to 2013, which killed 10,836 people. This indicates that, on average, 15,823 road accidents occurred every year and 2,167 lives were lost each year through road accidents.
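The annual averages quoted above follow directly from the five-year totals. A short check of the arithmetic, with the deaths per 100 accidents added as a derived figure that is not reported in the text:

```python
# Totals quoted in the text for 2009-2013.
accidents_total = 79114
deaths_total = 10836
years = 5

accidents_per_year = accidents_total / years   # 15822.8, about 15,823
deaths_per_year = deaths_total / years         # 2167.2, about 2,167
# Derived figure (not reported in the text): deaths per 100 accidents.
deaths_per_100_accidents = 100 * deaths_total / accidents_total

print(round(accidents_per_year), round(deaths_per_year),
      round(deaths_per_100_accidents, 1))
```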
Table 1 below shows the years in which fatal road accidents occurred and presents the total number of persons killed in road accidents each year from 2009 to 2013.
Table 1. Distribution of Persons Killed by Road Accidents Annually from 2009-2013 in Ghana
     
The most important feature of Table 1 is that the number of persons killed as a result of road accidents in Ghana seems to be increasing as the years go by. In 2009, 2,237 people died in road accidents; this fell to 1,986 in 2010, and in 2011 the figure rose to 2,199. By 2013, the number of people killed by road accidents had risen to 2,249. These statistics for the five-year period therefore support what Odoom (2010) stated in his research on road accidents: as the years go by, the number of motor vehicles in Ghana will increase along with the number of road accidents, and as such the number of persons likely to be killed in road accidents will also increase.

4.1. Poisson Regression Model for the Number of Persons Killed in Road Accidents from 2009-2013

Table 2 below presents the estimates of the Poisson count regression model relating the number of persons killed to collision type, type of vehicle, weather condition at the time of the accident, light condition prevailing during the accident, and driver error. The table contains the parameter estimates, standard errors, death rates and the respective p-values for the various categories of the variables used in the model.
Table 2. Parameter estimates of Poisson Regression Model for the Number of Persons Killed in Road Accidents from 2009-2013
     
The parameter estimates in Table 2 were interpreted in terms of the rate of the number of persons killed in road accidents. These rates reflect the multiplicative effect of the various collision types, driver errors, types of vehicle, weather conditions and light conditions on the number of persons killed in road accidents in Ghana. The intercept estimate of 1.5047 (p-value = 0.00000786), with an estimated death rate of exp(1.5047) = 4.50, was statistically significantly different from zero at the 0.05 level of significance.
In this model, overturn, inexperience, car, clear and day were used as the reference levels for the categorical variables Collision type, Driver error, Type of vehicle, Weather condition and Light condition respectively, as set in the R statistical software used for the analysis.
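Reference-level coding of a categorical variable can be sketched as follows. This is an illustration of the general idea, not the internal routine of the R software: every level except the reference gets its own 0/1 indicator column, so each coefficient compares that level with the reference. The helper `dummy_code` and the short list of light conditions are hypothetical.

```python
def dummy_code(values, reference):
    """Return indicator columns for every level except the reference level."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == lvl else 0 for lvl in levels] for v in values]
    return levels, rows

# Hypothetical sample of light conditions, with "Day" as the reference level.
light = ["Day", "Night no street light", "Day", "Night street light off"]
levels, rows = dummy_code(light, reference="Day")
print(levels)
for value, row in zip(light, rows):
    print(value, row)
```

An observation at the reference level has zeros in every column, so its expected count is determined by the intercept and the other variables alone.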
With respect to the categorical variable Collision type in Table 2, rear-end collision, side swipe collision, right angle collision, hit object on road and hit object off road, with p-values 0.09702, 0.20404, 0.85036, 0.29680 and 0.92492 respectively, were not significantly associated with the number of persons killed in road accidents in Ghana. Only two collision types, Ran off collision and Head-on collision, with p-values 0.03514 and 0.00158 respectively, were significantly associated with the number of persons killed in road accidents. The parameter estimates for Ran off collision and Head-on collision were 0.28387 and 0.93386, with death rates of 1.33 and 2.54 respectively.
The death rate for Ran off collision suggests that the rate of death in road accidents is 1.33 times higher for ran-off collisions than for overturn collisions, whilst that for Head-on collision indicates that the rate of death in road accidents is 2.54 times higher for head-on collisions than for overturn collisions.
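The death rates quoted for the Poisson model are the exponentials of the parameter estimates. The short sketch below reproduces that step for the three estimates reported in the text; the dictionary layout is only an illustrative choice.

```python
import math

# Poisson parameter estimates quoted in the text (log scale).
coefs = {"Intercept": 1.5047, "Ran off": 0.28387, "Head-on": 0.93386}

# Exponentiating a log-linear coefficient gives the multiplicative rate.
ratios = {name: math.exp(b) for name, b in coefs.items()}
for name, rr in ratios.items():
    print(f"{name}: exp({coefs[name]}) = {rr:.2f}")
```

The rounded values agree with the reported rates of 4.50, 1.33 and 2.54.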
Also, inattentiveness on the part of drivers, improper overtaking and loss of control, with respective p-values 0.00874, 0.04291 and 0.02658, were the only categories of the variable Driver error that were significantly associated with the number of persons killed by road accidents in the country. The estimated death rate for inattentiveness was 0.45, indicating that the rate of death in road accidents caused by driver inattentiveness is 0.45 times that for driver inexperience (about 55% lower). The estimated death rate for improper overtaking was 1.25, which means that the rate of death in road accidents in Ghana is 1.25 times higher with improper overtaking than with inexperience. The death rate for loss of control was 0.43, indicating that the rate of death in road accidents with loss of control is 0.43 times that for inexperience (about 57% lower). The remaining Driver errors, including driving too fast, driving too close and Fatigue/asleep, were not significantly associated with the rate of death in road accidents in Ghana.
In addition, only two types of vehicle (Goods vehicle and Bus/Minibus) were significantly associated with the number of persons killed in road accidents, with p-values of 0.00127 and 0.0215 respectively. From Table 2, Goods vehicle had a rate value of 0.57, indicating that the rate of persons killed in road accidents involving goods vehicles was 0.57 times that for cars (about 43% lower), whilst the rate value for Bus/Minibus was 1.36, meaning that the rate of death in road accidents was 1.36 times higher with Bus/Minibus than with cars in Ghana.
Furthermore, "other weather conditions" was the only category of the variable Weather condition that was not significantly associated with the number of persons killed in road accidents, since its p-value (0.90341) was greater than the 5 percent level of significance. Fog/mist and Rain, with respective p-values 0.03427 and 0.00155, proved to be significantly associated with the number of persons killed in road accidents in Ghana. From Table 2, Fog/mist and Rain had rate values of 0.74 and 0.65 respectively, indicating that the rate of death in road accidents in the country with Fog/mist is 0.74 times that in clear weather (about 26% lower), and that the rate of death with Rain is 0.65 times that in clear weather (about 35% lower).
Lastly, from Table 2, the variable Light condition was classified into four categories, of which only two were significantly associated with the number of persons killed in road accidents in Ghana. These categories are “Night no street light” and “Night street light off”, with p-values 0.02316 and 0.00442 respectively, both less than the 5 percent level of significance. The rates of death in road accidents for these categories were 3.68 and 1.84 respectively. This indicates that the death rate in road accidents in Ghana is 3.68 times higher for accidents that occur at night with no street lights than for those that occur in the day, whilst the rate of death is 1.84 times higher for accidents that occur at night with the street lights off than for those that occur in the day.
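Because the rates are multiplicative, a value below 1 denotes a lower expected count than at the reference level and a value above 1 a higher one. A minimal sketch of converting a rate to a percentage change, using three of the rates reported for the Poisson model (the helper name `percent_change` is hypothetical):

```python
def percent_change(rate_ratio):
    """Percentage change in the expected count relative to the reference level."""
    return (rate_ratio - 1.0) * 100.0

# Rates quoted in the text for the Poisson model.
reported = [("Inattentiveness", 0.45),
            ("Improper overtaking", 1.25),
            ("Night no street light", 3.68)]
for label, rr in reported:
    print(f"{label}: {percent_change(rr):+.0f}%")
```

Read this way, a rate of 0.45 corresponds to an expected count about 55% below the reference level, while 3.68 corresponds to one about 268% above it.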
Hence the mean of the Poisson count regression model with the significant variables for estimating the mean or the average number of persons killed in road accidents in Ghana is:
(23)
where RO is Ran off collision, HO is Head-on collision, IN is Inattentiveness, IO is Improper overtaking, LC represents Loss of control, GV is Goods vehicle, BM is Bus/Minibus, F is Fog/mist, R is Rain, NS is Night with no street lights and NO represents Night with street lights off.
4.1.1. Goodness of Fit Test of the Poisson Regression
Table 3 below displays the goodness of fit test of the fitted Poisson regression model for the number of persons killed in road accidents in Ghana from 2009 to 2013. This goodness of fit test helps to determine how well the fitted model fits the data compared with other models.
Table 3. Goodness of Fit test of the Poisson Regression
     
Table 3 reveals that Akaike’s Information Criterion (AIC) is 1059.31, the Bayesian Information Criterion (BIC) is 1070.65 and the residual deviance, on 54 degrees of freedom, is 470.85. The residual deviance divided by its degrees of freedom gives 8.71944, which is approximately 9 and far greater than 1.
This indicates that there is over-dispersion. As a result, the fitted model as a whole is not appropriate, since the major assumption of Poisson regression, equidispersion (a dispersion parameter equal to 1), is violated. Hence the Negative Binomial and the Conway-Maxwell-Poisson models are preferred.
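The dispersion check above is a single division. The sketch below repeats it with the values reported in Table 3; treating a ratio well above 1 as a sign of over-dispersion is a rule of thumb rather than a formal test.

```python
residual_deviance = 470.85   # reported in Table 3
degrees_of_freedom = 54      # reported in Table 3

# Under equidispersion this ratio would be close to 1.
dispersion = residual_deviance / degrees_of_freedom
overdispersed = dispersion > 1.0
print(round(dispersion, 5), overdispersed)
```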

4.2. Negative Binomial Regression Model for the Number of Persons Killed in Road Accidents from 2009-2013

Table 4 below shows the results of the Negative Binomial regression model with collision type, driver error, type of vehicle, light condition and weather condition as the independent categorical variables and the number of persons killed as the response variable. The parameter estimates associated with the categorical independent variables were approximately the same as those of the Poisson regression model. As a result, the estimated rates of the number of persons killed in road accidents associated with collision type, driver error, type of vehicle, light condition and weather condition remained essentially unaltered. However, the standard errors and the respective p-values changed from one variable to the other.
Table 4. Parameter estimates of Negative Binomial Regression Model for the Number of Persons Killed in Road Accidents from 2009-2013
     
From Table 4 above, the intercept, which gives the log of the expected number of persons killed when every categorical variable is at its reference level, was significant with a parameter estimate of 1.50417, a death rate of exp(1.50417)=4.50 and a p-value of 0.00153. With respect to the variable “collision type”, head-on collision was the only type of collision that was significantly associated with the number of persons killed in road accidents. The parameter estimate for this collision type was 0.92703, with an estimated death rate of exp(0.92703)=2.53. This figure suggests that the rate of death in road accidents is 2.53 times higher among accidents caused by head-on collision compared to overturn collision.
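The death rates quoted from Table 4 are the exponentials of the parameter estimates. The short sketch below reproduces that arithmetic for the two estimates given in the text. The helper name `rate_ratio` is hypothetical, and the combined value is only an illustration of how effects add on the log scale under a log link, assuming every other variable is at its reference level.

```python
import math


def rate_ratio(coefficient: float) -> float:
    """Exponentiate a log-link coefficient to obtain a rate (ratio)."""
    return math.exp(coefficient)


intercept = 1.50417  # parameter estimate of the intercept (from the text)
head_on = 0.92703    # parameter estimate for head-on collision (from the text)

print(f"baseline rate:      {rate_ratio(intercept):.2f}")
print(f"head-on rate ratio: {rate_ratio(head_on):.2f}")

# Illustration only: expected count for a head-on collision with all
# other categorical variables held at their reference levels.
expected_head_on = rate_ratio(intercept + head_on)
print(f"expected count, head-on vs. reference: {expected_head_on:.2f}")
```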
Improper overtaking and loss of control, under the categorical variable “Driver error”, with respective p-values of 0.02638 and 0.00303, were also significantly associated with the number of persons killed in road accidents in Ghana at the 5 percent level of significance, with death rates of 1.25 and 0.43 respectively. This implies that the rate of death in road accidents is 1.25 times higher among accidents caused by improper overtaking compared to those caused by drivers’ inexperience, while the rate among accidents caused by drivers losing control is 0.43 times the rate for inexperience, that is, about 57 percent lower. On the other hand, driving too fast and “No signal” from Table 4 had estimated rates above 1 (1.31 and 1.08 respectively) but were not significantly associated with the number of persons killed at the 5 percent level of significance.
Also, among the various classifications of the type of vehicle as a categorical independent variable, only Bus/Minibus was significantly associated with the number of persons killed in road accidents, with a p-value of 0.02107 and a death rate of 1.36. This death rate indicates that, in Ghana, the rate of death in road accidents is 1.36 times higher among accidents that involve a Bus/Minibus compared to accidents involving cars.
For the categorical variable “light condition”, Night with street lights off was statistically significant with a p-value of 0.00125 and an estimated death rate of 1.84. This rate indicates that the rate of death in road accidents is 1.84 times higher for accidents that occur at night on streets with the lights off compared to accidents that occur during the day.
The mean regression of the Negative Binomial model for estimating the expected number of persons killed in road accidents within a specific year, with the significant variables, is therefore formulated as:
(24)
where HO is Head-on collision, IO is Improper overtaking, LC represents Loss of control, BM is Bus/Minibus, F is Fog/Midst, NO represents Nights with street lights off.
4.2.1. Goodness of Fit Test of the Negative Binomial Regression
Table 5 below depicts the goodness of fit test of the fitted Negative Binomial regression model for the number of persons killed as a result of road accidents in Ghana from 2009-2013. This table gives the details of null deviance, residual deviance, AIC, BIC and the dispersion parameter of the model.
Table 5. Goodness of Fit test of the Negative Binomial Regression
     
From the table, the residual deviance of the model decreased substantially to 65.666, which is much smaller than the 470.85 of the Poisson model. Dividing the residual deviance by its degrees of freedom gives 1.21603 (the dispersion parameter), which is approximately equal to 1, suggesting that the over-dispersion has been eliminated. These results suggest that the Negative Binomial model is appropriate.

4.3. Conway-Maxwell-Poisson Regression Model for the Number of Persons Killed in Road Accidents from 2009-2013

The parameter estimates of the Conway-Maxwell-Poisson model in Table 6 below were the same as those of the Negative Binomial and Poisson regression models. This indicates that the estimated death rates for the respective variables used in the analysis remain the same as well. However, the standard errors and the p-values of the variables changed.
Table 6. Parameter Estimates of Conway-Maxwell-Poisson model for the number of persons killed in Road Accidents from 2009-2013
     
Table 6 above shows that the intercept, with a parameter estimate of 1.50417, a p-value of 0.0000542 and an estimated rate of 4.50, was statistically significant. In this model, only head-on collision was significantly associated with the number of persons killed in road accidents under the variable “collision type”, with a p-value of 0.001123 and an estimated death rate of 2.53. This estimated rate indicates that the rate of death in road accidents is 2.53 times higher for accidents resulting from head-on collision compared to overturn collision.
Also, inattentiveness and improper overtaking, with respective p-values of 0.036919 and 0.021518, were the driver errors in this model that were significantly associated with the number of persons killed in road accidents. The estimated death rate for inattentiveness from Table 6 is 0.45, which suggests that the rate of death among accidents caused by drivers’ inattentiveness is 0.45 times the rate among accidents caused by drivers’ inexperience, that is, about 55 percent lower. The estimated death rate for improper overtaking, on the other hand, is 1.25, which indicates that the rate of death in road accidents in Ghana is 1.25 times higher for accidents caused by improper overtaking compared to those caused by drivers’ inexperience.
In addition, from Table 6, only Bus/Minibus was statistically significant among the types of vehicle, with a p-value of 0.007719 and an estimated death rate of 1.36, indicating that the rate of death in road accidents in Ghana is 1.36 times higher with Bus/Minibus compared to cars (taxis, trotro etc.).
Furthermore, under the variable “Weather condition”, fog/midst and other weather conditions, with p-values of 0.872703 and 0.112926 respectively, were not significantly associated with the number of persons killed in road accidents. Rain, however, with a p-value of 0.021015, was significantly related to the number of persons killed in road accidents at the 5 percent level of significance. The parameter estimate of this variable was -0.42441, with an estimated death rate of exp(-0.42441)=0.65, suggesting that the rate of death among road accidents in Ghana that occur in rainy weather is 0.65 times the rate for accidents in clear weather, that is, about 35 percent lower.
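A rate below 1, such as the 0.65 estimated for rain, corresponds to a lower death rate than the reference category, which is often easier to read as a percentage change. The sketch below shows that conversion for the rain coefficient quoted in the text; `percent_change` is a hypothetical helper introduced only for illustration.

```python
import math


def percent_change(coefficient: float) -> float:
    """Percentage change in the expected count relative to the reference level."""
    return (math.exp(coefficient) - 1.0) * 100.0


rain = -0.42441  # parameter estimate for rain (from the text)

print(f"rate ratio for rain: {math.exp(rain):.2f}")
print(f"percentage change:   {percent_change(rain):.1f}%")
```

A negative percentage indicates fewer expected deaths than in the reference category (clear weather), and a positive one indicates more.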
Lastly, in this model, the categorical independent variable “Light condition” had only Night with street lights off significantly related to the number of persons killed in road accidents. Night with street lights off, from Table 6, had a p-value of 0.039922 and a death rate of 1.84, which implies that the rate of death in road accidents is 1.84 times higher for accidents that occur at night with street lights off compared to accidents that occur during the day.
Therefore, the mean of the Conway-Maxwell-Poisson count regression model for estimating the mean or the average number of persons killed in road accidents yearly for the various significant variables is stated as:
(25)
where HO is Head-on collision, IN is Inattentiveness, IO is Improper overtaking, BM is Bus/Minibus, R is Rain and NO is Night with street lights off.
4.3.1. Goodness of Fit test of the Conway-Maxwell-Poisson Model
Table 7 below presents the goodness-of-fit statistics of the Conway-Maxwell-Poisson regression model for the number of persons killed in road accidents from 2009-2013. This table helps to determine whether the fitted model is appropriate and fits the data well.
Table 7. Goodness of Fit test of the Conway-Maxwell-Poisson Model
     
Table 7 above reveals that, in this model, the residual deviance, AIC and BIC have all decreased substantially, with respective values of 76.28, 826.46 and 835.79, which are much smaller than the residual deviance of 470.85, AIC of 1059.31 and BIC of 1070.65 of the Poisson regression model. Dividing the residual deviance by its degrees of freedom gives a value of approximately 1, suggesting that the CMP regression model, unlike the Poisson model, can accommodate the over-dispersion in the data.
These results indicate that the CMP model is also an appropriate model.

4.4. Model Evaluation and Comparison

Table 8 below gives clear evidence that the Negative Binomial model fits the accident data best, with respect to the expected number of persons killed in road accidents, as compared to the Poisson and the CMP models.
This is because, when models fitted to the same data are compared using the AIC, BIC or deviance, the criterion is that the smaller the value, the better the model. Comparing the respective values for the Poisson, Negative Binomial and CMP models in Table 8, the Negative Binomial has the smallest AIC (749.10), BIC (758.46) and residual deviance (65.666).
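The "smaller is better" rule can be sketched in a few lines of Python. The AIC, BIC and residual deviance values are those reported in the text for the three fitted models; the dictionary layout and the helper name `best_by` are assumptions made for illustration only.

```python
# Goodness-of-fit values reported in the text for each fitted model.
models = {
    "Poisson": {"aic": 1059.31, "bic": 1070.65, "deviance": 470.85},
    "Negative Binomial": {"aic": 749.10, "bic": 758.46, "deviance": 65.666},
    "CMP": {"aic": 826.46, "bic": 835.79, "deviance": 76.28},
}


def best_by(criterion: str) -> str:
    """Return the name of the model with the smallest value of a criterion."""
    return min(models, key=lambda name: models[name][criterion])


for criterion in ("aic", "bic", "deviance"):
    print(f"best model by {criterion}: {best_by(criterion)}")

# Full ranking by AIC, from best to worst.
ranking = sorted(models, key=lambda name: models[name]["aic"])
print("ranking by AIC:", ranking)
```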
Table 8. Results of the model evaluation and comparison
     
Although both the Negative Binomial and the CMP models take care of over-dispersion, the dispersion parameter values in Table 8 indicate that the Negative Binomial regression accounts for it better than the CMP model. The above statistics therefore indicate that the model that best fits the accident data on the number of persons killed in Ghana is the Negative Binomial model, followed by the CMP model.
Hence, in order to estimate the expected number of persons killed in road accidents in Ghana per year, the mean of the Negative Binomial model with the significant variables head-on collision as collision type, Improper overtaking and Loss of control as driver errors, Bus/Minibus as type of vehicle, Fog/Midst as weather condition and Night with street lights off as light condition must be used. This is the model of equation (24), in which the response is the expected number of persons killed in road accidents in a particular period of time, HO is Head-on collision, IO is Improper overtaking, LC represents Loss of control, BM is Bus/Minibus, F is Fog/Midst and NO represents Night with street lights off.

5. Conclusions

This research work was aimed at examining the efficiency of different statistical models for count data with application to road accidents in Ghana. As a result, three (3) statistical models, namely the Poisson, Negative Binomial and Conway-Maxwell-Poisson count regression models, were fitted. All the fitted models include significant explanatory variables.
Based on the deviances, AIC and BIC of the respective fitted models, the Negative Binomial model performed best as compared to the Poisson and the Conway-Maxwell-Poisson models. The predictors in this model were investigated using their respective p-values, and it was found that Head-on collision as collision type, Improper overtaking and Loss of control as driver errors, Bus/Minibus as type of vehicle, Fog/Midst as weather condition and Night with street lights off as light condition were the key predictors or independent variables contributing significantly to the expected number of persons killed in road accidents in Ghana.
The empirical study of this research work additionally revealed that, in the presence of over-dispersion, both the Negative Binomial and Conway-Maxwell-Poisson count regression models are potential alternatives to the Poisson count regression model, whose major assumption is equi-dispersion.
Thus the Poisson count regression model serves well under the equi-dispersion condition, while both the Negative Binomial and CMP count regression models serve better when the data are over-dispersed.

Appendix

Derivation of the Poisson Regression
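From the generalized linear model framework, the exponential family in canonical form is given by the probability function:
$$ f(y;\theta,\phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi) \right\} \tag{1} $$
The Poisson probability function, on the other hand, is given by:
$$ f(y;\mu) = \frac{e^{-\mu}\mu^{y}}{y!}, \qquad y = 0, 1, 2, \ldots $$
By taking the exponent and natural log (ln) of the Poisson probability function, we obtain the following:
$$ f(y;\mu) = \exp\left\{ y\ln\mu - \mu - \ln y! \right\} \tag{2} $$
By comparing equation (2) with equation (1), we have:
$$ \theta = \ln\mu, \qquad b(\theta) = e^{\theta} = \mu, \qquad a(\phi) = 1, \qquad c(y,\phi) = -\ln y! $$
But
$$ \theta = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k \tag{3} $$
Therefore
$$ \ln\mu = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k \tag{4} $$
Hence, by making $\mu$ the subject of (4), we obtain:
$$ \mu = \exp\left( \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k \right) \tag{5} $$
as the Poisson mean regression model.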
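The derivation above can be checked numerically. The sketch below, which writes mu for the Poisson mean (a notational assumption), evaluates the Poisson probability function both directly and in exponential-family form with theta = ln(mu), b(theta) = exp(theta) and c(y) = -ln(y!), and then computes the mean from a linear predictor through the log link. The function names are illustrative; the two coefficients in the usage lines are the intercept and head-on estimates quoted earlier in the text.

```python
import math


def poisson_pmf(y: int, mu: float) -> float:
    """Poisson probability function in its usual form."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


def poisson_expfam(y: int, mu: float) -> float:
    """Same probability written in canonical exponential-family form."""
    theta = math.log(mu)          # canonical parameter
    b = math.exp(theta)           # cumulant function b(theta) = mu
    c = -math.lgamma(y + 1)       # c(y) = -ln(y!)
    return math.exp(y * theta - b + c)


def poisson_mean(beta: list[float], x: list[float]) -> float:
    """Mean regression model: mu = exp(linear predictor) under the log link."""
    eta = sum(b_j * x_j for b_j, x_j in zip(beta, x))
    return math.exp(eta)


# The two forms agree for any count y and mean mu.
for y in range(5):
    print(y, poisson_pmf(y, 4.5), poisson_expfam(y, 4.5))

# Mean for an intercept plus one indicator set to 1.
print(poisson_mean([1.50417, 0.92703], [1.0, 1.0]))
```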
Derivation of NB Model
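Suppose that we have a series of random counts that follows a Poisson distribution:
$$ \Pr(y_i \mid \mu_i) = \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!} $$
where $y_i$ is the observed number of counts (accidents) for $i = 1, 2, \ldots, n$ and $\mu_i$ is the mean or the expected number of counts (persons killed in road accidents) of the Poisson distribution.
In the Poisson distribution, $\mu_i$ was fully determined by a linear combination of the $x$'s. The Negative Binomial, on the other hand, adds some unobserved heterogeneity, such that the mean $\tilde{\mu}_i$ is determined by the $x$'s and some unobserved specific random effect $\varepsilon_i$, whose exponential is the Gamma distributed error and which is assumed to be uncorrelated with the $x$'s.
Hence we obtain the following relationships:
$$ \tilde{\mu}_i = \exp(\mathbf{x}_i\boldsymbol{\beta} + \varepsilon_i) = \exp(\mathbf{x}_i\boldsymbol{\beta})\exp(\varepsilon_i) = \mu_i\delta_i \tag{6} $$
where $\delta_i = \exp(\varepsilon_i)$ is defined as the unobserved random effect; $\mu_i = \exp(\mathbf{x}_i\boldsymbol{\beta})$ is the log-link between the Poisson mean and the covariates or independent variables, and the $\boldsymbol{\beta}$ are the regression coefficients.
The Negative Binomial is not defined without an assumption about the mean of the error term, and the most convenient assumption is that $E(\delta_i) = 1$, since this gives $E(\tilde{\mu}_i) = \mu_i E(\delta_i) = \mu_i$. This indicates that we have the same expected counts as the Poisson distribution.
Since the distribution of the observations is still Poisson given $\mathbf{x}_i$ and $\delta_i$, we obtain the relation:
$$ \Pr(y_i \mid \mathbf{x}_i, \delta_i) = \frac{e^{-\mu_i\delta_i}(\mu_i\delta_i)^{y_i}}{y_i!} \tag{7} $$
where $\delta_i$ is unknown.
Since $\delta_i$ is unknown, $\Pr(y_i \mid \mathbf{x}_i, \delta_i)$ cannot be computed, and instead there is a need to compute the distribution of $y_i$ given $\mathbf{x}_i$ only. Hence, to compute $\Pr(y_i \mid \mathbf{x}_i)$ without conditioning on $\delta_i$, we average $\Pr(y_i \mid \mathbf{x}_i, \delta_i)$ by the probability of each value of $\delta_i$.
Hence if $g(\delta_i)$ is the probability density function of $\delta_i$, then:
$$ \Pr(y_i \mid \mathbf{x}_i) = \int_0^{\infty} \Pr(y_i \mid \mathbf{x}_i, \delta_i)\, g(\delta_i)\, d\delta_i \tag{8} $$
where $g(\delta_i)$ is a mixing distribution. In the case of the Poisson-Gamma mixture, $\Pr(y_i \mid \mathbf{x}_i, \delta_i)$ is the Poisson distribution and $g(\delta_i)$ is the Gamma distribution. This mixture has a closed form and as a result leads to the Negative Binomial distribution.
In order to solve (8), there is the need to specify the form of the pdf for $\delta_i$. Assuming that the variable $\delta_i$ follows a two-parameter Gamma distribution with both parameters equal to $\nu_i$, we have:
$$ g(\delta_i) = \frac{\nu_i^{\nu_i}}{\Gamma(\nu_i)}\, \delta_i^{\nu_i - 1} e^{-\delta_i\nu_i} \tag{9} $$
where $\nu_i > 0$ and $\Gamma(\cdot)$ is the gamma function.
An interesting thing about this distribution is that $E(\delta_i) = 1$, which is a convenient assumption indicating that the mean of the Negative Binomial is the same as that of the Poisson. Also, $\operatorname{Var}(\delta_i) = 1/\nu_i$.
By using (7) and (9) to solve (8), we obtain the following Negative Binomial model:
$$ \Pr(y_i \mid \mathbf{x}_i) = \frac{\Gamma(y_i + \nu_i)}{y_i!\,\Gamma(\nu_i)} \left( \frac{\nu_i}{\nu_i + \mu_i} \right)^{\nu_i} \left( \frac{\mu_i}{\nu_i + \mu_i} \right)^{y_i} \tag{10} $$
The expected value of the above relation (NB model) is the same as that of the Poisson distribution, that is, $E(y_i \mid \mathbf{x}_i) = \mu_i$. But the conditional variance differs from that of the Poisson distribution:
$$ \operatorname{Var}(y_i \mid \mathbf{x}_i) = \mu_i\left( 1 + \frac{\mu_i}{\nu_i} \right) = \mu_i + \frac{\mu_i^{2}}{\nu_i} \tag{11} $$
Because $\mu_i$ and $\nu_i$ are both positive, (11) shows that the variance automatically exceeds the conditional mean. This increases the relative frequency of low and high counts. If $\nu_i$ varies by observation, the variances remain undefined, since there will be more parameters than observations.
The typical assumption is that $\nu_i$ is the same for all the observations (counts), that is:
$$ \nu_i = \alpha^{-1}, \qquad \alpha > 0 \tag{12} $$
This formulation helps to simplify some equations if we substitute (12) into (10). Hence we obtain the NB model as:
$$ \Pr(y_i \mid \mathbf{x}_i) = \frac{\Gamma(y_i + \alpha^{-1})}{y_i!\,\Gamma(\alpha^{-1})} \left( \frac{\alpha^{-1}}{\alpha^{-1} + \mu_i} \right)^{\alpha^{-1}} \left( \frac{\mu_i}{\alpha^{-1} + \mu_i} \right)^{y_i} \tag{13} $$
and the conditional variance as:
$$ \operatorname{Var}(y_i \mid \mathbf{x}_i) = \mu_i + \alpha\mu_i^{2} \tag{14} $$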
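The Poisson-Gamma mixture argument can be verified numerically. The sketch below assumes the standard parametrisation in which the Gamma error has mean 1 and variance alpha (shape and rate both equal to 1/alpha), writes mu for the mean, integrates the mixture with a simple midpoint rule, and compares it with the closed-form Negative Binomial probability. All function names and the values mu = 3 and alpha = 0.5 are illustrative assumptions, not quantities from the fitted models.

```python
import math


def poisson_pmf(y: int, rate: float) -> float:
    """Poisson probability of count y for a given rate, computed on the log scale."""
    return math.exp(y * math.log(rate) - rate - math.lgamma(y + 1))


def gamma_pdf(delta: float, nu: float) -> float:
    """Gamma density with shape nu and rate nu, so that its mean is 1."""
    log_g = (nu * math.log(nu) - math.lgamma(nu)
             + (nu - 1.0) * math.log(delta) - nu * delta)
    return math.exp(log_g)


def nb_pmf(y: int, mu: float, alpha: float) -> float:
    """Closed-form Negative Binomial probability with mean mu and dispersion alpha."""
    nu = 1.0 / alpha
    log_p = (math.lgamma(y + nu) - math.lgamma(y + 1) - math.lgamma(nu)
             + nu * math.log(nu / (nu + mu)) + y * math.log(mu / (nu + mu)))
    return math.exp(log_p)


def mixture_pmf(y: int, mu: float, alpha: float,
                upper: float = 40.0, steps: int = 20_000) -> float:
    """Midpoint-rule integral of Poisson(y | mu * delta) times the Gamma density."""
    nu = 1.0 / alpha
    width = upper / steps
    total = 0.0
    for k in range(steps):
        delta = (k + 0.5) * width  # midpoint of the k-th subinterval
        total += poisson_pmf(y, mu * delta) * gamma_pdf(delta, nu)
    return total * width


def nb_moments(mu: float, alpha: float, y_max: int = 400) -> tuple[float, float]:
    """Mean and variance obtained by summing the NB probabilities up to y_max."""
    probs = [nb_pmf(y, mu, alpha) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum(y * y * p for y, p in enumerate(probs)) - mean ** 2
    return mean, var


mu, alpha = 3.0, 0.5  # illustrative values only
for y in range(4):
    print(y, nb_pmf(y, mu, alpha), mixture_pmf(y, mu, alpha))

mean, variance = nb_moments(mu, alpha)
print(mean, variance, mu + alpha * mu ** 2)
```

The two probability columns should agree up to integration error, and the summed variance should match mu + alpha * mu^2, which exceeds the mean whenever alpha is positive.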
