Over- and Under-dispersed Crash Data: Comparing the Conway-Maxwell-Poisson and Double-Poisson Distributions
In traffic safety analysis, many distributions have been proposed for analyzing motor vehicle crashes. Among them, the traditional Poisson and Negative Binomial (NB) distributions have been the most commonly used. Although the Poisson and NB models possess desirable statistical properties, their application to modeling motor vehicle crashes has limitations. In practice, traffic crash data are often over-dispersed; on rare occasions, they have been shown to be under-dispersed. Both over-dispersed and under-dispersed data can lead to inconsistent standard errors for parameter estimates obtained with the traditional Poisson distribution. Although the NB distribution can model over-dispersed data, it cannot handle under-dispersed data. Among the distributions proposed to handle both over-dispersed and under-dispersed datasets, the Conway-Maxwell-Poisson (COM-Poisson) and double Poisson (DP) distributions are particularly noteworthy. The DP distribution and its generalized linear model (GLM) framework have seldom been investigated or applied since their introduction 25 years ago. The objectives of this study are to: 1) examine the applicability of the DP distribution and its regression model for analyzing crash data characterized by over- and under-dispersion, and 2) compare the performance of the DP distribution and DP GLM with that of the COM-Poisson distribution and COM-Poisson GLM in terms of goodness-of-fit (GOF) and theoretical soundness. All the DP GLMs in this study were developed based on the approximate probability mass function (PMF) of the DP distribution. Based on the simulated data, it was found that the COM-Poisson distribution performed better than the DP distribution in all nine mean-dispersion scenarios, and that the DP distribution worked better in high-mean scenarios regardless of the type of dispersion.
Using two over-dispersed empirical datasets, the results demonstrated that the DP GLM fitted the over-dispersed data about as well as the NB model and the COM-Poisson GLM. Using the under-dispersed empirical crash data, it was found that the overall performance of the DP GLM was much better than that of the COM-Poisson GLM. Furthermore, it was found that the DP GLM was mathematically much easier to manipulate than the COM-Poisson GLM, and that the DP GLM always gave smaller standard errors for the estimated coefficients.
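
To make the two distributions concrete, the following is a minimal Python sketch of their PMFs. It is not taken from the study. It assumes that the "approximate PMF" of the DP distribution refers to the commonly cited form with the normalizing constant omitted, f(y) = θ^(1/2) e^(-θμ) (e^(-y) y^y / y!) (eμ/y)^(θy), and it uses the standard COM-Poisson form P(Y = y) = λ^y / ((y!)^ν Z(λ, ν)). The function names, parameter names, and the truncation length `terms` are illustrative choices only.

```python
import math


def dp_approx_pmf(y, mu, theta):
    """Approximate double-Poisson PMF with the normalizing constant omitted.

    Assumed form (not quoted from the study):
        f(y) = sqrt(theta) * exp(-theta*mu) * (exp(-y) * y**y / y!)
               * (e * mu / y) ** (theta * y)
    theta < 1 gives over-dispersion, theta > 1 under-dispersion,
    and theta == 1 reduces exactly to the Poisson PMF.
    """
    log_p = 0.5 * math.log(theta) - theta * mu
    if y > 0:
        # Work on the log scale to avoid overflow in y**y and y!.
        log_p += -y + y * math.log(y) - math.lgamma(y + 1)
        log_p += theta * y * (1.0 + math.log(mu) - math.log(y))
    # For y == 0 the remaining factors equal 1, so nothing is added.
    return math.exp(log_p)


def com_poisson_pmf(y, lam, nu, terms=200):
    """COM-Poisson PMF; the infinite normalizing sum Z is truncated.

    nu < 1 gives over-dispersion, nu > 1 under-dispersion,
    and nu == 1 reduces to the Poisson PMF.
    `terms` is an illustrative truncation point, adequate for moderate lam.
    """
    log_w = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(terms)]
    m = max(log_w)
    # log-sum-exp for a numerically stable normalizing constant.
    log_z = m + math.log(sum(math.exp(w - m) for w in log_w))
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_z)


def mean_and_variance(pmf_values):
    """Mean and variance of a (possibly unnormalized) PMF over 0..n-1."""
    total = sum(pmf_values)
    mean = sum(y * p for y, p in enumerate(pmf_values)) / total
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf_values)) / total
    return mean, var
```

For example, `mean_and_variance([dp_approx_pmf(y, 10.0, 0.5) for y in range(200)])` can be used to check that θ < 1 yields a variance larger than the mean, and the same helper applied to `com_poisson_pmf` with ν > 1 shows the under-dispersed case. Because the DP form above needs no infinite sum, it also illustrates why the DP model can be easier to manipulate than the COM-Poisson model, whose normalizing constant Z has to be approximated numerically.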