Investigating the Effects of Sample Size, Model Misspecification, and Underreporting in Crash Data on Three Commonly Used Traffic Crash Severity Models

Date

2011-08-08

Abstract

Numerous studies have applied crash severity models to explore the relationship between crash severity and its contributing factors. A large body of work exists on this topic, with different studies typically focusing on different types of models. However, only a limited amount of research has compared the performance of different crash severity models. Additionally, three major issues related to the modeling process for crash severity analysis have not been sufficiently explored: sample size, model misspecification, and underreporting in crash data. Therefore, this research examined three commonly used traffic crash severity models, the multinomial logit model (MNL), the ordered probit model (OP), and the mixed logit model (ML), in terms of the effects of sample size, model misspecification, and underreporting in crash data, using a Monte Carlo approach with simulated and observed crash data.
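
The simulation side of such a Monte Carlo study can be illustrated with a minimal sketch. It assumes a three-level severity outcome generated from a multinomial logit with a single covariate. The coefficient values, the function names `mnl_probs` and `simulate_crashes`, and the sample size are hypothetical choices for illustration, not the settings used in this research.

```python
import math
import random

def mnl_probs(x, betas):
    """Multinomial logit probabilities for one observation.

    betas is a list of (intercept, slope) pairs, one per severity level;
    the baseline level uses (0.0, 0.0).
    """
    utilities = [b0 + b1 * x for b0, b1 in betas]
    m = max(utilities)  # subtract the max for numerical stability
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

def simulate_crashes(n, betas, seed=0):
    """Draw n (covariate, severity index) pairs from the assumed MNL model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)  # hypothetical standardized covariate
        probs = mnl_probs(x, betas)
        y = rng.choices(range(len(betas)), weights=probs)[0]
        data.append((x, y))
    return data

# Hypothetical parameters: level 0 = PDO (baseline), 1 = injury, 2 = fatal.
BETAS = [(0.0, 0.0), (-1.0, 0.5), (-3.0, 1.0)]
sample = simulate_crashes(1000, BETAS)
```

Repeating the draw for several values of `n` and re-estimating the model on each replicate is one way to study how parameter bias and variability change with sample size.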

The results for sample size effects on the three models are consistent with prior expectations: small sample sizes significantly affect the development of crash severity models, regardless of which model type is used. Furthermore, among the three models, the ML model was found to require the largest sample size and the OP model the smallest, with the requirement for the MNL model falling between the two.

In addition, when the sample size is sufficient, the results of the model misspecification analysis lead to the following suggestions: to decrease the bias and variability of the estimated parameters, logit models should be selected over probit models. It is also suggested to select a more general and flexible model, such as one allowing randomness in the parameters, i.e., the ML model.

Another important finding from the analysis of underreported data is that none of the three models is immune to the underreporting issue. To minimize the bias and reduce the variability of the model, fatal crashes should be set as the baseline severity for the MNL and ML models, while for the OP model the crash severity levels should be ranked in descending order, from fatal to property-damage-only (PDO). Furthermore, when full or partial information about the unreported rates for each severity level is known, treating crash data as outcome-based samples in model estimation, via the Weighted Exogenous Sample Maximum Likelihood Estimator (WESMLE), dramatically improves the estimation for all three models compared with the results produced by the Maximum Likelihood Estimator (MLE).
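
As a rough illustration of the weighting idea behind WESMLE, the sketch below assumes the reporting rate for each severity level is known and that every level has at least one reported crash. Each observation's log-likelihood contribution is weighted by the ratio of the estimated population share to the sample share of its severity level. The function names, reporting rates, and counts are hypothetical, and the code is a simplified sketch rather than the estimation procedure used in this research.

```python
import math

def wesml_weights(observed_counts, reporting_rates):
    """Weight per severity level: population share / sample share.

    observed_counts[k] is the number of reported crashes at level k;
    reporting_rates[k] is the assumed fraction of level-k crashes reported.
    """
    n_obs = sum(observed_counts)
    est_true = [n / r for n, r in zip(observed_counts, reporting_rates)]
    n_true = sum(est_true)
    return [(t / n_true) / (n / n_obs)
            for t, n in zip(est_true, observed_counts)]

def weighted_log_likelihood(outcomes, probs, weights):
    """WESML-style objective: sum over i of w[y_i] * log P(y_i | x_i).

    outcomes[i] is the observed severity index for observation i;
    probs[i] is the list of model probabilities for observation i.
    """
    return sum(weights[y] * math.log(p[y]) for y, p in zip(outcomes, probs))

# Hypothetical example: PDO crashes are reported half the time,
# injury crashes 80% of the time, and fatal crashes always.
counts = [500, 400, 100]   # reported PDO, injury, fatal
rates = [0.5, 0.8, 1.0]
weights = wesml_weights(counts, rates)
```

With all weights equal to 1, the objective reduces to the ordinary log-likelihood maximized by the MLE. In this example the most underreported level (PDO) receives the largest weight.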
