Statistical analyses involving count data may take several forms depending on the context of use: simple counts, such as the number of plants in a particular field, and categorical data, in which counts represent the number of items falling into each of several categories. The most widely adopted model for analyzing count data is the Poisson model. Other models that can be considered are the negative binomial and hurdle models. It is important that these models be systematically considered and compared before one is chosen over the others. In real-world situations, count data sets may contain zero counts that carry meaning of their own. In this work, statistical simulation was used to compare the performance of these count data models. Count data sets with different proportions of zeros were simulated, and the Akaike Information Criterion (AIC) was used to compare how well the models fit the simulated data sets. The study concluded that the negative binomial model fits better to over-dispersed data in which the proportion of zeros is below 0.3, and that the hurdle model performs better when the proportion of zeros is 0.3 or above.
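The kind of comparison described in the abstract can be sketched as follows. This is a minimal illustration only, not the authors' procedure: the abstract does not say how the data sets were generated or how the models were specified, so the data-generating process (a fixed share of zeros plus zero-truncated Poisson counts), the intercept-only models, and all function names (`rpois`, `simulate`, `poisson_aic`, `negbin_aic`, `hurdle_aic`) are assumptions made for the example. AIC is computed as 2k - 2 log L, where k is the number of fitted parameters.

```python
import math
import random


def rpois(lam, rng):
    """Draw one Poisson(lam) variate (Knuth's method; adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate(n, p_zero, lam, rng):
    """Assumed generator: zero with probability p_zero, else zero-truncated Poisson."""
    data = []
    for _ in range(n):
        y = 0
        if rng.random() >= p_zero:
            while y == 0:  # rejection step yields a zero-truncated Poisson draw
                y = rpois(lam, rng)
        data.append(y)
    return data


def aic(loglik, k):
    return 2 * k - 2 * loglik


def poisson_aic(y):
    """AIC of an intercept-only Poisson model (1 parameter); needs mean(y) > 0."""
    lam = sum(y) / len(y)  # the sample mean is the MLE
    ll = sum(v * math.log(lam) - lam - math.lgamma(v + 1) for v in y)
    return aic(ll, 1)


def negbin_loglik(y, mu, r):
    return sum(math.lgamma(v + r) - math.lgamma(r) - math.lgamma(v + 1)
               + r * math.log(r / (r + mu)) + v * math.log(mu / (r + mu))
               for v in y)


def negbin_aic(y):
    """AIC of an intercept-only negative binomial model (2 parameters)."""
    mu = sum(y) / len(y)  # MLE of the mean for any dispersion r
    f = lambda t: negbin_loglik(y, mu, math.exp(t))
    lo, hi = math.log(1e-3), math.log(1e4)  # search range for log(r)
    g = (math.sqrt(5) - 1) / 2
    a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    fa, fb = f(a), f(b)
    for _ in range(60):  # golden-section search for the maximum
        if fa < fb:
            lo, a, fa = a, b, fb
            b = lo + g * (hi - lo)
            fb = f(b)
        else:
            hi, b, fb = b, a, fa
            a = hi - g * (hi - lo)
            fa = f(a)
    return aic(max(fa, fb), 2)


def hurdle_aic(y):
    """AIC of an intercept-only hurdle Poisson model (2 parameters).

    Assumes y contains both zeros and positives, and the positives average above 1.
    """
    n, pos = len(y), [v for v in y if v > 0]
    p0, m = (n - len(pos)) / n, sum(pos) / len(pos)
    lo, hi = 1e-9, m  # solve lam / (1 - exp(-lam)) = m by bisection
    for _ in range(100):
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < m:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    ll = (n - len(pos)) * math.log(p0) + len(pos) * math.log(1 - p0)
    ll += sum(v * math.log(lam) - lam - math.lgamma(v + 1)
              - math.log(-math.expm1(-lam)) for v in pos)
    return aic(ll, 2)


if __name__ == "__main__":
    rng = random.Random(2016)
    for p_zero in (0.1, 0.3, 0.5):  # illustrative zero proportions
        y = simulate(1000, p_zero, 3.0, rng)
        print(p_zero, round(poisson_aic(y), 1), round(negbin_aic(y), 1),
              round(hurdle_aic(y), 1))
```

The model with the lowest AIC on a given data set is the preferred one. Because the data here are drawn from a hurdle-type process, this sketch should not be expected to reproduce the 0.3 threshold reported in the abstract; swapping `simulate` for a different generator (for example a gamma-Poisson mixture) would change which model wins.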
Published in: Science Journal of Applied Mathematics and Statistics (Volume 4, Issue 6)
DOI: 10.11648/j.sjams.20160406.12
Page(s): 256-262
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: Copyright © The Author(s), 2016. Published by Science Publishing Group
Keywords: Count, Modeling, Simulation, AIC, Compare
APA Style
Alexander Kasyoki Muoka, Oscar Owino Ngesa, Anthony Gichuhi Waititu. (2016). Statistical Models for Count Data. Science Journal of Applied Mathematics and Statistics, 4(6), 256-262. https://doi.org/10.11648/j.sjams.20160406.12
TY  - JOUR
T1  - Statistical Models for Count Data
AU  - Alexander Kasyoki Muoka
AU  - Oscar Owino Ngesa
AU  - Anthony Gichuhi Waititu
Y1  - 2016/10/15
PY  - 2016
DO  - 10.11648/j.sjams.20160406.12
JF  - Science Journal of Applied Mathematics and Statistics
SP  - 256
EP  - 262
PB  - Science Publishing Group
SN  - 2376-9513
UR  - https://doi.org/10.11648/j.sjams.20160406.12
VL  - 4
IS  - 6
ER  -