Peer-Reviewed

Statistical Models for Count Data

Received: 13 September 2016     Accepted: 23 September 2016     Published: 15 October 2016
Abstract

Statistical analyses involving count data may take several forms depending on the context of use: simple counts, such as the number of plants in a particular field, and categorical data, in which counts represent the number of items falling in each of several categories. The most widely adopted model for analyzing count data is the Poisson model. Other models that can be considered for count data are the negative binomial and the hurdle models. It is important to consider and compare these models systematically before choosing one of them to handle a given count data set. In real-world situations, count data sets may contain zero counts that carry substantive meaning. In this work, a statistical simulation technique was used to compare the performance of these count data models. Count data sets with different proportions of zeros were simulated, and the Akaike Information Criterion (AIC) was used to compare how well the models fit the simulated data sets. The results indicate that the negative binomial model fits over-dispersed data better when the proportion of zeros is below 0.3, whereas the hurdle model performs better when the proportion of zeros is 0.3 or above.
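The abstract describes the comparison only in words. The script below is a minimal sketch of the general idea, not the authors' procedure, and rests on several assumptions: intercept-only models with no covariates; a hypothetical data-generating process that mixes structural zeros (probability `zero_prop`) with negative binomial counts; a Poisson count component in the hurdle model (the paper's exact hurdle specification is not given here); and maximum likelihood fitted with a simple golden-section search. All function names and parameter values are illustrative, and its output should not be read as reproducing the paper's results.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_counts(n, zero_prop, mu, size, rng):
    """Hypothetical design: a structural zero with probability zero_prop, otherwise a
    negative binomial (gamma-Poisson mixture) count with mean mu and dispersion size."""
    data = []
    for _ in range(n):
        if rng.random() < zero_prop:
            data.append(0)
        else:
            lam = rng.gammavariate(size, mu / size)  # shape * scale = mu
            data.append(sample_poisson(lam, rng))
    return data


def golden_max(f, lo, hi, iters=100):
    """Golden-section search for the maximiser of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:  # maximiser lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:  # maximiser lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


def poisson_loglik(y, lam):
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in y)


def negbin_loglik(y, mu, r):
    """NB2 parameterisation: mean mu, variance mu + mu**2 / r."""
    return sum(
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu))
        for k in y
    )


def compare_aic(y):
    """AIC = 2 * (number of parameters) - 2 * (maximised log-likelihood).
    Assumes intercept-only models and data containing both zeros and positive counts."""
    n, mean = len(y), sum(y) / len(y)
    aic = {"poisson": 2 * 1 - 2 * poisson_loglik(y, mean)}  # MLE of lambda is the mean

    # For NB2 the MLE of mu is also the sample mean; profile the size r on a log scale.
    t = golden_max(lambda s: negbin_loglik(y, mean, math.exp(s)), math.log(1e-3), math.log(1e3))
    aic["negbin"] = 2 * 2 - 2 * negbin_loglik(y, mean, math.exp(t))

    # Poisson hurdle: Bernoulli model for zero vs positive, zero-truncated Poisson above zero.
    pos = [k for k in y if k > 0]
    n0 = n - len(pos)

    def trunc_loglik(lam):
        # log P(k | k > 0) = log Poisson(k; lam) - log(1 - exp(-lam))
        return poisson_loglik(pos, lam) - len(pos) * math.log(-math.expm1(-lam))

    s = golden_max(lambda u: trunc_loglik(math.exp(u)), math.log(1e-6), math.log(max(pos)))
    ll = n0 * math.log(n0 / n) + len(pos) * math.log(len(pos) / n) + trunc_loglik(math.exp(s))
    aic["hurdle"] = 2 * 2 - 2 * ll
    return aic


if __name__ == "__main__":
    rng = random.Random(2016)
    for zero_prop in (0.1, 0.3, 0.5):
        y = simulate_counts(500, zero_prop, mu=3.0, size=2.0, rng=rng)
        aic = compare_aic(y)
        share = y.count(0) / len(y)
        print(f"structural zeros {zero_prop}, observed zero share {share:.2f}:",
              {m: round(v, 1) for m, v in aic.items()}, "-> lowest AIC:", min(aic, key=aic.get))
```

The model with the lowest AIC is preferred. Because the negative binomial component of this hypothetical generator also produces zeros, the observed share of zeros is larger than `zero_prop`, which is why the script reports both.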

Published in Science Journal of Applied Mathematics and Statistics (Volume 4, Issue 6)
DOI 10.11648/j.sjams.20160406.12
Page(s) 256-262
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2016. Published by Science Publishing Group

Keywords

Count, Modeling, Simulation, AIC, Compare

Cite This Article
    APA Style

    Alexander Kasyoki Muoka, Oscar Owino Ngesa, Anthony Gichuhi Waititu. (2016). Statistical Models for Count Data. Science Journal of Applied Mathematics and Statistics, 4(6), 256-262. https://doi.org/10.11648/j.sjams.20160406.12

    ACS Style

    Alexander Kasyoki Muoka; Oscar Owino Ngesa; Anthony Gichuhi Waititu. Statistical Models for Count Data. Sci. J. Appl. Math. Stat. 2016, 4(6), 256-262. doi: 10.11648/j.sjams.20160406.12

    AMA Style

    Alexander Kasyoki Muoka, Oscar Owino Ngesa, Anthony Gichuhi Waititu. Statistical Models for Count Data. Sci J Appl Math Stat. 2016;4(6):256-262. doi: 10.11648/j.sjams.20160406.12

    BibTeX

    @article{10.11648/j.sjams.20160406.12,
      author = {Alexander Kasyoki Muoka and Oscar Owino Ngesa and Anthony Gichuhi Waititu},
      title = {Statistical Models for Count Data},
      journal = {Science Journal of Applied Mathematics and Statistics},
      volume = {4},
      number = {6},
      pages = {256-262},
      doi = {10.11648/j.sjams.20160406.12},
      url = {https://doi.org/10.11648/j.sjams.20160406.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.sjams.20160406.12},
      year = {2016}
    }

    RIS

    TY  - JOUR
    T1  - Statistical Models for Count Data
    AU  - Alexander Kasyoki Muoka
    AU  - Oscar Owino Ngesa
    AU  - Anthony Gichuhi Waititu
    Y1  - 2016/10/15
    PY  - 2016
    N1  - https://doi.org/10.11648/j.sjams.20160406.12
    DO  - 10.11648/j.sjams.20160406.12
    T2  - Science Journal of Applied Mathematics and Statistics
    JF  - Science Journal of Applied Mathematics and Statistics
    JO  - Science Journal of Applied Mathematics and Statistics
    SP  - 256
    EP  - 262
    PB  - Science Publishing Group
    SN  - 2376-9513
    UR  - https://doi.org/10.11648/j.sjams.20160406.12
    VL  - 4
    IS  - 6
    ER  - 

Author Information
  • Department of Basic and Applied Sciences, Jomo Kenyatta University of Agriculture and Technology-Westlands campus, Nairobi, Kenya

  • Mathematics and Informatics department, Taita Taveta University College, Voi, Kenya

  • Department of Basic and Applied Sciences, Jomo Kenyatta University of Agriculture and Technology-Westlands campus, Nairobi, Kenya
