Statistical Models for Count Data

Alexander Kasyoki Muoka; Oscar Owino Ngesa; Anthony Gichuhi Waititu

doi:10.11648/j.sjams.20160406.12

Peer-Reviewed

Published in Science Journal of Applied Mathematics and Statistics (Volume 4, Issue 6)

Received: 13 September 2016; Accepted: 23 September 2016; Published: 15 October 2016

Abstract

Statistical analyses involving count data take several forms depending on the context: simple counts, such as the number of plants in a particular field, and categorical data, in which counts represent the number of items falling into each of several categories. The most widely adopted model for analyzing count data is the Poisson model. Other models that can be considered are the negative binomial and hurdle models. These models should be systematically considered and compared before one is chosen over the others. In real-world situations, count data sets may contain zero counts that carry meaning of their own. In this work, statistical simulation was used to compare the performance of these count data models. Count data sets with different proportions of zeros were simulated, and the Akaike Information Criterion (AIC) was used to compare how well the models fit the simulated data sets. The study concluded that the negative binomial model fits better for over-dispersed data in which the proportion of zeros is below 0.3, and that the hurdle model performs better for data in which the proportion of zeros is 0.3 or above.
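The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not state how the data were generated or which software was used, so the data-generating mechanism (a hurdle process with a zero-truncated Poisson for the positive counts), the intercept-only models, and every function name below are assumptions made for illustration. Each model is fitted by maximum likelihood and scored with AIC = 2k - 2 log L, where k is the number of estimated parameters and a lower value indicates a better fit.

```python
import math
import random


def aic(loglik, k):
    """Akaike Information Criterion: 2k - 2*loglik (lower is better)."""
    return 2 * k - 2 * loglik


def poisson_aic(y):
    # Intercept-only Poisson: the MLE of the rate is the sample mean (assumed > 0).
    lam = sum(y) / len(y)
    ll = sum(v * math.log(lam) - lam - math.lgamma(v + 1) for v in y)
    return aic(ll, 1)


def negbin_loglik(y, mu, r):
    # Negative binomial with mean mu and size (dispersion) parameter r.
    return sum(
        math.lgamma(v + r) - math.lgamma(r) - math.lgamma(v + 1)
        + r * math.log(r / (r + mu)) + v * math.log(mu / (r + mu))
        for v in y
    )


def negbin_aic(y):
    mu = sum(y) / len(y)  # the MLE of the mean is the sample mean
    # Golden-section search for the size parameter on the log scale
    # (the search bounds are an arbitrary choice for this sketch).
    lo, hi = math.log(1e-3), math.log(1e4)
    g = (math.sqrt(5) - 1) / 2
    f = lambda t: negbin_loglik(y, mu, math.exp(t))
    for _ in range(60):
        a, b = hi - g * (hi - lo), lo + g * (hi - lo)
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    return aic(f((lo + hi) / 2), 2)


def hurdle_aic(y):
    # Poisson hurdle: a Bernoulli part for zero vs. positive, plus a
    # zero-truncated Poisson for the positive counts.
    # Assumes the data contain both zeros and positive counts.
    pos = [v for v in y if v > 0]
    n0, n1 = len(y) - len(pos), len(pos)
    pi0 = n0 / len(y)
    m = sum(pos) / n1
    # Solve lam / (1 - exp(-lam)) = m by bisection (left side increases in lam).
    lo, hi = 1e-9, m
    for _ in range(100):
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < m:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    ll = n0 * math.log(pi0) + n1 * math.log(1 - pi0)
    ll += sum(
        v * math.log(lam) - lam - math.lgamma(v + 1) - math.log(-math.expm1(-lam))
        for v in pos
    )
    return aic(ll, 2)


def rpois(lam, rng):
    # Knuth's multiplication method; adequate for small rates.
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_hurdle(n, p_zero, lam, rng):
    # Hypothetical generator: zero with probability p_zero, otherwise a
    # zero-truncated Poisson(lam) draw obtained by rejection.
    out = []
    for _ in range(n):
        v = 0
        if rng.random() >= p_zero:
            while v == 0:
                v = rpois(lam, rng)
        out.append(v)
    return out


def compare(y):
    return {"poisson": poisson_aic(y), "negbin": negbin_aic(y), "hurdle": hurdle_aic(y)}


if __name__ == "__main__":
    rng = random.Random(1)
    for p_zero in (0.1, 0.3, 0.5):
        scores = compare(simulate_hurdle(500, p_zero, 3.0, rng))
        print(p_zero, min(scores, key=scores.get), scores)
```

Because the simulated data here come from a hurdle process, this sketch is not expected to reproduce the study's finding for over-dispersed data with few zeros; swapping in a different generator (for example a gamma-Poisson mixture) would be needed to explore that case.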

Published in	Science Journal of Applied Mathematics and Statistics (Volume 4, Issue 6)
DOI	10.11648/j.sjams.20160406.12
Page(s)	256-262
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2016. Published by Science Publishing Group

Keywords

Count, Modeling, Simulation, AIC, Compare

Cite This Article

APA Style

Alexander Kasyoki Muoka, Oscar Owino Ngesa, Anthony Gichuhi Waititu. (2016). Statistical Models for Count Data. Science Journal of Applied Mathematics and Statistics, 4(6), 256-262. https://doi.org/10.11648/j.sjams.20160406.12

ACS Style

Alexander Kasyoki Muoka; Oscar Owino Ngesa; Anthony Gichuhi Waititu. Statistical Models for Count Data. Sci. J. Appl. Math. Stat. 2016, 4(6), 256-262. doi: 10.11648/j.sjams.20160406.12

AMA Style

Alexander Kasyoki Muoka, Oscar Owino Ngesa, Anthony Gichuhi Waititu. Statistical Models for Count Data. Sci J Appl Math Stat. 2016;4(6):256-262. doi: 10.11648/j.sjams.20160406.12

@article{10.11648/j.sjams.20160406.12,
  author = {Alexander Kasyoki Muoka and Oscar Owino Ngesa and Anthony Gichuhi Waititu},
  title = {Statistical Models for Count Data},
  journal = {Science Journal of Applied Mathematics and Statistics},
  volume = {4},
  number = {6},
  pages = {256-262},
  doi = {10.11648/j.sjams.20160406.12},
  url = {https://doi.org/10.11648/j.sjams.20160406.12},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.sjams.20160406.12},
  year = {2016}
}

TY  - JOUR
T1  - Statistical Models for Count Data
AU  - Alexander Kasyoki Muoka
AU  - Oscar Owino Ngesa
AU  - Anthony Gichuhi Waititu
Y1  - 2016/10/15
PY  - 2016
N1  - https://doi.org/10.11648/j.sjams.20160406.12
DO  - 10.11648/j.sjams.20160406.12
T2  - Science Journal of Applied Mathematics and Statistics
JF  - Science Journal of Applied Mathematics and Statistics
JO  - Science Journal of Applied Mathematics and Statistics
SP  - 256
EP  - 262
PB  - Science Publishing Group
SN  - 2376-9513
UR  - https://doi.org/10.11648/j.sjams.20160406.12
VL  - 4
IS  - 6
ER  -

Author Information

Alexander Kasyoki Muoka
Department of Basic and Applied Sciences, Jomo Kenyatta University of Agriculture and Technology-Westlands campus, Nairobi, Kenya

Oscar Owino Ngesa
Mathematics and Informatics Department, Taita Taveta University College, Voi, Kenya

Anthony Gichuhi Waititu
Department of Basic and Applied Sciences, Jomo Kenyatta University of Agriculture and Technology-Westlands campus, Nairobi, Kenya
