Missing data are a common problem in surveys. Partial deletion and single imputation methods, among others, have been proposed to address it, but they have drawbacks: they discard usable data and reproduce known population parameters and standard errors inaccurately. Ratio, regression and stochastic imputation assume that a fully observed variable is available as a predictor of the missing values in the other variable(s), and that the relationship between the dependent and independent variable(s) is linear. This might not always be the case. To overcome these problems with stochastic and regression imputation, this research employed two-phase sampling and nonparametric model-based estimation. The estimator of the population total in two-phase sampling was modified. The variance estimator developed by Hidiroglou, Haziza and Rao was used to compare the proposed nonparametric model-based imputation with mean, regression and stochastic imputation in reproducing a known population total and its standard error. The data were simulated and analyzed using the R statistical software. The empirical study showed that the nonparametric model-based imputation method reproduced both the known population total and its standard error better than the competing methods.
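The contrast between the imputation methods named in the abstract can be illustrated with a minimal sketch. It is not the authors' two-phase estimator or their R code: it assumes a single sample, a Gaussian kernel with a fixed bandwidth for the nonparametric (Nadaraya-Watson) fit, and a plain expansion estimator N times the sample mean. All function names, the simulated model, the missingness rates and the values of `population_size`, `bandwidth` and `noise_sd` are hypothetical choices made for illustration.

```python
import math
import random


def mean_impute(y):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in y if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in y]


def regression_impute(x, y, noise_sd=0.0, rng=None):
    """Fill missing y from a least-squares line fitted to the complete cases.

    noise_sd > 0 adds a Gaussian residual to each prediction, which gives a
    simple form of stochastic regression imputation.
    """
    rng = rng or random.Random(0)
    pairs = [(a, b) for a, b in zip(x, y) if b is not None]
    mx = sum(a for a, _ in pairs) / len(pairs)
    my = sum(b for _, b in pairs) / len(pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    slope = sum((a - mx) * (b - my) for a, b in pairs) / sxx
    intercept = my - slope * mx
    filled = []
    for a, b in zip(x, y):
        if b is None:
            noise = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
            b = intercept + slope * a + noise
        filled.append(b)
    return filled


def kernel_impute(x, y, bandwidth):
    """Fill missing y with a Nadaraya-Watson estimate (Gaussian kernel)."""
    pairs = [(a, b) for a, b in zip(x, y) if b is not None]
    filled = []
    for a, b in zip(x, y):
        if b is None:
            weights = [math.exp(-0.5 * ((a - xi) / bandwidth) ** 2) for xi, _ in pairs]
            b = sum(w * yi for w, (_, yi) in zip(weights, pairs)) / sum(weights)
        filled.append(b)
    return filled


def expansion_total(y_completed, population_size):
    """Estimate the population total as N times the completed-sample mean."""
    return population_size * sum(y_completed) / len(y_completed)


def demo(seed=1, population_size=2000, sample_size=200):
    """Simulate a nonlinear relationship with x-dependent nonresponse (assumed)."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(sample_size)]
    y_full = [10 + 5 * math.sin(2 * math.pi * a) + rng.gauss(0, 0.5) for a in x]
    # Hypothetical response mechanism: y is missing more often when x > 0.5.
    y = [None if rng.random() < (0.6 if a > 0.5 else 0.1) else b
         for a, b in zip(x, y_full)]
    completed = {
        "full sample (no missing)": y_full,
        "mean imputation": mean_impute(y),
        "regression imputation": regression_impute(x, y),
        "stochastic imputation": regression_impute(x, y, noise_sd=0.5, rng=rng),
        "kernel imputation": kernel_impute(x, y, bandwidth=0.1),
    }
    for name, values in completed.items():
        print(f"{name:26s} {expansion_total(values, population_size):10.1f}")


if __name__ == "__main__":
    demo()
```

Running the script prints one estimated total per method next to the total computed from the fully observed sample. Because the simulated relationship is sinusoidal rather than linear, the sketch sets up the situation the abstract describes, in which a linear predictor may be misspecified while a kernel smoother makes no linearity assumption.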
Published in | Science Journal of Applied Mathematics and Statistics (Volume 9, Issue 5) |
DOI | 10.11648/j.sjams.20210905.12 |
Page(s) | 126-132 |
Creative Commons |
This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright |
Copyright © The Author(s), 2021. Published by Science Publishing Group |
Keywords | Finite Population Total, Missing Values, Two-phase Sampling |
[1] | Cali C., Rachel M. K., Richard F. and Christopher V. H. (2019). Dealing with Missing Data: A Comparative Exploration of Approaches Using the Integrated City Sustainability Database. Urban Affairs Review, Vol. 55 (2), 591–615. |
[2] | Bii N. K., Onyango C. O. and Odhiambo J. (2020). Estimation of a Finite Population Mean under Random Nonresponse Using Kernel Weights. Journal of Probability and Statistics, vol. 2020, 1-9. |
[3] | Yiran D. and Chao-Ying J. P. (2013). Principled missing data methods for researchers. Springer Plus 2 (1), 222-240. |
[4] | Adnan F. A., Jamaludin K. R., Muhamad W. Z. and Miskon S. (2021). Review of Current Publications Trend on Missing Data Imputation Over Three Decades: Direction and Future Research. https://doi.org/10.21203/rs.3.rs-996596/v1 |
[5] | Howell, D. (2012). Treatment of Missing Data-Part 1. www.uvm.edu/dhowell/StatPages/More_Stuff/.../Missing.html |
[6] | Bii N. K., Onyango C. O. and Odhiambo J. (2020). Estimating a Finite Population Mean Using Transformed Data in Presence of Random Nonresponse. International Journal of Mathematics and Mathematical Sciences 2020(4), 1-7. |
[7] | Dorfman, R. (1992). Nonparametric Regression for Estimating Totals in Finite Populations. Proceedings of the Section on Survey Research Methods, American Statistical Association, 622–625. |
[8] | Enders C. K. (2010). Applied Missing Data Analysis. New York: Guilford Press. |
[9] | Brady T. W and Roderick J. A. (2013). Non-response adjustment of survey estimates based on auxiliary variables subject to error. Journal of Royal Statistical Society, Vol. 62 (2), 213–231. |
[10] | Särndal, C. E. and Lundström, S. (2005). Estimation in Surveys with Nonresponse. New York: John Wiley & Sons. |
[11] | Yulei, H. (2010). “Missing Data Analysis using Multiple Imputation: Getting to the Heart of the Matter” American Heart Association, 3, 98-105. |
[12] | Saunder, J. A., Morrow, N. H., Spitznagel, E., Dori, P., Enola, K. P. and Pescarino, R. (2006). “Imputing Missing Data: A Comparison of Methods for Social Work Researchers” Social Work Research, 30, 19-32. |
[13] | Little, R. J., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley. |
[14] | Chao-Ying, J. P., Harwell, M., Show-Mann, L. and Lee, H. E. (2006). “Advances in Missing Data Methods and Implications for Educational Research.” In S. Sawilowsky (Ed.), Real data analysis. Greenwich, CT: Information Age Publishing Inc. |
[15] | Amanda, N. B. and Enders, C. K. (2010). “An introduction to modern missing data analyses.” Journal of School Psychology, 48, 5–37. |
[16] | Lehtonen, R. and Pahkinen, E. (2004). Practical Methods for Design and Analysis of Complex Surveys (2nd Edition). New York: John Wiley & Sons Ltd. |
[17] | Overton, W. S. (1985). A Sampling Plan for Streams in the National Stream Survey. Technical Report 114, Department of Statistics, Oregon State University, Corvallis, Oregon, 97331. |
[18] | Särndal, C. E., Swensson, B., Wretman, J. (1992). Model Assisted Survey Sampling. New York: Springer. |
[19] | Nadaraya, E. A. (1964). “On Estimating Regression” Theory of Probability and Its Applications, 9, 141-142. |
[20] | Watson, G. S. (1964). “Smooth Regression Analysis” Sankhya, Series A, 26, 359-372. |
[21] | Hidiroglou, M. A., Haziza, D. and Rao, J. N. K. (2009). “Comparison of Variance Estimator in Two-phase Sampling: An Empirical Investigation” Pak. J. of Statistics, 27, 477-492. |
[22] | Cochran, W. G. (1977). Sampling Techniques (3rd Edition). New York, John Wiley and Sons. |
[23] | Wackerly, D. D., Mendenhall, W. and Scheaffer, R. L. (2008). Mathematical Statistics with Applications (7th Edition). Duxbury: Thomson Brooks/Cole. |
[24] | Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. New York: Chapman & Hall. |
APA Style
Kemei Anderson Kimutai, Christopher Ouma Onyango, Mike Wafula. (2021). Estimation of Finite Population Total in Presence of Missing Values in Two-Phase Sampling. Science Journal of Applied Mathematics and Statistics, 9(5), 126-132. https://doi.org/10.11648/j.sjams.20210905.12
ACS Style
Kemei Anderson Kimutai; Christopher Ouma Onyango; Mike Wafula. Estimation of Finite Population Total in Presence of Missing Values in Two-Phase Sampling. Sci. J. Appl. Math. Stat. 2021, 9(5), 126-132. doi: 10.11648/j.sjams.20210905.12
AMA Style
Kemei Anderson Kimutai, Christopher Ouma Onyango, Mike Wafula. Estimation of Finite Population Total in Presence of Missing Values in Two-Phase Sampling. Sci J Appl Math Stat. 2021;9(5):126-132. doi: 10.11648/j.sjams.20210905.12
@article{10.11648/j.sjams.20210905.12,
  author = {Kemei Anderson Kimutai and Christopher Ouma Onyango and Mike Wafula},
  title = {Estimation of Finite Population Total in Presence of Missing Values in Two-Phase Sampling},
  journal = {Science Journal of Applied Mathematics and Statistics},
  volume = {9},
  number = {5},
  pages = {126-132},
  doi = {10.11648/j.sjams.20210905.12},
  url = {https://doi.org/10.11648/j.sjams.20210905.12},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.sjams.20210905.12},
  year = {2021}
}
TY  - JOUR
T1  - Estimation of Finite Population Total in Presence of Missing Values in Two-Phase Sampling
AU  - Kemei Anderson Kimutai
AU  - Christopher Ouma Onyango
AU  - Mike Wafula
Y1  - 2021/11/17
PY  - 2021
N1  - https://doi.org/10.11648/j.sjams.20210905.12
DO  - 10.11648/j.sjams.20210905.12
T2  - Science Journal of Applied Mathematics and Statistics
JF  - Science Journal of Applied Mathematics and Statistics
JO  - Science Journal of Applied Mathematics and Statistics
SP  - 126
EP  - 132
PB  - Science Publishing Group
SN  - 2376-9513
UR  - https://doi.org/10.11648/j.sjams.20210905.12
VL  - 9
IS  - 5
ER  -