Peer-Reviewed

Estimation of Finite Population Total in Presence of Missing Values in Two-Phase Sampling

Received: 16 September 2021     Accepted: 9 November 2021     Published: 17 November 2021
Abstract

Missing data are a common problem in surveys. Partial deletion and single imputation, among other methods, have been proposed to deal with it, but they have drawbacks: usable data are discarded, and known population parameters and standard errors are reproduced inaccurately. Ratio, regression and stochastic imputation assume that a fully observed variable is available to predict the missing values of the other variable(s), and that the relationship between the dependent and independent variable(s) is linear. This might not always be the case. To overcome these problems associated with stochastic and regression estimation, this research employed two-phase sampling and nonparametric model-based estimation. The estimator of the population total in two-phase sampling was modified. The variance estimator developed by Hidiroglou, Haziza and Rao was used to compare the proposed nonparametric model-based imputation with mean, regression and stochastic imputation in reproducing the known population total and standard errors. The data were simulated and analyzed using R statistical software. The empirical study revealed that the nonparametric model-based imputation method is better at reproducing both the known population total and the standard error.
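
The contrast between mean, regression and nonparametric imputation can be illustrated with a minimal sketch. It is not the authors' estimator: it uses a single-phase simple random sample with an expansion estimator rather than the modified two-phase estimator, it is written in Python rather than R, and the function names, the Gaussian kernel, the bandwidth `h`, the simulated sine-shaped population and the missingness mechanism are all assumptions made for illustration only.

```python
import math
import random


def mean_impute(x, y):
    """Replace each missing y (None) with the mean of the observed y values."""
    observed = [v for v in y if v is not None]
    m = sum(observed) / len(observed)
    return [m if v is None else v for v in y]


def regression_impute(x, y):
    """Replace each missing y with the least-squares line fitted to observed pairs."""
    pairs = [(a, b) for a, b in zip(x, y) if b is not None]
    n = len(pairs)
    xbar = sum(a for a, _ in pairs) / n
    ybar = sum(b for _, b in pairs) / n
    sxx = sum((a - xbar) ** 2 for a, _ in pairs)
    sxy = sum((a - xbar) * (b - ybar) for a, b in pairs)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return [intercept + slope * a if b is None else b for a, b in zip(x, y)]


def kernel_impute(x, y, h):
    """Replace each missing y with a Nadaraya-Watson estimate.

    Assumption: Gaussian kernel with a fixed, hand-picked bandwidth h.
    """
    pairs = [(a, b) for a, b in zip(x, y) if b is not None]

    def nw(x0):
        weights = [math.exp(-0.5 * ((x0 - a) / h) ** 2) for a, _ in pairs]
        return sum(w * b for w, (_, b) in zip(weights, pairs)) / sum(weights)

    return [nw(a) if b is None else b for a, b in zip(x, y)]


def expansion_total(y_completed, N):
    """Expansion estimator of the population total from a completed sample."""
    return N * sum(y_completed) / len(y_completed)


def demo(seed=1):
    """Simulate a nonlinear population and compare the three imputations."""
    rng = random.Random(seed)
    N, n, h = 2000, 300, 0.1  # hypothetical sizes and bandwidth
    pop_x = [rng.random() for _ in range(N)]
    pop_y = [10 + 5 * math.sin(2 * math.pi * a) + rng.gauss(0, 1) for a in pop_x]
    idx = rng.sample(range(N), n)
    x = [pop_x[i] for i in idx]
    # Assumed mechanism: probability of a missing y grows with x.
    y = [None if rng.random() < 0.6 * pop_x[i] else pop_y[i] for i in idx]
    return {
        "true": sum(pop_y),
        "mean": expansion_total(mean_impute(x, y), N),
        "regression": expansion_total(regression_impute(x, y), N),
        "kernel": expansion_total(kernel_impute(x, y, h), N),
    }


if __name__ == "__main__":
    for name, total in demo().items():
        print(f"{name:>10}: {total:.1f}")
```

Running `demo()` with different seeds gives a rough feel for how each imputation tracks the true total when the relationship between the variables is not linear; a proper comparison would repeat the simulation many times and would also compare standard errors, as the study does.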

Published in Science Journal of Applied Mathematics and Statistics (Volume 9, Issue 5)
DOI 10.11648/j.sjams.20210905.12
Page(s) 126-132
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2021. Published by Science Publishing Group

Keywords

Finite Population Total, Missing Values, Two-phase Sampling

References
[1] Cali C., Rachel M. K., Richard F. and Christopher V. H. (2019). Dealing with Missing Data: A Comparative Exploration of Approaches Using the Integrated City Sustainability Database. Urban Affairs Review, Vol. 55 (2), 591–615.
[2] Bii N. K., Onyango C. O. and Odhiambo J. (2020). Estimation of a Finite Population Mean under Random Nonresponse Using Kernel Weights. Journal of Probability and Statistics, vol. 2020, 1-9.
[3] Yiran D. and Chao-Ying J. P. (2013). Principled missing data methods for researchers. Springer Plus 2 (1), 222-240.
[4] Adnan F. A., Jamaludin K. R., Muhamad W. Z. and Miskon S. (2021). Review of Current Publications Trend on Missing Data Imputation Over Three Decades: Direction and Future Research. https://doi.org/10.21203/rs.3.rs-996596/v1
[5] Howell, D. (2012). Treatment of Missing Data-Part 1. www.uvm.edu/dhowell/StatPages/More_Stuff/.../Missing.html
[6] Bii N. K., Onyango C. O. and Odhiambo J. (2020). Estimating a Finite Population Mean Using Transformed Data in Presence of Random Nonresponse. International Journal of Mathematics and Mathematical Sciences 2020(4), 1-7.
[7] Dorfman, R. (1992). Nonparametric Regression for Estimating Totals in Finite Populations. Proceedings of the Section on Survey Research Methods, American Statistical Association, 622–625.
[8] Enders C. K. (2010). Applied Missing Data Analysis. New York: Guilford Press.
[9] Brady T. W. and Roderick J. A. (2013). Non-response adjustment of survey estimates based on auxiliary variables subject to error. Journal of Royal Statistical Society, Vol. 62 (2), 213–231.
[10] Särndal, C. E. and Lundstrom, S. (2005). Estimation in Surveys with Nonresponse. New York: John Wiley & Sons.
[11] Yulei, H. (2010). “Missing Data Analysis using Multiple Imputation: Getting to the Heart of the Matter” American Heart Association, 3, 98-105.
[12] Saunder, J. A., Morrow, N. H., Spitznagel, E., Dori, P., Enola, K. P. and Pescarino, R. (2006). “Imputing Missing Data: A Comparison of Methods for Social Work Researchers” Social Work Research, 30, 19-32.
[13] Little, R. J., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley.
[14] Chao-Ying, J. P., Harwell, M., Show-Mann, L. and Lee, H. E. (2006). “Advances in Missing Data Methods and Implications for Educational Research.” In S. Sawilowsky (Ed.), Real data analysis. Greenwich, CT: Information Age Publishing Inc.
[15] Amanda, N. B. and Enders, C. K. (2010). “An introduction to modern missing data analyses.” Journal of School Psychology, 48, 5–37.
[16] Lehtonen, R. and Pahkinen, E. (2004). Practical Methods for Design and Analysis of Complex Surveys (2nd Edition). New York: John Wiley & Sons Ltd.
[17] Overton, W. S. (1985). A Sampling Plan for Streams in the National Stream Survey. Technical Report 114, Department of Statistics, Oregon State University, Corvallis, Oregon, 97331.
[18] Särndal, C. E., Swensson, B., Wretman, J. (1992). Model Assisted Survey Sampling. New York: Springer.
[19] Nadaraya, E. A. (1964). “On Estimating Regression” Theory of Probability and Its Applications, 9, 141-142.
[20] Watson, G. S. (1964). “Smooth Regression Analysis” Sankhya, Series A, 26, 359-372.
[21] Hidiroglou, M. A., Haziza, D. and Rao, J. N. K. (2009). “Comparison of Variance Estimator in Two-phase Sampling: An Empirical Investigation” Pak. J. of Statistics, 27, 477-492.
[22] Cochran, W. G. (1977). Sampling Techniques (3rd Edition). New York, John Wiley and Sons.
[23] Dennis, D. W., Mendenhall, R. and Schaeffer, R. L. (2008). Mathematical Statistics with Application (7th Edition). Duxbury: Thomson Books/Cole.
[24] Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. New York: Chapman & Hall.
Cite This Article
  • APA Style

    Kemei Anderson Kimutai, Christopher Ouma Onyango, Mike Wafula. (2021). Estimation of Finite Population Total in Presence of Missing Values in Two-Phase Sampling. Science Journal of Applied Mathematics and Statistics, 9(5), 126-132. https://doi.org/10.11648/j.sjams.20210905.12


    ACS Style

    Kemei Anderson Kimutai; Christopher Ouma Onyango; Mike Wafula. Estimation of Finite Population Total in Presence of Missing Values in Two-Phase Sampling. Sci. J. Appl. Math. Stat. 2021, 9(5), 126-132. doi: 10.11648/j.sjams.20210905.12


    AMA Style

    Kemei Anderson Kimutai, Christopher Ouma Onyango, Mike Wafula. Estimation of Finite Population Total in Presence of Missing Values in Two-Phase Sampling. Sci J Appl Math Stat. 2021;9(5):126-132. doi: 10.11648/j.sjams.20210905.12


    @article{10.11648/j.sjams.20210905.12,
      author = {Kemei Anderson Kimutai and Christopher Ouma Onyango and Mike Wafula},
      title = {Estimation of Finite Population Total in Presence of Missing Values in Two-Phase Sampling},
      journal = {Science Journal of Applied Mathematics and Statistics},
      volume = {9},
      number = {5},
      pages = {126-132},
      doi = {10.11648/j.sjams.20210905.12},
      url = {https://doi.org/10.11648/j.sjams.20210905.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.sjams.20210905.12},
     year = {2021}
    }
    


    TY  - JOUR
    T1  - Estimation of Finite Population Total in Presence of Missing Values in Two-Phase Sampling
    AU  - Kemei Anderson Kimutai
    AU  - Christopher Ouma Onyango
    AU  - Mike Wafula
    Y1  - 2021/11/17
    PY  - 2021
    N1  - https://doi.org/10.11648/j.sjams.20210905.12
    DO  - 10.11648/j.sjams.20210905.12
    T2  - Science Journal of Applied Mathematics and Statistics
    JF  - Science Journal of Applied Mathematics and Statistics
    JO  - Science Journal of Applied Mathematics and Statistics
    SP  - 126
    EP  - 132
    PB  - Science Publishing Group
    SN  - 2376-9513
    UR  - https://doi.org/10.11648/j.sjams.20210905.12
    VL  - 9
    IS  - 5
    ER  - 


Author Information
  • Department of Mathematics, Kiriri Women’s University of Science and Technology, Nairobi, Kenya

  • Department of Mathematics, Statistics & Actuarial Science, Kenyatta University, Nairobi, Kenya

  • Department of Mathematics, Statistics & Actuarial Science, Kenyatta University, Nairobi, Kenya
