Analyzing meteorological data from the northern region of Bangladesh is crucial for understanding the many processes influenced by humidity. This study applies six machine learning algorithms, namely k-nearest neighbor, Classification and Regression Trees (CART), C5.0, Naive Bayes, Random Forest, and Support Vector Machine, to forecast humidity in northern Bangladesh. Data from 1981 to 2020 from two meteorological stations, Rangpur and Dinajpur, were used. Of the two stations, Rangpur had the higher average daily humidity (80.34%) and Dinajpur the lower (77.26%). Cloud amount correlates positively with humidity and inversely with temperature. The k-nearest neighbor, Random Forest, and Support Vector Machine algorithms generally predicted better than the other algorithms. Overall, the Random Forest model performed best on the testing dataset at both stations, achieving 70% accuracy, an F1-score of 75.85%, and a kappa value of approximately 53.3% at Rangpur Station, and 74% accuracy, an F1-score of 78.4%, and a kappa value of approximately 60% at Dinajpur Station. The study then examines the performance and accuracy of the Random Forest classifier for predicting humidity using k-fold cross-validation. These findings underscore the value of Random Forest for predicting humidity and for supporting decision-makers in water demand management, ecological balance, and health quality in the northern region of Bangladesh.
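The three evaluation measures quoted in the abstract (accuracy, F1-score, and kappa) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the two class labels "high" and "low", and the ten toy observations are assumptions made up for illustration, and the study's actual humidity classes are not stated here.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive):
    """Harmonic mean of precision and recall, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(y_true, y_pred):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Expected agreement if predictions were independent of the truth
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (p_o - p_e) / (1 - p_e)


# Toy labels, invented for illustration only (not data from the study)
y_true = ["high"] * 5 + ["low"] * 5
y_pred = ["high"] * 4 + ["low"] * 4 + ["high"] * 2

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"F1       = {f1_score(y_true, y_pred, 'high'):.3f}")
print(f"kappa    = {cohen_kappa(y_true, y_pred):.3f}")
```

Kappa corrects accuracy for the agreement expected by chance, which is why a model can reach 70% accuracy while its kappa is noticeably lower.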
Published in | American Journal of Data Mining and Knowledge Discovery (Volume 10, Issue 1) |
DOI | 10.11648/j.ajdmkd.20251001.11 |
Page(s) | 1-19 |
Creative Commons |
This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright |
Copyright © The Author(s), 2025. Published by Science Publishing Group |
Keywords
Machine Learning, Cross Validation, Classification, Climate, Humidity, Bangladesh
APA Style
Akter, M. R., Rahman, M. H. (2025). Analysis of Climatic Factors and Utilization of Machine Learning Techniques to Anticipate Humidity Levels in Northern Bangladesh. American Journal of Data Mining and Knowledge Discovery, 10(1), 1-19. https://doi.org/10.11648/j.ajdmkd.20251001.11
ACS Style
Akter, M. R.; Rahman, M. H. Analysis of Climatic Factors and Utilization of Machine Learning Techniques to Anticipate Humidity Levels in Northern Bangladesh. Am. J. Data Min. Knowl. Discov. 2025, 10(1), 1-19. doi: 10.11648/j.ajdmkd.20251001.11
@article{10.11648/j.ajdmkd.20251001.11,
  author = {Most. Rubina Akter and Md. Habibur Rahman},
  title = {Analysis of Climatic Factors and Utilization of Machine Learning Techniques to Anticipate Humidity Levels in Northern Bangladesh},
  journal = {American Journal of Data Mining and Knowledge Discovery},
  volume = {10},
  number = {1},
  pages = {1-19},
  doi = {10.11648/j.ajdmkd.20251001.11},
  url = {https://doi.org/10.11648/j.ajdmkd.20251001.11},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajdmkd.20251001.11},
  year = {2025}
}
TY  - JOUR
T1  - Analysis of Climatic Factors and Utilization of Machine Learning Techniques to Anticipate Humidity Levels in Northern Bangladesh
AU  - Most. Rubina Akter
AU  - Md. Habibur Rahman
Y1  - 2025/03/05
PY  - 2025
N1  - https://doi.org/10.11648/j.ajdmkd.20251001.11
DO  - 10.11648/j.ajdmkd.20251001.11
T2  - American Journal of Data Mining and Knowledge Discovery
JF  - American Journal of Data Mining and Knowledge Discovery
JO  - American Journal of Data Mining and Knowledge Discovery
SP  - 1
EP  - 19
PB  - Science Publishing Group
SN  - 2578-7837
UR  - https://doi.org/10.11648/j.ajdmkd.20251001.11
VL  - 10
IS  - 1
ER  -