When multicollinearity exists in a data set, the data are considered deficient. Multicollinearity, a phenomenon whereby two or more explanatory variables in a multiple regression model are highly correlated, is frequently encountered in observational studies and creates difficulties when building regression models. Variable selection is an important aspect of model building, and the choice of the best subset of variables to include in a model is the most difficult part of regression analysis. Data were obtained from the Nigerian Stock Exchange Fact Book, the Nigerian Stock Exchange Annual Report and Account, the CBN Statistical Bulletin and the FOS Statistical Bulletin for 1987 to 2018. The Variance Inflation Factor (VIF) and correlation matrices were used to detect the presence of multicollinearity. Ridge regression and least squares regression were then applied using the R, Minitab and SPSS software packages. Ridge models with the ridge constant in the range 0.01 ≤ K ≤ 1.5 and least squares regression models were fitted for each number of predictors P = 2, 3, …, 7. The optimal ridge and least squares models were selected by taking the average rank of the coefficient of determination (R²) and the mean square error (MSE). The results showed that variable selection is affected by the presence of multicollinearity: different variables were selected under ridge regression and least squares regression for the same value of P.
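As a minimal sketch of the workflow the abstract describes, the R code below detects multicollinearity with a correlation matrix and VIFs, fits ridge models over the stated constant range 0.01 ≤ K ≤ 1.5, and ranks candidate models by the average rank of R² and MSE. The data frame `stock_data`, its response `y` and the illustrative R²/MSE values are hypothetical stand-ins; the paper's actual variables, data and software runs are not reproduced here.

```r
# Minimal sketch of the abstract's workflow; 'stock_data' (response y,
# predictors x1..x7) is a hypothetical stand-in for the paper's data.
library(car)   # vif()
library(MASS)  # lm.ridge()

# 1. Detect multicollinearity: pairwise correlations and VIFs.
ols_fit <- lm(y ~ ., data = stock_data)
print(cor(subset(stock_data, select = -y)))  # correlation matrix of predictors
print(vif(ols_fit))                          # VIF > 10 is a common warning sign

# 2. Fit ridge models over the constant range 0.01 <= K <= 1.5.
k_grid    <- seq(0.01, 1.5, by = 0.01)
ridge_fit <- lm.ridge(y ~ ., data = stock_data, lambda = k_grid)
k_best    <- k_grid[which.min(ridge_fit$GCV)]  # one way to pick K (by GCV)

# 3. Compare candidate models (e.g. one per subset size P) by the
#    average rank of R^2 (higher is better) and MSE (lower is better).
avg_rank <- function(r2, mse) (rank(-r2) + rank(mse)) / 2
# Hypothetical R^2 and MSE values for six candidate models, P = 2..7:
r2     <- c(0.61, 0.68, 0.72, 0.74, 0.75, 0.75)
mse    <- c(4.1, 3.6, 3.2, 3.1, 3.0, 3.1)
best_P <- (2:7)[which.min(avg_rank(r2, mse))]
```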
Published in | Science Journal of Applied Mathematics and Statistics (Volume 9, Issue 6)
DOI | 10.11648/j.sjams.20210906.12
Page(s) | 141-153
Creative Commons | This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright | Copyright © The Author(s), 2021. Published by Science Publishing Group
Keywords | Regression, Multicollinearity, Ridge Regression, Partial Least Square, Extra Sum of Squares
APA Style
Harrison Oghenekevwe Etaga, Roseline Chibotu Ndubisi, Ngonadi Lilian Oluebube. (2021). Effect of Multicollinearity on Variable Selection in Multiple Regression. Science Journal of Applied Mathematics and Statistics, 9(6), 141-153. https://doi.org/10.11648/j.sjams.20210906.12
ACS Style
Harrison Oghenekevwe Etaga; Roseline Chibotu Ndubisi; Ngonadi Lilian Oluebube. Effect of Multicollinearity on Variable Selection in Multiple Regression. Sci. J. Appl. Math. Stat. 2021, 9(6), 141-153. doi: 10.11648/j.sjams.20210906.12
AMA Style
Harrison Oghenekevwe Etaga, Roseline Chibotu Ndubisi, Ngonadi Lilian Oluebube. Effect of Multicollinearity on Variable Selection in Multiple Regression. Sci J Appl Math Stat. 2021;9(6):141-153. doi: 10.11648/j.sjams.20210906.12
@article{10.11648/j.sjams.20210906.12,
  author   = {Harrison Oghenekevwe Etaga and Roseline Chibotu Ndubisi and Ngonadi Lilian Oluebube},
  title    = {Effect of Multicollinearity on Variable Selection in Multiple Regression},
  journal  = {Science Journal of Applied Mathematics and Statistics},
  volume   = {9},
  number   = {6},
  pages    = {141-153},
  doi      = {10.11648/j.sjams.20210906.12},
  url      = {https://doi.org/10.11648/j.sjams.20210906.12},
  eprint   = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.sjams.20210906.12},
  abstract = {When multicollinearity exists in a data set, the data are considered deficient. Multicollinearity, a phenomenon whereby two or more explanatory variables in a multiple regression model are highly correlated, is frequently encountered in observational studies and creates difficulties when building regression models. Variable selection is an important aspect of model building, and the choice of the best subset of variables to include in a model is the most difficult part of regression analysis. Data were obtained from the Nigerian Stock Exchange Fact Book, the Nigerian Stock Exchange Annual Report and Account, the CBN Statistical Bulletin and the FOS Statistical Bulletin for 1987 to 2018. The Variance Inflation Factor (VIF) and correlation matrices were used to detect the presence of multicollinearity. Ridge regression and least squares regression were then applied using the R, Minitab and SPSS software packages. Ridge models with the ridge constant in the range 0.01 ≤ K ≤ 1.5 and least squares regression models were fitted for each number of predictors P = 2, 3, …, 7. The optimal ridge and least squares models were selected by taking the average rank of the coefficient of determination (R²) and the mean square error (MSE). The results showed that variable selection is affected by the presence of multicollinearity: different variables were selected under ridge regression and least squares regression for the same value of P.},
  year     = {2021}
}
TY  - JOUR
T1  - Effect of Multicollinearity on Variable Selection in Multiple Regression
AU  - Harrison Oghenekevwe Etaga
AU  - Roseline Chibotu Ndubisi
AU  - Ngonadi Lilian Oluebube
Y1  - 2021/12/09
PY  - 2021
N1  - https://doi.org/10.11648/j.sjams.20210906.12
DO  - 10.11648/j.sjams.20210906.12
T2  - Science Journal of Applied Mathematics and Statistics
JF  - Science Journal of Applied Mathematics and Statistics
JO  - Science Journal of Applied Mathematics and Statistics
SP  - 141
EP  - 153
PB  - Science Publishing Group
SN  - 2376-9513
UR  - https://doi.org/10.11648/j.sjams.20210906.12
AB  - When multicollinearity exists in a data set, the data are considered deficient. Multicollinearity, a phenomenon whereby two or more explanatory variables in a multiple regression model are highly correlated, is frequently encountered in observational studies and creates difficulties when building regression models. Variable selection is an important aspect of model building, and the choice of the best subset of variables to include in a model is the most difficult part of regression analysis. Data were obtained from the Nigerian Stock Exchange Fact Book, the Nigerian Stock Exchange Annual Report and Account, the CBN Statistical Bulletin and the FOS Statistical Bulletin for 1987 to 2018. The Variance Inflation Factor (VIF) and correlation matrices were used to detect the presence of multicollinearity. Ridge regression and least squares regression were then applied using the R, Minitab and SPSS software packages. Ridge models with the ridge constant in the range 0.01 ≤ K ≤ 1.5 and least squares regression models were fitted for each number of predictors P = 2, 3, …, 7. The optimal ridge and least squares models were selected by taking the average rank of the coefficient of determination (R²) and the mean square error (MSE). The results showed that variable selection is affected by the presence of multicollinearity: different variables were selected under ridge regression and least squares regression for the same value of P.
VL  - 9
IS  - 6
ER  -