Shrinkage Parameter Selection via Modified Cross-Validation Approach for Ridge Regression Model
To cite this article: Zakariya Yahya Algamal (2020) Shrinkage parameter selection via modified
cross-validation approach for ridge regression model, Communications in Statistics - Simulation
and Computation, 49:7, 1922-1930, DOI: 10.1080/03610918.2018.1508704
1. Introduction
Linear regression models are widely applied for studying many real data problems. In dealing with linear regression models, it is assumed that there is no high correlation among the explanatory variables. In practice, however, this assumption often does not hold, which leads to the problem of multicollinearity. In the presence of multicollinearity, when the regression coefficients of a linear regression model are estimated using the ordinary least squares (OLS) method, the estimated coefficients usually become unstable, with a high variance and therefore low statistical significance and possibly incorrect signs (Alheety and Kibria 2014; Batah, Özkale and Gore 2009; Jou, Huang and Cho 2014).
Numerous remedial methods have been proposed to overcome the problem of multicollinearity. The ridge regression method (Hoerl and Kennard 1970) has consistently been demonstrated to be an attractive alternative to the OLS estimation method. Ridge regression is a shrinkage method that shrinks the average length of the coefficient vector toward zero to reduce the large variance (Algamal 2018a, 2018b; Asar and Genç 2015). The performance of ridge regression relies greatly on the choice of the shrinkage parameter. Consequently, choosing a suitable value of the shrinkage parameter is an important part of ridge regression model fitting (Söküt Açar and Özkale 2015). Several methods, based on the original ridge regression of Hoerl and Kennard (1970), are available in the literature for estimating the ridge shrinkage parameter (Alkhamisi, Khalaf and Shukur 2006; Asar, Karaibrahimoglu, and Genç 2014; Hamed, Hefnawy and Farag 2013; Hefnawy and Farag 2014; Khalaf and Shukur 2005; Kibria 2003; Muniz and Kibria 2009). The cross-validation (CV) method, on the other hand, is a data-driven and practically useful approach for handling the shrinkage selection problem in ridge regression (Özkale 2015). This is due to an attractive property of CV: it does not assume any underlying distribution for the data. Furthermore, CV can be considered a natural choice when the target of model fitting is prediction (Sabourin, Valdar and Nobel 2015). Several researchers have employed CV in penalized, shrinkage, and variable selection methods (Stone 1974; Tibshirani 1996; Vach, Sauerbrei and Schumacher 2001; van Houwelingen and Sauerbrei 2013). In addition, Jung (2009) proposed a robust CV, instead of the ordinary CV, for estimating the ridge parameter when there are outliers.
The idea behind CV is to randomly split the data into k mutually exclusive folds of approximately equal size. Among the k folds, one fold is retained as a validation data set for testing the model fit, and the remaining k − 1 folds are used as a training data set to fit the model with a specific value of the shrinkage parameter. The prediction performance over these splits is then averaged to represent the predictability of the fitted model. After that, the best value of the shrinkage parameter is the one corresponding to the smallest prediction error. Clearly, the CV method depends heavily on the fold assignment process, which leads to large variability in selecting the shrinkage parameter value and, consequently, negatively affects the prediction performance of the ridge model.
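To make the fold-assignment dependence concrete, the following is a minimal sketch of classical CV-based shrinkage selection, written in Python with NumPy; the function names and the candidate grid are illustrative choices of ours, not the paper's notation.

```python
import numpy as np

def ridge_fit(X, y, k):
    """Ridge estimator (X'X + kI)^{-1} X'y; see Eq. (3) in Sec. 2."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

def cv_select_k(X, y, k_grid, n_folds=5, seed=None):
    """Classical k-fold CV: pick the shrinkage value whose averaged
    out-of-fold squared prediction error is smallest."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(X.shape[0]) % n_folds     # random fold assignment
    cv_err = []
    for k in k_grid:
        fold_err = []
        for f in range(n_folds):
            train, valid = folds != f, folds == f
            b = ridge_fit(X[train], y[train], k)
            fold_err.append(np.mean((y[valid] - X[valid] @ b) ** 2))
        cv_err.append(np.mean(fold_err))
    return k_grid[int(np.argmin(cv_err))]             # smallest prediction error
```

Because `folds` depends on the random permutation, rerunning `cv_select_k` with a different seed can return a noticeably different shrinkage value, which is exactly the variability the proposed modification targets.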
In this paper, a modification of CV is proposed to address the variability of shrinkage parameter selection. This modification is based on repeated fold assignment: a suitable quantile of the best shrinkage parameter values obtained over the repeated fold assignments is then used. With this proposed modification, the shrinkage parameter selection is shown to yield better performance in terms of model prediction.
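A minimal sketch of this modification, reusing `cv_select_k` from the block above; taking the median (q = 0.5) of the repeated selections is purely an illustrative assumption here, since the appropriate quantile is developed in Sec. 3.

```python
import numpy as np

def mcv_select_k(X, y, k_grid, n_folds=5, n_repeats=50, q=0.5):
    """Modified CV sketch: repeat the random fold assignment n_repeats
    times, keep the best k from each repetition, and return a quantile
    of those values (the median q=0.5 is an illustrative choice)."""
    best_ks = [cv_select_k(X, y, k_grid, n_folds, seed=r)
               for r in range(n_repeats)]
    return float(np.quantile(best_ks, q))
```

For example, `mcv_select_k(X, y, np.linspace(0.01, 2, 50))` returns a single stabilized value rather than one tied to a particular random split.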
The remainder of this paper is organized as follows. Section 2 contains the preliminaries of the related subject. Section 3 presents the proposed method and its related algorithm. Sections 4 and 5 cover the simulation and real data results, and Sec. 6 concludes.
2. Preliminaries
Suppose that we have a data set $\{(y_i, \mathbf{x}_i)\}_{i=1}^{n}$, where $y_i \in \mathbb{R}$ is a response variable and $\mathbf{x}_i = (x_{i1}, x_{i2}, \ldots, x_{ip}) \in \mathbb{R}^p$ represents a $p$-dimensional explanatory variable vector. Without loss of generality, it is assumed that the response variable is centered and the explanatory variables are standardized.
Consider the following linear regression model,
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad (1)$$
where $\mathbf{y}$ is an $n \times 1$ vector of observations of the response variable, $\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_p)$ is an $n \times p$ known design matrix of explanatory variables, $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_p)$ is a $p \times 1$ vector of unknown regression coefficients, and $\boldsymbol{\varepsilon}$ is an $n \times 1$ vector of random errors with mean $0$ and variance $\sigma^2$. Using the OLS method, the parameter estimate for Eq. (1) is
given by
$$\hat{\boldsymbol{\beta}}_{\mathrm{OLS}} = \left(\mathbf{X}^{T}\mathbf{X}\right)^{-1}\mathbf{X}^{T}\mathbf{y}. \qquad (2)$$
The OLS estimator is unbiased and has minimum variance among all linear unbiased estimators. However, in the presence of multicollinearity, the $\mathbf{X}^{T}\mathbf{X}$ matrix is nearly singular, which makes the OLS estimator unstable due to its large variance. To reduce the effects of multicollinearity, ridge regression (RR) (Hoerl and Kennard 1970), the most commonly used remedial method, adds a positive shrinkage parameter, $k$, to the main diagonal of the $\mathbf{X}^{T}\mathbf{X}$ matrix. The RR estimator is defined as
$$\hat{\boldsymbol{\beta}}_{\mathrm{RR}} = \left(\mathbf{X}^{T}\mathbf{X} + k\mathbf{I}\right)^{-1}\mathbf{X}^{T}\mathbf{y}, \qquad (3)$$
where $\mathbf{I}$ is the identity matrix of dimension $p \times p$. The estimator $\hat{\boldsymbol{\beta}}_{\mathrm{RR}}$ is biased but more stable and has a smaller mean squared error. The shrinkage parameter, $k$, controls the shrinkage of $\boldsymbol{\beta}$ toward zero. The OLS estimator can be considered a special case of the RR estimator with $k = 0$. For larger values of $k$, the RR estimator yields greater shrinkage toward zero (Hefnawy and Farag 2014).
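As a numerical illustration of Eqs. (2) and (3), the following sketch (Python with NumPy; the data below are simulated solely for demonstration and do not come from the paper's experiments) computes both estimators on a deliberately collinear design:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 50, 3, 0.5
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.standard_normal(n)    # near-collinear columns
beta_true = np.full(p, 1.0 / np.sqrt(p))             # equal betas, sum of squares = 1
y = X @ beta_true + rng.standard_normal(n)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)                   # Eq. (2)
beta_rr  = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)   # Eq. (3)
```

Using `np.linalg.solve` rather than forming the inverse explicitly is numerically safer when $\mathbf{X}^{T}\mathbf{X}$ is ill conditioned.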
The response variable in the simulation study is generated by
$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad (7)$$
where the $\varepsilon_i$ are independent and identically normally distributed pseudo-random numbers with zero mean and variance $\sigma^2$, and $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_p)$ with $\sum_{j=1}^{p} \beta_j^2 = 1$ and $\beta_1 = \beta_2 = \cdots = \beta_p$ (Kibria 2003; Månsson and Shukur 2011). Because the sample size has a direct impact on the prediction accuracy, three representative values of the sample size are considered: 50, 100, and 150. In addition, the number of explanatory variables is set to $p = 3$ and $p = 5$. Further, because we are interested in the effect of multicollinearity, in which the degree of correlation is the more important factor, three values of the pairwise correlation are considered, $\rho \in \{0.90, 0.95, 0.99\}$. Besides, three values of $\sigma^2$ are investigated: 0.5, 1, and 10.
For each combination of these different values of $n$, $p$, $\rho$, and $\sigma^2$, the data generation is repeated 5000 times and the averaged mean squared error (MSE) is calculated as
$$\mathrm{MSE}\big(\hat{\boldsymbol{\beta}}_{\mathrm{RR}}\big) = \frac{1}{5000}\sum_{i=1}^{5000}\big(\hat{\boldsymbol{\beta}}_{\mathrm{RR}} - \boldsymbol{\beta}\big)^{T}\big(\hat{\boldsymbol{\beta}}_{\mathrm{RR}} - \boldsymbol{\beta}\big), \qquad (8)$$
where $\hat{\boldsymbol{\beta}}_{\mathrm{RR}}$ is the ridge estimator obtained by each of MCV, CV, and GCV. The MSE values from the Monte Carlo simulation study are reported in Tables 1–3.
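A sketch of one cell of this design is given below. Two points are assumptions of ours rather than the paper's specification: the correlated-design construction (a shared-component device that is standard in the simulation literature cited above, giving pairwise correlation $\rho$ between columns), and the fixed shrinkage value `k`, which in the study is instead selected per replication by MCV, CV, or GCV. The names `make_X` and `mse_eq8` are illustrative.

```python
import numpy as np

def make_X(n, p, rho, rng):
    """Columns with pairwise correlation rho via a shared component."""
    z = rng.standard_normal((n, p + 1))
    X = np.sqrt(1.0 - rho) * z[:, :p] + np.sqrt(rho) * z[:, [p]]
    return (X - X.mean(axis=0)) / X.std(axis=0)    # standardized, as in Sec. 2

def mse_eq8(n=50, p=3, rho=0.95, sigma2=1.0, k=0.5, reps=5000, seed=0):
    """Monte Carlo MSE of Eq. (8) for a fixed k (an illustrative shortcut).
    The intercept is dropped because the response is centered."""
    rng = np.random.default_rng(seed)
    beta = np.full(p, 1.0 / np.sqrt(p))            # equal betas, sum of squares = 1
    total = 0.0
    for _ in range(reps):
        X = make_X(n, p, rho, rng)
        y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), n)
        b = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
        total += (b - beta) @ (b - beta)           # squared estimation error
    return total / reps
```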
It can be observed from Tables 1–3 that the MCV in general performs better than its classical counterparts, providing the smallest MSE. For instance, when $n = 50$, $p = 5$, $\sigma^2 = 10$, and $\rho = 0.99$, the MSE of the MCV was about 44.673% and 31.712% lower than that of CV and GCV, respectively. In addition, it can be noted from Tables 1–3 that for a fixed sample size, a fixed number of explanatory variables, and a fixed value of the correlation among the explanatory variables, the MSE of all the methods increases monotonically as $\sigma^2$ increases. However, for $\sigma^2 = 0.5$, $\sigma^2 = 1$, and $\sigma^2 = 10$, regardless of $n$, $p$, and $\rho$, the ordering of performance does not change: the MCV method is still the best, followed by GCV and then CV. Furthermore, for all values of $n$, $p$, $\sigma^2$, and $\rho$, CV gave the highest MSE. As $\sigma^2$ increases, the performance of CV deteriorates, indicating inaccurate parameter estimation. Moreover, when the degree of multicollinearity increases, the MSE of the MCV method increases only slightly.

Figure 1. The RMSE of MCV with respect to each of HKB, Kibria, KS, and Kh when $p = 3$.
To further highlight the estimation performance of the proposed method, MCV, a comparison with previously proposed methods is performed. Specifically, we compared MCV with each of Hoerl et al. (1975) (HKB), Kibria (2003) (Kibria), Khalaf and Shukur (2005) (KS), and Alkhamisi, Khalaf and Shukur (2006) (Kh). Figures 1 and 2 display the average relative mean squared error (RMSE) of MCV with respect to each of HKB, Kibria, KS, and Kh. RMSE values less than 1 indicate that MCV is superior to the compared method.
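Written out explicitly (this form is inferred from the description above rather than quoted from the paper), the relative MSE of MCV with respect to a competing method is
$$\mathrm{RMSE} = \frac{\mathrm{MSE}\big(\hat{\boldsymbol{\beta}}_{\mathrm{RR}}^{\,\mathrm{MCV}}\big)}{\mathrm{MSE}\big(\hat{\boldsymbol{\beta}}_{\mathrm{RR}}^{\,\mathrm{method}}\big)},$$
so that values below 1 favor MCV.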
It is clearly seen from Figures 1 and 2 that the MCV method is usually more efficient than the HKB and Kibria methods in all cases as the degree of multicollinearity changes. On the other hand, the efficiency of MCV is comparable with that of the KS and Kh methods when $\sigma^2 \leq 1$ and $\rho \leq 0.95$, regardless of the sample size. In contrast, regardless of the sample size, the MCV method is slightly less efficient than KS and Kh when $\sigma^2 = 10$ and $\rho = 0.90$, for both numbers of explanatory variables.

Figure 2. The RMSE of MCV with respect to each of HKB, Kibria, KS, and Kh when $p = 5$.

Table 4. The estimated regression parameters and the MSE results for the used methods for the Portland cement dataset.

Covariates    MCV        CV         GCV        HKB        Kibria     KS         Kh
x1            2.103      2.021      1.520      1.902      1.864      2.038      1.125
x2            1.190      1.107      1.108      1.491      1.492      1.137      1.380
x3            0.690      0.712      0.709      0.669      0.634      0.699      0.331
x4            0.594      0.461      0.473      0.460      0.481      0.403      0.418
MSE           9303.049   9322.722   9318.217   9314.508   9313.549   9308.713   9308.671
Overall, the simulation results show that MCV achieves competitive performance relative to the other methods, especially when the multicollinearity is high.
6. Conclusion
In this paper, a new shrinkage parameter selection method for the ridge regression model was proposed by modifying the cross-validation method. This modification allows us to handle multicollinearity while decreasing the variability of shrinkage parameter selection that arises under the classical cross-validation method. Simulation and real data results demonstrate that the proposed method outperforms the classical cross-validation and generalized cross-validation methods. Furthermore, the results show that the proposed method is more efficient than the HKB, Kibria, KS, and Kh methods when $\sigma^2 \geq 1$ and $\rho \geq 0.95$.
ORCID
Zakariya Yahya Algamal http://orcid.org/0000-0002-0229-7958
References
Algamal, Z. Y. 2018a. Developing a ridge estimator for the gamma regression model. Journal of
Chemometrics. doi:10.1002/cem.3054
Algamal, Z. Y. 2018b. Shrinkage estimators for gamma regression model. Electronic Journal of
Applied Statistical Analysis 11 (1):253–68.
Alheety, M. I., and B. M. G. Kibria. 2014. A generalized stochastic restricted ridge regression esti-
mator. Communications in Statistics—Theory and Methods 43 (20):4415–27. doi:10.1080/
03610926.2012.724506
Alkhamisi, M., G. Khalaf, and G. Shukur. 2006. Some modifications for choosing ridge parame-
ters. Communications in Statistics: Theory and Methods 35 (11):2005–20. doi:10.1080/
03610920600762905
Asar, Y., and A. Genç. 2015. New shrinkage parameters for the Liu-type logistic estimators.
Communications in Statistics: Simulation and Computation 45 (3):1094–103. doi:10.1080/
03610918.2014.995815
Asar, Y., A. Karaibrahimoglu, and A. Genç. 2014. Modified ridge regression parameters: A com-
parative Monte Carlo study. Hacettepe Journal of Mathematics and Statistics 43 (5):827–41.
Batah, F. S. M., M. R. Özkale, and S. D. Gore. 2009. Combining unbiased ridge and principal
component regression estimators. Communications in Statistics: Theory and Methods 38 (13):
2201–9. doi:10.1080/03610920802503396
Bhat, S., and R. Vidya. 2016. A class of generalised ridge estimator. Communications in Statistics:
Simulation and Computation. doi:10.1080/03610918.2016.1144765
Boonstra, P. S., B. Mukherjee, and J. M. Taylor. 2015. A small-sample choice of the tuning par-
ameter in ridge regression. Statistica Sinica 25 (3):1185–206.
Golub, G. H., M. Heath, and G. Wahba. 1979. Generalized cross-validation as a method for
choosing a good ridge parameter. Technometrics 21 (2):215–23. doi:10.1080/
00401706.1979.10489751
Hamed, R., A. E. L. Hefnawy, and A. Farag. 2013. Selection of the ridge parameter using math-
ematical programming. Communications in Statistics: Simulation and Computation 42 (6):
1409–32. doi:10.1080/03610918.2012.659821
Hefnawy, A. E., and A. Farag. 2014. A combined nonlinear programming model and Kibria
method for choosing ridge parameter regression. Communications in Statistics: Simulation and
Computation 43 (6):1442–70. doi:10.1080/03610918.2012.735317
Hoerl, A. E., and R. W. Kennard. 1970. Ridge regression: Biased estimation for nonorthogonal
problems. Technometrics 12 (1):55–67.
Hoerl, A. E., R. W. Kannard, and K. F. Baldwin. 1975. Ridge regression: Some simulations.
Communications in Statistics-Theory and Methods 4 (2):105–23.
Jou, Y.-J., C.-C. L. Huang, and H.-J. Cho. 2014. A VIF-based optimization model to alleviate col-
linearity problems in multiple linear regression. Computational Statistics 29 (6):1515–41. doi:
10.1007/s00180-014-0504-3
Jung, K.-M. 2009. Robust cross validations in ridge regression. Journal of Applied Mathematics &
Informatics 27 (3):903–8.
Khalaf, G., and G. Shukur. 2005. Choosing ridge parameter for regression problems.
Communications in Statistics: Theory and Methods 34 (5):1177–82. doi:10.1081/sta-200056836
Kibria, B. M. G. 2003. Performance of some new ridge regression estimators. Communications in
Statistics: Simulation and Computation 32 (2):419–35. doi:10.1081/sac-120017499
Månsson, K., and G. Shukur. 2011. A Poisson ridge regression estimator. Economic Modelling 28
(4):1475–81. doi:10.1016/j.econmod.2011.02.030
McDonald, G. C., and D. I. Galarneau. 1975. A Monte Carlo evaluation of some ridge-type esti-
mators. Journal of the American Statistical Association 70 (350):407–16.
Muniz, G., and B. M. G. Kibria. 2009. On some ridge regression estimators: an empirical compar-
isons. Communications in Statistics: Simulation and Computation 38 (3):621–30. doi:10.1080/
03610910802592838
Özkale, M. R. 2015. Predictive performance of linear regression models. Statistical Papers 56 (2):
531–67. doi:10.1007/s00362-014-0596-4
Sabourin, J. A., W. Valdar, and A. B. Nobel. 2015. A permutation approach for selecting the pen-
alty parameter in penalized model selection. Biometrics 71 (4):1185–94.
Söküt Açar, T., and M. R. Özkale. 2015. Cross validation of ridge regression estimator in autocor-
related linear regression models. Journal of Statistical Computation and Simulation 86 (12):
2429–40. doi:10.1080/00949655.2015.1112392
Stone, M. 1974. Cross-validatory choice and assessment of statistical predictions. Journal of the
Royal Statistical Society. Series B (Methodological) 36 (2):111–47.
Tibshirani, R. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal
Statistical Society. Series B (Methodological) 58 (1):267–88.
Vach, K., W. Sauerbrei, and M. Schumacher. 2001. Variable selection and shrinkage: comparison
of some approaches. Statistica Neerlandica 55 (1):53–75.
van Houwelingen, H. C., and W. Sauerbrei. 2013. Cross-validation, shrinkage and variable selection in linear regression revisited. Open Journal of Statistics 3 (2):79–102.