International Journal of Statistics and Applications
p-ISSN: 2168-5193 e-ISSN: 2168-5215
2015; 5(2): 72-76
doi:10.5923/j.statistics.20150502.04
On Performance of Shrinkage Methods – A Monte Carlo Study
Gafar Matanmi Oyeyemi1, Eyitayo Oluwole Ogunjobi2, Adeyinka Idowu Folorunsho3
1Department of Statistics, University of Ilorin
2Department of Mathematics and Statistics, The Polytechnic Ibadan, Adeseun Ogundoyin Campus, Eruwa
3Department of Mathematics and Statistics, Osun State Polytechnic Iree
Correspondence to: Gafar Matanmi Oyeyemi, Department of Statistics, University of Ilorin.
Copyright © 2015 Scientific & Academic Publishing. All Rights Reserved.
Abstract: Multicollinearity has been a serious problem in regression analysis; in its presence, Ordinary Least Squares (OLS) regression may produce high variability in the estimates of the regression coefficients. The Least Absolute Shrinkage and Selection Operator (LASSO) is a well established method that reduces the variability of the estimates by shrinking the coefficients and at the same time produces interpretable models by shrinking some coefficients to exactly zero. We present the performance of LASSO-type estimators in the presence of multicollinearity using a Monte Carlo approach. The performance of LASSO, Adaptive LASSO, Elastic Net, Fused LASSO and Ridge Regression (RR) in the presence of multicollinearity in simulated data sets is compared using the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). A Monte Carlo experiment of 1000 trials was carried out at different sample sizes n (50, 100 and 150) with different levels of multicollinearity among the exogenous variables (ρ = 0.3, 0.6, and 0.9). The overall performance of LASSO appears to be the best, but Elastic Net tends to be more accurate when the sample size is large.
Keywords: Multicollinearity, Least Absolute shrinkage and Selection operator, Elastic net, Ridge, Adaptive Lasso, Fused Lasso
Cite this paper: Gafar Matanmi Oyeyemi, Eyitayo Oluwole Ogunjobi, Adeyinka Idowu Folorunsho, On Performance of Shrinkage Methods – A Monte Carlo Study, International Journal of Statistics and Applications, Vol. 5 No. 2, 2015, pp. 72-76. doi: 10.5923/j.statistics.20150502.04.
Penalties based on the $\ell_1$-norm, the $\ell_2$-norm, or both, governed by tuning parameters, were used to influence the parameter estimates in order to minimize the effect of the collinearity. Shrinkage methods are popular among researchers for their theoretical properties, e.g. in parameter estimation. Over the years, the LASSO-type methods have become popular methods for parameter estimation and variable selection due to their property of shrinking some of the model coefficients to exactly zero; see [3], [4]. [3] proposed a new shrinkage method, the Least Absolute Shrinkage and Selection Operator (LASSO), with tuning parameter $\lambda \ge 0$, which is a penalized method; see [5] for the first systematic study of the asymptotic properties of LASSO-type estimators [4]. The LASSO shrinks some coefficients while setting others to exactly zero, and its theoretical properties suggest that the LASSO potentially enjoys the good features of both subset selection and ridge regression. [6] had earlier proposed ridge regression, which minimizes the residual sum of squares subject to the constraint $\sum_{j} \beta_j^2 \le t$. [6] argued that the optimal choice of the tuning parameter yields reasonable predictors because it controls the degree to which the coefficient vector $\beta$ is aligned with the original variable axis directions in the predictor space. [7] introduced the Smoothly Clipped Absolute Deviation (SCAD) penalty, which penalizes the least squares estimate to reduce bias and satisfies certain conditions that yield continuous solutions. [8] was first to propose ridge regression, which minimizes the residual sum of squares subject to the constraint $\sum_{j} \beta_j^2 \le t$ and is thus regarded as an $\ell_2$-norm method. [9] developed Least Angle Regression Selection (LARS) as a model selection algorithm [10], and [11] studied the properties of the adaptive group LASSO. In 2006, [12] proposed a generalization of the LASSO; other shrinkage methods include the Dantzig Selector with Sequential Optimization (DASSO) [13], the Elastic Net [14], the Variable Inclusion and Selection Algorithm (VISA) [15], and the Adaptive LASSO [16], among others. LASSO-type estimators are techniques that are often suggested to handle the problem of multicollinearity in regression models. More often than not, Bayesian simulation with secondary data has been used. When ordinary least squares is adopted there is a tendency to obtain poor inferences, while the LASSO-type estimators that have recently been adopted may still have the shortcoming of shrinking important parameters; we intend to examine how these shrunken parameters are affected asymptotically. However, the performances of the estimators have not been exhaustively compared in the presence of all these problems. Moreover, the question of which LASSO-type estimator is robust in the presence of these problems has not been fully addressed. This is the focus of this research work.

Consider the linear regression model

$$y_i = x_i'\beta + \varepsilon_i, \qquad i = 1, \dots, n \qquad (1)$$
where the $x_i$ are exogenous, the $\varepsilon_i$ are i.i.d. random variables with mean zero and finite variance $\sigma^2$, and $\beta$ is a $p \times 1$ vector. Suppose $p$ takes the largest possible dimension $p_{\max}$; in other words, the number of regressors may be at most $p_{\max}$, but the true $p$ is somewhere between 1 and $p_{\max}$. The issue here is to come up with the true model and estimate it at the same time. The least squares estimate without model selection is $\hat{\beta}_{LS} = (X'X)^{-1}X'y$, with $p_{\max}$ estimates. Shrinkage estimators are not as easy to calculate as least squares. The objective function for the shrinkage estimators is

$$\hat{\beta} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \left( y_i - x_i'\beta \right)^2 + \lambda_n \sum_{j=1}^{p} \left| \beta_j \right|^{\gamma} \right\} \qquad (2)$$
where $\lambda_n$ is a tuning parameter (for penalization); it is a positive sequence, it will not be estimated, and it will be specified by us. The objective function consists of two parts: the first is the least squares objective function, and the second is the penalty term. Taking the penalty part only, $\lambda_n \sum_{j=1}^{p} |\beta_j|^{\gamma}$: if $\lambda_n$ goes to infinity or to a constant, the values of $\beta$ that minimize that part alone must satisfy $\beta_j = 0$ for all $j$. We get all zeros if we minimize only the penalty part, so the penalty part shrinks the coefficients toward zero; this is the function of the penalty.
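To make the two-part structure of (2) concrete, the objective can be evaluated directly in code. The sketch below is only illustrative (the data, the function name shrinkage_objective, and the choice of an $\ell_1$-type penalty are assumptions, not part of the paper).

```python
import numpy as np

def shrinkage_objective(beta, X, y, lam, penalty):
    """Two-part objective of equation (2): least squares part plus penalty part."""
    ls_part = np.sum((y - X @ beta) ** 2)            # residual sum of squares
    penalty_part = lam * np.sum(penalty(np.abs(beta)))
    return ls_part + penalty_part

# Illustrative data and an l1-type penalty p(|b|) = |b| (gamma = 1)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 2.0]) + rng.normal(size=50)
beta = np.zeros(5)
print(shrinkage_objective(beta, X, y, lam=1.0, penalty=lambda b: b))
```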
Ridge Regression (RR) by [17] is ideal if there are many predictors, all with non-zero coefficients and drawn from a normal distribution [18]. In particular, it performs well with many predictors each having a small effect, and it prevents the coefficients of linear regression models with many correlated variables from being poorly determined and exhibiting high variance. RR shrinks the coefficients of correlated predictors equally towards zero. So, for example, given $k$ identical predictors, each would get identical coefficients equal to $1/k$ times the size that any one predictor would get if fit singly [18]. Ridge regression thus does not force coefficients to vanish and hence cannot select a model with only the most relevant and predictive subset of predictors. The ridge regression estimator solves the regression problem in [17] using $\ell_2$-penalized least squares:

$$\hat{\beta}_{\text{ridge}} = \arg\min_{\beta} \left\| y - X\beta \right\|_2^2 + \lambda \left\| \beta \right\|_2^2 \qquad (3)$$
where $\|y - X\beta\|_2^2 = \sum_{i=1}^{n} (y_i - x_i'\beta)^2$ is the $\ell_2$-norm (quadratic) loss function (i.e. the residual sum of squares), $x_i'$ is the $i$-th row of $X$, $\|\beta\|_2^2 = \sum_{j=1}^{p} \beta_j^2$ is the $\ell_2$-norm penalty on $\beta$, and $\lambda \ge 0$ is the tuning (penalty, regularization, or complexity) parameter which regulates the strength of the penalty (linear shrinkage) by determining the relative importance of the data-dependent empirical error and the penalty term. The larger the value of $\lambda$, the greater the amount of shrinkage. As the value of $\lambda$ depends on the data, it can be determined using data-driven methods such as cross-validation. The intercept is assumed to be zero in (3) due to mean centering of the phenotypes.
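As a sketch of how a ridge fit of the form (3) can be obtained in practice, the example below uses scikit-learn's Ridge and RidgeCV (the latter choosing the penalty by cross-validation, as mentioned above); the simulated data and penalty values are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge, RidgeCV

rng = np.random.default_rng(1)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.r_[np.ones(3), np.zeros(p - 3)]
y = X @ beta_true + rng.normal(size=n)

# Fixed penalty: larger alpha (playing the role of lambda) means more shrinkage toward zero.
ridge = Ridge(alpha=1.0, fit_intercept=False).fit(X, y)

# Data-driven penalty chosen by cross-validation over a grid of candidate values.
ridge_cv = RidgeCV(alphas=np.logspace(-3, 3, 50), fit_intercept=False).fit(X, y)
print(ridge.coef_, ridge_cv.alpha_)
```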
The LASSO estimator [3] instead uses the $\ell_1$-penalized least squares criterion to obtain a sparse solution to the following optimization problem:

$$\hat{\beta}_{\text{lasso}} = \arg\min_{\beta} \left\| y - X\beta \right\|_2^2 + \lambda \left\| \beta \right\|_1 \qquad (4)$$
where $\|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|$ is the $\ell_1$-norm penalty on $\beta$, which induces sparsity in the solution, and $\lambda \ge 0$ is a tuning parameter. The $\ell_1$ penalty enables the LASSO to simultaneously regularize the least squares fit and shrink some components of $\hat{\beta}$ to exactly zero for some suitably chosen $\lambda$. The cyclical coordinate descent algorithm [18] efficiently computes the entire LASSO solution path over $\lambda$ and is faster than the well-known LARS algorithm [9]. These properties make the LASSO an appealing and highly popular variable selection method.
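A minimal illustration of a LASSO fit of the form (4), and of the solution path computed by cyclical coordinate descent, using scikit-learn's Lasso and lasso_path; the simulated data and the value of alpha (playing the role of $\lambda$) are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso, lasso_path

rng = np.random.default_rng(2)
n, p = 100, 10
X = rng.normal(size=(n, p))
y = X @ np.r_[2.0, -1.5, np.zeros(p - 2)] + rng.normal(size=n)

# A single LASSO fit: some coefficients are set exactly to zero.
lasso = Lasso(alpha=0.1, fit_intercept=False).fit(X, y)
print(lasso.coef_)

# Entire solution path over a grid of penalty values (coordinate descent).
alphas, coefs, _ = lasso_path(X, y)
print(alphas.shape, coefs.shape)   # (n_alphas,), (p, n_alphas)
```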
The fused LASSO further penalizes the $\ell_1$-norm of both the coefficients and their successive differences:

$$\hat{\beta}_{\text{fused}} = \arg\min_{\beta} \left\{ \left\| y - X\beta \right\|_2^2 + \lambda_1 \sum_{j=1}^{p} \left| \beta_j \right| + \lambda_2 \sum_{j=2}^{p} \left| \beta_j - \beta_{j-1} \right| \right\} \qquad (5)$$
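The fused LASSO is not included in scikit-learn; as a minimal sketch under that caveat, the objective in (5) can at least be evaluated directly for a candidate coefficient vector (the function name, tuning values, and data below are illustrative, and no particular solver is implied).

```python
import numpy as np

def fused_lasso_objective(beta, X, y, lam1, lam2):
    """Equation (5): RSS + l1 penalty on coefficients + l1 penalty on their differences."""
    rss = np.sum((y - X @ beta) ** 2)
    sparsity = lam1 * np.sum(np.abs(beta))
    smoothness = lam2 * np.sum(np.abs(np.diff(beta)))   # |beta_j - beta_{j-1}|
    return rss + sparsity + smoothness

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 6))
y = X @ np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0]) + rng.normal(size=30)
print(fused_lasso_objective(np.zeros(6), X, y, lam1=1.0, lam2=1.0))
```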
Use of the LASSO penalty function alone has several limitations. For instance, in the "large p, small n" case the LASSO selects at most n variables before it saturates. Also, if there is a group of highly correlated variables, the LASSO tends to select one variable from the group and ignore the others. To overcome these limitations, the elastic net adds a quadratic part to the penalty, which when used alone is ridge regression (known also as Tikhonov regularization). The elastic net estimator can be expressed as

$$\hat{\beta}_{\text{enet}} = \arg\min_{\beta} \left\{ \left\| y - X\beta \right\|_2^2 + \lambda_2 \left\| \beta \right\|_2^2 + \lambda_1 \left\| \beta \right\|_1 \right\} \qquad (7)$$
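A sketch of an elastic net fit of the form (7) using scikit-learn's ElasticNet, which expresses the two penalties through an overall strength alpha and a mixing proportion l1_ratio; the correlated-predictor data and parameter values below are illustrative.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, ElasticNetCV

rng = np.random.default_rng(4)
n, p = 100, 10
X = rng.normal(size=(n, p))
# Two highly correlated predictors: the elastic net tends to keep both in the model.
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n)
y = X[:, 0] + X[:, 1] + rng.normal(size=n)

enet = ElasticNet(alpha=0.1, l1_ratio=0.5, fit_intercept=False).fit(X, y)
print(enet.coef_[:2])

# Both tuning parameters chosen by cross-validation.
enet_cv = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5, fit_intercept=False).fit(X, y)
print(enet_cv.alpha_, enet_cv.l1_ratio_)
```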
The adaptive LASSO [16] applies data-driven weights to the $\ell_1$ penalty:

$$\hat{\beta}_{\text{alasso}} = \arg\min_{\beta} \left\{ \left\| y - X\beta \right\|_2^2 + \lambda \sum_{j=1}^{p} \hat{w}_j \left| \beta_j \right| \right\} \qquad (8)$$
where $\hat{\beta}$ is a $\sqrt{n}$-consistent estimator, such as the ordinary least squares estimator $\hat{\beta}_{LS}$, and the $\hat{w}_j$ are the adaptive data-driven weights, which can be estimated by $\hat{w}_j = 1 / |\hat{\beta}_j|^{\gamma}$, where $\gamma$ is a positive constant and $\hat{\beta}_j$ is an initial consistent estimator of $\beta_j$ obtained through least squares, or through ridge regression if multicollinearity is important [16]. The optimal value of $(\lambda, \gamma)$ can be simultaneously selected from a grid of values, with values of $\gamma$ selected from {0.5, 1, 2}, using two-dimensional cross-validation [16]. The weights allow the adaptive LASSO to apply different amounts of shrinkage to different coefficients and hence to more severely penalize coefficients with small values. The flexibility introduced by weighting each coefficient differently corrects for the undesirable tendency of the LASSO to shrink large coefficients too much yet insufficiently shrink small coefficients by applying the same penalty to every regression coefficient [16].
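The adaptive LASSO of (8) is not available directly in scikit-learn, but because the weights enter the penalty multiplicatively, a weighted LASSO can be obtained by rescaling each column of X by $1/\hat{w}_j$, fitting an ordinary LASSO, and rescaling the coefficients back. The sketch below assumes an OLS initial estimator and $\gamma = 1$, with illustrative data and tuning values.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(5)
n, p = 150, 8
X = rng.normal(size=(n, p))
y = X @ np.r_[3.0, 1.5, np.zeros(p - 2)] + rng.normal(size=n)

# Step 1: initial consistent estimator (OLS here; ridge if multicollinearity is severe).
beta_init = LinearRegression(fit_intercept=False).fit(X, y).coef_

# Step 2: adaptive weights w_j = 1 / |beta_init_j|^gamma, with gamma = 1.
gamma = 1.0
w = 1.0 / (np.abs(beta_init) ** gamma + 1e-8)    # small constant guards against division by zero

# Step 3: weighted LASSO via column rescaling X_j -> X_j / w_j, then rescale coefficients back.
X_scaled = X / w
lasso = Lasso(alpha=0.1, fit_intercept=False).fit(X_scaled, y)
beta_adaptive = lasso.coef_ / w
print(beta_adaptive)
```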