
Variable Selection Methods

For Regression and Classification problems

www.peaks2tails.com
Feature Preparation/Processing

Feature engineering: covariates creation and feature transformation

Covariates creation
➢ Taking lags of MEVs
➢ Taking interaction terms
➢ Debt to income ratio
➢ Loan to value ratio
➢ Utilization ratio

Feature transformation
➢ Lag transformation
➢ Growth rates
➢ Weight of evidence (discretization)

Feature selection: selecting important variables
➢ Basic Filters
➢ Statistical Filters
➢ Wrapper Methods
➢ Embedded Methods
Variable Selection Methods

➢ Basic Filters (a sort of data cleaning): remove constants, quasi-constants and duplicates.
➢ Statistical Filters (selecting important variables individually): fairly high correlation, intuitive signs, WOE trend, high information value.
➢ Wrapper Methods (selecting important variables collectively; model selection): forward, backward and stepwise regression, exhaustive model search, judged on R², AUC, p-value, VIF.
➢ Embedded Methods (unselecting variables that lead to overfitting; model tuning): penalized regression models such as ridge, lasso and elastic net.
Step-by-step comparison: Linear Regression vs Logistic Regression

1. Covariates creation
   Linear Regression: ratios, interactions
   Logistic Regression: ratios, interactions (LTV ratio, DTI ratio, utilisation ratio)
2. Variable transformation
   Linear Regression: lagged transformations, relative change
   Logistic Regression: weight of evidence transformations
3. Basic filters
   Both: constants, quasi-constants and duplicates
4. Statistical filters
   Linear Regression: correlation, multicollinearity, sign intuitiveness
   Logistic Regression: WOE trend, information value, Gini, chi-square test, mutual information
5. Wrapper methods
   Both: forward/backward/stepwise regression, exhaustive model search
6. Embedded methods
   Linear Regression: penalised regression (lasso, ridge, elastic net)
   Logistic Regression: penalised regression (lasso, ridge, elastic net), cost-sensitive learning
Transformation: Formula
1. Growth: (X_t − X_{t−1}) / X_{t−1}
2. Difference: X_t − X_{t−1}
3. MA: average of n quarters
4. QoQ: X_t vs X_{t−1}
5. YoY: X_t vs X_{t−4}
6. Lag: X_{t−1}
7. Lead: X_{t+1}
8. Log odds: log( DR / (1 − DR) )
9. Vasicek: Z calculated by minimising squared errors
Basic Filter: Rationale
1. Constants: We want to explain variation in Y through variation in X. If X has no variation, it cannot explain the variance in Y.
2. Quasi-constants: Variables with very low variance are dropped for the same reason.
3. Duplicates: Duplicate variables add redundancy to the model and can also lead to problems such as multicollinearity.
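A minimal sketch of these basic filters with pandas, assuming the predictors sit in a DataFrame X; the quasi-constant variance threshold of 0.01 is an illustrative choice, not from the source:

```python
import pandas as pd

def basic_filters(X: pd.DataFrame, quasi_constant_var: float = 0.01) -> pd.DataFrame:
    """Drop constant, quasi-constant and duplicated columns."""
    # Constants: a single unique value carries no information about Y.
    constant_cols = [c for c in X.columns if X[c].nunique(dropna=False) <= 1]
    X = X.drop(columns=constant_cols)
    # Quasi-constants: numeric columns with (near) zero variance.
    variances = X.select_dtypes("number").var()
    quasi_cols = variances[variances < quasi_constant_var].index.tolist()
    X = X.drop(columns=quasi_cols)
    # Duplicates: identical columns are redundant and invite multicollinearity.
    X = X.T.drop_duplicates().T
    return X
```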

Statistical Filters
1. Magnitude of the relationship with Y should be significant
   Linear Regression: correlation should be greater than 30%
   Logistic Regression: IV should be high; Gini should be above 10%
2. Direction of the relationship should be as expected
   Linear Regression: actual sign of the correlation should match the expected sign
   Logistic Regression: trend of the WOE should match the expected trend
Variable Clustering (for dimension reduction)
• Variable clustering (PROC VARCLUS in SAS) is a standard, widely used industry procedure for variable selection. The VARCLUS procedure divides a set of numeric variables into disjoint or hierarchical clusters. Associated with each cluster is a linear combination of the variables in the cluster, which may be either the first principal component or the centroid component. PROC VARCLUS reports each variable's R² with its own cluster and with its nearest cluster. The lower the ratio (1 − R²_own) / (1 − R²_nearest) for a variable, the better it represents its cluster. Either the top 10 variables from each cluster, or the variables with a ratio below some cut-off (e.g. 0.5), are selected. The cut-off is refined iteratively: the selected variables are fed into stepwise regression (after other reduction steps such as correlation filtering), the regression results are examined, and model performance and goodness of fit are assessed. This technique provides a first line of defence against multicollinearity.

In simple language, it is a four-step process (a code sketch follows the list):

1. Create principal components: think of these as clusters.
2. Assign variables to principal components, based on the correlation of each variable with each component.
3. Create subcomponents: if the variables in a cluster do not have significant correlation with the principal component, go more granular.
4. Choose the best variable: out of all the variables that belong to a cluster, choose the one with the highest correlation with its own cluster and the lowest correlation with other clusters.
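A rough sketch of the idea, not the SAS VARCLUS algorithm itself: hierarchical clustering on correlation distance, then one representative per cluster chosen by a (1 − R²_own)/(1 − R²_nearest) style ratio, where the R² terms are proxied with squared correlations. The number of clusters and the proxy are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_and_pick(X: pd.DataFrame, n_clusters: int = 10) -> list[str]:
    """Group correlated variables and keep one representative per cluster."""
    corr = X.corr().values
    dist = 1.0 - np.abs(corr)                       # distance = 1 - |correlation|
    condensed = dist[np.triu_indices_from(dist, k=1)]
    labels = fcluster(linkage(condensed, method="average"),
                      t=n_clusters, criterion="maxclust")

    selected = []
    for lab in np.unique(labels):
        members = np.where(labels == lab)[0]
        best, best_ratio = None, np.inf
        for i in members:
            # Proxy for R^2 with own cluster vs. nearest other cluster.
            r2_own = np.mean(corr[i, members] ** 2)
            others = np.setdiff1d(np.arange(len(labels)), members)
            r2_near = np.max(corr[i, others] ** 2) if len(others) else 0.0
            ratio = (1 - r2_own) / (1 - r2_near + 1e-12)
            if ratio < best_ratio:                  # lower ratio = better representative
                best, best_ratio = X.columns[i], ratio
        selected.append(best)
    return selected
```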

Weight of Evidence and Information Value

Information Value: Predictive Power
< 0.02: Useless for prediction
0.02 to 0.1: Weak predictor
0.1 to 0.3: Medium predictor
0.3 to 0.5: Strong predictor
> 0.5: Suspicious or too good to be true

Is high IV really bad?
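For reference, a minimal sketch of how WOE and IV are computed for one binned variable, assuming a pandas Series of bin labels and a 0/1 default flag (names and the 0.5 count adjustment are illustrative):

```python
import numpy as np
import pandas as pd

def woe_iv(bins: pd.Series, default_flag: pd.Series) -> tuple[pd.DataFrame, float]:
    """WOE per bin and total IV for one binned variable (default_flag must be 0/1)."""
    tab = pd.crosstab(bins, default_flag)             # counts of goods (0) and bads (1) per bin
    eps = 0.5                                         # small adjustment so empty bins don't divide by zero
    dist_good = (tab[0] + eps) / (tab[0] + eps).sum()
    dist_bad = (tab[1] + eps) / (tab[1] + eps).sum()
    woe = np.log(dist_good / dist_bad)                # WOE = ln(% good / % bad)
    iv_parts = (dist_good - dist_bad) * woe           # each bin's contribution to IV
    return pd.DataFrame({"woe": woe, "iv_part": iv_parts}), float(iv_parts.sum())
```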

Wrapper Methods

Technique: Explanation
1. Forward Selection: Start with a null model and add one variable at a time. First add the variable that gives the lowest AIC, then keep adding the variable that lowers AIC further. p-value, AUC, marginal information value or marginal contributions can also be used as the criterion.
2. Backward Selection: Start with the full model and eliminate one variable at a time, each time removing the variable whose removal lowers AIC the most.
3. Stepwise Regression: Variables are added as in forward selection, but after each addition an existing variable can be eliminated if its p-value rises above 5% or its VIF exceeds the threshold.
4. Sequential Forward: At every step, after adding a new variable, try eliminating each of the existing variables and check whether model performance improves. Performance is also checked on a testing set.
5. Sequential Backward: At every step, after eliminating a variable, try adding back each of the eliminated variables and check whether model performance improves. Performance is also checked on a testing set.
6. Recursive Feature Elimination: Rank all features by the absolute value of their beta coefficients, eliminate the lowest-ranked feature, and repeat the process on the remaining variables.
7. Exhaustive Model Search: Try all possible combinations of variables and choose the model that passes all the assumption tests and has sufficient accuracy.

Note: other techniques such as forced variable selection or controlling the selection sequence can also be used.
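A minimal sketch of forward selection by AIC for the linear case, using statsmodels OLS (for logistic models, sm.Logit could be substituted); X and y are assumed to be a numeric DataFrame and target Series:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def forward_selection_aic(X: pd.DataFrame, y: pd.Series) -> list[str]:
    """Greedy forward selection: add the variable that lowers AIC the most at each step."""
    selected, remaining = [], list(X.columns)
    best_aic = sm.OLS(y, np.ones((len(y), 1))).fit().aic   # intercept-only (null) model
    improved = True
    while improved and remaining:
        improved = False
        # Fit one candidate model per remaining variable and record its AIC.
        aics = {c: sm.OLS(y, sm.add_constant(X[selected + [c]])).fit().aic
                for c in remaining}
        cand, cand_aic = min(aics.items(), key=lambda kv: kv[1])
        if cand_aic < best_aic:                            # only add if AIC actually improves
            selected.append(cand)
            remaining.remove(cand)
            best_aic, improved = cand_aic, True
    return selected
```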
Wrapper Methods

Which technique is the best?
Penalised Regression to Reduce Overfitting

• What leads to 'overfitting' and 'underfitting'?
  - Overfitting is caused by using an extremely high number of variables, making the model too complex.
  - Underfitting is caused by using an extremely low number of variables, making the model too simple.
# Reducing the problem of overfitting
• Preventing the algorithm from getting too complex requires estimating a penalty (λ) for increases in complexity, together with proper data sampling achieved through validation (K-fold cross validation).
• When we add penalty terms to a regular regression model, it becomes a penalized regression model.

# Penalized Regression
• Penalized regression is useful for reducing a large number of features to a manageable set and for making good predictions in a variety of large data sets, especially when the features (X's) are correlated.
• Penalised regression adds a constraint such that the regression coefficients are chosen to minimize the SSE plus a penalty term that grows with the number of included features. So, in penalized regression, a feature must make a significant enough contribution to the model fit to offset the penalty of including it; only the most important features for explaining Y remain in the penalised model.
# LASSO Regression, Ridge Regression and Elastic Net Regression

Q. How do we find λ (the regularisation parameter)?
We choose the level of λ for which the mean squared error on the validation set is the lowest. Finding the MSE on the validation set requires K-fold cross validation. (Elastic net has two penalty parameters, λ1 and λ2, tuned the same way.)

Step 1: For the first fold, on the training data, take λ = say 0.1, run the penalized regression model and obtain the beta coefficients.
Step 2: Using the estimated beta coefficients, predict Y on the first fold's validation set, compute the error terms, and then the mean squared error on that validation set.
Step 3: If K = 4, repeat the above two steps three more times and collect the MSE on validation folds 2, 3 and 4. Take the average MSE.
Step 4: Repeat all the above steps taking λ = say 0.3, and so on.
Step 5: Plot the average validation MSE against the candidate λ values and choose the λ that gives the lowest average MSE on the validation sets.

Objective function for ridge (lasso and elastic net replace the squared-coefficient penalty with an absolute-value or mixed penalty):
Linear regression: SSE + λ Σ βj²
Logistic regression: −Log-likelihood + λ Σ βj²
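A minimal scikit-learn sketch of this λ search; LassoCV runs the K-fold loop described above internally (alpha is scikit-learn's name for λ, and its inverse C is used for logistic regression). X_train and y_train are placeholder training arrays, and the alpha grid is illustrative:

```python
from sklearn.linear_model import LassoCV, LogisticRegressionCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Linear case: 4-fold CV over a grid of lambdas (alphas), keeping the one
# with the lowest average validation MSE.
lasso = make_pipeline(StandardScaler(),
                      LassoCV(alphas=[0.01, 0.03, 0.1, 0.3, 1.0], cv=4))
lasso.fit(X_train, y_train)
print("chosen lambda:", lasso.named_steps["lassocv"].alpha_)

# Logistic case: penalised -log-likelihood; Cs is a grid of inverse lambdas.
logit = make_pipeline(StandardScaler(),
                      LogisticRegressionCV(Cs=10, cv=4, penalty="l1", solver="saga"))
logit.fit(X_train, y_train)
```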
Penalised Regression: Linear Regression

Penalised Regression: Logistic Regression
Regression Pipeline

1. Dimension reduction: variable clustering.
2. Sign test: actual signs = expected signs.
3. Explanatory power and contemporaneous: correlation > 30%.
4. Add back business-preferred variables: to keep the model useful for the business.
5. Exhaustive model search: without regularisation (K-fold CV) or with regularisation (nested K-fold CV).
6. Statistical accuracy and adequacy: adjusted R², MSE, regression assumptions.
7. Contemporaneous, parsimonious and stable: fewer lags, fewer variables.
8. Line of business: business people choose the best model at the end.
Variable transformation list

_log, _lag_1, _lag_2, _lag_3, _lag_4, _lead_1, _lead_2, _lead_3, _lead_4,
_qoq_diff, _qoq_diff_lag_1, _qoq_diff_lag_2, _qoq_diff_lag_3, _qoq_diff_lag_4,
_yoy_diff, _yoy_diff_lag_1, _yoy_diff_lag_2, _yoy_diff_lag_3, _yoy_diff_lag_4,
_qoq_log_growth, _qoq_log_growth_lag_1, _qoq_log_growth_lag_2, _qoq_log_growth_lag_3, _qoq_log_growth_lag_4,
_qoq_simple_growth, _qoq_simple_growth_lag_1, _qoq_simple_growth_lag_2, _qoq_simple_growth_lag_3, _qoq_simple_growth_lag_4,
_yoy_log_growth, _yoy_log_growth_lag_1, _yoy_log_growth_lag_2, _yoy_log_growth_lag_3, _yoy_log_growth_lag_4,
_yoy_simple_growth, _yoy_simple_growth_lag_1, _yoy_simple_growth_lag_2, _yoy_simple_growth_lag_3, _yoy_simple_growth_lag_4,
_qqma2_leading, _qqma3_leading, _qqma4_leading, _qqma2_lagging, _qqma3_lagging, _qqma4_lagging

(Accompanying chart: one value per transformed variable, ranging from roughly −0.4 to 1.0.)
Is correlation sufficient to detect multicollinearity?
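Pairwise correlation can miss multicollinearity that involves three or more variables; the variance inflation factor (VIF) catches it. A minimal statsmodels sketch, assuming a numeric DataFrame X of predictors:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X: pd.DataFrame) -> pd.Series:
    """VIF per variable: how well each X is explained by all the other X's."""
    exog = sm.add_constant(X)
    vifs = {col: variance_inflation_factor(exog.values, i)
            for i, col in enumerate(exog.columns) if col != "const"}
    return pd.Series(vifs).sort_values(ascending=False)  # values above ~2-3 are a warning sign
```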

Exhaustive Model Search
Classification Pipeline

1. Dimension reduction: variable clustering.
2. Logical WOE trend: monotonic trend achieved.
3. Explanatory power and contemporaneous: high IV or Gini > 0.1.
4. Add back business-preferred variables: line of business.
5. Exhaustive model search: run all possible models, without regularisation (K-fold CV) or with regularisation (nested K-fold CV).
6. Statistical accuracy and adequacy: multicollinearity, LR test, p-value, area under the curve, AIC.
7. Stability: characteristic stability index.
8. Contemporaneous, parsimonious and stable: fewer lags, fewer variables.
What qualitative factors do businesses consider? (The letters spell IMPORTANT.)

Characteristic: Example
I - Implementable: LTV_time, if there is no objective way of getting valuations on time
M - Manipulable: variables based on self-reported income
P - Policy or legal constraints: using religion as a risk driver; alternative data
O - Objective: length of employment may have subjective interpretations (full time or part time)
R - Recognisable: a variable like social media activity level is not recognisably related to credit risk
T - Transparent: the calculation methodology of the variable should be clear
A - Available: data with many missing values, such as investment portfolio value
N - Necessary: variables with low statistical performance may still be very important
T - Tangible: the intended use of the loan is important but not tangible
(Charts by bin for: bureau_score, num_ccj, max_arrears_12m, cc_util, annual_income, months_since_recent_cc_delinq.)
Equal Frequency Bins

Monotonic Bins
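A minimal sketch of these two binning approaches, assuming a numeric feature x and a 0/1 default flag y sharing the same index; the merge rule for the monotonic bins (collapse the two adjacent bins with the closest bad rates) is a simplified illustration, not a specific library's algorithm:

```python
import pandas as pd

def equal_frequency_bins(x: pd.Series, n_bins: int = 10) -> pd.Series:
    """Quantile-based bins: each bin holds roughly the same number of observations."""
    return pd.qcut(x, q=n_bins, duplicates="drop")

def monotonic_bins(x: pd.Series, y: pd.Series, n_bins: int = 20) -> pd.Series:
    """Start from fine equal-frequency bins, then merge until the bad rate is monotonic."""
    edges = list(pd.qcut(x, q=n_bins, retbins=True, duplicates="drop")[1])
    while len(edges) > 2:
        bins = pd.cut(x, bins=edges, include_lowest=True)
        bad_rate = y.groupby(bins).mean()
        diffs = bad_rate.diff().dropna()
        if (diffs >= 0).all() or (diffs <= 0).all():   # bad rate already monotonic
            break
        # Merge the two adjacent bins whose bad rates are closest.
        closest = diffs.abs().idxmin()
        idx = list(bad_rate.index).index(closest)
        edges.pop(idx)                                  # drop the edge between the two bins
    return pd.cut(x, bins=edges, include_lowest=True)
```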

Exhaustive Model Search
Important Thresholds

Metric: Threshold
Characteristic Stability Index (CSI): < 0.1
Variable Gini: > 0.1
Information Value: > 0.02
Multicollinearity, correlation cut-off: < 0.5 to 0.7
Multicollinearity, VIF cut-off: < 2 to 3
Model AUC: > 0.7
Model Gini: > 0.4 to 0.5
Model PSI: < 0.1
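A minimal sketch of the PSI/CSI calculation behind the stability thresholds, assuming two aligned vectors of bin proportions (expected = development sample, actual = recent sample); the example proportions are made up for illustration:

```python
import numpy as np

def psi(expected_prop: np.ndarray, actual_prop: np.ndarray, eps: float = 1e-6) -> float:
    """Population/characteristic stability index between two binned distributions."""
    e = np.clip(expected_prop, eps, None)
    a = np.clip(actual_prop, eps, None)
    return float(np.sum((a - e) * np.log(a / e)))   # < 0.1 is usually read as stable

# Illustrative usage with made-up bin proportions:
# psi(np.array([0.2, 0.3, 0.5]), np.array([0.22, 0.28, 0.50]))
```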

Gini or IV: which is more important?
