ML LVC 2 Post-Session Summary
Linear regression is one of the most prominent algorithms for building a model that makes reliable
predictions in regression problems. Because it is simple to use and to make predictions with, it is
also one of the preferred algorithms for tackling regression problems. Once we build the linear
regression model, the next step is to interpret the results.
Let's start by discussing what can go wrong with the model and lead to unreliable results or to
misinterpretation of the results.
Heteroscedasticity
To get a good fit of the model, the residuals are required to have constant variance along the
best-fit line. If the variance of the residuals varies along the regression line, this situation is called
heteroscedasticity. The formulas in regression assume that all residuals are drawn from a
population that has a constant variance (homoscedasticity). Hence, if the assumption is violated,
then the results of linear regression might not be reliable.
The graph below shows that the variance of the residuals is not constant along the regression line
for a simple regression using the TV feature: the residuals have a larger variance for larger values
of TV.
In general, outliers are a good example of data points that make the distribution heteroscedastic,
because they create high variance in the distribution.
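To make this concrete, here is a minimal sketch (not from the session materials) that generates a hypothetical heteroscedastic "TV vs sales" relationship and plots residuals against fitted values; a funnel shape in such a plot is the usual visual symptom of non-constant residual variance. All data and variable names are made up.

```python
# Minimal sketch (synthetic data, not the session's advertising dataset):
# plot residuals against fitted values; a funnel shape suggests heteroscedasticity.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
tv = rng.uniform(0, 300, 200)                 # hypothetical "TV" spend
noise = rng.normal(0, 0.05 * tv)              # noise grows with TV -> heteroscedastic
sales = 7 + 0.05 * tv + noise

model = LinearRegression().fit(tv.reshape(-1, 1), sales)
fitted = model.predict(tv.reshape(-1, 1))
residuals = sales - fitted

plt.scatter(fitted, residuals, s=10)
plt.axhline(0, color="black", linewidth=1)
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("Residuals vs fitted: funnel shape indicates heteroscedasticity")
plt.show()
```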
Multicollinearity
In the case of multicollinearity, the matrix of independent features, namely 𝑋, is not a full rank
matrix, and hence 𝑋ᵀ𝑋 is not invertible. Since the calculation of standard errors involves (𝑋ᵀ𝑋)⁻¹,
we need the matrix 𝑋ᵀ𝑋 to be invertible. To make 𝑋 a full rank matrix, it is required to remove
some of the features. Removing features from the set of multicollinear features makes the
remaining information content unique.
Remarks:
1. A square matrix has full rank if all the columns are linearly independent.
2. A set of vectors is linearly dependent if there is a nontrivial linear combination of the
vectors that equals 0, otherwise the set of vectors will be linearly independent.
3. If some correlation exists between the features, it is not a problem, because it does not
make any feature a complete duplicate of another, and the matrix 𝑋 might still have full
rank.
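As a quick illustration of remark 1 (on hypothetical data, not from the session), the sketch below builds a feature matrix that contains an exact linear combination of two other columns and checks its rank and the conditioning of 𝑋ᵀ𝑋:

```python
# Small illustration (hypothetical data): a perfectly collinear column makes X lose full rank,
# so X^T X becomes singular and the inverse needed for standard errors does not exist.
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
x3 = 2 * x1 + 3 * x2            # exact linear combination -> multicollinearity

X_ok = np.column_stack([x1, x2])
X_bad = np.column_stack([x1, x2, x3])

print(np.linalg.matrix_rank(X_ok))        # 2 -> full column rank
print(np.linalg.matrix_rank(X_bad))       # 2, although there are 3 columns -> rank deficient
print(np.linalg.cond(X_bad.T @ X_bad))    # huge condition number -> (X^T X) is (near) singular
```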
It is also possible for two features that are not sensibly relatable in real life to be correlated. For
example, a strong relationship has been found between the amount of chocolate consumed and
the number of Nobel prizes won by different countries. Two possible explanations are:
a. Chocolate makes people intelligent and hence they win more Nobels. In this case, there is
a causal relationship between consumption of chocolate and the number of Nobel prizes.
b. Countries with more wealth are consuming more chocolate and earning more Nobels. In
this case, there is no causal relationship between chocolate consumption and the number
of Nobels. They are correlated due to their dependence on a third factor, wealth. This
shows that correlation does not imply causation.
In the above example, wealth is considered to be a latent variable. Latent variables are variables
that are not directly observed but are instead inferred, through a mathematical model, from other
variables that are observed.
When there are such latent variables, the independent features become correlated with residuals.
This phenomenon is commonly known as endogeneity.
Let's consider a scenario. An advertising experiment is run in two different parts of a town. In the
first part, there is not much competition, and sales are good even with little advertising. In the
second part, there are competing stores/dealers, so more advertising is needed to generate sales,
and even then the sales are not as good as in the first part. The following figure shows the
regression lines for both parts of the town.
In both parts of the town, the regression line shows that as the marketing budget increases, sales
also increase. So, it can be concluded that increasing the marketing budget will increase sales.
Now, let us fit a single regression line on the combined data from both parts of the town. It shows
that sales are decreasing as the marketing budget increases. That is not true and can be
deceiving. This effect is due to the latent variable, competition.
This is called Simpson’s paradox, in which a trend appears in several groups of data but
disappears or reverses when the groups are combined.
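The reversal can be reproduced on synthetic data. The sketch below (hypothetical numbers, not the session's dataset) gives each part of the town a positive budget-to-sales slope, yet the pooled fit comes out negative:

```python
# Hypothetical two-group example: each part of town shows a positive budget->sales slope,
# but pooling the data without the latent "competition" variable reverses the trend.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)

# Part 1: little competition -> high sales even at low budgets
budget_1 = rng.uniform(0, 20, 100)
sales_1 = 80 + 1.0 * budget_1 + rng.normal(0, 3, 100)

# Part 2: heavy competition -> lower sales despite higher budgets
budget_2 = rng.uniform(40, 60, 100)
sales_2 = 20 + 1.0 * budget_2 + rng.normal(0, 3, 100)

def slope(x, y):
    return LinearRegression().fit(x.reshape(-1, 1), y).coef_[0]

print(slope(budget_1, sales_1))   # ~ +1.0 within part 1
print(slope(budget_2, sales_2))   # ~ +1.0 within part 2
budget = np.concatenate([budget_1, budget_2])
sales = np.concatenate([sales_1, sales_2])
print(slope(budget, sales))       # negative: the pooled trend reverses (Simpson's paradox)
```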
The issue of latent variables can be addressed by adding extra variables to the model. For
example, if market size is important to the model, then add market size to it. One problem with
adding variables, however, is deciding which variable to add; it is not always possible to collect all
the relevant data.
In linear regression, the individual features are generally used as they are, i.e., linearly. Adding
non-linear features (created by either transforming one of the existing features or combining two
or more features) to the model is equivalent to adding new variables to the model. Using such
features in the model is called linear regression with non-linear features, which is explained
below.
Some common transformations are taking the logarithm of a feature, its square root, or some
other functional mapping, based on what is more relevant to the problem. When combining
features, the product of two or more features, or some other nonlinear combination of them, can
be taken into account.
If a variable is transformed and the transformed version is useful for the model, it is still a linear
regression problem, just with more features. A nonlinear feature might be a combination of more
than two of the inherent features, but mathematically it is still a least squares problem. So adding
nonlinear features still keeps the underlying mathematics a linear equation.
Note: The algorithm is still called linear regression after adding nonlinear features because the
linearity assumption in linear regression means the model is linear in the parameters (i.e., the
coefficients of the variables) and may or may not be linear in the variables.
It is also possible that the combination of two features is useful while the individual features are
not. In that case, we can replace the original features with the new combined feature. When we
don't know, it is better to keep the old features as well as the new one.
Let us have a look at the two models we got for our advertising example. For the model with the
original features, R² = 0.897. After adding the product of the features as a new feature, R² = 0.968.
It can be observed that adding the new feature gives the model a better R². So, in such a case it is
advisable to retain the product of the features along with the original features.
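As a sketch of this idea (on synthetic advertising-like data, so the R² values will differ from the 0.897 and 0.968 reported above), the product of TV and radio can be appended as an extra column before fitting:

```python
# Sketch of adding a nonlinear (interaction) feature on synthetic advertising-like data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 200
tv = rng.uniform(0, 300, n)
radio = rng.uniform(0, 50, n)
# true relationship contains a TV x radio interaction plus noise
sales = 3 + 0.045 * tv + 0.19 * radio + 0.001 * tv * radio + rng.normal(0, 1.5, n)

X_base = np.column_stack([tv, radio])
X_inter = np.column_stack([tv, radio, tv * radio])   # original features + their product

print(LinearRegression().fit(X_base, sales).score(X_base, sales))    # R^2 without interaction
print(LinearRegression().fit(X_inter, sales).score(X_inter, sales))  # higher R^2 with interaction
```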
Adding too many non-significant variables to the model may lead to overfitting in the model. The
model will start capturing the noise of the data, and will not be able to make relevant, generic
predictions over unseen data.
Overfitting
Adding more and more variables may seem good when assessing the performance of the model
on the training data, but adding an excessive number of features will lead to overfitting. As more
features are added, the model starts following the noise, which causes problems when predicting
on new or unseen data. So, we need a way to decide when to stop adding further variables.
One of the foremost methods to deal with overfitting is regularization. The two most common
regularization techniques found useful for dealing with overfitting in linear regression models are
Ridge regularization and Lasso regularization. Let us understand them one by one.
In Ridge Regression, the loss function contains an extra term called the regularization term. The
parameters are not allowed to follow the noise too closely; they are kept in control by penalizing
them. This may be understood as trying to fit the data in an economical way. If the penalization
factor α is 0, then no regularization is done; if it is greater than zero, then it is a regularized
regression. Mathematically, the loss function of ridge regression can be given as follows:
Regularized loss function = loss function + α Σⱼ₌₁ⁿ θⱼ²
In Lasso Regression, the modulus (absolute value) of the weights is used as the additive term in
the loss function of linear regression. It sets many of the weights to zero and in this way makes
the model sparse. When α is big, the weights are small; a larger α means a more sparse solution.
Mathematically, it can be given as follows:
Regularized loss function = loss function + α Σⱼ₌₁ⁿ |θⱼ|
Sparsity means that if the feature vector 𝑋 has dimension 100, then θ will also have dimension
100; if most of these 100 entries of θ are 0, then θ is a sparse vector.
In general, to find the best working regularization method, it is advisable to try both of them with
different values of the regularization hyperparameter and use the one that works best.
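A minimal way to try both, assuming synthetic data and a small grid of α values, is sketched below; it shows Ridge shrinking the weights while Lasso drives the small ones exactly to zero:

```python
# Minimal comparison on synthetic data: Ridge shrinks weights towards zero,
# Lasso sets small weights exactly to zero as alpha increases.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(0, 0.5, 100)   # only two features matter

for alpha in [0.01, 0.1, 1.0, 10.0]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    lasso = Lasso(alpha=alpha).fit(X, y)
    print(f"alpha={alpha:5}: ridge={np.round(ridge.coef_, 3)}  lasso={np.round(lasso.coef_, 3)}")
```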
The diagram below is drawn between the weights and the output feature. It can be seen that
Ridge regression has a bias in favor of zero: it moves the weights towards zero. With Lasso, if a
weight is small, it is brought to zero. The black line is the regression line without regularization,
the blue line is Ridge regression, and the red line is Lasso regression. In Lasso regression, it can
be observed that when the weights are close to zero, they are suppressed to exactly zero. So,
Lasso penalizes small weights all the way down to zero.
As we know, in Lasso regression, if the weights get close to zero, they are made exactly zero. The
figure below shows the variation of θ with α, with θ on the y-axis and α on the x-axis. It can be
observed that for newspaper and radio, θ is made exactly zero as α increases, hence these two
variables vanish because their weights are made 0. The other two features, TV and the
combination of TV and radio, are retained.
It is a common observation that as we increase the number of features in the model, the training
error drops. This happens because as the number of features increases, the model starts
capturing the noise in the data itself. Let us go through the figure below.
Here, the y-axis represents the training error and the x-axis depicts the number of features used.
As the number of features increases, the training error goes down.
Bias-variance tradeoff
Loosely, bias can be interpreted as the training error of the model, while variance is the testing
error of the model. In real life, we do not prefer a model with a high bias or a high variance. It
must be a model with low bias and low variance because that model is supposed to make
reliable predictions. It is found that having very few features will lead to underfitting and too many
features will lead to overfitting of the model.
To resolve this, let us understand the concept of a validation set. A validation set is a subset of
the original data that is used to validate and tune the performance of the model. It is created by
randomly dividing the data into training and validation sets. The model is trained on the training
set and validated by making predictions on the validation set. The validation set should come
from the same population.
From the below figure, it can be observed that as the number of features increases, the training
error goes down, while the validation error has a different pattern. It initially goes down but after a
while, it starts going up.
The appropriate number of features corresponds to the point where the validation error is at its
minimum, i.e., in the valley of the convex part of the figure. The validation error is high both for a
low number of features and for a high number of features. When the number of features is low,
the model is so simple that it does not capture the patterns in the data; it has learned too little to
perform well on unseen data. When the number of features is too high, the model has become so
complex that it cannot make generic predictions on unseen data. Somewhere between these two
extremes, there is a sweet spot where the model is flexible enough to learn from the training set in
a way that it can use efficiently to make generic predictions on unseen data.
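The sketch below reproduces this pattern on synthetic data (hypothetical sizes and coefficients): as more and more mostly irrelevant features are added, the training MSE keeps falling while the validation MSE eventually rises.

```python
# Train/validation error as more (mostly irrelevant) features are added: synthetic data,
# training error keeps dropping while validation error eventually rises (overfitting).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
n, p = 120, 40
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 1.0, n)   # only 3 informative features

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

for k in [1, 3, 5, 10, 20, 40]:
    model = LinearRegression().fit(X_tr[:, :k], y_tr)
    tr_err = mean_squared_error(y_tr, model.predict(X_tr[:, :k]))
    va_err = mean_squared_error(y_va, model.predict(X_va[:, :k]))
    print(f"{k:2d} features: train MSE={tr_err:.2f}  validation MSE={va_err:.2f}")
```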
There are a few drawbacks to this validation process. They are listed below:
1. Some data is wasted and not used for training
To resolve this, the next idea is the concept of LOOCV (Leave-One-Out Cross-Validation). This is
explained below:
Leave-One-Out Cross-Validation
Here, the model is trained on n-1 (n is the number of records) data points and 1 data point is kept
for testing. That is why it is called Leave one out cross-validation. The process is repeated n
times and will lead to making 𝑛 different models. At the end, it calculates the mean of errors over
the n repetitions. The process can be understood with the following figure:
In the above figure, there are 100 data points. Out of them, 99 are selected for training at each
iteration, and a hundred such models are trained.
There are certain advantages of this method that are listed below:
1. No variability due to the random choice of validation set. Here, the validation set is not
chosen randomly; every data point appears exactly once in the validation set.
2. Uses the full data for training. It uses all the data points for training, as only one point is
left out for validation at each iteration.
There are also disadvantages associated with this method that are as follows:
1. Have to train n times. As only one point is left in the validation set at a time, there are a
total of n iterations of training the model. This is computationally challenging and
time-consuming.
2. The n prediction errors are highly dependent. As the different models share most of the
records they are trained on, the prediction errors are highly dependent on each other.
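A minimal LOOCV sketch with scikit-learn (synthetic data) is shown below; it fits n models, each validated on a single held-out point:

```python
# LOOCV sketch: n models, each validated on one held-out point, errors averaged at the end.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.7]) + rng.normal(0, 1.0, 100)

scores = cross_val_score(LinearRegression(), X, y,
                         cv=LeaveOneOut(),
                         scoring="neg_mean_squared_error")
print(len(scores))      # 100 models, one per held-out point
print(-scores.mean())   # mean of the 100 squared prediction errors
```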
K-fold cross-validation
In this method, the entire data is divided into K folds. One fold is kept for validation, while the
others are used for training the model. This can be understood as a more practical version of
LOOCV. The steps are as follows: divide the data into K folds, train the model on K−1 folds and
validate it on the remaining fold, repeat this K times so that each fold is used once for validation,
and average the K validation errors.
This method can be used to compare different sets of parameters and features. The figure below
shows the variation of the mean squared error with the number of folds. As the number of folds
increases, the error also increases. Generally, the K corresponding to a certain threshold error is
taken as the final K.
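A corresponding K-fold sketch (synthetic data, K = 5) is shown below:

```python
# K-fold sketch: 5 folds, each fold used once for validation while the rest trains the model.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.7]) + rng.normal(0, 1.0, 100)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=kf, scoring="neg_mean_squared_error")
print(-scores)         # validation MSE of each fold
print(-scores.mean())  # average validation MSE across the 5 folds
```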
Commonly, it is observed that for different models trained on different samples from the same
population, the error is also different. So, there is inherent variance in the performance of the
model. To overcome this, one of the solutions is bootstrapping the model.
In bootstrapping, from a given dataset, a large number of datasets are created using random
sampling with replacement. In sampling with replacement, one data point is taken from the
dataset and then placed back into it; then another data point is taken at random from the dataset,
and so on, until a new sample of the desired size is obtained.
Bootstrapping is done when we can't get more data from the same population. From the same
data, more datasets are created by repetition. This removes the dependency of the output on a
single data point and thereby handles the variance in the data.
We can find the coefficients for each sample and plot them as a histogram. This distribution is
called a sampling distribution, as we are plotting an estimate computed from multiple samples of
the original data.
We can have more confidence in the estimates reported by the sampling distribution due to the
central limit theorem. The bootstrap method creates new synthetic datasets of the same size as
the original dataset (if the size of each sample is taken to be n). It creates the samples by
sampling from the original dataset with replacement.
We think of θ as an estimate produced from a data sample drawn from that distribution, and we
repeat this process a certain number of times. Say we have 𝑚 bootstrap samples; we then get 𝑚
estimates, generated from different datasets, but all of these datasets come from the same
representative distribution.
We then calculate the average of the different estimates and look at the distance of a typical
estimate from that average; this reflects the inherent variance in our estimates. Using that, we can
calculate the standard error of the estimator.
θₐᵥₑ = (1/𝑚) Σᵢ₌₁ᵐ θᵢ

Var(θ) = (1/𝑚) Σᵢ₌₁ᵐ (θᵢ − θₐᵥₑ)²

se(θ) = √Var(θ)
The general idea is to learn something about the distribution of the estimates by resampling, by
reusing the data in a clever way many times.
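Finally, a small bootstrap sketch (synthetic data, hypothetical coefficient values) that follows the formulas above: resample the rows with replacement m times, refit the model each time, and use the spread of the coefficient estimates as their standard error.

```python
# Bootstrap sketch: resample rows with replacement m times, refit, and use the spread
# of the coefficient estimates as an estimate of their standard error.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(8)
n = 200
X = rng.normal(size=(n, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(0, 1.0, n)

m = 1000
coefs = np.empty((m, 2))
for i in range(m):
    idx = rng.integers(0, n, n)                       # sample n rows with replacement
    coefs[i] = LinearRegression().fit(X[idx], y[idx]).coef_

theta_ave = coefs.mean(axis=0)                        # average of the m estimates
se = coefs.std(axis=0)                                # sqrt of (1/m) * sum of squared deviations
print(theta_ave, se)
```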