
Course Name : MACHINE LEARNING

Course Code : 20AD04


Course Instructor : M. Rajesh Reddy
Semester : V
Regulation : R20
Unit: 3

UNIT-3: SYLLABUS
Regression
Introduction to regression analysis, Simple
linear regression, Multiple linear regression,
Assumptions in Regression Analysis, Main
Problems in Regression Analysis, Improving
Accuracy of the linear regression model,
Polynomial Regression Model, Logistic
Regression, Regularization, Regularized
Linear Regression, Regularized Logistic
Regression.
3.1 Introduction
In the context of regression, dependent variable (Y) is
the one whose value is to be predicted.
e.g. the price quote of the real estate
The dependent variable (Y) is functionally related to one
(say, X) or more independent variables called
predictors.
Regression is essentially finding a relationship (or)
association between the dependent variable (Y) and the
independent variable(s) (X), i.e. to find the function ‘f ’
for the association Y = f (X).
The most common regression algorithms are
◦ Simple linear regression
◦ Multiple linear regression
◦ Polynomial regression
◦ Multivariate adaptive regression splines
◦ Logistic regression
◦ Maximum likelihood estimation (least squares)
3.2 Simple Linear Regression
Simple linear regression is the simplest regression
model which involves only one predictor.
This model assumes a linear relationship between the
dependent variable and the predictor variable as shown
in Figure
3.2 Simple Linear Regression
In the context of real estate problem, if we take Price of a
Property as the dependent variable and the Area of the
Property (in sq. m.) as the predictor variable, we can build a
model using simple linear regression.

Y = a + bX

where ‘a’ and ‘b’ are the intercept and slope of the straight line.
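As a quick illustration (not part of the original example), the sketch below fits a straight line price = a + b × area to a small hypothetical dataset using NumPy; all area and price values are made up for demonstration.

```python
import numpy as np

# Hypothetical data: area of the property (sq. m.) vs. price (in $1000s)
area = np.array([50, 75, 100, 125, 150, 175, 200], dtype=float)
price = np.array([110, 150, 205, 245, 300, 340, 395], dtype=float)

# np.polyfit with deg=1 returns [slope b, intercept a] for price = a + b * area
b, a = np.polyfit(area, price, deg=1)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
print("predicted price for 120 sq. m.:", a + b * 120)
```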
3.2.1 Slope of the simple
linear regression model
• Slope of a straight line
represents how much the line
in a graph changes in the
vertical direction (Y-axis) over
a change in the horizontal
direction (X-axis)
Slope = Change in Y/Change in X
Rise is the change in Y-axis (Y2 − Y1 )
Run is the change in X-axis (X2 − X1 ).
3.2 Simple Linear Regression
3.2.1 Slope of the simple linear
regression model Cont…
There can be two types of slopes in a
linear regression model: positive slope
and negative slope.
Different types of regression lines
based on the type of slope include:
◦ Linear positive slope
◦ Curve linear positive slope
◦ Linear negative slope
◦ Curve linear negative slope
3.2 Simple Linear Regression
3.2.1 Slope of the simple linear regression
model Cont…
1. Linear positive slope
A positive slope always moves upward on a
graph from left to right

Slope = Rise/Run = (Y2 − Y1) / (X2 − X1) = Delta(Y) / Delta(X)

Scenario 1 for positive slope: Delta (Y) is positive and Delta (X) is positive
Scenario 2 for positive slope: Delta (Y) is negative and Delta (X) is negative
3.2 Simple Linear Regression
3.2.1 Slope of the simple linear regression
model Cont…
2. Curve linear positive slope
Curves in these graphs slope upward from left to
right.

Slope for a variable (X) may vary between two graphs, but it will always be positive; hence, the above graphs are called graphs with curve linear positive slope.
3.2 Simple Linear Regression
3.2.1 Slope of the simple linear regression model
Cont…
3. Linear negative slope
A negative slope always moves downward on a graph
from left to right.
As X value (on X-axis) increases, Y value decreases

Slope = Rise/Run = (Y2 − Y1) / (X2 − X1) = Delta(Y) / Delta(X)

Scenario 1 for negative slope: Delta (Y) is positive and Delta (X) is negative
Scenario 2 for negative slope: Delta (Y) is negative and Delta (X) is positive
3.2 Simple Linear Regression
3.2.1 Slope of the simple linear regression
model Cont…
4. Curve linear negative slope
Curves in these graphs slope downward from left
to right.

Slope for a variable (X) may vary between two graphs, but it will always be negative; hence, the above graphs are called graphs with curve linear negative slope.
3.2 Simple Linear Regression
3.2.2 No relationship graph
Scatter graph shown in Figure indicates ‘no
relationship’ curve as it is very difficult to
conclude whether the relationship between X
and Y is positive or negative.
3.2 Simple Linear Regression
3.2.3 Error in simple regression
For a regression model, X and Y values are
provided to the machine, and it identifies the
values of a (intercept) and b (slope) by
relating the values of X and Y.
Identifying the exact match of values for a
and b is not always possible. There will be
some error value (ɛ) associated with it. This
error is called marginal or residual error.
3.2 Simple Linear Regression
3.2.4 Example of simple regression
A college professor believes that if the
grade for internal examination is high in
a class, the grade for external
examination will also be high.
A random sample of 15 students in that
class was selected, and the data is given
below:
3.2 Simple Linear Regression
3.2.4 Example of simple regression Cont…
A scatter plot was drawn to explore the relationship
between the independent variable (internal marks)
mapped to X-axis and dependent variable (external
marks) mapped to Y-axis

• We can observe from the above graph, the line (i.e. the regression
line) does not predict the data exactly
• Instead, it just cuts through the data.
• Some predictions are lower than expected, while some others are
higher than expected.
3.2 Simple Linear Regression
3.2.4 Example of simple regression Cont…
Residual is the distance between the
predicted point on the regression line and the
actual point.
3.2 Simple Linear Regression
3.2.4 Example of simple regression Cont…
In simple linear regression, the line is drawn using the
regression formula.

Procedure to calculate the values of ‘a’ and ‘b’ for a


given set of X and Y values:
◦ A straight line is drawn as close as possible over the points
on the scatter plot.
◦ Ordinary Least Squares (OLS) is the technique used to
estimate a line that will minimize the error (ε)
◦ Calculate the Sum of the Squares of the Errors (SSE)
◦ It is observed that the SSE is least when b takes the value
  b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²
◦ The corresponding value of ‘a’, calculated using the above value of ‘b’, is
  a = Ȳ − b × X̄
  where X̄ and Ȳ are the means of X and Y.
3.2 Simple Linear Regression
3.2.4 Example of simple regression Cont…
Calculate the value of a and b for the given
example.
◦ Sum of X = 299
◦ Sum of Y = 852
◦ Mean X, MX = 19.93
◦ Mean Y, MY = 56.8
◦ Sum of squares (SSX ) = 226.9333
◦ Sum of products (SP) = 429.8
◦ Regression equation = ŷ = bX + a

The estimated regression equation is: ŷ = 1.89X + 19.05
In the context of the given problem:
Marks in external exam = 19.05 + 1.89 × (Marks in internal exam)
MExt = 19.05 + 1.89 × MInt
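The slope and intercept quoted above can be reproduced directly from these summary statistics, using b = SP / SSX and a = MY − b × MX; a minimal check in Python:

```python
# Summary statistics from the worked example
SP = 429.8        # sum of products, Σ(X - MX)(Y - MY)
SSX = 226.9333    # sum of squares of X, Σ(X - MX)^2
MX, MY = 19.93, 56.8

b = SP / SSX        # ≈ 1.89
a = MY - b * MX     # ≈ 19.05
print(f"b = {b:.2f}, a = {a:.2f}")
# Estimated regression equation: y_hat = 19.05 + 1.89 * x
```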
3.2 Simple Linear Regression
3.2.4 Example of simple regression Cont…
Detailed calculation of regression parameters
3.2 Simple Linear Regression
3.2.4 Example of simple regression Cont…

In the given problem : MExt = 19.05 + 1.89 × MInt


The value of the intercept from the above equation is 19.05. However, none of the internal marks is 0.
So, intercept = 19.05 indicates that 19.05 is the
portion of the external examination marks not
explained by the internal examination marks.
Slope measures the estimated change in the
average value of Y as a result of a one-unit
change in X.
Here, slope = 1.89 tells us that the average value
of the external examination marks increases by
1.89 for each additional 1 mark in the internal
examination.
3.2 Simple Linear Regression
3.2.4 Example of simple regression Cont…
The model built above can be represented
graphically as:
◦ an extended version
◦ a zoom-in version
3.2 Simple Linear Regression
3.2.5 OLS algorithm
Step 1: Calculate the mean of X and Y
Step 2: Calculate the errors of X and Y
Step 3: Get the product
Step 4: Get the summation of the
products
Step 5: Square the difference of X
Step 6: Get the sum of the squared
difference
Step 7: Divide the output of step 4 by the output of step 6 to obtain ‘b’
Step 8: Calculate ‘a’ using the value of ‘b’
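The eight steps above translate directly into code; below is a minimal sketch in plain Python (the x and y lists are placeholder data, not values from the notes):

```python
def ols_fit(x, y):
    """Ordinary Least Squares for one predictor: returns (a, b) in y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n                                   # Step 1: means of X and Y
    mean_y = sum(y) / n
    dev_x = [xi - mean_x for xi in x]                     # Step 2: deviations of X and Y
    dev_y = [yi - mean_y for yi in y]
    products = [dx * dy for dx, dy in zip(dev_x, dev_y)]  # Step 3: products
    sp = sum(products)                                    # Step 4: sum of products
    squared_dev_x = [dx ** 2 for dx in dev_x]             # Step 5: squared deviations of X
    ssx = sum(squared_dev_x)                              # Step 6: sum of squared deviations
    b = sp / ssx                                          # Step 7: slope = step 4 / step 6
    a = mean_y - b * mean_x                               # Step 8: intercept from the slope
    return a, b

# Example usage with placeholder data
a, b = ols_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 6])
print(f"a = {a:.2f}, b = {b:.2f}")
```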
3.2 Simple Linear Regression
3.2.6 Maximum and minimum point of
curves
Maximum and minimum points on a
graph are found at points where the
slope of the curve is zero.
It becomes zero either from positive or
negative value.
The maximum point is the point on the
curve of the graph with the highest y-
coordinate and a slope of zero.
The minimum point is the point on the
curve of the graph with the lowest y-
coordinate and a slope of zero.
3.2 Simple Linear Regression
3.2.6 Maximum and minimum point of
curves

• Point 63 is at the maximum point for this curve.


• Point 63 is at the highest point on this curve.
• It has a greater y-coordinate value than any other point
on the curve and has a slope of zero.
3.2 Simple Linear Regression
3.2.6 Maximum and minimum point of curves

• Point 40 is at the minimum point for this curve.


• Point 40 is at the lowest point on this curve.
• It has a smaller y-coordinate value than any other point on the curve and has a slope of zero.
3.3 Multiple Linear Regression
In a multiple regression model, two or more independent
variables, i.e. predictors are involved in the model.
Eg: In House Price Prediction problem, we can consider
Price of a Property (in $) as the dependent variable and Area
of the Property (in sq. m.), location, floor, number of years
since purchase and amenities available as the independent
variables.
We can form a multiple regression equation as:

Ŷ = a + b1X1 + b2X2 + … + bnXn

The following expression describes the equation involving the relationship with two predictor variables, namely X1 and X2:

Ŷ = a + b1X1 + b2X2

The model describes a plane in the three-dimensional space of Ŷ, X1 and X2.
Parameter ‘a’ is the intercept of this plane.
Parameters ‘b1’ and’ b2’ are referred to as partial regression
coefficients.
Parameter b1 represents the change in the mean response
corresponding to a unit change in X1 when X2 is held constant.
Parameter b2 represents the change in the mean response
corresponding to a unit change in X2 when X1 is held constant.
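As an illustration (the numbers below are made up, not taken from the notes), a minimal sketch of fitting such a plane with scikit-learn using two predictors:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical predictors: X1 = area (sq. m.), X2 = years since purchase
X = np.array([[100, 2], [150, 5], [120, 1], [200, 8], [180, 3], [90, 10]])
y = np.array([300, 410, 350, 520, 500, 220])   # price in $1000s (made up)

model = LinearRegression().fit(X, y)
a = model.intercept_        # intercept of the plane
b1, b2 = model.coef_        # partial regression coefficients
print(f"Y_hat = {a:.2f} + {b1:.2f}*X1 + {b2:.2f}*X2")
```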
3.3 Multiple Linear Regression
Consider the following example of a multiple linear
regression model with two predictor variables, namely
X1 and X2.
3.3 Multiple Linear Regression
The multiple regression equation for estimating Ŷ when there are ‘n’ predictor variables is as follows:

Ŷ = a + b1X1 + b2X2 + … + bnXn

• While finding the best-fit line, we can also fit a polynomial or curvilinear equation; such models are known as polynomial or curvilinear regression, respectively.
3.4 Assumptions in Regression Analysis

1. The dependent variable (Y) can be calculated / predicted as a linear function of a specific set of independent variables (X’s) plus an error term (ε).
2. The number of observations (n) is greater
than the number of parameters (k) to be
estimated, i.e. n > k.
3. Relationships determined by regression are
only relationships of association based on the
data set and not necessarily of cause and
effect of the defined class.
4. The regression line is valid only over a limited range of data. If the line is extrapolated outside the range of the observed data, it may lead to wrong predictions.
3.4 Assumptions in Regression Analysis
5. If the business conditions change and the business
assumptions underlying the regression model are no
longer valid, then the past data set will no longer be
able to predict future trends.
6. Variance is the same for all values of X
(homoskedasticity).
7. The error term (ε) is normally distributed. This also
means that the mean of the error (ε) has an expected
value of 0.
8. The values of the error (ε) are independent and are not
related to any values of X. This means that there are no
relationships between a particular X, Y that are related
to another specific value of X, Y.

Given the above assumptions, the OLS estimator is the Best Linear Unbiased Estimator (BLUE); this result is known as the Gauss-Markov theorem.
3.5 Main Problems in Regression Analysis
In multiple regression, there are two primary problems:
1. Multi-collinearity
2. Heteroskedasticity
3.5.1 Multi-collinearity
Two variables are perfectly collinear if there is an exact linear
relationship between them.
Multi-collinearity is the situation in which there is correlation not only between the dependent variable and the independent variables, but also strong correlation among the independent variables themselves.
A multiple regression equation can still make good predictions when there is multi-collinearity, but it is difficult to determine how the dependent variable will change if each independent variable is changed one at a time.
When multi-collinearity is present, it increases the standard
errors of the coefficients.
By inflating the standard errors, multi-collinearity can make some variables appear statistically insignificant when they should actually be significant.
One way to gauge multi-collinearity is to calculate the Variance
Inflation Factor (VIF), which assesses how much the variance of an
estimated regression coefficient increases if the predictors are
correlated. If no factors are correlated, the VIFs will be equal to 1.
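One simple way to compute VIF values, assuming the predictors are held in a NumPy array X of shape (n_samples, k), is to regress each predictor on the remaining ones and apply VIF_j = 1 / (1 − R_j²); a sketch with synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def vif_scores(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the others."""
    vifs = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)          # all predictors except column j
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        vifs.append(1.0 / (1.0 - r2))
    return vifs

# Synthetic example: x3 is almost an exact linear function of x2, so both get large VIFs
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = 2 * x2 + rng.normal(scale=0.1, size=100)
print(vif_scores(np.column_stack([x1, x2, x3])))
```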
3.5 Main Problems in Regression Analysis
The assumption of no perfect collinearity
states that there is no exact linear
relationship among the independent
variables.
This assumption implies two aspects of
the data on the independent variables.
◦ First, none of the independent variables,
other than the variable associated with the
intercept term, can be a constant.
◦ Second, variation in the X’s is necessary: the more variation in the independent variables, the better the OLS estimates will be in terms of identifying the impacts of the different independent variables on the dependent variable.
3.5 Main Problems in Regression Analysis
3.5.2 Heteroskedasticity
It refers to the changing variance of the error term.
If the variance of the error term is not constant across
data sets, there will be erroneous predictions.
For a regression equation to make accurate predictions, the error terms should be independent and identically (normally) distributed.
Mathematically, this assumption is written as:

var(u_i | X_i) = σ² (a constant) and cov(u_i, u_j | X_i, X_j) = 0 for i ≠ j

where ‘var’ is the variance, ‘cov’ is the covariance, ‘u’ represents the error terms, and ‘X’ represents the independent variables.

This assumption is more commonly written as var(u_i) = σ² for all i.
3.6 Improving Accuracy of the Linear Regression
Model
The concepts of bias and variance are analogous to accuracy and prediction.
Accuracy refers to how close the estimate is to the actual value, whereas prediction refers to how consistent repeated estimates of the value are.
◦ High bias = low accuracy (not close to real value)
◦ High variance = low prediction (values are scattered)
◦ Low bias = high accuracy (close to real value)
◦ Low variance = high prediction (values are close to each other).
If we have a regression model which is highly accurate and
highly predictive; then, the overall error of our model will be low,
implying a low bias (high accuracy) and low variance (high
prediction). This is highly preferable.
If the variance increases (low prediction), the spread of our
data points increases, which results in less accurate
prediction.
If the bias increases (low accuracy), the error between our
predicted value and the observed values increases.
Therefore, balancing bias and variance is essential in a regression model.
3.6 Improving Accuracy of the Linear Regression
Model
In the linear regression model, it is assumed that the
number of observations (n) is greater than the number of
parameters (k) to be estimated, i.e. n > k. So, the least
squares estimates tend to have low variance and hence
will perform well on test observations.
If observations (n) is not much larger than parameters (k),
then there can be high variability in the least squares fit,
resulting in overfitting and leading to poor predictions.
If k > n, then linear regression is not usable. This also
indicates infinite variance, and so, the method cannot be
used at all.
Accuracy of linear regression can be improved using the
following three methods:
1. Shrinkage Approach
2. Subset Selection
3. Dimensionality (Variable) Reduction
3.6 Improving Accuracy of the Linear Regression
Model
3.6.1 Shrinkage (Regularization) approach
By limiting (shrinking) the estimated coefficients, we can try to
reduce the variance at the cost of a negligible increase in bias.
This can lead to substantial improvements in the accuracy of the
model.
In a multiple regression model, a few of the variables used may not be associated with the overall response; these are called irrelevant variables.
Shrinkage approach involves fitting a model involving all
predictors.
The estimated coefficients are shrunken towards zero relative to
the least squares estimates.
This shrinkage (also known as regularization) has the effect of
reducing the overall variance.
Some of the coefficients may also be estimated to be exactly
zero, thereby indirectly performing variable selection.
The two best-known techniques for shrinking the regression
coefficients towards zero are:
1. ridge regression
2. lasso (Least Absolute Shrinkage Selector Operator)
3.6 Improving Accuracy of the Linear Regression
Model
3.6.1 Shrinkage (Regularization) approach
1. Ridge regression:
Ridge regression performs L2 regularization, i.e. it adds penalty
equivalent to square of the magnitude of coefficients.
Minimization objective of ridge =
LS Obj + α × (sum of square of coefficients)
Ridge regression adds the “squared magnitude” of the coefficients as a penalty term to the loss function; this penalty term is the L2 regularization element.
Ridge regression works best in situations where the least
squares estimates have high variance.
Ridge regression will perform better when the response is a
function of many predictors, all with coefficients of roughly equal
size.
One disadvantage with ridge regression is that it will include all k
predictors in the final model.
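As an illustration (not from the notes), a minimal sketch of ridge regression with scikit-learn on synthetic data; scikit-learn’s `alpha` parameter plays the role of the penalty weight α in the objective above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))                  # 50 observations, 10 predictors
true_coef = np.full(10, 0.5)                   # many predictors with roughly equal coefficients
y = X @ true_coef + rng.normal(scale=0.5, size=50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=5.0).fit(X, y)             # larger alpha => stronger shrinkage

print("OLS coefficients:  ", np.round(ols.coef_, 2))
print("Ridge coefficients:", np.round(ridge.coef_, 2))   # shrunk towards zero, but none exactly zero
```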
3.6 Improving Accuracy of the Linear Regression
Model
3.6.1 Shrinkage (Regularization) approach
2. Lasso (Least Absolute Shrinkage and Selection
Operator) regression:
Lasso regression performs L1 regularization, i.e. it adds
penalty equivalent to the absolute value of the magnitude
of coefficients.
Minimization objective of lasso =
LS Obj + α × (absolute value of the magnitude of
coefficients)
Lasso Regression adds “absolute value of magnitude” of
coefficient as penalty term to the loss function.
The lasso can be expected to perform better in a setting
where a relatively small number of predictors have
substantial coefficients, and the remaining predictors have
coefficients that are very small or equal to zero.
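A corresponding sketch for the lasso on synthetic data where only two predictors truly matter; note how most estimated coefficients come out exactly zero, which is the indirect variable selection mentioned earlier:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10))
true_coef = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 0])   # only two relevant predictors
y = X @ true_coef + rng.normal(scale=0.5, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)
print(np.round(lasso.coef_, 2))      # most coefficients are exactly 0.0
```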
3.6 Improving Accuracy of the Linear Regression
Model
3.6.2 Subset Selection
Identify a subset of the predictors that is assumed
to be related to the response and then fit a model
using OLS on the selected reduced subset of
variables.
There are two methods in which subset of the
regression can be selected:
1. Best subset selection
2. Stepwise subset selection
1. Best subset selection:
◦ In best subset selection, we fit a separate least
squares regression for each possible subset of the k
predictors.
◦ This procedure considers all the possible (2^k) models containing subsets of the k predictors.
◦ For computational reasons, best subset selection cannot be applied with a very large number of predictors (k).
3.6 Improving Accuracy of the Linear Regression
Model
3.6.2 Subset Selection
2. Stepwise subset selection:
Two types of stepwise subset selection:
1. Forward stepwise selection (0 to k)
2. Backward stepwise selection (k to 0)
Forward stepwise selection begins with a model
containing no predictors, and then, predictors are added
one by one to the model, until all the k predictors are
included in the model.
At each step, the variable (X) that gives the highest
additional improvement to the fit is added.
Forward stepwise selection is a computationally efficient alternative to best subset selection, as it considers a much smaller set of models (a sketch of this procedure is given after the description of backward selection below).
Backward stepwise selection begins with the least
squares model which contains all k predictors and then
iteratively removes the least useful predictor one by one.
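A compact sketch of the forward stepwise procedure described above, greedily adding at each step the predictor that most improves the fit; here the fit is measured by training R², though in practice adjusted R², AIC, or cross-validated error would normally be used:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def forward_stepwise(X, y, max_predictors):
    """Greedy forward selection: add the predictor that most improves R^2 at each step."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_predictors:
        scores = {}
        for j in remaining:
            cols = selected + [j]
            model = LinearRegression().fit(X[:, cols], y)
            scores[j] = model.score(X[:, cols], y)   # training R^2 with predictor j added
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic example: only columns 0 and 3 actually drive y
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
y = 2 * X[:, 0] - 3 * X[:, 3] + rng.normal(scale=0.5, size=200)
print(forward_stepwise(X, y, max_predictors=2))   # typically selects columns 3 and 0
```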
3.6 Improving Accuracy of the Linear Regression
Model
3.6.3 Dimensionality reduction (Variable
reduction)
Subset selection or shrinkage approaches
control variance either by using a subset of the
original variables or by shrinking their
coefficients towards zero.
In dimensionality reduction, predictors (X) are
transformed, and the model is set up using the
transformed variables after dimensionality
reduction.
The number of variables is reduced using the
dimensionality reduction method.
Principal component analysis is one of the most
important dimensionality (variable) reduction
techniques.
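A minimal sketch of this idea as principal component regression: the predictors are transformed with PCA and a linear model is fitted on the reduced components (synthetic data; the choice of two components is arbitrary):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 8))            # 8 original, possibly correlated predictors
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=100)

# Reduce 8 predictors to 2 principal components, then regress on the components
pcr = make_pipeline(PCA(n_components=2), LinearRegression()).fit(X, y)
print("R^2 on training data:", round(pcr.score(X, y), 3))
```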
3.7 Polynomial Regression Model
Polynomial Regression is a regression algorithm that
models the relationship between a dependent(y) and
independent variable(x) as nth degree polynomial.
Polynomial Regression equation is:
y = b0 + b1x1 + b2x1^2 + b3x1^3 + … + bnx1^n
It is a linear model with some modification in order to
increase the accuracy.
The dataset used in Polynomial regression for training
is of non-linear nature.
It makes use of a linear regression model to fit the
complicated and non-linear functions and datasets.
In Polynomial regression, the original features are
converted into Polynomial features of required degree
(2,3,..,n) and then modeled using a linear model.
Consider an example where the input value (X) is 35 and the degree of the polynomial is 2: we will then use 35^0, 35^1, and 35^2 as features, which helps capture the non-linear relationship in the data.
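As an illustration of this feature-expansion idea (not the dataset used in the notes), the sketch below expands a single feature into degree-2 polynomial features and fits an ordinary linear model on them:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(5)
X = np.linspace(0, 10, 30).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 - 2 * X.ravel() + rng.normal(scale=1.0, size=30)   # quadratic + noise

# Degree-2 expansion: each x becomes [1, x, x^2], then a linear model is fitted on these features
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
print("prediction at x = 5:", poly_model.predict([[5]]))
```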
3.7 Polynomial Regression Model
Need for Polynomial Regression:
If we apply a linear model on a linear dataset, then it
provides us a good result as we have seen in Simple
Linear Regression.
But if we apply the same model without any modification to a non-linear dataset, it will produce poor results: the loss function will increase, the error rate will be high, and the accuracy will decrease.
So for such cases, where data points are arranged in a
non-linear fashion, we need the Polynomial
Regression model.
3.7 Polynomial Regression Model
Example: Let us use the below data set of (X, Y) for
degree 3 polynomial.

the regression
line is slightly
curved for
polynomial
degree 3 with
the above 15
data points.
3.7 Polynomial Regression Model
Example: Let us use the below data set of (X, Y) for
degree 14 polynomial.

At degree 14,
(extreme case)
the regression line
will be overfitting
into all the original
values of X.
3.8 Logistic Regression
Logistic regression is one of the most popular
Machine Learning algorithms, which comes under
the Supervised Learning technique.
It is used for predicting the categorical
dependent variable using a given set of
independent variables.
Logistic regression predicts the output of a
categorical dependent variable. Therefore the
outcome must be a categorical or discrete value.
It can be either Yes or No, 0 or 1, True or False, etc.; but instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.
Linear Regression is used for solving Regression
problems, whereas Logistic regression is used
for solving the classification problems.
3.8 Logistic Regression
In Logistic regression, instead of fitting a regression
line, we fit an "S" shaped logistic function, which
predicts two maximum values (0 or 1).
The curve from the logistic function indicates the
likelihood of something such as whether the cells
are cancerous or not, a mouse is obese or not
based on its weight, etc
3.8 Logistic Regression
In logistic regression, dependent variable (Y) is
binary (0,1) and independent variables (X) are
continuous in nature.
The probabilities describing the possible
outcomes (probability that Y = 1) of a single trial
are modeled as a logistic function of the
predictor variables.
In the logistic regression model, a chi-square
test is used to gauge how well the logistic
regression model fits the data.
The goal of logistic regression is to predict the
likelihood that Y is equal to 1 given certain
values of X.
So, we are predicting probabilities rather than
the scores of the dependent variable.
3.8 Logistic Regression
Example: We might try to predict whether or not a small project
will succeed or fail on the basis of the number of years of
experience of the project manager handling the project.
We presume that those project managers who have been
managing projects for many years will be more likely to succeed.
This means that as X (the number of years of experience of
project manager) increases, the probability that Y will be equal to
1 (success of the new project) will tend to increase.
If we take a hypothetical example in which 60 already executed projects were studied and the years of experience of the project managers ranges from 0 to 40 years, we could represent this tendency of the probability that Y = 1 to increase with a graph.
To illustrate this, it is convenient to segregate years of
experience into categories (i.e. 0–8, 9–16, 17–24, 25–32, 33–
40). If we compute the mean score on Y (averaging the 0s and
1s) for each category of years of experience, we will get
something like
3.8 Logistic Regression
When the graph is drawn for the above values of X
and Y, it appears like the graph in Figure 8.18.
As X increases, the probability that Y = 1 increases.
In other words, when the project manager has more
years of experience, a larger percentage of projects
succeed.
A perfect relationship is represented by a perfectly curved S rather than a straight line.
3.8 Logistic Regression
In logistic regression, we use a logistic function,
which always takes values between zero and one.
The logistic formulae are stated in terms of the
probability that Y = 1, which is referred to as P. The
probability that Y is 0 is 1 − P.

ln(P / (1 − P)) = a + bX

The ‘ln’ symbol refers to the natural logarithm, and a + bX is the regression line equation.
Probability (P) can also be computed from the regression
equation. So, if we know the regression equation, we
could, theoretically, calculate the expected probability that
Y = 1 for a given value of X.

P = exp(a + bX) / (1 + exp(a + bX))

where ‘exp’ is the exponential function.
3.8 Logistic Regression
Example: Let us say we have a model that can
predict whether a person is male or female on
the basis of their height. Given a height of 150
cm, we need to predict whether the person is
male or female.
We know that the coefficients are a = −100 and b = 0.6.
Using the above equation, we can calculate the
probability of male given a height of 150 cm or
more formally P(male|height = 150).

P(male|height = 150) = exp(−100 + 0.6 × 150) / (1 + exp(−100 + 0.6 × 150)) = exp(−10) / (1 + exp(−10)) ≈ 0.0000454, i.e. a probability of near zero that the person is a male.
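The arithmetic above can be verified directly with the logistic formula P = exp(a + bX) / (1 + exp(a + bX)); a minimal check:

```python
import math

a, b = -100, 0.6
height = 150

z = a + b * height                        # -100 + 0.6 * 150 = -10
p_male = math.exp(z) / (1 + math.exp(z))  # ≈ 0.0000454
print(round(p_male, 7))                   # a probability of nearly zero
```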


3.8 Logistic Regression
Assumptions in logistic regression:
1. There exists a linear relationship between
logit function and independent variables
2. The dependent variable Y must be
categorical (1/0) and take binary value,
e.g. if pass then Y = 1; else Y = 0
3. The data meets the ‘iid’ criterion, i.e. the
error terms, ε, are independent from one
another and identically distributed
4. The error term follows a binomial
distribution [n, p]
n = # of records in the data
p = probability of success (pass, responder)
3.9 Regularization
Regularization refers to techniques used to calibrate
machine learning models to minimize the adjusted loss
function and avoid overfitting or underfitting.
Using Regularization, we can fit our machine learning
model appropriately on a given test set and hence reduce
the errors in it.
This technique can be used in such a way that it allows us to maintain all the variables or features in the model while reducing their magnitude. That is, in the regularization technique, we reduce the magnitude of the coefficients of the features while keeping the same number of features.
3.9 Regularization
How Regularization works?
Regularization works by adding a penalty or complexity term to the
complex model.
Consider the simple linear regression equation:
y = β0 + β1x1 + β2x2 + β3x3 + ⋯ + βnxn + b
Y represents the value to be predicted
X1, X2, …Xn are the features for Y.
β1,…..βn are the weights or magnitude attached to the features, respectively.
β0 represents the bias of the model,
b represents the intercept.
Linear regression models try to optimize the β0 and b to minimize the
cost function.
The equation for the cost function of the linear model is given below:

Cost function = Σ (yi − ŷi)², the sum of squared differences between actual and predicted values.

Now, we will add a penalty term to this cost function and optimize the parameters so that the model can predict the accurate value of Y.
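To make this concrete, the short sketch below computes the plain least-squares cost and the same cost with an L2 (ridge-style) penalty added; the data and coefficient vector are placeholders, not values from the notes:

```python
import numpy as np

def cost_ols(X, y, beta):
    """Sum of squared errors for predictions X @ beta."""
    residuals = y - X @ beta
    return np.sum(residuals ** 2)

def cost_ridge(X, y, beta, lam):
    """OLS cost plus the L2 penalty: lambda * sum(beta^2)."""
    return cost_ols(X, y, beta) + lam * np.sum(beta ** 2)

# Placeholder data and coefficients
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5]])
y = np.array([3.0, 2.5, 5.0])
beta = np.array([1.0, 0.8])

print(cost_ols(X, y, beta), cost_ridge(X, y, beta, lam=0.5))
```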
3.9 Regularization
Ridge Regularization :
Also known as Ridge Regression, it modifies the over-fitted or under
fitted models by adding the penalty equivalent to the sum of the squares
of the magnitude of coefficients.
The magnitude of coefficients is squared and added.
Ridge Regression performs regularization by shrinking the coefficients
present.
This is also called L2 regularization, since it adds a penalty equivalent to the square of the magnitude of the coefficients.
In this technique, the cost function is altered by adding the penalty term
to it. The amount of bias added to the model is called Ridge
Regression penalty.
In the cost function, the penalty term is represented by Lambda λ. By
changing the values of the penalty function, we are controlling the
penalty term.
it is used to prevent multicollinearity, and it reduces the model
complexity by coefficient shrinkage.
3.9 Regularization
Lasso Regularization :
Lasso regression is another regularization technique to reduce the complexity of the model. It stands for Least Absolute Shrinkage and Selection Operator.
It is similar to the Ridge Regression except that the penalty
term contains only the absolute weights instead of a
square of weights.
It is also called as L1 regularization.
The equation for the cost function of Lasso regression will be:

Cost = Σ (yi − ŷi)² + λ Σ |βj|
3.9 Regularization
Elastic Net Regularization :
Elastic net linear regression uses the penalties from both the
lasso and ridge techniques to regularize regression models.
The technique combines both the lasso and ridge regression
methods by learning from their shortcomings to improve the
regularization of statistical models.
In addition to setting and choosing a lambda value, elastic net also allows us to tune the alpha parameter, where α = 0 corresponds to ridge and α = 1 to lasso.
If we plug in 0 for alpha, the penalty function reduces to the L2 (ridge) term, and if we set alpha to 1, we get the L1 (lasso) term.
Cost function of Elastic Net Regularization:

Cost = Σ (yi − ŷi)² + λ [ α Σ |βj| + (1 − α) Σ βj² ]

Therefore, we can choose an alpha value between 0 and 1 to optimize the elastic net (here we can adjust the weighting of each regularization term, which is what gives it the name ‘elastic’).
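A minimal sketch with scikit-learn’s ElasticNet on synthetic data; note that scikit-learn’s parameter names differ from the symbols used above: its `l1_ratio` corresponds to the mixing parameter α in this section, while its `alpha` is the overall penalty strength λ:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

# l1_ratio=0.5 gives an even mix of L1 (lasso) and L2 (ridge) penalties
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(np.round(enet.coef_, 2))
```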
Unit-3 Question Bank
1. Explain about Simple Linear Regression Model with
suitable example.
2. Problems on Simple Linear Regression:
a) Finding regression equation
b) Calculating R-squared score
3. Discuss about OLS method in linear regression with an
example.
4. Write a short note on multi-linear regression.
5. What are the assumptions to be made for creating a successful regression model?
6. Discuss the major problems in regression analysis.
7. Explain various methods in improving the accuracy of a
regression model.
8. Illustrate polynomial regression with relevant example.
9. Differentiate linear and logistic regression
10. Discuss the need of regularization in regression and
explain different regularization techniques.
