
Unit 4: Regression

INTRODUCTION TO REGRESSION
• Regression is a well-known statistical technique to model the
predictive relationship between several independent variables
and one dependent variable.
• The objective is to find the best-fitting curve for a dependent
variable in a multidimensional space, with each independent
variable being a dimension.
• The curve could be a straight line, or it could be a nonlinear
curve.
• The quality of fit of the curve to the data can be measured by the coefficient of correlation (r), whose square (r²) is the proportion of variance explained by the curve.
POINT TO PONDER?
● “Imagine you have made plans with friends after a long time and you wish to go out, but you are not sure whether it will rain or not. It’s the monsoon season, but your mom says the air feels dry today, and therefore the probability of rain today is low. On the contrary, your sister believes that because it rained yesterday, it is likely to rain today. Considering you have no control over the weather, how will you decide whose opinion to take more seriously, keeping in mind the fact that you are impartial towards both?”

Source: https://www.dezyre.com/article/types-of-regression-analysis-in-machine-learning/410
[Figure: Rainfall/precipitation as the dependent variable, linearly correlated with independent variables such as geographical location, humidity, and wind speed.]
KEY STEPS

The key steps for regression are simple; a short Python sketch follows the list.


1. List all the variables available for making
the model.
2. Establish a Dependent Variable (DV) of
interest.
3. Examine visual (if possible) relationships
between variables of interest.
4. Find a way to predict DV using other
variables.
INDEPENDENT AND DEPENDENT VARIABLES
• In our example, what we are trying to predict is today’s precipitation level, which depends on the level of humidity and the rain received yesterday; hence it is called the dependent variable.
• The variables on which it depends are called independent variables.
• What we try to do with regression analysis is to model or quantify the relationship between these two kinds of variables, and hence predict one with the help of the other with a level of certainty.
• To solve our problem with a simple linear regression, we would collect the humidity level and precipitation level for the previous month and plot them.
REGRESSION ANALYSIS
• Regression analysis is a predictive modelling technique that analyzes the relation between the target or dependent variable and the independent variables in a dataset.
• The different types of regression analysis techniques are used when the target and independent variables show a linear or non-linear relationship with each other, and the target variable contains continuous values.
• The regression technique is used mainly to determine predictor strength, forecast trends, model time series, and examine cause-and-effect relations.
• Regression analysis is the primary technique for solving regression problems in machine learning using data modelling.
• It involves determining the best-fit line: a line that passes as close as possible to all the data points, chosen so that the distance of the line from each data point is minimized.

Original Source: https://www.upgrad.com/blog/types-of-regression-models-in-machine-learning/


Univariate vs Multivariate vs Time-series Regression

● Univariate: an input vector is fed to a regression model, which produces a single continuous output value.
● Multivariate: an input vector is fed to a regression model, which produces several continuous output values (continuous output value 1 … continuous output value n).
● Time-series: previous values (Xt-1, Xt) are fed to a regression model used as a prediction model, which produces the future value (Xt+1).
EVALUATING REGRESSION MODELS

ACCURACY IS NOT A METRIC FOR REGRESSION!

• A common question by beginners to regression predictive modeling projects is:

How do I calculate accuracy for my regression model?


• Accuracy (e.g. classification accuracy) is a measure for classification, not regression.
• We cannot calculate accuracy for a regression model.
• The skill or performance of a regression model must be reported as an error in those predictions.
• This makes sense if you think about it. If you are predicting a numeric value like a height or a
dollar amount, you don’t want to know if the model predicted the value exactly (this might be
intractably difficult in practice); instead, we want to know how close the predictions were to the
expected values.
• Error addresses exactly this and summarizes on average how close predictions were to their
expected values.
ERROR METRICS
• There are three error metrics that are commonly used for evaluating and
reporting the performance of a regression model; they are:
• Mean Squared Error (MSE).
• Root Mean Squared Error (RMSE).
• Mean Absolute Error (MAE)

• There are many other metrics for regression, although these are the most
commonly used. You can see the full list of regression metrics supported by
the scikit-learn Python machine learning library here:
• Scikit-Learn API: Regression Metrics.
Original Source: https://machinelearningmastery.com/regression-metrics-for-machine-learning/
1. MEAN SQUARED ERROR
• Mean Squared Error, or MSE for short, is a popular error metric for
regression problems.
• It is also an important loss function for algorithms fit or optimized using
the least squares framing of a regression problem. Here “least squares”
refers to minimizing the mean squared error between predictions and
expected values.
• The MSE is calculated as the mean or average of the squared differences
between predicted and expected target values in a dataset.
• The squaring also has the effect of inflating or magnifying large errors.
That is, the larger the difference between the predicted and expected
values, the larger the resulting squared positive error. This has the effect
of “punishing” models more for larger errors when MSE is used as a loss
function. It also has the effect of “punishing” models by inflating the
average error score when used as a metric.
• The mean squared error between your expected and predicted values
can be calculated using the mean_squared_error() function from the
scikit-learn library.
• The function takes a one-dimensional array or list of expected values and
predicted values and returns the mean squared error value.
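For instance, a minimal sketch (with made-up numbers) of computing the MSE with scikit-learn:

from sklearn.metrics import mean_squared_error

expected  = [3.0, -0.5, 2.0, 7.0]   # hypothetical true target values
predicted = [2.5,  0.0, 2.0, 8.0]   # hypothetical model predictions

# MSE is the mean of the squared differences between the two lists.
mse = mean_squared_error(expected, predicted)
print(mse)  # 0.375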
2. ROOT MEAN SQUARED ERROR
• The Root Mean Squared Error, or RMSE, is an extension of
the mean squared error.
• Importantly, the square root of the error is calculated,
which means that the units of the RMSE are the same as
the original units of the target value that is being predicted.
• As such, it may be common to use MSE loss to train a
regression predictive model, and to use RMSE to evaluate
and report its performance.
• MSE uses the square operation to remove the sign of each
error value and to punish large errors. The square root
reverses this operation, although it ensures that the result
remains positive.
• The root mean squared error between your expected and predicted values can be calculated by taking the square root of the value returned by the mean_squared_error() function from the scikit-learn library.
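Using the same made-up numbers as above, a minimal sketch of obtaining the RMSE from the MSE:

from math import sqrt
from sklearn.metrics import mean_squared_error

expected  = [3.0, -0.5, 2.0, 7.0]
predicted = [2.5,  0.0, 2.0, 8.0]

# RMSE is the square root of the MSE, in the same units as the target.
rmse = sqrt(mean_squared_error(expected, predicted))
print(rmse)  # ~0.612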
3. MEAN ABSOLUTE ERROR
• Mean Absolute Error, or MAE, is a popular metric because, like RMSE,
the units of the error score match the units of the target value that is
being predicted.
• MSE and RMSE punish larger errors more than smaller errors, inflating
or magnifying the mean error score. This is due to the square of the
error value. The MAE does not give more or less weight to different
types of errors and instead the scores increase linearly with increases
in error.
• As its name suggests, the MAE score is calculated as the average of
the absolute error values. Absolute or abs() is a mathematical function
that simply makes a number positive. Therefore, the difference
between an expected and predicted value may be positive or negative
and is forced to be positive when calculating the MAE.
• The mean absolute error between your expected and predicted values
can be calculated using the mean_absolute_error() function from the
scikit-learn library.
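And a matching sketch (same made-up numbers) for the MAE:

from sklearn.metrics import mean_absolute_error

expected  = [3.0, -0.5, 2.0, 7.0]
predicted = [2.5,  0.0, 2.0, 8.0]

# MAE is the mean of the absolute differences; errors grow the score linearly.
mae = mean_absolute_error(expected, predicted)
print(mae)  # 0.5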
REGRESSION ANALYSIS: COMMON ISSUES

[Figure: Four common issues in regression analysis: outliers, underfitting, overfitting, and heteroscedasticity.]
Outliers

● Outliers are values or data points that lie far away from the general population or distribution of the data.
● Outliers have the ability to skew the results of any ML model towards themselves.
● Therefore, it is necessary to detect them early on, or to use algorithms that are resistant to outliers.

Image Source: https://datascience.foundation/

Underfitting and Overfitting
● Bias: Assumptions made by a model to make a function easier to learn. It is, in effect, the error rate on the training data.
● Variance: The difference between the error rate on the training data and on the testing data is called variance.
● Underfitting: A statistical model or a machine learning algorithm is said to underfit when it cannot capture the underlying trend of the data, i.e., it performs poorly even on the training data, and consequently on the testing data as well.
● Overfitting: A statistical model is said to be overfitted when it does not make accurate predictions on testing data. When a model is trained too much, it starts learning from the noise and inaccurate entries in our data set, so it fits the training data closely but fails to generalize. (A sketch contrasting the two follows below.)

Image Source: https://datascience.foundation/
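A small sketch on synthetic data (not from the slides) that makes the distinction visible: a degree-1 polynomial underfits a curved signal, while a degree-15 polynomial overfits it (low training error, high testing error):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (40, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 40)  # noisy curved signal

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # underfit, reasonable fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    train_err = mean_squared_error(y_tr, model.predict(X_tr))
    test_err = mean_squared_error(y_te, model.predict(X_te))
    print(degree, train_err, test_err)  # large train/test gap = overfitting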
Heteroskedasticity

● Heteroskedasticity refers to situations where the variance of the residuals is unequal over a range of measured values.
● When running a regression analysis, heteroskedasticity results in an unequal scatter of the residuals (also known as the error term).
● If there is an unequal scatter of residuals, the population used in the regression contains unequal variance, and therefore the analysis results may be invalid.
● In our example, humidity predicts rainfall or precipitation. As humidity increases, the amount by which precipitation increases or decreases is variable, not fixed.

Image Source: https://datascience.foundation/
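As an illustrative sketch (on synthetic, deliberately heteroskedastic data), one way to spot the problem is to fit a regression and compare the spread of the residuals across the range of the predictor:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
humidity = rng.uniform(40, 100, 200).reshape(-1, 1)
# Synthetic rainfall whose noise grows with humidity (heteroskedastic).
rainfall = 0.3 * humidity.ravel() + rng.normal(0, 0.05 * humidity.ravel())

model = LinearRegression().fit(humidity, rainfall)
residuals = rainfall - model.predict(humidity)

# Unequal residual spread (a "fan" shape in a residual plot) at low vs
# high humidity suggests heteroskedasticity.
low = humidity.ravel() < 70
print(residuals[low].std(), residuals[~low].std())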
TYPES OF REGRESSION
LINEAR REGRESSION
● Linear regression is one of the most basic types of regression in machine learning.
● The linear regression model consists of a predictor (independent) variable and a dependent variable related linearly to each other.
● Linear regression with one predictor or independent variable is called Simple Linear Regression.
● In case the data involves more than one independent variable, linear regression is called Multiple Linear Regression.
● The equation below denotes the linear regression model:
● y = mx + c + e
● where m is the slope of the line, c is the intercept, and e represents the error in the model.
● The best-fit line is determined by varying the values of m and c.
● The prediction error is the difference between the observed value and the predicted value.
● The values of m and c are selected so as to give the minimum prediction error. It is important to note that a simple linear regression model is susceptible to outliers.
Source: https://medium.com/machine-learning-id/simple-linear-regression-teori-d4abebd1ade2
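A minimal sketch of fitting y = mx + c with scikit-learn, on hypothetical humidity/precipitation numbers:

import numpy as np
from sklearn.linear_model import LinearRegression

humidity = np.array([[60], [65], [70], [75], [80], [85]])  # one predictor
precip   = np.array([2.0, 2.6, 3.1, 3.9, 4.4, 5.2])        # dependent variable

model = LinearRegression().fit(humidity, precip)           # picks m and c
print("m (slope):", model.coef_[0])
print("c (intercept):", model.intercept_)
print("prediction at humidity 90:", model.predict([[90]])[0])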
Multiple Linear Regression
The equation for a multiple linear regression is shown below:

y = b0 + b1X1 + b2X2 + … + bnXn + e

n stands for the number of independent variables.

More variables are added as features increase!
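Extending the same sketch to multiple linear regression, one coefficient b1..bn is learned per independent variable (the columns below are hypothetical):

import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: humidity, wind speed, yesterday's rainfall (all made up).
X = np.array([[60, 10, 0.0],
              [70, 12, 1.5],
              [80,  8, 2.0],
              [85, 14, 3.5],
              [90,  9, 4.0]])
y = np.array([2.0, 3.2, 4.1, 5.0, 5.8])  # today's rainfall

model = LinearRegression().fit(X, y)
print("b1..bn:", model.coef_)      # one coefficient per variable
print("b0:", model.intercept_)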
LOGISTIC REGRESSION
• Logistic regression is one of the types of regression analysis techniques, and is used when the dependent variable is discrete.
• Example: 0 or 1, true or false, etc. This means the target variable can have only two values, and a sigmoid curve denotes the relation between the target variable and the independent variable.
• The logit function is used in logistic regression to measure the relationship between the target variable and the independent variables. Below is the equation that denotes logistic regression:
logit(p) = ln(p/(1-p)) = b0 + b1X1 + b2X2 + b3X3 + … + bkXk
where p is the probability of occurrence of the feature.
LOGISTIC REGRESSION (CONTD.)

● When logistic regression is applied to real-world problems, like detecting cancer in people, p here would tell the probability of whether the person has cancer or not.
● A p of less than 0.5 would be mapped to "no cancer", and anything greater would map to "cancer".
● Logistic regression is a linear method, but the predictions are transformed using the logistic function.
● Its curve follows the curve of the logistic (sigmoid) function.
Original Source: https://www.upgrad.com/blog/types-of-regression-models-in-machine-learning/
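A sketch of logistic regression on a made-up binary problem; predict_proba returns the probability p that the logit equation above models:

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])  # hypothetical feature
y = np.array([0, 0, 0, 1, 1, 1])                          # discrete target

model = LogisticRegression().fit(X, y)
p = model.predict_proba([[3.5]])[0, 1]  # probability of class 1
print(p, "-> class", int(p >= 0.5))     # p >= 0.5 maps to class 1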
Sources:
1. https://www.statisticshowto.com/regularized-regression/
2. https://algoritmaonline.com/
3. https://www.statology.org/lasso-regression/
[Figure: Bull’s Eye Example, contrasting a simple linear regression fit with an overfitted model.]
Regularization to reduce Overfitting

● Regularized regression is a type of regression where the coefficient estimates are constrained (shrunk) towards zero. The magnitude (size) of the coefficients, as well as the magnitude of the error term, is penalized. Complex models are discouraged, primarily to avoid overfitting.
● There are two types of regression that are quite familiar and use this regularization technique, namely:
○ Ridge Regression
○ Lasso Regression
Ridge Regression

● Ridge regression is a variation of linear regression. We use ridge regression to tackle the multicollinearity problem, which inflates the variance of the coefficient estimates.
● So, to reduce this variance, a degree of bias is added to the regression estimates.
● It can be seen that the main idea of ridge regression is to add a little bias in order to reduce the variance of the estimator.
Ridge Regression

● It can be seen that the greater the value of λ (lambda), the more horizontal the regression line becomes, and so the coefficient values approach 0.
● If λ = 0, the output is similar to simple linear regression.
● If λ is very large, the coefficient values approach 0. (A sketch of this behaviour follows below.)
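A sketch of this behaviour on synthetic data (scikit-learn exposes λ as the alpha parameter of Ridge):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(0, 0.1, 50)

# Larger lambda (alpha) shrinks the coefficients towards 0;
# alpha = 0 behaves like plain linear regression.
for alpha in (0.0, 1.0, 100.0):
    print(alpha, Ridge(alpha=alpha).fit(X, y).coef_)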
Lasso Regression

● LASSO (Least Absolute Shrinkage and Selection Operator) is another variation of linear regression, like ridge regression. We use lasso regression when we have a large number of predictor variables.
● Lasso regression is a type of linear regression that uses shrinkage. Shrinkage is where data values are shrunk towards a central point, like the mean.
● This type is very useful when you have high levels of multicollinearity, or when you want to automate certain parts of model selection, like variable selection/parameter elimination.
Difference between Ridge and Lasso Regression

● Lasso regression and ridge regression are both known as regularization methods because they both attempt to minimize the sum of squared residuals (RSS) along with some penalty term.
● In other words, they constrain or regularize the coefficient estimates of the model.
● The main difference is that ridge regression shrinks coefficients close to 0, so all predictor variables are retained, whereas LASSO can shrink a coefficient to exactly 0, and can therefore select and discard predictor variables by assigning them a coefficient of exactly 0.
● When we use ridge regression, the coefficients of each predictor are shrunken towards zero, but none of them can go completely to zero.
● Conversely, when we use lasso regression, it is possible for some of the coefficients to go completely to zero when λ gets sufficiently large. (A small comparison sketch follows below.)
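An illustrative sketch of that difference on synthetic data, where only the first two of five predictors actually matter (the alpha values are arbitrary):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, 100)

print("ridge:", Ridge(alpha=10).fit(X, y).coef_)   # all small but nonzero
print("lasso:", Lasso(alpha=0.5).fit(X, y).coef_)  # irrelevant ones exactly 0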
Which is better: Ridge or Lasso Regression?

● In cases where only a small number of predictor variables are significant, lasso regression tends to perform better, because it is able to shrink insignificant variables completely to zero and remove them from the model.
● However, when many predictor variables are significant in the model and their coefficients are roughly equal, ridge regression tends to perform better, because it keeps all of the predictors in the model.
● To determine which model is better at making predictions, we perform k-fold cross-validation: whichever model produces the lowest test mean squared error (MSE) is the preferred model to use, as sketched below.
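A sketch of that comparison with scikit-learn's cross_val_score (synthetic data, arbitrary alpha values):

import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5))
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, 100)

for name, model in [("ridge", Ridge(alpha=1.0)), ("lasso", Lasso(alpha=0.1))]:
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    print(name, -scores.mean())  # mean test MSE over 5 folds; lower is better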
