Linear Regression
Objective
• Introduction to Linear Regression
• Regression use case
• Types of regression models
• Regression modelling
• Parameter estimation
Introduction
• In the late 1800s, Francis Galton was studying the relationship
between parents and their children.
• He investigated the relationship between the heights of fathers and their sons.
• He discovered that a man’s son tends to be roughly as tall as his father;
however, the son’s height tends to be closer to the overall average
height of all people.
Continuous values
• Consider a dataset that records
the engine size and Co2
emission of different cars.
• The question is: Given this
dataset, can we predict the
Co2 emission of a car,
using another field, such as
Engine size? Yes!
Scatter Plot
• To understand linear regression, we can
plot our variables here.
• Engine size – independent variable;
Emission – dependent/target value that
we would like to predict.
• A scatterplot clearly shows the relation
between variables where changes in
one variable "explain" or possibly
"cause" changes in the other variable.
• Also, it indicates that these variables are
linearly related.
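A minimal plotting sketch of this idea. The file name and column names
(FuelConsumption.csv, ENGINESIZE, CO2EMISSIONS) are assumptions for
illustration, not taken from the slides:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file and column names -- adjust to your own dataset.
df = pd.read_csv("FuelConsumption.csv")

# Engine size is the independent variable (x axis),
# Co2 emission is the dependent/target variable (y axis).
plt.scatter(df["ENGINESIZE"], df["CO2EMISSIONS"], alpha=0.5)
plt.xlabel("Engine size")
plt.ylabel("Co2 emission")
plt.title("Engine size vs. Co2 emission")
plt.show()
```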
Inference from Scatter Plot
• As the Engine Size increases, so
do the emissions.
• How do we use this line for
prediction now?
• Let us assume, for a moment, that
the line is a good fit of the data.
• We can then use it to predict the
emission of an unknown car.
Regression Modeling – Fitting Line
• The fitted line helps us predict the target value, Y, using the independent
variable 'Engine Size', represented on the X axis.
• The fitted line is traditionally written as a polynomial.
• In a simple regression problem (a single x), the form of the model is
ŷ = θ₀ + θ₁x₁
θ₀ = intercept   θ₁ = slope of the line
• Here ŷ is the dependent variable, or the predicted value, and x₁ is the
independent variable.
• θ₀ and θ₁ are the coefficients of the linear equation.
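As a minimal sketch, the model is just a straight-line function of one input;
the coefficient values used below are placeholders, not estimates from the data:

```python
def predict(x1: float, theta0: float, theta1: float) -> float:
    """Simple linear regression prediction: y_hat = theta0 + theta1 * x1."""
    return theta0 + theta1 * x1

# Placeholder coefficients, purely for illustration:
print(predict(2.0, theta0=100.0, theta1=40.0))  # -> 180.0
```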
Regression Modeling
ŷ = θ₀ + θ₁x₁
Now the questions are:
"How would you draw a line through the points?"
"How do you determine which line fits best?"
• Linear regression estimates the coefficients of the line.
• This means we must calculate 𝜽𝟎 and 𝜽𝟏 to find the best line to ‘fit’ the
data.
• Let’s see how we can adjust the parameters to make the line the best fit
for the data (a library-based fitting sketch follows below).
• Let’s assume we have already found the ‘best fit’ line for our data.
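In practice, a library can estimate this ‘best fit’ for us. A minimal sketch
using scikit-learn, reusing the hypothetical file and column names from the
earlier plotting sketch:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical file and column names -- adjust to your own dataset.
df = pd.read_csv("FuelConsumption.csv")
X = df[["ENGINESIZE"]]    # independent variable (2-D, as scikit-learn expects)
y = df["CO2EMISSIONS"]    # dependent/target variable

model = LinearRegression().fit(X, y)
print("theta0 (intercept):", model.intercept_)
print("theta1 (slope):", model.coef_[0])
```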
Model Error
• If we have, for instance, a car with engine
size x₁ = 5.4 and actual Co2 = 250,
• its Co2 should be predicted very close to
the actual value, y = 250, based on
historical data.
• But if we use the fitted line, it will return
ŷ = 340.
• Comparing the actual value with the one
predicted by our model, you will find
that we have a 90-unit error:
Error = ŷ – y = 340 – 250 = 90
• The prediction line is not accurate. This error is
also called the residual error.
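A minimal sketch of this residual calculation, using the example values above:

```python
# Actual and predicted Co2 emission for the example car (engine size 5.4).
y_actual = 250
y_hat = 340

# Residual error of a single prediction.
error = y_hat - y_actual
print(error)  # -> 90
```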
Mean Absolute Error
• θ₀ and θ₁ (the intercept and slope of the line) are the coefficients of the
fitted line.
• We need to calculate the means of the independent and dependent (target)
columns from the dataset.
θ₁ = Σᵢ₌₁ˢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ˢ (xᵢ − x̄)²   (sums over all s samples)
θ₀ = ȳ − θ₁ x̄
• For this dataset, x̄ = 3.34 and ȳ = 256, which gives
θ₁ = 39 and θ₀ = ȳ − θ₁ x̄ = 256 − 39 × 3.34 = 125.74
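A minimal numpy sketch of this closed-form estimation; the (engine size, Co2)
pairs below are made up purely to illustrate the formula, not taken from the
course dataset:

```python
import numpy as np

# Illustrative data only.
x = np.array([1.6, 2.0, 2.4, 3.5, 5.4])   # engine sizes
y = np.array([180, 200, 220, 280, 340])   # Co2 emissions

x_bar, y_bar = x.mean(), y.mean()

# Closed-form least-squares estimates of slope and intercept.
theta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
theta0 = y_bar - theta1 * x_bar

print("theta1 (slope):", theta1)
print("theta0 (intercept):", theta0)
```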
Making Predictions
• We can now write down the polynomial of the line:
ŷ = 125.74 + 39x₁
• Making predictions is as simple as solving the equation for a specific set
of inputs.
• Imagine we are predicting Co2 emission (y) from engine size (x) for the
automobile in record number 9. Looking at the dataset, x₁ = 2.4.
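• Plugging x₁ = 2.4 into the fitted line gives the predicted Co2 emission:
ŷ = 125.74 + 39 × 2.4 = 219.34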