0% found this document useful (0 votes)
9 views23 pages

03 Linear Regression Intuition

Uploaded by

elrasifasmaa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views23 pages

03 Linear Regression Intuition

Uploaded by

elrasifasmaa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 23

Machine Learning

Linear Regression Intuition

Mostafa S. Ibrahim
Teaching, Training and Coaching for more than a decade!

Artificial Intelligence & Computer Vision Researcher


PhD from Simon Fraser University - Canada
Bachelor / MSc from Cairo University - Egypt
Ex-(Software Engineer / ICPC World Finalist)

© 2023 All rights reserved.


Please do not reproduce or redistribute this work without permission from the author
Recall
● Supervised learning
○ We are given both the input (X) and its output (Y)
○ We want to be able to map X to Y (predict [‫ ]ﺗوﻗﻊ‬Y given the input X)
● Regression
○ The Y that we predict is a real-valued output (e.g. 0.7)
○ For example, we can predict the price of a property based on the features and/or location
○ Forget about the English meaning of regression. There is some history for it.
● Classification
○ The Y that we predict is a discrete-valued output (e.g. 3)
○ Example: is this image a cat, dot or cow?
House Price Prediction
● One of the most common ML tasks is predicting the price of a property
● For simplicity, we will assume we have only a single factor: the house size
● Goal: Learn and Predict
● 1) Learn
○ We collect data from 'N' pairs of examples (input being the size, output being the price)
○ Learn the patterns/associations in these data
● 2) Predict
○ Given a new query of a ‘house size’, predict the ‘price’
House Price Prediction: Dataset
ID Size in meters2 Price (target)
● N = 8 (8 training samples)
0 100 250,000 ● Size is X (input)
● Price is Y (output)
1 130 340,000
● The i-th examples is referred with: x(i), y(i)
2 200 550,000

3 250 700,000

4 270 760,000

5 300 850,000

6 325 925,000

7 400 115,0000
Visualization
● Visualization is
a critical key for success
● Can you guess the price
of a house of size
350 m2?
● How can we model this
data such that in
future we can make a
prediction automatically?
Modeling the data as a line
● The data seems came from a linear equation (mx+c)
● Use 2 points to compute these 2 parameters (weights)
● With some math:
y = 3 * x - 50
● Given x = 350
y = 1000 (thouthands)
● We just learned from data
Real-life Data
● Sadly, real life data has
variance (e.g. property house
between 250k +/- 10k
● From numbers perspective,
we can think, there is some
noise added to our data
● Observe: the x and y range is
now close to the [0, 1] range
How to model such noisy data?
● Our intuition is, this data really
came from some line
● How can we find one good line to
fit the data as closely as possible?
○ Good is a vague word!
○ How to define the criteria?
Which line is a better fit?
● We have 6 data points (e.g. size vs price)
● 2 lines are proposed here
● Which one is a better fit?
● How did you decide so?
Criteria: The closest!
● We need the line that is closer to most of the points!
● How can we measure how close a line is to a set of data points?
● We need to use some distance metric between the ground truth and the
prediction
○ Assume our dataset has point (size=200, price = 350,000)
○ Using size=200 in a line gives us price = 350,427
○ We need to compute the distance between 350,000 and 350,427
○ There is an error from the difference between these 2 values
■ Target (ground truth) vs prediction (of our model, the line)
Distance metric between 2 values
● Linear regression typically uses
the squared error cost function
○ A cost function returns a numerical
value based on the error
● The squared error function
computes:
(target - prediction)2
○ Error: 6.75 - 4.5 = 2.25
■ Aka residual
○ Squared error =2.25 x 2.25
Mean Squared Error (MSE)
● Now, we know how to compute the error of a single point
● What about a dataset of M points?
● Simply sum the cost and average it
● Why average? To give the average squared error per example
○ Yi is the ground truth
○ We can calculate square root to get the average error (+ve)

● Without the 1/n, it is called the sum of square error (SSE)


Can we use other metrics?
● Can we use other metrics such as the absolute error |T-P|? Yes
● There are many reasons for preferring squared error (easy to optimize,
derivative everywhere, has a closed form, works in practice)
● However, one important reason is related to gaussian noise
○ You can find more mathematical details in famous books, such as Bishop’s book
○ The web also has a lot of details
○ Gaussian also plays a vital role in math and statistics
● One major drawback: sensitivity to outliers
● We may return to this concern later on
○ While rare, interviewers can sometimes stress regarding the deep differences
○ Observe: the normal distribution formula is a function in Euclidean distance
Back to the 2 lines
● Now we know a formula that can assign the total error/cost of a line
○ MSE: target vs prediction
● Then we can simply choose the
line that has the smallest error
○ E.g. 5.0 (red) vs 27.9 (blue)
● The key question now is:
Given the dataset, how can
we find the parameters (m, c)
that minimize the cost function?
/2
Code snippet
Question (src)
● We have a line and 2 datasets. Which data has the higher MSE?
Question!
● Assume we learned the below line of data (size vs price in thouthands)
● Now a new query is: what is the price for a house of size 600?
Question!
● Assume all our data have the same x (e.g. multiple prices of the same house
specs)
○ E.g. (5, 3), (5, 7), (5, 50)
● What is an optimal line representing the data?

● Any line passing with the point x = x, y = average(ys)


○ x = 5, y = 20
Relevant Materials
● Prof Andrew Ng: video1, video2
● Hesham Asem (Arabic): video1, video
“Acquire knowledge and impart it to the people.”

“Seek knowledge from the Cradle to the Grave.”

You might also like