03 Linear Regression Intuition
Mostafa S. Ibrahim
Teaching, Training and Coaching for more than a decade!
#   Size (m²)   Price
3   250         700,000
4   270         760,000
5   300         850,000
6   325         925,000
7   400         1,150,000
Visualization
● Visualization is a critical key to success
● Can you guess the price of a house of size 350 m²?
● How can we model this data such that we can make predictions automatically in the future?
Modeling the data as a line
● The data seems to have come from a linear equation (y = mx + c)
● Use 2 points to compute these 2 parameters (weights)
● With some math: y = 3x - 50
● Given x = 350: y = 1000 (thousands)
● We just learned from data
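The two-point computation above can be sketched in a few lines of Python; the function name and the chosen pair of rows from the table are illustrative assumptions:

```python
# Sketch: recover y = m*x + c from two (size, price-in-thousands) points
# taken from the table above; line_from_points is an illustrative name.
def line_from_points(x1, y1, x2, y2):
    m = (y2 - y1) / (x2 - x1)  # slope between the two points
    c = y1 - m * x1            # intercept from one point and the slope
    return m, c

m, c = line_from_points(250, 700, 300, 850)  # two rows of the table
print(m, c)            # 3.0 -50.0, i.e. y = 3x - 50
print(m * 350 + c)     # 1000.0 (thousands) for a 350 m² house
```

Any two rows of the (noise-free) table yield the same line, which is exactly why two points are enough to fix the two parameters.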
Real-life Data
● Sadly, real-life data has variance (e.g. a house priced around 250k may vary by +/- 10k)
● From a numbers perspective, we can think of it as some noise added to our data
● Observe: the x and y ranges are now close to the [0, 1] range
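One way to picture "noise added to our data" is to simulate it. This is a minimal sketch assuming the true line y = 3x - 50 from the earlier slide; the Gaussian noise scale is an illustrative assumption:

```python
import random

# Sketch: simulate noisy linear data around the assumed true line y = 3x - 50.
# The noise scale (std dev ~10, i.e. +/- ~10k) is an illustrative assumption.
random.seed(0)  # fixed seed so the simulation is reproducible
sizes = [250, 270, 300, 325, 400]
prices = [3 * x - 50 + random.gauss(0, 10) for x in sizes]
print(prices)  # close to, but not exactly on, the true line
```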
How to model such noisy data?
● Our intuition is that this data really came from some line
● How can we find one good line that fits the data as closely as possible?
○ Good is a vague word!
○ How to define the criteria?
Which line is a better fit?
● We have 6 data points (e.g. size vs price)
● 2 lines are proposed here
● Which one is a better fit?
● How did you decide so?
Criteria: The closest!
● We need the line that is closest to most of the points!
● How can we measure how close a line is to a set of data points?
● We need to use some distance metric between the ground truth and the
prediction
○ Assume our dataset has point (size=200, price = 350,000)
○ Plugging size = 200 into a candidate line gives price = 350,427
○ We need to compute the distance between 350,000 and 350,427
○ The difference between these 2 values is an error
■ Target (ground truth) vs prediction (of our model, the line)
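The single-point comparison above can be written directly; the numbers are the slide's example:

```python
# Sketch: the error for one example, using the slide's numbers.
target = 350_000      # ground truth price for size = 200
prediction = 350_427  # what the candidate line predicts for size = 200
error = target - prediction
print(error)  # -427: the line overshoots this point by 427
```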
Distance metric between 2 values
● Linear regression typically uses
the squared error cost function
○ A cost function returns a numerical
value based on the error
● The squared error function
computes:
(target - prediction)²
○ Error: 6.75 - 4.5 = 2.25
■ Aka the residual
○ Squared error = 2.25 × 2.25 = 5.0625
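The same arithmetic in code, using the slide's example values:

```python
# Sketch: squared error for the slide's example (target 6.75, prediction 4.5).
target, prediction = 6.75, 4.5
residual = target - prediction   # 2.25, aka the residual
squared_error = residual ** 2    # 2.25 * 2.25 = 5.0625
print(squared_error)  # 5.0625
```

Squaring makes the cost positive either way the line misses (above or below the point), and penalizes large misses more heavily than small ones.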
Mean Squared Error (MSE)
● Now, we know how to compute the error of a single point
● What about a dataset of M points?
● Simply sum the squared errors and average them:
MSE = (1/M) × Σᵢ (Yᵢ - Ŷᵢ)²
● Why average? To get the average squared error per example
○ Yᵢ is the ground truth, Ŷᵢ is the prediction
○ We can take the square root (RMSE) to get an average error in the original units (always positive)
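Putting it together, MSE over a small dataset can be sketched as follows; the slightly noisy target values are illustrative, with predictions from the line y = 3x - 50 seen earlier:

```python
import math

# Sketch of MSE over M points: average of squared residuals.
# Targets are illustrative noisy prices (thousands); predictions come
# from the assumed line y = 3x - 50.
sizes   = [250, 270, 300, 325, 400]
targets = [702, 757, 853, 921, 1155]          # ground truth Yi
preds   = [3 * x - 50 for x in sizes]         # model predictions

M = len(targets)
mse = sum((y, p)[0] - p and (y - p) ** 2 or 0 for y, p in zip(targets, preds)) / M  # noqa
mse = sum((y - p) ** 2 for y, p in zip(targets, preds)) / M
rmse = math.sqrt(mse)  # back to the original units, always non-negative
print(mse, rmse)  # 12.6 and ~3.55
```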