Linear Regression

The document discusses the concepts of breast cancer classification and housing price prediction using linear regression. It outlines the differences between classification and regression problems, provides examples, and explains the cost function and gradient descent in linear regression. The document emphasizes the importance of fitting a line to data and evaluating its accuracy through a cost function.

Uploaded by

vagifsamadov2003
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
7 views64 pages

Linear Regression

The document discusses the concepts of breast cancer classification and housing price prediction using linear regression. It outlines the differences between classification and regression problems, provides examples, and explains the cost function and gradient descent in linear regression. The document emphasizes the importance of fitting a line to data and evaluating its accuracy through a cost function.

Uploaded by

vagifsamadov2003
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 64

Breast Cancer (malignant, benign)

Classification: discrete-valued output (0 or 1).

(Figure: tumor size [feature 1] on the x-axis; “Malignant?” on the y-axis,
taking the values 0 (N) and 1 (Y).)

Andrew Ng
Breast Cancer (malignant, benign)

(Figure: age vs. tumor size, with malignant and benign examples separated.
Other possible features: clump thickness, uniformity of cell size,
uniformity of cell shape.)
Predicting Housing Prices

(Figure: a dataset of house sizes and prices; the goal is to predict the
price of a house of, say, 3000 square feet.)
Predicting Housing Prices

(Figures: price in $1000s (0–500) on the y-axis vs. size in feet² (0–3000)
on the x-axis. Fitting one model to the data predicts a price of about 220
for a 1250 ft² house; a different fit predicts about 270.)
Housing Prices

(Figure: price in $1000s vs. size in feet².)

Supervised Learning: we are given the “right answer” (the actual price)
for each example in the data.
Regression Problem: predict a continuous real-valued output.
You’re running a company, and you want to develop learning algorithms to address each
of two problems.

Problem 1: You have a large inventory of identical items. You want to predict how many
of these items will sell over the next 3 months.
Problem 2: You’d like software to examine individual customer accounts, and for each
account decide if it has been hacked/compromised.

Should you treat these as classification or as regression problems?


(a) Treat both as classification problems.

(b) Treat problem 1 as a classification problem, problem 2 as a regression problem.


(c) Treat problem 1 as a regression problem, problem 2 as a classification problem.

(d) Treat both as regression problems.

Linear Regression with One Variable
Model representation
Training Set of housing prices (m = 47):

Size in feet² (x)    Price ($) in 1000s (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …

Notation:
m = number of training examples
x = “input” variable / feature; e.g. x^(1) = 2104
y = “output” variable / “target” variable; e.g. y^(1) = 460
(x^(i), y^(i)) = i-th training example
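The notation above can be made concrete with a short Python sketch. Only the four table rows shown on the slide are used, so `m` below is 4 rather than the full training set's 47:

```python
# The slide's notation in code, using just the four visible table rows.
xs = [2104, 1416, 1534, 852]  # input variable x: size in feet^2
ys = [460, 232, 315, 178]     # output/target variable y: price in $1000s

m = len(xs)              # number of training examples (in this subset)
x1, y1 = xs[0], ys[0]    # first training example (x^(1), y^(1))
print(m, x1, y1)  # 4 2104 460
```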
Training Set → Learning Algorithm → hypothesis h

How do we represent h?

h_θ(x) = θ0 + θ1·x

h maps the size of a house, x, to an estimated price, y.

Linear regression with one variable is also called univariate linear
regression.
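The hypothesis h_θ(x) = θ0 + θ1·x is just a line, and translates directly into code. A minimal sketch (the parameter values in the example call are illustrative, not taken from the slides):

```python
def h(x, theta0, theta1):
    """Hypothesis for univariate linear regression: h_theta(x) = theta0 + theta1*x."""
    return theta0 + theta1 * x

# Illustrative parameters: predict price in $1000s from size in feet^2.
print(h(1250, 50.0, 0.1))  # 175.0
```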
Linear regression with one variable
Cost function

How do we fit the best possible line to our data?

h_θ(x) = θ0 + θ1·x
Hypothesis: h_θ(x) = θ0 + θ1·x
θ0, θ1: the model's parameters.
How do we choose θ0 and θ1?
(Figure: three example lines produced by different choices of θ0 and θ1.)
Idea: choose θ0, θ1 so that h_θ(x) is close to y for our training
examples (x, y).
Cost Function

Find θ0 and θ1 that make h_θ(x) close to y, i.e. minimize the squared
error summed over the training set:

minimize over θ0, θ1:  Σ_{i=1..m} (h_θ(x^(i)) − y^(i))²

Scaling by 1/(2m) gives the squared-error cost function:

J(θ0, θ1) = (1/2m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i))²

Goal: min over θ0, θ1 of J(θ0, θ1)
Cost Function

The cost function measures how far off our predictions are, on average,
from the actual data.
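The squared-error cost function above can be written in a few lines of Python. A minimal sketch (the function and variable names are my own):

```python
def compute_cost(xs, ys, theta0, theta1):
    """J(theta0, theta1) = (1/2m) * sum over i of (h_theta(x^(i)) - y^(i))^2."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# A perfect fit has zero cost: the line y = x passes through these points.
print(compute_cost([1, 2, 3], [1, 2, 3], 0.0, 1.0))  # 0.0
```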
Linear regression with one variable
Cost function intuition I

Simplified hypothesis (set θ0 = 0):
Hypothesis: h_θ(x) = θ1·x
Parameter: θ1
Cost function: J(θ1) = (1/2m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i))²
Goal: minimize over θ1 J(θ1)
(Left: for fixed θ1, h_θ(x) = θ1·x is a function of x, plotted against
the training points (1,1), (2,2), (3,3). Right: J(θ1) as a function of
the parameter θ1.)

J(θ1) = (1/2m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i))²
      = (1/2m) Σ_{i=1..m} (θ1·x^(i) − y^(i))²

For θ1 = 1, the line passes through every training point:
J(1) = (1/2m)(0² + 0² + 0²) = 0
For θ1 = 0.5:
J(0.5) = (1/2m)[(0.5 − 1)² + (1 − 2)² + (1.5 − 3)²] = 3.5/6 ≈ 0.58
For θ1 = 0:
J(0) = (1/2m)[1² + 2² + 3²] = 14/6 ≈ 2.33
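The three worked values of J can be checked in a few lines of Python, using the toy training set (1,1), (2,2), (3,3) from the plots:

```python
xs, ys = [1, 2, 3], [1, 2, 3]
m = len(xs)

def J(theta1):
    # Simplified cost (theta0 = 0): J(theta1) = (1/2m) * sum((theta1*x - y)^2)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

print(round(J(1.0), 2), round(J(0.5), 2), round(J(0.0), 2))  # 0.0 0.58 2.33
```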
(Plotting J(θ1) for many values of θ1 traces out the whole curve; it is
minimized at θ1 = 1, the best-fitting line for this data.)
Linear regression with one variable
Cost function intuition II

Hypothesis: h_θ(x) = θ0 + θ1·x
Parameters: θ0, θ1
Cost function: J(θ0, θ1) = (1/2m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i))²
Goal: minimize over θ0, θ1 J(θ0, θ1)
(Figures: with both parameters free, the cost J(θ0, θ1) is a 3-D
bowl-shaped surface over the (θ0, θ1) plane, often drawn as a contour
plot. Each point in the contour plot is one hypothesis; for example
θ1 = 0, θ0 = 360 gives the flat line h_θ(x) = 0·x + 360 on the housing
data. Points far from the minimum of J, such as θ0 = 800, θ1 = −0.15,
correspond to lines that fit the data poorly.)
Linear regression with one variable
Gradient descent
Have some function J(θ0, θ1).
Want: min over θ0, θ1 of J(θ0, θ1).

Outline:
• Start with some θ0, θ1.
• Keep changing θ0, θ1 to reduce J(θ0, θ1), until we hopefully end up at
a minimum.

(Figures: on a surface plot of J, gradient descent started from two
different initial points can descend to two different local minima.)
Gradient descent algorithm

repeat until convergence {
    θj := θj − α · ∂/∂θj J(θ0, θ1)    (for j = 0 and j = 1)
}

α is the learning rate / step size; ∂/∂θj J(θ0, θ1) is the update term /
direction of descent.

Correct (simultaneous update):
    temp0 := θ0 − α · ∂/∂θ0 J(θ0, θ1)
    temp1 := θ1 − α · ∂/∂θ1 J(θ0, θ1)
    θ0 := temp0
    θ1 := temp1

Incorrect (sequential update):
    temp0 := θ0 − α · ∂/∂θ0 J(θ0, θ1)
    θ0 := temp0
    temp1 := θ1 − α · ∂/∂θ1 J(θ0, θ1)    (uses the already-updated θ0)
    θ1 := temp1
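The simultaneous-update requirement is easy to get wrong in code. A sketch of one correct step (`grad0` and `grad1` stand in for the two partial derivatives; the names are my own):

```python
def step(theta0, theta1, grad0, grad1, alpha):
    # Correct: evaluate BOTH gradients at the old (theta0, theta1)...
    temp0 = theta0 - alpha * grad0(theta0, theta1)
    temp1 = theta1 - alpha * grad1(theta0, theta1)
    # ...then assign together. Updating theta0 first and feeding the new
    # value into grad1 would be the "incorrect" sequential variant.
    return temp0, temp1
```

The incorrect variant would instead compute `temp1 = theta1 - alpha * grad1(temp0, theta1)`, mixing old and new parameter values within one step.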
Linear regression with one variable
Gradient descent intuition

θj := θj − α · ∂/∂θj J(θ0, θ1)

α: learning rate / step size.  ∂/∂θj J(θ0, θ1): update term / direction
of descent.
θ1 := θ1 − α · (d/dθ1) J(θ1)

If the slope (d/dθ1) J(θ1) is positive:
    θ1 := θ1 − α · (positive number), so θ1 decreases, moving left
    toward the minimum.

If the slope (d/dθ1) J(θ1) is negative:
    θ1 := θ1 − α · (negative number), so θ1 increases, moving right
    toward the minimum.
If α is too small, gradient descent can be slow.

If α is too large, gradient descent can overshoot the minimum. It may
fail to converge, or even diverge.
At a local optimum the derivative is 0, so gradient descent leaves θ1
unchanged:

θ1 := θ1 − α · 0 = θ1
Gradient descent can converge to a local minimum even with the learning
rate α fixed: as we approach a local minimum, the derivative
(d/dθ1) J(θ1) shrinks toward 0, so gradient descent automatically takes
smaller steps. There is no need to decrease α over time.
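The shrinking steps can be seen on a toy one-parameter cost, say J(θ1) = θ1² with derivative 2·θ1 (my example, not from the slides): with fixed α, each step is proportional to the current derivative, so the steps shrink as θ1 approaches the minimum at 0.

```python
theta1, alpha = 4.0, 0.1
steps = []
for _ in range(3):
    step_size = alpha * 2 * theta1   # d/dtheta1 of theta1^2 is 2*theta1
    theta1 -= step_size
    steps.append(round(step_size, 3))
print(steps)  # [0.8, 0.64, 0.512] -- steps shrink without changing alpha
```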
Linear regression with one variable
Gradient descent for linear regression

Apply the gradient descent algorithm to the squared-error cost function
of the linear regression model.
∂/∂θj J(θ0, θ1) = ∂/∂θj (1/2m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i))²
                = ∂/∂θj (1/2m) Σ_{i=1..m} (θ0 + θ1·x^(i) − y^(i))²

j = 0:  ∂/∂θ0 J(θ0, θ1) = (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i))
j = 1:  ∂/∂θ1 J(θ0, θ1) = (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) · x^(i)
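Plugging these two partial derivatives into the update rule gives batch gradient descent for univariate linear regression. A minimal sketch (the toy data generated by y = 2x, α = 0.1, and the iteration count are my own choices):

```python
def gradient_descent(xs, ys, alpha=0.1, iters=1000):
    m = len(xs)
    theta0 = theta1 = 0.0
    for _ in range(iters):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                             # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m  # dJ/dtheta1
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# On data generated by y = 2x, the fit should approach theta0 = 0, theta1 = 2.
theta0, theta1 = gradient_descent([1, 2, 3], [2, 4, 6])
```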
Gradient descent algorithm

repeat until convergence {
    θ0 := θ0 − α · (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i))
    θ1 := θ1 − α · (1/m) Σ_{i=1..m} (h_θ(x^(i)) − y^(i)) · x^(i)
}
(update θ0 and θ1 simultaneously)
J()

local minimum

 global minimum

Andrew Ng
Our cost function is convex ➔ one global minimum

Andrew Ng
(Figures: as gradient descent runs, the hypothesis line on the data plot
fits the training points better and better, while the point (θ0, θ1) on
the contour plot moves step by step toward the minimum of J. The final
hypothesis predicts a price of about 250 (in $1000s) for a 1250 ft²
house.)
Summary
• Linear regression is used to predict a continuous value (like the sale
price of a house).
• Given a set of data, we try to fit a line to it.
• The cost function tells you how well your line fits the data.
• You can use gradient descent to find the parameters that minimize the
cost, i.e. the best line.