Linear Regression
Hanoi, 09/2016
Outline
- Linear Regression
- Gradient Descent
Review of Machine Learning

Training data (x^(i), y^(i)) is used to train a hypothesis h, a system that maps Input (X) to Output (Y).

Table 1: Training data of housing price in Hanoi

Size (m2)   Price (billion VND)
30          2.5
43          3.4
25          1.8
51          4.5
40          3.2
20          1.6
[Figure: scatter plot of the training data with the fitted line y = h(x) = a*x + b; x-axis: Size (m2), y-axis: Price (billion VND)]
Linear Regression

Objective:
Learn the function y = h(x) = a*x + b such that it minimizes the error (cost function) on the training data (an optimization problem).
Linear Regression

Cost Function:
The error for each training example (using the data from Table 1):

e^(1) = (1/2) (h(x^(1)) - y^(1))^2 = (1/2) (30a + b - 2.5)^2
e^(2) = (1/2) (h(x^(2)) - y^(2))^2 = (1/2) (43a + b - 3.4)^2
...
e^(m) = (1/2) (h(x^(m)) - y^(m))^2 = (1/2) (a x^(m) + b - y^(m))^2

The cost function is defined as:

E = (1/m) (e^(1) + e^(2) + ... + e^(m)) = (1/m) Σ_{i=1..m} e^(i) = (1/(2m)) Σ_{i=1..m} (h(x^(i)) - y^(i))^2

E = (1/(2m)) Σ_{i=1..m} (a x^(i) + b - y^(i))^2
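As a quick numerical check (a minimal Python sketch, not part of the original slides), the cost E can be evaluated directly on the Table 1 data:

```python
# Cost function E(a, b) = 1/(2m) * sum((a*x_i + b - y_i)^2) on the Table 1 data.
x = [30, 43, 25, 51, 40, 20]          # Size (m2)
y = [2.5, 3.4, 1.8, 4.5, 3.2, 1.6]    # Price (billion VND)

def cost(a, b):
    m = len(x)
    return sum((a * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

print(cost(0.0, 0.0))  # cost of the all-zero hypothesis h(x) = 0
```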
Gradient Descent

Objective:
Use the gradient of a function to find its minimum:

E = (1/(2m)) Σ_{i=1..m} (a x^(i) + b - y^(i))^2
Gradient Descent

1. Leave x0 unchanged
2. Change x0 in a random direction
3. Move x0 toward the global minimum
4. Decrease x0
Gradient Descent

d(x^2)/dx = 2x        d(f(x))^2/dx = 2 f(x) * df(x)/dx
Gradient Descent

∂E/∂a = (1/m) Σ_{i=1..m} (a x^(i) + b - y^(i)) * x^(i)

∂E/∂b = (1/m) Σ_{i=1..m} (a x^(i) + b - y^(i))
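These partial derivatives can be verified against central finite differences (a Python sketch, not part of the slides; the test point (a, b) = (0.1, 0.5) is an arbitrary choice):

```python
# Check the analytic gradients of E(a, b) against finite differences.
x = [30, 43, 25, 51, 40, 20]
y = [2.5, 3.4, 1.8, 4.5, 3.2, 1.6]
m = len(x)

def cost(a, b):
    return sum((a * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

def grad(a, b):
    da = sum((a * xi + b - yi) * xi for xi, yi in zip(x, y)) / m
    db = sum((a * xi + b - yi) for xi, yi in zip(x, y)) / m
    return da, db

# Analytic vs numerical gradient at an arbitrary point (a, b) = (0.1, 0.5)
eps = 1e-6
a, b = 0.1, 0.5
da, db = grad(a, b)
num_da = (cost(a + eps, b) - cost(a - eps, b)) / (2 * eps)
num_db = (cost(a, b + eps) - cost(a, b - eps)) / (2 * eps)
```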
Gradient Descent

Exercise:
Starting at a = 0 and b = 0, with learning rate α = 0.01, what is the value of the cost function?
Calculate the values of a and b after the first iteration (first step). Is the cost function reduced or not?

Size (m2)   Price (billion VND)
30          2.5
43          3.4
25          1.8
51          4.5
40          3.2
20          1.6
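One way to check this exercise numerically (a Python sketch, not an official solution):

```python
# One gradient-descent step from (a, b) = (0, 0) with learning rate alpha = 0.01.
x = [30, 43, 25, 51, 40, 20]
y = [2.5, 3.4, 1.8, 4.5, 3.2, 1.6]
m, alpha = len(x), 0.01

def cost(a, b):
    return sum((a * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

a, b = 0.0, 0.0
e0 = cost(a, b)
da = sum((a * xi + b - yi) * xi for xi, yi in zip(x, y)) / m
db = sum((a * xi + b - yi) for xi, yi in zip(x, y)) / m
a, b = a - alpha * da, b - alpha * db
e1 = cost(a, b)
print(e0, (a, b), e1)  # with this alpha the step overshoots, so e1 > e0
```

With unscaled inputs of this magnitude, α = 0.01 is too large and the cost actually increases, which is one motivation for the feature rescaling discussed later in the slides.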
Gradient Descent

Convergence:
How do we know whether the algorithm has converged?
- The cost function is smaller than a predefined threshold
- After a large enough number of steps
- The cost function decreased by less than a predefined threshold
Gradient Descent

Summarization:
1. Calculate the cost function
2. Select random values for the coefficients a, b
3. Step by step, modify a and b so that the cost function decreases:

while (not converged)
do
    a := a - α * ∂E/∂a
    b := b - α * ∂E/∂b
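The loop above can be sketched in Python (a minimal sketch; the learning rate 0.0005, the iteration cap, and the stopping threshold are assumptions chosen so that the unscaled data converges, using the third convergence criterion):

```python
# Gradient descent for y = a*x + b on the housing data, stopping when the
# cost decreases by less than a predefined threshold.
x = [30, 43, 25, 51, 40, 20]
y = [2.5, 3.4, 1.8, 4.5, 3.2, 1.6]
m = len(x)
alpha, tol = 0.0005, 1e-12

def cost(a, b):
    return sum((a * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

a, b = 0.0, 0.0
prev = cost(a, b)
for step in range(50000):
    da = sum((a * xi + b - yi) * xi for xi, yi in zip(x, y)) / m
    db = sum((a * xi + b - yi) for xi, yi in zip(x, y)) / m
    a, b = a - alpha * da, b - alpha * db
    cur = cost(a, b)
    if prev - cur < tol:   # converged: cost barely changed
        break
    prev = cur
print(a, b, cur)
```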
Multiple Input Representation

Example:
Consider the same example, but with more inputs.

Size (m2)   No. of floors   No. of rooms   Price (billion VND)
30          3               6              2.5
43          4               8              3.4
25          2               3              1.8
51          4               9              4.5
40          3               5              3.2
20          1               2              1.6

x^(i): the input of the ith training example
x_j^(i): the component j of the ith training example
Multiple Input Representation

Matrix representation:

y = h(x) = θ0 + θ1*x1 + θ2*x2 + ... + θn*xn

x = [x0, x1, ..., xn]^T with x0 = 1,    θ = [θ0, θ1, ..., θn]^T

h(x) = [θ0 θ1 θ2 ... θn] · [x0, x1, ..., xn]^T = θ^T x
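As a small sketch of this representation (the θ values here are arbitrary example numbers, not trained coefficients):

```python
# Vectorized hypothesis h(x) = theta^T x, with x0 = 1 prepended to the inputs.
import numpy as np

theta = np.array([0.5, 0.05, 0.1, 0.02])   # [theta0, theta1, theta2, theta3], arbitrary
features = np.array([30.0, 3.0, 6.0])      # size, floors, rooms (first row of the table)

x = np.concatenate(([1.0], features))      # x0 = 1
h = theta @ x                              # theta^T x
print(h)
```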
Multiple Input Representation

Cost Function:

E(θ) = (1/(2m)) Σ_{i=1..m} (h_θ(x^(i)) - y^(i))^2 = (1/(2m)) Σ_{i=1..m} (θ^T x^(i) - y^(i))^2
Multiple Input Representation

Gradient Descent:
Start with random values of θ; step by step, modify θ in order to decrease the cost function:

θ_j := θ_j - α * ∂E(θ)/∂θ_j

∂E(θ)/∂θ_j = (1/(2m)) Σ_{i=1..m} ∂/∂θ_j (θ^T x^(i) - y^(i))^2
           = (1/m) Σ_{i=1..m} (θ^T x^(i) - y^(i)) * x_j^(i)
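The multi-input update can be written in vectorized form (a sketch using the three-input table above; the learning rate 1e-4 is an assumption chosen small enough for these unscaled features):

```python
# One vectorized gradient-descent update for theta on the three-input table.
import numpy as np

X = np.array([[30, 3, 6], [43, 4, 8], [25, 2, 3],
              [51, 4, 9], [40, 3, 5], [20, 1, 2]], dtype=float)
y = np.array([2.5, 3.4, 1.8, 4.5, 3.2, 1.6])
m = len(y)

Xb = np.hstack([np.ones((m, 1)), X])        # prepend the x0 = 1 column
theta = np.zeros(Xb.shape[1])
alpha = 1e-4

def cost(theta):
    r = Xb @ theta - y                      # residuals theta^T x^(i) - y^(i)
    return (r @ r) / (2 * m)

grad = Xb.T @ (Xb @ theta - y) / m          # (1/m) * sum (theta^T x - y) * x_j
theta = theta - alpha * grad
```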
Normal Equations

Linear Regression:
Minimize the value of the cost function:

E(θ) = (1/(2m)) Σ_{i=1..m} (h_θ(x^(i)) - y^(i))^2 = (1/(2m)) Σ_{i=1..m} (θ^T x^(i) - y^(i))^2

Normal Equations:
Solve the following equation to find the optimized value of θ:

∇E(θ) = 0
Normal Equations

Solve the following equation to find the optimized value of θ:

∇E(θ) = 0, i.e. for all j in {0, 1, ..., n}: ∂E(θ)/∂θ_j = 0
Normal Equations

Solution:
Given a training set of m training examples, each containing n inputs, form the matrix X of shape (m, n+1) of inputs (with a leading column of ones for x0) and the output vector Y. Then:

θ = (X^T X)^(-1) X^T Y
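This closed-form solution can be sketched in Python on the single-input housing data (note that solving the linear system is numerically preferable to forming the explicit inverse):

```python
# Solve the normal equations theta = (X^T X)^(-1) X^T Y on the single-input
# housing data (size -> price), with a leading column of ones for theta0.
import numpy as np

sizes = np.array([30.0, 43.0, 25.0, 51.0, 40.0, 20.0])
Y = np.array([2.5, 3.4, 1.8, 4.5, 3.2, 1.6])

X = np.column_stack([np.ones_like(sizes), sizes])   # shape (m, n+1) with n = 1
theta = np.linalg.solve(X.T @ X, X.T @ Y)           # avoids an explicit inverse
print(theta)                                        # [theta0, theta1]
```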
Homework
Polynomial Regression

The output is a polynomial function of the input. For example:

h(x) = θ0 + θ1*x + θ2*x^2 + ... + θn*x^n

Assume x1 = x, x2 = x^2, ..., xn = x^n; then:

h(x) = θ0 + θ1*x1 + θ2*x2 + ... + θn*xn

which is linear regression again.
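The substitution above amounts to expanding the input into polynomial features (a sketch; the quadratic target here is synthetic, chosen so the fit is exact):

```python
# Polynomial regression as linear regression on expanded features 1, x, ..., x^n.
import numpy as np

def poly_features(x, n):
    """Map a 1-D input array to columns [1, x, x^2, ..., x^n]."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x ** k for k in range(n + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0])
y = x ** 2                                  # a perfectly quadratic target
X = poly_features(x, 2)
theta = np.linalg.lstsq(X, y, rcond=None)[0]
print(theta)                                # recovers [theta0, theta1, theta2]
```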
References

http://openclassroom.stanford.edu/MainFolder/VideoPage.php?course=MachineLearning&video=02.4-LinearRegressionI-GradientDescent&speed=100
Feature Rescale

Objective: Scale all features to the same range, in order to make computation easier.
Popular ranges: [0, 1], [-0.5, 0.5]

x = x / max(x)

c = mean(x);  x = (x - c) / max(x)
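The two formulas above can be sketched on the housing sizes:

```python
# Feature rescaling on the housing sizes: divide by the max, or subtract the
# mean first and then divide (mean normalization).
x = [30.0, 43.0, 25.0, 51.0, 40.0, 20.0]

mx = max(x)
scaled = [v / mx for v in x]            # values in (0, 1]

c = sum(x) / len(x)                     # mean of the feature
centered = [(v - c) / mx for v in x]    # roughly centered around 0
print(scaled, centered)
```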