Linear Regression With Multiple Variables
[Figure: housing price training set - scatter of Price (in 1000s of $) against Size (feet²)]
[Figure: Housing Price - the same training data with the straight-line hypothesis h(x) = θ0 + θ1x drawn through it; Price (in 1000s of $) against Size (feet²)]
[Figure: three example straight-line hypotheses hθ(x) = θ0 + θ1x plotted against x, for (θ0 = 1.5, θ1 = 0), (θ0 = 0, θ1 = 0.5) and (θ0 = 1, θ1 = 0.5) - different choices of the parameters θ0, θ1 give different lines mapping x to y]
• Based on our training set we want to generate parameters θ0 and θ1 which make the straight line fit the data
  – Choose these parameters so hθ(x) is close to y for our training examples
• Basically, we use the x's in the training set with hθ(x) to produce output that is as close to the actual y values as possible
• Think of hθ(x) as a "y imitator" - it tries to convert x into y, and since we already have the true y values we can evaluate how well hθ(x) does this (see the sketch below)
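To make the "y imitator" idea concrete, here is a minimal Python sketch (not from the slides) of the hypothesis hθ(x) = θ0 + θ1x and the squared-error cost it is judged by; the housing numbers and parameter values below are made up purely for illustration.

```python
# Minimal sketch: hypothesis h_theta(x) = theta0 + theta1*x and the
# squared-error cost averaged over the training set.
# The data below is hypothetical, not from the slides.

def h(theta0, theta1, x):
    """Hypothesis: predicts y from x using the current parameters."""
    return theta0 + theta1 * x

def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = (1/2m) * sum over i of (h(x_i) - y_i)^2."""
    m = len(xs)
    return sum((h(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Hypothetical housing data: sizes in feet^2, prices in 1000s of $
sizes  = [1000, 1500, 2000, 2500]
prices = [200, 280, 360, 440]

print(cost(50.0, 0.15, sizes, prices))  # how well this parameter choice imitates y
```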
[Figure: simplified case with θ0 = 0, so the hypothesis hθ(x) = θ1x is a line through the origin]
[Figure: left, hθ(x) for fixed θ1 = 1 (a function of x), which passes through every training point; right, J(θ1) (a function of the parameter θ1), where θ1 = 1 gives J(θ1) = 0]
[Figure: the same pair of plots for another fixed value of θ1 - left, hθ(x) (a function of x) now misses the training points; right, the corresponding point on J(θ1) (a function of the parameter θ1) sits above zero]
[Figure: left, the hypotheses for θ1 = 1, θ1 = 0.5 and θ1 = 0 plotted together against x; right, the corresponding values traced out on the bowl-shaped cost curve J(θ1) (a function of the parameter θ1)]
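As a check on the figures above, here is a small sketch that evaluates J(θ1) for θ1 = 0, 0.5 and 1, assuming the three training points plotted are (1,1), (2,2) and (3,3) - an assumption read off the axes, not stated in the slides.

```python
# Sketch of the cost curve J(theta1) for the simplified hypothesis h(x) = theta1 * x,
# assuming the three training points in the plots are (1,1), (2,2), (3,3).

xs = [1, 2, 3]
ys = [1, 2, 3]
m = len(xs)

def J(theta1):
    """J(theta1) = (1/2m) * sum over i of (theta1*x_i - y_i)^2."""
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

for theta1 in (0.0, 0.5, 1.0):
    print(theta1, J(theta1))
# theta1 = 1.0 fits the data exactly, so J(1) = 0 (the bottom of the bowl);
# theta1 = 0.5 and theta1 = 0 give progressively larger costs.
```

Under that assumption J(1) = 0, J(0.5) ≈ 0.58 and J(0) ≈ 2.33, tracing out the bowl shape shown in the right-hand plots.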
– If we have
  • θ0 ≈ 360
  • θ1 = 0
• This gives a better hypothesis (a flat line at roughly $360k), but still not a great one - it does not sit at the center of the contour plot
• The derivative term
  – Take the tangent to J(θ1) at the current point and look at the slope of that line
  – If the slope is positive (we are to the right of the minimum), α is always positive, so the update θ1 := θ1 − α · (positive number) decreases θ1 and moves it towards the minimum
  – If the slope is negative (we are to the left of the minimum), the update θ1 := θ1 − α · (negative number) increases θ1, again moving it towards the minimum - either way J(θ1) gets smaller (see the sketch after the update rules below)
For J(θ1) with θ1 ∈ ℝ:

Positive slope: (d/dθ1) J(θ1) ≥ 0
  θ1 := θ1 − α · (positive number), so θ1 decreases

Negative slope: (d/dθ1) J(θ1) ≤ 0
  θ1 := θ1 − α · (negative number), so θ1 increases
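A small sketch of the two cases above, using the same assumed (1,1), (2,2), (3,3) training set, for which the minimum of J(θ1) is at θ1 = 1; the learning rate value is arbitrary.

```python
# Sketch (assumed data, not the slides' code): the sign of dJ/dtheta1 decides which
# way the update theta1 := theta1 - alpha * dJ/dtheta1 moves theta1.

xs, ys = [1, 2, 3], [1, 2, 3]
m = len(xs)
alpha = 0.1

def dJ(theta1):
    """Derivative of J(theta1) = (1/2m) * sum (theta1*x - y)^2 with respect to theta1."""
    return sum((theta1 * x - y) * x for x, y in zip(xs, ys)) / m

for theta1 in (2.0, 0.0):             # one point right of the minimum, one left
    slope = dJ(theta1)
    new_theta1 = theta1 - alpha * slope
    print(theta1, slope, new_theta1)  # positive slope -> theta1 decreases; negative slope -> theta1 increases
```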
Gradient Descent Algorithm
θ1 := θ1 − α · (d/dθ1) J(θ1)    (θ1 ∈ ℝ)

[Figure: J(θ1) plotted against θ1 (θ1 ∈ ℝ), with the current θ1 sitting at a local optimum]

At a local optimum the tangent to J(θ1) is flat, so (d/dθ1) J(θ1) = 0 and the update becomes θ1 := θ1 − α · 0 - gradient descent leaves θ1 unchanged once it reaches the optimum (see the sketch below).
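A sketch of the local-optimum case, again on the assumed (1,1), (2,2), (3,3) training set: the derivative term evaluates to zero at θ1 = 1, so the update does not move θ1.

```python
# Sketch (assumed data): at a local optimum the derivative term vanishes,
# so theta1 := theta1 - alpha * dJ/dtheta1 leaves theta1 where it is.

xs, ys = [1, 2, 3], [1, 2, 3]
m = len(xs)

def dJ(theta1):
    return sum((theta1 * x - y) * x for x, y in zip(xs, ys)) / m

theta1, alpha = 1.0, 0.1            # theta1 = 1 is the minimum of J for this data
print(dJ(theta1))                   # 0.0 -> derivative term is zero at the optimum
print(theta1 - alpha * dJ(theta1))  # 1.0 -> the update does not move theta1
```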
Update θ0 and θ1 simultaneously - compute both derivative terms from the current (old) values of θ0 and θ1 before overwriting either parameter (see the sketch below).
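A sketch of gradient descent with the simultaneous update, on the same assumed data; the temp0/temp1 names, the learning rate and the iteration count are illustrative choices, not prescribed by the slides.

```python
# Sketch of gradient descent with the simultaneous update of theta0 and theta1:
# both derivatives are computed from the *old* parameters before either is overwritten.
# Data, alpha and iteration count are illustrative, not from the slides.

xs, ys = [1, 2, 3], [1, 2, 3]
m = len(xs)
alpha = 0.1

def step(theta0, theta1):
    grad0 = sum((theta0 + theta1 * x - y)     for x, y in zip(xs, ys)) / m
    grad1 = sum((theta0 + theta1 * x - y) * x for x, y in zip(xs, ys)) / m
    temp0 = theta0 - alpha * grad0  # compute both updates first...
    temp1 = theta1 - alpha * grad1
    return temp0, temp1             # ...then assign them together

theta0, theta1 = 0.0, 0.0
for _ in range(1000):
    theta0, theta1 = step(theta0, theta1)
print(theta0, theta1)  # converges near theta0 = 0, theta1 = 1 for this data
```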