BookSlides - 7 Part A - Error-Based Learning

The document outlines the fundamentals of multivariate linear regression using gradient descent, emphasizing the iterative adjustment of parameters to minimize prediction error. It discusses the importance of error functions, particularly the sum of squared errors, and illustrates these concepts with a dataset of office rental prices. The document also introduces the concept of error surfaces and how they relate to finding optimal model parameters.


Error-based Learning
Sections 7.1, 7.2, 7.3

Dr. Mohamed Brahimi and Prof. Ahmed Guessoum



1 Big Idea

2 Fundamentals
Simple Linear Regression
Measuring Error
Error Surfaces
3 Standard Approach: Multivariate Linear Regression
with Gradient Descent
Multivariate Linear Regression
Gradient Descent
Choosing Learning Rates & Initial Weights
A Worked Example

4 Summary

Big Idea

When solving problems, it is common practice for humans to progressively adjust their solution, aiming to reduce the gap (error) between their current attempt and an acceptable ("right") solution.
A parameterised prediction model is initialised with a set of random parameters, and an error function is used to judge how well this initial model performs when making predictions for instances in a training dataset.
Based on the value of the error function, the parameters are iteratively adjusted to create a more and more accurate model.
A family of error-based machine learning algorithms takes
this same approach.

Fundamentals

Simple Linear Regression

Table: The office rentals dataset: a dataset that includes office rental prices and a number of descriptive features for 10 Dublin city-centre offices.

ID   SIZE (SQ FT)   FLOOR   BROADBAND RATE   ENERGY RATING   RENTAL PRICE
 1        500         4            8               C              320
 2        550         7           50               A              380
 3        620         9            7               A              400
 4        630         5           24               B              390
 5        665         8          100               C              385
 6        700         4            8               B              410
 7        770        10            7               B              480
 8        880        12           50               A              600
 9        920        14            8               C              570
10      1,000         9           24               B              620

We start by considering only SIZE and RENTAL PRICE.



Simple Linear Regression

Table: The SIZE and RENTAL PRICE features from the office rentals dataset.

ID   SIZE   RENTAL PRICE
 1    500        320
 2    550        380
 3    620        400
 4    630        390
 5    665        385
 6    700        410
 7    770        480
 8    880        600
 9    920        570
10  1,000        620
Figure: A scatter plot of the SIZE and RENTAL PRICE features from the office rentals dataset.

Simple Linear Regression

From the scatter plot it appears that there is a linear relationship between SIZE and RENTAL PRICE.
The equation of a line can be written as:

y = mx + b    (1)

Simple Linear Regression

The scatter plot below shows the same data as the previous figure, with a simple linear model added to capture the relationship between office sizes and office rental prices.
This model is:

RENTAL PRICE = 6.47 + 0.62 × SIZE

Figure: A scatter plot of the SIZE and RENTAL PRICE features with this simple linear model overlaid.

Simple Linear Regression

RENTAL PRICE = 6.47 + 0.62 × SIZE

Using this model, we can determine the expected rental price of a 730 square foot office:

RENTAL PRICE = 6.47 + 0.62 × 730 = 459.07
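
This calculation is easy to check in code; a minimal Python sketch (the function name is ours, the weights are the fitted values quoted above):

def rental_price(size):
    # simple linear regression model: RENTAL PRICE = 6.47 + 0.62 * SIZE
    return 6.47 + 0.62 * size

print(rental_price(730))  # 459.07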

Simple Linear Regression

This simple linear regression model can be written in terms of a weight vector w as:

M_w(d) = w[0] + w[1] \times d[1]    (2)


Measuring Error

Figure: A scatter plot of the SIZE and RENTAL PRICE features from the office rentals dataset. A collection of possible simple linear regression models capturing the relationship between these two features is also shown. For all models w[0] is set to 6.47. From top to bottom the models use 0.4, 0.5, 0.62, 0.7, and 0.8 respectively for w[1].

Measuring Error

Figure: A scatter plot of the SIZE and RENTAL PRICE features from the office rentals dataset showing a candidate prediction model (with w[0] = 6.47 and w[1] = 0.62) and the resulting errors.

Measuring Error

There are many different kinds of error functions, but for measuring the fit of simple linear regression models, the most commonly used is the sum of squared errors error function, or L_2:

L_2(M_w, D) = \frac{1}{2} \sum_{i=1}^{n} \left(t_i - M_w(d_i[1])\right)^2    (3)

            = \frac{1}{2} \sum_{i=1}^{n} \left(t_i - (w[0] + w[1] \times d_i[1])\right)^2    (4)

Measuring Error

Table: Calculating the sum of squared errors for the candidate model (with w[0] = 6.47 and w[1] = 0.62) making predictions for the office rentals dataset.

ID   RENTAL PRICE   Model Prediction    Error   Squared Error
 1        320            316.79          3.21         10.32
 2        380            347.82         32.18      1,035.62
 3        400            391.26          8.74         76.32
 4        390            397.47         -7.47         55.80
 5        385            419.19        -34.19      1,169.13
 6        410            440.91        -30.91        955.73
 7        480            484.36         -4.36         19.01
 8        600            552.63         47.37      2,243.90
 9        570            577.46         -7.46         55.59
10        620            627.11         -7.11         50.51
                                          Sum      5,671.64
                  Sum of squared errors (Sum/2)    2,835.82
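
As a sketch, this table can be reproduced in a few lines of Python; note that with the rounded weights (6.47, 0.62) the result differs very slightly from the table, which evidently uses unrounded weights:

# office rentals data as (SIZE, RENTAL PRICE) pairs
data = [(500, 320), (550, 380), (620, 400), (630, 390), (665, 385),
        (700, 410), (770, 480), (880, 600), (920, 570), (1000, 620)]

w0, w1 = 6.47, 0.62  # candidate model weights

# L2 = 1/2 * sum of (target - prediction)^2 over the dataset
sse = 0.5 * sum((t - (w0 + w1 * d)) ** 2 for d, t in data)
print(sse)  # approx. 2,837, close to the table's 2,835.82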

Measuring Error

If we perform the same calculation for the other candidate models shown in the earlier figure, we find that with w[1] set to 0.4, 0.5, 0.7, and 0.8, the sums of squared errors are 136,218, 42,712, 20,092, and 90,978, respectively.
The errors for these models are all larger than for the model with w[1] set to 0.62, which confirms our visual intuition that this model most accurately fits the training data.

Error Surfaces

For every possible combination of weights, w[0] and w[1], there is a corresponding sum of squared errors value; plotted together, these values form a surface.

Figure: (a) A 3D surface plot and (b) a contour plot of the error surface generated by plotting the sum of squared errors value for the office rentals training set for each possible combination of values for w[0] (from the range [−10, 20]) and w[1] (from the range [−2, 3]).

Error Surfaces

The x-y plane is known as a weight space, and the surface is known as an error surface.
The model that best fits the training data is the model corresponding to the lowest point on the error surface.
Because the model is linear in the weights and the error function is a sum of squares, the error surface is convex and possesses a global minimum.
So we can find the optimal weights at the point where the partial derivatives of the error surface with respect to w[0] and w[1] are equal to 0.

Error Surfaces

Using Equation (4), we can formally define this point on the error surface as the point at which:

\frac{\partial}{\partial w[0]} \frac{1}{2} \sum_{i=1}^{n} \left(t_i - (w[0] + w[1] \times d_i[1])\right)^2 = 0    (5)

and

\frac{\partial}{\partial w[1]} \frac{1}{2} \sum_{i=1}^{n} \left(t_i - (w[0] + w[1] \times d_i[1])\right)^2 = 0    (6)

There are a number of different ways to find this point.
We will describe a guided search approach known as the gradient descent algorithm.
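
As an aside, for linear regression these equations can also be solved directly in closed form; a quick numpy sketch (a check on the weights quoted earlier, not the approach taken in these slides):

import numpy as np

sizes = np.array([500, 550, 620, 630, 665, 700, 770, 880, 920, 1000])
prices = np.array([320, 380, 400, 390, 385, 410, 480, 600, 570, 620])

# design matrix with a column of 1s for the intercept w[0]
X = np.column_stack([np.ones_like(sizes, dtype=float), sizes])
w, *_ = np.linalg.lstsq(X, prices, rcond=None)
print(w)  # approx. [6.47, 0.62]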

Standard Approach: Multivariate Linear Regression with Gradient Descent

Multivariate Linear Regression

The most common approach to error-based machine learning for predictive analytics is to use multivariate linear regression with gradient descent to train a best-fit model for a given training dataset.

Multivariate Linear Regression

Table: A dataset that includes office rental prices and a number of descriptive features for 10 Dublin city-centre offices.

ID   SIZE   FLOOR   BROADBAND RATE   ENERGY RATING   RENTAL PRICE
 1    500     4            8               C              320
 2    550     7           50               A              380
 3    620     9            7               A              400
 4    630     5           24               B              390
 5    665     8          100               C              385
 6    700     4            8               B              410
 7    770    10            7               B              480
 8    880    12           50               A              600
 9    920    14            8               C              570
10  1,000     9           24               B              620

Multivariate Linear Regression

We can define a multivariate linear regression model as:

M_w(d) = w[0] + w[1] \times d[1] + \cdots + w[m] \times d[m]    (7)

       = w[0] + \sum_{j=1}^{m} w[j] \times d[j]    (8)

Multivariate Linear Regression

We can make Equation (8) look a little neater by inventing a dummy descriptive feature, d[0], that is always equal to 1:

M_w(d) = w[0] \times d[0] + w[1] \times d[1] + \cdots + w[m] \times d[m]    (9)

       = \sum_{j=0}^{m} w[j] \times d[j]    (10)

       = w \cdot d    (11)
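
In code the dummy feature reduces the whole model to a single dot product; a minimal sketch with hypothetical weights and feature values (assuming numpy):

import numpy as np

w = np.array([0.5, 1.0, -2.0, 0.25])  # w[0]..w[3], hypothetical values
d = np.array([1.0, 3.0, 7.0, 2.0])    # d[0] = 1 is the dummy feature

print(w @ d)  # w[0]*1 + w[1]*d[1] + w[2]*d[2] + w[3]*d[3] = -10.0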

Multivariate Linear Regression

The sum of squared errors loss function, L_2, that we defined in Equation (4) changes only very slightly to reflect the new regression equation:

L_2(M_w, D) = \frac{1}{2} \sum_{i=1}^{n} \left(t_i - M_w(d_i)\right)^2    (12)

            = \frac{1}{2} \sum_{i=1}^{n} \left(t_i - (w \cdot d_i)\right)^2    (13)

Multivariate Linear Regression

This multivariate model allows us to include all but one of the descriptive features in the table above (the categorical ENERGY RATING feature is excluded) in a regression model to predict office rental prices.
The resulting multivariate regression model equation is:

RENTAL PRICE = w[0] + w[1] × SIZE + w[2] × FLOOR + w[3] × BROADBAND RATE

Multivariate Linear Regression

We will see in the next section how the best-fit set of weights for this equation is found, but for now we will set:

w[0] = −0.1513, w[1] = 0.6270, w[2] = −0.1781, w[3] = 0.0714

This means that the model can be rewritten as:

RENTAL PRICE = −0.1513 + 0.6270 × SIZE − 0.1781 × FLOOR + 0.0714 × BROADBAND RATE

Multivariate Linear Regression

Using this model:

RENTAL PRICE = −0.1513 + 0.6270 × SIZE − 0.1781 × FLOOR + 0.0714 × BROADBAND RATE

we can, for example, predict the expected rental price of a 690 square foot office on the 11th floor of a building with a broadband rate of 50 Mb per second as:

RENTAL PRICE = −0.1513 + 0.6270 × 690 − 0.1781 × 11 + 0.0714 × 50 = 434.0896
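
The same prediction as a dot product, using the dummy-feature form from Equation (11) (a sketch assuming numpy):

import numpy as np

w = np.array([-0.1513, 0.6270, -0.1781, 0.0714])
d = np.array([1, 690, 11, 50])  # dummy feature, SIZE, FLOOR, BROADBAND RATE

print(w @ d)  # 434.0896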

Gradient Descent

Figure: (a) A 3D surface plot and (b) a contour plot of the same error surface. The lines indicate the path that the gradient descent algorithm would take across this error surface from different starting positions to the global minimum, marked as the white dot in the centre.

The next figure shows the journey across the error surface taken by the gradient descent algorithm when training the simple version of the office rentals example, involving just SIZE and RENTAL PRICE.

Figure: (a) A 3D surface plot and (b) a contour plot of the error surface for the office rentals dataset showing the path that the gradient descent algorithm takes towards the best-fit model.
Figure: A selection of the simple linear regression models developed during the gradient descent process for the office rentals dataset. The final panel shows the sum of squared errors values generated during the gradient descent process.

Gradient Descent

Require: a set of training instances D
Require: a learning rate α that controls how quickly the algorithm converges
Require: a function, errorDelta, that determines the direction in which to adjust a given weight, w[j], so as to move down the slope of an error surface determined by the dataset D
Require: a convergence criterion that indicates that the algorithm has completed

1: w ← random starting point in the weight space
2: repeat
3:   for each w[j] in w do
4:     w[j] ← w[j] + α × errorDelta(D, w[j])
5:   end for
6: until convergence occurs

The gradient descent algorithm for training multivariate linear regression models.
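
A minimal Python rendering of this pseudocode (a sketch: names are ours, errorDelta is supplied by the caller and takes the full weight vector, since each delta depends on all the weights, and a fixed iteration count stands in for a real convergence test):

import random

def gradient_descent(D, alpha, error_delta, n_weights, max_iters=100):
    # line 1: random starting point in the weight space
    w = [random.uniform(-0.2, 0.2) for _ in range(n_weights)]
    # lines 2-6: repeatedly adjust every weight
    for _ in range(max_iters):
        # compute all deltas from the current w, then update (batch behaviour)
        w = [w[j] + alpha * error_delta(D, w, j) for j in range(n_weights)]
    return w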

Gradient Descent

The most important part of the gradient descent algorithm is line 4, on which the weights are updated:

w[j] ← w[j] + α × errorDelta(D, w[j])

Each weight is considered independently, and for each one a small adjustment is made by adding a small delta value to the current weight, w[j].
This adjustment should ensure that the change in the weight leads to a move downwards on the error surface.

Gradient Descent

Imagine for a moment that our training dataset, D, contains just one training instance: (d, t).
The gradient of the error surface is given as the partial derivative of L_2 with respect to each weight, w[j]:

\frac{\partial}{\partial w[j]} L_2(M_w, D) = \frac{\partial}{\partial w[j]} \frac{1}{2} \left(t - M_w(d)\right)^2    (14)

= (t - M_w(d)) \times \frac{\partial}{\partial w[j]} \left(t - M_w(d)\right)    (15)

= (t - M_w(d)) \times \frac{\partial}{\partial w[j]} \left(t - (w \cdot d)\right)    (16)

= (t - M_w(d)) \times (-d[j])    (17)

Gradient Descent

Adjusting the calculation to take into account multiple training instances:

\frac{\partial}{\partial w[j]} L_2(M_w, D) = \sum_{i=1}^{n} \left((t_i - M_w(d_i)) \times (-d_i[j])\right)

We use this equation to define errorDelta in our gradient descent algorithm:

w[j] \leftarrow w[j] + \alpha \underbrace{\sum_{i=1}^{n} \left((t_i - M_w(d_i)) \times d_i[j]\right)}_{errorDelta(D, w[j])}

This rule is known as the weight update rule for multivariate linear regression with gradient descent.
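
Filling in errorDelta from this equation completes the sketch started after the pseudocode; here we assume each instance already carries the dummy feature d[0] = 1:

def error_delta(D, w, j):
    # sum over all instances of (t - M_w(d)) * d[j]
    total = 0.0
    for d, t in D:
        prediction = sum(w[k] * d[k] for k in range(len(w)))  # M_w(d) = w . d
        total += (t - prediction) * d[j]
    return total

# example usage with the gradient_descent sketch above:
# D = [((1, 500), 320), ((1, 550), 380), ...]  # (features, target) pairs
# w = gradient_descent(D, alpha=0.00000002, error_delta=error_delta, n_weights=2)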

Gradient Descent

The approach to training multivariate linear regression models described so far is known as batch gradient descent.
The word batch is used because only one adjustment is made to each weight at each iteration of the algorithm, based on summing the squared error made by the candidate model for each instance in the training dataset.

Choosing Learning Rates & Initial Weights

The (constant) learning rate, α, determines the size of the adjustment made to each weight at each step in the process.
Unfortunately, choosing learning rates is not a well-defined science.
Most practitioners use rules of thumb and trial and error.

Choosing Learning Rates & Initial Weights

Figure: Plots of the journeys made across the error surface for the simple office rentals prediction problem for different learning rates: (a) a very small learning rate (0.002), (b) a medium learning rate (0.08), and (c) a very large learning rate (0.18). Panels (d), (e), and (f) plot the corresponding sum of squared errors against training iteration.

Choosing Learning Rates & Initial Weights

A typical range for learning rates is [0.00001, 10].
Based on empirical evidence, choosing random initial weights uniformly from the range [−0.2, 0.2] tends to work well.
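
These rules of thumb are simple to apply in code; a short sketch (assuming numpy):

import numpy as np

alpha = 0.08                              # a 'medium' learning rate, per the figure above
w = np.random.uniform(-0.2, 0.2, size=4)  # random initial weights w[0]..w[3]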

A Worked Example

We are now in a position to build a linear regression model that uses all of the continuous descriptive features in the office rentals dataset.
The general structure of the model is:

RENTAL PRICE = w[0] + w[1] × SIZE + w[2] × FLOOR + w[3] × BROADBAND RATE

A Worked Example

Table: The office rentals dataset: a dataset that includes office rental prices and a number of descriptive features for 10 Dublin city-centre offices.

ID   SIZE   FLOOR   BROADBAND RATE   ENERGY RATING   RENTAL PRICE
 1    500     4            8               C              320
 2    550     7           50               A              380
 3    620     9            7               A              400
 4    630     5           24               B              390
 5    665     8          100               C              385
 6    700     4            8               B              410
 7    770    10            7               B              480
 8    880    12           50               A              600
 9    920    14            8               C              570
10  1,000     9           24               B              620

A Worked Example

For this example let's assume that:

α = 0.00000002 (i.e., 2 × 10⁻⁸), and
the initial weights are chosen from a uniform random distribution in the range [−0.2, 0.2].

Initial Weights
w[0]: −0.146   w[1]: 0.185   w[2]: −0.044   w[3]: 0.119

A Worked Example

Iteration 1

                                                          errorDelta(D, w[j])
ID   RENTAL PRICE    Pred.    Error   Squared Error      w[0]        w[1]       w[2]      w[3]
 1       320         93.26   226.74     51411.08       226.74   113370.05    906.96    1813.92
 2       380        107.41   272.59     74307.70       272.59   149926.92   1908.16   13629.72
 3       400        115.15   284.85     81138.96       284.85   176606.39   2563.64    1993.94
 4       390        119.21   270.79     73327.67       270.79   170598.22   1353.95    6498.98
 5       385        134.64   250.36     62682.22       250.36   166492.17   2002.91   25036.42
 6       410        130.31   279.69     78226.32       279.69   195782.78   1118.76    2237.52
 7       480        142.89   337.11    113639.88       337.11   259570.96   3371.05    2359.74
 8       600        168.32   431.68    186348.45       431.68   379879.24   5180.17   21584.05
 9       570        170.63   399.37    159499.37       399.37   367423.83   5591.23    3194.99
10       620        187.58   432.42    186989.95       432.42   432423.35   3891.81   10378.16
Sum                                   1067571.59      3185.61  2412073.90  27888.65   88727.43
Sum of squared errors (Sum/2)          533785.80

A Worked Example

w[j] \leftarrow w[j] + \alpha \underbrace{\sum_{i=1}^{n} \left((t_i - M_w(d_i)) \times d_i[j]\right)}_{errorDelta(D, w[j])}

Initial Weights
w[0]: −0.146   w[1]: 0.185   w[2]: −0.044   w[3]: 0.119

Example
w[1] ← 0.185 + 0.00000002 × 2,412,074 = 0.23324148

New Weights (Iteration 1)
w[0]: −0.146   w[1]: 0.233   w[2]: −0.043   w[3]: 0.121
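
The arithmetic of this update is easy to verify (a sketch; the delta value is read from the iteration 1 table):

alpha = 0.00000002
delta_w1 = 2412073.90           # errorDelta(D, w[1]) from the iteration 1 table
w1 = 0.185 + alpha * delta_w1
print(w1)                       # approx. 0.23324148, i.e. the new w[1] = 0.233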

A Worked Example

Iteration 2

                                                          errorDelta(D, w[j])
ID   RENTAL PRICE    Pred.    Error   Squared Error      w[0]        w[1]       w[2]      w[3]
 1       320        117.40   202.60     41047.92       202.60   101301.44    810.41    1620.82
 2       380        134.03   245.97     60500.69       245.97   135282.89   1721.78   12298.44
 3       400        145.08   254.92     64985.12       254.92   158051.51   2294.30    1784.45
 4       390        149.65   240.35     57769.68       240.35   151422.55   1201.77    5768.48
 5       385        166.90   218.10     47568.31       218.10   145037.57   1744.81   21810.16
 6       410        164.10   245.90     60468.86       245.90   172132.91    983.62    1967.23
 7       480        180.06   299.94     89964.69       299.94   230954.68   2999.41    2099.59
 8       600        210.87   389.13    151424.47       389.13   342437.01   4669.60   19456.65
 9       570        215.03   354.97    126003.34       354.97   326571.94   4969.57    2839.76
10       620        187.58   432.42    186989.95       432.42   432423.35   3891.81   10378.16
Sum                                    886723.04      2884.32  2195615.84  25287.08   80023.74
Sum of squared errors (Sum/2)          443361.52

A Worked Example

w[j] \leftarrow w[j] + \alpha \underbrace{\sum_{i=1}^{n} \left((t_i - M_w(d_i)) \times d_i[j]\right)}_{errorDelta(D, w[j])}

Initial Weights (Iteration 2)
w[0]: −0.146   w[1]: 0.233   w[2]: −0.043   w[3]: 0.121

Exercise: compute the update for w[1], with α = 0.00000002.

Solution:
w[1] ← 0.233 + 0.00000002 × 2195616.08 = 0.27691232

New Weights (Iteration 2)
w[0]: −0.145   w[1]: 0.277   w[2]: −0.043   w[3]: 0.123

A Worked Example

The algorithm then keeps iteratively applying the weight update rule until it converges on a stable set of weights, beyond which little improvement in model accuracy is possible.
After 100 iterations the final values for the weights are:

w[0] = −0.1513, w[1] = 0.6270, w[2] = −0.1781, w[3] = 0.0714

which results in a sum of squared errors value of 2,913.5.

A Worked Example

Note that a careful examination of the previous tables shows why such a low learning rate is used in this example.
The large values of the RENTAL PRICE feature, [320, 620], cause the squared errors and, in turn, the error delta values to become very large.
So a very low learning rate is required to ensure that the changes made to the weights at each iteration of the learning process are small enough for the algorithm to work effectively.
Using normalization on the features can help avoid these large squared errors, and will be done in most examples from now on; a sketch of range normalization follows below.
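
Range normalization rescales each feature into a fixed interval; a minimal sketch (assuming numpy and a target range of [0, 1]):

import numpy as np

def range_normalize(x, low=0.0, high=1.0):
    # map feature values linearly from [x.min(), x.max()] into [low, high]
    return low + (x - x.min()) * (high - low) / (x.max() - x.min())

sizes = np.array([500, 550, 620, 630, 665, 700, 770, 880, 920, 1000])
print(range_normalize(sizes))  # SIZE rescaled into [0, 1]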

1 Big Idea

2 Fundamentals
Simple Linear Regression
Measuring Error
Error Surfaces
3 Standard Approach: Multivariate Linear Regression
with Gradient Descent
Multivariate Linear Regression
Gradient Descent
Choosing Learning Rates & Initial Weights
A Worked Example

4 Summary

Slide Acknowledgment

The slides used in this course are based on the official textbook materials, with modifications made where necessary to suit the course requirements and enhance the learning experience.
