Exercise
Machine Learning
Introduction
In this exercise, you will implement linear regression and get to see it work
on data. Before starting on this programming exercise, we strongly recommend watching the video lectures and completing the review questions for
the associated topics.
To get started with the exercise, you will need to download the starter
code and unzip its contents to the directory where you wish to complete the
exercise. If needed, use the cd command in Octave/MATLAB to change to
this directory before starting this exercise.
You can also find instructions for installing Octave/MATLAB in the Environment Setup Instructions of the course website.
Throughout the exercise, you will be using the scripts ex1.m and ex1_multi.m.
These scripts set up the dataset for the problems and make calls to functions
that you will write. You do not need to modify either of them. You are only
required to modify functions in other files, by following the instructions in
this assignment.
For this programming exercise, you are only required to complete the first
part of the exercise to implement linear regression with one variable. The
second part of the exercise, which is optional, covers linear regression with
multiple variables.
The first part of ex1.m gives you practice with Octave/MATLAB syntax and
the homework submission process. In the file warmUpExercise.m, you will
find the outline of an Octave/MATLAB function. Modify it to return a 5 x
5 identity matrix by filling in the following code:
A = eye(5);
Note: Octave is a free alternative to MATLAB. For the programming exercises, you are free
to use either Octave or MATLAB.
When you are finished, run ex1.m (assuming you are in the correct directory, type ex1 at the Octave/MATLAB prompt) and you should see
output similar to the following:
ans =
Diagonal Matrix
   1   0   0   0   0
   0   1   0   0   0
   0   0   1   0   0
   0   0   0   1   0
   0   0   0   0   1
Now ex1.m will pause until you press any key, and then will run the code
for the next part of the assignment. If you wish to quit, typing ctrl-c will
stop the program in the middle of its run.
1.1 Submitting Solutions
After completing a part of the exercise, you can submit your solutions for
grading by typing submit at the Octave/MATLAB command line. The submission script will prompt you for your login e-mail and submission token
and ask you which files you want to submit. You can obtain a submission
token from the web page for the assignment.
You should now submit your solutions.
You are allowed to submit your solutions multiple times, and we will take
only the highest score into consideration.
In this part of this exercise, you will implement linear regression with one
variable to predict profits for a food truck. Suppose you are the CEO of a
restaurant franchise and are considering different cities for opening a new
outlet. The chain already has trucks in various cities and you have data for
profits and populations from the cities.
You would like to use this data to help you select which city to expand
to next.
The file ex1data1.txt contains the dataset for our linear regression problem. The first column is the population of a city and the second column is
the profit of a food truck in that city. A negative value for profit indicates a
loss.
The ex1.m script has already been set up to load this data for you.
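As a rough illustration of what loading the data involves, the sketch below splits a two-column matrix into a feature vector and a target vector. The matrix values are made up for illustration and are not the real dataset. It is an assumption that the script obtains data with a call such as data = load('ex1data1.txt'); the names x, y, and m are chosen to match those used elsewhere in this handout.

```octave
% Hypothetical stand-in for the contents of ex1data1.txt.
% Column 1: population of a city (in 10,000s).
% Column 2: profit of a food truck in that city (in $10,000s).
data = [6.0, 15.0;
        5.5,  9.0;
        8.5, -1.5];    % illustrative values only; a negative profit is a loss

x = data(:, 1);        % feature: population of each city
y = data(:, 2);        % target: profit in each city
m = length(y);         % number of training examples
```

With the real file, the same column slicing would apply to whatever matrix the load step returns.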
2.1 Plotting the Data
Next, the script calls the plotData function to create a scatter plot of
the data. Your job is to complete plotData.m to draw the plot; modify the
file and fill in the following code:
plot(x, y, 'rx', 'MarkerSize', 10);
ylabel('Profit in $10,000s');
xlabel('Population of City in 10,000s');
Now, when you continue to run ex1.m, our end result should look like
Figure 1, with the same red x markers and axis labels.
To learn more about the plot command, you can type help plot at the
Octave/MATLAB command prompt or search online for plotting documentation. (To change the markers to red x markers, we used the option 'rx'
together with the plot command, i.e., plot(..,[your options here],..,
'rx'); )
Figure 1: Scatter plot of training data (x-axis: Population of City in 10,000s; y-axis: Profit in $10,000s)
2.2 Gradient Descent
In this part, you will fit the linear regression parameters $\theta$ to our dataset
using gradient descent.
2.2.1 Update Equations
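The objective of linear regression is to minimize the cost function
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
where the hypothesis $h_\theta(x)$ is given by the linear model
$$h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1$$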
Recall that the parameters of your model are the $\theta_j$ values. These are
the values you will adjust to minimize cost $J(\theta)$. One way to do this is to
use the batch gradient descent algorithm. In batch gradient descent, each
iteration performs the update
$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \quad \text{(simultaneously update } \theta_j \text{ for all } j\text{)}$$
With each step of gradient descent, your parameters $\theta_j$ come closer to the
optimal values that will achieve the lowest cost $J(\theta)$.
Implementation Note: We store each example as a row in the X
matrix in Octave/MATLAB. To take into account the intercept term ($\theta_0$),
we add an additional first column to X and set it to all ones. This allows
us to treat $\theta_0$ as simply another feature.
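A single batch update can be sketched in vectorized form as below. This is only one possible way to write the step, not necessarily the layout expected in the starter files. The three data points and the step size 0.1 are hypothetical values chosen so the arithmetic is easy to follow by hand.

```octave
% Tiny hypothetical training set (not the exercise data).
x = [1; 2; 3];
y = [2; 4; 6];
m = length(y);

X = [ones(m, 1), x];      % first column of ones handles the intercept theta_0
theta = zeros(2, 1);      % start from theta_0 = theta_1 = 0
alpha = 0.1;              % hypothetical learning rate for this toy example

errors = X * theta - y;           % h_theta(x^(i)) - y^(i) for every example
gradient = (X' * errors) / m;     % one entry per parameter theta_j
theta = theta - alpha * gradient; % simultaneous update of all theta_j
```

Because the gradient is computed from the old theta before any assignment, both parameters are updated simultaneously, as the update rule requires.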
2.2.2 Implementation
In ex1.m, we have already set up the data for linear regression. In the
following lines, we add another dimension to our data to accommodate the
$\theta_0$ intercept term. We also initialize the initial parameters to 0 and the
learning rate alpha to 0.01.
X = [ones(m, 1), data(:,1)]; % Add a column of ones to x
theta = zeros(2, 1); % initialize fitting parameters
iterations = 1500;
alpha = 0.01;
2.2.3 Computing the cost $J(\theta)$
As you perform gradient descent to minimize the cost function $J(\theta)$,
it is helpful to monitor the convergence by computing the cost. In this
section, you will implement a function to calculate $J(\theta)$ so you can check the
convergence of your gradient descent implementation.
Your next task is to complete the code in the file computeCost.m, which
is a function that computes $J(\theta)$. As you are doing this, remember that the
variables X and y are not scalar values, but matrices whose rows represent
the examples from the training set.
Once you have completed the function, the next step in ex1.m will run
computeCost once using $\theta$ initialized to zeros, and you will see the cost
printed to the screen.
You should expect to see a cost of 32.07.
You should now submit your solutions.
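The cost calculation described in this section can be sketched on a toy problem as follows. The data here is hypothetical, so the result will not be 32.07; the point is only to show how the matrix form handles all examples at once.

```octave
% Tiny hypothetical training set (not the exercise data).
x = [1; 2; 3];
y = [2; 4; 6];
m = length(y);

X = [ones(m, 1), x];          % each row is one example, with a leading 1
theta = zeros(2, 1);          % parameters initialized to zeros

predictions = X * theta;      % h_theta(x^(i)) for every example at once
errors = predictions - y;     % difference from the observed values
J = sum(errors .^ 2) / (2 * m);   % squared-error cost for this theta
```

With theta at zero, every prediction is zero, so the cost reduces to the sum of the squared targets divided by 2m, which is 56/6 for this toy data.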
2.2.4 Gradient descent
2.3 Debugging
Here are some things to keep in mind as you implement gradient descent:
• Octave/MATLAB array indices start from one, not zero. If you're storing $\theta_0$ and $\theta_1$ in a vector called theta, the values will be theta(1) and
theta(2).
• If you are seeing many errors at runtime, inspect your matrix operations
to make sure that you're adding and multiplying matrices of compatible dimensions. Printing the dimensions of variables with the size
command will help you debug.
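Both points can be checked quickly at the prompt. The sketch below uses hypothetical parameter values and a three-example matrix purely to show the indexing and the size command.

```octave
theta = [0.5; 2.0];       % hypothetical values for theta_0 and theta_1
theta0 = theta(1);        % theta_0 is stored at index 1, not 0
theta1 = theta(2);        % theta_1 is stored at index 2

X = [ones(3, 1), [1; 2; 3]];   % 3 examples, 2 columns (intercept + feature)
disp(size(X));            % prints the dimensions of X (3 rows, 2 columns)
disp(size(theta));        % prints the dimensions of theta (2 rows, 1 column)

% Inner dimensions agree: (3 x 2) * (2 x 1) gives a 3 x 1 result.
h = X * theta;
```

Writing theta * X instead would fail here, since a 2 x 1 matrix cannot be multiplied by a 3 x 2 matrix.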
Figure 2: Training data with linear regression fit (x-axis: Population of City in 10,000s; y-axis: Profit in $10,000s)
2.4 Visualizing $J(\theta)$
To understand the cost function $J(\theta)$ better, you will now plot the cost over
a 2-dimensional grid of $\theta_0$ and $\theta_1$ values. You will not need to code anything
new for this part, but you should understand how the code you have written
already is creating these images.
In the next step of ex1.m, there is code set up to calculate $J(\theta)$ over a
grid of values using the computeCost function that you wrote.
After this code is executed, you will have a 2-D array of $J(\theta)$ values.
The script ex1.m will then use these values to produce surface and contour
plots of $J(\theta)$ using the surf and contour commands. The plots should look
something like Figure 3:
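The grid evaluation described above might look roughly like the following. This is a sketch under stated assumptions: the anonymous function cost stands in for your own computeCost, the data is a toy set, and the names theta0_vals, theta1_vals, and J_vals as well as the grid ranges are illustrative choices rather than the script's actual settings.

```octave
% Tiny hypothetical training set (not the exercise data).
x = [1; 2; 3];
y = [2; 4; 6];
m = length(y);
X = [ones(m, 1), x];

% Stand-in for computeCost: squared-error cost for a given theta.
cost = @(X, y, theta) sum((X * theta - y) .^ 2) / (2 * length(y));

theta0_vals = linspace(-2, 2, 5);   % candidate values for theta_0
theta1_vals = linspace(0, 4, 5);    % candidate values for theta_1

% J_vals(i, j) holds the cost at (theta0_vals(i), theta1_vals(j)).
J_vals = zeros(length(theta0_vals), length(theta1_vals));
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        t = [theta0_vals(i); theta1_vals(j)];
        J_vals(i, j) = cost(X, y, t);
    end
end
```

When passing such an array to surf or contour with theta0_vals on the x-axis, the array generally needs to be transposed first, because those commands expect rows to correspond to the y-axis values.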
Figure 3: Cost function $J(\theta)$: (a) Surface, (b) Contour
Optional Exercises
If you have successfully completed the material above, congratulations! You
now understand linear regression and should be able to start using it on your
own datasets.
For the rest of this programming exercise, we have included the following
optional exercises. These exercises will help you gain a deeper understanding
of the material, and if you are able to do so, we encourage you to complete
them as well.
In this part, you will implement linear regression with multiple variables to
predict the prices of houses. Suppose you are selling your house and you
want to know what a good market price would be. One way to do this is to
first collect information on recent houses sold and make a model of housing
prices.
The file ex1data2.txt contains a training set of housing prices in Portland, Oregon. The first column is the size of the house (in square feet), the
second column is the number of bedrooms, and the third column is the price
of the house.
The ex1_multi.m script has been set up to help you step through this
exercise.
3.1 Feature Normalization
The ex1_multi.m script will start by loading and displaying some values
from this dataset. By looking at the values, note that house sizes are about
1000 times the number of bedrooms. When features differ by orders of magnitude, first performing feature scaling can make gradient descent converge
much more quickly.
Your task here is to complete the code in featureNormalize.m to:
• Subtract the mean value of each feature from the dataset.
• After subtracting the mean, additionally scale (divide) the feature values
by their respective standard deviations.
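The two steps above can be sketched as follows on a small hypothetical feature matrix. The variable names mu, sigma, and X_norm are illustrative, and bsxfun is used here only so that the sketch also runs on versions without automatic broadcasting; other formulations work equally well.

```octave
% Hypothetical features: column 1 = size (sq. ft.), column 2 = bedrooms.
X = [1000, 2;
     2000, 3;
     3000, 4];

mu = mean(X);       % 1 x 2 row vector: mean of each column (feature)
sigma = std(X);     % 1 x 2 row vector: standard deviation of each column

% Subtract each column's mean, then divide by that column's std. deviation.
X_centered = bsxfun(@minus, X, mu);
X_norm = bsxfun(@rdivide, X_centered, sigma);
```

After this transformation each column of X_norm has mean 0 and standard deviation 1. Keeping mu and sigma around matters, because any new example must be normalized with the same values before a prediction is made.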
3.2 Gradient Descent
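$$J(\theta) = \frac{1}{2m} (X\theta - \vec{y})^T (X\theta - \vec{y})$$
where
$$X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}, \qquad \vec{y} = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}$$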
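The vectorized form gives the same number as the summation form of the cost. The sketch below checks this on hypothetical values; the names J_sum and J_vec are illustrative.

```octave
% Hypothetical design matrix (with intercept column), targets, and theta.
X = [1, 1;
     1, 2;
     1, 3];
y = [2; 4; 7];
theta = [0.5; 1.5];
m = length(y);

% Summation form: add up the squared errors example by example.
J_sum = sum((X * theta - y) .^ 2) / (2 * m);

% Vectorized form: inner product of the residual vector with itself.
r = X * theta - y;
J_vec = (r' * r) / (2 * m);
```

The inner product r' * r is exactly the sum of squared residuals, which is why the two expressions agree.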
3.2.1 Optional (ungraded) exercise: Selecting learning rates
In this part of the exercise, you will get to try out different learning rates for
the dataset and find a learning rate that converges quickly. You can change
the learning rate by modifying ex1_multi.m and changing the part of the
code that sets the learning rate.
The next phase in ex1_multi.m will call your gradientDescent.m function and run gradient descent for about 50 iterations at the chosen learning
rate. The function should also return the history of $J(\theta)$ values in a vector
J. After the last iteration, the ex1_multi.m script plots the J values against
the number of the iterations.
If you picked a learning rate within a good range, your plot should look similar
to Figure 4. If your graph looks very different, especially if your value of $J(\theta)$
increases or even blows up, adjust your learning rate and try again. We recommend trying values of the learning rate on a log-scale, at multiplicative
steps of about 3 times the previous value (i.e., 0.3, 0.1, 0.03, 0.01 and so on).
You may also want to adjust the number of iterations you are running if that
will help you see the overall trend in the curve.
Notice the changes in the convergence curves as the learning rate changes.
With a small learning rate, you should find that gradient descent takes a very
long time to converge to the optimal value. Conversely, with a large learning
rate, gradient descent might not converge or might even diverge!
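The effect of the learning rate can be seen on a toy problem. The sketch below is not the script's own code: the data, the list of rates in alphas, and the name J_history are all hypothetical, and the rate 2.5 is included only because it is large enough to diverge on this particular data.

```octave
% Hypothetical, already-normalized data with an intercept column.
X = [1, -1;
     1,  0;
     1,  1];
y = [1; 2; 3];
m = length(y);

alphas = [0.01, 0.1, 1, 2.5];   % learning rates to compare
num_iters = 50;
J_history = zeros(num_iters, numel(alphas));  % one column per learning rate

for k = 1:numel(alphas)
    theta = zeros(2, 1);        % restart from zero for each learning rate
    for it = 1:num_iters
        % One batch gradient descent step.
        theta = theta - alphas(k) * (X' * (X * theta - y)) / m;
        % Record the cost after the step.
        r = X * theta - y;
        J_history(it, k) = (r' * r) / (2 * m);
    end
end
```

Plotting each column of J_history against 1:num_iters shows the pattern described above: the smallest rate is still far from the minimum after 50 iterations, the middle rates get close, and the largest rate makes the cost grow.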
Using the best learning rate that you found, run the ex1_multi.m script
to run gradient descent until convergence to find the final values of $\theta$. Next,
use this value of $\theta$ to predict the price of a house with 1650 square feet and
3 bedrooms. You will use this value later to check your implementation of the
normal equations. Don't forget to normalize your features when you make
this prediction!
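A prediction for a new house has to reuse the mean and standard deviation computed from the training set. The sketch below shows the mechanics only: the values of mu, sigma, and theta are hypothetical placeholders, not results from the real dataset, so the resulting price is not the answer to the exercise.

```octave
% Hypothetical normalization statistics and fitted parameters.
mu = [2000, 3];                    % placeholder feature means
sigma = [500, 1];                  % placeholder feature standard deviations
theta = [300000; 100000; -5000];   % placeholder [theta_0; theta_1; theta_2]

house = [1650, 3];                 % size in square feet, number of bedrooms

% Normalize with the training-set statistics, not statistics of this row.
house_norm = (house - mu) ./ sigma;

% Prepend 1 for the intercept term, then apply the linear model.
price = [1, house_norm] * theta;
```

Skipping the normalization step would feed raw values such as 1650 into a model trained on inputs of order 1, giving a wildly wrong price.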
You do not need to submit any solutions for these optional (ungraded)
exercises.
3.3 Normal Equations
In the lecture videos, you learned that the closed-form solution to linear
regression is
$$\theta = \left( X^T X \right)^{-1} X^T \vec{y}.$$
Using this formula does not require any feature scaling, and you will get
an exact solution in one calculation: there is no loop until convergence like
in gradient descent.
Complete the code in normalEqn.m to use the formula above to calculate $\theta$. Remember that while you don't need to scale your features, we still
need to add a column of 1s to the X matrix to have an intercept term ($\theta_0$).
The code in ex1.m will add the column of 1s to X for you.
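The closed-form solution can be sketched in one line once X has its column of ones. The data below is hypothetical and lies exactly on the line y = 2x, so the expected parameters are easy to verify. Using pinv rather than inv is a choice made here for robustness when the product X' * X is close to singular; it is not something this handout prescribes.

```octave
% Tiny hypothetical training set lying exactly on y = 2x.
x = [1; 2; 3];
y = [2; 4; 6];
m = length(y);

X = [ones(m, 1), x];    % intercept column of 1s, no feature scaling needed

% Normal equations: theta = (X' X)^(-1) X' y, with pinv as the inverse.
theta = pinv(X' * X) * X' * y;
```

For this data the result is an intercept of 0 and a slope of 2, up to floating-point rounding, obtained without any iteration or learning rate.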
You should now submit your solutions.
Optional (ungraded) exercise: Now, once you have found $\theta$ using this
method, use it to make a price prediction for a 1650-square-foot house with
3 bedrooms. You should find that it gives the same predicted price as the value
you obtained using the model fit with gradient descent (in Section 3.2.1).
Submitted File            Points
warmUpExercise.m          10 points
computeCost.m             40 points
gradientDescent.m         50 points
Total                     100 points

Submitted File            Points
featureNormalize.m        0 points
computeCostMulti.m        0 points
gradientDescentMulti.m    0 points
normalEqn.m               0 points
You are allowed to submit your solutions multiple times, and we will take
only the highest score into consideration.