ML Coursera Python Assignments
import os  # used for manipulating directory paths
import numpy as np  # scientific and vector computation for python

# Plotting library
from matplotlib import pyplot
from mpl_toolkits.mplot3d import Axes3D  # needed to plot 3-D surfaces

# library written for this exercise providing additional functions for assignment submission, and others
import utils
Debugging
Here are some things to keep in mind throughout this exercise:
Python array indices start from zero, not one (contrary to OCTAVE/MATLAB).
There is an important distinction between Python arrays (called list or tuple) and numpy arrays. You should use
numpy arrays in all your computations. Vector/matrix operations work only with numpy arrays. Python lists do not
support vector operations (you need to use for loops).
If you are seeing many errors at runtime, inspect your matrix operations to make sure that you are adding and
multiplying matrices of compatible dimensions. Printing the dimensions of numpy arrays using the shape property will
help you debug.
By default, numpy interprets math operators as element-wise operators. If you want to do matrix multiplication, you
need to use the dot function in numpy. For example, if A and B are two numpy matrices, then the matrix operation
AB is np.dot(A, B). Note that for 2-dimensional matrices or vectors (1-dimensional), this is also equivalent to A @ B
(requires Python >= 3.5); a short demonstration follows.
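A minimal sketch of the element-wise versus matrix-product distinction, using small illustrative arrays:

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A * B)         # element-wise product: [[ 5 12] [21 32]]
print(np.dot(A, B))  # matrix product AB:    [[19 22] [43 50]]
print(A @ B)         # same as np.dot(A, B) for 2-D arrays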
def warmUpExercise():
    """
    Example function in Python which computes the identity matrix.

    Returns
    -------
    A : array_like
        The 5x5 identity matrix.

    Instructions
    ------------
    Return the 5x5 identity matrix.
    """
    # ======== YOUR CODE HERE ======
    A = np.identity(5, dtype=float)  # modify this line
    # ==============================
    return A
The previous cell only defines the function warmUpExercise . We can now run it by executing the following cell to see its
output. You should see output similar to the following:
array([[ 1., 0., 0., 0., 0.],
[ 0., 1., 0., 0., 0.],
[ 0., 0., 1., 0., 0.],
[ 0., 0., 0., 1., 0.],
[ 0., 0., 0., 0., 1.]])
warmUpExercise()
# send the added functions to coursera grader for getting a grade on this part
grader[1] = warmUpExercise
grader.grade()
def plotData(x, y):
    """
    Plots the data points x and y into a new figure.

    Parameters
    ----------
    x : array_like
        Data point values for x-axis.

    y : array_like
        Data point values for y-axis. Note x and y should have the same size.

    Instructions
    ------------
    Plot the training data into a figure using the "figure" and "plot"
    functions. Set the axes labels using the "xlabel" and "ylabel" functions.
    Assume the population and revenue data have been passed in as the x
    and y arguments of this function.

    Hint
    ----
    You can use the 'ro' option with plot to have the markers
    appear as red circles. Furthermore, you can make the markers larger by
    using plot(..., 'ro', ms=10), where `ms` refers to marker size. You
    can also set the marker edge color using the `mec` property.
    """
    fig = pyplot.figure()  # open a new figure

    # ====================== YOUR CODE HERE =======================
    pyplot.plot(x, y, 'ro', ms=10, mec='k')
    pyplot.ylabel('Revenue')
    pyplot.xlabel('Population')
    # =============================================================
Now run the defined function with the loaded data to visualize the data. The end result should look like the following figure:
plotData(X, y)
To quickly learn more about the matplotlib plot function and what arguments you can provide to it, you can type
?pyplot.plot in a cell within the Jupyter notebook. This opens a separate page showing the documentation for the
requested function. You can also search online for plotting documentation.
To set the markers to red circles, we used the option 'ro' within the plot function.
?pyplot.plot
Recall that the parameters of your model are the θj values. These are the values you will adjust to minimize cost J(θ). One
way to do this is to use the batch gradient descent algorithm. In batch gradient descent, each iteration performs the update
$$ \theta_j = \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} $$

simultaneously update $\theta_j$ for all $j$.
With each step of gradient descent, your parameters θj come closer to the optimal values that will achieve the lowest cost
J(θ).
**Implementation Note:** We store each example as a row in the $X$ matrix in Python `numpy`. To take
into account the intercept term ($\theta_0$), we add an additional first column to $X$ and set it to all ones.
This allows us to treat $\theta_0$ as simply another 'feature'.
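With the ones column in place, the simultaneous update above can also be written in a vectorized form, which is what the implementations below compute with a single matrix product (a standard identity, stated here for reference):

$$ \theta := \theta - \frac{\alpha}{m} X^T \left( X\theta - \vec{y} \right) $$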
2.2.2 Implementation
We have already set up the data for linear regression. In the following cell, we add another dimension to our data to
accommodate the θ0 intercept term. Do NOT execute this cell more than once.
# Add a column of ones to X. The numpy function stack joins arrays along a given axis.
# The first axis (axis=0) refers to rows (training examples)
# and second axis (axis=1) refers to columns (features).
X = np.stack([np.ones(m), X], axis=1)
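Since running that cell a second time would try to stack a ones column onto the already-augmented matrix, a defensive variant (a sketch, not part of the assignment) could guard on the dimensionality of X:

# only augment while X is still the raw one-dimensional feature vector
if X.ndim == 1:
    X = np.stack([np.ones(m), X], axis=1)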
def computeCost(X, y, theta):
    """
    Compute cost for linear regression. Computes the cost of using theta as
    the parameter for linear regression to fit the data points in X and y.

    Parameters
    ----------
    X : array_like
        The input dataset of shape (m x n+1), where m is the number of examples,
        and n is the number of features. We assume a vector of ones has already
        been appended to the features so we have n+1 columns.

    y : array_like
        The values of the function at each data point. This is a vector of
        shape (m, ).

    theta : array_like
        The parameters for the regression function. This is a vector of
        shape (n+1, ).

    Returns
    -------
    J : float
        The value of the regression cost function.

    Instructions
    ------------
    Compute the cost of a particular choice of theta.
    You should set J to the cost.
    """
    # initialize some useful values
    m = y.size  # number of training examples

    # ====================== YOUR CODE HERE =====================
    # one possible vectorized implementation of J = (1/2m) sum((h - y)^2)
    J = (1 / (2 * m)) * np.sum(np.square(np.dot(X, theta) - y))
    # ===========================================================
    return J
Once you have completed the function, the next step will run computeCost two times using two different initializations of θ.
You will see the cost printed to the screen.
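A sketch of what those two calls might look like (the θ values here are illustrative, chosen only to show the cost changing):

J = computeCost(X, y, theta=np.array([0.0, 0.0]))
print('With theta = [0, 0], cost computed = {:.2f}'.format(J))

J = computeCost(X, y, theta=np.array([-1.0, 2.0]))
print('With theta = [-1, 2], cost computed = {:.2f}'.format(J))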
You should now submit your solutions by executing the following cell.
grader[2] = computeCost
grader.grade()
A vector in numpy is a one-dimensional array; for example, np.array([1, 2, 3]) is a vector. A matrix in numpy is a two-dimensional array, for example np.array([[1, 2, 3], [4, 5, 6]]). However, np.array([[1, 2, 3]]) is still considered a matrix since it has two dimensions, even though it has a shape of 1x3 (which looks like a vector).

Given the above, the function np.dot, which we will use for all matrix/vector multiplication, has the following properties (demonstrated in the snippet after this list):

It always performs inner products on vectors. If x = np.array([1, 2, 3]), then np.dot(x, x) is a scalar.

For matrix-vector multiplication, if X is an m × n matrix and y is a vector of length m, then the operation np.dot(y, X) considers y as a 1 × m vector. On the other hand, if y is a vector of length n, then the operation np.dot(X, y) considers y as an n × 1 vector.

A vector can be promoted to a matrix using y[None] or y[np.newaxis]. That is, if y = np.array([1, 2, 3]) is a
vector of size 3, then y[None, :] is a matrix of shape 1 × 3. We can use y[:, None] to obtain a shape of 3 × 1.
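A few concrete checks of those properties:

x = np.array([1, 2, 3])
print(np.dot(x, x))                 # 14, the inner product (a scalar)

X = np.array([[1, 2, 3],
              [4, 5, 6]])           # a 2 x 3 matrix
print(np.dot(X, x))                 # [14 32], x treated as a 3 x 1 vector
print(np.dot(np.array([1, 1]), X))  # [5 7 9], a length-2 vector treated as 1 x 2

print(x[None, :].shape)             # (1, 3)
print(x[:, None].shape)             # (3, 1)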
def gradientDescent(X, y, theta, alpha, num_iters):
    """
    Performs gradient descent to learn theta.

    Parameters
    ----------
    X : array_like
        The input dataset of shape (m x n+1).

    y : array_like
        Value at given features. A vector of shape (m, ).

    theta : array_like
        Initial values for the linear regression parameters.
        A vector of shape (n+1, ).

    alpha : float
        The learning rate.

    num_iters : int
        The number of iterations for gradient descent.

    Returns
    -------
    theta : array_like
        The learned linear regression parameters. A vector of shape (n+1, ).

    J_history : list
        A python list for the values of the cost function after each iteration.

    Instructions
    ------------
    Perform a single gradient step on the parameter vector theta.
    """
    # initialize some useful values
    m = y.shape[0]  # number of training examples

    # make a copy of theta, to avoid changing the original array, since numpy
    # arrays are mutable and shared with the caller
    theta = theta.copy()

    J_history = []  # use a python list to save cost in every iteration

    for i in range(num_iters):
        # ==================== YOUR CODE HERE =================================
        h = np.dot(X, theta)  # recompute the hypothesis with the current theta
        theta = theta - (alpha / m) * np.dot(X.T, h - y)
        # =====================================================================
        # save the cost J in every iteration
        J_history.append(computeCost(X, y, theta))

    return theta, J_history
After you are finished, call the implemented gradientDescent function and print the computed θ. We initialize the θ
parameters to 0 and the learning rate α to 0.01. Execute the following cell to check your code.
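A sketch of that cell, assuming the settings just described (the iteration count of 1500 matches the course starter code, but is illustrative here):

theta = np.zeros(2)  # initialize fitting parameters
iterations = 1500
alpha = 0.01

theta, J_history = gradientDescent(X, y, theta, alpha, iterations)
print('theta found by gradient descent: {:.4f}, {:.4f}'.format(*theta))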
We will use your final parameters to plot the linear fit. The results should look like the following figure.
Your final values for θ will also be used to make predictions on profits in areas of 35,000 and 70,000 people.
Note the way that the following lines use matrix multiplication, rather than explicit summation or looping, to
calculate the predictions. This is an example of code vectorization in `numpy`.
Note that the first argument to the `numpy` function `dot` is a python list. `numpy` can internally convert
**valid** python lists to numpy arrays when they are explicitly provided as arguments to `numpy` functions.
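A sketch of those prediction cells (assuming, as in the original dataset, populations in units of 10,000 and profits in units of $10,000):

predict1 = np.dot([1, 3.5], theta)  # population of 35,000
print('For population = 35,000, we predict a profit of {:.2f}'.format(predict1 * 10000))

predict2 = np.dot([1, 7], theta)    # population of 70,000
print('For population = 70,000, we predict a profit of {:.2f}'.format(predict2 * 10000))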
You should now submit your solutions by executing the next cell.
grader[3] = gradientDescent
grader.grade()
The purpose of these graphs is to show you how J(θ) varies with changes in θ0 and θ1 . The cost function J(θ) is bowl-
shaped and has a global minimum. (This is easier to see in the contour plot than in the 3D surface plot). This minimum is the
optimal point for θ0 and θ1 , and each step of gradient descent moves closer to this point.
# surface plot
fig = pyplot.figure(figsize=(12, 5))
ax = fig.add_subplot(121, projection='3d')
ax.plot_surface(theta0_vals, theta1_vals, J_vals, cmap='viridis')
pyplot.xlabel('theta0')
pyplot.ylabel('theta1')
pyplot.title('Surface')
# contour plot
# Plot J_vals as 20 contours spaced logarithmically between 0.01 and 1000
ax = pyplot.subplot(122)
pyplot.contour(theta0_vals, theta1_vals, J_vals, linewidths=2, cmap='viridis', levels=np.logspace(-2, 3, 20))
pyplot.xlabel('theta0')
pyplot.ylabel('theta1')
pyplot.plot(theta[0], theta[1], 'ro', ms=10, lw=2)
pyplot.title('Contour, showing minimum')
pass
Optional Exercises
If you have successfully completed the material above, congratulations! You now understand linear regression and should
be able to start using it on your own datasets.
For the rest of this programming exercise, we have included the following optional exercises. These exercises will help you
gain a deeper understanding of the material, and if you are able to do so, we encourage you to complete them as well. You
can still submit your solutions to these exercises to check if your answers are correct.
We start by loading and displaying some values from this dataset. By looking at the values, note that house sizes are about
1000 times the number of bedrooms. When features differ by orders of magnitude, first performing feature scaling can make
gradient descent converge much more quickly.
# Load data
data = np.loadtxt(os.path.join('Data', 'ex1data2.txt'), delimiter=',')
X = data[:, :2]
y = data[:, 2]
m = y.size
You will do this for all the features and your code should work with datasets of all sizes (any number of features / examples).
Note that each column of the matrix X corresponds to one feature.
**Implementation Note:** When normalizing the features, it is important to store the values used for
normalization - the mean value and the standard deviation used for the computations. After learning the
parameters from the model, we often want to predict the prices of houses we have not seen before. Given a
new x value (living room area and number of bedrooms), we must first normalize x using the mean and
standard deviation that we had previously computed from the training set.
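As a sketch (assuming the mu and sigma returned by the featureNormalize function defined next), normalizing a new example looks like:

x_new = np.array([1650, 3])        # living area and number of bedrooms
x_new_norm = (x_new - mu) / sigma  # normalize with the training-set statistics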
def featureNormalize(X):
    """
    Normalizes the features in X. Returns a normalized version of X where
    the mean value of each feature is 0 and the standard deviation
    is 1. This is often a good preprocessing step to do when working with
    learning algorithms.

    Parameters
    ----------
    X : array_like
        The dataset of shape (m x n).

    Returns
    -------
    X_norm : array_like
        The normalized dataset of shape (m x n).

    mu : array_like
        A vector of shape (n, ) holding the mean of each feature.

    sigma : array_like
        A vector of shape (n, ) holding the standard deviation of each feature.

    Instructions
    ------------
    First, for each feature dimension, compute the mean of the feature
    and subtract it from the dataset, storing the mean value in mu.
    Next, compute the standard deviation of each feature and divide
    each feature by its standard deviation, storing the standard deviation
    in sigma.

    Note that X is a matrix where each column is a feature and each row is
    an example. You need to perform the normalization separately for each feature.

    Hint
    ----
    You might find the 'np.mean' and 'np.std' functions useful.
    """
    # You need to set these values correctly
    X_norm = X.copy()
    mu = np.zeros(X.shape[1])
    sigma = np.zeros(X.shape[1])

    # =========================== YOUR CODE HERE =====================
    mu = np.mean(X, axis=0)    # per-feature mean
    sigma = np.std(X, axis=0)  # per-feature standard deviation
    X_norm = (X - mu) / sigma
    # ================================================================
    return X_norm, mu, sigma
grader[4] = featureNormalize
grader.grade()
After the featureNormalize function is tested, we now add the intercept term to X_norm :
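A sketch of that step (mirroring the np.concatenate call used later for the normal equations, and assuming X_norm and m from above):

X = np.concatenate([np.ones((m, 1)), X_norm], axis=1)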
You should complete the code for the functions computeCostMulti and gradientDescentMulti to implement the cost
function and gradient descent for linear regression with multiple variables. If your code in the previous part (single variable)
already supports multiple variables, you can use it here too. Make sure your code supports any number of features and is
well-vectorized. You can use the shape property of numpy arrays to find out how many features are present in the dataset.
**Implementation Note:** In the multivariate case, the cost function can also be written in the following
vectorized form:
$$ J(\theta) = \frac{1}{2m} \left( X\theta - \vec{y} \right)^T \left( X\theta - \vec{y} \right) $$

where the rows of $X$ are the training examples:

$$ X = \begin{pmatrix} - \left( x^{(1)} \right)^T - \\ - \left( x^{(2)} \right)^T - \\ \vdots \\ - \left( x^{(m)} \right)^T - \end{pmatrix} $$
def computeCostMulti(X, y, theta):
    """
    Compute cost for linear regression with multiple variables.
    Computes the cost of using theta as the parameter for linear
    regression to fit the data points in X and y.

    Parameters
    ----------
    X : array_like
        The dataset of shape (m x n+1).

    y : array_like
        A vector of shape (m, ) for the values at a given data point.

    theta : array_like
        The linear regression parameters. A vector of shape (n+1, ).

    Returns
    -------
    J : float
        The value of the cost function.

    Instructions
    ------------
    Compute the cost of a particular choice of theta. You should set J to the cost.
    """
    # Initialize some useful values
    m = y.shape[0]  # number of training examples

    # ================== YOUR CODE HERE ================================
    # the vectorized form: J = (1/2m) (X theta - y)^T (X theta - y)
    error = np.dot(X, theta) - y
    J = (1 / (2 * m)) * np.dot(error, error)
    # ==================================================================
    return J
grader[5] = computeCostMulti
grader.grade()
def gradientDescentMulti(X, y, theta, alpha, num_iters):
    """
    Performs gradient descent to learn theta.

    Parameters
    ----------
    X : array_like
        The dataset of shape (m x n+1).

    y : array_like
        A vector of shape (m, ) for the values at a given data point.

    theta : array_like
        The linear regression parameters. A vector of shape (n+1, ).

    alpha : float
        The learning rate for gradient descent.

    num_iters : int
        The number of iterations to run gradient descent.

    Returns
    -------
    theta : array_like
        The learned linear regression parameters. A vector of shape (n+1, ).

    J_history : list
        A python list for the values of the cost function after each iteration.

    Instructions
    ------------
    Perform a single gradient step on the parameter vector theta.
    """
    # Initialize some useful values
    m = y.shape[0]  # number of training examples
    # make a copy of theta to avoid changing the original array
    theta = theta.copy()

    J_history = []

    for i in range(num_iters):
        # ======================= YOUR CODE HERE ==========================
        theta = theta - (alpha / m) * np.dot(X.T, np.dot(X, theta) - y)
        # =================================================================
        # save the cost J in every iteration
        J_history.append(computeCostMulti(X, y, theta))

    return theta, J_history
grader[6] = gradientDescentMulti
grader.grade()
If your graph looks very different, especially if your value of J(θ) increases or even blows up, adjust your learning rate and
try again. We recommend trying values of the learning rate α on a log-scale, at multiplicative steps of about 3 times the
previous value (i.e., 0.3, 0.1, 0.03, 0.01 and so on). You may also want to adjust the number of iterations you are running if
that will help you see the overall trend in the curve.
**Implementation Note:** If your learning rate is too large, $J(\theta)$ can diverge and ‘blow up’, resulting in
values which are too large for computer calculations. In these situations, `numpy` will tend to return NaNs. NaN
stands for ‘not a number’ and is often caused by undefined operations that involve −∞ and +∞.
**MATPLOTLIB tip:** To compare how different learning rates affect convergence, it is helpful to plot
$J$ for several learning rates on the same figure. This can be done by making `alpha` a python list, looping
across the values within this list, and calling the plot function in every iteration of the loop (a sketch follows
this note). It is also useful to have a legend to distinguish the different lines within the plot. Search online for
`pyplot.legend` for help on showing legends in `matplotlib`.
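A minimal sketch of that loop (assuming the normalized X and y from above; the alpha values and iteration count are illustrative):

for alpha in [0.3, 0.1, 0.03, 0.01]:
    _, J_history = gradientDescentMulti(X, y, np.zeros(3), alpha, 50)
    pyplot.plot(np.arange(len(J_history)), J_history, lw=2, label='alpha = {}'.format(alpha))

pyplot.xlabel('Number of iterations')
pyplot.ylabel('Cost J')
pyplot.legend()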
Notice the changes in the convergence curves as the learning rate changes. With a small learning rate, you should find that
gradient descent takes a very long time to converge to the optimal value. Conversely, with a large learning rate, gradient
descent might not converge or might even diverge! Using the best learning rate that you found, run gradient
descent until convergence to find the final values of θ. Next, use this value of θ to predict the price of a house with 1650
square feet and 3 bedrooms. You will use this value later to check your implementation of the normal equations. Don't forget to
normalize your features when you make this prediction!
"""
Instructions
------------
We have provided you with the following starter code that runs
gradient descent with a particular learning rate (alpha).
Finally, you should complete the code at the end to predict the price
of a 1650 sq-ft, 3 br house.
Hint
----
At prediction, make sure you do the same feature normalization.
"""
# Choose some alpha value - change this
alpha = 0.1
num_iters = 400
# ===================================================================
You do not need to submit any solutions for this optional (ungraded) part.

The closed-form solution to linear regression is the normal equation:

$$ \theta = \left( X^T X \right)^{-1} X^T \vec{y} $$

Using this formula does not require any feature scaling, and you will get an exact solution in one calculation: there is no "loop
until convergence" as in gradient descent.
First, we will reload the data to ensure that the variables have not been modified. Remember that while you do not need to
scale your features, you still need to add a column of 1's to the X matrix to have an intercept term (θ0). The code in the next
cell will add the column of 1's to X for you.
# Load data
data = np.loadtxt(os.path.join('Data', 'ex1data2.txt'), delimiter=',')
X = data[:, :2]
y = data[:, 2]
m = y.size
X = np.concatenate([np.ones((m, 1)), X], axis=1)
Complete the code for the function normalEqn below to use the formula above to calculate θ.
def normalEqn(X, y):
    """
    Computes the closed-form solution to linear regression using the normal equations.

    Parameters
    ----------
    X : array_like
        The dataset of shape (m x n+1).

    y : array_like
        The value at each data point. A vector of shape (m, ).

    Returns
    -------
    theta : array_like
        Estimated linear regression parameters. A vector of shape (n+1, ).

    Instructions
    ------------
    Complete the code to compute the closed form solution to linear
    regression and put the result in theta.

    Hint
    ----
    Look up the function `np.linalg.pinv` for computing the matrix (pseudo) inverse.
    """
    theta = np.zeros(X.shape[1])

    # ===================== YOUR CODE HERE ============================
    theta = np.dot(np.linalg.pinv(np.dot(X.T, X)), np.dot(X.T, y))
    # =================================================================
    return theta
grader[7] = normalEqn
grader.grade()
Optional (ungraded) exercise: Now, once you have found θ using this method, use it to make a price prediction for a 1650-
square-foot house with 3 bedrooms. You should find that it gives the same predicted price as the value you obtained using
the model fit with gradient descent (in Section 3.2.1).
# ============================ YOUR CODE HERE ================================
# Estimate the price of a 1650 sq-ft, 3 br house using the normal-equation
# theta. No feature normalization is needed with the normal equations.
price = np.dot([1, 1650, 3], theta)
print('Predicted price of a 1650 sq-ft, 3 br house (using the normal equations): ${:.0f}'.format(price))
# ============================================================