C1 W3 Logistic Regression
In this exercise, you will implement logistic regression and apply it to two different datasets.
Outline
• 1 - Packages
• 2 - Logistic Regression
– 2.1 Problem Statement
– 2.2 Loading and visualizing the data
– 2.3 Sigmoid function
– 2.4 Cost function for logistic regression
– 2.5 Gradient for logistic regression
– 2.6 Learning parameters using gradient descent
– 2.7 Plotting the decision boundary
– 2.8 Evaluating logistic regression
• 3 - Regularized Logistic Regression
– 3.1 Problem Statement
– 3.2 Loading and visualizing the data
– 3.3 Feature mapping
– 3.4 Cost function for regularized logistic regression
– 3.5 Gradient for regularized logistic regression
– 3.6 Learning parameters using gradient descent
– 3.7 Plotting the decision boundary
– 3.8 Evaluating regularized logistic regression model
1 - Packages
First, let's run the cell below to import all the packages that you will need during this
assignment.
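The import cell itself is not reproduced in this writeup. A typical set of imports for this assignment would look like the sketch below; the exact lines are an assumption, with numpy and matplotlib used by the plotting code later and helpers such as load_data and plot_data coming from the course-provided utils.py.
```python
# Assumed import cell (not shown in the original writeup)
import numpy as np                 # array math
import matplotlib.pyplot as plt    # plotting
from utils import *                # course helpers such as load_data and plot_data
```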
2 - Logistic Regression
In this part of the exercise, you will build a logistic regression model to predict whether a
student gets admitted into a university.
• You have historical data from previous applicants that you can use as a training set for
logistic regression.
• For each training example, you have the applicant’s scores on two exams and the
admissions decision.
• Your task is to build a classification model that estimates an applicant’s probability of
admission based on the scores from those two exams.
• The load_data() function shown below loads the data into variables X_train and
y_train
– X_train contains exam scores on two exams for a student
– y_train is the admission decision
• y_train = 1 if the student was admitted
• y_train = 0 if the student was not admitted
– Both X_train and y_train are numpy arrays.
# load dataset
X_train, y_train = load_data("data/ex2data1.txt")
• A good place to start is to just print out each variable and see what it contains.
The code below prints the first five values of X_train and the type of the variable.
print("First five elements in X_train are:\n", X_train[:5])
print("Type of X_train:",type(X_train))
• The code below displays the data on a 2D plot (as shown below), where the axes are the
two exam scores, and the positive and negative examples are shown with different
markers.
• We use a helper function in the utils.py file to generate this plot.
# Plot examples
plot_data(X_train, y_train[:], pos_label="Admitted", neg_label="Not admitted")
# Set the y-axis label
plt.ylabel('Exam 2 score')
# Set the x-axis label
plt.xlabel('Exam 1 score')
plt.legend(loc="upper right")
plt.show()
• With this model, you can then predict if a new student will be admitted based on their
scores on the two exams.
Recall that for logistic regression, the model is represented as

$$f_{w,b}(x) = g(w \cdot x + b)$$

where function $g$ is the sigmoid function. The sigmoid function is defined as:

$$g(z) = \frac{1}{1 + e^{-z}}$$
Let's implement the sigmoid function first, so it can be used by the rest of this assignment.
Exercise 1
Please complete the sigmoid function to calculate

$$g(z) = \frac{1}{1 + e^{-z}}$$

Note that z is not always a single number; it can also be a numpy array of numbers, in which case the sigmoid should be applied to every element of the array.
If you get stuck, you can check out the hints presented after the cell below to help you with the
implementation.
# UNQ_C1
# GRADED FUNCTION: sigmoid
def sigmoid(z):
    """
    Compute the sigmoid of z

    Args:
        z (ndarray): A scalar, numpy array of any size.

    Returns:
        g (ndarray): sigmoid(z), with the same shape as z
    """
numpy has a function called np.exp(), which offers a convenient way to calculate the
exponential ($e^z$) of all elements in the input array (z).
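For reference, a minimal sketch of such an implementation is shown below. The name sigmoid_sketch is used to avoid presenting it as the graded solution; it relies only on np.exp as described above.
```python
import numpy as np

def sigmoid_sketch(z):
    # 1 / (1 + e^(-z)), applied elementwise; works for scalars and numpy arrays
    g = 1 / (1 + np.exp(-z))
    return g
```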
When you are finished, try testing a few values by calling sigmoid(x) in the cell below.
• For large positive values of x, the sigmoid should be close to 1, while for large negative
values, the sigmoid should be close to 0.
• Evaluating sigmoid(0) should give you exactly 0.5.
print ("sigmoid(0) = " + str(sigmoid(0)))
sigmoid(0) = 0.5
Expected Output: sigmoid(0) = 0.5
• As mentioned before, your code should also work with vectors and matrices. For a
matrix, your function should perform the sigmoid function on every element.
print ("sigmoid([ -1, 0, 1, 2]) = " + str(sigmoid(np.array([-1, 0, 1,
2]))))
# UNIT TESTS
from public_tests import *
sigmoid_test(sigmoid)
Exercise 2
Please complete the compute_cost function using the equations below.
Recall that for logistic regression, the cost function is of the form
$$J(w,b) = \frac{1}{m} \sum_{i=0}^{m-1} \left[ loss\left(f_{w,b}(x^{(i)}), y^{(i)}\right) \right]$$

where

$$loss\left(f_{w,b}(x^{(i)}), y^{(i)}\right) = -y^{(i)} \log\left(f_{w,b}(x^{(i)})\right) - \left(1 - y^{(i)}\right) \log\left(1 - f_{w,b}(x^{(i)})\right)$$
• As you are doing this, remember that the variables X_train and y_train are not scalar
values but matrices of shape (m, n) and (m, 1) respectively, where n is the number of
features and m is the number of training examples.
• You can use the sigmoid function that you implemented above for this part.
If you get stuck, you can check out the hints presented after the cell below to help you with the
implementation.
# UNQ_C2
# GRADED FUNCTION: compute_cost
def compute_cost(X, y, w, b, lambda_= 1):
    """
    Computes the cost over all examples
    Args:
      X : (ndarray Shape (m,n))  data, m examples by n features
      y : (array_like Shape (m,)) target value
      w : (array_like Shape (n,)) Values of parameters of the model
      b : (scalar)                Value of bias parameter of the model
      lambda_ : (scalar, float)   unused placeholder
    Returns:
      total_cost : (scalar) cost
    """
    m, n = X.shape
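For reference, one possible (non-vectorized) completion is sketched below. This is a sketch under the assumptions that w has shape (n,) and that the sigmoid function implemented earlier is available; it is not presented as the graded solution.
```python
import numpy as np

def compute_cost_sketch(X, y, w, b, lambda_=1):
    # lambda_ is an unused placeholder, kept only to match the graded signature
    m, n = X.shape
    total_cost = 0.0
    for i in range(m):
        # model output f_wb = g(w . x^(i) + b), using the sigmoid implemented above
        z_i = np.dot(X[i], w) + b
        f_wb_i = sigmoid(z_i)
        # logistic loss for example i
        total_cost += -y[i] * np.log(f_wb_i) - (1 - y[i]) * np.log(1 - f_wb_i)
    return total_cost / m
```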
Run the cells below to check your implementation of the compute_cost function with two
different initializations of the parameters w
m, n = X_train.shape
# UNIT TESTS
compute_cost_test(compute_cost)
Exercise 3
Please complete the compute_gradient function to compute $\frac{\partial J(w,b)}{\partial w}$ and $\frac{\partial J(w,b)}{\partial b}$ from equations (2) and (3) below.

$$\frac{\partial J(w,b)}{\partial b} = \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right) \tag{2}$$

$$\frac{\partial J(w,b)}{\partial w_j} = \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right) x_j^{(i)} \tag{3}$$
• m is the number of training examples in the dataset
• Note: While this gradient looks identical to the linear regression gradient, the
formula is actually different because linear and logistic regression have different
definitions of $f_{w,b}(x)$.
As before, you can use the sigmoid function that you implemented above and if you get stuck,
you can check out the hints presented after the cell below to help you with the implementation.
# UNQ_C3
# GRADED FUNCTION: compute_gradient
def compute_gradient(X, y, w, b, lambda_=None):
    """
    Computes the gradient for logistic regression
    Args:
      X : (ndarray Shape (m,n))    variable such as house size
      y : (array_like Shape (m,1)) actual value
      w : (array_like Shape (n,1)) values of parameters of the model
      b : (scalar)                 value of parameter of the model
      lambda_ :                    unused placeholder
    Returns:
      dj_db : (scalar)                 The gradient of the cost w.r.t. the parameter b
      dj_dw : (array_like Shape (n,1)) The gradient of the cost w.r.t. the parameters w
    """
• Here's how you can structure the overall implementation for this function:

```python
def compute_gradient(X, y, w, b, lambda_=None):
    m, n = X.shape
    dj_dw = np.zeros(w.shape)
    dj_db = 0.
```

If you're still stuck, you can check the hints presented below to figure out how to
calculate f_wb, dj_db_i and dj_dw_ij.

• Hint to calculate f_wb: Recall that you calculated f_wb in compute_cost above; for detailed hints on how to calculate each intermediate term, check out the hints section below that exercise.
• More hints to calculate f_wb: You can calculate f_wb as

    for i in range(m):
        # Calculate f_wb (exactly how you did it in the compute_cost function above)
        z_wb = 0
        # Loop over each feature
        for j in range(n):
            # Add the corresponding term to z_wb
            z_wb_ij = X[i, j] * w[j]
            z_wb += z_wb_ij

• Hint to calculate dj_db_i: You can calculate dj_db_i as dj_db_i = f_wb - y[i]
• Hint to calculate dj_dw_ij: You can calculate dj_dw_ij as dj_dw_ij = (f_wb - y[i]) * X[i][j]
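Putting the hints above together, one possible completion looks like the sketch below. It assumes the sigmoid function from earlier is available and that the function returns dj_db first, matching the compute_gradient_reg call shown later in this assignment; it is illustrative rather than the graded solution.
```python
import numpy as np

def compute_gradient_sketch(X, y, w, b, lambda_=None):
    # lambda_ is an unused placeholder, kept only to match the graded signature
    m, n = X.shape
    dj_dw = np.zeros(w.shape)
    dj_db = 0.
    for i in range(m):
        # f_wb = g(w . x^(i) + b), computed exactly as in compute_cost
        f_wb = sigmoid(np.dot(X[i], w) + b)
        # error for example i contributes to both gradients
        err_i = f_wb - y[i]
        dj_db += err_i
        for j in range(n):
            dj_dw[j] += err_i * X[i, j]
    return dj_db / m, dj_dw / m
```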
Run the cells below to check your implementation of the compute_gradient function with
two different initializations of the parameters w
# UNIT TESTS
compute_gradient_test(compute_gradient)
• You don't need to implement anything for this part. Simply run the cells below.
• A good way to verify that gradient descent is working correctly is to look at the
value of $J(w,b)$ and check that it is decreasing with each step.
• Assuming you have implemented the gradient and computed the cost correctly, your
value of $J(w,b)$ should never increase, and should converge to a steady value by the
end of the algorithm.
def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters, lambda_):
    """
    Performs batch gradient descent to learn the parameters. Updates the parameters
    by taking num_iters gradient steps with learning rate alpha.

    Args:
      X :    (array_like Shape (m, n))
      y :    (array_like Shape (m,))
      w_in : (array_like Shape (n,)) Initial values of parameters of the model
      b_in : (scalar)                Initial value of parameter of the model
      cost_function :                function to compute cost
      gradient_function :            function to compute gradient
      alpha : (float)                Learning rate
      num_iters : (int)              number of iterations to run gradient descent
      lambda_ : (scalar, float)      regularization constant

    Returns:
      w : (array_like Shape (n,)) Updated values of parameters of the model after
          running gradient descent
      b : (scalar)                Updated value of parameter of the model after
          running gradient descent
    """
    for i in range(num_iters):
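As a reference for what happens inside this loop, here is a minimal self-contained sketch. The name gradient_descent_sketch and the assumption that gradient_function returns (dj_db, dj_dw) follow the compute_gradient_reg call shown later in this assignment; this is illustrative, not the graded helper.
```python
import numpy as np

def gradient_descent_sketch(X, y, w_in, b_in, cost_function, gradient_function,
                            alpha, num_iters, lambda_=0.):
    w = np.array(w_in, dtype=float)
    b = b_in
    J_history = []
    for i in range(num_iters):
        # gradient at the current parameters; (dj_db, dj_dw) return order assumed
        dj_db, dj_dw = gradient_function(X, y, w, b, lambda_)
        # simultaneous parameter update with learning rate alpha
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        # track the cost so you can check that J(w,b) is decreasing
        J_history.append(cost_function(X, y, w, b, lambda_))
    return w, b, J_history
```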
Now let's run the gradient descent algorithm above to learn the parameters for our dataset.
Note
The code block below takes a couple of minutes to run, especially with a non-vectorized version.
You can reduce the iterations to test your implementation and iterate faster. If you have
time, try running 100,000 iterations for better results.
np.random.seed(1)
initial_w = 0.01 * (np.random.rand(2).reshape(-1,1) - 0.5)
initial_b = -8
We will use a helper function in the utils.py file to create this plot.
Exercise 4
Please complete the predict function to produce 1 or 0 predictions given a dataset and
learned parameters $w$ and $b$.
• First you need to compute the prediction from the model $f(x^{(i)}) = g(w \cdot x^{(i)} + b)$ for every
example
  – You've implemented this before in the parts above
• We interpret the output of the model $f(x^{(i)})$ as the probability that $y^{(i)} = 1$ given $x^{(i)}$
and parameterized by $w$.
• Therefore, to get a final prediction ($y^{(i)} = 0$ or $y^{(i)} = 1$) from the logistic regression
model, you can use the following heuristic: predict $y^{(i)} = 1$ if $f(x^{(i)}) \geq 0.5$, and $y^{(i)} = 0$ otherwise.
# UNQ_C4
# GRADED FUNCTION: predict

def predict(X, w, b):
    """
    Predict whether the label is 0 or 1 using learned logistic regression
    parameters w and b

    Args:
      X : (ndarray Shape (m, n))
      w : (array_like Shape (n,)) Parameters of the model
      b : (scalar, float)         Parameter of the model

    Returns:
      p : (ndarray (m,1))
          The predictions for X using a threshold at 0.5
    """
    # number of training examples
    m, n = X.shape
    p = np.zeros(m)
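One possible (vectorized) way to finish the function is sketched below. It assumes the sigmoid function from earlier and a weight vector w of shape (n,), and is not presented as the graded solution.
```python
import numpy as np

def predict_sketch(X, w, b):
    # model output f_wb = g(X . w + b) for all m examples at once
    f_wb = sigmoid(np.dot(X, w) + b)
    # apply the 0.5 threshold to turn probabilities into 0/1 predictions
    p = (f_wb >= 0.5).astype(float)
    return p
```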
Once you have completed the function predict, let's run the code below to report the training
accuracy of your classifier by computing the percentage of examples it got correct.
# UNIT TESTS
predict_test(predict)
Output of predict: shape (4,), value [0. 1. 1. 1.]
All tests passed!
Expected output: shape (4,), value [0. 1. 1. 1.]
Now let's use this to compute the accuracy on the training set
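The accuracy cell itself is not reproduced here; a minimal sketch, assuming w and b hold the parameters learned by gradient descent above, is:
```python
# Training accuracy: fraction of examples where the prediction matches the label
p = predict(X_train, w, b)
print('Train Accuracy: %f' % (np.mean(p == y_train) * 100))
```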
3 - Regularized Logistic Regression
3.1 Problem Statement
In this part of the exercise, you will implement regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA), based on the results of two tests.
• From these two tests, you would like to determine whether the microchips should be
accepted or rejected.
• To help you make the decision, you have a dataset of test results on past microchips,
from which you can build a logistic regression model.
• The load_data() function shown below loads the data into variables X_train and
y_train
– X_train contains the test results for the microchips from two tests
– y_train contains the results of the QA
• y_train = 1 if the microchip was accepted
• y_train = 0 if the microchip was rejected
– Both X_train and y_train are numpy arrays.
# load dataset
X_train, y_train = load_data("data/ex2data2.txt")
# print X_train
print("X_train:", X_train[:5])
print("Type of X_train:",type(X_train))
# print y_train
print("y_train:", y_train[:5])
print("Type of y_train:",type(y_train))
# Plot examples
plot_data(X_train, y_train[:], pos_label="Accepted", neg_label="Rejected")
# Set the y-axis label
plt.ylabel('Microchip Test 2')
# Set the x-axis label
plt.xlabel('Microchip Test 1')
plt.legend(loc="upper right")
plt.show()
Figure 3 shows that our dataset cannot be separated into positive and negative examples by a
straight line through the plot. Therefore, a straightforward application of logistic regression
will not perform well on this dataset since logistic regression will only be able to find a linear
decision boundary.
3.3 Feature mapping
One way to fit the data better is to create more features from each data point. The provided
map_feature function maps the two features into all polynomial terms of $x_1$ and $x_2$ up to the
sixth power. As a result of this mapping, our vector of two features (the scores on the two QA
tests) has been transformed into a 27-dimensional vector.
• A logistic regression classifier trained on this higher-dimension feature vector will have a
more complex decision boundary and will be nonlinear when drawn in our 2-dimensional
plot.
• We have provided the map_feature function for you in utils.py.
print("Original shape of data:", X_train.shape)
Let's also print the first elements of X_train and mapped_X to see the transformation.
print("X_train[0]:", X_train[0])
print("mapped X_train[0]:", mapped_X[0])
While the feature mapping allows us to build a more expressive classifier, it is also more
susceptible to overfitting. In the next parts of the exercise, you will implement regularized
logistic regression to fit the data and also see for yourself how regularization can help combat
the overfitting problem.
3.4 Cost function for regularized logistic regression
In this part, you will implement the cost function for regularized logistic regression.
Recall that for regularized logistic regression, the cost function is of the form
$$J(w,b) = \frac{1}{m} \sum_{i=0}^{m-1} \left[ -y^{(i)} \log\left(f_{w,b}(x^{(i)})\right) - \left(1 - y^{(i)}\right) \log\left(1 - f_{w,b}(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2$$

Compare this to the cost function without regularization (which you implemented above), which
is of the form

$$J(w,b) = \frac{1}{m} \sum_{i=0}^{m-1} \left[ -y^{(i)} \log\left(f_{w,b}(x^{(i)})\right) - \left(1 - y^{(i)}\right) \log\left(1 - f_{w,b}(x^{(i)})\right) \right]$$

The difference is the regularization term, which is

$$\frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2$$
Exercise 5
Please complete the compute_cost_reg function below to calculate the following term for
each element in $w$:

$$\frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2$$

The starter code then adds this to the cost without regularization (which you computed above in
compute_cost) to calculate the cost with regularization.
If you get stuck, you can check out the hints presented after the cell below to help you with the
implementation.
# UNQ_C5
def compute_cost_reg(X, y, w, b, lambda_ = 1):
    """
    Computes the cost over all examples
    Args:
      X : (array_like Shape (m,n)) data, m examples by n features
      y : (array_like Shape (m,))  target value
      w : (array_like Shape (n,))  Values of parameters of the model
      b : (scalar)                 Value of bias parameter of the model
      lambda_ : (scalar, float)    Controls the amount of regularization
    Returns:
      total_cost : (scalar) cost
    """
    m, n = X.shape

    return total_cost
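For reference, a minimal sketch of the completed function is shown below. It reuses the compute_cost function from the earlier exercise (consistent with the instructions above) and only adds the regularization term; it is not presented as the graded solution.
```python
import numpy as np

def compute_cost_reg_sketch(X, y, w, b, lambda_=1):
    m, n = X.shape
    # cost without regularization, as implemented earlier in compute_cost
    cost_without_reg = compute_cost(X, y, w, b)
    # regularization term (lambda_/(2m)) * sum_j w_j^2; note that b is not regularized
    reg_cost = (lambda_ / (2 * m)) * np.sum(np.square(w))
    return cost_without_reg + reg_cost
```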
Run the cell below to check your implementation of the compute_cost_reg function.
# UNIT TEST
compute_cost_reg_test(compute_cost_reg)
3.5 Gradient for regularized logistic regression
In this part, you will implement the gradient for regularized logistic regression. The gradient of the regularized cost function with respect to $w_j$ is

$$\frac{\partial J(w,b)}{\partial w_j} = \left( \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right) x_j^{(i)} \right) + \frac{\lambda}{m} w_j \quad \text{for } j = 0 \ldots (n-1)$$

Compare this to the gradient of the cost function without regularization (which you
implemented above), which is of the form

$$\frac{\partial J(w,b)}{\partial b} = \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right)$$

$$\frac{\partial J(w,b)}{\partial w_j} = \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{w,b}(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

As you can see, $\frac{\partial J(w,b)}{\partial b}$ is the same; the difference is the following additional term in $\frac{\partial J(w,b)}{\partial w}$:

$$\frac{\lambda}{m} w_j \quad \text{for } j = 0 \ldots (n-1)$$
Exercise 6
Please complete the compute_gradient_reg function below to calculate the following term

$$\frac{\lambda}{m} w_j \quad \text{for } j = 0 \ldots (n-1)$$

The starter code will add this term to the $\frac{\partial J(w,b)}{\partial w}$ returned from compute_gradient above
to get the gradient for the regularized cost function.
If you get stuck, you can check out the hints presented after the cell below to help you with the
implementation.
# UNQ_C6
def compute_gradient_reg(X, y, w, b, lambda_ = 1):
    """
    Computes the gradient for regularized logistic regression
    Args:
      X : (ndarray Shape (m,n))   variable such as house size
      y : (ndarray Shape (m,))    actual value
      w : (ndarray Shape (n,))    values of parameters of the model
      b : (scalar)                value of parameter of the model
      lambda_ : (scalar, float)   regularization constant
    Returns:
      dj_db : (scalar)             The gradient of the cost w.r.t. the parameter b
      dj_dw : (ndarray Shape (n,)) The gradient of the cost w.r.t. the parameters w
    """
    m, n = X.shape
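For reference, a minimal sketch of the completed function is shown below. It assumes compute_gradient from the earlier exercise returns (dj_db, dj_dw), matching the call in the next cell, and only adds the regularization term to the weight gradient; it is not presented as the graded solution.
```python
import numpy as np

def compute_gradient_reg_sketch(X, y, w, b, lambda_=1):
    m, n = X.shape
    # unregularized gradient from the earlier exercise
    dj_db, dj_dw = compute_gradient(X, y, w, b)
    # add (lambda_/m) * w_j to each weight gradient; the bias gradient is unchanged
    dj_dw = dj_dw + (lambda_ / m) * w
    return dj_db, dj_dw
```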
Run the cell below to check your implementation of the compute_gradient_reg function.
lambda_ = 0.5
dj_db, dj_dw = compute_gradient_reg(X_mapped, y_train, initial_w, initial_b, lambda_)

print(f"dj_db: {dj_db}")
print(f"First few elements of regularized dj_dw:\n {dj_dw[:4].tolist()}")
# UNIT TESTS
compute_gradient_reg_test(compute_gradient_reg)
dj_db: 0.07138288792343662
First few elements of regularized dj_dw:
[-0.010386028450548703, 0.011409852883280122, 0.0536273463274574,
0.003140278267313462]
All tests passed!
• If you have completed the cost and gradient for regularized logistic regression correctly,
you should be able to step through the next cell to learn the parameters w .
• After training the parameters, we will use them to plot the decision boundary.
Note
The code block below takes quite a while to run, especially with a non-vectorized version. You
can reduce the iterations to test your implementation and iterate faster. If you have time, run
for 100,000 iterations to see better results.
• After learning the parameters $w, b$, the next step is to plot a decision boundary
similar to Figure 4.