ML Week 3 Logistic Regression
To review the material and deepen your understanding of the course content, please
answer the review questions below, and hit submit at the bottom of the page when you're
done.
You are allowed to take/re-take these review quizzes multiple times, and each time you
will see a slightly different set of questions or answers. We will use only your highest
score, and strongly encourage you to continue re-taking each quiz until you get a 100%
score at least once. (Even after that, you can re-take it to review the content further, with
no risk of your final score being reduced.) To prevent rapid-fire guessing, the system
enforces a minimum of 10 minutes between each attempt.
Question 1
Suppose that you have trained a logistic regression classifier, and it outputs on a new
example x a prediction h_θ(x) = 0.2. This means (check all that apply):
Our estimate for P(y=0|x;θ) is 0.8.
Our estimate for P(y=1|x;θ) is 0.8.
Our estimate for P(y=1|x;θ) is 0.2.
Our estimate for P(y=0|x;θ) is 0.2.
Your answer / Score / Choice explanation
Score 0.25. Since we must have P(y=0|x;θ) = 1 - P(y=1|x;θ), this is 1 - 0.2 = 0.8.
Score 0.25. h_θ(x) gives P(y=1|x;θ), not 1 - P(y=1|x;θ).
Score 0.25. Our estimate for P(y=1|x;θ) is 0.2.
Score 0.25.
Total: 1.00 / 1.00
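As a quick sanity check on this interpretation, here is a minimal Python sketch (not part of the quiz; the parameter vector and example are invented for illustration) that evaluates the sigmoid hypothesis and prints both class probability estimates.

import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)), always strictly between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    # Logistic regression hypothesis: our estimate of P(y=1 | x; theta)
    return sigmoid(theta @ x)

# Invented parameters and example (x includes the intercept term x_0 = 1)
theta = np.array([-1.5, 0.3, -0.2])
x = np.array([1.0, 2.0, 4.0])

p_y1 = h(theta, x)     # P(y=1 | x; theta)
p_y0 = 1.0 - p_y1      # P(y=0 | x; theta) = 1 - P(y=1 | x; theta)
print(f"P(y=1|x;theta) = {p_y1:.3f}, P(y=0|x;theta) = {p_y0:.3f}")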
Question 2
Suppose you train a logistic classifier h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2).
Suppose θ_0 = 6, θ_1 = 0, θ_2 = -1. Which of the following figures represents the decision
boundary found by your classifier?
Your answer / Score / Choice explanation
Score 0.00. In this figure, we transition from negative to positive when x_1 goes from below 6
to above 6, but for the given values of θ, the transition occurs when x_2 goes from below 6 to
above 6.
Total: 0.00 / 1.00
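With θ_1 = 0, the hypothesis depends only on x_2, so the decision boundary is the horizontal line x_2 = 6. The following Python sketch (illustrative only; it uses the parameter values reconstructed above, θ_0 = 6, θ_1 = 0, θ_2 = -1) shows that the prediction flips as x_2 crosses 6, no matter what x_1 is.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Parameter values from the question (as reconstructed above)
theta = np.array([6.0, 0.0, -1.0])

for x1, x2 in [(0.0, 5.0), (100.0, 5.0), (0.0, 7.0), (100.0, 7.0)]:
    x = np.array([1.0, x1, x2])   # include the intercept term
    p = sigmoid(theta @ x)        # P(y=1 | x; theta)
    print(f"x1={x1:6.1f}  x2={x2:4.1f}  P(y=1)={p:.3f}  predict y={int(p >= 0.5)}")

# The prediction is y=1 whenever 6 - x_2 >= 0 (i.e. x_2 <= 6), regardless of x_1.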
Question 3
Suppose you have the following training set, and fit a logistic regression
classifier h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2).
[Training-set table of (x_1, x_2, y) examples not reproduced in this export.]
Which of the following are true? Check all that apply.
At the optimal value of θ, we will have J(θ) ≥ 0.
J(θ) will be a convex function, so gradient descent should converge to the global
minimum.
Your answer / Score / Choice explanation
A more complex hypothesis, e.g. using h_θ(x) = g(θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_1^2 + θ_4 x_1 x_2 + ...),
could fit the training data better.
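The explanation above refers to a more complex hypothesis that adds polynomial features of x_1 and x_2. As a small aside, here is an illustrative Python sketch of that feature mapping (the example values are invented); because the new features are fixed transformations of the inputs, J(θ) remains convex in θ.

import numpy as np

def quadratic_features(X):
    # Map each row [x1, x2] to [1, x1, x2, x1^2, x1*x2, x2^2]
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x1 * x2, x2**2])

# Invented (x1, x2) examples
X = np.array([[0.5, 1.5],
              [1.0, 0.5],
              [2.0, 2.0]])
print(quadratic_features(X))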
Question 4
For logistic regression, the gradient of the cost function is given by
∂J(θ)/∂θ_j = (1/m) Σ_{i=1}^m (h_θ(x^(i)) - y^(i)) x_j^(i). Which of these is a correct gradient descent update for logistic regression
with a learning rate of α? Check all that apply.
θ_j := θ_j - α (1/m) Σ_{i=1}^m (θ^T x - y^(i)) x_j^(i) (simultaneously update for all j).
Score / Choice explanation
This uses the linear regression hypothesis θ^T x instead of that for logistic regression.
θ := θ - α (1/m) Σ_{i=1}^m (h_θ(x^(i)) - y^(i)) x^(i).
This is a correct gradient descent update.
θ_j := θ_j - α (1/m) Σ_{i=1}^m (1/(1 + e^(-θ^T x^(i))) - y^(i)) x_j^(i) (simultaneously update for all j).
This is also correct, since 1/(1 + e^(-θ^T x^(i))) = h_θ(x^(i)).
θ := θ - α (1/m) Σ_{i=1}^m (θ^T x - y^(i)) x^(i).
This vectorized version uses the linear regression hypothesis θ^T x instead of that for
logistic regression.
Total: 1.00 / 1.00
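For concreteness, the vectorized update θ := θ - α (1/m) Σ_i (h_θ(x^(i)) - y^(i)) x^(i) can be sketched in Python as below. This is only an illustrative implementation: it assumes a design matrix X whose rows are the examples (with an intercept column), a 0/1 label vector y, and made-up values for α and the number of iterations.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    # X: (m, n) design matrix with an intercept column; y: (m,) labels in {0, 1}
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        h = sigmoid(X @ theta)            # h_theta(x^(i)) for every example i
        grad = X.T @ (h - y) / m          # (1/m) * sum_i (h_i - y_i) * x^(i)
        theta = theta - alpha * grad      # simultaneous update of all theta_j
    return theta

# Tiny invented dataset: y tends to be 1 when x_1 + x_2 is large
X = np.array([[1.0, 0.5, 1.0],
              [1.0, 1.0, 2.5],
              [1.0, 3.0, 3.5],
              [1.0, 4.0, 5.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(gradient_descent(X, y))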
Question 5
Which of the following statements are true? Check all that apply.
The cost function J(θ) for logistic regression trained with m ≥ 1 examples is
always greater than or equal to zero.
The sigmoid function g(z) = 1 / (1 + e^(-z)) is never greater than one (> 1).
For logistic regression, sometimes gradient descent will converge to a local
minimum (and fail to find the global minimum). This is the reason we prefer more
advanced optimization algorithms such as fminunc (conjugate gradient, BFGS, L-BFGS, etc.).
Linear regression always works well for classification if you classify by using a
threshold on the prediction made by linear regression.
Your answer / Score / Choice explanation
The cost for any example x^(i) is always ≥ 0, since it is the negative log of
a quantity less than one. The cost function J(θ) is a summation over the cost for
each example, so it is greater than or equal to zero as well.
Since e^(-z) > 0, the denominator 1 + e^(-z) is greater than 1, so g(z) is
always in (0, 1).
The cost function for logistic regression is convex, so gradient descent will
always converge to the global minimum. We still might use a more advanced
optimization algorithm, since it can be faster and does not require manually picking a learning rate.
Total: 0.50 / 1.00
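To make the first explanation concrete, here is a minimal Python sketch (data and parameters are invented) of the unregularized logistic regression cost J(θ) = -(1/m) Σ_i [y^(i) log h_θ(x^(i)) + (1 - y^(i)) log(1 - h_θ(x^(i)))]. Because h_θ(x^(i)) is always in (0, 1), every term is the negative log of a number less than one, so the value printed is non-negative for any θ.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum_i [ y_i * log(h_i) + (1 - y_i) * log(1 - h_i) ]
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

# Invented data (with an intercept column) and an arbitrary theta
X = np.array([[1.0, 0.5, 1.5],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0]])
y = np.array([0.0, 0.0, 1.0])
theta = np.array([-1.0, 0.2, 0.3])
print(cost(theta, X, y))   # always >= 0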