Revision: High Variance
High variance
• Indicated by a large gap between the training set error and the test set error.
• The algorithm has overfit the training set (see the sketch below).
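A minimal sketch of this diagnosis, assuming Python with numpy and scikit-learn (not part of the course notes, which use Octave) and synthetic data:

import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.linear_model import Ridge

# Synthetic regression data (placeholder for a real training set).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X @ rng.normal(size=20) + rng.normal(scale=0.5, size=200)

# Train on growing subsets; score each subset on held-out CV folds.
sizes, train_scores, val_scores = learning_curve(
    Ridge(alpha=1.0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5,
    scoring="neg_mean_squared_error")

train_err = -train_scores.mean(axis=1)
val_err = -val_scores.mean(axis=1)
for m, tr, va in zip(sizes, train_err, val_err):
    print(f"m={m:3d}  train error={tr:.3f}  CV error={va:.3f}")
# High variance: train error stays well below CV error (a persistent gap).
# High bias: both errors are high and close together.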
Week 6: Advice for Applying Machine Learning
Question 1.
You train a learning algorithm, and find that it has unacceptably high error on the test set. You plot the learning curve, and obtain the figure below. Is the algorithm suffering from high bias, high variance, or neither?
[Learning-curve figure not reproduced.]
(i) Neither
Question 2.
Suppose you have implemented regularized logistic regression to classify what object is
in an image (i.e., to do object recognition). However, when you test your hypothesis on a
new set of images, you find that it makes unacceptably large errors with its predictions
on the new images. However, your hypothesis performs well (has low error) on the
training set. Which of the following are promising steps to take? Check all that apply.
(ii) WRONG Try evaluating the hypothesis on a cross validation set rather than the
test set.
Question 3.
Suppose you have implemented regularized logistic regression to predict what items cus-
tomers will purchase on a web shopping site. However, when you test your hypothesis
on a new set of customers, you find that it makes unacceptably large errors in its predic-
tions. Furthermore, the hypothesis performs poorly on the training set. Which of the
following might be promising steps to take? Check all that apply.
(iii) WRONG Try evaluating the hypothesis on a cross validation set rather than the
test set.
Question 4.
Which of the following statements are true? Check all that apply.
(i) WRONG Suppose you are training a regularized linear regression model. The recommended way to choose what value of regularization parameter λ to use is to choose the value of λ which gives the lowest test set error.
(ii) CORRECT Suppose you are training a regularized linear regression model. The recommended way to choose what value of regularization parameter λ to use is to choose the value of λ which gives the lowest cross validation error (see the sketch after this list).
(iii) CORRECT The performance of a learning algorithm on the training set will typically be better than its performance on the test set.
(iv) WRONG Suppose you are training a regularized linear regression model. The recommended way to choose what value of regularization parameter λ to use is to choose the value of λ which gives the lowest training set error.
(v) CORRECT A typical split of a dataset into training, validation and test sets might be 60% training set, 20% validation set, and 20% test set.
(vi) WRONG It is okay to use data from the test set to choose the regularization parameter λ, but not the model parameters (θ).
(vii) WRONG Suppose you are training a logistic regression classifier using polynomial features and want to select what degree polynomial (denoted d in the lecture videos) to use. After training the classifier on the entire training set, you decide to use a subset of the training examples as a validation set. This will work just as well as having a validation set that is separate (disjoint) from the training set.
(viii) CORRECT Suppose you are using linear regression to predict housing prices, and your dataset comes sorted in order of increasing sizes of houses. It is then important to randomly shuffle the dataset before splitting it into training, validation and test sets, so that we don’t have all the smallest houses going into the training set, and all the largest houses going into the test set.
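A minimal sketch of the workflow these answers describe (shuffle, split 60/20/20, pick λ by cross-validation error, touch the test set only once), assuming Python/numpy and made-up data; ridge_fit and mse are hypothetical helpers, not course code:

import numpy as np

# Made-up regression data standing in for the model in the question.
rng = np.random.default_rng(1)
m, n = 100, 8
X = rng.normal(size=(m, n))
y = X @ rng.normal(size=n) + rng.normal(scale=0.3, size=m)

# Shuffle first: the data may arrive sorted (e.g. by house size).
perm = rng.permutation(m)
X, y = X[perm], y[perm]

# Typical 60% / 20% / 20% train / validation / test split.
i60, i80 = int(0.6 * m), int(0.8 * m)
Xtr, ytr = X[:i60], y[:i60]
Xcv, ycv = X[i60:i80], y[i60:i80]
Xte, yte = X[i80:], y[i80:]

def ridge_fit(X, y, lam):
    # Regularized normal equation: theta = (X'X + lam*I)^-1 X'y.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def mse(X, y, theta):
    return np.mean((X @ theta - y) ** 2)

# Choose lambda by *cross-validation* error, never training or test error.
lambdas = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0]
best_lam = min(lambdas, key=lambda lam: mse(Xcv, ycv, ridge_fit(Xtr, ytr, lam)))
theta = ridge_fit(Xtr, ytr, best_lam)
print("chosen lambda:", best_lam, "  test MSE:", mse(Xte, yte, theta))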
Question 5.
Which of the following statements are true? Check all that apply.
(i) CORRECT A model with more parameters is more prone to overfitting and typi-
cally has higher variance.
(ii) WRONG If the training and test errors are about the same, adding more features
will not help improve the results.
(iii) CORRECT If a learning algorithm is suffering from high variance, adding more
training examples is likely to improve the test error.
(iv) CORRECT If a learning algorithm is suffering from high bias, adding more training examples alone is unlikely to improve the test error significantly.
Week 6: Machine Learning System Design
Question 1.
You are working on a spam classification system using regularized logistic regression.
“Spam” is a positive class (y = 1) and “not spam” is the negative class (y = 0). You
have trained your classifier and there are m = 1000 examples in the cross-validation set.
The chart of predicted class vs. actual class is not reproduced here; only its headers (actual T/F vs. predicted T/F) survived extraction.
Answers: recall = 0.85, accuracy = 0.095.
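The chart’s cell counts did not survive extraction; the counts below are assumed values chosen to be consistent with the stated recall of 0.85 and accuracy of 0.095 on m = 1000 examples. A minimal Python sketch of the metric definitions:

# Assumed confusion-matrix counts (see note above); m = 1000.
tp, fn, fp, tn = 85, 15, 890, 10
accuracy  = (tp + tn) / (tp + fn + fp + tn)   # 0.095, as stated above
precision = tp / (tp + fp)
recall    = tp / (tp + fn)                    # 0.85, as stated above
f1        = 2 * precision * recall / (precision + recall)
print(f"accuracy={accuracy:.3f}  precision={precision:.3f}  recall={recall:.2f}  F1={f1:.3f}")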
Question 2.
Suppose a massive dataset is available for training a learning algorithm. Training on a lot of data is likely to give good performance when two of the following conditions hold true. Which are the two?
(i) WRONG When we are willing to include high order polynomial features of x (such as x₁², x₂², x₁x₂, etc.).
(ii) CORRECT The features x contain sufficient information to predict y accurately. (For example, one way to verify this is if a human expert on the domain can confidently predict y when given only x.)
(iii) CORRECT We train a learning algorithm with a large number of parameters (that is able to learn/represent fairly complex functions).
(iv) WRONG We train a learning algorithm with a small number of parameters (that is thus unlikely to overfit).
Question 3.
Suppose you have trained a logistic regression classifier which is outputting hθ(x). Currently, you predict 1 if hθ(x) ≥ threshold, and predict 0 if hθ(x) < threshold, where currently the threshold is set to 0.5. Suppose you decrease the threshold to 0.1. Which of the following are true? Check all that apply. (A sketch of the effect follows this list.)
(i) WRONG The classifier is likely to have unchanged precision and recall, but lower accuracy.
(ii) WRONG The classifier is likely to have unchanged precision and recall, but higher accuracy.
(iii) WRONG The classifier is likely to have unchanged precision and recall, and thus the same F1 score.
(iv) CORRECT The classifier is likely to now have lower precision.
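A minimal sketch of the threshold effect, assuming Python/numpy with synthetic labels and scores; precision_recall is a hypothetical helper:

import numpy as np

def precision_recall(y_true, scores, threshold):
    # Predict 1 whenever the score is >= threshold.
    pred = (scores >= threshold).astype(int)
    tp = np.sum((pred == 1) & (y_true == 1))
    fp = np.sum((pred == 1) & (y_true == 0))
    fn = np.sum((pred == 0) & (y_true == 1))
    return tp / max(tp + fp, 1), tp / max(tp + fn, 1)

# Toy labels and noisy-but-informative scores in [0, 1].
rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, size=1000)
scores = np.clip(0.7 * y_true + rng.normal(0.15, 0.2, size=1000), 0.0, 1.0)

for t in (0.5, 0.1):
    p, r = precision_recall(y_true, scores, t)
    print(f"threshold={t}: precision={p:.2f}  recall={r:.2f}")
# Lowering the threshold predicts y = 1 more often: recall rises
# (or stays the same) while precision typically falls.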
Question 4.
Suppose you are working on a spam classifier, where spam emails are positive examples (y=1) and non-spam emails are negative examples (y=0). You have a training set of emails in which 99% of the emails are non-spam and the other 1% is spam. Which of the following statements are true? Check all that apply.
• WRONG If you always predict spam (output y=1), your classifier will have a recall
of 0% and precision of 99%.
• CORRECT If you always predict non-spam (output y=0), your classifier will have a recall of 0%.
• CORRECT If you always predict spam (output y=1), your classifier will have a
recall of 100% and precision of 1%.
• CORRECT If you always predict non-spam (output y=0), your classifier will have an accuracy of 99% (verified in the sketch after this list).
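A quick numeric check of these answers, assuming Python/numpy and a synthetic label vector with 1% positives:

import numpy as np

y_true = np.array([1] * 10 + [0] * 990)   # 1% spam, 99% non-spam
pred = np.zeros_like(y_true)              # always predict non-spam (y = 0)
accuracy = np.mean(pred == y_true)        # 0.99: accuracy misleads on skewed classes
recall = np.sum((pred == 1) & (y_true == 1)) / y_true.sum()  # 0.0: no spam caught
print(f"always predict non-spam: accuracy={accuracy:.2f}, recall={recall:.0%}")
# Always predicting spam (y = 1) instead gives recall = 100% but precision = 1%.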
Question 5.
Which of the following statements are true? Check all that apply.
(i) CORRECT Using a very large training set makes it unlikely for the model to overfit the training data.
(ii) WRONG If your model is underfitting the training set, then obtaining more data
is likely to help.
(iii) CORRECT The “error analysis” process of manually examining the examples which your algorithm got wrong can help suggest what are good steps to take (e.g., developing new features) to improve your algorithm’s performance.
(iv) WRONG It is a good idea to spend a lot of time collecting a large amount of data
before building your first version of a learning algorithm.
(v) WRONG After training a logistic regression classifier, you must use 0.5 as your
threshold for predicting whether an example is positive or negative.
Note: A good classifier should have both a high precision and a high recall on the cross validation set.