
Revision

High variance
• Indicated by a large gap between training error and test error.

• The algorithm has overfit the training set.

• Increasing the regularization parameter λ reduces overfitting.

The recommended way to choose the value of the regularization parameter λ is to pick the value that gives the lowest cross-validation error. You should not use the training set error for this purpose.
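
A minimal sketch of this selection procedure, assuming pre-split training and cross-validation arrays (the data below is random stand-in data and train_ridge is a hypothetical helper, not course code):

import numpy as np

def train_ridge(X, y, lam):
    # Regularized linear regression in closed form:
    # solve (X'X + lam*I) theta = X'y. No separate bias term, for brevity.
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

def mse(X, y, theta):
    # Mean squared error, halved to match the usual cost convention.
    return np.mean((X @ theta - y) ** 2) / 2

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(60, 5)), rng.normal(size=60)  # stand-in data
X_cv, y_cv = rng.normal(size=(20, 5)), rng.normal(size=20)

lambdas = [0, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]
cv_errors = []
for lam in lambdas:
    theta = train_ridge(X_train, y_train, lam)
    cv_errors.append(mse(X_cv, y_cv, theta))  # evaluate on the CV set, not the training set

best_lam = lambdas[int(np.argmin(cv_errors))]  # lowest cross-validation error wins
print(best_lam)

Training error would always favor the smallest λ (least regularization), which is exactly why it must not be used for this choice.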

Week 6: Advice for Applying Machine Learning
Question 1.
You train a learning algorithm and find that it has unacceptably high error on the test set. You plot the learning curve and obtain a figure (not reproduced here) in which the training and cross-validation errors converge to a similarly high value. Is the algorithm suffering from high bias, high variance, or neither? (A sketch of such a learning curve follows the options.)

(i) Neither

(ii) High variance

(iii) High bias [CORRECT]
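
For reference, a learning curve of this kind can be produced by training on growing subsets of the data and tracking both errors; the following is a minimal illustration with stand-in data and plain least squares, not the course's exact procedure. High bias shows up as both errors plateauing high and close together (as in this question); high variance shows up as a persistent gap between them.

import numpy as np

def fit_lsq(X, y):
    # Ordinary least-squares fit.
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

def mse(X, y, theta):
    return np.mean((X @ theta - y) ** 2) / 2

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 3)), rng.normal(size=100)  # stand-in data
X_train, y_train, X_cv, y_cv = X[:80], y[:80], X[80:], y[80:]

for m in range(5, 81, 5):
    theta = fit_lsq(X_train[:m], y_train[:m])  # train on the first m examples only
    print(m, mse(X_train[:m], y_train[:m], theta), mse(X_cv, y_cv, theta))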

Question 2.
Suppose you have implemented regularized logistic regression to classify what object is
in an image (i.e., to do object recognition). However, when you test your hypothesis on a
new set of images, you find that it makes unacceptably large errors with its predictions
on the new images. However, your hypothesis performs well (has low error) on the
training set. Which of the following are promising steps to take? Check all that apply.

(i) SELECTED Try increasing the regularization parameter λ.

(ii) WRONG Try evaluating the hypothesis on a cross validation set rather than the
test set.

(iii) CORRECT Try using a smaller set of features.

(iv) WRONG Try decreasing the regularization parameter λ.

(v) SELECTED Get more training examples.

Question 3.
Suppose you have implemented regularized logistic regression to predict what items cus-
tomers will purchase on a web shopping site. However, when you test your hypothesis
on a new set of customers, you find that it makes unacceptably large errors in its predic-
tions. Furthermore, the hypothesis performs poorly on the training set. Which of the
following might be promising steps to take? Check all that apply.

(i) SELECTED Try decreasing the regularization parameter λ.

(ii) WRONG Use fewer training examples.

(iii) WRONG Try evaluating the hypothesis on a cross validation set rather than the
test set.

(iv) CORRECT Try adding polynomial features.

Question 4.
Which of the following statements are true? Check all that apply.

(i) WRONG Suppose you are training a regularized linear regression model. The
recommended way to choose what value of regularization parameter λ to use is to
choose the value of λ which gives the lowest test set error.

(ii) CORRECT Suppose you are training a regularized linear regression model. The
recommended way to choose what value of regularization parameter λ to use is to
choose the value of λ which gives the lowest cross validation error.

(iii) CORRECT The performance of a learning algorithm on the training set will typ-
ically be better than its performance on the test set.

(iv) WRONG Suppose you are training a regularized linear regression model. The recommended way to choose what value of regularization parameter λ to use is to choose the value of λ which gives the lowest training set error.

(v) CORRECT A typical split of a dataset into training, validation and test sets might be 60% training set, 20% validation set, and 20% test set.

(vi) WRONG It is okay to use data from the test set to choose the regularization parameter λ, but not the model parameters (θ).

(vii) WRONG Suppose you are training a logistic regression classifier using polynomial features and want to select what degree polynomial (denoted d in the lecture videos) to use. After training the classifier on the entire training set, you decide to use a subset of the training examples as a validation set. This will work just as well as having a validation set that is separate (disjoint) from the training set.

(viii) CORRECT Suppose you are using linear regression to predict housing prices, and your dataset comes sorted in order of increasing sizes of houses. It is then important to randomly shuffle the dataset before splitting it into training, validation and test sets, so that we don't have all the smallest houses going into the training set and all the largest houses going into the test set (see the sketch after this list).
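
A minimal sketch of that shuffle-then-split step, assuming a dataset that arrives sorted (the arrays here are stand-ins):

import numpy as np

rng = np.random.default_rng(42)
X = np.arange(100.0).reshape(-1, 1)          # "house sizes", sorted ascending
y = 3.0 * X.ravel() + rng.normal(size=100)   # "prices"

perm = rng.permutation(len(y))               # shuffle before splitting
X, y = X[perm], y[perm]

n = len(y)
n_train, n_val = int(0.6 * n), int(0.2 * n)
X_train, y_train = X[:n_train], y[:n_train]                            # 60% training
X_val, y_val = X[n_train:n_train + n_val], y[n_train:n_train + n_val]  # 20% validation
X_test, y_test = X[n_train + n_val:], y[n_train + n_val:]              # 20% test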

Question 5.
Which of the following statements are true? Check all that apply.

(i) CORRECT A model with more parameters is more prone to overfitting and typi-
cally has higher variance.

(ii) WRONG If the training and test errors are about the same, adding more features
will not help improve the results.

(iii) CORRECT If a learning algorithm is suffering from high variance, adding more
training examples is likely to improve the test error.

(iv) CORRECT If a learning algorithm is suffering from high bias, only adding more
training examples may not improve the test error significantly.

Week 6: Machine Learning System Design
Question 1
You are working on a spam classification system using regularized logistic regression.
“Spam” is a positive class (y = 1) and “not spam” is the negative class (y = 0). You
have trained your classifier and there are m = 1000 examples in the cross-validation set.
The chart of predicted class vs. actual class is:

                     Actual Class: 1   Actual Class: 0
Predicted Class: 1         85               890
Predicted Class: 0         15                10

For reference:

• Accuracy = (true positives + true negatives) / (total examples)

• Precision = (true positives) / (true positives + false positives)

• Recall = (true positives) / (true positives + false negatives)

• F1 score = (2 × precision × recall) / (precision + recall)

What is the classifier's precision (as a value from 0 to 1)?

Enter your answer in the box below. If necessary, provide at least two values after the decimal point.

Precision = 85 / (85 + 890) = 85 / 975 ≈ 0.09

Likewise, recall = 85 / (85 + 15) = 0.85 and accuracy = (85 + 10) / 1000 = 0.095.
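
For completeness, here are the four reference formulas applied directly to the confusion matrix in this question (tp = 85, fp = 890, fn = 15, tn = 10):

tp, fp, fn, tn = 85, 890, 15, 10

accuracy  = (tp + tn) / (tp + fp + fn + tn)  # (85 + 10) / 1000 = 0.095
precision = tp / (tp + fp)                   # 85 / 975 ≈ 0.0872, i.e. 0.09
recall    = tp / (tp + fn)                   # 85 / 100 = 0.85
f1        = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)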

Question 2.
Suppose a massive dataset is available for training a learning algorithm. Training on a lot of data is likely to give good performance when two of the following conditions hold true. Which are the two?

(i) WRONG When we are willing to include high order polynomial features of x (such as x₁², x₂², x₁x₂, etc.).

(ii) CORRECT The features x contain sufficient information to predict y accurately. (For example, one way to verify this is if a human expert on the domain can confidently predict y when given only x.)

(iii) CORRECT We train a learning algorithm with a large number of parameters (that is able to learn/represent fairly complex functions).

(iv) WRONG We train a learning algorithm with a small number of parameters (that is thus unlikely to overfit).

Question 3.
Suppose you have trained a logistic regression classifier which outputs hθ(x). Currently, you predict 1 if hθ(x) ≥ threshold, and predict 0 if hθ(x) < threshold, where the threshold is currently set to 0.5. Suppose you decrease the threshold to 0.1. Which of the following are true? Check all that apply. (A numeric check follows the options.)

(i) The classifier is likely to have unchanged precision and recall, but lower accuracy.

(ii) The classifier is likely to now have higher precision.

(iii) SELECTED The classifier is likely to now have higher recall.

(iv) The classifier is likely to have unchanged precision and recall, but higher accuracy.

(v) The classifier is likely to have unchanged precision and recall, and thus the same F1 score.

(vi) SELECTED The classifier is likely to now have lower precision.

(vii) The classifier is likely to now have lower recall.
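
A small numeric check of the trade-off, with made-up scores and labels (not from the course): lowering the threshold makes the classifier predict 1 more often, which raises recall and tends to lower precision.

import numpy as np

scores = np.array([0.05, 0.15, 0.4, 0.6, 0.8, 0.9, 0.2, 0.3])  # stand-in h(x) outputs
labels = np.array([0,    0,    1,   0,   1,   1,   1,   0])

def prec_rec(threshold):
    pred = (scores >= threshold).astype(int)
    tp = np.sum((pred == 1) & (labels == 1))
    fp = np.sum((pred == 1) & (labels == 0))
    fn = np.sum((pred == 0) & (labels == 1))
    return tp / max(tp + fp, 1), tp / max(tp + fn, 1)

print(prec_rec(0.5))  # threshold 0.5: precision 0.67, recall 0.50
print(prec_rec(0.1))  # threshold 0.1: precision 0.57, recall 1.00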

Question 4.
Suppose you are working on a spam classifier, where spam emails are positive examples (y = 1) and non-spam emails are negative examples (y = 0). You have a training set of emails in which 99% of the emails are non-spam and the other 1% is spam. Which of the following statements are true? Check all that apply. (A numeric check of the constant predictors follows the options.)

• WRONG If you always predict spam (output y = 1), your classifier will have a recall of 0% and precision of 99%.

• CORRECT If you always predict non-spam (output y = 0), your classifier will have a recall of 0%.

• CORRECT If you always predict spam (output y = 1), your classifier will have a recall of 100% and precision of 1%.

• CORRECT If you always predict non-spam (output y = 0), your classifier will have an accuracy of 99%.

• CORRECT A good classifier should have both a high precision and a high recall on the cross-validation set.
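
As referenced above, a quick numeric check of these constant predictors on a 1%-spam set (the labels are synthetic, chosen only to match the 99/1 split):

import numpy as np

y = np.array([1] * 10 + [0] * 990)  # 1% spam, 99% non-spam

always_neg = np.zeros_like(y)                # always predict non-spam (y = 0)
accuracy_neg = np.mean(always_neg == y)      # 0.99 accuracy
recall_neg = 0 / np.sum(y == 1)              # no true positives, so recall = 0

always_pos = np.ones_like(y)                 # always predict spam (y = 1)
recall_pos = np.sum((always_pos == 1) & (y == 1)) / np.sum(y == 1)  # 1.0 (100% recall)
precision_pos = np.sum(y == 1) / len(y)      # 0.01 (1% precision)

print(accuracy_neg, recall_neg, recall_pos, precision_pos)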


Question 5.
Which of the following statements are true? Check all that apply.

(i) CORRECT Using a very large training set makes it unlikely for the model to overfit the training data.

(ii) WRONG If your model is underfitting the training set, then obtaining more data
is likely to help.

(iii) CORRECT The "error analysis" process of manually examining the examples which your algorithm got wrong can help suggest what are good steps to take (e.g., developing new features) to improve your algorithm's performance.

(iv) WRONG It is a good idea to spend a lot of time collecting a large amount of data
before building your first version of a learning algorithm.

(v) WRONG After training a logistic regression classifier, you must use 0.5 as your
threshold for predicting whether an example is positive or negative.

