
Feedback — XII. Support Vector Machines

You submitted this quiz on Tue 7 Apr 2015 11:03 AM CEST. You got a score of
5.00 out of 5.00.

Question 1
Suppose you have trained an SVM classifier with a Gaussian kernel, and it learned the following decision boundary on the training set:

[Figure: a decision boundary that is overfit to the training set]

When you measure the SVM's performance on a cross validation set, it does poorly. Should you try increasing or decreasing C? Increasing or decreasing σ²?

Your answer (1.00): It would be reasonable to try decreasing C. It would also be reasonable to try increasing σ².

Explanation: The figure shows a decision boundary that is overfit to the training set, so we'd like to increase the bias / lower the variance of the SVM. We can do so by either decreasing the parameter C or increasing σ².

Other options:
- It would be reasonable to try increasing C. It would also be reasonable to try decreasing σ².
- It would be reasonable to try decreasing C. It would also be reasonable to try decreasing σ².
- It would be reasonable to try increasing C. It would also be reasonable to try increasing σ².

Total: 1.00 / 1.00
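To make the C / σ² tradeoff concrete, here is a minimal sketch, assuming scikit-learn and a synthetic two-moons dataset (both illustrative choices, not part of the quiz). In scikit-learn's SVC, the RBF kernel is written exp(−γ‖x − x′‖²), so γ = 1/(2σ²) and increasing σ² corresponds to decreasing gamma:

```python
# Illustrative sketch: how decreasing C or increasing sigma^2 regularizes
# an overfit RBF-kernel SVM. Dataset and hyperparameter values are made up.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_train, X_cv, y_train, y_cv = train_test_split(X, y, random_state=0)

# (C, sigma^2): a high-variance setting vs. a more regularized one.
for C, sigma2 in [(100.0, 0.01), (1.0, 1.0)]:
    clf = SVC(kernel="rbf", C=C, gamma=1.0 / (2.0 * sigma2)).fit(X_train, y_train)
    print(f"C={C}, sigma^2={sigma2}: "
          f"train={clf.score(X_train, y_train):.2f}, cv={clf.score(X_cv, y_cv):.2f}")
```

With the first setting the model typically scores much higher on the training set than on the cross validation set; lowering C and raising σ² narrows that gap.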

Question 2
The formula for the Gaussian kernel is given by

$\mathrm{similarity}(x, l^{(1)}) = \exp\left(-\frac{\lVert x - l^{(1)} \rVert^2}{2\sigma^2}\right)$

The figure below shows a plot of f1 = similarity(x, l^(1)) when σ² = 1.

[Figure: a Gaussian bump centered at the landmark l^(1), plotted with σ² = 1]

Which of the following is a plot of f1 when σ² = 0.25?

Your answer (1.00): [Figure: the same Gaussian bump, narrower and centered at the same location]

Explanation: This figure shows a "narrower" Gaussian kernel centered at the same location, which is the effect of decreasing σ².

Total: 1.00 / 1.00
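As a quick check of the formula, here is a minimal sketch of the similarity function (numpy assumed; the landmark and query point are made-up values). Evaluating the same point under σ² = 1 and σ² = 0.25 shows the kernel falling off faster, i.e., the "narrower" bump:

```python
import numpy as np

def gaussian_similarity(x, l, sigma2):
    """exp(-||x - l||^2 / (2 * sigma^2)) -- the Gaussian (RBF) kernel."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(l)) ** 2) / (2.0 * sigma2))

l1 = np.array([3.0, 5.0])   # hypothetical landmark l^(1)
x = np.array([3.5, 5.5])    # a point near the landmark
print(gaussian_similarity(x, l1, sigma2=1.0))    # ~0.78
print(gaussian_similarity(x, l1, sigma2=0.25))   # ~0.37: same center, narrower
```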
Question 3
The SVM solves

$\min_\theta \; C \sum_{i=1}^{m} \left[ y^{(i)}\,\mathrm{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)})\,\mathrm{cost}_0(\theta^T x^{(i)}) \right] + \sum_{j=1}^{n} \theta_j^2$

where the functions cost0(z) and cost1(z) look like this:

[Figure: cost1(z), zero for all z ≥ 1; cost0(z), zero for all z ≤ −1]

The first term in the objective is $C \sum_{i=1}^{m} \left[ y^{(i)}\,\mathrm{cost}_1(\theta^T x^{(i)}) + (1 - y^{(i)})\,\mathrm{cost}_0(\theta^T x^{(i)}) \right]$. This first term will be zero if two of the following four conditions hold true. Which are the two conditions that would guarantee that this term equals zero?

Your answers ([x] = checked):

[x] For every example with y^(i) = 1, we have that θᵀx^(i) ≥ 1. (0.25)
Explanation: For examples with y^(i) = 1, only the cost1(θᵀx^(i)) term is present. As you can see in the graph, this will be zero for all inputs greater than or equal to 1.

[x] For every example with y^(i) = 0, we have that θᵀx^(i) ≤ −1. (0.25)
Explanation: For examples with y^(i) = 0, only the cost0(θᵀx^(i)) term is present. As you can see in the graph, this will be zero for all inputs less than or equal to −1.

[ ] For every example with y^(i) = 1, we have that θᵀx^(i) ≥ 0. (0.25)
Explanation: cost1(θᵀx^(i)) is still non-zero for inputs between 0 and 1, so being greater than or equal to 0 is insufficient.

[ ] For every example with y^(i) = 0, we have that θᵀx^(i) ≤ 0. (0.25)
Explanation: cost0(θᵀx^(i)) is still non-zero for inputs between −1 and 0, so being less than or equal to 0 is insufficient.

Total: 1.00 / 1.00
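The plot of cost0 and cost1 is not reproduced in this excerpt, but the conditions above can be checked with the hinge-style forms these costs are usually drawn as. The sketch below assumes cost1(z) = max(0, 1 − z) and cost0(z) = max(0, 1 + z), which are zero exactly on the regions the explanations describe:

```python
# Assumed hinge-style forms of the two SVM costs (the quiz's exact slopes
# are not shown in this excerpt, but the zero regions match the graph).
def cost1(z):   # used when y = 1; zero for all z >= 1
    return max(0.0, 1.0 - z)

def cost0(z):   # used when y = 0; zero for all z <= -1
    return max(0.0, 1.0 + z)

assert cost1(1.0) == 0.0 and cost1(2.5) == 0.0    # theta^T x >= 1  => zero
assert cost0(-1.0) == 0.0 and cost0(-3.0) == 0.0  # theta^T x <= -1 => zero
assert cost1(0.5) > 0.0    # theta^T x >= 0 is insufficient when y = 1
assert cost0(-0.5) > 0.0   # theta^T x <= 0 is insufficient when y = 0
```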

Question 4
Suppose you have a dataset with n = 10 features and m = 5000 examples. After training your logistic regression classifier with gradient descent, you find that it has underfit the training set and does not achieve the desired performance on the training or cross validation sets. Which of the following might be promising steps to take? Check all that apply.

Your answers ([x] = checked):

[x] Use an SVM with a Gaussian kernel. (0.25)
Explanation: By using a Gaussian kernel, your model will have greater complexity and can avoid underfitting the data.

[ ] Use a different optimization method, since using gradient descent to train logistic regression might result in a local minimum. (0.25)
Explanation: The logistic regression cost function is convex, so gradient descent will always find the global minimum.

[x] Create / add new polynomial features. (0.25)
Explanation: When you add more features, you increase the variance of your model, reducing the chances of underfitting.

[ ] Reduce the number of examples in the training set. (0.25)
Explanation: While you can improve accuracy on the training set by removing examples, doing so results in a worse model that will not generalize as well.

Total: 1.00 / 1.00
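Both promising fixes amount to giving the model more capacity. As a minimal sketch, assuming scikit-learn and a synthetic ring-shaped dataset (both illustrative, not from the quiz), compare a plain logistic regression against the two suggested remedies:

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVC

# A dataset no straight-line boundary fits well, so linear LR underfits.
X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)

models = {
    "linear logistic regression": LogisticRegression(max_iter=1000),
    "LR + polynomial features": make_pipeline(PolynomialFeatures(degree=3),
                                              LogisticRegression(max_iter=1000)),
    "SVM with Gaussian kernel": SVC(kernel="rbf"),
}
for name, model in models.items():
    model.fit(X, y)
    print(f"{name}: training accuracy = {model.score(X, y):.2f}")
```

The first model's training accuracy stays low no matter how long you optimize, while the two higher-capacity models can fit the ring.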

Question 5
Which of the following statements are true? Check all that apply.

Your answers ([x] = checked):

[x] Suppose you have 2D input examples (i.e., x^(i) ∈ ℝ²). The decision boundary of the SVM (with the linear kernel) is a straight line. (0.25)
Explanation: The SVM without any kernel (i.e., the linear kernel) predicts output based only on θᵀx, so it gives a linear / straight-line decision boundary, just as logistic regression does.

[ ] Suppose you are using SVMs to do multi-class classification and would like to use the one-vs-all approach. If you have K different classes, you will train K − 1 different SVMs. (0.25)
Explanation: The one-vs-all method requires that we have a separate classifier for every class, so you will train K different SVMs.

[x] It is important to perform feature normalization before using the Gaussian kernel. (0.25)
Explanation: The similarity measure used by the Gaussian kernel expects that the data lie in approximately the same range.

[ ] If you are training multi-class SVMs with the one-vs-all method, it is not possible to use a kernel. (0.25)
Explanation: Each SVM you train in the one-vs-all method is a standard SVM, so you are free to use a kernel.

Total: 1.00 / 1.00
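To illustrate the one-vs-all point (K classifiers for K classes, each free to use a kernel), here is a minimal sketch assuming scikit-learn and a made-up 4-class blob dataset; the manual loop is illustrative, since libraries usually handle this for you:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=300, centers=4, random_state=0)  # K = 4 classes
classes = np.unique(y)

# One-vs-all: train one kernelized binary SVM per class (K SVMs, not K - 1).
classifiers = {k: SVC(kernel="rbf").fit(X, (y == k).astype(int)) for k in classes}

# Predict with the classifier that is most confident for each example.
scores = np.column_stack([classifiers[k].decision_function(X) for k in classes])
y_pred = classes[np.argmax(scores, axis=1)]
print(f"trained {len(classifiers)} SVMs; training accuracy = {np.mean(y_pred == y):.2f}")
```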
