Stanford KNN assignment
drive.mount('/content/drive', force_remount=True)
# enter the foldername in your Drive where you have saved the unzipped
# 'cs231n' folder containing the '.py', 'classifiers' and 'datasets'
# folders.
# e.g. 'cs231n/assignments/assignment1/cs231n/'
FOLDERNAME = 'cs231n/assignments/assignment1/cs231n/'
Mounted at /content/drive
/content/drive/My Drive
/content
/content/cs231n/datasets
--2020-04-23 04:41:28-- http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
Resolving www.cs.toronto.edu (www.cs.toronto.edu)... 128.100.3.30
Connecting to www.cs.toronto.edu (www.cs.toronto.edu)|128.100.3.30|:80...
connected.
HTTP request sent, awaiting response... 200 OK
Length: 170498071 (163M) [application/x-gzip]
Saving to: cifar-10-python.tar.gz
cifar-10-batches-py/
cifar-10-batches-py/data_batch_4
cifar-10-batches-py/readme.html
cifar-10-batches-py/test_batch
cifar-10-batches-py/data_batch_3
cifar-10-batches-py/batches.meta
cifar-10-batches-py/data_batch_2
cifar-10-batches-py/data_batch_5
cifar-10-batches-py/data_batch_1
/content
import random
import numpy as np
from cs231n.data_utils import load_CIFAR10
import matplotlib.pyplot as plt
# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
[22]: # Load the raw CIFAR-10 data.
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
# Cleaning up variables to prevent loading data multiple times (which may
# cause a memory issue)
try:
    del X_train, y_train
    del X_test, y_test
    print('Clear previously loaded data.')
except:
    pass

X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
# As a sanity check, we print out the size of the training and test data.
print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()
[24]: # Subsample the data for more efficient code execution in this exercise
num_training = 5000
mask = list(range(num_training))
X_train = X_train[mask]
y_train = y_train[mask]
num_test = 500
mask = list(range(num_test))
X_test = X_test[mask]
y_test = y_test[mask]

# Reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
print(X_train.shape, X_test.shape)
[0]: from cs231n.classifiers import KNearestNeighbor
We would now like to classify the test data with the kNN classifier. Recall that we can break down this process into two steps:
1. First we must compute the distances between all test examples and all train examples.
2. Given these distances, for each test example we find the k nearest examples and have them vote for the label.
Let's begin with computing the distance matrix between all training and test examples. For example, if there are Ntr training examples and Nte test examples, this stage should result in an Nte x Ntr matrix where each element (i, j) is the distance between the i-th test and j-th train example.
Note: For the three distance computations that we require you to implement in this notebook, you may not use the np.linalg.norm() function that numpy provides.
First, open cs231n/classifiers/k_nearest_neighbor.py and implement the function compute_distances_two_loops that uses a (very inefficient) double loop over all pairs of (test, train) examples and computes the distance matrix one element at a time.
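For orientation, a minimal sketch of what the double-loop version computes (in the assignment this is a method on the classifier that reads self.X_train; here it is written as a standalone function):

import numpy as np

def compute_distances_two_loops(X_test, X_train):
    # Pairwise Euclidean (L2) distances, one element of dists at a time.
    num_test, num_train = X_test.shape[0], X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        for j in range(num_train):
            dists[i, j] = np.sqrt(np.sum((X_test[i] - X_train[j]) ** 2))
    return dists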
[27]: # Open cs231n/classifiers/k_nearest_neighbor.py and implement
# compute_distances_two_loops, then test your implementation:
dists = classifier.compute_distances_two_loops(X_test)
print(dists.shape)
(500, 5000)
[28]: # We can visualize the distance matrix: each row is a single test example and
# its distances to training examples
plt.imshow(dists, interpolation='none')
plt.show()
Inline Question 1
Notice the structured patterns in the distance matrix, where some rows or columns are visibly brighter. (Note that with the default color scheme black indicates low distances while white indicates high distances.)
• What in the data is the cause behind the distinctly bright rows?
• What causes the columns?
Your Answer: Since we are working with pixel intensity values, distinctly bright rows indicate a significant delta in pixel intensities between the corresponding test image and the training images (say a black background being "compared" to a white background and vice versa). While foreground differences can obviously contribute to pixel deltas, in some cases the background of an image can make up a relatively huge proportion of the image, which heavily contributes to a significant pixel-level delta. For a distinctly bright row, we can infer that the particular test image corresponding to that row has a distinctly different foreground/background from most training images, which leads to a significantly large pixel delta. Similarly, a distinctly bright column can be due to a particular training image having content that does not match the test images.
For example, consider an image which has a cat in the center on a white background. If the other images that this image is being compared against have black backgrounds, this will cause a large pixel-wise distance between the two images.
[29]: # Now implement the function predict_labels and run the code below:
# We use k = 1 (which is Nearest Neighbor).
y_test_pred = classifier.predict_labels(dists, k=1)
num_correct = np.sum(y_test_pred == y_test)
accuracy = float(num_correct) / num_test
print('Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy))
You should expect to see approximately 27% accuracy. Now let's try out a larger k, say k = 5:
[30]: y_test_pred = classifier.predict_labels(dists, k=5)
num_correct = np.sum(y_test_pred == y_test)
accuracy = float(num_correct) / num_test
print('Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy))
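For reference, a minimal sketch of what predict_labels can look like, written here as a standalone function over the distance matrix and training labels (in the assignment it is a method of KNearestNeighbor and reads self.y_train):

import numpy as np

def predict_labels(dists, y_train, k=1):
    # dists: (num_test, num_train) distance matrix; vote among the k nearest neighbors.
    num_test = dists.shape[0]
    y_pred = np.zeros(num_test, dtype=y_train.dtype)
    for i in range(num_test):
        # Labels of the k training points closest to the i-th test point.
        closest_y = y_train[np.argsort(dists[i])[:k]]
        # Most common label; ties are broken toward the smaller label.
        y_pred[i] = np.argmax(np.bincount(closest_y))
    return y_pred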
Inline Question 2
We can also use other distance metrics such as L1 distance. For pixel values $p_{ij}^{(k)}$ at location $(i, j)$ of some image $I_k$, the mean $\mu$ across all pixels over all images is

$$\mu = \frac{1}{nhw} \sum_{k=1}^{n} \sum_{i=1}^{h} \sum_{j=1}^{w} p_{ij}^{(k)}$$

and the pixel-wise mean $\mu_{ij}$ across all images is

$$\mu_{ij} = \frac{1}{n} \sum_{k=1}^{n} p_{ij}^{(k)}.$$

The general standard deviation $\sigma$ and pixel-wise standard deviation $\sigma_{ij}$ are defined similarly.
Which of the following preprocessing steps will not change the performance of a Nearest Neighbor classifier that uses L1 distance? Select all that apply.
1. Subtracting the mean $\mu$ ($\tilde{p}_{ij}^{(k)} = p_{ij}^{(k)} - \mu$.)
2. Subtracting the per pixel mean $\mu_{ij}$ ($\tilde{p}_{ij}^{(k)} = p_{ij}^{(k)} - \mu_{ij}$.)
3. Subtracting the mean $\mu$ and dividing by the standard deviation $\sigma$.
4. Subtracting the pixel-wise mean $\mu_{ij}$ and dividing by the pixel-wise standard deviation $\sigma_{ij}$.
5. Rotating the coordinate axes of the data.
Your Answer: 1, 2, 3, 4
Your Explanation:
1. Subtracting the mean $\mu$ from the data does have the conditioning effect of centering the data around the origin. However, in our case, since we are dealing with images, the feature ranges are already localized, i.e., all pixels lie within [0, 255]. Subtracting the mean only offsets the pixel values and leaves every pairwise L1 distance unchanged, so it has no effect on the performance of the Nearest Neighbor classifier. To demonstrate this mathematically, the distance between a given test sample and each sample in the training set is given by

$$L_1 = \frac{1}{n} \sum_{k=1}^{n} \left\| (x_{\text{test}} - \mu) - (x_{\text{train}}^{(k)} - \mu) \right\|_1 = \frac{1}{n} \sum_{k=1}^{n} \left\| x_{\text{test}} - x_{\text{train}}^{(k)} \right\|_1$$

Here, $x_{\text{test}}$ and $x_{\text{train}}^{(k)}$ are vectors that hold a test image and the $k$-th training image.
2. Subtracting the per-pixel mean $\mu_{ij}$ from the data also has the conditioning effect of centering the data around the origin. However, as explained above, since the same $\mu_{ij}$ is subtracted from both images being compared, it does not impact the performance of the classifier. Similar to (1) above, the averaged pixel-wise L1 distance between the test image and the training images is

$$L_1 = \frac{1}{nhw} \sum_{k=1}^{n} \sum_{i=1}^{h} \sum_{j=1}^{w} \left| (p_{ij} - \mu_{ij}) - (q_{ij}^{(k)} - \mu_{ij}) \right| = \frac{1}{nhw} \sum_{k=1}^{n} \sum_{i=1}^{h} \sum_{j=1}^{w} \left| p_{ij} - q_{ij}^{(k)} \right|$$

Here, $p_{ij}$ and $q_{ij}^{(k)}$ are the pixels at $(i, j)$ within the test image and the $k$-th training image respectively.
3. Subtracting the mean and dividing by the standard deviation would yield zero-centered data ($\mu = 0$) with unit variance ($\sigma = 1$), i.e., the properties of a standard normal distribution. This preprocessing step would normally be very useful if different features/dimensions of the data had different ranges; e.g., if feature #1 has a range of [-1000, 1000] and feature #2 has a range of [0, 1], feature #1 will dominate the distance calculation because of the order-of-magnitude difference between the two features. In such cases, where there is a stark difference in feature ranges, normalization is key. In our case, however, since we are dealing with images, the feature ranges are already similar, i.e., all pixels lie within [0, 255], so there is no feature scale mismatch. Moreover, $\sigma$ here is a single scalar, so dividing every (mean-subtracted) pixel by the same constant simply rescales all L1 distances uniformly and does not change which training example is nearest. Applying this normalization thus does not affect the performance of the classifier.
4. Similar to the above, since we are dealing with images, the feature ranges are mostly similar, i.e., all pixels lie within [0, 255]. Feature scaling of this kind mainly helps optimization-based learners (by conditioning the gradients so that gradient descent converges faster), but it does not by itself improve the accuracy of this classifier. Thus, pixel-wise normalization would not change the performance here.
5. L1 distance is not invariant to rotation of the coordinate axes, unlike L2. The distances between points change when the axes are rotated, and thus the performance of the classifier can potentially be affected. To demonstrate this mathematically, suppose the coordinate axes of the data are rotated by 45 degrees ($\pi/4$ radians) in the counter-clockwise direction.
Let's consider three points x = (0, 1), y = (1, 0) and z = (1, -2). The L1 distances between the three points with y as pivot are

$$\| x - y \|_1 = \| y - z \|_1 = 2$$

This implies that both x and z are at the same distance from y. Now consider the 45-degree rotation matrix

$$A = \begin{bmatrix} \tfrac{1}{\sqrt{2}} & -\tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} & \tfrac{1}{\sqrt{2}} \end{bmatrix}$$

The new coordinates after the 45-degree rotation are

$$x' = Ax = \begin{bmatrix} -\tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} \end{bmatrix}, \qquad y' = Ay = \begin{bmatrix} \tfrac{1}{\sqrt{2}} \\ \tfrac{1}{\sqrt{2}} \end{bmatrix}, \qquad z' = Az = \begin{bmatrix} \tfrac{3}{\sqrt{2}} \\ -\tfrac{1}{\sqrt{2}} \end{bmatrix}$$

The new L1 distances between the three points (again, with y as pivot) are

$$\| x' - y' \|_1 = \frac{2}{\sqrt{2}} = \sqrt{2}, \qquad \| y' - z' \|_1 = \frac{4}{\sqrt{2}} = 2\sqrt{2}$$

and thus $\| x' - y' \|_1 < \| y' - z' \|_1$: two points that were equidistant from y before the rotation no longer are, so relative L1 distances (and hence nearest-neighbor decisions) are not preserved after rotation.
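As a quick numeric sanity check of the example above (a minimal sketch, assuming only numpy):

import numpy as np

x, y, z = np.array([0., 1.]), np.array([1., 0.]), np.array([1., -2.])
A = np.array([[1., -1.], [1., 1.]]) / np.sqrt(2)  # counter-clockwise rotation by pi/4

l1 = lambda a, b: np.abs(a - b).sum()
print(l1(x, y), l1(y, z))                  # 2.0 2.0       -> x and z tied before rotation
print(l1(A @ x, A @ y), l1(A @ y, A @ z))  # ~1.414 ~2.828 -> the tie is broken after rotation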
[31]: # Now let's speed up distance matrix computation by using partial vectorization
# with one loop. Implement the function compute_distances_one_loop and run the
# code below:
dists_one = classifier.compute_distances_one_loop(X_test)

# To check that the partially vectorized implementation is correct, we compare it
# to the naive implementation using the Frobenius norm: reshape
# the matrices into vectors and compute the Euclidean distance between them.
difference = np.linalg.norm(dists - dists_one, ord='fro')
print('One loop difference was: %f' % (difference, ))
if difference < 0.001:
    print('Good! The distance matrices are the same')
else:
    print('Uh-oh! The distance matrices are different')
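For reference, a minimal sketch of the partially vectorized version as a standalone function (the assignment's version is a classifier method that uses self.X_train):

import numpy as np

def compute_distances_one_loop(X_test, X_train):
    # One loop over test rows; broadcasting handles all training rows at once.
    dists = np.zeros((X_test.shape[0], X_train.shape[0]))
    for i in range(X_test.shape[0]):
        dists[i, :] = np.sqrt(np.sum((X_train - X_test[i]) ** 2, axis=1))
    return dists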
# Now implement the fully vectorized version inside compute_distances_no_loops
# and run the code:
dists_two = classifier.compute_distances_no_loops(X_test)

# Check that the distance matrix agrees with the one we computed before:
difference = np.linalg.norm(dists - dists_two, ord='fro')
print('No loop difference was: %f' % (difference, ))
if difference < 0.001:
    print('Good! The distance matrices are the same')
else:
    print('Uh-oh! The distance matrices are different')
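For reference, a minimal sketch of one common fully vectorized approach, using the expansion ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2 (a sketch, not necessarily the reference solution):

import numpy as np

def compute_distances_no_loops(X_test, X_train):
    # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2, computed for all pairs at once.
    test_sq = np.sum(X_test ** 2, axis=1, keepdims=True)  # (num_test, 1)
    train_sq = np.sum(X_train ** 2, axis=1)                # (num_train,)
    cross = X_test @ X_train.T                             # (num_test, num_train)
    # Clip tiny negative values caused by floating-point error before the sqrt.
    return np.sqrt(np.maximum(test_sq - 2 * cross + train_sq, 0))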
"""
import time
tic = time.time()
f(*args)
toc = time.time()
return toc - tic
# You should see significantly faster performance with the fully vectorized␣
,→implementation!
1.0.1 Cross-validation
We have implemented the k-Nearest Neighbor classifier but we set the value k = 5 arbitrarily. We
will now determine the best value of this hyperparameter with cross-validation.
[34]: num_folds = 5
k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]

X_train_folds = []
y_train_folds = []
################################################################################
# TODO:                                                                        #
# Split up the training data into folds. After splitting, X_train_folds and   #
# y_train_folds should each be lists of length num_folds, where               #
# y_train_folds[i] is the label vector for the points in X_train_folds[i].    #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
X_train_folds = np.array_split(X_train, num_folds)
y_train_folds = np.array_split(y_train, num_folds)

# A dictionary holding the accuracies for different values of k that we find
# when running cross-validation.
k_to_accuracies = {}
################################################################################
# TODO:                                                                        #
# Perform k-fold cross validation to find the best value of k. For each       #
# possible value of k, run the k-nearest-neighbor algorithm num_folds times,  #
# where in each case you use all but one of the folds as training data and    #
# the last fold as a validation set. Store the accuracies for all folds and   #
# all values of k in the k_to_accuracies dictionary.                          #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
for current_k in k_choices:
    k_to_accuracies[current_k] = []
    for validation_fold in range(num_folds):
        classifier = KNearestNeighbor()
        classifier.train(
            np.concatenate(X_train_folds[:validation_fold] +
                           X_train_folds[validation_fold + 1:]),
            np.concatenate(y_train_folds[:validation_fold] +
                           y_train_folds[validation_fold + 1:]))
        y_pred_fold = classifier.predict(X_train_folds[validation_fold], k=current_k)
        k_to_accuracies[current_k].append(np.mean(y_pred_fold ==
                                                  y_train_folds[validation_fold]))

# Print out the computed accuracies
for k in sorted(k_to_accuracies):
    for accuracy in k_to_accuracies[k]:
        print('k = %d, accuracy = %f' % (k, accuracy))
k = 1, accuracy = 0.263000
k = 1, accuracy = 0.257000
k = 1, accuracy = 0.264000
k = 1, accuracy = 0.278000
k = 1, accuracy = 0.266000
k = 3, accuracy = 0.239000
k = 3, accuracy = 0.249000
k = 3, accuracy = 0.240000
k = 3, accuracy = 0.266000
k = 3, accuracy = 0.254000
k = 5, accuracy = 0.248000
k = 5, accuracy = 0.266000
k = 5, accuracy = 0.280000
k = 5, accuracy = 0.292000
k = 5, accuracy = 0.280000
k = 8, accuracy = 0.262000
k = 8, accuracy = 0.282000
k = 8, accuracy = 0.273000
k = 8, accuracy = 0.290000
k = 8, accuracy = 0.273000
k = 10, accuracy = 0.265000
k = 10, accuracy = 0.296000
k = 10, accuracy = 0.276000
k = 10, accuracy = 0.284000
k = 10, accuracy = 0.280000
k = 12, accuracy = 0.260000
k = 12, accuracy = 0.295000
k = 12, accuracy = 0.279000
k = 12, accuracy = 0.283000
k = 12, accuracy = 0.280000
k = 15, accuracy = 0.252000
k = 15, accuracy = 0.289000
k = 15, accuracy = 0.278000
k = 15, accuracy = 0.282000
k = 15, accuracy = 0.274000
k = 20, accuracy = 0.270000
k = 20, accuracy = 0.279000
k = 20, accuracy = 0.279000
k = 20, accuracy = 0.282000
k = 20, accuracy = 0.285000
k = 50, accuracy = 0.271000
k = 50, accuracy = 0.288000
k = 50, accuracy = 0.278000
k = 50, accuracy = 0.269000
k = 50, accuracy = 0.266000
k = 100, accuracy = 0.256000
k = 100, accuracy = 0.270000
k = 100, accuracy = 0.263000
k = 100, accuracy = 0.256000
k = 100, accuracy = 0.263000
# plot the trend line with error bars that correspond to standard deviation
accuracies_mean = np.array([np.mean(v) for k, v in sorted(k_to_accuracies.items())])
accuracies_std = np.array([np.std(v) for k, v in sorted(k_to_accuracies.items())])
plt.errorbar(k_choices, accuracies_mean, yerr=accuracies_std)
plt.title('Cross-validation on k')
plt.xlabel('k')
plt.ylabel('Cross-validation accuracy')
plt.show()
[36]: # Based on the cross-validation results above, choose the best value for k,
# retrain the classifier using all the training data, and test it on the test
# data. You should be able to get above 28% accuracy on the test data.
best_k = k_choices[accuracies_mean.argmax()]

classifier = KNearestNeighbor()
classifier.train(X_train, y_train)
y_test_pred = classifier.predict(X_test, k=best_k)

# Compute and display the accuracy
num_correct = np.sum(y_test_pred == y_test)
accuracy = float(num_correct) / num_test
print('Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy))
Inline Question 3
Which of the following statements about k-Nearest Neighbor (k-NN) are true in a classification
setting, and for all k? Select all that apply. 1. The decision boundary of the k-NN classifier is linear.
2. The training error of a 1-NN will always be lower than that of 5-NN. 3. The test error of a 1-NN
will always be lower than that of a 5-NN. 4. The time needed to classify a test example with the
k-NN classifier grows with the size of the training set. 5. None of the above.
Your Answer: 2 and 4.
Your Explanation:
1. False. The decision boundary of the k-NN classifier is not linear. If you consider a dataset
where the classes belong to concentric circles, the decision boundaries in this case will follow
the curvature of the concentric circles.
2. True. The training error of a 1-NN will always be lower than that of 5-NN because for each
training example, its nearest neighbor is always going to be itself, i.e., error of 1-NN will be
zero.
3. False. The test error of a 1-NN will not always be lower than that of a 5-NN. Let's consider an example. Suppose x_train = (1, 2, 3, 4, 5) and y_train = (1, 0, 0, 0, 0). For a test sample of x = 0 with y = 0, y_pred would be 1 for 1-NN (thus, error = 100%) while y_pred would be 0 for 5-NN (thus, error = 0%). The best value of k is thus data-dependent, which is why we need to perform cross-validation to determine the best k for the intended application and dataset.
4. True. The testing phase of k-NN essentially compares each test sample against the entire training set, which needs one full pass through the training set. In fact, the training phase of k-NN, which consists of memorizing the training set, would also grow with the size of the training set. However, in order to decrease the number of comparisons and thus improve time complexity, we can use approximate nearest neighbor techniques (such as k-d trees, ball trees, etc.).
2 IMPORTANT
This is the end of this question. Please do the following:
1. Click File -> Save to make sure the latest checkpoint of this notebook is saved to your
Drive.
2. Execute the cell below to download the modified .py files back to your drive.
[0]: import os
f.write(''.join(open(files).readlines()))
svm
drive.mount('/content/drive', force_remount=True)
# enter the foldername in your Drive where you have saved the unzipped
# 'cs231n' folder containing the '.py', 'classifiers' and 'datasets'
# folders.
# e.g. 'cs231n/assignments/assignment1/cs231n/'
FOLDERNAME = 'cs231n/assignments/assignment1/cs231n/'
Mounted at /content/drive
/content/drive/My Drive
/content
/content/cs231n/datasets
--2020-04-19 08:28:04-- http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
Resolving www.cs.toronto.edu (www.cs.toronto.edu)... 128.100.3.30
Connecting to www.cs.toronto.edu (www.cs.toronto.edu)|128.100.3.30|:80...
connected.
HTTP request sent, awaiting response... 200 OK
Length: 170498071 (163M) [application/x-gzip]
Saving to: cifar-10-python.tar.gz
cifar-10-batches-py/
cifar-10-batches-py/data_batch_4
cifar-10-batches-py/readme.html
cifar-10-batches-py/test_batch
cifar-10-batches-py/data_batch_3
cifar-10-batches-py/batches.meta
cifar-10-batches-py/data_batch_2
cifar-10-batches-py/data_batch_5
cifar-10-batches-py/data_batch_1
/content
# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
1.1 CIFAR-10 Data Loading and Preprocessing
# Cleaning up variables to prevent loading data multiple times (which may
# cause a memory issue)
try:
    del X_train, y_train
    del X_test, y_test
    print('Clear previously loaded data.')
except:
    pass
# As a sanity check, we print out the size of the training and test data.
print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)
samples_per_class = 7
for y, cls in enumerate(classes):
    idxs = np.flatnonzero(y_train == y)
    idxs = np.random.choice(idxs, samples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt_idx = i * num_classes + y + 1
        plt.subplot(samples_per_class, num_classes, plt_idx)
        plt.imshow(X_train[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls)
plt.show()
[0]: # Split the data into train, val, and test sets. In addition we will
# create a small development set as a subset of the training data;
# we can use this for development so our code runs faster.
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500
# Our validation set will be num_validation points from the original training set.
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_train points from the original
# training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]
# We will also make a development set, which is a small subset of
# the training set.
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]
# We use the first num_test points of the original test set as our
# test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]
# first: compute the image mean based on the training data
mean_image = np.mean(X_train, axis=0)
print(mean_image[:10]) # print a few of the elements
plt.figure(figsize=(4,4))
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image
plt.show()
# second: subtract the mean image from train and test data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image
# third: append the bias dimension of ones (i.e. bias trick) so that our SVM
# only has to worry about optimizing a single weight matrix W.
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])
1.2 SVM Classifier
The multiclass SVM loss for an example $x_i$ with label $y_i$ is given by

$$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + \Delta)$$

where $s_j$ is the score assigned to class $j$ and $\Delta$ is the margin.

Your code for this section will all be written inside cs231n/classifiers/linear_svm.py.
As you can see, we have prefilled the function svm_loss_naive which uses for loops to evaluate the multiclass SVM loss function.
[0]: # Evaluate the naive implementation of the loss we provided for you:
from cs231n.classifiers.linear_svm import svm_loss_naive
import time
W = np.random.randn(3073, 10) * 0.0001  # generate a random SVM weight matrix of small numbers
loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.000005)
print('loss: %f' % (loss, ))
loss: 8.974389
The grad returned from the function above is right now all zero. Derive the gradient for the SVM cost function and implement it inline inside the function svm_loss_naive.
You will find it helpful to interleave your new code inside the existing function.
To check that you have implemented the gradient correctly, you can numerically estimate the gradient of the loss function and compare the numeric estimate to the gradient that you computed. We have provided code that does this for you:
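For reference, a minimal sketch of how the loss and the analytic gradient can be accumulated together inside the naive double loop (a sketch under the (W, X, y, reg) signature used in this assignment, not the prefilled reference code; the margin delta of 1 and the 2 * reg * W regularization gradient are assumptions that should match the convention used elsewhere in your code):

import numpy as np

def svm_loss_naive(W, X, y, reg):
    # W: (D, C) weights; X: (N, D) data; y: (N,) integer labels; reg: L2 strength.
    dW = np.zeros_like(W)
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in range(W.shape[1]):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # delta = 1 (assumed)
            if margin > 0:
                loss += margin
                dW[:, j] += X[i]      # d(margin)/d(w_j) = x_i for the violating class
                dW[:, y[i]] -= X[i]   # d(margin)/d(w_{y_i}) = -x_i for the correct class
    loss = loss / num_train + reg * np.sum(W * W)
    dW = dW / num_train + 2 * reg * W  # regularization gradient convention assumed
    return loss, dW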
[0]: # Once you've implemented the gradient, recompute it with the code below
# and gradient check it with the function we provided for you
# compare them with your analytically computed gradient. The numbers should match.
numerical: -9.571098 analytic: -9.571098, relative error: 1.515400e-11
numerical: 25.058381 analytic: 25.058381, relative error: 8.780704e-12
numerical: 26.673645 analytic: 26.673645, relative error: 1.396251e-11
numerical: 24.445590 analytic: 24.445590, relative error: 2.285221e-11
numerical: 24.340310 analytic: 24.298053, relative error: 8.688018e-04
numerical: 24.188987 analytic: 24.188987, relative error: 3.314928e-12
numerical: -15.150728 analytic: -15.124028, relative error: 8.819086e-04
numerical: 13.356248 analytic: 13.356248, relative error: 7.883734e-12
numerical: -10.771455 analytic: -10.771455, relative error: 1.824420e-11
numerical: 3.441360 analytic: 3.441360, relative error: 1.092105e-12
numerical: -6.963752 analytic: -6.963752, relative error: 5.920761e-11
numerical: 4.056253 analytic: 4.056253, relative error: 5.278267e-12
numerical: -3.111351 analytic: -3.111351, relative error: 1.714545e-10
numerical: 12.732826 analytic: 12.732826, relative error: 3.920699e-11
numerical: -4.949063 analytic: -4.949063, relative error: 7.253134e-11
numerical: -3.586712 analytic: -3.586712, relative error: 2.907606e-11
numerical: 10.951367 analytic: 10.951367, relative error: 3.988842e-11
numerical: -29.997081 analytic: -29.997081, relative error: 7.208802e-12
numerical: 0.023334 analytic: 0.023334, relative error: 3.560866e-09
numerical: -3.237983 analytic: -3.237983, relative error: 5.787310e-11
Inline Question 1
It is possible that once in a while a dimension in the gradcheck will not match exactly. What could such a discrepancy be caused by? Is it a reason for concern? What is a simple example in one dimension where a gradient check could fail? How would changing the margin affect the frequency of this happening? Hint: the SVM loss function is not strictly speaking differentiable.
Your Answer: Indeed this is possible. Recall the SVM loss function: max(0, x), where x is the difference between the score of an incorrect class and the score of the correct class plus delta. If x > 0, we incur a loss, else if x < 0, we clamp/threshold the output to 0. A problem with the max function arises when we try to calculate the gradient at a value of x where the analytic and numerical gradients mismatch. For instance, the gradient of the SVM loss function is undefined at the hinge, i.e., at x = 0. Generally, for max(x, y), the gradient is undefined at x = y. These non-differentiable parts of the function are called "kinks" and they lead to failed gradchecks.
Kinks are not a cause for concern since we can still do gradient descent because the gradients
everywhere else apart from the hinge in case of SVM are valid. In practice, it is very rare to actually
have your loss be at this precise point in the function where you can’t compute the gradient.
However, if that happens, it is safe to just skip that gradient update step when doing gradient
descent.
A simple one-dimensional example where gradcheck could fail: f(x) = max(0, x) evaluated near x = 0. If x is slightly negative (say x = -1e-7) and the numerical gradient uses a step h = 1e-5, then x + h crosses the kink, so the finite-difference estimate is non-zero while the analytic gradient is exactly 0.
Increasing the margin (delta) increases the chance that a margin term is strictly positive, i.e., max(0, x) is evaluated further away from the kink at x = 0, and thus reduces the frequency of a dimension mismatch during gradcheck.
[0]: # Next implement the function svm_loss_vectorized; for now only compute the loss;
# The losses should match but your vectorized implementation should be much faster.

# The naive implementation and the vectorized implementation should match, but
# the vectorized version should still be much faster.
tic = time.time()
_, grad_naive = svm_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Naive loss and gradient: computed in %fs' % (toc - tic))
tic = time.time()
_, grad_vectorized = svm_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('Vectorized loss and gradient: computed in %fs' % (toc - tic))
difference: 0.000000
[0]: # Write the LinearSVM.predict function and evaluate the performance on both the
# training and validation set
y_train_pred = svm.predict(X_train)
print('training accuracy: %f' % (np.mean(y_train == y_train_pred), ))
y_val_pred = svm.predict(X_val)
print('validation accuracy: %f' % (np.mean(y_val == y_val_pred), ))
[0]: # Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of about 0.39 on the validation set.
# (training_accuracy, validation_accuracy). The accuracy is simply the fraction
# of data points that are correctly classified.
results = {}
best_val = -1   # The highest validation accuracy that we have seen so far.
best_svm = None # The LinearSVM object that achieved the highest validation rate.

################################################################################
# TODO:                                                                        #
# Write code that chooses the best hyperparameters by tuning on the validation#
# set. For each combination of hyperparameters, train a linear SVM on the     #
# training set, compute its accuracy on the training and validation sets, and #
# store these numbers in the results dictionary. In addition, store the best  #
# validation accuracy in best_val and the LinearSVM object that achieves this #
# accuracy in best_svm.                                                        #
#                                                                              #
# Hint: You should use a small value for num_iters as you develop your        #
# validation code so that the SVMs don't take much time to train; once you are#
# confident that your validation code works, you should rerun the validation  #
# code with a larger value for num_iters.                                     #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
for config_num, config in enumerate(grid_search):
    print("Hyperparam config #{} of #{}".format(config_num + 1, len(grid_search)))
    print("Hyperparam config: {}".format(config))
    lr, reg = config
    svm = LinearSVM()

    # store results
    results[(lr, reg)] = (current_y_train_accuracy, current_y_val_accuracy)
Hyperparam config: (1e-08, 30000.0)
Hyperparam config #6 of #28
Hyperparam config: (1e-08, 35000.0)
Hyperparam config #7 of #28
Hyperparam config: (1e-08, 40000.0)
Hyperparam config #8 of #28
Hyperparam config: (2e-07, 5000.0)
Hyperparam config #9 of #28
Hyperparam config: (2e-07, 10000.0)
Hyperparam config #10 of #28
Hyperparam config: (2e-07, 20000.0)
Hyperparam config #11 of #28
Hyperparam config: (2e-07, 25000.0)
Hyperparam config #12 of #28
Hyperparam config: (2e-07, 30000.0)
Hyperparam config #13 of #28
Hyperparam config: (2e-07, 35000.0)
Hyperparam config #14 of #28
Hyperparam config: (2e-07, 40000.0)
Hyperparam config #15 of #28
Hyperparam config: (1e-07, 5000.0)
Hyperparam config #16 of #28
Hyperparam config: (1e-07, 10000.0)
Hyperparam config #17 of #28
Hyperparam config: (1e-07, 20000.0)
Hyperparam config #18 of #28
Hyperparam config: (1e-07, 25000.0)
Hyperparam config #19 of #28
Hyperparam config: (1e-07, 30000.0)
Hyperparam config #20 of #28
Hyperparam config: (1e-07, 35000.0)
Hyperparam config #21 of #28
Hyperparam config: (1e-07, 40000.0)
Hyperparam config #22 of #28
Hyperparam config: (3e-05, 5000.0)
Hyperparam config #23 of #28
Hyperparam config: (3e-05, 10000.0)
Hyperparam config #24 of #28
Hyperparam config: (3e-05, 20000.0)
Hyperparam config #25 of #28
Hyperparam config: (3e-05, 25000.0)
Hyperparam config #26 of #28
Hyperparam config: (3e-05, 30000.0)
Hyperparam config #27 of #28
Hyperparam config: (3e-05, 35000.0)
Hyperparam config #28 of #28
Hyperparam config: (3e-05, 40000.0)
/content/cs231n/classifiers/linear_svm.py:131: RuntimeWarning: overflow
encountered in double_scalars
loss += reg * np.sum(W * W)
/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py:90:
RuntimeWarning: overflow encountered in reduce
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/content/cs231n/classifiers/linear_svm.py:131: RuntimeWarning: overflow
encountered in multiply
loss += reg * np.sum(W * W)
# pdb.set_trace()
import math
x_scatter = [math.log10(x[0]) for x in results]
y_scatter = [math.log10(x[1]) for x in results]

# plot training accuracy
marker_size = 100
colors = [results[x][0] for x in results]
plt.subplot(2, 1, 1)
plt.tight_layout(pad=3)
plt.scatter(x_scatter, y_scatter, marker_size, c=colors, cmap=plt.cm.coolwarm)
plt.colorbar()
plt.xlabel('log learning rate')
plt.ylabel('log regularization strength')
plt.title('CIFAR-10 training accuracy')
[0]: # Evaluate the best svm on test set
y_test_pred = best_svm.predict(X_test)
test_accuracy = np.mean(y_test == y_test_pred)
print('linear SVM on raw pixels final test set accuracy: %f' % test_accuracy)
for i in range(10):
    plt.subplot(2, 5, i + 1)
Inline question 2
Describe what your visualized SVM weights look like, and offer a brief explanation for why they look the way that they do.
Your Answer: The visualized SVM weights represent templates for each class that have been learned from the data. Each of them essentially describes the "essential construction" of the training images that belong to a particular class. For instance, the weights of the class "horse" look like a horse with two heads because the dataset likely has images of horses with some of them looking left and others looking right. With k-NN, we compare a test image with all of the training examples using an appropriate distance measure (say L1 or L2) in order to predict the class of a particular test sample; with the SVM, by contrast, we compare the test image with the template of each class by taking the inner product.
2 IMPORTANT
This is the end of this question. Please do the following:
1. Click File -> Save to make sure the latest checkpoint of this notebook is saved to your
Drive.
2. Execute the cell below to download the modified .py files back to your drive.
[0]: import os
f.write(''.join(open(files).readlines()))
softmax
drive.mount('/content/drive', force_remount=True)
# enter the foldername in your Drive where you have saved the unzipped
# 'cs231n' folder containing the '.py', 'classifiers' and 'datasets'
# folders.
# e.g. 'cs231n/assignments/assignment1/cs231n/'
FOLDERNAME = 'cs231n/assignments/assignment1/cs231n/'
Length: 170498071 (163M) [application/x-gzip]
Saving to: cifar-10-python.tar.gz
cifar-10-batches-py/
cifar-10-batches-py/data_batch_4
cifar-10-batches-py/readme.html
cifar-10-batches-py/test_batch
cifar-10-batches-py/data_batch_3
cifar-10-batches-py/batches.meta
cifar-10-batches-py/data_batch_2
cifar-10-batches-py/data_batch_5
cifar-10-batches-py/data_batch_1
/content
1 Softmax exercise
Complete and hand in this completed worksheet (including its outputs and any supporting code outside of
the worksheet) with your assignment submission. For more details see the assignments page on the course
website.
This exercise is analogous to the SVM exercise. You will:
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
%load_ext autoreload
%autoreload 2
[0]: def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000, num_dev=500):
"""
Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
it for the linear classifier. These are the same steps as we used for the
SVM, but condensed to a single function.
"""
# Load the raw CIFAR-10 data
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
X_test -= mean_image
X_dev -= mean_image
# Generate a random softmax weight matrix and use it to compute the loss.
W = np.random.randn(3073, 10) * 0.0001
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)
# As a rough sanity check, our loss should be something close to -log(0.1).
print('loss: %f' % loss)
print('sanity check: %f' % (-np.log(0.1)))
loss: 2.344473
sanity check: 2.302585
Inline Question 1
Why do we expect our loss to be close to -log(0.1)? Explain briefly.
Your Answer: Since we are calculating our loss based on random weights (i.e., we haven't started the "learning" process yet), we expect the initial loss to be close to -log(0.1): initially all the classes are equally likely to be chosen. Since CIFAR-10 consists of samples which belong to one of ten classes, the probability of the correct class will be 1/10 = 0.1. The softmax loss is the negative log probability of the correct class, therefore it is -log(0.1).
[0]: # Complete the implementation of softmax_loss_naive and implement a (naive)
# version of the gradient that uses nested loops.
loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)
# As we did for the SVM, use numeric gradient checking as a debugging tool.
# The numeric gradient should be close to the analytic gradient.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad, 10)
numerical: -5.601057 analytic: -5.601058, relative error: 5.149598e-09
numerical: -2.693165 analytic: -2.693165, relative error: 1.669678e-08
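For reference, a minimal sketch of one way to write softmax_loss_naive with explicit loops, including the usual max-subtraction shift for numerical stability (a sketch under the same (W, X, y, reg) signature, not the reference solution):

import numpy as np

def softmax_loss_naive(W, X, y, reg):
    # W: (D, C) weights; X: (N, D) data; y: (N,) integer labels; reg: L2 strength.
    dW = np.zeros_like(W)
    num_train, num_classes = X.shape[0], W.shape[1]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        scores -= np.max(scores)  # shift so the largest score is 0 (numerical stability)
        probs = np.exp(scores) / np.sum(np.exp(scores))
        loss += -np.log(probs[y[i]])
        for j in range(num_classes):
            # dL_i/ds_j = p_j - 1{j == y_i}; chaining through s = x.W gives a factor of x_i.
            dW[:, j] += (probs[j] - (j == y[i])) * X[i]
    loss = loss / num_train + reg * np.sum(W * W)
    dW = dW / num_train + 2 * reg * W  # regularization gradient convention assumed
    return loss, dW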
[0]: # Now that we have a naive implementation of the softmax loss function and its
# gradient, implement a vectorized version in softmax_loss_vectorized.
# The two versions should compute the same results, but the vectorized version
# should be much faster.
tic = time.time()
loss_naive, grad_naive = softmax_loss_naive(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('naive loss: %e computed in %fs' % (loss_naive, toc - tic))
tic = time.time()
loss_vectorized, grad_vectorized = softmax_loss_vectorized(W, X_dev, y_dev, 0.000005)
toc = time.time()
print('vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic))
# As we did for the SVM, we use the Frobenius norm to compare the two versions
# of the gradient.
grad_difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')
print('Loss difference: %f' % np.abs(loss_naive - loss_vectorized))
print('Gradient difference: %f' % grad_difference)
[0]: # Use the validation set to tune hyperparameters (regularization strength and
# learning rate). You should experiment with different ranges for the learning
# rates and regularization strengths; if you are careful you should be able to
# get a classification accuracy of over 0.35 on the validation set.
################################################################################
# TODO:                                                                        #
# Use the validation set to set the learning rate and regularization strength.#
# This should be identical to the validation that you did for the SVM; save   #
# the best trained softmax classifier in best_softmax.                        #
################################################################################
learning_rates = [1e-8, 2e-7, 1e-7, 2e-6, 3e-5]  # , 1e-3, 3e-3, 1e-2, 3e-2, 1e-1, 3e-1, 1e1, 3e1, 5e1]

softmax = Softmax()

# store results
results[(learning_rate, regularization_strength)] = \
    (current_y_train_accuracy, current_y_val_accuracy)
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
lr 2.000000e-07 reg 3.000000e+04 train accuracy: 0.331122 val accuracy: 0.350000
lr 2.000000e-07 reg 3.500000e+04 train accuracy: 0.309878 val accuracy: 0.328000
lr 2.000000e-07 reg 4.000000e+04 train accuracy: 0.310612 val accuracy: 0.322000
lr 2.000000e-06 reg 5.000000e+02 train accuracy: 0.404673 val accuracy: 0.403000
lr 2.000000e-06 reg 1.000000e+03 train accuracy: 0.393755 val accuracy: 0.395000
lr 2.000000e-06 reg 5.000000e+03 train accuracy: 0.363184 val accuracy: 0.371000
lr 2.000000e-06 reg 1.000000e+04 train accuracy: 0.347694 val accuracy: 0.369000
lr 2.000000e-06 reg 2.000000e+04 train accuracy: 0.305408 val accuracy: 0.313000
lr 2.000000e-06 reg 2.500000e+04 train accuracy: 0.315755 val accuracy: 0.321000
lr 2.000000e-06 reg 3.000000e+04 train accuracy: 0.304429 val accuracy: 0.294000
lr 2.000000e-06 reg 3.500000e+04 train accuracy: 0.298837 val accuracy: 0.296000
lr 2.000000e-06 reg 4.000000e+04 train accuracy: 0.283694 val accuracy: 0.315000
lr 3.000000e-05 reg 5.000000e+02 train accuracy: 0.206490 val accuracy: 0.196000
lr 3.000000e-05 reg 1.000000e+03 train accuracy: 0.229694 val accuracy: 0.239000
lr 3.000000e-05 reg 5.000000e+03 train accuracy: 0.133265 val accuracy: 0.133000
lr 3.000000e-05 reg 1.000000e+04 train accuracy: 0.102735 val accuracy: 0.092000
lr 3.000000e-05 reg 2.000000e+04 train accuracy: 0.078551 val accuracy: 0.080000
lr 3.000000e-05 reg 2.500000e+04 train accuracy: 0.078122 val accuracy: 0.070000
lr 3.000000e-05 reg 3.000000e+04 train accuracy: 0.060796 val accuracy: 0.054000
lr 3.000000e-05 reg 3.500000e+04 train accuracy: 0.102551 val accuracy: 0.122000
lr 3.000000e-05 reg 4.000000e+04 train accuracy: 0.078551 val accuracy: 0.066000
best validation accuracy achieved during cross-validation: 0.403000
that are below that of the correct class by at least delta. Put differently, how far the scores of the incorrect classes fall below that of the correct class (as long as they are at least delta apart) is not a point of consideration for the SVM, unlike Softmax.
[0]: # Visualize the learned weights for each class
w = best_softmax.W[:-1,:] # strip out the bias
w = w.reshape(32, 32, 3, 10)
for i in range(10):
    plt.subplot(2, 5, i + 1)
2 IMPORTANT
This is the end of this question. Please do the following:
1. Click File -> Save to make sure the latest checkpoint of this notebook is saved to your
Drive.
2. Execute the cell below to download the modified .py files back to your drive.
[0]: import os
f.write(''.join(open(files).readlines()))
two_layer_net
drive.mount('/content/drive', force_remount=True)
# enter the foldername in your Drive where you have saved the unzipped
# 'cs231n' folder containing the '.py', 'classifiers' and 'datasets'
# folders.
# e.g. 'cs231n/assignments/assignment1/cs231n/'
FOLDERNAME = 'cs231n/assignments/assignment1/cs231n/'
Mounted at /content/drive
/content/drive/My Drive
/content
/content/cs231n/datasets
--2020-04-19 08:27:38-- http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
Resolving www.cs.toronto.edu (www.cs.toronto.edu)... 128.100.3.30
Connecting to www.cs.toronto.edu (www.cs.toronto.edu)|128.100.3.30|:80...
connected.
HTTP request sent, awaiting response... 200 OK
Length: 170498071 (163M) [application/x-gzip]
Saving to: cifar-10-python.tar.gz
cifar-10-batches-py/
cifar-10-batches-py/data_batch_4
cifar-10-batches-py/readme.html
cifar-10-batches-py/test_batch
cifar-10-batches-py/data_batch_3
cifar-10-batches-py/batches.meta
cifar-10-batches-py/data_batch_2
cifar-10-batches-py/data_batch_5
cifar-10-batches-py/data_batch_1
/content
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
%load_ext autoreload
%autoreload 2
input_size = 4
hidden_size = 10
num_classes = 3
num_inputs = 5
def init_toy_model():
    np.random.seed(0)
    return TwoLayerNet(input_size, hidden_size, num_classes, std=1e-1)

def init_toy_data():
    np.random.seed(1)
    X = 10 * np.random.randn(num_inputs, input_size)
    y = np.array([0, 1, 2, 2, 1])
    return X, y
net = init_toy_model()
X, y = init_toy_data()
Your scores:
[[-0.81233741 -1.27654624 -0.70335995]
[-0.17129677 -1.18803311 -0.47310444]
[-0.51590475 -1.01354314 -0.8504215 ]
[-0.15419291 -0.48629638 -0.52901952]
[-0.00618733 -0.12435261 -0.15226949]]
correct scores:
[[-0.81233741 -1.27654624 -0.70335995]
[-0.17129677 -1.18803311 -0.47310444]
[-0.51590475 -1.01354314 -0.8504215 ]
[-0.15419291 -0.48629638 -0.52901952]
[-0.00618733 -0.12435261 -0.15226949]]
4 Backward pass
Implement the rest of the function. This will compute the gradient of the loss with respect to the
variables W1, b1, W2, and b2. Now that you (hopefully!) have a correctly implemented forward
pass, you can debug your backward pass using a numeric gradient check:
[0]: from cs231n.gradient_check import eval_numerical_gradient
param_grad_num = eval_numerical_gradient(f, net.params[param_name], verbose=False)
print('%s max relative error: %e' % (param_name, rel_error(param_grad_num, grads[param_name])))
6 Load the data
Now that you have implemented a two-layer network that passes gradient checks and works on
toy data, it’s time to load up our favorite CIFAR-10 data so we can use it to train a classifier on a
real dataset.
[0]: from cs231n.data_utils import load_CIFAR10
try:
    del X_train, y_train
    del X_test, y_test
    print('Clear previously loaded data.')
except:
    pass
Test data shape: (1000, 3072)
Test labels shape: (1000,)
7 Train a network
To train our network we will use SGD. In addition, we will adjust the learning rate with an exponential learning rate schedule as optimization proceeds; after each epoch, we will reduce the learning rate by multiplying it by a decay rate.
[0]: input_size = 32 * 32 * 3
hidden_size = 50
num_classes = 10
net = TwoLayerNet(input_size, hidden_size, num_classes)
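For orientation, a rough sketch of the training call that produces the stats dictionary plotted below; the keyword arguments and hyperparameter values here are assumptions based on the assignment's TwoLayerNet skeleton, not the exact settings used:

# Assumed hyperparameter values; the keyword names follow the TwoLayerNet.train
# signature from the assignment skeleton (treat both as assumptions).
stats = net.train(X_train, y_train, X_val, y_val,
                  num_iters=1000, batch_size=200,
                  learning_rate=1e-4, learning_rate_decay=0.95,
                  reg=0.25, verbose=True)

# Predict on the validation set
val_acc = (net.predict(X_val) == y_val).mean()
print('Validation accuracy: ', val_acc)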
plt.subplot(2, 1, 1)
plt.plot(stats['loss_history'])
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.subplot(2, 1, 2)
plt.plot(stats['train_acc_history'], label='train')
plt.plot(stats['val_acc_history'], label='val')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
plt.legend()
plt.show()
def show_net_weights(net):
    W1 = net.params['W1']
    W1 = W1.reshape(32, 32, 3, -1).transpose(3, 0, 1, 2)
    plt.imshow(visualize_grid(W1, padding=3).astype('uint8'))
    plt.gca().axis('off')
    plt.show()

show_net_weights(net)
gap between the training and validation accuracy (and that they are both relatively low, ~30%),
suggesting that the model we used has low capacity, and that we should increase its size. On
the other hand, with a very large model we would expect to see more overfitting, which would
manifest itself as a very large gap between the training and validation accuracy.
Tuning. Tuning the hyperparameters and developing intuition for how they affect the final
performance is a large part of using Neural Networks, so we want you to get a lot of practice.
Below, you should experiment with different values of the various hyperparameters, including
hidden layer size, learning rate, number of training epochs, and regularization strength. You
might also consider tuning the learning rate decay, but you should be able to get good performance
using the default value.
Approximate results. You should aim to achieve a classification accuracy of greater than 48% on the validation set. Our best network gets over 52% on the validation set.
Experiment: Your goal in this exercise is to get as good a result on CIFAR-10 as you can (52% could serve as a reference), with a fully-connected Neural Network. Feel free to implement your own techniques (e.g. PCA to reduce dimensionality, or adding dropout, or adding features to the solver, etc.).
Explain your hyperparameter tuning process below.
Your Answer: We performed random-search-based hyperparameter tuning over a range of hyperparameters such as hidden layer size, learning rate, number of training epochs, and regularization strength. Random search is a technique where hyperparameter values are randomized and combinations of these randomized values are used to find the sweet spot for the model's performance. The design space of hyperparameters can be huge, as it grows exponentially with the addition of every hyperparameter, leading to the curse of dimensionality. To that end, we optimize random search by considering randomly-selected pre-defined hyperparameter configurations in the parameter space and evaluating the network's performance at these points. In our implementation, we have parameterized the number of random hyperparameter combinations to try, to enable deeper searches if need be. The other alternative is grid search, where every combination of a pre-set list of reasonable hyperparameter values is exhaustively evaluated against the model.
Since grid search grows exponentially in complexity as the number of hyperparameters (i.e., the dimensionality of our search) increases, it isn't the optimal choice for our scenario given the number of hyperparameter values we would need to search over.
Also, in "Random Search for Hyper-Parameter Optimization" by Bergstra and Bengio, the authors show theoretically and empirically that random search is more efficient for hyperparameter optimization than grid search, especially when the effective dimensionality is low, since it typically finds a good parameter set in fewer iterations than grid search.
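Below is a condensed sketch of the kind of random-search loop described above. It assumes the TwoLayerNet class and the data arrays from the earlier cells; the value grids, num_search_iters, and the train() arguments are illustrative assumptions rather than the exact ones used in the actual cell that follows:

import numpy as np

# Illustrative hyperparameter grids; the actual values searched may differ.
lr_values = [1e-4, 5e-4, 1e-3]
reg_values = [0.1, 0.25, 0.5]
hidden_size_values = [50, 100, 150]
num_search_iters = 10  # how many random configurations to try

best_val, best_net, results = -1, None, {}
for _ in range(num_search_iters):
    # Pick one random value per hyperparameter (random search).
    lr = lr_values[np.random.randint(len(lr_values))]
    reg = reg_values[np.random.randint(len(reg_values))]
    hidden_size = hidden_size_values[np.random.randint(len(hidden_size_values))]

    net = TwoLayerNet(input_size, hidden_size, num_classes)
    net.train(X_train, y_train, X_val, y_val,
              num_iters=1500, learning_rate=lr, reg=reg, verbose=False)
    val_acc = (net.predict(X_val) == y_val).mean()
    results[(lr, reg, hidden_size)] = val_acc
    if val_acc > best_val:
        best_val, best_net = val_acc, net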
[0]: best_net = None # store the best model into this
#################################################################################
# TODO: Tune hyperparameters using the validation set. Store your best trained #
# model in best_net.                                                            #
#                                                                               #
# To help debug your network, it may help to use visualizations similar to the #
# ones we used above; these visualizations will have significant qualitative   #
# differences from the ones we saw above for the poorly tuned network.         #
#                                                                               #
# Tweaking hyperparameters by hand can be fun, but you might find it useful to #
# write code to sweep through possible combinations of hyperparameters         #
# automatically like we did on the previous exercises.                         #
#################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
input_size = 32 * 32 * 3
hidden_size = 50
num_classes = 10

lr = lr_values[np.random.randint(0, len(lr_values))]
reg = reg_values[np.random.randint(0, len(reg_values))]
hidden_size = hidden_size_values[np.random.randint(0, len(hidden_size_values))]
# randomly permute over learning_rate, regularization_strength, hidden_layer_size
# and num_training_epochs

# store results
results[(lr, reg, hidden_size, epochs)] = \
    (current_y_train_accuracy, current_y_val_accuracy)
print('lr %e reg %e hidden_size %e num_training_epochs %e train accuracy: %f val accuracy: %f' % (
    lr, reg, hidden_size, num_training_epochs, train_accuracy, val_accuracy))
1.500000e+03 train accuracy: 0.512633 val accuracy: 0.492000
lr 1.000000e-03 reg 5.000000e-01 hidden_size 1.500000e+02 num_training_epochs
3.000000e+03 train accuracy: 0.555571 val accuracy: 0.518000
best validation accuracy achieved during cross-validation: 0.542000
10 Run on the test set
When you are done experimenting, you should evaluate your final trained network on the test
set; you should get above 48%.
[0]: # Print your test accuracy: this should be above 48%
test_acc = (best_net.predict(X_test) == y_test).mean()
print('Test accuracy: ', test_acc)
11 IMPORTANT
This is the end of this question. Please do the following:
1. Click File -> Save to make sure the latest checkpoint of this notebook is saved to your
Drive.
2. Execute the cell below to download the modified .py files back to your drive.
[0]: import os
f.write(''.join(open(files).readlines()))
features
drive.mount('/content/drive', force_remount=True)
# enter the foldername in your Drive where you have saved the unzipped
# 'cs231n' folder containing the '.py', 'classifiers' and 'datasets'
# folders.
# e.g. 'cs231n/assignments/assignment1/cs231n/'
FOLDERNAME = 'cs231n/assignments/assignment1/cs231n/'
Length: 170498071 (163M) [application/x-gzip]
Saving to: cifar-10-python.tar.gz
cifar-10-batches-py/
cifar-10-batches-py/data_batch_4
cifar-10-batches-py/readme.html
cifar-10-batches-py/test_batch
cifar-10-batches-py/data_batch_3
cifar-10-batches-py/batches.meta
cifar-10-batches-py/data_batch_2
cifar-10-batches-py/data_batch_5
cifar-10-batches-py/data_batch_1
/content
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
%load_ext autoreload
%autoreload 2
1.1 Load data
Similar to previous exercises, we will load CIFAR-10 data from disk.
[0]: from cs231n.features import color_histogram_hsv, hog_feature
The extract_features function takes a set of images and a list of feature functions, evaluates each feature function on each image, and stores the results in a matrix where each column is the concatenation of all feature vectors for a single image.
[0]: from cs231n.features import *
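A sketch of the extraction step whose progress output appears below, assuming the hog_feature, color_histogram_hsv, and extract_features helpers provided in cs231n/features.py (num_color_bins is an assumed value):

# Number of bins in the color histogram (assumed value).
num_color_bins = 10
feature_fns = [hog_feature, lambda img: color_histogram_hsv(img, nbin=num_color_bins)]
X_train_feats = extract_features(X_train, feature_fns, verbose=True)
X_val_feats = extract_features(X_val, feature_fns)
X_test_feats = extract_features(X_test, feature_fns)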
Done extracting features for 20000 / 49000 images
Done extracting features for 21000 / 49000 images
Done extracting features for 22000 / 49000 images
Done extracting features for 23000 / 49000 images
Done extracting features for 24000 / 49000 images
Done extracting features for 25000 / 49000 images
Done extracting features for 26000 / 49000 images
Done extracting features for 27000 / 49000 images
Done extracting features for 28000 / 49000 images
Done extracting features for 29000 / 49000 images
Done extracting features for 30000 / 49000 images
Done extracting features for 31000 / 49000 images
Done extracting features for 32000 / 49000 images
Done extracting features for 33000 / 49000 images
Done extracting features for 34000 / 49000 images
Done extracting features for 35000 / 49000 images
Done extracting features for 36000 / 49000 images
Done extracting features for 37000 / 49000 images
Done extracting features for 38000 / 49000 images
Done extracting features for 39000 / 49000 images
Done extracting features for 40000 / 49000 images
Done extracting features for 41000 / 49000 images
Done extracting features for 42000 / 49000 images
Done extracting features for 43000 / 49000 images
Done extracting features for 44000 / 49000 images
Done extracting features for 45000 / 49000 images
Done extracting features for 46000 / 49000 images
Done extracting features for 47000 / 49000 images
Done extracting features for 48000 / 49000 images
Done extracting features for 49000 / 49000 images
results = {}
best_val = -1
best_svm = None
################################################################################
# TODO:                                                                        #
# Use the validation set to set the learning rate and regularization strength.#
# This should be identical to the validation that you did for the SVM; save   #
# the best trained classifier in best_svm. You might also want to play        #
# with different numbers of bins in the color histogram. If you are careful   #
# you should be able to get accuracy of near 0.44 on the validation set.      #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
svm = LinearSVM()

# store results
results[(lr, reg)] = (current_y_train_accuracy, current_y_val_accuracy)
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
Hyperparam config: (1e-07, 0.7)
Hyperparam config #19 of #36
Hyperparam config: (1e-05, 0.01)
Hyperparam config #20 of #36
Hyperparam config: (1e-05, 0.05)
Hyperparam config #21 of #36
Hyperparam config: (1e-05, 0.1)
Hyperparam config #22 of #36
Hyperparam config: (1e-05, 0.2)
Hyperparam config #23 of #36
Hyperparam config: (1e-05, 0.3)
Hyperparam config #24 of #36
Hyperparam config: (1e-05, 0.4)
Hyperparam config #25 of #36
Hyperparam config: (1e-05, 0.5)
Hyperparam config #26 of #36
Hyperparam config: (1e-05, 0.6)
Hyperparam config #27 of #36
Hyperparam config: (1e-05, 0.7)
Hyperparam config #28 of #36
Hyperparam config: (0.001, 0.01)
Hyperparam config #29 of #36
Hyperparam config: (0.001, 0.05)
Hyperparam config #30 of #36
Hyperparam config: (0.001, 0.1)
Hyperparam config #31 of #36
Hyperparam config: (0.001, 0.2)
Hyperparam config #32 of #36
Hyperparam config: (0.001, 0.3)
Hyperparam config #33 of #36
Hyperparam config: (0.001, 0.4)
Hyperparam config #34 of #36
Hyperparam config: (0.001, 0.5)
Hyperparam config #35 of #36
Hyperparam config: (0.001, 0.6)
Hyperparam config #36 of #36
Hyperparam config: (0.001, 0.7)
lr 1.000000e-09 reg 1.000000e-02 train accuracy: 0.100204 val accuracy: 0.092000
lr 1.000000e-09 reg 5.000000e-02 train accuracy: 0.093837 val accuracy: 0.100000
lr 1.000000e-09 reg 1.000000e-01 train accuracy: 0.092898 val accuracy: 0.083000
lr 1.000000e-09 reg 2.000000e-01 train accuracy: 0.107980 val accuracy: 0.113000
lr 1.000000e-09 reg 3.000000e-01 train accuracy: 0.118163 val accuracy: 0.102000
lr 1.000000e-09 reg 4.000000e-01 train accuracy: 0.123122 val accuracy: 0.110000
lr 1.000000e-09 reg 5.000000e-01 train accuracy: 0.086939 val accuracy: 0.081000
lr 1.000000e-09 reg 6.000000e-01 train accuracy: 0.100143 val accuracy: 0.100000
lr 1.000000e-09 reg 7.000000e-01 train accuracy: 0.098878 val accuracy: 0.080000
lr 1.000000e-07 reg 1.000000e-02 train accuracy: 0.134469 val accuracy: 0.123000
lr 1.000000e-07 reg 5.000000e-02 train accuracy: 0.119367 val accuracy: 0.118000
lr 1.000000e-07 reg 1.000000e-01 train accuracy: 0.119857 val accuracy: 0.125000
lr 1.000000e-07 reg 2.000000e-01 train accuracy: 0.110898 val accuracy: 0.114000
lr 1.000000e-07 reg 3.000000e-01 train accuracy: 0.119837 val accuracy: 0.132000
lr 1.000000e-07 reg 4.000000e-01 train accuracy: 0.137878 val accuracy: 0.147000
lr 1.000000e-07 reg 5.000000e-01 train accuracy: 0.114408 val accuracy: 0.122000
lr 1.000000e-07 reg 6.000000e-01 train accuracy: 0.135714 val accuracy: 0.136000
lr 1.000000e-07 reg 7.000000e-01 train accuracy: 0.148367 val accuracy: 0.161000
lr 1.000000e-05 reg 1.000000e-02 train accuracy: 0.410347 val accuracy: 0.411000
lr 1.000000e-05 reg 5.000000e-02 train accuracy: 0.411837 val accuracy: 0.406000
lr 1.000000e-05 reg 1.000000e-01 train accuracy: 0.413551 val accuracy: 0.403000
lr 1.000000e-05 reg 2.000000e-01 train accuracy: 0.408918 val accuracy: 0.419000
lr 1.000000e-05 reg 3.000000e-01 train accuracy: 0.416857 val accuracy: 0.416000
lr 1.000000e-05 reg 4.000000e-01 train accuracy: 0.411653 val accuracy: 0.424000
lr 1.000000e-05 reg 5.000000e-01 train accuracy: 0.411388 val accuracy: 0.413000
lr 1.000000e-05 reg 6.000000e-01 train accuracy: 0.413265 val accuracy: 0.410000
lr 1.000000e-05 reg 7.000000e-01 train accuracy: 0.413265 val accuracy: 0.416000
lr 1.000000e-03 reg 1.000000e-02 train accuracy: 0.503918 val accuracy: 0.495000
lr 1.000000e-03 reg 5.000000e-02 train accuracy: 0.503959 val accuracy: 0.491000
lr 1.000000e-03 reg 1.000000e-01 train accuracy: 0.503592 val accuracy: 0.491000
lr 1.000000e-03 reg 2.000000e-01 train accuracy: 0.498633 val accuracy: 0.483000
lr 1.000000e-03 reg 3.000000e-01 train accuracy: 0.498102 val accuracy: 0.490000
lr 1.000000e-03 reg 4.000000e-01 train accuracy: 0.495469 val accuracy: 0.479000
lr 1.000000e-03 reg 5.000000e-01 train accuracy: 0.493102 val accuracy: 0.476000
lr 1.000000e-03 reg 6.000000e-01 train accuracy: 0.489959 val accuracy: 0.473000
lr 1.000000e-03 reg 7.000000e-01 train accuracy: 0.489755 val accuracy: 0.483000
best validation accuracy achieved during cross-validation: 0.495000
[0]: # Evaluate your trained SVM on the test set: you should be able to get at least 0.40
y_test_pred = best_svm.predict(X_test_feats)
test_accuracy = np.mean(y_test == y_test_pred)
print(test_accuracy)
0.475
examples_per_class = 8
classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
for cls, cls_name in enumerate(classes):
    idxs = np.where((y_test != cls) & (y_test_pred == cls))[0]
    idxs = np.random.choice(idxs, examples_per_class, replace=False)
    for i, idx in enumerate(idxs):
        plt.subplot(examples_per_class, len(classes), i * len(classes) + cls + 1)
        plt.imshow(X_test[idx].astype('uint8'))
        plt.axis('off')
        if i == 0:
            plt.title(cls_name)
plt.show()
10
There are also class mismatches caused by background similarities. For instance, consider the plane class: the misclassified examples share similar backgrounds because sky and sea colors look alike, which is why planes are commonly mixed up with ship images.
In conclusion, the combination of HOG and color histogram feature vectors is not enough to distinguish between these classes with perfect accuracy. The HOG descriptor is useful because it captures texture within an image, but it does not account for intra-class variation such as rotation, scaling, translation, illumination, and posture deformation. The scale-invariant feature transform (SIFT) by Lowe can help improve the accuracy of our model in such scenarios. Color histogram features help discriminate classes on the basis of color schemes; we can weight them lower when combining them with more advanced feature descriptors like HOG or SIFT to reduce the problem of simple color-based class segregation.
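To make that last point concrete, here is a minimal sketch of down-weighting the color-histogram block when it is concatenated with HOG features. The hog_feature, color_histogram_hsv and extract_features helpers are the ones provided in cs231n.features; the bin count, the weight value, and the variable names are assumptions for illustration and would need tuning on the validation set.
import numpy as np
from cs231n.features import hog_feature, color_histogram_hsv, extract_features

num_color_bins = 10        # assumed histogram size, matching the feature-extraction setup
color_hist_weight = 0.5    # assumed down-weighting factor; tune on the validation set

feature_fns = [hog_feature, lambda img: color_histogram_hsv(img, nbin=num_color_bins)]
X_train_feats = extract_features(X_train, feature_fns, verbose=False)

# extract_features concatenates blocks in feature_fns order, so the HOG block comes
# first and the color histogram fills the last num_color_bins columns; scaling those
# columns makes color contribute less to the learned scores.
X_train_feats[:, -num_color_bins:] *= color_hist_weight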
print(X_train_feats.shape)
(49000, 155)
(49000, 154)
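The two printed shapes differ by a single column; in this pipeline the extra column is typically a bias feature appended after preprocessing. A minimal sketch of that step, assuming the X_*_feats matrices hold the normalized HOG + color-histogram features:
import numpy as np

# Append a constant bias column so the classifier can absorb a bias term.
X_train_feats = np.hstack([X_train_feats, np.ones((X_train_feats.shape[0], 1))])
X_val_feats = np.hstack([X_val_feats, np.ones((X_val_feats.shape[0], 1))])
X_test_feats = np.hstack([X_test_feats, np.ones((X_test_feats.shape[0], 1))])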
input_dim = X_train_feats.shape[1]
hidden_dim = 500
num_classes = 10
################################################################################
# TODO: Train a two-layer neural network on image features. You may want to    #
# cross-validate various parameters as in previous sections. Store your best   #
# model in the best_net variable.                                              #
################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
lr = lr_values[np.random.randint(0, len(lr_values))]
reg = reg_values[np.random.randint(0, len(reg_values))]
hidden_size = hidden_size_values[np.random.randint(0, len(hidden_size_values))]
current_y_train_accuracy = np.mean(y_train_pred == y_train)
current_y_val_accuracy = np.mean(y_val_pred == y_val)
# store results
results[(lr, reg, hidden_size, epochs)] = \
    (current_y_train_accuracy, current_y_val_accuracy)
Hyperparam config: (0.001, 1e-07, 200, 5000)
Hyperparam config #12 of #30
Hyperparam config: (0.001, 1e-07, 50, 3000)
Hyperparam config #13 of #30
Hyperparam config: (0.001, 2e-07, 100, 5000)
Hyperparam config #14 of #30
Hyperparam config: (0.001, 2e-07, 30, 3000)
Hyperparam config #15 of #30
Hyperparam config: (0.3, 1e-06, 150, 5000)
Hyperparam config #16 of #30
Hyperparam config: (0.3, 1e-05, 100, 5000)
Hyperparam config #17 of #30
Hyperparam config: (0.1, 1e-05, 50, 5000)
Hyperparam config #18 of #30
Hyperparam config: (0.3, 1e-07, 30, 5000)
Hyperparam config #19 of #30
Hyperparam config: (0.01, 2e-07, 100, 3000)
Hyperparam config #20 of #30
Hyperparam config: (0.2, 1e-07, 150, 3000)
Hyperparam config #21 of #30
Hyperparam config: (0.3, 2e-07, 100, 5000)
Hyperparam config #22 of #30
Hyperparam config: (0.2, 1e-06, 50, 5000)
Hyperparam config #23 of #30
Hyperparam config: (0.2, 2e-07, 50, 3000)
Hyperparam config #24 of #30
Hyperparam config: (0.01, 1e-06, 50, 5000)
Hyperparam config #25 of #30
Hyperparam config: (0.1, 2e-07, 100, 5000)
Hyperparam config #26 of #30
Hyperparam config: (0.3, 1e-05, 150, 3000)
Hyperparam config #27 of #30
Hyperparam config: (0.3, 1e-06, 50, 5000)
Hyperparam config #28 of #30
Hyperparam config: (0.1, 1e-07, 100, 3000)
Hyperparam config #29 of #30
Hyperparam config: (0.2, 1e-05, 50, 3000)
Hyperparam config #30 of #30
Hyperparam config: (0.2, 1e-05, 200, 3000)
lr 1.000000e-03 reg 1.000000e-07 hidden_size 5.000000e+01 num_training_epochs
3.000000e+03 train accuracy: 0.099898 val accuracy: 0.105000
lr 1.000000e-03 reg 1.000000e-07 hidden_size 2.000000e+02 num_training_epochs
5.000000e+03 train accuracy: 0.100449 val accuracy: 0.078000
lr 1.000000e-03 reg 2.000000e-07 hidden_size 3.000000e+01 num_training_epochs
3.000000e+03 train accuracy: 0.100449 val accuracy: 0.078000
lr 1.000000e-03 reg 2.000000e-07 hidden_size 1.000000e+02 num_training_epochs
5.000000e+03 train accuracy: 0.100429 val accuracy: 0.079000
lr 1.000000e-03 reg 1.000000e-06 hidden_size 8.000000e+01 num_training_epochs
3.000000e+03 train accuracy: 0.100449 val accuracy: 0.078000
lr 1.000000e-03 reg 1.000000e-06 hidden_size 8.000000e+01 num_training_epochs
5.000000e+03 train accuracy: 0.100429 val accuracy: 0.079000
lr 1.000000e-02 reg 2.000000e-07 hidden_size 1.000000e+02 num_training_epochs
3.000000e+03 train accuracy: 0.234898 val accuracy: 0.245000
lr 1.000000e-02 reg 2.000000e-07 hidden_size 2.000000e+02 num_training_epochs
5.000000e+03 train accuracy: 0.328857 val accuracy: 0.322000
lr 1.000000e-02 reg 1.000000e-06 hidden_size 5.000000e+01 num_training_epochs
5.000000e+03 train accuracy: 0.292388 val accuracy: 0.293000
lr 1.000000e-02 reg 1.000000e-05 hidden_size 1.500000e+02 num_training_epochs
3.000000e+03 train accuracy: 0.244898 val accuracy: 0.257000
lr 1.000000e-01 reg 1.000000e-07 hidden_size 1.000000e+02 num_training_epochs
3.000000e+03 train accuracy: 0.579755 val accuracy: 0.555000
lr 1.000000e-01 reg 1.000000e-07 hidden_size 2.000000e+02 num_training_epochs
5.000000e+03 train accuracy: 0.622020 val accuracy: 0.569000
lr 1.000000e-01 reg 2.000000e-07 hidden_size 1.000000e+02 num_training_epochs
5.000000e+03 train accuracy: 0.608449 val accuracy: 0.581000
lr 1.000000e-01 reg 1.000000e-06 hidden_size 1.000000e+02 num_training_epochs
3.000000e+03 train accuracy: 0.576714 val accuracy: 0.548000
lr 1.000000e-01 reg 1.000000e-05 hidden_size 5.000000e+01 num_training_epochs
5.000000e+03 train accuracy: 0.587898 val accuracy: 0.546000
lr 2.000000e-01 reg 1.000000e-07 hidden_size 1.500000e+02 num_training_epochs
3.000000e+03 train accuracy: 0.648939 val accuracy: 0.569000
lr 2.000000e-01 reg 2.000000e-07 hidden_size 5.000000e+01 num_training_epochs
3.000000e+03 train accuracy: 0.606673 val accuracy: 0.567000
lr 2.000000e-01 reg 1.000000e-06 hidden_size 5.000000e+01 num_training_epochs
5.000000e+03 train accuracy: 0.622122 val accuracy: 0.565000
lr 2.000000e-01 reg 1.000000e-05 hidden_size 5.000000e+01 num_training_epochs
3.000000e+03 train accuracy: 0.601673 val accuracy: 0.564000
lr 2.000000e-01 reg 1.000000e-05 hidden_size 2.000000e+02 num_training_epochs
3.000000e+03 train accuracy: 0.652776 val accuracy: 0.591000
lr 3.000000e-01 reg 1.000000e-07 hidden_size 3.000000e+01 num_training_epochs
5.000000e+03 train accuracy: 0.592796 val accuracy: 0.547000
lr 3.000000e-01 reg 2.000000e-07 hidden_size 1.000000e+02 num_training_epochs
5.000000e+03 train accuracy: 0.689388 val accuracy: 0.572000
lr 3.000000e-01 reg 2.000000e-07 hidden_size 1.500000e+02 num_training_epochs
3.000000e+03 train accuracy: 0.682653 val accuracy: 0.594000
lr 3.000000e-01 reg 1.000000e-06 hidden_size 5.000000e+01 num_training_epochs
5.000000e+03 train accuracy: 0.628939 val accuracy: 0.553000
lr 3.000000e-01 reg 1.000000e-06 hidden_size 1.500000e+02 num_training_epochs
5.000000e+03 train accuracy: 0.722551 val accuracy: 0.586000
lr 3.000000e-01 reg 1.000000e-05 hidden_size 1.000000e+02 num_training_epochs
5.000000e+03 train accuracy: 0.684347 val accuracy: 0.582000
lr 3.000000e-01 reg 1.000000e-05 hidden_size 1.500000e+02 num_training_epochs
3.000000e+03 train accuracy: 0.679531 val accuracy: 0.582000
best validation accuracy achieved during cross-validation: 0.598000
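Putting the code fragments above together, a minimal sketch of the random search that produces a log like the one above. TwoLayerNet and its train/predict interface come from cs231n.classifiers.neural_net; the candidate grids, the number of sampled configurations, and the training settings (batch size, learning-rate decay) are assumptions for illustration.
import numpy as np
from cs231n.classifiers.neural_net import TwoLayerNet

# Assumed candidate values; the configurations actually tried are listed above.
lr_values = [1e-3, 1e-2, 1e-1, 2e-1, 3e-1]
reg_values = [1e-7, 2e-7, 1e-6, 1e-5]
hidden_size_values = [30, 50, 80, 100, 150, 200]
epochs_values = [3000, 5000]
num_configs = 30  # assumed number of randomly sampled configurations

results = {}
best_val = -1
best_net = None

for config in range(num_configs):
    # Sample one configuration at random from the candidate values.
    lr = lr_values[np.random.randint(0, len(lr_values))]
    reg = reg_values[np.random.randint(0, len(reg_values))]
    hidden_size = hidden_size_values[np.random.randint(0, len(hidden_size_values))]
    epochs = epochs_values[np.random.randint(0, len(epochs_values))]

    net = TwoLayerNet(input_dim, hidden_size, num_classes)
    net.train(X_train_feats, y_train, X_val_feats, y_val,
              num_iters=epochs, batch_size=200,
              learning_rate=lr, learning_rate_decay=0.95,
              reg=reg, verbose=False)

    current_y_train_accuracy = np.mean(net.predict(X_train_feats) == y_train)
    current_y_val_accuracy = np.mean(net.predict(X_val_feats) == y_val)
    results[(lr, reg, hidden_size, epochs)] = \
        (current_y_train_accuracy, current_y_val_accuracy)

    # Keep the network with the best validation accuracy as best_net.
    if current_y_val_accuracy > best_val:
        best_val, best_net = current_y_val_accuracy, net

print('best validation accuracy achieved during cross-validation: %f' % best_val)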
[0]: # Run your best neural net classifier on the test set. You should be able
# to get more than 55% accuracy.
test_acc = (best_net.predict(X_test_feats) == y_test).mean()
print(test_acc)
0.579
2 IMPORTANT
This is the end of this question. Please do the following:
1. Click File -> Save to make sure the latest checkpoint of this notebook is saved to your
Drive.
2. Execute the cell below to download the modified .py files back to your drive.
[0]: import os

# Assumed save loop: FILES_TO_SAVE lists the edited .py files and FOLDERNAME is
# the Drive folder mounted at the top of the notebook; adjust both as needed.
FILES_TO_SAVE = ['cs231n/classifiers/neural_net.py']  # assumed file list
for files in FILES_TO_SAVE:
    with open(os.path.join(FOLDERNAME, '/'.join(files.split('/')[1:])), 'w') as f:
        f.write(''.join(open(files).readlines()))