0% found this document useful (0 votes)
31 views

Assignment 4x

Uploaded by

lizahxm
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
31 views

Assignment 4x

Uploaded by

lizahxm
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 19

assignment_4 22/05/2022, 23:45

Assignment 4
Assignment 4: Neural Networks

Goal: Get familiar with neural networks by implementing them and applying them to image classification.

In this assignment we are going to learn about neural networks (NNs). The goal is to implement two
neural networks: a fully-connected neural network, a convolutional neural network, and analyze their
behavior.

The considered task is image classification. We consider a dataset of small natural images (see the
additional file) with multiple classes. We aim at formulating a model (a neural network) and learning it
using the negative log-likelihood function (i.e., the cross-entropy loss) as the objective function, and the
stochastic gradient descent as the optimizer.

In this assignment, the code must be implemented in PyTorch.

1 Understanding the problem


The considered problem is about classifying images to $L$ classes. In the first part of the assignment,
you are asked get familiar with PyTorch, a deep learning library, and the basics of neural networks, and
implement neural-network-based classifiers. For this purpose, we will start with classifying small images
(8px x 8px) of handwritten digits to one of 10 classes. The dataset is very small and all experiments could
be achieved within a couple of minutes.

In the second part, you are asked to implement the whole pipeline for a given dataset by yourself.

Please run the code below and spend a while on analyzing the images.

If any code line is unclear to you, please read on that in numpy, scipy, matplotlib and PyTorch docs.

about:srcdoc Page 1 of 19
assignment_4 22/05/2022, 23:45

In [48]:

import os
import sys
!{sys.executable} -m pip install torch
import numpy as np
import matplotlib.pyplot as plt
import torch
from sklearn.datasets import load_digits
from sklearn import datasets
from torch.utils.data import Dataset, DataLoader
import torch.nn as nn
import torch.nn.functional as F

EPS = 1.e-7

Requirement already satisfied: torch in /Users/phil/opt/anaconda3/


lib/python3.8/site-packages (1.11.0)
Requirement already satisfied: typing-extensions in /Users/phil/op
t/anaconda3/lib/python3.8/site-packages (from torch) (3.7.4.2)

In [49]:

# IF YOU USE COLAB, THIS IS VERY USEFUL! OTHERWISE, PLEASE REMOVE IT.
# mount drive: WE NEED IT FOR SAVING IMAGES!
#from google.colab import drive
#drive.mount('/content/gdrive')

In [50]:

# IF YOU USE COLAB, THIS IS VERY USEFUL! OTHERWISE, PLEASE REMOVE IT.
# PLEASE CHANGE IT TO YOUR OWN GOOGLE DRIVE!
#results_dir = '/content/gdrive/My_Drive/Colab Notebooks/TEACHING/'

about:srcdoc Page 2 of 19
assignment_4 22/05/2022, 23:45

In [51]:

# PLEASE DO NOT REMOVE!


# This is a class for the dataset of small (8px x 8px) digits.
# Please try to understand in details how it works!
class Digits(Dataset):
"""Scikit-Learn Digits dataset."""

def __init__(self, mode='train', transforms=None):


digits = load_digits()
if mode == 'train':
self.data = digits.data[:1000].astype(np.float32)
self.targets = digits.target[:1000]
elif mode == 'val':
self.data = digits.data[1000:1350].astype(np.float32)
self.targets = digits.target[1000:1350]
else:
self.data = digits.data[1350:].astype(np.float32)
self.targets = digits.target[1350:]

self.transforms = transforms

def __len__(self):
return len(self.data)

def __getitem__(self, idx):


sample_x = self.data[idx]
sample_y = self.targets[idx]
if self.transforms:
sample_x = self.transforms(sample_x)
return (sample_x, sample_y)

about:srcdoc Page 3 of 19
assignment_4 22/05/2022, 23:45

In [52]:

# PLEASE DO NOT REMOVE


# Here, we plot some images (8px x 8px).
digits = load_digits()
x = digits.data[:16].astype(np.float32)

fig_data, axs = plt.subplots(4,4,figsize=(4, 4))


fig_data.tight_layout()

for i in range(4):
for j in range(4):
img = np.reshape(x[4*i+j],(8,8))
axs[i,j].imshow(img, cmap='gray')
axs[i,j].axis('off')

2 Neural Networks for Digits (4pt)


In this assignment, you are asked to implement a neural network (NN) classifier. Please take a look at the
class below and fill in the missing parts.

NOTE: Please pay attention to the inputs and outputs of each function.

2.1 Neural Network Classifier

Below, we have two helper modules (layers) that can be used to reshape and flatten a tensor. They are
useful for creating sequentials with convolutional layers.

about:srcdoc Page 4 of 19
assignment_4 22/05/2022, 23:45

In [53]:

# PLEASE DO NOT REMOVE!


# Here are two auxiliary functions that can be used for a convolutional NN (CN
N).

# This module reshapes an input (matrix -> tensor).


class Reshape(nn.Module):
def __init__(self, size):
super(Reshape, self).__init__()
self.size = size # a list

def forward(self, x):


assert x.shape[1] == np.prod(self.size)
return x.view(x.shape[0], *self.size)

# This module flattens an input (tensor -> matrix) by blending dimensions


# beyond the batch size.
class Flatten(nn.Module):
def __init__(self):
super(Flatten, self).__init__()

def forward(self, x):


return x.view(x.shape[0], -1)

Below is the main class for a classifier parameterized by a neural network.

about:srcdoc Page 5 of 19
assignment_4 22/05/2022, 23:45

In [54]:

#=========
# GRADING:
# 0
# 0.5 pt if code works but it is explained badly
# 1.0 pt if code works and it is explained well
#=========
# Implement a neural network (NN) classifier.
class ClassifierNeuralNet(nn.Module):
def __init__(self, classnet):
super(ClassifierNeuralNet, self).__init__()
# We provide a sequential module with layers and activations
self.classnet = classnet
# The loss function (the negative log-likelihood)
self.nll = nn.NLLLoss(reduction='none') #it requires log-softmax as in
put!!

# This function classifies an image x to a class.


# The output must be a class label (long).
def classify(self, x):
#------
# PLEASE FILL IN
y_pred = self.classnet(x).max(1).indices

return y_pred

# This function is crucial for a module in PyTorch.


# In our framework, this class outputs a value of the loss function.
def forward(self, x, y, reduction='avg'):
#------
# PLEASE FILL IN
loss = self.nll(classnet(x),y.long())
#------
if reduction == 'sum':
return loss.sum()
else:
return loss.mean()

Question 1 (0-0.5pt): What is the objective function for a classification task? In other words, what is
nn.NLLLoss in the code above? Please write it in mathematical terms.

Answer: In this case, the objective function is the negative log likelihood function.

Maximum log likelihood function: log p(Dy/Dx,alpha) = Sum(log p(yi/xi,alpha))

Minimum log likelihood function: -max log - likelihood function

about:srcdoc Page 6 of 19
assignment_4 22/05/2022, 23:45

Question 2 (0-0.5pt): In the code above, it is said to use the logarithm of the softmax as the final
activation function. Is it correct to use the log-softmax instead of the softmax for making predictions (i.e.,
picking the most probable label).

Answer: Yes, it is fine because the logarithm does not change the most probable label, it changes only
the probability to the log-probability.

2.2 Evaluation

In [55]:

# PLEASE DO NOT REMOVE


def evaluation(test_loader, name=None, model_best=None, epoch=None):
# If available, load the best performing model
if model_best is None:
model_best = torch.load(name + '.model')

model_best.eval()# set the model to the evaluation mode


loss_test = 0.
loss_error = 0.
N = 0.
# start evaluation
for indx_batch, (test_batch, test_targets) in enumerate(test_loader):
# loss (nll)
loss_test_batch = model_best.forward(test_batch, test_targets, reduction='
sum')
loss_test = loss_test + loss_test_batch.item()
# classification error
y_pred = model_best.classify(test_batch)
e = 1.*(y_pred == test_targets)
loss_error = loss_error + (1. - e).sum().item()
# the number of examples
N = N + test_batch.shape[0]
# divide by the number of examples
loss_test = loss_test / N
loss_error = loss_error / N

# Print the performance


if epoch is None:
print(f'-> FINAL PERFORMANCE: nll={loss_test}, ce={loss_error}')
else:
if epoch % 10 == 0:
print(f'Epoch: {epoch}, val nll={loss_test}, val ce={loss_error}')

return loss_test, loss_error

# An auxiliary function for plotting the performance curves


def plot_curve(name, signal, file_name='curve.pdf', xlabel='epochs', ylabel='n
ll', color='b-', test_eval=None):
# plot the curve

about:srcdoc Page 7 of 19
assignment_4 22/05/2022, 23:45

plt.plot(np.arange(len(signal)), signal, color, linewidth='3', label=ylabel


+' val')
# if available, add the final (test) performance
if test_eval is not None:
plt.hlines(test_eval, xmin=0, xmax=len(signal), linestyles='dashed', label
=ylabel +' test')
plt.text(len(signal), test_eval, "{:.3f}".format(test_eval),)
# set x- and ylabels, add legend, save the figure
plt.xlabel(xlabel), plt.ylabel(ylabel)
plt.legend()
plt.savefig(name + file_name, bbox_inches='tight')
plt.show()

2.3 Training procedure

about:srcdoc Page 8 of 19
assignment_4 22/05/2022, 23:45

In [56]:

# PLEASE DO NOT REMOVE!


# The training procedure
def training(name, max_patience, num_epochs, model, optimizer, training_loader
, val_loader):
nll_val = []
error_val = []
best_nll = 1000.
patience = 0

# Main training loop


for e in range(num_epochs):
model.train() # set the model to the training mode
# load batches
for indx_batch, (batch, targets) in enumerate(training_loader):
# calculate the forward pass (loss function for given images and labels)
loss = model.forward(batch, targets)
# remember we need to zero gradients! Just in case!
optimizer.zero_grad()
# calculate backward pass
loss.backward(retain_graph=True)
# run the optimizer
optimizer.step()

# Validation: Evaluate the model on the validation data


loss_e, error_e = evaluation(val_loader, model_best=model, epoch=e)
nll_val.append(loss_e) # save for plotting
error_val.append(error_e) # save for plotting

# Early-stopping: update the best performing model and break training if n


o
# progress is observed.
if e == 0:
torch.save(model, name + '.model')
best_nll = loss_e
else:
if loss_e < best_nll:
torch.save(model, name + '.model')
best_nll = loss_e
patience = 0
else:
patience = patience + 1

if patience > max_patience:


break

# Return nll and classification error.


nll_val = np.asarray(nll_val)
error_val = np.asarray(error_val)

return nll_val, error_val

about:srcdoc Page 9 of 19
assignment_4 22/05/2022, 23:45

2.4 Experiments

Initialize dataloaders

In [57]:

# PLEASE DO NOT REMOVE


# Initialize training, validation and test sets.
train_data = Digits(mode='train')
val_data = Digits(mode='val')
test_data = Digits(mode='test')

# Initialize data loaders.


training_loader = DataLoader(train_data, batch_size=64, shuffle=True)
val_loader = DataLoader(val_data, batch_size=64, shuffle=False)
test_loader = DataLoader(test_data, batch_size=64, shuffle=False)

In [58]:

print("How do we get our data from Digits class? \n")


print(f"Feature example: {train_data[1][0]}")
print(f"Feature example shape: {train_data[1][0].shape}")
print(f"Label example: {train_data[1][1]}")

How do we get our data from Digits class?

Feature example: [ 0. 0. 0. 12. 13. 5. 0. 0. 0. 0. 0. 11.


16. 9. 0. 0. 0. 0.
3. 15. 16. 6. 0. 0. 0. 7. 15. 16. 16. 2. 0. 0. 0. 0.
1. 16.
16. 3. 0. 0. 0. 0. 1. 16. 16. 6. 0. 0. 0. 0. 1. 16. 1
6. 6.
0. 0. 0. 0. 0. 11. 16. 10. 0. 0.]
Feature example shape: (64,)
Label example: 1

about:srcdoc Page 10 of 19
assignment_4 22/05/2022, 23:45

In [59]:

print("How do we get our data from Pytorch DataLoader class? \n")


train_features, train_labels = next(iter(training_loader))
print(f"Feature batch shape: {train_features.size()}")
print(f"Labels batch shape: {train_labels.size()}")

print("\n\nWhat happens if we reshape a feature batch? \n")


reshape = Reshape(size=(1,8,8))
train_features_reshaped = reshape(train_features)
print(f"Feature batch shape after reshape: {train_features_reshaped.size()}")

print("\n\nWhat happens if we flatten a reshaped feature batch? \n")


flatten = Flatten()
train_features_flattened = flatten(train_features_reshaped)
print(f"Feature batch shape after flatten: {train_features_flattened.size()}")

How do we get our data from Pytorch DataLoader class?

Feature batch shape: torch.Size([64, 64])


Labels batch shape: torch.Size([64])

What happens if we reshape a feature batch?

Feature batch shape after reshape: torch.Size([64, 1, 8, 8])

What happens if we flatten a reshaped feature batch?

Feature batch shape after flatten: torch.Size([64, 64])

Initialize hyperparameters

about:srcdoc Page 11 of 19
assignment_4 22/05/2022, 23:45

In [60]:

# PLEASE DO NOT REMOVE


# Hyperparameters
# -> data hyperparams
D = 64 # input dimension

# -> model hyperparams


M = 256 # the number of neurons in scale (s) and translation (t) nets
K = 10 # the number of labels
num_kernels = 32 #the number of kernels for CNN

# -> training hyperparams


lr = 1e-3 # learning rate
wd = 1e-5 # weight decay
num_epochs = 1000 # max. number of epochs
max_patience = 20 # an early stopping is used, if training doesn't improve for
longer than 20 epochs, it is stopped

Running experiments

In the code below, you are supposed to implement architectures for MLP and CNN. For properly
implementing these architectures, you can get 0.5pt for each of them.

In [62]:

# PLEASE DO NOT REMOVE and FILL IN WHEN NECESSARY!


# We will run two models: MLP and CNN
names = [
'classifier_mlp',
'classifier_cnn']

# loop over models


for name in names:
print('\n-> START {}'.format(name))
# Create a folder (REMEMBER: You must mount your drive if you use Colab!)
if name == 'classifier_mlp':
name = name + '_M_' + str(M)
elif name == 'classifier_cnn':
name = name + '_M_' + str(M) + '_kernels_' + str(num_kernels)

# Create a folder if necessary


# result_dir = os.path.join(results_dir, 'results', name + '/')
# if not(os.path.exists(result_dir)):
# os.mkdir(result_dir)

# MLP
if name[0:14] == 'classifier_mlp':
#=========
# GRADING:
# 0

about:srcdoc Page 12 of 19
assignment_4 22/05/2022, 23:45

# 0.5pt if properly implemented


#=========
#------
# PLEASE FILL IN:
classnet = nn.Sequential(
nn.Linear(D, 250),
nn.Linear(250, 100),
nn.Linear(100, 32),
nn.LogSoftmax(dim = 1)
)

# You are asked here to propose your own architecture


# NOTE: Please remember that the output must be LogSoftmax!
#------

# CNN
elif name[0:14] == 'classifier_cnn':

#=========
# GRADING:
# 0
# 0.5pt if properly implemented
#=========
#------
# PLEASE FILL IN:

size = (1,8,8)

classnet = nn.Sequential(
Reshape(size=(1, 8, 8)),
nn.Conv2d(in_channels=1, out_channels=64, kernel_size=3),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2, stride = 1),
nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2, stride = 1),
nn.Flatten(),
nn.Linear(M, 10),
nn.LogSoftmax(dim=1))
#
# You are asked here to propose your own architecture
# NOTE: Plese note that the images are represented as vectors, thus, you m
ust
# use Reshape(size) as the first layer, and Flatten() after all convolutio
nal
# layers and before linear layers.
# NOTE: Please remember that the output must be LogSoftmax!
#------

# Init ClassifierNN
model = ClassifierNeuralNet(classnet)

# Init OPTIMIZER (here we use ADAMAX)

about:srcdoc Page 13 of 19
assignment_4 22/05/2022, 23:45

optimizer = torch.optim.Adamax([p for p in model.parameters() if p.requires_


grad == True], lr=lr, weight_decay=wd)

# Training procedure
nll_val, error_val = training(name=name,
max_patience=max_patience,
num_epochs=num_epochs,
model=model,
optimizer=optimizer,
training_loader=training_loader,
val_loader=val_loader)

# The final evaluation (on the test set)


test_loss, test_error = evaluation(name=name, test_loader=test_loader)
# write the results to a file
f = open(name + '_test_loss.txt', "w")
f.write('NLL: ' + str(test_loss) + '\nCE: ' + str(test_error))
f.close()
# create curves
plot_curve(name, nll_val, file_name='_nll_val_curve.pdf', ylabel='nll', test
_eval=test_loss)
plot_curve(name, error_val, file_name='_ca_val_curve.pdf', ylabel='ce', colo
r='r-',
test_eval=test_error)

-> START classifier_mlp


Epoch: 0, val nll=0.5433347920009068, val ce=0.16285714285714287
Epoch: 10, val nll=0.15177136148725237, val ce=0.03428571428571429
Epoch: 20, val nll=0.1306105194772993, val ce=0.03428571428571429
Epoch: 30, val nll=0.13596793515341624, val ce=0.03428571428571429
Epoch: 40, val nll=0.13106895531926838, val ce=0.03428571428571429
Epoch: 50, val nll=0.12974674650600979, val ce=0.03428571428571429
Epoch: 60, val nll=0.13179131201335362, val ce=0.03428571428571429
-> FINAL PERFORMANCE: nll=0.4181273458254684, ce=0.089485458612975
4

about:srcdoc Page 14 of 19
assignment_4 22/05/2022, 23:45

-> START classifier_cnn


Epoch: 0, val nll=1.3602893175397601, val ce=0.4514285714285714
Epoch: 10, val nll=0.20448618480137418, val ce=0.05428571428571428
4
Epoch: 20, val nll=0.13120422022683279, val ce=0.03714285714285714
4
Epoch: 30, val nll=0.09847970860345023, val ce=0.03428571428571429
Epoch: 40, val nll=0.09816357306071691, val ce=0.02857142857142857
Epoch: 50, val nll=0.07949816703796386, val ce=0.02857142857142857
Epoch: 60, val nll=0.07780093295233591, val ce=0.02857142857142857
Epoch: 70, val nll=0.07747297406196595, val ce=0.02857142857142857
Epoch: 80, val nll=0.07752850753920419, val ce=0.02857142857142857
Epoch: 90, val nll=0.07367285541125707, val ce=0.02857142857142857
Epoch: 100, val nll=0.07535132501806531, val ce=0.0257142857142857
14
-> FINAL PERFORMANCE: nll=0.25273222891276315, ce=0.06263982102908
278

about:srcdoc Page 15 of 19
assignment_4 22/05/2022, 23:45

2.5 Analysis

Question 3 (0-0.5pt): Please compare the convergence of MLP and CNN in terms of the loss function
and the classification error.

Answer: CNN has both lower scores for the loss function and for the classification error. Which means
CNN's performance is better than MLP's.

about:srcdoc Page 16 of 19
assignment_4 22/05/2022, 23:45

about:srcdoc Page 17 of 19
assignment_4 22/05/2022, 23:45

about:srcdoc Page 18 of 19
assignment_4 22/05/2022, 23:45

Question 4 (0-0.5pt): In general, for a properly picked architectures, a CNN should work better than an
MLP. Did you notice that? Why (in general) CNNs are better suited to images than MLPs?

Answer: See Question 3 for the images. CNN's are better suited to images thsan MLP's because they
take a tensor as input while MLP's do not. Therefore CNN's can understand spatial relation between the
pixels of images better, which allows CNN's to outperform MLP's when it comes to more complex
images.

3 Application to Street House View Numbers (SVHN) (6pt)

Please repeat (some) of the code in the previous section and apply a bigger convolutional neural network
(CNN) to the following dataset:

http://ufldl.stanford.edu/housenumbers/ (http://ufldl.stanford.edu/housenumbers/)

Please follow the following steps:

1. (1pt) Create appropriate Dataset class. Please remember to use the original training data and test
data, and also to create a validation set from the traning data (at least 10% of the training examples).
Do not use extra examples!
2. (1pt) Implement an architecture that will give at most 0.1 classification error. For instance, see this
paper as a reference:
https://arxiv.org/pdf/1204.3968.pdf#:~:text=The%20SVHN%20classification%20dataset%20%5B8,set%20
(https://arxiv.org/pdf/1204.3968.pdf#:~:text=The%20SVHN%20classification%20dataset%20%5B8,set%20
3. (1pt) Think of an extra component that could improve the performance (e.g., a regularization, specific
activation functions).
4. (1pt) Provide a good explanation of the applied architecture and a description of all components.
5. (2pt) Analyze the results.

Please be very precise, comment your code and provide a comprehensive and clear analysis.

In [ ]:

about:srcdoc Page 19 of 19

You might also like