CS401 24 Assign 2 Template Fixed
Tip
Hidden below is a useful snippet of HTML to set up a Restart button in case training gets out of hand.
Training a colour image classifier using Flux
Load the dataset
This is a slightly more complex learning task than the MNIST example. CIFAR10 is a dataset of 50k tiny coloured training images split into 10 classes.
Again, most of the steps are identical to those for the MNIST task, but some dimension adjustments are required because the images are slightly bigger and also involve three colour channels.
begin
    using Statistics
    using Flux, Flux.Optimise
    using MLDatasets: CIFAR10
    using Images.ImageCore
    using Flux: onehotbatch, onecold
    using Base.Iterators: partition
    using MLUtils
    using Plots
    using DataFrames
end
begin
    train_x, train_y = CIFAR10(split=:train)[:]
    train_labels = onehotbatch(train_y, 0:9)
    classes = ["airplane", "automobile", "bird", "cat",
               "deer", "dog", "frog", "horse", "ship", "truck"]
end;
The images are simply 32 x 32 matrices of numbers in 3 channels (R,G,B). The train_x array contains 50,000 images converted to 32 x 32 x 3 arrays, with the third dimension being the 3 channels (R,G,B). Let's take a look at a random image from train_x. To do this we define a function called image, which calls colorview on the training image after permuting it from 32x32x3 to 3x32x32 (colorview expects the channel dimension first):
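The definition of that helper is not visible in this export; a minimal sketch consistent with the description above (the name image and its use of colorview are as described, the rest is an assumption) could be:

    # Permute 32x32x3 -> 3x32x32 so colorview sees the channel dimension first,
    # then reinterpret the three channels as RGB pixels.
    image(x) = colorview(RGB, permutedims(x, (3, 1, 2)))

    # e.g. image(train_x[:, :, :, rand(1:50_000)])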
The first 49k images (in batches of 1,000) will be our training set, and the rest is for validation.
partition handily breaks down the set we give it into consecutive chunks (1,000 in this case).
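For instance (a quick illustration only, not part of the template):

    collect(partition(1:10, 3))   # => [1:3, 4:6, 7:9, 10:10] — consecutive chunks of 3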
Task 1
Partition train_x into training and validation parts, along the lines of the MNIST example. Note that train is an array of tuples, where the first tuple element is the image batch and the second is the label batch. This is the format in which the Flux-defined model expects its training data.
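One possible shape for this (a sketch only; the names train, validate_x and validate_y are assumptions chosen to match how they are used in the training cell further down):

    # batches of 1,000 (image, one-hot label) tuples for training
    train = [(train_x[:, :, :, i], train_labels[:, i]) for i in partition(1:49_000, 1_000)]

    # hold back the last 1,000 images for validation
    validate_x = train_x[:, :, :, 49_001:50_000]
    validate_y = train_labels[:, 49_001:50_000]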
A convolutional neural network is one which defines a kernel and slides it across a matrix to create an intermediate representation from which to extract features. It creates higher-order features as it goes into deeper layers, making it suitable for images, where the structure of the image helps determine which class it belongs to.
In this case we use two convolutional layers of 16 and 8 channels, respectively. Each convolution phase is passed through a pooling layer, which reduces the image's dimensionality. The SamePad() function is used to ensure appropriate padding is used to preserve the dimensions of the original image.
The 3D array is then flattened to a 512-element 1D vector (after two 2x2 poolings the 32x32 image becomes 8x8, and with 8 channels that gives 8 x 8 x 8 = 512 elements), which is passed through a sequence of fully-connected layers to reduce its length to 10. Finally a softmax transformation is applied to the 10-element output vector to transform the outputs to probabilities.
Model fix
I neglected to use padding in the last version of the template. This resulted in the convolution
not preserving the original dimensions of the image. The use of SamePad() to calculate the
required padding fixes this.
model = Chain(
    Conv((5, 5), 3 => 16, relu, pad=2),   # 1_216 parameters
    MaxPool((2, 2)),
    Conv((5, 5), 16 => 8, relu, pad=2),   # 3_208 parameters
    MaxPool((2, 2)),
    Flux.flatten,
    Dense(512 => 256),                    # 131_328 parameters
    Dense(256 => 10),                     # 2_570 parameters
    softmax,
)                  # Total: 8 arrays, 138_322 parameters, 541.023 KiB.
model = Chain(
    Conv((5,5), 3=>16, pad=SamePad(), relu),
    MaxPool((2,2)),
    Conv((5,5), 16=>8, pad=SamePad(), relu),
    MaxPool((2,2)),
    Flux.flatten,
    Dense(512, 256),
    Dense(256, 10),
    softmax)
Task 2
Make modifications to the network architecture above to (a) insert a new pair of convolutional and pooling layers between the existing 1st and 2nd ones, using 16 filters for the new kernel; (b) insert a new Dense layer just before the final one that goes from a width of 256 down to 128. Modify the final Dense layer appropriately.
Do these modifications separately and in each case measure the training time and classification accuracy. Note that each training run may take up to 30 minutes, depending on your machine.
Comment on and explain what differences, if any, there are between the baseline model and these two modifications.
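Not the required answer, but as a sanity check on the shapes involved the two modified Chains might look something like this (model_a and model_b are illustrative names):

    # (a) extra conv/pool pair: the image is pooled 32 -> 16 -> 8 -> 4, so flattening
    #     gives 4*4*8 = 128 features and the first Dense layer changes accordingly
    model_a = Chain(
        Conv((5,5), 3=>16, pad=SamePad(), relu),
        MaxPool((2,2)),
        Conv((5,5), 16=>16, pad=SamePad(), relu),
        MaxPool((2,2)),
        Conv((5,5), 16=>8, pad=SamePad(), relu),
        MaxPool((2,2)),
        Flux.flatten,
        Dense(128, 256),
        Dense(256, 10),
        softmax)

    # (b) extra Dense layer 256 -> 128, with the final layer adjusted to 128 -> 10
    model_b = Chain(
        Conv((5,5), 3=>16, pad=SamePad(), relu),
        MaxPool((2,2)),
        Conv((5,5), 16=>8, pad=SamePad(), relu),
        MaxPool((2,2)),
        Flux.flatten,
        Dense(512, 256),
        Dense(256, 128),
        Dense(128, 10),
        softmax)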
Test network
Use this partial network to check the dimensions of the output from each layer (use # to comment out layers not of interest).
(128, 1)
with_terminal() do
    # Test the model up to the flattening step
    x = rand(Float32, 32, 32, 3, 1)   # Example input of shape 32x32x3 (one image)
    model = Chain(
        Conv((5,5), 3=>16, pad=SamePad(), relu),
        MaxPool((2,2)),
        Conv((5,5), 16=>16, pad=SamePad(), relu),
        MaxPool((2,2)),
        Conv((5,5), 16=>8, pad=SamePad(), relu),
        MaxPool((2,2)),
        Flux.flatten
    )

    output = model(x)
    println(size(output))
end
We will use a cross-entropy loss and the Momentum optimiser here. Cross-entropy is a good option when each example belongs to exactly one of several classes. Momentum smooths out the noisy gradients and helps towards smooth convergence. Gradually lowering the learning rate along with momentum helps to maintain adaptivity in our optimisation, preventing overshooting of the error minimum.
begin
    using Flux: crossentropy, Momentum
    loss(x, y) = sum(crossentropy(model(x), y))
    optimiser = Momentum(0.01)
end;
We can now write our training loop, where we will keep track of some basic accuracy numbers for our model. We can define an accuracy function for it like so:
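The cell defining accuracy is not visible in this export; a minimal sketch consistent with how it is called below (the proportion of correct predictions, using onecold to undo the one-hot encoding and mean from Statistics) would be:

    # Fraction of columns where the most probable predicted class matches the true label.
    accuracy(x, y) = mean(onecold(model(x)) .== onecold(y))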
Training
Training is where we put together the operations we defined earlier and see what our net is capable of. We will loop over the dataset for a number of epochs (set to 100 in the cell below), feeding the inputs to the neural network and optimising.
(Terminal output: the validation accuracy printed after each epoch, settling at around 0.64.)
with_terminal() do
    correct = []
    epochs = 100
    for epoch = 1:epochs
        # one pass over the training batches, updating the parameters after each batch
        for d in train
            gradients = gradient(Flux.params(model)) do
                loss(d...)
            end
            update!(optimiser, Flux.params(model), gradients)
        end
        # record validation accuracy after each epoch
        acc = accuracy(validate_x, validate_y)
        push!(correct, acc)
        println(acc)
    end
    plot(correct, ylim=(0.0, 0.75),
         legend=:none, title="Accuracy", xlabel="epoch", ylabel="proportion correct")
end
We will check this by predicting the class labels that the neural network outputs for the test set.
We need to perform the exact same preprocessing on this set as we did on our training set.
Task 3
Partition the test set similarly to the training set.
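One possible sketch (the names test_x, test_y and test are assumptions chosen to match how test is used in the cells below):

    begin
        # load the 10k test images and apply the same preprocessing as for training
        test_x, test_y = CIFAR10(split=:test)[:]
        test_labels = onehotbatch(test_y, 0:9)
        # 10 batches of 1,000 (image, one-hot label) tuples
        test = [(test_x[:, :, :, i], test_labels[:, i]) for i in partition(1:10_000, 1_000)]
    end;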
Task 4
Test a random sample of 10 test images. Display a dataframe of outputs as below. Use a slider
to display each image and its predicted class.
The dataframe below contains probabilities for the 10 classes (left column). The model's
predictions are indicated by the column names.
Tip
Here's some of the code needed to create the DataFrame:
DataFrame(round.(model(rand_test), digits=2),
          Symbol.(rand_label),
          makeunique=true)
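rand_test and rand_label are not defined in the snippet; one reading, consistent with the note above that the column names show the model's predictions and using the test_x from the sketch earlier (all names here are assumptions), is:

    begin
        # pick 10 random test images
        idx = rand(1:size(test_x, 4), 10)
        rand_test = test_x[:, :, :, idx]
        # label each column with the class the model predicts for that image
        rand_label = classes[onecold(model(rand_test))]
    end;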
This looks similar to what we would expect. At this point, it's a good idea to see how our net actually performs on the new data we have prepared.
Overall accuracy
We iterate over the entire test set to calculate the overall model accuracy.
0.643
round(mean([accuracy(test[i]...) for i in 1:10]), digits=3)
This is much better than random chance, which would be 10% (since we only have 10 classes), and not bad at all for a small handcrafted network like ours.
Let's take a look at how the net performed on all the classes individually.
begin
    class_correct = zeros(10)
    class_total = zeros(10)
    for i in 1:10
        preds = model(test[i][1])
        lab = test[i][2]
        for j = 1:1000
            pred_class = findmax(preds[:, j])[2]
            actual_class = findmax(lab[:, j])[2]
            if pred_class == actual_class
                class_correct[pred_class] += 1
            end
            class_total[actual_class] += 1
        end
    end
end
     accuracy  class
 1      0.629  "airplane"
 2      0.702  "automobile"
 3      0.378  "bird"
 4      0.447  "cat"
 5      0.778  "deer"
 6      0.526  "dog"
 7      0.693  "frog"
 8      0.671  "horse"
 9      0.797  "ship"
10      0.813  "truck"
The per-class results look reasonable overall, though certain classes perform significantly better than others.