
Convolutional Neural Networks: Objectives

The document discusses convolutional neural networks (CNNs) for image classification. It explains that CNNs are well-suited for this task as they can detect important features in images. The document then loads and preprocesses image data, reshaping it to a format suitable for CNNs. Finally, it defines a CNN model with convolutional, max pooling, dropout and dense layers to classify images of American Sign Language letters.

Uploaded by

Praveen Singh

03_asl_cnn about:srcdoc

Convolutional Neural Networks

In the previous section, we built and trained a simple model to classify ASL images. The model learned to
classify the training dataset with very high accuracy, but it did not perform nearly as well on the
validation dataset. This failure to generalize to non-training data is called overfitting
(https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html). In this
section, we will introduce a popular kind of model called a convolutional neural network
(https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53)
that is especially good at reading images and classifying them.

Objectives

Prep data specifically for a CNN


Create a more sophisticated CNN model, understanding a greater variety of model layers
Train a CNN model and observe its performance

Loading and Preparing the Data

The cell below contains the data preprocessing techniques we learned in the previous labs. Review it and
execute it before moving on:

1 of 10 12/07/2021, 04:59 pm

In [1]: import tensorflow.keras as keras


import pandas as pd

# Load in our data from CSV files


train_df = pd.read_csv("data/asl_data/sign_mnist_train.csv")
valid_df = pd.read_csv("data/asl_data/sign_mnist_valid.csv")

# Separate out our target values


y_train = train_df['label']
y_valid = valid_df['label']
del train_df['label']
del valid_df['label']

# Separate out our image vectors


x_train = train_df.values
x_valid = valid_df.values

# Turn our scalar targets into binary categories


num_classes = 24
y_train = keras.utils.to_categorical(y_train, num_classes)
y_valid = keras.utils.to_categorical(y_valid, num_classes)

# Normalize our image data


x_train = x_train / 255
x_valid = x_valid / 255
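
As a rough illustration of the last two preprocessing steps, the sketch below reproduces their effect with plain NumPy on made-up values. The helper `one_hot` and the toy arrays are hypothetical and only mirror what `keras.utils.to_categorical` and the division by 255 do for integer labels and 8-bit pixel values.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) matrix with a single 1 in each row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# Made-up labels, not taken from the dataset.
toy_labels = [0, 3, 23]
toy_targets = one_hot(toy_labels, num_classes=24)
print(toy_targets.shape)  # (3, 24)

# Made-up raw pixel intensities in the 0-255 range.
toy_pixels = np.array([[0, 128, 255]])
toy_scaled = toy_pixels / 255  # values now lie between 0 and 1
```

Each row of `toy_targets` has a 1 in the column of its class and 0 everywhere else, which is the target format that a softmax output layer with categorical cross-entropy expects.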

Reshaping Images for a CNN

As in the last exercise, each picture in our dataset is currently stored as a flat list of 784 pixel values:

In [2]: x_train.shape, x_valid.shape

Out[2]: ((27455, 784), (7172, 784))

In this format, we lose the information about which pixels are near each other. Because of this, we
can't apply convolutions that will detect features. Let's reshape our dataset so that each image is in a
28x28 pixel format. This will allow our convolutions to associate groups of neighboring pixels and detect
important features.

Note that for the first convolutional layer of our model, we need to have not only the height and width of the
image, but also the number of color channels (https://www.photoshopessentials.com/essentials/rgb/). Our
images are grayscale, so we'll just have 1 channel.

That means that we need to convert the current shape (27455, 784) to (27455, 28, 28, 1). As a
convenience, we can pass the reshape
(https://numpy.org/doc/stable/reference/generated/numpy.reshape.html#numpy.reshape) method a -1 for one
dimension, and NumPy will infer its size from the total number of values. Here that dimension is the number
of images, which stays the same, therefore:


In [3]: x_train = x_train.reshape(-1,28,28,1)


x_valid = x_valid.reshape(-1,28,28,1)

In [4]: x_train.shape

Out[4]: (27455, 28, 28, 1)

In [5]: x_valid.shape

Out[5]: (7172, 28, 28, 1)

In [6]: x_train.shape, x_valid.shape

Out[6]: ((27455, 28, 28, 1), (7172, 28, 28, 1))
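
To see what the -1 does on a smaller scale, here is a minimal sketch with a made-up array of two flattened 2x3 "images". The array `flat` is hypothetical and stands in for `x_train`.

```python
import numpy as np

# Two hypothetical flattened "images" of 2x3 pixels each (6 values per row).
flat = np.arange(12).reshape(2, 6)

# -1 asks NumPy to infer the number of images from the total size: 12 / (2 * 3 * 1) = 2.
images = flat.reshape(-1, 2, 3, 1)

print(images.shape)        # (2, 2, 3, 1)
print(images[0, :, :, 0])  # first image with its rows restored: [0 1 2] and [3 4 5]
```

The pixel values themselves are unchanged. Only the indexing changes, so that vertically adjacent pixels such as 0 and 3 are now neighbors along the height axis.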

Creating a Convolutional Model

These days, many data scientists start their projects by borrowing model properties from a similar project.
Assuming the problem is not totally unique, there's a good chance that someone has already created a model
that performs well and posted it in an online repository such as TensorFlow Hub (https://www.tensorflow.org/hub)
or the NGC Catalog (https://ngc.nvidia.com/catalog/models). Today, we'll provide a model that will work well
for this problem.

We covered many of the different kinds of layers in the lecture, and we will go over them all here with links to
their documentation. When in doubt, read the official documentation (or ask Stack Overflow
(https://stackoverflow.com/)).


In [7]: from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    Dense,
    Conv2D,
    MaxPool2D,
    Flatten,
    Dropout,
    BatchNormalization,
)

model = Sequential()
model.add(Conv2D(75, (3, 3), strides=1, padding="same", activation="relu",
                 input_shape=(28, 28, 1)))
model.add(BatchNormalization())
model.add(MaxPool2D((2, 2), strides=2, padding="same"))
model.add(Conv2D(50, (3, 3), strides=1, padding="same", activation="relu"))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(MaxPool2D((2, 2), strides=2, padding="same"))
model.add(Conv2D(25, (3, 3), strides=1, padding="same", activation="relu"))
model.add(BatchNormalization())
model.add(MaxPool2D((2, 2), strides=2, padding="same"))
model.add(Flatten())
model.add(Dense(units=512, activation="relu"))
model.add(Dropout(0.3))
model.add(Dense(units=num_classes, activation="softmax"))

Conv2D (https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D)

These are our 2D convolutional layers. Small kernels will go over the input image and detect features that are
important for classification. Earlier convolutions in the model will detect simple features such as lines. Later
convolutions will detect more complex features. Let's look at our first Conv2D layer:

model.add(Conv2D(75, (3, 3), strides=1, padding="same", ...))

75 refers to the number of filters that will be learned. (3, 3) refers to the size of those filters. Strides refers to
the step size that the filter takes as it passes over the image. Padding controls whether the output image
created by the filter matches the size of the input image: with "same", the input is padded at its borders so
that, at a stride of 1, the output has the same height and width as the input.
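
The sliding-filter idea can be sketched in a few lines of NumPy. This is a simplified, single-filter, single-channel illustration rather than how Keras implements `Conv2D`. The function `conv2d_same`, the toy image, and the hand-written kernel are all assumptions made for the example; in a real CNN the kernel weights are learned.

```python
import numpy as np

def conv2d_same(image, kernel):
    """Slide one square, odd-sized kernel over a 2D image with stride 1 and zero padding."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image, pad)  # zero padding keeps the output the same size as the input
    out = np.zeros(image.shape, dtype=float)
    for row in range(image.shape[0]):
        for col in range(image.shape[1]):
            patch = padded[row:row + k, col:col + k]
            out[row, col] = np.sum(patch * kernel)
    return out

# Hypothetical 5x5 image: dark on the left, bright on the right (a vertical edge).
image = np.zeros((5, 5))
image[:, 3:] = 1.0

# Hand-written vertical-edge kernel; a trained layer would learn such weights itself.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = conv2d_same(image, kernel)
print(feature_map.shape)  # (5, 5)
```

The middle row of `feature_map` is 0 over the flat dark region and 3 where the kernel straddles the dark-to-bright edge, which is what "detecting a simple feature such as a line" looks like numerically. The negative value in the last column comes from the zero padding at the border.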

BatchNormalization (https://www.tensorflow.org/api_docs/python/tf/keras/layers/BatchNormalization)


Like normalizing our inputs, batch normalization scales the values in the hidden layers to improve training.
Read more about it in detail here (https://blog.paperspace.com/busting-the-myths-about-batch-normalization/).
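
A minimal sketch of the training-time computation follows, under simplifying assumptions: the input is a 2D batch of made-up activations, and `batch_norm_train`, `gamma`, `beta`, and `eps` are names chosen for this example. The Keras layer additionally tracks moving averages of the mean and variance for use at inference time, which this sketch leaves out.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-3):
    """Normalize each feature over the batch axis, then rescale and shift it."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # roughly zero mean, unit variance per feature
    return gamma * x_hat + beta              # gamma and beta are learned during training

rng = np.random.default_rng(0)
# Made-up activations: a batch of 64 samples with 4 features, far from zero-centered.
activations = rng.normal(loc=5.0, scale=3.0, size=(64, 4))
normalized = batch_norm_train(activations, gamma=np.ones(4), beta=np.zeros(4))

print(normalized.mean(axis=0))  # each entry close to 0
print(normalized.std(axis=0))   # each entry close to 1
```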

MaxPool2D (https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D)

Max pooling takes an image and essentially shrinks it to a lower resolution. It does this to help the model be
robust to translation (objects moving side to side), and also makes our model faster.
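
The shrinking step can be sketched as follows for a single 2D image. The helper `max_pool_same` is a hypothetical NumPy stand-in for a 2x2 pool with stride 2 and "same" padding, written for clarity rather than speed.

```python
import math
import numpy as np

def max_pool_same(image, size=2, stride=2):
    """Keep the largest value in each window; pad the edges so no pixels are dropped."""
    out_h = math.ceil(image.shape[0] / stride)
    out_w = math.ceil(image.shape[1] / stride)
    pad_h = max((out_h - 1) * stride + size - image.shape[0], 0)
    pad_w = max((out_w - 1) * stride + size - image.shape[1], 0)
    # Pad with -inf so that padded cells can never be chosen as the maximum.
    padded = np.pad(
        image.astype(float),
        ((pad_h // 2, pad_h - pad_h // 2), (pad_w // 2, pad_w - pad_w // 2)),
        constant_values=-np.inf,
    )
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            window = padded[r * stride:r * stride + size, c * stride:c * stride + size]
            out[r, c] = window.max()
    return out

toy_image = np.arange(49).reshape(7, 7)  # made-up 7x7 image
pooled = max_pool_same(toy_image)
print(pooled.shape)  # (4, 4)

# Height/width after each of the model's three pooling layers, starting from 28.
sizes = []
side = 28
for _ in range(3):
    side = math.ceil(side / 2)
    sizes.append(side)
print(sizes)  # [14, 7, 4]
```

With "same" padding the output side is the input side divided by the stride, rounded up, which is why a 7x7 feature map becomes 4x4 rather than 3x3.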

Dropout (https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout)

Dropout is a technique for preventing overfitting. Dropout randomly selects a subset of neurons and turns
them off, so that they do not participate in forward or backward propagation in that particular pass. This helps
to make sure that the network is robust and redundant, and does not rely on any one area to come up with
answers.
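
As an illustration, the sketch below applies dropout to a made-up block of activations during a training pass. The function `dropout_train` is a hypothetical NumPy version; like the Keras layer, it scales the surviving values by 1 / (1 - rate) so that the expected sum stays the same, and at inference time the layer passes its input through unchanged.

```python
import numpy as np

def dropout_train(x, rate, rng):
    """Zero out roughly `rate` of the values and rescale the rest."""
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones((1000, 512))  # made-up activations, all equal to 1
dropped = dropout_train(activations, rate=0.3, rng=rng)

print((dropped == 0).mean())  # close to 0.3: about 30% of the units were turned off
print(dropped.mean())         # close to 1.0: the rescaling preserves the average
```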

Flatten (https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten)

Flatten takes the output of one layer which is multidimensional, and flattens it into a one-dimensional array.
The output is called a feature vector and will be connected to the final classification layer.

Dense (https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense)

We have seen dense layers before in our earlier models. Our first dense layer (512 units) takes the feature
vector as input and learns which features will contribute to a particular classification. The second dense layer
(24 units) is the final classification layer that outputs our prediction.
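
The path from the last pooled feature maps to a prediction can be sketched with NumPy. Everything here is illustrative: the feature maps and weights are random stand-ins rather than trained values, and `dense`, `relu`, and `softmax` are hypothetical helper names. Only the shapes (4x4x25, 512 units, 24 classes) come from the model above.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, weights, bias):
    """A dense layer is a matrix multiplication plus a bias vector."""
    return x @ weights + bias

def relu(x):
    return np.maximum(x, 0)

def softmax(z):
    exp = np.exp(z - z.max())  # subtract the max for numerical stability
    return exp / exp.sum()

# Stand-in for the output of the last pooling layer for one image.
feature_maps = rng.random((4, 4, 25))

# Flatten: 4 * 4 * 25 = 400 values in a one-dimensional feature vector.
feature_vector = feature_maps.reshape(-1)

# Random, untrained weights with the same shapes as the two dense layers.
w1, b1 = rng.normal(scale=0.05, size=(400, 512)), np.zeros(512)
w2, b2 = rng.normal(scale=0.05, size=(512, 24)), np.zeros(24)

hidden = relu(dense(feature_vector, w1, b1))
probabilities = softmax(dense(hidden, w2, b2))
predicted_class = int(np.argmax(probabilities))

print(feature_vector.shape)  # (400,)
print(probabilities.shape)   # (24,)
```

The softmax output sums to 1, so each of the 24 entries can be read as the model's confidence in one class, and the prediction is the index of the largest entry.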

Summarizing the Model


This may feel like a lot of information, but don't worry. It's not critical to understand everything right now in
order to train convolutional models effectively. Most importantly, we know that they can help with extracting
useful information from images, and can be used in classification tasks.

Here, we summarize the model we just created. Notice how it has fewer trainable parameters than the model
in the previous notebook:

In [8]: model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 28, 28, 75)        750
_________________________________________________________________
batch_normalization (BatchNo (None, 28, 28, 75)        300
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 75)        0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 14, 14, 50)        33800
_________________________________________________________________
dropout (Dropout)            (None, 14, 14, 50)        0
_________________________________________________________________
batch_normalization_1 (Batch (None, 14, 14, 50)        200
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 50)          0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 7, 7, 25)          11275
_________________________________________________________________
batch_normalization_2 (Batch (None, 7, 7, 25)          100
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 25)          0
_________________________________________________________________
flatten (Flatten)            (None, 400)               0
_________________________________________________________________
dense (Dense)                (None, 512)               205312
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_1 (Dense)              (None, 24)                12312
=================================================================
Total params: 264,049
Trainable params: 263,749
Non-trainable params: 300
_________________________________________________________________
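
The parameter counts in the summary can be checked by hand. The sketch below recomputes them from the layer shapes; the helper function names are made up for this example. It relies on each batch normalization layer holding four values per channel: a learned scale and offset, plus a moving mean and moving variance that are updated during training but not by gradient descent.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """One weight per kernel cell per input channel per filter, plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

def dense_params(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return inputs * units + units

def batch_norm_params(channels):
    """Scale and offset (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * channels

layer_params = {
    "conv2d": conv2d_params(3, 3, 1, 75),
    "batch_normalization": batch_norm_params(75),
    "conv2d_1": conv2d_params(3, 3, 75, 50),
    "batch_normalization_1": batch_norm_params(50),
    "conv2d_2": conv2d_params(3, 3, 50, 25),
    "batch_normalization_2": batch_norm_params(25),
    "dense": dense_params(4 * 4 * 25, 512),
    "dense_1": dense_params(512, 24),
}

total = sum(layer_params.values())
non_trainable = 2 * (75 + 50 + 25)  # moving means and variances
trainable = total - non_trainable

print(total, trainable, non_trainable)  # 264049 263749 300
```

Note that the first dense layer alone accounts for 205,312 of the parameters, while the three convolutional layers together need fewer than 46,000, because each filter's weights are shared across every position in the image.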

Compiling the Model


We'll compile the model just like before:

In [9]: model.compile(loss="categorical_crossentropy", metrics=["accuracy"])

Training the Model

Despite the very different model architecture, the training looks exactly the same. Run the cell below to train
for 20 epochs and let's see if the accuracy improves:


In [10]: model.fit(x_train, y_train, epochs=20, verbose=1, validation_data=(x_valid, y_valid))


Epoch 1/20
858/858 [==============================] - 5s 6ms/step - loss: 0.3007
- accuracy: 0.9084 - val_loss: 0.8315 - val_accuracy: 0.7683
Epoch 2/20
858/858 [==============================] - 4s 5ms/step - loss: 0.0201
- accuracy: 0.9943 - val_loss: 0.6473 - val_accuracy: 0.8600
Epoch 3/20
858/858 [==============================] - 4s 5ms/step - loss: 0.0111
- accuracy: 0.9968 - val_loss: 0.2570 - val_accuracy: 0.9525
Epoch 4/20
858/858 [==============================] - 4s 5ms/step - loss: 0.0082
- accuracy: 0.9981 - val_loss: 0.2768 - val_accuracy: 0.9476
Epoch 5/20
858/858 [==============================] - 4s 5ms/step - loss: 0.0077
- accuracy: 0.9979 - val_loss: 1.3836 - val_accuracy: 0.7810
Epoch 6/20
858/858 [==============================] - 4s 5ms/step - loss: 0.0070
- accuracy: 0.9979 - val_loss: 0.3430 - val_accuracy: 0.9511
Epoch 7/20
858/858 [==============================] - 4s 5ms/step - loss: 0.0034
- accuracy: 0.9989 - val_loss: 0.4399 - val_accuracy: 0.9449
Epoch 8/20
858/858 [==============================] - 4s 5ms/step - loss: 0.0034
- accuracy: 0.9989 - val_loss: 0.3869 - val_accuracy: 0.9564
Epoch 9/20
858/858 [==============================] - 4s 5ms/step - loss: 0.0034
- accuracy: 0.9989 - val_loss: 0.3922 - val_accuracy: 0.9527
Epoch 10/20
858/858 [==============================] - 4s 5ms/step - loss: 0.0033
- accuracy: 0.9992 - val_loss: 0.3484 - val_accuracy: 0.9543
Epoch 11/20
858/858 [==============================] - 4s 5ms/step - loss: 0.0041
- accuracy: 0.9991 - val_loss: 1.7482 - val_accuracy: 0.7867
Epoch 12/20
858/858 [==============================] - 4s 5ms/step - loss: 0.0034
- accuracy: 0.9991 - val_loss: 0.2865 - val_accuracy: 0.9647
Epoch 13/20
858/858 [==============================] - 4s 5ms/step - loss: 0.0020
- accuracy: 0.9997 - val_loss: 0.4210 - val_accuracy: 0.9371
Epoch 14/20
858/858 [==============================] - 4s 5ms/step - loss: 0.0042
- accuracy: 0.9990 - val_loss: 0.3172 - val_accuracy: 0.9586
Epoch 15/20
858/858 [==============================] - 4s 5ms/step - loss: 0.0013
- accuracy: 0.9995 - val_loss: 0.3165 - val_accuracy: 0.9649
Epoch 16/20
858/858 [==============================] - 4s 5ms/step - loss: 0.0015
- accuracy: 0.9996 - val_loss: 0.4247 - val_accuracy: 0.9572
Epoch 17/20
858/858 [==============================] - 4s 5ms/step - loss: 0.0016
- accuracy: 0.9996 - val_loss: 0.8863 - val_accuracy: 0.8879
Epoch 18/20
858/858 [==============================] - 4s 5ms/step - loss: 0.0018
- accuracy: 0.9997 - val_loss: 0.4617 - val_accuracy: 0.9407
Epoch 19/20
858/858 [==============================] - 4s 5ms/step - loss: 0.0016


- accuracy: 0.9995 - val_loss: 0.4430 - val_accuracy: 0.9586


Epoch 20/20
858/858 [==============================] - 4s 5ms/step - loss: 0.0027
- accuracy: 0.9992 - val_loss: 0.3215 - val_accuracy: 0.9636
Out[10]: <tensorflow.python.keras.callbacks.History at 0x7f92c0663978>
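
The 858 shown on each progress bar is the number of batches per epoch. Since the call to fit does not set a batch size, it is reasonable to assume the Keras default of 32 samples per batch was used, which a quick calculation confirms:

```python
import math

num_training_images = 27455  # from x_train.shape
batch_size = 32              # Keras default for model.fit, assumed here

steps_per_epoch = math.ceil(num_training_images / batch_size)
print(steps_per_epoch)  # 858
```

The final batch of each epoch is smaller than the others, holding the 31 images left over after 857 full batches.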

Discussion of Results

It looks like this model is significantly improved! The training accuracy is very high, and the validation
accuracy has improved as well. This is a great result, as all we had to do was swap in a new model.

You may have noticed the validation accuracy jumping around. This is an indication that our model is still not
generalizing perfectly. Fortunately, there's more that we can do. Let's talk about it in the next lecture.
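
To put numbers on that jumpiness, the sketch below copies the val_accuracy values from the training log above into a list and looks for the best epoch and the largest single-epoch drop. The variable names are made up for this example; in a live notebook the same values could be read from the History object returned by fit.

```python
# val_accuracy for epochs 1-20, copied from the training log above.
val_accuracy = [
    0.7683, 0.8600, 0.9525, 0.9476, 0.7810,
    0.9511, 0.9449, 0.9564, 0.9527, 0.9543,
    0.7867, 0.9647, 0.9371, 0.9586, 0.9649,
    0.9572, 0.8879, 0.9407, 0.9586, 0.9636,
]

# Epoch numbers start at 1, list indices at 0.
best_epoch = max(range(len(val_accuracy)), key=val_accuracy.__getitem__) + 1

# Change in validation accuracy from each epoch to the next.
changes = [later - earlier for earlier, later in zip(val_accuracy, val_accuracy[1:])]
worst_drop = min(changes)
worst_drop_epoch = changes.index(worst_drop) + 2  # changes[0] is the step into epoch 2

print(best_epoch, val_accuracy[best_epoch - 1])  # 15 0.9649
print(worst_drop_epoch, round(worst_drop, 4))    # 11 -0.1676
```

Validation accuracy peaks at epoch 15, but it also falls by more than 16 percentage points going into epochs 5 and 11 while training accuracy stays above 99%, which is the instability the paragraph above describes.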

Summary

In this section, we used several new kinds of layers to implement a CNN, which performed better than the
simpler model used in the last section. Hopefully the overall process of creating and training a model
with prepared data is starting to become even more familiar.

Clear the Memory


Before moving on, please execute the following cell to clear up the GPU memory. This is required to move on
to the next notebook.

In [11]: import IPython


app = IPython.Application.instance()
app.kernel.do_shutdown(True)

Out[11]: {'status': 'ok', 'restart': True}

Next

In the last several sections you have focused on the creation and training of models. In order to further
improve performance, you will now turn your attention to data augmentation, a collection of techniques that
will allow your models to train on more and better data than what you might have originally at your disposal.
