6 CNN
Deep Learning
Samatrix Consulting Pvt Ltd
Convolutional Neural Network
• In this chapter, we will learn about a deep learning technique called
convolution.
• Convolution has become the standard method of classifying,
manipulating, and generating images.
• Convolution is straightforward to implement in modern deep learning frameworks.
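For example, a single convolutional layer can be created in Keras in one line. This is a minimal sketch; the 28×28×1 grayscale input shape is an illustrative assumption, not part of the chapter's examples.

import tensorflow as tf

# One 2-D convolution layer: 32 filters, each a 3x3 kernel,
# scanning a 28x28 single-channel (grayscale) image
conv_layer = tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu',
                                    input_shape=(28, 28, 1))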
Convolutional Neural Network
• In this chapter, we will focus on the key ideas behind convolution and
the related techniques that can be used to make convolution work on
images.
• Convolutional neural networks have been used to recognize the
people in a photograph, detect and classify different types of skin
cancers, repair image damage such as dust, scratches, and blur, and
classify people’s age and gender from their photos.
• Convolutional neural networks are also used in natural language
processing.
Convolutional Neural Network
• Convolutional neural networks are specialized for processing data
that has a grid-like topology.
• Examples include time-series data, which is a 1-D grid taking samples
at regular time intervals, and image data, which is a 2-D grid of pixels.
• The network uses a mathematical operation called convolution; hence the
name “convolutional neural network”.
What is Convolution
In the following equation, the convolution operation is denoted by an
asterisk:
s(t) = (x ∗ w)(t)
The first argument, the function x, is often referred to as the input. The
second argument, the function w, is referred to as the kernel. The output is
referred to as the feature map.
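For discrete data such as time series or images, this is the standard discrete convolution, a weighted sum of the input with the flipped kernel:

s(t) = (x ∗ w)(t) = Σ_a x(a) w(t − a)

where the sum runs over all positions a at which the input and kernel overlap.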
Reduce Network’s size
A smaller network with lower capacity can be defined as follows.
import tensorflow as tf

# A small model: two hidden layers of 4 units each,
# with a single sigmoid output unit for binary classification
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(4, activation='relu', input_shape=(1000,)),
    tf.keras.layers.Dense(4, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
Reduce Network’s size
• The comparison of the validation loss for the original network and a smaller
network is as follows.
• We can see that the smaller network starts overfitting later than the reference
one (after 6 epochs rather than 4).
• The performance of the smaller network degrades much more slowly after it
starts overfitting.
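A minimal sketch of how such a comparison can be produced. The dataset is not shown in this chapter, so random stand-in data of shape (samples, 1000) is used here, and a 16-unit reference model is assumed:

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Hypothetical stand-in data; replace with the real dataset
x = np.random.rand(2000, 1000).astype("float32")
y = np.random.randint(0, 2, size=(2000, 1)).astype("float32")

def build(units):
    # Two hidden layers of the given width, sigmoid output for binary classification
    return tf.keras.models.Sequential([
        tf.keras.layers.Dense(units, activation='relu', input_shape=(1000,)),
        tf.keras.layers.Dense(units, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])

histories = {}
for name, units in [("reference (16 units)", 16), ("smaller (4 units)", 4)]:
    model = build(units)
    model.compile(optimizer='rmsprop', loss='binary_crossentropy')
    histories[name] = model.fit(x, y, epochs=20, batch_size=512,
                                validation_split=0.4, verbose=0)

# Plot the validation-loss curves of the two networks side by side
for name, history in histories.items():
    plt.plot(history.history['val_loss'], label=name)
plt.xlabel('Epoch')
plt.ylabel('Validation loss')
plt.legend()
plt.show()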
Reduce Network’s size
We can try a different network that has a bigger capacity.
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(512, activation='relu', input_shape=(1000,)),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')  # single sigmoid output for binary classification
])
Reduce Network’s size
• The comparison of the validation loss for the bigger network with the original
network is as follows.
• The bigger network starts overfitting after the first epoch. It overfits much more
severely.
Reduce Network’s size
• We can also compare the training losses of both networks.
• The training loss of the bigger network approaches zero very quickly.
• The more capacity a network has, the more quickly it can fit the training data
(resulting in a low training loss), but the more vulnerable it is to overfitting
(resulting in a large gap between training and validation loss).
Adding Weight Regularization
• Given some training data and network architecture, simpler models are less likely
to overfit than complex ones.
• We can force the weights in the network to take only small values.
• This makes the distribution of weight values more regular.
• This is called “weight regularization”, and it is done by adding to the
network’s loss a cost associated with having large weights.
Adding Weight Regularization
• The cost comes in two flavors:
• L1 regularization: The added cost is proportional to the absolute value of the weight
coefficients. It is also called the “L1 norm”.
• L2 regularization: The added cost is proportional to the square of the value of the weight
coefficients. It is also called the “L2 norm”.
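In symbols, with λ the regularization strength (0.001 in the code below) and w_i the weight coefficients:

loss = loss₀ + λ Σ_i |w_i|   (L1 regularization)
loss = loss₀ + λ Σ_i w_i²    (L2 regularization)

where loss₀ is the unregularized loss.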
Adding Weight Regularization
We can add the L2 weight regularization as follows
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(16, kernel_regularizer=tf.keras.regularizers.l2(0.001),
                          activation='relu', input_shape=(1000,)),
    tf.keras.layers.Dense(16, kernel_regularizer=tf.keras.regularizers.l2(0.001),
                          activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
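Here l2(0.001) means that every coefficient in the layer’s weight matrix adds 0.001 × weight_coefficient_value² to the network’s total loss. Because this penalty is only added at training time, the loss of the network will be higher at training time than at test time.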
Adding Weight Regularization
• The impact of L2 regularization is as follows.
• We can see that even though both models have the same number of
parameters, the model with L2 regularization is more resistant to
overfitting than the reference model.
Adding Weight Regularization
As an alternative to L2 regularization, you can use one of the following
Keras weight regularizers.
# L1 regularization
tf.keras.regularizers.l1(0.001)
# Simultaneous L1 and L2 regularization
tf.keras.regularizers.l1_l2(l1=0.001, l2=0.001)
Dropout
• Dropout is one of the most effective and most commonly used regularization
techniques for neural networks.
• Applied to a layer, dropout randomly sets to zero (“drops out”) a fraction of the
layer’s output features during training; the dropout rate is the fraction of the
features that are zeroed out.
• In Keras, dropout is introduced through a Dropout layer, which is applied to the
output of the layer just before it.
tf.keras.layers.Dropout(rate=0.2)
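To make the mechanics concrete, here is a toy numpy illustration of dropout (a sketch of the idea, not the Keras internals):

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical activations of some layer: 2 samples, 5 features
layer_output = rng.random((2, 5)).astype("float32")

rate = 0.2
# At training time, zero out a random ~20% of the output features...
mask = (rng.random(layer_output.shape) >= rate).astype("float32")
# ...and rescale the survivors by 1/(1 - rate) ("inverted dropout"),
# so the expected sum of the activations is unchanged. Keras's Dropout
# layer applies the same rescaling; at test time nothing is dropped.
dropped = layer_output * mask / (1.0 - rate)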
Let’s add two Dropout layers to our model and see how well they reduce overfitting.
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(1000,)),
    tf.keras.layers.Dropout(rate=0.2),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dropout(rate=0.2),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
Dropout
• Let’s plot the results.
• We can see a clear improvement over the reference network.
Summary
• Therefore, the most common ways to prevent overfitting in neural
networks are:
• Getting more training data.
• Reducing the capacity of the network.
• Adding weight regularization.
• Adding dropout.
Universal Workflow – Deep Learning
1. Define the problem at hand and assemble a dataset
a. What is the input data? What do you want to predict?
b. What type of problem are you facing – binary classification, multi-class
classification, regression, or something else?
2. Pick a measure of success
a. How do you define success – accuracy? Precision and recall? Customer retention
rate?
b. The metric of success will help you choose the loss function
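For example, for a balanced binary-classification problem, accuracy is a natural metric of success, and the matching training loss would be binary cross-entropy.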
Universal Workflow – Deep Learning
3. Prepare your data – the data should be formatted before it can be fed
into the machine learning model
a. The data should be formatted as tensors
b. The values taken by the tensors should be small, for example in the [-1, 1]
or [0, 1] range
c. For heterogeneous data, the data should be normalized (see the sketch after this list)
d. If the dataset is small, you may consider feature engineering
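A minimal sketch of per-feature normalization (the shapes here are illustrative assumptions):

import numpy as np

# Hypothetical heterogeneous features: 100 samples, 8 features on different scales
x_train = (np.random.rand(100, 8) * 50.0).astype("float32")

# Normalize each feature to zero mean and unit variance,
# using statistics computed on the training data only
mean = x_train.mean(axis=0)
std = x_train.std(axis=0)
x_train = (x_train - mean) / std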
4. Develop the model – three key choices that you need to make
a. Choice of the last-layer activation
b. Choice of the loss function. This should match the problem that you are trying to
solve
c. Choice of an optimizer. Which optimizer? What learning rate? Can we go with
Keras’s default optimizer, rmsprop, and its default learning rate?
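Putting the three choices together for a binary-classification problem (a sketch, reusing the model shape from earlier slides):

import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(1000,)),
    tf.keras.layers.Dense(1, activation='sigmoid')  # last-layer activation for binary classification
])

# Matching loss, plus Keras's default rmsprop optimizer with its default learning rate
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])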
Universal Workflow – Deep Learning
You can also pick a last-layer activation and a loss function from the
following table.

Problem Type                             | Last-layer activation | Loss function
Binary classification                    | sigmoid               | binary_crossentropy
Multi-class, single-label classification | softmax               | categorical_crossentropy
Regression (to arbitrary values)         | none                  | mse