Deep Learning Lab Manual
Software requirements
● imageio: 2.5.0
● keras: 2.2.4
● keras-lr-finder: 0.1
● keras-vis: 0.4.1
● matplotlib: 2.0.2
● mlxtend: 0.15.0.0
● numpy: 1.16.4
● pandas: 0.24.1
● Pillow: 6.0.0
● scikit-learn: 0.21.2
● scipy: 1.2.1
● tensorboard: 1.12.2
● tensorflow: 1.12.0
● vis: 0.0.5
Over the past several years, deep learning has left its fingerprints globally: it is used not only by Fortune 500 companies but also by mid-size and small firms in technology, healthcare, marketing, finance and many other major sectors. This trend will, without a doubt, continue to grow by leaps and bounds.
Let us assume that we are trying to teach a child how to recognize cats for the very first time
in life. When the child is trying to learn to identify a cat, we never know exactly how the
child is relating the word ‘cat’ to that specific mammal. Having said that, let us still try to
reflect upon the various stages of learning of the child.
Stage 1: The child may assume that all mammals with fur are cats. So, initially the child tries to use a few small features to arrive at the result. However, this could lead to the child incorrectly identifying a dog with fur as a cat. This tells us that the child still needs another stage of learning.
Similarly, a deep learning neural network starts with small features. With limited knowledge
retaining capability (i.e. number of nodes and layers), it too tends to misidentify the target
output initially.
Stage 2: To improve over the previous course of learning, the child starts combining a few small features, such as "a mammal with fur and long whiskers around its nose". This way the child's capability keeps increasing as the child learns from more experience, using more memory to piece the features together and remember them.
Similarly, a deep learning network combines small features to improve its performance, although achieving higher performance still depends on ample data and computing resources.
Stage 3: With enough experience, a child connects various small features together to finally
distinguish a cat from other mammals.
Likewise, a deep learning network built from scratch makes many mistakes due to limited knowledge (small features), but it gradually improves with more data (taking less time if provided with high-end computational resources), connecting those small features to build an intuition and arrive at an output.
Course name:
Build a Convolution Neural Network for Image Recognition.
A Convolutional Neural Network is similar to a multi-layer perceptron: it is made up of neurons with learnable parameters, and the loss function is computed in the last layer. However, the CNN architecture makes an explicit assumption that the inputs are images, although in recent times CNNs have also been applied to text, speech and time-series forecasting.
Let's understand the performance difference between the two using a small example.
Consider an image with a dimension of 16x16x3.
Implementation using Multi-Layer Perceptron
For a classical neural network, each fully-connected neuron in the first hidden layer would carry 768 weights (16*16*3). With a larger image (say 300x300x3, i.e. 270,000 weights per neuron) and more neurons, the structure quickly accumulates a vast number of parameters and leads to overfitting. Also, MLPs do not preserve the spatial structure within the images.
Implementation using Convolutional Neural Network
A ConvNet assumes its input to be an image and hence arranges its neurons in a 3D volume: width, height, and depth.
So, an image of shape 16x16x3 forms a volume of the same shape, i.e. 16x16x3, with each neuron connecting only to a small slice of the preceding layer's neurons. The final output has a dimension of 1x1xh, where h represents the number of target labels.
Let us learn what the components of a CNN are and how it works.
Input -> Convolutional -> Pooling -> Output
The Convolutional and Pooling layers can be repeated as many times as needed.
Input layer
The input layer works as a storage unit that holds the raw image data. For efficient use of memory, the height and width are preferably chosen as multiples such as 16, 32, 64, 224, or 256.
Convolutional Layer
Once the image is loaded into the input layer, each neuron in the succeeding hidden layers connects back only to a local region of its preceding layer, known as the receptive field. A convolution operation then follows; convolution is an operation that combines two functions and describes how one function modifies the shape of the other.
Since images are represented as multi-dimensional matrices in the system, consider the picture below to see how convolution takes place on one channel (R, G, or B) of an image:
Here, one slice of the image is chosen at a time. The filter slides over the input volume, convolving with one local region at a time. The number of pixels to jump for the next convolution is governed by the stride: a stride of 1 makes the filter slide over the input volume by 1 pixel, a value of 2 makes it slide by 2 pixels, and so on. The larger the stride, the smaller the spatial extent of the output volume. In this illustration, the stride is taken as 1.
Sometimes, the filter size along with the stride value does not fit the shape of the image; in such cases, extra zero-padding is used. Zero-padding across the input volume border provides two benefits. First, it helps retain the border information of the image: with each convolution the size of the image keeps reducing, and without padding the border information may simply be lost. Second, it helps keep the spatial shape of the input and output volumes equal: since filter convolution may change the spatial extent of the output volume, padding helps avoid such cases.
During the process, you can choose the number of filters, where each one locates distinct features like edges, blobs, etc. Distinct filters are indeed necessary to extract distinct features: for instance, with different filters we can obtain a sharpened image, a blurred image, image edges, etc.
To wrap up this idea in one single example, consider the given image where we choose a stride value of two, a zero-padding of one, and two filters. To arrive at the output -3 (colored in red), you need to take the sum of the pointwise multiplication of the similarly colored matrices.
For instance,
Output volume (red cell) = Input volume (green region) * Filter 1 (green) + Input volume (orange region) * Filter 1 (orange) + Input volume (pink region) * Filter 1 (pink) + Bias 1
Similarly, you can proceed to find the values of the other cells of the output matrices. Note that the first output volume matrix is formed using Filter 1 and Bias 1, whereas the second output volume matrix is formed using Filter 2 and Bias 2.
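To make the arithmetic concrete, here is a minimal NumPy sketch of the same sliding-window computation. The 4x4 input, 3x3 filter, bias, stride of 2 and zero-padding of 1 below are hypothetical values for illustration, not the exact matrices from the figure:
import numpy as np

# Hypothetical single-channel input, filter, and bias (illustrative values only)
inp = np.array([[1, 0, 2, 1],
                [0, 1, 3, 0],
                [2, 2, 0, 1],
                [1, 0, 1, 2]])
filt = np.array([[ 1, 0, -1],
                 [ 0, 1,  0],
                 [-1, 0,  1]])
bias = 1
stride, pad = 2, 1

# Zero-pad the input volume border
padded = np.pad(inp, pad, mode='constant')

# Output spatial size: (W - F + 2P) / S + 1
out_size = (inp.shape[0] - filt.shape[0] + 2 * pad) // stride + 1
out = np.zeros((out_size, out_size))

for i in range(out_size):
    for j in range(out_size):
        region = padded[i * stride:i * stride + 3, j * stride:j * stride + 3]
        # Sum of the pointwise multiplication plus the bias
        out[i, j] = np.sum(region * filt) + bias

print(out)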
Here's a GIF illustration to understand the process:
Pooling layer:-
This layer performs a downsampling operation along the two spatial dimensions (width and height), reducing the number of required parameters and thus the computation, as well as the chance of overfitting. It uses the MAX function and requires two hyperparameters: the receptive field and the stride. Padding is generally not used with the pooling layer. Also, it does not introduce any new parameters, since it applies a fixed function.
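As a quick illustration, the sketch below applies 2x2 max pooling with a stride of 2 to a hypothetical 4x4 activation map, halving each spatial dimension without adding any parameters:
import numpy as np

act = np.array([[1, 3, 2, 4],
                [5, 6, 1, 2],
                [7, 2, 9, 1],
                [3, 4, 6, 8]])

pooled = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        # MAX over each non-overlapping 2x2 receptive field
        pooled[i, j] = act[2*i:2*i+2, 2*j:2*j+2].max()

print(pooled)  # [[6. 4.], [7. 9.]]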
An alternative to the pooling layer: Jost Tobias Springenberg et al., in their paper "Striving for Simplicity: The All Convolutional Net", suggest that occasionally using a larger stride and using only CONV layers can completely remove the need for a pooling layer in the architecture.
Fully-Connected layer
Neurons in the Fully-Connected layer are connected to all the activations in previous layers
as in ordinary neural networks. It uses the softmax activation function for classifying input
images into various classes.
This page lists the conventions required in the process:
If you input a volume of size W1 * H1 * D1, it requires four hyperparameters:
1. Number of filters, K
2. Receptive field, F
3. The stride, S
4. The amount of zero-padding, P
Which produces a volume of size W2 * H2 * D2, where:
● W2 = (W1 - F + 2P)/S + 1
● H2 = (H1 - F + 2P)/S + 1
● D2=K
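As a quick check of these formulas, a small helper can compute the output volume; the input size, filter count, receptive field, stride and padding values below are hypothetical:
def conv_output_shape(W1, H1, D1, K, F, S, P):
    """Output volume of a convolutional layer using the conventions above."""
    W2 = (W1 - F + 2 * P) // S + 1
    H2 = (H1 - F + 2 * P) // S + 1
    D2 = K
    return W2, H2, D2

# Hypothetical example: 32x32x3 input, 10 filters of size 3x3, stride 1, padding 1
print(conv_output_shape(32, 32, 3, K=10, F=3, S=1, P=1))  # (32, 32, 10)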
Exercise 2:- Module name: Understanding and Using ANN: Identifying age group of an actor
Exercise: Design Artificial Neural Networks for Identifying and Classifying an actor using a Kaggle Dataset.
For our problem statement, we will resize all the images to 32 x 32 shape. All the images
have red, blue and green color components, therefore, the final shape becomes 32 x 32 x 3
giving us a total of 3072 nodes for the input layer.
Next, we will choose one hidden layer to start with, along with 500 nodes, making a total of 1,536,500 parameters (3072 x 500 weights + 500 biases) between the input and the hidden layer. We will use the ReLU activation function in this layer.
Next, we have the output layer: with only three classes, it has three nodes, making a total of 1,503 parameters (500 x 3 weights + 3 biases) between the hidden and output layer. In this layer, we will use the Softmax activation function.
While building the model, we will split the training data into training and validation data set
and will find the loss and accuracy for both the data sets.
Now that we have defined the structure of our neural network, let us start implementing
the code in Keras.
Before we proceed, let us take a look at the current challenges of the given data set:
● Variations in shape: for example, one image has a shape of (66, 46) whereas another has a shape of (102, 87); there is no consistency
● Brightness and contrast: these vary across images and can introduce discrepancies in a few cases
In this resource, we are going to handle the above challenges by performing image
preprocessing, as well as implement a basic neural network.
Let us first import all the necessary libraries and modules which will be used throughout the
code:
# Importing necessary libraries
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.preprocessing import LabelEncoder
from tensorflow.python.keras import utils
from keras.models import Sequential
from keras.layers import Dense, Flatten, InputLayer
import keras
import imageio # To read images
from PIL import Image # For image resizing
Next, let us read the train and test data sets into separate pandas DataFrames as shown
below:
# Reading the data
train = pd.read_csv('age_detection_train/train.csv')
test = pd.read_csv('age_detection_test/test.csv')
Once both the data sets are read successfully, we can display any random movie character along with their age group to verify the ID against the Class value, as shown below:
np.random.seed(10)
idx = np.random.choice(train.index)
img_name = train.ID[idx]
img = imageio.imread(os.path.join('age_detection_train/Train', img_name))
print('Age group:', train.Class[idx])
plt.imshow(img)
plt.axis('off')
plt.show()
Next, we can start transforming the data sets to a one-dimensional array after reshaping all
the images to a size of 32 x 32 x 3.
Let us reshape and transform the training data first, as shown below:
temp = []
for img_name in train.ID:
    img_path = os.path.join('age_detection_train/Train', img_name)
    img = imageio.imread(img_path)
    img = np.array(Image.fromarray(img).resize((32, 32))).astype('float32')
    temp.append(img)
train_x = np.stack(temp)
Next, let us reshape and transform the testing data, as shown below:
temp = []
for img_name in test.ID:
    img_path = os.path.join('age_detection_test/Test', img_name)
    img = imageio.imread(img_path)
    img = np.array(Image.fromarray(img).resize((32, 32))).astype('float32')
    temp.append(img)
test_x = np.stack(temp)
Next, let us normalize the values in both the data sets to feed it to the network. To
normalize, we can divide each value by 255 as the image values lie in the range of 0-255.
# Normalizing the images
train_x = train_x / 255.
test_x = test_x / 255.
Next, let us label encode the output classes to numeric values:
# Encoding the categorical variable to numeric
lb = LabelEncoder()
train_y = lb.fit_transform(train.Class)
train_y = utils.np_utils.to_categorical(train_y)
Next, let us specify the network parameters to be used, as shown below:
# Specifying all the parameters we will be using in our network
input_num_units = (32, 32, 3)
hidden_num_units = 500
output_num_units = 3
epochs = 5
batch_size = 128
Next, let us define a network with one input layer, one hidden layer, and one output layer,
as shown below:
model = Sequential([
    InputLayer(input_shape=input_num_units),
    Flatten(),
    Dense(units=hidden_num_units, activation='relu'),
    Dense(units=output_num_units, activation='softmax'),
])
We can also use the summary() method to visualize the connections between the layers, as shown below:
# Printing model summary
model.summary()
Next, let us compile our network with SGD optimizer and use accuracy as a metric:
# Compiling and Training Network
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
Now, let us build the model, using the fit() method:
model.fit(train_x, train_y, batch_size=batch_size, epochs=epochs, verbose=1)
This results in the following log:
We can observe from the above results that the final accuracy is 62.78%. However, it is recommended that we use 20% to 30% of our training data as a validation data set to observe how the model performs on unseen data.
The following code considers 20 percent of the training data as validation data set:
# Training model along with validation data
model.fit(train_x, train_y, batch_size=batch_size, epochs=epochs, verbose=1,
validation_split=0.2)
We can observe that the training accuracy is 64.51% and the validation accuracy is 63.64%. Since the two results are quite close, we can conclude that there is no overfitting in the model. However, the accuracy itself is quite low. It can be increased by overcoming the previously stated challenges, and some improvement can also be obtained by tuning the hyperparameters, which we are going to explore in the next resource.
With our baseline neural network, we can now predict the age group of test data and save
the results in an output file, as shown below:
# Predicting and importing the result in a csv file
pred = model.predict_classes(test_x)
pred = lb.inverse_transform(pred)
test['Class'] = pred
test.to_csv('out.csv', index=False)
We can also perform the visual inspection on any random image, as shown below:
# Visual Inspection of predictions
idx = 2481
img_name = test.ID[idx]
img = imageio.imread(os.path.join('age_detection_test/Test', img_name))
plt.imshow(np.array(Image.fromarray(img).resize((128, 128))))
pred = model.predict_classes(test_x)
print('Original:', train.Class[idx], 'Predicted:', lb.inverse_transform(pred)[idx])
The network misidentified the current image from the middle age group as young. This
could be due to the 64% accuracy of the model. Therefore, let us learn about hyper-
parameter tuning and try to improve the results.
--------------------------------------------------------------------------------------------------------------------------
Exercise 3: Module name : Understanding and Using CNN : Image recognition
Design a CNN for Image Recognition which includes hyperparameter tuning.
from matplotlib import pyplot as plt
%matplotlib inline
from sklearn.preprocessing import LabelEncoder
import keras
import pandas as pd
import numpy as np
from PIL import Image
import os
import warnings
warnings.filterwarnings('ignore')
Next, let us import the label file and view any random image along with its label:
labels = pd.read_csv('cifar10_Labels.csv', index_col=0)
# View an image
img_idx = 5
print(labels.label[img_idx])
Image.open('cifar10/'+str(img_idx)+'.png')
As we can observe, the label is correct for the image. Now, let us split the data into training and test sets, followed by transformation and normalization:
# Splitting data into Train and Test data
from sklearn.model_selection import train_test_split
y_train, y_test = train_test_split(labels.label, test_size=0.3,
random_state=42)
train_idx, test_idx = y_train.index, y_test.index # Storing indexes for later use
# Reading images for training
temp = []
for img_idx in y_train.index:
    img_path = os.path.join('cifar10/', str(img_idx) + '.png')
    img = np.array(Image.open(img_path)).astype('float32')
    temp.append(img)
X_train = np.stack(temp)
# Reading images for testing
temp = []
for img_idx in y_test.index:
    img_path = os.path.join('cifar10/', str(img_idx) + '.png')
    img = np.array(Image.open(img_path)).astype('float32')
    temp.append(img)
X_test = np.stack(temp)
# Normalizing image data
X_train = X_train/255.
X_test = X_test/255.
The next preprocessing step is to label encode the respective image labels:
# One-hot encoding 10 output classes
encode_X = LabelEncoder()
encode_X_fit = encode_X.fit_transform(y_train)
y_train = keras.utils.np_utils.to_categorical(encode_X_fit)
Now, let us define the CNN network:
# Defining CNN network
num_classes = 10
model = keras.models.Sequential([
    # Adding first convolutional layer
    keras.layers.Conv2D(filters=32, kernel_size=(3, 3), strides=1, padding='same',
                        activation='relu',
                        kernel_regularizer=keras.regularizers.l2(0.001),
                        input_shape=(32, 32, 3), name='Conv_1'),
    # Normalizing the parameters from the last layer to speed up training (optional)
    keras.layers.BatchNormalization(name='BN_1'),
    # Adding first pooling layer
    keras.layers.MaxPool2D(pool_size=(2, 2), name='MaxPool_1'),
    # Adding second convolutional layer
    keras.layers.Conv2D(filters=64, kernel_size=(3, 3), strides=1, padding='same',
                        activation='relu',
                        kernel_regularizer=keras.regularizers.l2(0.001), name='Conv_2'),
    keras.layers.BatchNormalization(name='BN_2'),
    # Adding second pooling layer
    keras.layers.MaxPool2D(pool_size=(2, 2), name='MaxPool_2'),
    # Flattening the feature maps
    keras.layers.Flatten(name='Flat'),
    # Fully-Connected output layer
    keras.layers.Dense(num_classes, activation='softmax', name='pred_layer')
])
In the above model, we have used two convolution layers, each paired with a max pooling layer, finally connecting to the Fully-Connected layer. We kept 'same' padding, i.e., the output volume has the same spatial size as the input; for no padding, you can pass the 'valid' argument. The stride is chosen as 1, with 32 and 64 filters for the first and second convolution layers respectively, and the kernel size is kept as 3x3.
L2 regularization has been added to the cost function via the convolution layers. Also, there is a new concept here termed Batch Normalization. It is added for the following reasons:
● Since we normalize the data before passing it to the input layer to improve performance, this extra layer of normalization normalizes the values at the intermediate steps as well.
● It keeps the activations from growing too large or too small, so you can use a higher learning rate to explore new feature possibilities.
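The model has to be compiled and trained before the evaluation that follows; a minimal sketch is shown below. The choice of optimizer, number of epochs, batch size and validation split here are assumptions, not values prescribed by the exercise:
# Minimal compile-and-train sketch; optimizer, epochs, batch size and validation split are assumptions
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
model.fit(X_train, y_train, batch_size=128, epochs=10, validation_split=0.2, verbose=1)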
We can further proceed with train and test accuracy along with the confusion matrix to
judge which class the model is predicting better:
from mlxtend.evaluate import scoring
train_acc = scoring(encode_X.inverse_transform(model.predict_classes(X_train)),
encode_X.inverse_transform([np.argmax(x) for x in y_train]))
test_acc = scoring(encode_X.inverse_transform(model.predict_classes(X_test)), y_test)
print('Train accuracy: ', np.round(train_acc, 5))
print('Test accuracy: ', np.round(test_acc, 5))
So, the model has presented a high probability that the given image belongs to the ship
class.
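The following visualizations rely on a test image (ship_img), its predicted class probabilities (ship_prob) and the list of class names (classes). A minimal sketch of how these could be obtained is given below; the chosen test index is an arbitrary assumption for illustration:
# Sketch of how ship_img, ship_prob and classes (used below) could be obtained;
# the test index is an assumption, not part of the original exercise
classes = sorted(labels.label.unique().tolist())   # the 10 CIFAR-10 class names
idx = 0
ship_img = X_test[idx]
ship_prob = model.predict(ship_img[np.newaxis, ...])[0]
print('Predicted class:', classes[np.argmax(ship_prob)])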
To visualize activations over the final dense layer outputs, we need to switch the softmax activation to linear, because with softmax the gradient of an output node depends on all the other node activations.
# Utilities from the keras-vis package
from vis.utils import utils
# Utility to search for layer index by name.
layer_idx = utils.find_layer_idx(model, 'pred_layer')
# Swap softmax with linear
model.layers[layer_idx].activation = keras.activations.linear
model = utils.apply_modifications(model)
1. Saliency map
Saliency maps show which parts of the input the model focuses on to make a prediction.
# Saliency maps with keras-vis
from vis.visualization import visualize_saliency

plt.figure(figsize=(12, 6))
for i in range(len(classes)):
    plt.subplot(2, 5, i + 1)
    grads = visualize_saliency(model, layer_idx, filter_indices=i, seed_input=ship_img,
                               backprop_modifier='guided')
    plt.xticks([])
    plt.yticks([])
    plt.xlabel(classes[i])
    plt.title('p=' + str(np.round(ship_prob[i], 4)))
    plt.imshow(grads, cmap='jet')
plt.show()
As shown in the probability graph, only the first two and last two nodes are active, leaving the center six almost dead. The second-to-last node, meant for the ship class, has the highest probability and, as seen in the above plot, emphasizes only the region of the ship. The other three active nodes include noise and thus have reduced probabilities.
2. Class activation maps or Grad-CAM
These maps contain more detail, since they use Conv or Pooling features, which retain spatial detail that is lost in Dense layers. The only additional argument compared to saliency is penultimate_layer_idx, which specifies the earlier layer whose gradients should be used.
# Grad-CAM with keras-vis
from vis.visualization import visualize_cam, overlay

plt.figure(figsize=(12, 6))
for i in range(len(classes)):
    plt.subplot(2, 5, i + 1)
    cam_grads = visualize_cam(model, layer_idx, filter_indices=i, seed_input=ship_img,
                              backprop_modifier='guided',
                              penultimate_layer_idx=utils.find_layer_idx(model, 'BN_2'))
    plt.xticks([])
    plt.yticks([])
    plt.xlabel(classes[i])
    plt.title('p=' + str(np.round(ship_prob[i], 4)))
    plt.imshow(overlay(cam_grads, ship_img, alpha=0.3))
plt.show()
The ninth subplot consists of a contour plot overlaid on the original image, which captures the ship structure better than the other three contour plots, which are heavily affected by noise. The center six nodes are almost dead and thus produce no contour plot.
--------------------------------------------------------------------------------------------------------------------------
EXERCISE 4 :-
Module name : Predicting Sequential Data 🡪 Implement a Recurrent Neural
Network for Predicting Sequential Data.
To perform text generation using RNN, we need to show the model various examples to
make a better prediction character by character. For this task, we will be using the Project
Gutenberg's The Adventures of Sherlock Holmes, by Arthur Conan Doyle available as
a .txt file. Download the file from here and read the file in Python.
# Necessary libraries
from keras.models import Sequential
from keras.layers import Dense, Activation, LSTM, Dropout
from keras.optimizers import RMSprop
from keras.utils.data_utils import get_file
import keras
import random
import numpy as np
# Reading the data
text = open('Sherlock Holmes.txt').read().lower()
print('Given script has ' + str(len(text)) + ' characters')
Since the dataset is quite long, let us trim it and perform basic preprocessing.
text = text[1302:]
for ch in ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '!', '"', '$', '%', '&', '~', '`', '(', ')', '*',
           '-', '/', ';', '@', '?', ':', '©', '¢', 'ã', '\xa0', '\n', '\r', '.']:
    if ch in text:
        text = text.replace(ch, ' ')
print(set(text))
Now, we can create a sliding window function in which all the characters inside the window
are treated as input and the following character is treated as output. We use the window
size of 50 and step size as 3.
def window_transform(text, window_size, step_size):
    inputs = []
    outputs = []
    n_batches = int((len(text) - window_size) / step_size)
    for i in range(n_batches - 1):
        a = text[i * step_size:((i * step_size) + window_size)]
        inputs.append(a)
        b = text[(i * step_size) + window_size]
        outputs.append(b)
    return inputs, outputs
# Calling the window function
window_size = 50
step_size = 3
inputs, outputs = window_transform(text, window_size, step_size)
Let us verify the results from the above function:
inputs[502], outputs[502]
As you can observe, the length of the window (input size) is 50, which you can verify using len(inputs[502]), and the corresponding output is the next character of the ongoing sentence, which here is 'd'.
For confirmation, here is the sentence at inputs[503], slid forward by a step size of 3 (taking in three new characters).
Now, let us try to formulate the problem in the context of machine learning. Above, we saw that the output of set(text) resulted in 33 unique characters; therefore, the given task can be formulated as a multi-class classification problem.
To begin with, we first sort the output of set(text) and map each character to a unique numerical value.
# Sorting the unique elements
chars = sorted(list(set(text)))
# Encoding
chars_to_indices = dict((c, i) for i, c in enumerate(chars))
# Decoding
indices_to_chars = dict((i, c) for i, c in enumerate(chars))
For instance, chars_to_indices['r'] results in 20 which defines that character r is mapped to
value 20.
Now, we have each character mapped to a numeric value, it is time to transform the
input/output vector in the same numeric format:
def encode_io_pairs(text, window_size, step_size):
    num_chars = len(chars)
    # cut up text into character input/output pairs
    inputs, outputs = window_transform(text, window_size, step_size)
    # create empty vessels for one-hot encoded input/output
    X = np.zeros((len(inputs), window_size, num_chars), dtype=np.bool)
    y = np.zeros((len(inputs), num_chars), dtype=np.bool)
    # loop over inputs/outputs and transform and store in X/y
    for i, sentence in enumerate(inputs):
        for t, char in enumerate(sentence):
            X[i, t, chars_to_indices[char]] = 1
        y[i, chars_to_indices[outputs[i]]] = 1
    return X, y

X, y = encode_io_pairs(text, window_size, step_size)
This completes the formatting of the data set. Now, we can build the LSTM network starting
with the first layer having 120 nodes followed by a fully-connected linear layer and a
softmax layer.
# Designing the model
model = Sequential()
model.add(LSTM(120, input_shape=(window_size, len(chars))))
model.add(Dropout(0.22))
model.add(Dense(len(chars), activation='linear'))
model.add(Dense(y.shape[1], activation='softmax'))
# Compiling the model
model.compile(loss='categorical_crossentropy', optimizer='adam')
# Subsetting data for an example
Xsmall = X[:20000,:,:]
ysmall = y[:20000,:]
# Model training
model.fit(Xsmall, ysmall, batch_size=500, epochs=10)
To proceed with the prediction, we follow a simple rule of thumb. At a time, our model accepts a window of 50 characters and predicts the 51st character. Following this rule, we predict a character, then remove the first character from the previous window and append the newly predicted character at the end, keeping the window at size 50; we then predict the next character and keep repeating the process.
This method along with the number of characters to be predicted is coded below:
def predict_next_chars(model, input_chars, num_to_predict):
    pred_chars = ''
    for i in range(num_to_predict):
        # Converting this round's input characters to a numerical (one-hot) input
        x_test = np.zeros((1, window_size, len(chars)))
        for t, char in enumerate(input_chars):
            x_test[0, t, chars_to_indices[char]] = 1.
        # make this round's prediction
        test_predict = model.predict(x_test, verbose=0)[0]
        # translate the numerical prediction back to a character
        r = np.argmax(test_predict)
        d = indices_to_chars[r]
        # update predicted chars and the input window
        pred_chars += d
        input_chars += d
        input_chars = input_chars[1:]
    return pred_chars
Now, all you're left with is the prediction of the new characters which can be performed as
shown:
# Prediction
start = 89
num_to_predict = 10
input_chars = text[start: start + window_size]
print('Complete sequence:', text[start:start + window_size + num_to_predict])
print('Input sequence:', input_chars)
print('Output sequence:', predict_next_chars(model, input_chars, num_to_predict =
num_to_predict))
---------------------------------------------------------------------------------------------------------
Exercise 5:
Module Name: Removing noise from the images 🡪 Implement a Multi-Layer
Perceptron algorithm for image denoising with hyperparameter tuning.
In this module, we will start with the CIFAR-10 data set but this time we will introduce some
random noise in each of the images. To initiate, let us read the images in the environment:
# Importing basic libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import os
# Reading all the images into a Python list
img_arr = []
for i in range(1, 151):
    img_path = os.path.join('cifar10/' + str(i) + '.png')
    img = np.array(Image.open(img_path)) / 255.  # Scaling
    img_arr.append(img)
# Converting back to a numpy array
img_arr = np.array(img_arr)
img_arr.shape
So, as you can observe in the above code, we have used only 150 CIFAR-10 dataset images
and stored all of these 32x32x3 dimensional images to a numpy array. Now, we can add
noise to each of these images:
# Original image
plt.imshow(img_arr[4])
plt.show()
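A minimal sketch for the noise-adding step is given below; the use of Gaussian noise and the noise level of 0.1 are assumptions, and the result is clipped back to the valid 0-1 range:
# Adding Gaussian noise to every image; the noise level (0.1) is an assumed value
np.random.seed(42)
noise_factor = 0.1
noisy_img_arr = img_arr + noise_factor * np.random.normal(size=img_arr.shape)
noisy_img_arr = np.clip(noisy_img_arr, 0., 1.)

# Noisy version of the same image
plt.imshow(noisy_img_arr[4])
plt.show()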
2. Coloring black and white photos or videos: Do you have favorite old photos or movies? Are they in black and white? Would you like to see them in color? It is possible to restore color to black and white movies or to your favorite photos. A deep learning network can be trained to learn patterns that naturally occur in photos, such as the sky usually being blue, clouds often being white or gray, and grass typically being green, in order to restore these colors without human intervention.
3. Voice Generation
Siri and Alexa are two successful voice-generating systems, but they are not completely autonomous, since they were trained manually to convert text to voice. WaveNet (a deep generative model by Google) and Deep Speech (by Baidu) are deep learning networks that generate voice automatically. Such systems learn to mimic human voices by themselves and improve with time; when an audience is asked to differentiate them from a real human speaking, it is much harder to do so.
4. Deep Learning networks for creating deep learning networks
Neural Complete is a deep learning program that can generate new deep learning networks. It is not only written in Python, but is also trained to generate Python code.
5. Deep Dreaming
Have you noticed that YouTube is packed nowadays with hallucinated images based on existing photos? Deep learning lets computers "deep dream", creating hallucinated images by enhancing existing photos.
● VGGnet
● YOLO
● The availability of powerful accelerators such as GPUs and TPUs has eased the training and tuning of DL models; a huge number of layers can now be trained with ease.
● What are the challenging problems that demand advanced deep learning?
Classification and object detection are the main parts of computer vision. Recent advances in Deep Neural Networks (DNNs) have led to the development of DNN-driven autonomous cars that, using sensors like cameras and LiDAR, can drive without any human intervention. Most major manufacturers, including Tesla, GM, Ford, BMW, and Waymo/Google, are working on building and testing different types of autonomous vehicles that require no driver input. Self-driving cars rely on artificial intelligence to work, and rapid, reliable video object recognition and localization is the foundation of such autonomous driving. Such an autonomous system requires recognition of road signs, pedestrians, obstacles on the road, and buildings at the roadside with 100% accuracy in order to cause the right and safe reactions from the vehicle. Object recognition also has the potential to save lives by helping doctors diagnose diseases from high-resolution photographs, MRIs, and CT scans.
● Object detection is a computer vision task that involves both localizing one or more
objects within an image and classifying each object in the image. Fig 2 indicates
some of the object detection problems.
The challenges in NLP are not yet completely addressed. For example, every time Siri tries to answer a question that it has not been programmed to respond to, it fails miserably. Speech recognition, question answering, reputation monitoring, market intelligence and real-time handwritten character recognition are some of the challenging NLP-related tasks.
Understanding a scene requires recognizing where the objects are and how far away they are. The brain seems to solve scene understanding with a statistical model that decomposes the scene into affordances and structured relationships that have soft and ambiguous boundaries. A similar approach needs to be adopted.
● Beyond the identification of object, machines are required to “be creative” and use
Object Detection:
In computer vision applications, deep learning is used to perform the three most commonly occurring tasks in real time, as depicted in fig 5.
Deep neural networks are trained to extract features from the input images to perform the given task. Fig 6 shows the different levels of features associated with images; for object detection, deep neural networks use this hierarchical representation of image features.
Fig 6: Hierarchical representation of image features
It is possible to use:
● SVM for image classification
● Convolutional neural networks for image classification, e.g., MNIST handwritten digit recognition
Among the above listed ML/DL techniques, CNN has proved to be the best DNN architecture for image classification. But for real-time object detection, the advanced DL architecture YOLO is supreme. Let's look at how it works.
● YOLO ("You Only Look Once") is a series of end-to-end deep learning models designed for fast object detection, developed by Joseph Redmon et al. in 2015.
● Once the complexity of the image increases, it is often not possible to have the computational resources to build a deep learning model from scratch. So, predefined frameworks and pretrained models come in handy; one such framework for object detection is YOLO.
● Darknet is used as the framework for training YOLO, i.e., it sets the architecture of the network.
● Darknet is mainly used to implement the YOLO algorithm.
2. Download the pre-trained weights from the link yolov4.conv.137 and save them in the darknet-master folder.
3. In WordPad, type the name of each object on a separate line and save the file as obj.names in the darknet-master->data folder.
4. Create the file obj.data in the folder darknet-master->data, and edit the following.
5. Create a folder darknet-master->data->obj and store all the images in obj.
6. Create a train.txt file at the path darknet-master->data->train.txt. This file lists all the training images:
data/obj/img1.jpg
data/obj/img2.jpg
data/obj/img3.jpg
data/obj/img4.jpg
7. In the darknet-master folder, open the Makefile in WordPad and set GPU=0, CUDNN=1, OPENCV=1 as shown in the following picture. This is done to run the training on the CPU.
Compile darknet:
To compile the darknet execute the following commands:
make
./darknet
Train the network:
● Training can be done in parts. After every 1000 iterations the weights are saved in the backup folder, so training can simply be resumed from there. To start the training, run the command shown below.
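Assuming the file names used earlier in this setup (obj.data, yolo-obj.cfg and the pre-trained yolov4.conv.137 weights), the training command would look like the following:
!./darknet detector train data/obj.data cfg/yolo-obj.cfg yolov4.conv.137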
TESTING : For testing run the following code
!./darknet detector test data/obj.data cfg/yolo-obj.cfg backup/yolo-
obj_12000.weights
What are the advantages of YOLO over other detectors?
● Rather than using a two-step method for classification and localization of objects, YOLO applies a single CNN for both classification and localization.
● YOLO can process images at about 40-90 FPS, so it is quite fast. This means streaming video can be processed in real time, with a latency of only a few milliseconds. The architecture of YOLO makes it extremely fast: compared with R-CNN it is roughly 1000 times faster, and it is roughly 100 times faster than Fast R-CNN.
Limitations and drawbacks of the YOLO object detector
2. It does not always handle small objects well: YOLO can detect at most 49 objects. The reason for this limitation lies in the YOLO algorithm itself. The YOLO object detector divides an input image into an MxM grid (7x7 in the original YOLO, hence 49), where each cell in the grid predicts only a single object. If multiple small objects fall in a single cell, YOLO will be unable to detect them all, ultimately leading to missed detections.
Deep Learning is an iterative way of training the machine. Like any iterative or cyclic process, DL involves three main components: formulating and training the neural network, testing the model, and evaluating the model. In other words, an iteration in DL indicates the number of times the hyperparameters are updated. Hyperparameters are the core entities in any DL model. In the previous modules, the hyperparameters were discussed. The best combination among the various permutations of the hyperparameters must be found to ensure accurate results; finding the best permutation of the hyperparameters turns out to be an optimization problem in the context of training the neural network.
Training deep learning models takes a long time; training some complex DL models takes hours to days, and an advanced deep learning architecture like YOLO takes about 12 hours on the COCO data set. In order to achieve the best training efficiency, the performance of the optimization algorithm becomes an important factor. Deep learning algorithms require optimization in different circumstances, but training the neural network is said to be the most difficult task for the following reasons:
Time consuming: In real-world scenarios, training a single neural network instance on several machines can take days to months.
Most deep learning models are stochastic in nature. Hence, their performance optimization techniques differ from conventional optimization techniques. In conventional optimization, optimizing the performance is direct and is itself the objective, whereas in a neural network performance optimization is indirect: optimizing the objective function minimizes the error or cost function and thereby improves the overall prediction accuracy of the model.
The problem of training the neural network, or learning, is an optimization problem. The fine-tuning of the randomly initialized weights during training determines what is learned: adjusting the weights so as to minimize the objective (loss) function, and thereby the prediction error, becomes the goal during training. Diminishing the loss function is the main target of the optimization procedure.
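As a toy illustration of minimizing a loss by iterative updates, the sketch below applies plain gradient descent to a simple quadratic loss; the loss function, starting point and learning rate are illustrative assumptions only:
# Toy loss: L(w) = (w - 3)^2, minimized at w = 3
w = 0.0
learning_rate = 0.1
for step in range(50):
    grad = 2 * (w - 3)             # dL/dw
    w = w - learning_rate * grad   # gradient descent update
print('w after 50 steps:', round(w, 4), 'loss:', round((w - 3) ** 2, 6))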
Demonstration of the most commonly used optimizers with the cross-entropy loss function, using a CNN for MNIST.
Step2: Compile the CNN model with Adadelta Optimizer given below replacing
Adagrad in the above CNN model
model.compile(loss=tensorflow.keras.losses.sparse_categorical_crossentropy,
optimizer = tensorflow.keras.optimizers.Adadelta(learning_rate=0.001,
rho=0.95, epsilon=1e-07, name="Adadelta"))
Step4: Compile the CNN model with Adabound momentum estimation given
below replacing Adagrad in the above CNN model
from keras_adabound import AdaBound
model.compile(loss=tensorflow.keras.losses.sparse_categorical_crossentropy,
optimizer=AdaBound(lr=1e-3, final_lr=0.1))
In each case the model is executed and, for the first 5 epochs, the loss and the accuracy are recorded. In order to study the performance of the optimizers with respect to the cross-entropy loss function, a graph was plotted. The code for plotting the graph is given below:
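The demonstration code that follows assumes the MNIST data and the required Keras layers have already been loaded; a minimal setup sketch is given below (these imports and the loader call are assumptions, not reproduced from the manual):
# Assumed setup for the demonstration below (not shown in the original manual)
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, BatchNormalization, Flatten, Dense, Dropout, Activation
from tensorflow.keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()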
element = 200
plt.imshow(x_train[element])
plt.show()
print("Label for the element", element,":", y_train[element])
x_train = x_train.reshape((-1, 28, 28, 1))
x_test = x_test.reshape((-1, 28, 28, 1))
print(x_train.shape)
print(x_test.shape)
x_train = x_train / 255
x_test = x_test / 255
model = Sequential()
model.add(Conv2D(filters = 96, input_shape = (28, 28, 1), kernel_size = (11, 11), strides = (4, 4), padding = 'valid'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size = (2, 2),strides = (2, 2), padding = 'valid'))
model.add(BatchNormalization())
model.add(Flatten())
model.add(Dense(4096))
model.add(Activation('relu'))
model.add(Dropout(0.4))
model.add(BatchNormalization())
model.add(Dense(10))
model.add(Activation('softmax'))
model.compile(optimizer='Adam',loss='categorical_crossentropy',metrics=['accuracy'])
y=to_categorical(y_train)
model.fit(x=x_train,y=to_categorical(y_train),epochs=10,batch_size=64,shuffle=True)
predictions = model.predict(x_test[0:100])
predictions[0]
np.argmax(predictions[0])
plt.imshow(x_test[0].reshape(28,28))
Exercise 9: Module name: Autoencoders Advanced
Exercise: Demonstration of Application of Autoencoders.
The applications of GAN models are varied. Here, two case studies are considered to demonstrate GANs for data set generation:
1. Image augmentation using the MNIST data set
2. New image generation for the CIFAR data set
1. Image Augmentation: a GAN case study
Whenever the data set does not have enough samples to train the machine, due to various constraints in the data collection process, it becomes necessary to use augmentation. In particular, when a more complex object needs to be recognized, image data augmentation is used. It is a method of artificially increasing the size of a training dataset by creating new, artificial images.
Image augmentation is demonstrated using Keras while building deep learning GANs. The ImageDataGenerator class available in Keras is used in the demonstration; it defines the configuration for image data preparation and augmentation.
In this demonstration, the following properties are shown:
● Feature standardization
● ZCA whitening
● Random flips
a. Feature Standardization
Using this setup, the pixel values across the entire dataset can be standardized. Feature standardization is the process of standardizing the pixel values, analogous to what is performed for each column in a tabular dataset. This can be done by setting the featurewise_center and featurewise_std_normalization arguments on the ImageDataGenerator class.
from keras.datasets import mnist
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# define data preparation
datagen = ImageDataGenerator(featurewise_center=True,
featurewise_std_normalization=True)
# fit parameters from data
datagen.fit(X_train)
# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9):
    # create a grid of 3x3 images
    for i in range(0, 9):
        pyplot.subplot(330 + 1 + i)
        pyplot.imshow(X_batch[i].reshape(28, 28), cmap=pyplot.get_cmap('gray'))
    # show the plot
    pyplot.show()
    break
Output
c. Random Flips
Random flips can be used as an augmentation technique on image data to improve performance on large and complex problems.
from keras.datasets import mnist
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1))
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1))
# convert from int to float
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# define data preparation
datagen = ImageDataGenerator(horizontal_flip=True, vertical_flip=True)
# fit parameters from data
datagen.fit(X_train)
# configure batch size and retrieve one batch of images
for X_batch, y_batch in datagen.flow(X_train, y_train, batch_size=9):
    # create a grid of 3x3 images
    for i in range(0, 9):
        pyplot.subplot(330 + 1 + i)
        pyplot.imshow(X_batch[i].reshape(28, 28), cmap=pyplot.get_cmap('gray'))
    # show the plot
    pyplot.show()
    break
Output:
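The generator below relies on a set of Keras imports and a latent-dimension setting that are not shown here; a minimal assumed setup (the latent size of 100 is an assumption) is:
# Assumed setup for the generator below (imports and latent size are not shown in the manual)
from keras.models import Sequential, Model
from keras.layers import Dense, Reshape, UpSampling2D, Conv2D, BatchNormalization, Activation, Input
latent_dimensions = 100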
def build_generator():
    model = Sequential()
    model.add(Dense(128 * 8 * 8, activation="relu", input_dim=latent_dimensions))
    model.add(Reshape((8, 8, 128)))
    model.add(UpSampling2D())
    model.add(Conv2D(128, kernel_size=3, padding="same"))
    model.add(BatchNormalization(momentum=0.78))
    model.add(Activation("relu"))
    model.add(UpSampling2D())
    model.add(Conv2D(64, kernel_size=3, padding="same"))
    model.add(BatchNormalization(momentum=0.78))
    model.add(Activation("relu"))
    model.add(Conv2D(3, kernel_size=3, padding="same"))
    model.add(Activation("tanh"))
    noise = Input(shape=(latent_dimensions,))
    image = model(noise)
    return Model(noise, image)
Inside the training loop, generated images are displayed at a fixed interval (a sketch of the surrounding loop follows below):
if epoch % display_interval == 0:
    display_images()
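For context, a generic DCGAN-style training loop in which such a check would sit is sketched below. The discriminator, the combined generator-plus-discriminator model, the display_images() helper, the real-image array X_train and all hyperparameter values are assumptions, not the manual's original code:
# Generic DCGAN-style training loop sketch (all names/values below are assumptions):
# it assumes a compiled `discriminator`, a compiled `combined` model (generator +
# frozen discriminator), real images in `X_train`, and a `display_images()` helper.
import numpy as np

batch_size = 32
num_epochs = 15000
display_interval = 2500
generator = build_generator()

for epoch in range(num_epochs):
    # Train the discriminator on a half real / half fake batch
    idx = np.random.randint(0, X_train.shape[0], batch_size)
    real_images = X_train[idx]
    noise = np.random.normal(0, 1, (batch_size, latent_dimensions))
    fake_images = generator.predict(noise)
    d_loss_real = discriminator.train_on_batch(real_images, np.ones((batch_size, 1)))
    d_loss_fake = discriminator.train_on_batch(fake_images, np.zeros((batch_size, 1)))

    # Train the generator through the combined model to fool the discriminator
    noise = np.random.normal(0, 1, (batch_size, latent_dimensions))
    g_loss = combined.train_on_batch(noise, np.ones((batch_size, 1)))

    if epoch % display_interval == 0:
        display_images()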
Output:
Exercise 11:
Module name : Capstone project
Exercise : Complete the requirements given in capstone project
Description: In this capstone, learners will apply their deep learning knowledge and expertise to a real-world challenge.
● You can use tensorflow / Keras for downloading the data set and to build the model.
● Fine tune the hyperparameters and perform the model evaluation.
● Substantiate your solution based on your insights for better visualization and provide
a report on model performance.
Data set description:
Initially, to test the model, you can use the benchmark Fashion-MNIST data set before deploying it. This is a standard dataset that can be loaded directly. For more details, click here. The data set description is as follows:
● Size of training set = 60,000 images
● Birds
● Cats
● Deer
● Dog
● Frog
● Horses
● Ships
● Trucks
● Airplanes