Synthesis Lectures on
Mechanical Engineering
The Synthesis Lectures on Mechanical Engineering series publishes 60–150 page publications pertaining to this diverse discipline of mechanical engineering. The series presents lectures written for an audience of researchers, industry engineers, and undergraduate and graduate students.
Additional Synthesis series will be developed covering key areas within mechanical
engineering.
Introduction to Deep Learning for Engineers: Using Python and Google Cloud Platform
Tariq M. Arif
2020
Engineering Dynamics
Cho W.S. To
2018
Mathematical Magnetohydrodynamics
Nikolas Xiros
2018
Resistance Spot Welding: Fundamentals and Applications for the Automotive Industry
Menachem Kimchi and David H. Phillips
2017
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopy, recording, or any other—except for brief quotations in printed reviews, without the prior permission of the publisher.
Introduction to Deep Learning for Engineers: Using Python and Google Cloud Platform
Tariq M. Arif
www.morganclaypool.com
DOI 10.2200/S01029ED1V01Y202007MEC028
Lecture #28
Series ISSN: Print 2573-3168; Electronic 2573-3176
Introduction to
Deep Learning for Engineers
Using Python and Google Cloud Platform
Tariq M. Arif
Weber State University
Morgan & Claypool Publishers
ABSTRACT
This book provides a short introduction and easy-to-follow implementation steps of deep learning using Google Cloud Platform. It also includes a practical case study that highlights the utilization of Python and related libraries for running a pre-trained deep learning model.
In recent years, deep learning-based modeling approaches have been used in a wide variety of engineering domains, such as autonomous cars, intelligent robotics, computer vision, natural language processing, and bioinformatics. Numerous real-world engineering applications also utilize an existing pre-trained deep learning model that has already been developed and optimized for a related task. However, incorporating a deep learning model in a research project is quite challenging, especially for someone without related machine learning and cloud computing knowledge. Keeping that in mind, this book is intended to be a short introduction to deep learning basics through the example of a practical implementation case.
The audience of this short book is undergraduate engineering students who wish to explore deep learning models in their class projects or senior design projects without taking a full journey through machine learning theory. The case study at the end also provides a cost-effective, step-by-step approach that others can replicate easily.
KEYWORDS
Google Cloud Platform (GCP), Python, PyTorch, artificial neural network, ma-
chine learning, deep learning, transfer learning, pre-trained model, convolutional
neural network, pooling layers, EfficientNet
Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
2 Introduction to PyTorch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Setting up PyTorch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.3 Basic PyTorch Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Author’s Biography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Preface
My goal in writing this book is to motivate engineering students to learn deep learning appli-
cations. Deep learning is a newly evolved area, and its models and implementation methods are rapidly changing. This continuous development provides an extra layer of challenges for newcomers, especially those who are not in the field of data science or machine learning. I
found it exciting that many engineering projects can be optimized and improved significantly by
implementing a deep learning model that is already tuned for a similar application. Currently,
there are numerous excellent books and online resources available on deep learning to explain
theories and applications. But I always felt the need for a short book that shows all the neces-
sary techniques to utilize deep learning models in real-life projects using cloud computing. So,
my primary motivation in writing this manuscript is to give a whole picture of the application
process.
One of the challenges I faced while writing was squeezing the fundamental theories into the introductory chapters. I would suggest that readers relate the case study to those fundamental theories and focus on running the code provided in this book by setting up a GCP account. Although this book covers only a small facet of deep learning implementation, I hope it will inspire engineering students to learn other variations that can be related to their projects.
Tariq M. Arif
July 2020
Acknowledgments
I would like to thank Mr. Paul Petralia, Dr. C.L. Tondo, Ms. Melanie Carlson, and all of the
members at Morgan & Claypool Publishers who are involved in improving and editing this
manuscript at different stages. I would also like to thank Mr. Adilur Rahim for his expert input on deep learning and for motivating me to work in this field.
I must thank my wife, Mahbuba Sultana, for being tremendously supportive, and I am
also grateful to my daughter, Nuha Arif, who sometimes decided not to distract me while I was
working on this manuscript.
Tariq M. Arif
July 2020
CHAPTER 1
Introduction: Python and Array Operations
Python is a popular object-oriented programming language that was initially created by Guido van Rossum in 1990. Later on, many developers and programmers contributed to its growth, and its popularity has escalated in recent years in many different domains, including machine learning [1]. Python also has many vibrant online communities and active support forums.
Among the popular programming forums, the https://stackoverflow.com/ and https://python-
forum.io/ sites have a substantial amount of useful information for beginners. One of the ben-
efits of using Python is that it provides numerous ready-to-use robust libraries that are actively
maintained by developers through the GitHub page (https://github.com/). Many of its libraries
can be implemented effectively for general-purpose programming, scientific computing, data vi-
sualization, and machine learning applications. Besides freely available libraries, there are numerous other reasons why Python became a popular programming language, not only for machine learning and artificial intelligence but also for web, game development, and business applications. Python's popularity is likely to keep growing for many years to come, and recently, many top U.S. universities have switched to Python for teaching introductory programming classes [2].
Deep learning is a new and popular approach that is widely used in many different predictive problems related to artificial intelligence. Many scientists and researchers use the Python programming language and its libraries to utilize deep learning models. Training a deep learning model requires extensive data-processing capability from GPU-based hardware, which is relatively expensive. Nowadays, cloud computing platforms are available, such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP), and these platforms can be used for training a data-intensive deep learning model. In this book, we show the steps required to run a deep learning model using the Google Cloud Platform in a cost-effective way. Since basic Python programming is required to follow this text, the introductory chapters (Chapters 1 and 2) are included to provide relevant Python programming skills and library descriptions. For running the deep learning model, a widely used library known as NumPy (Numerical Python) and a tensor array library (PyTorch) are used.
Next, Chapters 3–5 provide basic introductions related to the artificial neural network,
deep learning, and transfer learning frameworks. Chapter 6 illustrates the essential tips and
tricks to set up the Google Cloud Platform’s GPU engine and the PyTorch library. Finally,
Chapter 7 presents a case study for the actual implementation of a deep learning model.
Although introductory backgrounds are provided, anyone who is completely new to Python should review the basics of Python data structures and string manipulation before starting Chapter 7. Numerous materials are freely available online to learn the basics of Python programming. A good starting point is the official Python tutorial available at https://docs.python.org/3/tutorial/.
In the following sections, we are going to show the steps to install Anaconda and run
Jupyter Notebook. These steps will be useful to follow the case study presented at the end. Later
in this chapter, we will show a few simple array manipulations using Python and NumPy library.
Figure 1.1: Select “Add Anaconda3 to my PATH environment variable” during installation.
Many of Python's scientific computing libraries are created based on the NumPy library. The NumPy module can be imported as "np" (a different keyword can be chosen), and after importing, it can be used to call any array function by typing "np."
Figure 1.5: Using NumPy to convert a Python list to an array type.
Note that we can also use “B” in the command line instead of “print(B)”, and press
“Shift+Enter” from the keyboard to see the array output.
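As a minimal sketch of the conversion described above (the list name A is an assumption; the array name B matches the text):

    import numpy as np

    # Convert a plain Python list to a NumPy array (Fig. 1.5)
    A = [1, 2, 3, 4, 5]
    B = np.array(A)
    print(B)          # [1 2 3 4 5]
    print(type(B))    # <class 'numpy.ndarray'>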
Figure 1.12: Using “ones” and “reshape” functions in one line to create a 100 by 100 array.
Figure 1.13: Using “full” function to create a 3 by 10 array where all elements are 55.
For creating an array filled with a different number (e.g., 55), we may use the "full" function. The commands in Fig. 1.13 create a 3 by 10 array where all elements are 55.
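A short sketch of the commands behind Figs. 1.12 and 1.13 (the name D is an assumption; E matches the array used below):

    import numpy as np

    # A 100 by 100 array of ones via "ones" and "reshape" (Fig. 1.12)
    D = np.ones(10000).reshape(100, 100)

    # A 3 by 10 array where every element is 55 via "full" (Fig. 1.13)
    E = np.full((3, 10), 55)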
Figure 1.14: Scalar value 5 is added to E (3 by 10 array). Here, 5 is broadcasted and added to all
the elements of E.
Figure 1.15: NumPy broadcasting operation during multiplication of two different array sizes.
Figure 1.16: The NumPy dot multiplication will give an error since the array sizes of E and F are
different.
Now let's say we have another array, F (Fig. 1.15), whose size is 3 by 1 and whose elements are all 2. If we multiply E and F, the single column of F will be broadcast horizontally to complete the multiplication without any error.
Here, if we want to do the dot multiplication (using the np.dot function), we will get an error because the sizes of these two matrices are not aligned (Fig. 1.16).
To fix this error, we have to make sure that the matrix sizes allow actual matrix multiplication: the column size of the first matrix should be equal to the row size of the second matrix. Now, let's recreate the matrix F with a compatible size using the "full" function, as shown in Fig. 1.17, and then try the dot multiplication.
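Putting the broadcasting and dot-multiplication discussion together, here is a minimal sketch (assuming, as the figures suggest, that E holds 55s and F holds 2s):

    import numpy as np

    E = np.full((3, 10), 55)    # 3 by 10 array (Fig. 1.13)
    F = np.full((3, 1), 2)      # 3 by 1 array (Fig. 1.15)

    print(E + 5)    # the scalar 5 is broadcast to every element (Fig. 1.14)
    print(E * F)    # F's single column is broadcast across E's columns

    # np.dot(E, F) raises an error: shapes (3, 10) and (3, 1) are not aligned.
    # Recreating F with a compatible shape allows true matrix multiplication,
    # as in Fig. 1.17:
    F = np.full((10, 1), 2)
    print(np.dot(E, F))         # a 3 by 1 result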
CHAPTER 2
Introduction to PyTorch
2.1 INTRODUCTION
PyTorch is a highly optimized library that can manipulate data in the form of tensors. This
library was introduced in 2016, and since then, it has been used in creating deep learning models
effectively using GPUs and CPUs. The PyTorch documentation and libraries can be found at [4].
For multi-dimensional data manipulation, machine learning architectures use tensors as their primary data structure. François Chollet [5] defined a tensor as a container of data and classified the multi-dimensional tensors typically used in neural networks. A tensor can be 0-D (a single scalar value), 1-D (a one-dimensional vector), 2-D (a two-dimensional matrix), 3-D (a three-dimensional array, e.g., the three RGB channels of a color image), 4-D (a four-dimensional array, which can be a collection of RGB color images), or 5-D (a collection of RGB images represented by four-dimensional tensors at different times). Although the concept of a tensor is different in the fields of physics and engineering, in recent years the machine learning community has widely adopted this term to describe generalized matrices. So, in this book, we will consider a tensor as a vehicle for organizing information in more than two dimensions. Multi-dimensional tensors can be illustrated using cubes, as shown in Fig. 2.1.
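A quick PyTorch sketch of these tensor dimensionalities (all sizes are illustrative):

    import torch

    t0 = torch.tensor(5.0)               # 0-D tensor: a single scalar
    t1 = torch.zeros(3)                  # 1-D tensor: a vector
    t2 = torch.zeros(3, 4)               # 2-D tensor: a matrix
    t3 = torch.zeros(3, 224, 224)        # 3-D tensor: one RGB image
    t4 = torch.zeros(16, 3, 224, 224)    # 4-D tensor: a batch of RGB images

    print(t0.dim(), t1.dim(), t2.dim(), t3.dim(), t4.dim())   # 0 1 2 3 4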
In deep learning architecture, the tensor-based data can quickly become enormous during
weight and back-propagation calculations. Hence, most of the models are executed on GPUs
for parallel computing and effective data manipulation. In 2018, Google released the TPU (Tensor Processing Unit), which is specially designed for Google's TensorFlow software to train deep neural networks [6].
Figure 2.2: Changing Anaconda’s environment for PyTorch. The blue lines here are for hiding
usernames.
Step 1: Go to the search bar, type Anaconda prompt, and select “Anaconda Prompt.”
Step 2: Note that the “<base>” environment is written before the path.
Step 3: Type "conda activate pytorchenv" and press Enter (Fig. 2.2). Note that the "<base>" has been changed to "<pytorchenv>."
Figure 2.3: Type “jupyter notebook” in Anaconda Prompt and press Enter.
Figure 2.4: Jupyter Notebook interface. Here, select new to create a new script in PyTorch en-
vironment.
Figure 2.5: Type “conda deactivate,” and press Enter to deactivate PyTorch and go back to the
base environment.
Step 4: To open Jupyter Notebook type “jupyter notebook,” and press Enter (Fig. 2.3).
Step 5: Now, the Jupyter Notebook will start, and we will be able to create a new python file
for using the PyTorch functionalities (Fig. 2.4).
Step 6: Once we finish using Jupyter Notebook, we can shut down the kernel by pressing
“Ctrl+C” in the command window (multiple times might be required), and then we
can deactivate the PyTorch environment by typing “conda deactivate” in the command
window (Fig. 2.5). Note that in the terminal, the Anaconda environment changes from
“<pytorchenv>” to “<base>.”
Figure 2.6: Importing the torch and NumPy libraries in Jupyter Notebook.
Figure 2.10: Use “torch.arange” to create a tensor and use “torch.device” to define
GPU/CUDA.
Using a GPU for running a deep learning model is highly recommended, and if you want to define a GPU/CUDA device, you can use the "torch.device( )" function, as shown in Fig. 2.10. Right now, the system we are using doesn't have a GPU. We can quickly check the GPU status using the commands shown in Fig. 2.11. We can also check the device type for tensor B, which is a CPU in our example.
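A minimal sketch of the checks described above (assuming a CPU-only machine, as in our example):

    import torch

    B = torch.arange(10)                  # tensor created as in Fig. 2.10
    print(torch.cuda.is_available())      # False on a CPU-only machine
    print(B.device)                       # cpu

    # Move the tensor to the GPU only when one is present:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    B = B.to(device)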
CHAPTER 3
Figure 3.1: Multi-layer neural network architecture with two hidden layers.
in unsupervised learning, the correct results of the example dataset are unknown. Reinforcement learning is used when learning through observations from a complex environment is required; here, the model makes decisions based on the penalties or rewards it receives for its performance.
The selection among these schemes depends on the type of dataset and the output requirements for a particular task.
3.2 APPLICATIONS
The artificial neural network is very effective for function approximation, pattern identification, and complex mapping. Recently, it has been used in diverse science, engineering, art, and medical applications. It is also widely used for forecasting and prediction in many real-life business, economic, and social problems [7–10]. In Chapter 7, we provide a case study of a multiclass classification problem in which the model learns from 8,144 car images to determine a car's model, make, and year.
Figure 3.2: Computational steps inside one neuron of the hidden layer. Here, $\sigma(z)$ is the sigmoid activation, $x_1, x_2, x_3$ are inputs to the neuron, $w_1, w_2, w_3$ are the weights, and $b$ is the bias to the input.
The use of sigmoid activation in a deep neural network is minimal; most of the time, other activations such as tanh, ReLU, and Softmax are used. It is important to note that different types of activations can exist in the neurons of the hidden and output layers. Only the input layer neurons don't have activation functions. The derivatives of activation functions also play an important role, as they are used in the backpropagation error calculation during gradient descent. A short description of typical activation functions and their derivatives is given in the following sections for reference.
and

$$ \sigma'(z) = 1 - \sigma(z)^2. \qquad (3.4) $$

This function performs better than the sigmoid function when the output is a large negative number. It maps numbers to the range $-1$ to $+1$. A plot of the hyperbolic tangent function in the range of $-5$ to $5$ is given in Fig. 3.4.
and

$$ \sigma'(z) = \begin{cases} 0 & \text{for } z \le 0 \\ 1 & \text{for } z > 0. \end{cases} \qquad (3.6) $$

The ReLU function updates parameters faster, as its gradient calculation is simpler compared to other activations. A plot of the ReLU function in the range of $-1$ to $+1$ is given in Fig. 3.5.
and

$$ \sigma'(z) = \begin{cases} a & \text{for } z \le 0 \\ 1 & \text{for } z > 0. \end{cases} \qquad (3.8) $$

A plot of the Leaky ReLU function in the range of $-1$ to $+1$ is given in Fig. 3.6.
Finding an appropriate activation function for a model is a challenge because some activations produce very small gradients during the backpropagation steps, a phenomenon also known as vanishing (or diminishing) gradients. Besides the activation functions shown here, many other functions are continuously being explored and added in the field of machine learning, for example, ArcTan, the Piecewise Linear Unit (PLU), the Scaled Exponential Linear Unit (SELU), and the Inverse Square Root Unit (ISRU). A comprehensive list of activation functions can be found on the Wikipedia page [11].
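For reference, here is a minimal NumPy sketch of the activations and derivatives above (the Leaky ReLU slope a = 0.01 is an assumed default):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sigmoid_prime(z):
        s = sigmoid(z)
        return s * (1.0 - s)

    def tanh_prime(z):
        return 1.0 - np.tanh(z) ** 2          # Equation (3.4)

    def relu(z):
        return np.maximum(0.0, z)

    def relu_prime(z):
        return (z > 0).astype(float)          # Equation (3.6)

    def leaky_relu(z, a=0.01):
        return np.where(z > 0, z, a * z)

    def leaky_relu_prime(z, a=0.01):
        return np.where(z > 0, 1.0, a)        # Equation (3.8)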
$$ J = -\frac{1}{M} \sum_{m=1}^{M} \left[ y_m \log\big(h_\theta(x_m)\big) + (1 - y_m) \log\big(1 - h_\theta(x_m)\big) \right], \qquad (3.13) $$

where $M$ is the number of training examples, $y_m$ is the target label for training example $m$, $x_m$ is the input for training example $m$, and $h_\theta$ is the model with neural network weights $\theta$. For categorical targets, if $\hat{y}_1, \ldots, \hat{y}_m$ are the probabilities of the $m$ classes and the $r$-th class is the ground-truth class, then
Figure 3.9: If the initial starting point is Start Point 1, the gradient descent algorithm will converge to a local minimum. On the other hand, if Start Point 2 is used, the algorithm will converge to the global minimum.
the loss function for a single instance is defined by Equation (3.14) [13]:

$$ J = -\log(\hat{y}_r). \qquad (3.14) $$
The goal of any neural network is to minimize Equation (3.14). This minimization process
is done by updating weights using an optimization function or optimizer. An important iterative
algorithm known as backpropagation is used to apply this optimization, and in this process, the
loss function dictates how well the network is learning.
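To make Equations (3.13) and (3.14) concrete, here is a minimal NumPy sketch (the function names are illustrative):

    import numpy as np

    # Binary cross-entropy over M examples, Equation (3.13)
    def binary_cross_entropy(y, h):
        # y: target labels (0 or 1); h: predicted probabilities
        return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

    # Categorical cross-entropy for a single instance, Equation (3.14):
    # the negative log-probability assigned to the ground-truth class r
    def categorical_cross_entropy(y_hat, r):
        return -np.log(y_hat[r])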
Many different optimizers have been used in artificial neural networks such as gradient
descent, gradient descent with momentum, Root Mean Square Propagation (RMSProp), Adam
optimizers, and so on. The selection of a suitable optimizer depends on the computational sce-
nario, and they typically differ in terms of their simulation speed and convergence ability.
Figure 3.10: If the cost function J is a function of two parameters (weight and bias), the gradient descent algorithm will search for a minimum on the surface. In this example plot, Start Point 1 will converge to a local minimum, and Start Point 2 will converge to the global minimum.
in Fig. 3.10. Here, the height of the surface shows the value of cost function J for respective
weight and bias.
In most of the machine learning optimization tasks, the cost function is a higher-
dimensional function due to the existence of many weights and biases. This type of higher-
dimensional plot is impossible to visualize.
The gradient descent algorithm is performed simultaneously for all values of the weights and biases. If we have $n$ neurons in a hidden layer, all the weights update through Equation (3.15) [14]. Here $\alpha$ is the learning rate, which tells the algorithm how big or small the step size should be while moving downhill; the negative gradients push the algorithm in the downhill direction.
$$
\begin{aligned}
\omega_1 &= \omega_1 - \alpha \frac{\partial}{\partial \omega_1} J(\omega, b) \\
\omega_2 &= \omega_2 - \alpha \frac{\partial}{\partial \omega_2} J(\omega, b) \\
\omega_3 &= \omega_3 - \alpha \frac{\partial}{\partial \omega_3} J(\omega, b) \\
&\ \ \vdots \\
\omega_n &= \omega_n - \alpha \frac{\partial}{\partial \omega_n} J(\omega, b).
\end{aligned}
\qquad (3.15)
$$

The bias parameter for a hidden layer updates using Equation (3.16):

$$ b = b - \alpha \frac{\partial}{\partial b} J(\omega, b). \qquad (3.16) $$
Equations (3.15) and (3.16) show the calculation for one layer only. This computational process becomes expensive in deep learning, where we have many hidden layers.
The learning rate $\alpha$ is one of the critical hyper-parameters for model training, and it needs to be tuned carefully. If the learning rate is too low, the gradient descent algorithm will become computationally expensive. On the other hand, if the learning rate is too large, it may overshoot the minima and fail to converge.
One way to use the learning rate effectively is to reduce its value as the training pro-
gresses. This process is known as the adaptive learning rate [15]. The adaptive learning rate
can be reduced or adjusted using a pre-defined schedule such as time-based decay, step decay,
and exponential decay [16]. Many algorithms, such as Adam, SGD, AdaGrad, and RMSProp,
use the adaptive learning rate. The PyTorch “torch.optim” package has built-in functions that
allow users to pass arguments in different optimizers for implementing the adaptive learning
algorithm [17].
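As one hedged example of such a schedule, the following sketch applies a step decay to an SGD optimizer with torch.optim.lr_scheduler (the model and schedule values are placeholders):

    import torch
    import torch.optim as optim

    model = torch.nn.Linear(10, 2)      # placeholder model for illustration
    optimizer = optim.SGD(model.parameters(), lr=0.01)

    # Step decay: multiply the learning rate by 0.1 every 10 epochs
    # (time-based and exponential decay schedulers are also available)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

    for epoch in range(30):
        optimizer.zero_grad()
        loss = model(torch.randn(4, 10)).sum()   # placeholder forward pass
        loss.backward()
        optimizer.step()
        scheduler.step()                          # adjust the learning rate
        print(epoch, scheduler.get_last_lr())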
CHAPTER 4
Introduction to Deep Learning
Deep learning is a computational neural network architecture containing multiple hidden layers. Since a deep learning model is a larger neural network, it is treated as a subfield of machine learning and artificial intelligence. It is an extremely powerful tool for speech recognition, remote sensing, object classification, and pattern recognition tasks. For the past few years, deep learning algorithms have been extensively employed in many different scientific and engineering applications, such as self-driving cars, fraud detection, health care, and entertainment. Using many hidden layers, a deep learning model learns from big datasets with multiple levels of abstraction.
During the learning process, a backpropagation algorithm helps the model to fine-tune
the parameters of one layer from the calculated parameters of the previous layer. Figure 4.1 shows
a schematic of a deep learning model with n hidden layers and biases, where every neuron in a
layer is connected to every neuron of the next layer.
If we start modeling from scratch, the optimal numbers of layers and neurons of a deep network should be chosen carefully. There is no single correct layer or neuron count for a specific problem, and numerous factors can influence the output. Many powerful deep learning architectures, such as Inception-V3, ResNet, Inception-ResNet, DenseNet, and the Pyramidal Residual Network, have more than 100 layers [18–22]. However, the performance of these architectures depends on both node and layer numbers. Having too many hidden layers is not always effective, and the depth should be selected based on the type of training dataset and the output requirements. An optimal selection of these parameters (also called capacity) can help us avoid overfitting and underfitting during the training process [23]. Most of the time, we don't need too many layers, as they can add unnecessary difficulties to the computational process. A system with 5–20 nonlinear hidden layers can map extremely complex functions for inputs that are simultaneously sensitive to minute details [24]. Therefore, while selecting a deep learning model, we need to be extra cautious regarding network layers and architecture.
As the growth of deep learning is accelerating, many new deep learning models are con-
tinuously coming out each year. Some of the popular deep learning architectures are briefly
discussed in the following sections.
Figure 4.1: Schematic of deep learning model with n hidden layers and biases, where every
neuron in a layer is connected to every neuron of the next layer.
Figure 4.2: Schematic of max, average, and sum pooling methods to remap the input features in a 3 × 3 array.
Figure 4.3: CNN transforms the input car image (RGB inputs) through convolutional, pooling,
and fully connected layers. The output layer has softmax activation (for multi-class classification)
to represent probability distribution for different car models. This example indicates that the car
is Model 1.
connected layer, followed by the output layer. The fully connected layers also use a backprop-
agation algorithm to update their weights and biases. The properties of fully connected layers
depend on the types of problems (e.g., classification or segmentation). It is important to note
that, for classification and regression problems, another supervised algorithm known as support
vector machine (SVM) can also be used instead of fully connected layers.
At the end of the fully connected layer, a softmax activation function can be used for do-
ing multi-class classification, where the probabilities of each class will always sum up to one.
Figure 4.3 shows a general framework of a convolutional neural network for multi-class classifi-
cation problems. This example illustrates that, after training over a lot of images, a deep learning
model can learn features to identify a car model based on the given input picture. If we have
more than two classes (different car models), the softmax activation function or the SVM can
be used in the last layer.
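The following is a minimal PyTorch sketch of such a network; the layer sizes are illustrative and are not the architecture of Fig. 4.3:

    import torch.nn as nn

    class SimpleCNN(nn.Module):
        def __init__(self, n_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
                nn.ReLU(),
                nn.MaxPool2d(2),                             # pooling layer
                nn.Conv2d(16, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),
            )
            self.classifier = nn.Linear(32 * 56 * 56, n_classes)  # fully connected

        def forward(self, x):            # x: (batch, 3, 224, 224)
            x = self.features(x)
            x = x.flatten(1)
            return self.classifier(x)    # softmax is applied inside
                                         # nn.CrossEntropyLoss during training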
The convolutional model has been used extensively in recent years, and it has proven very useful for 2D image classification problems. Very recently, CNN models have been used on 3D MRI image data to explore complicated functions of the human brain [30–32]. Among deep learning architectures, the CNN is computationally expensive, as it requires a large volume of training data to avoid overfitting.
emotional state and sensitivity from writings. A schematic of many-to-one RNN architecture
is given in Fig. 4.5.
In the many-to-many RNN structure, multiple inputs go through hidden states and pro-
duce multiple outputs. It is used for translating sentences in another language, video frame
labeling, and for many other prediction applications where multiple inputs produce multiple
outputs. Many-to-many RNN structures might have different layouts based on the numbers of
inputs and outputs. A schematic of many-to-many RNN architecture is given in Fig. 4.6.
Besides these RNN architectures, there are many other input-output dynamics; for example, a simple one-to-one RNN structure is similar to the classical feed-forward network.
An RNN model is challenging to train because, to update the parameters, the backpropagation algorithm needs to calculate gradients at different time steps. This operation can make the network unstable due to vanishing and exploding gradients. In order to avoid this problem, different supporting units such as the Gated Recurrent Unit (GRU), the Bidirectional Recurrent Neural Network (BRNN), and Long Short-Term Memory (LSTM) are used. For example, the LSTM network uses cells with input, output, and forget gates to control the flow of information. LSTM can be combined with other tricks, such as gradient clipping, which sets a threshold value for the error gradient, and weight regularization (L1-absolute or L2-squared), which introduces a penalty to the loss function. An LSTM-based recurrent network is more effective when several layers are involved in the sequence of data [34]. To train a recurrent network on complex high-dimensional data, GPU-based hardware that supports a deep learning framework (e.g., via NVIDIA cuDNN) is highly recommended.
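A minimal PyTorch sketch of an LSTM update step with gradient clipping (all sizes and the max_norm threshold are illustrative assumptions):

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=32, hidden_size=64, num_layers=2, batch_first=True)
    head = nn.Linear(64, 1)
    criterion = nn.MSELoss()
    params = list(lstm.parameters()) + list(head.parameters())
    optimizer = torch.optim.Adam(params, lr=1e-3)

    x = torch.randn(8, 20, 32)       # (batch, time steps, features)
    y = torch.randn(8, 1)

    optimizer.zero_grad()
    out, _ = lstm(x)                 # out: (8, 20, 64)
    loss = criterion(head(out[:, -1, :]), y)
    loss.backward()

    # Clip gradients to a threshold to control exploding gradients
    torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
    optimizer.step()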
One network works as a data generator, and the other network classifies the generated output by utilizing real data. This process continues until the second network fails to discriminate between the real and synthetic data [13]. A GAN model is relatively difficult to train, as it can suffer from mode collapse, diminished gradient, and non-convergence problems [43, 44].
Generative networks can be used to generate realistic pictures or images, and they have been explored in many different engineering domains for image reconstruction. Recently, a GAN was used to learn scene dynamics from a large amount of unlabeled video (26 terabytes), and GANs have also shown impressive results in recovering features from astrophysical images by reducing random and systematic noise [45].
Although the GAN is an amazing development for deep learning, people can use it to make fake media content, realistic images, and online profiles that can negatively affect our society. Numerous other potential applications of GANs exist in the fields of art, industry, and medicine, and they will certainly create an impact on our society in the near future.
CHAPTER 5
that we can use effectively, and the amount of effort required for this process is also relatively insignificant. Some of the widely used pre-trained networks are ResNet, GoogLeNet, VGG, Inception, DenseNet, and EfficientNet [51]. Many other ready-to-use pre-trained models are available now, and as the field of deep learning grows, new models are continuously being tested and added to this list.
One of the advantages of using pre-trained networks is that the architectures have already been trained and tested using a large dataset (e.g., ImageNet). We can use these networks partially or wholly in the transfer learning process. For partial use of a model, only the last few layers are adjusted and trained on the new dataset, while the core part of the network doesn't change. On the other hand, we can also utilize the whole architecture without its parameter values, initialize all the weights randomly, and then train the model using the new dataset. These simple techniques can significantly boost the performance of a deep learning model, and recently this practice has become well accepted in computer vision applications [52–54]. A schematic of the transfer learning process is shown in Fig. 5.1.
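A minimal PyTorch sketch of the partial-reuse idea (ResNet-18 and the 10-class head are illustrative choices, not the book's case study):

    import torch.nn as nn
    from torchvision import models

    # Load a network pre-trained on ImageNet
    model = models.resnet18(pretrained=True)

    # Freeze the pre-trained core so its weights do not change
    for param in model.parameters():
        param.requires_grad = False

    # Replace the final fully connected layer for a new task with,
    # say, 10 target classes; only this layer will be trained.
    model.fc = nn.Linear(model.fc.in_features, 10)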
CHAPTER 6
Setting Up PyTorch and Google Cloud Platform Console
Figure 6.1: Create a free account in Google Cloud Platform or Sign in using your existing
account.
Figure 6.3: Select the individual from the drop-down menu and fill out the name and address.
Figure 6.5: After account creation, you will receive a welcome message. Check your email and
complete your profile.
the “complete your profile” section from the welcome email. Next time, when you log in to the
GCP, you will only need to select “go to console” or “console,” as shown in Fig. 6.6.
Inside the GCP interface, a dashboard contains Project info, Compute Engine, Google Cloud Platform status, Resources, APIs, Billing, Documentation, and other options. The navigation menu in the upper left corner (Fig. 6.7) shows a drop-down panel where you will be able to select a few different options. It is recommended to check the billing section when you run a compute engine. In this way, you will get an idea of billing and will be able to keep track of your remaining credits.
Next, select “CREATE INSTANCE.” If you are creating an instance for the first time,
you may see a different interface than the one shown in Fig. 6.11. You will just need to select
the create instance option.
Next, assign a name to the instance (e.g., deeplearning) and select a region for the VM from the region drop-down menu. In this example, we have chosen us-west1 (Oregon), but you may choose a different region. Please note that you might need to try another zone if the selected region does not have enough resources available to fulfill your request at the time of creation. Then select a custom machine type with 8 vCPU cores and 30 GB memory (Fig. 6.12).
This is just an example, and you may choose a different custom VM based on your requirements. You may use only a CPU for your computation if you think the CPU is good enough for handling your machine learning model. Here we intend to show the way of selecting a GPU for your project, since most deep learning models require a GPU for parallel computational power. If you expand the CPU platform and GPU section, you will see the types of GPUs available in that location. But you won't be able to add a GPU before quota approval. You will need to request GPU quotas, a step which we are going to show in the following section.
Next, scroll down and click Change in the boot disk section. From the drop-down menu of the operating system, select Ubuntu and its version. If you are familiar with any other operating system, you may select that one, and increase the boot disk size as needed (e.g., 50 GB) for your project (Fig. 6.13).
From the firewall section, check "Allow HTTP traffic" and "Allow HTTPS traffic." After that, expand the "Management, security, disks, networking, sole tenancy" option. Go to the Management tab and copy or write the script from Fig. 6.14 into the Custom metadata "Startup script" section. This script will perform some software installations and updates every time you boot up your instance (Fig. 6.15).
Figure 6.15: Write startup script inside Custom metadata, “Startup script” section.
Then go to the “Disks” tab and check off “Delete boot disk when instance is deleted.”
Finally, click on “Create” to create the computing engine. These steps are shown in Fig. 6.16.
You might see a warning message saying that the zone does not have enough resources available to fulfill the request. Typically, if you don't select any GPU, you will not see this error message. But if you do see an error message, you should try a different zone to create your instance. For example, you may change from us-west1-b to a or c, or to us-central1, us-east1, etc.
When the instance is successfully created, you will see a notification. At this stage, you will need to stop the instance from Compute Engine → VM instances. Select the three dots, as shown in Fig. 6.17, and then select stop. This trick will save some unnecessary billing in your account. You may start the instance only when you run a model or install dependencies.
Figure 6.18: For Quota request, go to Navigation Menu, IAM & Admin, and then select Quo-
tas.
Next, from the Quotas page, select Quota type → All quotas, Service → Compute Engine API, Metric → GPUs (all regions), and Locations → All locations (Fig. 6.20).
These selections are very important. You may also request a GPU in a specific zone. Next, request quotas by selecting EDIT QUOTAS (Fig. 6.21). Write the number of GPUs you want and provide a reason for your request in the request description section before submitting. You will receive an email confirmation and another email reporting your approval status within one hour to two days. It is not guaranteed that you will get the quota you requested; in that case, Google will provide a reason for the denial, and you will need to try again later.
Once your quota gets approved, you will need to update your instance by adding a GPU. If
you want to make any changes in your current instance, make sure to stop the running instance
(deeplearning1) first, and then select the instance. You will find an EDIT button for making
changes in your instance.
Select EDIT and then expand the "CPU platform and GPU" section. Here you will be able to add the number and type of GPU you want. If you have only one GPU quota, you will not be able to add more than one GPU. If you think that you will need multiple GPUs for your project, then you should request that number of quotas as described in the previous section (Fig. 6.21); you are allowed to have more than one GPU quota. In this demonstration, we have selected one NVIDIA Tesla P100 GPU, as shown in Fig. 6.22.
If you don’t find your desired GPU in the drop-down list, you will need to try a different
zone. As long as you have GPU quotas, you can create compute engine instances using GPUs.
Figure 6.23: Select Preemptibility “On” and “Create” to run VM at a lower cost.
Set Preemptibility to "On" under the Availability policy option and select Create. You will see a notification after the successful creation of your instance.
If you create your instance from scratch after quota approval, or make another instance, then follow the previous section; the only difference is that you have to select the GPUs shown in this section. If, for any reason, your instance is not created successfully, try picking a different GPU or zone, or try again later. Once your GPU instance is created, you are ready to run your machine learning model in GCP.
external IP addresses. If you have an existing IP address, you will find "RESERVE STATIC ADDRESS" in the top-middle area. Otherwise, check Fig. 6.24.
You may give a name and description to the reserved static address. Also, check the Network Service Tier, IP version, type, and region, as shown in Fig. 6.25. Make sure to attach the IP address from the drop-down menu to the instance you created; in our example, it is "deeplearning1."
Finally, select "RESERVE" to reserve the IP address.
Figure 6.27: Steps for creating firewall rules in GCP and specifying a port number.
After that, type "yes" and press "Enter" to accept the license terms (Fig. 6.31).
These steps will install related libraries and packages. At the end, you will need to type "yes" and press "Enter" again to confirm and initialize Anaconda in your home directory (Fig. 6.32).
Next, install Microsoft Visual Studio Code (VS Code) by typing "yes" and pressing "Enter" again (Fig. 6.33).
After installing VS Code, type the command shown in Fig. 6.34 to initialize an interactive shell session.
Now the VM instance is ready for running interactive sessions.
This command will open the Jupyter Notebook configuration file for the VM instance. Press "i" to edit the configuration file and add the lines shown in Fig. 6.36 (a typical example is sketched below). The port number used in this example should be the one that you specified in the firewall settings (Fig. 6.27).
After editing the configuration file, press "Esc" and type ":wq" to save and exit (Fig. 6.37). If you are interested in setting up a password now, check the Jupyter Notebook configuration for a public server [61]. You may close your session now.
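The configuration lines of Fig. 6.36 are not reproduced in this text; a typical set, assuming the port 8081 used later in this chapter, looks like the following (the file ~/.jupyter/jupyter_notebook_config.py is itself Python):

    # Typical additions to jupyter_notebook_config.py; a guess at the
    # contents of Fig. 6.36 rather than the book's exact lines
    c = get_config()
    c.NotebookApp.ip = '*'              # listen on all interfaces
    c.NotebookApp.open_browser = False  # do not launch a local browser
    c.NotebookApp.port = 8081           # port opened in the firewall rule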
Now you are ready to launch Jupyter Notebook and execute code on the server. Next, go to the SSH terminal and type the terminal multiplexer command "tmux" (Fig. 6.38).
Figure 6.37: Press “Esc” and type “:wq” in your keyboard to save and exit.
Figure 6.40: Copy the login token shown in Fig. 6.38 and enter it to the token section and then
create a new password.
Figure 6.41: Enter commands in the SSH terminal before accessing Jupyter Notebook through
a webpage.
If you are running the Jupyter Notebook for the first time, you have to copy the login token shown in Fig. 6.38 (red box) and enter it in the browser's token section, as shown in Fig. 6.40. Then you can create a new password.
If your token doesn't work, check the message in the shell window; you probably ended up opening multiple ports. You may restart the VM instance and use the new token to create a password. After the successful installation of the Jupyter Notebook in your GCP, you will be able to write or edit a script and run it in the VM instance through this interface.
Next time when you run the SSH terminal (shown in Fig. 6.28), you will only need to
enter the command lines shown in Fig. 6.41.
After that, enter the URL (http://**.**.***.***:8081) for your server in the web browser. The browser will ask for the password that you created (Fig. 6.40), and then the Jupyter Notebook interface will open.
CHAPTER 7
Practical Implementation through Transfer Learning
Figure 7.1: Schematic of transfer learning using a pre-trained deep neural network, EfficientNet
B7.
and testing data. A preprocessing step will help us call a large volume of images using simple Python code.
For this case study, we made a folder called "input," which contains all the relevant files required to run our model. If you want to reproduce the results, please check Fig. 7.2, which contains a screenshot of our input folder and file names.
Please note that we converted the ".xlsx" files to ".csv" files, and a new column was added to "cars_test_annos.csv" that shows the class of each test image, copied from "cars_test_annos_withlabels.mat." We made this change to check the accuracy of our deep learning model on the testing dataset.
Step 2: Create a directory (e.g., deep_learning1) inside the home directory of the instance. In the command line, type "sudo mkdir /home/deep_learning1" and press "Enter."
Step 3: Give permission to read and copy files inside the deep_learning1 folder. In the command line, type "sudo chmod a+w /home/deep_learning1" and press "Enter."
The "rsync" command is used to automate the synchronization of a local file system directory. You will notice in the command window that all the files and folders of "input" are transferred to the VM instance (Fig. 7.8).
have also created checkpoints to save model weights, and the model parameters that give the best accuracy are always saved inside deep_models. Here, all the images have the same size (224 by 224), and 3 channels are used for red, green, and blue.
The number of samples that go through the model before the parameters are updated (also known as the batch size) is set to 16. If we increase the batch size, the model performance may improve, but it will take a significant amount of memory.
An optimal epoch selection that will not cause overfitting is also important. We have selected 20 epochs, i.e., forward and backward propagation will be done 20 times through the entire dataset. This selection is critical because a low number of epochs might end up underfitting the model. One way to understand the fit is to look at the training and testing accuracy. Note that in this case study, we haven't divided the training data into training and validation sets.
If the training accuracy continuously increases while the testing accuracy decreases, that is an indication of overfitting. For this problem, the gradient descent hyperparameter (also known as the learning rate) is set to 0.01. This learning rate is a standard starting point for many deep learning models.
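A configuration sketch collecting these hyperparameters might look like the following; DefaultConfig is the name used later in this chapter, but the attribute names here are assumptions rather than the book's exact code:

    # Hypothetical configuration object gathering the case study's
    # hyperparameters (values taken from the text above)
    class DefaultConfig:
        img_size = 224        # images are 224 by 224 with 3 color channels
        num_classes = 196     # car classes in the Stanford cars dataset
        batch_size = 16       # samples per parameter update
        num_epochs = 20       # passes through the entire training set
        learning_rate = 0.01  # gradient descent step size

    config = DefaultConfig()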
We are going to run the block of code shown in Fig. 7.10 only once in GCP. Typically, the model needs to be run many times for tuning, but from the next run onward, we don't have to install these dependencies. Also, note that "pip install torch" depends on the version of CUDA that we installed in the startup script.
Figure 7.14: Class object to store the average and current value.
Figure 7.17: Code for reading images and cropping using bounding boxes.
In the following code, we define a "read_images" function that uses the bounding boxes given by Stanford's cars dataset to do the cropping (Fig. 7.17).
Next, we create another function, "augmentor," to augment our training images (Fig. 7.18); a sketch of this idea follows. Many research studies show that augmenting training images improves the learning process. Here, we rotate the images so that the model can view them from different angles and learn better.
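Here is one way such an augmentor could be written with torchvision.transforms; the exact transforms in the book's code may differ:

    from torchvision import transforms

    # Random rotations (plus flips) so the model sees each training
    # image from different angles
    augmentor = transforms.Compose([
        transforms.RandomRotation(degrees=15),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])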
After the image augmentation, we are going to read the training and testing CSV files using the pandas library. We are also going to remove the quotation marks that exist in the original file. This step is required for the model to access the correct classes and bounding-box information of the corresponding images (Fig. 7.19).
Figure 7.18: Image augmentation function to learn better from training images.
Figure 7.19: Read training and testing CSV files, and make consistent column heading names.
After executing the code up to Fig. 7.19, we may check our test and train file heads using the "head" function. It will give us the result shown in Fig. 7.20; as we can see, the quotation marks in the image names have been removed.
Now the files can be accessed easily by the deep learning model using the scripts below. In Fig. 7.21, the train_loader and val_loader load images for training and testing, respectively.
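A sketch of the two loaders named in Fig. 7.21, assuming train_dataset and test_dataset are hypothetical Dataset objects built from the CSV files and cropped images above:

    from torch.utils.data import DataLoader

    # Shuffle the training data each epoch; keep the test order fixed
    train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
    val_loader = DataLoader(test_dataset, batch_size=16, shuffle=False)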
Figure 7.20: Check the names in the test and train file column heads using the “head” function.
Figure 7.22: Model size vs. ImageNet accuracy. EfficientNet-B7 achieves a new state-of-the-art 84.4% top-1 accuracy while being 8.4x smaller and 6.1x faster than GPipe. Also, EfficientNet-B1 is 7.6x smaller and 5.7x faster than ResNet-152 [64].
Figure 7.23: Load pre-trained network EfficientNet-B7. To save the gradients for backpropa-
gation set param.requires_grad = True.
We will need to replace the model's output classes, which are the 196 different car models in this case study. Note that in Fig. 7.25, the fully connected layer output (out_features = config.num_classes) is defined. Now, if you check the model again, it will show that the fully connected layer has 196 output features. After that, run model.cuda( ) to use the default GPU.
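A hedged sketch of these steps using the efficientnet_pytorch package (whose attribute names, such as _fc, may differ from the book's exact code):

    import torch.nn as nn
    from efficientnet_pytorch import EfficientNet

    # Load the pre-trained EfficientNet-B7 and keep gradients for
    # fine-tuning (Fig. 7.23)
    model = EfficientNet.from_pretrained('efficientnet-b7')
    for param in model.parameters():
        param.requires_grad = True

    # Replace the fully connected head with 196 output features (Fig. 7.25)
    model._fc = nn.Linear(model._fc.in_features, 196)
    model.cuda()          # use the default GPU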
Now, we are going to define the Stochastic Gradient Descent (SGD) optimizer using the PyTorch "optim" package. Check the learning rate used in the DefaultConfig section; this learning rate is one of the hyperparameters that we can use to tune the model for our specific problem. A learning rate of 0.01 is a good starting point for many cases. As shown in Fig. 7.26, the loss
Figure 7.25: Change output features of the fully connected layer to 196.
Figure 7.26: Defining the optimizer, learning rate, and loss function.
function was assigned using the "torch.nn" module. Here, we calculate the cross-entropy loss for the training and testing images.
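A minimal sketch of the optimizer and loss definitions (the momentum value is an assumption, not taken from Fig. 7.26):

    import torch.nn as nn
    import torch.optim as optim

    # SGD with the learning rate from DefaultConfig; cross-entropy loss
    # for the 196-class classification problem
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    criterion = nn.CrossEntropyLoss()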
Figure 7.27: Code for training the model and printing the accuracies.
EfficientNet again. If you do that, you will lose the saved weights from the first 2 epochs. You may also update the epoch numbers in the code in Fig. 7.9, but that will affect only the output display during training.
When you run the model for 2 epochs, you will see that the training and testing accuracy improve continuously. After 2 epochs, the testing accuracy is 19.5% (Fig. 7.28).
For the next run, when 20 epochs are used, the testing accuracy starts from 28%, because this time the model uses the best parameters saved from the 2-epoch run. Make sure to set the gradient requirement to "True" and run that portion only (Fig. 7.23) before the second run.
Finally, after 20 epochs, we will get an accuracy higher than 90%. Here, the model saves only the parameters that give the highest accuracy, which is 90.6% for this simulation.
Figure 7.28: Output display for training and testing losses and accuracies.
Figure 7.29: Output display for training and testing losses and accuracies after 20 epochs (learning rate 0.01).
If you have a situation where the training accuracy is increasing but the testing accuracy is decreasing, you will have to stop the training, because that is a sign of overfitting the training data.
Now, to improve the model performance, we can further change some of the training parameters that affect the learning process, such as the learning rate, epochs, number of hidden layers, etc. This is called hyperparameter tuning, and this step should be done carefully while watching how the model is learning. For example, if we change the learning rate from 0.01 to 0.025 (Fig. 7.9) and run the whole simulation as described in this section, we will get a better result (testing accuracy 91.5%). The simulation output for learning rate 0.025 is shown in Fig. 7.30. We may also check different learning rates and hyperparameters to improve the testing accuracy.
At this point, we can say that our deep learning model will be able to predict the make, model, and year of a car with 91.5% accuracy.
Figure 7.30: Output display for training and testing losses and accuracies after 20 epochs (learning rate 0.025).
The testing accuracy defines how well the model can predict the testing dataset, which has 8,041 car images. After training, our model can predict 91.5% of the images correctly. However, this accuracy is based on the testing images provided by Stanford's cars dataset. We may check the model performance by taking a car picture ourselves or by using a random car image from the internet.
We did a Google search for "Volkswagen Golf Hatchback 1991," saved the image, and uploaded it to the GCP cloud home folder using Jupyter Notebook. We also cropped the image before classifying it, since our model was trained on cropped images. Figure 7.31 shows the way to read an image from the home folder (e.g., car_image1.jpg) and resize it for testing using our model.
Now, to test this image through the model, we used the code shown in Fig. 7.32. The model output class here is 190 + 1 = 191; since Python counts from 0, we should add 1. From the cars_meta.csv file, we also find that class 191 is indeed the Volkswagen Golf Hatchback 1991.
We also checked a couple of pre-2013 car model images from the internet, and most of
the time (9 out of 10), the model can predict car make, model, and year correctly.
Figure 7.32: Load the car image and test the model output. The output class 191 (190 + 1) is for the Volkswagen Golf Hatchback 1991 model.
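A sketch of this single-image test (the file name car_image1.jpg follows the text; the transform details and the trained `model` are assumptions):

    import torch
    from PIL import Image
    from torchvision import transforms

    # Read and resize the downloaded image, then add a batch dimension
    img = Image.open('car_image1.jpg')
    x = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])(img).unsqueeze(0).cuda()

    model.eval()
    with torch.no_grad():
        pred = model(x).argmax(dim=1).item()
    print(pred + 1)      # class indices start at 0, so add 1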
If you have followed all the steps up to here, you should also try to check this model's performance on different car images. You may also try testing images randomly from the 8,041 given images (test dataset) to see whether the model can predict the car make, model, and year correctly.
7.9 CONCLUSION
In this short book, we have shown a practical case study on how to use Python and deep learning models for classifying image datasets. The accuracy we have achieved here for multiclass classification is relatively good. Typically, if you spend a significant amount of time on optimization and do a lot of trials with hyperparameters, you will get an accuracy higher than 90%. But it also depends on the type and quality of the dataset you use for training. Good-quality data for deep learning should be large in volume, organized, and clean.
The Python programming used in the case study problem is a little advanced. You might find a gap in difficulty level between the examples shown in the first two chapters and the case study. If you come across any Python library or program syntax that looks unfamiliar, you should review or study it from another source. There are a lot of free online Python books and useful learning resources available that you can use for this purpose.
The goal of this book is to give someone a jump-start on the journey of deep learning without covering too many details and theories. One of the significant challenges in implementing a deep learning algorithm is that it requires a GPU-enabled compute engine. To overcome this obstacle, we presented a cost-effective approach (preemptible instances) for using a GPU in a cloud platform (GCP).
The program shown here is structured in such a way that it can be used with other deep learning models from PyTorch. The best way to use the instructions in this book is to utilize free datasets available online to train a deep learning model. Finally, if you are doing any computer-vision-related project, you may take a lot of pictures yourself and use your own dataset for training.
Deep learning and artificial intelligence are going to make a significant impact on every aspect of our lives. If we can blend our engineering knowledge with this new technology, it will surely assist us in doing many exciting projects and solving prediction-related problems.
Bibliography
[1] G. Piatetsky, Python overtakes R, becomes the leader in data science, machine learn-
ing platform, 2017. https://www.kdnuggets.com/2017/08/python-overtakes-r-leader-
analytics-data-science.html 1
[2] P. Guo, Python is now the most popular introductory teaching language at top
U.S. Universities, https://cacm.acm.org/blogs/blog-cacm/176450-python-is-now-the-
most-popular-introductory-teaching-language-at-top-u-s-universities/fulltext 1
[3] J. M. Perkel, Why Jupyter is data scientists’ computational notebook of choice, Nature,
563(7729):145–146, November 2018. DOI: 10.1038/d41586-018-07196-1. 2
[5] F. Chollet, Deep Learning with Python, Manning Publications Company, 2017. 11, 24
[8] M. Mehdy, P. Ng, E. Shair, N. Saleh, and C. Gomes, Artificial neural networks in image
processing for early detection of breast cancer, Computational and Mathematical Methods
in Medicine, 2017. DOI: 10.1155/2017/2610628. 18
[10] X. Zhou, W. Gong, W. Fu, and F. Du, Application of deep learning in object detection,
IEEE/ACIS 16th International Conference on Computer and Information Science (ICIS),
pages 631–634, 2017. DOI: 10.1109/icis.2017.7960069. 18
[12] Y. Ho and S. Wookey, The real-world-weight cross-entropy loss function: Modeling the
costs of mislabeling, IEEE Access, 2019. DOI: 10.1109/access.2019.2962617. 24
[13] C. C. Aggarwal, Neural Networks and Deep Learning: A Textbook, Springer International
Publishing, 2018. DOI: 10.1007/978-3-319-94463-0. 25, 36
[14] A. Ng, CS229 lecture notes: Supervised learning, 2018. http://cs229.stanford.edu/notes/
cs229-notes1.pdf 26
[15] E. Kayacan and M. A. Khanesar, Chapter 5—Gradient descent methods for type-2 fuzzy
neural networks, Fuzzy Neural Networks for Real Time Control Applications, pages 45–
70, E. Kayacan and M. A. Khanesar, Eds., Butterworth-Heinemann, 2016. DOI:
10.1016/c2014-0-02444-6. 27
[16] S. Lau, Learning rate schedules and adaptive learning rate methods for deep learning,
2017. https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-
rate-methods-for-deep-learning-2c8f433990d1 27
[17] PyTorch documentation—TORCH.OPTIM. https://pytorch.org/docs/stable/optim.
html 27
[18] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, Rethinking the inception
architecture for computer vision, IEEE Conference on Computer Vision and Pattern Recog-
nition (CVPR), pages 2818–2826, 2016. DOI: 10.1109/cvpr.2016.308. 29
[19] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convo-
lutional neural networks, Advances in Neural Information Processing Systems, pages 1097–
1105, 2012. DOI: 10.1145/3065386. 29
[20] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, Inception-v4, Inception-ResNet
and the impact of residual connections on learning, 31st AAAI Conference on Artificial
Intelligence, 2017. 29
[21] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, Densely connected convo-
lutional networks, Proc. of the IEEE Conference on Computer Vision and Pattern Recognition,
pages 4700–4708, 2017. DOI: 10.1109/cvpr.2017.243. 29
[22] D. Han, J. Kim, and J. Kim, Deep Pyramidal Residual Networks, pages 6307–6315, 2017.
DOI: 10.1109/cvpr.2017.668. 29
[23] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016. DOI:
10.1145/2976749.2978318. 29
[24] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature, 521(7553):436–444, May
1, 2015. DOI: 10.1038/nature14539. 29
[25] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to doc-
ument recognition, Proc. of the IEEE, 86(11):2278–2324, 1998. DOI: 10.1109/5.726791.
30
[26] Y. Kim, Convolutional neural networks for sentence classification, Proc. of the Con-
ference on Empirical Methods in Natural Language Processing, August 25, 2014. DOI:
10.3115/v1/d14-1181. 30
[28] D. Ciregan, U. Meier, and J. Schmidhuber, Multi-column deep neural networks for image
classification, IEEE Conference on Computer Vision and Pattern Recognition, pages 3642–
3649, 2012. DOI: 10.1109/cvpr.2012.6248110. 30
[29] X. Liu, Z. Deng, and Y. Yang, Recent progress in semantic image segmentation, Artificial
Intelligence Review, 52(2):1089–1106, August 1, 2019. DOI: 10.1007/s10462-018-9641-
3. 30
[30] J. Kim, V. D. Calhoun, E. Shim, and J. H. Lee, Deep neural network with weight
sparsity control and pre-training extracts hierarchical features and enhances classi-
fication performance: Evidence from whole-brain resting-state functional connectiv-
ity patterns of schizophrenia, Neuroimage, 124(PtA):127–146, January 1, 2016. DOI:
10.1016/j.neuroimage.2015.05.018. 32
[31] Y. Zhao et al., Automatic recognition of fMRI-derived functional networks using 3-D
convolutional neural networks, IEEE Transactions on Biomedical Engineering 65(9):1975–
1984, September 2018. DOI: 10.1109/tbme.2017.2715281. 32
[32] H. Jang, S. M. Plis, V. D. Calhoun, and J. H. Lee, Task-specific feature extraction and
classification of fMRI volumes using a deep neural network initialized with a deep belief
network: Evaluation using sensorimotor tasks, Neuroimage, 145(PtB):314–328, January
15, 2017. DOI: 10.1016/j.neuroimage.2016.04.003. 32
[33] X. Li and X. Wu, Constructing long short-term memory based deep recurrent neu-
ral networks for large vocabulary speech recognition, IEEE. October 15, 2014. DOI:
10.1109/icassp.2015.7178826. 33
[34] A. Graves, A. Mohamed, and G. Hinton, Speech recognition with deep recurrent neu-
ral networks, IEEE International Conference on Acoustics, Speech and Signal Processing,
pages 6645–6649, 2013. DOI: 10.1109/icassp.2013.6638947. 33, 34
[35] D. Güera and E. J. Delp, Deepfake video detection using recurrent neural networks, 15th
IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS),
pages 1–6, 2018. DOI: 10.1109/avss.2018.8639163. 33
[36] A.-I. Marinescu, Bach 2.0—generating classical music using recurrent neu-
ral networks, Proc. Computer Science, 159:117–124, January 1, 2019. DOI:
10.1016/j.procs.2019.09.166. 33
[37] D. Eck and J. Schmidhuber, Learning the long-term structure of the blues, Artificial Neu-
ral Networks, (ICANN), pages 284–289, Berlin, Heidelberg, Springer Berlin Heidelberg,
2002. DOI: 10.1007/3-540-46084-5_47. 33
[38] S. Lawrence, C. L. Giles, and S. Fong, Natural language grammatical inference with re-
current neural networks, Knowledge and Data Engineering, IEEE Transactions on, 12:126–
140, February 1, 2000. DOI: 10.1109/69.842255. 33
[39] M.-J. Zhang and Z.-Z. Chu, Adaptive sliding mode control based on local recurrent
neural networks for underwater robot, Ocean Engineering, 45:56–62, May 1, 2012. DOI:
10.1016/j.oceaneng.2012.02.004. 33
[40] A. Chinea Manrique de Lara, Understanding the principles of recursive neural networks:
A generative approach to tackle model complexity, Artificial Neural Networks–ICANN
2009. DOI: 10.1007/978-3-642-04274-4_98. 35
[41] X. Wang and A. Gupta, Unsupervised learning of visual representations using videos,
IEEE International Conference on Computer Vision (ICCV), pages 2794–2802, 2015. DOI:
10.1109/iccv.2015.320. 35
[42] A. Sagheer and M. Kotb, Unsupervised pre-training of a deep LSTM-based stacked au-
toencoder for multivariate time series forecasting problems, Scientific Reports, 9(1):19038,
December 13, 2019. DOI: 10.1038/s41598-019-55320-6. 36
[46] H. C. Shin et al., Deep convolutional neural networks for computer-aided detection:
CNN architectures, dataset characteristics and transfer learning, IEEE Transactions on
Medical Imaging, 35(5):1285–98, May 2016. DOI: 10.1109/tmi.2016.2528162. 37
[47] R. Marée, P. Geurts, and L. Wehenkel, Towards generic image classification using tree-
based learning: An extensive empirical study, Pattern Recognition Letters, 74:17–23, April
15, 2016. DOI: 10.1016/j.patrec.2016.01.006. 37
[48] O. Russakovsky et al., ImageNet large scale visual recognition challenge, International
Journal of Computer Vision, 115(3):211–252, December 1, 2015. DOI: 10.1007/s11263-
015-0816-y. 37
[49] A. Torralba, R. Fergus, and W. T. Freeman, 80 million tiny images: A large data set for
nonparametric object and scene recognition, IEEE Transactions on Pattern Analysis and
Machine Intelligence, 30(11):1958–1970, 2008. DOI: 10.1109/tpami.2008.128. 37
[50] S. J. Pan and Q. Yang, A survey on transfer learning, IEEE Transactions on Knowledge
and Data Engineering, 22(10):1345–1359, 2010. DOI: 10.1109/tkde.2009.191. 37
[52] Q. V. Le, Building high-level features using large scale unsupervised learning, IEEE In-
ternational Conference on Acoustics, Speech and Signal Processing, pages 8595–8598, 2013.
DOI: 10.1109/icassp.2013.6639343. 38
[53] A. Radford, L. Metz, and S. Chintala, Unsupervised representation learning with deep
convolutional generative adversarial networks, ArXiv Preprint ArXiv:1511.06434, 2015.
38
[54] R. Wan, H. Xiong, X. Li, Z. Zhu, and J. Huan, Towards making deep transfer learn-
ing never hurt, IEEE International Conference on Data Mining (ICDM), pages 578–587,
2019. DOI: 10.1109/icdm.2019.00068. 38
[55] C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang, and C. Liu, A survey on deep transfer
learning, International Conference on Artificial Neural Networks, pages 270–279, Springer,
2018. DOI: 10.1007/978-3-030-01424-7_27. 38
Author’s Biography
TARIQ M. ARIF
Tariq M. Arif is an assistant professor in the Department of
Mechanical Engineering at Weber State University, UT. Prior
to that, he worked at the University of Wisconsin, Platteville,
as a lecturer. Tariq obtained his Ph.D. in 2017 from the Me-
chanical Engineering department of the New Jersey Institute
of Technology (NJIT), NJ. His main research interests are
in the area of artificial intelligence and genetic algorithm for
robotics control, computer vision, and biomedical simulations
of focused ultrasound surgery. He completed his Master's in 2011 at the University of Tokushima, Japan, and a B.Sc. in 2005 at Bangladesh University of Engineering and Technology (BUET). Tariq also worked in the Japanese automobile industry as a CAD/CAE engineer after completing his B.Sc. degree. In his industrial and academic career, Tariq has been involved in many different research projects. Currently, he is working on the implementation of deep learning models for various engineering tasks.