Deep Learning
A Comprehensive Guide
Shriram K. Vasudevan
Subashri Vasudevan
First edition published 2022
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
© 2022 Shriram K. Vasudevan, Sini Raj Pulari, and Subashri Vasudevan.
CRC Press is an imprint of Taylor & Francis Group, LLC
Reasonable efforts have been made to publish reliable data and information, but the author and
publisher cannot assume responsibility for the validity of all materials or the consequences of their use.
The authors and publishers have attempted to trace the copyright holders of all material reproduced in
this publication and apologize to copyright holders if permission to publish in this form has not been
obtained. If any copyright material has not been acknowledged please write and let us know so we may
rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information storage
or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com
or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-
750-8400. For works that are not available on CCC please contact [email protected].
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are
used only for identification and explanation without intent to infringe.
Library of Congress Cataloging‑in‑Publication Data
Names: Vasudevan, Shriram K., author. | Pulari, Sini Raj, author. | Vasudevan, Subashri, author.
Title: Deep learning: a comprehensive guide / Shriram K. Vasudevan,
Sini Raj Pulari, Subashri Vasudevan.
Description: First edition. | Boca Raton: Chapman & Hall/CRC Press, 2022. | Includes index. |
Summary: “Deep Learning: A Comprehensive Guide focuses on all the relevant topics in the
field of Deep Learning. Covers the conceptual, mathematical and practical aspects of all
relevant topics in deep learning. Offers real-time practical examples. Provides case studies.
This book is aimed primarily at graduates, researchers and professionals working in
Deep Learning and AI concepts” – Provided by publisher.
Identifiers: LCCN 2021031713 (print) | LCCN 2021031714 (ebook) |
ISBN 9781032028828 (hardback) | ISBN 9781032028859 (paperback) |
ISBN 9781003185635 (ebook)
Subjects: LCSH: Deep learning (Machine learning)
Classification: LCC Q325.73 .V37 2022 (print) |
LCC Q325.73 (ebook) | DDC 006.3/1–dc23
LC record available at https://lccn.loc.gov/2021031713
LC ebook record available at https://lccn.loc.gov/2021031714
ISBN: 978-1-032-02882-8 (hbk)
ISBN: 978-1-032-02885-9 (pbk)
ISBN: 978-1-003-18563-5 (ebk)
DOI: 10.1201/9781003185635
Typeset in Minion Pro
by Newgen Publishing UK
Access the companion website: www.routledge.com/9781032028828
Contents
Preface, xi
The Authors, xiii
Preface
various DL products, and the same is discussed in Chapter 11. Finally,
Chapter 12 provides a handy reference to interview questions for
aspiring DL candidates.
We would like to thank Sunandhini Muralidharan and Nitin Dantu for
their efforts in shaping this book.
The Authors
Shriram K. Vasudevan
An academician with a blend of industrial and teaching experience spanning 15 years, he has authored/co-authored 42 books for publishers around the world and has written more than 120 research papers in international journals and 30 papers for international/national conferences. He is an IETE Fellow, ACM Distinguished Speaker, CSI Distinguished Speaker, and Intel Software Innovator. He has a YouTube channel – Shriram Vasudevan – through which he teaches thousands of people all around the world.
Recognized/awarded for his technical expertise by Datastax, ACM, IETE, Procter and Gamble Innovation Centre (India), Dinamalar, AWS (Amazon Web Services), Sabre Technologies, IEEE Compute, Syndicate Bank, MHRD, Elsevier, Bounce, IncubateIND, Smart India Hackathon, Stop the Bleed, “Hack Harvard” (Harvard University), Accenture Digital (India), Nippon Electric Company (NEC, Japan), Thought Factory (Axis Bank Innovation Lab), Rakuten (Japan), Titan, Future Group, Institution of Engineers of India (IEI), Ministry of Food Processing Industries (MoFPI – Government of India), Intel, Microsoft, Wipro, Infosys, IBM India, SoS Ventures (USA), VIT University, Amrita University, Computer Society of India, TBI – TIDE, ICTACT, Times of India, the Nehru Group of institutions, Texas Instruments, IBC Cambridge, Cisco, CII (Confederation of Indian Industries), Indian Air Force, DPSRU Innovation & Incubation Foundation, ELGi Equipments (Coimbatore), and so forth. Listed in many leading biographical databases.
Notable honors:
• First Indian to be selected as HDE (Huawei Developer Expert).
• NVIDIA Certified Deep Learning Instructor.
• Winner of the Harvard University “Hack Harvard” Global, 2019 and World Hack, 2019. Winner of 50-plus hackathons.
• Selected as “Intel IoT Innovator” and inducted into the “Intel Software Innovator” group. Awarded “Top Innovator” award – 2018, “Top Innovator – Innovator Summit 2019”.
• World Record Holder – with his sister, Subashri Vasudevan (only siblings on the globe to have authored nine books together: Unique World Record Books).
• Entry in Limca Book of Records for National Record – 2015.
• Entry in India Book of Records – National Record and Appreciation – 2017.
Subashri Vasudevan
Subashri holds an M.Tech in CSE and was associated with Cognizant Technology Solutions for more than eight years. She was a senior developer and has exposure to various .NET technologies and reporting tools. She has coauthored more than twenty-five technical books for publishers around the world, including titles on software engineering, C# programming, C++ programming, and so forth. Her name is featured in the Limca Book of Records for the number of books authored by siblings. She has recently developed an interest in the IoT and ML areas and has begun contributing to projects involving these technologies. Teaching is her passion, and she wants to make technology simpler for students. She also manages a technical YouTube channel (“All about BI”) for Azure-related concepts and has delivered dozens of lectures at various educational institutions.
CHAPTER 1
Introduction to
Deep Learning
LEARNING OBJECTIVES
After reading through this chapter, the reader will understand the following:
1.1
INTRODUCTION
Artificial Intelligence and Machine Learning have been buzzwords for more than a decade now; they are what make a machine artificially intelligent. Growing computational speed and enormous amounts of data have stimulated academics to dive deep and unleash the tremendous research potential that lies within. Even though Machine Learning helped us start learning intricate and robust systems, Deep Learning has entered as a subset of AI, producing incredible results and outputs in the field.
Deep Learning architecture is built very similar to the working of a human brain, whereby scientists teach the machine to learn in a way that humans learn. This definitely is a tedious and challenging task, as the
1.2
THE NEED: WHY DEEP LEARNING?
Deep Learning applications have become an indispensable part of contemporary life. Whether we acknowledge it or not, there is not a single day in which we do not use virtual assistants like Google Home, Alexa, Siri and Cortana at home. We commonly see our parents use Google Voice Search to get search results easily, without the effort of typing. Shopaholics cannot imagine shopping online without the appropriate recommendations scrolling in. We rarely perceive how intensely Deep Learning has invaded our normal lifestyles. We already have automatic cars in the market, like the MG Hector, which can act according to our spoken commands. We already have the luxury of smart phones, smart homes, smart electrical appliances and so forth. We are invariably taken to a new level of lifestyle and comfort with the technological advancements happening in the field of Deep Learning.
1.3
WHAT IS THE NEED OF A TRANSITION FROM
MACHINE LEARNING TO DEEP LEARNING?
Machine Learning has been around for a very long time. It has helped and motivated scientists and researchers to come up with newer algorithms to meet the expectations of technology enthusiasts. The major
[Figure: Machine Learning relies on manual feature extraction followed by learning, whereas Deep Learning performs automatic feature extraction and learning; both pipelines are illustrated with the example of recognizing an apple.]
Deep Learning very closely tries to imitate the structure and pattern of biological neurons. This single concept, though it makes the approach more complex, helps to come out with effective predictions. Human intelligence is supposed to be the best of all types of intelligence in the universe, and researchers are still striving to understand the complexity of how the human brain works. The Deep Learning module acts like a black box: it takes inputs, does the processing inside the box, and gives the desired output. With the help of GPUs and TPUs, it lets us work with complex algorithms at a faster pace. The models developed can be reused for similar future applications.
1.4
DEEP LEARNING APPLICATIONS
As introduced earlier, there is a plethora of scenarios and applications where Deep Learning is being used. Let us look at a few applications of Deep Learning for a more profound understanding of where exactly DL is applied.
FIGURE 1.6 Google helping us by giving recommendations for the search text.
converted as voice signals, from which the algorithms encode and decode what exactly a person has asked for. Viterbi algorithms are commonly used here. This feature is mainly used in Google Search to retrieve the needed data from a server through simple voice input. It is also very commonly seen in subtitles for YouTube videos and multilingual cinema. Even the young people of this era are accustomed to Alexa, Hello Google, Cortana, Siri and so forth. This comes under the wide window of natural speech processing.
1.4.4 Entertainment
Deep Learning is extensively used for abstraction of videos, cinema, and Web series. The technique is widely used for the analysis of video footage in sports and to generate highlights of games (Figure 1.8). Research is going on to produce the best highlights by applying various algorithms. It is also used to extract the key moments in real crime-analysis situations. Humans may miss relevant information hidden in hours and hours of largely inactive video footage, whereas even minor changes can be captured by deep video analysis. This saves people time, energy and effort while still gathering the required information without missing it. Deep Learning also paves the way for editing content effectively, generating content automatically and even creating virtual characters for animated films.
Auto cartoon makers are on the rise, and we have seen researchers superimposing old cartoons with a variety of new sounds to make them more comical. All of this is done using Deep Learning algorithms. These algorithms become effective when trained on data sets consisting of a wide range of video clips of varied lengths. Deep Learning helps in mapping videos to corresponding words, and the algorithms are also responsible for identifying the most relevant parts of a video. There are also algorithms that convert text into video or animated clips.
1.4.5 Healthcare
Healthcare is an area where Deep Learning is used for extremely arduous tasks. Even during the 2020 pandemic, artificial intelligence played a significant role in testing for the novel coronavirus (Figure 1.9). Thus, Deep Learning has proved its presence in the search for cures for hitherto untreatable diseases. Apart from drug discovery, Deep Learning is often used in developing assistance for children with speech disorders, developmental disorders and autism. Deep Learning helps to analyze data and bring early detection of these disorders, which helps those affected in many ways to perform an early course correction and to tackle situations ahead in a much better way. Course correction may contribute to the physical, emotional and mental well-being of the children. There are various research labs that specifically work in this area.
QUIZ
1. Define Deep Learning.
2. Explain how Deep Learning is different from machine Learning.
3. State a few reasons why Deep Learning is significant.
4. List a few real-time applications of Deep Learning that you use on
a daily basis.
5. Can you think of some healthcare applications where Deep Learning
could be useful?
6. If you are asked to help the professor in getting feedback for his class,
how will you engage Deep Learning techniques to get it done auto-
matically? You may use any gadgets you want. Widen your imagin-
ation and come out with the best solution for the professor.
FURTHER READING
✓ Grigorescu, Sorin, et al. “A survey of deep learning techniques for autonomous driving.” Journal of Field Robotics 37.3 (2020): 362–386.
✓ Lateef, Fahad, and Yassine Ruichek. “Survey on semantic segmentation using deep learning techniques.” Neurocomputing 338 (2019): 321–348.
✓ Kanjo, Eiman, Eman M.G. Younis, and Chee Siang Ang. “Deep learning analysis of mobile physiological, environmental and location sensor data for emotion detection.” Information Fusion 49 (2019): 46–56.
✓ Garcia-Garcia, Alberto, et al. “A survey on deep learning techniques for image and video semantic segmentation.” Applied Soft Computing 70 (2018): 41–65.
✓ Bertero, Dario, and Pascale Fung. “A first look into a convolutional neural network for speech emotion detection.” 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2017.
LEARNING OBJECTIVES:
After this chapter, the reader will be able to understand the following:
2.1
INTRODUCTION
Installation of the tools and prerequisites needed to learn Machine Learning or Deep Learning is often viewed as a difficult and daunting task. This view is a myth and deserves to be challenged: the installation process is simple and straightforward. This chapter provides complete information on the installation process with step-by-step inputs. The next major challenge faced by all Deep Learning/Machine Learning aspirants is “Datasets.” Which dataset to use is a question that is frequently asked. This chapter provides details of the most commonly used datasets with links for download. Overall, this chapter serves as a foundation for the complete learning provided to the reader.
2.2
THE TOOLS
Deep Learning is all about tools; it is all practical. So, this chapter discusses the tools and the prerequisites to be installed on your PC to practice the algorithms and implementations. Let us get our hands on!
SciPy is one of the most preferred libraries for statistical operations (Figure 2.3). It is the fundamental library for scientific computing: support for numerical integration, interpolation, optimization, linear algebra, and statistics makes SciPy the preferred option. Again, it is open source, and one can read more about SciPy here: www.scipy.org/. One interesting thing to note is that SciPy uses NumPy internally.
Pandas is a very powerful, flexible, and easy-to-use open-source data analysis and manipulation tool, built on top of the Python programming language (Figure 2.4). It handles different file formats, such as CSV, JSON or even SQL. It also provides support for data manipulation through merging, selecting, and reshaping, and it has a role in data cleaning. One can learn more about Pandas by visiting https://pandas.pydata.org/
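A minimal sketch (not from the book) of typical Pandas operations follows; the file name and column names are illustrative assumptions.

import pandas as pd

# Load a CSV file into a DataFrame (the file name and columns are hypothetical).
df = pd.read_csv("students.csv")          # assumed columns: name, marks, year

# Selecting, filtering, and a simple group-wise summary.
toppers = df[df["marks"] > 80][["name", "marks"]]
mean_by_year = df.groupby("year")["marks"].mean()

# Basic cleaning: drop rows with missing values.
clean_df = df.dropna()

print(toppers.head())
print(mean_by_year)
print(clean_df.shape)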
B. Jupyter Installation
1. For the installation of Jupyter, one should open the Anaconda Prompt. Many beginners make a minor mistake here: they tend to issue the command in the normal command prompt instead of the Anaconda Prompt, and then the installation will not work, as the Anaconda Prompt is the intended route. The screenshots below present the error message one would get if the command is issued in the normal command prompt, which should be avoided. Figure 2.10 presents the command prompt, followed by the error message presented as Figure 2.11 when the command for Jupyter installation is issued.
3. After the installation is complete, one can easily open and access the notebook with the link obtained. Copying and pasting the link into any of your favorite browsers opens the Jupyter Notebook (see Figure 2.14).
Or there is another, easier way to open and access the Jupyter Notebook: just type “Jupyter” in the search box. It will display the Jupyter Notebook launcher option as presented below in Figure 2.15. On clicking, the notebook is launched.
The reader will have the Jupyter launched in a Web browser as presented
below in Figure 2.16.
Now, the installation is complete. It is time to try out the first program.
3. Then, the user is free to type any Python code as required. The option for running (executing) the typed code is also available, and RUN has to be clicked as presented below in Figure 2.20. The output of the first code then appears.
Next, to install Keras, one has to issue the below command in the command prompt. Once successfully installed, the message “Successfully installed Keras” will appear on screen, confirming the completeness of the installation, as shown in Figure 2.22.
> pip install keras
To validate whether the installation has been done properly, there is a sequence to be followed. Open the Jupyter Notebook with the steps taught earlier in the section, through the command prompt or through the search-and-find option. Let us check whether Keras works properly in the Jupyter Notebook by running a simple program using Keras.
So, the first step in the flow is to import Keras. Readers should also understand that the inbuilt datasets from Keras can be used. Here, in the below code snippet, we have imported the MNIST data set from Keras.
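The original snippet is not reproduced in this extract; a minimal sketch of what such a validation program typically looks like is shown below, assuming a recent Keras/TensorFlow installation.

# Import Keras and load the built-in MNIST dataset to confirm the installation works.
import keras
from keras.datasets import mnist

print("Keras version:", keras.__version__)

# MNIST: 60,000 training and 10,000 test images of handwritten digits (28 x 28 pixels).
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print("Training images:", x_train.shape)   # expected: (60000, 28, 28)
print("Test images:", x_test.shape)        # expected: (10000, 28, 28)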
One can uninstall Keras with the following command; refer to the screenshot (Figure 2.24) for easier reference.
> pip uninstall keras
2.3
DATASETS –A QUICK GLANCE
Many dataset sources are available. A few are listed below with the corresponding links in Table 2.1.
QUIZ
1. Describe clearly the procedure for installation of Anaconda.
2. How can someone install Jupyter over Anaconda?
3. How do we create a notebook in Jupyter and how can the same
be run?
4. What is NumPy all about and how is it useful?
5. Why is Pandas so famous and how is it special?
6. How is SciPy handy? List your views.
7. What is the command for the installation of Keras?
8. How is Keras uninstalled?
CHAPTER 3
Machine Learning:
The Fundamentals
LEARNING OBJECTIVES
After reading through this chapter, the reader will be able to understand the following:
3.1
INTRODUCTION
The book is aimed at Deep Learning, and this chapter is about Machine Learning. Why this is needed is the first question a reader would have in mind. The first chapter introduced the reader to the need for Deep Learning and the discussions revolved around it. The present chapter provides the reader with information on Machine Learning concepts and fundamentals
that are essential for any AI aspirant to know. Also, a Deep Learning expert cannot avoid the concepts of Regression, Classification or Clustering. All these are discussed in this chapter, which certainly is an important one. There are cases where one could use ML concepts in Deep Learning based systems. There is a lot of overlap between ML and DL and, hence, the authors have drafted this chapter to embed much important information.
3.2
THE DEFINITIONS –YET ANOTHER TIME
It is good to recollect the definitions once again, here. Let us define what
Artificial Intelligence, Machine Learning, and Deep Learning are about.
• Supervised Learning
• Unsupervised Learning
• Reinforced Learning
• Evolutionary Learning
3.3
MACHINE LEARNING ALGORITHMS
anytime when coming across the same. This is called Supervised Learning! Figure 3.2 shows exactly how Supervised Learning works.
Your data will provide examples for each situation, and the data will also specify the outcome for these situations. Training data is used to build the model, and the model will predict the outcome for new data (this is done with previous knowledge). If not trained, the results will not be as expected (Figure 3.3).
Given a set of data, an unsupervised algorithm will categorize the data points and give an output. Each category is referred to as a cluster. Labels for the clusters have to be manually tagged later. The categorization is based on features in the data points. Unsupervised learning algorithms are also referred to as clustering algorithms. Consider the same use case discussed for supervised learning: here the model takes in all the images and gives out two clusters. One cluster will be “cat,” whereas the other cluster will have all the images of “dog.” When a new image is given to the model, it will be placed in one of the two clusters based on its features.
The baby comes to the rescue again. One can refer to Figure 3.6.
3.4
HOW/WHY DO WE NEED ML?
Well, how do we learn? The following example is handy. When we see dark clouds in the sky, we predict it is going to rain. Simple; this is prediction through the training we received from childhood. The same concept is used in ML: we train, and we get the machine working, repeatedly, with no human intervention.
Netflix recommends the best series based on your taste, with ML working behind the scenes. So, the point is simple: with ML in the picture, a company can identify more opportunities for making good profits.
A scenario: You are browsing about Thailand holidays. You did not book any tickets or even confirm the trip. You log in to Facebook and see posts related to Thailand holidays. This is ML for you!
It helps in finding new business, enhancing profit and avoiding errors or human intervention. No operator linked that Thailand ad to your Facebook page; yet, you got it. One can refer to the YouTube lecture by the authors on all the above discussed topics @ https://youtu.be/sQH_jyEkP-8.
3.5
THE ML FRAMEWORK
Figure 3.7 below is sufficient to understand the flow. Each step needs math, tools, and techniques, which we shall be learning here.
FIGURE 3.7 The ML Framework.
The first step is the collection of data from all the relevant resources. Data is the new fuel; in fact, data is everything. The better the collected data, the better the developed model will be. The second step is cleansing the data: it is like sanitizing the data and making it usable. Outliers should be removed, and the data should be consistent. Next comes model building. Training the model is important, and the better the training, the better the results. Training alone is not sufficient, so the next phase is testing, from which one can gain insights. One can then deploy the model and visualize the results, improving the model as well. This is the flow. One can listen to the video lecture on the ML Framework from the link: https://youtu.be/3QwkFzofzbI
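To make the flow concrete, here is a minimal sketch (not from the book) of the collect–clean–train–test steps using scikit-learn; the dataset and the model choice are illustrative assumptions.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# 1. Collect: load a ready-made dataset (stands in for data gathered from real sources).
X, y = load_iris(return_X_y=True)

# 2. Cleanse: real data would need outlier removal and consistency checks; here we only scale.
X = (X - X.mean(axis=0)) / X.std(axis=0)

# 3. Build and train the model on a training split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = LogisticRegression(max_iter=200).fit(X_train, y_train)

# 4. Test: evaluate on held-out data before deploying or improving the model.
print("Test accuracy:", model.score(X_test, y_test))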
3.6
LINEAR REGRESSION –A COMPLETE
UNDERSTANDING
We need to learn some math (it cannot be avoided any further). Simple Linear Regression is very useful and is one of the most commonly used algorithms in predictive analytics and Machine Learning. We shall understand what Linear Regression is, how to build a model, and so forth. It is a simple technique used to find the relationship between a dependent variable and an independent variable. Simply put, regression tries to establish a clear relationship between input and output.
We now try to draw the relationship between the input and the output, that is, the Independent variable and the Dependent variable (Figure 3.8).
This is the purpose of Linear Regression: it derives a relationship between the Independent and Dependent variables. Now, let us understand the possible cases.
Case 1 – Positive Relationship:
What will be the status when the Independent variable changes? When the Independent variable increases and the Dependent variable increases too, we call the relationship positive. See the picture below:
With regression, one aims at drawing a straight line (it need not always be straight) through the plotted observations. We plot all the observations with dots, and then a line has to be drawn that best fits all the different points; this is called the Regression line. It is drawn through the Least Squares Method. The core aim is to minimize error when drawing the Regression line. The Regression line is presented in Figure 3.11, and now one can clearly understand the purpose of this line. One can also walk through the video lecture presented at the YouTube link, which clearly talks about Linear Regression: https://youtu.be/MdHe7Sn6Qt4
One could recollect the Y = MX + C expression, which was learned during school days. The same comes in handy here.
Y = b0 + b1 * x
• Y = Estimated pay
• x = Number of hours worked
• b0 = Y intercept
• b1 = Slope of the line
• Here there is a positive impact and, hence, it is +b1
As the number of hours increases, the pay should increase. Hence, there is a positive relationship.
Y = b0 − b1 * x
• Y = Estimated pay
• x = Number of hours worked
• b0 = Y intercept
• b1 = Slope of the line
• Here there is a negative impact and, hence, it is −b1.
Step 1: Get the X and Y. Mark the plot. X is the input and Y is the output.
X can be termed as Independent Variable and Y can be termed as
Dependent Variable (Table 3.1).
The immediate next step is to calculate Mean X and Mean Y. Mean X = (1 + 2 + 3 + 4 + 5)/5 = 3. We also need to calculate Mean Y = (2 + 4 + 6 + 3 + 5)/5 = 4.
So, we have arrived at Mean X and Mean Y. It is time for readers to go to
the next step in the sequence. Can we make a plot? Yes, the next step is to
go for the plotting of the Independent variable versus the Dependent variable (Figure 3.15).
The next step is to plot the mean in the graph presented above in
Figure 3.15. One can have a look at Figure 3.16 to understand how the
plotting has happened.
The next step in the flow is to get the slope and the Y intercept. These are to be taken forward from the previously presented graph shown in Figure 3.17.
The above plots have given a visualization of the data in the form of a graph, and it is mandatory to visualize data points in an understandable manner. Now, we need to compute the distances: the distance from each x value to Mean X and from each y value to Mean Y is to be calculated. Table 3.2, presented below, keeps growing step by step and is easier to understand.
Now that the calculation is complete and the table is ready with the information, it is easy to proceed with the formula toward completing the process. There is a useful check to be followed when constructing the table: the sums of the (x − x') and (y − y') columns should always be 0. If nonzero, then there is something wrong in the computation.
Y' = b0 + b1 * X'
4 = b0 + b1 * 3
4 = b0 + 0.5 * 3
4 = b0 + 1.5
b0 = 2.5
b1 = slope = 0.5
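As a quick cross-check (a minimal sketch, not part of the book's listings), the same slope and intercept can be computed with NumPy from the data in Table 3.1:

import numpy as np

# Data from Table 3.1: hours worked (X) and pay (Y).
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 3, 5])

x_mean, y_mean = x.mean(), y.mean()            # 3.0 and 4.0

# Least-squares slope b1 and intercept b0.
b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b0 = y_mean - b1 * x_mean

print("slope b1 =", b1)        # 0.5
print("intercept b0 =", b0)    # 2.5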
The time has come for us to learn what exactly Logistic Regression is. The terms Logistic Regression and Linear Regression are very important and are among the most discussed topics in ML.
3.7
LOGISTIC REGRESSION –A COMPLETE
UNDERSTANDING
Logistic regression is a mathematical model that predicts the probability of occurrence of Y given the information of X, a previous event. Given X, logistic regression predicts whether Y will occur or not. Logistic regression models a binary event, which means Y can be either 0 or 1: Y gets the value 1 if the event occurs and the value 0 if the event does not occur.
Logistic regression is mainly used for classification applications like spam email detection or diabetes detection for a person based on various features provided, and so forth. Popular applications include spam detection and customer choice prediction – that is, will the customer click a particular link or not? Will the customer buy the product or not? – as well as diabetes/cancer prediction and more.
Linear Regression gives a continuous output, while logistic regression provides a discrete output. Linear Regression fits a straight line, while Logistic Regression uses a sigmoid function. To concisely convey: Linear Regression predicts continuous values, whereas Logistic Regression predicts the probability of a discrete (binary) outcome.
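A minimal sketch (not from the book) of a logistic-regression classifier with scikit-learn is given below; the tiny study-hours dataset is made up purely for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up data: hours of study (X) and whether the student passed (1) or failed (0).
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# The model outputs a probability (via the sigmoid), which is thresholded at 0.5.
print("P(pass | 4.5 hours) =", model.predict_proba([[4.5]])[0, 1])
print("Predicted class for 4.5 hours:", model.predict([[4.5]])[0])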
3.8
CLASSIFICATION –A MUST-KNOW CONCEPT
The first question normally raised is this: What is the difference between Regression and Classification? We explain this point first.
To start with, Regression and Classification both come under Supervised Learning algorithms (yes, supervised, labeled). Both have extensive usage in ML, and both use labeled data sets. Then, where are they different? They differ in the problems they solve.
As discussed, Regression predicts continuous values – salary, marks, age, and so forth. Classification classifies things – male/female, pass/fail, false/true, spam/legitimate, and so forth (it classifies, and that is it). Classification divides the given data set into classes based on the parameters considered. An example can be very helpful.
Gmail is the best example. Gmail classifies email as legitimate or spam. The model is trained with millions of emails and has many parameters in consideration. Whenever a new mail pops up, it is classified as “Inbox, Spam, Promotions or Updates.” If spam, it goes to the spam box; if legitimate, it goes to the Inbox. There are many famous and frequently used classification algorithms. They are listed as follows:
It is good to understand and learn all of these, but that would be beyond the scope of this book. So, we handpick Support Vector Machines (SVM) and K-Nearest Neighbors for the discussion.
That is it. One can now understand that all the red stars and green triangles are grouped appropriately based on the hyperplane. It is now time for readers to quickly navigate to the next one in the must-know list: K-NN, the K-Nearest Neighbor algorithm. The complete implementation of SVM and a quick lecture on the same can be found @ https://youtu.be/Qd9Aj_EMfk0
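For readers who want a quick hands-on feel before moving to K-NN, here is a minimal SVM sketch (not the book's own implementation) using scikit-learn; the toy two-class data is made up for illustration.

import numpy as np
from sklearn.svm import SVC

# Two made-up classes of 2-D points (think "red stars" vs. "green triangles").
X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# A linear kernel finds the separating hyperplane with the maximum margin.
clf = SVC(kernel="linear").fit(X, y)

print("Predicted class for (3, 2):", clf.predict([[3, 2]])[0])
print("Predicted class for (7, 6):", clf.predict([[7, 6]])[0])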
• It is referred to as non-parametric.
• It is also said to be a Lazy Learner algorithm.
By now, readers should have understood the way K-NN works. One must note that keeping very low K values should be avoided, as the prediction could go wrong.
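A minimal K-NN sketch (again, not the book's implementation) with made-up data is shown below.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up data: two features per sample and a binary label.
X = np.array([[1, 1], [1, 2], [2, 2], [6, 6], [7, 6], [6, 7]])
y = np.array([0, 0, 0, 1, 1, 1])

# K = 3: a new point is assigned the majority class of its 3 nearest neighbours.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print("Predicted class for (2, 1):", knn.predict([[2, 1]])[0])
print("Predicted class for (6, 5):", knn.predict([[6, 5]])[0])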
Advantages
• Simple.
• The more the data, the better the classification.
Disadvantages
One can receive a brief note from the authors by listening to https://youtu.be/nVgZbVUmh50.
The next topic for discussion is “Clustering.” It is one of the very
important areas to learn and is interesting as well.
3.9
CLUSTERING –AN INTERESTING CONCEPT
TO KNOW
To make it simple, clustering is a method or technique to group data into clusters. The objects inside a cluster should have high similarity (example: medical students – the first year is one cluster; the second year is another cluster, etc.). A cluster's objects should be clearly dissimilar to the objects of another cluster (example: engineering students' first year forms another cluster when compared with medical students' first year). These two clusters are disjoint. Clustering divides the complete data into multiple clusters. This is a non-labeled approach, so it can be called an unsupervised approach. One can understand clustering by referring to Figure 3.25.
• First Centroid = O2 – this shall be Cluster 1 (O2 = First Centroid = 1, 2, 2)
• Second Centroid = O6 – this shall be Cluster 2 (O6 = Second Centroid = 2, 4, 2)
Can we not choose any other object as centroid? This is a very common question, and the answer is that any object can become a centroid. The next question to be answered is: How do we measure distance? There is a formula, and it comes to the rescue:
d = |x2 − x1| + |y2 − y1| + |z2 − z1|
It is time to reconstruct the table, and one has to use the distance between each object and the centroids chosen.
Like O1, O2 and O3, the rest of the calculations to find the distance from C1 and C2 are to be computed. One can refer to Table 3.4 for a clearer understanding.
The next step is to go ahead with the clustering. How can that be achieved? Simple: based on the distance, whichever centroid is closer wins. Say C1 is closer than C2 for an object; then the object falls into cluster C1. Hence, the clustering looks like:
Objects X Y Z
O1 1 4 1
O2 1 2 2
O3 1 4 2
O4 2 1 2
O5 1 1 1
O6 2 4 2
O7 1 1 2
O8 2 1 1
The next round, that is, the next iteration, has to be started.
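To make one iteration concrete, here is a minimal sketch (not from the book) that computes the Manhattan distances from each object in the table above to the two chosen centroids (O2 and O6) and assigns each object to the nearer one.

import numpy as np

# Objects from the table above (X, Y, Z coordinates).
objects = {
    "O1": (1, 4, 1), "O2": (1, 2, 2), "O3": (1, 4, 2), "O4": (2, 1, 2),
    "O5": (1, 1, 1), "O6": (2, 4, 2), "O7": (1, 1, 2), "O8": (2, 1, 1),
}
c1 = np.array(objects["O2"])   # first centroid
c2 = np.array(objects["O6"])   # second centroid

# Manhattan distance d = |x2-x1| + |y2-y1| + |z2-z1| to each centroid,
# then assign each object to the nearer centroid (one K-Means iteration).
for name, coords in objects.items():
    p = np.array(coords)
    d1 = int(np.abs(p - c1).sum())
    d2 = int(np.abs(p - c2).sum())
    cluster = "C1" if d1 <= d2 else "C2"
    print(f"{name}: d(C1)={d1}, d(C2)={d2} -> {cluster}")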
QUIZ
1. Define Machine Learning.
2. Define Deep Learning.
3. Where will one use Deep Learning or Machine Learning?
4. How is regression useful?
5. What is Linear Regression?
6. How is Linear Regression different from Logistic Regression?
7. Differentiate clustering and classification.
8. Explain clearly how SVM works?
9. Explain how K-Means clustering functions.
FURTHER READING
✓ Williams, D., & Hill, J. (2005). U.S. Patent Application No. 10/939,288.
✓ Jordan, M.I. and Mitchell, T.M., 2015. “Machine learning: Trends, perspectives, and prospects.” Science, 349(6245), pp. 255–260.
✓ Goodfellow, I., Bengio, Y. and Courville, A., 2016. “Machine learning basics.” Deep Learning, 1, pp. 98–164.
✓ Mohri, Mehryar, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, 2018.
LEARNING OBJECTIVES
After this chapter, the reader shall be able to understand the following:
4.1
INTRODUCTION
Artificial Intelligence (AI) has become an indispensable part of human life. We have smart phones, smart TVs, smart water heaters, smart air conditioning, even complete smart homes. Everything around us is smart after the advent of AI. In the previous chapters, we have already talked about what exactly AI is and how it differs from Machine Learning and Deep Learning.
Deep Learning helps the machine learn adaptively and respond to unprecedented scenarios the way a human being responds. This capability is brought to a machine by building and training various models. These models can have several abstracted layers, which help in extracting the required features for the decision-making layer to correctly classify and predict the most effective output.
4.2
ARTIFICIAL NEURON
It is a universal rule in computer science that anything should follow an input–output process model, and it is the same for Deep Learning. Why does Deep Learning have layers? We have already talked about this in an earlier scenario: we need Deep Learning to help machines deal with unprecedented scenarios as human beings do. That is great, but have we ever thought about how a human brain works? Is it so simple that anyone and everyone can understand it? Definitely not, which is why serious research is still going on, with neuroscientists trying to unveil the processes inside the human brain. If we need machines to work like human brains, it has something to do with neurons and their structure. Yes, now we understand that Deep Learning has neural networks inside.
A biological neuron has dendrites to collect data from other neurons, and the data is processed by the nucleus. The processed information is passed through the transmission channel, the axon, to the terminals for passing
the information to the next neuron, and the chain continues. This is how a biological neuron works. Deep Learning tries to imitate the structure of a neuron in our brain, which is why a simple neural network can be depicted as in Figure 4.2.
This picture can be mapped to a simple neural network as simple as having one set of inputs, one hidden layer and one output layer. For your initial understanding, the hidden layer is where the processing happens. The hidden layers are the abstraction layers, which help in performing the extraction of features. We will learn about this in detail in upcoming chapters.
4.2.2 Perceptron
A neuron is the basic element in any artificial neural network. Here, it is important to talk about the term perceptron, which was coined by Frank Rosenblatt in 1957. The perceptron is the unit in the artificial neural network that acts as the computational unit for extracting features. This unit also acts as the major business logic to classify or predict from the input data fed to the system. This can be depicted as in Figure 4.3.
w1x1 + w2x2 + … + wnxn
F(X) = Y
[Figure: the sigmoid activation function, whose output lies between 0 and 1.]
TanH is another function, which is used often. They are the simplest
activation functions. Refer to Figure 4.5 to understand TanH.
[Figure 4.5: the tanh activation function (plotted alongside sinh(x) and cosh(x)), whose output lies between −1 and 1.]
F(x) = max(0, x)
[Figure: the ReLU activation function, f(x) = max(0, x).]
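To tie the perceptron's weighted sum to these activation functions, here is a minimal NumPy sketch (not from the book); the inputs, weights and bias are made up for illustration.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # squashes z into (0, 1)

def tanh(z):
    return np.tanh(z)                  # squashes z into (-1, 1)

def relu(z):
    return np.maximum(0, z)            # f(z) = max(0, z)

# A single perceptron: weighted sum w1x1 + w2x2 + ... + wnxn plus a bias,
# followed by an activation function.
x = np.array([0.5, -1.2, 3.0])         # made-up inputs
w = np.array([0.4, 0.7, -0.2])         # made-up weights
b = 0.1                                # made-up bias

z = np.dot(w, x) + b
print("weighted sum z =", z)
print("sigmoid(z) =", sigmoid(z))
print("tanh(z)    =", tanh(z))
print("relu(z)    =", relu(z))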
4.2.4 Parameters
In any Deep Learning neural network there are two different types of parameters: one type is the model parameter, and the other is the hyper parameter. Model parameters are those parameters in the model that are identified automatically by the system from the training data. Hyper parameters are those parameters that are adjustable and have to be tuned to obtain the best performance of the built model. Hyper parameters govern the whole neural network process. Some of the hyper parameters include the number
of hidden layers, which determine the network structure. The learning rate is a hyper parameter that controls how the network is trained. Selection of optimal hyper parameters plays a significant role in the whole process. If the learning rate is too small, the network consumes a lot of training time to reach the global optimum. On the other hand, if the learning rate is too large, training diverges, and the global optimum is hardly reached. So, what is the learning rate? The learning rate is a tuning parameter in an optimization function wherein the size of each step decides how fast or slowly the network converges to a global optimum. Why do we need an optimization function in Deep Learning, and what is the use of it? In addition, what exactly do we mean by global optimum?
An optimization function is used in a neural network to reduce losses. Optimizers tune parameters such as weights and learning rates in such a way that the model built is accurate with less loss. Optimization algorithms help to fine-tune the learning rate in a neural network so as to reach a global optimum. The global optimum can be considered the best possible solution found by an optimization function from among multiple local optima, as shown in Figure 4.8.
[Figure 4.8: a loss surface showing two local optima and the global optimum.]
4.2.5 Overfitting
So, what is overfitting of data? When the model is trained too elaborately on a dataset, it starts including every point, thus resulting in overfitting. This mainly happens when the model is unable to differentiate the signal from the noise and learns even the noise in the training data, which in turn has a negative impact on overall training. Overfitting is presented in Figure 4.9. The model is trained many times as it tries to fit most of the data. This affects the efficiency of the model in an adverse manner, giving many false positives. Regression is prone to overfitting.
functions help to identify and stop training when the model starts giving a certain accuracy. This acts as a checkpoint to adjust the learning rates after each epoch and helps keep the model from overfitting.
4.3
A FEW MORE TERMS
It is important to know the difference between other terminologies that will be used frequently in the rest of this volume's chapters. Batch size, training step, and epoch are such terms.
Batch size can be explained with an example. Consider our dataset as being very big – say 1 million images. It is extremely hard for the computing system to train on all 1 million images in one go. Therefore, the entire data can be divided into chunks so that the processing done by systems using a CPU, GPU, or TPU goes well. Say a batch size of 25,000 at one go will make it easier to process each step faster. This explains the significance of batch size. Batch size will not affect the time taken for training; instead, it has a positive impact on model performance. A training step is the processing of one batch: in the above example, 25,000 images are processed at once, and the gradients of the model are updated after that batch is processed. This is also called an iteration.
For example, if the dataset contains 1 million samples and the batch size is 25,000 images, then the number of steps to complete one pass over the data is 1 million/25,000, which is 40 steps. So, one epoch is the number of steps a model takes to complete the processing of the whole dataset once. Relating batch size, number of steps, and epochs, we can formulate as follows:
Number of steps per epoch = Total number of training samples / Batch size
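A tiny sketch (illustrative only) of the relationship just described:

# Relationship between dataset size, batch size, and steps per epoch.
total_samples = 1_000_000
batch_size = 25_000

steps_per_epoch = total_samples // batch_size
print("Steps per epoch:", steps_per_epoch)                  # 40

epochs = 5
print("Total training steps:", steps_per_epoch * epochs)    # 200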
4.4
OPTIMIZERS
It is time to examine optimization algorithms, which iterate many times, fine-tuning the various parameters to bring out the best from the model developed. There are various optimization algorithms, and their evolution in Deep Learning has unfolded in a beautiful manner. Optimizers define how neural networks learn: they help to adjust the values of the parameters such that the loss function is at its minimum, as shown in Figure 4.10.
θ = θ − α · ∇θ J(θ)
where θ is the current position, the minus sign in the formula above indicates the negative (descent) direction, α indicates the step size (learning rate), ∇θ J(θ) is the gradient at the current position and, finally, J(θ) is the cost function, which has to be minimized.
In order to overcome the issue in gradient descent, the only solution is to update the parameters more frequently. How can we do that? It is possible with an improvised version of the gradient descent algorithm named the Stochastic Gradient Descent algorithm. Let us see what happens here.
For every epoch,
  For every data point in a sample,
    θ = θ − α · ∇θ J(θ)
This updates the weights considering each sample individually, which can make things worse, as the update is biased by every single sample. Therefore, came the next
θ = θ − α · ∇θ J(θ)
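A minimal sketch (not the book's code) of plain gradient descent on a one-dimensional cost function J(θ) = (θ − 3)², whose global minimum is at θ = 3; the learning rate and starting point are made up.

# Gradient descent on J(theta) = (theta - 3)^2, whose minimum is at theta = 3.
def grad(theta):
    return 2 * (theta - 3)      # dJ/dtheta

theta = 0.0                     # made-up starting point
alpha = 0.1                     # made-up learning rate (step size)

for step in range(50):
    theta = theta - alpha * grad(theta)   # theta := theta - alpha * gradient

print("theta after 50 steps:", round(theta, 4))   # close to 3.0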
QUIZ
1. Define ANN.
2. Explain Perceptron.
3. What is the need of activation functions?
4. Which activation function deals with the issue of dead ReLU?
5. What is the Vanishing Gradient Problem? How can we overcome it?
6. What is an optimization function?
7. What are your thoughts about selecting the right optimization
function?
8. Define the terms epoch and batch size.
9. What is the major problem that Regression analysis suffers from?
10. Differentiate Linear Regression and Logistic Regression.
FURTHER READING
✓ Saxe, Andrew, Stephanie Nelli, and Christopher Summerfield. “If deep learning is the answer, what is the question?” Nature Reviews Neuroscience 22.1 (2021): 55–67.
✓ Wang, Xiaoyu, and Martin Benning. “Generalised Perceptron Learning.” arXiv preprint arXiv:2012.03642 (2020).
✓ Jagtap, Ameya D., Kenji Kawaguchi, and George Em Karniadakis. “Adaptive activation functions accelerate convergence in deep and physics-informed neural networks.” Journal of Computational Physics 404 (2020): 109136.
✓ Bingham, Garrett, William Macke, and Risto Miikkulainen. “Evolutionary optimization of Deep Learning activation functions.” arXiv preprint arXiv:2002.07224 (2020).
CNN – Convolutional
Neural Networks:
A Complete
Understanding
LEARNING OBJECTIVES
After this chapter, the reader shall be able to understand the following:
5.1
INTRODUCTION
When someone claims they know Deep Learning concepts, the first and foremost questions will be about CNN – Convolutional Neural Networks. Yes, it is that important, and it acts as the foundation for the rest of the concepts dealt with in the subsequent sections. It is easier to learn all these concepts if there is some fundamental image-processing knowledge; however, the authors have explained the concepts well, and step by step. All the code presented in this chapter is written and tested with Anaconda, through Jupyter. Readers are requested to follow the instructions and guidelines presented in Chapter 2 to install the required software tools. All of them are open source, and the reader need not buy anything.
5.2
WHAT IS UNDERFITTING, OVERFITTING AND
APPROPRIATE FITTING?
Before getting into the core concepts of CNN, it is important to learn some fundamentals and terminology that will be used repeatedly in this chapter and beyond. One such term is fitting.
What is underfitting? The line does not cover all the points, as shown in Figure 5.1, presented below. This is called underfitting. Some also refer to it as “high bias”.
What is overfitting? The graph shows the predicted line covering all the points. Is this not perfect and okay to go with? No; it is ideally not possible. The line covers all the points, which means it also captures the noise and outliers. So, this is not a good approach, and such a model will certainly give poor results. Avoiding this is mandatory! This is a “high variance” approach. Refer to Figure 5.2 to understand the same.
What is correct fit? The name says it all! It is the perfect fit. This will not have high bias/variance. It covers the majority of the points. The plot shown in Figure 5.3 presents the correct fit representation.
One could make a note of the term bias in the above explanation. Yes,
correctly observed. The next topics for discussion are bias and variance.
5.3
BIAS/VARIANCE –A QUICK LEARNING
Bias is how far the predicted values are from the actual values. If the average predicted values are far off from the actual values, then the bias is high. This is what is seen with underfitting, and it is to be avoided!
What is the result of high bias? The model is said to be too simple and does not capture all the complexity of the data, leading to underfitting.
Variance occurs when the model performs very well on the trained dataset but does not do well on a dataset that it has not been trained on, like a test dataset or validation dataset. Variance tells us how far the predicted value is scattered from the actual value.
What is the result of high variance? Noise/outliers would be included, and this is regarded as overfitting, which should also be avoided.
Having understood the fundamental terminology, it is important to move promptly to Convolutional Neural Networks.
5.4
CONVOLUTIONAL NEURAL NETWORKS
CNN is an ANN – an Artificial Neural Network. A prominent application of CNN is image analysis; it is very well suited to and useful in that area. As conveyed earlier, it is better if the reader possesses image-processing knowledge, but if not, matrix-multiplication knowledge is sufficient.
CNN can be seen as an ANN with some specialization: it detects patterns in images. CNN has hidden layers called convolutional layers. There can be one or more hidden layers, and there are non-convolutional layers too, but the basis is the convolutional layers, which perform the convolution operation.
Like any other neural network architecture, CNN also has multiple hidden layers; these are called convolutional layers (some of them could be non-convolutional as well). One can study the pictorial representation in Figure 5.4 to understand the aforesaid explanation.
When the edges are detected (filtered), we term it edge-detection filtering. Similarly, we can detect squares, rectangles, corners, and so forth, all through filters. Smoothing, sharpening, and so forth also come under this category. We shall see a simple example below in Figure 5.6. The input image has been operated on with filters, and the respective results are provided below. The deeper the network, the more sophisticated the filters should be; the more sophisticated they become, the better the detection and results. For instance, from the image presented one can even detect the IC, cable, battery, and so forth.
FIGURE 5.6 An image: filter operations.
Can we take an example of the same? Yes: a 5 x 5 input matrix is to be convolved with the chosen filter. An example is handy, and we can do it now.
The input image (as a matrix) assumed is presented below (Figure 5.8) as a 5 x 5 matrix.
1 1 1 0 1
0 1 0 0 1
1 1 1 0 1
0 1 0 1 0
0 0 0 1 0
FIGURE 5.8 Input image as a 5 x 5 matrix.
Step 1
One can see from the math below (Figure 5.10) that the filter has convolved (hovered) over the input image and, during the first step, that is, the first stretch of convolution, the result “0” has been updated in the resultant matrix shown on the right side of the image. Similar convolving will happen across the entire input image matrix, and the resulting cells will be updated accordingly. One can better understand this by having a look at Step 2.
FIGURE 5.10 The 3 x 3 filter convolving over 5 x 5 input image.
Steps 2 and 3
Readers can look at Figure 5.11 to understand the next step in the convolution process. Readers need to visualize the movement of the filter to the right in the input image by a pixel (i.e., by a cell). Again, matrix multiplication has to be carried out, and the result is to be updated in the result matrix, as shown on the right side of the figure below. Similarly, the filter has to move further, as shown in Figure 5.12.
FIGURE 5.11 The 3 x 3 filter convolving over 5 x 5 input image –second step.
FIGURE 5.12 The 3 x 3 filter convolving over 5 x 5 input image –Third step.
Now one can observe that the first row is completed, and the edges of the matrix have been reached; that is, there cannot be any further movement to the right in the input matrix. Similarly, the next row will be processed, and one can understand the entire flow for the second row by referring to Figure 5.13.
FIGURE 5.13 The 3 x 3 filter convolving over 5 x 5 input image –second row.
Step 4
Finally, the last set of convolutions happens. As in the previous two steps, convolution goes on here, and the final resultant matrix is obtained (Figure 5.14).
FIGURE 5.14 The final row of convolution operation.
• Zero Padding
• Stride
• Depth
The input image is convolved with the filter, giving us the output shown in Figure 5.15. The input is a 5 * 5 matrix, which gets convolved with a 3 * 3 filter here. One can see that the dimension of the input image gets reduced to 3 * 3 after convolving. There is a definite shrink happening. Why is this? A 3 * 3 filter can fit in only 3 * 3 positions within the input image, and this can be understood clearly with the following Figure 5.16.
FIGURE 5.16 Edges play a vital role.
The output size works out as (n − f + 1) * (n − f + 1), where
• n * n – input size
• f * f – filter size
Here,
• 5 * 5 – input size
• 3 * 3 – filter size
So, the output size = (5 − 3 + 1) * (5 − 3 + 1). So, 3 * 3 is the resultant image size!
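A minimal NumPy sketch (not from the book) of this "valid" convolution reproduces the shrinking; the 3 x 3 filter values below are made up, since the book's filter (Figure 5.9) is not reproduced in this extract.

import numpy as np

# The 5 x 5 input image from Figure 5.8.
image = np.array([[1, 1, 1, 0, 1],
                  [0, 1, 0, 0, 1],
                  [1, 1, 1, 0, 1],
                  [0, 1, 0, 1, 0],
                  [0, 0, 0, 1, 0]])

# A made-up 3 x 3 filter (the book's actual filter is not shown here).
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

n, f = image.shape[0], kernel.shape[0]
out = np.zeros((n - f + 1, n - f + 1), dtype=int)   # output is (n - f + 1) x (n - f + 1)

# Slide the filter over every position, multiply element-wise and sum.
for i in range(n - f + 1):
    for j in range(n - f + 1):
        out[i, j] = np.sum(image[i:i + f, j:j + f] * kernel)

print(out.shape)   # (3, 3): the 5 x 5 input has shrunk to 3 x 3
print(out)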
If the input image size is 20 * 20, this shrinking may be okay. But consider the case we have here: the input image size is only 5 x 5, and if we have multiple filters and many convolutions need to happen, imagine the size of the final image and the amount of information we could have lost at the edges.
More filters – deeper and deeper – the output becomes smaller and smaller; hence, we end up with meaningless results. So, what is the solution? Zero Padding is the solution. This is explained technically below for easier understanding. When the input image matrix is padded with zeros (one layer of zeros is chosen for the example), as shown below in Figure 5.17, it is referred to as zero padding.
Again, the output size is (n − f + 1) * (n − f + 1), where
• n * n – (padded) input size
• f * f – filter size
Considering the input matrix as shown in Figure 5.17, with the filter size 3 x 3, the result would be:
• 7 * 7 – input size
• 3 * 3 – filter size
So, the output size = (7 − 3 + 1) * (7 − 3 + 1). So, 5 * 5 is the resultant image size! Hence, we retained the size (dimension) of the image.
This is how the size, that is, the dimension of the image, can be retained through zero padding. To bring better understanding, the complete convolution with the zero-padded input matrix is presented below. The filter to be used for convolution is the 3 x 3 filter presented below (Figure 5.18):
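The code snippet referred to in the next paragraph is not reproduced in this extract; a minimal sketch of what such a snippet typically looks like is given below, assuming a Keras Conv2D layer with padding set to 'valid' (no padding).

# Minimal sketch: a Conv2D layer with padding='valid', so dimensions shrink.
from keras.models import Sequential
from keras.layers import Conv2D

model = Sequential([
    Conv2D(filters=1, kernel_size=(3, 3), padding="valid", input_shape=(5, 5, 1)),
])
model.summary()   # output feature map is 3 x 3: the 5 x 5 input has shrunk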
The above code snippet carries out the convolution with the specified filter size, and when the padding option is set to valid, no padding happens, and the dimensions will certainly be reduced; that is, shrinking happens. One can run the same code in Jupyter with all installations properly done. The result of running the above code snippet is presented below (Figure 5.19), where the reduced dimensions are circled.
Having seen this option, the next step is to see whether zero padding works well when enabled in the code.
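Again, the original snippet is not reproduced here; a minimal sketch with padding changed to 'same' (an assumption about the book's code) is:

# Minimal sketch: the same Conv2D layer, now with padding='same' (zero padding enabled).
from keras.models import Sequential
from keras.layers import Conv2D

model = Sequential([
    Conv2D(filters=1, kernel_size=(3, 3), padding="same", input_shape=(5, 5, 1)),
])
model.summary()   # output feature map stays 5 x 5: dimensions are retained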
One can notice that padding is set to same, which means the padding is enabled, and the dimensions should be retained. The output below makes this understanding clearer (Figure 5.20): the dimensions remain consistent, and no change is reported.
YouTube link for Zero Padding and Max Pooling implementation: https://youtu.be/mh_yYOpdBLc
Depth –As conveyed earlier, depth is fundamentally based on the
number of filters used. When someone uses n different filters, the depth of
the feature map is also n.
Stride –We moved the matrix over the input image, right? That sliding
is important here. Moving one pixel at a time corresponds to Stride 1. An
example: One can refer to Figure 5.21 to understand the striding concept.
One can understand the complete flow of how max pooling works from Figure 5.22; the 4 x 4 matrix has been reduced to 2 x 2 through the max pooling process. This definitely reduces the computational load (the fewer pixels to handle, the easier the computation). It also helps in avoiding overfitting. There are two more types of pooling, Average Pooling and Sum Pooling, and a brief note on these is presented below.
Average Pooling – This is like max pooling, but with a slight deviation in approach: instead of taking the maximum value from the identified region, the average of all the values in the region is taken, hence the name average pooling. It is not preferred over max pooling, as it fails with the detection of sharp edges and other complex features.
Sum Pooling – This again is a variation of max pooling. Here, instead of the average or maximum value, the sum of all the pixels in the chosen region is calculated. Sum pooling, too, is preferred only after max pooling in applications.
YouTube link for Max Pooling – https://youtu.be/0uwStkFys-I
A simple Python code with Keras is presented below as a code snippet, and the corresponding output is presented. One can see how max pooling has halved the size of the input dimension.
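The snippet itself is missing from this extract; a minimal sketch of what it presumably looks like, assuming a Keras MaxPooling2D layer, is:

# Minimal sketch: max pooling with a 2 x 2 window halves each spatial dimension.
from keras.models import Sequential
from keras.layers import MaxPooling2D

model = Sequential([
    MaxPooling2D(pool_size=(2, 2), input_shape=(4, 4, 1)),
])
model.summary()   # the 4 x 4 input is reduced to 2 x 2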
One can see from the above execution result that the size is reduced
by half (Figure 5.25).
The first fully connected layer – This layer takes the inputs from the feature analysis and applies weights to predict the correct label.
Fully connected output layer – This gets us the final probabilities for each label. Here is where one gets the final output.
An activation function (the sigmoid, in this case) transforms this number to a value close to 0 or 1. One can understand the transition by referring to the graph in Figure 5.27.
FIGURE 5.27 The sigmoid curve: inputs on the horizontal axis are mapped to values between 0 and 1.
Readers can try this out, step by step in their machines and visualize the
output as well.
Prerequisites for the code to be run:
• One must have the Keras libraries properly installed (issue pip install keras or conda install keras in the conda prompt).
• While running the code, one may get the following error:
ImportError: Could not find 'nvcuda.dll'
This can be rectified by visiting www.dll-files.com/nvcuda.dll.html, where one can download the .dll file and save it in the directory C:\Windows\System32.
• One should also install the following, as they will be used later. The entire model is to be run on TensorFlow and, hence, it needs to be installed as well. The commands are presented for ready reference:
pip install protobuf
pip install tensorflow
The dataset
Whenever a model is built, it is to be tested with an appropriate dataset for
its functioning. The CNN model to be developed is first to be used to clas-
sify the input image as a cat or dog. The dataset collected is stored appro-
priately in the local drive of your choice (Figure 5.28). One can see the
test_set, training_set directories shown below. Also, cat_or_dog_1, cat_
or_dog_2 are used as the input images to be classified by the CNN model
developed.
It can be seen that test_set and training_set are present in the dataset
location. One should understand the fundamental idea of having the test
and training images. Below Figures 5.29 and 5.30 reveal the content inside
both the directories. The number of training images is generally expected to be greater than the number of testing images. Normally, 70 percent of the images will be used as training images, while 30 percent will be used as testing images. However, there is no hard and fast rule for this 70–30 ratio.
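As a hedged sketch of how such a directory layout is typically loaded in Keras (the directory names below follow the layout described above and are assumptions), one can use ImageDataGenerator:

# Illustrative sketch: loading the cat/dog directories with ImageDataGenerator
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(rescale=1.0 / 255)
test_gen = ImageDataGenerator(rescale=1.0 / 255)

training_set = train_gen.flow_from_directory(
    "dataset/training_set", target_size=(64, 64), batch_size=32, class_mode="binary")
test_set = test_gen.flow_from_directory(
    "dataset/test_set", target_size=(64, 64), batch_size=32, class_mode="binary")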
FIGURE 5.29 The test dataset: cats and dogs.
FIGURE 5.30 The training dataset: cats and dogs.
On seeing the above code snippet, one can connect the fundamental
concepts learned clearly in the previous sections. The CNN architecture
can be visualized from the code as well. This code can be run on a PC with a reasonable configuration, such as an Intel i5 with a minimum of 4 GB of RAM. The dataset is collected from various resources and is made available for the readers' ready reference in the GitHub link. On running the developed CNN model successfully, one gets the result below, with the classification done (Figure 5.31).
FIGURE 5.31 Output for Code 5.4, 5.5, 5.6, 5.7: CNN model.
One can understand from the above screenshot that the model started giving 100 percent accuracy at the 5th epoch. Hence, once the accuracy is found to be consistent, that is, if it is found stable at epoch 7 or 8, then one can stop with 8 epochs. All these are trial-and-error-based findings and, once the code runs, one can tune the parameters to understand all these clearly.
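For readers who wish to experiment right away, a minimal sketch of a comparable cat-versus-dog CNN is given below. It is illustrative only: the layer sizes, epoch count, and data generators are assumptions, not the book's Code 5.4, 5.5, 5.6, and 5.7.

# Illustrative sketch: a small CNN for the cat/dog classification task
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(32, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation="relu"),
    Dense(1, activation="sigmoid"),          # binary output: cat or dog
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# With training_set and test_set produced by flow_from_directory (see earlier sketch):
# model.fit(training_set, validation_data=test_set, epochs=8)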
YouTube link for complete CNN Implementation: https://youtu.be/
lN-1m7pPVkY
QUIZ
1. Define convolution.
2. What is the role of filters in convolution?
3. Why and how is zero padding important?
4. How is max pooling achieved?
5. What is the major difference between max pooling and average
pooling?
6. What are the layers in CNN architecture?
7. What is the need to flatten before the fully connected layer in CNN?
8. Explain clearly the term striding.
FURTHER READING
✓ Ma, N., Zhang, X., Zheng, H.T., and Sun, J., 2018. “Shufflenet
v2: Practical guidelines for efficient CNN architecture design.” In
Proceedings of the European Conference on Computer Vision (ECCV)
(pp. 116–131).
✓ Luo, R., Tian, F., Qin, T., Chen, E., and Liu, T.Y., 2018. “Neural archi-
tecture optimization.” In Advances in Neural Information Processing
Systems (pp. 7816–7827).
CNN Architectures:
An Evolution
LEARNING OBJECTIVES
After this chapter, the reader will be able to understand the following:
6.1
INTRODUCTION
One should understand a point: CNN is a beginning for a revolution. CNN
has laid the foundation for many innovations in the field of Deep Learning.
Many architectures have evolved based on the CNN, and learning these
architectures will eventually help the reader to select the appropriate archi-
tecture for the application. Let us first know the names of the available
architectures:
• LeNet
• VGGNet
• AlexNet
• ResNet
• ZFNet
• GoogleNet
6.2
LENET CNN ARCHITECTURE
Although CNN is often believed to be meant only for detecting patterns in images, this is actually not so. More than just patterns, it can be used for applications in real-time object detection, OCR (Optical Character Recognition), handwriting recognition, and so forth. The CNN architectures mentioned in the introduction are similar in their basic aspects, but they vary in the number of layers or the types of layers deployed. One can understand this clearly as we deal with the models, one by one.
It all started in the year 1998. LeCun et al. proposed the LeNet architecture in their research article, primarily for OCR. It is believed that they tried to recognize the handwriting on cheques. The LeNet model developed here was run on an Intel i7 8th Gen CPU and was found to provide good performance. So, the authors' observation is that a CPU can be sufficient.
The original LeNet architecture proposed by the creators is
presented below:
FIGURE 6.1 LeNet architecture.
(Source LeCun et al., 1998).
Step 2
So, what would be the next step? The implementation takes up down sam-
pling/sub-sampling as the next step. How can this be achieved? Readers
were introduced in Chapter 5 to average pooling. Average pooling is to
be deployed in this stage with filter size 2 x 2, striding set to 2. As one can
understand, the average pooling will reduce the dimensions by half. Hence,
as expected, the size of the image gets reduced to 14 x 14, retaining 6 feature maps. Pooling has no impact on the number of feature maps, and hence it remains 6.
Readers should look at the expanded Table 6.1, which is presented as
Table 6.2 below.
Step 4
This layer is again the down sampling or subsampling layer. One should use
Average Pooling in this layer. The filter size preferred is 2 x 2 with striding
of 2. Naturally, the down sampling will reduce the size of the image by half
and 10 x 10 image gets reduced to 5 x 5 with retaining 16 feature maps.
Table 6.3 is updated as Table 6.4 and presented below for ready reference.
We are nearing the end. Yes, only a few more layers left.
Step 6
The sixth layer is a fully connected layer (F6) with 84 units.
Step 7
Fully connected softmax output layer with 10 possible values as the classi-
fication result.
The table in its final form is presented below as Table 6.6.
Well, the model can be built now. Before getting into the process of
model building, it is important to connect the model with what it is going
to be used for. Here, we are going to use it for character recognition, that is,
handwriting character recognition. There are many handwriting datasets available for use, and readers can even prepare one themselves. But the most common and most frequently used dataset is the MNIST (Modified
National Institute of Standards and Technology) dataset. We will first
understand the dataset and navigate to the next stage, where we build the
model, step by step.
The Dataset –MNIST
One can find the MNIST dataset for download at the website http://yann.
lecun.com/exdb/mnist/.
This webpage has a lot of data. Really, a lot. The MNIST database is a collection of handwritten digits. The dataset has 60,000 images for training and 10,000 images for testing. That is a large count and certainly very handy. The MNIST dataset originated from the NIST (National Institute of Standards and Technology) dataset. One good aspect of the MNIST dataset is that all the digits have been properly size normalized and
positioned centrally in the image, which makes the life of the programmer
easier. Also, the same dataset is made open in many GitHub links. The
following Figure 6.2 presents sample images from the dataset to establish a
quicker understanding for the readers about the dataset.
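Before looking at the results, a minimal sketch of a LeNet-style model for MNIST in Keras is given below. It is illustrative only: it assumes a 28 x 28 input with 'same' padding in the first convolution so that the later layer sizes match the 14 x 14, 10 x 10, and 5 x 5 progression described above, and the training call is left commented out.

# Illustrative sketch: a LeNet-style model for the MNIST digits
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, AveragePooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0
y_train, y_test = to_categorical(y_train), to_categorical(y_test)

model = Sequential([
    Conv2D(6, (5, 5), activation="tanh", padding="same", input_shape=(28, 28, 1)),
    AveragePooling2D(pool_size=(2, 2), strides=2),   # 28 x 28 -> 14 x 14, 6 feature maps
    Conv2D(16, (5, 5), activation="tanh"),           # 14 x 14 -> 10 x 10, 16 feature maps
    AveragePooling2D(pool_size=(2, 2), strides=2),   # 10 x 10 -> 5 x 5
    Flatten(),
    Dense(120, activation="tanh"),
    Dense(84, activation="tanh"),                    # F6 with 84 units
    Dense(10, activation="softmax"),                 # 10 possible digit classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=5)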
One can go through the below result screenshots (Figure 6.3 and 6.4)
to understand the accuracy achieved with this model in classifying the
handwritten digits from the MNIST dataset with the identification result
considered for two instances in the code.
FIGURE 6.3 Output for Code 6.1, 6.2, 6.3, 6.4 –MNIST dataset recognition for
digit 9.
FIGURE 6.4 Output for Code 6.1, 6.2, 6.3, 6.4 –MNIST dataset recognition for
digit 6.
6.3
VGG16 CNN ARCHITECTURE
This is one of the most preferred CNN architectures of the recent past. It was developed by Simonyan and Zisserman in 2014. It has 16 weight layers, of which 13 are convolutional. One can see the complexity increasing gradually compared with earlier CNN architectures like LeNet. VGG16 is preferred as it has a very uniform architecture. Initially, it may look tough, but it is not as complicated as it appears; it is simple, provided we establish a correct understanding. The only point to worry about regarding VGG16 is the huge number of parameters it has: about 138 million.
Since the number of layers is very high, it is better to explain the details
of layers through a table that has the complete information embedded in it.
One can refer to the Table 6.7 to understand the details.
One can understand the above table better after going through the
implementation. Details of the activation functions used is also clearly
mentioned in the code for easier reference.
The Model Development and Deployment
The Dataset
Whenever a model is built, it is to be tested with an apt dataset for its
functioning. The VGG16 model to be developed is first used to classify the
input image as a cat or dog. The dataset collected is stored appropriately in
the local drive of your choice. One can see the test_set, training_set dir-
ectories shown below. Also, cat_or_dog_1, cat_or_dog_2 are used as the
input images to be classified by the CNN model developed.
It can be seen that test_set and training_set are present in the dataset
location (Figure 6.7). One should understand the fundamental idea of
having the test and training images. Figures 6.8 and 6.9 reveal the con-
tent inside both directories. The number of training images is generally expected to be greater than the number of testing images. Normally, 70 percent of the images will be used as training images, while 30 percent will be used as testing images. However, there is no hard and fast rule for this 70–30 ratio.
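A minimal sketch of the VGG16 topology built layer by layer in Keras is given below. It is illustrative only: the filter counts per block follow the standard VGG16 design, the final layer is a single sigmoid unit because the task here is binary (cat or dog), and the compile settings are assumptions rather than the book's Code 6.5, 6.6, and 6.7.

# Illustrative sketch: the VGG16 convolutional topology, adapted for a binary output
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(64, (3, 3), activation="relu", padding="same", input_shape=(224, 224, 3)))
# Five blocks of 3 x 3 convolutions, each block followed by 2 x 2 max pooling
for filters, convs in [(64, 1), (128, 2), (256, 3), (512, 3), (512, 3)]:
    for _ in range(convs):
        model.add(Conv2D(filters, (3, 3), activation="relu", padding="same"))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=2))
model.add(Flatten())
model.add(Dense(4096, activation="relu"))
model.add(Dense(4096, activation="relu"))
model.add(Dense(1, activation="sigmoid"))            # cat or dog
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])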
FIGURE 6.8 The test dataset: cats and dogs.
On compiling this code (one should make sure that all the prerequisites
are installed) the following result will be presented (Figure 6.10). The image
tested ‘cat_or_dog_1’ has a dog, and the same is classified appropriately in
the results. One can also see the details of accuracy and other parameters.
FIGURE 6.11 Output for Code 6.5, 6.6, 6.7 –cat or dog classification with the
VGG 16 dataset.
One can see that the model is performing well and is giving the correctly
classified output (Figure 6.11). Once again, it is important for the readers
to tweak the parameters to try something different to understand the com-
plete behavior of the model.
Complete Demo and Description of VGG –16 in YouTube: https://youtu.be/
bEsRLXY7GCo
The time has come to understand one more model, which is believed to
be a trendsetter in this field. AlexNet is the next one to be understood and
learned!
Disclaimer: The reader should understand that the number of images considered for the model implementation should be higher. For the sake of illustration and understanding, the number of images considered in these implementations is limited. It is strongly encouraged to have as many images as possible in the training and testing datasets to increase accuracy and to eliminate overfitting problems.
6.4
ALEXNET CNN ARCHITECTURE
Disclaimer: Readers should not try this model on a machine with a low-end configuration, as it could certainly freeze the system. This is a computationally intense model and should be developed on a machine with a proper configuration.
One can see a lot of similarity between AlexNet and LeNet. Remember, there were multiple convolution layers in the LeNet architecture. AlexNet is a deep neural network with around 60 million parameters, which makes it very challenging. A significant difference, or development, one can see in AlexNet is the depth: it is much deeper, with more filters per layer, and the number of convolution layers is higher. This can be called a deep neural network without any doubt! A dropout feature is added to ensure overfitting is avoided. Data augmentation (mirroring images to increase the training volume) is also done. With the advent of GPUs and storage, AlexNet is found to be one of the best and, undoubtedly, it has set a new trend.
The original architecture proposed by the authors of the architecture
is presented as Figure 6.12. It is eventually difficult for any beginner to
understand the architecture, and we realized the same. The revised, simpler
version for easier understanding is presented as Figure 6.13.
In this architecture, the input size is RGB 224 x 224 x 3 (Actually 227
x 227, this mistake has been pointed out by many experts). The code
developed by us goes with 227 x 227 and can be observed from the code.
The architecture has 5 convolution layers, 2 fully connected layers, followed
by one softmax layer (output). One can understand that after the first con-
volution, max pooling is carried out. After the second convolution, max
pooling is again carried out. Then the third and the fourth convolutions
are carried out respectively. After the fifth convolution, again max pooling
is carried out. Entirely in this architecture up until the final layer, it is the
ReLU activation function being used, and the final layer is the softmax layer.
The dimensions, filter details, and striding information for the convo-
lution layers and the max pooling are presented as a table below with a
brief note for each layer, which will enable the reader to build the model
with ease.
Layer 1, Followed by Max Pooling.
As discussed, the input image is of the dimension 227 x 227 and is fed to
the convolution layer. The convolution layer generates the resultant image
with 96 feature maps, the dimensions being 55 x 55 with filter size 11 x 11.
The striding is set to 4. ReLu activation function is used in this layer. One
can go through Table 6.8 to understand the aforementioned details. This is
followed by the max pooling layer with the kernel size 3 x 3 and striding
of 2. The dimensions are reduced by half after max pooling and become
27 x 27 x 96. Feature maps never get altered through pooling operations
and hence there is no change.
Layer 2
The next round of convolution is carried out: 256 feature maps with a kernel size of 5 x 5 and the stride set to 1. Following this, max pooling is carried out with a kernel size of 3 x 3 and a stride of 2. This again certainly halves the pixel count and brings it down. Table 6.8 is updated
as Table 6.9 and is presented below for quick reference.
Finally come the fully connected layers: two fully connected layers with 4096 neurons each and ReLU activation. The final output layer has 1000 neurons, and softmax activation is used to fire the output classification. The final Table 6.11 is presented below for a complete and comprehensive understanding by the readers.
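A minimal sketch of the AlexNet topology in Keras, following the table above, is given below. It is illustrative only: the filter counts for the third to fifth convolutions follow the standard AlexNet design, which the text does not spell out, and the compile settings are assumptions.

# Illustrative sketch: an AlexNet-style model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    Conv2D(96, (11, 11), strides=4, activation="relu", input_shape=(227, 227, 3)),
    MaxPooling2D(pool_size=(3, 3), strides=2),        # 55 x 55 -> 27 x 27
    Conv2D(256, (5, 5), padding="same", activation="relu"),
    MaxPooling2D(pool_size=(3, 3), strides=2),        # 27 x 27 -> 13 x 13
    Conv2D(384, (3, 3), padding="same", activation="relu"),
    Conv2D(384, (3, 3), padding="same", activation="relu"),
    Conv2D(256, (3, 3), padding="same", activation="relu"),
    MaxPooling2D(pool_size=(3, 3), strides=2),        # 13 x 13 -> 6 x 6
    Flatten(),
    Dense(4096, activation="relu"),
    Dropout(0.5),                                     # dropout to reduce overfitting
    Dense(4096, activation="relu"),
    Dropout(0.5),
    Dense(1000, activation="softmax"),                # 1000-way output classification
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])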
The model developed is compiled for its functioning and the results are
presented below for ready reference (Figure 6.14). Readers can try training
the model with training dataset, test set and see if they derive a successful
classification as result.
FIGURE 6.14 Output for Code 6.9, 6.10, 6.11 –compilation results for AlexNet
architecture.
The next section gives a brief note on the rest of the architectures. With the
given inputs until now, one can easily implement any CNN architecture.
6.5
OTHER CNN ARCHITECTURES AT A GLANCE
1. GoogleNet –After the evolution of VGG16, multiple other models started coming out and many innovations appeared. Of all the models that came out in 2014, the one that gained the most attention is GoogleNet, famously known as Inception. This model emerged as the winner of the famous and challenging ImageNet contest. It introduced a new module known as the inception module and used image distortion, batch normalization, and RMSProp, a gradient-based optimization technique. Batch normalization is an appreciable technique deployed in GoogleNet for improving speed, performance, and stability, as it is in many other ANN applications. The inception module is fundamentally built with many small convolutions so as to bring down the number of parameters. As promised, the number of parameters is brought down to 4 million from the whopping 60 million of AlexNet. This architecture is also deep, with 22 layers. One can have a look at what the architecture looks like in Figure 6.15.
3. ResNet –Kaiming He and his team take the credit for building ResNet. The research team from Microsoft was the first to introduce the term skip connections, enabling very deep neural networks to be built without compromising quality. Skipping actually allows one or more layers to be bypassed. It is certainly innovative to design such a deep network, with up to 152 layers, without any quality compromise. ResNet was also one of the first to adopt batch normalization, and the use of residual blocks avoided the vanishing gradient problem to a very large extent. One can have a look at the architecture of ResNet in Figure 6.17.
FIGURE 6.17 ResNet.
(Source: He, K., Zhang, X., Ren, S. and Sun, J., 2016. “Deep residual learning for image recognition.” In Proceedings of the IEEE conference
on computer vision and pattern recognition; pp. 770–778).
QUIZ
1. Mention the layers used in the LeNET architecture.
2. What are the important aspects of VGG16 architecture?
3. Where can someone use VGG 16 over other applications?
4. Draw the simple version of AlexNet architecture and explain the
important terminologies in the AlexNet architecture.
FURTHER READING
✓ Mollahosseini, A., Chan, D. and Mahoor, M.H., 2016, March. “Going
deeper in facial expression recognition using deep neural networks.”
In 2016 IEEE Winter Conference on Applications of Computer Vision
(WACV) (pp. 1–10). IEEE.
✓ Christian, S., Wei, L., Yangqing, J., Pierre, S., Scott, R., Dragomir,
A., Dumitru, E., Vincent, V. and Andrew, R., 2015, June. Going
deeper with convolutions. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (pp. 1–9).
✓ Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D.,
Erhan, D., Vanhoucke, V. and Rabinovich, A., 2015. “Going deeper
with convolutions.” In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (pp. 1–9).
✓ Zeiler, M.D. and Fergus, R., 2014, September. “Visualizing and
understanding convolutional networks.” In European Conference on
Computer Vision (pp. 818–833). Springer, Cham.
✓ Simonyan, K. and Zisserman, A., 2014. “Very deep convolu-
tional networks for large-scale image recognition.” arXiv preprint
arXiv:1409.1556.
✓ Herath, S., Harandi, M. and Porikli, F., 2017. “Going deeper into
action recognition: A survey.” Image and Vision Computing, 60,
pp.4–21.
✓ Lee, H. and Kwon, H., 2017. “Going deeper with contextual CNN
for hyperspectral image classification.” IEEE Transactions on Image
Processing, 26(10), pp. 4843–4855.
CHAPTER 7
Recurrent Neural
Networks
LEARNING OBJECTIVES
After this chapter, the reader will be able to understand the following:
• What is RNN?
• Challenges in the Basic RNN.
• Functioning of LSTM function.
• What is GRU?
7.1
INTRODUCTION
This chapter focuses on the new term, Recurrent Neural Networks (RNN).
It is interesting and challenging, too. The Readers are requested to read the
contents of this chapter thoroughly and, in case of queries, one could look
into the video lecture links provided at appropriate places.
RNN is married to both Machine Learning and Deep Learning. Yes,
it is married to Artificial Intelligence (AI). Recurrent Neural Networks
have a variety of applications, which is unavoidable. Many of us are unwit-
tingly using some applications that could have deployed RNN. If you have
used speech recognition, language translators, or market stock predictor
applications, RNN is a regular customer. It is even very handy for image
recognition (Figure 7.1).
Needless to say, RNN is one of the types of ANN. Here in RNN, the output
of the previous step is fed in as the input (i.e., feedback) to the current step.
It is definitely different from the traditional approach or even from CNN,
which we dealt with in the previous chapters. There, traditionally, the input
and output are certainly independent of each other. Remember, in CNN,
we never remembered the previous state fully to get to further layers.
But RNN is not that. It remembers. Means, it has memory like humans
and animals. Coming to the applications of Natural Language Processing
(NLP), like the prediction of words in a sentence, it is always important to
remember the words in the past. Then the memory comes into the picture.
So, RNN came into the picture this way! The appreciable feature came
in with RNN is the “hidden state.” Hidden state is nothing but the memory,
which remembers some information. Figure 7.2 helps understand the diffe-
rence between traditional neural networks and recurrent neural networks.
How is the climate outside right today? How this could be processed is
presented step by step as shown.
• The first step is to feed “How” into the RNN. The RNN encodes
“How” and produces an output.
• Next, the word “is” should be fed. RNN now has the information
“How” and “is.” This stage gets OP2.
• Next stage, “the” would be fed. RNN now has the information from
the previous stages. So, in total it would be. How, is, the. This process
goes on until the end, and this is the fundamental idea. One can refer
to Figure 7.4 to understand the entire flow.
One should understand this point: The final output is derived through
the complete sequence. So, the final output, that is, OP8 in the example
above, can be passed to the feed forward layer and get the result!
One can listen to the lecture: “RNN: How It Works?” –https://youtu.be/
Gir8xDkEB8s.
7.2
CNN VS. RNN: A QUICK UNDERSTANDING
CNN is a feedforward neural network that finds a lot of its applications in the image recognition and object recognition sectors. RNN is fundamentally based on feedback; that is, the output of the current layer is dependent on the previous layer as well. CNN worries only about the current input and, as discussed, RNN is concerned with the previous output. RNN is memory driven; it has memory. CNN is normally constructed with four layers: the
convolution layer, activation layer, pooling layer and fully connected layer.
The major task carried out by each layer is to extract features and to find
out patterns in the input image. RNN is all about input/hidden and output
layers. The hidden layers do the looping and have the memory to store the
previous results.
CNN is most suitable for images, whereas RNN is suitable for sequen-
tial data. CNN has a finite set of input and generates only the finite set
of predicted values based on the input. RNN is not so. It allows arbitrary
input length. CNN is found to be very good for image- and video-related projects, whereas RNN is always found to be good with time series information, such as predicting what the next word in a sequence can be, and so forth. One can understand the difference between RNN and CNN by referring to
Figure 7.5.
7.3
RNN VS. FEEDFORWARD NEURAL NETWORKS:
A QUICK UNDERSTANDING
When it comes to Feedforward Neural Networks (FFN), the data navigates
from the input layer to the output layer, that is, from left to right. The data
moves through the hidden layers, which are structured in between. The
information, that is, the flow of the data, will be received only from left to
right, that is, no looking back.
Also, remember, never, ever does information reach a particular node
twice in the full cycle. A particular node receives input, and it is the only
time it can receive and never can it receive again. This will not be suitable
for applications like stock forecasting, market predictions and so forth, as
there is no knowledge of history. Any prediction needs history. One has
to agree to this point. History can be remembered only when there is
memory. No memory, no history in place. So, to conclude: feedforward
networks have memory loss (No memory at all!). FFN can remember only
the current input and the training instructions. RNN is different in this
aspect.
When it comes to RNN, we should remember there is a loop. The infor-
mation goes through the loop, and memory comes in. The decision on the
data is arrived at through the current state input and the previous outputs.
That is, prediction of markets will be done through the consideration of
current and historical data. Only then, is it prediction. One can understand
the concept better with the following Figure 7.6.
FIGURE 7.6 FFN vs. RNN.
One can listen to the lectures on the above topic by referring below
the links:
7.4
SIMPLE RNN
All this started with something called the Elman network, built by Jeffrey Elman in 1990. It was the inspiration for everything we now have as RNNs. Elman followed a simple strategy: he proposed to have an input layer, a hidden layer, and a context layer, along with the recurrent feedback concept. To see what it was like, one can look at Figure 7.7.
Here, as one can see, there are connections from hidden layer to context
layer. (Feedback is to be seen as well, it provides data of the past). Also,
there is a connection from input to hidden layer, too.
Let Xt be the input at time t and St the hidden state at time t. The function we use is recursive and is tanh; the weights are multiplied appropriately. So,
St = f(St-1, Xt) = tanh(Wq * St-1 + Wp * Xt)
Finally, the output is
Yt = Wr * St
One can look at Figure 7.8 to understand the concept even better.
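A tiny NumPy sketch of this recursion is given below; the dimensions and random weights are arbitrary illustrations, chosen only to show how the state is carried from one time step to the next.

# Illustrative sketch: one forward pass of a simple (Elman-style) RNN
import numpy as np

rng = np.random.default_rng(0)
hidden_size, input_size, output_size, timesteps = 4, 3, 2, 5

Wp = rng.standard_normal((hidden_size, input_size))   # input  -> hidden
Wq = rng.standard_normal((hidden_size, hidden_size))  # hidden -> hidden (recurrent)
Wr = rng.standard_normal((output_size, hidden_size))  # hidden -> output

state = np.zeros(hidden_size)                          # S0
for t in range(timesteps):
    x_t = rng.standard_normal(input_size)              # Xt, the input at time t
    state = np.tanh(Wq @ state + Wp @ x_t)             # St = tanh(Wq*St-1 + Wp*Xt)
    y_t = Wr @ state                                    # Yt = Wr*St
    print(t, y_t)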
The time has come for the reader to understand the next concept, LSTM.
It is easier and interesting, too.
7.5
LSTM: LONG SHORT-TERM MEMORY
We need to go step by step in LSTM. LSTM is expanded as Long Short-
Term Memory. Readers will be taken through the entire process in small
steps, one after another, so that learning is effective and easier. One has to
use three gates and something called the cell state throughout this section.
The three gates are,
• Forget Gate
• Input Gate
• Output Gate
Forget Gate
The forget gate is represented as ft. Mathematically (omitting bias terms),
ft = σ(Wf · [St-1, Xt])
Why is Forget Gate required? What does it do? The first Sigmoid activa-
tion function in the network is the Forget Gate. As the name says, this gate
will decide which information has to be retained or dropped. The infor-
mation from the previous hidden state and the current input gets through
the Sigmoid function, and the output arrives. It is between 0 and 1. So,
the closer the value to 0, it is to be forgotten; the closer it to 1, it is to be
remembered. The next to be learned is the Input Gate.
Input Gate
The input gate is represented as it. Mathematically (omitting bias terms),
it = σ(Wi · [St-1, Xt])
Ct' = tanh(Wc · [St-1, Xt])
Why is the Input gate required and what does it do? This is the second
Sigmoid function and first tanh activation function. This decides which
information should be saved to the cell State and which should be dropped.
Finally, one has to learn what the Output Gate is.
Output Gate
The output gate is represented as Ot. Mathematically (omitting bias terms),
Ot = σ(Wo · [St-1, Xt])
St = Ot * tanh(Ct)
The new state is the new hidden state formed. One could go diagram-
matic now to understand the entire flow in a better way. One has to
remember that
• ft =Forget Gate
• ot =Output Gate
• it =Input Gate
• Ct =Cell State
• Ct’ =Intermediate Cell State
Stage 1
We have the previous state c0, Old State S0, and input X1 with us. One can
understand the same by referring Figure 7.10.
Stage 2
There are weights available to be visualized as presented below.
One can compute the Input Gate through the Sigmoid function.
Stage 3
Next, one should compute the Ct’ through tanh and with which Wc can be
computed; the same is presented below.
Stage 4
One should multiply the results as shown below (Figure 7.13). It should be
multiplied with Ct’. Also, ft is to be computed with Sigmoid function. Then
multiplication should happen as shown above with C0.
Stage 5
Finally, with adder being used, one could get the C1 from the previous
results. Cell State is updated at this point. Next is to compute Ot through
the usage of tanh, and is presented above. One can understand the same by
referring Figure 7.14.
Stage 6
Here is the final result (Figure 7.15), and LSTM is completed.
The next step is to implement the same with Keras. It is interesting and
easier, too. One can use LSTM to carry out the sentiment analysis from the
IMDB dataset. The complete code is presented with clear comments for
each of the lines.
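A minimal sketch in that spirit is given below; the vocabulary size, sequence length, layer sizes, and epoch count are assumptions, not the book's exact code.

# Illustrative sketch: IMDB sentiment analysis with an LSTM layer
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

vocab_size, max_len = 10000, 200
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)
x_train = pad_sequences(x_train, maxlen=max_len)   # pad/truncate every review to a fixed length
x_test = pad_sequences(x_test, maxlen=max_len)

model = Sequential([
    Embedding(vocab_size, 32),
    LSTM(64),
    Dense(1, activation="sigmoid"),                  # positive or negative review
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=3, batch_size=64)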
7.6
GATED RECURRENT UNIT
The next topic queued up is the Gated Recurrent Unit (GRU). This is one of the more recent innovations: it was introduced in 2014 by K. Cho and colleagues (LSTM was created in 1997). GRU is very closely related to the LSTM and, in fact, can be regarded as a family member of LSTM. When we say they are related, they share many good features, with the newer generation improving over the previous one. GRUs can effectively address the vanishing gradient problem, just like LSTM. In addition, GRU appears simpler compared with LSTM.
As seen in the LSTM, GRU also utilizes the gating mechanisms
(remember the gates used in LSTM: Forget, Input, and Output) to manage
and as well to control the flow of the information between the cells in the
neural network.
The unrolled version will give reader a better idea and the same is
presented in Figure 7.18.
zt = σ(W(z) xt + U(z) ht-1)
rt = σ(W(r) xt + U(r) ht-1)
ht' = tanh(W xt + rt ⊙ U ht-1)
The complete architecture is presented below as Figure 7.21.
One can try the following piece of code to understand the practical functioning of the GRU through Keras.
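A minimal sketch (illustrative only; the dummy data and layer sizes are assumptions) is:

# Illustrative sketch: a GRU layer processing a batch of dummy sequences
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense

x = np.random.random((32, 10, 8))      # 32 sequences, 10 time steps, 8 features each
y = np.random.randint(0, 2, size=(32, 1))

model = Sequential([
    GRU(16, input_shape=(10, 8)),       # the gated recurrent layer
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=2, verbose=0)
print(model.predict(x[:1]))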
• GRU: https://youtu.be/xLKSMaYp2oQ
• GRU Implementation: https://youtu.be/QtcxL-gd8Ok
QUIZ
1. How is CNN different from RNN?
2. Where could one choose RNN over CNN?
3. What is LSTM, and how does it function?
4. Mention the three gates used in the LSTM.
5. What are the cell states one should be aware of with LSTM?
6. Mention clearly the technical differences between LSTM and GRU.
FURTHER READING
✓ Zaremba, Wojciech, Ilya Sutskever, and Oriol Vinyals. “Recurrent
neural network regularization.” arXiv preprint arXiv:1409.2329
(2014).
✓ Medsker, Larry R., and L. C. Jain. “Recurrent neural networks.”
Design and Applications 5 (2001).
✓ Mikolov, Tomáš, Stefan Kombrink, Lukáš Burget, Jan Černocký, and
Sanjeev Khudanpur. “Extensions of recurrent neural network lan-
guage model.” In 2011 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), pp. 5528–5531. IEEE, 2011.
✓ Rodriguez, Paul, Janet Wiles, and Jeffrey L. Elman. “A recurrent
neural network that learns to count.” Connection Science 11, no. 1
(1999): 5–40.
✓ Gregor, Karol, Ivo Danihelka, Alex Graves, Danilo Rezende, and
Daan Wierstra. “Draw: A recurrent neural network for image gener-
ation.” In International Conference on Machine Learning, pp. 1462–
1471. PMLR, 2015.
✓ Greff, Klaus, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink,
and Jürgen Schmidhuber. “LSTM: A search space odyssey.” IEEE
Transactions on Neural Networks and Learning Systems 28, no. 10
(2016): 2222–2232.
✓ Sundermeyer, Martin, Ralf Schlüter, and Hermann Ney.
“LSTM neural networks for language modeling.” In Thirteenth
Annual Conference of the International Speech Communication
Association. 2012.
✓ Wen, T.H., Gasic, M., Mrksic, N., Su, P.H., Vandyke, D. and Young, S.,
2015. “Semantically conditioned lstm-based natural language gener-
ation for spoken dialogue systems.” arXiv preprint arXiv:1508.01745.
CHAPTER 8
Autoencoders
LEARNING OBJECTIVES
After this chapter, the reader will be able to understand the following:
• What is an autoencoder?
• Applications of an autoencoder.
• Types of autoencoder and complete implementation of convolutional
autoencoder.
8.1
INTRODUCTION
In the previous chapters we have come across CNN, RNN architectures
and their applications. They follow Supervised Learning. Do you remember
Supervised and Unsupervised Learning techniques? Supervised Learning is
a Machine-learning technique that helps to work with the dataset split into
training and test data. The training data will be associated with labels; the
algorithm has to learn with the training dataset, and the user employs test
or validation data to analyze the accuracy of the model created by using the
algorithm. On the other hand, with Unsupervised Learning, we have only
data with no labels associated with it. Now we are going to see a neural net-
work architecture, called an autoencoder, which has two neural networks,
encoder and decoder, working in an unsupervised fashion. An autoencoder's encoder part extracts the features, say from an image, and its decoder part reconstructs the same image from the extracted features, but through a representation of low dimensionality.
8.2
WHAT IS AN AUTOENCODER?
An autoencoder is a simple ML algorithm that acquires input image, and it
will reconstruct the same. That is, the image is compressed. A dimension-
ally reduced image is produced as the output image. The dimensionality
reduction certainly is used in the data pre-processing (we reduce/com-
press). Sometimes, we might not require all the attributes that are there in
the dataset. So, what can we do? We could apply dimensionality reduction to find the most similar attributes and, if removing them has no impact on the dataset, they can be taken out. Therefore,
it helps to retain those features that are useful for the analysis of the dataset.
This seriously reduces the dimensionality of the dataset. Autoencoders use
dimensionality reduction for the reconstruction of the output image from
the input image fed into the neural network (Figure 8.1).
FIGURE 8.1 An autoencoder: the encoder compresses the original input into a compressed representation, and the decoder reconstructs the input from it.
An autoencoder has three main parts:
• Encoder (Input)
• Code (Hidden)
• Decoder (Output)
FIGURE 8.2 The input layer (encoder), the hidden code layer (bottleneck), and the output layer (decoder).
Here, as given in Figure 8.2, there is an input layer, also called an encoder,
which moves on to a hidden layer also called a code or bottleneck, where
the necessary features are extracted and, finally, on to the decoder, which
reconstructs a compressed image from code or bottleneck feature (Figure
8.3). The final output would be a compressed image than the original one.
“The Allegory of the Cave” tells about prisoners in a cave and unable
even to move their heads as they are chained. All they can see is the wall
before them. Behind them, there is a fire, which burns all through. Behind
the prisoners are puppeteers with puppets. They cast shadows on the wall
in front of prisoners. Prisoners are unaware of either the puppeteers or
the fire behind. All they can see are the images cast on the wall. Plato's story is very relevant for understanding latent features in the context of autoencoders. To make it clear, the prisoners' reality is only the shadow of the puppets, for they are ignorant of the puppeteers and the
fire behind. These are just the “observed variables,” and the “real variables”
are behind, casting shadows making the observed variables. Here, the real
variables are hidden, not visible to the eyes of prisoners. The real variables
are not directly observable, but they are the true explanatory factors that
are creating the observed variables. These real variables are the true latent
variables in autoencoders.
So, what is the challenge now? We need to learn the true explanatory
factors, that is, the latent variables, when only the observed variables are given. In order to understand what latent features look like in a real-time example, one can refer to Figure 8.5.
image from the latent space representations. At this point, two more ter-
minologies are to be introduced about the dimensionality of latent space
representations and the loss function.
The dimensionality of the latent space representation is directly related to the quality of the reconstructed image: the lower the dimensionality of the bottleneck, the poorer the quality of the reconstruction tends to be. The loss function helps to quantify the difference, in terms of loss, between the original image and the reconstructed image.
Normally, in a standard autoencoder, mean squared error is used as the loss
function,
L(X, Z) = ‖X − Z‖²
This loss function, also termed the reconstruction loss, pushes the latent space representation to learn as many features as possible from the data. The loss function indirectly helps the reconstructed image to follow the distribution of the original image without much deviation.
Data-specific: This can work only on the data that are similar to what the
system is already trained on. (Also, this is not like GZIP or WINRAR,
where the compression happens and packaging is done.) For instance, if
an autoencoder is trained on compressing cat images, it may not work well
with donkey images.
Lossy: Well, the expectation may not always be met. This is the case with autoencoders: the output may not be exactly the same as the input. Nevertheless, it will be very close, though certainly a degraded version. This is lossy compression. If you want lossless compression, you will have to look for different methods.
Unsupervised/self-supervised: We can call this unsupervised, as we need not do anything other than feed in the raw input. No explicit labelling is required; since the input itself serves as the target, it can also be called self-supervised.
8.3
APPLICATIONS OF AUTOENCODERS
An autoencoder architecture has mainly three parts: encoder neural net-
work, code, and a decoder neural network. Now, let us see some applications
of autoencoders in real time scenarios. The major applications that we are
going to see here are
• Data Compression
• Dimensionality Reduction
• Image De-noising
• Feature Extraction
• Image Generation
• Image Colorization
8.4
TYPES OF AUTOENCODERS
There are different types of autoencoders available, namely,
• Denoising Autoencoder
• Vanilla Autoencoder
• Deep Autoencoder
• Sparse Autoencoder
• Undercomplete Autoencoder
• Stacked Autoencoder
• Variational Autoencoder
• Convolutional Autoencoder
One can understand this from the diagram (Figure 8.8). The hidden layer is the compressed representation and, with two sets of weights (and biases), we
encode our input data into the compressed representation and decode our
compressed representation back into input space.
FIGURE 8.8 A vanilla autoencoder: an input layer, a hidden (code) layer, and an output layer.
Variational autoencoder: the encoder network maps the input to a mean vector and a standard deviation vector, from which the latent code Z is sampled and passed to the decoder network to produce the reconstructed image.
8.4.8
Convolutional Autoencoder
A convolutional autoencoder is an unsupervised-learning version of the convolutional neural network, built using convolution filters (Figure 8.10). The major applications of convolutional autoencoders are in the area of image reconstruction, in order to reduce reconstruction errors.
In Code 8.1, we can see that all the required packages and headers have been imported for the smooth functioning of the program. In Code 8.2, we define a function extract_data, which accepts a filename and the number of images as input and returns a NumPy array of data as output. In this function, we use gzip.open to open the file whose name is passed as an argument and read the data as a byte stream. The data read are then reshaped into a 3D tensor, using the image dimensions and the number of images as arguments.
In Code 8.3, 60,000 training samples and 10,000 test samples are extracted using the function extract_data(). In Code 8.4, we do the same for the labels, defining a new function named extract_labels() and applying it to the training and testing data. The shape function gives the dimensions of the training and testing data.
In Code 8.5, we can see the labels and images in the dataset displayed. Code 8.7 clearly states the training and testing split of the dataset, with 80 percent training data and 20 percent testing data, followed by the initial settings required for the autoencoder, such as the batch size, the number of epochs, and so forth.
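A minimal sketch of such a convolutional autoencoder for 28 x 28 grayscale images is given below; the layer sizes are assumptions chosen so that the decoder mirrors the encoder, and they do not necessarily match the book's exact code.

# Illustrative sketch: a small convolutional autoencoder for 28 x 28 grayscale images
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D

inputs = Input(shape=(28, 28, 1))
# Encoder: convolution + pooling compresses the image toward the bottleneck
x = Conv2D(32, (3, 3), activation="relu", padding="same")(inputs)
x = MaxPooling2D((2, 2), padding="same")(x)            # 28 -> 14
x = Conv2D(16, (3, 3), activation="relu", padding="same")(x)
encoded = MaxPooling2D((2, 2), padding="same")(x)      # 14 -> 7 (bottleneck)
# Decoder: convolution + upsampling reconstructs the image
x = Conv2D(16, (3, 3), activation="relu", padding="same")(encoded)
x = UpSampling2D((2, 2))(x)                             # 7 -> 14
x = Conv2D(32, (3, 3), activation="relu", padding="same")(x)
x = UpSampling2D((2, 2))(x)                             # 14 -> 28
decoded = Conv2D(1, (3, 3), activation="sigmoid", padding="same")(x)

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mean_squared_error")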
The autoencoder fits the model with the training and testing data by
using this code:
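A typical call looks like the sketch below; the variable names train_X and valid_X are assumptions that stand for the 80/20 split described above, and the image itself serves as both the input and the target.

# Illustrative sketch: fitting the autoencoder (train_X and valid_X are assumed arrays)
autoencoder_train = autoencoder.fit(
    train_X, train_X,
    batch_size=128,
    epochs=20,
    verbose=1,
    validation_data=(valid_X, valid_X),
)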
The next step is to predict the test images by applying the model. Let us see how well the reconstruction of the images happens. The test images and the reconstructed images can be seen in Code 8.12.
QUIZ
1. What is an autoencoder?
2. How does an autoencoder work?
3. What is the significance of latent features in autoencoders?
FURTHER READING
✓ Baldi, Pierre. “Autoencoders, unsupervised learning, and deep
architectures.” Proceedings of ICML Workshop on Unsupervised
and Transfer Learning. JMLR Workshop and Conference
Proceedings, 2012.
✓ Pu, Yunchen, et al. “Variational autoencoder for deep learning of
images, labels and captions.” arXiv preprint arXiv:1609.08976 (2016).
✓ Pinaya, Walter Hugo Lopez, et al. “Autoencoders.” Machine learning.
Academic Press, 2020. 193–208.
✓ Sewak, Mohit, Sanjay K. Sahay, and Hemant Rathore. “An over-
view of deep learning architecture of deep neural networks and
autoencoders.” Journal of Computational and Theoretical Nanoscience
17.1 (2020): 182–188.
✓ Guo, Xifeng, et al. “Deep clustering with convolutional
autoencoders.” International Conference on Neural Information
Processing. Springer, Cham, 2017.
✓ Bao, Wei, Jun Yue, and Yulei Rao. “A deep learning framework for
financial time series using stacked autoencoders and long-short
term memory.” PloS one 12.7 (2017): e0180944.
CHAPTER 9
Generative Models
LEARNING OBJECTIVES
After this chapter, the reader will be able to understand:
9.1
INTRODUCTION
In the previous chapter, “Autoencoders,” we have learned about the working,
types, applications, and implementation of the standard convolutional
autoencoder. It is already seen that, in autoencoders, we use latent variables
for feature extraction. These features are used by decoder networks for the
reconstruction of images. Therefore, we could say we already know about
latent models. Latent models like autoencoders are mainly used for density
estimation of data, as they capture the probability distribution of data being
used. Another major application of latent models like autoencoders is to
generate new samples of data. However, in order to perform this task, there
is another variant of autoencoders, called generative models. Generative
models are a part of the statistical approach and have been in use for a long
time. This is what we are going to cover in this chapter.
9.2
WHAT IS A GENERATIVE MODEL?
Generative models are part of a statistical classification approach. This
model has been widely used in prediction of the next sentence or word in a
sequence, where the probability of adjacent word/s matter a lot. Generative
models help to find the foundational level of explanatory factors of under-
lying data by keeping track of the distribution of the data. This concept is
extended and used for the generation of new samples, which follow the
data distribution of an original dataset.
Generative models are judged to be powerful tools for exploring data
distribution or the density estimation of datasets. Generative models follow
unsupervised learning that automatically discovers the patterns or irregu-
larities of the data being analyzed. This helps to generate new data that
resemble mainly the original dataset. To be precise, Generative models aim
at learning the true data distribution of the training set to generate new
data points, with some variations.
Generative models are mainly used for density estimation and sample
generation, where it takes a few input training samples following some
distribution and generating new samples, which follow the same distri-
bution as input training samples. Another major application of genera-
tive models is outlier detection. Here the major question is: How can we
learn a generative model as similar as the true distribution of original data?
This is achieved by identifying the overrepresented and underrepresented
features. What are overrepresented and underrepresented features? Let us
understand that with help of an example, as shown in Figure 9.1.
The overrepresented features are all the same in color, and the positions of all the pencils are the same, whereas in the underrepresented features, the colors and poses are diverse and there is a lot of overlapping, too. Therefore,
the challenge of generative models is to consider both these homogenous
and diverse features to generate fair and representative datasets. Another
major advantage of generative models is that they detect the outliers when
encountering something new or rare in the dataset. This could be done
by observing the distribution of data and the insight from outliers could
be used to improve the quality of generated data while training. To sum-
marize, the major applications of generative models include density esti-
mation, sample generation, and outlier detection.
9.3
WHAT ARE GENERATIVE ADVERSARIAL NETWORKS
(GAN)?
Generative Adversarial Networks are a type of generative model that samples from simple noise and learns a transformation to the training distribution, as shown in Figure 9.2.
GAN uses two neural networks, which compete with each other, thereby
generating new samples. GAN architecture is shown in Figure 9.3.
The two networks keep competing with each other until the discriminator network finds it entirely too difficult to distinguish which of the samples generated by the generator network are fake and which are real.
In short, GAN tries to improve the closeness to the original data. The discriminator trains in such a way that it can identify the fake samples clearly. Now, the generator tries to move the fake points so close to the real points that the discriminator is unable to tell them apart. The discriminator attempts to separate real data from the fakes made by the generator; the generator attempts to produce copies of the data that make the discriminator's predictions false. After complete training, the generator network is able to produce brand-new samples that were never available before. This is how adversarial networks work, generating brand new data.
9.4
TYPES OF GAN
There are different types of Generative Adversarial Networks available and
suitable for variable applications. Before moving into applications, let us
have a look over to the various types of GAN.
There are many more types of GAN, but they are out of the scope of
this book.
9.5
APPLICATIONS OF GAN
There are many applications for GAN, but we are going to restrict ourselves
to discussing a few applications here.
9.6
IMPLEMENTATION OF GAN
In this section, we are going to see how to implement a basic Generative Adversarial Network using TensorFlow Keras.
In Code 9.1, we import all the required libraries for the implementation of the GAN. Among them, we use vstack from NumPy to stack arrays in sequence row-wise (vertically). We are already familiar with the other libraries. In Code 9.2, we build the discriminator neural network.
Code 9.6 generates points in the latent space as input for the generator. Code 9.7 uses the generator to generate n fake examples, with class labels. Code 9.9 evaluates the discriminator and the generated images, and saves the generator model.
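A hedged sketch of the two helpers described above is given below; the function names and shapes are assumptions in the spirit of Codes 9.6 and 9.7, not the book's exact listings.

# Illustrative sketch: latent-point sampling and fake-sample generation for a GAN
import numpy as np
from numpy.random import randn

def generate_latent_points(latent_dim, n_samples):
    # sample Gaussian noise and reshape it into a batch of latent vectors
    x_input = randn(latent_dim * n_samples)
    return x_input.reshape(n_samples, latent_dim)

def generate_fake_samples(generator, latent_dim, n_samples):
    # run the generator on latent points and label the outputs as fake (class 0)
    x_input = generate_latent_points(latent_dim, n_samples)
    X = generator.predict(x_input)
    y = np.zeros((n_samples, 1))
    return X, y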
As it goes, we could notice the generator getting better accuracy than the
discriminator network. That means, at one particular stage, the discrimin-
ator is no longer able to distinguish fake images from real ones.
QUIZ
1. Define generative models.
2. What is GAN?
3. Can we generate a fair dataset from overrepresented and
underrepresented features? If so how?
4. Can you list a few variants of GAN?
5. What are the major applications of GAN?
FURTHER READING
✓ Zhong, Shi, and Joydeep Ghosh. “Generative model-based docu-
ment clustering: a comparative study.” Knowledge and Information
Systems 8.3 (2005): 374–384.
✓ Goodfellow, Ian J., et al. "Generative adversarial networks." arXiv preprint arXiv:1406.2661 (2014).
Transfer Learning
LEARNING OBJECTIVES
After this chapter, the reader will be able to answer the following questions.
10.1
WHAT IS TRANSFER LEARNING?
Transfer Learning is the process of transferring the knowledge acquired
in implementing or learning one task to implement another related task.
For example, if we know how to ride a bicycle (task A), we can use the knowledge we gained to learn to ride a motorbike (task B) (Figure 10.1). Here, task B is related to task A: in both cases, the number of wheels is 2, and the way we balance is also the same. Similarly, if we know how to do simple mathematics, we can apply those basics to study Deep Learning / Machine Learning (ML) algorithms.
10.2
WHEN CAN WE USE TRANSFER LEARNING?
In general, we can consider the below points to decide whether to apply
Transfer Learning techniques or not:
• We are short of the vast set of labelled data needed to train our net-
work from the beginning.
• We already have a pre-trained network that solves a problem similar
to the one in hand.
• When the inputs of task 1 and 2 are same.
10.3
EXAMPLE –1: CAT OR DOG USING TRANSFER
LEARNING WITH VGG 16
The first and foremost step is to import the required libraries. In the code snippet below, we can see that we have to import Keras, NumPy, and VGG16 from keras.applications (Keras is the library that provides us the ability to use pre-trained models, and NumPy is used for numerical operations). Once we are done with the library imports, we can go ahead with the initialization of the VGG16 model with the weights of the ImageNet dataset, using the lines below:
#Load the VGG16 model
vgg_model = vgg16.VGG16(weights='imagenet')
Code 10.1 shows the first step: importing the libraries and initializing the VGG16 model.
Tip: If we want to use other models, like ResNet or Inception, we should use the lines of code below.
#Load the Inception_V3 model
inception_model = inception_v3.InceptionV3(weights='imagenet')
#Load the ResNet50 model
resnet_model = resnet50.ResNet50(weights='imagenet')
#Load the MobileNet model
mobilenet_model = mobilenet.MobileNet(weights='imagenet')
Along with the above text, we also show the image from the input file
for our reference using the imshow() method as shown in the Figure 10.2.
FIGURE 10.2 The input image displayed using the imshow() method.
predictions_mobilenet = mobilenet_model.predict(processed_image_mobilenet)
label_mobilenet = decode_predictions(predictions_mobilenet)
print('label_mobilenet =', label_mobilenet)
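Putting the pieces together, a minimal end-to-end sketch of the VGG16 prediction is given below; the file name dog.jpg and the preprocessing helpers shown are assumptions chosen for illustration.

# Illustrative sketch: end-to-end prediction with the pre-trained VGG16 model
import numpy as np
from keras.applications import vgg16
from keras.applications.imagenet_utils import decode_predictions
from keras.preprocessing.image import load_img, img_to_array

vgg_model = vgg16.VGG16(weights='imagenet')

image = load_img('dog.jpg', target_size=(224, 224))    # VGG16 expects 224 x 224 input
numpy_image = img_to_array(image)
image_batch = np.expand_dims(numpy_image, axis=0)       # add the batch dimension
processed_image = vgg16.preprocess_input(image_batch.copy())

predictions_vgg16 = vgg_model.predict(processed_image)
label_vgg16 = decode_predictions(predictions_vgg16)
print('label_vgg16 =', label_vgg16)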
The output of the predict function for each model is the breed of the input image (a dog) and the associated probability. For VGG16, it will be similar to the lines below:
label_vgg16 = [[('n02099712', 'Labrador_retriever', 0.5967783), ('n02088601', 'golden_retriever', 0.24647361), …
So, we are done with the input image classification using the VGG16
model. In the same way, we can predict the input image using Inception_
v3, ResNet, and so forth, and do a comparison of the models.
Readers are requested to run the above lines of code and compare and
contrast the performance of each model.
10.4
EXAMPLE –2: IDENTIFY YOUR RELATIVES’ FACES
USING TRANSFER LEARNING
The objective of this example is to identify the faces of our near and dears
using Transfer Learning techniques. We are going to leverage the VGG16
process again for this example. Before we can see the code snippets, we
need to recap the VGG16 architecture a bit.
FIGURE 10.3 VGG16 layer details.
Under each folder we will have the pictures of the persons. In short, we
are labeling the dataset. Once we are done with the dataset labeling, we can
start coding.
The above Code 10.6 shows the required modules for this exercise.
Table 10.1 points out the usage of each of the modules used.
Code 10.7 declares the image size to be used for all the images. As
VGG16 architecture was originally handling images of size 224*224, we
are also abiding by that dimension. Also, we see the path of Test and Train
folders above.
Code 10.8 shows how we remove the last fully connected layer as
mentioned at the beginning of this section. include_top=False will tell the
model to ignore the last layer of the pre-trained model.
• Loss function
• Optimizer function
• Metrics to be used
Next comes the fitting part of the model. It is done to understand how
well our model is able to generalize to datasets similar to the ones on which
it was trained.
Code 10.12 shows the fitting step. The parameters required for fitting a
model would be the test data set, training data set, number of epochs, steps
per epoch, and validation.
The last set of code snippet is given in Code 10.13. In this, we see that we
are declaring the labels of the datasets, followed by the image to be tested
(test.jpg). After the dimensionality changes, we ask the model to predict.
And, finally, we print the name of the person in test.jpg.
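A hedged end-to-end sketch of the flow just described is given below; the folder names, class count, and training settings are assumptions, not the book's exact Codes 10.6 to 10.13.

# Illustrative sketch: fine-tuning VGG16 to recognize faces from labelled folders
from glob import glob
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.models import Model

IMAGE_SIZE = [224, 224]
train_path, test_path = "faces/train", "faces/test"     # one sub-folder per person

vgg = VGG16(input_shape=IMAGE_SIZE + [3], weights="imagenet", include_top=False)
for layer in vgg.layers:
    layer.trainable = False            # keep the pre-trained feature extractor frozen

num_classes = len(glob(train_path + "/*"))
x = Flatten()(vgg.output)
outputs = Dense(num_classes, activation="softmax")(x)
model = Model(inputs=vgg.input, outputs=outputs)
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
train_set = datagen.flow_from_directory(train_path, target_size=(224, 224), class_mode="categorical")
test_set = datagen.flow_from_directory(test_path, target_size=(224, 224), class_mode="categorical")
# model.fit(train_set, validation_data=test_set, epochs=5)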
In this example, we are using the VGG16 model for predicting friends
and family. The readers are requested to try the above exercise with other
pre-trained models, such as resNet, Inception, and so forth.
10.5
THE DIFFERENCE BETWEEN TRANSFER LEARNING
AND FINE TUNING
The whole process of any neural network model is classified into two parts
(Figure 10.4).
1. Feature extraction
2. Classification/Prediction.
FIGURE 10.4 Outline of CNN architecture.
10.6
TRANSFER LEARNING STRATEGIES
Based on source domain, target domain, source task and target task, we can
classify the Transfer Learning techniques to different types.
What is a domain?
Domain is about a random variable X and the values X may take.
Mathematically, Domain (D) is represented as follows:
D ={X, P(X)}
Where,
• X =Feature Space (X1, X2, X3, X4, … .,Xn)
• P(X) =Probability distribution of X.
To make it simpler, if our domain is Animals (X), the domain space will
contain animals such as Cat, Dog, Giraffe, Elephant, and so forth. In this
case, X1, X2, X3 denotes an animal.
All right, what is a task then?
A task is the objective, or what our model is supposed to identify. For
instance, a cat or dog from the given picture. So, for a given target Domain
Dt and the task Tt, if we use the knowledge gained from source Domain
Ds and the task Ts, then coming back to the first statement of the section,
we can classify the Transfer Learning techniques to different types. We are
going to see these techniques in detail.
QUIZ
1. When the domain of the problem in hand is the same as that of a pre-trained model, but the task is different, what is this termed?
2. What are the differences between fine tuning and Transfer Learning?
3. What are the benefits of Transfer Learning?
4. How can we measure the accuracy of a model that uses Transfer Learning?
5. If we do not have a large enough dataset available to train the model, we cannot use Transfer Learning. True or false?
FURTHER READING
✓ www.cse.ust.hk/~qyang/Docs/2009/tkde_transfer_learning.pdf
✓ W. Dai, Q. Yang, G. Xue, and Y. Yu, "Boosting for Transfer Learning," in Proceedings of the 24th International Conference on Machine Learning, Corvallis, Oregon, June 2007, pp. 193–200.
✓ K. Nigam, A. K. McCallum, S. Thrun, and T. Mitchell, "Text classification from labeled and unlabeled documents using EM," Machine Learning, vol. 39, no. 2–3, pp. 103–134, 2000.
✓ A. Blum and T. Mitchell, "Combining labeled and unlabeled data with co-training," in Proceedings of the Eleventh Annual Conference on Computational Learning Theory, 1998, pp. 92–100.
✓ T. Joachims, "Transductive inference for text classification using support vector machines," in Proceedings of the Sixteenth International Conference on Machine Learning, Bled, Slovenia, 1999, pp. 825–830.
✓ S. J. Pan, V. W. Zheng, Q. Yang, and D. H. Hu, "Transfer learning for wifi-based indoor localization," in Proceedings of the Workshop on Transfer Learning for Complex Task of the 23rd AAAI Conference on Artificial Intelligence, Chicago, July 2008.
CHAPTER 11
Intel OpenVino: A Must-Know Deep Learning Toolkit
LEARNING OBJECTIVES
After this chapter, the reader will be able to understand the following:
• What is OpenVino?
• How to install and set up Intel OpenVino?
• Sample applications with Intel OpenVino.
11.1 INTRODUCTION
OpenVino is from Intel; the name expands to Open Visual Inference and Neural Network Optimization toolkit. It is now regarded as the fastest toolkit of its kind on the market. It was earlier called the Intel Computer Vision SDK. OpenVino provides improved neural network performance; to be precise, this performance is ensured on a variety of Intel platforms and processors.
One can say, for sure, that OpenVino is cost-effective and well suited for real-time computer vision applications. OpenVino makes Deep Learning inference and heterogeneous execution a reality, and it really enables developers to innovate with Deep Learning and AI solutions. One good thing: it is easy to try, install, and practice with. It is also compatible with both versions of the Neural Compute Stick. This chapter is focused on giving readers a clear idea of, and guidelines for, how to use OpenVino.
11.2 OPENVINO INSTALLATION GUIDELINES
Installation is a bit tricky, and you need a good machine with a decent configuration; otherwise, this might not go well for you. OpenVino is compatible with Linux, Windows 10, and macOS. The guidelines are pretty easy to follow, and installation can be completed in an hour, provided you do not run into too many errors.
The hardware requirements for installation are to be met, and users are expected to have one of the supported hardware components available.
We shall now see the download and installation of Microsoft Visual Studio with C++ (2019, 2017, or 2015) with MSBuild.
One should visit https://visualstudio.microsoft.com/downloads/. The landing page will be similar to the one presented in Figure 11.2. One should select the free download option as presented there.
From the Workloads section during the Visual Studio installation, select the highlighted options as presented in Figure 11.3. These are essential and must be done.
Now, under the Individual components section, select the ticked options as presented in Figure 11.4.
Add the path for all users, as this is preferred. The same is presented as Figure 11.6.
Select the appropriate version files for the installation. One can refer to Figure 11.8 for a quicker understanding.
The real installation starts now. The core OpenVino components are to be installed, and the installer can be downloaded from:
http://software.intel.com/en-us/openvino-toolkit/choose-download/free-download-windows
w_openvino_toolkit_p_<version>.exe will be the name of the file. You
should register and download. Keep the key safe (Figure 11.10).
You must update several environment variables before you can compile and run OpenVino applications. Open the Command Prompt and run the setupvars.bat batch file to temporarily set your environment variables. The same is presented as a screenshot for quicker reference in Figure 11.13.
cd C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\install_prerequisites
install_prerequisites.bat
Again, make sure the below-mentioned path is reached.
cd C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\model_optimizer\install_prerequisites
For each framework, we now need to go ahead with the configuration:
install_prerequisites_caffe.bat – for Caffe
install_prerequisites_tf.bat – for TensorFlow
install_prerequisites_mxnet.bat – for MXNet
install_prerequisites_onnx.bat – for ONNX
install_prerequisites_kaldi.bat – for Kaldi
One would get the message as shown below in the prompt (Figure 11.14).
Two demo samples are there by default, and they can be used to test the installation. One has to navigate to:
cd C:\Program Files (x86)\IntelSWTools\openvino\deployment_tools\demo\
Issue the following command to run the first demo:
demo_squeezenet_download_convert_run.bat
When the verification script completes, you will have the label and confidence for the top-10 categories, as shown in Figure 11.15.
The next demo is interesting as well; to run it, one should issue the demo_security_barrier_camera.bat command. Immediately the result appears on screen, and it is presented as Figure 11.16.
If you have seen these results on screen, that is it: the software is installed and ready to be used. There are many scenarios in which one could use OpenVino. Two such examples are presented below as a simple reference. One can recognize and predict the posture of a person with OpenVino, as presented in the screenshot in Figure 11.17, followed by real-time emotion analysis as presented in Figure 11.18. Opportunities are endless when it comes to OpenVino. Start exploring.
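As a simple starting point for such exploration, the pre-2022 OpenVino releases expose an Inference Engine Python API. The sketch below is illustrative only: model.xml/model.bin are placeholders for a model converted by the Model Optimizer, and attribute names (for example, input_info versus inputs) differ slightly between OpenVino versions.

import cv2
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model='model.xml', weights='model.bin')   # IR produced by the Model Optimizer
exec_net = ie.load_network(network=net, device_name='CPU')

input_name = next(iter(net.input_info))                  # some releases use 'net.inputs' instead
n, c, h, w = net.input_info[input_name].input_data.shape

frame = cv2.imread('test.jpg')
blob = cv2.resize(frame, (w, h)).transpose(2, 0, 1)[np.newaxis, ...]   # HWC -> NCHW
result = exec_net.infer(inputs={input_name: blob})
print({name: out.shape for name, out in result.items()})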
FIGURE 11.17 Posture recognition.
QUIZ
1. How do you install the Intel OpenVino on Linux boxes?
2. What is a model optimizer, and what is the purpose of the same?
3. What are the frameworks supported by OpenVino?
FURTHER READING
✓ Gorbachev, Yury, Mikhail Fedorov, Iliya Slavutin, Artyom Tugarev, Marat Fatekhov, and Yaroslav Tarkan. "OpenVINO deep learning workbench: Comprehensive analysis and tuning of neural networks inference." In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pp. 783–787, Korea, 2019.
✓ Castro-Zunti, R. D., Yépez, J. and Ko, S. B., 2020. "License plate segmentation and recognition system using deep learning and OpenVINO." IET Intelligent Transport Systems, 14(2), pp. 119–126.
✓ Kustikova, V., Vasiliev, E., Khvatov, A., Kumbrasiev, P., Vikhrev, I., Utkin, K., Dudchenko, A. and Gladilov, G., 2019, July. "Intel Distribution of OpenVINO Toolkit: A Case Study of Semantic Segmentation." In International Conference on Analysis of Images, Social Networks and Texts (pp. 11–23). Springer, Cham.
✓ Yew Shun, O.O.I., 2018. High Density Deep Learning – Lite, with Intel OpenVino.
✓ https://docs.openvinotoolkit.org/latest/index.html
✓ https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html
CHAPTER 12
Interview Questions and Answers
LEARNING OBJECTIVES
After this chapter, the reader will be able to answer frequently asked interview questions on the Deep Learning concepts covered in this book.
Q2. What is the relationship you could draw between Deep Learning, AI, and Machine Learning?
A: Deep Learning is a subset of Machine Learning, and Machine Learning in turn is a subset of Artificial Intelligence.
What is correct fit? The name says it all: it is the perfect fit. Such a model has neither high bias nor high variance.
Many signals can be received at an instant from other neurons; for this purpose, the dendrites are attached to the cell body.
[Figure: a feedforward network with an input layer, two hidden layers (hidden layer 1 and hidden layer 2), and an output layer.]
Convolutional layers
Consider a 3 × 3 filter such as:
1 0 −1
1 0 −1
1 0 −1
The filter shown above will slide over the input block (matrix). This sliding is called convolving (Figure 12.6). The filter is going to convolve, and that is the crux!
FIGURE 12.6 The convolving operation.
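To make the sliding concrete, the following NumPy sketch convolves the 3 × 3 filter shown above over a toy input with stride 1 and no padding (frameworks do this far more efficiently, and the kernel flipping of textbook convolution is ignored, as is usual in Deep Learning):

import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over the image (stride 1, no padding) and
    # return the map of elementwise-product sums.
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])   # the filter shown above
image = np.arange(25).reshape(5, 5)                       # toy 5 x 5 input
print(convolve2d(image, kernel))                          # 3 x 3 feature map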
Q20. What percentage should be the test dataset and how much should go for training?
A: There is no hard and fast rule. Normally a 70 percent/30 percent split is preferred: 70 percent for the training and 30 percent for the testing. That said, the best combination can be arrived at by trial and error.
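For instance, with scikit-learn the split is typically written as follows (the toy data and the 0.3 test share are only for illustration):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 4)          # 100 samples, 4 features (toy data)
y = np.random.randint(0, 2, 100)    # binary labels

# Hold out 30 percent of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
print(X_train.shape, X_test.shape)  # (70, 4) (30, 4)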
Q24. How can you enable padding in the Python code? Can you present the code for the same?
A: One can understand the process by referring to Figure 12.13 and the sketch that follows it.
FIGURE 12.13 Padding with the Python code.
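The code in Figure 12.13 is not reproduced here; as a sketch of how padding is usually enabled in Keras (the book's exact code may differ), the padding argument of a convolutional layer, or an explicit ZeroPadding2D layer, does the job:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, ZeroPadding2D

model = Sequential([
    # 'same' padding keeps the 28 x 28 spatial size after convolution.
    Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=(28, 28, 1)),
    # Explicit zero padding can also be added as its own layer.
    ZeroPadding2D(padding=(1, 1)),
    Conv2D(32, (3, 3), padding='valid', activation='relu'),
])
model.summary()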
[Figure: side-by-side comparison of an RNN and a feedforward network (FFN), each drawn with its input layer, hidden layer, output layer, inputs, and outputs.]
[Plot: the linear activation function, Linear(x) = x, for x in the range −8 to 8.]
[Figure: an autoencoder; the encoder maps the original input to a compressed representation, and the decoder produces the reconstructed input.]
Q34. What are the three properties/features one should remember with
autoencoders?
A:
1. Data-specific behavior
2. Lossy Compression nature
3. Unsupervised in nature
Q38. What determines the difference from the original image to the reconstructed image in an autoencoder?
A: The loss function captures the difference, in terms of loss, between the original image and the reconstructed image. Mean squared error is used as the loss function:
L(X, Z) = ‖X − Z‖²
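Numerically, this is just the squared pixel-wise difference, averaged over the pixels for the mean squared error; a tiny NumPy illustration:

import numpy as np

X = np.array([[0.0, 0.5], [1.0, 0.2]])   # original image (toy 2 x 2 example)
Z = np.array([[0.1, 0.4], [0.9, 0.2]])   # reconstructed image

mse = np.mean((X - Z) ** 2)              # mean squared reconstruction error
print(mse)                               # 0.0075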
[Figure: a variational autoencoder; the encoder neural network maps the input image X to a mean vector (µ) and a standard deviation vector (σ), a latent code Z is formed from them, and the decoder neural network produces the reconstructed image.]
Q48. How would you decide whether Transfer Learning or Fine Tuning is best suited for a problem?
A: Start with Transfer Learning and, if needed, move on to Fine Tuning the deeper layers.