Merged Deep Learning
All content following this page was uploaded by Makhan Kumbhkar on 19 July 2023.
by
Makhan Kumbhkar
&
AG PH Books
2022
INTRODUCTION TO DEEP LEARNING
Subba Rao Polamuri, Makhan Kumbhkar and
© 2022 @ Authors
ISBN – 978-93-94339-21-7
Published by:
Contact: +91-7089366889
ABOUT AUTHORS
International Journals, Conferences and Symposiums. He
holds many patents and copyrights. He is an editorial
board member of several journals. His main areas of
interest include machine learning, deep learning, natural
language processing and the Internet of Things.
PREFACE
The book offers a good starting point for people who want
to get started in deep learning, with a focus on NLP.
ACKNOWLEDGMENTS
Last but not least, thank you, reader, for your interest, time,
and trust to work with this book.
TABLE OF CONTENTS
CHAPTER
1
THE NEURAL NETWORK
distinguish between different sounds. Objects can be
And by the time they're in kindergarten, they have a
conventional computer programmes for this purpose.
Source: https://www.researchgate.net/figure/MNIST-dataset-of-handwritten-digits_fig1_324877673
Figure 1.1, we can always tell which ones are zeros and
which ones are ones since they are all written in a slightly
* https://upload.wikimedia.org/wikipedia/commons/thumb/2/27/MnistExamples.png/320px-MnistExamples.png
solve this problem. How can we discern one digit from the
other?
picture has just one closed loop, we may say that we have
worse six?
six*
* Source: https://www.oreilly.com/library/view/fundamentals-of-deep/9781491925607/ch01.html
Some form of cutoff for the distance between the
equation, or extract a derivation from an equation. The
The fact that our parents did not teach us how to recognise
In artificial intelligence, deep learning is a subfield of
algorithm*
* Source: https://www.javatpoint.com/machine-learning-support-vector-machine-algorithm
components would be pixel intensities at each position, as
the day before. We collect a lot of data, and for each data
we want to figure out a parameter vector that allows our
Figure 1.4: Sample data for our exam predictor algorithm and a
potential classifier*
* Source: https://docs.oracle.com/cd/E18283_01/datamine.112/e16808/classify.htm
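The classifier sketched in Figure 1.4 can be written out in a few lines of Python. This is a minimal sketch, not the authors' code: it assumes the parameter vector θ = [−24 3 4]ᵀ that the text selects, treats the first component as a bias term, and uses two hypothetical features (hours of sleep and hours of study). The function name `predict` and the sample values are likewise illustrative.

```python
# Minimal linear classifier sketch (assumptions: theta[0] is a bias term,
# and the two features are hypothetical "hours of sleep" and "hours of study").
THETA = (-24.0, 3.0, 4.0)


def predict(hours_sleep, hours_study, theta=THETA):
    """Return +1 if the point lies on or above the decision line, else -1."""
    bias, w_sleep, w_study = theta
    score = bias + w_sleep * hours_sleep + w_study * hours_study
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Hypothetical students: (hours of sleep, hours of study)
    for sleep, study in [(8, 2), (2, 2), (4, 3)]:
        print(sleep, study, predict(sleep, study))
```

Every point on one side of the line −24 + 3·x₁ + 4·x₂ = 0 receives the label +1 and every point on the other side receives −1, which is all a linear classifier of this kind can express.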
It then turns out that, by selecting θ = [−24 3 4]ᵀ, every
Another thing is clear: this specific model (the linear
Figure 1.5: As our data takes on more complex forms, we need more
* Source: https://www.softwaretestinghelp.com/data-mining-process/
approximate the structures used by human brains in order
outperform them!
CHAPTER
2
Artificial Neural Network
2.1. Background
labelled "Neural.":
non-linear functions.
Training and prediction activate the adaptive weights,
network models.
more suited for dealing with real-world issues than the
traditional models.
2.2. ANN
The lecture on artificial neural networks goes through
other things.
* Source: https://www.javatpoint.com/artificial-neural-network
Figure 2.2: Typical Artificial Neural Network*
* Source: https://www.javatpoint.com/artificial-neural-network
disorganized manner, allowing us to access several bits of
and connected together. Check out all of the many artificial
Input Layer:
Hidden Layer:
* Source: https://www.javatpoint.com/artificial-neural-network
calculations necessary to uncover hidden patterns and
traits.
Output Layer:
Using the hidden layer, the input is turned into the output,
which is subsequently conveyed with the help of the
communication layer.
2.4. How do artificial neural networks work?
for bias is the same, and its weight is 1. It's possible for the
total weighted inputs to have any value between 0 and 1.
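The computation described here, a weighted sum of the inputs plus a bias that is then passed through an activation function, can be sketched as follows. The code is an illustrative assumption rather than the book's own implementation: the function names, the threshold of 0 for the binary function, and the sample weights are hypothetical. The two activations shown are the binary and hyperbolic tangent functions named in this section.

```python
import math


def weighted_sum(inputs, weights, bias=0.0):
    """Sum of inputs times weights, plus a bias whose own weight is fixed at 1."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias * 1.0


def binary_step(net, threshold=0.0):
    """Binary activation: output 1 when the net input reaches the threshold, else 0."""
    return 1 if net >= threshold else 0


def tanh_activation(net):
    """Hyperbolic tangent activation: squashes the net input into (-1, 1)."""
    return math.tanh(net)


if __name__ == "__main__":
    net = weighted_sum(inputs=[0.5, 0.2], weights=[0.4, 0.6], bias=0.1)
    print(net, binary_step(net), tanh_activation(net))
```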
* Source: https://static.javatpoint.com/tutorial/artificial-neural-network/images/artificial-neural-network6.png
linear activation functions are the most common types of
Binary:
Sigmoidal Hyperbolic:
the real net input is achieved here using the tan hyperbolic
a certain task in a way that is akin to that of the human
Feedback ANN:
Feed-Forward ANN:
may be determined based on the collective activity of the
output may be sent back to itself as an input in certain
form of a sequence
* Source: https://towardsdatascience.com/introducing-recurrent-neural-networks-f359653d7020
As with the words and numbers in sentences, these neural
produce sequences.
Encoder-Decoder Networks
Figure 2.6: Recursive neural network*
reinforcement-based.
* Source: https://www.kdnuggets.com/2016/06/recursive-neural-networks-tensorflow.html
2.6.1. Supervised learning
f(x), and the goal value y, across all the example pairings,
networks.
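One common way to measure the mismatch between the network output f(x) and the target value y across all example pairs is the mean squared error. The sketch below assumes that cost; the function name `mean_squared_error` and the toy predictor are hypothetical.

```python
def mean_squared_error(f, examples):
    """Average of (f(x) - y)^2 over all (x, y) example pairs."""
    errors = [(f(x) - y) ** 2 for x, y in examples]
    return sum(errors) / len(errors)


def double(x):
    """Toy predictor used only for illustration."""
    return 2 * x


if __name__ == "__main__":
    pairs = [(1, 2), (2, 4), (3, 7)]  # hypothetical (x, y) pairs
    print(mean_squared_error(double, pairs))
```

Training then amounts to adjusting the parameters of f so that this average shrinks.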
"instructor" function, in the form of a function that offers
network's output, f.
costs, such as the total cost of ownership, is the ultimate
networks can carry out several tasks at once.
Having fault tolerance:
structure via trial and error, experience, and more trial and
error.
Hardware dependence:
the production of the equipment is contingent on its
success.
deal of trial and error is necessary, however, when it
2.10. Applications
a kind of regression analysis that includes time
processing.
Several tumors have been identified using artificial neural
institutions.
since neurological systems are so directly linked to
Types of models
Memory networks
other. All addresses that vary by a few bits from the query
DeepMind's Neural Turing Machines to improve the
MLP is a universal function approximator. When it comes
Capacity
Convergence
may be multiple local minima. The cost function and the
unreliable.
accomplished by selecting a higher prior probability over
over the training set and the predicted error in unseen data
as a result of overfitting, respectively.
a validation set.
CHAPTER
3
Deep Learning
expressions) from examples are available. There are claims
(supervised).
level ones.
A hierarchy of ideas may be formed by learning
As the signal goes from the input layer to the output layer
CAP depth is equal to the number of hidden layers
deep learning.
abstract are learned from more concrete ones. These
deep learning.
image recognition).
Unsupervised learning challenges frame many deep
Recurrent neural networks
the process repeats again. The term "deep" comes from the
be deemed usable.
learning is both quicker and more accurate than
supervised learning.
scan digital data for pixel patterns. With each iteration, the
unstructured and unlabeled as the Internet of Things (IoT)
popular methods of adjusting the learning rate include
requires far less data than other approaches, this one may
be computed in only minutes or hours instead of days or
weeks or months.
during training. The dropout strategy has been
biology.
artwork.
560 million goods for sale and more than 300 million
In contrast to typical machine learning, deep learning does
you may have, but you are unsure of the attributes of your
launched in 2017.
system determined that the photo is of a cat and not a dog
Figure 3.1: Applications of Deep Learning*
* Source: https://serokell.io/blog/deep-learning-and-neural-network-guide
Recommender systems: User preferences across a wide
point.
first few iterations include educated guesses, the training
data must be labelled so that the model can see whether its
These are some of the domains where deep learning is
Medical research: Deep learning has been used by cancer
single source that does not cover the whole functional area
cannot be used to build a generalizable model.
are most important. Using parameters such as race or
powerful and accurate models will demand
retrained.
opposite. As the amount of the data rises, the time it takes
CHAPTER
4
Deep Learning Libraries
4.1. Theano
On top of Theano, a wide range of applications have been
Blocks
Keras
Lasagne
PyLearn2
4.2. TensorFlow
machines.
single desktop, server, or mobile device using a single API,
it was built.
word used to describe the flow of these tensor units across
Automatic differentiation: The automated
graphs.
of devices.
4.4. Keras
few libraries that can operate on both the GPU and the
CPU.
components. Neural layers, cost functions and optimizers,
4.5. PyTorch
finest machine learning and deep learning framework,
* Source: https://towardsdatascience.com/best-python-libraries-for-machine-learning-and-deep-learning-b0bd40c7e8c
4.6. Scikit-learn
Classification
Regression
Clustering
Dimensionality Reduction
Model Selection
Preprocessing
4.7. Pandas
dataset is prepared for training, it comes into play. For
include:
alignment.
Options for several types of indexing: hierarchical
4.8. NLTK
FrameNet, WordNet, Word2Vec, and many more. NLTK's
Figure 4.2: Spark MLlib Machine Learning Tools*
4.10. NumPy
sets are the focus of the NumPy package for Python. Fast
* Source: https://towardsdatascience.com/best-python-libraries-for-machine-learning-and-deep-learning-b0bd40c7e8c
Simulations based on chance
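As a small illustration of chance-based simulation with NumPy, the sketch below estimates the probability of getting at least two heads in three fair coin flips. The scenario, seed, and sample size are assumptions chosen for illustration; only `numpy.random.default_rng` and standard array operations are used.

```python
import numpy as np

# Seeded generator so repeated runs draw the same samples.
rng = np.random.default_rng(seed=0)

# 100,000 trials of three fair coin flips (1 = heads, 0 = tails).
flips = rng.integers(0, 2, size=(100_000, 3))

# Fraction of trials with at least two heads; the exact probability is 0.5.
estimate = np.mean(flips.sum(axis=1) >= 2)
print(estimate)
```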
CHAPTER
5
Deep Learning Architectures
layered compositions of picture primitives. Additional
similarly.
by doing this.
action. In order to identify the item, creative and analytical
chance that these weights will be in an area of the weight
visible connections*
* Source: https://www.kdnuggets.com/2016/06/recursive-neural-networks-tensorflow.html
A DBN may be trained in an unsupervised, layer-by-layer
links between the visible units of the input layer and the
hidden units of the hidden layer as shown in figure 5.1.
Belief Networks were used after that to generate values for
5.3. Convolutional neural networks
Figure 5.2: CNN*
* Source: https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
5.3.1. Working of CNN
Convolutional layer
Pooling layer
Fully-connected (FC) layer
layer of the CNN, more and more details of the picture are
picked up, until the final layer recognizes the target item.
components are needed, including input data, filtering,
Figure 5.3: Feature map*
* Source: https://www.ibm.com/cloud/learn/convolutional-neural-networks
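How a feature map such as the one in Figure 5.3 arises can be sketched with NumPy: a small filter slides over the input, and at each position the element-wise products are summed into one output value. The function name `feature_map`, the 4×4 input, and the 2×2 filter are hypothetical; stride 1 and no padding are assumed.

```python
import numpy as np


def feature_map(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    k_h, k_w = kernel.shape
    out_h = image.shape[0] - k_h + 1
    out_w = image.shape[1] - k_w + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + k_h, j:j + k_w]  # current receptive field
            out[i, j] = np.sum(patch * kernel)
    return out


if __name__ == "__main__":
    image = np.arange(16, dtype=float).reshape(4, 4)
    kernel = np.array([[1.0, 0.0], [0.0, -1.0]])
    print(feature_map(image, kernel))
```

The output is smaller than the input (3×3 here) because the filter only visits positions where it fits entirely inside the image.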
are used to change certain parameters, such as weight
Full padding: Zeros are added to the input border in order
Figure 5.4: Determining the image of bicycle*
* Source: https://www.ibm.com/cloud/learn/convolutional-neural-networks
function to fill the output array with the receptive field's
looks for the pixel with the highest value and sends that
directly.
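Max pooling as described here, where the largest value in each receptive field is passed to the output array, can be sketched as follows. The 2×2 window with stride 2 and the name `max_pool` are assumptions for illustration, and the input height and width are assumed to be divisible by the window size.

```python
import numpy as np


def max_pool(feature, size=2):
    """Non-overlapping max pooling; height and width must be multiples of size."""
    h, w = feature.shape
    # Split into (h/size, size, w/size, size) blocks and take each block's maximum.
    blocks = feature.reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))


if __name__ == "__main__":
    feature = np.array([[1, 3, 2, 0],
                        [4, 6, 1, 1],
                        [0, 2, 9, 8],
                        [1, 1, 7, 5]])
    print(max_pool(feature))
```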
Based on the characteristics retrieved from the preceding
layers and their varied filters, this layer conducts the duty
profile.
Healthcare: It is now possible to detect malignant tumours
5.3.6. Limitations
with it. However, when faced with the identical sight, we
* Source: https://www.analyticsvidhya.com/blog/2021/05/convolutional-neural-networks-cnn/
trained on a massive library of photos and videos, the
conditions.
5.3.7. Real-world applications of Convolutional
neural network (CNN)
on the faces.
When it comes to object detection, CNN has been used to
those objects.
need for human translators or translators who are fluent in
both languages.
written differently.
models to look for anomalies. CNNs have been used to
combining many photos from social networking sites, such
information, CNN may utilise both text and pictures to
TensorFlow
GPUs on a server, desktop, or mobile device does not need
* Source: https://www.analyticsvidhya.com/blog/2016/10/an-introduction-to-implementing-neural-networks-using-tensorflow/
Step 1: Upload Dataset
Define the CNN: An advantage of CNNs over more
linear.
pooling layer.
CNN architecture
inputs to form a two-dimensional convolutional
layer.
together in a function.
it must be padded in the same manner as its output.
14, 14].
layer. A 7*7*36-sized module may be reshaped using the
module reshape.
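Flattening a pooled feature map of size 7*7*36 before a dense layer amounts to a reshape. The sketch below uses NumPy rather than TensorFlow to stay self-contained; the batch size of 4 and the variable names are assumptions.

```python
import numpy as np

batch_size = 4  # hypothetical batch size
# Stand-in for the pooled output: 7 x 7 spatial positions, 36 feature channels.
pooled = np.zeros((batch_size, 7, 7, 36))

# -1 lets NumPy infer the batch dimension; each example becomes one flat vector.
flat = pooled.reshape(-1, 7 * 7 * 36)
print(flat.shape)  # (4, 1764)
```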
Figure 5.7: Pooling and Flattening*
layer.
* Source: https://towardsdatascience.com/the-most-intuitive-and-easiest-guide-for-convolutional-neural-network-3607be47480
Isn't this just a waste of time and money? Is there a good
reason for reducing the size? It may seem that data is being
* Source: https://towardsdatascience.com/the-most-intuitive-and-easiest-guide-for-convolutional-neural-network-3607be47480
Padding in convolutional neural networks refers to the
Padding
* Source: https://analyticsindiamag.com/guide-to-different-padding-methods-for-cnn-models/
adding padding to an image's frame. When CNN images
CNN.
Types of Padding
Same padding
Causal padding
Valid padding
with the padding layers that add zero values to the outer
frame of images or data.
Valid Padding: We use every point/pixel value when
bottom if your filter and stride don't cover the whole input
image (i.e., no padding mode).
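The effect of the padding mode on output size can be checked with the standard output-size arithmetic. This is a sketch under assumptions: the function name `conv_output_size` is hypothetical, "same" padding is taken to mean an output of ceil(n / stride), and "valid" means no padding, so positions the filter cannot fully cover are dropped.

```python
import math


def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Output length along one dimension for an input of length n."""
    if padding == "valid":
        # No padding: only positions where the filter fits entirely.
        return (n - kernel) // stride + 1
    if padding == "same":
        # Zero padding chosen so that the output covers every input position.
        return math.ceil(n / stride)
    raise ValueError(f"unknown padding mode: {padding}")


if __name__ == "__main__":
    print(conv_output_size(28, 5, stride=1, padding="valid"))  # 24
    print(conv_output_size(28, 5, stride=1, padding="same"))   # 28
    print(conv_output_size(28, 2, stride=2, padding="valid"))  # 14
```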
Stride
then the filter will only move one pixel in the image (or
output volume, stride is often set to a full integer rather
* Source: https://deepai.org/machine-learning-glossary-and-terms/stride
CHAPTER
6
Natural Language Processing
translated into numerical values that computers can
Processing (NLP), you may not have even noticed it. When
drafting an email, offering to translate a Facebook post
6.1.1. Evolution
The Turing Test, created by Alan Turing in the 1950s to
language.
fields.
applications for natural language processing. Traditional
Syntactic Analysis
different words.
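A first step in most syntactic analysis is splitting text into word tokens. The sketch below uses a simple regular expression rather than a full NLP library such as NLTK; the pattern and the function name `tokenize` are assumptions and will not handle every case (for example, contractions are split at the apostrophe).

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


if __name__ == "__main__":
    print(tokenize("Deep learning helps computers read text."))
```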
A few of its most important sub-tasks are as follows:
noun.
Inflected words may be reduced to their
Semantic Analysis
Disambiguation of word sense is the process of
SLA is an industry abbreviation for service-level
agreement.
be evaluated.
now read text, hear voice, and analyse it using NLP. They
can also gauge how people feel about what they hear and
use that information to make decisions.
automation will be important for the effective analysis of
6.1.4. NLP in Business
communication.
Sentiment Analysis
referencing your business in real time and respond to
Language Translation
in other languages.
Text Extraction
Chatbots
quickly, manage many inquiries at once, and free up
Topic Classification
Topic categorization may also be used to automate the
Statistical NLP, machine learning, and deep learning
the early days of NLP, but they couldn't keep up with the
being generated.
natural by allowing computers to comprehend human
language.
department of a firm
organised programming language or by using a restricted
speech.
digest naturally. Rules of language do exist, but they are
outdated.
CHAPTER
7
Memory Augmented Neural Networks
Figure 7.1: Memory Augmented Neural Network Architecture*
* Source: https://medium.com/the-ai-team/memory-augmented-neural-network-for-meta-learning-case-study-56af9cc81ae2
been the subject of several investigations, with innovative
effective solution for any arbitrary issue because the search
space is so wide.
The answer is so trivial: It's two! How did our minds come
are listed in the second phrase, and at this point it's only a
glass of milk.
that the office is where the first memory place was. By the
For example, if we tried to access the NTM's memory in
Where T denotes the matrix transpose operation.
by the models.
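A soft, attention-based memory read of the kind used by NTMs can be sketched in NumPy: instead of picking one location, the read head takes a weighted combination of all memory rows, r = Mᵀw. The memory contents, the weighting, and the names `memory`, `weights`, and `soft_read` are hypothetical values chosen for illustration.

```python
import numpy as np


def soft_read(memory, weights):
    """Return the read vector r = M^T w for an N x W memory and N weights."""
    return memory.T @ weights


if __name__ == "__main__":
    # Hypothetical memory with N = 3 locations, each holding a vector of width W = 2.
    memory = np.array([[1.0, 0.0],
                       [0.0, 1.0],
                       [2.0, 2.0]])
    # Attention weighting over locations; non-negative and summing to 1.
    weights = np.array([0.5, 0.5, 0.0])
    print(soft_read(memory, weights))  # [0.5 0.5]
```

Because the read is a smooth function of the weighting, gradients can flow through it during training, which a hard index lookup would not allow.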
All of the architecture's components may be rearranged
There are two weightings for each read or write head: one
for each location and one for each head. In this way, a
memories.
memory. The neural network gets input from the outside
longer relevant. The second drawback of the NTM is its inability to
utilization vector.
being utilised.
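One way to turn a usage vector into a choice of where to write is the allocation weighting used in differentiable neural computers: locations are ranked from least to most used, and the least-used location receives the most weight. The sketch below assumes usage values in [0, 1], with 0 meaning free and 1 meaning fully in use; the function name `allocation_weighting` and the example vector are hypothetical.

```python
import numpy as np


def allocation_weighting(usage):
    """Give most weight to the least-used memory locations."""
    order = np.argsort(usage)          # free list: indices sorted by ascending usage
    sorted_usage = usage[order]
    # Product of the usages of all locations ranked before each one (1 for the first).
    prior_product = np.concatenate(([1.0], np.cumprod(sorted_usage)[:-1]))
    alloc_sorted = (1.0 - sorted_usage) * prior_product
    alloc = np.empty_like(usage)
    alloc[order] = alloc_sorted        # undo the sort
    return alloc


if __name__ == "__main__":
    usage = np.array([1.0, 0.0, 0.5])  # location 1 is completely free
    print(allocation_weighting(usage))  # [0. 1. 0.]
```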
The initial value of the usage vector, with 0 representing a
obvious that the location with the lowest usage value should
weightings. As a result, having the option to release and
and not get freed. Each member of this vector has a value
7.8. Temporal Linking of DNC Writes
been the last one to have been written in. Initially, the
When updating, the previous values of the precedence are
sequences.
memory dynamically. Instead we’ll treat a series of such
We can see how the DNC is writing each of the five vectors
* Source: https://www.researchgate.net/figure/An-example-structure-of-the-input-data-set-to-LSTM-model-Here-the-light-colored-time_fig1_348426928
Figure 7.3: Visualization of the DNC operation on the copy problem*
* Source: https://www.researchgate.net/figure/A-summary-of-the-five-major-steps-involved-in-the-development-of-machine-learning-based_fig1_349198111
During the writing and reading stages of each sequence in
CHAPTER
8
Deep Reinforcement Learning
8.1. Introduction
AlphaGo, which defeated the world champions of the Go
take some time before they bear fruit. While it might be
Agent: A person or machine that performs a task, such as a
the fact that you are the agent in your own life.
would be valued the same as immediate ones if they were
delivers the agent's reward and the next state it will be in.
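How a discount factor weighs future rewards against immediate ones can be sketched as follows. The reward sequence and the name `discounted_return` are assumptions; with a discount factor of 1, future rewards count exactly as much as immediate ones, while smaller values shrink them.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, each scaled by gamma raised to its time step."""
    return sum(reward * gamma ** step for step, reward in enumerate(rewards))


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]  # hypothetical rewards at steps 0, 1, 2
    print(discounted_return(rewards, gamma=1.0))  # 3.0
    print(discounted_return(rewards, gamma=0.5))  # 1.75
```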
transmits actions to the environment from any given state,
well worth the trade-off. They have different time frames.
to store, index, and update all potential states and their
values.
to be a donkey, 50% likely to be a horse, and 30% likely to
earn five points, leaping will earn seven, and running left
* Source: https://wiki.pathmind.com/deep-reinforcement-learning
† Source: https://wiki.pathmind.com/deep-reinforcement-learning
will earn none using reinforcement learning and a
convolutional net.
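Once a network has predicted the expected reward of each action, picking an action can be as simple as taking the highest-valued one. The sketch uses the point values from the example above (five, seven, and none); the action labels are hypothetical, since the text does not fully name them, and a real agent would usually add some exploration rather than always acting greedily.

```python
def greedy_action(predicted_rewards):
    """Return the action whose predicted reward is highest."""
    return max(predicted_rewards, key=predicted_rewards.get)


if __name__ == "__main__":
    # Hypothetical labels; the values mirror the five / seven / none example.
    predicted = {"action_a": 5.0, "jump": 7.0, "run_left": 0.0}
    print(greedy_action(predicted))  # jump
```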
Wikipedia:
The neural network coefficients may be initialised
particular activity.
8.3. Safety and Security of DRL
reader.
8.4. Successful applications of deep
reinforcement learning
progresses. DeepMind claims that AlphaZero learned the
drilling data and simulations. Data from the drill bit, such
learning's capacity to handle complicated problems that
Automotive
Resource Management
AI toolkits
successful they will be. This means that we'll have the
opportunity to see a significant growth in practical
applications.
It's still in the early phases, but it's promising. RL, on the
other hand, has shown to be a force that might
Manufacturing
a plethora of items to the appropriate recipients. Any time
wisdom.
Healthcare
Traffic Control
deciphering it.
network. Learning is accomplished via the use of its
Bots
Video Games
In a world where the game business is becoming more
CHAPTER
9
Advanced Topics in Deep Learning
9.1. Introduction
performance.
2. Models with selective access to internal memory: This
memory. A neural Turing machine or a memory network
A family of systems known as memory networks is closely
concentration of color-sensitive cones. Figure depicts the
When the picture falls on the whole retina, the eye only
After discussing the concept of attention in relation to
* Source: https://www.researchgate.net/figure/Illustration-of-the-major-landmarks-in-the-retina-The-left-image-illustrates-the_fig1_328211285
source of inspiration in this case. To acquire what they
framed in terms of reinforcement learning. Rather than
The following equation describes the gradient of the
The LSTM may be regarded to have a permanent memory,
despite the fact that the memory and computations are not
Figure 9.2: The output screen and policy network for learning the
* Source: https://link.springer.com/chapter/10.1007/978-3-319-94463-0_10
Neural Turing machines have external memory. The
network that are like CPU registers that are utilised for
concept of states.
Neural Turing machines have two major flaws that are
lowered. Prior to writing, controllers emit a set of free
day.
this case, it is vital to note that the memory writes are soft,
therefore it is impossible to establish a tight sequence.
There were at least two people that came up with the first
blurred. There is still a long way to go before these
Naive Bayes classification is an example of a model that
generates data.
prior distribution for that class and then sampling from the
generator network's weights have been modified. Generic
get into the applications, we'll go through how to train a
purposes of moving it closer to an input, the output unit
outperformed humans.
surpass human talents. A number of basic technological
another truck of the same type, shape, and color. This
Another kind of transfer learning is based on the idea of
intensive operation.
may be more energy efficient is a reasonable conclusion to