
UNIT-4 ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING


Perceptron model, Multilayer perceptron
In machine learning, the perceptron is an algorithm for supervised learning of binary
classifiers. A binary classifier is a function which can decide whether an input,
represented by a vector of numbers, belongs to some specific class. It is a type of
linear classifier, i.e. a classification algorithm that makes its predictions based on a
linear predictor function combining a set of weights with the feature vector.
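
To make this concrete, here is a minimal sketch of a perceptron's decision function in Python; the input, weights, and bias below are illustrative values, not taken from any particular dataset:

import numpy as np

def perceptron_predict(x, w, b):
    # Linear predictor function: combine the weights with the feature
    # vector, then threshold at zero to produce a binary class.
    return 1 if np.dot(w, x) + b > 0 else 0

# Hypothetical example: a 2-feature input and hand-picked weights.
x = np.array([1.0, 0.5])
w = np.array([0.4, -0.2])
b = -0.1
print(perceptron_predict(x, w, b))  # -> 1, since 0.4*1.0 - 0.2*0.5 - 0.1 = 0.2 > 0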

The perceptron algorithm was invented in 1958 at the Cornell Aeronautical Laboratory by Frank Rosenblatt, funded by the United States Office of Naval Research.

The perceptron was intended to be a machine, rather than a program, and while its
first implementation was in software for the IBM 704, it was subsequently
implemented in custom-built hardware as the “Mark 1 perceptron”. This machine
was designed for image recognition: it had an array of 400 photocells, randomly
connected to the “neurons”. Weights were encoded in potentiometers, and weight
updates during learning were performed by electric motors.

In a 1958 press conference organized by the US Navy, Rosenblatt made statements about the perceptron that caused a heated controversy among the fledgling AI community; based on Rosenblatt's statements, The New York Times reported the perceptron to be "the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence."

Although the perceptron initially seemed promising, it was quickly proved that perceptrons could not be trained to recognize many classes of patterns. This caused the field of neural network research to stagnate for many years before it was recognized that a feedforward neural network with two or more layers (also called a multilayer perceptron) had greater processing power than perceptrons with one layer (also called a single-layer perceptron).

Single-layer perceptrons are only capable of learning linearly separable patterns. For a classification task with some step activation function, a single node will have a single line dividing the data points forming the patterns. More nodes can create more dividing lines, but those lines must somehow be combined to form more complex classifications. A second layer of perceptrons, or even linear nodes, is sufficient to solve many otherwise non-separable problems.

In 1969, a famous book entitled Perceptrons by Marvin Minsky and Seymour Papert showed that it was impossible for these classes of network to learn an XOR function. It is often believed (incorrectly) that they also conjectured that a similar result would hold for a multi-layer perceptron network. However, this is not true, as both Minsky and Papert already knew that multi-layer perceptrons could produce an XOR function. (See the page on Perceptrons (book) for more information.) Nevertheless, the often-miscited Minsky/Papert text caused a significant decline in interest and funding of neural network research. It took ten more years until neural network research experienced a resurgence in the 1980s. The text was reprinted in 1987 as "Perceptrons - Expanded Edition", where some errors in the original text are shown and corrected.

The kernel perceptron algorithm was already introduced in 1964 by Aizerman et al.


Margin bounds guarantees were given for the Perceptron algorithm in the general
non-separable case first by Freund and Schapire (1998), and more recently by Mohri
and Rostamizadeh (2013) who extend previous results and give new L1 bounds.

The perceptron is a simplified model of a biological neuron. While the complexity of biological neuron models is often required to fully understand neural behavior, research suggests a perceptron-like linear model can produce some behavior seen in real neurons.

Single-layered perceptron model

A single-layer perceptron model is a feed-forward network that depends on a threshold transfer function. It is the simplest type of artificial neural network and is able to analyze only linearly separable objects with binary outcomes (targets), i.e., 1 and 0.

Multi-layered perceptron model

A multi-layered perceptron model has a structure similar to a single-layered perceptron model but with a greater number of hidden layers. It is trained using the backpropagation algorithm and executes in two stages: the forward stage and the backward stage.

Gradient descent and the Delta rule


The development of the perceptron was a big step towards the goal of creating useful connectionist networks capable of learning complex relations between inputs and outputs. In the late 1950s, the connectionist community understood that what was needed for further development of connectionist models was a mathematically derived (and thus potentially more flexible and powerful) rule for learning. By the early 1960s, the Delta Rule [also known as the Widrow & Hoff learning rule or the Least Mean Square (LMS) rule] was invented by Widrow and Hoff. This rule is similar to the perceptron learning rule (as described by McClelland & Rumelhart, 1988), but is also characterized by a mathematical utility and elegance missing in the perceptron and other early learning rules.
The Delta Rule uses the difference between target activation (i.e., target output values) and obtained activation to drive learning. For reasons discussed below, the use of a threshold activation function (as used in both the McCulloch-Pitts network and the perceptron) is dropped, and instead a linear sum of products is used to calculate the activation of the output neuron (alternative activation functions can also be applied). Thus, the activation function is called a linear activation function, in which the output node's activation is simply equal to the sum of the network's respective input/weight products. The strength of the network connections (i.e., the values of the weights) is adjusted to reduce the difference between target and actual output activation (i.e., error).

A set of data points is said to be linearly separable if the data can be divided into two classes using a straight line. If the data cannot be divided into two classes using a straight line, the data points are said to be non-linearly separable.

Although the perceptron rule finds a successful weight vector when the training
examples are linearly separable, it can fail to converge if the examples are not
linearly separable.

A second training rule, called the delta rule, is designed to overcome this difficulty.

If the training examples are not linearly separable, the delta rule converges toward a
best-fit approximation to the target concept.

The key idea behind the delta rule is to use gradient descent to search the
hypothesis space of possible weight vectors to find the weights that best fit the
training examples.
This rule is important because gradient descent provides the basis for the BACKPROPAGATION algorithm, which can learn networks with many interconnected units.
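
To make this concrete, here is a minimal sketch of the delta rule as a stochastic gradient-descent update on a linear unit; the learning rate, epoch count, and toy data are illustrative assumptions:

import numpy as np

def delta_rule_train(X, t, lr=0.05, epochs=200):
    # Linear activation: o = w . x (no threshold, as discussed above).
    # Gradient descent on the squared error E = 1/2 * sum_i (t_i - o_i)^2
    # gives the update w <- w + lr * (t - o) * x for each example.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, t_i in zip(X, t):
            o = np.dot(w, x_i)
            w += lr * (t_i - o) * x_i
    return w

# Toy data: the last column is a constant bias feature.
X = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]])
t = np.array([-1.0, -1.0, -1.0, 1.0])  # AND-like targets
w = delta_rule_train(X, t)
print(np.sign(X @ w))  # -> [-1. -1. -1.  1.], the best-fit separation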

Multilayer networks
In network theory, multidimensional networks, a special type of multilayer network,
are networks with multiple kinds of relations. Increasingly sophisticated attempts to
model real-world systems as multidimensional networks have yielded valuable
insight in the fields of social network analysis, economics, urban and international
transport, ecology, psychology, medicine, biology, commerce, climatology, physics,
computational neuroscience, operations management, infrastructures, and finance.

The rapid exploration of complex networks in recent years has been dogged by a
lack of standardized naming conventions, as various groups use overlapping and
contradictory terminology to describe specific network configurations (e.g.,
multiplex, multilayer, multilevel, multidimensional, multirelational, interconnected).
Formally, multidimensional networks are edge-labeled multigraphs. The term “fully
multidimensional” has also been used to refer to a multipartite edge-labeled
multigraph. Multidimensional networks have also recently been reframed as specific
instances of multilayer networks. In this case, there are as many layers as there are
dimensions, and the links between nodes within each layer are simply all the links
for a given dimension.
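
As a minimal illustration of that last point, the sketch below stores an edge-labeled multigraph so that each label (dimension) becomes its own layer; the nodes and labels are made up for the example:

from collections import defaultdict

# An edge-labeled multigraph: parallel edges between the same pair of
# nodes are allowed as long as their labels (dimensions) differ.
edges = [("a", "b", "social"), ("a", "b", "transport"), ("b", "c", "social")]

# One layer per dimension: the links within each layer are simply
# all the links for that dimension.
layers = defaultdict(list)
for u, v, label in edges:
    layers[label].append((u, v))

print(dict(layers))
# {'social': [('a', 'b'), ('b', 'c')], 'transport': [('a', 'b')]}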

Different Routes of Infection

Multilayer networks can be usefully applied in contexts where a pathogen can be transmitted through multiple modes or pathways of infection, as the multiplex approach provides a framework to account for multiple transmission probabilities. Considering the presence of multiple transmission modes can influence the efficacy of targeted interventions, particularly if nodes were traditionally targeted according to their degree in only one layer. This has implications for situations where data, networks, and resultant optimal control strategies are only available for one mode of transmission, leading to overconfidence in the efficacy of control.
In the context of veterinary epidemiology, animal movements are typically
considered the most effective transmission mode between farms (direct contacts).
However, other infection mechanisms might play an important role such as wind-
borne spread and fomites disseminated through contaminated clothes, equipment,
and vehicles by personnel (indirect contacts). Ignoring one mode of transmission
could lead to inaccurate farm risk predictions and ineffective targeted surveillance.
This has been demonstrated in a network analysis that considered both direct (cattle
movements) and indirect (veterinarian movements) contacts to reveal that indirect
contact, despite being less efficient in transmission, can play a major role in spread
of a pathogen within a network.

In another example, Stella used an "eco multiplex model" to study the spread of Trypanosoma cruzi (the cause of Chagas disease in humans) across different mammal species. This pathogen can be transmitted either through invertebrate vectors (Triatominae, or kissing bugs) or through predation, when a susceptible predator feeds on infected prey or vectors. Thus, their model included two ecological/transmission layers: the food-web and vector layers. Their results showed that studying the multiplex network structure offered insights on which host species facilitate parasite spread, and thus which would be more effective to immunize to control the spread. At the same time, they showed how, in this system, when parasite spread occurs primarily through the trophic layer, immunizing predators hampers parasite transmission more than immunizing prey.

Furthermore, multilayer network analysis can help differentiate between different types of social interactions that may lead to disease transmission. For example, sex-related dynamics of contact networks can have important implications for disease spread in animal populations, as seen in the spread of Mycobacterium bovis in European badgers (Meles meles). The authors constructed an interconnected network that distinguished male-male, female-female, and between-sex contacts recorded by proximity loggers. Inter-layer between-sex edges and edges in the male-male layer were more important in connecting groups into wider social communities, and contacts between different social communities were also more likely in these layers.

Dynamics of Coupled Processes: The Spread of Two Pathogens

Another application of multilayer networks in epidemiology is to model the concurrent propagation of two entities through a network, such as two different pathogens co-occurring in the same population or the spread of disease awareness alongside the spread of infection. In both scenarios, the spread of one entity within the network interacts with the spread of the other, creating a coupled dynamical system. A multiplex approach can allow for each coupled process to spread through a network that is based on the appropriate type of contact for propagation (i.e., contact networks involved in pathogen transmission vs. interaction or association networks that allow information to spread). In the case of two infectious diseases concurrently spreading through a network, a multiplex approach can be particularly useful if infection of a node by pathogen A alters the susceptibility to pathogen B, or if coinfection of a node influences its ability to transmit either pathogen. For example, when infection by one pathogen increases the likelihood of becoming infected by another pathogen, it could theoretically facilitate the spread of a second pathogen and thus alter epidemic dynamics. This type of dynamic is likely to be widespread in wild and domestic animals, because co-infection affects infectious disease dynamics by influencing the replication of pathogens within hosts. However, when there is competition or cross-immunity, the spread of one pathogen could reduce the spread of a second pathogen. For example, this type of dynamic could be expected for pathogen strains characterized by partial cross-immunity, such as avian influenza, or microparasite-macroparasite coinfections in which infection with one parasite reduces transmission of a second, such as infection with gastrointestinal helminths reducing the transmission of bovine tuberculosis in African buffalo (Syncerus caffer). Similar "within-node" dynamics could be important at a farm level in livestock movement networks. For example, the detection of a given pathogen infection in a farm might cause it to be quarantined, thus reducing its susceptibility and ability to transmit other pathogen infections.

Dynamics of Coupled Processes: Interactions Between Transmission Networks and Information/Social Networks

For coupled processes involving a disease alongside a social process (i.e., spread of
information or disease awareness), we might expect that the spread of the pathogen
will be associated with the spread of disease awareness or preventative behaviors
such as mask-wearing, and in these cases theoretical models suggest that
considering the spread of disease awareness can result in reduced disease spread. A
model was presented by Granell, which represented two competing processes on the
same network: infection spread (modeled using a Susceptible-Infected-Susceptible
compartmental model) coupled with information spread through a social network (an
Unaware-Aware-Unaware compartmental model). The authors used their model to
show that the timing of self-awareness of infection had little effect on the epidemic
dynamics. However, the degree of immunization (a parameter which regulates the
probability of becoming infected when aware) and mass media information spread
on the social layer did critically impact disease spread. A similar framework has been
used to study the effect of the diffusion of vaccine opinion (pro or anti) across a
social network with concurrent infectious disease spread. The study showed a clear
regime shift from a vaccinated population and controlled outbreak to vaccine refusal
and epidemic spread depending on the strength of opinion on the perceived risks of
the vaccine. The shift in outcomes from a controlled to uncontrolled outbreak was
accompanied by an increase in the spatial correlation of cases. While models in the
veterinary literature have accounted for altered behavior of nodes (imposition of
control measures) because of detection or awareness of disease, it is not common
for awareness to be considered as a dynamic process that is influenced by how each
node has interacted with the pathogen (i.e., contact with an infected neighbor). For
example, the rate of adoption of biosecurity practices at a farm, such as enhanced
surveillance, use of vaccination, or installation of air filtration systems, may be
dependent on the presence of disease in neighboring farms or the farmers’
awareness of a pathogen through a professional network of colleagues.
There is also some evidence that nodes that are more connected in their “social
support” networks (e.g., connections with family and close friends in humans) can
alter network processes that result in negative outcomes, such as pathogen
exposure or engagement in high-risk behavior. In a case based on users of injectable drugs, social connections with non-injectors can reduce drug users' connectivity in a network based on risky behavior with other drug injectors. In a
model presented by Chen, a social-support layer of a multiplex network drove the
allocation of resources for infection recovery, meaning that infected individuals
recovered faster if they possessed more neighbors in the social support layer. In
animal (both wild and domesticated) populations, this concept could be adapted to
represent an individual’s likelihood of recovery from, or tolerance to, infection being
influenced by the buffering effect of affiliative social relationships. For domestic animals, investment in certain resources at a farm level could influence a premises' ability to recover (e.g., treatment) or its onward transmission of a pathogen (e.g.,
treatment or biosecurity practices). Sharing of these resources between farms could
be modeled through a “social-support” layer in a multiplex, for example, where a
farm’s transmissibility is impacted by access to shared truck-washing facilities.

Multi-Host Infections

Multilayer networks can be used to study the features of mixed species contact
networks or model the spread of a pathogen in a host community, providing
important insights into multi-host pathogens. Scenarios like this are commonplace at
the livestock-wildlife interface and therefore the insights provided could be of real
interest to veterinary epidemiology. In the case of multi-host pathogens, intralayer
and interlayer edges represent the contacts between individuals of the same species
and between individuals of different species, respectively. They can therefore be
used to identify bottlenecks of transmission and provide a clearer idea of how
spillover occurs. For example, Silk used an interconnected network with three layers
to study potential routes of transmission in a multi-host system. One layer consisted
of a wild European badger (Meles meles) contact network, the second a
domesticated cattle contact network, and the third a layer containing badger latrine
sites (potentially important sites of indirect environmental transmission). No
intralayer edges were possible in the latrine layer. The authors demonstrated the
importance of these environmental sites in shortening paths through the multilayer
network (for both between- and within-species transmission routes) and showed
that some latrine sites were more important than others in connecting the different
layers. Pilosof presented a theoretical model, labeling the species as focal (i.e., of
interest) and non-focal, showing that the outbreak probability and outbreak size
depend on which species originates the outbreak and on asymmetries in between-
species transmission probabilities.

Similar applications of multilayer networks could easily be extended to systems where two or more species are domesticated animals as well. Examples of these could be the study of a pathogen such as Bluetongue virus, which affects both cattle and sheep, or foot-and-mouth disease virus, which infects cattle, sheep, and pigs. In such cases, each species can be represented by a different layer in the network, and interlayer edges are made possible because of mixed farms (i.e., cattle and sheep), different species from different farms grazing on the same pasture, or other types of indirect contact such as the sharing of equipment or personnel.

Overall, multilayer approaches provide an elegant way to analyze cross-species transmission and spillover, including for zoonotic pathogens across the human-livestock-wildlife interface. They can be used to simultaneously model within-species transmission, identify heterogeneities among nodes in their tendency to engage in between-species contacts relevant for spillover and spillback, and better predict the dynamics of spread prior and subsequent to cross-species transmission events, which may contribute to forecasting outbreaks in target species. Measures of multilayer network centrality in this instance could be used to extend the superspreader concept into a community context; individuals that are influential in within-species contact networks and possess between-species connections might be predicted to have a more substantial influence on infectious disease dynamics in the wider community.

Backpropagation Algorithm
In machine learning, backpropagation (backprop, BP) is a widely used algorithm for
training feedforward neural networks. Generalizations of backpropagation exist for
other artificial neural networks (ANNs), and for functions generally. These classes of
algorithms are all referred to generically as “backpropagation”. In fitting a neural
network, backpropagation computes the gradient of the loss function with respect to
the weights of the network for a single input–output example, and does so
efficiently, unlike a naive direct computation of the gradient with respect to each
weight individually. This efficiency makes it feasible to use gradient methods for
training multilayer networks, updating weights to minimize loss; gradient descent, or
variants such as stochastic gradient descent, are commonly used. The
backpropagation algorithm works by computing the gradient of the loss function
with respect to each weight by the chain rule, computing the gradient one layer at a
time, iterating backward from the last layer to avoid redundant calculations of
intermediate terms in the chain rule; this is an example of dynamic programming.

The term backpropagation strictly refers only to the algorithm for computing the
gradient, not how the gradient is used; however, the term is often used loosely to
refer to the entire learning algorithm, including how the gradient is used, such as by
stochastic gradient descent. Backpropagation generalizes the gradient computation
in the delta rule, which is the single-layer version of backpropagation, and is in turn
generalized by automatic differentiation, where backpropagation is a special case of
reverse accumulation (or “reverse mode”). The term backpropagation and its
general use in neural networks was announced in Rumelhart, Hinton & Williams
(1986a), then elaborated and popularized in Rumelhart, Hinton & Williams (1986b),
but the technique was independently rediscovered many times, and had many
predecessors dating to the 1960s; see § History. A modern overview is given in the
deep learning textbook by Goodfellow, Bengio & Courville.
The algorithm effectively trains a neural network using the chain rule. In simple terms, after each forward pass through a network, backpropagation performs a backward pass while adjusting the model's parameters (weights and biases).

Input layer

The neurons, colored in purple, represent the input data. These can be as simple as
scalars or more complex like vectors or multidimensional matrices.

Hidden layers

The final values at the hidden neurons, colored in green, are computed using z^l, the weighted inputs in layer l, and a^l, the activations in layer l.

Output layer

The final part of a neural network is the output layer, which produces the predicted value. In our simple example, it is presented as a single neuron, colored in blue.

Plan of attack: Backpropagation is based around four fundamental equations. Together, those equations give us a way of computing both the error δ^l and the gradient of the cost function. Be warned, though: you shouldn't expect to instantaneously assimilate the equations. Such an expectation will lead to disappointment. In fact, the backpropagation equations are so rich that understanding them well requires considerable time and patience as you gradually delve deeper into them. The good news is that such patience is repaid many times over. And so the discussion in this section is merely a beginning, helping you on the way to a thorough understanding of the equations.
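
As a minimal concrete sketch (not the full four-equation derivation), the following computes one forward and one backward pass for a tiny one-hidden-layer network with sigmoid activations and quadratic cost; the shapes, data, and random weights are illustrative assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros((3, 1))  # 2 inputs -> 3 hidden
W2, b2 = rng.normal(size=(1, 3)), np.zeros((1, 1))  # 3 hidden -> 1 output
x, y = np.array([[0.5], [0.2]]), np.array([[1.0]])

# Forward pass: z^l are weighted inputs, a^l are activations.
z1 = W1 @ x + b1;  a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; a2 = sigmoid(z2)

# Backward pass (chain rule, one layer at a time):
# output error delta^L = (a^L - y) * sigma'(z^L), with sigma'(z) = a * (1 - a).
delta2 = (a2 - y) * a2 * (1 - a2)
delta1 = (W2.T @ delta2) * a1 * (1 - a1)

# Gradients of the quadratic cost with respect to the weights and biases.
dW2, db2 = delta2 @ a1.T, delta2
dW1, db1 = delta1 @ x.T, delta1

# A gradient-descent step would then be, e.g., W2 -= 0.1 * dW2, and so on.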

Deep Learning Introduction


Deep learning (also known as deep structured learning) is part of a broader family of
machine learning methods based on artificial neural networks with representation
learning. Learning can be supervised, semi-supervised or unsupervised.

Deep-learning architectures such as deep neural networks, deep belief networks, deep reinforcement learning, recurrent neural networks and convolutional neural networks have been applied to fields including computer vision, speech recognition, natural language processing, machine translation, bioinformatics, drug design, medical image analysis, material inspection and board game programs, where they have produced results comparable to and in some cases surpassing human expert performance.

Artificial neural networks (ANNs) were inspired by information processing and distributed communication nodes in biological systems. ANNs have various differences from biological brains. Specifically, artificial neural networks tend to be static and symbolic, while the biological brain of most living organisms is dynamic (plastic) and analogue.

The adjective “deep” in deep learning refers to the use of multiple layers in the
network. Early work showed that a linear perceptron cannot be a universal classifier,
but that a network with a nonpolynomial activation function with one hidden layer of
unbounded width can. Deep learning is a modern variation which is concerned with
an unbounded number of layers of bounded size, which permits practical application
and optimized implementation, while retaining theoretical universality under mild
conditions. In deep learning the layers are also permitted to be heterogeneous and
to deviate widely from biologically informed connectionist models, for the sake of
efficiency, trainability, and understandability, whence the “structured” part.

Most modern deep learning models are based on artificial neural networks, specifically convolutional neural networks (CNNs), although they can also include propositional formulas or latent variables organized layer-wise in deep generative models such as the nodes in deep belief networks and deep Boltzmann machines.

In deep learning, each level learns to transform its input data into a slightly more
abstract and composite representation. In an image recognition application, the raw
input may be a matrix of pixels; the first representational layer may abstract the
pixels and encode edges; the second layer may compose and encode arrangements
of edges; the third layer may encode a nose and eyes; and the fourth layer may
recognize that the image contains a face. Importantly, a deep learning process can
learn which features to optimally place in which level on its own. This does not
completely eliminate the need for hand-tuning; for example, varying numbers of
layers and layer sizes can provide different degrees of abstraction.

The word “Deep” in “Deep learning” refers to the number of layers through which
the data is transformed. More precisely, deep learning systems have a substantial
credit assignment path (CAP) depth. The CAP is the chain of transformations from
input to output. CAPs describe potentially causal connections between input and
output. For a feedforward neural network, the depth of the CAPs is that of the
network and is the number of hidden layers plus one (as the output layer is also
parameterized). For recurrent neural networks, in which a signal may propagate
through a layer more than once, the CAP depth is potentially unlimited. No
universally agreed-upon threshold of depth divides shallow learning from deep
learning, but most researchers agree that deep learning involves CAP depth higher
than 2. CAP of depth 2 has been shown to be a universal approximator in the sense
that it can emulate any function. Beyond that, more layers do not add to the
function approximation ability of the network. Deep models (CAP > 2) can extract better features than shallow models, and hence extra layers help in learning the features effectively.

Deep learning architectures can be constructed with a greedy layer-by-layer method. Deep learning helps to disentangle these abstractions and pick out which features improve performance.

Deep neural networks are generally interpreted in terms of the universal approximation theorem or probabilistic inference.

The classic universal approximation theorem concerns the capacity of feedforward neural networks with a single hidden layer of finite size to approximate continuous functions. In 1989, the first proof was published by George Cybenko for sigmoid activation functions, and it was generalized to feed-forward multi-layer architectures in 1991 by Kurt Hornik. Recent work showed that universal approximation also holds for non-bounded activation functions such as the rectified linear unit.

The universal approximation theorem for deep neural networks concerns the capacity of networks with bounded width where the depth is allowed to grow. Lu proved that if the width of a deep neural network with ReLU activation is strictly larger than the input dimension, then the network can approximate any Lebesgue integrable function; if the width is smaller than or equal to the input dimension, then the deep neural network is not a universal approximator.

The probabilistic interpretation derives from the field of machine learning. It features inference, as well as the optimization concepts of training and testing, related to fitting and generalization, respectively. More specifically, the probabilistic interpretation considers the activation nonlinearity as a cumulative distribution function. The probabilistic interpretation led to the introduction of dropout as a regularizer in neural networks. The probabilistic interpretation was introduced by researchers including Hopfield, Widrow and Narendra, and it was popularized in surveys such as the one by Bishop.

Architectures:

Deep Neural Network: It is a neural network with a certain level of complexity (having multiple hidden layers between the input and output layers). Deep neural networks are capable of modeling and processing non-linear relationships.

Deep Belief Network (DBN): It is a class of deep neural network, composed of multiple layers of belief networks.

Steps for performing DBN:

1. Learn a layer of features from visible units using the Contrastive Divergence algorithm.
2. Treat activations of previously trained features as visible units and then learn features of features.
3. Finally, the whole DBN is trained when the learning for the final hidden layer is achieved.
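
A heavily simplified sketch of these steps follows: one-step Contrastive Divergence (CD-1) for a binary RBM with biases omitted for brevity; the data, layer sizes, and hyperparameters are illustrative assumptions:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(V, n_hidden, lr=0.1, epochs=50, seed=0):
    # One-step Contrastive Divergence (CD-1) for a binary RBM (biases omitted).
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(V.shape[1], n_hidden))
    for _ in range(epochs):
        h_prob = sigmoid(V @ W)                        # up: P(h|v)
        h = (rng.random(h_prob.shape) < h_prob) * 1.0  # sample hidden states
        v_prob = sigmoid(h @ W.T)                      # down: reconstruct visible
        h_prob2 = sigmoid(v_prob @ W)                  # up again
        W += lr * (V.T @ h_prob - v_prob.T @ h_prob2) / len(V)
    return W

# Greedy layer-wise training: activations of the trained layer become
# the "visible units" for the next layer (features of features).
data = (np.random.default_rng(1).random((100, 8)) > 0.5) * 1.0
W1 = train_rbm(data, n_hidden=4)
W2 = train_rbm(sigmoid(data @ W1), n_hidden=2)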

Recurrent Neural Network (performs the same task for every element of a sequence): Allows for both parallel and sequential computation, like the human brain (a large feedback network of connected neurons). Recurrent networks can remember important things about the input they received, which enables them to be more precise.

Limitations:

• Learning through observations only
• The issue of biases

Advantages:

• Reduces the need for feature engineering.
• Best-in-class performance on problems.
• Eliminates unnecessary costs.
• Easily identifies defects that are difficult to detect.

Disadvantages:

• Computationally expensive to train.
• Large amount of data required.
• No strong theoretical foundation.

Applications:

Healthcare: Helps in diagnosing various diseases and treating them.

Automatic Text Generation: A corpus of text is learned, and from this model new text is generated word-by-word or character-by-character. The model can then learn how to spell, punctuate, and form sentences, and it may even capture the style of the corpus.

Concept of Convolutional Neural network


In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of artificial neural network, most commonly applied to analyze visual imagery. They are also known as shift invariant or space invariant artificial neural networks (SIANN), based on the shared-weight architecture of the convolution kernels or filters that slide along input features and provide translation-equivariant responses known as feature maps. Counter-intuitively, most convolutional neural networks are only equivariant, as opposed to invariant, to translation. They have applications in image and video recognition, recommender systems, image classification, image segmentation, medical image analysis, natural language processing, brain-computer interfaces, and financial time series.

CNNs are regularized versions of multilayer perceptrons. Multilayer perceptrons usually mean fully connected networks, that is, each neuron in one layer is connected to all neurons in the next layer. The "full connectivity" of these networks makes them prone to overfitting data. Typical ways of regularization, or preventing overfitting, include penalizing parameters during training (such as weight decay) or trimming connectivity (skipped connections, dropout, etc.). CNNs take a different approach towards regularization: they take advantage of the hierarchical pattern in data and assemble patterns of increasing complexity using smaller and simpler patterns embossed in their filters. Therefore, on a scale of connectivity and complexity, CNNs are on the lower extreme.

Convolutional networks were inspired by biological processes in that the connectivity pattern between neurons resembles the organization of the animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap such that they cover the entire visual field.

CNNs use relatively little pre-processing compared to other image classification algorithms. This means that the network learns to optimize the filters (or kernels) through automated learning, whereas in traditional algorithms these filters are hand-engineered. This independence from prior knowledge and human intervention in feature extraction is a major advantage.

Architecture

A convolutional neural network consists of an input layer, hidden layers and an output layer. In any feed-forward neural network, any middle layers are called hidden because their inputs and outputs are masked by the activation function and final convolution. In a convolutional neural network, the hidden layers include layers that perform convolutions. Typically, this includes a layer that performs a dot product of the convolution kernel with the layer's input matrix. This product is usually the Frobenius inner product, and its activation function is commonly ReLU. As the convolution kernel slides along the input matrix for the layer, the convolution operation generates a feature map, which in turn contributes to the input of the next layer. This is followed by other layers such as pooling layers, fully connected layers, and normalization layers.

Convolutional layers

In a CNN, the input is a tensor with a shape: (number of inputs) x (input height) x
(input width) x (input channels). After passing through a convolutional layer, the
image becomes abstracted to a feature map, also called an activation map, with
shape: (number of inputs) x (feature map height) x (feature map width) x (feature
map channels).

Convolutional layers convolve the input and pass its result to the next layer. This is
similar to the response of a neuron in the visual cortex to a specific stimulus. Each
convolutional neuron processes data only for its receptive field. Although fully
connected feedforward neural networks can be used to learn features and classify
data, this architecture is generally impractical for larger inputs such as high-
resolution images. It would require a very high number of neurons, even in a
shallow architecture, due to the large input size of images, where each pixel is a
relevant input feature. For instance, a fully connected layer for a (small) image of
size 100 x 100 has 10,000 weights for each neuron in the second layer. Instead,
convolution reduces the number of free parameters, allowing the network to be
deeper. For example, regardless of image size, using a 5 x 5 tiling region, each with
the same shared weights, requires only 25 learnable parameters. Using regularized
weights over fewer parameters avoids the vanishing gradients and exploding
gradients problems seen during backpropagation in traditional neural networks.
Furthermore, convolutional neural networks are ideal for data with a grid-like
topology (such as images) as spatial relations between separate features are
considered during convolution and/or pooling.
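
A minimal sketch of the sliding-filter (convolution) operation in plain NumPy, using valid padding and a single channel; the image and kernel values are illustrative:

import numpy as np

def conv2d_valid(image, kernel):
    # Slide the kernel over the image; each output entry is the dot
    # product of the kernel with the receptive field it covers.
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])  # 4 shared weights, regardless of image size
print(conv2d_valid(image, kernel).shape)  # (4, 4) feature map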

Pooling layers

Convolutional networks may include local and/or global pooling layers along with
traditional convolutional layers. Pooling layers reduce the dimensions of data by
combining the outputs of neuron clusters at one layer into a single neuron in the
next layer. Local pooling combines small clusters; tiling sizes such as 2 x 2 are commonly used. Global pooling acts on all the neurons of the feature map. There
are two common types of pooling in popular use: max and average. Max pooling
uses the maximum value of each local cluster of neurons in the feature map, while
average pooling takes the average value.
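
A minimal sketch of 2 x 2 local max and average pooling over a single-channel feature map; the values are illustrative:

import numpy as np

def pool_2x2(fmap, mode="max"):
    # Combine each non-overlapping 2x2 cluster of neurons into one output.
    h, w = fmap.shape
    blocks = fmap[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

fmap = np.array([[1., 3., 2., 0.],
                 [4., 2., 1., 1.],
                 [0., 1., 5., 6.],
                 [2., 2., 7., 8.]])
print(pool_2x2(fmap, "max"))      # [[4. 2.] [2. 8.]]
print(pool_2x2(fmap, "average"))  # [[2.5  1.  ] [1.25 6.5 ]]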

Fully connected layers

Fully connected layers connect every neuron in one layer to every neuron in another
layer. It is the same as a traditional multi-layer perceptron neural network (MLP).
The flattened matrix goes through a fully connected layer to classify the images.

Receptive field

In neural networks, each neuron receives input from some number of locations in
the previous layer. In a convolutional layer, each neuron receives input from only a
restricted area of the previous layer called the neuron’s receptive field. Typically, the
area is a square (e.g. 5 by 5 neurons). In a fully connected layer, by contrast, the receptive field is the entire previous layer. Thus, in each successive convolutional layer, each neuron takes input from a larger area of the input than neurons in previous layers do. This is due
to applying the convolution over and over, which considers the value of a pixel, as
well as its surrounding pixels. When using dilated layers, the number of pixels in the
receptive field remains constant, but the field is more sparsely populated as its
dimensions grow when combining the effect of several layers.

Weights

Each neuron in a neural network computes an output value by applying a specific function to the input values received from the receptive field in the previous layer. The function that is applied to the input values is determined by a vector of weights and a bias (typically real numbers). Learning consists of iteratively adjusting these biases and weights.

The vector of weights and the bias are called a filter and represent features of the input (e.g., a particular shape). A distinguishing feature of CNNs is that many neurons can share the same filter. This reduces the memory footprint because a single bias and a single vector of weights are used across all receptive fields that share that filter, as opposed to each receptive field having its own bias and vector of weights.

Types of Layers (Convolutional Layers, Activation Function, Pooling, Fully Connected)

Convolutional Layers

Convolutional layers are the major building blocks used in convolutional neural
networks.

A convolution is the simple application of a filter to an input that results in an activation. Repeated application of the same filter to an input results in a map of activations called a feature map, indicating the locations and strength of a detected feature in an input, such as an image.

The innovation of convolutional neural networks is the ability to automatically learn many filters in parallel, specific to a training dataset, under the constraints of a specific predictive modelling problem, such as image classification. The result is highly specific features that can be detected anywhere on input images.

Activation function
An activation function decides whether a neuron should be activated or not by calculating the weighted sum of the inputs and adding a bias to it. The purpose of the activation function is to introduce non-linearity into the output of a neuron.

A neural network has neurons that work in correspondence with their weights, biases, and respective activation functions. In a neural network, we update the weights and biases of the neurons based on the error at the output. This process is known as back-propagation. Activation functions make back-propagation possible, since the gradients are supplied along with the error to update the weights and biases.

1) Linear Function:

• Equation: A linear function has an equation similar to that of a straight line, i.e. y = ax.
• No matter how many layers we have, if all of them are linear in nature, the final activation function of the last layer is just a linear function of the input of the first layer.
• Range: -inf to +inf
• Uses: The linear activation function is used in just one place, i.e. the output layer.
• Issues: The derivative of a linear function is a constant, so the gradient no longer depends on the input x; stacking linear layers therefore cannot introduce any non-linearity into the algorithm.

2) Sigmoid Function:

• It is a function which is plotted as an 'S'-shaped graph.
• Equation: A = 1 / (1 + e^(-x))
• Nature: Non-linear. Notice that for x values between -2 and 2, the y values are very steep. This means that small changes in x bring about large changes in the value of y.
• Value Range: 0 to 1
• Uses: Usually used in the output layer of a binary classification, where the result is either 0 or 1. Since the value of the sigmoid function lies between 0 and 1 only, the result can easily be predicted to be 1 if the value is greater than 0.5 and 0 otherwise.

Tanh Function: The activation that almost always works better than the sigmoid function is the tanh function, also known as the hyperbolic tangent function. It is a mathematically shifted version of the sigmoid function. Both are similar and can be derived from each other.
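
A small sketch of the three activation functions discussed above, including a check of the shift/scale relationship between tanh and sigmoid, tanh(x) = 2*sigmoid(2x) - 1:

import numpy as np

def linear(x, a=1.0):
    return a * x                      # range: -inf to +inf

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # range: 0 to 1

def tanh(x):
    return np.tanh(x)                 # shifted/rescaled sigmoid, range: -1 to 1

x = np.linspace(-2.0, 2.0, 9)
print(np.allclose(tanh(x), 2 * sigmoid(2 * x) - 1))  # True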

Pooling Layer

The pooling or downsampling layer is responsible for reducing the spatial size of the activation maps. In general, pooling layers are used after multiple stages of other layers (i.e. convolutional and non-linearity layers) to reduce the computational requirements progressively through the network, as well as to minimize the likelihood of overfitting.

The key concept of the pooling layer is to provide translational invariance, since, particularly in image recognition tasks, detecting a feature is more important than the feature's exact location. Therefore, the pooling operation aims to preserve the detected features in a smaller representation, and does so by discarding less significant data at the cost of spatial resolution.

Fully connected

Fully connected layers connect every neuron in one layer to every neuron in another
layer. It is the same as a traditional multi-layer perceptron neural network (MLP).
The flattened matrix goes through a fully connected layer to classify the images.

Fully connected neural networks (FCNNs) are a type of artificial neural network
where the architecture is such that all the nodes, or neurons, in one layer are
connected to the neurons in the next layer.

While this type of algorithm is commonly applied to some types of data, in practice
this type of network has some issues in terms of image recognition and
classification. Such networks are computationally intense and may be prone to
overfitting. When such networks are also ‘deep’ (meaning there are many layers of
nodes or neurons) they can be particularly hard for humans to understand.

Training of Network, Recent Applications


ANNs are statistical models designed to adapt and self-program by using learning
algorithms to understand and sort out concepts, images, and photographs. For
processors to do their work, developers arrange them in layers that operate in
parallel. The input layer is analogous to the dendrites in the human brain’s neural
network. The hidden layer is comparable to the cell body and sits between the input
layer and output layer (which is akin to the synaptic outputs in the brain). The
hidden layer is where artificial neurons take in a set of inputs based on synaptic
weight, which is the amplitude or strength of a connection between nodes. These
weighted inputs generate an output through a transfer function to the output layer.

Attributes of Neural Networks

With the human-like ability to problem-solve and apply that skill to huge datasets, neural networks possess the following powerful attributes:

Adaptive Learning: Like humans, neural networks model non-linear and complex
relationships and build on previous knowledge. For example, software uses adaptive
learning to teach math and language arts.
Self-Organization: The ability to cluster and classify vast amounts of data makes
neural networks uniquely suited for organizing the complicated visual problems
posed by medical image analysis.

Real-Time Operation: Neural networks can (sometimes) provide real-time answers, as is the case with self-driving cars and drone navigation.

Prognosis: NNs' ability to predict based on models has a wide range of applications, including for weather and traffic.

Fault Tolerance: When significant parts of a network are lost or missing, neural
networks can fill in the blanks. This ability is especially useful in space exploration,
where the failure of electronic devices is always a possibility.

Tasks Neural Networks Perform

Neural networks are highly valuable because they can carry out tasks to make sense
of data while retaining all their other attributes. Here are the critical tasks that
neural networks perform:

Classification: NNs organize patterns or datasets into predefined classes.

Prediction: They produce the expected output from given input.

Clustering: They identify a unique feature of the data and classify it without any
knowledge of prior data.

Associating: You can train neural networks to “remember” patterns. When you
show an unfamiliar version of a pattern, the network associates it with the most
comparable version in its memory and reverts to the latter.

Neural network applications currently in use in various industries:

• Aerospace: Aircraft component fault detectors and simulations, aircraft control systems, high-performance auto-piloting, and flight path simulations
• Automotive: Improved guidance systems, development of power trains, virtual sensors, and warranty activity analyzers
• Electronics: Chip failure analysis, circuit chip layouts, machine vision, non-linear modeling, prediction of the code sequence, process control, and voice synthesis
• Manufacturing: Chemical product design analysis, dynamic modeling of chemical process systems, process control, process and machine diagnosis, product design and analysis, paper quality prediction, project bidding, planning and management, quality analysis of computer chips, visual quality inspection systems, and welding quality analysis
• Mechanics: Condition monitoring, systems modeling, and control
• Robotics: Forklift robots, manipulator controllers, trajectory control, and vision systems
• Telecommunications: ATM network control, automated information services, customer payment processing systems, data compression, equalizers, fault management, handwriting recognition, network design, management, routing and control, network monitoring, real-time translation of spoken language, and pattern recognition (faces, objects, fingerprints, semantic parsing, spell check, signal processing, and speech recognition)

Types of Neural Networks in Artificial Intelligence

Parameter | Types | Description

Based on the connection pattern | Feedforward, Recurrent | Feedforward: the graph has no loops. Recurrent: loops occur because of feedback.

Based on the number of hidden layers | Single layer, Multi-layer | Single layer: having one hidden layer, e.g., the single perceptron. Multilayer: having multiple hidden layers, e.g., the multilayer perceptron.

Based on the nature of weights | Fixed, Adaptive | Fixed: weights are fixed a priori and not changed at all. Adaptive: the weights are updated and changed during training.

Based on the memory unit | Static, Dynamic | Static: memoryless; the current output depends only on the current input, e.g., a feedforward network. Dynamic: has a memory unit; the output depends on the current input as well as on previous outputs, e.g., a recurrent neural network.

Perceptron Model in Neural Networks

The perceptron model is a neural network with two input units and one output unit, with no hidden layers. It is also known as a 'single-layer perceptron.'

Radial Basis Function Neural Network

These networks are like the feed-forward neural network, except that a radial basis function is used as the activation function of the neurons.

Multilayer Perceptron Neural Network

These networks use more than one hidden layer of neurons, unlike single-layer
perceptron. These are also known as Deep Feedforward Neural Networks.

Recurrent Neural Network

A type of neural network in which hidden-layer neurons have self-connections. Recurrent neural networks possess memory. At any instance, a hidden-layer neuron receives activation from the lower layer as well as its own previous activation value.

Long Short-Term Memory Neural Network (LSTM)


The type of neural network in which a memory cell is incorporated into the hidden-layer neurons is called an LSTM network.

Hopfield Network

A fully interconnected network of neurons in which each neuron is connected to every other neuron. The network is trained with input patterns by setting the values of the neurons to the desired pattern. Then its weights are computed. The weights are not changed afterwards. Once trained for one or more patterns, the network will converge to the learned patterns. It is different from other neural networks.

Boltzmann Machine Neural Network

These networks are like the Hopfield network, except that some neurons are input neurons while others are hidden. The weights are initialized randomly and adjusted during training.

Convolutional Neural Network

Convolutional neural networks are covered in detail earlier in this unit (see "Concept of Convolutional Neural Network").

Modular Neural Network

It is a combined structure of different types of neural networks, such as the multilayer perceptron, Hopfield network, recurrent neural network, etc., each of which is incorporated as a single module into the network to perform an independent subtask of the complete neural network.

Physical Neural Network

In this type of artificial neural network, electrically adjustable resistive material is used to emulate synapses, instead of the software simulations performed in conventional neural networks.

Introduction to Reinforcement Learning, Learning Task, Example of Reinforcement Learning in Practice, Learning Model for Reinforcement Markov Decision Process
Reinforcement learning (RL) is an area of machine learning concerned with how
intelligent agents ought to take actions in an environment in order to maximize the
notion of cumulative reward. Reinforcement learning is one of three basic machine
learning paradigms, alongside supervised learning and unsupervised learning.
Reinforcement learning differs from supervised learning in not needing labelled input/output pairs to be presented, and in not needing sub-optimal actions to be
explicitly corrected. Instead, the focus is on finding a balance between exploration
(of uncharted territory) and exploitation (of current knowledge). Partially supervised
RL algorithms can combine the advantages of supervised and RL algorithms.

The environment is typically stated in the form of a Markov decision process (MDP)
because many reinforcement learning algorithms for this context use dynamic
programming techniques. The main difference between the classical dynamic
programming methods and reinforcement learning algorithms is that the latter do
not assume knowledge of an exact mathematical model of the MDP and they target
large MDPs where exact methods become infeasible.

Reinforcement learning is an area of machine learning. It is about taking suitable action to maximize reward in a particular situation. It is employed by various software and machines to find the best possible behavior or path to take in a specific situation. Reinforcement learning differs from supervised learning in that in supervised learning the training data has the answer key with it, so the model is trained with the correct answer itself, whereas in reinforcement learning there is no answer and the reinforcement agent decides what to do to perform the given task. In the absence of a training dataset, it is bound to learn from its experience.

Focal point of Reinforcement learning:

Input: The input should be an initial state from which the model will start

Output: There are many possible outputs as there are a variety of solutions to a
particular problem

Training: The training is based upon the input; the model will return a state, and the user will decide whether to reward or punish the model based on its output. The model continues to learn. The best solution is decided based on the maximum reward.

Types of Reinforcement: There are two types of Reinforcement:

Positive:

Positive reinforcement occurs when an event, occurring as a result of a particular behavior, increases the strength and frequency of that behavior. In other words, it has a positive effect on behavior.

Advantages of positive reinforcement:

• Maximizes performance
• Sustains change for a long period of time

A drawback is that too much reinforcement can lead to an overload of states, which can diminish the results.
Negative:

Negative reinforcement is defined as the strengthening of a behavior because a negative condition is stopped or avoided.

Advantages of negative reinforcement:

• Increases behavior.
• Helps ensure a minimum standard of performance is met.
• It only provides enough to meet the minimum behavior.

Example of Reinforcement Learning in Practice

• Robotics for industrial automation.
• Business strategy planning.
• Machine learning and data processing.
• Creating training systems that provide custom instruction and materials according to the requirements of students.
• Aircraft control and robot motion control.

Learning model for Reinforcement Markov Decision process

There are many different algorithms that tackle this issue. As a matter of fact, reinforcement learning is defined by a specific type of problem, and all its solutions are classed as reinforcement learning algorithms. In the problem, an agent is supposed to decide the best action to select based on its current state. When this step is repeated, the problem is known as a Markov Decision Process.

A Markov Decision Process (MDP) model contains:

• A set of possible world states S.
• A set of models.
• A set of possible actions A.
• A real-valued reward function R(s, a).
• A policy, which is the solution of the Markov decision process.

State

A State is a set of tokens that represent every state that the agent can be in.

Model

A model (sometimes called a transition model) gives an action's effect in a state. In particular, T(S, a, S') defines a transition T where being in state S and taking an action 'a' takes us to state S' (S and S' may be the same). For stochastic actions (noisy, non-deterministic) we also define a probability P(S'|S, a), which represents the probability of reaching state S' if action 'a' is taken in state S. Note that the Markov property states that the effects of an action taken in a state depend only on that state and not on the prior history.

Actions

An Action A is a set of all possible actions. A(s) defines the set of actions that can be
taken being in state S.
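As an illustration only (the text above defines these components abstractly), a tiny MDP can be sketched in Python with plain dictionaries; the state names, actions, probabilities, and rewards below are made up for the example:

```python
# Hypothetical two-state MDP. P[s][a] maps each next state s' to the
# transition probability P(s'|s, a); R[(s, a)] is the reward function R(s, a).
P = {
    "s0": {"stay": {"s0": 1.0},
           "move": {"s0": 0.2, "s1": 0.8}},  # stochastic (noisy) action
    "s1": {"stay": {"s1": 1.0},
           "move": {"s0": 1.0}},
}
R = {("s0", "stay"): 0.0, ("s0", "move"): 1.0,
     ("s1", "stay"): 2.0, ("s1", "move"): 0.0}

def A(s):
    """A(s): the set of actions that can be taken in state s."""
    return set(P[s].keys())
```

A policy would then be any mapping from states to actions, for example {"s0": "move", "s1": "stay"}.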

Q-Learning: Q-Learning Function, Q-Learning Algorithm, Applications of Reinforcement Learning, Introduction to Deep Q-Learning

Q-learning is a model-free reinforcement learning algorithm that learns the value of an action in a particular state. It does not require a model of the environment (hence "model-free"), and it can handle problems with stochastic transitions and rewards without requiring adaptations.

For any finite Markov decision process (FMDP), Q-learning finds an optimal policy in the sense of maximizing the expected value of the total reward over all successive steps, starting from the current state. Q-learning can identify an optimal action-selection policy for any given FMDP, given infinite exploration time and a partly random policy. "Q" refers to the function the algorithm computes: the expected reward for an action taken in a given state.

Reinforcement learning involves an agent, a set S of states, and a set A of actions per state. By performing an action a ∈ A, the agent transitions from state to state. Executing an action in a specific state provides the agent with a reward (a numerical score).

The goal of the agent is to maximize its total reward. It does this by adding the
maximum reward attainable from future states to the reward for achieving its
current state, effectively influencing the current action by the potential future
reward. This potential reward is a weighted sum of expected values of the rewards
of all future steps starting from the current state.
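In standard notation (the text describes this sum in words but does not write it out), the weighted sum is the discounted return, with discount factor γ ∈ [0, 1]:

```latex
% Discounted return from time t: rewards further in the future are
% down-weighted by increasing powers of the discount factor gamma.
G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots
    = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}
```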

As an example, consider the process of boarding a train, in which the reward is measured by the negative of the total time spent boarding (alternatively, the cost of boarding the train is equal to the boarding time). One strategy is to enter the train doors as soon as they open, minimizing your initial wait time. If the train is crowded, however, you will have a slow entry after the initial action of entering the doors, as people fight past you to depart the train while you attempt to board. The total boarding time, or cost, is then:

0 seconds wait time + 15 seconds fight time = 15 seconds

On the next day, by random chance (exploration), you decide to wait and let other people depart first. This initially results in a longer wait time, but the time spent fighting other passengers is lower. Overall, this path has a higher reward than that of the previous day, since the total boarding time is now:

5 seconds wait time + 0 seconds fight time = 5 seconds

Through exploration, even though the initial (patient) action results in a larger cost (or negative reward) than the forceful strategy, the overall cost is lower, thus revealing a more rewarding strategy.

Q-Learning Algorithm

Q-learning is a model-free reinforcement learning algorithm.

Q-learning is a value-based learning algorithm. Value-based algorithms update the value function based on an equation (in particular, the Bellman equation). The other type, policy-based algorithms, estimates the value function with a greedy policy obtained from the last policy improvement.
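For reference, the standard Q-learning update derived from the Bellman equation is shown below (the text does not spell it out); α is the learning rate and γ the discount factor:

```latex
% Q-learning update rule: move Q(s_t, a_t) toward the observed reward
% plus the discounted best value attainable from the next state.
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```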

Q-learning is an off-policy learner. This means it learns the value of the optimal policy independently of the agent's actions. An on-policy learner, by contrast, learns the value of the policy being carried out by the agent, including the exploration steps, and it will find a policy that is optimal taking into account the exploration inherent in that policy.

Q-Table

The Q-table is the data structure used to calculate the maximum expected future reward for each action at each state. Basically, this table guides us to the best action at each state. To learn each value of the Q-table, the Q-learning algorithm is used.

Step 1: initialize the Q-Table

We will first build a Q-table with n columns, where n = number of actions, and m rows, where m = number of states. We initialize all values to 0.
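A minimal sketch of this step, assuming NumPy and made-up sizes for the state and action spaces:

```python
import numpy as np

n_states, n_actions = 6, 4           # hypothetical sizes, for illustration
Q = np.zeros((n_states, n_actions))  # m rows (states) x n columns (actions), all 0
```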

Steps 2 and 3: choose and perform an action

This combination of steps runs for an unbounded amount of time: it repeats until we stop the training, or until the training loop stops as defined in the code.

We choose an action a in state s based on the Q-table. But, as mentioned earlier, when the episode initially starts every Q-value is 0, so some randomness is needed to explore; a sketch of one common selection strategy follows.
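One common choice (assumed here; the text does not prescribe a selection strategy) is ε-greedy selection, which picks a random action with probability ε and otherwise exploits the current table:

```python
import random

def choose_action(Q, state, epsilon=0.1):
    # Explore with probability epsilon, otherwise act greedily on the Q-table.
    if random.random() < epsilon:
        return random.randrange(Q.shape[1])  # any action index, uniformly
    return int(Q[state].argmax())            # best-known action so far
```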
Steps 4 and 5: evaluate

Now we have taken an action and observed an outcome and a reward. We need to update the function Q(s, a).
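Continuing the sketch, the update applies the Q-learning rule shown earlier; the alpha and gamma values are hypothetical hyperparameters:

```python
def update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    # Nudge Q(s, a) toward the observed reward plus the discounted
    # best value attainable from the next state.
    best_next = Q[next_state].max()
    Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])
```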

Applications of Reinforcement Learning

Manufacturing

At Fanuc, a robot uses deep reinforcement learning to pick a device from one box and put it in a container. Whether it succeeds or fails, it memorizes the object, gains knowledge, and trains itself to do this job with great speed and precision.

Many warehousing facilities used by eCommerce sites and other supermarkets use these intelligent robots to sort millions of products every day and to help deliver the right products to the right people. Tesla's factory, for example, comprises more than 160 robots that do a major part of the work on its cars to reduce the risk of any defect.

Finance

Reinforcement learning has helped develop several innovative applications in the financial industry. Combined with Machine Learning, it has made a real difference in the domain over the years. Today, numerous technologies in finance rely on these techniques, such as search engines, chatbots, etc.

Several reinforcement learning techniques can help generate more return on investment, reduce cost, improve customer experience, and so on. Reinforcement learning and Machine Learning together can result in improved execution when approving loans, measuring risk factors, and managing investments.

One of the most popular applications of reinforcement learning in finance is portfolio management: building a platform that makes significantly more accurate predictions about stocks and similar investments, thereby providing better results. This is one of the main reasons why most investors in the industry wish to create such applications to evaluate the financial market in detail. Moreover, many of these portfolio management applications, including robo-advisors, generate more accurate results over time.

Inventory Management

A major issue in supply chain inventory management is coordinating the inventory policies adopted by different supply chain actors, such as suppliers, manufacturers, and distributors, so as to smooth material flow and minimize costs while responsively meeting customer demand.

Reinforcement learning algorithms can be built to reduce transit time for stocking as well as retrieving products in the warehouse, optimizing space utilization and warehouse operations.

Healthcare

With technology improving and advancing on a regular basis, it has taken over almost every industry today, especially the healthcare sector. With the implementation of reinforcement learning, the healthcare system has consistently generated better outcomes. One of the best-known examples of reinforcement learning in the healthcare domain is Quotient Health.

Quotient Health is a software application built to reduce spending on electronic medical record support. It achieves this by standardizing and enhancing the methods used to create such systems. Its main goal is to improve the healthcare system, specifically by lowering unnecessary costs.

Delivery Management

Reinforcement learning is used to solve the problem of Split Delivery Vehicle Routing. Q-learning is used to serve the appropriate customers with just one vehicle.

Image Processing

The image processing field is, in some respects, a subcategory of the healthcare domain: partly a part of the medical industry, but with a domain of its own. Reinforcement learning has revolutionized not only image processing but the medical industry at large. Here, however, we discuss applications of this technology in image processing alone.

Deep Q-learning

Deep Q-learning approximates the Q function with a deep neural network. The DeepMind system used a deep convolutional neural network, with layers of tiled convolutional filters to mimic the effects of receptive fields. Reinforcement learning is unstable or divergent when a nonlinear function approximator such as a neural network is used to represent Q. This instability comes from the correlations present in the sequence of observations, from the fact that small updates to Q may significantly change the policy of the agent and the data distribution, and from the correlations between Q and the target values.

The technique uses experience replay, a biologically inspired mechanism that trains on a random sample of prior transitions instead of only the most recent one. This removes correlations in the observation sequence and smooths changes in the data distribution. Iterative updates adjust Q towards target values that are only periodically updated, further reducing correlations with the target.
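As an illustrative sketch only (the text describes the mechanism, not an implementation), an experience-replay buffer can be a bounded queue sampled uniformly at random; the class name and sizes are made up:

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores past (state, action, reward, next_state) transitions and
    returns uniform random mini-batches, breaking the correlations between
    consecutive observations described above."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop off

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size=32):
        return random.sample(self.buffer, batch_size)
```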

Because the future maximum approximated action value in Q-learning is evaluated using the same Q function as the current action-selection policy, in noisy environments Q-learning can sometimes overestimate the action values, slowing the learning. A variant called Double Q-learning was proposed to correct this. Double Q-learning is an off-policy reinforcement learning algorithm in which a different policy is used for value evaluation than the one used to select the next action.
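In its standard formulation (not written out in the text above), Double Q-learning maintains two value functions, Q_A and Q_B, whose roles are swapped at random on each update; one selects the next action and the other evaluates it:

```latex
% Double Q-learning update for Q_A: Q_A picks the best next action,
% but Q_B supplies its value, decoupling selection from evaluation.
Q_A(s_t, a_t) \leftarrow Q_A(s_t, a_t)
  + \alpha \left[ r_{t+1}
      + \gamma \, Q_B\!\bigl(s_{t+1}, \arg\max_{a} Q_A(s_{t+1}, a)\bigr)
      - Q_A(s_t, a_t) \right]
```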