
UNIT I

Artificial Neural Networks

P Jyothi, Asst. Prof., CSE Dept.


Deep Learning

• Deep Learning (DL) is a subset of machine learning (ML); DL learns features and tasks directly from data such as images, text, or sound.
• Machine learning is a subset of artificial intelligence (AI) that allows computer programs to learn from data and predict accurate outcomes without being explicitly programmed to do so.
• ML is applied in image recognition, speech recognition, and fraud detection.
• Machine learning takes a statistical approach to obtain patterns from data.
• AI is the capability of machines and computers to mimic human intelligence and behavior.
• AI is accomplished by studying how the human brain operates while trying to solve a problem.
• Deep Learning is a machine learning technique that automatically extracts the useful pieces of information or makes decisions using neural networks.
• An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain.
• Each connection, like the synapses in a biological brain, can transmit a signal to other neurons.
• The connections are called edges. Neurons and edges typically have a weight that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection.
• Neurons may have a threshold such that a signal is sent only if the aggregate signal crosses that threshold.


How Neural Networks work

• Neural Networks are algorithms inspired by the human brain: systems of hardware and software patterned after the operation of neurons in the human mind.
• Neural Networks learn models, identify patterns, and organize different kinds of information while trying to imitate the human brain.


Characteristics of Neural Networks

Features of Biological Neural Networks

• Some attractive features of the biological neural network that make it superior to even the most sophisticated AI computer system for pattern recognition tasks are the following:
• (a) Robustness and fault tolerance: The decay of nerve cells does not seem to affect performance significantly.
• (b) Flexibility: The network automatically adjusts to a new environment without using any preprogrammed instructions.
• (c) Ability to deal with a variety of data situations: The network can deal with information that is fuzzy, probabilistic, noisy, and inconsistent.
• (d) Collective computation: The network routinely performs many operations in parallel, and also performs a given task in a distributed manner.
Biological Neuron

• Deep Learning is a form of Machine Learning. It is known as 'Deep' Learning because it contains many layers of neurons. A neuron within a Deep Learning network is similar to a neuron of the human brain; another name for Deep Learning is 'Artificial Neural Networks'.
• A biological neuron consists of a cell body, or soma, where the cell nucleus is located.
• Treelike nerve fibres called dendrites are associated with the cell body. These dendrites receive signals from other neurons.
• Extending from the cell body is a single long fibre called the axon, which eventually connects to many other neurons at the synaptic junctions, or synapses.
• The dendrites serve as receptors for signals from other neurons, whereas the purpose of an axon is transmission of the generated neural activity to other nerve cells.
Types of neural networks
• Deep Learning models are able to automatically learn features from the data, which makes them well-suited for tasks such as image recognition, speech recognition, and natural language processing. The most widely used architectures in deep learning are feedforward neural networks, convolutional neural networks (CNNs), and recurrent neural networks (RNNs), each sketched in code below.
• Feedforward neural networks (FNNs) are the simplest type of ANN, with a linear flow of information through the network. FNNs have been widely used for tasks such as image classification, speech recognition, and natural language processing.
• Convolutional Neural Networks (CNNs) are designed specifically for image and video recognition tasks. CNNs are able to automatically learn features from images, which makes them well-suited for tasks such as image classification, object detection, and image segmentation.
• Recurrent Neural Networks (RNNs) are a type of neural network that can process sequential data, such as time series and natural language. RNNs maintain an internal state that captures information about previous inputs, which makes them well-suited for tasks such as speech recognition, natural language processing, and language translation.
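A minimal PyTorch sketch contrasting the three architectures follows. It is illustrative only; the layer sizes, input shapes, and class counts are assumptions, not taken from the original notes:

```python
import torch
import torch.nn as nn

# Feedforward network: information flows strictly input -> hidden -> output.
fnn = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

# Convolutional network: convolution and pooling layers learn spatial features.
cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 14 * 14, 10),
)

# Recurrent network: an internal hidden state summarizes the previous inputs.
rnn = nn.RNN(input_size=32, hidden_size=64, batch_first=True)

x_img = torch.randn(8, 1, 28, 28)    # a batch of 28x28 grayscale images
x_seq = torch.randn(8, 20, 32)       # a batch of length-20 feature sequences
print(fnn(x_img.flatten(1)).shape)   # torch.Size([8, 10])
print(cnn(x_img).shape)              # torch.Size([8, 10])
out, h = rnn(x_seq)                  # out: (8, 20, 64); h: (1, 8, 64)
```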


Applications of Deep Learning:

• The main applications of deep learning can be divided into:
• Computer vision
• Natural language processing (NLP)
• Reinforcement learning


Computer Vision:

In computer vision, deep learning models enable machines to identify and understand visual data. Some of the main applications of deep learning in computer vision include:
• Object detection and recognition: Deep learning models can be used to identify and locate objects within images and videos, enabling applications such as self-driving cars, surveillance, and robotics.
• Image classification: Deep learning models can be used to classify images into categories such as animals, plants, and buildings. This is used in applications such as medical imaging, quality control, and image retrieval.
• Image segmentation: Deep learning models can be used to segment images into different regions, making it possible to identify specific features within images.
Natural language processing (NLP):

In NLP, deep learning models enable machines to understand and generate human language. Some of the main applications of deep learning in NLP include:
• Automatic text generation: Deep learning models can learn from a corpus of text, and new text such as summaries and essays can be generated automatically using these trained models.
• Language translation: Deep learning models can translate text from one language to another, making it possible to communicate with people from different linguistic backgrounds.
• Sentiment analysis: Deep learning models can analyze the sentiment of a piece of text, making it possible to determine whether the text is positive, negative, or neutral. This is used in applications such as customer service, social media monitoring, and political analysis.
• Speech recognition: Deep learning models can recognize and transcribe spoken words, making it possible to perform tasks such as speech-to-text conversion, voice search, and voice-controlled devices.


Reinforcement learning:

In reinforcement learning, deep learning is used to train agents to take actions in an environment so as to maximize a reward. Some of the main applications of deep learning in reinforcement learning include:
• Game playing: Deep reinforcement learning models have been able to beat human experts at games such as Go, Chess, and Atari.
• Robotics: Deep reinforcement learning models can be used to train robots to perform complex tasks such as grasping objects, navigation, and manipulation.
• Control systems: Deep reinforcement learning models can be used to control complex systems such as power grids, traffic management, and supply chain optimization.


The History of Deep Learning

• The history of deep learning can be traced back to 1943, when Walter Pitts and Warren McCulloch created a computer model based on the neural networks of the human brain.

The 1960s:
• Henry J. Kelley is given credit for developing the basics of a continuous backpropagation model in 1960. In 1962, a simpler version based only on the chain rule was developed by Stuart Dreyfus. While the concept of backpropagation (the backward propagation of errors for purposes of training) did exist in the early 1960s, it was clumsy and inefficient, and would not become useful until 1985.


The 1970s:
• The first "convolutional neural networks" were used by Kunihiko Fukushima, who designed neural networks with multiple pooling and convolutional layers. In 1979, he developed an artificial neural network, called the Neocognitron, which used a hierarchical, multilayered design. This design allowed the computer to "learn" to recognize visual patterns.
• The networks resembled modern versions, but were trained with a reinforcement strategy of recurring activation in multiple layers, which gained strength over time. Additionally, Fukushima's design allowed important features to be adjusted manually by increasing the "weight" of certain connections.


1989:
• Yann LeCun provided the first practical demonstration of backpropagation at Bell Labs. He combined convolutional neural networks with backpropagation to read "handwritten" digits. This system was eventually used to read the numbers on handwritten checks.

1995:
• Dana Cortes and Vladimir Vapnik developed the support vector machine (a system for mapping and recognizing similar data). LSTM (long short-term memory) for recurrent neural networks was developed in 1997 by Sepp Hochreiter and Juergen Schmidhuber.


1999:
• The next significant evolutionary step for deep learning took place in 1999, when computers started becoming faster at processing data and GPUs (graphics processing units) were developed. Faster processing, with GPUs processing pictures, increased computational speeds by 1000 times over a 10-year span.
• During this time, neural networks began to compete with support vector machines. While a neural network could be slow compared to a support vector machine, neural networks offered better results using the same data. Neural networks also have the advantage of continuing to improve as more training data is added.


2000-2010:
• Around the year 2000, the vanishing gradient problem appeared. It was discovered that "features" (lessons) formed in lower layers were not being learned by the upper layers, because no learning signal reached those layers. This was not a fundamental problem for all neural networks, just the ones with gradient-based learning methods.
• The source of the problem turned out to be certain activation functions. A number of activation functions condensed their input, in turn reducing the output range in a somewhat chaotic fashion. This produced large areas of input mapped over an extremely small range, so that a large change in the input is reduced to a small change in the output, resulting in a vanishing gradient. Two solutions used to solve this problem were layer-by-layer pre-training and the development of long short-term memory.


• In 2001, a research report by META Group (now called Gartner) described the challenges and opportunities of data growth as three-dimensional: increasing volume of data, increasing speed of data, and an increasing range of data sources and types. This was a call to prepare for the onslaught of Big Data, which was just starting.
• In 2009, Fei-Fei Li, an AI professor at Stanford, launched ImageNet and assembled a free database of more than 14 million labeled images. The Internet is, and was, full of unlabeled images, but labeled images were needed to "train" neural nets. Professor Li said, "Our vision was that big data would change the way machine learning works. Data drives learning."


2011-2020:
• By 2011, the speed of GPUs had increased significantly, making it possible to train convolutional neural networks without layer-by-layer pre-training. With the increased computing speed, it became obvious that deep learning had significant advantages in terms of efficiency and speed.
• One example is AlexNet, a convolutional neural network whose architecture won several international competitions during 2011 and 2012. AlexNet used rectified linear units to improve training speed and dropout to reduce overfitting.


• Also in 2012, Google Brain released the results of an unusual project known as The Cat Experiment. The free-spirited project explored the difficulties of "unsupervised learning."
• Deep learning typically uses "supervised learning," meaning the convolutional neural net is trained using labeled data (think of images from ImageNet). In unsupervised learning, a convolutional neural net is given unlabeled data and is then asked to seek out recurring patterns.


• The Generative Adversarial Network (GAN), created by Ian Goodfellow, was introduced in 2014. With GANs, two neural networks play against each other in a game. The goal of the game is for one network to imitate a photo and trick its opponent into believing it is real. The opponent is, of course, looking for flaws.


Deep Learning Success Stories

• Organizations at every stage of growth, from startups to Fortune 500s, are using deep learning and AI. Deep learning, the fastest growing field in AI, is empowering immense progress in all kinds of emerging markets and will be instrumental in ways we haven't even imagined.
• Already, deep learning is enabling self-driving cars, smart personal assistants, and smarter Web services. But the opportunities aren't limited to a few business-specific areas. AI and deep learning are shaping innovation across industries. Applications of AI, such as fraud detection and supply chain modernization, are being used by the world's most advanced teams and organizations.
Applications in Businesses: Deep Learning Use Cases

Deep learning algorithms are becoming more widely used in every industry sector, from online retail to photography; some use cases are more popular and have attracted more global media attention than others. Some widely publicized deep learning applications include:
• Speech recognition, used by Amazon Alexa, Google, Apple Siri, and Microsoft Cortana.
• Image recognition, used for analyzing documents and pictures residing in large databases.


• Natural Language Processing (NLP), used for negative sampling, sentiment analysis, machine translation, and contextual entity linking.
• Automated drug discovery and toxicology, used for drug design and development work, as well as for predictive diagnosis of diseases.
• CRM activities, used for automated marketing practices.
• Recommendation engines, used in a variety of applications.
• Predictions in gene ontology and gene-function relationships.
• Health predictions based on data collected from wearables and EMRs.


Idea of Computational Units

• In deep learning, a layer can be thought of as a collection of computational units that learn to detect a repeating occurrence of values. In the bulk of neural networks, units are linked to one another from one layer to the next. Each of these links has a weight that controls how much one unit influences another. The neural network learns more and more about the data as it moves from one unit to another, ultimately producing an output from the output layer.
• Similar to how neurons form the fundamental building blocks of the brain, deep learning architecture contains a computational unit that allows modeling of nonlinear functions, called the perceptron.
• The magic of deep learning starts with the humble perceptron. Similar to how a "neuron" in a human brain transmits electrical pulses throughout our nervous system, the perceptron receives a list of input signals and transforms them into output signals.


• The perceptron aims to understand data representation by stacking together many layers, where each layer is responsible for understanding some part of the input. A layer can be thought of as a collection of computational units that learn to detect a repeating occurrence of values.
• Each layer of perceptrons is responsible for interpreting a specific pattern within the data. A network of these perceptrons mimics how neurons in the brain form a network, so the architecture is called a neural network (or artificial neural network).
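As a concrete illustration, here is a minimal numpy sketch of one layer of computational units; the layer width, input size, and sigmoid nonlinearity are assumptions chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)          # input signals arriving at the layer
W = rng.normal(size=(3, 4))     # one weight vector per unit; W[j, i] scales input i into unit j
b = np.zeros(3)                 # one bias (threshold) per unit

a = W @ x + b                   # each unit's activation: a_j = sum_i W[j, i] * x_i + b_j
y = 1.0 / (1.0 + np.exp(-a))    # nonlinear output signal of each unit (sigmoid)
print(y)                        # three outputs, one per unit in the layer
```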


McCulloch-Pitts Model

• In the McCulloch-Pitts (MP) model (Figure 1.2), the activation (x) is given by a weighted sum of its M input values (aᵢ) and a bias term (θ).
• The output signal (s) is typically a nonlinear function f(x) of the activation value x. The following equations describe the operation of an MP model:
• Activation: x = ∑ᴹᵢ₌₁ wᵢ aᵢ − θ
• Output signal: s = f(x)


• Three commonly used nonlinear functions (binary, ramp, and sigmoid) are shown in Figure 1.3, although only the binary function was used in the original MP model. Networks consisting of MP neurons with binary (on-off) output signals can be configured to perform several logical functions, as the sketch below illustrates.
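A minimal sketch (not from the original notes) of an MP neuron with the binary output function, hand-configured as AND and OR gates; the weights and thresholds are standard textbook choices, since the MP model has no learning rule:

```python
import numpy as np

def mp_neuron(inputs, weights, theta):
    """Fire (1) if the weighted sum of inputs reaches the threshold theta."""
    x = np.dot(weights, inputs) - theta   # activation: x = sum_i w_i a_i - theta
    return 1 if x >= 0 else 0             # binary output signal s = f(x)

for a1 in (0, 1):
    for a2 in (0, 1):
        and_out = mp_neuron([a1, a2], weights=[1, 1], theta=2)  # fires only if both inputs are 1
        or_out = mp_neuron([a1, a2], weights=[1, 1], theta=1)   # fires if at least one input is 1
        print(f"inputs=({a1},{a2})  AND={and_out}  OR={or_out}")
```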




Perceptron

• Rosenblatt's perceptron model (Figure 1.5) for an artificial neuron consists of outputs from sensory units feeding a fixed set of association units, the outputs of which are fed to an MP neuron.
• The association units perform predetermined manipulations on their inputs. The main deviation from the MP model is that learning (i.e., adjustment of weights) is incorporated in the operation of the unit.


• The desired or target output (b) is compared with the actual binary output (s), and the error (δ) is used to adjust the weights. The following equations describe the operation of the perceptron model of a neuron:
• Activation: x = ∑ᴹᵢ₌₁ wᵢ aᵢ − θ
• Output signal: s = f(x)
• Error: δ = b − s
• Weight update: Δwᵢ = η δ aᵢ, where η is the learning rate parameter.
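A minimal numpy sketch of this learning rule, trained on the linearly separable OR truth table; the learning rate, epoch count, and data set are illustrative assumptions:

```python
import numpy as np

def train_perceptron(inputs, targets, eta=0.1, epochs=20):
    w = np.zeros(inputs.shape[1])   # weights w_i, initially zero
    theta = 0.0                     # threshold
    for _ in range(epochs):
        for a, b in zip(inputs, targets):
            s = 1 if np.dot(w, a) - theta >= 0 else 0  # binary output s = f(x)
            delta = b - s                              # error: delta = b - s
            w += eta * delta * a                       # weight update: dw_i = eta * delta * a_i
            theta -= eta * delta                       # the threshold learns like a bias
    return w, theta

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])                             # OR truth table
w, theta = train_perceptron(X, y)
print([(1 if np.dot(w, a) - theta >= 0 else 0) for a in X])  # [0, 1, 1, 1]
```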


Threshold Logic Unit

• The Threshold Logic Unit (TLU) is a basic form of machine learning model, consisting of a single computational unit with inputs (and corresponding weights) and an activation function. Note that the TLU is the most basic form of AI neuron/computational unit, knowledge of which will lay the foundation for advanced topics in machine learning and deep learning.


• The TLU mimics the functionality of a biological neuron at a high level. A typical neuron receives a multitude of inputs from afferent neurons, each associated with a weight. The weighted inputs are modulated in the receiving neuron (the efferent), and the neuron responds accordingly: it fires/produces a pulse (1) or does not fire/produces no pulse (0).


• This is achieved in the TLU via an activation function which takes the activation a as input to generate a prediction y. A threshold θ is defined, and the model produces an output if the threshold is exceeded, otherwise no output.
• In the TLU, each input xᵢ is associated with a weight wᵢ, and the sum of the weighted inputs (products of input and weight, xᵢ × wᵢ) is computed to obtain the activation a:
a = ∑ᴺᵢ₌₁ xᵢ × wᵢ
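A minimal numpy sketch of the TLU forward pass just described; the input values, weights, and threshold below are assumptions for illustration:

```python
import numpy as np

def tlu(x, w, theta):
    a = np.dot(x, w)               # activation: a = sum_i x_i * w_i
    return 1 if a > theta else 0   # fire (1) if the threshold is exceeded, else 0

x = np.array([1.0, 0.5, 0.0])      # example input signals (illustrative)
w = np.array([0.4, 0.6, 0.9])      # weights, one per input
print(tlu(x, w, theta=0.5))        # a = 0.7 > 0.5, so the unit fires: 1
```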
TLU Implementation

Having established the theoretical base, the next step is to describe and implement the training phase of the model. Basically, the implementation is based on the following steps (a sketch follows the list):
1. Identify inputs and the corresponding representation
2. Identify the free parameters in the problem
3. Specify the learning rule
4. Adjust the free parameters for optimisation
5. Evaluate the model
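A minimal sketch mapping the five steps onto code, assuming a TLU trained with a perceptron-style error-correction rule on the AND function; the data, learning rate, and epoch count are assumptions:

```python
import numpy as np

# Step 1: inputs and representation -- binary input vectors and 0/1 labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

# Step 2: free parameters -- the weights and the threshold.
w, theta = np.zeros(2), 0.0

# Step 3: learning rule -- nudge the parameters by the prediction error.
def step(a, w, theta):
    return 1 if np.dot(w, a) > theta else 0

# Step 4: adjust the free parameters over several passes through the data.
for _ in range(20):
    for a, target in zip(X, y):
        err = target - step(a, w, theta)
        w += 0.1 * err * a
        theta -= 0.1 * err

# Step 5: evaluate the trained model on all four input patterns.
print([step(a, w, theta) for a in X])   # expected: [0, 0, 0, 1]
```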


Perceptron Learning Algorithm and Convergence

• The Perceptron learning algorithm is a linear classifier that updates a weight vector based on the training patterns. The algorithm will converge to a separating hyperplane if the data is linearly separable, and the number of updates is bounded by a function of the data. If the data is not linearly separable, the algorithm will never converge.
• The Perceptron learning algorithm converges when all the inputs are classified correctly.


• More generally, in d-dimensional space, a set of points with labels in {+, −} is linearly separable if there exists a hyperplane in the same space such that all the points labeled + lie on one side of the hyperplane, and all the points labeled − lie on the other side of the hyperplane.
• A Perceptron with its parameters fixed may indeed be viewed as an origin-centred hyperplane that partitions space into two regions. Concretely, the parameters (or weights) of the Perceptron may be interpreted as the components of a normal vector to the hyperplane.
• Observe that a normal vector suffices to fix an origin-centred hyperplane. In fact, the length of the normal vector is immaterial; its direction alone matters.
• Given a set of points labeled + and −, the Perceptron Learning Algorithm is an iterative procedure to update the weights of a Perceptron such that eventually the corresponding hyperplane has all the points labeled + on one side, and all the points labeled − on the other. We adopt the convention that the points labeled + must lie on the side of the hyperplane to which its normal points.
• The input to the Perceptron Learning Algorithm is a data set of n ≥ 1 points (each d-dimensional) and their associated labels (+ or −). For mathematical convenience, we associate the + label with the number +1, and the − label with the number −1. Hence, we may take our input to be a set {(x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ)}, where each xᵢ ∈ ℝᵈ and each yᵢ ∈ {+1, −1}.




• The algorithm maintains a weight vector w, initially the zero vector. If this weight vector already separates the + points from the − points, we are done. If not, we pick an arbitrary point that is being misclassified. This point is used to update the weight vector: in fact, the update is nothing more than vector addition. If the misclassified point is (xᵢ, yᵢ), the new weight vector is w + yᵢxᵢ, as the sketch below shows.
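A minimal numpy sketch of the complete procedure; the toy data set is an assumption, chosen to be linearly separable through the origin:

```python
import numpy as np

def perceptron_learning(X, y, max_passes=1000):
    """X: (n, d) points; y: labels in {+1, -1}. Returns the weight vector."""
    w = np.zeros(X.shape[1])                 # initially the zero vector
    for _ in range(max_passes):
        misclassified = [i for i in range(len(y)) if y[i] * np.dot(w, X[i]) <= 0]
        if not misclassified:                # all points correctly classified: done
            return w
        i = misclassified[0]                 # pick an arbitrary misclassified point
        w = w + y[i] * X[i]                  # the update: vector addition with y_i * x_i
    return w                                 # no convergence: data may not be separable

# Toy data: + points have x1 + x2 > 0, - points the opposite.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-3.0, -1.0]])
y = np.array([+1, +1, -1, -1])
w = perceptron_learning(X, y)
print(w, np.sign(X @ w))                     # the signs should match y
```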
