Unit 1
Topics:
Introduction: History of Deep Learning, McCulloch Pitts Neuron, Multilayer
Perceptrons (MLPs), Sigmoid Neurons, Feed Forward Neural Networks, Back
Propagation.
Introduction:
Relation between Deep Learning and Machine Learning:
From the figure above, we can say that Deep Learning is a subset of
Machine Learning and in turn Machine Learning is a subset of Artificial
Intelligence.
You can think of them as a series of nested concentric circles, with
AI occupying the largest, followed by machine learning, then deep learning. In
other words, all deep learning is AI, but not all AI is deep learning.
What is Artificial Intelligence?
At its most basic level, the field of artificial intelligence uses computer
science and data to enable problem solving in machines.
While we don’t yet have human-like robots trying to take over the world,
we do have examples of AI all around us. These could be as simple as a
computer program that can play chess, or as complex as an algorithm that can
predict the RNA structure of a virus to help develop vaccines.
For a machine or program to improve on its own without further input
from human programmers, we need machine learning.
What is Machine Learning?
Machine learning refers to the study of computer systems that learn and
adapt automatically from experience without being explicitly programmed.
With simple AI, a programmer can tell a machine how to respond to
various sets of instructions by hand-coding each “decision.” With machine
learning models, computer scientists can “train” a machine by feeding it large
amounts of data. The machine follows a set of rules—called an algorithm—to
analyze and draw inferences from the data. The more data the machine parses,
the better it can become at performing a task or making a decision.
Here’s one example you may be familiar with: Music streaming service
Spotify learns your music preferences to offer you new suggestions. Each time
you indicate that you like a song by listening through to the end or adding it to
your library, the service updates its algorithms to feed you more accurate
recommendations. Netflix and Amazon use similar machine learning algorithms
to offer personalized recommendations.
What is Deep Learning?
Where machine learning algorithms generally need human correction
when they get something wrong, deep learning algorithms can improve their
outcomes through repetition, without human intervention.
A machine learning algorithm can learn from relatively small sets of data,
but a deep learning algorithm requires big data sets that might include diverse
and unstructured data.
Think of deep learning as an evolution of machine learning. Deep
learning is a machine learning technique that layers algorithms and
computing units—or neurons—into what is called an artificial neural
network.
These deep neural networks take inspiration from the structure of the
human brain. Data passes through this web of interconnected algorithms in a
non-linear fashion, much like how our brains process information.
Training time: machine learning takes less time to train the model, whereas
deep learning takes more time to train the model.
Now the sense organs pass the information to the first (lowest) layer of
neurons to process it. The output of this processing is passed on to the next
layers in a hierarchical manner; some of the neurons will fire and some won’t,
and this process goes on until it results in a final response, in this case
laughter.
This massively parallel network also ensures that there is a division of
work. Each neuron fires only when its intended criteria are met, i.e., a neuron
may perform a certain role in response to a certain stimulus, as shown below.
We can see that g(x) is just doing a sum of the inputs, a simple
aggregation. Here ‘theta’ is called the thresholding parameter. For
example, if I always watch the game when the sum turns out to be 2 or
more, then ‘theta’ is 2. This is called the Thresholding Logic.
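The thresholding logic can be sketched in a few lines of Python. This is only a minimal illustration: the function names `g` and `mp_neuron` and the example inputs are assumptions made for this sketch, not part of the original model description.

```python
def g(x):
    """Aggregate the binary inputs by summing them."""
    return sum(x)

def mp_neuron(x, theta):
    """Return 1 (fire) if the aggregated input reaches the threshold theta, else 0."""
    return 1 if g(x) >= theta else 0

# Hypothetical binary inputs for the "watch the game" decision.
print(mp_neuron([1, 1, 0], theta=2))  # sum is 2, so the threshold is met
print(mp_neuron([1, 0, 0], theta=2))  # sum is 1, so the threshold is not met
```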
Properties of MP Neuron:
1. Binary Nature: Both inputs and outputs are binary, simplifying
the processing model.
2. Logical Operations: The McCulloch-Pitts neuron can
implement basic logical operations (AND, OR, NOT)
depending on the arrangement of inputs and weights.
3. Computational Power: Despite its simplicity, a network of
McCulloch-Pitts neurons can implement any Boolean (logical) function,
which is why it is regarded as a universal model of logical computation.
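As a hedged illustration of property 2, the sketch below builds AND, OR and NOT from a single thresholded unit. It assumes the common textbook convention that any active inhibitory input forces the output to 0; the helper name `mp_unit` and its parameters are hypothetical.

```python
def mp_unit(excitatory, theta, inhibitory=()):
    """Binary threshold unit; an active inhibitory input vetoes firing (assumed convention)."""
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= theta else 0

def AND(a, b):
    return mp_unit([a, b], theta=2)   # fires only when both inputs are 1

def OR(a, b):
    return mp_unit([a, b], theta=1)   # fires when at least one input is 1

def NOT(a):
    return mp_unit([], theta=0, inhibitory=[a])  # fires unless the input is 1

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
print("NOT:", NOT(0), NOT(1))
```

Only the threshold and the role of each input (excitatory or inhibitory) change between the three gates; the unit itself stays the same.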
Applications of MP Neuron:
Neural Networks: Serves as a basic building block for more
complex artificial neural networks.
Theoretical Neuroscience: Provides insights into the
functioning of biological neural networks.
Limitations of MP Neuron:
Non-Continuity: The binary output limits its ability to model
real-valued inputs and outputs, which are common in biological
systems.
Static Weights: In the original model, weights do not change,
limiting learning capabilities.
The ANN depicted on the right of the image is a simple neural network
called ‘perceptron’. It consists of a single layer, which is the input layer, with
multiple neurons with their own weights; there are no hidden layers. The
perceptron algorithm learns the weights for the input signals in order to draw a
linear decision boundary.
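How a perceptron might learn such a boundary can be sketched as follows. The notes do not spell out the update rule, so this example assumes the classic perceptron learning rule (nudge the weights by the prediction error times the input); the function name `train_perceptron`, the learning rate and the AND dataset are illustrative choices.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, epochs=20):
    """Learn weights w and bias b with the classic perceptron rule (assumed here)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0
            update = lr * (target - pred)   # zero when the prediction is correct
            w += update * xi
            b += update
    return w, b

# Logical AND is linearly separable, so a single perceptron can learn it.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
preds = [1 if xi @ w + b > 0 else 0 for xi in X]
print(w, b, preds)
```

The learned line `w[0]*x1 + w[1]*x2 + b = 0` is the linear decision boundary; a problem such as XOR, which is not linearly separable, cannot be solved this way.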
However, to solve more complicated, non-linear problems related to
image processing, computer vision, and natural language processing tasks, we
work with deep neural networks.
There are several types of ANN, each designed for specific tasks and
architectural requirements.
Feedforward Neural Networks (FNN)
These are the simplest form of ANNs, where information flows in one
direction, from input to output. There are no cycles or loops in the network
architecture. Multilayer perceptrons (MLP) are a type of feedforward neural
network.
Recurrent Neural Networks (RNN)
In RNNs, connections between nodes form directed cycles, allowing
information to persist over time. This makes them suitable for tasks involving
sequential data, such as time series prediction, natural language processing, and
speech recognition.
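The idea of information persisting over time can be illustrated with a single recurrent update. This sketch assumes the standard "vanilla" recurrence h_t = tanh(W_x x_t + W_h h_{t-1} + b), which the notes do not state; the sizes, the random weights and the names `W_x`, `W_h` and `rnn_step` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4            # arbitrary sizes for illustration
W_x = rng.normal(scale=0.5, size=(hidden_size, input_size))
W_h = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One recurrent update: the new state depends on the input and the previous state."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

sequence = rng.normal(size=(5, input_size))   # a toy sequence of 5 time steps
h = np.zeros(hidden_size)                     # initial hidden state
for x_t in sequence:
    h = rnn_step(x_t, h)                      # h carries information forward in time
print(h)
```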
Convolutional Neural Networks (CNN)
CNNs are designed to effectively process grid-like data, such as images.
They consist of layers of convolutional filters that learn hierarchical
representations of features within the input data. CNNs are widely used in tasks
like image classification, object detection, and image segmentation.
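What a single convolutional filter does to grid-like data can be sketched directly in NumPy. This is a simplified illustration (no padding, stride 1, one channel, and a hand-picked rather than learned filter); `conv2d_valid` is a hypothetical helper, not a library function.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image and take a weighted sum at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 4x4 image that is dark on the left and bright on the right.
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
# A 1x2 filter that responds where brightness increases from left to right.
kernel = np.array([[-1.0, 1.0]])
feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

The feature map is non-zero only at the column where the edge sits, which is the sense in which a filter detects a local feature.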
Long Short-Term Memory Networks (LSTM) and Gated Recurrent
Units (GRU)
These are specialized types of recurrent neural networks designed to
address the vanishing gradient problem in traditional RNNs.
LSTMs and GRUs incorporate gated mechanisms to better capture long-
range dependencies in sequential data, making them particularly effective for
tasks like speech recognition, machine translation, and sentiment analysis.
Autoencoder
It is designed for unsupervised learning and consists of an encoder
network that compresses the input data into a lower-dimensional latent space,
and a decoder network that reconstructs the original input from the latent
representation.
Autoencoders are often used for dimensionality reduction, data
denoising, and generative modelling.
Generative Adversarial Networks (GAN)
GANs consist of two neural networks, a generator and a discriminator,
trained simultaneously in a competitive setting.
The generator learns to generate synthetic data samples that are
indistinguishable from real data, while the discriminator learns to distinguish
between real and fake samples.
GANs have been widely used for generating realistic images, videos, and
other types of data.
Multilayer Perceptron:
A multilayer perceptron is a type of feedforward neural network
consisting of fully connected neurons with nonlinear activation
functions. It is widely used to distinguish data that is not linearly separable.
MLPs have been widely used in various fields, including image
recognition, natural language processing, and speech recognition, among others.
Their flexibility in architecture and ability to approximate any function
under certain conditions make them a fundamental building block in deep
learning and neural network research.
Key Concepts:
Input layer:
The input layer consists of nodes or neurons that receive the initial
input data. Each neuron represents a feature or dimension of the input
data. The number of neurons in the input layer is determined by the
dimensionality of the input data.
Hidden layer:
Between the input and output layers, there can be one or more
layers of neurons. Each neuron in a hidden layer receives inputs from all
neurons in the previous layer (either the input layer or another hidden
layer) and produces an output that is passed to the next layer.
The number of hidden layers and the number of neurons in each
hidden layer are hyperparameters that need to be determined during the
model design phase.
Output layer:
This layer consists of neurons that produce the final output of the
network. The number of neurons in the output layer depends on the
nature of the task.
In binary classification, there may be either one neuron (for
example, with a sigmoid activation, giving the probability of belonging
to one class) or two neurons (for example, with a softmax activation);
in multi-class classification tasks, there are multiple neurons in the
output layer, typically one per class.
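The choice between one and two output neurons can be made concrete with a small sketch. It assumes a sigmoid activation for the single-neuron case and a softmax activation for the two-neuron and multi-class cases; the input values are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

# Binary task, option 1: a single sigmoid neuron gives P(class 1).
p_one_neuron = sigmoid(0.8)

# Binary task, option 2: two softmax neurons give [P(class 0), P(class 1)].
p_two_neurons = softmax(np.array([0.0, 0.8]))

# Multi-class task: one softmax neuron per class.
p_multi = softmax(np.array([2.0, 1.0, 0.1]))
print(p_one_neuron, p_two_neurons, p_multi)
```

In this example the two binary options agree on P(class 1), because a two-way softmax reduces to a sigmoid of the difference between the two inputs.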
Weights:
Neurons in adjacent layers are fully connected to each other. Each
connection has an associated weight, which determines the strength of the
connection. These weights are learned during the training process.
Bias neurons:
In addition to the input and hidden neurons, each layer (except the
input layer) usually includes a bias neuron that provides a constant input
to the neurons in the next layer. Bias neurons have their own weight
associated with each connection, which is also learned during training.
The bias neuron effectively shifts the activation function of the
neurons in the subsequent layer, allowing the network to learn an offset or
bias in the decision boundary.
By adjusting the weights connected to the bias neuron, the MLP
can learn to control the threshold for activation and better fit the training
data.
Note: In the context of MLPs, bias can refer to two related but distinct
concepts: bias as a general term in machine learning and the bias neuron
(defined above).
In general machine learning, bias refers to the error introduced by
approximating a real-world problem with a simplified model. Bias measures
how well the model can capture the underlying patterns in the data.
A high bias indicates that the model is too simplistic and may underfit
the data, while a low bias suggests that the model is capturing the underlying
patterns well.
Activation function:
Typically, each neuron in the hidden layers and the output layer
applies an activation function to its weighted sum of inputs.
Common activation functions include sigmoid, tanh, ReLU
(Rectified Linear Unit), and Softmax. These functions introduce
nonlinearity into the network, allowing it to learn complex patterns in the
data.
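Why the nonlinearity matters can be checked numerically. The sketch below uses arbitrary random weights (an assumption for illustration) and shows that two layers without an activation function collapse into a single linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # weights of a first layer (3 inputs, 4 neurons)
W2 = rng.normal(size=(2, 4))   # weights of a second layer (4 inputs, 2 neurons)
x = rng.normal(size=3)

def relu(z):
    return np.maximum(0.0, z)

# Without an activation, two layers are equivalent to one linear layer W2 @ W1.
two_linear_layers = W2 @ (W1 @ x)
one_linear_layer = (W2 @ W1) @ x
print(np.allclose(two_linear_layers, one_linear_layer))

# With ReLU in between, the network is in general no longer a single linear map.
with_relu = W2 @ relu(W1 @ x)
print(with_relu)
```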
Training with backpropagation:
MLPs are trained using the backpropagation algorithm, which
computes gradients of a loss function with respect to the model's
parameters and updates the parameters iteratively to minimize the loss.
Working of MultiLayer Perceptron: Layer by Layer
In a multilayer perceptron, neurons process information in a step-by-step
manner, performing computations that involve weighted sums and nonlinear
transformations. Let's walk layer by layer to see the magic that goes within.
Input layer
The input layer of an MLP receives input data, which could be
features extracted from the input samples in a dataset. Each neuron in
the input layer represents one feature.
Neurons in the input layer do not perform any computations; they
simply pass the input values to the neurons in the first hidden layer.
Hidden layers
The hidden layers of an MLP consist of interconnected neurons that
perform computations on the input data.
Each neuron in a hidden layer receives input from all neurons in the
previous layer. The inputs are multiplied by corresponding weights,
denoted as w. The weights determine how much influence the input
from one neuron has on the output of another.
In addition to weights, each neuron in the hidden layer has an
associated bias, denoted as b. The bias provides an additional input to
the neuron, allowing it to adjust its output threshold. Like weights,
biases are learned during training.
For each neuron in a hidden layer or the output layer, the weighted
sum of its inputs is computed. This involves multiplying each input by
its corresponding weight, summing up these products, and adding the
bias:
$$z = \sum_{i} w_i x_i + b$$
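The weighted-sum step for a whole layer can be written as one matrix operation. This is a sketch with made-up numbers; the sigmoid activation and the names `W`, `b`, `z` and `a` are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0])            # outputs of the previous layer (two inputs)
W = np.array([[0.5, -1.0],          # row j holds the weights of neuron j
              [1.0,  1.0]])
b = np.array([0.0, -1.0])           # one bias per neuron

z = W @ x + b                       # weighted sum plus bias for each neuron
a = sigmoid(z)                      # activation applied to the weighted sum
print(z, a)
```

For the first neuron the weighted sum is 0.5*1 + (-1)*2 + 0 = -1.5, and for the second it is 1*1 + 1*2 - 1 = 2.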
Sigmoid Neurons:
Sigmoid neurons are the building blocks of deep neural networks. They are
similar to perceptrons, but slightly modified so that the output is much
smoother than the step-function output of the perceptron.
Why Sigmoid Neuron?
The perceptron model takes several real-valued inputs and gives a single
binary output. In the perceptron model, every input xi has a weight wi
associated with it. The weights indicate the importance of the input in the
decision-making process.
The model output is decided by a threshold Wₒ: if the weighted sum of the
inputs is greater than the threshold Wₒ, the output will be 1; otherwise the
output will be 0. In other words, the model will fire if the weighted sum is
greater than the threshold.
The image on the left is the perceptron and the image on the right shows
the mathematical representation of the perceptron.
From the mathematical representation, we can say that the thresholding
logic used by the perceptron is very harsh. Let’s see the harsh thresholding logic
of the perceptron with an example.
Consider a person deciding whether or not to purchase a car based on
only one input, X1 (salary), with the threshold b (Wₒ) = -10 and the weight
W₁ = 0.2. With these values, the weighted sum 0.2·X1 - 10 becomes positive
only when X1 exceeds 50. The output from the perceptron model will look like
the figure shown below.
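The harshness of the threshold can also be seen numerically. The sketch below uses the weight and threshold from the example, and compares the perceptron with a sigmoid neuron, assuming the usual logistic form 1 / (1 + e^(-z)) for the sigmoid; the two salary values just below and above 50 are chosen for illustration.

```python
import math

W1 = 0.2     # weight on salary, from the example
B = -10.0    # threshold term b (W0), from the example

def perceptron(salary):
    """Step output: 1 if the weighted sum plus b is positive, else 0."""
    return 1 if W1 * salary + B > 0 else 0

def sigmoid_neuron(salary):
    """Smooth output in (0, 1) using the logistic function (assumed form)."""
    return 1.0 / (1.0 + math.exp(-(W1 * salary + B)))

for salary in (49.9, 50.1):
    print(salary, perceptron(salary), round(sigmoid_neuron(salary), 3))
```

Two nearly identical salaries get opposite decisions from the perceptron, while the sigmoid neuron gives both an output close to 0.5.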
Back Propagation:
Backpropagation is a key algorithm used in training artificial neural
networks. It efficiently computes the gradient of the loss function with respect
to the weights of the network, enabling optimization methods like gradient
descent. Here’s a detailed breakdown of the backpropagation process:
Overview
1. Feedforward Phase: Input data is passed through the network to produce
an output.
2. Loss Calculation: The output is compared to the target (true) values
using a loss function to compute the error.
3. Backpropagation Phase: The error is propagated backward through the
network to update the weights.
Let us consider a sample neural network as shown below:
The output vector is simply the activation vector ‘a’ at layer 4. It can be
represented as:
$$\hat{y} = \vec{a}^{(4)}$$
In the example, we have two hidden layers, and they can be represented
in terms of activation (a) as $\vec{a}^{(2)}$ and $\vec{a}^{(3)}$.
In the feed forward phase, the input data is passed through the network to
produce an output. This is shown in the figure below.
Throughout this phase, the weights are only guesses. At the end of this
phase, we get the cost function ‘J’. Ideally, the cost function should be 0, but
that is not going to be the case because the guessed weights are typically not
going to be very good.
So, ‘J’ is a function of y, the ground truth, and the output $\hat{y}$. It can be
represented as:
$$J(y, \hat{y})$$
The next step is to figure out which of these weights was responsible for
this higher J.
So, the task is to essentially redistribute this ‘J’ to all these weights. This
is shown in the figure below.
So, this procedure of updating the weights (ω) using $\frac{\partial J}{\partial \omega}$ is called
Gradient Descent, but just calculating $\frac{\partial J}{\partial \omega}$ is called Back Propagation.
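The division of labour between the two can be sketched on a toy problem. Here the gradient is written by hand for a hypothetical one-weight cost J(w) = (w - 3)^2; in a real network this gradient would be supplied by back propagation. The learning rate and the number of steps are assumed values.

```python
def J(w):
    return (w - 3.0) ** 2          # toy cost with its minimum at w = 3

def dJ_dw(w):
    return 2.0 * (w - 3.0)         # gradient of the toy cost

w = 0.0                            # initial guess for the weight
learning_rate = 0.1                # assumed step size
for _ in range(100):
    w = w - learning_rate * dJ_dw(w)   # gradient descent: move against the gradient
print(w, J(w))
```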
Back Propagation can be done in 2 methods:
1) Finite Difference method.
2) Chain rule.
Finite Difference method:
The steps followed in this method are as follows:
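o Guess $\vec{\omega}$ and do a forward pass. A forward pass simply means that, for a
given input x, we calculate $\hat{y}$ using $\vec{\omega}$. Calculate $J(\vec{\omega})$.
o Guess $\vec{\omega} + \Delta\vec{\omega}$ and do a forward pass; we get the output as
$\hat{y} + \Delta\hat{y}$. Calculate $J(\vec{\omega} + \Delta\vec{\omega})$.
o Then,
$$\frac{\partial J}{\partial \omega} \approx \frac{J(\vec{\omega} + \Delta\vec{\omega}) - J(\vec{\omega})}{\Delta\vec{\omega}}$$
For a single weight, for example $\omega_1$:
$$\frac{\partial J}{\partial \omega_1} \approx \frac{J(\omega_1 + \Delta\omega_1) - J(\omega_1)}{\Delta\omega_1}$$
$\frac{\partial J}{\partial \omega}$ is calculated for every parameter, and there could be millions of
parameters, so these are millions of calculations, and for each
calculation there are 2 calculations to be done, i.e., J(ω) and J(ω + Δω).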
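The finite difference method can be sketched as follows. The cost used here is a hypothetical stand-in, J(w) = sum of w squared, chosen because its exact gradient 2w is known; the helper name `finite_difference_grad` and the step size are assumptions.

```python
import numpy as np

def finite_difference_grad(J, w, delta=1e-6):
    """Approximate dJ/dw_k for every k, perturbing one parameter at a time."""
    base = J(w)                        # J(w), computed once and reused
    grad = np.zeros_like(w)
    for k in range(w.size):            # one extra cost evaluation per parameter
        w_step = w.copy()
        w_step[k] += delta             # perturb only parameter k
        grad[k] = (J(w_step) - base) / delta
    return grad

def toy_cost(w):
    """Hypothetical cost J(w) = sum(w^2); its exact gradient is 2w."""
    return float(np.sum(w ** 2))

w = np.array([1.0, -2.0, 0.5])
grad = finite_difference_grad(toy_cost, w)
print(grad)      # approximately 2 * w
```

Each entry of the gradient needs its own perturbed cost evaluation, so the loop runs once per parameter; this is what makes the method impractical for networks with millions of weights.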
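In the example above, back propagation is nothing but calculating
$\frac{\partial J}{\partial \omega^{(1)}}$, $\frac{\partial J}{\partial \omega^{(2)}}$ and $\frac{\partial J}{\partial \omega^{(3)}}$.
Now, it is actually easiest to find $\frac{\partial J}{\partial \omega^{(3)}}$ because $\omega^{(3)}$ is the weight
closest to the output and hence most directly responsible for J.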
So, let’s find this term first.
The assumption here is that ‘J’ is the binary cross-entropy cost function:
$$J = -\{\, y \ln \hat{y} + (1 - y) \ln (1 - \hat{y}) \,\}$$
and g is the sigmoid function.
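So,
$$\frac{\partial J}{\partial \omega^{(3)}} = \frac{\partial J}{\partial a^{(4)}} \cdot \frac{\partial a^{(4)}}{\partial z^{(4)}} \cdot \frac{\partial z^{(4)}}{\partial \omega^{(3)}}$$
So,
$$\frac{\partial J}{\partial \omega^{(3)}} = -\{y - a^{(4)}\}\, a^{(3)}$$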
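The intermediate steps between the chain rule and this result can be filled in as follows. This is a sketch under the stated assumptions (binary cross-entropy cost, sigmoid g, and $\hat{y} = a^{(4)}$); it additionally assumes $z^{(4)} = \omega^{(3)} a^{(3)}$ with no bias term, which the notes do not state explicitly.

```latex
\begin{align*}
\frac{\partial J}{\partial a^{(4)}}
  &= -\left\{ \frac{y}{a^{(4)}} - \frac{1 - y}{1 - a^{(4)}} \right\}
   = \frac{a^{(4)} - y}{a^{(4)}\,(1 - a^{(4)})} \\
\frac{\partial a^{(4)}}{\partial z^{(4)}}
  &= g'(z^{(4)}) = a^{(4)}\,(1 - a^{(4)}) \\
\frac{\partial z^{(4)}}{\partial \omega^{(3)}}
  &= a^{(3)} \\
\frac{\partial J}{\partial \omega^{(3)}}
  &= \frac{a^{(4)} - y}{a^{(4)}\,(1 - a^{(4)})} \cdot a^{(4)}\,(1 - a^{(4)}) \cdot a^{(3)}
   = -\{y - a^{(4)}\}\, a^{(3)}
\end{align*}
```

The factor $a^{(4)}(1 - a^{(4)})$ cancels, which is why the sigmoid and the binary cross-entropy cost pair so conveniently.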
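The generalised notation for the above equation (at layer ‘l’) can be written as:
$$\frac{\partial J}{\partial z^{(l)}} \equiv \delta^{(l)}$$
$\frac{\partial J}{\partial z^{(4)}}$ denotes the error in activation at layer 4.
Therefore,
$$\frac{\partial J}{\partial \omega^{(3)}} = \delta^{(4)} a^{(3)}$$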
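Suppose we have to calculate $\frac{\partial J}{\partial \omega^{(2)}}$; then it can be represented using
the chain rule as:
$$\frac{\partial J}{\partial \omega^{(2)}} = \frac{\partial J}{\partial a^{(4)}} \cdot \frac{\partial a^{(4)}}{\partial z^{(4)}} \cdot \frac{\partial z^{(4)}}{\partial a^{(3)}} \cdot \frac{\partial a^{(3)}}{\partial z^{(3)}} \cdot \frac{\partial z^{(3)}}{\partial \omega^{(2)}}$$
$$\frac{\partial J}{\partial \omega^{(2)}} = \delta^{(3)} a^{(2)}$$
$\frac{\partial a^{(3)}}{\partial z^{(3)}}$ can be written as $g'(z^{(3)})$, i.e., g prime of $z^{(3)}$.
Therefore,
$$\delta^{(3)} = \delta^{(4)} \omega^{(3)} g'(z^{(3)})$$
Similarly,
$$\delta^{(l)} = \delta^{(l+1)} \omega^{(l)} g'(z^{(l)})$$
$$\frac{\partial J}{\partial \omega^{(l)}} = \delta^{(l+1)} a^{(l)}$$
$$\frac{\partial J}{\partial \omega_{ij}^{(l)}} = \delta_{j}^{(l+1)} a_{i}^{(l)}$$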
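The recursion can be checked numerically on a deliberately tiny network. The sketch below assumes one neuron per layer, no bias terms, a sigmoid g and the binary cross-entropy cost, so every quantity is a scalar; the list layout (index l holds the layer-l quantity, index 0 unused) and the example numbers are choices made for this illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(y, y_hat):
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def forward(x, w):
    """w[l] is the weight from layer l to l+1 (w[0] unused); returns a[1..4]."""
    a = [None, x]                              # a[1] is the input
    for l in range(1, len(w)):
        a.append(sigmoid(w[l] * a[l]))         # a(l+1) = g(w(l) * a(l))
    return a

def backprop(x, y, w):
    a = forward(x, w)
    last = len(a) - 1                          # index of the output layer (4)
    delta = a[last] - y                        # delta(4) = -(y - a(4))
    grads = [None] * len(w)
    for l in range(last - 1, 0, -1):           # l = 3, 2, 1
        grads[l] = delta * a[l]                # dJ/dw(l) = delta(l+1) * a(l)
        if l > 1:
            g_prime = a[l] * (1 - a[l])        # g'(z(l)) for the sigmoid
            delta = delta * w[l] * g_prime     # delta(l) = delta(l+1) w(l) g'(z(l))
    return grads

def numerical_grad(x, y, w, l, eps=1e-5):
    """Central finite difference for dJ/dw(l), used only as a check."""
    w_plus, w_minus = list(w), list(w)
    w_plus[l] += eps
    w_minus[l] -= eps
    return (cost(y, forward(x, w_plus)[-1]) - cost(y, forward(x, w_minus)[-1])) / (2 * eps)

x, y = 0.5, 1.0
w = [None, 0.1, -0.2, 0.3]                     # guessed weights w(1), w(2), w(3)
grads = backprop(x, y, w)
for l in (1, 2, 3):
    print(l, grads[l], numerical_grad(x, y, w, l))
```

One forward pass and one backward pass give all three gradients, whereas the finite difference check needs extra forward passes for every weight.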
In summary: given some input ‘x’ and some guessed weights ‘w’, we get the
output $\hat{y}$; the weights ‘w’ are then improved using $\frac{\partial J}{\partial w}$.
The main computation in neural networks is calculating this $\frac{\partial J}{\partial w}$ for a
given guess ‘w’. This is done using Back Propagation.
It is called back propagation because the error is first calculated at the
last layer and then propagated backward, layer by layer, to the first layer.
Advantages:
Efficiency: Enables neural networks to learn from vast amounts of data.
Flexibility: Can be applied to a wide range of neural networks.
Accuracy: By iteratively adjusting weights, it can achieve high accuracy in
complex tasks.
Applications:
Image Recognition: Identifies objects in images and enables applications
like self-driving cars and medical diagnosis.
Natural Language Processing: Powers machine translation, text
summarization, and chatbot systems.
Robotics: Enables robots to learn from experience and adapt to
changing environments.
Game AI: Creates intelligent opponents and enhances game realism.