NNDL Unit 3

UNIT III THIRD-GENERATION NEURAL NETWORKS

Spiking Neural Networks-Convolutional Neural Networks-Deep Learning Neural Networks-Extreme


Learning Machine Model-Convolutional Neural Networks: The Convolution Operation – Motivation –
Pooling – Variants of the basic Convolution Function – Structured Outputs – Data Types – Efficient
Convolution Algorithms – Neuroscientific Basis – Applications: Computer Vision, Image Generation, Image
Compression.

1. SPIKING NEURAL NETWORKS:


 Artificial neural networks that closely mimic natural neural networks are known as spiking neural
networks (SNNs).
 In addition to neuronal and synaptic status, SNNs incorporate time into their working model.
 The idea is that neurons in the SNN do not transmit information at the end of each propagation cycle
(as they do in traditional multi-layer perceptron networks), but only when a membrane potential – a
neuron’s intrinsic quality related to its membrane electrical charge – reaches a certain value, known
as the threshold.
 The neuron fires when the membrane potential hits the threshold, sending a signal to neighboring
neurons, which increase or decrease their potentials in response to the signal. A spiking neuron model
is a neuron model that fires at the moment of threshold crossing.

Fig: SNN with connections and a biological neuron

 Artificial neurons, despite their striking resemblance to biological neurons, do not behave in the
same way.
 Biological and artificial NNs differ fundamentally in the following ways:



 Overall structure
 How computations are performed
 Learning rules, in comparison to the brain
 Alan Hodgkin and Andrew Huxley created the first scientific model of a Spiking Neural Network in
1952.
 The model characterized the initialization and propagation of action potentials in biological neurons.
 Biological neurons, on the other hand, do not transfer impulses directly.
 In order to communicate, chemicals called neurotransmitters must be exchanged in the synaptic gap.

HOW DOES SPIKING NEURAL NETWORK WORK?

Key Concepts

 Each neuron has a value that is equivalent to the electrical potential of biological neurons at any given
time.
 The value of a neuron can change according to its mathematical model; for example, if a neuron gets
a spike from an upstream neuron, its value may rise or fall.
 If a neuron’s value surpasses a certain threshold, the neuron will send a single impulse to each
downstream neuron connected to the first one, and the neuron’s value will immediately drop below
its average.
 As a result, the neuron will go through a refractory period similar to that of a biological neuron. The
neuron’s value will gradually return to its average over time.
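These dynamics can be sketched with a simple leaky integrate-and-fire (LIF) neuron, a common spiking neuron model. This is a minimal illustration rather than a full SNN; the threshold, reset value, leak time constant, and refractory period below are arbitrary choices.

```python
import numpy as np

# Minimal leaky integrate-and-fire (LIF) neuron: a sketch of the dynamics
# described above. All constants are illustrative choices.
def simulate_lif(input_current, dt=1.0, tau=20.0, v_rest=0.0,
                 v_thresh=1.0, v_reset=-0.2, refractory=5):
    v = v_rest
    refractory_left = 0
    spikes = []
    for t, i_in in enumerate(input_current):
        if refractory_left > 0:            # refractory period: ignore input
            refractory_left -= 1
            v = v_reset
            continue
        # leak toward the resting potential, then integrate the input
        v += dt / tau * (v_rest - v) + i_in
        if v >= v_thresh:                  # threshold crossed: fire a spike
            spikes.append(t)
            v = v_reset                    # value drops below its average
            refractory_left = refractory
    return spikes

# A constant drive makes the neuron fire at a regular rate.
print(simulate_lif(np.full(100, 0.08)))
```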

Spike Based Neural Codes

 Artificial spiking neural networks are designed to do neural computation.


 This necessitates that neural spiking is given meaning: the variables important to the computation
must be defined in terms of the spikes with which spiking neurons communicate.
 A variety of neuronal information encodings have been proposed based on biological knowledge:
i. Binary Coding:
 Binary coding is an all-or-nothing encoding in which a neuron is either active or inactive within a
specific time interval, firing one or more spikes throughout that time frame.
 The finding that physiological neurons tend to activate when they receive input (a sensory stimulus
such as light or external electrical inputs) encouraged this encoding.
 Individual neurons can benefit from this binary abstraction because they are portrayed as binary units
that can only accept two on/off values.



 It can also be applied to the interpretation of spike trains from current spiking neural networks, where
a binary interpretation of the output spike trains is employed in spike train classification.
ii. Rate Coding:
 Only the rate of spikes in an interval is employed as a metric for the information communicated in rate
coding, which is an abstraction from the timed nature of spikes.
 The fact that physiological neurons fire more frequently for stronger (sensory or artificial) stimuli
motivates rate encoding.
 It can be used at the single-neuron level or in the interpretation of spike trains once more.
 In the first scenario, neurons are directly described as rate neurons, which convert real-valued input
numbers “rates” into an output “rate” at each time step.
 In technical contexts and cognitive research, rate coding has been the concept behind conventional
artificial “sigmoidal” neurons.
iii. Fully Temporal Codes
 The encoding of a fully temporal code is dependent on the precise timing of all spikes.
 Evidence from neuroscience suggests that spike-timing can be incredibly precise and repeatable.
 Timings are related to a certain (internal or external) event in a fully temporal code (such as the onset
of a stimulus or spike of a reference neuron).
iv. Latency Coding
 The timing of spikes is used in latency coding, but not the number of spikes.
 The latency between a specific (internal or external) event and the first spike is used to encode
information.
 This is based on the finding that significant sensory events cause upstream neurons to spike earlier.
 This encoding has been employed in both unsupervised and supervised learning approaches, such as
Spike Prop and the Chronotron, among others.
 Information about a stimulus is encoded in the order in which neurons within a group generate their
first spikes, which is closely connected to rank-order coding.
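As a concrete illustration of rate coding, the sketch below encodes a real-valued intensity in [0, 1] as a Poisson-like spike train, so a stronger stimulus produces more spikes within the window; the window length and rate scaling are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rate coding sketch: encode an intensity in [0, 1] as a Poisson-like
# spike train; a higher intensity yields more spikes in the time window.
def rate_encode(intensity, n_steps=100, max_rate=0.5):
    # at each time step, spike with probability proportional to intensity
    return (rng.random(n_steps) < intensity * max_rate).astype(int)

weak, strong = rate_encode(0.1), rate_encode(0.9)
print("spikes for weak stimulus:  ", weak.sum())
print("spikes for strong stimulus:", strong.sum())
```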

SNN ARCHITECTURE

 Spiking neurons and linking synapses are described by configurable scalar weights in an SNN
architecture.
 The analogue input data is encoded into the spike trains using either a rate-based technique, some sort
of temporal coding or population coding as the initial stage in building an SNN.



 A biological neuron in the brain (and a simulated spiking neuron) gets synaptic inputs from other
neurons in the neural network, as previously explained.
 Both action potential production and network dynamics are present in biological brain networks.

 The network dynamics of artificial SNNs are much simplified as compared to actual biological
networks.
 It is useful in this context to suppose that the modelled spiking neurons have pure threshold dynamics
(as opposed to refractoriness, hysteresis, resonance dynamics, or post-inhibitory rebound features).
 The activity of presynaptic neurons changes the membrane potential of postsynaptic neurons; when that
potential reaches the threshold, an action potential or spike results.

LEARNING RULES IN SNN’S

 Learning is achieved in practically all ANNs, spiking or non-spiking, by altering scalar-valued


synaptic weights.
 Spiking allows for the replication of a form of bio-plausible learning rule that is not possible in non-
spiking networks.
 Many variations of this learning rule have been uncovered by neuroscientists under the umbrella
term spike-timing-dependent plasticity (STDP).



 Its main feature is that the weight (synaptic efficacy) connecting a pre-and post-synaptic neuron is
altered based on their relative spike times within tens of millisecond time intervals.
 The weight adjustment is based on information that is both local to the synapse and local in time.
The next subsections cover both unsupervised and supervised learning techniques in SNNs.
i. Unsupervised Learning
 Data is delivered without a label, and the network receives no feedback on its performance. Detecting
and reacting to statistical correlations in data is a common activity.
 Hebbian learning and its spiking generalizations, such as STDP, are a good example of this.
 The identification of correlations can be a goal in and of itself, but it can also be utilized to cluster or
classify data later on.
 STDP (sketched below) is defined as a process that strengthens a synaptic weight if the post-synaptic
neuron fires soon after the pre-synaptic neuron, and weakens it if the post-synaptic neuron fires before
the pre-synaptic one.
 This conventional form of STDP, however, is merely one of the numerous physiological forms of STDP.
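A pair-based STDP update can be sketched as a function of the relative spike timing delta_t = t_post - t_pre. The exponential window is the textbook form; the learning rates and time constant below are illustrative assumptions.

```python
import numpy as np

# Pair-based STDP sketch: the weight change depends on the relative timing
# of a pre-synaptic and a post-synaptic spike (delta_t = t_post - t_pre).
def stdp_delta_w(delta_t, a_plus=0.01, a_minus=0.012, tau=20.0):
    if delta_t > 0:   # pre fires before post: potentiation (strengthen)
        return a_plus * np.exp(-delta_t / tau)
    else:             # post fires before pre: depression (weaken)
        return -a_minus * np.exp(delta_t / tau)

for dt in (5.0, 20.0, -5.0, -20.0):
    print(f"t_post - t_pre = {dt:+.0f} ms -> dw = {stdp_delta_w(dt):+.5f}")
```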
ii. Supervised Learning
 In supervised learning, data (the input) is accompanied by labels (the targets), and the learning
device’s purpose is to correlate (classes of) inputs with the target outputs (a mapping or regression
between inputs and outputs).
 An error signal is computed between the target and the actual output and utilized to update the
network’s weights.
 Supervised learning allows us to use the targets to directly update parameters, whereas reinforcement
learning provides only a generic error signal (“reward”) that reflects how well the system is
functioning.
 In practice, the line between supervised and reinforcement learning is blurred.

APPLICATION OF SPIKING NEURAL NETWORKS

 In theory, SNNs can be used in the same applications as standard ANNs.


 SNNs can also simulate the central nervous systems of biological organisms, such as an insect seeking
food in an unfamiliar environment.
 They can be used to examine the operation of biological brain networks due to their realism.
 Starting with a hypothesis about the topology and function of a real neural circuit, recordings of this
circuit can be compared to the output of the appropriate SNN to assess the hypothesis’ plausibility.



 However, adequate training processes for SNNs are lacking, which can be a hindrance in particular
applications, such as computer vision.

ADVANTAGES OF SNN

 SNN is a dynamic system. As a result, it excels in dynamic processes like speech and dynamic picture
identification.
 When an SNN is already working, it can still train.
 To train an SNN, you simply need to train the output neurons.
 SNNs typically require fewer neurons than traditional ANNs.
 Because the neurons send impulses rather than a continuous value, SNNs can work incredibly
quickly.
 Because they leverage the temporal presentation of information, SNNs offer improved information-
processing efficiency and noise immunity.

DISADVANTAGES

 SNNs are difficult to train.


 As of now, there is no learning algorithm built expressly for this task.
 Building small SNNs is impractical.

2. DEEP LEARNING NEURAL NETWORKS:


 Deep learning is a branch of machine learning which is based on artificial neural networks.
 It is capable of learning complex patterns and relationships within data. In deep learning, we don’t
need to explicitly program everything.
 It has become increasingly popular in recent years due to the advances in processing power and
the availability of large datasets.
 It is based on artificial neural networks (ANNs), which in this context are also known as deep neural
networks (DNNs).
 These neural networks are inspired by the structure and function of the human brain’s biological
neurons, and they are designed to learn from large amounts of data.
 Deep Learning is a subfield of Machine Learning that involves the use of neural networks to model
and solve complex problems.



 Neural networks are modeled after the structure and function of the human brain and consist of
layers of interconnected nodes that process and transform data.
 The key characteristic of Deep Learning is the use of deep neural networks, which have multiple
layers of interconnected nodes.
 These networks can learn complex representations of data by discovering hierarchical patterns and
features in the data.
 Deep Learning algorithms can automatically learn and improve from data without the need for
manual feature engineering.
 Deep Learning has achieved significant success in various fields, including image recognition,
natural language processing, speech recognition, and recommendation systems.
 Some of the popular Deep Learning architectures include Convolutional Neural Networks (CNNs),
Recurrent Neural Networks (RNNs), and Deep Belief Networks (DBNs).
 Training deep neural networks typically requires a large amount of data and computational
resources.
 However, the availability of cloud computing and the development of specialized hardware, such
as Graphics Processing Units (GPUs), has made it easier to train deep neural networks.
 Deep Learning’s use is expected to continue to grow as more data and more powerful computing
resources become available.

What is Deep Learning?


 Deep learning is the branch of machine learning which is based on artificial neural network
architecture.
 An artificial neural network or ANN uses layers of interconnected nodes called neurons that work
together to process and learn from the input data.
 In a fully connected deep neural network, there is an input layer and one or more hidden layers
connected one after the other.
 Each neuron receives input from the previous layer neurons or the input layer.
 The output of one neuron becomes the input to other neurons in the next layer of the network, and
this process continues until the final layer produces the output of the network.



 The layers of the neural network transform the input data through a series of nonlinear
transformations, allowing the network to learn complex representations of the input data.
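This layer-by-layer computation can be sketched in a few lines of NumPy; the layer sizes and the ReLU nonlinearity below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward pass of a small fully connected network: each layer applies a
# linear map followed by a nonlinear transformation (here, ReLU).
def forward(x, layers):
    for w, b in layers[:-1]:
        x = np.maximum(0.0, x @ w + b)   # hidden layers: ReLU nonlinearity
    w, b = layers[-1]
    return x @ w + b                     # final layer produces the output

sizes = [4, 8, 8, 2]                     # input -> two hidden layers -> output
layers = [(rng.normal(size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
print(forward(rng.normal(size=(1, 4)), layers))
```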

 Deep learning has become one of the most popular and visible areas of machine learning, due to
its success in a variety of applications, such as computer vision, natural language processing, and
Reinforcement learning.
 Deep learning can be used for
i. Supervised
ii. unsupervised
iii. reinforcement machine learning
i. Supervised Machine Learning:
 Supervised machine learning is the technique in which the neural network learns to make predictions
or classify data based on labeled datasets. Here we supply both the input features and the target
variables. The neural network learns from the cost or error computed from the difference between the
predicted and the actual target, a process known as back propagation. Deep learning algorithms like
convolutional neural networks and recurrent neural networks are used for many supervised tasks like
image classification and recognition, sentiment analysis, language translation, etc.
ii. Unsupervised Machine Learning:
 Unsupervised machine learning is the technique in which the neural network learns to discover
patterns or to cluster the dataset based on unlabeled data. Here there are no target variables; the
machine has to determine the hidden patterns or relationships within the datasets itself. Deep learning
algorithms like autoencoders and generative models are used for unsupervised tasks like clustering,
dimensionality reduction, and anomaly detection.



iii. Reinforcement Machine Learning:
 Reinforcement machine learning is the technique in which an agent learns to make decisions in an
environment so as to maximize a reward signal. The agent interacts with the environment by taking
actions and observing the resulting rewards. Deep learning can be used to learn policies, or sets of
actions, that maximize the cumulative reward over time. Deep reinforcement learning algorithms like
Deep Q-Networks (DQN) and Deep Deterministic Policy Gradient (DDPG) are used for tasks like
robotics and game playing, etc.
DIFFERENCE BETWEEN MACHINE LEARNING AND DEEP LEARNING
 Machine learning and deep learning both are subsets of artificial intelligence but there are many
similarities and differences between them.

Machine Learning | Deep Learning

Applies statistical algorithms to learn the hidden patterns and relationships in the dataset. | Uses
artificial neural network architectures to learn the hidden patterns and relationships in the dataset.

Can work on a smaller amount of data. | Requires a larger volume of data than machine learning.

Better for low-label tasks. | Better for complex tasks like image processing, natural language
processing, etc.

Takes less time to train the model. | Takes more time to train the model.

A model is created from relevant features manually extracted from images to detect an object in the
image. | Relevant features are automatically extracted from images; it is an end-to-end learning process.

Less complex; it is easy to interpret the results. | More complex; it works like a black box, so
interpretations of the result are not easy.

Can work on a CPU; requires less computing power than deep learning. | Requires a high-performance
computer with a GPU.



TYPES OF NEURAL NETWORKS:
 Deep Learning models are able to automatically learn features from the data, which makes them
well-suited for tasks such as image recognition, speech recognition, and natural language
processing.
 The most widely used architectures in deep learning are
i. Feed forward neural networks
ii. Convolutional neural networks (CNNs)
iii. Recurrent neural networks (RNNs).
i. Feed forward neural networks (FNNs) are the simplest type of ANN, with a linear flow of
information through the network. FNNs have been widely used for tasks such as image classification,
speech recognition, and natural language processing.
ii. Convolutional Neural Networks (CNNs) are designed specifically for image and video recognition
tasks. CNNs are able to automatically learn features from images, which makes them well-suited for
tasks such as image classification, object detection, and image segmentation.
iii.Recurrent Neural Networks (RNNs) are a type of neural network that is able to process sequential
data, such as time series and natural language. RNNs are able to maintain an internal state that captures
information about the previous inputs, which makes them well-suited for tasks such as speech
recognition, natural language processing, and language translation.
APPLICATIONS OF DEEP LEARNING:
The main applications of deep learning can be divided into

i. Computer vision
ii. Natural language processing (NLP)
iii. Reinforcement learning.

i. Computer vision
 In computer vision, Deep learning models can enable machines to identify and understand visual
data. Some of the main applications of deep learning in computer vision include:
 Object detection and recognition: Deep learning models can be used to identify and locate objects
within images and videos, enabling applications such as self-driving cars, surveillance, and robotics.
 Image classification: Deep learning models can be used to classify images into categories such
as animals, plants, and buildings. This is used in applications such as medical imaging, quality
control, and image retrieval.



 Image segmentation: Deep learning models can be used to segment images into different regions,
making it possible to identify specific features within images.
ii. Natural language processing (NLP):
 In NLP, the Deep learning model can enable machines to understand and generate human
language. Some of the main applications of deep learning in NLP include:
 Automatic text generation: Deep learning models can learn from a corpus of text, and new text such
as summaries and essays can then be generated automatically using these trained models.
 Language translation: Deep learning models can translate text from one language to another,
making it possible to communicate with people from different linguistic backgrounds.
 Sentiment analysis: Deep learning models can analyze the sentiment of a piece of text, making it
possible to determine whether the text is positive, negative, or neutral. This is used in applications
such as customer service, social media monitoring, and political analysis.
 Speech recognition: Deep learning models can recognize and transcribe spoken words, making it
possible to perform tasks such as speech-to-text conversion, voice search, and voice-controlled
devices.
iii. Reinforcement learning:
 In reinforcement learning, deep learning is used to train agents to take actions in an environment so as
to maximize a reward. Some of the main applications of deep learning in reinforcement learning
include:
 Game playing: Deep reinforcement learning models have been able to beat human experts at
games such as Go, Chess, and Atari.
 Robotics: Deep reinforcement learning models can be used to train robots to perform complex
tasks such as grasping objects, navigation, and manipulation.
 Control systems: Deep reinforcement learning models can be used to control complex systems
such as power grids, traffic management, and supply chain optimization.
ADVANTAGES OF DEEP LEARNING:
1. High accuracy: Deep Learning algorithms can achieve state-of-the-art performance in various
tasks, such as image recognition and natural language processing.
2. Automated feature engineering: Deep Learning algorithms can automatically discover and learn
relevant features from data without the need for manual feature engineering.
3. Scalability: Deep Learning models can scale to handle large and complex datasets, and can learn
from massive amounts of data.



4. Flexibility: Deep Learning models can be applied to a wide range of tasks and can handle various
types of data, such as images, text, and speech.
5. Continual improvement: Deep Learning models can continually improve their performance as
more data becomes available.
DISADVANTAGES OF DEEP LEARNING:
1. High computational requirements: Deep Learning models require large amounts of data and
computational resources to train and optimize.
2. Requires large amounts of labeled data: Deep Learning models often require a large amount of
labeled data for training, which can be expensive and time-consuming to acquire.
3. Interpretability: Deep Learning models can be challenging to interpret, making it difficult to
understand how they make decisions.
4. Overfitting: Deep Learning models can sometimes overfit the training data, resulting in poor
performance on new and unseen data.
5. Black-box nature: Deep Learning models are often treated as black boxes, making it difficult to
understand how they work and how they arrived at their predictions.
3. EXTREME LEARNING MACHINE MODEL:
 Extreme Learning Machine, commonly referred to as ELM, is a machine learning algorithm
introduced by Huang et al. in 2006.
 The algorithm has gained widespread recognition in recent years, primarily due to its lightning-fast
learning, exceptional generalization performance, and ease of implementation.
 This makes it attractive to businesses and researchers because they can get results in a fast and
efficient way.
 It has made significant contributions to fields like image recognition, speech recognition, natural
language processing, financial forecasting, medical diagnosis, social media analysis, and
recommendation systems.
What is ELM in Machine Learning?
 In deep learning, an Extreme Learning Machine (ELM) is a type of feed forward neural network
utilized for tasks such as classification and regression.
 ELM stands apart from traditional feed forward neural networks due to its unique training
approach.
 In ELM, the hidden layer’s weights and biases are randomly initialized. However, these initial
values are just starting points.



 The distinctive aspect of ELM lies in its ability to compute the output layer’s weights using
the Moore-Penrose generalized inverse of the hidden layer’s output matrix.
 This approach enables ELM to learn from training data in a single step, setting it apart from
traditional neural networks that often require iterative training procedures, such as back
propagation.
 It uses a single-hidden-layer feed forward neural network (SLFN) instead of a traditional multi-layer
feed forward network.
 Thus it randomly selects the hidden nodes and analytically computes their output weights.
 ELM’s single-step training process makes it an efficient and versatile tool for a wide range of
machine-learning applications.
ARCHITECTURE OF ELM
 The architecture of ELM is very simple and straightforward, involving the three segments listed
below:
1. Input layer
2. Hidden layer – Single hidden layer
3. Output layer

Fig: ELM Architecture

i. Input layer



 In ELM, the Input Layer is where the data enters the model. It’s represented as a vector called
X, which contains the input features.
X = [X[1], X[2], X[3], ..., X[N]]

 In this representation, each X[i] corresponds to a specific feature or attribute of the data. N is
the total number of features.
 The Input Layer is responsible for passing the data to the Hidden Layer for further processing.
ii. Hidden layer-single hidden layer
 The hidden layer of ELM is where random weights and biases are assigned. Let’s denote the
number of hidden neurons as L as per above Fig 1.
 The weights connecting the input features to the hidden neurons are represented by a weight
matrix W of size (number of features, L).
 The value of L is a hyperparameter that needs to be set before training the neural network.
 The more hidden neurons there are, the more complex the neural network will be and the more
accurately it can model complex functions. However, having too many neurons will lead to
overfitting.
 Each column in the weight matrix corresponds to the weights of a hidden neuron. The biases
for the hidden neurons are represented by a bias vector b of size (L, 1).
The second dimension of 1 is used to ensure that the bias vector is a column vector.
 This is because the dot product of the weight matrix W and the input feature vector X results in a
column vector, and the bias must also be a column vector for the addition to be well defined.
 The purpose of the bias term is to shift the activation function to the left or right, allowing it to model
more complex functions.
 The output of the hidden layer, often denoted as H, is calculated by applying the activation function g
element-wise to the dot product of the input features and the weights, plus the bias:
H = g (W * X + b)

iii. Output layer


 In ELM, the output layer weights are calculated using Moore-Penrose inverse of the hidden
layer output matrix.
 This output weight matrix is denoted as beta. The output predictions, represented as f(x), are
calculated by multiplying the hidden layer output H by the output weights beta:



f(x) = H * beta

 To make predictions, we multiply the hidden layer output H by the output weights beta. Each
row in f(x) represents the predictions for a corresponding data point.
 The output predictions f(x) is a matrix of size J x K, where J is the number of data points and
K is the number of output variables.
 H is a matrix of size J x L, where L is the number of hidden neurons; it contains the transformed input
data after applying the random weights and biases of the hidden layer. Each row corresponds to a data
point and each column to a hidden neuron.
 The output weight matrix beta is of size L x K and links the hidden layer output to the output
predictions. Each row corresponds to a hidden neuron, and each column to an output variable.
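Putting the three pieces together, below is a minimal NumPy sketch of ELM for regression. It assumes a sigmoid activation and a row-major layout (samples as rows), and uses np.linalg.pinv for the Moore-Penrose generalized inverse; the sine-fitting toy data at the end is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Minimal ELM sketch for regression (J samples, N features, K outputs).
def elm_fit(X, T, L=50):
    N = X.shape[1]
    W = rng.normal(size=(N, L))      # random input-to-hidden weights
    b = rng.normal(size=(1, L))      # random hidden biases
    H = sigmoid(X @ W + b)           # hidden layer output, shape (J, L)
    beta = np.linalg.pinv(H) @ T     # Moore-Penrose inverse -> output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta # f(x) = H * beta

# Toy problem: learn y = sin(x) from noisy samples, in a single step.
X = rng.uniform(-3, 3, size=(200, 1))
T = np.sin(X) + 0.05 * rng.normal(size=X.shape)
W, b, beta = elm_fit(X, T, L=50)
print("train MSE:", np.mean((elm_predict(X, W, b, beta) - T) ** 2))
```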
HOW ELM IS TRAINED:
ELM is trained on the input training data in the step-by-step procedure listed below:
1. Input Training Data: The first step is to gather the training data, which includes the input features and
the target variables, to feed into the ELM algorithm.
2. Random Initialization: Next, the weights and biases of the hidden layer are randomly initialized; this
eliminates iterative weight adjustment.
3. Feature Mapping: The input data is transformed into a high-dimensional feature space using the
randomly assigned weights. This process is known as feature mapping and helps capture complex
relationships between the input features.
4. Hidden Layer Processing: The transformed input data is then processed by the hidden layer, resulting
in an output matrix. This output is a key component of ELM’s unique single-step learning process.
5. Output Weight Calculation: In the fifth step, ELM applies the Moore-Penrose generalized inverse to
the hidden layer output matrix to calculate the output weights. This mathematical technique ensures
robustness, even in the presence of noise or missing data.
6. Model Evaluation: Once the output weights are determined, the model’s training output is generated
and evaluated with metrics specific to the problem at hand.
7. Fine-Tuning and Hyperparameter Adjustment: Depending on the assessment results, fine-tuning or
hyperparameter adjustments can be applied to improve model performance.
8. Deployment: Finally, once training is complete, the ELM model is ready for deployment in real-world
applications, where it can make predictions and decisions based on the learned patterns.



APPLICATIONS OF ELM
Extreme learning machines are used in a wide range of machine learning and artificial intelligence
applications, listed below:
1. Image recognition and classification tasks, such as identifying objects in photos and analyzing
medical images.
2. Speech recognition tasks, converting human speech to text for voice assistants and transcription
services.
3. Various natural language processing tasks, including text classification, sentiment analysis, language
translation, and chatbot development.
4. The finance sector, to predict stock prices, exchange rates, and other financial data, which benefits
traders and investors.
5. Social media analysis, implementing sentiment analysis, trend detection, and user-behavior
understanding, which benefits marketing and brand management.
6. Recommendation systems that suggest products, content, or services to users based on their
preferences and behavior, which benefits e-commerce websites and content platforms.
7. Predictive maintenance, helping to prevent equipment failures by analyzing sensor data, which
benefits manufacturing industries.
ADVANTAGE OF ELM
ADVANTAGES OF ELM
Extreme learning machines have a number of advantages over other machine learning algorithms:
1. ELM is a relatively simple algorithm, which makes it easier to explain how the model makes
decisions.
2. ELM learns from the training data in one step, without repeating the learning process over multiple
iterations. This makes it much faster to train than neural network methods based on back
propagation.
3. ELM is known for good generalization performance, even when the training data is limited. ELM
models are less prone to overfitting the training data, and they can still do a good job when dealing
with new, unseen test data.
4. ELM can handle noisy and incomplete training data effectively through the Moore-Penrose
generalized inverse used to calculate the output weights.
5. ELM is easy to implement in practice, and there are many open-source libraries to help with training
and using ELM models.



LIMITATIONS OF ELM
Extreme learning machines have a few limitations compared to other machine learning algorithms:
1. In ELM, the random initialization of the hidden-layer weights can make hyperparameter tuning
challenging and difficult. It requires experimentation to select the right combination of hidden
neurons and activation functions.
2. ELM lacks the ability to fine-tune or customize features to the specifics of a problem. It relies on a
somewhat random transformation of the input data, which might not be ideal for tasks requiring
precise feature engineering.
3. ELM is primarily designed for batch learning, where all the data is available at once, and it may not
be suitable for tasks that require sequential or online learning, where the model needs to adapt to
changing data over time.
4. CONVOLUTIONAL NEURAL NETWORKS:

 Convolutional Neural Network (CNN) is an extended version of the artificial neural network (ANN)
that is predominantly used to extract features from grid-like matrix datasets.
 For example, visual datasets like images or videos, where data patterns play an extensive role.
CNN ARCHITECTURE:
 Convolutional Neural Network consists of multiple layers like the input layer, Convolutional layer,
Pooling layer, and fully connected layers.

Fig: Simple CNN architecture

 The Convolutional layer applies filters to the input image to extract features, the Pooling layer
down samples the image to reduce computation, and the fully connected layer makes the final
prediction.
 The network learns the optimal filters through back propagation and gradient descent.



HOW CONVOLUTIONAL LAYERS WORKS:
 Convolutional Neural Networks, or covnets, are neural networks that share their parameters. Imagine
you have an image.
 It can be represented as a cuboid having a length and width (the dimensions of the image) and a height
(i.e. the channels, as images generally have red, green, and blue channels).

 Now imagine taking a small patch of this image and running a small neural network, called a filter
or kernel on it, with say, K outputs and representing them vertically.
 Now slide that neural network across the whole image, as a result, we will get another image with
different widths, heights, and depths. Instead of just R, G, and B channels now we have more
channels but lesser width and height. This operation is called Convolution.
 If the patch size is the same as that of the image it will be a regular neural network. Because of
this small patch, we have fewer weights.

Fig: Deep Learning Udacity

Now let’s talk about a bit of mathematics that is involved in the whole convolution process.
 Convolution layers consist of a set of learnable filters (or kernels) having small widths and heights
and the same depth as that of input volume (3 if the input layer is image input).



 For example, if we have to run convolution on an image with dimensions 34x34x3. The possible
size of filters can be axax3, where ‘a’ can be anything like 3, 5, or 7 but smaller as compared to
the image dimension.
 During the forward pass, we slide each filter across the whole input volume step by step where
each step is called stride (which can have a value of 2, 3, or even 4 for high-dimensional images)
and compute the dot product between the kernel weights and patch from input volume.
 As we slide our filters we’ll get a 2-D output for each filter and we’ll stack them together as a
result, we’ll get output volume having a depth equal to the number of filters. The network will
learn all the filters.
LAYERS USED TO BUILD CONVNETS
 A complete Convolution Neural Networks architecture is also known as covnets.
 A covnets is a sequence of layers, and every layer transforms one volume to another through
a differentiable function.
Types of layers:
Let’s take an example by running a covnet on an image of dimension 32 x 32 x 3.
 Input Layers: It’s the layer in which we give input to our model. In CNN, Generally, the input
will be an image or a sequence of images. This layer holds the raw input of the image with width
32, height 32, and depth 3.
 Convolutional Layers: This is the layer, which is used to extract the feature from the input dataset.
It applies a set of learnable filters known as the kernels to the input images. The filters/kernels are
smaller matrices usually 2×2, 3×3, or 5×5 shape. It slides over the input image data and computes
the dot product between the kernel weights and the corresponding input image patch. The output of this
layer is referred to as a feature map. Suppose we use a total of 12 filters for this layer; we will get an
output volume of dimension 32 x 32 x 12.
 Activation Layer: By adding an activation function to the output of the preceding layer, activation
layers add nonlinearity to the network. It will apply an element-wise activation function to the
output of the convolution layer. Some common activation functions are RELU: max (0,
x), Tanh, Leaky RELU, etc. The volume remains unchanged hence output volume will have
dimensions 32 x 32 x 12.
 Pooling layer: This layer is periodically inserted in the covnets; its main function is to reduce the size
of the volume, which makes the computation fast, reduces memory, and also helps prevent overfitting.
Two common types of pooling layers are max pooling and average pooling. If we use a max pool with
2 x 2 filters and stride 2, the resultant volume will be of dimension 16x16x12.



 Flattening: The resulting feature maps are flattened into a one-dimensional vector after the
convolution and pooling layers so they can be passed into a fully connected layer for classification or
regression.
 Fully Connected Layers: It takes the input from the previous layer and computes the final
classification or regression task.
 Output Layer: The output from the fully connected layers is then fed into a logistic function for
classification tasks like sigmoid or softmax which converts the output of each class into the
probability score of each class.
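The shape bookkeeping of this 32 x 32 x 3 example can be checked in a few lines of Python; the helpers below are an illustrative sketch that assumes 'same' padding for the convolution layer and the 2 x 2, stride-2 max pool described above.

```python
# Trace the 32 x 32 x 3 example through the layer stack described above.
def conv_same(h, w, c, n_filters):
    return h, w, n_filters                # 'same' padding keeps h and w

def pool(h, w, c, f=2, s=2):
    return (h - f) // s + 1, (w - f) // s + 1, c

shape = (32, 32, 3)                       # input layer
shape = conv_same(*shape, n_filters=12)   # -> (32, 32, 12)
shape = pool(*shape)                      # -> (16, 16, 12)
flat = shape[0] * shape[1] * shape[2]     # -> 3072 values to the FC layer
print(shape, flat)
```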
5. THE CONVOLUTION OPERATION:
 Convolutional networks belong to a class of neural networks that take the image as an input, subjects
it to combinations of weights and biases, extracts features and outputs the results.
 They tend to reduce the dimensions of the input image with the use of a kernel which makes it easier
to extract features as compared to a generic dense neural network.
 Convolutional networks trace their foundation to convolution operations on matrices.
 Convnets were inspired by biological processes in that the connectivity pattern between neurons
resembles the organization of the animal visual cortex.
 Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as
the receptive field.
 The receptive fields of different neurons partially overlap such that they cover the entire visual field.



 The name “Convolutional neural network” indicates that the network employs a mathematical
operation called Convolution.
 Convolution is a specialized kind of linear operation. Convnets are simply neural networks that use
convolution in place of general matrix multiplication in at least one of their layers.
 Convolution between two functions in mathematics produces a third function expressing how the
shape of one function is modified by the other.
Convolution Kernels
 A kernel is a small 2D matrix whose contents are based upon the operations to be performed.
 A kernel maps on the input image by simple matrix multiplication and addition, the output obtained
is of lower dimensions and therefore easier to work with.

Fig: Kernel types

 Above is an example of a kernel for applying Gaussian blur (to smoothen the image before
processing), Sharpen image (enhance the depth of edges) and edge detection.
 The shape of a kernel is heavily dependent on the input shape of the image and architecture of the
entire network, mostly the size of kernels is (MxM) i.e. a square matrix. The movement of a kernel
is always from left to right and top to bottom.



Fig: Kernel movement

 Stride defines the step by which the kernel moves; for example, a stride of 1 makes the kernel slide by
one row/column at a time, and a stride of 2 moves the kernel by 2 rows/columns.

Fig: Multiple kernels aka filters with stride=1

 For input images with 3 or more channels such as RGB a filter is applied.
 Filters are one dimension higher than kernels and can be seen as multiple kernels stacked on each
other where every kernel is for a particular channel.
 Therefore for an RGB image of (32x32) we have a filter of the shape say (5x5x3).
Now let’s see how a kernel operates on sample matrix



Convolution in action

 Here the input matrix has shape 4x4x1 and the kernel is of size 3x3 since the shape of input is
larger than the kernel, we are able to implement a sliding window protocol and apply the kernel
over entire input. First entry in the convoluted result is calculated as:
45*0 + 12*(-1) + 5*0 + 22*(-1) + 10*5 + 35*(-1) + 88*0 + 26*(-1) + 51*0 = -45
Sliding window protocol:



1. The kernel gets into position at the top-left corner of the input matrix.
2. Then it starts moving left to right, calculating the dot product and saving it to a new matrix until it
has reached the last column.
3. Next, kernel resets its position at first column but now it slides one row to the bottom. Thus following
the fashion left-right and top-bottom.
4. Steps 2 and 3 are repeated till the entire input has been processed.
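This computation is easy to reproduce in NumPy. The sketch below implements the sliding-window protocol directly; only the top-left 3 x 3 patch of the input and the (sharpen) kernel are given above, so the remaining entries of the 4 x 4 input are made-up values, included just to complete the example.

```python
import numpy as np

# Naive sliding-window convolution (stride 1, no padding).
def conv2d(x, k):
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):                # top to bottom
        for j in range(ow):            # left to right
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * k)
    return out

# The top-left 3x3 patch matches the worked example; the last row and
# column (marked *) are assumed values just to complete a 4x4 input.
x = np.array([[45, 12,  5,  7],        # 7 is assumed (*)
              [22, 10, 35,  9],        # 9 is assumed (*)
              [88, 26, 51,  3],        # 3 is assumed (*)
              [ 4,  8,  6,  2]])       # whole row assumed (*)
k = np.array([[ 0, -1,  0],
              [-1,  5, -1],
              [ 0, -1,  0]])           # sharpen kernel from the figure

out = conv2d(x, k)
print(out[0, 0])                       # -45.0, matching the text
```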
6. MOTIVATION:
i. Sparse Interactions
 In traditional Neural Networks, every output unit interacts with every input unit.
 Convolutional networks, however, typically have sparse interactions, achieved by making the kernel
smaller than the input. This:
o Reduces memory requirements
o Improves statistical efficiency
 In a deep convolutional network, units in the deeper layers may indirectly interact with a larger
portion of the input.



ii. Parameter Sharing
 Parameter sharing refers to using the same parameter for more than one function in a model.
 In a convolutional neural net, each member of the kernel is used at every position of the input, i.e. the
parameters used to compute different output units are tied together (their values are always the
same).
 Sparse interactions and parameter sharing combined can improve efficiency of a linear
function for detecting edges in an image.
iii. Equivariance

 Parameter sharing in a convolutional network provides equivariance to translation.

 Translation of image results in corresponding translation in the output map.


 Convolution operation by itself is not equivariant to changes in scale or rotation.

7. POOLING :

 Similar to the Convolutional Layer, the Pooling layer is responsible for reducing the spatial
size of the Convolved Feature.
 This is to decrease the computational power required to process the data by reducing the
dimensions.
 There are two main types of pooling: average pooling and max pooling.



 So what we do in Max Pooling is we find the maximum value of a pixel from a portion of the
image covered by the kernel.
 Max Pooling also performs as a Noise Suppressant.
 It discards the noisy activations altogether and also performs de-noising along with
dimensionality reduction.
 On the other hand, Average Pooling returns the average of all the values from the portion of
the image covered by the Kernel.
 Average Pooling simply performs dimensionality reduction as a noise suppressing mechanism.
Hence, we can say that Max Pooling performs a lot better than Average Pooling.

 The pooling operation involves sliding a two-dimensional filter over each channel of feature map
and summarizing the features lying within the region covered by the filter.
 For a feature map having dimensions nh x nw x nc, the dimensions of the output obtained after a
pooling layer with filter size f and stride s are

floor((nh - f)/s + 1) x floor((nw - f)/s + 1) x nc

Where,
-> nh - height of feature map

-> nw - width of feature map

-> nc - number of channels in the feature map

-> f - size of filter

-> s - stride length



A common CNN model architecture is to have a number of convolution and pooling layers stacked
one after the other.
WHY TO USE POOLING LAYERS?
 Pooling layers are used to reduce the dimensions of the feature maps. Thus, it reduces the number
of parameters to learn and the amount of computation performed in the network.
 The pooling layer summarizes the features present in a region of the feature map generated by a
convolution layer.
 So, further operations are performed on summarized features instead of precisely positioned
features generated by the convolution layer. This makes the model more robust to variations in the
position of the features in the input image.
TYPES OF POOLING LAYERS:
i. Max pooling
ii. Average Pooling
iii. Global Pooling
i. MAX POOLING
 Max pooling is a pooling operation that selects the maximum element from the region of
the feature map covered by the filter.
 Thus, the output after max-pooling layer would be a feature map containing the most
prominent features of the previous feature map.

ii. AVERAGE POOLING


 Average pooling computes the average of the elements present in the region of feature map
covered by the filter.
 Thus, while max pooling gives the most prominent feature in a particular patch of the feature
map, average pooling gives the average of features present in a patch.



iii. GLOBAL POOLING
 Global pooling reduces each channel in the feature map to a single value. Thus, an nh x
nw x nc feature map is reduced to 1 x 1 x nc feature map.
 This is equivalent to using a filter of dimensions nh x nw i.e. the dimensions of the feature
map.
 Further, it can be either global max pooling or global average pooling.
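A minimal sketch of max and average pooling for a single channel, assuming 2 x 2 windows with stride 2 (the common setting mentioned above); the input values are arbitrary.

```python
import numpy as np

# Max and average pooling sketches for a single channel (f = 2, s = 2).
def pool2d(x, f=2, s=2, mode="max"):
    oh, ow = (x.shape[0] - f) // s + 1, (x.shape[1] - f) // s + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = x[i*s:i*s+f, j*s:j*s+f]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 6, 8]], dtype=float)
print(pool2d(x, mode="max"))       # most prominent feature per 2x2 patch
print(pool2d(x, mode="average"))   # average of each 2x2 patch
# Global pooling reduces each channel to a single value:
print(x.max(), x.mean())
```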
ADVANTAGES OF POOLING LAYER:
1. Dimensionality reduction: The main advantage of pooling layers is that they help reduce the spatial
dimensions of the feature maps. This reduces the computational cost and also helps avoid overfitting
by reducing the number of parameters in the model.
2. Translation invariance: Pooling layers are also useful in achieving translation invariance in the
feature maps. This means that the position of an object in the image does not affect the
classification result, as the same features are detected regardless of the position of the object.
3. Feature selection: Pooling layers can also help in selecting the most important features from the
input, as max pooling selects the most salient features and average pooling preserves more
information.

DISADVANTAGES OF POOLING LAYER:


1. Information loss: One of the main disadvantages of pooling layers is that they discard some
information from the input feature maps, which can be important for the final classification or
regression task.
2. Over-smoothing: Pooling layers can also cause over-smoothing of the feature maps, which can
result in the loss of some fine-grained details that are important for the final classification or
regression task.



3. Hyperparameter tuning: Pooling layers also introduce hyperparameters, such as the size of the pooling
regions and the stride, which need to be tuned to achieve optimal performance. This can be time-
consuming and requires some expertise in model building.
8. VARIANTS OF THE BASIC CONVOLUTION FUNCTION:

In practical implementations of the convolution operation, certain modifications are made which
deviate from standard discrete convolution operation:
 In general a convolution layer consists of application of several different kernels to the input.
Since, convolution with a single kernel can extract only one kind of feature.
 The input is generally not real-valued but instead vector valued.
 Multi-channel convolutions are commutative if number of output and input channels is the
same.
Effect of Strides
 Stride is the number of pixels by which the kernel shifts over the input matrix.
 In order to allow for calculation of features at a coarser level strided convolutions can be used.
 The effect of strided convolution is the same as that of a convolution followed by a down
sampling stage.
 Strides can be used to reduce the representation size.
 Below is an example representing 2-D Convolution, with (3 * 3) Kernel and Stride of 2 units.
Effect of Zero Padding
 Convolution networks can implicitly zero pad the input V, to make it wider.
 Without zero padding, the width of representation shrinks by one pixel less than the kernel width
at each layer.
 Zero padding the input allows to control kernel width and size of output independently.
Zero Padding Strategies
3 common zero padding strategies are:



Valid zero-padding:
1. No zero padding is used.
2. Output is computed only at places where the entire kernel lies inside the input.
3. Shrinkage > 0.
4. Limits the number of convolution layers that can be used in the network.
5. Input's width = m, kernel's width = k, width of output = m - k + 1.

Same zero-padding:
1. Just enough zero padding is added to keep size(output) = size(input).
2. The input is padded by (k - 1) zeros.
3. Since the number of output units connected to border pixels is less than that for center pixels, it may
under-represent border pixels.
4. Can add as many convolution layers as the hardware can support.
5. Input's width = m, kernel's width = k, width of output = m.

Strong zero-padding:
1. The input is padded by enough zeros such that each input pixel is connected to the same number of
output units.
2. Allows us to make an arbitrarily deep network.
3. Can add as many convolution layers as the hardware can support.
4. Input's width = m, kernel's width = k, width of output = m + k - 1.
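The three output widths can be verified numerically: NumPy's np.convolve supports 'valid', 'same', and 'full' modes, which in one dimension correspond to valid, same, and strong zero-padding respectively.

```python
import numpy as np

m, k = 8, 3
x, w = np.ones(m), np.ones(k)

# Output widths for the three zero-padding strategies (1-D case):
for mode, expected in (("valid", m - k + 1), ("same", m), ("full", m + k - 1)):
    out = np.convolve(x, w, mode=mode)
    print(f"{mode:>5}: width {len(out)} (expected {expected})")
```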



TYPES OF CONVOLUTION

Comparing unshared, tiled and traditional convolutions:

Unshared convolution:
1. No parameter sharing.
2. Each output unit performs a linear operation on its neighborhood, but parameters are not shared
across output units.
3. Captures local connectivity while allowing different features to be computed at different spatial
locations.
Advantages: reducing memory consumption, increasing statistical efficiency, and reducing the amount of
computation needed to perform forward and back-propagation (relative to full connections).
Disadvantage: requires many more parameters than the convolution operation.

Tiled convolution:
1. Offers a compromise between unshared and traditional convolution.
2. Learns a set of kernels and cycles/rotates through them in space.
Advantages: reduces the number of parameters in the model while still making use of parameter sharing.

Traditional convolution:
1. Equivalent to tiled convolution with t = 1.
2. Has the same connectivity as unshared convolution.

Examples of Unshared, Tiled and Traditional Convolutions

1) Unshared Convolution



2) Tiled Convolution

3) Traditional Convolution

4) Comparing Computation Times



9. STRUCTURED OUTPUTS:

 Convolutional networks can be trained to output high-dimensional structured output rather than
just a classification score.
 To produce an output map of the same size as the input map, only same-padded convolutions can be
stacked.
 The output of the first labelling stage can be refined successively by another convolutional model.
 If the models use tied parameters, this gives rise to a type of recursive model.

Variable | Description

X | Input image tensor
Y | Probability distribution over labels for each pixel
H | Hidden representation
U | Tensor of convolution kernels
V | Tensor of kernels used to produce an estimate of the labels
W | Kernel tensor used to convolve over Y to provide input to H

10. DATA TYPES:

The data used with a convolutional network usually consist of several channels, each channel being the
observation of a different quantity at some point in space or time.

 When output is variable sized, no extra design change needs to be made.


 When output requires fixed size (classification), a pooling stage with kernel size proportional
to input size needs to be used.



11. EFFICIENT CONVOLUTION ALGORITHMS:
i. Fourier Transform

 The Fourier transform is a tool that breaks a waveform (a function or signal) into an alternate
representation characterized by sines and cosines.
 Convolution is equivalent to converting both the input and the kernel to the frequency domain using a
Fourier transform, performing point-wise multiplication of the two signals, and converting back to the
time domain using an inverse Fourier transform.

ii. Separable Kernels

 When a d-dimensional kernel can be expressed as the outer product of d vectors, one vector per
dimension, the kernel is called separable.
 A separable kernel also takes fewer parameters to represent, as vectors.

Kernel Type | Runtime complexity for a d-dimensional kernel, w elements wide

Traditional kernel | O(w^d)

Separable kernel | O(w x d)
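The saving is easy to demonstrate. In the sketch below, a 3 x 3 box-blur kernel is written as the outer product of two length-3 vectors, so two 1-D passes reproduce the full 2-D convolution using w x d = 6 instead of w^d = 9 multiplications per output pixel; the naive conv2d helper and the box-blur kernel are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# A separable kernel: the 3x3 box blur is the outer product of two vectors.
v = np.ones(3) / 3.0
kernel_2d = np.outer(v, v)                     # 3x3 kernel, w^d = 9 weights

x = rng.normal(size=(64, 64))

def conv2d_valid(x, k):
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * k)
    return out

# Direct 2-D convolution: O(w^d) multiplies per output pixel (9 here).
direct = conv2d_valid(x, kernel_2d)

# Separable version: two 1-D passes, O(w*d) multiplies per pixel (6 here).
rows = np.apply_along_axis(np.convolve, 1, x, v, mode="valid")
separable = np.apply_along_axis(np.convolve, 0, rows, v, mode="valid")

print(np.allclose(direct, separable))          # True
```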

RANDOM AND UNSUPERVISED FEATURES


To reduce the cost of convolutional network training, we can use features that are not trained in a
supervised way:
iii. Random Initialization:
 Layers consisting of convolution followed by pooling naturally become frequency selective
and translation invariant when assigned random weights.
 Randomly initialize several CNN architectures and just train the last classification layer.
 Once a winner is determined, train that model using a more expensive approach (supervised
approach).
 Hand-designed kernels: used to detect edges at a certain orientation or scale.
 Unsupervised training: unsupervised pre-training may offer a regularization effect. It may also allow
the training of larger CNNs because of the reduced computation cost.
iv. Greedy Layer-wise Pre-training
Instead of training the entire convolutional network at once, we can train a model on small patches, one
layer at a time:
 Train the first layer in isolation.
 Extract all features from the first layer only once.
 Once the first layer is trained, its output is stored and used as input for training the next
layer.
 We can train very large models and incur a high computational cost only at inference time.
12. NEUROSCIENTIFIC BASIS:

 Hubel and Wiesel studied the activity of neurons in a cat’s brain in response to visual stimuli.
Their work characterized many aspects of brain function.
In a simplified view, we have:
 The light entering the eye stimulates the retina. The image then passes through the optic nerve
and a region of the brain called the LGN (lateral geniculate nucleus).
 V1 (primary visual cortex): The image produced on the retina is transported to the V1 with
minimal processing.
The properties of V1 that have been replicated in CNNs are:
 The V1 response is localized spatially, i.e. the upper image stimulates the cells in the upper region
of V1 [localized kernel].
 V1 has simple cells whose activity is a linear function of the input in a small
neighborhood [convolution].
 V1 has complex cells whose activity is invariant to shifts in the position of the feature [pooling] as
well as some changes in lighting which cannot be captured by spatial pooling [cross-channel
pooling].



 There are several stages of V1 like operations [stacking convolutional layers].
 In the medial temporal lobe, we find grandmother cells. These cells respond to specific concepts and are invariant to several transforms of the input. Researchers in the medial temporal lobe also found neurons that spike for a particular concept, e.g. the Halle Berry neuron fires when looking at a photo or drawing of Halle Berry, or even when reading the text "Halle Berry". Of course, there are neurons which spike for other concepts, like Bill Clinton, Jennifer Aniston, etc.
 The medial temporal neurons are more general than CNN features in that they respond to a concept across modalities (image or text); a CNN would not generalize that way automatically. A closer match to the function of the last layers of a CNN is the IT (inferotemporal cortex). When viewing an object, information flows from the retina, through the LGN, V1, V2, and V4, and reaches IT; this happens within about 100 ms. When a person continues to look at an object, the brain sends top-down feedback signals that affect lower-level activations.
Some of the major differences between the human visual system (HVS) and the CNN model are:
 The human eye is low resolution except in a region called the fovea. Essentially, the eye does not receive the whole image at high resolution but stitches together several patches through eye movements called saccades. This attention-based gazing at the input image is an active research problem. Note: attention mechanisms have been shown to work on natural language tasks.
 Integration of several senses in the HVS while CNNs are only visual.
 The HVS processes rich 3D information, and can also determine relations between objects.
CNNs for such tasks are in their early stages.
 The feedback from higher levels to V1 has not been incorporated into CNNs with substantial
improvement.
 While the CNN can capture firing rates in the IT, the similarity between intermediate
computations is not established. The brain probably uses different activation and pooling
functions. Even the linearity of filter response is doubtful as recent models for V1 involve
quadratic filters.
 Neuroscience tells us very little about the training procedure. Back propagation, which is the standard training mechanism today, is not inspired by neuroscience and is sometimes considered biologically implausible.



2 MARKS QUESTIONS AND ANSWERS

1. What’s the difference between ELM and traditional neural networks?


Ans: ELM (Extreme Learning Machine) uses single-step training with randomly initialized hidden
weights, making it faster and simpler than traditional neural networks. Traditional neural
networks, in contrast, involve iterative training with repeated weight adjustments, which
leads to longer training times.
2. What is the ELM method in machine learning?
Ans: The ELM (Extreme Learning Machine) method is a fast and efficient way to learn from input
data. It randomly sets the hidden-layer parameters and computes the output weights in a single
step, making it fast and efficient compared with other machine learning methods. This simplicity
and fast learning make it suitable for various tasks like image recognition, speech recognition,
and natural language processing.
3. Is ELM supervised or unsupervised?
Ans: ELM (Extreme Learning Machine) is a supervised machine learning algorithm. ELM is
commonly used for regression and classification tasks, making it a supervised learning
technique.
4. Why is ELM algorithm used?
Ans: ELM (Extreme Learning Machine) is used for its fast learning, which makes it efficient for
real-time applications. It offers robust generalization, performing well on new data, and is easy to
implement. ELM's efficiency and adaptability make it valuable for tasks like image recognition,
speech analysis, and financial forecasting.
5. How does the ELM algorithm work?
Ans: ELM (Extreme Learning Machine) randomly initializes hidden layer weights, applies them to
input data, and calculates output weights using an analytical method in a single step. This fast
training method makes ELM computationally efficient and well-suited for various machine
learning tasks.
6. What is a Convolutional Neural Network (CNN)?
Ans: A Convolutional Neural Network (CNN) is a type of deep learning neural network that is well-
suited for image and video analysis. CNNs use a series of convolution and pooling layers to
extract features from images and videos, and then use these features to classify or detect objects
or scenes.



7. How do CNNs work?
Ans: CNNs work by applying a series of convolution and pooling layers to an input image or video.
Convolution layers extract features from the input by sliding a small filter, or kernel, over the
image or video and computing the dot product between the filter and the input. Pooling layers
then down sample the output of the convolution layers to reduce the dimensionality of the data
and make it more computationally efficient.
8. What are some common activation functions used in CNNs?
Ans: Some common activation functions used in CNNs include:
 Rectified Linear Unit (ReLU): ReLU is a non-saturating activation function that is
computationally efficient and easy to train.
 Leaky Rectified Linear Unit (Leaky ReLU): Leaky ReLU is a variant of ReLU that allows a
small, non-zero gradient for negative inputs. This can help to prevent neurons from dying
during training.
 Parametric Rectified Linear Unit (PReLU): PReLU is a generalization of Leaky ReLU that
allows the slope of the negative gradient to be learned.
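A minimal NumPy sketch of these three activations (the alpha values below are illustrative; in PReLU, alpha is a learned parameter):

```python
# Minimal NumPy sketch of ReLU, Leaky ReLU, and PReLU.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):          # fixed small negative slope
    return np.where(x > 0, x, alpha * x)

def prelu(x, alpha):                     # alpha is learned during training
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))                # [0.    0.    0.    1.5]
print(leaky_relu(x))          # [-0.02  -0.005  0.    1.5]
print(prelu(x, alpha=0.25))   # [-0.5   -0.125  0.    1.5]
```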
9. What is the purpose of using multiple convolution layers in a CNN?
Ans: Using multiple convolution layers in a CNN allows the network to learn increasingly complex
features from the input image or video. The first convolution layers learn simple features, such
as edges and corners. The deeper convolution layers learn more complex features, such as shapes
and objects.
10. What are some common regularization techniques used in CNNs?
Ans: Regularization techniques are used to prevent CNNs from over fitting the training data. Some
common regularization techniques used in CNNs include:
 Dropout: Dropout randomly drops out neurons from the network during training. This forces
the network to learn more robust features that are not dependent on any single neuron.
 L1 regularization: L1 regularization penalizes the absolute value of the weights in the
network. This encourages sparsity (many weights become exactly zero), which can make the
network more efficient.
 L2 regularization: L2 regularization penalizes the square of the weights in the network. This
keeps the weights small and helps the network generalize better.
11. What is the difference between a convolution layer and a pooling layer?
Ans: A convolution layer extracts features from an input image or video, while a pooling layer down
samples the output of the convolution layers. Convolution layers use a series of filters to extract



features, while pooling layers use a variety of techniques to down sample the data, such as max
pooling and average pooling.
12. Define Convolution.
Ans: Sliding a small filter (kernel) across the whole image and computing a dot product at each
position produces another image with a different width, height, and depth. Instead of just the
R, G, and B channels, we now have more channels but smaller width and height. This operation
is called convolution.
13. Define Stride.
Ans: During the forward pass, we slide each filter across the whole input volume step by step,
where the step size is called the stride (which can have a value of 2, 3, or even 4 for high-
dimensional images), and compute the dot product between the kernel weights and the patch
from the input volume (see the sketch below).
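A minimal NumPy sketch of stride (1-D for brevity; a 2-D version slides the same way along both axes):

```python
# Minimal sketch of stride: the filter jumps `stride` positions per step,
# so a larger stride yields a shorter (downsampled) output.
import numpy as np

def conv1d_strided(x, kernel, stride):
    k = len(kernel)
    # One dot product between the kernel and each strided patch.
    return np.array([np.dot(x[i:i + k], kernel)
                     for i in range(0, len(x) - k + 1, stride)])

x = np.arange(10.0)                     # input "volume" (1-D for brevity)
kernel = np.array([1.0, 0.0, -1.0])

print(conv1d_strided(x, kernel, stride=1))  # 8 outputs
print(conv1d_strided(x, kernel, stride=2))  # 4 outputs
```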
14. What is feed forward propagation?
Ans: Feeding the data into the model and obtaining the output of each layer is called feed-forward
propagation. We then calculate the error using an error function; some common error
functions are cross-entropy, squared loss, etc.
15. What is back propagation?
Ans: The error function measures how well the network is performing. After that, we back
propagate through the model by calculating the derivatives of the error with respect to the
weights. This step is called back propagation and is used to minimize the loss.
