
CST414 – Deep Learning

Module 1 – Neural Networks


Introduction to neural networks - Single layer perceptrons, Multi Layer
Perceptrons (MLPs), Representation Power of MLPs, Activation functions - Sigmoid, Tanh,
ReLU, Softmax, Risk minimization, Loss function, Training MLPs with back propagation,
Practical issues in neural network training - The Problem of Overfitting, Vanishing and
exploding gradient problems, Difficulties in convergence, Local and spurious Optima,
Computational Challenges. Applications of neural networks.

Reena Thomas, Asst. Prof., CSE dept., CEMP


1
Text Books
1. Goodfellow, I., Bengio, Y., and Courville, A., Deep Learning, MIT Press, 2016.
2. Aggarwal, Charu C., Neural Networks and Deep Learning, Springer, 2018.
3. Buduma, Nikhil and Locascio, Nicholas, Fundamentals of Deep Learning: Designing
Next-Generation Machine Intelligence Algorithms, 1st ed., O'Reilly Media, 2017.

Reference Books
1. Satish Kumar, Neural Networks: A Classroom Approach, Tata McGraw-Hill
Education, 2004.
2. Yegnanarayana, B., Artificial Neural Networks PHI Learning Pvt. Ltd, 2009.
3. Michael Nielsen, Neural Networks and Deep Learning, 2018

2
Artificial Intelligence (AI)
Refers to the development of computer systems capable of
performing tasks that typically require human intelligence.
These tasks include recognizing speech, making decisions,
solving problems, and identifying patterns.

Machine learning (ML)


It is a branch of AI that enables algorithms to uncover
hidden patterns within datasets, allowing them to make
predictions on new, similar data without explicit
programming for each task.

Deep Learning (DL)


It is a branch of machine learning that is based on artificial
neural network (ANN) architecture. ANN uses layers of
interconnected nodes called neurons that work together to
process and learn from the input data.

Convolutional Neural Network (CNN)


It is a type of ANN, specifically designed to process and
analyse data with grid-like structure, such as images or
sequential data. Eg: image recognition, object detection,
video analysis.
3
Biological Neurons
Artificial neural networks are popular machine learning techniques that simulate the mechanism of learning in
biological organisms. The human nervous system contains cells, which are referred to as neurons. The
foundational unit of the human brain is the neuron. A tiny piece of the brain, about the size of a grain of rice,
contains over 10,000 neurons, each of which forms an average of 6,000 connections with other neurons.

Dendrite : Branch-like extensions from the neuron’s cell body that


receive signals (electrical or chemical) from other neurons or sensory
stimuli and transmit them toward the soma.

Soma (Cell Body) : The main body of the neuron containing the nucleus
and other organelles. Integrates incoming signals from dendrites and
generates outgoing signals when necessary.

Nucleus : A membrane-bound structure within the soma that contains the neuron’s genetic material. Regulates
cellular activities, including the production of proteins essential for neuron function.

Axon : A long, slender projection extending from the neuron’s soma. Transmits electrical impulses away from the
cell body to other neurons, muscles, or glands.

4
Axon Terminal : The endpoint of an axon where it connects to other neurons or
target cells. Releases neurotransmitters into the synaptic cleft to communicate
with the next neuron or target cell.

The neurons are connected to one another with the use of axons and dendrites,
and the connecting regions between axons and dendrites are referred to as
Synapses.

Synaptic strength : refers to the varying weights of connections between


neurons in a network, which change based on the activities of the neurons at
both ends of the connection. This change is the fundamental process through
which learning occurs in living organisms.

• This biological mechanism is simulated in ANN’s, which contain computation units that are referred to as
neurons.
• The computational units are connected to one another through weights, which serve the same role as the
strengths of synaptic connections in biological organisms.
• After being weighted by the strength of their respective connections, the inputs are summed together in the
cell body.
• This sum is then transformed into a new signal that’s propagated along the cell’s axon and sent off to other
neurons.

5
• In 1943, Warren S. McCulloch and Walter H. Pitts introduced the concept of an
artificial neuron, inspired by biological neurons.
• An artificial neuron receives multiple inputs (x1, x2, ..., xn), each multiplied by a
corresponding weight (w1, w2, ..., wn).
• The weighted inputs are summed to calculate the logit (z), often with an added
bias term (b).
• The logit is then passed through an activation function (f) to produce the output
y=f(z), which can be sent to other neurons.

7
• In conclusion, ANN processes inputs by passing values from input neurons to output
neurons, using weights as key parameters.

• Learning occurs by changing the weights connecting the neurons.

• Just as external stimuli are needed for learning in biological organisms, the external
stimulus in ANN’s is provided by the training data containing examples of input-
output pairs of the function to be learned.

• For example, the training data might contain pixel representations of images (input)
and their annotated labels (e.g., carrot, banana) as the output.

• The network uses this data to improve predictions by comparing them to the correct
labels and adjusting weights based on errors.

• A good ANN can generalize, meaning it performs well on new, unseen data after
being trained. The true value of machine learning lies in this ability to generalize.

9
Advantages of Neural Networks

Neural networks have two key advantages over traditional machine learning:

• Higher-level abstraction of expressing semantic insights (meaningful patterns or


relationships within the data) about data domains by architectural design
choices in the computational graph.

• Neural networks provide a simple way to adjust the complexity of a model by


adding or removing neurons from the architecture according to the availability
of training data or computational power.

10
A Perceptron is an Artificial Neuron

• It is the simplest possible Neural Network and Neural Networks are the building
blocks of Machine Learning.
• In 1957, Frank Rosenblatt "invented" a Perceptron program, on an IBM 704
computer at Cornell Aeronautical Laboratory.
• Scientists discovered that neurons process sensory input, store information, and
make decisions using electrical signals.
• Inspired by this, Frank proposed perceptrons to simulate these brain functions and
enable learning and decision-making.

• The original Perceptron was designed to take a number of binary inputs, and
produce one binary output (0 or 1).
• The idea was to use different weights to represent the importance of each input,
and that the sum of the values should be greater than a threshold value before
making a decision like true or false (0 or 1).
11
The Perceptron Algorithm

Frank Rosenblatt suggested this algorithm:


1. Set a threshold value
2. Multiply all inputs by their weights
3. Sum all the results
4. Activate the output

Eg.: Imagine a perceptron in your brain deciding whether to go to a concert.


It considers factors like the artist's quality and the weather, with each factor
having a different weight in the decision.

12
Criteria Input Weight
Artist is Good x1 = 0 or 1 w1 = 0.7
Weather is Good x2 = 0 or 1 w2 = 0.6
Friend will Come x3 = 0 or 1 w3 = 0.5
Food is Served x4 = 0 or 1 w4 = 0.3
Alcohol is Served x5 = 0 or 1 w5 = 0.4

inputs(x1,x2,x3,x4,x5) = [1, 0, 1, 0, 1]
Weights(w1,w2,w3,w4,w5) = [0.7, 0.6, 0.5, 0.3, 0.4]

1. Threshold = 1.5

2. Multiply all inputs by their weights:
x1 * w1 = 1 * 0.7 = 0.7
x2 * w2 = 0 * 0.6 = 0
x3 * w3 = 1 * 0.5 = 0.5
x4 * w4 = 0 * 0.3 = 0
x5 * w5 = 1 * 0.4 = 0.4

3. Sum all the results (the weighted sum):
0.7 + 0 + 0.5 + 0 + 0.4 = 1.6

4. Activate the output:
Return true, since the sum 1.6 > 1.5 ("Yes I will go to the Concert")
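
The same four steps can be written as a short Python sketch, using the inputs, weights, and threshold from the example above:

    # Perceptron decision for the concert example (step activation against a threshold).
    inputs  = [1, 0, 1, 0, 1]                 # x1..x5
    weights = [0.7, 0.6, 0.5, 0.3, 0.4]       # w1..w5
    threshold = 1.5

    # Steps 2-3: multiply each input by its weight and sum the results.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))   # 1.6

    # Step 4: activate the output by comparing against the threshold.
    go_to_concert = weighted_sum > threshold
    print(weighted_sum, go_to_concert)        # 1.6 True -> "Yes I will go to the Concert"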
13
• If the weather weight is 0.6 for you, it might be different for someone else.
• A higher weight means that the weather is more important to them.
• If the threshold value is 1.5 for you, it might be different for someone else.
• A lower threshold means they are more eager to go to the concert.
• A Perceptron is often used to classify data into two parts.
• A Perceptron is also known as a Linear Binary Classifier.

14
AND Gate problem

Random weights are w1=0.9, w2=0.9


Threshold= 0.5
Round 1
1st instance to the perceptron x1=0, x2=0
Weighted sum=0
Since the sum 0 < 0.5, the output is 0.
The weights are not updated because there is no error in this case.
15
2nd instance to the perceptron x1=0,x2=1
Weighted sum=0.9
The activation unit returns 1, because 0.9 > 0.5.
The actual output for this instance should be 0.
This instance is not predicted correctly, so we update the weights based on the error.
ε = actual – prediction = 0 – 1 = -1
Learning rate = 0.5
We will add error times learning rate value to the weights
w1 = w1 + α * ε = 0.9 + 0.5 * (-1) = 0.9 – 0.5 = 0.4
w2 = w2 + α * ε = 0.9 + 0.5 * (-1) = 0.9 – 0.5 = 0.4

3rd instance.
x1 = 1 and x2 = 0.
Sum unit: Σ = x1 * w1 + x2 * w2 = 1 * 0.4 + 0 * 0.4 = 0.4
The activation unit will return 0, because the output of the sum unit is 0.4, which is less than the threshold 0.5.
We will not update weights.
16
4th instance.
x1 = 1 and x2 = 1.
sum unit: Σ = x1 * w1 + x2 * w2 = 1 * 0.4 + 1 * 0.4 = 0.8
Activation unit will return 1 because output of the sum unit is 0.8 and it is greater than
the threshold value 0.5.
Its actual value should be 1 as well. This means that the 4th instance is predicted correctly.
We will not update anything.
Round 2
1st instance.
x1 = 0 and x2 = 0.
Sum unit: Σ = x1 * w1 + x2 * w2 = 0 * 0.4 + 0 * 0.4 = 0
The activation unit will return 0 because the sum is 0, which is less than the threshold
value 0.5.
The output of the 1st instance should be 0 as well. This means that the instance is
classified correctly.

17
2nd instance
x1 = 0 and x2 = 1.
Sum unit: Σ = x1 * w1 + x2 * w2 = 0 * 0.4 + 1 * 0.4 = 0.4
Activation unit will return 0 because sum unit is less than the threshold 0.5.
Its output should be 0 as well.
This means that it is classified correctly and we will not update weights.

The 3rd and 4th instances were already evaluated with the current weight values in the
previous round and were classified correctly, so no further updates are needed.
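
The two rounds above can be reproduced with a short Python sketch of this training loop. Note that it follows the update rule used in the walkthrough (every weight is nudged by learning rate times error), which is a slight simplification of the textbook rule that also multiplies by the corresponding input:

    # Perceptron training on the AND gate, following the walkthrough above.
    data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]   # (x1, x2) -> x1 AND x2
    w1, w2 = 0.9, 0.9
    threshold, lr = 0.5, 0.5

    for rnd in range(1, 3):                       # Round 1 and Round 2, as above
        for (x1, x2), actual in data:
            s = x1 * w1 + x2 * w2                 # weighted sum
            prediction = 1 if s > threshold else 0
            error = actual - prediction           # epsilon = actual - prediction
            if error != 0:                        # update weights only when wrong
                w1 += lr * error
                w2 += lr * error
            print(f"round {rnd} x=({x1},{x2}) sum={s:.1f} "
                  f"pred={prediction} w=({w1:.1f},{w2:.1f})")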

18
Single-Layer Perceptron
• A single-layer perceptron is the simplest type of neural network that performs
binary classification. It is also simply called the perceptron.
• It consists of:
• Inputs, each with its corresponding weight.
• A bias added to the weighted sum.
• A step activation function to determine the output.
• It adjusts weights and bias using a learning rule during training.
• It works only for linearly separable data, meaning data that can be separated by
a straight line.

19
This neural network contains a single input layer and an output node.

20
Each training instance is in the form of (X, y):
• Features: X = [x1, …, xd], a set of d variables.
• Class Label: y∈{−1,+1}, the observed binary outcome.

The goal is to predict y for new instances where it is unknown.


• The input layer has d nodes, each transmitting a feature.
• Each feature is connected to the output node with a weight W=[w1,…,wd].
• The output node computes a linear function to make a prediction.

21
The sign function is used to predict the class (+1 or -1) based on the output of the
perceptron.

• If the calculated value is positive or zero, the sign function gives +1.
• If the calculated value is negative, it gives −1.

The error is the difference between the actual class (y) and the predicted class (ŷ):
E(X) = y − ŷ
The perceptron has two layers:


• Input layer: Passes the features to the next layer but doesn’t do any calculation.
• Output layer: Computes the final result based on the input and weights.

Even though there are two layers, the input layer is not counted as a computational
layer, so the perceptron is called a single-layer network.

22
The bias is an additional variable that accounts for the invariant part of the prediction.
It helps adjust the decision boundary, especially in cases of imbalanced binary class
distribution, where one class is more frequent than the other.

• The perceptron algorithm is heuristically designed to minimize the number of


misclassifications, and convergence proofs were available that provided correctness
guarantees of the learning algorithm.
• The goal of the perceptron algorithm, in least-squares form with respect to all training
instances in a data set D containing feature–label pairs (X, y), is to minimize
L = Σ (y − ŷ)² over all (X, y) in D.
• This type of minimization objective function is also referred to as a loss function.
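
A minimal numpy sketch of this prediction and least-squares loss; the small data set and weight vector below are invented for illustration:

    import numpy as np

    def predict(X, W):
        """Perceptron prediction: y_hat = sign(W . X), with sign(0) treated as +1."""
        return np.where(X @ W >= 0, 1, -1)

    # Hypothetical data set D: rows of X are feature vectors, y holds labels in {-1, +1}.
    X = np.array([[1.0, 2.0], [2.0, -1.0], [-1.0, -2.0], [-2.0, 1.0]])
    y = np.array([+1, +1, -1, -1])
    W = np.array([0.5, 0.5])              # assumed weight vector

    y_hat = predict(X, W)
    loss = np.sum((y - y_hat) ** 2)       # least-squares form of the perceptron objective
    print(y_hat, loss)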

23
Where does the perceptron fail

The perceptron fails in situations similar to a linear SVM:


• Inability to learn and represent non-linear data (i.e., inability to handle nonlinear
decision boundaries).

24
Multilayer Perceptron (MLP)
• Feedforward Neural Networks (FNNs), also known as Deep feedforward
networks or Multilayer Perceptrons (MLPs), are a type of neural network that
defines a mapping y=f(x;θ).
• Here, θ represents the learnable parameters (weights and biases).
• The network learns these parameters to approximate a function f∗(x), which
maps inputs x to outputs y.
• No Feedback Connections: MLPs have no feedback loops or connections in
which outputs of the model are fed back into itself.
• MLPs are referred to as feed-forward networks because successive layers feed
into one another in the forward direction from input to output.
• When feedforward neural networks are extended to include feedback
connections, they are called Recurrent Neural Networks.

25
Layers:
1. Input Layer: Accepts raw data x and transmits it to the next layer.
2. Hidden Layers: Perform computations to extract meaningful features from the
input. These layers are hidden because their outputs are not directly visible.
3. Output Layer: Produces the final predictions y.

Each layer computes a function and passes its output to the next layer. For example, a
three-layer network computes f(x) = f(3)(f(2)(f(1)(x))).
• The length of this chain (number of layers) defines the depth of the network.
• The number of neurons of the hidden layers defines the width of the model.
• A multilayer network evaluates compositions of functions:
• For a path of length 2, where g(⋅) is computed in layer m and f(⋅) in layer m+1, the
computation is f(g(x)).

• Non-linear activation functions like ReLU or Sigmoid are crucial to the network's
ability to model complex relationships.
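
A small numpy sketch of this layer-wise composition; the layer sizes, random weights, and the choice of ReLU are assumptions for illustration:

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    def layer(x, W, b, activation=relu):
        """One layer: affine transform followed by a nonlinear activation."""
        return activation(W @ x + b)

    rng = np.random.default_rng(0)
    x = rng.normal(size=4)                      # 4-dimensional input

    # A depth-3 chain f(x) = f3(f2(f1(x))): two hidden layers of width 5, scalar output.
    W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)
    W2, b2 = rng.normal(size=(5, 5)), np.zeros(5)
    W3, b3 = rng.normal(size=(1, 5)), np.zeros(1)

    h1 = layer(x, W1, b1)                       # layer m computes g(.)
    h2 = layer(h1, W2, b2)                      # layer m+1 computes f(g(.))
    y = W3 @ h2 + b3                            # linear output layer
    print(y)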
26
Training the Network
• Goal: Adjust the parameters θ to minimize the difference between f(x) (model
prediction) and f∗(x) (true function).
• Process:
• Loss Function: Measures the error between predictions and actual values.
• Backpropagation: Uses dynamic programming to compute gradients of the
loss with respect to all parameters.
• Optimization Algorithm: Updates parameters using techniques like Stochastic
Gradient Descent (SGD) or Adam.
• Training data consists of input-output pairs (x, y), where y≈f∗(x).

• The strategy of deep learning can be expressed as y = f(x; θ, w) = ϕ(x; θ)ᵀw, where:
• ϕ(x;θ): The feature representation learned by the network.


• w: Parameters that map this representation to the output.
• The optimization algorithm adjusts θ to find the best ϕ(x) for predicting y.
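
As a sketch of one optimization step, a single stochastic gradient descent update on a squared loss might look like the following; the linear model and data here are hypothetical stand-ins for f(x; θ):

    import numpy as np

    # Hypothetical linear model y_hat = theta . x with squared loss (y - y_hat)^2.
    theta = np.zeros(3)                       # parameters to be learned
    lr = 0.1                                  # learning rate (step size)

    x = np.array([1.0, 2.0, -1.0])            # one training input
    y = 2.0                                   # its target output

    y_hat = theta @ x                         # forward pass / prediction
    grad = 2 * (y_hat - y) * x                # gradient of (y - y_hat)^2 w.r.t. theta
    theta -= lr * grad                        # SGD update: theta <- theta - lr * grad
    print(theta)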

27
Summary of Characteristics of MLPs

• Feedforward Flow: No feedback connections—data flows from input to output.


• Layers: Input, hidden, and output layers, each performing specific functions.
• Compositional Functions: Depth comes from the composition of multiple functions.
• Hidden Layers: Extract hierarchical representations and increase model capacity.
• Automatic Training: Parameters are learned through backpropagation and
optimization.

28
Representation Power of MLP’s
• It is the ability of a neural network to classify data correctly by creating accurate
decision boundaries for different classes.
• A neural network achieves this by combining simpler functions to form complex
ones through layers of computation.

• Why depth matters ?


• The power of deep learning comes from repeatedly applying nonlinear
transformations across layers. This increases the network's expressive
power, allowing it to model complex patterns with fewer parameters.
• Nonlinearity is Essential
• Linear Activations: Cannot handle non-linearly separable data as the output
remains a linear combination of inputs.
• Nonlinear Activations: Enable nonlinear transformations (e.g., with ReLU),
making data linearly separable in higher-dimensional spaces.

29
The power of non linear activation functions in transforming a data set to linear separability.

30
• Suppose the hidden layer uses ReLU activation and learns two new features, h1 and
h2, based on the input data.
• The hidden layer creates a new representation of the data, such that:
• Class ‘*’ has points like (1,0) and (0,1).
• Class ‘+’ has points like (0,0).
• These points are now linearly separable in the hidden layer.

Importance of Nonlinear Activation


The ReLU function "thresholds" negative values to 0, enabling the network to create
new, useful features.
For example:
• The output O=h1+h2 (with weights set to 1 and a linear activation) can separate
two classes:
For class ‘*’, O=1
For class ‘+’, O=0
This shows how much of a network’s power lies in its activation functions.
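
One way to realise this hidden representation is with the weights h1 = ReLU(x1 − x2) and h2 = ReLU(x2 − x1); these particular weights and the XOR-like input points are assumptions chosen to reproduce the mapping described above:

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    # Inputs: (0,1) and (1,0) belong to class '*'; (0,0) and (1,1) belong to class '+'.
    X = np.array([[0, 1], [1, 0], [0, 0], [1, 1]], dtype=float)

    # Assumed hidden weights: h1 = ReLU(x1 - x2), h2 = ReLU(x2 - x1).
    W_hidden = np.array([[1, -1],
                         [-1, 1]], dtype=float)

    H = relu(X @ W_hidden.T)        # hidden representation of each point
    O = H @ np.array([1.0, 1.0])    # output O = h1 + h2 (linear activation, weights 1)

    print(H)   # '*' points map to (0,1) and (1,0); '+' points map to (0,0)
    print(O)   # O = 1 for class '*', O = 0 for class '+'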

31
Role of Activation Functions

• Activation functions (e.g., ReLU, sigmoid, tanh) allow the network to perform
nonlinear transformations of data.
• They make complex patterns linearly separable in higher dimensions.
• As the number of layers (depth) increases, the network's power to learn and
represent complex patterns also increases.

32
Activation Functions
Why do we need Activation functions in neural networks ?

• In the output layer, an activation function Φ(v) can determine the nature of the output,
such as constraining it to a specific range (e.g., producing a probability value in [0,1]).

• In multilayer neural networks, activation functions introduce non-linearity into the


hidden layers, which is essential for increasing the model's complexity and enabling it to
learn more sophisticated patterns. Without non-linear activation functions, a neural
network—regardless of the number of layers—would be equivalent to a single-layer
network because linear transformations alone cannot capture complex relationships.

• The activation functions used during inference might differ from those employed in the
loss functions during training. For instance, a perceptron uses the sign function
Φ(v)=sign(v) for making predictions, but it does not rely on any activation function
while computing the perceptron criterion during training.
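
For reference, the standard definitions of these activation functions can be sketched in numpy as follows:

    import numpy as np

    def sigmoid(v):
        """Sigmoid: squashes v into (0, 1); useful for probability-like outputs."""
        return 1.0 / (1.0 + np.exp(-v))

    def tanh(v):
        """Tanh: squashes v into (-1, 1); a zero-centred version of the sigmoid."""
        return np.tanh(v)

    def relu(v):
        """ReLU: passes positive values through and thresholds negatives to 0."""
        return np.maximum(0.0, v)

    def softmax(v):
        """Softmax: turns a vector of logits into a probability distribution."""
        e = np.exp(v - np.max(v))        # subtract the max for numerical stability
        return e / np.sum(e)

    v = np.array([-2.0, 0.0, 3.0])
    print(sigmoid(v), tanh(v), relu(v), softmax(v), sep="\n")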
33
Identity Activation Function

• It is the simplest activation function, where the output is the same as the input.
Mathematically, it is expressed as: Φ(v)=v
• For a single-layer network with training pair (X, y), the output is ŷ = Φ(W · X) = W · X.

34
Loss Function
• The choice of the loss function is critical in defining the outputs in a way that is
sensitive to the application at hand.
• For example, least-squares regression with numeric outputs requires a simple
squared loss of the form (y − ŷ)² for a single training instance with target y and
prediction ŷ.
• Other types of loss are also used, such as the hinge loss for y ∈ {−1, +1} and real-valued
prediction ŷ (with identity activation): L = max{0, 1 − y · ŷ}
• The hinge loss can be used to implement a learning method, which is referred to as a
support vector machine.

• For probabilistic predictions, two different types of loss functions are used,
depending on whether the prediction is binary or whether it is multi-way:
• Binary targets (Logistic regression)
• Categorical targets
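
A short numpy sketch of these loss functions for a single training instance; the example targets and predictions are illustrative:

    import numpy as np

    def squared_loss(y, y_hat):
        """Least-squares regression loss for a numeric target."""
        return (y - y_hat) ** 2

    def hinge_loss(y, y_hat):
        """Hinge loss for y in {-1, +1} and a real-valued prediction (SVM-style)."""
        return max(0.0, 1.0 - y * y_hat)

    def binary_cross_entropy(y, p):
        """Loss for a binary target y in {0, 1} with predicted probability p (logistic regression)."""
        return -(y * np.log(p) + (1 - y) * np.log(1 - p))

    def categorical_cross_entropy(y_onehot, p):
        """Loss for a categorical target given a predicted probability vector p (e.g. from softmax)."""
        return -np.sum(y_onehot * np.log(p))

    print(squared_loss(3.0, 2.5))
    print(hinge_loss(+1, 0.4))
    print(binary_cross_entropy(1, 0.8))
    print(categorical_cross_entropy(np.array([0, 1, 0]), np.array([0.2, 0.7, 0.1])))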

41
Training MLPs with back propagation

46
• In single-layer neural networks, training is straightforward because the loss
function is a direct function of the weights.
• In multi-layer neural networks, training is more complex because the loss function
is a composition of functions across multiple layers, making its gradients difficult to compute directly.
• To address this, the backpropagation algorithm is used, which calculates the error
gradients efficiently.

The Backpropagation Algorithm


• It relies on the chain rule of calculus to compute error gradients as a sum of local-
gradient products across paths from a node to the output.
• The chain rule allows us to compute the derivative of a composite function. If a
function y depends on u, and u depends on x, the derivative of y with respect to x
is calculated as:
dy/dx = (dy/du) · (du/dx)
• Backpropagation is essentially an application of dynamic programming (Breaking


complex problems into smaller sub-problems).
47
Two Phases of Backpropagation

1. Forward Phase:
• Inputs are fed into the network, and computations are performed layer by
layer using the current weights.
• The final output is compared with the expected value, and the derivative
of the loss function with respect to the output is calculated.

2. Backward Phase:
• The gradients of the loss function with respect to each weight are
computed using the chain rule, working backward from the output layer
to the input layer.
• These gradients are used to update the weights.
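
A minimal sketch of the two phases on a one-hidden-layer network with sigmoid units and squared loss (numpy); the network size, training pair, and learning rate are assumptions for illustration:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    x, y = np.array([0.5, -1.0]), 1.0               # one training pair (assumed)
    W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # hidden layer (3 units)
    W2, b2 = rng.normal(size=3), 0.0                # output layer (1 unit)
    lr = 0.1

    # Forward phase: compute activations layer by layer with the current weights.
    h = sigmoid(W1 @ x + b1)
    y_hat = sigmoid(W2 @ h + b2)
    loss = 0.5 * (y - y_hat) ** 2

    # Backward phase: apply the chain rule from the output back toward the input layer.
    d_yhat = -(y - y_hat)                       # dL/dy_hat
    d_z2 = d_yhat * y_hat * (1 - y_hat)         # through the output sigmoid
    dW2, db2 = d_z2 * h, d_z2                   # gradients for the output layer
    d_h = d_z2 * W2                             # propagate to the hidden activations
    d_z1 = d_h * h * (1 - h)                    # through the hidden sigmoids
    dW1, db1 = np.outer(d_z1, x), d_z1          # gradients for the hidden layer

    # Gradient-descent update of all weights.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    print(loss)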

48
Practical issues in neural network training
Training a neural network involves several practical challenges that can significantly
impact model performance.

1. The Problem of Overfitting and Underfitting


When training a neural network, the goal is to develop a model that performs well on
both training data and unseen test data. However, two common problems that arise
are: overfitting and underfitting.

71
1.1 Overfitting

• Overfitting occurs when a model learns the training data too well, capturing noise
and random fluctuations instead of just the underlying patterns.
• This results in high accuracy on training data but poor performance on test data.

Key Characteristics:
• High training accuracy but low test accuracy
• The model is too complex (too many parameters) for the given dataset
• The model memorizes the data instead of generalizing

Example:
Imagine a model trained to recognize cats and dogs. If it memorizes every detail
(such as the exact lighting or background in each training image), it may fail to
recognize cats and dogs in new images with different backgrounds.
Solutions:

• Regularization (L1/L2) – Penalize large weights to prevent excessive complexity


• Dropout – Randomly deactivate neurons during training to reduce reliance on
specific features
• Early Stopping – Stop training when validation loss starts increasing
• More Training Data – More diverse data helps prevent memorization
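
As an illustration of two of these remedies, the sketch below adds an L2 (weight-decay) penalty to a toy regression problem and stops training early when the validation loss stops improving; the data, penalty strength, and patience value are assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy regression data split into training and validation sets (illustrative only).
    X = rng.normal(size=(60, 5))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + 0.1 * rng.normal(size=60)
    X_tr, y_tr, X_val, y_val = X[:40], y[:40], X[40:], y[40:]

    w = np.zeros(5)
    lr, lam, patience = 0.01, 0.1, 5           # lam is the L2 (weight-decay) strength
    best_val, wait = np.inf, 0

    for epoch in range(500):
        # L2 regularization: the penalty lam * ||w||^2 adds 2 * lam * w to the gradient.
        grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr) + 2 * lam * w
        w -= lr * grad

        val_loss = np.mean((X_val @ w - y_val) ** 2)
        if val_loss < best_val:                # validation loss still improving
            best_val, wait = val_loss, 0
        else:                                  # early stopping: halt once it stops improving
            wait += 1
            if wait >= patience:
                print("early stop at epoch", epoch)
                break

    print(best_val)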

73
1.2 Underfitting

Underfitting occurs when a model is too simple to capture the underlying structure of
the data, leading to poor performance on both training and test data.

Key Characteristics:
• Low training accuracy and low test accuracy
• The model fails to capture key patterns in the data
• Can happen if the model has too few parameters or lacks enough training

Example: If we train a neural network with only one hidden layer and very few
neurons to classify handwritten digits, it might not learn enough detail to distinguish
between different numbers.

Solutions:
• Increase Model Complexity – Add more layers/neurons to capture patterns
• Train Longer – Ensure the model has had enough training epochs
• Feature Engineering – Provide better input features for learning
74
2. Vanishing Gradient Problem

• Gradients shrink too much as they propagate backward.


• Early layers learn very slowly or not at all.
• The model struggles to improve.

Cause:
• The chain rule of backpropagation involves multiplying many small values, causing
gradients to decay exponentially.
• Sigmoid and tanh activations worsen this: the sigmoid's derivative is at most 0.25 (and
the tanh's at most 1), so repeatedly multiplying these factors shrinks the gradient.

Example: Imagine passing a message through many people, and each person whispers
softer. Eventually, the message disappears!

Solutions:
• Use ReLU activation (avoids small derivatives for positive values).
• Apply batch normalization to stabilize activations.
• Use proper weight initialization (Xavier/He initialization).
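
The decay can be seen numerically: backpropagation multiplies one activation-derivative factor per layer, and the sigmoid derivative never exceeds 0.25. A tiny sketch (the depth of 20 layers and the evaluation points are assumptions; weight factors are ignored for simplicity):

    import numpy as np

    def sigmoid_derivative(z):
        s = 1.0 / (1.0 + np.exp(-z))
        return s * (1.0 - s)          # maximum value 0.25, reached at z = 0

    def relu_derivative(z):
        return (z > 0).astype(float)  # exactly 1 for positive inputs

    depth = 20
    z = 0.0                           # evaluate where the sigmoid derivative is largest

    # Product of per-layer derivative factors.
    print("sigmoid:", sigmoid_derivative(z) ** depth)            # ~9e-13: gradient vanishes
    print("relu:   ", relu_derivative(np.array(1.0)) ** depth)   # 1.0: no decay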
76
3. Exploding Gradient Problem

• Gradients become too large, leading to unstable updates.


• The model’s weights oscillate or diverge instead of converging.

Cause:
• If derivatives are greater than 1, multiplying them amplifies gradients
exponentially.
• This happens in deep networks with large weight values.

Example: Imagine a microphone picking up its own sound and amplifying it into loud
feedback noise!

Solutions:
• Gradient clipping (limits gradient values to prevent explosion).
• Use adaptive learning rates (optimizers like Adam, RMSprop).
• Batch normalization (keeps values in a stable range).
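
A minimal sketch of gradient clipping by global norm (numpy); the gradient values and the clipping threshold are illustrative:

    import numpy as np

    def clip_by_global_norm(grads, max_norm):
        """Rescale a list of gradient arrays so their combined L2 norm is at most max_norm."""
        total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
        if total_norm > max_norm:
            scale = max_norm / total_norm
            grads = [g * scale for g in grads]
        return grads, total_norm

    # Illustrative "exploded" gradients for two parameter tensors.
    grads = [np.array([30.0, -40.0]), np.array([[5.0, 12.0]])]
    clipped, norm = clip_by_global_norm(grads, max_norm=5.0)
    print(norm)       # global norm before clipping (about 51.7 here)
    print(clipped)    # same direction, rescaled so the global norm is 5.0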
77
4. Difficulties in convergence
Convergence refers to a neural network gradually reaching an optimal state where
the loss is minimized. However, several factors can make this process slow or
unstable.

• Poor Learning Rate Selection


Too High: The model overshoots the optimal point, leading to instability.
Too Low: Training is too slow and may get stuck.
• Vanishing & Exploding Gradients
Small gradients → Slow learning (vanishing gradient).
Large gradients → Unstable updates (exploding gradient).
• Poor Weight Initialization: Badly initialized weights can cause slow convergence or
exploding gradients.
• Local and Spurious Optima: The model can get stuck in suboptimal solutions
instead of finding the best one.
• Computational Limitations: Deep networks require high memory and processing
power; without them, training slows down considerably.
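
The learning-rate issue can be illustrated with gradient descent on the simple quadratic loss L(w) = w², whose gradient is 2w; the step sizes and iteration count below are assumptions for illustration:

    # Gradient descent on L(w) = w^2 with different learning rates.
    def run(lr, steps=10, w0=1.0):
        w = w0
        for _ in range(steps):
            w -= lr * 2 * w           # update rule: w <- w - lr * dL/dw
        return w

    print(run(lr=0.1))    # converges smoothly toward the minimum at w = 0
    print(run(lr=0.9))    # oscillates around the minimum but still shrinks
    print(run(lr=1.1))    # overshoots: |w| grows each step and training diverges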
78
Applications of Neural Networks
1. Facial Recognition

• Neural networks, particularly Convolutional Neural Networks (CNNs), are used for
facial recognition tasks.

• These models are trained to identify or verify a person’s identity based on facial
features.

• The network learns to recognize distinct features of a face, such as the eyes, nose,
and mouth, and can match these features with stored images in a database. It’s
used in security systems, phone unlocking, and surveillance.

• They are used in offices for selective entries. The systems thus authenticate a
human face and match it up with the list of IDs that are present in its database.

84
2. Stock Market Prediction

• Investments are subject to market risks.

• Multi-Layer Perceptrons (MLPs) are used to predict stock price movements by


analyzing historical data like price, volume, market sentiment, annual returns, and
profit ratios.

• MLPs consist of multiple layers of fully connected nodes, each learning patterns and
trends from past performances. This helps investors make informed decisions about
buying, selling, or holding stocks.

85
3. Social Media

• Neural networks are used in social media platforms for various tasks like
recommendation systems, sentiment analysis, and content moderation.

• Neural networks analyze user behavior (likes, comments, shares) and content
preferences to provide personalized recommendations.

• They also detect harmful or inappropriate content (e.g., hate speech, explicit
images) through natural language processing (NLP) and image classification
models.

86
4. Aerospace & Defense

• In aerospace and defense, neural networks are used for target detection,
navigation, radar signal processing, and autonomous vehicles.

• Neural networks can analyze radar signals or images to detect enemy targets or
obstacles.

• They are also used in autonomous drones or unmanned aerial vehicles (UAVs) to
navigate and make decisions based on real-time data.

87
5. Healthcare

• Neural networks are applied in medical image analysis, diagnostic systems,


and drug discovery.

• CNNs are used to analyze medical images like X-rays, MRIs, or CT scans to
detect diseases like cancer, tumors, or other anomalies.

• Neural networks also help in predicting patient outcomes, personalizing


treatments, and even discovering new drugs by analyzing biological data.

88
6. Signature Verification and Handwriting Analysis

• Neural networks are used for signature verification and handwriting recognition
in applications such as fraud detection and document authentication.

• The network learns the unique features of a person’s handwriting or signature.

• By comparing new inputs to stored signatures, the model can determine


authenticity or match the handwriting to the correct individual.

• In handwriting recognition, neural networks are trained to transcribe handwritten


text into digital format.

89
7. Weather Forecasting

• Neural networks are used to predict weather patterns, including temperature,


rainfall, and storms.

• They analyze historical weather data (such as temperature, pressure, and


humidity) and learn patterns over time to forecast future weather conditions.

• They can also improve the accuracy of weather models by handling complex,
non-linear relationships in the data.

In summary, Neural networks are widely applied across diverse fields. In each
case, they learn from vast amounts of data, detect patterns, and make
predictions or decisions. Their ability to handle complex and unstructured data
makes them especially valuable in tasks like recognition, prediction, analysis, and
automation.
90
Course Level Assessment Questions

91
1. Suppose you have a 3-dimensional input x = (x1, x2, x3) = (2, 2, 1) fully connected to 1
neuron which is in the hidden layer with activation function sigmoid. Calculate the
output of the hidden layer neuron.

92
2. Design a single layer perceptron to compute the NAND (not-AND) function. This
function receives two binary-valued inputs x1 and x2, and returns 0 if both inputs are 1,
and returns 1 otherwise.

93
3. Suppose we have a fully connected, feed-forward network with no hidden layer, and 5
input units connected directly to 3 output units. Briefly explain why adding a hidden
layer with 8 linear units does not make the network any more powerful.

4. Briefly explain one thing you would use a validation set for, and why you can’t just do
it using the test set.

5. Give a method to fight vanishing gradients in fully-connected neural networks.


Assume we are using a network with Sigmoid activations trained using SGD.

6. You would like to train a fully-connected neural network with 5 hidden layers, each
with 10 hidden units. The input is 20-dimensional and the output is a scalar. What is the
total number of trainable parameters in your network?
1. Discuss the limitation of a single layer perceptron with an example.
2. List the advantages and disadvantages of sigmoid and ReLU activation functions
3.

95
Previous Year Questions

97
• How does neural network solve the XOR problem?
• Describe Perceptron and its components.
• Explain the practical issues in neural network training
• Define universal approximation theorem.
• Discuss methods to prevent overfitting in neural networks.
• A 3-dimensional input X = (X1, X2, X3) = (1, 2, 1) is fully connected to 1 neuron
which is in the hidden layer with binary sigmoid activation function. Calculate the
output of the hidden layer neuron. Assume associated weights. Neglect bias term.
• Draw the architecture of a multi-layer perceptron. Derive update rules for
parameters in the multi-layer neural network through the gradient descent.

98
• Describe various activation functions used in neural networks.
• Calculate the output of the following neuron Y if the activation function is a
bipolar sigmoid.

• Explain the importance of choosing the right step size in neural networks.
• Discuss the disadvantages of single layer perceptrons with an example.

99
• List any three applications of neural network.
• Explain different activations functions and their derivatives used in neural networks
with the help of graphical representation.
• Implement the back propagation algorithm to train Multi Layer Perceptron using
tanh activation function.

100
