Module 1
Reference Books
1. Satish Kumar, Neural Networks: A Classroom Approach, Tata McGraw-Hill Education, 2004.
2. Yegnanarayana, B., Artificial Neural Networks, PHI Learning Pvt. Ltd., 2009.
3. Michael Nielsen, Neural Networks and Deep Learning, 2018.
Artificial Intelligence (AI)
Refers to the development of computer systems capable of
performing tasks that typically require human intelligence.
These tasks include recognizing speech, making decisions,
solving problems, and identifying patterns.
Soma (Cell Body) : The main body of the neuron containing the nucleus
and other organelles. Integrates incoming signals from dendrites and
generates outgoing signals when necessary.
Nucleus : A membrane-bound structure within the soma that contains the neuron’s genetic material. Regulates
cellular activities, including the production of proteins essential for neuron function.
Axon : A long, slender projection extending from the neuron’s soma. Transmits electrical impulses away from the
cell body to other neurons, muscles, or glands.
Axon Terminal : The endpoint of an axon where it connects to other neurons or
target cells. Releases neurotransmitters into the synaptic cleft to communicate
with the next neuron or target cell.
The neurons are connected to one another with the use of axons and dendrites,
and the connecting regions between axons and dendrites are referred to as
Synapses.
• This biological mechanism is simulated in ANNs, which contain computational units that are referred to as neurons.
• The computational units are connected to one another through weights, which serve the same role as the
strengths of synaptic connections in biological organisms.
• After being weighted by the strength of their respective connections, the inputs are summed together in the
cell body.
• This sum is then transformed into a new signal that’s propagated along the cell’s axon and sent off to other
neurons.
• In 1943, Warren S. McCulloch and Walter H. Pitts introduced the concept of an
artificial neuron, inspired by biological neurons.
• An artificial neuron receives multiple inputs (x1, x2, ..., xn), each multiplied by a
corresponding weight (w1, w2, ..., wn).
• The weighted inputs are summed to calculate the logit (z), often with an added
bias term (b).
• The logit is then passed through an activation function (f) to produce the output
y=f(z), which can be sent to other neurons.
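The computation of a single artificial neuron can be sketched as follows; the input values, weights, bias, and the choice of a sigmoid activation below are illustrative assumptions, not values given in the slides.

```python
import numpy as np

def sigmoid(z):
    # One common choice for the activation function f
    return 1.0 / (1.0 + np.exp(-z))

def artificial_neuron(x, w, b):
    # Weighted sum of the inputs plus the bias gives the logit z
    z = np.dot(w, x) + b
    # The activation function transforms the logit into the output y = f(z)
    return sigmoid(z)

# Assumed example: 3 inputs with arbitrary weights and bias
x = np.array([1.0, 0.5, -0.2])
w = np.array([0.4, 0.3, 0.9])
b = 0.1
print(artificial_neuron(x, w, b))  # a value in (0, 1)
```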
• In conclusion, an ANN processes inputs by passing values from input neurons to output
neurons, using weights as key parameters.
• Just as external stimuli are needed for learning in biological organisms, the external stimulus in ANNs is provided by the training data, which contains examples of input-output pairs of the function to be learned.
• For example, the training data might contain pixel representations of images (input)
and their annotated labels (e.g., carrot, banana) as the output.
• The network uses this data to improve predictions by comparing them to the correct
labels and adjusting weights based on errors.
• A good ANN can generalize, meaning it performs well on new, unseen data after
being trained. The true value of machine learning lies in this ability to generalize.
Advantages of Neural Networks
Neural networks have two key advantages over traditional machine learning:
A Perceptron is an Artificial Neuron
• It is the simplest possible Neural Network and Neural Networks are the building
blocks of Machine Learning.
• In 1957, Frank Rosenblatt "invented" a Perceptron program, on an IBM 704
computer at Cornell Aeronautical Laboratory.
• Scientists discovered that neurons process sensory input, store information, and
make decisions using electrical signals.
• Inspired by this, Rosenblatt proposed the perceptron to simulate these brain functions and enable learning and decision-making.
• The original Perceptron was designed to take a number of binary inputs, and
produce one binary output (0 or 1).
• The idea was to use different weights to represent the importance of each input,
and that the sum of the values should be greater than a threshold value before
making a decision like true or false (0 or 1).
The Perceptron Algorithm
Criteria            Input           Weight
Artist is Good      x1 = 0 or 1     w1 = 0.7
Weather is Good     x2 = 0 or 1     w2 = 0.6
Friend will Come    x3 = 0 or 1     w3 = 0.5
Food is Served      x4 = 0 or 1     w4 = 0.3
Alcohol is Served   x5 = 0 or 1     w5 = 0.4

Inputs (x1, x2, x3, x4, x5) = [1, 0, 1, 0, 1]
Weights (w1, w2, w3, w4, w5) = [0.7, 0.6, 0.5, 0.3, 0.4]
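A minimal sketch of the perceptron decision for this example; the threshold value of 1.5 used below is an assumption for illustration, since it is not stated on the slide.

```python
# Perceptron decision for the example above.
# The threshold (1.5) is an assumed value for illustration.
inputs  = [1, 0, 1, 0, 1]             # x1..x5
weights = [0.7, 0.6, 0.5, 0.3, 0.4]   # w1..w5
threshold = 1.5                        # assumed

weighted_sum = sum(x * w for x, w in zip(inputs, weights))   # 0.7 + 0.5 + 0.4 = 1.6
decision = 1 if weighted_sum > threshold else 0
print(weighted_sum, decision)          # 1.6 -> decision 1 (true)
```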
AND Gate problem
3rd instance.
x1 = 1 and x2 = 0.
Sum unit: Σ = x1 * w1 + x2 * w2 = 1 * 0.4 + 0 * 0.4 = 0.4
Activation unit will return 0, because the output of the sum unit is 0.4 and it is less than the threshold 0.5.
The actual output for this instance is 0 as well, so it is classified correctly and we will not update the weights.
4th instance.
x1 = 1 and x2 = 1.
sum unit: Σ = x1 * w1 + x2 * w2 = 1 * 0.4 + 1 * 0.4 = 0.8
Activation unit will return 1 because output of the sum unit is 0.8 and it is greater than
the threshold value 0.5.
Its actual value should be 1 as well. This means that the 4th instance is predicted correctly.
We will not update anything.
Round 2
1st instance.
x1 = 0 and x2 = 0.
Sum unit: Σ = x1 * w1 + x2 * w2 = 0 * 0.4 + 0 * 0.4 = 0
Activation unit will return 0 because the output of the sum unit is 0 and it is less than the threshold value 0.5.
The output of the 1st instance should be 0 as well. This means that the instance is
classified correctly.
2nd instance
x1 = 0 and x2 = 1.
Sum unit: Σ = x1 * w1 + x2 * w2 = 0 * 0.4 + 1 * 0.4 = 0.4
Activation unit will return 0 because sum unit is less than the threshold 0.5.
Its output should be 0 as well.
This means that it is classified correctly and we will not update weights.
The 3rd and 4th instances were already classified correctly with the current weight values in the previous round, so no further updates are needed and learning is complete. A compact code sketch of this training procedure is given below.
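This sketch runs the full perceptron training loop for the AND gate; the initial weights (0.4, 0.4), threshold 0.5, and learning rate 0.1 are assumptions consistent with the worked instances above (the earlier instances of round 1 are not reproduced in these notes).

```python
# Perceptron training on the AND gate.
# Initial weights (0.4, 0.4), threshold 0.5 and learning rate 0.1 are assumed,
# consistent with the worked instances above.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0.4, 0.4]
threshold = 0.5
learning_rate = 0.1

for round_no in (1, 2):                           # two passes over the data
    for x, y in data:
        s = sum(wi * xi for wi, xi in zip(w, x))  # sum unit
        y_hat = 1 if s > threshold else 0         # step activation unit
        error = y - y_hat                         # 0 when classified correctly
        # Weights change only when the instance is misclassified
        w = [wi + learning_rate * error * xi for wi, xi in zip(w, x)]
        print(round_no, x, round(s, 2), y_hat, error)
```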
Single-Layer Perceptron
• A single-layer perceptron is the simplest type of neural network that performs
binary classification. It is often simply called the perceptron.
• It consists of:
• Inputs, each with its corresponding weight.
• A bias added to the weighted sum.
• A step activation function to determine the output.
• It adjusts weights and bias using a learning rule during training.
• It works only for linearly separable data, meaning data that can be separated by
a straight line.
This neural network contains a single input layer and an output node.
Each training instance is in the form of (X, y):
• Features: X = [x1, ..., xd], a set of d variables.
• Class Label: y∈{−1,+1}, the observed binary outcome.
The sign function is used to predict the class (+1 or −1) based on the output of the perceptron: ŷ = sign(W · X).
• If the calculated value is positive or zero, the sign function gives +1.
• If the calculated value is negative, it gives −1.
The error is the difference between the actual class (y) and the predicted class (ŷ): E(X) = y − ŷ, which can take only the values −2, 0, or +2.
Even though there are two layers, the input layer is not counted as a computational
layer, so the perceptron is called a single-layer network.
The bias is an additional variable that accounts for the invariant part of the prediction.
It helps adjust the decision boundary, especially in cases of imbalanced binary class
distribution, where one class is more frequent than the other.
Where does the perceptron fail?
Multilayer Perceptron (MLP)
• Feedforward Neural Networks (FNNs), also known as Deep feedforward
networks or Multilayer Perceptrons (MLPs), are a type of neural network that
defines a mapping y=f(x;θ).
• Here, θ represents the learnable parameters (weights and biases).
• The network learns these parameters to approximate a function f∗(x), which
maps inputs x to outputs y.
• No Feedback Connections: MLPs have no feedback loops or connections in
which outputs of the model are fed back into itself.
• MLPs are referred to as feedforward networks because successive layers feed into one another in the forward direction, from input to output.
• When feedforward neural networks are extended to include feedback
connections, they are called Recurrent Neural Networks.
Layers:
1. Input Layer: Accepts raw data x and transmits it to the next layer.
2. Hidden Layers: Perform computations to extract meaningful features from the
input. These layers are hidden because their outputs are not directly visible.
3. Output Layer: Produces the final predictions y.
Each layer computes a function and passes its output to the next layer. For example, three layers connected in a chain compute f(x) = f3(f2(f1(x))), where f1, f2, and f3 are the functions computed by the three layers.
• The length of this chain (number of layers) defines the depth of the network.
• The number of neurons of the hidden layers defines the width of the model.
• A multilayer network evaluates compositions of functions.
• For a path of length 2, where g(⋅) is computed in layer m and f(⋅) in layer m+1, the computation is f(g(x)).
• Non-linear activation functions like ReLU or Sigmoid are crucial to the network's
ability to model complex relationships.
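A minimal sketch of this layer-by-layer composition; the layer sizes, random weights, and the ReLU activation below are illustrative assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def layer(x, W, b, activation=relu):
    # Each layer computes a function of the previous layer's output
    return activation(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                          # assumed 4-dimensional input

# Depth 3 (two hidden layers + output), widths 5 and 3 (assumed)
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)   # hidden layer 1
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)   # hidden layer 2
W3, b3 = rng.normal(size=(1, 3)), np.zeros(1)   # output layer

h1 = layer(x, W1, b1)                 # g(.) computed in layer m
h2 = layer(h1, W2, b2)                # f(.) computed in layer m+1: f(g(x))
y  = layer(h2, W3, b3, activation=lambda z: z)  # identity at the output
print(y)
```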
Training the Network
• Goal: Adjust the parameters θ to minimize the difference between f(x) (model
prediction) and f∗(x) (true function).
• Process:
• Loss Function: Measures the error between predictions and actual values.
• Backpropagation: Uses dynamic programming to compute gradients of the
loss with respect to all parameters.
• Optimization Algorithm: Updates parameters using techniques like Stochastic
Gradient Descent (SGD) or Adam.
• Training data consists of input-output pairs (x, y), where y≈f∗(x).
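One step of this process can be sketched as follows for a toy linear model with squared loss; the learning rate and data values are assumptions.

```python
import numpy as np

# One stochastic gradient descent (SGD) step for y_hat = theta . x
# with squared loss; learning rate and toy data are assumed.
theta = np.zeros(3)                   # parameters to be learned
learning_rate = 0.01

x = np.array([1.0, 2.0, -1.0])        # one training input
y = 0.5                               # its target, y ~ f*(x)

y_hat = theta @ x                     # prediction of the model f(x; theta)
loss = (y_hat - y) ** 2               # loss function measures the error
grad = 2 * (y_hat - y) * x            # gradient of the loss w.r.t. theta
theta = theta - learning_rate * grad  # parameter update
print(loss, theta)
```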
Summary of Characteristics of MLPs
Representation Power of MLPs
• It is the ability of a neural network to classify data correctly by creating accurate
decision boundaries for different classes.
• A neural network achieves this by combining simpler functions to form complex
ones through layers of computation.
The power of nonlinear activation functions in transforming a data set to linear separability.
• Suppose the hidden layer uses ReLU activation and learns two new features, h1 and h2, based on the input data.
• The hidden layer creates a new representation of the data, such that:
• Class ‘*’ has points like (1,0) and (0,1).
• Class ‘+’ has points like (0,0).
• These points are now linearly separable in the hidden layer.
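A sketch of this transformation on XOR-style input data; the specific hidden-layer weights below are assumed (one well-known choice that yields a linearly separable representation), not values given in the slides.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Assumed hidden-layer parameters that map the four binary inputs
# to a new representation (h1, h2) that is linearly separable.
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])
c = np.array([0.0, -1.0])

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = relu(W @ np.array(x, dtype=float) + c)   # new features (h1, h2)
    print(x, "->", h)
# (0,0) -> (0,0); (0,1) and (1,0) -> (1,0); (1,1) -> (2,1):
# in the (h1, h2) space the two classes can be separated by a straight line.
```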
Role of Activation Functions
• Activation functions (e.g., ReLU, sigmoid, tanh) allow the network to perform nonlinear transformations of the data.
• They make complex patterns linearly separable in higher dimensions.
• As the number of layers (depth) increases, the network's power to learn and
represent complex patterns also increases.
Activation Functions
Why do we need activation functions in neural networks?
• In the output layer, an activation function Φ(v) can determine the nature of the output,
such as constraining it to a specific range (e.g., producing a probability value in [0,1]).
• The activation functions used during inference might differ from those employed in the
loss functions during training. For instance, a perceptron uses the sign function
Φ(v)=sign(v) for making predictions, but it does not rely on any activation function
while computing the perceptron criterion during training.
Identity Activation Function
• It is the simplest activation function, where the output is the same as the input.
Mathematically, it is expressed as: Φ(v)=v
• For a single-layer network with training pair (X, y), the output is ŷ = Φ(W · X) = W · X.
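For comparison, the sketch below evaluates the identity activation alongside other common activation functions mentioned elsewhere in this module (sigmoid, tanh, ReLU).

```python
import numpy as np

def identity(v):
    return v                            # Phi(v) = v

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))     # output constrained to (0, 1)

def tanh(v):
    return np.tanh(v)                   # output constrained to (-1, 1)

def relu(v):
    return np.maximum(0.0, v)           # 0 for negative v, v otherwise

v = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for phi in (identity, sigmoid, tanh, relu):
    print(phi.__name__, np.round(phi(v), 3))
```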
Loss Function
• The choice of the loss function is critical in defining the outputs in a way that is
sensitive to the application at hand.
• For example, least-squares regression with numeric outputs requires a simple squared loss of the form (y − ŷ)² for a single training instance with target y and prediction ŷ.
• Other types of loss can also be used, such as the hinge loss for y ∈ {−1, +1} and a real-valued prediction ŷ (with identity activation): L = max{0, 1 − y · ŷ}.
• The hinge loss can be used to implement a learning method, which is referred to as a
support vector machine.
• For probabilistic predictions, two different types of loss functions are used,
depending on whether the prediction is binary or whether it is multi-way:
• Binary targets (Logistic regression)
• Categorical targets
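The losses discussed above can be sketched as follows; the binary cross-entropy form used for probabilistic (logistic) outputs is a standard choice, shown here as an assumption since the corresponding slides are not reproduced in these notes.

```python
import numpy as np

def squared_loss(y, y_hat):
    # Least-squares regression: (y - y_hat)^2
    return (y - y_hat) ** 2

def hinge_loss(y, y_hat):
    # y in {-1, +1}, real-valued prediction y_hat (identity activation)
    return max(0.0, 1.0 - y * y_hat)

def binary_cross_entropy(y, p):
    # y in {0, 1}, p = predicted probability of the positive class
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

print(squared_loss(2.0, 1.5))          # 0.25
print(hinge_loss(+1, 0.3))             # 0.7
print(binary_cross_entropy(1, 0.9))    # about 0.105
```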
Training MLPs with back propagation
• In single-layer neural networks, training is straightforward because the loss function is a direct function of the weights.
• In multi-layer neural networks, training is more complex because the loss function is a composition of the functions computed by the successive layers, which makes its gradients difficult to compute directly.
• To address this, the backpropagation algorithm is used, which calculates the error
gradients efficiently.
1. Forward Phase:
• Inputs are fed into the network, and computations are performed layer by
layer using the current weights.
• The final output is compared with the expected value, and the derivative
of the loss function with respect to the output is calculated.
2. Backward Phase:
• The gradients of the loss function with respect to each weight are
computed using the chain rule, working backward from the output layer
to the input layer.
• These gradients are used to update the weights.
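A minimal sketch of both phases for a one-hidden-layer network with sigmoid activations and squared loss; the layer sizes, toy data, and learning rate are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Assumed toy setup: 2 inputs, 3 hidden units, 1 output, squared loss
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
x, y = np.array([1.0, 0.0]), np.array([1.0])
learning_rate = 0.1

# Forward phase: compute layer by layer with the current weights
z1 = W1 @ x + b1;  h = sigmoid(z1)
z2 = W2 @ h + b2;  y_hat = sigmoid(z2)
loss = 0.5 * np.sum((y_hat - y) ** 2)

# Backward phase: chain rule from the output layer back to the input layer
delta2 = (y_hat - y) * y_hat * (1 - y_hat)   # dLoss/dz2
dW2, db2 = np.outer(delta2, h), delta2
delta1 = (W2.T @ delta2) * h * (1 - h)       # dLoss/dz1
dW1, db1 = np.outer(delta1, x), delta1

# Gradient-descent weight updates
W2 -= learning_rate * dW2;  b2 -= learning_rate * db2
W1 -= learning_rate * dW1;  b1 -= learning_rate * db1
print(loss)
```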
Practical issues in neural network training
Training a neural network involves several practical challenges that can significantly
impact model performance.
1.1 Overfitting
• Overfitting occurs when a model learns the training data too well, capturing noise
and random fluctuations instead of just the underlying patterns.
• This results in high accuracy on training data but poor performance on test data.
Key Characteristics:
• High training accuracy but low test accuracy
• The model is too complex (too many parameters) for the given dataset
• The model memorizes the data instead of generalizing
Example:
Imagine a model trained to recognize cats and dogs. If it memorizes every detail
(such as the exact lighting or background in each training image), it may fail to
recognize cats and dogs in new images with different backgrounds.
Solutions:
• Use regularization (e.g., L1/L2 weight penalties) or dropout to discourage memorization
• Collect more training data or use data augmentation
• Reduce model complexity or stop training early (early stopping)
1.2 Underfitting
Underfitting occurs when a model is too simple to capture the underlying structure of
the data, leading to poor performance on both training and test data.
Key Characteristics:
• Low training accuracy and low test accuracy
• The model fails to capture key patterns in the data
• Can happen if the model has too few parameters or lacks enough training
Example: If we train a neural network with only one hidden layer and very few
neurons to classify handwritten digits, it might not learn enough detail to distinguish
between different numbers.
Solutions:
• Increase Model Complexity – Add more layers/neurons to capture patterns
• Train Longer – Ensure the model has had enough training epochs
• Feature Engineering – Provide better input features for learning
2. Vanishing Gradient Problem
Cause:
• The chain rule of backpropagation involves multiplying many small values, causing
gradients to decay exponentially.
• Sigmoid and tanh activations worsen this because their derivatives are small: the sigmoid's derivative is at most 0.25, and the tanh's derivative is at most 1 and approaches 0 for large inputs.
Example: Imagine passing a message through many people, and each person whispers
softer. Eventually, the message disappears!
Solutions:
• Use ReLU activation (avoids small derivatives for positive values).
• Apply batch normalization to stabilize activations.
• Use proper weight initialization (Xavier/He initialization).
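A small numerical illustration of this decay, assuming one sigmoid-derivative factor per layer (each at most 0.25) and an assumed depth of 20 layers.

```python
# Backpropagation multiplies one derivative factor per layer.
# With sigmoid activations each factor is at most 0.25.
factor_per_layer = 0.25
gradient_scale = 1.0
for _ in range(20):            # assumed 20-layer network
    gradient_scale *= factor_per_layer
print(gradient_scale)          # about 9.1e-13: the gradient has vanished
```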
3. Exploding Gradient Problem
Cause:
• If derivatives are greater than 1, multiplying them amplifies gradients
exponentially.
• This happens in deep networks with large weight values.
Example: Imagine a microphone picking up its own sound and amplifying it into loud
feedback noise!
Solutions:
• Gradient clipping (limits gradient values to prevent explosion).
• Use adaptive learning rates (optimizers like Adam, RMSprop).
• Batch normalization (keeps values in a stable range).
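A sketch of gradient clipping by global norm, the first of the solutions listed above; the maximum-norm threshold is an assumed value.

```python
import numpy as np

def clip_by_norm(grad, max_norm=5.0):
    # Rescale the gradient if its norm exceeds max_norm (assumed threshold)
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([30.0, -40.0])    # an "exploding" gradient with norm 50
print(clip_by_norm(g))         # rescaled to norm 5: [ 3. -4.]
```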
4. Difficulties in convergence
Convergence refers to a neural network gradually reaching an optimal state where
the loss is minimized. However, several factors can make this process slow or
unstable.
Applications of Neural Networks
1. Facial Recognition
• Neural networks, particularly Convolutional Neural Networks (CNNs), are used for facial recognition tasks.
• These models are trained to identify or verify a person’s identity based on facial
features.
• The network learns to recognize distinct features of a face, such as the eyes, nose,
and mouth, and can match these features with stored images in a database. It’s
used in security systems, phone unlocking, and surveillance.
• They are also used in offices for access control: the system authenticates a human face and matches it against the list of IDs stored in its database.
2. Stock Market Prediction
• MLPs consist of multiple layers of fully connected nodes, each learning patterns and
trends from past performances. This helps investors make informed decisions about
buying, selling, or holding stocks.
3. Social Media
• Neural networks are used in social media platforms for various tasks like
recommendation systems, sentiment analysis, and content moderation.
• Neural networks analyze user behavior (likes, comments, shares) and content
preferences to provide personalized recommendations.
• They also detect harmful or inappropriate content (e.g., hate speech, explicit
images) through natural language processing (NLP) and image classification
models.
4. Aerospace & Defense
• In aerospace and defense, neural networks are used for target detection,
navigation, radar signal processing, and autonomous vehicles.
• Neural networks can analyze radar signals or images to detect enemy targets or
obstacles.
• They are also used in autonomous drones or unmanned aerial vehicles (UAVs) to
navigate and make decisions based on real-time data.
5. Healthcare
• CNNs are used to analyze medical images like X-rays, MRIs, or CT scans to
detect diseases like cancer, tumors, or other anomalies.
6. Signature Verification and Handwriting Analysis
• Neural networks are used for signature verification and handwriting recognition
in applications such as fraud detection and document authentication.
7. Weather Forecasting
• Neural networks can also improve the accuracy of weather models by handling complex, non-linear relationships in the data.
In summary, neural networks are widely applied across diverse fields. In each
case, they learn from vast amounts of data, detect patterns, and make
predictions or decisions. Their ability to handle complex and unstructured data
makes them especially valuable in tasks like recognition, prediction, analysis, and
automation.
Course Level Assessment Questions
1. Suppose you have a 3-dimensional input x = (x1, x2, x3) = (2, 2, 1) fully connected to 1
neuron which is in the hidden layer with activation function sigmoid. Calculate the
output of the hidden layer neuron.
2. Design a single layer perceptron to compute the NAND (not-AND) function. This
function receives two binary-valued inputs x1 and x2, and returns 0 if both inputs are 1,
and returns 1 otherwise.
3. Suppose we have a fully connected, feed-forward network with no hidden layer, and 5
input units connected directly to 3 output units. Briefly explain why adding a hidden
layer with 8 linear units does not make the network any more powerful.
4. Briefly explain one thing you would use a validation set for, and why you can’t just do
it using the test set.
6. You would like to train a fully-connected neural network with 5 hidden layers, each
with 10 hidden units. The input is 20-dimensional and the output is a scalar. What is the
total number of trainable parameters in your network?
1. Discuss the limitation of a single layer perceptron with an example.
2. List the advantages and disadvantages of sigmoid and ReLU activation functions
3.
Previous Year Questions
• How does neural network solve the XOR problem?
• Describe Perceptron and its components.
• Explain the practical issues in neural network training
• Define universal approximation theorem.
• Discuss methods to prevent overfitting in neural networks.
• A 3-dimensional input X = (X1, X2, X3) = (1, 2, 1) is fully connected to 1 neuron
which is in the hidden layer with binary sigmoid activation function. Calculate the
output of the hidden layer neuron. Assume associated weights. Neglect bias term.
• Draw the architecture of a multi-layer perceptron. Derive update rules for
parameters in the multi-layer neural network through the gradient descent.
• Describe various activation functions used in neural networks.
• Calculate the output of the following neuron Y if the activation function is a
bipolar sigmoid.
• Explain the importance of choosing the right step size in neural networks.
• Discuss the disadvantages of single layer perceptrons with an example.
• List any three applications of neural network.
• Explain different activation functions and their derivatives used in neural networks
with the help of graphical representation.
• Implement the backpropagation algorithm to train a Multi-Layer Perceptron using the tanh activation function.