Module 1
Reference Books
1. Satish Kumar, Neural Networks: A Classroom Approach, Tata McGraw-Hill Education, 2004.
2. Yegnanarayana, B., Artificial Neural Networks, PHI Learning Pvt. Ltd., 2009.
3. Michael Nielsen, Neural Networks and Deep Learning, 2018.
Artificial Intelligence (AI)
Refers to the development of computer systems capable of
performing tasks that typically require human intelligence.
These tasks include recognizing speech, making decisions,
solving problems, and identifying patterns.
Soma (Cell Body) : The main body of the neuron containing the nucleus
and other organelles. Integrates incoming signals from dendrites and
generates outgoing signals when necessary.
Nucleus : A membrane-bound structure within the soma that contains the neuron’s genetic material. Regulates
cellular activities, including the production of proteins essential for neuron function.
Axon : A long, slender projection extending from the neuron’s soma. Transmits electrical impulses away from the
cell body to other neurons, muscles, or glands.
Axon Terminal : The endpoint of an axon where it connects to other neurons or
target cells. Releases neurotransmitters into the synaptic cleft to communicate
with the next neuron or target cell.
The neurons are connected to one another with the use of axons and dendrites,
and the connecting regions between axons and dendrites are referred to as
Synapses.
• This biological mechanism is simulated in ANNs, which contain computational units that are referred to as neurons.
• The computational units are connected to one another through weights, which serve the same role as the
strengths of synaptic connections in biological organisms.
• After being weighted by the strength of their respective connections, the inputs are summed together in the
cell body.
• This sum is then transformed into a new signal that’s propagated along the cell’s axon and sent off to other
neurons.
• In 1943, Warren S. McCulloch and Walter H. Pitts introduced the concept of an
artificial neuron, inspired by biological neurons.
• An artificial neuron receives multiple inputs (x1, x2, ..., xn), each multiplied by a
corresponding weight (w1, w2, ..., wn).
• The weighted inputs are summed to calculate the logit (z), often with an added
bias term (b).
• The logit is then passed through an activation function (f) to produce the output
y=f(z), which can be sent to other neurons.
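The computation of a single artificial neuron can be sketched as follows; the input values, weights, bias, and the choice of a sigmoid activation below are illustrative assumptions, not values given in the slides.

```python
import numpy as np

def sigmoid(z):
    # One common choice for the activation function f
    return 1.0 / (1.0 + np.exp(-z))

def artificial_neuron(x, w, b):
    # Weighted sum of the inputs plus the bias gives the logit z
    z = np.dot(w, x) + b
    # The activation function transforms the logit into the output y = f(z)
    return sigmoid(z)

# Assumed example: 3 inputs with arbitrary weights and bias
x = np.array([1.0, 0.5, -0.2])
w = np.array([0.4, 0.3, 0.9])
b = 0.1
print(artificial_neuron(x, w, b))  # a value in (0, 1)
```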
• In conclusion, an ANN processes inputs by passing values from input neurons to output
neurons, using weights as key parameters.
• Just as external stimuli are needed for learning in biological organisms, the external stimulus in ANNs is provided by the training data, which contains examples of input-output pairs of the function to be learned.
• For example, the training data might contain pixel representations of images (input)
and their annotated labels (e.g., carrot, banana) as the output.
• The network uses this data to improve predictions by comparing them to the correct
labels and adjusting weights based on errors.
• A good ANN can generalize, meaning it performs well on new, unseen data after
being trained. The true value of machine learning lies in this ability to generalize.
Advantages of Neural Networks
Neural networks have two key advantages over traditional machine learning:
A Perceptron is an Artificial Neuron
• It is the simplest possible Neural Network and Neural Networks are the building
blocks of Machine Learning.
• In 1957, Frank Rosenblatt "invented" a Perceptron program, on an IBM 704
computer at Cornell Aeronautical Laboratory.
• Scientists discovered that neurons process sensory input, store information, and
make decisions using electrical signals.
• Inspired by this, Rosenblatt proposed the perceptron to simulate these brain functions and enable learning and decision-making.
• The original Perceptron was designed to take a number of binary inputs, and
produce one binary output (0 or 1).
• The idea was to use different weights to represent the importance of each input,
and that the sum of the values should be greater than a threshold value before
making a decision like true or false (0 or 1).
The Perceptron Algorithm
Criteria            Input           Weight
Artist is Good      x1 = 0 or 1     w1 = 0.7
Weather is Good     x2 = 0 or 1     w2 = 0.6
Friend will Come    x3 = 0 or 1     w3 = 0.5
Food is Served      x4 = 0 or 1     w4 = 0.3
Alcohol is Served   x5 = 0 or 1     w5 = 0.4

Inputs (x1, x2, x3, x4, x5) = [1, 0, 1, 0, 1]
Weights (w1, w2, w3, w4, w5) = [0.7, 0.6, 0.5, 0.3, 0.4]
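A minimal sketch of the perceptron decision for this example; the threshold value of 1.5 used below is an assumption for illustration, since it is not stated on the slide.

```python
# Perceptron decision for the example above.
# The threshold (1.5) is an assumed value for illustration.
inputs  = [1, 0, 1, 0, 1]             # x1..x5
weights = [0.7, 0.6, 0.5, 0.3, 0.4]   # w1..w5
threshold = 1.5                        # assumed

weighted_sum = sum(x * w for x, w in zip(inputs, weights))   # 0.7 + 0.5 + 0.4 = 1.6
decision = 1 if weighted_sum > threshold else 0
print(weighted_sum, decision)          # 1.6 -> decision 1 (true)
```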
AND Gate problem
3rd instance.
x1 = 1 and x2 = 0.
Sum unit: Σ = x1 * w1 + x2 * w2 = 1 * 0.4 + 0 * 0.4 = 0.4
Activation unit will return 0, because the output of the sum unit is 0.4 and it is less than the threshold 0.5.
The actual output for this instance is 0 as well, so it is classified correctly and we will not update the weights.
4th instance.
x1 = 1 and x2 = 1.
sum unit: Σ = x1 * w1 + x2 * w2 = 1 * 0.4 + 1 * 0.4 = 0.8
Activation unit will return 1 because output of the sum unit is 0.8 and it is greater than
the threshold value 0.5.
Its actual value should be 1 as well. This means that the 4th instance is predicted correctly.
We will not update anything.
Round 2
1st instance.
x1 = 0 and x2 = 0.
Sum unit: Σ = x1 * w1 + x2 * w2 = 0 * 0.4 + 0 * 0.4 = 0
Activation unit will return 0 because the output of the sum unit is 0 and it is less than the threshold value 0.5.
The output of the 1st instance should be 0 as well. This means that the instance is
classified correctly.
2nd instance
x1 = 0 and x2 = 1.
Sum unit: Σ = x1 * w1 + x2 * w2 = 0 * 0.4 + 1 * 0.4 = 0.4
Activation unit will return 0 because sum unit is less than the threshold 0.5.
Its output should be 0 as well.
This means that it is classified correctly and we will not update weights.
The 3rd and 4th instances were already classified correctly with the current weight values in the previous round, so no further updates are needed and learning is complete. A compact code sketch of this training procedure is given below.
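This sketch runs the full perceptron training loop for the AND gate; the initial weights (0.4, 0.4), threshold 0.5, and learning rate 0.1 are assumptions consistent with the worked instances above (the earlier instances of round 1 are not reproduced in these notes).

```python
# Perceptron training on the AND gate.
# Initial weights (0.4, 0.4), threshold 0.5 and learning rate 0.1 are assumed,
# consistent with the worked instances above.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [0.4, 0.4]
threshold = 0.5
learning_rate = 0.1

for round_no in (1, 2):                           # two passes over the data
    for x, y in data:
        s = sum(wi * xi for wi, xi in zip(w, x))  # sum unit
        y_hat = 1 if s > threshold else 0         # step activation unit
        error = y - y_hat                         # 0 when classified correctly
        # Weights change only when the instance is misclassified
        w = [wi + learning_rate * error * xi for wi, xi in zip(w, x)]
        print(round_no, x, round(s, 2), y_hat, error)
```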
Single-Layer Perceptron
• A single-layer perceptron is the simplest type of neural network that performs
binary classification. It is often simply called the perceptron.
• It consists of:
• Inputs, each with its corresponding weight.
• A bias added to the weighted sum.
• A step activation function to determine the output.
• It adjusts weights and bias using a learning rule during training.
• It works only for linearly separable data, meaning data that can be separated by
a straight line.
This neural network contains a single input layer and an output node.
Each training instance is in the form of (X, y):
• Features: X = [x1, ..., xd], a set of d variables.
• Class Label: y∈{−1,+1}, the observed binary outcome.
The sign function is used to predict the class (+1 or −1) based on the output of the perceptron: ŷ = sign(W · X).
• If the calculated value is positive or zero, the sign function gives +1.
• If the calculated value is negative, it gives −1.
The error is the difference between the actual class (y) and the predicted class (ŷ): E(X) = y − ŷ, which can take only the values −2, 0, or +2.
Even though there are two layers, the input layer is not counted as a computational
layer, so the perceptron is called a single-layer network.
The bias is an additional variable that accounts for the invariant part of the prediction.
It helps adjust the decision boundary, especially in cases of imbalanced binary class
distribution, where one class is more frequent than the other.
Where does the perceptron fail?
Multilayer Perceptron (MLP)
• Feedforward Neural Networks (FNNs), also known as Deep feedforward
networks or Multilayer Perceptrons (MLPs), are a type of neural network that
defines a mapping y=f(x;θ).
• Here, θ represents the learnable parameters (weights and biases).
• The network learns these parameters to approximate a function f∗(x), which
maps inputs x to outputs y.
• No Feedback Connections: MLPs have no feedback loops or connections in
which outputs of the model are fed back into itself.
• MLPs are referred to as feedforward networks because successive layers feed into one another in the forward direction, from input to output.
• When feedforward neural networks are extended to include feedback
connections, they are called Recurrent Neural Networks.
Layers:
1. Input Layer: Accepts raw data x and transmits it to the next layer.
2. Hidden Layers: Perform computations to extract meaningful features from the
input. These layers are hidden because their outputs are not directly visible.
3. Output Layer: Produces the final predictions y.
Each layer computes a function and passes its output to the next layer. For example, three layers connected in a chain compute f(x) = f3(f2(f1(x))), where f1, f2, and f3 are the functions computed by the three layers.
• The length of this chain (number of layers) defines the depth of the network.
• The number of neurons of the hidden layers defines the width of the model.
• A multilayer network evaluates compositions of functions.
• For a path of length 2, where g(⋅) is computed in layer m and f(⋅) in layer m+1, the computation is f(g(x)).
• Non-linear activation functions like ReLU or Sigmoid are crucial to the network's
ability to model complex relationships.
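A minimal sketch of this layer-by-layer composition; the layer sizes, random weights, and the ReLU activation below are illustrative assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def layer(x, W, b, activation=relu):
    # Each layer computes a function of the previous layer's output
    return activation(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                          # assumed 4-dimensional input

# Depth 3 (two hidden layers + output), widths 5 and 3 (assumed)
W1, b1 = rng.normal(size=(5, 4)), np.zeros(5)   # hidden layer 1
W2, b2 = rng.normal(size=(3, 5)), np.zeros(3)   # hidden layer 2
W3, b3 = rng.normal(size=(1, 3)), np.zeros(1)   # output layer

h1 = layer(x, W1, b1)                 # g(.) computed in layer m
h2 = layer(h1, W2, b2)                # f(.) computed in layer m+1: f(g(x))
y  = layer(h2, W3, b3, activation=lambda z: z)  # identity at the output
print(y)
```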
Training the Network
• Goal: Adjust the parameters θ to minimize the difference between f(x) (model
prediction) and f∗(x) (true function).
• Process:
• Loss Function: Measures the error between predictions and actual values.
• Backpropagation: Uses dynamic programming to compute gradients of the
loss with respect to all parameters.
• Optimization Algorithm: Updates parameters using techniques like Stochastic
Gradient Descent (SGD) or Adam.
• Training data consists of input-output pairs (x, y), where y≈f∗(x).
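One step of this process can be sketched as follows for a toy linear model with squared loss; the learning rate and data values are assumptions.

```python
import numpy as np

# One stochastic gradient descent (SGD) step for y_hat = theta . x
# with squared loss; learning rate and toy data are assumed.
theta = np.zeros(3)                   # parameters to be learned
learning_rate = 0.01

x = np.array([1.0, 2.0, -1.0])        # one training input
y = 0.5                               # its target, y ~ f*(x)

y_hat = theta @ x                     # prediction of the model f(x; theta)
loss = (y_hat - y) ** 2               # loss function measures the error
grad = 2 * (y_hat - y) * x            # gradient of the loss w.r.t. theta
theta = theta - learning_rate * grad  # parameter update
print(loss, theta)
```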
Summary of Characteristics of MLPs
Representation Power of MLPs
• It is the ability of a neural network to classify data correctly by creating accurate
decision boundaries for different classes.
• A neural network achieves this by combining simpler functions to form complex
ones through layers of computation.
The power of nonlinear activation functions in transforming a data set to linear separability.
• Suppose the hidden layer uses ReLU activation and learns two new features, h1 and h2, based on the input data.
• The hidden layer creates a new representation of the data, such that:
• Class ‘*’ has points like (1,0) and (0,1).
• Class ‘+’ has points like (0,0).
• These points are now linearly separable in the hidden layer.
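A sketch of this transformation on XOR-style input data; the specific hidden-layer weights below are assumed (one well-known choice that yields a linearly separable representation), not values given in the slides.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Assumed hidden-layer parameters that map the four binary inputs
# to a new representation (h1, h2) that is linearly separable.
W = np.array([[1.0, 1.0],
              [1.0, 1.0]])
c = np.array([0.0, -1.0])

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = relu(W @ np.array(x, dtype=float) + c)   # new features (h1, h2)
    print(x, "->", h)
# (0,0) -> (0,0); (0,1) and (1,0) -> (1,0); (1,1) -> (2,1):
# in the (h1, h2) space the two classes can be separated by a straight line.
```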
Role of Activation Functions
• Activation functions (e.g., ReLU, sigmoid, tanh) allow the network to perform nonlinear transformations of the data.
• They make complex patterns linearly separable in higher dimensions.
• As the number of layers (depth) increases, the network's power to learn and
represent complex patterns also increases.
Activation Functions
Why do we need activation functions in neural networks?
• In the output layer, an activation function Φ(v) can determine the nature of the output,
such as constraining it to a specific range (e.g., producing a probability value in [0,1]).
• The activation functions used during inference might differ from those employed in the
loss functions during training. For instance, a perceptron uses the sign function
Φ(v)=sign(v) for making predictions, but it does not rely on any activation function
while computing the perceptron criterion during training.
Identity Activation Function
• It is the simplest activation function, where the output is the same as the input.
Mathematically, it is expressed as: Φ(v)=v
• For a single-layer network with training pair (X, y), the output is ŷ = Φ(W · X) = W · X.
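For comparison, the sketch below evaluates the identity activation alongside other common activation functions mentioned elsewhere in this module (sigmoid, tanh, ReLU).

```python
import numpy as np

def identity(v):
    return v                            # Phi(v) = v

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))     # output constrained to (0, 1)

def tanh(v):
    return np.tanh(v)                   # output constrained to (-1, 1)

def relu(v):
    return np.maximum(0.0, v)           # 0 for negative v, v otherwise

v = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for phi in (identity, sigmoid, tanh, relu):
    print(phi.__name__, np.round(phi(v), 3))
```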
Loss Function
• The choice of the loss function is critical in defining the outputs in a way that is
sensitive to the application at hand.
• For example, least-squares regression with numeric outputs requires a simple squared loss of the form (y − ŷ)² for a single training instance with target y and prediction ŷ.
• Other types of loss can also be used, such as the hinge loss for y ∈ {−1, +1} and a real-valued prediction ŷ (with identity activation): L = max{0, 1 − y · ŷ}.
• The hinge loss can be used to implement a learning method, which is referred to as a
support vector machine.
• For probabilistic predictions, two different types of loss functions are used,
depending on whether the prediction is binary or whether it is multi-way:
• Binary targets (Logistic regression)
• Categorical targets
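The losses discussed above can be sketched as follows; the binary cross-entropy form used for probabilistic (logistic) outputs is a standard choice, shown here as an assumption since the corresponding slides are not reproduced in these notes.

```python
import numpy as np

def squared_loss(y, y_hat):
    # Least-squares regression: (y - y_hat)^2
    return (y - y_hat) ** 2

def hinge_loss(y, y_hat):
    # y in {-1, +1}, real-valued prediction y_hat (identity activation)
    return max(0.0, 1.0 - y * y_hat)

def binary_cross_entropy(y, p):
    # y in {0, 1}, p = predicted probability of the positive class
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

print(squared_loss(2.0, 1.5))          # 0.25
print(hinge_loss(+1, 0.3))             # 0.7
print(binary_cross_entropy(1, 0.9))    # about 0.105
```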
Training MLPs with back propagation
• In single-layer neural networks, training is straightforward because the loss function is a direct function of the weights.
• In multi-layer neural networks, training is more complex because the loss function is a composition of the functions computed by the successive layers, which makes its gradients difficult to compute directly.
• To address this, the backpropagation algorithm is used, which calculates the error
gradients efficiently.
1. Forward Phase:
• Inputs are fed into the network, and computations are performed layer by
layer using the current weights.
• The final output is compared with the expected value, and the derivative
of the loss function with respect to the output is calculated.
2. Backward Phase:
• The gradients of the loss function with respect to each weight are
computed using the chain rule, working backward from the output layer
to the input layer.
• These gradients are used to update the weights.
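A minimal sketch of both phases for a one-hidden-layer network with sigmoid activations and squared loss; the layer sizes, toy data, and learning rate are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Assumed toy setup: 2 inputs, 3 hidden units, 1 output, squared loss
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
x, y = np.array([1.0, 0.0]), np.array([1.0])
learning_rate = 0.1

# Forward phase: compute layer by layer with the current weights
z1 = W1 @ x + b1;  h = sigmoid(z1)
z2 = W2 @ h + b2;  y_hat = sigmoid(z2)
loss = 0.5 * np.sum((y_hat - y) ** 2)

# Backward phase: chain rule from the output layer back to the input layer
delta2 = (y_hat - y) * y_hat * (1 - y_hat)   # dLoss/dz2
dW2, db2 = np.outer(delta2, h), delta2
delta1 = (W2.T @ delta2) * h * (1 - h)       # dLoss/dz1
dW1, db1 = np.outer(delta1, x), delta1

# Gradient-descent weight updates
W2 -= learning_rate * dW2;  b2 -= learning_rate * db2
W1 -= learning_rate * dW1;  b1 -= learning_rate * db1
print(loss)
```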
Practical issues in neural network training
Training a neural network involves several practical challenges that can significantly
impact model performance.
1.1 Overfitting
• Overfitting occurs when a model learns the training data too well, capturing noise
and random fluctuations instead of just the underlying patterns.
• This results in high accuracy on training data but poor performance on test data.
Key Characteristics:
• High training accuracy but low test accuracy
• The model is too complex (too many parameters) for the given dataset
• The model memorizes the data instead of generalizing
Example:
Imagine a model trained to recognize cats and dogs. If it memorizes every detail
(such as the exact lighting or background in each training image), it may fail to
recognize cats and dogs in new images with different backgrounds.
Solutions:
• Use regularization (e.g., L1/L2 weight penalties) or dropout to discourage memorization
• Collect more training data or use data augmentation
• Reduce model complexity or stop training early (early stopping)
1.2 Underfitting
Underfitting occurs when a model is too simple to capture the underlying structure of
the data, leading to poor performance on both training and test data.
Key Characteristics:
• Low training accuracy and low test accuracy
• The model fails to capture key patterns in the data
• Can happen if the model has too few parameters or lacks enough training
Example: If we train a neural network with only one hidden layer and very few
neurons to classify handwritten digits, it might not learn enough detail to distinguish
between different numbers.
Solutions:
• Increase Model Complexity – Add more layers/neurons to capture patterns
• Train Longer – Ensure the model has had enough training epochs
• Feature Engineering – Provide better input features for learning
2. Vanishing Gradient Problem
Cause:
• The chain rule of backpropagation involves multiplying many small values, causing
gradients to decay exponentially.
• Sigmoid and tanh activations worsen this because their derivatives are small: the sigmoid's derivative is at most 0.25, and the tanh's derivative is at most 1 and approaches 0 for large inputs.
Example: Imagine passing a message through many people, and each person whispers
softer. Eventually, the message disappears!
Solutions:
• Use ReLU activation (avoids small derivatives for positive values).
• Apply batch normalization to stabilize activations.
• Use proper weight initialization (Xavier/He initialization).
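A small numerical illustration of this decay, assuming one sigmoid-derivative factor per layer (each at most 0.25) and an assumed depth of 20 layers.

```python
# Backpropagation multiplies one derivative factor per layer.
# With sigmoid activations each factor is at most 0.25.
factor_per_layer = 0.25
gradient_scale = 1.0
for _ in range(20):            # assumed 20-layer network
    gradient_scale *= factor_per_layer
print(gradient_scale)          # about 9.1e-13: the gradient has vanished
```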
3. Exploding Gradient Problem
Cause:
• If derivatives are greater than 1, multiplying them amplifies gradients
exponentially.
• This happens in deep networks with large weight values.
Example: Imagine a microphone picking up its own sound and amplifying it into loud
feedback noise!
Solutions:
• Gradient clipping (limits gradient values to prevent explosion).
• Use adaptive learning rates (optimizers like Adam, RMSprop).
• Batch normalization (keeps values in a stable range).
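A sketch of gradient clipping by global norm, the first of the solutions listed above; the maximum-norm threshold is an assumed value.

```python
import numpy as np

def clip_by_norm(grad, max_norm=5.0):
    # Rescale the gradient if its norm exceeds max_norm (assumed threshold)
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([30.0, -40.0])    # an "exploding" gradient with norm 50
print(clip_by_norm(g))         # rescaled to norm 5: [ 3. -4.]
```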
4. Difficulties in convergence
Convergence refers to a neural network gradually reaching an optimal state where
the loss is minimized. However, several factors can make this process slow or
unstable.
Applications of Neural Networks
1. Facial Recognition
• Neural networks, particularly Convolutional Neural Networks (CNNs), are used for facial recognition tasks.
• These models are trained to identify or verify a person’s identity based on facial
features.
• The network learns to recognize distinct features of a face, such as the eyes, nose,
and mouth, and can match these features with stored images in a database. It’s
used in security systems, phone unlocking, and surveillance.
• They are also used in offices for access control: the system authenticates a human face and matches it against the list of IDs stored in its database.
2. Stock Market Prediction
• MLPs consist of multiple layers of fully connected nodes, each learning patterns and
trends from past performances. This helps investors make informed decisions about
buying, selling, or holding stocks.
3. Social Media
• Neural networks are used in social media platforms for various tasks like
recommendation systems, sentiment analysis, and content moderation.
• Neural networks analyze user behavior (likes, comments, shares) and content
preferences to provide personalized recommendations.
• They also detect harmful or inappropriate content (e.g., hate speech, explicit
images) through natural language processing (NLP) and image classification
models.
4. Aerospace & Defense
• In aerospace and defense, neural networks are used for target detection,
navigation, radar signal processing, and autonomous vehicles.
• Neural networks can analyze radar signals or images to detect enemy targets or
obstacles.
• They are also used in autonomous drones or unmanned aerial vehicles (UAVs) to
navigate and make decisions based on real-time data.
5. Healthcare
• CNNs are used to analyze medical images like X-rays, MRIs, or CT scans to
detect diseases like cancer, tumors, or other anomalies.
6. Signature Verification and Handwriting Analysis
• Neural networks are used for signature verification and handwriting recognition
in applications such as fraud detection and document authentication.
7. Weather Forecasting
• Neural networks can also improve the accuracy of weather models by handling complex, non-linear relationships in the data.
In summary, neural networks are widely applied across diverse fields. In each
case, they learn from vast amounts of data, detect patterns, and make
predictions or decisions. Their ability to handle complex and unstructured data
makes them especially valuable in tasks like recognition, prediction, analysis, and
automation.
Course Level Assessment Questions
1. Suppose you have a 3-dimensional input x = (x1, x2, x3) = (2, 2, 1) fully connected to 1
neuron which is in the hidden layer with activation function sigmoid. Calculate the
output of the hidden layer neuron.
2. Design a single layer perceptron to compute the NAND (not-AND) function. This
function receives two binary-valued inputs x1 and x2, and returns 0 if both inputs are 1,
and returns 1 otherwise.
3. Suppose we have a fully connected, feed-forward network with no hidden layer, and 5
input units connected directly to 3 output units. Briefly explain why adding a hidden
layer with 8 linear units does not make the network any more powerful.
4. Briefly explain one thing you would use a validation set for, and why you can’t just do
it using the test set.
6. You would like to train a fully-connected neural network with 5 hidden layers, each
with 10 hidden units. The input is 20-dimensional and the output is a scalar. What is the
total number of trainable parameters in your network?
1. Discuss the limitation of a single layer perceptron with an example.
2. List the advantages and disadvantages of sigmoid and ReLU activation functions
3.
Previous Year Questions
• How does neural network solve the XOR problem?
• Describe Perceptron and its components.
• Explain the practical issues in neural network training
• Define universal approximation theorem.
• Discuss methods to prevent overfitting in neural networks.
• A 3-dimensional input X = (X1, X2, X3) = (1, 2, 1) is fully connected to 1 neuron
which is in the hidden layer with binary sigmoid activation function. Calculate the
output of the hidden layer neuron. Assume associated weights. Neglect bias term.
• Draw the architecture of a multi-layer perceptron. Derive update rules for
parameters in the multi-layer neural network through the gradient descent.
• Describe various activation functions used in neural networks.
• Calculate the output of the following neuron Y if the activation function is a
bipolar sigmoid.
• Explain the importance of choosing the right step size in neural networks.
• Discuss the disadvantages of single layer perceptrons with an example.
• List any three applications of neural network.
• Explain different activation functions and their derivatives used in neural networks
with the help of graphical representation.
• Implement the backpropagation algorithm to train a Multi-Layer Perceptron using the tanh activation function.