
Artificial Intelligence

UNIT V

NEURAL NETWORKS
Contents
• Perceptron - Multilayer perceptron, Activation functions,
network training
• Gradient descent optimization – stochastic gradient
descent,
• Error back propagation, from shallow networks to deep
networks
• Unit saturation (aka the vanishing gradient problem)
• ReLU, hyperparameter tuning
• batch normalization, regularization, dropout.
What is a Neural Network?
• Neural networks are machine learning models that mimic
the complex functions of the human brain.
• A neural network makes decisions in a manner similar to
the human brain, by using processes that mimic the way
biological neurons work together to identify phenomena,
weigh options and arrive at conclusions.
• These models consist of interconnected nodes or neurons
that process data, learn patterns, and enable tasks such as
pattern recognition and decision-making.
• Their ability to learn from vast amounts of data is
transformative, impacting technologies like natural
language processing, self-driving vehicles,
and automated decision-making.
● In neurons, the dendrite receives electrical signals from the axons of other neurons, while in the perceptron these electrical signals are represented as numerical values.
● At the synapses between the dendrite
and axons, electrical signals are
modulated in various amounts. This is
also modeled in the perceptron by
multiplying each input value by a value
called the weight.
● An actual neuron fires an output signal
only when the total strength of the
input signals exceeds a certain
threshold. This phenomenon is
modeled in a perceptron by calculating
the weighted sum of the inputs to
represent the total strength of the input
signals, and applying a threshold
function on the sum to determine its
output.
Perceptron
• A perceptron is a single processing unit of a neural network.
• Associated with each input, Xj ∊ R, j = 1, ..., d, is a connection weight, or synaptic weight, Wj ∊ R, and the output, y, in the simplest case is a weighted sum of the inputs: y = Σj Wj Xj + W0, where W0 is the bias.
• A perceptron can be defined as a single artificial neuron that
computes its weighted input with the help of the threshold activation
functions.
• They are used for Binary Classification.
• Neural networks are capable of learning and identifying
patterns directly from data without pre-defined rules.
These networks are built from several key components:
• Neurons: The basic units that receive inputs, each neuron
is governed by a threshold and an activation function.
• Connections: Links between neurons that carry
information, regulated by weights and biases.
• Weights and Biases: These parameters determine the
strength and influence of connections.
• Propagation Functions: Mechanisms that help process
and transfer data across layers of neurons.
• Learning Rule: The method that adjusts weights and
biases over time to improve accuracy.
Learning in neural networks follows a structured,
three-stage process:

• Input Computation: Data is fed into the network.


• Output Generation: Based on the current parameters, the
network generates an output.
• Iterative Refinement: The network refines its output by
adjusting weights and biases, gradually improving its
performance on diverse tasks.
Perceptron : Mathematical Representation
• The process of summing the weighted products of inputs in a perceptron is known as a multiply-accumulate (MAC) operation. This is mathematically represented as matrix multiplication.
• Since the inputs and weights are represented as 1-D matrices for a perceptron, they are often called vectors.
Example
• Problem : whether you should go surfing (Yes: 1, No: 0).
The decision to go or not to go is our predicted outcome,
y.
• Let’s assume that there are three factors influencing your
decision-making:
- Are the waves good? (Yes: 1, No: 0)
- Is the line-up empty? (Yes: 1, No: 0)
- Has there been a recent shark attack? (Yes: 0, No: 1)
Then, let’s assume the following, giving us the following
inputs:
• X1 = 1, since the waves are pumping
• X2 = 0, since the crowds are out
• X3 = 1, since there hasn’t been a recent shark attack

Now, we need to assign some weights to determine importance. Larger weights signify that particular variables are of greater importance to the decision or outcome.
• W1 = 5, since large swells don’t come around often
• W2 = 2, since you’re used to the crowds
• W3 = 4, since you have a fear of sharks
• Finally, we’ll also assume a threshold value of 3, which
would translate to a bias value of –3. With all the various
inputs, we can start to plug in values into the formula to get
the desired output.
Y = (1*5) + (0*2) + (1*4) – 3 = 6
• If we use the activation function from the beginning of this
section, we can determine that the output of this node
would be 1, since 6 is greater than 0.
• In this instance, you would go surfing; but if we adjust the
weights or the threshold, we can achieve different
outcomes from the model.
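A minimal Python sketch of this worked example (the step activation and variable names are illustrative, not from the slides):

```python
# Perceptron decision for the surfing example: weighted sum plus bias,
# followed by a step (threshold) activation.
def step(z):
    return 1 if z > 0 else 0

x = [1, 0, 1]        # waves good, line-up not empty, no recent shark attack
w = [5, 2, 4]        # importance weights
bias = -3            # threshold of 3 expressed as a bias of -3

z = sum(wi * xi for wi, xi in zip(w, x)) + bias   # (1*5) + (0*2) + (1*4) - 3 = 6
y = step(z)
print(z, y)          # 6 1  -> go surfing
```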
Layers in Neural Network Architecture
• Input Layer: This is where the network receives its input
data. Each input neuron in the layer corresponds to a
feature in the input data.
• Hidden Layers: These layers perform most of the
computational heavy lifting. A neural network can have
one or multiple hidden layers. Each layer consists of units
(neurons) that transform the inputs into something that the
output layer can use.
• Output Layer: The final layer produces the output of the
model. The format of these outputs varies depending on
the specific task (e.g., classification, regression).
Types of neural networks
• The perceptron is the oldest neural network, created by
Frank Rosenblatt in 1958.
• Feed-forward neural networks, or multi-layer perceptrons (MLPs), are in widespread use today. They are comprised of an input layer, a hidden layer or layers, and an output layer.
• While these neural networks are also commonly referred to
as MLPs, it’s important to note that they are actually
comprised of sigmoid neurons, not perceptrons, as most
real-world problems are nonlinear.
• Data usually is fed into these models to train them, and
they are the foundation for computer vision, natural
language processing and other neural networks.
• Convolutional Neural Networks(CNNs) are similar to
feed forward networks, but they’re usually utilized for
image recognition, pattern recognition, and/or computer
vision.
• These networks harness principles from linear algebra,
particularly matrix multiplication, to identify patterns
within an image.
• Recurrent Neural Networks (RNNs) are identified by
their feedback loops.
• These learning algorithms are primarily leveraged when
using time-series data to make predictions about future
outcomes, such as stock market predictions or sales
forecasting.
FEED-Forward Network
• A feed-forward network usually has multiple inputs and multiple outputs, as shown below, with multiple feed-forward paths.
• In such a network, the weight vector becomes an m×n weight matrix.
FEED-Forward Network

• The weight matrix can be modelled as a column matrix of multiple row vectors, each constituting the weights corresponding to one output.
FEED-Forward Network
• In the matrix operation below, u1, u2 and u3 represent the inputs and v1, v2 and v3 represent the outputs.
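The matrix itself appears as a figure on the slide; a small NumPy sketch with made-up weight values illustrates the same operation:

```python
import numpy as np

# Each row of W holds the weights for one output neuron.
W = np.array([[0.2, 0.5, -0.1],
              [0.7, -0.3, 0.4],
              [0.1, 0.9, 0.6]])   # illustrative values

u = np.array([1.0, 0.5, -1.0])    # inputs u1, u2, u3
v = W @ u                         # outputs v1, v2, v3 (one multiply-accumulate per row)
print(v)
```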
Training a perceptron
The perceptron defines a hyperplane, and the neural network perceptron is just a way of implementing the hyperplane. The process of training a perceptron includes the following two tasks:

1. Find suitable weights such that the training examples are correctly classified.

2. Geometrically, try to find a hyperplane that separates the examples into two different classes.
SINGLE LAYER PERCEPTRON

Cons
• Perceptron can only perform binary classification.

• Perceptron is able to offer accurate results only if the analyzed data is linearly separable, i.e., it can only create linear decision boundaries.

• When confronted with noisy data, the perceptron algorithm often exhibits a notable lack of robustness.

• The perceptron algorithm is notably sensitive to feature scaling, meaning that the relative magnitude of input features can greatly impact its performance.
Logic Gate Implementation Using Perceptron
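The slide presents this as a figure; below is a minimal sketch, with hand-picked (not learned) weights and thresholds, of how a single perceptron can realize AND, OR and NOT:

```python
def perceptron(x, w, bias):
    # Step (threshold) activation on the weighted sum.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + bias > 0 else 0

def AND(a, b): return perceptron([a, b], [1, 1], -1.5)
def OR(a, b):  return perceptron([a, b], [1, 1], -0.5)
def NOT(a):    return perceptron([a], [-1], 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(f"a={a} b={b}  AND={AND(a, b)}  OR={OR(a, b)}  NOT(a)={NOT(a)}")
```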
Multi Layer Perceptron / Neural Network
Modeling non-linear decision boundaries
MULTI-LAYER PERCEPTRON
• Implementing AND, OR and NOT gates using a single-layer perceptron is simple. But exclusive-OR (XOR) cannot be implemented due to its non-linear nature.

• A multi-layer perceptron (MLP) is used to model such non-linear decision boundaries.
MULTI LAYER PERCEPTRON
● A simple multilayer perceptron contains an input layer, a hidden layer and an output layer.
● Activation functions in hidden layers enable solving complex problems and create non-linear decision boundaries.
● A multi-layer perceptron, or neural network, is a group of artificial neurons designed to recognize patterns and make decisions based on input data.
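As a concrete illustration of the earlier XOR remark, here is a minimal forward pass through a two-layer perceptron with hand-picked weights (illustrative values, not learned):

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)

# Hidden unit 1 computes OR, hidden unit 2 computes AND; the output fires
# when OR is true but AND is not, which is exactly XOR.
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
b_hidden = np.array([-0.5, -1.5])
w_out = np.array([1.0, -1.0])
b_out = -0.5

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    h = step(W_hidden @ np.array(x) + b_hidden)   # hidden layer
    y = step(w_out @ h + b_out)                   # output layer
    print(x, int(y))                              # 0, 1, 1, 0
```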
Activation Functions
• While building a neural network, one key decision is
selecting the Activation Function for both the hidden layer
and the output layer.
• Activation functions decide whether a neuron should be
activated.
• An activation function is a mathematical function applied
to the output of a neuron.
• It introduces non-linearity into the model, allowing the
network to learn and represent complex patterns in the
data.
• Without this non-linearity feature, a neural network would
behave like a linear regression model, no matter how many
layers it has.
• Activation function decides whether a neuron should be
activated by calculating the weighted sum of inputs and
adding a bias term.
• This helps the model make complex decisions and
predictions by introducing non-linearities to the output of
each neuron.
• Non-linearity means that the relationship between input
and output is not a straight line.
• In simple terms, the output does not change
proportionally with the input. A common choice is the
ReLU function, defined as σ(x)=max(0,x).
• If we use a linear function, the network can only separate the data classes using a straight line.
• But real-world data is often more complex (e.g.,
overlapping colors, different lighting).
• By adding a non-linear activation function (like ReLU,
Sigmoid, or Tanh), the network can create curved
decision boundaries to separate them correctly.
Why is Non-Linearity Important in Neural
Networks?
• Neural networks consist of neurons that operate
using weights, biases, and activation functions.
• In the learning process, these weights and biases are
updated based on the error produced at the output—a
process known as back propagation.
• Activation functions enable backpropagation by providing
gradients that are essential for updating the weights and
biases.
• Without non-linearity, even deep networks would be
limited to solving only simple, linearly separable problems.
Activation functions empower neural networks to model
highly complex data distributions and solve advanced deep
learning tasks.
Mathematical Proof of the Need for Non-Linearity in Neural Networks
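The derivation itself appears as a figure on the slide. The core of the standard argument: if every layer is linear, say y = W2(W1x + b1) + b2, this collapses to y = (W2W1)x + (W2b1 + b2), which a single linear layer can already represent. A small NumPy check of that identity (the matrices are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)
x = rng.standard_normal(3)

two_layers = W2 @ (W1 @ x + b1) + b2           # two stacked linear layers
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)     # the equivalent single layer
print(np.allclose(two_layers, one_layer))      # True
```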
Types of Activation Functions

1. Linear Activation Function
2. Non-Linear Activation Functions
   - Sigmoid Function
   - Tanh Activation Function
   - ReLU (Rectified Linear Unit) Function
3. Exponential Linear Units
   - Softmax Function
   - Softplus Function
Linear Activation Function
• The Linear Activation Function resembles a straight line, defined by y = x. No matter how many layers the neural network contains, if they all use linear activation functions, the output is a linear combination of the input.
• The range of the output spans (−∞, +∞).
• The linear activation function is used in just one place, i.e., the output layer.
• Using linear activation across all layers limits the network's ability to learn complex patterns.
Linear Activation Function
Non-Linear Activation Functions
Sigmoid Function
• It is characterized by an 'S' shape. It is mathematically defined as A = 1/(1 + e−x). This formula ensures a smooth and continuous output that is essential for gradient-based optimization methods.
• It allows neural networks to handle and model complex
patterns that linear equations cannot.
• The output ranges between 0 and 1, hence useful for binary
classification.
• The function exhibits a steep gradient when x values are
between -2 and 2. This sensitivity means that small
changes in input x can cause significant changes in output
y, which is critical during the training process.
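A minimal NumPy sketch of the sigmoid as defined above:

```python
import numpy as np

def sigmoid(x):
    # Smooth 'S'-shaped squashing of any real input into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-2.0, 0.0, 2.0])))   # roughly [0.12, 0.5, 0.88]
```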
Sigmoid Function
Tanh Activation Function
• The Tanh Function (hyperbolic tangent function) is a shifted version of the sigmoid, allowing it to stretch across the y-axis. It is defined as:
f(x) = tanh(x) = 2/(1 + e−2x) − 1
• Alternatively, it can be expressed using the sigmoid function:
tanh(x) = 2 × sigmoid(2x) − 1
• Value Range: Outputs values from -1 to +1.
• Non-linear: Enables modeling of complex data patterns.
• Use in Hidden Layers: Commonly used in hidden layers
due to its zero-centered output, facilitating easier learning
for subsequent layers.
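A quick NumPy check of the identity tanh(x) = 2 × sigmoid(2x) − 1 (an illustrative sketch):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3, 3, 7)
# tanh expressed through the sigmoid, as in the formula above.
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))   # True
```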
Tanh Activation Function
ReLU (Rectified Linear Unit) Function
• ReLU is defined by A(x) = max(0, x): if the input x is positive, ReLU returns x; if the input is negative, it returns 0.
• Value Range: [0,∞), meaning the function only outputs
non-negative values.
• Nature: It is a non-linear activation function, allowing
neural networks to learn complex patterns and making
back propagation more efficient.
• Advantage over other activations: ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. At any given time only a few neurons are activated, which makes the network sparse and therefore efficient and easy to compute.
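A minimal NumPy sketch of ReLU as defined above:

```python
import numpy as np

def relu(x):
    # Passes positive inputs through unchanged, clamps negatives to 0.
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))   # [0.  0.  0.  1.5 3. ]
```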
ReLU (Rectified Linear Unit) Function
Exponential Linear Units
Softmax Function
• Softmax Function is designed to handle multi-class
classification problems. It transforms raw output scores
from a neural network into probabilities.
• It works by squashing the output values of each class into
the range of 0 to 1, while ensuring that the sum of all
probabilities equals 1.
• Softmax is a non-linear activation function.
• The Softmax function ensures that each class is assigned a
probability, helping to identify which class the input
belongs to.
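A minimal NumPy sketch of softmax; subtracting the maximum score before exponentiating is a common numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(z):
    # Squash raw class scores into probabilities that sum to 1.
    e = np.exp(z - np.max(z))   # stability: shift by the max score
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # raw class scores (logits), illustrative
p = softmax(scores)
print(p, p.sum())                    # probabilities summing to 1
```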
Softmax Function
SoftPlus Function
• Softplus Activation Function is defined mathematically
as: A(x)=log(1+ ex).
• This equation ensures that the output is always positive
and differentiable at all points, which is an advantage over
the traditional ReLU function.
• Nature: The Softplus function is non-linear.
• Range: The function outputs values in the range (0,∞),
similar to ReLU, but without the hard zero threshold that
ReLU has.
• Smoothness: Softplus is a smooth, continuous function,
meaning it avoids the sharp discontinuities of ReLU,
which can sometimes lead to problems during
optimization.
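A minimal NumPy sketch of softplus as defined above:

```python
import numpy as np

def softplus(x):
    # Smooth approximation of ReLU: always positive, differentiable everywhere.
    return np.log1p(np.exp(x))

x = np.array([-4.0, 0.0, 4.0])
print(softplus(x))   # approaches 0 for large negative x, roughly x for large positive x
```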
SoftPlus Function
TRAINING NN
Stochastic Gradient Descent

In training neural networks, we generally use online learning, where we are not given the whole sample but are given instances one by one, and we would like the network to update its parameters after each instance, adapting itself slowly with time. In online learning, we do not write the error function over the whole sample but on individual instances. Online training is favoured for a number of reasons:

1. It saves us the cost of storing the training sample in external memory and storing the intermediate results during optimization.

2. The problem may be changing in time, which means that the sample distribution is not fixed, and a training set cannot be chosen a priori.

3. There may be physical changes in the system. For example, in a robotic system, the components of the system may wear out, or sensors may degrade.
Cost Function

● A cost function is a function that measures the performance of a model for any given data. It quantifies the error between predicted values and expected values and presents it in the form of a single real number.
● A common cost function for regression problems is the Mean Squared Error (MSE). It calculates the average of the squared differences between predicted and actual values:

MSE = J(θ0, θ1) = (1/n) Σi (hθ(xi) − yi)²

where θ0 and θ1 are the bias and weight respectively, hθ(x) is the predicted output, and xi and yi are the input and actual output at the ith index.
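A minimal NumPy sketch of this cost for a simple linear model hθ(x) = θ0 + θ1·x (the data values are illustrative):

```python
import numpy as np

def mse_cost(theta0, theta1, x, y):
    # Mean squared error for the linear model h(x) = theta0 + theta1 * x.
    predictions = theta0 + theta1 * x
    return np.mean((predictions - y) ** 2)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.1, 5.9])
print(mse_cost(0.0, 2.0, x, y))   # small error for a nearly perfect fit
```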
Gradient Descent

● In online learning, we do not write the error function over the whole sample but on individual instances.
● Starting from random initial weights, at each iteration we adjust the parameters a little to minimize the error, without forgetting what we have previously learned.
● If this error function is differentiable, we can use gradient descent.
● We descend along this error gradient until we reach the minimum, where the error is lowest.