
Artificial Intelligence

UNIT V

NEURAL NETWORKS
Contents
• Perceptron - Multilayer perceptron, Activation functions,
network training
• Gradient descent optimization – stochastic gradient
descent,
• Error back propagation, from shallow networks to deep
networks
• Unit saturation (aka the vanishing gradient problem)
• ReLU, hyperparameter tuning
• batch normalization, regularization, dropout.
What is a Neural Network?
• Neural networks are machine learning models that mimic
the complex functions of the human brain.
• A neural network makes decisions in a manner similar to
the human brain, by using processes that mimic the way
biological neurons work together to identify phenomena,
weigh options and arrive at conclusions.
• These models consist of interconnected nodes or neurons
that process data, learn patterns, and enable tasks such as
pattern recognition and decision-making.
• Their ability to learn from vast amounts of data is
transformative, impacting technologies like natural
language processing, self-driving vehicles,
and automated decision-making.
● In neurons, the dendrite receives electrical signals from the axons of other neurons, while in the perceptron these electrical signals are represented as numerical values.
● At the synapses between the dendrite
and axons, electrical signals are
modulated in various amounts. This is
also modeled in the perceptron by
multiplying each input value by a value
called the weight.
● An actual neuron fires an output signal
only when the total strength of the
input signals exceeds a certain
threshold. This phenomenon is
modeled in a perceptron by calculating
the weighted sum of the inputs to
represent the total strength of the input
signals, and applying a threshold
function on the sum to determine its
output.
Perceptron
• A perceptron is a single processing unit of a neural network.
• Associated with each input, Xj ∊ R, j = 1, ..., d, is a connection weight, or synaptic weight, Wj ∊ R, and the output, y, in the simplest case is a weighted sum of the inputs: y = Σj Wj Xj + W0, where W0 is the bias.
• A perceptron can be defined as a single artificial neuron that
computes its weighted input with the help of the threshold activation
functions.
• They are used for Binary Classification.
• Neural networks are capable of learning and identifying
patterns directly from data without pre-defined rules.
These networks are built from several key components:
• Neurons: The basic units that receive inputs, each neuron
is governed by a threshold and an activation function.
• Connections: Links between neurons that carry
information, regulated by weights and biases.
• Weights and Biases: These parameters determine the
strength and influence of connections.
• Propagation Functions: Mechanisms that help process
and transfer data across layers of neurons.
• Learning Rule: The method that adjusts weights and
biases over time to improve accuracy.
Learning in neural networks follows a structured,
three-stage process:

• Input Computation: Data is fed into the network.


• Output Generation: Based on the current parameters, the
network generates an output.
• Iterative Refinement: The network refines its output by
adjusting weights and biases, gradually improving its
performance on diverse tasks.
Perceptron : Mathematical Representation
• The process of summing the weighted products of inputs in a perceptron is known as a multiply-accumulate (MAC) operation. This is mathematically represented as matrix multiplication.
• Since the inputs and weights are represented as 1-D matrices for a perceptron, they are often called vectors.
Example
• Problem : whether you should go surfing (Yes: 1, No: 0).
The decision to go or not to go is our predicted outcome,
y.
• Let’s assume that there are three factors influencing your
decision-making:
- Are the waves good? (Yes: 1, No: 0)
- Is the line-up empty? (Yes: 1, No: 0)
- Has there been a recent shark attack? (Yes: 0, No: 1)
Then, let’s assume the following, giving us the following
inputs:
• X1 = 1, since the waves are pumping
• X2 = 0, since the crowds are out
• X3 = 1, since there hasn’t been a recent shark attack

Now, we need to assign some weights to determine importance. Larger weights signify that particular variables are of greater importance to the decision or outcome.
• W1 = 5, since large swells don’t come around often
• W2 = 2, since you’re used to the crowds
• W3 = 4, since you have a fear of sharks
• Finally, we’ll also assume a threshold value of 3, which
would translate to a bias value of –3. With all the various
inputs, we can start to plug in values into the formula to get
the desired output.
Y = (1*5) + (0*2) + (1*4) – 3 = 6
• If we use the activation function from the beginning of this
section, we can determine that the output of this node
would be 1, since 6 is greater than 0.
• In this instance, you would go surfing; but if we adjust the
weights or the threshold, we can achieve different
outcomes from the model.
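A minimal Python sketch of this worked example (the step activation and variable names are illustrative, not from the slides):

```python
# Perceptron decision for the surfing example: weighted sum plus bias,
# followed by a step (threshold) activation.
def step(z):
    return 1 if z > 0 else 0

x = [1, 0, 1]        # waves good, line-up not empty, no recent shark attack
w = [5, 2, 4]        # importance weights
bias = -3            # threshold of 3 expressed as a bias of -3

z = sum(wi * xi for wi, xi in zip(w, x)) + bias   # (1*5) + (0*2) + (1*4) - 3 = 6
y = step(z)
print(z, y)          # 6 1  -> go surfing
```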
Layers in Neural Network Architecture
• Input Layer: This is where the network receives its input
data. Each input neuron in the layer corresponds to a
feature in the input data.
• Hidden Layers: These layers perform most of the
computational heavy lifting. A neural network can have
one or multiple hidden layers. Each layer consists of units
(neurons) that transform the inputs into something that the
output layer can use.
• Output Layer: The final layer produces the output of the
model. The format of these outputs varies depending on
the specific task (e.g., classification, regression).
Types of neural networks
• The perceptron is the oldest neural network, created by
Frank Rosenblatt in 1958.
• Feed-forward neural networks, or multi-layer perceptrons (MLPs), are in widespread use today. They are comprised of an input layer, a hidden layer or layers, and an output layer.
• While these neural networks are also commonly referred to
as MLPs, it’s important to note that they are actually
comprised of sigmoid neurons, not perceptrons, as most
real-world problems are nonlinear.
• Data usually is fed into these models to train them, and
they are the foundation for computer vision, natural
language processing and other neural networks.
• Convolutional Neural Networks(CNNs) are similar to
feed forward networks, but they’re usually utilized for
image recognition, pattern recognition, and/or computer
vision.
• These networks harness principles from linear algebra,
particularly matrix multiplication, to identify patterns
within an image.
• Recurrent Neural Networks (RNNs) are identified by
their feedback loops.
• These learning algorithms are primarily leveraged when
using time-series data to make predictions about future
outcomes, such as stock market predictions or sales
forecasting.
FEED-Forward Network
• A feed-forward network usually has multiple inputs and multiple outputs, as shown below, with multiple feed-forward paths.
• In such a network, the weight vector becomes an m×n weight matrix.
FEED-Forward Network

• The weight matrix can be modelled as a column matrix of multiple row vectors, each constituting the weights corresponding to one output.
FEED-Forward Network
• In the matrix operation below, u1, u2 and u3 represent the inputs and v1, v2 and v3 represent the outputs.
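The matrix itself appears as a figure on the slide; a small NumPy sketch with made-up weight values illustrates the same operation:

```python
import numpy as np

# Each row of W holds the weights for one output neuron.
W = np.array([[0.2, 0.5, -0.1],
              [0.7, -0.3, 0.4],
              [0.1, 0.9, 0.6]])   # illustrative values

u = np.array([1.0, 0.5, -1.0])    # inputs u1, u2, u3
v = W @ u                         # outputs v1, v2, v3 (one multiply-accumulate per row)
print(v)
```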
Training a perceptron
The perceptron defines a hyperplane, and the neural network perceptron is just a way of implementing the hyperplane. The process of training a perceptron includes the following two tasks:

1. Find suitable weights such that the training examples are correctly classified.

2. Geometrically, try to find a hyperplane that separates the examples into two different classes.
SINGLE LAYER PERCEPTRON

Cons
• Perceptron can only perform binary classification.

• Perceptron is able to offer accurate results only if the analyzed data is linearly separable, i.e., it can only create linear decision boundaries.

• When confronted with noisy data, the perceptron algorithm often exhibits a notable lack of robustness.

• The perceptron algorithm is notably sensitive to feature scaling, meaning that the relative magnitude of input features can greatly impact its performance.
Logic Gate Implementation Using Perceptron
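The slide presents this as a figure; below is a minimal sketch, with hand-picked (not learned) weights and thresholds, of how a single perceptron can realize AND, OR and NOT:

```python
def perceptron(x, w, bias):
    # Step (threshold) activation on the weighted sum.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + bias > 0 else 0

def AND(a, b): return perceptron([a, b], [1, 1], -1.5)
def OR(a, b):  return perceptron([a, b], [1, 1], -0.5)
def NOT(a):    return perceptron([a], [-1], 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(f"a={a} b={b}  AND={AND(a, b)}  OR={OR(a, b)}  NOT(a)={NOT(a)}")
```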
Multi Layer Perceptron / Neural Network
Modeling non-linear decision boundaries
MULTI-LAYER PERCEPTRON
• Implementing AND, OR and NOT gates using a single-layer perceptron is simple. But exclusive-OR (XOR) cannot be implemented due to its non-linear nature.

• A multi-layer perceptron (MLP) is used to model such non-linear decision boundaries.
MULTI LAYER PERCEPTRON
● A simple multilayer perceptron contains an input layer, a hidden layer and an output layer.
● Activation functions in hidden layers enable solving complex problems and create non-linear decision boundaries.
● A multi-layer perceptron, or neural network, is a group of artificial neurons designed to recognize patterns and make decisions based on input data.
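As a concrete illustration of the earlier XOR remark, here is a minimal forward pass through a two-layer perceptron with hand-picked weights (illustrative values, not learned):

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)

# Hidden unit 1 computes OR, hidden unit 2 computes AND; the output fires
# when OR is true but AND is not, which is exactly XOR.
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
b_hidden = np.array([-0.5, -1.5])
w_out = np.array([1.0, -1.0])
b_out = -0.5

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    h = step(W_hidden @ np.array(x) + b_hidden)   # hidden layer
    y = step(w_out @ h + b_out)                   # output layer
    print(x, int(y))                              # 0, 1, 1, 0
```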
Activation Functions
• While building a neural network, one key decision is
selecting the Activation Function for both the hidden layer
and the output layer.
• Activation functions decide whether a neuron should be
activated.
• An activation function is a mathematical function applied
to the output of a neuron.
• It introduces non-linearity into the model, allowing the
network to learn and represent complex patterns in the
data.
• Without this non-linearity feature, a neural network would
behave like a linear regression model, no matter how many
layers it has.
• Activation function decides whether a neuron should be
activated by calculating the weighted sum of inputs and
adding a bias term.
• This helps the model make complex decisions and
predictions by introducing non-linearities to the output of
each neuron.
• Non-linearity means that the relationship between input
and output is not a straight line.
• In simple terms, the output does not change
proportionally with the input. A common choice is the
ReLU function, defined as σ(x)=max(0,x).
• If we use a linear function, the network can only separate the data classes using a straight line.
• But real-world data is often more complex (e.g.,
overlapping colors, different lighting).
• By adding a non-linear activation function (like ReLU,
Sigmoid, or Tanh), the network can create curved
decision boundaries to separate them correctly.
Why is Non-Linearity Important in Neural
Networks?
• Neural networks consist of neurons that operate
using weights, biases, and activation functions.
• In the learning process, these weights and biases are
updated based on the error produced at the output—a
process known as back propagation.
• Activation functions enable backpropagation by providing
gradients that are essential for updating the weights and
biases.
• Without non-linearity, even deep networks would be
limited to solving only simple, linearly separable problems.
Activation functions empower neural networks to model
highly complex data distributions and solve advanced deep
learning tasks.
Mathematical Proof of the Need for Non-Linearity in Neural Networks
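The derivation itself appears as a figure on the slide. The core of the standard argument: if every layer is linear, say y = W2(W1x + b1) + b2, this collapses to y = (W2W1)x + (W2b1 + b2), which a single linear layer can already represent. A small NumPy check of that identity (the matrices are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)
x = rng.standard_normal(3)

two_layers = W2 @ (W1 @ x + b1) + b2           # two stacked linear layers
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)     # the equivalent single layer
print(np.allclose(two_layers, one_layer))      # True
```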
Types of Activation Functions

1. Linear Activation Function
2. Non-Linear Activation Functions
   - Sigmoid Function
   - Tanh Activation Function
   - ReLU (Rectified Linear Unit) Function
3. Exponential Linear Units
   - Softmax Function
   - Softplus Function
Linear Activation Function
• The Linear Activation Function resembles a straight line, defined by y = x. No matter how many layers the neural network contains, if they all use linear activation functions, the output is a linear combination of the input.
• The range of the output spans (−∞, +∞).
• The linear activation function is used in just one place, i.e., the output layer.
• Using linear activation across all layers limits the network's ability to learn complex patterns.
Linear Activation Function
Non-Linear Activation Functions
Sigmoid Function
• It is characterized by an 'S' shape. It is mathematically defined as A = 1/(1 + e−x). This formula ensures a smooth and continuous output that is essential for gradient-based optimization methods.
• It allows neural networks to handle and model complex
patterns that linear equations cannot.
• The output ranges between 0 and 1, hence useful for binary
classification.
• The function exhibits a steep gradient when x values are
between -2 and 2. This sensitivity means that small
changes in input x can cause significant changes in output
y, which is critical during the training process.
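A minimal NumPy sketch of the sigmoid as defined above:

```python
import numpy as np

def sigmoid(x):
    # Smooth 'S'-shaped squashing of any real input into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-2.0, 0.0, 2.0])))   # roughly [0.12, 0.5, 0.88]
```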
Sigmoid Function
Tanh Activation Function
• The Tanh Function (hyperbolic tangent function) is a shifted version of the sigmoid, allowing it to stretch across the y-axis. It is defined as:
f(x) = tanh(x) = 2/(1 + e−2x) − 1
• Alternatively, it can be expressed using the sigmoid function:
tanh(x) = 2 × sigmoid(2x) − 1
• Value Range: Outputs values from -1 to +1.
• Non-linear: Enables modeling of complex data patterns.
• Use in Hidden Layers: Commonly used in hidden layers
due to its zero-centered output, facilitating easier learning
for subsequent layers.
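A quick NumPy check of the identity tanh(x) = 2 × sigmoid(2x) − 1 (an illustrative sketch):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3, 3, 7)
# tanh expressed through the sigmoid, as in the formula above.
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))   # True
```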
Tanh Activation Function
ReLU (Rectified Linear Unit) Function
• ReLU is defined by A(x) = max(0, x): if the input x is positive, ReLU returns x; if the input is negative, it returns 0.
• Value Range: [0,∞), meaning the function only outputs
non-negative values.
• Nature: It is a non-linear activation function, allowing
neural networks to learn complex patterns and making
back propagation more efficient.
• Advantage over other activations: ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. At any given time only a few neurons are activated, which makes the network sparse and therefore efficient and easy to compute.
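A minimal NumPy sketch of ReLU as defined above:

```python
import numpy as np

def relu(x):
    # Passes positive inputs through unchanged, clamps negatives to 0.
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))   # [0.  0.  0.  1.5 3. ]
```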
ReLU (Rectified Linear Unit) Function
Exponential Linear Units
Softmax Function
• Softmax Function is designed to handle multi-class
classification problems. It transforms raw output scores
from a neural network into probabilities.
• It works by squashing the output values of each class into
the range of 0 to 1, while ensuring that the sum of all
probabilities equals 1.
• Softmax is a non-linear activation function.
• The Softmax function ensures that each class is assigned a
probability, helping to identify which class the input
belongs to.
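A minimal NumPy sketch of softmax; subtracting the maximum score before exponentiating is a common numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(z):
    # Squash raw class scores into probabilities that sum to 1.
    e = np.exp(z - np.max(z))   # stability: shift by the max score
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # raw class scores (logits), illustrative
p = softmax(scores)
print(p, p.sum())                    # probabilities summing to 1
```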
Softmax Function
SoftPlus Function
• Softplus Activation Function is defined mathematically
as: A(x)=log(1+ ex).
• This equation ensures that the output is always positive
and differentiable at all points, which is an advantage over
the traditional ReLU function.
• Nature: The Softplus function is non-linear.
• Range: The function outputs values in the range (0,∞),
similar to ReLU, but without the hard zero threshold that
ReLU has.
• Smoothness: Softplus is a smooth, continuous function,
meaning it avoids the sharp discontinuities of ReLU,
which can sometimes lead to problems during
optimization.
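A minimal NumPy sketch of softplus as defined above:

```python
import numpy as np

def softplus(x):
    # Smooth approximation of ReLU: always positive, differentiable everywhere.
    return np.log1p(np.exp(x))

x = np.array([-4.0, 0.0, 4.0])
print(softplus(x))   # approaches 0 for large negative x, roughly x for large positive x
```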
SoftPlus Function
TRAINING NN
Stochastic Gradient Descent

In training neural networks, we generally use online learning, where we are not given the whole sample but are given instances one by one, and we would like the network to update its parameters after each instance, adapting itself slowly with time. In online learning, we do not write the error function over the whole sample but on individual instances. Online training is favoured for a number of reasons:

1. It saves us the cost of storing the training sample in external memory and storing the intermediate results during optimization.

2. The problem may be changing in time, which means that the sample distribution is not fixed, and a training set cannot be chosen a priori.

3. There may be physical changes in the system. For example, in a robotic system, the components of the system may wear out, or sensors may degrade.
Cost Function

● A cost function is a function that measures the performance of a model for any given data. It quantifies the error between predicted values and expected values and presents it in the form of a single real number.
● A common cost function for regression problems is the Mean Squared Error (MSE). It calculates the average of the squared differences between predicted and actual values:

MSE = J(θ0, θ1) = (1/n) Σi (hθ(xi) − yi)²

where θ0 and θ1 are the bias and weight respectively, hθ(x) is the predicted output, and xi and yi are the input and actual output at the ith index.
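A minimal NumPy sketch of this cost for a simple linear model hθ(x) = θ0 + θ1·x (the data values are illustrative):

```python
import numpy as np

def mse_cost(theta0, theta1, x, y):
    # Mean squared error for the linear model h(x) = theta0 + theta1 * x.
    predictions = theta0 + theta1 * x
    return np.mean((predictions - y) ** 2)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.1, 5.9])
print(mse_cost(0.0, 2.0, x, y))   # small error for a nearly perfect fit
```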
Gradient Descent

● In online learning, we do not write the error function over the whole sample but on individual instances.
● Starting from random initial weights, at each iteration we adjust the parameters a little to minimize the error, without forgetting what we have previously learned.
● If this error function is differentiable, we can use gradient descent.
● We descend along this error gradient until we reach the minimum, where the error is lowest.