
ARTIFICIAL NEURAL NETWORKS

INTRODUCTION
Neural network learning methods provide a robust approach to approximating real-valued, discrete-
valued, and vector-valued target functions. For certain types of problems, such as learning to interpret
complex real-world sensor data, artificial neural networks are among the most effective learning methods
currently known.
Biological Motivation
The study of artificial neural networks (ANNs) has been inspired in part by the observation that biological
learning systems are built of very complex webs of interconnected neurons.
Artificial neural networks are built out of a densely interconnected set of simple units, where each unit
takes a number of real-valued inputs (possibly the outputs of other units) and produces a single real-
valued output (which may become the input to many other units)
Consider a few facts from neurobiology
 Neuron switching time: ~0.001 second
 Number of neurons: ~10^11
 Connections per neuron: ~10^4 to 10^5
 Scene recognition time: ~0.1 second
 100 inference steps doesn't seem like enough
 Therefore, much parallel computation must take place
Properties of artificial neural nets (ANNs):
 Many neuron-like threshold switching units
 Many weighted interconnections among units
 Highly parallel, distributed process
 Emphasis on tuning weights automatically
Historically, two groups of researchers have worked with artificial neural networks.
 One group has been motivated by the goal of using ANNs to study and model biological learning
processes.
 A second group has been motivated by the goal of obtaining highly effective machine learning
algorithms, independent of whether these algorithms mirror biological processes.
NEURAL NETWORK REPRESENTATIONS
A prototypical example of ANN learning is provided by Pomerleau's (1993) system ALVINN, which uses
a learned ANN to steer an autonomous vehicle driving at normal speeds on public highways.
The input to the neural network is a 30 x 32 grid of pixel intensities obtained from a forward-pointed
camera mounted on the vehicle.
The network output is the direction in which the vehicle is steered.
The ANN is trained to mimic the observed steering commands of a human driving the vehicle for
approximately 5 minutes.
ALVINN has used its learned networks to successfully drive at speeds up to 70 miles per hour and for
distances of 90 miles on public highways (driving in the left lane of a divided public highway, with other
vehicles present)
Neural network learning to steer an autonomous vehicle.
The ALVINN system uses BACKPROPAGATION to learn to steer an autonomous vehicle (photo at top)
driving at speeds up to 70 miles per hour. The diagram on the left shows how the image of a forward-
mounted camera is mapped to 960 neural network inputs, which are fed forward to 4 hidden units,
connected to 30 output units. Network outputs encode the commanded steering direction

ANN learning is appropriate for problems with the following characteristics:


 Instances are represented by many attribute-value pairs.
The target function to be learned is defined over instances that can be described by a vector of predefined
features, such as the pixel values in the ALVINN example. These input attributes may be highly
correlated or independent of one another. Input values can be any real values.
 The target function output may be discrete-valued, real-valued, or a vector of several real-
or discrete-valued attributes.
For example, in the ALVINN system the output is a vector of 30 attributes, each corresponding to a
recommendation regarding the steering direction. The value of each output is some real number between
0 and 1, which in this case corresponds to the confidence in predicting the corresponding steering
direction. We can also train a single network to output both the steering command and suggested
acceleration, simply by concatenating the vectors that encode these two output predictions.
 The training examples may contain errors.
ANN learning methods are quite robust to noise in the training data.
 Long training times are acceptable.
Network training algorithms typically require longer training times than, say, decision tree learning
algorithms. Training times can range from a few seconds to many hours, depending on factors such as the
number of weights in the network, the number of training examples considered, and the settings of
various learning algorithm parameters.
 Fast evaluation of the learned target function may be required.
Although ANN learning times are relatively long, evaluating the learned network, in order to apply it to a
subsequent instance, is typically very fast. For example, ALVINN applies its neural network several times
per second to continually update its steering command as the vehicle drives forward.
 The ability of humans to understand the learned target function is not important.
The weights learned by neural networks are often difficult for humans to interpret. Learned neural
networks are less easily communicated to humans than learned rules.
PERCEPTRON (Single layer neural network)
 One type of ANN system is based on a unit called a perceptron.
 A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs,
 then outputs a 1 if the result is greater than or equal to some threshold and -1 otherwise.
Given inputs x1 through xn, the output o(x1, ..., xn) computed by the perceptron is

o(x1, ..., xn) =  1   if w0 + w1x1 + w2x2 + ... + wnxn >= 0
                 -1   otherwise
 where each wi is a real-valued constant, or weight, that determines the contribution of input xi to
the perceptron output.
 The quantity (-w0) is the threshold that the weighted combination of inputs must surpass for the perceptron to output a 1; w0 itself is often called the bias.
To simplify notation, we imagine an additional constant input x0 = 1, allowing us to write the above inequality as

Σ (i = 0 to n) wi xi >= 0, or in vector form, w · x >= 0,

where n is the number of inputs.


 The space H of candidate hypotheses considered in perceptron learning is the set of all possible
real-valued weight vectors
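
For illustration, the perceptron's output can be computed directly from the definition above. The following Python sketch (the weight values are arbitrary examples, not from the text) uses the implicit constant input x0 = 1:

import numpy as np

def perceptron_output(weights, x):
    """Perceptron output: +1 if the weighted sum is >= 0, otherwise -1.
    weights[0] plays the role of w0; a constant input x0 = 1 is prepended."""
    x_aug = np.concatenate(([1.0], x))          # add the constant input x0 = 1
    return 1 if np.dot(weights, x_aug) >= 0 else -1

w = np.array([-0.3, 0.5, 0.5])                  # example weights [w0, w1, w2]
print(perceptron_output(w, np.array([0, 1])))   # -> 1
print(perceptron_output(w, np.array([0, 0])))   # -> -1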
Representation of Perceptron

 We can view the perceptron as representing a hyperplane decision surface in the n-dimensional space of instances (i.e., points).
 The perceptron outputs a 1 for instances lying on one side of the hyperplane and outputs a -1 for instances lying on the other side.
 The equation for this hyperplane is w · x = 0.
Some sets of points cannot be separated by any hyperplane; those sets that can be separated are called linearly separable.
Perceptrons can represent all of the primitive Boolean functions: AND, OR, NAND, and NOR.
Unfortunately, however, some Boolean functions cannot be represented by a single perceptron, such as the XOR function, whose value is 1 if and only if x1 != x2.
The Perceptron Training Rule
Here the precise learning problem is to determine a weight vector that causes the perceptron to produce
the correct output +1 or -1 for each of the given training examples.
Several algorithms are known to solve this learning problem.
Here we consider two:
 The perceptron rule
 The delta rule (a variant of the LMS rule).

Perceptron rule
 One way to learn an acceptable weight vector is to begin with random weights,

 Iteratively apply the perceptron to each training example,

 Modifying the perceptron weights whenever it misclassifies an example.

 This process is repeated, iterating through the training examples as many times as needed until
the perceptron classifies all training examples correctly.

 Weights are modified at each step according to the perceptron training rule, which revises the weight wi associated with input xi according to the rule

wi ← wi + Δwi, where Δwi = ƞ (t - o) xi

 Here ‘t’ is the target output for the current training example, ‘o’ is the output generated by the perceptron, and ‘ƞ’ is a positive constant called the learning rate.
 Suppose the training example is correctly classified already by the perceptron. In this case, (t - o)
is zero, making Δwi zero, so that no weights are updated.

 Suppose the perceptron outputs a -1 when the target output is +1. To make the perceptron output a +1 instead of a -1, the weights associated with positive xi must be increased.

 On the other hand, if t = -1 and o = 1, then weights associated with positive xi will be decreased
rather than increased.
In fact, the above learning procedure can be proven to converge,
 Provided the training examples are linearly separable
 Provided a sufficiently small ƞ is used.
If the data are not linearly separable, convergence is not assured.

Example: OR gate
A perceptron can represent the OR function with weights w1 = w2 = 0.5 and w0 = -0.3 (any input of 1 pushes the weighted sum above zero).
For the AND function, the same weights w1 = w2 = 0.5 with w0 = -0.3 do not work, but w0 = -0.8 does.
The XOR function, by contrast, is not linearly separable and cannot be represented by any single perceptron.
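
The perceptron training rule above can also be turned into a short program. The sketch below is illustrative only (the learning rate ƞ = 0.1, random initialization, and epoch count are assumptions, not values from the text); it learns the OR function, which is linearly separable, so the rule converges:

import numpy as np

def train_perceptron(examples, eta=0.1, epochs=20):
    """Apply the rule wi <- wi + eta * (t - o) * xi until the examples are learned."""
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.05, 0.05, size=3)        # small random initial weights [w0, w1, w2]
    for _ in range(epochs):
        for x, t in examples:
            x_aug = np.concatenate(([1.0], x))  # constant input x0 = 1
            o = 1 if np.dot(w, x_aug) >= 0 else -1
            w += eta * (t - o) * x_aug          # no change when the example is already correct
    return w

# OR function: the target is +1 unless both inputs are 0
or_examples = [(np.array([0.0, 0.0]), -1), (np.array([0.0, 1.0]), 1),
               (np.array([1.0, 0.0]), 1), (np.array([1.0, 1.0]), 1)]
print(train_perceptron(or_examples))            # weights that classify all four examples correctly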
Artificial Neural Networks
Artificial Neural Networks contain artificial neurons, which are called units. These units are arranged in a series of layers that together constitute the whole Artificial Neural Network. A layer can have anywhere from a dozen units to millions of units, depending on how complex the patterns are that the network must learn from the dataset. Commonly, an Artificial Neural Network has an input layer, an output layer, and one or more hidden layers. The input layer receives data from the outside world which
the neural network needs to analyze or learn about. Then this data passes through one or multiple hidden
layers that transform the input into data that is valuable for the output layer. Finally, the output layer
provides an output in the form of a response of the Artificial Neural Networks to input data provided.
In the majority of neural networks, units are interconnected from one layer to another. Each of these
connections has weights that determine the influence of one unit on another unit. As the data transfers
from one unit to another, the neural network learns more and more about the data which eventually results
in an output from the output layer.

Artificial neurons vs Biological neurons


The concept of artificial neural networks comes from biological neurons found in animal brains, so the two share many similarities in both structure and function.
 Structure : The structure of artificial neural networks is inspired by biological neurons. A
biological neuron has a cell body or soma to process the impulses, dendrites to receive them, and
an axon that transfers them to other neurons. The input nodes of artificial neural networks
receive input signals, the hidden layer nodes compute these input signals, and the output layer
nodes compute the final output by processing the hidden layer’s results using activation
functions.

Biological Neuron         Artificial Neuron
Dendrite                  Inputs
Cell nucleus or soma      Nodes
Synapses                  Weights
Axon                      Output

 Synapses : Synapses are the links between biological neurons that enable the transmission of impulses from dendrites to the cell body. In artificial neurons, the corresponding role is played by the weights that join the nodes of one layer to the nodes of the next layer. The strength of a link is determined by its weight value.
 Learning : In biological neurons, learning happens in the cell body or soma, which has a nucleus that helps to process the impulses. An action potential is produced and travels through the
axons if the impulses are powerful enough to reach the threshold. This becomes possible by
synaptic plasticity, which represents the ability of synapses to become stronger or weaker over
time in reaction to changes in their activity. In artificial neural networks, backpropagation is a
technique used for learning, which adjusts the weights between nodes according to the error or
differences between predicted and actual outcomes.

Biological Neuron         Artificial Neuron
Synaptic plasticity       Backpropagation

 Activation : In biological neurons, activation is the firing rate of the neuron, which occurs when the impulses are strong enough to reach the threshold. In artificial neural networks, a mathematical function known as an activation function maps the input to the output and performs the activation.

Biological neurons to Artificial neurons


How do Artificial Neural Networks learn?
Artificial neural networks are trained using a training set. For example, suppose you want to teach an
ANN to recognize a cat. Then it is shown thousands of different images of cats so that the network can
learn to identify a cat. Once the neural network has been trained enough using images of cats, then you
need to check if it can identify cat images correctly. This is done by making the ANN classify the images
it is provided by deciding whether they are cat images or not. The output obtained by the ANN is
corroborated by a human-provided description of whether the image is a cat image or not. If the ANN
identifies incorrectly then back-propagation is used to adjust whatever it has learned during
training. Backpropagation is done by fine-tuning the weights of the connections in ANN units based on
the error rate obtained. This process continues until the artificial neural network can correctly recognize a
cat in an image with minimal possible error rates.
What are the types of Artificial Neural Networks?
 Feedforward Neural Network: The feedforward neural network is one of the most basic
artificial neural networks. In this ANN, the data or the input provided travels in a single direction.
It enters into the ANN through the input layer and exits through the output layer while hidden
layers may or may not exist. So the feedforward neural network has a front-propagated wave only
and usually does not have backpropagation.
 Convolutional Neural Network: A Convolutional neural network has some similarities to the
feed-forward neural network, where the connections between units have weights that determine
the influence of one unit on another unit. But a CNN has one or more than one convolutional
layer that uses a convolution operation on the input and then passes the result obtained in the
form of output to the next layer. CNNs have applications in speech and image processing and are particularly useful in computer vision.
 Modular Neural Network: A Modular Neural Network contains a collection of different neural
networks that work independently towards obtaining the output with no interaction between them.
Each of the different neural networks performs a different sub-task by obtaining unique inputs
compared to other networks. The advantage of this modular neural network is that it breaks down
a large and complex computational process into smaller components, thus decreasing its
complexity while still obtaining the required output.
 Radial basis function Neural Network: Radial basis functions are functions whose value depends on the distance of a point from a center. RBF networks have two layers: in the first, the input is mapped onto the radial basis functions in the hidden layer, and the output layer then computes the output in the next step. Radial basis function networks are normally used to model data that represents an underlying trend or function.
 Recurrent Neural Network: The Recurrent Neural Network saves the output of a layer and
feeds this output back to the input to better predict the outcome of the layer. The first layer in the
RNN is quite similar to the feed-forward neural network and the recurrent neural network starts
once the output of the first layer is computed. After this layer, each unit will remember some
information from the previous step so that it can act as a memory cell in performing
computations.
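
For the recurrent network just described, the "memory" can be illustrated with a minimal recurrence (a hypothetical sketch, not a complete RNN implementation): each step's hidden state is computed from the current input and the hidden state carried over from the previous step.

import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: new hidden state from the current input and the previous state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(1)
W_xh = rng.normal(size=(4, 3))        # 3 input features -> 4 hidden units
W_hh = rng.normal(size=(4, 4))        # hidden state fed back to itself
b_h = np.zeros(4)

h = np.zeros(4)                       # initial memory
for x_t in rng.normal(size=(5, 3)):   # a sequence of 5 input vectors
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h)                              # final hidden state summarizing the sequence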
Applications of Artificial Neural Networks
1. Social Media: Artificial Neural Networks are used heavily in Social Media. For example, let’s
take the ‘People you may know’ feature on Facebook that suggests people that you might know in
real life so that you can send them friend requests. Well, this magical effect is achieved by using
Artificial Neural Networks that analyze your profile, your interests, your current friends, and also
their friends and various other factors to calculate the people you might potentially know.
Another common application of Machine Learning in social media is facial recognition . This is
done by finding around 100 reference points on the person’s face and then matching them with
those already available in the database using convolutional neural networks.
2. Marketing and Sales: When you log onto E-commerce sites like Amazon and Flipkart, they will
recommend products for you to buy based on your previous browsing history. Similarly, suppose
you love Pasta, then Zomato, Swiggy, etc. will show you restaurant recommendations based on
your tastes and previous order history. This is true across all new-age marketing segments like
Book sites, Movie services, Hospitality sites, etc. and it is done by implementing personalized
marketing . This uses Artificial Neural Networks to identify the customer likes, dislikes, previous
shopping history, etc., and then tailor the marketing campaigns accordingly.
3. Healthcare : Artificial Neural Networks are used in Oncology to train algorithms that can identify
cancerous tissue at the microscopic level with the same accuracy as trained physicians. Various rare diseases may manifest in physical characteristics and can be identified in their early stages by using facial analysis on patient photos. So the full-scale implementation of Artificial
Neural Networks in the healthcare environment can only enhance the diagnostic abilities of
medical experts and ultimately lead to the overall improvement in the quality of medical care all
over the world.
4. Personal Assistants: I am sure you have all heard of Siri, Alexa, Cortana, etc., and may even have used them on your phones. These are personal assistants and an example of speech recognition that uses Natural Language Processing to interact with users and formulate a
response accordingly. Natural Language Processing uses artificial neural networks that are made
to handle many tasks of these personal assistants such as managing the language syntax,
semantics, correct speech, the conversation that is going on, etc.

Connections Model:

In artificial neural networks (ANNs), the connections model plays a critical role in defining how neurons (nodes) are linked and how information is passed between them. These models can be categorized as fully connected, where each neuron connects to every neuron in the next layer, or sparsely connected, which limits connections to reduce complexity and computation. Weights assigned to these connections determine the strength and direction of the signals, which are adjusted during training to optimize performance. The architecture of these connections influences the network's ability to learn patterns, generalize from data, and solve complex problems, making it fundamental to the design and efficiency of ANNs.

Neuron Modelling for the ANN:

Neuron modelling in artificial neural networks (ANNs) mimics the behavior of biological neurons. Each artificial neuron receives inputs, processes them through an activation function, and generates an output. This process involves summing weighted inputs and applying a non-linear activation function to determine the neuron's firing rate. The primary goal is to simulate the decision-making process of biological neurons, enabling the ANN to learn and adapt. Neurons in ANNs are typically organized into layers: input, hidden, and output. The structure and functioning of these neurons are crucial for the network's ability to perform tasks like classification, regression, and pattern recognition.

Activation Function:

Activation functions in neural networks introduce non-linearity, enabling the network to model complex relationships and learn intricate patterns in the data. They determine whether a neuron should be activated based on the weighted sum of its inputs. Common activation functions include the sigmoid, tanh, and ReLU (Rectified Linear Unit). These functions help normalize the output of each neuron to a range that is manageable and meaningful for subsequent layers. Choosing the right activation function is crucial for the network's performance, as it impacts the learning efficiency, convergence rate, and overall accuracy of the model.
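
The three functions named above can be written down directly. This is a small illustrative sketch rather than any particular library's implementation:

import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Hyperbolic tangent: squashes the input into the range (-1, 1), centered at 0."""
    return np.tanh(z)

def relu(z):
    """Rectified Linear Unit: 0 for negative inputs, the input itself otherwise."""
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))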

Sigmoidal Characteristics:

Sigmoidal activation functions, such as the logistic sigmoid, are characterized by their S-shaped curve. They map input values to an output range between 0 and 1, making them useful for binary classification tasks. The sigmoid function smooths the input and is differentiable, allowing for backpropagation during training. However, they suffer from the vanishing gradient problem, where gradients become too small for effective learning in deep networks. Despite this drawback, their probabilistic interpretation and simplicity make them foundational in early neural network architectures.

Bipolar Binary and Unipolar Binary:

Bipolar binary and unipolar binary refer to the types of inputs and outputs used in binary neural networks. In bipolar binary, inputs and outputs are typically represented as -1 and +1, which can be advantageous in certain signal processing applications. Unipolar binary, on the other hand, uses 0 and 1 for input and output representation. The choice between bipolar and unipolar binary affects the activation function and the learning algorithm. Bipolar representations can sometimes lead to faster convergence in learning algorithms due to their symmetrical nature.

Hard Limiting and Soft Limiting Activation Functions:

Hard limiting activation functions produce binary outputs, typically 0 or 1, based on whether the input exceeds a certain threshold. They are simple and computationally efficient but lack flexibility in capturing nuanced patterns. In contrast, soft limiting activation functions, such as the sigmoid or tanh, output a range of values (e.g., between 0 and 1, or between -1 and 1). These functions provide a smoother transition between output states, enabling the network to model more complex relationships and improve gradient-based optimization.

Comparison Between Brain and Computer in Neural Networks:

The comparison between biological brains and artificial neural networks (ANNs) highlights significant differences and similarities. Biological neurons communicate through electrochemical signals and adapt based on experience, exhibiting high levels of parallelism and energy efficiency. ANNs, inspired by this biological process, use mathematical models to simulate neuron behavior and learning. However, ANNs operate on digital hardware and often require substantial computational power and data. While brains excel in adaptability, generalization, and efficiency, ANNs provide speed, precision, and the ability to process vast amounts of data. The study of these differences continues to drive advancements in AI and neuroscience.

Classification Based on Connections (Feedforward, Feedback, Recurrent):

Neural networks can be classified based on their connection patterns into feedforward, feedback, and recurrent networks. Feedforward networks have a unidirectional flow of information from input to output, with no cycles, making them suitable for tasks like classification and regression. Feedback networks, also known as recurrent neural networks (RNNs), allow connections to loop back, enabling them to maintain memory and process sequential data. Recurrent networks are essential for tasks requiring temporal context, such as language modeling and time-series prediction. These classifications determine the network's capability to handle different types of data and applications effectively.

Single Layer Feedforward Network Diagram:


A single-layer feedforward neural network consists of an input layer and an output layer. The input
layer receives raw data, which is then passed to the output layer after being weighted. The activation
function determines the output by applying a threshold function like step, sigmoid, or ReLU. This type of
network is mainly used for linear classification tasks. The diagram typically shows nodes in the input
layer connected to each node in the output layer via weighted connections. Each connection adjusts
during training to minimize the error between predicted and actual outcomes.

Neural Processing:

Neural processing refers to how artificial neural networks mimic the functioning of biological neural
systems. It involves receiving input data, processing it through neurons (nodes), and adjusting the weights
of connections between them to achieve a specific output. This process depends on activation functions,
which determine the neuron's firing state, and learning rules that update weights based on errors in
predictions. The objective is to create a network that can generalize and recognize patterns or
relationships in the input data, leading to accurate predictions or classifications.

Heteroassociation:

Heteroassociation is a type of associative memory where a pattern in one set of data (input) is associated
with a different pattern in another set (output). In neural networks, it is used to learn relationships
between distinct sets of input and output patterns. For example, a network trained to translate English
words into Spanish is performing heteroassociation. This process often relies on Hebbian learning, where
the strength of the connection between two neurons increases if they are activated together frequently. It's
widely used in machine learning tasks such as translation, pattern recognition, and mapping between data
sets.
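
A minimal sketch of Hebbian heteroassociation follows; the bipolar toy patterns, learning rate, and outer-product form are illustrative assumptions, not a specific published algorithm:

import numpy as np

def hebbian_heteroassociation(input_patterns, output_patterns, eta=1.0):
    """Build a weight matrix W so that W @ x recalls the output pattern paired with x.
    Connections between units that are active together are strengthened (Hebbian rule)."""
    W = np.zeros((output_patterns.shape[1], input_patterns.shape[1]))
    for x, y in zip(input_patterns, output_patterns):
        W += eta * np.outer(y, x)
    return W

X = np.array([[1, -1, 1], [-1, 1, 1]])   # two input patterns
Y = np.array([[1, -1], [-1, 1]])         # their associated output patterns
W = hebbian_heteroassociation(X, Y)
print(np.sign(W @ X[0]))                 # recalls a pattern matching Y[0]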

Neural Processing - Classification:

In neural processing for classification, a neural network learns to categorize input data into different
classes. During training, the network is exposed to labeled data, and the goal is to minimize the difference
between the predicted and actual class labels. Techniques like backpropagation are used to adjust the
weights of neurons so that the network can distinguish between classes more effectively. The
performance is measured by metrics such as accuracy, precision, and recall. Common applications
include image classification, spam detection, and sentiment analysis.

Threshold:

A threshold in neural networks is a value that determines whether a neuron should activate or "fire." In
many networks, an activation function (like the step or sigmoid function) compares the weighted sum of
inputs to a threshold. If the sum exceeds the threshold, the neuron outputs a signal (often 1 or another
high value); otherwise, it outputs zero. This mechanism allows neurons to decide whether the input is
significant enough to pass forward, contributing to the overall decision-making process in tasks like
classification and pattern recognition.

Learning Rate:
The learning rate is a key hyperparameter in neural networks, controlling the size of the weight updates
during training. A small learning rate means the model will make tiny adjustments to the weights,
resulting in slow convergence but potentially better accuracy. A large learning rate leads to faster learning
but risks overshooting the optimal solution, which can cause the model to diverge or oscillate. Optimizing
the learning rate is critical for achieving a balance between speed and precision in training.
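
In gradient-descent training, the learning rate simply scales each weight update. A hypothetical one-step sketch (the weight and gradient values are made up for illustration):

import numpy as np

def gradient_descent_step(weights, gradient, learning_rate):
    """Move the weights a small step against the error gradient.
    A small learning rate gives slow but stable progress; a large one risks overshooting."""
    return weights - learning_rate * gradient

w = np.array([0.4, -0.2])
grad = np.array([0.1, -0.3])                   # assumed gradient of the error w.r.t. w
print(gradient_descent_step(w, grad, 0.01))    # small, cautious update
print(gradient_descent_step(w, grad, 1.0))     # much larger, riskier update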

Momentum Factor:

The momentum factor in neural networks is a technique used to accelerate the gradient descent
optimization process. By considering the direction of previous weight updates, momentum helps smooth
out oscillations and avoid getting stuck in local minima. It acts like a ball rolling down a hill, where the
speed increases in consistent directions. The momentum factor, typically a value between 0 and 1,
controls how much of the previous update influences the current one, allowing for faster convergence and
improved stability during training.
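
The momentum factor can be added to the same weight update. A hedged sketch, assuming a momentum factor beta between 0 and 1 as described above:

import numpy as np

def momentum_step(weights, velocity, gradient, learning_rate=0.01, beta=0.9):
    """Blend the previous update direction (velocity) with the current gradient,
    so consistent directions speed up and oscillations are damped."""
    velocity = beta * velocity + gradient
    weights = weights - learning_rate * velocity
    return weights, velocity

w, v = np.array([0.4, -0.2]), np.zeros(2)
grad = np.array([0.1, -0.3])
for _ in range(3):                 # repeated steps in a consistent direction accelerate
    w, v = momentum_step(w, v, grad)
print(w, v)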

Vigilance Parameter:

The vigilance parameter is used in adaptive resonance theory (ART) networks, which are a type of
neural network designed for stable learning in a changing environment. The vigilance parameter controls
the level of similarity required for an input to be classified as part of an existing category. A high
vigilance value means the network will only accept inputs that closely match existing patterns, leading to
more refined categories. Conversely, a low vigilance value allows for broader generalizations, grouping
more diverse inputs into the same category.
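
A simplified sketch of how a vigilance test might be applied to binary patterns in an ART-style network; the overlap-based match used here is a common textbook form, shown only for illustration:

import numpy as np

def vigilance_test(input_pattern, category_prototype, vigilance=0.8):
    """Accept the category only if the input overlaps its prototype strongly enough.
    Higher vigilance -> stricter matching -> more, finer-grained categories."""
    overlap = np.sum(np.logical_and(input_pattern, category_prototype))
    match = overlap / max(np.sum(input_pattern), 1)
    return match >= vigilance

x = np.array([1, 1, 0, 1])
prototype = np.array([1, 1, 0, 0])
print(vigilance_test(x, prototype, vigilance=0.9))   # False: overlap 2/3 is below 0.9
print(vigilance_test(x, prototype, vigilance=0.6))   # True: the looser test accepts it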

SINGLE LAYER FEED-FORWARD NETWORK

A single-layer feedforward network (SLFN) is one of the simplest types of artificial neural networks. It
consists of two layers: an input layer and an output layer, with no hidden layers in between. Here’s a
breakdown of the key aspects:

Structure:

 Input Layer: The input layer is where data enters the network. Each input neuron represents a
feature or attribute of the data.

 Output Layer: The output neurons produce the final result after processing. In the case of
classification, each output neuron corresponds to a class.

 Weights: The connections between the input and output layers are weighted. These weights
determine the importance of each input feature in the decision-making process.

Information Flow:

 The network is "feedforward," meaning that data moves in a single direction—from the input to
the output. There is no feedback or looping, so information flows straight through the network.

Activation Function:
 Each output neuron applies an activation function (e.g., step, sigmoid, or ReLU) to the weighted
sum of the inputs to produce an output. The activation function introduces non-linearity, helping
the network model complex relationships.

Learning Process:

 Supervised Learning: The network is typically trained using labeled data, where the inputs are
paired with their correct outputs. The learning algorithm adjusts the weights based on the error
between the predicted and actual outputs.

 Error Minimization: Using methods like gradient descent, the network iteratively reduces the
error by adjusting weights.

Limitations:

 Linear Separability: A single-layer network can only solve problems that are linearly separable,
meaning it can separate the data into distinct classes using a straight line or hyperplane. Complex
problems (like XOR) cannot be solved by an SLFN.

 No Hidden Layers: Without hidden layers, the network lacks the ability to learn more abstract
features from the data.

Applications:

 SLFNs are commonly used for simple tasks such as binary classification, pattern recognition,
or linear regression.

In summary, while a single-layer feedforward network is limited in complexity, it offers a foundation for understanding more advanced networks like multi-layer perceptrons (MLPs). It is most useful for straightforward problems where linear relationships between inputs and outputs are sufficient.
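
To make the structure concrete, here is a minimal sketch of one forward pass through a single-layer feedforward network (sigmoid output units; the layer sizes and random weights are arbitrary illustrations):

import numpy as np

def single_layer_forward(x, W, b):
    """Input layer -> output layer in one step: weight the inputs, add the bias,
    and apply a sigmoid activation to each output neuron."""
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))             # 4 input features, 2 output neurons
b = np.zeros(2)
x = np.array([0.5, -1.0, 0.3, 0.8])
print(single_layer_forward(x, W, b))    # two outputs, each between 0 and 1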

MULTI LAYER FEED FORWARD NETWORK


A Multi-Layer Feedforward Network (MLFN), also called a multi-layer perceptron (MLP), is a type of
artificial neural network with multiple layers of neurons organized in a feedforward structure. It is an
extension of the single-layer network and can solve complex problems due to its ability to model non-
linear relationships.

Structure:
1. Input Layer:

o This layer receives the input data. Each neuron in the input layer represents a feature or
attribute of the data.

2. Hidden Layers:

o The hidden layers are where the network performs most of its computations. An MLFN
can have one or more hidden layers, where each layer's neurons take input from the
previous layer, apply weights, and pass their output to the next layer.

o The more hidden layers and neurons, the more complex patterns the network can learn.
Multiple hidden layers enable the network to capture deep hierarchical features in the
data.

3. Output Layer:

o The output layer produces the final result. For classification tasks, the number of neurons
in the output layer typically corresponds to the number of classes. For regression tasks, it
may consist of a single output neuron.

Information Flow:

 The network is feedforward, meaning data flows in one direction—from the input to the output—
without looping back.

 Each neuron receives inputs, multiplies them by a set of weights, sums them up, applies an
activation function, and sends the result to the next layer.

Activation Function:

 Activation functions introduce non-linearity, which is crucial for solving complex, non-linear
problems. Common activation functions include:

o Sigmoid: Maps the input to a value between 0 and 1.

o ReLU (Rectified Linear Unit): Outputs zero for negative inputs and the input value itself
for positive inputs, allowing faster learning in deep networks.

o Tanh (Hyperbolic Tangent): Outputs values between -1 and 1, used to center data around
zero.

Learning Process:

1. Forward Propagation:

o Input data is passed through the network layer by layer, with each neuron applying
weights and the activation function to generate outputs. This continues until the output
layer is reached.

2. Error Calculation:
o The network compares the predicted output to the actual labeled output and calculates an
error using a loss function (e.g., mean squared error for regression or cross-entropy loss
for classification).

3. Backpropagation:

o In the backpropagation step, the network updates its weights by propagating the error
backward from the output to the input layers. The goal is to minimize the error by
adjusting the weights, which is typically done using an optimization algorithm like
stochastic gradient descent (SGD).

4. Weight Update:

o The network updates the weights according to the calculated gradients, adjusting them
slightly in the direction that reduces the error. This process is repeated for many iterations
(epochs) to gradually improve the model's performance.
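
The four steps above can be collected into one compact training loop for a tiny two-layer network (one hidden layer, sigmoid activations, squared-error loss). The layer sizes, learning rate, and single training example are assumptions chosen only to illustrate the process:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.5, size=(3, 2)), np.zeros(3)   # input (2) -> hidden (3)
W2, b2 = rng.normal(scale=0.5, size=(1, 3)), np.zeros(1)   # hidden (3) -> output (1)
x, t = np.array([0.5, -0.2]), np.array([1.0])              # one training example
eta = 0.5

for _ in range(100):
    h = sigmoid(W1 @ x + b1)                 # 1. forward propagation
    o = sigmoid(W2 @ h + b2)
    error = 0.5 * np.sum((t - o) ** 2)       # 2. error calculation
    delta_o = (o - t) * o * (1 - o)          # 3. backpropagation of the error gradient
    delta_h = (W2.T @ delta_o) * h * (1 - h)
    W2 -= eta * np.outer(delta_o, h)         # 4. weight update by gradient descent
    b2 -= eta * delta_o
    W1 -= eta * np.outer(delta_h, x)
    b1 -= eta * delta_h

print(o, error)   # the output approaches the target and the error shrinks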

Advantages:

1. Ability to Model Non-Linear Relationships:

o The inclusion of multiple hidden layers allows MLFNs to learn complex, non-linear
patterns that cannot be captured by single-layer networks.

2. Universal Approximation Theorem:

o Theoretically, a multi-layer feedforward network with sufficient neurons and layers can
approximate any continuous function, making it a powerful model for complex tasks.

3. Deep Learning Foundation:

o MLFNs are the foundation of deep learning. When multiple hidden layers are stacked, the
network can learn hierarchical features, such as detecting edges in images in the early
layers and more complex objects in later layers.

Challenges:

1. Overfitting:

o If the network is too complex (too many neurons or layers), it can memorize the training
data, leading to poor generalization on new data. Techniques like dropout, regularization,
and early stopping are used to combat overfitting.

2. Computational Cost:

o Training MLFNs with many layers can be computationally expensive and require
significant resources, especially when working with large datasets.

Applications:

MLFNs are widely used in a variety of tasks, including:

 Image recognition

 Speech recognition
 Natural language processing (NLP)

 Forecasting and prediction

 Medical diagnosis

In summary, multi-layer feedforward networks are more powerful than single-layer networks because
they can learn and model non-linear relationships through multiple layers of neurons. They serve as the
foundation for more advanced neural architectures used in deep learning.

Back Propagation algorithm:

Neural networks are composed of multiple layers of interconnected neurons. These are organized into
three main layers: the input layer, the hidden layer and the output layer.

 The input layer receives the raw data features. Each neuron in this layer corresponds to a specific
feature in the input data.

 The hidden layer, of which there can be more than one, processes the data it receives. Hidden
layer neurons apply weights, biases and activation functions.

 The output layer produces the final output predictions. Neurons in this layer represent different
possible outputs of the model.

 Backpropagation algorithms are used extensively to train feedforward neural networks, such
as convolutional neural networks, in areas such as deep learning. A backpropagation algorithm is
pragmatic because it computes the gradient needed to adjust a network's weights more efficiently
than computing the gradient based on each individual weight. It enables the use of gradient
methods, such as gradient descent and stochastic gradient descent, to train multilayer networks
and update weights to minimize errors.

Types of backpropagation networks

The following are two types of backpropagation networks:

 Static backpropagation. This is a network developed to map static inputs for static outputs,
meaning that an output can be produced immediately after the input is provided. Static
networks can solve static classification problems, such as optical character recognition
(OCR).
 Recurrent backpropagation. The recurrent backpropagation network is used for fixed-point learning. Activations are fed back through the network until they settle, and during training the weights are adjusted so that the network achieves stability by converging to a fixed value.
Activation Function
An activation function decides whether a neuron should be activated or not by calculating the weighted sum of its inputs and adding a bias to it.

 Step function

 Ramp function

 Sigmoid function

 Gaussian function
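
The four functions listed can be written as follows; this is a simple illustrative sketch (the ramp is clipped to [0, 1] and the Gaussian is centered at 0, which are common textbook conventions rather than fixed requirements):

import numpy as np

def step(z):
    """Step function: 1 once the input reaches the threshold (0 here), else 0."""
    return np.where(z >= 0, 1.0, 0.0)

def ramp(z):
    """Ramp function: linear between 0 and 1, clipped outside that range."""
    return np.clip(z, 0.0, 1.0)

def sigmoid(z):
    """Sigmoid function: smooth S-shaped curve with outputs in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def gaussian(z, sigma=1.0):
    """Gaussian function: maximal at 0 and decaying for inputs far from 0."""
    return np.exp(-(z ** 2) / (2.0 * sigma ** 2))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (step, ramp, sigmoid, gaussian):
    print(f.__name__, f(z))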
