Unit-5 AI
INTRODUCTION
Neural network learning methods provide a robust approach to approximating real-valued, discrete-
valued, and vector-valued target functions. For certain types of problems, such as learning to interpret
complex real-world sensor data, artificial neural networks are among the most effective learning methods
currently known.
Biological Motivation
The study of artificial neural networks (ANNs) has been inspired in part by the observation that biological
learning systems are built of very complex webs of interconnected neurons.
Artificial neural networks are built out of a densely interconnected set of simple units, where each unit
takes a number of real-valued inputs (possibly the outputs of other units) and produces a single real-
valued output (which may become the input to many other units).
Consider a few facts from neurobiology:
Neuron switching time ~ 0.001 second
Number of neurons ~ 10^11
Connections per neuron ~ 10^4 to 10^5
Scene recognition time ~ 0.1 second
At these speeds a scene must be recognized in roughly 100 sequential inference steps, which doesn't seem
like enough, so the brain must rely on much parallel computation.
Properties of artificial neural nets (ANNs):
Many neuron-like threshold switching units
Many weighted interconnections among units
Highly parallel, distributed process
Emphasis on tuning weights automatically
Historically, two groups of researchers have worked with artificial neural networks.
One group has been motivated by the goal of using ANNs to study and model biological learning
processes.
A second group has been motivated by the goal of obtaining highly effective machine learning
algorithms, independent of whether these algorithms mirror biological processes.
NEURAL NETWORK REPRESENTATIONS
A prototypical example of ANN learning is provided by Pomerleau's (1993) system ALVINN, which uses
a learned ANN to steer an autonomous vehicle driving at normal speeds on public highways.
The input to the neural network is a 30 x 32 grid of pixel intensities obtained from a forward-pointed
camera mounted on the vehicle.
The network output is the direction in which the vehicle is steered.
The ANN is trained to mimic the observed steering commands of a human driving the vehicle for
approximately 5 minutes.
ALVINN has used its learned networks to successfully drive at speeds up to 70 miles per hour and for
distances of 90 miles on public highways (driving in the left lane of a divided public highway, with other
vehicles present).
Figure: Neural network learning to steer an autonomous vehicle.
The ALVINN system uses BACKPROPAGATION to learn to steer an autonomous vehicle driving at speeds
up to 70 miles per hour. The image from a forward-mounted camera is mapped to 960 neural network
inputs (the 30 x 32 pixel grid), which are fed forward to 4 hidden units, connected to 30 output units.
Network outputs encode the commanded steering direction.
A perceptron takes a vector of real-valued inputs x1, ..., xn, computes the weighted sum
w0 + w1 x1 + ... + wn xn, and outputs 1 if the sum is greater than 0 and -1 otherwise.
We can view the perceptron as representing a hyperplane decision surface in the n-dimensional
space of instances (i.e., points).
The perceptron outputs a 1 for instances lying on one side of the hyperplane and outputs a -1 for
instances lying on the other side.
The equation for this decision hyperplane is w · x = 0 (with x0 = 1, so that w0 acts as the threshold).
Some sets of points cannot be separated by any hyperplane. Those that can be separated are
called linearly separable.
A single perceptron can represent all of the primitive Boolean functions AND, OR, NAND and NOR.
Unfortunately, however, some Boolean functions cannot be represented by a single perceptron, such as
the XOR function, whose value is 1 if and only if x1 ≠ x2.
The Perceptron Training Rule
Here the precise learning problem is to determine a weight vector that causes the perceptron to produce
the correct output +1 or -1 for each of the given training examples.
Several algorithms are known to solve this learning problem.
Here we consider two:
The perceptron rule
The delta rule (a variant of the LMS rule).
Perceptron rule
One way to learn an acceptable weight vector is to begin with random weights, then iteratively
apply the perceptron to each training example, modifying the perceptron weights whenever it
misclassifies an example.
This process is repeated, iterating through the training examples as many times as needed until
the perceptron classifies all training examples correctly.
Weights are modified at each step according to the perceptron training rule, which revises the
weight wi associated with input xi according to the rule
wi ← wi + Δwi, where Δwi = ƞ (t - o) xi
Here ‘t’ is the target output for the current training example, ‘o’ is the output generated by the
perceptron, and ‘ƞ’ is a positive constant called the learning rate.
Suppose the training example is correctly classified already by the perceptron. In this case, (t - o)
is zero, making Δwi zero, so that no weights are updated.
Suppose the perceptron outputs a -1 when the target output is +1. To make the perceptron output
a +1 instead of -1, the weights must be altered to increase the value of w · x. Here (t - o) = 2, so
the weight associated with each positive xi is increased.
On the other hand, if t = -1 and o = 1, then weights associated with positive xi will be decreased
rather than increased.
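In fact, the above learning procedure can be proven to converge, within a finite number of
applications of the training rule, to a weight vector that correctly classifies all training examples,
provided the training examples are linearly separable and
provided a sufficiently small ƞ is used.
If the data are not linearly separable, convergence is not assured.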
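The training rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the weights start at zero rather than at random values so that the run is reproducible, and the OR function with inputs in {0, 1} and targets in {-1, +1} is an assumed example.

```python
def perceptron_output(weights, x):
    """Return +1 if w0 + w1*x1 + ... + wn*xn > 0, otherwise -1."""
    net = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if net > 0 else -1


def train_perceptron(examples, eta=0.1, max_epochs=100):
    """Repeat wi <- wi + eta * (t - o) * xi until every example is classified correctly."""
    n_inputs = len(examples[0][0])
    weights = [0.0] * (n_inputs + 1)  # weights[0] is w0; its input x0 is fixed at 1
    for _ in range(max_epochs):
        errors = 0
        for x, t in examples:
            o = perceptron_output(weights, x)
            if o != t:
                errors += 1
                weights[0] += eta * (t - o)  # x0 = 1
                for i, xi in enumerate(x, start=1):
                    weights[i] += eta * (t - o) * xi
        if errors == 0:  # all training examples correct: stop
            break
    return weights


# Assumed example: the OR function, which is linearly separable.
or_examples = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
or_weights = train_perceptron(or_examples)
or_predictions = [perceptron_output(or_weights, x) for x, _ in or_examples]
print(or_weights, or_predictions)
```

Because (t - o) is zero for correctly classified examples, the explicit `if o != t` test is only an optimisation; the update would leave the weights unchanged anyway.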
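Example: OR gate
Perceptron for the OR function (inputs x1, x2 in {0, 1}; output 1 if w0 + w1 x1 + w2 x2 > 0, else -1):
A set of weight values for the OR gate: w1 = w2 = 0.5, w0 = -0.3
Set of weight values for the AND gate: w1 = w2 = 0.5, with
w0 = -0.3 (does not work: input (0, 1) gives 0.2 > 0, so the output is wrongly 1)
w0 = -0.8 (works: only input (1, 1) gives a positive sum)
XOR function: no single perceptron works, because the XOR examples are not linearly separable.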
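The weight values quoted for the OR and AND gates can be checked mechanically. The sketch below assumes inputs in {0, 1} and outputs in {-1, +1} (the reading under which w0 = -0.3 fails for AND); the helper names and the coarse weight grid used for XOR are illustrative choices only.

```python
def perceptron(w0, w1, w2, x1, x2):
    """Two-input perceptron: +1 if w0 + w1*x1 + w2*x2 > 0, else -1."""
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else -1


INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
OR_TARGETS = [-1, 1, 1, 1]
AND_TARGETS = [-1, -1, -1, 1]
XOR_TARGETS = [-1, 1, 1, -1]


def matches(w0, w1, w2, targets):
    """True if the perceptron reproduces the whole truth table."""
    return [perceptron(w0, w1, w2, x1, x2) for x1, x2 in INPUTS] == targets


or_ok = matches(-0.3, 0.5, 0.5, OR_TARGETS)      # OR with w0 = -0.3
and_bad = matches(-0.3, 0.5, 0.5, AND_TARGETS)   # AND with w0 = -0.3
and_ok = matches(-0.8, 0.5, 0.5, AND_TARGETS)    # AND with w0 = -0.8

# Coarse grid search: no (w0, w1, w2) on this grid reproduces XOR.
grid = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
xor_found = any(matches(w0, w1, w2, XOR_TARGETS)
                for w0 in grid for w1 in grid for w2 in grid)

print(or_ok, and_bad, and_ok, xor_found)  # True False True False
```

The grid search is only a demonstration; the real reason XOR fails is that its positive and negative examples are not linearly separable, so no weights exist at all.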
Artificial Neural Networks
Artificial neural networks contain artificial neurons, which are called units. These units are arranged in a
series of layers that together constitute the whole network. A layer can have anywhere from a dozen
units to millions of units, depending on how complex the patterns are that the network must learn from
the dataset. Commonly, an artificial neural network has an input layer, an output layer, and one or more
hidden layers. The input layer receives data from the outside world that the network needs to analyze or
learn about. This data then passes through one or more hidden layers, which transform the input into a
form the output layer can use. Finally, the output layer provides the network's response to the input data.
In the majority of neural networks, units are interconnected from one layer to the next. Each of these
connections has a weight that determines the influence of one unit on another. As data passes from unit
to unit, each layer transforms it further, until the output layer produces the network's output.
Biological neuron        Artificial neuron
Dendrite                 Inputs
Cell nucleus or soma     Nodes
Synapses                 Weights
Axon                     Output
Synapses: In biological neurons, synapses are the links between neurons that enable impulses to be
transmitted from one neuron to the dendrites of the next. In artificial neural networks, their
counterpart is the weights that join the nodes of one layer to the nodes of the next. The strength
of each link is determined by the weight value.
Learning: In biological neurons, impulses are processed in the cell body (soma), which contains the
nucleus. An action potential is produced and travels along the axon if the impulses are strong
enough to reach the threshold. Learning is made possible by synaptic plasticity, the ability of
synapses to become stronger or weaker over time in response to changes in their activity. In
artificial neural networks, backpropagation is a technique used for learning; it adjusts the weights
between nodes according to the error, that is, the difference between predicted and actual
outcomes.
Biological learning mechanism: synaptic plasticity. Artificial counterpart: backpropagation.
Activation: In biological neurons, activation is the firing of the neuron, which happens when
the impulses are strong enough to reach the threshold. In artificial neural networks, a
mathematical function known as an activation function maps a unit's input to its output.
Connections Model:
In artificial neural networks (ANNs), the connections model plays a critical role in defining how neurons
(nodes) are linked and how information is passed between them. These models can be categorized as
fully connected, where each neuron connects to every neuron in the next layer, or sparsely connected,
which limits connections to reduce complexity and computation. The weights assigned to these
connections determine the strength and direction (sign) of the signals, and they are adjusted during
training to optimize performance. The architecture of these connections influences the network's ability
to learn patterns, generalize from data, and solve complex problems, making it fundamental to the design
and efficiency of ANNs.
Neuron Modelling for the ANN:
Neuron modelling in artificial neural networks (ANNs) mimics the behavior of biological neurons. Each
artificial neuron receives inputs, computes their weighted sum, and applies a non-linear activation
function to that sum to produce its output, the analogue of a biological neuron's firing rate. The primary
goal is to simulate the decision-making process of biological neurons, enabling the ANN to learn and
adapt. Neurons in ANNs are typically organized into layers: input, hidden, and output. The structure and
functioning of these neurons are crucial for the network's ability to perform tasks like classification,
regression, and pattern recognition.
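A single artificial neuron of the kind described above can be written as a weighted sum followed by an activation function. This is a minimal sketch; the function names, the logistic activation, and the example numbers are assumptions chosen for illustration.

```python
import math


def logistic(net):
    """Logistic sigmoid activation, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through the activation function."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(net)


# Hypothetical example values: net = 0.5*1 + (-0.4)*0 + 0.3*1 - 0.2 = 0.6
output = neuron([1.0, 0.0, 1.0], [0.5, -0.4, 0.3], -0.2, logistic)
print(round(output, 4))
```

Passing the activation as a parameter keeps the same neuron usable with a step, sigmoid, tanh, or ReLU function.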
Activation Function:
Activation functions in neural networks introduce non-linearity, enabling the network to model complex
relationships and learn intricate patterns in the data. They determine how strongly a neuron is activated
based on the weighted sum of its inputs. Common activation functions include the sigmoid, tanh, and
ReLU (Rectified Linear Unit). Sigmoid and tanh squash the output of each neuron into a bounded range
(0 to 1 and -1 to 1, respectively) that is manageable for subsequent layers, whereas ReLU passes positive
values through unchanged. Choosing the right activation function is crucial for the network's
performance, as it impacts the learning efficiency, convergence rate, and overall accuracy of the model.
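The three activation functions named above have short standard definitions. The sketch below uses only the Python standard library; the sample inputs are arbitrary.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Hyperbolic tangent: output in (-1, 1), centred on zero."""
    return math.tanh(x)


def relu(x):
    """Rectified Linear Unit: 0 for negative inputs, x otherwise."""
    return max(0.0, x)


for x in (-2.0, 0.0, 2.0):
    print(x, round(sigmoid(x), 4), round(tanh(x), 4), relu(x))
```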
Sigmoidal Characteristics:
Sigmoidal activation functions, such as the logistic sigmoid, are characterized by their S-shaped curve.
They map input values to an output range between 0 and 1, making them useful for binary classification
tasks. The sigmoid function smooths the input and is differentiable, allowing for backpropagation during
training. However, they suffer from the vanishing gradient problem, where gradients become too small
for effective learning in deep networks. Despite this drawback, their probabilistic interpretation and
simplicity made them foundational in early neural network architectures.
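The vanishing gradient problem follows from the size of the sigmoid's derivative, s(x)(1 - s(x)), which never exceeds 0.25. The sketch below is a simplified illustration that ignores the weights: it only shows how the product of ten such derivative factors, one per layer, shrinks.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the logistic sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


max_grad = sigmoid_grad(0.0)       # the derivative peaks at x = 0
bound_10_layers = max_grad ** 10   # best case for a chain of 10 sigmoid layers
saturated_grad = sigmoid_grad(6.0) # far from 0 the derivative is already tiny

print(max_grad, bound_10_layers, saturated_grad)
```

Even in the best case the gradient signal is scaled by less than one millionth after ten sigmoid layers, which is why early layers of deep sigmoid networks learn slowly.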
Bipolar Binary and Unipolar Binary:
Bipolar binary and unipolar binary refer to the types of inputs and outputs used in binary neural
networks. In bipolar binary, inputs and outputs are typically represented as -1 and +1, which can be
advantageous in certain signal processing applications. Unipolar binary, on the other hand, uses 0 and 1
for input and output representation. The choice between bipolar and unipolar binary affects the
activation function and the learning algorithm. Bipolar representations can sometimes lead to faster
convergence in learning algorithms due to their symmetrical nature.
Hard Limiting Activation and Soft Limiting Activation Function:
Hard limiting activation functions produce binary outputs, typically 0 or 1 (or -1 and +1 in the bipolar
case), based on whether the input exceeds a certain threshold. They are simple and computationally
efficient but lack flexibility in capturing nuanced patterns. In contrast, soft limiting activation functions,
such as the sigmoid or tanh, output a continuous range of values (e.g., between 0 and 1 or between -1
and 1). These functions provide a smoother transition between output states, enabling the network to
model more complex relationships, and because they are differentiable they support gradient-based
optimization.
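The unipolar/bipolar and hard/soft distinctions can be put side by side. In this sketch the function names and the `gain` parameter (which controls how steep the soft limiter is) are hypothetical; the logistic sigmoid is taken as the soft unipolar limiter and tanh as the soft bipolar limiter.

```python
import math


def hard_limit_unipolar(net, threshold=0.0):
    """Hard limiter with outputs in {0, 1}."""
    return 1 if net > threshold else 0


def hard_limit_bipolar(net, threshold=0.0):
    """Hard limiter with outputs in {-1, +1}."""
    return 1 if net > threshold else -1


def soft_limit_unipolar(net, gain=1.0):
    """Logistic sigmoid: smooth version of the unipolar hard limiter."""
    return 1.0 / (1.0 + math.exp(-gain * net))


def soft_limit_bipolar(net, gain=1.0):
    """tanh: smooth version of the bipolar hard limiter."""
    return math.tanh(gain * net)


net = 0.3
# The bipolar output is the unipolar output rescaled from {0, 1} to {-1, +1}.
hard_pair = (hard_limit_unipolar(net), hard_limit_bipolar(net))
# The same rescaling links the soft limiters: tanh(x) = 2 * sigmoid(2x) - 1.
soft_bipolar = soft_limit_bipolar(net)
rescaled_unipolar = 2.0 * soft_limit_unipolar(net, gain=2.0) - 1.0
# With a large gain the soft limiter approaches the hard limiter.
steep = soft_limit_unipolar(net, gain=50.0)

print(hard_pair, soft_bipolar, rescaled_unipolar, steep)
```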
Comparison Between Brain vs. Computer in Neural Network:
The comparison between biological brains and artificial neural networks (ANNs) highlights significant
differences and similarities. Biological neurons communicate through electrochemical signals and adapt
based on experience, exhibiting high levels of parallelism and energy efficiency. ANNs, inspired by this
biological process, use mathematical models to simulate neuron behavior and learning. However, ANNs
operate on digital hardware and often require substantial computational power and data. While brains
excel in adaptability, generalization, and efficiency, ANNs provide speed, precision, and the ability to
process vast amounts of data. The study of these differences continues to drive advancements in AI and
neuroscience.
Classification Based on Connections (Feedforward, Feedback, Recurrent):
Neural networks can be classified based on their connection patterns into feedforward, feedback, and
recurrent networks. Feedforward networks have a unidirectional flow of information from input to
output, with no cycles, making them suitable for tasks like classification and regression. Feedback
networks, also known as recurrent neural networks (RNNs), allow connections to loop back, enabling
them to maintain memory and process sequential data. Recurrent networks are essential for tasks
requiring temporal context, such as language modeling and time-series prediction. These classifications
determine the network's capability to handle different types of data and applications effectively.
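The difference between a feedforward and a recurrent connection pattern can be seen with a single unit. This sketch is deliberately minimal: the function names, the tanh activation, and the weight values are assumptions, and a real recurrent network would have many units and learned weights.

```python
import math


def feedforward_unit(x, w_in):
    """Output depends only on the current input."""
    return math.tanh(w_in * x)


def recurrent_unit(xs, w_in, w_rec):
    """Output depends on the current input and on the previous state (the loop back)."""
    state = 0.0
    states = []
    for x in xs:
        state = math.tanh(w_in * x + w_rec * state)
        states.append(state)
    return states


sequence = [1.0, 0.0, 1.0]  # the same input value appears at steps 1 and 3
ff_outputs = [feedforward_unit(x, w_in=1.0) for x in sequence]
rec_outputs = recurrent_unit(sequence, w_in=1.0, w_rec=0.5)

print(ff_outputs)
print(rec_outputs)
```

The feedforward unit gives identical outputs for the two identical inputs, while the recurrent unit's third output differs from its first because it still carries information from earlier steps.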
Neural Processing:
Neural processing refers to how artificial neural networks mimic the functioning of biological neural
systems. It involves receiving input data, processing it through neurons (nodes), and adjusting the weights
of connections between them to achieve a specific output. This process depends on activation functions,
which determine the neuron's firing state, and learning rules that update weights based on errors in
predictions. The objective is to create a network that can generalize and recognize patterns or
relationships in the input data, leading to accurate predictions or classifications.
Heteroassociation:
Heteroassociation is a type of associative memory where a pattern in one set of data (input) is associated
with a different pattern in another set (output). In neural networks, it is used to learn relationships
between distinct sets of input and output patterns. For example, a network trained to translate English
words into Spanish is performing heteroassociation. This process often relies on Hebbian learning, where
the strength of the connection between two neurons increases if they are activated together frequently. It's
widely used in machine learning tasks such as translation, pattern recognition, and mapping between data
sets.
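A small heteroassociative memory built with the Hebbian rule illustrates the idea. This is a sketch under stated assumptions: patterns are bipolar (-1/+1), the two input patterns are chosen to be orthogonal so that recall is exact, and the function names are hypothetical.

```python
def hebbian_weights(pairs):
    """Hebbian rule: W[i][j] accumulates t[i] * s[j] over all (input s, output t) pairs."""
    n_in = len(pairs[0][0])
    n_out = len(pairs[0][1])
    W = [[0] * n_in for _ in range(n_out)]
    for s, t in pairs:
        for i in range(n_out):
            for j in range(n_in):
                W[i][j] += t[i] * s[j]
    return W


def recall(W, s):
    """Present input s and threshold each weighted sum to -1 or +1."""
    return [1 if sum(w * x for w, x in zip(row, s)) >= 0 else -1 for row in W]


# Two assumed input/output associations with orthogonal inputs.
s1, t1 = [1, 1, -1, -1], [1, -1]
s2, t2 = [1, -1, 1, -1], [-1, 1]
W = hebbian_weights([(s1, t1), (s2, t2)])

print(recall(W, s1), recall(W, s2))  # [1, -1] [-1, 1]
```

Each weight grows when its input and output components agree in sign, which is the "activated together" condition in the Hebbian description above.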
In neural processing for classification, a neural network learns to categorize input data into different
classes. During training, the network is exposed to labeled data, and the goal is to minimize the difference
between the predicted and actual class labels. Techniques like backpropagation are used to adjust the
weights of neurons so that the network can distinguish between classes more effectively. The
performance is measured by metrics such as accuracy, precision, and recall. Common applications
include image classification, spam detection, and sentiment analysis.
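The three metrics mentioned above can be computed directly from the predicted and actual labels. The labels below are made-up example data for a binary classifier, and the function name is hypothetical.

```python
def classification_metrics(actual, predicted, positive=1):
    """Return (accuracy, precision, recall) for a binary classification task."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)  # true positives
    fp = sum(1 for a, p in pairs if a != positive and p == positive)  # false positives
    fn = sum(1 for a, p in pairs if a == positive and p != positive)  # false negatives
    accuracy = sum(1 for a, p in pairs if a == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical labels: 1 = spam, 0 = not spam.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(actual, predicted)
print(metrics)  # (0.75, 0.75, 0.75)
```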
Threshold:
A threshold in neural networks is a value that determines whether a neuron should activate or "fire." A
step activation function compares the weighted sum of inputs to the threshold: if the sum exceeds the
threshold, the neuron outputs a signal (often 1 or another high value); otherwise, it outputs zero (or -1
in the bipolar case). A sigmoid activation replaces this hard comparison with a smooth transition around
the threshold. This mechanism allows neurons to decide whether the input is
significant enough to pass forward, contributing to the overall decision-making process in tasks like
classification and pattern recognition.
Learning Rate:
The learning rate is a key hyperparameter in neural networks, controlling the size of the weight updates
during training. A small learning rate means the model will make tiny adjustments to the weights,
resulting in slow convergence but potentially better accuracy. A large learning rate leads to faster learning
but risks overshooting the optimal solution, which can cause the model to diverge or oscillate. Optimizing
the learning rate is critical for achieving a balance between speed and precision in training.
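The effect of the learning rate is easy to see on a one-dimensional error surface. The sketch below assumes the error E(w) = w^2 (gradient 2w, minimum at w = 0); the function name and the three rates are illustrative choices.

```python
def gradient_descent(eta, w=1.0, steps=20):
    """Minimise E(w) = w**2 with the update w <- w - eta * dE/dw."""
    for _ in range(steps):
        w -= eta * 2.0 * w
    return w


small = gradient_descent(eta=0.01)    # tiny steps: still far from 0 after 20 updates
moderate = gradient_descent(eta=0.4)  # converges quickly
large = gradient_descent(eta=1.1)     # overshoots further each step and diverges

print(small, moderate, large)
```

Each update multiplies w by (1 - 2*eta), so the small rate shrinks w slowly, the moderate rate shrinks it fast, and the large rate flips its sign while increasing its magnitude.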
Momentum Factor:
The momentum factor in neural networks is a technique used to accelerate the gradient descent
optimization process. By considering the direction of previous weight updates, momentum helps smooth
out oscillations and avoid getting stuck in local minima. It acts like a ball rolling down a hill, where the
speed increases in consistent directions. The momentum factor, typically a value between 0 and 1,
controls how much of the previous update influences the current one, allowing for faster convergence and
improved stability during training.
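One common way to write the momentum update keeps a running "velocity" that blends the previous update with the current gradient. The sketch below assumes the error E(w) = 0.5 * w^2 (gradient w); the names `mu` for the momentum factor and `eta` for the learning rate are conventional but not taken from the text.

```python
def descend(eta, mu, w=1.0, steps=50):
    """Gradient descent with momentum on E(w) = 0.5 * w**2.

    velocity <- mu * velocity - eta * gradient
    w        <- w + velocity
    """
    velocity = 0.0
    for _ in range(steps):
        gradient = w
        velocity = mu * velocity - eta * gradient
        w += velocity
    return w


plain = descend(eta=0.01, mu=0.0)          # mu = 0 reduces to ordinary gradient descent
with_momentum = descend(eta=0.01, mu=0.9)  # 90% of the previous update is carried over

print(plain, with_momentum)
```

With the same small learning rate, the run with momentum ends much closer to the minimum at w = 0 because consecutive updates in the same direction reinforce each other.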
Vigilance Parameter:
The vigilance parameter is used in adaptive resonance theory (ART) networks, which are a type of
neural network designed for stable learning in a changing environment. The vigilance parameter controls
the level of similarity required for an input to be classified as part of an existing category. A high
vigilance value means the network will only accept inputs that closely match existing patterns, leading to
more refined categories. Conversely, a low vigilance value allows for broader generalizations, grouping
more diverse inputs into the same category.
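The role of the vigilance parameter can be sketched with a simplified ART1-style match test. The assumptions here are that inputs and category prototypes are binary vectors and that the match ratio is the fraction of the input's active bits also present in the prototype; the function names are hypothetical, and the category search and weight updates of a full ART network are omitted.

```python
def match_ratio(input_bits, prototype_bits):
    """Fraction of the input's active bits that the category prototype shares."""
    active = sum(input_bits)
    if active == 0:
        return 0.0
    overlap = sum(i & p for i, p in zip(input_bits, prototype_bits))
    return overlap / active


def passes_vigilance(input_bits, prototype_bits, vigilance):
    """The input joins the category only if the match ratio reaches the vigilance value."""
    return match_ratio(input_bits, prototype_bits) >= vigilance


pattern = [1, 1, 1, 0, 1]
prototype = [1, 1, 0, 0, 1]

ratio = match_ratio(pattern, prototype)                   # 3 of 4 active bits match
strict = passes_vigilance(pattern, prototype, 0.9)        # high vigilance: rejected
lenient = passes_vigilance(pattern, prototype, 0.6)       # low vigilance: accepted

print(ratio, strict, lenient)  # 0.75 False True
```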
A single-layer feedforward network (SLFN) is one of the simplest types of artificial neural networks. It
consists of two layers: an input layer and an output layer, with no hidden layers in between. Here’s a
breakdown of the key aspects:
Structure:
Input Layer: The input layer is where data enters the network. Each input neuron represents a
feature or attribute of the data.
Output Layer: The output neurons produce the final result after processing. In the case of
classification, each output neuron corresponds to a class.
Weights: The connections between the input and output layers are weighted. These weights
determine the importance of each input feature in the decision-making process.
Information Flow:
The network is "feedforward," meaning that data moves in a single direction—from the input to
the output. There is no feedback or looping, so information flows straight through the network.
Activation Function:
Each output neuron applies an activation function (e.g., step, sigmoid, or ReLU) to the weighted
sum of the inputs to produce an output. The activation function makes each output a non-linear
function of the inputs, although the decision boundary of each output neuron is still a hyperplane.
Learning Process:
Supervised Learning: The network is typically trained using labeled data, where the inputs are
paired with their correct outputs. The learning algorithm adjusts the weights based on the error
between the predicted and actual outputs.
Error Minimization: Using methods like gradient descent, the network iteratively reduces the
error by adjusting weights.
Limitations:
Linear Separability: A single-layer network can only solve problems that are linearly separable,
meaning it can separate the data into distinct classes using a straight line or hyperplane. Complex
problems (like XOR) cannot be solved by an SLFN.
No Hidden Layers: Without hidden layers, the network lacks the ability to learn more abstract
features from the data.
Applications:
SLFNs are commonly used for simple tasks such as binary classification, pattern recognition,
or linear regression.
In summary, while a single-layer feedforward network is limited in complexity, it offers a foundation for
understanding more advanced networks like multi-layer perceptrons (MLPs). It is most useful for
straightforward problems where linear relationships between inputs and outputs are sufficient.
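A multi-layer feedforward network (MLFN) extends the single-layer network by placing one or more
hidden layers between the input layer and the output layer.
Structure: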
1. Input Layer:
o This layer receives the input data. Each neuron in the input layer represents a feature or
attribute of the data.
2. Hidden Layers:
o The hidden layers are where the network performs most of its computations. An MLFN
can have one or more hidden layers, where each layer's neurons take input from the
previous layer, apply weights, and pass their output to the next layer.
o The more hidden layers and neurons, the more complex patterns the network can learn.
Multiple hidden layers enable the network to capture deep hierarchical features in the
data.
3. Output Layer:
o The output layer produces the final result. For classification tasks, the number of neurons
in the output layer typically corresponds to the number of classes. For regression tasks, it
may consist of a single output neuron.
Information Flow:
The network is feedforward, meaning data flows in one direction—from the input to the output—
without looping back.
Each neuron receives inputs, multiplies them by a set of weights, sums them up, applies an
activation function, and sends the result to the next layer.
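The layer-by-layer flow described above is enough to compute XOR, which a single-layer network cannot represent. The sketch below hand-sets the weights instead of learning them: the hidden layer holds an OR unit and a NAND unit, and the output unit ANDs them. Units output 0 or 1 here so that one layer can feed the next; the NAND weights and all function names are assumptions made for this example.

```python
def step(net):
    """Threshold activation with outputs in {0, 1}."""
    return 1 if net > 0 else 0


def unit(w0, w1, w2, a, b):
    """One neuron: weighted sum of two inputs plus bias weight w0, then the step function."""
    return step(w0 + w1 * a + w2 * b)


def xor_network(x1, x2):
    # Hidden layer: both units see the raw inputs.
    h_or = unit(-0.3, 0.5, 0.5, x1, x2)      # fires unless both inputs are 0
    h_nand = unit(0.8, -0.5, -0.5, x1, x2)   # fires unless both inputs are 1
    # Output layer: AND of the two hidden outputs.
    return unit(-0.8, 0.5, 0.5, h_or, h_nand)


xor_outputs = [xor_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(xor_outputs)  # [0, 1, 1, 0]
```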
Activation Function:
Activation functions introduce non-linearity, which is crucial for solving complex, non-linear
problems. Common activation functions include:
o ReLU (Rectified Linear Unit): Outputs zero for negative inputs and the input value itself
for positive inputs, allowing faster learning in deep networks.
o Tanh (Hyperbolic Tangent): Outputs values between -1 and 1, used to center data around
zero.
Learning Process:
1. Forward Propagation:
o Input data is passed through the network layer by layer, with each neuron applying
weights and the activation function to generate outputs. This continues until the output
layer is reached.
2. Error Calculation:
o The network compares the predicted output to the actual labeled output and calculates an
error using a loss function (e.g., mean squared error for regression or cross-entropy loss
for classification).
3. Backpropagation:
o In the backpropagation step, the network updates its weights by propagating the error
backward from the output to the input layers. The goal is to minimize the error by
adjusting the weights, which is typically done using an optimization algorithm like
stochastic gradient descent (SGD).
4. Weight Update:
o The network updates the weights according to the calculated gradients, adjusting them
slightly in the direction that reduces the error. This process is repeated for many iterations
(epochs) to gradually improve the model's performance.
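The four steps above can be sketched for a network with one hidden layer of sigmoid units and a single sigmoid output. Several choices here are assumptions made to keep the example short and reproducible: plain full-batch gradient descent is used instead of SGD, the error is the squared error, the training data is the XOR truth table, and the function names, layer size, learning rate, and seed are arbitrary.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, W1, b1, W2, b2):
    """Step 1, forward propagation: hidden activations, then the single output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)
    return hidden, output


def mean_squared_error(data, W1, b1, W2, b2):
    return sum((forward(x, W1, b1, W2, b2)[1] - t) ** 2 for x, t in data) / len(data)


def train(data, n_hidden=3, eta=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    initial_loss = mean_squared_error(data, W1, b1, W2, b2)
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1 = [0.0] * n_hidden
        gW2 = [0.0] * n_hidden
        gb2 = 0.0
        for x, t in data:
            hidden, out = forward(x, W1, b1, W2, b2)
            # Step 2, error calculation: derivative of 0.5 * (out - t)**2 at the output unit.
            delta_out = (out - t) * out * (1.0 - out)
            gb2 += delta_out
            # Step 3, backpropagation: pass the error back through W2 to each hidden unit.
            for j in range(n_hidden):
                delta_h = delta_out * W2[j] * hidden[j] * (1.0 - hidden[j])
                gW2[j] += delta_out * hidden[j]
                gb1[j] += delta_h
                for i in range(n_in):
                    gW1[j][i] += delta_h * x[i]
        # Step 4, weight update: move against the averaged gradient.
        scale = eta / len(data)
        for j in range(n_hidden):
            W2[j] -= scale * gW2[j]
            b1[j] -= scale * gb1[j]
            for i in range(n_in):
                W1[j][i] -= scale * gW1[j][i]
        b2 -= scale * gb2
    return initial_loss, mean_squared_error(data, W1, b1, W2, b2)


xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
initial_loss, final_loss = train(xor_data)
print(initial_loss, final_loss)
```

The error should fall over the epochs; how far it falls depends on the initial weights, the learning rate, and the number of epochs, so no particular final value is claimed here.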
Advantages:
o The inclusion of multiple hidden layers allows MLFNs to learn complex, non-linear
patterns that cannot be captured by single-layer networks.
o Theoretically, a multi-layer feedforward network with enough hidden neurons can
approximate any continuous function on a bounded input region to arbitrary accuracy (the
universal approximation property), making it a powerful model for complex tasks.
o MLFNs are the foundation of deep learning. When multiple hidden layers are stacked, the
network can learn hierarchical features, such as detecting edges in images in the early
layers and more complex objects in later layers.
Challenges:
1. Overfitting:
o If the network is too complex (too many neurons or layers), it can memorize the training
data, leading to poor generalization on new data. Techniques like dropout, regularization,
and early stopping are used to combat overfitting.
2. Computational Cost:
o Training MLFNs with many layers can be computationally expensive and require
significant resources, especially when working with large datasets.
Applications:
Image recognition
Speech recognition
Natural language processing (NLP)
Medical diagnosis
In summary, multi-layer feedforward networks are more powerful than single-layer networks because
they can learn and model non-linear relationships through multiple layers of neurons. They serve as the
foundation for more advanced neural architectures used in deep learning.
Neural networks are composed of multiple layers of interconnected neurons. These are organized into
three main layers: the input layer, the hidden layer and the output layer.
The input layer receives the raw data features. Each neuron in this layer corresponds to a specific
feature in the input data.
The hidden layer, of which there can be more than one, processes the data it receives. Hidden
layer neurons apply weights, biases and activation functions.
The output layer produces the final output predictions. Neurons in this layer represent different
possible outputs of the model.
Backpropagation algorithms are used extensively to train feedforward neural networks, such
as convolutional neural networks, in areas such as deep learning. Backpropagation is
practical because it computes the gradient with respect to all of a network's weights in a single
backward pass, which is far more efficient than estimating the gradient separately for each
individual weight. It enables the use of gradient methods, such as gradient descent and
stochastic gradient descent, to train multilayer networks and update weights to minimize errors.
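The efficiency argument can be made concrete on a tiny network with one input, one hidden sigmoid unit, and one sigmoid output. The sketch compares the gradient obtained by backpropagation (one forward and one backward pass for all weights) with a finite-difference estimate that perturbs each weight separately. The network shape, the squared-error loss, and all names and numbers are assumptions for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def loss(w1, w2, x, t):
    """Squared error of the network out = sigmoid(w2 * sigmoid(w1 * x))."""
    hidden = sigmoid(w1 * x)
    out = sigmoid(w2 * hidden)
    return 0.5 * (out - t) ** 2


def backprop_gradient(w1, w2, x, t):
    """Both partial derivatives from one forward pass and one backward pass."""
    hidden = sigmoid(w1 * x)
    out = sigmoid(w2 * hidden)
    delta_out = (out - t) * out * (1.0 - out)                   # error at the output unit
    delta_hidden = delta_out * w2 * hidden * (1.0 - hidden)     # error passed back to the hidden unit
    return delta_hidden * x, delta_out * hidden                 # dE/dw1, dE/dw2


def numerical_gradient(w1, w2, x, t, eps=1e-6):
    """Per-weight estimate: two extra loss evaluations for every single weight."""
    g1 = (loss(w1 + eps, w2, x, t) - loss(w1 - eps, w2, x, t)) / (2.0 * eps)
    g2 = (loss(w1, w2 + eps, x, t) - loss(w1, w2 - eps, x, t)) / (2.0 * eps)
    return g1, g2


analytic = backprop_gradient(0.5, -0.3, 1.0, 1.0)
numeric = numerical_gradient(0.5, -0.3, 1.0, 1.0)
print(analytic, numeric)
```

The two results agree closely, but the per-weight approach needs extra forward passes for every weight, which becomes impractical for networks with millions of weights.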
Types of backpropagation
Static backpropagation. This type is used for networks that map static inputs to static outputs,
meaning that an output can be produced immediately after the input is provided. Static
networks can solve static classification problems, such as optical character recognition
(OCR).
Recurrent backpropagation. Recurrent backpropagation is used for fixed-point
learning. During training, the weights (numerical values that determine how strongly one
neuron influences another) are adjusted so that the network
achieves stability by settling to a fixed value.
Activation Function
An activation function decides whether a neuron should be activated by taking the weighted sum of its
inputs, adding a bias, and mapping the result to the neuron's output. Common choices are:
Step function
Ramp function
Sigmoid function
Gaussian function