
Course Code: 19CS0544 R19

UNIT –II
ARTIFICIAL NEURAL NETWORKS

1 Analyze the Back Propagation Neural Network with a neat diagram. [L4][CO2] [12M]
Back propagation Network
The back-propagation learning algorithm is one of the most important
developments in neural networks (Bryson and Ho, 1969; Werbos, 1974; LeCun,
1985; Parker, 1985; Rumelhart, 1986). This learning algorithm is applied to multilayer
feed-forward networks consisting of processing elements with continuous differentiable
activation functions. The networks associated with the back-propagation learning algorithm
are also called back-propagation networks (BPNs).
For a given set of training input-output pairs, this algorithm provides a procedure for
changing the weights in a BPN to classify the given input patterns correctly. The
basic concept behind this weight-update algorithm is simply the gradient-descent
method. In this method, the error is propagated back to the hidden units. Back
propagation is thus a training algorithm for the network.

A back-propagation neural network is a multilayer, feed-forward neural network
consisting of an input layer, a hidden layer and an output layer. The neurons
present in the hidden and output layers have biases, which are connections
from units whose activation is always 1. The bias terms also act as weights.
During the back-propagation phase of learning, signals are sent in the reverse
direction. The inputs sent to the BPN and the output obtained from the net
could be either binary (0, 1) or bipolar (-1, +1). The activation function
can be any function which increases monotonically and is also differentiable.
The commonly used activation functions are the binary sigmoidal and bipolar
sigmoidal activation functions. These functions are used in the BPN because of
the following characteristics: (i) continuity, (ii) differentiability, (iii) nondecreasing
monotonicity.
Training Algorithm

Step 0: Initialize weights and learning rate (take some small random values).
Step 1: Perform Steps 2-9 when stopping condition is false.
Step 2: Perform Steps 3-8 for each training pair.

Feedforward (Phase I)

Step 3: Each input unit receives input signal x_i (i = 1 to n) and sends it to the
hidden units.
Step 4: Each hidden unit z_j (j = 1 to p) sums its weighted input signals to
calculate the net input:

z_inj = v_0j + Σ(i=1..n) x_i v_ij

Calculate the output of the hidden unit by applying its activation
function (binary or bipolar sigmoidal) over z_inj:

z_j = f(z_inj)

and send the output signal from the hidden unit to the input of output
layer units.

Step 5: For each output unit y_k (k = 1 to m), calculate the net input:

y_ink = w_0k + Σ(j=1..p) z_j w_jk

and apply the activation function to compute the output signal:

y_k = f(y_ink)

Back-propagation of error (Phase II)

Step 6: Each output unit y_k (k = 1 to m) receives a target pattern t_k
corresponding to the input training pattern and computes the error-correction
term:

δ_k = (t_k − y_k) f'(y_ink)

The derivative f'(y_ink) can be calculated as in the activation function
section. On the basis of the calculated error-correction term, update the
change in weights and bias:

Δw_jk = α δ_k z_j ;  Δw_0k = α δ_k

Also, send δ_k to the hidden layer backwards.


Step 7: Each hidden unit z_j (j = 1 to p) sums its delta inputs from the output
units:

δ_inj = Σ(k=1..m) δ_k w_jk

The term δ_inj gets multiplied with the derivative of f(z_inj) to calculate the
error term:

δ_j = δ_inj f'(z_inj)

The derivative f'(z_inj) can be calculated as in the activation function section,
depending on whether the binary or bipolar sigmoidal function is used. On
the basis of the calculated δ_j, update the change in weights and bias:

Δv_ij = α δ_j x_i ;  Δv_0j = α δ_j

Weight and bias updation (Phase III):

Step 8: Each output unit (y_k, k = 1 to m) updates the bias and weights:

w_jk(new) = w_jk(old) + Δw_jk

w_0k(new) = w_0k(old) + Δw_0k

Each hidden unit (z_j, j = 1 to p) updates its bias and weights:

v_ij(new) = v_ij(old) + Δv_ij

v_0j(new) = v_0j(old) + Δv_0j

Step 9: Check for the stopping condition. The stopping condition may be
a certain number of epochs reached, or the actual output equalling the
target output.
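
The following is a minimal NumPy sketch of one epoch of the above algorithm, assuming a single hidden layer, a bipolar sigmoidal activation and illustrative names (v/v0 for input-to-hidden weights and biases, w/w0 for hidden-to-output, alpha for the learning rate); it illustrates the three phases rather than being a definitive implementation.

```python
import numpy as np

def f(x):                      # bipolar sigmoidal activation
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def f_prime(fx):               # derivative expressed in terms of f(x)
    return 0.5 * (1.0 + fx) * (1.0 - fx)

def train_epoch(X, T, v, v0, w, w0, alpha=0.1):
    """One pass of Steps 3-8 over all training pairs (X: inputs, T: targets)."""
    for x, t in zip(X, T):
        # Feedforward (Phase I)
        z_in = v0 + x @ v           # net input to hidden units
        z = f(z_in)
        y_in = w0 + z @ w           # net input to output units
        y = f(y_in)
        # Back-propagation of error (Phase II)
        delta_k = (t - y) * f_prime(y)
        delta_in = delta_k @ w.T
        delta_j = delta_in * f_prime(z)
        # Weight and bias updation (Phase III)
        w += alpha * np.outer(z, delta_k);  w0 += alpha * delta_k
        v += alpha * np.outer(x, delta_j);  v0 += alpha * delta_j
    return v, v0, w, w0
```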

2 Discuss the Self-Organizing Map algorithm and its features. [L2][CO2] [12M]
Kohonen Self-Organizing Feature Maps

The Kohonen self-organizing map is a dimensionality reduction technique. Suppose we have patterns of
arbitrary dimensions, but we need them in one or two dimensions.
The process of feature mapping is then very useful for converting the wide pattern
space into a typical feature space. Besides converting the arbitrary dimensions into 1-D or 2-D,
the map must also have the ability to preserve the neighbour topology, such as a
rectangular or hexagonal grid topology.

Rectangular Grid Topology

This topology has 24 nodes in the distance-2 grid, 16 nodes in the distance-1
grid, and 8 nodes in the distance-0 grid, which means the difference between
each rectangular grid is 8 nodes. The winning unit is indicated by #.

Hexagonal Grid Topology

This topology has 18 nodes in the distance-2 grid, 12 nodes in the distance-1
grid, and 6 nodes in the distance-0 grid, which means the difference between
each hexagonal grid is 6 nodes. The winning unit is indicated by #.

Architecture

The architecture of KSOM is similar to that of the competitive network. With


the help of neighborhood schemes, discussed earlier, the training can take place
over the extended region of the network.

Algorithm for training

Step 1 − Initialize the weights, the learning rate α and the neighbourhood
topological scheme.

Step 2 − Continue steps 3-9 while the stopping condition is not true.

Step 3 − Continue steps 4-6 for every input vector x.

Step 4 − Calculate the square of the Euclidean distance for j = 1 to m:

D(j) = Σ(i=1..n) (x_i − w_ij)²
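
As an illustration of the winner selection in Step 4 and the usual neighbourhood update that follows it in the standard KSOM algorithm, here is a small NumPy sketch; the simple 1-D neighbourhood and the names used are assumptions made for brevity.

```python
import numpy as np

def ksom_step(x, W, alpha, radius):
    """x: input vector (n,); W: weight matrix (m, n) with one row per map unit."""
    # Step 4: squared Euclidean distance D(j) = sum_i (x_i - w_ij)^2 for j = 1..m
    D = np.sum((W - x) ** 2, axis=1)
    # Winner: the unit J with the minimum distance
    J = np.argmin(D)
    # Update the winner and the units inside its topological neighbourhood:
    # w_j(new) = w_j(old) + alpha * (x - w_j(old))
    for j in range(W.shape[0]):
        if abs(j - J) <= radius:          # simple 1-D neighbourhood for illustration
            W[j] += alpha * (x - W[j])
    return J, W
```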



3 Illustrate Learning Vector Quantization with a neat sketch. [L3][CO2] [12M]

Learning Vector Quantization (LVQ), different from Vector Quantization
(VQ) and Kohonen Self-Organizing Maps (KSOM), is basically a competitive
network which uses supervised learning.

Architecture

The following figure shows the architecture of LVQ, which is quite similar to the
architecture of KSOM. As we can see, there are "n" input units
and "m" output units. The layers are fully interconnected, with
weights on the connections.
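
As a concrete illustration of this supervised, competitive structure, the following sketch implements the basic LVQ1 update (the winning prototype moves toward the input when the class labels match and away otherwise); the function and variable names are assumptions.

```python
import numpy as np

def lvq1_step(x, label, W, prototype_labels, alpha=0.05):
    """W: (m, n) prototype vectors; prototype_labels: class of each output unit."""
    # Competition: winner J is the prototype closest to the input x
    J = np.argmin(np.sum((W - x) ** 2, axis=1))
    # Supervised update: attract on a correct match, repel on a wrong one
    if prototype_labels[J] == label:
        W[J] += alpha * (x - W[J])
    else:
        W[J] -= alpha * (x - W[J])
    return J, W
```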

Variants
Three other variants namely LVQ2, LVQ2.1 and LVQ3 have been developed
by Kohonen. Complexity in all these three variants, due to the concept that the
winner as well as the runner-up unit will learn, is more than in LVQ.

LVQ2
As discussed above for the other variants of LVQ, the condition in
LVQ2 is formed by a window. This window is based on the following
parameters −

4 a Explain the Hamming neural network with a neat diagram. [L2][CO2] [12M]

Hamming Network
In most of the neural networks using unsupervised learning, it is essential to
compute the distance and perform comparisons. One such network is the
Hamming network, in which every given input vector is clustered
into one of several groups. Following are some important features of Hamming
networks −

 Lippmann started working on Hamming networks in 1987.


 It is a single layer network.
 The inputs can be either binary {0, 1} or bipolar {-1, 1}.
 The weights of the net are calculated from the exemplar vectors (a small sketch is given after this list).
 It is a fixed weight network which means the weights would remain the
same even during training.
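
A brief sketch of how a Hamming net could be set up from exemplar vectors, assuming the common convention w_ij = e_j(i)/2 with bias n/2, so that the net input equals n minus the Hamming distance to each exemplar; the helper names and example vectors are illustrative.

```python
import numpy as np

def hamming_net(exemplars):
    """exemplars: (m, n) bipolar exemplar vectors; returns fixed weights and bias."""
    W = exemplars / 2.0            # w_ij = e_j(i) / 2 (fixed, never trained)
    b = exemplars.shape[1] / 2.0   # bias n/2 for every output unit
    return W, b

def similarity(W, b, x):
    """Net input = n - Hamming distance between x and each exemplar."""
    return b + W @ x

E = np.array([[1, -1, 1, -1], [1, 1, -1, -1]])
W, b = hamming_net(E)
print(similarity(W, b, np.array([1, -1, 1, 1])))  # [3. 1.]: x agrees with the first exemplar in 3 positions
```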

Max Net
This is also a fixed weight network, which serves as a subnet for selecting the
node having the highest input. All the nodes are fully interconnected and there
exist symmetrical weights on all these weighted interconnections.

Architecture

It uses an iterative mechanism in which each node receives
inhibitory inputs from all other nodes through the interconnections. The single node
whose value is maximum will be active (the winner) and the activations of all
other nodes will be inactive. Max Net uses the identity activation function,
f(x) = x for x > 0 and f(x) = 0 for x ≤ 0.
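
A short sketch of the Max Net iteration, assuming the usual fixed weights (self-excitation of 1 and mutual inhibition −ε) and the activation f(x) = max(x, 0); the values chosen are illustrative.

```python
import numpy as np

def maxnet(a, epsilon=0.1, max_iter=100):
    """Iterate mutual inhibition until only one node (the maximum) stays active."""
    a = np.asarray(a, dtype=float)
    for _ in range(max_iter):
        # each node keeps its own activation and is inhibited by all the others
        a = np.maximum(0.0, a - epsilon * (a.sum() - a))
        if np.count_nonzero(a) <= 1:     # stopping condition: a single winner remains
            break
    return a

print(maxnet([0.2, 0.4, 0.6, 0.8]))      # only the node that started at 0.8 survives
```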

5 Describe the architecture, functions and characteristics of a Hopfield Neural Network with a neat sketch. [L2][CO2] [12M]
Hopfield Networks
Hopfield neural network was invented by Dr. John J. Hopfield in 1982. It
consists of a single layer which contains one or more fully connected recurrent
neurons. The Hopfield network is commonly used for auto-association and
optimization tasks.

Discrete Hopfield Network

A discrete Hopfield network operates in a discrete-time fashion; in other words,
the input and output patterns are discrete vectors, which can be
either binary (0, 1) or bipolar (+1, -1) in nature. The network has symmetrical
weights with no self-connections, i.e., wij = wji and wii = 0.

Architecture

Following are some important points to keep in mind about discrete Hopfield
network −

 This model consists of neurons with one inverting and one non-
inverting output.
 The output of each neuron should be the input of other neurons but not
the input of self.
 Weight/connection strength is represented by wij.
 Connections can be excitatory as well as inhibitory. A connection would be
excitatory if the output of the neuron is the same as the input, otherwise
inhibitory.
 Weights should be symmetrical, i.e. wij = wji

The output from Y1 going to Y2, Yi and Yn have the


weights w12, w1i and w1n respectively. Similarly, other arcs have the weights on
them.

Training Algorithm

During training of the discrete Hopfield network, the weights are updated. As we
know, we can have binary input vectors as well as bipolar input vectors.
Hence, in both cases, weight updates can be done with the following
relation

Testing Algorithm

Step 1 − Initialize the weights, which are obtained from the training algorithm by
using the Hebbian principle.
Step 2 − Perform steps 3-9 while the activations of the network have not converged.
Step 3 − For each input vector X, perform steps 4-8.
Step 4 − Make the initial activations of the network equal to the external input
vector X as follows −
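
A compact sketch of the discrete Hopfield procedure for bipolar patterns, assuming the standard Hebbian storage rule W = Σp s(p)ᵀ s(p) with a zero diagonal and asynchronous recall updates; the function names are illustrative.

```python
import numpy as np

def hopfield_train(patterns):
    """Store bipolar (+1/-1) patterns; weights are symmetric with no self-connections."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for s in patterns:
        W += np.outer(s, s)
    np.fill_diagonal(W, 0)                 # w_ii = 0
    return W

def hopfield_recall(W, x, steps=5):
    """Start from the external input, then update units asynchronously."""
    y = x.astype(float).copy()
    for _ in range(steps):
        for i in np.random.permutation(len(y)):
            net = x[i] + W[i] @ y          # net input y_in_i = x_i + sum_j y_j w_ji
            y[i] = 1.0 if net > 0 else (-1.0 if net < 0 else y[i])
    return y
```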

6 a Discuss Bidirectional Associative Memory and its applications. [L2][CO2] [8M]


Bidirectional Associative Memory (BAM)
Bidirectional associative memory (BAM) was first proposed by Bart Kosko in the
year 1988. The BAM network performs forward and backward associative
searches for stored stimulus responses. The BAM is a recurrent hetero-associative
pattern-matching network that encodes binary or bipolar patterns
using the Hebbian learning rule.
Architecture

The architecture of a BAM network consists of two layers of neurons which are
connected by directed weighted path interconnections. The network dynamics
involve two layers of interaction. The BAM network iterates by sending the
signals back and forth between the two layers until all the neurons reach
equilibrium. The weights associated with the network are bidirectional. Thus,
BAM can respond to inputs at either layer.
The figure shows a BAM network consisting of n units in the X layer and m units in the Y
layer. The layers are connected in both directions (bidirectional), with the
result that the weight matrix for signals sent from the X layer to the Y layer is W and the
weight matrix for signals sent from the Y layer to the X layer is Wᵀ. Thus, the
weight matrix is calculated in both directions.

Determination of Weights
Let the input vectors be denoted by s(p) and the target vectors by t(p), p = 1, ..., P.
Then the weight matrix to store this set of input and target vectors, where
s(p) = (s1(p), ..., si(p), ..., sn(p))
t(p) = (t1(p), ..., tj(p), ..., tm(p))
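
A short sketch of this weight determination for bipolar pairs, assuming the usual rule W = Σp s(p)ᵀ t(p), with W used from the X layer to the Y layer and Wᵀ in the reverse direction; the names and the tiny example patterns are illustrative.

```python
import numpy as np

def bam_weights(S, T):
    """S: (P, n) bipolar input vectors s(p); T: (P, m) bipolar target vectors t(p)."""
    # W = sum over p of s(p)^T t(p); signals flow X -> Y through W and Y -> X through W^T
    return S.T @ T

# usage sketch: recall in both directions with a sign (bipolar step) activation
S = np.array([[1, 1, -1, -1], [-1, -1, 1, 1]])
T = np.array([[1, -1], [-1, 1]])
W = bam_weights(S, T)
y = np.sign(S[0] @ W)      # forward pass, X layer -> Y layer: recovers t(1)
x = np.sign(y @ W.T)       # backward pass, Y layer -> X layer: recovers s(1)
```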

b Analyze the characteristics, limitations and applications of associative memory. [L4][CO2] [4M]
Associative memory is also known as content addressable memory (CAM) or
associative storage or associative array. It is a special type of memory that is
optimized for performing searches through data, as opposed to providing a simple
direct access to the data based on the address.
Associative memory consists of conventional semiconductor memory (usually RAM) with
added comparison circuitry that enables a search operation to complete in a single
clock cycle. It is a hardware search engine, a special type of computer memory used
in certain very-high-speed searching applications.
Applications of associative memory:
1. It can only be used in memory allocation format.
2. It is widely used in database management systems, etc.
Advantages of associative memory:
1. It is used where search time needs to be short.
2. It is suitable for parallel searches.
3. It is often used to speed up databases.
4. It is used in page tables used by virtual memory and in neural networks.
Limitations or disadvantages of associative memory:
1. It is more expensive than RAM.
2. Each cell must have storage capability and logical circuits for matching its
content with an external argument.
7 a Generalize the Adaptive Resonance Theory Neural Network [L6][CO2] [8M]

Adaptive Resonance Theory


This network was developed by Stephen Grossberg and Gail Carpenter in 1987.
It is based on competition and uses an unsupervised learning model. Adaptive
Resonance Theory (ART) networks, as the name suggests, are always open to
new learning (adaptive) without losing the old patterns (resonance). Basically,
an ART network is a vector classifier which accepts an input vector and classifies
it into one of the categories depending upon which of the stored patterns it
resembles the most.

Architecture of ART1

It consists of the following two units −

Computational Unit − It is made up of the following −

 Input unit (F1 layer) − It further has the following two portions −
F1(a) layer (Input portion) − In ART1, there would be no
processing in this portion other than holding the input vectors. It
is connected to the F1(b) layer (interface portion).
F1(b) layer (Interface portion) − This portion combines the signal
from the input portion with that of the F2 layer. The F1(b) layer is connected
to the F2 layer through bottom-up weights bij and the F2 layer is connected to
the F1(b) layer through top-down weights tji.
 Cluster Unit (F2 layer) − This is a competitive layer. The unit having
the largest net input is selected to learn the input pattern. The activations of
all other cluster units are set to 0.
 Reset Mechanism − The work of this mechanism is based upon the
similarity between the top-down weight and the input vector. If the
degree of this similarity is less than the vigilance parameter, then the
cluster is not allowed to learn the pattern and a reset happens.

Supplement Unit − The issue with the reset mechanism is that the
F2 layer must be inhibited under certain conditions and must also be
available when some learning happens. That is why two supplemental units,
namely G1 and G2, are added along with the reset unit R. They are called gain
control units. These units receive and send signals to the other units present in
the network. '+' indicates an excitatory signal, while '−' indicates an inhibitory
signal.
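
A simplified sketch of one ART1 presentation cycle combining the competition, vigilance test and reset described above; the fast-learning update, the constant L and all names follow common textbook conventions and are assumptions here rather than the only possible formulation.

```python
import numpy as np

def art1_present(x, b, t, rho=0.7, L=2.0):
    """x: binary input vector; b: (n, m) bottom-up weights; t: (m, n) top-down weights."""
    inhibited = set()
    while len(inhibited) < t.shape[0]:
        scores = x @ b                           # F2 net inputs
        for j in inhibited:
            scores[j] = -np.inf                  # reset units stay out of the competition
        J = int(np.argmax(scores))               # competitive F2 winner
        match = np.logical_and(x, t[J]).astype(float)
        if match.sum() / x.sum() >= rho:         # vigilance test against top-down weights
            t[J] = match                         # fast learning: update the winning cluster
            b[:, J] = L * match / (L - 1.0 + match.sum())
            return J
        inhibited.add(J)                         # reset: inhibit J and search again
    return None                                  # no existing cluster passes the vigilance test
```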

b Identify some applications of ART Model [L2][CO2] [4M]

Target recognition:
The fuzzy ARTMAP neural network can be used for automatic classification of targets
based on their radar range profiles. Tests on synthetic data show that fuzzy ARTMAP
can result in substantial savings in memory requirements compared to k-nearest-neighbour
(kNN) classifiers. The utilization of multi-wavelength profiles mainly
improves the performance of both kinds of classifiers.
Medical diagnosis:
Medical databases present huge numbers of challenges found in general information
management settings where speed, use, efficiency, and accuracy are the prime concerns.
A direct objective of improved computer-assisted medicine is to help to deliver
intensive care in situations that may be less than ideal. Working with these issues has
stimulated several ART architecture developments, including ARTMAP-IC.
Signature verification:
Automatic signature verification is a well-known and active area of research with
various applications such as bank cheque confirmation, ATM access, etc. The training of
the network is done using ART1, which uses global features as the input vector, and the
verification and recognition phase uses a two-step process. In the first step, the input
vector is matched with the stored reference vector, which was used as a training set,
and in the second step, cluster formation takes place.
Mobile robot control: Nowadays, we see a wide range of robotic devices. Their
programming, called artificial intelligence, is still a field of research. The human
brain is an interesting subject as a model for such an intelligent system. Inspired by the
structure of the human brain, artificial neural networks emerged. Similar to the brain, an
artificial neural network contains numerous simple computational units, neurons, that
are interconnected to allow the transfer of signals from neuron to
neuron. Artificial neural networks are used to solve different issues with good
outcomes compared to other decision algorithms.
8 Illustrate the Support Vector Machine with a neat diagram. [L3][CO2] [12M]
SVM:

Support Vector Machine or SVM is one of the most popular Supervised
Learning algorithms, which is used for Classification as well as Regression
problems. However, it is primarily used for Classification problems in Machine
Learning. It is inherently a binary classifier.

The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate the n-dimensional space into classes so that we can easily put
a new data point into the correct category in the future. This best decision
boundary is called a hyperplane.

Types of SVM

SVM can be of two types:

o Linear SVM: A linear SVM is used for linearly separable data, which
means that if a dataset can be classified into two classes by using a single
straight line, then such data is termed linearly separable data, and the
classifier used is called a Linear SVM classifier.
o Non-linear SVM: A non-linear SVM is used for non-linearly separable
data, which means that if a dataset cannot be classified by using a straight
line, then such data is termed non-linear data and the classifier used is
called a Non-linear SVM classifier.

Applications of SVM
 Face detection
 image classification
 Medical Diagnosis
 text categorization, etc.

Support Vectors:

The data points or vectors that are closest to the hyperplane and which
affect the position of the hyperplane are termed support vectors. Since these
vectors support the hyperplane, they are called support vectors.

Linear SVM:

The working of the SVM algorithm can be understood by using an example.


Suppose we have a dataset that has two tags (green and blue), and the dataset
has two features, x1 and x2. We want a classifier that can classify the pair (x1,
x2) of coordinates as either green or blue. Consider the below image:

As it is a 2-D space, by just using a straight line we can easily separate
these two classes. But there can be multiple lines that can separate these
classes.

Non-Linear SVM:

If the data is linearly arranged, then we can separate it by using a
straight line, but for non-linear data, we cannot draw a single
straight line. Consider the below image:

So to separate these data points, we need to add one
more dimension. For linear data, we have used the two dimensions x and y, so for
non-linear data, we will add a third dimension z. It can be calculated as:

z = x² + y²

By adding the third dimension, the sample space will become as below image:
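
A brief scikit-learn sketch contrasting a linear and a non-linear (kernel) SVM as described above; the toy data points and parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# toy 2-D data: two tags (0 = "green", 1 = "blue")
X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

linear_clf = SVC(kernel="linear").fit(X, y)       # Linear SVM: straight-line hyperplane
rbf_clf = SVC(kernel="rbf", gamma=0.5).fit(X, y)  # Non-linear SVM: kernel adds dimensions implicitly

print(linear_clf.predict([[3, 2]]))               # classify a new data point
print(linear_clf.support_vectors_)                # the points closest to the hyperplane
```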

9 Summarize the following terms: [L2][CO2] [12M]
i) Hebbian Learning Rule
ii) Perceptron and Delta Learning Rules

Hebbian Learning Rule:

This rule, one of the oldest and simplest, was introduced by Donald Hebb in his
book The Organization of Behavior in 1949. It is a kind of feed-forward,
unsupervised learning.

Basic Concept − This rule is based on a proposal given by Hebb, who wrote −

“When an axon of cell A is near enough to excite a cell B and repeatedly or


persistently takes part in firing it, some growth process or metabolic change
takes place in one or both cells such that A’s efficiency, as one of the cells
firing B, is increased.”

From the above postulate, we can conclude that the connections between two
neurons might be strengthened if the neurons fire at the same time and might
weaken if they fire at different times.

Mathematical Formulation − According to the Hebbian learning rule, the following
is the formula to increase the weight of a connection at every time step.
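
Assuming the commonly used form Δw_ij = α x_i y_j (the learning-rate symbol α and the helper name are assumptions), a tiny sketch of the Hebbian weight update is:

```python
import numpy as np

def hebb_update(W, x, y, alpha=1.0):
    """Strengthen w_ij when input x_i and output y_j are active together."""
    return W + alpha * np.outer(x, y)   # delta_w_ij = alpha * x_i * y_j
```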

Perceptron Learning Rule


This rule is an error-correcting, supervised learning algorithm for single-layer
feedforward networks with a linear activation function, introduced by Rosenblatt.
Basic Concept − Being supervised in nature, to calculate the error there
is a comparison between the desired/target output and the actual output.
If any difference is found, then a change must be made to the weights of
the connections.

Mathematical Formulation − To explain its mathematical formulation,
suppose we have 'n' finite input vectors, x(n), along with their
desired/target output vectors t(n), where n = 1 to N.

Now the output 'y' can be calculated, as explained earlier, on the basis of the net
input, and the activation function applied over that net input can be expressed
as follows –
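
A minimal sketch of the error-correcting perceptron update for one training pair, assuming the usual rule w(new) = w(old) + α t x applied only when the computed output differs from the target; the names and the threshold θ are illustrative.

```python
import numpy as np

def perceptron_step(w, b, x, t, alpha=1.0, theta=0.0):
    """Single-layer perceptron update for one input vector x with bipolar target t."""
    net = b + w @ x
    y = 1 if net > theta else (-1 if net < -theta else 0)  # bipolar step with threshold
    if y != t:                      # error found: adjust weights toward the target
        w = w + alpha * t * x
        b = b + alpha * t
    return w, b
```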

Delta Learning Rule (Widrow-Hoff Rule)


It was introduced by Bernard Widrow and Marcian Hoff and is also called the Least Mean
Square (LMS) method; it minimizes the error over all training patterns. It is a kind
of supervised learning algorithm with a continuous activation function.
Basic Concept − The basis of this rule is the gradient-descent approach, which
continues forever. The delta rule updates the synaptic weights so as to minimize the
error between the net input to the output unit and the target value.
Mathematical Formulation − To update the synaptic weights, the delta rule is
given by Δw_i = α (t − y_in) x_i, where α is the learning rate, x_i the input, t the
desired/target output and y_in the net input to the output unit.
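
A corresponding sketch of the Widrow-Hoff (LMS) update for one pattern, using the gradient-descent form Δw_i = α (t − y_in) x_i stated above; the names are illustrative.

```python
import numpy as np

def delta_rule_step(w, b, x, t, alpha=0.1):
    """Adjust weights along the negative gradient of the squared error (t - y_in)^2."""
    y_in = b + w @ x                 # net input of the output unit
    error = t - y_in
    w = w + alpha * error * x        # delta_w_i = alpha * (t - y_in) * x_i
    b = b + alpha * error
    return w, b
```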

10 Describe the structure of a back propagation neural network and derive the learning rule for the back propagation algorithm. [L2][CO2] [12M]
The answer is the same as that of Question 1.
