
Neural Networks -II

Mihir Mohite
Jeet Kulkarni
Rituparna Bhise
Shrinand Javadekar

Data Mining CSE 634


Prof. Anita Wasilewska
References

 http://www.csse.uwa.edu.au/teaching/units/233.407/lectureNotes/Lect4-UWA.pdf
 http://www.comp.glam.ac.uk/digimaging/neural.htm
 http://www.nbb.cornell.edu/neurobio/linster/lecture4.pdf
 Lecture slides prepared by Jalal Mahmud and Hyung-Yeon Gu
under the guidance of Prof. Anita Wasilewska


Basics of a Neural Network
 A Neural Network is a set of connected
INPUT/OUTPUT UNITS, where each connection
has a WEIGHT associated with it.
 A Neural Network learns by adjusting the weights so
as to correctly classify the training data and hence,
after the testing phase, to classify unknown data.
Basics of a Neural Network
 Input: classification data;
it contains the classification attribute.
 Data is divided, as in any classification problem,
into training data and testing data.

 All data must be normalized
(i.e. all attribute values in the database are rescaled
to fall in the interval [0,1] or [-1,1]).
A Neural Network can only work with data in the range
[0,1] or [-1,1].
Basics of a Neural Network
v' = ((v - min_A) / (max_A - min_A)) * (new_max_A - new_min_A) + new_min_A
Example: We want to normalize data to the interval [0,1].
We put: new_max_A = 1, new_min_A = 0.
Say max_A was 100 and min_A was 20 (that means the
maximum and minimum values for the attribute).
Now, if v = 40 (if for this particular pattern the attribute value
is 40), v' will be calculated as
v' = (40-20) x (1-0) / (100-20) + 0
=> v' = 20 x 1/80
=> v' = 0.25
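As a small illustration (ours, not part of the original slides), the min-max normalization above can be sketched in Python; the function name is an assumption and the numbers simply repeat the example.

```python
def min_max_normalize(v, min_a, max_a, new_min=0.0, new_max=1.0):
    """Rescale a value v from [min_a, max_a] to [new_min, new_max]."""
    return (v - min_a) * (new_max - new_min) / (max_a - min_a) + new_min

# Slide example: min_A = 20, max_A = 100, v = 40
print(min_max_normalize(40, 20, 100))  # 0.25
```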
A single Neuron

Here x1 and x2 are normalized attribute values of the data.
y is the output of the neuron, i.e. the class label.
x1 and x2, multiplied by the weight values w1 and w2, form the
input x to the neuron:
the value of x1 is multiplied by the weight w1 and the value of x2 is
multiplied by the weight w2.
A single Neuron
 Given that
 w1 = 0.5 and w2 = 0.5
 Say the value of x1 is 0.3 and the value of x2 is 0.8.
 So, the weighted sum is:
 sum = w1 x x1 + w2 x x2 = 0.5 x 0.3 + 0.5 x 0.8 = 0.55
A single Neuron
 The neuron receives the weighted sum as input and
calculates the output as a function of the input as follows:

 y = f(x), where f(x) is defined as

f(x) = 0  { when x < 0.5 }
f(x) = 1  { when x >= 0.5 }

 For our example, x (the weighted sum) is 0.55, so y = 1.
 That means the corresponding input attribute values are
classified in class 1.

 If for another input x = 0.45, then f(x) = 0,
 so we would conclude that those input values are classified to
class 0.
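As a minimal sketch (ours, not from the slides), the single neuron with the hard-threshold function above could look like this; the function name and threshold parameter are illustrative.

```python
def neuron_output(x, w, threshold=0.5):
    """Weighted sum of inputs followed by a hard-threshold activation."""
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if weighted_sum >= threshold else 0

# Slide example: w1 = w2 = 0.5, x1 = 0.3, x2 = 0.8 -> weighted sum 0.55 -> class 1
print(neuron_output([0.3, 0.8], [0.5, 0.5]))  # 1
print(neuron_output([0.3, 0.6], [0.5, 0.5]))  # weighted sum 0.45 -> class 0
```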
Bias of a Neuron
 We need a bias value to be added to the
weighted sum ∑wixi so that the decision boundary
can be shifted away from the origin.

[Figure: decision lines x1 - x2 = -1, x1 - x2 = 0 and x1 - x2 = 1
plotted in the (x1, x2) plane.]
Bias as an input

[Figure: the bias is treated as an extra input x0 = +1 with weight w0;
inputs x1 ... xn with weights w1 ... wn feed a summing function ∑,
followed by an activation function f, which produces the output class.]
A Multilayer Feed-Forward
Neural Network
[Figure: a fully connected feed-forward network. The output nodes produce
the output class Ok; weights wjk connect the hidden nodes (outputs Oj) to
the output nodes; weights wij connect the input nodes to the hidden nodes;
the input record xi feeds the input nodes.]
Inputs to a Neural Network
 INPUT: records without class attribute with normalized attributes
values.

 INPUT VECTOR: X = { x1, x2, …. xn}


where n is the number of (non class) attributes.

 WEIGHT VECTOR: W = {w1,w2,….wn} where n is the number of


(non-class) attributes

 INPUT LAYER – there are as many nodes as non-class attributes


i.e. as the length of the input vector.

 HIDDEN LAYER – the number of nodes in the hidden layer and the
number of hidden layers depends on implementation.
Net Weighted Input
• Given a unit j in a hidden or output layer, the
net input is

Ij = ∑i wij Oi + θj

where wij is the weight of the connection from
unit i in the previous layer to unit j; Oi is the
output of unit i from the previous layer; and
θj is the bias of the unit.
Binary activation function
 Given a net input Ij to unit j, then
Oj = f(Ij),
the output of unit j, is computed as

Oj = 1 if Ij > T
Oj = 0 if Ij <= T

where T is known as the Threshold.
Squashing activation function
 Each unit in the hidden and output layers
takes its net input and then applies an
activation function. The function symbolizes
the activation of the neuron represented by
the unit. It is also called a logistic, sigmoid, or
squashing function.
 Given a net input Ij to unit j, then
Oj = f(Ij),
the output of unit j, is computed as

Oj = 1 / (1 + e^(-Ij))
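A minimal sketch (ours, not from the slides) of the net input and the two activation functions above; names and the example values are illustrative.

```python
import math

def net_input(weights, outputs, bias):
    """Ij = sum_i(wij * Oi) + theta_j"""
    return sum(w * o for w, o in zip(weights, outputs)) + bias

def binary_activation(i_j, threshold=0.5):
    """Hard-threshold activation with threshold T."""
    return 1 if i_j > threshold else 0

def squashing_activation(i_j):
    """Logistic (sigmoid) activation: Oj = 1 / (1 + e^(-Ij))."""
    return 1.0 / (1.0 + math.exp(-i_j))

i_j = net_input([0.5, 0.5], [0.3, 0.8], bias=0.0)
print(binary_activation(i_j), squashing_activation(i_j))  # 1 0.634...
```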
Learning in Neural Networks
 Learning in Neural Networks-what is it?
 Why is learning required?

 Supervised and Unsupervised learning

 It takes a long time to train a neural network


 A well trained network is tolerant to noise in

data
Using Error Correction

 Used for supervised learning

 Perceptron Learning Formula


 For binary-valued response function
 Delta Learning Formula
 For continuous-valued response function
Using Error Correction
 Perceptron Learning Formula
∆wi = c[di - oi]xi

So the value of ∆wi is either
0 (when the expected output di and the actual output oi are
the same)
or
±2cxi (when di - oi = ±2, i.e. the bipolar outputs disagree).
Using Error Correction
Perceptron Learning Formula
(http://www.csse.uwa.edu.au/teaching/units/233.407/lectureNotes/
Lect4-UWA.pdf)
Using Error Correction
 Delta Learning Formula
∆wi = c[di - oi] o'i xi

 In the case of a unipolar squashing (sigmoid) activation
function, the derivative o'i evaluates to oi(1 - oi),
where oi is given as oi = 1 / (1 + e^(-net input)).
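As an illustrative sketch (ours, not from the slides), the perceptron and delta weight updates above can be written as follows, assuming bipolar targets for the perceptron rule, a unipolar sigmoid for the delta rule, and an arbitrary learning rate c.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def perceptron_update(w, x, d, c=0.1):
    """Perceptron rule: dwi = c * (d - o) * xi, with a hard bipolar output o."""
    o = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
    return [wi + c * (d - o) * xi for wi, xi in zip(w, x)]

def delta_update(w, x, d, c=0.1):
    """Delta rule: dwi = c * (d - o) * o * (1 - o) * xi, with a sigmoid output o."""
    o = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [wi + c * (d - o) * o * (1 - o) * xi for wi, xi in zip(w, x)]

w = [0.2, -0.1]
print(perceptron_update(w, [0.3, 0.8], d=1))
print(delta_update(w, [0.3, 0.8], d=1))
```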


Using Error Correction
 Delta Learning Formula
(http://www.csse.uwa.edu.au/teaching/units/233.407/lectureNotes/
Lect4-UWA.pdf)
Hebbian Learning Formula
 A purely feed-forward, unsupervised learning
network.
 The Hebbian learning formula comes from Hebb's
postulate that if two neurons are very active at the
same time (indicated by high values of both a neuron's
output and one of its inputs), the strength of the
connection between the two neurons will grow, i.e.
increase.
 Depends on pre-synaptic and post-synaptic
activities.
 src:http://www.comp.glam.ac.uk/digimaging/neural.htm
Hebbian Learning Formula
 If xj is the output of the presynaptic neuron, xi
the output of the postsynaptic neuron, wij
the strength of the connection between them,
and γ the learning rate, then one form of the
learning formula would be:
 ∆wij(t) = γ * xj * xi
 src:http://www.nbb.cornell.edu/neurobio/linster/lecture4.pdf
Hebbian Learning Formula
 src:http://www.nbb.cornell.edu/neurobio/linster/lecture4.pdf
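A tiny illustrative sketch (ours) of this Hebbian update; the learning rate and values are arbitrary.

```python
def hebbian_update(w_ij, x_pre, x_post, gamma=0.1):
    """Hebbian rule: dw_ij = gamma * x_pre * x_post (co-active units strengthen their connection)."""
    return w_ij + gamma * x_pre * x_post

# A connection between two simultaneously active units grows over repeated presentations.
w = 0.0
for _ in range(5):
    w = hebbian_update(w, x_pre=1.0, x_post=0.8)
print(w)  # ~0.4
```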
Competitive Learning

 Unsupervised network training, applicable to an
ensemble of neurons (e.g. a layer of p neurons),
not to a single neuron.
 Output neurons of the NN compete to become active.
 Adapt the neuron m which has the maximum
response to the input x.
 Only a single neuron is active at any one time
 – a salient feature for pattern classification.
 – Neurons learn to specialize on ensembles of similar
patterns; therefore,
 – they become feature detectors.
Competitive Learning

 Basic Elements
 A set of neurons that are all the same except for their
synaptic weight distributions, and therefore
respond differently to a given set of input patterns.
 A mechanism that lets them compete to respond to a given input.
 The neuron that wins the competition is called the winner;
the scheme is known as "winner-takes-all".
Competitive Learning

 For example, if the input vector is (0.35, 0.8),
the winning neurode might have the weight
vector (0.4, 0.78). The learning rule would
adjust the weight vector to make it even
closer to the input vector. Only the winning
neurode produces output, and only the
winning neurode gets its weights adjusted.
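A rough sketch (ours, not from the slides) of a winner-takes-all update in one common formulation, where the winner is the neuron whose weight vector is closest to the input; the learning rate and names are assumptions.

```python
def competitive_step(weights, x, lr=0.5):
    """Winner-takes-all: find the neuron whose weight vector is closest to x,
    then move only that neuron's weights toward x."""
    def dist2(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    winner = min(range(len(weights)), key=lambda m: dist2(weights[m]))
    weights[winner] = [wi + lr * (xi - wi) for wi, xi in zip(weights[winner], x)]
    return winner, weights

# Slide example: input (0.35, 0.8); the neuron with weights (0.4, 0.78) wins and moves closer.
w = [[0.4, 0.78], [0.9, 0.1]]
print(competitive_step(w, [0.35, 0.8]))
```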
References
• http://www.csse.uwa.edu.au/teaching/units/233.407/lectureNotes/Lect4-UWA
• Eric Plummer, University of Wyoming,
www.karlbranting.net/papers/plummer/Pres.ppt
• J.M. Zurada, "Introduction to Artificial Neural
Systems", West Publishing Company, 1992,
chapter 3.
The Discrete Perceptron

Src: http://www.csse.uwa.edu.au/teaching/units/233.407/lectureNotes/Lect4-UWA
Single Discrete Perceptron Training
Algorithm (SDPTA)

 We will begin to examine neural network classifiers


that derive their weights during the learning cycle.

 The sample pattern vectors X1, X2, …, Xp, called the


training sequence, are presented to the machine
along with the correct response.

 Based on the perceptron learning rule seen earlier.


Given are P training pairs
{X1, d1, X2, d2, ..., Xp, dp}, where
Xi is (n x 1)
di is (1 x 1)
i = 1, 2, ..., P
Yi = augmented input pattern (obtained by appending 1 to the input
vector), i = 1, 2, ..., P
In the following, k denotes the training step and p denotes the step
counter within the training cycle.
Step 1: c > 0 is chosen.
Step 2: Weights are initialized at small values; w is (n+1) x 1.
Counters and error are initialized: k = 1, p = 1, E = 0.
Step 3: The training cycle begins here. Input is presented and
output computed:
Y = Yp, d = dp
o = sgn(wᵀY)
SDPTA contd..

Step 4: Weights are updated:
W = W + (1/2)c(d - o)Y
Step 5: Cycle error is computed:
E = (1/2)(d - o)^2 + E
Step 6: If p < P, then p = p + 1, k = k + 1, and go to Step 3;
otherwise go to Step 7.
Step 7: The training cycle is completed. If E = 0, terminate the
training session and output the weights and k.
If E > 0, then set E = 0, p = 1, and enter a new training cycle by
going to Step 3.
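A minimal sketch (ours, not from the slides) of SDPTA as described above, assuming bipolar targets d in {-1, +1} and a maximum number of training cycles as a safeguard; variable names mirror the slide notation.

```python
import numpy as np

def sdpta(X, d, c=0.5, max_cycles=100):
    """Single Discrete Perceptron Training Algorithm (sketch).
    X: (P, n) array of patterns, d: length-P array of bipolar targets."""
    P, n = X.shape
    Y = np.hstack([X, np.ones((P, 1))])       # augmented input patterns Yi
    w = np.random.uniform(-0.1, 0.1, n + 1)   # small initial weights, (n+1) x 1
    for _ in range(max_cycles):
        E = 0.0
        for p in range(P):                    # Step 3: present input, compute output
            o = 1.0 if w @ Y[p] >= 0 else -1.0
            w = w + 0.5 * c * (d[p] - o) * Y[p]   # Step 4: weight update
            E += 0.5 * (d[p] - o) ** 2            # Step 5: cycle error
        if E == 0:                            # Step 7: stop when the cycle error is zero
            break
    return w

# Example: learn the logical AND of two bipolar inputs
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
d = np.array([-1, -1, -1, 1], dtype=float)
print(sdpta(X, d))
```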
Single Continuous Perceptron
Training Algorithm (SCPTA)
 We will begin to examine neural network classifiers
that derive their weights during the learning cycle.

 The sample pattern vectors X1, X2, …, Xp, called the


training sequence, are presented to the machine
along with the correct response.

 Based on the delta learning rule seen earlier.


The Continuous Perceptron

Src: http://www.csse.uwa.edu.au/teaching/units/233.407/lectureNotes/Lect4-UWA
Given are P training pairs
{X1, d1, X2, d2, ..., Xp, dp}, where
Xi is (n x 1)
di is (1 x 1)
i = 1, 2, ..., P
Yi = augmented input pattern (obtained by appending 1 to the input
vector), i = 1, 2, ..., P
In the following, k denotes the training step and p denotes the step
counter within the training cycle.
Step 1: c > 0 and Emin are chosen.
Step 2: Weights are initialized at small values; w is (n+1) x 1.
Counters and error are initialized: k = 1, p = 1, E = 0.
Step 3: The training cycle begins here. Input is presented and
output computed:
Y = Yp, d = dp
o = f(net), where net = wᵀY.
SCPTA contd..

Step 4: Weights are updated:
W = W + (1/2)c(d - o)(1 - o^2)Y
Step 5: Cycle error is computed:
E = (1/2)(d - o)^2 + E
Step 6: If p < P, then p = p + 1, k = k + 1, and go to Step 3;
otherwise go to Step 7.
Step 7: The training cycle is completed. If E < Emin, terminate the
training session and output the weights and k.
If E >= Emin, then set E = 0, p = 1, and enter a new training cycle by
going to Step 3.
R category Discrete Perceptron
Training Algorithm (RDPTA)

Src: http://www.csse.uwa.edu.au/teaching/units/233.407/lectureNotes/Lect4-UWA
Algorithm
Given are P training pairs
{X1, d1, X2, d2, ..., Xp, dp}, where
Xi is (n x 1)
di is (R x 1)
Number of categories = R.
i = 1, 2, ..., P
Yi = augmented input pattern (obtained by appending 1 to the input
vector), i = 1, 2, ..., P
In the following, k denotes the training step and p denotes the step
counter within the training cycle.
Step 1: c > 0 and Emin are chosen.
Step 2: Weights are initialized at small values; each wi is (n+1) x 1.
Counters and error are initialized: k = 1, p = 1, E = 0.
Step 3: The training cycle begins here. Input is presented and
output computed:
Y = Yp, d = dp
oi = f(wiᵀY) for i = 1, 2, ..., R
RDPTA contd..

Step 4: Weights are updated:
wi = wi + (1/2)c(di - oi)Y for i = 1, 2, ..., R.
Step 5: Cycle error is computed:
E = (1/2)(di - oi)^2 + E for i = 1, 2, ..., R.
Step 6: If p < P, then p = p + 1, k = k + 1, and go to Step 3;
otherwise go to Step 7.
Step 7: The training cycle is completed. If E = 0, terminate the
training session and output the weights and k.
If E > 0, then set E = 0, p = 1, and enter a new training cycle by
going to Step 3.
What is Backpropagation?

• Supervised Error Back-propagation Training


The mechanism of backward error transmission is
used to modify the synaptic weights of the internal
(hidden) and output layers.

• Based on the delta learning rule.

• One of the most popular algorithms for supervised


training of multilayer feed forward networks.
Architecture: Backpropagation
Network
The Backpropagation Net was first introduced by D.E. Rumelhart, G.E.
Hinton and R.J. Williams in 1986.

Type:
Feedforward
Neuron layers:
1 input layer
1 or more hidden layers
1 output layer
Learning Method:
Supervised
Notation:
 x = input training vector
 t = Output target vector.
 δk = portion of error correction weight for wjk that is due
to an error at output unit Yk; also the information about
the error at unit Yk that is propagated back to the hidden
units that feed into unit Yk
 δj = portion of error correction weight for vij that is due to
the backpropagation of error information from the output
layer to the hidden unit Zj
 α = learning rate.
 voj = bias on hidden unit j
 wok = bias on output unit k
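To show how these quantities fit together, here is a minimal, illustrative sketch (ours) of one backpropagation training step for a network with one hidden layer of sigmoid units; it follows the generic delta-rule formulation rather than reproducing the exact slide algorithm, and all shapes, names and the learning rate are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, v, w, alpha=0.2):
    """One training step. x: input vector, t: output target vector,
    v: input-to-hidden weights (last row = biases voj),
    w: hidden-to-output weights (last row = biases wok)."""
    x_aug = np.append(x, 1.0)                   # append 1 so the bias voj acts as a weight
    z = sigmoid(v.T @ x_aug)                    # hidden-unit activations Zj
    z_aug = np.append(z, 1.0)                   # append 1 for the output bias wok
    y = sigmoid(w.T @ z_aug)                    # output-unit activations Yk
    delta_k = (t - y) * y * (1 - y)             # error term at the output units
    delta_j = (w[:-1] @ delta_k) * z * (1 - z)  # error propagated back to the hidden units
    w += alpha * np.outer(z_aug, delta_k)       # adjust hidden-to-output weights and biases
    v += alpha * np.outer(x_aug, delta_j)       # adjust input-to-hidden weights and biases
    return v, w, y

v = np.random.uniform(-0.5, 0.5, (3, 4))  # 2 inputs + bias -> 4 hidden units
w = np.random.uniform(-0.5, 0.5, (5, 1))  # 4 hidden units + bias -> 1 output unit
v, w, y = backprop_step(np.array([0.3, 0.8]), np.array([1.0]), v, w)
```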
EBPTA contd..
Generalisation
 Once trained, the weights are held constant, and
input patterns are applied in feedforward
mode (commonly called "recall mode").
 We wish the network to "generalize", i.e. to make
sensible choices about input vectors which are
not in the training set.
 Commonly we check generalization of a

network by dividing known patterns into a


training set, used to adjust weights, and a test
set, used to evaluate performance of trained
network.
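A small sketch (ours) of the train/test split mentioned above; the split fraction is arbitrary.

```python
import random

def train_test_split(patterns, test_fraction=0.3, seed=0):
    """Divide known patterns into a training set (used to adjust weights)
    and a test set (used to evaluate the trained network)."""
    shuffled = patterns[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(list(range(10)))
print(len(train), len(test))  # 7 3
```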
Generalisation …
 Generalisation can be improved by
– Using a smaller number of hidden units
(network must learn the rule, not just the
examples)
– Not overtraining (occasionally check that
error on test set is not increasing)
– Ensuring training set includes a good
mixture of examples

 No good rule for deciding upon good network size (#


of layers, # units per layer)
Handwritten Text Recognition
References

1)A Neural Based Segmentation and Recognition Technique for Handwritten


Words - M. Blumenstein and B. Verma, School of Information Technology, Griffith
University, Gold Coast Campus, Qld 9726, Australia.
IEEE World Congress on Computational Intelligence. The 1998 IEEE International
Joint Conference , Neural Networks Proceedings, 9th May 1998.

2)An Off-Line Cursive Handwriting Recognition System- Andrew W.


Senior,Anthony J. Robinson,IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 20, 1998

3) http://www.codeproject.com/dotnet/simple_ocr.asp
Steps for Classification
Binarisation
  -> Preprocessing
  -> Segmentation using a heuristic algorithm
  -> Training of the segmentation ANN
  -> Segmentation validation using the ANN
  -> Extraction of individual words
  -> Training of the character-recognizing ANN
Input Representation

The image is split into squares and we
calculate the average value of each square.
Thus, the input is digitized and stored in
a data structure such as an array.

Digitized input
representation

** source http://www.codeproject.com/dotnet/simple_ocr.asp
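A rough sketch (ours, not from the cited article) of this digitization step, assuming a NumPy grayscale image; the grid size is arbitrary.

```python
import numpy as np

def digitize(image, grid=(8, 8)):
    """Split a grayscale image into grid squares and store the average value of each square."""
    rows, cols = grid
    h, w = image.shape
    out = np.zeros(grid)
    for r in range(rows):
        for c in range(cols):
            square = image[r * h // rows:(r + 1) * h // rows,
                           c * w // cols:(c + 1) * w // cols]
            out[r, c] = square.mean()
    return out  # e.g. flatten() to use as the network's input vector

img = np.random.rand(32, 32)   # stand-in for a binarised character image
print(digitize(img).shape)     # (8, 8)
```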
Preprocessing

[Screenshots: slope correction, slant correction and size normalization,
applied before the image is passed to the neural network.]

**Screenshots taken from: http://www.thomastannahill.com/tom-ato/

Segmentation using ANN
 Train an ANN (n inputs, 1 output) with
segmentation points
(learning rate = 0.2, momentum = 0.2).
 Segment words with the heuristic algorithm.
 Present the extracted segmentation points
to the ANN (n inputs, 1 output).
 The ANN classifies the correct segmentation points,
and non-legitimate points are removed.
Identifying Characters
 Recurrent Neural Network
 A recurrent network is well suited to the recognition of
sequential patterns such as speech and handwritten text.
 The recurrent network architecture used here is a single layer
of standard perceptrons with nonlinear activation functions.
 Its usefulness resides in the existence of training algorithms
which cause the weights to converge toward a desired
function approximation.
Recurrent Network

The feedback units have a standard
sigmoid activation function.

Character outputs have a
“softmax” activation function.

[A schematic of the recurrent error propagation network**]


** An Off-Line Cursive Handwriting Recognition System
Andrew W. Senior, Member, IEEE, and Anthony J. Robinson, Member, IEEE
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 20, NO. 3, MARCH 1998
Some parameters
 Stopping Criteria: The stopping criterion is a heuristic
based on the observation of validation word error rate
over time.

 Adding more feedback units to the network increases
its capacity, and the error rate of the system is seen to
fall as the number of feedback units increases.
(Feedback units ranging from 80 to 160 were used in
this example.)
Training problems.. solutions
Problem 1: Training never completes because the network topology is too
simple to handle the amount of training patterns you provide.
Possible solution: Add more nodes to the middle layer, or add more middle
layers to the network; you will have to create a bigger network.

Problem 2: The training patterns are not clear enough, not precise, or are
too complicated for the network to differentiate.
Possible solution: Clean the patterns, or use a different type of
network / training algorithm.

Problem 3: Your training expectations are too high and/or not realistic.
Possible solution: Lower your expectations; the network can never be
100% "sure".
Advantages/Disadvantages
 Output-oriented model: no specific steps or approach for arriving at
the conclusion.
 Online training is possible, which allows one to keep 'teaching' the
network.
 Training takes up a large amount of time, and the network has to be
trained for all possible inputs.
 The network model to be chosen is not based on any fixed rule.
Parameters such as the number of hidden layers and the number of
perceptrons on each layer are determined based on experience.
Effective Data Mining Using
Neural Networks
VLDB'95 Proceedings, Springer, Singapore, 1995

Hongjun Lu, Rudy Setiono, Huan Liu


Department of Information Systems and Computer
Science
National University of Singapore
References:
1. http://citeseer.ist.psu.edu/cache/papers/cs/13788/http:zSzzSzwww.eng.auburn.eduzSzuserszSzwenchenzSzcoursezSzcomp714zSzarticlezSzlu.pdf/lu96effective.pdf
2. http://en.wikipedia.org/wiki/NeuralNetwork.html
Criticism of Neural Networks
 Generating/articulating rules is a difficult
problem

 Learning time is usually long

 Multiple passes over the training data


Neural Network based Data
Mining
 Three phases

 Network Construction and Training

 Network Pruning

 Rule Extraction
 Network construction and training
 Construct and train a neural network

 Network Pruning
 Aims at removing redundant links and units without
increasing the classification error rate
 Small number of units and links are left in the network

 Rule Extraction
 Extracts classification rules from the pruned network, of the form:
If (a1 θ v1) ^ (a2 θ v2) ^ ... ^ (an θ vn) then Cj
Rule Extraction Algorithm**
 Input nodes, Hidden nodes, Output node
 Activation values

**http://en.wikipedia.org/wiki/Image:Neuralnetwork.png
1. Enumerate hidden node activation values

 E.g.
H = {0,0,1,1,0}

2.Generate rules that describe the network output in


terms of the discretized hidden unit activation values

 E.g.
(H1 = 0) ^ (H2 = 0) ^ (H3 = 1) ^ (H4 = 1) ^ (H5 = 0) then O
3. For each hidden unit, enumerate the input
values that lead to them

 E.g.
For H1, I = {0,0}
For H2, I = {0,1}
For H3, I = {1,0}
For H4, I = {1,1}
For H5, I = {-1,-1}
4. Generate rules that describe the hidden unit
activation value in terms of inputs
 E.g.
(I1 = 0) ^ (I2 = 0) then H1
(I1 = 0) ^ (I2 = 1) then H2
(I1 = 1) ^ (I2 = 0) then H3
(I1 = 1) ^ (I2 = 1) then H4
(I1 =-1) ^ (I2 =-1) then H5

5. Merge the two sets of rules to relate inputs


and outputs
Future Enhancements
 Training times are still longer than those required
by decision trees

 Incremental training

 Reduce training time and improve


classification accuracy by feature selection
