Neural Networks
Introduction
Artificial Neural Networks (ANN)
Information processing paradigm inspired by
biological nervous systems
ANN is composed of a system of neurons
connected by synapses
ANNs learn by example
Adjust synaptic connections between neurons
History
1943: McCulloch and Pitts model neural
networks based on their understanding of
neurology.
Neurons embed simple logic functions:
a or b
a and b
1950s:
Farley and Clark
IBM group that tries to model biological behavior
Consults neuroscientists at McGill whenever stuck
1980s:
A better learning rule for generic three-layer networks (back-propagation) regenerates interest
Successful applications in medicine, marketing,
risk management, … (1990s)
In need of another breakthrough.
ANN
Promises
Combine the speed of silicon with the proven
success of carbon: artificial brains
Neuron Model
Natural neurons
Neuron Model
Neuron collects signals from dendrites
Sends out spikes of electrical activity through an
axon, which splits into thousands of branches.
At the end of each branch, a synapse converts
activity into either excitatory or inhibitory activity of
a dendrite at another neuron.
A neuron fires when excitatory activity surpasses
inhibitory activity.
Learning changes the effectiveness of the
synapses
Neuron Model
Abstract neuron model: [diagram]
ANN Forward Propagation
Bias Nodes
Add one node to each layer that has constant
output
Forward propagation
Calculate from input layer to output layer
For each neuron:
Calculate weighted average of inputs
Calculate activation function
(a minimal sketch of this procedure follows below)
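A minimal sketch in Python, assuming a fully connected layered network with sigmoid activations and a constant-output bias node modeled by appending 1 to each layer's outputs (all names and weights here are illustrative, not from the slides):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(layers, inputs):
    # layers: one weight matrix per layer; row j holds the weights into
    # neuron j, with the last entry being the weight from the bias node
    activations = list(inputs)
    for weights in layers:
        extended = activations + [1.0]          # constant-output bias node
        activations = [sigmoid(sum(w * a for w, a in zip(row, extended)))
                       for row in weights]
    return activations

# usage with illustrative weights: a 2-2-1 network
w_hidden = [[0.5, -0.3, 0.1], [0.2, 0.8, -0.4]]
w_output = [[1.0, -1.0, 0.2]]
print(forward([w_hidden, w_output], [1.0, 0.0]))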
Neuron Model
Firing Rules:
Threshold rules:
Calculate weighted average of input
Fire if larger than threshold
Perceptron rule
Calculate weighted average of inputs
Output activation level is
o(net) = { 1 if net > 0; 1/2 if net = 0; 0 if net < 0 }
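A small Python sketch of this rule (the function name is illustrative):

def perceptron_output(weights, inputs):
    # weighted sum of the inputs
    net = sum(w * x for w, x in zip(weights, inputs))
    # fire fully above zero, half-fire exactly at zero
    if net > 0:
        return 1.0
    return 0.5 if net == 0 else 0.0

# usage: a two-input neuron
print(perceptron_output([1.0, -1.0], [1, 0]))   # 1.0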
Neuron Model
Firing Rules: Sigmoid functions:
Hyperbolic tangent function:
tanh(x/2) = (1 - exp(-x)) / (1 + exp(-x))
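A quick numerical check of this identity:

import math

for x in (-2.0, 0.0, 1.5):
    lhs = math.tanh(x / 2)
    rhs = (1 - math.exp(-x)) / (1 + math.exp(-x))
    assert abs(lhs - rhs) < 1e-12   # the two forms agree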
[Diagram: a neuron 3 with inputs from node 0 (w0,3 = .6) and node 2 (w2,3 = -.9), plus a bias node]
ANN Forward Propagation
Example: three-layer network
Calculates XOR of its inputs
[Network diagram: input nodes 0 and 1, hidden nodes 2 and 3, output node 4. Weights: w0,2 = -4.8, w1,2 = 4.6, w0,3 = 5.1, w1,3 = -5.2, w2,4 = 5.9, w3,4 = 5.2; bias weights: node 2: -2.6, node 3: -3.2, node 4: -2.7]
ANN Forward Propagation
Input (0,0)
Node 2 activation is σ(-4.8·0 + 4.6·0 - 2.6) = 0.0691
Node 3 activation is σ(5.1·0 - 5.2·0 - 3.2) = 0.0392
Node 4 activation is σ(5.9·0.0691 + 5.2·0.0392 - 2.7) = 0.110227
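These activations can be reproduced with a short Python sketch of the network above (weights as read off the diagram; node names follow the slides):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def xor_net(x0, x1):
    o2 = sigmoid(-4.8 * x0 + 4.6 * x1 - 2.6)   # node 2
    o3 = sigmoid(5.1 * x0 - 5.2 * x1 - 3.2)    # node 3
    return sigmoid(5.9 * o2 + 5.2 * o3 - 2.7)  # node 4

for x0, x1 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x0, x1, round(xor_net(x0, x1), 4))
# approximately 0.1102, 0.9240, 0.8614, 0.1019: close to XOR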
ANN Forward Propagation
Input (0,1)
Node 2 activation is σ(4.6·1 - 2.6) = 0.880797
Node 3 activation is σ(-5.2·1 - 3.2) = 0.000224817
Node 4 activation is σ(5.9·0.880797 + 5.2·0.000224817 - 2.7) = 0.923992
ANN Forward Propagation
[Figure: density plot of the network output over the input space]
The network can learn a non-linearly separable set of outputs.
Need to map the real-valued output into binary values.
ANN Training
Weights are determined by training
Back-propagation:
On a given input, compare the actual output to the desired output.
Adjust the weights into the output nodes.
Then adjust the weights into the hidden nodes.
ANN Training
Error is the mean square of the differences in the output layer:

E(x) = (1/2) · Σ_{k=1}^{K} (y_k(x) - t_k(x))²

y – observed output
t – target output
ANN Training
Error of a training epoch is the average of all per-example errors.
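A direct transcription in Python (names illustrative):

def example_error(y, t):
    # E(x) = 1/2 * sum_k (y_k - t_k)^2 over the output layer
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))

def epoch_error(outputs, targets):
    # average per-example error over the whole training epoch
    return sum(example_error(y, t) for y, t in zip(outputs, targets)) / len(targets)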
ANN Training
Update weights and thresholds using the gradient of the error (η is the learning rate):

Weights: w_{j,k} ← w_{j,k} + (-η) · ∂E(x)/∂w_{j,k}
Bias: θ_k ← θ_k + (-η) · ∂E(x)/∂θ_k
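A one-step sketch of this update rule (the value η = 0.1 is an assumption; the slides leave the learning rate unspecified):

ETA = 0.1  # learning rate eta (assumed value)

def update(value, gradient):
    # gradient descent: step against the gradient of the error
    return value - ETA * gradient

# usage: nudging a weight whose error gradient is 0.05
print(update(0.3, 0.05))   # 0.295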
ANN Training Example
[Network diagram: input nodes 0 and 1, hidden nodes 2 and 3, output node 4. Weights: w0,2 = -0.5, w1,2 = 1, w0,3 = -0.5, w1,3 = 1, w2,4 = 0.1, w3,4 = -0.5; bias weights: node 2: -0.5, node 3: -1, node 4: 1]

x1 x2 y Error
0 0 0.69 0.472448
0 1 0.67 0.110583
1 0 0.70 0.0911618
1 1 0.68 0.457959

σ′(net4(0,0)) = σ′(0.787754) = 0.214900
σ′(net4(0,1)) = σ′(0.696717) = 0.221957
σ′(net4(1,0)) = σ′(0.838124) = 0.210768
σ′(net4(1,1)) = σ′(0.738770) = 0.218768
ANN Training Example
New weights going into node 4
We now obtain δ values for each input separately:

Input (0,0):
δ4 = σ′(net4(0,0))·(0 - y4(0,0)) = -0.152928

Input (0,1):
…

New weight from node 2 to node 4 is now going to be 0.1190451.

∂E(x)/∂w_{j,k} = -y_j·δ_k
δ_k = f′(net_k)·(t_k - y_k)
ANN Training Example
New weights going into node 4
For (0,0): E4,3 = 0.0411287
For (0,1): E4,3 = -0.0341162
For (1,0): E4,3 = -0.0108341
For (1,1): E4,3 = 0.0580565
Average is 0.0135588
New weight is -0.486441
ANN Training Example
New weights going into node 4:
We also need to change the bias node
For (0,0): E4,B = 0.0411287
For (0,1): E4,B = -0.0341162
For (1,0): E4,B = -0.0108341
For (1,1): E4,B = 0.0580565
Average is 0.0447706
New weight is 1.0447706
ANN Training Example
We now calculate the updates to the weights of neuron 2.
First, we calculate the net input into node 2. This is really simple because it is just a linear function of the arguments x1 and x2:

net2 = -0.5·x1 + x2 - 0.5

Using δ_j = σ′(net_j) · Σ_k (δ_k · w_{j,k}), we obtain:
δ2(0,0) = -0.00359387
δ2(0,1) = 0.00160349
δ2(1,0) = 0.00116766
δ2(1,1) = -0.00384439
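A sketch reproducing δ2(0,0), taking δ4(0,0) = -0.152928 from the earlier slide and w2,4 = 0.1 from the diagram:

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

net2 = -0.5 * 0 + 0 - 0.5    # net2(0,0) = -0.5
delta4 = -0.152928           # delta of node 4 on input (0,0), from the slide
delta2 = sigmoid_prime(net2) * delta4 * 0.1
print(delta2)                # approximately -0.0035939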
ANN Training Example
Call E2,0 the derivative of E with respect to w0,2. We use the output activation of the neurons in the previous layer (which happens to be the input layer).

E2,0(0,0) = -(0)·δ2(0,0) = 0.00179694
E2,0(0,1) = 0.00179694
E2,0(1,0) = -0.000853626
E2,0(1,1) = 0.00281047
ANN Training Example
We now calculate the updates to the weights of neuron 3.
…
ANN Training
Back-propagation is an empirical algorithm
ANN Training
XOR is too simple an example, since the
quality of an ANN is measured on a finite set
of inputs.
More relevant are ANNs that are trained on
a training set and then unleashed on real data
ANN Training
Need to measure effectiveness of training
Need training sets
Need test sets.
There can be no interaction between test sets
and training sets.
Example of a Mistake:
Train ANN on training set.
Test ANN on test set.
Results are poor.
Go back to training ANN.
After this, there is no assurance that the ANN will work well in
practice.
In a subtle way, the test set has become part of the training set.
ANN Training
Convergence
ANN back-propagation uses gradient descent.
Naïve implementations can
overcorrect weights
undercorrect weights
In either case, convergence can be poor
Stuck in the wrong place
ANN starts with random weights and improves them
If improvement stops, we stop the algorithm
No guarantee that we found the best set of weights
Could be stuck in a local minimum
ANN Training
Overtraining
An ANN can be made to work too well on a
training set
But lose performance on test sets
[Plot: performance vs. training time; training-set performance keeps improving while test-set performance eventually degrades]
ANN Training
Overtraining
Assume we want to separate the red from the green dots.
Eventually, the network will learn to do well on the training set
But it will have learnt only the particularities of our training set
ANN Training
Improving Convergence
Many Operations Research tools apply
Simulated annealing
Sophisticated gradient descent
ANN Design
ANN is a largely empirical study
“Seems to work in almost all cases that we
know about”
Known to be statistical pattern analysis
ANN Design
Number of layers
Apparently, three layers are almost always good
enough and better than four layers.
Also: fewer layers are faster in execution and training
How many hidden nodes?
Many hidden nodes allow the network to learn more
complicated patterns
Because of overtraining, it is almost always best to
start with too few hidden nodes and then increase
their number.
ANN Design
Interpreting Output
ANN output neurons do not give binary
values (good or bad).
Need to define what counts as an accept.
Pseudo-Code
[Network diagram: input nodes 0 and 1, hidden nodes 2 and 3, output node 4. Weights: w0,2 = 1, w1,2 = 1, w0,3 = 0, w1,3 = 0.5, w2,4 = 0.3, w3,4 = -0.7; bias weights: node 2: -0.5, node 3: -1, node 4: 1]
For each neuron, calculate:
Node 0: x0
Node 1: x1
Node 2: o2 = σ(x0 + x1 - 0.5)
Node 3: o3 = σ(0.5·x1 - 1)
Node 4: o4 = σ(0.3·o2 - 0.7·o3 + 1)
ANN Training Example 2
Calculate outputs:

x0 x1 y = o4
0 0 0.7160
0 1 0.7155
1 0 0.7308
1 1 0.7273
ANN Training Example 2
Calculate average error to be E = 0.14939

x0 x1 y t E = (y-t)²/2
0 0 0.7160 0 0.2564
0 1 0.7155 1 0.0405
1 0 0.7308 1 0.0362
1 1 0.7273 0 0.264487
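The two tables above can be reproduced with a short Python sketch of this network (weights as in the pseudo-code; the target t is the XOR of the inputs):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def net(x0, x1):
    o2 = sigmoid(x0 + x1 - 0.5)
    o3 = sigmoid(0.5 * x1 - 1)
    return sigmoid(0.3 * o2 - 0.7 * o3 + 1)

errors = []
for x0, x1 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    y, t = net(x0, x1), x0 ^ x1
    e = (y - t) ** 2 / 2
    errors.append(e)
    print(x0, x1, round(y, 4), t, round(e, 4))
print(sum(errors) / 4)   # approximately 0.14939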
ANN Training Example 2
Calculate the change for node 4
Need to calculate net4, the weighted sum of all inputs into node 4:
net4(x0,x1) = 0.3·o2(x0,x1) - 0.7·o3(x0,x1) + 1
net4 = (net4(0,0) + net4(0,1) + net4(1,0) + net4(1,1))/4
This gives 0.956734
ANN Training Example 2
Calculate the change for node 4
Using δ_j = f′(net_j)·(t_j - y_j), we now calculate:
δ4(0,0) = σ′(net4(0,0))·(0 - o4(0,0)) = -0.14559
δ4(0,1) = σ′(net4(0,1))·(1 - o4(0,1)) = 0.05790
δ4(1,0) = σ′(net4(1,0))·(1 - o4(1,0)) = 0.05297
δ4(1,1) = σ′(net4(1,1))·(0 - o4(1,1)) = -0.14425
On average: δ4 = -0.044741
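A sketch reproducing these δ values (self-contained; weights as in the pseudo-code):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def delta4(x0, x1, t):
    # forward pass for this input
    o2 = sigmoid(x0 + x1 - 0.5)
    o3 = sigmoid(0.5 * x1 - 1)
    o4 = sigmoid(0.3 * o2 - 0.7 * o3 + 1)
    # delta of the output node: sigma'(net4) * (t - y), with sigma' = o4*(1-o4)
    return o4 * (1 - o4) * (t - o4)

deltas = [delta4(x0, x1, x0 ^ x1) for x0, x1 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print([round(d, 5) for d in deltas])   # approximately [-0.14559, 0.0579, 0.05297, -0.14425]
print(round(sum(deltas) / 4, 6))       # approximately -0.044742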
ANN Training Example 2
Calculate the change for node 4
δ4 = -0.044741
We can now update the weights for node 4:
E4,2(0,0) = -o2(0,0)·δ4 = 0.01689
E4,2(0,1) = -o2(0,1)·δ4 = 0.02785
E4,2(1,0) = -o2(1,0)·δ4 = 0.02785
E4,2(1,1) = -o2(1,1)·δ4 = 0.03658
with average 0.00708
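A sketch of the per-input values above, with the average δ4 taken from the previous slide:

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

delta4_avg = -0.044741   # average delta of node 4, from the previous slide
for x0, x1 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    o2 = sigmoid(x0 + x1 - 0.5)    # activation of node 2 on this input
    print(x0, x1, round(-o2 * delta4_avg, 5))
# approximately 0.01689, 0.02785, 0.02785, 0.03658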
ANN Training Example 2
Calculate the change for node 4
E4,2 = 0.00708
Therefore, the new weight w2,4 is 0.2993.