ANN Notes


DATA MINING: CLASSIFICATION
Supervised Learning
ARTIFICIAL NEURAL NETWORK (ANN)
HUMAN NERVE CELL
ANN
Artificial neural networks (ANN) or connectionist systems
are computing systems vaguely inspired by the biological
neural networks that constitute animal brains.
Motivation: to simulate the information-processing
mechanisms of the nervous system (the human brain)
Structure: a huge number of densely connected,
mutually interacting processing units (neurons)
It learns from experience (training instances)
Some neurobiology…
Neurons have many inputs and a single output
The output is either excited or not
The inputs from other neurons determine whether the neuron fires
Each input synapse has a weight and changes in size in response to learning
Axon – carries signals away
Dendrites – carry signals in
ANN structure
ANNs are sets of layers of highly interconnected processing
elements (neurons) that perform a series of transformations
on the data to build their own representation of it (what we
commonly call features).
Modelled after the human brain, ANNs aim to have machines
mimic how the brain works.
Understanding the representation
Input layer — used to pass in our input (an image, text, or any
other type of data suitable for a NN).
Hidden layers — the layers between the input and output layers.
They are responsible for learning the mapping between input and
output (e.g. in the dog-and-cat example, the hidden layers learn
that the dog picture is linked to the label "dog", through a series
of matrix multiplications and mathematical transformations).
Output layer — responsible for producing the output of the NN
given our inputs.
The engine of Neural Networks:
Basic ANN
ANN layers transform the input data through a series of
mathematical and matrix operations to learn a mapping between
input and output:
output = f(W⋅x + b), where W⋅x = Σᵢ wᵢxᵢ is the weighted sum of the inputs
W and b are tensors (multidimensional matrices) that are attributes of the layer.
They are commonly called the weights or trainable parameters of the layer
(the kernel and bias respectively).
We do a matrix multiplication between the input and the weights, add the bias to
the result, and apply an activation function to put the values into an acceptable range.
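A minimal sketch of this forward computation in NumPy (the layer size, the ReLU choice and the random values are illustrative assumptions, not taken from the slides):

import numpy as np

def dense_forward(x, W, b, f):
    # Weighted sum of the inputs plus bias, passed through the activation f
    return f(W @ x + b)

rng = np.random.default_rng(0)
x = np.array([1.4, 2.7, 1.9])            # input vector (3 measured parameters)
W = rng.normal(scale=0.1, size=(4, 3))   # kernel: 4 neurons, 3 inputs each
b = np.zeros(4)                          # bias
relu = lambda v: np.maximum(v, 0.0)      # activation function

print(dense_forward(x, W, b, relu))      # the layer's 4 outputs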
Basic ANN……
[Diagram: the inputs x enter the neuron, are multiplied by the weights W and summed (Σ), the bias b is added, and the result passes through the activation f() to produce the output.]
Inputs (x) – normally a vector of measured parameters
Bias (b) – may or may not be added
f() – transfer or activation function (Sigmoid, ReLU, ELU, SELU, …)
output = f(W⋅x + b)
Basic ANN……weights
These weights contain the information learned by the network from
exposure to training data; in other words, the weights contain all
the knowledge.
The weight matrices are filled with small random values (called
random initialization). Initializing all values with zero does not work
for an ANN: every neuron would compute the same output and receive
the same update, so all weights would keep ending up with the same
value and the network could not learn distinct features. Randomly
initializing the weights breaks this symmetry.
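A small sketch of the symmetry point (the matrix shape and the 0.01 scale are illustrative assumptions, not from the slides):

import numpy as np

rng = np.random.default_rng(42)

# Zero initialization: every hidden neuron computes the same output and gets
# the same gradient, so all weights keep updating to the same value.
W_zero = np.zeros((4, 3))

# Small random initialization breaks the symmetry: each neuron starts from a
# different point and can learn a different feature.
W_random = 0.01 * rng.standard_normal((4, 3))
print(W_random)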
Activation functions
Applied to the weighted sum of the inputs:
W⋅x + b = Σᵢ wᵢxᵢ + b
In a simple threshold unit, if this sum is above a threshold T the
neuron fires (outputs 1); otherwise its output is 0 or −1.
Activation Functions
Log Sigmoidal Function
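As a sketch, the threshold unit described above and the log-sigmoid can be written as follows (the example values are illustrative):

import numpy as np

def threshold(a, T=0.0):
    # Fires (outputs 1) when the weighted sum exceeds the threshold T, else 0
    return np.where(a > T, 1.0, 0.0)

def log_sigmoid(a):
    # Smooth, differentiable squashing of the weighted sum into (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

a = np.array([-2.0, 0.5, 3.0])   # example weighted sums
print(threshold(a))              # [0. 1. 1.]
print(log_sigmoid(a))            # [0.119 0.622 0.953] (rounded)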
Perceptron
Multilayer perceptron = ANN
Different representations at the various layers
Multilayer perceptron
Feedforward neural networks
Connections go only to the next layer
The weights of the connections (between two layers) can be changed
Activation functions are used to calculate whether a neuron fires
Three-layer network:
◦ Input layer
◦ Hidden layer
◦ Output layer
General (three-layer) feedforward network (c output units)

$$g_k(\mathbf{x}) = z_k = f_k\!\left(\sum_{j=1}^{n_H} w_{kj}\, f_j\!\left(\sum_{i=1}^{d} w_{ji} x_i + w_{j0}\right) + w_{k0}\right), \qquad k = 1, \dots, c$$

◦ The hidden units with their activation functions can express non-linear functions
◦ The activation functions can be different at different neurons (but in practice the same one is used)
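A direct NumPy transcription of g_k(x) above, as a sketch (the sizes d = 3, n_H = 4, c = 2 and the sigmoid activation are illustrative assumptions):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def three_layer_forward(x, W_hid, b_hid, W_out, b_out):
    y = sigmoid(W_hid @ x + b_hid)   # hidden units: y_j = f(sum_i w_ji x_i + w_j0)
    z = sigmoid(W_out @ y + b_out)   # output units: z_k = f(sum_j w_kj y_j + w_k0)
    return z

rng = np.random.default_rng(1)
x = np.array([1.4, 2.7, 1.9])                                    # d = 3 inputs
W_hid, b_hid = rng.normal(scale=0.1, size=(4, 3)), np.zeros(4)   # n_H = 4 hidden units
W_out, b_out = rng.normal(scale=0.1, size=(2, 4)), np.zeros(2)   # c = 2 outputs
print(three_layer_forward(x, W_hid, b_hid, W_out, b_out))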
Training of neural networks
(backpropagation)
Training of neural networks
The network topology is given
The same activation function is used at each hidden neuron and it is given
Training = calibration of weights
on-line learning (epochs)
Training of neural networks
1. Forward propagation
An input vector propagates through the network
2. Weight update (backpropagation)
The weights of the network are changed in order to decrease the difference
between the predicted and gold-standard values
[Diagram: a single neuron with weights w1 = −0.06, w2 = −2.5, w3 = 1.4 applied to the inputs 2.7, −8.6 and 0.002; the weighted sum x = (−0.06)(2.7) + (−2.5)(−8.6) + (1.4)(0.002) = 21.34 is passed through the activation f(x).]
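The figure's arithmetic can be checked with a one-line dot product (weights and inputs copied from the example):

import numpy as np

weights = np.array([-0.06, -2.5, 1.4])
inputs = np.array([2.7, -8.6, 0.002])
x = weights @ inputs        # (-0.06)(2.7) + (-2.5)(-8.6) + (1.4)(0.002)
print(round(x, 2))          # 21.34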
A dataset
Fields            class
1.4 2.7 1.9       0
3.8 3.4 3.2       0
6.4 2.8 1.7       1
4.1 0.1 0.2       0
etc …
Training the neural network
(using the training data above)

Initialise with random weights
Present a training pattern (e.g. the inputs 1.4, 2.7, 1.9)
Feed it through to get the output (e.g. 0.8)
Compare with the target output (target 0, error 0.8)
Adjust the weights based on the error
Present the next training pattern (e.g. 6.4, 2.8, 1.7)
Feed it through to get the output (e.g. 0.9)
Compare with the target output (target 1, error −0.1)
Adjust the weights based on the error
And so on ….

Repeat this thousands, maybe millions of times – each time taking a random
training instance and making slight weight adjustments
Algorithms for weight adjustment are designed to make changes that will
reduce the error
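A sketch of this loop for a single sigmoid neuron trained on the example rows above (the learning rate, the number of iterations and the use of the sigmoid's gradient are illustrative assumptions, not the slides' exact procedure):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# The example rows from the slides: three input fields and a class label
X = np.array([[1.4, 2.7, 1.9],
              [3.8, 3.4, 3.2],
              [6.4, 2.8, 1.7],
              [4.1, 0.1, 0.2]])
t = np.array([0, 0, 1, 0])

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=3)    # initialise with random weights
b = 0.0
eta = 0.1                            # learning rate (illustrative)

for step in range(10_000):           # "repeat thousands of times"
    i = rng.integers(len(X))         # take a random training instance
    z = sigmoid(w @ X[i] + b)        # feed it through to get the output
    err = t[i] - z                   # compare with the target output
    grad = err * z * (1 - z)         # delta for a sigmoid output unit
    w += eta * grad * X[i]           # adjust weights to reduce the error
    b += eta * grad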
The decision boundary perspective…
Initial random weights
Present a training instance / adjust the weights (repeated over several training instances)
Eventually ….
Training of neural networks
We can calculate (propagate back) the error signal for each hidden neuron.
$t_k$ is the target (gold standard) value of output neuron k, $z_k$ is the prediction at
output neuron k (k = 1, …, c) and $w$ are the weights.

Error:
$$J(\mathbf{w}) = \frac{1}{2}\sum_{k=1}^{c}(t_k - z_k)^2 = \frac{1}{2}\,\lVert \mathbf{t} - \mathbf{z} \rVert^2$$

◦ backpropagation is a gradient descent algorithm
◦ initial weights are random, then

$$\Delta w = -\eta\,\frac{\partial J}{\partial w}$$
Backpropagation
The error of the weights between the hidden and output layers:
$$\frac{\partial J}{\partial w_{kj}} = \frac{\partial J}{\partial \mathrm{net}_k}\cdot\frac{\partial \mathrm{net}_k}{\partial w_{kj}} = -\delta_k\,\frac{\partial \mathrm{net}_k}{\partial w_{kj}}$$
The error signal for output neuron k:
$$\delta_k = -\frac{\partial J}{\partial \mathrm{net}_k}$$
Because $\mathrm{net}_k = \mathbf{w}_k^t \mathbf{y}$:
$$\frac{\partial \mathrm{net}_k}{\partial w_{kj}} = y_j$$
and, since $z_k = f(\mathrm{net}_k)$:
$$\delta_k = -\frac{\partial J}{\partial \mathrm{net}_k} = -\frac{\partial J}{\partial z_k}\cdot\frac{\partial z_k}{\partial \mathrm{net}_k} = (t_k - z_k)\, f'(\mathrm{net}_k)$$
The change of weights between the hidden and output layers:
$$\Delta w_{kj} = \eta\,\delta_k\, y_j = \eta\,(t_k - z_k)\, f'(\mathrm{net}_k)\, y_j$$
The gradient of the hidden units:
$$y_j = f(\mathrm{net}_j), \qquad \mathrm{net}_j = \sum_{i=0}^{d} w_{ji} x_i$$
$$\frac{\partial J}{\partial w_{ji}} = \frac{\partial J}{\partial y_j}\cdot\frac{\partial y_j}{\partial \mathrm{net}_j}\cdot\frac{\partial \mathrm{net}_j}{\partial w_{ji}}$$
$$\frac{\partial J}{\partial y_j} = \frac{\partial}{\partial y_j}\left[\frac{1}{2}\sum_{k=1}^{c}(t_k - z_k)^2\right] = -\sum_{k=1}^{c}(t_k - z_k)\frac{\partial z_k}{\partial y_j} = -\sum_{k=1}^{c}(t_k - z_k)\frac{\partial z_k}{\partial \mathrm{net}_k}\cdot\frac{\partial \mathrm{net}_k}{\partial y_j} = -\sum_{k=1}^{c}(t_k - z_k)\, f'(\mathrm{net}_k)\, w_{kj}$$
The error signal of the hidden units:
$$\delta_j = f'(\mathrm{net}_j)\sum_{k=1}^{c} w_{kj}\,\delta_k$$
The weight change between the input and hidden layers:
$$\Delta w_{ji} = \eta\, x_i\,\delta_j = \eta\left[\sum_{k=1}^{c} w_{kj}\,\delta_k\right] f'(\mathrm{net}_j)\, x_i$$
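The update rules above, written out as one backpropagation step for a single training pair (the sigmoid activation, the bias-free layers and the learning rate are illustrative assumptions):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, t, W_hid, W_out, eta=0.1):
    # Forward pass
    net_hid = W_hid @ x                  # net_j = sum_i w_ji x_i
    y = sigmoid(net_hid)                 # y_j = f(net_j)
    net_out = W_out @ y                  # net_k = sum_j w_kj y_j
    z = sigmoid(net_out)                 # z_k = f(net_k)

    # Error signals
    delta_out = (t - z) * z * (1 - z)    # delta_k = (t_k - z_k) f'(net_k)
    delta_hid = y * (1 - y) * (W_out.T @ delta_out)  # delta_j = f'(net_j) sum_k w_kj delta_k

    # Weight updates
    W_out += eta * np.outer(delta_out, y)   # delta w_kj = eta * delta_k * y_j
    W_hid += eta * np.outer(delta_hid, x)   # delta w_ji = eta * delta_j * x_i
    return W_hid, W_out

# Illustrative usage: d = 3 inputs, n_H = 4 hidden units, c = 2 outputs
rng = np.random.default_rng(0)
W_hid = rng.normal(scale=0.1, size=(4, 3))
W_out = rng.normal(scale=0.1, size=(2, 4))
W_hid, W_out = backprop_step(np.array([1.4, 2.7, 1.9]), np.array([0.0, 1.0]),
                             W_hid, W_out)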
Backpropagation
Calculate the error signal for the hidden neurons
[Diagram: input, hidden and output layers; the output-layer error signals δ_k are propagated back through the weights w_kj to hidden neuron j:]
$$\delta_j = f'(\mathrm{net}_j)\sum_{k=1}^{c} w_{kj}\,\delta_k$$
Backpropagation
Update the weights between the input and hidden neurons
[Diagram: input, hidden and output layers; the weights w_ji leading into hidden neuron j are updated:]
$$\Delta w_{ji} = \eta\,\delta_j\, x_i$$
Questions of network design
How many hidden neurons?
◦ too few neurons cannot learn complex patterns
◦ too many neurons can easily overfit
◦ validation set? (see the sketch below)

Learning rate!?
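One way to answer these design questions is to compare validation performance across candidate settings; a sketch using scikit-learn (the synthetic data, the candidate hidden sizes and all parameter values are illustrative assumptions, and scikit-learn itself is not part of the slides):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic data standing in for the training set (illustrative only)
X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25,
                                                  random_state=0)

# Try a few hidden-layer sizes and keep the one with the best validation score
for n_hidden in (2, 8, 32, 128):
    net = MLPClassifier(hidden_layer_sizes=(n_hidden,), learning_rate_init=0.01,
                        max_iter=2000, random_state=0)
    net.fit(X_train, y_train)
    print(n_hidden, net.score(X_val, y_val))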
Deep learning
