ANN Notes
MINING: CLASSIFICATION
Supervised Learning
ARTIFICIAL NEURAL NETWORK (ANN)
HUMAN NERVE CELL
ANN
Artificial neural networks (ANN) or connectionist systems
are computing systems vaguely inspired by the biological
neural networks that constitute animal brains.
Motivation: simulating the information-processing
mechanisms of the nervous system (the human brain)
Structure: a huge number of densely
connected, mutually interacting processing
units (neurons)
It learns from experience (training
instances)
Some neurobiology…
Neurons have many inputs and a single
output
The output is either excited or not
The inputs from other neurons determine
whether the neuron fires
Each input synapse has a weight and
changes in size in response to learning
Axon – carries signals away
Dendrites – carry signals in
ANN structure
ANNs are sets of layers of highly
interconnected processing
elements (neurons) that apply a
series of transformations to the
data to generate their own
understanding of it (what we
commonly call features).
Modelled after the human
brain, ANNs have the goal of having
machines mimic how the brain
works.
Understanding the representation
Input layer – used to pass in our input (an
image, text, or any other suitable type of data for the NN).
Hidden layers – the layers between
the input and output layers. These layers are
responsible for learning the mapping between
input and output. (E.g., in a dog-vs-cat
classifier, the hidden layers are the ones
responsible for learning that a dog picture is linked
to the label "dog", and they do this through a
series of matrix multiplications and mathematical
transformations.)
Output layer – this layer is responsible for
giving us the output of the NN for our inputs.
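As a concrete illustration of these three layer types, here is a minimal Keras sketch (an assumption for illustration only: the notes do not use Keras, and the input size of 64 features, the layer widths, and the binary dog/cat output are invented for the example):

import tensorflow as tf

# Input -> hidden -> output, as described above.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),                    # input layer: 64 features (assumed size)
    tf.keras.layers.Dense(32, activation="relu"),   # hidden layer: learns the input-output mapping
    tf.keras.layers.Dense(1, activation="sigmoid"), # output layer: e.g. probability of "dog"
])
model.compile(optimizer="sgd", loss="binary_crossentropy")
model.summary()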
The engine of Neural Networks:
Basic ANN
ANN layers transform the input data through a series of
mathematical and matrix operations to learn a mapping between
input and output:
output = f(W·x + b)

W and b are tensors (multidimensional arrays) that are attributes of the layer.
They are commonly called the weights or trainable parameters of the layer
(the kernel and bias, respectively).
We do a matrix multiplication between the input and the weights, add the bias to the result,
and apply an activation function f to put the values into an acceptable range.
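A minimal NumPy sketch of this layer transformation (the sizes and the choice of sigmoid for f are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense_layer(x, W, b, f):
    # One layer: matrix-multiply input by weights, add bias, apply activation.
    return f(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=3)        # 3 input values
W = rng.normal(size=(2, 3))   # kernel: 2 units, 3 inputs each
b = np.zeros(2)               # bias: one per unit

print(dense_layer(x, W, b, sigmoid))   # 2 activations, each in (0, 1)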
Basic ANN…

[Diagram: inputs x → weights W → weighted sum σ plus bias b → activation f() → outputs]
◦ The hidden units with their activation functions can express non-
linear functions.

A worked example for a single unit, with weights
(w1, w2, w3) = (−0.06, −2.5, 1.4) and inputs (2.7, −8.6, 0.002):

x = (−0.06 × 2.7) + (−2.5 × −8.6) + (1.4 × 0.002) = 21.34

The unit then outputs f(x).
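This weighted sum is easy to verify in Python:

weights = [-0.06, -2.5, 1.4]
inputs = [2.7, -8.6, 0.002]
x = sum(w * a for w, a in zip(weights, inputs))
print(x)   # 21.3408, which the slide rounds to 21.34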
A dataset

Fields (inputs)    Class
1.4  2.7  1.9      0
3.8  3.4  3.2      0
6.4  2.8  1.7      1
4.1  0.1  0.2      0
etc …
Training the neural network

Training data: the dataset above (three input fields per pattern, one class label).

1. Initialise the network with random weights.
2. Present a training pattern, e.g. the inputs (1.4, 2.7, 1.9).
3. Feed it through the network to get an output, e.g. 0.8.
4. Compare with the target output: the target is 0, so the error is 0.8.
5. Adjust the weights based on the error.
6. Present the next training pattern, e.g. (6.4, 2.8, 1.7).
7. Feed it through to get an output, e.g. 0.9.
8. Compare with the target output: the target is 1, so the error is −0.1.
9. Adjust the weights based on the error.
10. And so on, cycling through the training data (a minimal code sketch of this loop follows).
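A minimal sketch of this training loop for a single sigmoid unit with gradient-descent weight updates (the notes' network also has a hidden layer, handled by backpropagation below; the learning rate and epoch count here are arbitrary choices):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The dataset from the slides: three input fields, one class label.
X = np.array([[1.4, 2.7, 1.9],
              [3.8, 3.4, 3.2],
              [6.4, 2.8, 1.7],
              [4.1, 0.1, 0.2]])
t = np.array([0, 0, 1, 0])

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=3)    # 1. initialise with random weights
b = 0.0
eta = 0.1                            # learning rate (assumed value)

for epoch in range(1000):
    for x, target in zip(X, t):      # 2. present a training pattern
        z = sigmoid(w @ x + b)       # 3. feed it through to get output
        error = z - target           # 4. compare with target output
        grad = error * z * (1 - z)   # 5. adjust weights based on error
        w -= eta * grad * x
        b -= eta * grad

print(np.round(sigmoid(X @ w + b), 2))   # outputs approach the class labels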
We can calculate (propagate back) the error signal for each hidden neuron.
Here t_k is the target (gold standard) value of output neuron k, z_k is the prediction at
output neuron k (k = 1, …, c), and w are the weights.

Error:

J(w) = \frac{1}{2} \sum_{k=1}^{c} (t_k - z_k)^2 = \frac{1}{2} \lVert t - z \rVert^2

The weights are updated by gradient descent:

\Delta w = -\eta \frac{\partial J}{\partial w}
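For example, with the first training pattern above (one output neuron, prediction z = 0.8, target t = 0), the error would be J = \frac{1}{2}(0 - 0.8)^2 = 0.32.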
Backpropagation
The error of the weights between the hidden and output layers:

\frac{\partial J}{\partial w_{kj}} = \frac{\partial J}{\partial net_k} \cdot \frac{\partial net_k}{\partial w_{kj}} = -\delta_k \frac{\partial net_k}{\partial w_{kj}}

where the error signal of output neuron k is defined as

\delta_k = -\frac{\partial J}{\partial net_k}

Because net_k = w_k^T y:

\frac{\partial net_k}{\partial w_{kj}} = y_j

Since z_k = f(net_k):

\delta_k = -\frac{\partial J}{\partial net_k} = -\frac{\partial J}{\partial z_k} \cdot \frac{\partial z_k}{\partial net_k} = (t_k - z_k) f'(net_k)

The change of weights between the hidden and output layers:

\Delta w_{kj} = \eta \delta_k y_j = \eta (t_k - z_k) f'(net_k) y_j
The gradient of the hidden units:

y_j = f(net_j), \qquad net_j = \sum_{i=0}^{d} w_{ji} x_i

\frac{\partial J}{\partial w_{ji}} = \frac{\partial J}{\partial y_j} \cdot \frac{\partial y_j}{\partial net_j} \cdot \frac{\partial net_j}{\partial w_{ji}}

\frac{\partial J}{\partial y_j} = \frac{\partial}{\partial y_j} \left[ \frac{1}{2} \sum_{k=1}^{c} (t_k - z_k)^2 \right] = -\sum_{k=1}^{c} (t_k - z_k) \frac{\partial z_k}{\partial y_j}
= -\sum_{k=1}^{c} (t_k - z_k) \frac{\partial z_k}{\partial net_k} \cdot \frac{\partial net_k}{\partial y_j} = -\sum_{k=1}^{c} (t_k - z_k) f'(net_k) w_{kj}
The error signal of the hidden units:

\delta_j = f'(net_j) \sum_{k=1}^{c} w_{kj} \delta_k

[Diagram: the error signals \delta_k of the output layer are propagated back through the weights w_{kj} to the hidden layer, and from there towards the input layer.]
Backpropagation
Update the weights between the input and hidden neurons, i.e. the weights w_{ji} leading into hidden unit j:

\Delta w_{ji} = \eta \delta_j x_i

[Diagram: input, hidden, and output layers; this update applies to the input-to-hidden weights.]
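Putting the derivation together, a minimal NumPy sketch of backpropagation for one hidden layer (sigmoid activations throughout, per-pattern updates, biases omitted for brevity; the layer sizes, learning rate, and epoch count are illustrative assumptions following the notes' notation):

import numpy as np

def f(z):                # sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

def f_prime(a):          # sigmoid derivative, given a = f(net)
    return a * (1.0 - a)

rng = np.random.default_rng(0)
d, h, c = 3, 4, 1                          # input, hidden, output sizes (assumed)
W_ji = rng.normal(scale=0.5, size=(h, d))  # input -> hidden weights
W_kj = rng.normal(scale=0.5, size=(c, h))  # hidden -> output weights
eta = 0.5                                  # learning rate (assumed value)

X = np.array([[1.4, 2.7, 1.9],
              [3.8, 3.4, 3.2],
              [6.4, 2.8, 1.7],
              [4.1, 0.1, 0.2]])
T = np.array([[0.0], [0.0], [1.0], [0.0]])

for epoch in range(2000):
    for x, t in zip(X, T):
        # Forward pass.
        y = f(W_ji @ x)                            # hidden activations y_j
        z = f(W_kj @ y)                            # outputs z_k

        # Backward pass: error signals.
        delta_k = (t - z) * f_prime(z)             # output units: (t_k - z_k) f'(net_k)
        delta_j = f_prime(y) * (W_kj.T @ delta_k)  # hidden units: f'(net_j) Σ w_kj δ_k

        # Weight updates: Δw_kj = η δ_k y_j and Δw_ji = η δ_j x_i.
        W_kj += eta * np.outer(delta_k, y)
        W_ji += eta * np.outer(delta_j, x)

print(np.round(f(f(X @ W_ji.T) @ W_kj.T).ravel(), 2))   # outputs approach the targets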
Questions of network design
How many hidden neurons?
◦ too few neurons cannot learn complex patterns
◦ too many neurons can easily overfit
◦ the number can be chosen using a validation set (see the sketch below)
What learning rate should be used?
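As an illustration of the validation-set idea, a sketch that holds out part of the data and compares candidate hidden-layer sizes (the synthetic data, split ratio, and candidate sizes are all invented for the example; train_network reuses the backpropagation rules above):

import numpy as np

def f(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_network(X, t, hidden, eta=0.5, epochs=500, seed=0):
    # A 1-hidden-layer sigmoid net trained with the backprop updates above.
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(hidden, X.shape[1]))
    W2 = rng.normal(scale=0.5, size=(1, hidden))
    for _ in range(epochs):
        for x, target in zip(X, t):
            y = f(W1 @ x)
            z = f(W2 @ y)
            dk = (target - z) * z * (1 - z)
            dj = y * (1 - y) * (W2.T @ dk)
            W2 += eta * np.outer(dk, y)
            W1 += eta * np.outer(dj, x)
    return W1, W2

def accuracy(net, X, t):
    W1, W2 = net
    preds = (f(W2 @ f(W1 @ X.T)).ravel() > 0.5).astype(float)
    return (preds == t).mean()

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
t = (X.sum(axis=1) > 0).astype(float)    # synthetic labels for the demo

n_val = len(X) // 5                      # hold out 20% as a validation set
X_tr, t_tr = X[:-n_val], t[:-n_val]
X_val, t_val = X[-n_val:], t[-n_val:]

for h in (1, 4, 16):                     # candidate hidden-layer sizes
    net = train_network(X_tr, t_tr, hidden=h)
    print(h, accuracy(net, X_val, t_val))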
Deep learning