UNIT-1
ARCHITECTURE
www.Vidyarthiplus.com
Machine Learning
Machine learning involves adaptive mechanisms that enable computers to learn from experience, by example, and by analogy. Learning capabilities can improve the performance of an intelligent system over time. The most popular approaches to machine learning are artificial neural networks and genetic algorithms. This lecture is dedicated to neural networks.
[Figure: a biological neuron — soma, dendrites, axon, and synapses]
[Figure: artificial neuron model — input signals x1, ..., xn with weights w1, ..., wn feeding a neuron that produces output Y]
[Figure: multilayer network — input layer, middle layer, output layer]
Network Structure
The output signal is transmitted through the neuron's outgoing connection. The outgoing connection splits into a number of branches that all transmit the same signal. The outgoing branches terminate at the incoming connections of other neurons in the network.
[Figure: analogy between a biological neuron (soma, dendrites, axon, synapse) and an artificial network (input layer, middle layer, output layer; the connections play the role of synapses)]
Course Topics
Learning Tasks

Supervised
  Data: labeled examples (input, desired output)
  Tasks: classification, pattern recognition, regression
  NN models: perceptron, Adaline, feed-forward NN, radial basis function, support vector machines

Unsupervised
  Data: unlabeled examples (different realizations of the input)
  Tasks: clustering, content-addressable memory
  NN models: self-organizing maps (SOM), Hopfield networks
Network architectures
Three different classes of network architectures:
  single-layer feed-forward
  multi-layer feed-forward
  recurrent
In feed-forward networks the neurons are organized in acyclic layers.
[Figure: single-layer feed-forward network — input layer of source nodes projecting onto an output layer of neurons]
[Figure: multi-layer feed-forward network — input layer, hidden layer, output layer]
Recurrent network
A recurrent network with hidden neurons: the unit-delay operator z^-1 is used to model a dynamic system.
[Figure: recurrent network — input, hidden, and output units with z^-1 unit-delay elements in the feedback connections]
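A minimal sketch of what the unit delay does, assuming a single recurrent neuron with illustrative weight values (w_in, w_fb, and the tanh activation are assumptions, not from the slide):

```python
import math

def simulate_recurrent(inputs, w_in=0.8, w_fb=0.5):
    """Simulate one recurrent neuron: the unit-delay operator z^-1
    feeds the previous output back as an extra input."""
    y_prev = 0.0                        # delayed output, initially zero
    outputs = []
    for x in inputs:
        v = w_in * x + w_fb * y_prev    # local field uses the delayed output
        y = math.tanh(v)                # squashing activation
        outputs.append(y)
        y_prev = y                      # z^-1: store the output for the next step
    return outputs

ys = simulate_recurrent([1.0, 0.0, 0.0, 0.0])
# after the single input pulse, the fed-back signal decays toward zero
```

The feedback through z^-1 is what makes the output at time n depend on the whole input history, i.e. the network behaves as a dynamic system.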
The Neuron
[Figure: neuron model — input values x1, ..., xm, weights w1, ..., wm, bias b, summing function producing the local field, activation function φ(·), output y]
The Neuron
The neuron is the basic information processing unit of a NN. It consists of:
1. A set of links, describing the neuron inputs, with weights w1, w2, ..., wm.
2. An adder function (linear combiner) computing the weighted sum of the inputs (real numbers):
   u = Σ_{j=1}^{m} w_j x_j
3. An activation function φ producing the output from the induced field:
   y = φ(u + b), where b is the bias.
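The components above can be sketched in Python (the function name and the sample numbers are illustrative):

```python
def neuron_output(x, w, b, phi):
    """Basic neuron: weighted sum u = sum_j w_j x_j, then y = phi(u + b)."""
    u = sum(wj * xj for wj, xj in zip(w, x))   # adder (linear combiner)
    return phi(u + b)                          # activation of the induced field

step = lambda v: 1 if v >= 0 else 0            # hard-limiter activation
y = neuron_output([1, 0, 1], [0.5, -0.3, 0.2], b=-0.6, phi=step)
# u = 0.5 + 0.2 = 0.7, induced field v = 0.1 >= 0, so y = 1
```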
Bias of a Neuron
The bias b has the effect of applying an affine transformation to the weighted sum u:
  v = u + b
v is called the induced local field of the neuron.
[Figure: lines x1 - x2 = -1, x1 - x2 = 0, and x1 - x2 = 1 in the (x1, x2) plane — the bias shifts the decision boundary]
[Figure: equivalent neuron model with the bias as a weight — a fixed input x0 = +1 with synaptic weight w0 = b; the local field is v = Σ_{j=0}^{m} w_j x_j, followed by the activation function φ(·) and the output y]
Activation Function
There are different activation functions used in different applications. The most common ones are:

Hard limiter:
  φ(v) = 1 if v ≥ 0
  φ(v) = 0 if v < 0

Piecewise linear:
  φ(v) = 1 if v ≥ 1/2
  φ(v) = v if -1/2 < v < 1/2
  φ(v) = 0 if v ≤ -1/2

Sigmoid:
  φ(v) = 1 / (1 + exp(-a v))

Hyperbolic tangent:
  φ(v) = tanh(v)
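The four functions translate directly into Python (the slope parameter a of the sigmoid defaults to 1 here; the hyperbolic tangent is math.tanh itself):

```python
import math

def hard_limiter(v):
    """Threshold activation: 1 for v >= 0, else 0."""
    return 1 if v >= 0 else 0

def piecewise_linear(v):
    """Saturates at 1 and 0, passes v through in between (as on the slide)."""
    if v >= 0.5:
        return 1.0
    if v <= -0.5:
        return 0.0
    return v

def sigmoid(v, a=1.0):
    """Logistic sigmoid with slope parameter a."""
    return 1.0 / (1.0 + math.exp(-a * v))

# hyperbolic tangent: math.tanh(v)
```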
Neuron Models
The choice of activation function φ determines the neuron model. Common choices:

step function:
  φ(v) = a if v < c
  φ(v) = b if v ≥ c

ramp function:
  φ(v) = a if v ≤ c
  φ(v) = b if v ≥ d
  φ(v) = a + ((v - c)(b - a) / (d - c)) otherwise

sigmoid function, with z, x, y parameters:
  φ(v) = z + 1 / (1 + exp(-x v + y))

Gaussian function:
  φ(v) = (1 / (√(2π) σ)) exp(-(1/2) ((v - μ) / σ)²)
www.Vidyarthiplus.com
Learning Algorithms
Depend on the network architecture:
Error correcting learning (perceptron)
Delta rule (AdaLine, Backprop)
Competitive Learning (Self Organizing Maps)
24
www.Vidyarthiplus.com
Applications
Classification: image recognition, speech recognition, diagnostics, fraud detection
Regression
Pattern association
Clustering: client profiles, disease subtypes
Supervised Learning
Training and test data sets
Training set: input & target
Perceptron: architecture
We consider a feed-forward NN with one layer. It is sufficient to study single-layer perceptrons with just one neuron:
The perceptron uses the sign activation function:
  φ(v) = +1 if v ≥ 0
  φ(v) = -1 if v < 0
[Figure: perceptron — inputs x1, ..., xn with weights w1, ..., wn, bias b, output y = φ(v)]
Perceptron Training
How can we train a perceptron for a classification task?
We try to find suitable values for the weights such that the training examples are correctly classified. Geometrically, we try to find a hyperplane that separates the examples of the two classes.
[Figure: decision boundary in the (x1, x2) plane — the line w1 x1 + w2 x2 + w0 = 0 separates the decision region for C1 (where w1 x1 + w2 x2 + w0 ≥ 0) from the region for C2]
Example: AND
-1 AND -1 = false
-1 AND +1 = false
+1 AND -1 = false
+1 AND +1 = true
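The AND examples above are linearly separable, so the error-correcting perceptron rule finds separating weights. A minimal sketch (the slides use Matlab for the practicals; the learning rate and epoch count here are illustrative):

```python
def train_perceptron(data, eta=0.1, epochs=20):
    """Perceptron learning rule; w[0] is the bias (fixed input x0 = +1)."""
    w = [0.0, 0.0, 0.0]                     # [bias, w1, w2]
    sign = lambda v: 1 if v >= 0 else -1
    for _ in range(epochs):
        for (x1, x2), t in data:
            y = sign(w[0] + w[1] * x1 + w[2] * x2)
            if y != t:                      # error-correcting update
                w[0] += eta * (t - y)
                w[1] += eta * (t - y) * x1
                w[2] += eta * (t - y) * x2
    return w

# bipolar encoding of AND: false = -1, true = +1
AND = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
w = train_perceptron(AND)
# the learned hyperplane classifies all four examples correctly
```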
Example: XOR
Here's the XOR function:
  -1 XOR -1 = false
  -1 XOR +1 = true
  +1 XOR -1 = true
  +1 XOR +1 = false
Perceptron: Limitations
The perceptron can only model linearly separable classes, like (those described by) the following Boolean functions:
  AND
  OR
  COMPLEMENT
It cannot model XOR.
You can experiment with these functions in the Matlab practical lessons.
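A brute-force illustration of the XOR limitation: no weight triple from a small grid classifies all four bipolar XOR examples (the grid is only illustrative; the general impossibility follows from summing the four constraints, which forces both b ≥ 0 and b < 0):

```python
import itertools

# bipolar XOR: false = -1, true = +1
XOR = [((-1, -1), -1), ((-1, 1), 1), ((1, -1), 1), ((1, 1), -1)]

grid = [k / 2 for k in range(-4, 5)]        # weights -2.0 ... 2.0 in 0.5 steps
separable = any(
    all((w1 * x1 + w2 * x2 + b >= 0) == (t == 1) for (x1, x2), t in XOR)
    for w1, w2, b in itertools.product(grid, repeat=3)
)
# separable is False: no hyperplane in the grid separates XOR
```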
o(x) = w0 + w1 x1 + ... + wn xn
Replace the step function in the perceptron with a continuous (differentiable) function f; the simplest choice is the linear function. With or without the threshold, the Adaline is trained on the output of the function f rather than on the final thresholded output.
[Figure: Adaline — the unit computes f(x); the ± threshold may be applied afterwards]
Incremental Stochastic Gradient Descent
Delta rule for weight update: move the weights against the gradient of the error E(w), i.e.
  Δw = η e x, where e = d - w^T x

LMS training loop:
  while not converged:
    e(n) = d(n) - w(n)^T x(n)
    w(n+1) = w(n) + η e(n) x(n)
    n ← n + 1
  end-while

Normalized LMS variant:
  w(n+1) = w(n) + η e(n) x(n) / ||x(n)||
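The loop above can be sketched in Python on a noiseless linear target (the data set, target weights, and learning rate are illustrative choices, not from the slides):

```python
# Delta rule / LMS: fit y = w^T x to examples generated by target_w = [2, -1].
target_w = [2.0, -1.0]
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]

w = [0.0, 0.0]
eta = 0.2
for _ in range(200):                       # repeated passes over the data
    for x, d in data:
        y = sum(wi * xi for wi, xi in zip(w, x))
        e = d - y                          # e(n) = d(n) - w(n)^T x(n)
        w = [wi + eta * e * xi for wi, xi in zip(w, x)]   # w <- w + eta*e*x

# w converges toward target_w = [2.0, -1.0]
```

With a small enough learning rate the per-example updates converge to the least-squares solution; here the data are consistent, so the error is driven to zero.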
Outline
  INTRODUCTION
  ADALINE
  MADALINE
  Least-Square Learning Rule
  The proof of the Least-Square Learning Rule
[Figure: ADALINE as a Linear Graded Unit (LGU) — inputs x0, x1, ..., xn with weights w0, w1, ..., wn, sum s, output f(s). The perceptron is an LTU with f(s) = sgn(s); the ADALINE uses a graded f(s), e.g. linear(s) or tanh(s), trained by gradient descent]
MADALINE: Many ADALINEs; a network of ADALINEs.
ADALINE
ADALINE (Adaptive Linear Neuron) is a network model proposed by Bernard Widrow in 1959.
[Figure: ADALINE — inputs X1, X2, X3 feeding a single processing element (PE)]
Method
The value in each unit must be +1 or -1.
  net = Σ Xi Wi
With X0 = 1:
  net = W0 + W1 X1 + W2 X2 + ... + Wn Xn
  Y = +1 if net ≥ 0
  Y = -1 if net < 0
Method
  ΔWi = η (T - Y) Xi, where T is the expected output
  Wi ← Wi + ΔWi
ADALINE can solve only linear problems (this is its limitation).
MADALINE
MADALINE (Multilayer Adaline) is composed of many ADALINEs.
[Figure: MADALINE — inputs Xi, ..., Xn with adjustable weights Wij feeding units computing net_j and outputs Yj; the connections into the output unit carry no adjustable weights Wij]
Notation:
  X_j = (X0, X1, ..., Xn)^t, 1 ≤ j ≤ p  (the j-th input pattern)
  W = (W0, W1, ..., Wn)^t  (the weight vector)
  Net_j = W^t X_j = Σ_{i=0}^{n} Wi Xi = W0 X0 + W1 X1 + ... + Wn Xn
R (the correlation matrix) and P are defined as:
  R' = R1' + R2' + ... + Rp', where R_j' = X_j X_j^t
  R = (1/p) R' = (1/p) Σ_{j=1}^{p} X_j X_j^t
  P = (1/p) Σ_{j=1}^{p} T_j X_j
The optimal weight vector W* satisfies R W* = P.
Example: three training patterns with targets
  X1 = (1, 1, 0)^t, T1 = 1
  X2 = (1, 0, 1)^t, T2 = 1
  X3 = (1, 1, 1)^t, T3 = -1
  R1' = X1 X1^t = [1 1 0; 1 1 0; 0 0 0]
  R2' = X2 X2^t = [1 0 1; 0 0 0; 1 0 1]
  R3' = X3 X3^t = [1 1 1; 1 1 1; 1 1 1]
  R = (1/3)(R1' + R2' + R3') = (1/3) [3 2 2; 2 2 1; 2 1 2]
  P = (1/3)(T1 X1 + T2 X2 + T3 X3)
    = (1/3)[(1, 1, 0)^t + (1, 0, 1)^t - (1, 1, 1)^t]
    = (1/3)(1, 0, 0)^t

Solving R W* = P:
  3 W1 + 2 W2 + 2 W3 = 1
  2 W1 + 2 W2 + W3 = 0
  2 W1 + W2 + 2 W3 = 0
gives W1 = 3, W2 = -2, W3 = -2, i.e. W* = (3, -2, -2)^t.
[Figure: the resulting ADALINE — weights 3, -2, -2 on inputs X1, X2, X3]
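The worked example can be checked numerically: this sketch recomputes R and P from the three patterns and verifies that W* = (3, -2, -2)^t satisfies R W* = P.

```python
# Patterns and targets from the worked example
X = [(1, 1, 0), (1, 0, 1), (1, 1, 1)]
T = [1, 1, -1]
p = len(X)

# R = (1/p) * sum_j X_j X_j^t  and  P = (1/p) * sum_j T_j X_j
R = [[sum(x[r] * x[c] for x in X) / p for c in range(3)] for r in range(3)]
P = [sum(t * x[r] for x, t in zip(X, T)) / p for r in range(3)]

W = [3, -2, -2]                                   # candidate optimal weights
RW = [sum(R[r][c] * W[c] for c in range(3)) for r in range(3)]
# RW equals P = (1/3, 0, 0), confirming R W* = P
```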
The proof of the Least-Square Learning Rule
Mean squared error over L examples:
  ⟨E⟩ = (1/L) Σ_{k=1}^{L} (T_k - Y_k)²
      = (1/L) Σ T_k² - (2/L) Σ T_k Y_k + (1/L) Σ Y_k²
Let ⟨T_k²⟩ denote the mean of T_k².
Since Y_k = Σ_i w_i x_ik = W^t X_k, we have
  Y_k² = (W^t X_k)(X_k^t W) = W^t (X_k X_k^t) W
so
  (1/L) Σ Y_k² = W^t [(1/L) Σ X_k X_k^t] W = W^t R W
and therefore
  ⟨E⟩ = ⟨T_k²⟩ - 2 [(1/L) Σ T_k X_k^t] W + W^t R W
      = ⟨T_k²⟩ - 2 P^t W + W^t R W
Setting the gradient to zero:
  ∂⟨E⟩/∂W = 2 R W - 2 P = 0  ⇒  R W* = P  ⇒  W* = R^{-1} P
Summary: Perceptron vs. Adaline
  Architecture: single-layer (both)
  Neuron model: non-linear (perceptron) vs. linear (Adaline)
  Learning algorithm: minimize the number of misclassified examples (perceptron) vs. minimize the total error (Adaline)
  Application: linear classification