This document provides an overview of neural networks and their architecture. It discusses:
- How neural networks are inspired by biological brains and consist of simple computational elements connected in networks.
- The basic components of artificial neurons, including inputs, weights, summation functions and activation functions.
- Common network architectures: single-layer feed-forward, multi-layer feed-forward and recurrent networks.
- Learning algorithms used to train networks, including error correction and backpropagation.
- Applications of neural networks such as classification, regression, clustering and pattern association.
- The basic perceptron model and how it can be trained to perform binary classification tasks.


UNIT-1

ARCHITECTURE


What are Neural Networks?


- Simple computational elements forming a large network
- Emphasis on learning (pattern recognition)
- Local computation (neurons)

The definition of NNs is vague: they are often, but not always, inspired by the biological brain.

Machine Learning
Machine learning involves adaptive mechanisms
that enable computers to learn from experience,
learn by example and learn by analogy. Learning
capabilities can improve the performance of an
intelligent system over time. The most popular
approaches to machine learning are artificial
neural networks and genetic algorithms. This
lecture is dedicated to neural networks.


Biological neural network


[Diagram: a biological neuron showing the soma (cell body), dendrites, axon, and synapses connecting it to neighbouring neurons.]

The neuron as a simple computing element


[Diagram of a neuron: input signals x1, ..., xn arrive through weighted links w1, ..., wn; the neuron combines them and emits the output signal Y.]

Architecture of a typical artificial neural network

[Diagram: input signals enter the input layer, pass through the middle layer, and leave the output layer as output signals.]

A neural network can be defined as a model of reasoning based on the human brain. The brain consists of a densely interconnected set of nerve cells, or basic information-processing units, called neurons.

The human brain incorporates nearly 10 billion neurons and 60 trillion connections (synapses) between them. By using multiple neurons simultaneously, the brain can perform its functions much faster than the fastest computers in existence today.

Each neuron has a very simple structure, but an army of such elements constitutes tremendous processing power. A neuron consists of a cell body (the soma), a number of fibers called dendrites, and a single long fiber called the axon.

Our brain can be considered a highly complex, non-linear and parallel information-processing system. Information is stored and processed in a neural network simultaneously throughout the whole network, rather than at specific locations. In other words, in neural networks both data and its processing are global rather than local.

Learning is a fundamental and essential characteristic of biological neural networks. The ease with which they can learn led to attempts to emulate a biological neural network in a computer.

An artificial neural network consists of a number of very simple processors, also called neurons, which are analogous to the biological neurons in the brain. The neurons are connected by weighted links passing signals from one neuron to another.

Network Structure
The output signal is transmitted through the neuron's outgoing connection. The outgoing connection splits into a number of branches that transmit the same signal. The outgoing branches terminate at the incoming connections of other neurons in the network.

Analogy between biological and artificial neural networks

[Diagram: a biological neuron (soma, dendrites, axon, synapses) shown next to an artificial network (input layer, middle layer, output layer).]

Biological Neural Network -> Artificial Neural Network
Soma                      -> Neuron
Dendrite                  -> Input
Axon                      -> Output
Synapse                   -> Weight

Course Topics: Learning Tasks

Supervised
- Data: labeled examples (input, desired output)
- Tasks: classification, pattern recognition, regression
- NN models: perceptron, ADALINE, feed-forward NN, radial basis functions, support vector machines

Unsupervised
- Data: unlabeled examples (different realizations of the input)
- Tasks: clustering, content-addressable memory
- NN models: self-organizing maps (SOM), Hopfield networks

Network architectures

Three different classes of network architectures:
- single-layer feed-forward
- multi-layer feed-forward
- recurrent

In the feed-forward architectures, neurons are organized in acyclic layers. The architecture of a neural network is linked with the learning algorithm used to train it.

Single Layer Feed-forward

[Diagram: an input layer of source nodes projecting directly onto an output layer of neurons.]

Multi-layer feed-forward

[Diagram: a 3-4-2 network, with an input layer of 3 source nodes, a hidden layer of 4 neurons, and an output layer of 2 neurons.]

Recurrent network
A recurrent network with hidden neurons: the unit-delay operator z^{-1} is used to model a dynamic system.

[Diagram: input, hidden and output units with feedback loops through unit-delay (z^{-1}) elements.]

The Neuron
[Diagram: input values x_1, \ldots, x_m with weights w_1, \ldots, w_m and bias b feed the summing function; the local field v passes through the activation function \varphi(\cdot) to produce the output y.]

The Neuron

The neuron is the basic information-processing unit of a NN. It consists of:

1. A set of links, describing the neuron inputs, with weights w_1, w_2, \ldots, w_m.

2. An adder function (linear combiner) computing the weighted sum of the inputs (real numbers):

   u = \sum_{j=1}^{m} w_j x_j

3. An activation function (squashing function) \varphi for limiting the amplitude of the neuron output:

   y = \varphi(u + b)
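The following is a minimal Python sketch of this neuron model (our illustration, not from the slides; the function names and the choice of a hard-limiter activation are assumptions):

```python
import numpy as np

def hard_limiter(v):
    # hard-limiter activation: 1 if v >= 0, else 0
    return 1.0 if v >= 0 else 0.0

def neuron_output(x, w, b, phi=hard_limiter):
    u = np.dot(w, x)        # u = sum_j w_j x_j  (linear combiner)
    return phi(u + b)       # y = phi(u + b)     (activation of the induced field)

# Example: two inputs, weights (1, -1), bias -0.5
print(neuron_output(np.array([1.0, 0.0]), np.array([1.0, -1.0]), b=-0.5))  # 1.0
```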

Bias of a Neuron

The bias b has the effect of applying an affine transformation to the weighted sum u:

   v = u + b

v is called the induced local field of the neuron.

[Diagram: in the (x1, x2) plane, the parallel lines x1 - x2 = -1, x1 - x2 = 0 and x1 - x2 = 1 illustrate how the bias shifts the decision line.]

Bias as extra input

The bias is an external parameter of the neuron. It can be modeled by adding an extra input x_0 = +1 with synaptic weight w_0 = b:

   v = \sum_{j=0}^{m} w_j x_j

[Diagram: input signals x_0 = +1, x_1, \ldots, x_m with synaptic weights w_0, w_1, \ldots, w_m feed the summing function; the local field v passes through the activation function \varphi(\cdot) to produce the output y.]

Activation Function

There are different activation functions used in different applications. The most common ones are:

Hard-limiter:

   \varphi(v) = \begin{cases} 1 & \text{if } v \ge 0 \\ 0 & \text{if } v < 0 \end{cases}

Piecewise linear:

   \varphi(v) = \begin{cases} 1 & \text{if } v \ge 1/2 \\ v & \text{if } -1/2 < v < 1/2 \\ 0 & \text{if } v \le -1/2 \end{cases}

Sigmoid:

   \varphi(v) = \frac{1}{1 + \exp(-av)}

Hyperbolic tangent:

   \varphi(v) = \tanh(v)
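A small NumPy sketch of these four activation functions, written as vectorized helpers (the naming is ours; the slope parameter a of the sigmoid defaults to 1):

```python
import numpy as np

def hard_limiter(v):
    return np.where(v >= 0, 1.0, 0.0)            # 1 if v >= 0, else 0

def piecewise_linear(v):
    # 1 if v >= 1/2;  v if -1/2 < v < 1/2;  0 if v <= -1/2
    return np.where(v >= 0.5, 1.0, np.where(v > -0.5, v, 0.0))

def sigmoid(v, a=1.0):
    return 1.0 / (1.0 + np.exp(-a * v))

v = np.linspace(-2.0, 2.0, 5)
print(hard_limiter(v), piecewise_linear(v), sigmoid(v), np.tanh(v))
```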

Neuron Models

The choice of the activation function \varphi determines the neuron model. Examples:

Step function:

   \varphi(v) = \begin{cases} a & \text{if } v < c \\ b & \text{if } v \ge c \end{cases}

Ramp function:

   \varphi(v) = \begin{cases} a & \text{if } v \le c \\ b & \text{if } v \ge d \\ a + \dfrac{(v - c)(b - a)}{d - c} & \text{otherwise} \end{cases}

Sigmoid function (with parameters x, y, z):

   \varphi(v) = z + \frac{1}{1 + \exp(-xv + y)}

Gaussian function:

   \varphi(v) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2}\left(\frac{v - \mu}{\sigma}\right)^2\right)

Learning Algorithms

Learning algorithms depend on the network architecture:
- error-correction learning (perceptron)
- delta rule (ADALINE, backprop)
- competitive learning (self-organizing maps)

Applications

Classification:
- image recognition
- speech recognition
- diagnostics
- fraud detection

Regression:
- forecasting (prediction on the basis of past history)

Pattern association:
- retrieving an image from a corrupted one

Clustering:
- client profiles
- disease subtypes

Supervised Learning

The data are divided into a training set and a test set. The training set consists of (input, target) pairs.

Perceptron: architecture

We consider a feed-forward NN architecture with one layer. It is sufficient to study single-layer perceptrons with just one neuron.

Single-layer perceptrons

Generalization to single-layer perceptrons with more neurons is easy because:
- the output units are independent of each other;
- each weight only affects one of the outputs.

Perceptron: Neuron Model

The (McCulloch-Pitts) perceptron is a single-layer NN with a non-linear activation function, the sign function:

   \varphi(v) = \begin{cases} +1 & \text{if } v \ge 0 \\ -1 & \text{if } v < 0 \end{cases}

[Diagram: inputs x_1, \ldots, x_n with weights w_1, \ldots, w_n and bias b feed the summing junction; the sign function \varphi(v) produces the output y.]

Perceptron for Classification

The perceptron is used for binary classification. Given training examples of classes C1 and C2, train the perceptron in such a way that it classifies the training examples correctly:
- If the output of the perceptron is +1, the input is assigned to class C1.
- If the output is -1, the input is assigned to class C2.

Perceptron Training

How can we train a perceptron for a classification task? We try to find suitable values for the weights such that the training examples are correctly classified. Geometrically, we try to find a hyperplane that separates the examples of the two classes.
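A hedged sketch of this error-correction training in Python (variable names, the learning rate eta and the epoch limit are our assumptions, not from the slides):

```python
import numpy as np

def train_perceptron(X, d, eta=0.1, epochs=100):
    """X: training examples as rows; d: desired outputs in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in zip(X, d):
            y = 1 if np.dot(w, x) + b >= 0 else -1
            if y != target:
                w += eta * target * x   # move the hyperplane toward the example
                b += eta * target
                errors += 1
        if errors == 0:                 # all examples classified correctly
            break
    return w, b
```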

Perceptron Geometric View

The equation below describes a (hyper-)plane in the input space consisting of real-valued 2D vectors. The plane splits the input space into two regions, each of them describing one class.

Decision boundary:

   w_1 x_1 + w_2 x_2 + w_0 = 0

Decision region for C1:

   w_1 x_1 + w_2 x_2 + w_0 \ge 0

[Diagram: the line w_1 x_1 + w_2 x_2 + w_0 = 0 in the (x1, x2) plane separates the decision region for class C1 from that for class C2.]

Example: AND

Here is a representation of the AND function (white means false, black means true for the output; -1 means false, +1 means true for the input):

-1 AND -1 = false
-1 AND +1 = false
+1 AND -1 = false
+1 AND +1 = true
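Using the train_perceptron sketch above, a perceptron learns AND in a few epochs (again an illustration under the same assumptions):

```python
import numpy as np  # continues the train_perceptron sketch above

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
d = np.array([-1, -1, -1, 1])   # AND truth table encoded in {-1, +1}
w, b = train_perceptron(X, d)
print([1 if np.dot(w, x) + b >= 0 else -1 for x in X])   # [-1, -1, -1, 1]
```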

Example: AND continued

A linear decision surface (i.e. a plane in 3D space) intersecting the feature space (i.e. the 2D plane where z = 0) separates false from true instances.

Example: AND continued

Watch a perceptron learn the AND function. [Animation in the original slides.]

Example: XOR

Here's the XOR function:

-1 XOR -1 = false
-1 XOR +1 = true
+1 XOR -1 = true
+1 XOR +1 = false

Perceptrons cannot learn such linearly inseparable functions.

Example: XOR continued

Watch a perceptron try to learn XOR. [Animation in the original slides.]

Perceptron: Limitations

The perceptron can only model linearly separable classes, like (those described by) the following Boolean functions:
- AND
- OR
- COMPLEMENT

It cannot model XOR. You can experiment with these functions in the Matlab practical lessons.

Gradient Descent Learning Rule

The perceptron learning rule fails to converge if the examples are not linearly separable.

Gradient descent: consider a linear unit without a threshold and with continuous output o (not just +1, -1):

   o(x) = w_0 + w_1 x_1 + \cdots + w_n x_n

Update the w_i so that they minimize the squared error

   E[w_1, \ldots, w_n] = \frac{1}{2} \sum_{(x,d) \in D} (d - o(x))^2

where D is the set of training examples.
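A minimal batch gradient-descent sketch for this linear unit (our illustration; the learning rate eta and the epoch count are assumptions):

```python
import numpy as np

def gradient_descent(X, d, eta=0.01, epochs=1000):
    """Linear unit o(x) = w0 + w . x trained to minimize E = 1/2 sum (d - o)^2."""
    w, w0 = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        o = X @ w + w0          # continuous outputs for all examples
        e = d - o               # errors d - o(x)
        w += eta * (X.T @ e)    # dE/dw = -X^T e, so step in the +X^T e direction
        w0 += eta * e.sum()
    return w0, w
```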
Replace the step function in the perceptron with a continuous (differentiable) function f; the simplest choice is the linear function. With or without the threshold, the Adaline is trained based on the output of the function f rather than the final output.

[Diagram: the Adaline applies f(x) before the +/- threshold.]

Incremental Stochastic Gradient Descent

Batch mode gradient descent, over the entire data D:

   w = w - \eta \nabla E_D[w],  where  E_D[w] = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2

Incremental mode gradient descent, over individual training examples d:

   w = w - \eta \nabla E_d[w],  where  E_d[w] = \frac{1}{2} (t_d - o_d)^2

Incremental gradient descent can approximate batch gradient descent arbitrarily closely if \eta is small enough.

Weights Update Rule: incremental mode

Computation of the gradient of E:

   \nabla E(w) = e \frac{\partial e}{\partial w} = -e\,x^T

Delta rule for the weight update:

   w(n+1) = w(n) + \eta\, e(n)\, x(n)

LMS learning algorithm

   n = 1;
   initialize w(n) randomly;
   while (E_tot unsatisfactory and n < max_iterations)
       select an example (x(n), d(n));
       e(n) = d(n) - w(n)^T x(n);
       w(n+1) = w(n) + \eta\, e(n)\, x(n);
       n = n + 1;
   end-while;

\eta = learning-rate parameter (a real number).

A modification uses

   w(n+1) = w(n) + \eta\, e(n)\, \frac{x(n)}{\|x(n)\|}
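The LMS loop above translates almost line by line into Python (a sketch; the cyclic example selection, tolerance and random seed are our choices):

```python
import numpy as np

def lms(X, d, eta=0.05, max_iterations=1000, tol=1e-3):
    """X rows are examples x(n) (include a +1 bias component); d holds targets d(n)."""
    w = np.random.default_rng(0).normal(size=X.shape[1])  # initialize w randomly
    for n in range(max_iterations):
        i = n % len(X)                        # select an example (x(n), d(n))
        e = d[i] - w @ X[i]                   # e(n) = d(n) - w(n)^T x(n)
        w = w + eta * e * X[i]                # w(n+1) = w(n) + eta e(n) x(n)
        if np.mean((d - X @ w) ** 2) < tol:   # E_tot satisfactory?
            break
    return w
```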

Perceptron Learning Rule vs. Gradient Descent Rule

The perceptron learning rule is guaranteed to succeed if:
- the training examples are linearly separable;
- the learning rate is sufficiently small.

The linear unit training rule uses gradient descent and is guaranteed to converge to the hypothesis with minimum squared error:
- given a sufficiently small learning rate;
- even when the training data contain noise;
- even when the training data are not separable by H.

Outline

- INTRODUCTION
- ADALINE
- MADALINE
- Least-Square Learning Rule
- The proof of the Least-Square Learning Rule

Widrow and Hoff, 1960

Bernard Widrow and Ted Hoff introduced the Least-Mean-Square algorithm (a.k.a. the delta rule or Widrow-Hoff rule) and used it to train the Adaline (ADAptive LInear NEuron).
- The Adaline was similar to the perceptron, except that it used a linear activation function instead of the threshold.
- The LMS algorithm is still heavily used in adaptive signal processing.

MADALINE: many ADALINEs; a network of ADALINEs.
Perceptron vs. ADALINE

[Diagram: inputs x_0, x_1, \ldots, x_n with weights w_0, w_1, \ldots, w_n feed a summing node s; the activation f(s) produces the output y. Plots of sgn(s), tanh(s) and linear(s) are shown.]

- Perceptron: a Linear Threshold Unit (LTU), i.e. the sign function with +/- (positive/negative) outputs; based on the empirical Hebbian assumption.
- ADALINE: a Linear Graded Unit (LGU), i.e. a continuous and differentiable activation function (including the linear function); trained by gradient descent.

ADALINE

ADALINE (Adaptive Linear Neuron) is a network model proposed by Bernard Widrow in 1959.

[Diagram: inputs X1, X2, X3 feed a single processing element (PE).]

Method

The value of each input unit must be +1 or -1. With X_0 = 1:

   net = \sum_i X_i W_i = W_0 + W_1 X_1 + W_2 X_2 + \cdots + W_n X_n

   Y = \begin{cases} +1 & \text{if } net \ge 0 \\ -1 & \text{if } net < 0 \end{cases}

This is different from the perceptron's transfer function.

Method (continued)

   \Delta W_i = \eta (T - Y) X_i,  where T is the expected output
   W_i \leftarrow W_i + \Delta W_i

ADALINE can solve only linear problems (its limitation).
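A sketch of this training procedure in Python; following the earlier Adaline slide, we train on the linear output net rather than the thresholded Y (names and the learning rate are assumptions):

```python
import numpy as np

def train_adaline(X, T, eta=0.1, epochs=50):
    """X: rows of +1/-1 inputs; T: expected outputs. A bias input X0 = +1 is prepended."""
    Xb = np.hstack([np.ones((len(X), 1)), X])   # X0 = +1 carries the weight W0
    W = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x, t in zip(Xb, T):
            net = W @ x                          # net = sum_i W_i X_i
            W += eta * (t - net) * x             # delta rule on the linear output
    return W

def adaline_predict(W, X):
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return np.where(Xb @ W >= 0, 1, -1)          # Y = +1 if net >= 0, else -1
```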

MADALINE

MADALINE is composed of many ADALINEs (a multilayer Adaline).

[Diagram: inputs X_i, \ldots, X_n feed a layer of ADALINEs through weights W_ij, producing the net_j and outputs Y_j; there are no adjustable weights W_ij after the first layer.]

If more than half of the net_j \ge 0, then the output is +1; otherwise the output is -1. After the second layer, the majority vote is used.

Least-Square Learning Rule (1/2)

Input vectors and the weight vector:

   X_j = (X_0, X_1, \ldots, X_n)^t,  1 \le j \le p

   W = (W_0, W_1, \ldots, W_n)^t

   Net_j = W^t X_j = W_0 X_0 + W_1 X_1 + \cdots + W_n X_n

Least-Square Learning Rule (2/2)

By applying the least-square learning rule, the weight vector is

   W^* = R^{-1} P,  or equivalently  R W^* = P

where R is the correlation matrix,

   R = \frac{1}{p} \sum_{j=1}^{p} X_j X_j^t  (with R' = R'_1 + R'_2 + \cdots + R'_p,  R'_j = X_j X_j^t)

and

   P = \frac{1}{p} \sum_{j=1}^{p} T_j X_j
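This rule reduces to a few lines of NumPy; the sketch below also reproduces the worked exercise that follows (the function name is ours):

```python
import numpy as np

def least_square_weights(X, T):
    """X: p-by-m matrix whose rows are the input vectors X_j; T: targets T_j."""
    p = len(X)
    R = (X.T @ X) / p             # R = (1/p) sum_j X_j X_j^t  (correlation matrix)
    P = (X.T @ T) / p             # P = (1/p) sum_j T_j X_j
    return np.linalg.solve(R, P)  # solve R W* = P, i.e. W* = R^{-1} P

# The exercise on the next slides:
X = np.array([[1, 1, 0], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([1, 1, -1], dtype=float)
print(least_square_weights(X, T))  # [ 3. -2. -2.]
```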

Exercise: Use Adaline (1/4)

Example: three training vectors (components X1, X2, X3) with their expected outputs:

   X_1 = (1, 1, 0)^t,  T_1 = +1
   X_2 = (1, 0, 1)^t,  T_2 = +1
   X_3 = (1, 1, 1)^t,  T_3 = -1

Sol. First calculate R:

   R'_1 = X_1 X_1^t = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}

   R'_2 = X_2 X_2^t = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 1 \end{pmatrix}

   R'_3 = X_3 X_3^t = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}

   R = \frac{1}{3}(R'_1 + R'_2 + R'_3) = \frac{1}{3}\begin{pmatrix} 3 & 2 & 2 \\ 2 & 2 & 1 \\ 2 & 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 2/3 & 2/3 \\ 2/3 & 2/3 & 1/3 \\ 2/3 & 1/3 & 2/3 \end{pmatrix}

Then calculate P:

   P_1 = T_1 X_1^t = (1, 1, 0)
   P_2 = T_2 X_2^t = (1, 0, 1)
   P_3 = T_3 X_3^t = (-1, -1, -1)

   P = \frac{1}{3}(P_1 + P_2 + P_3)^t = \frac{1}{3}(1, 0, 0)^t

Solving R W^* = P:

   \frac{1}{3}\begin{pmatrix} 3 & 2 & 2 \\ 2 & 2 & 1 \\ 2 & 1 & 2 \end{pmatrix} \begin{pmatrix} W_1 \\ W_2 \\ W_3 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}

   3W_1 + 2W_2 + 2W_3 = 1
   2W_1 + 2W_2 + W_3 = 0
   2W_1 + W_2 + 2W_3 = 0

gives W_1 = 3, W_2 = -2, W_3 = -2.

Verify the net:

   (1, 1, 0): net = 3X_1 - 2X_2 - 2X_3 = 1,  so Y = +1, ok
   (1, 0, 1): net = 3X_1 - 2X_2 - 2X_3 = 1,  so Y = +1, ok
   (1, 1, 1): net = 3X_1 - 2X_2 - 2X_3 = -1, so Y = -1, ok

[Diagram: the trained ADALINE with weights 3, -2, -2 on inputs X1, X2, X3.]

Proof of the Least-Square Learning Rule (1/3)

Let us use the least mean square error to ensure the minimum total error. As long as the total error approaches zero, the best solution is found. Therefore, we are looking for the minimum of \langle \varepsilon_k^2 \rangle.

Proof:

   \langle \varepsilon_k^2 \rangle = \frac{1}{L} \sum_{k=1}^{L} \varepsilon_k^2 = \frac{1}{L} \sum_{k=1}^{L} (T_k - Y_k)^2 = \frac{1}{L} \sum_{k=1}^{L} (T_k^2 - 2 T_k Y_k + Y_k^2)

   = \frac{1}{L} \sum_{k=1}^{L} T_k^2 - \frac{2}{L} \sum_{k=1}^{L} T_k Y_k + \frac{1}{L} \sum_{k=1}^{L} Y_k^2

   = \langle T_k^2 \rangle - \frac{2}{L} \sum_{k=1}^{L} T_k Y_k + \frac{1}{L} \sum_{k=1}^{L} W^t (X_k X_k^t) W

where \langle T_k^2 \rangle denotes the mean of T_k^2.

Proof of the Least-Square Learning Rule (2/3)

Note that

   Y_k = \sum_i w_i x_{ik} = W^t X_k,  so  Y_k^2 = (W^t X_k)(X_k^t W) = W^t (X_k X_k^t) W

Hence

   \langle \varepsilon_k^2 \rangle = \langle T_k^2 \rangle - \frac{2}{L} \sum_{k=1}^{L} T_k Y_k + W^t \Big[ \frac{1}{L} \sum_{k=1}^{L} X_k X_k^t \Big] W

   = \langle T_k^2 \rangle - \frac{2}{L} \sum_{k=1}^{L} T_k (X_k^t W) + W^t \Big[ \frac{1}{L} \sum_{k=1}^{L} X_k X_k^t \Big] W

   = \langle T_k^2 \rangle - 2 \Big[ \frac{1}{L} \sum_{k=1}^{L} T_k X_k^t \Big] W + W^t \Big[ \frac{1}{L} \sum_{k=1}^{L} X_k X_k^t \Big] W

   = \langle T_k^2 \rangle - 2 \langle T_k X_k^t \rangle W + W^t \langle X_k X_k^t \rangle W

Proof of the Least-Square Learning Rule (3/3)

Let R_k = X_k X_k^t; R_k is an n \times n matrix, also called the correlation matrix. Let R' = R'_1 + R'_2 + \cdots + R'_L and R = R'/L, i.e. R = \langle X_k X_k^t \rangle. Also let P = \langle T_k X_k \rangle. Then

   \langle \varepsilon_k^2 \rangle = \langle T_k^2 \rangle + W^t R W - 2 \langle T_k X_k^t \rangle W

We want to find W^* such that \langle \varepsilon_k^2 \rangle is minimal:

   \frac{\partial \langle \varepsilon_k^2 \rangle}{\partial W} = \big[ \langle T_k^2 \rangle + W^t R W - 2 \langle T_k X_k^t \rangle W \big]' = 2 R W - 2 \langle T_k X_k \rangle = 2 R W - 2 P

Setting the derivative to zero, 2 R W^* = 2 P, gives

   R W^* = P,  i.e.,  W^* = R^{-1} P

Comparison of Perceptron and Adaline

                      Perceptron                  Adaline
Architecture          Single-layer                Single-layer
Neuron model          Non-linear                  Linear
Learning algorithm    Minimize number of          Minimize total error
                      misclassified examples
Application           Linear classification       Linear classification and regression