Artificial Neural Networks - Lect. 2

The document summarizes the basic architecture and components of neural networks: a neural network consists of neurons whose weighted inputs are summed and passed through an activation function; the weights are initialized and then updated during training using an algorithm such as the perceptron learning rule to minimize error. Common activation functions include the sign, sigmoid, tanh, and ReLU functions.


The basic architecture of neural networks

Simple model of a neuron
[Figure: neuron j receives inputs y1, y2, y3, ..., yi through weights w1j, w2j, w3j, ..., wij and produces output O.]

• Each neuron has a threshold value.

• Each neuron has weighted inputs from other neurons.

• The input signals form a weighted sum.

• The threshold value is subtracted from this weighted sum to give the neuron's
activation level.
Cont.
• If the weighted sum exceeds the threshold (i.e. the activation level is positive), the neuron "fires".
Otherwise, it does not fire (its output is ignored).

• Note: the most common fixed value of the threshold is zero.

• The activation level is passed through a sigmoid activation function to
determine the output, as in the sketch below.
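A minimal sketch of this neuron model in Python, assuming NumPy; the input values, weights, and threshold below are made-up illustration numbers, not values from the slides.

import numpy as np

def sigmoid(z):
    """Sigmoid activation: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical inputs y_i and weights w_ij for a single neuron j.
y = np.array([0.5, -1.2, 0.3])      # incoming signals from other neurons
w = np.array([0.4,  0.7, -0.2])     # weights w1j, w2j, w3j
threshold = 0.0                      # most common fixed threshold value

activation_level = np.dot(w, y) - threshold   # weighted sum minus threshold
output = sigmoid(activation_level)            # post-activation output of the neuron
print(activation_level, output)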
Mathematical representation of a perceptron
• Consider a training instance in the form (X, y), where the
input vector X is X = [x1, x2, ..., xd]
and the desired output y is y ∈ {-1, +1}.
• Note: d is the number of input nodes; -1 and +1 are the observed values of the binary class.
• The basic steps of the learning process in the perceptron work as follows:
1) Compute the weighted sum W·X = ∑j wj xj.
2) The sign function is applied to the previous value to determine the
actual output: ŷ = sign(W·X).
Cont.
• The sign function maps a real value to -1 or +1 (e.g. sign(2.3) is +1
and sign(-2.3) is -1).
Note: the sign function is one type of "activation function"; other
forms will be discussed later.

3) The error of the prediction is then computed in the form: E(X) = y - ŷ.

4) If the error value is nonzero, then the weights must be updated in
the negative direction of the error gradient (see the next slides). A minimal sketch of steps 1-3 follows.
Note: E(X) is called the loss function (different forms of E will be discussed later).
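A minimal sketch of steps 1-3 (weighted sum, sign activation, prediction error), assuming NumPy; the instance (x, y) and the current weights are hypothetical values, not taken from the slides.

import numpy as np

def sign(v):
    """Map a real value to -1 or +1 (sign(0) treated as +1 here)."""
    return 1 if v >= 0 else -1

def perceptron_predict(w, x):
    """Steps 1-2: weighted sum followed by the sign activation."""
    return sign(np.dot(w, x))

# Hypothetical training instance (X, y) with d = 3 inputs.
x = np.array([1.0, -2.0, 0.5])
y = +1                              # desired binary output
w = np.array([0.2, 0.1, -0.4])      # current weights

y_hat = perceptron_predict(w, x)    # step 2: actual output
error = y - y_hat                   # step 3: prediction error E(X)
print(y_hat, error)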
Cont.
• The typical form of the perceptron equation has an additional fixed value, called
a bias.
• By incorporating the bias value b, we can write the actual output
equation in the form: ŷ = sign(W·X + b).

• Note: for simplicity, the bias may not appear in the following equations.
Cont.
• The perceptron algorithm uses a smooth approximation of the gradient with
respect to each example: ∇E ≈ -(y - ŷ) X.

• The weights are updated according to:
W(current) = W(previous) + ∆W,   where ∆W = α (y - ŷ) X

α is the learning rate (commonly 0.1). A sketch of one update step follows.
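A minimal sketch of one weight update under this rule, assuming NumPy; the weights, instance, and learning rate are hypothetical illustration values.

import numpy as np

def perceptron_update(w, x, y, alpha=0.1):
    """One perceptron update: w <- w + alpha * (y - y_hat) * x."""
    y_hat = 1 if np.dot(w, x) >= 0 else -1   # sign of the weighted sum
    delta_w = alpha * (y - y_hat) * x        # zero when the prediction is already correct
    return w + delta_w

# Hypothetical example, continuing the instance above.
w = np.array([0.2, 0.1, -0.4])
x = np.array([1.0, -2.0, 0.5])
y = +1
w = perceptron_update(w, x, y, alpha=0.1)
print(w)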


Weight initialization in NNs
We can use one of the following rules to initialize the weights (a sketch follows this list):

1. Zero initialization: this choice causes all weights to have the same value in
subsequent iterations.

2. Random initialization: this is a better choice because it breaks the symmetry.
However, initializing the weights with very high or very low values can result in
slower optimization.

3. Using an extra scaling factor, as in schemes like Xavier initialization:
this method addresses the above issue (slower optimization), which is why it
is the most recommended weight initialization method of the three.
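A minimal sketch comparing the three schemes, assuming NumPy; the layer sizes and the 1/n_in variance used for the Xavier-style scaling factor are assumptions, since the slide does not give the exact formula.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 64, 32   # hypothetical layer sizes

# 1. Zero initialization: every weight keeps the same value across iterations.
w_zero = np.zeros((n_in, n_out))

# 2. Plain random initialization: breaks symmetry, but too large or too small
#    a scale can slow optimization.
w_random = rng.standard_normal((n_in, n_out)) * 0.5

# 3. Xavier-style initialization: scale the random values by an extra factor.
#    The sqrt(1/n_in) factor shown here is one common variant (an assumption).
w_xavier = rng.standard_normal((n_in, n_out)) * np.sqrt(1.0 / n_in)
print(w_xavier.std())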
Convergence
• The perceptron algorithm always converges to zero error on
the training data when the data are linearly separable.
• However, it is not guaranteed to converge when the data are not
linearly separable.
Activation function
• The choice of activation function is a critical part of NN design.

• A neuron computes two functions in the node: a weighted sum of its inputs,
followed by an activation function.

The value computed before applying the activation function is called the pre-activation
value (computed by the summation ∑), whereas the value computed after applying the
activation function is called the post-activation value (the output of the activation
function), as in the sketch below.
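A minimal sketch of the two values computed in one node, assuming NumPy and tanh as the activation; the weights and inputs are hypothetical.

import numpy as np

def neuron(w, x, activation=np.tanh):
    """The two functions computed in one node."""
    pre_activation = np.dot(w, x)                   # weighted sum (the summation)
    post_activation = activation(pre_activation)    # value after the activation function
    return pre_activation, post_activation

# Hypothetical weights and inputs.
pre, post = neuron(np.array([0.3, -0.8]), np.array([1.0, 0.5]))
print(pre, post)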
Cont.
• Every neuron contains an activation function, which determines whether the
signal is strong enough to produce an output.

• Classical activation functions (sketched below):

• Sign function (or bipolar binary function): used to map to binary outputs at
prediction time (-1 or +1).
• Sigmoid function (or unipolar continuous function): outputs a value in the interval
(0, 1), so it can produce probabilistic outputs (real values) and be used to build a loss
function based on a maximum-likelihood model.
• Tanh function: similar to the sigmoid function, but preferred when the outputs are
desired to be positive or negative. Furthermore, its larger gradient makes it easier
to train.
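Minimal sketches of the three classical activation functions, assuming NumPy; the test values are arbitrary.

import numpy as np

def sign_fn(v):
    """Bipolar binary activation: -1 or +1."""
    return np.where(v >= 0, 1.0, -1.0)

def sigmoid(v):
    """Unipolar continuous activation: output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-v))

def tanh_fn(v):
    """Like the sigmoid but with output in (-1, 1) and a larger gradient."""
    return np.tanh(v)

v = np.linspace(-3, 3, 7)
print(sign_fn(v), sigmoid(v), tanh_fn(v), sep="\n")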
Cont.
• Piecewise linear activation functions (sketched below):

• Both ReLU and hard tanh have largely replaced the sigmoid and soft tanh
activation functions in modern neural networks because of their ease of training.
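A minimal sketch of the two piecewise linear functions, assuming NumPy; the test values are arbitrary.

import numpy as np

def relu(v):
    """ReLU: max(0, v)."""
    return np.maximum(0.0, v)

def hard_tanh(v):
    """Hard tanh: clip the value to the interval [-1, 1]."""
    return np.clip(v, -1.0, 1.0)

v = np.linspace(-2, 2, 9)
print(relu(v), hard_tanh(v), sep="\n")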
Learning algorithms
During the learning process, weights can be updated by different rules (each rule is sketched in code below). In the following
rules: c is the learning rate, d the desired output, o the actual output, and net = WᵀX.
• Perceptron learning rule:
∆wi = c [di - oi] xi
• Hebbian learning rule:
∆wi = c [oi] xi
• Delta learning rule:
∆wi = c [di - oi] f'(net) xi
f'(net) = 1/2 (1 - oi²)
• Widrow-Hoff learning rule (d is independent of the activation function):
∆wi = c [di - net] f'(net) xi
with f'(net) = 1, since f(net) = net
• Correlation learning rule (a special case of the Hebbian rule):
∆wi = c [di] xi
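A minimal sketch computing ∆w under each of the listed rules, assuming NumPy; the sign of net is used as the actual output o for the perceptron and Hebbian rules, and a bipolar continuous (tanh-based) output is assumed for the delta rule, since the slide does not specify these choices. The weights, input, and desired output are hypothetical.

import numpy as np

def weight_updates(w, x, d, c=0.1):
    """Delta-w under each learning rule, for one training example."""
    net = np.dot(w, x)
    o_sign = 1.0 if net >= 0 else -1.0   # actual output with the sign activation
    o_cont = np.tanh(net / 2.0)          # bipolar continuous output (assumed for the delta rule)

    return {
        "perceptron":  c * (d - o_sign) * x,
        "hebbian":     c * o_sign * x,
        "delta":       c * (d - o_cont) * 0.5 * (1.0 - o_cont**2) * x,  # f'(net) = 1/2 (1 - o^2)
        "widrow_hoff": c * (d - net) * x,                               # f(net) = net, so f'(net) = 1
        "correlation": c * d * x,
    }

# Hypothetical instance.
print(weight_updates(np.array([0.5, -0.5]), np.array([1.0, 2.0]), d=+1))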
Example 1: illustrates the perceptron learning rule

Sol : when c = 0.1
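Since the worked numbers of Example 1 are not shown in the text, the sketch below only illustrates the perceptron learning rule with c = 0.1 on a small made-up, linearly separable dataset; the data, the number of epochs, and the bias-as-extra-input convention are assumptions, not the slide's values.

import numpy as np

def train_perceptron(X, D, c=0.1, epochs=10):
    """Train with the perceptron learning rule: w <- w + c * (d - o) * x."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, d in zip(X, D):
            o = 1 if np.dot(w, x) >= 0 else -1   # sign activation
            w = w + c * (d - o) * x              # update only when o differs from d
    return w

# Hypothetical data; the last component of each input acts as a bias input.
X = np.array([[ 1.0,  1.0, 1.0],
              [ 2.0,  1.0, 1.0],
              [-1.0, -1.5, 1.0],
              [-2.0, -1.0, 1.0]])
D = np.array([+1, +1, -1, -1])
print(train_perceptron(X, D, c=0.1))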
