
ML Module 4

Bayes' Theorem is a key principle in probability that updates the probability of a hypothesis based on new evidence, involving prior probability, likelihood, and posterior probability. The document also discusses artificial neural networks, their structure, and the functioning of perceptrons, which are simplified models of biological neurons used for supervised learning. It includes examples of activation functions and a practical problem involving the design of a two-layer perceptron network to implement a NAND gate.

Uploaded by

avalandro12
Copyright
© All Rights Reserved

MODULE-4

Bayes' Theorem is a fundamental concept in probability theory and forms the foundation of Bayesian
learning in machine learning. It allows you to update the probability of a hypothesis (or event) based on
new evidence.

Bayes' Theorem Explained

At its core, Bayes' Theorem relates current knowledge or belief about an event (the prior probability) to
new data or evidence (the likelihood) to produce an updated belief (the posterior probability).

Mathematically, Bayes' Theorem is:

$$P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}$$

Where:

• P(H | D) is the posterior probability: the probability of the hypothesis H being true given the data D.
• P(D | H) is the likelihood: the probability of observing the data D given that hypothesis H is true.
• P(H) is the prior probability: the initial belief about the hypothesis H before any data is observed.
• P(D) is the marginal likelihood or evidence: the total probability of the data under all possible hypotheses. This acts as a normalizing constant to ensure that the posterior is a valid probability distribution.

Breaking Down the Components of Bayes' Theorem


1. Prior Probability (P(H)):
o This represents what we know or believe about a hypothesis before seeing any new data.
o Example: In a medical test scenario, it could be the prior probability of a person having a
disease before considering the test results (e.g., based on the general population statistics).
2. Likelihood (P(D | H)):
o This is the probability of observing the data, assuming the hypothesis is true. It expresses how
likely it is to see the given data under the assumption of the hypothesis.
o Example: The likelihood would be the probability of getting a positive test result assuming the
person has the disease.
3. Evidence (P(D)):
o This is the total probability of the data across all hypotheses. It serves to normalize the
posterior probability so that it sums to 1.
o Example: The probability of getting a positive test result across all people, whether they have
the disease or not.
4. Posterior Probability (P(H | D)):
o This is the updated belief about the hypothesis after considering the new data (the evidence).
o Example: The posterior would give the probability of a person having the disease after
considering both the prior knowledge and the test results.

Intuition Behind Bayes' Theorem


Bayes' Theorem can be understood in terms of updating beliefs. When you receive new evidence, you
modify your prior belief to form a new belief that incorporates both your prior knowledge and the new
data.

• Before you collect any data, you have a prior belief about a hypothesis (e.g., the probability of a patient
having a disease).
• After seeing new data (e.g., the result of a medical test), you update your belief about the hypothesis to
reflect this new evidence.

Bayes’ Theorem lets you do this systematically: the updated belief (posterior) is
proportional to the product of the prior belief and the likelihood of observing the new data.

Example: Disease Diagnosis

Consider a simple example of diagnosing a disease using a medical test.

1. Prior Probability (P(H)):
o The prior belief is the probability that a person has the disease. For example, in a population,
1% of people might have the disease, so P(H) = 0.01.
2. Likelihood (P(D | H)):
o This is the probability of getting a positive test result if the person has the disease. Suppose the
test correctly identifies the disease 95% of the time, so P(D | H) = 0.95.
3. Evidence (P(D)):
o This is the total probability of a positive test result in the population. It includes both people
who have the disease and those who do not: P(D) = P(D | H) P(H) + P(D | not H) P(not H).
4. Posterior Probability (P(H | D)):
o After receiving a positive test result, we want to calculate the probability that the person
actually has the disease.
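
The four quantities in this example can be combined in a few lines of Python. This is a minimal sketch: the prior (0.01) and the likelihood (0.95) come from the example, while the 5% false-positive rate P(D | not H) is an assumed value chosen only for illustration, and the function name `posterior` is hypothetical.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | D) for a yes/no hypothesis using Bayes' Theorem."""
    # Evidence P(D): total probability of a positive result, summed over
    # people who have the disease and people who do not.
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# P(H) = 0.01 and P(D | H) = 0.95 are taken from the example.
# The false-positive rate of 0.05 is an ASSUMPTION made for illustration.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(disease | positive test) = {p:.3f}")
```

Under the assumed false-positive rate, the posterior stays far below the 95% sensitivity of the test because the disease is rare in the population.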
Chapter 10
Artificial Neural Networks
The term "artificial neural network" refers to a biologically inspired sub-field of artificial intelligence
modelled after the brain.
An artificial neural network is a computational network modelled on the biological neural networks
that make up the structure of the human brain.
Just as the human brain has neurons interconnected with one another, an artificial neural network has
neurons linked to each other across the various layers of the network. These neurons are known as
nodes.


The biological neuron consists of four main parts:

• Dendrites: nerve fibres that carry electrical signals to the cell.
• Cell body: computes a non-linear function of its inputs.
• Axon: a single long fibre that carries the electrical signal from the cell body to other neurons.
• Synapse: the point of contact between the axon of one cell and the dendrite of another,
regulating a chemical connection whose strength affects the input to the cell.


Dendrites are tree-like networks of nerve fibre connected to the cell body.
An axon is a single, long connection extending from the cell body and carrying signals from the
neuron. The end of the axon splits into fine strands, and each strand terminates in a small
bulb-like organ called a synapse. It is through the synapses that the neuron passes its signals to
other nearby neurons. The receiving ends of these synapses on the nearby neurons can be found
both on the dendrites and on the cell body. There are approximately 10^4 synapses per neuron in the
human brain. Electrical impulses are passed between the synapse and the dendrites. This is a chemical
process that increases or decreases the electric potential inside the body of the receiving cell. If
the electric potential reaches a threshold value, the receiving cell fires, and a pulse (action potential)
of fixed strength and duration is sent through the axon to the synaptic junctions of the cell. After that,
the cell has to wait for a period called the refractory period.

Difference between biological and Artificial Neuron

ARTIFICIAL NEURONS:
Artificial neurons are like biological neurons and are linked to each other in various layers of the
network. These neurons are known as nodes.
A node or neuron can receive one or more inputs and process them. Artificial neurons are
connected to other neurons by connection links, and each connection link is associated with a synaptic
weight. The structure of a single neuron is shown below:
Fig: McCulloch-Pitts neuron mathematical model.

Simple Model of an ANN

The first mathematical model of a biological neuron was designed by McCulloch and Pitts in 1943.
It includes 2 steps:
1. It receives weighted inputs from other neurons.
2. It operates with a threshold function or activation function.

Basically, a neuron takes an input signal (dendrite), processes it like a CPU (soma), and passes
the output through a cable-like structure to other connected neurons (axon to synapse to the
other neuron’s dendrite).

OR
Working:
The inputs received are combined into a weighted sum, which is given to the activation function;
if the sum exceeds the threshold value, the neuron fires. The neuron is the basic
processing unit that receives a set of inputs x1, x2, x3, …, xn and their associated weights
w1, w2, w3, …, wn. The summation function computes the weighted sum of the inputs
received by the neuron:

$$\text{Sum} = \sum_{i=1}^{n} x_i w_i$$
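
The weighted sum and the firing decision described above can be sketched in Python as follows. The input values, weights and thresholds are hypothetical numbers chosen for illustration, and the comparison uses "greater than or equal to", which is one common convention for the threshold test.

```python
def weighted_sum(inputs, weights):
    """Summation function: sum of x_i * w_i over all inputs."""
    return sum(x * w for x, w in zip(inputs, weights))


def neuron_fires(inputs, weights, threshold):
    """Return 1 if the weighted sum reaches the threshold, otherwise 0."""
    return 1 if weighted_sum(inputs, weights) >= threshold else 0


# Hypothetical inputs and weights, not taken from the notes.
x = [1, 0, 1]
w = [0.5, 0.9, 0.3]

print(weighted_sum(x, w))                  # 0.5*1 + 0.9*0 + 0.3*1
print(neuron_fires(x, w, threshold=0.7))   # the sum reaches 0.7, so the neuron fires
print(neuron_fires(x, w, threshold=0.9))   # the sum stays below 0.9, so it does not
```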

Activation functions:
• Just as some force or activation is applied to make work more efficient and to obtain an exact
output, an activation function is applied over the net input to calculate the output of an ANN.
The information processing of a processing element has two major parts: input and output. An
integration function (f) is associated with the input of the processing element.

• Several activation functions are in use:

1. Identity function or linear function: It is a linear function defined as

$$f(x) = x \quad \text{for all } x$$

The output is the same as the input, i.e. the weighted sum. The function is useful when we do
not apply any threshold. The output value ranges between -∞ and +∞.
2. Binary step function: This function can be defined as

$$f(x) = \begin{cases} 1 & \text{if } x \ge \theta \\ 0 & \text{if } x < \theta \end{cases}$$

where θ represents the threshold value. It is used in single-layer nets to convert
the net input to an output that is binary (0 or 1).
3. Bipolar step function: This function can be defined as

$$f(x) = \begin{cases} 1 & \text{if } x \ge \theta \\ -1 & \text{if } x < \theta \end{cases}$$

where θ represents the threshold value. It is used in single-layer nets to convert
the net input to an output that is bipolar (+1 or -1).
4. Sigmoid function: It is used in backpropagation nets.
Two types:
a) Binary sigmoid function: It is also termed the logistic sigmoid function or unipolar
sigmoid function. It is defined as

$$f(x) = \frac{1}{1 + e^{-\lambda x}}$$

where λ represents the steepness parameter. The range of this sigmoid function is 0
to 1.
b) Bipolar sigmoid function: This function is defined as

$$f(x) = \frac{2}{1 + e^{-\lambda x}} - 1 = \frac{1 - e^{-\lambda x}}{1 + e^{-\lambda x}}$$

where λ represents the steepness parameter and the range of the function is between -1
and +1.

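5. Ramp function: The ramp function is defined as:

$$f(x) = \begin{cases} 1 & \text{if } x > 1 \\ x & \text{if } 0 \le x \le 1 \\ 0 & \text{if } x < 0 \end{cases}$$

It is a linear function whose upper and lower limits are fixed.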

6. Tanh (hyperbolic tangent) function: The tanh function is very similar to the sigmoid/logistic
activation function and has the same S-shape, the difference being its output range of -1 to
1. In tanh, the larger the input (more positive), the closer the output value will be to 1.0,
whereas the smaller the input (more negative), the closer the output will be to -1.0.

7. ReLU function: ReLU stands for Rectified Linear Unit, f(x) = max(0, x).
Although it gives the impression of being a linear function, ReLU has a derivative and therefore
allows backpropagation, while remaining computationally efficient.
The main catch is that the ReLU function does not activate all the neurons at the same
time: a neuron is deactivated only if the output of the linear transformation is less than 0.
8. Softmax function: Softmax is an activation function that scales numbers (logits) into
probabilities. The output of softmax is a vector (say v) with the probability of each
possible outcome. The probabilities in vector v sum to one over all possible outcomes or
classes.
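
The eight activation functions listed above can be sketched in plain Python. The sigmoid and ramp definitions below use the commonly cited forms (the steepness parameter is named `lam` here, and the ramp is assumed to be clipped to the range 0 to 1); the function names are illustrative and not taken from the notes.

```python
import math


def identity(x):
    return x


def binary_step(x, theta=0.0):
    return 1 if x >= theta else 0


def bipolar_step(x, theta=0.0):
    return 1 if x >= theta else -1


def binary_sigmoid(x, lam=1.0):
    # Logistic (unipolar) sigmoid with steepness lam; output lies in (0, 1).
    return 1.0 / (1.0 + math.exp(-lam * x))


def bipolar_sigmoid(x, lam=1.0):
    # Rescaled sigmoid; output lies in (-1, 1).
    return 2.0 / (1.0 + math.exp(-lam * x)) - 1.0


def ramp(x):
    # ASSUMED form: linear between 0 and 1, clipped outside that range.
    return max(0.0, min(1.0, x))


def tanh(x):
    return math.tanh(x)


def relu(x):
    return max(0.0, x)


def softmax(logits):
    # Subtracting the maximum keeps exp() from overflowing; the result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
print(probs, sum(probs))
```

With `lam = 1`, the bipolar sigmoid equals `tanh(x / 2)`, which is why the two curves share the same S-shape and output range.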


Artificial Neural Network Structure


• Artificial neural networks are computational models inspired by the human brain:
– They are massively parallel, distributed systems made up of simple processing units (neurons).
– Synaptic connection strengths among neurons are used to store the acquired knowledge.

• Knowledge is acquired by the network from its environment through a learning process.

• A neural network is constructed from 3 types of layers:

• Input layer: the initial data for the neural network.
• Hidden layers: the intermediate layers between the input and output layers, where all the
computation is done.
• Output layer: produces the result for the given inputs.

PERCEPTRON AND LEARNING THEORY

• The perceptron is also a simplified model of a biological neuron.
• The perceptron is an algorithm for supervised learning of binary classifiers. It is a type of
linear classifier, i.e. a classification algorithm that makes all of its predictions based on a
linear predictor function combining a set of weights with the feature vector.
• One type of ANN system is based on a unit called a perceptron.
OR
• A single perceptron can represent the primitive Boolean functions AND, OR, NAND and NOR.
• Some Boolean functions cannot be represented by a single perceptron.
– E.g. the XOR function, which is not linearly separable.

Major components of a perceptron
• Input
• Weight
• Bias
• Weighted summation
• Step/activation function
• Output
WORKING:
• Feed the features of the model to be trained as input to the first layer.
• Each input is multiplied by its weight, and the products are added up.
• The bias value is added to shift the output function.
• This value is presented to the activation function (the type of activation function depends on the need).
• The value received after the last step is the output value.
The activation function is a binary step function that outputs 1 if f(x) is at or above the
threshold value Θ and 0 if f(x) is below the threshold value Θ. The output of the neuron
is then:

$$y = \begin{cases} 1 & \text{if } f(x) \ge \Theta \\ 0 & \text{if } f(x) < \Theta \end{cases} \qquad \text{where } f(x) = \sum_i w_i x_i + b$$
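
The working steps above can be sketched as a single perceptron that realizes the NAND gate. The weights (-1, -1) and the bias 1.5 are hand-picked, hypothetical values chosen so that the gate works with a threshold of 0; they are not the values used in the worked problem in these notes.

```python
def perceptron(inputs, weights, bias, threshold=0.0):
    """Weighted sum plus bias, followed by a binary step activation."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if net >= threshold else 0


# Hand-picked (hypothetical) parameters that realize NAND.
NAND_WEIGHTS = [-1.0, -1.0]
NAND_BIAS = 1.5

for x1 in (0, 1):
    for x2 in (0, 1):
        y = perceptron([x1, x2], NAND_WEIGHTS, NAND_BIAS)
        print(x1, x2, "->", y)
```

The net input is 1.5, 0.5, 0.5 and -0.5 for the four input pairs, so only the pair (1, 1) falls below the threshold and produces 0.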
PROBLEM:
Design a 2-layer network of perceptrons to implement the NAND gate. Assume your own weights and
biases in the range [-0.5, 0.5]. Use a learning rate of 0.4.

Solution:

Figure 1: Two-Layer Network for NAND gate. Inputs X1 and X2 feed unit X3 (AND) through weights w13 and w23; X3 feeds unit X4 (NOT) through weight w34; the bias input X0 supplies Θ3 and Θ4.

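Table 1: Weights and Biases

| X1 | X2 | O_desired | w13 | w23 | w34 | Θ3 | Θ4 | X0 |
|----|----|-----------|-----|------|-----|-----|------|----|
| 0 | 1 | 1 | 0.1 | -0.4 | 0.3 | 0.2 | -0.3 | 1 |

Table 2: Truth Table of NAND Gate

| X1 | X2 | X1 AND X2 | NAND = NOT(X1 AND X2) |
|----|----|-----------|------------------------|
| 0 | 0 | 0 | 1 |
| 0 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 0 |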

ITERATION 1:
Step 1: FORWARD PROPAGATION
1. Calculate net inputs and outputs in the input layer as shown in Table 3.
Table 3: Net Input and Output Calculation

| Input layer | I_j | O_j |
|-------------|-----|-----|
| X1 | 0 | 0 |
| X2 | 1 | 1 |

2. Calculate net inputs and outputs in the hidden and output layers as shown in Table 4.
Table 4: Net Input and Output Calculation in Hidden and Output Layer

Hidden unit X3 (unit j):

$$\begin{aligned} I_3 &= X_1 w_{13} + X_2 w_{23} + X_0 \Theta_3 = 0(0.1) + 1(-0.4) + 1(0.2) = -0.2 \\ O_3 &= \frac{1}{1 + e^{-I_3}} = \frac{1}{1 + e^{-(-0.2)}} = 0.450 \end{aligned}$$

Output unit X4 (unit k):

$$\begin{aligned} I_4 &= O_3 w_{34} + X_0 \Theta_4 = (0.450 \times 0.3) + 1(-0.3) = -0.165 \\ O_4 &= \frac{1}{1 + e^{-I_4}} = \frac{1}{1 + e^{-(-0.165)}} = 0.458 \end{aligned}$$

3. Calculate Error

$$\text{Error} = O_{desired} - O_{estimated} = 1 - 0.458 = 0.542$$

Step 2: BACKWARD PROPAGATION

1. Calculate the error at each unit.
For each unit k in the output layer:

$$Error_k = O_k (1 - O_k)(O_{desired} - O_k)$$

For each unit j in the hidden layer:

$$Error_j = O_j (1 - O_j) \sum_k Error_k \, W_{jk}$$

Table 5: Error Calculation

Output layer unit X4:

$$Error_4 = O_4 (1 - O_4)(O_{desired} - O_4) = 0.458 (1 - 0.458)(1 - 0.458) = 0.134$$

Hidden layer unit X3:

$$Error_3 = O_3 (1 - O_3) \, Error_4 \, w_{34} = 0.450 \times (1 - 0.450) \times 0.134 \times 0.3 = 0.0099$$

2. Update Weights and Biases

Table 6: Weight and Bias Calculation

Weights are updated with $w_{ij} = w_{ij} + (\alpha \times Error_j \times O_i)$:

| Weight | Calculation | New weight |
|--------|-------------|------------|
| w13 | w13 + (0.4 × Error3 × O1) = 0.1 + (0.4 × 0.0099 × 0) | 0.1 |
| w23 | w23 + (0.4 × Error3 × O2) = -0.4 + (0.4 × 0.0099 × 1) | -0.396 |
| w34 | w34 + (0.4 × Error4 × O3) = 0.3 + (0.4 × 0.134 × 0.450) | 0.324 |

Biases are updated with $\Theta_j = \Theta_j + (\alpha \times Error_j)$:

| Bias | Calculation | New bias |
|------|-------------|----------|
| Θ3 | Θ3 + (0.4 × Error3) = 0.2 + (0.4 × 0.0099) | 0.203 |
| Θ4 | Θ4 + (0.4 × Error4) = -0.3 + (0.4 × 0.134) | -0.246 |

ITERATION 2:
Step 1: FORWARD PROPAGATION

1. Calculate net inputs and outputs in the hidden and output layers.

Table 7: Inputs and Outputs in Hidden and Output Layer

Hidden unit X3 (unit j):

$$\begin{aligned} I_3 &= X_1 w_{13} + X_2 w_{23} + X_0 \Theta_3 = 0(0.1) + 1(-0.396) + 1(0.203) = -0.193 \\ O_3 &= \frac{1}{1 + e^{-I_3}} = \frac{1}{1 + e^{-(-0.193)}} = 0.451 \end{aligned}$$

Output unit X4 (unit k):

$$\begin{aligned} I_4 &= O_3 w_{34} + X_0 \Theta_4 = (0.451 \times 0.324) + 1(-0.246) = -0.099 \\ O_4 &= \frac{1}{1 + e^{-I_4}} = \frac{1}{1 + e^{-(-0.099)}} = 0.475 \end{aligned}$$

2. Calculate Error

$$\text{Error} = O_{desired} - O_{estimated} = 1 - 0.475 = 0.525$$

| Iteration | Error | Reduction |
|-----------|-------|-----------|
| 1 | 0.542 | |
| 2 | 0.525 | 0.542 - 0.525 = 0.017 |

In iteration 2 the error is reduced to 0.525. This process continues until the desired output
is achieved.
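
The two iterations above can be reproduced with a short Python script. This is a sketch under the stated initial weights and biases from Table 1; the dictionary keys and the function name `train_step` are illustrative. It works at full floating-point precision, so the later decimals may differ slightly from the hand-calculated tables.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_step(x1, x2, target, p, lr):
    """One forward and backward pass for the X1, X2 -> X3 -> X4 network.

    Updates the parameter dict p in place and returns the error
    (target minus output) measured before the update.
    """
    # Forward propagation (the bias input X0 is always 1).
    o3 = sigmoid(x1 * p["w13"] + x2 * p["w23"] + p["t3"])
    o4 = sigmoid(o3 * p["w34"] + p["t4"])

    # Backward propagation of the error terms.
    err4 = o4 * (1 - o4) * (target - o4)
    err3 = o3 * (1 - o3) * err4 * p["w34"]

    # Weight and bias updates: w_ij += lr * Error_j * O_i.
    p["w13"] += lr * err3 * x1
    p["w23"] += lr * err3 * x2
    p["w34"] += lr * err4 * o3
    p["t3"] += lr * err3
    p["t4"] += lr * err4
    return target - o4


# Initial weights and biases from Table 1; t3 and t4 stand for the two biases.
params = {"w13": 0.1, "w23": -0.4, "w34": 0.3, "t3": 0.2, "t4": -0.3}
errors = [train_step(0, 1, 1, params, lr=0.4) for _ in range(2)]
print([round(e, 3) for e in errors])
```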
How does a Multi-Layer Perceptron solve the XOR problem? Design an MLP with
backpropagation to implement the XOR Boolean function.
Solution:

| X1 | X2 | Y |
|----|----|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

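Figure 2: Multi-Layer Perceptron for XOR. Inputs X1 and X2 feed hidden units X3 and X4, which feed output unit X5; the bias input X0 supplies θ3, θ4 and θ5. The initial weights and biases are listed in Table 8.

Learning rate: α = 0.8

Table 8: Weights and Biases

| X1 | X2 | W13 | W14 | W23 | W24 | W35 | W45 | θ3 | θ4 | θ5 |
|----|----|------|-----|-----|------|-----|------|-----|-----|------|
| 1 | 0 | -0.2 | 0.4 | 0.2 | -0.3 | 0.2 | -0.3 | 0.4 | 0.1 | -0.3 |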

Step 1: Forward Propagation

1. Calculate Input and Output in the Input Layer as shown in Table 9.
Table 9: Net Input and Output Calculation

| Input layer | Ij | Oj |
|-------------|----|----|
| X1 | 1 | 1 |
| X2 | 0 | 0 |

2. Calculate Net Input and Output in the Hidden Layer and Output Layer as shown in Table 10.
Table 10: Unit j at Hidden Layer and Output Layer – Net Input and Output Calculation

Unit X3:

$$\begin{aligned} I_3 &= X_1 W_{13} + X_2 W_{23} + X_0 \theta_3 = 1(-0.2) + 0(0.2) + 1(0.4) = 0.2 \\ O_3 &= \frac{1}{1 + e^{-I_3}} = \frac{1}{1 + e^{-0.2}} = 0.549 \end{aligned}$$

Unit X4:

$$\begin{aligned} I_4 &= X_1 W_{14} + X_2 W_{24} + X_0 \theta_4 = 1(0.4) + 0(-0.3) + 1(0.1) = 0.5 \\ O_4 &= \frac{1}{1 + e^{-I_4}} = \frac{1}{1 + e^{-0.5}} = 0.622 \end{aligned}$$

Unit X5:

$$\begin{aligned} I_5 &= O_3 W_{35} + O_4 W_{45} + X_0 \theta_5 = 0.549(0.2) + 0.622(-0.3) + 1(-0.3) = -0.376 \\ O_5 &= \frac{1}{1 + e^{-I_5}} = \frac{1}{1 + e^{0.376}} = 0.407 \end{aligned}$$

3. Calculate Error = Odesired – OEstimated

So the error for this network is,
Error = Odesired – O5 = 1 – 0.407 = 0.593

Step 2: Backward Propagation

1. Calculate Error at each node as shown in Table 11.
For each unit k in the output layer, calculate

$$Error_k = O_k (1 - O_k)(O_{desired} - O_k)$$

For each unit j in the hidden layer, calculate

$$Error_j = O_j (1 - O_j) \sum_k Error_k \, W_{jk}$$

Table 11: Error Calculation for each unit in the Output layer and Hidden layer

Output layer unit X5:

$$Error_5 = O_5 (1 - O_5)(1 - O_5) = 0.407 \times (1 - 0.407) \times (1 - 0.407) = 0.143$$

Hidden layer unit X4:

$$Error_4 = O_4 (1 - O_4) \, Error_5 \, W_{45} = 0.622 \times (1 - 0.622) \times 0.143 \times (-0.3) = -0.010$$

Hidden layer unit X3:

$$Error_3 = O_3 (1 - O_3) \, Error_5 \, W_{35} = 0.549 \times (1 - 0.549) \times 0.143 \times 0.2 = 0.007$$

2. Update the weights using the formula below.

Learning rate α = 0.8

$$\Delta W_{ij} = \alpha \times Error_j \times O_i, \qquad W_{ij} = W_{ij} + \Delta W_{ij}$$

The updated weights and biases are shown in Table 12 and Table 13 (values truncated to three decimal places).
Table 12: Weight Update

| Wij | Wij = Wij + α × Error j × Oi | New weight |
|-----|------------------------------|------------|
| W13 | W13 + 0.8 × Error3 × O1 = -0.2 + 0.8 × 0.007 × 1 | -0.194 |
| W14 | W14 + 0.8 × Error4 × O1 = 0.4 + 0.8 × (-0.010) × 1 | 0.392 |
| W23 | W23 + 0.8 × Error3 × O2 = 0.2 + 0.8 × 0.007 × 0 | 0.2 |
| W24 | W24 + 0.8 × Error4 × O2 = -0.3 + 0.8 × (-0.010) × 0 | -0.3 |
| W35 | W35 + 0.8 × Error5 × O3 = 0.2 + 0.8 × 0.143 × 0.549 | 0.262 |
| W45 | W45 + 0.8 × Error5 × O4 = -0.3 + 0.8 × 0.143 × 0.622 | -0.228 |

Update the biases using the formula below.

$$\Delta\theta_j = \alpha \times Error_j, \qquad \theta_j = \theta_j + \Delta\theta_j$$

Table 13: Bias Update

| θj | θj = θj + α × Error j | New bias |
|----|------------------------|----------|
| θ3 | θ3 + α × Error3 = 0.4 + 0.8 × 0.007 | 0.405 |
| θ4 | θ4 + α × Error4 = 0.1 + 0.8 × (-0.010) | 0.092 |
| θ5 | θ5 + α × Error5 = -0.3 + 0.8 × 0.143 | -0.185 |

Iteration 2
Now with the updated weights and biases,
1. Calculate Input and Output in the Input Layer as shown in Table 14.
Table 14: Net Input and Output Calculation

| Input layer | Ij | Oj |
|-------------|----|----|
| X1 | 1 | 1 |
| X2 | 0 | 0 |

2. Calculate Net Input and Output in the Hidden Layer and Output Layer as shown in Table 15.
Table 15: Net Input and Output Calculation in the Hidden Layer and Output Layer

Unit X3:

$$\begin{aligned} I_3 &= X_1 W_{13} + X_2 W_{23} + X_0 \theta_3 = 1(-0.194) + 0(0.2) + 1(0.405) = 0.211 \\ O_3 &= \frac{1}{1 + e^{-I_3}} = \frac{1}{1 + e^{-0.211}} = 0.552 \end{aligned}$$

Unit X4:

$$\begin{aligned} I_4 &= X_1 W_{14} + X_2 W_{24} + X_0 \theta_4 = 1(0.392) + 0(-0.3) + 1(0.092) = 0.484 \\ O_4 &= \frac{1}{1 + e^{-I_4}} = \frac{1}{1 + e^{-0.484}} = 0.618 \end{aligned}$$

Unit X5:

$$\begin{aligned} I_5 &= O_3 W_{35} + O_4 W_{45} + X_0 \theta_5 = 0.552(0.262) + 0.618(-0.228) + 1(-0.185) = -0.181 \\ O_5 &= \frac{1}{1 + e^{-I_5}} = \frac{1}{1 + e^{0.181}} = 0.454 \end{aligned}$$

The output of the network at node 5 is now 0.454.

Error = 1 - 0.454 = 0.546

Comparing the error of the previous iteration with that of the current iteration shows that the
network has learnt: the error is reduced by 0.593 – 0.546 = 0.047.
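
The forward pass, error terms and updates of this example can be checked with a small Python sketch of a 2-2-1 network. The initial values are those of Table 8; the list layout and the function name `backprop_step` are illustrative choices. The script keeps full floating-point precision, so its numbers need not match hand-calculated values to the last decimal.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def backprop_step(x, target, w_hidden, b_hidden, w_out, b_out, lr):
    """One backpropagation step; updates all lists in place.

    w_hidden[j] holds the weights from the inputs to hidden unit j,
    w_out[j] is the weight from hidden unit j to the output unit.
    Returns the error (target minus output) before the update.
    """
    # Forward propagation.
    hidden = [sigmoid(sum(xi * w for xi, w in zip(x, w_j)) + b_j)
              for w_j, b_j in zip(w_hidden, b_hidden)]
    out = sigmoid(sum(h * w for h, w in zip(hidden, w_out)) + b_out[0])

    # Error terms; the hidden errors use the weights from before the update.
    err_out = out * (1 - out) * (target - out)
    err_hidden = [h * (1 - h) * err_out * w for h, w in zip(hidden, w_out)]

    # Updates: W_ij += lr * Error_j * O_i and theta_j += lr * Error_j.
    for j, h in enumerate(hidden):
        w_out[j] += lr * err_out * h
    b_out[0] += lr * err_out
    for j, e in enumerate(err_hidden):
        for i, xi in enumerate(x):
            w_hidden[j][i] += lr * e * xi
        b_hidden[j] += lr * e
    return target - out


# Table 8: [W13, W23] feed X3 and [W14, W24] feed X4.
w_hidden = [[-0.2, 0.2], [0.4, -0.3]]
b_hidden = [0.4, 0.1]      # theta3, theta4
w_out = [0.2, -0.3]        # W35, W45
b_out = [-0.3]             # theta5

errors = [backprop_step([1, 0], 1, w_hidden, b_hidden, w_out, b_out, lr=0.8)
          for _ in range(2)]
print([round(e, 3) for e in errors])
```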

Consider a network architecture with 4 input units and 2 output units, and four training
samples, each a vector of length 4.
Training samples
X1: (1, 1, 1, 0)
X2: (0, 0, 1, 1)
X3: (1, 0, 0, 1)
X4: (0, 0, 1, 0)
Output Units: Unit 1, Unit 2
Learning rate η(t) = 0.6
Initial Weight matrix

$$\begin{bmatrix} \text{Unit 1} \\ \text{Unit 2} \end{bmatrix} : \begin{bmatrix} 0.2 & 0.8 & 0.5 & 0.1 \\ 0.3 & 0.5 & 0.4 & 0.6 \end{bmatrix}$$

Identify an algorithm that learns without supervision. How are the samples clustered as
expected?
Solution:
Use Self Organizing Feature Map (SOFM)

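Iteration 1:
Training Sample X1: (1, 1, 1, 0)
Weight matrix

$$\begin{bmatrix} \text{Unit 1} \\ \text{Unit 2} \end{bmatrix} : \begin{bmatrix} 0.2 & 0.8 & 0.5 & 0.1 \\ 0.3 & 0.5 & 0.4 & 0.6 \end{bmatrix}$$

Compute the squared Euclidean distance between X1: (1, 1, 1, 0) and the Unit 1 weights.

$$d^2 = (0.2 - 1)^2 + (0.8 - 1)^2 + (0.5 - 1)^2 + (0.1 - 0)^2 = 0.94$$

Compute the squared Euclidean distance between X1: (1, 1, 1, 0) and the Unit 2 weights.

$$d^2 = (0.3 - 1)^2 + (0.5 - 1)^2 + (0.4 - 1)^2 + (0.6 - 0)^2 = 1.46$$

Unit 1 wins
Update the weights of the winning unit

$$\begin{aligned} \text{New Unit 1 weights} &= [0.2 \;\; 0.8 \;\; 0.5 \;\; 0.1] + 0.6\,([1 \;\; 1 \;\; 1 \;\; 0] - [0.2 \;\; 0.8 \;\; 0.5 \;\; 0.1]) \\ &= [0.2 \;\; 0.8 \;\; 0.5 \;\; 0.1] + 0.6\,[0.8 \;\; 0.2 \;\; 0.5 \;\; {-0.1}] \\ &= [0.2 \;\; 0.8 \;\; 0.5 \;\; 0.1] + [0.48 \;\; 0.12 \;\; 0.30 \;\; {-0.06}] \\ &= [0.68 \;\; 0.92 \;\; 0.80 \;\; 0.04] \end{aligned}$$

$$\begin{bmatrix} \text{Unit 1} \\ \text{Unit 2} \end{bmatrix} : \begin{bmatrix} 0.68 & 0.92 & 0.80 & 0.04 \\ 0.3 & 0.5 & 0.4 & 0.6 \end{bmatrix}$$

Iteration 2:
Training Sample X2: (0, 0, 1, 1)
Weight matrix

$$\begin{bmatrix} \text{Unit 1} \\ \text{Unit 2} \end{bmatrix} : \begin{bmatrix} 0.68 & 0.92 & 0.80 & 0.04 \\ 0.3 & 0.5 & 0.4 & 0.6 \end{bmatrix}$$

Compute the squared Euclidean distance between X2: (0, 0, 1, 1) and the Unit 1 weights.

$$d^2 = (0.68 - 0)^2 + (0.92 - 0)^2 + (0.80 - 1)^2 + (0.04 - 1)^2 = 2.2704$$

Compute the squared Euclidean distance between X2: (0, 0, 1, 1) and the Unit 2 weights.

$$d^2 = (0.3 - 0)^2 + (0.5 - 0)^2 + (0.4 - 1)^2 + (0.6 - 1)^2 = 0.86$$

Unit 2 wins
Update the weights of the winning unit

$$\begin{aligned} \text{New Unit 2 weights} &= [0.3 \;\; 0.5 \;\; 0.4 \;\; 0.6] + 0.6\,([0 \;\; 0 \;\; 1 \;\; 1] - [0.3 \;\; 0.5 \;\; 0.4 \;\; 0.6]) \\ &= [0.3 \;\; 0.5 \;\; 0.4 \;\; 0.6] + 0.6\,[{-0.3} \;\; {-0.5} \;\; 0.6 \;\; 0.4] \\ &= [0.3 \;\; 0.5 \;\; 0.4 \;\; 0.6] + [{-0.18} \;\; {-0.30} \;\; 0.36 \;\; 0.24] \\ &= [0.12 \;\; 0.20 \;\; 0.76 \;\; 0.84] \end{aligned}$$

$$\begin{bmatrix} \text{Unit 1} \\ \text{Unit 2} \end{bmatrix} : \begin{bmatrix} 0.68 & 0.92 & 0.80 & 0.04 \\ 0.12 & 0.20 & 0.76 & 0.84 \end{bmatrix}$$

Iteration 3:
Training Sample X3: (1, 0, 0, 1)
Weight matrix

$$\begin{bmatrix} \text{Unit 1} \\ \text{Unit 2} \end{bmatrix} : \begin{bmatrix} 0.68 & 0.92 & 0.80 & 0.04 \\ 0.12 & 0.20 & 0.76 & 0.84 \end{bmatrix}$$

Compute the squared Euclidean distance between X3: (1, 0, 0, 1) and the Unit 1 weights.

$$d^2 = (0.68 - 1)^2 + (0.92 - 0)^2 + (0.80 - 0)^2 + (0.04 - 1)^2 = 2.5104$$

Compute the squared Euclidean distance between X3: (1, 0, 0, 1) and the Unit 2 weights.

$$d^2 = (0.12 - 1)^2 + (0.20 - 0)^2 + (0.76 - 0)^2 + (0.84 - 1)^2 = 1.4176$$

Unit 2 wins
Update the weights of the winning unit

$$\begin{aligned} \text{New Unit 2 weights} &= [0.12 \;\; 0.20 \;\; 0.76 \;\; 0.84] + 0.6\,([1 \;\; 0 \;\; 0 \;\; 1] - [0.12 \;\; 0.20 \;\; 0.76 \;\; 0.84]) \\ &= [0.12 \;\; 0.20 \;\; 0.76 \;\; 0.84] + 0.6\,[0.88 \;\; {-0.20} \;\; {-0.76} \;\; 0.16] \\ &= [0.12 \;\; 0.20 \;\; 0.76 \;\; 0.84] + [0.528 \;\; {-0.12} \;\; {-0.456} \;\; 0.096] \\ &= [0.648 \;\; 0.08 \;\; 0.304 \;\; 0.936] \end{aligned}$$

$$\begin{bmatrix} \text{Unit 1} \\ \text{Unit 2} \end{bmatrix} : \begin{bmatrix} 0.68 & 0.92 & 0.80 & 0.04 \\ 0.648 & 0.08 & 0.304 & 0.936 \end{bmatrix}$$

Iteration 4:
Training Sample X4: (0, 0, 1, 0)
Weight matrix

$$\begin{bmatrix} \text{Unit 1} \\ \text{Unit 2} \end{bmatrix} : \begin{bmatrix} 0.68 & 0.92 & 0.80 & 0.04 \\ 0.648 & 0.08 & 0.304 & 0.936 \end{bmatrix}$$

Compute the squared Euclidean distance between X4: (0, 0, 1, 0) and the Unit 1 weights.

$$d^2 = (0.68 - 0)^2 + (0.92 - 0)^2 + (0.80 - 1)^2 + (0.04 - 0)^2 = 1.3504$$

Compute the squared Euclidean distance between X4: (0, 0, 1, 0) and the Unit 2 weights.

$$d^2 = (0.648 - 0)^2 + (0.08 - 0)^2 + (0.304 - 1)^2 + (0.936 - 0)^2 = 1.7868$$

Unit 1 wins
Update the weights of the winning unit

$$\begin{aligned} \text{New Unit 1 weights} &= [0.68 \;\; 0.92 \;\; 0.80 \;\; 0.04] + 0.6\,([0 \;\; 0 \;\; 1 \;\; 0] - [0.68 \;\; 0.92 \;\; 0.80 \;\; 0.04]) \\ &= [0.68 \;\; 0.92 \;\; 0.80 \;\; 0.04] + 0.6\,[{-0.68} \;\; {-0.92} \;\; 0.20 \;\; {-0.04}] \\ &= [0.68 \;\; 0.92 \;\; 0.80 \;\; 0.04] + [{-0.408} \;\; {-0.552} \;\; 0.12 \;\; {-0.024}] \\ &= [0.272 \;\; 0.368 \;\; 0.92 \;\; 0.016] \end{aligned}$$

$$\begin{bmatrix} \text{Unit 1} \\ \text{Unit 2} \end{bmatrix} : \begin{bmatrix} 0.272 & 0.368 & 0.92 & 0.016 \\ 0.648 & 0.08 & 0.304 & 0.936 \end{bmatrix}$$

The best mapping unit for each of the samples is:

X1: (1, 1, 1, 0) → Unit 1
X2: (0, 0, 1, 1) → Unit 2
X3: (1, 0, 0, 1) → Unit 2
X4: (0, 0, 1, 0) → Unit 1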

This process is continued for many epochs until the feature map doesn’t change.
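
One pass over the four training samples can be sketched in Python as follows, starting from the initial weight matrix given in the problem. As in the worked solution, only the winning unit is updated (no neighbourhood function is applied), which is a simplifying assumption; the function names are illustrative. The script keeps full precision, so individual entries may differ from hand-rounded values.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def train_epoch(samples, weights, lr):
    """Present each sample once; move only the winning unit toward it.

    Updates the weights list in place and returns the index of the
    winning unit for every sample.
    """
    winners = []
    for x in samples:
        distances = [squared_distance(x, w) for w in weights]
        win = distances.index(min(distances))
        # Winner update: w_new = w_old + lr * (x - w_old).
        weights[win] = [wi + lr * (xi - wi) for xi, wi in zip(x, weights[win])]
        winners.append(win)
    return winners


samples = [(1, 1, 1, 0), (0, 0, 1, 1), (1, 0, 0, 1), (0, 0, 1, 0)]
weights = [[0.2, 0.8, 0.5, 0.1],   # Unit 1
           [0.3, 0.5, 0.4, 0.6]]   # Unit 2

winners = train_epoch(samples, weights, lr=0.6)
print([w + 1 for w in winners])                      # winning unit per sample
print([[round(v, 3) for v in row] for row in weights])
```

Calling `train_epoch` repeatedly corresponds to running further epochs until the mapping of samples to units stops changing.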
Learning Rules
Learning in NN is performed by adjusting the network weights in order to minimize the
difference between the desired and estimated output.

Delta Learning Rule and Gradient Descent


🞂 Developed by Widrow and Hoff, the delta rule is one of the most common learning rules.
🞂 It is a supervised learning rule.
🞂 The delta rule is derived from the gradient descent method; backpropagation extends it to multilayer networks.
🞂 It can be applied even when the training data are not linearly separable. It is also called the continuous perceptron learning rule.
🞂 It updates the connection weights using the difference between the target and the output
value. It is the least mean square (LMS) learning algorithm.
🞂 The delta difference is measured by an error function, also called a cost function.
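
The delta rule described in the points above can be sketched for a single linear unit: each weight moves by the learning rate times the difference between target and output times the corresponding input. The training data below are hypothetical (targets generated by t = 2·x1 − x2), and the function name `delta_rule_epoch` is illustrative.

```python
def delta_rule_epoch(samples, weights, bias, lr):
    """One pass of the Widrow-Hoff (LMS) update for a linear unit.

    Updates weights in place; returns the new bias and the mean
    squared error observed during the pass.
    """
    total_sq_error = 0.0
    for x, target in samples:
        output = sum(xi * wi for xi, wi in zip(x, weights)) + bias
        error = target - output            # the "delta"
        for i, xi in enumerate(x):
            weights[i] += lr * error * xi  # gradient descent step on squared error
        bias += lr * error
        total_sq_error += error ** 2
    return bias, total_sq_error / len(samples)


# Hypothetical data: targets follow t = 2*x1 - x2.
samples = [([0, 0], 0), ([0, 1], -1), ([1, 0], 2), ([1, 1], 1)]
weights, bias = [0.0, 0.0], 0.0

history = []
for _ in range(200):
    bias, mse = delta_rule_epoch(samples, weights, bias, lr=0.1)
    history.append(mse)

print(round(history[0], 4), round(history[-1], 8))
print([round(w, 3) for w in weights], round(bias, 3))
```

The mean squared error plays the role of the cost function mentioned above, and it shrinks from one pass to the next as the weights approach the values that generated the targets.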

TYPES OF ANN
1. Feed Forward Neural Network
2. Fully connected Neural Network
3. Multilayer Perceptron
4. Feedback Neural Network
Feed Forward Neural Network:
The simplest feed-forward neural network is a single-layer perceptron. In this model, a sequence of inputs
enters the layer and is multiplied by the weights. The weighted input values are then summed to form a total.
If the sum exceeds a predetermined threshold, which is normally set at zero, the output
value is usually 1; if the sum is less than the threshold, the output value is usually -1.
The single-layer perceptron is a popular feed-forward neural network model that is frequently used for
classification.
The model may or may not contain hidden layers, and information flows only in the forward direction, with no feedback connections.
Based on the number of hidden layers, feed-forward networks are further classified into single-layered and
multilayered feed-forward networks.


Fully connected Neural Network:

• A fully connected neural network consists of a series of fully connected layers that connect
every neuron in one layer to every neuron in the next layer.

• The major advantage of fully connected networks is that they are “structure agnostic”, i.e. no
special assumptions need to be made about the input.
Multilayer Perceptron:
A multi-layer perceptron has one input layer with one neuron (or node) for each input, one
output layer with a single node for each output, and any number of hidden layers, each of
which can have any number of nodes.
Information flows in both directions: inputs are propagated forward, and errors are propagated backward during training.
The weights are adjusted during training via backpropagation.
Every node in the multi-layer perceptron uses a sigmoid activation function. The sigmoid activation
function takes real values as input and converts them to numbers between 0 and 1 using the sigmoid
formula.

Feedback Neural Network:

Feedback networks, also known as recurrent neural networks or interactive neural networks, are
deep learning models in which information can also flow in the backward direction.
They allow feedback loops in the network. Feedback networks are dynamic in nature and powerful, and
they can become very complicated at some stage of execution.
Neuronal connections can be made in any way.
RNNs can process input sequences of different lengths by using their internal state, which can
represent a form of memory.
They can therefore be used for applications such as speech recognition or handwriting recognition.
Advantages and Disadvantages of ANN

Limitations of ANN

Challenges of Artificial Neural Networks
