
ANN Unit 3

A single layer perceptron (SLP) is the simplest type of artificial neural network and can only classify linearly separable data with a binary target. An SLP has a single layer of nodes fully connected to the input layer, with weights initially assigned randomly. It sums the weighted inputs and, if the sum exceeds a threshold, the output is activated. If the output does not match the desired output, the weights are adjusted using a learning rule in order to minimize errors. This process continues until the data is classified correctly; if the data is not linearly separable, the process will not converge.


Single Layered Perceptron

Linearly separable data

A single layer perceptron (SLP) is a feed-forward network based on a threshold transfer function. The SLP is the simplest type of artificial neural network and can only classify linearly separable cases with a binary target (1, 0).

 In a single layer perceptron, the weights to each input node are assigned randomly, since there is no a priori knowledge associated with the nodes.
 A threshold value is also assigned randomly.
 The SLP sums all the weighted inputs and, if the sum is above the threshold, the network is activated.
 If the calculated output matches the desired output, the model is successful.
 If it does not, then, since no back-propagation technique is involved, the error must be calculated using the formula below and the weights adjusted again.

Perceptron Weight Adjustment

Below is the equation for the perceptron weight adjustment:

Δw = η * d * x

Where,

d: Desired Output - Predicted Output (the error)

η: Learning Rate, usually less than 1.

x: Input Data.
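
As a minimal sketch of this update rule in Python (the hard-limit activation at threshold 0, the learning rate of 0.1 and the variable names are illustrative assumptions, not part of the original formulation):

    import numpy as np

    def predict(w, x, threshold=0.0):
        # Hard-limit activation: output 1 if the weighted sum exceeds the threshold
        return 1 if np.dot(w, x) > threshold else 0

    def update_weights(w, x, target, eta=0.1):
        # Perceptron weight adjustment: delta_w = eta * (desired - predicted) * x
        error = target - predict(w, x)
        return w + eta * error * x

    # One update step on a single training sample
    w = np.random.rand(3)            # weights assigned randomly initially
    x = np.array([1.0, 0.5, -0.2])   # an example input vector
    w = update_weights(w, x, target=1)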

Since this network model performs only linear classification, it will not produce proper results if the data is not linearly separable.

Perceptron Learning Rule (learnp)

Perceptrons are trained on examples of desired behavior. The desired behavior can be summarized by a set of input-target pairs

{p1, t1}, {p2, t2}, {p3, t3}, …, {pn, tn}


where p is an input to the network and t is the corresponding correct (target) output. The objective is to reduce the error e = t - a, the difference between the target vector t and the neuron response a. The perceptron learning rule learnp calculates the desired changes to the perceptron's weights and biases given an input vector p and the associated error e. The target vector t must contain values of either 0 or 1, as perceptrons (with hardlim transfer functions) can only output such values.

Each time learnp is executed, the perceptron has a better chance of producing the correct outputs. The
perceptron rule is proven to converge on a solution in a finite number of iterations if a solution exists.

There are three conditions that can occur for a single neuron.

 p -> input vector
 a -> network response (actual output)
 t -> desired (target) output

CASE 1: If an input vector is presented and the output of the neuron is correct (a = t, and e = t - a = 0),
then the weight vector w is not altered.

CASE 2: If the neuron output is 0 and should have been 1 (a = 0 and t = 1, and e = t - a = 1), the input
vector p is added to the weight vector w. This makes the weight vector point closer to the input vector,
increasing the chance that the input vector will be classified as a 1 in the future.

CASE 3: If the neuron output is 1 and should have been 0 (a = 1 and t = 0, and e = t - a = -1), the input
vector p is subtracted from the weight vector w. This makes the weight vector point farther away from
the input vector, increasing the chance that the input vector is classified as a 0 in the future.
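
All three cases can be summarized in a single expression using the error e = t - a:

Δw = (t - a) * p = e * p

Δb = (t - a) = e

When e = 0 (Case 1) nothing changes; when e = 1 (Case 2) p is added to the weight vector; and when e = -1 (Case 3) p is subtracted from it. (The bias update Δb follows the same rule, treating the bias as a weight on a constant input of 1.)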
Pattern Classifier

Pattern recognition/classification is the process of recognizing/classifying patterns of data based on knowledge already gained or on statistical information extracted from the patterns and/or their representation. One of the important aspects of pattern recognition is its application potential.

Pattern recognition possesses the following features:

1. A pattern recognition system should recognize familiar patterns quickly and accurately.
2. Recognize and classify unfamiliar objects.
3. Accurately recognize shapes and objects from different angles.
4. Identify patterns and objects even when partly hidden.
5. Recognize patterns quickly, with ease and automaticity.

Applications:

Image processing, segmentation and analysis

Pattern recognition is used to give human recognition intelligence to machines, which is required in image processing.

Computer vision

Pattern recognition is used to extract meaningful features from given image/video samples and is used
in computer vision for various applications like biological and biomedical imaging.

Seismic analysis

A pattern recognition approach is used for the discovery, imaging and interpretation of temporal patterns in seismic array recordings. Statistical pattern recognition is implemented and used in different types of seismic analysis models.

Radar signal classification/analysis

Pattern recognition and signal processing methods are used in various applications of radar signal classification, such as AP mine detection and identification.

Speech recognition

The greatest success in speech recognition has been obtained using pattern recognition paradigms. It is used in various speech recognition algorithms which try to avoid the problems of a phoneme-level description and treat larger units, such as words, as patterns.

Fingerprint identification

The fingerprint recognition technique is a dominant technology in the biometric market. A number of recognition methods have been used to perform fingerprint matching, among which pattern recognition approaches are widely used.

In a typical pattern recognition application, the raw data is processed and converted into a form that is amenable for a machine to use. Pattern recognition involves classification and clustering of patterns.

 In classification, an appropriate class label is assigned to a pattern based on an abstraction that is generated using a set of training patterns or domain knowledge. Classification is used in supervised learning.
 Clustering generates a partition of the data, which helps in the decision-making activity of interest. Clustering is used in unsupervised learning.
 Regression algorithms try to find a relationship between variables and predict unknown dependent variables based on known data. Regression is based on supervised learning.

Features may be represented as continuous, discrete or discrete binary variables. A feature is a function of one or more measurements, computed so that it quantifies some significant characteristic of the object.

Example: considering a face, the eyes, ears, nose, etc. are features of the face.

A set of features taken together forms the feature vector.

Example: in the face example above, if all the features (eyes, ears, nose, etc.) are taken together, the sequence is the feature vector ([eyes, ears, nose]). A feature vector is a sequence of features represented as an n-dimensional column vector. In the case of speech, MFCCs (Mel-frequency cepstral coefficients) are the spectral features of the speech.
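
As a small illustration of this representation (the measurement values below are made up purely for demonstration), a feature vector can be written as a column vector in code:

    import numpy as np

    # Hypothetical face features: [eyes, ears, nose], with made-up measurements
    feature_vector = np.array([[2.1],    # eye spacing
                               [5.4],    # ear length
                               [3.3]])   # nose width
    print(feature_vector.shape)          # (3, 1): a 3-dimensional column vector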

Training and Learning in Pattern Recognition

Learning is a phenomenon through which a system gets trained and becomes adaptable enough to give accurate results. Learning is the most important phase, as how well the system performs on the data provided depends on which algorithms are used on the data. The entire dataset is divided into two categories: one used to train the model, i.e. the training set, and the other used to test the model after training, i.e. the testing set.

Training set:

The training set is used to build the model. The training rules and algorithms used give relevant information on how to associate input data with output decisions. The system is trained by applying these algorithms to the dataset; all the relevant information is extracted from the data and results are obtained. Generally, 80% of the dataset is taken as training data.
Testing set:

Testing data is used to test the system, i.e. to verify whether the system produces the correct output after being trained. Generally, 20% of the dataset is used for testing. Example: if a system that identifies which category a particular flower belongs to classifies seven out of ten flowers correctly and the rest wrongly, then the accuracy is 70%.
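
A minimal sketch of the 80/20 split and the accuracy computation described above (plain Python, assuming the dataset is already shuffled):

    # 80/20 split of a dataset into training and testing sets
    data = list(range(100))             # stand-in for 100 labelled samples
    split = int(0.8 * len(data))        # 80% of the data for training
    train_set, test_set = data[:split], data[split:]

    # Accuracy: correct predictions / total predictions
    correct, total = 7, 10              # e.g. 7 of 10 flowers identified correctly
    accuracy = correct / total          # 0.7, i.e. 70%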

Approaches for Pattern Classification

Statistical approach: Historical data is collected, and new patterns are recognized based on observations and analyses of those data.

Syntactical approach: It is also known as the structural approach, as it mainly relies upon sub-patterns, called primitives, such as words.

Neural approach: Here, a neural network is used to analyse the patterns. The advantages of neural networks are their adaptive-learning, self-organization, and fault-tolerance capabilities. Because of these capabilities, neural networks are widely used for pattern recognition applications. An ANN initially goes through a training phase in which it learns to recognize patterns in data, whether visually, aurally, or textually. Some of the best-known neural models are back-propagation, higher-order nets, time-delay neural networks, and recurrent nets.

Bayesian Classification

Marginal Probability: The probability of an event irrespective of the outcomes of other random
variables, e.g. P(A).

If the random variable is independent of other variables, then the marginal probability is the probability of the event directly; otherwise, if the variable depends on other variables, the marginal probability is the probability of the event summed over all outcomes of the dependent variables. This is called the sum rule.

Joint Probability: The probability of two (or more) simultaneous events, often described in terms of events A and B from two dependent random variables, e.g. X and Y, and written P(A and B) or P(A, B).

Conditional Probability: The probability of one event given the occurrence of another event, often described in terms of events A and B from two dependent random variables, e.g. X and Y, and written P(A given B) or P(A | B).
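
In symbols, the three quantities are related as follows (via the sum rule and the product rule):

P(A) = Σb P(A, B = b)   (marginal probability, sum rule)

P(A, B) = P(A | B) * P(B)   (joint probability, product rule)

P(A | B) = P(A, B) / P(B)   (conditional probability)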
Principle of Naive Bayes Classifier:

A Naive Bayes classifier is a probabilistic machine learning model that is used for classification tasks. The crux of the classifier is based on the Bayes theorem. As an example of dependent events:

Event A -> pick a black card from a full pack: P(A) = 26/52

Event B -> pick another black card from the same pack, given that A has occurred: P(B | A) = 25/51

Bayes Theorem:

Using Bayes theorem, we can find the probability of A happening given that B has occurred. Here, B is the evidence and A is the hypothesis. The assumption made here is that the predictors/features are independent, i.e. the presence of one particular feature does not affect another. Hence it is called naive.
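
In standard notation, Bayes theorem states:

P(A | B) = [P(B | A) * P(A)] / P(B)

where P(A | B) is the posterior, P(B | A) the likelihood, P(A) the prior and P(B) the evidence.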

Example:

The dataset is divided into two parts, namely the feature matrix and the response vector.
 The feature matrix contains all the vectors (rows) of the dataset, in which each vector consists of the values of the independent features. In the weather dataset used here, the features are ‘Outlook’, ‘Temperature’, ‘Humidity’ and ‘Windy’.
 The response vector contains the value of the class variable (prediction or output) for each row of the feature matrix. Here, the class variable name is ‘Play golf’.

1. It is to be classified whether the day is suitable for playing golf, given the features of the day.
2. The columns represent these features and the rows represent individual entries.
3. Taking the first row of the dataset, we can observe that the day is not suitable for playing golf if the outlook is rainy, the temperature is hot, the humidity is high and it is not windy.
4. We make two assumptions here:
a. The predictors are independent. That is, if the temperature is hot, it does not necessarily mean that the humidity is high.
b. All the predictors have an equal effect on the outcome. That is, the day being windy does not have more importance in deciding whether to play golf or not.

According to this example, Bayes theorem can be rewritten as:
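
P(y | X) = [P(X | y) * P(y)] / P(X)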

The variable y is the class variable (play golf), which represents whether it is suitable to play golf or not given the conditions. The variable X represents the parameters/features.

X is given as,
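
X = (x1, x2, x3, …, xn)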

Here x1, x2, …, xn represent the features, i.e. they can be mapped to outlook, temperature, humidity and windy. By substituting for X and expanding using the chain rule, we get:
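
P(y | x1, …, xn) = [P(x1 | y) * P(x2 | y) * … * P(xn | y) * P(y)] / [P(x1) * P(x2) * … * P(xn)]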

Now, the values for each term can be obtained by looking at the dataset and substituting them into the equation. For all entries in the dataset, the denominator does not change; it remains constant. Therefore, the denominator can be removed and proportionality introduced:
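
P(y | x1, …, xn) ∝ P(y) * P(x1 | y) * P(x2 | y) * … * P(xn | y)

The class y with the highest value of this product is then chosen as the prediction.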

Before applying the above formula manually on the weather dataset, some precomputations are to be
done on the dataset.
We find P(xi | yj) for each xi in X and yj in y; these likelihoods are obtained by counting occurrences in the dataset. For example, the probability of playing golf given that the temperature is cool is P(temp = cool | play golf = Yes) = 3/9.

We also need the class probabilities P(y). For example, P(play golf = Yes) = 9/14.

So now, we are done with our pre-computations and the classifier is ready!

Let us test it on a new set of features (let us call it today):

today = (Sunny, Hot, Normal, False)

So, probability of playing golf is given by:

P(Yes|today) = P(Sunny Outlook|Yes)P(Hot Temperature|Yes)P(Normal Humidity|Yes)P(No Wind|Yes)P(Yes) / P(today)

and probability to not play golf is given by:

P(No | today) = P(Sunny Outlook|No)P(Hot Temperature|No)P(Normal Humidity|No)P(No Wind|No)P(No) / P(today)

Since P(today) is common in both probabilities, we can ignore P(today) and find the proportional probabilities as:

P(Yes | today) ∝ (2/9) × (2/9) × (6/9) × (6/9) × (9/14) ≈ 0.0141

and

P(No | today) ∝ (3/5) × (2/5) × (1/5) × (2/5) × (5/14) ≈ 0.0068

Now, since the true posteriors must sum to 1, i.e.

P(Yes | today) + P(No | today) = 1

these proportional values can be converted into probabilities by normalization:

P(Yes | today) = 0.0141 / (0.0141 + 0.0068) = 0.67

and

P(No | today) = 0.0068 / (0.0141 + 0.0068) = 0.33

Since

P(Yes | today) > P(No | today)

the prediction is that golf would be played: ‘Yes’.
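
The whole calculation is short enough to check in code; below is a minimal sketch using the likelihoods and priors quoted above (hard-coded here rather than counted from the dataset):

    # Likelihoods and priors taken from the worked example above
    p_yes = (2/9) * (2/9) * (6/9) * (6/9) * (9/14)   # ≈ 0.0141
    p_no  = (3/5) * (2/5) * (1/5) * (2/5) * (5/14)   # ≈ 0.0068

    # Normalize so the two posteriors sum to 1
    total = p_yes + p_no
    print(round(p_yes / total, 2))   # 0.67 -> P(Yes | today)
    print(round(p_no / total, 2))    # 0.33 -> P(No | today)
    # Since P(Yes | today) > P(No | today), the prediction is 'Yes'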


Perceptron as Pattern Classifier

Like Logistic Regression, the Perceptron is a linear classifier used for binary predictions. This means that
in order for it to work, the data must be linearly separable.

The perceptron works by “learning” a series of weights, corresponding to the input features. These input
features are vectors of the available data.

For example, if we were trying to classify whether an animal is a cat or dog,

x1 might be weight, x2 might be height, and x3 might be length.

Each pair of weights and input features is multiplied together, and then the results are summed. If the
summation is above a certain threshold, we predict one class, otherwise the prediction belongs to a
different class.

For example, we could set the threshold at 0. If the summation is greater than 0, the prediction is a 1
(dog), otherwise it’s a 0 (cat).

The final step is to check if our predictions were classified correctly. If they were not, then the weights
are updated using a learning rate. This process continues for a certain number of iterations, known as
“epochs.” The goal is to determine the weights that produce a linear decision boundary that correctly
classifies the predictions.

Step-by-step Example

A good way to understand exactly how the Perceptron works is to walk through a simple example. I’m
going to use a NAND gate model for my example, which has a very small linearly separable dataset.
Given the two features x1 and x2, here’s what the outputs y are for the NAND gate:

x1 x2 y
0 0 1
0 1 1
1 0 1
1 1 0
Breaking the features and output into column vectors gives x1 = [0, 0, 1, 1]^T, x2 = [0, 1, 0, 1]^T and y = [1, 1, 1, 0]^T.
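
A minimal sketch of the full training loop on this NAND data (the learning rate, epoch count and zero-initialized weights are arbitrary choices here; the bias is folded in as a constant input of 1):

    import numpy as np

    # NAND truth table: columns x1, x2; targets y
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([1, 1, 1, 0])

    # Fold the bias in as a constant first input of 1
    Xb = np.hstack([np.ones((4, 1)), X])
    w = np.zeros(3)                            # weights (bias, w1, w2)
    eta = 0.1                                  # learning rate

    for epoch in range(20):                    # fixed number of epochs
        for xi, target in zip(Xb, y):
            a = 1 if np.dot(w, xi) > 0 else 0  # hard-limit activation
            w += eta * (target - a) * xi       # perceptron learning rule

    # All four NAND patterns are now classified correctly
    print([1 if np.dot(w, xi) > 0 else 0 for xi in Xb])   # [1, 1, 1, 0]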

Limitations of Perceptron

 The output values of a perceptron can take on only one of two values (0 or 1) due to the hard-limit transfer function.
 Perceptrons can only classify linearly separable sets of vectors. If a straight line or a plane can be drawn to separate the input vectors into their correct categories, the input vectors are linearly separable. If the vectors are not linearly separable, learning will never reach a point where all vectors are classified properly. Note, however, that it has been proven that if the vectors are linearly separable, perceptrons trained adaptively will always find a solution in finite time.
