Notes On Module 3 - Pattern Recognition
PECCS702B
WorkBook
Semester - 7
Prof. Bavrabi Ghosh
Prototype methods seek a minimal subset of samples that can serve as a distillation or
condensed view of a data set. As the size of modern data sets grows, being able to present a
domain specialist with a short list of “representative” samples chosen from the data set is of
increasing interpretative value.
Model prototyping is the phase in the pattern recognition model development lifecycle where
data scientists iterate towards the best-performing models through data loading, cleansing,
preparation, feature engineering, model training, tuning and scoring, so that the resulting
model can be used in a production environment to meet a business need. On the data side, this
experimental and iterative phase is where data scientists gather all the domain knowledge
from SMEs, explore the univariate data distributions and relationships between features
and possible target labels, and establish relationships among multiple features. On the
model side, data scientists explore different modelling options based upon the identified
business use case, as well as requirements for interpretability and metrics for evaluating the
performance of the models.
The various decisions made during model prototyping contribute to the end performance of
AI applications. Further optimizing and automating the model prototyping experience for
rapid iteration enables data scientists to become efficient in terms of time taken,
infrastructural resources used and the number of experiments required, thereby accelerating
the entire AI application development lifecycle.
The Naïve Bayes algorithm is a supervised learning algorithm based on Bayes' theorem and
used for solving classification problems.
It is mainly used in text classification with high-dimensional training datasets.
The Naïve Bayes Classifier is one of the simplest and most effective classification
algorithms, helping to build fast machine learning models that can make quick predictions.
It is a probabilistic classifier, which means it predicts on the basis of the probability of an
object.
Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment
analysis, and classifying articles.
The name Naïve Bayes is made up of two words, Naïve and Bayes, which can be described as:
Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is
independent of the occurrence of other features. For example, if a fruit is identified on the
basis of colour, shape, and taste, then a red, spherical, and sweet fruit is recognized as an
apple. Each feature individually contributes to identifying it as an apple, without depending
on the others.
Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
Bayes theorem
Bayes' theorem is also known as Bayes' Rule or Bayes' law. It is used to determine the
probability of a hypothesis with prior knowledge, and it depends on conditional probability.
The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) * P(A) / P(B)

Where,
P(A|B) is Posterior probability: probability of hypothesis A given the observed evidence B.
P(B|A) is Likelihood probability: probability of the evidence given that the hypothesis is true.
P(A) is Prior probability: probability of the hypothesis before observing the evidence.
P(B) is Marginal probability: probability of the evidence.
Working of Naïve Bayes' Classifier can be understood with the help of the below example:
Suppose we have a dataset of weather conditions and a corresponding target variable "Play".
Using this dataset, we need to decide whether or not to play on a particular day according to
the weather conditions. To solve this problem, we follow the steps below:
1. Convert the given dataset into frequency tables.
2. Generate Likelihood table by finding the probabilities of given features.
3. Now, use Bayes theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the player play or not?
Day   Outlook    Play
0     Rainy      Yes
1     Sunny      Yes
2     Overcast   Yes
3     Overcast   Yes
4     Sunny      No
5     Rainy      Yes
6     Sunny      Yes
7     Overcast   Yes
8     Rainy      No
9     Sunny      No
10    Sunny      Yes
11    Rainy      No
12    Overcast   Yes
13    Overcast   Yes
Frequency table for the Outlook feature:

Weather    Yes   No
Overcast   5     0
Rainy      2     2
Sunny      3     2
Total      10    4

Likelihood table for the Outlook feature:

Weather    No            Yes            P(Weather)
Overcast   0             5              5/14 = 0.35
Rainy      2             2              4/14 = 0.29
Sunny      2             3              5/14 = 0.35
All        4/14 = 0.29   10/14 = 0.71
Applying Bayes' theorem:

P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.30
P(Sunny) = 0.35
P(Yes) = 0.71
So P(Yes|Sunny) = 0.30 * 0.71 / 0.35 = 0.60

P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.50
P(No) = 0.29
P(Sunny) = 0.35
So P(No|Sunny) = 0.50 * 0.29 / 0.35 = 0.41

Since P(Yes|Sunny) > P(No|Sunny), the prediction for a sunny day is that the player can play
the game.
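The same calculation can be reproduced in a few lines of plain Python. This is a minimal
sketch (no libraries) that recomputes both posteriors directly from the 14-row dataset above:

data = [("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"), ("Overcast", "Yes"),
        ("Sunny", "No"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
        ("Rainy", "No"), ("Sunny", "No"), ("Sunny", "Yes"), ("Rainy", "No"),
        ("Overcast", "Yes"), ("Overcast", "Yes")]

n = len(data)
p_sunny = sum(1 for o, _ in data if o == "Sunny") / n           # evidence P(Sunny)
for label in ("Yes", "No"):
    rows = [o for o, p in data if p == label]
    prior = len(rows) / n                                       # prior P(label)
    likelihood = rows.count("Sunny") / len(rows)                # P(Sunny|label)
    print(label, round(likelihood * prior / p_sunny, 2))        # posterior P(label|Sunny)
# Prints Yes 0.6 and No 0.4 (the worked example gets 0.41 because it rounds
# the intermediate probabilities first). "Yes" wins either way.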
Note - https://www.analyticsvidhya.com/blog/2022/10/frequently-asked-interview-
questions-on-naive-bayes-classifier/
Advantages of Naïve Bayes Classifier:
o Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a
dataset.
o It can be used for binary as well as multi-class classification.
o It performs well in multi-class predictions compared to other algorithms.
o It is the most popular choice for text classification problems.
Disadvantages of Naïve Bayes Classifier:
o Naïve Bayes assumes that all features are independent or unrelated, so it cannot learn the
relationship between features.
There are three types of Naive Bayes Model, which are given below:
o Gaussian: The Gaussian model assumes that features follow a normal distribution.
This means if predictors take continuous values instead of discrete, then the model
assumes that these values are sampled from the Gaussian distribution.
o Multinomial: The Multinomial Naïve Bayes classifier is used when the data is
multinomially distributed. It is primarily used for document classification problems, i.e.,
deciding which category a particular document belongs to, such as Sports, Politics,
Education, etc.
The classifier uses the frequency of words for the predictors.
o Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the
predictor variables are independent Boolean variables, such as whether a particular word is
present in a document or not. This model is also well known for document classification
tasks.
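As a hedged illustration of the three variants, here is a minimal scikit-learn sketch; the tiny
arrays are invented purely for demonstration and are not part of the original notes:

import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])

# Gaussian: continuous features, assumed normally distributed within each class.
X_cont = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 3.9]])
print(GaussianNB().fit(X_cont, y).predict([[1.1, 2.0]]))

# Multinomial: features are word counts per document.
X_counts = np.array([[2, 0, 1], [3, 1, 0], [0, 4, 2], [1, 3, 3]])
print(MultinomialNB().fit(X_counts, y).predict([[2, 0, 0]]))

# Bernoulli: features are binary word presence/absence indicators.
X_bin = (X_counts > 0).astype(int)
print(BernoulliNB().fit(X_bin, y).predict([[1, 0, 0]]))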
Steps to Implement the Naïve Bayes Algorithm (Spam Filtering Example):
Spam e-mail has become a big problem on the internet. Spam wastes time, storage space
and communication bandwidth, and the problem has been growing for years. According to
recent statistics, 40% of all e-mails are spam, which amounts to about 15.4 billion e-mails
per day and costs internet users about $355 million per year. Knowledge engineering and
machine learning are the two general approaches used in e-mail filtering. In the knowledge
engineering approach, a set of rules has to be specified according to which e-mails are
categorized as spam or ham.
The machine learning approach is more efficient than the knowledge engineering approach
because it does not require specifying any rules. Instead, it uses a set of training samples:
pre-classified e-mail messages. A specific algorithm is then used to learn the classification
rules from these messages. The machine learning approach has been widely studied, and
many algorithms can be used in e-mail filtering, including Naive Bayes, support vector
machines, neural networks, K-nearest neighbour, rough sets and artificial immune systems.
Naive Bayes works on dependent events: the probability of an event occurring in the future
can be estimated from previous occurrences of the same event. This technique can be used
to classify spam e-mails, and word probabilities play the main role here. If some words occur
often in spam but rarely in ham, then an incoming e-mail containing them is probably spam.
The Naive Bayes classifier has become a very popular method in e-mail filtering. Every word
has a certain probability of occurring in spam or ham e-mail in the filter's database. If the
combined word probabilities exceed a certain limit, the filter marks the e-mail as one
category or the other. Here, only two categories are necessary: spam or ham.
The statistic we are most interested in for a token T is its spamminess (spam rating),
calculated as follows:

S[T] = CSpam(T) / (CSpam(T) + CHam(T))

where CSpam(T) and CHam(T) are the numbers of spam and ham messages containing token
T, respectively. To classify a message M with tokens {T1, ..., TN}, one needs to combine the
individual tokens' spamminess values to evaluate the overall message spamminess. A simple
way to make classifications is to calculate the product of the individual tokens' spamminess,

S[M] = Π S[Ti]

and compare it with the product of the individual tokens' hamminess,

H[M] = Π (1 - S[Ti])

The message is considered spam if the overall spamminess product S[M] is larger than the
hamminess product H[M].
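A minimal sketch of this scoring scheme in plain Python follows; the token counts in the
counts table are hypothetical training statistics, invented only to make the example run:

from math import prod

def spamminess(c_spam, c_ham):
    # S[T] = CSpam(T) / (CSpam(T) + CHam(T))
    return c_spam / (c_spam + c_ham)

# token -> (number of spam messages containing it, number of ham messages)
counts = {"free": (40, 5), "meeting": (2, 30), "winner": (25, 1)}

def classify(tokens):
    s = [spamminess(*counts[t]) for t in tokens if t in counts]
    s_m = prod(s)                     # overall spamminess  S[M] = product of S[Ti]
    h_m = prod(1 - x for x in s)      # overall hamminess   H[M] = product of (1 - S[Ti])
    return "spam" if s_m > h_m else "ham"

print(classify(["free", "winner"]))   # -> spam
print(classify(["meeting"]))          # -> ham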
The Naïve Bayes classifier operates in two stages:
Training Stage.
Testing Stage.
In the training stage, Naïve Bayes builds a lookup table that stores all the probabilities the
algorithm needs for predicting the result. In the testing stage, given a test point, the
algorithm fetches the stored probabilities from the lookup table and uses them to predict
the result.
Decision Tree is a Supervised learning technique that can be used for both classification and
Regression problems, but mostly it is preferred for solving Classification problems. It is a
tree-structured classifier, where internal nodes represent the features of a dataset,
branches represent the decision rules and each leaf node represents the outcome.
In a Decision tree, there are two nodes, which are the Decision Node and Leaf Node.
Decision nodes are used to make any decision and have multiple branches, whereas Leaf
nodes are the output of those decisions and do not contain any further branches.
The decisions or the test are performed on the basis of features of the given dataset.
It is called a decision tree because, similar to a tree, it starts with the root node, which
expands on further branches and constructs a tree-like structure.
In order to build a tree, we use the CART algorithm, which stands for Classification and
Regression Tree algorithm.
A decision tree simply asks a question and, based on the answer (Yes/No), further splits the
tree into subtrees.
There are various algorithms in Machine learning, so choosing the best algorithm for the
given dataset and problem is the main point to remember while creating a machine learning
model. Below are the two reasons for using the Decision tree:
Decision Trees usually mimic human thinking ability while making a decision, so they are
easy to understand.
The logic behind the decision tree can be easily understood because it shows a tree-like
structure.
In a decision tree, for predicting the class of the given dataset, the algorithm starts from the
root node of the tree. This algorithm compares the values of root attribute with the record
(real dataset) attribute and, based on the comparison, follows the branch and jumps to the
next node.
For the next node, the algorithm again compares the attribute value with the other sub-
nodes and moves further. It continues this process until it reaches a leaf node of the tree.
The complete process can be better understood using the below algorithm:
o Step-1: Begin the tree with the root node, say S, which contains the complete
dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure
(ASM).
o Step-3: Divide S into subsets that contain the possible values of the best attribute.
o Step-4: Generate the decision tree node that contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created
in Step-3. Continue this process until a stage is reached where the nodes cannot be
classified further; such final nodes are called leaf nodes.
Example: Suppose there is a candidate who has a job offer and wants to decide whether he
should accept the offer or not. To solve this problem, the decision tree starts with the
root node (the Salary attribute, selected by ASM). The root node splits further into the next
decision node (Distance from the office) and one leaf node based on the corresponding labels.
The next decision node further splits into one decision node (Cab facility) and one leaf node.
Finally, the decision node splits into two leaf nodes (Accepted offer and Declined offer).
[Diagram: decision tree for the job-offer example, with Salary at the root, followed by
Distance from the office and Cab facility, ending in Accepted/Declined offer leaves.]
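A hedged sketch of fitting such a tree with scikit-learn's CART implementation on the
weather dataset from the Naïve Bayes section is shown below; the integer encoding
Rainy=0, Sunny=1, Overcast=2 is an illustrative choice, not part of the original notes:

from sklearn.tree import DecisionTreeClassifier, export_text

# Outlook encoded as Rainy=0, Sunny=1, Overcast=2 (illustrative encoding).
X = [[0], [1], [2], [2], [1], [0], [1], [2], [0], [1], [1], [0], [2], [2]]
y = ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "No", "No",
     "Yes", "No", "Yes", "Yes"]

tree = DecisionTreeClassifier(criterion="gini")      # CART uses Gini by default
tree.fit(X, y)
print(export_text(tree, feature_names=["Outlook"]))  # the learned binary splits
print(tree.predict([[1]]))  # prediction for a Sunny day -> ['Yes'],
                            # agreeing with the Naive Bayes example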
Attribute Selection Measures
While implementing a Decision tree, the main issue that arises is how to select the best
attribute for the root node and for the sub-nodes. To solve such problems there is a
technique called the Attribute Selection Measure, or ASM. Using this measure, we can
easily select the best attribute for the nodes of the tree. There are two popular ASM
techniques:
o Information Gain
o Gini Index
1. Information Gain:
Information gain measures the change in entropy after the dataset is split on an attribute. It
can be calculated using the formula:

Information Gain = Entropy(S) - [(Weighted Avg) * Entropy(each feature)]

Entropy, the measure of impurity in a set of samples, is defined for a two-class problem as:

Entropy(S) = -P(yes) * log2 P(yes) - P(no) * log2 P(no)

Where,
S = the total set of samples
P(yes) = probability of yes
P(no) = probability of no
2. Gini Index:
o Gini index is a measure of impurity or purity used while creating a decision tree in
the CART(Classification and Regression Tree) algorithm.
o An attribute with the low Gini index should be preferred as compared to the high
Gini index.
o It only creates binary splits, and the CART algorithm uses the Gini index to create
binary splits.
o Gini index can be calculated using the below formula:

Gini Index = 1 - Σj (Pj)^2

where Pj is the proportion of samples belonging to class j at the node.
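Both measures are easy to compute by hand; the following plain-Python sketch evaluates
them on the Outlook feature of the weather dataset used earlier:

from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(feature, labels):
    # Entropy of the parent minus the weighted entropy of each subset.
    gain = entropy(labels)
    for value in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == value]
        gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain

outlook = ["Rainy", "Sunny", "Overcast", "Overcast", "Sunny", "Rainy", "Sunny",
           "Overcast", "Rainy", "Sunny", "Sunny", "Rainy", "Overcast", "Overcast"]
play = ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "No", "No",
        "Yes", "No", "Yes", "Yes"]

print(information_gain(outlook, play))   # information gain of splitting on Outlook
print(gini(play))                        # Gini index of the unsplit node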
Pruning is the process of deleting unnecessary nodes from a tree in order to get the
optimal decision tree.
A too-large tree increases the risk of overfitting, while a small tree may not capture all the
important features of the dataset. A technique that decreases the size of the learning tree
without reducing accuracy is known as pruning. There are mainly two types of tree pruning
techniques used:
o Cost Complexity Pruning
o Reduced Error Pruning
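Cost complexity pruning is exposed directly in scikit-learn through the ccp_alpha
parameter. The sketch below, using the built-in iris dataset and an illustrative alpha value,
shows how it shrinks a fully grown tree:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
unpruned = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)
# The pruned tree has fewer nodes while keeping most of the accuracy.
print(unpruned.tree_.node_count, "->", pruned.tree_.node_count)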
Linear Discriminant Analysis (LDA) is one of the commonly used dimensionality reduction
techniques in machine learning, applied to classification problems with two or more classes.
It is also known as Normal Discriminant Analysis (NDA) or Discriminant Function Analysis (DFA).
It can be used to project features from a higher-dimensional space into a lower-dimensional
space, reducing resource and dimensionality costs. In this topic, "Linear Discriminant
Analysis (LDA) in machine learning", we will discuss the LDA algorithm for classification
predictive modelling problems, the limitations of logistic regression, the representation of
the LDA model, how to make predictions with LDA, how to prepare data for LDA, extensions
to LDA and much more. So, let's start with a quick introduction to Linear Discriminant
Analysis (LDA) in machine learning.
Linear Discriminant Analysis is one of the most popular dimensionality reduction techniques
used for supervised classification problems in machine learning. It is also used as a pre-
processing step in machine learning and pattern classification applications.
Whenever there is a requirement to separate two or more classes with multiple features
efficiently, the LDA model is the most common technique used to solve such classification
problems. For example, suppose we have two classes with multiple features and need to
separate them efficiently; if we classify them using a single feature, the classes may overlap.
To overcome this overlapping issue in the classification process, we must keep increasing
the number of features.
Example:
Let's assume we have to classify two different classes having two sets of data points in a 2-
dimensional plane.
[Image: two classes of data points scattered in a 2-D plane.]
It may be impossible to draw a straight line in the 2-D plane that separates these data
points efficiently, but using Linear Discriminant Analysis we can reduce the 2-D plane to a
1-D line. Using this technique, we can also maximize the separability between multiple
classes.
Let's consider an example where we have two classes in a 2-D plane with an X-Y axis, and
we need to classify them efficiently. As we have already seen, LDA enables us to draw a
straight line that can completely separate the two classes of data points. Here, LDA uses the
X-Y data to create a new axis, separating the classes with a straight line and projecting the
data onto that new axis.
Hence, we can maximize the separation between these classes and reduce the 2-D plane
into 1-D.
To create the new axis, Linear Discriminant Analysis uses the following criteria:
o Maximize the distance between the means of the two classes.
o Minimize the variation (scatter) within each individual class.
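A hedged sketch of this 2-D to 1-D projection with scikit-learn follows; the two Gaussian
blobs are synthetic data invented purely for the illustration:

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
class0 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))  # synthetic class 0
class1 = rng.normal(loc=[3.0, 2.0], scale=0.5, size=(50, 2))  # synthetic class 1
X = np.vstack([class0, class1])
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis(n_components=1)  # two classes -> at most 1 new axis
X_1d = lda.fit_transform(X, y)                    # project 2-D points onto the new axis
print(X_1d.shape)       # (100, 1): each point now lies on a single axis
print(lda.score(X, y))  # training accuracy; well-separated blobs give ~1.0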
Why LDA?
o Logistic Regression is one of the most popular classification algorithms. It performs
well for binary classification but falls short for multi-class classification problems
with well-separated classes, which LDA handles quite efficiently.
o LDA can also be used in data pre-processing to reduce the number of features, just
as PCA, which reduces the computing cost significantly.
o LDA is also used in face detection algorithms. In Fisherfaces, LDA is used to extract
useful data from different faces. Coupled with eigenfaces, it produces effective
results.
Drawbacks of Linear Discriminant Analysis (LDA)
Although LDA is specifically used to solve supervised classification problems for two or
more classes, which is not possible using logistic regression, it fails in cases where the
means of the distributions are shared. In such cases, LDA cannot create a new axis that
makes both classes linearly separable.
To overcome such problems, we use non-linear discriminant analysis in machine learning.
Extension to Linear Discriminant Analysis (LDA)
Linear Discriminant analysis is one of the most simple and effective methods to solve
classification problems in machine learning. It has so many extensions and variations as
follows:
1. Quadratic Discriminant Analysis (QDA): for multiple input variables, each class
deploys its own estimate of variance.
2. Flexible Discriminant Analysis (FDA): used when non-linear combinations of inputs
are involved, such as splines.
3. Regularized Discriminant Analysis (RDA): introduces regularization into the estimate
of the variance (actually covariance), moderating the influence of different variables
on LDA.
Real-world Applications of LDA
Some of the common real-world applications of Linear discriminant Analysis are given
below:
o Face Recognition
Face recognition is a popular application of computer vision, where each face is
represented as a combination of a large number of pixel values. In this case, LDA is used
to reduce the number of features to a manageable number before the classification
process. It generates a new template in which each dimension consists of a linear
combination of pixel values. If the linear combination is generated using Fisher's linear
discriminant, the result is called a Fisherface.
o Medical
In the medical field, LDA is used to classify a patient's disease as mild, moderate, or
severe on the basis of various parameters of the patient's health and the ongoing medical
treatment. This classification helps doctors in either increasing or decreasing the pace of
the treatment.
o Customer Identification
In customer identification, LDA is used to identify and select the features that
characterize the group of customers most likely to purchase a specific product, for
example in a shopping mall.
o For Predictions
LDA can also be used for making predictions, and hence in decision making. For example,
"will you buy this product?" yields a predicted result in one of two possible classes:
buying or not buying.
o In Learning
Nowadays, robots are being trained to learn and talk in order to simulate human
behaviour, which can also be treated as a classification problem. In this case, LDA builds
similar groups on the basis of different parameters, including pitch, frequency, sound,
tune, etc.
Difference between Linear Discriminant Analysis and PCA
Below are some basic differences between LDA and PCA:
o PCA is an unsupervised algorithm that does not care about classes and labels and
only aims to find the principal components to maximize the variance in the given
dataset. At the same time, LDA is a supervised algorithm that aims to find the linear
discriminants to represent the axes that maximize separation between different
classes of data.
o LDA is much more suitable for multi-class classification tasks than PCA. However,
PCA is assumed to perform comparatively well for small sample sizes.
o Both LDA and PCA are used as dimensionality reduction techniques; when they are
combined, PCA is typically applied first, followed by LDA.
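The contrast can be seen in code. This hedged sketch reduces the built-in iris dataset
(4 features, 3 classes) to two dimensions with each method; the dataset choice is
illustrative:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)   # unsupervised: ignores the labels y
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # supervised: uses y
print(X_pca.shape, X_lda.shape)  # both map 4 features down to 2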