
Naive Bayes Classifiers
A COLLECTION OF CLASSIFICATION ALGORITHMS
Principle of Naive Bayes Classifier:
 A Naive Bayes classifier is a probabilistic machine learning model
that is used for classification tasks. The crux of the classifier is
based on Bayes' theorem.

 Bayes' theorem can be written as:
P(A | B) = P(B | A) P(A) / P(B)

 It is not a single algorithm but a family of algorithms that all
share a common principle: every pair of features being classified is
independent of each other, given the class.
Example:
Let us take an example to get a better intuition. Consider the problem of playing golf. The dataset is represented below.
 We classify whether the day is suitable for playing golf, given the
features of the day. The columns represent these features and the
rows represent individual entries. If we take the first row of the
dataset, we can observe that the day is not suitable for playing golf if the
outlook is rainy, the temperature is hot, the humidity is high and it is not
windy. We make two assumptions here. First, as stated above, we
consider that these predictors are independent: if the temperature is hot,
it does not necessarily mean that the humidity is high. Second, all the
predictors have an equal effect on the outcome: the day being windy
does not carry more importance than the other features in deciding
whether to play golf or not.
 According to this example, Bayes' theorem can be rewritten as:
P(y | X) = P(X | y) P(y) / P(X)

 The variable y is the class variable (play golf), which represents whether
it is suitable to play golf or not, given the conditions. The variable X
represents the parameters/features.
 X is given as:
X = (x_1, x_2, ..., x_n)
 Here x_1, x_2, ..., x_n represent the features, i.e. they can be mapped
to outlook, temperature, humidity and windy. By substituting
for X and expanding using the chain rule (together with the independence
assumption) we get:
P(y | x_1, ..., x_n) = P(x_1 | y) P(x_2 | y) ... P(x_n | y) P(y) / (P(x_1) P(x_2) ... P(x_n))

 Now, you can obtain the value for each term by looking at the dataset
and substituting it into the equation. For all entries in the dataset,
the denominator does not change; it remains constant. Therefore, the
denominator can be removed and a proportionality can be introduced:
P(y | x_1, ..., x_n) ∝ P(y) P(x_1 | y) P(x_2 | y) ... P(x_n | y)

 In our case, the class variable (y) has only two outcomes, yes or no.
There could be cases where the classification is multiclass.
Therefore, we need to find the class y with the maximum probability:
y = argmax_y P(y) P(x_1 | y) P(x_2 | y) ... P(x_n | y)
 Using the above function, we can obtain the class, given the
predictors.
 We need to find P(x_i | y_j) for each x_i in X and y_j in y. All these
calculations have been demonstrated in the tables below:

 So, in the figure above, we have calculated P(x_i | y_j) for each x_i in X
and y_j in y manually in tables 1-4. For example, the probability of
playing golf given that the temperature is cool, i.e. P(temp. = cool |
play golf = Yes), is 3/9.
 Also, we need to find the class probabilities P(y), which have been calculated in
table 5. For example, P(play golf = Yes) = 9/14.
 So now, we are done with our pre-computations and the classifier is ready!
 Let us test it on a new set of features (let us call it today), as sketched below.
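Since the slide's table for today is not reproduced in this text, here is a minimal Python sketch of the whole procedure under stated assumptions: it rebuilds the count-based probability tables from the standard play-golf dataset (assumed to match the one on the slides, since the quoted values P(temp. = cool | play golf = Yes) = 3/9 and P(play golf = Yes) = 9/14 agree with it) and scores a hypothetical today = (outlook = sunny, temperature = hot, humidity = normal, not windy).

```python
from collections import Counter, defaultdict

# The classic play-golf dataset, assumed here to match the table on the slides
# (columns: outlook, temperature, humidity, windy, play).
data = [
    ("rainy", "hot", "high", False, "no"),
    ("rainy", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("sunny", "mild", "high", False, "yes"),
    ("sunny", "cool", "normal", False, "yes"),
    ("sunny", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("rainy", "mild", "high", False, "no"),
    ("rainy", "cool", "normal", False, "yes"),
    ("sunny", "mild", "normal", False, "yes"),
    ("rainy", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("sunny", "mild", "high", True, "no"),
]

# Class counts for P(y) and per-(feature, class) value counts for P(x_i | y).
class_counts = Counter(row[-1] for row in data)
cond_counts = defaultdict(Counter)
for row in data:
    y = row[-1]
    for i, value in enumerate(row[:-1]):
        cond_counts[(i, y)][value] += 1

def scores(x):
    """Unnormalised posterior P(y) * prod_i P(x_i | y) for each class (no smoothing)."""
    out = {}
    for y, n_y in class_counts.items():
        s = n_y / len(data)                        # P(y), e.g. 9/14 for "yes"
        for i, value in enumerate(x):
            s *= cond_counts[(i, y)][value] / n_y  # P(x_i | y), e.g. 3/9
        out[y] = s
    return out

# A hypothetical "today": outlook=sunny, temperature=hot, humidity=normal, not windy.
today = ("sunny", "hot", "normal", False)
posterior = scores(today)
print(posterior)
print("prediction:", max(posterior, key=posterior.get))
```

The class with the larger score is the prediction; dividing each score by their sum recovers normalised posterior probabilities.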
Types of Naive Bayes Classifier:

 Multinomial Naive Bayes: This is mostly used for document
classification problems, i.e. whether a document belongs to the
category of sports, politics, technology, etc. The
features/predictors used by the classifier are the frequencies of the
words present.

 Bernoulli Naive Bayes: This is similar to multinomial naive
Bayes, but the predictors are boolean variables. The parameters
that we use to predict the class variable take only the values yes
or no, for example whether a word occurs in the text or not.

 Gaussian Naive Bayes: When the predictors take a
continuous value and are not discrete, we assume that these
values are sampled from a Gaussian distribution (a scikit-learn
sketch of all three variants follows below).
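As a rough illustration (not from the original slides), the sketch below shows the three variants in scikit-learn; the tiny arrays are made up purely to show the kind of input each variant expects.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])  # toy labels for two classes

# Gaussian NB: continuous features (e.g. real-valued measurements).
X_cont = np.array([[1.0, 2.1], [0.9, 1.8], [3.2, 4.0], [3.0, 4.2]])
print(GaussianNB().fit(X_cont, y).predict([[1.1, 2.0]]))

# Multinomial NB: non-negative counts (e.g. word frequencies per document).
X_counts = np.array([[2, 0, 1], [3, 1, 0], [0, 4, 2], [1, 3, 3]])
print(MultinomialNB().fit(X_counts, y).predict([[2, 1, 0]]))

# Bernoulli NB: binary indicators (e.g. word present / absent).
X_bin = (X_counts > 0).astype(int)
print(BernoulliNB().fit(X_bin, y).predict([[1, 1, 0]]))
```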

Gaussian Distribution (Normal Distribution)
 Under this assumption, the likelihood of a continuous feature is modelled as
P(x_i | y) = (1 / sqrt(2 * pi * sigma_y^2)) * exp(-(x_i - mu_y)^2 / (2 * sigma_y^2)),
where mu_y and sigma_y^2 are the mean and variance of the feature computed within class y.
Conclusion:
Naive Bayes algorithms are mostly used in sentiment analysis, spam
filtering, recommendation systems, etc. They are fast and easy to
implement, but their biggest disadvantage is the requirement that the
predictors be independent. In most real-life cases the predictors are
dependent, and this hinders the performance of the classifier.
Decision Tree
CLASSIFICATION ALGORITHM
 The decision tree algorithm falls under the category of supervised learning. Decision
trees can be used to solve both regression and classification problems.
 A decision tree builds classification or regression models in the form of a tree
structure. It breaks down a dataset into smaller and smaller subsets while at the
same time an associated decision tree is incrementally developed. The final
result is a tree with decision nodes and leaf nodes. A decision node (e.g.,
Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy). A leaf node
(e.g., Play) represents a classification or decision. The topmost decision node in a
tree, which corresponds to the best predictor, is called the root node. Decision trees
can handle both categorical and numerical data.
 We can represent any boolean function on discrete attributes using a decision
tree.
 Types of decision trees
 Categorical Variable Decision Tree: A decision tree that has a categorical target
variable is called a categorical variable decision tree.
 Continuous Variable Decision Tree: A decision tree that has a continuous target
variable is called a continuous variable decision tree.
 Root Node: It represents the entire population or sample, and this further gets
divided into two or more homogeneous sets.
 Splitting: The process of dividing a node into two or more sub-nodes.
 Decision Node: When a sub-node splits into further sub-nodes, it is called a
decision node.
 Leaf / Terminal Node: A node with no children (no further split) is called a leaf or
terminal node.
 Pruning: When we reduce the size of a decision tree by removing nodes
(the opposite of splitting), the process is called pruning.
 Branch / Sub-Tree: A sub-section of a decision tree is called a branch or sub-tree.
 Parent and Child Node: A node which is divided into sub-nodes is called the parent
node of those sub-nodes, whereas the sub-nodes are the children of the parent node.
Algorithm
 Algorithms (splitting criteria) used in decision trees:
 ID3
 Gini Index
 Chi-Square
 Reduction in Variance
 The core algorithm for building decision trees is called ID3. It was developed by J.
R. Quinlan and uses entropy and information gain to construct a decision
tree.
 The ID3 algorithm begins with the original set S as the root node. On each
iteration of the algorithm, it iterates through every unused attribute of the
set S and calculates the entropy H(S) or information gain IG(S) of that
attribute. It then selects the attribute which has the smallest entropy (or
largest information gain) value. The set S is then split or partitioned by the
selected attribute to produce subsets of the data.
Entropy
 Entropy is a measure of the randomness in the information being
processed. The higher the entropy, the harder it is to draw any
conclusions from that information. The decision tree algorithm uses
entropy to calculate the homogeneity of a sample. If the sample
is completely homogeneous the entropy is zero, and if the sample
is equally divided it has an entropy of one.
Example:
 To build a decision tree, we need to calculate two types of
entropy using frequency tables as follows:
 a) Entropy using the frequency table of one attribute (the target):
E(S) = sum over classes c of -p_c * log2(p_c)
 b) Entropy using the frequency table of two attributes:
E(T, X) = sum over values c of attribute X of P(c) * E(c)
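A minimal Python sketch of these two calculations, assuming the same play-golf target column as before (9 yes and 5 no):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """E(S) = sum over classes c of -p_c * log2(p_c)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def weighted_entropy(pairs):
    """E(T, X) for a list of (attribute value, target label) pairs."""
    total = len(pairs)
    result = 0.0
    for value in set(v for v, _ in pairs):
        subset = [label for v, label in pairs if v == value]
        result += (len(subset) / total) * entropy(subset)
    return result

# a) Entropy of the target column (9 "yes" and 5 "no" in the play-golf data).
play = ["yes"] * 9 + ["no"] * 5
print(round(entropy(play), 3))  # roughly 0.940

# b) Entropy after grouping by an attribute, shown on a made-up (value, label) column.
column = [("high", "no"), ("high", "yes"), ("normal", "yes"), ("normal", "yes")]
print(round(weighted_entropy(column), 3))
```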
Information gain

 The information gain is based on the decrease in entropy after a dataset is
split on an attribute. Constructing a decision tree is all about finding the attribute
that returns the highest information gain (i.e., the most homogeneous
branches): Gain(T, X) = E(T) - E(T, X).

 Step 1: Calculate the entropy of the target.
 Step 2: The dataset is then split on the different attributes. The
entropy for each branch is calculated and then added
proportionally to get the total entropy for the split. The resulting
entropy is subtracted from the entropy before the split. The
result is the information gain, or decrease in entropy.
 Step 3: Choose the attribute with the largest information gain as the
decision node, divide the dataset by its branches and repeat the
same process on every branch (a small sketch of this selection
follows after these steps).
 Step 4a: A branch with an entropy of 0 is a leaf node.
 Step 4b: A branch with an entropy of more than 0 needs further splitting.
 Step 5: The ID3 algorithm is run recursively on the non-leaf
branches until all data is classified.
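As a small sketch of steps 1-3 (again assuming the standard play-golf rows stand in for the dataset on the slides), the following computes the information gain of every attribute and picks the best split for the root node:

```python
from collections import Counter
from math import log2

# Play-golf rows (outlook, temperature, humidity, windy, play), assumed to
# stand in for the dataset used on the slides.
rows = [
    ("rainy", "hot", "high", False, "no"),
    ("rainy", "hot", "high", True, "no"),
    ("overcast", "hot", "high", False, "yes"),
    ("sunny", "mild", "high", False, "yes"),
    ("sunny", "cool", "normal", False, "yes"),
    ("sunny", "cool", "normal", True, "no"),
    ("overcast", "cool", "normal", True, "yes"),
    ("rainy", "mild", "high", False, "no"),
    ("rainy", "cool", "normal", False, "yes"),
    ("sunny", "mild", "normal", False, "yes"),
    ("rainy", "mild", "normal", True, "yes"),
    ("overcast", "mild", "high", True, "yes"),
    ("overcast", "hot", "normal", False, "yes"),
    ("sunny", "mild", "high", True, "no"),
]
attributes = {"outlook": 0, "temperature": 1, "humidity": 2, "windy": 3}
TARGET = 4

def entropy(labels):
    # Step 1: E(S) = sum_c -p_c * log2(p_c)
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, col):
    # Step 2: Gain(T, X) = E(T) - sum over values v of X of P(v) * E(T_v)
    gain = entropy([r[TARGET] for r in rows])
    total = len(rows)
    for value in set(r[col] for r in rows):
        subset = [r[TARGET] for r in rows if r[col] == value]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

gains = {name: information_gain(rows, col) for name, col in attributes.items()}
print({name: round(g, 3) for name, g in gains.items()})

# Step 3: the attribute with the largest gain becomes the decision (root) node.
print("root split:", max(gains, key=gains.get))
```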
 Decision Tree to Decision Rules
 A decision tree can easily be transformed into a set of rules by
mapping the paths from the root node to the leaf nodes one by one.
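For instance, a rough scikit-learn sketch of this idea (the numeric encoding and feature names below are made up purely for illustration):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up numeric encoding of two features, just to illustrate rule extraction.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]]
y = [0, 0, 1, 1, 1, 0]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# export_text walks each root-to-leaf path and prints it as nested if/else rules.
print(export_text(tree, feature_names=["outlook", "windy"]))
```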
 Limitations of Decision Trees
 Decision trees tend to have high variance when they utilize
different training and test sets of the same data, since they tend
to overfit on the training data. This leads to poor performance on
unseen data. Unfortunately, this limits the usage of decision
trees in predictive modeling.
 To overcome these problems we use ensemble methods: we can
create models that use underlying (weak) decision trees as a
foundation for producing powerful results, as is done in the
Random Forest algorithm, sketched below.
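As a rough illustration of that remedy (a sketch on scikit-learn's Iris toy dataset, not from the original slides), one can compare the cross-validated accuracy of a single decision tree with that of a random forest; on harder datasets the gap is usually more pronounced.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Cross-validated accuracy of a single tree versus an ensemble of trees.
tree_acc = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
forest_acc = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()

print("single decision tree:", round(tree_acc, 3))
print("random forest       :", round(forest_acc, 3))
```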
