
Chapt. 2 Supervised Learning: Naïve Bayes, Decision Tree and Random Forest

Q.1. What is the Classification Algorithm?

Ans- The Classification algorithm is a Supervised Learning technique that is used to identify the category of new observations on the basis of training data. In Classification, a program learns from the given dataset or observations and then classifies new observations into a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. The classes can also be called targets, labels, or categories.
Unlike regression, the output variable of Classification is a category rather than a numeric value, for example "Green or Blue" or "fruit or animal". Since the Classification algorithm is a Supervised learning technique, it takes labelled input data, which means each input comes with a corresponding output.
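
As a minimal sketch of this idea, assuming scikit-learn is installed (the tiny feature matrix and labels below are invented purely for illustration):

```python
# Minimal classification sketch (assumed: scikit-learn is installed).
# The tiny labelled dataset below is invented for illustration only.
from sklearn.tree import DecisionTreeClassifier

X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]   # observations (features)
y_train = ["No", "No", "Yes", "Yes"]         # corresponding class labels

clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)        # learn from the labelled data

print(clf.predict([[1, 1]]))     # classify a new observation -> ['Yes']
```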

Q.2. What is the Naïve Bayes Classifier Algorithm?

Ans- Naïve Bayes is a supervised learning algorithm that is based on Bayes' theorem and is used for solving classification problems.
It is mainly used in text classification, which typically involves a high-dimensional training dataset.
The Naïve Bayes Classifier is one of the simplest and most effective classification algorithms; it helps in building fast machine learning models that can make quick predictions.
It is a probabilistic classifier, which means it predicts on the basis of the probability of an object belonging to each class.
Examples of Naïve Bayes applications: spam filtering, sentiment analysis, and classifying articles.
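
To make the text-classification use case concrete, here is a minimal sketch assuming scikit-learn; the example sentences and labels are invented for illustration:

```python
# Minimal Naive Bayes text classification sketch (assumed: scikit-learn).
# The documents and labels below are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["win a free prize now", "meeting at noon tomorrow",
        "free cash offer", "lunch with the team"]
labels = ["spam", "not spam", "spam", "not spam"]

vec = CountVectorizer()           # word frequencies as predictors
X = vec.fit_transform(docs)

model = MultinomialNB()
model.fit(X, labels)

test = vec.transform(["free prize offer"])
print(model.predict(test))        # expected: ['spam']
```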

Q.3. Explain the Working of the Naïve Bayes Classifier.

Ans- Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether or not we should play on a particular day according to the weather conditions. To solve this problem, we follow the steps below:
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.

Frequency table for the weather conditions:

Weather    Yes   No
Overcast    5     0
Rainy       2     2
Sunny       3     2
Total      10     4

Likelihood table for the weather conditions:

Weather     No            Yes            Total
Overcast    0             5              5/14 = 0.35
Rainy       2             2              4/14 = 0.29
Sunny       2             3              5/14 = 0.35
All         4/14 = 0.29   10/14 = 0.71

Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3
P(Sunny) = 5/14 = 0.35
P(Yes) = 10/14 = 0.71
So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 = 0.60
Similarly, P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny) = 0.5 * 0.29 / 0.35 = 0.40. Since P(Yes|Sunny) > P(No|Sunny), the classifier predicts Yes (play) on a sunny day.
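
The same computation can be reproduced in a few lines of plain Python; this is a minimal sketch that hard-codes the counts from the frequency table above:

```python
# Reproducing the worked Naive Bayes example in plain Python.
# Counts are taken from the frequency table above (14 observations).
counts = {"Overcast": {"Yes": 5, "No": 0},
          "Rainy":    {"Yes": 2, "No": 2},
          "Sunny":    {"Yes": 3, "No": 2}}

total_yes = sum(c["Yes"] for c in counts.values())  # 10
total_no = sum(c["No"] for c in counts.values())    # 4
total = total_yes + total_no                        # 14

def posterior(weather, label):
    total_label = total_yes if label == "Yes" else total_no
    likelihood = counts[weather][label] / total_label  # P(weather|label)
    prior = total_label / total                        # P(label)
    evidence = sum(counts[weather].values()) / total   # P(weather)
    return likelihood * prior / evidence

print(round(posterior("Sunny", "Yes"), 2))  # 0.6
print(round(posterior("Sunny", "No"), 2))   # 0.4
```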
Q.4. What are the Types of Naïve Bayes Model?
Ans- There are three types of Naïve Bayes model, which are given below:
1. Gaussian: The Gaussian model assumes that features follow a normal distribution. This means that if the predictors take continuous values instead of discrete ones, the model assumes these values are sampled from a Gaussian distribution.
2. Multinomial: The Multinomial Naïve Bayes classifier is used when the data is multinomially distributed. It is primarily used for document classification problems, i.e., deciding which category a particular document belongs to, such as Sports, Politics, Education, etc. The classifier uses the frequency of words as the predictors.
3. Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor variables are independent Boolean variables, such as whether a particular word is present in a document or not. This model is also popular for document classification tasks.
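
Assuming scikit-learn, the three variants are available as separate classes. A minimal sketch (the small arrays are invented to show the expected kind of input for each model):

```python
# The three Naive Bayes variants in scikit-learn (assumed installed).
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])  # class labels

# Gaussian: continuous features, assumed normally distributed per class.
X_cont = np.array([[1.2, 3.4], [0.9, 2.8], [5.1, 7.7], [4.8, 8.2]])
GaussianNB().fit(X_cont, y)

# Multinomial: non-negative counts, e.g. word frequencies per document.
X_counts = np.array([[3, 0, 1], [2, 1, 0], [0, 4, 2], [1, 3, 3]])
MultinomialNB().fit(X_counts, y)

# Bernoulli: binary features, e.g. word present (1) or absent (0).
X_bin = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 1, 0]])
BernoulliNB().fit(X_bin, y)
```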
Q.5. Why is it called Naïve Bayes?
Ans- The name Naïve Bayes is made up of two words, Naïve and Bayes, which can be described as:
Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of the other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple; each feature individually contributes to identifying it as an apple, without depending on the others.
Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.

Q.6. What is Bayes' Theorem?

Ans- Bayes' theorem, also known as Bayes' Rule or Bayes' law, is used to determine the probability of a hypothesis given prior knowledge. It depends on conditional probability.
The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) * P(A) / P(B)

Where,
P(A|B) is the Posterior probability: the probability of hypothesis A given the observed event B.
P(B|A) is the Likelihood: the probability of the evidence B given that hypothesis A is true.
P(A) is the Prior probability: the probability of the hypothesis before observing the evidence.
P(B) is the Marginal probability: the probability of the evidence.

Q.7. What are the Applications of the Naïve Bayes Classifier?

Ans- 1. It is used for credit scoring.
2. It is used in medical data classification.
3. It can be used for real-time predictions because the Naïve Bayes Classifier is an eager learner.
4. It is used in text classification, such as spam filtering and sentiment analysis.
Q.8. What is a Decision Tree?
Ans- Decision Tree is a Supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
In a Decision tree, there are two types of nodes: the Decision Node and the Leaf Node. Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
The decisions or tests are performed on the basis of the features of the given dataset.
A decision tree can handle categorical data (YES/NO) as well as numeric data.
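
A minimal sketch of training a decision tree, assuming scikit-learn; the small salary/distance dataset is invented for illustration (it anticipates the job-offer example in Q.11):

```python
# Minimal decision tree sketch (assumed: scikit-learn is installed).
# Invented features: [salary, distance_to_office_km] -> Accept/Decline.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[12, 5], [12, 40], [6, 5], [15, 10], [7, 30], [14, 25]]
y = ["Accept", "Decline", "Decline", "Accept", "Decline", "Accept"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Print the learned decision rules: internal nodes test features,
# branches are the rules, leaves give the outcome.
print(export_text(tree, feature_names=["salary", "distance"]))
print(tree.predict([[13, 8]]))   # classify a new candidate's offer
```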
Q.9. Why use Decision Trees?
Ans- Decision Trees usually mimic human thinking while making a decision, so they are easy to understand.
The logic behind a decision tree can be easily understood because it shows a tree-like structure.
Q.10. What are the Decision Tree Terminologies?
Ans- 1. Root Node: The root node is where the decision tree starts. It represents the entire dataset, which further gets divided into two or more homogeneous sets.
2. Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split further after reaching a leaf node.
3. Splitting: Splitting is the process of dividing a decision node/root node into sub-nodes according to the given conditions.
4. Branch/Sub-Tree: A sub-tree formed by splitting the tree.
5. Pruning: Pruning is the process of removing unwanted branches from the tree.
6. Parent/Child Node: The node that a split originates from is called the parent node, and the nodes derived from it are called the child nodes.
Q.11. How does the Decision Tree algorithm Work?
Ans- In a decision tree, to predict the class of a given record, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the corresponding attribute of the record and, based on the comparison, follows the branch and jumps to the next node.
For the next node, the algorithm again compares the attribute value with those of the sub-nodes and moves further. It continues this process until it reaches a leaf node of the tree. The complete process can be better understood using the algorithm below:
Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets that contain the possible values of the best attribute.
Step-4: Generate the decision tree node that contains the best attribute.
Step-5: Recursively make new decision trees using the subsets of the dataset created in Step 3. Continue this process until a stage is reached where the nodes cannot be classified further; such final nodes are the leaf nodes.
Example: Suppose a candidate has a job offer and wants to decide whether to accept it or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by ASM). The root node splits further into the next decision node (distance from the office) and one leaf node, based on the corresponding labels. The next decision node further splits into one decision node (cab facility) and one leaf node. Finally, that decision node splits into two leaf nodes (Accepted offer and Declined offer).
Attribute Selection Measures
While implementing a decision tree, the main issue is how to select the best attribute for the root node and for the sub-nodes. To solve such problems, there is a technique called the Attribute Selection Measure (ASM). Two popular ASM techniques are:
Information Gain: It measures how much information a feature provides about the class.
Gini Index: The Gini index is a measure of impurity (or purity) used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.
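
Both measures are easy to compute directly. Here is a minimal sketch in plain Python, using the standard formulas entropy(S) = -sum(p_i * log2(p_i)) and Gini(S) = 1 - sum(p_i^2), where p_i is the proportion of class i:

```python
# Minimal sketch of the two attribute selection measures in plain Python.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

# Information gain of a split = parent entropy minus the weighted
# average entropy of the child subsets.
def information_gain(parent, children):
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

labels = ["Yes"] * 10 + ["No"] * 4               # the weather data above
split = [["Yes"] * 5, ["Yes"] * 5 + ["No"] * 4]  # e.g. Overcast vs. rest
print(round(entropy(labels), 3), round(gini(labels), 3))  # 0.863 0.408
print(round(information_gain(labels, split), 3))          # 0.226
```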

Q.12. What are the Advantages and Disadvantages of the Decision Tree?

Ans- Advantages of the Decision Tree:
1. It is simple to understand, as it follows the same process that a human follows while making a decision in real life.
2. It can be very useful for solving decision-related problems.
3. It helps to think about all the possible outcomes of a problem.
4. It requires less data cleaning compared to other algorithms.

Disadvantages of the Decision Tree:
1. A decision tree can contain many layers, which makes it complex.
2. It may have an overfitting issue, which can be resolved using the Random Forest algorithm.
3. For more class labels, the computational complexity of the decision tree may increase.

Q.13. Why use Random Forest?

Ans- 1. It takes less training time compared to other algorithms.
2. It predicts output with high accuracy, and it runs efficiently even on large datasets.
3. It can also maintain accuracy when a large proportion of the data is missing.

Q.14. What are the Applications of Random Forest?

Ans- There are four main sectors where Random Forest is mostly used:
1. Banking: The banking sector mostly uses this algorithm for the identification of loan risk.
2. Medicine: With the help of this algorithm, disease trends and risks of diseases can be identified.
3. Land Use: We can identify areas of similar land use with this algorithm.
4. Marketing: Marketing trends can be identified using this algorithm.
Q.15. What is the Random Forest Algorithm?

Ans- Random Forest is a classifier that contains a number of decision trees built on various subsets of the given dataset, and it combines their predictions (majority voting for classification, averaging for regression) to improve the predictive accuracy on that dataset.
Instead of relying on one decision tree, the random forest takes the prediction from each tree and predicts the final output based on the majority vote of those predictions.
A greater number of trees in the forest leads to higher accuracy and prevents the problem of overfitting.
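
The majority-voting step itself is simple; a minimal sketch in plain Python with invented per-tree predictions:

```python
# Majority voting over individual tree predictions (plain Python sketch).
from collections import Counter

tree_predictions = ["apple", "banana", "apple", "apple", "banana"]
final_prediction = Counter(tree_predictions).most_common(1)[0][0]
print(final_prediction)  # 'apple' wins with 3 of 5 votes
```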
Q.16. What are the Advantages and Disadvantages of Random Forest?

Ans- Advantages of Random Forest:
1. Random Forest is capable of performing both classification and regression tasks.
2. It is capable of handling large datasets with high dimensionality.
3. It enhances the accuracy of the model and prevents the overfitting issue.

Disadvantages of Random Forest:
Although Random Forest can be used for both classification and regression tasks, it is less suitable for regression tasks.

Q.17. How does the Random Forest algorithm work?

Ans- Random Forest works in two phases: the first is to create the random forest by combining N decision trees, and the second is to make a prediction with each tree created in the first phase.

Step-1: Select K random data points from the training set.
Step-2: Build the decision trees associated with the selected data points (subsets).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Steps 1 and 2.
Step-5: For new data points, find the prediction of each decision tree, and assign the new data points to the category that wins the majority of votes.

The working of the algorithm can be better understood with the example below:

Example: Suppose there is a dataset that contains multiple fruit images. This dataset is given to the Random Forest classifier, which divides it into subsets and gives each subset to a decision tree. During the training phase, each decision tree produces a prediction result, and when a new data point occurs, the Random Forest classifier predicts the final decision based on the majority of the results.
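
A minimal sketch of this workflow, assuming scikit-learn; the dataset is randomly generated for illustration, and n_estimators corresponds to the number of trees N:

```python
# Minimal random forest sketch (assumed: scikit-learn is installed).
# make_classification generates a synthetic labelled dataset purely
# for illustration; n_estimators is the number of trees in the forest.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is trained on a bootstrap sample (a random subset) of the
# training data; predictions are combined by majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

print(forest.score(X_test, y_test))   # accuracy on held-out data
print(forest.predict(X_test[:3]))     # majority-vote class predictions
```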
