
Unit IV: Classification

What is Classification?

Classification is the task of identifying the category, or class label, of a new observation. First, a set of data is used as training data: the input records and their corresponding class labels are given to the algorithm, which learns a model from them.
When there are more than two classes to choose from, the task is called multiclass classification.
How does Classification Work?
There are two stages in a data classification system: classifier (model) creation and applying the classifier for classification.
Developing the classifier (model creation): This is the learning stage, or learning process. The classification algorithm constructs the classifier in this stage from a training set composed of database records and their corresponding class labels.
Applying the classifier for classification: The classifier is used for classification at this stage. Test data are used here to estimate the accuracy of the classifier. If the accuracy is deemed acceptable, the classification rules can be applied to new data records. Applications include:
o Sentiment Analysis: Sentiment analysis is very useful in social
media monitoring, where it is used to extract insights from social
media posts. With advanced machine learning algorithms, sentiment
analysis models can even read and analyze misspelled words, and
well-trained models provide consistently accurate results in a
fraction of the time a human would need.
o Document Classification: Document classification organizes
documents into sections according to their content. It is a form of
text classification in which the words of an entire document are
analyzed, and with machine learning classification algorithms it
can be executed automatically.
o Image Classification: Image classification assigns an image to one
of a set of trained categories, such as a caption, a statistical
value, or a theme. By tagging images, you can train your model for
the relevant categories using supervised learning algorithms.
o Machine Learning Classification: It uses statistically demonstrable
algorithmic rules to execute analytical tasks that would take
humans hundreds of hours to perform.
2. Data Classification Process: The data classification process can be
divided into five steps:
o Define the goals, strategy, workflows, and architecture of data
classification.
o Classify the confidential data that is stored.
o Label the data by applying classification marks.
o Use the results to improve security and compliance.
o Recognize that data changes continuously, so classification is an
ongoing process.

A general approach to classification:


Classification is a two-step process involving:
Learning step: The classification model is constructed. In this phase, training data are analyzed by a classification algorithm.
Classification step: The model is used to predict class labels for given data. In this phase, test data are used to estimate the accuracy of the classification rules.

Evaluation of classifiers:
There are various methods commonly used to evaluate the performance of a
classifier, which are as follows −
Holdout Method –
In the holdout method, the initial dataset of labeled instances is partitioned
into two disjoint sets, known as the training set and the test set. A
classification model is induced from the training set, and its performance is
evaluated on the test set.
The accuracy of the classifier is estimated from the accuracy of the induced
model on the test set. The holdout method has several well-known drawbacks.
First, fewer labeled instances are available for training because a portion of
the data is withheld for testing. As a result, the induced model may not be as
good as one trained on all the labeled examples. Second, the model can be
highly dependent on how the data are split into training and test sets: the
smaller the training set, the larger the variance of the model, while if the
training set is too large, the accuracy estimated from the smaller test set is
less reliable and has a wide confidence interval. Finally, the training and
test sets are no longer independent of each other.
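As a minimal sketch of the holdout method (assuming scikit-learn and its bundled Iris dataset are available; the split ratio and model are illustrative choices), the split, training, and accuracy estimate could be written as:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Hold out one third of the labeled records for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# Induce a model from the training set only.
model = DecisionTreeClassifier().fit(X_train, y_train)

# Estimate accuracy on the disjoint test set.
print("holdout accuracy:", accuracy_score(y_test, model.predict(X_test)))
```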
Random Subsampling –
The holdout method can be repeated multiple times to improve the estimate of a
classifier's performance. This approach is called random subsampling.
Let acc_i be the model accuracy during the i-th iteration. The overall accuracy is
given by acc_sub = (acc_1 + acc_2 + ⋯ + acc_k) / k
Random subsampling still encounters some of the issues of the holdout approach,
because it does not use as much of the data as possible for training. It also has
no control over how many times each record is used for testing and training, so
some records may be used for training more often than others.
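A minimal sketch of random subsampling (assuming scikit-learn; k, the split ratio, and the model are illustrative) averages the accuracy over k independent holdout splits:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

k = 5
accuracies = []
for i in range(k):
    # Each iteration is an independent random holdout split.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.33, random_state=i)
    model = DecisionTreeClassifier().fit(X_tr, y_tr)
    accuracies.append(model.score(X_te, y_te))

# Overall accuracy is the mean of the k iteration accuracies.
print("acc_sub =", np.mean(accuracies))
```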
Cross-Validation −
An alternative to random subsampling is cross-validation. In this method, each
record is used the same number of times for training and exactly once for testing.
For example, suppose we partition the data into two equal-sized subsets. First, one
subset is used for training and the other for testing; then the roles of the subsets
are swapped, so the earlier training set becomes the test set. This approach is
known as twofold cross-validation.

Basic algorithms of classification:


Decision Tree: The decision tree is one of the most powerful and popular tools for
classification and prediction. A decision tree is a flowchart-like tree structure
in which each internal node denotes a test on an attribute, each branch
represents an outcome of the test, and each leaf node (terminal node) holds a
class label.
A decision tree for the concept Play Tennis.

Construction of Decision Tree:


A tree can be “learned” by splitting the source set into subsets based on an
attribute value test. This process is repeated on each derived subset in a
recursive manner called recursive partitioning.
● The recursion terminates when all records in the subset at a node have the
same value of the target variable, or when splitting no longer adds value to
the predictions.
● The construction of a decision tree classifier does not require any domain
knowledge or parameter setting, and is therefore appropriate for
exploratory knowledge discovery.
● Decision trees can handle high-dimensional data. In general, decision
tree classifiers have good accuracy. Decision tree induction is a typical
inductive approach to learning classification knowledge.
Decision Tree Representation:
Decision trees classify instances by sorting them down the tree from the
root to some leaf node, which provides the classification of the instance.
An instance is classified by starting at the root node of the tree, testing the
attribute specified by this node, then moving down the tree branch
corresponding to the value of the attribute as shown in the above figure. This
process is then repeated for the subtree rooted at the new node.
For example, the instance

(Outlook = Rain, Temperature = Hot, Humidity = High, Wind = Strong )

would be sorted down the leftmost branch of this decision tree and would
therefore be classified as a negative instance.
In other words, we can say that a decision tree represents a disjunction of
conjunctions of constraints on the attribute values of instances; the tree above
corresponds to the expression:

(Outlook = Sunny ^ Humidity = Normal) v (Outlook = Overcast) v (Outlook = Rain ^ Wind = Weak)
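As an illustrative sketch (assuming scikit-learn and pandas are available; the small PlayTennis-style table and its values are made up for demonstration), such a tree can be learned from one-hot encoded attributes and printed as rules:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# A small, hand-made PlayTennis-style dataset (illustrative values only).
data = pd.DataFrame({
    "Outlook":  ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Overcast", "Sunny", "Rain"],
    "Humidity": ["High", "Normal", "High", "High", "Normal", "Normal", "High", "Normal"],
    "Wind":     ["Weak", "Strong", "Weak", "Strong", "Weak", "Strong", "Weak", "Weak"],
    "Play":     ["No", "Yes", "Yes", "No", "Yes", "Yes", "No", "Yes"],
})

# One-hot encode the categorical attributes so the tree can test them.
X = pd.get_dummies(data[["Outlook", "Humidity", "Wind"]])
y = data["Play"]

tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```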

Strengths and Weakness of the Decision Tree Approach


The strengths of decision tree methods are:

● Decision trees are able to generate understandable rules.


● Decision trees perform classification without requiring much computation.
● Decision trees are able to handle both continuous and categorical
variables.
● Decision trees provide a clear indication of which fields are most
important for prediction or classification.
The weaknesses of decision tree methods are:

● Decision trees are less appropriate for estimation tasks where the goal is
to predict the value of a continuous attribute.
● Decision trees are prone to errors in classification problems with many
classes and a relatively small number of training examples.
● Decision trees can be computationally expensive to train.

Expressing attribute test conditions


Decision tree induction algorithms must provide a method for specifying an
attribute test condition and its corresponding outcomes for different attribute types.
Binary Attributes − A binary attribute is a nominal attribute with only two
categories or states, 0 or 1, where 0 typically means that the attribute is
absent and 1 means that it is present. Binary attributes are called Boolean if
the two states correspond to true and false.
A binary attribute is symmetric if both of its states are equally valuable and
carry the same weight; there is no preference about which outcome should be
coded as 0 or 1. An example is the attribute gender with the states
male and female.
Nominal Attributes − Nominal means relating to names. The values of a
nominal attribute are symbols or names of things, and each value represents some
kind of category, code, or state. Nominal attributes are also referred to as
categorical. The values do not have any meaningful order. In computer
science, the values are also known as enumerations.
Ordinal Attributes − An ordinal attribute is an attribute whose possible values
have a meaningful order or ranking among them, but the magnitude
between successive values is not known.
Ordinal attributes can produce binary or multiway splits. Ordinal attribute
values can be grouped as long as the grouping does not violate the order
of the attribute values.
Numeric Attributes − A numeric attribute is quantitative: it is a measurable
quantity, represented in integer or real values. It can be interval-scaled or
ratio-scaled.
Measures for Selecting the Best Split
Node splitting, or simply splitting, is the process of dividing a node into
multiple sub-nodes to create relatively pure nodes. There are multiple ways
of doing this, which can be broadly divided into two categories based on the
type of target variable:

1. Continuous Target Variable:

o Reduction in Variance

2. Categorical Target Variable:

o Gini Impurity
o Information Gain
o Chi-Square

Decision Tree Splitting Method #1: Reduction in Variance

Reduction in Variance is a method for splitting a node that is used when the target
variable is continuous, i.e., for regression problems. It is so called because it
uses variance as the measure for deciding which feature a node is split on
into child nodes.

Variance measures the homogeneity of a node: if a node is entirely homogeneous,
its variance is zero.

Here are the steps to split a decision tree using reduction in variance:

1. For each split, individually calculate the variance of each child node
2. Calculate the variance of each split as the weighted average variance of
child nodes
3. Select the split with the lowest variance
4. Perform steps 1-3 until completely homogeneous nodes are achieved
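A minimal sketch of these steps (pure Python with NumPy; the target values and the candidate split below are made up for illustration) might look like this:

```python
import numpy as np

def variance(values):
    # Variance of the target values in a node (zero for a pure node).
    return float(np.var(values)) if len(values) > 0 else 0.0

def split_variance(left, right):
    # Weighted average variance of the two child nodes.
    n = len(left) + len(right)
    return len(left) / n * variance(left) + len(right) / n * variance(right)

# Hypothetical target values reaching a node, and one candidate split.
parent = [10, 12, 11, 30, 32, 31]
left, right = parent[:3], parent[3:]

print("parent variance:", variance(parent))
print("weighted variance after split:", split_variance(left, right))
# The split with the lowest weighted variance would be selected.
```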
Decision Tree Splitting Method #2: Gini Impurity / Gini Index

1. The Gini Index is a metric that measures how often a randomly chosen
element would be incorrectly identified.
2. This means an attribute with a lower Gini Index should be preferred.
3. Sklearn supports the "gini" criterion for the Gini Index, and it is the
default value.

The formula for calculating the Gini Index is:

Gini = 1 − Σ (pᵢ)²

where pᵢ is the probability of class i in the node.

The Gini Index is a measure of the impurity of a distribution, commonly used in
decision trees and other machine learning algorithms. It ranges from 0 to 1,
where 0 means the node is perfectly pure (all samples belong to the same class)
and values close to 1 indicate maximum impurity (the samples are spread across
many classes).

Some additional features and characteristics of the Gini Index are:

1. It is calculated by summing the squared probabilities of each outcome
in a distribution and subtracting the result from 1.
2. A lower Gini Index indicates a more homogeneous or pure distribution,
while a higher Gini Index indicates a more heterogeneous or impure
distribution.
3. In decision trees, the Gini Index is used to evaluate the quality of a split
by measuring the difference between the impurity of the parent node
and the weighted impurity of the child nodes.
4. Compared to other impurity measures like entropy, the Gini Index is
faster to compute and more sensitive to changes in class probabilities.
5. One disadvantage of the Gini Index is that it tends to favor splits that
create equally sized child nodes, even if they are not optimal for
classification accuracy.
6. In practice, the choice between using the Gini Index or other impurity
measures depends on the specific problem and dataset, and often
requires experimentation and tuning.
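As a small illustration (pure Python; the class counts are made up), the Gini impurity of a node and the weighted impurity of a candidate split can be computed like this:

```python
def gini(class_counts):
    # Gini = 1 - sum(p_i^2) over the classes present in the node.
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

def split_gini(children):
    # Weighted Gini impurity of the child nodes produced by a split.
    total = sum(sum(counts) for counts in children)
    return sum(sum(counts) / total * gini(counts) for counts in children)

print(gini([5, 5]))                   # 0.5: maximally impure two-class node
print(gini([10, 0]))                  # 0.0: pure node
print(split_gini([[8, 2], [1, 9]]))   # weighted impurity of a candidate split
```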
Decision Tree Splitting Method #3: Information Gain

Now, what if we have a categorical target variable? Reduction in variance
won't quite cut it.
The answer to that is Information Gain. Information Gain is used for splitting
the nodes when the target variable is categorical. It works on the concept of
entropy and is given by:

Information Gain = Entropy(parent) − weighted average Entropy(children)

Entropy is used for calculating the purity of a node: the lower the entropy, the
higher the purity of the node. The entropy of a homogeneous node is zero. Because
the weighted entropy of the children is subtracted from the entropy of the parent,
the information gain is highest for the purest splits. The entropy of a node is
calculated as:

Entropy = − Σ pᵢ log₂(pᵢ)

where pᵢ is the proportion of records in the node belonging to class i.

Steps to split a decision tree using Information Gain:

1. For each split, individually calculate the entropy of each child node
2. Calculate the entropy of each split as the weighted average entropy of
child nodes
3. Select the split with the lowest entropy or highest information gain
4. Until you achieve homogeneous nodes, repeat steps 1-3
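A minimal sketch of these steps (pure Python; the parent and child class counts are made up for one hypothetical split):

```python
import math

def entropy(class_counts):
    # Entropy = -sum(p_i * log2(p_i)); zero for a homogeneous node.
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts if c > 0)

def information_gain(parent_counts, children_counts):
    # Reduction in entropy achieved by splitting the parent node.
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child)
                   for child in children_counts)
    return entropy(parent_counts) - weighted

# Hypothetical node with 9 "Yes" and 5 "No" records, split into two children.
parent = [9, 5]
children = [[6, 1], [3, 4]]
print("information gain:", information_gain(parent, children))
```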

Decision Tree Splitting Method #4: Chi-Square

Chi-square is another method of splitting nodes in a decision tree, used for
datasets with a categorical target variable. It can produce two or more splits.
It works on the statistical significance of the differences between the parent
node and the child nodes. The Chi-Square value for a class in a child node is:

Chi-Square = √((Actual − Expected)² / Expected)

Here, Expected is the expected count for a class in a child node based on
the distribution of classes in the parent node, and Actual is the actual count
for that class in the child node.

The formula above gives the Chi-Square value for one class. Summing the
Chi-Square values over all classes in a node gives the Chi-Square for that node.
The higher the value, the larger the differences between parent and child
distributions, i.e., the higher the homogeneity of the child node.

Here are the steps to split a decision tree using Chi-Square:

1. For each split, individually calculate the Chi-Square value of each child
node by taking the sum of Chi-Square values for each class in a node
2. Calculate the Chi-Square value of each split as the sum of Chi-Square
values for all the child nodes
3. Select the split with the highest Chi-Square value
4. Until you achieve homogeneous nodes, repeat steps 1-3
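A small illustration of these steps (pure Python; the parent class distribution and child counts are made up, and the per-class value follows the √((Actual − Expected)²/Expected) form used above):

```python
import math

def chi_square_node(actual_counts, parent_probs):
    # Sum the per-class Chi-Square values for one child node.
    n = sum(actual_counts)
    value = 0.0
    for actual, p in zip(actual_counts, parent_probs):
        expected = n * p  # expected count under the parent distribution
        value += math.sqrt((actual - expected) ** 2 / expected)
    return value

def chi_square_split(children_counts, parent_probs):
    # The Chi-Square of a split is the sum over all child nodes.
    return sum(chi_square_node(c, parent_probs) for c in children_counts)

# Hypothetical parent with classes split 50/50, and one candidate split.
parent_probs = [0.5, 0.5]
children = [[8, 2], [1, 9]]
print("chi-square of split:", chi_square_split(children, parent_probs))
```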

Naïve Bayes Classifier Algorithm


o Naïve Bayes algorithm is a supervised learning algorithm, which is
based on Bayes theorem and used for solving classification problems.
o It is mainly used in text classification that includes a high-dimensional
training dataset.
o Naïve Bayes Classifier is one of the simplest and most effective
classification algorithms; it helps in building fast machine
learning models that can make quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the
probability of an object.
o Some popular applications of the Naïve Bayes algorithm are spam filtering,
sentiment analysis, and classifying articles.

Bayes' Theorem:
o Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is
used to determine the probability of a hypothesis with prior knowledge.
It depends on the conditional probability.
o The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) P(A) / P(B)

Where,

P(A|B) is the Posterior probability: the probability of hypothesis A given the observed event B.

P(B|A) is the Likelihood: the probability of the evidence B given that hypothesis A is true.

P(A) is the Prior probability: the probability of the hypothesis before observing the evidence.

P(B) is the Marginal probability: the probability of the evidence.

Working of Naïve Bayes' Classifier:

Working of Naïve Bayes' Classifier can be understood with the help of the
below example:
Suppose we have a dataset of weather conditions with a corresponding target
variable "Play". Using this dataset, we need to decide whether we should play
on a particular day according to the weather conditions. To solve this problem,
we need to follow the steps below:
1. Convert the given dataset into frequency tables.
2. Generate Likelihood table by finding the probabilities of given features.
3. Now, use Bayes theorem to calculate the posterior probability.

Problem: If the weather is sunny, should the player play or not?
Solution: To solve this, first consider the below dataset:

    Outlook   Play
0   Rainy     Yes
1   Sunny     Yes
2   Overcast  Yes
3   Overcast  Yes
4   Sunny     No
5   Rainy     Yes
6   Sunny     Yes
7   Overcast  Yes
8   Rainy     No
9   Sunny     No
10  Sunny     Yes
11  Rainy     No
12  Overcast  Yes
13  Overcast  Yes

Frequency table for the weather conditions:

Weather    Yes   No
Overcast    5    0
Rainy       2    2
Sunny       3    2
Total      10    4

Likelihood table for the weather conditions:

Weather    No            Yes            All
Overcast   0             5              5/14 = 0.35
Rainy      2             2              4/14 = 0.29
Sunny      2             3              5/14 = 0.35
All        4/14 = 0.29   10/14 = 0.71

Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3
P(Sunny) = 5/14 = 0.35
P(Yes) = 10/14 = 0.71
So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 = 0.60
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 4/14 = 0.29
P(Sunny) = 0.35
So P(No|Sunny) = 0.5 * 0.29 / 0.35 = 0.41
As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny).
Hence, on a sunny day the player can play the game.
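The same calculation can be reproduced in a few lines of code. This is a minimal sketch (pure Python, no intermediate rounding) that estimates the likelihood, prior, and evidence directly from the table above:

```python
# The weather dataset from the table above.
outlook = ["Rainy", "Sunny", "Overcast", "Overcast", "Sunny", "Rainy", "Sunny",
           "Overcast", "Rainy", "Sunny", "Sunny", "Rainy", "Overcast", "Overcast"]
play = ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes",
        "Yes", "No", "No", "Yes", "No", "Yes", "Yes"]

def posterior(weather, label):
    # P(label | weather) = P(weather | label) * P(label) / P(weather)
    n = len(play)
    n_label = play.count(label)
    n_weather = outlook.count(weather)
    n_both = sum(1 for o, p in zip(outlook, play) if o == weather and p == label)
    likelihood = n_both / n_label   # P(weather | label)
    prior = n_label / n             # P(label)
    evidence = n_weather / n        # P(weather)
    return likelihood * prior / evidence

print("P(Yes|Sunny) =", round(posterior("Sunny", "Yes"), 2))
print("P(No|Sunny)  =", round(posterior("Sunny", "No"), 2))
```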
Advantages of Naïve Bayes Classifier:
o Naïve Bayes is one of the fastest and easiest ML algorithms for predicting
the class of a dataset.
o It can be used for binary as well as multi-class classification.
o It performs well on multi-class prediction compared to many other
algorithms.
o It is the most popular choice for text classification problems.

Disadvantages of Naïve Bayes Classifier:


o Naive Bayes assumes that all features are independent or unrelated, so
it cannot learn the relationship between features.

Applications of Naïve Bayes Classifier:


o It is used for Credit Scoring.
o It is used in medical data classification.
o It can be used in real-time predictions because Naïve Bayes Classifier is
an eager learner.
o It is used in Text classification such as Spam filtering and Sentiment
analysis.

Bayesian Belief Networks


Bayesian classification is based on Bayes' Theorem. Bayesian classifiers
are statistical classifiers: they can predict class membership probabilities,
such as the probability that a given tuple belongs to a particular class.
Bayes' Theorem
Bayes' Theorem is named after Thomas Bayes. There are two types of
probabilities −
● Posterior Probability [P(H|X)]
● Prior Probability [P(H)]
where X is a data tuple and H is some hypothesis.
According to Bayes' Theorem,
P(H|X) = P(X|H) P(H) / P(X)
Bayesian Belief Network

Bayesian Belief Networks specify joint conditional probability distributions.
They are also known as Belief Networks, Bayesian Networks, or Probabilistic
Networks.

● A Belief Network allows class conditional independencies to be defined
between subsets of variables.
● It provides a graphical model of causal relationships on which learning
can be performed.
● We can use a trained Bayesian Network for classification.
There are two components that define a Bayesian Belief Network −
● Directed acyclic graph
● A set of conditional probability tables

Directed Acyclic Graph

● Each node in a directed acyclic graph represents a random variable.
● These variables may be discrete- or continuous-valued.
● These variables may correspond to actual attributes given in the data.

Directed Acyclic Graph Representation

The following diagram shows a directed acyclic graph for six Boolean variables.

The arcs in the diagram represent causal knowledge. For example, lung cancer
is influenced by a person's family history of lung cancer, as well as by whether
or not the person is a smoker. It is worth noting that the variable PositiveXray
is independent of whether the patient has a family history of lung cancer or is
a smoker, given that we know the patient has lung cancer.

Conditional Probability Table

The conditional probability table (CPT) for the variable LungCancer (LC) gives
the probability of each value of LC for every possible combination of the values
of its parent nodes, FamilyHistory (FH) and Smoker (S).
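As an illustrative sketch (the probability values below are made up for demonstration, not taken from the original table), such a CPT can be represented and queried in code:

```python
# Hypothetical CPT for P(LungCancer = yes | FamilyHistory, Smoker).
# Keys are (FamilyHistory, Smoker); the values are illustrative probabilities only.
cpt_lung_cancer = {
    ("yes", "yes"): 0.8,
    ("yes", "no"):  0.5,
    ("no",  "yes"): 0.7,
    ("no",  "no"):  0.1,
}

def p_lung_cancer(family_history, smoker, value="yes"):
    # Look up P(LC = value | FH, S) in the conditional probability table.
    p_yes = cpt_lung_cancer[(family_history, smoker)]
    return p_yes if value == "yes" else 1.0 - p_yes

print(p_lung_cancer("yes", "no"))        # P(LC = yes | FH = yes, S = no)
print(p_lung_cancer("no", "no", "no"))   # P(LC = no  | FH = no,  S = no)
```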

K-Nearest Neighbor (KNN) Classification Algorithm

What is KNN?

KNN (K-Nearest Neighbors) is one of many supervised learning algorithms used in
data mining and machine learning. It is a classifier algorithm in which the
learning is based on how similar one data point (a vector) is to the others.

How does it work?

KNN is pretty simple. Imagine that you have data about colored balls:

● Purple balls;

● Yellow balls;

● And a ball whose color you do not know, although you have all the other
data about it (everything except the color label).
So, how are you going to determine the ball's color? Imagine that, like a machine,
you have only the ball's characteristics (its data) but not its final label.
How would you decide the ball's color (the final label, i.e., its class)?

Note: Suppose that the rows labeled 1 (with class R) refer to the purple balls
and the rows labeled 2 (with class A) refer to the yellow balls; this is just to
make the explanation easier.

Each row refers to a ball and each column refers to one of the ball's
characteristics; in the last column we have the class (color) of each ball:

● R -> purple;

● A -> yellow

We have 5 balls (5 rows), each one with its classification. You could try to
discover the new ball's color (its class) in many ways; one of them is to compare
the new ball's characteristics with those of all the others and see which it
resembles most. If the data (characteristics) of the new ball, whose correct class
you do not know, are similar to the data of the yellow balls, then its color is
yellow; if its data are more similar to those of the purple balls than to the
yellow ones, then its color is purple. It sounds simple, and that is almost
exactly what KNN does, just in a more sophisticated way.
The KNN steps are:
1 — Receive an unclassified data point;
2 — Measure the distance (Euclidean, Manhattan, Minkowski, or weighted)
from the new data point to all the data points that are already classified;
3 — Get the K (K is a parameter that you define) smallest distances;
4 — Check the classes of the points with the smallest distances and count how
many times each class appears;
5 — Take as the correct class the class that appeared most often;
6 — Classify the new data point with the class chosen in step 5.

Calculating distance:
Calculating the distance between two points (your new sample and each of the
points already in your dataset) is very simple. As said before, there are several
ways to obtain this value; in this article we will use the Euclidean distance.

The Euclidean distance between two points a and b with p features is:

d(a, b) = √((a1 − b1)² + (a2 − b2)² + ⋯ + (ap − bp)²)
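Putting the steps together, a minimal from-scratch sketch of the KNN classifier (pure Python; the tiny ball dataset and its feature values are made up for illustration) could look like this:

```python
import math
from collections import Counter

def euclidean(a, b):
    # Euclidean distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(new_point, dataset, k=3):
    # dataset is a list of (feature_vector, class_label) pairs.
    # Steps 1-2: measure the distance from the new point to every labeled point.
    distances = [(euclidean(new_point, features), label)
                 for features, label in dataset]
    # Step 3: keep the k smallest distances.
    nearest = sorted(distances)[:k]
    # Steps 4-5: count the classes among the k neighbors, take the most frequent.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Tiny made-up dataset of balls: (features, color class).
balls = [([1.0, 1.1], "R"), ([0.9, 1.0], "R"), ([1.2, 0.8], "R"),
         ([3.0, 3.2], "A"), ([3.1, 2.9], "A")]

print(knn_classify([2.8, 3.0], balls, k=3))  # expected: "A" (yellow)
```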

Characteristics of KNN
Between-sample geometric distance
The k-nearest-neighbor classifier is commonly based on the Euclidean
distance between a test sample and the specified training samples. Let xi be
an input sample with p features (xi1, xi2, …, xip), let n be the total number of
input samples (i = 1, 2, …, n), and let p be the total number of features
(j = 1, 2, …, p). The Euclidean distance between samples xi and xl
(l = 1, 2, …, n) is defined as
d(xi, xl) = √((xi1 − xl1)² + (xi2 − xl2)² + ⋯ + (xip − xlp)²)

Classification decision rule and confusion matrix


Classification typically involves partitioning the samples into training and
testing sets. Let xi be a training sample and x be a test sample, let ω be the
true class of a training sample, and let ω̂ be the predicted class for a test
sample (ω, ω̂ = 1, 2, …, Ω), where Ω is the total number of classes. The
predictions for the test samples can be summarized in a confusion matrix, which
counts, for every pair of classes, how many samples of true class ω were
predicted as class ω̂.
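As a brief sketch (assuming scikit-learn is available; the label vectors below are made up for illustration), a confusion matrix can be computed from true and predicted labels as follows:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted class labels for six test samples.
y_true = ["yes", "yes", "no", "no", "yes", "no"]
y_pred = ["yes", "no",  "no", "no", "yes", "yes"]

# Rows correspond to the true class, columns to the predicted class.
print(confusion_matrix(y_true, y_pred, labels=["yes", "no"]))
```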
Feature transformation
Increased performance of a classifier can sometimes be achieved when the
feature values are transformed prior to classification analysis. Two commonly
used feature transformations are standardization and fuzzification.
Standardization removes scale effects caused by use of features with
different measurement scales. For example, if one feature is based on patient
weight in units of kg and another feature is based on blood protein values in
units of ng/dL in the range [-3,3], then patient weight will have a much
greater influence on the distance between samples and may bias the
performance of the classifier. Standardization transforms raw feature values
into z-scores using the mean and standard deviation of a feature's values over
all input samples, given by the relationship
zij = (xij − μj) / σj,
where xij is the value of the jth feature for the ith sample, μj is the mean of
all xij for feature j, and σj is the standard deviation of all xij for feature j
over all input samples.
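A minimal sketch of z-score standardization (NumPy only; the feature matrix is made up for illustration, with two features on very different scales):

```python
import numpy as np

# Hypothetical raw feature matrix: rows are samples, columns are features
# (e.g., weight in kg and a blood protein value on a much smaller scale).
X = np.array([[70.0, 1.2],
              [82.0, -0.5],
              [65.0, 2.1],
              [90.0, 0.3]])

mu = X.mean(axis=0)       # mean of each feature over all samples
sigma = X.std(axis=0)     # standard deviation of each feature

Z = (X - mu) / sigma      # z_ij = (x_ij - mu_j) / sigma_j
print(Z)
```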
Performance assessment with cross-validation
A basic rule in classification analysis is that class predictions are not made for
data samples that were used for training or learning. If class predictions are
made for samples used in training, the accuracy estimate will be artificially
biased upward. Instead, class predictions are made for samples that are kept
out of the training process.
The performance of most classifiers is typically evaluated
through cross-validation, which involves determining classification
accuracy for multiple partitions of the input samples used in training. For
example, during 5-fold (κ = 5) cross-validation, the set of input samples
is split into 5 partitions D1, D2, …, D5 of equal size, to the extent possible.
Ensuring uniform class representation among the partitions is called stratified
cross-validation, which is preferred. To begin 5-fold cross-validation, the
samples in partitions D2, D3, …, D5 are first used for training while the
samples in partition D1 are used for testing. Next, the samples in partitions
D1, D3, …, D5 are used for training and the samples in partition D2 are used
for testing. This is repeated until each partition has been used exactly once
for testing. It is also customary to re-partition all of the input samples
several times (e.g., 10 times) in order to get a better estimate of accuracy.
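As a concluding sketch (assuming scikit-learn and its bundled Iris dataset; the classifier and parameter choices are illustrative), stratified 5-fold cross-validation of a k-nearest-neighbor classifier might look like this:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_accuracies = []

for train_idx, test_idx in skf.split(X, y):
    # Train on four partitions, test on the held-out partition.
    knn = KNeighborsClassifier(n_neighbors=3)
    knn.fit(X[train_idx], y[train_idx])
    fold_accuracies.append(knn.score(X[test_idx], y[test_idx]))

print("per-fold accuracy:", np.round(fold_accuracies, 3))
print("mean accuracy:", np.mean(fold_accuracies))
```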
