TE - DWM Module No 3
CLASSIFICATION
General Approach
The data classification process has two steps:
(a) Learning:
• Training data are analyzed by a classification algorithm.
• Here, the class label attribute is loan decision, and the learned model or classifier is represented in the form of classification rules.
(b) Classification:
• Test data are used to estimate the accuracy of the classification rules.
• If the accuracy is considered acceptable, the rules can be applied to the classification of new data tuples.
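A minimal sketch of this two-step process in Python with scikit-learn; the data set, attribute values, and the loan-decision labels are hypothetical:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Step 1 (Learning): hypothetical training tuples (age, income) with a
# known loan-decision class label (0 = risky, 1 = safe).
X_train = [[25, 30000], [40, 70000], [35, 50000], [50, 90000]]
y_train = [0, 1, 0, 1]
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2 (Classification): estimate accuracy on held-out test tuples;
# if it is acceptable, apply the model to new, unlabeled tuples.
X_test, y_test = [[30, 40000], [45, 80000]], [0, 1]
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```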
Decision Tree Induction
• Decision tree induction is the learning of decision trees from class-labeled training tuples.
• A decision tree is a flowchart-like tree structure, where each internal node (nonleaf node) denotes a test on an attribute.
• Each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label.
• The topmost node in a tree is the root node.
• A typical decision tree is shown in the accompanying figure (not reproduced here).
DECISION TREE INDUCTION
“How are decision trees used for classification?”
• Given a tuple, X, for which the associated class label is unknown,
the attribute values of the tuple are tested against the decision tree.
• A path is traced from the root to a leaf node, which holds the class
prediction for that tuple.
• Decision trees can easily be converted to classification rules.
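As a sketch of this traversal, the snippet below fits a small tree with scikit-learn and prints it with export_text; each root-to-leaf path ends in a class prediction (the attribute names and data are hypothetical):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical tuples: (age, income); class label: buys_computer (0/1).
X = [[22, 30], [25, 45], [47, 60], [52, 28], [33, 80], [60, 90]]
y = [0, 0, 1, 0, 1, 1]

clf = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(clf, feature_names=["age", "income"]))

# Classifying a new tuple traces one root-to-leaf path of this tree.
print("prediction for (30, 55):", clf.predict([[30, 55]]))
```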
RULE EXTRACTION FROM A DECISION TREE
• Decision tree classifiers are a popular method of classification.
• To extract rules from a decision tree, one rule is created for each path from the root to a leaf node.
• Each splitting criterion along a given path is logically ANDed to form the rule antecedent (the “IF” part).
• The leaf node holds the class prediction, forming the rule consequent (the “THEN” part).
• For example, a path might yield the rule: IF age = youth AND student = yes THEN buys_computer = yes.
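A minimal sketch of this extraction for a scikit-learn tree; the helper extract_rules below is illustrative, not a library function:

```python
from sklearn.tree import DecisionTreeClassifier

def extract_rules(clf, feature_names, class_names):
    """Create one IF ... THEN ... rule per root-to-leaf path."""
    t = clf.tree_
    rules = []

    def walk(node, conds):
        if t.children_left[node] == -1:  # leaf: emit the finished rule
            label = class_names[t.value[node][0].argmax()]
            rules.append("IF " + " AND ".join(conds) + f" THEN class = {label}")
            return
        name, thr = feature_names[t.feature[node]], t.threshold[node]
        # Each split along the path is logically ANDed into the antecedent.
        walk(t.children_left[node], conds + [f"{name} <= {thr:.1f}"])
        walk(t.children_right[node], conds + [f"{name} > {thr:.1f}"])

    walk(0, [])
    return rules

# Hypothetical data: (age, student) -> buys_computer.
clf = DecisionTreeClassifier(max_depth=2).fit(
    [[22, 0], [25, 1], [47, 1], [52, 0]], ["no", "yes", "yes", "no"])
for rule in extract_rules(clf, ["age", "student"], clf.classes_):
    print(rule)
```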
BAYES CLASSIFICATION METHODS: NAIVE BAYES CLASSIFICATION
• Bayesian classifiers are statistical classifiers.
• They can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class.
• A simple Bayesian classifier, known as the naive Bayesian classifier, has been found to be comparable in performance with decision tree classifiers.
• Naive Bayesian classifiers assume that the effect of an attribute
value on a given class is independent of the values of the other
attributes. This assumption is called class conditional independence.
• It is made to simplify the computations involved and, in this sense,
is considered “naive.”
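In symbols, the classifier rests on Bayes’ theorem; these are the standard formulas, with tuple X = (x1, ..., xn) and classes C1, ..., Cm, and the tuple is assigned to the class Ci with the highest posterior:

```latex
% Bayes' theorem: posterior probability that tuple X belongs to class C_i
P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}

% Class conditional independence (the "naive" assumption):
P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i)
```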
PREDICTING A CLASS LABEL USING NAIVE BAYESIAN CLASSIFICATION
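The worked example from the original slides is not reproduced here; as a stand-in, this minimal sketch predicts a class label with scikit-learn’s CategoricalNB on a tiny hypothetical data set:

```python
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical training tuples: (age, student) -> buys_computer.
X_raw = [["youth", "no"], ["youth", "yes"], ["middle", "yes"],
         ["senior", "yes"], ["senior", "no"], ["middle", "no"]]
y = ["no", "yes", "yes", "yes", "no", "yes"]

enc = OrdinalEncoder()                   # encode categories as integers
X = enc.fit_transform(X_raw)

nb = CategoricalNB(alpha=1.0).fit(X, y)  # alpha=1.0 adds Laplace smoothing

# Predict the class label for a new, unlabeled tuple.
X_new = enc.transform([["youth", "yes"]])
print(nb.predict(X_new), nb.predict_proba(X_new))
```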
MODEL EVALUATION AND SELECTION
• Now that we know what classification is and how classifiers work, we can build a classification model.
• For example, suppose you used previous sales data to build a classifier to predict customer purchasing behaviour.
• In this example, we would like to analyse how well our model can predict the purchasing behaviour of future customers (data on which the classifier has not been trained).
• We may build different classifiers and compare their accuracy/performance by applying various evaluation metrics.
• Before we discuss the various evaluation metrics, we need to understand some basic terminology.
MODEL EVALUATION AND SELECTION
• MODEL: a model is created by applying an algorithm (or statistical calculations) to data to generate predictions/classifications of new data.
• The given data set is partitioned into subsets:
• Training data set
• Testing data set
• Training data set: used to derive, or train, the model.
• Testing data set: used to estimate the model’s accuracy.
MODEL EVALUATION AND SELECTION
• Positive tuples: tuples of the class of interest (in our last example, the positive tuples are those with buys_computer = yes).
• Negative tuples: tuples of the other class (in our last example, the negative tuples are those with buys_computer = no).
• Suppose we use our classifier on a test set of labeled tuples.
• P is the number of positive tuples and N is the number of negative tuples.
• For each tuple, we compare the classifier’s class attribute prediction with the tuple’s known class attribute value.
MODEL EVALUATION AND SELECTION
There are four additional terms we need to know:
• True positives (TP): These refer to the positive tuples that were correctly labeled by the classifier. Let TP be the number of true positives.
• True negatives (TN): These are the negative tuples that were correctly labeled by the classifier. Let TN be the number of true negatives.
• False positives (FP), Type I error: These are the negative tuples that were incorrectly labeled as positive (e.g., tuples of class buys_computer = no for which the classifier predicted buys_computer = yes). Let FP be the number of false positives.
• False negatives (FN), Type II error: These are the positive tuples that were mislabeled as negative (e.g., tuples of class buys_computer = yes for which the classifier predicted buys_computer = no). Let FN be the number of false negatives.
CONFUSION MATRIX
• The confusion matrix is a useful tool for analyzing how well your classifier can recognize tuples of different classes.
• TP and TN tell us when the classifier is getting things right, while FP and FN tell us when it is getting things wrong.
CONFUSION MATRIX
• E.g., suppose that in a data set of customers who buy a computer there are 10000 tuples in total, of which 7000 are positive and 3000 are negative, and our model has predicted 6954 of the positives as positive and 2588 of the negatives as negative. Prepare the confusion matrix.
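Assuming, as the numbers suggest, that the 6954 are the correctly predicted positives (TP) and the 2588 are the correctly predicted negatives (TN), the matrix works out as:

                  Predicted: yes   Predicted: no   Total
Actual: yes (P)   TP = 6954        FN = 46         7000
Actual: no  (N)   FP = 412         TN = 2588       3000
Total             7366             2634            10000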
CLASSIFIERS PERFORMANCE EVALUATION MEASURES
• Find all evaluation measures for the following confusion matrix (matrix figure not reproduced here).
CONFUSION MATRIX
• E.g., suppose that in a cancer data set there are 10000 tuples in total, of which 300 are positive and 9700 are negative, and our model has predicted 90 of the positives as positive and 9560 of the negatives as negative. Prepare the confusion matrix and find all evaluation measures for it (a worked solution follows the list of measures below).
Evaluation measures for the confusion matrix
1. Accuracy: proportion of tuples correctly classified; accuracy = (TP + TN) / (P + N)
2. Error rate: proportion of tuples misclassified; error rate = (FP + FN) / (P + N) = 1 - accuracy
3. Sensitivity (recall): ability to correctly label the positives as positive; sensitivity = TP / P
4. Specificity: ability to correctly label the negatives as negative; specificity = TN / N
5. Precision: percentage of tuples labelled positive that are actually positive; precision = TP / (TP + FP)
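Applying these formulas to the cancer example above (assuming the 90 are correctly predicted positives, so TP = 90, FN = 300 - 90 = 210, TN = 9560, FP = 9700 - 9560 = 140):
• Accuracy = (90 + 9560) / 10000 = 96.5%
• Error rate = (140 + 210) / 10000 = 3.5%
• Sensitivity = 90 / 300 = 30%
• Specificity = 9560 / 9700 ≈ 98.6%
• Precision = 90 / (90 + 140) ≈ 39.1%
Despite the high accuracy, the low sensitivity shows the model misses most actual cancer cases, which is why accuracy alone can mislead on imbalanced data.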
MODEL EVALUATION AND SELECTION METHODS
1. Holdout
2. Random sampling
3. Cross validation
4. Bootstrap
5. ROC Curves (Receiver operating characteristic curves)
HOLDOUT
• In this method, the given data are randomly partitioned into two independent sets, a training set and a test set.
• Typically, two-thirds of the data are allocated to the training set, and the remaining one-third is allocated to the test set.
• The training set is used to derive the model. The model’s accuracy is then estimated with the test set.
• The estimate is pessimistic (it tends to understate accuracy) because only a portion of the initial data is used to derive the model.
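A minimal holdout sketch with scikit-learn (the data set is hypothetical):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)  # hypothetical data

# Randomly partition: two-thirds training, one-third test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))
```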
RANDOM SUBSAMPLING
• Random subsampling is a variation of the holdout method in which the holdout method is repeated k times.
• The overall accuracy estimate is taken as the average of the accuracies obtained from each iteration.
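A sketch of k holdout repetitions with the accuracies averaged (hypothetical data, as in the holdout snippet above):

```python
from statistics import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)

k, accs = 10, []
for i in range(k):                              # repeat holdout k times
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=1/3, random_state=i)    # fresh random split each time
    accs.append(DecisionTreeClassifier().fit(X_tr, y_tr).score(X_te, y_te))

print("random subsampling accuracy estimate:", mean(accs))
```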
CROSS-VALIDATION
• In k-fold cross-validation, the initial data are randomly partitioned into k mutually exclusive subsets or “folds,” D1, D2, ..., Dk, each of approximately equal size.
• Training and testing are performed k times. In iteration i, partition Di is reserved as the test set, and the remaining partitions are collectively used to train the model.
• That is, in the first iteration, subsets D2, ..., Dk collectively serve as the training set to obtain a first model, which is tested on D1.
• The second iteration is trained on subsets D1, D3, ..., Dk and tested on D2, and so on.
• Each fold is used k - 1 times for training and exactly once for testing.
• The accuracy estimate is the overall number of correct classifications from the k iterations, divided by the total number of tuples in the initial data.
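A sketch of 10-fold cross-validation using scikit-learn’s cross_val_score (hypothetical data):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)  # hypothetical data

# cv=10: the data are split into 10 folds; each fold serves once as test set.
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10)
print("per-fold accuracies:", scores)
print("cross-validation accuracy estimate:", scores.mean())
```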
BOOTSTRAP
• Bootstrap randomly selects a tuple from the original data set.
• That tuple is added to the training data set and is then returned to the original data set (i.e., sampling with replacement).
• This process is repeated N times, where N is the total number of tuples in the original data set.
• Because sampling is with replacement, the bootstrap may select the same tuple more than once.
• We use the training data set to train the model; the tuples that were never selected form the test data set, which is used to obtain an accuracy estimate of the model.
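A minimal sketch of one bootstrap round with NumPy, using the never-selected (out-of-bag) tuples as the test set (hypothetical data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)  # hypothetical data
N = len(X)

rng = np.random.default_rng(0)
idx = rng.integers(0, N, size=N)        # N draws WITH replacement
oob = np.setdiff1d(np.arange(N), idx)   # out-of-bag tuples, never selected

model = DecisionTreeClassifier().fit(X[idx], y[idx])
print("bootstrap (out-of-bag) accuracy:", model.score(X[oob], y[oob]))
```

On average about 63.2% of the original tuples appear in a bootstrap sample, so roughly 36.8% remain out-of-bag for testing.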