Classification and Prediction

The document discusses classification and prediction, describing classification as predicting categorical class labels by constructing a model based on training data, while regression models continuous functions. It covers issues in classification like data preparation and model evaluation, and describes decision tree induction as a method for classification that generates trees to partition data based on attribute tests at internal nodes.

Classification and Prediction
 What is classification? What is regression?
 Issues regarding classification and prediction
 Classification by decision tree induction
 Scalable decision tree induction
Classification vs. Prediction
 Classification:
  predicts categorical class labels
  constructs a model from the training set and the values (class labels) of a classifying attribute, and uses the model to classify new data
 Regression:
  models continuous-valued functions, i.e., predicts unknown or missing values
 Typical applications:
  credit approval
  target marketing
  medical diagnosis
  treatment effectiveness analysis
Why Classification? A Motivating Application
 Credit approval
  A bank wants to classify its customers according to whether they are expected to pay back their approved loans
  The history of past customers is used to train the classifier
  The classifier provides rules that identify potentially reliable future customers
 Classification rule:
  IF age = "31…40" AND income = high THEN credit_rating = excellent
 Future customers
  Paul: age = 35, income = high ⇒ excellent credit rating
  John: age = 20, income = medium ⇒ fair credit rating
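A minimal sketch of this rule in code; the fallback to a "fair" rating for customers the rule does not cover is an assumption for illustration, not stated on the slide:

```python
def credit_rating(age, income):
    """Apply the slide's classification rule.

    Customers not matched by the rule fall back to a default class
    of 'fair' (an illustrative assumption)."""
    if 31 <= age <= 40 and income == "high":
        return "excellent"
    return "fair"

print(credit_rating(35, "high"))    # Paul  -> excellent
print(credit_rating(20, "medium"))  # John  -> fair
```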
Classification—A Two-Step Process
 Model construction: describing a set of predetermined classes
  Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  The set of tuples used for model construction is the training set
  The model is represented as classification rules, decision trees, or mathematical formulae
 Model usage: classifying future or unknown objects
  Estimate the accuracy of the model
  The known label of each test sample is compared with the classified result from the model
  The accuracy rate is the percentage of test set samples that are correctly classified by the model
  The test set is independent of the training set; otherwise over-fitting will occur
Classification Process (1): Model Construction

Training data are fed into a classification algorithm, which produces the classifier (model):

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Resulting model:
IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’
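As a sketch, the learned rule can be replayed against the training data to confirm that it reproduces every known label:

```python
# Training set from the slide: (name, rank, years, tenured)
train = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor",      2, "yes"),
    ("Jim",  "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]

def tenured(rank, years):
    # IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
    return "yes" if rank == "Professor" or years > 6 else "no"

# The model fits the training data: every training label is reproduced
assert all(tenured(rank, years) == label for _, rank, years, label in train)
print("rule reproduces all 6 training labels")
```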
Classification Process (2): Use the Model in Prediction

The classifier is first applied to testing data to estimate its accuracy, then to unseen data:

Testing data:
NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Mellisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4) ⇒ Tenured?
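Continuing the sketch, applying the same rule (IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’) to the testing data yields the accuracy estimate, after which the model can be used on the unseen sample:

```python
# Testing set from the slide: (name, rank, years, known label)
test = [
    ("Tom",     "Assistant Prof", 2, "no"),
    ("Mellisa", "Associate Prof", 7, "no"),
    ("George",  "Professor",      5, "yes"),
    ("Joseph",  "Assistant Prof", 7, "yes"),
]

def predict(rank, years):
    return "yes" if rank == "Professor" or years > 6 else "no"

# Compare known labels with the model's output on the independent test set
correct = sum(predict(rank, years) == label for _, rank, years, label in test)
accuracy = correct / len(test)
print(f"accuracy = {accuracy:.0%}")   # Mellisa is misclassified: 75%

# Unseen sample: (Jeff, Professor, 4)
print(predict("Professor", 4))        # yes
```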
Supervised vs. Unsupervised Learning
 Supervised learning (classification)
  Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
  New data are classified based on the training set
 Unsupervised learning (clustering)
  The class labels of the training data are unknown
  Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Issues Regarding Classification and Prediction (1): Data Preparation
 Data cleaning
  Preprocess data in order to reduce noise and handle missing values
 Relevance analysis (feature selection)
  Remove irrelevant or redundant attributes
 Data transformation
  Generalize and/or normalize data
  e.g., numerical attribute income ⇒ categorical {low, medium, high}
  e.g., normalize all numerical attributes to [0, 1)
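Both transformations can be sketched as follows; the income cut-offs and the scaling trick that keeps values strictly below 1 are illustrative assumptions, not prescribed by the slide:

```python
def normalize(values):
    """Min-max scale numeric values into [0, 1)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0          # avoid division by zero for constant columns
    # Inflate the denominator slightly so the maximum stays strictly below 1
    return [(v - lo) / (span * (1 + 1e-9)) for v in values]

def discretize_income(income):
    """Generalize numeric income into {low, medium, high} (cut-offs assumed)."""
    if income < 30_000:
        return "low"
    if income < 70_000:
        return "medium"
    return "high"

incomes = [25_000, 48_000, 95_000]
print([discretize_income(i) for i in incomes])  # ['low', 'medium', 'high']
print(normalize(incomes))
```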
Issues Regarding Classification and Prediction (2): Evaluating Classification Methods
 Predictive accuracy
 Speed
  time to construct the model
  time to use the model
 Robustness
  handling noise and missing values
 Scalability
  efficiency for disk-resident databases
 Interpretability
  understanding and insight provided by the model
 Goodness of rules (quality)
  decision tree size
  compactness of classification rules
Classification by Decision Tree Induction
 Decision tree
  A flow-chart-like tree structure
  Each internal node denotes a test on an attribute
  Each branch represents an outcome of the test
  Leaf nodes represent class labels or class distributions
 Decision tree generation consists of two phases
  Tree construction
  At the start, all the training examples are at the root
  Partition the examples recursively based on selected attributes
  Tree pruning
  Identify and remove branches that reflect noise or outliers
 Use of a decision tree: classifying an unknown sample
  Test the attribute values of the sample against the decision tree
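The construction phase can be illustrated with a toy recursive splitter. This sketch selects attributes by information gain (as in Quinlan's ID3) and omits the pruning phase; the dataset and attribute names are illustrative:

```python
from collections import Counter
from math import log2

def entropy(rows, target):
    """Entropy of the class label distribution over `rows`."""
    counts = Counter(r[target] for r in rows)
    total = len(rows)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def best_attribute(rows, attrs, target):
    """Pick the attribute with the highest information gain."""
    def gain(a):
        values = Counter(r[a] for r in rows)
        remainder = sum(
            (n / len(rows)) * entropy([r for r in rows if r[a] == v], target)
            for v, n in values.items())
        return entropy(rows, target) - remainder
    return max(attrs, key=gain)

def build_tree(rows, attrs, target):
    """Tree construction: recursively partition on selected attributes."""
    labels = {r[target] for r in rows}
    if len(labels) == 1:                      # pure node -> leaf
        return labels.pop()
    if not attrs:                             # no attributes left -> majority leaf
        return Counter(r[target] for r in rows).most_common(1)[0][0]
    a = best_attribute(rows, attrs, target)   # attribute test at an internal node
    rest = [x for x in attrs if x != a]
    return (a, {v: build_tree([r for r in rows if r[a] == v], rest, target)
                for v in sorted({r[a] for r in rows})})

# Tiny illustrative dataset
data = [
    {"income": "high", "student": "no",  "buys": "no"},
    {"income": "high", "student": "yes", "buys": "yes"},
    {"income": "low",  "student": "no",  "buys": "yes"},
    {"income": "low",  "student": "yes", "buys": "yes"},
]
print(build_tree(data, ["income", "student"], "buys"))
# ('income', {'high': ('student', {'no': 'no', 'yes': 'yes'}), 'low': 'yes'})
```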
Training Dataset

This follows an example from Quinlan's ID3:

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no
Output: A Decision Tree for “buys_computer”

age?
├── <=30  → student?
│           ├── no  → no
│           └── yes → yes
├── 31…40 → yes
└── >40   → credit_rating?
            ├── excellent → no
            └── fair      → yes
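Written out as nested attribute tests, the tree above classifies an unknown sample as follows (a direct transcription of this one tree, not a general implementation):

```python
def buys_computer(age, student, credit_rating):
    """Walk the decision tree: test each attribute at an internal node."""
    if age == "<=30":
        return "yes" if student == "yes" else "no"
    if age == "31…40":
        return "yes"
    # age > 40: credit rating decides
    return "yes" if credit_rating == "fair" else "no"

print(buys_computer("<=30", "yes", "fair"))      # yes
print(buys_computer(">40", "no", "excellent"))   # no
```

This tree reproduces the label of every row in the training dataset above.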
Scalable Decision Tree Induction Methods
 SLIQ (EDBT’96 — Mehta et al.)
  Builds an index for each attribute; only the class list and the current attribute list reside in memory
 SPRINT (VLDB’96 — J. Shafer et al.)
  Constructs an attribute-list data structure
 PUBLIC (VLDB’98 — Rastogi & Shim)
  Integrates tree splitting and tree pruning: stops growing the tree earlier
 RainForest (VLDB’98 — Gehrke, Ramakrishnan & Ganti)
  Builds an AVC-list (attribute, value, class label)
 BOAT (PODS’99 — Gehrke, Ganti, Ramakrishnan & Loh)
  Uses bootstrapping to create several small samples
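RainForest's idea is that the (attribute value, class label) counts for one attribute fit in memory even when the full dataset does not; a minimal sketch of building such a count table (the row data are illustrative):

```python
from collections import defaultdict

def avc_set(rows, attr, target):
    """Count (attribute value, class label) pairs for a single attribute."""
    counts = defaultdict(lambda: defaultdict(int))
    for r in rows:          # one sequential scan; only the counts are kept
        counts[r[attr]][r[target]] += 1
    return {v: dict(c) for v, c in counts.items()}

rows = [
    {"age": "<=30",  "buy": "no"},
    {"age": "<=30",  "buy": "yes"},
    {"age": "31…40", "buy": "yes"},
    {"age": ">40",   "buy": "no"},
]
print(avc_set(rows, "age", "buy"))
# {'<=30': {'no': 1, 'yes': 1}, '31…40': {'yes': 1}, '>40': {'no': 1}}
```

Split criteria such as information gain need only these counts, not the raw tuples, which is what makes the method scale to disk-resident data.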
