Lecture 8

Uploaded by

Mrawan Taha

Lec. 8
Computational Tools (4170201)
Classification
• Classification (supervised learning) is a form of data analysis that extracts models describing important data classes.
• Such analysis can help provide us with a better understanding of large data sets.
• Recent data science research has built on such work, developing scalable classification and prediction techniques capable of handling large amounts of disk-resident data.
General Approach to Classification
• Data classification is a two-step process, consisting of:
 a learning step (where a classification model is constructed), and
 a classification step (where the model is used to predict class labels for given data).
Classification—A Two-Step Process:
• Model construction (learning step): describing a set of predetermined classes.
 Each sample is assumed to belong to a predefined class, as determined by the class label attribute.
 The set of samples used for model construction is the training set.
 The model is represented as classification rules, decision trees, or mathematical formulae.
Classification—A Two-Step Process
• Model usage: for classifying future or unknown objects.
 Estimate the accuracy of the model:
 The known label of each test sample is compared with the classification result from the model.
 The accuracy rate is the percentage of test-set samples that are correctly classified by the model.
 The test set is independent of the training set.
 If the accuracy is acceptable, use the model to classify new data.
• Note: if the test set is used to select models, it is called a validation (test) set.
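The accuracy rate described above is simple to compute; a minimal sketch in Python (the labels shown are made up for illustration):

```python
def accuracy(true_labels, predicted_labels):
    """Accuracy rate: percentage of test-set samples whose predicted
    label matches the known label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)

# Hypothetical known labels of a test set vs. the model's predictions
known = ["yes", "no", "yes", "no"]
predicted = ["yes", "no", "no", "no"]
print(accuracy(known, predicted))  # 75.0
```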
Process (1): Model Construction

[Figure: the training data is fed to a classification algorithm, which produces the classifier (model).]

Training data:

NAME | RANK           | YEARS | TENURED
Mike | Assistant Prof | 3     | no
Mary | Assistant Prof | 7     | yes
Bill | Professor      | 2     | yes
Jim  | Associate Prof | 7     | yes
Dave | Assistant Prof | 6     | no
Anne | Associate Prof | 3     | no

Learned classifier (model):
IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
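The learned rule in the figure can be sketched as a function and checked against the training set (the rank strings are lower-cased here for illustration):

```python
def tenured_rule(rank, years):
    """The classifier learned in the figure:
    IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank == "professor" or years > 6 else "no"

# Training set from the figure
training_set = [
    ("Mike", "assistant prof", 3, "no"),
    ("Mary", "assistant prof", 7, "yes"),
    ("Bill", "professor", 2, "yes"),
    ("Jim", "associate prof", 7, "yes"),
    ("Dave", "assistant prof", 6, "no"),
    ("Anne", "associate prof", 3, "no"),
]
# The rule reproduces every training label
print(all(tenured_rule(rank, years) == label
          for _, rank, years, label in training_set))  # True
```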
Process (2): Using the Model in Prediction

[Figure: the classifier is applied first to the testing data, then to unseen data.]

Testing data:

NAME    | RANK           | YEARS | TENURED
Tom     | Assistant Prof | 2     | no
Merlisa | Associate Prof | 7     | no
George  | Professor      | 5     | yes
Joseph  | Assistant Prof | 7     | yes

Unseen data: (Jeff, Professor, 4) → Tenured?
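Applying the same rule from the model-construction step to this test set shows one misclassification (Merlisa), giving 75% test accuracy, and then predicts a label for the unseen sample (Jeff, Professor, 4); a sketch:

```python
def tenured_rule(rank, years):
    # rule produced in the model-construction step
    return "yes" if rank == "professor" or years > 6 else "no"

# Test set from the figure (rank strings lower-cased for illustration)
test_set = [
    ("Tom", "assistant prof", 2, "no"),
    ("Merlisa", "associate prof", 7, "no"),
    ("George", "professor", 5, "yes"),
    ("Joseph", "assistant prof", 7, "yes"),
]
correct = sum(tenured_rule(rank, years) == label
              for _, rank, years, label in test_set)
print(100.0 * correct / len(test_set))  # 75.0 (Merlisa is misclassified)

# Unseen data: (Jeff, Professor, 4)
print(tenured_rule("professor", 4))  # yes
```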
Classification
- The primary task performed by classifiers is to assign labels to objects.
- Labels in classification are pre-determined, unlike in clustering, where we discover the structure and then assign labels.
- Classification problems are supervised learning methods.

Example classification techniques (methods):
Decision Trees
Classification Basic Concepts:
Decision Trees
Decision Trees
• Decision trees are a flexible method very commonly deployed in classification applications.
• There are two types of trees: classification trees and regression (or prediction) trees.
• Classification trees (the kind we will use in these slides) are used to segment observations into more homogeneous groups (assign class labels). They usually apply to outcomes that are binary or categorical in nature.
• Regression trees are variations of regression: what is returned in each node is the average value at that node (a type of step function with which the average value can be computed). Regression trees can be applied to outcomes that are continuous (like account spend or personal income).
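The step-function behaviour of a regression tree comes from its leaf computation alone; a minimal sketch (the outcome values are made up):

```python
def leaf_prediction(outcomes_in_leaf):
    """A regression-tree leaf returns the average of the continuous
    outcomes of the training observations that reached it, so the whole
    tree behaves like a step function over the input space."""
    return sum(outcomes_in_leaf) / len(outcomes_in_leaf)

# e.g. personal incomes of the observations that fell into one leaf
print(leaf_prediction([48000.0, 52000.0, 50000.0]))  # 50000.0
```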
Decision Tree Classifier - What is it?
• Used for classification:
 Input variables can be continuous or discrete.
• Output:
 A tree that describes the decision flow.
 Leaf nodes return either a probability score or simply a classification.
 Trees can be converted to a set of "decision rules", e.g.:
"IF income < $50,000 AND mortgage_amt > $100K THEN default = T with 75% probability"
Classification
• Classification: assign labels to objects.
• Usually supervised: a training set of pre-classified examples.
• Example classification techniques (methods):
 Decision Trees (and Regression)
[Figure: a tree with the root (a parent node) at the top, child nodes below it, and branches connecting them.]
Trees
• A tree is a hierarchical data structure consisting of:
 Nodes – store information
 Branches – connect the nodes
• The top node is the root, occupying the highest level of the hierarchy.
• The leaves are at the bottom, occupying the lowest level of the hierarchy.
• Every node, except the root, has exactly one parent.
• Every node may have zero or more child nodes.
• A binary tree restricts the number of children per node to a maximum of two.
• A degenerate tree has only a single pathway from the root to its one leaf.
• In a binary tree, each node may have a left child and a right child.
• If you start from any node and move upward, you will eventually reach the root.
• Depth: the path length from the root of the tree to a given node.
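The definitions above (parent links, moving upward to the root, depth) can be sketched as a small binary-tree class; the names are illustrative:

```python
class Node:
    """A binary-tree node: stores a value and has at most two children."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
        self.parent = None  # every node except the root has exactly one parent
        for child in (left, right):
            if child is not None:
                child.parent = self

def depth(node):
    """Path length from the root of the tree down to this node."""
    d = 0
    while node.parent is not None:  # moving upward always reaches the root
        node = node.parent
        d += 1
    return d

leaf = Node("leaf")
root = Node("root", Node("inner", leaf), Node("other leaf"))
print(depth(root), depth(leaf))  # 0 2
```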
Creating a Decision Tree
Consider a scenario where a new planet is discovered by a group of astronomers. The question is whether it could be 'the next Earth'.

The decision factors can be: what the temperature is, whether water is present on the planet, whether the surface is prone to continuous storms, whether flora and fauna survive the climate, and so on.
Creating a Decision Tree Example
Decision Tree – Example of Visual Structure

[Figure: the root node tests Gender. The Female branch tests Income (<= 45,000 → Yes, > 45,000 → No); the Male branch tests Age (<= 40 → Yes, > 40 → No).]

Branch – outcome of a test
Internal node – decision on a variable
Leaf node – class label
• Branches refer to the outcome of a decision. When the decision is numerical, the "greater than" branch is usually shown on the right and the "less than" branch on the left.

• Internal nodes are the decision or test points. Each refers to a single variable or attribute. In the example here the outcomes are binary, although there could be more than two branches stemming from an internal node. For example, if the variable were categorical and had three choices, you might need a branch for each choice.
• The leaf nodes are at the end of the last branch on the tree. They represent the outcome of all the prior decisions. The leaf nodes are the class labels, or the segment in which all observations that follow the path to the leaf would be placed.
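The example tree above (Gender at the root, then Income or Age) can be written as nested decisions; the Yes/No class labels are taken from the figure:

```python
def classify(gender, income, age):
    """Walk the example tree: Gender at the root; the Female branch
    tests Income, the Male branch tests Age; each leaf returns its label."""
    if gender == "Female":
        return "Yes" if income <= 45000 else "No"
    else:  # Male
        return "Yes" if age <= 40 else "No"

print(classify("Female", 30000, 50))  # Yes (income branch; age is ignored)
print(classify("Male", 30000, 50))    # No  (age branch; income is ignored)
```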
Advantages of Decision Trees
• Easy to understand.
• Map nicely to a set of production rules.
• Applicable to real problems.
• Able to process both numerical and categorical data.

Disadvantages of Decision Trees
• The output attribute must be categorical.
• Limited to one output attribute.
• Decision tree algorithms are unstable (slight variations in the training set can result in different attribute selections).
• Trees created from numeric datasets can be complex, as attribute splits for numeric data are typically binary.
From Trees to rules
Decision trees can be nicely mapped to a set of production rules (one advantage of DTs): one rule for each leaf.
From Trees to rules
Is a person fit or unfit?

[Figure: the root tests Age < 30. On the Yes branch the next test is "Eats pizza?" (Yes → Unfit, No → Fit); on the No branch the next test is "Exercises?" (Yes → Fit, No → Unfit).]

IF age < 30 AND eats pizza THEN unfit
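A sketch of the fit/unfit tree as code, one path per leaf (the argument names are assumptions made for illustration; the branch labels are read off the slide's diagram):

```python
def fit_or_unfit(age, eats_pizza, exercises):
    """Each root-to-leaf path in the tree corresponds to one production rule,
    e.g. IF age < 30 AND eats pizza THEN Unfit."""
    if age < 30:
        return "Unfit" if eats_pizza else "Fit"
    return "Fit" if exercises else "Unfit"

print(fit_or_unfit(25, eats_pizza=True, exercises=False))  # Unfit
print(fit_or_unfit(45, eats_pizza=False, exercises=True))  # Fit
```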
From Trees to rules

IF temperature is not between -10 and 60 THEN survival difficult

The remaining decision factors follow the same pattern:
 Is water present or not?
 Do flora and fauna flourish?
 Does the planet have a stormy surface?

Thus, we have a decision tree.
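The planet rules above can be sketched as a decision function; the question order follows the slide, while the "Survival probable" leaf label is an assumption added for illustration:

```python
def next_earth(temperature, has_water, flora_fauna_flourish, stormy_surface):
    """Apply the slide's checks in order; any failed check
    makes survival difficult."""
    if not (-10 <= temperature <= 60):
        return "Survival difficult"
    if not has_water:
        return "Survival difficult"
    if not flora_fauna_flourish:
        return "Survival difficult"
    if stormy_surface:
        return "Survival difficult"
    return "Survival probable"  # assumed label for the remaining leaf

print(next_earth(80, True, True, False))  # Survival difficult
print(next_earth(22, True, True, False))  # Survival probable
```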


Decision Tree Classifier - Reasons to Choose (+) & Cautions (-)

Reasons to Choose (+):
• Takes any input type (numeric, categorical); in principle, it can handle categorical variables with many distinct values (e.g., ZIP code).
• Robust with redundant and correlated variables.
• Naturally handles variable interaction.
• Handles variables that have a non-linear effect on the outcome.
• Computationally efficient to build.
• Easy to score data.
• Many algorithms can return a measure of variable importance.
• In principle, decision rules are easy to understand.

Cautions (-):
• Decision surfaces can only be axis-aligned.
• The tree structure is sensitive to small changes in the training data.
• A "deep" tree is probably over-fit, because each split reduces the training data available for subsequent splits.
• Not good for outcomes that depend on many variables (related to the over-fit problem above).
• Doesn't naturally handle missing values (although most implementations include a method for dealing with this).
• In practice, decision rules can be fairly complex.
Check Your Knowledge
Your Thoughts?

1. How do you define information gain?
2. List three use cases of decision trees.
3. What are weak learners, and how are they used in ensemble methods?
4. Why do we end up with an over-fitted model with deep trees, and in data sets where the outcomes depend on many variables?
