04 Classification

The document discusses data modeling techniques, particularly focusing on decision trees for customer profiling and classification tasks. It outlines the process of building decision trees, including training and test sets, tree construction methods, and evaluation criteria like information gain. Additionally, it provides examples of data and explains the structure of decision trees, including nodes and attributes.


DATA MODELING

▪ sample data to get a training set and a test set
▪ utilize decision trees to generate rules for profiling customers using the training set
▪ utilize validation techniques on the test set to determine the accuracy of predictions
▪ perform sequential analysis to determine the sequence of call transactions made
▪ Classification is a data mining task of predicting the value of a categorical variable by building a model based on one or more numerical and/or categorical variables. The weather data below is the classic example: the categorical target Play? is predicted from four attributes.
Outlook   Temperature  Humidity  Windy  Play?
sunny     hot          high      false  No
sunny     hot          high      true   No
overcast  hot          high      false  Yes
rain      mild         high      false  Yes
rain      cool         normal    false  Yes
rain      cool         normal    true   No
overcast  cool         normal    true   Yes
sunny     mild         high      false  No
sunny     cool         normal    false  Yes
rain      mild         normal    false  Yes
sunny     mild         normal    true   Yes
overcast  mild         high      true   Yes
overcast  hot          normal    false  Yes
rain      mild         high      true   No
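To make the task concrete, a decision tree classifier can be fit to this weather data. The following is a minimal sketch, assuming pandas and scikit-learn are available (neither library is named in the slides):

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# The 14 weather examples from the table above.
data = pd.DataFrame({
    "Outlook":     ["sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
                    "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain"],
    "Temperature": ["hot", "hot", "hot", "mild", "cool", "cool", "cool",
                    "mild", "cool", "mild", "mild", "mild", "hot", "mild"],
    "Humidity":    ["high", "high", "high", "high", "normal", "normal", "normal",
                    "high", "normal", "normal", "normal", "high", "normal", "high"],
    "Windy":       [False, True, False, False, False, True, True,
                    False, False, False, True, True, False, True],
    "Play":        ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                    "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})

# One-hot encode the categorical attributes; scikit-learn needs numeric inputs.
X = pd.get_dummies(data.drop(columns="Play"))
y = data["Play"]

# criterion="entropy" selects splits by information gain, discussed below.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(clf, feature_names=list(X.columns)))
```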
▪ Decision Tree Induction
▪ Bayesian Classification
▪ Backpropagation
▪ Association Rule Mining
▪ Decision Trees are one of the most common methods to build models
▪ Intuitive appeal for users
▪ Presentation Forms
▪ “if, then” statements (decision rules)
▪ graphically - decision trees
▪ Works like a flow chart
▪ Looks like an upside down tree
▪ Nodes
▪ appear as rectangles or circles
▪ represent a test or decision
▪ Branches
▪ represent the outcome of a test
▪ terminal (leaf) nodes
▪ root node
▪ internal nodes
▪ An internal node is a test on an attribute.
▪ A branch represents an outcome of the test, e.g., Color=red.
▪ A leaf node represents a class label or class label distribution.
▪ At each node, one attribute is chosen to split the training examples into classes that are as distinct as possible.
▪ A new case is classified by following a matching path to a leaf node.
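As a sketch of how a tree reads as "if, then" decision rules, the well-known tree for the weather data (Outlook at the root, Humidity tested under sunny, Windy tested under rain) can be written as nested tests. The function name classify is illustrative, not from the slides:

```python
def classify(outlook, humidity, windy):
    """Classify one new case by following a matching path to a leaf node."""
    if outlook == "overcast":          # leaf: every overcast example plays
        return "Yes"
    if outlook == "sunny":             # internal node: test Humidity
        return "No" if humidity == "high" else "Yes"
    return "No" if windy else "Yes"    # outlook == "rain": test Windy

print(classify("sunny", "normal", False))  # -> Yes
```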
The overall workflow: a tree induction algorithm learns a model from the training set (Learn Model); the model is then applied to the test set to deduce the unknown class labels (Apply Model).

Training Set
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
▪ Top-down tree construction
▪ At start, all training examples are at the root.
▪ Partition the examples recursively by choosing one
attribute each time.
▪ Bottom-up tree pruning
▪ Remove sub-trees or branches, in a bottom-up
manner, to improve the estimated accuracy on new
cases.
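A minimal sketch of this top-down recursion, assuming examples are dicts, the class label sits under a target key, and choose_attribute is a goodness function like those listed below (all names are illustrative):

```python
from collections import Counter

def build_tree(examples, attributes, choose_attribute, target="Play"):
    """Top-down induction: start with all examples at the root, then
    partition recursively on one chosen attribute at a time."""
    labels = [e[target] for e in examples]
    # Stop when the node is pure or there is nothing left to split on.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority label
    best = choose_attribute(examples, attributes)     # goodness function
    node = {"test": best, "branches": {}}
    for value in {e[best] for e in examples}:         # one branch per outcome
        subset = [e for e in examples if e[best] == value]
        remaining = [a for a in attributes if a != best]
        node["branches"][value] = build_tree(subset, remaining,
                                             choose_attribute, target)
    return node
```

Bottom-up pruning would then walk this nested dictionary from the leaves upward, replacing any sub-tree with a leaf whenever doing so improves the estimated accuracy on new cases.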
▪ At each node, available attributes are evaluated on
the basis of separating the classes of the training
examples. A goodness function is used for this
purpose.
▪ Typical goodness functions:
▪ information gain (ID3/C4.5)
▪ information gain ratio
▪ gini index
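Sketches of the first and third of these goodness functions (entropy-based information gain and the gini index), assuming the same dict-of-examples representation as above:

```python
import math
from collections import Counter

def entropy(labels):
    """Bits needed to predict a label drawn from this distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index: chance of mislabeling under the node's distribution."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(examples, attribute, target="Play"):
    """Parent entropy minus the weighted entropy of the attribute's splits."""
    labels = [e[target] for e in examples]
    n = len(labels)
    split_entropy = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == value]
        split_entropy += len(subset) / n * entropy(subset)
    return entropy(labels) - split_entropy
```

With these, the build_tree sketch above can be driven by, e.g., choose = lambda ex, attrs: max(attrs, key=lambda a: information_gain(ex, a)).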
▪ Which is the best attribute?
▪ The one that will result in the smallest tree
▪ Heuristic: choose the attribute that produces the “purest” nodes
▪ Popular impurity criterion: information gain
▪ Information gain increases with the average purity of the subsets that an attribute produces
▪ Strategy: choose the attribute that results in the greatest information gain
▪ Information is measured in bits
▪ Given a probability distribution, the info required to predict an event is the distribution’s entropy
▪ Entropy gives the information required in bits (this can involve fractions of bits!)
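▪ Worked example (weather data above): the class distribution is 9 Yes / 5 No, so entropy = −(9/14)·log2(9/14) − (5/14)·log2(5/14) ≈ 0.940 bits. The gains are gain(Outlook) ≈ 0.247, gain(Humidity) ≈ 0.152, gain(Windy) ≈ 0.048 and gain(Temperature) ≈ 0.029 bits, so Outlook is chosen for the root split.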
▪ Watch this video: https://www.youtube.com/watch?v=_L39rN6gz7Y&t=722s
▪ Create the decision tree of the weather data shown above.