
Data Mining

Decision Tree
Name: Hafiz Muhammad Behzad
Roll no: 17271519-027
-------------------------------------------------------------------------------------------------------------------------------

Q1. Create a decision tree using Information Gain, Gain Ratio and Gini Index for the following
data set. Also define the rules set of each decision tree.

Information Gain:
1. Calculate entropy of the target

entropy (Buys_computer) = Entropy (9, 5) = -(9/14) log2 (9/14) - (5/14) log2 (5/14) = 0.94
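As a quick check, the entropy computation can be sketched in Python (a minimal helper, not part of the original answer):

```python
from math import log2

def entropy(pos, neg):
    """Entropy of a node from its (yes, no) class counts."""
    total = pos + neg
    # skip empty classes: 0 * log2(0) is taken as 0
    return -sum(c / total * log2(c / total) for c in (pos, neg) if c)

print(round(entropy(9, 5), 2))  # 0.94
```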

2. Calculate the information gain of each attribute:

entropy (Age, Buys_computer) = P (<=30) * Entropy (2, 3) + P (31...40) * Entropy (4, 0) + P (>40) * Entropy (3, 2)

= 5/14 * 0.97 + 4/14 * 0 + 5/14 * 0.97 = 0.345 + 0 + 0.345 = 0.69

Gain (Age, Buys_computer) = Entropy (Buys_computer) - Entropy (Age, Buys_computer)

= 0.94 – 0.69 = 0.25

entropy (Income, Buys_computer) = P (high) * Entropy (2, 2) + P (medium) * Entropy (4, 2)

+ P (low) * Entropy (3, 1)

= 4/14 * 1 + 6/14 * 0.92 + 4/14 * 0.81 = 0.29 + 0.39 + 0.23 = 0.91

Gain (Income, Buys_computer) = Entropy (Buys_computer) - Entropy (Income, Buys_computer)

= 0.94 – 0.91 = 0.03

entropy (Student, Buys_computer) = P (no) * Entropy (3, 4) + P (yes) * Entropy (6, 1)

= 7/14 * 0.99 + 7/14 * 0.59 = 0.79

Gain (Student, Buys_computer) = Entropy (Buys_computer) - Entropy (Student, Buys_computer)

= 0.94 – 0.79 = 0.15

entropy (Credit_rating, Buys_computer) = P (fair) * Entropy (6, 2) + P (excellent) * Entropy (3, 3)

= 8/14 * 0.81 + 6/14 * 1 = 0.89

Gain (Credit_rating, Buys_computer) = Entropy (Buys_computer) - Entropy (Credit_rating, Buys_computer)

= 0.94 – 0.89 = 0.05

So, the largest gain attribute is Age.

Age is selected as the root node, with children <=30, 31...40 (-> buy = yes), and >40.
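The per-attribute gains above can be reproduced with a short sketch; the (yes, no) count pairs below are the tallies already used in the calculations (illustrative code, not part of the original answer):

```python
from math import log2

def entropy(pos, neg):
    total = pos + neg
    return -sum(c / total * log2(c / total) for c in (pos, neg) if c)

def info_gain(target, partitions):
    # gain = entropy of the target minus the size-weighted entropy of the split
    n = sum(target)
    remainder = sum((p + q) / n * entropy(p, q) for p, q in partitions)
    return entropy(*target) - remainder

target = (9, 5)  # buys_computer: 9 yes, 5 no
splits = {
    "Age":           [(2, 3), (4, 0), (3, 2)],  # <=30, 31...40, >40
    "Income":        [(2, 2), (4, 2), (3, 1)],  # high, medium, low
    "Student":       [(3, 4), (6, 1)],          # no, yes
    "Credit_rating": [(6, 2), (3, 3)],          # fair, excellent
}
for name, parts in splits.items():
    print(name, round(info_gain(target, parts), 3))
# Age comes out largest, matching the choice of root above.
```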

1. Calculate entropy of the target (when age <= 30)

entropy (Buys_computer) = Entropy (2, 3) = 0.97

2. Calculate the information gain of each attribute:

entropy (Income, Buys_computer) = P (high) * Entropy (0, 2) + P (medium) * Entropy (1, 1)

+ P (low) * Entropy (1, 0)

= 2/5 * 0 + 2/5 * 1 + 1/5 * 0 = 0.4

Gain (Income, Buys_computer) = Entropy (Buys_computer) - Entropy (Income, Buys_computer)

= 0.97 – 0.4 = 0.57

entropy (Student, Buys_computer) = P (no) * Entropy (0, 3) + P (yes) * Entropy (2, 0) = 0

Gain (Student, Buys_computer) = Entropy (Buys_computer) - Entropy (Student, Buys_computer)

= 0.97 – 0 = 0.97

entropy (Credit_rating, Buys_computer) = P (fair) * Entropy (1, 2) + P (excellent) * Entropy (1, 1)

= 3/5 * 0.92 + 2/5 * 1 = 0.95

Gain (Credit_rating, Buys_computer) = Entropy (Buys_computer) - Entropy (Credit_rating, Buys_computer)

= 0.97 – 0.95 = 0.02

So, the largest gain attribute is Student.

Student is selected as the splitting attribute for the age <= 30 branch.
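The same gain computation, run inside the age <= 30 branch, confirms the choice (a sketch; the count pairs are the branch tallies above):

```python
from math import log2

def entropy(pos, neg):
    total = pos + neg
    return -sum(c / total * log2(c / total) for c in (pos, neg) if c)

def info_gain(target, partitions):
    # gain = target entropy minus the size-weighted entropy of the partition
    n = sum(target)
    return entropy(*target) - sum((p + q) / n * entropy(p, q) for p, q in partitions)

# within age <= 30 the target counts are (2 yes, 3 no)
branch = (2, 3)
print(round(info_gain(branch, [(0, 2), (1, 1), (1, 0)]), 2))  # Income
print(round(info_gain(branch, [(0, 3), (2, 0)]), 2))          # Student (both children pure)
print(round(info_gain(branch, [(1, 2), (1, 1)]), 2))          # Credit_rating
```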

1. Calculate entropy of the target (when age > 40)

entropy (Buys_computer) = Entropy (3, 2) = 0.97

2. Calculate the information gain of each attribute:

entropy (Income, Buys_computer) = P (medium) * Entropy (2, 1) + P (low) * Entropy (1, 1)

= 3/5 * 0.92 + 2/5 * 1 = 0.95

Gain (Income, Buys_computer) = Entropy (Buys_computer) - Entropy (Income, Buys_computer)

= 0.97 – 0.95 = 0.02

entropy (Student, Buys_computer) = P (no) * Entropy (1, 1) + P (yes) * Entropy (2, 1)

= 2/5 * 1 + 3/5 * 0.92 = 0.95

Gain (Student, Buys_computer) = Entropy (Buys_computer) - Entropy (Student, Buys_computer)

= 0.97 – 0.95 = 0.02

entropy (Credit_rating, Buys_computer) = P (fair) * Entropy (3, 0) + P (excellent) * Entropy (0, 2)

= 3/5 * 0 + 2/5 * 0 = 0

Gain (Credit_rating, Buys_computer) = Entropy (Buys_computer) - Entropy (Credit_rating, Buys_computer)

= 0.97 – 0 = 0.97

So, the largest gain attribute is Credit_rating.

Credit_rating is selected as the splitting attribute for the age > 40 branch.

Rules set:

1. If age <= 30 and student = no then buys_computer = no
2. If age <= 30 and student = yes then buys_computer = yes
3. If age = 31...40 then buys_computer = yes
4. If age > 40 and credit_rating = fair then buys_computer = yes
5. If age > 40 and credit_rating = excellent then buys_computer = no
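The rule set reads off as nested conditionals; a minimal sketch (the function name and string encodings are illustrative, following the pure branches derived above: for age > 40, fair is all yes and excellent is all no):

```python
def buys_computer(age, student, credit_rating):
    """Classify one record; age is '<=30', '31...40', or '>40'."""
    if age == "<=30":
        # rules 1 and 2: the Student split
        return "yes" if student == "yes" else "no"
    if age == "31...40":
        return "yes"  # rule 3: this branch is pure
    # age > 40: the Credit_rating split (fair pure yes, excellent pure no)
    return "yes" if credit_rating == "fair" else "no"

print(buys_computer("<=30", "yes", "fair"))  # yes
```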

Gain Ratio:
1. Calculate the gain ratio of each attribute (the gains are those from the information gain section, carried to three decimals because Age and Student are close):

SplitInfo_age (<=30, 31...40, >40) = -(5/14) log2 (5/14) - (4/14) log2 (4/14) - (5/14) log2 (5/14) = 1.58

Gain Ratio (Age) = Gain (Age) / SplitInfo_age = 0.246 / 1.58 = 0.156

SplitInfo_income (high, medium, low) = -(4/14) log2 (4/14) - (6/14) log2 (6/14) - (4/14) log2 (4/14) = 1.56

Gain Ratio (Income) = Gain (Income) / SplitInfo_income = 0.03 / 1.56 = 0.019

SplitInfo_student (yes, no) = -(7/14) log2 (7/14) - (7/14) log2 (7/14) = 1

Gain Ratio (Student) = Gain (Student) / SplitInfo_student = 0.151 / 1 = 0.151

SplitInfo_credit_rating (fair, excellent) = -(8/14) log2 (8/14) - (6/14) log2 (6/14) = 0.99

Gain Ratio (Credit_rating) = Gain (Credit_rating) / SplitInfo_credit_rating = 0.05 / 0.99 = 0.05

So, the largest gain ratio attribute is Age (0.156, just ahead of Student at 0.151).

Age is selected as the root node, the same choice information gain made.

2. Recurse into each Age branch with the same procedure:

For age <= 30 (target Entropy (2, 3) = 0.97): Gain (Student) = 0.97 and SplitInfo_student = -(3/5) log2 (3/5) - (2/5) log2 (2/5) = 0.97, so Gain Ratio (Student) = 0.97 / 0.97 = 1, the largest; Student is selected.

For age > 40 (target Entropy (3, 2) = 0.97): Gain (Credit_rating) = 0.97 and SplitInfo_credit_rating = -(3/5) log2 (3/5) - (2/5) log2 (2/5) = 0.97, so Gain Ratio (Credit_rating) = 0.97 / 0.97 = 1, the largest; Credit_rating is selected.

The gain ratio tree is therefore identical to the information gain tree:

Age
|-- <=30 -> Student
|     |-- yes -> buy = yes
|     |-- no -> buy = no
|-- 31...40 -> buy = yes
|-- >40 -> Credit_rating
      |-- fair -> buy = yes
      |-- excellent -> buy = no

Rules set: the same five rules as for information gain.
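SplitInfo and gain ratio can be checked the same way (a sketch; the gains fed in are the information-gain values carried to three decimals, and the size lists are the branch sizes per attribute):

```python
from math import log2

def split_info(sizes):
    # entropy of the branch sizes themselves, independent of class labels
    n = sum(sizes)
    return -sum(s / n * log2(s / n) for s in sizes if s)

def gain_ratio(gain, sizes):
    return gain / split_info(sizes)

print(round(split_info([5, 4, 5]), 2))         # Age: 1.58
print(round(gain_ratio(0.246, [5, 4, 5]), 3))  # Age
print(round(gain_ratio(0.029, [4, 6, 4]), 3))  # Income
print(round(gain_ratio(0.151, [7, 7]), 3))     # Student
print(round(gain_ratio(0.048, [8, 6]), 3))     # Credit_rating
```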
