Decision Trees
MDT903: AI Practitioner
Important Terminology Related to Decision Trees
Root Node: Represents the entire population or sample; it is further divided into two or more homogeneous sets.
Splitting: The process of dividing a node into two or more sub-nodes.
Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.
Leaf / Terminal Node: A node that does not split further is called a leaf or terminal node.
Pruning: Removing the sub-nodes of a decision node is called pruning; it is the opposite of splitting.
Branch / Sub-Tree: A sub-section of the entire tree is called a branch or sub-tree.
Parent and Child Node: A node that is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are the children of the parent node.
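As a rough illustration of this terminology (a sketch added for this handout, not part of the original slides; the attribute and class values are just examples), a tree node can be represented in Python as follows:

```python
class Node:
    """A single node of a decision tree (illustrative sketch)."""
    def __init__(self, attribute=None, label=None, parent=None):
        self.attribute = attribute  # test attribute for a decision node, None for a leaf
        self.label = label          # predicted class for a leaf/terminal node
        self.parent = parent        # parent node (None for the root node)
        self.children = {}          # attribute value -> child node (sub-node)

    def is_leaf(self):
        # A node that does not split further is a leaf/terminal node
        return not self.children


# Root node: represents the whole sample and is split on some attribute
root = Node(attribute="Outlook")
# Splitting: the root gets sub-nodes, making it the parent (and a decision node)
root.children["sunny"] = Node(attribute="Humidity", parent=root)
root.children["overcast"] = Node(label="Yes", parent=root)  # a leaf node
# Pruning: removing the sub-nodes of a decision node (opposite of splitting)
root.children["sunny"].children.clear()
```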
Advantages
Easy to understand: Decision tree output is very easy to understand, even for people from a non-analytical background. Reading and interpreting a tree requires no statistical knowledge, and its graphical representation is intuitive enough that users can easily relate it to their own hypotheses.
Useful in data exploration: A decision tree is one of the fastest ways to identify the most significant variables and the relations between two or more variables. With the help of decision trees, we can create new variables/features that have better power to predict the target variable. For example, when working on a problem with information spread across hundreds of variables, a decision tree helps identify the most significant ones.
Less data cleaning required: Decision trees require less data cleaning than some other modeling techniques and are, to a fair degree, not influenced by outliers or missing values.
Data type is not a constraint: They can handle both numerical and categorical variables.
Non-parametric method: Decision trees are considered a non-parametric method, meaning they make no assumptions about the distribution of the data or the structure of the classifier.
Disadvantages
DECISION TREE
An internal node is a test on an attribute.
A branch represents an outcome of the test, e.g., Color = red.
A leaf node represents a class label or a class-label distribution.
At each node, one attribute is chosen to split the training examples into classes that are as distinct as possible.
A new case is classified by following the matching path down to a leaf node.
Weather Data: Play or not Play?
Outlook   Temperature  Humidity  Windy  Play?
sunny     hot          high      false  No
sunny     hot          high      true   No
overcast  hot          high      false  Yes
rain      mild         high      false  Yes
rain      cool         normal    false  Yes
rain      cool         normal    true   No
overcast  cool         normal    true   Yes
sunny     mild         high      false  No
sunny     cool         normal    false  Yes
rain      mild         normal    false  Yes
sunny     mild         normal    true   Yes
overcast  mild         high      true   Yes
overcast  hot          normal    false  Yes
rain      mild         high      true   No

Note: "Outlook" here is the weather forecast; it has no relation to the Microsoft email program.
Example Tree for “Play?”
Outlook = sunny    -> Humidity = high   -> No
                      Humidity = normal -> Yes
Outlook = overcast -> Yes
Outlook = rain     -> Windy = true      -> No
                      Windy = false     -> Yes
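To make "following a matching path to a leaf node" concrete, here is a minimal hard-coded Python version of the tree above (a sketch added for this handout, not code from the original slides):

```python
def classify(case):
    """Classify one weather record with the "Play?" tree shown above."""
    if case["Outlook"] == "sunny":
        # sunny branch: the decision depends on Humidity
        return "No" if case["Humidity"] == "high" else "Yes"
    if case["Outlook"] == "overcast":
        return "Yes"
    # rain branch: the decision depends on Windy
    return "No" if case["Windy"] else "Yes"


# A new case follows the path Outlook=sunny -> Humidity=normal -> Yes
print(classify({"Outlook": "sunny", "Humidity": "normal", "Windy": False}))  # Yes
```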
Building Decision Tree [Q93]
Choosing the Splitting Attribute
Which attribute to select?
A criterion for attribute selection
Which is the best attribute? The one that results in the smallest tree (note that the smallest tree is not the same as the shallowest tree).
Heuristic: choose the attribute that produces the "purest" nodes.
A popular impurity criterion is information gain: information gain increases with the average purity of the subsets that an attribute produces.
Strategy: choose the attribute that results in the greatest information gain.
*Claude Shannon, "Father of information theory"
Born: 30 April 1916; Died: 23 February 2001

Claude Shannon, who has died aged 84, perhaps more than anyone laid the groundwork for today's digital revolution. His exposition of information theory, stating that all information could be represented mathematically as a succession of noughts and ones, facilitated the digital manipulation of data without which today's information society would be unthinkable.

Shannon's master's thesis, obtained in 1940 at MIT, demonstrated that problem solving could be achieved by manipulating the symbols 0 and 1 in a process that could be carried out automatically with electrical circuitry. That dissertation has been hailed as one of the most significant master's theses of the 20th century. Eight years later, Shannon published another landmark paper, A Mathematical Theory of Communication, generally taken as his most important scientific contribution.

Shannon applied the same radical approach to cryptography research, in which he later became a consultant to the US government. Many of Shannon's pioneering insights were developed before they could be applied in practical form. He was truly a remarkable man, yet unknown to most of the world.
Computing information
Information Gain
We now return to the problem of trying to
determine the best attribute to choose for a
particular node in a tree. The following measure
calculates a numerical value for a given
attribute, A, with respect to a set of examples, S.
Note that the values of attribute A will range
over a set of possibilities which we call
Values(A), and that, for a particular value from
that set, v, we write Sv for the set of examples
which have value v for attribute A.
The information gain of attribute A, relative to a collection of examples S, is calculated as:

Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)
Computing the information gain
Information gain = (information before split) − (information after split)

gain("Outlook") = info([9,5]) − info([2,3], [4,0], [3,2]) = 0.940 − 0.693 = 0.247 bits
Information gain for the attributes of the weather data:

gain("Outlook")     = 0.247 bits   (most information gained)
gain("Temperature") = 0.029 bits
gain("Humidity")    = 0.152 bits
gain("Windy")       = 0.048 bits
Continuing to split
The final decision tree
Weather Data with ID code
ID Outlook Temperature Humidity Windy Play?
A sunny hot high false No
B sunny hot high true No
C overcast hot high false Yes
D rain mild high false Yes
E rain cool normal false Yes
F rain cool normal true No
G overcast cool normal true Yes
H sunny mild high false No
I sunny cool normal false Yes
J rain mild normal false Yes
K sunny mild normal true Yes
L overcast mild high true Yes
M overcast hot normal false Yes
N rain mild high true No
Split for ID Code Attribute
Splitting on the ID code produces 14 pure, single-example subsets, so its information gain is maximal (0.940 bits) even though the attribute is useless for prediction. The gain ratio corrects for this by dividing the gain by the intrinsic information of the split:

GainRatio(S, A) = Gain(S, A) / IntrinsicInfo(S, A)
Computing the gain ratio
Example: intrinsic information for the ID code:

info([1, 1, ..., 1]) = 14 × (−1/14 × log2(1/14)) = 3.807 bits
Importance of attribute decreases as
intrinsic information gets larger
Example of gain ratio:

gain_ratio("Attribute") = gain("Attribute") / intrinsic_info("Attribute")
Gain ratios for weather data
Outlook:
  Info: 0.693
  Gain: 0.940 − 0.693 = 0.247
  Split info: info([5,4,5]) = 1.577
  Gain ratio: 0.247 / 1.577 = 0.156

Temperature:
  Info: 0.911
  Gain: 0.940 − 0.911 = 0.029
  Split info: info([4,6,4]) = 1.362
  Gain ratio: 0.029 / 1.362 = 0.021

Humidity:
  Info: 0.788
  Gain: 0.940 − 0.788 = 0.152
  Split info: info([7,7]) = 1.000
  Gain ratio: 0.152 / 1.000 = 0.152

Windy:
  Info: 0.892
  Gain: 0.940 − 0.892 = 0.048
  Split info: info([8,6]) = 0.985
  Gain ratio: 0.048 / 0.985 = 0.049
gain_ratio("ID_code") = 0.940 bits / 3.807 bits = 0.246
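The split info and gain ratios above can be checked with the same info() helper (again a sketch added here, not part of the slides; the gain values are the ones computed earlier):

```python
from math import log2


def info(counts):
    """Entropy in bits of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


# Outlook: gain 0.247, split info from its subset sizes [5, 4, 5]
print(round(info([5, 4, 5]), 3))          # 1.577 (intrinsic / split info)
print(round(0.247 / info([5, 4, 5]), 3))  # 0.157 (~0.156 on the slide)

# ID code: 14 subsets of size 1, so split info is log2(14) = 3.807 bits
split_info_id = info([1] * 14)
print(round(split_info_id, 3))            # 3.807
print(round(0.940 / split_info_id, 3))    # 0.247 (~0.246 on the slide)
```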
More on the gain ratio
"Outlook" still comes out top.
However, "ID code" has an even greater gain ratio; the standard fix is an ad hoc test that prevents splitting on that type of attribute.
A problem with the gain ratio is that it may overcompensate: it may choose an attribute just because its intrinsic information is very low.
Standard fix: first, only consider attributes with greater-than-average information gain; then compare them on gain ratio (a sketch of this rule follows below).
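One way to express that fix in code (an illustrative sketch; the gain and gain-ratio values are the ones from the slides above):

```python
# Gains and gain ratios for the four weather attributes (from the slides above)
gain = {"Outlook": 0.247, "Temperature": 0.029, "Humidity": 0.152, "Windy": 0.048}
gain_ratio = {"Outlook": 0.156, "Temperature": 0.021, "Humidity": 0.152, "Windy": 0.049}

avg_gain = sum(gain.values()) / len(gain)             # 0.119
# Keep only attributes with above-average gain, then pick the best gain ratio
candidates = [a for a in gain if gain[a] > avg_gain]  # ['Outlook', 'Humidity']
best = max(candidates, key=lambda a: gain_ratio[a])
print(best)                                           # Outlook
```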
*CART Splitting Criteria: Gini Index
If a data set T contains examples from n classes, the Gini index gini(T) is defined as

gini(T) = 1 − Σ_{j=1}^{n} p_j²

where p_j is the relative frequency of class j in T.
*Gini Index
After splitting T into two subsets T1 and T2 with sizes N1 and N2 (where N = N1 + N2), the Gini index of the split data is defined as

gini_split(T) = (N1 / N) · gini(T1) + (N2 / N) · gini(T2)
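Both formulas fit in a few lines of Python (a sketch added here; the class counts [9, 5] are the full weather data, and the split shown is the Humidity split with subsets [3, 4] and [6, 1]):

```python
def gini(counts):
    """Gini index of a node with the given class counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def gini_split(counts1, counts2):
    """Weighted Gini index after splitting into two subsets T1 and T2."""
    n1, n2 = sum(counts1), sum(counts2)
    n = n1 + n2
    return n1 / n * gini(counts1) + n2 / n * gini(counts2)


print(round(gini([9, 5]), 3))                # 0.459 for the full weather data
print(round(gini_split([3, 4], [6, 1]), 3))  # 0.367 after splitting on Humidity
```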
Discussion
What is Random Forest? How does it work?
Random Forest is a versatile machine learning method capable of performing both regression and classification tasks. It also performs dimensionality reduction, treats missing values and outlier values, and covers other essential steps of data exploration, and it does a fairly good job at all of them. It is a type of ensemble learning method, in which a group of weak models combine to form a powerful model.

In Random Forest we grow multiple trees, as opposed to the single tree of the CART model. To classify a new object based on its attributes, each tree gives a classification, and we say the tree "votes" for that class. The forest chooses the classification having the most votes.
It works in the following manner. Each tree is planted and grown as follows:

Assume the number of cases in the training set is N. A sample of these N cases is taken at random, but with replacement. This sample is the training set for growing the tree.
If there are M input variables, a number m < M is specified such that at each node, m variables are selected at random out of the M, and the best split on these m is used to split the node. The value of m is held constant while the forest is grown.
Each tree is grown to the largest extent possible; there is no pruning.
New data are predicted by aggregating the predictions of the ntree trees (majority vote for classification, average for regression); see the sketch below.
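As a concrete illustration of this procedure (my own sketch, not part of the slides), scikit-learn's RandomForestClassifier exposes ntree as n_estimators and m as max_features; the iris data set is used here only because it ships with the library:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# ntree = 100 bootstrap-sampled trees, m = 2 features tried at each split,
# each tree fully grown (no pruning); prediction is a majority vote.
forest = RandomForestClassifier(n_estimators=100, max_features=2, random_state=0)
forest.fit(X_train, y_train)

print(forest.score(X_test, y_test))  # test-set accuracy of the majority vote
print(forest.feature_importances_)   # the "importance of variable" output
```

The feature_importances_ attribute corresponds to the "importance of variable" output mentioned in the advantages below.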
Advantages of Random Forest
This algorithm can solve both types of problems, classification and regression, and does a decent job on both fronts.
One of the most appealing benefits of Random Forest is its power to handle large data sets with high dimensionality. It can handle thousands of input variables and identify the most significant ones, so it is considered one of the dimensionality reduction methods. Further, the model outputs the importance of each variable, which can be a very handy feature on some data sets.
It has an effective method for estimating missing data and
maintains accuracy when a large proportion of the data are
missing.
It has methods for balancing errors in data sets where
classes are imbalanced.
The capabilities of the above can be extended to unlabeled
data, leading to unsupervised clustering, data views and
outlier detection.
Random Forest involves sampling the input data with replacement, called bootstrap sampling. Roughly one third of the data is not used for training each tree and can be used for testing; these are called the out-of-bag samples. The error estimated on these out-of-bag samples is known as the out-of-bag error. Studies of out-of-bag error estimates give evidence that the out-of-bag estimate is as accurate as using a test set of the same size as the training set. Therefore, using the out-of-bag error estimate removes the need for a set-aside test set.
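In scikit-learn this estimate is available directly (a sketch continuing the earlier example; passing oob_score=True makes the forest score each sample only with the trees that did not see it during training):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)

print(forest.oob_score_)  # out-of-bag accuracy; no set-aside test set needed
```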
Disadvantages of Random Forest
It surely does a good job at classification, but not as good at regression problems, since it does not give precise predictions of a continuous target. In the case of regression, it cannot predict beyond the range of values seen in the training data, and it may over-fit data sets that are particularly noisy.
Random Forest can feel like a black-box approach for statistical modelers: you have very little control over what the model does. At best, you can try different parameters and random seeds.
References
KDnuggets Data Mining Course by G. Piatetsky-Shapiro and G. Parker.
Chapter 7 of the textbook by J. Han and M. Kamber, Data Mining: Concepts and Techniques.
Several slides adapted from material by Witten & Eibe.