8.1. Machine Learning Decision Tree
Decision Tree
Part – 1
Dr. Oybek Eraliev,
Department of Computer Engineering
Inha University in Tashkent.
Email: [email protected]
Ø In this lecture, we will cover all aspects of the decision tree algorithm:
its working principles, the different types of decision trees, the process of
building a decision tree, and how to evaluate and optimize these models.
Ø A decision tree starts with a root node and ends with decisions made at the leaf nodes.
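Ø As a quick illustration of this structure, the sketch below (a toy scikit-learn example of my own, not part of the lecture) trains a small tree and prints it as text:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
# The top split in the printout is the root node; lines ending in "class: ..." are the leaves.
print(export_text(tree))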
Ø C4.5: This is an improved version of ID3 that can handle missing data and
continuous attributes.
Ø Suppose a group of friends is deciding which movie to watch together on
Sunday. There are two choices, “Lucy” and “Titanic”, and everyone has to state
their preference. After everyone answers, we see that “Lucy” gets 4 votes and
“Titanic” gets 5 votes. Which movie do we watch now? It is hard to pick one,
because the votes for the two movies are almost equal.
Ø If, instead, the votes had been heavily one-sided, say 8 for “Lucy” and 1 for
“Titanic”, we could easily say that the majority of votes are for “Lucy”, and
hence everyone would watch that movie. The first situation is highly disordered,
the second is not, and this “disorder” is exactly what entropy measures:
E(S) = −p+ log₂(p+) − p– log₂(p–)
Ø Here,
• p+ is the probability of the positive class
• p– is the probability of the negative class
• S is the subset of training examples
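Ø A minimal Python sketch of this formula (the entropy() helper and the movie-vote check below are my own illustration, not from the lecture; log base 2 is used):

import numpy as np

def entropy(labels):
    """E(S) for a list of class labels: -sum(p * log2(p)) over the classes."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# The 4-vs-5 movie vote from the example: an almost even split, so entropy is close to 1.
votes = ["Lucy"] * 4 + ["Titanic"] * 5
print(entropy(votes))   # ~0.991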
Ø For this, we introduce a new metric called “information gain”, which tells us
how much the parent entropy decreases after splitting the node on some feature.
Ø It is simply the entropy of the full dataset minus the weighted average
entropy of the subsets obtained by splitting on that feature.
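Ø A sketch of that definition in code (the helper names are my own; the parent entropy is compared with the weighted average entropy of the child subsets):

import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent_labels, child_label_groups):
    """IG = E(parent) - weighted average of the child entropies after a split."""
    n = len(parent_labels)
    weighted_children = sum(len(g) / n * entropy(g) for g in child_label_groups)
    return entropy(parent_labels) - weighted_children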
Ø Now suppose we want to predict whether a person will go to the gym or not,
using two features:
• Feature 1 is “Energy”, which takes two values: “high” and “low”.
• Feature 2 is “Motivation”, which takes three values: “No motivation”,
“Neutral”, and “Highly motivated”.
Ø We now see that the “Energy” feature gives a larger reduction in entropy (an
information gain of 0.37) than the “Motivation” feature. Hence we select the
feature with the highest information gain and split the node on that feature.
Ø In this example, “Energy” will be our root node, and we repeat the same
procedure for the sub-nodes. Here we can see that when energy is “high” the
entropy is low, so we can say a person will definitely go to the gym if their
energy is high. But what if the energy is “low”? In that case we split the node
again, using the remaining feature, “Motivation”.
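Ø To make this feature selection concrete, here is a small sketch on a made-up gym table (the data below is hypothetical, not the lecture’s actual table, so the gain values will differ from the 0.37 quoted above):

import numpy as np
import pandas as pd

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(df, feature, target):
    """Parent entropy minus the weighted entropy of the subsets split on `feature`."""
    weighted = sum(len(sub) / len(df) * entropy(sub[target])
                   for _, sub in df.groupby(feature))
    return entropy(df[target]) - weighted

# Hypothetical data: 8 people, two features, and whether they went to the gym.
data = pd.DataFrame({
    "Energy":     ["high", "high", "high", "high", "low", "low", "low", "low"],
    "Motivation": ["Neutral", "Highly motivated", "No motivation", "Neutral",
                   "Neutral", "No motivation", "Highly motivated", "No motivation"],
    "Gym":        ["yes", "yes", "yes", "yes", "no", "no", "yes", "no"],
})

gains = {f: information_gain(data, f, "Gym") for f in ["Energy", "Motivation"]}
print(gains)                               # "Energy" has the larger gain on this toy table
root_feature = max(gains, key=gains.get)   # so "Energy" becomes the root node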
Ø The larger the value of max_depth, the more complex the tree becomes. The
training error will of course decrease as we increase max_depth, but when the
test data comes into the picture we will get very poor accuracy, i.e. the tree
overfits.
Ø Hence you need a value that neither overfits nor underfits the data, and for
this you can use GridSearchCV.
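Ø A minimal sketch of that tuning step with scikit-learn (the dataset and the grid of max_depth values below are placeholders; GridSearchCV picks the depth with the best cross-validated accuracy):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Any labelled dataset works here; breast cancer is just a convenient built-in example.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

param_grid = {"max_depth": [2, 3, 4, 5, 7, 10, None]}   # None = grow the tree fully
search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print("best max_depth:", search.best_params_["max_depth"])
print("test accuracy :", search.best_estimator_.score(X_test, y_test))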