What Is a Decision Tree?
A decision tree is a flowchart-like supervised learning model that repeatedly splits a dataset on feature values until it reaches a prediction. Several classic algorithms are used to build decision trees:
ID3 : This algorithm measures how mixed up the data at a node is using a metric
called entropy. It then chooses the feature that reduces this disorder the most, i.e. the one with the highest information gain.
C4.5 : This is an improved version of ID3 that can handle missing data and continuous
attributes.
CART : This algorithm uses a different measure called Gini impurity to decide how to
split the data. It can be used for both classification (sorting data into categories) and
regression (predicting continuous values) tasks. A small sketch of both impurity measures follows.
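To make this concrete, here is a minimal Python sketch (the helper names entropy and gini are our own, for illustration) computing both impurity measures for a node's class labels:

from collections import Counter
from math import log2

def entropy(labels):
    # ID3/C4.5 impurity: -sum(p * log2(p)) over the class proportions p
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    # CART impurity: 1 - sum(p^2) over the class proportions p
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

labels = ['YES', 'YES', 'NO', 'NO', 'NO']  # a mixed node
print(entropy(labels))  # ~0.971 (high disorder)
print(gini(labels))     # 0.48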
Decision Tree Terminologies
• Root Node: The initial node at the beginning of a decision tree, where the entire
population or dataset starts dividing based on various features or conditions.
• Decision Nodes: Nodes resulting from the splitting of root nodes are known as
decision nodes. These nodes represent intermediate decisions or conditions within
the tree.
• Leaf Nodes: Nodes where further splitting is not possible, often indicating the final
classification or outcome. Leaf nodes are also referred to as terminal nodes.
• Sub-Tree: Similar to a subsection of a graph being called a sub-graph, a sub-section
of a decision tree is referred to as a sub-tree. It represents a specific portion of the
decision tree.
• Pruning: The process of removing or cutting down specific nodes in a tree to
prevent overfitting and simplify the model.
• Branch / Sub-Tree: A subsection of the entire tree is referred to as a branch or sub-tree.
It represents a specific path of decisions and outcomes within the tree.
• Parent and Child Node: In a decision tree, a node that is divided into sub-nodes is
known as a parent node, and the sub-nodes emerging from it are referred to as
child nodes. The parent node represents a decision or condition, while the child
nodes represent the potential outcomes or further decisions based on that
condition.
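As a small aside, a hedged sketch of how these terms map onto a data structure (the Node class, its field names, and the example questions are illustrative, not from any particular library):

from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # A decision node holds a question and two children; a leaf node
    # (question is None) holds the final prediction instead.
    question: Optional[str] = None
    prediction: Optional[str] = None
    yes_branch: Optional["Node"] = None  # child followed when the answer is yes
    no_branch: Optional["Node"] = None   # child followed when the answer is no

# The root (a parent node) splits first; its children are decision or leaf
# nodes, and everything hanging under any node is a sub-tree.
root = Node(question="Weather == Cloudy?",
            yes_branch=Node(prediction="YES"),           # leaf (terminal) node
            no_branch=Node(question="Humidity <= 70?",   # decision node
                           yes_branch=Node(prediction="YES"),
                           no_branch=Node(prediction="NO")))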
Example of Decision Tree
Did you notice anything in the above flowchart? We see that if the weather is cloudy, the decision is always to go and play.
Why didn’t it split more? Why did it stop there?
To answer this question, we need to know about a few more concepts like entropy, information gain, and Gini
index. But in simple terms, we can say that the output for the training dataset is always "yes" for cloudy weather;
since there is no disorder in that subset, we don't need to split the node further.
The goal here is to decrease uncertainty, or disorder, in the dataset, and decision trees do this by splitting on
informative features.
Now you might be wondering: how do I know what the root node should be? What should the decision nodes be?
When should I stop splitting? These choices are guided by a metric called "entropy", which measures the amount
of uncertainty in the dataset.
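As a quick sanity check, a small sketch (with outcomes assumed from the flowchart description) showing that a pure node has zero entropy, so there is nothing left to split:

from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# On the "Cloudy" branch every training outcome is YES, so the node is pure
print(abs(entropy(['YES', 'YES', 'YES', 'YES'])))  # 0.0 -> no disorder, stop splitting
# A 50/50 mixed node, by contrast, has maximal entropy
print(entropy(['YES', 'NO', 'YES', 'NO']))         # 1.0 -> worth splitting further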
How Do Decision Tree Algorithms Work?
• The Decision Tree algorithm works in a few simple steps (a minimal code sketch follows this list):
• Starting at the Root: The algorithm begins at the top, called the
“root node,” representing the entire dataset.
• Asking the Best Questions: It looks for the most important feature
or question that splits the data into the most distinct groups. This
is like asking a question at a fork in the tree.
• Branching Out: Based on the answer to that question, it divides
the data into smaller subsets, creating new branches. Each branch
represents a possible route through the tree.
• Repeating the Process: The algorithm continues asking questions
and splitting the data at each branch until it reaches the final “leaf
nodes,” representing the predicted outcomes or classifications.
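Here is a minimal, illustrative sketch of that greedy loop (the helper names and toy weather data are our own, not from any library); the full algorithm simply repeats this step on each resulting subset until the leaves are pure or another stopping rule applies:

from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_question(rows, labels, questions):
    # Greedy step: try every candidate question and keep the one with the
    # largest information gain (parent entropy minus weighted child entropy).
    base, best_name, best_gain = entropy(labels), None, 0.0
    for name, asks in questions:
        yes = [lab for row, lab in zip(rows, labels) if asks(row)]
        no = [lab for row, lab in zip(rows, labels) if not asks(row)]
        if not yes or not no:
            continue  # this question does not actually split the data
        gain = base - (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain

rows = [{'Weather': 'Sunny'}, {'Weather': 'Cloudy'},
        {'Weather': 'Rainy'}, {'Weather': 'Cloudy'}]
labels = ['NO', 'YES', 'NO', 'YES']
questions = [(f"Weather == {v}?", lambda row, v=v: row['Weather'] == v)
             for v in ('Sunny', 'Cloudy', 'Rainy')]
print(best_question(rows, labels, questions))  # ('Weather == Cloudy?', 1.0)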
Advantages of Decision Trees
• Easy to Understand: They are simple to visualize and
interpret, making them easy to understand even for non-
experts.
• Handles Both Numerical and Categorical Data: They can work
with both types of data without needing much preprocessing.
• No Need for Data Scaling: These trees do not require
normalization or scaling of data.
• Automated Feature Selection: They automatically identify the
most important features for decision-making.
• Handles Non-Linear Relationships: They can capture non-
linear patterns in the data effectively.
Disadvantages of Decision Trees
• Overfitting Risk: They can easily overfit the training
data, especially if they are grown too deep (see the sketch after this list).
• Unstable with Small Changes: Small changes in data
can lead to completely different trees.
• Biased with Imbalanced Data: They tend to be
biased if one class dominates the dataset.
• Limited to Axis-Parallel Splits: They struggle with
diagonal or complex decision boundaries.
• Can Become Complex: Large trees can become hard
to interpret and may lose their simplicity.
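To illustrate the overfitting point, a hedged sketch using scikit-learn's built-in controls (the synthetic dataset is illustrative; max_depth, or alternatively cost-complexity pruning via ccp_alpha, restrains tree growth):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unconstrained tree keeps splitting until its leaves are pure, so it can
# memorize the training set; limiting depth (a form of pruning) restrains this.
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

print("deep   train/test:", deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print("pruned train/test:", pruned.score(X_tr, y_tr), pruned.score(X_te, y_te))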
Applications of Decision Trees
• Healthcare
• Diagnosing diseases based on patient symptoms
• Predicting patient outcomes and treatment effectiveness
• Identifying risk factors for specific health conditions
• Finance
• Assessing credit risk for loan approvals
• Detecting fraudulent transactions
• Predicting stock market trends and investment risks
• Education
• Predicting student performance and outcomes
• Identifying factors affecting student dropout rates
• Personalizing learning paths for students
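The short scikit-learn script below ties these ideas together on a small toy dataset: it encodes the categorical columns with LabelEncoder, fits a DecisionTreeClassifier, reports accuracy on a held-out test set, and plots the fitted tree.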
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn import metrics
from sklearn.preprocessing import LabelEncoder
import matplotlib.pyplot as plt

# Toy dataset: predict whether to 'Go' (YES/NO) from age, experience, rank, nationality
data = {
    'Age': [36, 42, 23, 52, 43, 44, 66, 35, 52, 35, 24, 18, 45],
    'Experience': [10, 12, 4, 4, 21, 14, 3, 14, 13, 5, 3, 3, 9],
    'Rank': [9, 4, 6, 4, 8, 5, 7, 9, 7, 9, 5, 7, 9],
    'Nationality': ['UK', 'USA', 'N', 'USA', 'USA', 'UK', 'N', 'UK', 'N', 'N', 'USA', 'UK', 'UK'],
    'Go': ['NO', 'NO', 'NO', 'NO', 'YES', 'NO', 'YES', 'YES', 'YES', 'YES', 'NO', 'YES', 'YES']
}
df = pd.DataFrame(data)
print("Dataset:")
print(df)

# scikit-learn trees need numeric inputs, so encode the categorical columns
le_nationality = LabelEncoder()
le_go = LabelEncoder()
df['Nationality'] = le_nationality.fit_transform(df['Nationality'])
df['Go'] = le_go.fit_transform(df['Go'])

# Split features/target and hold out 20% of the rows for testing
X = df.drop('Go', axis=1)
y = df['Go']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the tree and evaluate it on the held-out rows
dtree = DecisionTreeClassifier(random_state=42)
dtree.fit(X_train, y_train)
y_pred = dtree.predict(X_test)
print("\nAccuracy on the test set:", metrics.accuracy_score(y_test, y_pred))

# Visualize the fitted tree
plt.figure(figsize=(12, 8))
plot_tree(dtree, filled=True, feature_names=X.columns, class_names=le_go.classes_,
          rounded=True, proportion=True)
plt.show()
Random Forest Algorithm
• A Random Forest Algorithm is a supervised machine learning
algorithm that is extremely popular and is used for Classification and
Regression problems in Machine Learning.
• We know that a forest comprises numerous trees, and the more trees it has,
the more robust it is.
• Similarly, the greater the number of trees in a Random Forest
Algorithm, the higher its accuracy and problem-solving ability.
• Random Forest is a classifier that builds several decision trees on various
subsets of the given dataset and combines their predictions, by majority vote
or averaging, to improve predictive accuracy.
• It is based on the concept of ensemble learning which is a process of
combining multiple classifiers to solve a complex problem and
improve the performance of the model.
Steps to follow
• The following steps explain the working of the Random
Forest Algorithm:
• Step 1: Select random samples (with replacement) from the given
training set.
• Step 2: The algorithm constructs a decision tree for every
sample.
• Step 3: Each decision tree produces a prediction; the trees vote
for classification, or their outputs are averaged for regression.
• Step 4: Finally, the most voted prediction is selected as the
final result.
• This combination of multiple models is called an
Ensemble. Ensembles use two methods:
• Bagging: Creating different training subsets from the
sample training data with replacement is called
Bagging. The final output is based on majority
voting.
• Boosting: Combining weak learners into a strong
learner by building models sequentially, so that
the final model has the highest accuracy, is called
Boosting. Examples: AdaBoost, XGBoost. A brief sketch of both methods follows.
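A brief, illustrative sketch contrasting the two methods in scikit-learn (the synthetic dataset and n_estimators value are assumptions for demonstration):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: 50 trees trained in parallel on bootstrap samples, combined by
# majority vote (BaggingClassifier defaults to decision-tree base learners)
bagging = BaggingClassifier(n_estimators=50, random_state=0)

# Boosting: models trained sequentially, each focusing on the errors of the
# previous ones (AdaBoost defaults to decision-stump base learners)
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())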
• Bagging: From the principle mentioned above, we can understand that Random Forest uses the
Bagging technique.
• Bagging, also known as Bootstrap Aggregation, is the ensemble method used by Random Forest.
• The process begins with the original dataset, from which random samples are drawn with
replacement.
• Each such sample is known as a Bootstrap Sample, and drawing them is known as Bootstrapping.
• A model is then trained independently on each bootstrap sample; combining their individual
results is known as Aggregation.
• In the last step, all the results are combined, and the final output is decided by majority voting.
• This whole procedure is known as Bagging and is implemented with an ensemble classifier (see the sketch below).
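Putting this together, a minimal sketch of a Random Forest in scikit-learn (the synthetic dataset is illustrative; n_estimators is the number of bagged trees):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 100 trees, each fit on a bootstrap sample of the training rows (bagging)
# and a random subset of features at every split; the classification output
# is decided by majority vote across the trees.
forest = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
forest.fit(X_tr, y_tr)
print("Test accuracy:", forest.score(X_te, y_te))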
(Figure: Decision Trees vs. Random Forest)
What is Naive Bayes Classifier?
• The Naïve Bayes classifier belongs to a family of
generative learning algorithms, which aim to
model the distribution of inputs within a
specific class or category. Unlike discriminative
classifiers such as logistic regression, it
doesn't learn which features are most crucial
for distinguishing between classes. It's widely
used in text classification, spam filtering, and
recommendation systems.
What is the Naive Bayes Algorithm?
• Definition: Naive Bayes is a classification technique based on Bayes'
Theorem with an independence assumption among predictors.
• Assumption: Assumes that the presence of a feature in a class is
independent of other features.
• Type: A supervised machine learning algorithm.
• Category: Belongs to generative learning algorithms, modeling input
distribution for each class.
• Usage: Commonly used in text classification, spam detection, sentiment
analysis, etc.
• Advantage: Fast, efficient, and works well with high-dimensional data.
• Limitation: Assumption of feature independence may not always hold
true.
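A minimal sketch of Naive Bayes for text classification in scikit-learn (the tiny corpus and its spam labels are made up purely for illustration):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus: 1 = spam, 0 = not spam
texts = ["win money now", "meeting at noon", "free prize win", "lunch tomorrow?"]
labels = [1, 0, 1, 0]

# Word counts as features; MultinomialNB treats the counts as independent
# given the class (the "naive" assumption)
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["free money"]))  # likely [1] (spam) on this toy data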
Example
Let’s take a silly little example – Say the likelihood of a person
having Arthritis if they are over 65 years of age is 49%.
Check these stats at: Centers for Disease Control and
Prevention
Now, let’s assume the following:
Class Prior: The probability of a person stepping in the clinic being
>65-year-old is 20%
Predictor Prior: The probability of a person stepping into the clinic
having Arthritis is 35%
What is the probability that a person is >65 years old given that they have
Arthritis? Let's calculate this with the help of Bayes'
theorem!
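Plugging these numbers into Bayes' theorem:

P(>65 | Arthritis) = P(Arthritis | >65) × P(>65) / P(Arthritis)
                   = (0.49 × 0.20) / 0.35
                   = 0.28

So, given that a person walking into the clinic has Arthritis, there is roughly a 28% chance they are over 65.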
Step 1 – Collect raw data
Step 2 – Convert data to a frequency table(s)
Step 3 – Calculate prior probability and
evidence
Step 4 – Apply probabilities to Bayes’ Theorem equation
Let's say you want to focus on the likelihood that you go for a run given that it's sunny outside. The sketch below walks through these four steps for that example.
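A short, illustrative sketch walking through those four steps (all observation values below are invented for the example):

from collections import Counter

# Step 1 - raw data: (weather, did you go for a run?)
observations = [("Sunny", "Yes"), ("Sunny", "Yes"), ("Sunny", "No"),
                ("Rainy", "No"), ("Rainy", "No"), ("Sunny", "Yes"),
                ("Rainy", "Yes"), ("Sunny", "No"), ("Rainy", "No"), ("Sunny", "Yes")]

# Step 2 - frequency tables
weather_counts = Counter(w for w, r in observations)  # evidence counts
joint_counts = Counter(observations)                  # (weather, run) counts
run_counts = Counter(r for w, r in observations)      # class counts

# Step 3 - prior probability and evidence
n = len(observations)
prior_run = run_counts["Yes"] / n                                # P(Run)
evidence_sunny = weather_counts["Sunny"] / n                     # P(Sunny)
likelihood = joint_counts[("Sunny", "Yes")] / run_counts["Yes"]  # P(Sunny | Run)

# Step 4 - Bayes' theorem: P(Run | Sunny) = P(Sunny | Run) * P(Run) / P(Sunny)
posterior = likelihood * prior_run / evidence_sunny
print(f"P(Run | Sunny) = {posterior:.2f}")  # 0.67 on this made-up data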