What is Decision Tree?

• Decision trees are a simple machine learning tool used for classification and regression tasks. They break complex decisions into smaller steps, making them easy to understand and implement.
Understanding Decision Tree

• A decision tree is a non-parametric supervised learning approach used for classification and regression applications. It has a hierarchical structure made up of a root node, branches, internal nodes, and leaf nodes.
• It is a tool with applications spanning several different areas and can be used for both classification and regression problems. As the name suggests, it uses a flowchart-like tree structure to show the predictions that result from a series of feature-based splits. It starts at a root node and ends with a decision made at the leaves.
Types of Decision Tree

ID3: This algorithm measures how mixed up the data is at a node using a quantity called entropy. It then chooses the feature whose split reduces entropy the most (i.e., gives the highest information gain).
C4.5: An improved version of ID3 that can handle missing data and continuous attributes.
CART: This algorithm uses a different measure, Gini impurity, to decide how to split the data. It can be used for both classification (sorting data into categories) and regression (predicting continuous values) tasks.
(A small code sketch of both impurity measures is shown below.)
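As a rough illustration of the two impurity measures mentioned above, here is a minimal Python sketch; the function names and the toy label list are made up for illustration only:

import math
from collections import Counter

def entropy(labels):
    # Entropy (used by ID3 / C4.5): -sum(p * log2(p)) over the class proportions
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini_impurity(labels):
    # Gini impurity (used by CART): 1 - sum(p^2) over the class proportions
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

labels = ['YES', 'YES', 'YES', 'NO', 'NO', 'NO']   # a perfectly mixed node
print(entropy(labels))         # 1.0  (maximum disorder for two classes)
print(gini_impurity(labels))   # 0.5  (maximum Gini impurity for two classes)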
Decision Tree Terminologies
• Root Node: The initial node at the beginning of a decision tree, where the entire
population or dataset starts dividing based on various features or conditions.
• Decision Nodes: Nodes resulting from the splitting of root nodes are known as
decision nodes. These nodes represent intermediate decisions or conditions within
the tree.
• Leaf Nodes: Nodes where further splitting is not possible, often indicating the final
classification or outcome. Leaf nodes are also referred to as terminal nodes.
• Sub-Tree: Similar to a subsection of a graph being called a sub-graph, a subsection of a decision tree is referred to as a sub-tree. It represents a specific portion of the decision tree.
• Pruning: The process of removing or cutting down specific nodes in a tree to
prevent overfitting and simplify the model.
• Branch / Sub-Tree: A subsection of the entire tree is referred to as a branch or sub-tree. It represents a specific path of decisions and outcomes within the tree.
• Parent and Child Node: In a decision tree, a node that is divided into sub-nodes is
known as a parent node, and the sub-nodes emerging from it are referred to as
child nodes. The parent node represents a decision or condition, while the child
nodes represent the potential outcomes or further decisions based on that
condition.
Example of Decision Tree
Did you notice anything in the above flowchart? We see that if the weather is cloudy, then we must go to play.
Why didn't it split further? Why did it stop there?
To answer this, we need a few more concepts such as entropy, information gain, and the Gini index. In simple terms, the output for the training dataset is always "yes" for cloudy weather; since there is no disorder in that branch, we don't need to split the node further.
The goal is to decrease uncertainty, or disorder, in the dataset, and that is exactly what these trees do.
Now you must be wondering: how do I know what the root node should be? What should the decision nodes be? When should I stop splitting? To decide this, there is a metric called "Entropy", which measures the amount of uncertainty in the dataset. (A worked sketch of entropy and information gain follows below.)
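To make entropy and information gain concrete, here is a hedged sketch on a made-up weather sample; the numbers are illustrative, not the exact data behind the flowchart above:

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, subsets):
    # Gain = entropy(parent) - weighted average entropy of the child subsets
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)

# 'Play' labels grouped by the Outlook feature (toy data)
by_outlook = {
    'Sunny':  ['NO', 'NO', 'NO', 'YES', 'YES'],
    'Cloudy': ['YES', 'YES', 'YES', 'YES'],     # pure branch: entropy 0, no further split needed
    'Rainy':  ['YES', 'YES', 'YES', 'NO', 'NO'],
}
play = [label for subset in by_outlook.values() for label in subset]
print(round(information_gain(play, list(by_outlook.values())), 3))   # ~0.247 bits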
How do Decision Tree algorithms work?
• The Decision Tree algorithm works in a few simple steps (a bare-bones code sketch follows this list):
• Starting at the Root: The algorithm begins at the top, called the
“root node,” representing the entire dataset.
• Asking the Best Questions: It looks for the most important feature
or question that splits the data into the most distinct groups. This
is like asking a question at a fork in the tree.
• Branching Out: Based on the answer to that question, it divides
the data into smaller subsets, creating new branches. Each branch
represents a possible route through the tree.
• Repeating the Process: The algorithm continues asking questions
and splitting the data at each branch until it reaches the final “leaf
nodes,” representing the predicted outcomes or classifications.
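Here is a bare-bones, hedged sketch of that greedy procedure: pick the feature with the highest information gain, split on it, and recurse until a node is pure. The data and helper names are made up for illustration; real libraries such as scikit-learn add pruning, depth limits, and numeric threshold splits.

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_feature(rows, features, target):
    # Pick the feature whose split yields the highest information gain
    base = entropy([r[target] for r in rows])
    def gain(feature):
        groups = {}
        for r in rows:
            groups.setdefault(r[feature], []).append(r[target])
        return base - sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return max(features, key=gain)

def build_tree(rows, features, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not features:      # pure node or no features left -> leaf
        return Counter(labels).most_common(1)[0][0]
    feature = best_feature(rows, features, target)
    branches = {}
    for value in {r[feature] for r in rows}:       # one branch per observed feature value
        subset = [r for r in rows if r[feature] == value]
        branches[value] = build_tree(subset, [f for f in features if f != feature], target)
    return (feature, branches)

# Tiny made-up dataset
rows = [
    {'Outlook': 'Sunny',  'Wind': 'Weak',   'Play': 'NO'},
    {'Outlook': 'Sunny',  'Wind': 'Strong', 'Play': 'NO'},
    {'Outlook': 'Cloudy', 'Wind': 'Weak',   'Play': 'YES'},
    {'Outlook': 'Rainy',  'Wind': 'Weak',   'Play': 'YES'},
    {'Outlook': 'Rainy',  'Wind': 'Strong', 'Play': 'NO'},
]
print(build_tree(rows, ['Outlook', 'Wind'], 'Play'))
# e.g. ('Outlook', {'Sunny': 'NO', 'Cloudy': 'YES', 'Rainy': ('Wind', {'Weak': 'YES', 'Strong': 'NO'})})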
Advantages of Decision Trees
• Easy to Understand: They are simple to visualize and
interpret, making them easy to understand even for non-
experts.
• Handles Both Numerical and Categorical Data: They can work
with both types of data without needing much preprocessing.
• No Need for Data Scaling: These trees do not require
normalization or scaling of data.
• Automated Feature Selection: They automatically identify the
most important features for decision-making.
• Handles Non-Linear Relationships: They can capture non-
linear patterns in the data effectively.
Disadvantages of Decision Trees
• Overfitting Risk: They can easily overfit the training
data, especially if the trees are allowed to grow too deep.
• Unstable with Small Changes: Small changes in data
can lead to completely different trees.
• Biased with Imbalanced Data: They tend to be
biased if one class dominates the dataset.
• Limited to Axis-Parallel Splits: They struggle with
diagonal or complex decision boundaries.
• Can Become Complex: Large trees can become hard
to interpret and may lose their simplicity.
Applications of Decision Trees
• Healthcare
• Diagnosing diseases based on patient symptoms
• Predicting patient outcomes and treatment effectiveness
• Identifying risk factors for specific health conditions
• Finance
• Assessing credit risk for loan approvals
• Detecting fraudulent transactions
• Predicting stock market trends and investment risks
• Education
• Predicting student performance and outcomes
• Identifying factors affecting student dropout rates
• Personalizing learning paths for students
Decision Tree Example in Python (scikit-learn)

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn import metrics
from sklearn.preprocessing import LabelEncoder

# Small toy dataset with a binary target column 'Go'
data = {
    'Age': [36, 42, 23, 52, 43, 44, 66, 35, 52, 35, 24, 18, 45],
    'Experience': [10, 12, 4, 4, 21, 14, 3, 14, 13, 5, 3, 3, 9],
    'Rank': [9, 4, 6, 4, 8, 5, 7, 9, 7, 9, 5, 7, 9],
    'Nationality': ['UK', 'USA', 'N', 'USA', 'USA', 'UK', 'N', 'UK', 'N', 'N', 'USA', 'UK', 'UK'],
    'Go': ['NO', 'NO', 'NO', 'NO', 'YES', 'NO', 'YES', 'YES', 'YES', 'YES', 'NO', 'YES', 'YES']
}
df = pd.DataFrame(data)
print("Dataset:")
print(df)

# Encode the categorical columns as integers
le_nationality = LabelEncoder()
le_go = LabelEncoder()
df['Nationality'] = le_nationality.fit_transform(df['Nationality'])
df['Go'] = le_go.fit_transform(df['Go'])

# Split into features (X) and target (y), then into train and test sets
X = df.drop('Go', axis=1)
y = df['Go']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the decision tree and evaluate it on the held-out test set
dtree = DecisionTreeClassifier(random_state=42)
dtree.fit(X_train, y_train)
y_pred = dtree.predict(X_test)
print("\nAccuracy on the test set:", metrics.accuracy_score(y_test, y_pred))

# Visualise the learned tree
plt.figure(figsize=(12, 8))
plot_tree(dtree, filled=True, feature_names=list(X.columns), class_names=list(le_go.classes_), rounded=True, proportion=True)
plt.show()
Random Forest Algorithm
• A Random Forest Algorithm is a supervised machine learning
algorithm that is extremely popular and is used for Classification and
Regression problems in Machine Learning.
• We know that a forest comprises numerous trees, and the more trees it has, the more robust it is.
• Similarly, the greater the number of trees in a Random Forest
Algorithm, the higher its accuracy and problem-solving ability.
• Random Forest is a classifier that builds several decision trees on various subsets of the given dataset and averages (or takes a majority vote of) their predictions to improve predictive accuracy.
• It is based on the concept of ensemble learning which is a process of
combining multiple classifiers to solve a complex problem and
improve the performance of the model.
Steps to follow
• The following steps explain the working of the Random Forest algorithm:
• Step 1: Select random samples from a given data or
training set.
• Step 2: The algorithm constructs a decision tree for every sample (subset) of the training data.
• Step 3: Each decision tree produces a prediction and voting takes place (for regression, the predictions are averaged).
• Step 4: Finally, select the most voted prediction
result as the final prediction result.
• This combination of multiple models is called an Ensemble. Ensemble learning uses two methods:
• Bagging: Creating different training subsets from the training data by sampling with replacement is called Bagging. The final output is based on majority voting.
• Boosting: Combining weak learners into strong learners by creating sequential models, so that the final model has the highest accuracy, is called Boosting. Examples: AdaBoost, XGBoost.
• Bagging: From the principle mentioned above, we can understand that Random Forest uses the Bagging technique.
• Bagging is also known as Bootstrap Aggregation and is the method used by Random Forest.
• The process begins with the original data, from which random samples are drawn with replacement; each sample is known as a Bootstrap Sample. This process is known as Bootstrapping.
• The models are then trained individually on these samples, each yielding its own result.
• In the last step, all the results are combined and the final output is obtained by majority voting (Aggregation).
• This whole procedure is known as Bagging and is carried out with an ensemble classifier. (A short scikit-learn sketch follows below.)
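As a hedged sketch of how this looks in practice, the snippet below trains scikit-learn's RandomForestClassifier, which performs bootstrap aggregation internally, on the built-in Iris dataset; the dataset choice and parameter values are illustrative only:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a small built-in dataset and hold out a test split
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# n_estimators = number of bootstrapped decision trees whose votes are aggregated;
# bootstrap=True (the default) means each tree sees a random sample drawn with replacement
rf = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=42)
rf.fit(X_train, y_train)
print("Test accuracy:", rf.score(X_test, y_test))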
Decision Trees vs. Random Forest

• Decision Trees: They usually suffer from the problem of overfitting if allowed to grow without any control.
  Random Forest: Since the trees are built from random subsets of the data and the final output is based on averaging or majority voting, overfitting is largely avoided.
• Decision Trees: A single decision tree is comparatively faster in computation.
  Random Forest: It is slower, since many trees must be built.
• Decision Trees: They apply a single set of learned rules to the dataset of features taken as input.
  Random Forest: It randomly selects observations, builds multiple decision trees, and obtains the result by majority voting.
Why Use a Random Forest Algorithm?
• There are many benefits to using the Random Forest algorithm, but one of the main advantages is that it reduces the risk of overfitting while keeping training time manageable.
• Additionally, it offers a high level of accuracy. The Random Forest algorithm runs efficiently on large datasets and can produce accurate predictions even when some of the data is missing.
Support Vector Machine Algorithm
• Support Vector Machine or SVM is one of the most popular
Supervised Learning algorithms, which is used for Classification as
well as Regression problems. However, primarily, it is used for
Classification problems in Machine Learning.
• The goal of the SVM algorithm is to create the best line or decision
boundary that can segregate n-dimensional space into classes so
that we can easily put the new data point in the correct category in
the future. This best decision boundary is called a hyperplane.
• SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed a Support Vector Machine. Consider the diagram below, in which two different categories are separated by a decision boundary (hyperplane):
• Example: SVM can be understood with the example we used for the KNN classifier. Suppose we see a strange cat that also has some features of dogs. If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created using the SVM algorithm. We first train our model with lots of images of cats and dogs so that it can learn their different features, and then test it on this strange creature. The SVM creates a decision boundary between the two classes (cat and dog) and chooses the extreme cases (support vectors) of each. On the basis of the support vectors, it will classify the new example as a cat. Consider the diagram below:
• Types of SVM
• SVM can be of two types:
• Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be split into two classes by a single straight line, it is called linearly separable data, and the classifier used is called a Linear SVM classifier.
• Non-linear SVM: Non-Linear SVM is used for non-linearly separable data. If a dataset cannot be separated by a straight line, it is called non-linear data, and the classifier used is called a Non-linear SVM classifier.
Linear SVM:
How does SVM work?

Non-Linear SVM:
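To contrast the linear and non-linear cases in code, here is a small hedged sketch using scikit-learn's SVC; the dataset (two interleaving half-moons) and parameter choices are illustrative only:

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A two-class dataset that is NOT linearly separable (two interleaving half-moons)
X, y = make_moons(n_samples=200, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

linear_svm = SVC(kernel='linear').fit(X_train, y_train)   # straight-line decision boundary
rbf_svm = SVC(kernel='rbf').fit(X_train, y_train)         # kernel trick allows a curved boundary

print("Linear kernel accuracy:", linear_svm.score(X_test, y_test))
print("RBF kernel accuracy:   ", rbf_svm.score(X_test, y_test))
print("Support vectors per class (RBF):", rbf_svm.n_support_)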
What is Naive Bayes Classifier?
• The Naïve Bayes Classifier belongs to a family of generative learning algorithms, which aim to model the distribution of inputs within a specific class or category. Unlike discriminative classifiers such as logistic regression, it does not directly learn which features are most crucial for distinguishing between classes. It is widely used in text classification, spam filtering, and recommendation systems.
What is the Naive Bayes Algorithm?
• Definition: Naive Bayes is a classification technique based on Bayes'
Theorem with an independence assumption among predictors.
• Assumption: Assumes that the presence of a feature in a class is
independent of other features.
• Type: A supervised machine learning algorithm.
• Category: Belongs to generative learning algorithms, modeling input
distribution for each class.
• Usage: Commonly used in text classification, spam detection, sentiment
analysis, etc.
• Advantage: Fast, efficient, and works well with high-dimensional data.
• Limitation: Assumption of feature independence may not always hold
true.
Example
Let's take a silly little example. Say the likelihood of a person having arthritis, given that they are over 65 years of age, is 49%.
(Source for the statistic: Centers for Disease Control and Prevention.)
Now, let's assume the following:
Class Prior: The probability of a person stepping into the clinic being over 65 years old is 20%.
Predictor Prior: The probability of a person stepping into the clinic having arthritis is 35%.
What is the probability that a person is over 65, given that they have arthritis? Let's calculate this with the help of Bayes' theorem:
P(>65 | Arthritis) = P(Arthritis | >65) × P(>65) / P(Arthritis) = 0.49 × 0.20 / 0.35 = 0.28, i.e. about 28%.
Step 1 – Collect raw data
Step 2 – Convert data to a frequency table(s)
Step 3 – Calculate the prior probability and the evidence
Step 4 – Apply probabilities to Bayes’ Theorem equation
Let’s say you want to focus on the likelihood that you go for a run given that it’s sunny outside.

The Naïve Bayes Algorithm

Naïve Bayes assumes conditional independence over the training dataset. The classifier separates data into different classes according to Bayes' Theorem, but assumes that the input features within a class are independent of one another. Hence, the model is called "naïve".
This independence assumption simplifies the calculation: the denominator (the evidence) is the same for every class and can be dropped, leaving
P(class | x1, ..., xn) ∝ P(class) × P(x1 | class) × ... × P(xn | class).
Let’s understand this through our running resolution example:
Say you want to predict if on the coming Wednesday, given the following weather conditions,
should you go for a run or sleep in:
Outlook: Rainy
Humidity: Normal
Wind: Weak
Run: ?
Likelihood of ‘Yes’ on Wednesday:
P(Outlook = Rainy|Yes) * P(Humidity = Normal|Yes) * P(Wind = Weak|Yes) * P(Yes)
= 1/8 * 1/8 * 9/9 * 8/14 = 0.0089
Likelihood of ‘No’ on Wednesday:
P(Outlook = Rainy|No) * P(Humidity = Normal|No) * P(Wind = Weak|No) * P(No)
= 3/6 * 3/6 * 2/5 * 6/14 = 0.0428
Now, to determine the probability of going for a run on Wednesday, you just need to divide the likelihood of 'Yes' by the sum of the two likelihoods (this normalisation replaces the dropped denominator).
P(Yes) = 0.0089 / (0.0089 + 0.0428) ≈ 0.172
Similarly, P(No) = 0.0428 / (0.0089 + 0.0428) ≈ 0.828
According to your model, it looks like there's an almost 83% probability that you're going to stay under the covers next Wednesday!
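As a hedged sketch of the same idea in code, the snippet below fits scikit-learn's CategoricalNB on a small made-up weather/run table; note that CategoricalNB applies Laplace smoothing by default, so its posteriors will not match the hand calculation above exactly:

import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical weather/run table, only meant to mirror the structure of the example above
df = pd.DataFrame({
    'Outlook':  ['Sunny', 'Rainy', 'Cloudy', 'Sunny', 'Rainy', 'Cloudy', 'Sunny', 'Rainy'],
    'Humidity': ['High', 'Normal', 'Normal', 'High', 'High', 'Normal', 'Normal', 'Normal'],
    'Wind':     ['Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong'],
    'Run':      ['YES', 'NO', 'YES', 'NO', 'NO', 'YES', 'YES', 'NO'],
})

features = ['Outlook', 'Humidity', 'Wind']
encoder = OrdinalEncoder()
X = encoder.fit_transform(df[features])    # categories -> integer codes, one column per feature
y = df['Run']

model = CategoricalNB()                    # one conditional distribution per categorical feature
model.fit(X, y)

# Predict for Wednesday: Rainy outlook, Normal humidity, Weak wind
wednesday = pd.DataFrame([['Rainy', 'Normal', 'Weak']], columns=features)
wednesday_encoded = encoder.transform(wednesday)
print(model.predict(wednesday_encoded))                          # predicted class
print(model.classes_, model.predict_proba(wednesday_encoded))    # posterior probability per class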
