
CS3EL15(P): Machine Learning Laboratory
Experiment No. 6
Student Name: Yuvraj Sikarwar    Enrollment No: EN22CS3011129
Experiment: Identify a dataset for executing the Decision Tree algorithm, implement it using Python, and analyse it with cross validation and percentage split.

PRACTICAL - 6
Aim:
To implement the Decision Tree algorithm using Python on a suitable dataset and analyze its performance
using Cross Validation and Percentage Split techniques, thereby evaluating the model’s accuracy and
generalization capability.

Theory:
 Decision Tree
A Decision Tree is a simple, intuitive diagram that helps in making decisions by mapping out various
choices and their possible outcomes. It is represented in the form of a tree-like structure that breaks
down a complex decision-making process into smaller parts.
It allows users to visualize decisions and their consequences clearly and supports decision-making by
analyzing different possible outcomes.

Structure of a Decision Tree


A decision tree is a hierarchical model with several components:
 Root Node: The starting point of the tree. It represents the entire dataset and the first decision to be
made.
 Branches: These represent the flow of decisions from one node to another based on feature values.
 Internal Nodes: These are decision points where the data is split based on specific conditions
(features).
 Leaf Nodes (Terminal Nodes): These are the end points that represent the final decision or outcome.

Example:
If you’re deciding whether to drink coffee based on the time of day and how tired you are, the root node
checks the time. If it’s morning, the next decision checks tiredness.

If tired → Drink Coffee; if not → No Coffee.
This kind of logic flow is how a decision tree operates.
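The same flow can be written as plain conditionals. The sketch below is only an illustration of how each root-to-leaf path becomes one decision rule; the function name and arguments are invented for this example.

# Each root-to-leaf path in the tree becomes one if/else chain.
def should_drink_coffee(time_of_day, tired):
    if time_of_day == "morning":    # root node: check the time of day
        if tired:                   # internal node: check tiredness
            return "Drink Coffee"   # leaf node
        return "No Coffee"          # leaf node
    return "No Coffee"              # afternoon/evening branch

print(should_drink_coffee("morning", tired=True))   # Drink Coffee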

Classification of Decision Trees


Decision trees are categorized based on the type of target variable:
 Classification Trees: Used for categorical outputs (e.g., spam or not spam, yes or no, pass or fail).
 Regression Trees: Used for continuous numerical outputs (e.g., predicting house prices, stock
values, etc.).
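In scikit-learn these two categories correspond to two different estimators. A minimal sketch, with the example targets in the comments being illustrative:

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: categorical target (e.g., spam / not spam)
clf = DecisionTreeClassifier()

# Regression tree: continuous target (e.g., house price)
reg = DecisionTreeRegressor()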

How Decision Trees Work


1. The process begins at the root node, where the best feature for splitting the dataset is selected using a criterion such as the Gini Index or Information Gain (computed from Entropy).
2. The data is split into subsets by asking yes/no or condition-based questions.
3. This splitting continues until:
o All data points in a node belong to a single class, or
o No more splits can improve the model.
Each path from root to a leaf represents a decision rule that leads to an outcome.
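To make the splitting criterion concrete, the sketch below computes the Gini Index by hand for one candidate split. The class counts are invented for illustration; the formula is G = 1 - Σ pᵢ² over the class proportions pᵢ.

from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

# A node with 4 "yes" and 2 "no" samples, split into two children
parent = ["yes"] * 4 + ["no"] * 2
left, right = ["yes"] * 3, ["yes"] + ["no"] * 2

weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(parent)
print(f"parent Gini   = {gini(parent):.3f}")   # 0.444
print(f"weighted Gini = {weighted:.3f}")       # 0.222 -> the split reduces impurity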

Program:

Step 1: Import Libraries


Step 2: Load dataset (Iris dataset as an example)

Step 3: Create and train the Decision Tree Classifier

Step 4: Predict on the training data

Step 5: Print Accuracy

Step 6: Visualize the Decision Tree
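A minimal sketch of Steps 1 to 6, assuming scikit-learn's bundled Iris dataset; the criterion and random_state values are illustrative choices.

# Step 1: Import Libraries
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

# Step 2: Load dataset (Iris dataset as an example)
iris = load_iris()
X, y = iris.data, iris.target

# Step 3: Create and train the Decision Tree Classifier
clf = DecisionTreeClassifier(criterion="gini", random_state=42)
clf.fit(X, y)

# Step 4: Predict on the training data
y_pred = clf.predict(X)

# Step 5: Print Accuracy
print("Training accuracy:", accuracy_score(y, y_pred))

# Step 6: Visualize the Decision Tree
plt.figure(figsize=(12, 8))
plot_tree(clf, feature_names=iris.feature_names,
          class_names=iris.target_names, filled=True)
plt.show()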

Output:


Advantages of Decision Trees


 Simple and Easy to Interpret: Decision trees resemble flowcharts, making them easy to understand.

 Versatile: Can be used for classification and regression tasks.


 No Need for Scaling: Feature scaling (normalization/standardization) is not required.
 Captures Non-linear Relationships: Effectively models complex decision boundaries.

Disadvantages of Decision Trees


 Overfitting: Trees may learn noise in the data, reducing generalization to unseen data.
 Instability: Small changes in data can drastically alter the structure of the tree.
 Bias toward Categorical Features with Many Levels: Features with more unique values may
dominate decision making unfairly.

Applications of Decision Trees


1. Loan Approval in Banking: Predict approval/rejection based on features like income, credit score,
and employment history.
2. Medical Diagnosis: Identify health conditions (e.g., diabetes) using features like glucose level, BMI,
etc.
3. Predicting Exam Results in Education: Predict student performance using data like attendance,
study time, and past grades.


Analyzing Decision Tree Algorithm using Cross Validation and Percentage Split
To assess the effectiveness and generalization capability of the decision tree model, we use evaluation
techniques like Train-Test Split and Cross Validation. These methods help us determine how well the
model performs on unseen data and avoid overfitting or underfitting.
1. Train-Test Split (Percentage Split)
The Train-Test Split is a simple and commonly used method to evaluate model performance. In this
method:
 The dataset is divided into two parts:
o Training Set: Used to train the model (typically 70% or 80% of data).
o Testing Set: Used to test the model’s prediction accuracy on unseen data (remaining 30% or
20%).
 The model is trained on the training set and then evaluated using the testing set.
 Evaluation metrics like Accuracy, Confusion Matrix, and Classification Report are used.
Advantage: Easy and quick to implement.
Limitation: Performance can vary based on how the data is split.
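A minimal sketch of a percentage split on the Iris dataset using scikit-learn's train_test_split; the 80/20 ratio and random_state are illustrative choices.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# 80% training / 20% testing; stratify keeps class proportions equal in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
print(X_train.shape, X_test.shape)   # (120, 4) (30, 4)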
2. Cross Validation
Cross Validation is a more robust technique for model evaluation. It reduces the variance associated with
random Train-Test splits by averaging performance across multiple splits.
In k-Fold Cross Validation:
 The dataset is divided into k equal parts (folds).
 The model is trained on k-1 folds and tested on the remaining 1 fold.
 This process is repeated k times, each time with a different fold as the test set.
 The final accuracy is the mean of accuracies across all folds.
Advantage: Provides a more reliable and generalized estimate of model performance.
Limitation: Computationally more expensive compared to a single Train-Test split.
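A minimal sketch of k-fold cross validation with scikit-learn's cross_val_score; k = 5 is an illustrative choice.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=42)

# Train on 4 folds, test on the 5th, rotate 5 times, then average
scores = cross_val_score(clf, X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())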
Why Analyze with Both?
Using both Train-Test Split and Cross Validation provides a comprehensive understanding of model
performance:
 Train-Test Split shows how the model performs in a specific split scenario.
 Cross Validation helps validate if the model performance is consistent and generalizable across
different data distributions.
Combining both methods helps in selecting the best model parameters and ensures that the model is not
biased or overfitted to a particular dataset split.


Program:

Step 1: Importing necessary libraries

Step 2: Load dataset (Iris dataset)

Step 3: Train-Test Split Analysis

Step 4: Accuracy and Evaluation Metrics

Step 5: Cross Validation Analysis

Step 6: Visualize the Decision Tree
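A minimal sketch of Steps 1 to 6, again assuming the Iris dataset; the 70/30 split ratio, random_state, and k = 5 folds are illustrative choices.

# Step 1: Importing necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt

# Step 2: Load dataset (Iris dataset)
iris = load_iris()
X, y = iris.data, iris.target

# Step 3: Train-Test Split Analysis (70% train / 30% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Step 4: Accuracy and Evaluation Metrics
print("Test accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Step 5: Cross Validation Analysis (5-fold, on the full dataset)
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=5)
print("Cross-validation mean accuracy:", scores.mean())

# Step 6: Visualize the Decision Tree
plt.figure(figsize=(12, 8))
plot_tree(clf, feature_names=iris.feature_names,
          class_names=iris.target_names, filled=True)
plt.show()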


Output:
