PRACTICAL - 6
Experiment: Identify a data set for executing the Decision Tree algorithm, implement it using Python, and analyse the same with cross validation and percentage split.
Aim:
To implement the Decision Tree algorithm using Python on a suitable dataset and analyze its performance
using Cross Validation and Percentage Split techniques, thereby evaluating the model’s accuracy and
generalization capability.
Theory:
Decision Tree
A Decision Tree is a simple, intuitive model that maps out choices and their possible outcomes in a tree-like structure, breaking a complex decision-making process into smaller parts. Each internal node tests an attribute, each branch corresponds to an outcome of that test, and each leaf node gives a final decision or prediction.
It lets users visualize decisions and their consequences clearly and supports decision-making by analyzing the different possible outcomes.
Example:
Suppose you are deciding whether to drink coffee based on the time of day and how tired you are. The root node checks the time: if it is not morning, the tree leads directly to No Coffee. If it is morning, the next node checks tiredness: if tired → Coffee; if not → No Coffee.
This kind of logic flow is exactly how a decision tree operates.
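The same flow can be written as a few nested conditions. A minimal sketch, where the conditions and labels are purely illustrative:

def should_drink_coffee(time_of_day, is_tired):
    """Mirror the coffee decision tree: the root node tests the time,
    the next node tests tiredness, and each return is a leaf."""
    if time_of_day == "morning":   # root node: check the time
        if is_tired:               # internal node: check tiredness
            return "Coffee"        # leaf
        return "No Coffee"         # leaf
    return "No Coffee"             # leaf: not morning

print(should_drink_coffee("morning", True))   # Coffee
print(should_drink_coffee("evening", True))   # No Coffee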
Program:
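The program listing itself is not reproduced here; below is a minimal sketch of one way such a program could look, assuming the Iris dataset bundled with scikit-learn and an 80/20 percentage split:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Load a suitable dataset (the Iris dataset is assumed here for illustration)
iris = load_iris()
X, y = iris.data, iris.target

# Percentage split: 80% of the data for training, 20% for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train the decision tree classifier
clf = DecisionTreeClassifier(criterion="entropy", random_state=42)
clf.fit(X_train, y_train)

# Print the learned tree structure and the accuracy on unseen data
print(export_text(clf, feature_names=iris.feature_names))
print("Test accuracy:", clf.score(X_test, y_test))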
Output:
Analyzing Decision Tree Algorithm using Cross Validation and Percentage Split
To assess the effectiveness and generalization capability of the decision tree model, we use evaluation techniques such as Train-Test Split and Cross Validation. These methods show how well the model performs on unseen data and help detect overfitting or underfitting.
1. Train-Test Split (Percentage Split)
The Train-Test Split is a simple and commonly used method to evaluate model performance. In this
method:
• The dataset is divided into two parts:
  o Training Set: used to train the model (typically 70% or 80% of the data).
  o Testing Set: used to test the model's prediction accuracy on unseen data (the remaining 30% or 20%).
• The model is trained on the training set and then evaluated on the testing set.
• Evaluation metrics such as Accuracy, the Confusion Matrix, and the Classification Report are used.
Advantage: easy and quick to implement.
Limitation: performance can vary depending on how the data is split.
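To make this concrete, here is a short sketch of a percentage-split evaluation reporting the metrics named above, again assuming the Iris data and a 70/30 split:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

X, y = load_iris(return_X_y=True)

# 70/30 percentage split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Report the standard evaluation metrics on the held-out test set
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))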
2. Cross Validation
Cross Validation is a more robust technique for model evaluation. It reduces the variance associated with
random Train-Test splits by averaging performance across multiple splits.
In k-Fold Cross Validation:
• The dataset is divided into k equal parts (folds).
• The model is trained on k-1 folds and tested on the remaining fold.
• This process is repeated k times, each time with a different fold as the test set.
• The final accuracy is the mean of the accuracies across all folds.
Advantage: provides a more reliable and generalized estimate of model performance.
Limitation: computationally more expensive than a single Train-Test split.
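A minimal sketch of k-fold cross validation for the same classifier, assuming k = 10 and scikit-learn's cross_val_score:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0)

# 10-fold cross validation: train on 9 folds, test on the remaining one,
# repeated 10 times with a different fold serving as the test set each time
scores = cross_val_score(clf, X, y, cv=10)

print("Per-fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
print("Standard deviation:", scores.std())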
Why Analyze with Both?
Using both Train-Test Split and Cross Validation provides a comprehensive understanding of model
performance:
• Train-Test Split shows how the model performs in one specific split scenario.
• Cross Validation checks whether that performance is consistent and generalizable across different subsets of the data.
Combining both methods helps in selecting the best model parameters and ensures that the model is not biased or overfitted to a particular dataset split.
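One common way to combine the two for parameter selection is to tune the tree with cross validation on the training portion and then confirm on the held-out test portion. A sketch assuming scikit-learn's GridSearchCV, with an illustrative grid over max_depth:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Cross validation on the training set picks max_depth;
# the untouched test split then gives the final, unbiased check
grid = GridSearchCV(DecisionTreeClassifier(random_state=0),
                    param_grid={"max_depth": [2, 3, 4, 5, None]},
                    cv=5)
grid.fit(X_train, y_train)

print("Best max_depth:", grid.best_params_)
print("Cross-validation accuracy:", grid.best_score_)
print("Held-out test accuracy:", grid.score(X_test, y_test))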
Program:
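As above, the actual listing is not reproduced here; a minimal sketch of how the analysis program could compare both techniques, under the same Iris assumption:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 1) Percentage split (80/20)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
split_acc = DecisionTreeClassifier(random_state=0).fit(
    X_train, y_train).score(X_test, y_test)

# 2) 10-fold cross validation on the full dataset
cv_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)

print(f"Percentage-split accuracy: {split_acc:.3f}")
print(f"Cross-validation accuracy: {cv_scores.mean():.3f} ± {cv_scores.std():.3f}")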
Output: