Lab 2

The document outlines an experiment for a Machine Learning course focused on implementing Decision Trees for classification and regression tasks. It explains the theory behind Decision Trees, including key terminologies, steps for building a tree, and methods for attribute selection such as Information Gain and Gini Index. Additionally, it includes lab assignments using specific datasets to apply the concepts learned, including tasks related to overfitting analysis and tree pruning techniques.

Uploaded by

yugsavlabooks

Department of Computer Science and Engineering (Data Science)

Subject: Machine Learning – I (DJS23DCPC402)

AY: 2024-25

Experiment 2

(Decision Tree)

Aim: Implement a Decision Tree on the given datasets to build a classifier and a regressor. Apply an
appropriate pruning method to overcome overfitting.

Theory:

A Decision Tree is a supervised learning technique that can be used for both classification and regression
problems, although it is most often used for classification. It is a tree-structured model in which
internal nodes test the features of a dataset, branches represent the decision rules, and each leaf node
represents an outcome. A decision tree has two types of nodes: the Decision Node and the Leaf Node.
Decision nodes are used to make a decision and have multiple branches, whereas leaf nodes are the
outputs of those decisions and do not contain any further branches.
The decisions or tests are performed on the basis of the features of the given dataset.
A decision tree is a graphical representation of all the possible solutions to a problem or decision under
the given conditions. It is called a decision tree because, like a tree, it starts with the root node, which
expands into further branches and builds a tree-like structure.
A decision tree simply asks a question and, based on the answer (Yes/No), splits further into
subtrees. The diagram below shows the general structure of a decision tree:

Decision Tree Terminologies

Root Node: The root node is where the decision tree starts. It represents the entire dataset, which
is then divided into two or more homogeneous sets.


Leaf Node: Leaf nodes are the final output nodes; the tree cannot be split any further once a leaf node
is reached.
Splitting: Splitting is the process of dividing a decision node or the root node into sub-nodes according
to the given conditions.
Branch/Sub Tree: A subsection of the tree formed by splitting a node.
Pruning: Pruning is the process of removing unwanted branches from the tree.
Parent/Child node: A node that is split into sub-nodes is called the parent node of those sub-nodes, and
the sub-nodes are called its child nodes. The root node is the parent at the top of the tree.

Steps in building a Tree

Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
Step-3: Divide S into subsets, one for each possible value of the best attribute.
Step-4: Generate the decision tree node that contains the best attribute.
Step-5: Recursively build new decision trees using the subsets of the dataset created in Step-3.
Continue this process until the nodes cannot be classified any further; each such final node is called
a leaf node.
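
The five steps above can be sketched as a short recursive function. This is a minimal illustration and not a reference implementation: it assumes categorical attributes, uses information gain as the ASM, and the names build_tree, best_attribute and predict, as well as the toy weather data, are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_attribute(rows, labels, attributes):
    """Step 2: pick the attribute with the highest information gain."""
    def gain(attr):
        remainder = 0.0
        for value in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder
    return max(attributes, key=gain)


def build_tree(rows, labels, attributes):
    """Steps 1-5: return a nested dict (decision node) or a class label (leaf)."""
    # Stop when the node is pure or no attributes remain: emit a leaf.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)           # Step 2
    node = {"attribute": attr, "children": {}}                # Step 4
    remaining = [a for a in attributes if a != attr]
    for value in sorted(set(row[attr] for row in rows)):      # Step 3
        idx = [i for i, row in enumerate(rows) if row[attr] == value]
        node["children"][value] = build_tree(                 # Step 5
            [rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return node


def predict(tree, row):
    """Follow branches from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["attribute"]]]
    return tree


# Hypothetical toy data, invented for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "sunny", "windy": "yes"},
]
labels = ["play", "play", "play", "stay", "play"]

tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree)
print(predict(tree, {"outlook": "rainy", "windy": "yes"}))
```

On this toy data the root splits on outlook, because that attribute leaves less impurity in the subsets than windy does. The sketch does not handle attribute values that were never seen during training.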

Example: Suppose a candidate has a job offer and wants to decide whether to accept it or not.
To solve this problem, the decision tree starts with the root node (the Salary attribute, chosen by
ASM). The root node splits into the next decision node (distance from the office) and
one leaf node, based on the corresponding labels. That decision node in turn splits into one
decision node (Cab facility) and one leaf node. Finally, the last decision node splits into two leaf nodes
(Accepted offer and Declined offer). Consider the diagram below:
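
The job-offer tree described above can be written as nested conditions, one per decision node. This is only a sketch: the text does not say which answer leads to which leaf, so the branch outcomes below are assumptions, and the function name decide_offer and its boolean parameters are hypothetical.

```python
def decide_offer(salary_acceptable: bool, office_nearby: bool, cab_facility: bool) -> str:
    """Walk the job-offer tree from the root to a leaf.

    Branch outcomes are assumed for illustration; the original
    diagram is not available to confirm them.
    """
    # Root node: Salary
    if not salary_acceptable:
        return "Declined"      # leaf node (assumed outcome)
    # Decision node: distance from the office
    if office_nearby:
        return "Accepted"      # leaf node (assumed outcome)
    # Decision node: Cab facility, which splits into two leaf nodes
    if cab_facility:
        return "Accepted"
    return "Declined"


print(decide_offer(salary_acceptable=True, office_nearby=False, cab_facility=True))
```

Each `if` corresponds to one decision node and each `return` to one leaf node, which is why a trained tree is easy to read back as a set of rules.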

Attribute Selection Measures

While implementing a decision tree, the main issue is how to select the best attribute for the
root node and for the sub-nodes. This is solved with a technique called an Attribute
Selection Measure (ASM), which ranks the candidate attributes so that the best one can be chosen for each
node of the tree. There are two popular techniques for ASM:
1. Information Gain:
Information gain is the measurement of the change in entropy after a dataset is segmented based
on an attribute. It calculates how much information a feature provides about a class.


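The node is split and the decision tree is built according to the value of information gain.
A decision tree algorithm tries to maximize information gain, so the attribute
with the highest information gain is split first. It can be calculated using the formula below:
$$\mathrm{Information\ Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$
Here A is the attribute being tested and S_v is the subset of S in which A takes the value v, so the sum
is the weighted average of the entropy of the subsets produced by the split.
Entropy: Entropy is a metric that measures the impurity of a set of samples. It quantifies the randomness
in the data. For a two-class (yes/no) problem, entropy can be calculated as:
$$\mathrm{Entropy}(S) = -P(\mathrm{yes})\log_2 P(\mathrm{yes}) - P(\mathrm{no})\log_2 P(\mathrm{no})$$
Where,
S = the set of samples at the node
P(yes) = probability of yes
P(no) = probability of no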
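
As a worked check of the two formulas, the sketch below computes the entropy of a node and the information gain of one candidate split. The class counts (9 yes and 5 no, split into groups of 6/2 and 3/3) are hypothetical numbers chosen for illustration, and the function names are not from any library.

```python
from math import log2


def entropy(p_yes: float) -> float:
    """Binary entropy in bits; a term with probability 0 contributes 0."""
    return -sum(p * log2(p) for p in (p_yes, 1.0 - p_yes) if p > 0)


def information_gain(parent, children):
    """parent and each child are (yes_count, no_count) pairs."""
    total = sum(parent)
    # Weighted average entropy of the subsets after the split.
    weighted = sum((y + n) / total * entropy(y / (y + n)) for y, n in children)
    return entropy(parent[0] / total) - weighted


parent = (9, 5)               # hypothetical node: 9 yes, 5 no
children = [(6, 2), (3, 3)]   # hypothetical split into two subsets

h = entropy(parent[0] / sum(parent))
gain = information_gain(parent, children)
print(f"Entropy(S) = {h:.3f}")         # 0.940
print(f"Information Gain = {gain:.3f}")  # 0.048
```

A gain this small means the split barely reduces impurity, so an attribute with a larger gain would be preferred at this node.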

2. Gini Index:
The Gini index is a measure of impurity (or purity) used while creating a decision tree in the
CART (Classification and Regression Tree) algorithm.
An attribute with a low Gini index should be preferred over one with a high Gini index.
The CART algorithm creates only binary splits and uses the Gini index to choose them.
The Gini index can be calculated using the formula below, where P_j is the proportion of samples
belonging to class j:
$$\mathrm{Gini\ Index} = 1 - \sum_{j} P_j^{2}$$
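
The Gini index formula can be evaluated directly from class counts. The sketch below is illustrative only: the counts are hypothetical, and gini and weighted_gini are names invented for this example. The weighted version scores a binary split in the way CART compares candidate splits.

```python
def gini(counts):
    """Gini index of a node given the number of samples in each class."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_gini(left, right):
    """Gini index of a binary split: size-weighted average of both sides."""
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    return n_left / n * gini(left) + n_right / n * gini(right)


# Hypothetical class counts, invented for illustration.
print(gini([5, 5]))                    # evenly mixed node: 0.5
print(gini([10, 0]))                   # pure node: 0.0
print(weighted_gini([4, 1], [1, 4]))   # candidate binary split, about 0.32
```

The split lowers the Gini index from 0.5 at the parent to about 0.32, so it makes the child nodes purer than the parent.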

Pruning: Getting an Optimal Decision Tree
Pruning is the process of deleting unnecessary nodes from a tree in order to get the optimal decision
tree. A tree that is too large increases the risk of overfitting, while a tree that is too small may not
capture all the important features of the dataset. Pruning is therefore a technique that reduces the size
of the learned tree without significantly reducing its accuracy. Two main types of tree pruning are used:
- Cost Complexity Pruning
- Reduced Error Pruning
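
Cost complexity pruning can be tried out with scikit-learn, which exposes it through the ccp_alpha parameter of DecisionTreeClassifier. The sketch below is one possible setup and not the prescribed lab solution: it assumes scikit-learn is installed, uses the library's built-in Iris data instead of the IRIS.csv file, and the 70/30 split and random_state values are arbitrary choices. Reduced error pruning is not shown.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Built-in Iris data stands in for IRIS.csv (assumption for this sketch).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Grow an unpruned tree and compute the candidate alpha values.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

results = []
for alpha in path.ccp_alphas:
    alpha = max(float(alpha), 0.0)  # guard against tiny negative rounding errors
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    pruned.fit(X_train, y_train)
    results.append((alpha, pruned.tree_.node_count, pruned.score(X_test, y_test)))

for alpha, nodes, acc in results:
    print(f"alpha={alpha:.4f}  nodes={nodes:3d}  test accuracy={acc:.3f}")
```

Larger values of alpha remove more of the tree. A reasonable way to choose alpha is to pick the value that gives the best accuracy on held-out data, ideally with cross-validation, not the single test split used here.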

Lab Assignments to complete in this session:

Use the given dataset and perform the following tasks:

Dataset 1: IRIS.csv
Dataset 2: car prediction.csv

1. Use Python libraries to build a decision tree classifier on Dataset 1. Analyze the results using a
confusion matrix and accuracy. Plot the decision tree.
2. Write code to show overfitting in the decision tree classifier built using Dataset 1. Use sklearn and
matplotlib.
3. Implement a decision tree regressor on Dataset 2.

Write-Up
1. Write the pseudocode for overfitting analysis in a Decision Tree classifier.
