Assignment 3
AIM: Implementation and study of the Decision Tree algorithm.
THEORY:
The purpose of an information system is to extract useful information from raw data. Data
science is a field of study that aims to understand and analyze data by means of statistics, big
data, machine learning and to provide support for decision makers and autonomous systems.
While this sounds complicated, the tools are based on mathematical models and specialized
software components that are already available (e.g. Python packages). In the following labs we
will learn about... learning. Machine Learning, to be more specific, and its two main
classes: Supervised Learning and Unsupervised Learning. The general idea is to write
software programs that can learn from the available data, identify patterns and make decisions
with minimal human intervention, based on Machine Learning algorithms.
Machine Learning: Supervised Learning
Supervised learning is the Machine Learning task of learning a function (f) that maps an input
(X) to an output (y) based on example input-output pairs. The goal is to find (approximate) the
mapping function so that new data can be predicted. The function can be continuous in the case
of regression, or discrete in the case of classification, requiring different algorithms. We will
now discuss classification methods, where the input/output variables are attributes and not
limited to numbers.
Regression vs classification
The main difference between them is that the output variable in regression is numerical (or
continuous, such as “dollars” or “weight”) while that for classification is categorical (or discrete,
such as “red”, “blue”, “small”, “large”). For example, when provided with a dataset about houses
(e.g. Boston), and you are asked to predict their prices, that is a regression task because price will
be a continuous output (see Lab 7). Examples of the common regression algorithms include
linear regression, Support Vector Regression (SVR), and regression trees.
For example, when provided with a dataset about houses, a classification algorithm can try to
predict whether the prices for the houses “sell more or less than the recommended retail price”.
Examples of the common classification algorithms include logistic regression, Naïve
Bayes, decision trees, and K Nearest Neighbors.
A decision tree is a classification and prediction tool having a tree like structure, where each
internal node denotes a test on an attribute, each branch represents an outcome of the test, and
each leaf node (terminal node) holds a class label.
You can actually construct a flowchart that can be used to understand the decisions from the
historical data and predict decisions for the next sample of data.
Here is an example that literally compares apples and oranges based on the size and texture of
the fruit using a Decision Tree. The algorithm has to learn from the available, labelled
examples and then predict other fruits, classifying them as either apples or oranges.
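A minimal sketch of such a fruit classifier with scikit-learn; the weights and the texture encoding below are made-up illustrative values, not data from the lab:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: [weight in grams, texture (0 = smooth, 1 = bumpy)]
X = [[140, 0], [130, 0], [150, 1], [170, 1]]
y = ["apple", "apple", "orange", "orange"]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# A heavier, bumpy fruit falls on the "orange" side of the learned split
print(clf.predict([[160, 1]])[0])
```

Because the toy data is perfectly separable, the fitted tree is a single split, which can be inspected with `sklearn.tree.plot_tree` or `export_text`.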
- ID3: Ross Quinlan is credited with the development of ID3, which is shorthand for “Iterative
Dichotomiser 3.” This algorithm leverages entropy and information gain as metrics to evaluate
candidate splits, and was introduced in Quinlan’s 1986 research.
- C4.5: This algorithm is considered a later iteration of ID3, which was also developed by
Quinlan. It can use information gain or gain ratios to evaluate split points within the decision
trees.
- CART: The term, CART, is an abbreviation for “classification and regression trees” and was
introduced by Leo Breiman. This algorithm typically utilizes Gini impurity to identify the ideal
attribute to split on. Gini impurity measures how often a randomly chosen record would be
misclassified if it were labeled according to the class distribution. When evaluating with Gini
impurity, a lower value is better.
While there are multiple ways to select the best attribute at each node, two methods, information
gain and Gini impurity, are popular splitting criteria for decision tree models. They help to
evaluate the quality of each test condition and how well it will be able to classify samples into a
class.
It’s difficult to explain information gain without first discussing entropy. Entropy is a concept
that stems from information theory and measures the impurity of the sample values. It is
defined by the following formula:

Entropy(S) = - Σ p(c) log2 p(c)

where the sum runs over the classes c and p(c) is the proportion of samples in S that belong to
class c.
For this dataset, the entropy is 0.94. This can be calculated by finding the proportion of days
where “Play Tennis” is “Yes”, which is 9/14, and the proportion of days where “Play Tennis” is
“No”, which is 5/14. Then, these values can be plugged into the entropy formula above.
Entropy(Tennis) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.94
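The same calculation can be checked with a small generic helper (illustrative code, not part of the lab's solution):

```python
import math

def entropy(class_counts):
    """Entropy of a set given the number of samples in each class."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts if c > 0)

# 9 "Yes" days and 5 "No" days in the Play Tennis dataset
print(round(entropy([9, 5]), 2))  # 0.94
```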
We can then compute the information gain for each of the attributes individually. For example,
the information gain for the attribute “Humidity” would be the following:
Gain(Tennis, Humidity) = 0.94 - (7/14)*(0.985) - (7/14)*(0.592) = 0.151
As a recap,
- 7/14 represents the proportion of values where humidity equals “high” to the total number of
humidity values. In this case, the number of values where humidity equals “high” is the same as
the number of values where humidity equals “normal”.
- 0.985 is the entropy when Humidity = “high”
- 0.592 is the entropy when Humidity = “normal”
Then, repeat the calculation for information gain for each attribute in the table above, and select
the attribute with the highest information gain to be the first split point in the decision tree. In
this case, outlook produces the highest information gain. From there, the process is repeated for
each subtree.
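The weighted-average step of this calculation can be written as a small helper; the branch sizes and entropies below are the ones from the Humidity example above:

```python
def information_gain(parent_entropy, branches):
    """branches: list of (sample_count, entropy) pairs, one per branch."""
    total = sum(count for count, _ in branches)
    weighted = sum((count / total) * e for count, e in branches)
    return parent_entropy - weighted

# Humidity splits the 14 days into 7 "high" (entropy 0.985)
# and 7 "normal" (entropy 0.592) days
gain = information_gain(0.94, [(7, 0.985), (7, 0.592)])
print(round(gain, 3))  # ≈ 0.151
```

Repeating this for every attribute and picking the largest gain gives the root split (Outlook, in this example).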
Gini Impurity
Gini impurity is the probability of incorrectly classifying a random data point in the dataset if it
were labeled based on the class distribution of the dataset. Similar to entropy, if a set S is
pure (i.e. all of its samples belong to one class), then its impurity is zero. This is denoted by the
following formula:

Gini(S) = 1 - Σ p(c)^2

where p(c) is the proportion of samples in S that belong to class c.
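The formula translates directly into code (a generic helper for illustration):

```python
def gini(class_counts):
    """Gini impurity of a set given the number of samples in each class."""
    total = sum(class_counts)
    return 1 - sum((c / total) ** 2 for c in class_counts)

print(gini([10, 0]))              # 0.0 for a pure node
print(round(gini([9, 5]), 3))     # impurity of the Play Tennis labels
```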
The accuracy and classification report for the wine quality dataset are:
Accuracy: 0.94
Classification Report:
precision recall f1-score support
After applying pre-pruning, the tree becomes noticeably smaller and simpler.
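A sketch of this pre-pruning workflow with scikit-learn. Note that `load_wine` (the wine recognition dataset bundled with scikit-learn) is used here as a stand-in for the wine quality data, and the depth/sample limits are example values, so the exact scores will differ from those reported above:

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Pre-pruning: cap the depth and require a minimum number of samples per split
clf = DecisionTreeClassifier(max_depth=3, min_samples_split=10, random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```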
Advantages and disadvantages of Decision Trees
While decision trees can be used in a variety of use cases, other algorithms typically outperform
decision tree algorithms. That said, decision trees are particularly useful for data mining and
knowledge discovery tasks. Let’s explore the key benefits and challenges of utilizing decision
trees more below:
Advantages
- Easy to interpret: The Boolean logic and visual representations of decision trees make them
easier to understand and consume. The hierarchical nature of a decision tree also makes it easy to
see which attributes are most important, which isn’t always clear with other algorithms,
like neural networks.
- More flexible: Decision trees can be leveraged for both classification and regression tasks,
making them more flexible than some other algorithms. They are also insensitive to underlying
relationships between attributes; this means that if two variables are highly correlated, the
algorithm will only choose one of them to split on.
Disadvantages
- Prone to overfitting: Complex decision trees tend to overfit and do not generalize well to new
data. This scenario can be avoided through the processes of pre-pruning or post-pruning.
Pre-pruning halts tree growth when there is insufficient data while post-pruning removes
subtrees with inadequate data after tree construction.
- High variance estimators: Small variations within data can produce a very different decision
tree. Bagging, or the averaging of estimates, can be a method of reducing variance of decision
trees. However, this approach is limited as it can lead to highly correlated predictors.
- More costly: Given that decision trees take a greedy search approach during construction, they
can be more expensive to train compared to other algorithms.
- Not fully supported in scikit-learn: Scikit-learn is a popular machine learning library based
in Python. While this library does have a Decision Tree module (DecisionTreeClassifier), the
current implementation does not support categorical variables.
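The post-pruning mentioned above is available in scikit-learn as minimal cost complexity pruning via the `ccp_alpha` parameter. A sketch, again using the bundled wine recognition dataset as illustrative data:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Compute the pruning path: candidate alpha values for a fully grown tree
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train)

# Refit a pruned tree for each alpha and keep the best-scoring one
best = None
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=42,
                                  ccp_alpha=max(alpha, 0.0)).fit(X_train, y_train)
    if best is None or tree.score(X_test, y_test) > best.score(X_test, y_test):
        best = tree

print("Best test accuracy:", best.score(X_test, y_test))
```

In practice the optimal alpha would be chosen with cross-validation rather than on the test split; the loop above only illustrates the mechanism.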
REFERENCES:
1. https://www.ibm.com/in-en/topics/decision-trees
2. https://medium.com/@abhishekjainindore24/all-about-decision-trees-80ea55e37fef
CONCLUSION:
● Pre-Pruning stops tree growth early based on conditions like depth or minimum samples
per split, reducing complexity and computational cost but risking underfitting.
● Post-Pruning (Cost Complexity Pruning) grows a full tree first and then removes less
significant branches using an optimal α value, making it more effective at refining the
model.
● The best pruning strategy depends on the dataset, but a combination of both methods
often yields the most balanced results.
By implementing pruning techniques, decision trees become more interpretable, efficient, and
accurate, making them more suitable for real-world applications.