
BDA Literature Survey

Uploaded by bhavanavovaldasu157

Literature Survey on Decision Tree Classifiers

Big Data Analytics

LG-2 (Thrushala, Bhavana, Vaishnavi)

Machine learning has become a crucial tool in data-driven decision-making,
and decision trees are among the most widely used of its algorithms for
classification tasks. A decision tree is a model that breaks down a complex
decision-making process into simpler, sequential decisions. Each internal
node of the tree represents a test on an attribute, while the branches
represent the possible outcomes of that test. The leaves of the tree
contain the final decision or classification.
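To make the node/branch/leaf structure concrete, here is a minimal Python sketch of a hand-built tree and the root-to-leaf walk used to classify a record. The `Leaf` and `Node` classes and the `classify` function are hypothetical names chosen for illustration, and the tree follows the classic "play tennis" textbook example rather than anything learned from this project's data.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """A leaf holds the final class label."""
    label: str


@dataclass
class Node:
    """An internal node tests one attribute; each branch maps an outcome to a subtree."""
    attribute: str
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def classify(tree: Union[Node, Leaf], record: Dict[str, str]) -> str:
    """Follow the branch matching the record's value at each node until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[record[tree.attribute]]
    return tree.label


# Hypothetical hand-built tree in the style of the "play tennis" example.
tree = Node("outlook", {
    "sunny": Node("humidity", {"high": Leaf("no"), "normal": Leaf("yes")}),
    "overcast": Leaf("yes"),
    "rain": Node("wind", {"strong": Leaf("no"), "weak": Leaf("yes")}),
})

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # yes
```

Only the attributes tested along the path need to be present in the record. A practical implementation would also need a fallback (for example, the majority class) for attribute values not seen during training, since this sketch raises `KeyError` in that case.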

Decision Tree Classifiers

Decision trees are favored for their simplicity, interpretability, and
ability to handle both categorical and continuous data. These
classifiers work by recursively partitioning the data: at each node they
choose the feature (and, for a continuous feature, the threshold) whose
split best separates the classes, as measured by a criterion such as
information gain or Gini impurity. Despite their simplicity, decision
trees can capture complex, non-linear patterns in data without
extensive preprocessing such as feature scaling.
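The recursive partitioning described above can be sketched in a few lines of Python. This is a minimal ID3-style learner for categorical attributes only, assuming each record is a dict; it uses information gain as the split criterion and returns nested `(attribute, {value: subtree})` tuples with class labels at the leaves. The function names are hypothetical, and the `max_depth` parameter is an added assumption (not part of the original ID3) that acts as simple pre-pruning against overfitting.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on a categorical attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def build_tree(rows, labels, attrs, max_depth=5):
    """ID3-style recursive partitioning; returns a label (leaf) or (attr, {value: subtree})."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure, no attributes remain, or the depth limit is reached.
    if len(set(labels)) == 1 or not attrs or max_depth == 0:
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    branches = {}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [a for a in attrs if a != best],  # each attribute is used at most once per path
            max_depth - 1,
        )
    return (best, branches)


# Hypothetical toy data.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
]
labels = ["no", "no", "yes", "no", "yes"]
tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree)
```

On this toy data `outlook` has the higher information gain, so it becomes the root, and only the mixed `rain` branch is split further on `windy`.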

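However, decision trees do have their challenges:

• Overfitting: Unrestricted trees can grow excessively deep and fit noise
in the training data; pruning, or limits on depth and leaf size, are the
usual remedies.
• Bias towards attributes with many values: Information gain (used by
ID3) favors attributes with many distinct values, which can lead to
suboptimal splits; C4.5's gain ratio was introduced to reduce this bias.
• Handling continuous data: ID3 requires continuous variables to be
discretized beforehand, which can lose information; C4.5 and CART
instead search for binary threshold splits on continuous attributes.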
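The bias towards many-valued attributes is easy to demonstrate. In this minimal sketch on hypothetical toy data, a unique record ID attains the maximum possible information gain because every partition it creates is pure, yet it is useless for classifying new records. C4.5's gain ratio divides the gain by the split information (the entropy of the attribute's own value distribution) and ranks the genuinely predictive attribute higher. C4.5 additionally restricts the choice to attributes with at least average information gain; that refinement is omitted here, and the helper names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def info_gain(column, labels):
    """Information gain from splitting `labels` by the values in `column`."""
    total = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [lab for v, lab in zip(column, labels) if v == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def gain_ratio(column, labels):
    """C4.5-style gain ratio: information gain divided by split information."""
    split_info = entropy(column)  # large when the attribute has many distinct values
    return info_gain(column, labels) / split_info if split_info > 0 else 0.0


labels = ["yes", "yes", "no", "no", "yes", "no"]
record_id = ["r1", "r2", "r3", "r4", "r5", "r6"]        # unique per record
colour = ["red", "red", "blue", "blue", "red", "red"]  # hypothetical predictive attribute

for name, column in [("record_id", record_id), ("colour", colour)]:
    print(name, round(info_gain(column, labels), 3), round(gain_ratio(column, labels), 3))
```

Information gain prefers `record_id` (1.0 versus roughly 0.46), while gain ratio prefers `colour` (roughly 0.5 versus 0.39).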

In this project, we focus on applying decision tree classifiers to the
StarLightCurvesP dataset, which contains light curves of stars. The task
is to classify stars based on their light curves, which can reveal
information about their type (e.g., variable stars or supernovae). The
dataset is well suited to testing various decision tree algorithms
because of its complex structure and the presence of continuous and
categorical data.

The main objectives of this project are to:

1. Apply decision tree classifiers (ID3, C4.5, and CART) to the
StarLightCurvesP dataset and evaluate their performance.
2. Explore the challenges of using decision tree classifiers, such as
overfitting and handling continuous data.
3. Compare the performance of these algorithms using metrics such as
accuracy, precision, recall, and F1 score.
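For reference, the four evaluation metrics named above can be computed directly from true and predicted labels. This sketch assumes macro averaging (an unweighted mean of per-class precision, recall, and F1), a common choice for multi-class tasks; the objectives do not specify an averaging scheme, and `classification_metrics` is a hypothetical helper name. Libraries such as scikit-learn provide equivalent functions (for example `sklearn.metrics.precision_recall_fscore_support` with `average="macro"`).

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for c in classes:
        # One-vs-rest counts for class c.
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical predictions for a three-class problem.
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]
print(classification_metrics(y_true, y_pred))
```

Macro averaging weights every class equally, so a rare class that the classifier handles poorly pulls the score down even when overall accuracy is high.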

This literature review is based on our research into several key papers
on decision tree classifiers.

1. A Survey on Decision Tree Algorithm for Classification

Authors:
• Mr. Brijain R Patel
• Mr. Kushik K Rana

Year of Publication:
2014

Classification Used:
The paper focuses on Decision Tree Algorithms as a method for
classification in data mining.
Methods Used:
The paper provides an overview of various decision tree algorithms:
• ID3 (Iterative Dichotomiser 3)
• C4.5
• C5.0
• CART (Classification and Regression Tree)
• CHAID (Chi-squared Automatic Interaction Detector)
• Hunt's Algorithm

These algorithms are compared based on their features, advantages,
and challenges.
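The surveyed algorithms differ mainly in their split criterion: ID3 uses information gain, C4.5 uses gain ratio, and CART uses Gini impurity with strictly binary splits. As a minimal sketch of the CART-style approach, the function below scores every midpoint between consecutive distinct values of one continuous attribute and keeps the threshold with the lowest weighted Gini impurity. This is also how continuous attributes can be handled without prior discretization. The function names and toy data are hypothetical, and the quadratic-time loop favors clarity over the incremental class counting a production implementation would use.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Find the binary split `value <= t` that minimises weighted Gini impurity.

    Candidate thresholds are midpoints between consecutive distinct sorted values.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_t, best_score = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = (lo + hi) / 2, score
    return best_t, best_score


# Hypothetical continuous attribute with two classes.
values = [2, 9, 4, 15, 11, 3]
labels = ["a", "b", "a", "b", "b", "a"]
print(best_threshold(values, labels))  # (6.5, 0.0)
```

Here the split `value <= 6.5` separates the two classes perfectly, so its weighted impurity is zero.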

Results:
• Comparison of decision tree algorithms based on their speed, pruning
methods, support for missing values, and the data types they can handle.
• C5.0 is highlighted as faster and more memory-efficient than C4.5 and
ID3.
• Applications of decision trees in various fields such as business,
e-commerce, medicine, and image processing are discussed.

Datasets Used:
Examples of widely used datasets mentioned in the paper include:
• Balance Scale (psychological experiments)
• Breast Cancer (health resource)
• Heart Disease (health resource)
• Bank Marketing (finance-related)
• Image Segmentation (image-related)

Summary:
The paper surveys Decision Tree algorithms, exploring their
characteristics, challenges, advantages, and disadvantages. It
discusses the significance of decision trees in classification tasks
within data mining and highlights their applications across various
domains. Additionally, it provides insights into improvements in
algorithms like C5.0 and common datasets for research in decision
tree classification.

2. Building Decision Tree Classifier on Private Data

Authors:
• Wenliang Du
• Zhijun Zhan

Year of Publication:
2002

Classification Used:
Focuses on privacy-preserving decision tree classification using
scalar product protocols.

Methods Used:
The paper introduces a privacy-preserving technique that uses
secure multi-party computation methods. It focuses on vertically
partitioned data where the data is held by two parties, and neither
party is willing to disclose their private data.
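A scalar product protocol suffices because, with vertically partitioned data, the record counts needed to compute entropy at a tree node can be written as scalar products of the two parties' 0/1 indicator vectors. The sketch below uses hypothetical columns and helper names and shows only this plain, non-private computation. It is not secure: Du and Zhan's contribution is a protocol that produces the same value without either party revealing its vector.

```python
def indicator(column, value):
    """0/1 vector marking the records whose attribute equals `value`."""
    return [1 if v == value else 0 for v in column]


def scalar_product(u, v):
    """Plain scalar product; a secure protocol would compute this privately."""
    if len(u) != len(v):
        raise ValueError("vectors must cover the same records")
    return sum(a * b for a, b in zip(u, v))


# Hypothetical vertically partitioned data: both parties hold the same six records,
# aligned by position, but each party holds different attributes.
party_a_income = ["high", "low", "high", "low", "high", "low"]  # known only to party A
party_b_default = ["no", "yes", "no", "no", "yes", "yes"]       # known only to party B

# Records with income == "high" AND default == "no": the kind of joint count
# an entropy computation needs when a split involves both parties' attributes.
count = scalar_product(indicator(party_a_income, "high"), indicator(party_b_default, "no"))
print(count)  # 2
```

Each party can build its indicator vector locally from its own columns; only the combination step requires the privacy-preserving protocol.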

Results:
• The proposed method allows secure classification without sharing
private datasets.
• It successfully enables classification tasks without compromising the
privacy of data.

Applications Used:
• Finance (e.g., credit scoring, fraud detection)
• Healthcare (e.g., disease prediction, patient classification)

Summary:
The paper presents a method for building decision tree classifiers
based on privacy-preserving techniques for data that is partitioned
across two parties. The method leverages secure multi-party
computation techniques to ensure the privacy of each party's data
while still allowing accurate decision tree classification.

3. Analysis of Various Decision Tree Algorithms for Classification in Data Mining

Authors:
• Bhumika Gupta, PhD
• Aditya Rawat
• Akshay Jain
• Arpit Arora
• Naresh Dhami

Year of Publication:
2017

Classification Used:
The paper presents a comparative analysis of different Decision
Tree algorithms: ID3, C4.5, and CART.

Methods Used:
The paper evaluates the performance of ID3, C4.5, and CART
algorithms with respect to their accuracy, handling missing values,
and speed in classification tasks.

Results:
• Detailed comparison of the three algorithms based on accuracy, speed,
and their ability to handle missing values.
• Discusses the strengths and weaknesses of each algorithm in real-world
applications.

Applications Used:
• Finance (e.g., loan prediction, fraud detection)
• Healthcare (e.g., patient diagnostics, medical outcomes)

Summary:
This paper compares the performance of ID3, C4.5, and CART,
highlighting their application in the finance and healthcare sectors.
It outlines the challenges of applying these algorithms to large
datasets and to data with missing values, and assesses their overall
effectiveness in real-world classification tasks.

4. Granular Matrix Decision Tree (GMDT)

Authors:
• Chia-Hsiu Chen
• Tsu-Hsiang Chang
• Kuan-Chuan Peng
• Chin-Chen Chang

Year of Publication:
2020

Classification Used:
Proposes a new decision tree algorithm using Granular Computing
techniques.

Methods Used:
This paper introduces the Granular Matrix Decision Tree
(GMDT), which uses granular computing techniques to enhance
decision tree classification. The approach improves classification
accuracy by dividing data into granular matrices and optimizing
decision splits.

Results:
• GMDT outperforms traditional decision tree algorithms like ID3 and
C4.5 on multiple datasets from the UCI repository.
• Shows improvements in accuracy and processing speed compared to
existing decision tree models.

Datasets Used:
• Datasets from the UCI repository, used for testing and comparison.

Summary:
The paper proposes a new decision tree algorithm called Granular
Matrix Decision Tree (GMDT), which leverages granular
computing to improve classification results. It demonstrates that
GMDT provides better accuracy and efficiency than traditional
methods like ID3 and C4.5 on large-scale datasets.

5. Classification Based on Decision Tree Algorithm for Machine Learning

Authors:
• Akhil Mehta
• Sujata S. K.
• Srinivasan V.

Year of Publication:
2017

Classification Used:
The paper focuses on the application of Decision Tree algorithms
for Machine Learning tasks, particularly for text and image
classification.

Methods Used:
The paper uses decision trees for text and image classification
tasks and evaluates the performance on several standard datasets.

Results:
• Reports 99.91% accuracy on the CICIDS2017 dataset, a network intrusion
detection benchmark.
• Discusses the effectiveness of decision trees in real-time machine
learning applications, including text classification and image
recognition.

Applications Used:
• Text Classification (e.g., sentiment analysis, document categorization)
• Image Classification (e.g., object detection, pattern recognition)

Summary:
This paper discusses the effectiveness of decision tree algorithms in
machine learning applications, particularly text and image
classification. It also presents a case study on the CICIDS2017
intrusion detection dataset, where the decision tree classifier
achieved an accuracy of 99.91%.
