BDA Literature Survey

Literature Survey on Decision Tree Classifiers

Big Data Analytics

LG-2 (Thrushala, Bhavana, Vaishnavi)

Machine learning has become a crucial tool in data-driven
decision-making, and decision trees are among its most widely used
algorithms for classification tasks. A decision tree is a model that
breaks down a complex decision-making process into simpler, sequential
decisions. Each internal node of the tree represents a test on an
attribute, while the branches represent the possible outcomes of that
test. The leaves of the tree contain the final decision or
classification.
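
The structure described above can be sketched in a few lines of
Python. This is a minimal illustration, not taken from any of the
surveyed papers: the `Node` class, the `classify` function, and the
toy weather attributes are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """One node of a decision tree (hypothetical structure for illustration)."""
    attribute: Optional[str] = None  # attribute tested at an internal node
    children: Dict[str, "Node"] = field(default_factory=dict)  # one branch per outcome
    label: Optional[str] = None  # class label, set only on leaves


def classify(node: Node, record: Dict[str, str]) -> str:
    """Follow the branch matching the record's value until a leaf is reached."""
    while node.label is None:
        node = node.children[record[node.attribute]]
    return node.label


# A tiny hand-built tree: test "outlook" first, then "windy" on one branch.
tree = Node(
    attribute="outlook",
    children={
        "sunny": Node(label="play"),
        "rainy": Node(
            attribute="windy",
            children={"yes": Node(label="stay in"), "no": Node(label="play")},
        ),
    },
)

print(classify(tree, {"outlook": "rainy", "windy": "yes"}))
```

Each call walks one path of sequential decisions from the root to a
leaf, which is why a single prediction can be read off as a chain of
attribute tests.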

Decision Tree Classifiers

Decision trees are favored for their simplicity, interpretability, and
ability to handle both categorical and continuous data. These
classifiers work by recursively partitioning the data based on feature
values that maximize the separation between classes. Despite their
simplicity, decision trees are powerful, capable of capturing complex
patterns in data without needing extensive data preprocessing.
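
As an illustration of one partitioning step, the sketch below scores
candidate attributes by information gain, the entropy-based criterion
used by ID3, and picks the attribute that separates the classes best.
The function names and the toy records are assumptions made for this
example; a full classifier would repeat the choice recursively on each
resulting partition.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by partitioning on one categorical attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Pick the attribute whose split gives the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Toy data (hypothetical): "shape" separates the classes, "color" does not.
rows = [
    {"shape": "round", "color": "red"},
    {"shape": "round", "color": "blue"},
    {"shape": "square", "color": "red"},
    {"shape": "square", "color": "blue"},
]
labels = ["A", "A", "B", "B"]

print(best_attribute(rows, labels, ["shape", "color"]))
```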

However, decision trees do have their challenges:

• Overfitting: Trees can become excessively complex and may fit noise
in the training data.
• Bias towards attributes with many values: Some splitting criteria,
such as information gain, favor features with a large number of
possible values, leading to suboptimal splits.
• Handling continuous data: Some algorithms, such as ID3, require
continuous variables to be discretized first, which can lead to loss
of information.
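
The bias towards many-valued attributes can be made concrete with a
small sketch. It compares plain information gain with the gain ratio
that C4.5 uses as a correction, on hypothetical data in which a
record identifier takes a different value for every row. All names
and values below are assumptions for illustration only.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def information_gain(column, labels):
    """Entropy reduction from splitting the labels by the values in one column."""
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(column, labels):
    """Information gain divided by the entropy of the attribute's own values."""
    split_info = entropy(column)
    return information_gain(column, labels) / split_info if split_info > 0 else 0.0


labels = ["A", "A", "A", "B", "B", "B", "B", "B"]
record_id = list(range(8))                   # unique per row, useless for prediction
shape = ["round"] * 4 + ["square"] * 4       # a genuinely informative attribute

for name, column in [("record_id", record_id), ("shape", shape)]:
    print(name, round(information_gain(column, labels), 3), round(gain_ratio(column, labels), 3))
```

On this data, information gain ranks the identifier above the
informative attribute, while gain ratio reverses the order because it
penalizes splits that scatter the records over many small branches.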

In this project, we focus on applying decision tree classifiers to the
StarLightCurvesP dataset, which contains data related to light curves
of stars. The task is to classify these stars based on their light
curves, which can reveal information about their type (e.g., variable
stars or supernovae). The dataset is ideal for testing various
decision tree algorithms due to its complex structure and the presence
of continuous and categorical data.

The main objectives of this project are to:

1. Apply decision tree classifiers (ID3, C4.5, and CART) to the
StarLightCurvesP dataset and evaluate their performance.
2. Explore the challenges of using decision tree classifiers, such as
overfitting and handling continuous data.
3. Compare the performance of these algorithms based on metrics
like accuracy, precision, recall, and F1 score.
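
The four metrics named in the objectives can be computed from
predicted and true labels as sketched below. This is a minimal
one-vs-rest version written for illustration; the function name
`binary_metrics` and the class labels "variable" and "other" are
placeholders, not the actual classes of the dataset.

```python
def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F1, treating one class as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical predictions for six stars.
y_true = ["variable", "variable", "variable", "other", "other", "other"]
y_pred = ["variable", "variable", "other", "variable", "other", "other"]

print(binary_metrics(y_true, y_pred, positive="variable"))
```

For a multi-class problem, one option is to call the function once per
class and average the results.
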

This literature review is based on our research into several key
papers on decision tree classifiers.

1. A Survey on Decision Tree Algorithm for Classification

Authors:

• Mr. Brijain R Patel
• Mr. Kaushik K Rana

Year of Publication:
2014

Classification Used:
The paper focuses on Decision Tree Algorithms as a method for
classification in data mining.

Methods Used:
The paper provides an overview of various Decision Tree algorithms:

• ID3 (Iterative Dichotomiser 3)
• C4.5
• C5.0
• CART (Classification and Regression Tree)
• CHAID (Chi-squared Automatic Interaction Detector)
• Hunt’s Algorithm

These algorithms are compared based on their features, advantages,
and challenges.
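
For reference, one feature on which such algorithms differ is the
split criterion: CART typically uses Gini impurity with binary splits,
whereas ID3 uses information gain. The sketch below is not taken from
the paper. It shows, under that assumption, how a CART-style binary
threshold could be chosen for a single continuous feature. The names
`gini` and `best_threshold` and the brightness values are
hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted Gini) for the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))  # start from the impurity of the unsplit data
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint candidate
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical continuous feature with two classes.
brightness = [0.2, 0.4, 0.5, 1.1, 1.3, 1.6]
labels = ["A", "A", "A", "B", "B", "B"]

threshold, score = best_threshold(brightness, labels)
print(threshold, score)
```

Because the threshold is searched directly on the sorted values, the
feature does not have to be discretized in advance.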

Results:

• The algorithms are compared on speed, pruning methods, support for
missing values, and the data types they can handle.
• C5.0 is highlighted as better in terms of speed and memory
efficiency compared to C4.5 and ID3.
• Applications of decision trees in various fields like business,
e-commerce, medicine, and image processing are discussed.

Datasets Used:
Examples of widely used datasets mentioned in the paper include:

• Balance Scale (psychological experiments)
• Breast Cancer (health resource)
• Heart Disease (health resource)
• Bank Marketing (finance-related)
• Image Segmentation (image-related)

Summary:
The paper surveys Decision Tree algorithms, exploring their
characteristics, challenges, advantages, and disadvantages. It
discusses the significance of decision trees in classification tasks
within data mining and highlights their applications across various
domains. Additionally, it provides insights into improvements in
algorithms like C5.0 and common datasets for research in decision
tree classification.

2. Building Decision Tree Classifier on Private Data


Authors:

• Wenliang Du
• Zhijun Zhan

Year of Publication:
2002

Classification Used:
Focus on privacy-preserving decision tree classification using
scalar product protocols.

Methods Used:
The paper introduces a privacy-preserving technique that uses
secure multi-party computation methods. It focuses on vertically
partitioned data where the data is held by two parties, and neither
party is willing to disclose their private data.
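
To see why a scalar product is a useful building block here, assume
each party encodes which records satisfy its own conditions as a 0/1
vector. The dot product of the two vectors is then the number of
records satisfying both conditions, which is the kind of count an
entropy calculation needs. The sketch below computes this in the
clear, purely to show the arithmetic. It is not the secure protocol
from the paper, which would obtain the same value without either
party revealing its vector. The party names, attributes, and data
are hypothetical.

```python
import math


def scalar_product(u, v):
    """Plain dot product; a secure protocol would compute this privately."""
    return sum(a * b for a, b in zip(u, v))


# Party A knows which records have high income (hypothetical attribute).
alice_high_income = [1, 0, 1, 1, 0, 1]

# Party B knows the class label of the same six records.
bob_class_yes = [1, 1, 0, 1, 0, 1]
bob_class_no = [1 - y for y in bob_class_yes]

# Class counts inside the "high income" partition, from two scalar products.
n_yes = scalar_product(alice_high_income, bob_class_yes)
n_no = scalar_product(alice_high_income, bob_class_no)

# Entropy of that partition, as used when scoring a candidate split.
total = n_yes + n_no
entropy = -sum((n / total) * math.log2(n / total) for n in (n_yes, n_no) if n)

print(n_yes, n_no, round(entropy, 4))
```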

Results:

• The proposed method allows secure classification without sharing
private datasets.
• It successfully enables classification tasks without compromising
the privacy of data.

Applications Used:

• Finance (e.g., credit scoring, fraud detection)
• Healthcare (e.g., disease prediction, patient classification)

Summary:
The paper presents a method for building decision tree classifiers
based on privacy-preserving techniques for data that is partitioned
across two parties. The method leverages secure multi-party
computation techniques to ensure the privacy of each party's data
while still allowing accurate decision tree classification.

3. Analysis of Various Decision Tree Algorithms for Classification in
Data Mining

Authors:

• Bhumika Gupta, PhD
• Aditya Rawat
• Akshay Jain
• Arpit Arora
• Naresh Dhami

Year of Publication:
2017

Classification Used:
The paper presents a comparative analysis of different Decision
Tree algorithms: ID3, C4.5, and CART.

Methods Used:
The paper evaluates the performance of ID3, C4.5, and CART
algorithms with respect to their accuracy, handling missing values,
and speed in classification tasks.

Results:

• The paper gives a detailed comparison of the three algorithms based
on accuracy, speed, and their ability to handle missing values.
• It discusses the strengths and weaknesses of each algorithm in
real-world applications.

Applications Used:

• Finance (e.g., loan prediction, fraud detection)
• Healthcare (e.g., patient diagnostics, medical outcomes)

Summary:
This paper compares the performance of ID3, C4.5, and CART,
highlighting their application in the finance and healthcare sectors.
The paper outlines the challenges these algorithms face with large
datasets and missing values, and assesses their overall effectiveness
in real-world classification tasks.

4. Granular Matrix Decision Tree (GMDT)


Authors:

• Chia-Hsiu Chen
• Tsu-Hsiang Chang
• Kuan-Chuan Peng
• Chin-Chen Chang

Year of Publication:
2020

Classification Used:
Proposes a new decision tree algorithm using Granular Computing
techniques.

Methods Used:
This paper introduces the Granular Matrix Decision Tree
(GMDT), which uses granular computing techniques to enhance
decision tree classification. The approach improves classification
accuracy by dividing data into granular matrices and optimizing
decision splits.

Results:

• GMDT outperforms traditional decision tree algorithms like ID3 and
C4.5 on multiple datasets from the UCI repository.
• It shows improvements in accuracy and processing speed compared to
existing decision tree models.

Datasets Used:

• UCI datasets for testing and comparison.

Summary:
The paper proposes a new decision tree algorithm called Granular
Matrix Decision Tree (GMDT), which leverages granular
computing to improve classification results. It demonstrates that
GMDT provides better accuracy and efficiency than traditional
methods like ID3 and C4.5 on large-scale datasets.

5. Classification Based on Decision Tree Algorithm for Machine
Learning

Authors:

• Akhil Mehta
• Sujata S. K.
• Srinivasan V.

Year of Publication:
2017

Classification Used:
The paper focuses on the application of Decision Tree algorithms
for Machine Learning tasks, particularly for text and image
classification.

Methods Used:
The paper uses decision trees for text and image classification
tasks and evaluates the performance on several standard datasets.

Results:

• The decision tree classifier achieved 99.91% accuracy on the
CICIDS2017 dataset.
• The paper demonstrates the effectiveness of decision trees in
real-time machine learning applications, particularly for text
classification and image recognition.

Applications Used:

• Text Classification (e.g., sentiment analysis, document
categorization)
• Image Classification (e.g., object detection, pattern recognition)

Summary:
This paper discusses the high efficacy of decision tree algorithms
in machine learning applications. It presents a case study on the
CICIDS2017 dataset, where the decision tree classifier achieved an
impressive accuracy rate of 99.91%, demonstrating its usefulness for
real-world tasks in text and image classification.
