0% found this document useful (0 votes)

6 views4 pages

ML Project

This study compares the noise resilience of decision trees and linear regression in machine learning, hypothesizing that decision trees are more robust to noise in input data. Experiments using the IPARC dataset showed that decision trees maintained higher accuracy and lower error rates compared to linear regression when noise was introduced. The findings suggest that decision trees are better suited for applications involving noisy data, with potential for future research into other models and preprocessing techniques.

Uploaded by

Shiv Patel

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

6 views4 pages

ML Project

Uploaded by

Shiv Patel

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 4

Under review as submission to TMLR

Noise Resilience in Machine Learning Models: A Comparison

of Decision Trees and Linear Regression

Anonymous authors
Paper under double-blind review

Abstract

This study investigates how slight randomness (noise) in input data affects the accuracy of
machine learning models in recognizing patterns. It tests the hypothesis that decision trees,
which utilize rule-based splitting, are more resilient to noise compared to linear regression,
which assumes continuous relationships. By comparing accuracy drops in both models after
introducing noise, the research evaluates their robustness under imperfect data conditions,
providing insights into their suitability for real-world applications. Experiments were con-
ducted using a Python-based framework applied to the IPARC dataset. Models were tested
across various noise levels, and metrics such as MSE, MAE, and R2 were used to quantify
performance.

1 Introduction

Noise in data is a common challenge in machine learning applications, often stemming from measurement
errors or data preprocessing inconsistencies. Understanding how noise affects model performance is critical for
selecting the most appropriate algorithms for real-world scenarios. Decision trees (DTs) and linear regression
(LR) are two popular models with distinct mechanisms: DTs split data based on discrete thresholds, while
LR assumes continuous linear relationships between input and output variables. This study aims to quantify
the resilience of these models to noise in the context of the IPARC dataset.

2 Hypothesis

Introducing slight noise to inputs in the IPARC dataset will reduce the accuracy of ML models trained to
recognize patterns, but decision trees will demonstrate greater resilience to this noise compared to linear
regression.

3 Experimental Setup

The experiments were conducted using the Python-based IPARCExperiment framework, which implements
the following components:

3.1 1. Dataset and Preprocessing

• Dataset: The experiments focused on the "Simple" category of the IPARC dataset. Tasks involve
recognizing patterns in normalized pixel values of images.

• Preprocessing: Each input and output image was normalized to the [0, 1] range. Input images
were flattened for compatibility with regression models.

• Splitting: The dataset was split into training and testing sets using 5-fold cross-validation to ensure
robust evaluation.

1
Under review as submission to TMLR

3.2 2. Noise Levels

The study introduced Gaussian noise to the test data at the following levels: 5%, 10%, 20%, and 30%. Noise
was added as a percentage of the range of pixel values, ensuring the resulting values remained within the
normalized range.

3.3 3. Models
• Decision Tree: Implemented using DecisionTreeRegressor with a maximum depth of 10 to
prevent overfitting.
• Linear Regression: Implemented using LinearRegression, a standard least-squares regression
method.

3.4 4. Evaluation Metrics

• Mean Squared Error (MSE): Measures average squared differences between actual and predicted
values.
• Mean Absolute Error (MAE): Captures the average magnitude of errors.
• R2 : Indicates the proportion of variance in the dependent variable explained by the model.

3.5 5. Framework Implementation

The experiments were conducted using a custom Python class IPARCExperiment. The framework’s key fea-
tures include data normalization, noise generation, cross-validation, and detailed performance visualization.

4 Results and Analysis

The results confirm the hypothesis by demonstrating that noise reduces model accuracy and that decision
trees are more resilient to noise compared to linear regression.

4.1 1. Impact of Noise on Accuracy

• Decision Tree: MSE increased modestly with noise, indicating better noise tolerance.
– MSE rose from 0.1069 (5% noise) to 0.1240 (30% noise).
• Linear Regression: MSE increased significantly, highlighting sensitivity to noise.
– MSE rose from 0.1846 (5% noise) to 0.1938 (30% noise).

4.2 2. Resilience of Decision Trees

• At all noise levels, decision trees outperformed linear regression:
– At 5% noise, decision tree MSE was 42% lower than linear regression MSE.
– At 30% noise, decision tree MSE was 36% lower than linear regression MSE.
• Decision trees exhibited lower variance across folds, further supporting their robustness.

4.3 3. Significance of Metrics

The metrics highlight the strengths of decision trees in handling noisy data:

• MAE: Decision trees consistently showed lower MAE, reflecting smaller average errors.
• R2 : Decision trees maintained higher R2 values, indicating better explanatory power under noise.

2
Under review as submission to TMLR

5 Visualization

Figure 1 presents the performance comparison between decision trees and linear regression across noise levels.

results_plot.png

Figure 1: Model performance (MSE) vs noise levels with error bars representing standard deviation across
folds.

6 Insights from Related Research

Research on crisp and fuzzy decision trees provides additional insights:

• Attribute Noise: Decision trees are robust to small observation errors (o-noise) due to their
discrete splits.

• Class Label Noise: Noise in class labels significantly impacts tree complexity and accuracy, ne-
cessitating clean training data.

• Fuzzy Decision Trees: Studies show that fuzzy decision trees outperform crisp decision trees in
handling ambiguous data, suggesting a potential avenue for future work.

3
Under review as submission to TMLR

7 Conclusion and Future Work

• The results validate the hypothesis:
– Noise reduces accuracy for both models, but decision trees are significantly more resilient.
– Decision trees are suitable for noisy pattern recognition tasks like those in the IPARC dataset.
• Future Work:
– Evaluate other models, such as random forests and fuzzy decision trees, under similar noise
conditions.
– Explore the effects of higher noise levels and real-world noisy datasets.
– Investigate the impact of data augmentation and other preprocessing techniques on model re-
silience.

References
[1] J. Sun and X.-Z. Wang, An Initial Comparison on Noise Resisting Between Crisp and Fuzzy Decision
Trees, Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, pp.
2545–2550, 2005.

[2] François Chollet, On the Measure of Intelligence, CoRR abs/1911.01547, 2019. Available at https:
//arxiv.org/abs/1911.01547.

120 DS-With Answer
100% (1)
120 DS-With Answer
32 pages
3 1 Overfitting
No ratings yet
3 1 Overfitting
25 pages
Machine Learning CA 2
No ratings yet
Machine Learning CA 2
19 pages
An Introduction TO Decision Trees
No ratings yet
An Introduction TO Decision Trees
30 pages
Random Forest
No ratings yet
Random Forest
225 pages
Data Science
100% (1)
Data Science
14 pages
B.N.M. Institute of Technology: Prediction of Remaining Useful Life of Aircraft Engine
No ratings yet
B.N.M. Institute of Technology: Prediction of Remaining Useful Life of Aircraft Engine
28 pages
Assignment 6 (Sol.) : Introduction To Machine Learning Prof. B. Ravindran
No ratings yet
Assignment 6 (Sol.) : Introduction To Machine Learning Prof. B. Ravindran
10 pages
A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-Three Old and New Classification Algorithms
No ratings yet
A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-Three Old and New Classification Algorithms
27 pages
CSO504 Machine Learning: Evaluation and Error Analysis Validation and Regularization Koustav Rudra 22/08/2022
No ratings yet
CSO504 Machine Learning: Evaluation and Error Analysis Validation and Regularization Koustav Rudra 22/08/2022
28 pages
Supervised Learning Classification Algorithms Comparison
No ratings yet
Supervised Learning Classification Algorithms Comparison
6 pages
Cbse - Department of Skill Education Curriculum For Session 2021-2022
No ratings yet
Cbse - Department of Skill Education Curriculum For Session 2021-2022
12 pages
Soft Computing Lab Practical Assignment 2
No ratings yet
Soft Computing Lab Practical Assignment 2
10 pages
Comparative Study of Event Prediction in Power Grids Using Supervised Machine Learning Methods
No ratings yet
Comparative Study of Event Prediction in Power Grids Using Supervised Machine Learning Methods
6 pages
A Machine Learning Approach To Predict Price of Airlines Tickets
No ratings yet
A Machine Learning Approach To Predict Price of Airlines Tickets
7 pages
A Comparitive Study On Different Classification Algorithms Using Airline Dataset
No ratings yet
A Comparitive Study On Different Classification Algorithms Using Airline Dataset
4 pages
Experiment No 4 Vanraj
No ratings yet
Experiment No 4 Vanraj
2 pages
Comparative Study Fuzzy Decision Tree, ID3
No ratings yet
Comparative Study Fuzzy Decision Tree, ID3
62 pages
ML Priyesha - 778
No ratings yet
ML Priyesha - 778
23 pages
Project Occupancy Alfonso Vicente Aragues
No ratings yet
Project Occupancy Alfonso Vicente Aragues
18 pages
MLA NLP Lecture2
No ratings yet
MLA NLP Lecture2
76 pages
ML Unit 3
No ratings yet
ML Unit 3
83 pages
02 LecDT
No ratings yet
02 LecDT
85 pages
16-Decision Tree Classification Algorithm Advantages With Examples (Iterative Dichotomiser 3-ID3) - 22-03-2024
No ratings yet
16-Decision Tree Classification Algorithm Advantages With Examples (Iterative Dichotomiser 3-ID3) - 22-03-2024
83 pages
Machine Learning For Interviews
No ratings yet
Machine Learning For Interviews
12 pages
M01 Tree-Based Methods
No ratings yet
M01 Tree-Based Methods
38 pages
20ee38011 Exp4
No ratings yet
20ee38011 Exp4
24 pages
Experiment No-2
No ratings yet
Experiment No-2
4 pages
Forecasting: Chapter Three Operations Management William J. Stevenson
No ratings yet
Forecasting: Chapter Three Operations Management William J. Stevenson
13 pages
Forecasting Methods
100% (2)
Forecasting Methods
50 pages
4.3-DecisionTreesLearningAlgorithms Part 2
No ratings yet
4.3-DecisionTreesLearningAlgorithms Part 2
15 pages
Random Forest Regression
No ratings yet
Random Forest Regression
57 pages
Slay The Day
No ratings yet
Slay The Day
21 pages
1A. Step 1: POM-QM For Windows
No ratings yet
1A. Step 1: POM-QM For Windows
9 pages
Overview of Scatterometry Applications in High Volume Silicon Manufacturing
No ratings yet
Overview of Scatterometry Applications in High Volume Silicon Manufacturing
10 pages
Hydrologic and Water Quality Models Performance Measures and Eva 2015
No ratings yet
Hydrologic and Water Quality Models Performance Measures and Eva 2015
23 pages
Statistics For Management and Economics
100% (1)
Statistics For Management and Economics
16 pages
19 - Decision Tree - ID3
No ratings yet
19 - Decision Tree - ID3
87 pages
Decision Trees
No ratings yet
Decision Trees
38 pages
ML Assignment
No ratings yet
ML Assignment
7 pages
Guide
No ratings yet
Guide
24 pages
Operation Analytics Notes
No ratings yet
Operation Analytics Notes
5 pages
Practical Issues
No ratings yet
Practical Issues
30 pages
Napolitano Amri 200912 PHD
No ratings yet
Napolitano Amri 200912 PHD
235 pages
Lecture1 SML-I Merged
No ratings yet
Lecture1 SML-I Merged
157 pages
2023 ML Assignment
No ratings yet
2023 ML Assignment
57 pages
Analytical Scalable PDF
No ratings yet
Analytical Scalable PDF
9 pages
Probability Estimation in Random Forests
No ratings yet
Probability Estimation in Random Forests
35 pages
Book Machine Learning Finance Python
100% (1)
Book Machine Learning Finance Python
75 pages
Unit-3 Alt
No ratings yet
Unit-3 Alt
24 pages
2005 - LANDWEHR - Logistic Model Trees
No ratings yet
2005 - LANDWEHR - Logistic Model Trees
45 pages
Anova (Keller)
No ratings yet
Anova (Keller)
91 pages
ML Unit 2-2-40
No ratings yet
ML Unit 2-2-40
39 pages
Chapter8 Decision Trees Exercises v2 20230112
No ratings yet
Chapter8 Decision Trees Exercises v2 20230112
42 pages
Slides (A19 A20)
No ratings yet
Slides (A19 A20)
261 pages
ML Unit 3 New
100% (1)
ML Unit 3 New
24 pages
DeepLearning L1 Intro
No ratings yet
DeepLearning L1 Intro
92 pages
FRM Course Syllabus IPDownload
No ratings yet
FRM Course Syllabus IPDownload
3 pages
Predicting True Value of Cars Using Ml-1
No ratings yet
Predicting True Value of Cars Using Ml-1
36 pages
Decision Trees
No ratings yet
Decision Trees
8 pages
Evaluation Metrics
No ratings yet
Evaluation Metrics
13 pages
COS324 Course Notes
No ratings yet
COS324 Course Notes
256 pages
An Overview of Decision Tree Applied To Power Systems: Chengxi Liu, Zakir Hussain Rather, Zhe Chen, Claus Leth Bak
No ratings yet
An Overview of Decision Tree Applied To Power Systems: Chengxi Liu, Zakir Hussain Rather, Zhe Chen, Claus Leth Bak
7 pages
20MIS1025 - Regression - Ipynb - Colaboratory
No ratings yet
20MIS1025 - Regression - Ipynb - Colaboratory
5 pages
Decision - Tree - Regression - Ipynb - Colab
No ratings yet
Decision - Tree - Regression - Ipynb - Colab
3 pages
Inverse Design of Rectangular Microstrip Patch Antenna Using Neural Network Combining With Time-Domain Representation of S-Parameters
No ratings yet
Inverse Design of Rectangular Microstrip Patch Antenna Using Neural Network Combining With Time-Domain Representation of S-Parameters
4 pages
What Do TIPS Say About Real Interest Rates and Required Returns
No ratings yet
What Do TIPS Say About Real Interest Rates and Required Returns
25 pages
MLSP Lab Exp4
No ratings yet
MLSP Lab Exp4
9 pages
Supervised ML
No ratings yet
Supervised ML
13 pages
An Anaya
No ratings yet
An Anaya
40 pages
Unit Iii Machine Learning
No ratings yet
Unit Iii Machine Learning
19 pages
QUESTION BANK, Sample Paper, and Many More
No ratings yet
QUESTION BANK, Sample Paper, and Many More
43 pages
978 3 031 59257 7
No ratings yet
978 3 031 59257 7
628 pages
Go To Record: Download Date: 14/11/2024
No ratings yet
Go To Record: Download Date: 14/11/2024
10 pages
Prediction of Pressurant Mass Requirements For Axisymmetric Liquid Hydrogen Tanks
No ratings yet
Prediction of Pressurant Mass Requirements For Axisymmetric Liquid Hydrogen Tanks
10 pages
Class 2a-Decision Trees
No ratings yet
Class 2a-Decision Trees
28 pages
Lecture 02
No ratings yet
Lecture 02
54 pages
14 - Ensemble Methods
No ratings yet
14 - Ensemble Methods
38 pages
ML Unit3
No ratings yet
ML Unit3
24 pages
ML Asst.-01
No ratings yet
ML Asst.-01
21 pages
A Deep Learning Approach For Optimizing Monoclonal Antibody Production Process Parameters
No ratings yet
A Deep Learning Approach For Optimizing Monoclonal Antibody Production Process Parameters
15 pages
Evaluation of Milkoscan FT2 Milk Analyze
No ratings yet
Evaluation of Milkoscan FT2 Milk Analyze
8 pages
Comparative Study of House Price Prediction Using Machine Learning Research Paper
No ratings yet
Comparative Study of House Price Prediction Using Machine Learning Research Paper
14 pages
ML Unit-3
No ratings yet
ML Unit-3
23 pages
Chap5 - Machine Learning Part II - Decision Tree
No ratings yet
Chap5 - Machine Learning Part II - Decision Tree
68 pages
ML Unit 03
No ratings yet
ML Unit 03
23 pages
Wa0054.
No ratings yet
Wa0054.
1 page
INT354 - Unit 2
No ratings yet
INT354 - Unit 2
26 pages

ML Project

Uploaded by

ML Project

Uploaded by

Under review as submission to TMLR

Noise Resilience in Machine Learning Models: A Comparison

3.1 1. Dataset and Preprocessing

3.2 2. Noise Levels

3.4 4. Evaluation Metrics

3.5 5. Framework Implementation

4 Results and Analysis

4.1 1. Impact of Noise on Accuracy

4.2 2. Resilience of Decision Trees

4.3 3. Significance of Metrics

6 Insights from Related Research

Research on crisp and fuzzy decision trees provides additional insights:

7 Conclusion and Future Work

You might also like