
Unit-3

Decision trees
Representing concepts as Decision trees
Representing concepts as decision trees is a valuable approach in various fields, including
data analysis, machine learning, and artificial intelligence. Decision trees are graphical
representations that model decisions and their possible consequences, displaying choices as
branches and outcomes as leaves. They help in understanding complex relationships and
making predictions based on input features.

In this method, each internal node represents a "test" on an input feature, each branch
represents the outcome of the test, and each leaf node holds a class label or a probability
distribution. The tree is built in a top-down manner, iteratively splitting the data based on the
feature that provides the most significant information gain or reduction in impurity.

Decision trees can be particularly useful in the following ways:

1. Visualization: They provide a visual representation of the decision-making process, making it easier to understand and interpret the model.

2. Handling non-linear relationships: Decision trees can capture non-linear relationships between input features and the target variable, which may not be possible with linear models.

3. Handling both categorical and continuous data: Decision trees can handle both types of
data, allowing for a more versatile approach to problem-solving.

4. Feature selection: During the tree construction process, the algorithm naturally selects the
most important features, which can help in identifying the most relevant predictors in the
dataset.

5. Interpretability: Decision trees are relatively easy to interpret, allowing users to understand
how the model arrives at its predictions or classifications.

However, it's essential to be aware of potential limitations, such as overfitting, where the
model performs well on the training data but poorly on unseen data. To mitigate this,
techniques like pruning or ensemble methods built on decision trees (e.g., Random Forests or
Gradient Boosting Machines) can be employed.
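
As a minimal sketch of these ideas, assuming scikit-learn's `DecisionTreeClassifier` and the Iris dataset (both illustrative choices, not part of these notes):

```python
# A minimal sketch of fitting and inspecting a decision tree with scikit-learn.
# The Iris dataset and the max_depth value are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=42)

# Each internal node tests one feature; branches are the test outcomes; leaves hold class labels.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
tree.fit(X_train, y_train)

print(export_text(tree, feature_names=data.feature_names))  # textual view of the learned tree
print("test accuracy:", tree.score(X_test, y_test))
```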
Recursive induction of Decision trees

Recursive induction is the process of building a decision tree by recursively splitting the data
based on the most informative feature at each step. This method is widely used in various
machine learning algorithms, such as the CART (Classification and Regression Trees)
algorithm, ID3 (Iterative Dichotomizer 3), C4.5, and their extensions like C5.0 and Random
Forests.

Here's a detailed explanation of the recursive induction process:

1. Data preparation: Begin with a dataset containing input features (X) and a target variable
(Y). The dataset is usually split into a training set and a validation or test set.

2. Root node creation: Choose the best feature (based on information gain, Gini impurity, or
other criteria) to split the data. This feature becomes the root node of the decision tree.

3. Splitting: Split the data based on the selected feature's possible values. For example, if the
feature is "temperature," the split might be into "cold" and "not cold" groups. Each resulting
subset of data forms a new node in the decision tree.

4. Recursion: Repeat steps 2 and 3 for each new node created in the previous step, until one
of the following conditions is met:

a. Purity: If a node's data is homogeneous (all samples belong to the same class), it becomes
a leaf node with the corresponding class label.

b. Stopping criteria: A predefined limit on the number of samples in a node, the depth of the
tree, or the minimum required impurity decrease may be set to keep the tree from growing too
large. If any of these conditions is met, the node becomes a leaf node.

c. No further improvement: If no feature can significantly improve the model's performance, the node becomes a leaf node.

5. Tree pruning: In some cases, it's beneficial to remove certain branches from the decision
tree that were overfitting the training data. This process, called pruning, can help improve the
model's performance on unseen data.
6. Output: The final decision tree consists of nodes and branches representing the decision-
making process. Each leaf node holds a class label or a probability distribution, which can be
used to make predictions or classifications.

Recursive induction of decision trees is an iterative process that continues until the stopping
criteria are met. It allows the model to capture complex relationships between input features
and the target variable while maintaining interpretability. However, it's crucial to carefully
choose the appropriate stopping criteria and feature selection methods to avoid overfitting
and ensure the model's generalization ability.
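
To make the recursion concrete, here is a simplified, ID3-style sketch for categorical features that follows the steps above. The function names, toy data, and stopping rules are illustrative assumptions rather than a production implementation.

```python
# A simplified ID3-style sketch of recursive tree induction for categorical features.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the data on `feature`."""
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, features, depth=0, max_depth=5):
    # Purity: all samples share one class -> leaf with that label (step 4a).
    if len(set(labels)) == 1:
        return labels[0]
    # Stopping criteria: no features left or maximum depth reached -> majority-class leaf (step 4b).
    if not features or depth >= max_depth:
        return Counter(labels).most_common(1)[0][0]
    # Choose the most informative feature (step 2) and split on its values (step 3).
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        # Recurse on each subset (step 4).
        tree[best][value] = build_tree([rows[i] for i in idx],
                                       [labels[i] for i in idx],
                                       remaining, depth + 1, max_depth)
    return tree

# Tiny, hypothetical "play outside?" dataset with categorical features.
rows = [{"outlook": "sunny", "windy": "no"}, {"outlook": "rain", "windy": "yes"},
        {"outlook": "sunny", "windy": "yes"}, {"outlook": "rain", "windy": "no"}]
labels = ["play", "stay", "play", "stay"]
print(build_tree(rows, labels, ["outlook", "windy"]))
# prints a nested dict, e.g. {'outlook': {'sunny': 'play', 'rain': 'stay'}}
```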
Searching for simple trees and Computational complexity
Simple trees, also known as shallow trees or small decision trees, are decision trees with
limited depth or a restricted number of nodes. They are used to balance the trade-off between
model complexity and generalization performance. By limiting the tree's depth or size, we
can reduce the risk of overfitting, which occurs when a model performs well on the training
data but poorly on unseen data.

Computational complexity refers to the time and space requirements needed to execute an
algorithm or solve a problem. In the context of decision trees, we are concerned with both the
time taken to build the tree and the memory required to store it.

1. Time complexity: The time complexity of building a decision tree depends on the number of
nodes and on the cost of choosing a split at each node, which requires evaluating every
candidate feature over the samples that reach that node. For a reasonably balanced tree with
efficient split finding, the overall build time is commonly on the order of O(m · n log n), where
n is the number of samples and m is the number of features; in the worst case (a highly
unbalanced tree that splits off one sample at a time), it can grow to roughly O(m · n^2).

2. Space complexity: The space required to store a decision tree is proportional to the number
of nodes. A tree trained on n samples can have at most n leaves, and therefore O(n) nodes in
total, so the space complexity is O(n). In practice, additional memory is needed to store
metadata such as feature indices, split thresholds, and per-node statistics.
Simple trees can help reduce the computational complexity of decision trees by limiting the
tree's size or depth. This can lead to faster training times and lower memory requirements.
However, it's essential to strike a balance between model complexity and performance, as
overly simplified trees may not capture the underlying relationships in the data effectively.

To find the optimal balance, techniques like cross-validation and tuning hyperparameters
(such as maximum tree depth) can be employed. These methods help ensure that the decision
tree model is both computationally efficient and capable of capturing the essential patterns in
the data.
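
A minimal sketch of this tuning process, assuming scikit-learn's `GridSearchCV` and an illustrative dataset and depth grid:

```python
# A minimal sketch of tuning tree depth with cross-validation (scikit-learn).
# The dataset and the candidate depth values are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Shallower trees are cheaper to build and store and less prone to overfitting;
# cross-validation picks the depth that balances simplicity and accuracy.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 3, 4, 6, 8, None]},
    cv=5,
)
search.fit(X, y)
print("best depth:", search.best_params_, "cv accuracy:", round(search.best_score_, 3))
```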
Overfitting, noisy data and pruning
Overfitting, noisy data, and pruning are crucial concepts in machine learning, particularly
when working with decision trees. Let's explore each of these topics in detail.

1. Overfitting: Overfitting occurs when a machine learning model learns the training data too
well, leading to poor performance on unseen or new data. In the context of decision trees,
overfitting can happen when the tree becomes too complex, with many nodes and deep
branches. This complexity allows the model to fit the training data accurately but fails to
generalize well to unseen data, as the model has learned the noise or random fluctuations in
the training data.

2. Noisy data: Noisy data refers to data that contains errors, outliers, or random fluctuations
that do not represent the underlying pattern or relationship between variables. Noisy data can
make it challenging for any machine learning model, including decision trees, to learn the
true pattern and generalize well. In such cases, more complex models may be more prone to
overfitting, as they tend to fit the noise in the data rather than the actual pattern.

3. Pruning: Pruning is a technique used to remove or simplify parts of a decision tree model
to prevent overfitting and improve the model's generalization performance. The main idea
behind pruning is to remove branches or nodes that contribute little to the model's accuracy
but increase its complexity.
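
As a quick illustration of the overfitting described in point 1 above, the following sketch compares an unconstrained decision tree with a depth-limited one; the dataset and depth values are illustrative assumptions, not part of these notes.

```python
# Overfitting illustration: an unconstrained tree fits the training data almost perfectly,
# while its test accuracy reveals the gap between training and generalization performance.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in [None, 3]:  # None = grow the tree fully; 3 = depth-limited (pre-pruned) tree
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(f"max_depth={depth}: train={tree.score(X_train, y_train):.3f}, "
          f"test={tree.score(X_test, y_test):.3f}")
```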

Removing noisy data is a crucial step in machine learning, as it can significantly improve the
performance and accuracy of your models. Here are some common types of noisy data and
techniques to remove them:

Types of noisy data:


Outliers: Data points that are significantly different from the rest of the data.
Noise: Random fluctuations or errors in the data.
Missing values: Data points that are missing or incomplete.
Anomalies: Data points that do not follow the expected pattern or behavior.
Duplicates: Duplicate data points that are identical or very similar.

Techniques to remove noisy data:

Data cleaning: Manually inspecting and correcting errors in the data.
Data preprocessing: Applying algorithms to transform and normalize the data.
Statistical filtering: Using statistical methods to identify and remove outliers.
Machine learning algorithms: Using algorithms like clustering, decision trees, or neural
networks to identify and remove noisy data.
Data visualization: Visualizing the data to identify patterns and anomalies.

Specific techniques for each type of noisy data:

Outliers:
Median absolute deviation (MAD) method: Calculate the median absolute deviation from
the median and remove data points that are more than 3-4 times the MAD away from the
median.
Density-based spatial clustering of applications with noise (DBSCAN): Identify clusters
and remove outliers based on their proximity to other data points.
Noise:
Gaussian mixture model (GMM): Model the data as a mixture of Gaussian distributions
and remove data points that do not fit the model.
Local outlier factor (LOF): Compare each data point's local density with that of its neighbors
and remove points whose density is substantially lower than their neighbors'.
Missing values:
Mean imputation: Replace missing values with the mean of the corresponding feature.
Median imputation: Replace missing values with the median of the corresponding feature.
K-nearest neighbors (KNN) imputation: Replace missing values with a value derived from the
k nearest neighbors (for example, their mean).
Anomalies:
Isolation forest: Identify anomalies by isolating them from other data points using an
ensemble of decision trees.
One-class SVM: Train a support vector machine on normal data points only and flag as
anomalies the points that fall outside the learned boundary.
Duplicates:
Duplicate detection algorithms: Use algorithms like Jaro-Winkler distance or Levenshtein
distance to identify duplicate data points.
Remember that there is no one-size-fits-all solution for removing noisy data. The choice of
technique depends on the specific characteristics of your data and the problem you're trying
to solve.
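
The following sketch applies a few of the techniques above (duplicate removal, median imputation, and MAD-based outlier filtering) with pandas; the DataFrame contents and the 3.5 cutoff are illustrative assumptions.

```python
# A sketch of basic cleaning steps: duplicates, missing values, and outliers.
import numpy as np
import pandas as pd

df = pd.DataFrame({"temp": [21.0, 22.5, 21.8, 95.0, np.nan, 22.1, 22.1],
                   "humidity": [40, 42, 41, 39, 43, 44, 44]})

# Duplicates: drop identical rows.
df = df.drop_duplicates()

# Missing values: median imputation, per column.
df = df.fillna(df.median(numeric_only=True))

# Outliers: keep points within ~3.5 MAD of the median (robust z-score).
med = df["temp"].median()
mad = (df["temp"] - med).abs().median()
robust_z = 0.6745 * (df["temp"] - med) / mad
df = df[robust_z.abs() <= 3.5]

print(df)  # the 95.0 reading is dropped as an outlier
```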

**What are L1 and L2 regularization?**

In machine learning, regularization is a technique used to prevent overfitting by adding a
penalty term to the loss function. The goal is to reduce the model's complexity and prevent it
from becoming too specialized to the training data.
**L1 Regularization (Lasso)**

L1 regularization, also known as Lasso (Least Absolute Shrinkage and Selection Operator),
adds a term to the loss function that is proportional to the absolute value of each model
coefficient. The idea is to encourage some coefficients to be zero, effectively removing them
from the model.

Mathematically, L1 regularization adds a term to the loss function:

`L = (1/2) * ||y - Xw||^2 + α * Σ|w_j|`

where `L` is the loss function, `y` is the target vector, `X` is the feature matrix, `w` is the
model coefficient vector, `α` is the regularization strength, and `Σ|w_j|` is the sum of the
absolute values of the coefficients (the L1 norm of `w`).

**L2 Regularization (Ridge)**

L2 regularization, also known as Ridge regression, adds a term to the loss function that is
proportional to the square of each model coefficient. The idea is to discourage large
coefficients by adding a penalty term.

Mathematically, L2 regularization adds a term to the loss function:

`L = (1/2) * ||y - Xw||^2 + α * Σ w_j^2`

where `L` is the loss function, `y` is the target vector, `X` is the feature matrix, `w` is the
model coefficient vector, `α` is the regularization strength, and `Σ w_j^2` is the sum of the
squared coefficients (the squared L2 norm of `w`).

**Example:**

Suppose we have a simple linear regression model with two features (`x1` and `x2`) and one
target variable (`y`). We want to predict `y` using these features.
**L1 Regularization (Lasso)**

Let's say we have a dataset with 10 samples, and our model has coefficients `w1 = 3.5`, `w2 =
2.8`, and an intercept term `b = 0.5`. The loss function without regularization would be:

`L = (1/2) * (y - (x1*w1 + x2*w2 + b))^2`

To add L1 regularization with a strength of `α = 0.5`, we would modify the loss function as
follows:

`L = (1/2) * (y - (x1*w1 + x2*w2 + b))^2 + 0.5 * |w1| + 0.5 * |w2|`

In this case, the L1 regularization term pulls each coefficient toward zero with a constant
penalty of size `α` on its magnitude, regardless of how large the coefficient is. Coefficients
whose contribution to reducing the squared error is smaller than this penalty are driven
exactly to zero, effectively removing the corresponding features from the model.

**L2 Regularization (Ridge)**

Using L2 regularization with a strength of `α = 0.5`, we would modify the loss function as
follows:

`L = (1/2) * (y - (x1*w1 + x2*w2 + b))^2 + 0.5 * w1^2 + 0.5 * w2^2`

In this case, the L2 regularization term discourages large coefficients by adding a penalty
term that grows quadratically with their magnitude.
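
As a quick numeric check of the two penalty terms for the example coefficients above (with `α = 0.5`):

```python
# Penalty terms for the example coefficients above (w1 = 3.5, w2 = 2.8, alpha = 0.5).
w1, w2, alpha = 3.5, 2.8, 0.5
l1_penalty = alpha * (abs(w1) + abs(w2))   # 0.5 * (3.5 + 2.8)    = 3.15
l2_penalty = alpha * (w1**2 + w2**2)       # 0.5 * (12.25 + 7.84) = 10.045
print(l1_penalty, l2_penalty)
```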

By applying L1 or L2 regularization, we can reduce overfitting and improve the
generalization performance of our model.
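
To see the qualitative difference in practice, here is a minimal sketch assuming scikit-learn's `Lasso` and `Ridge` estimators on a synthetic dataset in which only two of ten features are informative; the data generation and `alpha` values are illustrative choices.

```python
# Comparing L1 (Lasso) and L2 (Ridge) regularization on synthetic data.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0, 0, 0])   # only two informative features
y = X @ true_w + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

print("Lasso coefficients:", np.round(lasso.coef_, 2))   # uninformative ones driven exactly to zero
print("Ridge coefficients:", np.round(ridge.coef_, 2))   # shrunk toward zero, but rarely exactly zero
```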
**What is Pruning in Machine Learning?**
Pruning in machine learning is a technique used to reduce the complexity of a trained model
by removing unnecessary or redundant parts of the model. The goal is to improve the model's
performance, interpretability, and scalability by reducing its size, computational
requirements, and memory usage.

**Why is Pruning necessary?**

As models become more complex and deeper, they can become prone to overfitting, which
means they become too specialized to the training data and fail to generalize well to new,
unseen data. Pruning helps to address this issue by:

1. **Reducing Overfitting**: By removing unnecessary parameters, pruning helps to reduce
the model's capacity to fit the noise in the training data, which can lead to overfitting.
2. **Improving Interpretability**: By simplifying the model, pruning can make it easier to
understand and interpret the relationships between features and the predictions.
3. **Enhancing Scalability**: Pruning can reduce the computational requirements and
memory usage of the model, making it more suitable for large-scale applications.

**Types of Pruning:**

There are several types of pruning techniques, including:

1. **Post-pruning**: This involves pruning the model after it has been trained.
2. **Pre-pruning**: This involves pruning the model during training.
3. **Layer-wise pruning**: This involves pruning individual layers of the model.
4. **Filter-wise pruning**: This involves pruning individual filters or neurons within a layer.
5. **Weight pruning**: This involves pruning individual weights or connections within a
layer.

**Pruning Algorithms:**

Some popular pruning algorithms include:


1. **L1 Regularization**: This involves adding a penalty term to the loss function that
encourages weights to be zero.
2. **L2 Regularization**: This involves adding a penalty term to the loss function that
encourages weights to be small.
3. **Dropout**: This involves randomly dropping out neurons during training.
4. **Gaussian Pruning**: This involves randomly dropping out neurons based on a Gaussian
distribution.
5. **Thermal Pruning**: This involves dropping out neurons based on their thermal activity.

**Pruning Techniques:**

Some popular pruning techniques include:

1. **Random Pruning**: This involves randomly selecting neurons or weights to prune.
2. **Gradient-based Pruning**: This involves selecting neurons or weights based on their
gradients.
3. **Taylor Series Pruning**: This involves approximating the loss function using a Taylor
series and pruning based on this approximation.
4. **Mutual Information Pruning**: This involves pruning neurons or weights based on their
mutual information.
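
As a concrete illustration of the weight pruning listed above, the sketch below applies simple magnitude-based pruning, zeroing the smallest-magnitude weights of a layer. The weight matrix and the 50% sparsity target are illustrative assumptions; magnitude-based selection is one common criterion alongside the random and gradient-based criteria just described.

```python
# Magnitude-based weight pruning: zero out the smallest-magnitude weights in a layer.
import numpy as np

rng = np.random.default_rng(1)
weights = rng.normal(size=(8, 8))   # stand-in for a trained layer's weight matrix
sparsity = 0.5                      # fraction of weights to remove

threshold = np.quantile(np.abs(weights), sparsity)
mask = np.abs(weights) >= threshold  # keep only the largest-magnitude weights
pruned = weights * mask

print("weights kept:", int(mask.sum()), "of", weights.size)
```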

**Pruning in Practice:**

Pruning is commonly used in practice in various domains, including:

1. **Computer Vision**: Pruning is used in image classification and object detection tasks to
reduce the computational requirements of deep neural networks.
2. **Natural Language Processing**: Pruning is used in language models and text
classification tasks to reduce the complexity of word embeddings and language models.
3. **Speech Recognition**: Pruning is used in speech recognition systems to reduce the
computational requirements of acoustic models.

In conclusion, pruning is a powerful technique for reducing the complexity of machine
learning models and improving their performance, interpretability, and scalability. By
understanding the different types of pruning techniques and algorithms, we can better apply
this technique in our own projects and improve our models' performance.

What is Decision Tree Pruning?


Decision tree pruning is a technique used to prevent decision trees from overfitting the
training data. Pruning aims to simplify the decision tree by removing parts of it that do not
provide significant predictive power, thus improving its ability to generalize to new data.

Decision Tree Pruning removes unwanted nodes from an overfitted decision tree to make it
smaller, which results in faster, more accurate, and more effective predictions.

Types Of Decision Tree Pruning


There are two main types of decision tree pruning: Pre-Pruning and Post-Pruning.

Pre-Pruning (Early Stopping)


Sometimes the growth of the decision tree is stopped before it gets too complex; this is
called pre-pruning. It is important for preventing overfitting of the training data, which results
in poor performance when the model is exposed to new data.

Some common pre-pruning techniques include:

Maximum Depth: Limit the maximum depth of the decision tree.
Minimum Samples per Leaf: Set a minimum threshold for the number of samples in each leaf
node.
Minimum Samples per Split: Specify the minimum number of samples needed to split a node.
Maximum Features: Restrict the number of features considered for splitting.

By pruning early, we end up with a simpler tree that is less likely to overfit the training
data.
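
In scikit-learn's `DecisionTreeClassifier`, these pre-pruning controls map directly onto constructor parameters; the sketch below is illustrative, and the dataset and parameter values are assumptions.

```python
# Pre-pruning (early stopping) expressed as DecisionTreeClassifier parameters.
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

tree = DecisionTreeClassifier(
    max_depth=4,           # Maximum Depth
    min_samples_leaf=5,    # Minimum Samples per Leaf
    min_samples_split=10,  # Minimum Samples per Split
    max_features="sqrt",   # Maximum Features considered at each split
    random_state=0,
).fit(X, y)

print("number of leaves:", tree.get_n_leaves(), "depth:", tree.get_depth())
```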

Post-Pruning (Reducing Nodes)


After the tree is fully grown, post-pruning involves removing branches or nodes to improve
the model’s ability to generalize. Some common post-pruning techniques include:
Cost-Complexity Pruning (CCP): This method assigns a cost to each subtree based on its
accuracy and complexity, then selects the subtree with the lowest cost.
Reduced Error Pruning: Removes branches that do not significantly affect the overall
accuracy.
Minimum Impurity Decrease: Prunes nodes if the decrease in impurity (Gini impurity or
entropy) is below a certain threshold.
Minimum Leaf Size: Removes leaf nodes with fewer samples than a specified threshold.
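
As a sketch of post-pruning in practice, scikit-learn implements cost-complexity pruning through the `ccp_alpha` parameter and the `cost_complexity_pruning_path` helper; the dataset and the way `alpha` is selected below are illustrative assumptions.

```python
# Post-pruning via cost-complexity pruning (CCP) in scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Candidate alphas from the pruning path of a fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Pick the alpha with the best cross-validated accuracy.
best_alpha = max(
    path.ccp_alphas,
    key=lambda a: cross_val_score(
        DecisionTreeClassifier(ccp_alpha=a, random_state=0), X, y, cv=5
    ).mean(),
)

pruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X, y)
print("chosen alpha:", round(best_alpha, 5), "leaves:", pruned.get_n_leaves())
```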
