Model Engineering
1. Data splitting:
a. Import Necessary Module: First, we import the train_test_split function from the
sklearn.model_selection module.
b. Separate Features and Target Variable: Before splitting the data, we need to separate
the features (X) from the target variable (y). This ensures that the predictors and the
variable we are trying to predict are assigned correctly.
c. Split the Data:
We use the train_test_split function to split the data into training and testing sets:
● X_train and y_train represent the features and target variables of the training set,
respectively.
● X_test and y_test represent the features and target variables of the testing set,
respectively.
● test_size=0.2 specifies that 20% of the data will be used for testing, and the
remaining 80% will be used for training.
● random_state=42 ensures reproducibility. It sets the random seed so that the data
split is the same every time the code is run.
The split used here is a stratified split: the proportions of the classes in the target variable
are preserved in both the training and the testing set. This is particularly important for
classification problems where the class distribution is imbalanced. Note that train_test_split
does not stratify automatically; stratification is enabled by passing the target variable to the
stratify parameter (stratify=y), as in the sketch below.
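A minimal sketch of this split, assuming the dataset has been loaded into a pandas DataFrame named df and that the target column is called LeaveOrNot (a hypothetical name; adjust it to the actual dataset):

import pandas as pd
from sklearn.model_selection import train_test_split

# The DataFrame df is assumed to be loaded already, e.g. with
# df = pd.read_csv("Employee.csv")   (hypothetical file name)
X = df.drop(columns=["LeaveOrNot"])  # features (target column name is an assumption)
y = df["LeaveOrNot"]                 # target variable

# 80/20 split; stratify=y preserves the class proportions in both sets,
# and random_state=42 makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)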
When the decision tree is grown without pruning, it has the following characteristics:
○ No Depth Restriction: The tree continues to split data points as long as there are
potential improvements in the splitting criterion (e.g., Gini impurity for classification).
○ Prone to Overfitting: Unpruned trees are susceptible to overfitting. They might
capture intricate details of the training data that don't generalize well to unseen
data. The tree becomes too complex, focusing on specific patterns in the training data
that might not be representative of the broader population.
○ High Variance: Unpruned trees can have high variance, meaning small changes in the
training data can lead to significantly different tree structures. This can make them less
reliable.
❖ Parameter settings:
1. Library:
● Choose a library that supports decision trees. This example uses scikit-learn's
DecisionTreeClassifier class in Python.
2. Disabling Pruning:
● To avoid pruning, we set the max_depth parameter to None, which places no limit
on the depth of the tree. This allows the tree to grow as much as possible,
essentially disabling depth-based pre-pruning:
max_depth=None
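As a minimal sketch (assuming X_train, y_train, X_test, and y_test come from the split in step 1), the unpruned classifier can be created and fitted as follows:

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# max_depth=None (the default) places no limit on tree depth, so nodes keep
# splitting until the leaves are pure or contain too few samples to split.
unpruned_tree = DecisionTreeClassifier(max_depth=None, random_state=42)
unpruned_tree.fit(X_train, y_train)

# A large gap between training and testing accuracy is the typical symptom
# of the overfitting discussed above.
print("Train accuracy:", accuracy_score(y_train, unpruned_tree.predict(X_train)))
print("Test accuracy:", accuracy_score(y_test, unpruned_tree.predict(X_test)))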
3. Additional Notes:
In this section, we visualize the learned decision tree model through both graphical and
textual representations. Having both views makes it easier to understand and interpret the
model's structure, behavior, and performance.
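One way to produce both views with scikit-learn is sketched below, assuming the fitted model is named unpruned_tree and X holds the feature columns from step 1 (the class labels are hypothetical):

import matplotlib.pyplot as plt
from sklearn.tree import plot_tree, export_text

# Graphical representation of the learned tree.
plt.figure(figsize=(20, 10))
plot_tree(
    unpruned_tree,
    feature_names=list(X.columns),
    class_names=["Stay", "Leave"],  # hypothetical class labels
    filled=True,
)
plt.show()

# Textual representation (the "|---" rules used later in this section).
print(export_text(unpruned_tree, feature_names=list(X.columns)))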
4. The features that are most relevant for the classification task
The most relevant features for the classification task can be identified by examining the top-level
splits or those closer to the root of the learned decision tree model. These features play a crucial role
in distinguishing between different classes. In the textual representation of the decision tree,
features used for the initial splits are indicative of their importance in the classification process.
The most relevant features can also be determined quantitatively by computing feature importances.
Feature importance measures how much a given feature contributes to the decisions made by the
decision tree.
In a decision tree, the importance of a feature is calculated as the sum of the (sample-weighted)
reduction in impurity (Gini impurity or entropy, depending on the splitting criterion) over all nodes
where the feature is used to split the data; scikit-learn then normalizes these scores so that they sum
to 1. The higher the importance score, the more relevant the feature is for the classification task.
The sketch below shows one way to obtain the feature importances for the decision tree
without pruning.
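A minimal sketch, assuming the unpruned model is named unpruned_tree and X holds the feature columns:

import pandas as pd

# scikit-learn exposes impurity-based importances computed during fitting:
# for each feature, the sample-weighted impurity decrease is summed over
# all nodes that split on it and normalised so the scores add up to 1.
importances = pd.Series(
    unpruned_tree.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)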
To show how the data are separated for 3 to 5 leaf nodes, we can look at the decision tree structure
and how it partitions the data based on various features and thresholds. The following examples
illustrate how the data might be separated by trees with three, four, and five leaf nodes.
Three Leaf Nodes:
Demonstration:
|--- JoiningYear <= 2017.50
|    |--- PaymentTier <= 2.50
|    |    |--- JoiningYear <= 2016.50
|    |    |    |--- Leaf Node 1
|    |    |--- JoiningYear > 2016.50
|    |    |    |--- Leaf Node 2
|    |--- PaymentTier > 2.50
|    |    |--- Leaf Node 3
● The decision tree divides the data into three distinct regions represented by three
leaf nodes: Leaf Node 1, Leaf Node 2, and Leaf Node 3.
● Leaf Node 1 corresponds to individuals who joined before or in 2016, have a
payment tier less than or equal to 2.50, and any other conditions that led to this
subset.
● Leaf Node 2 represents individuals who joined after 2016, have a payment tier less
than or equal to 2.50, and any other conditions specific to this group.
● Leaf Node 3 captures individuals with a payment tier greater than 2.50, along with
any other conditions that define this subset.
Four Leaf Nodes:
● With four leaf nodes, the decision tree creates a more detailed partitioning of the
data compared to three leaf nodes.
● The splits in the tree result in four distinct regions or classes, allowing for more
nuanced distinctions among the data points.
● Each leaf node captures a subset of the data with similar characteristics based on the
features considered by the tree.
Demonstration:
To extend the tree to four leaf nodes, we might introduce a new split based on another
feature.
|--- JoiningYear <= 2017.50
|    |--- PaymentTier <= 2.50
|    |    |--- JoiningYear <= 2016.50
|    |    |    |--- Leaf Node 1
|    |    |--- JoiningYear > 2016.50
|    |    |    |--- Leaf Node 2
|    |--- PaymentTier > 2.50
|    |    |--- Gender <= 0.50
|    |    |    |--- Leaf Node 3
|    |    |--- Gender > 0.50
|    |    |    |--- Leaf Node 4
This tree introduces a split based on gender for individuals with a payment tier greater than
2.50, resulting in four leaf nodes.
Five Leaf Nodes:
To further refine the partitioning of the data into five leaf nodes, we might introduce an
additional split or fine-tune an existing one. Here, a split based on age is introduced for
individuals with a payment tier greater than 2.50, leading to five leaf nodes.
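As an aside, one convenient way to reproduce trees of exactly this size with scikit-learn is the max_leaf_nodes parameter, which grows the tree best-first until the requested number of leaves is reached. A minimal sketch, assuming X_train, y_train, and X come from the earlier steps:

from sklearn.tree import DecisionTreeClassifier, export_text

# Fit trees constrained to 3, 4, and 5 leaf nodes and print their rules.
for n_leaves in (3, 4, 5):
    small_tree = DecisionTreeClassifier(max_leaf_nodes=n_leaves, random_state=42)
    small_tree.fit(X_train, y_train)
    print(f"--- {n_leaves} leaf nodes ---")
    print(export_text(small_tree, feature_names=list(X.columns)))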
❖ Parameter settings (decision tree with pre-pruning):
● max_depth: The max_depth parameter is set to 3, which specifies the maximum depth of
the decision tree. This limits the number of splits that can be made during the tree-building
process. By restricting the depth of the tree, we prevent it from becoming too complex and
overfitting the training data. In this case, a maximum depth of 3 was chosen, but the
optimal value for max_depth depends on the data and the desired level of accuracy.
● Gini impurity criterion: The decision tree algorithm uses the Gini impurity criterion to
measure how mixed the classes are at a node; a node whose samples all belong to a single
class has an impurity of 0. At each node, the algorithm chooses the split that yields the
largest (sample-weighted) reduction in Gini impurity in the resulting child nodes. By
combining this criterion with pre-pruning through a maximum depth, we limit the number of
splits and thus control the complexity of the decision tree, which helps prevent overfitting.
In summary, the decision tree algorithm with pre-pruning is configured to limit the maximum
depth of the tree to 3 levels, using the Gini impurity criterion to guide the splitting process. This
approach balances model complexity and accuracy, which helps the model generalize better to
unseen data. A sketch of this configuration follows.
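A minimal sketch of this configuration, assuming the training and testing sets come from the split in step 1:

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Pre-pruned tree: depth capped at 3, splits chosen by the Gini impurity criterion.
pruned_tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
pruned_tree.fit(X_train, y_train)

# Evaluate on the held-out test set to check how well the tree generalises.
print("Pre-pruned test accuracy:", accuracy_score(y_test, pruned_tree.predict(X_test)))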