Subject: Importing The Dataset

This document discusses building decision trees and estimating error rates using cross-validation in TANAGRA, ORANGE, and WEKA. It explains the basic steps: importing data, selecting attributes, choosing a learning algorithm, viewing the decision tree, and using cross-validation. For each software, it demonstrates importing the HEART dataset, building a decision tree, viewing the tree, and performing 10-fold cross-validation to estimate the error rate. The estimated error rates are 24.81% for TANAGRA, 24.44% for ORANGE, and 26.67% for WEKA.

Uploaded by

Vbg Da


Didacticiel - Etudes de cas R.R.

Subject
Building a decision tree with TANAGRA, ORANGE and WEKA. Error rate estimation using
cross-validation.

When we build a decision tree from a dataset, we must follow these steps (not
necessarily in this order):
• Import the dataset into the software;
• Select the class attribute (TARGET) and the descriptors (INPUT);
• Choose the induction algorithm; depending on the implementation, we can obtain slightly
different results;
• Run the learning process and view the decision tree;
• Use cross-validation to obtain an honest error rate estimate.

Dataset
We use the HEART.TXT dataset (UCI IRVINE), from which some attributes have been removed;
it contains 270 examples.

Building a decision tree with TANAGRA

Importing the dataset

We create a new diagram and import the dataset with the FILE/NEW menu.

Defining the role of attributes

We add the DEFINE STATUS component to the diagram (using the button in the toolbar).
We set COEUR as TARGET and the other attributes as INPUT.

23/02/2006 Page 1 sur 13


Selecting the learning algorithm

We want to use the Classification and Regression Tree (Breiman et al.) algorithm. There are two
steps when we want to define a supervised learning process in TANAGRA: (a) we insert a
meta-supervised learning component from the META SPV LEARNING tab…

(b) … and embed the learning algorithm, C-RT, from the SPV learning tab.


Displaying the results

To view the decision tree, we click on the VIEW menu of the last component. We see
the tree [1]: the resubstitution error rate is 19.63%, and the tree has 4 leaves (4 rules).

[1] TANAGRA uses a textual representation. If you want a graphical representation, you can try the SIPINA
software from the same author (http://eric.univ-lyon2.fr/~ricco/sipina.html).

Cross-validation
We want to estimate the error rate with the cross-validation resampling method. We add the
CROSS-VALIDATION component from the SPV LEARNING ASSESSMENT tab. We set the
number of folds to 10 and the number of repetitions to 1.

The estimated error rate is 24.81%.
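
The cross-validated rate (24.81%) is higher than the resubstitution rate (19.63%) because the two are measured on different examples: resubstitution re-tests the examples used for training, whereas cross-validation tests each example with a model that never saw it. The sketch below illustrates that difference in plain Python. It is not TANAGRA's implementation: the function names are hypothetical, the learner is a deliberately naive "memorizer" rather than C-RT, and the 270 labelled examples are made up.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_memorizer(xs, ys):
    """Toy learner (assumption): remember every training example, plus the majority class."""
    majority = Counter(ys).most_common(1)[0][0]
    return dict(zip(xs, ys)), majority


def predict_memorizer(model, x):
    """Return the remembered label, or the majority class for an unseen example."""
    table, majority = model
    return table.get(x, majority)


def error_rate(model, xs, ys):
    """Fraction of examples whose predicted label differs from the true label."""
    wrong = sum(predict_memorizer(model, x) != y for x, y in zip(xs, ys))
    return wrong / len(ys)


def cross_val_error(xs, ys, k=10, seed=0):
    """k-fold cross-validation: each example is tested once by a model that never saw it."""
    wrong = 0
    for test_idx in k_fold_indices(len(ys), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(ys)) if i not in held_out]
        model = fit_memorizer([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        wrong += sum(predict_memorizer(model, xs[i]) != ys[i] for i in test_idx)
    return wrong / len(ys)


# Made-up data: 270 distinct inputs with arbitrary labels (not the HEART values).
xs = list(range(270))
ys = ["absence"] * 150 + ["presence"] * 120

resubstitution = error_rate(fit_memorizer(xs, ys), xs, ys)
cross_validated = cross_val_error(xs, ys, k=10)
print(f"resubstitution error: {resubstitution:.2%}")
print(f"10-fold CV error:     {cross_validated:.2%}")
```

On this toy data the memorizer makes no resubstitution errors, yet it misclassifies every "presence" example under cross-validation (120 of 270, or 44.44%). This is an extreme version of the optimism that the cross-validation component is meant to remove.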


Building a decision tree with ORANGE

When we launch ORANGE, we see the following interface.
[Screenshot: ORANGE main window, showing the tool palettes (data mining tools) and the workspace.]

Importing the dataset

ORANGE can read tab-separated text files. When we select the corresponding tool, a new
component is inserted in the diagram. We select the file with the OPEN contextual menu.

Learning process
By default, the target attribute is the last column; the others are the input attributes. Our
dataset already has this layout.
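
The "last column is the target" convention is easy to reproduce outside the tool. The following is a minimal sketch using only Python's standard `csv` module, not ORANGE's own loader. The sample rows and the `age` and `cholesterol` column names are hypothetical; only the tab-separated layout and the COEUR target come from the tutorial.

```python
import csv
import io

# Hypothetical sample in the same layout: tab-separated, header row, class in the last column.
SAMPLE = (
    "age\tcholesterol\tcoeur\n"
    "63\t233\tpresence\n"
    "41\t204\tabsence\n"
    "57\t354\tabsence\n"
)


def load_tab_separated(handle):
    """Read a tab-separated table and split it into input columns and a target column."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    inputs = [row[:-1] for row in rows]    # every column except the last
    target = [row[-1] for row in rows]     # last column is the class attribute
    return header[:-1], header[-1], inputs, target


input_names, target_name, inputs, target = load_tab_separated(io.StringIO(SAMPLE))
print("inputs:", input_names)
print("target:", target_name, target)
```

To read a real file instead of the in-memory sample, pass `open("HEART.TXT", newline="")` as the handle, assuming the file starts with a header row.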

We add the classification tree component (CLASSIFY tab) to our diagram and connect
it to the dataset component.

Decision tree visualization

We can display the tree in a text viewer, which is recommended when the tree has many
nodes. There is also a graphical viewer that is more pleasant (CLASSIFICATION TREE
VIEWER 2D – CLASSIFY tab). We connect the CLASSIFICATION TREE component to the
viewer and click on the OPEN menu to display the tree. There are 10 rules (leaves)
in our tree.


Cross-validation
The TEST LEARNERS component (EVALUATE tab) computes the cross-validation
error rate estimate. We connect the classification tree to this new component.

This component becomes operational once we have specified the data source and the
training method. Several learning methods can be connected simultaneously, which
makes it very easy to compare their performance. We thus make
the right connections, and then display the results using the OPEN menu.

The classification accuracy is 75.56%; the error rate is 24.44%. Other statistics are available.
We can also interactively choose another resampling method.

Building a decision tree with WEKA

A dialog box appears when we launch WEKA; we choose the KNOWLEDGE FLOW
paradigm. We used version 3.5.1.

Importing the dataset

The CSV LOADER component reads text files. We select the HEART.TXT dataset with
the CONFIGURE contextual menu.


Learning process
By default, the target attribute is the last column; the others are the input attributes. Our
dataset already has this layout. However, we must explicitly define the
learning set in the WEKA diagram. We add the TRAINING SET MAKER (EVALUATION
tab) to the diagram; all examples are used to build the decision tree. We choose
the DATASET connection when we connect the LOADER component to the TRAINING SET MAKER.

We add the J48 component (a C4.5-style decision tree algorithm, CLASSIFIERS tab). We
connect TRAINING SET MAKER to J48 (TRAINING SET connection).

Decision tree visualization

WEKA offers two representations: a textual one, suggested when the tree has a
lot of nodes, and a graphical one that is more pleasant. We select the latter
(GRAPH VIEWER – VISUALIZATION tab) and use the GRAPH connection.

In order to start the execution, we select the first node of the diagram and click on the START
LOADING contextual menu.

When the computation is complete, we select the last component (GRAPH VIEWER) and
click on the SHOW GRAPH menu.

The decision tree has 18 leaves.

Cross-validation
WEKA provides sophisticated error rate estimation methods, but we need to create a new
sequence of components to use them.


We need the following components:

• CROSS VALIDATION FOLD MAKER (EVALUATION), which builds the folds (DATASET
connection).
• J48 decision tree component (CLASSIFIERS); be careful to set the same parameters
as in the previous J48 component. We must connect the CROSS VALIDATION
FOLD MAKER to this component twice, once for the training set and once for the test set.
• CLASSIFIER PERFORMANCE EVALUATOR (EVALUATION), which computes the error rate
in each fold. We use the BATCH CLASSIFIER output of J48.
• Last, the TEXT VIEWER component, which displays the results.

Once again, we select START LOADING on the CSV LOADER component to start
the execution. The SHOW RESULTS menu of TEXT VIEWER displays the following results.


The computed error rate is 26.67%. Other statistics are available.
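
To put the three estimates side by side, it can help to convert each percentage back into a number of misclassified examples. The sketch below does this arithmetic under one assumption that the tools themselves do not state: each of the 270 examples is tested exactly once (a single repetition of 10-fold cross-validation), so the error rate is simply the misclassified count divided by 270. The helper name `implied_errors` is hypothetical.

```python
N_EXAMPLES = 270  # dataset size stated in the tutorial

# Cross-validated error rates (%) reported in the tutorial.
REPORTED = {"TANAGRA": 24.81, "ORANGE": 24.44, "WEKA": 26.67}


def implied_errors(error_pct, n=N_EXAMPLES):
    """Nearest whole number of misclassified examples for a given error rate."""
    return round(error_pct / 100 * n)


for tool, pct in REPORTED.items():
    count = implied_errors(pct)
    # Recompute the percentage from the count to check it matches the reported value.
    print(f"{tool}: {pct}% ~ {count}/{N_EXAMPLES} = {100 * count / N_EXAMPLES:.2f}%")

# ORANGE reports accuracy; the error rate is its complement.
orange_error = round(100 - 75.56, 2)
print("ORANGE error rate from accuracy:", orange_error)
```

Under that assumption, the reported rates correspond to 67, 66 and 72 misclassified examples out of 270. The three tools therefore differ by only a handful of examples, which fits the earlier remark that different implementations give slightly different results.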

Let us note a very useful feature of WEKA: it is possible to visualize the 10 decision
trees computed during the cross-validation process. To do so, we connect
a TEXT VIEWER component to the output of the J48 component; we can then see the
differences between the trees and judge the stability of the computations.

Conclusion
Cross-validation is a very popular method of error rate estimation, especially when we have
only a few examples in our dataset. We see in this tutorial that ORANGE, TANAGRA and WEKA
all handle this process easily.
