Using Weka
The Weka workbench is a set of tools for preprocessing data, experimenting with data-mining/machine-learning algorithms, and comparing the performance of different methods. Weka also provides a
Java class library that enables you to use the Weka filters and classifiers in your own programs.
1. Although there are other Weka interfaces for advanced users, we will use the Explorer interface
for most of our work.
2. Go to http://www.cs.waikato.ac.nz/ml/weka and download Weka to your laptop. You should
download the stable version for the 3rd edition of the textbook.
3. If you are working in the 046 Colburn lab, Weka can be invoked as follows:
(a) Click on the Microsoft icon in the lower left corner, then click on All Programs.
(b) Scroll down and click on Weka 3.6.11 and then double-click on Weka 3.6 where you see
the bird icon; do not use the "with console" option.
4. Invoke Weka; you should get a screen that displays a bird and offers a choice of four graphical
user interfaces. Click on Explorer.
Using Weka
1. Weka Preprocessor: Loading and Examining Data
(a) Often data is in an Excel spreadsheet, which can be converted to a CSV (comma-separated values) file, which can in turn be converted to an ARFF file in Weka.
(b) Open the Excel spreadsheet Mushroom-data-625.xls, which can be found in the Datasets directory on the class web site. Download this file to a folder on your machine. If
you are working on your own PC, you might want to put it in a subfolder of the data
folder that is created as a subfolder of the Weka-3-6 folder under Program Files when
you installed Weka. (Notice that the Weka data folder already contains
some data files that we will use during the course.) Let us examine the structure of
Mushroom-data-625.xls:
The first row gives the attribute names.
Each subsequent row represents an instance, with the value for each attribute given
in the respective column.
(c) To convert to a CSV file, click on Save As, select CSV as the file type, and save (in this
case as Mushroom-data-625). This saves the file as a comma-separated CSV file, though
if you open the file, it still looks like an Excel spreadsheet. Note that you now have
both a CSV file and an Excel file named Mushroom-data-625.
(d) To convert the csv file to ARFF:
i. Invoke the Weka Explorer GUI
ii. Select Open File under Preprocess, move to the folder in which you stored the
CSV file Mushroom-data-625, set the file type to CSV, and select the file Mushroom-data-625 as the file to open. You should be opening the file that you saved as a CSV
file.
Weka automatically changes the file to ARFF format.
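Weka performs this conversion for you, but the ARFF format itself is simple to see. The sketch below (plain Python, with a tiny made-up sample standing in for the mushroom file) shows roughly what the conversion adds: a @relation line, one @attribute line per column listing its nominal values, and the data rows under @data. It assumes every attribute is nominal, as in the mushroom data.

```python
import csv
import io

def csv_to_arff(csv_text, relation):
    """Rough sketch of the CSV-to-ARFF conversion Weka performs.
    Assumes every attribute is nominal, as in the mushroom data."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation]
    for col, name in enumerate(header):
        values = sorted({r[col] for r in data})   # distinct nominal values
        lines.append("@attribute %s {%s}" % (name, ",".join(values)))
    lines.append("@data")
    lines.extend(",".join(r) for r in data)
    return "\n".join(lines)

sample = "Status,cap-surface\np,s\ne,f\ne,y\n"
print(csv_to_arff(sample, "mushroom"))
```

The real converter also recognizes numeric and string attributes; this sketch only covers the nominal case.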
(e) The CSV file does not specify which attribute is the class attribute. You can specify
the class attribute by clicking on Class in the middle of the right side and selecting
the attribute that should serve as the class. Select the attribute Status as the class
attribute; it can take on the value e for edible or p for poisonous.
(f) By clicking on one of the attributes on the left, you will see a histogram that shows how
often each of the two values of the class occurs for each value of the selected attribute.
Note that if you select the class attribute itself (in this case Status), the histogram
shows how often each of the classes occurs in the data. The table on the right
above the histogram enables you to identify what the histogram colors mean; for
example, the table shows 63 instances of Status=p, and p is the first column in the
histogram and is labelled as having 63 instances.
(g) On the left side, select cap-surface as the attribute (but keep Status as the Class value).
On the right you see a histogram showing the distribution of values of the Status attribute
for the three different values of the cap-surface attribute.
(h) Questions-1:
i. How many instances are there in the data file?
ii. How many different values can cap-color take on?
iii. Which values of the cap-color attribute result in only edible mushrooms?
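The histogram is just a cross-tabulation of attribute value against class. A minimal sketch (Python, with a handful of made-up (cap-surface, Status) pairs rather than the real instances):

```python
from collections import Counter

# Made-up (cap-surface value, Status) pairs standing in for the real data.
instances = [("s", "e"), ("s", "p"), ("f", "e"), ("y", "p"), ("s", "e")]

# Count how often each class occurs for each attribute value; these counts
# are what the stacked histogram bars display.
counts = Counter(instances)
for (value, cls), n in sorted(counts.items()):
    print(f"cap-surface={value}, Status={cls}: {n}")
```

An attribute value whose count for Status=p is zero corresponds to a bar drawn entirely in the edible class's color.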
(g) To actually visualize the tree, do the following. Right-click on the last line on the left
side of the screen under Result list, and select Visualize tree. A new window will appear
with a graphical view of the decision tree that corresponds to the textual description of
the tree.
(h) Questions-3: Scroll through the screen to answer the following questions.
i. How many instances were classified correctly? How many were classified incorrectly?
ii. Write an IF-THEN-ELSE rule that captures the decision tree that was developed.
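A decision tree reads directly as a nested IF-THEN-ELSE rule. The sketch below is hypothetical (a single split on an illustrative attribute); read the real attribute names and values off the tree that Weka displays for your run.

```python
def classify(instance):
    # Hypothetical one-level tree written as an IF-THEN-ELSE rule.
    # The split attribute and its values are illustrative only.
    if instance["odor"] == "n":   # 'n' = no odor
        return "e"                # edible
    else:
        return "p"                # poisonous

print(classify({"odor": "n"}))
print(classify({"odor": "f"}))
```

A deeper tree simply nests further IF-THEN-ELSE tests inside each branch.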
(i) Now we want to remove the column for the attribute Odor and see what happens if Odor
is not available as an attribute. There are three ways that you could do this.
i. The first two ways have already been discussed. What are they? (If you don't
remember, ask the instructor.)
ii. The third way is to go back to the Preprocessor by clicking on Preprocess at the top
of the Explorer window, then click on the square box next to an attribute name and
then click on Remove. Do this to remove the attribute Odor.
iii. Now also remove the attributes gill-size, stalk-root, and habitat.
(j) Invoke the same classifier on this revised data; make sure that you have reset the class
attribute to Status. Visualize the resulting decision tree. Enlarge the window displaying
the tree; then right-click on empty space in the window and select Fit to Screen from the
menu that appears. For each leaf node, the class value at that leaf node is given along
with one or two numbers. If there is only one number, it tells how many instances
reached that leaf node (all of them classified correctly); if there are two numbers, the first
tells how many instances reached this leaf node and the second tells how many of those
were classified incorrectly.
(k) Questions-4:
i. How many instances were classified correctly? How many were classified incorrectly?
ii. What attribute is at the root of the decision tree?
iii. Consider the following path in the decision tree: cap-color=n, stalk-surface-below-ring=s, bruises?=f. What is the class value assigned to instances that follow this
path?
iv. What path in the decision tree leads to a leaf node where some instances are classified
incorrectly?
5. The Classifier: Noise in the Data
(a) Go back to your spreadsheet data and copy the first data instance and insert it as two
new rows at the beginning of the spreadsheet. Then change the value of Status for these
two new rows to r instead of p. Leave the third row unchanged. Then change the Status
value for the next ten data instances to r. (Notice that the Status attribute now has 3
possible values: p, e, and r.) Convert the revised file to CSV format, save it, load it into
Weka, remove the Odor attribute, save the file in ARFF format as Mushroom-data-625-revised, and then run the classifier on it. (Be sure to set Status as the class attribute.)
Examine the results.
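The spreadsheet edits above amount to injecting label noise: duplicating an instance under a new class and relabelling some existing instances. A minimal sketch of the same manipulation on a list of rows (made-up (Status, cap-surface) values standing in for the real file, and only two relabelled instances instead of ten):

```python
import copy

# Made-up (Status, cap-surface) rows standing in for the real spreadsheet.
rows = [["p", "s"], ["e", "f"], ["e", "y"], ["p", "s"], ["e", "k"]]

noisy = copy.deepcopy(rows)
# Insert two copies of the first instance, relabelled with the new class 'r'.
noisy = [["r"] + rows[0][1:], ["r"] + rows[0][1:]] + noisy
# Relabel a couple of the following instances as 'r' as well, leaving the
# third row (the original first instance) unchanged.
for r in noisy[3:5]:
    r[0] = "r"

print(sorted({r[0] for r in noisy}))   # Status now takes three values
```

Because identical copies of an instance now carry conflicting labels, no classifier can get all of them right.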
(b) At the bottom of the Classifier output is a matrix called a confusion matrix, which
we shall refer to as C. If there are n possible classes, then the confusion matrix has n
rows and n columns, one for each possible class. The entry C(i, j) gives the number of
data items whose correct class is i and which were classified by J48 as class j.
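The sketch below builds a matrix with this layout from made-up actual/predicted label lists (the real entries, of course, come from Weka's output):

```python
def confusion_matrix(actual, predicted, classes):
    # C[i][j] = number of items whose correct class is i and which the
    # classifier labelled as j -- the same layout Weka prints.
    C = {i: {j: 0 for j in classes} for i in classes}
    for a, p in zip(actual, predicted):
        C[a][p] += 1
    return C

# Made-up labels for the three classes p, e, r.
actual    = ["p", "p", "e", "e", "r", "r"]
predicted = ["p", "e", "e", "e", "p", "p"]
C = confusion_matrix(actual, predicted, ["p", "e", "r"])

correct = sum(C[c][c] for c in C)   # the diagonal counts correct classifications
print(C["r"]["p"], correct)
```

Summing the off-diagonal entries gives the total number of misclassified instances.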
(c) Questions-5:
i. How many instances are incorrectly classified? Why did this happen?
ii. What does the diagonal of the confusion matrix tell you?
iii. Which class did the classifier always get wrong?
6. The Classifier: Dividing Data into Training and Test Sets
(a) Instead of training and testing on the same data set, we can ask Weka to hold out part
of the data set (i.e., not use it to train our classifier) and use it instead as a test set. To
save part of the data set for testing, click on the radio button to the left of Percentage
split under Test options, and enter the number 90 as the %. Run the Classifier and look
at the results.
(b) Now do this twice more, once with 50 and once with 5 as the % for the split. Look at
the results and answer the following questions:
(c) Questions-6:
i. How many instances were used for training when there is a 90% split? How many
for testing?
ii. How many instances were misclassified when there is a 90% split?
iii. How many instances were used for training when there is a 50% split? How many
for testing?
iv. How many instances were misclassified when there is a 50% split?
v. How many instances were used for training when there is a 5% split? How many for
testing?
vi. How many instances were misclassified when there is a 5% split?
vii. What is the error rate under each of the different splits?
viii. What do you think is causing the differences in classification error rate under the
different splits?
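The arithmetic behind these questions can be sketched as follows. The instance count of 625 and the rounding rule used here are assumptions; check the numbers against what Weka actually reports.

```python
def split_sizes(n_instances, train_pct):
    # Approximate train/test sizes under Percentage split; Weka's exact
    # rounding rule may differ slightly from truncation.
    n_train = int(n_instances * train_pct / 100)
    return n_train, n_instances - n_train

def error_rate(n_incorrect, n_test):
    # Fraction of the held-out test instances that were misclassified.
    return n_incorrect / n_test

for pct in (90, 50, 5):
    tr, te = split_sizes(625, pct)   # 625 instances is an assumption
    print(f"{pct}% split: {tr} training, {te} test")
```

Note that a small training percentage leaves many instances for testing but very few for learning the tree, which is what drives the differences in error rate.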
7. The Classifier: Viewing the Output
(a) You can also see how the classifier classifies the individual instances in the test set. Let
us examine how to do this when only 5% of the data is in the test set (so that the output
is not huge).
(b) Set the split at 95% in the training set and thus 5% in the test set.
(c) Click on More Options, then click on the box next to Output predictions, and then
click on OK.
(d) Run the classifier again, and examine the classifier output. Note that you can now see
how each instance in the test set was classified; the ones that are incorrectly classified
are flagged with a + sign in the error column.
8. The Classifier: Rerunning Models
(a) Note that on the lower left side of the Explorer window, there is a Result-list with an
entry for each run of the Classifier. If you click on one of these, you go back to the
results for that run. Try it to see that this is the case.
(b) You can also save a model for future use or reload a previously saved model. Right click
on one of the models in the Result-list, save it, and then reload it into Weka. Note that
only the model is loaded, not the results of testing the model.
(c) You can also save the results from testing a particular model, and then go back and view
the results using a text editor.