
Machine Learning with WEKA

WEKA Explorer Tutorial

for WEKA Version 3.4.3

Svetlana S. Aksenova
[email protected]

School of Engineering and Computer Science

Department of Computer Science
California State University, Sacramento
California, 95819

2004
TABLE OF CONTENTS

1. INTRODUCTION
2. LAUNCHING WEKA EXPLORER
3. PREPROCESSING DATA
3.1. FILE CONVERSION
3.2. OPENING FILE FROM A LOCAL FILE SYSTEM
3.3. OPENING FILE FROM A WEB SITE
3.4. READING DATA FROM A DATABASE
3.5. PREPROCESSING WINDOW
3.6. SETTING FILTERS
4. BUILDING “CLASSIFIERS”
4.1. CHOOSING A CLASSIFIER
4.2. SETTING TEST OPTIONS
4.3. ANALYZING RESULTS
4.4. VISUALIZATION OF RESULTS
Classification Exercise
5. CLUSTERING DATA
5.1. CHOOSING CLUSTERING SCHEME
5.2. SETTING TEST OPTIONS
5.3. ANALYZING RESULTS
5.4. VISUALIZATION OF RESULTS
Clustering Exercise
6. FINDING ASSOCIATIONS
6.1. CHOOSING ASSOCIATION SCHEME
6.2. SETTING TEST OPTIONS
6.3. ANALYZING RESULTS
Association Rules Exercise
7. ATTRIBUTE SELECTION
7.1. SELECTING OPTIONS
7.2. ANALYZING RESULTS
7.3. VISUALIZING RESULTS
8. DATA VISUALIZATION
8.1. CHANGING THE VIEW
8.2. SELECTING INSTANCES
9. CONCLUSION
10. REFERENCES

1. Introduction

WEKA is a data mining system developed by the University of Waikato in New Zealand
that implements data mining algorithms. WEKA is a state-of-the-art facility for developing
machine learning (ML) techniques and applying them to real-world data mining problems. It is
a collection of machine learning algorithms for data mining tasks. The algorithms are applied
directly to a dataset. WEKA implements algorithms for data preprocessing, classification,
regression, clustering, and association rules; it also includes visualization tools. New machine
learning schemes can also be developed with this package. WEKA is open source software
issued under the GNU General Public License [3].
The goal of this tutorial is to help you learn WEKA Explorer. The tutorial will guide
you step by step through the analysis of a simple problem using the WEKA Explorer preprocessing,
classification, clustering, association, attribute selection, and visualization tools. At the end of
each problem the results are presented with explanations side by side. Each part
concludes with an exercise for individual practice. By the time you reach the end of this tutorial,
you will be able to analyze your data with WEKA Explorer using various learning schemes and
interpret the results you obtain.
Before starting this tutorial, you should be familiar with data mining algorithms such as
C4.5 (C5), ID3, K-means, and Apriori. All working files are provided. For better performance, the
archive of all files used in this tutorial, as well as a printable version of the lessons, can be
downloaded or copied from the CD to your hard drive. The WEKA package can be
downloaded from the University of Waikato website at
http://www.cs.waikato.ac.nz/~ml/weka/index.html.

2. Launching WEKA Explorer

You can launch WEKA from the C:\Program Files directory, by selecting the WEKA icon on your
desktop, or from the Windows task bar: ‘Start’ → ‘Programs’ → ‘Weka 3-4’. When the ‘WEKA
GUI Chooser’ window appears on the screen, you can select one of the four options at the
bottom of the window [2]:

1. Simple CLI provides a simple command-line interface and allows direct execution of
WEKA commands.
2. Explorer is an environment for exploring data.
3. Experimenter is an environment for performing experiments and conducting statistical
tests between learning schemes.
4. KnowledgeFlow is a Java-Beans-based interface for setting up and running machine
learning experiments.

For the exercises in this tutorial you will use ‘Explorer’. Click on the ‘Explorer’ button in the ‘WEKA
GUI Chooser’ window.

The ‘WEKA Explorer’ window appears on the screen.

3. Preprocessing Data

At the very top of the window, just below the title bar, there is a row of tabs. Only the first
tab, ‘Preprocess’, is active at the moment because there is no dataset open. The first three
buttons at the top of the preprocess section enable you to load data into WEKA. Data can be
imported from a file in various formats: ARFF, CSV, C4.5, or binary. It can also be read from a
URL or from an SQL database (using JDBC) [4]. The easiest and most common way of
getting data into WEKA is to store it as an Attribute-Relation File Format (ARFF) file.
You have already been given the “weather.arff” file for this exercise; therefore, you can skip
section 3.1, which guides you through the file conversion.

3.1. File Conversion

We assume that all your data is stored in a Microsoft Excel spreadsheet, “weather.xls”.

WEKA expects the data file to be in Attribute-Relation File Format (ARFF). Before you apply
the algorithm to your data, you need to convert the data first into a comma-separated file and then
into ARFF format (a file with the .arff extension) [1]. To save your data in comma-separated format, select
the ‘Save As…’ menu item from the Excel ‘File’ pull-down menu. In the ensuing dialog box select
‘CSV (Comma Delimited)’ from the file type pop-up menu, enter a name for the file, and click the
‘Save’ button. Dismiss any messages that appear by clicking ‘OK’. Open this file with Microsoft
Word. Your screen will look like the screen below.

The rows of the original spreadsheet are converted into lines of text in which the elements are
separated from each other by commas. In this file you need to change the first line, which holds
the attribute names, into the header structure that makes up the beginning of an ARFF file. Add
a @relation tag with the dataset’s name, an @attribute tag with the information for each
attribute, and a @data tag as shown below.

Choose ‘Save As…’ from the ‘File’ menu and specify ‘Text Only with Line Breaks’ as the file
type. Enter a file name and click the ‘Save’ button. Rename the file so that it has the extension .arff to
indicate that it is in ARFF format.
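
The manual conversion described above can also be scripted. The following is a minimal Python sketch, not part of WEKA: the function name csv_to_arff and the three-row sample are hypothetical, and the sketch assumes that a column whose values all parse as numbers should be declared numeric and that any other column is nominal. It does not handle missing values or values containing spaces or commas.

```python
import csv
import io


def csv_to_arff(csv_text, relation):
    """Convert CSV text (first row = attribute names) into ARFF text."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    names, data = rows[0], rows[1:]

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    lines = ["@relation " + relation, ""]
    for i, name in enumerate(names):
        column = [row[i] for row in data]
        if all(is_number(v) for v in column):
            # Every value parses as a number: declare a numeric attribute.
            lines.append("@attribute %s numeric" % name)
        else:
            # Otherwise list the distinct values in order of first appearance.
            values = list(dict.fromkeys(column))
            lines.append("@attribute %s {%s}" % (name, ", ".join(values)))
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# Hypothetical three-row sample, not the tutorial's full weather data.
sample = "outlook,temperature,play\nsunny,85,no\novercast,83,yes\nrainy,70,yes\n"
arff = csv_to_arff(sample, "weather")
print(arff)
```

The printed text starts with the @relation line, followed by one @attribute line per column and the @data section, which is the header structure described in this section.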

3.2. Opening file from a local file system

Click on the ‘Open file…’ button.

It brings up a dialog box allowing you to browse for the data file on the local file system. Choose the
“weather.arff” file.

Some databases have the ability to save data in CSV format. In this case, you can select the CSV
file from the local file system. If you would like to convert this file into ARFF format, you can click
on the ‘Save’ button. WEKA automatically creates an ARFF file from your CSV file.

3.3. Opening file from a web site

A file can be opened from a website. Suppose that “weather.arff” is on the following
website:

The URL of the web site in our example is http://gaia.ecs.csus.edu/~aksenovs/. It means that
the file is stored in this directory, just as in the case of your local file system. To open this file,
click on the ‘Open URL…’ button. It brings up a dialog box asking you to enter the source URL.

Enter the URL of the web site followed by the file name. In this example the URL is
http://gaia.ecs.csus.edu/~aksenovs/weather.arff, where weather.arff is the name of the file you
are trying to load from the website.

3.4. Reading data from a database

Data can also be read from an SQL database using JDBC. Click on the ‘Open DB…’ button;
the ‘GenericObjectEditor’ appears on the screen.

To read data from a database, click on the ‘Open’ button and select the database from the file system.

3.5. Preprocessing window

At the bottom of the window there is a ‘Status’ box. The ‘Status’ box displays messages
that keep you informed about what is going on. For example, when you first open the
‘Explorer’, the message says, “Welcome to the Weka Explorer”. While you are loading the
“weather.arff” file, the ‘Status’ box displays the message “Reading from file…”. Once the file is
loaded, the message in the ‘Status’ box changes to “OK”. Right-clicking anywhere in the ‘Status’
box brings up a menu with two options:

1. Available Memory displays, in the log and in the ‘Status’ box, the amount of
memory available to WEKA in bytes.
2. Run garbage collector forces the Java garbage collector to search for memory
that is no longer used and to free it so that it can be used for new
tasks.

To the right of the ‘Status’ box there is a ‘Log’ button that opens the log. The log records
every action in WEKA and keeps a record of what has happened. Each line of text in the log
contains the time of entry. For example, if the file you tried to open is not loaded, the log will have
a record of the problem that occurred during opening.
To the right of the ‘Log’ button there is an image of a bird. The bird is the WEKA status icon.
The number next to the ‘X’ symbol indicates the number of concurrently running processes. When
the bird is sitting, no processes are running: the number beside the ‘X’ symbol is zero, which
means that the system is idle. Later, in the classification problem, look at the bird while the
result is being generated: it gets up and starts moving, which indicates that a process has started.
The number next to ‘X’ becomes 1, which means that there is one process running, in this case
the calculation.

If the bird is standing and not moving for a long time, it means that something has gone wrong.
In this case you should restart WEKA Explorer.

Loading data
Let’s load the data and look at what happens in the ‘Preprocess’ window.

The most common and easiest way of loading data into WEKA is from an ARFF file, using the ‘Open
file…’ button (section 3.2). Click on the ‘Open file…’ button and choose the “weather.arff” file from your
local file system. Note that the data can be loaded from a CSV file as well, because some databases
can export data only in CSV format.

Once the data is loaded, WEKA recognizes the attributes, which are shown in the ‘Attributes’ window.
The left panel of the ‘Preprocess’ window shows the list of recognized attributes:

No. is a number that identifies the order of the attributes as they appear in the data file,
Selection tick boxes allow you to select the attributes for the working relation, and
Name is the name of an attribute as it was declared in the data file.

The ‘Current relation’ box above the ‘Attributes’ box displays the name of the base relation (table) and of the
current working relation (which are initially the same), “weather”; the number of instances, 14;
and the number of attributes, 5.

During the scan of the data, WEKA computes some basic statistics on each attribute. The
following statistics are shown in the ‘Selected attribute’ box on the right panel of the ‘Preprocess’
window:

Name is the name of an attribute,
Type is most commonly Nominal or Numeric,
Missing is the number (percentage) of instances in the data for which this attribute is
unspecified,
Distinct is the number of different values that the data contains for this attribute, and
Unique is the number (percentage) of instances in the data having a value for this attribute that
no other instances have.

An attribute can be deleted from the ‘Attributes’ window. Highlight the attribute you would like to
delete and press the Delete key on your keyboard.

By clicking on an attribute, you can see the basic statistics on that attribute. The frequency of
each attribute value is shown for categorical attributes. Min, max, mean, and standard deviation
(StdDev) are shown for continuous attributes.

Click on the attribute Outlook in the ‘Attributes’ window.

Outlook is nominal. Therefore, you can see the following frequency statistics for this attribute in
the ‘Selected attribute’ window:
Missing = 0 means that the attribute is specified for all instances (no missing values),
Distinct = 3 means that Outlook has three different values: sunny, overcast, rainy, and
Unique = 0 means that no instance has a value of Outlook that is not shared by at least one
other instance.

Just below these values there is a table displaying the count of instances for each value of the attribute Outlook.
As you can see, there are three values: sunny with 5 instances, overcast with 4 instances, and
rainy with 5 instances. These numbers match the numbers of instances in the base relation and in the
table “weather.xls”.

Let’s take a look at the attribute Temperature.

Temperature is numeric; therefore, you can see the minimum, maximum, mean, and standard
deviation in the ‘Selected attribute’ window.
Missing = 0 means that the attribute is specified for all instances (no missing values),
Distinct = 12 means that Temperature has twelve different values, and
Unique = 10 means that 10 instances have a Temperature value that no other instance
has.
The statistics describing the distribution of values in the data are Minimum, Maximum, Mean,
and Standard Deviation. Minimum = 64 is the lowest temperature and Maximum = 85 is the
highest temperature; the mean and standard deviation are shown as well.
Compare the result with the attribute table “weather.xls”; the numbers in WEKA match the
numbers in the table.
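
To make the Missing, Distinct, and Unique statistics concrete, here is a small Python sketch of how such counts can be computed for a single attribute column. It is an illustration only, not WEKA's code; the function name attribute_summary is hypothetical, and the sketch assumes that missing values are written as “?”. The Outlook column is rebuilt from the value counts reported in this section (5 sunny, 4 overcast, 5 rainy).

```python
from collections import Counter


def attribute_summary(values, missing_marker="?"):
    """Return (missing, distinct, unique) counts for one attribute column."""
    present = [v for v in values if v != missing_marker]
    counts = Counter(present)
    missing = len(values) - len(present)
    distinct = len(counts)
    # Unique counts the instances whose value occurs exactly once in the column.
    unique = sum(1 for c in counts.values() if c == 1)
    return missing, distinct, unique


# Outlook column rebuilt from the counts reported in the tutorial.
outlook = ["sunny"] * 5 + ["overcast"] * 4 + ["rainy"] * 5
print(attribute_summary(outlook))  # prints (0, 3, 0)
```

Every Outlook value occurs more than once, which is why Unique is 0 even though Distinct is 3.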

You can select a class in the ‘Class’ pull-down box. The last attribute in the ‘Attributes’
window is the default class selected in the ‘Class’ pull-down box.

You can visualize the attributes based on the selected class. You can either visualize the selected
attribute based on the class selected in the ‘Class’ pull-down box, or visualize all attributes by
clicking on the ‘Visualize All’ button.

3.6. Setting Filters

Pre-processing tools in WEKA are called “filters”. WEKA contains filters for
discretization, normalization, resampling, attribute selection, transformation and combination of
attributes [4]. Some techniques, such as association rule mining, can only be performed on
categorical data. This requires performing discretization on numeric or continuous attributes [5].
For the classification example you do not need to transform the data. For practice, suppose you
need to perform a test on categorical data. There are two attributes that need to be converted:
‘temperature’ and ‘humidity’. You will keep all of the values for these attributes in
the data, but discretize them by replacing the keyword "numeric" as the type of the
attribute with a set of “nominal” values. You can do this by
applying a filter.
In the ‘Filter’ box, click on the ‘Choose’ button.

This shows a pull-down menu with a list of available filters. Select Supervised → Attribute →
Discretize and click on the ‘Apply’ button. The filter will convert Numeric values into Nominal.

When a filter is chosen, the fields in the window change to reflect the available options.

As you can see, there is no change in the attribute Outlook. Select the attribute Temperature and look at the
‘Selected attribute’ box: the ‘Type’ field shows that the attribute type has changed from Numeric
to Nominal. The list has changed as well: instead of statistical values there is a count of instances,
and the count is 14, which means that there are 14 instances with a value for Temperature.
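
The idea of turning a numeric attribute into a nominal one can be illustrated with a short Python sketch. Note that this is a different, simpler technique than the supervised Discretize filter selected above, which chooses its cut points using the class attribute: the sketch uses plain equal-width binning. The function name equal_width_bins and the two middle readings are hypothetical; only the range 64 to 85 comes from the Temperature statistics in section 3.5.

```python
def equal_width_bins(values, bins):
    """Map each numeric value to a nominal interval label using equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    labels = []
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((v - low) / width), bins - 1) if width > 0 else 0
        start = low + index * width
        labels.append("[%g-%g]" % (start, start + width))
    return labels


# Hypothetical readings spanning the 64-85 range reported for Temperature.
temps = [64, 70, 75, 85]
print(equal_width_bins(temps, 3))
# prints ['[64-71]', '[64-71]', '[71-78]', '[78-85]']
```

After such a conversion each value is one of a small set of interval labels, so the attribute can be declared nominal instead of numeric.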

Note that when you right-click on the filter, a ‘GenericObjectEditor’ dialog box comes up on your screen.
The box lets you choose the filter configuration options. The same box can be used for
classifiers, clusterers and association rules.
Clicking on the ‘More’ button brings up an ‘Information’ window describing what the different options
can do.

At the bottom of the editor window there are four buttons. The ‘Open’ and ‘Save’ buttons allow you to
save object configurations for future use. The ‘Cancel’ button allows you to exit without saving
changes. Once you have made changes, click ‘OK’ to apply them.

4. Building “Classifiers”

Classifiers in WEKA are models for predicting nominal or numeric quantities. The
learning schemes available in WEKA include decision trees and lists, instance-based classifiers,
support vector machines, multi-layer perceptrons, logistic regression, and Bayes nets. “Meta”-
classifiers include bagging, boosting, stacking, error-correcting output codes, and locally
weighted learning [4].

Once you have your data set loaded, all the tabs are available to you. Click on the ‘Classify’ tab.

The ‘Classify’ window comes up on the screen.

Now you can start analyzing the data using the provided algorithms. In this exercise you will
analyze the data with the C4.5 algorithm using J48, WEKA’s implementation of this decision tree
learner. The sample data used in this exercise is the weather data from the file “weather.arff”.
Since the C4.5 algorithm can handle numeric attributes, in contrast to the ID3 algorithm from which
C4.5 evolved, there is no need to discretize any of the attributes. Before you start this
exercise, make sure you do not have any filters set in the ‘Preprocess’ window. The filter exercise in
section 3.6 was just for practice.

4.1. Choosing a Classifier

Click on the ‘Choose’ button in the ‘Classifier’ box just below the tabs and select the C4.5
classifier: WEKA → Classifiers → Trees → J48.

4.2. Setting Test Options

Before you run the classification algorithm, you need to set test options. Set test options in
the ‘Test options’ box. The test options that are available to you are [2]:

1. Use training set. Evaluates the classifier on how well it predicts the class of the
instances it was trained on.
2. Supplied test set. Evaluates the classifier on how well it predicts the class of a set of
instances loaded from a file. Clicking on the ‘Set…’ button brings up a dialog allowing
you to choose the file to test on.
3. Cross-validation. Evaluates the classifier by cross-validation, using the number of folds
that are entered in the ‘Folds’ text field.
4. Percentage split. Evaluates the classifier on how well it predicts a certain percentage of
the data, which is held out for testing. The amount of data held out depends on the value
entered in the ‘%’ field.
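
The difference between the percentage split and cross-validation options can be sketched in a few lines of Python. This is an illustration under stated assumptions, not WEKA's implementation: the function names are hypothetical, the shuffle uses Python's own random number generator (so the selected instances will not match WEKA's for the same seed), and no stratification is performed.

```python
import random


def percentage_split(instances, percent, seed=1):
    """Shuffle, then return (train, test) with `percent` % of the data in train."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    train_size = round(len(shuffled) * percent / 100)
    return shuffled[:train_size], shuffled[train_size:]


def cross_validation_folds(instances, folds, seed=1):
    """Shuffle, then yield one (train, test) pair per fold."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    for i in range(folds):
        # Every `folds`-th instance, starting at offset i, forms the test fold.
        test = shuffled[i::folds]
        train = [x for j, x in enumerate(shuffled) if j % folds != i]
        yield train, test


instances = list(range(14))  # stand-ins for the 14 weather instances
train, test = percentage_split(instances, 66)
print(len(train), len(test))  # prints 9 5
```

With 14 instances and a 66% split, 9 instances are used for training and 5 are held out for testing. With cross-validation, every instance is used for testing exactly once.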

In this exercise you will evaluate the classifier on how well it predicts the instances held out
for testing: 66% of the data is used for training and the remainder for testing. Check the ‘Percentage split’
radio button and keep the default value of 66%. Click on the ‘More
options…’ button.

Identify what is included in the output. In the ‘Classifier evaluation options’ make sure that the
following options are checked [2]:

1. Output model. The output includes the classification model built on the full training set, so that it
can be viewed, visualized, etc.
2. Output per-class stats. The precision/recall and true/false statistics for each class are
output.
3. Output confusion matrix. The confusion matrix of the classifier’s predictions is included
in the output.
4. Store predictions for visualization. The classifier’s predictions are remembered so
that they can be visualized.
5. Set ‘Random seed for Xval / % Split’ to 1. This specifies the random seed used when
randomizing the data before it is divided up for evaluation purposes.

The remaining options, which you do not use in this exercise but which are available to you, are:

6. Output entropy evaluation measures. Entropy evaluation measures are included in
the output.
7. Output predictions. The classifier’s predictions on the evaluation data are included in
the output.

Once the options have been specified, you can run the classification algorithm. Click on the
‘Start’ button to start the learning process. You can stop the learning process at any time by clicking
on the ‘Stop’ button.

When training is complete, the ‘Classifier output’ area on the right panel of the ‘Classify’
window is filled with text describing the results of training and testing. A new entry appears in
the ‘Result list’ box on the left panel of the ‘Classify’ window.

4.3. Analyzing Results

=== Run information ===

Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     weather
Instances:    14
Attributes:   5
              outlook
              temperature
              humidity
              windy
              play
Test mode:    split 66% train, remainder test

Run Information gives you the following information:
• the algorithm you used: J48
• the relation name: “weather”
• the number of instances in the relation: 14
• the number of attributes in the relation: 5, and the list of the attributes: outlook, temperature, humidity, windy, play
• the test mode you selected: split = 66%

=== Classifier model (full training set) ===

J48 pruned tree
------------------

outlook = sunny
|   humidity <= 75: yes (2.0)
|   humidity > 75: no (3.0)
outlook = overcast: yes (4.0)
outlook = rainy
|   windy = t: no (2.0)
|   windy = f: yes (3.0)

Number of Leaves  :     5

Size of the tree :      8

Time taken to build model: 0.06 seconds

The classifier model is a pruned decision tree in textual form that was
produced on the full training data. As you can see, the first split is
on the ‘outlook’ attribute; at the second level, the splits are on
‘humidity’ and ‘windy’.
In the tree structure, a colon precedes the class label that has
been assigned to a particular leaf, followed by the number of
instances that reach that leaf.
Below the tree structure are the number of leaves (which is 5)
and the number of nodes in the tree, that is, the size of the tree (which is 8).
The program also reports the time it took to build the model, which is 0.06
seconds.
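
One way to check your reading of the textual tree is to write it out as nested conditions. The Python function below is a hand transcription of the J48 tree in the output of this section; the function name predict_play is hypothetical, and the value spellings ("t" and "f" for windy) are taken from the tree as printed.

```python
def predict_play(outlook, humidity, windy):
    """Follow the pruned J48 tree from the output and return the predicted class."""
    if outlook == "sunny":
        # Second-level split on humidity.
        return "yes" if humidity <= 75 else "no"
    if outlook == "overcast":
        # This branch is a leaf: always 'yes'.
        return "yes"
    if outlook == "rainy":
        # Second-level split on windy.
        return "no" if windy == "t" else "yes"
    raise ValueError("unknown outlook value: %r" % outlook)


print(predict_play("sunny", 70, "f"))  # prints yes
print(predict_play("rainy", 80, "t"))  # prints no
```

The function has five return statements for class labels, one per leaf, and three tests on attributes, one per internal node, which matches the reported 5 leaves and tree size of 8.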

=== Evaluation on test split ===
=== Summary ===

Correctly Classified Instances           2               40      %
Incorrectly Classified Instances         3               60      %
Kappa statistic                         -0.3636
Mean absolute error                      0.6
Root mean squared error                  0.7746
Relative absolute error                126.9231 %
Root relative squared error            157.6801 %
Total Number of Instances                5

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.667     1         0.5         0.667    0.571       yes
0         0.333     0           0        0           no

=== Confusion Matrix ===

 a b   <-- classified as
 2 1 | a = yes
 2 0 | b = no

Evaluation on test split. This part of the output gives estimates of
the tree’s predictive performance, generated by WEKA’s
evaluation module. It outputs a list of statistics summarizing how
accurately the classifier was able to predict the true class of the
instances under the chosen test mode. Because ‘Percentage split’ was
selected, the measurements are derived from the 5 instances held out
for testing, not from the training data.
In this case only 2 of the 5 test instances (40%) have been
classified correctly. Because the test instances were not used for
training, this estimate is not optimistically biased in the way that
results obtained on the training data would be. In
addition to the classification error, the evaluation module outputs
measurements derived from the class probabilities assigned by
the tree. More specifically, it outputs the mean absolute error (0.6) of
the probability estimates and the root mean squared error (0.77), which is
the square root of the mean quadratic loss. The mean absolute error is
calculated in a similar way by using the absolute instead of the
squared difference. The errors are not 1 or 0
because not all test instances are classified correctly.

Detailed Accuracy By Class gives a more detailed per-class
breakdown of the classifier’s prediction accuracy.

From the Confusion Matrix you can see that one instance of
class ‘yes’ has been assigned to class ‘no’, and two instances of class
‘no’ have been assigned to class ‘yes’.
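
Several of the numbers in this output can be recomputed by hand from the confusion matrix alone. The Python sketch below does this using the standard textbook definitions of accuracy, the kappa statistic, TP rate (recall), FP rate, precision, and F-measure; the function name evaluate is hypothetical and the code is not taken from WEKA. The error measures based on class probabilities (mean absolute error and so on) cannot be recovered from the confusion matrix and are not covered.

```python
def evaluate(matrix, labels):
    """Compute accuracy, kappa and per-class statistics from a confusion matrix.

    matrix[i][j] is the number of instances of actual class i predicted as class j.
    """
    n = sum(sum(row) for row in matrix)
    k = len(labels)
    correct = sum(matrix[i][i] for i in range(k))
    accuracy = correct / n
    # Agreement expected by chance, from the row and column totals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(k)
    ) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)

    per_class = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        actual = sum(matrix[i])                     # instances truly in this class
        predicted = sum(row[i] for row in matrix)   # instances assigned to this class
        fp = predicted - tp
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        fp_rate = fp / (n - actual) if n - actual else 0.0
        if precision + recall:
            f_measure = 2 * precision * recall / (precision + recall)
        else:
            f_measure = 0.0
        per_class[label] = (recall, fp_rate, precision, f_measure)
    return accuracy, kappa, per_class


accuracy, kappa, per_class = evaluate([[2, 1], [2, 0]], ["yes", "no"])
print(round(accuracy, 3), round(kappa, 4))  # prints 0.4 -0.3636
for label, stats in per_class.items():
    print(label, [round(s, 3) for s in stats])
```

The results agree with the WEKA output: 40% accuracy, a kappa statistic of -0.3636, and for class ‘yes’ a TP rate of 0.667, FP rate of 1, precision of 0.5, and F-measure of 0.571. A negative kappa means that the classifier agreed with the true classes less often than would be expected by chance.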

4.4. Visualization of Results

After training a classifier, the result list adds an entry.

WEKA lets you see a graphical representation of the classification tree. Right-click on the
entry in the ‘Result list’ for which you would like to visualize a tree. It invokes a menu containing the
following items:

Select the item ‘Visualize tree’; a new window comes up on the screen displaying the tree.

WEKA also lets you visualize classification errors. Right-click on the entry in the ‘Result list’ again
and select ‘Visualize classifier errors’ from the menu:

The ‘Weka Classifier Visualize’ window, displaying a graph, appears on the screen.

In the ‘Weka Classifier Visualize’ window, beneath the X-axis selector there is a drop-
down list, ‘Colour’, for choosing the color scheme. This allows you to choose the color of points
based on the attribute selected. Below the plot area, there is a legend that describes what
values the colors correspond to. In your example, red represents ‘no’, while blue represents
‘yes’. For better visibility you should change the color of the label ‘yes’. Left-click on ‘yes’ in the
‘Class colour’ box and select a lighter color from the color palette.

To the right of the plot area there is a series of horizontal strips. Each strip represents an
attribute, and the dots within it show the distribution of values of the attribute. You can choose
which axes are used in the main graph by clicking on these strips (left-click changes the X-axis, right-
click changes the Y-axis).
Change the X-axis to the ‘Outlook’ attribute and the Y-axis to ‘Play’. The instances are spread out in the
plot area and concentration points are not visible. Keep sliding ‘Jitter’, a random displacement
given to all points in the plot, to the right, until you can spot concentration points.

On the plot you can see the results of classification. Correctly classified instances are
represented as crosses, incorrectly classified ones as squares. In this example, in
the lower left corner you can see a blue cross indicating a correctly classified instance: if Outlook =
‘sunny’ → play = ‘yes’.

Look at the upper left corner of the graph: there are two red squares in this corner. A square
represents an incorrectly classified instance. The following is not correct: if Outlook = ‘sunny’ →
play = ‘no’.

Classification Exercise
Use the ID3 algorithm to classify the weather data from the “weather.arff” file. Perform initial
preprocessing and create a version of the initial dataset in which all numeric attributes are
converted to categorical data.

5. Clustering Data

WEKA contains “clusterers” for finding groups of similar instances in a dataset. The
clustering schemes available in WEKA are k-Means, EM, Cobweb, X-means, and FarthestFirst.
Clusters can be visualized and compared to “true” clusters (if given). Evaluation is based on log
likelihood if the clustering scheme produces a probability distribution [4].
For this exercise we will use customer data [6] that is contained in the “customers.arff” file
and analyze it with the k-means clustering scheme.

An international online catalog company wishes to group its customers based on common
features. Company management does not have any predefined labels for these groups. Based
on the outcome of the grouping, they will target marketing and advertising campaigns to the
different groups. The information they have about the customers includes income, age, number
of children, marital status, and education. For our exercise we will use a part of the database for
customers in the US. Depending on the type of advertising, not all attributes are important. For
example, suppose the advertising is for a special sale on children’s clothes. We will target the
advertising only to people with young children. The clustering that you will perform in this
exercise is as follows. The first group of people has young children and a high school degree;
the second group does not have children but has a high school degree. The third group has both
children and a college degree. The fourth group has a higher income and at least a college
degree. The fifth group has children and a higher degree. A different clustering would have been
found by examining either age or marital status.

In the ‘Preprocess’ window click on the ‘Open file…’ button and select the “customers.arff” file. Click the
‘Cluster’ tab at the top of the WEKA Explorer window.

5.1. Choosing Clustering Scheme

In the ‘Clusterer’ box click on ‘Choose’ button. In pull-down menu select WEKA Æ
Clusterers, and select the cluster scheme ‘SimpleKMeans’. Some implementations of K-means
only allow numerical values for attributes; therefore, we do not need to use a filter.

Once the clustering algorithm is chosen, right-click on the algorithm,

“weak.gui.GenericObjectEditor” comes up to the screen. Set the value in “numClusters” box to 5
(instead of default 2) because you have five clusters in your .arff file. Leave the value of ‘seed’
as is. The seed value is used in generating a random number, which is used for making the
initial assignment of instances to clusters. Note that, in general, K-means is quite sensitive to
how clusters are initially assigned. Thus, it is often necessary to try different values and
evaluate the results.
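
The effect of the seed can be illustrated outside WEKA. The following is a minimal sketch in Python of a plain k-means on numeric data only; it is not WEKA's SimpleKMeans, and the function names and the toy data are hypothetical. The seed decides only which instances become the initial centroids, and different seeds may lead to different final clusters and errors.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed, max_iter=100):
    """Plain k-means; returns (centroids, assignments, sum of squared errors)."""
    rng = random.Random(seed)
    # The seed matters only here: it picks the k starting centroids.
    centroids = rng.sample(points, k)
    assignments = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the clustering has converged
        assignments = new_assignments
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    sse = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignments))
    return centroids, assignments, sse

# Hypothetical (income in thousands, age) pairs, made up for illustration.
toy = [(20, 25), (25, 35), (15, 25), (30, 20), (20, 40),
       (70, 50), (100, 60), (90, 30), (200, 45)]

# Same data, same k, three different seeds.
results = {seed: kmeans(toy, k=3, seed=seed) for seed in (1, 2, 10)}
for seed, (_, assignments, sse) in results.items():
    print(f"seed={seed}: assignments={assignments}, SSE={sse:.1f}")
```

Comparing the printed errors across seeds is the same kind of check you would do in WEKA by rerunning the clusterer with a different ‘seed’ value.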

5.2. Setting Test Options

Before you run the clustering algorithm, you need to choose ‘Cluster mode’. Click on
‘Classes to cluster evaluation’ radio-button in ‘Cluster mode’ box and select ‘marital_status’ in
the pull-down box below. It means that you will compare how well the chosen clusters match up
with a pre-assigned class (‘marital_status’) in the data.

Once the options have been specified, you can run the clustering algorithm. Click on the ‘Start’
button to execute the algorithm.

When training is complete, the ‘Cluster’ output area on the right panel of ‘Cluster’
window is filled with text describing the results of training and testing. A new entry appears in
the ‘Result list’ box on the left of the result. These behave just like their classification
counterparts.

5.3. Analyzing Results
=== Run information ===

Scheme: weka.clusterers.SimpleKMeans -N 5 -S 10
Relation: customers
Instances: 9
Attributes: 5
income
age
children
education
Ignored:
marital_status
Test mode: Classes to clusters evaluation on training data

‘Run Information’ gives you the following information:

• the clustering scheme used: SimpleKMeans with 5 clusters
• the relation name “customers”
• number of instances in the relation – 9
• number of attributes in the relation – 5
• list of attributes used in clustering
• the ignored attribute ‘marital_status’, which is left out of the clustering and used only as the class for evaluation

=== Clustering model (full training set) ===
kMeans
======

Number of iterations: 4
Within cluster sum of squared errors: 3.449558299853908

Cluster centroids:

Cluster 0
Mean/Mode: 22500 30 3 high_school
Std Devs: 3535.5339 7.0711 N/A N/A
Cluster 1
Mean/Mode: 145000 37.5 0 graduate_school
Std Devs: 77781.7459 10.6066 N/A N/A
Cluster 2
Mean/Mode: 85000 55 0 college
Std Devs: 21213.2034 7.0711 N/A N/A
Cluster 3
Mean/Mode: 15000 25 1 high_school
Std Devs: 0 0 N/A N/A
Cluster 4
Mean/Mode: 25000 30 0 high_school
Std Devs: 7071.0678 14.1421 N/A N/A

The clustering model shows the centroid of each cluster and statistics on the number and
percentage of instances assigned to different clusters. Cluster centroids are the mean vectors for
each cluster; so, each dimension value in the centroid represents the mean value (or, for a
nominal attribute, the mode) for that dimension in the cluster. Thus, centroids can be used to
characterize the clusters. WEKA generated clusters are:

Cluster 0 shows that this is a segment of cases representing 25 and 35 year old, either single
or divorced, people with income $22,500 on average, who have 3 children.
In cluster 1 there are 30 and 45 year old married people who do not have children.
In cluster 2 there are 50 and 60 year old married and divorced people with higher income,
college degree and no children.
Cluster 3 represents 25 year old married people with one child, lower income and high
school degree.
Cluster 4 represents 20 and 40 year old single and divorced people with lower income, high
school degree and no children.

=== Evaluation on training set ===

kMeans
======
Number of iterations: 4
Within cluster sum of squared errors: 6.899116599707816

Cluster centroids:

Cluster 0
Mean/Mode: 22500 30 3 high_school
Std Devs: 3535.5339 7.0711 N/A N/A
Cluster 1
Mean/Mode: 145000 37.5 0 graduate_school
Std Devs: 77781.7459 10.6066 N/A N/A
Cluster 2
Mean/Mode: 85000 55 0 college
Std Devs: 21213.2034 7.0711 N/A N/A
Cluster 3
Mean/Mode: 15000 25 1 high_school
Std Devs: 0 0 N/A N/A
Cluster 4
Mean/Mode: 25000 30 0 high_school
Std Devs: 7071.0678 14.1421 N/A N/A

In the evaluation on the training set, the sum of errors within the clusters is recalculated.
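
How a ‘Mean/Mode’ row comes about can be sketched in a few lines of Python. The two rows below are hypothetical: they were chosen so that they reproduce the centroid and standard deviations printed for cluster 0, but they are not taken from “customers.arff” (in particular, which income goes with which age is a guess), and the helper `centroid` is an illustrative name, not a WEKA function.

```python
from collections import Counter
from statistics import mean, stdev

def centroid(rows, numeric_columns):
    """Mean for numeric columns, most frequent value (mode) for the others."""
    result = []
    for index, column in enumerate(zip(*rows)):
        if index in numeric_columns:
            result.append(mean(column))
        else:
            result.append(Counter(column).most_common(1)[0][0])
    return result

# Assumed members of cluster 0: (income, age, children, education).
# 'children' is kept as text because the output treats it as nominal (Std Devs: N/A).
cluster_0 = [
    (20000, 25, "3", "high_school"),
    (25000, 35, "3", "high_school"),
]

center = centroid(cluster_0, numeric_columns={0, 1})
income_sd = stdev(row[0] for row in cluster_0)  # sample standard deviation
age_sd = stdev(row[1] for row in cluster_0)

print("Mean/Mode:", center)
print("Std Devs: ", round(income_sd, 4), round(age_sd, 4))
```

With only two members the sample standard deviation is simply the difference between the two values divided by the square root of 2, which is why an income gap of 5,000 yields 3535.5339.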

Clustered Instances

0 2 ( 22%)
1 2 ( 22%)
2 2 ( 22%)
3 1 ( 11%)
4 2 ( 22%)

Class attribute: marital_status
Classes to Clusters:

0 1 2 3 4 <-- assigned to cluster
1 0 0 0 1 | single
0 2 1 1 0 | married
1 0 1 0 1 | divorced

Cluster 0 <-- No class
Cluster 1 <-- married
Cluster 2 <-- divorced
Cluster 3 <-- No class
Cluster 4 <-- single

Incorrectly clustered instances : 5.0 55.5556 %

The ‘Clustered Instances’ section shows the number of instances in each new cluster.
For example, cluster 3 has 1 instance: people of age 25 who have one child.
Cluster 4 has 2 instances: people of age 30 on average (including 20 and 40 y.o.), whose
average income is $25,000, with high school education and no children.

‘Classes to Clusters’ shows how the classes (‘marital_status’) are assigned to clusters.

The last line shows that 5 instances are incorrectly clustered, which is 55.56% of the data.
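
The count of incorrectly clustered instances can be checked by hand from the ‘Classes to Clusters’ matrix. The sketch below, in Python, copies the matrix from the output and tries every way of giving each class its own cluster, keeping the assignment with the most matches. This brute-force search is an assumption made for illustration and is not claimed to be the procedure WEKA uses internally; it also assumes there are at least as many clusters as classes.

```python
from itertools import permutations

# Rows are classes, columns are clusters 0..4, copied from the WEKA output.
counts = {
    "single":   [1, 0, 0, 0, 1],
    "married":  [0, 2, 1, 1, 0],
    "divorced": [1, 0, 1, 0, 1],
}

def classes_to_clusters_error(counts):
    """Return (number incorrect, percentage incorrect) for the best mapping."""
    classes = list(counts)
    n_clusters = len(next(iter(counts.values())))
    total = sum(sum(row) for row in counts.values())
    best_correct = 0
    # Each permutation assigns a distinct cluster to every class.
    for clusters in permutations(range(n_clusters), len(classes)):
        correct = sum(counts[cls][clu] for cls, clu in zip(classes, clusters))
        best_correct = max(best_correct, correct)
    incorrect = total - best_correct
    return incorrect, 100.0 * incorrect / total

incorrect, percent = classes_to_clusters_error(counts)
print(f"Incorrectly clustered instances: {incorrect} ({percent:.4f} %)")
```

At most 4 of the 9 instances can be matched (2 married, 1 single, 1 divorced), which leaves the 5 incorrectly clustered instances reported above.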

5.4. Visualization of Results

Another way of representing the results of clustering is through visualization. Right-click
on the entry in the ‘Result list’ and select ‘Visualize cluster assignments’ in the pull-down
window.

This brings up the ‘Weka Clusterer Visualize’ window.

On the ‘Weka Clusterer Visualize’ window, beneath the X-axis selector there is a drop-down
list, ‘Colour’, for choosing the color scheme. This allows you to choose the color of points
based on the attribute selected. Below the plot area, there is a legend that describes what
values the colors correspond to. In your example, seven different colors represent seven
numbers (number of children). For better visibility you should change the color of label ‘3’.
Left-click on ‘3’ in the ‘Class colour’ box and select a lighter color from the color palette.
To the right of the plot area there is a series of horizontal strips. Each strip represents an
attribute, and the dots within it show the distribution of values of the attribute. You can choose
what axes are used in the main graph by clicking on these strips (left-click changes X-axis,
right-click changes Y-axis). Set X - axis to ‘Cluster’ attribute, Y - axis to ‘Age’. Select ‘Children’ as the
color dimension. You can see the result in a visual rendering of the relationship within each
cluster. For instance, you can note that ‘cluster 0’ represents a group of people of age 25 and
35, who have 3 children, ‘cluster 1’ represents a group of people of age 30 and 45 who do not
have children, ‘cluster 2’ represents 50 and 60 year old people with no children, ‘cluster 3’
represents 25 year old married people with one child, and ‘cluster 4’ represents 20 and 40 year
old people without children.
Correctly clustered instances are represented by crosses, incorrectly clustered ones by
squares. By changing the color dimension to other attributes, you
can see their distribution within each of the clusters.
You may want to save the resulting data set, which includes each instance along with its
assigned cluster. To do so, click ‘Save’ button in the visualization window and save the result as
the file “customers_kmeans.arff”.

As you can see, a new attribute, ‘cluster’, appears in the file; it was added by
WEKA. This attribute represents the clustering done by WEKA.

Clustering Exercise
Apply the k-means algorithm to the bank data from the “bank.arff” file. Perform initial preprocessing and
create a version of the initial data set in which the ID field is removed and the "children"
attribute is converted to categorical data.

6. Finding Associations

WEKA contains an implementation of the Apriori algorithm for learning association rules.
This is the only currently available scheme for learning associations in WEKA. It works only with
discrete data and will identify statistical dependencies between groups of attributes (for example,
among items such as milk, peanut butter, bread, jelly, beer and diapers), subject to thresholds such
as confidence 40% and support 30%. Apriori can
compute all rules that have a given minimum support and exceed a given confidence.
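
Support and confidence are simple ratios, and it can help to compute them once by hand. The Python sketch below uses five made-up shopping baskets over the items named above; the baskets and the function names are hypothetical and are not read from any WEKA file.

```python
# Five hypothetical baskets, each a set of purchased items.
transactions = [
    {"bread", "jelly", "peanut_butter"},
    {"bread", "peanut_butter"},
    {"bread", "milk", "peanut_butter"},
    {"beer", "bread"},
    {"beer", "milk"},
]

def count(itemset, transactions):
    """Number of baskets that contain every item of `itemset`."""
    return sum(itemset <= basket for basket in transactions)

def support(itemset, transactions):
    """Fraction of all baskets that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)

def confidence(premise, consequent, transactions):
    """Of the baskets containing the premise, the fraction that also hold the consequent."""
    return count(premise | consequent, transactions) / count(premise, transactions)

bread_support = support({"bread"}, transactions)
bread_to_pb = confidence({"bread"}, {"peanut_butter"}, transactions)
milk_to_bread = confidence({"milk"}, {"bread"}, transactions)

print("support(bread)                    =", bread_support)
print("confidence(bread ==> peanut_butter) =", bread_to_pb)
print("confidence(milk ==> bread)          =", milk_to_bread)
```

A rule is reported only if the support of its items reaches the minimum support and its confidence reaches the minimum confidence.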

6.1. Choosing Association Scheme

Click ‘Associate’ tab at the top of ‘WEKA Explorer’ window. It brings up interface for the
Apriori algorithm.

The association rule scheme cannot handle numeric values; therefore, for this exercise you will
use grocery store data from the “grocery.arff” file where all values are nominal. Go back to
‘Preprocessing’ section described in part 3 and open “grocery.arff” file.

6.2. Setting Test Options

Check the text field in the ‘Associator’ box at the top of the window. As you can see,
there are no other associators to choose and no extra options for testing the learning scheme.

Right-click on the ‘Associator’ box; the ‘GenericObjectEditor’ appears on your screen. In the dialog
box, change the value in ‘minMetric’ to 0.4 for confidence = 40%. Make sure that the number of
rules is left at its default value of 10. The upper bound for minimum support ‘upperBoundMinSupport’
should be set to 1.0 (100%) and ‘lowerBoundMinSupport’ to 0.1. Apriori in WEKA starts with the
upper bound support and incrementally decreases support (by delta increments, which by
default is set to 0.05 or 5%). The algorithm halts when either the specified number of rules is
generated, or the lower bound for minimum support is reached. The ‘significanceLevel’ testing
option is only applicable in the case of confidence and is (-1.0) by default (not used).
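
The stopping behaviour described above can be sketched as a loop. The Python code below is an illustration under stated assumptions: the baskets are made up, the function names are hypothetical, and the rules are found by brute-force enumeration rather than by Apriori's level-wise candidate generation, so it shows only how the support threshold is lowered, not how WEKA searches.

```python
from itertools import combinations

# Five hypothetical baskets (not the contents of any WEKA data file).
transactions = [
    {"bread", "jelly", "peanut_butter"},
    {"bread", "peanut_butter"},
    {"bread", "milk", "peanut_butter"},
    {"beer", "bread"},
    {"beer", "milk"},
]

def count(itemset, transactions):
    return sum(itemset <= basket for basket in transactions)

def rules_at(min_support, min_conf, transactions):
    """Brute force: every rule premise ==> consequent that meets both thresholds."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    found = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            whole = frozenset(combo)
            whole_count = count(whole, transactions)
            if whole_count == 0 or whole_count / n < min_support:
                continue
            # Split the itemset into every non-empty premise and its remainder.
            for k in range(1, size):
                for premise in map(frozenset, combinations(combo, k)):
                    conf = whole_count / count(premise, transactions)
                    if conf >= min_conf:
                        found.append((premise, whole - premise, conf))
    return found

def lower_support_until(num_rules, min_conf, upper, lower, delta, transactions):
    """Start at `upper`, step down by `delta`, stop at enough rules or at `lower`."""
    steps = round((upper - lower) / delta)
    min_support, rules = upper, []
    for step in range(steps + 1):
        min_support = round(upper - step * delta, 10)  # avoid float drift
        rules = rules_at(min_support, min_conf, transactions)
        if len(rules) >= num_rules:
            break
    return min_support, rules

final_support, rules = lower_support_until(
    num_rules=10, min_conf=0.4, upper=1.0, lower=0.1, delta=0.05,
    transactions=transactions,
)
print("stopped at minimum support", final_support, "with", len(rules), "rules")
```

The parameters of `lower_support_until` correspond, in order, to the number of rules, ‘minMetric’, ‘upperBoundMinSupport’, ‘lowerBoundMinSupport’ and the delta of the dialog box.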

Once the options have been specified, you can run Apriori algorithm. Click on the ‘Start’ button
to execute the algorithm.

6.3. Analyzing Results

=== Run information ===

Scheme: weka.associations.Apriori -N 10 -T 0 -C 0.4 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -A false -c -1
Relation: grocery_store
Instances: 5
Attributes: 5
bread
jelly
peanut_butter
milk
beer

Run Information gives you the following information:

• the scheme for learning association we used - Apriori
• the relation name – “grocery_store”
• number of instances in the relation – 5
• number of attributes in the relation – 5 and the list of attributes

=== Associator model (full training set) ===

The results for Apriori algorithm are the following:

Apriori
=======

Minimum support: 0.3
Minimum metric <confidence>: 0.4
Number of cycles performed: 14

Generated sets of large itemsets:
Size of set of large itemsets L(1): 5
Size of set of large itemsets L(2): 7
Size of set of large itemsets L(3): 2

Best rules found:
1. peanut_butter=yes 3 ==> bread=yes 3 conf:(1)
2. jelly=yes 1 ==> bread=yes 1 conf:(1)
3. jelly=yes 1 ==> peanut_butter=yes 1 conf:(1)
4. jelly=yes peanut_butter=yes 1 ==> bread=yes 1 conf:(1)
5. bread=yes jelly=yes 1 ==> peanut_butter=yes 1 conf:(1)
6. jelly=yes 1 ==> bread=yes peanut_butter=yes 1 conf:(1)
7. peanut_butter=yes milk=yes 1 ==> bread=yes 1 conf:(1)
8. bread=yes milk=yes 1 ==> peanut_butter=yes 1 conf:(1)
9. bread=yes 4 ==> peanut_butter=yes 3 conf:(0.75)
10. milk=yes 2 ==> bread=yes 1 conf:(0.5)

First, the program generated the sets of large itemsets found for each support size
considered. In this case two item sets of three items were found to have the required
minimum support.

By default, Apriori tries to generate ten rules. It begins with a minimum support of 100% of
the data items and decreases this in steps of 5% until there are at least ten rules with the
required minimum confidence, or until the support has reached a lower bound of 10%,
whichever occurs first. The minimum confidence is set to 0.4 (40%). As you can see, the
minimum support had to be decreased to 0.3 (30%) before the required number of rules could
be generated. Generation of the required number of rules involved a total of 14 iterations.

The last part gives the association rules that are found. The number preceding the ==> symbol
indicates the rule’s support, that is, the number of items covered by its premise. Following the
rule is the number of those items for which the rule’s consequent holds as well. In the
parentheses there is the confidence of the rule.

Association Rules Exercise

Use Apriori algorithm to generate association rules for Iris data from the “iris.arff” file. Perform
initial preprocessing and create a version of the initial data set in which the numeric attributes
should be converted to categorical data.

7. Attribute Selection

Attribute selection searches through all possible combinations of attributes in the data
and finds which subset of attributes works best for prediction [1]. Attribute selection methods
contain two parts: a search method such as best-first, forward selection, random, exhaustive,
genetic algorithm, ranking, and an evaluation method such as correlation-based, wrapper,
information gain, chi-squared. Attribute selection mechanism is very flexible - WEKA allows
(almost) arbitrary combinations of the two methods [4].
For this exercise you will use weather data from the “weather.arff” file. To begin an
attribute selection, click ‘Select attributes’ tab.

7.1. Selecting Options

To search through all possible combinations of attributes in the data and find which
subset of attributes works best for prediction, make sure that you set up attribute evaluator to
‘CfsSubsetEval’ and a search method to ‘BestFirst’. The evaluator will determine what method
to use to assign a worth to each subset of attributes. The search method will determine what
style of search to perform.
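
The division of labour between the two parts can be sketched in Python: the search proposes subsets and the evaluator scores them. The code below is a simplified greedy forward search, not WEKA's BestFirst (which can also back up to earlier subsets), and `toy_merit` is a made-up stand-in for an evaluator such as CfsSubsetEval. All names and numbers are hypothetical.

```python
def forward_selection(attributes, merit):
    """Greedy forward search: add the best attribute while the merit improves."""
    selected = []
    best_score = merit(selected)
    while True:
        candidates = [a for a in attributes if a not in selected]
        scored = [(merit(selected + [a]), a) for a in candidates]
        if not scored:
            break  # every attribute is already selected
        score, attribute = max(scored)
        if score <= best_score:
            break  # no single addition improves the current subset
        selected.append(attribute)
        best_score = score
    return selected, best_score

# Made-up relevance of each attribute to the class; NOT values computed by WEKA.
relevance = {"outlook": 0.25, "temperature": 0.03, "humidity": 0.08, "windy": 0.05}

def toy_merit(subset):
    """Stand-in evaluator: reward relevance, penalize subset size."""
    return sum(relevance[a] for a in subset) - 0.1 * len(subset)

selected, score = forward_selection(list(relevance), toy_merit)
print("selected attributes:", selected, "merit:", round(score, 3))
```

Swapping in a different `merit` function changes what counts as a good subset without touching the search, which is the flexibility the two-part design is meant to give.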
The options that you can set for selection in the ‘Attribute Selection Mode’ box are [2]:

1. Use full training set. The worth of the attribute subset is determined using the
full set of training data.
2. Cross-validation. The worth of the attribute subset is determined by a process
of cross-validation. The ‘Fold’ and ‘Seed’ fields set the number of folds to use
and the random seed used when shuffling the data.

Specify which attribute to treat as the class in the drop-down box below the test options.
Once all the test options are set, you can start the attribute selection process by clicking
on ‘Start’ button.

When it is finished, the results of selection are shown on the right part of the window and an entry
is added to the ‘Result list’.

7.2. Analyzing Results

=== Run information ===

Evaluator: weka.attributeSelection.CfsSubsetEval
Search: weka.attributeSelection.BestFirst -D 1 -N 5
Relation: weather
Instances: 14
Attributes: 5
outlook
temperature
humidity
windy
play
Evaluation mode: evaluate on all training data

Run Information gives you the following information:

• the evaluator we used – CfsSubsetEval
• the search method – BestFirst
• the relation name – “weather”
• number of instances in the relation – 14
• number of attributes in the relation – 5 and the list of attributes

=== Attribute Selection on all input data ===

Search Method:
Best first.
Start set: no attributes
Search direction: forward
Stale search after 5 node expansions
Total number of subsets evaluated: 11
Merit of best subset found: 0.196

Attribute Subset Evaluator (supervised, Class (nominal): 5 play):
CFS Subset Evaluator

Selected attributes: 1 : 1
outlook

The search method selected is Best First. The software started the search with no attributes,
and it is a forward search. We evaluated 11 subsets and the merit of the best subset is 0.196.
The attribute evaluator used is CFS Subset Evaluator. We used supervised learning with
labels in the attribute ‘play’.
The selected attribute for prediction is ‘outlook’.

7.3. Visualizing Results

Right-click on the entry in the ‘Result list’. From the pull-down menu select ‘Visualize
reduced data’.

In the window below you can see a prediction for ‘play’ depending on the ‘outlook’. For better
visibility the color of label ‘yes’ was changed to the lighter one and ‘Jitter’ was slid to the right to
see concentration points.
In the WEKA visualization window, beneath the X-axis selector there is a drop-down list,
‘Colour’, for choosing the color scheme. This allows you to choose the color of points based on
the attribute selected. Below the plot area, there is a legend that describes what values the
colors correspond to. In your example, red represents ‘no’, while blue represents ‘yes’. For
better visibility you should change the color of label ‘yes’. Left-click on ‘yes’ in the ‘Class colour’
box and select lighter color from the color palette.
To the right of the plot area there is a series of horizontal strips. Each strip represents an
attribute, and the dots within it show the distribution of values of the attribute. You can choose
what axes are used in the main graph by clicking on these strips (left-click changes X-axis, right-
click changes Y-axis).
Change X - axis to ‘Outlook’ attribute and Y - axis to ‘Play’. The instances are spread out in the
plot area and concentration points are not visible. Keep sliding ‘Jitter’, a random displacement
given to all points in the plot, to the right, until you can spot concentration points.
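
What ‘Jitter’ does to the plot can be sketched in Python. With nominal attributes many instances share exactly the same coordinates and are drawn on top of each other; adding a small random offset pulls them apart. The function name, the numeric encoding of the nominal values and the amount of displacement below are assumptions made for illustration, not WEKA's own code.

```python
import random

def jitter(points, amount, seed=0):
    """Shift every point by a random offset of at most `amount` in x and y."""
    rng = random.Random(seed)
    return [
        (x + rng.uniform(-amount, amount), y + rng.uniform(-amount, amount))
        for x, y in points
    ]

# Hypothetical encoding: outlook sunny=0, rainy=2; play no=0, yes=1.
# Three instances fall on the same spot (sunny, no) and hide each other.
stacked = [(0, 0), (0, 0), (0, 0), (2, 1)]

spread = jitter(stacked, amount=0.2)
for original, moved in zip(stacked, spread):
    print(original, "->", tuple(round(v, 3) for v in moved))
```

Sliding ‘Jitter’ further to the right corresponds to a larger `amount`: the stacked points spread out more, so dense spots become easier to see.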

The prediction is as follows: if the ‘outlook’ is ‘sunny’, play = ‘yes’, and if the ‘outlook’ is ‘rainy’,
play = ‘no’, which is very likely to happen. There are a few instances displayed in the window that
may or may not happen: if ‘outlook’ = ‘sunny’, ‘play’ = ‘no’ and if ‘outlook’ = ‘rainy’, ‘play’ = ‘yes’.
Note that in this section there are no correctly or incorrectly classified symbols in the graph because
the result is based on probability.

8. Data Visualization

WEKA’s visualization allows you to visualize a 2-D plot of the current working relation.
Visualization is very useful in practice; it helps to determine the difficulty of the learning problem.
WEKA can visualize single attributes (1-d) and pairs of attributes (2-d), and rotate 3-d visualizations
(Xgobi-style). WEKA has a “Jitter” option to deal with nominal attributes and to detect “hidden”
data points [4].
To open Visualization screen, click ‘Visualize’ tab.

Select a square that corresponds to the attributes you would like to visualize. For example, let’s
choose ‘outlook’ for X – axis and ‘play’ for Y – axis. Click anywhere inside the square that
corresponds to ‘play’ on the left and ‘outlook’ at the top.

A ‘Visualizing weather’ window appears on the screen.

8.1. Changing the View

In the visualization window, beneath the X-axis selector there is a drop-down list,
‘Colour’, for choosing the color scheme. This allows you to choose the color of points based on
the attribute selected. Below the plot area, there is a legend that describes what values the
colors correspond to. In your example, red represents ‘no’, while blue represents ‘yes’. For
better visibility you should change the color of label ‘yes’. Left-click on ‘yes’ in the ‘Class colour’
box and select lighter color from the color palette.
To the right of the plot area there is a series of horizontal strips. Each strip represents an
attribute, and the dots within it show the distribution of values of the attribute. You can choose
what axes are used in the main graph by clicking on these strips (left-click changes X-axis, right-
click changes Y-axis).
The software sets X - axis to ‘Outlook’ attribute and Y - axis to ‘Play’. The instances are spread
out in the plot area and concentration points are not visible. Keep sliding ‘Jitter’, a random
displacement given to all points in the plot, to the right, until you can spot concentration points.

The results are shown below. On this screen we changed ‘Colour’ to ‘temperature’.
Besides ‘outlook’ and ‘play’, this allows you to see the ‘temperature’ corresponding to the
‘outlook’. This helps to explain the result: for instances where ‘outlook’ = ‘sunny’ and ‘play’ = ‘no’,
you need to look at the ‘temperature’ – if it is too hot, you do not want to play.
If you change ‘Colour’ to ‘windy’, you can see that if it is windy, you do not want to play either.

8.2. Selecting Instances

Sometimes it is helpful to select a subset of the data using the visualization tool. A special
case is the ‘UserClassifier’, which lets you build your own classifier by interactively selecting
instances. Below the Y – axis there is a drop-down list that allows you to choose a selection
method. A group of points on the graph can be selected in four ways [2]:

1. Select Instance. Click on an individual data point. It brings up a window listing
attributes of the point. If more than one point appears at the same location, more than
one set of attributes will be shown.

2. Rectangle. You can create a rectangle by dragging it around the points.

3. Polygon. You can select several points by building a free-form polygon. Left-click on
the graph to add vertices to the polygon and right-click to complete it.

4. Polyline. To distinguish the points on one side from the ones on another, you can build
a polyline. Left-click on the graph to add vertices to the polyline and right-click to finish.

Once the area has been selected it is colored gray. You can click on ‘Submit’ button to
remove the points outside the gray area. To erase selected (gray) area without affecting the
graph, click on ‘Clear’ button. When you click on ‘Submit’ button, it changes to ‘Reset’ button.
By clicking on ‘Reset’ button, you can undo all changes and restore the original graph. To save
all your currently visible instances to ARFF file, click on ‘Save’ button.
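
The effect of ‘Submit’ on a rectangular selection can be pictured as a simple filter. The following Python sketch keeps only the points inside the rectangle; the function names and coordinates are hypothetical and only illustrate the idea, not WEKA's implementation.

```python
def inside_rectangle(point, corner_a, corner_b):
    """True if `point` lies within the rectangle spanned by two opposite corners."""
    (x, y), (x1, y1), (x2, y2) = point, corner_a, corner_b
    return min(x1, x2) <= x <= max(x1, x2) and min(y1, y2) <= y <= max(y1, y2)

def submit(points, corner_a, corner_b):
    """Keep the points inside the selected area and drop everything else."""
    return [p for p in points if inside_rectangle(p, corner_a, corner_b)]

# Made-up plot coordinates and a rectangle dragged from (0.5, 0.5) to (2, 2).
points = [(0, 0), (1, 1), (1.5, 2), (5, 5)]
visible = submit(points, (0.5, 0.5), (2, 2))
print(visible)
```

‘Reset’ then corresponds to going back to the original list of points.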

9. Conclusion

This concludes the WEKA Explorer Tutorial. You have covered a lot of material since the
Tutorial Introduction. There is a lot more to learn about WEKA than what you have covered in
these seven exercises. But you have already learned enough to be able to analyze your data
using preprocessing, classification, clustering, and association rule tools. You have learned how
to visualize the result and select attributes. This knowledge will prove invaluable to you. If you
plan to do any complicated data analysis that requires software flexibility, I recommend that you
use WEKA’s ‘Simple CLI’ interface. So, are you ready yet? Probably not. You have a few new
tools, but practice makes perfect. Good luck with your data analysis.

10. References

1. I. H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java
Implementations, Morgan Kaufmann Publishers, 2000.
2. R. Kirkby, WEKA Explorer User Guide for version 3-3-4, University of Waikato, 2002.
3. Weka Machine Learning Project, http://www.cs.waikato.ac.nz/~ml/index.html.
4. E. Frank, Machine Learning With WEKA, University of Waikato, New Zealand.
5. B. Mobasher, Data Preparation and Mining with WEKA,
http://maya.cs.depaul.edu/~classes/ect584/WEKA/association_rules.html, DePaul
University, 2003.
6. M. H. Dunham, Data Mining, Introductory and Advanced Topics, Prentice Hall, 2002.
