DM Manual III-II
1. Download the version of the software that matches your requirements from the link below.
http://www.cs.waikato.ac.nz/ml/weka/downloading.html
2. Java is required to install WEKA. If Java is already installed on your machine,
download only WEKA; otherwise, download the package that includes the JVM.
3. Open the download location and double-click the installer file.
4. Click Next.
5. Click I Agree.
6. Change the settings as required and click Next. Full and Associate
files are the recommended settings.
8. If you want a shortcut, check the box and click Install.
9. The installation will start; it should finish within about a minute.
11. Click Finish, take a shovel, and start mining.
This is the GUI you see when WEKA starts. It offers four options: Explorer, Experimenter,
Knowledge Flow, and Simple CLI.
2. Understand the features of the WEKA toolkit, such as the Explorer, Knowledge Flow
interface, Experimenter, and command-line interface.
Ans: WEKA
It is a Java-based application.
It is a collection of open-source machine learning algorithms.
The routines (functions) are implemented as classes and logically arranged in packages.
It comes with an extensive GUI.
Weka routines can be used standalone via the command-line interface.
The Weka GUI Chooser (class weka.gui.GUIChooser) provides a starting point for
launching Weka's main GUI applications and supporting tools. If one prefers an MDI ("multiple
document interface") appearance, this is provided by an alternative launcher called "Main"
(class weka.gui.Main). The GUI Chooser consists of four buttons (one for each of the four major
Weka applications) and four menus.
The buttons can be used to start the following applications:
Explorer: An environment for exploring data with WEKA (the rest of this documentation
deals with this application in more detail).
Experimenter: An environment for performing experiments and conducting statistical tests
between learning schemes.
Knowledge Flow: This environment supports essentially the same functions as the Explorer
but with a drag-and-drop interface. One advantage is that it supports incremental learning.
SimpleCLI: Provides a simple command-line interface that allows direct execution of WEKA
commands for operating systems that do not provide their own command-line interface.
1. Explorer
At the very top of the window, just below the title bar, is a row of tabs. When the Explorer
is first started only the first tab is active; the others are grayed out. This is because it is
necessary to open (and potentially pre-process) a data set before starting to explore the data.
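The tabs are as follows:
1. Preprocess. Choose and modify the data being acted on.
2. Classify. Train and test learning schemes that classify or perform regression.
3. Cluster. Learn clusters for the data.
4. Associate. Learn association rules for the data.
5. Select attributes. Select the most relevant attributes in the data.
6. Visualize. View an interactive 2D plot of the data.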
Once the tabs are active, clicking on them flicks between different screens, on which the
respective actions can be performed. The bottom area of the window (including the status box, the
log button, and the Weka bird) stays visible regardless of which section you are in. The Explorer
can be easily extended with custom tabs. The Wiki article “Adding tabs in the Explorer”
explains this in detail.
An ARFF (= Attribute-Relation File Format) file is an ASCII text file that describes a list
of instances sharing a set of attributes.
ARFF is not the only format that can be loaded; any file that can be converted with
Weka's "core converters" is accepted. The following formats are currently supported:
ARFF (+ compressed)
C4.5
CSV
libsvm
binary serialized instances
XRFF (+ compressed)
Overview
ARFF files have two distinct sections. The first section is the Header information, which
is followed by the Data information. The Header of the ARFF file contains the name of the relation,
a list of the attributes (the columns in the data), and their types.
An example header for the standard iris dataset looks like the following:
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
The Data of the ARFF file looks like the following:
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
The ARFF Header section of the file contains the relation declaration and attribute
declarations.
The relation name is defined as the first line in the ARFF file. The format is:
@relation <relation-name>
where <relation-name> is a string. The string must be quoted if the name includes spaces.
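The @attribute Declarations
Each attribute in the dataset has its own @attribute statement, and the order of the
statements gives the column position of the attribute in the data section. The format is:
@attribute <attribute-name> <datatype>
where the <attribute-name> must start with an alphabetic character. If spaces are to be
included in the name then the entire name must be quoted.
The <datatype> can be any of the following:
numeric
integer (treated as numeric)
real (treated as numeric)
<nominal-specification>
string
date [<date-format>]
relational for multi-instance data (for future use)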
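The quoting rule for relation and attribute names can be illustrated with a short Python sketch that builds an ARFF header. The helper names quote_if_needed and arff_header are hypothetical, not part of Weka. The use of single quotes and backslash escaping is an assumption made for this illustration, and a datatype given as a list is written out as a nominal specification in braces.

```python
import re


def quote_if_needed(name: str) -> str:
    """Quote a relation or attribute name if it contains whitespace."""
    if re.search(r"\s", name):
        # Assumption: single quotes, with backslash escaping inside the name.
        escaped = name.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    return name


def arff_header(relation, attributes):
    """Build ARFF header text from (name, datatype) pairs.

    A datatype given as a list or tuple becomes a nominal specification.
    """
    lines = ["@relation " + quote_if_needed(relation), ""]
    for name, datatype in attributes:
        if isinstance(datatype, (list, tuple)):
            values = ",".join(quote_if_needed(v) for v in datatype)
            datatype = "{" + values + "}"
        lines.append("@attribute " + quote_if_needed(name) + " " + datatype)
    return "\n".join(lines)


header = arff_header(
    "iris plants",
    [
        ("sepal length", "numeric"),
        ("class", ["Iris-setosa", "Iris-versicolor"]),
    ],
)
print(header)
```

Names without spaces, such as class, are written unchanged; only "iris plants" and "sepal length" are quoted.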
Numeric attributes
Numeric attributes can be real or integer numbers.
String attributes
String attributes allow us to create attributes containing arbitrary textual values. This is
very useful in text-mining applications, as we can create datasets with string attributes,
then write Weka Filters to manipulate strings (like StringToWordVectorFilter). String
attributes are declared as follows:
@attribute <name> string
Date attributes
Date attribute declarations take the form:
@attribute <name> date [<date-format>]
where <name> is the name for the attribute and <date-format> is an optional string specifying
how date values should be parsed and printed (this is the same format used by
SimpleDateFormat). The default format string accepts the ISO-8601 combined date and
time format: yyyy-MM-dd'T'HH:mm:ss. Dates must be specified in the data section as
the corresponding string representations of the date/time.
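To make the default date format concrete, the following Python sketch parses a value written in the yyyy-MM-dd'T'HH:mm:ss form. The function name parse_arff_date and the sample timestamp are made up for illustration, and the strptime pattern is this sketch's Python equivalent of the Java pattern, not something Weka itself uses.

```python
from datetime import datetime

# Python strptime pattern corresponding to the Java SimpleDateFormat
# pattern yyyy-MM-dd'T'HH:mm:ss (the default ARFF date format).
ARFF_DEFAULT_DATE = "%Y-%m-%dT%H:%M:%S"


def parse_arff_date(value: str, pattern: str = ARFF_DEFAULT_DATE) -> datetime:
    """Parse a date field from an ARFF data line, dropping optional quotes."""
    cleaned = value.strip().strip("\"'")
    return datetime.strptime(cleaned, pattern)


# Hypothetical sample value, written as it might appear in a data line.
stamp = parse_arff_date('"2001-04-03T12:12:12"')
print(stamp.year, stamp.month, stamp.day)
```

A custom <date-format> would need its own translation into a strptime pattern, since the two pattern languages use different letters.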
The ARFF Data section of the file contains the data declaration line and the actual
instance lines.
The @data declaration is a single line denoting the start of the data segment in the file.
The format is:
@data
The instance data
Each instance is represented on a single line, with carriage returns denoting the end of the
instance. A percent sign (%) introduces a comment, which continues to the end of the
line.
Attribute values for each instance are delimited by commas. They must appear in the
order that they were declared in the header section (i.e. the data corresponding to the nth
@attribute declaration is always the nth field of the instance).
Missing values are represented by a single question mark, as in:
@data
4.4,?,1.5,?,Iris-setosa
Values of string and nominal attributes are case sensitive, and any that contain space or
the comment-delimiter character % must be quoted. (The code suggests that double-
quotes are acceptable and that a backslash will escape individual characters.)
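The header and data rules described above can be tied together with a minimal Python reader. This is a simplified sketch, not Weka's loader, and parse_arff is a hypothetical name. It assumes full-line % comments, unquoted attribute names, and single-quoted data values, and it ignores date, string-escaping, and relational details.

```python
import csv


def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows).

    Simplified: skips blank lines and full-line % comments, reads
    @relation / @attribute / @data, and maps '?' to None.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue
        if not in_data:
            keyword = line.split(None, 1)[0].lower()
            if keyword == "@relation":
                relation = line.split(None, 1)[1].strip("'\"")
            elif keyword == "@attribute":
                # Assumes the attribute name itself contains no spaces.
                _, name, datatype = line.split(None, 2)
                attributes.append((name, datatype))
            elif keyword == "@data":
                in_data = True
            continue
        # csv handles single-quoted fields that contain commas or spaces.
        fields = next(csv.reader([line], quotechar="'", skipinitialspace=True))
        row = []
        for (name, datatype), field in zip(attributes, fields):
            if field == "?":
                row.append(None)  # missing value
            elif datatype.lower() in ("numeric", "real", "integer"):
                row.append(float(field))
            else:
                row.append(field)
        rows.append(row)
    return relation, attributes, rows


sample = """% iris excerpt
@relation iris
@attribute sepallength numeric
@attribute sepalwidth numeric
@attribute petallength numeric
@attribute petalwidth numeric
@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}
@data
5.1,3.5,1.4,0.2,Iris-setosa
4.4,?,1.5,?,Iris-setosa
"""

relation, attributes, rows = parse_arff(sample)
print(relation, len(attributes), rows[1])
```

The second data row comes back with None in the positions that held a question mark, which keeps missing values distinct from any numeric value.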
Department of Computer Science & Engineering
Experiment
1. Load each dataset into Weka and run the Apriori algorithm with different support
and confidence values. Study the rules generated.
Apriori
=======
2. Load each dataset into Weka and run the Apriori algorithm with different support
and confidence values. Study the rules generated.
@data
t,t,?,?,?
t,?,t,?,t
t,t,?,?,t
t,t,t,?,t
t,t,t,t,t
t,t,?,t,?
t,t,t,t,t
t,t,t,?,?
?,t,?,?,t
?,t,t,?,?
?,t,t,?,t
t,t,?,t,?
t,?,?,?,?
?,?,t,?,?
?,?,t,?,t
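The support and confidence values that Apriori works with can be computed by hand for the transactions above. The Python sketch below does this under two assumptions: the header of this dataset is not shown, so the item names item0 to item4 are hypothetical placeholders for the five columns, and a "t" is read as "item present" while "?" is ignored as missing. It computes the measures directly and does not implement Apriori's candidate generation.

```python
RAW = """\
t,t,?,?,?
t,?,t,?,t
t,t,?,?,t
t,t,t,?,t
t,t,t,t,t
t,t,?,t,?
t,t,t,t,t
t,t,t,?,?
?,t,?,?,t
?,t,t,?,?
?,t,t,?,t
t,t,?,t,?
t,?,?,?,?
?,?,t,?,?
?,?,t,?,t
"""

# Hypothetical column names; the real attribute names are not shown.
ITEMS = ["item0", "item1", "item2", "item3", "item4"]

# Each transaction is the set of items marked 't' on that line.
transactions = [
    {ITEMS[i] for i, value in enumerate(line.split(",")) if value == "t"}
    for line in RAW.strip().splitlines()
]


def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset):
    """Fraction of all transactions that contain itemset."""
    return count(itemset) / len(transactions)


def confidence(antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    return count(antecedent | consequent) / count(antecedent)


print(round(support({"item0", "item1"}), 3))
print(confidence({"item0"}, {"item1"}))
```

Here item0 appears in 10 of the 15 transactions and item0 together with item1 in 8, so the rule item0 -> item1 has support 8/15 and confidence 8/10. Raising the minimum support or confidence in Weka removes rules whose values fall below the chosen thresholds.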
Output
3. Load each dataset into Weka and run the J48 classification algorithm. Study the
classifier output.
Output
=== Run information ===
Scheme: weka.classifiers.rules.ZeroR
Relation: iris
Instances: 150
Attributes: 5
sepallength
sepalwidth
petallength
petalwidth
class
Test mode: 10-fold cross-validation
a b c <-- classified as
50 0 0 | a = Iris-setosa
50 0 0 | b = Iris-versicolor
50 0 0 | c = Iris-virginica
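The confusion matrix above can be read with a short Python sketch that derives accuracy and per-class recall from it. The variable names are chosen for illustration only. Rows are taken to be the actual classes and columns the predicted classes, as the "classified as" header indicates. The matrix shows every instance predicted as class a, which is what a scheme that always predicts a single class, as ZeroR does, would produce.

```python
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

# Rows are actual classes, columns are predicted classes (a, b, c).
matrix = [
    [50, 0, 0],
    [50, 0, 0],
    [50, 0, 0],
]

total = sum(sum(row) for row in matrix)
# Correct predictions lie on the diagonal.
correct = sum(matrix[i][i] for i in range(len(matrix)))
accuracy = correct / total

# Per-class recall: fraction of each actual class predicted correctly.
recall = {
    labels[i]: matrix[i][i] / sum(matrix[i]) for i in range(len(labels))
}

print(f"accuracy = {accuracy:.4f}")
print(recall)
```

Only the 50 Iris-setosa instances sit on the diagonal, so 50 of 150 instances are classified correctly. A J48 run on the same data would be expected to place counts on the diagonal for all three classes.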