DM Manual III-II

The document provides instructions for downloading and installing the WEKA data mining toolkit. It describes downloading WEKA from the listed link, ensuring Java is installed, opening and running the installation file, selecting installation options and locations, and completing the installation process. Upon completion, the WEKA graphical user interface is launched with options to explore, experiment, use knowledge flow interfaces, or the simple command line.

Uploaded by

saritha

Department of Computer Science & Engineering

1. Downloading and/or installation of WEKA data mining toolkit.

Ans: Installation steps for WEKA, a data mining tool

1. Download the version of the software that matches your requirements from the link below.
http://www.cs.waikato.ac.nz/ml/weka/downloading.html
2. Java is required to run WEKA. If Java is already installed on your machine, download
WEKA only; otherwise download the package that includes the JVM.
3. Open the folder containing the downloaded file and double-click the file.

4. Click Next.


5. Click I Agree.

6. Change the settings as required and click Next. "Full" and "Associate files" are the
recommended settings.


7. Choose your desired installation location.

8. If you want a shortcut, check the box and click Install.


9. The installation will start. Wait for it to finish; it should complete within a minute.

10. After the installation completes, click Next.

11. Click Finish, take a shovel and start mining.


This is the GUI you see when WEKA starts. It offers four options: Explorer, Experimenter,
Knowledge Flow and Simple CLI.
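
Step 2 above depends on whether Java is already installed. As a rough sketch, the check can be scripted in Python; the function name java_version_text is hypothetical, and the script only assumes that a Java launcher, if installed, is reachable on the PATH under the name "java".

```python
import shutil
import subprocess


def java_version_text(executable="java"):
    """Return the text printed by `java -version`, or None if Java is not on the PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    # `java -version` normally writes its report to stderr, so capture both streams.
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    report = java_version_text()
    if report:
        print(report)
    else:
        print("Java not found: download the WEKA package that includes the JVM.")
```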


2. Understand the features of WEKA tool kit such as Explorer, Knowledge flow
interface, Experimenter, command-line interface.

Ans: WEKA

Weka was created by researchers at the University of Waikato, Hamilton, New Zealand.
Alex Seewald wrote the original command-line primer and David Scuse the original
Experimenter tutorial.

• It is a Java-based application.
• It is a collection of open-source machine learning algorithms.
• The routines (functions) are implemented as classes and logically arranged in packages.
• It comes with an extensive GUI.
• Weka routines can be used standalone via the command-line interface.

The Graphical User Interface:

The Weka GUI Chooser (class weka.gui.GUIChooser) provides a starting point for
launching Weka’s main GUI applications and supporting tools. If one prefers an MDI (“multiple
document interface”) appearance, this is provided by an alternative launcher called “Main”
(class weka.gui.Main). The GUI Chooser consists of four buttons, one for each of the four major
Weka applications, and four menus.
The buttons can be used to start the following applications:

• Explorer: An environment for exploring data with WEKA (the rest of this documentation
deals with this application in more detail).
• Experimenter: An environment for performing experiments and conducting statistical tests
between learning schemes.
• Knowledge Flow: This environment supports essentially the same functions as the Explorer
but with a drag-and-drop interface. One advantage is that it supports incremental learning.
• SimpleCLI: Provides a simple command-line interface that allows direct execution of WEKA
commands for operating systems that do not provide their own command-line interface.
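
Because Weka routines are ordinary Java classes, a scheme can also be started from a terminal. The sketch below only builds such a command line in Python. The jar location weka.jar and the file name basket.arff are assumptions for illustration, weka_command is a hypothetical helper, -t is the option Weka uses to name the training file, and -N and -C are copied from the Scheme line of the Apriori run shown later in this manual.

```python
import shlex


def weka_command(scheme, train_file, weka_jar="weka.jar", extra_options=()):
    """Build the argument list for running a Weka scheme on a training file."""
    # weka_jar is an assumed path; point it at the weka.jar of your installation.
    return ["java", "-cp", weka_jar, scheme, "-t", train_file, *extra_options]


cmd = weka_command(
    "weka.associations.Apriori",
    "basket.arff",                      # hypothetical data file
    extra_options=["-N", "10", "-C", "0.9"],
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run once the paths match the local installation.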


1. Explorer

The Graphical user interface

1.1 Section Tabs

At the very top of the window, just below the title bar, is a row of tabs. When the Explorer
is first started only the first tab is active; the others are grayed out. This is because it is
necessary to open (and potentially pre-process) a data set before starting to explore the data.
The tabs are as follows:

1. Preprocess. Choose and modify the data being acted on.
2. Classify. Train and test learning schemes that classify or perform regression.
3. Cluster. Learn clusters for the data.
4. Associate. Learn association rules for the data.
5. Select attributes. Select the most relevant attributes in the data.
6. Visualize. View an interactive 2D plot of the data.

Once the tabs are active, clicking on them flicks between different screens, on which the
respective actions can be performed. The bottom area of the window (including the status box, the
log button, and the Weka bird) stays visible regardless of which section you are in. The Explorer
can be easily extended with custom tabs. The Wiki article “Adding tabs in the Explorer”
explains this in detail.


3. Study the ARFF file format

Ans: ARFF File Format

An ARFF (= Attribute-Relation File Format) file is an ASCII text file that describes a list
of instances sharing a set of attributes.

ARFF is not the only format one can load: any file that can be converted with
Weka’s “core converters” can be loaded as well. The following formats are currently supported:

• ARFF (+ compressed)
• C4.5
• CSV
• libsvm
• binary serialized instances
• XRFF (+ compressed)

Overview

ARFF files have two distinct sections. The first section is the Header information, which
is followed by the Data information. The Header of the ARFF file contains the name of the
relation, a list of the attributes (the columns in the data), and their types.

An example header on the standard IRIS dataset looks like this:

% 1. Title: Iris Plants Database
%
% 2. Sources:

@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

The Data of the ARFF file looks like the following:

@DATA

5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa

Lines that begin with a % are comments.


The @RELATION, @ATTRIBUTE and @DATA declarations are case insensitive.
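
To make the two-section layout concrete, here is a minimal Python sketch that splits ARFF text into the relation name, the attribute list and the data rows. It is a simplification, not Weka's own loader: read_arff is a hypothetical helper, and quoting, sparse data and date formats are ignored. The sample uses the first two data rows of the Iris example, with attribute names written without spaces.

```python
def read_arff(text):
    """Split ARFF text into (relation, attributes, rows); simplified sketch."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):      # skip blank lines and comments
            continue
        if in_data:
            rows.append([field.strip() for field in line.split(",")])
            continue
        keyword = line.split(None, 1)[0].lower()  # declarations are case insensitive
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, datatype = line.split(None, 2)
            attributes.append((name, datatype))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


sample = """% 1. Title: Iris Plants Database
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
"""
relation, attributes, rows = read_arff(sample)
print(relation, len(attributes), rows[0])
```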

The ARFF Header Section

The ARFF Header section of the file contains the relation declaration and attribute
declarations.

The @relation Declaration

The relation name is defined as the first line in the ARFF file. The format is:

@relation <relation-name>

where <relation-name> is a string. The string must be quoted if the name includes spaces.

The @attribute Declarations

Attribute declarations take the form of an ordered sequence of @attribute statements.
Each attribute in the data set has its own @attribute statement, which uniquely defines the
name of that attribute and its data type. The order in which the attributes are declared
indicates the column position in the data section of the file. For example, if an attribute is
the third one declared, then Weka expects that all of that attribute’s values will be found in
the third comma-delimited column.

The format for the @attribute statement is:


@attribute <attribute-name> <datatype>

where the <attribute-name> must start with an alphabetic character. If spaces are to be
included in the name then the entire name must be quoted.

The <datatype> can be any of the following types supported by Weka:

• numeric (integer and real are treated as numeric)
• <nominal-specification>
• string
• date [<date-format>]
• relational, for multi-instance data (for future use)

where <nominal-specification> and <date-format> are defined below. The keywords
numeric, real, integer, string and date are case insensitive.
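
The declaration rules above can be sketched as a small Python parser for a single @attribute line. This is an illustrative simplification rather than Weka's parser: parse_attribute is a hypothetical helper, nominal values are split on plain commas, and the attribute named timestamp in the usage line is invented for the example.

```python
import re

# Name is either quoted (may contain spaces) or a bare token; the rest is the datatype.
ATTRIBUTE_RE = re.compile(r"""@attribute\s+('[^']*'|"[^"]*"|\S+)\s+(.+)""", re.IGNORECASE)


def parse_attribute(line):
    """Return (name, kind, detail) for one @attribute declaration."""
    match = ATTRIBUTE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an @attribute declaration: {line!r}")
    name, spec = match.group(1), match.group(2).strip()
    if name[0] in "'\"":
        name = name[1:-1]                         # drop the surrounding quotes
    if spec.startswith("{") and spec.endswith("}"):
        return name, "nominal", [v.strip() for v in spec[1:-1].split(",")]
    keyword, _, rest = spec.partition(" ")
    keyword = keyword.lower()                     # type keywords are case insensitive
    if keyword in ("numeric", "integer", "real"):
        return name, "numeric", None              # integer and real are treated as numeric
    if keyword == "date":
        return name, "date", rest.strip().strip("\"'") or None
    if keyword in ("string", "relational"):
        return name, keyword, None
    raise ValueError(f"unknown datatype: {spec!r}")


print(parse_attribute("@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}"))
print(parse_attribute('@attribute timestamp DATE "yyyy-MM-dd HH:mm:ss"'))
```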

Numeric attributes

Numeric attributes can be real or integer numbers.


Nominal attributes

Nominal values are defined by providing a <nominal-specification> listing the possible
values:

{<nominal-name1>, <nominal-name2>, <nominal-name3>, ...}

For example, the class value of the Iris dataset can be defined as follows:

@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

Values that contain spaces must be quoted.

String attributes

String attributes allow us to create attributes containing arbitrary textual values. This is
very useful in text-mining applications, as we can create datasets with string attributes,
then write Weka Filters to manipulate strings (like StringToWordVectorFilter). String
attributes are declared as follows:


@ATTRIBUTE LCC string

Date attributes

Date attribute declarations take the form:

@attribute <name> date [<date-format>]

where <name> is the name for the attribute and <date-format> is an optional string specifying
how date values should be parsed and printed (this is the same format used by
SimpleDateFormat). The default format string accepts the ISO-8601 combined date and
time format: yyyy-MM-dd'T'HH:mm:ss. Dates must be specified in the data section as
the corresponding string representations of the date/time.
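
As an illustration of the default format, the sketch below parses such a value in Python. The strptime pattern is the Python counterpart of the Java pattern yyyy-MM-dd'T'HH:mm:ss; parse_arff_date is a hypothetical helper and the timestamp is an invented sample value.

```python
from datetime import datetime

# Python strptime equivalent of the Java pattern yyyy-MM-dd'T'HH:mm:ss
ARFF_DEFAULT_DATE = "%Y-%m-%dT%H:%M:%S"


def parse_arff_date(value, fmt=ARFF_DEFAULT_DATE):
    """Parse one date field from the data section, tolerating surrounding quotes."""
    return datetime.strptime(value.strip().strip("\"'"), fmt)


stamp = parse_arff_date("2001-04-03T12:12:12")   # hypothetical sample value
print(stamp.isoformat())
```

A custom <date-format> would need its own translation into strptime directives, since the two pattern languages differ.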

The ARFF Data Section

The ARFF Data section of the file contains the data declaration line and the actual
instance lines.

The @data Declaration

The @data declaration is a single line denoting the start of the data segment in the file.
The format is:

@data

The instance data

Each instance is represented on a single line, with carriage returns denoting the end of the
instance. A percent sign (%) introduces a comment, which continues to the end of the
line.

Attribute values for each instance are delimited by commas. They must appear in the
order in which they were declared in the header section (i.e. the data corresponding to the nth
@attribute declaration is always the nth field of the instance).

Missing values are represented by a single question mark, as in:

@data
4.4,?,1.5,?,Iris-setosa

Values of string and nominal attributes are case sensitive, and any that contain a space or
the comment-delimiter character % must be quoted. (The code suggests that double
quotes are acceptable and that a backslash will escape individual characters.)
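
The rules for missing and quoted values can be sketched with Python's csv module. This is an approximation, not Weka's tokenizer: parse_instance is a hypothetical helper, only one quote character is handled per call, and a quoted '?' cannot be told apart from a missing value.

```python
import csv


def parse_instance(line, quotechar="'"):
    """Split one data line into fields, mapping ? to None (missing)."""
    reader = csv.reader(
        [line],
        quotechar=quotechar,     # values containing spaces or % are quoted
        escapechar="\\",         # a backslash escapes an individual character
        skipinitialspace=True,
    )
    fields = next(reader)
    return [None if field == "?" else field for field in fields]


print(parse_instance("4.4,?,1.5,?,Iris-setosa"))
print(parse_instance("'hello world',?,'50% off'"))
```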
Experiment
1. Load each dataset into Weka and run the Apriori algorithm with different support
and confidence values. Study the rules generated.

Steps to run the Apriori algorithm in WEKA

1. Create the data set in CSV or ARFF format.

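Milk    Bread   Tea     Coffee  Sugar
t       t       -       -       -
t       -       t       -       t
t       t       -       -       t
t       t       t       -       t
t       t       t       t       t
t       t       -       t       -
t       t       t       t       t
t       t       t       -       -
-       t       -       -       t
-       t       t       -       -
-       t       t       -       t
t       t       -       t       -
t       -       -       -       -
-       -       t       -       -
-       -       t       -       t

(t = item present in the transaction, - = item absent)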

2. Open the WEKA tool.
3. Click on WEKA Explorer.
4. Click on the Preprocess tab.
5. Click on the Open file button.
6. Choose the data set and open the file.
7. Click on the Associate tab and choose the Apriori algorithm.
8. Click on the Start button.
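
Step 1 can also be scripted. The sketch below writes market-basket transactions as ARFF text, using one {t} attribute per item and ? for an absent item, as in the ARFF listing this manual gives for the FP-Growth experiment. The helper transactions_to_arff is hypothetical, and only the first three transactions are shown.

```python
def transactions_to_arff(relation, items, transactions):
    """Render market-basket transactions as ARFF text with one {t} attribute per item."""
    lines = [f"@relation '{relation}'", ""]
    lines += [f"@attribute {item} {{t}}" for item in items]
    lines += ["", "@data"]
    for basket in transactions:
        # "t" marks an item in the basket; "?" (missing) marks an absent item.
        lines.append(",".join("t" if item in basket else "?" for item in items))
    return "\n".join(lines) + "\n"


ITEMS = ["Milk", "Bread", "Tea", "Coffee", "Sugar"]
first_three = [{"Milk", "Bread"}, {"Milk", "Tea", "Sugar"}, {"Milk", "Bread", "Sugar"}]
arff_text = transactions_to_arff("Apriori Algorithm", ITEMS, first_three)
print(arff_text)
```

Saving arff_text to a file with the .arff extension gives a data set that can be opened from the Preprocess tab.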

Output

=== Run information ===

Scheme: weka.associations.Apriori -I -R -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1


Relation: Apriori Algorithm
Instances: 15
Attributes: 5
Milk
Bread
Tea
Coffee
Sugar
=== Associator model (full training set) ===

Apriori
=======

Minimum support: 0.15 (2 instances)


Minimum metric <confidence>: 0.9
Number of cycles performed: 17

Generated sets of large itemsets:

Size of set of large itemsets L(1): 5

Large Itemsets L(1):


Milk=t 10
Bread=t 11
Tea=t 9
Coffee=t 4
Sugar=t 8

Size of set of large itemsets L(2): 10

Large Itemsets L(2):


Milk=t Bread=t 8
Milk=t Tea=t 5
Milk=t Coffee=t 4
Milk=t Sugar=t 5
Bread=t Tea=t 6
Bread=t Coffee=t 4
Bread=t Sugar=t 6
Tea=t Coffee=t 2
Tea=t Sugar=t 6
Coffee=t Sugar=t 2

Size of set of large itemsets L(3): 10

Large Itemsets L(3):
Milk=t Bread=t Tea=t 4
Milk=t Bread=t Coffee=t 4
Milk=t Bread=t Sugar=t 4
Milk=t Tea=t Coffee=t 2
Milk=t Tea=t Sugar=t 4
Milk=t Coffee=t Sugar=t 2
Bread=t Tea=t Coffee=t 2
Bread=t Tea=t Sugar=t 4
Bread=t Coffee=t Sugar=t 2
Tea=t Coffee=t Sugar=t 2

Size of set of large itemsets L(4): 5

Large Itemsets L(4):


Milk=t Bread=t Tea=t Coffee=t 2
Milk=t Bread=t Tea=t Sugar=t 3
Milk=t Bread=t Coffee=t Sugar=t 2
Milk=t Tea=t Coffee=t Sugar=t 2
Bread=t Tea=t Coffee=t Sugar=t 2

Size of set of large itemsets L(5): 1

Large Itemsets L(5):


Milk=t Bread=t Tea=t Coffee=t Sugar=t 2

Best rules found:

1. Coffee=t 4 ==> Milk=t 4 <conf:(1)> lift:(1.5) lev:(0.09) [1] conv:(1.33)


2. Coffee=t 4 ==> Bread=t 4 <conf:(1)> lift:(1.36) lev:(0.07) [1] conv:(1.07)
3. Bread=t Coffee=t 4 ==> Milk=t 4 <conf:(1)> lift:(1.5) lev:(0.09) [1] conv:(1.33)
4. Milk=t Coffee=t 4 ==> Bread=t 4 <conf:(1)> lift:(1.36) lev:(0.07) [1] conv:(1.07)
5. Coffee=t 4 ==> Milk=t Bread=t 4 <conf:(1)> lift:(1.88) lev:(0.12) [1] conv:(1.87)
6. Tea=t Coffee=t 2 ==> Milk=t 2 <conf:(1)> lift:(1.5) lev:(0.04) [0] conv:(0.67)
7. Coffee=t Sugar=t 2 ==> Milk=t 2 <conf:(1)> lift:(1.5) lev:(0.04) [0] conv:(0.67)
8. Tea=t Coffee=t 2 ==> Bread=t 2 <conf:(1)> lift:(1.36) lev:(0.04) [0] conv:(0.53)
9. Coffee=t Sugar=t 2 ==> Bread=t 2 <conf:(1)> lift:(1.36) lev:(0.04) [0] conv:(0.53)
10. Coffee=t Sugar=t 2 ==> Tea=t 2 <conf:(1)> lift:(1.67) lev:(0.05) [0] conv:(0.8)
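
The itemset counts in this output can be checked by hand or with a short script. The sketch below is a brute-force enumeration over all item combinations, not Apriori's candidate-pruning search; that is feasible only because there are five items. The rows are copied from the ARFF listing this manual gives for the FP-Growth experiment, and the threshold of 2 instances is taken from the "Minimum support" line above. The function names are hypothetical.

```python
from itertools import combinations

ITEMS = ["Milk", "Bread", "Tea", "Coffee", "Sugar"]
ROWS = [
    "t,t,?,?,?", "t,?,t,?,t", "t,t,?,?,t", "t,t,t,?,t", "t,t,t,t,t",
    "t,t,?,t,?", "t,t,t,t,t", "t,t,t,?,?", "?,t,?,?,t", "?,t,t,?,?",
    "?,t,t,?,t", "t,t,?,t,?", "t,?,?,?,?", "?,?,t,?,?", "?,?,t,?,t",
]
# Each transaction becomes the set of items flagged "t".
TRANSACTIONS = [
    {item for item, flag in zip(ITEMS, row.split(",")) if flag == "t"} for row in ROWS
]


def support_count(itemset, transactions=TRANSACTIONS):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for basket in transactions if set(itemset) <= basket)


def frequent_itemsets(min_count):
    """Map itemset size to {itemset: count} for itemsets meeting min_count."""
    result = {}
    for size in range(1, len(ITEMS) + 1):
        level = {}
        for combo in combinations(ITEMS, size):
            count = support_count(combo)
            if count >= min_count:
                level[combo] = count
        if not level:
            break
        result[size] = level
    return result


levels = frequent_itemsets(min_count=2)
for size, level in levels.items():
    print(f"L({size}): {len(level)} itemsets")
```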


2. Load each dataset into Weka and run the FP-Growth algorithm with different support
and confidence values. Study the rules generated.

Steps to run the FP-Growth algorithm in WEKA

1. Create the data set in ARFF format:
@relation 'FP'

@attribute Milk {t}
@attribute Bread {t}
@attribute Tea {t}
@attribute Coffee {t}
@attribute Sugar {t}

@data
t,t,?,?,?
t,?,t,?,t
t,t,?,?,t
t,t,t,?,t
t,t,t,t,t
t,t,?,t,?
t,t,t,t,t
t,t,t,?,?
?,t,?,?,t
?,t,t,?,?
?,t,t,?,t
t,t,?,t,?
t,?,?,?,?
?,?,t,?,?
?,?,t,?,t

2. Open the WEKA tool.
3. Click on WEKA Explorer.
4. Click on the Preprocess tab.
5. Click on the Open file button.
6. Choose the data set and open the file.
7. Click on the Associate tab and choose the FP-Growth algorithm.
8. Click on the Start button.


Output

=== Run information ===

Scheme: weka.associations.FPGrowth -P 2 -I -1 -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1


Relation: FP
Instances: 15
Attributes: 5
Milk
Bread
Tea
Coffee
Sugar
=== Associator model (full training set) ===

FPGrowth found 38 rules (displaying top 10)

1. [Coffee=t]: 4 ==> [Bread=t]: 4 <conf:(1)> lift:(1.36) lev:(0.07) conv:(1.07)


2. [Coffee=t]: 4 ==> [Milk=t]: 4 <conf:(1)> lift:(1.5) lev:(0.09) conv:(1.33)
3. [Coffee=t]: 4 ==> [Bread=t, Milk=t]: 4 <conf:(1)> lift:(1.88) lev:(0.12) conv:(1.87)
4. [Bread=t, Coffee=t]: 4 ==> [Milk=t]: 4 <conf:(1)> lift:(1.5) lev:(0.09) conv:(1.33)
5. [Milk=t, Coffee=t]: 4 ==> [Bread=t]: 4 <conf:(1)> lift:(1.36) lev:(0.07) conv:(1.07)
6. [Tea=t, Coffee=t]: 2 ==> [Bread=t]: 2 <conf:(1)> lift:(1.36) lev:(0.04) conv:(0.53)
7. [Sugar=t, Coffee=t]: 2 ==> [Bread=t]: 2 <conf:(1)> lift:(1.36) lev:(0.04) conv:(0.53)
8. [Tea=t, Coffee=t]: 2 ==> [Milk=t]: 2 <conf:(1)> lift:(1.5) lev:(0.04) conv:(0.67)
9. [Sugar=t, Coffee=t]: 2 ==> [Milk=t]: 2 <conf:(1)> lift:(1.5) lev:(0.04) conv:(0.67)
10. [Tea=t, Coffee=t]: 2 ==> [Sugar=t]: 2 <conf:(1)> lift:(1.88) lev:(0.06) conv:(0.93)
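
The conf, lift, lev and conv columns can be recomputed from raw counts. In the sketch below, confidence, lift and leverage use their standard definitions. The conviction expression is an assumption: it is written in the form that reproduces the conv values printed above and is not taken from Weka's source. The item counts (Milk 10, Sugar 8 out of 15 instances) come from the L(1) list in the Apriori output earlier in this manual; rule_metrics is a hypothetical helper.

```python
def rule_metrics(n, premise, consequence, both):
    """Metrics for a rule A ==> B from raw counts.

    n: total instances, premise: count(A), consequence: count(B), both: count(A and B).
    """
    confidence = both / premise
    lift = confidence / (consequence / n)
    leverage = both / n - (premise / n) * (consequence / n)
    # Assumed form of conviction; chosen because it matches the conv column shown above.
    conviction = premise * (n - consequence) / (n * (premise - both + 1))
    return confidence, lift, leverage, conviction


# [Coffee=t]: 4 ==> [Milk=t]: 4
print([round(x, 2) for x in rule_metrics(15, 4, 10, 4)])
# [Tea=t, Coffee=t]: 2 ==> [Sugar=t]: 2
print([round(x, 2) for x in rule_metrics(15, 2, 8, 2)])
```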


3. Load each dataset into Weka and run the J48 classification algorithm. Study the
classifier output.

Steps to run the J48 algorithm in WEKA

1. Create the data set in ARFF format.
2. Open the WEKA tool.
3. Click on WEKA Explorer.
4. Click on the Preprocess tab.
5. Click on the Open file button.
6. Choose the data set and open the file.
7. Click on the Classify tab and choose the J48 algorithm from the trees group.
8. Select Cross-validation or Percentage split.
9. Click on the Start button.
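
Step 8 selects how the data is divided for testing. As a rough illustration of stratified k-fold cross-validation, the sketch below deals instance indices into folds so that each fold keeps roughly the same class proportions. It is not Weka's exact procedure: stratified_folds is a hypothetical helper, the seed is arbitrary, and the 50/50/50 class split is read from the confusion matrix in the output that follows.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=1):
    """Assign each instance index to one of k folds with similar class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for index in indices:
            folds[position % k].append(index)   # deal indices round-robin across folds
            position += 1
    return folds


labels = ["Iris-setosa"] * 50 + ["Iris-versicolor"] * 50 + ["Iris-virginica"] * 50
folds = stratified_folds(labels)
print([len(fold) for fold in folds])
```

In each round, one fold serves as the test set and the remaining folds form the training set.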

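Output (the Scheme line shows that this run used ZeroR, the classifier selected by default in the Classify tab, rather than J48)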
=== Run information ===

Scheme: weka.classifiers.rules.ZeroR
Relation: iris
Instances: 150
Attributes: 5
sepallength
sepalwidth
petallength
petalwidth
class
Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

ZeroR predicts class value: Iris-setosa

Time taken to build model: 0.02 seconds

=== Stratified cross-validation ===


=== Summary ===

Correctly Classified Instances 50 33.3333 %


Incorrectly Classified Instances 100 66.6667 %
Kappa statistic 0
Mean absolute error 0.4444
Root mean squared error 0.4714
Relative absolute error 100 %
Root relative squared error 100 %
Coverage of cases (0.95 level) 100 %
Mean rel. region size (0.95 level) 100 %
Total Number of Instances 150

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               1        1        0.333      1       0.5        0.5       Iris-setosa
               0        0        0          0       0          0.5       Iris-versicolor
               0        0        0          0       0          0.5       Iris-virginica
Weighted Avg.  0.333    0.333    0.111      0.333   0.167      0.5

=== Confusion Matrix ===

a b c <-- classified as
50 0 0 | a = Iris-setosa
50 0 0 | b = Iris-versicolor
50 0 0 | c = Iris-virginica
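
The summary figures can be derived from the confusion matrix alone. The sketch below computes accuracy, the kappa statistic and per-class precision, recall and F-measure using their standard definitions; metrics_from_confusion is a hypothetical helper, and rows are actual classes while columns are predicted classes, as in the matrix above.

```python
def metrics_from_confusion(matrix):
    """Return (accuracy, kappa, per_class) where per_class holds (precision, recall, f1)."""
    size = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]                                    # actual counts
    col_totals = [sum(matrix[r][c] for r in range(size)) for c in range(size)]  # predicted counts
    accuracy = sum(matrix[i][i] for i in range(size)) / n
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(size)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected) if expected != 1 else 0.0
    per_class = []
    for i in range(size):
        tp = matrix[i][i]
        recall = tp / row_totals[i] if row_totals[i] else 0.0
        precision = tp / col_totals[i] if col_totals[i] else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        per_class.append((precision, recall, f1))
    return accuracy, kappa, per_class


matrix = [[50, 0, 0], [50, 0, 0], [50, 0, 0]]
accuracy, kappa, per_class = metrics_from_confusion(matrix)
print(round(accuracy * 100, 4), round(kappa, 4))
print([tuple(round(v, 3) for v in row) for row in per_class])
```

Because every instance is predicted as Iris-setosa, accuracy equals the share of that class and kappa is 0, which matches the summary above.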
