Weka Software Manual
AI Tools Seminar

Rossen Dimov ([email protected])

Supervisors: Michael Feld, Dr. Michael Kipp, Dr. Alassane Ndiaye
and Dr. Dominik Heckmann
({michael.feld, michael.kipp, alassane.ndiaye, dominik.heckmann}@dfki.de)
1 Introduction
Machine learning algorithms induce classification rules from a dataset of
instances and thus broaden the domain knowledge and understanding.
WEKA is a workbench for machine learning that is intended to make
the application of machine learning techniques to a variety of real-world
problems easier and more intuitive. The environment targets not only the
machine learning expert but also the domain specialist. That is why interactive
modules for data processing, data and trained-model visualization, database
connection and cross-validation are provided. They complement the basic
functionality that a machine learning system needs to support -
classification, regression, clustering and attribute selection.
It is developed at the University of Waikato, New Zealand. The project
started about twelve years ago, when the authors needed to apply machine
learning techniques to an agricultural problem. Now version 3.5.5 is
available, and two years ago the authors also published a book [4]. This
book covers the different algorithms with their possible weak and strong
points and all preprocessing and evaluation methods. It also contains a
detailed description of all four graphical modules and a basic introduction
on how to use the Java interface in your own programs. The project is
developed and distributed under the GPL license and has a subdomain on
the Sourceforge portal.
This article covers a description of the main features of WEKA version
3.5.5 and an application for spam detection using the Java interface.
Some basic machine learning definitions used in the following parts are
introduced by an example:
An example of a dataset is a collection of records of days on which the
weather conditions were appropriate for surfing. The temperature, the
humidity and the speed of the wind are attributes, which can be measured
and can be nominal (enumerated) and/or numerical. Surfing or not surfing
are the values of the class attribute. The record for one single day
represents one instance. Classification is used for predicting the value
of the class attribute for future days.
Simple editing operations on the dataset, like changing single values of
concrete instances or removing attributes from all instances, can be done
by hand. Automatic operations are done by filters. Usually the data format
needs to be transformed for various reasons, depending on the machine
learning scheme that will be used. For example, a machine learning algorithm
might only accept numeric attribute values, so all non-numeric attributes
have to be transformed in order for this algorithm to be used. A filter is
chosen from a tree view that contains all available filters - see figure 1.
Each of them has a description of how it works and a reference on all
parameters it uses. Most of the filters are explained in detail in the
book, but since there are newer versions of WEKA, new filters have also
been implemented and can be chosen.
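As an illustration of what such a transformation does, the following plain-Java sketch (our own code and names, not a WEKA filter) replaces one nominal attribute by numeric 0/1 indicator columns:

```java
import java.util.*;

// Sketch of what a typical attribute-transformation filter does: replace
// one nominal attribute by numeric 0/1 indicator attributes, so that
// schemes which only accept numeric input can be applied.
// Illustrative only; this is not WEKA code.
public class NominalToBinarySketch {
    // values: the nominal values of one attribute for all instances;
    // returns one 0/1 column per distinct value, in sorted value order.
    public static int[][] toIndicators(String[] values) {
        SortedSet<String> distinct = new TreeSet<>(Arrays.asList(values));
        List<String> order = new ArrayList<>(distinct);
        int[][] cols = new int[values.length][order.size()];
        for (int i = 0; i < values.length; i++)
            cols[i][order.indexOf(values[i])] = 1;
        return cols;
    }

    public static void main(String[] args) {
        int[][] enc = toIndicators(new String[] { "sunny", "rainy", "sunny" });
        System.out.println(Arrays.deepToString(enc)); // [[0, 1], [1, 0], [0, 1]]
    }
}
```

WEKA's built-in filters do this kind of job directly on its Instances objects and additionally keep the dataset's attribute metadata up to date.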
A list of some filters follows:
The trained classifier can be evaluated either with a separate test set,
through k-fold cross-validation, or by splitting the input dataset into a
training and a test set. The result of the evaluation is shown in the ’Classifier
output’ pane; see figure 2.

Figure 2: The ’Classify’ panel. The J48 classifier with the chosen parameters,
evaluated with 10-fold cross-validation. The output pane shows a textual
representation of the built classifier and some statistics.

It contains a textual representation of the built
model and statistics about the accuracy of the classifier, such as the TP (True
Positive) rate, the FP (False Positive) rate and the confusion matrix.
The TP rate of a class shows the percentage of instances of that class
whose predicted class is identical with the actual one. The FP rate of a
class shows the percentage of instances of the other classes that are
wrongly predicted as belonging to this class. The confusion matrix shows,
for each actual class, the number of instances assigned to every possible
class according to the classifier’s prediction.
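These per-class rates can be computed directly from the confusion matrix. A minimal sketch in plain Java (class and method names are our own, not WEKA API):

```java
// Sketch: computing per-class TP and FP rates from a confusion matrix.
// Rows are actual classes, columns are predicted classes.
public class ConfusionStats {
    // Fraction of instances of class c that are predicted as c.
    public static double tpRate(int[][] m, int c) {
        int actual = 0;
        for (int pred = 0; pred < m[c].length; pred++) actual += m[c][pred];
        return actual == 0 ? 0.0 : (double) m[c][c] / actual;
    }

    // Fraction of instances of the other classes that are predicted as c.
    public static double fpRate(int[][] m, int c) {
        int others = 0, wronglyAsC = 0;
        for (int a = 0; a < m.length; a++) {
            if (a == c) continue;
            for (int pred = 0; pred < m[a].length; pred++) others += m[a][pred];
            wronglyAsC += m[a][c];
        }
        return others == 0 ? 0.0 : (double) wronglyAsC / others;
    }

    public static void main(String[] args) {
        // 40 spam correctly caught, 10 spam missed;
        // 5 non-spam flagged as spam, 45 non-spam kept.
        int[][] m = { { 40, 10 }, { 5, 45 } };
        System.out.println("TP rate (spam): " + tpRate(m, 0)); // 40/50 = 0.8
        System.out.println("FP rate (spam): " + fpRate(m, 0)); // 5/50 = 0.1
    }
}
```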
There is a special group of classifiers called meta-classifiers. They are
used to enhance the performance or to extend the capabilities of other
classifiers.
A list of some important meta-classifiers is shown here:
The trained classifier can be saved to disk. This is possible due to the
serialization mechanism supported by the Java programming language.
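A minimal sketch of this mechanism, with a hypothetical DummyModel standing in for a trained classifier (WEKA classifiers implement java.io.Serializable and can be written and read back the same way):

```java
import java.io.*;

// Sketch: saving and loading an object with Java serialization.
// DummyModel is a stand-in for a trained classifier.
public class SerializationDemo {
    static class DummyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        double threshold = 0.5; // pretend this was learned from data
    }

    public static void save(Object o, File f) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(o);
        }
    }

    public static Object load(File f) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
            return in.readObject();
        }
    }

    // Round-trip a model through a temporary file and return its field.
    public static double roundTrip() {
        try {
            File f = File.createTempFile("model", ".ser");
            save(new DummyModel(), f);
            DummyModel restored = (DummyModel) load(f);
            f.delete();
            return restored.threshold;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip()); // prints 0.5
    }
}
```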
Besides the classification schemes, WEKA supplies two other scheme types -
association rules and clustering.
Figure 3: The Apriori association rule learner applied to the training data.
All generated rules are shown in the ’Output’ pane.
2.1.5 Clustering
There are nine clustering algorithms implemented in WEKA. They do not
try to predict the value of the class attribute but divide the training
set into clusters. All instances in one cluster are close to each other,
according to an appropriate metric, and far from the instances in the
other clusters. The interface for choosing and configuring them is the
same as for filters and classifiers. There are options for choosing test
and training sets. The results shown in the output pane are quite similar
to those produced after building a classifier. See figure 4.
Figure 4: The SimpleKMeans algorithm applied to the training data; the
two resulting clusters are shown in the ’Output’ pane.
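The idea behind an algorithm like SimpleKMeans can be sketched in a few lines of plain Java. This is illustrative only, on one-dimensional data with fixed initial centroids; it is not WEKA's implementation:

```java
import java.util.Arrays;

// Sketch of the k-means idea: assign each point to the nearest centroid,
// move each centroid to the mean of its points, and repeat until the
// assignments stop changing.
public class KMeansSketch {
    public static int[] cluster(double[] points, double[] centroids) {
        int[] assign = new int[points.length];
        boolean changed = true;
        while (changed) {
            changed = false;
            // assignment step: nearest centroid wins
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int c = 1; c < centroids.length; c++)
                    if (Math.abs(points[i] - centroids[c]) < Math.abs(points[i] - centroids[best]))
                        best = c;
                if (best != assign[i]) { assign[i] = best; changed = true; }
            }
            // update step: move centroids to the mean of their clusters
            double[] sum = new double[centroids.length];
            int[] count = new int[centroids.length];
            for (int i = 0; i < points.length; i++) { sum[assign[i]] += points[i]; count[assign[i]]++; }
            for (int c = 0; c < centroids.length; c++)
                if (count[c] > 0) centroids[c] = sum[c] / count[c];
        }
        return assign;
    }

    public static void main(String[] args) {
        double[] points = { 1.0, 1.2, 0.8, 9.9, 10.1, 10.3 };
        int[] assign = cluster(points, new double[] { 1.0, 10.0 });
        System.out.println(Arrays.toString(assign)); // [0, 0, 0, 1, 1, 1]
    }
}
```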
Attribute selection aims to find the one subset of attributes that works
best for classifying. Two operators are needed - a
subset evaluator and a search method. The search method traverses the
attribute subset space and uses the evaluator as a quality measure. Both
of them can be chosen and configured similarly to the filters and
classifiers. After an attribute selection is performed, a list of all
attributes and their relevance ranks is shown in the ’Output’ pane. See figure 5.
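One simple search strategy is ranking: score every attribute with the evaluator, sort by score and keep the best k. A minimal sketch in plain Java (names are our own; the scores would come from an evaluator):

```java
import java.util.*;

// Sketch of a ranking-style attribute search: given one relevance score
// per attribute, return the indices of the k best-scoring attributes.
public class RankerSketch {
    public static int[] topK(double[] scores, int k) {
        Integer[] idx = new Integer[scores.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // sort attribute indices by descending score
        Arrays.sort(idx, (a, b) -> Double.compare(scores[b], scores[a]));
        int[] out = new int[k];
        for (int i = 0; i < k; i++) out[i] = idx[i];
        return out;
    }

    public static void main(String[] args) {
        // relevance scores for four attributes; keep the best two
        double[] scores = { 0.1, 0.9, 0.4, 0.7 };
        System.out.println(Arrays.toString(topK(scores, 2))); // [1, 3]
    }
}
```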
Some of the attribute evaluators are shown here:
Figure 6: In the ’Visualize’ panel all possible 2D distributions are shown.
Figure 7: The Knowledge Flow.
Figure 8: The Experimenter set up with two different data sets and three
different classifiers, one of them with two different sets of parameters.
ing and test set and all different learning schemes. In the advanced mode
there is an option for performing a distributed experiment using RMI.
Various statistics, such as the error rate or the percentage of incorrectly
classified instances, can be shown in the ’Analyze’ panel and used for
finding the best classifier.
certain package, specifying its options and providing input data. The output
looks like the output in the ’Output’ pane of the ’Classify’, ’Cluster’ or
’Associate’ panels of the Explorer.
3 Experimental results
A small spam filtering application was developed using WEKA’s classifiers
as a practical assignment for the seminar. The team included three students
attending the seminar.
The system reads emails from a directory on the hard drive. Then it
builds six classifiers, saves them on the hard drive and evaluates them.
The classification step consists of loading one of the saved classifiers
and putting labels on the emails located in a user-defined folder.
Emails were collected from a repository containing all emails of a bankrupt
US company. They are parsed in order to extract the body and the subject
of the email, together with some other features that we assumed could be
important for correct prediction, namely: whether there are any HTML tags,
whether there are any JScripts, whether the To field is missing, the number
of recipients and the percentage of capital letters. Then from each email
an instance was created, using all these features as attributes. From the
produced instances, initial training and test sets were created.
Then follows data preprocessing in terms of applying a WEKA filter. The
filter used is weka.filters.unsupervised.attribute.StringToWordVector. It
converts string attributes into a set of attributes representing word
occurrence information from the text contained in the strings. A parameter
that makes words occurring in only one of the classes more important is
set. This way each instance is converted to an instance with about 4000
attributes. Given that we used 2000 spam and 2000 non-spam emails for
building the classifiers, this is already a big input training set. After
running into java.lang.OutOfMemoryError a few times, we decided to apply
attribute selection.
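To illustrate the idea behind this filter, here is a plain-Java sketch, not the WEKA implementation, that turns strings into word-presence vectors over a shared vocabulary:

```java
import java.util.*;

// Sketch of the word-vector idea behind StringToWordVector: build a
// vocabulary from all texts, then encode each text as a 0/1 vector of
// word occurrences. Illustrative only; WEKA's filter is far richer
// (counts, TF-IDF, stemming, per-class word weighting, ...).
public class WordVectorSketch {
    public static List<String> vocabulary(List<String> texts) {
        SortedSet<String> vocab = new TreeSet<>();
        for (String t : texts) vocab.addAll(Arrays.asList(t.toLowerCase().split("\\W+")));
        vocab.remove(""); // artifact of splitting on non-word characters
        return new ArrayList<>(vocab);
    }

    public static int[] encode(String text, List<String> vocab) {
        Set<String> words = new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
        int[] v = new int[vocab.size()];
        for (int i = 0; i < vocab.size(); i++) v[i] = words.contains(vocab.get(i)) ? 1 : 0;
        return v;
    }

    public static void main(String[] args) {
        List<String> mails = Arrays.asList("Buy cheap pills", "Meeting at noon");
        List<String> vocab = vocabulary(mails);
        System.out.println(vocab);                              // [at, buy, cheap, meeting, noon, pills]
        System.out.println(Arrays.toString(encode("Buy pills", vocab))); // [0, 1, 0, 0, 0, 1]
    }
}
```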
A meta-classifier that reduces the dimensionality of training and test
data is used to combine a non-meta classifier with an attribute selection
method. All five classifiers - J48, IBk, NaiveBayesUpdateable,
MultilayerPerceptron and SMO - are used together with the
ChiSquaredAttributeEval attribute evaluator and the Ranker search method.
The 250 most relevant attributes were chosen and used in building these
classifiers.
We also implemented a brand new classifier. It extends the base class
weka.classifiers.Classifier, implements the buildClassifier method and
overrides the classifyInstance method. It combines the decision tree, the
k-nearest-neighbours and the support vector machine classifiers already
built with the reduced number of attributes. For classifying, it performs
majority voting over the three classifiers.
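The voting rule itself is simple; a minimal sketch in plain Java (illustrative names, not the seminar code itself):

```java
import java.util.*;

// Sketch of the majority-voting rule used by a combined classifier:
// each base classifier predicts a class label, and the label with the
// most votes wins. Ties keep the first classifier's prediction.
public class MajorityVote {
    public static String vote(String... predictions) {
        Map<String, Integer> counts = new HashMap<>();
        for (String p : predictions) counts.merge(p, 1, Integer::sum);
        String best = predictions[0];
        for (Map.Entry<String, Integer> e : counts.entrySet())
            if (e.getValue() > counts.get(best)) best = e.getKey();
        return best;
    }

    public static void main(String[] args) {
        // e.g. the tree and the SVM say spam, the k-NN says ham -> spam wins 2:1
        System.out.println(vote("spam", "ham", "spam")); // spam
    }
}
```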
The evaluation of each classifier is performed with 10-fold cross-validation.
The following results are obtained for building time and accuracy:
The low value we got for the naive Bayes classifier was suspicious enough
at that phase to make us think that the training data was somehow biased.
This was confirmed when we tested our best classifier - the support vector
machine - with an independent test set. The accuracy was about 50%, with a
lot of false positive (non-spam mails classified as spam) predictions.
4 Conclusion
The WEKA machine learning workbench provides an environment with
algorithms for data preprocessing, feature selection, classification,
regression and clustering. They are complemented by graphical user
interfaces for exploring the input data and the built models. It also
supports experimental comparison of one algorithm with varying parameters,
or of different algorithms, applied to one or more datasets. All this is
done in order to facilitate the process of extracting useful information
from the data. The input dataset is in the form of a table. Every row
represents a single instance and every column represents a different
attribute.
Perhaps the most important feature is the uniform Java interface to all
algorithms. They are organized in packages, and when weka.jar is added to
a standalone project, only import statements are needed in order to get
access to any functionality of WEKA.
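A minimal sketch of how this looks in code, assuming weka.jar is on the classpath and an ARFF file named weather.arff exists (the file name is our assumption):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;

// Minimal sketch of the uniform Java interface: load a dataset, build a
// classifier and evaluate it with 10-fold cross-validation. Requires
// weka.jar on the classpath; "weather.arff" is an assumed example file.
public class WekaFromCode {
    public static void main(String[] args) throws Exception {
        Instances data = new Instances(new BufferedReader(new FileReader("weather.arff")));
        data.setClassIndex(data.numAttributes() - 1); // last attribute is the class

        J48 tree = new J48();          // any classifier is used the same way
        tree.buildClassifier(data);

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```

Swapping J48 for any other classifier, clusterer or filter follows the same pattern, which is what makes the uniform interface so convenient.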
WEKA has some memory issues. It expects the data set to be loaded
completely into main memory, which is not possible for some data mining
tasks, and it is very slow on large data sets. For k-fold cross-validation
WEKA creates k copies of the original data. Only one copy exists at a
time, but the resources spent on copying are wasted.
There are a lot of projects that use WEKA to some extent or even extend
it. One of them is BioWEKA [1], which is used for knowledge discovery and
data analysis in biology, biochemistry and bioinformatics. It is an open
source project by the Ludwig-Maximilians-Universitaet Muenchen. For
example, there are filters for translating DNA to RNA sequences and vice
versa.
Another project is YALE (Yet Another Learning Environment) [2], which is
implemented at the University of Dortmund. It supports the composition
and analysis of complex operator chains consisting of different nested
preprocessing steps, classifier building, evaluation and complex feature
generators for introducing new attributes.
References
[1] Jan E. Gewehr, Martin Szugat, and Ralf Zimmer. BioWeka - Extending
the Weka Framework for Bioinformatics. Bioinformatics, page btl671, 2007.
[3] I. Witten, E. Frank, L. Trigg, M. Hall, G. Holmes, and S. Cunningham.
Weka: Practical machine learning tools and techniques with Java
implementations. 1999.
[4] Ian H. Witten and Eibe Frank. Data mining: Practical machine learning
tools and techniques. 2005.