WEKA
A Machine Learning Workbench for Data Mining
Len Trigg
Reel Two, P O Box 1538, Hamilton, New Zealand
[email protected]
Keywords: machine learning software, data mining, data preprocessing, data visu-
alization, extensible workbench
1. Introduction
Experience shows that no single machine learning method is appro-
priate for all possible learning problems. The universal learner is an
idealistic fantasy. Real datasets vary, and to obtain accurate models the
bias of the learning algorithm must match the structure of the domain.
The Weka workbench is a collection of state-of-the-art machine learn-
ing algorithms and data preprocessing tools. It is designed so that users
can quickly try out existing machine learning methods on new datasets
in very flexible ways. It provides extensive support for the whole process
of experimental data mining, including preparing the input data, evalu-
ating learning schemes statistically, and visualizing both the input data
and the result of learning. This has been accomplished by including a
wide variety of algorithms for learning different types of concepts, as well
as a wide range of preprocessing methods. This diverse and comprehen-
sive set of tools can be invoked through a common interface, making it
possible for users to compare different methods and identify those that
are most appropriate for the problem at hand.
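One way to see the effect of this uniformity from the Java side is the
following minimal sketch, which assumes the Weka 3 API of the time and a
placeholder dataset file name; it estimates the accuracy of two different
learning schemes on the same data using ten-fold cross-validation.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;

import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.trees.J48;
import weka.core.Instances;

public class CompareSchemes {
  public static void main(String[] args) throws Exception {
    // Load a dataset; "iris.arff" is a placeholder file name.
    Instances data = new Instances(new BufferedReader(new FileReader("iris.arff")));
    data.setClassIndex(data.numAttributes() - 1); // last attribute is the class

    // Two schemes from different classifier subpackages, evaluated identically.
    Classifier[] schemes = { new J48(), new NaiveBayes() };
    for (int i = 0; i < schemes.length; i++) {
      Evaluation eval = new Evaluation(data);
      eval.crossValidateModel(schemes[i], data, 10, new Random(1)); // ten-fold CV
      System.out.println(schemes[i].getClass().getName() + ": "
          + eval.pctCorrect() + "% correct");
    }
  }
}

Because every scheme conforms to the same classifier contract, comparing a
different algorithm requires changing only the object that is instantiated.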
The workbench includes methods for all the standard data mining
problems: regression, classification, clustering, association rule mining,
and attribute selection. Getting to know the data is a very important
part of data mining, and many data visualization facilities and data
preprocessing tools are provided. All algorithms and methods take their
input in the form of a single relational table, which can be read from a
file or generated by a database query.
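A minimal sketch of this input pathway, again assuming the Weka 3 Java API
and a placeholder file name, reads the single input table from an ARFF file
and applies one of the unsupervised preprocessing filters; a database query,
for example via weka.experiment.InstanceQuery, yields the same kind of
object.

import java.io.BufferedReader;
import java.io.FileReader;

import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;

public class PrepareData {
  public static void main(String[] args) throws Exception {
    // The single relational table, read here from a file
    // ("weather.arff" is a placeholder name).
    Instances data = new Instances(new BufferedReader(new FileReader("weather.arff")));
    data.setClassIndex(data.numAttributes() - 1);

    // One of the unsupervised attribute filters: discretize numeric attributes.
    Discretize discretize = new Discretize();
    discretize.setInputFormat(data);
    Instances filtered = Filter.useFilter(data, discretize);

    System.out.println(filtered.numInstances() + " instances, "
        + filtered.numAttributes() + " attributes after filtering");
  }
}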
System Architecture
In order to make its operation as flexible as possible, the workbench
was designed with a modular, object-oriented architecture that allows
new classifiers, filters, clustering algorithms and so on to be added easily.
A set of abstract Java classes, one for each major type of component,
was designed and placed in a corresponding top-level package.
All classifiers reside in subpackages of the top level “classifiers” pack-
age and extend a common base class called “Classifier.” The Classifier
class prescribes a public interface for classifiers and a set of conventions
by which they should abide. Subpackages group components accord-
ing to functionality or purpose. For example, filters are separated into
those that are supervised or unsupervised, and then further by whether
they operate on an attribute or instance basis. Classifiers are organized
according to the general type of learning algorithm, so there are sub-
packages for Bayesian methods, tree inducers, rule learners, etc.
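To make these conventions concrete, the following sketch shows roughly what
a new classifier looks like. The class itself is hypothetical and not part
of the distribution, and the details assume the Weka 3 class library, in
which Classifier is the abstract base class that all learning schemes
extend.

package weka.classifiers.misc; // illustrative subpackage placement

import weka.classifiers.Classifier;
import weka.core.Instance;
import weka.core.Instances;

/**
 * A toy scheme that always predicts the first class value. It exists only
 * to illustrate the contract prescribed by the Classifier base class.
 */
public class FirstClassPredictor extends Classifier {

  /** Index of the class value to predict. */
  private double m_prediction;

  /** Builds the "model" from the training data. */
  public void buildClassifier(Instances data) throws Exception {
    m_prediction = 0; // always the first class value
  }

  /** Predicts the class of a single instance. */
  public double classifyInstance(Instance instance) throws Exception {
    return m_prediction;
  }
}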
All components rely to a greater or lesser extent on supporting classes
that reside in a top level package called “core.” This package provides
classes and data structures that read data sets, represent instances and
attributes, and provide various common utility methods. The core pack-
age also contains additional interfaces that components may implement
in order to indicate that they support extra functionality. For
example, a classifier can implement the “WeightedInstancesHandler” in-
terface to indicate that it can take advantage of instance weights.
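A sketch of how a hypothetical classifier would declare and use this
capability, again assuming the Weka 3 class library:

import weka.classifiers.Classifier;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.WeightedInstancesHandler;

/**
 * Hypothetical scheme that declares, via the marker interface, that it
 * makes use of instance weights during training.
 */
public class WeightAwareLearner extends Classifier
    implements WeightedInstancesHandler {

  private double m_totalWeight;

  public void buildClassifier(Instances data) throws Exception {
    m_totalWeight = 0;
    for (int i = 0; i < data.numInstances(); i++) {
      // Each instance carries a weight that weight-aware schemes can exploit.
      m_totalWeight += data.instance(i).weight();
    }
    // A real scheme would fold these weights into its sufficient statistics.
  }

  public double classifyInstance(Instance instance) throws Exception {
    return 0; // placeholder prediction
  }
}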
A major part of the appeal of the system for end users lies in its graph-
ical user interfaces. In order to maintain flexibility it was necessary to
engineer the interfaces to make it as painless as possible for developers
to add new components to the workbench. To this end, the user interfaces
capitalize upon Java’s introspection mechanisms to discover and expose each
component’s options automatically, so that newly added components can be
configured through the graphical interfaces without any extra interface code.
Applications
Weka was originally developed for the purpose of processing agri-
cultural data, motivated by the importance of this application area in
New Zealand. However, the machine learning methods and data engi-
neering capability it embodies have grown so quickly, and so radically,
that the workbench is now commonly used in all forms of data min-
ing applications—from bioinformatics to competition datasets issued by
major conferences such as Knowledge Discovery in Databases.
New Zealand has several research centres dedicated to agriculture and
horticulture, which provided the original impetus for our work, and many
of our early applications. For example, we worked on predicting the
internal bruising sustained by different varieties of apple as they make
their way through a packing-house on a conveyor belt (Holmes et al., 1998);
predicting, in real time, the quality of a mushroom from a photograph in
order to provide automatic grading (Kusabs et al., 1998); and classifying
kiwifruit vines into twelve classes, based on visible-NIR spectra, in order
to determine which of twelve pre-harvest fruit management treatments
has been applied to the vines (Holmes and Hall, 2002). The applicability
of the workbench in agricultural domains was the subject of user studies
(McQueen et al., 1998) that demonstrated a high level of satisfaction with
the tool and gave some advice on improvements.
There are countless other applications, actual and potential. As just
one example, Weka has been used extensively in the field of bioinformatics.
Published studies include automated protein annotation (Bazzan et al., 2002),
probe selection for gene expression arrays (Tobler et al., 2002), plant
genotype discrimination (Taylor et al., 2002), and classifying gene
expression profiles and extracting rules from them (Li et al., 2003).
Text mining is another major field of application, and the workbench has
been used to automatically extract key phrases from text (Frank et al.,
1999), and for document categorization (Sauban and Pfahringer, 2003) and
word sense disambiguation (Pedersen, 2002).
The workbench makes it very easy to perform interactive experiments,
so it is not surprising that most work has been done with small to
medium-sized datasets.
Acknowledgments
Many thanks to past and present members of the Waikato machine
learning group and the many external contributors for all the work they
have put into Weka.
References
Bazzan, A. L., Engel, P. M., Schroeder, L. F., and da Silva, S. C. (2002).
Automated annotation of keywords for proteins related to mycoplas-
mataceae using machine learning techniques. Bioinformatics, 18:35S–
43S.
Frank, E., Holmes, G., Kirkby, R., and Hall, M. (2002). Racing commit-
tees for large datasets. In Proceedings of the International Conference
on Discovery Science, pages 153–164. Springer-Verlag.
Frank, E., Paynter, G. W., Witten, I. H., Gutwin, C., and Nevill-Manning,
C. G. (1999). Domain-specific keyphrase extraction. In Proceedings
of the 16th International Joint Conference on Artificial Intelligence,
pages 668–673. Morgan Kaufmann.
Holmes, G., Cunningham, S. J., Rue, B. D., and Bollen, F. (1998). Pre-
dicting apple bruising using machine learning. Acta Hort, 476:289–
296.
Holmes, G. and Hall, M. (2002). A development environment for predic-
tive modelling in foods. International Journal of Food Microbiology,
73:351–362.
Holmes, G., Kirkby, R., and Pfahringer, B. (2003). Mining data streams
using option trees. Technical Report 08/03, Department of Computer
Science, University of Waikato.
Kusabs, N., Bollen, F., Trigg, L., Holmes, G., and Inglis, S. (1998).
Objective measurement of mushroom quality. In Proceedings of the New
Zealand Institute of Agricultural Science and the New Zealand Society for
Horticultural Science Annual Convention, page 51.
Li, J., Liu, H., Downing, J. R., Yeoh, A. E.-J., and Wong, L. (2003).
Simple rules underlying gene expression profiles of more than six sub-
types of acute lymphoblastic leukemia (ALL) patients. Bioinformatics,
19:71–78.
McQueen, R., Holmes, G., and Hunt, L. (1998). User satisfaction with
machine learning as a data analysis method in agricultural research.
New Zealand Journal of Agricultural Research, 41(4):577–584.
Pedersen, T. (2002). Evaluating the effectiveness of ensembles of decision
trees in disambiguating Senseval lexical samples. In Proceedings of the
ACL-02 Workshop on Word Sense Disambiguation: Recent Successes
and Future Directions.
Sauban, M. and Pfahringer, B. (2003). Text categorisation using doc-
ument profiling. In Proceedings of the 7th European Conference on
Principles and Practice of Knowledge Discovery in Databases, pages
411–422. Springer.
Taylor, J., King, R. D., Altmann, T., and Fiehn, O. (2002). Application
of metabolomics to plant genotype discrimination using statistics and
machine learning. Bioinformatics, 18:241S–248S.
Tobler, J. B., Molla, M., Nuwaysir, E., Green, R., and Shavlik, J. (2002).
Evaluating machine learning approaches for aiding probe selection for
gene-expression arrays. Bioinformatics, 18:164S–171S.