Group 3: Elhaine, Jai, Icelle and Marianne

Weka is an open-source collection of machine learning algorithms and data mining tools developed in Java at the University of Waikato. It contains tools for data pre-processing, classification, clustering, and association rule mining, and its algorithms can be run from the command line or through its graphical user interface. Weka loads data from files in formats such as ARFF and can also import data from databases or URLs.

Uploaded by

Icelle Timbal

* A collection of open-source ML algorithms for:
1. pre-processing
2. classifiers
3. clustering
4. association rules
* Created by researchers at the University of Waikato
in New Zealand
* Java based
Download software from
http://www.cs.waikato.ac.nz/ml/weka/
* If you are interested in modifying/extending weka
there is a developer version that includes the
source code
Set the Weka environment variables for Java (csh syntax)
* setenv WEKAHOME /usr/local/weka/weka-3-0-2
* setenv CLASSPATH $WEKAHOME/weka.jar:$CLASSPATH
Download some ML data from
http://mlearn.ics.uci.edu/MLRepository.html
* Routines are implemented as classes and logically
arranged in packages
* Comes with an extensive GUI
--Weka routines can also be used standalone via the command
line
E.g. java weka.classifiers.j48.J48 -t $WEKAHOME/data/iris.arff
WEKA:: Interface
1. Simple CLI provides a command line interface
to Weka's routines
2. Explorer interface provides a graphical front
end to Weka's routines and components
3. Experimenter allows you to build classification
experiments
4. Knowledge Flow provides an alternative to the
Explorer as a graphical front end to Weka's
core algorithms.
*Uses flat text files to describe the data
*Can work with a wide variety of data files including
its own “.arff” format and C4.5 file formats
*Data can be imported from a file in various formats:
* ARFF, CSV, C4.5, binary
*Data can also be read from a URL or from an SQL
database (using JDBC)

* Example of an ARFF file:
@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...

A more thorough description is available at
http://www.cs.waikato.ac.nz/~ml/weka/arff.html
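The ARFF layout described above (a header of @attribute declarations followed by @data rows) can be read with a few lines of plain Java. The sketch below is only an illustration and is not Weka's own loader: the class name ArffSketch and its fields are hypothetical, and it assumes attribute names contain no spaces or quotes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: splits ARFF text into attribute names and data rows. */
class ArffSketch {
    final List<String> attributeNames = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    static ArffSketch parse(String text) {
        ArffSketch result = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and ARFF comment lines
            }
            String lower = line.toLowerCase();
            if (lower.startsWith("@attribute")) {
                // assumption: the name is the second whitespace-separated token
                result.attributeNames.add(line.split("\\s+")[1]);
            } else if (lower.startsWith("@data")) {
                inData = true; // everything after this marker is a data row
            } else if (inData) {
                String[] values = line.split(",");
                for (int i = 0; i < values.length; i++) {
                    values[i] = values[i].trim(); // "?" marks a missing value
                }
                result.rows.add(values);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String arff = String.join("\n",
            "@relation heart-disease-simplified",
            "@attribute age numeric",
            "@attribute sex { female, male}",
            "@attribute cholesterol numeric",
            "@attribute class { present, not_present}",
            "@data",
            "63,male,233,not_present",
            "38,female,?,not_present");
        ArffSketch parsed = parse(arff);
        System.out.println("attributes: " + parsed.attributeNames);
        System.out.println("rows: " + parsed.rows.size());
    }
}
```

In practice Weka's own loaders would handle quoting, sparse rows, and type checking, which this sketch leaves out on purpose.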
* Pre-processing tools in WEKA are called “filters”
* WEKA contains filters for:
--Discretization, normalization, resampling, attribute
selection, transforming and combining attributes, etc.
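To make the idea of a filter concrete, here is a minimal sketch of one of the listed operations, normalization, written as min-max rescaling of a single numeric attribute. It is plain Java and does not use Weka's filter classes; the class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of a normalization filter for one numeric attribute. */
class NormalizeSketch {
    /** Rescales values linearly so the smallest maps to 0 and the largest to 1. */
    static double[] minMax(double[] values) {
        double min = Arrays.stream(values).min().orElse(0.0);
        double max = Arrays.stream(values).max().orElse(0.0);
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // a constant column has no spread; map it to 0 to avoid dividing by zero
            out[i] = range == 0.0 ? 0.0 : (values[i] - min) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] ages = {63, 67, 67, 38}; // the age column of the ARFF example
        System.out.println(Arrays.toString(minMax(ages)));
    }
}
```

The input array is left untouched and a new array is returned, which mirrors the way a filter turns one data set into another.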
* Classifiers in WEKA are models for predicting nominal or
numeric quantities
* Implemented learning schemes include:
--Decision trees and lists, instance-based classifiers, support
vector machines, multi-layer perceptrons, logistic regression,
Bayes’ nets, …
* “Meta”-classifiers include:
--Bagging, boosting, stacking, error-correcting output codes,
locally weighted learning, …
WEKA:: Explorer: Clustering
* Example showing simple K-means on the Iris dataset
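For readers who want to see what simple k-means does underneath the Explorer, the sketch below implements the basic assign-then-update loop in plain Java. It is not Weka's implementation: the class name is hypothetical, the starting centroids are supplied by the caller, and the points in main are made-up petal-style measurements rather than rows from the Iris file.

```java
import java.util.Arrays;

/** Hypothetical sketch of basic k-means (assign to nearest centroid, then re-average). */
class KMeansSketch {
    /**
     * Returns the cluster index assigned to each point.
     * The centroids array is updated in place and holds the final centres on return.
     */
    static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // assignment step: each point joins its nearest centroid
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < centroids.length; c++) {
                    double d = squaredDistance(points[p], centroids[c]);
                    if (d < bestDist) {
                        bestDist = d;
                        best = c;
                    }
                }
                if (assignment[p] != best) {
                    assignment[p] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // no point moved, so the clustering has converged
            }
            // update step: move each centroid to the mean of its points
            for (int c = 0; c < centroids.length; c++) {
                double[] sum = new double[centroids[c].length];
                int count = 0;
                for (int p = 0; p < points.length; p++) {
                    if (assignment[p] != c) {
                        continue;
                    }
                    for (int j = 0; j < sum.length; j++) {
                        sum[j] += points[p][j];
                    }
                    count++;
                }
                if (count == 0) {
                    continue; // leave the centroid of an empty cluster where it is
                }
                for (int j = 0; j < sum.length; j++) {
                    centroids[c][j] = sum[j] / count;
                }
            }
        }
        return assignment;
    }

    static double squaredDistance(double[] a, double[] b) {
        double total = 0.0;
        for (int j = 0; j < a.length; j++) {
            double diff = a[j] - b[j];
            total += diff * diff;
        }
        return total;
    }

    public static void main(String[] args) {
        // illustrative (petal length, petal width) pairs, not taken from iris.arff
        double[][] points = {{1.4, 0.2}, {1.5, 0.3}, {4.7, 1.4}, {4.5, 1.5}};
        double[][] centroids = {{1.0, 0.0}, {5.0, 2.0}};
        System.out.println(Arrays.toString(cluster(points, centroids, 20)));
        System.out.println(Arrays.deepToString(centroids));
    }
}
```

Passing the initial centroids explicitly keeps the run deterministic; a real tool would typically pick them at random from a seed.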
* A very comprehensive open-source software package
implementing tools for:
--intelligent data analysis, data mining, knowledge
discovery, machine learning, predictive analytics,
forecasting, and analytics in business intelligence
(BI)
* Is implemented in Java and available under the
GPL, among other licenses
* Available from http://rapid-i.com

* Is similar in spirit to Weka's Knowledge Flow
* Data mining processes/routines are viewed as
sequential operators
* Knowledge discovery processes are modeled as
operator chains/trees
* Operators define their expected inputs and
delivered outputs as well as their parameters
* Has over 400 data mining operators
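The operator idea above can be sketched in plain Java: each operator takes data in and hands data out, carries its own parameters, and a chain of operators is itself an operator, so chains can nest into trees. All names below (Operator, Chain, dropBelow, scaleBy) are hypothetical and are not the tool's actual API.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of data mining steps modeled as chained operators. */
class OperatorChainSketch {
    /** A processing step with one expected input and one delivered output. */
    interface Operator {
        double[] apply(double[] input);
    }

    /** A chain runs its steps in order; being an Operator itself, it can nest. */
    static final class Chain implements Operator {
        private final List<Operator> steps;

        Chain(Operator... steps) {
            this.steps = Arrays.asList(steps);
        }

        @Override
        public double[] apply(double[] input) {
            double[] current = input;
            for (Operator step : steps) {
                current = step.apply(current); // output of one step feeds the next
            }
            return current;
        }
    }

    /** Parameterized operator: keeps only values at or above the threshold. */
    static Operator dropBelow(double threshold) {
        return in -> Arrays.stream(in).filter(v -> v >= threshold).toArray();
    }

    /** Parameterized operator: multiplies every value by the factor. */
    static Operator scaleBy(double factor) {
        return in -> Arrays.stream(in).map(v -> v * factor).toArray();
    }

    public static void main(String[] args) {
        // an outer chain containing an inner chain, i.e. a small operator tree
        Operator process = new Chain(dropBelow(230), new Chain(scaleBy(0.5)));
        double[] cholesterol = {233, 286, 229};
        System.out.println(Arrays.toString(process.apply(cholesterol)));
    }
}
```

Making Chain implement Operator is the design choice that turns a flat sequence into a tree without any extra machinery.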

* Uses XML for describing operator trees in the
KD process
* Alternatively can be started through the
command line and passed the XML process file
