Introduction To Weka

Weka is a collection of machine learning algorithms for data mining tasks. It contains tools for data preprocessing, classification, regression, clustering, association rules, and visualization. Datasets are stored in ARFF files which describe the attributes and contain the data values. Weka contains many classifiers like J48 decision trees and Naive Bayes. It also has filters for preprocessing data and tools like the Explorer for classification and clustering and Experimenter for running multiple experiments.

Uploaded by

sandyguru05
Copyright
© Attribution Non-Commercial (BY-NC)

Introduction to Weka

Overview


What is Weka?

Where to find Weka?

Command Line Vs GUI

Datasets in Weka

ARFF Files

Classifiers in Weka

Filters
What is Weka?


Weka is a collection of machine learning
algorithms for data mining tasks. The
algorithms can either be applied directly to a
dataset or called from your own Java code.
Weka contains tools for data pre-processing,
classification, regression, clustering,
association rules, and visualization. It is also
well-suited for developing new machine
learning schemes.
Where to find Weka


Weka website (Latest version 3.6):
– http://www.cs.waikato.ac.nz/ml/weka/


Weka Manual:
− http://transact.dl.sourceforge.net/sourceforge/weka/WekaManual-3.6.0.pdf
CLI Vs GUI


CLI:
− Recommended for in-depth usage
− Offers some functionality not available via the GUI

GUI:
− Explorer
− Experimenter
− Knowledge Flow
Datasets in Weka


Each entry in a dataset is an instance of the
Java class:
− weka.core.Instance

Each instance consists of a number of
attributes
Attributes


Nominal: one of a predefined list of values
− e.g. red, green, blue

Numeric: A real or integer number

String: Enclosed in “double quotes”

Date

Relational
ARFF Files


The external representation of an Instances
class

Consists of:
− A header: Describes the attribute types
− Data section: Comma separated list of data
ARFF File Example

[Figure: annotated ARFF file showing the dataset name, a comment, the attributes, the target / class variable, and the data values]
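
To make the header/data layout concrete, here is a minimal sketch in plain Java. It embeds a small ARFF text modelled on the weather data used elsewhere in these slides and splits it into relation name, attributes, and data rows. `ArffSketch` is a hypothetical helper, not a Weka class, and the sample rows are illustrative. It ignores quoted strings containing commas and the sparse format, so it is not a substitute for Weka's own loader.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Weka: splits ARFF text into header and data.
final class ArffSketch {
    final String relation;                              // dataset name from @relation
    final List<String> attributes = new ArrayList<>();  // "name type" for each @attribute
    final List<String[]> rows = new ArrayList<>();      // comma-separated data values

    // Small ARFF text modelled on the weather data (rows are illustrative).
    static final String SAMPLE = String.join("\n",
        "% Comment lines start with a percent sign",
        "@relation weather",
        "",
        "@attribute outlook {sunny, overcast, rainy}",
        "@attribute temperature numeric",
        "@attribute humidity numeric",
        "@attribute windy {TRUE, FALSE}",
        "@attribute play {yes, no}",
        "",
        "@data",
        "sunny,85,85,FALSE,no",
        "overcast,83,86,FALSE,yes",
        "rainy,70,96,FALSE,yes");

    ArffSketch(String text) {
        String rel = null;
        boolean inData = false;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) continue; // skip blanks and comments
            String lower = line.toLowerCase();
            if (inData) {
                rows.add(line.split("\\s*,\\s*"));                // one instance per line
            } else if (lower.startsWith("@relation")) {
                rel = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                attributes.add(line.substring("@attribute".length()).trim());
            } else if (lower.startsWith("@data")) {
                inData = true;                                    // header ends, data begins
            }
        }
        relation = rel;
    }

    public static void main(String[] args) {
        ArffSketch arff = new ArffSketch(SAMPLE);
        System.out.println("relation: " + arff.relation);
        System.out.println("attributes: " + arff.attributes.size());
        System.out.println("instances: " + arff.rows.size());
    }
}
```

In this sample the last attribute (play) plays the role of the target / class variable.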
Assignment ARFF Files


Credit-g

Heart-c

Hepatitis

Vowel

Zoo


http://www.cs.auckland.ac.nz/~pat/weka/
ARFF Files


Basic statistics and validation by running:
− java weka.core.Instances data/soybean.arff
Classifiers in Weka

Learning algorithms in Weka are derived from
the abstract class:
− weka.classifiers.Classifier

Simple classifier: ZeroR
− Just determines the most common class
− Or the mean (in the case of a numeric class)
− Tests how well the class can be predicted
without considering other attributes
− Can be used as a lower bound on
performance.
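
The idea behind such a baseline can be sketched in a few lines of plain Java. `MajorityBaseline` is a hypothetical stand-in that does not use the Weka API, covers only the nominal-class case, and uses made-up labels. It picks the most common class and reports the accuracy that any real classifier should beat.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the idea behind ZeroR; it does not use the Weka API.
final class MajorityBaseline {
    // Returns the most frequent label; on a tie, the label that reached the top count first.
    static String mostCommon(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String label : labels) {
            int c = counts.merge(label, 1, Integer::sum);
            if (c > bestCount) {
                best = label;
                bestCount = c;
            }
        }
        return best;
    }

    // Fraction of labels equal to the constant prediction: the accuracy floor to beat.
    static double accuracy(String prediction, List<String> labels) {
        int hits = 0;
        for (String label : labels) {
            if (label.equals(prediction)) hits++;
        }
        return labels.isEmpty() ? 0.0 : (double) hits / labels.size();
    }

    public static void main(String[] args) {
        // Illustrative class labels only; all other attributes are ignored by design.
        List<String> labels = Arrays.asList("yes", "no", "yes", "yes", "no");
        String prediction = mostCommon(labels);
        System.out.println("always predict: " + prediction);
        System.out.println("baseline accuracy: " + accuracy(prediction, labels));
    }
}
```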
Classifiers in Weka


Simple Classifier Example
− java weka.classifiers.rules.ZeroR -t data/weather.arff
− java weka.classifiers.trees.J48 -t data/weather.arff

Help Command
− java weka.classifiers.trees.J48 -h
Classifiers in Weka


Soybean.arff split into train and test set
– Soybean-train.arff
– Soybean-test.arff

Input command:
– java weka.classifiers.trees.J48 -t soybean-train.arff -T soybean-test.arff -i
– -t: Training data
– -T: Test data
– -i: Provides more detailed output
Soybean Results
Soybean Results (cont...)

• True Positive (TP) rate
– Number correctly classified as class x / Actual total in class x
– Equivalent to Recall
• False Positive (FP) rate
– Number incorrectly classified as class x / Actual total of all classes, except x
Soybean Results (cont...)

• Precision:
– Number of examples classified as class x which truly have class x / Total classified as class x
• F-measure:
– 2*Precision*Recall / (Precision + Recall)
– i.e. a combined measure for precision and recall
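
These per-class measures can all be read off a confusion matrix. The sketch below is plain Java, not Weka code. `ClassMetrics` is a hypothetical helper, and the matrix layout `m[actual][predicted]` is an assumption chosen for the example.

```java
// Hypothetical helper, not Weka code: per-class rates from a confusion matrix
// where m[actual][predicted] counts instances.
final class ClassMetrics {
    // TP rate (recall): correctly classified as x / actual total in class x.
    static double tpRate(int[][] m, int x) {
        int actual = 0;
        for (int j = 0; j < m.length; j++) actual += m[x][j];
        return actual == 0 ? 0.0 : (double) m[x][x] / actual;
    }

    // FP rate: incorrectly classified as x / actual total of all classes except x.
    static double fpRate(int[][] m, int x) {
        int fp = 0, others = 0;
        for (int i = 0; i < m.length; i++) {
            if (i == x) continue;
            fp += m[i][x];
            for (int j = 0; j < m.length; j++) others += m[i][j];
        }
        return others == 0 ? 0.0 : (double) fp / others;
    }

    // Precision: correctly classified as x / total classified as x.
    static double precision(int[][] m, int x) {
        int predicted = 0;
        for (int i = 0; i < m.length; i++) predicted += m[i][x];
        return predicted == 0 ? 0.0 : (double) m[x][x] / predicted;
    }

    // F-measure: 2 * Precision * Recall / (Precision + Recall).
    static double fMeasure(int[][] m, int x) {
        double p = precision(m, x), r = tpRate(m, x);
        return p + r == 0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Illustrative two-class matrix: rows are actual classes, columns are predictions.
        int[][] m = { {8, 2}, {1, 9} };
        System.out.println("TP rate:   " + tpRate(m, 0));
        System.out.println("FP rate:   " + fpRate(m, 0));
        System.out.println("Precision: " + precision(m, 0));
        System.out.println("F-measure: " + fMeasure(m, 0));
    }
}
```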
Soybean Results (cont...)
[Figure: confusion matrix annotated with Total Actual h, Total Classified as h, and Total Correct]

Filters


weka.filters package

Transform datasets

Support for data preprocessing
− e.g. Removing/Adding Attributes
− e.g. Discretize numeric attributes into
nominal ones

More info in Weka Manual p. 15 & 16.
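
As an illustration of what discretizing a numeric attribute means, here is a minimal plain-Java sketch of equal-width binning, one common approach. `EqualWidthBins` is a hypothetical helper and not the Weka filter itself. The bin count and sample values are assumptions for the example.

```java
// Hypothetical helper, not the Weka filter: equal-width binning of one numeric attribute.
final class EqualWidthBins {
    // Maps each value to a bin index in [0, bins - 1], using equal-width intervals over [min, max].
    static int[] discretize(double[] values, int bins) {
        if (bins <= 0) throw new IllegalArgumentException("bins must be positive");
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / bins;
        int[] out = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // If all values are equal the width is zero and everything lands in bin 0.
            int b = width == 0 ? 0 : (int) ((values[i] - min) / width);
            out[i] = Math.min(b, bins - 1); // the maximum value belongs to the last bin
        }
        return out;
    }

    public static void main(String[] args) {
        double[] temperature = {64, 70, 75, 85}; // illustrative values
        int[] binned = discretize(temperature, 3);
        for (int i = 0; i < binned.length; i++) {
            System.out.println(temperature[i] + " -> bin " + binned[i]);
        }
    }
}
```

Each bin index can then be treated as one value of a nominal attribute.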
More Classifiers
Explorer

• Preprocess
• Classify
• Cluster
• Associate
• Select attributes
• Visualize
Preprocess

• Load Data
• Preprocess Data
• Analyse Attributes
Classify

• Select Test Options, e.g.:
– Use Training Set
– % Split
– Cross Validation...
• Run classifiers
• View results
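
The cross-validation option can be pictured as dealing the instances into k folds and testing on each fold in turn. The sketch below is plain Java under stated assumptions: `KFold` is a hypothetical helper, the seed is arbitrary, and the folds are not stratified by class, which Weka's own evaluation may do.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not Weka code: plain (unstratified) k-fold index splitting.
final class KFold {
    // Shuffles indices 0..n-1 with a fixed seed and deals them round-robin into k folds.
    static List<List<Integer>> folds(int n, int k, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed));
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) folds.add(new ArrayList<>());
        for (int i = 0; i < n; i++) folds.get(i % k).add(idx.get(i));
        return folds;
    }

    public static void main(String[] args) {
        // 14 instances, 10 folds: each instance is used for testing exactly once.
        List<List<Integer>> folds = folds(14, 10, 1L);
        for (int f = 0; f < folds.size(); f++) {
            // Fold f is the test set; all other folds together form the training set.
            System.out.println("fold " + f + " tests on instances " + folds.get(f));
        }
    }
}
```

A percentage split is the simpler special case: shuffle once, train on the first part, and test on the rest.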
Classify
Results
Experimenter

• Allows users to create, run, modify and
analyse experiments in a more convenient
manner than when processing individually.
– Setup
– Run
– Analyse
Experimenter: Setup

• Simple/Advanced
• Results Destinations
– ARFF
– CSV
– JDBC Database
[Figure: Setup screen with callouts for 10-fold Cross Validation, Datasets, Num of runs, and Classifiers]
Run Simple Experiment
Results
Advanced Example

Multiple Classifiers
Advanced Example
