Weka Tutorial

Weka is a Java-based machine learning tool that implements various classification algorithms. It has three modes of operation: a graphical user interface (GUI), command line, and Java API. Weka contains sample datasets in ARFF format and can be used to classify data, analyze results, and save models. The document provides instructions for loading data, running classifiers like Naive Bayes through the GUI or command line, evaluating accuracy, and using Weka's tools to complete homework assignments.

Uploaded by

hugobernal
Copyright
© Attribution Non-Commercial (BY-NC)

A Short Introduction to Weka

Natural Language Processing
Thursday, November 5th

What is weka?

Java-based Machine Learning Tool


Implements numerous classifiers
3 modes of operation

GUI
Command Line
Java API (not discussed here)

Google: weka java

weka Homepage

http://www.cs.waikato.ac.nz/ml/weka/
To run:

java -Xmx1024M -jar ~cs4705/bin/weka.jar &

.arff file format

http://www.cs.waikato.ac.nz/~ml/weka/arff.html

% 1. Title: Iris Plants Database
%
@RELATION iris

@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa

.arff file format


@attribute attrName {numeric, string, <nominal>, date}

numeric: a number
nominal: a (finite) set of strings, e.g. {Iris-setosa,Iris-versicolor,Iris-virginica}
string: <arbitrary strings>
date: (default ISO-8601) yyyy-MM-dd'T'HH:mm:ss
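
To make the header and data layout concrete, here is a minimal Python sketch of an ARFF reader. It is an illustration only and not Weka's own parser: the function name `parse_arff` is hypothetical, and the sketch assumes unquoted values, no sparse rows, and no date or string attributes.

```python
import re

def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows).

    Simplified on purpose: handles % comments, @relation, @attribute and
    comma-separated @data rows. Quoted values, sparse rows and dates are
    not supported.
    """
    relation = None
    attributes = []  # (name, kind); kind is "numeric", "string", ... or a list of labels
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (_, kind), value in zip(attributes, values):
                if value == "?":
                    row.append(None)          # ARFF marks missing values with ?
                elif kind == "numeric":
                    row.append(float(value))
                else:
                    row.append(value)
            rows.append(row)
            continue
        lower = line.lower()  # ARFF keywords are case-insensitive
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            nominal = re.fullmatch(r"\{(.*)\}", kind.strip())
            if nominal:
                kind = [label.strip() for label in nominal.group(1).split(",")]
            else:
                kind = kind.strip().lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


IRIS_SAMPLE = """\
% 1. Title: Iris Plants Database
%
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor, Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
"""

relation, attributes, rows = parse_arff(IRIS_SAMPLE)
print(relation, len(attributes), rows[0])
```

The nominal attribute comes back as a list of labels, which is a convenient way to check that every class value in the data section is one of the declared strings.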

Example Arff Files

~cs4705/bin/weka-3-4-11/data/

iris.arff
soybean.arff
weather.arff

To Classify with weka GUI


1. Run weka GUI
   a. (in Unix: java -jar weka.jar)
2. Click 'Explorer'
3. 'Open file...'
4. Select 'Classify' tab
5. 'Choose' a classifier
6. Confirm options
7. Click 'Start'
8. Wait...
9. Right-click on Result list entry
   a. 'Save result buffer'
   b. 'Save model'

Classify

Some classifiers to start with.


NaiveBayes
JRip
J48
SMO

Find References by selecting a classifier


Use Cross-Validation!
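
As a rough illustration of what cross-validation does, the Python sketch below splits instance indices into k folds so that each instance is held out for testing exactly once. It is a simplified assumption-laden sketch, not Weka's procedure: the helper name `k_fold_indices` is hypothetical, and this version neither shuffles nor stratifies the data.

```python
def k_fold_indices(n_instances, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_instances:
        raise ValueError("k must be between 2 and the number of instances")
    indices = list(range(n_instances))
    for fold in range(k):
        test = indices[fold::k]        # every k-th instance, offset by the fold number
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print("train on", train, "test on", test)
```

Each model is trained on the train indices and scored on the held-out test indices; averaging the k scores gives an estimate that does not reward memorizing the training data.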

Analyzing Results

Important tools for Homework 3

Accuracy

Correctly classified instances

F-measure
Confusion matrix

Save model
Visualization
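
The measures listed above can be computed by hand from a list of actual and predicted labels. The Python sketch below uses the standard definitions of accuracy, confusion matrix and F-measure (harmonic mean of precision and recall); the function names and the toy yes/no labels are made up for illustration and are not Weka output.

```python
from collections import Counter

def accuracy(actual, predicted):
    """Fraction of correctly classified instances."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

def f_measure(actual, predicted, label):
    """Harmonic mean of precision and recall for one class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == label and p == label)
    fp = sum(1 for a, p in pairs if a != label and p == label)
    fn = sum(1 for a, p in pairs if a == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for this example.
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]

print(accuracy(actual, predicted))
print(confusion_matrix(actual, predicted, ["yes", "no"]))
print(f_measure(actual, predicted, "yes"))
```

Reading the confusion matrix alongside accuracy shows which classes are being confused with each other, which a single accuracy number hides.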

Running weka from the Command Line

http://weka.wikispaces.com/Primer
Running an N-fold cross validation experiment

java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -x N -i

Using a predefined test set

java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -T testingdata.arff

Saving the model

java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -t trainingdata.arff -d output.model

Classifying a test set

java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -l input.model -T testingdata.arff

Getting help

java -cp ~cs4705/bin/weka.jar weka.classifiers.bayes.NaiveBayes -?
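
When running many experiments, it can help to assemble these command lines from a script. The Python sketch below only builds the argument list using the flags shown on this slide (-t, -T, -x, -i, -d, -l); it does not run Weka. The helper name `weka_command` and its parameter names are hypothetical, and the default jar path `weka.jar` is an assumption to be replaced with the actual location.

```python
import shlex

def weka_command(classifier, jar="weka.jar", train=None, test=None,
                 folds=None, per_class_stats=False,
                 save_model=None, load_model=None):
    """Build the argument list for one Weka command-line run.

    Only the flags that appear in the slides are supported.
    """
    argv = ["java", "-cp", jar, classifier]
    if load_model:
        argv += ["-l", load_model]      # load a previously saved model
    if train:
        argv += ["-t", train]           # training data
    if test:
        argv += ["-T", test]            # predefined test set
    if folds:
        argv += ["-x", str(folds)]      # N-fold cross-validation
    if per_class_stats:
        argv.append("-i")               # the -i flag used with -x on the slide
    if save_model:
        argv += ["-d", save_model]      # save the trained model
    return argv


cv_run = weka_command("weka.classifiers.bayes.NaiveBayes",
                      train="trainingdata.arff", folds=10, per_class_stats=True)
print(shlex.join(cv_run))
```

The resulting list can be passed to `subprocess.run` once the jar path points at a real Weka installation; note that a leading `~` in a path is expanded by the shell, not by `subprocess`.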

Homework 3 Weka Workflow


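[Figure: workflow diagram. Inputs T1 ... TN and S1, S2 ... SN pass through Your Feature Extractor to produce .arff files. Weka is run on the .arff to produce results and a best model; Weka is then run with the best model on the test .arff to produce results. Stages: Preprocessing (you), Experimentation (you), Grading (us).]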
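
The preprocessing stage amounts to turning raw inputs into an .arff file that Weka can read. The Python sketch below shows one possible shape for that step. Everything in it is illustrative: the two features (token count and character count), the relation name, the class labels and the function names are all hypothetical placeholders, not the features required by the homework.

```python
def extract_features(text):
    """Hypothetical feature extractor: token count and non-space character count."""
    tokens = text.split()
    return [len(tokens), sum(len(t) for t in tokens)]

def to_arff(relation, attributes, rows):
    """Render a relation name, attribute list and data rows as ARFF text."""
    lines = ["@RELATION " + relation, ""]
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):      # nominal: a finite set of strings
            kind = "{" + ",".join(kind) + "}"
        lines.append("@ATTRIBUTE {} {}".format(name, kind))
    lines += ["", "@DATA"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Toy labelled inputs, invented for this example.
documents = [("a short text", "pos"),
             ("another slightly longer example text", "neg")]

attributes = [("num_tokens", "NUMERIC"),
              ("num_chars", "NUMERIC"),
              ("class", ["pos", "neg"])]
rows = [extract_features(text) + [label] for text, label in documents]
arff_text = to_arff("homework", attributes, rows)
print(arff_text)
```

Keeping the extractor as one function that is applied to both the training and the test inputs helps ensure both .arff files declare the same attributes in the same order.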

Tips for Homework Success


Start early
Read instructions carefully
Start simply
Your system should always work

80/20 Rule

Add features incrementally


This way, you always have something you can turn in.
