
Unit V WEKA Tool

Datasets – Introduction - ARFF File Format


ARFF files have two distinct sections. The first section is the Header information,
which is followed by the Data information.

The Header of the ARFF file contains the name of the relation, a list of the attributes
(the columns in the data), and their types. An example header on the standard IRIS
dataset looks like this:
% 1. Title: Iris Plants Database
%
% 2. Sources:
% (a) Creator: R.A. Fisher
% (b) Donor: Michael Marshall (MARSHALL%[email protected])
% (c) Date: July, 1988
%
@RELATION iris

@ATTRIBUTE sepallength NUMERIC


@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

The Data of the ARFF file looks like the following:


@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa

Lines that begin with a % are comments.


The @RELATION, @ATTRIBUTE and @DATA declarations are case insensitive.

Examples
Several well-known machine learning datasets are distributed with Weka in the
$WEKAHOME/data directory as ARFF files.

The ARFF Header Section

The ARFF Header section of the file contains the relation declaration and attributes
declarations.

The @relation Declaration

The relation name is defined as the first line in the ARFF file. The format is:
@relation <relation-name>

where <relation-name> is a string. The string must be quoted if the name includes
spaces.
The @attribute Declarations

Attribute declarations take the form of an ordered sequence of @attribute statements.


Each attribute in the data set has its own @attribute statement, which uniquely
defines the name of that attribute and its data type. The order in which the attributes
are declared indicates the column position in the data section of the file. For example, if
an attribute is the third one declared, then Weka expects that all of that attribute's values
will be found in the third comma-delimited column.

The format for the @attribute statement is:


@attribute <attribute-name> <datatype>

where the <attribute-name> must start with an alphabetic character. If spaces are to
be included in the name then the entire name must be quoted.

The <datatype> can be any of the four types currently (version 3.2.1) supported by
Weka:

 numeric
 <nominal-specification>
 string
 date [<date-format>]
where <nominal-specification> and <date-format> are defined below. The
keywords numeric, string and date are case insensitive.

Numeric attributes

Numeric attributes can be real or integer numbers.

Nominal attributes

Nominal values are defined by providing a <nominal-specification> listing the
possible values:

{<nominal-name1>, <nominal-name2>, <nominal-name3>, ...}

For example, the class value of the Iris dataset can be defined as follows:
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

Values that contain spaces must be quoted.

String attributes

String attributes allow us to create attributes containing arbitrary textual values. This
is very useful in text-mining applications, as we can create datasets with string
attributes and then apply Weka filters that manipulate strings (like the
StringToWordVector filter). String attributes are declared as follows:
@ATTRIBUTE LCC string
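
As an illustration, a string attribute can be turned into word-count features with the StringToWordVector filter mentioned above. A minimal Java sketch, assuming a hypothetical file text.arff containing at least one string attribute:

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class TextToVectors {
    public static void main(String[] args) throws Exception {
        // Placeholder file: an ARFF with at least one string attribute
        Instances data = DataSource.read("text.arff");

        StringToWordVector filter = new StringToWordVector();
        filter.setInputFormat(data);
        // Each string value becomes a sparse vector of word features
        Instances vectors = Filter.useFilter(data, filter);
        System.out.println(vectors.numAttributes() + " attributes after filtering");
    }
}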

Date attributes

Date attribute declarations take the form:


@attribute <name> date [<date-format>]

where <name> is the name of the attribute and <date-format> is an optional string
specifying how date values should be parsed and printed (the same patterns as
Java's SimpleDateFormat class). The default format string accepts the ISO-8601
combined date and time format: "yyyy-MM-dd'T'HH:mm:ss".

Dates must be specified in the data section as the corresponding string representations
of the date/time (see example below).
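
A small example in that style, with a custom date format (the date values are quoted because they contain spaces):

@RELATION Timestamps

@ATTRIBUTE timestamp DATE "yyyy-MM-dd HH:mm:ss"

@DATA
"2001-04-03 12:12:12"
"2001-05-03 12:59:55"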
ARFF Data Section

The ARFF Data section of the file contains the data declaration line and the actual
instance lines.

The @data Declaration

The @data declaration is a single line denoting the start of the data segment in the
file. The format is:
@data

Example
Name           GiveBirth  CanFly  LiveInWater  HaveLegs  Class
Human          yes        no      no           yes       mammals
Python         no         no      no           no        non-mammals
Salmon         no         no      yes          no        non-mammals
Whale          yes        no      yes          no        mammals
Frog           no         no      sometimes    yes       non-mammals
Komodo         no         no      no           yes       non-mammals
Bat            yes        yes     no           yes       mammals
Pigeon         no         yes     no           yes       non-mammals
Cat            yes        no      no           yes       mammals
Leopard shark  yes        no      yes          no        non-mammals
Turtle         no         no      sometimes    yes       non-mammals
Penguin        no         no      sometimes    yes       non-mammals
Porcupine      yes        no      no           yes       mammals
Eel            no         no      yes          no        non-mammals
Salamander     no         no      sometimes    yes       non-mammals
Gila monster   no         no      no           yes       non-mammals
Platypus       no         no      no           yes       mammals
Owl            no         yes     no           yes       non-mammals
Dolphin        yes        no      yes          no        mammals
Eagle          no         yes     no           yes       non-mammals

program.arff:

@relation program

@attribute GiveBirth {Yes,No}
@attribute CanFly {Yes,No}
@attribute LiveInWater {Yes,No,Sometimes}
@attribute HaveLegs {Yes,No}
@attribute class {Mammals,Non-mammals}

@data
Yes,No,No,Yes,Mammals
No,No,No,No,Non-mammals
No,No,Yes,No,Non-mammals
Yes,No,Yes,No,Mammals
No,No,Sometimes,Yes,Non-mammals
No,No,No,Yes,Non-mammals
Yes,Yes,No,Yes,Mammals
No,Yes,No,Yes,Non-mammals
Yes,No,No,Yes,Mammals
Yes,No,Yes,No,Non-mammals
No,No,Sometimes,Yes,Non-mammals
No,No,Sometimes,Yes,Non-mammals
Yes,No,No,Yes,Mammals
No,No,Yes,No,Non-mammals
No,No,Sometimes,Yes,Non-mammals
No,No,No,Yes,Non-mammals
No,No,No,Yes,Mammals
No,Yes,No,Yes,Non-mammals
Yes,No,Yes,No,Mammals
No,Yes,No,Yes,Non-mammals
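
The file above can also be loaded and classified programmatically through Weka's Java API. A minimal sketch, assuming program.arff is in the working directory and the Weka jar is on the classpath; it builds the J48 tree shown later in this unit:

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ProgramJ48 {
    public static void main(String[] args) throws Exception {
        // Load the ARFF file; DataSource picks a loader from the file extension
        Instances data = DataSource.read("program.arff");
        // The class attribute is the last one declared in the header
        data.setClassIndex(data.numAttributes() - 1);

        // Build a J48 (C4.5) decision tree on the full dataset
        J48 tree = new J48();
        tree.buildClassifier(data);
        System.out.println(tree);  // prints the pruned tree
    }
}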
Iris plants database
https://archive.ics.uci.edu/ml/datasets/Iris

Data Set Information:


This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is
a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data
set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is
linearly separable from the other 2; the latter are NOT linearly separable from each other.

Predicted attribute: class of iris plant. This is an exceedingly simple domain.

This data differs from the data presented in Fisher's article (identified by Steve
Chadwick, spchadwick '@' espeedaz.net). The 35th sample should be: 4.9,3.1,1.5,0.2,"Iris-setosa"
where the error is in the fourth feature. The 38th sample: 4.9,3.6,1.4,0.1,"Iris-setosa" where the errors are
in the second and third features.

Attribute Information:
1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class:
-- Iris Setosa
-- Iris Versicolour
-- Iris Virginica

Breast Cancer Wisconsin (Original) Data Set


https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(original)

Data Set Information:


Samples arrive periodically as Dr. Wolberg reports his clinical cases. The database therefore reflects this
chronological grouping of the data. This grouping information appears immediately below, having been
removed from the data itself:

Group 1: 367 instances (January 1989)


Group 2: 70 instances (October 1989)
Group 3: 31 instances (February 1990)
Group 4: 17 instances (April 1990)
Group 5: 48 instances (August 1990)
Group 6: 49 instances (Updated January 1991)
Group 7: 31 instances (June 1991)
Group 8: 86 instances (November 1991)
-----------------------------------------
Total: 699 points (as of the donated database on 15 July 1992)
Note that the results summarized above in Past Usage refer to a dataset of size 369, while Group 1 has
only 367 instances. This is because it originally contained 369 instances; 2 were removed. The following
statements summarize changes to the original Group 1 set of data:

##### Group 1 : 367 points: 200B 167M (January 1989)

##### Revised Jan 10, 1991: Replaced zero bare nuclei in 1080185 & 1187805

##### Revised Nov 22,1991: Removed 765878,4,5,9,7,10,10,10,3,8,1 no record


##### : Removed 484201,2,7,8,8,4,3,10,3,4,1 zero epithelial
##### : Changed 0 to 1 in field 6 of sample 1219406
##### : Changed 0 to 1 in field 8 of following sample:
##### : 1182404,2,3,1,1,1,2,0,1,1,1

Attribute Information:
1. Sample code number: id number
2. Clump Thickness: 1 - 10
3. Uniformity of Cell Size: 1 - 10
4. Uniformity of Cell Shape: 1 - 10
5. Marginal Adhesion: 1 - 10
6. Single Epithelial Cell Size: 1 - 10
7. Bare Nuclei: 1 - 10
8. Bland Chromatin: 1 - 10
9. Normal Nucleoli: 1 - 10
10. Mitoses: 1 - 10
11. Class: (2 for benign, 4 for malignant)

1985 Auto Imports Database


https://github.com/jihoonerd/1985_Auto_Imports_Database

Prediction Model of Loss Payment Ratio of Motors, using 1985 Auto Imports Database

Overview

The objective of this project is to train a prediction model that infers the normalized loss
ratio of automobiles. The project has four stages. First, in the project setup stage, the
data is prepared for processing. Second, exploratory data analysis is conducted to
visualize the data. In the third stage, a prediction model is implemented. Lastly,
performance is recorded and visualized.

Data Set Information:


This data set consists of three types of entities: (a) the specification of an auto in terms
of various characteristics, (b) its assigned insurance risk rating, (c) its normalized losses
in use as compared to other cars. The second rating corresponds to the degree to
which the auto is more risky than its price indicates. Cars are initially assigned a risk
factor symbol associated with their price. Then, if a car is more risky (or less), this symbol is
adjusted by moving it up (or down) the scale. Actuaries call this process "symboling". A
value of +3 indicates that the auto is risky, -3 that it is probably pretty safe. The third
factor is the relative average loss payment per insured vehicle year. This value is
normalized for all autos within a particular size classification (two-door small, station
wagons, sports/speciality, etc...), and represents the average loss per car per year.
Note: Several of the attributes in the database could be used as a "class" attribute.

Dataset Size:

 Number of Instances: 205


 Number of Attributes: 26 total
o 15 continuous
o 1 integer
o 10 nominal

Introduction to WEKA:
WEKA stands for Waikato Environment for Knowledge Analysis.
 Weka, developed at the University of Waikato in New Zealand, is an open-source data
mining tool written in Java.
 It contains a collection of algorithms for data mining tasks, including data preprocessing,
association mining, classification, regression, clustering and visualization.

WEKA is a data mining system developed by the University of Waikato in New Zealand that implements
data mining algorithms. WEKA is a state-of-the-art facility for developing machine learning (ML)
techniques and applying them to real-world data mining problems. It is a collection of machine
learning algorithms for data mining tasks. The algorithms are applied directly to a dataset. WEKA
implements algorithms for data preprocessing, classification, regression, clustering and association rules; it
also includes visualization tools. New machine learning schemes can also be developed with this
package. WEKA is open-source software issued under the GNU General Public License [3].

 Weka supports four file formats:


1. .arff
2. .csv
3. .names and
4. .data
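
For example, a .csv file can be converted to .arff programmatically. A minimal sketch using Weka's converter classes (the file names here are placeholders):

import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;

public class CsvToArff {
    public static void main(String[] args) throws Exception {
        // Read the CSV file; the first row is treated as attribute names
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("program.csv"));   // placeholder input
        Instances data = loader.getDataSet();

        // Write the same instances out in ARFF format
        ArffSaver saver = new ArffSaver();
        saver.setInstances(data);
        saver.setFile(new File("program.arff"));     // placeholder output
        saver.writeBatch();
    }
}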

Procedure to load the dataset:

1. In the Weka Explorer, click Open file and load program.arff.
2. Select the Classify tab and click Choose.
3. Under trees, select J48.
4. Click Start.
5. In the result list, right-click the entry (e.g. 18:35:26 - trees.J48).
6. Select Visualize tree.

DECISION TREE:

GiveBirth = Yes: Mammals (7.0/1.0)
GiveBirth = No: Non-mammals (13.0/1.0)

To run a rules-based classifier instead:

1. Select the Classify tab and click Choose.
2. Under rules, select ZeroR.
3. Click Start.

Load Your Data

Click the “Open file” button in the Preprocess section and load your .arff file from your local
file system. If you couldn’t convert your .csv to .arff, don’t worry, because Weka will do that
for you.
Figure 3.1 Preprocess of Iris Dataset

If you have followed all the steps so far, you have loaded your dataset successfully and you’ll see
the attribute names (illustrated in the red area on the above images). The pre-processing stage is
handled by Filters in Weka: you can click the ‘Choose’ button under Filter and apply any filter you want.
For example, if you would like to use Association Rule Mining as a training model, you have to
discretize numeric and continuous attributes. To do that you can follow the path:
Choose -> Filter -> Supervised -> Attribute -> Discretize.
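
The same filter can be applied through the Java API. A minimal sketch, assuming iris.arff is available and the class attribute is the last one:

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.supervised.attribute.Discretize;

public class DiscretizeExample {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("iris.arff");
        // The supervised Discretize filter needs a class attribute
        data.setClassIndex(data.numAttributes() - 1);

        Discretize filter = new Discretize();
        filter.setInputFormat(data);  // learn the cut points from the data
        Instances discretized = Filter.useFilter(data, filter);
        System.out.println(discretized.toSummaryString());
    }
}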

Classification

The concept of classification is basically to distribute data among the various classes defined on a
dataset. Classification algorithms learn this distribution from a given training set and then try to
classify test data, for which the class is not specified, correctly. The values that specify these
classes in the dataset are given a label name, and that label is used to determine the class of the
data presented during testing.

For this tutorial we will use the Iris dataset to illustrate classification with Weka. You
can download the dataset from here. Since the Iris dataset doesn’t need pre-processing, we can
classify it directly. Weka is a good tool for beginners; it includes a tremendous
number of algorithms. After you load your dataset, clicking the Classify section switches you
to another window, which we will talk about in this post.

In the Classify section, as you can see in Area 1 of Figure 4.1, ZeroR is the default
classifier in Weka. But since the ZeroR algorithm's performance is not good on the Iris dataset, we'll
switch it for the J48 algorithm, known for its very good success rate on our dataset. By clicking
the Choose button in Area 1 of Figure 4.1, a new algorithm can be selected from the
list. The J48 algorithm is inside the trees directory in the Classifier list. Before running the algorithm
we have to select the test options from Area 2. The test options consist of 4 options:

Use training set: Evaluates your model on the same dataset you originally trained your
model with.

Supplied test set: Evaluates your model on a dataset you supply externally. Select a dataset
file by clicking the Set button.

Cross-validation: The cross-validation option is widely used, especially if you have a limited
amount of data. The number you enter in the Folds field determines how many subsets your
dataset is divided into (let's say it is 10). The original dataset is randomly partitioned into 10 subsets.
After that, Weka uses set 1 for testing and the other 9 sets for training in the first run, then uses set 2
for testing and the other 9 sets for training, and repeats this 10 times in total, incrementing the
test set number each time. In the end, the average success rate is reported to the user.

Percentage split: Divides your dataset into training and test sets according to the percentage you enter. By
default the value is 66%, meaning 66% of your dataset will be used as the training set and
the remaining 34% will be your test set.
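
These options map directly onto Weka's Evaluation class. A minimal sketch of the two most common ones, 10-fold cross-validation and a 66% percentage split (the random seed of 1 is an arbitrary choice):

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TestOptions {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("iris.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // 10-fold cross-validation
        Evaluation cv = new Evaluation(data);
        cv.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println(cv.toSummaryString("=== 10-fold CV ===", false));

        // 66% percentage split
        data.randomize(new Random(1));
        int trainSize = (int) Math.round(data.numInstances() * 0.66);
        Instances train = new Instances(data, 0, trainSize);
        Instances test = new Instances(data, trainSize, data.numInstances() - trainSize);
        J48 tree = new J48();
        tree.buildClassifier(train);
        Evaluation split = new Evaluation(train);
        split.evaluateModel(tree, test);
        System.out.println(split.toSummaryString("=== 66% split ===", false));
    }
}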

By clicking the text area (the arrow on Figure 4.2) you can edit the parameters of the algorithm
according to your needs.

I chose 10-fold cross-validation from Test Options using the J48 algorithm, selected my class
feature from the drop-down list as class, and clicked the “Start” button in Area 2 of Figure 4.3.
According to the result, the success rate is 96%; you can see it in the Classifier Output
shown in Area 1 of Figure 4.3.
Run Information in Area 1 will give you detailed results, as you can see in Figure 4.4. It consists
of 5 parts; the first one is Run Information, which gives detailed information about the dataset
and the model you used. As you can see in Figure 4.4, we used J48 as the classification model, our
dataset was the Iris dataset, and its features are sepallength, sepalwidth, petallength, petalwidth and class.
Our test mode is 10-fold cross-validation. Since J48 is a decision tree, our model created a
pruned tree. As you can see on the tree, the first branch is on petalwidth, the petal width of the
flowers: if the value is smaller than or equal to 0.6, the species is Iris-setosa;
otherwise another branch checks another attribute to decide the species. In the tree
structure, ':' represents the class label.

The Classifier Model part illustrates the model as a tree and gives some information about the
tree, like the number of leaves, the size of the tree, etc. Next is the stratified cross-validation part, which
shows the error rates. By checking this part you can see how successful your model is. For
example, our model correctly classified 96% of the instances, and our mean absolute error
is 0.035, which is acceptable for the Iris dataset and our model.
You can see a Confusion Matrix and a detailed Accuracy Table at the bottom of the report. F-
Measure and ROC Area rates are important for the models, and they are derived from the
confusion matrix. A confusion matrix represents the True Positive, True Negative, False Positive
and False Negative counts, which I explain next. If you already understand confusion matrices
you can skip directly to the Visualizing the Result part.
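
The same report sections can be printed programmatically. Continuing the TestOptions sketch above (all method names below are from Weka's Evaluation class):

// after cv.crossValidateModel(...) in the earlier sketch:
System.out.println(cv.toMatrixString("=== Confusion Matrix ==="));
System.out.println(cv.toClassDetailsString("=== Detailed Accuracy By Class ==="));
System.out.println("Correct: " + cv.pctCorrect() + " %");
System.out.println("Mean absolute error: " + cv.meanAbsoluteError());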

Visualizing the Result

If you’d like to visualize these results you can use the graphic presentations shown in
Figure 4.5 below.

By right-clicking the result entry and selecting Visualize tree you’ll see your model’s illustration,
as in Figure 4.6.
If you’d like to see classification errors illustrated, select Visualize classifier errors in the same
menu. By sliding Jitter (see Area 1 in Figure 4.6) you can see all the samples on the
coordinate plane. The X axis represents the predicted class, the Y axis the
actual class. Squares represent wrongly classified samples; stars represent correctly
classified samples. Blue ones are Iris-setosa, red ones Iris-versicolor, and green
ones Iris-virginica. So a red square means our model classified the sample as Iris-
versicolor when it was supposed to be Iris-virginica.
Weka Machine Learning Algorithms
Weka has a lot of machine learning algorithms. This is great; it is one of the large benefits of
using Weka as a platform for machine learning.

A downside is that it can be a little overwhelming to know which algorithms to use, and
when. Also, the algorithms have names that may not be familiar to you, even if you know
them in other contexts.

In this section we will start off by looking at some well-known algorithms supported by
Weka. What we learn in this post applies to the machine learning algorithms used
across the Weka platform, but the Explorer is the best place to learn more about them,
as they are all available in one easy place.

1. Open the Weka GUI Chooser.


2. Click the “Explorer” button to open the Weka explorer.
3. Open a dataset, such as the Pima Indians dataset from the data/diabetes.arff file in your
Weka installation.
4. Click “Classify” to open the Classify tab.
The Classify tab of the Explorer is where you can learn about the various
algorithms and explore predictive modeling.

You can choose a machine learning algorithm by clicking the “Choose” button.

Clicking on the “Choose” button presents you with a list of machine learning algorithms to
choose from. They are divided into a number of main groups:

 bayes: Algorithms that use Bayes Theorem in some core way, like Naive Bayes.

 functions: Algorithms that estimate a function, like Linear Regression.

 lazy: Algorithms that use lazy learning, like k-Nearest Neighbors.

 meta: Algorithms that use or combine multiple algorithms, like Ensembles.

 misc: Implementations that do not neatly fit into the other groups, like running a saved
model.

 rules: Algorithms that use rules, like One Rule.

 trees: Algorithms that use decision trees, like Random Forest.


The tab is called “Classify” and the algorithms are listed under an overarching group called
“Classifiers”. Nevertheless, Weka supports both classification (predict a category) and regression
(predict a numeric value) predictive modeling problems.
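
To illustrate, one representative from several of these groups can be instantiated through the Java API (the particular algorithms chosen below are arbitrary):

import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.lazy.IBk;
import weka.classifiers.rules.OneR;
import weka.classifiers.trees.RandomForest;

public class AlgorithmGroups {
    public static void main(String[] args) throws Exception {
        // One classifier per group; all share the Classifier interface,
        // so any of them can be passed to an Evaluation or buildClassifier call
        Classifier[] classifiers = {
            new NaiveBayes(),    // bayes
            new IBk(3),          // lazy: k-nearest neighbors with k = 3
            new OneR(),          // rules
            new RandomForest()   // trees
        };
        for (Classifier c : classifiers) {
            System.out.println(c.getClass().getName());
        }
    }
}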

Weka Clustering Algorithms


A clustering algorithm finds groups of similar instances in the entire dataset. WEKA supports
several clustering algorithms such as EM, FilteredClusterer, HierarchicalClusterer,
SimpleKMeans and so on. You should understand these algorithms completely to fully exploit
the WEKA capabilities.
As in the case of classification, WEKA allows you to visualize the detected clusters graphically.
To demonstrate clustering, we will use the provided iris database. The data set contains
three classes of 50 instances each; each class refers to a type of iris plant.
Click on the Cluster tab to apply the clustering algorithms to our loaded data. Click on
the Choose button and you will see the list of algorithms available in Weka.
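
A minimal SimpleKMeans sketch on the iris data. The class attribute is removed first, since clustering ignores it, and k = 3 is chosen to match the three species:

import weka.clusterers.ClusterEvaluation;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class KMeansExample {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("iris.arff");

        // Drop the class attribute (the last one) before clustering
        Remove remove = new Remove();
        remove.setAttributeIndices("last");
        remove.setInputFormat(data);
        Instances noClass = Filter.useFilter(data, remove);

        SimpleKMeans kmeans = new SimpleKMeans();
        kmeans.setNumClusters(3);
        kmeans.buildClusterer(noClass);

        ClusterEvaluation eval = new ClusterEvaluation();
        eval.setClusterer(kmeans);
        eval.evaluateClusterer(noClass);
        System.out.println(eval.clusterResultsToString());
    }
}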

Association rules
Association rule learners find associations between any attributes: there is no
particular class attribute. Rules can predict any attribute, or indeed any combination of attributes.
To find them we need a different kind of algorithm. "Support" and "confidence" are two
measures of a rule that are used to evaluate and rank rules. The most popular association
rule learner, and the one used in Weka, is called Apriori.
Associator
Click on the Associate tab and click on the Choose button. Select the Apriori associator as
shown in the screenshot.

To set the parameters for the Apriori algorithm, click on its name; a window will pop up
that allows you to set the parameters.
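
A minimal Apriori sketch, assuming a fully nominal dataset such as the program.arff file from earlier (Apriori cannot handle numeric attributes):

import weka.associations.Apriori;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AprioriExample {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("program.arff");  // all attributes nominal

        Apriori apriori = new Apriori();
        apriori.setNumRules(10);        // report the 10 best rules
        apriori.buildAssociations(data);
        System.out.println(apriori);    // prints the rules with support/confidence
    }
}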
