Weka Data Mining
Weka contains a collection of visualization tools and algorithms for data analysis and predictive
modelling, together with graphical user interfaces for easy access to these functions. The original non-
Java version of Weka was a Tcl/Tk front-end to (mostly third-party) modelling algorithms implemented
in other programming languages, plus data preprocessing utilities in C and a makefile-based system for
running machine learning experiments.
This original version was primarily designed as a tool for analyzing data from agricultural domains. However, the more recent fully Java-based version (Weka 3), developed in 1997, is now used in many different application areas, particularly for education and research. Weka offers the following advantages:
Weka supports several standard data mining tasks: data preprocessing, clustering, classification, regression, visualization, and feature selection. Input to Weka is expected to be formatted according to the Attribute-Relation File Format (ARFF) and stored in files with the .arff extension.
All of Weka's techniques are predicated on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes (normally numeric or nominal, although some other attribute types are also supported). Weka provides access to SQL databases using Java Database Connectivity (JDBC) and can process the result returned by a database query. Weka also provides access to deep learning through Deeplearning4j.
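Programmatically, the result of a database query can be turned into a Weka dataset with the InstanceQuery class. The sketch below assumes weka.jar and a matching JDBC driver on the classpath; the URL, credentials, and SQL are hypothetical placeholders.

```java
// Sketch: loading a Weka dataset from a SQL database over JDBC.
// The URL, credentials, and query below are hypothetical placeholders.
import weka.core.Instances;
import weka.experiment.InstanceQuery;

public class DbLoadSketch {
    public static Instances load() throws Exception {
        InstanceQuery query = new InstanceQuery();
        query.setDatabaseURL("jdbc:mysql://localhost:3306/shop");     // hypothetical
        query.setUsername("user");                                    // hypothetical
        query.setPassword("secret");                                  // hypothetical
        query.setQuery("SELECT age, income, churned FROM customers"); // hypothetical
        Instances data = query.retrieveInstances(); // runs the query, builds Instances
        query.disconnectFromDatabase();
        return data;
    }
}
```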
Weka is not capable of multi-relational data mining, but there is separate software for converting a collection of linked database tables into a single table suitable for processing with Weka. Another important area currently not covered by the algorithms included in the Weka distribution is sequence modelling.
History of Weka
o In 1993, the University of Waikato in New Zealand began the development of the original version
of Weka, which became a mix of Tcl/Tk, C, and makefiles.
o In 1997, the decision was made to redevelop Weka from scratch in Java, including implementing
modelling algorithms.
o In 2005, Weka received the SIGKDD Data Mining and Knowledge Discovery Service Award.
o In 2006, Pentaho Corporation acquired an exclusive licence to use Weka for business intelligence.
It forms the data mining and predictive analytics component of the Pentaho business intelligence
suite. Hitachi Vantara has since acquired Pentaho, and Weka now underpins the PMI (Plugin for
Machine Intelligence) open-source component.
Features of Weka
Weka has the following features:
1. Preprocess
The preprocessing of data is a crucial task in data mining. Because most data is raw, it may contain empty or duplicate values, garbage values, outliers, extra columns, or inconsistent naming conventions. All of these degrade the results.
To make data cleaner and more consistent, WEKA provides a comprehensive set of options under the filter category, covering both supervised and unsupervised operations. Typical preprocessing filters include ReplaceMissingValues, Normalize, Discretize, and Remove.
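A filter can also be applied programmatically. As a minimal sketch (assuming weka.jar on the classpath and a made-up numeric attribute), the unsupervised Normalize filter rescales each numeric attribute to the range [0, 1]:

```java
// Sketch: applying an unsupervised Normalize filter to a tiny made-up dataset.
import java.util.ArrayList;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Normalize;

public class PreprocessSketch {
    public static Instances normalized() throws Exception {
        ArrayList<Attribute> atts = new ArrayList<>();
        atts.add(new Attribute("sepallength"));          // numeric attribute
        Instances data = new Instances("demo", atts, 0);
        for (double v : new double[]{4.0, 5.0, 6.0, 8.0}) {
            data.add(new DenseInstance(1.0, new double[]{v}));
        }
        Normalize norm = new Normalize();                // scales each numeric attribute to [0, 1]
        norm.setInputFormat(data);                       // must be called before filtering
        return Filter.useFilter(data, norm);
    }
}
```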
2. Classify
Classification is one of the essential functions in machine learning, where we assign classes or categories to items. Classic examples of classification are declaring a brain tumour "malignant" or "benign", or assigning an email to a "spam" or "not_spam" class.
After selecting the desired classifier, we select a test option for evaluation. Some of the options are:
o Use training set: the classifier will be tested on the same training set.
o A supplied test set: evaluates the classifier based on a separate test set.
o Cross-validation Folds: assessment of the classifier based on cross-validation using the number
of provided folds.
o Percentage split: the classifier will be judged on a specific percentage of data.
Other than these, we can also use more test options such as Preserve order for % split, Output source
code, etc.
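The same workflow can be sketched in code. The example below (assuming weka.jar on the classpath and a made-up spam dataset) builds a ZeroR baseline, which always predicts the majority class, and evaluates it with the "Use training set" option:

```java
// Sketch: a ZeroR baseline evaluated on its own training set.
import java.util.ArrayList;
import java.util.Arrays;
import weka.classifiers.Evaluation;
import weka.classifiers.rules.ZeroR;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

public class ClassifySketch {
    static Instances mail() {
        ArrayList<Attribute> atts = new ArrayList<>();
        atts.add(new Attribute("hasLink", Arrays.asList("yes", "no")));
        atts.add(new Attribute("label", Arrays.asList("spam", "not_spam")));
        Instances data = new Instances("mail", atts, 0);
        String[][] rows = {{"yes","spam"},{"yes","spam"},{"no","not_spam"},
                           {"yes","spam"},{"no","not_spam"},{"no","spam"},
                           {"yes","not_spam"},{"no","spam"},{"yes","spam"},{"no","not_spam"}};
        for (String[] r : rows) {
            double[] v = {data.attribute(0).indexOfValue(r[0]),
                          data.attribute(1).indexOfValue(r[1])};
            data.add(new DenseInstance(1.0, v));
        }
        data.setClassIndex(1);                  // "label" is the class attribute
        return data;
    }

    // "Use training set": the classifier is tested on the same data it was built on.
    public static double trainingAccuracy() throws Exception {
        Instances data = mail();
        ZeroR zr = new ZeroR();                 // always predicts the majority class
        zr.buildClassifier(data);
        Evaluation eval = new Evaluation(data);
        eval.evaluateModel(zr, data);
        // For the "Cross-validation" option instead:
        // eval.crossValidateModel(zr, data, 10, new java.util.Random(1));
        return eval.pctCorrect();               // percentage of correct predictions
    }
}
```

Since 6 of the 10 instances are "spam", ZeroR predicts "spam" everywhere and scores 60% on the training set.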
3. Cluster
In clustering, a dataset is arranged into different groups/clusters based on similarities. Items within the same cluster are similar to each other but different from items in other clusters. Examples of clustering include identifying customers with similar behaviours and organizing regions according to homogenous land use.
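As a minimal sketch (assuming weka.jar on the classpath and made-up one-dimensional values), SimpleKMeans can split well-separated points into two clusters:

```java
// Sketch: grouping a tiny made-up dataset into two clusters with SimpleKMeans.
import java.util.ArrayList;
import weka.clusterers.SimpleKMeans;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

public class ClusterSketch {
    public static int[] assignments() throws Exception {
        ArrayList<Attribute> atts = new ArrayList<>();
        atts.add(new Attribute("spend"));                 // numeric attribute
        Instances data = new Instances("customers", atts, 0);
        for (double v : new double[]{1.0, 1.2, 9.8, 10.0}) {
            data.add(new DenseInstance(1.0, new double[]{v}));
        }
        SimpleKMeans km = new SimpleKMeans();
        km.setNumClusters(2);                             // ask for two clusters
        km.buildClusterer(data);                          // clusterers take no class attribute
        int[] out = new int[data.numInstances()];
        for (int i = 0; i < out.length; i++) {
            out[i] = km.clusterInstance(data.instance(i));
        }
        return out;
    }
}
```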
4. Associate
Association rules highlight all the associations and correlations between items of a dataset. In short, it is
an if-then statement that depicts the probability of relationships between data items. A classic example
of association refers to a connection between the sale of milk and bread.
The tool provides Apriori, FilteredAssociator, and FPGrowth algorithms for association rules mining
in this category.
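The milk-and-bread example can be sketched programmatically (assuming weka.jar on the classpath; the basket data is made up) by running Apriori with its default support and confidence thresholds:

```java
// Sketch: mining association rules with Apriori on a tiny made-up basket dataset.
import java.util.ArrayList;
import java.util.Arrays;
import weka.associations.Apriori;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

public class AssociateSketch {
    public static String rules() throws Exception {
        ArrayList<Attribute> atts = new ArrayList<>();
        atts.add(new Attribute("milk", Arrays.asList("t", "f")));
        atts.add(new Attribute("bread", Arrays.asList("t", "f")));
        Instances data = new Instances("baskets", atts, 0);
        String[][] rows = {{"t","t"},{"t","t"},{"t","t"},{"f","f"},
                           {"t","t"},{"f","f"},{"t","t"},{"f","f"}};
        for (String[] r : rows) {
            double[] v = {data.attribute(0).indexOfValue(r[0]),
                          data.attribute(1).indexOfValue(r[1])};
            data.add(new DenseInstance(1.0, v));
        }
        Apriori apriori = new Apriori();   // default support/confidence thresholds
        apriori.buildAssociations(data);   // mines "if milk then bread"-style rules
        return apriori.toString();         // human-readable summary of the best rules
    }
}
```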
5. Select Attributes
Every dataset contains many attributes, but several of them may not be significantly valuable. Removing the unnecessary attributes and keeping the relevant ones is very important for building a good model.
The tool offers many attribute evaluators and search methods, including BestFirst, GreedyStepwise, and Ranker.
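The same selection can be sketched in code (assuming weka.jar on the classpath; the dataset is made up, with one attribute that mirrors the class and one constant noise attribute). InfoGainAttributeEval scores each attribute and Ranker orders them:

```java
// Sketch: ranking attributes by information gain, the programmatic analogue
// of the Select Attributes tab.
import java.util.ArrayList;
import java.util.Arrays;
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

public class SelectSketch {
    public static int[] ranked() throws Exception {
        ArrayList<Attribute> atts = new ArrayList<>();
        atts.add(new Attribute("useful", Arrays.asList("a", "b")));  // mirrors the class
        atts.add(new Attribute("useless", Arrays.asList("x", "y"))); // constant noise
        atts.add(new Attribute("cls", Arrays.asList("a", "b")));
        Instances data = new Instances("demo", atts, 0);
        String[][] rows = {{"a","x","a"},{"b","x","b"},{"a","x","a"},{"b","x","b"}};
        for (String[] r : rows) {
            double[] v = new double[3];
            for (int i = 0; i < 3; i++) v[i] = data.attribute(i).indexOfValue(r[i]);
            data.add(new DenseInstance(1.0, v));
        }
        data.setClassIndex(2);
        AttributeSelection sel = new AttributeSelection();
        sel.setEvaluator(new InfoGainAttributeEval()); // scores each attribute
        sel.setSearch(new Ranker());                   // orders attributes by score
        sel.SelectAttributes(data);
        return sel.selectedAttributes();               // ranked indices, class appended
    }
}
```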
6. Visualize
In the visualize tab, different plot matrices and graphs are available to show the trends and errors
identified by the model.
Five options are available in the Applications category of the WEKA GUI Chooser.
o The Explorer is the central panel where most data mining tasks are performed. We will further explore this panel in upcoming sections.
o The Experimenter panel lets us design and run experiments.
o The KnowledgeFlow panel provides an interface to drag and drop components, connect them to form a knowledge flow, and analyze the data and results.
o The Simple CLI panel provides command-line access to WEKA. For example, to run the ZeroR classifier on an ARFF file (the file name here is assumed), we can enter:
java weka.classifiers.rules.ZeroR -t weather.arff
It is important to note that the header declarations (@attribute) must come before the declaration of the data (@data):
@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,yes
overcast,hot,high,TRUE,yes
overcast,cool,normal,TRUE,yes
rainy,cool,normal,FALSE,no
rainy,cool,normal,TRUE,no
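For reference, the rows above resemble the classic weather dataset, so a matching header might look like the following (the attribute names and value sets are assumptions inferred from the columns, since the original header is not shown):

```
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,hot,high,FALSE,no
```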
Besides ARFF, the tool supports different file formats such as CSV, JSON, and XRFF.
Once data is loaded from different sources, the next step is to preprocess it. For this purpose, we can choose any suitable filter technique. All the filters come with default settings that can be configured by clicking on the filter's name.
If one of the attributes, such as sepallength, contains errors or outliers, we can remove or update it from the Attributes section.
Types of Algorithms by Weka
WEKA provides many algorithms for machine learning tasks. Based on their nature, the algorithms are divided into several groups, which are available under the Explorer tab of WEKA.
Each algorithm has configuration parameters, such as batchSize and debug. Some configuration parameters are common across all the algorithms, while others are specific. These configurations become editable once an algorithm is selected.
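The same parameters can be set in code. As a sketch (assuming weka.jar on the classpath), setting J48's options string has the same effect as editing confidenceFactor and minNumObj in the GUI dialog:

```java
// Sketch: editing an algorithm's configuration programmatically,
// the counterpart of clicking its name in the GUI.
import weka.classifiers.trees.J48;
import weka.core.Utils;

public class OptionsSketch {
    public static J48 configured() throws Exception {
        J48 tree = new J48();
        // -C sets the confidence factor, -M the minimum instances per leaf.
        tree.setOptions(Utils.splitOptions("-C 0.1 -M 5"));
        return tree;
    }
}
```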