Exp 6

The document describes using the WEKA data mining tool to perform data pre-processing, classification, clustering, association rule mining, and visualization on datasets. Key steps include cleansing and transforming raw data during pre-processing, selecting a machine learning algorithm like Naive Bayes for classification, applying clustering algorithms like k-means, using the Apriori algorithm for association rule mining, and visualizing results.

Uploaded by ansari amman

EXPERIMENT NO.

AIM:-
Perform data pre-processing tasks and demonstrate classification,
clustering, and association algorithms on datasets using a data mining
tool (WEKA/R).

THEORY:-
WEKA is open-source software that provides tools for data pre-processing,
implementations of several machine learning algorithms, and visualization
tools, so that you can develop machine learning techniques and apply them to
real-world data mining problems. What WEKA offers is summarized in the
following diagram:
Following the flow of the diagram from the beginning, you can see that there
are several stages in preparing big data for machine learning:
First, you start with the raw data collected from the field. This data may
contain many null values and irrelevant fields. You use the data
pre-processing tools provided in WEKA to cleanse the data.
Then, you save the pre-processed data to local storage so that ML
algorithms can be applied to it.
Next, depending on the kind of ML model that you are trying to develop, you
select one of the options such as Classify, Cluster, or Associate.
The Select attributes option allows automatic selection of features to create
a reduced dataset.
Under each category, WEKA provides implementations of several
algorithms. You select an algorithm of your choice, set the desired
parameters, and run it on the dataset.
WEKA then reports the statistical output of the model and provides
visualization tools to inspect the data and the results.
Different models can be applied to the same dataset. You can then compare
their outputs and select the model that best meets your purpose.
Overall, WEKA speeds up the development of machine learning models.
Pre-processing using WEKA:
Data collected from the field often contains unwanted content that
leads to wrong analysis. For example, the data may contain null fields, or it
may contain columns that are irrelevant to the current analysis. Thus, the
data must be pre-processed to meet the requirements of the type of analysis you
are seeking. This is done in the pre-processing module.
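The two cleaning steps mentioned above (dropping irrelevant columns and filling null fields) can be sketched in plain Python, outside WEKA, as follows. Filling numeric gaps with the column mean and nominal gaps with the most frequent value mirrors the strategy WEKA's ReplaceMissingValues filter uses; the row-dictionary representation, the function names `drop_columns` and `replace_missing`, and the sample rows are hypothetical, and the sketch assumes every row has the same keys.

```python
from statistics import mean, mode

def drop_columns(rows, columns):
    """Return copies of the rows without the given irrelevant columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]

def replace_missing(rows):
    """Fill missing (None) values: numeric columns get the column mean,
    all other columns get the most frequent value (mode).
    Statistics are computed from the non-missing values only."""
    fill = {}
    for col in (rows[0].keys() if rows else []):
        present = [row[col] for row in rows if row[col] is not None]
        if not present:
            continue  # nothing to estimate from; the column stays missing
        if all(isinstance(v, (int, float)) for v in present):
            fill[col] = mean(present)
        else:
            fill[col] = mode(present)
    return [{col: fill.get(col) if val is None else val for col, val in row.items()}
            for row in rows]

# Hypothetical rows: "id" is irrelevant to the analysis, None marks a null field.
rows = [
    {"id": 1, "sex": "M", "length": 0.455},
    {"id": 2, "sex": None, "length": 0.35},
    {"id": 3, "sex": "M", "length": None},
    {"id": 4, "sex": "F", "length": 0.53},
]
clean = replace_missing(drop_columns(rows, {"id"}))
print(clean[1]["sex"])  # → M
```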
To demonstrate the available pre-processing features, we use
the Abalone database that is provided with the installation.
Using the Open file... option on the Preprocess tab, select
the abalone.arff file.
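An .arff file is plain text: a `@relation` line, one `@attribute` line per column (numeric or a nominal `{...}` value list), then comma-separated rows after `@data`, with `?` marking a missing value. A minimal reader, written only to illustrate the format and not as a replacement for WEKA's loader, might look like this; it assumes unquoted names and values and a dense (non-sparse) data section, and the function name and sample text are hypothetical.

```python
def parse_arff(text):
    """Parse a small, dense ARFF file into (relation, attributes, rows).

    Simplifying assumptions: names and values are unquoted and contain no
    spaces or commas, the data section is dense (not sparse), and '?' marks
    a missing value. Nominal attributes are returned with their value list.
    """
    relation, attributes, rows, in_data = None, [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, kind = line.split(None, 2)
            if kind.startswith("{"):
                kind = [v.strip() for v in kind.strip("{}").split(",")]
            else:
                kind = kind.lower()
            attributes.append((name, kind))
        elif keyword == "@data":
            in_data = True
        elif in_data:
            row = []
            for (_, kind), cell in zip(attributes, line.split(",")):
                cell = cell.strip()
                if cell == "?":
                    row.append(None)  # missing value
                elif kind in ("numeric", "real", "integer"):
                    row.append(float(cell))
                else:
                    row.append(cell)  # nominal value kept as a string
            rows.append(row)
    return relation, attributes, rows

sample = """% Hypothetical excerpt, not the real abalone.arff contents
@relation abalone-sample
@attribute Sex {M, F, I}
@attribute Length numeric
@attribute Rings integer
@data
M,0.455,15
F,?,7
"""
relation, attributes, rows = parse_arff(sample)
print(rows)  # → [['M', 0.455, 15.0], ['F', None, 7.0]]
```

To read a real file, pass the file's text, for example `parse_arff(open("abalone.arff").read())`, keeping in mind the simplifications listed in the docstring.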

Using Filters:
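Some machine learning techniques, such as association rule mining,
require categorical (nominal) data, so numeric attributes must first be
discretized. Missing values can also be filled in with a filter:
weka→filters→supervised→attribute→Discretize (converts numeric attributes into nominal intervals)
weka→filters→unsupervised→attribute→ReplaceMissingValues (replaces missing values with the mean or mode of each attribute)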
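The supervised Discretize filter chooses interval boundaries using the class attribute (Fayyad and Irani's MDL method by default), which is too involved for a short example. The sketch below therefore shows the simpler equal-width binning that the unsupervised Discretize filter applies by default: the numeric range is split into k intervals of the same width. The function name, the `binN` labels, and k=3 are illustrative choices.

```python
def equal_width_bins(values, k=3):
    """Discretize numeric values into k equal-width intervals.

    Returns labels "bin1".."bink"; missing values (None) stay None.
    """
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / k
    labels = []
    for v in values:
        if v is None:
            labels.append(None)
        elif width == 0:
            labels.append("bin1")  # all values identical: one interval
        else:
            # min() clamps the maximum value into the last interval.
            labels.append(f"bin{min(int((v - lo) / width), k - 1) + 1}")
    return labels

print(equal_width_bins([0, 2, 5, 9, None], k=3))
# → ['bin1', 'bin1', 'bin2', 'bin3', None]
```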

Clustering Using WEKA:

A clustering algorithm finds groups of similar instances in the entire dataset.
WEKA supports several clustering algorithms, such as EM, FilteredClusterer,
HierarchicalClusterer, and SimpleKMeans. You should understand how these
algorithms work to fully exploit WEKA's capabilities.
As with classification, WEKA allows you to visualize the detected
clusters graphically.
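SimpleKMeans, one of the clusterers listed above, is based on k-means. The sketch below is a plain-Python illustration of the standard k-means (Lloyd's) iteration, not WEKA's code: assign each point to its nearest centroid, move each centroid to the mean of its points, and stop when no assignment changes. Euclidean distance, random initialization from the data points, and the sample points are assumptions.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Cluster numeric points with Lloyd's k-means algorithm.

    Returns (centroids, assignments). Initial centroids are k distinct
    data points chosen at random (an assumption for this sketch).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignments) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
    return centroids, assignments

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, labels = kmeans(points, k=2)
print(labels)  # the first three points share one label, the last three the other
```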
Click the Cluster tab to apply the clustering algorithms to the loaded
data. Click the Choose button and select HierarchicalClusterer.
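HierarchicalClusterer builds clusters bottom-up and lets you choose how the distance between two clusters is measured (for example, single or complete link). As a rough sketch of the idea, not WEKA's implementation, the function below performs agglomerative clustering with single linkage: every point starts as its own cluster, and the two clusters whose closest members are nearest are merged until the requested number remains. The function name, Euclidean distance, and sample points are assumptions.

```python
import math

def single_linkage(points, num_clusters):
    """Agglomerative (bottom-up) clustering with single linkage.

    Starts with one cluster per point and repeatedly merges the two
    clusters whose closest members are nearest, until num_clusters remain.
    Returns a list of clusters, each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > num_clusters:
        best = None  # (distance, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is still valid
    return clusters

points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
print(single_linkage(points, 2))  # → [[0, 1, 2, 3], [4]]
```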
Classification using WEKA:
Many machine learning applications involve classification. For example,
you may want to classify a tumor as malignant or benign, or decide whether to
play an outdoor game depending on the weather conditions. Generally, this
decision depends on several features/conditions of the weather. A tree
classifier is one way to make such a decision; in this experiment, we build a
Naive Bayes classifier instead.
Naive Bayes is a probabilistic classification algorithm. Traditionally it
assumes that the input values are nominal, although numerical inputs are
supported by assuming a distribution (typically a normal distribution).
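For a numeric input, "assuming a distribution" means estimating P(x | class) from a density fitted to that class's training values. A minimal sketch assuming a normal distribution follows; WEKA's NaiveBayes uses a normal distribution for numeric attributes unless kernel estimation or discretization is enabled, but the function name and the sample values here are hypothetical.

```python
import math
from statistics import mean, stdev

def gaussian_likelihood(x, values):
    """Estimate P(x | class) for a numeric attribute by fitting a normal
    distribution to the attribute values observed for that class.

    Needs at least two distinct values (the sample standard deviation
    must be non-zero)."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Values of one numeric attribute seen for one class (hypothetical).
print(gaussian_likelihood(2.0, [1.0, 2.0, 3.0]))  # → about 0.3989
```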

Naive Bayes applies Bayes' theorem with a simplifying (hence "naive")
assumption: the input attributes are conditionally independent of each other
given the class. The prior probability of each class and the probability of
each attribute value given the class are estimated from the training data.
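The conditional-independence assumption can be written out as follows, using c for a class and x_1, ..., x_n for the attribute values of an instance (notation chosen for this illustration). The denominator is the same for every class, so it can be dropped when comparing classes, and independence turns the joint likelihood into a product of per-attribute terms:

```latex
\begin{align}
P(c \mid x_1, \dots, x_n) &= \frac{P(c)\, P(x_1, \dots, x_n \mid c)}{P(x_1, \dots, x_n)} \\
  &\propto P(c) \prod_{i=1}^{n} P(x_i \mid c) \\
\hat{c} &= \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
\end{align}
```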

Selecting Classifier
On the Classify tab, click the Choose button and select the following classifier:
weka→classifiers→bayes→NaiveBayes
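To see what the classifier computes, here is a small plain-Python sketch for nominal attributes only; it is not WEKA's NaiveBayes code. Training counts class frequencies and attribute-value frequencies per class; prediction picks the class maximizing P(c) times the product of P(x_i | c). Add-one (Laplace) smoothing, so that an unseen value does not zero out a class, is an assumption of this sketch, as are the function names and the made-up weather rows.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    domains = defaultdict(set)           # distinct values seen per attribute
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains

def predict(model, row):
    """Return the class maximising P(c) * prod_i P(x_i | c), using add-one
    smoothing for the conditional probabilities."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(c)
        for i, value in enumerate(row):
            count = value_counts[(i, c)][value]
            score *= (count + 1) / (n_c + len(domains[i]))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical (outlook, temperature) rows with a play / don't-play label.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("overcast", "hot"), ("rainy", "cool"), ("overcast", "cool")]
labels = ["no", "no", "yes", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("sunny", "hot")))      # → no
print(predict(model, ("overcast", "cool")))  # → yes
```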
Association Rule mining using WEKA:
It was observed that people who buy beer often also buy diapers at the same
time; that is, there is an association between buying beer and buying diapers.
Though this may seem unconvincing, this association rule was mined from large
supermarket databases. Similarly, an association may be found between
peanut butter and bread.
Finding such associations is valuable for supermarkets: they can stock
diapers next to beer so that customers can locate both items easily, resulting
in increased sales.
The Apriori algorithm is one such ML algorithm; it finds frequent itemsets and
generates association rules from them. WEKA provides an implementation
of the Apriori algorithm. You can specify the minimum support and an
acceptable confidence level when computing these rules.

Visualization using WEKA:

Data visualization in WEKA can be performed on sample datasets or on
user-made datasets in .arff or .csv format. In this experiment, association
rule mining is performed using the Apriori algorithm, although WEKA also
provides other algorithms for frequent pattern mining, such as FPGrowth.

Data Visualization
Data visualization is the representation of data through graphs and plots
with the aim of understanding the data clearly.

There are many ways to represent data. Some of them are as follows:
1) Pixel Oriented Visualization: Each data value is mapped to a pixel, and the
color of the pixel represents the value of the corresponding dimension.
2) Geometric Representation: Multidimensional datasets are represented
in 2D, 3D, and 4D scatter plots.
3) Icon Based Visualization: The data is represented using Chernoff’s faces
and stick figures. Chernoff’s faces use the human mind’s ability to recognize
facial characteristics and the differences between them. Stick figures map
multidimensional data to five-piece figures (a body and four limbs): two
dimensions give the position on the display axes, and the remaining
dimensions set the angles or lengths of the limbs.
4) Hierarchical Data Visualization: The datasets are represented
using treemaps, which display hierarchical data as a set of nested
rectangles.
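A treemap nests each child's rectangle inside its parent's, with area proportional to size. One classic way to compute such a layout is the slice-and-dice method sketched below (other layouts, such as squarified treemaps, exist); it alternates between vertical and horizontal splits at each level of the hierarchy. The tuple-based tree format, the function names, and the sample tree are hypothetical.

```python
def node_size(node):
    """Total size of a leaf, or of all leaves below an inner node."""
    _, content = node
    return sum(node_size(c) for c in content) if isinstance(content, list) else content

def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Lay out a tree as nested rectangles (slice-and-dice treemap).

    node is (name, size) for a leaf or (name, [children]) for an inner node.
    Each child's area is proportional to its total size; the split direction
    alternates between the x-axis and the y-axis at each depth.
    Returns a list of (name, x, y, width, height) tuples.
    """
    if out is None:
        out = []
    name, content = node
    out.append((name, x, y, w, h))
    if isinstance(content, list):
        total = sum(node_size(child) for child in content)
        offset = 0.0
        for child in content:
            share = node_size(child) / total
            if depth % 2 == 0:  # split along the x-axis
                slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
            else:               # split along the y-axis
                slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
            offset += share
    return out

tree = ("root", [("A", 6), ("B", [("B1", 1), ("B2", 3)])])
for rect in slice_and_dice(tree, 0, 0, 10, 10):
    print(rect)
```

With a 10 × 10 canvas, leaf A gets 60% of the area, and B's 40% strip is subdivided between B1 and B2 in a 1:3 ratio.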
