
Exp 6

The document describes using the WEKA data mining tool to perform data pre-processing, classification, clustering, association rule mining, and visualization on datasets. Key steps include cleansing and transforming raw data during pre-processing, selecting a machine learning algorithm like Naive Bayes for classification, applying clustering algorithms like k-means, using the Apriori algorithm for association rule mining, and visualizing results.

Uploaded by

ansari amman

EXPERIMENT NO.

AIM:-
Perform data pre-processing tasks and demonstrate classification,
clustering, and association algorithms on datasets using a data mining tool
(WEKA/R tool).

THEORY:-
WEKA is open-source software that provides tools for data pre-processing,
implementations of several machine learning algorithms, and visualization
tools, so that you can develop machine learning techniques and apply them to
real-world data mining problems. What WEKA offers is summarized in the
following diagram:
[Figure: overview of the WEKA workflow; image not reproduced here.]
Following the flow of the diagram from the beginning, there are several stages
in preparing raw data so that it is suitable for machine learning:
First, you start with the raw data collected from the field. This data may
contain several null values and irrelevant fields. You use the data
pre-processing tools provided in WEKA to cleanse the data.
Then, you save the pre-processed data in your local storage for applying
ML algorithms.
Next, depending on the kind of ML model that you are trying to develop, you
select one of the options such as Classify, Cluster, or Associate.
Attribute selection allows the automatic selection of features to create a
reduced dataset.
Note that under each category, WEKA provides implementations of several
algorithms. You select an algorithm of your choice, set the desired
parameters, and run it on the dataset.
WEKA then reports the statistical output of the model and provides a
visualization tool to inspect the data.
Several models can be applied to the same dataset. You can then compare
the outputs of the different models and select the one that best meets your
purpose.
Thus, using WEKA speeds up the development of machine learning models on the
whole.
Pre-processing using WEKA:
The data collected from the field contains many unwanted things that
lead to wrong analysis. For example, the data may contain null fields, it may
contain columns that are irrelevant to the current analysis, and so on. Thus, the
data must be pre-processed to meet the requirements of the type of analysis you
are seeking. This is done in the pre-processing module.
To demonstrate the available features in pre-processing, we will use
the Abalone database that is provided in the installation.
Using the Open file ... option under the Preprocess tab, select
the abalone.arff file.
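
To make the file format concrete, the sketch below parses a tiny ARFF document by hand. It is only an illustration of what WEKA reads when a file is opened, not WEKA's own loader: the function name parse_arff is hypothetical, the sample data is made up in the style of abalone.arff, and quoted values, sparse rows, and date attributes are deliberately ignored.

```python
def parse_arff(text):
    """Parse a small, dense ARFF document into (relation, attributes, rows).

    Simplified sketch: no quoted values, no sparse rows, no date attributes.
    """
    relation = None
    attributes = []  # (name, kind); kind is "numeric", "string", or a list of nominal labels
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                if value == "?":  # ARFF marks a missing value with "?"
                    row[name] = None
                elif kind == "numeric":
                    row[name] = float(value)
                else:
                    row[name] = value
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            if kind.startswith("{"):  # nominal attribute: {label1,label2,...}
                kind = [label.strip() for label in kind.strip("{}").split(",")]
            elif kind.lower() in ("numeric", "real", "integer"):
                kind = "numeric"
            else:
                kind = "string"
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


# Hypothetical sample; attribute names and values are illustrative only.
SAMPLE = """\
% A made-up excerpt in the style of abalone.arff
@relation abalone-sample
@attribute sex {M,F,I}
@attribute length numeric
@attribute rings numeric
@data
M,0.455,15
F,0.53,9
I,?,7
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, [name for name, _ in attributes], len(rows))
```

The third row shows how a null field appears in the raw data: the "?" is read as None, which is exactly the kind of value the pre-processing step has to deal with.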

Using Filters:
Some machine learning techniques, such as association rule mining,
require categorical data. Filters chosen in the Filter panel of the
Preprocess tab transform the data before mining, for example:
weka→filters→supervised→attribute→Discretize
weka→filters→unsupervised→attribute→ReplaceWithMissing Values
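
As a rough illustration of what such filters do, the sketch below fills missing numeric values with the column mean and then converts the numbers into categorical bins. It is not WEKA's implementation: equal-width binning is a simpler, unsupervised technique swapped in for illustration (the supervised Discretize filter chooses cut points using the class attribute), and the function names and temperature values are hypothetical.

```python
def replace_missing_with_mean(values):
    """Replace None entries with the mean of the values that are present."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def discretize_equal_width(values, bins=3):
    """Map each number to one of `bins` equally wide intervals (bin1..binN)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for constant columns
    labels = []
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # the maximum falls in the last bin
        labels.append(f"bin{index + 1}")
    return labels


# Hypothetical numeric attribute with one missing reading.
temperature = [85.0, 80.0, None, 70.0, 64.0]
filled = replace_missing_with_mean(temperature)
labels = discretize_equal_width(filled, bins=3)
print(filled)
print(labels)
```

After this step the attribute holds only nominal labels, which is the form that association rule mining expects.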

Clustering Using WEKA:


A clustering algorithm finds groups of similar instances in the entire dataset.
WEKA supports several clustering algorithms such as EM, FilteredClusterer,
HierarchicalClusterer, SimpleKMeans and so on. You should understand these
algorithms completely to fully exploit the WEKA capabilities.
As in the case of classification, WEKA allows you to visualize the detected
clusters graphically.
Click on the Cluster tab to apply the clustering algorithms to the loaded
data. Click on the Choose button and choose HierarchicalClusterer.
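
The idea behind hierarchical clustering can be sketched in a few lines: start with every instance in its own cluster and repeatedly merge the two closest clusters. The sketch below is a minimal bottom-up version under stated assumptions, not WEKA's HierarchicalClusterer: single linkage and Euclidean distance are choices made for this example, and the points are hypothetical.

```python
import math


def single_link_distance(a, b):
    """Distance between two clusters = distance of their closest pair of points."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points, num_clusters):
    """Merge the two closest clusters until only `num_clusters` remain."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > num_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_link_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]
    return clusters


# Hypothetical 2D instances forming two tight groups and one outlier.
points = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.1, 4.8), (9.0, 1.0)]
clusters = agglomerative(points, 3)
for cluster in clusters:
    print(cluster)
```

Stopping at a chosen number of clusters is one design choice; recording the order of the merges instead would give the full hierarchy (dendrogram).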
Classification using WEKA:
Many machine learning applications are classification related. For example,
you may want to classify a tumor as malignant or benign, or decide whether
to play an outdoor game depending on the weather conditions.
Generally, this decision depends on several features/conditions of the
weather. A tree classifier is one way to decide whether to play or not.
In this experiment, we will build a Naive Bayes classifier for such a task.
Naive Bayes is a classification algorithm. Traditionally it assumes that the input
values are nominal, although numerical inputs are supported by assuming a
distribution.

Naive Bayes applies Bayes' theorem with a simplifying ("naive") assumption:
the prior probability of each class is calculated from the training data, and
the input attributes are assumed to be independent of each other given the
class (technically called conditionally independent).
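
The calculation described above can be sketched for nominal attributes as follows. This is a minimal illustration, not WEKA's NaiveBayes class: the function names are hypothetical, add-one (Laplace) smoothing is an assumption made so that unseen values do not produce zero probabilities, and the small weather table is made up.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    distinct = defaultdict(set)          # attribute index -> set of seen values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            distinct[i].add(value)
    return class_counts, value_counts, distinct


def predict(model, row):
    """Return the posterior probability of each class for one instance."""
    class_counts, value_counts, distinct = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for i, value in enumerate(row):
            # Conditional probability of the value given the class,
            # with add-one smoothing (an assumption of this sketch).
            seen = value_counts[(i, label)][value]
            score *= (seen + 1) / (count + len(distinct[i]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


# Hypothetical weather data: (outlook, windy) -> play
rows = [("sunny", "false"), ("sunny", "true"), ("overcast", "false"),
        ("rainy", "false"), ("rainy", "true"), ("overcast", "true")]
labels = ["no", "no", "yes", "yes", "no", "yes"]

model = train_naive_bayes(rows, labels)
posterior = predict(model, ("overcast", "false"))
print(posterior)
```

Multiplying the per-attribute probabilities is only valid because of the conditional independence assumption; that multiplication is the "naive" step.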

Selecting Classifier
Click on the Choose button and select the following classifier:
weka→classifiers→bayes→NaiveBayes
Association Rule mining using WEKA:
It was observed that people who buy beer often buy diapers at the same time.
That is, there is an association between buying beer and buying diapers.
Though this may not seem convincing, the association rule was mined from huge
supermarket databases. Similarly, an association may be found between
peanut butter and bread.
Finding such associations is valuable for supermarkets: they can stock
diapers next to beer so that customers can locate both items easily, resulting
in increased sales for the supermarket.
The Apriori algorithm is one such algorithm in ML that finds the probable
associations and creates association rules. WEKA provides an implementation
of the Apriori algorithm. You can define the minimum support and an
acceptable confidence level while computing these rules.
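
The roles of minimum support and confidence can be illustrated with a compact Apriori sketch. It is a teaching version under stated assumptions, not WEKA's implementation: the function names are hypothetical, the minimum support is fixed rather than adjusted during the run, and the shopping baskets are made up.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        level = {s: support(s) for s in current}
        level = {s: sup for s, sup in level.items() if sup >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules meeting min_confidence."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                confidence = sup / frequent[lhs]  # support(lhs and rhs) / support(lhs)
                if confidence >= min_confidence:
                    found.append((set(lhs), set(itemset - lhs), sup, confidence))
    return found


# Hypothetical shopping baskets.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"bread", "peanut butter"},
    {"beer", "bread", "diapers"},
    {"bread", "peanut butter", "milk"},
]

frequent = apriori(baskets, min_support=0.4)
found_rules = association_rules(frequent, min_confidence=0.8)
for lhs, rhs, sup, conf in found_rules:
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

Raising the minimum support shrinks the set of frequent itemsets, while raising the confidence threshold filters the rules derived from them; the two parameters act at different stages.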

Visualization using WEKA:


Data visualization in WEKA can be performed using sample datasets or
user-made datasets in .arff or .csv format. Association rule mining is
commonly performed using the Apriori algorithm, although WEKA also provides
other associators such as FPGrowth for frequent pattern mining.

Data Visualization
Data visualization is the representation of data through graphs and plots
with the aim of understanding the data clearly.

There are many ways to represent data. Some of them are as follows:
1) Pixel Oriented Visualization: Each dimension value is mapped to a pixel,
and the color of the pixel represents the corresponding value.
2) Geometric Representation: The multidimensional datasets are represented
in 2D, 3D, and 4D scatter plots.
3) Icon Based Visualization: The data is represented using Chernoff’s faces
and stick figures. Chernoff’s faces use the human mind’s ability to recognize
facial characteristics and differences between them. The stick figure technique
maps multidimensional data to five-piece stick figures (four limbs and a body).
4) Hierarchical Data Visualization: The datasets are represented
using treemaps. A treemap represents hierarchical data as a set of nested
rectangles.
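
To show how a treemap turns a hierarchy into nested rectangles, the sketch below computes a slice-and-dice layout, which is one simple treemap scheme among several. It is an illustration only and is unrelated to any WEKA component: the function name, the tree, its sizes, and the canvas dimensions are all hypothetical.

```python
def slice_and_dice(node, x, y, width, height, depth=0, out=None):
    """Lay out a tree as nested rectangles.

    node is (name, size, children). Children split their parent's rectangle
    in proportion to their sizes, alternating the split direction per level.
    Returns a list of (name, x, y, width, height) tuples.
    """
    if out is None:
        out = []
    name, _, children = node
    out.append((name, x, y, width, height))
    total = sum(child[1] for child in children)
    offset = 0.0  # fraction of the parent already used by earlier children
    for child in children:
        share = child[1] / total
        if depth % 2 == 0:  # even levels: split along the x axis
            slice_and_dice(child, x + offset * width, y,
                           share * width, height, depth + 1, out)
        else:               # odd levels: split along the y axis
            slice_and_dice(child, x, y + offset * height,
                           width, share * height, depth + 1, out)
        offset += share
    return out


# Hypothetical hierarchy: root with two children, one of which has two leaves.
tree = ("root", 10, [
    ("a", 6, [("a1", 3, []), ("a2", 3, [])]),
    ("b", 4, []),
])

rectangles = slice_and_dice(tree, 0.0, 0.0, 100.0, 50.0)
for rect in rectangles:
    print(rect)
```

Each child rectangle lies inside its parent's rectangle, and its area is proportional to its size, which is what makes the nesting readable as a hierarchy.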
