EX-01-Weka and Rapidminer

The document outlines a study aimed at exploring the features of Weka, Rapid Miner tools, and UCI repository datasets. It details fundamental terms related to data attributes and instances, along with step-by-step procedures for using Weka and Rapid Miner to preprocess datasets. The conclusion emphasizes the exploration of various features across these tools and datasets.

EX. No: 01 STUDY OF WEKA, RAPIDMINER TOOLS AND UCI REPOSITORY DATASETS

AIM:
To explore the various features of the Weka and RapidMiner tools and of the UCI Repository
datasets.

PROCEDURE:
FUNDAMENTAL TERMS:
Feature/Attribute: A single column of data is called a feature. It is one component of an
observation and is also called an attribute of a data instance. Some features are inputs to a
model (the predictors), and others are outputs (the features to be predicted).
Attribute values: The numbers or symbols assigned to an attribute.
Target attribute: A special attribute that holds the label of each instance.
Instance: Each row in the dataset is called an instance.
Dataset: A collection of instances.
Training Dataset: A dataset that is fed into the machine learning algorithm to train the model.
Testing Dataset: A dataset that is used to evaluate the accuracy of the model but is not used to
train it.
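The terms above can be illustrated with a minimal Python sketch. The rows are illustrative values modeled on the weather data used later in this exercise, and the helper name train_test_split and the 40% test fraction are assumptions made for this example, not part of Weka or RapidMiner.

```python
import random

# Attribute (feature) names; "play" is taken as the target attribute.
attributes = ["outlook", "temperature", "humidity", "windy", "play"]
target = "play"

# Each inner list is one instance (row); the whole list is the dataset.
dataset = [
    ["sunny", 85, 85, False, "no"],
    ["overcast", 83, 86, False, "yes"],
    ["rainy", 70, 96, False, "yes"],
    ["rainy", 65, 70, True, "no"],
    ["sunny", 69, 70, False, "yes"],
]

def train_test_split(instances, test_fraction, seed=0):
    """Shuffle a copy of the instances and split it into training and testing datasets."""
    shuffled = instances[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(dataset, test_fraction=0.4)

# The label of each training instance is its value for the target attribute.
labels = [row[attributes.index(target)] for row in train]
```

The testing instances are held back and never shown to the learning algorithm, which is what allows them to measure accuracy on unseen data.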
WEKA:
1. Download and install Weka.
2. In the Weka GUI Chooser window, select the Explorer button from the five available buttons.
3. Weka supports two common file formats:
ARFF - Attribute-Relation File Format
CSV - Comma-Separated Values
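To show how an ARFF file differs from plain CSV, the following Python sketch reads a small ARFF text. It is a simplified reader written for this example, not Weka's own loader: it assumes unquoted values without embedded commas, and the function name parse_simple_arff and the sample text are hypothetical.

```python
# A small ARFF text: a header with attribute declarations, then CSV-like data rows.
ARFF_TEXT = """\
% comment lines start with a percent sign
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
rainy,?,yes
"""

def parse_simple_arff(text):
    """Parse a minimal subset of ARFF: header declarations plus comma-separated rows."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            # "?" marks a missing value in ARFF
            rows.append([None if v == "?" else v for v in values])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            attributes.append((name, spec))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows

relation, attributes, rows = parse_simple_arff(ARFF_TEXT)
```

Unlike CSV, the ARFF header records the type of every attribute (numeric, or a fixed set of nominal values), so the data rows need no guessing about types.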
EXPLORER:
The Explorer window contains the Preprocess, Classify, Cluster, Associate, Select attributes and
Visualize tabs. Select Preprocess.
OPEN FILE:
To load a dataset file stored on the local machine.
OPEN URL:
To load a dataset from a web address.
OPEN DB:
To load data from a database that the user can access.
CHOOSE:
To select a filter.
EDIT:
To view or edit the dataset, before and after applying a filter.
FILTERS:
To filter or transform the data.
1. REMOVE (ATTRIBUTE):
A filter that removes a range of attributes from the dataset.
1. Click the Open file button.
2. Go to the Weka installation folder and open the data folder.
3. Choose weather.numeric.arff.
4. In the Filter panel, click the Choose button.
5. Choose unsupervised and select attribute.
6. Select the Remove filter.
7. Specify the attribute indices in the filter editor window.
8. Click the Apply button.
9. Click the Edit button to see the dataset after the attribute has been removed.
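The effect of the Remove filter can be sketched in plain Python. This is an illustration of the idea, not Weka code: the function name remove_attributes is hypothetical, the rows are illustrative values modeled on the weather data, and indices are taken as 1-based to match the filter editor.

```python
def remove_attributes(header, rows, indices):
    """Drop the columns whose 1-based positions are listed in `indices`."""
    drop = {i - 1 for i in indices}
    keep = [i for i in range(len(header)) if i not in drop]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows

header = ["outlook", "temperature", "humidity", "windy", "play"]
rows = [
    ["sunny", 85, 85, False, "no"],
    ["overcast", 83, 86, False, "yes"],
    ["rainy", 70, 96, False, "yes"],
]

# Remove the third attribute (humidity); the number of instances is unchanged.
new_header, new_rows = remove_attributes(header, rows, [3])
```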
2. REMOVE WITH VALUES:
Filters instances according to the value of an attribute.
1. Click the Open file button.
2. Choose weather.numeric.arff.
3. Choose unsupervised and select instance.
4. Select the “RemoveWithValues” filter.
5. Set the attribute index to 2 and the split point to 60.
6. The output dataset retains only the instances whose value in the second column is at or
above the split point.
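A minimal Python sketch of this behaviour for a numeric attribute follows. It assumes the filter's default setting, in which instances below the split point are removed; the function name remove_below_split and the rows are illustrative, not taken from Weka.

```python
def remove_below_split(rows, attribute_index, split_point):
    """Keep rows whose numeric value at the 1-based attribute index is >= split_point."""
    col = attribute_index - 1
    return [row for row in rows if row[col] >= split_point]

# Illustrative rows: outlook, temperature, play
rows = [
    ["sunny", 85, "no"],
    ["rainy", 70, "yes"],
    ["rainy", 65, "no"],
    ["overcast", 64, "yes"],
]

kept_60 = remove_below_split(rows, attribute_index=2, split_point=60)
kept_70 = remove_below_split(rows, attribute_index=2, split_point=70)
```

In these rows every temperature is above 60, so a split point of 60 removes nothing, while a split point of 70 drops the two coolest instances.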

3. REPLACE MISSING VALUES:
Replaces all missing values for nominal and numeric attributes in a dataset with the
modes and means from the training data.
1. Choose unsupervised and select attribute.
2. Select the “ReplaceMissingValues” filter.
3. Click “Edit” and delete any one of the values.
4. After the filter is applied, the deleted value and any other missing values are replaced by
the mean (for numeric attributes) or the mode (for nominal attributes).
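The replacement rule (mean for numeric attributes, mode for nominal attributes) can be sketched in Python as follows. The function name replace_missing is hypothetical, missing values are represented as None, and the sketch assumes every column has at least one observed value.

```python
from statistics import mean, mode

def replace_missing(rows):
    """Fill None entries with the column mean (numeric) or column mode (nominal)."""
    fill = []
    for col in zip(*rows):
        present = [v for v in col if v is not None]
        if all(isinstance(v, (int, float)) for v in present):
            fill.append(mean(present))   # numeric attribute: use the mean
        else:
            fill.append(mode(present))   # nominal attribute: use the most common value
    return [[fill[i] if v is None else v for i, v in enumerate(row)] for row in rows]

# Illustrative rows: outlook (nominal), temperature (numeric)
rows = [
    ["sunny", 85],
    ["sunny", None],
    [None, 75],
    ["rainy", 80],
]

filled = replace_missing(rows)
```

Here the missing temperature becomes the mean of 85, 75 and 80, and the missing outlook becomes the most frequent observed value.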

4. REMOVE PERCENTAGE:
A filter that removes a given percentage of a dataset.
1. Choose unsupervised and select instance.
2. Select the “RemovePercentage” filter.
3. Set the percentage to “50.0”.
4. After the filter is applied, 50% of the instances are removed from the dataset.
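A short Python sketch of the idea follows. It assumes the removed instances are taken from the start of the dataset rather than chosen at random, and the function name remove_percentage is hypothetical; rounding at exact half values may differ from Weka.

```python
def remove_percentage(rows, percentage):
    """Drop the first `percentage` percent of the rows and return the rest."""
    cutoff = round(len(rows) * percentage / 100)
    return rows[cutoff:]

# Stand-in for a 14-instance dataset: each number represents one instance.
rows = list(range(1, 15))

remaining = remove_percentage(rows, 50.0)
```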

5. REMOVE FREQUENT VALUES:
Determines which values of an attribute are retained and filters the instances accordingly.
1. Choose unsupervised and select instance.
2. Select the “RemoveFrequentValues” filter.
3. Specify the attribute index as 2.
4. When Apply is clicked, the instances with the less frequently occurring values are removed.
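The frequency-based selection can be sketched in Python as follows. This is an illustration under assumptions, not Weka code: the function name keep_by_frequency and its parameters are hypothetical, and the example uses a nominal column (index 1, outlook) because counting value frequencies is most meaningful for nominal attributes.

```python
from collections import Counter

def keep_by_frequency(rows, attribute_index, num_values, use_least=False):
    """Keep rows whose value at the 1-based index is among the most (or least) frequent."""
    col = attribute_index - 1
    counts = Counter(row[col] for row in rows)
    ranked = [value for value, _ in counts.most_common()]  # most frequent first
    if use_least:
        ranked.reverse()
    retained = set(ranked[:num_values])
    return [row for row in rows if row[col] in retained]

# Illustrative rows: outlook, play
rows = [
    ["sunny", "no"],
    ["sunny", "no"],
    ["sunny", "yes"],
    ["rainy", "yes"],
    ["rainy", "no"],
    ["overcast", "yes"],
]

most = keep_by_frequency(rows, attribute_index=1, num_values=2)
least = keep_by_frequency(rows, attribute_index=1, num_values=1, use_least=True)
```

With the two most frequent values retained, the single overcast instance is removed; with use_least set, only that instance survives.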
OUTPUT:
Dataset: weather.numeric.arff

Applying Remove Filter:

Applying Remove With Values Filter:

Applying Replace Missing Values Filter:

Applying Remove Percentage Filter:

Applying Remove Frequent Values Filter:

RAPIDMINER:

Design View
Preprocessing: Replace missing values

1. Load the Labor-Negotiations data set from the Samples folder.
2. Drag and drop the Replace Missing Values operator into the process. It applies the
replacement to all attributes in the dataset that have at least one missing value.
3. Click the play button and view the output.

Dataset with missing values:

Dataset after filling missing values:


UCI Repository:

Sample dataset:
Conclusion: The various features of the Weka tool, the RapidMiner tool and the UCI Repository
datasets have been explored.
