EX-01-Weka and Rapidminer
AIM:
To explore the various features of the Weka and RapidMiner tools and the UCI Repository
datasets.
PROCEDURE:
FUNDAMENTAL TERMS:
Feature/Attribute: A single column of data is called a feature. It is a component of an
observation and is also called an attribute of a data instance. Some features may be inputs to a
model (the predictors) and others may be outputs or the features to be predicted.
Attribute values: Attribute values are numbers or symbols assigned to an attribute.
Target attribute: Target attribute is a special attribute which corresponds to the label of each
instance.
Instance: Each row in the dataset is called an instance.
Dataset: A collection of instances is a dataset.
Training Dataset: A dataset that is fed into the machine learning algorithm to train the model.
Testing Dataset: A dataset that is used to validate the accuracy of the model but is not used to
train the model.
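The terms above can be illustrated with a minimal Python sketch. The toy weather-style rows, the attribute names, and the helper functions `split_features_target` and `train_test_split` are all hypothetical and are not part of Weka or RapidMiner; the last column is simply assumed to be the target attribute.

```python
import random

# Hypothetical toy dataset: each row is an instance, each column a feature.
# The last column ("play") is assumed to be the target attribute.
ATTRIBUTES = ["outlook", "temperature", "humidity", "play"]
DATASET = [
    ["sunny", 85, 85, "no"],
    ["overcast", 83, 86, "yes"],
    ["rainy", 70, 96, "yes"],
    ["rainy", 65, 70, "no"],
    ["overcast", 64, 65, "yes"],
    ["sunny", 72, 95, "no"],
    ["sunny", 69, 70, "yes"],
    ["rainy", 75, 80, "yes"],
]


def split_features_target(instances):
    """Separate the predictor values from the target value of each instance."""
    predictors = [row[:-1] for row in instances]
    targets = [row[-1] for row in instances]
    return predictors, targets


def train_test_split(instances, test_fraction=0.25, seed=0):
    """Shuffle a copy of the instances and hold out a fraction for testing."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(DATASET)
X_train, y_train = split_features_target(train)
print(len(train), "training instances,", len(test), "testing instances")
```

The testing instances are never shown to the model during training, which is what allows them to give an honest estimate of accuracy.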
WEKA:
1. Download and install Weka.
2. In the Weka GUI Chooser window, select the Explorer button from the five available buttons.
3. Weka supports two common file formats:
ARFF: Attribute-Relation File Format
CSV: Comma-Separated Values
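To show how the two formats relate, the following is a rough Python sketch that turns CSV text into ARFF text. The function `csv_to_arff` and the sample data are hypothetical; the sketch assumes a header row, treats a column as numeric only if every value parses as a number, and does not handle quoting, dates, or missing values, so it is not a substitute for Weka's own converters.

```python
import csv
import io


def csv_to_arff(csv_text, relation="dataset"):
    """Convert CSV text (header row first) into ARFF text.

    Assumption: a column is numeric if every value parses as a float;
    otherwise it is declared nominal with its sorted distinct values.
    """
    rows = list(csv.reader(io.StringIO(csv_text.strip())))
    header, data = rows[0], rows[1:]

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    # Header section: relation name followed by one line per attribute.
    lines = [f"@relation {relation}", ""]
    for index, name in enumerate(header):
        column = [row[index] for row in data]
        if all(is_number(v) for v in column):
            lines.append(f"@attribute {name} numeric")
        else:
            values = ",".join(sorted(set(column)))
            lines.append(f"@attribute {name} {{{values}}}")

    # Data section: one comma-separated line per instance.
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines)


SAMPLE_CSV = """outlook,temperature,play
sunny,85,no
overcast,83,yes
rainy,70,yes
"""

arff_text = csv_to_arff(SAMPLE_CSV, relation="weather")
print(arff_text)
```

The main difference visible here is that ARFF declares each attribute and its type explicitly, while CSV carries only the column names.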
EXPLORER:
The Explorer window contains the Preprocess, Classify, Cluster, Associate, Select attributes and
Visualize tabs. Select Preprocess.
OPEN FILE:
To open a dataset file stored on the machine.
OPEN URL:
To load a dataset from a web address (URL).
OPEN DB:
To open a database that the user has saved on the machine.
CHOOSE:
To select the filter to apply.
EDIT:
To view and edit the loaded dataset, before or after a filter is applied.
FILTERS:
To filter or transform the data.
1. REMOVE (ATTRIBUTE):
4. REMOVE PERCENTAGE:
A filter that removes a given percentage of a dataset.
1. Under filters, choose unsupervised and then select instance.
2. Select the “RemovePercentage” filter.
3. Set the percentage to “50.0”.
4. After the filter is applied, 50% of the instances are removed from the dataset.
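The effect of this filter can be sketched in plain Python. The function `remove_percentage` below is a hypothetical stand-in, not Weka's code; it assumes the filter drops the leading rows of the dataset and that an invert option keeps only those rows instead.

```python
def remove_percentage(instances, percentage=50.0, invert=False):
    """Drop the leading `percentage` percent of instances.

    Assumption: rows are removed from the start of the dataset; with
    invert=True only those leading rows are kept instead.
    """
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    cutoff = int(len(instances) * percentage / 100)
    return instances[:cutoff] if invert else instances[cutoff:]


# Ten hypothetical instances with two numeric attributes each.
rows = [[i, i * i] for i in range(10)]
kept = remove_percentage(rows, 50.0)
print(len(rows), "->", len(kept))  # prints "10 -> 5"
```

Applying the function once with `invert=False` and once with `invert=True` yields two non-overlapping halves, which is one simple way to obtain separate training and testing datasets.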
RAPIDMINER:
Design View
Preprocessing: Replace missing values
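As a rough illustration of what replacing missing values involves, the Python sketch below fills each missing numeric cell with the mean of its column. The function `replace_missing_with_mean` and the sample rows are hypothetical; using the column mean is an assumption made here for illustration, and RapidMiner offers other replacement strategies as well.

```python
from statistics import mean


def replace_missing_with_mean(rows):
    """Return a copy of rows with None cells replaced by the column mean."""
    filled = [list(row) for row in rows]
    for col in range(len(rows[0])):
        present = [row[col] for row in rows if row[col] is not None]
        if not present:
            continue  # nothing to average; leave the column untouched
        column_mean = mean(present)
        for row in filled:
            if row[col] is None:
                row[col] = column_mean
    return filled


# Hypothetical numeric data where None marks a missing value.
sample = [[1.0, 10.0], [None, 20.0], [3.0, None]]
cleaned = replace_missing_with_mean(sample)
print(cleaned)  # prints [[1.0, 10.0], [2.0, 20.0], [3.0, 15.0]]
```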
Sample dataset:
Conclusion: The various features of the Weka tool, the RapidMiner tool and the UCI Repository
datasets have been explored.