RapidMiner Report
Student Name :
Alaa Ali Ahmed Farghaly
Student code :
202611000021
Subject Name :
Data Analytics programming
Subject Code :
IS403
Subject Lecturer :
Dr. Ahmed Adel
Teaching assistant :
Eng. Abd El-Rahman Ahmed Taher
1- Introduction to RapidMiner :
• Designing Workflows: You create a workflow by dragging and
dropping operators from the Operators Panel onto the Process
Panel.
• Connecting Operators: Operators are connected with ‘ports’
that define the flow of data from one operation to the next.
• Executing Processes: Once the operators are connected, you
can run the entire process or step through it one operator at a
time to debug or understand intermediate steps.
• Modifying Workflows: You can easily modify a workflow by
adding, removing, or rearranging operators to optimize or
adjust the analysis process.
- The Operators Panel is a comprehensive library of
all the operators available in RapidMiner:
Information :
Name: Iris
Number of rows: 150
Number of columns: 5
Label / Target :
Name: label
Type: nominal
Range: [Iris-setosa, Iris-versicolor, Iris-virginica]
Missing: 0
Attributes / Columns :
a1, a2, a3, a4
- Preprocessing Data :
The decision tree model can be applied to new Examples using the
Apply Model Operator. Each Example follows the branches of the
tree in accordance with the splitting rules until a leaf is reached.
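The traversal described above can be sketched in a few lines of Python. This is only an illustration of the idea, not RapidMiner's implementation: the classes Leaf and Split, the function predict, and the thresholds in toy_tree are all hypothetical and are not taken from the model built in this report.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class predicted when an Example reaches this leaf


@dataclass
class Split:
    attribute: str   # attribute tested at this node
    threshold: float  # rule: value <= threshold goes left, otherwise right
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def predict(node, example):
    """Follow the splitting rules from the root until a leaf is reached."""
    while isinstance(node, Split):
        if example[node.attribute] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.label


# Hand-written toy tree; the thresholds are made up for illustration.
toy_tree = Split(
    "a3", 2.5,
    Leaf("Iris-setosa"),
    Split("a4", 1.7, Leaf("Iris-versicolor"), Leaf("Iris-virginica")),
)

print(predict(toy_tree, {"a1": 5.1, "a2": 3.5, "a3": 1.4, "a4": 0.2}))
```

Each pass through the loop corresponds to following one branch, so the number of comparisons equals the depth of the leaf that is reached.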
Input
• training set (Data Table)
The input data which is used to generate the decision tree
model.
Output
• model (Decision Tree)
The decision tree model is delivered from this output port.
• example set (Data Table)
The ExampleSet that was given as input is passed through to
this output port unchanged.
• weights (Attribute Weights)
An ExampleSet containing Attributes and weight values,
where each weight represents the feature importance for the
given Attribute. A weight is given by the sum of
improvements the selection of a given Attribute provided at a
node. The amount of improvement is dependent on the
chosen criterion.
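The weight calculation described for the weights port can be sketched as a simple sum per Attribute. This is a minimal illustration under stated assumptions: the function attribute_weights and the improvement values in toy_splits are hypothetical, and RapidMiner may normalize or scale the weights differently.

```python
from collections import defaultdict


def attribute_weights(splits):
    """Sum the improvement of every node that splits on the same Attribute.

    splits: iterable of (attribute, improvement) pairs, one per internal node.
    """
    weights = defaultdict(float)
    for attribute, improvement in splits:
        weights[attribute] += improvement
    return dict(weights)


# Made-up improvements for a tree with three internal nodes (assumption).
toy_splits = [("a3", 0.5), ("a4", 0.25), ("a3", 0.125)]
print(attribute_weights(toy_splits))
```

An Attribute that is never selected at any node does not appear in the result, which corresponds to a weight of zero.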
Other operations :
o Read CSV : This Operator reads an ExampleSet from the
specified CSV file.
o Set Role : This Operator is used to change the role of one or
more Attributes.
o Multiply : This Operator creates copies of a RapidMiner
Object.
o Cross Validation : This Operator performs a cross validation
to estimate the statistical performance of a learning model.
o Weight by Information Gain : This Operator calculates the
relevance of the Attributes based on information gain and
assigns weights to them accordingly.
o Apply Model : This Operator applies a model on an
ExampleSet.
o Performance : This operator is used for statistical
performance evaluation of classification tasks. This operator
delivers a list of performance criteria values of the
classification task.
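To make the information gain used for attribute weighting concrete, the sketch below computes it for a nominal attribute: the entropy of the label minus the weighted entropy of the label within each attribute value. The function names entropy and information_gain are hypothetical, and the sketch assumes nominal values; how RapidMiner handles numerical attributes such as a1 to a4 is not covered here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after grouping by value."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(
        (len(group) / total) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - remainder


# Toy data (assumption): the attribute separates the two classes perfectly.
values = ["x", "x", "y", "y"]
labels = ["A", "A", "B", "B"]
print(information_gain(values, labels))
```

An attribute whose values separate the classes perfectly receives the full label entropy as its gain, while an attribute that is unrelated to the label receives a gain of zero.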
2- Naive Bayes :
- Data set used ( IRIS dataset ) :
Information :
Name: Iris
Number of rows: 150
Number of columns: 5
Label / Target :
Name: label
Type: nominal
Range: [Iris-setosa, Iris-versicolor, Iris-virginica]
Missing: 0
Attributes / Columns :
a1, a2, a3, a4
- Preprocessing Data :
Input
Output
• model (Model)
3- KNN :
- Data set used ( IRIS dataset ) :
Information :
Name: Iris
Number of rows: 150
Number of columns: 5
Label / Target :
Name: label
Type: nominal
Range: [Iris-setosa, Iris-versicolor, Iris-virginica]
Missing: 0
Attributes / Columns :
a1, a2, a3, a4
- Preprocessing Data :
Input
Output
• model (Model)
Results :
4- Linear Regression :
- Data set used ( Advertising dataset) :
Information
Name: Advertising
Number of rows: 200
Number of columns: 5
Target :
Name: sale
Type: numerical
Attributes / Columns
att1, TV, radio, newspaper
- Preprocessing Data :
Input
Output
Results :
5- Polynomial Regression :
- Data set used ( Real estate ) :
Information
Target :
Name: Y house price of unit area
Type: numerical
Attributes / Columns
No, X1 transaction date, X2 house age, X3 distance to the nearest
MRT station, X4 number of convenience stores, X5 latitude, X6
longitude
- Preprocessing Data :
Input
Output
• model (Model)
6- PCA :
- Data set used ( IRIS dataset ) :
Information :
Name: Iris
Number of rows: 150
Number of columns: 5
Label / Target :
Name: label
Type: nominal
Range: [Iris-setosa, Iris-versicolor, Iris-virginica]
Missing: 0
Attributes / Columns :
a1, a2, a3, a4
- Preprocessing Data :
- RapidMiner provides auto cleansing, which removes low-quality
columns, replaces missing values, and so on, based on the data set
and its requirements.
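The two cleansing steps named above can be sketched as follows. This is only an illustration, not RapidMiner's auto cleansing: the function auto_clean, the max_missing_ratio parameter, the rule that a column is "low quality" when too many of its values are missing, and the choice of the column mean as the replacement value are all assumptions.

```python
def auto_clean(rows, max_missing_ratio=0.5):
    """Drop columns with too many missing values, then fill the rest.

    rows: list of dicts mapping column name to a number, or None if missing.
    """
    cleaned = [dict() for _ in rows]
    for column in rows[0]:
        present = [row[column] for row in rows if row[column] is not None]
        missing_ratio = 1 - len(present) / len(rows)
        if not present or missing_ratio > max_missing_ratio:
            continue  # treat the column as low quality and drop it
        mean = sum(present) / len(present)
        for new_row, row in zip(cleaned, rows):
            # Replace a missing value with the column mean (assumption).
            new_row[column] = mean if row[column] is None else row[column]
    return cleaned


# Toy data (assumption): a2 is mostly missing, a1 and a3 have one gap each.
toy_rows = [
    {"a1": 5.0, "a2": None, "a3": None},
    {"a1": None, "a2": None, "a3": 1.0},
    {"a1": 7.0, "a2": None, "a3": 3.0},
    {"a1": 6.0, "a2": 2.0, "a3": 2.0},
]
print(auto_clean(toy_rows))
```

The Iris data set used here has no missing values, so on that data a step like this would leave every column unchanged.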
Input
Output