

An Extensive Study of Data Analysis Tools (Rapid Miner, Weka, R Tool, Knime,
Orange)

Article · September 2018


DOI: 10.14445/23488387/IJCSE-V5I9P102



SSRG International Journal of Computer Science and Engineering ( SSRG – IJCSE ) – Volume 5 Issue 9 – September 2018

An Extensive Study of Data Analysis Tools

(Rapid Miner, Weka, R Tool, Knime, Orange)

Venkateswarlu Pynam1, R Roje Spanadna2, Kolli Srikanth3
Assistant Professors, Department of Information Technology, University College of Engineering Vizianagaram,
JNT University Kakinada, Andhra Pradesh 525003

Abstract

In today's world, data has been increasing in terms of the 3 Vs (volume, velocity and variety). Large and complex collections of datasets are difficult to process with traditional data processing applications, which has led to a new technology called Data Analytics. It is the science of exploring raw data to elicit useful information and hidden patterns. The main aim of data analysis is to apply advanced analytics techniques to huge and varied datasets. The size of a dataset may vary from terabytes to zettabytes, and the data can be structured or unstructured. This paper gives a comprehensive and theoretical analysis of five open source data analytics tools: RapidMiner, Weka, R tool, KNIME and Orange. The study is meant to make the choice and selection of a tool easy; the tools are evaluated on various parameters such as the amount of data handled, response time, ease of use, price tag, supported algorithms and handling.

Keywords - Data Analytics, Big Data, Data analytical tools, Visualization tools, Data mining.

I. INTRODUCTION

Data is a collection of values in the form of raw facts which are translated into forms that are easy to process. Data has been increasing exponentially in digital form over the last few decades, and data sizes have risen from gigabytes to terabytes. This explosive rate of growth continues day by day, and estimates suggest that the amount of information in the world doubles almost every month. This massive amount of data, both structured and unstructured, is called Big Data. Handling and processing such data has become difficult with conventional databases and software techniques. There are different problems with big data [1]: processing large data without solid analytical techniques is difficult and often leads to inaccurate results. Data Analytics is the science of analyzing data to convert information into useful knowledge. This knowledge can help us understand our world better and, in many contexts, enable us to make better decisions. Data analytics techniques are structured around different categories of analytics, namely descriptive, inferential, predictive and prescriptive analytics. With the increasing need for data analysis [7], tools that directly analyze the data and derive conclusions are in demand.

There are thousands of Big Data tools for data analysis at present. Data analysis is the process of inspecting, cleaning, transforming and modeling data with the goal of discovering useful information, suggesting conclusions and supporting decision making. Data analysis tools span the areas of open source data tools, data visualization tools, sentiment tools, data extraction tools and databases. These tools generate reports that summarize the conclusions, provide better visualizations and produce accurate results with minimum effort. Many tools are available for data analytics, such as RapidMiner, Weka, KNIME, R tool, Orange, OpenRefine, Solver, Julia, etc. [5]. We have chosen five of these for comparison, namely RapidMiner, Weka, KNIME, R tool and Orange, and we will determine the most efficient tool among them on the basis of a few parameters.

II. DATA ANALYTICAL TOOLS

RapidMiner is a data science software platform originally developed by Ralf Klinkenberg, Ingo Mierswa and Simon Fischer at the Artificial Intelligence Unit of the Technical University of Dortmund. RapidMiner [9] provides a unified environment for data preparation, machine learning, deep learning, text mining, predictive analytics and business analytics. It is used for business and commercial applications as well as research, education, training, rapid prototyping and application development, and supports all steps of the machine learning process, including data preparation, results visualization, model validation and optimization [8].

RapidMiner uses a client/server model, with the server offered either on-premise or in public or private cloud infrastructures. There is no scoping mechanism in RapidMiner processes, so objects can be stored and retrieved at any nesting level. Parameter optimization schemes are also available in RapidMiner. Numerous clustering operators are

ISSN: 2348 – 8387 http://www.internationaljournalssrg.org Page 4



available in RapidMiner that generate a cluster attribute, e.g. the K-Means operator. Macros are one of the advanced topics of RapidMiner. RapidMiner automatically determines the type of each attribute of a dataset, and all attributes have a valid role; the type and role can be changed using the corresponding operators [3]. RapidMiner favors operators over scripts, because writing scripts can be time-consuming and error-prone [3]. RapidMiner keeps datasets in memory as long as possible, so if there is any memory left, RapidMiner will not dispose of previous results of the process. One report described RapidMiner's strengths as a "platform that supports an extensive breadth and depth of functionality", noting that with this it comes quite close to the business market.

III. ANALYSIS TECHNIQUES

There are various phases in the analysis process which are performed in order to get the output. These phases are carried out in sequential order to achieve the desired goal effectively. The phases of analytics [10] are:

1. Identify the problem,
2. Designing data requirements,
3. Pre-processing data,
4. Performing analytics over data and
5. Visualizing data.

Figure 1: The phases of analytics

A. Identify the problem
Nowadays analytics are performed on web datasets because of the increasing use of the internet and the growing business of organizations over the internet. This leads to a gradual increase in data size day by day. Organizations want to make predictions over the data in order to take the desired decisions, so the analytical applications must be scalable for collecting such datasets. Let us assume, for example, that there is an e-commerce website that wants to increase its business [4]. Identifying a wider variety of data sources may increase [2] the probability of finding hidden patterns and correlations. For example, to provide insight, it can be beneficial to identify as many types of related data sources as possible, especially when it is unclear exactly what to look for. Depending on the business scope of the analysis and the nature of the business problems being addressed, the required datasets and their sources can be internal and/or external to the enterprise.

B. Designing data requirements
To perform the data analytics for a distinct problem, datasets from the associated domains are needed. Based on the domain and problem specification, the data source can be determined, and based on the problem definition, the data characteristics of these datasets can be defined. For example, if we are going to perform social media analytics as the problem specification, we use Facebook or Twitter as the data source; for identifying user characteristics, we need user profile information, likes and posts as data attributes.

C. Preprocessing data
In data analytics, the data sources, data characteristics, data tools and algorithms do not all require data in the same format. This leads to performing data operations such as data cleansing, data aggregation, data augmentation, data sorting and data formatting, in order to furnish the data in a supported format to all the tools and algorithms that will be used in the analysis.

Preprocessing thus performs the data operations that translate data into a fixed format before it is furnished to the algorithms; the data analytics process is then supplied formatted data as input. In Big Data settings, the datasets additionally need to be formatted and transferred to the Hadoop Distributed File System (HDFS), to be used further by the Mappers and Reducers running on the distinct nodes of Hadoop clusters.

D. Performing analytics over data
After data is available in the appropriate format, data analytics applications can be run. Data analytics applications are executed to extract essential knowledge from data in order to take improved business decisions, in the sense of data mining. They may use either descriptive or predictive analytics for business insight.

Analytics can be achieved with various machine learning and custom algorithmic concepts, such as data


regression, data classification, data clustering, etc. For Big Data, the equivalent algorithms can be converted into MapReduce algorithms and run on Hadoop clusters by translating their data analytics logic into MapReduce jobs. These models need to be evaluated and improved over discrete stages of the machine learning process; the improved algorithms can provide better observations.

E. Visualizing data
The capability to analyze large amounts of data and find useful insights carries little value if the only ones who can interpret the results are the analysts. Data Visualization is dedicated to using data visualization techniques to clearly communicate the analysis results for effective interpretation by business users. Business users must be able to understand the results in order to obtain value from the analysis. The results of the Data Visualization phase provide users with the ability to perform visual analysis [4].

IV. DATA ANALYTICS TOOLS

A. RapidMiner
RapidMiner is available both as free and open-source software and in a commercial version, and is a popular predictive analytics platform. RapidMiner helps organizations embed predictive analytics in their work processes through its user-friendly, rich library of data science and machine learning algorithms, offered in an all-in-one programming environment, RapidMiner Studio. It likewise covers the basic data mining tasks such as data cleansing, filtering, clustering, etc. The tool is also compatible with Weka scripts. RapidMiner is used for business and commercial applications as well as research and education.

Now make sure to highlight the repository so that the folders end up in the right place, and create a folder named 'data'.

Figure 2: Importing the new data

To load our data, we can simply click on the 'Import data' button.
Step 1: After locating the file, click 'Next'.

Figure 3: Loading the data

Step 2: The data is loaded and displayed much like a spreadsheet.

Figure 4: Data loaded in spreadsheet form

Step 3: In this window we can decide whether we want to exclude any particular column by selecting the 'exclude column' entry. Further, you can change the 'name', 'role' or 'type' of an attribute. Since the default for each loaded column is 'general attribute', in this case we need to change the role of our 'churn' attribute.

Figure 5: Changing the role of a general attribute

BUILDING A DECISION TREE

To create a decision tree, we first have to import a dataset. Here, we are using a dataset about customer churn. After downloading the dataset and importing


it into the RapidMiner tool, we have to retrieve the data from our repository: click on the process directory, highlight your customer data and drag it over.

Figure 6: The process directory

Before we actually build a model, we have to inspect our data for issues and see if we need to do any further preparation. Click on the 'output' port of the operator and drag a connection onto the 'results' port of the process panel. Now click the port to establish the connection, go to the 'run process' button and run it.

Figure 7: Running the dataset

Figure 8: Decision tree for the customer churn data

B. KNIME
KNIME is a data mining tool that can be used to perform almost any kind of analysis. We explored how to visualize a dataset and retrieve its essential features. Predictive modelling used a linear regression predictor to estimate sales for each item accordingly [6]. Finally, we filter out the appropriate columns and export them to a .csv file.

1. File reader
The most common way to store reasonably small amounts of data is still a text file. Among text files, the most common format is by far CSV (Comma-Separated Values). The "comma" in the CSV name is just one of the characters available to separate data inside the file; semicolons, colons, dots, tabs and many other signs are equally suitable. A more rigid specification of the file structure of course allows for quicker reading. However, occasionally you need a more flexible definition of the file structure to get to a result, even if it requires a somewhat longer parsing time.

Figure 9: Reading data from an ASCII file or URL location

2. Partitioning
The input table is divided into two partitions (i.e. row-wise), e.g. train and test data. The two partitions are available at the two output ports.

Figure 10: Partitioning the data

3. Decision tree
After the data is partitioned into train and test data, a decision tree model is trained and applied. The Decision Tree Learner node is responsible for the training of the decision tree model. Here is a brief


description of the basic settings available in its configuration window.

Figure 11: The Decision Tree Learner node

4. Decision tree image
This node renders the decision tree view as an image; the presently supported image type is PNG. The data input is optional. It can be used to provide a column with color information, which is needed for the charts in the nodes of the decision tree.

Figure 12: Nodes of the decision tree

5. Decision tree predictor

Figure 13: Predictors of the decision tree

6. Scorer
Compares two columns by their attribute value pairs and shows the confusion matrix, i.e. how many rows of which attribute match their classification. Additionally, it is possible to highlight cells of this matrix to determine the underlying rows. The dialog allows you to select two columns for comparison; the values from the first selected column are represented in the confusion matrix's rows, and the values from the second column by the confusion matrix's columns. The output of the node is the confusion matrix with the number of matches in each cell. Additionally, the second out-port reports a number of accuracy statistics such as True Positives, False Positives, True Negatives, False Negatives, Recall, Precision, Sensitivity, Specificity, F-measure, as well as the overall accuracy and Cohen's kappa.

Figure 14: Confusion matrix of the node

7. Entropy scorer
Scorer for clustering results given a reference clustering. Connect the table containing the reference clustering to the first input port (the table should contain a column with the cluster IDs) and the table with the clustering results to the second input port (it should also contain a column with some cluster IDs). Select the respective columns in both tables from the dialog. After successful execution, the view will show entropy values (the smaller the better) and a quality value (in [0,1], with 1 being the best possible value, as used in "Fuzzy Clustering in Parallel Universes", section 6: "Experimental results").

Figure 15: Clustering results

8. Numeric scorer
This node computes certain statistics between a numeric column's reference values (ri) and predicted (pi)


values. It computes R² = 1 − SSres/SStot = 1 − Σ(pi−ri)² / Σ(ri − (1/n)·Σri)² (can be negative!), mean absolute error ((1/n)·Σ|pi−ri|), mean squared error ((1/n)·Σ(pi−ri)²), root mean squared error (sqrt((1/n)·Σ(pi−ri)²)), and mean signed difference ((1/n)·Σ(pi−ri)). The computed values can be inspected in the node's view and/or further processed using the output table.

Statistics:
This node calculates statistical moments such as minimum, maximum, mean, standard deviation, variance, median, overall sum, number of missing values and row count across all numeric columns, and counts all nominal values together with their occurrences. The dialog offers two options for choosing the median and/or nominal value calculations.

Figure 16: Statistical moment calculations

Figure 17: Decision tree for the data set

C. Weka
Initially, after starting the Weka Explorer, a window appears where we can perform various operations using the different datasets available [4]. To load the required dataset, simply click on the 'Open file' button and choose the path C:/weka-3.8/data.

Figure 18: Loading the thyroid dataset

Select the file hypothyroid.arff from the given datasets and click on the 'Open' button.

Figure 19: Selecting an .arff file from the datasets

With the selected dataset, preprocessing is performed and the respective graph is shown based on the class and data items selected, as shown below.

Figure 20: Preprocessing the data

Here we classify the dataset based on a percentage split of 65%, which yields 95.97% correctly classified instances. To show the output screen, simply click on the 'Start' button.

Figure 21: Splitting the data

To generate a decision tree, simply click on the folder 'Trees' and select the algorithm to generate a decision tree. Here we are selecting the J48 algorithm to generate the tree based on the test option "Use training set". Now the following decision tree is generated based on the classified data items.
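The Weka steps above (a percentage split followed by a decision-tree learner) can also be sketched outside the GUI. The following is a minimal illustration in Python using scikit-learn on the built-in iris dataset; note that scikit-learn's DecisionTreeClassifier implements CART rather than Weka's J48 (C4.5), so the exact tree and accuracy will differ from the figures quoted in the text:

```python
# Sketch of the walkthrough above in plain Python with scikit-learn.
# CART stands in for J48 (C4.5); the 65% split mirrors Weka's
# "Percentage split" test option described in the text.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# 65% of the rows for training, the remaining 35% held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.65, random_state=42)

tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)

pred = tree.predict(X_test)
print(f"Correctly classified instances: {accuracy_score(y_test, pred):.2%}")
```

The fixed random seed only makes the run repeatable; Weka's percentage split likewise shuffles before splitting unless told otherwise.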
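Similarly, the statistics listed earlier for KNIME's Numeric Scorer node follow directly from their formulas. Here is a plain-Python sketch with made-up reference and predicted values (purely illustrative, not KNIME output):

```python
# The Numeric Scorer formulas computed directly.
# r holds the reference values (ri), p the predictions (pi);
# the numbers are invented for illustration only.
import math

r = [3.0, 5.0, 2.5, 7.0]   # actual values ri
p = [2.5, 5.0, 4.0, 8.0]   # predicted values pi
n = len(r)

mean_r = sum(r) / n
ss_res = sum((pi - ri) ** 2 for pi, ri in zip(p, r))
ss_tot = sum((ri - mean_r) ** 2 for ri in r)

r2   = 1 - ss_res / ss_tot                              # R-squared (can be negative!)
mae  = sum(abs(pi - ri) for pi, ri in zip(p, r)) / n    # mean absolute error
mse  = ss_res / n                                       # mean squared error
rmse = math.sqrt(mse)                                   # root mean squared error
msd  = sum(pi - ri for pi, ri in zip(p, r)) / n         # mean signed difference

print(r2, mae, mse, rmse, msd)
```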

Figure 22: Generating a decision tree

D. Orange
Orange provides data visualization and data analysis for novices and experts through interactive workflows. The File widget will read the famous iris flower dataset and send it to the workflow; any changes will propagate through the workflow, updating its widgets.

Figure 23: Reading the iris flower dataset

Our aim is to inspect different types of animals and their classification. Place a suitable visualization widget on the canvas and attach it to the File widget.

Figure 24: Splitting the data

We can visualize the pre-processed data in the form of simple graphs; the above pre-processed data can be visualized using a box plot.

Figure 25: Preprocessing the data

A decision tree is a structure that includes a root node, branches and leaf nodes. Each internal node stands for a test on an attribute, each branch stands for the outcome of a test, and each leaf node holds a class label. The uppermost node in the tree is the root node.

Figure 26: Decision tree for the dataset

V. CONCLUSION

Based on the analysis, Weka would be considered very close to KNIME because of its many built-in features that require no coding knowledge. RapidMiner would be considered appropriate for experts, particularly those in the hard sciences, because of the additional programming skills that are needed and the limited visualization support that is provided. RapidMiner has a good and simple-to-use graphical interface, so it can easily be used and run on any system; furthermore, it integrates the best algorithms of other specialized tools. R is the leading tool in visualization, although it is a bit harder to create pretty graphs with it. R promotes reproducible research: R commands provide an exact record of how an analysis was done, and commands can be altered, rerun, clarified, shared, etc. It can be concluded that although data analytics is the concept basic to all of these tools, Orange, in comparison, offers tools that seem targeted primarily at people with less need to build custom applications into their own software but who want a far easier time with user interaction; it is written in Python, its source is available, and user extensions are supported.

REFERENCES

[1] Lekha R. Nair, Sujala D. Shetty, "Research in Big Data and Analytics: An Overview", International Journal of Computer Applications, Volume 108, No. 14, 2014.
[2] Mike Barlow, Real-Time Big Data Analytics: Emerging Architecture. Sebastopol, CA: O'Reilly Media, 2013, p. 3.
[3] Sanjay Rathee, "Big Data and Hadoop with components like Flume, Pig, Hive and Jaql", presented at the International Conference on Cloud, Big Data and Trust 2013, RGPV, 2015.
[4] Swasti Singhal, Monika Jena, "A Study on WEKA Tool for Data Preprocessing, Classification and Clustering", International Journal of Innovative Technology and Exploring Engineering (IJITEE), Volume 2, Issue 6, 2013.
[5] Kalpana Rangra, K. L. Bansal, "Comparative Study of Data Mining Tools", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 4, Issue 6, 2014.
[6] Michael R. Berthold, Nicolas Cebron, Fabian Dill, Thomas R. Gabriel, Tobias Kötter, Thorsten Meinl, Peter Ohl, Kilian Thiel and Bernd Wiswedel, "KNIME – The Konstanz Information Miner", University of Konstanz, Nycomed Chair for Bioinformatics and Information Mining, Germany.
[7] http://bigdata-madesimple.com/top-30-big-data-tools-data-analysis/
[8] http://opensourceforu.com/2017/03/top-10-open-source-data-mining-tools/
[9] https://rapidminer.com/wp-content/uploads/2014/10/RapidMiner-5-Operator-Reference.pdf
[10] http://pingax.com/understanding-data-analytics-project-life-cycle/
