RapidMiner
Developer(s) rapid-i.com
Stable release 5.2 / 1 February 2012
Operating system Cross-platform
Type Artificial Intelligence
License AGPL
Website sourceforge.net/projects/rapidminer

RapidMiner, formerly YALE (Yet Another Learning Environment), is an environment for machine learning, data mining, text mining, predictive analytics, and business analytics. It is used for research, education, training, rapid prototyping, application development, and industrial applications. In a poll by KDnuggets, a data-mining newsletter, RapidMiner ranked second among data mining/analytic tools used for real projects in 2009[1] and first in 2010.[2] It is distributed under the AGPL open source license and has been hosted by SourceForge since 2004.

The RapidMiner project was started in 2001 by Ralf Klinkenberg, Ingo Mierswa, and Simon Fischer at the Artificial Intelligence Unit of the University of Dortmund. In 2006 Ingo Mierswa and Ralf Klinkenberg founded the company Rapid-I, which is now the main contributor among the more than 30 international developers who continue to develop RapidMiner.

Purpose

RapidMiner provides data mining and machine learning procedures, including data loading and transformation (ETL), data preprocessing and visualization, modelling, evaluation, and deployment. Data mining processes can be built from arbitrarily nestable operators, which are described in XML files and created in RapidMiner's graphical user interface (GUI). RapidMiner is written in the Java programming language. It also integrates learning schemes and attribute evaluators from the Weka machine learning environment and statistical modelling schemes from the R Project.
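The idea of arbitrarily nestable operators can be illustrated with the composite pattern. The following Java sketch is not RapidMiner's API: the `Operator` interface and the `DropMissing`, `Scale` and `Chain` classes are hypothetical names chosen for illustration, and a data set is modelled simply as a list of numeric rows. Because a `Chain` is itself an `Operator`, chains can be nested to any depth, which is what allows processes to form operator trees.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of nestable operators; these types are NOT RapidMiner's real API.
public class OperatorTreeSketch {

    /** Every operator, leaf or composite, maps a data set to a new data set. */
    interface Operator {
        List<double[]> apply(List<double[]> data);
    }

    /** Leaf operator: drops rows that contain a NaN value (a toy preprocessing step). */
    static class DropMissing implements Operator {
        public List<double[]> apply(List<double[]> data) {
            List<double[]> out = new ArrayList<>();
            for (double[] row : data) {
                boolean complete = true;
                for (double v : row) {
                    if (Double.isNaN(v)) {
                        complete = false;
                        break;
                    }
                }
                if (complete) {
                    out.add(row);
                }
            }
            return out;
        }
    }

    /** Leaf operator: multiplies every value by a constant factor. */
    static class Scale implements Operator {
        private final double factor;

        Scale(double factor) {
            this.factor = factor;
        }

        public List<double[]> apply(List<double[]> data) {
            List<double[]> out = new ArrayList<>();
            for (double[] row : data) {
                double[] scaled = new double[row.length];
                for (int i = 0; i < row.length; i++) {
                    scaled[i] = row[i] * factor;
                }
                out.add(scaled);
            }
            return out;
        }
    }

    /** Composite operator: runs its children in order; a child may itself be a Chain. */
    static class Chain implements Operator {
        private final List<Operator> children = new ArrayList<>();

        Chain add(Operator op) {
            children.add(op);
            return this;
        }

        public List<double[]> apply(List<double[]> data) {
            List<double[]> current = data;
            for (Operator child : children) {
                current = child.apply(current);
            }
            return current;
        }
    }

    public static void main(String[] args) {
        List<double[]> data = new ArrayList<>();
        data.add(new double[] {1.0, 2.0});
        data.add(new double[] {Double.NaN, 3.0});
        data.add(new double[] {4.0, 5.0});

        // A two-level tree: the outer chain contains an inner "preprocessing" chain.
        Operator preprocessing = new Chain().add(new DropMissing()).add(new Scale(10.0));
        Operator process = new Chain().add(preprocessing).add(new Scale(0.5));

        for (double[] row : process.apply(data)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Running `main` prints `[5.0, 10.0]` and `[20.0, 25.0]`: the row containing NaN is dropped by the inner chain, and the remaining rows are scaled by both chains in turn.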

The Community Edition of RapidMiner is a toolkit for data mining. It can be used to define analytical steps (similar to R) and to generate graphs (similar to Microsoft Excel). It is also used to analyze data generated by high-throughput instruments in processes such as genotyping, proteomics, and mass spectrometry.

Example applications:

  • Generating figures with RapidMiner while bypassing its data mining functions.
  • Exploring data in the manner of Microsoft Excel ("knowledge discovery").
  • Constructing custom data analysis workflows.
  • Calling RapidMiner functions from programs written in other languages/systems (e.g. Perl).

Features:

  • A broad collection of data mining algorithms, such as decision trees and self-organizing maps.
  • Overlapping histograms, tree charts and 3D scatter plots.
  • A variety of plugins, such as a text plugin for text analysis.

Applications

RapidMiner can be used for text mining, multimedia mining, feature engineering, data stream mining and tracking drifting concepts, development of ensemble methods, and distributed data mining. In Rexer's Annual Data Miner Survey for 2010, RapidMiner was rated the fifth most used text mining software (6%).[3]

RapidMiner is used in the electronics, energy, automobile, IT and pharmaceutical industries, as well as in commerce, aviation, telecommunications, banking and insurance, production, market research and other fields.

Properties

Some properties of RapidMiner are:

  • written in Java
  • knowledge discovery processes are modeled as operator trees
  • internal XML representation provides a standardized interchange format for data mining experiments
  • scripting language allows for automatic large-scale experiments
  • multi-layered data view concept ensures efficient and transparent data handling
  • graphical user interface, command line mode (batch mode), and Java API for using RapidMiner from other programs
  • plugin and extension mechanisms, several plugins already exist
  • plotting facility offering a large set of high-dimensional visualization schemes for data and models
  • applications include text mining, multimedia mining, feature engineering, data stream mining and tracking drifting concepts, development of ensemble methods, and distributed data mining.
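One way to read the "multi-layered data view" property is as views stacked on top of a single underlying table, each restricting or remapping access without copying the data. RapidMiner's internal classes are not described in this article, so the Java sketch below is only an assumption of how such layering can work; `TableView`, `BaseTable`, `ColumnSubset` and `RowSubset` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical sketch of layered, non-copying data views; NOT RapidMiner's actual classes.
public class DataViewSketch {

    /** Read-only access to a rectangular table of values. */
    interface TableView {
        int rows();
        int columns();
        double get(int row, int column);
    }

    /** Bottom layer: the only object that actually holds the data. */
    static final class BaseTable implements TableView {
        private final double[][] data;

        BaseTable(double[][] data) {
            this.data = data;
        }

        public int rows() { return data.length; }
        public int columns() { return data.length == 0 ? 0 : data[0].length; }
        public double get(int row, int column) { return data[row][column]; }
    }

    /** View layer exposing only selected columns (attributes) of its parent. */
    static final class ColumnSubset implements TableView {
        private final TableView parent;
        private final int[] columnMap;

        ColumnSubset(TableView parent, int... columns) {
            this.parent = parent;
            this.columnMap = columns.clone();
        }

        public int rows() { return parent.rows(); }
        public int columns() { return columnMap.length; }
        public double get(int row, int column) { return parent.get(row, columnMap[column]); }
    }

    /** View layer exposing only selected rows (examples) of its parent. */
    static final class RowSubset implements TableView {
        private final TableView parent;
        private final int[] rowMap;

        RowSubset(TableView parent, int... rows) {
            this.parent = parent;
            this.rowMap = rows.clone();
        }

        public int rows() { return rowMap.length; }
        public int columns() { return parent.columns(); }
        public double get(int row, int column) { return parent.get(rowMap[row], column); }
    }

    /** Copies the visible values of a view into a new array, e.g. for printing. */
    static double[][] materialize(TableView view) {
        double[][] out = new double[view.rows()][view.columns()];
        for (int r = 0; r < view.rows(); r++) {
            for (int c = 0; c < view.columns(); c++) {
                out[r][c] = view.get(r, c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] raw = {
            {1, 2, 3},
            {4, 5, 6},
            {7, 8, 9},
        };
        // Two stacked views: keep columns 0 and 2, then rows 1 and 2. No values are copied.
        TableView view = new RowSubset(new ColumnSubset(new BaseTable(raw), 0, 2), 1, 2);
        System.out.println(Arrays.deepToString(materialize(view)));

        // An edit to the underlying array is visible through every layer.
        raw[2][2] = 99;
        System.out.println(view.get(1, 1));
    }
}
```

Running `main` prints `[[4.0, 6.0], [7.0, 9.0]]` followed by `99.0`. Since each layer only translates indices, adding a layer costs a small array of indices rather than a copy of the data.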

GUI

RapidMiner provides a GUI to design an analytical pipeline (the "operator tree"). The GUI generates an XML (eXtensible Markup Language) file that defines the analytical processes the user wishes to apply to the data. This file is then read by RapidMiner to run the analyses automatically.
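Reading such a process file amounts to walking a nested XML tree. The Java sketch below uses the standard `javax.xml.parsers` DOM API; the `<process>` and `<operator>` element names and the `name` and `class` attributes form a simplified, hypothetical schema, not the exact format of RapidMiner process files, which carry more information than shown here. The method `listOperators` is likewise a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical, simplified process format; NOT the real RapidMiner XML schema.
public class ProcessXmlSketch {

    static final String PROCESS_XML =
          "<process>"
        + "  <operator name=\"Main\" class=\"chain\">"
        + "    <operator name=\"Load\" class=\"read_csv\"/>"
        + "    <operator name=\"Preprocess\" class=\"chain\">"
        + "      <operator name=\"Normalize\" class=\"normalize\"/>"
        + "    </operator>"
        + "    <operator name=\"Learn\" class=\"decision_tree\"/>"
        + "  </operator>"
        + "</process>";

    /** Parses the XML and lists operators depth-first, indented by nesting level. */
    static List<String> listOperators(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> result = new ArrayList<>();
            walk(doc.getDocumentElement(), 0, result);
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse process XML", e);
        }
    }

    /** Recursively visits child <operator> elements, skipping whitespace text nodes. */
    private static void walk(Element element, int depth, List<String> out) {
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE && node.getNodeName().equals("operator")) {
                Element op = (Element) node;
                out.add("  ".repeat(depth) + op.getAttribute("name") + " (" + op.getAttribute("class") + ")");
                walk(op, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) {
        for (String line : listOperators(PROCESS_XML)) {
            System.out.println(line);
        }
    }
}
```

With the sample string, the sketch prints the five operators in depth-first order, starting with `Main (chain)` and indenting `Normalize (normalize)` two levels deep, mirroring the operator tree that the XML encodes.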

While the analyses are running, the GUI can also be used to control and inspect the running processes interactively.

RapidMiner can also be called from other programs and processes, for example from a Perl program. Its Java application programming interface (API) allows operators to be applied individually, without creating an operator tree, so analytical processes can be controlled directly, bypassing the GUI. Individual RapidMiner functions can also be called directly from the command line.

Software Versions

RapidMiner is open-source and is offered free of charge as a Community Edition released under the GNU AGPL.[4] There is also an Enterprise Edition offered under a proprietary commercial license, to allow integration into closed-source solutions.[5]

Extensions

RapidMiner can be extended with additional plugins. The program suite contains around 15 extensions that broaden its applicability to text mining, image processing, time series processing, web mining, statistics, visualization, semantics, parallelization of computation, automatic process design (PaREn Automatic System Construction Wizard) and other areas.

Several of the extensions can be installed directly from within the application through an extension manager; the others can be downloaded from their respective developers.

See also

  • Weka - machine learning algorithms that can be integrated into RapidMiner
  • R-Project - statistical framework that can be integrated into RapidMiner

References

  • Ingo Mierswa, Michael Wurst, Ralf Klinkenberg, Martin Scholz, and Timm Euler: YALE: Rapid Prototyping for Complex Data Mining Tasks, in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-06), 2006.
