Developer(s) | rapid-i.com |
---|---|
Stable release | 5.2 / 1 February 2012 |
Operating system | Cross-platform |
Type | Artificial Intelligence |
License | AGPL |
Website | sourceforge.net/projects/rapidminer |
RapidMiner, formerly YALE (Yet Another Learning Environment), is an environment for machine learning, data mining, text mining, predictive analytics, and business analytics. It is used for research, education, training, rapid prototyping, application development, and industrial applications. In a poll by KDnuggets, a data-mining newspaper, RapidMiner ranked second in data mining/analytic tools used for real projects in 2009[1] and was first in 2010.[2] It is distributed under the AGPL open source license and has been hosted by SourceForge since 2004.
The RapidMiner project was started in 2001 by Ralf Klinkenberg, Ingo Mierswa, and Simon Fischer at the Artificial Intelligence Unit of the University of Dortmund. In 2006, Ingo Mierswa and Ralf Klinkenberg founded the company Rapid-I, which is now the main contributor among the more than 30 international developers working on RapidMiner.
RapidMiner provides data mining and machine learning procedures, including data loading and transformation (ETL), data preprocessing and visualization, modelling, evaluation, and deployment. Data mining processes are built from arbitrarily nestable operators, described in XML files and created in RapidMiner's graphical user interface (GUI). RapidMiner is written in the Java programming language. It also integrates learning schemes and attribute evaluators of the Weka machine learning environment and statistical modelling schemes of the R-Project.
The Community Edition of RapidMiner is a toolkit for data mining. It can define analytical steps (similar to R) and generate graphs (similar to MS Excel). It is also used for analyzing data generated by high-throughput instruments used in processes such as genotyping, proteomics, and mass spectrometry.
Example applications:
Features:
RapidMiner can be used for text mining, multimedia mining, feature engineering, data stream mining and tracking drifting concepts, development of ensemble methods, and distributed data mining. RapidMiner was rated as the fifth most used text mining software (6%) by Rexer's Annual Data Miner Survey in 2010.[3]
RapidMiner is used in many industries and fields, including electronics, energy, automotive, commerce, aviation, telecommunications, banking and insurance, production, IT, market research, and pharmaceuticals.
Some properties of RapidMiner are:
RapidMiner provides a GUI to design an analytical pipeline (the "operator tree"). The GUI generates an XML (eXtensible Markup Language) file that defines the analytical processes the user wishes to apply to the data. This file is then read by RapidMiner to run the analyses automatically.
While processes are running, the GUI can also be used to interactively control and inspect them.
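The idea of an XML file describing a nested operator tree can be illustrated with a minimal sketch. The XML below is a hypothetical, simplified process description: its element names, attribute names, and operator class names are assumptions chosen for illustration and do not reproduce RapidMiner's actual file schema. The sketch only reads the file and lists the operators depth-first; it does not run any analysis.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified process description. Element, attribute, and
# operator class names are illustrative assumptions, not RapidMiner's schema.
PROCESS_XML = """
<process>
  <operator name="Root" class="Process">
    <operator name="Load" class="ReadCSV"/>
    <operator name="Validation" class="CrossValidation">
      <operator name="Train" class="DecisionTree"/>
      <operator name="Apply" class="ApplyModel"/>
    </operator>
  </operator>
</process>
"""


def walk(element, depth=0):
    """Yield (depth, name, class) for every nested operator, depth-first."""
    for op in element.findall("operator"):
        yield depth, op.get("name"), op.get("class")
        yield from walk(op, depth + 1)


def load_operator_tree(xml_text):
    """Parse the XML text and return the operator tree as a flat list."""
    root = ET.fromstring(xml_text.strip())
    return list(walk(root))


if __name__ == "__main__":
    # Print the tree with two spaces of indentation per nesting level.
    for depth, name, cls in load_operator_tree(PROCESS_XML):
        print("  " * depth + f"{name} ({cls})")
```

Because operators may contain other operators to any depth, a recursive traversal is a natural fit for this kind of description.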
RapidMiner can also be called from other programs and processes, for example from a Perl program. The Java application programming interface (API) provides interfaces for applying operators individually, so there is no need to create an operator tree; this makes it possible to bypass the GUI and control analytical processes directly. Individual RapidMiner functions can also be called directly from the command line.
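The difference between applying operators individually and running them as part of a larger process can be sketched generically. The following is a toy illustration in Python, not the RapidMiner Java API: the operator functions `drop_missing` and `scale` and the helper `run_chain` are hypothetical names invented for this example.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]
Operator = Callable[[List[Row]], List[Row]]


def drop_missing(rows: List[Row]) -> List[Row]:
    """Toy operator: keep only rows that contain no missing (None) values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def scale(attribute: str, factor: float) -> Operator:
    """Toy operator factory: multiply one attribute by a constant factor."""
    def apply(rows: List[Row]) -> List[Row]:
        return [{**r, attribute: r[attribute] * factor} for r in rows]
    return apply


def run_chain(operators: List[Operator], rows: List[Row]) -> List[Row]:
    """Apply a sequence of operators in order, as a simple process would."""
    for op in operators:
        rows = op(rows)
    return rows


if __name__ == "__main__":
    data = [{"x": 1.0}, {"x": None}, {"x": 3.0}]
    # Individual application: call one operator directly, no process needed.
    print(drop_missing(data))
    # Process-style application: several operators run in sequence.
    print(run_chain([drop_missing, scale("x", 2.0)], data))
```

In this sketch each operator is an ordinary callable, so it can be invoked on its own or combined with others without any surrounding process definition.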
RapidMiner is open-source and is offered free of charge as a Community Edition released under the GNU AGPL.[4] There is also an Enterprise Edition offered under a proprietary commercial license, to allow integration into closed-source solutions.[5]
RapidMiner can be extended with additional plugins. The program suite contains around 15 extensions, which broaden its applicability to text mining, image processing, time series processing, web mining, statistics, visualization, semantics, parallelization of computation, automatic process design (PaREn Automatic System Construction Wizard), and other areas.
Several of the extensions can be found directly in the application in an extension manager. The other extensions can be downloaded from their respective developers.