Oracle Data Mining
Oracle Data Mining (ODM) is an option of Oracle Database Enterprise Edition. It contains several data
mining and data analysis algorithms for classification, prediction, regression, associations, feature selection,
anomaly detection, feature extraction, and specialized analytics. It provides means for the creation,
management and operational deployment of data mining models inside the database environment.
In data mining, using a model to derive predictions or descriptions of behavior that is yet to occur is
called "scoring". In traditional analytic workbenches, either a model built in the analytic engine has to be
deployed in a mission-critical system to score new data, or the data has to be moved from relational tables
into the analytical workbench, and most workbenches offer proprietary scoring interfaces. ODM simplifies
model deployment by offering Oracle SQL functions that score data stored right in the database. The user
or application developer can thus leverage the full power of Oracle SQL, both to pipeline and manipulate
the results over several levels and to parallelize and partition data access for performance.
Models can be created and managed by one of several means. Oracle Data Miner provides a graphical user
interface that steps the user through the process of creating, testing, and applying models (e.g. along the
lines of the CRISP-DM methodology). Application- and tools-developers can embed predictive and
descriptive mining capabilities using PL/SQL or Java APIs. Business analysts can quickly experiment with,
or demonstrate the power of, predictive analytics using Oracle Spreadsheet Add-In for Predictive Analytics,
a dedicated Microsoft Excel adaptor interface.
ODM offers a choice of well-known machine learning approaches: Decision Trees, Naive Bayes, Support
Vector Machines, and Generalized Linear Models (GLM) for predictive mining; and Association Rules,
K-means, Orthogonal Partitioning Clustering[1][2], and Non-negative Matrix Factorization for descriptive
mining. A minimum description length (MDL) based technique to
grade the relative importance of input mining attributes for a given problem is also provided. Most Oracle
Data Mining functions also allow text mining by accepting text (unstructured data) attributes as input. Users
do not need to configure text-mining options - Oracle Text handles this behind the scenes.
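As a hedged sketch of the attribute-importance technique mentioned above, the following builds an attribute importance model and lists the attributes by rank. The table and column names are borrowed from the CREATE_MODEL example in this article; the model name credit_risk_ai is hypothetical. The sketch assumes the 11g-era DBMS_DATA_MINING package, in which GET_MODEL_DETAILS_AI returns the ranking; later releases may expose the same information differently.

```sql
-- Sketch only: credit_risk_ai is a hypothetical model name; the table and
-- column names are borrowed from the CREATE_MODEL example in this article.
BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'credit_risk_ai',
    mining_function     => DBMS_DATA_MINING.ATTRIBUTE_IMPORTANCE,
    data_table_name     => 'credit_card_data',
    case_id_column_name => 'customer_id',
    target_column_name  => 'credit_risk');
END;
/

-- List the input attributes from most to least important for the target.
SELECT attribute_name, importance_value, rank
FROM   TABLE(DBMS_DATA_MINING.GET_MODEL_DETAILS_AI('credit_risk_ai'))
ORDER  BY rank;
```

Attributes ranked near the bottom are candidates for exclusion before building a predictive model on the same data.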
History
Oracle Data Mining was first introduced in 2002, and its releases are named after the corresponding
Oracle Database release.
Oracle Data Mining is a logical successor of the Darwin data mining toolset developed by Thinking
Machines Corporation in the mid-1990s and later distributed by Oracle after its acquisition of Thinking
Machines in 1999. However, the product itself is a complete redesign and rewrite from the ground up: while
Darwin was a classic GUI-based analytical workbench, ODM offers a data mining
development/deployment platform integrated into the Oracle database, along with the Oracle Data Miner
GUI.
The Oracle Data Miner 11gR2 New Workflow GUI was previewed at Oracle Open World 2009. An
updated Oracle Data Miner GUI was released in 2012. It is free and is available as an extension to Oracle
SQL Developer 3.1.
Functionality
As of release 11gR1 Oracle Data Mining contains the following data mining functions:
Oracle Data Mining distinguishes numerical, categorical, and unstructured (text) attributes. The product
also provides utilities for data preparation steps prior to model building, such as outlier treatment,
normalization, and discretization (binning, that is, grouping values into ranges).
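To make the preparation steps concrete, here is a minimal sketch of equal-width binning and min-max normalization written in plain Oracle SQL. It does not use the product's own preparation utilities; it only illustrates what the two transformations compute. The column credit_limit is a hypothetical numeric attribute, and the choice of 5 bins is arbitrary.

```sql
-- Sketch only: credit_limit is a hypothetical numeric column of credit_card_data.
SELECT customer_id,
       -- Equal-width binning into 5 buckets between the observed minimum and
       -- maximum; rows equal to the maximum fall into overflow bucket 6.
       WIDTH_BUCKET(credit_limit, min_limit, max_limit, 5) AS credit_limit_bin,
       -- Min-max normalization to the range [0, 1]; NULLIF avoids division
       -- by zero when the column holds a single distinct value.
       (credit_limit - min_limit) / NULLIF(max_limit - min_limit, 0) AS credit_limit_norm
FROM  (SELECT customer_id,
              credit_limit,
              MIN(credit_limit) OVER () AS min_limit,
              MAX(credit_limit) OVER () AS max_limit
       FROM   credit_card_data);
```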
From version 11.2 of the Oracle database, Oracle Data Miner integrates with Oracle SQL Developer.[3]
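A typical PL/SQL call to build a classification model with the DBMS_DATA_MINING package:
    BEGIN
      DBMS_DATA_MINING.CREATE_MODEL (
        model_name          => 'credit_risk_model',
        mining_function     => DBMS_DATA_MINING.CLASSIFICATION,  -- type of model to build
        data_table_name     => 'credit_card_data',               -- training data
        case_id_column_name => 'customer_id',                    -- unique key per case
        target_column_name  => 'credit_risk',                    -- attribute to predict
        settings_table_name => 'credit_risk_model_settings');    -- algorithm settings
    END;
    /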
where 'credit_risk_model' is the model name, built for the express purpose of classifying future customers'
'credit_risk', based on training data provided in the table 'credit_card_data', each case distinguished by a
unique 'customer_id', with the rest of the model parameters specified through the table
'credit_risk_model_settings'.
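The settings table named in the call above has to exist before the model is built. As a hedged sketch, it could be created and populated as follows, here selecting a decision tree. The two-column layout (setting_name, setting_value) follows the convention used by DBMS_DATA_MINING; the column sizes shown are an assumption and may differ between releases.

```sql
-- Sketch only: column sizes are an assumption and may vary by release.
CREATE TABLE credit_risk_model_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000)
);

-- Package constants can only be referenced from PL/SQL, hence the block.
BEGIN
  INSERT INTO credit_risk_model_settings (setting_name, setting_value)
  VALUES (DBMS_DATA_MINING.ALGO_NAME, DBMS_DATA_MINING.ALGO_DECISION_TREE);
  COMMIT;
END;
/
```

If the algorithm setting is omitted, the package falls back to a default algorithm for the chosen mining function.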
Oracle Data Mining also supports a Java API consistent with the Java Data Mining (JDM) standard for data
mining (JSR-73) for enabling integration with web and Java EE applications and to facilitate portability
across platforms.
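A model can then be applied directly in a SQL query, for example to list high-value customers predicted to
be low risk:
    SELECT customer_name
    FROM   credit_card_data
    WHERE  PREDICTION (credit_risk_model USING *) = 'LOW'
    AND    customer_value = 'HIGH';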
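Because scoring functions are ordinary SQL expressions, their results can be fed through further query levels. The following sketch, under the same assumptions as the query above, scores high-value customers in an inner query and then aggregates the confident predictions by class. The 0.8 probability threshold is an arbitrary illustrative value.

```sql
-- Sketch only: the 0.8 threshold is arbitrary and chosen for illustration.
SELECT predicted_risk,
       COUNT(*)                   AS customers,
       ROUND(AVG(probability), 3) AS avg_probability
FROM  (SELECT PREDICTION(credit_risk_model USING *)             AS predicted_risk,
              -- probability of the predicted (most likely) class
              PREDICTION_PROBABILITY(credit_risk_model USING *) AS probability
       FROM   credit_card_data
       WHERE  customer_value = 'HIGH')
WHERE  probability >= 0.8
GROUP  BY predicted_risk
ORDER  BY customers DESC;
```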
PMML
In Release 11gR2 (11.2.0.2), ODM supports the import of externally created PMML for some of the data
mining models. PMML is an XML-based standard for representing data mining models.
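A hedged sketch of such an import, assuming the overload of DBMS_DATA_MINING.IMPORT_MODEL that accepts a PMML document as an XMLType value. PMML_DIR (an Oracle directory object), model.xml, and imported_pmml_model are hypothetical names, and the file is assumed to be UTF-8 encoded.

```sql
-- Sketch only: PMML_DIR, model.xml and imported_pmml_model are hypothetical.
BEGIN
  DBMS_DATA_MINING.IMPORT_MODEL(
    'imported_pmml_model',
    -- read the PMML file from the directory object as an XMLType value
    XMLTYPE(BFILENAME('PMML_DIR', 'model.xml'), NLS_CHARSET_ID('AL32UTF8')));
END;
/
```

Once imported, the model would be scored with the same SQL functions as a natively built model.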
See also
Oracle LogMiner - in contrast to generic data mining, targets the extraction of information
from the internal logs of an Oracle database
References
1. US patent 7174344 (https://worldwide.espacenet.com/textdoc?DB=EPODOC&IDX=US7174344),
Campos, Marcos M. & Milenova, Boriana L., "Orthogonal partitioning clustering",
issued 2007-02-06, assigned to Oracle International Corporation.
2. Milenova, Boriana L. & Campos, Marcos M. (2002). "O-Cluster: Scalable Clustering of Large
High Dimensional Data Sets" (http://dl.acm.org/citation.cfm?id=844729). ICDM '02:
Proceedings of the 2002 IEEE International Conference on Data Mining, pages 290-297.
ISBN 0-7695-1754-4.
3. "Oracle Data Miner"
(http://www.oracle.com/technetwork/database/options/advanced-analytics/odm/dataminerworkflow-168677.html?msgid=3-10214982236).
Oracle Technology Network. Oracle Corporation. 2014. Retrieved 2014-07-17. "The Oracle Data Miner is an
Oracle SQL Developer extension that enables data analysts to work directly with data inside
the database, explore the data graphically, build and evaluate multiple data mining models,
apply Oracle Data Mining models to new data and deploy Oracle Data Mining's predictions
and insights throughout the enterprise. [...] Oracle Data Miner is comprised of three
components: Oracle Database 12c or Oracle Database 11g Release 2; SQL Developer
(client), which bundles the Oracle Data Miner work flow GUI; Data Miner Repository -
installed in the Oracle Database"
External links
Oracle Data Mining at Oracle Technology Network (http://www.oracle.com/technology/products/bi/odm/index.html).
Oracle Data Mining Blog (http://blogs.oracle.com/datamining/).
Oracle Database 11g at Oracle Technology Network (http://www.oracle.com/technology/products/database/oracle11g/index.html).
Oracle Data Mining and Analytics Blog (http://oracledmt.blogspot.com/).
Oracle Wiki for Data Mining (http://wiki.oracle.com/page/Oracle+Data+Mining).
Oracle Data Mining RSS Feed (http://www.oracle.com/technology/syndication/rss_otn_odm.xml).
Oracle Data Mining related blog by Brendan Tierney (Oracle ACE Director) (http://www.oralytics.com).
Oracle Data Mining Examples (on Panoply Technology) (https://panoplytechen.wordpress.com/tag/odm/).