Cti Oracle Data Mining
Aljon E. Antiola
Technological Institute of the Philippines
College of Information Technology Education
A Research Requirement in CTI IT22FB5
INTRODUCTION
Too much data and not enough information: this is a problem facing many businesses and industries. Most businesses have an enormous amount of data, with a great deal of information hiding within it, but "hiding" is usually exactly what it is doing: so much data exists that it overwhelms traditional methods of data analysis. Data mining provides a way to get at the information buried in the data. Data mining creates models to find hidden patterns in large, complex collections of data, patterns that sometimes elude traditional statistical approaches to analysis because of the large number of attributes, the complexity of the patterns, or the difficulty of performing the analysis.
Making the entire data mining process work in a reproducible and reliable way is challenging; it may involve automation and transfers across servers, data repositories, applications, and tools. For example, some data mining tools require that data be exported from the corporate database and converted to the tool's own format, and the data mining results must then be imported back into the database. Removing or reducing these obstacles allows data mining to be used more frequently, to extract more valuable information, and in many cases to make a significant impact on the bottom line of an enterprise. Mining data inside the database eliminates the data movement required by tools that do not operate in the database and makes it much easier to mine up-to-date data. The less data is moved, the less time the entire data mining process takes. Data movement can also make data insecure; if data never leaves the database, database security protects it. In summary, data mining in the database eliminates data movement, shortens the overall process, keeps the data under database security, and makes it easier to mine up-to-date data.
BACKGROUND
Data Mining in Database
Data mining projects usually require a significant amount of data collection and data processing before and after model building. Data tables are created by combining many different types and sources of information. Real-world data is often dirty, that is, it includes wrong or missing values, so it must often be cleaned before it can be used. Data is filtered, normalized, sampled, transformed in various ways, and eventually used as input to data mining algorithms. Up to 80% of the effort in a data mining project is often devoted to data preparation. When the data is stored as a table in a database, data preparation can be performed using database facilities. Data mining models have to be built, tested, validated, managed, and deployed in their appropriate application domain environments. The data mining results may need to be post-processed as part of domain-specific computations (for example, calculating estimated risks, expected utilities, and response probabilities) and then stored in permanent databases or data warehouses.
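The idea of preparing data with database facilities, rather than exporting it first, can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for a database, and the table, view, and column names (raw_customers, clean_customers, age, income) are hypothetical. It does not use Oracle Database or Oracle Data Mining.

```python
import sqlite3


def prepare_in_database(rows):
    """Load raw rows into an in-memory table and clean them with SQL only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_customers (customer_id INTEGER, age REAL, income REAL)"
    )
    conn.executemany("INSERT INTO raw_customers VALUES (?, ?, ?)", rows)

    # Filter out dirty rows (missing or implausible values) inside the
    # database, so the raw data never has to be exported to another tool.
    conn.execute(
        """
        CREATE VIEW clean_customers AS
        SELECT customer_id, age, income
        FROM raw_customers
        WHERE age IS NOT NULL AND age BETWEEN 0 AND 120
          AND income IS NOT NULL AND income >= 0
        """
    )
    cleaned = conn.execute(
        "SELECT customer_id, age, income FROM clean_customers ORDER BY customer_id"
    ).fetchall()
    conn.close()
    return cleaned


# Hypothetical sample: rows 2, 3 and 4 contain a missing or implausible value.
raw = [
    (1, 34, 52000.0),
    (2, None, 61000.0),
    (3, 250, 48000.0),
    (4, 45, -10.0),
    (5, 29, 39000.0),
]
print(prepare_in_database(raw))
```

Only customers 1 and 5 survive the filter. A mining algorithm running in the same database could read the cleaned view directly.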
interface and explore their data to find patterns, relationships, and hidden insights. Oracle Data Mining provides a collection of in-database data mining algorithms that solve a wide range of business problems. Anyone who can access data stored in an Oracle Database can access Oracle Data Mining results (predictions, recommendations, and discoveries) using Oracle Business Intelligence Solutions.
HISTORY
Oracle Data Mining was first introduced in 2002, and its releases are named after the corresponding Oracle Database release:
Oracle Data Mining 9iR2 (9.2.0.1.0 - May 2002)
Oracle Data Mining 10gR1 (10.1.0.2.0 - February 2004)
Oracle Data Mining 10gR2 (10.2.0.1.0 - July 2005)
Oracle Data Mining 11gR1 (11.1 - September 2007)
Oracle Data Mining 11gR2 (11.2 - September 2009)
Oracle Data Mining is a logical successor of the Darwin data mining toolset developed by Thinking Machines Corporation in the mid-1990s and later distributed by Oracle after its acquisition of Thinking Machines in 1999. However, the product itself is a complete redesign and rewrite from the ground up: while Darwin was a classic GUI-based analytical workbench, ODM offers a data mining development and deployment platform integrated into the Oracle database, along with the GUI.
FUNCTIONALITY
As of release 11gR1, Oracle Data Mining contains the following data mining functions:
Data transformation and model analysis: data sampling, binning, discretization, and other data transformations; model exploration, evaluation, and analysis.
Feature selection (Attribute Importance): Minimum description length (MDL).
Classification: Naive Bayes (NB); Generalized linear model (GLM) for logistic regression; Support Vector Machine (SVM); Decision Trees (DT).
Anomaly detection: One-class Support Vector Machine (SVM).
Regression: Support Vector Machine (SVM); Generalized linear model (GLM) for multiple regression.
Clustering: Enhanced k-means (EKM); Orthogonal Partitioning Clustering (O-Cluster).
Association rule learning: Itemsets and association rules (AM).
Feature extraction: Non-negative matrix factorization (NMF).
Text and spatial mining: combined text and non-text columns of input data; spatial/GIS data.
Most Oracle Data Mining functions accept one relational table or view as input. Flat data can be combined with transactional data through nested columns, enabling mining of data that involves one-to-many relationships (e.g. a star schema). The full functionality of SQL can be used when preparing data for data mining, including operations on dates and spatial data. Oracle Data Mining distinguishes numerical, categorical, and unstructured (text) attributes. The product also provides utilities for data preparation steps prior to model building, such as outlier treatment, discretization, normalization, and binning (sorting values into groups, in general terms).
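Two of the preparation steps named above, normalization and binning, can be illustrated with a short sketch. The function names are hypothetical, and the choice of min-max scaling and equal-width bins is an assumption made for the example. This is not how Oracle Data Mining's own utilities are implemented.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1 using equal-width ranges."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it back in.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [20, 30, 40, 60]
print(min_max_normalize(ages))
print(equal_width_bins(ages, 4))
```

Normalization keeps attributes with large ranges from dominating those with small ranges. Binning turns a numerical attribute into a small number of categories.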
model parameters. The user interface also allows the automated generation of Java and/or SQL code associated with the data mining activities. The Java Code Generator is an extension to Oracle JDeveloper. There is also an independent interface, the Spreadsheet Add-In for Predictive Analytics, which enables access to the Oracle Data Mining Predictive Analytics PL/SQL package from Microsoft Excel.
BEGIN
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'credit_risk_model',
    mining_function     => DBMS_DATA_MINING.CLASSIFICATION,
    data_table_name     => 'credit_card_data',
    case_id_column_name => 'customer_id',
    target_column_name  => 'credit_risk',
    settings_table_name => 'credit_risk_model_settings');
END;
where 'credit_risk_model' is the model name, built for the express purpose of classifying future customers' 'credit_risk', based on training data provided in the table 'credit_card_data', with each case distinguished by a unique 'customer_id' and the rest of the model parameters specified through the table 'credit_risk_model_settings'.
Oracle Data Miner workflows capture, document, and automate the in-database predictive analytics process. Oracle Data Mining also supports a Java API consistent with the Java Data Mining (JDM) standard for data mining (JSR-73), which enables integration with web and Java EE applications and facilitates portability across platforms.
column feature selection. The new 11g feature PROFILE finds customer segments and their profiles, given a target attribute. These operations can be used as part of an operational pipeline providing actionable results or displayed for interpretation by end users.
PMML
In Release 11gR2 (11.2.0.2), ODM supports the import of externally-created PMML for some of the data mining models. PMML is an XML-based standard for representing data mining models.
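Because PMML is plain XML, a model description can be read with any XML parser. The sketch below parses a small hand-written PMML document with Python's standard library. The document is a toy example, not output from ODM or any real tool: the model name, field names, and coefficients are all hypothetical. The source does not say which model types ODM can import, so the regression model here is only an assumption for illustration.

```python
import xml.etree.ElementTree as ET

# A hand-written toy PMML 4.1 document describing a one-variable regression.
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="hypothetical credit risk score"/>
  <DataDictionary numberOfFields="2">
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="risk_score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="toy_model" functionName="regression">
    <MiningSchema>
      <MiningField name="income"/>
      <MiningField name="risk_score" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="income" coefficient="0.001"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# PMML elements live in a versioned XML namespace.
NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def summarize_pmml(text):
    """Extract the field names and regression coefficients from a PMML string."""
    root = ET.fromstring(text)
    fields = [
        f.get("name")
        for f in root.findall("pmml:DataDictionary/pmml:DataField", NS)
    ]
    model = root.find("pmml:RegressionModel", NS)
    table = model.find("pmml:RegressionTable", NS)
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return {
        "version": root.get("version"),
        "fields": fields,
        "function": model.get("functionName"),
        "intercept": float(table.get("intercept")),
        "coefficients": coefficients,
    }


print(summarize_pmml(PMML_DOC))
```

The same structure (a data dictionary followed by a model element) is what lets a model built in one tool be exchanged with another.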