Micro Oledb
Introduction
Overview and design philosophy
Basic components
An industry standard is critical for data mining development, usage, interoperability, and exchange
OLE DB for DM is a natural evolution from OLE DB and OLE DB for OLAP
Building mining applications over relational databases is nontrivial
Goal: ease the burden of developing mining applications in large relational databases
Data Mining: Concepts and Techniques 3
Generate data mining models
Store, maintain, and refresh models as data is updated
Programmatically use the model on other data sets
Browse models
Overview
Define a mining model
    Attributes to be predicted
    Attributes to be used for prediction
    Algorithm used to build the model
Populate a mining model from training data
Predict attributes for new data
Browse a mining model for reporting and visualization
Create a data mining model (DMM) object
    CREATE MINING MODEL [model_name]
Insert training data into the model and train it
    INSERT INTO [model_name]
Use the data mining model
    SELECT relation_name.[id], [model_name].[predict_attr]
    Consult DMM content in order to make predictions and browse statistics obtained by the model
Use DELETE to empty/reset a model
Predictions on datasets: prediction join between a model and a data set (tables)
Deploy a DMM by just writing SQL queries!
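The reset step can be sketched as follows. This is a minimal illustration that assumes a model named [Age Prediction], as in this deck's example. It contrasts emptying a trained model with removing its definition altogether.

```sql
-- Empty the model: the learned content is discarded, but the
-- column definitions stay in place, so the model can be
-- repopulated with a fresh INSERT INTO.
DELETE FROM [Age Prediction]

-- Remove the model entirely, definition included.
DROP MINING MODEL [Age Prediction]
```

After DELETE, the model is back in the state it had right after CREATE MINING MODEL.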
Case for Customer ID 1 with nested tables

CID   Gend   Hair    Age   Age prob
1     Male   Black   35    100%

Product Purchases (Customer ID = 1)
Product Name   Quantity   Product Type
TV             1          Elec
VCR            1          Elec
Ham            6          Food

Car Ownership (Customer ID = 1)
Car   Car Prob
Car   100%
Van   50%

The same case flattened by joining the three tables

CID   Gend   Hair    Age   Age prob   Prod   Quan   Type   Car   Car prob
1     Male   Black   35    100%       TV     1      Elec   Car   100%
1     Male   Black   35    100%       VCR    1      Elec   Car   100%
1     Male   Black   35    100%       Ham    6      Food   Car   100%
1     Male   Black   35    100%       TV     1      Elec   Van   50%
1     Male   Black   35    100%       VCR    1      Elec   Van   50%
1     Male   Black   35    100%       Ham    6      Food   Van   50%
Example
CREATE MINING MODEL [Age Prediction]                          %Name of Model
(
    [Customer ID]        LONG KEY,                            %source column
    [Gender]             TEXT DISCRETE,                       %source column
    [Age]                DOUBLE DISCRETIZED() PREDICT,        %prediction column
    [Product Purchases]  TABLE                                %source column
    (
        [Product Name]   TEXT KEY,                            %source column
        [Quantity]       DOUBLE NORMAL CONTINUOUS,            %source column
        [Product Type]   TEXT DISCRETE RELATED TO [Product Name]   %source column
    )
)
USING [Decision_Trees_101]                                    %Mining algorithm used
January 14, 2014 Data Mining: Concepts and Techniques 14
Column Specifiers
KEY
ATTRIBUTE
RELATION (RELATED TO clause)
QUALIFIER (OF clause)
    PROBABILITY: [0, 1]
    VARIANCE
    SUPPORT
    PROBABILITY_VARIANCE
    ORDER
TABLE
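A minimal sketch of how several of these specifiers might appear together in one model definition. The model name and the [Age Prob] column are hypothetical, and the algorithm name is copied from the deck's example. Comments use `--`, which is an assumption about the provider's dialect; the slides annotate with `%`.

```sql
-- Hypothetical model illustrating column specifiers.
CREATE MINING MODEL [Age Prediction With Confidence]
(
    [Customer ID]  LONG   KEY,                     -- KEY: identifies the case
    [Gender]       TEXT   DISCRETE,                -- ATTRIBUTE used as input
    [Age]          DOUBLE DISCRETIZED() PREDICT,   -- ATTRIBUTE to be predicted
    [Age Prob]     DOUBLE PROBABILITY OF [Age],    -- QUALIFIER: confidence in [Age], in [0, 1]
    [Product Purchases] TABLE                      -- TABLE: nested table column
    (
        [Product Name]  TEXT KEY,
        [Product Type]  TEXT DISCRETE RELATED TO [Product Name]   -- RELATION
    )
)
USING [Decision_Trees_101]
```

The OF clause ties a qualifier column to the attribute it describes, so a training value of 1.0 in [Age Prob] would mark the corresponding [Age] value as certain.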
Attribute Types
DISCRETE
ORDERED
CYCLICAL
CONTINUOUS
DISCRETIZED
SEQUENCE_TIME
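As a rough illustration, the attribute types could be attached to columns as below. Every column name and the model name are hypothetical, and whether a given algorithm accepts each type depends on the provider. SEQUENCE_TIME is left out of the sketch.

```sql
-- Hypothetical model: column names are illustrative only.
CREATE MINING MODEL [Customer Profile]
(
    [Customer ID]      LONG   KEY,
    [Gender]           TEXT   DISCRETE,              -- unordered categories
    [Education Level]  TEXT   ORDERED,               -- categories with a rank order
    [Birth Month]      LONG   CYCLICAL,              -- ordered values that wrap around
    [Income]           DOUBLE CONTINUOUS,            -- numeric measurements
    [Age]              DOUBLE DISCRETIZED() PREDICT  -- numeric values bucketed by the provider
)
USING [Decision_Trees_101]
```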
Populating A DMM
Use INSERT INTO statement
Consuming a case using the data mining model
Use SHAPE statement to create the nested table from the input data
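A sketch of populating the [Age Prediction] model defined in the earlier example. The source tables Customers and Sales and the foreign-key column [CustID] are assumptions made for illustration. SHAPE ... APPEND builds one case per customer, with that customer's purchases as a nested table.

```sql
INSERT INTO [Age Prediction]
(
    [Customer ID], [Gender], [Age],
    -- SKIP ignores the first nested source column ([CustID]),
    -- which is needed only to relate purchases to customers.
    [Product Purchases] (SKIP, [Product Name], [Quantity], [Product Type])
)
SHAPE
    { SELECT [Customer ID], [Gender], [Age]
      FROM Customers ORDER BY [Customer ID] }
APPEND
(
    { SELECT [CustID], [Product Name], [Quantity], [Product Type]
      FROM Sales ORDER BY [CustID] }
    RELATE [Customer ID] TO [CustID]
) AS [Product Purchases]
```

Both queries are ordered by the customer key so that the shaping step can match child rows to their parent case.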
Prediction join
Prediction on dataset D using DMM M
Different from an equi-join
DMM: a truth table
The SELECT statement associated with PREDICTION JOIN specifies the values extracted from the DMM
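A sketch of a prediction join against the [Age Prediction] model. The input tables Customers and Sales and the column [CustID] are assumed names. The new cases carry no [Age] value; the model supplies it.

```sql
SELECT t.[Customer ID], [Age Prediction].[Age]
FROM [Age Prediction]
PREDICTION JOIN
(
    SHAPE
        { SELECT [Customer ID], [Gender]
          FROM Customers ORDER BY [Customer ID] }
    APPEND
    (
        { SELECT [CustID], [Product Name], [Quantity]
          FROM Sales ORDER BY [CustID] }
        RELATE [Customer ID] TO [CustID]
    ) AS [Product Purchases]
) AS t
-- The ON clause maps input columns to model columns. It does not
-- filter rows the way an equi-join does: every input case gets
-- a prediction.
ON  [Age Prediction].[Gender] = t.[Gender]
AND [Age Prediction].[Product Purchases].[Product Name] = t.[Product Purchases].[Product Name]
AND [Age Prediction].[Product Purchases].[Quantity] = t.[Product Purchases].[Quantity]
```

Viewing the model as a truth table, each input case is looked up by its attribute combination and the predicted column is read off the matching entry.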
Browsing DMM
What is in a DMM?
Visualization
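A minimal sketch of browsing a trained model, assuming the [Age Prediction] model from the earlier example. The content is exposed as a rowset that a reporting or visualization tool can query like a table.

```sql
-- Return the model's learned content (for a decision tree,
-- typically one row per node) for inspection or display.
SELECT * FROM [Age Prediction].CONTENT
```

The exact columns returned depend on the provider and the algorithm, so a client would normally inspect the rowset schema before rendering it.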
Concluding Remarks
OLE DB for DM integrates data mining and database systems
A good standard for mining application builders
How can we be involved?
    Provide association/sequential pattern mining modules for OLE DB for DM?
    Design more concrete language primitives?
References
http://www.microsoft.com/data.oledb/dm.html