CS201 17 Data Mining
Data Mining
09/15/15
Objectives
Purpose of online analytical processing (OLAP)
OLAP applications.
OLAP extensions to SQL.
Concepts associated with data mining.
Main data mining operations including
data warehousing.
Acknowledgments
These slides have been adapted from Thomas
Introducing OLAP
The dynamic synthesis, analysis, and
Introducing OLAP
Enables users to gain a deeper
Introducing OLAP
Can easily answer who? and what?
OLAP Applications
Just-In-Time (JIT) information is
Examples of OLAP Applications in Various Functional Areas
OLAP Applications
Although OLAP applications are found in widely divergent functional areas, all have the following key features:
multidimensional views of data;
support for complex calculations;
time intelligence.
Representing Multi-Dimensional Data
Example of two-dimensional query.
What is the total revenue generated by property
Multi-Dimensional Data as Three-Field Table versus Two-Dimensional Matrix
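The contrast between the two representations can be sketched in a few lines of Python. This is only an illustration: the dimension names (property type, city) and all revenue figures are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical three-field table: (property type, city, revenue).
rows = [
    ("Flat", "Glasgow", 100),
    ("Flat", "London", 150),
    ("House", "Glasgow", 200),
    ("House", "London", 250),
    ("Flat", "Glasgow", 50),
]

def to_matrix(records):
    """Pivot (row_key, col_key, value) records into a nested dict, summing duplicates."""
    matrix = defaultdict(lambda: defaultdict(int))
    for row_key, col_key, value in records:
        matrix[row_key][col_key] += value
    return {r: dict(cols) for r, cols in matrix.items()}

matrix = to_matrix(rows)
cities = sorted({city for _, city, _ in rows})

# Print the same data as a two-dimensional matrix: one row per type, one column per city.
print("Type".ljust(8) + "".join(c.rjust(10) for c in cities))
for prop_type in sorted(matrix):
    cells = "".join(str(matrix[prop_type].get(c, 0)).rjust(10) for c in cities)
    print(prop_type.ljust(8) + cells)
```

In the matrix form each total is addressed directly by its two dimension values, whereas the table form has to be scanned and grouped to answer the same question.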
Representing Multi-Dimensional Data
Example of three-dimensional query.
What is the total revenue generated
Multi-Dimensional Data as Four-Field Table versus Three-Dimensional Cube
Representing Multi-Dimensional Data
Cube represents data as cells in an array.
Relational table only represents multi-
Multi-Dimensional OLAP Servers
Use multi-dimensional structures to store data and relationships between data.
dimensions.
Multi-Dimensional OLAP Servers
A cube supports matrix arithmetic.
Multi-dimensional query response
Multi-Dimensional OLAP Servers
However, the majority of multi-dimensional queries use summarized, high-level data.
Solution is to pre-aggregate
Multi-Dimensional OLAP Servers
Predefined hierarchy allows logical pre-aggregation and, conversely, allows for a logical drill-down.
Supports common analytical operations:
Consolidation.
Drill-down.
Slicing and dicing.
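The three operations can be sketched over a tiny in-memory fact list. This is a minimal illustration, not how an OLAP server is implemented. The fact records, the city-to-region hierarchy, and every function name are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical fact records: (city, quarter, property_type, revenue).
FACTS = [
    ("Glasgow", "Q1", "Flat", 100),
    ("Glasgow", "Q2", "Flat", 120),
    ("Aberdeen", "Q1", "House", 80),
    ("London", "Q1", "Flat", 300),
    ("London", "Q2", "House", 250),
]
# Assumed predefined hierarchy on the location dimension: city -> region.
REGION_OF = {"Glasgow": "Scotland", "Aberdeen": "Scotland", "London": "England"}

def consolidate(facts):
    """Consolidation (roll-up): aggregate revenue from city level to region level."""
    totals = defaultdict(int)
    for city, _quarter, _ptype, revenue in facts:
        totals[REGION_OF[city]] += revenue
    return dict(totals)

def drill_down(facts, region):
    """Drill-down: show the city-level detail behind one region total."""
    totals = defaultdict(int)
    for city, _quarter, _ptype, revenue in facts:
        if REGION_OF[city] == region:
            totals[city] += revenue
    return dict(totals)

def slice_cube(facts, quarter):
    """Slice: fix one dimension (here, time) to a single value."""
    return [f for f in facts if f[1] == quarter]

def dice_cube(facts, cities, ptypes):
    """Dice: keep a sub-cube by restricting several dimensions at once."""
    return [f for f in facts if f[0] in cities and f[2] in ptypes]

print(consolidate(FACTS))
print(drill_down(FACTS, "Scotland"))
print(slice_cube(FACTS, "Q1"))
print(dice_cube(FACTS, {"Glasgow", "London"}, {"Flat"}))
```

The same hierarchy drives both directions: consolidation sums along it, and drill-down expands one summed value back into its children.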
Multi-Dimensional OLAP Servers
Consolidation - aggregation of data such as simple roll-ups or complex expressions involving inter-related data.
Multi-Dimensional OLAP Servers
Ability to omit empty or repetitive cells can greatly reduce the size of the cube and the amount of processing.
Allows analysis of exceptionally large amounts of data.
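One simple way to picture omitting empty cells is a cube that stores only the populated coordinates. The sketch below is an assumption-laden illustration: the class name, the cube shape, and the values are hypothetical, and it handles empty cells only, not repetitive ones.

```python
# Hypothetical 3-D cube: 4 property types x 5 cities x 12 months = 240 possible cells.
DIMENSIONS = (4, 5, 12)

class SparseCube:
    """Stores only non-empty cells, keyed by their coordinates."""

    def __init__(self, shape):
        self.shape = shape
        self.cells = {}

    def set(self, coords, value):
        if value == 0:
            self.cells.pop(coords, None)  # empty cells are not stored
        else:
            self.cells[coords] = value

    def get(self, coords):
        return self.cells.get(coords, 0)  # a missing cell reads as empty

    def total(self):
        # Aggregation touches only the stored cells, not the whole array.
        return sum(self.cells.values())

    def density(self):
        capacity = 1
        for size in self.shape:
            capacity *= size
        return len(self.cells) / capacity

cube = SparseCube(DIMENSIONS)
cube.set((0, 1, 3), 500)
cube.set((2, 4, 11), 750)
cube.set((1, 0, 0), 0)  # empty cell: nothing is stored
print(cube.total(), len(cube.cells), cube.density())
```

Both storage and aggregation cost scale with the number of populated cells rather than with the full size of the cube.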
Multi-Dimensional OLAP Servers
In summary, pre-aggregation, dimensional hierarchy, and sparse data management can significantly reduce the size of the cube and the need to calculate values on-the-fly.
Removes need for multi-table joins and
OLAP Extensions to SQL
SQL promoted as easy to learn, non-procedural, free-format, DBMS-
OLAP Extensions to SQL
Many database vendors including IBM, Oracle, Informix, and Red Brick Systems have already implemented portions of specifications in their DBMSs.
Red Brick Systems was first to
OLAP Extensions to SQL - RISQL
Designed for business analysts.
Set of extensions that augments SQL
SELECT month, monthlySales,
  MOVINGAVG(monthlySales) AS 3MonthMovingAvg,
  MOVINGSUM(monthlySales) AS 3MonthMovingSum
FROM BranchSales
WHERE branchNo = 'B003';
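To make the intent of the query concrete, here is a minimal Python sketch of a trailing moving average and moving sum. The three-row window is an assumption taken from the column aliases. The handling of the first rows (None until a full window exists) and the sales figures are also assumptions, not documented RISQL behavior.

```python
def moving_window(values, size=3):
    """Return (moving_avg, moving_sum) lists over a trailing window of `size` rows.

    Assumption: rows that do not yet have `size` values behind them yield None.
    """
    avgs, sums = [], []
    for i in range(len(values)):
        if i + 1 < size:
            avgs.append(None)
            sums.append(None)
        else:
            window = values[i + 1 - size : i + 1]
            total = sum(window)
            sums.append(total)
            avgs.append(total / size)
    return avgs, sums

monthly_sales = [210, 350, 400, 420, 440]  # hypothetical figures for one branch
avgs, sums = moving_window(monthly_sales)
for month, (sale, avg, total) in enumerate(zip(monthly_sales, avgs, sums), start=1):
    print(month, sale, avg, total)
```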
Data Mining
The process of extracting valid, previously unknown, comprehensible,
Data Mining
Reveals information that is hidden and unexpected, as little value in finding patterns
Data Mining
Starts by developing an optimal representation of structure of sample data,
Examples of Applications of Data Mining
Retail / Marketing
Identifying buying patterns of customers.
Finding associations among customer demographic characteristics.
Predicting response to mailing campaigns.
Market basket analysis.
Examples of Applications of Data Mining
Banking
Detecting patterns of fraudulent credit card use.
Identifying loyal customers.
Examples of Applications of Data Mining
Insurance
Claims analysis.
Predicting which customers will buy new policies.
Medicine
Characterizing patient behavior to predict surgery visits.
Identifying successful medical therapies for different illnesses.
Data Mining Operations
database segmentation.
Data Mining Techniques
Techniques are specific implementations of the data mining operations.
and weaknesses.
Data Mining Techniques
Criteria for selection of tool include:
Suitability for certain input data types.
Transparency of the mining output.
Tolerance of missing variable values.
Level of accuracy possible.
Ability to handle large volumes of data.
Data Mining Operations and Associated Techniques
Predictive Modeling
Predictive Modeling
Model is developed using a supervised learning approach, which has two phases: training and testing.
Training builds a model using a large
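The two phases can be sketched as follows. The classifier used here is a 1-nearest-neighbour rule, chosen only because it is short. It is a stand-in and not a technique named in these slides. The records, the feature meanings, and the hold-out rule are all hypothetical.

```python
# Hypothetical labelled records: (age, income in thousands) -> known class.
DATA = [
    ((22, 15), "rent"), ((25, 18), "rent"), ((28, 22), "rent"), ((30, 24), "rent"),
    ((45, 60), "buy"), ((50, 65), "buy"), ((55, 70), "buy"), ((48, 58), "buy"),
]

def split(records, test_every=4):
    """Hold out every `test_every`-th record for testing; the rest is training data."""
    train = [r for i, r in enumerate(records) if (i + 1) % test_every != 0]
    test = [r for i, r in enumerate(records) if (i + 1) % test_every == 0]
    return train, test

def predict(train, features):
    """1-nearest-neighbour: return the class of the closest training record."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(train, key=lambda rec: sq_dist(rec[0], features))
    return nearest[1]

def accuracy(train, test):
    """Testing phase: fraction of held-out records whose known class is predicted."""
    hits = sum(1 for features, label in test if predict(train, features) == label)
    return hits / len(test)

train, test = split(DATA)
print(len(train), len(test), accuracy(train, test))
```

The key point is that the test records are never seen during training, so the accuracy figure estimates how the model behaves on new data.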
Predictive Modeling
Applications of predictive modeling include customer retention management, credit approval, cross selling, and direct marketing.
Two techniques associated with
Predictive Modeling - Classification
Used to establish a specific predetermined class for each record in a database from a finite set of possible class values.
Two specializations of classification:
Example of Classification using Tree Induction
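The original figure is not reproduced here, so the following is an independent, simplified sketch of tree induction rather than the slide's own example. It greedily picks the threshold split with the fewest misclassifications. The records, feature meanings, and depth limit are hypothetical, and real tools typically use criteria such as information gain.

```python
from collections import Counter

# Hypothetical records: (years_renting, age) -> whether the customer went on to buy.
RECORDS = [
    ((1, 23), "rent"), ((1, 40), "rent"), ((2, 31), "rent"),
    ((3, 22), "rent"), ((3, 30), "buy"), ((4, 45), "buy"), ((5, 28), "buy"),
]

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def induce(records, depth=2):
    """Recursively choose the (feature, threshold) split with the fewest errors."""
    labels = [label for _, label in records]
    if depth == 0 or len(set(labels)) == 1:
        return majority(labels)  # leaf node: a class value
    best = None
    for f in range(len(records[0][0])):
        for threshold in sorted({feats[f] for feats, _ in records}):
            left = [r for r in records if r[0][f] <= threshold]
            right = [r for r in records if r[0][f] > threshold]
            if not left or not right:
                continue
            errors = sum(
                sum(1 for _, lab in part if lab != majority([l for _, l in part]))
                for part in (left, right)
            )
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left, right)
    if best is None:
        return majority(labels)
    _, f, threshold, left, right = best
    # Internal node: (feature index, threshold, subtree if <=, subtree if >).
    return (f, threshold, induce(left, depth - 1), induce(right, depth - 1))

def classify(tree, feats):
    """Walk from the root to a leaf, following the threshold tests."""
    while isinstance(tree, tuple):
        f, threshold, low, high = tree
        tree = low if feats[f] <= threshold else high
    return tree

tree = induce(RECORDS)
print(tree)
print(classify(tree, (4, 35)), classify(tree, (1, 50)))
```

Because every internal node is a readable test on one attribute, the resulting model is easy to inspect, which relates to the transparency criterion listed earlier.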
Example of Classification using Neural Induction
Predictive Modeling - Value Prediction
Used to estimate a continuous numeric value that is associated with a database record.
Predictive Modeling - Value Prediction
Linear regression attempts to fit a straight line through a plot of the data, such that the line is the best representation of the average of all observations at that point in the plot.
Problem is that the technique only works
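As a concrete illustration of fitting a straight line, the sketch below uses ordinary least squares with the standard closed-form slope and intercept. The observations (floor area against monthly rent) are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: floor area (square metres) against monthly rent.
areas = [40, 55, 70, 85, 100]
rents = [400, 520, 610, 730, 840]

slope, intercept = fit_line(areas, rents)
predicted = slope * 60 + intercept  # estimated rent for a 60 square metre property
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

The fitted line always passes through the point of means, which is why the intercept is computed from `mean_x` and `mean_y`.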
Predictive Modeling - Value Prediction
Although nonlinear regression avoids the main problems of linear regression, it is still not flexible enough to handle all possible shapes of the data plot.
Statistical measurements are fine for
Predictive Modeling - Value Prediction
Data mining requires statistical methods that can accommodate nonlinearity, outliers, and non-numeric data.
Applications of value prediction
Database Segmentation
Aim is to partition a database into an unknown number of segments, or clusters, of similar records.
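A minimal sketch of segmentation where the number of segments is not fixed in advance is shown below. It uses simple leader clustering, a stand-in chosen for brevity and not a technique named in these slides. The customer records and the `radius` parameter are hypothetical.

```python
import math

def segment(records, radius):
    """Leader clustering: a record joins the first segment whose leader lies within
    `radius`; otherwise it starts a new segment. The number of segments emerges
    from the data instead of being specified up front."""
    segments = []  # each segment is a list of records; its first record is the leader
    for record in records:
        for seg in segments:
            if math.dist(seg[0], record) <= radius:
                seg.append(record)
                break
        else:
            segments.append([record])
    return segments

# Hypothetical customer records: (age, monthly spend).
customers = [(21, 300), (23, 310), (45, 900), (47, 880), (22, 295), (70, 150)]

segments = segment(customers, radius=50)
for i, seg in enumerate(segments, start=1):
    print(i, seg)
```

The result depends on the order of the records and on the scale of each feature, so in practice the features would usually be normalized first.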
Database Segmentation
Less precise than other operations thus less sensitive to redundant and irrelevant features.
Sensitivity can be reduced by ignoring a
Example of Database Segmentation using a Scatterplot
between records.
Presentation of the resulting segments for analysis.
Link Analysis
Link Analysis - Sequential Pattern Discovery
Finds patterns between events such that the presence of one set of items is followed by another set of items in a database of events over a period of time.
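The idea of "one item followed by another over time" can be sketched by measuring how many customers show a given ordered pair of events. This is a simplified illustration for single items, not a full sequential pattern algorithm. The event log and item names are hypothetical.

```python
from collections import defaultdict

# Hypothetical event log: (customer, day, item).
EVENTS = [
    ("c1", 1, "rent flat"), ("c1", 30, "buy insurance"),
    ("c2", 2, "rent flat"), ("c2", 40, "buy insurance"),
    ("c3", 5, "buy insurance"), ("c3", 9, "rent flat"),
    ("c4", 3, "rent flat"),
]

def sequence_support(events, first, then):
    """Fraction of customers for whom `first` occurs and `then` occurs on a later day."""
    by_customer = defaultdict(list)
    for customer, day, item in events:
        by_customer[customer].append((day, item))
    matches = 0
    for history in by_customer.values():
        first_days = [d for d, item in history if item == first]
        then_days = [d for d, item in history if item == then]
        # The pattern holds if the earliest `first` precedes the latest `then`.
        if first_days and then_days and min(first_days) < max(then_days):
            matches += 1
    return matches / len(by_customer)

print(sequence_support(EVENTS, "rent flat", "buy insurance"))
print(sequence_support(EVENTS, "buy insurance", "rent flat"))
```

Unlike a plain association between items, the order matters here: swapping the two items gives a different support value.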
Deviation Detection
commercially available data mining tools.
Deviation Detection
Example of Database Segmentation using a Visualization
Data Mining Tools
There are a growing number of commercial data mining tools on the marketplace.