CITS4243: Advanced Databases (datta@csse.uwa.edu.au)
[Figure: knowledge discovery process, bottom to top: Databases; Data Cleaning and Data Integration; Data Warehouse; Data Selection; Task-relevant Data]
December 8, 2021 Data Mining: Concepts and Techniques 11
Data Mining and Business Intelligence
[Figure: layered pyramid; potential to support business decisions increases toward the top. Recoverable layers, top to bottom: Decision Making (End User); Data Exploration (Statistical Summary, Querying, and Reporting)]
[Figure: Data Mining at the center, surrounded by Database Technology, Statistics, Machine Learning, Visualization, Pattern Recognition, Algorithm, and Other Disciplines]
General functionality
Descriptive data mining
Predictive data mining
Different views lead to different classifications
Data view: Kinds of data to be mined
Knowledge view: Kinds of knowledge to be discovered
Method view: Kinds of techniques utilized
Application view: Kinds of applications adapted
Outlier analysis
Outlier: Data object that does not comply with the general behavior
of the data
Noise or exception? Useful in fraud detection, rare events analysis
Periodicity analysis
Similarity-based analysis
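The outlier definition above can be made concrete with a small sketch. It assumes one simple notion of "general behavior", the interquartile range with the conventional multiplier 1.5; the slides do not prescribe a detection method, and the function name `iqr_outliers` and the sample amounts are purely illustrative.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: "general behavior of the data" is approximated by the
    interquartile range; other definitions are equally valid.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical transaction amounts; one value is far from the rest.
amounts = [12.0, 15.5, 14.2, 13.8, 16.1, 15.0, 14.7, 980.0]
suspicious = iqr_outliers(amounts)
print(suspicious)
```

Whether a flagged value is noise to discard or an exception worth investigating (for example, possible fraud) is a judgment the method itself cannot make.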
practices of Knowledge Discovery and Data Mining (PKDD)
Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)
Data Mining and Knowledge Discovery (DAMI or DMKD)
IEEE Trans. on Knowledge and Data Eng. (TKDE)
KDD Explorations
ACM Trans. on KDD
Where to Find References? DBLP, CiteSeer, Google
Task-relevant data
Database or data warehouse name
Database tables or data warehouse cubes
Condition for data selection
Relevant attributes or dimensions
Data grouping criteria
Type of knowledge to be mined
Characterization, discrimination, association, classification,
prediction, clustering, outlier analysis, other data mining tasks
Background knowledge
Pattern interestingness measurements
Visualization/presentation of discovered patterns
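One way to picture the primitives listed above is as the fields of a single task specification. The sketch below is only an illustration: the class name `MiningTask`, its field names, and the example database and table are hypothetical and do not come from any real system or from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MiningTask:
    """Hypothetical container for the data mining primitives."""
    # Task-relevant data
    database: str
    tables: List[str]
    attributes: List[str]
    condition: str = ""
    group_by: List[str] = field(default_factory=list)
    # Type of knowledge to be mined
    knowledge_type: str = "characterization"
    # Background knowledge (here: named concept hierarchies, as an assumption)
    background_knowledge: Dict[str, List[str]] = field(default_factory=dict)
    # Pattern interestingness measurements (threshold name -> value)
    thresholds: Dict[str, float] = field(default_factory=dict)
    # Visualization/presentation of discovered patterns
    presentation: str = "table"

    def selection_sql(self) -> str:
        """Build the SQL that would fetch the task-relevant data."""
        sql = f"SELECT {', '.join(self.attributes)} FROM {', '.join(self.tables)}"
        if self.condition:
            sql += f" WHERE {self.condition}"
        return sql


task = MiningTask(
    database="sales_dw",            # hypothetical warehouse name
    tables=["customer"],
    attributes=["age", "income"],
    condition="income > 50000",
    knowledge_type="classification",
    thresholds={"min_confidence": 0.7},
)
print(task.selection_sql())
```

Keeping all primitives in one object mirrors the idea that a mining request is fully specified only when each of them has been given, even if some take default values.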
Primitive 3: Background Knowledge
Primitive 4: Pattern Interestingness Measurements
Simplicity
e.g., (association) rule length, (decision) tree size
Certainty
e.g., confidence, P(A|B) = #(A and B)/ #(B), classification
reliability or accuracy, certainty factor, rule strength, rule quality,
discriminating weight, etc.
Utility
potential usefulness, e.g., support (association), noise threshold
(description)
Novelty
not previously known, surprising (used to remove redundant
rules)
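The certainty and utility measures above can be computed directly from counts. The following is a minimal sketch, assuming transactions are represented as Python sets; the helper names and the basket data are illustrative. The `confidence` function follows the formula as written above, P(A|B) = #(A and B) / #(B), which in rule terms is the confidence of the rule B => A.

```python
def count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def support(transactions, a, b):
    """Fraction of all transactions containing both A and B."""
    return count(transactions, set(a) | set(b)) / len(transactions)


def confidence(transactions, a, b):
    """P(A|B) = #(A and B) / #(B); returns 0.0 if B never occurs."""
    n_b = count(transactions, b)
    return count(transactions, set(a) | set(b)) / n_b if n_b else 0.0


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
s = support(baskets, {"bread"}, {"milk"})     # 2 of 4 baskets hold both
c = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 milk baskets hold bread
print(s, c)
```

A mining system would typically keep only patterns whose support and confidence exceed user-supplied thresholds.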
Motivation
A DMQL can provide the ability to support ad-hoc and
interactive data mining
By providing a standardized language like SQL
Hope to achieve an effect similar to the one SQL has had on
relational databases
Foundation for system development and evolution
Facilitate information exchange, technology transfer,
commercialization and wide acceptance
Design
DMQL is designed with the primitives described earlier
[Figure: system architecture, top to bottom: Pattern Evaluation; Data Mining Engine (with Knowledge-Base alongside); Database or Data Warehouse Server]