Data Mining: Concepts and Techniques: - Chapter 1 - Introduction
[Figure: KDD process flow, from Databases through Data Cleaning and Data Integration to the Data Warehouse, then through Data Selection to Task-relevant Data]
January 16, 2022 Data Mining: Concepts and Techniques 12
Knowledge Discovery (KDD) Process
Data mining is an essential step in the process of knowledge discovery. The
steps involved in the knowledge discovery process are:
Data Cleaning: In this step, noise and inconsistent data are removed.
Data Integration: In this step, multiple data sources are combined.
Data Selection: In this step, data relevant to the analysis task are
retrieved from the database.
Data Transformation: In this step, data is transformed or consolidated
into forms appropriate for mining by performing summary or aggregation
operations.
Data Mining: In this step, intelligent methods are applied in order to
extract data patterns.
Pattern Evaluation: In this step, the discovered patterns are evaluated to
identify the truly interesting ones.
Knowledge Presentation: In this step, the mined knowledge is presented to
the user.
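The seven steps above can be sketched as a small pipeline. This is only an illustrative sketch: the two sources (store_a, store_b), the function names, and the choice of co-occurring item pairs as the "pattern" are all assumptions made for the example, not part of the original slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sources: (transaction_id, item) rows; None marks a noisy value.
store_a = [(1, "milk"), (1, "bread"), (2, "milk"), (2, None), (3, "bread")]
store_b = [(4, "milk"), (4, "bread"), (5, "milk"), (5, "bread"), (5, "eggs")]

def clean(rows):
    """Data cleaning: drop rows that contain a missing value."""
    return [r for r in rows if None not in r]

def integrate(*sources):
    """Data integration: combine multiple sources into one list of rows."""
    return [row for src in sources for row in src]

def select(rows, items_of_interest):
    """Data selection: keep only rows relevant to the analysis task."""
    return [r for r in rows if r[1] in items_of_interest]

def transform(rows):
    """Data transformation: consolidate rows into one item set per transaction."""
    baskets = {}
    for tid, item in rows:
        baskets.setdefault(tid, set()).add(item)
    return list(baskets.values())

def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts

def evaluate(pair_counts, n_baskets, min_support):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {p: c / n_baskets for p, c in pair_counts.items()
            if c / n_baskets >= min_support}

def present(patterns):
    """Knowledge presentation: render the patterns as readable lines."""
    return [f"{a} & {b}: support={s:.2f}" for (a, b), s in sorted(patterns.items())]

def run_kdd():
    rows = integrate(clean(store_a), clean(store_b))
    rows = select(rows, {"milk", "bread", "eggs"})
    baskets = transform(rows)
    patterns = evaluate(mine(baskets), len(baskets), min_support=0.5)
    return present(patterns)

print(run_kdd())
```

Each function corresponds to one step, so any single stage can be swapped (for example, a different mining method) without touching the others.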
[Figure: layers with increasing potential to support business decisions, from Data Exploration (Statistical Summary, Querying, and Reporting) up to Decision Making (End User)]
[Figure: Data Mining at the center of contributing fields: Database Technology, Statistics, Machine Learning, Visualization, Pattern Recognition, Algorithm, Other Disciplines]
General functionality
Descriptive data mining (find human-interpretable patterns that
describe the data)
Outlier analysis
Outlier: Data object that does not comply with the general behavior
of the data
Noise or exception? Useful in fraud detection, rare events analysis
Periodicity analysis
Similarity-based analysis
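As a rough illustration of the outlier definition above (a data object that does not comply with the general behavior of the data), the sketch below flags values that lie far from the mean. The z-score rule, the function name find_outliers, and the threshold of 2.0 are assumptions chosen for the example; the slides do not prescribe a particular method.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return values whose z-score magnitude exceeds the threshold.

    The z-score rule is one common heuristic, used here only for illustration.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing deviates from the general behavior.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts; 95 departs from the general behavior.
amounts = [10, 12, 11, 13, 12, 11, 95]
print(find_outliers(amounts))
```

Whether a flagged value is noise to discard or an exception worth investigating (as in fraud detection) depends on the application, not on the score.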
... practices of Knowledge Discovery and Data Mining (PKDD)
Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)
Data Mining and Knowledge Discovery (DAMI or DMKD)
IEEE Trans. on Knowledge and Data Eng. (TKDE)
KDD Explorations
ACM Trans. on KDD
Where to Find References? DBLP, CiteSeer, Google
Task-relevant data
Database or data warehouse name
Database tables or data warehouse cubes
Condition for data selection
Relevant attributes or dimensions
Data grouping criteria
Type of knowledge to be mined
Characterization, discrimination, association, classification,
prediction, clustering, outlier analysis, other data mining tasks
Background knowledge
Pattern interestingness measurements
Visualization/presentation of discovered patterns
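One way to make the five primitives concrete is to hold them in a single task description. The sketch below is an assumption-laden illustration: the class name MiningTask, its field names, and the sample values (sales_db, purchases, and so on) are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field

# Knowledge types named on the slide; the slide also allows "other data mining tasks".
KNOWLEDGE_TYPES = {
    "characterization", "discrimination", "association", "classification",
    "prediction", "clustering", "outlier analysis",
}

@dataclass
class MiningTask:
    # Task-relevant data
    database: str            # database or data warehouse name
    tables: list             # database tables or data warehouse cubes
    condition: str           # condition for data selection
    attributes: list         # relevant attributes or dimensions
    group_by: list           # data grouping criteria
    # Type of knowledge to be mined
    knowledge_type: str
    # Background knowledge, here a concept hierarchy as child -> parent
    background: dict = field(default_factory=dict)
    # Pattern interestingness measurements as measure name -> threshold
    thresholds: dict = field(default_factory=dict)
    # Visualization/presentation of discovered patterns
    presentation: str = "table"

    def validate(self):
        """Check that the task names a known knowledge type and some attributes."""
        if self.knowledge_type not in KNOWLEDGE_TYPES:
            raise ValueError(f"unknown knowledge type: {self.knowledge_type}")
        if not self.attributes:
            raise ValueError("at least one relevant attribute is required")
        return True

# Hypothetical task: association rules over purchases, grouped by city.
example_task = MiningTask(
    database="sales_db",
    tables=["purchases"],
    condition="year = 2021",
    attributes=["item", "city"],
    group_by=["city"],
    knowledge_type="association",
    background={"Champaign": "Illinois"},
    thresholds={"support": 0.05, "confidence": 0.7},
)
print(example_task.validate())
```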
Primitive 3: Background Knowledge
Primitive 4: Pattern Interestingness Measurements
Simplicity
e.g., (association) rule length, (decision) tree size
Certainty
e.g., confidence, P(A|B) = #(A and B)/ #(B), classification
reliability or accuracy, certainty factor, rule strength, rule quality,
discriminating weight, etc.
Utility
potential usefulness, e.g., support (association), noise threshold
(description)
Novelty
not previously known, surprising (used to remove redundant
rules, e.g., Illinois vs. Champaign rule implication support ratio)
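The certainty and utility measures above can be computed directly from transaction counts. The sketch below follows the slide's formula P(A|B) = #(A and B) / #(B) for the confidence of a rule B => A, and takes support as the fraction of transactions containing an itemset. The function names and the sample transactions are assumptions made for illustration.

```python
def support(transactions, itemset):
    """Utility: fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, b, a):
    """Certainty of the rule B => A: P(A|B) = #(A and B) / #(B)."""
    b_items, a_items = set(b), set(a)
    n_b = sum(b_items <= t for t in transactions)
    if n_b == 0:
        # The rule never applies, so its confidence is undefined.
        return None
    n_ab = sum((a_items | b_items) <= t for t in transactions)
    return n_ab / n_b

def rule_length(b, a):
    """Simplicity: number of items appearing in the rule."""
    return len(set(b)) + len(set(a))

# Hypothetical market-basket transactions.
transactions = [
    {"milk", "bread"},
    {"milk"},
    {"bread"},
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
]

print(support(transactions, {"milk", "bread"}))     # 3 of 5 transactions
print(confidence(transactions, {"milk"}, {"bread"}))  # 3 of the 4 milk transactions
print(rule_length({"milk"}, {"bread"}))
```

A rule is typically reported only when both values clear user-set thresholds, which is how the interestingness measures prune the mined patterns.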
Motivation
A DMQL can provide the ability to support ad-hoc and
interactive data mining
By providing a standardized language like SQL
The hope is to achieve an effect similar to the one SQL has had on
relational databases
Foundation for system development and evolution
Facilitate information exchange, technology transfer,
commercialization and wide acceptance
Design
DMQL is designed with the primitives described earlier
[Figure: components shown: Pattern Evaluation, Data Mining Engine, Knowledge-Base, Database or Data Warehouse Server]