1-Introduction To Data Mining-13-12-2024
Alternative names
Knowledge discovery (mining) in databases
(KDD), knowledge extraction, data/pattern
analysis, data archeology, data dredging,
information harvesting, business intelligence,
etc.
[Figure: KDD process. Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Data Selection -> Task-relevant Data]
March 20, 2025 SWE2009 - Data Mining 13
KDD Process - Steps
1. Data cleaning (to remove noise and inconsistent data)
2. Data integration (where multiple data sources may be
combined)
3. Data selection (where data relevant to the analysis
task are retrieved from the database)
4. Data transformation (where data are transformed or
consolidated into forms appropriate for mining by
performing summary or aggregation operations)
5. Data mining (an essential process where intelligent
methods are applied in order to extract data patterns)
6. Pattern evaluation (to identify the truly interesting
patterns representing knowledge based on some
interestingness measures)
7. Knowledge presentation (where visualization and
knowledge representation techniques are used to
present the mined knowledge to the user)
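The seven steps above can be sketched end to end on a toy data set. This is only an illustrative sketch, not a prescribed implementation: the purchase records, the function names, and the choice of co-occurring item pairs with a minimum support threshold as the "pattern" and "interestingness measure" are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw data: two sources of (customer, item) purchase records.
store_a = [("c1", "bread"), ("c1", "milk"), ("c2", "bread"), ("c2", None)]
store_b = [("c2", "milk"), ("c3", "bread"), ("c3", "pen"), ("c1", "milk")]


def clean(records):
    """Step 1: drop noisy records (missing customer or item)."""
    return [(c, i) for c, i in records if c and i]


def integrate(*sources):
    """Step 2: combine sources and remove duplicate records."""
    return sorted(set().union(*sources))


def select(records, relevant_items):
    """Step 3: keep only records relevant to the analysis task."""
    return [(c, i) for c, i in records if i in relevant_items]


def transform(records):
    """Step 4: consolidate records into one item set (basket) per customer."""
    baskets = {}
    for customer, item in records:
        baskets.setdefault(customer, set()).add(item)
    return baskets


def mine(baskets):
    """Step 5: count how many baskets contain each item pair."""
    counts = Counter()
    for items in baskets.values():
        counts.update(combinations(sorted(items), 2))
    return counts


def evaluate(pair_counts, n_baskets, min_support):
    """Step 6: keep pairs whose support meets the assumed threshold."""
    return {
        pair: n / n_baskets
        for pair, n in pair_counts.items()
        if n / n_baskets >= min_support
    }


def present(patterns):
    """Step 7: render the surviving patterns as readable lines."""
    return [f"{a} & {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]


def run_kdd():
    cleaned = [clean(src) for src in (store_a, store_b)]
    merged = integrate(*cleaned)
    task_data = select(merged, {"bread", "milk"})
    baskets = transform(task_data)
    patterns = evaluate(mine(baskets), len(baskets), min_support=0.5)
    return present(patterns)


print(run_kdd())
```

Each function maps to one numbered step, so a stage can be swapped out (for example, a different mining method in step 5) without touching the others.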
Domain knowledge can include concept hierarchies,
used to organize attributes or attribute values
into different levels of abstraction.
Knowledge such as user beliefs, which can be
used to assess a pattern’s interestingness based
on its unexpectedness, may also be included.
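A concept hierarchy of this kind can be sketched as a chain of lookups from a low-level attribute value to more abstract ones. The location hierarchy below (city, country, continent), the mapping tables, and the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical concept hierarchy for a location attribute:
# city -> country -> continent.
CITY_TO_COUNTRY = {"Chennai": "India", "Vellore": "India", "Lyon": "France"}
COUNTRY_TO_CONTINENT = {"India": "Asia", "France": "Europe"}
LEVELS = ("city", "country", "continent")


def generalize(city, level):
    """Map a city value to the requested level of abstraction."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    if level == "city":
        return city
    country = CITY_TO_COUNTRY[city]
    if level == "country":
        return country
    return COUNTRY_TO_CONTINENT[country]


def roll_up(cities, level):
    """Count records after replacing each city by its ancestor at `level`."""
    return Counter(generalize(city, level) for city in cities)


records = ["Chennai", "Vellore", "Lyon", "Chennai"]
print(dict(roll_up(records, "country")))
print(dict(roll_up(records, "continent")))
```

Rolling the same records up to a higher level yields fewer, more general groups, which is how a hierarchy lets patterns be examined at different levels of abstraction.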
Data Exploration
Statistical Summary, Querying, and Reporting
[Figure: Data mining as a confluence of multiple disciplines: Database Technology, Statistics, Machine Learning, Visualization, Pattern Recognition, Algorithm, Other Disciplines]
Descriptive Methods
find human-interpretable patterns that describe the data
Unique features of data streams:
huge or possibly infinite volume
dynamically changing
flowing in and out in a fixed order
allowing only one or a small number of scans
demanding fast (often real-time) response time.
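The one-scan constraint above can be illustrated with a summary that is updated as each value arrives and never stores the data. The sketch below uses Welford's online algorithm for the running mean and variance; the class name, the function name, and the sample readings are assumptions made for the example, and this is only one of many single-pass techniques.

```python
class RunningStats:
    """Single-pass mean and population variance (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new value into the summary in O(1) time and memory."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the values seen so far (0.0 if none)."""
        return self._m2 / self.n if self.n else 0.0


def summarize(stream):
    """Consume an iterable exactly once and return its running statistics."""
    stats = RunningStats()
    for value in stream:
        stats.update(value)
    return stats


# Hypothetical sensor readings, delivered by a generator to mimic a stream
# that can be scanned only once.
readings = (x for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
stats = summarize(readings)
print(stats.n, stats.mean, stats.variance)
```

Because the summary has constant size, memory use does not grow with the length of the stream, and an up-to-date answer is available after every arriving value.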