Unit 1
Data Mining: core of the knowledge discovery process
[Figure: knowledge discovery process, from bottom to top: Databases, Data Cleaning and Data Integration, Task-relevant Data, Data Mining, Pattern Evaluation]
Difference between KDD and Data Mining
• Although the terms KDD (Knowledge Discovery in Databases) and Data Mining are
often used interchangeably, they refer to two related but distinct concepts.
• KDD is the overall process of extracting knowledge from data, while Data Mining
is one step inside the KDD process: the step that identifies patterns in the data.
• Data Mining is therefore the application of a specific algorithm, chosen
according to the overall goal of the KDD process.
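The distinction can be sketched in code. The following is a minimal, hypothetical KDD pipeline in Python in which data mining is a single step among cleaning, integration, selection of task-relevant data, and pattern evaluation. The function names, the toy purchase records, and the choice of pair counting as the mining algorithm are all illustrative assumptions, not a standard API.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy sources: each record is (customer_id, set of items bought).
store_a = [(1, {"milk", "bread"}), (2, {"milk", "bread", "eggs"}), (3, None)]
store_b = [(4, {"bread", "milk"}), (5, {"eggs"})]

def clean(records):
    """Data cleaning: drop records whose item set is missing."""
    return [r for r in records if r[1] is not None]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for src in sources for r in src]

def select_task_relevant(records):
    """Selection: keep only the item sets, the part relevant to this task."""
    return [items for _, items in records]

def mine(baskets):
    """Data mining step: count how often each pair of items co-occurs."""
    counts = Counter()
    for items in baskets:
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts

def evaluate(patterns, min_count):
    """Pattern evaluation: keep only pairs seen at least min_count times."""
    return {p: c for p, c in patterns.items() if c >= min_count}

def kdd(min_count=2):
    """The whole KDD process; mine() is the only data mining step in it."""
    data = integrate(clean(store_a), clean(store_b))
    baskets = select_task_relevant(data)
    patterns = mine(baskets)
    return evaluate(patterns, min_count)

result = kdd()
print(result)
```

Swapping `mine()` for a different algorithm changes the data mining step only; the surrounding KDD steps stay the same.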
[Figure: layers of increasing potential to support business decisions, from bottom to top: Statistical Summary, Querying, and Reporting; Data Exploration; Decision Making (End User)]
[Figure: data mining system components, from top to bottom: Pattern Evaluation; Data Mining Engine (with Knowledge-Base); Database or Data Warehouse Server]
[Figure: Data Mining at the center of related disciplines: Database Technology, Statistics, Machine Learning, Visualization, Pattern Recognition, Algorithm, Other Disciplines]
Data Mining: On What Kinds of Data?
• Cluster analysis
– Class label is unknown: Group data to form new classes, e.g., cluster houses to find
distribution patterns
– Maximizing intra-class similarity & minimizing inter-class similarity
• Outlier analysis
– Outlier: Data object that does not comply with the general behavior of the data
– Noise or exception? Useful in fraud detection, rare events analysis
• Trend and evolution analysis
– Trend and deviation: e.g., regression analysis
– Sequential pattern mining: e.g., digital camera → large SD memory
– Periodicity analysis
– Similarity-based analysis
• Other pattern-directed or statistical analyses
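Three of the functionalities above (cluster analysis, outlier analysis, and trend analysis by regression) can be illustrated with a minimal Python sketch using only the standard library. The function names, the toy data, the choice of a one-dimensional k-means for clustering, and the z-score threshold of 2.0 for outliers are assumptions made for illustration; real systems use more robust methods.

```python
from statistics import mean, pstdev

def kmeans_1d(values, k=2, iterations=10):
    """Cluster analysis: group unlabeled values around k centers (k >= 2)."""
    ordered = sorted(values)
    # Deterministic start: k evenly spaced values from the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters

def zscore_outliers(values, threshold=2.0):
    """Outlier analysis: values far from the mean in standard deviations."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

def linear_trend(xs, ys):
    """Trend analysis: least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical toy data.
house_prices = [100, 110, 105, 300, 310, 295]   # no class labels given
transactions = [20, 22, 19, 21, 20, 500]        # one suspicious amount
months, sales = [1, 2, 3, 4], [10, 12, 14, 16]  # steadily rising sales

clusters = kmeans_1d(house_prices)
outliers = zscore_outliers(transactions)
slope, intercept = linear_trend(months, sales)
print(clusters, outliers, slope, intercept)
```

Whether the flagged transaction is noise or a meaningful exception (for example, fraud) is a judgment the algorithm does not make; it only marks the value as deviating from the general behavior of the data.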
Data Mining - Issues
• Mining methodology
– Mining different kinds of knowledge from diverse data types, e.g., biological, stream, Web
– Performance: efficiency, effectiveness, and scalability
– Pattern evaluation: the interestingness problem
– Incorporation of background knowledge
– Handling noise and incomplete data
– Parallel, distributed and incremental mining methods
– Integration of the discovered knowledge with existing knowledge: knowledge fusion
• User interaction
– Data mining query languages and ad-hoc mining
– Expression and visualization of data mining results
– Interactive mining of knowledge at multiple levels of abstraction
• Applications and social impacts
– Domain-specific data mining & invisible data mining
– Protection of data security, integrity, and privacy
Data Mining Applications