Data Mining and Machine Learning Notes by Niraj
Figure labels: Databases; Data Cleaning; Data Integration; Task-relevant Data
February 15, 2024 Data Mining: Concepts and Techniques 6
Data Mining and Business Intelligence
Figure: layers with increasing potential to support business decisions, from Data Exploration (Statistical Summary, Querying, and Reporting) up to Decision Making (End User)
Figure: Data Mining at the confluence of Database Technology, Statistics, Machine Learning, Visualization, Pattern Recognition, Algorithm, and Other Disciplines
General functionality
Descriptive data mining
Predictive data mining
Different views lead to different classifications
Data view: Kinds of data to be mined
Knowledge view: Kinds of knowledge to be discovered
Method view: Kinds of techniques utilized
Application view: Kinds of applications adapted
Outlier analysis
Outlier: a data object that does not comply with the general behavior of the data
May be noise or a genuine exception; useful in fraud detection and rare-event analysis
Periodicity analysis
Similarity-based analysis
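The outlier definition above (an object that does not comply with the general behavior of the data) can be illustrated with a minimal sketch. The notes do not name a detection method, so the z-score rule, the function name `zscore_outliers`, the cutoff of 2.0, and the sample amounts are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Return values whose absolute z-score exceeds `threshold`.

    The z-score rule and the default cutoff are illustrative assumptions,
    not a method prescribed by the notes.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts; one value is far from the rest.
amounts = [20, 22, 19, 21, 23, 20, 18, 950]
outliers = zscore_outliers(amounts)
print(outliers)
```

Whether a flagged value is noise to discard or an exception worth investigating (as in fraud detection) is a judgment the rule itself cannot make.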
Figure: system components: Pattern Evaluation; Data Mining Engine; Knowledge-Base; Database or Data Warehouse Server
Task-relevant data
Database or data warehouse name
Database tables or data warehouse cubes
Condition for data selection
Relevant attributes or dimensions
Data grouping criteria
Type of knowledge to be mined
Characterization, discrimination, association, classification,
prediction, clustering, outlier analysis, other data mining tasks
Background knowledge
Pattern interestingness measurements
Visualization/presentation of discovered patterns
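One way to picture the five primitives listed above is as the fields of a single task specification. The sketch below is only an illustration: the class name `MiningTask`, every field name, and the example values are hypothetical and do not come from any real data mining system or query language.

```python
from dataclasses import dataclass, field

@dataclass
class MiningTask:
    # Primitive 1: task-relevant data
    database: str                 # database or data warehouse name
    tables: list                  # database tables or data warehouse cubes
    condition: str                # condition for data selection
    attributes: list              # relevant attributes or dimensions
    group_by: list                # data grouping criteria
    # Primitive 2: type of knowledge to be mined
    knowledge_type: str
    # Primitive 3: background knowledge
    background_knowledge: dict = field(default_factory=dict)
    # Primitive 4: pattern interestingness measurements
    interestingness: dict = field(default_factory=dict)
    # Primitive 5: visualization/presentation of discovered patterns
    presentation: list = field(default_factory=list)

# Hypothetical example: all names and thresholds are made up.
task = MiningTask(
    database="sales_dw",
    tables=["transactions", "customers"],
    condition="year = 2023",
    attributes=["item", "city"],
    group_by=["city"],
    knowledge_type="association",
    interestingness={"min_support": 0.05, "min_confidence": 0.7},
    presentation=["rules", "table"],
)
```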
Primitive 3: Background Knowledge
Primitive 4: Pattern Interestingness Measurements
Simplicity
e.g., (association) rule length, (decision) tree size
Certainty
e.g., confidence, P(A|B) = #(A and B) / #(B), classification
reliability or accuracy, certainty factor, rule strength, rule quality,
discriminating weight, etc.
Utility
potential usefulness, e.g., support (association), noise threshold
(description)
Novelty
not previously known, surprising (used to remove redundant
rules, e.g., Illinois vs. Champaign rule implication support ratio)
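The certainty and utility measures above can be made concrete with a small worked example. This is a minimal sketch that follows the notes' notation P(A|B) = #(A and B) / #(B); the helper names (`count`, `support`, `confidence`, `rule_length`) and the toy transactions are assumptions introduced for illustration.

```python
def count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))

def support(transactions, a, b):
    """Utility: fraction of all transactions containing both A and B."""
    return count(transactions, set(a) | set(b)) / len(transactions)

def confidence(transactions, a, b):
    """Certainty: P(A|B) = #(A and B) / #(B), in the notes' notation."""
    n_b = count(transactions, b)
    if n_b == 0:
        return 0.0  # B never occurs, so the ratio is undefined; report 0
    return count(transactions, set(a) | set(b)) / n_b

def rule_length(a, b):
    """Simplicity: total number of items appearing in the rule."""
    return len(set(a)) + len(set(b))

# Toy market-basket data (hypothetical).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
sup = support(transactions, {"milk"}, {"bread"})      # 2 of 4 transactions
conf = confidence(transactions, {"milk"}, {"bread"})  # 2 of 3 bread transactions
length = rule_length({"milk"}, {"bread"})
```

Here bread and milk occur together in 2 of 4 transactions (support 0.5), and milk appears in 2 of the 3 transactions that contain bread (confidence 2/3).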