1 - 1 Intro To Data Mining - ch1
There is an inherent meaning in everything. “Signs for people who can see.”
Agenda
Course Introduction
Course Details
• Reference book:
  • The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman
  • Freely available online (search for the title)
Course Requirements
• You should have some knowledge of the concepts and terminology associated with:
  • database systems,
  • statistics,
  • machine learning.
[Figure: Data Mining and its related disciplines: Database Technology, Statistics, Machine Learning, Visualization, Pattern Recognition, Algorithms, Artificial Intelligence]
Knowledge Discovery (KDD) Process
◦ Data mining: the core of the knowledge discovery process
[Figure: KDD process stages: Databases → Data Cleaning / Data Integration → Task-relevant Data → Data Mining → Pattern Evaluation]
KDD Process: Several Key Steps
• Learning the application domain
  • Relevant prior knowledge and goals of the application
• Creating a target data set: data selection
• Data cleaning and preprocessing (may take 60% of the effort!)
• Data reduction and transformation
  • Finding useful features, dimensionality/variable reduction, invariant representation
• Choosing the functions of data mining
  • Summarization, classification, regression, association, clustering
• Choosing the mining algorithm(s)
• Data mining: searching for patterns of interest
• Pattern evaluation and knowledge presentation
  • Visualization, transformation, removing redundant patterns, etc.
• Use of discovered knowledge
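
As a rough illustration of how the steps above chain together, here is a minimal Python sketch using only the standard library. Everything in it is an assumption made for illustration: the sample records, the field names, the 0.5 split on scaled age, and the min_gap threshold. The "mining" step is plain summarization (group means), standing in for whatever algorithm a real project would choose.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
raw = [
    {"id": 1, "age": 25, "income": 30000, "city": "A"},
    {"id": 2, "age": None, "income": 42000, "city": "B"},
    {"id": 3, "age": 47, "income": 81000, "city": "A"},
    {"id": 4, "age": 52, "income": 90000, "city": "B"},
    {"id": 5, "age": 23, "income": 28000, "city": "A"},
]

def select(records, fields):
    """Target data set: keep only the task-relevant fields."""
    return [{f: r[f] for f in fields} for r in records]

def clean(records):
    """Data cleaning: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records, field):
    """Transformation: min-max scale one field to [0, 1].

    Assumes the field has at least two distinct values.
    """
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    return [{**r, field: (r[field] - lo) / (hi - lo)} for r in records]

def mine(records):
    """Data mining (summarization): mean income for younger vs. older records.

    Assumes both groups are non-empty.
    """
    young = [r["income"] for r in records if r["age"] < 0.5]
    old = [r["income"] for r in records if r["age"] >= 0.5]
    return {"young": mean(young), "old": mean(old)}

def evaluate(pattern, min_gap=10000):
    """Pattern evaluation: keep the pattern only if the income gap is large."""
    return abs(pattern["old"] - pattern["young"]) >= min_gap

# Run the steps in order: selection, cleaning, transformation, mining, evaluation.
target = select(raw, ["age", "income"])
cleaned = clean(target)
scaled = transform(cleaned, "age")
pattern = mine(scaled)
interesting = evaluate(pattern)
print(pattern, interesting)
```

Each function maps to one KDD step, so a step can be swapped out (for example, imputing missing values instead of dropping records) without touching the others.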