Lecture 1 Introduction To Data Mining
Dr. Dhaval Patel
CSE, IIT-Roorkee
What is data mining?
Data mining is also called knowledge discovery in databases (KDD).
Data mining is the extraction of useful patterns from data sources, e.g., databases, text, the web, and images.
[Figure: Data → Patterns → Knowledge]
Knowledge Discovery in Data: Process
Knowledge Discovery in Data: Challenges
Volume
- Big Data
- Small Data
Variety
- Transaction
- Temporal
- Spatial
- …
Velocity
- Data Stream
- Static
Outline (Part 1)
Introduction to Data
Transactional Data
Temporal Data
Data Preprocessing
Missing Values
Summarization
INTRODUCTION TO DATA
Data Come from Everywhere
• Transaction Data
TID Items
1 Bread, Coke, Milk
2 Beer, Bread
3 Beer, Coke, Diaper, Milk
4 Beer, Bread, Diaper, Milk
5 Coke, Diaper, Milk
Market-Basket Dataset
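A market-basket dataset like this one is often encoded as a table of 0/1 item indicators, one row per transaction, before it is mined. The following is a minimal Python sketch of that encoding, assuming the five transactions listed above; the names `transactions` and `to_binary_matrix` are illustrative and do not come from the lecture.

```python
# Transactions from the market-basket table, keyed by TID.
transactions = {
    1: {"Bread", "Coke", "Milk"},
    2: {"Beer", "Bread"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Diaper", "Milk"},
}


def to_binary_matrix(transactions):
    """Return (sorted item list, {tid: row of 0/1 flags}) for a dict of item sets."""
    # Collect every distinct item and fix a column order.
    items = sorted(set().union(*transactions.values()))
    # One row per transaction: 1 if the item is in the basket, else 0.
    rows = {
        tid: [1 if item in basket else 0 for item in items]
        for tid, basket in transactions.items()
    }
    return items, rows


items, rows = to_binary_matrix(transactions)
print("TID", *items)
for tid, row in rows.items():
    print(tid, *row)
```

Each column of the resulting table corresponds to one item, so a column sum gives the number of transactions that contain that item.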
Data Matrix
[Figure: document-term data matrix; column labels: timeout, season, coach, game, score, team, ball, lost, play, win]
Distance Matrix
[Figure: scatter plot of points p1 to p4; x axis 0 to 6, y axis 0 to 3]

point  x  y
p1     0  2
p2     2  0
p3     3  1
p4     5  1

      p1     p2     p3     p4
p1    0      2.828  3.162  5.099
p2    2.828  0      1.414  3.162
p3    3.162  1.414  0      2
p4    5.099  3.162  2      0
Distance Matrix
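The entries of the distance matrix are consistent with plain Euclidean distance between the four points. The sketch below recomputes them in Python, assuming the coordinates p1 = (0, 2), p2 = (2, 0), p3 = (3, 1), p4 = (5, 1) as read from the slide's point table; the helper name `euclidean` is illustrative.

```python
import math

# Point coordinates as read from the slide (an assumption of this sketch).
points = {"p1": (0, 2), "p2": (2, 0), "p3": (3, 1), "p4": (5, 1)}


def euclidean(a, b):
    """Euclidean distance between two 2-D points."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)


names = list(points)
# Symmetric matrix stored as a dict of dicts: dist[p][q] is the distance from p to q.
dist = {p: {q: euclidean(points[p], points[q]) for q in names} for p in names}

print("   ", *names)
for p in names:
    print(p, *(f"{dist[p][q]:.3f}" for q in names))
```

The matrix is symmetric with zeros on the diagonal, so only the entries above (or below) the diagonal need to be stored in practice.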
Temporal Data
Sequence Data
• Spatial Data
http://csc.noaa.gov/hurricanes
Spatial & Spatial-Temporal Data
[Figure: locations for P1 on weekends: Stadium, Movie Complex, Swimming Pool, Home]
In Summary
[Figure labels: Visualization, Classification, Machine Learning, Statistics, Databases]
Related Fields
Statistics:
- more theory-based
- more focused on testing hypotheses
Machine learning:
- more heuristic
- focused on improving performance of a learning agent
- also looks at real-time learning and robotics, areas not part of data mining
Clustering
Association Rules & Frequent Itemsets
Transactions:
TID  Produce
1    MILK, BREAD, EGGS
2    BREAD, SUGAR
3    BREAD, CEREAL
4    MILK, BREAD, SUGAR
5    MILK, CEREAL
6    BREAD, CEREAL
7    MILK, CEREAL
8    MILK, BREAD, CEREAL, EGGS
9    MILK, BREAD, CEREAL
Frequent Itemsets:
Milk, Bread (4)
Bread, Cereal (4)
Milk, Bread, Cereal (2)
…
Rules:
Milk => Bread (66%)
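The count next to an itemset is its support (the number of transactions containing every item in the set), and the percentage on a rule is its confidence: the support of antecedent and consequent together, divided by the support of the antecedent. The Python sketch below recomputes both from the nine transactions on this slide; the function names `support_count` and `confidence` are illustrative, and the brute-force scan is only meant for a dataset this small.

```python
# The nine transactions from the slide, one set of items per TID.
transactions = [
    {"MILK", "BREAD", "EGGS"},
    {"BREAD", "SUGAR"},
    {"BREAD", "CEREAL"},
    {"MILK", "BREAD", "SUGAR"},
    {"MILK", "CEREAL"},
    {"BREAD", "CEREAL"},
    {"MILK", "CEREAL"},
    {"MILK", "BREAD", "CEREAL", "EGGS"},
    {"MILK", "BREAD", "CEREAL"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent => consequent."""
    both = support_count(set(antecedent) | set(consequent), transactions)
    return both / support_count(antecedent, transactions)


print(support_count({"MILK", "BREAD"}, transactions))
print(f"{confidence({'MILK'}, {'BREAD'}, transactions):.1%}")
```

Milk appears in six transactions and four of those also contain bread, which gives the roughly two-thirds confidence shown for Milk => Bread.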
Visualization & Data Mining
- Visualizing the data to facilitate human discovery
- Presenting the discovered results in a visually "nice" way
Summarization
Data Mining Models and Tasks