DM Lec1
Introduction
Lec 1
Mohammed
What is data mining?
• After years of data mining research, there is still no single
answer to this question.
• A tentative definition:
• Data Mining
• Find all credit applicants who are poor credit risks.
(classification)
• Identify customers with similar buying habits. (clustering)
• Find all items which are frequently purchased with milk.
(association rules)
A Bit of History
• We are drowning in data, but starving for knowledge.
(John Naisbitt, 1982)
We are Drowning in Data...
• James Webb Telescope: ≈57 GB/day (≈21 TB/year)
• Facebook: ≈12 TB/day added (as of Mar. 2010)
• Google: ≈20 PB/day processed (Jan. 2010)
...but starving for knowledge!
• Scientific Viewpoint:
• Data collected and stored at
enormous speeds (GB/hour)
• remote sensors on a satellite
• telescopes scanning the skies
• microarrays generating gene
expression data
• scientific simulations
generating terabytes of data
Data is power!
• “The data is the computer”
• Large amounts of data can be more powerful than
complex algorithms and models
• Google has solved many natural language processing problems
simply by looking at the data
• Example: misspellings, synonyms
• Today, the collected data is one of the biggest assets of an
online company
• Query logs of Google
• Friendships and status updates on Facebook
• Tweets and follows of Twitter
• Amazon transactions
• Competitive Pressure is Strong
• Provide better, customized services for an edge (e.g. in Customer
Relationship Management)
Itemsets Discovered:
{Milk, Coke}
{Diaper, Milk}
Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
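How strongly a rule such as {Milk} --> {Coke} holds is commonly measured by its support and confidence. The sketch below is a minimal illustration in Python, not the lecture's own method. The five transactions are invented for this example (they do not come from the slides), and the helper names `count`, `support` and `confidence` are hypothetical.

```python
# Toy market-basket data, invented for illustration only.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Among transactions containing `lhs`, the fraction that also contain `rhs`."""
    lhs_count = count(lhs, transactions)
    if lhs_count == 0:
        return 0.0  # the rule never applies in this data
    return count(set(lhs) | set(rhs), transactions) / lhs_count


print(support({"Milk", "Coke"}, transactions))
print(confidence({"Milk"}, {"Coke"}, transactions))
print(confidence({"Diaper", "Milk"}, {"Beer"}, transactions))
```

On this toy data, {Milk, Coke} appears in 3 of 5 transactions, {Milk} --> {Coke} has confidence 3/4, and {Diaper, Milk} --> {Beer} has confidence 2/3. A real miner would also have to search for the itemsets, which this sketch does not attempt.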
• Recommendations:
• Users who buy this item often buy that item as well
• Users who watched James Bond movies also watched
Jason Bourne movies.
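One simple way to produce "users who chose X also chose Y" suggestions is to count which items co-occur with X across user histories. The following is a rough sketch under that assumption, not a description of how any real recommender works. The viewing histories are invented, and the function name `also_chosen` is hypothetical.

```python
from collections import Counter

# Hypothetical viewing histories, invented for illustration only.
histories = [
    {"James Bond", "Jason Bourne", "Mission Impossible"},
    {"James Bond", "Jason Bourne"},
    {"James Bond", "Notting Hill"},
    {"Notting Hill", "Love Actually"},
]


def also_chosen(item, histories, top_k=1):
    """Items that most often appear together with `item`, most frequent first."""
    co_counts = Counter()
    for history in histories:
        if item in history:
            co_counts.update(history - {item})
    # Sort by descending count, then by name, so ties are broken deterministically.
    ranked = sorted(co_counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:top_k]]


print(also_chosen("James Bond", histories))
```

With these made-up histories, the top suggestion for "James Bond" is "Jason Bourne", because the two appear together in two of the three histories that contain "James Bond".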
[Figure: classification workflow. Training Set → Learn Classifier → Model; the Model is then applied to the Test Set.]
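The train-then-test workflow for classification can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the classifier is a 1-nearest-neighbour rule chosen for brevity (the slides do not name a specific algorithm), the credit applicants and their features are invented, and the function names `learn_classifier`, `predict` and `accuracy` are hypothetical.

```python
import math


def learn_classifier(training_set):
    """'Learn' a 1-nearest-neighbour model, which simply memorises the training set."""
    return list(training_set)


def predict(model, features):
    """Return the label of the training example closest to `features`."""
    nearest = min(model, key=lambda example: math.dist(example[0], features))
    return nearest[1]


def accuracy(model, test_set):
    """Fraction of test examples whose predicted label matches the true label."""
    correct = sum(predict(model, x) == y for x, y in test_set)
    return correct / len(test_set)


# Hypothetical applicants: (income, existing debt) in thousands -> credit risk.
training_set = [
    ((80, 5), "good"),
    ((95, 10), "good"),
    ((60, 2), "good"),
    ((30, 25), "poor"),
    ((25, 40), "poor"),
    ((40, 35), "poor"),
]
test_set = [
    ((85, 8), "good"),
    ((28, 30), "poor"),
    ((70, 4), "good"),
]

model = learn_classifier(training_set)   # Training Set -> Learn Classifier -> Model
print(accuracy(model, test_set))         # Model applied to the held-out Test Set
```

The point of the split is that the test set is never seen while learning, so the accuracy estimates how the model behaves on new applicants rather than on the examples it memorised.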