CSM6404 DM L1
Email: [email protected]
Lecture 1
Outline
Definition, motivation & applications
Branches of data mining
Major issues in data mining
What Is Data Mining?
              Database processing          Data mining processing
Query         Well defined                 Poorly defined
              SQL                          No precise query language
Data          Operational data             Not operational data
Output        Precise                      Fuzzy
              Subset of database           Not a subset of database
Evolution of Database Technology
Query Examples
Database
– Find all credit applicants with last name of Smith.
– Identify customers who have purchased more than $10,000 in the last month.
– Find all customers who have purchased milk.
Data Mining
– Find all credit applicants who are poor credit risks. (classification)
– Find all items that are frequently purchased with milk. (association rules)
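The contrast between the two kinds of query can be sketched in Python. This is a minimal illustration, assuming a hypothetical list of shopping baskets (the data and the helper `frequent_with` are made up for this example): the database query returns a precise subset of the stored data (every basket containing milk), while the data mining query returns a pattern, here the items X whose confidence for the rule milk ⇒ X reaches a chosen threshold.

```python
from collections import Counter

# Hypothetical toy data: each set is the items in one customer's basket.
transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "cereal"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]

# Database query: a precise, well-defined subset of the stored data.
milk_baskets = [t for t in transactions if "milk" in t]

def frequent_with(item, baskets, min_confidence):
    """Data mining query: items X whose confidence(item => X) >= min_confidence.

    confidence(item => X) = (# baskets with item and X) / (# baskets with item)
    """
    containing = [b for b in baskets if item in b]
    if not containing:
        return {}
    # Count how often every other item appears alongside `item`.
    counts = Counter(other for b in containing for other in b if other != item)
    return {other: n / len(containing)
            for other, n in counts.items()
            if n / len(containing) >= min_confidence}

print(len(milk_baskets))                          # 4
print(frequent_with("milk", transactions, 0.5))   # {'bread': 0.75}
```

With this toy data, four baskets contain milk and only bread appears in at least half of them (confidence 3/4). Full association rule miners such as Apriori also search over larger itemsets and filter by support; this sketch only looks at pairs involving a single item.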
Potential Applications
Data analysis and decision support
◦ Market analysis and management
Target marketing, customer relationship management (CRM), market basket analysis, cross-selling, market segmentation
◦ Risk analysis and management
Forecasting, customer retention, improved underwriting, quality control, competitive analysis
◦ Fraud detection and detection of unusual patterns (outliers)
Other Applications
◦ Text mining (news group, email, documents) and Web mining
◦ Stream data mining
◦ Bioinformatics and bio-data analysis
Ex.: Market Analysis and Management
Where does the data come from? Credit card transactions, loyalty cards, discount coupons, customer complaint calls, surveys …
Target marketing
◦ Find clusters of “model” customers who share the same characteristics: interests, income level, spending habits, etc.
E.g., most customers with an income level of 60k–80k and food expenses of $600–$800 a month live in that area
◦ Determine customer purchasing patterns over time
E.g., customers who are between 20 and 29 years old, with an income of 20k–29k, usually buy this type of CD player
Cross-market analysis: find associations/correlations between product sales, and make predictions based on these associations
◦ E.g. Customers who buy computer A usually buy software B
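Finding clusters of “model” customers for target marketing is usually done with a clustering algorithm. The sketch below uses plain k-means (Lloyd's algorithm), chosen here only as one common option, not as the method the slides prescribe. The customer data (annual income in $k, monthly food spend in $100s) and the function name `kmeans` are hypothetical.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm); initial centroids are the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical customers: (annual income in $k, monthly food spend in $100s)
customers = [(65, 6.5), (25, 2.0), (70, 7.0), (22, 2.5), (78, 7.8), (28, 2.2)]
labels, centroids = kmeans(customers, k=2)
print(labels)      # [0, 1, 0, 1, 0, 1]
print(centroids)
```

Taking the first k points as initial centroids keeps the example deterministic; practical implementations usually use randomized or k-means++ initialization and scale the features first, since otherwise the attribute with the largest range (here income) dominates the distance.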
Ex.: Market Analysis and Management (2)
Customer requirement analysis
◦ Identify the best products for different customers
◦ Predict what factors will attract new customers
Provision of summary information
◦ Multidimensional summary reports
E.g. Summarize all transactions of the first quarter from three different branches
Summarize all transactions of last year from a particular branch
Summarize all transactions of a particular product
◦ Statistical summary information
E.g. What is the average age for customers who buy product A?
Fraud detection
◦ Find outliers, i.e., unusual transactions
Financial planning
◦ Summarize and compare the resources and spending
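For fraud detection, a very simple way to flag unusual transactions is a median-based outlier rule. The sketch below uses the modified z-score (0.6745 · |x − median| / MAD, flagged above 3.5), which is a swapped-in illustrative technique, not one named in these slides; the transaction amounts and the function name `flag_outliers` are hypothetical.

```python
import statistics

def flag_outliers(amounts, threshold=3.5):
    """Return indices of amounts whose modified z-score exceeds threshold.

    Modified z-score = 0.6745 * |x - median| / MAD, where MAD is the median
    absolute deviation. Using medians keeps one extreme value from inflating
    the spread and masking itself, as it can with mean and standard deviation.
    """
    med = statistics.median(amounts)
    mad = statistics.median([abs(x - med) for x in amounts])
    if mad == 0:
        # Most values equal the median, so the rule has no spread to scale by;
        # this sketch simply reports nothing in that case.
        return []
    return [i for i, x in enumerate(amounts)
            if 0.6745 * abs(x - med) / mad > threshold]

# Hypothetical card transaction amounts in dollars
amounts = [42, 38, 55, 47, 51, 40, 44, 49, 2500, 46]
print(flag_outliers(amounts))  # [8]
```

Only the $2,500 transaction is flagged. Real fraud detection looks at many attributes at once (merchant, location, time), so a single-column rule like this is only a starting point.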
Data Mining Tasks
Prediction Tasks
◦ Use some variables to predict unknown or future values of other variables
Description Tasks
◦ Find human-interpretable patterns that describe the data.
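A prediction task such as the earlier “find all credit applicants who are poor credit risks” can be sketched as classification: known variables (income, debt ratio) are used to predict an unknown one (risk label). The example below uses k-nearest neighbours, picked only as a simple illustration; the applicant history, the features, and the function name `knn_predict` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labelled history: (annual income in $k, debt-to-income ratio) -> risk
history = [
    ((85, 0.10), "good"),
    ((72, 0.20), "good"),
    ((90, 0.15), "good"),
    ((30, 0.60), "poor"),
    ((25, 0.55), "poor"),
    ((40, 0.70), "poor"),
]

def knn_predict(sample, labelled, k=3):
    """Predict a label by majority vote among the k closest labelled records."""
    nearest = sorted(labelled, key=lambda rec: math.dist(sample, rec[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((80, 0.20), history))  # good
print(knn_predict((28, 0.65), history))  # poor
```

The features are left unscaled to keep the sketch short, so income dominates the distance; in practice each attribute would be normalized before computing distances.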
[Figure: layers with increasing potential to support business decisions, from top to bottom: Decision Making (End User); Data Exploration: Statistical Summary, Querying, and Reporting; Knowledge base (turn data into meaningful groups); Database systems]
Major Issues in Data Mining
Mining methodology and User interaction
◦ Mining different kinds of knowledge
DM should cover a wide spectrum of data analysis and knowledge discovery tasks
These tasks may use the same database in different ways
This requires the development of numerous data mining techniques
Major Issues in Data Mining (contd.)
Performance Issues
◦ Efficiency and scalability
Huge amounts of data
Running time must be predictable and acceptable
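One way to keep running time predictable on huge data is to use algorithms that make a single pass with constant memory. As a small illustration (not a technique named in the slides), Welford's one-pass algorithm computes the mean and variance of a stream without ever storing it; the function name `running_stats` is hypothetical.

```python
def running_stats(stream):
    """Welford's one-pass algorithm: count, mean and population variance.

    Runs in O(n) time and O(1) memory, so it works on streams that do not
    fit in memory.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n == 0:
        raise ValueError("empty stream")
    return n, mean, m2 / n

# The generator is consumed one value at a time and never materialized as a list.
n, mean, var = running_stats(float(x) for x in range(1, 1_000_001))
print(n, mean, var)
```

Each value is touched exactly once, so the running time grows linearly with the data size, which is the kind of predictable behaviour the slide asks for.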