Unit I Dbmi
INTELLIGENCE
DEVIPRIYA P
AP
1.1 Why Data Mining?
• The Explosive Growth of Data: from terabytes (1000^4 bytes) to yottabytes (1000^8 bytes)
– Data collection and data availability
• Automated data collection tools, database systems, web
– Major sources of abundant data
• Business: Web, e-commerce, transactions, stocks, …
• Science: bioinformatics, scientific simulation, medical research …
• Society and everyone: news, digital cameras, …
• Data rich but information poor!
– What do those data mean?
– How to analyze data?
• Data mining — Automated analysis of massive data sets
Evolution of Database Technology
1.2 What Is Data Mining?
• Data mining (knowledge discovery from data)
– Extraction of interesting (non-trivial, implicit, previously unknown, and
potentially useful) patterns or knowledge from huge amounts of data
– Data mining: a misnomer?
• Alternative names
– Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data dredging,
information harvesting, business intelligence, etc.
• Where does the data come from?—Credit card transactions, loyalty cards,
discount coupons, customer complaint calls, surveys …
• Target marketing
– Find clusters of “model” customers who share the same characteristics: interest,
income level, spending habits, etc.
• E.g. Most customers with an income of $60k – $80k and food expenses of $600 – $800 a month live in that area
– Determine customer purchasing patterns over time
• E.g. Customers who are between 20 and 29 years old, with an income of $20k – $29k, usually buy this type of CD player
• Fraud detection
– Find outliers of unusual transactions
• Financial planning
– Summarize and compare the resources and spending
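The fraud detection idea above (flagging transactions that look unusual) can be sketched in a few lines. This is only an illustration, not a method prescribed by these notes: it assumes a single numeric feature (the transaction amount) and uses the modified z-score based on the median absolute deviation, with the commonly used cutoff of 3.5. The function name `find_outliers` and the sample amounts are hypothetical.

```python
from statistics import median

def find_outliers(amounts, threshold=3.5):
    """Return the amounts whose modified z-score exceeds the threshold."""
    med = median(amounts)
    # Median absolute deviation: a spread estimate that a single
    # extreme value cannot inflate (unlike the standard deviation).
    mad = median([abs(x - med) for x in amounts])
    if mad == 0:
        return []  # at least half the values equal the median; nothing is flagged
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # for normally distributed data.
    return [x for x in amounts if 0.6745 * abs(x - med) / mad > threshold]

# Hypothetical card transactions in dollars (illustrative data only).
transactions = [42.0, 38.5, 51.0, 47.25, 40.0, 44.9, 980.0, 39.75]
print(find_outliers(transactions))  # [980.0]
```

A real fraud detection system would use many more features (merchant, location, time of day) and would treat a flagged transaction as a candidate for review rather than as proven fraud.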
Figure (layered diagram, listed top to bottom; the potential to support business decisions increases toward the top; the role involved is given in brackets):
• Decision Making [End User]
• Visualization Techniques [Analyst]
• Data Mining: Information Discovery [Data Analyst]
• Data Exploration: Statistical Summary, Querying, and Reporting [Data Analyst]
• Data Preprocessing/Integration, Data Warehouses [DBA]
• Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems [DBA]
A typical DM System Architecture
1.4 Data Mining Functionalities
- What kinds of patterns can be mined?
• Cluster Analysis
– Class label is unknown: group data to form new classes
• Evolution Analysis
– Describes and models regularities or trends for objects whose
behavior changes over time, e.g., stock trends of particular companies.
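Cluster analysis, as listed above, groups data without class labels. A minimal sketch follows, assuming k-means as the clustering method (the notes do not name a specific algorithm) and a naive start that takes the first k points as initial centroids. The customer data and the names `assign` and `kmeans` are hypothetical.

```python
import math

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points]

def kmeans(points, k, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    centroids = list(points[:k])  # naive deterministic start (an assumption)
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                # Move the centroid to the coordinate-wise mean of its members.
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid
        centroids = updated
    return assign(points, centroids), centroids

# Hypothetical customers: (income in $1000s, monthly food expenses in $100s).
customers = [(62, 6.5), (70, 7.0), (78, 7.8), (22, 2.1), (25, 2.5), (28, 3.0)]
labels, centers = kmeans(customers, k=2)
print(labels)  # [1, 1, 1, 0, 0, 0]
```

No labels are supplied; the two groups emerge from the data alone. In practice the features would be scaled first so that one attribute (here income) does not dominate the distance.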
1.5 Which Technologies Are Used?
• Statistics
Statistics studies the collection, analysis, interpretation or explanation,
and presentation of data. Data mining has an inherent connection with statistics.
• Machine Learning
– Supervised learning
Supervised learning is basically a synonym for classification.
– Unsupervised learning
Unsupervised learning is essentially a synonym for clustering.
– Semi-supervised learning
Semi-supervised learning is a class of machine learning techniques that make
use of both labeled and unlabeled examples when learning a model.
– Active learning
Active learning is a machine learning approach that lets users play an
active role in the learning process.
• Database Systems and Data Warehouses
• Information Retrieval
– Information retrieval (IR) is the science of searching for documents or
information in documents. Documents can be text or multimedia, and
may reside on the Web.
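To make the supervised learning entry in the Machine Learning list above concrete: classification learns from labeled examples and predicts the label of a new one. The sketch below assumes a 1-nearest-neighbor classifier as one simple instance of classification; the training data, labels, and the name `predict` are hypothetical.

```python
import math

def predict(train, query):
    """1-nearest-neighbor: return the label of the closest labeled example."""
    _features, label = min(train, key=lambda ex: math.dist(ex[0], query))
    return label

# Hypothetical labeled examples: (age, income in $1000s) with a class label.
train = [
    ((23, 24), "buyer"),
    ((27, 28), "buyer"),
    ((45, 70), "non-buyer"),
    ((52, 85), "non-buyer"),
]

print(predict(train, (25, 26)))  # buyer
print(predict(train, (48, 75)))  # non-buyer
```

The class labels in `train` are what makes this supervised. Removing them and grouping the same points by similarity would turn the task into clustering, the unsupervised case.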
Data mining adopts techniques from many domains
1.6 Which Kinds of Applications Are Targeted?
• Business Intelligence
– It is critical for businesses to acquire a better understanding of the
commercial context of their organization, such as their customers, the
market, supply and resources, and competitors.
– Business intelligence (BI) technologies provide historical, current, and
predictive views of business operations.