Data Mining: Concepts and Techniques
Unit-II (a): Data Mining
1. Why Data Mining?
The Explosive Growth of Data: from terabytes to petabytes
Data collection and data availability
Automated data collection tools, database systems, Web,
computerized society
Major sources of abundant data
Business: Web, e-commerce, transactions, stocks, …
Science: Remote sensing, bioinformatics, scientific simulation, …
Society and everyone: news, digital cameras, YouTube
We are drowning in data, but starving for knowledge!
“Necessity is the mother of invention”: data mining is the automated
analysis of massive data sets
i. Moving toward the Information Age
We are living in the information age.
The world is data rich but information poor.
Example: the number of people who search for flu-related information
is closely related to the number of people who actually have flu symptoms.
2. What Is Data Mining?
Data mining (knowledge discovery from data)
Extraction of interesting (non-trivial, implicit, previously
unknown, and potentially useful) patterns or
knowledge from huge amounts of data
Alternative names
Knowledge discovery (mining) in databases (KDD),
knowledge extraction, data/pattern analysis, data
archeology, data dredging, information harvesting,
business intelligence, etc.
Knowledge Discovery (KDD) Process
This is a view from the typical database systems and data warehousing
communities.
Data mining plays an essential role in the knowledge discovery process.
[Figure: KDD process. Surviving stage labels, from source to result: Databases; Data Cleaning and Data Integration; Task-relevant Data; Data Mining; Pattern Evaluation]
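The stages named in the KDD process can be sketched as a small Python pipeline. This is only an illustration under stated assumptions: the two toy "databases", the function names, and the frequency threshold are all hypothetical and do not come from the slides, and item counting stands in for a real mining algorithm.

```python
from collections import Counter

# Hypothetical toy sources: two "databases" of (customer, item) purchase records.
store_a = [("ann", "milk"), ("bob", "bread"), ("ann", None), ("ann", "bread")]
store_b = [("cara", "milk"), ("bob", "milk"), ("cara", "eggs"), ("bob", "milk")]


def clean(records):
    """Data cleaning: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r)]


def integrate(*sources):
    """Data integration: merge sources and drop exact duplicates, keeping order."""
    merged = []
    seen = set()
    for source in sources:
        for record in source:
            if record not in seen:
                seen.add(record)
                merged.append(record)
    return merged


def select_items(records):
    """Selection: keep only the task-relevant attribute (the item)."""
    return [item for _, item in records]


def mine(items):
    """Data mining: count how often each item occurs (a stand-in for a real algorithm)."""
    return Counter(items)


def evaluate(counts, min_count):
    """Pattern evaluation: keep only patterns that meet a frequency threshold."""
    return {item: n for item, n in counts.items() if n >= min_count}


integrated = integrate(clean(store_a), clean(store_b))
task_relevant = select_items(integrated)
patterns = evaluate(mine(task_relevant), min_count=3)
print(patterns)
```

Each function maps to one stage label, so a stage can be swapped out (for example, a different cleaning rule) without touching the others.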
Example: A Web Mining Framework
Data Mining in Business Intelligence
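[Figure: business intelligence pyramid. The potential to support business decisions increases toward the top. Surviving layer labels, top to bottom: Decision Making (End User); Data Exploration (Statistical Summary, Querying, and Reporting)]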
3. Data Mining: On What Kinds of Data?
Data Warehouse
Transactional database
June 21, 2019 Data Mining: Concepts and Techniques 13
ii. Other Kinds of Data
Data streams and sensor data
Time-series data, temporal data, sequence data
Structured data, graphs, social networks
Object-relational databases
Heterogeneous databases and legacy databases
Spatial data and spatiotemporal data
Multimedia database
Text databases
The World-Wide Web
4. Data Mining Techniques
v. Outlier Analysis
Data Mining Tasks
i. Data characterization & discrimination
Data characterization is a summarization of the general characteristics
or features of a target class of data.
Output:
pie charts, bar charts, curves, multidimensional data cubes,
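As a rough illustration of characterization, the sketch below summarizes a target class and prints a text bar chart of one attribute. The customer records, the spending threshold that defines the target class, and all names are hypothetical and chosen only for the example.

```python
from collections import Counter
from statistics import mean

# Hypothetical customer records: (age, occupation, yearly_spend).
customers = [
    (45, "engineer", 7200),
    (38, "teacher", 1500),
    (52, "engineer", 6100),
    (29, "student", 800),
    (41, "manager", 5400),
]

# Target class (assumed definition): customers who spend more than 5000 per year.
target = [c for c in customers if c[2] > 5000]

# Characterization: summarize general features of the target class.
summary = {
    "count": len(target),
    "mean_age": mean(c[0] for c in target),
    "occupations": Counter(c[1] for c in target),
}

print("customers in target class:", summary["count"])
print("mean age:", summary["mean_age"])
# A text stand-in for the bar-chart output form.
for occupation, n in summary["occupations"].most_common():
    print(f"{occupation:10s} {'#' * n}")
```

Discrimination would apply the same summary to a contrasting class (here, customers at or below the threshold) and compare the two results.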
(iv) Cluster Analysis
(v) Outlier Analysis
Outlier analysis
Outlier: A data object that does not comply with the general behavior of the data
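One simple way to flag such objects in one-dimensional numeric data is the interquartile-range (IQR) rule. The slides do not name a detection method, so the rule, the 1.5 multiplier, and the sample readings below are assumptions made for illustration.

```python
from statistics import quantiles

# Hypothetical sensor readings with one value far from the rest.
readings = [10, 12, 11, 13, 12, 11, 95]


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


outliers = iqr_outliers(readings)
print(outliers)
```

Whether a flagged value is noise to discard or the interesting finding itself (for example, a fraudulent transaction) depends on the application.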
(vi) Is all mined knowledge interesting?
A pattern is interesting if it is
(1) easily understood by humans,
(2) valid on new or test data with some degree of certainty,
(3) potentially useful, and
(4) novel.
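Criterion (2) is the one that is easiest to check mechanically. The sketch below measures the confidence of a rule on the data it was mined from and again on held-out test data. The rule, the transactions, and the 0.6 threshold are hypothetical, and confidence is used here as one possible objective measure, not as the only one.

```python
def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    matching = [t for t in transactions if antecedent <= t]
    if not matching:
        return 0.0
    return sum(1 for t in matching if consequent <= t) / len(matching)


# Hypothetical market-basket data, split into mined-from and held-out sets.
train = [{"bread", "milk"}, {"bread", "milk", "eggs"}, {"bread"}, {"milk"}]
test = [{"bread", "milk"}, {"bread", "eggs"}, {"bread", "milk"}, {"eggs"}]

# Candidate rule: customers who buy bread also buy milk.
antecedent, consequent = {"bread"}, {"milk"}
min_conf = 0.6  # assumed threshold

train_conf = confidence(train, antecedent, consequent)
test_conf = confidence(test, antecedent, consequent)
holds_on_test = test_conf >= min_conf

print(f"train confidence: {train_conf:.2f}")
print(f"test confidence:  {test_conf:.2f}")
print("valid on test data:", holds_on_test)
```

Criteria (1), (3), and (4) depend on the user and the domain, so they are not captured by a calculation like this one.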
5. Major Issues in Data Mining
i. Mining Methodology
Mining various and new kinds of knowledge (Integrated Clustering)
Mining knowledge in multi-dimensional space (CHG & Data Cube)
Data mining: An interdisciplinary effort (Text & Bug Mining)
Boosting the power of discovery in a networked environment
Handling noise, uncertainty, & incompleteness of data (Cleaning)
Pattern evaluation & pattern- or constraint-guided mining (Beliefs)
ii. User Interaction
Interactive mining (UI, different mining requests, search, OLAP operations)
Incorporation of background knowledge (Domain knowledge)
Ad hoc data mining and data mining query languages (SQL/DMQL)
Presentation and visualization of data mining results (Understandable)
iii. Efficiency and Scalability
Efficiency & scalability of DM algorithms (Time & Performance)
Parallel, distributed, stream, and incremental mining methods
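To make the idea of incremental mining concrete, the sketch below keeps item counts up to date as new batches of transactions arrive, instead of rescanning all earlier data. The class name, its methods, and the sample batches are hypothetical, and single-item support counting stands in for a full mining method.

```python
from collections import Counter


class IncrementalItemCounter:
    """Maintains item support counts that are updated batch by batch."""

    def __init__(self):
        self.counts = Counter()
        self.n_transactions = 0

    def update(self, batch):
        """Fold a new batch of transactions into the running counts."""
        for transaction in batch:
            self.counts.update(set(transaction))  # count each item once per transaction
            self.n_transactions += 1

    def frequent(self, min_support):
        """Return the items whose relative support meets the threshold."""
        if self.n_transactions == 0:
            return set()
        return {
            item
            for item, n in self.counts.items()
            if n / self.n_transactions >= min_support
        }


miner = IncrementalItemCounter()
miner.update([["a", "b"], ["a"]])
first = miner.frequent(0.6)
miner.update([["b"], ["b", "c"]])
second = miner.frequent(0.6)
print(first, second)
```

Only the new batch is read on each update, so the cost per update depends on the batch size and not on the total amount of data seen so far.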
Summary
Data mining: Discovering interesting patterns and knowledge
from massive amounts of data.