Introduction To Data Mining

Uploaded by

shaina45796

Data Mining: Concepts and Techniques

Introduction
• Motivation: Why data mining?
• What is data mining?
• Data mining: On what kind of data?
• Data mining functionality
• Are all the patterns interesting?
• Classification of data mining systems
• Major issues in data mining

Why Data Mining?
• The explosive growth of data: from terabytes to petabytes
  - Data collection and data availability
    - Automated data collection tools, database systems, Web, computerized society
  - Major sources of abundant data
    - Business: Web, e-commerce, transactions, stocks, …
    - Science: remote sensing, bioinformatics, scientific simulation, …
    - Society and everyone: news, digital cameras, …
• We are drowning in data, but starving for knowledge!
• “Necessity is the mother of invention”: data mining is the automated analysis of massive data sets

Evolution of Database Technology
• 1960s:
  - Data collection, database creation, IMS and network DBMS
• 1970s:
  - Relational data model, relational DBMS implementation
• 1980s:
  - RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
  - Application-oriented DBMS (spatial, scientific, engineering, etc.)
• 1990s:
  - Data mining, data warehousing, multimedia databases, and Web databases
• 2000s:
  - Stream data management and mining
  - Data mining and its applications
  - Web technology (XML, data integration) and global information systems

What Is Data Mining?
• Data mining (knowledge discovery from data)
  - Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data
• Alternative name
  - Knowledge discovery in databases (KDD)
• Watch out: Is everything “data mining”? No, for example:
  - Query processing
  - Expert systems or statistical programs

Why Data Mining? Potential Applications
• Data analysis and decision support
  - Market analysis and management
    - Target marketing, customer relationship management (CRM), market basket analysis, market segmentation
  - Risk analysis and management
    - Forecasting, customer retention, quality control, competitive analysis
  - Fraud detection and detection of unusual patterns (outliers)

Why Data Mining? Potential Applications
• Other applications
  - Text mining (newsgroups, email, documents) and Web mining
  - Stream data mining
  - Bioinformatics and bio-data analysis

Market Analysis and Management
• Where does the data come from?
  - Credit card transactions, discount coupons, customer complaint calls
• Target marketing
  - Find clusters of “model” customers who share the same characteristics: interests, income level, spending habits, etc.
  - Determine customer purchasing patterns over time

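Finding clusters of customers who share characteristics can be illustrated with a plain k-means loop. This is only a sketch under stated assumptions: the slides do not name a clustering algorithm, and the customer tuples, the starting centroids, and the helper names `squared_distance` and `kmeans` are invented for illustration.

```python
# Hypothetical customers as (annual income in $1000s, monthly spend in $).
customers = [(30, 200), (32, 220), (28, 180), (90, 900), (95, 950), (88, 870)]

def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Assumed starting centroids: one low-income and one high-income customer.
centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

The two features are on different scales here, so spend dominates the distance. A real analysis would normally rescale the attributes first.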
Market Analysis and Management
• Cross-market analysis
  - Associations/correlations between product sales, and prediction based on such associations
• Customer profiling
  - What types of customers buy what products
• Customer requirement analysis
  - Identify the best products for different customers
  - Predict what factors will attract new customers

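Associations between product sales are commonly quantified with support and confidence. The slides do not define these measures, so the following is a minimal sketch using the standard definitions. The baskets, the item names, and the helper functions `support` and `confidence` are made up for illustration.

```python
# Hypothetical market baskets; the items are illustrative, not from the slides.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A).

    Assumes the antecedent occurs in at least one basket.
    """
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(f"support(bread, milk) = {rule_support:.2f}")          # 0.60
print(f"confidence(bread -> milk) = {rule_confidence:.2f}")  # 0.75
```

Three of the five baskets contain both bread and milk, and four contain bread, which gives a support of 0.60 and a confidence of 0.75 for the rule bread -> milk.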
Fraud Detection & Mining Unusual Patterns
• Approaches: clustering and model construction for frauds, outlier analysis
• Applications: health care, retail, credit card service, telecommunications
  - Medical insurance
    - Professional patients and rings of doctors
    - Unnecessary or correlated screening tests
  - Telecommunications
    - Phone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm
  - Retail industry
    - Analysts estimate that 38% of retail shrink is due to dishonest employees

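One simple way to find patterns that deviate from an expected norm is a z-score test on a single attribute such as call duration. This is a sketch only, since the slides do not specify a method. The durations, the threshold of 2.5, and the function name `flag_outliers` are assumptions.

```python
from statistics import mean, stdev

# Hypothetical call durations in minutes for one subscriber; not real data.
durations = [3.0, 4.5, 2.0, 5.0, 3.5, 4.0, 2.5, 3.0, 95.0, 4.0]

def flag_outliers(values, threshold=2.5):
    """Return (index, value) pairs more than `threshold` sample standard
    deviations away from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

outliers = flag_outliers(durations)
print(outliers)  # [(8, 95.0)]
```

In a sample this small the extreme value inflates the standard deviation itself, which is why the assumed threshold is 2.5 and not 3. More robust measures, for example ones based on the median, are often preferred in practice.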
Data Mining: A KDD Process
• Data mining: core of the knowledge discovery process
[Figure: KDD process flow. Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation]

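The stages of the KDD process can be sketched as a chain of small functions. This is a toy illustration under stated assumptions, not the architecture of any real system: the records, the function names, and the choice of frequency counting as the "mining" step are all hypothetical.

```python
from collections import Counter

# Two hypothetical source "databases"; all records are made up.
store_a = [
    {"customer": "c1", "item": "milk"},
    {"customer": "c2", "item": None},  # incomplete record
    {"customer": "c3", "item": "bread"},
]
store_b = [
    {"customer": "c4", "item": "milk"},
    {"customer": "c5", "item": "milk"},
    {"customer": "c6", "item": "eggs"},
]

def clean(records):
    """Data cleaning: drop records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [r for source in sources for r in source]

def select(records, field):
    """Selection: keep only the task-relevant attribute."""
    return [r[field] for r in records]

def mine(values):
    """Data mining: here, simply count how often each value occurs."""
    return Counter(values)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns that meet a minimum count."""
    return {value: n for value, n in patterns.items() if n >= min_count}

warehouse = integrate(clean(store_a), clean(store_b))
task_relevant = select(warehouse, "item")
patterns = mine(task_relevant)
interesting = evaluate(patterns, min_count=2)
print(interesting)  # {'milk': 3}
```

Each function corresponds to one box in the figure, so a different mining step or a different interestingness test can be swapped in without touching the other stages.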
Architecture: Typical Data Mining System
[Figure: layered architecture, top to bottom. Graphical user interface; Pattern evaluation; Data mining engine; Database or data warehouse server; Data cleaning, data integration, and filtering; Databases and Data Warehouse. A Knowledge base is drawn beside the stack.]

Data Mining: On What Kinds of Data?
• Relational database
• Data warehouse
• Transactional database
• Advanced database and information repository
  - Spatial and temporal data
  - Time-series data
  - Stream data
  - Multimedia database
  - Text databases & WWW

Data Mining: Confluence of Multiple Disciplines
[Figure: Data Mining at the center, drawing on Database Systems, Statistics, Machine Learning, Visualization, Algorithm, and Other Disciplines]

Data Mining: Classification Schemes
• Different views, different classifications
  - Kinds of data to be mined
  - Kinds of knowledge to be discovered
  - Kinds of techniques utilized
  - Kinds of applications adapted

Multi-Dimensional View of Data Mining
• Data to be mined
  - Relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multimedia, heterogeneous, WWW
• Knowledge to be mined
  - Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.
  - Multiple/integrated functions and mining at multiple levels

Multi-Dimensional View of Data Mining
• Techniques utilized
  - Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, etc.
• Applications adapted
  - Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, Web mining, etc.

Major Issues in Data Mining
• Mining methodology
  - Performance: efficiency, effectiveness, and scalability
  - Pattern evaluation: the interestingness problem
  - Incorporation of background knowledge
  - Handling noise and incomplete data
  - Parallel, distributed, and incremental mining methods
  - Integration of the discovered knowledge with existing knowledge: knowledge fusion

Major Issues in Data Mining
• User interaction
  - Data mining query languages and ad-hoc mining
  - Expression and visualization of data mining results
  - Interactive mining of knowledge at multiple levels of abstraction
• Applications and social impacts
  - Domain-specific data mining and invisible data mining
  - Protection of data security, integrity, and privacy

Summary
• Data mining: discovering interesting patterns from large amounts of data
• A natural evolution of database technology, in great demand, with wide applications
• A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
• Mining can be performed in a variety of information repositories
• Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.
• Data mining systems and architectures
• Major issues in data mining
