Introduction To Data Mining


Data Mining: Concepts and Techniques

Introduction

 Motivation: Why data mining?
 What is data mining?
 Data Mining: On what kind of data?
 Data mining functionality
 Are all the patterns interesting?
 Classification of data mining systems
 Major issues in data mining

Why Data Mining?
 The Explosive Growth of Data: from terabytes to petabytes
 Data collection and data availability
   Automated data collection tools, database systems, Web, computerized society
 Major sources of abundant data
   Business: Web, e-commerce, transactions, stocks, …
   Science: Remote sensing, bioinformatics, scientific simulation, …
   Society and everyone: news, digital cameras, …
 We are drowning in data, but starving for knowledge!
 “Necessity is the mother of invention”: data mining is the automated analysis of massive data sets

Evolution of Database Technology
 1960s:
   Data collection, database creation, IMS and network DBMS
 1970s:
   Relational data model, relational DBMS implementation
 1980s:
   RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
   Application-oriented DBMS (spatial, scientific, engineering, etc.)
 1990s:
   Data mining, data warehousing, multimedia databases, and Web databases
 2000s:
   Stream data management and mining
   Data mining and its applications
   Web technology (XML, data integration) and global information systems
What Is Data Mining?

 Data mining (knowledge discovery from data)
   Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data
 Alternative name
   Knowledge discovery in databases (KDD)
 Watch out: is everything “data mining”? No, for example:
   Query processing
   Expert systems or statistical programs
Why Data Mining?—Potential Applications

 Data analysis and decision support
   Market analysis and management
     Target marketing, customer relationship management (CRM), market basket analysis, market segmentation
   Risk analysis and management
     Forecasting, customer retention, quality control, competitive analysis
   Fraud detection and detection of unusual patterns (outliers)
 Other applications
   Text mining (newsgroups, email, documents) and Web mining
   Stream data mining
   Bioinformatics and bio-data analysis

Market Analysis and Management

 Where does the data come from?
   Credit card transactions, discount coupons, customer complaint calls
 Target marketing
   Find clusters of “model” customers who share the same characteristics: interest, income level, spending habits, etc.
   Determine customer purchasing patterns over time
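
Finding clusters of “model” customers can be illustrated with k-means, one common clustering technique (the slides do not name a specific algorithm). This is a minimal sketch under stated assumptions: the `kmeans` helper, the customer figures, and the naive "first k points" initialization are all hypothetical choices made for illustration.

```python
def kmeans(points, k, iterations=10):
    """Tiny k-means: returns (cluster label per point, centroids)."""
    # Assumption: naive initialization using the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical customers: (income in thousands, monthly spend in hundreds).
customers = [(20, 1), (60, 9), (22, 2), (58, 8), (25, 1), (62, 10)]
labels, centroids = kmeans(customers, k=2)
print(labels)
```

Customers that end up with the same label share similar income and spending, which is the kind of group a target-marketing campaign would address together.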
 Cross-market analysis
   Associations/correlations between product sales, and prediction based on such associations
 Customer profiling
   What types of customers buy what products
 Customer requirement analysis
   Identifying the best products for different customers
   Predicting what factors will attract new customers
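
The associations between product sales mentioned under cross-market analysis are commonly measured with support (how often two items occur together) and confidence (how often the second item appears when the first does). The sketch below is illustrative only: the `pair_rules` helper, the baskets, and both thresholds are assumptions, not material from the slides.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5, min_confidence=0.8):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix the pair order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer"],
    ["milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers"],
]
rules = pair_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With these made-up baskets only one rule clears both thresholds, which shows how the thresholds filter many co-occurrences down to a few candidate associations.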
Fraud Detection & Mining Unusual Patterns

 Approaches: clustering and model construction for frauds, outlier analysis
 Applications: health care, retail, credit card service, telecommunications
 Medical insurance
   Professional patients and rings of doctors
   Unnecessary or correlated screening tests
 Telecommunications
   Phone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm
 Retail industry
   Analysts estimate that 38% of retail shrink is due to dishonest employees
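
Analyzing patterns that deviate from an expected norm, as in the phone call model above, can be sketched with a simple z-score test. This is only one of many outlier-analysis techniques, and everything specific here is an assumption: the `flag_unusual` helper, the call durations, and the 2.5 threshold are hypothetical.

```python
from statistics import mean, pstdev


def flag_unusual(values, threshold=2.5):
    """Return indexes of values whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical call durations in minutes for one subscriber.
call_minutes = [3, 4, 5, 4, 3, 5, 4, 4, 3, 120]
suspicious = flag_unusual(call_minutes)
print(suspicious)
```

A single extreme value inflates both the mean and the standard deviation, so in practice a robust measure (for example, one based on the median) may be a better fit for small samples.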
Data Mining: A KDD Process


 Data mining: the core of the knowledge discovery process
[Figure: KDD process flow. Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
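
The stages of the KDD process above can be sketched as a chain of small functions. This is a toy illustration, not a real mining system: the function names, the two store data sets, and the "count frequent items" mining step are hypothetical stand-ins chosen to keep the example short.

```python
from collections import Counter


def integrate(*sources):
    """Data integration: merge several sources into one list of records."""
    merged = []
    for source in sources:
        merged.extend(source)
    return merged


def clean(records):
    """Data cleaning: drop records that have a missing field."""
    return [r for r in records if all(v is not None for v in r.values())]


def select(records, field):
    """Data selection: keep only the task-relevant field."""
    return [r[field] for r in records]


def mine(values):
    """Data mining (toy stand-in): count how often each value occurs."""
    return Counter(values)


def evaluate(patterns, min_count=2):
    """Pattern evaluation: keep only patterns that occur often enough."""
    return {value: count for value, count in patterns.items() if count >= min_count}


# Hypothetical source databases.
store_a = [{"customer": "ann", "item": "milk"}, {"customer": "bob", "item": None}]
store_b = [{"customer": "cy", "item": "milk"}, {"customer": "dee", "item": "tea"}]

patterns = evaluate(mine(select(clean(integrate(store_a, store_b)), "item")))
print(patterns)
```

Each function maps to one box in the process flow, so swapping in a real mining algorithm would only change the `mine` step.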
Architecture: Typical Data Mining System

[Figure: layered architecture, listed from top to bottom]
 Graphical user interface
 Pattern evaluation
 Data mining engine
 Database or data warehouse server
 Data cleaning, data integration, and filtering
 Databases and data warehouse
 Knowledge base (drawn beside the layers)
Data Mining: On What Kinds of Data?
 Relational database
 Data warehouse
 Transactional database
 Advanced database and information repository
   Spatial and temporal data
   Time-series data
   Stream data
   Multimedia database
   Text databases & WWW
Data Mining: Confluence of Multiple Disciplines

[Figure: Data Mining at the center, drawing on the surrounding disciplines]
 Database systems
 Statistics
 Machine learning
 Visualization
 Algorithms
 Other disciplines
Data Mining: Classification Schemes
 Different views, different classifications
   Kinds of data to be mined
   Kinds of knowledge to be discovered
   Kinds of techniques utilized
   Kinds of applications adapted

Multi-Dimensional View of Data Mining
 Data to be mined
   Relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multimedia, heterogeneous, WWW
 Knowledge to be mined
   Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.
   Multiple/integrated functions and mining at multiple levels
 Techniques utilized
   Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, etc.
 Applications adapted
   Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, Web mining, etc.

Major Issues in Data Mining
 Mining methodology
   Performance: efficiency, effectiveness, and scalability
   Pattern evaluation: the interestingness problem
   Incorporation of background knowledge
   Handling noise and incomplete data
   Parallel, distributed, and incremental mining methods
   Integration of the discovered knowledge with existing knowledge: knowledge fusion
 User interaction
   Data mining query languages and ad-hoc mining
   Expression and visualization of data mining results
   Interactive mining of knowledge at multiple levels of abstraction
 Applications and social impacts
   Domain-specific data mining and invisible data mining
   Protection of data security, integrity, and privacy

Summary
 Data mining: discovering interesting patterns from large amounts of data
 A natural evolution of database technology, in great demand, with wide applications
 A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation
 Mining can be performed in a variety of information repositories
 Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.
 Data mining systems and architectures
 Major issues in data mining