
Data Mining: Concepts and Techniques

Data mining involves extracting useful patterns from large amounts of data. It has many potential applications, including market analysis, risk analysis, and fraud detection. The data mining process involves selecting relevant data, cleaning and preprocessing it, applying data mining algorithms to discover patterns, and evaluating the results. Data mining draws upon multiple disciplines like database systems, machine learning, statistics, and visualization.

Uploaded by Hira Baig

Why Data Mining?
• The explosive growth of data: from terabytes to petabytes
    • Data collection and data availability
        • Automated data collection tools, database systems
    • Major sources of abundant data
        • Business: Web, e-commerce, transactions
        • Science: remote sensing, bioinformatics
        • Society and everyone: news, digital cameras
• We are drowning in data, but starving for knowledge!
• Data mining: automated analysis of massive data sets

What Is Data Mining?

• Extraction of interesting (implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data
• Alternative name
    • Knowledge discovery in databases (KDD)

Data Mining—Potential Applications

• Data analysis for decision support
    • Market analysis and management
        • Target marketing, customer relationship management (CRM), market basket analysis, market segmentation
    • Risk analysis and management
        • Forecasting, customer retention, quality control, competitive analysis
    • Fraud detection and detection of unusual patterns (outliers)

Data Mining—Potential Applications

• Other applications
    • Text mining (newsgroups, email, documents) and Web mining
    • Stream data mining
    • Bioinformatics and bio-data analysis

Market Analysis and Management

• Data for data mining (DM)
    • Credit card transactions, discount coupons, customer complaint calls
• Target marketing
    • Find customers who share the same characteristics: interests, income level, spending habits, etc.
    • Determine customer purchasing patterns over time

Market Analysis and Management

• Cross-market analysis
    • Associations/correlations between product sales, and prediction based on such associations
• Customer profiling
    • What types of customers buy what products
• Customer requirement analysis
    • Identify the best products for different customers
    • Predict what factors will attract new customers

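The associations between product sales mentioned above are often summarized with two measures, support and confidence. The following is a minimal sketch, assuming a handful of made-up transactions; the item names, the `pair_rules` helper, and the `min_support` threshold are illustrative assumptions and do not come from the slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions; item names are illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

def pair_rules(transactions, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(transactions)
    # Count how many transactions contain each item and each pair of items.
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules

for antecedent, consequent, support, confidence in pair_rules(transactions):
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

Support is the fraction of transactions containing both items; confidence is the fraction of transactions containing the antecedent that also contain the consequent. Real market basket analysis would extend this to larger itemsets, which this sketch does not attempt.
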
Fraud Detection & Mining Unusual Patterns

• Approaches: clustering and model construction for frauds, outlier analysis
• Applications: health care, retail, credit card service, telecommunications
    • Medical insurance
        • Professional patients and rings of doctors
        • Unnecessary or correlated screening tests
    • Telecommunications
        • Phone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm
    • Retail industry
        • Analysts estimate that 38% of retail shrink is due to dishonest employees

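One simple way to read "patterns that deviate from an expected norm" is a z-score check against a customer's call history. This is a sketch under stated assumptions, not the method used by any particular fraud system: the call durations, the `flag_deviations` helper, and the threshold of 3 standard deviations are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical call durations in minutes for one customer (illustrative data).
history = [3.0, 4.5, 2.5, 5.0, 3.5, 4.0, 3.0, 4.5]
new_calls = [4.0, 3.2, 55.0]

def flag_deviations(history, new_values, threshold=3.0):
    """Return the new values whose z-score against the history exceeds the threshold."""
    mu = mean(history)       # the "expected norm"
    sigma = stdev(history)   # sample standard deviation of past calls
    return [v for v in new_values if abs(v - mu) / sigma > threshold]

print(flag_deviations(history, new_calls))
```

A fuller phone call model would also use the destination and the time of day or week, as the slide notes; this sketch looks at duration only.
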
Other Applications

• Internet Web Surf-Aid
    • IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover customer preference and behavior pages, analyze the effectiveness of Web marketing, improve Web site organization, etc.

Data Mining Process

• Data mining: the core of the knowledge discovery process
• Process flow (figure): Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation

Steps of a KDD Process

• Learning the application domain
    • Relevant prior knowledge and goals of the application
• Creating a target data set: data selection
• Data cleaning and preprocessing (may take 60% of the effort!)
• Data reduction and transformation
    • Find useful features, dimensionality/variable reduction
• Choosing functions of data mining
    • Summarization, classification, regression, association, clustering
• Choosing the mining algorithm(s)
• Data mining: search for patterns of interest
• Pattern evaluation and knowledge presentation
    • Visualization, transformation, removing redundant patterns, etc.
• Use of discovered knowledge

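The steps above can be sketched as a chain of small functions, one per stage. This is only an illustration under assumed data: the records, field names, cutoff, and every function name below are hypothetical, and the "mining" step is plain summarization (counting), the simplest of the functions the slide lists.

```python
from collections import Counter

# Hypothetical raw records; field names and values are illustrative only.
raw_records = [
    {"customer": "a", "region": "north", "amount": 120.0},
    {"customer": "b", "region": "north", "amount": None},  # missing value
    {"customer": "c", "region": "south", "amount": 80.0},
    {"customer": "d", "region": "north", "amount": 200.0},
    {"customer": "e", "region": "south", "amount": 95.0},
    {"customer": "f", "region": "north", "amount": 150.0},
]

def select(records, region):
    """Data selection: keep only the task-relevant records."""
    return [r for r in records if r["region"] == region]

def clean(records):
    """Data cleaning: drop records with a missing amount."""
    return [r for r in records if r["amount"] is not None]

def transform(records, cutoff=130.0):
    """Data reduction and transformation: reduce each amount to a 'high'/'low' label."""
    return ["high" if r["amount"] >= cutoff else "low" for r in records]

def mine(labels):
    """Data mining (here, simple summarization): count each label."""
    return Counter(labels)

def evaluate(counts, min_count=2):
    """Pattern evaluation: keep only labels seen at least min_count times."""
    return {label: n for label, n in counts.items() if n >= min_count}

patterns = evaluate(mine(transform(clean(select(raw_records, "north")))))
print(patterns)
```

Keeping each stage as its own function mirrors the slide's point that mining is one step among several, and that selection and cleaning happen before any algorithm runs.
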
Architecture: Typical Data Mining System

Components shown in the figure, from top to bottom:
• Graphical user interface
• Pattern evaluation
• Data mining engine
• Knowledge base
• Database or data warehouse server
• Data cleaning, data integration, and filtering
• Databases and data warehouse

Data Mining: On What Kinds of Data?

• Relational database
• Data warehouse
• Transactional database
• Advanced database and information repository
    • Spatial and temporal data
    • Time-series data
    • Stream data
    • Multimedia database
    • Text databases and WWW

Data Mining Functionalities
• Cluster analysis
    • Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
    • Maximize intra-class similarity and minimize inter-class similarity
• Outlier analysis
    • Outlier: a data object that does not comply with the general behavior of the data
    • Useful in fraud detection and rare-event analysis
• Trend and evolution analysis
    • Trend and deviation: regression analysis
    • Sequential pattern mining, periodicity analysis

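As an illustration of cluster analysis with no class labels, the following sketch groups a few house locations with plain k-means (Lloyd's algorithm). The slides do not name a clustering algorithm, so k-means is an assumed choice here; the coordinates, the `k_means` helper, and the use of the first k points as initial centers are hypothetical.

```python
from math import dist

# Hypothetical house locations as (x, y) coordinates; illustrative data only.
houses = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

def k_means(points, k, iterations=10):
    """Plain k-means (Lloyd's algorithm) with the first k points as initial centers."""
    centers = list(points[:k])
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old center if a cluster is empty
                centers[i] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centers, clusters

centers, clusters = k_means(houses, k=2)
for center, cluster in zip(centers, clusters):
    print(center, cluster)
```

Each iteration pulls points toward the nearest center and then recomputes the centers, which is one way of raising similarity within a group while keeping the groups apart. The deterministic initialization is chosen only to keep the sketch reproducible.
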
Data Mining: Combination of Multiple Disciplines

Disciplines shown in the figure, surrounding data mining:
• Database systems
• Statistics
• Machine learning
• Visualization
• Algorithm
• Other disciplines
