Data Mining: Concepts & Techniques

Data mining involves extracting useful patterns from large amounts of data. It has become necessary due to the explosion in data collection. Data mining is the core of the knowledge discovery process, which involves data selection, cleaning, transformation, mining, and interpretation. The goal is to discover unknown and potentially useful information and patterns from data.

Uploaded by

Le Putra

Data Mining: Concepts & Techniques

Motivation:
Necessity is the Mother of Invention
Data explosion problem
Automated data collection tools and mature database technology
lead to tremendous amounts of data stored in databases, data
warehouses and other information repositories
We are drowning in data, but starving for knowledge!
Solution: Data warehousing and data mining
Data warehousing and on-line analytical processing
Extraction of interesting knowledge (rules, regularities, patterns,
constraints) from data in large databases

Evolution of Database Technology

What Is Data Mining?


Data mining (knowledge discovery in databases):
Extraction of interesting (non-trivial, implicit, previously
unknown and potentially useful) information or patterns from
data in large databases
Alternative names and their inside stories:
Data mining: a misnomer?
Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data dredging,
information harvesting, business intelligence, etc.
What is not data mining?
(Deductive) query processing.
Expert systems or small ML/statistical programs

Data Mining: A KDD Process

Data mining: the core of the knowledge discovery process

Steps of a KDD Process


Learning the application domain:
relevant prior knowledge and goals of application
Creating a target data set: data selection
Data cleaning and preprocessing: (may take 60% of effort!)
Data reduction and transformation:
Find useful features, dimensionality/variable reduction, invariant
representation.
Choosing functions of data mining
summarization, classification, regression, association, clustering.
Choosing the mining algorithm(s)
Data mining: search for patterns of interest
Pattern evaluation and knowledge presentation
visualization, transformation, removing redundant patterns, etc.

Use of discovered knowledge

Knowledge Discovery Process


The whole process of extraction of implicit, previously unknown and
potentially useful knowledge from a large database
It includes data selection, cleaning, enrichment,
coding, data mining, and reporting
Data Mining is the key stage of Knowledge Discovery
Process
The process of finding the desired information in a large
database

Knowledge Discovery Process


Example: the database of a magazine publisher which sells five
types of magazines on cars, houses, sports, music and comics
Data mining:
Find interesting categorical properties
Questions:
What is the profile of a reader of a car magazine?
Is there any correlation between an interest in cars and an
interest in comics?
The knowledge discovery process consists of six stages

Data Selection
Select the information about people who have subscribed to a
magazine

Cleaning
Pollution: typing errors, people moving from one place to another
without notifying a change of address, and people giving incorrect
information about themselves
Pattern Recognition Algorithms
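
One way to picture this cleaning step is a small de-duplication pass. The sketch below is only an illustration: the record fields (`id`, `name`, `address`), the sample clients and the 0.85 threshold are all assumptions, and the string-similarity score from Python's `difflib` stands in for whichever pattern recognition algorithm is actually used.

```python
from difflib import SequenceMatcher
from itertools import combinations

def normalize(text):
    """Lower-case and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())

def likely_duplicates(records, threshold=0.85):
    """Return pairs of client ids with the same address and very similar names."""
    pairs = []
    for a, b in combinations(records, 2):
        same_address = normalize(a["address"]) == normalize(b["address"])
        score = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
        if same_address and score >= threshold:
            pairs.append((a["id"], b["id"]))
    return pairs

# Made-up sample data: clients 1 and 2 are probably the same person.
clients = [
    {"id": 1, "name": "Johnson", "address": "1 Downing Street"},
    {"id": 2, "name": "Jonson", "address": "1 Downing Street"},
    {"id": 3, "name": "Clinton", "address": "2 Boulevard"},
]
print(likely_duplicates(clients))
```

Flagged pairs would normally be reviewed or merged rather than deleted outright, since a similarity score can also match two different people at the same address.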

Cleaning
Lack of domain consistency
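
A domain-consistency check can be sketched as a set of per-field rules. The field names (`car_owner`, `income`) and the rules themselves are assumptions chosen for illustration; the real rules depend on the database at hand.

```python
def domain_violations(record):
    """Return the names of fields whose values fall outside their assumed domain."""
    problems = []
    # Assumed domain: car ownership is recorded as exactly "yes" or "no".
    if record.get("car_owner") not in ("yes", "no"):
        problems.append("car_owner")
    # Assumed domain: income is a non-negative number.
    income = record.get("income")
    if not isinstance(income, (int, float)) or income < 0:
        problems.append("income")
    return problems

print(domain_violations({"car_owner": "yes", "income": 18500}))
print(domain_violations({"car_owner": "maybe", "income": -5}))
```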

Enrichment
Need extra information about the clients consisting of date of birth,
income, amount of credit, and whether or not an individual owns a
car or a house

Enrichment
The new information needs to be easily joined to the existing
client records
Extract more knowledge
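
Joining the extra information to the client records might look like the sketch below. It assumes both sources share a client identifier (here a hypothetical `id` field) and that the purchased data arrives as a lookup keyed by that identifier; all values are made up.

```python
def enrich(clients, extra_by_id):
    """Left join: keep every client and add extra attributes where available."""
    enriched = []
    for client in clients:
        extra = extra_by_id.get(client["id"], {})
        enriched.append({**client, **extra})
    return enriched

# Made-up sample data.
clients = [{"id": 1, "name": "Johnson"}, {"id": 2, "name": "Clinton"}]
extra_by_id = {1: {"income": 18500, "credit": 17800, "car_owner": "no"}}

for row in enrich(clients, extra_by_id):
    print(row)
```

Clients with no matching extra record are kept unchanged, so the later selection step can decide whether they carry enough information to be of value.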

Coding
We select only those records that have enough information to be
of value (row)
Project the fields in which we are interested (column)
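
The row selection and column projection can be sketched as follows. The field names and the choice of which fields are required are assumptions for illustration.

```python
def select_and_project(records, required, wanted):
    """Keep rows that have every required field, then keep only the wanted columns."""
    complete = [r for r in records if all(r.get(f) is not None for f in required)]
    return [{f: r.get(f) for f in wanted} for r in complete]

# Made-up sample data: the second record lacks an income and is dropped.
records = [
    {"id": 1, "name": "Johnson", "income": 18500, "credit": 17800},
    {"id": 2, "name": "Clinton", "income": None, "credit": 12000},
]
print(select_and_project(records, required=("income", "credit"), wanted=("id", "income", "credit")))
```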

Coding
Code the information which is too detailed
Address to region
Birth date to age
Divide income by 1000
Divide credit by 1000
Convert cars yes-no to 1-0
Convert purchase date to month numbers starting from
1990
The way in which we code the information will
determine the type of patterns we find
Coding has to be performed repeatedly in order to get the best
results
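
The coding rules listed above can be sketched as one function per record. Several details are assumptions: the field names, the city-to-region lookup table, the reference date used to compute age, and the convention that January 1990 is month 1.

```python
from datetime import date

# Hypothetical lookup from city to region number.
REGION_BY_CITY = {"London": 1, "Leeds": 2}

def months_since_1990(d):
    """Month number counted from 1990, assuming January 1990 is month 1."""
    return (d.year - 1990) * 12 + d.month

def code_record(rec, reference_date):
    """Replace overly detailed fields with coarser coded values."""
    born = rec["birth_date"]
    # Subtract one year if the birthday has not yet occurred in the reference year.
    had_birthday = (reference_date.month, reference_date.day) >= (born.month, born.day)
    age = reference_date.year - born.year - (0 if had_birthday else 1)
    return {
        "region": REGION_BY_CITY[rec["city"]],       # address to region
        "age": age,                                    # birth date to age
        "income_k": rec["income"] / 1000,              # divide income by 1000
        "credit_k": rec["credit"] / 1000,              # divide credit by 1000
        "car_owner": 1 if rec["car_owner"] == "yes" else 0,  # yes-no to 1-0
        "purchase_month": months_since_1990(rec["purchase_date"]),
    }

# Made-up sample record.
raw = {
    "city": "London",
    "birth_date": date(1963, 4, 13),
    "income": 18500,
    "credit": 17800,
    "car_owner": "no",
    "purchase_date": date(1994, 4, 15),
}
coded = code_record(raw, reference_date=date(1995, 1, 1))
print(coded)
```

Because the coding determines which patterns can be found, a function like this is typically revised and rerun several times, for example with finer or coarser regions.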

Coding
We are interested in the relationships between readers of
different magazines
Perform flattening operation
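
A flattening operation can be sketched as turning one row per subscription into one row per customer, with a 0/1 field for each magazine type. The input layout (pairs of client id and magazine) is an assumption; the five magazine types come from the publisher example.

```python
MAGAZINES = ("car", "house", "sports", "music", "comics")

def flatten(subscriptions):
    """Build one record per client with a 0/1 flag for every magazine type."""
    rows = {}
    for client_id, magazine in subscriptions:
        row = rows.setdefault(client_id, {m: 0 for m in MAGAZINES})
        row[magazine] = 1
    return rows

# Made-up sample data: one (client id, magazine) pair per subscription.
subscriptions = [(1, "car"), (1, "comics"), (2, "house")]
for client_id, row in flatten(subscriptions).items():
    print(client_id, row)
```

With all magazines of a customer on one row, relationships between readers of different magazines become comparisons between columns.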

Data mining
We may find the following rules
A customer with credit > 13000 and aged between 22 and 31 who
has subscribed to a comics magazine at time T will very likely
subscribe to a car magazine five years later
The number of house magazines sold to customers with credit
between 12000 and 31000 living in region 4 is increasing
A customer with credit between 5000 and 10000 who reads a
comics magazine will very likely become a customer with
credit between 12000 and 31000 who reads a sports and a house
magazine after 12 years
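
How strongly the data backs a candidate rule can be estimated with support and confidence. These are standard association-rule measures that the slides do not name, so treat this as a swapped-in illustration; it also ignores the time dimension of the rules above and uses made-up flattened rows.

```python
def rule_stats(rows, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    total = len(rows)
    with_antecedent = sum(1 for r in rows if r[antecedent])
    with_both = sum(1 for r in rows if r[antecedent] and r[consequent])
    support = with_both / total if total else 0.0
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence

# Made-up flattened rows: 1 means the customer subscribes to that magazine.
rows = [
    {"comics": 1, "car": 1},
    {"comics": 1, "car": 1},
    {"comics": 1, "car": 0},
    {"comics": 0, "car": 1},
]
support, confidence = rule_stats(rows, "comics", "car")
print(f"comics -> car: support={support:.2f}, confidence={confidence:.2f}")
```

Here the rule holds for two of the four customers (support 0.5) and for two of the three comics readers (confidence about 0.67).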

Knowledge Discovery Process

Business-Question-Driven Process

Data Mining and Business Intelligence

[Figure: layered pyramid; the potential to support business decisions
increases toward the top. Layers from top to bottom:]
Making Decisions
Data Presentation: Visualization Techniques
Data Mining: Information Discovery
Data Exploration: Statistical Analysis, Querying and Reporting
Data Warehouses / Data Marts: OLAP, MDA
Data Sources: Paper, Files, Information Providers, Database Systems,
OLTP
[Roles shown alongside the layers: End User, Business Analyst,
Data Analyst, DBA]

Architecture of a Typical Data Mining System

Data Mining: On What Kind of Data?


Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories
Object-oriented and object-relational databases
Spatial databases
Time-series data and temporal data
Text databases and multimedia databases
Heterogeneous databases
WWW
