Data Mining Fundamentals
Data analysis
Most companies own huge databases containing
  operational data
  textual documents
  experiment results
These databases are a potential source of useful information
Data Base and Data Mining Group of Politecnico di Torino
Elena Baralis
Politecnico di Torino
Extracted information is represented by means of abstract models, denoted as patterns
[Figure: disk space (TB) since 1995 compared with the number of analysts, 1995-1999]
From R. Grossman, C. Kamath, V. Kumar, Data
Mining for Scientific and Engineering Applications
Classification
Objectives
  prediction of a class label
  definition of an interpretable model of a given phenomenon
Approaches
  decision trees
  Bayesian classification
  classification rules
  neural networks
  k-nearest neighbours
  SVM
[Diagram: training data -> model]
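As an illustration of how training data yields a model that predicts a class label, here is a minimal sketch of the k-nearest neighbours approach from the list above. The function name `knn_predict`, the toy records, and the choice of Euclidean distance with majority voting are assumptions made for this example, not material from the slides.

```python
from collections import Counter
import math


def knn_predict(training_data, query, k=3):
    """Predict the class label of `query` by majority vote among
    its k nearest training records (Euclidean distance)."""
    # Sort training records by distance to the query and keep the k closest.
    neighbours = sorted(
        training_data, key=lambda rec: math.dist(rec[0], query)
    )[:k]
    # Majority vote over the class labels of those neighbours.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (feature vector, class label) pairs.
training_data = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.2), "high"),
    ((5.1, 4.9), "high"),
    ((4.8, 5.0), "high"),
]

print(knn_predict(training_data, (1.1, 1.0)))  # prints "low"
print(knn_predict(training_data, (5.0, 5.0)))  # prints "high"
```

In this sketch the "model" is simply the stored training data. The other listed approaches, such as decision trees, instead build an explicit and more interpretable structure during training.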
Classification
Requirements
  accuracy
  interpretability
  scalability
  noise and outlier management
Applications
  detection of customer propensity to leave a company (churn or attrition)
  fraud detection
  classification of different pathology types
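The accuracy requirement is usually checked by comparing predicted class labels with known labels on records that were not used for training. The sketch below shows one way to compute it. The helper names `accuracy` and `confusion_counts` and the churn labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of records whose predicted class equals the true class."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def confusion_counts(true_labels, predicted_labels):
    """Count how often each (true class, predicted class) pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


# Hypothetical held-out labels for a churn classifier.
true_labels = ["churn", "stay", "stay", "churn", "stay", "stay", "stay", "churn"]
predicted_labels = ["churn", "stay", "stay", "stay", "stay", "churn", "stay", "churn"]

acc = accuracy(true_labels, predicted_labels)
counts = confusion_counts(true_labels, predicted_labels)

print(acc)                        # prints 0.75
print(counts[("churn", "stay")])  # prints 1 (a churner predicted as staying)
```

The per-pair counts matter in applications such as churn or fraud detection, where the class of interest is rare and a single accuracy figure can hide most of the missed cases.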
Clustering
Objectives
  detecting groups of similar data objects
  identifying exceptions and outliers
Approaches
  partitional (K-means)
  hierarchical
  density-based (DBSCAN)
  SOM
Requirements
  scalability
  management of
    noise and outliers
    large dimensionality
  interpretability
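To make the partitional approach concrete, the following is a minimal sketch of K-means on two-dimensional points. It assumes Euclidean distance, a fixed number of iterations, and the first k points as initial centroids. These choices keep the example deterministic and are not prescribed by the slides.

```python
import math


def kmeans(points, k, iterations=10):
    """Group points into k clusters by alternating assignment and update steps."""
    # Assumption: the first k points act as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, assignment


# Hypothetical data: two visibly separated groups of points.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (4.8, 5.2), (0.9, 1.1), (5.1, 4.9)]
centroids, assignment = kmeans(points, k=2)

print(assignment)  # prints [0, 1, 0, 1, 0, 1]
```

Because every point is forced into some cluster and pulls its centroid toward itself, plain K-means is sensitive to noise and outliers. This is one reason the density-based approaches in the list are used when identifying exceptions is the goal.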
Open issues