3analysing Important Trend
Concepts and
Techniques
(3rd ed.)
— Chapter 13 —
Chapter 13: Data Mining Trends
and Research Frontiers
Major Statistical Data Mining
Methods
Regression
Generalized Linear Model
Analysis of Variance
Mixed-Effect Models
Factor Analysis
Discriminant Analysis
Survival Analysis
Statistical Data Mining (1)
Scientific and Statistical Data Mining
(2)
Regression trees
Binary trees used for classification
and prediction
Similar to decision trees: tests are
performed at the internal nodes
In a regression tree the mean of
the objective attribute is computed
and used as the predicted value
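A minimal sketch of the leaf-mean idea, assuming a single numeric input attribute and a tree with only one split (a "stump"). The function names `fit_stump` and `predict` and the sample data are hypothetical, and the split criterion used here (minimum sum of squared errors) is a common choice that the slide itself does not specify.

```python
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single numeric attribute.

    Assumes xs contains at least two distinct values.
    """
    best = None
    values = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_mean, right_mean = mean(left), mean(right)
        # Sum of squared errors when each leaf predicts its own mean.
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean

def predict(stump, x):
    """Test at the internal node, then return the mean stored in the leaf."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

# Hypothetical training data: objective attribute ys against attribute xs.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
stump = fit_stump(xs, ys)
```

Calling `predict(stump, 2.5)` returns the mean of the objective attribute over the training records that fall into the same leaf. A full regression tree would apply the same split search recursively to each side.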
Analysis of variance
Analyze experimental data for two
or more populations described by
a numeric response variable and
one or more categorical variables
(factors)
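As an illustration of the simplest case (one categorical factor), the sketch below computes the one-way ANOVA F statistic from scratch. The function name `one_way_anova_f` and the sample groups are hypothetical. Turning F into a p-value requires the F distribution, which is left out here.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic.

    groups: one list of numeric responses per level of the factor.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of factor levels (populations)
    n = len(all_values)    # total number of observations
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical responses for three populations.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat = one_way_anova_f(groups)
```

A large F indicates that the group means differ by more than the within-group spread would suggest.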
Statistical Data Mining (4)
Factor analysis
determine which variables are
Survival analysis
Predicts the probability that a patient undergoing a medical
treatment would survive at least to time t (life span prediction)
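One common way to estimate such a survival probability is the Kaplan-Meier estimator. The slide does not name a specific method, so using it here is an assumption, and the function name `kaplan_meier` and the sample data are hypothetical. Note that the estimator gives the probability of surviving beyond each event time.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] for each distinct event time t.

    durations: follow-up time for each patient
    observed:  True if death was observed at that time,
               False if the patient was censored (lost to follow-up)
    """
    event_times = sorted({t for t, d in zip(durations, observed) if d})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in durations if u >= t)
        deaths = sum(1 for u, d in zip(durations, observed) if u == t and d)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical follow-up data for five patients.
durations = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
```

Censored patients count as "at risk" until they leave the study, which is what lets the estimator use incomplete follow-up records.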
Other Methodologies of Data
Mining
Statistical Data Mining
Views on Data Mining Foundations
Visual and Audio Data Mining
Views on Data Mining Foundations (I)
Data reduction
Basis of data mining: Reduce data
representation
Trades accuracy for speed in response
Data compression
Basis of data mining: Compress the given data
by encoding in terms of bits, association rules,
decision trees, clusters, etc.
Probability and statistical theory
Basis of data mining: Discover joint probability
distributions of random variables
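To make the probabilistic view concrete, the sketch below estimates an empirical joint distribution of two categorical variables by counting, then derives a marginal from it. The function name `joint_distribution` and the records are hypothetical, and relative-frequency estimation is only the simplest possible approach.

```python
from collections import Counter

def joint_distribution(records):
    """Estimate P(X = x, Y = y) by relative frequency."""
    counts = Counter(records)
    total = sum(counts.values())
    return {outcome: c / total for outcome, c in counts.items()}

# Hypothetical records of (weather, purchase) pairs.
records = [
    ("sunny", "no"),
    ("sunny", "no"),
    ("rainy", "yes"),
    ("rainy", "no"),
]
joint = joint_distribution(records)

# Marginal P(weather = rainy), obtained by summing over the second variable.
p_rainy = sum(p for (weather, _), p in joint.items() if weather == "rainy")
```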
Views on Data Mining Foundations (II)
Microeconomic view
A view of utility: Patterns are interesting only to the extent
that they can be used in the decision-making process of some
enterprise
Pattern Discovery and Inductive databases
Basis of data mining: Discover patterns occurring in the
database, such as associations, classification models,
sequential patterns, etc.
Data mining is the problem of performing inductive logic
on databases
The task is to query the data and the theory (i.e., patterns)
of the database
Popular among many researchers in database systems
Visual Data Mining
Visualization: Use of computer graphics to create
visual images which aid in the understanding of
complex, often massive representations of data
Visual Data Mining: discovering implicit but useful
knowledge from large data sets using visualization
techniques
[Figure: Visual Data Mining and related fields: Multimedia Systems, Computer Graphics, Human Computer Interfaces, High Performance Computing, Pattern Recognition]
Visualization
Purpose of Visualization
Gain insight into an information space by
mapping data onto graphical primitives
Provide qualitative overview of large data sets
Search for patterns, trends, structure,
irregularities, relationships among data.
Help find interesting regions and suitable
parameters for further quantitative analysis.
Provide a visual proof of computer
representations derived
Visual Data Mining & Data
Visualization
Visualization of Data Mining Results
in SAS Enterprise Miner: Scatter Plots
Visualization of Association
Rules in SGI/MineSet 3.0
Visualization of a Decision Tree in
SGI/MineSet 3.0
Visualization of Cluster Grouping in
IBM Intelligent Miner
Data Mining Process Visualization
Visualization of Data Mining
Processes by Clementine
Understand
variations with
visualized data
Interactive Visual Data Mining
Perception-Based Classification
(PBC)
Audio Data Mining
Data Mining in Science and Engineering
Data Mining for Financial Data
Analysis (I)
Financial data collected in banks and financial
institutions are often relatively complete, reliable, and
of high quality
Design and construction of data warehouses for
multidimensional data analysis and data mining
View the debt and revenue changes by month, by
region, by sector, and by other factors
Access statistical information such as max, min,
total, average, trend, etc.
Loan payment prediction/consumer credit policy
analysis
feature selection and attribute relevance ranking
using ensemble)
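The multidimensional view described above (debt and revenue by month, by region, by sector, with statistics such as max, min, total, and average) can be sketched as a small group-by over flat records. The record layout, the values, and the `summarize` helper are all hypothetical. A real data warehouse would precompute such aggregates in a data cube.

```python
from collections import defaultdict

# Hypothetical records: (month, region, sector, revenue).
records = [
    ("2024-01", "East", "retail", 120.0),
    ("2024-01", "East", "energy", 80.0),
    ("2024-01", "West", "retail", 50.0),
    ("2024-02", "East", "retail", 150.0),
]

def summarize(records, key_indexes):
    """Group the measure (last field) by the chosen dimension fields."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[i] for i in key_indexes)
        groups[key].append(rec[-1])
    return {
        key: {
            "max": max(v),
            "min": min(v),
            "total": sum(v),
            "average": sum(v) / len(v),
        }
        for key, v in groups.items()
    }

by_month_region = summarize(records, key_indexes=(0, 1))  # finer view
by_region = summarize(records, key_indexes=(1,))          # rolled-up view
```

Changing `key_indexes` corresponds to viewing the same data along different dimensions, for example rolling up from (month, region) to region alone.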
Privacy, Security and Social Impacts of
Data Mining
Many data mining applications do not touch personal data
E.g., meteorology, astronomy, geography, geology, biology, and
other scientific and engineering data
Many data mining studies develop scalable algorithms that find
general or statistically significant patterns, not information about individuals
The real privacy concern: unconstrained access to individual
records, especially privacy-sensitive information
Method 1: Removing sensitive IDs associated with the data
Method 2: Data security-enhancing methods
Multi-level security model: permit access only at the authorized
level
Encryption: e.g., blind signatures, biometric encryption, and
anonymous databases (personal information is encrypted and
stored at different locations)
Method 3: Privacy-preserving data mining methods
Privacy-Preserving Data Mining
Privacy-preserving (privacy-enhanced or privacy-sensitive)
mining:
Obtaining valid mining results without disclosing the
underlying sensitive data values
Often requires a trade-off between information loss and privacy
Privacy-preserving data mining methods:
Randomization (e.g., perturbation): Add noise to the data in
order to mask some attribute values of records
K-anonymity and l-diversity: Alter individual records so that
they cannot be uniquely identified
k-anonymity: each record is indistinguishable from at least k - 1 other records on its identifying attributes
l-diversity: enforcing intra-group diversity of sensitive values
Distributed privacy preservation: Data partitioned and
distributed either horizontally, vertically, or a combination of
both
Downgrading the effectiveness of data mining: The output of
data mining may violate privacy
Modify data or mining results, e.g., hiding some association rules or
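Two of the methods listed above, randomization and k-anonymity with l-diversity, can be sketched in a few lines. This is a minimal illustration under stated assumptions: Gaussian noise is one possible perturbation, k-anonymity is checked in the common form "every combination of quasi-identifier values is shared by at least k records", and l-diversity is checked in its simplest "distinct values" form. All function names, attribute names, and data are hypothetical.

```python
import random
from collections import Counter, defaultdict
from statistics import mean

def perturb(values, sigma, seed=0):
    """Randomization: add zero-mean Gaussian noise to each value."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]

def group_key(row, quasi_ids):
    """Combination of quasi-identifier values that defines a group."""
    return tuple(row[q] for q in quasi_ids)

def is_k_anonymous(rows, quasi_ids, k):
    """True if every quasi-identifier group contains at least k records."""
    counts = Counter(group_key(r, quasi_ids) for r in rows)
    return all(c >= k for c in counts.values())

def is_l_diverse(rows, quasi_ids, sensitive, ell):
    """True if every group holds at least ell distinct sensitive values."""
    seen = defaultdict(set)
    for r in rows:
        seen[group_key(r, quasi_ids)].add(r[sensitive])
    return all(len(values) >= ell for values in seen.values())

# Hypothetical numeric attribute to be masked by noise.
rng = random.Random(42)
ages = [rng.randint(20, 70) for _ in range(1000)]
noisy_ages = perturb(ages, sigma=5.0)

# Hypothetical table whose quasi-identifiers are already generalized.
rows = [
    {"age": "30-39", "zip": "476**", "disease": "flu"},
    {"age": "30-39", "zip": "476**", "disease": "asthma"},
    {"age": "40-49", "zip": "479**", "disease": "flu"},
    {"age": "40-49", "zip": "479**", "disease": "flu"},
]
two_anonymous = is_k_anonymous(rows, ["age", "zip"], k=2)
two_diverse = is_l_diverse(rows, ["age", "zip"], "disease", ell=2)
```

The noisy ages mask individual values while their mean stays close to the original mean, which illustrates the trade-off between information loss and privacy. The sample table satisfies 2-anonymity but not 2-diversity, because every record in the second group carries the same sensitive value.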