Data Mining Concepts - Binary
WHAT IS DATAMINING?
Definition:
It is the computational process of discovering patterns in large data sets, involving methods at the intersection of fields including artificial intelligence, machine learning, and statistics.
WHAT IS DATAMINING?
Data mining (knowledge discovery from data)
Extraction of interesting (non-trivial, implicit,
automatic or semi-automatic means, of large quantities of data in order to discover meaningful
WHY DATAMINING?
Credit ratings/targeted marketing:
Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
Identify likely responders to sales promotions.
Fraud detection:
Which types of transactions are likely to be fraudulent?
information
DATAMINING VS KDD
Knowledge Discovery in Databases (KDD): the process of finding useful information and patterns in data.
Data Mining: the use of algorithms to extract the information and patterns derived by the KDD process.
KNOWLEDGE DISCOVERY PROCESS
Data mining: the core of the knowledge discovery process.
[Figure: the knowledge discovery process. Labels shown: Databases, Data Cleaning, Data Integration, Preprocessed Data, Selection, Data transformations, Task-relevant Data, Data Mining, Interpretation, Knowledge.]
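A minimal sketch of how the stages named on this slide could be chained, assuming the order cleaning, integration, selection, transformation, mining. The records, field names, and the toy "mining" step (average spend per age band) are hypothetical and only illustrate the flow from databases to knowledge.

```python
# Hypothetical raw records from two sources (all values are illustrative).
crm_rows = [
    {"id": 1, "name": " Alice ", "age": "34"},
    {"id": 2, "name": "Bob", "age": None},  # missing age
    {"id": 3, "name": "Carol", "age": "51"},
]
sales_rows = [
    {"id": 1, "spend": 120.0},
    {"id": 3, "spend": 80.0},
    {"id": 3, "spend": 40.0},
]

def clean(rows):
    """Data cleaning: drop records with a missing age, trim names, cast age to int."""
    return [
        {"id": r["id"], "name": r["name"].strip(), "age": int(r["age"])}
        for r in rows
        if r["age"] is not None
    ]

def integrate(customers, sales):
    """Data integration: attach the total spend per customer id."""
    totals = {}
    for s in sales:
        totals[s["id"]] = totals.get(s["id"], 0.0) + s["spend"]
    return [dict(c, spend=totals.get(c["id"], 0.0)) for c in customers]

def select(rows):
    """Selection: keep only the task-relevant attributes."""
    return [{"age": r["age"], "spend": r["spend"]} for r in rows]

def transform(rows):
    """Data transformation: bucket age into ten-year bands."""
    return [{"age_band": r["age"] // 10 * 10, "spend": r["spend"]} for r in rows]

def mine(rows):
    """Data mining (toy stand-in): average spend per age band."""
    sums, counts = {}, {}
    for r in rows:
        band = r["age_band"]
        sums[band] = sums.get(band, 0.0) + r["spend"]
        counts[band] = counts.get(band, 0) + 1
    return {band: sums[band] / counts[band] for band in sums}

knowledge = mine(transform(select(integrate(clean(crm_rows), sales_rows))))
print(knowledge)
```

Each stage is a plain function, so a stage can be swapped or reordered without touching the others.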
GOALS
PREDICTION - Data mining can show how certain
BASIC OPERATION
ASSOCIATION
BASIC OPERATION
SEQUENTIAL PATTERNS - A sequence
ASSOCIATION RULES
An association algorithm creates
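Association rules are commonly scored by support (how often the items occur together) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both for one rule. The basket data and the helper names `count`, `support`, and `confidence` are hypothetical, not taken from the slides.

```python
# Hypothetical market-basket transactions (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    antecedent_count = count(antecedent, transactions)
    if antecedent_count == 0:
        return 0.0  # the rule never fires, so report zero confidence
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / antecedent_count

# Rule under test: {diapers} -> {beer}
rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
print(rule_support, rule_confidence)
```

An association algorithm would typically keep only the rules whose support and confidence exceed chosen thresholds.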
CLASSIFICATION
Given old data about customers and
[Figure: a Classifier turns old customer data into Decision rules (Salary > 5 L, Prof. = Exec), which are applied to New applicants' data to label each applicant Good/bad.]
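The two decision rules in the figure can be applied to a new applicant as in the sketch below. Two assumptions are made here that the slide does not state: "5 L" is read as 5 lakh (500,000), and either rule alone is treated as enough for a "good" label. The `Applicant` type and `classify` function are hypothetical names.

```python
from dataclasses import dataclass

LAKH = 100_000  # assumption: "5 L" on the slide means 5 lakh

@dataclass
class Applicant:
    salary: float
    profession: str

def classify(applicant):
    """Apply the slide's two rules; combining them with OR is an assumption."""
    if applicant.salary > 5 * LAKH or applicant.profession == "Exec":
        return "good"
    return "bad"

new_applicants = [
    Applicant(salary=600_000, profession="Clerk"),
    Applicant(salary=300_000, profession="Exec"),
    Applicant(salary=300_000, profession="Clerk"),
]
labels = [classify(a) for a in new_applicants]
print(labels)
```

In practice the rules would be learned by the classifier from the old customer data rather than written by hand.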
CLASSIFICATION
Decision Tree Method
[Figure: decision tree for credit risk. Root test: Married (no / yes). Further tests: Salary (<20k, >=20k, >=50k), Accnt Bal (<5k, >=5k), and Age (<25, >=25). Leaves: poor risk, fair risk, good risk.]
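A decision tree of this kind reads as nested conditions. The sketch below is one possible reading of the figure, with assumptions stated plainly: married applicants are tested on Salary, unmarried applicants on Accnt Bal and then Age, and the Age split is taken as 25 years. The extracted slide does not make the branch layout certain, and the function name `credit_risk` is hypothetical.

```python
def credit_risk(married, salary, account_balance, age):
    """Walk an assumed version of the slide's tree and return the leaf label."""
    if married:
        # Assumed branch: married applicants are split on salary.
        if salary < 20_000:
            return "poor risk"
        if salary < 50_000:
            return "fair risk"
        return "good risk"
    # Assumed branch: unmarried applicants are split on account balance, then age.
    if account_balance < 5_000:
        return "poor risk"
    if age < 25:
        return "fair risk"
    return "good risk"

print(credit_risk(married=True, salary=30_000, account_balance=0, age=40))
print(credit_risk(married=False, salary=0, account_balance=8_000, age=22))
```

Each root-to-leaf path corresponds to one decision rule, which is why trees and rule sets are often used interchangeably for classification.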
SEQUENTIAL PATTERNS
Given is a set of objects, with each object
(A B)   (C)   (D E)
Timing constraints: > ng, <= ws, <= ms
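A minimal sketch of checking whether an object's event timeline contains the pattern (A B) (C) (D E): each pattern element must appear, in order, inside a distinct event. The timing constraints (ng, ws, ms) are deliberately not modeled, and the timelines and the name `contains_pattern` are hypothetical.

```python
def contains_pattern(sequence, pattern):
    """True if every pattern element is a subset of a distinct event, in order.

    Greedy earliest matching is sufficient when no timing constraints apply.
    """
    pos = 0
    for element in pattern:
        # Advance to the first remaining event that contains this element.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next element must match a strictly later event
    return True

pattern = [{"A", "B"}, {"C"}, {"D", "E"}]

# Hypothetical timelines: one list of event item sets per object.
timelines = {
    "obj1": [{"A", "B"}, {"C"}, {"D", "E"}],
    "obj2": [{"A", "B", "F"}, {"G"}, {"C"}, {"D", "E"}],
    "obj3": [{"C"}, {"A", "B"}, {"D", "E"}],
    "obj4": [{"A"}, {"B"}, {"C"}, {"D", "E"}],
}

matches = {name: contains_pattern(seq, pattern) for name, seq in timelines.items()}
pattern_support = sum(matches.values()) / len(timelines)
print(matches, pattern_support)
```

Here obj3 fails because (C) occurs before (A B), and obj4 fails because A and B never occur in the same event.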
SEQUENTIAL PATTERN: EXAMPLE
In point-of-sale transaction sequences,
Computer Bookstore:
CLUSTERING
Clustering algorithms find groups of
CLUSTERING
Group Data into Clusters
Similar data is grouped in the same cluster
Dissimilar data is grouped in different clusters
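One common way to form such groups is k-means, which the slides do not name; it is used here only as an illustration. The sketch below runs it on one-dimensional data: each point joins its nearest center, then each center moves to the mean of its points. The data, the starting centers, and the name `kmeans_1d` are hypothetical.

```python
def kmeans_1d(points, centers, iterations=10):
    """Basic k-means on 1-D data: assign to nearest center, then recompute means."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(points, centers=[1.0, 12.0])
print(centers, clusters)
```

Similar values end up in the same cluster and dissimilar values in different ones, which matches the two statements above.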
How is this achieved?
K-Nearest Neighbor
A classification method that classifies a point by calculating the distances between the point and points in the training data set. Then it assigns the point to the class that is most common among its k nearest neighbors.
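A minimal sketch of the k-nearest-neighbor procedure described here, assuming Euclidean distance and a simple majority vote. The training points, the labels, and the name `knn_classify` are hypothetical.

```python
import math
from collections import Counter

def knn_classify(point, training, k=3):
    """Label a point with the most common class among its k nearest training points."""
    # Sort training examples by Euclidean distance to the query point.
    neighbors = sorted(training, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training set of (coordinates, class label) pairs.
training = [
    ((1.0, 1.0), "good"),
    ((1.0, 2.0), "good"),
    ((2.0, 1.0), "good"),
    ((8.0, 8.0), "bad"),
    ((9.0, 8.0), "bad"),
    ((8.0, 9.0), "bad"),
]

print(knn_classify((2.0, 2.0), training))
print(knn_classify((8.5, 8.5), training))
```

The choice of k and of the distance measure are tuning decisions; an odd k avoids tied votes when there are two classes.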
APPLICATIONS OF DATAMINING
Banking: loan/credit card approval
Customer relationship management:
competitor.
Targeted marketing:
Fraud detection: telecommunications, financial transactions
From an online stream of events, identify fraudulent events
APPLICATIONS OF DATAMINING
Medicine: disease outcome, effectiveness of treatments
Analyze patient disease history: find relationships between diseases
Molecular/Pharmaceutical: identify new drugs
Scientific data analysis: identify new galaxies by searching for sub-clusters
Web site/store design and promotion:
knowledge
New knowledge can be used to improve services or products
Improvements lead to:
Bigger profits
More efficient service
THANK YOU & GOD BLESS!