01-Introduction To Data Mining
[Figure: layered diagram of increasing potential to support business decisions. Layers recovered from the figure: Decision Making (End User); Data Exploration (Statistical Summary, Querying, and Reporting).]
• Description Methods
▫ Find human-interpretable patterns that describe
the data.
Data Mining Tasks (contd.)
• Classification [Predictive]
• Clustering [Descriptive]
• Association Rule Discovery [Descriptive]
• Sequential Pattern Discovery [Descriptive]
• Regression [Predictive]
• Deviation Detection [Predictive]
1. Classification (Definition)
• Given a collection of records (training set)
▫ Each record contains a set of attributes; one of the
attributes is the class.
Classification (application-1)
• Direct Marketing
▫ Goal: Reduce cost of mailing by targeting a set of
consumers likely to buy a new cell-phone product.
▫ Approach:
  - Use the data for a similar product introduced before.
  - We know which customers decided to buy and which decided
    otherwise. This {buy, don’t buy} decision forms the class
    attribute.
  - Collect various demographic, lifestyle, and
    company-interaction related information about all such
    customers, e.g. type of business, where they stay, how much
    they earn.
  - Use this information as input attributes to learn a classifier
    model.
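
The approach above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two input attributes (income and age), the six records, and the nearest-centroid classifier are all invented for the example; the slides do not say which attributes or which learning algorithm to use.

```python
from math import dist

# Hypothetical records from the earlier campaign:
# (income in $1000s, age) plus the class attribute.
training_set = [
    ((85.0, 29.0), "buy"),
    ((90.0, 35.0), "buy"),
    ((78.0, 31.0), "buy"),
    ((30.0, 52.0), "don't buy"),
    ((25.0, 61.0), "don't buy"),
    ((35.0, 47.0), "don't buy"),
]

def learn_centroids(records):
    """Return {class label: mean attribute vector} for the training records."""
    sums, counts = {}, {}
    for attributes, label in records:
        totals = sums.setdefault(label, [0.0] * len(attributes))
        for i, value in enumerate(attributes):
            totals[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(t / counts[label] for t in totals)
            for label, totals in sums.items()}

def classify(model, attributes):
    """Assign the class whose centroid is nearest in Euclidean distance."""
    return min(model, key=lambda label: dist(model[label], attributes))

model = learn_centroids(training_set)
# Score a prospective customer before deciding whether to mail them.
prediction = classify(model, (80.0, 33.0))
```

Only customers predicted as "buy" would be mailed, which is how the classifier reduces mailing cost.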
Classification (application-2)
• Fraud Detection
▫ Goal: Predict fraudulent cases in credit card
transactions.
▫ Approach:
  - Use credit card transactions and the information on the
    account holder as attributes, e.g. when a customer buys,
    what they buy, how often they pay on time.
  - Label past transactions as fraud or fair transactions. This
    forms the class attribute.
  - Learn a model for the class of the transactions.
  - Use this model to detect fraud by observing credit card
    transactions on an account.
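
As a rough sketch of the last two steps, the snippet below labels a handful of past transactions and then scores incoming ones with a k-nearest-neighbour vote. The attributes (amount and number of transactions in the past 24 hours), the data, and the choice of k-nearest neighbours are assumptions made for illustration, not something the slides prescribe.

```python
from collections import Counter
from math import dist

# Hypothetical labeled history:
# (amount in dollars, transactions on the account in the past 24 h).
labeled = [
    ((25.0, 1.0), "fair"),
    ((40.0, 2.0), "fair"),
    ((18.0, 1.0), "fair"),
    ((60.0, 3.0), "fair"),
    ((900.0, 9.0), "fraud"),
    ((750.0, 12.0), "fraud"),
    ((820.0, 8.0), "fraud"),
]

def knn_predict(history, attributes, k=3):
    """Return the majority class among the k labeled records nearest to attributes."""
    nearest = sorted(history, key=lambda record: dist(record[0], attributes))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Observe new transactions on an account and flag each one.
incoming = [(30.0, 2.0), (880.0, 10.0)]
flags = [knn_predict(labeled, transaction) for transaction in incoming]
```

In practice the attributes would need to be put on comparable scales first, since the dollar amount dominates the distance here.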
Classification (application-3)
• Customer Attrition/Churn:
▫ Goal: To predict whether a customer is likely to be
lost to a competitor.
▫ Approach:
  - Use detailed records of transactions with each of the
    past and present customers to find attributes, e.g. how
    often the customer calls, where they call, what time of
    day they call most, their financial status, marital status.
  - Label the customers as loyal or disloyal.
  - Find a model for loyalty.
2. Clustering (definition)
• Given a set of data points, each having a set of
attributes, and a similarity measure among
them, find clusters such that
▫ Data points in one cluster are more similar to one
another.
▫ Data points in separate clusters are less similar to
one another.
• Similarity Measures:
▫ Euclidean Distance
▫ Cosine similarity, etc.
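
Both measures can be written directly from their standard definitions. The function names below are chosen for this sketch only.

```python
from math import sqrt

def euclidean_distance(p, q):
    """Straight-line distance between two points; smaller means more similar."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cosine_similarity(p, q):
    """Cosine of the angle between two vectors; 1 means same direction.

    Undefined when either vector is all zeros (division by zero).
    """
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = sqrt(sum(a * a for a in p))
    norm_q = sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)
```

Euclidean distance is sensitive to magnitude, while cosine similarity compares direction only: (1, 2) and (2, 4) are a distance apart but have cosine similarity 1.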
Illustration of clustering
• Euclidean distance-based clustering in 3-D space.
(A B) (C) (D E)
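
A small sketch of Euclidean-distance-based clustering in 3-D follows. It uses k-means, which is one common choice and not necessarily the method behind the illustration. The coordinates for the five points A to E and the initial centroids are invented so that the result matches the grouping shown above.

```python
from math import dist

def k_means(points, initial_centroids, max_iterations=100):
    """points: {name: coordinates}. Returns (clusters of names, centroids)."""
    centroids = list(initial_centroids)
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for name, coords in points.items():
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(coords, centroids[i]))
            clusters[nearest].append(name)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for members, old in zip(clusters, centroids):
            if members:
                columns = zip(*(points[m] for m in members))
                updated.append(tuple(sum(col) / len(members) for col in columns))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # assignments are stable, so the clustering has converged
        centroids = updated
    return clusters, centroids

# Hypothetical 3-D coordinates for the five points.
points = {
    "A": (1.0, 1.0, 1.0),
    "B": (1.5, 1.0, 2.0),
    "C": (5.0, 5.0, 5.0),
    "D": (9.0, 8.0, 9.0),
    "E": (8.5, 9.0, 9.0),
}
clusters, centroids = k_means(points, [points["A"], points["C"], points["D"]])
```

Points within a cluster end up close to their shared centroid, while the centroids themselves are far apart, which is the property the clustering definition asks for.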