Topic 1c - Tasks & Techniques
Tasks and Techniques of Data Mining
Ts. Dr. Tuan Norhafizah Tuan Zakaria
Objectives
• Tasks
• Techniques
Tasks
• Classification
• Clustering
• Association Rules
• Prediction
• Sequential Analysis
• Deviation Analysis
• Similarity Analysis
• Trend Analysis

Techniques
• Decision Trees
• Association Rule
• k-means
• Neural Networks
• Naïve Bayes
• k-nearest neighbor
• Statistical Method
Classification: Definition
• Given a collection of records (training set).
• Each record contains a set of attributes; one of the attributes is the class.
Approach:
• Identify which customers decided to buy and which decided otherwise. This {buy, don't buy} decision forms the class attribute.
• Collect various demographic, lifestyle, and company-interaction related information, such as type of business, where they stay, how much they earn, etc.
• Use this information as input attributes to learn a classifier model.
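As a minimal sketch of "learn a classifier model", the code below uses k-nearest neighbor, one of the techniques listed above, to predict the {buy, don't buy} class attribute. It assumes two hypothetical numeric input attributes (annual income in thousands and company interactions per year). The records are invented for illustration and do not come from the slides.

```python
import math
from collections import Counter

# Hypothetical training set: (annual income in thousands, company interactions per year) -> class.
# These attributes and values are illustrative assumptions, not data from the slides.
training_set = [
    ((35, 1), "don't buy"),
    ((40, 2), "don't buy"),
    ((38, 0), "don't buy"),
    ((85, 6), "buy"),
    ((90, 5), "buy"),
    ((78, 7), "buy"),
]


def euclidean(a, b):
    """Euclidean distance between two equal-length attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(record, training, k=3):
    """Predict the class of `record` by majority vote among its k nearest training records."""
    neighbours = sorted(training, key=lambda item: euclidean(record, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


print(knn_classify((82, 5), training_set))  # buy
print(knn_classify((36, 1), training_set))  # don't buy
```

The attributes are not rescaled here, so income dominates the distance. In practice, attributes are usually normalised before k-NN is applied.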
Classification: Customer Attrition/Churn
Approach:
• Use the detailed record of transactions with past and present customers to find attributes: how often the customer calls, where he calls, what time of the day he calls most, his financial status, marital status, etc.
• Label the customers as loyal or disloyal.
• Find a model for loyalty.
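One simple way to "find a model for loyalty" is a one-level decision tree, also called a decision stump. It is the smallest instance of the Decision Trees technique listed above. The sketch assumes hypothetical attributes (`calls_per_week`, `income_k`) and invented labelled customers. It picks the single attribute and threshold that misclassify the fewest training customers.

```python
from collections import Counter

# Hypothetical customers: attributes derived from call records, labelled loyal/disloyal.
# Attribute names and values are illustrative assumptions.
customers = [
    ({"calls_per_week": 25, "income_k": 40}, "loyal"),
    ({"calls_per_week": 30, "income_k": 55}, "loyal"),
    ({"calls_per_week": 22, "income_k": 35}, "loyal"),
    ({"calls_per_week": 5, "income_k": 50}, "disloyal"),
    ({"calls_per_week": 8, "income_k": 38}, "disloyal"),
    ({"calls_per_week": 12, "income_k": 60}, "disloyal"),
]


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def learn_stump(records):
    """Return (errors, attribute, threshold, label_if_le, label_if_gt) with the fewest training errors."""
    best = None
    for attr in records[0][0]:
        for threshold in sorted({rec[attr] for rec, _ in records}):
            left = [label for rec, label in records if rec[attr] <= threshold]
            right = [label for rec, label in records if rec[attr] > threshold]
            left_label = majority(left)  # never empty: the threshold is itself a value in the data
            right_label = majority(right) if right else left_label
            errors = sum(l != left_label for l in left) + sum(l != right_label for l in right)
            if best is None or errors < best[0]:
                best = (errors, attr, threshold, left_label, right_label)
    return best


def predict(stump, record):
    """Apply the one-level tree: compare a single attribute against the learned threshold."""
    _, attr, threshold, left_label, right_label = stump
    return left_label if record[attr] <= threshold else right_label


stump = learn_stump(customers)
print(stump)  # (0, 'calls_per_week', 12, 'disloyal', 'loyal')
print(predict(stump, {"calls_per_week": 20, "income_k": 45}))  # loyal
```

A full decision tree would repeat this split recursively on each side. The stump keeps the idea visible in a few lines.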
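Clustering: Definition
Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that:
• intracluster distances are minimized (points in the same cluster are similar to one another);
• intercluster distances are maximized (points in different clusters are dissimilar to one another).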
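The sketch below illustrates the two criteria with plain k-means, one of the techniques listed above. It assumes Euclidean distance as the similarity measure and six invented 2-D points. After clustering, it compares the mean intracluster distance (each point to its own centroid) with the intercluster distance (between the two centroids). It requires Python 3.8+ for `math.dist`.

```python
import math

# Hypothetical 2-D data points with two visible groups; Euclidean distance is assumed as the similarity measure.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]


def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate nearest-centroid assignment and centroid re-computation."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it in place if the cluster is empty.
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


centroids, clusters = kmeans(points, centroids=[points[0], points[3]])

# Mean distance of each point to its own centroid (intracluster) vs distance between centroids (intercluster).
intra = sum(math.dist(p, centroids[i]) for i, c in enumerate(clusters) for p in c) / len(points)
inter = math.dist(centroids[0], centroids[1])
print(f"intracluster={intra:.2f}, intercluster={inter:.2f}")
```

A good clustering shows a small intracluster value relative to the intercluster value. For these two well-separated groups, the gap is large.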
Clustering: Market Segmentation
Approach:
• Collect different attributes of customers based on their geographical and lifestyle related information.
• Find clusters of similar customers.
• Measure the clustering quality by observing the buying patterns of customers in the same cluster vs. those from different clusters.
Example segments found:
• Segment 1: high duration but low number of generated calls, and a moderate number of sent and received SMS.
• Segment 2: moderate duration of generated calls, and moderate to high data usage.
Clustering: Document Clustering
Approach:
• Identify frequently occurring terms in each document.
• Form a similarity measure based on the frequencies of different terms.
• Use it to cluster.
Gain: Information retrieval can use the clusters to relate a new document or search term to the clustered documents.
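The three steps above can be sketched as follows. Term frequencies come from a simple word count. Cosine similarity over those counts serves as the similarity measure, a common choice assumed here rather than taken from the slide. A greedy threshold rule then groups the documents. The documents, the tokenizer, and the 0.3 threshold are all illustrative assumptions.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Count how often each lower-cased word occurs in a document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors (0 = no shared terms)."""
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def threshold_clusters(tfs, threshold=0.3):
    """Greedy grouping: add a document to the first cluster whose seed (first member) is similar enough."""
    clusters = []
    for doc_id, tf in tfs.items():
        for cluster in clusters:
            if cosine_similarity(tf, tfs[cluster[0]]) >= threshold:
                cluster.append(doc_id)
                break
        else:
            clusters.append([doc_id])
    return clusters


# Hypothetical documents for illustration.
docs = {
    "d1": "stock market prices rise as market rallies",
    "d2": "market prices fall on the stock exchange",
    "d3": "the football team wins the league match",
}
tfs = {doc_id: term_frequencies(text) for doc_id, text in docs.items()}
print(threshold_clusters(tfs))  # [['d1', 'd2'], ['d3']]
```

For retrieval, a new search term can be turned into a term-frequency vector in the same way and compared against each cluster's seed.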
Association Rule Discovery
• Given a set of records, each of which contains some number of items from a given collection,
• produce dependency rules that will predict the occurrence of an item based on occurrences of other items.

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
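Two standard measures for judging rules like these are support and confidence. The slide does not define them. Support is the fraction of all transactions that contain every item in the rule. Confidence is the fraction of transactions containing the antecedent that also contain the consequent. The sketch below computes both measures for the two discovered rules on the five transactions in the table.

```python
# The five transactions from the table above, keyed by TID.
transactions = {
    1: {"Bread", "Coke", "Milk"},
    2: {"Beer", "Bread"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Diaper", "Milk"},
}


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in transactions.values() if itemset <= items)


def rule_metrics(antecedent, consequent, transactions):
    """Support and confidence of the rule antecedent --> consequent."""
    both = support_count(antecedent | consequent, transactions)
    support = both / len(transactions)
    confidence = both / support_count(antecedent, transactions)
    return support, confidence


for lhs, rhs in [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]:
    s, c = rule_metrics(lhs, rhs, transactions)
    print(sorted(lhs), "-->", sorted(rhs), f"support={s:.2f}, confidence={c:.2f}")
# ['Milk'] --> ['Coke'] support=0.60, confidence=0.75
# ['Diaper', 'Milk'] --> ['Beer'] support=0.40, confidence=0.67
```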
Association Rule Discovery: Marketing & Sales Promotion
• Let the rule discovered be
{Bagels, … } --> {Potato Chips}
• Potato Chips as consequent can be used to
determine what should be done to boost its sales.
• Bagels in the antecedent can be used to see which
products would be affected if the store discontinues
selling bagels.
• Bagels in antecedent and Potato chips in consequent
can be used to see what products should be sold
with Bagels to promote sale of Potato chips!
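The three uses above amount to querying a rule set by consequent, by antecedent, or by both. A minimal sketch follows. It stores rules as (antecedent, consequent) pairs. Only the Bagels --> Potato Chips pattern comes from the slide. "Cream Cheese", "Coffee" and "Soda" are hypothetical items added for illustration.

```python
# Hypothetical rules as (antecedent, consequent) pairs. Only the Bagels --> Potato Chips
# pattern comes from the slide; "Cream Cheese", "Coffee" and "Soda" are made-up items.
rules = [
    (frozenset({"Bagels", "Cream Cheese"}), frozenset({"Potato Chips"})),
    (frozenset({"Bagels"}), frozenset({"Coffee"})),
    (frozenset({"Soda"}), frozenset({"Potato Chips"})),
]


def with_consequent(rules, item):
    """Rules that predict `item`: what could be done to boost its sales?"""
    return [r for r in rules if item in r[1]]


def with_antecedent(rules, item):
    """Rules that depend on `item`: which products could be affected if it is discontinued?"""
    return [r for r in rules if item in r[0]]


def companions(rules, antecedent_item, consequent_item):
    """Other antecedent items that, together with `antecedent_item`, predict `consequent_item`."""
    found = set()
    for lhs, rhs in rules:
        if antecedent_item in lhs and consequent_item in rhs:
            found |= lhs - {antecedent_item}
    return found


print(len(with_consequent(rules, "Potato Chips")))  # 2
print(len(with_antecedent(rules, "Bagels")))  # 2
print(companions(rules, "Bagels", "Potato Chips"))  # {'Cream Cheese'}
```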
Association Rule Discovery: Supermarket
Goal: To identify items that are bought together by sufficiently many customers.
Approach:
• Process the point-of-sale data collected with barcode scanners to find dependencies among items.
https://www.digitalnewsasia.com/download/tapwaycasestudy.pdf
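"Bought together by sufficiently many customers" can be made concrete with a minimum count threshold on item pairs. The sketch below treats the five transactions from the earlier association-rule table as point-of-sale baskets. It counts every co-occurring pair and keeps the pairs found in at least `min_count` baskets. The threshold of 3 is an assumption chosen for illustration. Full algorithms such as Apriori extend this idea to larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Point-of-sale baskets: the five transactions from the association-rule table.
baskets = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def frequent_pairs(baskets, min_count):
    """Count every item pair that co-occurs in a basket; keep pairs seen in at least min_count baskets."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical order, e.g. ("Coke", "Milk").
        counts.update(combinations(sorted(basket), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}


print(frequent_pairs(baskets, min_count=3))
```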
Regression Analysis:
Anomaly Detection
Applications:
• Credit card fraud detection
• Network intrusion detection
Typical network traffic at the university level may reach over 100 million connections per day.
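A very simple statistical way to detect anomalies is the z-score test. It flags a new value when it lies more than a chosen number of standard deviations from the mean of past behaviour. The sketch applies this test to hypothetical card transaction amounts with an assumed threshold of 3. Production fraud and intrusion systems use far richer models, so this is only an illustration of the idea.

```python
import statistics

# Hypothetical card transaction amounts for one customer; all values are illustrative.
history = [42.0, 38.5, 55.0, 47.2, 40.0, 51.3, 44.8, 39.9, 48.6, 45.1]
incoming = [46.0, 2500.0, 41.5]


def flag_anomalies(history, new_values, threshold=3.0):
    """Flag values whose z-score against the historical mean and standard deviation exceeds threshold."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)  # sample standard deviation; needs at least two values
    return [v for v in new_values if abs(v - mean) / stdev > threshold]


print(flag_anomalies(history, incoming))  # [2500.0]
```

The mean and standard deviation are estimated from past data rather than from the batch being scored. Otherwise a single large outlier would inflate the standard deviation and partly hide itself.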
Deviation Analysis: Fraud Detection
https://www.insurancebusinessmag.com/asia/news/breaking-news/malaysias-antifraud-system-operational-by-october-74933.aspx
Profiteering Cases
https://www.freemalaysiatoday.com/category/nation/2018/08/25/yes-keep-receipts-to-fight-profiteering-say-retailers/