Topic 1c - Tasks and Techniques of DM
2. Approach:
• Collect various demographic, lifestyle, and company-interaction related
information about customers: type of business, where they stay, how much
they earn, etc.
• Identify which customers decided to buy and which decided otherwise. This
{buy, don’t buy} decision forms the class attribute.
• Use this information as input attributes to learn a classifier model.
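The approach above can be sketched in a few lines. This is only an illustration under stated assumptions: the customer records, the two attributes (income and number of past interactions), and the choice of a k-nearest-neighbour classifier are all hypothetical, since the slides do not name a specific classifier or dataset.

```python
import math
from collections import Counter

# Hypothetical training records: (income in thousands, past interactions) -> class label.
TRAIN = [
    ((25.0, 1.0), "don't buy"),
    ((30.0, 0.0), "don't buy"),
    ((28.0, 2.0), "don't buy"),
    ((60.0, 5.0), "buy"),
    ((75.0, 4.0), "buy"),
    ((68.0, 6.0), "buy"),
]

def knn_predict(train, x, k=3):
    """Predict the class of x by majority vote among its k nearest training records."""
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Classify two previously unseen (hypothetical) customers.
prediction_high = knn_predict(TRAIN, (70.0, 5.0))
prediction_low = knn_predict(TRAIN, (27.0, 1.0))
print(prediction_high, prediction_low)
```

In practice the input attributes would usually be rescaled first, because an attribute with a large numeric range (such as income) would otherwise dominate the distance.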
CLASSIFICATION: APPLICATION 2
CUSTOMER ATTRITION/CHURN
2. Approach:
• Use detailed records of transactions (past and present customers) to find
attributes.
• How often the customer calls, where they call, what time of day they call
most, their financial status, marital status, etc.
• Label the customers as loyal or disloyal.
• Find a model for loyalty.
CLUSTERING DEFINITION
Similarity Measures:
• Euclidean Distance if attributes are continuous.
• Other Problem-specific Measures.
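For continuous attributes, the Euclidean distance between two records is the square root of the sum of squared attribute differences. A minimal sketch, with a hypothetical function name and example points:

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two points with continuous attributes."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of attributes")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Two hypothetical points in 3-D space: sqrt(1 + 4 + 4) = 3.0
d = euclidean_distance((0.0, 0.0, 0.0), (1.0, 2.0, 2.0))
print(d)
```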
ILLUSTRATING CLUSTERING
Euclidean Distance Based Clustering in 3-D space.
2. Approach:
• Collect different attributes of customers based on their geographical and
lifestyle related information.
• Find clusters of similar customers.
• Measure the clustering quality by observing buying patterns of customers in
same cluster vs. those from different clusters.
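The three steps above might look as follows. This is a sketch under assumptions, not the method prescribed by the slides: k-means is used as one possible clustering algorithm, the customer attributes (age, monthly spend) and the buying flags are made up, and "quality" is measured as how often pairs of customers show the same buying behaviour inside a cluster versus across clusters.

```python
import math
from itertools import combinations

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as initial centroids (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments

def pair_agreement(assignments, behaviour):
    """Share of customer pairs with the same behaviour: (same cluster, different clusters)."""
    same, different = [], []
    for i, j in combinations(range(len(assignments)), 2):
        agree = behaviour[i] == behaviour[j]
        if assignments[i] == assignments[j]:
            same.append(agree)
        else:
            different.append(agree)
    rate = lambda flags: sum(flags) / len(flags) if flags else float("nan")
    return rate(same), rate(different)

# Hypothetical customers: (age, monthly spend) and whether they bought a product.
customers = [(22, 150), (60, 900), (25, 180), (24, 160), (58, 950), (62, 880)]
bought = [False, True, False, False, True, True]

centroids, assignments = kmeans(customers, k=2)
same_rate, diff_rate = pair_agreement(assignments, bought)
print(assignments, same_rate, diff_rate)
```

A good segmentation shows a high agreement rate within clusters and a low one across clusters; on this toy data the two groups separate cleanly.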
CLUSTERING: APPLICATION 1 – MARKET SEGMENTATION
Segment 1: high duration but low number of generated calls and moderate number
of sent and received SMS.
Segment 2: moderate duration of generated calls and moderate to high data usage.
Segment 3: high duration of off-net calls, high number of generated calls, and
moderate to low levels of both duration of generated calls and data usage.
Segment 4: very low call duration, high number of sent and received SMS, and
high data usage.
Segment 5: very low data usage, low duration of generated calls, and high number
of received calls relative to the number of generated calls.
Segment 6: relatively high duration of international calls.
DOCUMENT CLUSTERING
2. Approach:
• Identify frequently occurring terms in each document. Form a similarity
measure based on the frequencies of different terms. Use it to cluster.
• Gain: Information Retrieval can utilize the clusters to relate a new
document or search term to clustered documents.
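One way to form a similarity measure from term frequencies is cosine similarity between term-count vectors. The sketch below assumes that choice (the slides do not name a specific measure), uses naive whitespace tokenisation, and works on three made-up documents.

```python
import math
from collections import Counter

def term_frequencies(text):
    """Count how often each lower-cased, whitespace-separated term occurs."""
    return Counter(text.lower().split())

def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical documents: two about finance, one about sport.
docs = {
    "d1": "stock market shares rise",
    "d2": "market shares fall as stock dips",
    "d3": "team wins football match",
}
tf = {name: term_frequencies(text) for name, text in docs.items()}

sim_12 = cosine_similarity(tf["d1"], tf["d2"])
sim_13 = cosine_similarity(tf["d1"], tf["d3"])
print(round(sim_12, 3), sim_13)
```

The pairwise similarities can then be handed to a clustering algorithm, so that d1 and d2 end up together and d3 apart.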
ASSOCIATION RULE DISCOVERY: DEFINITION
Given a set of records, each of which contains some number of items from a given
collection:
• Produce dependency rules which will predict the occurrence of an item based on
occurrences of other items.
TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
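The strength of such rules is commonly quantified with support (the fraction of transactions containing all items in the rule) and confidence (the fraction of transactions containing the left-hand side that also contain the right-hand side). These two measures are standard but are not defined on the slide itself, so treat the sketch below, including its helper names, as an illustrative assumption applied to the five transactions above.

```python
# The five transactions from the table, as sets of items.
TRANSACTIONS = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]

def count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions)

def support(transactions, lhs, rhs):
    """Fraction of all transactions that contain both sides of the rule."""
    return count(transactions, lhs | rhs) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Among transactions containing lhs, the fraction that also contain rhs."""
    return count(transactions, lhs | rhs) / count(transactions, lhs)

rules = [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]
for lhs, rhs in rules:
    s = support(TRANSACTIONS, lhs, rhs)
    c = confidence(TRANSACTIONS, lhs, rhs)
    print(sorted(lhs), "-->", sorted(rhs), f"support={s:.2f} confidence={c:.2f}")
```

For {Milk} --> {Coke}, three of the five transactions contain both items and four contain Milk, giving support 0.60 and confidence 0.75. For {Diaper, Milk} --> {Beer}, the values are 0.40 and about 0.67.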
ASSOCIATION RULE DISCOVERY: APPLICATION 1
2. Approach:
• Process the point-of-sale data collected with barcode scanners to find
dependencies among items.
3. A classic rule
• If a customer buys diapers and milk, then he is very likely to buy root beer.
• So, don’t be surprised if you find six-packs of root beer stacked next to diapers!
RETAIL ANALYTICS
https://www.digitalnewsasia.com/download/tapwaycasestudy.pdf
REGRESSION
1. Predict the value of a given continuous-valued variable based on the values of
other variables, assuming a linear or nonlinear model of dependency.
2. Extensively studied in statistics and machine learning.
3. Examples:
• Predicting sales amounts of new product based on advertising expenditure.
• Predicting wind velocities as a function of temperature, humidity, air pressure,
etc.
• Time series prediction of stock market indices.
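As an illustration of the first example, a linear model of dependency can be fitted with ordinary least squares. The sketch below assumes a single input variable and uses made-up, perfectly linear data (advertising expenditure versus sales) so that the result is easy to check by hand; real data would be noisy.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising expenditure and resulting sales (arbitrary units).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(ad_spend, sales)
predicted_sales = slope * 6.0 + intercept  # prediction for a new expenditure level
print(slope, intercept, predicted_sales)
```

Here the fitted line is sales = 2 × spend + 10, so an expenditure of 6 units predicts sales of 22.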
DEVIATION ANALYSIS
Typical network traffic at the university level may exceed 100 million connections per day.
DEVIATION ANALYSIS (FRAUD DETECTION)
https://www.insurancebusinessmag.com/asia/news/breaking-news/malaysias-antifraud-system-operational-by-october-74933.aspx
PROFITEERING CASES
https://www.freemalaysiatoday.com/category/nation/2018/08/25/yes-keep-receipts-to-fight-profiteering-say-retailers/
Shuzlina Abdul Rahman | Sofianita Mutalib | Siti Nur Kamaliah Kamarudin