Topic 1c - Tasks & Techniques

The document provides an overview of data mining, including its objectives, tasks, and techniques. It discusses various data mining tasks such as classification, clustering, and association rule discovery, along with their applications in fields like marketing and fraud detection. Additionally, it highlights the importance of understanding data relationships and the methods used to analyze and predict trends from data.


Topic 1c: Tasks and Techniques of Data Mining
Ts. Dr. Tuan Norhafizah Tuan Zakaria

Objectives

• To introduce Data Mining and its relationship with data and knowledge (Knowledge Discovery in Databases)
• To discuss the history, evolution and motivation of Data Mining
• To discuss Data Mining techniques, tasks, applications and some major issues

DM: Tasks and Techniques

Tasks:
• Classification
• Clustering
• Association Rules
• Prediction
• Sequential Analysis
• Deviation Analysis
• Similarity Analysis
• Trend Analysis

Techniques:
• Decision Trees
• Association Rules
• k-means
• Neural Networks
• Naïve Bayes
• k-nearest neighbor
• Statistical Methods

Classification: Definition
• Given a collection of records (training set):
  • Each record contains a set of attributes; one of the attributes is the class.
• Find a model for the class attribute as a function of the values of the other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible.
  • A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.

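The training/test procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: records are hypothetical `(attributes, class_label)` pairs, the 70/30 split and the fixed random seed are arbitrary choices, and `train_majority_model` is a deliberately trivial placeholder (it always predicts the most frequent training class) standing in for a real classifier such as a decision tree.

```python
import random
from collections import Counter


def holdout_split(records, test_fraction=0.3, seed=0):
    """Shuffle the records, then split them into (training set, test set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def train_majority_model(training_set):
    """Placeholder 'classifier': always predicts the most common training class."""
    majority = Counter(label for _, label in training_set).most_common(1)[0][0]
    return lambda attributes: majority


def accuracy(model, test_set):
    """Fraction of test records whose predicted class equals the true class."""
    correct = sum(model(attrs) == label for attrs, label in test_set)
    return correct / len(test_set)


# Hypothetical records: (attribute dict, class label).
records = [({"x": i}, "A" if i % 4 else "B") for i in range(20)]
train, test = holdout_split(records)
model = train_majority_model(train)
print(len(train), len(test), accuracy(model, test))
```

Because the split is random, a different `seed` gives a different test set and possibly a different accuracy estimate; the test set is never used while building the model.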
Classification Example

Refund and Marital Status are categorical attributes, Taxable Income is continuous, and Cheat is the class attribute.

Training Set:

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Test Set:

Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?

The training set is used to learn a classifier; the resulting model is then applied to the test set to predict the unknown Cheat values.

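To make the "learn a classifier, then apply the model" flow concrete, the sketch below encodes the example's training and test sets and applies a decision tree to them. The tree is hypothetical: it was written by hand so that it agrees with all ten training records, and it is not the output of any particular tree-induction algorithm (a learner such as ID3, C4.5 or CART could choose different splits). The 80K income threshold is likewise an assumption.

```python
# Training set from the example: (Refund, Marital Status, Taxable Income in K, Cheat).
TRAINING_SET = [
    ("Yes", "Single", 125, "No"),
    ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"),
    ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"),
    ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"),
    ("No", "Single", 90, "Yes"),
]

# Test set from the example: the Cheat value is unknown.
TEST_SET = [
    ("No", "Single", 75),
    ("Yes", "Married", 50),
    ("No", "Married", 150),
    ("Yes", "Divorced", 90),
    ("No", "Single", 40),
    ("No", "Married", 80),
]


def tree_model(refund, marital_status, taxable_income):
    """Hypothetical hand-written decision tree that fits the training set."""
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # Single or Divorced: split on taxable income (assumed 80K threshold).
    return "Yes" if taxable_income > 80 else "No"


training_accuracy = sum(
    tree_model(refund, status, income) == cheat
    for refund, status, income, cheat in TRAINING_SET
) / len(TRAINING_SET)
predictions = [tree_model(*record) for record in TEST_SET]

print(training_accuracy)  # 1.0
print(predictions)        # ['No', 'No', 'No', 'No', 'No', 'No']
```

This particular tree predicts `No` for every test record; a model learned from the same data by a real algorithm could disagree on some records, which is exactly what the test set is meant to reveal.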
Classification: Direct Marketing

Goal: Reduce the cost of mailing by targeting a set of consumers likely to buy a new cell-phone product.

Approach:
• Identify which customers decided to buy and which decided otherwise. This {buy, don’t buy} decision forms the class attribute.
• Collect various demographic, lifestyle, and company-interaction related information: type of business, where they stay, how much they earn, etc.
• Use this information as input attributes to learn a classifier model.

Classification: Customer Attrition/Churn

Goal: To predict whether a customer is likely to be lost to a competitor.

Approach:
• Use detailed records of transactions with past and present customers to find attributes: how often the customer calls, where he calls, what time of the day he calls most, his financial status, marital status, etc.
• Label the customers as loyal or disloyal.
• Find a model for loyalty.

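The churn approach starts by turning raw transaction records into per-customer attributes and attaching a loyal/disloyal label. A minimal Python sketch follows; the record layout `(customer_id, hour_of_day, destination, duration_minutes)`, the sample calls, the `CHURNED` labels and the chosen attributes are all hypothetical. The resulting labelled rows are what a classifier would then be trained on.

```python
from collections import Counter, defaultdict

# Hypothetical call records: (customer_id, hour_of_day, destination, duration_minutes).
CALLS = [
    ("c1", 9, "local", 5.0),
    ("c1", 9, "international", 12.0),
    ("c1", 20, "local", 3.0),
    ("c2", 14, "local", 1.0),
]

# Hypothetical outcome per customer: True = later left for a competitor.
CHURNED = {"c1": False, "c2": True}


def customer_features(calls):
    """Aggregate raw call records into one attribute dict per customer."""
    per_customer = defaultdict(list)
    for customer, hour, destination, minutes in calls:
        per_customer[customer].append((hour, destination, minutes))
    features = {}
    for customer, rows in per_customer.items():
        hours = Counter(hour for hour, _, _ in rows)
        features[customer] = {
            "num_calls": len(rows),                              # how often he calls
            "total_minutes": sum(m for _, _, m in rows),
            "busiest_hour": hours.most_common(1)[0][0],          # time of day he calls most
            "intl_share": sum(d == "international" for _, d, _ in rows) / len(rows),
        }
    return features


def labelled_dataset(calls, churned):
    """Pair each customer's attributes with a loyal/disloyal class label."""
    feats = customer_features(calls)
    return [(feats[c], "disloyal" if churned[c] else "loyal") for c in feats]


for attributes, label in labelled_dataset(CALLS, CHURNED):
    print(label, attributes)
```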
Clustering: Definition

Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that:
• Data points in one cluster are more similar to one another.
• Data points in separate clusters are less similar to one another.

Similarity Measures:
• Euclidean Distance if attributes are continuous.
• Other problem-specific measures.

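k-means, listed among the techniques earlier, is one common way to find such clusters under Euclidean distance. The sketch below is plain Lloyd's algorithm in standard-library Python. Seeding with the first `k` points is an assumption made to keep it deterministic (practical implementations usually use random restarts or k-means++), and `math.dist` requires Python 3.8 or later.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's k-means with Euclidean distance; returns (labels, centroids)."""
    # Assumed seeding: the first k points become the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its member points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centroid
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return labels, centroids


points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)     # [0, 1, 0, 0, 1, 1]
print(centroids)
```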
Clustering: Euclidean Distance Based Clustering in 3-D Space
[Figure: data points grouped into clusters in 3-D space. Intracluster distances are minimized; intercluster distances are maximized.]

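The figure's criterion (small intracluster distances, large intercluster distances) can be checked numerically for any cluster assignment. This sketch averages pairwise Euclidean distances within and across clusters; the function name and sample points are hypothetical, and averaging all pairs is only one possible summary (silhouette scores are a common, more refined alternative).

```python
import math
from itertools import combinations


def _mean(values):
    """Arithmetic mean, or NaN when there is nothing to average."""
    return sum(values) / len(values) if values else math.nan


def mean_intra_inter(points, labels):
    """Return (mean intracluster distance, mean intercluster distance)."""
    intra, inter = [], []
    for (p, a), (q, b) in combinations(zip(points, labels), 2):
        # Same label: distance within a cluster; different labels: across clusters.
        (intra if a == b else inter).append(math.dist(p, q))
    return _mean(intra), _mean(inter)


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
intra, inter = mean_intra_inter(points, [0, 0, 1, 1])
print(intra, inter)  # a good clustering should give intra well below inter
```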
Clustering: Market Segmentation

Goal: Subdivide a market into distinct subsets of customers where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix.

Approach:
• Collect different attributes of customers based on their geographical and lifestyle related information.
• Find clusters of similar customers.
• Measure the clustering quality by observing buying patterns of customers in the same cluster vs. those from different clusters.

Clustering: Market Segmentation

Segment 1: high duration but low number of generated calls, and moderate number of sent and received SMS.
Segment 2: moderate duration of generated calls and moderate to high data usage.
Segment 3: high duration of off-net calls, high number of generated calls, and moderate to low values of both generated-call duration and data usage.
Segment 4: very low call duration, high sent and received SMS, and high data usage.
Segment 5: very low data usage, low duration of generated calls, and high number of received calls relative to the number of generated calls.
Segment 6: relatively high duration of international calls.

Clustering: Document Clustering

Goal: To find groups of documents that are similar to each other based on the important terms appearing in them.

Approach:
• Identify frequently occurring terms in each document.
• Form a similarity measure based on the frequencies of different terms. Use it to cluster.

Gain: Information retrieval can utilize the clusters to relate a new document or search term to clustered documents.

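One common way to build a frequency-based similarity measure is to represent each document as a vector of term counts and compare vectors by cosine similarity. This is an assumed choice for illustration (the slide does not name a measure); the tokenizer is deliberately crude, the documents are hypothetical, and production systems typically add stop-word removal and TF-IDF weighting. The resulting similarities could then feed a clustering algorithm.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Count lower-cased alphabetic terms in a document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors (0.0 = no shared terms)."""
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical documents.
docs = {
    "d1": "Data mining finds patterns in data",
    "d2": "Mining data for patterns",
    "d3": "The football match ended in a draw",
}
tfs = {name: term_frequencies(text) for name, text in docs.items()}
for a, b in [("d1", "d2"), ("d1", "d3"), ("d2", "d3")]:
    print(a, b, round(cosine_similarity(tfs[a], tfs[b]), 3))
```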
Association Rule Discovery: Definition
• Given a set of records, each of which contains some number of items from a given collection;
• Produce dependency rules which will predict the occurrence of an item based on occurrences of other items.

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}

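Rules like these are usually judged by two measures the slide does not define: support (the fraction of all transactions containing every item in the rule) and confidence (the fraction of transactions containing the antecedent that also contain the consequent). The sketch below applies these standard definitions to the five transactions above; the function names are illustrative.

```python
# The five transactions from the table above.
TRANSACTIONS = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence) of the rule antecedent --> consequent."""
    both = support_count(antecedent | consequent, transactions)
    support = both / len(transactions)
    confidence = both / support_count(antecedent, transactions)
    return support, confidence


print(rule_metrics({"Milk"}, {"Coke"}, TRANSACTIONS))            # (0.6, 0.75)
print(rule_metrics({"Diaper", "Milk"}, {"Beer"}, TRANSACTIONS))  # (0.4, 0.6666666666666666)
```

So {Milk} --> {Coke} holds in 3 of the 4 transactions containing Milk, and {Diaper, Milk} --> {Beer} in 2 of the 3 containing both Diaper and Milk.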
Association Rule Discovery: Marketing & Sales Promotion
• Let the rule discovered be {Bagels, … } --> {Potato Chips}
• Potato Chips as consequent: can be used to determine what should be done to boost its sales.
• Bagels in the antecedent: can be used to see which products would be affected if the store discontinues selling bagels.
• Bagels in the antecedent and Potato Chips in the consequent: can be used to see what products should be sold with Bagels to promote the sale of Potato Chips!

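Each of the three uses above is a filter over a set of discovered rules. A minimal sketch, assuming rules are stored as `(antecedent, consequent)` pairs of frozensets; the three rules and every item other than Bagels and Potato Chips are hypothetical.

```python
# Hypothetical mined rules: (antecedent, consequent) pairs of item sets.
RULES = [
    (frozenset({"Bagels", "Cream Cheese"}), frozenset({"Potato Chips"})),
    (frozenset({"Bagels"}), frozenset({"Orange Juice"})),
    (frozenset({"Soda"}), frozenset({"Potato Chips"})),
]


def rules_with_consequent(rules, item):
    """Rules predicting `item`: hints at what could boost its sales."""
    return [rule for rule in rules if item in rule[1]]


def rules_with_antecedent(rules, item):
    """Rules that depend on `item`: affected if the store stops selling it."""
    return [rule for rule in rules if item in rule[0]]


def rules_linking(rules, antecedent_item, consequent_item):
    """Rules with one item in the antecedent and another in the consequent."""
    return [
        rule for rule in rules
        if antecedent_item in rule[0] and consequent_item in rule[1]
    ]


for antecedent, consequent in rules_linking(RULES, "Bagels", "Potato Chips"):
    print(sorted(antecedent), "-->", sorted(consequent))
```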
Association Rule Discovery: Supermarket Shelf Management

Goal: To identify items that are bought together by sufficiently many customers.

Approach:
• Process the point-of-sale data collected with barcode scanners to find dependencies among items.

A classic rule:
• If a customer buys diapers and milk, then he is very likely to buy root beer.
• So, don’t be surprised if you find six-packs of root beer stacked next to diapers!

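"Bought together by sufficiently many customers" amounts to a minimum support threshold on co-occurring items. The sketch below counts every item pair across transactions and keeps the pairs that meet a minimum count; it reuses the five example transactions from the Association Rule Discovery slide, and the threshold of 3 is an assumption. It handles pairs only; an algorithm such as Apriori extends the same idea to larger itemsets by pruning candidates that have an infrequent subset.

```python
from collections import Counter
from itertools import combinations

# The five example transactions from the Association Rule Discovery slide.
TRANSACTIONS = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def frequent_pairs(transactions, min_support_count):
    """Item pairs appearing together in at least `min_support_count` transactions."""
    counts = Counter()
    for t in transactions:
        # Sorting makes each pair a canonical (alphabetical) tuple.
        counts.update(combinations(sorted(t), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support_count}


print(frequent_pairs(TRANSACTIONS, 3))  # {('Coke', 'Milk'): 3, ('Diaper', 'Milk'): 3}
```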
Retail Analytics

https://www.digitalnewsasia.com/download/tapwaycasestudy.pdf

Regression
• Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
• Extensively studied in the statistics and machine learning fields.
• Examples:
  • Predicting sales amounts of a new product based on advertising expenditure.
  • Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
  • Time series prediction of stock market indices.

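The first example (sales vs. advertising expenditure) with a linear model of dependency can be sketched as ordinary least squares with one predictor. The data below are hypothetical and the units arbitrary; a real study would typically use a statistics library and check the fit before trusting predictions.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: advertising spend vs. sales of a new product.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

intercept, slope = fit_simple_linear_regression(ad_spend, sales)
print(f"sales = {intercept:.2f} + {slope:.2f} * ad_spend")
print("predicted sales at spend 6.0:", intercept + slope * 6.0)
```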
Deviation Analysis
• Discovering the most significant changes in data from previously measured or normative values.
• Usually categorized separately from other data mining tasks.
• Deviations are often infrequent.
• Modifications of classification, clustering, and time series analysis can be used as a means to achieve the goal.
• Outlier detection in statistics.
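The first bullet, finding the most significant changes from normative values, can be sketched as ranking measures by their relative change from a baseline. The measure names, the values and the use of relative (rather than absolute) change are assumptions for illustration.

```python
def most_significant_deviations(current, baseline, top_k=3):
    """Rank measures by the size of their relative change from a normative value.

    `current` and `baseline` map a measure name to a number; measures missing
    from `current` or with a zero baseline are skipped.
    """
    changes = []
    for name, norm in baseline.items():
        if name in current and norm != 0:
            changes.append((name, (current[name] - norm) / norm))
    # Largest absolute relative change first.
    changes.sort(key=lambda item: abs(item[1]), reverse=True)
    return changes[:top_k]


# Hypothetical weekly sales per store: normative values vs. this week.
baseline = {"store_A": 100.0, "store_B": 200.0, "store_C": 50.0}
this_week = {"store_A": 105.0, "store_B": 120.0, "store_C": 52.0}
print(most_significant_deviations(this_week, baseline, top_k=2))
# [('store_B', -0.4), ('store_A', 0.05)]
```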

Deviation Analysis: Anomaly Detection
• Detect significant deviations from normal behavior.
• Applications:
  • Credit card fraud detection
  • Network intrusion detection
• Typical network traffic at university level may reach over 100 million connections per day.

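A simple statistical baseline for detecting significant deviations from normal behavior is the z-score: flag values that lie far from the mean in units of standard deviation. This is one assumed technique among many, and the threshold of 2.5 and the login counts are hypothetical.

```python
import statistics


def zscore_anomalies(values, threshold=2.5):
    """Indices of values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Hypothetical hourly counts of failed logins on one server.
failed_logins = [10, 11, 9, 10, 12, 10, 11, 9, 10, 95]
print(zscore_anomalies(failed_logins))  # [9]
```

One caveat: the anomaly itself inflates the standard deviation, and with the population standard deviation no z-score in a sample of n values can exceed sqrt(n - 1). Here that cap is 3, so a threshold of 3.0 would miss the obvious spike; robust alternatives (for example, median and median absolute deviation) avoid this on small samples.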
Deviation Analysis: Fraud Detection

• Identify employee accounts at financial institutions that have excess numbers of credit memos. Excess credit memos can indicate diversion of funds into employee accounts.
• Compare employee home addresses, social security numbers, telephone numbers and bank routing and account numbers to those of vendors from the vendor master file. This test can reveal bogus or improperly selected vendor accounts.

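Both tests above are simple data checks. The sketch below shows one way to express them; the field names in `MATCH_FIELDS`, the record layout, the threshold and all sample records are hypothetical, and a real audit would also normalize values (address spelling, phone formats) before matching.

```python
from collections import Counter


def excess_credit_memos(memo_accounts, threshold):
    """Accounts receiving more than `threshold` credit memos.

    `memo_accounts` holds one account ID per credit memo issued.
    """
    counts = Counter(memo_accounts)
    return sorted(acct for acct, n in counts.items() if n > threshold)


# Hypothetical identifying fields present in employee and vendor master records.
MATCH_FIELDS = ("address", "ssn", "phone", "bank_routing", "bank_account")


def employee_vendor_matches(employees, vendors, fields=MATCH_FIELDS):
    """(employee_id, vendor_id, field) triples where an employee and a vendor share a value."""
    # Index vendor values by (field, value) so each employee lookup is a dict access.
    index = {}
    for vendor in vendors:
        for field in fields:
            value = vendor.get(field)
            if value:
                index.setdefault((field, value), []).append(vendor["id"])
    matches = []
    for employee in employees:
        for field in fields:
            for vendor_id in index.get((field, employee.get(field)), []):
                matches.append((employee["id"], vendor_id, field))
    return matches


# Hypothetical records.
employees = [
    {"id": "E1", "address": "12 Jalan Mawar", "phone": "03-1111"},
    {"id": "E2", "address": "9 Jalan Melati"},
]
vendors = [
    {"id": "V7", "address": "12 Jalan Mawar", "phone": "03-9999"},
    {"id": "V8", "bank_account": "1234567"},
]
print(employee_vendor_matches(employees, vendors))  # [('E1', 'V7', 'address')]
print(excess_credit_memos(["A1", "A2", "A1", "A1", "A3", "A2"], threshold=2))  # ['A1']
```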
Deviation Analysis: Fraud Detection

https://www.insurancebusinessmag.com/asia/news/breaking-news/malaysias-antifraud-system-operational-by-october-74933.aspx

Profiteering Cases

https://www.freemalaysiatoday.com/category/nation/2018/08/25/yes-keep-receipts-to-fight-profiteering-say-retailers/

Yes, keep receipts to fight profiteering, say retailers
Robin Augustin, August 25, 2018, 8:00 AM

http://english.astroawani.com/malaysia-news/gst-1-256-profiteering-cases-detected-1-115-notices-issued-till-june-5-61853

