Topic 1c - Tasks and Techniques of DM

Uploaded by

syazaqilah

TOPIC 1 – PART 3

TASKS AND TECHNIQUES

OF DATA MINING
OBJECTIVES
To introduce Data Mining (DM) and its relationship with data and knowledge
To discuss the history, evolution and motivation of DM
To discuss DM tasks, techniques, applications and some major issues
DATA MINING: TASKS and TECHNIQUES
TASKS include:
• Classification
• Clustering
• Association Rule
• Prediction
• Sequential Analysis
• Deviation analysis
• Similarity analysis
• Trend analysis
TECHNIQUES include:
• Decision Trees
• Association Rules
• k-means
• Neural Networks
• Naïve Bayes
• k-nearest neighbor
• Statistical Method
[Figure labels: Knowledge Discovery in Databases; Data mining; Tasks; Techniques]
CLASSIFICATION: DEFINITION
Given a collection of records (training set)
• Each record contains a set of attributes; one of the attributes is the class.
Find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
• A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
CLASSIFICATION EXAMPLE
Attribute types: Refund (categorical), Marital Status (categorical), Taxable Income (continuous), Cheat (class)

Training Set
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Test Set
Refund  Marital Status  Taxable Income  Cheat
No      Single          75K             ?
Yes     Married         50K             ?
No      Married         150K            ?
Yes     Divorced        90K             ?
No      Single          40K             ?
No      Married         80K             ?

[Figure: Training Set --> Learn Classifier --> Model; Test Set --> Model]
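The example above can be made concrete with a small sketch. The decision tree below is hand-written and is an assumption, not something given on the slide: it is simply one tree that happens to agree with all ten training records. A real learning algorithm would derive such a tree from the data. Note also that accuracy measured on the training set is optimistic; the slide's test records have no labels, so the sketch can only predict them, not score them.

```python
# Training records from the slide: (refund, marital_status, income_in_k, cheat)
TRAINING_SET = [
    ("Yes", "Single", 125, "No"),
    ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"),
    ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"),
    ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"),
    ("No", "Single", 90, "Yes"),
]

# Unlabelled test records from the slide: (refund, marital_status, income_in_k)
TEST_SET = [
    ("No", "Single", 75),
    ("Yes", "Married", 50),
    ("No", "Married", 150),
    ("Yes", "Divorced", 90),
    ("No", "Single", 40),
    ("No", "Married", 80),
]


def predict_cheat(refund, marital_status, income_k):
    """Hand-written decision tree (assumed for illustration, not from the slides)."""
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # The 80K threshold is chosen by eye to separate the remaining records.
    return "Yes" if income_k >= 80 else "No"


def accuracy(labelled_records):
    """Fraction of labelled records whose class the model reproduces."""
    correct = sum(
        1
        for refund, status, income, label in labelled_records
        if predict_cheat(refund, status, income) == label
    )
    return correct / len(labelled_records)


training_accuracy = accuracy(TRAINING_SET)
test_predictions = [predict_cheat(*record) for record in TEST_SET]
print("training accuracy:", training_accuracy)
print("test predictions:", test_predictions)
```

In practice, part of the labelled data would be held back as the test set so that accuracy can be measured on records the model has not seen.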
CLASSIFICATION: APPLICATION 1
DIRECT MARKETING

1. Goal: Reduce the cost of mailing by targeting a set of consumers likely to buy a new cell-phone product.

2. Approach:
• Collect various demographic, lifestyle, and company-interaction related information about customers: type of business, where they stay, how much they earn, etc.
• Identify which customers decided to buy and which decided otherwise. This {buy, don’t buy} decision forms the class attribute.
• Use this information as input attributes to learn a classifier model.
CLASSIFICATION: APPLICATION 2
CUSTOMER ATTRITION/CHURN

1. Goal: To predict whether a customer is likely to be lost to a competitor.

2. Approach:
• Use detailed records of transactions for past and present customers.
• Find attributes such as how often the customer calls, where they call, what time of day they call most, their financial status, marital status, etc.
• Label the customers as loyal or disloyal.
• Find a model for loyalty.
CLUSTERING: DEFINITION

Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that
• Data points in one cluster are more similar to one another.
• Data points in separate clusters are less similar to one another.

Similarity Measures:
• Euclidean Distance if attributes are continuous.
• Other Problem-specific Measures.
ILLUSTRATING CLUSTERING
• Euclidean Distance Based Clustering in 3-D space.
• Intracluster distances are minimized.
• Intercluster distances are maximized.
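A minimal sketch of Euclidean-distance clustering in 3-D, using k-means (listed among the techniques earlier in the deck). The six points are made up for illustration, and the initialisation simply takes the first k points, which is a simplifying assumption; practical implementations choose starting centroids more carefully.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_means(points, k, iterations=20):
    """Plain k-means; the first k points serve as initial centroids (naive choice)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters


def mean_intracluster_distance(centroid, members):
    """Average distance from cluster members to their centroid."""
    return sum(euclidean(centroid, p) for p in members) / len(members)


# Hypothetical 3-D data: two well-separated groups.
POINTS = [(1, 1, 1), (9, 9, 9), (1, 2, 1), (2, 1, 1), (8, 9, 9), (9, 8, 9)]

centroids, clusters = k_means(POINTS, k=2)
intra = [mean_intracluster_distance(c, m) for c, m in zip(centroids, clusters)]
inter = euclidean(centroids[0], centroids[1])
print("clusters:", clusters)
print("mean intracluster distances:", intra)
print("intercluster (centroid) distance:", inter)
```

For a good clustering the intracluster distances come out small relative to the distance between centroids, which is the property the slide illustrates.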
CLUSTERING: APPLICATION 1
MARKET SEGMENTATION

1. Goal: Subdivide a market into distinct subsets of customers, where any subset may conceivably be selected as a market target to be reached with a distinct marketing mix.

2. Approach:
• Collect different attributes of customers based on their geographical and lifestyle related information.
• Find clusters of similar customers.
• Measure the clustering quality by observing buying patterns of customers in the same cluster vs. those from different clusters.
CLUSTERING: APPLICATION 1 – MARKET SEGMENTATION

Segment 1: high duration but low number of generated calls, and moderate number of sent and received SMS.

Segment 2: moderate duration of generated calls and moderate to high data usage.

Segment 3: high duration of off-net calls, high number of generated calls, and moderate to low duration of generated calls and data usage.

Segment 4: very low call duration, high sent and received SMS, and high data usage.

Segment 5: very low data usage, low duration of generated calls, and high number of received calls with respect to the number of generated calls.

Segment 6: relatively high duration of international calls.

Market Segmentation: https://online-journals.org/index.php/i-jim/article/download/4392/3606

CLUSTERING: APPLICATION 2

DOCUMENT CLUSTERING

1. Goal: To find groups of documents that are similar to each other based on the important terms appearing in them.

2. Approach:
• Identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms. Use it to cluster.
• Gain: Information Retrieval can utilize the clusters to relate a new document or search term to clustered documents.
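The approach above can be sketched as follows. The slide only asks for a similarity measure based on term frequencies; cosine similarity is one common choice and is an assumption here, as are the three toy documents. The tokenizer is deliberately naive (lowercase, split on whitespace, no stop-word removal).

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count how often each whitespace-separated, lowercased term occurs."""
    return Counter(text.lower().split())


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors."""
    shared_terms = tf_a.keys() & tf_b.keys()
    dot = sum(tf_a[t] * tf_b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical documents for illustration only.
DOCUMENTS = [
    "data mining finds patterns in data",
    "clustering groups similar data points",
    "football match ends in draw",
]

vectors = [term_frequencies(doc) for doc in DOCUMENTS]
sim_mining_clustering = cosine_similarity(vectors[0], vectors[1])
sim_mining_football = cosine_similarity(vectors[0], vectors[2])
print("mining vs clustering:", round(sim_mining_clustering, 3))
print("mining vs football:", round(sim_mining_football, 3))
```

The pairwise similarities can then be fed to any clustering method; a real system would usually also remove stop words such as "in", which here is the only term the first and third documents share.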
ASSOCIATION RULE DISCOVERY: DEFINITION
Given a set of records, each of which contains some number of items from a given collection:
• Produce dependency rules which will predict the occurrence of an item based on occurrences of other items.

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Rules Discovered:
{Milk} --> {Coke}
{Diaper, Milk} --> {Beer}
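How strongly the five transactions back each rule can be checked with a short sketch. Support and confidence are the standard measures for association rules; the slide does not name them, so treat their use here as an addition for illustration.

```python
# The five transactions from the slide, one set of items per TID.
TRANSACTIONS = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent, transactions) / support(
        antecedent, transactions
    )


RULES = [
    ({"Milk"}, {"Coke"}),
    ({"Diaper", "Milk"}, {"Beer"}),
]

for antecedent, consequent in RULES:
    s = support(antecedent | consequent, TRANSACTIONS)
    c = confidence(antecedent, consequent, TRANSACTIONS)
    print(sorted(antecedent), "-->", sorted(consequent),
          "support", round(s, 2), "confidence", round(c, 2))
```

Milk appears in four transactions and Coke accompanies it in three, so {Milk} --> {Coke} holds in 3 of 4 cases; {Diaper, Milk} appears in three transactions and Beer accompanies it in two.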
ASSOCIATION RULE DISCOVERY: APPLICATION 1

MARKETING AND SALES PROMOTION

• Let the rule discovered be

{Bagels, … } --> {Potato Chips}
• Potato Chips as the consequent can be used to determine what should be done to boost its sales.
• Bagels in the antecedent can be used to see which products would be affected if the store discontinues selling bagels.
• Bagels in the antecedent and Potato Chips in the consequent can be used to see what products should be sold with Bagels to promote the sale of Potato Chips!
ASSOCIATION RULE DISCOVERY: APPLICATION 2

SUPERMARKET SHELF MANAGEMENT

1. Goal: To identify items that are bought together by sufficiently many customers.

2. Approach:
• Process the point-of-sale data collected with barcode scanners to find dependencies among items.
3. A classic rule:
• If a customer buys diapers and milk, then they are very likely to buy rootbeer.
• So, don’t be surprised if you find six-packs of rootbeer stacked next to diapers!
RETAIL ANALYTICS
https://www.digitalnewsasia.com/download/tapwaycasestudy.pdf
REGRESSION

1. Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or nonlinear model of dependency.
2. Extensively studied in statistics and machine learning.
3. Examples:
• Predicting sales amounts of a new product based on advertising expenditure.
• Predicting wind velocities as a function of temperature, humidity, air pressure, etc.
• Time series prediction of stock market indices.
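A minimal sketch of the linear case, fitting a straight line by ordinary least squares to the first example (sales as a function of advertising expenditure). The figures are invented and deliberately lie exactly on a line so the result is easy to check by hand; real data would be noisy.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend and resulting sales (arbitrary units).
advertising = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

slope, intercept = fit_line(advertising, sales)
predicted_sales = slope * 6 + intercept  # forecast for a spend of 6 units
print("slope:", slope, "intercept:", intercept)
print("predicted sales at spend 6:", predicted_sales)
```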
DEVIATION ANALYSIS

1. Discovering the most significant changes in data from previously measured or normative values
2. Usually categorized separately from other data mining tasks
3. Deviations are often infrequent
4. Modifications of classification, clustering and time series analysis can be used as a means to achieve the goal
5. Known as outlier detection in statistics
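One simple statistical way to flag deviations from normative values is a z-score test, sketched below. The slide does not prescribe a method, so the z-score rule, the threshold of 2 standard deviations, and the sample amounts are all assumptions for illustration.

```python
import statistics


def z_score_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / spread > threshold]


# Hypothetical daily transaction amounts with one unusual value.
amounts = [100, 102, 98, 101, 99, 100, 250]

outliers = z_score_outliers(amounts)
print("flagged:", outliers)
```

Because the outlier itself inflates the mean and standard deviation, robust variants based on the median are often preferred when several extreme values are expected.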
DEVIATION ANALYSIS (ANOMALY DETECTION)

1. Detect significant deviations from normal behavior.

2. Applications:
• Credit Card Fraud Detection
• Network Intrusion Detection

Typical network traffic at university level may reach over 100 million connections per day.
DEVIATION ANALYSIS (FRAUD DETECTION)

1. Identify employee accounts at financial institutions that have excess numbers of credit memos. Excess credit memos can indicate diversion of funds into employee accounts.

2. Compare employee home addresses, social security numbers, telephone numbers and bank routing and account numbers to those of vendors from the vendor master file. This test can reveal bogus or improperly selected vendor accounts.
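The second test can be sketched as a field-by-field comparison between employee records and the vendor master file. All records, names and field names below are hypothetical, and the normalisation (lowercasing and dropping punctuation and spaces) is a simplifying assumption; real matching would need more careful address and number handling.

```python
def normalise(value):
    """Lowercase and keep only letters and digits so formatting differences are ignored."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def shared_fields(employee, vendor, fields):
    """List the fields on which an employee record and a vendor record agree."""
    return [
        f
        for f in fields
        if employee.get(f) and normalise(employee[f]) == normalise(vendor.get(f, ""))
    ]


def flag_matches(employees, vendors, fields=("address", "phone", "bank_account")):
    """Return (employee name, vendor name, matching fields) for every suspicious pair."""
    flags = []
    for employee in employees:
        for vendor in vendors:
            common = shared_fields(employee, vendor, fields)
            if common:
                flags.append((employee["name"], vendor["name"], common))
    return flags


# Hypothetical employee records and vendor master file.
EMPLOYEES = [
    {"name": "Employee A", "address": "12 Jalan Mawar, Shah Alam",
     "phone": "03-5555 0101", "bank_account": "1234567890"},
    {"name": "Employee B", "address": "7 Jalan Melati",
     "phone": "03-5555 0202", "bank_account": "2222222222"},
]
VENDORS = [
    {"name": "Vendor X", "address": "12 JALAN MAWAR SHAH ALAM",
     "phone": "03-5555 0999", "bank_account": "9999999999"},
    {"name": "Vendor Y", "address": "88 Jalan Industri",
     "phone": "03-5555 0303", "bank_account": "2222222222"},
]

flags = flag_matches(EMPLOYEES, VENDORS)
for employee_name, vendor_name, common in flags:
    print(employee_name, "shares", common, "with", vendor_name)
```

A match is only a lead for an auditor to follow up, not proof of fraud.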
DEVIATION ANALYSIS (FRAUD DETECTION)

https://www.insurancebusinessmag.com/asia/news/breaking-news/malaysias-antifraud-system-operational-by-october-74933.aspx
PROFITEERING CASES

https://www.freemalaysiatoday.com/category/nation/2018/08/25/yes-keep-receipts-to-fight-profiteering-say-retailers/

Yes, keep receipts to fight profiteering, say retailers

Robin Augustin - August 25, 2018 8:00 AM
http://english.astroawani.com/malaysia-news/gst-1-256-profiteering-cases-detected-1-115-notices-issued-till-june-5-61853
THANK YOU
Shuzlina Abdul Rahman | Sofianita Mutalib | Siti Nur Kamaliah Kamarudin
