Lesson 6 - Data Mining

Uploaded by

jerickculaniban.cosca

DATA MINING

LESSON 6
CONTENT

 The Need for Data Mining and Business Value

 The Data Mining Process
 Define Business Objectives
 Get Raw Data
 Identify Relevant Predictive Variables
 Gain Customer Insight
 Act
THE NEED FOR DATA MINING

 Companies developed a need to understand their customers better
 They must be proactive and anticipate customer desires
 Many companies have full access to detailed customer data
 Data mining provides businesses with the ability to make
knowledge-driven strategic business decisions
THE NEED FOR DATA MINING

 For many companies the problem is an increasing deluge of data
rather than a lack of it, and this becomes an obstacle to using the
data
 Companies need to implement a standardized data mining
procedure in order to extract customer intelligence and value from
that data
THE BUSINESS VALUE OF DATA MINING

 Data mining can assist in selecting the right target customers or in
identifying customer segments with similar behavior and needs
 Applications of data mining include the following:
 Identifying customers that are likely to stop doing business with the
company with the help of predictive models
 Increasing customer profitability by identifying customers with a
high growth potential
 Reducing marketing costs by more selective targeting
THE DATA MINING PROCESS
THE BUSINESS VALUE OF DATA MINING
EXTENT OF INVOLVEMENT OF THE THREE MAIN GROUPS
PARTICIPATING IN A DATA MINING PROJECT

 Data mining group
 Understand business objectives and support the business group
to refine and sometimes correct the scope and expectations
 Most active during the variable selection and modeling phase
 Share obtained customer insights with the business group
EXTENT OF INVOLVEMENT OF THE THREE MAIN GROUPS
PARTICIPATING IN A DATA MINING PROJECT

 IT resources

 Source and extract required data used for modeling

EXTENT OF INVOLVEMENT OF THE THREE MAIN GROUPS
PARTICIPATING IN A DATA MINING PROJECT

 Business group
 Check plausibility and soundness of the solution in business
terms
 Take lead in deploying new insights into corporate action (e.g., a
call center or direct mail campaign)
DATA MANIPULATION

 In a simple, two-dimensional data table, columns represent
descriptive variables, whereas rows represent single observations
DATA MANIPULATION

 Manipulation on columns:
 Transformation
 Derivation
 Elimination
DATA MANIPULATION

 Manipulation on rows:
 Aggregation
 Change detection
 Missing value detection
 Outlier detection
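Three of the row manipulations above can be sketched with the Python standard library. This is only an illustration: the observations, the field layout and the 1.5 x IQR outlier rule are assumptions, not part of the lesson.

```python
from statistics import quantiles

# Hypothetical observations: (customer_id, monthly_spend); None marks a missing value.
rows = [
    ("c1", 40.0), ("c1", 55.0), ("c2", None),
    ("c2", 30.0), ("c3", 45.0), ("c3", 900.0),
]

# Missing value detection: rows whose spend is absent.
missing = [r for r in rows if r[1] is None]

# Aggregation: total spend per customer over the non-missing rows.
totals = {}
for customer_id, spend in rows:
    if spend is not None:
        totals[customer_id] = totals.get(customer_id, 0.0) + spend

# Outlier detection with the 1.5 * IQR rule (an assumed, common heuristic).
values = [spend for _, spend in rows if spend is not None]
q1, _, q3 = quantiles(values, n=4, method="inclusive")
low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
outliers = [v for v in values if v < low or v > high]
```

Whether a flagged value is an error or a genuinely valuable customer is a business question, so flagged rows are listed here rather than deleted.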
DATA PREPARATION

 For modeling, incoming data is sampled and split into various
streams as:
 Train set: Used to build models
 Test set: Used for out-of-sample tests of the model quality and to
select the final model candidate
 Scoring data: Used for model-based prediction
 The data sets must be carefully examined and designed to ensure the
statistical significance of the results obtained
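A minimal sketch of the train/test split, assuming a simple random split with a fixed seed. The function name, the 30% test fraction and the sample records are illustrative assumptions; the scoring data would be a separate, unlabelled set.

```python
import random

def split_train_test(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split them into train and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labelled customers: (customer_id, target) with target 1 or 0.
labelled = [(f"c{i}", i % 2) for i in range(10)]
train_set, test_set = split_train_test(labelled)
```

Fixing the seed makes the split reproducible, which helps when the data sets have to be archived and the results re-checked later.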
DEFINE BUSINESS OBJECTIVES

 Modeling of expected customer potential in order to target the acquisition
of customers who will be profitable over the lifetime of the business
relationship
 Mathematically define the target variable of customer behavior that has to
be predicted
 Distinguish between customers by setting a target variable for specific
customer groups, e.g., 1 for one group and 0 for another
 Establish likely threshold levels indicating which customer should be
targeted in the marketing campaign
DEFINE BUSINESS OBJECTIVES

 Define a set of business or selection rules for the campaign (e.g.,
identifying customers that should be excluded from or included in the
target groups)
 Define the details of project execution, specifying the start and
delivery dates of the data mining process and the responsible
resources for each task
 Define the chosen experimental setup for the campaign
DEFINE BUSINESS OBJECTIVES

 Define a cost/revenue matrix describing how the business
mechanics will work in the supported campaign and how it will
impact the data mining process
 Establish criteria for evaluating the success of the campaign
 Find a benchmark to compare results obtained in the past for similar
campaign setups using traditional targeting methods instead of
predictive models
COST/REVENUE MATRIX

 A cost/revenue matrix describes how the business mechanics will work in the
supported campaign and gives business users an immediately interpretable table
Example: Call Center Campaign
 Assuming the average cost per call is $5, each positive responder (purchaser) will
generate additional cost due to:
 Administration work required to register them as a new customer
 Cost of the delivered phone handset ($100)
 Customers who respond positively will generate average revenue of $1,000 per
year
COST/REVENUE MATRIX – CALL CENTER CAMPAIGN

Cost/Revenue matrix          In reality prospect          In reality prospect
                             did not purchase             did purchase

Model predicts prospect      Cost: $0                     Lost business opportunity
will not purchase            1st year revenue: $0         of +$895
(not contacted)              Total: $0

Model predicts prospect      Cost: -$5                    Cost: -$5; -$100
will purchase (contacted)    1st year revenue: $0         1st year revenue: +$1000
                             Total: -$5                   Total: +$895
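The cell totals of the call center matrix can be reproduced with a few lines of Python. This sketch uses only the three figures given in the example ($5 per call, $100 handset, $1,000 first-year revenue). The administration cost is left out because the lesson gives no amount for it, and the function and variable names are hypothetical.

```python
COST_PER_CALL = 5          # from the call center example
HANDSET_COST = 100         # from the call center example
FIRST_YEAR_REVENUE = 1000  # from the call center example

def cell_total(contacted, purchased):
    """First-year total for one prospect in a given cell of the matrix."""
    if not contacted:
        return 0  # no call is made, so no cost and no revenue
    total = -COST_PER_CALL
    if purchased:
        total += FIRST_YEAR_REVENUE - HANDSET_COST
    return total

# Keys are (contacted, purchased) pairs, one per cell of the matrix.
matrix = {
    (contacted, purchased): cell_total(contacted, purchased)
    for contacted in (False, True)
    for purchased in (False, True)
}

# Opportunity lost when a would-be purchaser is not contacted.
lost_opportunity = matrix[(True, True)] - matrix[(False, True)]
```

A contacted purchaser nets -5 - 100 + 1000 = 895, which is also the opportunity lost for every purchaser the model fails to select.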
GET RAW DATA

 Business objectives need to be translated into data requirements

 Identified data has to be extracted and consolidated in a database
 The quality of the analytical data has to be checked (business
context)
GET RAW DATA

 Three steps of getting raw data:
 Step 1: Looking for Data Sources
 Step 2: Loading the Data
 Step 3: Checking Data Quality
STEP 1: LOOKING FOR DATA SOURCES

 Data sourcing
 Mixed top-down and bottom-up process driven by business requirements
(top) and technical restrictions (bottom)
 Data warehouse infrastructures with advanced data cleansing processes can
help ensure working with high-quality data
 All available metadata has to be collected to fully understand data types,
value ranges and the primary/foreign key structures
 Build a (simple) relational data model onto which the source data will be
mapped
STEP 2: LOADING THE DATA

 Define further query restrictions in order to model subsets of the full
data
 After data management is requested to deliver the specified data, IT
teams prepare the necessary data queries
 Deliver extracted data to the data mining environment in a pre-defined
format
 Process the data further and use it to fill the previously defined data
model in the data mining environment as part of the ETL
(Extract-Transform-Load) process
STEP 3: CHECKING DATA QUALITY

 Importance of data quality

 According to Olson (2003) the costs of poor data quality are
estimated at 15-25% of operating profit
 Assess and understand limitations of data resulting from its
inherent quality (good or bad) aspects
 Create an analytical database as the basis for subsequent
analyses
STEP 3: CHECKING DATA QUALITY

 Relevant aspects of data quality

 Accuracy (consistency and validity)
 Relevance
 Completeness
 Reliability
STEP 3: CHECKING DATA QUALITY

 Carry out preliminary data quality assessment

 To assure an acceptable level of quality of the delivered data
 To ensure that the data mining team has a clear understanding of
how to interpret the data in business terms
 Data miners have to carry out some basic data interpretation and
aggregation exercises
STEP 3: CHECKING DATA QUALITY

 The data available for the mining project must be analyzed to
answer the following questions:
 Does the data correspond to the original sourcing requirements?
 Is the quality sufficient?
 Do we understand the data?
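Two of the quality aspects, completeness and validity, lend themselves to simple automated checks. The sketch below is illustrative only: the record layout, field names and the valid age range of 0 to 120 are assumptions.

```python
# Hypothetical delivered records; field names and values are assumptions.
records = [
    {"customer_id": "c1", "age": 34, "region": "north"},
    {"customer_id": "c2", "age": None, "region": "south"},
    {"customer_id": "c3", "age": 212, "region": None},
    {"customer_id": "c4", "age": 51, "region": "east"},
]

def completeness(records, field):
    """Share of records in which the field is present (not None)."""
    return sum(r[field] is not None for r in records) / len(records)

def invalid_ages(records, low=0, high=120):
    """Customer ids whose age is present but outside the assumed valid range."""
    return [r["customer_id"] for r in records
            if r["age"] is not None and not low <= r["age"] <= high]

report = {field: completeness(records, field) for field in ("age", "region")}
suspicious = invalid_ages(records)
```

Checks like these answer "Is the quality sufficient?" only in part. Whether the data matches the sourcing requirements and is understood in business terms still needs a review with the business group.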
IDENTIFY RELEVANT PREDICTIVE VARIABLES

 Step 1: Create Analytical Customer View – Flattening the Data

 Step 2: Create Analytical Variables
 Step 3: Select Predictive Variables
STEP 1: CREATE ANALYTICAL CUSTOMER VIEW – FLATTENING THE DATA

 The individual customer constitutes the observational unit for data analysis and
predictive modeling
 All data pertaining to an individual customer is contained in one observation
(row, record)
 Individual columns (variables, fields) represent the conditions at specific
points in time or a summary over a whole period – this requires
denormalizing the original relational data structures (flattening)
 Definition of the target (dependent) variable: values should be generated
for all customers and added to the existing data tables
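Flattening can be sketched as grouping a one-row-per-transaction table into one row per customer and attaching the target value. The table layout, the summary columns and the target values below are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical normalized source table: one row per transaction.
transactions = [
    {"customer_id": "c1", "month": "2024-01", "amount": 40.0},
    {"customer_id": "c1", "month": "2024-02", "amount": 60.0},
    {"customer_id": "c2", "month": "2024-02", "amount": 25.0},
]
# Hypothetical target variable: 1 if the customer purchased, else 0.
targets = {"c1": 1, "c2": 0}

def flatten(transactions, targets):
    """Build one row per customer with period summaries and the target."""
    grouped = defaultdict(list)
    for t in transactions:
        grouped[t["customer_id"]].append(t["amount"])
    return [
        {
            "customer_id": cid,
            "n_transactions": len(amounts),
            "total_amount": sum(amounts),
            "target": targets[cid],
        }
        for cid, amounts in sorted(grouped.items())
    ]

customer_view = flatten(transactions, targets)
```

Each resulting row is one observation, and each summary column describes the whole period rather than a single transaction.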
STEP 2: CREATE ANALYTICAL VARIABLES

 Introduce additional variables derived from the original ones

 When needed, transform variables to get new and more predictive
variables
 Example: Transform customer birth date into age
 Increase normality of variable distributions to help the predictive
model training process
 Missing value management is key for enhancing the quality of the
analytical data set
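The three ideas on this slide can be sketched as follows: deriving age from birth date, filling missing values and reducing skew. Median imputation and the log transform are assumed choices for the illustration; the lesson does not prescribe a specific method.

```python
import math
from datetime import date
from statistics import median

def age_on(birth_date, reference_date):
    """Age in completed years on the reference date."""
    had_birthday = (reference_date.month, reference_date.day) >= (
        birth_date.month, birth_date.day)
    return reference_date.year - birth_date.year - (0 if had_birthday else 1)

def impute_median(values):
    """Replace missing values (None) with the median of the observed ones."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

# Hypothetical inputs.
age = age_on(date(1990, 6, 15), date(2024, 6, 14))
incomes = impute_median([20000.0, None, 40000.0, 1000000.0])
# The log transform compresses the long right tail of the income distribution.
log_incomes = [math.log1p(v) for v in incomes]
```

The reference date is passed in explicitly so that the derived age is reproducible when the same model is later applied to newer data.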
STEP 3: SELECT PREDICTIVE VARIABLES

 Inspect the descriptive statistics of the univariate distributions of all available
variables
 Variables that can be excluded
 Taking on only one value (i.e. the variable is a constant)
 With mostly missing values
 Directly or indirectly identifying an individual customer
 Showing collinearities
 Showing very little correlation with the target variable
 Containing personal identifiers
 Check if all variables have been mapped to the appropriate data types
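Three of the exclusion rules above (constant variables, mostly missing values, very little correlation with the target) can be sketched as a simple filter. The thresholds of 50% missing and an absolute correlation of 0.1 are assumptions, as is the sample data. Collinearity and personal identifiers are not covered by this sketch.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_variables(columns, target, max_missing=0.5, min_abs_corr=0.1):
    """Keep variables that vary, are mostly present and correlate with the target."""
    kept = []
    for name, values in columns.items():
        observed = [(v, t) for v, t in zip(values, target) if v is not None]
        if len(observed) < (1 - max_missing) * len(values):
            continue  # mostly missing values
        xs, ts = zip(*observed)
        if len(set(xs)) == 1:
            continue  # takes on only one value
        if abs(pearson(xs, ts)) < min_abs_corr:
            continue  # very little correlation with the target
        kept.append(name)
    return kept

# Hypothetical flattened data, one value per customer.
target = [1, 0, 1, 0, 1, 0]
columns = {
    "n_calls": [9, 2, 8, 1, 7, 3],
    "country": [1, 1, 1, 1, 1, 1],
    "fax_usage": [None, None, None, None, 5, None],
    "shoe_size": [40, 40, 42, 42, 41, 41],
}
selected = select_variables(columns, target)
```

Pearson correlation only captures linear relationships, so a variable dropped by this filter may still be useful to a non-linear model.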
GAIN CUSTOMER INSIGHT

 Step 1: Preparing Data Sample

 Step 2: Predictive Modeling
 Step 3: Select Model
STEP 1: PREPARING DATA SAMPLES

 Analyze if sufficient data is available to obtain statistically
significant results
 If enough data is available, split the data into two samples
 The train set to fit the models
 The test set to check the model’s performance on observations
that have not been used to build it
STEP 2: PREDICTIVE MODELING

 Two steps:
 The rules (or linear / non-linear analytical models) are built based
on a training set
 These rules are then applied to a new dataset for generating the
answers needed for the campaign
STEP 2: PREDICTIVE MODELING

 Guidelines:
 Distinguish between different types of predictive models obtained
through different modeling paradigms: supervised and unsupervised
modeling
 Find the right relationships between variables describing the customers
to predict their respective group membership likelihood: purchaser or
non-purchaser, referred to as scoring (e.g., between 0 and 1)
 Apply unsupervised modeling where group membership is not known
beforehand
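As one possible instance of supervised modeling, the sketch below fits a one-variable logistic regression by plain gradient descent and then scores new customers. The training data, learning rate and number of epochs are assumptions; in practice a statistics or machine learning package would be used instead.

```python
import math

def sigmoid(z):
    """Map any real number to a score between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit weight and bias of a one-variable logistic model by gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Step one: build the rule on a hypothetical training set
# (calls per month as the predictor, purchaser flag as the target).
train_x = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
train_y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(train_x, train_y)

# Step two: apply the fitted rule to new customers to obtain their scores.
new_customers = {"c10": 2.5, "c11": 7.5}
scores = {cid: sigmoid(w * x + b) for cid, x in new_customers.items()}
```

The score is the modeled likelihood of belonging to the purchaser group. Unsupervised methods such as clustering would be used instead when no purchaser flag is available for training.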
STEP 3: SELECT MODEL

 Compare relative quality of prediction by comparing respective
misclassification rates obtained on the test set
 Include the economic implications of a model by applying the
previously defined cost/revenue matrix
 Predictive models, for instance, deliver a score value, or likelihood,
for each customer to show the modeled target behavior (e.g.,
purchase of a credit card)
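The sketch below compares two candidate models on a test set, first by misclassification rate and then by campaign profit using the cell totals of the call center example (+$895 for a contacted purchaser, -$5 for a contacted non-purchaser). The test outcomes, the scores and the 0.5 threshold are made-up assumptions.

```python
def misclassification_rate(actual, scores, threshold=0.5):
    """Share of test observations whose thresholded score disagrees with the outcome."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    return sum(p != a for p, a in zip(predicted, actual)) / len(actual)

def campaign_profit(actual, scores, threshold=0.5):
    """First-year profit if every prospect scored at or above the threshold is called."""
    profit = 0
    for a, s in zip(actual, scores):
        if s >= threshold:
            profit += 895 if a == 1 else -5  # cell totals from the call center example
    return profit

# Hypothetical test-set outcomes and scores from two candidate models.
actual = [1, 0, 0, 1, 0, 0, 1, 0]
model_a = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.8, 0.2]
model_b = [0.7, 0.6, 0.7, 0.6, 0.6, 0.4, 0.9, 0.1]

rates = {"A": misclassification_rate(actual, model_a),
         "B": misclassification_rate(actual, model_b)}
profits = {"A": campaign_profit(actual, model_a),
           "B": campaign_profit(actual, model_b)}
```

In this made-up case model A misclassifies fewer prospects (0.25 against 0.375), yet model B earns more (2670 against 1785) because a missed purchaser costs far more than a wasted call. This is why the cost/revenue matrix belongs in the comparison.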
ACT

 Step 1: Deliver Results to Operational Systems

 Step 2: Archive Results
 Step 3: Learn
STEP 1: DELIVER RESULTS TO OPERATIONAL SYSTEMS

 Apply the selected model to the entire customer base

 Prepare score data set containing the most recent information for each
customer with the variables required by the model
 The obtained score value for each customer and the defined threshold
value will determine whether the corresponding customer qualifies to
participate in the campaign
 When delivering results to the operational systems, provide necessary
customer identifiers to unambiguously link the model’s score
information to the correct customer
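Applying the threshold to the score data set can be sketched as below. The scores, the customer identifiers and the threshold of 0.6 are assumptions; the lesson does not give a threshold value.

```python
# Hypothetical score data set: customer identifier -> model score.
scores = {"c1": 0.82, "c2": 0.35, "c3": 0.67, "c4": 0.12}
THRESHOLD = 0.6  # assumed value, not taken from the lesson

def select_targets(scores, threshold):
    """Return (customer_id, score) pairs at or above the threshold, best first."""
    qualified = [(cid, s) for cid, s in scores.items() if s >= threshold]
    return sorted(qualified, key=lambda pair: pair[1], reverse=True)

target_list = select_targets(scores, THRESHOLD)
```

Keeping the customer identifier next to each score is what lets the operational system link the selection back to the correct customer.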
STEP 2: ARCHIVE RESULTS

 Each data mining project will produce a huge amount of information including:
 Raw data used
 Transformations for each variable
 Formulas for creating derived variables
 Train, test and score data sets
 Target variable calculation
 Models and their parameterizations
 Score threshold levels
 Final customer target selections
STEP 2: ARCHIVE RESULTS

 Useful to preserve, especially if the same model is used to score
different data sets obtained at different times
STEP 3: LEARN

 Referred to as “closing the loop”

 Obtain the facts describing performance of data mining project and
business impact
 These facts are attained by monitoring campaign performance while
it is running and from final campaign performance analysis after the
campaign has ended
 Detect when a model has to be re-trained
SUMMARY

 Data Mining can assist in selecting the right target customers or in identifying previously
unknown customers with similar behavior and needs
 A good target list is likely to increase purchase rates and has a positive impact on
revenue
 In the context of CRM, the individual customer is often the central object analyzed by
means of data mining methods
 A complete data mining process comprises assessing and specifying the business
objectives, data sourcing, transformation and creation of analytical variables and
building analytical models using techniques such as logistic regression and neural
networks, scoring customers and obtaining feedback from the field
 Learning and refining the data mining process is the key to success
