Ch1 2
Analyzing data
Given management goals, and given that
management can translate knowledge into action
Basic Styles
• Top-Down: HYPOTHESIS TESTING
– SUPERVISED
– have a theory, experiment to prove or disprove
– SCIENCE
• Bottom-Up: KNOWLEDGE DISCOVERY
– UNSUPERVISED
– start with data, see new patterns
– CREATIVITY
Hypothesis Testing
• Generate theory
• Determine data needed
• Get data
• Prepare data
• Build computer model
• Evaluate model results
– confirm or reject hypotheses
Generate Theory
• Study
• Systematically tie different input sources
together (MENTAL MODEL)
– what causes sales volume?
• Sales rep performance
• economy, seasonality
• product quality, price, promotion, location
Generate Theory
• Brainstorm:
– diverse representatives for broad coverage of
perspectives (electronic)
– keep under control (keep positive)
– generate testable hypotheses
Define Data Needed
• Determine data needed to test hypothesis
– Lucky - query existing database
– More often - gather
• pull together from diverse databases, survey, buy
Locate Data
• Usually scattered or unavailable
• Sources:
– warranty claims
– point-of-sale data (cash register records)
– medical insurance claims
– telephone call detail records
– direct mail response records
– demographic data, economic data
• PROFILE: counts, summary statistics, cross-tabs,
cleanup
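The profiling step above (counts, summary statistics, cross-tabs) can be sketched in plain Python; the records and field names here are hypothetical:

```python
from collections import Counter
from statistics import mean

# Hypothetical claim records; fields are illustrative only.
claims = [
    {"state": "CA", "type": "dental", "amount": 120.0},
    {"state": "CA", "type": "vision", "amount": 80.0},
    {"state": "NY", "type": "dental", "amount": 150.0},
    {"state": "NY", "type": "dental", "amount": 95.0},
]

# Counts per field value.
state_counts = Counter(c["state"] for c in claims)
print(state_counts)                      # Counter({'CA': 2, 'NY': 2})

# Summary statistics on a numeric field.
amounts = [c["amount"] for c in claims]
print(min(amounts), mean(amounts), max(amounts))   # 80.0 111.25 150.0

# Cross-tab: state x claim type.
crosstab = Counter((c["state"], c["type"]) for c in claims)
print(crosstab[("NY", "dental")])        # 2
```

Profiling like this is also where cleanup problems (impossible values, inconsistent codes) first become visible.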
Prepare Data for Analysis
• Summarize at the right level:
– too much summarization: no discriminant information left
– too little: swamped with useless detail
• Process for computer: EBCDIC, ASCII
• Data encoding: how data is recorded can vary;
may have been collected for a specific purpose (CAL
omitting LA)
• Textual data: avoid if possible (may need to code)
• Missing values: missing salary- use mean?
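The mean-imputation idea for a missing salary can be sketched as follows (values are made up):

```python
from statistics import mean

# Hypothetical survey records; None marks a missing salary.
salaries = [52000, None, 61000, None, 48000]

known = [s for s in salaries if s is not None]
fill = mean(known)                                   # mean imputation
imputed = [s if s is not None else fill for s in salaries]
print(imputed)
```

Note the trade-off: filling with the mean keeps every record usable but shrinks the variance and can mask real patterns in who failed to report.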
Build Computer Model
• Convert mental model into quantitative
– roamers less sensitive to price than others
• threshold defining roamer
• average price per call, or number of calls above
price level
– families with children in high school most
likely to respond to home-equity loan offer
• identify families with, without high school age
• past data - responded or didn’t
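Converting a mental model into a quantitative one, as in the roamer example above, might look like this minimal sketch; the call records and the 25% threshold are assumptions for illustration:

```python
# Hypothetical call records for one subscriber: (minutes, roaming?, price)
calls = [
    (5, True, 0.90), (12, False, 0.25), (3, True, 1.10), (8, False, 0.25),
]

# Assumed quantitative definition: >25% of calls roaming => "roamer".
ROAMER_THRESHOLD = 0.25

roam_share = sum(1 for _, roaming, _ in calls if roaming) / len(calls)
is_roamer = roam_share > ROAMER_THRESHOLD

# Candidate price-sensitivity feature: average price per call.
avg_price = sum(p for _, _, p in calls) / len(calls)
print(is_roamer, avg_price)   # True 0.625
```

The point is that the vague notion "roamer" only becomes testable once a threshold and a measurable feature are pinned down.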
Evaluate Model
• Determine if hypotheses supported
– statistical practice
– test rule-based systems for accuracy
• Requires both business and analytic
knowledge
SUPERVISED
Dorn, National Underwriter, Oct. 18, 2004, pp. 34, 39
• Health care fraud
– Use statistics to identify indicators of fraud or abuse
– Can rapidly sort through large databases
• Identify patterns different from norm
– Moderately successful
• But only effective on schemes already detected
• To benefit firm, need to identify fraud prior to paying claim
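The statistical-indicator approach above can be sketched as a simple deviation-from-norm flag; the provider names, counts, and 1.5-sigma cutoff are all hypothetical:

```python
from statistics import mean, stdev

# Hypothetical monthly claim counts per provider; P5 is the planted anomaly.
claims_per_provider = {"P1": 40, "P2": 38, "P3": 45, "P4": 41, "P5": 120}

counts = list(claims_per_provider.values())
mu, sigma = mean(counts), stdev(counts)

# Flag providers more than 1.5 standard deviations above the norm.
flagged = [p for p, n in claims_per_provider.items() if (n - mu) / sigma > 1.5]
print(flagged)   # ['P5']
```

A flag like this only surfaces patterns that differ from the norm already present in the data, which is exactly the limitation the slide notes: it cannot see schemes that look normal.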
Knowledge Discovery
• Machine learning?
– Usually need intelligent analyst
• Directed: explain value of some variable
• Undirected: no dependent variable selected
– identify patterns
• use undirected to recognize relationships,
• directed to explain once found
Directed
• Goal-oriented
• Examples:
– if apples are discounted, what is the impact on other products?
– who is likely to purchase credit insurance?
– predicted profitability of a new customer
– what to bundle with a particular package
• Identify sources of preclassified data
• Prepare data for analysis
• Build & train computer model
• Evaluate
Identify Data Sources
• Best - existing corporate data warehouse
– data clean, verified, consistent, aggregated
• usually need to generate
– most data in form most efficient for designed
purpose
– historical sales data often purged for dormant
customers (but you need that information)
Prepare Data
• Put in needed format for computer
• Make consistent in meaning
• Need to recognize what data is missing
– derived fields: change in balance = new - old
– add missing but known-to-be-important data
• Divide data into training, test, evaluation sets
• Decide how to treat outliers
– statistically biasing, but may be the most important cases
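The training/test/evaluation split, together with the derived change-in-balance field mentioned above, might be sketched like this (all data is synthetic, and the 60/20/20 proportions are an assumption):

```python
import random

random.seed(42)

# Synthetic customer records: (old_balance, new_balance, responded?)
records = [(random.uniform(0, 5000), random.uniform(0, 5000),
            random.random() < 0.3) for _ in range(100)]

# Derived feature: change in balance = new - old.
rows = [(new - old, responded) for old, new, responded in records]

# Shuffle, then split 60/20/20 into training, test, and evaluation sets.
random.shuffle(rows)
n = len(rows)
train = rows[: int(0.6 * n)]
test = rows[int(0.6 * n): int(0.8 * n)]
evaluation = rows[int(0.8 * n):]
print(len(train), len(test), len(evaluation))   # 60 20 20
```

Shuffling before splitting matters: records are often stored in an order (by date, by region) that would otherwise make the three sets systematically different.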
Build & Train Model
• Regression - human builds (selects IVs)
• Automatic systems train
– give it data, let it hammer
• OVERFITTING:
– model fits the training data too closely, memorizing its noise
– TEST SET a means to evaluate model against
data not used in training
• tune weights before using to evaluate
Evaluate Model
• ERROR RATE: proportion of
classifications in evaluation set that were
wrong
• too little training: poor fit on training data
and poor error rate
• optimal training: good fit on both
• too much training: great fit on training data
and poor error rate
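The overfitting and error-rate ideas on these slides can be illustrated with a toy sketch: a model that simply memorizes the training data achieves a perfect fit there but a poor error rate on held-out data, while a simpler rule generalizes. The data and both models are illustrative assumptions:

```python
import random

random.seed(0)

# Toy data: x in [0,1), label = 1 if x > 0.5, with 10% label noise.
def make_data(n):
    data = []
    for _ in range(n):
        x = random.random()
        y = 1 if x > 0.5 else 0
        if random.random() < 0.10:       # noise
            y = 1 - y
        data.append((x, y))
    return data

train, test = make_data(200), make_data(200)

# Overfit model: memorize every training example exactly.
memory = {x: y for x, y in train}
def memorizer(x):
    return memory.get(x, 0)              # unseen points: guess class 0

# Simple model: a single threshold rule.
def threshold_rule(x):
    return 1 if x > 0.5 else 0

# ERROR RATE: proportion of classifications that were wrong.
def error_rate(model, data):
    return sum(1 for x, y in data if model(x) != y) / len(data)

print(error_rate(memorizer, train))      # 0.0  -- perfect fit to training
print(error_rate(memorizer, test))       # ~0.5 -- no generalization
print(error_rate(threshold_rule, test))  # ~0.1 -- near the noise floor
```

This is the "great fit on training data and poor error rate" case in miniature; the test set exposes it precisely because its points were never seen in training.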
Undirected Discovery
• What items sell together? Strawberries & cream
– Directed: what items sell with tofu? tabasco
• Long distance caller market segmentation
– before segmentation: uniform usage, weekday & weekend, spikes
on holidays
– after segmentation, one segment stood out: high & uniform usage
except for several months of nothing - college students, with
high credit worthiness & profitability
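The segmentation example above can be sketched with synthetic 12-month usage profiles; the usage levels, the "several months of nothing," and the spread cutoff of 30 are all assumptions:

```python
import random
from statistics import mean, pstdev

random.seed(1)

# Synthetic profiles: uniform callers vs. a "college student" pattern
# (heavy usage except for empty summer months).
def uniform_user():
    return [random.gauss(100, 5) for _ in range(12)]

def student_user():
    return [0 if m in (5, 6, 7) else random.gauss(160, 10) for m in range(12)]

users = [uniform_user() for _ in range(10)] + [student_user() for _ in range(10)]

# Derived features per user: average usage and month-to-month spread.
features = [(mean(u), pstdev(u)) for u in users]

# Crude segmentation: a large spread marks the intermittent pattern.
segments = ["intermittent" if sd > 30 else "uniform" for _, sd in features]
print(segments.count("uniform"), segments.count("intermittent"))   # 10 10
```

A real analysis would use a proper clustering method rather than a hand-set cutoff, but the idea is the same: aggregate usage looks uniform until derived features separate the segments.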
UNSUPERVISED
• Health care fraud
– Look at historical claim submissions
• Build ad hoc model to compare with current claims
– Assign similarity score to fraudulent claims
– Predict fraud potential
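One way to sketch the scoring idea above is distance from the historical claim profile; this simplification scores deviation from the historical norm rather than similarity to known-fraud claims, and the features and values are hypothetical:

```python
from math import sqrt
from statistics import mean

# Hypothetical claim features: (amount, procedures_billed)
historical = [(120, 2), (95, 1), (150, 3), (110, 2), (130, 2)]
current = [(115, 2), (900, 9)]

# Center of the historical claims.
center = (mean(a for a, _ in historical), mean(p for _, p in historical))

def fraud_score(claim):
    # Distance from the historical center: larger = more anomalous.
    return sqrt((claim[0] - center[0]) ** 2 + (claim[1] - center[1]) ** 2)

scores = [fraud_score(c) for c in current]
print(scores[1] > scores[0])   # True: the second claim looks anomalous
```

Scoring claims this way, before payment, is what lets the model address the "identify fraud prior to paying the claim" requirement from the earlier slide.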
Undirected Process
• Identify data sources
• Prepare data
• Build & train computer model
• Evaluate model
• Apply model to new data
• Identify potential targets for directed follow-up
• Generate new hypotheses to test
Identify potential targets
• Why
• Who
• When
Generate hypotheses
• Any commonalities in data?
• Are they useful?
– Many adults watch children’s movies
• chaperones an important market segment
• they probably make final decision
• when a hypothesis is generated, it
determines the data needed
Summary
• Knowledge Discovery
– New paradigm of data analysis
– Discover unexpected patterns
• ACTIONABLE – can make money from this
knowledge