Chapter 04 - in Class

Chapter 4 of the document discusses the data mining process, methods, and algorithms, emphasizing its role as an enabling technology for business analytics. It outlines the objectives and benefits of data mining, various applications across industries, and standardized processes like CRISP-DM, SEMMA, and KDD. The chapter also covers different data mining methods, including classification, clustering, and association rule mining, along with their respective techniques and applications.


Analytics, Data Science and AI:
Systems for Decision Support

Eleventh Edition

Chapter 4
Data Mining Process, Methods, and
Algorithms


Copyright © 2020, 2015, 2011 Pearson Education, Inc. All Rights Reserved
Learning Objectives
4.1 Define data mining as an enabling technology for
business analytics
4.2 Understand the objectives and benefits of data mining
4.3 Become familiar with the wide range of applications of
data mining
4.4 Learn the standardized data mining processes
4.5 Learn different methods and algorithms of data mining
4.6 Build awareness of the existing data mining software
tools
4.7 Understand the privacy issues, pitfalls, and myths of
data mining
Opening Vignette
Miami-Dade Police Department Is Using
Predictive Analytics to Foresee and Fight Crime
• Predictive analytics in law enforcement
– Policing with less
– New thinking on cold cases
– The big picture starts small (robbery unit)
– Success brings credibility
– Just for the facts
– Safer streets for smarter cities

Why Data Mining?
• Recognition of the value in data sources.
• Availability of quality data on customers, vendors,
transactions, Web, etc.
• Consolidation and integration of data repositories into data
warehouses.
• The exponential increase in data processing and storage
capabilities; and decrease in cost.

Definition of Data Mining
• The nontrivial process of identifying valid, novel,
potentially useful, and ultimately understandable
patterns in data stored in structured databases.
-- Fayyad et al., (1996)

Data Mining Is a Blend of Multiple
Disciplines
Figure 4.1 Data Mining Is a Blend of Multiple Disciplines.

Data Mining Characteristics &
Objectives
• Source of data for DM is often a consolidated data
warehouse (not always!).
• DM environment is usually a client-server or a Web-based
information systems architecture.
• Data is the most critical ingredient for DM which may
include unstructured data.
• The miner is often an end user.
• Striking it rich requires creative thinking.
• Data mining tools’ capabilities and ease of use are
essential (web, parallel processing, etc.)

How Data Mining Works
• DM extracts patterns from data
– Pattern? A mathematical (numeric and/or symbolic)
relationship among data items
• Types of patterns
– Association: commonly co-occurring things
– Prediction: future occurrences of certain events
– Clustering (Segmentation): natural grouping of things
– Sequential relationships: time-ordered events

Figure 4.2 A Simple Taxonomy for Data Mining Tasks, Methods, and Algorithms.
Data Mining versus Statistics
• Are they the same?
– Same: Relationships within data
– Difference
▪ Statistics: well-defined hypothesis with manageable
dataset size
▪ Data Mining: loosely defined discovery statement
for patterns, lots of data; often used as a model for
future events

Data Mining Applications (1 of 4)
• Customer Relationship Management
– Maximize return on marketing campaigns
– Improve customer retention (churn analysis)
– Maximize customer value (cross-, up-selling)
– Identify and treat most valued customers
• Banking & Other Financial
– Automate the loan application process
– Detecting fraudulent transactions
– Maximize customer value (cross-, up-selling)
– Optimizing cash reserves with forecasting

Data Mining Applications (2 of 4)
• Retailing and Logistics
– Optimize inventory levels at different locations
– Improve the store layout and sales promotions
– Optimize logistics by predicting seasonal effects
– Minimize losses due to limited shelf life
• Manufacturing and Maintenance
– Predict/prevent machinery failures
– Identify anomalies in production systems to optimize
the use of manufacturing capacity
– Discover novel patterns to improve product quality

Data Mining Applications (3 of 4)
• Brokerage and Securities Trading
– Predict changes on certain bond prices
– Forecast the direction of stock fluctuations
– Assess the effect of events on market movements
– Identify and prevent fraudulent activities in trading
• Insurance
– Forecast claim costs for better business planning
– Determine optimal rate plans
– Optimize marketing to specific customers
– Identify and prevent fraudulent claim activities

Data Mining Applications (4 of 4)
• Computer hardware and software
• Science and engineering
• Government and defense
• Homeland security and law enforcement
• Travel, entertainment, sports
• Healthcare and medicine
• Sports,… virtually everywhere…

Data Mining Process
• A systematic way to conduct DM projects
• Moving from Art to Science for DM projects
• Most common standard processes:
– CRISP-DM (Cross-Industry Standard Process for Data
Mining)
– SEMMA (Sample, Explore, Modify, Model, and
Assess)
– KDD (Knowledge Discovery in Databases)

Data Mining Process: CRISP-DM
(1 of 2)
• Cross Industry Standard Process for Data Mining
• Proposed in the 1990s by a European consortium
• Composed of six consecutive steps
– Step 1: Business Understanding
– Step 2: Data Understanding
– Step 3: Data Preparation
(Steps 1 to 3 account for ~85% of total project time)

The above steps involve Descriptive Analytics, or exploratory data analysis (EDA)

– Step 4: Model Building
– Step 5: Testing and Evaluation
– Step 6: Deployment

Data Mining Process: CRISP-DM
(2 of 2)
• Figure 4.3 The Six-Step CRISP-DM Data Mining Process.
• The process is highly repetitive and experimental

Data Mining Process: SEMMA
Figure 4.5 SEMMA Data Mining Process.

• Developed by SAS Institute

Data Mining Process: KDD
Figure 4.6 KDD (Knowledge Discovery in Databases)
Process.

Which Data Mining Process is the
Best?
Figure 4.7 Ranking of Data Mining Methodologies/Processes.

Source: Used with permission from KDnuggets.com.

Data Mining Methods: Classification
• Part of the machine-learning family
• Employs supervised learning
• Learn from past data, classify new data
• The output/target variable is categorical (nominal or
ordinal) in nature
– If numeric, we often use the “regression” method

Assessment Methods for
Classification
• Predictive accuracy
– Hit rate
• Speed
– Model building versus predicting/usage speed
• Robustness
– Performance on noisy, missing, or erroneous data
• Scalability
– Performance on large amounts of data
• Interpretability
– Insights provided by the model
Accuracy of Classification Models
• In classification problems, the primary source for accuracy
estimation is the confusion matrix
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]

\[ \text{Precision} = \frac{TP}{TP + FP} \]

\[ \text{True Positive Rate} = \frac{TP}{TP + FN} \]

\[ \text{True Negative Rate} = \frac{TN}{TN + FP} \]
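
The accuracy measures on this slide follow directly from the four confusion-matrix counts. A minimal Python sketch, assuming made-up counts; the function name `classification_metrics` and the numbers are illustrative and do not come from the chapter:

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, precision, TPR and TNR from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "true_positive_rate": tp / (tp + fn),
        "true_negative_rate": tn / (tn + fp),
    }


# Hypothetical counts, chosen only for illustration.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
# accuracy 0.85, precision ~0.889, true positive rate 0.80, true negative rate 0.90
```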
Estimation Methodologies for
Classification: Simple Split
• Simple split (or holdout or test sample estimation)
– Split the data into 2 mutually exclusive sets: training
(~70%) and testing (30%)
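
A simple split can be sketched in a few lines of Python. The 70/30 ratio comes from the slide; the function name `simple_split`, the fixed seed, and the use of plain integers as stand-in records are assumptions made for illustration:

```python
import random


def simple_split(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and cut it into training and testing sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# 100 stand-in records: 70 go to training, 30 to testing, with no overlap.
train, test = simple_split(range(100))
```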

Estimation Methodologies for
Classification: k-Fold Cross
Validation
• Data is split into k mutually exclusive subsets, and k training/testing
experiments are conducted
Figure 4.10 A Graphical Depiction of k-Fold Cross-Validation.
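
One way to picture k-fold cross-validation is as a generator of index splits, where each fold serves as the test set exactly once. The sketch below is an illustration only: the name `k_fold_indices` is hypothetical, and it does not shuffle the data first, which a real experiment usually would.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# 10 records and k = 5: five experiments, each testing on 2 records.
folds = list(k_fold_indices(n_samples=10, k=5))
```

The k accuracy values obtained from the k experiments are then typically averaged into a single estimate.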

Additional Estimation Methodologies
for Validation
• Leave-one-out
– Similar to k-fold with k equal to the number of data points; each point is used for testing once
• Bootstrapping
– Random sampling with replacement
• Jackknifing
– Similar to leave-one-out, accuracy counted with one
sample out
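
Two of these schemes can be sketched as index generators. The function names are hypothetical, and testing a bootstrap model on the records that were never drawn (the "out-of-bag" records) is a common convention assumed here rather than something the slide states:

```python
import random


def leave_one_out(n_samples):
    """Yield one split per record: train on all the others, test on that record."""
    for i in range(n_samples):
        train_idx = [j for j in range(n_samples) if j != i]
        yield train_idx, [i]


def bootstrap_sample(n_samples, seed=0):
    """Draw n_samples indices with replacement; unused indices form the test set."""
    rng = random.Random(seed)
    train_idx = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(train_idx)
    out_of_bag = [i for i in range(n_samples) if i not in chosen]
    return train_idx, out_of_bag


loo_splits = list(leave_one_out(4))
boot_train, boot_test = bootstrap_sample(10)
```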

Area Under the ROC Curve (AUC)
• ROC curve: plotting the true positive rate on Y and false
positive rate on X

Figure 4.11 A Sample ROC Curve.

Area Under the ROC Curve (AUC)
• Works with binary classification
• The curve is generated by using different classification
probability thresholds for a method
• The area under the curve (AUC) is used to compare
different methods
– values from 0 to 1.0
– the higher, the better the method
• Produces good assessment for skewed class distributions
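
The threshold sweep can be made concrete with a small Python sketch. The labels and scores below are invented for illustration, and the helper names `roc_points` and `auc` are assumptions; the area is approximated with the trapezoidal rule:

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold one distinct score at a time, from strictest to loosest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Trapezoidal area under a list of (x, y) points ordered by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical true classes (1 = positive) and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_points(labels, scores))  # 8/9, about 0.889
```

An AUC of 0.5 corresponds to random guessing on the diagonal, and 1.0 to a classifier that ranks every positive above every negative.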

Estimating the Relative Importance
of Predictor Variables
• Sensitivity analysis
– Relative discernibility
– Input value perturbation
– Leave one out experiments

Figure 4.12 Graphical Depiction of the Sensitivity Analysis Process
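
Input value perturbation can be sketched as follows: nudge one input at a time and record how much the model output moves. Everything here is an assumption made for illustration, including the function names, the step size `delta`, and the fixed linear function standing in for a trained model:

```python
def perturbation_importance(predict, rows, delta=0.1):
    """Relative importance of each input from the mean absolute output change."""
    n_features = len(rows[0])
    importance = []
    for j in range(n_features):
        change = 0.0
        for row in rows:
            nudged = list(row)
            nudged[j] += delta  # perturb only input j
            change += abs(predict(nudged) - predict(row))
        importance.append(change / len(rows))
    total = sum(importance)  # assumes at least one input affects the output
    return [value / total for value in importance]


# Hypothetical stand-in for a trained model: a fixed linear scoring function.
def toy_model(x):
    return 3.0 * x[0] + 1.0 * x[1] + 0.0 * x[2]


rows = [[1.0, 2.0, 3.0], [0.5, 0.1, 0.9], [2.0, 1.0, 0.0]]
relative_importance = perturbation_importance(toy_model, rows)
# roughly [0.75, 0.25, 0.0]: the first input matters most, the third not at all
```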

Classification Techniques
• Decision tree analysis
• Statistical analysis
• Neural networks
• Case-based reasoning
• Bayesian classifiers
• Genetic algorithms
• Rough sets

Decision Trees (1 of 2)
• Employs a divide-and-conquer method
• Recursively divides a training set until each division consists of
examples from one class (as far as possible)
• A general algorithm (steps) can be:
1. Create a root node and assign all of the training
data to it.
2. Select the best splitting attribute.
3. Add a branch to the root node for each value of
the split. Split the data into mutually exclusive
subsets along the lines of the specific split.
4. Repeat steps 2 and 3 for each and every leaf
node until the stopping criterion is reached.
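
The four steps can be sketched as a short recursive function. This is a toy illustration under stated assumptions, not any particular published algorithm: it handles categorical attributes only, uses weighted Gini impurity as the splitting criterion, stops when a node is pure or no attributes remain, and does no pruning. The weather-style data is invented.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, attributes):
    """Return a nested-dict tree; leaves are class labels."""
    # Stopping criteria: the node is pure, or there is nothing left to split on.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Step 2: pick the attribute whose split gives the lowest weighted Gini.
    def weighted_gini(attr):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * gini(g) for g in groups.values())

    best = min(attributes, key=weighted_gini)

    # Step 3: one branch per attribute value, with mutually exclusive subsets.
    partitions = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = partitions.setdefault(row[best], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)

    # Step 4: repeat on every new node.
    remaining = [a for a in attributes if a != best]
    return {best: {value: build_tree(r, l, remaining)
                   for value, (r, l) in partitions.items()}}


# Hypothetical training data.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
tree = build_tree(rows, labels, ["outlook", "windy"])
# {'outlook': {'sunny': 'play', 'rainy': 'stay'}}
```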

Decision Trees (2 of 2)
• DT algorithms mainly differ on
1. Splitting criteria
▪ Which variable, what value, etc.
▪ Best attribute to split for purifying the class
representation (e.g., Gini index, information gain)
2. Stopping criteria
▪ When to stop building the tree
3. Pruning (generalization method)
▪ Pre-pruning versus post-pruning
• Most popular DT algorithms include
– ID3, C4.5, C5; CART; CHAID; M5
Example in RapidMiner – Hotel App Customer Churn
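
The two splitting criteria named on this slide can be computed by hand. The sketch below uses the standard definitions of Gini impurity and entropy-based information gain; the churn/stay labels are invented and are not taken from the RapidMiner example:

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def information_gain(parent, subsets):
    """Entropy of the parent minus the size-weighted entropy of its subsets."""
    n = len(parent)
    weighted = sum(len(s) / n * entropy(s) for s in subsets)
    return entropy(parent) - weighted


# Hypothetical node with 5 churners and 5 stayers, and one candidate split of it.
parent = ["churn"] * 5 + ["stay"] * 5
split = [["churn"] * 4 + ["stay"], ["churn"] + ["stay"] * 4]
gain = information_gain(parent, split)  # about 0.278 bits
```

A split that leaves both subsets as mixed as the parent has a gain of zero, so the algorithm prefers the attribute with the largest gain (or the lowest weighted Gini).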
Cluster Analysis for Data Mining
• Used for automatic identification of natural groupings of
things
• Part of the machine-learning family
• Employs unsupervised learning
• Learns the clusters of things from past data, then assigns
new instances
• There is NO output/target variable
– In marketing, it is also known as segmentation

Cluster Analysis for Data Mining
• Clustering results may be used to
– Identify natural groupings of customers
– Identify rules for assigning new cases to classes for
targeting/diagnostic purposes
– Provide characterization, definition, labeling of
populations
– Decrease the size and complexity of problems for
other data mining methods
– Identify outliers in a specific domain (e.g., rare-event
detection)

Cluster Analysis for Data Mining
• Analysis methods
– Statistical methods, such as k-means, k-modes…
– Neural networks (self-organizing map)
– Fuzzy logic
– Genetic algorithms
• How many clusters?
– Determine the optimal number of clusters
• General approach
– Divisive (start with one cluster, then break it apart)
– Agglomerative (start with individual clusters, then join
them)
Cluster Analysis for Data Mining
• k-Means Clustering Algorithm
– k: pre-determined number of clusters
– Algorithm (Step 0: determine value of k)
Step 1: Randomly generate k points as initial
cluster centers.
Step 2: Assign each point to the nearest cluster center.
Step 3: Re-compute the new cluster centers.
Repetition step: Repeat steps 2 and 3 until some
convergence criterion is met (usually when the
assignment of points to clusters becomes stable).
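
A minimal pure-Python sketch of these steps follows. It makes two assumptions the slide does not: the initial centers are drawn from the data points themselves, and distance is squared Euclidean. The function name, the iteration cap, and the six sample points are illustrative.

```python
import random


def k_means(points, k, max_iter=100, seed=0):
    """Cluster points (tuples of floats) into k groups; return centers and labels."""
    rng = random.Random(seed)
    # Step 1: pick k initial centers (assumption: sampled from the data points).
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each point to the nearest center.
        new_assignment = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centers[c])))
            for point in points
        ]
        # Convergence criterion: the assignment no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, assignment


# Two well-separated groups of hypothetical 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, assignment = k_means(points, k=2)
```

On less clearly separated data the result depends on the initial centers, which is why k-means is often run several times with different starting points.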

Cluster Analysis for Data Mining -
k-Means Clustering Algorithm
Figure 4.13 A Graphical Illustration of the Steps in the
k-Means Algorithm.

Example in RapidMiner: Credit Risk Clustering

Association Rule Mining
• Finds interesting relationships (affinities) between
variables (items or events)
• Part of the machine-learning family
• Employs unsupervised learning
• There is NO output variable
• Also known as market basket analysis
• Famous example - “relationship between diapers and
beer!”

Association Rule Mining
• Input: the simple point-of-sale transaction data
• Output: Most frequent affinities among items
• Example: according to the transaction data…
“Customers who bought a laptop computer and virus
protection software also bought an extended service plan
70 percent of the time.”
• How do you use such a pattern/knowledge?
– Put the items next to each other
– Promote the items as a package
– Place items far apart from each other!

Association Rule Mining
• Also named “Market-basket Analysis”
• Applications
– Sales transactions
– Credit card transactions
– Banking services
– Insurance service products
– Telecommunication services
– Medical records

A Generic Rule: X ⇒ Y [S%, C%]

Example: {Laptop Computer, Antivirus Software} ⇒
{Extended Service Plan}
[30%, 70%]

X, Y: products and/or services

S (Support): how often X and Y go together
C (Confidence): how often Y happened given X
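
Support and confidence can be computed directly from transaction data. In the sketch below the 70 baskets are fabricated so that the laptop rule comes out at exactly [30%, 70%]; the item names and function names are illustrative assumptions:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Support of (lhs and rhs together) divided by support of lhs alone."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


# Hypothetical baskets: 21 with all three items, 9 with X only, 40 unrelated.
transactions = (
    [{"laptop", "antivirus", "service_plan"}] * 21
    + [{"laptop", "antivirus"}] * 9
    + [{"printer"}] * 40
)
x = {"laptop", "antivirus"}
y = {"service_plan"}
s = support(transactions, x | y)    # 21 / 70 = 0.30
c = confidence(transactions, x, y)  # 21 / 30 = 0.70
```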
Are all association rules interesting and useful?

• Several algorithms have been developed for discovering
(identifying) association rules
– Apriori
– Eclat
– FP-Growth
– Derivatives and hybrids of the three
• The algorithms help identify the frequent item sets, which
are then converted to association rules

• Apriori Algorithm
– Finds subsets that are common to at least a minimum
number of the item sets (i.e., the minimum support)
– Uses a bottom-up approach to extend frequent item
sets one item at a time
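
The bottom-up idea can be sketched compactly in Python. This is a teaching-sized illustration of frequent item set generation only: it recounts every candidate against every transaction, does not convert item sets into rules, and uses invented baskets and a made-up 60% minimum support.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for all item sets meeting min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]

    def frequent(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: count / n for c, count in counts.items() if count / n >= min_support}

    # Level 1: frequent single items.
    current = frequent({frozenset([item]) for t in transactions for item in t})
    result = dict(current)
    size = 2
    while current:
        # Join step: extend the frequent sets by one item.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: every (size - 1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, size - 1))}
        current = frequent(candidates)
        result.update(current)
        size += 1
    return result


# Hypothetical baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
frequent_sets = apriori(baskets, min_support=0.6)
# 4 single items and 4 pairs qualify, including {diapers, beer} at 60% support
```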

Source: Used with permission from KDnuggets.com.

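Application Case 4.6 (1 of 4)
Data Mining Goes to Hollywood: Predicting
Financial Success of Movies
• Goal: Predicting financial success of Hollywood movies
before the start of their production process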
• How: Use of advanced predictive analytics methods
• Results: promising
• p. 239-242

Application Case 4.6 (2 of 4)
Data Mining Goes to Hollywood: Predicting
Financial Success of Movies
A Typical Classification Problem
Table 4.3 Movie Classification
Class No. | Range (in millions of dollars)
1 | <1 (Flop)
2 | >1 and <10
3 | >10 and <20
4 | >20 and <40
5 | >40 and <65
6 | >65 and <100
7 | >100 and <150
8 | >150 and <200
9 | >200 (Blockbuster)
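
Turning a numeric box-office figure into one of the nine classes is a simple binning step. In the sketch below the cut points are read off Table 4.3; assigning a value that falls exactly on a boundary to the higher class is an assumption, since the table leaves that case open, and the function name is hypothetical:

```python
from bisect import bisect_right

# Class boundaries in millions of dollars, read off Table 4.3.
BOUNDARIES = [1, 10, 20, 40, 65, 100, 150, 200]


def movie_class(gross_millions):
    """Map box-office receipts (in millions of dollars) to a class number 1..9."""
    # Assumption: a value exactly on a boundary goes to the higher class.
    return bisect_right(BOUNDARIES, gross_millions) + 1


flop = movie_class(0.5)         # class 1
mid_range = movie_class(50)     # class 5 (between 40 and 65)
blockbuster = movie_class(250)  # class 9
```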

Table 4.4 Summary of Independent Variables

Independent Variable | Number of Values | Possible Values
MPAA Rating | 5 | G, PG, PG-13, R, NR
Competition | 3 | High, medium, low
Star value | 3 | High, medium, low
Genre | 10 | Sci-Fi, Historic Epic Drama, Modern Drama, Politically Related, Thriller, Horror, Comedy, Cartoon, Action, Documentary
Special effects | 3 | High, medium, low
Sequel | 2 | Yes, no
Number of screens | 1 | A positive integer between 1 and 3,876

Application Case 4.6 (3 of 4)
Data Mining Goes to Hollywood: Predicting
Financial Success of Movies
Figure 4.16 Process Flow Screenshot for the Box-Office Prediction System.
The DM Process Map in IBM SPSS Modeler
Source: Reprint Courtesy of International Business Machines Corporation, © International Business Machines Corporation.
Application Case 4.6 (4 of 4)
Data Mining Goes to Hollywood: Predicting
Financial Success of Movies

Data Mining Myths
Table 4.6 Data Mining Myths.
Myth | Reality
Data mining provides instant, crystal-ball-like predictions. | Data mining is a multistep process that requires deliberate, proactive design and use.
Data mining is not yet viable for mainstream business applications. | The current state of the art is ready for almost any business type and/or size.
Data mining requires a separate, dedicated database. | Because of the advances in database technology, a dedicated database is not required.
Only those with advanced degrees can do data mining. | Newer Web-based tools enable managers of all educational levels to do data mining.
Data mining is only for large firms that have lots of customer data. | If the data accurately reflect the business or its customers, any company can use data mining.

Data Mining Mistakes
1. Selecting the wrong problem for data mining
2. Ignoring what your sponsor thinks data mining is and
what it really can/cannot do
3. Beginning without the end in mind
4. Not leaving sufficient time for data acquisition,
selection and preparation
5. Looking only at aggregated results and not at individual
records/predictions
6. … 10 more mistakes… in your book

This work is protected by United States copyright laws and is
provided solely for the use of instructors in teaching their
courses and assessing student learning. Dissemination or sale of
any part of this work (including on the World Wide Web) will
destroy the integrity of the work and is not permitted. The work
and materials from it should never be made available to students
except by instructors using the accompanying text in their
classes. All recipients of this work are expected to abide by these
restrictions and to honor the intended pedagogical purposes and
the needs of other instructors who rely on these materials.
