Chapter 4 - IS 466 - Spring Semester 23-24 Final

Uploaded by

alhanoofalsagir

Analytics, Data Science and AI:

Systems for Decision Support

Eleventh Edition

Chapter 4
Data Mining Process, Methods, and
Algorithms

Learning Objectives
4.1 Define data mining as an enabling technology for business analytics
4.2 Understand the objectives and benefits of data mining
4.3 Become familiar with the wide range of applications of data mining
4.4 Learn the standardized data mining processes
4.5 Learn different methods and algorithms of data mining
4.6 Build awareness of the existing data mining software tools
4.7 Understand the privacy issues, pitfalls, and myths of data mining

• Data mining is a process that uses statistical, mathematical, and AI
techniques to extract and identify useful information and subsequent
knowledge (or patterns) from large sets of data.

Copyright © 2020, 2015, 2011 Pearson Education, Inc. All Rights Reserved
Definition of Data Mining
• The nontrivial process of identifying valid, novel, potentially useful,
and ultimately understandable patterns in data stored in structured
databases (Fayyad et al., 1996).
• Keywords in this definition: Process, nontrivial, valid, novel, potentially
useful, understandable.
• Other names: knowledge extraction, pattern analysis, knowledge
discovery, information harvesting, pattern searching, data dredging.

Opening Vignette
Miami-Dade Police Department Is Using Predictive
Analytics to Foresee and Fight Crime

• Predictive analytics in law enforcement

– Policing with less
– New thinking on cold cases
– The big picture starts small
– Success brings credibility
– Just for the facts
– Safer streets for smarter cities

Data Mining Concepts and Definitions:
Why Data Mining?
• More intense competition at the global scale.
• Recognition of the value in data sources.
• Availability of quality data on customers, vendors, transactions, Web,
etc.
• Consolidation and integration of data repositories into data
warehouses.
• The exponential increase in data processing and storage capabilities.
• Decrease in hardware and software costs for data storage and
processing.
• Movement toward conversion of information resources into nonphysical
form.

Data Mining Is a Blend of Multiple
Disciplines
Figure 4.1 Data Mining Is a Blend of Multiple Disciplines.

Data Mining Characteristics & Objectives
• Source of data for DM is often a consolidated data warehouse (not
always!).
• DM environment is usually a client-server or a Web-based information
systems architecture.
• Data is the most critical ingredient for DM which may include
soft/unstructured data.
• The miner is often an end user
• Striking it rich requires creative thinking
• Data mining tools’ capabilities and ease of use are essential (Web,
parallel processing, etc.)

How Data Mining Works
• DM extracts patterns from data
– Pattern? A mathematical (numeric and/or symbolic) relationship among
data items

• Types of patterns
– There are four different types of patterns:

▪ Prediction: tell the nature of future occurrences of certain
events based on what has happened in the past, such as
predicting the winner of the Super Bowl or forecasting the
absolute temperature on a particular day.

▪ Association: find the commonly co-occurring groupings of
things, such as baby formula and diapers going together in
market-basket analysis.

How Data Mining Works
• Types of patterns (continued)

▪ Clusters (or segmentation): identify natural groupings of
things based on their known characteristics, such as
assigning customers to different segments based on their
demographics and past purchase behaviors.

▪ Sequential relationships: discover time-ordered events,
such as predicting that an existing banking customer who
already has a checking account will open a savings account
followed by an investment account within a year.

A Taxonomy for Data Mining
Figure 4.2 A Simple
Taxonomy for Data
Mining Tasks,
Methods, and
Algorithms.

Other Data Mining Patterns/Tasks
• Time-series forecasting
– Part of the sequence or link analysis?
• Visualization
– Another data mining task?
• Data Mining versus Statistics
– Are they the same?
– What is the relationship between the two?

• Customer Relationship Management

– Maximize return on marketing campaigns
– Improve customer retention (churn analysis)
– Maximize customer value (cross-, up-selling)
– Identify and treat most valued customers
• Banking & Other Financial
– Automate the loan application process
– Detect fraudulent transactions
– Maximize customer value (cross-, up-selling)
– Optimize cash reserves with forecasting

• Retailing and Logistics

– Optimize inventory levels at different locations
– Improve the store layout and sales promotions
– Optimize logistics by predicting seasonal effects
– Minimize losses due to limited shelf life
• Manufacturing and Maintenance
– Predict/prevent machinery failures
– Identify anomalies in production systems to optimize the use of
manufacturing capacity
– Discover novel patterns to improve product quality

• Brokerage and Securities Trading

– Predict changes in certain bond prices
– Forecast the direction of stock fluctuations
– Assess the effect of events on market movements
– Identify and prevent fraudulent activities in trading
• Insurance
– Forecast claim costs for better business planning
– Determine optimal rate plans
– Optimize marketing to specific customers
– Identify and prevent fraudulent claim activities

• Computer hardware and software

• Science and engineering
• Government and defense
• Homeland security and law enforcement
• Travel, entertainment, sports
• Healthcare and medicine
• … virtually everywhere

• A manifestation of the best practices

• A systematic way to conduct DM projects
• Moving from art to science for DM projects
• Everybody has a different version
• Most common standard processes:
– CRISP-DM (Cross-Industry Standard Process for Data Mining)
– SEMMA (Sample, Explore, Modify, Model, and Assess)
– KDD (Knowledge Discovery in Databases)

• Cross-Industry Standard Process for Data Mining
• Proposed in the 1990s by a European consortium
• Composed of six consecutive phases
– Step 1: Business Understanding
– Step 2: Data Understanding
– Step 3: Data Preparation
– Step 4: Model Building
– Step 5: Testing and Evaluation
– Step 6: Deployment
• Steps 1 through 3 together account for ~85% of total project time

• Figure 4.3 The Six-Step CRISP-DM Data Mining Process.
• The process is highly repetitive and experimental (DM: art
versus science?)

Figure 4.5 SEMMA Data Mining Process.

• Developed by SAS Institute

Data Mining Process: KDD
Figure 4.6 KDD (Knowledge Discovery in Databases)
Process.

What Data Mining Methodology are you
using?
Figure 4.7 Ranking of Data Mining Methodologies/Processes.

Source: Used with permission from KDnuggets.com.

Application Case 4.4
Data Mining Helps in Cancer Research
Questions for Discussion
1. How can data mining be
used for ultimately curing
illnesses like cancer?
2. What do you think are the
promises and major
challenges for data miners
in contributing to medical
and biological research
endeavors?

Best Algorithms based on type of DM Task
• Depending on the business need, different types of data mining tasks
can be used: prediction, clustering, or association.

• Most popular algorithms to be used based on type of task:

1. decision trees for classification (prediction),
2. k-means for clustering (segmentation),
3. Apriori algorithm for association rule mining.

• Classification versus regression

– Classification: what is being predicted is a class label
▪ Weather: sunny, cloudy, rainy
▪ Credit approval: good, bad credit risk
– Regression: what is being predicted is a numeric value
▪ Temperature: 31
▪ Number of attendees: 100,000

• Most frequently used DM method

• Part of the machine-learning family

• Employs supervised learning

• Learn from past data, classify new data

Data Mining Methods: Classification
• The output variable is categorical (nominal or ordinal) in nature
– Nominal data:
▪ Data that can be labeled or classified into mutually exclusive
categories within a variable.
▪ Categories cannot be ordered in a meaningful way.
▪ Example: for the nominal variable "preferred mode of
transportation," the categories may be car, bus, train, tram, or
bicycle.
– Ordinal data:
▪ A statistical data type where the variables have natural, ordered
categories.
▪ Example: for a grading system: excellent, very good, good, poor;
or for the winner of a race: first, second, third.

• The two-step methodology of classification-type prediction involves:

– Model development/training
▪ A collection of input data (variables), including the actual,
known class labels (for loan approval, as an example: good,
risky), is used to build and train the model.
– Model testing/deployment
▪ The model is tested against the holdout sample for accuracy
assessment and eventually deployed for actual use, where it
predicts the classes of new data instances (whose class label
is unknown).

Estimation Methodologies for
Classification: Single/Simple Split
• Simple split (or holdout or test sample estimation)
– Split the data into two mutually exclusive sets: training (~70%) and
testing (~30%)
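
A simple split can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `simple_split`, the fixed random seed, and the toy list of 100 record IDs are hypothetical and not part of the slides.

```python
import random


def simple_split(records, train_fraction=0.7, seed=42):
    """Shuffle the records and cut them into two mutually exclusive sets."""
    shuffled = list(records)               # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # seed is an assumption, for repeatability
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical data set: 100 record IDs.
records = list(range(100))
train, test = simple_split(records)
print(len(train), len(test))
```

Shuffling before cutting keeps the two sets from inheriting any ordering in the source data (for example, records sorted by date).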

Estimation Methodologies for
Classification: k-Fold Cross Validation
• Data is split into k mutually exclusive subsets, and k training/testing
experiments are conducted, each holding out a different subset for testing

Figure 4.10 A Graphical Depiction of k-Fold Cross-Validation.
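
The rotation of folds can be sketched as follows. This is a minimal illustration, assuming records are addressed by index and assigned to folds in an interleaved fashion without shuffling; the function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_records, k):
    """Yield (train_indices, test_indices) for each of the k experiments."""
    # Assumption: fold i holds every k-th record starting at position i.
    folds = [list(range(i, n_records, k)) for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(n_records=10, k=5))
for train_idx, test_idx in splits:
    print("test:", test_idx, "train:", sorted(train_idx))
```

Every record is used for testing exactly once and for training k - 1 times; the overall accuracy estimate is usually taken as the average over the k experiments.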

Accuracy of Classification Models
• In classification problems, the primary source for accuracy
estimation is the confusion matrix (or, classification matrix)

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{True Positive Rate} = \frac{TP}{TP + FN}$$

$$\text{True Negative Rate} = \frac{TN}{TN + FP}$$

$$\text{Precision} = \frac{TP}{TP + FP} \qquad \text{Recall} = \frac{TP}{TP + FN}$$

Figure 4.8 Matrix for tabulation of two-class classification results
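
The ratios on this slide can be computed directly from the four cells of the confusion matrix. The sketch below is illustrative only: the function name `classification_metrics` and the counts (40, 45, 5, 10) are made-up values, not results from the text.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the slide's ratios from the four confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "true_positive_rate": tp / (tp + fn),
        "true_negative_rate": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),  # same ratio as the true positive rate
    }


# Hypothetical counts for a two-class problem with 100 test records.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```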

Classification Techniques
• Decision tree analysis
• Statistical analysis
• Neural networks
• Support vector machines
• Case-based reasoning
• Bayesian classifiers
• Genetic algorithms
• Rough sets

Decision Trees
• Employs a divide-and-conquer method
• Recursively divides a training set until each division consists of
examples from one class:

A general algorithm (steps) for building a decision tree:
1. Create a root node and assign all of the training data to it.
2. Select the best splitting attribute.
3. Add a branch to the root node for each value of the split. Split the
data into mutually exclusive subsets along the lines of the specific split.
4. Repeat steps 2 and 3 for each and every leaf node until the stopping
criterion is reached.

2. Stopping criteria
▪ When to stop building the tree
3. Pruning (generalization method)
▪ Which parts of the tree to remove

• Most popular DT algorithms include

– ID3, C4.5, C5; CART; CHAID; M5

Source (text and image):
https://www.softwaretestinghelp.com/decision-tree-algorithm-examples-data-mining/
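
One common way to "select the best splitting attribute" is entropy-based information gain, the criterion associated with ID3. The sketch below assumes that criterion; the slides do not say which one is intended. The loan records, attribute names (`income`, `owns_home`), and function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting rows on one attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_splitting_attribute(rows, attributes, target):
    """Pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical loan data; class labels follow the good/risky example.
loans = [
    {"income": "high", "owns_home": "yes", "risk": "good"},
    {"income": "high", "owns_home": "no", "risk": "good"},
    {"income": "low", "owns_home": "yes", "risk": "risky"},
    {"income": "low", "owns_home": "no", "risk": "risky"},
]
best = best_splitting_attribute(loans, ["income", "owns_home"], "risk")
print(best)
```

In this toy data, `income` separates the two classes perfectly while `owns_home` tells nothing, so the root node would split on `income`. Other algorithms use different criteria (CART, for instance, typically uses the Gini index).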

Cluster Analysis for Data Mining
(1 of 4)
• Used for automatic identification of natural groupings of
things
• Part of the machine-learning family
• Employs unsupervised learning
• Learns the clusters of things from past data, then assigns
new instances
• There is not an output/target variable
• In marketing, it is also known as segmentation

Cluster Analysis for Data Mining
(2 of 4)
• Clustering results may be used to
– Identify natural groupings of customers
– Identify rules for assigning new cases to classes for
targeting/diagnostic purposes
– Provide characterization, definition, labeling of
populations
– Decrease the size and complexity of problems for
other data mining methods
– Identify outliers in a specific domain (e.g., rare-event
detection)

Cluster Analysis for Data Mining
(3 of 4)
• Analysis methods
– Statistical methods such as k-means, k-modes, and so
on.
– Neural networks (adaptive resonance theory [ART],
self-organizing map [SOM])
– Fuzzy logic (e.g., fuzzy c-means algorithm)
– Genetic algorithms
• How many clusters?

Cluster Analysis for Data Mining
(4 of 4)
• k-Means Clustering Algorithm
– k: pre-determined number of clusters
– Algorithm (Step 0: determine value of k)
Step 1: Randomly generate k points as initial cluster centers.
Step 2: Assign each point to the nearest cluster center.
Step 3: Re-compute the new cluster centers.
Repetition step: Repeat steps 2 and 3 until some
convergence criterion is met (usually that the
assignment of points to clusters becomes stable).
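
The steps above can be sketched in plain Python. Two assumptions are worth flagging: the initial centers are drawn from the data points themselves (one common variant of "randomly generate k points"), and distance is squared Euclidean. The function names and the six toy points are hypothetical.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, seed=0, max_iterations=100):
    rng = random.Random(seed)
    # Step 1 (assumption): use k randomly chosen data points as initial centers.
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iterations):
        # Step 2: assign each point to the nearest cluster center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Convergence criterion: assignments no longer change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: re-compute each center as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centers, assignment


# Hypothetical 2-D data with two visually obvious groups.
points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
centers, assignment = k_means(points, k=2)
print(centers, assignment)
```

Because the starting centers are random, cluster numbering can differ between runs, and on less cleanly separated data different seeds can lead to different final clusters.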

Cluster Analysis for Data Mining -
k-Means Clustering Algorithm
Figure 4.13 A Graphical Illustration of the Steps in the
k-Means Algorithm.

Association Rule Mining (1 of 7)
• A very popular DM method in business
• Finds interesting relationships (affinities) between
variables (items or events)
• Part of the machine-learning family
• Employs unsupervised learning
• There is no output variable
• Also known as market basket analysis or affinity analysis

Association Rule Mining (2 of 7)
• Input: the simple point-of-sale transaction data
• Output: Most frequent affinities among items
• Example: according to the transaction data…
“Customers who bought a laptop computer and virus
protection software also bought an extended service plan
70 percent of the time.”
• How do you use such a pattern/knowledge?
– Put the items next to each other
– Promote the items as a package
– Place items far apart from each other!

Association Rule Mining (3 of 7)
• Representative applications of association rule mining
include
– In business: cross-marketing, cross-selling, store
design, catalog design, e-commerce site design,
optimization of online advertising, product pricing, and
sales/promotion configuration
– In medicine: relationships between symptoms and
illnesses; diagnosis and patient characteristics and
treatments (to be used in medical DSS); and genes
and their functions (to be used in genomics projects)
– …

Association Rule Mining (4 of 7)
• Are all association rules interesting and useful?
A Generic Rule: X ⇒ Y [S%, C%]
X, Y: products and/or services
X: Left-hand side (LHS) ~ antecedent
Y: Right-hand side (RHS) ~ consequent
S: Support: how often X and Y go together
C: Confidence: how often Y goes together with X
Example:
In the total transaction data:
{Laptop Computer, Antivirus Software} ⇒ {Extended Service Plan}
[30%, 70%]
i.e., laptops and antivirus software were present together in 30% of
all transactions, and in the cases where laptops and antivirus software
were present, an extended service plan was also found 70% of the time.

Association Rule Mining (5 of 7)
Example:
In the total transaction data:
{Laptop Computer, Antivirus Software} ⇒ {Extended
Service Plan} [30%, 70%]
i.e., laptops and antivirus software were present together in 30% of
all transactions, and in the cases where laptops and antivirus
software were present, an extended service plan was also found 70%
of the time.

If total transactions = 100:

- Laptops and antivirus software were found together 30 times
- In these 30 transactions, an extended service plan was also
present 21 times: 21/30 = 70%
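
The arithmetic of this example can be reproduced on a synthetic transaction list. The sketch below builds 100 hypothetical transactions that match the slide's counts (the `Printer` filler item and the helper name `count_containing` are made up). One point to note: the slide's 30% is the share of transactions containing the left-hand side; many texts instead define the support of a rule as the share containing both sides, which would be 21% for the same data.

```python
# Hypothetical data: 21 baskets with all three items, 9 with only the
# laptop and antivirus software, and 70 unrelated baskets.
transactions = (
    [{"Laptop Computer", "Antivirus Software", "Extended Service Plan"}] * 21
    + [{"Laptop Computer", "Antivirus Software"}] * 9
    + [{"Printer"}] * 70
)


def count_containing(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket)


lhs = {"Laptop Computer", "Antivirus Software"}
rhs = {"Extended Service Plan"}

lhs_count = count_containing(lhs, transactions)
both_count = count_containing(lhs | rhs, transactions)

lhs_share = lhs_count / len(transactions)    # the slide's 30%
both_share = both_count / len(transactions)  # share containing LHS and RHS
confidence = both_count / lhs_count          # the slide's 70%

print(f"LHS share: {lhs_share:.0%}")
print(f"LHS and RHS share: {both_share:.0%}")
print(f"Confidence: {confidence:.0%}")
```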

Association Rule Mining (6 of 7)
• Several algorithms have been developed for discovering
(identifying) association rules
– Apriori
– Eclat
– FP-Growth
– Derivatives and hybrids of the three
• The algorithms help identify the frequent itemsets, which
are then converted to association rules

Association Rule Mining (7 of 7)
• Apriori Algorithm
– Finds subsets that are common to at least a minimum
number of the itemsets
– Uses a bottom-up approach
▪ frequent subsets are extended one item at a time
(the size of frequent subsets increases from one-item
subsets to two-item subsets, then three-item
subsets, and so on), and
▪ groups of candidates at each level are tested
against the data for minimum support (see the
figure).
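
The bottom-up, level-by-level search can be sketched as below. This is a compact teaching version, not an optimized implementation; it assumes minimum support is given as an absolute transaction count, and the five toy baskets and the function name `apriori` are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support_count):
    """Return {frequent itemset: support count}, built one level at a time."""
    baskets = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item.
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    size = 1
    while candidates:
        # Test the candidates at this level against the data.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support_count}
        frequent.update(level)
        # Extend frequent itemsets by one item to form the next level.
        size += 1
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Keep a candidate only if all of its smaller subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


baskets = [
    {"laptop", "antivirus", "plan"},
    {"laptop", "antivirus"},
    {"laptop", "plan"},
    {"antivirus", "plan"},
    {"laptop", "antivirus", "plan"},
]
frequent_itemsets = apriori(baskets, min_support_count=3)
for itemset, count in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With a minimum count of 3, every single item and every pair is frequent, but the three-item set appears in only two baskets and is dropped.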

Data Mining Software Tools
Figure 4.15 Popular Data Mining Software Tools (Poll Results).
• Commercial
– IBM SPSS Modeler
(formerly Clementine)
– SAS Enterprise Miner
– Statistica - Dell/Statsoft
– … many more
• Free and/or Open Source
– KNIME
– RapidMiner
– Weka
– R, …

Source: Used with permission from KDnuggets.com.

Application Case 4.6 (1 of 4)
Data Mining Goes to Hollywood: Predicting
Financial Success of Movies
• Goal: Predicting financial success of Hollywood movies
before the start of their production process
• How: Use of advanced predictive analytics methods.
• Results: promising.

Application Case 4.6 (2 of 4)
Data Mining Goes to Hollywood: Predicting
Financial Success of Movies
A Typical Classification Problem
Table 4.3 Movie Classification Based on Receipts

Class No.   Range (in millions of dollars)
1           < 1 (Flop)
2           > 1 and < 10
3           > 10 and < 20
4           > 20 and < 40
5           > 40 and < 65
6           > 65 and < 100
7           > 100 and < 150
8           > 150 and < 200
9           > 200 (Blockbuster)
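
Binning a numeric outcome into classes like those of Table 4.3 is what turns a regression-style target (box-office receipts) into a classification problem. A minimal sketch follows, with the class boundaries taken from the table as best they can be read. The table does not say which class an exact boundary value belongs to, so placing it in the higher class is an assumption, as is the function name `movie_class`.

```python
from bisect import bisect_right

# Upper boundaries (in millions of dollars) separating classes 1 through 9.
CLASS_BOUNDARIES = [1, 10, 20, 40, 65, 100, 150, 200]


def movie_class(receipts_millions):
    """Map box-office receipts to a class number from 1 (flop) to 9 (blockbuster)."""
    # Assumption: a value exactly on a boundary falls into the higher class.
    return bisect_right(CLASS_BOUNDARIES, receipts_millions) + 1


for receipts in (0.5, 15, 72, 250):
    print(receipts, "->", movie_class(receipts))
```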

Table 4.4 Summary of Independent Variables

Independent Variable   Number of Values   Possible Values
MPAA Rating            5                  G, PG, PG-13, R, NR
Competition            3                  High, medium, low
Star value             3                  High, medium, low
Genre                  10                 Sci-Fi, Historic Epic Drama, Modern Drama,
                                          Politically Related, Thriller, Horror, Comedy,
                                          Cartoon, Action, Documentary
Special effects        3                  High, medium, low
Sequel                 2                  Yes, no
Number of screens      1                  A positive integer between 1 and 3,876

Application Case 4.6 (3 of 4)
Data Mining Goes to Hollywood: Predicting
Financial Success of Movies
Figure 4.16 Process Flow Screenshot for the Box-Office Prediction
System.
The DM Process Map in IBM SPSS Modeler
Source: Reprint Courtesy of International Business Machines Corporation,
© International Business Machines Corporation.
Application Case 4.6 (4 of 4)
Data Mining Goes to Hollywood: Predicting
Financial Success of Movies

This work is protected by United States copyright laws and is

provided solely for the use of instructors in teaching their
courses and assessing student learning. Dissemination or sale of
any part of this work (including on the World Wide Web) will
destroy the integrity of the work and is not permitted. The work
and materials from it should never be made available to students
except by instructors using the accompanying text in their
classes. All recipients of this work are expected to abide by these
restrictions and to honor the intended pedagogical purposes and
the needs of other instructors who rely on these materials.
