
Introduction to Data Mining

Data Mining

• Data mining is a rapidly growing field of business analytics focused on a better understanding of the characteristics and patterns among variables in large data sets.
• It is used to identify and understand hidden patterns that large data sets may contain.
• It involves both descriptive and prescriptive analytics, though it is primarily prescriptive.

The Scope of Data Mining
Some common approaches to data mining
Association
– analyze data to identify natural associations among variables and create rules for target marketing or buying recommendations
• Netflix uses association to understand what types of movies a customer likes and provides recommendations based on that data.
• Amazon makes recommendations based on past purchases.
• Supermarket loyalty cards collect data on customers’ purchase habits and print coupons based on what was just purchased.
The Scope of Data Mining
Some common approaches to data mining
Clustering
₋ Similar to classification, but when no groups have been defined; finds groupings within data
₋ Example: An insurance company could use clustering to group clients by age, location, and types of insurance purchased.
₋ The categories are unspecified, and this is referred to as ‘unsupervised learning’.

The Scope of Data Mining

Some common approaches to data mining


Classification
– analyze data to predict how to classify new elements
– Spam filtering in email by examining textual characteristics of a message
– Helps predict whether a credit-card transaction may be fraudulent
– Is a loan application high risk?
– Will a consumer respond to an ad?

Association Rule Mining
Association Rule Mining (affinity analysis)
• Seeks to uncover associations in large data sets
• Association rules identify attributes that occur together frequently in a given data set.
• Market basket analysis, for example, is used to determine groups of items consumers tend to purchase together.
• Association rules provide information in the form of if-then (antecedent-consequent) statements.
• The rules are probabilistic in nature.

Association Rule Mining
Custom Computer Configuration
(PC Purchase Data)
• Suppose we want to know which PC
components are often ordered together.

Figure 12.35

Association Rule Mining

Measuring the Strength of Association Rules

Support for the (association) rule is the percentage (or number) of transactions that include all items, both antecedent and consequent.
Confidence of the (association) rule is the ratio of the number of transactions that include all items (antecedent and consequent) to the number of transactions that include the antecedent items.
Lift is the ratio of confidence to expected confidence, where expected confidence is the percentage of transactions that include the consequent.

Association Rule Mining
Measuring Strength of Association
A supermarket database has 100,000 point-of-sale transactions:
2000 include both A and B items
5000 include C
800 include A, B, and C
Association rule:
If A and B are purchased, then C is also purchased.
Support = 800/100,000 = 0.008
Confidence = 800/2000 = 0.40
Expected confidence = 5000/100,000 = 0.05
Lift = 0.40/0.05 = 8
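The same calculations can be reproduced in a few lines of code. The sketch below is a minimal illustration using the transaction counts from the supermarket example above; the variable names are ours, not part of the text.

```python
# Support, confidence, and lift for the rule
# "If A and B are purchased, then C is also purchased."
total_transactions = 100_000
count_a_and_b = 2_000        # transactions containing both A and B (the antecedent)
count_c = 5_000              # transactions containing C (the consequent)
count_a_b_c = 800            # transactions containing A, B, and C

support = count_a_b_c / total_transactions          # 0.008
confidence = count_a_b_c / count_a_and_b            # 0.40
expected_confidence = count_c / total_transactions  # 0.05
lift = confidence / expected_confidence             # 8.0

print(support, confidence, expected_confidence, lift)
```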

Association Rule Mining
(continued) Identifying Association Rules for PC
Purchase Data

Figure 12.37

Association Rule Mining

Example 12.14 (continued): Identifying Association Rules for PC Purchase Data

Figure 12.38

Rules are sorted by their Lift Ratio (how much more likely a customer is to purchase the consequent if they purchase the antecedents).
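The slides use XLMiner to generate this rule table, but a comparable table can be produced outside Excel. The sketch below uses the third-party mlxtend library on a hypothetical one-hot matrix of PC orders; the component names, data values, and thresholds are assumptions for illustration, not taken from the text, and API details may vary by mlxtend version.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Hypothetical one-hot data: each row is an order, each column a PC component (True if ordered)
orders = pd.DataFrame({
    "Intel CPU":  [True, True, False, True, True],
    "AMD CPU":    [False, False, True, False, False],
    "HD Monitor": [True, True, True, False, True],
    "Extra RAM":  [True, False, True, True, True],
})

# Find itemsets appearing in at least 40% of orders, then derive if-then rules from them
frequent_itemsets = apriori(orders, min_support=0.4, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)

# Sort by lift ratio, as in Figure 12.38
print(rules.sort_values("lift", ascending=False)
           [["antecedents", "consequents", "support", "confidence", "lift"]])
```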

Cluster Analysis
• Similar to classification, but when no groups have been defined; finds groupings within data
• Cluster analysis has many powerful uses, such as market segmentation.
• You can view each individual record’s predicted cluster membership.
• Also called data segmentation
• Two major methods (each sketched in code below):
  1. Hierarchical clustering
     a) Agglomerative methods (used in XLMiner) proceed as a series of fusions
  2. k-means clustering (available in XLMiner) partitions data into k clusters so that each element belongs to the cluster with the closest mean
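The slides use XLMiner for both methods. As an alternative illustration of k-means, here is a minimal sketch using scikit-learn on a small made-up numeric data set; the values and the choice of k = 3 are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical numeric records (e.g., two measured attributes per client)
X = np.array([
    [25, 1200], [27, 1500], [23, 1100],   # a younger, low-spend group
    [45, 5200], [48, 5600], [44, 4900],   # an older, high-spend group
    [35, 3000], [36, 3300], [34, 2800],   # a middle group
])

# Partition the data into k = 3 clusters; each record is assigned
# to the cluster whose mean (centroid) is closest
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # predicted cluster membership for each record
print(kmeans.cluster_centers_)  # the cluster means
```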
Cluster Analysis – Agglomerative Methods
Dendrogram – a diagram illustrating fusions or divisions at successive stages
Objects “closest” in distance to each other are gradually joined together.
Euclidean distance is the most commonly used measure of the distance between objects.
Figure 12.2
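As a quick illustration of this distance measure, the short sketch below computes the Euclidean distance between two records; the attribute values are made up for illustration.

```python
import numpy as np

# Two records described by the same numeric attributes (hypothetical, already normalized values)
x = np.array([0.82, 0.67, 0.91])
y = np.array([0.35, 0.40, 0.52])

# Euclidean distance: square root of the sum of squared attribute differences
distance = np.sqrt(np.sum((x - y) ** 2))
print(distance)
```

In practice, attributes measured on very different scales are usually normalized first so that no single attribute dominates the distance.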

Clustering Colleges and Universities
Cluster the Colleges and Universities data
using the five numeric columns in the data
set.
Use the hierarchical method

Figure 12.3

• This process of agglomeration leads to the construction of a dendrogram.

• This is a tree-like diagram that summarizes the process of clustering.

• For any given number of clusters we can determine the records in the clusters by sliding a
horizontal line (ruler) up and down the dendrogram until the number of vertical intersections of
the horizontal line equals the number of clusters desired.
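Outside XLMiner, the same agglomerate-then-cut procedure can be sketched with SciPy. The example below builds a linkage on a small hypothetical matrix of normalized college attributes, draws the dendrogram, and “cuts” it to obtain a chosen number of clusters; the data values and the choice of four clusters are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Hypothetical normalized numeric attributes for a handful of colleges (five columns)
X = np.array([
    [0.2, 0.3, 0.1, 0.4, 0.2],
    [0.3, 0.2, 0.2, 0.5, 0.3],
    [0.8, 0.9, 0.7, 0.9, 0.8],
    [0.7, 0.8, 0.9, 0.8, 0.9],
    [0.5, 0.5, 0.4, 0.6, 0.5],
    [0.4, 0.6, 0.5, 0.5, 0.4],
])

# Agglomerative clustering: repeatedly fuse the two closest clusters
# (Euclidean distance, average linkage)
Z = linkage(X, method="average", metric="euclidean")

# The dendrogram summarizes the sequence of fusions
dendrogram(Z)
plt.show()

# "Slide the horizontal ruler" to a position that yields the desired number of clusters, e.g., 4
labels = fcluster(Z, t=4, criterion="maxclust")
print(labels)  # predicted cluster number for each record
```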

(continued) Clustering of Colleges
Hierarchical clustering results: Dendrogram

Height of the bars is a measure of dissimilarity in the clusters that are merging into one.

Smaller clusters “agglomerate” into bigger ones, with the least possible loss of cohesiveness at each stage.

From Figure 12.8

(continued) Clustering of Colleges
Hierarchical clustering results: Predicted clusters

From Figure 12.9

(continued) Clustering of Colleges

Hierarchical clustering results: Predicted clusters

Cluster    # of Colleges
1          23
2          22
3          3
4          1

Figure 12.9
(continued) Clustering of Colleges
Hierarchical clustering results for clusters 3 and 4

Schools in cluster 3 appear similar.
Cluster 4 has considerably higher Median SAT and Expenditures/Student.

Classification
• Recognizes patterns that describe the group to which an item belongs
• We will analyze the Credit Approval Decisions data to predict how to classify new elements.
Categorical variable of interest: Decision (whether to approve or reject a credit application)
Predictor variables: shown in columns A–E

Figure 12.10

Classification
Modified Credit Approval Decisions
The categorical variables are coded as numeric:
Homeowner - 0 if No, 1 if Yes
Decision - 0 if Reject, 1 if Approve

Figure 12.11
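A comparable recoding takes one line per column in pandas. The sketch below is only an illustration; the sample rows are made up based on the description above, not the Figure 12.11 data.

```python
import pandas as pd

# Hypothetical rows mirroring the Credit Approval Decisions layout
df = pd.DataFrame({
    "Homeowner": ["Yes", "No", "Yes"],
    "Credit Score": [725, 573, 677],
    "Decision": ["Approve", "Reject", "Approve"],
})

# Code the categorical variables as numeric, as in Figure 12.11
df["Homeowner"] = df["Homeowner"].map({"No": 0, "Yes": 1})
df["Decision"] = df["Decision"].map({"Reject": 0, "Approve": 1})
print(df)
```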

Classification
Using Training and Validation Data
Data mining projects typically involve large volumes of
data.
The data can be partitioned into:
▪ training data set – has known outcomes and is used to
“teach” the data-mining algorithm
▪ validation data set – used to fine-tune a model
▪ test data set – tests the accuracy of the model
In XLMiner, partitioning can be random or user-specified.

Classification
(continued) Partitioning Data Sets in XLMiner
Partitioning choices when choosing random partitioning:
1. Automatic: 60% training, 40% validation
2. Specify %: 50% training, 30% validation, 20% test (the training and validation percentages can be modified)
3. Equal # records: 33.33% each for training, validation, and test
XLMiner has size and relative-size limitations on the data sets, which can affect the amount and percentage of data assigned to each set.
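Outside XLMiner, an equivalent random partition can be made with scikit-learn. The sketch below shows the “Specify %” case (50% training, 30% validation, 20% test); the stand-in data set is an assumption for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data set standing in for the full credit approval records
df = pd.DataFrame({"Credit Score": range(100), "Decision": [0, 1] * 50})

# First peel off 50% for training, then split the remaining half into validation and test
train, rest = train_test_split(df, train_size=0.50, random_state=1)
validation, test = train_test_split(rest, train_size=0.60, random_state=1)  # 0.60 of 50% = 30% of the total

print(len(train), len(validation), len(test))  # 50, 30, 20
```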

Classification Techniques
Three Data-Mining Approaches to Classification (each sketched in code below):
1. k-Nearest Neighbors (k-NN) Algorithm
   – find records in a database that have similar numerical values of a set of predictor variables
2. Discriminant Analysis (what we will do)
   – use predefined classes based on a set of linear discriminant functions of the predictor variables
3. Logistic Regression
   – estimate the probability of belonging to a category using a regression on the predictor variables
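All three approaches have standard scikit-learn implementations. The sketch below fits each one to a tiny hypothetical version of the recoded credit data; the values are made up for illustration and are not the Figure 12.11 data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

# Hypothetical predictors (Homeowner 0/1, Credit Score) and coded Decision (0 = Reject, 1 = Approve)
X = np.array([[1, 725], [0, 573], [1, 677], [0, 625], [1, 527], [0, 795]])
y = np.array([1, 0, 1, 0, 0, 1])

models = {
    "k-NN": KNeighborsClassifier(n_neighbors=3),            # classify by the most similar records
    "Discriminant analysis": LinearDiscriminantAnalysis(),  # linear discriminant functions of the predictors
    "Logistic regression": LogisticRegression(),            # probability of belonging to the "Approve" class
}

new_applicant = np.array([[1, 650]])  # a new record to classify
for name, model in models.items():
    model.fit(X, y)
    print(name, model.predict(new_applicant))
```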

Classification Techniques
(continued) Using Discriminant Analysis for
Classifying New Data

Figure 12.27

Half of the applicants are in the “Approved” class
