Intro To Data Mining
Data Mining
The Scope of Data Mining
Association Rule Mining
Association Rule Mining (affinity analysis)
• Seeks to uncover associations in large data sets
• Association rules identify attributes that occur
together frequently in a given data set.
• Market basket analysis, for example, is used to determine
groups of items consumers tend to purchase together.
• Association rules provide information in the form of
if-then (antecedent-consequent) statements.
• The rules are probabilistic in nature.
Figure 12.35
Association Rule Mining
(continued) Identifying Association Rules for PC
Purchase Data
Figure 12.37
Figure 12.38
Rules are sorted by their Lift Ratio (how much more likely one is to
purchase the consequent if they purchase the antecedents).
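The lift ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not XLMiner's implementation: the baskets are hypothetical (they are not the PC Purchase Data), and the function names support, confidence, and lift are chosen here for clarity.

```python
# Hypothetical market baskets for illustration only (not the PC Purchase Data).
transactions = [
    {"cpu", "ram"},
    {"cpu", "ram"},
    {"ssd"},
    {"cpu", "ram", "ssd"},
    {"ssd"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's baseline support.

    A value above 1 suggests the antecedent makes the consequent more likely.
    """
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

# Rule: if {cpu} then {ram}
print(confidence({"cpu"}, {"ram"}, transactions))
print(lift({"cpu"}, {"ram"}, transactions))
```

Sorting candidate rules by lift in descending order puts the rules whose antecedents raise the likelihood of the consequent the most at the top.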
Figure 12.3
• For any given number of clusters we can determine the records in the clusters by sliding a
horizontal line (ruler) up and down the dendrogram until the number of vertical intersections of
the horizontal line equals the number of clusters desired.
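The sliding-ruler idea can be illustrated with a small Python sketch. It assumes the dendrogram is summarized by the heights at which clusters merge, that those heights are distinct and non-negative, and that they never decrease as merging proceeds. The heights below are hypothetical, and the function names are illustrative only.

```python
# Hypothetical merge heights read off a dendrogram for 6 records
# (n records always produce n - 1 merges).
merge_heights = [1.0, 1.5, 2.0, 4.0, 7.5]

def clusters_at_cut(merge_heights, cut):
    """Number of clusters when the horizontal line sits at height `cut`.

    Every merge at or below the line has already joined two clusters,
    so each one reduces the cluster count by one.
    """
    n_records = len(merge_heights) + 1
    return n_records - sum(h <= cut for h in merge_heights)

def cut_for_k(merge_heights, k):
    """A cut height that leaves exactly k clusters (assumes distinct heights)."""
    heights = sorted(merge_heights)
    n_records = len(heights) + 1
    if not 1 <= k <= n_records:
        raise ValueError("k must be between 1 and the number of records")
    if k == 1:
        return heights[-1]  # at the top merge everything is one cluster
    lower = heights[n_records - k - 1] if k < n_records else 0.0
    upper = heights[n_records - k]
    return (lower + upper) / 2  # middle of the gap between two merges

cut = cut_for_k(merge_heights, 3)
print(cut, clusters_at_cut(merge_heights, cut))
```

Placing the line in the middle of a gap between two merge heights is a design choice for this sketch; any height inside the same gap gives the same clusters.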
(continued) Clustering of Colleges
Cluster # Colleges
1 23
2 22
3 3
4 1
Figure 12.9
Copyright © 2013 Pearson Education, Inc. publishing as Prentice Hall
(continued) Clustering of Colleges
Hierarchical clustering results for clusters 3 and 4
Classification
Recognizes patterns that describe the group to
which an item belongs
We will analyze the Credit Approval Decisions
data to predict how to classify new elements.
Categorical variable of interest: Decision
(whether to approve or reject a credit
application)
Predictor variables: shown in columns A-E
Figure 12.10
Classification
Modified Credit Approval Decisions
The categorical variables are coded as numeric:
Homeowner - 0 if No, 1 if Yes
Decision - 0 if Reject, 1 if Approve
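The coding scheme above could be applied programmatically along the following lines. This is a minimal sketch: the record layout (a dict with "Homeowner" and "Decision" keys) and the helper name encode_record are assumptions made for illustration, not part of the original worksheet.

```python
# Coding scheme from the slide: No/Yes and Reject/Approve become 0/1.
HOMEOWNER_CODES = {"No": 0, "Yes": 1}
DECISION_CODES = {"Reject": 0, "Approve": 1}

def encode_record(record):
    """Return a copy of the record with both categorical fields coded as numbers."""
    coded = dict(record)  # leave the original record untouched
    coded["Homeowner"] = HOMEOWNER_CODES[record["Homeowner"]]
    coded["Decision"] = DECISION_CODES[record["Decision"]]
    return coded

# Hypothetical record for illustration only.
print(encode_record({"Homeowner": "Yes", "Decision": "Reject"}))
```

Using explicit lookup tables means an unexpected label such as "yes" raises a KeyError instead of being silently miscoded.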
Figure 12.11
Classification
(continued) Partitioning Data Sets in XLMiner
Partitioning choices when choosing random partitioning:
1. Automatic: 60% training, 40% validation
2. Specify %: 50% training, 30% validation, 20% test
(training and validation % can be modified)
3. Equal # records: 33.33% each for training, validation, and test
XLMiner has size and relative size limitations on the
data sets, which can affect the amount and % of data
assigned to the data sets.
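Outside XLMiner, the same random partitioning idea can be sketched in plain Python. This is an illustration under stated assumptions, not XLMiner's procedure: the function name partition, the seed value, and the rounding rule are all choices made here.

```python
import random

def partition(records, fractions, seed=12345):
    """Randomly split records into consecutive parts sized by `fractions`.

    fractions: proportions that sum to 1, e.g. (0.6, 0.4) or (0.5, 0.3, 0.2).
    seed: fixed so the split is reproducible (the value is arbitrary).
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    parts, start, cumulative = [], 0, 0.0
    for fraction in fractions[:-1]:
        cumulative += fraction
        end = round(cumulative * n)
        parts.append(shuffled[start:end])
        start = end
    parts.append(shuffled[start:])  # last part takes whatever remains
    return parts

# Hypothetical record IDs 0..99, split 50% / 30% / 20%.
training, validation, test = partition(range(100), (0.5, 0.3, 0.2))
print(len(training), len(validation), len(test))
```

Giving the remainder to the last part guarantees that every record lands in exactly one data set even when the percentages do not divide the record count evenly.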
Classification Techniques
(continued) Using Discriminant Analysis for
Classifying New Data
Figure 12.27