Data Mining: Units 2 and 3
Margaret H. Dunham
Department of Computer Science and Engineering
Southern Methodist University
Companion slides for the text by Dr. M.H.Dunham, Data Mining, Introductory and
Advanced Topics, Prentice Hall, 2002.
© Prentice Hall 1
Data Mining Outline
• PART I
– Introduction
– Related Concepts
– Data Mining Techniques
• PART II
– Classification
– Clustering
– Association Rules
• PART III
– Web Mining
– Spatial Mining
– Temporal Mining
Classification Outline
Goal: Provide an overview of the classification problem
and introduce some of the basic algorithms
Classification Problem
Classification Examples
Classification Ex: Grading
• If x >= 90 then grade = A.
• If 80 <= x < 90 then grade = B.
[Figure: decision tree splitting on x (x < 90 vs. x >= 90), with leaf grades A through F]
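The threshold rules above translate directly into code. This is a minimal sketch; the cutoffs for C and D (70 and 60) are assumed from the conventional grading scale, since the slide text only states the A and B rules.

```python
def grade(x):
    """Map a numeric score to a letter grade using threshold rules."""
    if x >= 90:
        return "A"
    elif x >= 80:
        return "B"
    elif x >= 70:   # assumed cutoff; not stated in the slide text
        return "C"
    elif x >= 60:   # assumed cutoff; not stated in the slide text
        return "D"
    else:
        return "F"
```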
Classification Ex: Letter
Recognition
Letter A Letter B
Letter C Letter D
Letter E Letter F
Classification Techniques
• Approach:
1. Create a specific model by evaluating training data (or using domain experts' knowledge).
2. Apply the developed model to new data.
• Classes must be predefined.
• Most common techniques use decision trees (DTs), neural networks (NNs), or are based on distances or statistical methods.
Defining Classes
• Distance Based
• Partitioning Based
Issues in Classification
• Missing Data
– Ignore
– Replace with assumed value
• Measuring Performance
– Classification accuracy on test data
– Confusion matrix
– OC Curve
Height Example Data
Name Gender Height Output1 Output2
Kristina F 1.6m Short Medium
Jim M 2m Tall Medium
Maggie F 1.9m Medium Tall
Martha F 1.88m Medium Tall
Stephanie F 1.7m Short Medium
Bob M 1.85m Medium Medium
Kathy F 1.6m Short Medium
Dave M 1.7m Short Medium
Worth M 2.2m Tall Tall
Steven M 2.1m Tall Tall
Debbie F 1.8m Medium Medium
Todd M 1.95m Medium Medium
Kim F 1.9m Medium Tall
Amy F 1.8m Medium Medium
Wynette F 1.75m Medium Medium
Classification Performance
Confusion Matrix Example
Actual Assignment
Membership Short Medium Tall
Short 0 4 0
Medium 0 5 3
Tall 0 1 2
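The matrix above can be reproduced from the height example data by treating Output1 as the actual class membership and Output2 as the classifier's assignment. A minimal sketch:

```python
# Height data: (name, actual class = Output1, assigned class = Output2)
data = [
    ("Kristina", "Short", "Medium"), ("Jim", "Tall", "Medium"),
    ("Maggie", "Medium", "Tall"), ("Martha", "Medium", "Tall"),
    ("Stephanie", "Short", "Medium"), ("Bob", "Medium", "Medium"),
    ("Kathy", "Short", "Medium"), ("Dave", "Short", "Medium"),
    ("Worth", "Tall", "Tall"), ("Steven", "Tall", "Tall"),
    ("Debbie", "Medium", "Medium"), ("Todd", "Medium", "Medium"),
    ("Kim", "Medium", "Tall"), ("Amy", "Medium", "Medium"),
    ("Wynette", "Medium", "Medium"),
]

classes = ["Short", "Medium", "Tall"]

def confusion_matrix(rows):
    """matrix[actual][assigned] = number of tuples."""
    m = {a: {p: 0 for p in classes} for a in classes}
    for _, actual, assigned in rows:
        m[actual][assigned] += 1
    return m
```

Each row of the resulting matrix matches the slide: for example, all four actual Shorts were assigned Medium.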
Operating Characteristic Curve
Regression
• Assume data fits a predefined function
• Determine best values for regression coefficients
c0,c1,…,cn.
• Assume an error: y = c0+c1x1+…+cnxn+e
• Estimate the error using the mean squared error (MSE) over the training set.
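The bullets above can be sketched for the simple one-variable case y = c0 + c1·x, using the standard closed-form least-squares estimates (the slide does not show the formula, so the closed form is assumed):

```python
def fit_line(xs, ys):
    """Least-squares estimates of c0, c1 for y = c0 + c1*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    c1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    c0 = my - c1 * mx
    return c0, c1

def mse(xs, ys, c0, c1):
    """Mean squared error of the fitted line on the training set."""
    return sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
```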
Linear Regression Poor Fit
Classification Using Regression
Division
Prediction
Classification Using Distance
KNN
KNN Algorithm
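The KNN slide body is not recoverable from the text, so the following is a standard k-nearest-neighbors sketch (not necessarily the slide's exact formulation), using the height data and absolute height difference as the distance:

```python
def knn_classify(train, query_height, k=3):
    """Classify by majority vote among the k nearest training heights."""
    neighbors = sorted(train, key=lambda t: abs(t[0] - query_height))[:k]
    votes = {}
    for _, label in neighbors:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# (height in meters, Output1 class) from the height example data
train = [(1.6, "Short"), (2.0, "Tall"), (1.9, "Medium"), (1.88, "Medium"),
         (1.7, "Short"), (1.85, "Medium"), (1.6, "Short"), (1.7, "Short"),
         (2.2, "Tall"), (2.1, "Tall"), (1.8, "Medium"), (1.95, "Medium"),
         (1.9, "Medium"), (1.8, "Medium"), (1.75, "Medium")]
```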
Classification Using Decision Trees
DT Induction
DT Splits Area
[Figure: rectangular regions of the feature space produced by splits on Gender (M/F) and Height]
Comparing DTs
• Balanced
• Deep
DT Issues
Decision Tree Induction is often based on Information Theory
Information
DT Induction
Information/Entropy
Entropy
ID3
ID3 Example (Output1)
• Starting state entropy (logs base 10):
4/15 log(15/4) + 8/15 log(15/8) + 3/15 log(15/3) = 0.4384
• Gain using gender:
– Female: 3/9 log(9/3)+6/9 log(9/6)=0.2764
– Male: 1/6 log(6/1) + 2/6 log(6/2) + 3/6 log(6/3) = 0.4392
– Weighted sum: (9/15)(0.2764) + (6/15)(0.4392) =
0.34152
– Gain: 0.4384 – 0.34152 = 0.09688
• Gain using height:
0.4384 – (2/15)(0.301) = 0.3983
• Choose height as first splitting attribute
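The worked numbers above can be checked in code. A minimal sketch of entropy and information gain, using log base 10 to match the slide's figures:

```python
import math

def entropy(counts, base=10):
    """Entropy of a class distribution; the slides use log base 10."""
    total = sum(counts)
    return sum((c / total) * math.log(total / c, base)
               for c in counts if c > 0)

# Output1 distribution: 4 Short, 8 Medium, 3 Tall
start = entropy([4, 8, 3])                      # ~0.4384

# Split on gender: F -> 3 Short / 6 Medium; M -> 1 Short / 2 Medium / 3 Tall
weighted = (9 / 15) * entropy([3, 6]) + (6 / 15) * entropy([1, 2, 3])
gain_gender = start - weighted                  # ~0.0969
```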
C4.5
• ID3 favors attributes with a large number of divisions
• Improved version of ID3:
– Missing Data
– Continuous Data
– Pruning
– Rules
– GainRatio:
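The GainRatio formula is not shown in the text, so the standard C4.5 definition is assumed here: GainRatio = Gain / SplitInfo, where SplitInfo is the entropy of the partition sizes themselves, which penalizes many-way splits. A sketch using the gender split from the ID3 example:

```python
import math

def entropy(counts, base=10):
    """Entropy of a distribution; base 10 matches the slides."""
    total = sum(counts)
    return sum((c / total) * math.log(total / c, base)
               for c in counts if c > 0)

def gain_ratio(gain, partition_sizes):
    """GainRatio = Gain / SplitInfo (SplitInfo = entropy of split sizes)."""
    return gain / entropy(partition_sizes)

# Gender split from the ID3 example: gain ~0.0969, partitions of 9 and 6
gr = gain_ratio(0.0969, [9, 6])
```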
CART
CART Example
Classification Using Neural
Networks
NN Issues
Decision Tree vs. Neural Network
Propagation
[Figure: a tuple's input values propagated through the network to produce the output]
NN Propagation Algorithm
© Prentice Hall 44
Example Propagation
NN Learning
NN Supervised Learning
Supervised Learning
NN Backpropagation
Backpropagation
[Figure: error values propagated backward through the network]
Backpropagation Algorithm
Gradient Descent
Gradient Descent Algorithm
Output Layer Learning
Hidden Layer Learning
Types of NNs
Perceptron
Perceptron Example
• Suppose:
– Summation: S=3x1+2x2-6
– Activation: if S>0 then 1 else 0
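The summation and activation above form a single perceptron, which can be written directly:

```python
def perceptron(x1, x2):
    """Single perceptron from the example: weighted sum, step activation."""
    s = 3 * x1 + 2 * x2 - 6      # summation S = 3*x1 + 2*x2 - 6
    return 1 if s > 0 else 0     # activation: fire (1) iff S > 0
```

For instance, input (2, 1) gives S = 2 and fires, while (1, 1) gives S = -1 and does not.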
Self Organizing Feature Map
(SOFM)
Kohonen Network
Radial Basis Function Network
Classification Using Rules
Generating Rules from DTs
Generating Rules Example
Generating Rules from NNs
1R Algorithm
1R Example
PRISM Algorithm
PRISM Example
Decision Tree vs. Rules
• Tree has an implied order in which splitting is performed; rules have no ordering of predicates.
• Tree is created based on looking at all classes; only one class needs to be examined to generate its rules.
Clustering Outline
Goal: Provide an overview of the clustering problem
and introduce some of the basic algorithms
Clustering Examples
Clustering Example
Clustering Houses
[Figure: the same houses clustered two ways: geographic-distance based and size based]
Clustering vs. Classification
• No prior knowledge
– Number of clusters
– Meaning of clusters
• Unsupervised learning
Clustering Issues
• Outlier handling
• Dynamic data
• Interpreting results
• Evaluating results
• Number of clusters
• Data to be used
• Scalability
Impact of Outliers on Clustering
Outliers are sample points that differ significantly from the rest of the data set.
Clustering Problem
Types of Clustering
Clustering Approaches
Cluster Parameters
Distance Between Clusters
• Single Link: smallest distance between points
• Complete Link: largest distance between points
• Average Link: average distance between points
• Centroid: distance between centroids
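The four inter-cluster distance measures above can be sketched directly; points here are 1-D numbers for simplicity, with a pluggable point-to-point distance:

```python
def single_link(c1, c2, dist):
    """Smallest distance between any pair of points across the clusters."""
    return min(dist(a, b) for a in c1 for b in c2)

def complete_link(c1, c2, dist):
    """Largest distance between any pair of points across the clusters."""
    return max(dist(a, b) for a in c1 for b in c2)

def average_link(c1, c2, dist):
    """Average distance over all cross-cluster pairs."""
    return sum(dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

def centroid_dist(c1, c2):
    """Distance between the cluster centroids (1-D points)."""
    return abs(sum(c1) / len(c1) - sum(c2) / len(c2))
```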
Hierarchical Clustering
Hierarchical Algorithms
• Single Link
• MST Single Link
• Complete Link
• Average Link
Dendrogram
Levels of Clustering
Agglomerative Example
      A  B  C  D  E
  A   0  1  2  2  3
  B   1  0  2  4  3
  C   2  2  0  1  5
  D   2  4  1  0  3
  E   3  3  5  3  0
[Figure: graph of the five points, and the dendrogram produced at thresholds 1 through 5]
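The merges in this example can be traced in code. A minimal single-link agglomerative sketch over the slide's distance matrix, returning the merge history (distance, merged cluster):

```python
# Distance matrix from the slide
D = {("A", "B"): 1, ("A", "C"): 2, ("A", "D"): 2, ("A", "E"): 3,
     ("B", "C"): 2, ("B", "D"): 4, ("B", "E"): 3,
     ("C", "D"): 1, ("C", "E"): 5, ("D", "E"): 3}

def dist(a, b):
    return 0 if a == b else D.get((a, b), D.get((b, a)))

def agglomerate(points):
    """Single-link agglomerative clustering; returns the merge history."""
    clusters = [{p} for p in points]
    history = []
    while len(clusters) > 1:
        # find the closest pair of clusters under single link
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: min(dist(a, b)
                                      for a in clusters[ij[0]]
                                      for b in clusters[ij[1]]))
        d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        history.append((d, frozenset(merged)))
    return history
```

Running this merges {A,B} and {C,D} at distance 1, {A,B,C,D} at 2, and everything at 3, matching the dendrogram levels.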
Divisive Clustering
      A  B  C  D  E
  A   0  1  2  2  3
  B   1  0  2  4  3
  C   2  2  0  1  5
  D   2  4  1  0  3
  E   3  3  5  3  0
[Figure: graph of the five points]
Agglomerative Algorithm
Single Link
MST Single Link Algorithm
Single Link Clustering
Partitional Clustering
• Nonhierarchical
• Creates clusters in one step as opposed to
several steps.
• Since only one set of clusters is output, the
user normally has to input the desired number
of clusters, k.
• Usually deals with static sets.
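Since the user supplies k, a partitional algorithm can produce the clusters in a single assign-and-update loop. A minimal k-means sketch on 1-D points (standard algorithm, assumed here since the K-Means slide body is not in the text), with initial centers given explicitly:

```python
def kmeans(points, k, centers, iters=10):
    """Standard k-means on 1-D points from given initial centers."""
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # update step: recompute each center as its cluster mean
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters
```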
Partitional Algorithms
• MST (syllabus)
• Squared Error
• K-Means
• Nearest Neighbor (syllabus)
• PAM
• BEA
• GA
MST Algorithm
Squared Error
Squared Error Algorithm
Nearest Neighbor Example
• Assume a threshold of 2.
• K1 = {A}; look at B: dist(A,B) = 1 < 2, so K1 = {A,B}; look at C: dist is 2 (within the threshold), so K1 = {A,B,C}.
• dist(D,C) = 1 < 2, so K1 = {A,B,C,D}.
• Look at E: its smallest distance to K1 is 3 > 2.
• So K2 = {E}.
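The sequential threshold assignment in this example can be sketched directly, reusing the distance matrix from the agglomerative example:

```python
# Distance matrix from the agglomerative example
D = {("A", "B"): 1, ("A", "C"): 2, ("A", "D"): 2, ("A", "E"): 3,
     ("B", "C"): 2, ("B", "D"): 4, ("B", "E"): 3,
     ("C", "D"): 1, ("C", "E"): 5, ("D", "E"): 3}

def dist(a, b):
    return 0 if a == b else D.get((a, b), D.get((b, a)))

def threshold_cluster(points, t):
    """Assign each point to the first cluster containing a point
    within distance t; otherwise start a new cluster."""
    clusters = [[points[0]]]
    for p in points[1:]:
        for c in clusters:
            if min(dist(p, q) for q in c) <= t:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters
```

With threshold 2 this reproduces the example: K1 = {A,B,C,D} and K2 = {E}.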
PAM
GA Example
• {A,B,C,D,E,F,G,H}
• Randomly choose an initial solution:
  {A,C,E}, {B,F}, {D,G,H}, encoded as 10101000, 01000100, 00010011
• Suppose crossover at point four between the 1st and 3rd individuals:
  10100011, 01000100, 00011000
• What should the termination criteria be?
BIRCH
• Clustering Feature (CF) Triple: (N, LS, SS)
– N: number of points in the cluster
– LS: sum of the points in the cluster
– SS: sum of the squares of the points in the cluster
• CF Tree
– Balanced search tree
– Node has CF triple for each child
– Leaf node represents cluster and has CF value for each
subcluster in it.
– Subcluster has maximum diameter
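A key property of the CF triple (not stated explicitly above, but standard for BIRCH) is additivity: merging two clusters just adds their triples component-wise, so clusters can be combined without revisiting the points. A sketch on 1-D points:

```python
def make_cf(points):
    """CF triple (N, LS, SS) for a set of 1-D points."""
    return (len(points), sum(points), sum(p * p for p in points))

def merge_cf(cf1, cf2):
    """CF triples are additive: merging clusters adds the triples."""
    return tuple(a + b for a, b in zip(cf1, cf2))
```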
Apriori (example thresholds: s = 30%, a = 50%)
• Advantages:
– Uses large itemset property.
– Easily parallelized
– Easy to implement.
• Disadvantages:
– Assumes transaction database is memory
resident.
– Requires up to m database scans.
• Large databases
• Sample the database and apply Apriori to the
sample.
• Potentially Large Itemsets (PL): Large itemsets
from sample
• Negative Border (BD - ):
– Generalization of Apriori-Gen applied to
itemsets of varying sizes.
– Minimal set of itemsets which are not in PL, but
whose subsets are all in PL.
[Figure: PL and PL ∪ BD-(PL)]
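The negative border definition above can be sketched in code: BD-(PL) is the set of minimal itemsets not in PL all of whose proper subsets are in PL.

```python
from itertools import combinations

def negative_border(pl, items):
    """Minimal itemsets not in PL whose proper subsets are all in PL."""
    pl = {frozenset(s) for s in pl}
    border = set()
    for size in range(1, len(items) + 1):
        for cand in combinations(sorted(items), size):
            c = frozenset(cand)
            if c in pl:
                continue
            # singletons have only the empty set as a proper subset
            subsets_ok = (all(frozenset(s) in pl
                              for s in combinations(cand, size - 1))
                          if size > 1 else True)
            if subsets_ok:
                border.add(c)
    return border
```

For example, with items {A,B,C,D} and PL = {{A}, {C}, {D}, {C,D}}, the negative border is {{B}, {A,C}, {A,D}}.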
Sampling Algorithm
1. Ds = sample of database D;
2. PL = large itemsets in Ds using smalls (a support threshold lowered for the sample);
3. C = PL ∪ BD-(PL);
4. Count C in database D using s;
5. ML = large itemsets in BD-(PL);
6. If ML = ∅ then done;
7. else C = repeated application of BD-;
8. Count C in database D;
Sampling Example
• Advantages:
– Reduces the number of database scans to one in the best case and two in the worst.
– Scales better.
• Disadvantages:
– Potentially large number of candidates in second
pass
Partitioning
[Figure: database partition D1; s = 10%]
• Advantages:
– Adapts to available main memory
– Easily parallelized
– Maximum number of database scans is two.
• Disadvantages:
– May have many candidates during second scan.
Parallelizing AR Algorithms
• Based on Apriori
• Techniques differ:
– What is counted at each site
– How data (transactions) are distributed
• Data Parallelism
– Data partitioned
– Count Distribution Algorithm
• Task Parallelism
– Data and candidates partitioned
– Data Distribution Algorithm
Comparing AR Techniques
• Target
• Type
• Data Type
• Data Source
• Technique
• Itemset Strategy and Data Structure
• Transaction Strategy and Data Structure
• Optimization
• Architecture
• Parallelism Strategy
Measuring Quality of Rules
• Support
• Confidence
• Interest
• Conviction
• Chi Squared Test
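The first two measures above are the standard definitions: support is the fraction of transactions containing the itemset, and confidence of a rule lhs -> rhs is support(lhs ∪ rhs) / support(lhs). A minimal sketch:

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Confidence of lhs -> rhs: support(lhs ∪ rhs) / support(lhs)."""
    return (support(set(lhs) | set(rhs), transactions)
            / support(lhs, transactions))
```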