3 Ravi
Discretization:
• Divide the range of a continuous attribute into intervals
• Some classification algorithms only accept categorical attributes.
• Reduce data size by discretization
• Prepare for further analysis
Discretization
Typical methods:
1 Binning
2 Clustering analysis
3 Interval merging by χ² analysis
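The slides name binning without fixing a particular scheme. The sketch below assumes two common variants: equal-width binning, which splits the attribute's range into k intervals of the same size, and equal-frequency (equal-depth) binning, which puts roughly the same number of values in each interval. The function names are hypothetical and each value is mapped to a bin index 0..k-1.

```python
from typing import List


def equal_width_bins(values: List[float], k: int) -> List[int]:
    """Map each value to one of k equal-width intervals over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant attribute: everything falls into a single bin.
        return [0] * len(values)
    width = (hi - lo) / k
    # min(..., k - 1) keeps the maximum value in the last bin instead of bin k.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values: List[float], k: int) -> List[int]:
    """Map values to k bins that each hold about len(values) / k values."""
    n = len(values)
    # Indices of the values in ascending order (stable sort, so equal
    # values may still straddle a bin boundary).
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(equal_width_bins(prices, 3))      # [0, 0, 1, 1, 1, 2, 2, 2, 2]
print(equal_frequency_bins(prices, 3))  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Equal-width binning is simple but sensitive to outliers, since one extreme value stretches every interval; equal-frequency binning avoids that at the cost of intervals with uneven widths.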
Architecture of a typical data mining system
[Figure: architecture diagram; components shown include pattern evaluation, knowledge base, data cleansing, data integration, and filtering]
• Background knowledge
• Interestingness measure
• Utility (support)
– usefulness of a pattern
$$\mathrm{support}(A \Rightarrow B) = \frac{\#\,\text{tuples containing both } A \text{ and } B}{\text{total } \#\text{ of tuples}}$$
A support of 30% for the rule computer => software means that 30%
of all customers purchased both a computer
and software.
Every association rule has a support and a confidence.
“The support is the percentage of transactions that demonstrate the rule.”
Example: Database with transactions (customer_# : item_a1, item_a2, …)
1: 1, 3, 5.
2: 1, 8, 14, 17, 12.
3: 4, 6, 8, 12, 9, 104.
4: 2, 1, 8.
support {8, 12} = 2 (50%, i.e. 2 of 4 customers)
support {1, 5} = 1 (25%, i.e. 1 of 4 customers)
support {1} = 3 (75%, i.e. 3 of 4 customers)
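The support counts above can be checked mechanically. This is a minimal sketch, assuming each transaction is stored as a Python set of item numbers; the names `transactions`, `support_count`, `support`, and `rule_support` are illustrative, not from the slides. `rule_support` follows the formula given earlier: a transaction supports A => B when it contains every item of both A and B.

```python
from typing import Dict, Iterable, Set

# Example database from the slide: customer_# -> set of purchased items.
transactions: Dict[int, Set[int]] = {
    1: {1, 3, 5},
    2: {1, 8, 14, 17, 12},
    3: {4, 6, 8, 12, 9, 104},
    4: {2, 1, 8},
}


def support_count(itemset: Iterable[int], db: Dict[int, Set[int]]) -> int:
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for basket in db.values() if items <= basket)


def support(itemset: Iterable[int], db: Dict[int, Set[int]]) -> float:
    """Support as a fraction of all transactions."""
    return support_count(itemset, db) / len(db)


def rule_support(a: Iterable[int], b: Iterable[int], db: Dict[int, Set[int]]) -> float:
    """Support of the rule A => B: fraction of transactions containing A and B."""
    return support(set(a) | set(b), db)


for itemset in ({8, 12}, {1, 5}, {1}):
    print(sorted(itemset), support_count(itemset, transactions),
          f"{support(itemset, transactions):.0%}")
# [8, 12] 2 50%
# [1, 5] 1 25%
# [1] 3 75%
```

For instance, `rule_support({8}, {12}, transactions)` returns 0.5, matching the 50% support of {8, 12}.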