Data Mining Techniques & Applications
(... multiple tables)
middle aged  no   low     safe
retired      yes  middle  safe
retired      no   middle  safe
retired      yes  high    safe
◼ Transaction database (... of items)
TID  Items
100  apple, beer, newspaper
200  apple, beef, beer, newspaper, potato
300  beef, potato
(... numeric attributes)
34  15,19  181,14  0,99  655,58  249,74  242,32
41  16,07  202,83  0,9   953,22  241,63  244,51
59  19,98  313,46  0,88  688,95  239,67  248,89
32  15,16  180,49  0,99  624,84  233,26  263,12
◼ Document-term matrix
DocumentID  match  coach  game  play  win
1           2      10     2     2     1
2           0      2      0     3     2
3           0      34     5     10    10
4           4      0      1     2     2
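As an illustration of how a document-term matrix can be built, here is a minimal Python sketch. The two sample documents and the function name `document_term_matrix` are hypothetical; only the five terms come from the slide. Tokenisation is a plain whitespace split, which a real text-mining pipeline would normally refine.

```python
from collections import Counter

# Terms taken from the slide; their column order here is an assumption.
TERMS = ["match", "coach", "game", "play", "win"]


def document_term_matrix(documents, terms):
    """Return one row per document holding the count of each term."""
    matrix = []
    for text in documents:
        # Naive tokenisation: lower-case and split on whitespace.
        counts = Counter(text.lower().split())
        matrix.append([counts[term] for term in terms])
    return matrix


# Hypothetical documents, for illustration only.
documents = [
    "the coach said play the game",
    "win win game",
]
matrix = document_term_matrix(documents, TERMS)
for doc_id, row in enumerate(matrix, start=1):
    print(doc_id, row)
```

Each row is a numeric vector, so the documents can then be handled like any other record with numeric attributes.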
Foci Progress
Region measured at 0.5 h measured at 1 h measured at 2 h measured at 6 h
1 72.30 33.98 30.7 10.2
2 65 32.5 26.4 12.5
3 67.8 34.3 22.1 8.4
... abstraction
... patterns
22244  04/04/2006  beer     MK          6.99   1022  ……
22244  04/04/2006  nappies  MK          10.89  1022  ……
23311  05/04/2006  beer     MK          6.99   1011  ……
……     06/06/2006  ……       Buckingham  1.99   ……    ……
How:
◼ By generalization using a given concept hierarchy
◼ By applying aggregate functions (count, sum, average, etc.)
◼ Dropping some attributes
04/04/2006  Buckingham  13.32  ……
04/04/2006  MK          8.94   ……
05/04/2006  MK          6.99   ……
……          ……          ……     ……
Number of Items  TotalPrice  Clubcard#  ……
1                1.99        1111       ……
3                36.97       1011       ……
2                27.87       1022       ……
……               ……          ……         ……
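Aggregation with count and sum can be sketched in Python as below. The sketch uses only the three transaction rows that are fully visible on the slide, so its totals need not match the slide's summary tables, which also cover rows shown only as "……". The field order (transaction id, date, item, store, price, Clubcard number) and the helper name `aggregate` are assumptions.

```python
from collections import defaultdict
from decimal import Decimal

# Assumed field order: (transaction id, date, item, store, price, clubcard).
rows = [
    ("22244", "04/04/2006", "beer", "MK", Decimal("6.99"), "1022"),
    ("22244", "04/04/2006", "nappies", "MK", Decimal("10.89"), "1022"),
    ("23311", "05/04/2006", "beer", "MK", Decimal("6.99"), "1011"),
]


def aggregate(rows, key_of):
    """Group rows by key_of(row); return {key: (item count, total price)}."""
    acc = defaultdict(lambda: [0, Decimal("0")])
    for row in rows:
        key = key_of(row)
        acc[key][0] += 1       # count
        acc[key][1] += row[4]  # sum of price
    return {key: (n, total) for key, (n, total) in acc.items()}


# Drop item-level detail and summarise per Clubcard number.
by_card = aggregate(rows, lambda r: r[5])
# Summarise per (date, store) instead.
by_date_store = aggregate(rows, lambda r: (r[1], r[3]))

for card, (n, total) in by_card.items():
    print(card, n, total)
```

Prices are held as `Decimal` rather than `float` so that the sums stay exact to the penny.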
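[Figure: attribute subset selection loop]
... attributes → Subset selection → Stopping criterion
Stopping criterion: Not ok → Subset selection
Stopping criterion: ok → Selected subset → Validate with Mining task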
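The selection loop in the diagram can be sketched as a greedy forward search. This is one possible instance and not necessarily the method the slide has in mind: the scoring callback `evaluate`, the attribute names and the weights are all hypothetical, and the final validation with the mining task is left outside the sketch.

```python
def forward_selection(attributes, evaluate, min_gain=1e-9):
    """Greedy forward subset selection.

    evaluate(subset) is a hypothetical callback returning a score,
    where higher is better.
    """
    selected = []
    remaining = list(attributes)
    best_score = evaluate(selected)
    while remaining:
        # Subset selection: try adding each remaining attribute.
        candidate, candidate_score = max(
            ((a, evaluate(selected + [a])) for a in remaining),
            key=lambda pair: pair[1],
        )
        # Stopping criterion: stop when no candidate improves the score.
        if candidate_score - best_score < min_gain:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


# Toy scoring function: usefulness of each attribute minus a size penalty.
WEIGHTS = {"age": 0.5, "income": 0.3, "noise": 0.0}


def toy_score(subset):
    return sum(WEIGHTS[a] for a in subset) - 0.05 * len(subset)


selected, score = forward_selection(WEIGHTS, toy_score)
print(selected)
```

In practice `evaluate` would be replaced by a measure tied to the mining task, and the selected subset would then be validated on that task.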
Why:
◼ ... mining
◼ ... interpretable
How:
◼ Transformation using function
Ex. x^k, log(x), sin(x), etc.
... care.
[Figure: histogram of call time (sec); vertical axis count#; bins [1, 10], [11, 20], ..., [91, 100], [101, 200], ..., [401, 500]]
[Figure: histogram of the logarithm (base 10) of call time; vertical axis count#; bins [0, 1], [1, 2], [2, 3]]
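A minimal Python sketch of the log(x) transformation followed by binning is given below. The call times are hypothetical values chosen for illustration, and the function name `log10_histogram` is an assumption.

```python
import math
from collections import Counter


def log10_histogram(values):
    """Count values per unit-wide bin of log10(value).

    Bin k covers k <= log10(value) < k + 1.
    """
    counts = Counter()
    for v in values:
        if v <= 0:
            raise ValueError("log10 is only defined for positive values")
        counts[math.floor(math.log10(v))] += 1
    return dict(sorted(counts.items()))


# Hypothetical call times in seconds.
call_times = [3, 8, 15, 42, 90, 120, 450]
hist = log10_histogram(call_times)
for k, n in hist.items():
    print(f"[{k}, {k + 1}]: {n}")
```

Because the logarithm compresses large values, a long-tailed variable such as call time ends up spread over a few bins of equal width.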
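◼ Variance (σ²)
\[ \sigma^2 = \frac{1}{m-1} \sum_{i=1}^{m} (x_i - \bar{x})^2 \]
◼ Standard Deviation (σ)
\[ \sigma = \sqrt{\frac{1}{m-1} \sum_{i=1}^{m} (x_i - \bar{x})^2} \]
◼ Matrix of covariance
\[ \mathrm{covariance}(x, y) = \frac{1}{m-1} \sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y}) \]
◼ Correlation
\[ \mathrm{correlation}(x, y) = \frac{\mathrm{covariance}(x, y)}{\sigma_x \sigma_y} \]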
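The four measures can be computed directly from their definitions, using the m − 1 denominator throughout. The sketch below is plain Python; the function names and the sample data are illustrative assumptions.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance with the m - 1 denominator."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with m >= 2")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def variance(xs):
    # The variance is the covariance of a variable with itself.
    return covariance(xs, xs)


def std_dev(xs):
    return math.sqrt(variance(xs))


def correlation(xs, ys):
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


# Illustrative data: y is an exact linear function of x.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(variance(x), std_dev(x), covariance(x, y), correlation(x, y))
```

A matrix of covariance for several attributes is obtained by calling `covariance` on every pair of attributes; its diagonal then holds the variances.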
... of a table
   1  2  3  4  5  6
4  1  0  1  0  0  1
5  0  1  0  1  1  0
6  1  0  1  0  0  1
7  0  1  0  1  1  0
◼ Sorting
   6  1  3  2  5  4
4  1  1  1  0  0  0
2  1  1  1  0  0  0
6  1  1  1  0  0  0
8  1  1  1  0  0  0
5  0  0  0  1  1  1
3  0  0  0  1  1  1
9  0  0  0  1  1  1
1  0  0  0  1  1  1
7  0  0  0  1  1  1
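One simple way to reorder such a table is to sort the columns and then the rows by their 0/1 patterns, so that identical patterns end up next to each other. The Python sketch below assumes the nine rows implied by the sorted table (columns numbered 1 to 6). The ordering it produces differs from the one on the slide, but it exposes the same two blocks. The function name `sort_table` is an assumption.

```python
# Row id -> values for columns 1..6, as implied by the sorted table.
table = {
    1: [0, 1, 0, 1, 1, 0],
    2: [1, 0, 1, 0, 0, 1],
    3: [0, 1, 0, 1, 1, 0],
    4: [1, 0, 1, 0, 0, 1],
    5: [0, 1, 0, 1, 1, 0],
    6: [1, 0, 1, 0, 0, 1],
    7: [0, 1, 0, 1, 1, 0],
    8: [1, 0, 1, 0, 0, 1],
    9: [0, 1, 0, 1, 1, 0],
}


def sort_table(table):
    """Reorder columns, then rows, by their 0/1 patterns (descending)."""
    row_ids = sorted(table)
    n_cols = len(table[row_ids[0]])
    # Columns with the same pattern over all rows become adjacent.
    col_order = sorted(
        range(n_cols),
        key=lambda c: [table[r][c] for r in row_ids],
        reverse=True,
    )
    # Rows with the same pattern over the reordered columns become adjacent.
    row_order = sorted(
        row_ids,
        key=lambda r: [table[r][c] for c in col_order],
        reverse=True,
    )
    rearranged = [[table[r][c] for c in col_order] for r in row_order]
    return row_order, [c + 1 for c in col_order], rearranged


row_order, col_order, rearranged = sort_table(table)
print("  ", *col_order)
for r, values in zip(row_order, rearranged):
    print(r, "", *values)
```

After sorting, the odd-numbered rows form one block and the even-numbered rows the other, which is hard to see in the original ordering.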
[Figure: scatter plot; horizontal axis Age; legend: Child, Teen, Adult]
[Figure: labelled objects a, b, c, d, e, f, g, h, i, k]
   Cluster 1  Cluster 2  Cluster 3
a  0,4        0,1        0,5
b  0,1        0,8        0,1
c  0,3        0,3        0,4
d  0,1        0,1        0,8
e  0,4        0,2        0,4
f  0,7        0,1        0,2
g  0,5        0,4        0,1
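Reading the table as degrees of membership of each object in the three clusters (an assumption; the slide text does not say so explicitly), the following Python sketch checks that every row sums to 1 and reports the cluster or clusters with the highest degree. Decimal commas are written as points, and the helper name `strongest_clusters` is hypothetical.

```python
import math

# Object -> degrees for (Cluster 1, Cluster 2, Cluster 3), from the table.
membership = {
    "a": [0.4, 0.1, 0.5],
    "b": [0.1, 0.8, 0.1],
    "c": [0.3, 0.3, 0.4],
    "d": [0.1, 0.1, 0.8],
    "e": [0.4, 0.2, 0.4],
    "f": [0.7, 0.1, 0.2],
    "g": [0.5, 0.4, 0.1],
}


def strongest_clusters(degrees):
    """Return the 1-based cluster numbers sharing the highest degree."""
    best = max(degrees)
    return [i + 1 for i, d in enumerate(degrees) if d == best]


# Every row should sum to 1 under the membership-degree reading.
rows_sum_to_one = all(
    math.isclose(sum(degrees), 1.0) for degrees in membership.values()
)
assignment = {obj: strongest_clusters(d) for obj, d in membership.items()}
for obj, clusters in assignment.items():
    print(obj, clusters)
```

Object e has equal degrees for Cluster 1 and Cluster 3, so forcing each object into a single cluster would require an extra tie-breaking rule.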
[Figure: data cube; axis labels: 1998, 1999, 2000; Northampton, Milton Keynes, Buckingham; winter, spring, summer, autumn; further labels: 1998, Milton Keynes, March]
• Total Customer = 5
• Customer Names