Hw2 Solution
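$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{2}, \qquad \bar{y} = \frac{1}{2}$$

$$\mathrm{std}(x) = \left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2\right]^{1/2} = \frac{1}{\sqrt{3}}, \qquad \mathrm{std}(y) = \frac{1}{\sqrt{3}}$$

$$x'_k = \frac{x_k-\bar{x}}{\mathrm{std}(x)} = \sqrt{3}\left(-\tfrac{1}{2},\ \tfrac{1}{2},\ -\tfrac{1}{2},\ \tfrac{1}{2}\right), \qquad y'_k = \frac{y_k-\bar{y}}{\mathrm{std}(y)} = \sqrt{3}\left(\tfrac{1}{2},\ -\tfrac{1}{2},\ \tfrac{1}{2},\ -\tfrac{1}{2}\right)$$

Here x' · y' = −3 with n = 4. The correlation is this dot product divided by n − 1, which keeps it in [−1, 1]:

$$\mathrm{correlation}(x,y) = \frac{x'\cdot y'}{n-1} = \frac{-3}{3} = -1$$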
c) x = (1, -1, 0, 1), y = (1, 0, -1, 0): cosine, correlation, Euclidean

$$\cos(d_1,d_2) = \frac{d_1\cdot d_2}{\|d_1\|\,\|d_2\|}, \qquad \|x\|=\sqrt{3},\quad \|y\|=\sqrt{2},\quad \cos(x,y)=\frac{1}{\sqrt{6}}$$

$$\mathrm{Euclidean} = \sqrt{\sum_{k=1}^{n}(p_k-q_k)^2} = \sqrt{3}$$
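The cosine and Euclidean values for part c) can be checked numerically. The following is a minimal sketch in plain Python; the helper names `dot`, `norm`, `cosine`, and `euclidean` are illustrative choices and not part of the original solution.

```python
import math

def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(a):
    """Euclidean (L2) norm."""
    return math.sqrt(dot(a, a))

def cosine(a, b):
    """Cosine similarity: a.b / (|a| |b|)."""
    return dot(a, b) / (norm(a) * norm(b))

def euclidean(a, b):
    """Euclidean distance between a and b."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Vectors from part c)
x = (1, -1, 0, 1)
y = (1, 0, -1, 0)

cos_xy = cosine(x, y)      # should agree with 1/sqrt(6)
dist_xy = euclidean(x, y)  # should agree with sqrt(3)
print(cos_xy, dist_xy)
```

Comparing against `1 / math.sqrt(6)` and `math.sqrt(3)` with `math.isclose` avoids spurious mismatches from floating-point rounding.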
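$$\mathrm{std}(x) = \left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2\right]^{1/2} = \frac{2}{\sqrt{15}}, \qquad \mathrm{std}(y) = \frac{2}{\sqrt{15}}$$

$$x'_k = \frac{x_k-\bar{x}}{\mathrm{std}(x)} = \frac{\sqrt{15}}{2}\left(\tfrac{1}{3},\ \tfrac{1}{3},\ -\tfrac{2}{3},\ \tfrac{1}{3},\ -\tfrac{2}{3},\ \tfrac{1}{3}\right)$$

$$y'_k = \frac{y_k-\bar{y}}{\mathrm{std}(y)} = \frac{\sqrt{15}}{2}\left(\tfrac{1}{3},\ \tfrac{1}{3},\ \tfrac{1}{3},\ -\tfrac{2}{3},\ -\tfrac{2}{3},\ \tfrac{1}{3}\right)$$

Here x' · y' = 5/4 = 1.25 with n = 6. The correlation is this dot product divided by n − 1, which keeps it in [−1, 1]:

$$\mathrm{correlation}(x,y) = \frac{x'\cdot y'}{n-1} = \frac{5/4}{5} = 0.25$$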
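As a numerical cross-check of the standardization step, the sketch below starts from the six deviations from the mean that appear inside the parentheses of x' and y' above. The function name `standardize` is a hypothetical helper. The sketch assumes the usual sample convention in which the Pearson correlation is the dot product of the standardized vectors divided by n − 1.

```python
import math

def standardize(dev):
    """Divide deviations from the mean by the sample standard deviation."""
    n = len(dev)
    std = math.sqrt(sum(d * d for d in dev) / (n - 1))
    return [d / std for d in dev]

# Deviations x_k - x_bar and y_k - y_bar, taken from the solution above
dx = [1/3, 1/3, -2/3, 1/3, -2/3, 1/3]
dy = [1/3, 1/3, 1/3, -2/3, -2/3, 1/3]

xs = standardize(dx)
ys = standardize(dy)

# Dot product of the standardized vectors
dot_std = sum(a * b for a, b in zip(xs, ys))

# Pearson correlation: divide by n - 1 (assumed sample convention)
corr = dot_std / (len(dx) - 1)
print(dot_std, corr)
```

The dot product evaluates to about 1.25, and dividing by n − 1 = 5 gives a correlation of about 0.25, which lies inside [−1, 1] as a correlation must.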
a) How would you convert this data into a form suitable for association analysis?
Ans:
Association rule analysis works with binary attributes, so you have to convert
original data into binary form as follows:
b) In particular, what type of attributes would you have and how many of them are
there?
Ans:
400 asymmetric binary attributes.
1. Use smoothing by bin means to smooth the above data, using a bin depth of 3.
Illustrate your steps. Comment on the effect of this technique for the given data.
Ans:
Bin 1: 44/3, 44/3, 44/3 Bin 2: 55/3, 55/3, 55/3 Bin 3: 21, 21, 21
Bin 4: 24, 24, 24 Bin 5: 80/3, 80/3, 80/3 Bin 6: 101/3, 101/3, 101/3
Bin 7: 35, 35, 35 Bin 8: 121/3, 121/3, 121/3 Bin 9: 56, 56, 56
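The binning procedure can be sketched in a few lines of Python. The data set for this question is not reproduced in this section, so `sample` below is a hypothetical list chosen only to illustrate the steps, and `smooth_by_bin_means` is an illustrative name: sort the values, cut them into equal-depth bins, and replace every value in a bin by the bin mean.

```python
def smooth_by_bin_means(values, depth):
    """Equal-depth binning followed by smoothing by bin means."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), depth):
        bin_values = ordered[start:start + depth]
        mean = sum(bin_values) / len(bin_values)
        # Every value in the bin is replaced by the bin mean
        smoothed.extend([mean] * len(bin_values))
    return smoothed

# Hypothetical sample, NOT the data from the assignment
sample = [21, 4, 34, 8, 25, 15, 28, 21, 24]
result = smooth_by_bin_means(sample, depth=3)
print(result)
```

With a depth of 3, the sorted sample splits into the bins (4, 8, 15), (21, 21, 24), and (25, 28, 34), whose means are 9, 22, and 29.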
Ans:
Outliers in the data may be detected by clustering, where similar values are
organized into groups, or “clusters”. Values that fall outside of the set of
groups may be considered outliers. Alternatively, a combination of computer
and human inspection can be used where a predetermined data distribution is
implemented to allow the computer to identify possible outliers. These
possible outliers can then be verified by human inspection with much less
effort than would be required to verify the entire data set.
Ans:
[−1, 1]. Often the data has only non-negative entries, and in that case
the range is [0, 1].
(b) If two objects have a cosine measure of 1, are they identical? Explain.
Ans:
Not necessarily. All we know is that the values of their attributes differ
by a positive constant factor.
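A small numerical illustration of this point, using two hypothetical vectors where one is twice the other; the vectors and the helper `cosine` are illustrative only.

```python
import math

def cosine(a, b):
    """Cosine similarity: a.b / (|a| |b|)."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b)

# Hypothetical example: v is u scaled by the factor 2
u = (1, 2, 3)
v = (2, 4, 6)

cos_uv = cosine(u, v)
print(cos_uv, u == v)
```

The cosine is 1 up to floating-point rounding, yet `u == v` is false: the two objects point in the same direction but have different magnitudes.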
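1. Cosine similarity:

$$\cos(x,y) = \frac{x\cdot y}{\|x\|\,\|y\|}$$

2. Correlation (with mean 0, and likewise for y):

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = 0$$

$$\mathrm{std}(x) = \left[\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2\right]^{1/2} = \left[\frac{1}{n-1}\sum_{i=1}^{n}x_i^2\right]^{1/2} = \frac{\|x\|}{\sqrt{n-1}}$$

$$x' = \frac{x-\bar{x}}{\mathrm{std}(x)} = \frac{\sqrt{n-1}\;x}{\|x\|}$$

$$x'\cdot y' = (n-1)\,\frac{x\cdot y}{\|x\|\,\|y\|} = (n-1)\cos(x,y)$$

The correlation is this dot product divided by n − 1, so for zero-mean vectors it equals the cosine similarity:

$$\mathrm{correlation}(x,y) = \frac{x'\cdot y'}{n-1} = \cos(x,y)$$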
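The relationship between the two measures for zero-mean data can be checked numerically. The sketch below uses two hypothetical vectors whose entries sum to zero and assumes the sample convention in which the Pearson correlation is the dot product of the standardized vectors divided by n − 1; under that convention the correlation and the cosine coincide. The names `cosine` and `pearson` are illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity: a.b / (|a| |b|)."""
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) *
                  math.sqrt(sum(q * q for q in b)))

def pearson(a, b):
    """Pearson correlation via standardized vectors (sample std, n - 1)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [p - mean_a for p in a]
    db = [q - mean_b for q in b]
    std_a = math.sqrt(sum(d * d for d in da) / (n - 1))
    std_b = math.sqrt(sum(d * d for d in db) / (n - 1))
    dot_std = sum((p / std_a) * (q / std_b) for p, q in zip(da, db))
    return dot_std / (n - 1)

# Hypothetical zero-mean vectors (each sums to 0)
x = (1, -2, 1, 0)
y = (3, -1, -2, 0)

cos_xy = cosine(x, y)
corr_xy = pearson(x, y)
print(cos_xy, corr_xy)
```

Both values agree with 3/√84 up to rounding. For vectors whose means are not zero the two measures generally differ, because correlation subtracts the mean first and cosine does not.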
a) Compute the support for itemsets {e}, {b, d}, and {b, d, e} by treating each
transaction ID as a market basket.
Ans:
s({e}) = 8/10 = 0.8
s({b, d}) = 2/10 = 0.2
s({b, d, e}) = 2/10 = 0.2
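Support counting can be sketched as follows. The transaction table for this question is not reproduced in this section, so `baskets` below is a hypothetical set of five market baskets, not the assignment's data. The function names `support` and `confidence` are illustrative. Support is the fraction of baskets containing every item of the itemset, and the confidence of a rule X → Y is s(X ∪ Y) / s(X).

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    items = set(itemset)
    hits = sum(1 for basket in baskets if items <= set(basket))
    return hits / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Confidence of the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

# Hypothetical baskets, NOT the transactions from the assignment
baskets = [
    {"a", "b", "d", "e"},
    {"b", "c", "d"},
    {"a", "e"},
    {"b", "d", "e"},
    {"c", "e"},
]

s_e = support({"e"}, baskets)
s_bd = support({"b", "d"}, baskets)
s_bde = support({"b", "d", "e"}, baskets)
conf_bd_e = confidence({"b", "d"}, {"e"}, baskets)
print(s_e, s_bd, s_bde, conf_bd_e)
```

Treating each customer as a basket instead of each transaction only changes how `baskets` is built: the items of all transactions by the same customer are merged into one set before counting.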
c) Repeat part (a) by treating each customer ID as a market basket. Each item should
be treated as a binary variable (1 if an item appears in at least one transaction
bought by the customer, and 0 otherwise.)
Ans:
d) Use the results in part (c) to compute the confidence for the association rules
{b, d} → {e} and {e} → {b, d}.
Ans: