
2010 IEEE International Conference on Data Mining

Discovering Correlated Subspace Clusters in 3D Continuous-Valued Data

Kelvin Sim (Institute for Infocomm Research, A*STAR, Singapore) Email: [email protected]
Zeyar Aung (Masdar Institute of Science and Technology, Abu Dhabi, United Arab Emirates) Email: [email protected]
Vivekanand Gopalkrishnan (Nanyang Technological University, Singapore) Email: [email protected]

Abstract—Subspace clusters represent useful information in high-dimensional data. However, mining significant subspace clusters in continuous-valued 3D data, such as stock-financial ratio-year data or gene-sample-time data, is difficult. Firstly, typical metrics either find subspaces with very few objects, or they find too many insignificant subspaces – those which exist by chance. Besides, typical 3D subspace clustering approaches abound with parameters, which are usually set under biased assumptions, making the mining process a ‘guessing game’. We address these concerns by proposing an information theoretic measure, which allows us to identify 3D subspace clusters that stand out from the data. We also develop a highly effective, efficient and parameter-robust algorithm, which is a hybrid of information theoretic and statistical techniques, to mine these clusters. From extensive experiments, we show that our approach can discover significant 3D subspace clusters embedded in 110 synthetic datasets of varying conditions. We also perform a case study on real-world stock datasets, which shows that our clusters can generate higher profits compared to those mined by other approaches.

Keywords—3D subspace clustering, financial data mining, information theory.

Figure 1. (a) Each rounded rectangle represents a 3D subspace cluster. Solid rectangles represent significant clusters, as their values have high occurrences and their occurrences purely constitute the cluster. (b) 3D subspace clusters considered to be significant by different metrics.

Figure 1(a): one face of the stock-financial ratio-year data (timestamps T0–T3 form the third axis).

    Stocks           Div. yield   LT debt/assets   Operating margin   ROA
    Microsoft             2              5                 19          35   | Cluster 4
    Apple                 0              0                 20          27   | (significant
    Google                0              0                 20          35   |  cluster)
    Adams Golf            0              0                -20         -16     Cluster 5
    ...
    A. P. Pharma          0              0                -97        -812
    4 Kids Ent.           0              0                -67        -135
    Barnes & Noble        4              0                  3           3
    Gap                   2              0                 14          13     Cluster 9
    Citigroup             0             30                  0         -10   | Cluster 10
    AIG                   0             13                 -1         -14   | (significant cluster)

Clusters 1, 3 and 2 span the Div. yield, LT debt/assets and Operating margin columns respectively.

Figure 1(b):

    Metric                          Clusters obtained
    Support                         1, 2, 3, 4, 10
    Lift [2]                        5, . . . , 9
    Non-uniform distributed [3]     1, 2, 4, 10
    Correlation information [4]     4, 10

I. INTRODUCTION

Three-dimensional (3D) data, in the general form of object-attribute-time/location, has become increasingly popular in data analysis. Many real-world applications, such as microarray analysis based on gene-sample-time or gene-sample-region data, and stock analysis based on stock-financial ratio-year data, basically cluster the continuous 3D data to perform their task. However, because these data are essentially high dimensional, traditional clustering approaches operating on the full data space become ineffective. Zhao and Zaki [1] attempted to tackle this problem by clustering subspaces in the 3D data, so that objects are grouped based upon their similarity in some subset of attributes and time. In such formulations, a 3D subspace cluster can be considered as a cuboid spanned by a group of objects, a group of attributes and a group of timestamps. This cuboid is inherently axis-parallel, which is important for the user to easily interpret and understand the cluster.

In order to be useful, a 3D subspace cluster generally requires a substantial number of objects which have similar values in a subset of attributes and timestamps. Consider the stock-financial ratio-year 3D continuous dataset shown in Fig. 1(a). The figure represents 3D subspace clusters of different characteristics by rounded rectangles. Intuitively, clusters 4 and 10 are the most significant among them; the technology stocks Microsoft, Apple and Google are in cluster 4, and the financial stocks Citigroup and AIG are in cluster 10. On the other hand, clusters 1, 2 and 3 contain stocks from different industries, while clusters 5 to 9 contain a single stock each. Clusters 1 and 2 are based on values that most of the stocks in the dataset have, which is obvious and useless information. Cluster 3 is induced by clusters 1 and 2, thus its existence is by chance. Clusters 4 and 10 both have high occurrences of similar values on multiple financial ratios, and all occurrences of the similar values constitute the cluster.

We denote such significant axis-parallel 3D subspace clusters as Correlated 3D Subspace Clusters (CSCs), where the values in each cluster have the following correlation characteristics:
• they have high co-occurrences

• their co-occurrences are not by chance

Having defined the desired clusters, we are now faced with an open question: “How to measure the goodness of a CSC?” Almost all subspace clustering algorithms use the support metric, which requires the values in the cluster to have high occurrences together, but they do not consider the second characteristic. Brin et al. [2] proposed the lift metric, which measures the second characteristic, but is biased towards values with low occurrences. Moise and Sander [3] proposed that a cluster is significant when the occurrences of values in it do not follow a uniform distribution. This statistical approach handles the above characteristics, but it can only handle data with uniform distribution.

We propose using the metric correlation information, which also embodies the above characteristics, but is not dependent on the data distribution. Correlation information quantifies the correlation between a pair of subspace clusters [4], thus it can be naturally extended to quantify the correlation within a 3D subspace cluster. Figure 1(b) shows the 3D subspace clusters obtained using different metrics. As we can see, several metrics can be used to find the desired CSCs, but some of them generate spurious results.

In order to use correlation information on 3D continuous-valued data, we must calculate the probability density function (pdf) and probabilities of high-dimensional continuous values, which is extremely challenging. We use kernel density estimation to estimate the pdf, as it is non-parametric and can converge to the true pdf [5]. To calculate the probabilities of high-dimensional continuous values, we use the normalized pdf method [6], which is faster than performing Monte Carlo integration of the pdf [7] and more accurate than taking the pdf itself as the probability [8].

Now that an appropriate metric for CSCs has been chosen, the remaining task is to design a subspace clustering algorithm that mines the clusters based upon this metric. Generally, most clustering algorithms require the user to set the parameters of their metrics, and then these algorithms return clusters that satisfy the parameters. Hence, it is the user who determines the results, based upon his/her biased assumptions. Besides, these algorithms typically abound with parameters, thereby increasing the burden on the user. For example, due to the complex nature of 3D continuous-valued data, the pioneering work in 3D subspace clustering [1] requires a total of 7 parameter settings.

Recent work by Keogh et al. [9], promoting the concept of parameter-free or parameter-light data mining, declares: “A parameter-free algorithm prevents us from imposing our prejudices and presumptions on the problem at hand, and lets the data itself speak to us”.

Following this spirit, we should be interested in 3D subspace clusters that are intrinsically prominent in the data; in other words, the mining process should discover clusters that ‘stand out’ in the data, without requiring fine tuning of parameters.

This brings us to our second question: “How to efficiently mine CSCs with high correlation information, and with minimum user interference?” The ideal solution is to exhaustively mine all CSCs, and output those whose correlation information is high. But this approach is computationally infeasible, as even the simpler 2D subspace clustering problem is NP-hard [10]. We propose a more pragmatic and efficient approach, using pairs of values whose correlation information is significantly high as building blocks for CSCs. This approach has two main advantages: the CSCs are guaranteed to have high correlation information, and the search space for CSCs is tremendously reduced.

Next, we need to determine when a correlation information value is considered to be significantly high. Explicitly setting a threshold is meaningless, as the user would not know the correct setting. We propose using the notion of rarity to measure significance: a high correlation information is significant when its occurrence is extremely rare, i.e., when its probability (denoted as p-value) is less than a threshold α. We show that our default setting of α works well in practice, and is insensitive to the input data.

Contributions. In summary, we address the problem of mining CSCs, which are significant axis-parallel 3D subspace clusters, and make the following contributions:
• We present a novel information theoretic measure to quantify the correlation of CSCs (c.f., Section III).
• We develop a highly effective and efficient algorithm, which uses a hybrid of information theoretic and statistical methods to mine CSCs (c.f., Section IV).
• We empirically show the superiority of our approach over the state-of-the-art using a wide range of experiments (c.f., Section V). Our approach is better at finding significant 3D subspace clusters embedded in 110 synthetic datasets of varying conditions. We also provide a case study on real stock market datasets, which shows that CSCs generate higher profits than other clusters.

II. RELATED WORK

Most subspace clustering algorithms find subspace clusters that fulfill certain distance or similarity-based functions [10]–[12], so that the members in each cluster display a certain degree of homogeneity. The user has to set parameter thresholds on these functions, but the optimal thresholds are generally unknown. Likewise, in density-based subspace clustering [13], a global density threshold is required. Wang et al. [14] proposed k-subspace clustering, which replaces the threshold requirement with specifying the number k of subspace clusters needed, but the optimal k is unknown. The parameter-light algorithm of Moise and Sander [3] mines subspace clusters that are statistically significant, but it only handles uniformly-distributed data. Sim et al. [4] proposed mining top-k multi-attribute co-clusters (MACs) from 2D data, which are highly correlated pairs of subspace clusters.

In this paper, we extend its concept of using correlation information to mine CSCs. Note that the aforementioned methods are only applicable to 2D data, and are not suitable for solving our problem. Furthermore, it is non-trivial to extend them to handle 3D subspace clusters. Jiang et al. [15] attempted this by transforming the 3D data into 2D data and then mining 3D subspace clusters from the transformed data; but they consider the temporal/spatial dimension in full space, which means there is no concept of subspace in the temporal/spatial dimension.

Axis-parallel 3D subspace clusters are extensions of 2D subspace clusters with time/location as the third dimension. TRICLUSTER [1] is the pioneering work on 3D subspace clusters. Similar to 2D subspace clusters, triclusters fulfill certain similarity-based functions, and thresholds have to be set on these functions. Xu et al. [16] also proposed a 3D subspace clustering model, the S²D³ cluster, which also requires thresholds to be set on its parameters, but S²D³ clusters are not axis-parallel. Triclusters and S²D³ clusters also suffer from the parameter settings problem that their 2D peers have. Moreover, their problem is compounded by the 3D nature of the data; they have 7 and 5 parameters to set respectively. Our model does not suffer from the parameter setting problem, as it only has a single parameter, and the results are insensitive to this parameter. Ji et al. [17] and Cerf et al. [18] also proposed 3D subspace clusters which satisfy certain parameters' thresholds, but they can only handle simple binary data and cannot be extended to handle continuous-valued data. Even if we transform the data by discretization, selecting the appropriate discretization method is a hard problem, and the number of attributes may increase exponentially.

Other clustering techniques which handle high-dimensional data have been proposed, such as co-clustering [19] and tensor clustering [20]. However, they partition the data into clusters, which is different from subspace clustering. Parameter-free algorithms have been developed for traditional clustering [9], graph partitioning [21] and cluster refinement [22], but due to the complexity of mining 3D subspace clusters in 3D continuous-valued data, no parameter-free or parameter-light algorithm has yet been developed for this problem.

Sun et al. [23] proposed a global approach of projecting data of order M (e.g., the data in a cuboid is of 3rd order) into a tensor of the same order, but of a much smaller magnitude. Its objective is to get a summary of the data, which is different from the task of subspace clustering. Jakulin and Bratko [24] proposed using mutual information to analyze whether attributes ‘interact’ with each other, i.e., how correlated they are. Thus, theirs is a global approach of finding correlation between attributes, whereas ours is a local approach of finding correlations within a 3D subspace cluster.

Figure 2. (a) Continuous-valued cuboid D = 𝒪 × 𝒜 × 𝒯. (b) A sub-cuboid: the domain of attribute a at time t, D(a, t) = 𝒪 × {a} × {t}. (c) A sub-cuboid: the domain of a set of attributes A at a set of timestamps T, D(A, T) = 𝒪 × A × T. (d) A slice of D(A, T), S = {o} × A × T, mapped to a vector v. (e) Example of c̃i(v2, v3|v1) ≥ ci(v2, v3|v1) (c.f., Section III-D), with p̂(v1) = p̂(v2) = p̂(v3) = p̂(v1, v2, v3) = 0.3, ci(v2, v3|v1) = 0 and c̃i(v2, v3|v1) = 0.523.

III. CORRELATION INFORMATION

A. Preliminaries

Our data is a continuous-valued cuboid D = 𝒪 × 𝒜 × 𝒯, with objects 𝒪, attributes 𝒜 and timestamps 𝒯 as its dimensions (Fig. 2(a)). We denote the value of object o on attribute a at time t as x_oat. Let O ⊆ 𝒪, A ⊆ 𝒜 and T ⊆ 𝒯. We define a sub-cuboid C = O × A × T as a subset of cuboid D.

The domain of attribute a at time t is a sub-cuboid denoted as D(a, t) = 𝒪 × {a} × {t}. The domain of a set of attributes A at a set of timestamps T is a sub-cuboid denoted as D(A, T) = 𝒪 × A × T. Examples of D(a, t) and D(A, T) are shown in Fig. 2(b) and Fig. 2(c) respectively. We define a slice as a sub-cuboid S = {o} × A × T ∈ D(A, T) (Fig. 2(d)). We can also map a slice S to a column vector v as follows:

Definition 1: (Mapping of slice S = {o} × A × T to column vector v) Let slice S be represented as a partially ordered set {x_oat | a ∈ A, t ∈ T} with cardinality d, and let v = (v1, . . . , vd)ᵀ be a column vector of d values. We map S to v using the function β : S → v, x_oat ↦ v_i (1 ≤ i ≤ d).

B. Estimation of Probability Density Function (pdf) using Kernel Density Estimation

In order to calculate correlation information, we first need to calculate the pdf and the probability of the values and vectors of the data.

Let us treat domain D(a, t) as a continuous random variable with values x_oat, and let f(x_oat) be the pdf of domain D(a, t). We use kernel density estimation to estimate the pdf f(x_oat). Firstly, kernel density estimation is non-parametric, so there are no rigid assumptions on the distribution of the data. Secondly, amongst the non-parametric pdf estimators,

it gives the smoothest pdf estimate on continuous-valued data. Furthermore, Parzen [5] showed that it converges to the true pdf if the bandwidth of the kernel is properly selected.

Definition 2: (pdf estimate of domain D(a, t))

\[
\hat{f}(x_{oat}) = \frac{1}{|\mathcal{O}|\,h\sqrt{2\pi}} \sum_{x_{o'at} \in D(a,t)} \exp\!\left(-\frac{1}{2}\left(\frac{x_{oat} - x_{o'at}}{h}\right)^{2}\right) \tag{1}
\]

We use the Gaussian kernel, but it is acceptable to use other kernels, as the difference between the pdf estimates of different kernels is negligible [25]. h is the bandwidth, which determines the width of the kernel. We adopt Silverman's bandwidth formula [25] and set h = 0.09 m |𝒪|^(−1/5), where m = min{σ_D(a,t), iqr_D(a,t)/1.34}; σ_D(a,t) is the sample standard deviation of D(a, t) and iqr_D(a,t) is the interquartile range of D(a, t). We set our bandwidth to be an order of magnitude smaller than Silverman's formula to increase the detail of the pdf, so that regions of different densities in the pdf are more distinguishable. There is a danger of undersmoothing the data, but in Section V we show that this setting works well in practice.
Definition 3: (pdf estimate of domain D(A, T))

\[
\hat{f}(\mathbf{v}) = \frac{1}{|\mathcal{O}|\,h^{d}\,(2\pi)^{d/2}\det(S)^{1/2}} \sum_{\mathbf{v}' \in D(A,T)} \exp\!\left(-\frac{(\mathbf{v}-\mathbf{v}')^{T} S^{-1} (\mathbf{v}-\mathbf{v}')}{2h^{2}}\right) \tag{2}
\]

S is the sample covariance matrix of the vectors v of the domain D(A, T), and det(S) is the determinant of S. We also use the Gaussian kernel in calculating f̂(v). We adopt Silverman's multivariate Gaussian kernel bandwidth formula [25] and set the bandwidth h = 0.01 (4/(d + 2))^(1/(d+4)) |𝒪|^(−1/(d+4)), which is also an order of magnitude smaller than Silverman's formula.
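The multivariate bandwidth depends only on the dimension d and the number of objects; a one-line sketch (our function name):

    #include <cmath>

    // Scaled multivariate Silverman bandwidth (Section III-B):
    // h = 0.01 * (4 / (d + 2))^(1/(d+4)) * |O|^(-1/(d+4)).
    double multivariate_bandwidth(int d, int num_objects) {
        return 0.01 * std::pow(4.0 / (d + 2.0), 1.0 / (d + 4.0))
                    * std::pow(static_cast<double>(num_objects), -1.0 / (d + 4.0));
    }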
C. Estimation of Probability using Normalized pdf

Kwak and Choi [8] took the pdf estimate itself as the probability of the value or vector, which is a very poor approximation, because the pdf can be greater than one. A more accurate approximation is obtained by integrating the pdf over an area surrounding the value, or over a hyperrectangle containing the vector, and taking the result as the probability¹. The Newton-Cotes formulas [26] can be used to calculate the areas, but they cannot be used on the hyperrectangles. Monte Carlo integration can be used on vectors, but the number of random vectors needed to integrate over the hyperrectangle is exponentially large [7].

We propose using the normalized pdf, which is more accurate than taking the pdf as the probability, and more efficient than Monte Carlo integration. The normalized pdf approximates the probability by dividing the pdf of the value or vector by the total pdf of all values or vectors in the domain. As such, the method intuitively follows the concept of probability [6].

Definition 4: (Probability of value x_oat ∈ D(a, t))

\[
\hat{p}(x_{oat}) = \frac{\hat{f}(x_{oat})}{\sum_{x_{o'at} \in D(a,t)} \hat{f}(x_{o'at})} \tag{3}
\]

Definition 5: (Probability of vector v ∈ D(A, T))

\[
\hat{p}(\mathbf{v}) = \frac{\hat{f}(\mathbf{v})}{\sum_{\mathbf{v}' \in D(A,T)} \hat{f}(\mathbf{v}')} \tag{4}
\]

¹The integral at a single value or vector is zero; hence we integrate over an area surrounding the value, or over a hyperrectangle containing the vector.
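A sketch of Definition 4, reusing the kde helper sketched earlier; Eq. 4 is the same computation with the multivariate estimate f̂(v) over D(A, T):

    // Probability of a value by normalized pdf (Eq. 3): its density divided by
    // the total density of all values in the domain, so probabilities sum to 1.
    double normalized_prob(double q, const std::vector<double>& domain, double h) {
        double total = 0.0;
        for (double v : domain) total += kde(v, domain, h);
        return kde(q, domain, h) / total;
    }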
D. Correlation Information

Correlation information is derived from the generalization of mutual information [27], and is defined as follows.

Definition 6: (Correlation information between two vectors v2 and v3) Let v2 = (v1, . . . , vd)ᵀ ∈ D(A, {t2}) and v3 = (w1, . . . , wn)ᵀ ∈ D(A, {t3}).

\[
ci(\mathbf{v}_2, \mathbf{v}_3) = \sum_{i=1}^{d} \sum_{j=1}^{n} \hat{p}(v_{1\ldots i}, w_{1\ldots j}) \log \frac{\hat{p}(v_i, w_j \mid v_{1\ldots i-1}, w_{1\ldots j-1})}{\hat{p}(v_i \mid v_{1\ldots i-1}, w_{1\ldots j-1})\,\hat{p}(w_j \mid v_{1\ldots i-1}, w_{1\ldots j-1})} \tag{5}
\]

For brevity, we denote a sequence of values v1, . . . , vd as v_{1...d}. Note that p̂(x|y) can be calculated by simply expanding p̂(x|y) = p̂(x, y)/p̂(y).
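Applying this expansion to each conditional in the log term of Eq. 5 and cancelling the common factor p̂(v_{1...i−1}, w_{1...j−1}) gives an equivalent form in joint probabilities only; this rewriting is elementary algebra on Definition 6, not an additional formula from the paper:

\[
\log \frac{\hat{p}(v_i, w_j \mid v_{1\ldots i-1}, w_{1\ldots j-1})}{\hat{p}(v_i \mid v_{1\ldots i-1}, w_{1\ldots j-1})\,\hat{p}(w_j \mid v_{1\ldots i-1}, w_{1\ldots j-1})}
= \log \frac{\hat{p}(v_{1\ldots i}, w_{1\ldots j})\,\hat{p}(v_{1\ldots i-1}, w_{1\ldots j-1})}{\hat{p}(v_{1\ldots i}, w_{1\ldots j-1})\,\hat{p}(v_{1\ldots i-1}, w_{1\ldots j})}
\]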
Equation 5 only considers the correlation information between a pair of vectors, and we need to extend it to calculate the correlation information of vectors between different timestamps. Let v1 = (u1, . . . , um)ᵀ ∈ D(A, {t1}). To consider the correlation information between v2 and v3, given prior knowledge of v1, we can extend Eq. 5 as:

\[
\begin{aligned}
ci(\mathbf{v}_2, \mathbf{v}_3 \mid \mathbf{v}_1) &= \sum_{i=1}^{d} \sum_{j=1}^{n} \hat{p}(v_{1\ldots i}, w_{1\ldots j}, u_{1\ldots m}) \log \frac{\hat{p}(v_i, w_j \mid v_{1\ldots i-1}, w_{1\ldots j-1}, u_{1\ldots m})}{\hat{p}(v_i \mid v_{1\ldots i-1}, w_{1\ldots j-1}, u_{1\ldots m})\,\hat{p}(w_j \mid v_{1\ldots i-1}, w_{1\ldots j-1}, u_{1\ldots m})} \\
&= \sum_{i=1}^{d} \sum_{j=1}^{n} \hat{p}(v_{1\ldots i}, w_{1\ldots j}, u_{1\ldots m}) \log \frac{\hat{p}(v_{1\ldots i}, w_{1\ldots j}, u_{1\ldots m})\,\hat{p}(u_{1\ldots m})}{\hat{p}(v_{1\ldots i}, w_{1\ldots j-1}, u_{1\ldots m})\,\hat{p}(v_{1\ldots i-1}, w_{1\ldots j}, u_{1\ldots m})}
\end{aligned} \tag{6}
\]

However, Eq. 6 is not exactly suitable for our problem. The original Eq. 5 measures correlation between a pair of vectors. Hence, using Eq. 6 will penalize sequences of vectors that are correlated across time. To remedy this problem, we amend Eq. 6 to get the following equation, which promotes correlation across time.

Definition 7: (Adjusted correlation information between vectors v2 and v3, given v1)

\[
\widetilde{ci}(\mathbf{v}_2, \mathbf{v}_3 \mid \mathbf{v}_1) = \sum_{i=1}^{d} \sum_{j=1}^{n} \hat{p}(v_{1\ldots i}, w_{1\ldots j}, u_{1\ldots m}) \log \frac{\hat{p}(v_{1\ldots i}, w_{1\ldots j}, u_{1\ldots m})}{\hat{p}(v_{1\ldots i}, w_{1\ldots j-1}, u_{1\ldots m})\,\hat{p}(v_{1\ldots i-1}, w_{1\ldots j}, u_{1\ldots m})} \tag{7}
\]

Proposition 1: c̃i(v2, v3|v1) ≥ ci(v2, v3|v1)

Proof: Let a represent the ratio

\[
a = \frac{\hat{p}(v_{1\ldots i}, w_{1\ldots j}, u_{1\ldots m})}{\hat{p}(v_{1\ldots i}, w_{1\ldots j-1}, u_{1\ldots m})\,\hat{p}(v_{1\ldots i-1}, w_{1\ldots j}, u_{1\ldots m})}
\]

and b represent p̂(v_{1...i}, w_{1...j}, u_{1...m}). For any two real numbers x and y, we know that log(x) < log(y) if 0 < x < y. Since p̂(u_{1...m}) ≤ 1, we have a · p̂(u_{1...m}) ≤ a, and hence b log(a · p̂(u_{1...m})) ≤ b log(a), for all i ∈ {1, . . . , d} and j ∈ {1, . . . , n}. Summing over i and j, no term of Eq. 6 exceeds the corresponding term of Eq. 7, so ci(v2, v3|v1) ≤ c̃i(v2, v3|v1).

According to Proposition 1, the prior knowledge of v1 penalizes ci(v2, v3|v1), even if v1 is highly correlated to v2 and v3, due to the term p̂(u_{1...m}). This penalty does not occur in c̃i(v2, v3|v1), as this term is removed. We use Fig. 2(e) to illustrate our point; it shows 3 slices v1, v2, v3 with p̂(v1) = p̂(v2) = p̂(v3) = p̂(v1, v2, v3) = 0.3. Based on the pairwise correlation concept of Eq. 6, ci(v2, v3|v1) = 0, because given the prior information of v1, we know that the correlation between v2 and v3 is the same as the correlation between v1 and v2, so no new information is gained. However, based on the correlation-across-time concept of Eq. 7, c̃i(v2, v3|v1) = 0.523, because given the prior information of v1, we know that v1, v2 and v3 are correlated, and this is new information gained.

E. Correlated 3D Subspace Cluster (CSC)

We denote D_O(A, {t}) = O × A × {t} as the domain of the set of attributes A at time t, projected on the set of objects O.

Definition 8 (Correlated 3D subspace cluster (CSC)): Sub-cuboid C = O × A × T is a CSC if the adjusted correlation information of C,

\[
\widetilde{ci}(C) = \sum_{i \in T}\;\sum_{\mathbf{v}_1 \in D_O(A,\{1\}),\,\ldots,\,\mathbf{v}_i \in D_O(A,\{i\})} \widetilde{ci}(\mathbf{v}_i, \mathbf{v}_{i-1} \mid \mathbf{v}_{1,\ldots,i-2}) \tag{8}
\]

is high.

Intuitively, a sub-cuboid C is a CSC if (1) for each time frame O × A × {t} of C, its values are correlated, and (2) each pair of contiguous time frames O × A × {t} and O × A × {t + 1} of C is correlated, given the prior time frames. We empirically show that high c̃i(C) leads to significant 3D subspace clusters. We embedded a significant 3D subspace cluster having 15 objects, 3 attributes and 5 timestamps in a synthetic 3D continuous-valued dataset of 100 objects, 10 attributes and 10 timestamps, exhaustively mined all 3D subspace clusters and calculated their c̃i(C). Figure 3(a) shows the results of using the mined clusters to recover the embedded cluster. The results are measured by significance, which is defined in Section V-A. We can see that mined clusters with high c̃i(C) lead to exact discovery of the embedded cluster.

Details of determining how high c̃i(C) must be to be considered significant are given in the next section. We only mine maximal CSCs, to remove redundancies in the CSCs. A CSC C = O × A × T is maximal when there does not exist another CSC C′ = O′ × A′ × T′ such that O ⊆ O′, A ⊆ A′ and T ⊆ T′.

IV. ALGORITHM FOR MINING CSCS (MIC)

We present the algorithm MIC to mine CSCs from a continuous-valued cuboid D, with its framework shown in Fig. 3(b). MIC consists of two parts:
1) Generating seeds. From D, we obtain pairs of values (x_oat, x_oat+1) that have significantly high correlation information, which we denote as seeds.
2) Mining CSCs. The seeds are used as building blocks for CSCs.

A. Generating seeds

For each pair of values (x_oat, x_oat+1), with o ∈ 𝒪, a ∈ 𝒜, t ∈ 𝒯, we calculate the adjusted correlation information c̃i(x_oat, x_oat+1). Although adjusted correlation information is defined between vectors, we can simply represent a pair of values as a pair of vectors (each vector containing one value) and use the same formula. For conciseness, we denote c̃i(x_oat, x_oat+1) as ci. Let the set of positive adjusted correlation information values of pairs of values be denoted as CI = {ci | ci > 0, o ∈ 𝒪, a ∈ 𝒜, t ∈ 𝒯}.

We propose that the significance of a seed is determined by the rarity of its correlation information, which can be calculated using statistics. Let us assume that we have the null hypothesis “A sample ci is equal to the mean of CI”, and let the probability of having ci be p-value(ci), assuming the null hypothesis holds. A very low p-value(ci) means that it is very rare to have a seed with such high correlation information, and we deem a pair of values to be a seed if its ci is statistically significant, i.e., p-value(ci) ≤ α, where α is a probability threshold. Therefore, the set of seeds is

seeds = {(x_oat, x_oat+1) | p-value(ci) ≤ α, o ∈ 𝒪, a ∈ 𝒜, t ∈ 𝒯}

We set a default of α = 1.0E−4, which we show in our experiments to work well in practice. However, the user can also set his preferred α.

We now explain how we derive p-value(ci). We first need to model the probability distribution of CI. Given that (1) the values in CI are continuous, (2) they are positive, and (3) the probability distribution of CI is unknown and dependent on the data D, either the gamma or the Weibull distribution is a suitable candidate. Both offer the flexibility of modeling any continuous and positive distribution, as the scale and shape of the distribution can be adjusted by their two parameters [28]. Hence, using either one will result in obtaining the same quality of clusters, but we adopt the gamma distribution for computational efficiency's sake, as the maximum likelihood estimation (MLE) of its parameters converges much faster [29].

Let CI be gamma-distributed with shape parameter k and scale parameter θ, CI ∼ Γ(k, θ). The pdf of the gamma distribution is

\[
f(ci; k, \theta) = \frac{ci^{k-1}}{\Gamma(k)\,\theta^{k}} \exp\!\left(-\frac{ci}{\theta}\right),
\]

where ci ∈ CI and Γ(k) = ∫₀^∞ t^(k−1) e^(−t) dt is the gamma function.

Figure 3. (a) Significance of the mined 3D subspace clusters with varying adjusted correlation information in recovering the embedded cluster; the x-axis (adjusted correlation information) is normalized from 0 to 1, and the y-axis is the significance. (b) Framework of Algorithm MIC: from the 3D continuous-valued data, calculate the ci of all pairs of values (x_oat, x_oat+1); model the probability distribution of ci; take the pairs of values whose p-value(ci) ≤ α as seeds; extend each seed with other seeds to create CSCs.

After obtaining the estimated parameters k̃, θ̃ using MLE [30], we use the gamma distribution to model CI. We then proceed to calculate p-value(ci) using the cumulative distribution function (cdf) of the gamma distribution,

\[
\text{p-value}(ci) = \frac{1}{\Gamma(\tilde{k})\,\tilde{\theta}^{\tilde{k}}} \int_{0}^{ci} t^{\tilde{k}-1} e^{-t/\tilde{\theta}}\, dt \tag{9}
\]

which can be efficiently calculated using the Newton-Raphson method [26].
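As an illustration of the seed test, here is a minimal C++ sketch; the series expansion of the regularized lower incomplete gamma is one standard alternative to the Newton-Raphson route mentioned above, and the fitted parameters k and theta are assumed to come from the MLE step [30].

    #include <cmath>

    // Regularized lower incomplete gamma P(a, x), by its standard series
    // expansion (adequate here; other numerical routes work equally well).
    double reg_lower_gamma(double a, double x) {
        if (x <= 0.0) return 0.0;
        double term = 1.0 / a, sum = term;
        for (int n = 1; n < 200; ++n) {
            term *= x / (a + n);
            sum += term;
            if (term < sum * 1e-12) break;
        }
        return sum * std::exp(-x + a * std::log(x) - std::lgamma(a));
    }

    // Eq. 9: p-value(ci) is the gamma cdf at ci, with fitted shape k, scale theta.
    double p_value(double ci, double k, double theta) {
        return reg_lower_gamma(k, ci / theta);
    }

    // A pair of values becomes a seed when its ci is rare at level alpha.
    // Note: Eq. 9 as printed is the lower tail; for "rare because ci is high",
    // one would test the upper tail 1 - cdf instead. We follow the printed form.
    bool is_seed(double ci, double k, double theta, double alpha = 1.0e-4) {
        return p_value(ci, k, theta) <= alpha;
    }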
B. Mining CSCs

Algorithm 1 presents the function growSeeds, which uses the generated seeds as building blocks for CSCs. The general idea is that each seed is considered as an initial cluster, and we greedily ‘grow’ this initial cluster by extending it with other seeds. This growth is guided by maximizing the correlation information of the cluster. Figure 4 shows the workflow of function growSeeds. Although we use a greedy approach, we show in Section V that the quality of our clusters is high across different experiments.

Algorithm 1 growSeeds
Input: seeds (building blocks for CSCs)
Output: maximal CSCs
1: prune seeds;
2: for all seed in seeds do
3:   initialize seed as a cluster C = O × A × T;
4:   while C can still be extended do
5:     ci_att ← max_{a′ ∈ 𝒜\A} c̃i(extend(C, a′));
6:     t′ ← the next time where extend(C, t′) is valid;
7:     ci_time ← c̃i(extend(C, t′));
8:     if ci_att > ci_time then
9:       C ← extend(C, a′);
10:    else
11:      C ← extend(C, t′);
12:    end if
13:  end while
14:  add pruned seeds to C; output C as a CSC;
15: end for

As we are dealing with continuous-valued data, it is possible to have seeds whose values are highly similar and concentrated in certain value ranges. Hence, we remove these duplicates to improve the algorithm's efficiency. Let there be two values x and z in D such that x ≤ z. They are denoted as neighbors if there does not exist another value y in D such that x ≤ y ≤ z. Let there be two seeds (x_oat, x_oat+1) and (x_o′at, x_o′at+1). They are considered as duplicates if and only if (1) x_oat and x_o′at are neighbors, and (2) x_oat+1 and x_o′at+1 are neighbors. Of duplicate seeds, the one with the lower correlation information is pruned, but the pruned seeds are kept for later usage.

In line 3 of Algorithm 1, a seed (x_oat, x_oat+1) is initialized as a cluster C = {o} × {a} × {t, t + 1}. Cluster C is iteratively extended, either by an attribute a or by a time t, until no more extensions are valid. The choice depends on which extension gives the higher correlation information. The extension is illustrated in the middle diagram of Fig. 4.

In function extend(C, a′), cluster C is extended by a set of seeds whose values belong to attribute a′ ∈ 𝒜\A. These seeds are selected from {(x_oa′t, x_oa′t+1) | o ∈ 𝒪, a′ ∈ 𝒜\A, t, t + 1 ∈ T}, which we denote as candidates. An extension by a′ is valid if, for each pair of timestamps t, t + 1 ∈ T, C can be extended with the candidate that gives the largest increase in c̃i(C). The extension by attribute a′ is invalid if there is no increase of c̃i(C), or if there are no candidates in some pair of timestamps t, t + 1 ∈ T. A new cluster is created when the extension is valid.

Similarly, in function extend(C, t′), cluster C is extended by a set of seeds whose values belong to time t′, with the seeds selected from the candidates {(x_oat′, x_oat′+1) | o ∈ 𝒪, a ∈ A}. An extension by t′ is valid if, for each attribute a ∈ A, cluster C can be extended with the candidate that gives the largest increase in c̃i(C). The extension by time t′ is invalid if there is no increase of c̃i(C), or if there are no candidates of some attribute a ∈ A. A new cluster is created
476
Figure 4. Framework of function growSeeds. Each seed is initialized as a CSC and is iteratively extended: the seeds are used as building blocks for CSCs; a CSC is iteratively extended by attribute a or timestamp t; at the end of the seed extension, the pruned seeds are merged into the CSC to obtain the final CSC.

when the extension is valid. We select t′ ← t_{|T|−1} as the time for the extension and check whether the extension is valid. If not, the time for the extension is iteratively incremented until there is a valid extension.

Note that there is no need to directly extend the set of objects of cluster C, as the extension of the set of objects is induced during the extension by attribute or time. After cluster C is fully extended, we add the seeds pruned earlier (Algorithm 1, line 1) to C to further extend it. Pruned seeds can be added to C if they are neighbors of the seeds used to build C. After all CSCs are mined, we do a simple post-processing to output the maximal CSCs.
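A sketch of the neighbor-based duplicate pruning used in line 1 of Algorithm 1, assuming sorted_vals holds the sorted distinct values of D; the Seed struct and the function names are illustrative, not the authors' code.

    #include <algorithm>
    #include <vector>

    struct Seed { double x_t, x_t1, ci; };  // (x_oat, x_oat+1) and its adjusted ci

    // Two values are neighbors if no other value of D lies strictly between
    // them; over the sorted distinct values, that is adjacency (or equality).
    bool neighbors(const std::vector<double>& sorted_vals, double x, double z) {
        auto lo = std::lower_bound(sorted_vals.begin(), sorted_vals.end(), std::min(x, z));
        auto hi = std::lower_bound(sorted_vals.begin(), sorted_vals.end(), std::max(x, z));
        return hi - lo <= 1;  // nothing strictly between x and z
    }

    // Keep, among duplicate seeds, only the one with the highest ci; the pruned
    // seeds would be retained separately for the merge step of growSeeds.
    std::vector<Seed> prune_duplicates(std::vector<Seed> seeds,
                                       const std::vector<double>& sorted_vals) {
        std::sort(seeds.begin(), seeds.end(),
                  [](const Seed& a, const Seed& b) { return a.ci > b.ci; });
        std::vector<Seed> kept;
        for (const Seed& s : seeds) {
            bool dup = false;
            for (const Seed& k : kept)
                dup = dup || (neighbors(sorted_vals, s.x_t, k.x_t) &&
                              neighbors(sorted_vals, s.x_t1, k.x_t1));
            if (!dup) kept.push_back(s);
        }
        return kept;
    }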
C. Time Complexity of MIC

We discuss the worst-case time complexity of MIC. Let n be the number of values in the cuboid D. The seed generation phase has a time complexity of O(n²), as we calculate the pairwise correlation information between values. Let s be the number of seeds generated. In the worst case, each seed is extended s − 1 times, and the upper bound of the number of values involved in each extension is n. Hence the worst-case time complexity of growSeeds is O(ns²). In total, the worst-case complexity of MIC is O(n² + ns²). Since the size of the data is not within our control, limiting the number of seeds can improve the efficiency of MIC, which is achievable by decreasing α.

V. EXPERIMENTS & ANALYSES

We conducted three main experiments to validate our approach. First, we embedded significant 3D subspace clusters of various characteristics in a large number of synthetic datasets and checked whether different algorithms are able to mine them. Second, we evaluated the efficiency and scalability of the algorithm MIC. Third, we performed a financial data mining case study by mining 3D subspace clusters from real-world stock-financial ratio-year datasets and investigating the usefulness of the different types of 3D subspace clusters.

All programs were coded in C++, and the codes for the competing techniques were kindly provided by their respective authors. The experiments were performed in a Windows 7 environment, using an Intel Core 2 Quad 3.0 GHz CPU with 8 GB RAM. As TRICLUSTER could only be run on Unix, we evaluated it on an AMD Opteron, a powerful server with 1024 2.5 GHz CPUs and 4096 GB RAM.

A. Quality of 3D Subspace Clusters

We investigated the quality of the 3D subspace clusters mined by various algorithms. We created synthetic 3D continuous-valued datasets D = 𝒪 × 𝒜 × 𝒯, each having 1000 objects, 10 attributes and 10 timeframes, with values ranging from −1 to 1. In each dataset D, we embedded 10 random 3D subspace clusters, each having 10–20 objects being similar in 2–4 attributes, across 4–6 timeframes. In each time frame of a cluster, we set a maximum difference (denoted as diff) between its objects' values on each of its attributes. This ensures that the cluster is homogeneous; we varied diff from 0 to 0.1. In order to have a thorough and fair experiment, we created 10 datasets for each diff setting, resulting in a grand total of 110 synthetic datasets.

Let C∗ be the set of embedded 3D subspace clusters, and C∗ = O∗ × A∗ × T∗ be one such embedded cluster. Similarly, let C be the set of mined 3D subspace clusters, and C = O × A × T be one such mined cluster. We use the following quality metrics to measure the closeness of C to C∗ [31].
• Recoverability: re(C∗) = max_{C∈C} Σ_{t∈T∗} r(S∗)/|S∗|, where r(S∗) = max{|S∗ ∩ S| such that S∗ = O∗ × A∗ × {t} ⊂ C∗, S = O × A × {t} ⊂ C}. Recoverability measures the ability of C to recover C∗.
• Spuriousness: sp(C) = min_{C∗∈C∗} Σ_{t∈T} s(S)/|S|, where s(S) = |S| − max{|S∗ ∩ S| such that S∗ = O∗ × A∗ × {t} ⊂ C∗, S = O × A × {t} ⊂ C}. Spuriousness measures how spurious C is.
• Significance = 2 Re (1 − Sp) / (Re + (1 − Sp)), where Re = Σ_{C∗∈C∗} re(C∗) and Sp = Σ_{C∈C} sp(C). Significance is a measure to find the best trade-off between recoverability and spuriousness. The higher the Significance, the more similar the mined clusters are to

the embedded clusters. In an ideal case, Recoverability is 1 and Spuriousness is 0.
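The combination rule for Significance is an F1-style harmonic mean of recoverability and non-spuriousness; as a one-function sketch:

    // Significance (Section V-A): harmonic-mean trade-off between the total
    // recoverability Re and the total non-spuriousness (1 - Sp).
    double significance(double re, double sp) {
        const double ns = 1.0 - sp;
        return (re + ns > 0.0) ? 2.0 * re * ns / (re + ns) : 0.0;
    }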
We compared our algorithm MIC with the parameter-laden 3D and 2D subspace clustering algorithms TRICLUSTER [1] and MaxnCluster [12] (denoted as MNC in the graphs), and with the parameter-light 2D subspace clustering algorithm STATPC [3]. For the 2D algorithms, we mined subspace clusters in each time frame, intersected all combinations of the subspace clusters to form 3D subspace clusters, and output the 3D subspace clusters that are maximal.

The parameter-laden algorithms have two main types of parameters: the minimum size of the clusters, and similarity functions. Varying all parameter settings (TRICLUSTER has 7 parameters, MaxnCluster has 4 parameters) for the experiments is practically impossible. Hence, we gave them an unfair advantage by letting them have prior knowledge of the size of the embedded clusters, and we just varied their similarity functions. For TRICLUSTER and MaxnCluster, we varied their similarity function parameters ε and δ respectively from 0.05 to 0.15, and kept their other parameters at default settings. For STATPC, we used its default settings. For MIC, we set the p-value threshold at 1.0E−4 and 1.0E−5.

Figure 5 presents the average recoverability, spuriousness and significance of the algorithms on the 110 datasets across varying diff. We can see that MIC has the highest significance and recoverability across varying diff. Even with one order of magnitude difference in the p-value threshold, the results are still highly similar, which shows that MIC is parameter-insensitive, and fine tuning of the p-value threshold is not required. Although MIC does not have prior knowledge of the actual size of the embedded clusters, while MaxnCluster and TRICLUSTER have this advantageous knowledge, MIC is still able to outperform them. For MaxnCluster, its best results depend on δ; for embedded clusters with higher diff, δ = 0.1 gives the best results, and for embedded clusters with lower diff, δ = 0.05 gives the best results. This shows that unless the user knows exactly what he wants, or what the actual clusters in the data are (in which case there is no need for clustering), setting the right parameters to obtain the correct result is a guessing game for parameter-laden algorithms. The results of TRICLUSTER are not shown, as it has scalability issues; it could not complete execution even after 24 hours on the powerful server. STATPC could not find most of the embedded clusters.

B. Efficiency and Scalability Analysis

We analyzed the efficiency of MIC in two main areas: the size of the dataset D, and the p-value threshold parameter. Unless specifically stated, the default dataset D is a synthetic 3D continuous-valued dataset which contains 1000 objects, 10 attributes and 10 timestamps.

1) Size of the dataset D: We investigated the scalability of MIC with respect to the number of objects, attributes and timestamps of dataset D. We set the number of seeds to be 100 in this experiment (we will discuss how the efficiency of MIC is affected by the number of seeds in the next paragraph). Figure 6(a) presents the running times for varying numbers of attributes and timestamps in dataset D, and Figure 6(b) presents the running time for varying numbers of objects in dataset D. Although MIC uses computationally intensive techniques such as kernel density estimation and correlation information, it is still scalable when the attributes and timestamps are in the tens and the objects are in the thousands. Hence, MIC is most suitable for medium-sized data, such as financial ratio data and microarray (gene expression) data. In fact, MIC can complete the experiments described in Sections V-A and V-C, whereas TRICLUSTER, the only other axis-parallel 3D subspace clustering algorithm, fails to do so.

2) Parameter α: The p-value threshold parameter α determines the number of seeds generated in MIC – a lower α results in fewer seeds. Figure 6(c) presents the running time for varying α. We can see that it is important to keep α small for a fast running time. Besides, keeping α small is also necessary for yielding CSCs with high correlation information.

C. Case Study on Stock Market Data

Fundamental investors analyze stock-financial ratio-time data to pick stocks, as they believe that financial ratios are indicators of future stock price movements [32]. This relation between stock prices and financial ratios can be studied by finding groups of high-performance stocks with similar financial ratios across years, which can be represented as 3D subspace clusters. In this experiment, we investigated the effectiveness of using significant 3D subspace clusters in stock selection. We mined 3D subspace clusters from training data of stocks having constantly high growth in their prices, and in the testing data, we bought stocks that contain the financial ratios' values of these 3D subspace clusters. The price returns of these stocks were then used to gauge the effectiveness of the 3D subspace clusters.

We downloaded financial figures of North American stocks from year 1980 to 2000 from Compustat [33]. We converted these financial figures into 30 financial ratios, based on the ratio formulas from Investopedia [34]. Stocks whose prices are less than USD$5 were removed from the data, as they are prone to manipulation and their financial figures are less transparent [35]. We used stocks with a compound annual growth rate (CAGR) of at least 10% from year 1980 to 1989 as the training data. For the testing data, we used all stocks from year 1990 to 2000. Thus, the training data contains 231 stocks and the testing data contains 8406 stocks.

In the training phase, we set the p-value threshold to 10⁻⁴ and 10⁻⁵ for MIC. For STATPC, we used its default settings. For the parameter-laden algorithms, it is impossible to try all combinations of their parameters. For both TRICLUSTER

Figure 5. Quality of the 3D subspace clusters mined by different algorithms across the 110 synthetic datasets: (a) Recoverability, (b) Spuriousness and (c) Significance versus diff, for MNC with δ = 0.05, 0.1, 0.15, STATPC, and MIC with p-value thresholds 1.0E−4 and 1.0E−5. Each synthetic dataset is embedded with clusters having varying homogeneity. Although the parameter-laden algorithms are informed of the size of the embedded clusters prior to mining, MIC still outperforms them. The results of TRICLUSTER are not shown as it could not complete execution even after 24 hours.

Figure 6. The running time of MIC (in seconds, log scale) across different settings: (a) varying number of attributes and timestamps, (b) varying number of objects, (c) varying α.

and MaxnCluster, we fixed the minimum attributes and timestamps to 2 and 4 respectively, varied the minimum stocks from 50–100 with increments of 10, and varied the similarity setting from 0–0.15 in increments of 0.05. So, we had 24 settings for each algorithm.

In the testing data from year 1990 to 2000, we bought a stock if more than one 3D subspace cluster covered it. Let us assume that a 3D subspace cluster has a set of years T. We deem that a 3D subspace cluster covers a stock if there exists a set of years T′ for the stock such that there is a one-to-one correspondence between T and T′: for all k ∈ {1, . . . , |T|}, the financial ratios' values of the stock in year j_k ∈ T′ are within the financial ratios' ranges of the 3D subspace cluster in year i_k ∈ T. We bought the stock in the last year of T′, and to evaluate the stock, we used the selling method proposed by Graham [36]. In this method, the stock is sold after two years, or as soon as its price appreciates by 50% within the two years. If we had bought all stocks in the testing data, we would have obtained an average return of 27.5%, which is the baseline of this experiment.
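A sketch of this selling rule, assuming one price observation per month after buying; the monthly granularity is our assumption, as the paper only states the two-year/50% rule.

    #include <algorithm>
    #include <vector>

    // Return of one stock under Graham's selling rule [36]: sell as soon as the
    // price has appreciated by 50% within two years, else sell at the two-year
    // mark. prices[0] is the buying price; with monthly prices, horizon = 24.
    double graham_return(const std::vector<double>& prices, int horizon = 24) {
        const double buy = prices[0];
        const int last = std::min<int>(horizon, static_cast<int>(prices.size()) - 1);
        for (int i = 1; i <= last; ++i)
            if (prices[i] >= 1.5 * buy) return (prices[i] - buy) / buy;
        return (prices[last] - buy) / buy;
    }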
Figure 7(a) shows the average returns, with 95% confidence intervals, of the stocks bought based on the different types of 3D subspace clusters, along with the baseline. Unfortunately, TRICLUSTER could not complete execution, even after 24 hours, under any of the parameter settings. We can see that the average return of the stocks selected by MIC is the highest, and even its 95% confidence interval is substantially higher than the baseline. For STATPC, the average return of its stocks is much lower than the baseline. For MaxnCluster, the average return of its stocks is above the baseline, but lower than MIC's.

Let us assume that we have the following null hypothesis: “The average return of a stock using Graham's selling method is 27.5%”. Based on this null hypothesis, we calculated the p-values of the results shown in Fig. 7(a), and present them in Fig. 7(b). As the p-values of MIC and MaxnCluster are extremely small, we can conclude that their results are statistically significant. On the other hand, the result of STATPC is statistically insignificant. In summary, MIC is able to select stocks whose average return is statistically higher than the baseline and than that of the stocks selected by other algorithms.

VI. CONCLUSION

We proposed mining CSCs, which are 3D subspace clusters that are intrinsically prominent or significant in 3D continuous-valued data. They are clusters that stand out in the data, and not manifestations of the bias and prejudices of the user. We developed an algorithm MIC, which uses a hybrid of information theory and statistical techniques to mine CSCs. In certain situations where the user's

presumption is needed in the mining of CSCs, we allow the user to set a parameter which controls the number of CSCs to be mined. Its default setting is also shown to work well in practice. In our experiments, we showed that CSCs are significant 3D subspace clusters in a wide range of synthetic datasets, and that from real-world stock-financial ratio-year datasets, higher profits can be generated using CSCs, compared to 3D subspace clusters mined by other algorithms. We also showed that MIC is scalable to medium-sized data, which are common in the financial and biological domains.

Figure 7. (a) Average returns with 95% confidence intervals of the stocks bought using different 3D subspace clusters (Baseline, STATPC, MNC, MIC), compared with the baseline. (b) p-values of the average returns.

REFERENCES

[1] L. Zhao and M. J. Zaki, “TRICLUSTER: An effective algorithm for mining coherent clusters in 3D microarray data,” in SIGMOD, 2005, pp. 694–705.
[2] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, “Dynamic itemset counting and implication rules for market basket data,” in SIGMOD, 1997, pp. 255–264.
[3] G. Moise and J. Sander, “Finding non-redundant, statistically significant regions in high dimensional data: A novel approach to projected and subspace clustering,” in KDD, 2008, pp. 533–541.
[4] K. Sim, V. Gopalkrishnan, H. N. Chua, and S.-K. Ng, “MACs: Multi-attribute co-clusters with high correlation information,” in ECML/PKDD (2), 2009, pp. 398–413.
[5] E. Parzen, “On estimation of a probability density function and mode,” Ann. Math. Stat., vol. 33, no. 3, pp. 1065–1076, 1962.
[6] R. R. Yager, “On the instantiation of possibility distributions,” Fuzzy Sets and Systems, vol. 128, no. 2, pp. 261–266, 2002.
[7] W. J. Morokoff and R. E. Caflisch, “Quasi-Monte Carlo integration,” J. Comput. Phys., vol. 122, pp. 218–230, 1995.
[8] N. Kwak and C.-H. Choi, “Input feature selection by mutual information based on Parzen window,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 12, pp. 1667–1671, 2002.
[9] E. J. Keogh, S. Lonardi, and C. A. Ratanamahatana, “Towards parameter-free data mining,” in KDD, 2004, pp. 206–215.
[10] Y. Cheng and G. M. Church, “Biclustering of expression data,” in ISMB, 2000, pp. 93–103.
[11] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, “Automatic subspace clustering of high dimensional data for data mining applications,” in SIGMOD, 1998, pp. 94–105.
[12] G. Liu, K. Sim, J. Li, and L. Wong, “Efficient mining of distance-based subspace clusters,” Stat. Anal. Data Min., vol. 2, no. 5-6, pp. 427–444, 2009.
[13] P. Kröger, H.-P. Kriegel, and K. Kailing, “Density-connected subspace clustering for high-dimensional data,” in SDM, 2004, pp. 246–257.
[14] D. Wang, C. H. Q. Ding, and T. Li, “K-subspace clustering,” in ECML/PKDD (2), 2009, pp. 506–521.
[15] D. Jiang, J. Pei, M. Ramanathan, C. Tang, and A. Zhang, “Mining coherent gene clusters from gene-sample-time microarray data,” in KDD, 2004, pp. 430–439.
[16] X. Xu, Y. Lu, K.-L. Tan, and A. K. H. Tung, “Finding time-lagged 3D clusters,” in ICDE, 2009, pp. 445–456.
[17] L. Ji, K.-L. Tan, and A. K. H. Tung, “Mining frequent closed cubes in 3D datasets,” in VLDB, 2006, pp. 811–822.
[18] L. Cerf, J. Besson, C. Robardet, and J.-F. Boulicaut, “Data-Peeler: Constraint-based closed pattern mining in n-ary relations,” in SDM, 2008, pp. 37–48.
[19] I. S. Dhillon, S. Mallela, and D. S. Modha, “Information-theoretic co-clustering,” in KDD, 2003, pp. 89–98.
[20] H. Huang, C. Ding, D. Luo, and T. Li, “Simultaneous tensor subspace selection and clustering: The equivalence of high order SVD and k-means clustering,” in KDD, 2008, pp. 327–335.
[21] D. Chakrabarti, “AutoPart: Parameter-free graph partitioning and outlier detection,” in PKDD, 2004, pp. 112–124.
[22] C. Böhm, C. Faloutsos, J.-Y. Pan, and C. Plant, “Robust information-theoretic clustering,” in KDD, 2006.
[23] J. Sun, D. Tao, and C. Faloutsos, “Beyond streams and graphs: Dynamic tensor analysis,” in KDD, 2006, pp. 374–383.
[24] A. Jakulin and I. Bratko, “Analyzing attribute dependencies,” in PKDD, 2003, pp. 229–240.
[25] B. W. Silverman, Density Estimation for Statistics and Data Analysis. Chapman and Hall/CRC, 1986.
[26] R. L. Burden and J. D. Faires, Numerical Analysis, 9th ed. Cengage Learning, 2010.
[27] T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley-Interscience, 1991.
[28] N. L. Johnson, S. Kotz, and N. Balakrishnan, Continuous Univariate Distributions, Vol. 1, 2nd ed. Wiley-Interscience, 1994.
[29] L. J. Bain and M. Engelhardt, Statistical Analysis of Reliability and Life-Testing Models, 2nd ed. CRC Press, 1991.
[30] T. P. Minka, “Estimating a Gamma distribution,” 2002.
[31] R. Gupta, G. Fang, B. Field, M. Steinbach, and V. Kumar, “Quantitative evaluation of approximate frequent pattern mining algorithms,” in KDD, 2008, pp. 301–309.
[32] B. Graham, The Intelligent Investor: A Book of Practical Counsel. Harper Collins Publishers, 1986.
[33] Compustat, http://www.compustat.com [Last accessed 2009].
[34] Investopedia, http://www.investopedia.com/university/ratios/ [Last accessed 2009].
[35] U.S. Securities and Exchange Commission, “Microcap stock: A guide for investors,” http://www.sec.gov/investor/pubs/microcapstock.htm, 2009.
[36] H. R. Oppenheimer, “A test of Ben Graham's stock selection criteria,” Finan. Anal. J., vol. 40, no. 5, pp. 68–74, 1984.
