An overview of clustering methods for geo-referenced time series: from one-way clustering to co- and tri-clustering
To cite this article: Xiaojing Wu, Changxiu Cheng, Raul Zurita-Milla & Changqing Song (2020):
An overview of clustering methods for geo-referenced time series: from one-way clustering
to co- and tri-clustering, International Journal of Geographical Information Science, DOI:
10.1080/13658816.2020.1726922
REVIEW ARTICLE
1. Introduction
Advances in data collection and sharing techniques have led to significant growth in spatio-temporal datasets. Novel approaches to pattern mining and knowledge extraction are therefore required for such large datasets (Miller and Han 2009, Cheng
et al. 2014, Shekhar et al. 2015). Geo-referenced time series (GTS), a type of spatio-
temporal data, record time-changing values of one or more observed attributes at fixed
locations and consistent time intervals (Kisilevich et al. 2010). GTS are common in real-world applications; an example is hourly PM2.5 concentrations observed at a network of ground monitoring stations. Moreover, sequences of images, e.g., satellite image time series, can also be treated as GTS.
Clustering is a data mining task that identifies similar data elements and groups them
together. As a result, data elements in each group, or cluster, are similar to each other and
dissimilar to those in other groups (Berkhin 2006, Han et al. 2012). This allows an overview
of datasets at the cluster level and also provides insights into details by focusing on
a single cluster (Andrienko et al. 2009). Thus, clustering is useful for extracting patterns
from spatio-temporal datasets.
As mentioned in previous studies (Zhao and Zaki 2005, Henriques and Madeira 2018, Wu
et al. 2018), clustering methods for GTS can be classified as one-way clustering, co-clustering
and tri-clustering methods depending on the dimensions involved in the analysis. In such
a classification, one-way clustering methods, also termed traditional clustering, identify
clusters in one of the dimensions of 2D datasets based on the similarity of data elements
along the other dimension (Dhillon et al. 2003, Zhao and Zaki 2005). Also analyzing 2D
datasets, co-clustering methods identify (co-)clusters along both spatial and temporal
dimensions based on the similarity of data elements along these two dimensions
(Banerjee et al. 2007, Wu et al. 2020a). Tri-clustering methods identify (tri-)clusters in 3D datasets based on the similarity of data elements along the spatial, temporal and a third (e.g., attribute) dimension (Wu et al. 2018, Henriques and Madeira 2018). However, a systematic
description of clustering methods for GTS from this perspective has not yet been reported.
In addition, given the variety of available methods, selecting an appropriate clustering method for the task at hand is an important issue (Grubesic et al. 2014). Similar issues arise when choosing clustering methods for GTS, and we aim to provide
suggestions for selecting suitable methods. To achieve this, we define a taxonomy of
clustering-related geographical questions, compare clustering methods in the above
classification by answering these questions using representative algorithms and a case
study dataset, and provide suggestions for selecting suitable methods.
Thus, the objective of this study is to provide two important and unique perspectives
on clustering methods for GTS. First, we provide an overview of clustering methods for
GTS using the classification outlined above. Thereafter, we compare different clustering
methods by answering clustering-related geographical questions and provide sugges-
tions on selecting suitable methods.
The structure of this paper is as follows: First, we describe the types of GTS and define
clustering-related questions in Section 2. Thereafter, we systematically describe the
clustering methods for GTS in the classification in Section 3. In Section 4, our case study
dataset and representative algorithms are described. Clustering results are interpreted
and the algorithms are compared in Section 5. Finally, we discuss the results in Section 6
and draw conclusions in Section 7.
nested hierarchies in either spatial or temporal dimension (e.g., day and hour in the case
of time), then GTS also includes the single attribute GTS with nested hierarchies in the
spatial dimension (abbreviated to GTS-Ss, where the S after the hyphen indicates the spatial
dimension, and s indicates the plural form) and one with nested hierarchies in the temporal
dimension (abbreviated as GTS-Ts, where T after the hyphen indicates the temporal dimen-
sion, and s indicates the plural form). GTS with more complex structures, e.g., with multiple
attributes and nested spatial and temporal dimensions, are beyond the scope of this paper and are not discussed further – also because they require the development of new clustering methods.
With two dimensions, GTS-A are 2D GTS and typically organized into a data table where
rows are locations, columns are timestamps in which the attribute is observed, and
elements of the table are values of the attribute (Figure 1(a)); for example, hourly PM2.5
concentrations recorded at monitoring stations. With three dimensions, GTS-As, GTS-Ss
and GTS-Ts are 3D GTS, and any of them can be organized into a data cuboid with rows,
columns and depths as its three dimensions. Take GTS-As for instance: rows are locations, columns are timestamps, depths are attributes, and elements are values of the attributes observed at the corresponding locations and timestamps (Figure 1(b)); for example, hourly PM2.5, PM10, NO2 and CO values recorded at monitoring stations.
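These layouts can be written directly as arrays (a minimal illustration of our own; the sizes and attribute names follow the examples above):

```python
import numpy as np

# GTS-A: a 2D table, rows = locations, columns = timestamps,
# e.g. hourly PM2.5 concentrations at 18 stations over 24 hours.
stations, hours = 18, 24
gts_a = np.random.rand(stations, hours)       # elements: PM2.5 values

# GTS-As: a 3D cuboid, rows = locations, columns = timestamps,
# depths = attributes (e.g. PM2.5, PM10, NO2 and CO).
attributes = ["PM2.5", "PM10", "NO2", "CO"]
gts_as = np.random.rand(stations, hours, len(attributes))

# A single attribute's GTS-A is one depth slice of the cuboid:
pm25_slice = gts_as[:, :, attributes.index("PM2.5")]
assert pm25_slice.shape == gts_a.shape        # both are 18 x 24
```

One-way clustering operates on one dimension of the 2D table, co-clustering on both of its dimensions, and tri-clustering on all three dimensions of the cuboid.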
In addition to the data characteristics, the other important factor in selecting clustering methods is the questions researchers are interested in answering (Andrienko and Andrienko 2006). According to the triad framework developed by Peuquet (left of
Figure 2), three types of questions can be structured for GTS concerning the three
components: (1) where (space) + when (time) → what (attribute); (2) when + what →
where; (3) where + what → when (Peuquet 1994). For these questions, two reading levels
Figure 1. Various formats of GTS under different situations. (a) single attribute GTS (GTS-A); (b)
multiple attributes GTS (GTS-As); (c) single attribute GTS with nested hierarchies in spatial dimension
(GTS-Ss); (d) single attribute GTS with nested hierarchies in temporal dimension (GTS-Ts).
Figure 3. Partitional clustering methods for GTS: (a&b1&b2) one-way clustering, (a&c) co-clustering
and (d&e) tri-clustering.
2018, Wu et al. 2020b). However, the majority of previous studies focused on other fields,
especially bioinformatics (Eren et al. 2012), with only a few recent studies focusing on
spatio-temporal data (Wu et al. 2017). To ensure our overview is comprehensive, co-
clustering methods used in other fields are also mentioned here.
Regarding partitional co-clustering methods, Dhillon et al. (2003) proposed the informa-
tion theoretic co-clustering (ITCC) algorithm for simultaneous word-document clustering.
With an initial random mapping from words to word-clusters and document to document-
clusters, ITCC regards the co-clustering issue as the optimization process in information
theory and formulates the objective function as the loss of mutual information between the
original variables (word and document) and the clustered ones (word-clusters and docu-
ment-clusters). Then, it optimizes the objective function by reassigning words and docu-
ments to word-clusters and document-clusters until convergence is achieved. Cho et al.
(2004), who aimed to analyze gene expression data, developed a co-clustering algorithm by
using the minimum sum-squared residual as the similarity/dissimilarity measure. This algo-
rithm organizes data in the form of a 2D matrix and yields the first set of row-clusters and
column-clusters using either random or spectral initialization, and uses residuals to build
the objective function, which it then minimizes to obtain the optimal co-clustering results.
Generalizing these previous studies (Dhillon et al. 2003, Cho et al. 2004), Banerjee et al.
(2007) subsequently proposed the Bregman co-clustering algorithm as a meta co-clustering
algorithm that aims to partition the original data into co-clusters with several distortion
functions such as the Euclidean distance. They also mentioned several applications of co-
clustering such as natural language processing (Rohwer and Freitag 2004) and video
content analysis (Cai et al. 2005). Recently, Wu et al. (2015) applied the Bregman block
average co-clustering algorithm with I-divergence (BBAC_I), a special case of the Bregman
co-clustering algorithm, to analyze temperature series for simultaneous location and time-
stamp clustering. This was the first study that applied co-clustering analysis to spatio-
temporal data. Afterwards, several studies applied BBAC_I for analyzing GTS in a variety
of fields, for example, disease hotspot detection (Ullah et al. 2017) and identification of
favorable conditions for virus outbreaks (Andreo et al. 2018).
Regarding hierarchical co-clustering methods, Hartigan (1972) developed a direct co-
clustering algorithm and applied it to analyze American presidential voting. This algo-
rithm, which is one of the earliest co-clustering algorithms, employs the squared
Euclidean distance to build the objective function and then aims to minimize it by
using a ‘divide and conquer’ direct clustering algorithm in a hierarchical manner.
Another hierarchical co-clustering algorithm proposed by Hosseini and Abolhassani
(2007) aimed to analyze queries and URLs of a search engine log, to mine the query
logs in web information systems. This algorithm uses the queries and URLs to construct
a bipartite graph in which singular value decomposition (SVD) is used to perform dimen-
sion reduction. Subsequently, k-means is used to iteratively cluster queries, and URLs are
used to create the hierarchical categorization. Costa et al. (2008) developed a hierarchical,
model-based co-clustering algorithm and used it to analyze internet advertisements.
Considering the dataset as a joint probability distribution, this algorithm groups tuples
into clusters characterized by different probability distributions. Thereafter, co-clusters are
identified by exploring the conditional distribution of elements over tuples. Inspired by
ITCC, Cheng et al. (2012), and Cheng et al. (2016) proposed a hierarchical co-clustering
algorithm by employing the information divergence as the measure of similarity/
dissimilarity to analyze newsgroups and documents. This algorithm starts with an initial
co-cluster, and then constructs hierarchical structures of rows and columns by iteratively
splitting the rows and columns to achieve convergence. Unlike ITCC, which uses the loss
in mutual information, Ienco et al. (2009) and Pensa et al. (2012) proposed a hierarchical
co-clustering algorithm named Incremental Flat and Hierarchical Co-Clustering (iHiCC).
This algorithm employs Goodman-Kruskal’s τ coefficient to measure the strength of the
link between two variables and uses the result for text categorization. Using the first
hierarchy created by τCoClust (Robardet 2002), this algorithm divides rows and columns
iteratively until only one element remains in all leaves of the hierarchies of both rows and
columns. However, to date, studies that have applied hierarchical co-clustering methods
for the analysis of spatio-temporal data have not been published. Detailed reviews on co-
clustering were presented by Charrad and Ahmed (2011), Eren et al. (2012), and Padilha
and Campello (2017).
named the Bregman cuboid average tri-clustering algorithm with I-divergence (BCAT_I) to
analyze 3D GTS.
Few studies concerned with hierarchical tri-clustering methods have been reported
and these efforts mostly focus on analyzing biological data. Gerber et al. (2007) developed
a tri-clustering algorithm named GeneProgram for gene expression data analysis based
on hierarchical Dirichlet processes. This algorithm first discretizes continuous gene
expression data, and then employs Markov chain Monte Carlo sampling to approximate
the model posterior probability distribution using a three-level hierarchy in the Dirichlet
process, and finally identifies tri-clusters by summarizing the distribution. Amar et al.
(2015) proposed an algorithm known as three-way module inference via Gibbs sampling
(TWIGS) to analyze large 3D biological datasets. TWIGS functions by initially developing
a hierarchical Bayesian generative model for binary data by using the Bernoulli-Beta
assumption and for real-valued data by using the Normal-Gamma assumption.
Subsequently, TWIGS employs a co-clustering solution as the starting point and then
iteratively improves it using the Gibbs sampler. Finally, tri-clusters are inferred from
candidate co-clusters. A detailed overview of tri-clustering algorithms was published by
Henriques and Madeira (2018).
Figure 4. Thiessen polygon map indicating the area covered by each station and the location of the
study area in Beijing (inset).
Figure 5. Temporal distribution of the PM2.5 dataset collected in Beijing, with non-zero days indicated in green (outer circle) and the number of non-zero days in each month over the study period (inner histogram).
4.2. K-means
Since the case study dataset is an hourly PM2.5 dataset, i.e., a single attribute GTS with day and hour as nested temporal dimensions (GTS-Ts), it was averaged to a daily PM2.5 dataset, i.e., a single attribute GTS (GTS-A), when subjected to the traditional clustering method. The
dataset is organized into a table where rows are stations, columns are days and elements
are daily PM2.5 concentrations. Such a data table can also be seen as the co-occurrence matrix O_SD between a spatial and a temporal variable, the former taking values over m (18) stations and the latter over n (299) days.
Because of its wide use in many applications (Berkhin 2006), k-means was selected as the representative algorithm for traditional clustering methods and used in this study to
perform temporal clustering. It is noteworthy that k-means can also be used to perform
spatial clustering.
Suppose the days are clustered into l day-clusters. The pseudocode of the process of
iteratively optimizing day-clusters by k-means is depicted in Figure 6. With a random
initialization, l days are first selected as the cluster centers (step 1). Then, the iterative process starts by assigning each of the n days to the most similar cluster center, as measured by the Euclidean distance D_euc(·, ·) (step 2.1). Next, for each of the l day-clusters, the cluster center is updated as the mean of all days assigned to this cluster (step 2.2). The
objective function of k-means is typically formulated as the sum of squared errors
between the days and corresponding day-clusters. The iterative process continues until
the objective function converges (i.e. reaches a value below a predefined threshold) and
the optimized l day-clusters are yielded.
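The steps above can be sketched in NumPy as follows (a minimal illustration of our own, not the authors' code; the function name and default parameters are assumptions):

```python
import numpy as np

def kmeans_days(X, l, max_iter=100, tol=1e-6, seed=0):
    """Cluster the n rows of X (one row per day, one column per station)
    into l day-clusters: random selection of l days as initial centers
    (step 1), assignment by Euclidean distance (step 2.1), mean update
    (step 2.2), and convergence of the sum-of-squared-errors objective."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = X[rng.choice(n, size=l, replace=False)].astype(float)  # step 1
    prev_sse = np.inf
    for _ in range(max_iter):
        # step 2.1: assign each day to its nearest center under D_euc
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # step 2.2: update each center as the mean of its assigned days
        for c in range(l):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
        sse = ((X - centers[labels]) ** 2).sum()  # objective function
        if prev_sse - sse < tol:                  # convergence check
            break
        prev_sse = sse
    return labels, centers
```

For the case-study dataset, X would be the 299 × 18 matrix of daily PM2.5 values with days as rows.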
information divergence (step 2), where D_I(·||·) indicates the information divergence of two matrices. Thereafter, the iterative process starts by re-assigning stations to station-
clusters and days to day-clusters, to optimize the objective function (step 3). This process
has been proven to monotonically decrease the objective function after each reassign-
ment (Banerjee et al. 2007). The iterative process terminates when the objective function converges (i.e., falls below a predefined threshold), and k × l optimized station-day co-clusters are yielded.
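A simplified sketch of this alternating scheme is given below (our own illustration, not the authors' implementation; for readability it substitutes squared error for the I-divergence used by BBAC_I, and all names are ours):

```python
import numpy as np

def coclust_block_average(O, k, l, max_iter=50, seed=0):
    """Alternating block-average co-clustering of matrix O (m x n):
    rows (stations) and columns (days) are iteratively reassigned to
    the row- and column-cluster whose block averages best reconstruct
    them, until the assignments stabilize."""
    rng = np.random.default_rng(seed)
    m, n = O.shape
    r = rng.integers(k, size=m)          # random row-cluster mapping
    c = rng.integers(l, size=n)          # random column-cluster mapping
    for _ in range(max_iter):
        # block averages: one value per (row-cluster, column-cluster)
        B = np.zeros((k, l))
        for a in range(k):
            for b in range(l):
                block = O[np.ix_(r == a, c == b)]
                B[a, b] = block.mean() if block.size else 0.0
        # reassign each row to the row-cluster minimizing its error
        r_new = np.array([
            np.argmin([((O[i] - B[a, c]) ** 2).sum() for a in range(k)])
            for i in range(m)])
        # reassign each column likewise, against the updated rows
        c_new = np.array([
            np.argmin([((O[:, j] - B[r_new, b]) ** 2).sum() for b in range(l)])
            for j in range(n)])
        if np.array_equal(r_new, r) and np.array_equal(c_new, c):
            break
        r, c = r_new, c_new
    return r, c
```

Each reassignment can only lower the reconstruction error, which mirrors the monotone decrease of the objective function noted above.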
optimization process of partitioning the hourly PM2.5 matrix into tri-clusters in an iterative
manner. Starting with a random initialization by mapping stations to k station-clusters, days
to l day-clusters and hours to z hour-clusters (step 1), the algorithm first generates a tri-clustered 3D data matrix (Ô_SDH). In the next step, BCAT_I measures the distortion between
the original and the tri-clustered matrices using the information divergence to build its
objective function (step 2). Thereafter, it aims to minimize the objective function by
iteratively updating mappings from stations to station-clusters, days to day-clusters and
hours to hour-clusters (step 3). The iterative process ceases when the objective function is
below a preset threshold, which yields the optimized k × l × z station-day-hour tri-clusters.
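The tri-clustered matrix of block averages at the heart of this scheme can be sketched as follows (an illustrative helper under our own naming; BCAT_I alternates such averaging with the reassignment of the three mappings):

```python
import numpy as np

def tricluster_averages(O, s, d, h, k, l, z):
    """Summarize a 3D GTS O (stations x days x hours) into the
    k x l x z tri-clustered matrix of block averages, given arrays
    s, d, h mapping each station/day/hour to its cluster index."""
    T = np.zeros((k, l, z))
    for a in range(k):
        for b in range(l):
            for g in range(z):
                # the tri-cluster block at the intersection of
                # station-cluster a, day-cluster b and hour-cluster g
                block = O[np.ix_(s == a, d == b, h == g)]
                T[a, b, g] = block.mean() if block.size else 0.0
    return T
```

With the numbers used in this case study, T would be the 3 × 4 × 6 summary of the station-day-hour tri-clusters.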
5. Results
In our analysis, the number of station-clusters was set to three in accordance with previous studies (Zhao et al. 2014, Wang et al. 2015), and the number of day-clusters was set to four, with the expectation that days would fall into four 'real' seasons, enabling us to explore patterns of seasonal variation. Additionally, the number of hour-clusters was set
explore patterns of seasonal variations. Additionally, the number of hour-clusters was set
to six because the air quality index (AQI) for PM2.5 is categorized into six levels: Excellent
(0–50), Good (51–100), Lightly polluted (101–150), Moderately polluted (151–200), Heavily
polluted (201–300) and Severely polluted (>300) (according to the Technical Regulation
on Ambient Air Quality Index (on Trial) (China 2012)). Clustering results are interpreted by
answering example questions for clustering the PM2.5 dataset and then the three cluster-
ing algorithms are compared in terms of several aspects.
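The six levels can be encoded directly from the thresholds listed above (a small illustrative helper of our own, not part of the regulation itself):

```python
def pm25_aqi_level(aqi):
    """Map an AQI value for PM2.5 to the six levels of the Technical
    Regulation on Ambient Air Quality Index (on Trial) (China 2012)."""
    if aqi <= 50:
        return "Excellent"
    if aqi <= 100:
        return "Good"
    if aqi <= 150:
        return "Lightly polluted"
    if aqi <= 200:
        return "Moderately polluted"
    if aqi <= 300:
        return "Heavily polluted"
    return "Severely polluted"
```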
Figure 9. Ringmap displaying the results of k-means clustering. The innermost circle indicates days with zero values. The other four circles, from inside outward, indicate day-cluster 1 to day-cluster 4, and days in each day-cluster are colored the same using the average value.
outward. Each circle represents 365 days, divided into 12 months from February 2013 to January 2014 in a clockwise direction, and days falling into each cluster are colored using the average value of that cluster.
With the ringmap, four out of the 22 questions (numbers: 3, 6, 17, 21) can be answered.
In response to question number 3, the ringmap shows that day-cluster2 has an average value at the 'Lightly polluted' level according to (China 2012), and days therein mainly occur in Spring (April and May), Summer and early Autumn (July, August, September and October).
As for question number 6, it can be seen that 'Good' days in day-cluster1 are sparsely spread in April, July, August, December and January, while 'Lightly polluted' days occupy most of the study period. 'Heavily polluted' days are scattered throughout Winter and also October.
October. In response to question number 16, days in day-cluster4 are ‘Heavily polluted,’
and the fewest days are scattered throughout January 2014, February, early March and
October. For question number 20, because day-clusters are arranged from 1 to 4 with
increasing values of PM2.5 concentrations, the pollution level becomes worse from day-
cluster1 to day-cluster4.
Figure 10. Heatmap displaying BBAC_I co-clustering results. The color of each co-cluster, at the intersection of a station- and a day-cluster, indicates the average value of that co-cluster.
Figure 11. Small multiples (top) and ringmap (bottom) displaying BBAC_I co-clustering results. In the small multiples, stations falling into each station-cluster are colored using the average value of that station-cluster. In the ringmap, the innermost circle indicates days with zero values; the other four circles, from inside outward, represent day-cluster1 to day-cluster4, and days in each day-cluster are colored the same as the average value.
Figure 11) shows the temporal distribution of the four day-clusters using four circles from inside outward with increasing values. For each circle, days in the corresponding day-cluster are displayed in the same color as the average value.
With these visualizations, more than half of the example questions (13) can be
answered (numbers: 1, 3, 5, 6, 7, 9, 11, 12, 14, 15, 17, 19, 21). Questions already answered by the k-means clustering results are not repeated. For question number 1, the heatmap shows
Figure 12. Quasi-3D heatmap displaying the tri-clustering results. The color of each tri-cluster, at the intersection of a station-, day- and hour-cluster, indicates the average value of that tri-cluster.
whereas high values exist in the south (east) (Zhao et al. 2014, Wang et al. 2015). With
respect to the seasonal variation, it is shown that high fluctuations of PM2.5 concentrations
occur in the Autumn and especially in the Winter, whereas a more stable pattern of
middle-valued concentrations appear in the Spring and Summer (Li et al. 2015, 2016).
Furthermore, hours from 7:00 to 14:00 are characterized by low concentrations, whereas hours from 21:00 to 24:00 and 1:00 to 3:00 exhibit the highest PM2.5 concentrations. These
results are supported by previous studies on diurnal variations (Zhao et al. 2014, Chen
et al. 2015). For question number 10, the heatmap shows that in day-cluster2 and hour-
cluster1, station-cluster2 and station-cluster3 are observed to have a ‘Good’ pollution
level. This includes all stations except Haidianbeijingzhiwuyuan (1002, 海淀北京植物园)
as shown in the small multiples. The response to question number 13 is that, in day-
cluster2 and hour-cluster1, the pollution level worsens from ‘Excellent’ to ‘Good’ from
station-cluster1 to station-clusters2&3. For question number 16, the heatmap shows that,
in station-cluster1, the pollution level is observed to be ‘Good’ at several intersections
of day-clusters and hour-clusters, e.g., the intersections of day-cluster1 and hour-cluster6,
day-cluster2 and hour-cluster2. It also shows that the pollution level of ‘Good’ is observed
at additional intersections of day-clusters and hour-clusters in the study area for question
number 18 (e.g., that of day-cluster2 and hour-clusters1-6 at station-cluster3). For ques-
tion number 20, at station-cluster1, the pollution level worsens from day-cluster1
and hour-cluster1 to day-cluster4 and hour-cluster6, i.e., from hours 7:00–9:00 on days
scattered throughout April and November to hours 21:00–24:00 on days spread sparsely
across September, October and January 2014. Moreover, it shows that the pollution level
Figure 13. Small multiples (top), ringmap (middle) and bar timelines (bottom) displaying the tri-clustering results. In the small multiples, stations in each station-cluster are colored the same using the average value. In the ringmap, the innermost circle indicates days with zero values; the other four circles (from inside outward) indicate day-cluster1 to day-cluster4, and days in each day-cluster are colored the same using the average value. In the bar timelines, hours in each hour-cluster are colored the same using the average value.
6. Discussion
6.1. Suggestions for selecting clustering methods
As mentioned above, tri-clustering methods represented by BCAT_I are more powerful in
analyzing GTS with fine resolutions and exploring complex patterns but are less compu-
tationally efficient than other methods. In comparison, traditional clustering methods
represented by k-means and co-clustering methods represented by BBAC_I are capable of
exploring less complex patterns but require less running time. Given one-way clustering, co- and tri-clustering methods for GTS, is any one type the best and most suitable for every task and dataset? Is it possible to single out one method as superior? There is no clear-cut answer to such questions, as stated by Grubesic et al. (2014). Selection of the most suitable method should consider the data type to be analyzed, the research questions with which researchers are concerned, the computational effort, and the availability of the methods (Table 4).
If the data at hand are 2D GTS and research questions relate to the whole study area or
period, traditional clustering methods instead of co-clustering methods are recommended,
especially for large datasets. That is because the computational complexity of co-clustering
methods is generally higher than that of traditional clustering methods. As shown in
Table 4, the computational complexity of k-means is O(mnki) (where m is the number of rows in GTS, n is the number of columns, k is the number of row-clusters and i is the number of iterations needed to reach convergence). In comparison, the complexity of BBAC_I is higher, i.e., O(mni(k + l)) (where l is the number of column-clusters). Nevertheless, if research
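These per-iteration costs can be compared concretely using the case-study sizes (m = 18 stations, n = 299 days) and the cluster numbers chosen above (k = 3, l = 4); a quick sketch:

```python
# Per-iteration operation counts for the two complexity formulas above,
# using the case-study sizes (m stations, n days) and cluster numbers.
m, n, k, l = 18, 299, 3, 4
kmeans_ops = m * n * k          # k-means: O(mnk) per iteration
bbaci_ops = m * n * (k + l)     # BBAC_I:  O(mn(k + l)) per iteration
print(bbaci_ops / kmeans_ops)   # ratio (k + l) / k, here about 2.33
```

For equal iteration counts, BBAC_I therefore costs roughly (k + l)/k times as much as k-means on the same matrix.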
7. Conclusions
In this paper, we systematically reviewed clustering methods for GTS under a classification into one-way clustering, co-clustering and tri-clustering methods. Furthermore, we
compared different categories to offer suggestions for selecting appropriate methods. To
achieve this, we defined a taxonomy of clustering-related questions with three compo-
nents (spatial-cluster, temporal-cluster and cluster) and two reading levels (elementary,
synoptic). Different methods were then compared by answering these questions using
representative algorithms and a case study dataset.
Our results show that tri-clustering methods are more powerful in exploring complex
patterns from GTS with fine resolutions at the cost of considerably extended running time.
In relative terms, one-way clustering and co-clustering methods require less running time
but are less capable of exploring complex patterns. However, the selection of the most
appropriate method should consider the data type, research questions, computational
complexity, and also the availability of methods. Traditional clustering methods are
recommended for analyzing large 2D datasets when research questions focus on the
whole study area or period; otherwise, co-clustering methods are recommended for 2D
GTS. Tri-clustering methods are recommended for analyzing 3D GTS for complex patterns,
albeit at the expense of additional computational effort. Finally, the classification described in this study is valuable because it can accommodate additional co- and tri-clustering methods for GTS and thus support the exploration of more complex spatio-temporal patterns.
Acknowledgments
We thank the reviewers for their constructive comments.
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
This work was supported by the National Natural Science Foundation of China [41771537,
41901317]; China Postdoctoral Science Foundation Grant [2018M641246]; National Key Research
and Development Plan of China [2017YFB0504102]; Fundamental Research Funds for the Central
Universities.
References
Amar, D., et al., 2015. A hierarchical Bayesian model for flexible module discovery in three-way
time-series data. Bioinformatics, 31 (12), i17–i26. doi:10.1093/bioinformatics/btv228
Andreo, V., et al., 2018. Identifying favorable spatio-temporal conditions for west nile virus outbreaks
by co-clustering of modis LST indices time series. In: IGARSS 2018-2018 IEEE International
Geoscience and Remote Sensing Symposium. Valencia, Spain, 4670–4673.
Andrienko, G., et al., 2009. Interactive visual clustering of large collections of trajectories. In: 2009
IEEE Symposium on Visual Analytics Science and Technology (VAST) 12-13 Oct. Atlantic City, New
Jersey, 3–10.
Andrienko, G., et al., 2010. Space-in-time and time-in-space self-organizing maps for exploring spatio-
temporal patterns. Computer Graphics Forum, 29 (3), 913–922. doi:10.1111/cgf.2010.29.issue-3
Andrienko, N. and Andrienko, G., 2006. Exploratory analysis of spatial and temporal data -
a systematic approach. Berlin: Springer-Verlag.
Bação, F., Lobo, V., and Painho, M., 2005. The self-organizing map, the Geo-SOM, and relevant variants
for geosciences. Computers & Geosciences, 31 (2), 155–163. doi:10.1016/j.cageo.2004.06.013
Banerjee, A., et al., 2007. A generalized maximum entropy approach to Bregman co-clustering and
matrix approximation. Journal of Machine Learning Research, 8, 1919–1986.
Berkhin, P., 2006. A survey of clustering data mining techniques. Grouping Multidimensional Data:
Recent Advances in Clustering, 25–71.
Bertin, J., 1983. Semiology of graphics: diagrams, networks, maps. Madison, WI: University of Wisconsin Press.
Cai, R., Lu, L., and Cai, L.-H., 2005. Unsupervised auditory scene categorization via key audio effects and information-theoretic co-clustering. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'05). Philadelphia, Pennsylvania, ii/1073–ii/1076.
Charrad, M. and Ahmed, M.B., 2011. Simultaneous clustering: A survey. In: International Conference
on Pattern Recognition and Machine Intelligence. Moscow, Russia, 370–375.
Chen, W., Tang, H., and Zhao, H., 2015. Diurnal, weekly and monthly spatial variations of air
pollutants and air quality of Beijing. Atmospheric Environment, 119, 21–34. doi:10.1016/j.
atmosenv.2015.08.040
Cheng, T., et al., 2014. Spatiotemporal data mining. Handbook of regional science. Heidelberg,
Germany: Springer, 1173–1193.
Cheng, W., et al., 2012. Hierarchical co-clustering based on entropy splitting. Proceedings of the 21st
ACM international conference on Information and knowledge management. Maui, Hawaii,
1472–1476.
Cheng, W., et al., 2016. HICC: an entropy splitting-based framework for hierarchical co-clustering.
Knowledge and Information Systems, 46 (2), 343–367. doi:10.1007/s10115-015-0823-x
China, 2012. Technical regulation on ambient air quality index (on trial). China: China Environmental
Science Press Beijing.
Cho, H., et al., 2004. Minimum sum-squared residue co-clustering of gene expression data. Fourth
SIAM Int’l Conf. Data Mining. Florida, USA.
Costa, G., Manco, G., and Ortale, R., 2008. A hierarchical model-based approach to co-clustering
high-dimensional data. Proceedings of the 2008 ACM symposium on Applied computing. Maui,
Hawaii, 886–890.
Dhillon, I.S., Mallela, S., and Modha, D.S., 2003. Information-theoretic co-clustering. In: The 9th International Conference on Knowledge Discovery and Data Mining (KDD). Washington, DC, 89–98.
Eren, K., et al., 2012. A comparative analysis of biclustering algorithms for gene expression data.
Briefings in Bioinformatics, 14 (3), 279–292.
Gerber, G.K., et al., 2007. Automated discovery of functional generality of human gene expression
programs. PLoS Computational Biology, 3 (8), e148. doi:10.1371/journal.pcbi.0030148
Grubesic, T.H., Wei, R., and Murray, A.T., 2014. Spatial clustering overview and comparison: accuracy,
sensitivity, and computational expense. Annals of the Association of American Geographers, 104
(6), 1134–1156. doi:10.1080/00045608.2014.958389
Gu, Y., et al., 2010. Phenological classification of the United States: A geographic framework for
extending multi-sensor time-series data. Remote Sensing, 2, 526–544. doi:10.3390/rs2020526
Guo, D., et al., 2006. A visualization system for space-time and multivariate patterns (VIS-STAMP).
IEEE Transactions on Visualization and Computer Graphics, 12 (6), 1461–1474. doi:10.1109/
TVCG.2006.84
Hagenauer, J. and Helbich, M., 2013. Hierarchical self-organizing maps for clustering spatiotemporal
data. International Journal of Geographical Information Science, 27 (10), 2026–2042. doi:10.1080/
13658816.2013.788249
Han, J., Kamber, M., and Pei, J., 2012. Data mining concepts and techniques. 3rd ed. Burlington, MA:
Morgan Kaufman MIT press.
Han, J., Lee, J.-G., and Kamber, M., 2009. An overview of clustering methods in geographic data
analysis. In: H.J. Miller and J. Han, eds. Geographic data mining and knowledge discovery. 2nd ed.
New York: Taylor & Francis Group, 150–187.
Hartigan, J.A., 1972. Direct clustering of a data matrix. Journal of American Statistical Association, 67
(337), 123–129. doi:10.1080/01621459.1972.10481214
Henriques, R. and Madeira, S.C., 2018. Triclustering algorithms for three-dimensional data analysis:
A comprehensive survey. ACM Computing Surveys (CSUR), 51 (5), 95. doi:10.1145/3271482
Hosseini, M. and Abolhassani, H., 2007. Hierarchical co-clustering for web queries and selected URLs.
In: International Conference on Web Information Systems Engineering. Nancy, France, 653–662.
Hu, Z. and Bhatnagar, R., 2010. Algorithm for discovering low-variance 3-clusters from real-valued
datasets. In: 2010 IEEE 10th International Conference on Data Mining (ICDM). Sydney, Australia,
236–245.
Ienco, D., Pensa, R.G., and Meo, R., 2009. Parameter-free hierarchical co-clustering by n-ary splits.
In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Bled,
Slovenia, 580–595.
Kangas, J., 1992. Temporal knowledge in locations of activations in a self-organizing map. In: I.
Aleksander and J. Taylor, eds. Artificial neural networks, 2. Vol. 1. Amsterdam, Netherlands:
North-Holland, 117–120.
Kisilevich, S., et al., 2010. Spatio-temporal clustering. In: O. Maimon, et al., eds. Data mining and
knowledge discovery handbook. Springer US, 855–874.
Kohonen, T., 1995. Self-organizing maps. Berlin: Springer-Verlag.
Li, H., Fan, H., and Mao, F., 2016. A visualization approach to air pollution data exploration—a case
study of air quality index (PM2.5) in Beijing, China. Atmosphere, 7 (3), 35. doi:10.3390/atmos7030035
Li, R., et al., 2015. Diurnal, seasonal, and spatial variation of PM2.5 in Beijing. Science Bulletin, 60 (3),
387–395. doi:10.1007/s11434-014-0607-9
Lloyd, S., 1982. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28 (2),
129–137. doi:10.1109/TIT.1982.1056489
MacQueen, J., 1967. Some methods for classification and analysis of multivariate observations. In:
Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley,
CA, 281–297.
Miller, H.J. and Han, J., 2009. Geographic data mining and knowledge discovery: an overview. In: H.
J. Miller and J. Han, eds. Geographic data mining and knowledge discovery. 2nd ed. New York:
Taylor & Francis Group, 1–26.
Mills, R.T., et al., 2011. Cluster analysis-based approaches for geospatiotemporal data mining of
massive data sets for identification of forest threats. Procedia Computer Science, 4, 1612–1621.
doi:10.1016/j.procs.2011.04.174
Padilha, V.A. and Campello, R.J., 2017. A systematic comparative evaluation of biclustering
techniques. BMC Bioinformatics, 18 (1), 55. doi:10.1186/s12859-017-1487-1
Pensa, R.G., Ienco, D., and Meo, R., 2012. Hierarchical co-clustering: off-line and incremental
approaches. Data Mining and Knowledge Discovery, 28 (1), 31–64. doi:10.1007/s10618-012-0292-8
Peuquet, D.J., 1994. It’s about time: a conceptual framework for the representation of temporal
dynamics in geographic information systems. Annals of the Association of American Geographers,
84 (3), 441–461. doi:10.1111/j.1467-8306.1994.tb01869.x
Robardet, C., 2002. Contribution à la classification non supervisée: proposition d’une méthode de
bi-partitionnement [Contribution to unsupervised classification: proposal of a bi-partitioning
method]. Doctoral dissertation. Université Lyon 1.
Rohwer, R. and Freitag, D., 2004. Towards full automation of lexicon construction. In: Proceedings of
the HLT-NAACL Workshop on Computational Lexical Semantics. Boston, MA, 9–16.
Shekhar, S., et al., 2015. Spatiotemporal data mining: a computational perspective. ISPRS
International Journal of Geo-Information, 4 (4), 2306–2338. doi:10.3390/ijgi4042306
Shen, S., et al., 2018. Spatial distribution patterns of global natural disasters based on biclustering.
Natural Hazards, 92 (3), 1809–1820. doi:10.1007/s11069-018-3279-y
Sim, K., Aung, Z., and Gopalkrishnan, V., 2010. Discovering correlated subspace clusters in 3D
continuous-valued data. In: 2010 IEEE 10th International Conference on Data Mining (ICDM).
Sydney, Australia, 471–480.
Tou, J.T. and Gonzalez, R.C., 1974. Pattern recognition principles. Boston, MA: Addison-Wesley
Publishing Company.
Ullah, S., et al., 2017. Detecting space-time disease clusters with arbitrary shapes and sizes using a
co-clustering approach. Geospatial Health, 12 (2), 567.
Wang, Z., et al., 2015. Spatial-temporal characteristics of PM2.5 in Beijing in 2013. Acta Geographica
Sinica, 70 (1), 110–120.
White, M.A., et al., 2005. A global framework for monitoring phenological responses to climate
change. Geophysical Research Letters, 32 (4), L04705. doi:10.1029/2004GL021961
Wu, X., et al., 2020a. Spatio-temporal differentiation of spring phenology in China driven by
temperatures and photoperiod from 1979 to 2018. Science China-Earth Sciences. doi:10.1360/
SSTe-2019-0212
Wu, X., et al., 2020b. An interactive web-based geovisual analytics platform for co-clustering
analysis. Computers & Geosciences, 104420. doi:10.1016/j.cageo.2020.104420
Wu, X., et al., 2018. Triclustering georeferenced time series for analyzing patterns of intra-annual
variability in temperature. Annals of the American Association of Geographers, 108 (1), 71–87.
doi:10.1080/24694452.2017.1325725
Wu, X., Zurita-Milla, R., and Kraak, M.J., 2015. Co-clustering geo-referenced time series: exploring
spatio-temporal patterns in Dutch temperature data. International Journal of Geographical
Information Science, 29 (4), 624–642. doi:10.1080/13658816.2014.994520
Wu, X., Zurita-Milla, R., and Kraak, M.-J., 2013. Visual discovery of synchronization in weather data at
multiple temporal resolutions. The Cartographic Journal, 50 (3), 247–256. doi:10.1179/
1743277413Y.0000000067
Wu, X., Zurita-Milla, R., and Kraak, M.-J., 2016. A novel analysis of spring phenological patterns over
Europe based on co-clustering. Journal of Geophysical Research: Biogeosciences, 121, 1434–1448.
Wu, X., et al., 2017. Clustering-based approaches to the exploration of spatio-temporal data.
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
(ISPRS’17). Wuhan, China, 1387–1391.
Zhang, T., Ramakrishnan, R., and Livny, M., 1996. BIRCH: an efficient data clustering method for very
large databases. ACM SIGMOD Record, 25 (2), 103–114.
Zhang, Y.L. and Cao, F., 2015. Fine particulate matter (PM2.5) in China at a city level. Scientific
Reports, 5, 14884. doi:10.1038/srep14884
Zhao, C., et al., 2014. Temporal and spatial distribution of PM2.5 and PM10 pollution status and the
correlation of particulate matters and meteorological factors during winter and spring in Beijing.
Environmental Science, 35 (2), 418–427.
Zhao, L. and Zaki, M.J., 2005. TRICLUSTER: an effective algorithm for mining coherent clusters in 3D
microarray data. In: Proceedings of the 2005 ACM SIGMOD International Conference on Management
of Data. Baltimore, MD, 694–705.
Zheng, Y., et al., 2014. A cloud-based knowledge discovery system for monitoring fine-grained air
quality. Microsoft Tech Report. Available from: http://research.microsoft.com/apps/pubs/default.aspx
Zheng, Y., Liu, F., and Hsieh, H.-P., 2013. U-Air: when urban air quality inference meets big data. In:
Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining. Chicago, IL, 1436–1444.