Dynamic spatio-temporal pattern discovery: a novel grid and density-based clustering algorithm
Corresponding Author:
Swati Meshram
Department of Computer Science and Eng, Government College of Engineering
Amravati, Maharashtra, India
Email: [email protected]
1. INTRODUCTION
Advances in computer technology, remote sensing, and location-based services have resulted in the generation of
massive volumes of spatiotemporal data. Spatiotemporal data analysis is an emerging research area driven by the
development and application of intelligent computational techniques. Analyzing spatiotemporal data benefits various
human-centered applications such as recommendation systems, identification of disease outbreak patterns, urban
development clustering, infrastructure planning, and detection of criminal activities. Clustering is a valuable
analysis tool for exploring and understanding the rich information contained in spatio-temporal datasets.
The aim of spatiotemporal clustering analysis is to detect and examine noteworthy patterns in data that vary in
both space and time, and to help understand the dynamics or processes driving these patterns and trends. Anomalous
patterns are detected to indicate rare but significant events that deviate from expected behavior. The analysis may
also be used to design models for predicting future occurrences of similar events. It provides insight into the
evolution of clusters and their changes over time, which is valuable for understanding trends and may lead to
proactive decision-making and improved resource allocation. It also helps in understanding the impact of human
activities and natural processes and their interconnections, which is highly relevant to emergency response, public
health, environmental monitoring, and urban resource allocation. Spatiotemporal clustering analysis allows
researchers to uncover complex relationships and make informed decisions in dynamic and interconnected systems [1].
Spatio-temporal data contain the geographical location and time of occurrence of events, along with other
non-spatiotemporal features describing the events. Spatio-temporal clustering is a machine learning technique that
searches for patterns in a dataset by grouping data instances based on similarity measures. Instances within a
cluster exhibit high similarity, whereas they are dissimilar to instances of other clusters, so that distinct
clusters are formed. In other words, clustering is an unsupervised classification technique that separates an
unlabeled dataset into a finite number of groups whose members are more homogeneous within their own group than with
members of other groups. These groups are termed clusters. Clustering as a process can be formalized as follows:
Given an input dataset $X = \{x_1, x_2, \ldots, x_n\}$, where each $x_i$ has $d$ features (dimensions), we attempt
to derive $k$ clusters $C = \{C_1, C_2, \ldots, C_k\}$ satisfying the following conditions for all
$i, j \in \{1, \ldots, k\}$:

$$|C_i| > 0, \qquad C_i \cap C_j = \emptyset \ \ (i \neq j), \qquad X = \bigcup_{i=1}^{k} C_i$$
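The three conditions above say that $C$ is a hard partition of $X$: no empty clusters, no instance shared by two clusters, and full coverage of the dataset. As a small illustration (not part of the proposed algorithm), they can be checked in Python as below; representing clusters as sets of hashable instance identifiers is an assumption of this sketch.

```python
from typing import Hashable, Sequence, Set


def is_hard_partition(X: Set[Hashable], clusters: Sequence[Set[Hashable]]) -> bool:
    """Check the three clustering conditions: non-empty, disjoint, covering X."""
    # Condition 1: every cluster is non-empty, |C_i| > 0.
    if any(len(c) == 0 for c in clusters):
        return False
    # Condition 2: pairwise disjoint, C_i ∩ C_j = ∅ for i != j.
    # Disjoint sets have sizes that add up exactly to the size of their union.
    union = set().union(*clusters)
    if sum(len(c) for c in clusters) != len(union):
        return False
    # Condition 3: the clusters together cover the whole dataset, X = ∪ C_i.
    return union == set(X)
```

For example, `{1, 2}` and `{3, 4}` partition `{1, 2, 3, 4}`, whereas `{1, 2}` and `{2, 3, 4}` do not, because instance 2 belongs to both.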
Clustering has been applied in recommendation systems by analysing customer feedback to assess product
popularity. Clustering is also a useful tool for observing the abnormal behavior of outliers that do not exhibit
the same relationships as the members of other clusters. Such analysis helps detect rare but important patterns in
urban planning [2] and big climate data analytics [3]. One application of clustering we explore is the detection
of earthquake clusters of the same severity and of regions displaying foreshocks and aftershocks of main
earthquake events. Earthquakes are natural events in which energy released inside the Earth causes tremors that
reach the surface. These sudden vibrations may destroy valuable natural and man-made resources. Identifying areas
that show a trend or reach of earthquake impacts using machine learning pattern mining techniques is therefore
important. Hence, we focus our study on deriving clustering patterns from Indian earthquake spatiotemporal data
using our proposed method. The contributions of this article are as follows: i) a method for selecting centroids;
ii) a method to convert tentative clusters into fixed clusters based on density; iii) an outlier score and
clustering quality evaluation; iv) detection of spatio-temporally referenced variables with respect to their
evolution over time; and v) an implementation and experimental validation of the proposed algorithm.
The rest of this paper is organized as follows: Section 2 reviews related literature. Section 3 outlines the
methodology. Section 4 presents the results and discussion. Finally, Section 5 concludes the paper.
2. RELATED WORK
Distance-based clustering methods compute a distance metric to measure spatial distance and group similar or
neighbouring points. Commonly used metrics include Euclidean distance, dynamic time warping [4], longest common
subsequence (LCSS) [5], edit distance on real sequence (EDR) [6], Hausdorff [7], and Fréchet [8] distance.
Density-based clustering groups regions that satisfy a density criterion into clusters [9]. Feature-based
clustering first extracts features and then computes their similarity [10], [11]. Spatio-temporal density-based
spatial clustering of applications with noise (ST-DBSCAN) [12] extends density-based clustering to
spatial-temporal data. A spiking neural network architecture was developed to cluster spatiotemporal brain data
[13]. Guo et al. [14] studied foodborne diseases in Zhejiang Province using spatiotemporal clustering, combining
statistical and spatial analysis with spatiotemporal scanning; however, the temporal resolution of that study is
coarse. Lizundia-Loiola et al. [15] proposed a hybrid burned area algorithm based on moderate resolution imaging
spectroradiometer (MODIS) thermal anomalies and near-infrared (NIR) reflectance at a spatial resolution of 250 m.
Hotspot clusters were used to identify active fire areas. These algorithms were developed to tackle specific
problems and therefore have limited applicability. Gong et al. [16] put forth a model that learns the dynamics of
mobility from taxi trajectory data and uses it to predict mobility in specific route areas; however, the study is
confined to a particular region. Another trajectory clustering method [17] identifies clusters based on the
Hausdorff distance within a K-nearest-neighbour approach, so its accuracy depends heavily on an appropriate choice
of K. In [18], [19], trajectory analysis was used to extract road traffic, determine flow statistics, and detect
congestion. Georgoulas et al. [20] adopted a hybrid clustering approach for seismic spatio-temporal data that
combines density-based and hierarchical agglomerative clustering to group objects with unknown class labels.
Connectivity is based on single linkage, and no emphasis is placed on the temporal parameter of the dataset.
Nazia et al. [21] discovered space-time clusters using a geographically weighted regression model, which also uses
local and global Moran's I to interpret the cluster distribution pattern; the model was verified on a COVID-19
dataset. Another study on COVID-19 data [22] partitioned the dataset using medoids and improved the results with
gap statistics. While some of these studies discuss outliers, they have not explicitly addressed reducing the
3. METHODOLOGY
This section discusses the proposed methodology, which applies a hybrid grid and density-based clustering
approach to spatio-temporal data, as shown in Figure 1. The proposed method is implemented in Python in the
Google Colab environment, which offers free cloud storage for data along with CPU processing capability. We
employ a grid structure, the density of grid cells, and the neighbourhood of instances, centroids, and grid cells
to form clusters. We then use the density attraction rate of neighbouring clusters to merge clusters and derive
the final clusters.
The dataset is obtained from https://seismo.gov.in [34], the Government of India portal for seismic events,
which provides earthquake spatiotemporal data for the Indian subcontinent. The data are available as a CSV file.
A total of 6506 samples, recorded from August 2019 to January 2024, are used in our experiments, as listed in
Table 2. The attributes of the dataset are the longitude, latitude, timestamp, and depth of each event, along with
comments. Table 3 shows the earthquake severity levels based on magnitude
Dynamic spatio-temporal pattern discovery: a novel grid and density-based … (Swati Meshram)
400 ISSN:2252-8938
and depth. We imported the earthquake catalogue and deleted the comments describing the textual location of each
earthquake. Table 4 describes the parameters of the proposed algorithm and their initialization. The distributions
of recorded earthquakes across severity levels are shown in Figures 2(a) and 2(b).
Figure 2. Count of earthquakes in the dataset by (a) magnitude level and (b) depth level
Spatiotemporal data form a sequence of data points in increasing order of time, expressed as (1):

$$ST = \{st_1, st_2, \ldots, st_n\}, \qquad st_i = (lon_i, lat_i, t_i), \qquad t_1 \le t_2 \le \cdots \le t_n \qquad (1)$$

where ST is the collection of spatio-temporal events, n is the total number of spatiotemporal events in the
dataset, and st_i is the ith data point, which records the longitude and latitude of the event location along with
its occurrence time.
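A minimal sketch of how the catalogue could be turned into the time-ordered sequence ST of (1). It assumes a CSV with hypothetical column names `latitude`, `longitude`, `origin_time` (ISO 8601 timestamps), `depth`, and `comments`; the actual header of the downloaded catalogue may differ. The free-text `comments` field is dropped, mirroring the preprocessing described above.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import List, TextIO


@dataclass(frozen=True)
class STEvent:
    """One spatio-temporal event st_i: location, occurrence time, and depth."""
    lat: float
    lon: float
    time: datetime
    depth: float


def load_events(f: TextIO) -> List[STEvent]:
    """Read catalogue rows and return ST = {st_1, ..., st_n} ordered by time.

    Column names are hypothetical placeholders for the real catalogue header.
    """
    events = []
    for row in csv.DictReader(f):
        row.pop("comments", None)  # drop the textual location description
        events.append(
            STEvent(
                lat=float(row["latitude"]),
                lon=float(row["longitude"]),
                time=datetime.fromisoformat(row["origin_time"]),
                depth=float(row["depth"]),
            )
        )
    # Equation (1) treats ST as a sequence in increasing order of time.
    events.sort(key=lambda e: e.time)
    return events


if __name__ == "__main__":
    sample = io.StringIO(
        "latitude,longitude,origin_time,depth,comments\n"
        "28.5,77.2,2020-05-10 14:20:00,10,near Delhi\n"
        "23.1,70.3,2019-09-01 03:05:00,15,Kachchh region\n"
    )
    for e in load_events(sample):
        print(e.time.date(), e.lat, e.lon, e.depth)
```

Sorting once after loading keeps later steps free to assume temporal order, which the time axis of the grid relies on.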
Grid: G is a multidimensional logical grid that divides the spatiotemporal space geographically and temporally,
based on longitude, latitude, and time. In addition, the dataset records non-spatial information related to the
events.
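Distance Measure: This measure describes the closeness of two data points based on their spatio-temporal
distance, producing lower values for highly similar (close) events and higher values for dissimilar (distant)
events. As a spatio-temporal distance measure, we adopt the Haversine formula for spatial distance along with the
temporal distance measured in days. We assume that the time advancement between any two events is at least 1. The
Haversine distance between two spatial instances is expressed as (2):

$$Hdist(st_a, st_b) = 2R \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_b - \varphi_a}{2}\right) + \cos\varphi_a \cos\varphi_b \sin^2\left(\frac{\lambda_b - \lambda_a}{2}\right)}\right) \qquad (2)$$

where φ and λ denote latitude and longitude in radians and R is the radius of the Earth.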
Step 1: Determine minimum and maximum longitude and latitude coordinates of the dataset.
$$G_{ijk} = \{\, o_m \mid (G_{ijk}.minlat \le o_m.lat \le G_{ijk}.maxlat) \wedge (G_{ijk}.minlon \le o_m.lon \le G_{ijk}.maxlon) \wedge o_m.time \in [k-1, k] \,\} \qquad (10)$$
where o_m denotes a spatio-temporal instance whose event time lies in the interval [k-1, k].
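Step 1 and the cell membership in (10) can be sketched as follows. The spatial cell size `cell_deg` and temporal cell size `cell_days` are hypothetical parameters, and event times are assumed to be given in days since the first event. Because the closed intervals in (10) would place an instance lying exactly on a boundary into two neighbouring cells, this sketch uses half-open intervals instead. The `cell_counts` helper takes a cell's density to be its number of instances; that is an assumption, since the Step 5 density formula is not reproduced here.

```python
import math
from typing import Dict, List, Tuple

Cell = Tuple[int, int, int]  # (i, j, k), 1-based as in G_ijk


def assign_grid_cells(
    points: List[Tuple[float, float, float]],  # (lat, lon, days since first event)
    cell_deg: float = 1.0,  # hypothetical spatial cell size in degrees
    cell_days: float = 30.0,  # hypothetical temporal cell size in days
) -> Dict[Cell, List[int]]:
    """Map each instance o_m (by index m) to its grid cell G_ijk."""
    # Step 1: minimum and maximum coordinates (bounding box) of the dataset.
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    min_lat, max_lat = min(lats), max(lats)
    min_lon, max_lon = min(lons), max(lons)
    t0, t_max = min(p[2] for p in points), max(p[2] for p in points)

    # Number of cells along each axis (at least one per axis).
    n_i = max(1, math.ceil((max_lat - min_lat) / cell_deg))
    n_j = max(1, math.ceil((max_lon - min_lon) / cell_deg))
    n_k = max(1, math.ceil((t_max - t0) / cell_days))

    cells: Dict[Cell, List[int]] = {}
    for m, (lat, lon, t) in enumerate(points):
        # Half-open intervals; the maximum coordinate is clamped into the last cell.
        i = min(int((lat - min_lat) // cell_deg) + 1, n_i)
        j = min(int((lon - min_lon) // cell_deg) + 1, n_j)
        k = min(int((t - t0) // cell_days) + 1, n_k)  # time in [k-1, k) cell widths
        cells.setdefault((i, j, k), []).append(m)
    return cells


def cell_counts(cells: Dict[Cell, List[int]]) -> Dict[Cell, int]:
    """Assumed density of each non-empty cell: its number of instances."""
    return {c: len(members) for c, members in cells.items()}
```

Only non-empty cells are stored, so memory grows with the number of occupied cells rather than with the full grid volume.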
Step 5: Compute the density of each grid cell as
$$|PCentre(G_{ijk})| \le p \qquad (14)$$
Hdist is the Haversine spatial distance between two locations. Tdist is the temporal distance between the two
events converted into days.
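The sketch below computes Hdist with the Haversine formula (mean Earth radius of 6371 km) and Tdist in days. How the two terms are combined into a single spatio-temporal distance is not shown in this section, so `st_dist` uses a weighted sum with hypothetical weights `w_space` and `w_time`. Flooring Tdist at 1 day is only one possible reading of the assumption that the time advancement between any two events is at least 1.

```python
import math
from datetime import datetime
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

Event = Tuple[float, float, datetime]  # (lat, lon, time)


def hdist(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance between two locations, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def tdist(t1: datetime, t2: datetime) -> float:
    """Absolute time difference in days, floored at 1 (an assumed reading)."""
    days = abs((t2 - t1).total_seconds()) / 86400.0
    return max(days, 1.0)


def st_dist(e1: Event, e2: Event, w_space: float = 1.0, w_time: float = 1.0) -> float:
    """Hypothetical spatio-temporal distance: weighted sum of Hdist and Tdist."""
    return w_space * hdist(e1[0], e1[1], e2[0], e2[1]) + w_time * tdist(e1[2], e2[2])
```

Since kilometres and days are on different scales, the weights (or a normalization of each term) matter in practice; the values here are placeholders.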
Step 9: Compute the average radius of the probable clusters.
$$Radius(PCluster_q) = \frac{\sum_{O_m \in PCluster_q} dist(O_m, PCentre_q)}{|PCluster_q|} \qquad (17)$$
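Equation (17) is the mean distance between the members of a probable cluster and its centre. A direct sketch, in which the distance function is passed in as a parameter (for example, a combination of Hdist and Tdist); the function and parameter names are illustrative.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")


def average_radius(members: Sequence[P], centre: P, dist: Callable[[P, P], float]) -> float:
    """Radius(PCluster_q): mean of dist(O_m, PCentre_q) over O_m in PCluster_q."""
    if not members:
        raise ValueError("a probable cluster must be non-empty")
    return sum(dist(o, centre) for o in members) / len(members)
```

Passing `dist` as an argument keeps the radius computation independent of how the spatial and temporal terms are combined.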
Step 11: Sort the density attraction rates of the grid cells.
The density attraction rate is used to merge clusters with few points into strong clusters in their
neighbourhood.
Step 13: Construct the p × p centroid-to-centroid distance matrix for a configuration of p centroids.
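Step 13 can be sketched as a symmetric matrix fill. The distance callable is supplied by the caller, and the `nearest_neighbour` helper is a hypothetical addition showing one way to pick the neighbouring cluster examined in Step 15.

```python
from typing import Callable, List, Sequence, TypeVar

P = TypeVar("P")


def centroid_distance_matrix(centres: Sequence[P], dist: Callable[[P, P], float]) -> List[List[float]]:
    """Symmetric p x p matrix of centroid-to-centroid distances."""
    p = len(centres)
    D = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(a + 1, p):  # distance is symmetric, so each pair is computed once
            D[a][b] = D[b][a] = dist(centres[a], centres[b])
    return D


def nearest_neighbour(D: List[List[float]], q: int) -> int:
    """Index of the centroid closest to centroid q, excluding q itself."""
    return min((r for r in range(len(D)) if r != q), key=lambda r: D[q][r])
```

Filling only the upper triangle halves the number of distance evaluations, which matters when p is large.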
Step 15: Find the density attraction rate of the neighbouring cluster to decide whether to merge the clusters.
Continue step 12
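$$
\begin{aligned}
&\textbf{if } DensityAttractionRate(PCentre_q) > DensityAttractionRate(PCentre_r) \textbf{ then}\\
&\quad PCluster_q \leftarrow PCluster_q \cup PCluster_r\\
&\quad PCluster \leftarrow PCluster - PCluster_r\\
&\quad PCentre \leftarrow PCentre - PCentre_r
\end{aligned} \qquad (24)
$$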
Continue step 12
where μ is the mean distance from the centroid to all other data instances within the cluster, and M is the mean
distance to the data instances of all other clusters.
Step 17: Stop.
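The merge rule in (24) can be sketched as follows. The density attraction rate is taken as a precomputed input because its formula is not reproduced in this section. The weakest-first visiting order, the choice of the nearest remaining centre as the neighbour, and the `max_dist` neighbourhood radius are all assumptions of this sketch, not details confirmed for the proposed algorithm.

```python
from typing import Callable, Dict, Set


def merge_by_attraction(
    clusters: Dict[int, Set[int]],
    rate: Dict[int, float],
    centre_dist: Callable[[int, int], float],
    max_dist: float,
) -> Dict[int, Set[int]]:
    """Sketch of Eq. (24).

    clusters:    PCluster, mapping a centre id (PCentre_q) to its member ids.
    rate:        DensityAttractionRate of each centre, assumed precomputed.
    centre_dist: distance between two centres (e.g. from the Step 13 matrix).
    max_dist:    hypothetical neighbourhood radius for considering a merge.
    """
    merged = {c: set(m) for c, m in clusters.items()}  # leave the input untouched
    # Assumption: clusters with the lowest attraction rate are considered first.
    for r in sorted(merged, key=lambda c: rate[c]):
        if len(merged) == 1:
            break
        # The nearest remaining centre acts as the neighbour PCentre_q of PCentre_r.
        q = min((c for c in merged if c != r), key=lambda c: centre_dist(c, r))
        if centre_dist(q, r) <= max_dist and rate[q] > rate[r]:
            # PCluster_q <- PCluster_q ∪ PCluster_r; popping r also removes
            # PCluster_r from PCluster and PCentre_r from PCentre.
            merged[q] |= merged.pop(r)
    return merged
```

In this sketch a merged cluster keeps the centre and rate of the absorbing cluster q; recomputing them after each merge would be an equally plausible design.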
clustering result. On this dataset, the proposed algorithm shows better clustering quality, with a
Davies-Bouldin (DB) index of 2.337, as shown in Table 5. As the number of clusters increases, the
spatio-temporal distances between clusters decrease, forming strong clusters.
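For reference, the Davies-Bouldin index averages, over all clusters, the worst ratio of within-cluster scatter to between-centroid separation; lower values indicate better-separated clusters. The sketch below uses Euclidean distance to centroids, as in the standard definition; whether the reported value was computed with this distance or with the spatio-temporal distance is not stated here. scikit-learn's `sklearn.metrics.davies_bouldin_score` computes the same Euclidean variant.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def davies_bouldin(clusters: Sequence[Sequence[Point]]) -> float:
    """Davies-Bouldin index with Euclidean distance (lower is better)."""
    k = len(clusters)
    if k < 2:
        raise ValueError("need at least two clusters")
    cents: List[Point] = [_centroid(c) for c in clusters]
    # S_i: mean distance of the members of cluster i to its centroid.
    scatter = [sum(math.dist(p, cents[i]) for p in c) / len(c) for i, c in enumerate(clusters)]
    total = 0.0
    for i in range(k):
        # D_i: worst (largest) ratio (S_i + S_j) / M_ij over all other clusters j.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k
```

Two tight, well-separated clusters such as `[(0, 0), (2, 0)]` and `[(10, 0), (12, 0)]` give a small index (0.2), because each scatter is 1 while the centroids are 10 apart.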
Figure 3. Result of the proposed clustering algorithm on the earthquake dataset, showing the distinct density of
clusters by magnitude and depth over time
Figure 4. Result of the proposed clustering algorithm on the earthquake dataset, showing the trend of events with
cluster size
Figure 5. Result of STK-means on the Indian subcontinent earthquake dataset, producing seven clusters
Figure 6. Result of the proposed clustering algorithm on the Indian earthquake dataset, producing seven clusters
5. CONCLUSION
This paper proposes a novel and adaptive clustering method, which has been experimentally evaluated on a real,
standard earthquake dataset of the Indian subcontinent. The clustering technique uses grid and density-based
partitioning of data instances. Restricting the analysis to the effects of space and time shows that high-intensity
events are followed by weaker events in the same cluster region, which can be attributed to aftershocks. Our
proposed method finds distinct, non-overlapping, arbitrarily shaped clusters in spatio-temporal data while reducing
the outlier ratio and the amount of distance computation by taking advantage of the grid structure. A silhouette
index of about 0.93 indicates a good clustering result. Although the proposed spatio-temporal clustering method was
evaluated on an earthquake dataset, it can be applied to other spatio-temporal datasets to study their dynamics. In
future work, we aim to minimize the number of parameters required by the method.
REFERENCES
[1] S. Meshram and K. P. Wagh, “Mining intelligent spatial clustering patterns: A comparative analysis of different approaches,” in
Proceedings of the 2021 8th International Conference on Computing for Sustainable Global Development, INDIACom 2021, 2021,
pp. 325–330, doi: 10.1109/INDIACom51348.2021.00056.
[2] Y. Zheng, L. Capra, O. Wolfson, and H. Yang, “Urban Computing,” ACM Transactions on Intelligent Systems and Technology,
vol. 5, no. 3, pp. 1–55, Oct. 2014, doi: 10.1145/2629592.
[3] F. Hu et al., “ClimateSpark: An in-memory distributed computing framework for big climate data analytics,” Computers and
Geosciences, vol. 115, pp. 154–166, 2018, doi: 10.1016/j.cageo.2018.03.011.
[4] D. Berndt and J. Clifford, “Using dynamic time warping to find patterns in time series,” in Proceedings of the 3rd International
Conference on Knowledge Discovery and Data Mining, 1994, vol. 398, pp. 359–370.
[5] M. Vlachos, G. Kollios, and D. Gunopulos, “Discovering similar multidimensional trajectories,” in Proceedings 18th International
Conference on Data Engineering, 2002, pp. 673–684, doi: 10.1109/ICDE.2002.994784.
[6] L. Chen, M. T. Özsu, and V. Oria, “Robust and fast similarity search for moving object trajectories,” in Proceedings of the 2005
ACM SIGMOD International Conference on Management of Data, Jun. 2005, pp. 491–502, doi: 10.1145/1066157.1066213.
[7] B. Guan, L. Liu, and J. Chen, “Using relative distance and hausdorff distance to mine trajectory clusters,” TELKOMNIKA
Indonesian Journal of Electrical Engineering, vol. 11, no. 1, 2013, doi: 10.11591/telkomnika.v11i1.1877.
[8] M. M. Fréchet, “Sur quelques points du calcul fonctionnel,” Rendiconti del Circolo Matematico di Palermo, vol. 22, no. 1, pp. 1–
72, 1906, doi: 10.1007/BF03018603.
[9] Z. Cheng, L. Jiang, D. Liu, and Z. Zheng, “Density based spatio-temporal trajectory clustering algorithm,” International Geoscience
and Remote Sensing Symposium (IGARSS), vol. 2018, pp. 3358–3361, 2018, doi: 10.1109/IGARSS.2018.8517434.
[10] J. Du and L. Aultman-Hall, “Increasing the accuracy of trip rate information from passive multi-day GPS travel datasets: Automatic
trip end identification issues,” Transportation Research Part A: Policy and Practice, vol. 41, no. 3, pp. 220–232, 2007, doi:
10.1016/j.tra.2006.05.001.
[11] N. Pelekis, I. Kopanakis, I. Ntoutsi, G. Marketos, and Y. Theodoridis, “Mining trajectory databases via a suite of distance operators,”
in 2007 IEEE 23rd International Conference on Data Engineering Workshop, Apr. 2007, pp. 575–584, doi:
10.1109/ICDEW.2007.4401043.
[12] D. Birant and A. Kut, “ST-DBSCAN: An algorithm for clustering spatial-temporal data,” Data and Knowledge Engineering, vol.
60, no. 1, pp. 208–221, 2007, doi: 10.1016/j.datak.2006.01.013.
[13] M. G. Doborjeh and N. Kasabov, “Dynamic 3D clustering of spatio-temporal brain data in the NeuCube spiking neural network
architecture on a case study of fMRI data,” in Neural Information Processing, vol. 9492, 2015, pp. 191–198, doi: 10.1007/978-3-
319-26561-2_23.
[14] J.-X. Guo, T. Liu, X.-J. Qi, J. Chen, and S.-Y. Ai, “Application of spatio-temporal scanning in the analysis of spatio-temporal
clusters of foodborne diseases in Zhejiang Province,” Chinese Preventive Medicine, vol. 21, no. 11, pp. 1171–1177, 2020, doi:
10.16506/j.1009-6639.2020.11.003.
[15] J. Lizundia-Loiola, G. Otón, R. Ramo, and E. Chuvieco, “A spatio-temporal active-fire clustering approach for global burned area mapping
at 250 m from MODIS data,” Remote Sensing of Environment, vol. 236, Jan. 2020, doi: 10.1016/j.rse.2019.111493.
[16] S. Gong, J. Cartlidge, R. Bai, Y. Yue, Q. Li, and G. Qiu, “Extracting activity patterns from taxi trajectory data: a two-layer
framework using spatio-temporal clustering, Bayesian probability and Monte Carlo simulation,” International Journal of
Geographical Information Science, vol. 34, no. 6, pp. 1210–1234, 2020, doi: 10.1080/13658816.2019.1641715.
[17] Y. Yang, J. Cai, H. Yang, J. Zhang, and X. Zhao, “TAD: A trajectory clustering algorithm based on spatial-temporal density
analysis,” Expert Systems with Applications, vol. 139, Jan. 2020, doi: 10.1016/j.eswa.2019.112846.
[18] A. I. J. Tostes, F. D. L. P. Duarte-Figueiredo, R. Assunção, J. Salles, and A. A. F. Loureiro, “From data to knowledge,” in
Proceedings of the 2nd ACM SIGKDD International Workshop on Urban Computing, Aug. 2013, pp. 1–8, doi:
10.1145/2505821.2505831.
[19] A. Muñoz-Villamizar, E. L. Solano-Charris, M. AzadDisfany, and L. Reyes-Rubiano, “Study of urban-traffic congestion based on
Google Maps API: the case of Boston,” IFAC-PapersOnLine, vol. 54, no. 1, pp. 211–216, 2021, doi: 10.1016/j.ifacol.2021.08.079.
[20] G. Georgoulas, A. Konstantaras, E. Katsifarakis, C. D. Stylios, E. Maravelakis, and G. J. Vachtsevanos, “‘Seismic-mass’ density-
based algorithm for spatio-temporal clustering,” Expert Systems with Applications, vol. 40, no. 10, pp. 4183–4189, 2013, doi:
10.1016/j.eswa.2013.01.028.
[21] N. Nazia, J. Law, and Z. A. Butt, “Spatiotemporal clusters and the socioeconomic determinants of COVID-19 in Toronto
neighbourhoods, Canada,” Spatial and Spatio-temporal Epidemiology, vol. 43, Nov. 2022, doi: 10.1016/j.sste.2022.100534.
[22] S. Deb and S. Karmakar, “A novel spatio-temporal clustering algorithm with applications on COVID-19 data from the United
States,” Computational Statistics and Data Analysis, vol. 188, 2023, doi: 10.1016/j.csda.2023.107810.
[23] J. Sodoge, C. Kuhlicke, and M. M. de Brito, “Automatized spatio-temporal detection of drought impacts from newspaper articles
using natural language processing and machine learning,” SSRN Electronic Journal, 2022, doi: 10.2139/ssrn.4178096.
[24] X. Liu and D. Lv, “Spatial and temporal characteristics, spatial clustering and governance strategies for regional development of
social enterprises in China,” Heliyon, vol. 10, no. 4, Feb. 2024, doi: 10.1016/j.heliyon.2024.e26246.
[25] F. Y. Foo, N. A. Rahman, F. Z. S. Abdullah, and N. S. A. Naeeim, “Spatio-temporal clustering analysis of COVID-19 cases in
Johor,” Infectious Disease Modelling, vol. 9, no. 2, pp. 387–396, 2024, doi: 10.1016/j.idm.2024.01.009.
[26] H. Peng, W. Li, C. Jin, H. Yang, and J. Guan, “MuSTC: A multi-stage spatio–temporal clustering method for uncovering the
regionality of global SST,” Atmosphere, vol. 14, no. 9, 2023, doi: 10.3390/atmos14091358.
[27] N. Madyavanhu, M. D. Shekede, S. Kusangaya, D. M. Pfukenyi, S. Chikerema, and I. Gwitira, “Bovine anaplasmosis in Zimbabwe:
spatio-temporal distribution and environmental drivers,” Veterinary Quarterly, vol. 44, no. 1, pp. 1–16, 2024, doi:
10.1080/01652176.2024.2306210.
[28] M. M. Nemukula, C. Sigauke, H. Chikoore, and A. Bere, “Modelling drought risk using bivariate spatial extremes: application to
the Limpopo Lowveld Region of South Africa,” Climate, vol. 11, no. 2, 2023, doi: 10.3390/cli11020046.
[29] S. Shivakumar et al., “Examining leopard attacks: spatio-temporal clustering of human injuries and deaths in Western Himalayas,
India,” Frontiers in Conservation Science, vol. 4, 2023, doi: 10.3389/fcosc.2023.1157067.
[30] R. Tang, G. Hou, and R. Du, “Isolated or colocated? exploring the spatio-temporal evolution pattern and influencing factors of the
attractiveness of residential areas to restaurants in the central Urban Area,” ISPRS International Journal of Geo-Information, vol.
12, no. 5, May 2023, doi: 10.3390/ijgi12050202.
[31] K. Tripathi, “The novel hierarchical clustering approach using self-organizing map with optimum dimension selection,” Health
Care Science, vol. 3, no. 2, pp. 88–100, 2024, doi: 10.1002/hcs2.90.
[32] T. Z. Nigussie, T. T. Zewotir, and E. K. Muluneh, “Detection of temporal, spatial and spatiotemporal clustering of malaria incidence
in northwest Ethiopia, 2012–2020,” Scientific Reports, vol. 12, no. 1, 2022, doi: 10.1038/s41598-022-07713-3.
[33] X. Yu et al., “Epidemiological characteristics and spatio-temporal analysis of brucellosis in Shandong Province, 2015–2021,” BMC
Infectious Diseases, vol. 23, no. 1, 2023, doi: 10.1186/s12879-023-08503-6.
[34] National Center for Seismology, “Seismological data: earthquake catalogue, Aug. 2019 to Jan. 2024,” National Center for
Seismology, Ministry of Earth Sciences, Government of India. [Online]. Available:
https://riseq.seismo.gov.in/riseq/earthquake/archive
BIOGRAPHIES OF AUTHORS