Sciendo

An approach to speeding up the DBSCAN algorithm is suggested. The planar clusters to be revealed are assumed to be tightly packed and correlated, thus constituting a serpentine dataset that develops rightwards or leftwards as time goes on. The dataset is first divided into a few sub-datasets along the time axis. The best neighbourhood radius is then determined over the first sub-dataset, and the standard DBSCAN algorithm is run over all the sub-datasets with that radius. Finding the best neighbourhood radius requires ground truth cluster labels of points within a region. The actual speedup registered in a series of 80 000 dataset computational simulations ranges from 5.0365 to 724.7633 and tends to increase as the dataset size increases.
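
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. The function names, the candidate radius list, the `min_pts` value, the use of the Rand index to score a radius against ground truth, and the per-chunk label offsetting are all assumptions made for illustration. The first coordinate of each point is assumed to be time. The toy DBSCAN compares all pairs of points, so k chunks of n/k points each need roughly n²/k distance evaluations instead of n².

```python
from itertools import combinations
from math import dist

def dbscan(points, eps, min_pts):
    """Plain all-pairs DBSCAN; returns one label per point, -1 meaning noise."""
    n = len(points)
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]  # each point counts as its own neighbour
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisional noise, may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])
    return labels

def rand_index(a, b):
    """Share of point pairs on which two labelings agree (assumed scoring rule)."""
    pairs = list(combinations(range(len(a)), 2))
    return sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs) / len(pairs)

def split_by_time(points, k):
    """Index chunks of roughly equal size, ordered by the time coordinate."""
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    size = -(-len(order) // k)  # ceiling division
    return [order[s:s + size] for s in range(0, len(order), size)]

def chunked_dbscan(points, truth, k, radii, min_pts):
    """Pick the radius on the first chunk, then run DBSCAN on every chunk."""
    chunks = split_by_time(points, k)
    first = [points[i] for i in chunks[0]]
    first_truth = [truth[i] for i in chunks[0]]  # only these labels are read
    best_eps = max(radii,
                   key=lambda r: rand_index(dbscan(first, r, min_pts), first_truth))
    labels = [-1] * len(points)
    offset = 0
    for chunk in chunks:
        local = dbscan([points[i] for i in chunk], best_eps, min_pts)
        for i, lab in zip(chunk, local):
            labels[i] = lab + offset if lab >= 0 else -1
        offset += max(local, default=-1) + 1  # keep cluster ids distinct per chunk
    return best_eps, labels

# Toy data: two parallel bands developing along the time axis.
points = [(float(t), 0.0) for t in range(20)] + [(float(t), 5.0) for t in range(20)]
truth = [0] * 20 + [1] * 20
best_eps, labels = chunked_dbscan(points, truth, k=2,
                                  radii=[0.5, 1.5, 6.0], min_pts=3)
print(best_eps, sorted(set(labels)))
```

In this sketch each chunk receives its own cluster identifiers, so one serpentine cluster that spans several chunks appears under several labels. Stitching such pieces across chunk boundaries is not covered by the abstract and is left out here.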

eISSN: 2255-8691
Language: English

Publication timeframe: 2 times per year
Journal Subjects: Computer Sciences, Artificial Intelligence, Information Technology, Project Management, Software Development

DBSCAN Speedup for Time-Serpentine Datasets

Published Online: Aug 15, 2024

Page range: 14 - 23

Received: Apr 12, 2024

Accepted: Jul 10, 2024

DOI: https://doi.org/10.2478/acss-2024-0003

Keywords
Clustering, DBSCAN, large dataset, serpentine cluster, speedup

© 2024 Vadim Romanuke, published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 International License.