Cluster-and-Conquer: When Randomness Meets Graph Locality

Giakkoupis, George; Kermarrec, Anne-Marie; Ruas, Olivier; Taïani, François


arXiv:2010.11497 (cs)

[Submitted on 22 Oct 2020]

Title: Cluster-and-Conquer: When Randomness Meets Graph Locality

Authors: George Giakkoupis (WIDE), Anne-Marie Kermarrec (EPFL), Olivier Ruas (SPIRALS), François Taïani (WIDE, IRISA)


Abstract: K-Nearest-Neighbors (KNN) graphs are central to many emblematic data mining and machine-learning applications. Some of the most efficient KNN graph algorithms are incremental and local: they start from a random graph, which they incrementally improve by traversing neighbors-of-neighbors links. Paradoxically, this random start is also one of the key weaknesses of these algorithms: nodes are initially connected to dissimilar neighbors that lie far away according to the similarity metric. As a result, incremental algorithms must first laboriously explore spurious potential neighbors before they can identify similar nodes and start converging. In this paper, we remove this drawback with Cluster-and-Conquer (C² for short). Cluster-and-Conquer boosts the starting configuration of greedy algorithms thanks to a novel lightweight clustering mechanism, dubbed FastRandomHash. FastRandomHash leverages randomness and recursion to pre-cluster similar nodes at a very low cost. Our extensive evaluation on real datasets shows that Cluster-and-Conquer significantly outperforms existing approaches, including LSH, yielding speed-ups of up to ×4.42 while incurring only a negligible loss in terms of KNN quality.
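
As a rough illustration of the incremental, local strategy the abstract describes, the Python sketch below starts from a random graph and repeatedly replaces each node's neighbors with the most similar nodes found among its neighbors and neighbors-of-neighbors. It is not Cluster-and-Conquer or FastRandomHash, whose details are not given on this page. The Jaccard similarity over item sets, the names `jaccard` and `greedy_knn`, and the `rounds` and `seed` parameters are all assumptions made for the example.

```python
import random


def jaccard(a, b):
    """Jaccard similarity between two sets (an assumed similarity metric)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def greedy_knn(profiles, k, rounds=10, seed=0):
    """Refine a random k-neighbor graph via neighbors-of-neighbors links.

    profiles maps a node id to a set of items; requires k < len(profiles).
    Hypothetical helper for illustration, not the paper's algorithm.
    """
    rng = random.Random(seed)
    nodes = list(profiles)
    # Random start: every node gets k arbitrary neighbors other than itself.
    graph = {
        u: set(rng.sample([v for v in nodes if v != u], k)) for u in nodes
    }
    for _ in range(rounds):
        changed = False
        for u in nodes:
            # Candidates are the current neighbors plus their own neighbors.
            candidates = set(graph[u])
            for v in graph[u]:
                candidates |= graph[v]
            candidates.discard(u)
            # Keep the k most similar candidates (ties broken by node id).
            best = sorted(
                candidates,
                key=lambda v: (-jaccard(profiles[u], profiles[v]), v),
            )[:k]
            if set(best) != graph[u]:
                graph[u] = set(best)
                changed = True
        if not changed:
            break  # No neighborhood changed: the graph has converged.
    return graph
```

Calling `greedy_knn(profiles, k=3, rounds=0)` returns the untouched random start, which makes it easy to compare the initial and refined graphs and to see how many rounds are spent escaping dissimilar initial neighbors.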

Subjects:	Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
Cite as:	arXiv:2010.11497 [cs.DB]
	(or arXiv:2010.11497v1 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.2010.11497

Submission history

From: Olivier Ruas [via CCSD proxy]
[v1] Thu, 22 Oct 2020 07:31:12 UTC (211 KB)
