Where am I? Cross-View Geo-localization with Natural Language Descriptions

Ye, Junyan; Lin, Honglin; Ou, Leyan; Chen, Dairong; Wang, Zihao; Zhu, Qi; He, Conghui; Li, Weijia

arXiv:2412.17007 (cs)

[Submitted on 22 Dec 2024 (v1), last revised 1 Apr 2025 (this version, v2)]

Abstract: Cross-view geo-localization identifies the locations of street-view images by matching them with geo-tagged satellite images or OpenStreetMap (OSM) data. However, most existing studies focus on image-to-image retrieval, and fewer address text-guided retrieval, a task vital for applications such as pedestrian navigation and emergency response. In this work, we introduce a novel task for cross-view geo-localization with natural language descriptions, which aims to retrieve the corresponding satellite images or OSM data from scene text descriptions. To support this task, we construct the CVG-Text dataset by collecting cross-view data from multiple cities and employing a scene text generation approach that leverages the annotation capabilities of Large Multimodal Models to produce high-quality scene text descriptions with localization details. Additionally, we propose a novel text-based retrieval localization method, CrossText2Loc, which improves recall by 10% and demonstrates excellent long-text retrieval capabilities. In terms of explainability, it provides not only similarity scores but also retrieval reasons. More information can be found at this https URL.
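As a rough illustration of the retrieval setting and the recall metric mentioned in the abstract, the sketch below ranks a gallery of image embeddings against text-query embeddings by cosine similarity and computes Recall@K. It is not the CrossText2Loc method: the embeddings are toy vectors, the function names (`cosine`, `rank_gallery`, `recall_at_k`) are hypothetical, and the abstract does not state which K the reported 10% improvement refers to.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_gallery(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


def recall_at_k(queries, gallery, truth, k):
    """Fraction of queries whose ground-truth gallery index appears in the top k results."""
    hits = sum(
        1 for q, t in zip(queries, truth) if t in rank_gallery(q, gallery)[:k]
    )
    return hits / len(queries)


# Toy data (assumed for illustration): three "satellite image" embeddings
# and three "text description" embeddings in a shared 2-D space.
gallery = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.0]]
truth = [0, 1, 2]  # truth[i] is the gallery index that matches queries[i]

if __name__ == "__main__":
    for k in (1, 2):
        print(f"Recall@{k} = {recall_at_k(queries, gallery, truth, k):.3f}")
```

In this toy setup the third query is deliberately closer to the wrong gallery item, so it counts as a miss at K=1 and a hit at K=2, which shows how Recall@K rewards a correct match anywhere in the top K.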

Comments:	11 pages, 6 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2412.17007 [cs.CV]
	(or arXiv:2412.17007v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.17007

Submission history

From: Weijia Li
[v1] Sun, 22 Dec 2024 13:13:10 UTC (10,117 KB)
[v2] Tue, 1 Apr 2025 02:48:45 UTC (22,383 KB)
