
MegaLoc: One Retrieval to Place Them All

Gabriele Berton          Carlo Masone
Polytechnic of Turin     Polytechnic of Turin
[email protected]

arXiv:2502.17237v1 [cs.CV] 24 Feb 2025

Abstract

Retrieving images from the same location as a given query is an important component of multiple computer vision tasks, like Visual Place Recognition, Landmark Retrieval, Visual Localization, 3D reconstruction, and SLAM. However, existing solutions are built to work specifically for one of these tasks, and are known to fail when the requirements slightly change or when they meet out-of-distribution data. In this paper we combine a variety of existing methods, training techniques, and datasets to train a retrieval model, called MegaLoc, that is performant on multiple tasks. We find that MegaLoc (1) achieves state of the art on a large number of Visual Place Recognition datasets, (2) obtains impressive results on common Landmark Retrieval datasets, and (3) sets a new state of the art for Visual Localization on the LaMAR datasets, where we only changed the retrieval method in the existing localization pipeline. The code for MegaLoc is available at https://github.com/gmberton/MegaLoc

Figure 1. Qualitative examples of predictions by MegaLoc. Each pair of images represents a query and its top-1 prediction from the SF-XL dataset, searched across the 2.8M-image database spanning 150 km² of San Francisco. Predictions in green are correct, red are wrong.

1. Introduction
This paper tackles the task of retrieving images from a large database that represent the same place as a given query image. But what does it mean for two images to be "from the same place"? Depending on who you ask, you'll get different answers:
1. Landmark Retrieval (LR) folks will tell you that two photos are from the same place if they depict the same landmark, regardless of how close to each other the two photos were taken [40];
2. Visual Place Recognition (VPR) people set a camera pose distance of 25 meters to define if two images are positives (i.e. from the same place) [4];
3. Visual Localization (VL) / 3D Vision researchers will tell you that two images need to have their poses as close as possible to be considered the same place.
Even though image retrieval is a core component in all three tasks, their different definitions and requirements have inevitably led to the development of ad-hoc image retrieval solutions for each of them. As these three tasks continued to diverge, over the years papers have avoided showing results of their methods on more than one of these tasks: VPR papers don't show results on LR, and LR papers don't show results on VPR. In the meantime, 3D vision pipelines like COLMAP [30], Hierarchical Localization [28] and GLOMAP [22] keep using outdated retrieval methods, like RootSIFT with bag-of-words [3, 10, 32] and NetVLAD [4]. In this paper we aim to put an end to this, by training a single model that achieves SOTA (or almost) on all of these tasks, showcasing robustness across diverse domains. To train this model we do not propose any "technical novelty"; instead, we use the lessons learned from all three tasks, putting together a combination of good samplers, datasets, and general training techniques.

"Why does it matter?", you may ask. Imagine you are doing 3D reconstruction, where image retrieval is a fundamental component, on a collection of diverse scenes (e.g. to create datasets like MegaDepth [18], MegaScenes [37], or for the evergreen Image Matching Challenge [6]). In some cases there would be small scenes (e.g. the reconstruction of a fountain), requiring a retrieval model that is able to retrieve nearby images (a few meters away), which is something VPR models excel at, but LR models underperform on (see [8] Tab. 14). In other cases however, the scene might be large (e.g. a big landmark like a church), with images hundreds of meters away: while LR models are designed for this, VPR models achieve poor results in these situations (see Sec. 3). Given these considerations, we note how neither VPR nor LR provide models for the diverse cases of 3D reconstruction, creating a gap in the literature that is filled by MegaLoc.
As another example where a model like MegaLoc is necessary, one can think of Visual Place Recognition (which is also the first step for Visual Localization), where models are evaluated using a 25 meters threshold (and queries in popular datasets always have at least one positive within 25 meters). However, in the real world the nearest image to a given query might be 100 meters away, and while ideally we would still want to retrieve it, a VPR model is unlikely to work in such a case, as it has been trained to ignore anything further away from the camera.
In this paper we demonstrate that, by leveraging a diverse set of data sources and best practices from LR, VPR and VL, we obtain a single image retrieval model that works well across all these tasks. Our model is called MegaLoc and it is released at https://github.com/gmberton/MegaLoc

2. Method

The core idea of this paper is to fuse data from multiple datasets and train a single model. We use five datasets containing both outdoor and indoor images and catering to different image localization tasks: GSV-Cities [1], Mapillary Street-Level Sequences (MSLS) [39], MegaScenes [37], ScanNet [13] and San Francisco eXtra Large (SF-XL) [7]. At each training iteration, we extract six sub-batches of data, one for each dataset (except SF-XL, from which two sub-batches are sampled) and use a multi-similarity loss [38] computed over each sub-batch. Each sub-batch is made of 128 images, i.e. quadruplets of 4 images from each of 32 different places/classes. Given that these datasets have diverse formats, they require different sampling techniques. In the following paragraphs we explain how data is sampled from each dataset; a minimal sketch of how each sub-batch feeds the loss is shown right below.
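As a concrete illustration, here is a minimal sketch of how a single 32-places × 4-images sub-batch could be scored, assuming the MultiSimilarityLoss implementation from the pytorch-metric-learning package (the paper cites the loss [38] but not a specific implementation; the function and variable names below are ours, not the release code):

```python
# Sketch (not the authors' code): scoring one sub-batch of 128 images
# (32 places x 4 images) with a multi-similarity loss [38].
import torch
from pytorch_metric_learning.losses import MultiSimilarityLoss

loss_fn = MultiSimilarityLoss()

def sub_batch_loss(model: torch.nn.Module,
                   images: torch.Tensor,       # (128, 3, 224, 224)
                   place_ids: torch.Tensor):   # (128,), 32 unique class IDs
    descriptors = model(images)                # (128, descriptor_dim)
    # Images sharing a place_id are pulled together, others pushed apart.
    return loss_fn(descriptors, place_ids)
```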
San Francisco eXtra Large (SF-XL) is a dataset of 41M images with GPS and orientation from 12 different years, densely covering the entire city of San Francisco across time. To select ideal quadruplets for training, we use the sampling technique presented in EigenPlaces [9]. This method ensures that each class contains images that represent a given place from diverse perspectives, while ensuring that no visual overlap exists between two different places. EigenPlaces provides two sub-batches, one made of frontal-facing images (i.e. with the camera facing straight along the street) and one of lateral-facing images.

Google Street View Cities (GSV-Cities) is a dataset of 530k images split into 62k places/classes from 40 cities, where each class contains at least 4 images with the same orientation and is at least 100 meters from any other class. Given that GSV-Cities is already split into non-overlapping classes, it is not strictly necessary to apply a particular sampling technique. We therefore directly feed the GSV-Cities dataset to the multi-similarity loss, as in the original GSV-Cities paper [1].

Mapillary Street-Level Sequences (MSLS) is a dataset of 1.6M images split in contiguous sequences, across 30 different cities over 9 years. To ideally sample data from the MSLS dataset, we use the mining technique described in the CliqueMining paper [33]. This method ensures that the places selected for each batch depict visually similar (but geographically different) places (i.e. hard negatives), so that the loss can be as high as possible and effectively teach the model to disambiguate between similar-looking places.

MegaScenes is a collection of 100k 3D structure-from-motion reconstructions, composed of 2M images from Wikimedia Commons. Simply using each reconstruction as a class, and sampling random images from such a class, could lead to images that do not have any visual overlap; e.g. two images could show opposite facades of a building, therefore having no visual overlap while belonging to the same 3D reconstruction. Therefore we make sure that when we sample a set of four images from a given reconstruction, each of these four images has visual overlap with each of the others (we define visual overlap as having at least 1% of 3D points in common in the 3D reconstruction); a sketch of this check is shown below.
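A covisibility check along these lines could look like the following sketch, assuming each image has been mapped to the set of 3D point IDs it observes in the reconstruction (e.g. parsed from a COLMAP model). The paper only states the 1% threshold; measuring it against the smaller image's point set is our assumption:

```python
# Sketch of the 1%-covisibility rule (our reading of the text, not the
# authors' code). Each element is the set of 3D point IDs seen by one image.
from itertools import combinations

def has_visual_overlap(points_a: set, points_b: set, min_ratio: float = 0.01) -> bool:
    smaller = min(len(points_a), len(points_b))
    return smaller > 0 and len(points_a & points_b) / smaller >= min_ratio

def is_valid_quadruplet(per_image_points: list) -> bool:
    # Every pair within the 4 sampled images must share >= 1% of 3D points.
    return all(has_visual_overlap(a, b)
               for a, b in combinations(per_image_points, 2))
```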
within a quadruplet has visual overlap (i.e. less than 10 me-
niques. In the following paragraphs we explain how data is
ters and 30° apart); simultaneously we ensure that no two
sampled from each dataset.
images from different quadruplets has visual overlap.
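The "less than 10 meters and 30° apart" rule can be sketched as follows, assuming camera-to-world poses given as a 3×3 rotation matrix and a 3D position (the paper gives only the thresholds; the exact formulation is our assumption):

```python
# Sketch (not the authors' code) of the pose-based visual-overlap test
# used on ScanNet: cameras must be < 10 m and < 30 degrees apart.
import numpy as np

def poses_overlap(R1: np.ndarray, t1: np.ndarray,
                  R2: np.ndarray, t2: np.ndarray,
                  max_dist_m: float = 10.0, max_angle_deg: float = 30.0) -> bool:
    if np.linalg.norm(t1 - t2) >= max_dist_m:
        return False
    # Angle of the relative rotation, from the trace of R1^T R2.
    cos_angle = (np.trace(R1.T @ R2) - 1.0) / 2.0
    angle_deg = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return angle_deg < max_angle_deg
```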

3. Experiments

3.1. Implementation details

Method             Desc.   Baidu [34]   Eynsham [8, 12]   MSLS val [39]   Pitts250k [4, 14]   Pitts30k [4, 14]   SF-XL v1 [7]   SF-XL v2 [7]   SF-XL night [5]   SF-XL occlusion [5]   Tokyo 24/7 [36]
                   Dim.    R1    R10    R1    R10         R1    R10       R1    R10           R1    R10          R1    R10      R1    R10      R1    R10         R1    R10             R1    R10
NetVLAD [4] 4096 69.0 95.0 77.7 90.5 54.5 70.4 85.9 95.0 85.0 94.4 40.1 57.7 76.9 91.1 6.7 14.2 9.2 22.4 69.8 82.9
AP-GeM [27] 2048 59.8 90.8 68.3 84.0 56.0 72.9 80.0 93.5 80.7 94.1 37.9 54.1 66.4 84.6 7.5 16.7 5.3 14.5 57.5 77.5
CosPlace [7] 2048 52.0 80.4 90.0 94.9 85.0 92.6 92.3 98.4 90.9 96.7 76.6 85.5 88.8 96.8 23.6 32.8 30.3 44.7 87.3 95.6
MixVPR [2] 4096 71.9 94.7 89.6 94.4 83.2 91.9 94.3 98.9 91.6 96.4 72.5 80.9 88.6 95.0 19.5 30.5 30.3 38.2 87.0 94.0
EigenPlaces [9] 2048 69.1 91.9 90.7 95.4 85.9 93.1 94.1 98.7 92.5 97.6 84.0 90.7 90.8 96.7 23.6 34.5 32.9 52.6 93.0 97.5
AnyLoc [17] 49152 75.6 95.2 85.0 94.1 58.7 74.5 89.4 98.0 86.3 96.7 - - - - - - - - 87.6 97.5
Salad [16] 8448 72.7 93.6 91.6 95.9 88.2 95.0 95.0 99.2 92.3 97.4 88.7 94.4 94.6 98.2 46.1 62.4 50.0 68.4 94.6 98.1
CricaVPR [20] 10752 65.6 93.2 88.0 94.3 76.7 87.2 92.6 98.3 90.0 96.7 62.6 78.9 86.3 96.0 25.8 40.6 27.6 47.4 82.9 93.7
CliqueMining [33] 8448 72.9 92.7 91.9 96.2 91.6 95.9 95.3 99.2 92.6 97.8 85.5 92.6 94.5 98.3 46.1 60.9 44.7 64.5 96.8 97.8
MegaLoc (Ours) 8448 87.7 98.0 92.6 96.8 91.0 95.8 96.4 99.3 94.1 98.2 95.3 98.0 94.8 98.5 52.8 73.8 51.3 75.0 96.5 99.4

Table 1. Recall@1 and Recall@10 on multiple VPR datasets. Best overall results on each dataset are in bold, second best results
underlined. Results marked with a “-” did not fit in 480GB of RAM (2.8M features of 49k dimensions require 560GB for a float32-based
kNN).

Method                       CAB (Phone)         HGE (Phone)         LIN (Phone)         CAB (HoloLens)      HGE (HoloLens)      LIN (HoloLens)
                             (1°,0.1m) (5°,1m)   (1°,0.1m) (5°,1m)   (1°,0.1m) (5°,1m)   (1°,0.1m) (5°,1m)   (1°,0.1m) (5°,1m)   (1°,0.1m) (5°,1m)
NetVLAD 43.4 54.0 54.8 80.0 74.4 87.8 63.1 81.4 57.9 71.6 76.1 83.0
AP-GeM 39.4 52.0 58.0 81.3 69.1 82.0 62.9 82.5 65.6 76.6 80.7 91.1
Fusion (NetVLAD+AP-GeM) 41.4 53.8 56.3 82.4 76.0 89.4 63.2 83.1 63.1 75.1 78.5 87.0
CosPlace 29.0 37.4 54.4 81.3 63.3 75.7 56.4 77.8 55.6 69.8 80.6 91.4
MixVPR 40.9 50.8 59.2 83.8 77.5 89.8 65.2 84.7 63.3 74.7 83.6 92.2
EigenPlaces 32.3 44.7 56.3 81.3 70.2 82.6 63.9 81.8 60.2 72.5 84.8 93.1
AnyLoc 48.0 59.8 58.8 83.0 77.2 92.4 69.7 88.5 70.1 81.0 81.4 90.4
Salad 44.2 55.6 65.3 92.2 81.7 94.0 71.5 90.7 75.3 85.2 91.3 99.4
CricaVPR 40.4 52.0 63.7 89.3 80.7 93.1 73.9 90.7 72.5 81.6 89.1 98.4
CliqueMining 44.2 55.6 66.0 91.4 80.5 93.1 74.2 90.9 77.3 86.3 92.0 98.8
MegaLoc (Ours) 47.0 60.4 67.2 92.9 83.3 94.9 77.4 93.4 72.9 83.5 92.2 99.0

Table 2. Results on LaMAR's datasets, computed on each of the three locations, for both types of queries (HoloLens and Phone), which include both indoor and outdoor imagery. For each location we report the recall at (1°, 10cm) and (5°, 1m), following the LaMAR paper [29].

During training, images are resized to 224×224, while for inference we resize them to 322×322, following [16]. We use RandAugment [11] for data augmentation, as in [1], and AdamW [19] as the optimizer. Training is performed for 40k iterations. The loss is simply computed as L = L1 + L2 + L3 + L4 + L5 + L6, where each Ln is the multi-similarity loss computed on one of the six sub-batches.
The architecture consists of a DINO-v2-base backbone [21] followed by a SALAD [16] aggregation layer, which has shown state-of-the-art performance over multiple VPR datasets [16, 33]. The SALAD layer is computed with 64 clusters, 256 channels per cluster, a global token of dimension 256, and an MLP dimension of 512. The SALAD layer is followed by a linear projection (from a dimension of 16640 down to 8448) and an L2 normalization.
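A rough sketch of this pipeline is below; the backbone is the official DINO-v2 torch.hub model, while `salad_layer` stands in for the open-source SALAD implementation [16] (not reproduced here, and the token handling it expects is simplified):

```python
# Sketch of the described architecture (illustrative, not the release code).
import torch
import torch.nn.functional as F

class MegaLocSketch(torch.nn.Module):
    def __init__(self, salad_layer: torch.nn.Module):
        super().__init__()
        # DINO-v2 ViT-B/14 backbone from the official hub entry point [21].
        self.backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14")
        # SALAD aggregation [16]: 64 clusters x 256 channels + a 256-d global
        # token -> 64 * 256 + 256 = 16640 dimensions before projection.
        self.aggregator = salad_layer
        self.projection = torch.nn.Linear(16640, 8448)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.backbone.forward_features(images)  # dict of DINO-v2 tokens
        desc = self.aggregator(feats)                   # (B, 16640)
        desc = self.projection(desc)                    # (B, 8448)
        return F.normalize(desc, p=2, dim=-1)           # final L2 normalization
```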

Memory-efficient GPU training is achieved using PyTorch [23], by ensuring that the computational graph for each loss stays in memory for as little time as possible. In practice (in the code), instead of accumulating the computational graph of each loss into a single giant graph, we compute each loss and perform the backward() operation independently: calling backward() in PyTorch not only computes the gradients (which are added to any existing gradients), but also frees the computational graph (hence freeing memory). The step() (and zero_grad()) methods are then called only once, after the six backward() calls. This simple technique reduces the VRAM requirement of training MegaLoc from (roughly) 300GB to 60GB.
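In code, the trick could look like the following sketch (illustrative names, not the release code): gradients from the six losses accumulate into the same .grad buffers, so a single step() is equivalent to optimizing L = L1 + ... + L6, while each graph is freed as soon as its backward() returns.

```python
# Sketch of the per-loss backward() trick described above.
import torch

def train_iteration(model, optimizer, loss_fn, sub_batches):
    """sub_batches: iterable of six (images, place_ids) pairs."""
    optimizer.zero_grad()
    for images, place_ids in sub_batches:
        loss = loss_fn(model(images), place_ids)
        # Accumulates gradients AND frees this loss's computational graph,
        # so at most one sub-batch graph is alive at any time.
        loss.backward()
    optimizer.step()  # single update with the summed gradients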

3.2. Results

We perform experiments on three different types of tasks:
• Visual Place Recognition, where the task is to retrieve images that are within 25 meters from the query (Sec. 3.2.1);
• Visual Localization, where retrieval is part of a bigger pipeline that aims at finding the precise pose of the query given a set of posed images (Sec. 3.2.2);
• Landmark Retrieval, i.e. retrieving images that depict the same landmark as the query (Sec. 3.2.3).

3.2.1. Visual Place Recognition

We run experiments on a comprehensive set of Visual Place Recognition datasets. These datasets contain a large variety of domains, including: outdoor, indoor, street-view, hand-held camera, car-mounted camera, night, occlusions, long-term changes, and grayscale. Results are shown in Tab. 1 (the evaluation protocol is sketched below). While other high-performing VPR models (like SALAD and CliqueMining) achieve very good results (i.e. comparable to MegaLoc) on most datasets, MegaLoc vastly outperforms every other model on Baidu, which is an indoor-only dataset.
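For reference, the Recall@K protocol of Tab. 1 can be sketched as follows, assuming L2-normalized descriptors and UTM coordinates for queries and database (our illustrative implementation of the common VPR protocol, not the authors' evaluation code; a per-query loop also avoids materializing the huge similarity matrix mentioned in the Tab. 1 caption):

```python
# Sketch of Recall@K with the standard 25 m positive threshold.
import numpy as np

def recall_at_k(db_desc, q_desc, db_utm, q_utm, k=10, threshold_m=25.0):
    hits = 0
    for desc, utm in zip(q_desc, q_utm):
        # For L2-normalized descriptors, dot product == cosine similarity.
        top_k = np.argsort(-(db_desc @ desc))[:k]
        dists = np.linalg.norm(db_utm[top_k] - utm, axis=1)
        hits += bool((dists <= threshold_m).any())
    return hits / len(q_desc)
```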
Figure 2. Failure cases, grouped in 4 categories. Each of the 4 columns represents a category of failure cases: for each category we show 5 examples, each made of 3 images, namely the query and its top-2 predictions with MegaLoc, which can be in red or green depending on whether the prediction is correct (i.e. within 25 meters). The 4 categories that we identified are (1) very difficult cases, which are unlikely to be solved any time soon; (2) difficult cases, which can probably be solved by slightly better models than the current ones or by simple post-processing; (3) incorrect GPS labels, which, surprisingly, exist also in Mapillary and Google StreetView data; (4) predictions just outside the 25m threshold, which despite being considered negatives in VPR, are actually useful predictions for real-world applications.

Method           R-Oxford             R-Paris
                 E     M     H        E     M     H
NetVLAD          24.1  16.1   4.7     61.2  46.3  22.0
AP-GeM           49.6  37.6  19.3     82.5  69.5  45.5
CosPlace         32.1  23.4  10.3     57.6  45.0  22.3
MixVPR           38.2  28.4  10.8     61.9  48.3  25.0
EigenPlaces      29.4  22.9  11.8     60.9  47.3  23.6
AnyLoc           64.2  45.5  18.9     82.8  68.5  48.8
Salad            55.2  42.3  21.4     76.6  66.2  44.8
CricaVPR         57.0  39.2  15.3     80.0  68.9  48.9
CliqueMining     52.2  41.0  22.1     71.8  60.5  41.2
MegaLoc (Ours)   91.0  79.0  62.1     95.3  89.6  77.1

Table 3. Results on Landmark Retrieval datasets, respectively Revisited Oxford 5k [24, 26] and Revisited Paris 6k [25, 26].
3.2.2. Visual Localization

Image retrieval is a core tool to solve 3D vision tasks, in pipelines like visual localization (e.g. Hierarchical Localization [28] and InLoc [35]) and 3D reconstruction (e.g. COLMAP [30, 31] and GLOMAP [22]). To understand if our method can help this use case, we compute results on the three datasets of LaMAR [29], which comprise various challenges, including plenty of visual aliasing from both indoor and outdoor imagery. To do this, we relied on the official LaMAR codebase¹ by simply replacing the retrieval method. Results are reported in Tab. 2.

3.2.3. Landmark Retrieval

For the task of Landmark Retrieval we compute results on the most used datasets in the literature, namely (the revisited versions of [26]) Oxford5k [24] and Paris6k [25]. To do this we relied on the official codebase for the datasets², by simply swapping the retrieval method. Results, reported in Tab. 3, show a large gap between MegaLoc and previous VPR models on this task, which can be explained by the fact that previous models were only optimized for the standard VPR metric of retrieving images within 25 meters of the query.

3.2.4. Failure Cases

We identified 4 main categories of "failure cases" that prevent the results from reaching 100% recall, and we present them in Fig. 2. We note however that, from a practical perspective, the only real failure cases are those depicted in the second category/column of Fig. 2; furthermore, in most similar cases SOTA models (i.e. not only MegaLoc, but also other recent ones) can actually retrieve precise predictions, meaning that these failure cases can likely be solved by simple post-processing techniques (e.g. re-ranking with image matchers, or majority voting).

¹ https://github.com/microsoft/lamar-benchmark
² https://github.com/filipradenovic/revisitop

Finally, another failure case that we noted is when database images do not properly cover the search area: this is very common in the Mapillary (MSLS) dataset, where database images only show one direction (e.g. photos along a road taken from north to south), while the queries are photos facing the other direction. We note however that in the real world this can be easily solved by collecting database images in multiple directions, which is also common in most test datasets, like Eynsham, Pitts30k, Tokyo 24/7 and SF-XL.

4. Conclusion and limitations

So, is image retrieval for localization solved? Well, almost. While some datasets still show some room for improvement, we note that this is often due to arguably unsolvable failure cases, wrong labels, and a very few cases that can be solved by better models. We emphasize however that this has been the case for some time, as previous DINO-v2-based models, like SALAD and CliqueMining, show very high results on classic VPR datasets. What is still missing from the literature is models like MegaLoc that achieve good results on a variety of diverse tasks and domains.

Should you always use MegaLoc? Well, almost, except for at least 3 use-cases. MegaLoc has shown great results on a variety of related tasks, and, unlike other VPR models, achieves good results on landmark retrieval, which makes it a great option also for retrieval in 3D reconstruction tasks, besides standard VPR and visual localization tasks. However, experiments show that MegaLoc is outperformed by CliqueMining on MSLS, which is a dataset made (almost entirely) of forward-facing images (i.e. photos where the camera faces the same direction as the street, instead of facing sideways towards the side of the street). Another use case where MegaLoc is likely to be suboptimal is in very unusual natural environments, like forests or caves, where instead AnyLoc has been shown to work well [17]. A third and final use case where other models might be preferred to MegaLoc is for embedded systems, where one might opt for more lightweight models, like the ResNet-18 [15] versions of CosPlace [7], which have 11M parameters instead of MegaLoc's 228M.
References

[1] Amar Ali-bey, Brahim Chaib-draa, and Philippe Giguère. Gsv-cities: Toward appropriate supervised visual place recognition. Neurocomputing, 513:194–203, 2022.
[2] Amar Ali-bey, Brahim Chaib-draa, and Philippe Giguère. Mixvpr: Feature mixing for visual place recognition. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 2998–3007, 2023.
[3] R. Arandjelović and Andrew Zisserman. Three things everyone should know to improve object retrieval. pages 2911–2918, 2012.
[4] Relja Arandjelović, Petr Gronat, Akihiko Torii, Tomas Pajdla, and Josef Sivic. NetVLAD: CNN architecture for weakly supervised place recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6):1437–1451, 2018.
[5] Giovanni Barbarani, Mohamad Mostafa, Hajali Bayramov, Gabriele Trivigno, Gabriele Berton, Carlo Masone, and Barbara Caputo. Are local features all you need for cross-domain visual place recognition? In CVPRW, pages 6155–6165, 2023.
[6] Fabio Bellavia, Jiri Matas, Dmytro Mishkin, Luca Morelli, Fabio Remondino, Weiwei Sun, Amy Tabb, Eduard Trulls, Kwang Moo Yi, Sohier Dane, and Ashley Chow. Image matching challenge 2024 - hexathlon. https://kaggle.com/competitions/image-matching-challenge-2024, 2024. Kaggle.
[7] Gabriele Berton, Carlo Masone, and Barbara Caputo. Rethinking visual geo-localization for large-scale applications. In IEEE Conference on Computer Vision and Pattern Recognition, pages 4868–4878, 2022.
[8] Gabriele Berton, Riccardo Mereu, Gabriele Trivigno, Carlo Masone, Gabriela Csurka, Torsten Sattler, and Barbara Caputo. Deep visual geo-localization benchmark, 2023.
[9] Gabriele Berton, Gabriele Trivigno, Barbara Caputo, and Carlo Masone. Eigenplaces: Training viewpoint robust models for visual place recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 11080–11090, 2023.
[10] Gabriela Csurka, Christopher Dance, Lixin Fan, Jutta Willamowski, and Cédric Bray. Visual categorization with bags of keypoints. In European Conference on Computer Vision, 2004.
[11] Ekin Dogus Cubuk, Barret Zoph, Jon Shlens, and Quoc Le. Randaugment: Practical automated data augmentation with a reduced search space. In Advances in Neural Information Processing Systems, pages 18613–18624. Curran Associates, Inc., 2020.
[12] M. Cummins and P. Newman. Highly scalable appearance-only slam - FAB-MAP 2.0. In Robotics: Science and Systems, 2009.
[13] Angela Dai, Angel X. Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2017.
[14] Petr Gronát, Guillaume Obozinski, Josef Sivic, and Tomáš Pajdla. Learning and calibrating per-location classifiers for visual place recognition. In 2013 IEEE Conference on Computer Vision and Pattern Recognition, pages 907–914, 2013.
[15] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[16] Sergio Izquierdo and Javier Civera. Optimal transport aggregation for visual place recognition. In IEEE Conference on Computer Vision and Pattern Recognition, 2024.
[17] Nikhil Keetha, Avneesh Mishra, Jay Karhade, Krishna Murthy Jatavallabhula, Sebastian Scherer, Madhava Krishna, and Sourav Garg. Anyloc: Towards universal visual place recognition. arXiv, 2023.

[18] Zhengqi Li and Noah Snavely. Megadepth: Learning single-view depth prediction from internet photos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2041–2050, 2018.
[19] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019.
[20] Feng Lu, Xiangyuan Lan, Lijun Zhang, Dongmei Jiang, Yaowei Wang, and Chun Yuan. Cricavpr: Cross-image correlation-aware representation learning for visual place recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
[21] Maxime Oquab, Timothée Darcet, Theo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Russell Howes, Po-Yao Huang, Hu Xu, Vasu Sharma, Shang-Wen Li, Wojciech Galuba, Mike Rabbat, Mido Assran, Nicolas Ballas, Gabriel Synnaeve, Ishan Misra, Herve Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, and Piotr Bojanowski. Dinov2: Learning robust visual features without supervision, 2023.
[22] Linfei Pan, Daniel Barath, Marc Pollefeys, and Johannes Lutz Schönberger. Global Structure-from-Motion Revisited. In European Conference on Computer Vision (ECCV), 2024.
[23] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019.
[24] James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, and Andrew Zisserman. Object retrieval with large vocabularies and fast spatial matching. In IEEE Conference on Computer Vision and Pattern Recognition. IEEE Computer Society, 2007.
[25] James Philbin, Ondrej Chum, Michael Isard, Josef Sivic, and Andrew Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[26] F. Radenović, A. Iscen, G. Tolias, Y. Avrithis, and O. Chum. Revisiting oxford and paris: Large-scale image retrieval benchmarking. In CVPR, 2018.
[27] Jérôme Revaud, Jon Almazán, R. S. Rezende, and César Roberto de Souza. Learning with average precision: Training image retrieval with a listwise loss. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 5106–5115, 2019.
[28] Paul-Edouard Sarlin, Cesar Cadena, Roland Siegwart, and Marcin Dymczyk. From coarse to fine: Robust hierarchical localization at large scale. In CVPR, 2019.
[29] Paul-Edouard Sarlin, Mihai Dusmanu, Johannes L. Schönberger, Pablo Speciale, Lukas Gruber, Viktor Larsson, Ondrej Miksik, and Marc Pollefeys. LaMAR: Benchmarking Localization and Mapping for Augmented Reality. In ECCV, 2022.
[30] Johannes Lutz Schönberger and Jan-Michael Frahm. Structure-from-motion revisited. In CVPR, 2016.
[31] Johannes Lutz Schönberger, Enliang Zheng, Marc Pollefeys, and Jan-Michael Frahm. Pixelwise view selection for unstructured multi-view stereo. In ECCV, 2016.
[32] Johannes L. Schönberger, True Price, Torsten Sattler, Jan-Michael Frahm, and Marc Pollefeys. A vote-and-verify strategy for fast spatial verification in image retrieval. In Computer Vision – ACCV 2016, pages 321–337, Cham, 2017. Springer International Publishing.
[33] Sergio Izquierdo and Javier Civera. Close, but not there: Boosting geographic distance sensitivity in visual place recognition. In European Conference on Computer Vision (ECCV), 2024.
[34] Xun Sun, Yuanfan Xie, Peiwen Luo, and Liang Wang. A dataset for benchmarking image-based localization. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 5641–5649, 2017.
[35] Hajime Taira, Masatoshi Okutomi, Torsten Sattler, Mircea Cimpoi, Marc Pollefeys, Josef Sivic, Tomas Pajdla, and Akihiko Torii. InLoc: Indoor visual localization with dense matching and view synthesis. In IEEE Conference on Computer Vision and Pattern Recognition, 2018.
[36] A. Torii, R. Arandjelović, J. Sivic, M. Okutomi, and T. Pajdla. 24/7 place recognition by view synthesis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(2):257–271, 2018.
[37] Joseph Tung, Gene Chou, Ruojin Cai, Guandao Yang, Kai Zhang, Gordon Wetzstein, Bharath Hariharan, and Noah Snavely. Megascenes: Scene-level view synthesis at scale. In ECCV, 2024.
[38] Xun Wang, Xintong Han, Weilin Huang, Dengke Dong, and Matthew R. Scott. Multi-similarity loss with general pair weighting for deep metric learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5022–5030, 2019.
[39] Frederik Warburg, Søren Hauberg, Manuel López-Antequera, Pau Gargallo, Yubin Kuang, and Javier Civera. Mapillary street-level sequences: A dataset for lifelong place recognition. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2623–2632, 2020.
[40] Tobias Weyand, A. Araújo, Bingyi Cao, and Jack Sim. Google landmarks dataset v2 – a large-scale benchmark for instance-level recognition and retrieval. In IEEE Conference on Computer Vision and Pattern Recognition, pages 2572–2581, 2020.
