Self-Supervised Learning of Neural Implicit Feature Fields for Camera Pose Refinement
The result's identifiers
Result code in IS VaVaI
RIV/68407700:21730/24:00381240 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21730%2F24%3A00381240)
Result on the web
https://ieeexplore.ieee.org/abstract/document/10550578
DOI - Digital Object Identifier
10.1109/3DV62453.2024.00139 (http://dx.doi.org/10.1109/3DV62453.2024.00139)
Alternative languages
Result language
English
Original language name
Self-Supervised Learning of Neural Implicit Feature Fields for Camera Pose Refinement
Original language description
Visual localization techniques rely on an underlying scene representation to localize against. These representations can be explicit, such as a 3D SfM map, or implicit, such as a neural network that learns to encode the scene. The former requires sparse feature extractors and matchers to build the scene representation; the latter may lack geometric grounding and fail to capture the 3D structure of the scene well enough. This paper proposes to jointly learn the scene representation along with a 3D dense feature field and a 2D feature extractor whose outputs are embedded in the same metric space. Through a contrastive framework, we align this volumetric field with the image-based extractor and regularize the latter with a ranking loss derived from learned surface information. We learn the underlying geometry of the scene with an implicit field through volumetric rendering and design our feature field to leverage intermediate geometric information encoded in the implicit field. The resulting features are discriminative and robust to viewpoint changes while retaining rich encoded information. Visual localization is then achieved by aligning the image-based features and the rendered volumetric features. We show the effectiveness of our approach on real-world scenes, demonstrating that it outperforms prior and concurrent work on leveraging implicit scene representations for localization.
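To make the core idea of the abstract concrete, below is a minimal sketch (not the authors' code) of how features rendered from a 3D feature field could be aligned with per-pixel features from a 2D extractor using an InfoNCE-style contrastive loss. The rendering weights follow standard volumetric alpha-compositing; all function names, tensor dimensions, the temperature value, and the exact loss form are illustrative assumptions rather than details taken from the paper.

```python
# Sketch: contrastive alignment of rendered volumetric features with 2D features.
import torch
import torch.nn.functional as F

def volume_render_features(feat_samples, sigma, deltas):
    """Alpha-composite per-sample features along each ray.

    feat_samples: (R, S, D) features at S samples on R rays
    sigma:        (R, S)    densities from the implicit geometry field
    deltas:       (R, S)    distances between consecutive samples
    returns:      (R, D)    rendered feature per ray/pixel
    """
    alpha = 1.0 - torch.exp(-sigma * deltas)                           # (R, S)
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=1),
        dim=1)[:, :-1]                                                 # (R, S)
    weights = alpha * trans                                            # (R, S)
    return (weights.unsqueeze(-1) * feat_samples).sum(dim=1)           # (R, D)

def contrastive_alignment_loss(rendered_feats, image_feats, temperature=0.07):
    """InfoNCE over pixels: the rendered feature at a pixel should match the
    2D-extractor feature at the same pixel and repel features at other pixels."""
    r = F.normalize(rendered_feats, dim=-1)                            # (R, D)
    i = F.normalize(image_feats, dim=-1)                               # (R, D)
    logits = r @ i.t() / temperature                                   # (R, R)
    targets = torch.arange(r.shape[0], device=r.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage with random tensors standing in for network outputs.
R, S, D = 1024, 64, 32
loss = contrastive_alignment_loss(
    volume_render_features(torch.randn(R, S, D),
                           torch.rand(R, S), torch.full((R, S), 0.02)),
    torch.randn(R, D))
```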
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspects to be 5.8)
Result continuities
Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2024
Confidentiality
S - Complete and accurate data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
3DV2024: Proceedings of the 2024 International Conference on 3D Vision
ISBN
979-8-3503-6246-6
ISSN
2378-3826
e-ISSN
2475-7888
Number of pages
11
Pages from-to
484-494
Publisher name
IEEE Computer Society
Place of publication
Los Alamitos
Event location
Davos
Event date
Mar 18, 2024
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
001250581700038