HSCNet++: Hierarchical Scene Coordinate Classification and Regression for Visual Localization with Transformer
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F24%3A00372774" target="_blank" >RIV/68407700:21230/24:00372774 - isvavai.cz</a>
Result on the web
<a href="https://doi.org/10.1007/s11263-023-01982-9" target="_blank" >https://doi.org/10.1007/s11263-023-01982-9</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/s11263-023-01982-9" target="_blank" >10.1007/s11263-023-01982-9</a>
Alternative languages
Result language
English
Title in original language
HSCNet++: Hierarchical Scene Coordinate Classification and Regression for Visual Localization with Transformer
Result description in original language
Visual localization is critical to many applications in computer vision and robotics. To address single-image RGB localization, state-of-the-art feature-based methods match local descriptors between a query image and a pre-built 3D model. Recently, deep neural networks have been exploited to regress the mapping between raw pixels and 3D coordinates in the scene, and thus the matching is implicitly performed by the forward pass through the network. However, in a large and ambiguous environment, learning such a regression task directly can be difficult for a single network. In this work, we present a new hierarchical scene coordinate network to predict pixel scene coordinates in a coarse-to-fine manner from a single RGB image. The proposed method, which is an extension of HSCNet, allows us to train compact models which scale robustly to large environments. It sets a new state-of-the-art for single-image localization on the 7-Scenes, 12-Scenes, Cambridge Landmarks datasets, and the combined indoor scenes.
Title in English
HSCNet++: Hierarchical Scene Coordinate Classification and Regression for Visual Localization with Transformer
Result description in English
Visual localization is critical to many applications in computer vision and robotics. To address single-image RGB localization, state-of-the-art feature-based methods match local descriptors between a query image and a pre-built 3D model. Recently, deep neural networks have been exploited to regress the mapping between raw pixels and 3D coordinates in the scene, and thus the matching is implicitly performed by the forward pass through the network. However, in a large and ambiguous environment, learning such a regression task directly can be difficult for a single network. In this work, we present a new hierarchical scene coordinate network to predict pixel scene coordinates in a coarse-to-fine manner from a single RGB image. The proposed method, which is an extension of HSCNet, allows us to train compact models which scale robustly to large environments. It sets a new state-of-the-art for single-image localization on the 7-Scenes, 12-Scenes, Cambridge Landmarks datasets, and the combined indoor scenes.
Classification
Type
J<sub>imp</sub> - Article in a periodical indexed in the Web of Science database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
The result was created during the realization of more than one project. More information in the Projects tab.
Continuities
P - Research and development project financed from public sources (with a link to CEP)
Others
Publication year
2024
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Name of the periodical
International Journal of Computer Vision
ISSN
0920-5691
e-ISSN
1573-1405
Volume of the periodical
132
Issue of the periodical within the volume
7
Country of the periodical's publisher
NL - Netherlands
Number of pages
21
Pages from-to
2530-2550
UT WoS code of the article
001156667100002
EID of the result in the Scopus database
2-s2.0-85187172970