Vision UFormer: Long-Range Monocular Absolute Depth Estimation

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F23%3APU149297" target="_blank" >RIV/00216305:26230/23:PU149297 - isvavai.cz</a>
Result on the web
<a href="https://www.sciencedirect.com/science/article/pii/S0097849323000262" target="_blank" >https://www.sciencedirect.com/science/article/pii/S0097849323000262</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1016/j.cag.2023.02.003" target="_blank" >10.1016/j.cag.2023.02.003</a>

Alternative languages

Result language
English
Title in the original language
Vision UFormer: Long-Range Monocular Absolute Depth Estimation
Result description in the original language
We introduce Vision UFormer (ViUT), a novel deep neural long-range monocular depth estimator. The input is an RGB image, and the output is an image that stores the absolute distance of the object in the scene as its per-pixel values. ViUT consists of a Transformer encoder and a ResNet decoder combined with UNet-style skip connections. It is trained on 1M images across ten datasets in a staged regime that starts with easier-to-predict data such as indoor photographs and continues to more complex long-range outdoor scenes. We show that ViUT provides comparable results for normalized relative distances and short-range classical datasets such as NYUv2 and KITTI. We further show that it successfully estimates absolute long-range depth in meters. We validate ViUT on a wide variety of long-range scenes showing its high estimation capabilities with a relative improvement of up to 23%. Absolute depth estimation finds application in many areas, and we show its usability in image composition, range annotation, defocus, and scene reconstruction.
Title in English
Vision UFormer: Long-Range Monocular Absolute Depth Estimation
Result description in English
We introduce Vision UFormer (ViUT), a novel deep neural long-range monocular depth estimator. The input is an RGB image, and the output is an image that stores the absolute distance of the object in the scene as its per-pixel values. ViUT consists of a Transformer encoder and a ResNet decoder combined with UNet-style skip connections. It is trained on 1M images across ten datasets in a staged regime that starts with easier-to-predict data such as indoor photographs and continues to more complex long-range outdoor scenes. We show that ViUT provides comparable results for normalized relative distances and short-range classical datasets such as NYUv2 and KITTI. We further show that it successfully estimates absolute long-range depth in meters. We validate ViUT on a wide variety of long-range scenes showing its high estimation capabilities with a relative improvement of up to 23%. Absolute depth estimation finds application in many areas, and we show its usability in image composition, range annotation, defocus, and scene reconstruction.

Classification

Type
J<sub>imp</sub> - Article in a periodical indexed in the Web of Science database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

Project
<a href="/cs/project/LTAIZ19004" target="_blank" >LTAIZ19004: Topografická analýza obrazu s využitím metod hlubokého učení (Topographic image analysis using deep learning methods)</a><br>
Linkages
P - Research and development project financed from public funds (with a link to CEP)<br>S - Specific research at universities

Other

Publication year
2023
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Periodical name
COMPUTERS & GRAPHICS-UK
ISSN
0097-8493
e-ISSN
1873-7684
Periodical volume
111
Issue of the periodical within the volume
4
Publisher country
GB - United Kingdom of Great Britain and Northern Ireland
Number of pages
10
Pages from-to
180-189
UT WoS code of the article
000954860700001
EID of the result in the Scopus database
2-s2.0-85149382691

Similar results (10)

Automated outdoor depth-map generation and alignment
Guiding Monocular Depth Estimation Using Depth-Attention Volume
CrowdDriven: A New Challenging Dataset for Outdoor Visual Localization
