Monocular Arbitrary Moving Object Discovery and Segmentation
Identifikátory výsledku
Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F21%3A00354230" target="_blank" >RIV/68407700:21230/21:00354230 - isvavai.cz</a>
Výsledek na webu
<a href="https://www.bmvc2021-virtualconference.com/assets/papers/1500.pdf" target="_blank" >https://www.bmvc2021-virtualconference.com/assets/papers/1500.pdf</a>
DOI - Digital Object Identifier
—
Alternativní jazyky
Jazyk výsledku
angličtina
Název v původním jazyce
Monocular Arbitrary Moving Object Discovery and Segmentation
Popis výsledku v původním jazyce
We propose a method for discovery and segmentation of objects that are, or their parts are, independently moving in the scene. Given three monocular video frames, the method outputs semantically meaningful regions, i.e. regions corresponding to the whole object, even when only a part of it moves. The architecture of the CNN-based end-to-end method, called Raptor, combines semantic and motion backbones, which pass their outputs to a final region segmentation network. The semantic backbone is trained in a class-agnostic manner in order to generalise to object classes beyond the training data. The core of the motion branch is a geometrical cost volume computed from optical flow, optical expansion, mono-depth and the estimated camera motion. Evaluation of the proposed architecture on the instance motion segmentation and binary moving-static segmentation problems on KITTI, DAVIS-Moving and YTVOSMoving datasets shows that the proposed method achieves state-of-the-art results on all the datasets and is able to generalise well to various environments. For the KITTI dataset, we provide an upgraded instance motion segmentation annotation which covers all moving objects. Dataset, code and models are available on the github project page github.com/michalneoral/Raptor.
Název v anglickém jazyce
Monocular Arbitrary Moving Object Discovery and Segmentation
Popis výsledku anglicky
We propose a method for discovery and segmentation of objects that are, or their parts are, independently moving in the scene. Given three monocular video frames, the method outputs semantically meaningful regions, i.e. regions corresponding to the whole object, even when only a part of it moves. The architecture of the CNN-based end-to-end method, called Raptor, combines semantic and motion backbones, which pass their outputs to a final region segmentation network. The semantic backbone is trained in a class-agnostic manner in order to generalise to object classes beyond the training data. The core of the motion branch is a geometrical cost volume computed from optical flow, optical expansion, mono-depth and the estimated camera motion. Evaluation of the proposed architecture on the instance motion segmentation and binary moving-static segmentation problems on KITTI, DAVIS-Moving and YTVOSMoving datasets shows that the proposed method achieves state-of-the-art results on all the datasets and is able to generalise well to various environments. For the KITTI dataset, we provide an upgraded instance motion segmentation annotation which covers all moving objects. Dataset, code and models are available on the github project page github.com/michalneoral/Raptor.
Klasifikace
Druh
O - Ostatní výsledky
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)
Návaznosti výsledku
Projekt
—
Návaznosti
S - Specificky vyzkum na vysokych skolach
Ostatní
Rok uplatnění
2021
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů