Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
Result identifiers
Result code in IS VaVaI
RIV/68407700:21730/22:00358883 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21730%2F22%3A00358883)
Result on the web
https://doi.org/10.1109/CVPR52688.2022.01357
DOI - Digital Object Identifier
10.1109/CVPR52688.2022.01357 (http://dx.doi.org/10.1109/CVPR52688.2022.01357)
Alternative languages
Result language
English
Title in original language
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
Result description in original language
Human actions often induce changes of object states such as “cutting an apple”, “cleaning shoes” or “pouring coffee”. In this paper, we seek to temporally localize object states (e.g. “empty” and “full” cup) together with the corresponding state-modifying actions (“pouring coffee”) in long uncurated videos with minimal supervision. The contributions of this work are threefold. First, we develop a self-supervised model for jointly learning state-modifying actions together with the corresponding object states from an uncurated set of videos from the Internet. The model is self-supervised by the causal ordering signal, i.e. initial object state -> manipulating action -> end state. Second, to cope with noisy uncurated training data, our model incorporates a noise adaptive weighting module supervised by a small number of annotated still images, that allows to efficiently filter out irrelevant videos during training. Third, we collect a new dataset with more than 2600 hours of video and 34 thousand changes of object states, and manually annotate a part of this data to validate our approach. Our results demonstrate substantial improvements over prior work in both action and object state-recognition in video.
Title in English
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
Result description in English
Human actions often induce changes of object states such as “cutting an apple”, “cleaning shoes” or “pouring coffee”. In this paper, we seek to temporally localize object states (e.g. “empty” and “full” cup) together with the corresponding state-modifying actions (“pouring coffee”) in long uncurated videos with minimal supervision. The contributions of this work are threefold. First, we develop a self-supervised model for jointly learning state-modifying actions together with the corresponding object states from an uncurated set of videos from the Internet. The model is self-supervised by the causal ordering signal, i.e. initial object state -> manipulating action -> end state. Second, to cope with noisy uncurated training data, our model incorporates a noise adaptive weighting module supervised by a small number of annotated still images, that allows to efficiently filter out irrelevant videos during training. Third, we collect a new dataset with more than 2600 hours of video and 34 thousand changes of object states, and manually annotate a part of this data to validate our approach. Our results demonstrate substantial improvements over prior work in both action and object state-recognition in video.
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
EF15_003/0000468: Inteligentní strojové vnímání (Intelligent Machine Perception)
Linkages
P - Research and development project financed from public funds (with a link to CEP)
Other
Year of implementation
2022
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Article name in the proceedings
Proceeding 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
ISBN
978-1-6654-6946-3
ISSN
1063-6919
e-ISSN
2575-7075
Number of pages
11
Pages from-to
13936-13946
Publisher name
IEEE
Place of publication
Piscataway
Event location
New Orleans, Louisiana
Event date
19 June 2022
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
000870759107004