Learning Actionness via Long-range Temporal Order Verification

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21730%2F20%3A00347774" target="_blank" >RIV/68407700:21730/20:00347774 - isvavai.cz</a>
Result on the web
<a href="https://doi.org/10.1007/978-3-030-58526-6_28" target="_blank" >https://doi.org/10.1007/978-3-030-58526-6_28</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-030-58526-6_28" target="_blank" >10.1007/978-3-030-58526-6_28</a>

Alternative languages

Result language
English
Title in original language
Learning Actionness via Long-range Temporal Order Verification
Result description in original language
Current methods for action recognition typically rely on supervision provided by manual labeling. Such methods, however, do not scale well given the high burden of manual video annotation and the very large number of possible actions. Annotation is particularly difficult for temporal action localization, where large parts of the video contain no action, only background. To address these challenges, we propose a self-supervised and generic method to isolate actions from their background. We build on the observation that actions often follow a particular temporal order and, hence, can be predicted from other actions in the same video. As consecutive actions might be separated by minutes, unlike prior work on the arrow of time, we exploit long-range temporal relations in videos that are 10-20 minutes long. To this end, we propose a new model that learns actionness via a self-supervised proxy task of order verification. The model assigns high actionness scores to clips whose order is easy to predict from other clips in the video. To obtain a powerful and action-agnostic model, we train it on the large-scale unlabeled HowTo100M dataset with highly diverse actions from instructional videos. We validate our method on the task of action localization and demonstrate consistent improvements when combined with other recent weakly-supervised methods.

Classification

Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

Project
<a href="/cs/project/EF15_003%2F0000468" target="_blank" >EF15_003/0000468: Inteligentní strojové vnímání (Intelligent Machine Perception)</a><br>
Linkages
P - Research and development project financed from public funds (with a link to CEP)

Other

Publication year
2020
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Article title in proceedings
Book Subtitle 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIX
ISBN
978-3-030-58525-9
ISSN
0302-9743
e-ISSN
1611-3349
Number of pages
18
Pages from-to
470-487
Publisher name
Springer
Place of publication
Cham
Event location
Glasgow
Event date
August 23, 2020
Event type by nationality
WRD - Worldwide event
UT WoS article code
—

Similar results (10)

End-to-End Learning of Visual Representations from Uncurated Instructional Videos
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos