Learning Actionness via Long-range Temporal Order Verification
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21730%2F20%3A00347774" target="_blank" >RIV/68407700:21730/20:00347774 - isvavai.cz</a>
Result on the web
<a href="https://doi.org/10.1007/978-3-030-58526-6_28" target="_blank" >https://doi.org/10.1007/978-3-030-58526-6_28</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-030-58526-6_28" target="_blank" >10.1007/978-3-030-58526-6_28</a>
Alternative languages
Result language
English
Original language name
Learning Actionness via Long-range Temporal Order Verification
Original language description
Current methods for action recognition typically rely on supervision provided by manual labeling. Such methods, however, do not scale well given the high burden of manual video annotation and the very large number of possible actions. The annotation is particularly difficult for temporal action localization, where large parts of the video present no action, i.e., background. To address these challenges, we propose a self-supervised and generic method to isolate actions from their background. We build on the observation that actions often follow a particular temporal order and, hence, can be predicted by other actions in the same video. As consecutive actions might be separated by minutes, unlike prior work on the arrow of time, we exploit long-range temporal relations in videos that are 10-20 minutes long. To this end, we propose a new model that learns actionness via a self-supervised proxy task of order verification. The model assigns high actionness scores to clips whose order is easy to predict from other clips in the video. To obtain a powerful and action-agnostic model, we train it on the large-scale unlabeled HowTo100M dataset with highly diverse actions from instructional videos. We validate our method on the task of action localization and demonstrate consistent improvements when combined with other recent weakly-supervised methods.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
<a href="/en/project/EF15_003%2F0000468" target="_blank" >EF15_003/0000468: Intelligent Machine Perception</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2020
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIX
ISBN
978-3-030-58525-9
ISSN
0302-9743
e-ISSN
1611-3349
Number of pages
18
Pages from-to
470-487
Publisher name
Springer
Place of publication
Cham
Event location
Glasgow
Event date
Aug 23, 2020
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—