Learning from Narrated Instruction Videos
Identifikátory výsledku
Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21730%2F18%3A00318985" target="_blank" >RIV/68407700:21730/18:00318985 - isvavai.cz</a>
Výsledek na webu
<a href="http://dx.doi.org/10.1109/TPAMI.2017.2749223" target="_blank" >http://dx.doi.org/10.1109/TPAMI.2017.2749223</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/TPAMI.2017.2749223" target="_blank" >10.1109/TPAMI.2017.2749223</a>
Alternativní jazyky
Jazyk výsledku
angličtina
Název v původním jazyce
Learning from Narrated Instruction Videos
Popis výsledku v původním jazyce
Automatic assistants could guide a person or a robot in performing new tasks, such as changing a car tire or repotting a plant. Creating such assistants, however, is non-trivial and requires understanding of visual and verbal content of a video. Towards this goal, we here address the problem of automatically learning the main steps of a task from a set of narrated instruction videos. We develop a new unsupervised learning approach that takes advantage of the complementary nature of the input video and the associated narration. The method sequentially clusters textual and visual representations of a task, where the two clustering problems are linked by joint constraints to obtain a single coherent sequence of steps in both modalities. To evaluate our method, we collect and annotate a new challenging dataset of real-world instruction videos from the Internet. The dataset contains videos for five different tasks with complex interactions between people and objects, captured in a variety of indoor and outdoor settings. We experimentally demonstrate that the proposed method can automatically discover, learn and localize the main steps of a task input videos.
Název v anglickém jazyce
Learning from Narrated Instruction Videos
Popis výsledku anglicky
Automatic assistants could guide a person or a robot in performing new tasks, such as changing a car tire or repotting a plant. Creating such assistants, however, is non-trivial and requires understanding of visual and verbal content of a video. Towards this goal, we here address the problem of automatically learning the main steps of a task from a set of narrated instruction videos. We develop a new unsupervised learning approach that takes advantage of the complementary nature of the input video and the associated narration. The method sequentially clusters textual and visual representations of a task, where the two clustering problems are linked by joint constraints to obtain a single coherent sequence of steps in both modalities. To evaluate our method, we collect and annotate a new challenging dataset of real-world instruction videos from the Internet. The dataset contains videos for five different tasks with complex interactions between people and objects, captured in a variety of indoor and outdoor settings. We experimentally demonstrate that the proposed method can automatically discover, learn and localize the main steps of a task input videos.
Klasifikace
Druh
J<sub>imp</sub> - Článek v periodiku v databázi Web of Science
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)
Návaznosti výsledku
Projekt
<a href="/cs/project/EF15_003%2F0000468" target="_blank" >EF15_003/0000468: Inteligentní strojové vnímání</a><br>
Návaznosti
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)
Ostatní
Rok uplatnění
2018
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Údaje specifické pro druh výsledku
Název periodika
IEEE Transactions on Pattern Analysis and Machine Intelligence
ISSN
0162-8828
e-ISSN
1939-3539
Svazek periodika
40
Číslo periodika v rámci svazku
9
Stát vydavatele periodika
US - Spojené státy americké
Počet stran výsledku
15
Strana od-do
2194-2208
Kód UT WoS článku
000440868400012
EID výsledku v databázi Scopus
2-s2.0-85029154712