Compositional models for VQA: Can neural module networks really count?
Result identifiers
Result code in IS VaVaI
RIV/68407700:21230/18:00327912 - isvavai.cz
Alternative codes found
RIV/68407700:21730/18:00327912
Result on the web
https://ac.els-cdn.com/S1877050918323986/1-s2.0-S1877050918323986-main.pdf?_tid=a4ba8c06-ab27-49ab-b27e-28c3ef34031c&acdnat=1549358710_448d7843295e9400663948d0d99401d8
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1016/j.procs.2018.11.110" target="_blank" >10.1016/j.procs.2018.11.110</a>
Alternative languages
Result language
English
Title in original language
Compositional models for VQA: Can neural module networks really count?
Description in original language
Large neural networks trained in an end-to-end fashion usually fail to generalize to novel inputs that were not included in the training data. In contrast, biologically inspired compositional models offer a more robust solution due to adaptive chaining of logical operations performed by specialized modules. In this paper, we present an implementation of a cognitive architecture based on the End-to-End Module Networks (N2NMNs) model [9] in the humanoid robot Pepper. The architecture is focused on the Visual Question Answering (VQA) task, in which the robot answers questions about the observed image in natural language. We trained the system on the synthetic CLEVR dataset [10] and tested it on both synthetic images and real-world scenes with CLEVR-like objects. We compare the results and discuss the drop in accuracy in real-world situations. Furthermore, we propose a new evaluation method, in which we test whether the model's counts of objects in each category are consistent with the overall number of objects it sees. In summary, our results show that current visual reasoning models are still far from being applicable in everyday life.
Title in English
Compositional models for VQA: Can neural module networks really count?
Description in English
Large neural networks trained in an end-to-end fashion usually fail to generalize to novel inputs that were not included in the training data. In contrast, biologically inspired compositional models offer a more robust solution due to adaptive chaining of logical operations performed by specialized modules. In this paper, we present an implementation of a cognitive architecture based on the End-to-End Module Networks (N2NMNs) model [9] in the humanoid robot Pepper. The architecture is focused on the Visual Question Answering (VQA) task, in which the robot answers questions about the observed image in natural language. We trained the system on the synthetic CLEVR dataset [10] and tested it on both synthetic images and real-world scenes with CLEVR-like objects. We compare the results and discuss the drop in accuracy in real-world situations. Furthermore, we propose a new evaluation method, in which we test whether the model's counts of objects in each category are consistent with the overall number of objects it sees. In summary, our results show that current visual reasoning models are still far from being applicable in everyday life.
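The counting-consistency check described in the abstract can be sketched in a few lines of code. The snippet below is only an illustrative sketch, not the authors' implementation: `answer_question(image, question)` is a hypothetical stand-in for the trained N2NMN VQA model, and the category list is assumed to cover CLEVR-like shapes. It asks the model for the total number of objects and for the count in each category, and flags the image as inconsistent when the per-category counts do not sum to the reported total.

```python
# Minimal sketch of the proposed counting-consistency evaluation.
# `answer_question(image, question) -> str` is a HYPOTHETICAL stand-in
# for the trained N2NMN VQA model; it is not the authors' actual API.

CATEGORIES = ["cube", "sphere", "cylinder"]  # assumed CLEVR-like shape categories


def count_answer(image, question, answer_question):
    """Query the model and parse its answer as an integer count."""
    answer = answer_question(image, question)
    try:
        return int(answer)
    except ValueError:
        return None  # model answered with a non-numeric string


def is_count_consistent(image, answer_question):
    """Check that per-category counts sum to the reported total count."""
    total = count_answer(image, "How many objects are there?", answer_question)
    per_category = [
        count_answer(image, f"How many {c}s are there?", answer_question)
        for c in CATEGORIES
    ]
    if total is None or any(c is None for c in per_category):
        return False  # unparsable answers are treated as inconsistent
    return sum(per_category) == total
```

Note that a model can pass this check while still misjudging the true number of objects; consistency of this kind is a necessary rather than sufficient condition for reliable counting.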
Classification
Type
D - Article in conference proceedings
CEP classification
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
TJ01000470: Imitation learning of industrial robots using language
Linkages
S - Specific research at universities
Other
Year of implementation
2018
Data confidentiality code
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific to the result type
Proceedings title
Procedia Computer Science
ISBN
—
ISSN
1877-0509
e-ISSN
1877-0509
Number of pages
7
Pages from-to
481-487
Publisher name
Elsevier B.V.
Place of publication
New York
Event venue
Praha
Event date
22. 8. 2018
Event type by nationality
WRD - Worldwide event
UT WoS article code
000551069000073