Compositional models for VQA: Can neural module networks really count?

Result identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F18%3A00327912" target="_blank" >RIV/68407700:21230/18:00327912 - isvavai.cz</a>

  • Alternative codes found

    RIV/68407700:21730/18:00327912

  • Result on the web

    <a href="https://ac.els-cdn.com/S1877050918323986/1-s2.0-S1877050918323986-main.pdf?_tid=a4ba8c06-ab27-49ab-b27e-28c3ef34031c&acdnat=1549358710_448d7843295e9400663948d0d99401d8" target="_blank" >https://ac.els-cdn.com/S1877050918323986/1-s2.0-S1877050918323986-main.pdf?_tid=a4ba8c06-ab27-49ab-b27e-28c3ef34031c&acdnat=1549358710_448d7843295e9400663948d0d99401d8</a>

  • DOI - Digital Object Identifier

    <a href="http://dx.doi.org/10.1016/j.procs.2018.11.110" target="_blank" >10.1016/j.procs.2018.11.110</a>

Alternative languages

  • Result language

    English

  • Original language name

    Compositional models for VQA: Can neural module networks really count?

  • Original language description

    Large neural networks trained in an end-to-end fashion usually fail to generalize to novel inputs that were not included in the training data. In contrast, biologically inspired compositional models offer a more robust solution due to adaptive chaining of logical operations performed by specialized modules. In this paper, we present an implementation of a cognitive architecture based on the End-to-End Module Networks (N2NMNs) model [9] in the humanoid robot Pepper. The architecture is focused on the Visual Question Answering (VQA) task, in which the robot answers questions about the seen image in natural language. We trained the system on the synthetic CLEVR dataset [10] and tested it on both synthetic images and real-world situations with CLEVR-like objects. We compare the results and discuss the decrease in accuracy in real-world situations. Furthermore, we propose a new evaluation method, in which we test whether the model's counts of objects in each category are consistent with the overall number of seen objects. In summary, our results show that current visual reasoning models are still far from being applicable in everyday life. (A minimal illustrative sketch of this consistency check follows after this list.)

  • Czech name

  • Czech description
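
The count-consistency evaluation described in the abstract can be illustrated with a short, hypothetical sketch. This is not the authors' code: the model interface (model.answer) and the question phrasings are assumptions made here for illustration, and the paper's actual evaluation procedure may differ in detail.

    # Hypothetical sketch of the per-category count-consistency check.
    # Assumes a VQA model object exposing answer(image, question) -> str.

    def count_consistency(model, image, categories):
        """True if the per-category counts sum to the model's overall object count."""
        per_category = [
            int(model.answer(image, f"How many {c} objects are there?"))
            for c in categories
        ]
        overall = int(model.answer(image, "How many objects are there?"))
        return sum(per_category) == overall

    def consistency_rate(model, images, categories):
        """Fraction of images on which the model's counts are mutually consistent."""
        hits = sum(count_consistency(model, img, categories) for img in images)
        return hits / len(images)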

Classification

  • Type

    D - Article in proceedings

  • CEP classification

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

    <a href="/en/project/TJ01000470" target="_blank" >TJ01000470: Imitation learning supported by language for industrial robotics</a><br>

  • Continuities

    S - Specific research at universities

Others

  • Publication year

    2018

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Name of the proceedings

    Procedia Computer Science

  • ISBN

  • ISSN

    1877-0509

  • e-ISSN

    1877-0509

  • Number of pages

    7

  • Pages from-to

    481-487

  • Publisher name

    Elsevier B.V.

  • Place of publication

    New York

  • Event location

    Praha

  • Event date

    Aug 22, 2018

  • Type of event by nationality

    WRD - Worldwide event

  • UT code for WoS article

    000551069000073