VinVL+L: Enriching Visual Representation with Location Context in VQA

Identifikátory výsledku

Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F23%3A43968165" target="_blank" >RIV/49777513:23520/23:43968165 - isvavai.cz</a>
Výsledek na webu
<a href="https://ceur-ws.org/Vol-3349/paper4.pdf" target="_blank" >https://ceur-ws.org/Vol-3349/paper4.pdf</a>
DOI - Digital Object Identifier
—

Alternativní jazyky

Jazyk výsledku
angličtina
Název v původním jazyce
VinVL+L: Enriching Visual Representation with Location Context in VQA
Popis výsledku v původním jazyce
In this paper, we describe a novel method - VinVL+L - that enriches the visual representations (i.e. object tags and region features) of the State-of-the-Art Vision and Language (VL) method - VinVL - with Location information. To verify the importance of such metadata for VL models, we (i) trained a Swin-B model on the Places365 dataset and obtained additional sets of visual and tag features; both were made public to allow reproducibility and further experiments, (ii) did an architectural update to the existing VinVL method to include the new feature sets, and (iii) provide a qualitative and quantitative evaluation. By including just binary location metadata, the VinVL+L method provides incremental improvement to the State-of-the-Art VinVL in Visual Question Answering (VQA). The VinVL+L achieved an accuracy of 64.85% and increased the performance by +0.32% in terms of accuracy on the GQA dataset; the statistical significance of the new representations is verified via Approximate Randomization. The code and newly generated sets of features are available at https://github.com/vyskocj/VinVL-L.
Název v anglickém jazyce
VinVL+L: Enriching Visual Representation with Location Context in VQA
Popis výsledku anglicky
In this paper, we describe a novel method - VinVL+L - that enriches the visual representations (i.e. object tags and region features) of the State-of-the-Art Vision and Language (VL) method - VinVL - with Location information. To verify the importance of such metadata for VL models, we (i) trained a Swin-B model on the Places365 dataset and obtained additional sets of visual and tag features; both were made public to allow reproducibility and further experiments, (ii) did an architectural update to the existing VinVL method to include the new feature sets, and (iii) provide a qualitative and quantitative evaluation. By including just binary location metadata, the VinVL+L method provides incremental improvement to the State-of-the-Art VinVL in Visual Question Answering (VQA). The VinVL+L achieved an accuracy of 64.85% and increased the performance by +0.32% in terms of accuracy on the GQA dataset; the statistical significance of the new representations is verified via Approximate Randomization. The code and newly generated sets of features are available at https://github.com/vyskocj/VinVL-L.

Klasifikace

Druh
D - Stať ve sborníku
CEP obor
—
OECD FORD obor
20205 - Automation and control systems

Návaznosti výsledku

Projekt
—
Návaznosti
S - Specificky vyzkum na vysokych skolach

Ostatní

Rok uplatnění
2023
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Údaje specifické pro druh výsledku

Název statě ve sborníku
CEUR Workshop Proceedings
ISBN
—
ISSN
1613-0073
e-ISSN
—
Počet stran výsledku
9
Strana od-do
1-9
Název nakladatele
CEUR-WS
Místo vydání
Aachen
Místo konání akce
Kremže, Rakousko
Datum konání akce
15. 2. 2023
Typ akce podle státní příslušnosti
EUR - Evropská akce
Kód UT WoS článku
—

Podobné výsledky(10)

Probing the Role of Positional Information in Vision-Language Models Advancing Hungarian Text Processing with HuSpaCy: Efficient and Accurate NLP Pipelines Comprehensive Data Set for Automatic Single Camera Visual Speed Measurement

Co hledáte?

Rychlé hledání

Chytré vyhledávání

VinVL+L: Enriching Visual Representation with Location Context in VQA

Identifikátory výsledku

Alternativní jazyky

Klasifikace

Návaznosti výsledku

Ostatní

Údaje specifické pro druh výsledku

Podobné výsledky(10)

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Popis výsledku

Identifikátory výsledku

Identifikátory výsledku

Alternativní jazyky

Alternativní jazyky

Klasifikace

Klasifikace

Návaznosti výsledku

Návaznosti výsledku

Ostatní

Ostatní

Údaje specifické pro druh výsledku

Údaje specifické pro druh výsledku

Podobné výsledky(10)