Detecting Unseen Visual Relations Using Analogies

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21730%2F19%3A00337253" target="_blank" >RIV/68407700:21730/19:00337253 - isvavai.cz</a>
Result on the web
<a href="https://doi.org/10.1109/ICCV.2019.00207" target="_blank" >https://doi.org/10.1109/ICCV.2019.00207</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/ICCV.2019.00207" target="_blank" >10.1109/ICCV.2019.00207</a>

Alternative languages

Result language
English
Title in the original language
Detecting Unseen Visual Relations Using Analogies
Result description in the original language
We seek to detect visual relations in images of the form of triplets t = (subject, predicate, object), such as “person riding dog”, where training examples of the individual entities are available but their combinations are unseen at training. This is an important set-up due to the combinatorial nature of visual relations: collecting sufficient training data for all possible triplets would be very hard. The contributions of this work are three-fold. First, we learn a representation of visual relations that combines (i) individual embeddings for subject, object and predicate together with (ii) a visual phrase embedding that represents the relation triplets. Second, we learn how to transfer visual phrase embeddings from existing training triplets to unseen test triplets using analogies between relations that involve similar objects. Third, we demonstrate the benefits of our approach on three challenging datasets: on HICO-DET, our model achieves significant improvement over a strong baseline for both frequent and unseen triplets, and we observe similar improvement for the retrieval of unseen triplets with out-of-vocabulary predicates on the COCO-a dataset as well as the challenging unusual triplets in the UnRel dataset.
Title in English
Detecting Unseen Visual Relations Using Analogies
Result description in English
We seek to detect visual relations in images of the form of triplets t = (subject, predicate, object), such as “person riding dog”, where training examples of the individual entities are available but their combinations are unseen at training. This is an important set-up due to the combinatorial nature of visual relations: collecting sufficient training data for all possible triplets would be very hard. The contributions of this work are three-fold. First, we learn a representation of visual relations that combines (i) individual embeddings for subject, object and predicate together with (ii) a visual phrase embedding that represents the relation triplets. Second, we learn how to transfer visual phrase embeddings from existing training triplets to unseen test triplets using analogies between relations that involve similar objects. Third, we demonstrate the benefits of our approach on three challenging datasets: on HICO-DET, our model achieves significant improvement over a strong baseline for both frequent and unseen triplets, and we observe similar improvement for the retrieval of unseen triplets with out-of-vocabulary predicates on the COCO-a dataset as well as the challenging unusual triplets in the UnRel dataset.

Classification

Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

Project
<a href="/cs/project/EF15_003%2F0000468" target="_blank" >EF15_003/0000468: Inteligentní strojové vnímání (Intelligent Machine Perception)</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)

Others

Publication year
2019
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Article name in the proceedings
2019 IEEE International Conference on Computer Vision (ICCV 2019)
ISBN
978-1-7281-4804-5
ISSN
1550-5499
e-ISSN
2380-7504
Number of pages
10
Pages from-to
1981-1990
Publisher name
IEEE Computer Society Press
Place of publication
Los Alamitos
Event location
Seoul
Event date
27 October 2019
Event type by nationality
WRD - Worldwide event
UT WoS article code
000531438102012

Similar results (10)

Weakly-Supervised Learning of Visual Relations
Meta-Personalizing Vision-Language Models to Find Named Instances in Video
Weakly Supervised Human-Object Interaction Detection in Video via Contrastive Spatiotemporal Regions
