
Exploring the Relationship between Dataset Size and Image Captioning Model Performance

Result identifiers

  • Result code in IS VaVaI

    RIV/49777513:23520/23:43968139 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F23%3A43968139)

  • Result on the web

    https://ceur-ws.org/Vol-3349/paper6.pdf

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    English

  • Title in original language

    Exploring the Relationship between Dataset Size and Image Captioning Model Performance

  • Description in original language

    Image captioning is a deep learning task that involves computer vision methods to extract visual information from an image and natural language processing to generate the resulting caption in natural language. Image captioning models, just like other deep learning models, need a large amount of training data and require a long time to train. In this work, we investigate the impact of using a smaller amount of training data on the performance of the standard image captioning model Oscar. We train Oscar on different sizes of the training dataset and measure its performance in terms of accuracy and computational complexity. We observe that the computational time increases linearly with the amount of data used for training. However, the accuracy does not follow this linear trend, and the relative improvement diminishes as we add more data to the training. We also measure the consistency of individual sizes of the training sets and observe that the more data we use for training, the more consistent the metrics are. In addition to traditional evaluation metrics, we evaluate the performance using CLIP similarity. We investigate whether it can be used as a fully-fledged metric, as it provides a unique advantage over the traditional metrics: it does not need reference captions that have to be acquired from human annotators. Our results show a high correlation between CLIP and the other metrics. This work provides valuable insights for understanding the requirements for training effective image captioning models. We believe our results can be transferred to other models, even to other deep-learning tasks. © 2023 Copyright for this paper by its authors.

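The abstract's key methodological point is using CLIP similarity as a reference-free caption metric. A minimal sketch of such a score follows, assuming the Hugging Face transformers CLIP implementation and the openai/clip-vit-base-patch32 checkpoint; the paper does not specify its CLIP variant or scoring convention, and cosine similarity between the image and caption embeddings is only one common choice.

```python
# A minimal, hypothetical sketch of reference-free caption scoring with CLIP,
# in the spirit of the evaluation described in the abstract. The checkpoint
# and the cosine-similarity convention are assumptions, not the paper's setup.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_similarity(image: Image.Image, caption: str) -> float:
    """Cosine similarity between CLIP embeddings of an image and a caption."""
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                          attention_mask=inputs["attention_mask"])
    # Normalize so the dot product equals cosine similarity.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return float((img_emb * txt_emb).sum())

# Example: clip_similarity(Image.open("photo.jpg"), "a dog running on a beach")
```

Because the candidate caption is compared directly against the image, no human-written reference captions are needed; this is the advantage over reference-based metrics such as BLEU or CIDEr that the abstract highlights.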

Classification

  • Type

    D - Proceedings paper

  • CEP field

  • OECD FORD field

    20205 - Automation and control systems

Result continuities

  • Project

  • Continuities

    S - Specific university research

Other

  • Year of implementation

    2023

  • Data confidentiality code

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

  • Proceedings paper title

    CEUR Workshop Proceedings

  • ISBN

  • ISSN

    1613-0073

  • e-ISSN

  • Number of pages

    8

  • Pages from-to

    1-8

  • Publisher name

    CEUR-WS

  • Place of publication

    Aachen

  • Event location

    Krems a.d. Donau, Austria

  • Event date

    15 February 2023

  • Event type by nationality

    WRD - Worldwide event

  • Article UT WoS code