Exploring the Relationship between Dataset Size and Image Captioning Model Performance
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F23%3A43968139" target="_blank" >RIV/49777513:23520/23:43968139 - isvavai.cz</a>
Result on the web
<a href="https://ceur-ws.org/Vol-3349/paper6.pdf" target="_blank" >https://ceur-ws.org/Vol-3349/paper6.pdf</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Exploring the Relationship between Dataset Size and Image Captioning Model Performance
Original language description
Image captioning is a deep learning task that involves computer vision methods to extract visual information from the image, as well as natural language processing to generate the resulting caption in natural language. Image captioning models, just like other deep learning models, need a large amount of training data and require a long time to train. In this work, we investigate the impact of using a smaller amount of training data on the performance of the standard image captioning model Oscar. We train Oscar on different sizes of the training dataset and measure its performance in terms of accuracy and computational complexity. We observe that the computational time increases linearly with the amount of data used for training. However, the accuracy does not follow this linear trend, and the relative improvement diminishes as we add more data to the training. We also measure the consistency of individual sizes of the training sets and observe that the more data we use for training, the more consistent the metrics are. In addition to traditional evaluation metrics, we evaluate the performance using CLIP similarity. We investigate whether it can be used as a fully-fledged metric providing a unique advantage over the traditional metrics: it does not require reference captions acquired from human annotators. Our results show a high correlation between CLIP and the other metrics. This work provides valuable insights for understanding the requirements for training effective image captioning models. We believe our results can be transferred to other models, even in other deep-learning tasks. © 2023 Copyright for this paper by its authors.
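The correlation analysis mentioned in the abstract (CLIP similarity versus traditional reference-based metrics) can be sketched as follows. This is a minimal illustration only: the per-caption scores below are hypothetical placeholders, not the paper's data, and the CLIP scoring step itself (embedding each image and its generated caption and taking their cosine similarity) is assumed to have already been done.

```python
import numpy as np

# Hypothetical per-model scores, for illustration only. In the paper's setup,
# CLIP similarity compares each image directly with its generated caption,
# while reference-based metrics (BLEU, CIDEr, ...) compare the generated
# caption against human-written reference captions.
clip_scores = np.array([0.21, 0.24, 0.27, 0.29, 0.31, 0.33])
bleu_scores = np.array([0.18, 0.22, 0.25, 0.28, 0.30, 0.34])

# Pearson correlation between the two metrics: a value near 1.0 would
# support using CLIP similarity as a reference-free proxy for the
# traditional, annotation-dependent metrics.
r = np.corrcoef(clip_scores, bleu_scores)[0, 1]
print(f"Pearson r = {r:.3f}")
```

With real evaluation outputs, the same one-line `np.corrcoef` call (or `scipy.stats.pearsonr`, which also returns a p-value) would quantify how closely the reference-free metric tracks the reference-based ones.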
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
20205 - Automation and control systems
Result continuities
Project
—
Continuities
S - Specific research at universities
Others
Publication year
2023
Confidentiality
S - Complete and true data about the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
CEUR Workshop Proceedings
ISBN
—
ISSN
1613-0073
e-ISSN
—
Number of pages
8
Pages from-to
1-8
Publisher name
CEUR-WS
Place of publication
Aachen
Event location
Krems a.d. Donau, Austria
Event date
Feb 15, 2023
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—