COMICORDA: Dialogue Act Recognition in Comic Books
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F24%3A43972631" target="_blank" >RIV/49777513:23520/24:43972631 - isvavai.cz</a>
Result on the web
<a href="https://aclanthology.org/2024.lrec-main.316/#" target="_blank" >https://aclanthology.org/2024.lrec-main.316/#</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
COMICORDA: Dialogue Act Recognition in Comic Books
Original language description
Dialogue act (DA) recognition is usually realized from a speech signal that is transcribed and segmented into text. However, little work exists on DA recognition from images. Therefore, this paper concentrates on this modality and presents a novel DA recognition approach for image documents, namely comic books. To the best of our knowledge, this is the first study investigating dialogue acts in comic books, and it represents a first step toward building a model for comic book understanding. The proposed method is composed of the following steps: speech balloon segmentation, optical character recognition (OCR), and DA recognition itself. We use YOLOv8 for balloon segmentation, Google Vision for OCR, and Transformer-based models for DA classification. The experiments are performed on a newly created dataset comprising 1,438 annotated comic panels. It contains bounding boxes, transcriptions, and dialogue act annotations. We have achieved nearly 98% average precision for speech balloon segmentation and exceeded 70% accuracy on the DA recognition task. We also present an analysis of dialogue structure in the comics domain and compare it with the standard DA datasets, which represents another contribution of this paper.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
S - Specific university research
Others
Publication year
2024
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
ISBN
978-2-493-81410-4
ISSN
2951-2093
e-ISSN
2522-2686
Number of pages
13
Pages from-to
3566-3578
Publisher name
ELRA and ICCL
Place of publication
Paris
Event location
Torino, Italy
Event date
May 20, 2024
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—