TEAM UFAL @ CreativeSumm 2022: BART and SamSum based few-shot approach for creative Summarization
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F22%3A10457014" target="_blank" >RIV/00216208:11320/22:10457014 - isvavai.cz</a>
Result on the web
<a href="https://aclanthology.org/2022.creativesumm-1.4/" target="_blank" >https://aclanthology.org/2022.creativesumm-1.4/</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
TEAM UFAL @ CreativeSumm 2022: BART and SamSum based few-shot approach for creative Summarization
Original language description
This system description paper details TEAM UFAL’s approach for the SummScreen, TVMegasite subtask of the CreativeSumm shared task. The subtask deals with creating summaries for dialogues from TV soap operas. We used a pre-trained BART-based model fine-tuned on the SamSum dialogue summarization dataset. A few examples from the AutoMin dataset and from the dataset provided by the organizers were also added to the training data as a few-shot learning objective. The additional data was manually split into chunks based on boundaries in the summary and the dialogue file. For inference, we chose a strategy similar to that of the top-performing team at AutoMin 2021: the data is split into chunks, either at [SCENE_CHANGE] markers or when a pre-defined token length is exceeded, so that each example fits within the maximum number of tokens the pre-trained model accepts. The final training strategy was chosen based on how natural the responses looked rather than on how well the model performed on automated evaluation metrics such as ROUGE.
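The inference-time chunking described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `chunk_transcript`, the default `max_tokens` value, and the use of whitespace-separated words as a stand-in for the model tokenizer's token count are all assumptions made for illustration.

```python
from typing import List

SCENE_MARKER = "[SCENE_CHANGE]"  # scene boundary marker named in the description


def chunk_transcript(lines: List[str], max_tokens: int = 512) -> List[str]:
    """Split dialogue lines into chunks at scene markers or at a token budget.

    Assumption: tokens are approximated by whitespace-separated words; a real
    system would count tokens with the summarization model's own tokenizer.
    A single line longer than max_tokens is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0

    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines

        if text == SCENE_MARKER:
            # A scene change always closes the chunk built so far.
            if current:
                chunks.append("\n".join(current))
            current, current_len = [], 0
            continue

        n_tokens = len(text.split())
        # Close the chunk if adding this line would exceed the budget.
        if current and current_len + n_tokens > max_tokens:
            chunks.append("\n".join(current))
            current, current_len = [], 0

        current.append(text)
        current_len += n_tokens

    if current:
        chunks.append("\n".join(current))
    return chunks
```

Each returned chunk would then be summarized separately and the partial summaries combined; how the partial summaries are combined is not stated in the description above.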
Czech name
—
Czech description
—
Classification
Type
O - Miscellaneous
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
<a href="/en/project/TL03000348" target="_blank" >TL03000348: THEAITRE: Artificial intelligence as the author of a theatre play?</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2022
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations