Cross-lingual Cross-temporal Summarization: Dataset, Models, Evaluation

Result identifiers

  • Result code in IS VaVaI

    RIV/00216208:11320/25:VVYBGBAQ - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AVVYBGBAQ)

  • Result on the web

    https://www.scopus.com/inward/record.uri?eid=2-s2.0-85199101252&doi=10.1162%2fcoli_a_00519&partnerID=40&md5=51a4fc7a5078fdb617f4542610a2b591

  • DOI - Digital Object Identifier

    10.1162/coli_a_00519 (http://dx.doi.org/10.1162/coli_a_00519)

Alternative languages

  • Result language

    English

  • Original language name

    Cross-lingual Cross-temporal Summarization: Dataset, Models, Evaluation

  • Original language description

    While summarization has been extensively researched in natural language processing (NLP), cross-lingual cross-temporal summarization (CLCTS) is a largely unexplored area that has the potential to improve cross-cultural accessibility and understanding. This article comprehensively addresses the CLCTS task, including dataset creation, modeling, and evaluation. We (1) build the first CLCTS corpus with 328 instances for hDe-En (extended version with 455 instances) and 289 for hEn-De (extended version with 501 instances), leveraging historical fiction texts and Wikipedia summaries in English and German; (2) examine the effectiveness of popular transformer end-to-end models with different intermediate fine-tuning tasks; (3) explore the potential of GPT-3.5 as a summarizer; and (4) report evaluations from humans, GPT-4, and several recent automatic evaluation metrics. Our results indicate that intermediate-task fine-tuned end-to-end models generate bad to moderate quality summaries, while GPT-3.5, as a zero-shot summarizer, provides moderate to good quality outputs. GPT-3.5 also seems very adept at normalizing historical text. To assess data contamination in GPT-3.5, we design an adversarial attack scheme in which we find that GPT-3.5 performs slightly worse for unseen source documents compared to seen documents. Moreover, it sometimes hallucinates when the source sentences are inverted against its prior knowledge, with a summarization accuracy of 0.67 for plot omission, 0.71 for entity swap, and 0.53 for plot negation. Overall, our regression results of model performances suggest that longer, older, and more complex source texts (all of which are more characteristic of historical language variants) are harder to summarize for all models, indicating the difficulty of the CLCTS task. Regarding evaluation, we observe that both GPT-4 and BERTScore correlate moderately with human evaluations, indicating great potential for future improvement.
© 2024 Association for Computational Linguistics.

  • Czech name

  • Czech description

Classification

  • Type

    JSC - Article in a specialist periodical, which is included in the SCOPUS database

  • CEP classification

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

  • Continuities

Others

  • Publication year

    2024

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Name of the periodical

    Computational Linguistics

  • ISSN

    0891-2017

  • e-ISSN

  • Volume of the periodical

    50

  • Issue of the periodical within the volume

    3

  • Country of publishing house

    US - UNITED STATES

  • Number of pages

    47

  • Pages from-to

    1001-1047

  • UT code for WoS article

  • EID of the result in the Scopus database

    2-s2.0-85199101252