All

What are you looking for?

All
Projects
Results
Organizations

Quick search

  • Projects supported by TA ČR
  • Excellent projects
  • Projects with the highest public support
  • Current projects

Smart search

  • That is how I find a specific +word
  • That is how I leave the -word out of the results
  • “That is how I can find the whole phrase”

CsFEVER and CTKFacts: acquiring Czech data for fact verification

The result's identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11230%2F23%3A10470715" target="_blank" >RIV/00216208:11230/23:10470715 - isvavai.cz</a>

  • Alternative codes found

    RIV/68407700:21230/23:00372837

  • Result on the web

    <a href="https://verso.is.cuni.cz/pub/verso.fpl?fname=obd_publikace_handle&handle=90debLHSb3" target="_blank" >https://verso.is.cuni.cz/pub/verso.fpl?fname=obd_publikace_handle&handle=90debLHSb3</a>

  • DOI - Digital Object Identifier

    <a href="http://dx.doi.org/10.1007/s10579-023-09654-3" target="_blank" >10.1007/s10579-023-09654-3</a>

Alternative languages

  • Result language

    angličtina

  • Original language name

    CsFEVER and CTKFacts: acquiring Czech data for fact verification

  • Original language description

    In this paper, we examine several methods of acquiring Czech data for automated fact-checking, which is a task commonly modeled as a classification of textual claim veracity w.r.t. a corpus of trusted ground truths. We attempt to collect sets of data in form of a factual claim, evidence within the ground truth corpus, and its veracity label (supported, refuted or not enough info). As a first attempt, we generate a Czech version of the large-scale FEVER dataset built on top of WIKIPEDIA corpus. We take a hybrid approach of machine translation and document alignment; the approach and the tools we provide can be easily applied to other languages. We discuss its weaknesses, propose a future strategy for their mitigation and publish the 127k resulting translations, as well as a version of such dataset reliably applicable for the Natural Language Inference task-the CSFEVER-NLI. Furthermore, we collect a novel dataset of 3,097 claims, which is annotated using the corpus of 2.2 M articles of Czech News Agency. We present an extended dataset annotation methodology based on the FEVER approach, and, as the underlying corpus is proprietary, we also publish a standalone version of the dataset for the task of Natural Language Inference we call CTKFACTSNLI. We analyze both acquired datasets for spurious cues-annotation patterns leading to model overfitting. CTKFACTS is further examined for inter-annotator agreement, thoroughly cleaned, and a typology of common annotator errors is extracted. Finally, we provide baseline models for all stages of the fact-checking pipeline and publish the NLI datasets, as well as our annotation platform and other experimental data.

  • Czech name

  • Czech description

Classification

  • Type

    J<sub>imp</sub> - Article in a specialist periodical, which is included in the Web of Science database

  • CEP classification

  • OECD FORD branch

    10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

  • Project

    <a href="/en/project/TL02000288" target="_blank" >TL02000288: Transformation of Journalisms Ethics in the Advent of Artificial Intelligence</a><br>

  • Continuities

    P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)

Others

  • Publication year

    2023

  • Confidentiality

    S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Data specific for result type

  • Name of the periodical

    Language Resources and Evaluation

  • ISSN

    1574-020X

  • e-ISSN

    1574-0218

  • Volume of the periodical

    57

  • Issue of the periodical within the volume

    4

  • Country of publishing house

    NL - THE KINGDOM OF THE NETHERLANDS

  • Number of pages

    35

  • Pages from-to

    1571-1605

  • UT code for WoS article

    000980799100007

  • EID of the result in the Scopus database

    2-s2.0-85158137690