Fantastic Examples and Where to Find Them - Compiling Czech Dataset for Evaluating Dictionary Examples

Identifikátory výsledku

Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F24%3A00137913" target="_blank" >RIV/00216224:14330/24:00137913 - isvavai.cz</a>
Výsledek na webu
<a href="https://raslan2024.nlp-consulting.net/" target="_blank" >https://raslan2024.nlp-consulting.net/</a>
DOI - Digital Object Identifier
—

Alternativní jazyky

Jazyk výsledku
angličtina
Název v původním jazyce
Fantastic Examples and Where to Find Them - Compiling Czech Dataset for Evaluating Dictionary Examples
Popis výsledku v původním jazyce
Examples are an important part of a dictionary entry, helping users better understand the word and its usage in context. However, selecting good examples is a challenging and time-consuming task due to varying selection criteria and the vast amount of data to choose from. While different tools have been developed to address this, evaluation remains flawed and lacks standardisation. In this paper, we compile an evaluation dataset for the Czech language, using the GDEX tool and manual annotations to classify examples and explain the classification. Based on our findings, we propose general annotation guidelines to improve consistency. This dataset serves as a foundation for the unified evaluation of dictionary example scoring tools and opens discussion on how to annotate examples. Additionally, we make the dataset publicly available.
Název v anglickém jazyce
Fantastic Examples and Where to Find Them - Compiling Czech Dataset for Evaluating Dictionary Examples
Popis výsledku anglicky
Examples are an important part of a dictionary entry, helping users better understand the word and its usage in context. However, selecting good examples is a challenging and time-consuming task due to varying selection criteria and the vast amount of data to choose from. While different tools have been developed to address this, evaluation remains flawed and lacks standardisation. In this paper, we compile an evaluation dataset for the Czech language, using the GDEX tool and manual annotations to classify examples and explain the classification. Based on our findings, we propose general annotation guidelines to improve consistency. This dataset serves as a foundation for the unified evaluation of dictionary example scoring tools and opens discussion on how to annotate examples. Additionally, we make the dataset publicly available.

Klasifikace

Druh
D - Stať ve sborníku
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)

Návaznosti výsledku

Projekt
—
Návaznosti
S - Specificky vyzkum na vysokych skolach

Ostatní

Rok uplatnění
2024
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Údaje specifické pro druh výsledku

Název statě ve sborníku
Proceedings of the Eighteenth Workshop on Recent Advances in Slavonic Natural Languages Processing
ISBN
9788026318354
ISSN
2336-4289
e-ISSN
—
Počet stran výsledku
10
Strana od-do
37-46
Název nakladatele
Tribun EU
Místo vydání
Brno
Místo konání akce
Brno
Datum konání akce
1. 1. 2024
Typ akce podle státní příslušnosti
CST - Celostátní akce
Kód UT WoS článku
—

Podobné výsledky(10)

GDEX: Automatické vyhledávání dobrých slovníkových příkladů v korpusu Does Size Matter? - Comparing Evaluation Dataset Size for the Bilingual Lexicon Induction Software and Data for Corpus Pattern Analysis

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Fantastic Examples and Where to Find Them - Compiling Czech Dataset for Evaluating Dictionary Examples

Identifikátory výsledku

Alternativní jazyky

Klasifikace

Návaznosti výsledku

Ostatní

Údaje specifické pro druh výsledku

Podobné výsledky(10)

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Popis výsledku

Identifikátory výsledku

Identifikátory výsledku

Alternativní jazyky

Alternativní jazyky

Klasifikace

Klasifikace

Návaznosti výsledku

Návaznosti výsledku

Ostatní

Ostatní

Údaje specifické pro druh výsledku

Údaje specifické pro druh výsledku

Podobné výsledky(10)