EuReCo: Not Building and Yet Using Federated Comparable Corpora for Cross-Linguistic Research
Result identifiers
Result code in IS VaVaI
RIV/00216208:90244/24:10495719 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A90244%2F24%3A10495719)
Result on the web
https://aclanthology.org/2024.bucc-1.10.pdf
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Title in the original language
EuReCo: Not Building and Yet Using Federated Comparable Corpora for Cross-Linguistic Research
Result description in the original language
This paper gives an overview of recent developments concerning the European Reference Corpus EuReCo, an open long-term initiative aimed at providing and using virtual and dynamically definable comparable corpora based on existing national, reference or other large corpora. Given the problems and shortcomings of other types of multilingual corpora - such as the shining-through effects in parallel corpora or the limitation to web material only in web-based comparable corpora - EuReCo constitutes a unique linguistic resource that offers new perspectives for fine-grained cross-linguistic research. The approach advocated here puts forward new solutions to notorious IPR and licensing issues, as well as to challenges of interoperability. It also addresses methodological questions concerning comparability and representativeness. While the focus of this paper is on EuReCo's implementation-based approach to ensuring interoperability in a feasible and maintainable way, it also presents preliminary results of pilot comparative studies on light verb constructions in German, Romanian, Hungarian, Polish and Bulgarian, and reports on recent extensions and plans.
Title in English
EuReCo: Not Building and Yet Using Federated Comparable Corpora for Cross-Linguistic Research
Result description in English
This paper gives an overview of recent developments concerning the European Reference Corpus EuReCo, an open long-term initiative aimed at providing and using virtual and dynamically definable comparable corpora based on existing national, reference or other large corpora. Given the problems and shortcomings of other types of multilingual corpora - such as the shining-through effects in parallel corpora or the limitation to web material only in web-based comparable corpora - EuReCo constitutes a unique linguistic resource that offers new perspectives for fine-grained cross-linguistic research. The approach advocated here puts forward new solutions to notorious IPR and licensing issues, as well as to challenges of interoperability. It also addresses methodological questions concerning comparability and representativeness. While the focus of this paper is on EuReCo's implementation-based approach to ensuring interoperability in a feasible and maintainable way, it also presents preliminary results of pilot comparative studies on light verb constructions in German, Romanian, Hungarian, Polish and Bulgarian, and reports on recent extensions and plans.
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
60203 - Linguistics
Result linkages
Project
—
Linkages
—
Other
Year of publication
2024
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Title of the paper in the proceedings
Proceedings of the 17th Workshop on Building and Using Comparable Corpora
ISBN
978-2-493-81431-9
ISSN
—
e-ISSN
—
Number of pages of the result
10
Pages from-to
94-103
Publisher name
ELRA
Place of publication
Torino
Event location
Torino
Event date
20 May 2024
Event type by participant nationality
WRD - Worldwide event
UT WoS code of the article
—