Towards a Romanian Phrasal Academic Lexicon
Result identifiers
Result code in IS VaVaI
RIV/00216208:11320/25:NDEVCKX9 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3ANDEVCKX9)
Result on the web
https://aclanthology.org/2024.clib-1.10
DOI - Digital Object Identifier
—
Alternative languages
Language of the result
English
Title in the original language
Towards a Romanian Phrasal Academic Lexicon
Description of the result in the original language
The lack of NLP-based research studies on academic writing in Romania results in an unbalanced development of automatic support tools for Romanian compared with other languages, such as English. For this study, we use the Romanian subsets of two bilingual academic writing corpora: the ROGER corpus, consisting of university student papers, and the EXPRES corpus, composed of expert research articles. Working with the Romanian Academic Word List (Ro-AWL), we present two phrase extraction phases: (i) using Ro-AWL words as node words to extract collocations that meet thresholds on statistical measures, and (ii) classifying the extracted phrases into general versus domain-specific multi-word units. We show how manual annotation of the rhetorical functions of the resulting phrases can be combined with automatic function detection. A comparison of the academic phrases in ROGER and EXPRES validates the final phrase list. The Romanian Phrasal Academic Lexicon (ROPAL), similar to the Oxford Phrasal Academic Lexicon (OPAL), is a lexicon of written academic phrases for Romanian, made available for academic use and for further research or applications.
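Phase (i) above, extracting collocates around node words and keeping those that pass statistical thresholds, can be illustrated with a minimal sketch. The description does not name the measures, window size, or cut-off values, so this sketch assumes pointwise mutual information (MI) and t-score over a ±3-token window with the common corpus-linguistics cut-offs MI ≥ 3 and t ≥ 2. The function `node_collocates`, the constants `MI_MIN` and `T_MIN`, and the toy tokens are hypothetical and do not reproduce the ROPAL pipeline.

```python
import math
from collections import Counter

# Hypothetical thresholds (common conventions, not values from the ROPAL study).
MI_MIN = 3.0
T_MIN = 2.0


def node_collocates(tokens, node_words, window=3, min_freq=2):
    """Return (node, collocate, observed, MI, t) for collocates of node words.

    Co-occurrence is counted within +/- `window` tokens of each node word;
    pairs seen fewer than `min_freq` times are discarded before scoring.
    """
    n = len(tokens)
    freq = Counter(tokens)
    pair_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok not in node_words:
            continue
        lo, hi = max(0, i - window), min(n, i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pair_freq[(tok, tokens[j])] += 1

    results = []
    for (node, coll), observed in pair_freq.items():
        if observed < min_freq:
            continue
        # Expected co-occurrences under independence, scaled by the window span.
        expected = freq[node] * freq[coll] * (2 * window) / n
        mi = math.log2(observed / expected)
        t = (observed - expected) / math.sqrt(observed)
        if mi >= MI_MIN and t >= T_MIN:
            results.append((node, coll, observed, round(mi, 2), round(t, 2)))
    # Strongest associations (highest MI) first.
    return sorted(results, key=lambda r: -r[3])


if __name__ == "__main__":
    # Toy corpus: the hypothetical node word "analiza" is always followed by
    # "datelor"; every other token is unique filler.
    toks = []
    for b in range(6):
        toks += ["analiza", "datelor"] + [f"w{b}_{k}" for k in range(50)]
    print(node_collocates(toks, {"analiza"}))
```

Combining a strength measure (MI) with a confidence measure (t-score) is a common way to avoid keeping rare pairs that score high on MI alone; the real study may use different measures or thresholds.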
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result funding links
Project
—
Funding links
—
Other
Year of application
2024
Data confidentiality code
S - Complete and accurate project data are not subject to protection under special legal regulations
Data specific to the result type
Proceedings title
Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024)
ISBN
—
ISSN
2367-5578
e-ISSN
—
Number of pages
7
Pages (from-to)
106-112
Publisher
Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences
Place of publication
—
Event venue
Sofia, Bulgaria
Event date
1. 1. 2025
Event type by national scope
WRD - Worldwide event
WoS UT code of the article
—