Morphological Tagging and Lemmatization of Spoken Corpora of Czech

Result code in IS VaVaI
RIV/00216208:11210/23:10473322 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11210%2F23%3A10473322)
Result on the web
https://doi.org/10.1007/978-3-031-40498-6_14
DOI - Digital Object Identifier
10.1007/978-3-031-40498-6_14 (http://dx.doi.org/10.1007/978-3-031-40498-6_14)

Result language
English
Title in the original language
Morphological Tagging and Lemmatization of Spoken Corpora of Czech
Result description in the original language
We describe the annotation of corpora of spoken Czech according to a new annotation standard that has been in use since the publication of the SYN2020 corpus of written Czech. The standard distinguishes lemmas from sublemmas, assigns a new attribute to verb forms, and handles multi-word tokens appropriately. To annotate the corpora of spoken Czech by the same standard, new training data for the annotation of spoken text was created, and experiments were performed with using both written and spoken data to train a neural tagger.

Project
LM2023044: Český národní korpus
Funding links
P - Research and development project financed from public funds (with a link to CEP)
I - Institutional support for the long-term conceptual development of a research organization

Year of implementation
2023
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Article title in the proceedings
Text, Speech, and Dialogue Lecture Notes in Computer Science
ISBN
978-3-031-40497-9
ISSN
0302-9743
e-ISSN
1611-3349
Number of pages
10
Pages from-to
154-163
Publisher name
Springer, Cham
Place of publication
Cham, Switzerland
Event location
Plzeň
Event date
4 September 2023
Event type by nationality
WRD - Worldwide event
UT WoS code of the article
—
