Experimental tagging of the ORAL series corpora : Insights on using a stochastic tagger

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11210%2F15%3A10297728" target="_blank" >RIV/00216208:11210/15:10297728 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1007/978-3-319-24033-6" target="_blank" >http://dx.doi.org/10.1007/978-3-319-24033-6</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-319-24033-6" target="_blank" >10.1007/978-3-319-24033-6</a>

Alternative languages

Result language
English
Title in the original language
Experimental tagging of the ORAL series corpora : Insights on using a stochastic tagger
Result description in the original language
The ORAL series corpora of spontaneous spoken Czech currently contain neither lemmatization nor part-of-speech tagging. The main reason for this is that readily available NLP tools, designed primarily with written texts in mind, underperform when applied directly to speech transcripts, due to various morphological and syntactic specificities of informal spoken language and the ways these are captured in transcription. Recently, the highly optimized open-source MorphoDiTa toolchain for training and applying stochastic tagging models was released; MorphoDiTa makes it easy and fast to experiment with incremental changes in the training procedure. The article discusses modifications to the morphological dictionary and training data used by the models which are necessary in order to improve their performance on the ORAL series corpora, as well as challenges which remain to be solved.

Classification

Type
D - Article in proceedings
CEP field
AI - Linguistics
OECD FORD field
—

Result continuities

Project
<a href="/cs/project/LM2011023" target="_blank" >LM2011023: Český národní korpus (Czech National Corpus)</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)

Other

Year of implementation
2015
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Article title in the proceedings
Text, Speech, and Dialogue
ISBN
978-3-319-24032-9
ISSN
—
e-ISSN
—
Number of pages
9
Pages from-to
388-396
Publisher name
Springer International Publishing
Place of publication
Switzerland
Event location
Plzeň
Event date
14 September 2015
Event type by nationality
WRD - Worldwide event
UT WoS article code
—

Similar results (10)

ORTOFON: korpus neformální mluvené češtiny s víceúrovňovým přepisem (ORTOFON: a corpus of informal spoken Czech with multi-tier transcription)
Mapping Diatopic and Diachronic Variation in Spoken Czech: the ORTOFON and DIALEKT Corpora
TSD 2016, 19th International Conference on Text, Speech and Dialogue
