Database Framework for a Distributed Spoken Data Collection Project
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11210%2F11%3A10103866" target="_blank" >RIV/00216208:11210/11:10103866 - isvavai.cz</a>
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Title in original language
Database Framework for a Distributed Spoken Data Collection Project
Result description in original language
The chapter describes the main features of the database system Mluvka ("chatterbox"), which is used in the Czech National Corpus for collecting recordings and transcriptions of authentic spoken Czech used in informal situations. To record the greatest possible variety of speakers, the material is collected across the whole country through a network of local collaborators. The system is a central data store that reflects the distributed character of the project and facilitates its organisation in various ways. In particular, it ensures the formal conformance of all submissions, it supports several levels of read/write access rights based on the collection areas, and it enables continuous balancing of the collected material. Mluvka is a well-attested system that underlies both recently published corpora of authentic spoken Czech, ORAL2006 and ORAL2008. Their total size is 2 650 000 tokens including punctuation; ORAL2008 is balanced in selected sociolinguistic categories of speakers.
Title in English
Database Framework for a Distributed Spoken Data Collection Project
Result description in English
The chapter describes the main features of the database system Mluvka ("chatterbox"), which is used in the Czech National Corpus for collecting recordings and transcriptions of authentic spoken Czech used in informal situations. To record the greatest possible variety of speakers, the material is collected across the whole country through a network of local collaborators. The system is a central data store that reflects the distributed character of the project and facilitates its organisation in various ways. In particular, it ensures the formal conformance of all submissions, it supports several levels of read/write access rights based on the collection areas, and it enables continuous balancing of the collected material. Mluvka is a well-attested system that underlies both recently published corpora of authentic spoken Czech, ORAL2006 and ORAL2008. Their total size is 2 650 000 tokens including punctuation; ORAL2008 is balanced in selected sociolinguistic categories of speakers.
Classification
Type
O - Other results
CEP field
AI - Linguistics
OECD FORD field
—
Result linkages
Project
—
Linkages
Z - Research plan (with a link to CEZ)
Other
Year of application
2011
Data confidentiality code
S - Complete and truthful data on the project are not subject to protection under special legal regulations