Database Framework for a Distributed Spoken Data Collection Project
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11210/11:10103866 (isvavai.cz: https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11210%2F11%3A10103866)
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Database Framework for a Distributed Spoken Data Collection Project
Original language description
The chapter describes the main features of Mluvka ("chatterbox"), the database system used by the Czech National Corpus to collect recordings and transcriptions of authentic spoken Czech from informal situations. To capture the widest possible variety of speakers, the material is collected throughout the country by a network of local collaborators. The system serves as a central data store that reflects the distributed character of the project and supports its organisation in several ways. In particular, it ensures the formal conformance of all submissions, provides several levels of read/write access rights based on collection areas, and enables continuous balancing of the collected material. Mluvka is a proven system underlying both recently published corpora of authentic spoken Czech, ORAL2006 and ORAL2008. Their combined size is 2,650,000 tokens including punctuation; ORAL2008 is balanced with respect to selected sociolinguistic categories of speakers.
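The continuous balancing mentioned in the description can be illustrated with a minimal Python sketch. It is not Mluvka's actual implementation, which this record does not describe. The category names and values (sex, age) and the equal-share target (every cell of the category grid should reach the token count of the largest cell) are hypothetical and chosen only for illustration. The function tallies tokens per combination of speaker categories and reports how many tokens each cell still lacks, the kind of figure a project coordinator could use to steer further collection.

```python
from collections import Counter
from itertools import product

# Hypothetical sociolinguistic categories; the categories actually used
# for balancing ORAL2008 are not specified in this record.
CATEGORIES = {
    "sex": ["F", "M"],
    "age": ["18-35", "36+"],
}


def balance_report(recordings):
    """Return, for every category cell, the number of tokens still missing
    to reach the size of the largest cell (a simple equal-share target)."""
    counts = Counter()
    for rec in recordings:
        # Build the cell key in the same order as CATEGORIES, e.g. ("F", "36+").
        cell = tuple(rec[name] for name in CATEGORIES)
        counts[cell] += rec["tokens"]
    # Enumerate every possible cell, including those with no recordings yet.
    cells = list(product(*CATEGORIES.values()))
    target = max((counts[c] for c in cells), default=0)
    return {cell: target - counts[cell] for cell in cells}


# Usage: three hypothetical submissions with their token counts.
recordings = [
    {"sex": "F", "age": "18-35", "tokens": 1200},
    {"sex": "M", "age": "18-35", "tokens": 800},
    {"sex": "F", "age": "36+", "tokens": 500},
]
report = balance_report(recordings)
# List the cells that most need new material first.
for cell, missing in sorted(report.items(), key=lambda kv: -kv[1]):
    print(cell, missing)
```

Enumerating the full category grid, rather than only the cells already present, matters here: an empty cell (older men in this toy data) is exactly the gap that balancing is meant to reveal.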
Czech name
—
Czech description
—
Classification
Type
O - Miscellaneous
CEP classification
AI - Linguistics
OECD FORD branch
—
Result continuities
Project
—
Continuities
Z - Research plan (with reference to CEZ)
Others
Publication year
2011
Confidentiality
S - Complete and accurate project data are not subject to protection under special legal regulations