Database Framework for a Distributed Spoken Data Collection Project

Result identifiers

Result code in IS VaVaI
RIV/00216208:11210/11:10103866 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11210%2F11%3A10103866)
Result on the web
—
DOI - Digital Object Identifier
—

Alternative languages

Result language
English
Title in the original language
Database Framework for a Distributed Spoken Data Collection Project
Result description in the original language
The chapter describes the main features of the database system Mluvka ("chatterbox"), which is used at the Czech National Corpus to collect recordings and transcriptions of authentic spoken Czech used in informal situations. To record the greatest possible variety of speakers, the material is collected across the whole country through a network of local collaborators. The system is a central data store that reflects the distributed character of the project and facilitates its organisation in various ways. In particular, it ensures the formal conformance of all submissions, supports several levels of read/write access rights based on the collection areas, and enables continuous balancing of the collected material. Mluvka is a well-attested system behind both recently published corpora of authentic spoken Czech, ORAL2006 and ORAL2008. Their total size is 2 650 000 tokens including punctuation; ORAL2008 is balanced in selected sociolinguistic categories of speakers.

Classification

Type
O - Other results
CEP field
AI - Linguistics
OECD FORD field
—

Result continuities

Project
—
Continuities
Z - Research plan (with a link to CEZ)

Other

Year of implementation
2011
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Similar results (10)

Balanced corpus of informal spoken Czech: compilation, design and findings
ORAL2008: Nový vyvážený korpus mluvené češtiny
Mapping Diatopic and Diachronic Variation in Spoken Czech: the ORTOFON and DIALEKT Corpora
