Spoken Corpora of Slavic Languages

Result identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11210%2F22%3A10456702" target="_blank" >RIV/00216208:11210/22:10456702 - isvavai.cz</a>

  • Result on the web

    <a href="https://verso.is.cuni.cz/pub/verso.fpl?fname=obd_publikace_handle&handle=qtiZaEwpEg" target="_blank" >https://verso.is.cuni.cz/pub/verso.fpl?fname=obd_publikace_handle&handle=qtiZaEwpEg</a>

  • DOI - Digital Object Identifier

    <a href="http://dx.doi.org/10.1007/s11185-022-09254-9" target="_blank" >10.1007/s11185-022-09254-9</a>

Alternative languages

  • Result language

    English

  • Original language name

    Spoken Corpora of Slavic Languages

  • Original language description

    Spoken corpora are collections of transcribed and annotated audio and/or video recordings of languages or language varieties. The aim of this paper is to present an overview of 51 spoken corpora currently available for Slavic languages and dialects, in particular Belarusian, Bulgarian, Croatian, Czech, Polish, Russian, Slovak, Slovenian, Trasianka, Ukrainian/Rusyn. We identify three groups of corpora according to the type of lect: corpora of standard languages (spoken mainly in an urban environment and existing in both written and oral form), dialects (spoken mainly in a rural environment and unwritten), and bilingual varieties (we call bilingual varieties spoken as L2 by people with different L1 languages, as well as all varieties that evolved in a multilingual environment). We survey the corpora in terms of text registers, transcription, and principles of linguistic and extralinguistic annotation. In conclusion, we suggest a list of features that linguists should take into consideration when developing a spoken corpus. Many spoken corpora are currently being created for various Slavic lects, and their developers may use this overview as a source of information on different designs and solutions.

  • Czech name

  • Czech description

Classification

  • Type

    J<sub>imp</sub> - Article in a specialist periodical, which is included in the Web of Science database

  • CEP classification

  • OECD FORD branch

    60203 - Linguistics

Result continuities

  • Project

  • Continuities

Others

  • Publication year

    2022

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

  • Name of the periodical

    Russian Linguistics

  • ISSN

    0304-3487

  • e-ISSN

    1572-8714

  • Volume of the periodical

    46

  • Issue of the periodical within the volume

    2

  • Country of publishing house

    NL - THE KINGDOM OF THE NETHERLANDS

  • Number of pages

    17

  • Pages from-to

    77-93

  • UT code for WoS article

    000827909200001

  • EID of the result in the Scopus database

    2-s2.0-85134600545