Comparison of spoken corpora from a sociolinguistic perspective
Identifikátory výsledku
Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11210%2F17%3A10367178" target="_blank" >RIV/00216208:11210/17:10367178 - isvavai.cz</a>
Výsledek na webu
<a href="https://verso.is.cuni.cz/pub/verso.fpl?fname=obd_publikace_handle&handle=vNsq4La4mi" target="_blank" >https://verso.is.cuni.cz/pub/verso.fpl?fname=obd_publikace_handle&handle=vNsq4La4mi</a>
DOI - Digital Object Identifier
—
Alternativní jazyky
Jazyk výsledku
angličtina
Název v původním jazyce
Comparison of spoken corpora from a sociolinguistic perspective
Popis výsledku v původním jazyce
This paper presents a comparison of the largest contemporary corpus of spoken Czech ORAL2013 and a different source, data gathered in the project "Sociolinguistic Analysis of the Use of Prothetic v- in Bohemia" (SAUP). Both of these data sources consist of informal interviews with Czech speakers, but their design is different. ORAL2013 is based on shorter recordings of many speakers whereas the SAUP data is based on longer recordings of fewer speakers. It is assumed that these two data sources should yield similar results since they aim to represent the same population. The comparison is based on the use of two features of spoken Czech in the Bohemia region: prothetic v- and conditional verb forms bych/bysem and bychom/bysme. Based on the analysis, it is concluded that (1) more information about the speakers should be added to future corpora like ORAL2013; (2) the corpus ORAL2013 is useful to conduct a sociolinguistic pilot study which then should be followed by a full-scale research project based on a different sample constructed strictly for the purposes of the particular research; (3) the ratio between the number of speakers in the corpus and the amount of their speech is an important (and often underestimated) aspect of corpus design which should be given careful consideration.
Název v anglickém jazyce
Comparison of spoken corpora from a sociolinguistic perspective
Popis výsledku anglicky
This paper presents a comparison of the largest contemporary corpus of spoken Czech ORAL2013 and a different source, data gathered in the project "Sociolinguistic Analysis of the Use of Prothetic v- in Bohemia" (SAUP). Both of these data sources consist of informal interviews with Czech speakers, but their design is different. ORAL2013 is based on shorter recordings of many speakers whereas the SAUP data is based on longer recordings of fewer speakers. It is assumed that these two data sources should yield similar results since they aim to represent the same population. The comparison is based on the use of two features of spoken Czech in the Bohemia region: prothetic v- and conditional verb forms bych/bysem and bychom/bysme. Based on the analysis, it is concluded that (1) more information about the speakers should be added to future corpora like ORAL2013; (2) the corpus ORAL2013 is useful to conduct a sociolinguistic pilot study which then should be followed by a full-scale research project based on a different sample constructed strictly for the purposes of the particular research; (3) the ratio between the number of speakers in the corpus and the amount of their speech is an important (and often underestimated) aspect of corpus design which should be given careful consideration.
Klasifikace
Druh
J<sub>imp</sub> - Článek v periodiku v databázi Web of Science
CEP obor
—
OECD FORD obor
60203 - Linguistics
Návaznosti výsledku
Projekt
<a href="/cs/project/GP13-12973P" target="_blank" >GP13-12973P: Sociolingvistická analýza užívání protetického /v/ v Čechách</a><br>
Návaznosti
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)<br>I - Institucionalni podpora na dlouhodoby koncepcni rozvoj vyzkumne organizace
Ostatní
Rok uplatnění
2017
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Údaje specifické pro druh výsledku
Název periodika
Slovo a slovesnost
ISSN
0037-7031
e-ISSN
—
Svazek periodika
78
Číslo periodika v rámci svazku
2
Stát vydavatele periodika
CZ - Česká republika
Počet stran výsledku
14
Strana od-do
145-158
Kód UT WoS článku
000402434300003
EID výsledku v databázi Scopus
—