Multi-word units in Czech Academic Texts
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11210/21:10436222 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11210%2F21%3A10436222)
Result on the web
https://verso.is.cuni.cz/pub/verso.fpl?fname=obd_publikace_handle&handle=eb3H~KjkXL
DOI - Digital Object Identifier
10.14712/23366591.2021.2.4 (http://dx.doi.org/10.14712/23366591.2021.2.4)
Alternative languages
Result language
Czech
Original language name
Víceslovné jednotky typické pro české akademické texty
Original language description
This paper introduces Akalex, a new online tool created to support vocabulary research on Czech academic texts. The Akalex database contains almost 60,000 n-grams (candidates for typical academic words or multi-word units) and can be easily searched and filtered by several criteria. The n-grams were extracted from SYN2015, a corpus of contemporary written Czech, on the basis of their prominent frequency in academic texts and their occurrence across many different academic disciplines, which distinguishes them from general vocabulary on the one hand and from specialized terminology on the other. Each n-gram in the database comes with additional information, such as part of speech, distribution across disciplines and frequency, which makes it possible to search, for example, for collocations with a specific lexeme (such as adjectives combined with the word výzkum 'research' or verbs with a certain preposition). The features of Akalex were put to the test in our case study covering 2-grams to 6-grams used in all 24 academic disciplines included in the SYN2015 corpus. Out of almost 900 candidates, 236 were manually chosen by two annotators as typical of academic texts. These were then analysed further and divided into groups according to their semantic, functional and formal features. Among the most frequent were lexical bundles, collocations with content words, and combinations of two verbs, the last pointing to the frequent use of the passive in academic texts.
Czech name
Víceslovné jednotky typické pro české akademické texty
Classification
Type
Jsc - Article in a specialist periodical that is included in the SCOPUS database
CEP classification
—
OECD FORD branch
60203 - Linguistics
Result continuities
Project
EF16_019/0000734: Creativity and Adaptability as Conditions of the Success of Europe in an Interrelated World
Continuities
P - Research and development project financed from public funds (with a link to CEP)
I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2021
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Časopis pro moderní filologii [online]
ISSN
2336-6591
e-ISSN
—
Volume of the periodical
103
Issue of the periodical within the volume
2
Country of publishing house
CZ - CZECH REPUBLIC
Number of pages
16
Pages from-to
228-243
UT code for WoS article
—
EID of the result in the Scopus database
2-s2.0-85111711926