Calc: Corpus Calculator

Result identifiers

  • Result code in IS VaVaI

    RIV/00216208:11210/19:10402271 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11210%2F19%3A10402271)

  • Result on the web

    https://wiki.korpus.cz/doku.php/manualy:calc

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    English

  • Title in the original language

    Calc: Corpus Calculator

  • Result description in the original language

    This calculator provides quick support to corpus users for the basic statistical tasks most commonly encountered in research. The app is divided into 7 modules, each reflecting a specific research problem. The first module, "1 word in 1 corpus", does not perform any statistical testing as such but serves as a tool for adequate frequency interpretation. It should help you answer the question: what exactly does it mean when a feature I am interested in has a frequency X in a corpus? The second module, "2 words in 1 corpus", compares two frequencies (e.g. two competing variants in one corpus) and determines how significant their difference really is and whether it is just a result of random variation. The typical use case of the third module, "2 words in 2 corpora", is the identification of keywords - units that are significantly more common in one of two corpora (taking the size of both corpora into account). It is, however, useful for any comparison of frequencies across different corpora. The fourth module, "1 feature - more samples", helps establish the level of precision and reliability of a random sample analysis. If the resulting range of variation (for the feature under scrutiny) is too broad, it may be advisable to improve the precision by adding more samples. The fifth module, "Many features - 1 sample", is used to estimate the occurrence of groups of features (e.g. word senses) in an analysed sample or concordance. It can be used to show whether one group is truly more frequent than another or whether a group can be considered actually attested. The sixth module, labelled "zTTR", compares the lexical richness of texts (the length of the sample in relation to the number of lexical units it contains). Its advantage is that the resulting zTTR index is comparable even between texts of different lengths.
    When comparing multi-word units between two languages, we often face the problem of n-gram length (non-)correspondence. The seventh module, "N-gram correspondence", is used to investigate n-gram correspondence by showing the actual counterparts of, e.g., a list of the most frequent bigrams in one language when compared to another language.
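
    The record does not state which statistical procedures Calc uses internally. As a rough illustration of the kind of calculation the first and third modules address, the Python sketch below computes a relative frequency with a confidence interval and a keyness score for one item in two corpora. The function names, the Wilson score interval, and the log-likelihood statistic are assumptions chosen for illustration (both are common in corpus linguistics) and are not confirmed to be Calc's own methods.

```python
import math


def ipm(count, corpus_size):
    """Relative frequency in instances per million tokens (hypothetical helper)."""
    return count * 1_000_000 / corpus_size


def wilson_interval(count, corpus_size, z=1.96):
    """Wilson score interval for the proportion count / corpus_size.

    z = 1.96 corresponds to an approximate 95% confidence level.
    Assumption: Calc may use a different interval; this is only a sketch.
    """
    p = count / corpus_size
    denom = 1 + z**2 / corpus_size
    centre = (p + z**2 / (2 * corpus_size)) / denom
    half = z * math.sqrt(
        p * (1 - p) / corpus_size + z**2 / (4 * corpus_size**2)
    ) / denom
    return centre - half, centre + half


def log_likelihood(count_a, size_a, count_b, size_b):
    """Two-cell log-likelihood (G2) for one item's frequency in two corpora.

    Expected counts are proportional to corpus size, so corpora of
    different sizes can be compared. Assumption: not necessarily the
    test Calc applies.
    """
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Invented example figures: 150 hits in corpus A, 50 hits in corpus B,
# both corpora one million tokens long.
low, high = wilson_interval(150, 1_000_000)
score = log_likelihood(150, 1_000_000, 50, 1_000_000)
print(f"{ipm(150, 1_000_000):.1f} ipm, 95% interval {low * 1e6:.1f}-{high * 1e6:.1f} ipm")
print(f"log-likelihood = {score:.2f}")
```

    A log-likelihood value above 3.84 corresponds to p < 0.05 at one degree of freedom, so under these assumptions the invented item would count as significantly more frequent in corpus A.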

Classification

  • Type

    R - Software

  • CEP field

  • OECD FORD field

    60203 - Linguistics

Result continuities

  • Project

    LM2015044: Český národní korpus

  • Continuities

    P - Research and development project financed from public funds (with a link to CEP)

Other

  • Year of application

    2019

  • Data confidentiality code

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

  • Internal product identification code

    Calc

  • Technical parameters

    Online multi-user web application running at http://www.korpus.cz/calc

  • Economic parameters

    The application is freely available under the GNU GPL 2 licence and does not primarily generate any material profit.

  • Registration number (IČO) of the result owner

    00216208

  • Owner name

    Univerzita Karlova