Genre Annotation of Web Corpora: Scheme and Issues

The result's identifiers

  • Result code in IS VaVaI

    RIV/00216224:14330/21:00118741 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F21%3A00118741)

  • Result on the web

    https://link.springer.com/book/10.1007/978-3-030-63128-4

  • DOI - Digital Object Identifier

    10.1007/978-3-030-63128-4_55 (http://dx.doi.org/10.1007/978-3-030-63128-4_55)

Alternative languages

  • Result language

    English

  • Original language name

    Genre Annotation of Web Corpora: Scheme and Issues

  • Original language description

    Unlike traditional corpora built from printed media in past decades, the sources of web corpora are not well categorised or described, which makes it difficult to control the content of the corpus. This paper presents an attempt to classify genres in a large English web corpus through supervised learning. A set of genres suitable for web corpus users is defined based on a survey of related work. A genre annotation scheme with active learning rounds is introduced. A collection of web pages representing various genres, created for this task, and a scheme for the subsequent human annotation of the data set are described. Measuring the inter-annotator agreement revealed that either the problem may not be well defined, or our expectations concerning the precision and recall of the classifier cannot be met. The project was postponed at that point. Possible solutions to the issue are discussed at the end of the paper.

  • Czech name

  • Czech description

Classification

  • Type

    D - Article in proceedings

  • CEP classification

  • OECD FORD branch

    60203 - Linguistics

Result continuities

  • Project

    GA18-23891S: Hyperintensional Reasoning over Natural Language Texts

  • Continuities

    P - Research and development project financed from public funds (with a link to CEP)
    S - Specific university research

Others

  • Publication year

    2021

  • Confidentiality

    S - Complete and truthful data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    Proceedings of the Future Technologies Conference (FTC) 2020, Volume 1

  • ISBN

    9783030631277

  • ISSN

    2194-5357

  • e-ISSN

    2194-5365

  • Number of pages

    17

  • Pages from-to

    738-754

  • Publisher name

    Springer Nature Switzerland AG

  • Place of publication

    Vancouver, Canada

  • Event location

    Vancouver, Canada

  • Event date

    Nov 5, 2020

  • Type of event by nationality

    WRD - Worldwide event

  • UT code for WoS article