Yet Another Language Identifier

Result identifiers

  • Result code in IS VaVaI

    RIV/00216208:11320/12:10130078 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F12%3A10130078)

  • Result on the web

    http://aclweb.org/anthology-new/E/E12/E12-3006.pdf

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    English

  • Original language name

    Yet Another Language Identifier

  • Original language description

    Language identification of written text has been studied for several decades. Despite this, most research focuses on a few of the most widely spoken languages, whereas minor ones are ignored. Identifying a larger number of languages brings new difficulties that do not occur for a few languages, and these difficulties decrease accuracy. The objective of this paper is to investigate the sources of this degradation. To isolate the impact of individual factors, 5 different algorithms and 3 different numbers of languages are used. The Support Vector Machine algorithm achieved an accuracy of 98% for 90 languages, and the YALI algorithm, based on a scoring function, achieved an accuracy of 95.4%. The YALI algorithm has slightly lower accuracy but classifies around 17 times faster, and its training is more than 4000 times faster. Three different data sets with various numbers of languages and sample sizes were prepared to overcome the lack of standardized data sets. These

  • Czech name

  • Czech description

Classification

  • Type

    D - Article in proceedings

  • CEP classification

    IN - Informatics

  • OECD FORD branch

Result continuities

  • Project

    7E11042: Knowledge Helper for Medical and Other Information users

  • Continuities

    P - Research and development project financed from public sources (with a link to CEP)

Others

  • Publication year

    2012

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Article name in the collection

    Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics

  • ISBN

    978-1-937284-19-0

  • ISSN

  • e-ISSN

  • Number of pages

    9

  • Pages from-to

    46-54

  • Publisher name

    Association for Computational Linguistics

  • Place of publication

    Avignon, France

  • Event location

    Avignon, France

  • Event date

    Apr 23, 2012

  • Type of event by nationality

    CST - Nationwide event

  • UT code for WoS article