Yet Another Language Identifier

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F12%3A10130078" target="_blank" >RIV/00216208:11320/12:10130078 - isvavai.cz</a>
Result on the web
<a href="http://aclweb.org/anthology-new/E/E12/E12-3006.pdf" target="_blank" >http://aclweb.org/anthology-new/E/E12/E12-3006.pdf</a>
DOI - Digital Object Identifier
—

Alternative languages

Result language
English
Title in original language
Yet Another Language Identifier
Result description in original language
Language identification of written text has been studied for several decades. Despite this, most research focuses on a few of the most widely spoken languages, whereas minor ones are ignored. Identifying a larger number of languages brings new difficulties that do not occur with only a few languages, and these difficulties decrease accuracy. The objective of this paper is to investigate the sources of this degradation. To isolate the impact of individual factors, 5 different algorithms and 3 different numbers of languages are used. The Support Vector Machine algorithm achieved an accuracy of 98% for 90 languages, and the YALI algorithm, based on a scoring function, achieved an accuracy of 95.4%. The YALI algorithm has slightly lower accuracy but classifies around 17 times faster, and its training is more than 4000 times faster. Three different data sets with various numbers of languages and sample sizes were prepared to overcome the lack of standardized data sets. These
Title in English
Yet Another Language Identifier
Result description in English
Language identification of written text has been studied for several decades. Despite this, most research focuses on a few of the most widely spoken languages, whereas minor ones are ignored. Identifying a larger number of languages brings new difficulties that do not occur with only a few languages, and these difficulties decrease accuracy. The objective of this paper is to investigate the sources of this degradation. To isolate the impact of individual factors, 5 different algorithms and 3 different numbers of languages are used. The Support Vector Machine algorithm achieved an accuracy of 98% for 90 languages, and the YALI algorithm, based on a scoring function, achieved an accuracy of 95.4%. The YALI algorithm has slightly lower accuracy but classifies around 17 times faster, and its training is more than 4000 times faster. Three different data sets with various numbers of languages and sample sizes were prepared to overcome the lack of standardized data sets. These

Classification

Type
D - Article in proceedings
CEP field
IN - Informatics
OECD FORD field
—

Result linkages

Project
<a href="/cs/project/7E11042" target="_blank" >7E11042: Knowledge Helper for Medical and Other Information users</a>
Linkages
P - Research and development project financed from public funds (with a link to CEP)

Other

Year of implementation
2012
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Title of the article in the proceedings
Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics
ISBN
978-1-937284-19-0
ISSN
—
e-ISSN
—
Number of pages
9
Pages from-to
46-54
Publisher name
Association for Computational Linguistics
Place of publication
Avignon, France
Event location
Avignon, France
Event date
23 April 2012
Type of event by nationality
CST - National event
UT WoS article code
—

Similar results (10)

Optimization of multilayer neural network parameters for speaker recognition
Implementation of Ant Colony Algorithms in Matlab
Simple and Fast Oexp(N) Algorithm for Finding an Exact Maximum Distance in E2 Instead of O(N^2) or O(N lgN)
