Named Entity Recognition in Vietnamese Tweets
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989100%3A27240%2F15%3A86096567" target="_blank" >RIV/61989100:27240/15:86096567 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1007/978-3-319-21786-4_18" target="_blank" >http://dx.doi.org/10.1007/978-3-319-21786-4_18</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-319-21786-4_18" target="_blank" >10.1007/978-3-319-21786-4_18</a>
Alternative languages
Result language
English
Title in original language
Named Entity Recognition in Vietnamese Tweets
Result description in original language
Named entity recognition (NER) is the task of detecting named entities in documents and categorizing them into predefined classes such as Person (PER), Location (LOC), and Organization (ORG). Many approaches have been proposed to tackle this problem, both in formal texts such as news or authorized web content and in short texts such as content on online social networks. However, those texts were written in languages other than Vietnamese. In this paper, we propose a method for NER in Vietnamese tweets. Since tweets on Twitter are noisy, irregular, and short, and contain acronyms and spelling errors, NER in those tweets is a challenging task. Our method first normalizes tweets and then applies a learning model to recognize named entities using six different types of features. We built a training set of more than 40,000 named entities and a testing set of 2,446 named entities to evaluate our system. The experimental results show that our system achieves encouraging performance with 82.3%
Classification
Type
D - Article in proceedings
CEP field
IN - Informatics
OECD FORD field
—
Result continuities
Project
—
Continuities
S - Specific research at universities
Others
Year of implementation
2015
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to result type
Article name in the proceedings
Lecture Notes in Computer Science. Volume 9197
ISBN
978-3-319-21785-7
ISSN
0302-9743
e-ISSN
—
Number of pages
11
Pages from-to
205-215
Publisher name
Springer Verlag
Place of publication
London
Event location
Beijing
Event date
4 August 2015
Event type by nationality
WRD - Worldwide event
UT WoS article code
—