Named Entity Recognition in Vietnamese Tweets
The result's identifiers
Result code in IS VaVaI
RIV/61989100:27240/15:86096567 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989100%3A27240%2F15%3A86096567)
Result on the web
http://dx.doi.org/10.1007/978-3-319-21786-4_18
DOI - Digital Object Identifier
10.1007/978-3-319-21786-4_18
Alternative languages
Result language
English
Original language name
Named Entity Recognition in Vietnamese Tweets
Original language description
Named entity recognition (NER) is the task of detecting named entities in documents and categorizing them into predefined classes such as Person (PER), Location (LOC), and Organization (ORG). Many approaches have been proposed to tackle this problem, both in formal texts such as news or authorized web content and in short texts such as content on online social networks. However, those texts were written in languages other than Vietnamese. In this paper, we propose a method for NER in Vietnamese tweets. Since tweets are noisy, irregular, and short, and contain acronyms and spelling errors, NER in tweets is a challenging task. Our method first normalizes tweets and then applies a learning model to recognize named entities using six different types of features. We built a training set of more than 40,000 named entities and a testing set of 2,446 named entities to evaluate our system. The experimental results show that our system achieves encouraging performance with 82.3%
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
IN - Informatics
OECD FORD branch
—
Result continuities
Project
—
Continuities
S - Specific university research
Others
Publication year
2015
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Lecture Notes in Computer Science. Volume 9197
ISBN
978-3-319-21785-7
ISSN
0302-9743
e-ISSN
—
Number of pages
11
Pages from-to
205-215
Publisher name
Springer Verlag
Place of publication
London
Event location
Beijing
Event date
Aug 4, 2015
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—