Normalization of Vietnamese Tweets on Twitter
The result's identifiers
Result code in IS VaVaI
RIV/61989100:27240/15:86096573 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989100%3A27240%2F15%3A86096573)
Result on the web
http://dx.doi.org/10.1007/978-3-319-21206-7_16
DOI - Digital Object Identifier
10.1007/978-3-319-21206-7_16
Alternative languages
Result language
English
Original language name
Normalization of Vietnamese Tweets on Twitter
Original language description
We study the task of noisy text normalization, focusing on Vietnamese tweets. This task aims to improve the performance of applications that mine or analyze the semantics of social media content, as well as other social network analysis applications. Because tweets on Twitter are noisy, irregular, and short, and contain acronyms and spelling errors, processing them is more challenging than processing news or other formal texts. In this paper, we propose a method that normalizes Vietnamese tweets by detecting and correcting non-standard words and spelling errors. The method combines a language model with dictionaries and Vietnamese vocabulary structures. We build a dataset of 1,360 Vietnamese tweets to evaluate the proposed method. Experimental results show that the method achieves encouraging performance, with an F1-score of 89%. © Springer International Publishing Switzerland 2015.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
IN - Informatics
OECD FORD branch
—
Result continuities
Project
—
Continuities
S - Specific university research
Others
Publication year
2015
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Advances in intelligent systems and computing. Volume 370
ISBN
978-3-319-21205-0
ISSN
2194-5357
e-ISSN
—
Number of pages
11
Pages from-to
179-189
Publisher name
Springer
Place of publication
Basel
Event location
Ostrava
Event date
Jun 29, 2015
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—