Sarcasm Detection on Czech and English Twitter
The result's identifiers
Result code in IS VaVaI
RIV/49777513:23520/14:43923001 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F14%3A43923001)
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Sarcasm Detection on Czech and English Twitter
Original language description
This paper presents a machine learning approach to sarcasm detection on Twitter in two languages, English and Czech. This is the first attempt at sarcasm detection in the Czech language. We created a large Czech Twitter corpus consisting of 7,000 manually labelled tweets and provide it to the community. We evaluate two classifiers with various combinations of features on both the Czech and English datasets. Furthermore, we tackle the issues of rich Czech morphology by examining different pre-processing techniques. Experiments show that our language-independent approach significantly outperforms adapted state-of-the-art methods in English (F-measure 0.947) and also represents a strong baseline for further research in Czech (F-measure 0.582).
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
JD - Use of computers, robotics and its application
OECD FORD branch
—
Result continuities
Project
—
Continuities
S - Specific university research
Others
Publication year
2014
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers
ISBN
978-1-941643-26-6
ISSN
—
e-ISSN
—
Number of pages
11
Pages from-to
213-223
Publisher name
not stated
Place of publication
not stated
Event location
Dublin
Event date
Aug 23, 2014
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—