Identification of Multiword Expressions in Tweets for Hate Speech Detection
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F22%3A5SBMMEI4" target="_blank" >RIV/00216208:11320/22:5SBMMEI4 - isvavai.cz</a>
Result on the web
<a href="https://aclanthology.org/2022.lrec-1.22" target="_blank" >https://aclanthology.org/2022.lrec-1.22</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Identification of Multiword Expressions in Tweets for Hate Speech Detection
Original language description
Multiword expression (MWE) identification in tweets is a complex task due to the complex linguistic nature of MWEs combined with the non-standard language use in social networks. MWE features were shown to be helpful for hate speech detection (HSD). In this article, we present joint experiments on these two related tasks on English Twitter data: first we focus on the MWE identification task, and then we observe the influence of MWE-based features on the HSD task. For MWE identification, we compare the performance of two systems: lexicon-based and deep neural networks-based (DNN). We experimentally evaluate seven configurations of a state-of-the-art DNN system based on recurrent networks using pre-trained contextual embeddings from BERT. The DNN-based system outperforms the lexicon-based one thanks to its superior generalisation power, yielding much better recall. For the HSD task, we propose a new DNN architecture for incorporating MWE features. We confirm that MWE features are helpful for the HSD task. Moreover, the proposed DNN architecture beats previous MWE-based HSD systems by 0.4 to 1.1 F-measure points on average on four Twitter HSD corpora.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2022
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the collection (proceedings)
Proceedings of the Thirteenth Language Resources and Evaluation Conference
ISBN
979-10-95546-72-6
ISSN
—
e-ISSN
—
Number of pages
9
Pages from-to
202-210
Publisher name
European Language Resources Association
Place of publication
—
Event location
Marseille, France
Event date
Jun 20, 2022
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—