Annotation scheme and evaluation: the case of OFFENSIVE language
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14410%2F23%3A00132528" target="_blank" >RIV/00216224:14410/23:00132528 - isvavai.cz</a>
Result on the web
<a href="https://hrcak.srce.hr/clanak/444602" target="_blank" >https://hrcak.srce.hr/clanak/444602</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.31724/rihjj.49.1.8" target="_blank" >10.31724/rihjj.49.1.8</a>
Alternative languages
Result language
English
Title in original language
Annotation scheme and evaluation: the case of OFFENSIVE language
Result description in original language
The present paper focuses on the presentation and discussion of aspects of OFFENSIVE LANGUAGE linguistic annotation, including creation, annotation practice, curation, and evaluation of an OFFENSIVE LANGUAGE annotation taxonomy scheme, first proposed in Lewandowska-Tomaszczyk et al. (2021). An extended offensive language ontology comprising 17 categories, structured in terms of 4 hierarchical levels, has been shown to represent the encoding of the defined offensive language schema, trained in terms of non-contextual word embeddings – i.e., Word2Vec and FastText, and eventually juxtaposed to the data acquired by using a pairwise training and testing analysis for existing categories in the HateBERT model (Lewandowska-Tomaszczyk et al. submitted). The study reports on the annotation practice in WG 4.1.1. Incivility in media and social media in the context of COST Action CA 18209 European network for Web-centred linguistic data science (Nexus Linguarum) with the INCEpTION tool (https://github.com/inception-project/inception) – a semantic annotation platform offering assistance in annotation. The results partly support the proposed ontology of explicit offence and positive implicitness types to provide more variance among widely recognized types of figurative language (e.g., metaphorical, metonymic, ironic, etc.). The use of the annotation system and the representation of linguistic data have also been evaluated in a series of the annotators’ comments, using a questionnaire method and in an open discussion. The annotation results and the questionnaire showed that for some of the categories, there was low or medium inter-annotator agreement, and it was more challenging for annotators to distinguish between category items than between aspect items, with the category items of offensive, insulting and abusive being the most difficult in this respect. The need for taxonomic simplification measures in this respect has been recognized for further annotation practices.
Classification
Type
J<sub>imp</sub> - Article in a periodical in the Web of Science database
CEP field
—
OECD FORD field
60203 - Linguistics
Result continuities
Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organisation
Other
Year of implementation
2023
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Name of the periodical
Rasprave Instituta za Hrvatski Jezik i Jezikoslovlje
ISSN
1331-6745
e-ISSN
1849-0379
Volume of the periodical
49
Issue of the periodical within the volume
1
Country of the periodical's publisher
HR - Republic of Croatia
Number of pages of the result
21
Pages from-to
155-175
UT WoS code of the article
001153374200005
EID of the result in the Scopus database
2-s2.0-85177228943