Extending general sentiment lexicon to specific domains in (semi-)automatic manner
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F22%3AKPQ6FJYQ" target="_blank" >RIV/00216208:11320/22:KPQ6FJYQ - isvavai.cz</a>
Result on the web
<a href="https://repositorio-aberto.up.pt/handle/10216/141370" target="_blank" >https://repositorio-aberto.up.pt/handle/10216/141370</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Title in the original language
Extending general sentiment lexicon to specific domains in (semi-)automatic manner
Result description in the original language
This paper describes an approach to constructing a sentiment analysis system that combines automatic and manual processes. The system comprises a domain-specific sentiment lexicon, modifier patterns, and rules that are used to derive the sentiment values of sentences in new texts. The lexicon of single words (unigrams) is obtained automatically from the distribution of ratings for all words in the labelled training data. The sentiment values of phrases are derived from a manually built list of modifier patterns. Each pattern includes a modifier and a focal element. Modifiers are of different types, depending on whether the operation is intensification, downtoning, or reversal. The approach was applied to texts on economics and finance in European Portuguese. In our view, this line of work deserves more attention in the community, as the system not only achieves reasonable performance but can also provide understandable explanations to the user.
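The pipeline outlined in the description can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that a word's sentiment is the mean rating of the labelled texts containing it (the description only says the lexicon comes from the distribution of ratings), and the `MODIFIERS` table, its scaling factors, the `min_count` threshold, and the English toy data are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_lexicon(labelled_texts, min_count=2):
    """Map each word to the mean rating of the texts it occurs in.

    Assumption: the mean is used as a simple summary of the rating
    distribution; words seen in fewer than `min_count` texts are dropped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, rating in labelled_texts:
        for word in set(text.lower().split()):
            totals[word] += rating
            counts[word] += 1
    return {w: totals[w] / counts[w] for w in counts if counts[w] >= min_count}


# Hypothetical modifier table: modifier word -> (operation type, factor).
MODIFIERS = {
    "very": ("intensification", 1.5),
    "slightly": ("downtoning", 0.5),
    "not": ("reversal", -1.0),
}


def score_sentence(sentence, lexicon):
    """Return (total sentiment, explanation) for a sentence.

    A modifier directly followed by a lexicon word (the focal element)
    is scored as one pattern; other lexicon words are scored on their own.
    """
    tokens = sentence.lower().split()
    total = 0.0
    explanation = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in MODIFIERS and nxt in lexicon:
            operation, factor = MODIFIERS[tok]
            value = factor * lexicon[nxt]
            explanation.append((f"{tok} {nxt}", operation, value))
            total += value
            i += 2
        elif tok in lexicon:
            explanation.append((tok, "lexicon", lexicon[tok]))
            total += lexicon[tok]
            i += 1
        else:
            i += 1
    return total, explanation


if __name__ == "__main__":
    # Toy labelled data with ratings in [-1, 1].
    training = [
        ("strong results", 1.0),
        ("strong demand", 1.0),
        ("weak results", -1.0),
        ("weak demand", -1.0),
    ]
    lexicon = build_lexicon(training)
    score, steps = score_sentence("not strong demand", lexicon)
    print(score)
    for phrase, source, value in steps:
        print(phrase, source, value)
```

The explanation list records which lexicon entry or modifier pattern produced each contribution, which is one simple way to give the user the kind of understandable explanation the description mentions.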
Classification
Type
O - Other results
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
—
Linkages
—
Other
Year of implementation
2022
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations