Extending Word2Vec with Domain-Specific Labels
The result's identifiers
Result code in IS VaVaI
RIV/61989100:27510/22:10250379 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989100%3A27510%2F22%3A10250379)
Result on the web
https://annals-csis.org/Volume_30/
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Extending Word2Vec with Domain-Specific Labels
Original language description
Choosing a proper representation of textual data is an important part of natural language processing. One option is using Word2Vec embeddings, i.e., dense vectors whose properties can, to a degree, capture the "meaning" of each word. One of the main disadvantages of Word2Vec is its inability to distinguish between antonyms. Motivated by this deficiency, this paper presents a Word2Vec extension for incorporating domain-specific labels. The goal is to improve the ability to differentiate between embeddings of words associated with different document labels or classes. This improvement is demonstrated on word embeddings derived from tweets related to a publicly traded company. Each tweet is given a label depending on whether its publication coincides with a stock price increase or decrease. The extended Word2Vec model then takes this label into account. The user can also set the weight of this label in the embedding creation process. Experimental results show that increasing this weight leads to a gradual decrease in cosine similarity between embeddings of words associated with different labels. This decrease in similarity can be interpreted as an improvement in the ability to distinguish between these words.
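The description evaluates the extension by the cosine similarity between word embeddings. As a minimal sketch of that measure only (not of the paper's model, which this record does not specify), the Python function below computes cosine similarity for two vectors. The function name and the toy vectors are hypothetical and chosen purely for illustration.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (not data from the paper).
word_a = [0.9, 0.1, 0.3]
word_b = [0.8, 0.2, 0.4]    # points in nearly the same direction as word_a
word_c = [-0.7, 0.6, -0.2]  # points in a very different direction

print(cosine_similarity(word_a, word_b))
print(cosine_similarity(word_a, word_c))
```

A value near 1 means the two vectors point in almost the same direction, while lower values mean the embeddings are easier to tell apart, which is the sense in which the description reads a decrease in similarity as an improvement.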
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
S - Specific research at universities
Others
Publication year
2022
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Annals of Computer Science and Information Systems. Volume 30
ISBN
978-83-965897-1-2
ISSN
2300-5963
e-ISSN
—
Number of pages
4
Pages from-to
157-160
Publisher name
Polskie Towarzystwo Informatyczne
Place of publication
Warsaw
Event location
Sofia
Event date
Sep 4, 2022
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
000904404400022