A method for constructing word sense embeddings based on word sense induction
The result's identifiers
Result code in IS VaVaI
RIV/61989100:27240/23:10254662 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989100%3A27240%2F23%3A10254662)
Result on the web
https://www.nature.com/articles/s41598-023-40062-3
DOI - Digital Object Identifier
10.1038/s41598-023-40062-3 (http://dx.doi.org/10.1038/s41598-023-40062-3)
Alternative languages
Result language
English
Original language name
A method for constructing word sense embeddings based on word sense induction
Original language description
Polysemy is an inherent characteristic of natural language. In order to make it easier to distinguish between different senses of polysemous words, we propose a method for encoding multiple different senses of polysemous words using a single vector. The method first uses a two-layer bidirectional long short-term memory neural network and a self-attention mechanism to extract the contextual information of polysemous words. Then, a K-means algorithm, which is improved by optimizing the density peaks clustering algorithm based on cosine similarity, is applied to perform word sense induction on the contextual information of polysemous words. Finally, the method constructs the corresponding word sense embedded representations of the polysemous words. The results of the experiments demonstrate that the proposed method produces better word sense induction than Euclidean distance, Pearson correlation, and KL-divergence and more accurate word sense embeddings than mean shift, DBSCAN, spectral clustering, and agglomerative clustering. © 2023, Springer Nature Limited.
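The clustering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the context vectors of one polysemous word are already available as rows of a NumPy array (the BiLSTM and self-attention encoder is omitted), it uses a generic density-peaks heuristic over cosine distance to choose the initial K-means centers, and it treats the normalized cluster means as stand-in sense embeddings. The function names, the `cutoff_quantile` parameter, and the Gaussian density kernel are all assumptions made for illustration.

```python
import numpy as np

def cosine_distance_matrix(X):
    """Pairwise cosine distances (1 - cosine similarity) between rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return 1.0 - np.clip(Xn @ Xn.T, -1.0, 1.0)

def density_peak_seeds(X, k, cutoff_quantile=0.2):
    """Pick k seed indices: points with high local density that are also
    far (in cosine distance) from every denser point."""
    D = cosine_distance_matrix(X)
    n = len(X)
    # Cutoff distance taken as a quantile of all pairwise distances (assumed heuristic).
    d_c = max(np.quantile(D[np.triu_indices(n, k=1)], cutoff_quantile), 1e-12)
    # Gaussian-kernel local density, with the self-contribution removed.
    rho = np.exp(-(D / d_c) ** 2).sum(axis=1) - 1.0
    delta = np.empty(n)
    for i in range(n):
        denser = rho > rho[i]
        # Distance to the nearest denser point; the densest point gets its max distance.
        delta[i] = D[i, denser].min() if denser.any() else D[i].max()
    return np.argsort(rho * delta)[::-1][:k]

def induce_senses(X, k, n_iter=50):
    """K-means under cosine similarity, seeded by density peaks.
    Returns one label per context vector and k unit-length cluster centers."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    centers = Xn[density_peak_seeds(X, k)].copy()
    labels = np.zeros(len(X), dtype=int)
    for it in range(n_iter):
        # Assign each context vector to the most similar center.
        new_labels = np.argmax(Xn @ centers.T, axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            members = Xn[labels == j]
            if len(members):
                c = members.sum(axis=0)
                norm = np.linalg.norm(c)
                if norm > 0:
                    centers[j] = c / norm
    return labels, centers
```

In this sketch each row of `centers` plays the role of one induced sense of the word. Seeding from density peaks rather than from random points is what makes the result deterministic for a given input; the number of senses `k` still has to be supplied by the caller.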
Czech name
—
Czech description
—
Classification
Type
Jimp - Article in a specialist periodical, which is included in the Web of Science database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
S - Specific university research
Others
Publication year
2023
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Scientific Reports
ISSN
2045-2322
e-ISSN
—
Volume of the periodical
13
Issue of the periodical within the volume
1
Country of publishing house
US - UNITED STATES
Number of pages
13
Pages from-to
—
UT code for WoS article
001045574100067
EID of the result in the Scopus database
2-s2.0-85167532342