A comprehensive analysis of static word embeddings for Turkish
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AC45RSFNF" target="_blank" >RIV/00216208:11320/25:C45RSFNF - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85192269121&doi=10.1016%2fj.eswa.2024.124123&partnerID=40&md5=eb9e7299fe152e6047145dcf76b7892b" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85192269121&doi=10.1016%2fj.eswa.2024.124123&partnerID=40&md5=eb9e7299fe152e6047145dcf76b7892b</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1016/j.eswa.2024.124123" target="_blank" >10.1016/j.eswa.2024.124123</a>
Alternative languages
Result language
English
Original language name
A comprehensive analysis of static word embeddings for Turkish
Original language description
Word embeddings are fixed-length, dense, distributed word representations used in natural language processing (NLP) applications. There are two main types of word embedding models: non-contextual (static) models and contextual models. The former generates a single embedding for a word regardless of its context, while the latter produces distinct embeddings for a word depending on the specific contexts in which it appears. Many works compare contextual and non-contextual embedding models within their respective groups in different languages. However, very few studies compare models across these two groups, and there is no such study for Turkish. A cross-group comparison requires converting contextual embeddings into static embeddings. In this paper, we compare and evaluate the performance of several contextual and non-contextual models for Turkish in both intrinsic and extrinsic evaluation settings. We make a fine-grained comparison by analyzing the syntactic and semantic capabilities of the models separately. The results of the analyses provide insights into the suitability of different embedding models for different types of NLP tasks. We also build a Turkish word embedding repository comprising the embedding models used in this work, which may serve as a valuable resource for researchers and practitioners in Turkish NLP. We make the word embeddings, scripts, and evaluation datasets publicly available. © 2024 Elsevier Ltd
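Comparing static and contextual models requires a single vector per word from a contextual model. A common way to get one, assumed here for illustration and not necessarily the procedure used in this paper, is to average a word's contextual vectors over its occurrences in many sentences. The sketch below also shows cosine similarity, the usual score in intrinsic word-similarity evaluation. The function names and toy 2-dimensional vectors are hypothetical. In practice the contextual vectors would come from a model such as BERT.

```python
from collections import defaultdict
from math import sqrt


def aggregate_static(occurrences):
    """Build static embeddings by mean-pooling contextual vectors.

    Hypothetical helper. occurrences is an iterable of (word, vector)
    pairs, one per token occurrence; all vectors have the same length.
    Returns a dict mapping each word type to the element-wise mean of
    its contextual vectors.
    """
    sums = {}
    counts = defaultdict(int)
    for word, vec in occurrences:
        if word not in sums:
            sums[word] = [0.0] * len(vec)
        acc = sums[word]
        for i, x in enumerate(vec):
            acc[i] += x
        counts[word] += 1
    return {w: [x / counts[w] for x in acc] for w, acc in sums.items()}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Toy contextual vectors (invented for illustration): the ambiguous
    # Turkish word "yüz" ("face" / "hundred") appears in two contexts.
    occ = [("yüz", [1.0, 0.0]), ("yüz", [0.0, 1.0]), ("göz", [1.0, 0.0])]
    static = aggregate_static(occ)
    print(static["yüz"])  # mean of its two contextual vectors
    print(cosine(static["yüz"], static["göz"]))
```

Averaging collapses the distinct senses of a polysemous word into one point. This loss of context sensitivity is exactly what a static-versus-contextual comparison has to account for.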
Czech name
—
Czech description
—
Classification
Type
J<sub>SC</sub> - Article in a specialist periodical, which is included in the SCOPUS database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2024
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Expert Systems with Applications
ISSN
0957-4174
e-ISSN
—
Volume of the periodical
252
Issue of the periodical within the volume
2024
Country of publishing house
US - UNITED STATES
Number of pages
11
Pages from-to
1-11
UT code for WoS article
—
EID of the result in the Scopus database
2-s2.0-85192269121