A comprehensive analysis of static word embeddings for Turkish

Result identifiers

Result code in IS VaVaI
RIV/00216208:11320/25:C45RSFNF (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AC45RSFNF)
Result on the web
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85192269121&doi=10.1016%2fj.eswa.2024.124123&partnerID=40&md5=eb9e7299fe152e6047145dcf76b7892b
DOI - Digital Object Identifier
10.1016/j.eswa.2024.124123 (http://dx.doi.org/10.1016/j.eswa.2024.124123)

Alternative languages

Language of the result
English
Title in the original language
A comprehensive analysis of static word embeddings for Turkish
Abstract in the original language
Word embeddings are fixed-length, dense, distributed word representations used in natural language processing (NLP) applications. There are two main types of word embedding models: non-contextual (static) models and contextual models. The former generate a single embedding for a word regardless of its context, while the latter produce distinct embeddings for a word depending on the specific contexts in which it appears. Many works, across different languages, compare contextual models with one another or non-contextual models with one another. However, very few studies compare models from the two groups against each other, and no such study exists for Turkish. A cross-group comparison of this kind requires converting contextual embeddings into static embeddings. In this paper, we compare and evaluate the performance of several contextual and non-contextual models in both intrinsic and extrinsic evaluation settings for Turkish. We make a fine-grained comparison by analyzing the syntactic and semantic capabilities of the models separately. The results of the analyses provide insights into the suitability of different embedding models for different types of NLP tasks. We also build a Turkish word embedding repository comprising the embedding models used in this work, which may serve as a valuable resource for researchers and practitioners in Turkish NLP. We make the word embeddings, scripts, and evaluation datasets publicly available. © 2024 Elsevier Ltd
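The abstract notes that comparing the two model families requires turning contextual embeddings into static ones, but this record does not say which conversion the authors use. A common, simple option is to average (mean-pool) a word's contextual vectors over many of its occurrences. The sketch below assumes that averaging approach; the function names `static_from_contextual` and `cosine` and the toy vectors are hypothetical and chosen only for illustration. The cosine helper shows how static vectors are typically scored against each other in intrinsic word-similarity evaluation.

```python
from __future__ import annotations

import math
from collections import defaultdict


def static_from_contextual(occurrences: list[tuple[str, list[float]]]) -> dict[str, list[float]]:
    """Average every contextual vector seen for a word into one static vector.

    Hypothetical helper: `occurrences` holds one (word, vector) pair per
    occurrence of the word in some sentence, e.g. a contextual model's output
    over a corpus. This assumes mean pooling, not necessarily the paper's method.
    """
    sums: dict[str, list[float]] = {}
    counts: defaultdict[str, int] = defaultdict(int)
    for word, vec in occurrences:
        acc = sums.setdefault(word, [0.0] * len(vec))
        if len(acc) != len(vec):
            raise ValueError(f"dimension mismatch for {word!r}")
        for i, x in enumerate(vec):
            acc[i] += x
        counts[word] += 1
    # Mean pooling over contexts: one context-independent vector per word.
    return {w: [x / counts[w] for x in acc] for w, acc in sums.items()}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity, the usual score in intrinsic word-similarity tests."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Invented 2-d toy vectors: "yüz" occurs in two different contexts and
    # therefore receives two different contextual vectors.
    toy = [("yüz", [1.0, 0.0]), ("yüz", [0.0, 1.0]), ("göz", [1.0, 0.2])]
    static = static_from_contextual(toy)
    print(static["yüz"])                                    # [0.5, 0.5]
    print(round(cosine(static["yüz"], static["göz"]), 3))   # 0.832
```

Averaging collapses all senses of a polysemous Turkish word such as "yüz" ("face", "hundred", or "swim") into one vector, which is the trade-off any static representation makes. Other choices, such as which model layer to pool or how to combine subword tokens into a word vector, are not specified in this record.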

Classification

Type
J_SC - Article in a periodical indexed in the Scopus database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result relationships

Project
—
Relationships
—

Other

Year of application
2024
Data confidentiality code
S - Complete and accurate data on the project are not subject to protection under special legal regulations

Data specific to the result type

Periodical title
Expert Systems with Applications
ISSN
0957-4174
e-ISSN
—
Volume
252
Issue number within the volume
2024
Country of the periodical's publisher
US - United States of America
Number of pages
11
Pages (from-to)
1-11
Article UT WoS code
—
Result EID in the Scopus database
2-s2.0-85192269121

Similar results (10)

Hidden in the Layers: Interpretation of Neural Networks for Natural Language Processing
Analysis of the Semantic Vector Space Induced by a Neural Language Model and a Corpus
On the Language Neutrality of Pre-trained Multilingual Representations
