Cross-Lingual NLU: Mitigating Language-Specific Impact in Embeddings Leveraging Adversarial Learning
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AAPM8W767" target="_blank" >RIV/00216208:11320/25:APM8W767 - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85195980825&partnerID=40&md5=931a35840517cadf414805ba3e25461c" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85195980825&partnerID=40&md5=931a35840517cadf414805ba3e25461c</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Title in the original language
Cross-Lingual NLU: Mitigating Language-Specific Impact in Embeddings Leveraging Adversarial Learning
Result description in the original language
Low-resource languages and computational expenses pose significant challenges in the domain of large language models (LLMs). Currently, researchers are actively involved in various efforts to tackle these challenges. Cross-lingual natural language processing (NLP) remains one of the most promising strategies to address these issues. In this paper, we introduce a novel approach that utilizes adversarial techniques to mitigate the impact of language-specific information in contextual embeddings generated by large multilingual language models, with potential applications in cross-lingual tasks. The study encompasses five different languages, including both Latin and non-Latin ones, in the context of two fundamental tasks in natural language understanding: intent detection and slot filling. The results primarily show that our current approach excels in zero-shot scenarios for Latin languages like Spanish. However, it encounters limitations when applied to languages distant from English, such as Thai and Persian. This highlights that while our approach effectively reduces the effect of language-specific information on the core meaning, it performs better for Latin languages that share language-specific nuances with English, as certain characteristics persist in the overall meaning within embeddings. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.
Title in English
Cross-Lingual NLU: Mitigating Language-Specific Impact in Embeddings Leveraging Adversarial Learning
Result description in English
Low-resource languages and computational expenses pose significant challenges in the domain of large language models (LLMs). Currently, researchers are actively involved in various efforts to tackle these challenges. Cross-lingual natural language processing (NLP) remains one of the most promising strategies to address these issues. In this paper, we introduce a novel approach that utilizes adversarial techniques to mitigate the impact of language-specific information in contextual embeddings generated by large multilingual language models, with potential applications in cross-lingual tasks. The study encompasses five different languages, including both Latin and non-Latin ones, in the context of two fundamental tasks in natural language understanding: intent detection and slot filling. The results primarily show that our current approach excels in zero-shot scenarios for Latin languages like Spanish. However, it encounters limitations when applied to languages distant from English, such as Thai and Persian. This highlights that while our approach effectively reduces the effect of language-specific information on the core meaning, it performs better for Latin languages that share language-specific nuances with English, as certain characteristics persist in the overall meaning within embeddings. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
—
Linkages
—
Other
Year of publication
2024
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Title of the article in the proceedings
Jt. Int. Conf. Comput. Linguist., Lang. Resour. Eval., LREC-COLING - Main Conf. Proc.
ISBN
978-249381410-4
ISSN
—
e-ISSN
—
Number of pages
6
Pages from-to
4158-4163
Publisher name
European Language Resources Association (ELRA)
Place of publication
—
Event location
Torino, Italia
Event date
1 January 2025
Event type by nationality of participants
WRD - Worldwide event
UT WoS code of the article
—