MT4CrossOIE: Multi-stage tuning for cross-lingual open information extraction

Identifikátory výsledku

Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AG2EURCKF" target="_blank" >RIV/00216208:11320/25:G2EURCKF - isvavai.cz</a>
Výsledek na webu
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85199005546&doi=10.1016%2fj.eswa.2024.124760&partnerID=40&md5=5346c2848fb28ffc346f43d5691c9ab9" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85199005546&doi=10.1016%2fj.eswa.2024.124760&partnerID=40&md5=5346c2848fb28ffc346f43d5691c9ab9</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1016/j.eswa.2024.124760" target="_blank" >10.1016/j.eswa.2024.124760</a>

Alternativní jazyky

Jazyk výsledku
angličtina
Název v původním jazyce
MT4CrossOIE: Multi-stage tuning for cross-lingual open information extraction
Popis výsledku v původním jazyce
Cross-lingual open information extraction aims to extract structured information from raw text across multiple languages. Previous work uses a shared cross-lingual pre-trained model to handle the different languages but underuses the potential of the language-specific representation. In this paper, we propose an effective multi-stage tuning framework called MT4CrossOIE, designed for enhancing cross-lingual open information extraction by injecting language-specific knowledge into the shared model. Specifically, the cross-lingual pre-trained model is first tuned in a shared semantic space (e.g., embedding matrix) in the fixed encoder and then other components are optimized in the second stage. After enough training, we freeze the pre-trained model and tune the multiple extra low-rank language-specific modules using mixture of LoRAs for model-based cross-lingual transfer. In addition, we leverage two-stage prompting to encourage the large language model (LLM) to annotate the multi-lingual raw data for data-based cross-lingual transfer. The model is trained with multi-lingual objectives on our proposed dataset OpenIE4++ by combining the model-based and data-based transfer techniques. Experimental results on various benchmarks emphasize the importance of aggregating multiple plug-in-and-play language-specific modules and demonstrate the effectiveness of MT4CrossOIE in cross-lingual OIE. © 2024
Název v anglickém jazyce
MT4CrossOIE: Multi-stage tuning for cross-lingual open information extraction
Popis výsledku anglicky
Cross-lingual open information extraction aims to extract structured information from raw text across multiple languages. Previous work uses a shared cross-lingual pre-trained model to handle the different languages but underuses the potential of the language-specific representation. In this paper, we propose an effective multi-stage tuning framework called MT4CrossOIE, designed for enhancing cross-lingual open information extraction by injecting language-specific knowledge into the shared model. Specifically, the cross-lingual pre-trained model is first tuned in a shared semantic space (e.g., embedding matrix) in the fixed encoder and then other components are optimized in the second stage. After enough training, we freeze the pre-trained model and tune the multiple extra low-rank language-specific modules using mixture of LoRAs for model-based cross-lingual transfer. In addition, we leverage two-stage prompting to encourage the large language model (LLM) to annotate the multi-lingual raw data for data-based cross-lingual transfer. The model is trained with multi-lingual objectives on our proposed dataset OpenIE4++ by combining the model-based and data-based transfer techniques. Experimental results on various benchmarks emphasize the importance of aggregating multiple plug-in-and-play language-specific modules and demonstrate the effectiveness of MT4CrossOIE in cross-lingual OIE. © 2024

Klasifikace

Druh
J<sub>SC</sub> - Článek v periodiku v databázi SCOPUS
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)

Návaznosti výsledku

Projekt
—
Návaznosti
—

Ostatní

Rok uplatnění
2024
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Údaje specifické pro druh výsledku

Název periodika
Expert Systems with Applications
ISSN
0957-4174
e-ISSN
—
Svazek periodika
255
Číslo periodika v rámci svazku
2024
Stát vydavatele periodika
US - Spojené státy americké
Počet stran výsledku
12
Strana od-do
1-12
Kód UT WoS článku
—
EID výsledku v databázi Scopus
2-s2.0-85199005546

Podobné výsledky(10)

Cross-Lingual Transfer from Related Languages: Treating Low-Resource Maltese as Multilingual Code-Switching Cross-Lingual BERT Transformation for Zero-Shot Dependency Parsing Distilling Efficient Language-Specific Models for Cross-Lingual Transfer

Co hledáte?

Rychlé hledání

Chytré vyhledávání

MT4CrossOIE: Multi-stage tuning for cross-lingual open information extraction

Identifikátory výsledku

Alternativní jazyky

Klasifikace

Návaznosti výsledku

Ostatní

Údaje specifické pro druh výsledku

Podobné výsledky(10)

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Popis výsledku

Identifikátory výsledku

Identifikátory výsledku

Alternativní jazyky

Alternativní jazyky

Klasifikace

Klasifikace

Návaznosti výsledku

Návaznosti výsledku

Ostatní

Ostatní

Údaje specifické pro druh výsledku

Údaje specifické pro druh výsledku

Podobné výsledky(10)