CR-M-SpanBERT: Multiple embedding-based DNN coreference resolution using self-attention SpanBERT
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AHZD7NCW6" target="_blank" >RIV/00216208:11320/25:HZD7NCW6 - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85186114766&doi=10.4218%2fetrij.2023-0308&partnerID=40&md5=25e606202bf4fd7c289e32d5a26c8827" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85186114766&doi=10.4218%2fetrij.2023-0308&partnerID=40&md5=25e606202bf4fd7c289e32d5a26c8827</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.4218/etrij.2023-0308" target="_blank" >10.4218/etrij.2023-0308</a>
Alternative languages
Result language
English
Title in original language
CR-M-SpanBERT: Multiple embedding-based DNN coreference resolution using self-attention SpanBERT
Result description in original language
This study introduces CR-M-SpanBERT, a coreference resolution (CR) model that utilizes multiple embedding-based span bidirectional encoder representations from transformers for antecedent recognition in natural language (NL) text. Information extraction studies aim to extract knowledge from NL text autonomously and cost-effectively. However, the extracted information may not represent knowledge accurately owing to the presence of ambiguous entities. Therefore, we propose a CR model that identifies mentions referring to the same entity in NL text. CR requires understanding both the syntax and the semantics of the NL text simultaneously. Therefore, multiple embeddings are generated for CR, which can include syntactic and semantic information for each word. We evaluate the effectiveness of CR-M-SpanBERT by comparing it to a model that uses SpanBERT as the language model in CR studies. The results demonstrate that our proposed deep neural network model achieves high recognition accuracy for extracting antecedents from NL text. Additionally, it requires fewer epochs than the conventional SpanBERT approach to achieve an average F1 score greater than 75%. 1225-6463/$ © 2024 ETRI.
Title in English
CR-M-SpanBERT: Multiple embedding-based DNN coreference resolution using self-attention SpanBERT
Result description in English
This study introduces CR-M-SpanBERT, a coreference resolution (CR) model that utilizes multiple embedding-based span bidirectional encoder representations from transformers for antecedent recognition in natural language (NL) text. Information extraction studies aim to extract knowledge from NL text autonomously and cost-effectively. However, the extracted information may not represent knowledge accurately owing to the presence of ambiguous entities. Therefore, we propose a CR model that identifies mentions referring to the same entity in NL text. CR requires understanding both the syntax and the semantics of the NL text simultaneously. Therefore, multiple embeddings are generated for CR, which can include syntactic and semantic information for each word. We evaluate the effectiveness of CR-M-SpanBERT by comparing it to a model that uses SpanBERT as the language model in CR studies. The results demonstrate that our proposed deep neural network model achieves high recognition accuracy for extracting antecedents from NL text. Additionally, it requires fewer epochs than the conventional SpanBERT approach to achieve an average F1 score greater than 75%. 1225-6463/$ © 2024 ETRI.
Classification
Type
J<sub>SC</sub> - Article in a periodical indexed in the SCOPUS database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
—
Linkages
—
Other
Year of implementation
2024
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Journal name
ETRI Journal
ISSN
1225-6463
e-ISSN
—
Journal volume
46
Issue number within the volume
1
Country of the journal publisher
US - United States of America
Number of pages of the result
13
Pages from-to
35-47
UT WoS code of the article
—
EID of the result in the Scopus database
2-s2.0-85186114766