Real-world sentence boundary detection using multitask learning: A case study on French
Result identifiers
Result code in IS VaVaI
RIV/00216208:11320/25:FLLK296I (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AFLLK296I)
Result on the web
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85128531911&doi=10.1017%2fS1351324922000134&partnerID=40&md5=9be131708d834d63a090961e6a9c1911
DOI - Digital Object Identifier
10.1017/S1351324922000134 (http://dx.doi.org/10.1017/S1351324922000134)
Alternative languages
Result language
English
Title in the original language
Real-world sentence boundary detection using multitask learning: A case study on French
Result description in the original language
We propose a novel approach for sentence boundary detection in text datasets in which boundaries are not evident (e.g., sentence fragments). Although detecting sentence boundaries without punctuation marks has rarely been explored in written text, current real-world textual data suffer from widespread lack of proper start/stop signaling. Herein, we annotate a dataset with linguistic information, such as parts of speech and named entity labels, to boost the sentence boundary detection task. Via experiments, we obtained F1 scores up to 98.07% using the proposed multitask neural model, including a score of 89.41% for sentences completely lacking punctuation marks. We also present an ablation study and provide a detailed analysis to demonstrate the effectiveness of the proposed multitask learning method. © The Author(s), 2022. Published by Cambridge University Press.
Title in English
Real-world sentence boundary detection using multitask learning: A case study on French
Result description in English
We propose a novel approach for sentence boundary detection in text datasets in which boundaries are not evident (e.g., sentence fragments). Although detecting sentence boundaries without punctuation marks has rarely been explored in written text, current real-world textual data suffer from widespread lack of proper start/stop signaling. Herein, we annotate a dataset with linguistic information, such as parts of speech and named entity labels, to boost the sentence boundary detection task. Via experiments, we obtained F1 scores up to 98.07% using the proposed multitask neural model, including a score of 89.41% for sentences completely lacking punctuation marks. We also present an ablation study and provide a detailed analysis to demonstrate the effectiveness of the proposed multitask learning method. © The Author(s), 2022. Published by Cambridge University Press.
Classification
Type
J<sub>SC</sub> - Article in a periodical indexed in the SCOPUS database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2024
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Periodical name
Natural Language Engineering
ISSN
1351-3249
e-ISSN
—
Periodical volume
30
Issue number within the volume
1
Country of the periodical's publisher
US - United States of America
Number of pages of the result
21
Pages from-to
150-170
UT WoS code of the article
—
EID of the result in the Scopus database
2-s2.0-85128531911