Real-world sentence boundary detection using multitask learning: A case study on French
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AFLLK296I" target="_blank" >RIV/00216208:11320/25:FLLK296I - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85128531911&doi=10.1017%2fS1351324922000134&partnerID=40&md5=9be131708d834d63a090961e6a9c1911" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85128531911&doi=10.1017%2fS1351324922000134&partnerID=40&md5=9be131708d834d63a090961e6a9c1911</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1017/S1351324922000134" target="_blank" >10.1017/S1351324922000134</a>
Alternative languages
Result language
English
Original language name
Real-world sentence boundary detection using multitask learning: A case study on French
Original language description
We propose a novel approach for sentence boundary detection in text datasets in which boundaries are not evident (e.g., sentence fragments). Although detecting sentence boundaries without punctuation marks has rarely been explored in written text, current real-world textual data suffer from widespread lack of proper start/stop signaling. Herein, we annotate a dataset with linguistic information, such as parts of speech and named entity labels, to boost the sentence boundary detection task. Via experiments, we obtained F1 scores up to 98.07% using the proposed multitask neural model, including a score of 89.41% for sentences completely lacking punctuation marks. We also present an ablation study and provide a detailed analysis to demonstrate the effectiveness of the proposed multitask learning method. © The Author(s), 2022. Published by Cambridge University Press.
Czech name
—
Czech description
—
Classification
Type
J<sub>SC</sub> - Article in a specialist periodical, which is included in the SCOPUS database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Others
Publication year
2024
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Natural Language Engineering
ISSN
1351-3249
e-ISSN
—
Volume of the periodical
30
Issue of the periodical within the volume
1
Country of publishing house
US - UNITED STATES
Number of pages
21
Pages from-to
150-170
UT code for WoS article
—
EID of the result in the Scopus database
2-s2.0-85128531911