Joint Persian Word Segmentation Correction and Zero-Width Non-Joiner Recognition Using BERT
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F20%3A10492525" target="_blank" >RIV/00216208:11320/20:10492525 - isvavai.cz</a>
Result on the web
<a href="https://aclanthology.org/2020.coling-main.406/" target="_blank" >https://aclanthology.org/2020.coling-main.406/</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.18653/v1/2020.coling-main.406" target="_blank" >10.18653/v1/2020.coling-main.406</a>
Alternative languages
Result language
English
Original language name
Joint Persian Word Segmentation Correction and Zero-Width Non-Joiner Recognition Using BERT
Original language description
Words are properly segmented in the Persian writing system; in practice, however, these writing rules are often neglected, resulting in single words being written disjointedly and multiple words written without any white spaces between them. This paper addresses the problems of word segmentation and zero-width non-joiner (ZWNJ) recognition in Persian, which we approach jointly as a sequence labeling problem. We achieved a macro-averaged F1-score of 92.40% on a carefully collected corpus of 500 sentences with a high level of difficulty.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organisation
Others
Publication year
2020
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the Second Workshop on Beyond Vision and LANguage: inTEgrating Real-world kNowledge (LANTERN)
ISBN
978-1-952148-51-4
ISSN
—
e-ISSN
—
Number of pages
7
Pages from-to
4612-4618
Publisher name
Association for Computational Linguistics
Place of publication
Barcelona, Spain
Event location
Online
Event date
Dec 13, 2020
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—