Deep Learning-Based Preprocessing Tools for Turkish Natural Language Processing
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3AIXIBKYLA" target="_blank" >RIV/00216208:11320/25:IXIBKYLA - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85202632922&doi=10.1007%2f978-3-031-66705-3_15&partnerID=40&md5=e4b7a13018e011e2fe856762b6688db3" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85202632922&doi=10.1007%2f978-3-031-66705-3_15&partnerID=40&md5=e4b7a13018e011e2fe856762b6688db3</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-031-66705-3_15" target="_blank" >10.1007/978-3-031-66705-3_15</a>
Alternative languages
Result language
English
Title in the original language
Deep Learning-Based Preprocessing Tools for Turkish Natural Language Processing
Result description in the original language
As the demand for effective natural language processing applications in Turkish continues to rise, the need for text preprocessing tools tailored to the Turkish language increases. These tools form the initial step of any natural language application and improve the efficiency of complex tasks such as text summarization, question answering, and machine translation. We propose a novel deep learning-based framework focusing on Turkish preprocessing tasks, including tokenization, sentence splitting, deasciification, part-of-speech tagging, vowelization, spell correction, and morphological analysis. The proposed framework is suitable for independent use of each preprocessing tool as well as for use in an all-in-one scheme. We use the CANINE model to train the character-level tools, and BERT and mT5 models for the token-based tools. We evaluate the framework for each task on the BOUN Treebank in the UD project and make both the tools and the code publicly available. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
Title in English
Deep Learning-Based Preprocessing Tools for Turkish Natural Language Processing
Result description in English
As the demand for effective natural language processing applications in Turkish continues to rise, the need for text preprocessing tools tailored to the Turkish language increases. These tools form the initial step of any natural language application and improve the efficiency of complex tasks such as text summarization, question answering, and machine translation. We propose a novel deep learning-based framework focusing on Turkish preprocessing tasks, including tokenization, sentence splitting, deasciification, part-of-speech tagging, vowelization, spell correction, and morphological analysis. The proposed framework is suitable for independent use of each preprocessing tool as well as for use in an all-in-one scheme. We use the CANINE model to train the character-level tools, and BERT and mT5 models for the token-based tools. We evaluate the framework for each task on the BOUN Treebank in the UD project and make both the tools and the code publicly available. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
—
Other
Publication year
2024
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Title of the article in the proceedings
Commun. Comput. Info. Sci.
ISBN
978-303166704-6
ISSN
1865-0929
e-ISSN
—
Number of pages
17
Pages from-to
218-234
Publisher name
Springer Science and Business Media Deutschland GmbH
Place of publication
—
Event location
Dijon
Event date
1 January 2025
Event type by nationality
WRD - Worldwide event
UT WoS code of the article
—