NLPre: a revised approach towards language-centric benchmarking of Natural Language Preprocessing systems

The result's identifiers

Result code in IS VaVaI
RIV/00216208:11320/25:5BZ8PKPR (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3A5BZ8PKPR)
Result on the web
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85195954923&partnerID=40&md5=34678ac17212dda261bcfdb2ebad7df2
DOI - Digital Object Identifier
—

Alternative languages

Result language
English
Original language name
NLPre: a revised approach towards language-centric benchmarking of Natural Language Preprocessing systems
Original language description
With the advancements of transformer-based architectures, we observe the rise of natural language preprocessing (NLPre) tools capable of solving preliminary NLP tasks (e.g. tokenisation, part-of-speech tagging, dependency parsing, or morphological analysis) without any external linguistic guidance. It is arduous to compare such novel solutions with well-entrenched preprocessing toolkits that rely on rule-based morphological analysers or dictionaries. Aware of the shortcomings of existing NLPre evaluation approaches, we investigate a novel method of reliable and fair evaluation and performance reporting. Inspired by the GLUE benchmark, the proposed language-centric benchmarking system enables comprehensive ongoing evaluation of multiple NLPre tools while credibly tracking their performance. The prototype application is configured for Polish and integrated with the thoroughly assembled NLPre-PL benchmark. Based on this benchmark, we conduct an extensive evaluation of a variety of Polish NLPre systems. To facilitate the construction of benchmarking environments for other languages, e.g. NLPre-GA for Irish or NLPre-ZH for Chinese, we make the publicly released source code of the benchmarking system fully customisable. Links to all the resources (deployed platforms, source code, trained models, datasets, etc.) can be found on the project website: https://sites.google.com/view/nlpre-benchmark. © 2024 ELRA Language Resource Association: CC BY-NC 4.0.
Czech name
—
Czech description
—

Classification

Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

Project
—
Continuities
—

Others

Publication year
2024
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations

Data specific for result type

Article name in the collection
Proceedings of the Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Main Conference
ISBN
978-249381410-4
ISSN
—
e-ISSN
—
Number of pages
17
Pages from-to
12271-12287
Publisher name
European Language Resources Association (ELRA)
Place of publication
—
Event location
Torino, Italy
Event date
May 20-25, 2024
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—

Similar results (10)

Deep Learning-Based Preprocessing Tools for Turkish Natural Language Processing
Effectiveness of Text, Acoustic, and Lattice-Based Representations in Spoken Language Understanding Tasks
IndoNLU: Benchmark and Resources for Evaluating Indonesian Natural Language Understanding
