MRL Parsing Without Tears: The Case of Hebrew

The result's identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F25%3ATTNQL84I" target="_blank" >RIV/00216208:11320/25:TTNQL84I - isvavai.cz</a>
Result on the web
<a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85205304209&partnerID=40&md5=e12f3974145dc05edce7687898d967f7" target="_blank" >https://www.scopus.com/inward/record.uri?eid=2-s2.0-85205304209&partnerID=40&md5=e12f3974145dc05edce7687898d967f7</a>
DOI - Digital Object Identifier
—

Alternative languages

Result language
English
Original language name
MRL Parsing Without Tears: The Case of Hebrew
Original language description
Syntactic parsing remains a critical tool for relation extraction and information extraction, especially in resource-scarce languages where LLMs are lacking. Yet in morphologically rich languages (MRLs), where parsers need to identify multiple lexical units in each token, existing systems suffer in latency and setup complexity. Some use a pipeline to peel away the layers: first segmentation, then morphology tagging, and then syntax parsing; however, errors in earlier layers are then propagated forward. Others use a joint architecture to evaluate all permutations at once; while this improves accuracy, it is notoriously slow. In contrast, and taking Hebrew as a test case, we present a new "flipped pipeline": decisions are made directly on the whole-token units by expert classifiers, each one dedicated to one specific task. The classifier predictions are independent of one another, and only at the end do we synthesize their predictions. This blazingly fast approach requires only a single huggingface call, without the need for recourse to lexicons or linguistic resources. When trained on the same training set used in previous studies, our model achieves near-SOTA performance on a wide array of Hebrew NLP tasks. Furthermore, when trained on a newly enlarged training corpus, our model achieves a new SOTA for Hebrew POS tagging and dependency parsing. We release this new SOTA model to the community. Because our architecture does not rely on any language-specific resources, it can serve as a model to develop similar parsers for other MRLs. © 2024 Association for Computational Linguistics.
Czech name
—
Czech description
—

Classification

Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

Project
—
Continuities
—

Others

Publication year
2024
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

Article name in the collection
Proc. Annu. Meet. Assoc. Comput. Linguist.
ISBN
979-889176099-8
ISSN
0736-587X
e-ISSN
—
Number of pages
14
Pages from-to
4537-4550
Publisher name
Association for Computational Linguistics (ACL)
Place of publication
—
Event location
Bangkok
Event date
Jan 1, 2025
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—

Similar results(10)

A Truly Joint Neural Architecture for Segmentation and Parsing
A Pointer Network Architecture for Joint Morphological Segmentation and Tagging
Prague at EPE 2017: The UDPipe System
