Using Unsupervised Paradigm Acquisition for Prefixes (revised version)
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F09%3A00206827" target="_blank" >RIV/00216208:11320/09:00206827 - isvavai.cz</a>
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Using Unsupervised Paradigm Acquisition for Prefixes (revised version)
Original language description
We describe a simple method for unsupervised morpheme segmentation of words in an unknown language. All that is needed is a raw text corpus (or a list of words) in the given language. The algorithm identifies word parts that occur in many words and interprets them as morpheme candidates (prefixes, stems and suffixes). The new treatment of prefixes is the main innovation in comparison to [1]. After spurious hypotheses are filtered out, the list of morphemes is applied to segment the input words. The official Morpho Challenge 2008 evaluation is reported together with some additional experiments. Processing of prefixes improved the F-score by 5 to 11 points for German, Finnish and Turkish, while it failed to improve the results for English and Arabic. We also analyze and discuss errors with respect to the evaluation method.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
AI - Linguistics
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/1ET101470416" target="_blank" >1ET101470416: Multimodal human sign language and speech processing for man-machine communication</a><br>
Continuities
P - Research and development project financed from public funds (with a link to CEP)<br>Z - Research plan (with a link to CEZ)
Others
Publication year
2009
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Evaluating Systems for Multilingual and Multimodal Information Access - 9th Workshop of the Cross-Language Evaluation Forum
ISBN
978-3-642-04446-5
ISSN
—
e-ISSN
—
Number of pages
8
Pages from-to
—
Publisher name
Springer Verlag
Place of publication
Berlin / Heidelberg
Event location
Berlin / Heidelberg
Event date
Jan 1, 2009
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
000273344500130