The Study of Effect of Length in Morphological Segmentation of Agglutinative Languages
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/12:10130089 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F12%3A10130089)
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
The Study of Effect of Length in Morphological Segmentation of Agglutinative Languages
Original language description
Morph length is one of the indicative features that help in learning the morphology of languages, particularly agglutinative languages. In this paper, we introduce a simple unsupervised model for morphological segmentation and study how knowledge of morph length affects segmentation performance under the Bayesian framework. The model is based on the unigram word segmentation model of Goldwater et al. (2006) and assumes a simple prior distribution over morph length. We evaluate the model on two closely related agglutinative languages, Tamil and Telugu, and compare our results with the state-of-the-art Morfessor system. We show that knowledge of morph length has a positive impact and yields competitive results in terms of overall performance.
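To make the idea of a length prior concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Dirichlet-process unigram morph model in the style of Goldwater et al. (2006), where a morph's predictive probability is (n_m + alpha * P0(m)) / (N + alpha). The base distribution P0 is assumed here to be a shifted Poisson prior over morph length (parameter `lam`) times uniform character probabilities over an alphabet of size `alphabet_size`; the record does not say which length distribution the paper uses. All function names, parameter values, and the toy counts are hypothetical. Instead of the Gibbs sampling typically used to infer such models, the sketch swaps in Viterbi decoding of a single word against fixed morph counts.

```python
import math
from collections import Counter


def shifted_poisson_log_pmf(length: int, lam: float) -> float:
    """Assumed length prior: log P(length), with length - 1 ~ Poisson(lam), so length >= 1."""
    k = length - 1
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def base_log_prob(morph: str, alphabet_size: int, lam: float) -> float:
    """log P0(morph): length prior times uniform per-character probabilities (assumption)."""
    return shifted_poisson_log_pmf(len(morph), lam) - len(morph) * math.log(alphabet_size)


def morph_log_prob(morph, counts, total, alpha, alphabet_size, lam):
    """Dirichlet-process (Chinese restaurant process) predictive log probability of a morph."""
    p0 = math.exp(base_log_prob(morph, alphabet_size, lam))
    # Counter returns 0 for unseen morphs without inserting them.
    return math.log((counts[morph] + alpha * p0) / (total + alpha))


def best_segmentation(word, counts, alpha=1.0, alphabet_size=30, lam=3.0, max_len=10):
    """Viterbi search for the most probable split of `word`, holding morph counts fixed."""
    total = sum(counts.values())
    n = len(word)
    best = [-math.inf] * (n + 1)  # best[i]: best log prob of segmenting word[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last morph in that split
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + morph_log_prob(
                word[start:end], counts, total, alpha, alphabet_size, lam
            )
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow back-pointers from the end of the word to recover the morphs.
    morphs = []
    end = n
    while end > 0:
        start = back[end]
        morphs.append(word[start:end])
        end = start
    return morphs[::-1]


# Made-up morph counts over romanized strings; not data from the paper.
toy_counts = Counter({"pati": 4, "kal": 6, "il": 3})
print(best_segmentation("patikalil", toy_counts))  # → ['pati', 'kal', 'il']
```

With an empty `Counter`, the character term contributes the same total for every split of a given word, so the choice of segmentation is then driven entirely by the assumed length prior; this is one simple way to see how knowledge of morph length can shape the output.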
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
IN - Informatics
OECD FORD branch
—
Result continuities
Project
—
Continuities
R - Project of the Framework Programme of the European Commission
Others
Publication year
2012
Confidentiality
S - Complete and true data about the project are not subject to protection under special legal regulations
Data specific for result type
Name of the proceedings
Proceedings of the First Workshop on Multilingual Modeling (MM-2012)
ISBN
978-1-937284-35-0
ISSN
—
e-ISSN
—
Number of pages
7
Pages from-to
18-24
Publisher name
Association for Computational Linguistics
Place of publication
Jeju, Korea
Event location
Jeju Island, Korea
Event date
Jul 13, 2012
Type of event by nationality
CST - Nationwide event
UT code for WoS article
—