Context Sensitive Pattern Based Segmentation: A Thai Challenge
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F03%3A00008605" target="_blank" >RIV/00216224:14330/03:00008605 - isvavai.cz</a>
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Context Sensitive Pattern Based Segmentation: A Thai Challenge
Original language description
Written Thai text is a string of symbols without explicit word boundaries. A method for developing a segmentation tool from a corpus of already segmented text is described. The methodology is based on the technology of competing patterns. A new Unicode pattern generation program, OPATGEN, is used for the learning phase. We show the feasibility of our methodology by generating patterns for Thai segmentation from the already segmented text of the Thai corpus ORCHID: the segmentation algorithm quickly reaches an F-score of 93%. Finally, we enumerate possible new applications based on the pattern technique and conclude by suggesting a general Pattern Translation Process. The technology is general and can be used for other segmentation tasks in any language, such as phonetic and morphological segmentation, word hyphenation, sentence segmentation, and text topic segmentation.
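The idea of competing patterns mentioned in the description can be illustrated with a minimal sketch. It assumes the Liang-style convention known from TeX hyphenation: digits between letters mark candidate boundaries, the highest digit at each position wins, odd values allow a break and even values forbid it, and "." marks the start or end of the text. The patterns below are hand-written toy examples over Latin letters, not patterns produced by OPATGEN from ORCHID, and the function names (`parse_pattern`, `segment`, `f_score`) are hypothetical. The boundary-based F-score shown is one common definition and may differ from the one used in the paper.

```python
def parse_pattern(pattern):
    """Split a pattern such as 't1s' into its letters 'ts' and gap levels [0, 1, 0]."""
    letters = []
    levels = [0]  # levels[k] is the level of the gap before letters[k]
    for ch in pattern:
        if ch.isdigit():
            levels[-1] = int(ch)
        else:
            letters.append(ch)
            levels.append(0)
    return "".join(letters), levels


def segment(text, patterns):
    """Segment text by letting all matching patterns compete at every gap."""
    padded = "." + text + "."           # '.' marks the text boundaries
    scores = [0] * (len(padded) + 1)    # scores[i] is the gap before padded[i]
    for letters, levels in map(parse_pattern, patterns):
        start = padded.find(letters)
        while start != -1:
            for offset, level in enumerate(levels):
                # The highest level seen at a gap wins the competition.
                scores[start + offset] = max(scores[start + offset], level)
            start = padded.find(letters, start + 1)
    pieces, last = [], 0
    for i in range(1, len(text)):
        if scores[i + 1] % 2 == 1:      # odd level: a boundary is allowed here
            pieces.append(text[last:i])
            last = i
    pieces.append(text[last:])
    return pieces


def boundaries(pieces):
    """Return the set of internal boundary offsets of a segmentation."""
    offsets, position = set(), 0
    for piece in pieces[:-1]:
        position += len(piece)
        offsets.add(position)
    return offsets


def f_score(predicted, gold):
    """Boundary-based F-score (harmonic mean of precision and recall)."""
    p, g = boundaries(predicted), boundaries(gold)
    hits = len(p & g)
    if hits == 0:
        return 1.0 if not p and not g else 0.0
    precision, recall = hits / len(p), hits / len(g)
    return 2 * precision * recall / (precision + recall)


# Toy patterns: 'e1c' and 't1s' propose boundaries, 't2s.' vetoes one at the end.
TOY_PATTERNS = ["e1c", "t1s", "t2s."]
print(segment("thecatsat", TOY_PATTERNS))
print(segment("thecats", TOY_PATTERNS))
```

In this sketch the level-2 pattern `t2s.` overrides the level-1 pattern `t1s` only when "ts" ends the text, which is how a more specific pattern corrects an exception left by a more general one.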
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
IN - Informatics
OECD FORD branch
—
Result continuities
Project
—
Continuities
Z - Research plan (with a link to CEZ)
Others
Publication year
2003
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of EACL 2003 workshop Computational Linguistics for South Asian Languages -- Expanding Synergies with Europe
ISBN
1-932432-02-7
ISSN
—
e-ISSN
—
Number of pages
8
Pages from-to
65-72
Publisher name
Association for Computational Linguistics
Place of publication
Budapest
Event location
Budapest
Event date
Apr 14, 2003
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—