Hard Problems of Tagset Conversion
The result's identifiers
Result code in IS VaVaI
RIV/00216208:11320/10:10078047 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F10%3A10078047)
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Hard Problems of Tagset Conversion
Original language description
Part-of-speech or morphological tags are an important means of annotation in a vast number of corpora. However, different sets of tags are used in different corpora, even for the same language. Tagset conversion is difficult, and solutions tend to be tailored to a particular pair of tagsets. We discuss Interset, a universal approach that makes the conversion tools reusable. While some morphosyntactic categories are clearly defined and easily ported from one tagset to another, there are also phenomena that are difficult to deal with because of overlapping concepts. In the present paper we focus on some of these problems, discuss their coverage in selected tagsets, and propose solutions to unify the respective tagsets' approaches.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
AI - Linguistics
OECD FORD branch
—
Result continuities
Project
—
Continuities
Z - Research plan (with a link to CEZ)
Others
Publication year
2010
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the Second International Conference on Global Interoperability for Language Resources
ISBN
978-962-442-323-5
ISSN
—
e-ISSN
—
Number of pages
5
Pages from-to
—
Publisher name
City University of Hong Kong
Place of publication
Hong Kong, China
Event location
Hong Kong, China
Event date
Jan 15, 2010
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—