Czech text segmentation using voting experts and its comparison with Menzerath-Altmann law
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F61989100%3A27240%2F11%3A86081142" target="_blank" >RIV/61989100:27240/11:86081142 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1007/978-3-642-27245-5_20" target="_blank" >http://dx.doi.org/10.1007/978-3-642-27245-5_20</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/978-3-642-27245-5_20" target="_blank" >10.1007/978-3-642-27245-5_20</a>
Alternative languages
Result language
English
Original language name
Czech text segmentation using voting experts and its comparison with Menzerath-Altmann law
Original language description
The word alphabet is connected to many problems in information retrieval. Information retrieval algorithms usually do not process the input data as a sequence of bytes; instead they work with larger pieces of the data, such as words or, more generally, chunks of the data. This is the main motivation of the paper: how can the input data be split into smaller chunks when its structure is not known a priori? To do this, we use the Voting Experts algorithm. The Voting Experts algorithm is often used to process time series data, audio signals, etc. Our intention is to use the Voting Experts algorithm in the future for segmentation of discrete data such as DNA or proteins. For test purposes, we use Czech and English text as a test bed for the segmentation algorithm. We use the Menzerath-Altmann law to compare the segmentation results.
Czech name
—
Czech description
—
Classification
Type
J<sub>x</sub> - Unclassified - Peer-reviewed scientific article (Jimp, Jsc and Jost)
CEP classification
IN - Informatics
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/GA205%2F09%2F1079" target="_blank" >GA205/09/1079: Methods of Artificial Intelligence in GIS</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2011
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Communications in Computer and Information Science
ISSN
1865-0929
e-ISSN
—
Volume of the periodical
245
Issue of the periodical within the volume
12
Country of publishing house
DE - GERMANY
Number of pages
9
Pages from-to
152-160
UT code for WoS article
—
EID of the result in the Scopus database
—