HPS: High precision stemmer
Identifikátory výsledku
Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F15%3A43922745" target="_blank" >RIV/49777513:23520/15:43922745 - isvavai.cz</a>
Výsledek na webu
<a href="http://dx.doi.org/10.1016/j.ipm.2014.08.006" target="_blank" >http://dx.doi.org/10.1016/j.ipm.2014.08.006</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1016/j.ipm.2014.08.006" target="_blank" >10.1016/j.ipm.2014.08.006</a>
Alternativní jazyky
Jazyk výsledku
angličtina
Název v původním jazyce
HPS: High precision stemmer
Popis výsledku v původním jazyce
Research into unsupervised ways of stemming has resulted, in the past few years, in the development of methods that are reliable and perform well. Our approach further shifts the boundaries of the state of the art by providing more accurate stemming results. The idea of the approach consists in building a stemmer in two stages. In the first stage, a stemming algorithm based upon clustering, which exploits the lexical and semantic information of words, is used to prepare large-scale training data for thesecond-stage algorithm. The second-stage algorithm uses a maximum entropy classifier. The stemming-specific features help the classifier decide when and how to stem a particular word. In our research, we have pursued the goal of creating a multi-purposestemming tool. Its design opens up possibilities of solving non-traditional tasks such as approximating lemmas or improving language modeling. However, we still aim at very good results in the traditional task of information retrieval. T
Název v anglickém jazyce
HPS: High precision stemmer
Popis výsledku anglicky
Research into unsupervised ways of stemming has resulted, in the past few years, in the development of methods that are reliable and perform well. Our approach further shifts the boundaries of the state of the art by providing more accurate stemming results. The idea of the approach consists in building a stemmer in two stages. In the first stage, a stemming algorithm based upon clustering, which exploits the lexical and semantic information of words, is used to prepare large-scale training data for thesecond-stage algorithm. The second-stage algorithm uses a maximum entropy classifier. The stemming-specific features help the classifier decide when and how to stem a particular word. In our research, we have pursued the goal of creating a multi-purposestemming tool. Its design opens up possibilities of solving non-traditional tasks such as approximating lemmas or improving language modeling. However, we still aim at very good results in the traditional task of information retrieval. T
Klasifikace
Druh
J<sub>x</sub> - Nezařazeno - Článek v odborném periodiku (Jimp, Jsc a Jost)
CEP obor
JD - Využití počítačů, robotika a její aplikace
OECD FORD obor
—
Návaznosti výsledku
Projekt
<a href="/cs/project/ED1.1.00%2F02.0090" target="_blank" >ED1.1.00/02.0090: NTIS - Nové technologie pro informační společnost</a><br>
Návaznosti
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)<br>S - Specificky vyzkum na vysokych skolach
Ostatní
Rok uplatnění
2015
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Údaje specifické pro druh výsledku
Název periodika
Information Processing and Mangement
ISSN
0306-4573
e-ISSN
—
Svazek periodika
51
Číslo periodika v rámci svazku
1
Stát vydavatele periodika
NL - Nizozemsko
Počet stran výsledku
24
Strana od-do
68-91
Kód UT WoS článku
000345491900005
EID výsledku v databázi Scopus
—