The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F08%3A10079301" target="_blank" >RIV/00216208:11320/08:10079301 - isvavai.cz</a>
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Victor
Original language description
Victor is a tool for cleaning web pages. It employs a sequence-labeling approach based on Conditional Random Fields (CRF). Every block of text in the analyzed web page is assigned a set of features extracted from the textual content and HTML structure of the page. Text blocks are automatically labeled either as content segments containing main web page content, which should be preserved, or as noisy segments not suitable for further linguistic processing, which should be eliminated.
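The per-block feature extraction described above could look roughly like the following minimal sketch. It is not Victor's code: the set of block tags, the feature names (`n_words`, `link_density`, `ends_with_period`) and the helper `extract_blocks` are assumptions chosen for illustration, and the CRF training and labeling step is left out. The sketch only shows how each text block can be turned into a feature record, kept in document order so that a sequence labeler could later tag each block as content or noise.

```python
from html.parser import HTMLParser

# Tags assumed to delimit text blocks (an assumption for this sketch).
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3"}


class BlockExtractor(HTMLParser):
    """Collects text blocks together with a few illustrative features."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished feature dicts, in document order
        self._tag = None      # block tag currently open, if any
        self._text = []       # text fragments of the current block
        self._link_chars = 0  # non-space characters seen inside <a>
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()     # a new block closes the previous one
            self._tag = tag
        elif tag == "a":
            self._in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag == self._tag:
            self._flush()

    def handle_data(self, data):
        if self._tag is None:
            return            # text outside any open block is ignored here
        self._text.append(data)
        if self._in_link:
            self._link_chars += len("".join(data.split()))

    def _flush(self):
        # Normalize whitespace, then emit one feature dict per non-empty block.
        text = " ".join("".join(self._text).split())
        if text:
            n_chars = len(text.replace(" ", ""))
            self.blocks.append({
                "tag": self._tag,
                "n_words": len(text.split()),
                "link_density": round(self._link_chars / n_chars, 2),
                "ends_with_period": text.endswith("."),
                "text": text,
            })
        self._tag, self._text, self._link_chars = None, [], 0


def extract_blocks(html):
    """Return the list of per-block feature dicts for an HTML string."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()
    return parser.blocks
```

In a sequence-labeling setup, the resulting list of feature dicts would be passed as one sequence per page to a CRF implementation, which is what lets the label of a block depend on the labels of its neighbours. Navigation items, for example, tend to be short and consist almost entirely of link text, so a high `link_density` is a plausible cue for the noisy class.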
Czech name
—
Czech description
—
Classification
Type
R - Software
CEP classification
AI - Linguistics
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/GD201%2F05%2FH014" target="_blank" >GD201/05/H014: Collegium Informaticum</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Z - Research plan (with a link to CEZ)
Others
Publication year
2008
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Internal product ID
Victor
Technical parameters
http://ufal.mff.cuni.cz/victor/
Economical parameters
100000
Owner IČO
00216208
Owner name
Univerzita Karlova v Praze