Cluster-based Page Segmentation - a fast and precise method for web page pre-processing
The result's identifiers
Result code in IS VaVaI
RIV/00216305:26230/13:PU108460 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F13%3APU108460)
Result on the web
http://www.fit.vutbr.cz/research/pubs/all.php?id=10252
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Cluster-based Page Segmentation - a fast and precise method for web page pre-processing
Original language description
Segmenting a web page may be one of the initial steps of information retrieval or content classification performed on that page. Although there has been extensive research in this area, existing approaches usually focus on either performance or the quality of the results. Vision-based segmentation is one of the quality-focused methods, and these methods are considerably slow. This paper proposes an approach for boosting the performance of vision-based algorithms. Our approach builds on concepts of the modern web and on a very common scenario in which an entire web site is processed at once. In this scenario, a significant performance boost can be gained by isomorphically mapping the results previously obtained from pages of the site onto other pages of the same site. We present the results of experiments performed on VIPS, the most common algorithm for page segmentation.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
ED1.1.00/02.0070: IT4Innovations Centre of Excellence
Continuities
P - Research and development project financed from public sources (with a link to CEP)
Z - Research plan (with a link to CEZ)
S - Specific research at universities
Others
Publication year
2013
Confidentiality
S - Complete and true data about the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
The Third International Conference on Web Intelligence, Mining and Semantics
ISBN
978-1-4503-1850-1
ISSN
—
e-ISSN
—
Number of pages
12
Pages from-to
1-12
Publisher name
Association for Computing Machinery
Place of publication
Madrid
Event location
Madrid
Event date
Jun 12, 2013
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—