Automatic Web Document Restructuring Based on Visual Information Analysis
The result's identifiers
Result code in IS VaVaI
RIV/00216305:26230/10:PU82674 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F10%3APU82674)
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Automatic Web Document Restructuring Based on Visual Information Analysis
Original language description
Many documents available on the current web have a rather complex structure that allows them to present various kinds of information. Apart from the main content, the documents usually contain headers and footers, navigation sections and other types of additional information. For many applications such as document indexing or browsing on special devices, it is desirable that the main document information precede the additional information in the underlying HTML code. In this paper, we propose a method of document preprocessing that automatically restructures the document code according to this criterion. Our method is based on rendered document analysis. A page segmentation algorithm is used for detecting the basic blocks on the page, and the relevance of the individual parts is estimated from the visual properties of the text content.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
JC - Computer hardware and software
OECD FORD branch
—
Result continuities
Project
—
Continuities
Z - Research plan (with a link to CEZ)
Others
Publication year
2010
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Advances in Intelligent Web Mastering - 2, Proceedings of the 6th Atlantic Web Intelligence Conference - AWIC'2009
ISBN
978-3-642-10686-6
ISSN
—
e-ISSN
—
Number of pages
10
Pages from-to
61-70
Publisher name
Springer Verlag
Place of publication
Prague
Event location
Prague
Event date
Sep 9, 2009
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—