Automatic Web Document Restructuring Based on Visual Information Analysis

Identifikátory výsledku

Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F10%3APU82674" target="_blank" >RIV/00216305:26230/10:PU82674 - isvavai.cz</a>
Výsledek na webu
—
DOI - Digital Object Identifier
—

Alternativní jazyky

Jazyk výsledku
angličtina
Název v původním jazyce
Automatic Web Document Restructuring Based on Visual Information Analysis
Popis výsledku v původním jazyce
Many documents available on the current web have quite a complex structure that allows to present various kinds of information. Apart from the main content, the documents usually contain headers and footers, navigation sections and other types of additional information. For many applications such as document indexing or browsing on special devices, it is desirable that the main document information should precede the additional information in the underlying HTML code. In this paper, we propose a methodof document preprocessing that automatically restructures the document code according to this criteria. Our method is based on rendered document analysis. A page segmentation algorithm is used for detecting the basic blocks on the page and the relevanceof the individual parts is estimated from the visual properties of the text content.
Název v anglickém jazyce
Automatic Web Document Restructuring Based on Visual Information Analysis
Popis výsledku anglicky
Many documents available on the current web have quite a complex structure that allows to present various kinds of information. Apart from the main content, the documents usually contain headers and footers, navigation sections and other types of additional information. For many applications such as document indexing or browsing on special devices, it is desirable that the main document information should precede the additional information in the underlying HTML code. In this paper, we propose a methodof document preprocessing that automatically restructures the document code according to this criteria. Our method is based on rendered document analysis. A page segmentation algorithm is used for detecting the basic blocks on the page and the relevanceof the individual parts is estimated from the visual properties of the text content.

Klasifikace

Druh
D - Stať ve sborníku
CEP obor
JC - Počítačový hardware a software
OECD FORD obor
—

Návaznosti výsledku

Projekt
—
Návaznosti
Z - Vyzkumny zamer (s odkazem do CEZ)

Ostatní

Rok uplatnění
2010
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Údaje specifické pro druh výsledku

Název statě ve sborníku
Advances in Intelligent Web Mastering - 2, Proceedings of the 6th Atlantic Web Intelligence Conference - AWIC'2009
ISBN
978-3-642-10686-6
ISSN
—
e-ISSN
—
Počet stran výsledku
10
Strana od-do
61-70
Název nakladatele
Springer Verlag
Místo vydání
Prague
Místo konání akce
Prague
Datum konání akce
9. 9. 2009
Typ akce podle státní příslušnosti
WRD - Celosvětová akce
Kód UT WoS článku
—

Podobné výsledky(10)

Information Extraction from Web Sources based on Multi-aspect Content Analysis Automatic annotation of online articles based on visual feature classification Web Page Element Classification Based on Visual Features

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Automatic Web Document Restructuring Based on Visual Information Analysis

Identifikátory výsledku

Alternativní jazyky

Klasifikace

Návaznosti výsledku

Ostatní

Údaje specifické pro druh výsledku

Podobné výsledky(10)

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Popis výsledku

Identifikátory výsledku

Identifikátory výsledku

Alternativní jazyky

Alternativní jazyky

Klasifikace

Klasifikace

Návaznosti výsledku

Návaznosti výsledku

Ostatní

Ostatní

Údaje specifické pro druh výsledku

Údaje specifické pro druh výsledku

Podobné výsledky(10)