Visual Area Classification for Article Identification in Web Documents
The result's identifiers
Result code in IS VaVaI
RIV/00216305:26230/10:PU89585 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F10%3APU89585)
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Visual Area Classification for Article Identification in Web Documents
Original language description
On the World Wide Web, news and other articles are usually published in complex HTML documents containing many types of additional information that is not explicitly marked. In this paper, we propose a visual information analysis approach to article discovery in complex HTML documents. We use a classification approach to identify the important parts of the article within the page, and we propose an algorithm for detecting the article bounds within the page. Finally, we provide the results of an experimental evaluation.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
IN - Informatics
OECD FORD branch
—
Result continuities
Project
—
Continuities
Z - Research plan (with a link to CEZ)
S - Specific research at universities
Others
Publication year
2010
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
21st International Workshop on Databases and Expert Systems Applications
ISBN
978-0-7695-4174-7
ISSN
—
e-ISSN
—
Number of pages
5
Pages from-to
—
Publisher name
IEEE Computer Society
Place of publication
Bilbao
Event location
Bilbao
Event date
Aug 31, 2010
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—