Two-Phase Categorization of Web Documents
The result's identifiers
Result code in IS VaVaI
RIV/00216305:26230/10:PU89654 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F10%3APU89654)
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Two-Phase Categorization of Web Documents
Original language description
The number of pages on the World Wide Web is continuously growing, and there is a need to process pages efficiently and extract useful knowledge from them. Web page categorization is an important task in this area. The method proposed here takes both visual and textual information into account and consists of two phases. In the first phase, web page areas obtained by segmentation are classified based on their visual properties. In the second phase, whole pages are classified based on the output of the first phase and on textual information. Several experiments with web pages taken from news web sites are presented in the final part of the paper.
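The two-phase structure described above can be illustrated with a minimal sketch. It is not the paper's implementation: the visual features (`font_size`, `link_density`), the thresholds, the area labels, the weights, and the keyword lists are all hypothetical, chosen only to show how area labels from the first phase can feed the page-level decision in the second phase.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Area:
    """One page area produced by segmentation (fields are assumptions)."""
    text: str
    font_size: float      # hypothetical visual feature, in points
    link_density: float   # hypothetical: fraction of text inside links, 0..1


def classify_area(area: Area) -> str:
    """Phase 1: label an area using its visual properties only."""
    if area.link_density > 0.5:
        return "navigation"
    if area.font_size >= 20:
        return "heading"
    return "content"


# Hypothetical weights: how much text from each area type counts in phase 2.
AREA_WEIGHTS = {"heading": 3.0, "content": 1.0, "navigation": 0.0}

# Hypothetical keyword lists standing in for a trained text classifier.
CATEGORY_KEYWORDS = {
    "sport": {"match", "goal", "team"},
    "politics": {"election", "parliament", "minister"},
}


def classify_page(areas: List[Area]) -> str:
    """Phase 2: combine phase-1 area labels with textual information."""
    scores = {category: 0.0 for category in CATEGORY_KEYWORDS}
    for area in areas:
        weight = AREA_WEIGHTS[classify_area(area)]
        for word in area.text.lower().split():
            for category, keywords in CATEGORY_KEYWORDS.items():
                if word in keywords:
                    scores[category] += weight
    # Pick the category with the highest weighted keyword score.
    return max(scores, key=scores.get)
```

In this sketch, words in a navigation area contribute nothing to the page score, so a page whose menu mentions sport can still be categorized by its heading and body text. A real system would presumably replace the hand-written rules with trained classifiers in both phases.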
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
JC - Computer hardware and software
OECD FORD branch
—
Result continuities
Project
—
Continuities
Z - Research plan (with a link to CEZ)
S - Specific research at universities
Others
Publication year
2010
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings of the International Conference on Knowledge Discovery and Information Retrieval
ISBN
978-989-8425-28-7
ISSN
—
e-ISSN
—
Number of pages
5
Pages from-to
—
Publisher name
Institute for Systems and Technologies of Information, Control and Communication
Place of publication
Valencia
Event location
Valencia
Event date
Oct 25, 2010
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
—