Extracting Visually Presented Element Relationships from Web Documents
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F13%3APU108091" target="_blank" >RIV/00216305:26230/13:PU108091 - isvavai.cz</a>
Result on the web
<a href="http://www.fit.vutbr.cz/research/pubs/all.php?id=10468" target="_blank" >http://www.fit.vutbr.cz/research/pubs/all.php?id=10468</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Original language name
Extracting Visually Presented Element Relationships from Web Documents
Original language description
Many documents on the World Wide Web present structured information that consists of multiple pieces of data with certain relationships among them. Although it is usually not difficult to identify the individual data values in the document text, their relationships are often not explicitly described in the document content. They are expressed by visual presentation of the document content that is expected to be interpreted by a human reader. In this paper, we propose a formal generic model of logical relationships in a document based on an interpretation of visual presentation patterns in the documents. The model describes the visually expressed relationships between individual parts of the contents independently of the document format and the particular way of presentation. Therefore, it can be used as an appropriate document model in many information retrieval or extraction applications. We formally define the model, we introduce a method of extracting the relationships between the content parts based on the visual presentation analysis and we discuss the expected applications. We also present a new dataset consisting of programmes of conferences and other scientific events and we discuss its suitability for the task at hand. Finally, we use the dataset to evaluate results of the implemented system.
Czech name
—
Czech description
—
Classification
Type
J<sub>ost</sub> - Miscellaneous article in a specialist periodical
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
The result was created during the realization of more than one project.
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2013
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
International Journal of Cognitive Informatics and Natural Intelligence
ISSN
1557-3958
e-ISSN
1557-3966
Volume of the periodical
2013
Issue of the periodical within the volume
2
Country of publishing house
US - UNITED STATES
Number of pages
17
Pages from-to
13-29
UT code for WoS article
—
EID of the result in the Scopus database
—