Creating Searchable Web Page Snapshots using Semantic Technologies
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216305%3A26230%2F23%3APU148339" target="_blank" >RIV/00216305:26230/23:PU148339 - isvavai.cz</a>
Result on the web
<a href="https://link.springer.com/chapter/10.1007/978-3-031-34444-2_26" target="_blank" >https://link.springer.com/chapter/10.1007/978-3-031-34444-2_26</a>
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Title in original language
Creating Searchable Web Page Snapshots using Semantic Technologies
Result description in original language
For many applications, it is necessary to create snapshots of web pages that accurately describe how the page appeared in a browser at a given point in time. Storing the original code (even with all referenced resources included) and creating bitmap screenshots both have many drawbacks when it comes to searching, viewing, and manipulating such snapshots. In this paper, we demonstrate a different approach that uses a remotely controlled web browser to render web pages. We capture the complete information about the rendered page and all pieces of its content and transform it into an explicit RDF-based model stored in a repository. The stored page models can then be examined using interactive web-based tools, exported in different formats, linked with other data sources, and queried using SPARQL.
Title in English
Creating Searchable Web Page Snapshots using Semantic Technologies
Result description in English
For many applications, it is necessary to create snapshots of web pages that accurately describe how the page appeared in a browser at a given point in time. Storing the original code (even with all referenced resources included) and creating bitmap screenshots both have many drawbacks when it comes to searching, viewing, and manipulating such snapshots. In this paper, we demonstrate a different approach that uses a remotely controlled web browser to render web pages. We capture the complete information about the rendered page and all pieces of its content and transform it into an explicit RDF-based model stored in a repository. The stored page models can then be examined using interactive web-based tools, exported in different formats, linked with other data sources, and queried using SPARQL.
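The idea in the description, rendered boxes stored as subject-predicate-object statements and then queried, can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the `EX` namespace, the property names (`text`, `fontSize`, `x`, `y`), the sample boxes, and the helpers `box_to_triples`, `match`, and `large_text` are all hypothetical and are not taken from the paper. A plain list of tuples stands in for a real RDF repository, and `match` only loosely mimics a single SPARQL triple pattern.

```python
# Hypothetical vocabulary prefix; the ontology actually used by the authors
# is not given in this record.
EX = "http://example.org/snapshot#"


def box_to_triples(box_id, box):
    """Turn one rendered box (text, geometry, style) into a list of triples."""
    subject = EX + box_id
    triples = [(subject, EX + "type", EX + "Box")]
    for key, value in box.items():
        triples.append((subject, EX + key, value))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def large_text(triples, min_size):
    """Texts of all boxes whose font size is at least min_size."""
    result = []
    for subject, _, size in match(triples, p=EX + "fontSize"):
        if size >= min_size:
            result.extend(o for _, _, o in match(triples, s=subject, p=EX + "text"))
    return result


# Made-up boxes standing in for what a remotely controlled browser might report.
rendered = {
    "b1": {"text": "Breaking news", "x": 10, "y": 20, "fontSize": 32},
    "b2": {"text": "Read more", "x": 10, "y": 80, "fontSize": 12},
}

# The in-memory "repository": one flat list of triples for the whole page.
store = []
for box_id, box in rendered.items():
    store.extend(box_to_triples(box_id, box))

print(large_text(store, 20))
```

In a real setting, a query like `large_text` would presumably be written in SPARQL against the repository; the sketch only shows why an explicit model of the rendered page is easier to search than raw source code or a bitmap screenshot.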
Classification
Type
O - Other results
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
S - Specific research at universities
Other
Year of implementation
2023
Data confidentiality code
S - Complete and truthful data on the project are not subject to protection under special legal regulations