Deep Neural Networks for Web Page Information Extraction
The result's identifiers
Result code in IS VaVaI
RIV/68407700:21230/16:00303112 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F16%3A00303112)
Alternative codes found
RIV/68407700:21730/16:00303112
Result on the web
http://link.springer.com/chapter/10.1007/978-3-319-44944-9_14
DOI - Digital Object Identifier
10.1007/978-3-319-44944-9_14 (http://dx.doi.org/10.1007/978-3-319-44944-9_14)
Alternative languages
Result language
English
Original language name
Deep Neural Networks for Web Page Information Extraction
Original language description
Web wrappers are systems for extracting structured information from web pages. Currently, wrappers need to be adapted to a particular website template before they can start the extraction process. In this work we present a new method, which uses convolutional neural networks to learn a wrapper that can extract information from previously unseen templates. Therefore, this wrapper does not need any site-specific initialization and is able to extract information from a single web page. We also propose a method for spatial text encoding, which allows us to encode visual and textual content of a web page into a single neural net. The first experiments with product information extraction showed very promising results and suggest that this approach can lead to a general site-independent web wrapper.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
JC - Computer hardware and software
OECD FORD branch
—
Result continuities
Project
—
Continuities
S - Specific university research
Others
Publication year
2016
Confidentiality
S - Complete and truthful data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Artificial Intelligence Applications and Innovations
ISBN
978-3-319-44943-2
ISSN
1868-4238
e-ISSN
—
Number of pages
10
Pages from-to
154-163
Publisher name
Springer International Publishing
Place of publication
Cham
Event location
Thessaloniki
Event date
Sep 16, 2016
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
000392413700014