Open dataset discovery using context-enhanced similarity search
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21240%2F22%3A00359555" target="_blank" >RIV/68407700:21240/22:00359555 - isvavai.cz</a>
Result on the web
<a href="https://doi.org/10.1007/s10115-022-01751-z" target="_blank" >https://doi.org/10.1007/s10115-022-01751-z</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/s10115-022-01751-z" target="_blank" >10.1007/s10115-022-01751-z</a>
Alternative languages
Result language
English
Original language name
Open dataset discovery using context-enhanced similarity search
Original language description
Today, open data catalogs enable users to search for datasets with full-text queries in metadata records combined with simple faceted filtering. Using this combination, a user is able to discover a significant number of the datasets relevant to their search intent. However, there still remain relevant datasets that are hard to find because of the enormous sparsity of their metadata (e.g., only a few keywords). As an alternative, in this paper, we propose an approach to dataset discovery based on similarity search over metadata descriptions enhanced by various semantic contexts. In general, the semantic contexts enrich the dataset metadata in a way that enables the identification of additional datasets relevant to a query that could not be retrieved using keyword or full-text search alone. In an experimental evaluation, we show that context-enhanced similarity retrieval methods increase the findability of relevant datasets, thereby improving retrieval recall, which is critical in dataset discovery scenarios. As a part of the evaluation, we created a catalog-like user interface for dataset discovery and recorded streams of user actions, which we used to create the ground truth. For the sake of reproducibility, we published the entire evaluation testbed.
Czech name
—
Czech description
—
Classification
Type
J<sub>imp</sub> - Article in a specialist periodical, which is included in the Web of Science database
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
S - Specific university research
Others
Publication year
2022
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Knowledge and Information Systems
ISSN
0219-1377
e-ISSN
0219-3116
Volume of the periodical
64
Issue of the periodical within the volume
12
Country of publishing house
DE - GERMANY
Number of pages
27
Pages from-to
3265-3291
UT code for WoS article
000849677000001
EID of the result in the Scopus database
2-s2.0-85137453544