Open dataset discovery using context-enhanced similarity search

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21240%2F22%3A00359555" target="_blank" >RIV/68407700:21240/22:00359555 - isvavai.cz</a>
Alternative codes found
RIV/00216208:11320/22:10448313
Result on the web
<a href="https://doi.org/10.1007/s10115-022-01751-z" target="_blank" >https://doi.org/10.1007/s10115-022-01751-z</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/s10115-022-01751-z" target="_blank" >10.1007/s10115-022-01751-z</a>

Alternative languages

Language of the result
English
Title in the original language
Open dataset discovery using context-enhanced similarity search
Abstract in the original language
Today, open data catalogs enable users to search for datasets with full-text queries over metadata records, combined with simple faceted filtering. Using this combination, a user can discover a significant number of the datasets relevant to their search intent. However, some relevant datasets remain hard to find because their metadata are extremely sparse (e.g., only a few keywords). As an alternative, in this paper we propose an approach to dataset discovery based on similarity search over metadata descriptions enhanced by various semantic contexts. In general, the semantic contexts enrich the dataset metadata in a way that enables the identification of additional datasets relevant to a query that could not be retrieved using keyword or full-text search alone. In an experimental evaluation, we show that context-enhanced similarity retrieval methods increase the findability of relevant datasets, thus improving retrieval recall, which is critical in dataset discovery scenarios. As part of the evaluation, we created a catalog-like user interface for dataset discovery and recorded streams of user actions, which we used to create the ground truth. For the sake of reproducibility, we have published the entire evaluation testbed.
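The abstract does not spell out the retrieval model. As a rough illustration of the general idea (enriching sparse metadata with a semantic context before running similarity search), here is a minimal Python sketch. It is not the authors' method: the `CONTEXT` dictionary, the dataset identifiers, and the bag-of-words cosine scoring are all hypothetical choices made only for this example.

```python
from collections import Counter
from math import sqrt

# HYPOTHETICAL semantic context: maps a term to related terms. A real system
# might derive this from a thesaurus, knowledge graph, or embedding model;
# this toy dictionary is an assumption for illustration only.
CONTEXT = {
    "bus": ["transport", "public-transport"],
    "tram": ["transport", "public-transport"],
    "timetable": ["schedule"],
}

# Hypothetical catalog with sparse metadata (a few keywords per dataset).
DATASETS = {
    "prague-bus-lines": ["bus", "timetable"],
    "tram-stops": ["tram"],
    "air-quality": ["air", "pollution"],
}


def plain(terms):
    """Bag of words built from the metadata terms only."""
    return Counter(t.lower() for t in terms)


def enrich(terms):
    """Bag of words: the original terms plus their context terms."""
    bag = plain(terms)
    for t in terms:
        for related in CONTEXT.get(t.lower(), []):
            bag[related] += 1
    return bag


def cosine(a, b):
    """Cosine similarity of two bags of words (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query_terms, datasets, use_context=True):
    """Return ids of datasets with nonzero similarity, best match first."""
    prep = enrich if use_context else plain
    q = prep(query_terms)
    scores = {ds_id: cosine(q, prep(kw)) for ds_id, kw in datasets.items()}
    hits = [ds_id for ds_id, s in scores.items() if s > 0]
    return sorted(hits, key=lambda ds_id: scores[ds_id], reverse=True)


if __name__ == "__main__":
    print(search(["bus"], DATASETS, use_context=False))  # ['prague-bus-lines']
    print(search(["bus"], DATASETS))  # ['prague-bus-lines', 'tram-stops']
```

Without the context, the query `bus` matches only the dataset that literally carries that keyword. With the toy context, `tram-stops` is also retrieved through the shared term `transport`, which illustrates the kind of recall gain the abstract describes in general terms.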

Classification

Type
J<sub>imp</sub> - Article in a periodical indexed in the Web of Science database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result links

Project
—
Links
S - Specific research at universities

Other

Year of publication
2022
Data confidentiality code
S - Complete and accurate project data are not subject to protection under special legislation

Result-type-specific data

Journal title
Knowledge and Information Systems
ISSN
0219-1377
e-ISSN
0219-3116
Volume
64
Issue number within the volume
12
Country of the journal's publisher
DE - Federal Republic of Germany
Number of pages
27
Pages (from-to)
3265-3291
WoS UT code of the article
000849677000001
Scopus EID of the result
2-s2.0-85137453544

Similar results (10)

Open dataset discovery using context-enhanced similarity search
Evaluation Framework for Search Methods Focused on Dataset Findability in Open Data Catalogs
Evaluation Framework for Search Methods Focused on Dataset Findability in Open Data Catalogs
