Vše

Co hledáte?

Vše
Projekty
Výsledky výzkumu
Subjekty

Rychlé hledání

  • Projekty podpořené TA ČR
  • Významné projekty
  • Projekty s nejvyšší státní podporou
  • Aktuálně běžící projekty

Chytré vyhledávání

  • Takto najdu konkrétní +slovo
  • Takto z výsledků -slovo zcela vynechám
  • “Takto můžu najít celou frázi”

Modular framework for similarity-based dataset discovery using external knowledge

Identifikátory výsledku

  • Kód výsledku v IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21240%2F22%3A00356040" target="_blank" >RIV/68407700:21240/22:00356040 - isvavai.cz</a>

  • Výsledek na webu

    <a href="https://doi.org/10.1108/DTA-09-2021-0261" target="_blank" >https://doi.org/10.1108/DTA-09-2021-0261</a>

  • DOI - Digital Object Identifier

    <a href="http://dx.doi.org/10.1108/DTA-09-2021-0261" target="_blank" >10.1108/DTA-09-2021-0261</a>

Alternativní jazyky

  • Jazyk výsledku

    angličtina

  • Název v původním jazyce

    Modular framework for similarity-based dataset discovery using external knowledge

  • Popis výsledku v původním jazyce

    Purpose Semantic retrieval and discovery of datasets published as open data remains a challenging task. The datasets inherently originate in the globally distributed web jungle, lacking the luxury of centralized database administration, database schemes, shared attributes, vocabulary, structure and semantics. The existing dataset catalogs provide basic search functionality relying on keyword search in brief, incomplete or misleading textual metadata attached to the datasets. The search results are thus often insufficient. However, there exist many ways of improving the dataset discovery by employing content-based retrieval, machine learning tools, third-party (external) knowledge bases, countless feature extraction methods and description models and so forth. Design/methodology/approach In this paper, the authors propose a modular framework for rapid experimentation with methods for similarity-based dataset discovery. The framework consists of an extensible catalog of components prepared to form custom pipelines for dataset representation and discovery. Findings The study proposes several proof-of-concept pipelines including experimental evaluation, which showcase the usage of the framework. Originality/value To the best of authors’ knowledge, there is no similar formal framework for experimentation with various similarity methods in the context of dataset discovery. The framework has the ambition to establish a platform for reproducible and comparable research in the area of dataset discovery. The prototype implementation of the framework is available on GitHub.

  • Název v anglickém jazyce

    Modular framework for similarity-based dataset discovery using external knowledge

  • Popis výsledku anglicky

    Purpose Semantic retrieval and discovery of datasets published as open data remains a challenging task. The datasets inherently originate in the globally distributed web jungle, lacking the luxury of centralized database administration, database schemes, shared attributes, vocabulary, structure and semantics. The existing dataset catalogs provide basic search functionality relying on keyword search in brief, incomplete or misleading textual metadata attached to the datasets. The search results are thus often insufficient. However, there exist many ways of improving the dataset discovery by employing content-based retrieval, machine learning tools, third-party (external) knowledge bases, countless feature extraction methods and description models and so forth. Design/methodology/approach In this paper, the authors propose a modular framework for rapid experimentation with methods for similarity-based dataset discovery. The framework consists of an extensible catalog of components prepared to form custom pipelines for dataset representation and discovery. Findings The study proposes several proof-of-concept pipelines including experimental evaluation, which showcase the usage of the framework. Originality/value To the best of authors’ knowledge, there is no similar formal framework for experimentation with various similarity methods in the context of dataset discovery. The framework has the ambition to establish a platform for reproducible and comparable research in the area of dataset discovery. The prototype implementation of the framework is available on GitHub.

Klasifikace

  • Druh

    J<sub>imp</sub> - Článek v periodiku v databázi Web of Science

  • CEP obor

  • OECD FORD obor

    10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)

Návaznosti výsledku

  • Projekt

  • Návaznosti

    S - Specificky vyzkum na vysokych skolach

Ostatní

  • Rok uplatnění

    2022

  • Kód důvěrnosti údajů

    S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Údaje specifické pro druh výsledku

  • Název periodika

    Data Technologies and Applications

  • ISSN

    2514-9288

  • e-ISSN

    2514-9318

  • Svazek periodika

    56

  • Číslo periodika v rámci svazku

    4

  • Stát vydavatele periodika

    GB - Spojené království Velké Británie a Severního Irska

  • Počet stran výsledku

    30

  • Strana od-do

    506-535

  • Kód UT WoS článku

    000759634600001

  • EID výsledku v databázi Scopus

    2-s2.0-85125073753