Towards a Systematic Approach to Sync Factual Data across Wikipedia, Wikidata and External Data Sources

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21240%2F21%3A00356836" target="_blank" >RIV/68407700:21240/21:00356836 - isvavai.cz</a>
Result on the web
<a href="http://ceur-ws.org/Vol-2836/qurator2021_paper_18.pdf" target="_blank" >http://ceur-ws.org/Vol-2836/qurator2021_paper_18.pdf</a>
DOI - Digital Object Identifier
—

Alternative languages

Result language
English
Title in original language
Towards a Systematic Approach to Sync Factual Data across Wikipedia, Wikidata and External Data Sources
Result description in original language
This paper addresses one of the largest and most complex data curation workflows in existence: Wikipedia and Wikidata, with a high number of users and curators adding factual information from external sources via a non-systematic Wiki workflow to Wikipedia’s infoboxes and Wikidata items. We present high-level analyses of the current state, the challenges and limitations in this workflow and supplement it with a quantitative and semantic analysis of the resulting data spaces by deploying DBpedia’s integration and extraction capabilities. Based on an analysis of millions of references from Wikipedia infoboxes in different languages, we can find the most important sources which can be used to enrich other knowledge bases with information of better quality. An initial tool is presented, the GlobalFactSync browser, as a prototype to discuss further measures to develop a more systematic approach for data curation in the WikiVerse.

Classification

Type
D - Article in proceedings
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result continuities

Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organisation

Others

Release year
2021
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Title of the article in the proceedings
Proceedings of the Conference on Digital Curation Technologies (Qurator 2021)
ISBN
—
ISSN
1613-0073
e-ISSN
1613-0073
Number of pages of the result
15
Pages from-to
—
Publisher name
CEUR Workshop Proceedings
Place of publication
Aachen
Event location
Berlin
Event date
8 February 2021
Event type by nationality
WRD - Worldwide event
UT WoS code of the article
—
