JEDI: These aren't the JSON documents you're looking for...
Identifikátory výsledku
Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21240%2F22%3A00365165" target="_blank" >RIV/68407700:21240/22:00365165 - isvavai.cz</a>
Výsledek na webu
<a href="https://doi.org/10.1145/3514221.3517850" target="_blank" >https://doi.org/10.1145/3514221.3517850</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1145/3514221.3517850" target="_blank" >10.1145/3514221.3517850</a>
Alternativní jazyky
Jazyk výsledku
angličtina
Název v původním jazyce
JEDI: These aren't the JSON documents you're looking for...
Popis výsledku v původním jazyce
The JavaScript Object Notation (JSON) is a popular data format used in document stores to natively support semi-structured data. In this paper, we address the problem of JSON similarity lookup queries: given a query document and a distance threshold ????, retrieve all documents that are within ???? from the query document. Different from other hierarchical formats such as XML, JSON supports both ordered and unordered sibling collections within a single document which poses a new challenge to the tree model and distance computation. We propose JSON tree, a lossless tree representation of JSON documents, and define the JSON Edit Distance (JEDI), the first edit-based distance measure for JSON. We develop QuickJEDI, an algorithm that computes JEDI by leveraging a new technique to prune expensive sibling matchings. It outperforms a baseline algorithm by an order of magnitude in runtime. To boost the performance of JSON similarity queries, we introduce an index called JSIM and an effective upper bound based on tree sorting. Our upper bound algorithm runs in ???? (????????) time and ???? (???? +???? log ????) space, which substantially improves the previous best bound of ???? (????2) time and ???? (???? log ????) space (where ???? is the tree size). Our experimental evaluation shows that our solution scales to databases with millions of documents and JSON trees with tens of thousands of nodes.
Název v anglickém jazyce
JEDI: These aren't the JSON documents you're looking for...
Popis výsledku anglicky
The JavaScript Object Notation (JSON) is a popular data format used in document stores to natively support semi-structured data. In this paper, we address the problem of JSON similarity lookup queries: given a query document and a distance threshold ????, retrieve all documents that are within ???? from the query document. Different from other hierarchical formats such as XML, JSON supports both ordered and unordered sibling collections within a single document which poses a new challenge to the tree model and distance computation. We propose JSON tree, a lossless tree representation of JSON documents, and define the JSON Edit Distance (JEDI), the first edit-based distance measure for JSON. We develop QuickJEDI, an algorithm that computes JEDI by leveraging a new technique to prune expensive sibling matchings. It outperforms a baseline algorithm by an order of magnitude in runtime. To boost the performance of JSON similarity queries, we introduce an index called JSIM and an effective upper bound based on tree sorting. Our upper bound algorithm runs in ???? (????????) time and ???? (???? +???? log ????) space, which substantially improves the previous best bound of ???? (????2) time and ???? (???? log ????) space (where ???? is the tree size). Our experimental evaluation shows that our solution scales to databases with millions of documents and JSON trees with tens of thousands of nodes.
Klasifikace
Druh
D - Stať ve sborníku
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)
Návaznosti výsledku
Projekt
<a href="/cs/project/EF15_003%2F0000421" target="_blank" >EF15_003/0000421: Big Code: Škálovatelná analýza rozsáhlých bází programů</a><br>
Návaznosti
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)
Ostatní
Rok uplatnění
2022
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Údaje specifické pro druh výsledku
Název statě ve sborníku
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
ISBN
978-1-4503-9249-5
ISSN
0730-8078
e-ISSN
—
Počet stran výsledku
14
Strana od-do
1584-1597
Název nakladatele
Association for Computing Machinery
Místo vydání
New York
Místo konání akce
Philadelphia
Datum konání akce
12. 6. 2022
Typ akce podle státní příslušnosti
WRD - Celosvětová akce
Kód UT WoS článku
000852705400115