Flexible Similarity Search of Semantic Vectors Using Fulltext Search Engines

Result identifiers

Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F03892620%3A_____%2F17%3AN0000002" target="_blank" >RIV/03892620:_____/17:N0000002 - isvavai.cz</a>
Alternative codes found
RIV/00216224:14330/17:00094375
Result on the web
<a href="http://ceur-ws.org/Vol-1923/article-01.pdf" target="_blank" >http://ceur-ws.org/Vol-1923/article-01.pdf</a>
DOI - Digital Object Identifier
—

Alternative languages

Result language
English
Title in original language
Flexible Similarity Search of Semantic Vectors Using Fulltext Search Engines
Result description in original language
Vector representations and vector space modeling (VSM) play a central role in modern machine learning. In our recent research we proposed a novel approach to ‘vector similarity searching’ over dense semantic vector representations. This approach can be deployed on top of traditional inverted-index-based fulltext engines, taking advantage of their robustness, stability, scalability and ubiquity. In this paper we validate our method using varied datasets ranging from text representations and embeddings (LSA, doc2vec, GloVe) to SIFT descriptors of image data. We show how our approach handles the indexing and querying in these domains, building a fast and scalable vector database with a tunable trade-off between vector search performance and quality, backed by a standard fulltext engine such as Elasticsearch.
Title in English
Flexible Similarity Search of Semantic Vectors Using Fulltext Search Engines
Result description in English
Vector representations and vector space modeling (VSM) play a central role in modern machine learning. In our recent research we proposed a novel approach to ‘vector similarity searching’ over dense semantic vector representations. This approach can be deployed on top of traditional inverted-index-based fulltext engines, taking advantage of their robustness, stability, scalability and ubiquity. In this paper we validate our method using varied datasets ranging from text representations and embeddings (LSA, doc2vec, GloVe) to SIFT descriptors of image data. We show how our approach handles the indexing and querying in these domains, building a fast and scalable vector database with a tunable trade-off between vector search performance and quality, backed by a standard fulltext engine such as Elasticsearch.

Classification

Type
D - Article in proceedings
CEP field
—
OECD FORD field
20205 - Automation and control systems

Result linkages

Project
<a href="/cs/project/TD03000295" target="_blank" >TD03000295: Inteligentní software pro sémantické hledání dokumentů (Intelligent software for semantic document search)</a>
Linkages
P - Research and development project financed from public funds (with a link to CEP)

Other

Year of implementation
2017
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific to the result type

Article name in the proceedings
CEUR Workshop Proceedings, Vol. 1923
ISBN
—
ISSN
1613-0073
e-ISSN
—
Number of pages
12
Pages from-to
1-12
Publisher name
Not stated
Place of publication
Vienna, Austria
Event location
Vienna, Austria
Event date
21 October 2017
Event type by nationality
WRD - Worldwide event
UT WoS article code
—

Similar results (10)

Semantic Vector Encoding and Similarity Search Using Fulltext Search Engines
ScaleText
Indexing and Searching Mathematics in Digital Libraries -- Architecture, Design and Scalability Issues
