Combining Sparse and Dense Information Retrieval: Soft Vector Space Model and MathBERTa at ARQMath-3 Task 1 (Answer Retrieval)

Identifikátory výsledku

Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F22%3A00126431" target="_blank" >RIV/00216224:14330/22:00126431 - isvavai.cz</a>
Výsledek na webu
<a href="http://ceur-ws.org/Vol-3180/paper-06.pdf" target="_blank" >http://ceur-ws.org/Vol-3180/paper-06.pdf</a>
DOI - Digital Object Identifier
—

Alternativní jazyky

Jazyk výsledku
angličtina
Název v původním jazyce
Combining Sparse and Dense Information Retrieval: Soft Vector Space Model and MathBERTa at ARQMath-3 Task 1 (Answer Retrieval)
Popis výsledku v původním jazyce
Sparse retrieval techniques can detect exact matches, but are inadequate for mathematical texts, where the same information can be expressed as either text or math. The soft vector space model has been shown to improve sparse retrieval on semantic text similarity, text classification, and machine translation evaluation tasks, but it has not yet been properly evaluated on math information retrieval. In our work, we compare the soft vector space model against standard sparse retrieval baselines and state-of-the-art math information retrieval systems from Task 1 (Answer Retrieval) of the ARQMath-3 lab. We evaluate the impact of different math representations, different notions of similarity between key words and math symbols ranging from Levenshtein distances to deep neural language models, and different ways of combining text and math. We show that using the soft vector space model consistently improves effectiveness compared to using standard sparse retrieval techniques. We also show that the Tangent-L math representation achieves better effectiveness than LaTeX, and that modeling text and math separately using two models improves effectiveness compared to jointly modeling text and math using a single model. Lastly, we show that different math representations and different ways of combining text and math benefit from different notions of similarity between tokens. Our best system achieves NDCG' of 0.251 on Task 1 of the ARQMath-3 lab.
Název v anglickém jazyce
Combining Sparse and Dense Information Retrieval: Soft Vector Space Model and MathBERTa at ARQMath-3 Task 1 (Answer Retrieval)
Popis výsledku anglicky
Sparse retrieval techniques can detect exact matches, but are inadequate for mathematical texts, where the same information can be expressed as either text or math. The soft vector space model has been shown to improve sparse retrieval on semantic text similarity, text classification, and machine translation evaluation tasks, but it has not yet been properly evaluated on math information retrieval. In our work, we compare the soft vector space model against standard sparse retrieval baselines and state-of-the-art math information retrieval systems from Task 1 (Answer Retrieval) of the ARQMath-3 lab. We evaluate the impact of different math representations, different notions of similarity between key words and math symbols ranging from Levenshtein distances to deep neural language models, and different ways of combining text and math. We show that using the soft vector space model consistently improves effectiveness compared to using standard sparse retrieval techniques. We also show that the Tangent-L math representation achieves better effectiveness than LaTeX, and that modeling text and math separately using two models improves effectiveness compared to jointly modeling text and math using a single model. Lastly, we show that different math representations and different ways of combining text and math benefit from different notions of similarity between tokens. Our best system achieves NDCG' of 0.251 on Task 1 of the ARQMath-3 lab.

Klasifikace

Druh
D - Stať ve sborníku
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)

Návaznosti výsledku

Projekt
—
Návaznosti
I - Institucionalni podpora na dlouhodoby koncepcni rozvoj vyzkumne organizace

Ostatní

Rok uplatnění
2022
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Údaje specifické pro druh výsledku

Název statě ve sborníku
Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum
ISBN
—
ISSN
1613-0073
e-ISSN
—
Počet stran výsledku
15
Strana od-do
104-118
Název nakladatele
CEUR-WS
Místo vydání
Bologna
Místo konání akce
Bologna
Datum konání akce
5. 9. 2022
Typ akce podle státní příslušnosti
WRD - Celosvětová akce
Kód UT WoS článku
—

Podobné výsledky(10)

Overview of ARQMath-3 (2022): Third CLEF Lab on Answer Retrieval for Questions on Math (Working Notes Version)Ensembling Math Information Retrieval Systems: MIRMU and MSM at ARQMath 2021 Ensembling Ten Math Information Retrieval Systems: MIRMU and MSM at ARQMath 2021

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Combining Sparse and Dense Information Retrieval: Soft Vector Space Model and MathBERTa at ARQMath-3 Task 1 (Answer Retrieval)

Identifikátory výsledku

Alternativní jazyky

Klasifikace

Návaznosti výsledku

Ostatní

Údaje specifické pro druh výsledku

Podobné výsledky(10)

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Popis výsledku

Identifikátory výsledku

Identifikátory výsledku

Alternativní jazyky

Alternativní jazyky

Klasifikace

Klasifikace

Návaznosti výsledku

Návaznosti výsledku

Ostatní

Ostatní

Údaje specifické pro druh výsledku

Údaje specifické pro druh výsledku

Podobné výsledky(10)