Learning Audio-Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F18%3A10390078" target="_blank" >RIV/00216208:11320/18:10390078 - isvavai.cz</a>
Result on the web
<a href="https://transactions.ismir.net/articles/10.5334/tismir.12/#" target="_blank" >https://transactions.ismir.net/articles/10.5334/tismir.12/#</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.5334/tismir.12" target="_blank" >10.5334/tismir.12</a>
Alternative languages
Result language
English
Original language name
Learning Audio-Sheet Music Correspondences for Cross-Modal Retrieval and Piece Identification
Original language description
This work addresses the problem of matching musical audio directly to sheet music, without any higher-level abstract representation. We propose a method that learns joint embedding spaces for short excerpts of audio and their respective counterparts in sheet music images, using multimodal convolutional neural networks. Given the learned representations, we show how to utilize them for two sheet-music-related tasks: (1) piece/score identification from audio queries and (2) retrieving relevant performances given a score as a search query. All retrieval models are trained and evaluated on a new, large-scale multimodal audio-sheet music dataset which is made publicly available along with this article. The dataset comprises 479 precisely annotated solo piano pieces by 53 composers, for a total of 1,129 pages of music and about 15 hours of aligned audio, which was synthesized from these scores. Going beyond this synthetic training data, we carry out first retrieval experiments using scans of real sheet musi
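The retrieval step summarized in the description can be illustrated with a minimal sketch. It assumes the two CNN branches have already mapped audio excerpts and sheet-music snippets to vectors in the shared embedding space. Ranking by cosine similarity and identifying a piece by majority vote over excerpt-level nearest neighbours are illustrative assumptions, not necessarily the exact procedure used in the article, and all function names are hypothetical.

```python
import math
from collections import Counter

def l2_normalize(v):
    # Scale a vector to unit length (zero vectors are returned unchanged).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a, b):
    # Cosine similarity between two embeddings in the shared space.
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def rank_snippets(query, database):
    """Sort (piece_id, embedding) pairs by descending similarity to query."""
    return sorted(database, key=lambda item: cosine(query, item[1]), reverse=True)

def identify_piece(audio_embeddings, sheet_database):
    """Hypothetical piece identification: each audio excerpt votes for the
    piece of its nearest sheet-music snippet; pieces are ranked by votes."""
    votes = Counter()
    for emb in audio_embeddings:
        best_piece, _ = rank_snippets(emb, sheet_database)[0]
        votes[best_piece] += 1
    return [piece for piece, _ in votes.most_common()]

# Toy 2-D embeddings standing in for learned CNN outputs (made-up values).
sheet_db = [("piece_A", [1.0, 0.0]), ("piece_A", [0.9, 0.1]), ("piece_B", [0.0, 1.0])]
audio_query = [[0.95, 0.05], [0.8, 0.2], [0.1, 0.9]]
print(identify_piece(audio_query, sheet_db))  # ['piece_A', 'piece_B']
```

The reverse task (finding performances for a score query) uses the same ranking with the roles of the two modalities swapped.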
Czech name
—
Czech description
—
Classification
Type
J<sub>ost</sub> - Miscellaneous article in a specialist periodical
CEP classification
—
OECD FORD branch
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
<a href="/en/project/GBP103%2F12%2FG084" target="_blank" >GBP103/12/G084: Center for Large Scale Multi-modal Data Interpretation</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2018
Confidentiality
S - Complete and true data about the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
Transactions of the International Society for Music Information Retrieval
ISSN
2514-3298
e-ISSN
—
Volume of the periodical
1
Issue of the periodical within the volume
1
Country of publishing house
CA - CANADA
Number of pages
12
Pages from-to
22-33
UT code for WoS article
—
EID of the result in the Scopus database
—