Interactive search for words and phrases in large audio-visual archives
Result identifiers
Result code in IS VaVaI
RIV/49777513:23520/17:43932995 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F17%3A43932995)
Result on the web
—
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Title in the original language
Interactive search for words and phrases in large audio-visual archives
Result description in the original language
This paper describes an automatic system for processing and searching large audio-visual archives, intended especially for use in oral history studies. The system contains an automated processing pipeline for speech recognition and indexing. A carefully designed graphical user interface lets users search for specific words and phrases. It also lets them search for out-of-vocabulary words and directly replay the occurrences, sorted by automatically estimated confidence scores. The first archive processed in this system is the MALACH archive, which contains personal testimonies of Holocaust survivors and witnesses. The searchable portion of the interviews consists of 2,000 hours of English recordings and 1,000 hours of Czech recordings. The paper gives a brief overview of the system architecture, describes the phoneme-based search, and provides basic performance metrics for both the English and Czech data.
Classification
Type
O - Other results
CEP field
—
OECD FORD field
20205 - Automation and control systems
Result linkages
Project
TE01020197: Centrum aplikované kybernetiky 3 (/cs/project/TE01020197)
Linkages
P - Research and development project financed from public funds (with a link to CEP)
Other
Year of implementation
2017
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations