Polish Malach Speech Corpus

The result's identifiers

Result code in IS VaVaI
RIV/49777513:23520/06:00000005 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F06%3A00000005)
Result on the web
—
DOI - Digital Object Identifier
—

Alternative languages

Result language
English
Original language name
Polish Malach Speech Corpus
Original language description
The Visual History Foundation recently collected at least 52,000 testimonies of Holocaust survivors, recorded in 32 different languages. The Polish collection consists of about 1,550 testimonies with a total length of about 3,500 hours. The corresponding Polish Malach Speech Corpus was annotated with the goal of building a large-vocabulary continuous speech recognition system. For this purpose, 200 15-minute speech segments from individual speakers were selected and manually transcribed for training, and the complete testimonies of 10 different survivors (about 22 hours of speech) were transcribed for testing. All manual annotations were made in the orthographic form of the words.
Czech name (English translation)
Annotated Corpus of Polish Testimonies of Holocaust Survivors
Czech description (English translation)
In recent years the Visual History Foundation collected approximately 52,000 testimonies of Holocaust survivors recorded in 32 languages. About 1,550 Polish testimonies are available, with a total length of about 3,500 hours. The corpus of Polish testimonies from the Malach project was prepared for building an automatic spontaneous speech recognition system, which will be used for automatic keyword and topic search in the testimonies. For training the system, a total of 200 15-minute testimony segments (100 hours in total) were processed and annotated in a special way; for testing, 10 complete testimonies from different speakers were processed (about 22 hours in total). All manual annotations were …

Classification

Type
X - Unclassified
CEP classification
JD - Use of computers, robotics and its application
OECD FORD branch
—

Result continuities

Project
LC536: Integrated center for natural language processing
Continuities
P - Research and development project financed from public sources (with a link to CEP)

Others

Publication year
2006
Confidentiality
S - Complete and truthful data about the project are not subject to protection under special legal regulations

Similar results (10)

Russian Malach Speech Corpus
Slovak Malach Speech Corpus
Czech Malach Speech Corpus