The IBN BATTOUTA Air Traffic Control Corpus with Real Life ADS-B and METAR Data

Result identifiers

  • Result code in IS VaVaI

    RIV/00216208:11320/20:10426963 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F20%3A10426963)

  • Result on the web

    https://link.springer.com/chapter/10.1007%2F978-3-030-51186-9_26

  • DOI - Digital Object Identifier

Alternative languages

  • Result language

    English

  • Title in the original language

    The IBN BATTOUTA Air Traffic Control Corpus with Real Life ADS-B and METAR Data

  • Result description in the original language

    The Ibn Battouta ATC (Air Traffic Control) communication corpus is an English corpus characterized by strongly Moroccan-accented speech; it is still under creation and concentrates on Tangier’s airport. It synchronizes the voice recordings with ADS-B (flight data broadcast by aircraft) and METAR (airport weather reports), which serve as trusted structured data that is frequently spoken during the communication. The aim of this work is to update the acoustic model and train an ASR (Automatic Speech Recognition) engine such as CMU Sphinx or Kaldi in order to improve ATC capabilities, facilitate decision making and ensure security. So far we have recorded more than five hours of audio, with silence removed, of real-life communication between pilots and controllers during takeoffs, approaches and landings at Tangier airport and Gibraltar airport, and en route between Moroccan and Spanish airspace in both directions, in addition to 10 logging files of ADS-B flight data and METAR reports. All audio files have been fully transcribed manually, with time marks indicating the start of each transmission and its duration, by airline pilots and controllers working at Tangier airport. To the best of our knowledge, it is the first corpus that perfectly aligns speech data and structured data to allow for richer ASR modeling. In this paper, we describe the geographic specifics of Tangier’s airport and the techniques used to ensure the quality of recording, data collection and transcription, using metadata annotation and the advantages of ATC phraseology as a reduced, controlled language, together with synchronized flight data and weather reports as trusted structured data.
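
The synchronization described in the abstract (time-marked transmissions matched with ADS-B and METAR logs) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: the `Utterance` fields, the `(timestamp, payload)` log layout, the shared time base in seconds, and the "latest record at or before the start of transmission" rule are all hypothetical choices made for the example.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Utterance:
    # Hypothetical record layout: the abstract only says that each
    # transmission has a start time mark and a duration.
    start: float      # start of transmission, seconds on a shared clock (assumed)
    duration: float   # length of the transmission in seconds
    text: str         # manual transcription


def latest_at_or_before(times, t):
    """Return the index of the last timestamp <= t, or None if there is none.

    `times` must be sorted in ascending order.
    """
    i = bisect_right(times, t)
    return i - 1 if i > 0 else None


def align(utterances, adsb, metar):
    """Attach to each utterance the most recent ADS-B and METAR payloads.

    `adsb` and `metar` are lists of (timestamp, payload) tuples sorted by
    timestamp (an assumed layout, not the corpus's actual file format).
    Returns a list of (text, adsb_payload, metar_payload) tuples, with None
    where no earlier record exists.
    """
    adsb_times = [t for t, _ in adsb]
    metar_times = [t for t, _ in metar]
    rows = []
    for u in utterances:
        a = latest_at_or_before(adsb_times, u.start)
        m = latest_at_or_before(metar_times, u.start)
        rows.append((
            u.text,
            adsb[a][1] if a is not None else None,
            metar[m][1] if m is not None else None,
        ))
    return rows
```

Picking the latest record at or before the start of a transmission is one simple design choice; a nearest-neighbour match or a tolerance window would be equally plausible, depending on how often the structured data is logged.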

Classification

  • Type

    O - Other results

  • CEP field

  • OECD FORD field

    10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)

Result linkages

  • Project

  • Linkages

Other

  • Year of application

    2020

  • Data confidentiality code

    S - Complete and true data on the project are not subject to protection under special legal regulations