A polyphase filter for many-core architectures

Identifikátory výsledku

Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F47813059%3A19240%2F16%3AN0000131" target="_blank" >RIV/47813059:19240/16:N0000131 - isvavai.cz</a>
Výsledek na webu
<a href="http://www.sciencedirect.com/science/article/pii/S221313371630018X" target="_blank" >http://www.sciencedirect.com/science/article/pii/S221313371630018X</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1016/j.ascom.2016.03.003" target="_blank" >10.1016/j.ascom.2016.03.003</a>

Alternativní jazyky

Jazyk výsledku
angličtina
Název v původním jazyce
A polyphase filter for many-core architectures
Popis výsledku v původním jazyce
In this article we discuss our implementation of a polyphase filter for real-time data processing in radio astronomy. The polyphase filter is a standard tool in digital signal processing and as such a well established algorithm. We describe in detail our implementation of the polyphase filter algorithm and its behaviour on three generations of NVIDIA GPU cards (Fermi, Kepler, Maxwell), on the Intel Xeon CPU and Xeon Phi (Knights Corner) platforms. All of our implementations aim to exploit the potential for data reuse that the algorithm offers. Our GPU implementations explore two different methods for achieving this, the first makes use of L1/Texture cache, the second uses shared memory. We discuss the usability of each of our implementations along with their behaviours. We measure performance in execution time, which is a critical factor for real-time systems, we also present results in terms of bandwidth (GB/s), compute (GFLOP/s/s) and type conversions (GTc/s). We include a presentation of our results in terms of the sample rate which can be processed in real-time by a chosen platform, which more intuitively describes the expected performance in a signal processing setting. Our findings show that, for the GPUs considered, the performance of our polyphase filter when using lower precision input data is limited by type conversions rather than device bandwidth. We compare these results to an implementation on the Xeon Phi. We show that our Xeon Phi implementation has a performance that is 1.5× to 1.92× greater than our CPU implementation, however is not insufficient to compete with the performance of GPUs. We conclude with a comparison of our best performing code to two other implementations of the polyphase filter, showing that our implementation is faster in nearly all cases. This work forms part of the Astro-Accelerate project, a many-core accelerated real-time data processing library for digital signal processing of time-domain radio astronomy data.
Název v anglickém jazyce
A polyphase filter for many-core architectures
Popis výsledku anglicky
In this article we discuss our implementation of a polyphase filter for real-time data processing in radio astronomy. The polyphase filter is a standard tool in digital signal processing and as such a well established algorithm. We describe in detail our implementation of the polyphase filter algorithm and its behaviour on three generations of NVIDIA GPU cards (Fermi, Kepler, Maxwell), on the Intel Xeon CPU and Xeon Phi (Knights Corner) platforms. All of our implementations aim to exploit the potential for data reuse that the algorithm offers. Our GPU implementations explore two different methods for achieving this, the first makes use of L1/Texture cache, the second uses shared memory. We discuss the usability of each of our implementations along with their behaviours. We measure performance in execution time, which is a critical factor for real-time systems, we also present results in terms of bandwidth (GB/s), compute (GFLOP/s/s) and type conversions (GTc/s). We include a presentation of our results in terms of the sample rate which can be processed in real-time by a chosen platform, which more intuitively describes the expected performance in a signal processing setting. Our findings show that, for the GPUs considered, the performance of our polyphase filter when using lower precision input data is limited by type conversions rather than device bandwidth. We compare these results to an implementation on the Xeon Phi. We show that our Xeon Phi implementation has a performance that is 1.5× to 1.92× greater than our CPU implementation, however is not insufficient to compete with the performance of GPUs. We conclude with a comparison of our best performing code to two other implementations of the polyphase filter, showing that our implementation is faster in nearly all cases. This work forms part of the Astro-Accelerate project, a many-core accelerated real-time data processing library for digital signal processing of time-domain radio astronomy data.

Klasifikace

Druh
J<sub>x</sub> - Nezařazeno - Článek v odborném periodiku (Jimp, Jsc a Jost)
CEP obor
BN - Astronomie a nebeská mechanika, astrofyzika
OECD FORD obor
—

Návaznosti výsledku

Projekt
—
Návaznosti
S - Specificky vyzkum na vysokych skolach<br>I - Institucionalni podpora na dlouhodoby koncepcni rozvoj vyzkumne organizace

Ostatní

Rok uplatnění
2016
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Údaje specifické pro druh výsledku

Název periodika
Astronomy and Computing
ISSN
2213-1337
e-ISSN
—
Svazek periodika
16
Číslo periodika v rámci svazku
July
Stát vydavatele periodika
NL - Nizozemsko
Počet stran výsledku
16
Strana od-do
1-16
Kód UT WoS článku
000382414100001
EID výsledku v databázi Scopus
2-s2.0-84963864190

Podobné výsledky(10)

Efficient Transfer of C++ Objects in Intel Xeon Phi Coprocessor in Offload Mode Optimization of Selected Remote Sensing Algorithms for Many-Core Architectures Boundary element quadrature schemes for multi- and many-core architecture

Co hledáte?

Rychlé hledání

Chytré vyhledávání

A polyphase filter for many-core architectures

Identifikátory výsledku

Alternativní jazyky

Klasifikace

Návaznosti výsledku

Ostatní

Údaje specifické pro druh výsledku

Podobné výsledky(10)

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Popis výsledku

Identifikátory výsledku

Identifikátory výsledku

Alternativní jazyky

Alternativní jazyky

Klasifikace

Klasifikace

Návaznosti výsledku

Návaznosti výsledku

Ostatní

Ostatní

Údaje specifické pro druh výsledku

Údaje specifické pro druh výsledku

Podobné výsledky(10)