Optimizing CUDA code by kernel fusion: application on BLAS
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F15%3A00083436" target="_blank" >RIV/00216224:14330/15:00083436 - isvavai.cz</a>
Result on the web
<a href="http://dx.doi.org/10.1007/s11227-015-1483-z" target="_blank" >http://dx.doi.org/10.1007/s11227-015-1483-z</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1007/s11227-015-1483-z" target="_blank" >10.1007/s11227-015-1483-z</a>
Alternative languages
Result language
English
Original language name
Optimizing CUDA code by kernel fusion: application on BLAS
Original language description
Contemporary GPUs have significantly higher arithmetic throughput than memory throughput. Hence, many GPU kernels are memory bound and cannot exploit the arithmetic power of the GPU. Examples of memory-bound kernels are BLAS-1 (vector-vector) and BLAS-2 (matrix-vector) operations. However, when kernels share data, kernel fusion can improve memory locality by placing shared data, originally passed via off-chip global memory, into a faster, but distributed on-chip memory. In this paper, we show how kernels performing map, reduce or their nested combinations can be fused automatically by our source-to-source compiler. To demonstrate the usability of the compiler, we have implemented several BLAS-1 and BLAS-2 routines and show how the performance of their sequences can be improved by fusions. Compared with similar sequences using CUBLAS, our compiler is able to generate code that is up to 2.24x faster for the examples tested.
Czech name
—
Czech description
—
Classification
Type
J<sub>x</sub> - Unclassified - Peer-reviewed scientific article (Jimp, Jsc and Jost)
CEP classification
IN - Informatics
OECD FORD branch
—
Result continuities
Project
<a href="/en/project/EE2.3.30.0037" target="_blank" >EE2.3.30.0037: Employment of Best Young Scientists for International Cooperation Empowerment</a>
Continuities
P - Research and development project financed from public funds (with a link to CEP)
Others
Publication year
2015
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Name of the periodical
The Journal of Supercomputing
ISSN
0920-8542
e-ISSN
—
Volume of the periodical
71
Issue of the periodical within the volume
10
Country of publishing house
US - UNITED STATES
Number of pages
24
Pages from-to
3934-3957
UT code for WoS article
000361531500013
EID of the result in the Scopus database
—