Optimizing CUDA code by kernel fusion: application on BLAS

The result's identifiers

  • Result code in IS VaVaI

    <a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14330%2F15%3A00083436" target="_blank" >RIV/00216224:14330/15:00083436 - isvavai.cz</a>

  • Result on the web

    <a href="http://dx.doi.org/10.1007/s11227-015-1483-z" target="_blank" >http://dx.doi.org/10.1007/s11227-015-1483-z</a>

  • DOI - Digital Object Identifier

    <a href="http://dx.doi.org/10.1007/s11227-015-1483-z" target="_blank" >10.1007/s11227-015-1483-z</a>

Alternative languages

  • Result language

    English

  • Original language name

    Optimizing CUDA code by kernel fusion: application on BLAS

  • Original language description

    Contemporary GPUs have significantly higher arithmetic throughput than memory throughput. Hence, many GPU kernels are memory bound and cannot exploit the arithmetic power of the GPU. Examples of memory-bound kernels are BLAS-1 (vector–vector) and BLAS-2 (matrix–vector) operations. However, when kernels share data, kernel fusion can improve memory locality by placing shared data, originally passed via off-chip global memory, into a faster, but distributed, on-chip memory. In this paper, we show how kernels performing map, reduce, or their nested combinations can be fused automatically by our source-to-source compiler. To demonstrate the usability of the compiler, we have implemented several BLAS-1 and BLAS-2 routines and show how the performance of their sequences can be improved by fusion. Compared with similar sequences using CUBLAS, our compiler is able to generate code that is up to 2.24x faster for the examples tested.

  • Czech name

  • Czech description
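The memory-traffic argument in the abstract above can be made concrete with a minimal CPU-side sketch in Python. It is not CUDA code and not the authors' source-to-source compiler. It assumes that each element read from or written to a list models one off-chip global-memory access, while a local variable models a register or on-chip memory. The names `Traffic`, `axpy`, `dot`, `axpy_dot_unfused` and `axpy_dot_fused` are hypothetical, chosen only for illustration. The example fuses a map (an AXPY-style update) with a reduce (a dot product that consumes its result).

```python
"""Toy model of kernel fusion for a memory-bound BLAS-1 sequence.

Assumption (illustration only): every element read from or written to a
Python list models one access to off-chip global memory, while a local
variable models a register or on-chip memory. This is not CUDA code and
not the paper's compiler output.
"""
from dataclasses import dataclass


@dataclass
class Traffic:
    """Counts modelled element loads/stores to global memory."""
    loads: int = 0
    stores: int = 0

    @property
    def total(self) -> int:
        return self.loads + self.stores


def axpy(alpha, x, y, t):
    """Map kernel: z = alpha*x + y, written back to global memory."""
    z = []
    for xi, yi in zip(x, y):
        t.loads += 2          # read x_i and y_i
        z.append(alpha * xi + yi)
        t.stores += 1         # write z_i
    return z


def dot(u, v, t):
    """Reduce kernel: returns sum(u_i * v_i)."""
    acc = 0.0
    for ui, vi in zip(u, v):
        t.loads += 2          # read u_i and v_i (u_i is re-read from memory)
        acc += ui * vi
    t.stores += 1             # write the scalar result
    return acc


def axpy_dot_unfused(alpha, x, y, w, t):
    """Two kernels: z travels through global memory between them."""
    z = axpy(alpha, x, y, t)
    return z, dot(z, w, t)


def axpy_dot_fused(alpha, x, y, w, t):
    """One fused map+reduce kernel: z_i is reused straight from a local."""
    z = []
    acc = 0.0
    for xi, yi, wi in zip(x, y, w):
        t.loads += 3          # read x_i, y_i and w_i once each
        zi = alpha * xi + yi  # kept "on chip"
        z.append(zi)
        t.stores += 1         # z is still a required output
        acc += zi * wi        # no reload of z_i from global memory
    t.stores += 1             # write the scalar result
    return z, acc


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [0.5, 0.5, 0.5, 0.5]
    w = [1.0, -1.0, 1.0, -1.0]
    t_unfused, t_fused = Traffic(), Traffic()
    _, r_unfused = axpy_dot_unfused(2.0, x, y, w, t_unfused)
    _, r_fused = axpy_dot_fused(2.0, x, y, w, t_fused)
    print(r_unfused, r_fused, t_unfused.total, t_fused.total)  # → -4.0 -4.0 21 17
```

For n elements, the unfused pair makes 5n + 1 modelled accesses and the fused kernel 4n + 1; the saving is the n re-loads of z. If z were only a temporary, the fused kernel could also drop its n stores. In an actual CUDA implementation, a fused kernel of this kind would typically keep z_i in a register and combine per-thread partial sums in shared memory before a final reduction. The speedups quoted in the abstract come from the paper's measurements, not from this model.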

Classification

  • Type

    J<sub>x</sub> - Unclassified - Peer-reviewed scientific article (Jimp, Jsc and Jost)

  • CEP classification

    IN - Informatics

  • OECD FORD branch

Result continuities

  • Project

    <a href="/en/project/EE2.3.30.0037" target="_blank" >EE2.3.30.0037: Employment of Best Young Scientists for International Cooperation Empowerment</a><br>

  • Continuities

    P - Research and development project financed from public funds (with a link to CEP)

Others

  • Publication year

    2015

  • Confidentiality

    S - Complete and true data on the project are not subject to protection under special legal regulations

Data specific for result type

  • Name of the periodical

    The Journal of Supercomputing

  • ISSN

    0920-8542

  • e-ISSN

  • Volume of the periodical

    71

  • Issue of the periodical within the volume

    10

  • Country of publishing house

    US - UNITED STATES

  • Number of pages

    24

  • Pages from-to

    3934-3957

  • UT code for WoS article

    000361531500013

  • EID of the result in the Scopus database