Using hardware performance counters to speed up autotuning convergence on GPUs
Result identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216224%3A14610%2F22%3A00125022" target="_blank" >RIV/00216224:14610/22:00125022 - isvavai.cz</a>
Result on the web
<a href="https://www.sciencedirect.com/science/article/pii/S0743731521001945?via%3Dihub" target="_blank" >https://www.sciencedirect.com/science/article/pii/S0743731521001945?via%3Dihub</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1016/j.jpdc.2021.10.003" target="_blank" >10.1016/j.jpdc.2021.10.003</a>
Alternative languages
Result language
English
Title in the original language
Using hardware performance counters to speed up autotuning convergence on GPUs
Description in the original language
Nowadays, GPU accelerators are commonly used to speed up general-purpose computing tasks on a variety of hardware. However, due to the diversity of GPU architectures and processed data, optimizing codes for a particular type of hardware and specific data characteristics can be extremely challenging. The autotuning of performance-relevant source-code parameters allows for automatic optimization of applications and keeps their performance portable. Although the autotuning process typically results in code speed-up, searching the tuning space can bring unacceptable overhead if (i) the tuning space is vast and full of poorly performing implementations, or (ii) the autotuning process has to be repeated frequently because of changes in the processed data or migration to different hardware. In this paper, we introduce a novel method for searching generic tuning spaces. The tuning spaces can contain tuning parameters changing any user-defined property of the source code. The method takes advantage of collecting hardware performance counters (also known as profiling counters) during empirical tuning. Those counters are used to steer the search toward faster implementations. The method requires the tuning space to be sampled on any GPU. It then builds a problem-specific model, which can be used during autotuning on various, even previously unseen, inputs or GPUs. Using a set of five benchmarks, we experimentally demonstrate that our method can speed up autotuning when an application needs to be ported to different hardware or needs to process data with different characteristics. We also compare our method to the state of the art and show that it is superior in terms of the number of search steps and typically outperforms other searches in terms of convergence time.
Title in English
Using hardware performance counters to speed up autotuning convergence on GPUs
Description in English
Nowadays, GPU accelerators are commonly used to speed up general-purpose computing tasks on a variety of hardware. However, due to the diversity of GPU architectures and processed data, optimizing codes for a particular type of hardware and specific data characteristics can be extremely challenging. The autotuning of performance-relevant source-code parameters allows for automatic optimization of applications and keeps their performance portable. Although the autotuning process typically results in code speed-up, searching the tuning space can bring unacceptable overhead if (i) the tuning space is vast and full of poorly performing implementations, or (ii) the autotuning process has to be repeated frequently because of changes in the processed data or migration to different hardware. In this paper, we introduce a novel method for searching generic tuning spaces. The tuning spaces can contain tuning parameters changing any user-defined property of the source code. The method takes advantage of collecting hardware performance counters (also known as profiling counters) during empirical tuning. Those counters are used to steer the search toward faster implementations. The method requires the tuning space to be sampled on any GPU. It then builds a problem-specific model, which can be used during autotuning on various, even previously unseen, inputs or GPUs. Using a set of five benchmarks, we experimentally demonstrate that our method can speed up autotuning when an application needs to be ported to different hardware or needs to process data with different characteristics. We also compare our method to the state of the art and show that it is superior in terms of the number of search steps and typically outperforms other searches in terms of convergence time.
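The abstract describes the approach at a high level: sample the tuning space on some GPU while recording hardware performance counters, learn a problem-specific model from those counters, and use it to bias subsequent searches on other GPUs or inputs. The minimal Python sketch below illustrates only the core idea, not the authors' implementation: counter readings of the best configuration found so far are used to decide which tuning parameters to perturb next. The tuning space, the profile() stand-in, and the counter-to-parameter table RELEVANT_PARAMS are all hypothetical; the paper's method learns such relations from sampled data rather than hard-coding them.

import random

# Hypothetical tuning space: each configuration assigns values to tuning
# parameters that change user-defined properties of the source code.
TUNING_SPACE = [
    {"block_size": b, "vector_width": v, "tile": t}
    for b in (64, 128, 256, 512)
    for v in (1, 2, 4)
    for t in (1, 2, 4, 8)
]

def profile(config):
    """Stand-in for one empirical tuning step: returns (runtime, counters).

    In the real method the kernel is compiled, executed, and its hardware
    performance counters are read via a profiling API; here we fabricate
    plausible numbers so the sketch is runnable on its own.
    """
    runtime = (1.0 / (config["block_size"] * config["vector_width"])
               + 0.001 * random.random())
    counters = {
        "dram_utilization": min(1.0, 0.25 * config["vector_width"]),
        "low_occupancy": 1.0 - min(1.0, config["block_size"] / 512),
    }
    return runtime, counters

# Which tuning parameters influence which bottleneck. This table is a
# hard-coded placeholder for the problem-specific model that the paper
# builds from counters sampled during tuning.
RELEVANT_PARAMS = {
    "dram_utilization": ("vector_width", "tile"),
    "low_occupancy": ("block_size",),
}

def counter_guided_search(space, budget):
    """Greedy sketch of counter-guided search: after each empirical step,
    identify the dominant bottleneck of the best configuration so far and
    prefer candidates that differ from it in the parameters relevant to
    that bottleneck."""
    best_cfg = random.choice(space)
    best_time, best_counters = profile(best_cfg)
    for _ in range(budget - 1):
        bottleneck = max(best_counters, key=best_counters.get)
        params = RELEVANT_PARAMS[bottleneck]
        candidates = [c for c in space
                      if any(c[p] != best_cfg[p] for p in params)]
        cfg = random.choice(candidates or space)
        time_, counters = profile(cfg)
        if time_ < best_time:
            best_cfg, best_time, best_counters = cfg, time_, counters
    return best_cfg, best_time

if __name__ == "__main__":
    cfg, t = counter_guided_search(TUNING_SPACE, budget=20)
    print(f"best configuration: {cfg}  runtime: {t:.6f}")

In a real GPU autotuner, the counter readings would come from a profiling interface (e.g., NVIDIA's CUPTI), and the learned model would replace the hard-coded RELEVANT_PARAMS table, which is what allows the approach to transfer to previously unseen inputs and GPUs.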
Classification
Type
J<sub>imp</sub> - Article in a periodical included in the Web of Science database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result linkages
Project
<a href="/cs/project/EF16_013%2F0001802" target="_blank" >EF16_013/0001802: CERIT Scientific Cloud</a><br>
Linkages
P - Research and development project financed from public funds (with a link to CEP)
Other
Publication year
2022
Data confidentiality code
S - Complete and accurate project data are not subject to protection under special legal regulations
Data specific to the result type
Name of the periodical
Journal of Parallel and Distributed Computing
ISSN
0743-7315
e-ISSN
—
Volume of the periodical
160
Issue of the periodical within the volume
February
Country of the publishing house
NL - Netherlands
Number of pages of the result
20
Pages from-to
16-35
UT WoS code of the article
000711621300002
EID of the result in the Scopus database
2-s2.0-85117699383