Mixed Precision s-step Conjugate Gradient with Residual Replacement on GPUs
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F00216208%3A11320%2F22%3A10452867" target="_blank" >RIV/00216208:11320/22:10452867 - isvavai.cz</a>
Result on the web
<a href="https://doi.org/10.1109/IPDPS53621.2022.00091" target="_blank" >https://doi.org/10.1109/IPDPS53621.2022.00091</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/IPDPS53621.2022.00091" target="_blank" >10.1109/IPDPS53621.2022.00091</a>
Alternative languages
Result language
English
Original language name
Mixed Precision s-step Conjugate Gradient with Residual Replacement on GPUs
Original language description
The s-step Conjugate Gradient (CG) algorithm has the potential to reduce the communication cost of standard CG by a factor of s. However, though mathematically equivalent, s-step CG may be numerically less stable compared to standard CG in finite precision, exhibiting slower convergence and decreased attainable accuracy. This limits the use of s-step CG in practice. To improve the numerical behavior of s-step CG and overcome this potential limitation, we incorporate two techniques. First, we improve convergence behavior through the use of higher precision at critical parts of the s-step iteration and second, we integrate a residual replacement strategy into the resulting mixed precision s-step CG to improve attainable accuracy. Our experimental results on the Summit Supercomputer demonstrate that when the higher precision is implemented in hardware, these techniques have virtually no overhead on the iteration time while improving both the convergence rate and the attainable accuracy of s-step CG. Even when the higher precision is implemented in software, these techniques may still reduce the time-to-solution (speedups of up to 1.8× in our experiments), especially when s-step CG suffers from numerical instability with a small step size and the latency cost becomes a significant part of its iteration time.
Czech name
—
Czech description
—
Classification
Type
D - Article in proceedings
CEP classification
—
OECD FORD branch
10102 - Applied mathematics
Result continuities
Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organization
Others
Publication year
2022
Confidentiality
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific for result type
Article name in the collection
Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium, IPDPS 2022
ISBN
978-1-66548-106-9
ISSN
1530-2075
e-ISSN
—
Number of pages
11
Pages from-to
886-896
Publisher name
IEEE
Place of publication
New York
Event location
Ecole Normale Supérieure de Lyon
Event date
May 30, 2022
Type of event by nationality
WRD - Worldwide event
UT code for WoS article
000854096200083