A GPU-Architecture Optimized Hierarchical Decomposition Algorithm for Support Vector Machine Training
The result's identifiers
Result code in IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F49777513%3A23520%2F17%3A43932982" target="_blank" >RIV/49777513:23520/17:43932982 - isvavai.cz</a>
Result on the web
<a href="http://ieeexplore.ieee.org/document/7990554/?reload=true" target="_blank" >http://ieeexplore.ieee.org/document/7990554/?reload=true</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/TPDS.2017.2731764" target="_blank" >10.1109/TPDS.2017.2731764</a>
Alternative languages
Result language
angličtina
Original language name
A GPU-Architecture Optimized Hierarchical Decomposition Algorithm for Support Vector Machine Training
Original language description
In the last decade, several GPU implementations of Support Vector Machine (SVM) training with nonlinear kernels were published. Some of them even with source codes. The most effective ones are based on Sequential Minimal Optimization (SMO). They decompose the restricted quadratic problem into a series of smallest possible subproblems, which are then solved analytically. For large datasets, the majority of elapsed time is spent by a large amount of matrix-vector multiplications that cannot be computed efficiently on current GPUs because of limited memory bandwidth. In this paper, we introduce a novel GPU approach to the SVM training that we call Optimized Hierarchical Decomposition SVM (OHD-SVM). It uses a hierarchical decomposition iterative algorithm that fits better to actual GPU architecture. The low decomposition level uses a single GPU multiprocessor to efficiently solve a local subproblem. Nowadays a single GPU multiprocessor can run thousand or more threads that are able to synchronize quickly. It is an ideal platform for a single kernel SMO-based local solver with fast local iterations. The high decomposition level updates gradients of entire training set and selects a new local working set. The gradient update requires many kernel values that are costly to compute. However, solving a large local subproblem offers an efficient kernel values computation via a matrix-matrix multiplication that is much more efficient than the matrix-vector multiplication used in already published implementations. Along with a description of our implementation, the paper includes an exact comparison of five publicly available C++ SVM training GPU implementations. In this paper, the binary classification task and RBF kernel function are taken into account as it is usual in most of the recent papers. According to the measured results on a wide set of publicly available datasets, our proposed approach excelled significantly over the other methods in all datasets. The biggest difference was on the largest dataset where we achieved speed-up up to 12 times in comparison with the fastest already published GPU implementation. Moreover, our OHD-SVM is the only one that can handle dense as well as sparse datasets. Along with this paper, we published the source-codes at https://github.com/OrcusCZ/OHD-SVM.
Czech name
—
Czech description
—
Classification
Type
J<sub>imp</sub> - Article in a specialist periodical, which is included in the Web of Science database
CEP classification
—
OECD FORD branch
20205 - Automation and control systems
Result continuities
Project
<a href="/en/project/GBP103%2F12%2FG084" target="_blank" >GBP103/12/G084: Center for Large Scale Multi-modal Data Interpretation</a><br>
Continuities
P - Projekt vyzkumu a vyvoje financovany z verejnych zdroju (s odkazem do CEP)
Others
Publication year
2017
Confidentiality
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů
Data specific for result type
Name of the periodical
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
ISSN
1045-9219
e-ISSN
—
Volume of the periodical
28
Issue of the periodical within the volume
12
Country of publishing house
US - UNITED STATES
Number of pages
14
Pages from-to
3330-3343
UT code for WoS article
000415179500002
EID of the result in the Scopus database
2-s2.0-85028954180