HDF5 parallelization for hierarchical semi-sparse data cubes
Result identifiers
Result code in IS VaVaI
RIV/67985815:_____/24:00617588 (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F67985815%3A_____%2F24%3A00617588)
Result on the web
https://www.aspbooks.org/publications/535/115.pdf
DOI - Digital Object Identifier
—
Alternative languages
Result language
English
Title in original language
HDF5 parallelization for hierarchical semi-sparse data cubes
Description in original language
Big Data is not only about large volumes but also about a higher number of data dimensions. For every observed astronomical object, we usually have multiple observations taken at different times, wavelengths, or polarizations, or even by different instrument types. Intuitively, taking all of the relevant information into account will produce higher-quality results for classification or clustering algorithms than focusing on a single aspect of the object. Most often these are spectroscopic and photometric observations, which can be combined into data cubes. With the Hierarchical Semi-Sparse data cubes (HiSS cubes) engine, we combine spectral and imaging data within the HDF5 format for efficient use by machine learning algorithms and for visualization. The HiSS cube achieves this efficiency by implementing an indexing mechanism within HDF5 that also takes advantage of its native chunking feature. The preprocessing step, which rescales the spectral and photometric measurements so that they are directly comparable, takes significant time. It therefore needs to be parallelized, and this parallelization also takes advantage of the native HDF5 parallel I/O feature. This contribution focuses on the parallel performance of the Python (h5py) version of the HDF5-based solution during the construction of the HiSS cube.
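The description does not say how the preprocessing work is divided among processes. As an illustration only, here is a minimal sketch assuming a static block decomposition of cube rows over MPI ranks, with each rank writing its own rows through h5py's `mpio` driver. The functions `block_range` and `write_cube_parallel`, the dataset name `cube`, the `preprocess` callback, and the one-row-per-chunk layout are all hypothetical and are not taken from the HiSS cube implementation.

```python
from typing import Callable, List, Sequence, Tuple


def block_range(n_items: int, rank: int, size: int) -> Tuple[int, int]:
    """Return the half-open [start, stop) range of items owned by `rank`.

    Items are split into `size` contiguous blocks whose lengths differ by at
    most one; the first n_items % size ranks receive one extra item.
    """
    if size <= 0 or not 0 <= rank < size:
        raise ValueError("need size > 0 and 0 <= rank < size")
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def write_cube_parallel(path: str,
                        rows: Sequence[Sequence[float]],
                        preprocess: Callable[[Sequence[float]], List[float]]) -> None:
    """Hypothetical collective writer; every MPI rank calls it with the same arguments.

    Assumes h5py built against parallel HDF5 plus mpi4py, launched with
    something like `mpiexec -n 4 python script.py`.
    """
    # Imported lazily so block_range stays usable without an MPI installation.
    from mpi4py import MPI
    import h5py

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    n_rows, n_cols = len(rows), len(rows[0])

    with h5py.File(path, "w", driver="mpio", comm=comm) as f:
        # Dataset creation is collective: all ranks must issue the same call.
        # One chunk per row is an assumed layout, not the HiSS cube one.
        dset = f.create_dataset("cube", shape=(n_rows, n_cols), dtype="f4",
                                chunks=(1, n_cols))
        start, stop = block_range(n_rows, rank, size)
        for i in range(start, stop):
            # Each rank rescales and writes only the rows it owns.
            dset[i, :] = preprocess(rows[i])
```

A static block split keeps the scheme simple and avoids any coordination beyond the collective file open and dataset creation; if per-row preprocessing costs vary widely, a dynamic work queue would balance the load better.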
Classification
Type
D - Article in proceedings
CEP field
—
OECD FORD field
10308 - Astronomy (including astrophysics, space science)
Result links
Project
—
Links
I - Institutional support for long-term conceptual development of a research organization
Other
Year of application
2024
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Result type-specific data
Proceedings title
Astronomical Data Analysis Software and Systems XXXI
ISBN
978-1-58381-957-9
ISSN
—
e-ISSN
—
Number of pages
4
Pages (from-to)
115-118
Publisher
Astronomical Society of the Pacific
Place of publication
San Francisco
Event venue
Cape Town
Event date
24 October 2021
Event type by nationality
WRD - Worldwide event
Web of Science article code (UT WoS)
—