The Visual Saliency Transformer Goes Temporal: TempVST for Video Saliency Prediction
Result identifiers
Result code in IS VaVaI
RIV/68407700:21230/24:00379847 - isvavai.cz (https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F24%3A00379847)
Result on the web
https://doi.org/10.1109/ACCESS.2024.3436585
DOI - Digital Object Identifier
10.1109/ACCESS.2024.3436585
Alternative languages
Result language
English
Title in the original language
The Visual Saliency Transformer Goes Temporal: TempVST for Video Saliency Prediction
Result description in the original language
The Transformer revolutionized Natural Language Processing and Computer Vision by effectively capturing contextual relationships in sequential data through its attention mechanism. While Transformers have been explored extensively in traditional computer vision tasks such as image classification, their application to more intricate tasks, such as Video Saliency Prediction (VSP), remains limited. VSP is the task of identifying the most visually salient regions in a video, those likely to capture a viewer's attention. In this study, we propose a pure Transformer architecture named Temporal Visual Saliency Transformer (TempVST) for the VSP task. Our model uses the Visual Saliency Transformer (VST) as a backbone and adds a Transformer-based temporal module that can seamlessly transition diverse architectural frameworks from the image domain to the video domain by incorporating temporal recurrences. Moreover, we demonstrate that transfer learning is viable for VSP with Transformer architectures and shortens the training phase by 41% and 45% on two different datasets. Our experiments were conducted on two benchmark datasets, DHF1K and LEDOV, and our results show that our network can compete with all other state-of-the-art models.
Title in English
The Visual Saliency Transformer Goes Temporal: TempVST for Video Saliency Prediction
Result description in English
The Transformer revolutionized Natural Language Processing and Computer Vision by effectively capturing contextual relationships in sequential data through its attention mechanism. While Transformers have been explored extensively in traditional computer vision tasks such as image classification, their application to more intricate tasks, such as Video Saliency Prediction (VSP), remains limited. VSP is the task of identifying the most visually salient regions in a video, those likely to capture a viewer's attention. In this study, we propose a pure Transformer architecture named Temporal Visual Saliency Transformer (TempVST) for the VSP task. Our model uses the Visual Saliency Transformer (VST) as a backbone and adds a Transformer-based temporal module that can seamlessly transition diverse architectural frameworks from the image domain to the video domain by incorporating temporal recurrences. Moreover, we demonstrate that transfer learning is viable for VSP with Transformer architectures and shortens the training phase by 41% and 45% on two different datasets. Our experiments were conducted on two benchmark datasets, DHF1K and LEDOV, and our results show that our network can compete with all other state-of-the-art models.
Classification
Type
J<sub>imp</sub> - Article in a periodical in the Web of Science database
CEP field
—
OECD FORD field
10201 - Computer sciences, information science, bioinformatics (hardware development to be 2.2, social aspect to be 5.8)
Result continuities
Project
—
Continuities
I - Institutional support for the long-term conceptual development of a research organisation
Others
Year of application
2024
Data confidentiality code
S - Complete and true data on the project are not subject to protection under special legal regulations
Data specific to the result type
Periodical name
IEEE Access
ISSN
2169-3536
e-ISSN
2169-3536
Periodical volume
12
Periodical issue within the volume
Aug
Country of the periodical's publisher
US - United States of America
Number of pages of the result
12
Pages from-to
129705-129716
UT WoS code of the article
001320453600001
EID of the result in the Scopus database
2-s2.0-85200252622