The Visual Saliency Transformer Goes Temporal: TempVST for Video Saliency Prediction

Identifikátory výsledku

Kód výsledku v IS VaVaI
<a href="https://www.isvavai.cz/riv?ss=detail&h=RIV%2F68407700%3A21230%2F24%3A00379847" target="_blank" >RIV/68407700:21230/24:00379847 - isvavai.cz</a>
Výsledek na webu
<a href="https://doi.org/10.1109/ACCESS.2024.3436585" target="_blank" >https://doi.org/10.1109/ACCESS.2024.3436585</a>
DOI - Digital Object Identifier
<a href="http://dx.doi.org/10.1109/ACCESS.2024.3436585" target="_blank" >10.1109/ACCESS.2024.3436585</a>

Alternativní jazyky

Jazyk výsledku
angličtina
Název v původním jazyce
The Visual Saliency Transformer Goes Temporal: TempVST for Video Saliency Prediction
Popis výsledku v původním jazyce
The Transformer revolutionized Natural Language Processing and Computer Vision by effectively capturing contextual relationships in sequential data through its attention mechanism. While Transformers have been explored sufficiently in traditional computer vision tasks such as image classification, their application to more intricate tasks, such as Video Saliency Prediction (VSP), remains limited. Video saliency prediction is the task of identifying the most visually salient regions in a video, which are likely to capture a viewer's attention. In this study, we propose a pure transformer architecture named Temporal Visual Saliency Transformer (TempVST) for the VSP task. Our model leverages the Visual Saliency Transformer (VST) as a backbone, with the addition of a Transformer-based temporal module that can seamlessly transition diverse architectural frameworks from image to video domain, through the incorporation of temporal recurrences. Moreover, we demonstrate that transfer learning is viable in the context of VSP through Transformer architectures and helps reduce the duration of the training phase, leading to a reduction in the duration of the training phase by 41% and 45% in two different datasets. Our experiments were conducted on two benchmark datasets, DHF1K and LEDOV, and our results show that our network can compete with all other state-of-the-art models.
Název v anglickém jazyce
The Visual Saliency Transformer Goes Temporal: TempVST for Video Saliency Prediction
Popis výsledku anglicky
The Transformer revolutionized Natural Language Processing and Computer Vision by effectively capturing contextual relationships in sequential data through its attention mechanism. While Transformers have been explored sufficiently in traditional computer vision tasks such as image classification, their application to more intricate tasks, such as Video Saliency Prediction (VSP), remains limited. Video saliency prediction is the task of identifying the most visually salient regions in a video, which are likely to capture a viewer's attention. In this study, we propose a pure transformer architecture named Temporal Visual Saliency Transformer (TempVST) for the VSP task. Our model leverages the Visual Saliency Transformer (VST) as a backbone, with the addition of a Transformer-based temporal module that can seamlessly transition diverse architectural frameworks from image to video domain, through the incorporation of temporal recurrences. Moreover, we demonstrate that transfer learning is viable in the context of VSP through Transformer architectures and helps reduce the duration of the training phase, leading to a reduction in the duration of the training phase by 41% and 45% in two different datasets. Our experiments were conducted on two benchmark datasets, DHF1K and LEDOV, and our results show that our network can compete with all other state-of-the-art models.

Klasifikace

Druh
J<sub>imp</sub> - Článek v periodiku v databázi Web of Science
CEP obor
—
OECD FORD obor
10201 - Computer sciences, information science, bioinformathics (hardware development to be 2.2, social aspect to be 5.8)

Návaznosti výsledku

Projekt
—
Návaznosti
I - Institucionalni podpora na dlouhodoby koncepcni rozvoj vyzkumne organizace

Ostatní

Rok uplatnění
2024
Kód důvěrnosti údajů
S - Úplné a pravdivé údaje o projektu nepodléhají ochraně podle zvláštních právních předpisů

Údaje specifické pro druh výsledku

Název periodika
IEEE Access
ISSN
2169-3536
e-ISSN
2169-3536
Svazek periodika
12
Číslo periodika v rámci svazku
Aug
Stát vydavatele periodika
US - Spojené státy americké
Počet stran výsledku
12
Strana od-do
129705-129716
Kód UT WoS článku
001320453600001
EID výsledku v databázi Scopus
2-s2.0-85200252622

Podobné výsledky(10)

GUMsley: Evaluating Entity Salience in Summarization for 12 English Genres Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers Saliency Methods Analysis for Paintings

Co hledáte?

Rychlé hledání

Chytré vyhledávání

The Visual Saliency Transformer Goes Temporal: TempVST for Video Saliency Prediction

Identifikátory výsledku

Alternativní jazyky

Klasifikace

Návaznosti výsledku

Ostatní

Údaje specifické pro druh výsledku

Podobné výsledky(10)

Co hledáte?

Rychlé hledání

Chytré vyhledávání

Popis výsledku

Identifikátory výsledku

Identifikátory výsledku

Alternativní jazyky

Alternativní jazyky

Klasifikace

Klasifikace

Návaznosti výsledku

Návaznosti výsledku

Ostatní

Ostatní

Údaje specifické pro druh výsledku

Údaje specifické pro druh výsledku

Podobné výsledky(10)