This method applies transformers to video tasks such as action recognition, which require models that can handle spatiotemporal data. 27.07.2023 17:54 aior