This is a variant of BERT adapted for video understanding tasks. It is trained on large-scale video data paired with associated text, and can be used to recognize actions and events in videos. 27.07.2023 17:54 aior