HyperAI超神经

Action Recognition On Diving 48

Evaluation Metric

Accuracy

Results

Performance of each model on this benchmark

Model	Accuracy (%)	Paper Title	Repository
ORViT TimeSformer	88.0	Object-Region Video Transformers	-
VIMPAC	85.5	VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning	-
SlowFast	77.6	SlowFast Networks for Video Recognition	-
LVMAE	94.9	Extending Video Masked Autoencoders to 128 frames	-
StructVit-B-4-1	88.3	Learning Correlation Structures for Vision Transformers	-
TimeSformer	75	Is Space-Time Attention All You Need for Video Understanding?	-
DUALPATH	88.7	Dual-path Adaptation from Image to Video Transformers	-
TimeSformer-HR	78	Is Space-Time Attention All You Need for Video Understanding?	-
TFCNet	88.3	TFCNet: Temporal Fully Connected Networks for Static Unbiased Temporal Reasoning	-
Video-FocalNet-B	90.8	Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition	-
AIM (CLIP ViT-L/14, 32x224)	90.6	AIM: Adapting Image Models for Efficient Video Action Recognition	-
RSANet-R50 (16 frames, ImageNet pretrained, a single clip)	84.2	Relational Self-Attention: What's Missing in Attention for Video Understanding	-
GC-TDN	87.6	Group Contextualization for Video Recognition	-
BEVT	86.7	BEVT: BERT Pretraining of Video Transformers	-
PMI Sampler	81.3	PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action Recognition	-
TQN	81.8	Temporal Query Networks for Fine-grained Video Understanding	-
TimeSformer-L	81	Is Space-Time Attention All You Need for Video Understanding?	-
PSB	86	Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition	-
