HyperAI超神经

Zero-Shot Video Question Answering on Video-MME 1

Evaluation Metric

Accuracy (%)
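
As a rough illustration of what this metric measures, the sketch below computes accuracy as the percentage of questions answered correctly. It assumes a multiple-choice format in which each prediction and each ground-truth answer is a single option letter. The function name `accuracy_percent` is hypothetical and is not part of any official Video-MME evaluation code.

```python
def accuracy_percent(predictions: list[str], answers: list[str]) -> float:
    """Hypothetical helper: share of questions whose predicted option letter
    matches the ground-truth letter, expressed as a percentage.

    Assumes one option letter (e.g. "A".."D") per question.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    # Compare letters case-insensitively, ignoring surrounding whitespace.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


# 3 of 4 answers match, so accuracy is 75.0%.
print(accuracy_percent(["A", "C", "B", "D"], ["A", "B", "B", "D"]))  # 75.0
```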

Evaluation Results

Performance of each model on this benchmark

Model	Accuracy (%)	Paper Title	Repository
GPT-4o mini	68.9	GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding	-
VideoLLaMA2 (72B)	63.1	VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs	-
BIMBA-LLaVA-Qwen2-7B	64.67	BIMBA: Selective-Scan Compression for Long-Range Video Question Answering	-
Video-RAG (Based on LLaVA-Video)	77.4	Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension	-
VILA-1.5 (34B)	64.1	VILA: On Pre-training for Visual Language Models	-
Gemini 1.5 Pro	81.3	Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context	-
LongVU (7B)	60.6	LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding	-
MiniCPM-V 2.6 (8B)	63.7	MiniCPM-V: A GPT-4V Level MLLM on Your Phone	-
Gemini 1.5 Flash	75.0	Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context	-
GPT-4o	77.2	GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding	-
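
The table is not ordered by score. The minimal sketch below ranks the listed entries by accuracy, highest first, using only the values from the table. The names `RESULTS` and `ranked` are illustrative and do not come from the source page.

```python
# Accuracy (%) values copied from the results table.
RESULTS: dict[str, float] = {
    "GPT-4o mini": 68.9,
    "VideoLLaMA2 (72B)": 63.1,
    "BIMBA-LLaVA-Qwen2-7B": 64.67,
    "Video-RAG (Based on LLaVA-Video)": 77.4,
    "VILA-1.5 (34B)": 64.1,
    "Gemini 1.5 Pro": 81.3,
    "LongVU (7B)": 60.6,
    "MiniCPM-V 2.6 (8B)": 63.7,
    "Gemini 1.5 Flash": 75.0,
    "GPT-4o": 77.2,
}


def ranked(results: dict[str, float]) -> list[tuple[str, float]]:
    """Return (model, accuracy) pairs sorted from highest to lowest accuracy."""
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)


for rank, (name, acc) in enumerate(ranked(RESULTS), start=1):
    print(f"{rank:2d}. {name:<34} {acc:6.2f}")
```

Sorting on the numeric score rather than on the formatted string ensures that an entry such as 64.67 is placed correctly between 64.1 and 68.9.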