HyperAI超神经

Visual Question Answering (VQA) on CORE-MM

Evaluation Metrics

Abductive

Analogical

Deductive

Overall score

Params

Evaluation Results

Performance of each model on this benchmark

Model Name	Abductive	Analogical	Deductive	Overall score	Params	Paper Title	Repository
MiniGPT-v2	13.28	5.69	11.02	10.43	8B	MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models	-
BLIP-2-OPT2.7B	18.96	7.5	2.76	19.31	3B	BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models	-
GPT-4V	77.88	69.86	74.86	74.44	-	GPT-4 Technical Report	-
SPHINX v2	49.85	20.69	42.17	39.48	16B	SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models	-
InstructBLIP	37.76	20.56	27.56	28.02	8B	InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning	-
Emu	36.57	18.19	28.9	28.24	14B	Emu: Generative Pretraining in Multimodality	-
Otter	33.64	13.33	22.49	22.69	7B	Otter: A Multi-Modal Model with In-Context Instruction Tuning	-
CogVLM-Chat	47.88	28.75	36.75	37.16	17B	CogVLM: Visual Expert for Pretrained Language Models	-
mPLUG-Owl2	20.6	7.64	23.43	20.05	7B	mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration	-
OpenFlamingo-v2	5.3	1.11	8.88	6.82	9B	OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models	-
LLaVA-1.5	47.91	24.31	30.94	32.62	13B	Improved Baselines with Visual Instruction Tuning	-
Qwen-VL-Chat	44.39	30.42	37.55	37.39	16B	Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond	-
LLaMA-Adapter V2	46.12	22.08	28.7	30.46	7B	LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model	-
InternLM-XComposer-VL	35.97	18.61	26.77	26.84	9B	InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition	-
