HyperAI
Visual Question Answering on VQA v2 val
Evaluation Metric
Accuracy
Evaluation Results
Performance of each model on this benchmark
| Model | Accuracy | Paper Title | Repository |
| --- | --- | --- | --- |
| MetaLM | 41.1 | Language Models are General-Purpose Interfaces | - |
| BLIP-2 ViT-G FlanT5 XXL (zero-shot) | 65.2 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | - |
| BLIP-2 ViT-G OPT 6.7B (zero-shot) | 54.3 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | - |
| Frozen | 29.5 | Multimodal Few-Shot Learning with Frozen Language Models | - |
| BLIP-2 ViT-G FlanT5 XL (zero-shot) | 63.1 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | - |
| BLIP-2 ViT-L OPT 2.7B (zero-shot) | 50.1 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | - |
| VLKD (ViT-B/16) | 38.6 | Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation | - |
| BLIP-2 ViT-L FlanT5 XL (zero-shot) | 62.6 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | - |
| PNP-VQA | 63.3 | Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training | - |
| Few VLM (zero-shot) | 47.7 | A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models | - |
| BLIP-2 ViT-G OPT 2.7B (zero-shot) | 53.5 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | - |
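The entries on this page are not listed in score order. As a minimal sketch, the following Python snippet ranks them by accuracy. The model names and scores are copied from the results on this page. The `RESULTS` list and the `rank` helper are illustrative names introduced here and are not part of the HyperAI site or any of the cited papers.

```python
# (model, accuracy) pairs copied from the results on this page.
# RESULTS and rank() are hypothetical names used only for this illustration.
RESULTS = [
    ("MetaLM", 41.1),
    ("BLIP-2 ViT-G FlanT5 XXL (zero-shot)", 65.2),
    ("BLIP-2 ViT-G OPT 6.7B (zero-shot)", 54.3),
    ("Frozen", 29.5),
    ("BLIP-2 ViT-G FlanT5 XL (zero-shot)", 63.1),
    ("BLIP-2 ViT-L OPT 2.7B (zero-shot)", 50.1),
    ("VLKD (ViT-B/16)", 38.6),
    ("BLIP-2 ViT-L FlanT5 XL (zero-shot)", 62.6),
    ("PNP-VQA", 63.3),
    ("Few VLM (zero-shot)", 47.7),
    ("BLIP-2 ViT-G OPT 2.7B (zero-shot)", 53.5),
]


def rank(results):
    """Return the entries sorted by accuracy, highest first."""
    return sorted(results, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Print a numbered ranking, one model per line.
    for position, (model, accuracy) in enumerate(rank(RESULTS), start=1):
        print(f"{position:2d}. {model:<40} {accuracy:5.1f}")
```

Sorting this way places BLIP-2 ViT-G FlanT5 XXL (zero-shot) first at 65.2 and Frozen last at 29.5.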