Audio-Visual Speech Recognition on LRS3-TED

Evaluation Metric

Word Error Rate (WER)
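
For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance between a reference transcript and a system hypothesis, divided by the number of reference words: WER = (S + D + I) / N, where S, D, and I count substitutions, deletions, and insertions. The sketch below is a minimal illustration of that standard definition, not the scoring script used by any model on this page. The function name `word_error_rate` and the whitespace tokenization are assumptions, and published results usually normalize text (case, punctuation) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: word-level edit distance / reference length.

    Illustrative helper (hypothetical name); tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

Multiplying the returned fraction by 100 gives a percentage. The scores listed on this page appear to be percentages, although the page itself does not state the unit.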

Evaluation Results

Performance of each model on this benchmark

| Model | Word Error Rate (WER) | Paper Title | Repository |
|---|---|---|---|
| EG-seq2seq | 6.8 | Discriminative Multi-modality Speech Recognition | - |
| DistillAV | 1.3 | Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models | - |
| TM-seq2seq | 7.2 | Deep Audio-Visual Speech Recognition | - |
| RNN-T | 4.5 | Recurrent Neural Network Transducer for Audio-Visual Speech Recognition | - |
| Hyb-Conformer | 2.3 | End-to-end Audio-visual Speech Recognition with Conformers | - |
| AV-HuBERT Large | 1.4 | Robust Self-Supervised Audio-Visual Speech Recognition | - |
| Llama-AVSR | 0.77 | Large Language Models are Strong Audio-Visual Speech Recognition Learners | - |
| CTC/Attention | 0.9 | Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | - |
| Whisper-Flamingo | 0.76 | Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | - |
| RAVEn Large | 1.4 | Jointly Learning Visual and Auditory Speech Representations from Raw Data | - |
| Zero-AVSR | 1.5 | Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations | - |
| MMS-LLaMA | 0.74 | MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens | - |