SUPERB: Speech processing Universal PERformance Benchmark
August 30, 2021
By: Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Daniel Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Godic Lee, Darong Liu, Zili Huang, Annie Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee
Using self-supervised learning methods to pre-train a network on large volumes of unlabeled data, followed by fine-tuning for multiple downstream tasks, has proven vital for advancing research in natural language representation learning. However, the speech processing community lacks a similar setup that systematically measures the quality of learned representations across a wide range of downstream speech applications. To bridge this gap, we introduce the Speech processing Universal PERformance Benchmark (SUPERB). SUPERB is a leaderboard that benchmarks the performance of learned speech representations on ten speech processing tasks. We present a complete framework for learning and evaluating specialized prediction heads for each task given the pre-trained speech representations. Our results on many publicly available self-supervised models demonstrate their ability to generalize to multiple speech tasks with limited supervised data and minimal architecture changes. All the materials are open-sourced and reproducible in the s3prl toolkit to facilitate future research in speech representation learning. Code is available on GitHub.
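The evaluation recipe described above keeps the pre-trained model frozen and trains only a lightweight, task-specific prediction head on top of its hidden representations. A minimal PyTorch sketch of that pattern is below; the `FrozenUpstream` stub and all dimensions are illustrative stand-ins, not the actual s3prl API, and the learnable weighted sum over hidden layers is one common way such heads aggregate features from a multi-layer encoder.

```python
import torch
import torch.nn as nn

class FrozenUpstream(nn.Module):
    """Stand-in for a pre-trained SSL encoder (e.g. wav2vec 2.0).

    Returns the hidden states of every layer; its parameters are
    frozen, so only the downstream head is trained.
    """
    def __init__(self, num_layers: int = 4, dim: int = 32):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))
        for p in self.parameters():
            p.requires_grad = False  # upstream stays fixed during probing

    def forward(self, x):
        hiddens = []
        for layer in self.layers:
            x = torch.tanh(layer(x))
            hiddens.append(x)
        return hiddens  # list of (batch, time, dim) tensors

class WeightedSumHead(nn.Module):
    """Lightweight task head: learnable weighted sum over layers,
    mean-pooling over time, then a linear classifier."""
    def __init__(self, num_layers: int = 4, dim: int = 32, num_classes: int = 10):
        super().__init__()
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, hiddens):
        w = torch.softmax(self.layer_weights, dim=0)
        feat = sum(wi * h for wi, h in zip(w, hiddens))  # (batch, time, dim)
        return self.classifier(feat.mean(dim=1))         # pool over time

upstream = FrozenUpstream()
head = WeightedSumHead()
x = torch.randn(2, 50, 32)   # (batch, frames, feature_dim), dummy features
logits = head(upstream(x))
print(logits.shape)          # torch.Size([2, 10])
```

Because only `head` carries trainable parameters, swapping in a different upstream model leaves the downstream recipe unchanged, which is what makes a uniform cross-model comparison possible.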