slimIPL: Language-Model-Free Iterative Pseudo-Labeling
August 31, 2021
By: Tatiana Likhomanenko, Qiantong Xu, Jacob Kahn, Gabriel Synnaeve, Ronan Collobert
Recent results in end-to-end automatic speech recognition (ASR) have demonstrated the efficacy of pseudo-labeling for semi-supervised models trained with both Connectionist Temporal Classification (CTC) and Sequence-to-Sequence (seq2seq) losses. Iterative Pseudo-Labeling (IPL) continuously trains a single model using pseudo-labels that are re-generated as the model learns, and it has been shown to further improve ASR performance. We improve upon the IPL algorithm: as the model learns, we propose to iteratively re-generate transcriptions with hard labels (the most probable tokens), that is, without a language model. We call this approach Language-Model-Free IPL (slimIPL) and present a resulting training setup for low-resource settings with CTC-based models. slimIPL features a dynamic cache for pseudo-labels, which reduces sensitivity to changes in relabeling hyperparameters and improves training stability. slimIPL is also highly efficient, requiring 3.5-4x fewer computational resources to converge than other state-of-the-art semi-supervised and self-supervised approaches. With only 10 hours of labeled audio, slimIPL is competitive with self-supervised approaches. With 100 hours of labeled audio it is state-of-the-art, without using a language model either at test time or during pseudo-label generation.
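The abstract combines two mechanisms: hard-label pseudo-labels produced without a language model, and a dynamic cache of pseudo-labels. The small sketch below illustrates both. It is not the authors' implementation. Several details are assumptions chosen for illustration:

- the blank index 0;
- the hypothetical class name `PseudoLabelCache`;
- uniform random sampling from the unlabeled pool;
- the random-eviction policy controlled by a hypothetical `replace_prob` parameter.

A real system would also run a CTC acoustic model on batches of audio rather than decode toy per-frame scores.

```python
import random
from typing import Callable, Hashable, List, Sequence, Tuple

BLANK = 0  # assumption: index 0 is the CTC blank token


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]], blank: int = BLANK) -> List[int]:
    """Hard-label transcription with no language model.

    Picks the highest-scoring token in every frame (probabilities or
    log-probabilities both work), then applies the CTC collapse rule:
    merge consecutive repeats and drop blanks.
    """
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    tokens: List[int] = []
    prev = None
    for tok in best_path:
        if tok != prev and tok != blank:
            tokens.append(tok)
        prev = tok
    return tokens


class PseudoLabelCache:
    """Hypothetical dynamic cache of (utterance id, pseudo-label) pairs.

    Each time an entry is sampled, its slot is refreshed with probability
    `replace_prob`: a new unlabeled utterance is drawn and labeled by the
    *current* model through `label_fn`. The cache therefore drifts toward
    up-to-date pseudo-labels without relabeling the whole unlabeled set.
    """

    def __init__(self, label_fn: Callable[[Hashable], List[int]],
                 unlabeled_ids: Sequence[Hashable], size: int,
                 replace_prob: float, seed: int = 0) -> None:
        self.label_fn = label_fn          # assumed to wrap the current model + greedy_ctc_decode
        self.pool = list(unlabeled_ids)   # unlabeled utterance ids to draw from
        self.replace_prob = replace_prob  # hypothetical refresh probability per sampled entry
        self.rng = random.Random(seed)
        self.entries: List[Tuple[Hashable, List[int]]] = [self._fresh() for _ in range(size)]

    def _fresh(self) -> Tuple[Hashable, List[int]]:
        # Draw an utterance (with replacement) and label it with the current model.
        uid = self.rng.choice(self.pool)
        return uid, self.label_fn(uid)

    def sample(self) -> Tuple[Hashable, List[int]]:
        """Return one cached pair, then possibly refresh its slot."""
        idx = self.rng.randrange(len(self.entries))
        entry = self.entries[idx]
        if self.rng.random() < self.replace_prob:
            self.entries[idx] = self._fresh()
        return entry


if __name__ == "__main__":
    frames = [
        [0.9, 0.05, 0.05],  # blank
        [0.1, 0.8, 0.1],    # token 1
        [0.1, 0.8, 0.1],    # token 1 again (repeat, collapsed)
        [0.7, 0.2, 0.1],    # blank
        [0.1, 0.1, 0.8],    # token 2
        [0.6, 0.3, 0.1],    # blank
        [0.2, 0.7, 0.1],    # token 1 (new emission after a blank)
    ]
    print(greedy_ctc_decode(frames))  # [1, 2, 1]
```

In a training loop built on this sketch, labeled batches would alternate with batches drawn through `cache.sample()`. Only a fraction of the cache is regenerated at each step, so `replace_prob` trades pseudo-label freshness against labeling cost.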