PUBLICATION
LiRA: Learning Visual Speech Representations from Audio through Self-supervision
Interspeech
October 12, 2021
By: Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Björn W. Schuller, Maja Pantic
Abstract
The large amount of audiovisual content being shared online today has drawn substantial attention to the prospect of audiovisual self-supervised learning. Recent works have focused on each of these modalities separately, while others have attempted to model both simultaneously in a cross-modal fashion. However, comparatively little attention has been given to leveraging one modality as a training objective to learn from the other. In this work, we propose Learning visual speech Representations from Audio via self-supervision (LiRA). Specifically, we train a ResNet+Conformer model to predict acoustic features from unlabelled visual speech. We find that this pre-trained model can be leveraged towards word-level and sentence-level lip reading through feature extraction and fine-tuning experiments. We show that our approach significantly outperforms other self-supervised methods on the Lip Reading in the Wild (LRW) dataset and achieves state-of-the-art performance on Lip Reading Sentences 2 (LRS2) using only a fraction of the total labelled data.
Index Terms: self-supervised learning, lipreading, visual speech recognition, visual representations, conformer
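The pretext task described in the abstract, predicting acoustic features from silent video, can be illustrated with a small sketch. It is not the authors' implementation: the function names `pool_to_video_rate` and `l1_loss`, the choice of an L1 regression loss, and the averaging of audio feature frames down to one target per video frame are all assumptions made here for illustration. The ResNet+Conformer model itself is omitted, and its output is represented by a plain list of predicted feature vectors.

```python
from typing import List, Sequence

Vector = Sequence[float]


def pool_to_video_rate(audio_feats: Sequence[Vector], k: int) -> List[List[float]]:
    """Average every k consecutive audio feature frames into one target vector.

    Assumption for this sketch: acoustic features are computed at k times the
    video frame rate, so k audio frames correspond to one video frame.
    """
    if k <= 0 or len(audio_feats) % k != 0:
        raise ValueError("number of audio frames must be a positive multiple of k")
    targets = []
    for start in range(0, len(audio_feats), k):
        chunk = audio_feats[start:start + k]
        # zip(*chunk) groups values by feature dimension across the k frames.
        targets.append([sum(column) / len(chunk) for column in zip(*chunk)])
    return targets


def l1_loss(predicted: Sequence[Vector], target: Sequence[Vector]) -> float:
    """Mean absolute error between predicted and target feature sequences."""
    if len(predicted) != len(target):
        raise ValueError("sequences must have the same number of frames")
    total, count = 0.0, 0
    for p_vec, t_vec in zip(predicted, target):
        if len(p_vec) != len(t_vec):
            raise ValueError("feature vectors must have the same dimension")
        for p, t in zip(p_vec, t_vec):
            total += abs(p - t)
            count += 1
    return total / count


# Toy data: 4 audio feature frames (dimension 2) aligned to 2 video frames.
audio_feats = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0], [6.0, 6.0]]
targets = pool_to_video_rate(audio_feats, k=2)

# Stand-in for the visual model's output on the 2 video frames (hypothetical values).
predicted = [[1.0, 2.0], [4.0, 5.0]]
loss = l1_loss(predicted, targets)
```

No transcriptions appear anywhere in this objective: the audio track alone supplies the regression targets, which is what allows pre-training on unlabelled video.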
Areas
COMPUTER VISION
MACHINE LEARNING
Related Publications
NeurIPS - December 5, 2021
Local Differential Privacy for Regret Minimization in Reinforcement Learning
Evrard Garcelon, Vianney Perchet, Ciara Pike-Burke, Matteo Pirotta
NeurIPS - December 5, 2021
Hierarchical Skills for Efficient Exploration
Jonas Gehring, Gabriel Synnaeve, Andreas Krause, Nicolas Usunier
NeurIPS - December 5, 2021
Interpretable agent communication from scratch (with a generic visual processor emerging on the side)
Roberto Dessì, Eugene Kharitonov, Marco Baroni
NeurIPS - December 6, 2021
Parallel Bayesian Optimization of Multiple Noisy Objectives with Expected Hypervolume Improvement
Samuel Daulton, Maximilian Balandat, Eytan Bakshy