PUBLICATION
A Two-stage Approach to Speech Bandwidth Extension
Interspeech
August 30, 2021
By: Ju Lin, Yun Wang, Kaustubh Kalgaonkar, Gil Keren, Didi Zhang, Christian Fuegen
Abstract
Algorithms for speech bandwidth extension (BWE) may work in either the time domain or the frequency domain. Time-domain methods often do not sufficiently recover the high-frequency content of speech signals; frequency-domain methods are better at recovering the spectral envelope, but have difficulty reconstructing the details of the waveform. In this paper, we propose a two-stage approach for BWE, which enjoys the advantages of both time- and frequency-domain methods. The first stage is a frequency-domain neural network, which predicts the high-frequency part of the wide-band spectrogram from the narrow-band input spectrogram. The wide-band spectrogram is then converted into a time-domain waveform, and passed through the second stage to refine the temporal details. For the first stage, we compare a convolutional recurrent network (CRN) with a temporal convolutional network (TCN), and find that the latter is able to capture long-span dependencies equally well as the former while using a lot fewer parameters. For the second stage, we enhance the Wave-U-Net architecture with a multi-resolution short-time Fourier transform (MSTFT) loss function. A series of comprehensive experiments show that the proposed system achieves superior performance in speech enhancement (measured by both time-and frequency-domain metrics) as well as speech recognition.
Download Paper
Areas
NATURAL LANGUAGE PROCESSING & SPEECH
Share
Related Publications
AKBC - October 3, 2021
Relation Prediction as an Auxiliary Training Objective for Improving Multi-Relational Graph Representations
Yihong Chen, Pasquale Minervini, Sebastian Riedel, Pontus Stenetorp
ICCV - October 11, 2021
Contrast and Classify: Training Robust VQA Models
Yash Kant, Abhinav Moudgil, Dhruv Batra, Devi Parikh, Harsh Agrawal
Interspeech - August 30, 2021
Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency
Yangyang Shi, Varun Nagaraja, Chunyang Wu, Jay Mahadeokar, Duc Le, Rohit Prabhavalkar, Alex Xiao, Ching-Feng Yeh, Julian Chan, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer
IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) - December 13, 2021
Kaizen: Continuously Improving Teacher Using Exponential Moving Average For Semi-supervised Speech Recognition
Vimal Manohar, Tatiana Likhomanenko, Qiantong Xu, Wei-Ning Hsu, Ronan Collobert, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed
All Publications
Additional Resources
Videos
Downloads & Projects
Visiting Researchers & Postdocs
Visit Our Other Blogs
Engineering
Facebook AI
Oculus
Tech@
RSS Feed
About
Careers
Privacy
Cookies
Terms
Help
Facebook © 2021
To help personalize content, tailor and measure ads, and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Facebook through cookies. Learn more, including about available controls: Cookies Policy