PUBLICATION
Lookahead-Bounded Q-Learning
International Conference on Machine Learning (ICML)
July 12, 2020
By: Ibrahim El Shar, Daniel Jiang
Abstract
We introduce the lookahead-bounded Q-learning (LBQL) algorithm, a new, provably convergent variant of Q-learning that seeks to improve the performance of standard Q-learning in stochastic environments through the use of “lookahead” upper and lower bounds. To do this, LBQL employs previously collected experience and each iteration’s state-action values as dual feasible penalties to construct a sequence of sampled information relaxation problems. The solutions to these problems provide estimated upper and lower bounds on the optimal value, which we track via stochastic approximation. These quantities are then used to constrain the iterates to stay within the bounds at every iteration. Numerical experiments confirm the fast convergence of LBQL as compared to the standard Q-learning algorithm and several related techniques. Our approach is particularly appealing in problems that require expensive simulations or real-world interactions.
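The bounding step described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the arrays `L` and `U` are assumed to be supplied from elsewhere (the paper obtains them from sampled information relaxation problems, which are not reproduced here), and the function names `bounded_q_update` and `track_bound`, the step sizes `alpha` and `beta`, and the discount `gamma` are hypothetical choices made for illustration only.

```python
import numpy as np


def track_bound(estimate, sample, beta=0.1):
    """Track a noisy bound sample with a stochastic-approximation
    (running-average) step. `beta` is an assumed step size."""
    return (1.0 - beta) * estimate + beta * sample


def bounded_q_update(Q, L, U, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One standard tabular Q-learning step, followed by a projection of
    the updated entry onto the interval [L[s, a], U[s, a]].

    Q, L, U are arrays of shape (num_states, num_actions). L and U are
    assumed to be given lower/upper bound estimates on the optimal
    Q-values; how they are computed is outside this sketch.
    """
    # Standard Q-learning target and update.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] = (1.0 - alpha) * Q[s, a] + alpha * target
    # Constrain the iterate to stay within the current bounds.
    Q[s, a] = np.clip(Q[s, a], L[s, a], U[s, a])
    return Q


# Toy usage: two states, two actions, bounds fixed at [-1, 1].
Q = np.zeros((2, 2))
L = np.full((2, 2), -1.0)
U = np.full((2, 2), 1.0)
Q = bounded_q_update(Q, L, U, s=0, a=1, r=50.0, s_next=1)
print(Q[0, 1])  # the unclipped update would be 5.0; the bound caps it
```

In this sketch the projection only ever touches the entry that was just updated, so the cost per step stays the same as plain Q-learning; the expensive part in practice would be producing good values for `L` and `U`.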
Areas
MACHINE LEARNING