Q-Value Weighted Regression: Reinforcement Learning with Limited Data
- Speaker(s): Piotr Kozakowski
- Date: 6 May 2021, 12:15
- Event information: meet.google.com/yew-oubf-ngi
- Seminar: "Machine Learning" seminar ("Uczenie maszynowe")
Sample efficiency is a major challenge for current Reinforcement Learning (RL) systems. Another is robustness: it is hard to find a single RL algorithm that performs well across a variety of settings. I will present QWR, a novel RL algorithm that performs on par with Soft Actor-Critic (SAC) on continuous control tasks, works well in the offline RL setting (unlike SAC), and surpasses Rainbow in sample efficiency on the Atari benchmark, while being significantly simpler than both algorithms. I will also present several related RL methods that influenced the design of QWR.