Speaker(s): Piotr Kozakowski
Date: May 6, 2021, 12:15
Event information: meet.google.com/yew-oubf-ngi
Seminar: "Machine Learning" Seminar

Sample efficiency is a major challenge in current Reinforcement Learning (RL) systems. Another is robustness: it is hard to find a single RL algorithm that performs well across a variety of settings. I am going to present QWR, a novel RL algorithm that performs on par with Soft Actor-Critic (SAC) in continuous control tasks, works well in the Offline RL setting (unlike SAC), and surpasses Rainbow in sample efficiency on the Atari benchmark, while being significantly simpler than both algorithms. I will also present several related RL methods that influenced the design of QWR.

Q-Value Weighted Regression: Reinforcement Learning with Limited Data
