Tradeoff between expectation and variance in Markov decision processes
- Prelegent(ci)
- Jakob Piribauer
- Afiliacja
- TU Dresden
- Termin
- 23 listopada 2022 14:15
- Pokój
- p. 5050
- Seminarium
- Seminarium „Teoria automatów”
The stochastic shortest path problem asks to resolve the non-deterministic choices in a Markov decision process (MDP) such that the expected accumulated weight before reaching a target state is maximized. This problem is well-studied and solvable in polynomial time. An optimal strategy might, however, lead to an undesirable probability distribution over path weights as all aspects of the distribution except for the expectation are disregarded. In this talk, we address the problem to achieve high values of expected accumulated weight while keeping the variance of the resulting distribution small. In other words, we take a look at the resulting Pareto frontier in terms of expectation and variance. We show that the variance-penalized expectation (expectation - lambda * variance) can be optimized by finite memory schedulers, which helps us to better understand the Pareto frontier, and establish complexity bounds for this optimization problem. Furthermore, difficulties arising in understanding the full Pareto frontier will be illustrated. The talk is based on joint work with Ocan Sankur and Christel Baier.