The main objective is to develop safe planning algorithms with various degrees of confidence for variants of Markov decision processes (MDPs). More precisely, we will develop algorithms for multi-environment MDPs, partially observable MDPs, and their variants, and apply them to use cases provided by MERCE.
We will focus on developing practical solutions for these formalisms. Possible approaches include dynamic programming over a finite horizon, mathematical solvers, and reinforcement learning algorithms adapted to these settings. The candidate can also study theoretical properties of the developed algorithms, such as their complexity and optimality, and performance measures such as regret. The algorithms are expected to be validated experimentally on appropriate case studies.
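As an illustration of the finite-horizon dynamic programming option, the following minimal sketch computes an optimal policy for one environment by backward induction and then evaluates that fixed policy in a second environment to obtain its worst-case value. The two-state model, the reward values, and the function names `backward_induction` and `evaluate` are hypothetical and chosen only for this example. This is not a solution method for multi-environment MDPs, where good policies generally need to depend on the observed history.

```python
import numpy as np

def backward_induction(P, R, horizon):
    """Finite-horizon value iteration for a single MDP.

    P has shape (actions, states, states); R has shape (states, actions).
    Returns the value vector at time 0 and a time-dependent policy.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    policy = np.zeros((horizon, n_states), dtype=int)
    for t in reversed(range(horizon)):
        # Q[s, a] = R[s, a] + sum over s' of P[a, s, s'] * V[s']
        Q = R + np.einsum("ast,t->sa", P, V)
        policy[t] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return V, policy

def evaluate(P, R, policy):
    """Value of a fixed time-dependent policy in the environment (P, R)."""
    horizon, n_states = policy.shape
    states = np.arange(n_states)
    V = np.zeros(n_states)
    for t in reversed(range(horizon)):
        a = policy[t]
        # P[a, states] stacks the transition row chosen in each state
        V = R[states, a] + P[a, states] @ V
    return V

# Hypothetical model: state 1 is absorbing and pays 1 per step.
# In state 0, action 0 ("safe") stays put and pays 0.4;
# action 1 ("risky") pays 0 and tries to reach state 1.
R = np.array([[0.4, 0.0],
              [1.0, 1.0]])
P_a = np.array([[[1.0, 0.0], [0.0, 1.0]],    # safe action
                [[0.0, 1.0], [0.0, 1.0]]])   # risky action always succeeds
P_b = np.array([[[1.0, 0.0], [0.0, 1.0]],    # safe action
                [[0.5, 0.5], [0.0, 1.0]]])   # risky action succeeds half the time

V_a, policy = backward_induction(P_a, R, horizon=3)
V_b = evaluate(P_b, R, policy)
worst_case = np.minimum(V_a, V_b)
print(policy)
print(worst_case)
```

The policy is optimal only for the first environment. The gap between `V_a` and `V_b` in state 0 shows how much value is lost when the true environment differs from the one used for planning.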
The overall objective is to advance the state of the art in planning with strong safety guarantees.
References:
- Sun et al. Online MDP with Prototypes Information: A Robust Adaptive Approach. AAAI 2025.
- Royer et al. Multiple-Environment Markov Decision Processes: Efficient Analysis and Applications. ICAPS 2020.
- Chatterjee et al. The Value Problem for Multiple-Environment MDPs with Parity Objective. ICALP 2025.