

Poster

Provable Policy Gradient for Robust Average-Reward MDPs Beyond Rectangularity

Qiuhao Wang · Yuqi Zha · Chin Pang Ho · Marek Petrik

Wed 16 Jul 11 a.m. PDT — 1:30 p.m. PDT

Abstract:

Robust Markov Decision Processes (MDPs) offer a promising framework for computing reliable policies under model uncertainty. While policy gradient methods have gained increasing popularity in robust discounted MDPs, their application to the average-reward criterion remains largely unexplored. This paper proposes Robust Projected Policy Gradient (RP2G), the first generic policy gradient method for robust average-reward MDPs (RAMDPs) that is applicable beyond the typical rectangularity assumption on transition ambiguity. In contrast to existing robust policy gradient algorithms, RP2G incorporates an adaptive decreasing tolerance mechanism for efficient policy updates at each iteration. We also present a comprehensive convergence analysis of RP2G for solving ergodic tabular RAMDPs. Furthermore, we provide the first study of the inner worst-case transition evaluation problem in RAMDPs, proposing two gradient-based algorithms tailored for rectangular and general ambiguity sets, each with provable convergence guarantees. Numerical experiments confirm the global convergence of our new algorithm and demonstrate its superior performance.
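To make the setup concrete, the following is a minimal sketch of the general idea the abstract describes: a tabular projected policy gradient loop for an average-reward MDP, with an inner loop that pushes the transition kernel toward a worst case and a decreasing tolerance on that inner solve. It is not the authors' RP2G; the box-shaped ambiguity handling, step sizes, and all function names below are illustrative assumptions (the paper itself targets general, non-rectangular ambiguity sets).

```python
# Illustrative sketch only -- NOT the authors' RP2G. Generic tabular projected
# policy gradient for an ergodic average-reward MDP with a crude worst-case
# transition step inside a box around the nominal kernel (an assumption).
import numpy as np

def stationary_dist(P_pi):
    """Stationary distribution of the ergodic chain P_pi[s, s'] induced by a policy."""
    n = P_pi.shape[0]
    A = np.vstack([P_pi.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1); b[-1] = 1.0
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    d = np.clip(d, 0.0, None)
    return d / d.sum()

def gain_bias(P, r, pi):
    """Average reward rho, bias h, and stationary distribution d of pi under P[s, a, s']."""
    P_pi = np.einsum('sax,sa->sx', P, pi)     # policy-induced chain
    r_pi = np.einsum('sa,sa->s', r, pi)       # policy-induced reward
    d = stationary_dist(P_pi)
    rho = d @ r_pi
    # Bias: (I - P_pi) h = r_pi - rho, normalized so that d @ h = 0.
    A = np.vstack([np.eye(len(d)) - P_pi, d])
    h, *_ = np.linalg.lstsq(A, np.append(r_pi - rho, 0.0), rcond=None)
    return rho, h, d

def project_simplex_rows(V):
    """Euclidean projection of each row (last axis) of V onto the probability simplex."""
    flat = V.reshape(-1, V.shape[-1])
    out = np.empty_like(flat)
    for i, v in enumerate(flat):
        u = np.sort(v)[::-1]
        css = np.cumsum(u) - 1.0
        k = np.nonzero(u - css / np.arange(1, v.size + 1) > 0)[0][-1]
        out[i] = np.maximum(v - css[k] / (k + 1), 0.0)
    return out.reshape(V.shape)

def robust_pg_sketch(P0, r, T=100, eta_pi=0.5, eta_P=0.1, eps=0.05, tol0=1e-2):
    """Maximize worst-case average reward near the nominal kernel P0 (illustrative)."""
    n, m, _ = P0.shape
    pi, P = np.full((n, m), 1.0 / m), P0.copy()
    for t in range(T):
        tol = tol0 / (t + 1)                  # adaptive, decreasing inner tolerance
        # Inner loop: adversary lowers rho via projected gradient steps on P.
        prev = np.inf
        for _ in range(50):
            rho, h, d = gain_bias(P, r, pi)
            if abs(prev - rho) < tol:
                break
            prev = rho
            grad_P = d[:, None, None] * pi[:, :, None] * h[None, None, :]
            P = P0 + np.clip(P - eta_P * grad_P - P0, -eps, eps)   # stay near nominal
            P = project_simplex_rows(P)
        # Outer step: projected policy gradient ascent against the current worst case.
        rho, h, d = gain_bias(P, r, pi)
        Q = r - rho + np.einsum('sax,x->sa', P, h)
        pi = project_simplex_rows(pi + eta_pi * d[:, None] * Q)
    return pi, P
```

As a usage example under the same assumptions, a random 5-state, 3-action instance such as P0 = np.random.dirichlet(np.ones(5), size=(5, 3)) with r = np.random.rand(5, 3) can be passed to robust_pg_sketch to obtain a policy hedged against kernels near P0.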
