In this paper, we seek to learn a vision-based policy guaranteed to satisfy state constraints during and after training. To obtain hard safety guarantees in closed loop with a black-box environment, we build upon the POLICEd RL approach.
We extend POLICEd RL to maintain its safety guarantees with image inputs instead of state inputs by enlarging the affine region to account for the error in the state estimated from images. However, doing so can lead to a large affine region, which can limit the generalisability of the network.
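To illustrate the enlargement step, here is a minimal sketch under our own simplifying assumptions: the affine buffer region is an axis-aligned box, the perception error is bounded per coordinate by eps, and the helper name `inflate_box` is hypothetical rather than taken from the paper. The idea is that if the true state lies in the original buffer and the estimate is within eps of it, the estimate is guaranteed to lie in the inflated box, so the affine behaviour covers every state the vision-based estimator may report.

```python
import numpy as np

# Hypothetical helper (not from the paper): enlarge an axis-aligned box region
# by the perception error bound eps. If the true state lies in [lower, upper]
# and the state estimate is within eps of it per coordinate, the estimate is
# guaranteed to lie in the inflated box.
def inflate_box(lower, upper, eps):
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return lower - eps, upper + eps

# Example: buffer region [0, 1] x [0, 1] with estimation error up to 0.1
lo, hi = inflate_box([0.0, 0.0], [1.0, 1.0], eps=0.1)
print(lo, hi)  # [-0.1 -0.1] [1.1 1.1]
```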
To address this, we use switched actors, which allow us to define multiple affine regions and thus break the single large affine region into several smaller ones. At the same time, using projected gradient descent alongside the switched actors allows us to guarantee hard constraint satisfaction even during the training process.
Schematic illustration of training a switched actor with projected gradient descent, ensuring constraint satisfaction throughout training.
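A minimal sketch of this mechanism under our own simplifying assumptions: the actor is reduced to one purely affine head per region (rather than a deep network that is affine only on the buffer regions), the constraint is a single linear inequality c^T u <= d on the action, and the names `SwitchedAffineActor` and `project_head` are hypothetical. Because each head is affine, enforcing the inequality at the vertices of a region's polytope enforces it over the whole region; projecting the parameters back into the feasible set after every gradient step keeps the constraint satisfied throughout training.

```python
import torch

class SwitchedAffineActor(torch.nn.Module):
    """One affine map u = A_i s + b_i per region; a switch picks the active head."""
    def __init__(self, state_dim, action_dim, num_regions):
        super().__init__()
        self.heads = torch.nn.ModuleList(
            [torch.nn.Linear(state_dim, action_dim) for _ in range(num_regions)]
        )

    def forward(self, state, region_idx):
        return self.heads[region_idx](state)


def project_head(head, vertices, c, d, iters=50):
    """Cyclically project (A, b) onto {c^T (A v + b) <= d for every vertex v}.

    Each vertex defines a halfspace that is linear in the parameters (A, b),
    so projection onto a single halfspace is closed form; cycling over the
    vertices drives the parameters into the feasible set.
    """
    with torch.no_grad():
        for _ in range(iters):
            for v in vertices:
                viol = torch.dot(c, head(v)) - d
                if viol > 0:
                    scale = viol / (torch.dot(c, c) * (1.0 + torch.dot(v, v)))
                    head.weight.sub_(scale * torch.outer(c, v))
                    head.bias.sub_(scale * c)


# Usage sketch: vertices of a (hypothetical) triangular buffer region and the
# action constraint c^T u <= 0.5; project after every optimiser step.
actor = SwitchedAffineActor(state_dim=2, action_dim=1, num_regions=2)
vertices = [torch.tensor([0.0, 0.0]), torch.tensor([1.0, 0.0]), torch.tensor([0.0, 1.0])]
c, d = torch.tensor([1.0]), 0.5
optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)
# ... after loss.backward() and optimizer.step():
project_head(actor.heads[0], vertices, c, d)
```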
We also extend the framework to non-affine constraints by augmenting the state space with the non-affine constraint value, which allows us to transform the non-affine constraint into an affine constraint.
Schematic illustration of switched actors with a non-affine circular constraint.
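To make the augmentation concrete, here is a minimal sketch with assumed details (the circular keep-out region, the sign convention, and the function name `augment_state` are ours, not the paper's): the constraint value h(x) = r^2 - ||x - center||^2, which is non-affine in x, is appended to the state, so safety becomes the affine condition z <= 0 on the augmented state.

```python
import numpy as np

# Hypothetical example: a circular keep-out constraint is non-affine in x,
# but becomes the affine constraint z <= 0 after appending z = h(x) to the state.
def augment_state(x, center, radius):
    x = np.asarray(x, dtype=float)
    z = radius**2 - np.sum((x - np.asarray(center))**2)  # non-affine in x
    return np.concatenate([x, [z]])                       # constraint is z <= 0

s_aug = augment_state([2.0, 0.5], center=[0.0, 0.0], radius=1.0)
print(s_aug)  # [ 2.    0.5  -3.25]; safe, since z = -3.25 <= 0
```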
@inproceedings{khari2025enforcing,
title = {Provably Enforcing Hard Constraints During Training of Vision-Based Policies in Reinforcement Learning},
author = {Khari, Shashwat and Bouvier, Jean-Baptiste and Mehr, Negar},
booktitle = {},
year = {2025}
}