UC Berkeley
ICON Lab

DDAT: Diffusion Policies Enforcing Dynamically Admissible Robot Trajectories

Unitree GO2 zero-shot hardware deployment of open-loop trajectories generated by a vanilla diffusion policy (left) and our DDAT model (right).
The vanilla diffusion policy fails to walk through the cones in open loop. By accounting for the quadruped's dynamics, our open-loop diffusion policy succeeds in following the corridor.

Overview

  • DDAT is a diffusion model that generates long-horizon, open-loop, dynamically feasible robot trajectories.
  • DDAT projects predicted trajectories to make them dynamically admissible during both training and inference.
  • No more replanning! By producing accurate trajectories, our projections eliminate the need for diffusion models to continually replan.

Challenge

  • Diffusion models are stochastic by nature.
  • Hence, they cannot generate trajectories that exactly satisfy the equations of motion of robots.
  • Robot dynamics can be written as $$ s_{t+1} = f(s_t, a_t) $$ where $s_t$ is the current state and $a_t \in \mathcal A$ a feasible control action leading to the next state $s_{t+1}$.
  • A generated trajectory $(s_0, s_1, \ldots)$ is dynamically feasible if and only if there exist $a_0, a_1, \ldots \in \mathcal A$ such that $s_1 = f(s_0, a_0)$, $s_2 = f(s_1, a_1)$, and so on.
  • This is equivalent to having each $s_{t+1}$ in the reachable set of $s_t$, i.e., $$ s_{t+1} \in \mathcal R(s_t) = \big\{ f(s_t, a) : a \in \mathcal{A} \big\}. $$
  • The dynamics $f$ are a black box, which prevents computing the reachable set $\mathcal R(s_t)$.
  • Almost all robots other than robot arms are underactuated and hence have small, low-dimensional reachable sets from which sampling is challenging, making it impossible for diffusion models to generate dynamically feasible trajectories.
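To make the feasibility condition concrete, here is a minimal sketch on a hypothetical toy system that is not one of the robots studied here. It is a 1-D double integrator with state $s = (p, v)$, a single bounded acceleration $a \in [-1, 1]$, and time step `DT = 0.1`. With two states and one action it is underactuated, so each $\mathcal R(s_t)$ is a line segment in the 2-D state space, and even a tiny perturbation of a feasible trajectory falls off it. The names `f`, `is_feasible`, `DT`, and `A_MAX` are illustrative only.

```python
# Hypothetical toy system, not one of the robots in the paper:
# 1-D double integrator, state s = (p, v), one action a in A = [-A_MAX, A_MAX].
DT = 0.1
A_MAX = 1.0


def f(s, a):
    """One Euler step of the dynamics: s_{t+1} = f(s_t, a_t)."""
    p, v = s
    return (p + DT * v, v + DT * a)


def is_feasible(traj, tol=1e-9):
    """Return True iff every s_{t+1} lies in the reachable set R(s_t).

    Here R(s_t) is a segment: the next position is fixed by s_t, and only
    the next velocity can be steered, by at most DT * A_MAX.
    """
    for (p, v), (p_next, v_next) in zip(traj, traj[1:]):
        if abs(p_next - (p + DT * v)) > tol:  # position cannot be steered
            return False
        a = (v_next - v) / DT  # the only action that could explain v_next
        if abs(a) > A_MAX + tol:  # that action is not in A
            return False
    return True


# Roll out a feasible trajectory from s_0 = (0, 0).
traj = [(0.0, 0.0)]
for a in [1.0, 0.5, -0.5, -1.0]:
    traj.append(f(traj[-1], a))

# Perturb one state slightly, as a stochastic generative model would.
noisy = list(traj)
noisy[2] = (noisy[2][0] + 1e-3, noisy[2][1])

print(is_feasible(traj))   # True
print(is_feasible(noisy))  # False
```

A position error of only `1e-3` is enough to break feasibility, because no action can move the next position away from $p_t + \Delta t\, v_t$.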

Our Idea

Training loop of DDAT
  • We propose to project the trajectories generated by our diffusion model to make them admissible during both training and inference.
  • Trajectory projections are auto-regressive since we must first reach $s_1$, then $s_2$, and so on.
  • Since $f$ is a black box, we cannot compute the reachable sets; instead, we sample a polytopic under-approximation of them, as shown in the video.
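A minimal sketch of auto-regressive projection with a sampled under-approximation follows, again on the hypothetical double integrator. It treats `f` as a black box, samples actions uniformly at random, and uses the convex hull of the resulting next states as $\mathcal C$. Because this toy $\mathcal R(s_t)$ is convex, the hull of sampled points lies inside it. All sampled next states share the same position, so the hull reduces to a segment and the projection has a closed form. This is an illustration under these assumptions, not the DDAT implementation. `sampled_polytope`, `project_onto_segment`, and `greedy_projection` are hypothetical names.

```python
import random

# Same hypothetical double integrator; f is treated as a black box.
DT = 0.1
A_MAX = 1.0


def f(s, a):
    p, v = s
    return (p + DT * v, v + DT * a)


def sampled_polytope(s, n_samples, rng):
    """Under-approximate R(s) by the convex hull of f(s, a_i), a_i ~ U(A).

    All images share the same next position here, so the hull is a segment,
    returned as its two endpoints (lowest and highest velocity).
    """
    pts = [f(s, rng.uniform(-A_MAX, A_MAX)) for _ in range(n_samples)]
    return min(pts, key=lambda c: c[1]), max(pts, key=lambda c: c[1])


def project_onto_segment(x, seg):
    """Euclidean projection of the 2-D point x onto the segment seg = (c0, c1)."""
    (x0, x1), (a0, a1), (b0, b1) = x, seg[0], seg[1]
    d0, d1 = b0 - a0, b1 - a1
    denom = d0 * d0 + d1 * d1
    t = 0.0 if denom == 0.0 else ((x0 - a0) * d0 + (x1 - a1) * d1) / denom
    t = min(1.0, max(0.0, t))  # clamp so the result stays on the segment
    return (a0 + t * d0, a1 + t * d1)


def greedy_projection(s0, predicted, n_samples=32, seed=0):
    """Project s_1 onto C(s_0), then s_2 onto C(projected s_1), and so on."""
    rng = random.Random(seed)
    out = [s0]
    for s_next in predicted:
        seg = sampled_polytope(out[-1], n_samples, rng)  # built from projected s_t
        out.append(project_onto_segment(s_next, seg))
    return out


# Noisy "diffusion output" (made-up numbers) and its projection.
predicted = [(0.02, 0.3), (0.05, 0.1), (0.07, -0.2)]
projected = greedy_projection((0.0, 0.0), predicted)
print(projected)
```

Each set $\mathcal C$ is built around the already projected previous state rather than the raw prediction. That dependency is what makes the procedure auto-regressive.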

Our Approach

We implemented four projection algorithms, divided into two categories, to make trajectories dynamically feasible.
  • Given $s_0$, the diffusion model generates $s_1, s_2, \ldots$ and the projections make each $s_{t+1}$ reachable from $s_t$.
    • Greedy state projection chronologically projects a sequence of states onto the approximated reachable sets $\mathcal{C} \subseteq \mathcal{R}(s_t)$, as shown in the video above: $$s_{t+1} \leftarrow \mathcal{P}(s_t, s_{t+1}) = \arg \min \big\{ \|s_{t+1} - c\| : c \in \mathcal{C} \big\}.$$
    • Reference state projection uses a reference trajectory to guide the projections and prevent the long-horizon divergence caused by myopic greedy projections: $$s_{t+1} \leftarrow \mathcal{P}^\text{ref}(s_t, s_{t+1}, s^\text{ref}_{t+1}) = \arg \min \big\{ \|s_{t+1} - c\| + \lambda \|s^\text{ref}_{t+1} - c\| : c \in \mathcal{C} \big\},$$ where $\lambda$ weights the reference term.
  • Given $s_0$, the diffusion model generates $s_1, s_2, \ldots$ and $a_0, a_1, \ldots$, while the projections find an action $a_t$ that reaches $s_{t+1}$ from $s_t$.
    • Action projection replaces the next state with the state obtained by applying $a_t$ to $s_t$: $$s_{t+1} \leftarrow \mathcal{P}^\text{A}(s_t, a_t) = f(s_t, a_t). $$
    • State-Action projection computes a feedback correction $\delta a_t$ so that the resulting next state is closer to the predicted $s_{t+1}$: $$ s_{t+1} \leftarrow \mathcal{P}^\text{SA}(s_t, a_t, s_{t+1}) = f\big(s_t, a_t + \delta a_t \big) \quad \text{where} \quad \delta a_t := \pi_\theta\big( s_{t+1} - f(s_t, a_t) \big). $$
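The reference, action, and state-action projections can be sketched on the same hypothetical double integrator. For brevity, this sketch projects onto the exact reachable set $\mathcal R(s_t) = \{f(s_t, u) : u \in [-1, 1]\}$ instead of a sampled $\mathcal C$. It solves the reference projection by ternary search over $u$, which is valid because the objective is convex in $u$. The learned correction $\pi_\theta$ is replaced by a hypothetical proportional gain `pi` on the velocity error. The corrected action is also clipped to $\mathcal A$, which is our assumption so the result stays admissible. All states and actions in the usage lines are made up.

```python
# Same hypothetical double integrator as above.
DT = 0.1
A_MAX = 1.0


def f(s, a):
    p, v = s
    return (p + DT * v, v + DT * a)


def dist(x, y):
    return ((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2) ** 0.5


def reference_projection(s, s_next, s_ref, lam=1.0, iters=100):
    """P^ref: argmin ||s_next - c|| + lam * ||s_ref - c|| over c = f(s, u), |u| <= A_MAX.

    c is affine in u, so the cost is convex in u and ternary search applies.
    """
    def cost(u):
        c = f(s, u)
        return dist(s_next, c) + lam * dist(s_ref, c)

    lo, hi = -A_MAX, A_MAX
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if cost(m1) < cost(m2):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    return f(s, (lo + hi) / 2)


def action_projection(s0, actions):
    """P^A: roll out the predicted actions, s_{t+1} <- f(s_t, a_t)."""
    traj = [s0]
    for a in actions:
        traj.append(f(traj[-1], a))
    return traj


def state_action_projection(s0, states, actions, pi):
    """P^SA: s_{t+1} <- f(s_t, a_t + delta_a), delta_a = pi(s_{t+1} - f(s_t, a_t))."""
    traj = [s0]
    for s_next, a in zip(states, actions):
        s = traj[-1]
        pred = f(s, a)
        err = (s_next[0] - pred[0], s_next[1] - pred[1])
        a_corr = min(A_MAX, max(-A_MAX, a + pi(err)))  # clip to A (assumption)
        traj.append(f(s, a_corr))
    return traj


def pi(err):
    """Hypothetical stand-in for pi_theta: proportional gain on velocity error."""
    return 0.5 * err[1] / DT


s0 = (0.0, 0.0)
# Greedy would return (0.0, 0.1); the reference term pulls v to about -0.05.
print(reference_projection(s0, (0.0, 0.3), (0.0, -0.05), lam=2.0))
states = [(0.0, 0.08), (0.009, 0.1)]  # made-up predicted states
actions = [0.5, 0.2]                  # made-up predicted actions
print(action_projection(s0, actions))
print(state_action_projection(s0, states, actions, pi))
```

In the first step of this toy example, action projection reaches $v = 0.05$. The state-action correction instead reaches $v = 0.065$, closer to the predicted $0.08$, while staying admissible.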

Experiments

We implemented DDAT on six underactuated robots simulated in MuJoCo:
  • Hopper: 12 states, 3 actions,
  • Walker: 18 states, 6 actions,
  • HalfCheetah: 18 states, 6 actions,
  • Quadcopter: 17 states, 4 actions,
  • Unitree GO1 and GO2: 37 states, 12 actions.
We deployed DDAT on two real robots: the Unitree GO1 and GO2.

Projections make trajectories admissible and better

Unitree GO2 open-loop trajectories tasked with walking straight.
Both models without projections drift away from a straight line,
whereas our DDAT model follows the command perfectly.

Projections should only occur at low noise levels

All diffusion models generate state-action trajectories, with projections applied either from the beginning of inference (i.e., at all noise levels), from mid-inference onward, or only once after inference.
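The three schedules amount to choosing which denoising steps are followed by a projection. The sketch below is hypothetical. `denoise_step` and `project` are placeholders for a trained reverse-diffusion update and any of the projections above. Taking "mid-inference" to mean the lower-noise half of the steps is our assumption.

```python
def projection_schedule(num_steps, mode):
    """Denoising steps k (high noise k = num_steps - 1 down to k = 0) that are
    followed by a projection.

    'all'   : project after every denoising step (all noise levels),
    'mid'   : project from mid-inference onward (lower-noise half, assumption),
    'final' : project only once, after the last denoising step.
    """
    steps = list(range(num_steps - 1, -1, -1))
    if mode == "all":
        return steps
    if mode == "mid":
        return [k for k in steps if k < num_steps // 2]
    if mode == "final":
        return [0]
    raise ValueError(f"unknown mode: {mode!r}")


def sample(x, denoise_step, project, num_steps, mode):
    """Reverse-diffusion loop with projections inserted according to `mode`."""
    proj_steps = set(projection_schedule(num_steps, mode))
    for k in range(num_steps - 1, -1, -1):
        x = denoise_step(x, k)  # placeholder for a trained denoiser update
        if k in proj_steps:
            x = project(x)      # placeholder for any projection P
    return x


# Toy usage: identity denoiser, and a projection that counts its calls.
calls = {"n": 0}


def count_project(x):
    calls["n"] += 1
    return x


sample([0.0], lambda x, k: x, count_project, num_steps=10, mode="mid")
print(calls["n"])  # 5
```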

Generating quadcopter trajectories

Objective: slalom between obstacles to reach the target.

MuJoCo Hopper open-loop trajectories.
Without projections, the Hopper falls earlier than with our projections.

BibTeX

@inproceedings{bouvier2025ddat,
        title = {DDAT: Diffusion Policies Enforcing Dynamically Admissible Robot Trajectories},
        author = {Bouvier, Jean-Baptiste and Ryu, Kanghyun and Nagpal, Kartik and Liao, Qiayuan and Sreenath, Koushil and Mehr, Negar},
        booktitle = {under review},
        year = {2025}
      }