DDAT: Diffusion Policies Enforcing Dynamically Admissible Robot Trajectories

Jean-Baptiste Bouvier, Kanghyun Ryu, Kartik Nagpal, Qiayuan Liao, Koushil Sreenath, Negar Mehr
Robotics: Science and Systems (RSS) 2025

Unitree GO2 zero-shot hardware deployment of open-loop trajectories generated by a vanilla diffusion policy (left) and our DDAT model (right).
The vanilla diffusion policy fails to walk between the cones in open loop. By accounting for the quadruped's dynamics, our open-loop diffusion policy succeeds in following the corridor.

Overview

DDAT is a diffusion model generating long-horizon open-loop dynamically feasible robot trajectories.
DDAT projects predicted trajectories to make them dynamically admissible during both training and inference.
No more replanning! By producing accurate trajectories, our projections eliminate the need for diffusion models to continually replan.

Challenge

Diffusion models are stochastic by nature.
Thus, they cannot generate trajectories exactly satisfying the equations of motion of robots.
Robot dynamics can be written as $$ s_{t+1} = f(s_t, a_t) $$ where $s_t$ is the current state and $a_t \in \mathcal A$ a feasible control action leading to the next state $s_{t+1}$.
A generated trajectory $(s_0, s_1, ...)$ is dynamically feasible if and only if there exist $a_0, a_1, ... \in \mathcal A$ such that $s_1 = f(s_0, a_0)$, $s_2 = f(s_1, a_1)$, ...
This is equivalent to having each $s_{t+1}$ in the reachable set of $s_t$, i.e., $$ s_{t+1} \in \mathcal R(s_t) = \big\{ f(s_t, a) : a \in \mathcal{A} \big\}. $$
The dynamics $f$ are a black box, which prevents calculating the reachable set $\mathcal R(s_t)$.
Almost all robots except robot arms are underactuated, so their reachable sets are small and low-dimensional. Sampling from such sets is challenging, which makes it impossible to generate dynamically feasible trajectories with diffusion alone.
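
To make the feasibility condition concrete, here is a minimal sketch on a toy system. The double integrator `f`, the time step `DT`, the action bounds, and the helper names `reach_gap` and `is_admissible` are all assumptions made for illustration; none of them come from the paper. The sketch approximates the distance from a predicted next state to the reachable set by querying `f` on a grid of actions, which is the only option when `f` is a black box.

```python
import math

DT = 0.1                    # integration step (assumed for this toy example)
A_MIN, A_MAX = -1.0, 1.0    # admissible action interval (assumed)

def f(s, a):
    """Toy stand-in for the black-box dynamics: a double integrator.
    State s = (position, velocity), scalar action a = acceleration."""
    p, v = s
    return (p + DT * v, v + DT * a)

def reach_gap(s, s_next, n_samples=201):
    """Approximate distance from s_next to R(s) using a grid of actions."""
    best = math.inf
    for i in range(n_samples):
        a = A_MIN + (A_MAX - A_MIN) * i / (n_samples - 1)
        best = min(best, math.dist(s_next, f(s, a)))
    return best

def is_admissible(traj, tol=1e-6):
    """A trajectory is admissible if every s_{t+1} lies in R(s_t), up to tol."""
    return all(reach_gap(s, s_next) <= tol for s, s_next in zip(traj, traj[1:]))

# A trajectory rolled out through f is admissible by construction.
feasible = [(0.0, 0.0)]
for a in (1.0, 0.5, -1.0):
    feasible.append(f(feasible[-1], a))

# Shifting every position by 1 cm breaks the dynamics: no action can change
# the position within a single step of this system.
noisy = [feasible[0]] + [(p + 0.01, v) for p, v in feasible[1:]]
```

In this toy system the reachable set of a 2D state is a 1D line segment, so an arbitrarily small perturbation of the position already leaves it. This is the low-dimensionality issue described above.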

Our Idea

Training loop of DDAT

We propose to project the trajectories generated by our diffusion model to make them admissible during both training and inference.
Trajectory projections are auto-regressive since we must first reach $s_1$, then $s_2$, and so on.
Since $f$ is a black box, we cannot compute reachable sets and instead sample a polytopic under-approximation of the reachable sets, as shown in the video.
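
One way to picture such a sampled polytope is sketched below, under stated assumptions. The planar point mass `f`, the box-shaped action set, and the helpers `reachable_vertices` and `convex_combination` are hypothetical and are not taken from the DDAT code. The sketch queries the simulator at each corner of the action box and treats the resulting states as vertices of a polytope.

```python
from itertools import product

DT = 0.1                                   # integration step (assumed)
A_LOW, A_HIGH = (-1.0, -1.0), (1.0, 1.0)   # assumed box of admissible actions

def f(s, a):
    """Stand-in for the black-box simulator: a planar point mass.
    State s = (x, y, vx, vy), action a = (ax, ay)."""
    x, y, vx, vy = s
    ax, ay = a
    return (x + DT * vx, y + DT * vy, vx + DT * ax, vy + DT * ay)

def reachable_vertices(step, s, a_low, a_high):
    """Query the black box at every corner of the action box."""
    corners = product(*zip(a_low, a_high))
    return [step(s, a) for a in corners]

def convex_combination(vertices, weights):
    """Point of the polytope spanned by `vertices` (weights >= 0, sum to 1)."""
    dim = len(vertices[0])
    return tuple(sum(w * v[i] for w, v in zip(weights, vertices))
                 for i in range(dim))

s = (0.0, 0.0, 1.0, 0.0)
vertices = reachable_vertices(f, s, A_LOW, A_HIGH)
center = convex_combination(vertices, [0.25] * len(vertices))
```

Because this toy `f` is affine in the action, the polytope spanned by the four vertices coincides with the true reachable set. For a nonlinear simulator the same construction only gives an approximation, and how well it stays inside the true set depends on the shape of that set.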

Our Approach

We implemented four projection algorithms, divided into two categories, to make trajectories dynamically feasible.

Given $s_0$, the diffusion model generates $s_1, s_2, ...$ and projections make $s_{t+1}$ reachable from $s_t$.
- Greedy state projection chronologically projects a sequence of states onto the approximated reachable sets $\mathcal{C}$, as shown in the video above: $$s_{t+1} \leftarrow \mathcal{P}(s_t, s_{t+1}) = \arg \min \big\{ \|s_{t+1} - c\| : c \in \mathcal{C} \big\}.$$
- Reference state projection uses a reference trajectory to guide the projections and prevent the long-horizon divergence caused by myopic greedy projections: $$s_{t+1} \leftarrow \mathcal{P}^\text{ref}(s_t, s_{t+1}, s^\text{ref}_{t+1}) = \arg \min \big\{ \|s_{t+1} - c\| + \lambda \|s^\text{ref}_{t+1} - c\| : c \in \mathcal{C} \big\}.$$
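
The two state projections can be sketched as follows, as a simplified illustration rather than the paper's implementation. Here the set $\mathcal{C}$ is replaced by a finite list of candidate states obtained by sampling actions, so the $\arg\min$ becomes a search over that list instead of a projection onto a polytope. The double integrator `f`, the action interval $[-1, 1]$, and all function names are assumptions.

```python
import math

DT = 0.1  # integration step (assumed)

def f(s, a):
    """Toy stand-in for the black-box dynamics: double integrator."""
    p, v = s
    return (p + DT * v, v + DT * a)

def candidates(s, n=21):
    """Finite sample of the reachable set of s for actions in [-1, 1]."""
    return [f(s, -1.0 + 2.0 * i / (n - 1)) for i in range(n)]

def project_greedy(s, s_next):
    """Candidate closest to the predicted next state."""
    return min(candidates(s), key=lambda c: math.dist(s_next, c))

def project_reference(s, s_next, s_ref, lam=1.0):
    """Candidate trading off distance to the prediction and to the reference."""
    return min(candidates(s),
               key=lambda c: math.dist(s_next, c) + lam * math.dist(s_ref, c))

def project_trajectory(traj, ref=None, lam=1.0):
    """Auto-regressive projection: each step starts from the projected state."""
    out = [traj[0]]
    for t in range(1, len(traj)):
        if ref is None:
            out.append(project_greedy(out[-1], traj[t]))
        else:
            out.append(project_reference(out[-1], traj[t], ref[t], lam))
    return out

predicted = [(0.0, 0.0), (0.05, 0.03), (0.02, 0.10)]
projected = project_trajectory(predicted)
```

Note that `project_trajectory` projects from `out[-1]`, the already projected state, and not from the raw prediction. This is what makes the procedure auto-regressive, and it is also why purely greedy projections can drift over long horizons.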
Given $s_0$, the diffusion model generates $s_1, s_2, ...$ and $a_0, a_1, ...$, while projections find $a_t$ to reach $s_{t+1}$ from $s_t$.
- Action projection replaces the next state by the state obtained after applying $a_t$: $$s_{t+1} \leftarrow \mathcal{P}^\text{A}(s_t, a_t) = f(s_t, a_t). $$
- State-Action projection calculates a feedback correction term to reach a next state closer to the prediction: $$ s_{t+1} \leftarrow \mathcal{P}^\text{SA}(s_t, a_t, s_{t+1}) = f\big(s_t, a_t + \delta a_t \big) \quad \text{where} \quad \delta a_t := \pi_\theta\big( s_{t+1} - f(s_t, a_t) \big). $$
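
The two action-based projections can be illustrated with the sketch below, under strong simplifying assumptions. In the formula above $\pi_\theta$ is a feedback term; here it is replaced by a hand-written gain, `feedback`, that is valid only for this toy double integrator. The clipping of actions to an assumed interval $[-1, 1]$ is also an addition for illustration, to keep the corrected action inside the admissible set.

```python
DT = 0.1                    # integration step (assumed)
A_MIN, A_MAX = -1.0, 1.0    # admissible action interval (assumed)

def f(s, a):
    """Toy stand-in for the black-box dynamics: double integrator."""
    p, v = s
    return (p + DT * v, v + DT * a)

def clip(a):
    """Keep the action inside the assumed admissible interval."""
    return max(A_MIN, min(A_MAX, a))

def feedback(error):
    """Hypothetical stand-in for pi_theta: cancel the velocity error,
    the only state component one action can change in a single step."""
    _, e_v = error
    return e_v / DT

def project_action(s, a):
    """Action projection: the next state is whatever a_t produces."""
    return f(s, clip(a))

def project_state_action(s, a, s_next_pred):
    """State-action projection: correct a_t to land closer to the prediction."""
    nominal = f(s, a)
    error = tuple(x - y for x, y in zip(s_next_pred, nominal))
    return f(s, clip(a + feedback(error)))

s_t, a_t = (0.0, 1.0), 0.2
s_pred = (0.1, 1.05)        # next state predicted by the diffusion model
s_a = project_action(s_t, a_t)
s_sa = project_state_action(s_t, a_t, s_pred)
```

In this example `s_a` differs from the predicted next state, whereas `s_sa` recovers it because the velocity mismatch is within what a single corrected action can compensate. Both outputs are admissible by construction since they are produced by `f`.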

Experiments

We implemented DDAT on six underactuated robots simulated in MuJoCo:

Hopper: 12 states, 3 actions,
Walker: 18 states, 6 actions,
HalfCheetah: 18 states, 6 actions,
Quadcopter: 17 states, 4 actions,
Unitree GO1 and GO2: 37 states, 12 actions.

We deployed DDAT on two real robots: the Unitree GO1 and GO2.

Projections make trajectories admissible and better

Statewise admissibility error over 500 Hopper trajectories.
The error without projections is much larger than when using
projections at inference or training with projections $\mathcal{P}^\text{ref}$.
All diffusion models generate only state trajectories.

Cumulative admissibility error over 500 Hopper trajectories.
The error without projections is much larger than when using
projections at inference or training with projections $\mathcal{P}^\text{ref}$.
All diffusion models generate only state trajectories.

Ratios of open-loop Hopper trajectories having fallen at a given timestep.
The trajectories without projections and those with projections at inference
start to fall significantly earlier than those of the model trained with projections $\mathcal{P}^\text{ref}$.
All diffusion models generate only state trajectories.

Statewise admissibility error over 400 Walker trajectories.
The error without projections is much larger than when using projections at inference.
Our model trained with projections $\mathcal{P}^\text{SA}$ has no error.
All diffusion models generate states and actions.

Cumulative admissibility error over 400 Walker trajectories.
The error without projections is much larger than when using projections at inference.
Our model trained with projections $\mathcal{P}^\text{SA}$ has no error.
All diffusion models generate states and actions.

Ratios of open-loop Walker trajectories having fallen at a given timestep.
The trajectories without projections and those with projections at inference
start to fall significantly earlier than those of the model trained with projections $\mathcal{P}^\text{SA}$.
All diffusion models generate states and actions.

Unitree GO2 open-loop trajectories tasked with going straight.
Both models without projections deviate from walking straight,
whereas our DDAT model follows the command perfectly.

Projections should only occur at low noise levels

All diffusion models generate state-action trajectories. Projections start either at the beginning of inference (i.e., projecting at all noise levels), or mid-inference, or are applied only once after inference.
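
The three schedules can be expressed with a single threshold on the noise level, as in the sketch below. This only shows the control flow: `toy_denoise`, the clamping projection, the noise levels, and the parameter `sigma_start` are placeholders chosen for the example and do not reflect the actual sampler or thresholds used in the experiments.

```python
import math

def sample_with_projections(x, denoise_step, project, noise_levels, sigma_start):
    """Reverse diffusion loop that projects only at noise levels <= sigma_start.
    The final sample is always projected, matching the 'once after inference' case."""
    for k, sigma in enumerate(noise_levels):
        x = denoise_step(x, sigma)
        is_last = k == len(noise_levels) - 1
        if is_last or sigma <= sigma_start:
            x = project(x)
    return x

def toy_denoise(x, sigma):
    """Placeholder denoiser: shrinks the sample toward zero."""
    return [0.5 * xi for xi in x]

def make_counting_projection():
    """Placeholder projection (clamp to [-1, 1]) that counts its calls."""
    calls = {"n": 0}
    def project(x):
        calls["n"] += 1
        return [max(-1.0, min(1.0, xi)) for xi in x]
    return project, calls

NOISE_LEVELS = [1.0, 0.5, 0.2, 0.05]   # decreasing noise levels (assumed)

# sigma_start = math.inf -> project at all noise levels
# sigma_start = 0.2      -> start projecting mid-inference
# sigma_start = 0.0      -> project only once, after the last step
project, calls = make_counting_projection()
sample = sample_with_projections([8.0, -8.0], toy_denoise, project,
                                 NOISE_LEVELS, sigma_start=0.2)
```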

Statewise admissibility error over 500 Hopper trajectories.
The admissibility error is similar between models no matter when projections start,
because they all perform a projection at the end of inference.

Ratios of open-loop Hopper trajectories having fallen at a given timestep.
The trajectories projecting at high noise levels fall significantly earlier than those projecting at small and zero noise levels.

Quadcopter trajectories tasked with slaloming between obstacles following the dashed line.
Trajectories generated with projections at high noise levels are incapable of performing the task,
while trajectories with projections at small and zero noise levels succeed.

Generating quadcopter trajectories

Objective: slalom between obstacles to reach the target.

Model generating state trajectories without projections.
The sampled trajectory would succeed if it were feasible,
but inverse dynamics shows how far off the trajectory actually is.

Model generating state trajectories with projections only at inference.
The sampled trajectory is close to feasible, but not close enough to
prevent the inverse dynamics from crashing.

Model generating state trajectories trained with reference projections.
The sampled trajectory is very close to the inverse dynamics and both succeed.

Model generating state trajectories trained with reference projections.
The sampled trajectory slaloms between the obstacles and reaches the target.

Model generating states and actions without projections.
The sampled trajectory would succeed if it were feasible,
but the open-loop shows how far off the trajectory actually is.

Model generating states and actions with projections only at inference.
The sampled trajectory flies through the obstacle,
while the open-loop diverges.

Model generating states and actions trained with SA-projections.
The sampled trajectory is admissible as it matches its
open-loop realisation and both succeed.

Model generating states and actions trained with reference projections.
The sampled trajectory slaloms between the obstacles and reaches the target.

MuJoCo Hopper open-loop trajectories.
Without projections, the Hopper falls earlier than with our projections.

Poster

BibTeX

@inproceedings{bouvier2025ddat,
        title = {DDAT: Diffusion Policies Enforcing Dynamically Admissible Robot Trajectories},
        author = {Bouvier, Jean-Baptiste and Ryu, Kanghyun and Nagpal, Kartik and Liao, Qiayuan and Sreenath, Koushil and Mehr, Negar},
        booktitle = {Robotics: Science and Systems (RSS)},
        year = {2025}
      }
