
MIMIC-D: Multi-modal Imitation for MultI-agent Coordination with Decentralized Diffusion Policies


Overview

  • MIMIC-D is a Centralized Training, Decentralized Execution (CTDE) framework for multi-agent coordination that acts on only local observations while effectively capturing the multi-modality in expert data.
  • During centralized training, agents learn to coordinate with each other via access to a centralized loss function. The agents preserve this coordination during receding-horizon decentralized execution.
  • We demonstrate the effectiveness of MIMIC-D in multiple simulated domains and on a complicated bimanual hardware setup, showing significant improvements over baselines in recovering expert trajectory distributions while reducing collisions and task failures.
[Figure: MIMIC-D overview]

Challenge

  • Achieving coordination in multi-agent systems is challenging, especially in the presence of multi-modality.
  • For example, when two people are walking toward each other head-on, they can avoid a collision if they both choose to yield right or both choose to yield left. Both strategies are acceptable, but the agents need to decide together to achieve the desired coordination.
  • One popular way to learn motion policies is imitation learning.
  • However, most existing imitation learning methods are not designed to handle multi-modality and coordination in multi-agent systems.
  • We assume that we have access to a dataset $\mathcal{D}$ containing $M$ expert demonstrations where each demonstration is a collection of $N$ agents interacting with each other for a finite horizon of time $T$.
  • Each demonstration in the dataset is a set of tuples $\{(\xi^i,o^i)\}_{i=1}^N$, where $o^i$ is the observation of agent $i$ and $\xi^i = \{a^i_0,\ldots,a^i_{T-1}\}$ is the corresponding finite-horizon ($T$) trajectory of actions executed by agent $i$ associated with observation $o^i$.
  • Note that the expert demonstrations may be multi-modal in nature, i.e., for a given observation $o^i$, we may have two or more different expert actions.
  • Our goal is to learn a set of decentralized policies $\{\pi_{\theta^1}^1, \ldots, \pi_{\theta^N}^N\}$ that can collectively reproduce individual agent behavior and learn implicit coordination among agents from the dataset $\mathcal{D}$ (see the data-layout sketch after this list).
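
To fix notation, here is a minimal sketch of the data layout described above. All names and sizes are illustrative assumptions, not from the MIMIC-D codebase:

```python
# Hypothetical layout of the dataset D of M expert demonstrations.
# Each demonstration holds, per agent i, an observation o^i and the
# finite-horizon action trajectory xi^i executed from that observation.
import numpy as np

M, N, T = 100, 2, 32          # demonstrations, agents, horizon (placeholder sizes)
OBS_DIM, ACT_DIM = 8, 2       # placeholder observation / action dimensions

rng = np.random.default_rng(0)

# D = [ demo_m ], where demo_m = [ (xi^i, o^i) for each agent i ]
dataset = [
    [
        {
            "obs": rng.standard_normal(OBS_DIM),            # o^i
            "actions": rng.standard_normal((T, ACT_DIM)),   # xi^i = (a^i_0, ..., a^i_{T-1})
        }
        for _ in range(N)
    ]
    for _ in range(M)
]

# Multi-modality: two demonstrations may share (nearly) the same o^i but
# contain different expert trajectories xi^i, e.g. "both yield left" vs.
# "both yield right" in the head-on walking example.
```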

Our Key Idea

Our key idea is to use diffusion models to capture multi-modality in expert demonstrations.

Our Approach

We employ a CTDE paradigm for learning multi-modal multi-agent coordination policies using diffusion models.
  • In MIMIC-D, we model each decentralized policy $\pi^i$ as a conditional diffusion model with denoiser network $D_{\theta^i}(\xi^i; \sigma, o^i)$.
  • For each agent, $D_{\theta^i}$ takes three inputs: the noisy action trajectory $\xi^i_k$, the current noise level $\sigma_k$, and the observation $o^i$. Iterative denoising then produces a sampled action trajectory $\xi^i_K$.
  • Centralized Training: we jointly train the policies of all agents in the system. During training, the agents' denoising policies share a single loss and have access to all local observations, which is necessary to promote coordination and collision-avoidance behavior (see the training sketch after this list). The joint loss function is $$ \mathcal{L}_{\text{total}}(\theta) = \sum_{i=1}^N \mathcal{L}^i_{\text{diff}}(\theta^i) $$
  • Decentralized Execution: at every replanning step, each agent $i$ individually acquires its local observation $o^i$ from the environment. Each agent then samples its own policy $\pi^i_{\theta^i}$ for a trajectory $\xi^i$ of horizon $T$, and only the first $h$ ($h < T$) steps of each agent's actions are executed before replanning (see the execution sketch below).
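
To make centralized training concrete, here is a minimal PyTorch sketch of the shared-loss setup. This is an illustrative sketch under stated assumptions, not the authors' implementation: the network architecture, the noise schedule, and the simplified denoising objective (predicting the clean trajectory under an MSE loss) are all placeholder choices. The key point it shows is that the per-agent losses $\mathcal{L}^i_{\text{diff}}$ are summed into a single $\mathcal{L}_{\text{total}}$, so all agents' parameters are updated jointly:

```python
# A minimal, assumption-laden sketch of centralized training.
import torch
import torch.nn as nn

T, ACT_DIM, OBS_DIM = 32, 2, 8   # placeholder sizes

class Denoiser(nn.Module):
    """D_theta(xi; sigma, o): predicts the clean action trajectory (simplified)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(T * ACT_DIM + 1 + OBS_DIM, 256), nn.ReLU(),
            nn.Linear(256, T * ACT_DIM),
        )

    def forward(self, noisy_traj, sigma, obs):
        x = torch.cat([noisy_traj.flatten(1), sigma, obs], dim=-1)
        return self.net(x).view(-1, T, ACT_DIM)

N = 2
denoisers = [Denoiser() for _ in range(N)]          # one policy per agent
params = [p for d in denoisers for p in d.parameters()]
opt = torch.optim.Adam(params, lr=1e-4)             # single optimizer over all agents

def training_step(batch):
    """batch[i] = (obs_i, traj_i) for agent i; one joint gradient step."""
    total_loss = 0.0
    for i, (obs, traj) in enumerate(batch):
        sigma = torch.rand(traj.shape[0], 1)        # sampled noise level
        noisy = traj + sigma[..., None] * torch.randn_like(traj)
        pred = denoisers[i](noisy, sigma, obs)
        total_loss = total_loss + ((pred - traj) ** 2).mean()   # L^i_diff
    opt.zero_grad()
    total_loss.backward()                           # shared loss -> joint update
    opt.step()
    return total_loss.item()

# Example usage with random data:
batch = [(torch.randn(16, OBS_DIM), torch.randn(16, T, ACT_DIM)) for _ in range(N)]
training_step(batch)
```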
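
Continuing the same sketch (reusing `T`, `ACT_DIM`, `Denoiser`, and `denoisers` from above), decentralized receding-horizon execution can be illustrated as follows. The sampler below is a crude stand-in for a proper diffusion sampler, and `env`, `env.observe`, and `env.step` are hypothetical environment hooks; the point is only that each agent conditions on its own local observation and replans after executing the first $h$ steps:

```python
# A minimal sketch of decentralized, receding-horizon execution.
import torch

K, H = 50, 4   # placeholder number of denoising steps and execution horizon h

@torch.no_grad()
def sample_trajectory(denoiser, obs, num_steps=K):
    """Iteratively denoise a random trajectory conditioned on obs.
    Crude stand-in for a real DDPM/EDM sampling loop."""
    xi = torch.randn(1, T, ACT_DIM)                      # start from pure noise
    for k in range(num_steps):
        sigma = torch.full((1, 1), 1.0 - k / num_steps)  # decreasing noise level
        xi = denoiser(xi, sigma, obs)                    # move toward the data manifold
    return xi.squeeze(0)

def execute_episode(env, denoisers, num_replans=10):
    """env.observe(i) and env.step(actions) are hypothetical hooks."""
    for _ in range(num_replans):
        # Each agent plans from its own local observation only (decentralized).
        obs = [env.observe(i) for i in range(len(denoisers))]
        plans = [sample_trajectory(d, o.unsqueeze(0)) for d, o in zip(denoisers, obs)]
        for t in range(H):                               # execute first h < T steps,
            env.step([plan[t] for plan in plans])        # then replan
```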
[Figure: MIMIC-D architecture]

Simulation Experiments

We evaluated MIMIC-D on three different simulation scenarios:
  • Two-Agent Swap: Two agents are trying to swap positions with one another while avoiding a central obstacle. This environment involves six possible modes as shown below.
  [Figure: Two-Agent Swap demo]
  • Three-Agent Road Crossing: Three agents are trying to avoid one another while on their way to their respective goal locations. As shown below, this can be achieved in numerous ways.
  [Figure: Three-Agent Road Crossing demo]
  • Two-Arm Lift: Two Kinova3 arms collaborate to lift a pot and transfer it to the other side while avoiding an obstacle (red box). This challenging, high-dimensional task has two modes on which both agents need to coordinate.
The results of these simulation experiments are summarized in the following plots. As can be seen, MIMIC-D achieves higher coordination among agents and significantly fewer collisions and task failures.
[Figures: Two-Agent Swap collisions, Three-Agent Road Crossing collisions, Two-Arm Lift simulation results]

Hardware Experiments

To demonstrate MIMIC-D on real, high-dimensional hardware, we also perform the Two-Arm Lift task in the real world.

BibTeX

@article{dong2025mimic,
  title={MIMIC-D: Multi-modal Imitation for MultI-agent Coordination with Decentralized Diffusion Policies},
  author={Dong, Dayi and Bhatt, Maulik and Choi, Seoyeon and Mehr, Negar},
  journal={arXiv preprint arXiv:2509.14159},
  year={2025}
}