Multi-Agent Reinforcement Learning (MARL) provides a powerful framework for learning coordination in multi-agent systems.
However, applying MARL to robotics remains challenging due to high-dimensional continuous joint action spaces, complex reward design,
and the non-stationary transitions inherent to decentralized settings.
On the other hand, humans learn complex coordination through staged curricula,
where long-horizon behaviors are progressively built upon simpler skills. Motivated by this, we propose CRAFT: Coaching Reinforcement learning Autonomously
using Foundation models for multi-robot coordination Tasks, a framework that leverages the reasoning capabilities of foundation models to act as a "coach" for multi-robot coordination.
CRAFT automatically decomposes long-horizon coordination tasks into sequences of subtasks using the planning capability of Large Language Models (LLMs).
It then trains each subtask using reward functions generated by the LLM, refining them through a Vision Language Model (VLM)-guided reward-refinement loop.
We evaluate CRAFT on multi-quadruped navigation and bimanual manipulation tasks, demonstrating its capability to learn complex coordination behaviors.
In addition, we validate the multi-quadruped navigation policy on real hardware.
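The overall pipeline can be read as a coach loop: the curriculum LLM decomposes the task, the reward LLM proposes a reward per subtask, a MARL trainer optimizes the subtask policy, and the evaluation and advice VLMs drive reward refinement. Below is a minimal sketch, assuming hypothetical `curriculum_llm`, `reward_llm`, `evaluation_vlm`, `advice_vlm`, and `train_marl` interfaces; these names are illustrative and are not CRAFT's actual implementation.

```python
# Minimal sketch of the CRAFT coach loop. All helper objects passed in
# (curriculum_llm, reward_llm, evaluation_vlm, advice_vlm, train_marl) are
# hypothetical interfaces; prompts, model choices, and the MARL trainer are
# not specified here.

def craft_coach_loop(task_description, curriculum_llm, reward_llm,
                     evaluation_vlm, advice_vlm, train_marl,
                     max_refinements=5):
    # 1) Curriculum LLM decomposes the long-horizon task into subtasks.
    subtasks = curriculum_llm.decompose(task_description)

    policies = []
    for subtask in subtasks:
        # 2) Reward LLM writes an initial reward function for the subtask.
        reward_fn = reward_llm.generate_reward(subtask)

        for _ in range(max_refinements):
            # 3) Train the multi-robot policy on this subtask.
            policy, reward_curves = train_marl(subtask, reward_fn)

            # 4) Evaluation VLM judges rollouts for subtask success.
            if evaluation_vlm.succeeded(subtask, policy):
                break

            # 5) Advice VLM inspects per-component reward curves and
            #    suggests changes; the reward LLM applies them.
            advice = advice_vlm.diagnose(subtask, reward_curves)
            reward_fn = reward_llm.refine_reward(reward_fn, advice)

        policies.append(policy)

    return policies
```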
Overview of CRAFT. CRAFT consists of five key stages.
Example of curriculum refinement for the task "lift and balance the pot."
- Three candidate curricula $\mathcal{C}^1$ to $\mathcal{C}^3$, generated by
the curriculum LLM, are fed back to the LLM for refinement. In $\mathcal{C}^1$, Task 1 focuses only on minimizing distance, while Task 1 in $\mathcal{C}^3$
is defined as minimizing distance and matching orientation. In contrast, Tasks 3 and 4 in $\mathcal{C}^1$ break the lifting down into two stages,
first lifting halfway and then to the full height, whereas $\mathcal{C}^3$ represents lifting as a single task. The curriculum LLM merges these candidates
into a final curriculum $\mathcal{C}$ by selecting the stronger task definitions from each candidate.
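To make the merge step concrete, the candidates can be viewed as lists of subtask specifications that are re-prompted to the curriculum LLM. The sketch below is illustrative only: the Subtask fields, objective phrasings, and success criteria are assumptions, not CRAFT's actual curriculum format.

```python
from dataclasses import dataclass

@dataclass
class Subtask:
    name: str
    objective: str          # what the reward should encourage
    success_criterion: str  # what the evaluation VLM should verify

# Candidate C^1: Task 1 uses distance only; lifting is split into two stages.
candidate_1 = [
    Subtask("Align with the handle", "minimize distance to the handles",
            "end-effectors close to the handles"),
    Subtask("Grasp the handle", "close the grippers on the handles",
            "stable grasp on both handles"),
    Subtask("Lift halfway", "raise the pot to half the target height",
            "pot elevated to roughly half height"),
    Subtask("Lift to full height", "raise the pot to the target height",
            "pot at full target elevation"),
]

# Candidate C^3: Task 1 also matches orientation; lifting is a single task.
candidate_3 = [
    Subtask("Align with the handle",
            "minimize distance to the handles and match gripper orientation",
            "end-effectors aligned in position and orientation"),
    Subtask("Grasp the handle", "close the grippers on the handles",
            "stable grasp on both handles"),
    Subtask("Lift and balance the pot",
            "raise the pot while keeping it level",
            "pot at target elevation with small tilt"),
]

# The curriculum LLM is re-prompted with all candidates and asked to merge
# them into a final curriculum C, keeping the stronger task definitions.
```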
Example of reward refinement for the subtask "Coordinate Preliminary Lift."
- Through the first reward-refinement loop, $R^1_{k=3}$ was produced, and the evaluation VLM marked the policy as a failure since the pot never reached the required elevation of 0.05 m. The reward-component learning curves were then passed to the advice VLM, which identified that lift_reward was too weak compared to balance_reward. It recommended removing the square on the elevation term, increasing the lift weight, and decreasing the balance weight. The revised reward $R^2_{k=3}$ reflects these changes: the square on elevation was removed, the lift weight increased from 80 to 200, and the balance weight decreased from 2 to 1. With this reward, the policy successfully reached the 0.05 m elevation and satisfied the subtask.
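The effect of this refinement can be illustrated with a simplified reward sketch. Only the changes named above (removing the square on elevation, raising the lift weight from 80 to 200, lowering the balance weight from 2 to 1) are taken from the example; the variable names and the exact form of the balance term are assumptions.

```python
import numpy as np

TARGET_ELEVATION = 0.05  # required pot elevation in meters

def reward_v1(pot_elevation, pot_tilt):
    # R^1_{k=3}: squared elevation term, lift weight 80, balance weight 2.
    lift_reward = 80.0 * min(pot_elevation, TARGET_ELEVATION) ** 2
    balance_reward = 2.0 * np.exp(-np.abs(pot_tilt))
    return lift_reward + balance_reward

def reward_v2(pot_elevation, pot_tilt):
    # R^2_{k=3}: square removed, lift weight raised to 200, balance weight 1.
    lift_reward = 200.0 * min(pot_elevation, TARGET_ELEVATION)
    balance_reward = 1.0 * np.exp(-np.abs(pot_tilt))
    return lift_reward + balance_reward
```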
CRAFT can learn collaborative multi-robot tasks that require complex, long-horizon coordination by learning a sequence of subtasks needed to accomplish the overall task. We validate CRAFT on bimanual manipulation and multi-quadruped navigation tasks, demonstrating its capability to learn complex coordination behaviors.
Task 1: Align with the handle
Task 2: Grasp the handle
Task 3: Lift and balance the pot
Task 1: Initial Entry
Task 2: Coordinate Entry
Task 3: Target Platform Ascension
Task 1: Approach gate
Task 2: First agent passes
Task 3: Sequential passage
@article{choi2025craft,
title={CRAFT: Coaching Reinforcement Learning Autonomously using Foundation Models for Multi-Robot Coordination Tasks},
author={Choi, Seoyeon and Ryu, Kanghyun and Ock, Jonghoon and Mehr, Negar},
journal={arXiv preprint arXiv:2509.14380},
year={2025}
}