
CurricuLLM
Automatic Task Curricula Design for
Learning Complex Robot Skills
using Large Language Models

ICON Lab at UC Berkeley
2025 International Conference on Robotics and Automation (ICRA)

Curriculum learning of a walking policy for the Berkeley Humanoid using CurricuLLM.
CurricuLLM learns a real-world policy without human intervention in curriculum design or reward engineering.

Abstract

Curriculum learning can achieve complex policies by progressively increasing the task difficulty during training. However, designing effective curricula for a specific task often requires extensive domain knowledge and human intervention, which limits its applicability across domains. Our core idea is that large language models (LLMs) present significant potential for efficiently breaking down tasks and decomposing skills across various robotics environments. Additionally, the demonstrated success of LLMs in translating natural language into executable code for RL agents strengthens their role in generating task curricula. In this work, we propose CurricuLLM, which leverages the high-level planning and programming capabilities of LLMs for curriculum design, thereby enhancing the efficient learning of complex tasks. CurricuLLM consists of: (Step 1) generating a sequence of subtasks in natural language form, (Step 2) translating the natural language description of each subtask into executable task code, including the reward code and goal distribution code, and (Step 3) evaluating trained policies based on trajectory rollouts and the subtask description. We evaluate CurricuLLM in various robotics simulation environments, ranging from manipulation and navigation to locomotion, and show that CurricuLLM can aid learning of complex robot control tasks. In addition, we validate the humanoid locomotion policy learned through CurricuLLM in the real world.

Step 1: Curriculum generation

Step 1: Curriculum generation - The curriculum generation LLM receives a curriculum prompt in natural language form as well as the environment description, and generates a sequence of subtasks. Our prompt includes instructions for the curriculum designer, rules for how to describe the subtasks, and other tips on describing the curriculum. The environment description consists of the robot and its state variables, the target task, and the initial state.
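The prompt assembly and subtask parsing in Step 1 can be sketched as follows. This is a minimal illustration: the function names, prompt wording, and JSON output format are assumptions for exposition, not CurricuLLM's actual prompts.

```python
import json

def build_curriculum_prompt(env_description: str, target_task: str,
                            initial_state: str, n_subtasks: int = 4) -> str:
    """Assemble a curriculum-generation prompt from the environment
    description, target task, and initial state (illustrative format)."""
    return (
        "You are a curriculum designer for a reinforcement learning agent.\n"
        f"Environment: {env_description}\n"
        f"Target task: {target_task}\n"
        f"Initial state: {initial_state}\n"
        f"Propose {n_subtasks} subtasks of increasing difficulty as a JSON "
        'list of {"name": ..., "description": ...} objects.'
    )

def parse_subtasks(llm_response: str) -> list[dict]:
    """Parse the LLM's JSON reply into an ordered list of subtasks."""
    subtasks = json.loads(llm_response)
    # Each subtask must carry a name and a natural language description.
    assert all({"name", "description"} <= set(t) for t in subtasks)
    return subtasks
```

The returned list fixes the order in which subtasks are trained, so the LLM is asked for a monotonically harder sequence ending at the original task.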

Step 2 & 3: Task code generation and evaluation

Step 2 & 3: Task code generation and evaluation framework for each subtask - The task code generation LLM takes the environment and target task descriptions, information on the current and past subtasks, and the reward function used for the previous subtask. Then, $K$ task code candidates for the current subtask are sampled and used to fine-tune the policy from the previous subtask. Finally, the evaluation LLM receives the trajectory rollout information from each trained policy and selects the policy that best aligns with the current subtask description.
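One curriculum step of this loop can be sketched as below. All callables passed in (`sample_task_code`, `finetune`, `evaluate_llm`) are hypothetical stand-ins for the LLM and RL components described above, shown only to make the control flow concrete.

```python
def train_subtask(subtask: dict, prev_policy,
                  sample_task_code, finetune, evaluate_llm, k: int = 4):
    """One curriculum step: sample K task-code candidates for the current
    subtask, fine-tune the previous subtask's policy on each candidate,
    and let the evaluation LLM pick the best-aligned policy."""
    # Step 2: sample K reward/goal-distribution code candidates.
    candidates = [sample_task_code(subtask) for _ in range(k)]
    policies, rollouts = [], []
    for code in candidates:
        # Warm-start each candidate from the previous subtask's policy.
        policy = finetune(prev_policy, code)
        policies.append(policy)
        # Collect trajectory rollout statistics for evaluation.
        rollouts.append(policy.rollout())
    # Step 3: the evaluation LLM returns the index of the policy whose
    # rollouts best match the subtask's natural language description.
    best = evaluate_llm(subtask["description"], rollouts)
    return policies[best]
```

Warm-starting from the previous subtask's policy is what makes this a curriculum rather than $K$ independent training runs: each stage only has to learn the increment in difficulty.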

CurricuLLM can learn various robot tasks

CurricuLLM can be applied to various robot tasks. It breaks down a complex task into a sequence of subtasks that are easier to learn. Moreover, as the curriculum progresses, the reward function becomes more informative and better aligned with the target task. Below is an example curriculum with reward functions generated by CurricuLLM for the Fetch Push environment.

Task 1
Name: Reach the Block
Description: The robot manipulator must use its end effector to reach and position itself directly above the block without making contact.

Task 2
Name: Maintain Contact With the Block
Description: Slowly decrease the z-coordinate of the end_effector_position to make gentle contact with the block.

Task 3
Name: Push to a Short Distance
Description: After making contact, the robot manipulator must push the block to a predefined point that is a short distance away from the starting point.

Task 4
Name: Original Task
Description: The manipulator needs to push the block to a goal position on the table.
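A generated reward for Task 1 ("Reach the Block") might look like the sketch below. The function name, hover height, and weighting are hypothetical illustrations of the kind of dense reward code CurricuLLM emits, not its actual output; `end_effector_position` and the block position follow the Fetch observation naming used above.

```python
import numpy as np

def reach_block_reward(end_effector_position: np.ndarray,
                       block_position: np.ndarray,
                       hover_height: float = 0.05) -> float:
    """Illustrative dense reward for Task 1: drive the end effector to a
    point directly above the block without making contact."""
    # Target a hover point hover_height meters above the block.
    target = block_position + np.array([0.0, 0.0, hover_height])
    dist = float(np.linalg.norm(end_effector_position - target))
    # Negative distance: closer to the hover point means higher reward.
    return -dist
```

Later subtasks would replace this shaping term with contact and displacement terms, until the final stage recovers the sparse goal-reaching objective of the original push task.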

Result summary

CurricuLLM can be applied to diverse robot tasks, outperforming vanilla RL training.

Humanoid policy trained with CurricuLLM deployed in real-world

Poster

BibTeX

@inproceedings{ryu2025curricullm,
  title={CurricuLLM: Automatic task curricula design for learning complex robot skills using large language models}, 
  author={Ryu, Kanghyun and Liao, Qiayuan and Li, Zhongyu and Sreenath, Koushil and Mehr, Negar},
  booktitle={2025 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2025},
  organization={IEEE},
  arxiv={2409.18382},
}
