Curriculum learning can achieve complex policies by progressively increasing the task difficulty during training. However, designing effective curricula for a specific task often requires extensive domain knowledge and human intervention, which limits its applicability across domains. Our core idea is that large language models (LLMs) show significant potential for efficiently breaking down tasks and decomposing skills across various robotics environments. Additionally, the demonstrated success of LLMs in translating natural language into executable code for RL agents strengthens their role in generating task curricula. In this work, we propose CurricuLLM, which leverages the high-level planning and programming capabilities of LLMs for curriculum design, thereby enabling efficient learning of complex tasks. CurricuLLM consists of: (Step 1) generating a sequence of subtasks in natural language, (Step 2) translating the natural language description of each subtask into executable task code, including the reward code and goal distribution code, and (Step 3) evaluating trained policies based on trajectory rollouts and the subtask description. We evaluate CurricuLLM in various robotics simulation environments spanning manipulation, navigation, and locomotion, and show that CurricuLLM can aid the learning of complex robot control tasks. In addition, we validate the humanoid locomotion policy learned through CurricuLLM in the real world.
Step 1: Curriculum generation - The curriculum generation LLM receives a curriculum prompt in natural language as well as the environment description, and generates a sequence of subtasks. Our prompt includes instructions for the curriculum designer, rules for how to describe the subtasks, and other tips on describing the curriculum. The environment description consists of the robot and its state variables, the target task, and the initial state.
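A minimal sketch of Step 1, assuming an OpenAI-style chat API; the model name, JSON output format, and prompt wording here are illustrative assumptions, and the paper's actual prompts contain additional rules and tips.

```python
# Sketch of the curriculum generation step (Step 1). The system/user prompt
# contents and the model are assumptions, not the paper's exact prompts.
import json
from openai import OpenAI

client = OpenAI()

def generate_curriculum(env_description: str, target_task: str, n_subtasks: int = 4):
    """Ask the curriculum LLM for a sequence of subtasks in natural language."""
    system_prompt = (
        "You are a curriculum designer for a reinforcement learning agent. "
        "Break the target task into a sequence of progressively harder subtasks. "
        f"Return exactly {n_subtasks} subtasks as a JSON list of "
        '{"name": ..., "description": ...} objects.'
    )
    user_prompt = (
        f"Environment (robot, state variables, initial state):\n{env_description}\n\n"
        f"Target task:\n{target_task}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # model choice is an assumption
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    # Assumes the model returns well-formed JSON as instructed.
    return json.loads(response.choices[0].message.content)
```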
Step 2 & 3: Task code generation and evaluation in each subtask - The task code generation LLM takes the environment and target task descriptions, current and past task information, and the reward function used for the previous subtask. $K$ task code candidates for the current subtask are then sampled and used to fine-tune policies from the previous subtask. Finally, the evaluation LLM receives trajectory rollout information from each trained policy and selects the policy that best aligns with the current subtask description.
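The loop below is a minimal sketch of Steps 2 and 3 under stated assumptions. The four callables (`sample_task_code`, `finetune`, `summarize_rollouts`, `evaluate_with_llm`) are hypothetical stand-ins for the task code LLM, the RL trainer, rollout statistics collection, and the evaluation LLM; none of these names come from the paper.

```python
# Sketch of sampling K task-code candidates, fine-tuning a policy on each,
# and letting the evaluation LLM pick the best-aligned policy. The helper
# callables are hypothetical placeholders, injected so the loop is reusable.
from typing import Callable

def train_subtask(subtask: dict,
                  prev_policy,
                  prev_reward_code: str,
                  sample_task_code: Callable,
                  finetune: Callable,
                  summarize_rollouts: Callable,
                  evaluate_with_llm: Callable,
                  K: int = 4):
    candidates = []
    for _ in range(K):
        # Each candidate bundles a reward function and a goal distribution.
        task_code = sample_task_code(subtask, prev_reward_code)
        # Warm-start from the previous subtask's policy rather than
        # training from scratch.
        policy = finetune(prev_policy, task_code)
        # Turn trajectories into text the evaluation LLM can read,
        # e.g., mean end-effector/block distance, success rate.
        candidates.append((task_code, policy, summarize_rollouts(policy)))

    # The evaluation LLM compares rollout summaries against the subtask
    # description and returns the index of the best candidate.
    best = evaluate_with_llm(subtask["description"], [s for _, _, s in candidates])
    return candidates[best]
```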
CurricuLLM can be applied to a variety of robot tasks. It breaks a complex task down into a sequence of subtasks that are easier to learn, and as the curriculum progresses, the reward function becomes more informative and aligned with the target task. Below is an example curriculum generated by CurricuLLM for the Fetch Push environment, followed by an illustrative reward sketch.
Task 1
Name: Reach the Block
Description: The robot manipulator must use its end effector to reach and position itself directly above the block without making contact.
Task 2
Name: Maintain Contact With the Block
Description: Slowly decrease the z-coordinate of the end_effector_position to make gentle contact with the block.
Task 3
Name: Push to a Short Distance
Description: After making contact, the robot manipulator must push the block to a predefined point that is a short distance away from the starting point.
Task 4
Name: Original Task
Description: The manipulator needs to push the block to a goal position on the table.
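To make the task code concrete, here is an illustrative reward sketch for Task 1 ("Reach the Block"), not the paper's generated code: a dense negative distance between the end effector and a target point above the block, plus a small bonus within a tolerance. The hover height and tolerance values are assumptions.

```python
# Hypothetical reward function for Task 1 ("Reach the Block"); the 5 cm
# hover height and 2 cm tolerance are illustrative assumptions.
import numpy as np

def reach_reward(end_effector_position: np.ndarray,
                 block_position: np.ndarray,
                 hover_height: float = 0.05,
                 tolerance: float = 0.02) -> float:
    # Target point directly above the block, so the gripper approaches
    # without making contact.
    target = block_position + np.array([0.0, 0.0, hover_height])
    dist = float(np.linalg.norm(end_effector_position - target))
    # Small sparse bonus once within tolerance of the hover point.
    bonus = 1.0 if dist < tolerance else 0.0
    return -dist + bonus
```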
@inproceedings{ryu2025curricullm,
title={{CurricuLLM}: Automatic task curricula design for learning complex robot skills using large language models},
author={Ryu, Kanghyun and Liao, Qiayuan and Li, Zhongyu and Sreenath, Koushil and Mehr, Negar},
booktitle={2025 IEEE International Conference on Robotics and Automation (ICRA)},
year={2025},
organization={IEEE},
arxiv={2409.18382},
}