LEAP is a framework for continual behavior learning in embodied agents through interaction with the environment and guidance from humans. LEAP addresses the challenge of representing flexible knowledge about tasks and environments, ranging from constraints and subgoal sequences to action plans and high-level goals, in a unified framework using the Crow Definition Language (CDL).
- **LLM-to-CDL Translation**: A novel algorithm that translates diverse natural-language instructions into structured CDL behavior representations, with task decomposition and error-correction mechanisms
- **Continual Behavior Learning**: A mechanism for abstracting and storing reusable behavior rules, enabling lifelong learning and knowledge accumulation in embodied agents
- **VirtualHome-HG Benchmark**: A comprehensive evaluation dataset with 210 challenging long-horizon tasks across 3 household environments, featuring systematic human-in-the-loop evaluation
```bash
git clone https://github.com/George121380/LEAP.git
cd LEAP
conda create -n leap-agent python=3.9 -y
conda activate leap-agent
pip install -r requirements.txt
conda install faiss-cpu -c conda-forge -y
```
The project also requires two external libraries that must be set up manually:
```bash
# Create a directory for third-party libraries
mkdir -p ~/leap_third_party
cd ~/leap_third_party

# Install Jacinle
git clone https://github.com/vacancy/Jacinle --recursive

# Install Concepts
git clone https://github.com/concepts-ai/Concepts.git --recursive

# Set environment variables
export PATH="$HOME/leap_third_party/Jacinle/bin:$PATH"
export PYTHONPATH="$HOME/leap_third_party/Jacinle:$PYTHONPATH"
export PATH="$HOME/leap_third_party/Concepts/bin:$PATH"
export PYTHONPATH="$HOME/leap_third_party/Concepts:$PYTHONPATH"

cd ..
```
```bash
python verify_installation.py
```

- Copy the example configuration:

  ```bash
  cp config/api_keys.json.example config/api_keys.json
  ```

- Edit `config/api_keys.json` with your actual API keys:

  ```json
  {
    "OpenAI_API_Key": "sk-your-actual-openai-key"
  }
  ```
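Before running the agent, it can be worth sanity-checking the key file so that the placeholder value from the example config does not silently reach the API client. The helper below is a minimal sketch and not part of the LEAP codebase; only the `config/api_keys.json` path and the `OpenAI_API_Key` field come from this README.

```python
import json

def validate_api_keys(config: dict, required=("OpenAI_API_Key",)) -> list:
    """Return a list of problems found in the API-key config (empty if OK)."""
    problems = []
    for key in required:
        value = config.get(key, "")
        if not value:
            problems.append(f"missing key: {key}")
        elif value.startswith("sk-your-"):  # placeholder left over from the example file
            problems.append(f"placeholder value for: {key}")
    return problems

# In practice you would load the real file:
#   config = json.load(open("config/api_keys.json"))
config = json.loads('{"OpenAI_API_Key": "sk-your-actual-openai-key"}')
print(validate_api_keys(config))  # ['placeholder value for: OpenAI_API_Key']
```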
```bash
cd src
python main_VH.py
```

Follow the prompts to:
- Select agent configuration
- Choose evaluation mode (single task or batch)
- Specify scenes and parameters
| Configuration | Description |
|---|---|
| OursWG | Full system with guidance (recommended) |
| OursWOG | Full system without guidance |
| LLMWG | LLM baseline with guidance |
| LLMWOG | LLM baseline without guidance |
| LLMPlusPWG | LLM with planning, with guidance |
| CAPWG | CAP baseline with guidance |
| Configuration | Purpose |
|---|---|
| WOLibrary | Without behavior library |
| ActionLibrary | Action-based vs behavior-based library |
| WORefinement | Without goal refinement |
| WOSplit | Without task decomposition |
| PvP | Policy vs Planning comparison |
```bash
# Single-task evaluation
python main_VH.py --config OursWG --mode single --scene 0 \
    --task_path ../VirtualHome-HG/dataset/Cook_some_food/g1.txt

# Batch evaluation
python main_VH.py --config OursWG --mode all --run_mode test --scene all
```

```
--config CONFIG          Agent configuration (e.g., OursWG, LLMWG)
--mode {single,all}      Evaluation mode
--scene SCENE            Scene ID, or 'all' for all scenes
--task_path TASK_PATH    Path to a specific task file (single mode)
--run_mode {debug,test}  Running mode for batch evaluation
--checkpoint PATH        Resume from a checkpoint
--verbo                  Verbose output
```
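For scripted sweeps over configurations or scenes, it can be convenient to assemble these command lines programmatically. The sketch below is hypothetical (the `build_cmd` helper is ours, not part of LEAP); it only uses the flags documented above.

```python
def build_cmd(config="OursWG", mode="single", scene="0",
              task_path=None, run_mode=None):
    """Build a main_VH.py command line from the documented flags."""
    cmd = ["python", "main_VH.py", "--config", config,
           "--mode", mode, "--scene", str(scene)]
    if task_path:
        cmd += ["--task_path", task_path]
    if run_mode:
        cmd += ["--run_mode", run_mode]
    return cmd

cmd = build_cmd(mode="all", scene="all", run_mode="test")
print(" ".join(cmd))
# python main_VH.py --config OursWG --mode all --scene all --run_mode test
# To actually launch a run: subprocess.run(cmd, cwd="src", check=True)
```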
```
LEAP/
├── src/                          # Source code
│   ├── agent/                    # Agent implementations
│   │   ├── __init__.py
│   │   ├── base.py               # Base agent class
│   │   ├── leap.py               # LEAP agent (main)
│   │   └── llm_based.py          # LLM-only agent
│   ├── evaluation.py             # Task evaluation logic
│   ├── env.py                    # VirtualHome environment wrapper
│   ├── planning.py               # Planning pipeline
│   ├── library.py                # Behavior library
│   ├── human.py                  # Human guidance interface
│   ├── configs.py                # Configuration classes
│   ├── domain/                   # CDL domain definitions
│   │   ├── init_scene.cdl        # Scene initialization
│   │   └── virtualhome_*.cdl     # VirtualHome-specific rules
│   ├── prompts/                  # LLM prompts and templates
│   │   ├── baselines/            # Baseline method prompts
│   │   └── QA/                   # Question-answering prompts
│   ├── simulator/                # VirtualHome simulator components
│   │   ├── environment.py        # Environment interface
│   │   ├── execution.py          # Action execution
│   │   └── logic_score.py        # Logic-based scoring
│   ├── utils/                    # Utility functions and models
│   │   ├── __init__.py
│   │   ├── auto_debugger.py      # Automatic debugging
│   │   ├── models/               # Pre-trained models
│   │   └── solver.py             # Problem-solving utilities
│   ├── main_VH.py                # Main entry point
│   └── metrics*.py               # Evaluation metrics
├── VirtualHome-HG/               # Dataset and scenes
│   ├── dataset/                  # Task definitions (210 tasks)
│   │   ├── Cook_some_food/       # Cooking tasks
│   │   ├── Clean_the_bathroom/   # Cleaning tasks
│   │   ├── Wash_clothes/         # Laundry tasks
│   │   └── ...                   # Other task categories
│   ├── scenes/                   # Environment scenes
│   │   ├── Scene_0.json          # Kitchen scene
│   │   ├── Scene_1.json          # Living room scene
│   │   └── Scene_2.json          # Bedroom scene
│   └── scripts/                  # Dataset processing scripts
├── config/                       # Configuration files
│   └── api_keys.json             # Your API keys (gitignored)
└── leap_third_party/             # Third-party dependencies
    ├── Jacinle/                  # Jacinle framework
    └── Concepts/                 # Concepts framework
```
LEAP introduces VirtualHome-HG (Human Guidance), a new benchmark built on the VirtualHome simulator featuring:
- 210 diverse tasks across 3 different household scenes
- 93 cooking tasks, 33 cleaning tasks, 27 laundry tasks, 57 rearrangement tasks
- 376 distinct items spanning 157 categories per scene on average
- Task complexity: from single-action tasks to complex 159-action sequences
- Simple Set (78 tasks): Single-stage tasks requiring <15 actions
- Multi-stage Set (30 tasks): Complex tasks requiring 30-150 actions
- Ambiguous Set (57 tasks): Tasks with highly ambiguous descriptions requiring human guidance
- Constraint Set (30 tasks): Tasks with implicit size and spatial constraints
- Task Completion Rate: Based on goal state achievement using oracle planning
- Key Action Execution Rate: Measures execution of manually annotated critical actions
- Combined Score: Weighted combination (2/3 action rate + 1/3 goal rate)
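The combined score above is a fixed weighted average of the two rates. A minimal sketch (the function name is ours, not from the codebase):

```python
def combined_score(key_action_rate: float, goal_rate: float) -> float:
    """Weighted combination: 2/3 key-action execution rate + 1/3 goal completion rate."""
    return (2.0 * key_action_rate + goal_rate) / 3.0

# A task where 90% of key actions were executed and the goal state was fully reached:
print(round(combined_score(0.9, 1.0), 3))  # 0.933
```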
- LLM-based Human Agent: Provides natural, human-like guidance based on annotated instructions
- Adaptive Querying: Agents can request help after multiple failed attempts
- Realistic Communication: Mimics parent-child teaching interactions without robotic terminology
LEAP demonstrates significant improvements over baseline methods:
| Method | Without Guidance | With Guidance |
|---|---|---|
| LLM Policy | 59.1% | 59.3% |
| LLM+P | 67.8% | 70.1% |
| Code as Policy | 61.7% | 69.9% |
| Voyager | 70.1% | 76.4% |
| LEAP (Ours) | 75.6% | 80.1% |
- π― Best Human Guidance Utilization: LEAP achieves the highest improvement (14.3%) on ambiguous tasks when receiving human guidance
- π Library Learning Benefits: CDL library storage significantly outperforms action sequence storage across all task categories
- π Continual Learning: Performance continuously improves over time, with 12% improvement on medium tasks and 17% on hard tasks through prior experience
- β‘ Efficiency: Refinement mechanism reduces CDL generation time by ~10% while improving performance
If you use this work in your research, please cite:
```bibtex
@inproceedings{liu2025leap,
  title={Lifelong Experience Abstraction and Planning},
  author={Peiqi Liu and Joshua B. Tenenbaum and Leslie Pack Kaelbling and Jiayuan Mao},
  booktitle={ICML 2025 Workshop on Programmatic Representations for Agent Learning},
  year={2025},
  institution={Massachusetts Institute of Technology and EECS, Peking University}
}
```

This project is licensed under the MIT License; see the LICENSE file for details.
- VirtualHome: Built upon the VirtualHome simulator (Puig et al., 2018) for realistic household environments
- Crow Definition Language (CDL): Leverages CDL (Mao et al., 2024) as the core behavior rule language
- MIT & PKU: Research conducted at Massachusetts Institute of Technology and Peking University
- ICML 2025: Accepted at ICML 2025 Workshop on Programmatic Representations for Agent Learning
- Jacinle & Concepts: Utilizes frameworks by Jiayuan Mao for reasoning and planning
- VirtualHome: Original VirtualHome Environment
- Crow Planner: CDL and Crow Planning Framework
Star this repository if you find it helpful!
For questions and discussions, please open an issue or reach out to the maintainers.

