LEAP: Lifelong Experience Abstraction and Planning

License: MIT Python 3.9+ Code style: black ICML 2025

LEAP is a framework for continual behavior learning in embodied agents through interaction with the environment and guidance from humans. LEAP addresses the challenge of representing flexible knowledge about tasks and environments, ranging from constraints and subgoal sequences to action plans and high-level goals, in a unified framework using the Crow Definition Language (CDL).

πŸ— Overview

(Figure: LEAP pipeline and system architecture)

(Figure: learning through interaction and human guidance)

🔑 Key Contributions

  1. 🧠 LLM-to-CDL Translation: Novel algorithm that translates diverse natural language instructions into structured CDL behavior representations with task decomposition and error correction mechanisms

  2. 📚 Continual Behavior Learning: Mechanism for abstracting and storing reusable behavior rules that enables lifelong learning and knowledge accumulation in embodied agents

  3. 🏠 VirtualHome-HG Benchmark: Comprehensive evaluation dataset with 210 challenging long-horizon tasks across 3 household environments, featuring systematic human-in-the-loop evaluation

🛠 Setup

Pip Installation

git clone https://github.com/George121380/LEAP.git
cd LEAP
conda create -n leap-agent python=3.9 -y
conda activate leap-agent
pip install -r requirements.txt
conda install faiss-cpu -c conda-forge -y
# Manual setup of third-party libraries required (see below)

Third-Party Dependencies

The project requires two external libraries:

# Create directory for third-party libraries
mkdir -p ~/leap_third_party
cd ~/leap_third_party

# Install Jacinle
git clone https://github.com/vacancy/Jacinle --recursive

# Install Concepts
git clone https://github.com/concepts-ai/Concepts.git --recursive

# Set environment variables
export PATH="$HOME/leap_third_party/Jacinle/bin:$PATH"
export PYTHONPATH="$HOME/leap_third_party/Jacinle:$PYTHONPATH"
export PATH="$HOME/leap_third_party/Concepts/bin:$PATH"
export PYTHONPATH="$HOME/leap_third_party/Concepts:$PYTHONPATH"
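Note that these exports only last for the current shell session. Assuming a bash shell, one way to persist them is to append the same lines to your profile (use the appropriate file for zsh or other shells):

```shell
# Persist the Jacinle/Concepts paths across sessions (bash assumed)
cat >> ~/.bashrc <<'EOF'
export PATH="$HOME/leap_third_party/Jacinle/bin:$PATH"
export PYTHONPATH="$HOME/leap_third_party/Jacinle:$PYTHONPATH"
export PATH="$HOME/leap_third_party/Concepts/bin:$PATH"
export PYTHONPATH="$HOME/leap_third_party/Concepts:$PYTHONPATH"
EOF
```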

Verification

cd /path/to/LEAP  # return to wherever you cloned the repository
python verify_installation.py

API Keys Setup

  1. Copy the example configuration:

    cp config/api_keys.json.example config/api_keys.json
  2. Edit config/api_keys.json with your actual API keys:

    {
      "OpenAI_API_Key": "sk-your-actual-openai-key"
    }
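For your own scripts that need these keys, a minimal loader might look like the sketch below; `load_api_keys` is a hypothetical helper shown for illustration, not part of the LEAP codebase:

```python
import json

def load_api_keys(path="config/api_keys.json"):
    """Load API keys from the JSON config file.

    Hypothetical helper for illustration; raises KeyError if the
    expected OpenAI key is missing.
    """
    with open(path) as f:
        keys = json.load(f)
    if "OpenAI_API_Key" not in keys:
        raise KeyError("OpenAI_API_Key missing from config")
    return keys
```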

📖 Usage

Interactive Mode

cd src
python main_VH.py

Follow the prompts to:

  1. Select agent configuration
  2. Choose evaluation mode (single task or batch)
  3. Specify scenes and parameters

Baselines Experiment

| Configuration | Description |
|---------------|-------------|
| OursWG        | Full system with guidance (recommended) |
| OursWOG       | Full system without guidance |
| LLMWG         | LLM baseline with guidance |
| LLMWOG        | LLM baseline without guidance |
| LLMPlusPWG    | LLM with planning, with guidance |
| CAPWG         | CAP baseline with guidance |

Ablation Study

| Configuration | Purpose |
|---------------|---------|
| WOLibrary     | Without behavior library |
| ActionLibrary | Action-based vs behavior-based library |
| WORefinement  | Without goal refinement |
| WOSplit       | Without task decomposition |
| PvP           | Policy vs planning comparison |

Command Line Interface

# Single task evaluation
python main_VH.py --config OursWG --mode single --scene 0 \
  --task_path ../VirtualHome-HG/dataset/Cook_some_food/g1.txt

# Batch evaluation
python main_VH.py --config OursWG --mode all --run_mode test --scene all

Available Command Line Options

--config CONFIG         Agent configuration (e.g., OursWG, LLMWG)
--mode {single,all}     Evaluation mode
--scene SCENE           Scene ID or 'all' for all scenes
--task_path TASK_PATH   Path to specific task file (single mode)
--run_mode {debug,test} Running mode for batch evaluation
--checkpoint PATH       Resume from checkpoint
--verbo                 Verbose output
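As a rough sketch of how the options above map onto Python's `argparse` (the actual parser lives in `src/main_VH.py` and may differ in defaults and details):

```python
import argparse

# Illustrative sketch of the CLI described above, not the real parser.
parser = argparse.ArgumentParser(description="LEAP evaluation (sketch)")
parser.add_argument("--config", default="OursWG",
                    help="Agent configuration (e.g., OursWG, LLMWG)")
parser.add_argument("--mode", choices=["single", "all"], default="single",
                    help="Evaluation mode")
parser.add_argument("--scene", default="0",
                    help="Scene ID or 'all' for all scenes")
parser.add_argument("--task_path",
                    help="Path to specific task file (single mode)")
parser.add_argument("--run_mode", choices=["debug", "test"], default="test",
                    help="Running mode for batch evaluation")
parser.add_argument("--checkpoint", help="Resume from checkpoint")
parser.add_argument("--verbo", action="store_true", help="Verbose output")

# Example: parse the flags from the single-task invocation above
args = parser.parse_args(["--config", "OursWG", "--mode", "single",
                          "--scene", "0"])
```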

πŸ— Project Structure

LEAP/
├── 📁 src/                     # Source code
│   ├── 🤖 agent/               # Agent implementations
│   │   ├── __init__.py
│   │   ├── base.py             # Base agent class
│   │   ├── leap.py             # LEAP agent (main)
│   │   └── llm_based.py        # LLM-only agent
│   ├── 📊 evaluation.py        # Task evaluation logic
│   ├── 🏠 env.py               # VirtualHome environment wrapper
│   ├── 🧠 planning.py          # Planning pipeline
│   ├── 📚 library.py           # Behavior library
│   ├── 👤 human.py             # Human guidance interface
│   ├── ⚙️ configs.py           # Configuration classes
│   ├── 📁 domain/              # CDL domain definitions
│   │   ├── init_scene.cdl      # Scene initialization
│   │   └── virtualhome_*.cdl   # VirtualHome-specific rules
│   ├── 📁 prompts/             # LLM prompts and templates
│   │   ├── baselines/          # Baseline method prompts
│   │   └── QA/                 # Question-answering prompts
│   ├── 📁 simulator/           # VirtualHome simulator components
│   │   ├── environment.py      # Environment interface
│   │   ├── execution.py        # Action execution
│   │   └── logic_score.py      # Logic-based scoring
│   ├── 📁 utils/               # Utility functions and models
│   │   ├── __init__.py
│   │   ├── auto_debugger.py    # Automatic debugging
│   │   ├── models/             # Pre-trained models
│   │   └── solver.py           # Problem solving utilities
│   ├── 🎯 main_VH.py           # Main entry point
│   └── 📊 metrics*.py          # Evaluation metrics
├── 📁 VirtualHome-HG/          # Dataset and scenes
│   ├── 📁 dataset/             # Task definitions (210 tasks)
│   │   ├── Cook_some_food/     # Cooking tasks
│   │   ├── Clean_the_bathroom/ # Cleaning tasks
│   │   ├── Wash_clothes/       # Laundry tasks
│   │   └── ...                 # Other task categories
│   ├── 📁 scenes/              # Environment scenes
│   │   ├── Scene_0.json        # Kitchen scene
│   │   ├── Scene_1.json        # Living room scene
│   │   └── Scene_2.json        # Bedroom scene
│   └── 📁 scripts/             # Dataset processing scripts
├── 📁 config/                  # Configuration files
│   └── api_keys.json           # Your API keys (gitignored)
└── 📁 leap_third_party/        # Third-party dependencies
    ├── Jacinle/                # Jacinle framework
    └── Concepts/               # Concepts framework

🏠 VirtualHome-HG Benchmark

LEAP introduces VirtualHome-HG (Human Guidance), a new benchmark built on the VirtualHome simulator featuring:

📊 Dataset Statistics

  • 210 diverse tasks across 3 different household scenes
  • 93 cooking tasks, 33 cleaning tasks, 27 laundry tasks, 57 rearrangement tasks
  • 376 distinct items spanning 157 categories per scene on average
  • Task complexity: From single-action tasks to complex 159-action sequences

🎯 Task Categories

  • Simple Set (78 tasks): Single-stage tasks requiring <15 actions
  • Multi-stage Set (30 tasks): Complex tasks requiring 30-150 actions
  • Ambiguous Set (57 tasks): Tasks with highly ambiguous descriptions requiring human guidance
  • Constraint Set (30 tasks): Tasks with implicit size and spatial constraints

πŸ“ Evaluation Metrics

  • Task Completion Rate: Based on goal state achievement using oracle planning
  • Key Action Execution Rate: Measures execution of manually annotated critical actions
  • Combined Score: Weighted combination (2/3 action rate + 1/3 goal rate)
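The weighted combination above can be sketched as follows (illustrative only, not the repository's implementation):

```python
def combined_score(key_action_rate: float, goal_rate: float) -> float:
    """Combined score: 2/3 key-action execution rate + 1/3 goal rate.

    Both rates are fractions in [0, 1]; sketch of the metric described
    above, not the code in src/.
    """
    return (2.0 / 3.0) * key_action_rate + (1.0 / 3.0) * goal_rate

# e.g. 90% of key actions executed, 60% of goal states achieved
print(combined_score(0.9, 0.6))  # ~0.8
```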

🤖 Human Guidance System

  • LLM-based Human Agent: Provides natural, human-like guidance based on annotated instructions
  • Adaptive Querying: Agents can request help after multiple failed attempts
  • Realistic Communication: Mimics parent-child teaching interactions without robotic terminology

📈 Experimental Results

LEAP demonstrates significant improvements over baseline methods:

Performance Comparison (Overall Success Rate)

| Method         | Without Guidance | With Guidance |
|----------------|------------------|---------------|
| LLM Policy     | 59.1%            | 59.3%         |
| LLM+P          | 67.8%            | 70.1%         |
| Code as Policy | 61.7%            | 69.9%         |
| Voyager        | 70.1%            | 76.4%         |
| LEAP (Ours)    | 75.6%            | 80.1%         |

Key Findings

  • 🎯 Best Human Guidance Utilization: LEAP achieves the highest improvement (14.3%) on ambiguous tasks when receiving human guidance
  • 📚 Library Learning Benefits: CDL library storage significantly outperforms action sequence storage across all task categories
  • 🔄 Continual Learning: Performance continuously improves over time, with 12% improvement on medium tasks and 17% on hard tasks through prior experience
  • ⚡ Efficiency: Refinement mechanism reduces CDL generation time by ~10% while improving performance

📚 Citation

If you use this work in your research, please cite:

@inproceedings{liu2025leap,
  title={Lifelong Experience Abstraction and Planning},
  author={Peiqi Liu and Joshua B. Tenenbaum and Leslie Pack Kaelbling and Jiayuan Mao},
  booktitle={ICML 2025 Workshop on Programmatic Representations for Agent Learning},
  year={2025},
  institution={Massachusetts Institute of Technology and EECS, Peking University}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • VirtualHome: Built upon the VirtualHome simulator (Puig et al., 2018) for realistic household environments
  • Crow Definition Language (CDL): Leverages CDL (Mao et al., 2024) as the core behavior rule language
  • MIT & PKU: Research conducted at Massachusetts Institute of Technology and Peking University
  • ICML 2025: Accepted at ICML 2025 Workshop on Programmatic Representations for Agent Learning
  • Jacinle & Concepts: Utilizes frameworks by Jiayuan Mao for reasoning and planning

⭐ Star this repository if you find it helpful!

For questions and discussions, please open an issue or reach out to the maintainers.
