Experiments on layout generalization and reward shaping in the cooperative multi-agent Overcooked-AI environment
In this work, we investigate whether policies trained via self-play with Proximal Policy Optimization (PPO) can perform well across multiple layouts and generalize to unseen layouts in Overcooked-AI. Our experimental setup combines reward shaping under a stepwise decay schedule with a deep neural network policy. We analyze the effect of entropy regularization and reward shaping on both learning efficiency and final policy performance.
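As a rough illustration of what a stepwise decay schedule for reward shaping can look like, the sketch below scales a shaping term by a coefficient that drops in discrete steps as training progresses. The function names, the decay factor, and the step interval are all assumptions made for this example; they are not taken from the training script or the report.

```python
def shaping_coefficient(step, initial=1.0, decay_factor=0.5,
                        decay_every=1_000_000, minimum=0.0):
    """Staircase decay: multiply the coefficient by `decay_factor`
    once every `decay_every` environment steps (hypothetical values)."""
    num_drops = step // decay_every
    return max(minimum, initial * decay_factor ** num_drops)


def total_reward(sparse_reward, shaped_reward, step, **schedule_kwargs):
    """Sparse task reward plus the shaping term scaled by the decayed coefficient."""
    coefficient = shaping_coefficient(step, **schedule_kwargs)
    return sparse_reward + coefficient * shaped_reward
```

With a schedule of this kind, the shaped signal dominates early in training and the agent is gradually left with the sparse task reward alone.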
You can find the full project report here.
Below you can see a replay of two agents cooperating in the "cramped_room" layout, from the first experiment of the report:
Below you can see the learning curves of the generalization experiment in three different layouts:

The Overcooked-AI Python package is needed for all environments and simulation logic.
git clone https://github.com/HumanCompatibleAI/overcooked_ai.git
cd overcooked_ai
A virtual environment is recommended to avoid version conflicts.
With conda:
conda create -n overcooked-rl python=3.10 -y
conda activate overcooked-rl
Or with venv:
python3.10 -m venv overcooked-rl
source overcooked-rl/bin/activate
The [harl] option installs all dependencies needed for RL experiments (including gym, pygame, etc.).
pip install -e ".[harl]"
Go back to this repo.
The commands below install all Python dependencies for this repository (TensorFlow, NumPy, etc.).
cd path/to/this/repo
pip install -r requirements.txt
Run the demo.ipynb notebook to evaluate the trained model and view GIF replays.
Run:
jupyter notebook
Then select and run the cells in demo.ipynb.
This includes all the final results discussed in the report, and replay GIFs of episodes.
You can start a new PPO training run or resume a previous one using the provided script.
Change script parameters as needed inside training/train_selfplay.py.
python -m training.train_selfplay
Training is logged to TensorBoard and to a .csv file for plotting results. Logging is on by default.
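To inspect the .csv log without the notebook, something along these lines may be enough. It is only a sketch: the column names `step` and `mean_reward` are placeholders assumed for illustration, so check the header of the actual log file and adjust them. The in-memory example stands in for a real log.

```python
import csv
import io


def load_curve(csv_file, x_col="step", y_col="mean_reward"):
    """Read two numeric columns from an open CSV file (column names are assumed)."""
    xs, ys = [], []
    for row in csv.DictReader(csv_file):
        xs.append(int(row[x_col]))
        ys.append(float(row[y_col]))
    return xs, ys


def moving_average(values, window=3):
    """Trailing moving average; early points average over fewer samples."""
    smoothed, running = [], 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


# Stand-in for a real log file, using the assumed column names.
example = io.StringIO("step,mean_reward\n0,0.0\n1000,3.0\n2000,6.0\n3000,9.0\n")
steps, rewards = load_curve(example)
smoothed = moving_average(rewards, window=2)
```

For a real log, open the file with `open(path, newline="")` and pass the file object to `load_curve`; the smoothed values can then be handed to any plotting library.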
overcooked_rl/
│
├── env/
│ └── generalized_env.py # Multi-layout Gym-like environment wrapper
│
├── cramped_room/ # cramped_room training experiment files (logs, checkpoints, gifs...)
│
├── generalization/ # Generalization experiment files (logs, checkpoints, gifs...)
│
├── agents/
│ └── ppo_tf.py # Custom PPO agent (TensorFlow implementation)
│
├── training/
│ └── train_selfplay.py # Main training script (self-play, generalization)
│
├── demo.ipynb # Evaluation notebook
│
├── requirements.txt # Package requirements
│
└── README.md
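The multi-layout wrapper listed above is not reproduced here, but the core idea of training on several layouts can be sketched as picking one layout name at the start of every episode. The class below is a hypothetical, dependency-free illustration and does not mirror the actual code in env/generalized_env.py. The layout names other than cramped_room are examples only.

```python
import random


class LayoutSampler:
    """Choose a training layout uniformly at random on every reset (illustrative only)."""

    def __init__(self, layouts, seed=None):
        if not layouts:
            raise ValueError("at least one layout name is required")
        self.layouts = list(layouts)
        self.rng = random.Random(seed)  # private RNG so sampling is reproducible
        self.current = None

    def reset(self):
        """Pick the layout for the next episode and remember it."""
        self.current = self.rng.choice(self.layouts)
        return self.current


# Example layout names; the set used in the experiments may differ.
sampler = LayoutSampler(
    ["cramped_room", "asymmetric_advantages", "coordination_ring"], seed=0
)
episode_layouts = [sampler.reset() for _ in range(5)]
```

A real wrapper would rebuild the underlying Overcooked-AI environment for the chosen layout inside `reset` before returning the first observation.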
