This repository provides comprehensive instructions to reproduce the results from our study on hyperparameter tuning for DQN, PPO, and SAC in hybrid-action environments. The algorithms were adapted to address the unique challenges posed by such environments:
- PPO and SAC: Modified using the PFRL framework to handle hybrid actions.
- DQN: The continuous action space is discretized into 24 discrete actions, derived from 8 predefined directions combined with 3 discrete action types (see the sketch after the direction list below).
The 8 predefined directions used for discretization are:
- Up: (0, 1)
- Up-Right: (1, 1)
- Right: (1, 0)
- Down-Right: (1, -1)
- Down: (0, -1)
- Down-Left: (-1, -1)
- Left: (-1, 0)
- Up-Left: (-1, 1)
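A minimal sketch of how such a discretization can be enumerated is shown below. The direction vectors follow the list above; the three action-type labels and the decoding helper are illustrative assumptions, not the repository's exact implementation.

```python
from itertools import product

# The 8 predefined movement directions listed above.
DIRECTIONS = [
    (0, 1),    # Up
    (1, 1),    # Up-Right
    (1, 0),    # Right
    (1, -1),   # Down-Right
    (0, -1),   # Down
    (-1, -1),  # Down-Left
    (-1, 0),   # Left
    (-1, 1),   # Up-Left
]

# Three discrete action types; the integer labels here are placeholders,
# not the identifiers used by AgarCL.
ACTION_TYPES = [0, 1, 2]

# Enumerate the 8 x 3 = 24 discrete actions DQN selects from.
DISCRETE_ACTIONS = list(product(DIRECTIONS, ACTION_TYPES))
assert len(DISCRETE_ACTIONS) == 24

def decode(action_index: int):
    """Map a DQN action index (0..23) back to a (direction, action_type) pair."""
    return DISCRETE_ACTIONS[action_index]
```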
These adaptations were critical for enabling the algorithms to operate effectively in the hybrid-action environment. For further details, refer to the accompanying paper.
If using Compute Canada, load the required modules with the following commands:
module load clang/17.0.6
module load cmake
module load python/3.11.5
module load scipy-stack
module load glfw
module load cuda/12.2
module load opencv/4.11.0
For other environments, refer to the AgarCL installation guide.
Install the required Python packages:
pip install -r requirements.txt
Navigate to the AgarLE directory and install the package:
cd AgarCL_DIR
python setup.py install --user # or
pip install -e .
Note: These installations are one-time processes and do not need to be repeated for subsequent experiments.
The hyperparameter tuning process is automated using scripts that facilitate job submission to cluster environments.
Use the following command to generate and submit the DQN tuning script:
python SubmitParallelJobs.py DQN.sh YOUR_DIR/DQN_full_action_set.py \
--lr=1e-5,3e-5,1e-4,3e-4 \
--batch_accumulator="sum","mean" \
--minibatch_size=32,64 \
--tau=1e-2,1e-3,5e-3 \
--seed=0,1,2
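As a rough illustration of what such a sweep expands to, the sketch below enumerates the Cartesian product of the values above and prints one training command per combination. The loop body and file names are assumptions for illustration only, not the actual contents of SubmitParallelJobs.py.

```python
from itertools import product

# Hyperparameter grid taken from the DQN tuning command above.
grid = {
    "lr": [1e-5, 3e-5, 1e-4, 3e-4],
    "batch_accumulator": ["sum", "mean"],
    "minibatch_size": [32, 64],
    "tau": [1e-2, 1e-3, 5e-3],
    "seed": [0, 1, 2],
}

# One training command per combination (4 * 2 * 2 * 3 * 3 = 144 jobs).
# The real script writes these into a batch file (e.g. DQN.sh) and submits it.
for values in product(*grid.values()):
    args = " ".join(f"--{key}={value}" for key, value in zip(grid, values))
    print(f"python YOUR_DIR/DQN_full_action_set.py {args}")
```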
Run the following command for SAC tuning:
python generate_bash.py SAC.sh SAC_full_action_set.py \
--reward="reward_gym","min_max" \
--lr=3e-5,1e-4,1e-5 \
--soft-update-tau=0.01,0.005,0.001 \
--temperature-lr=1e-4,1e-5 \
--max-grad-norm=0.5,0.7,0.9 \
--seed=0,1,2
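The --soft-update-tau values control how quickly SAC's target networks track the online networks. The snippet below is a generic Polyak (soft) update shown only to illustrate what this coefficient does; it is not PFRL's implementation.

```python
import torch

def soft_update(target_net: torch.nn.Module, online_net: torch.nn.Module, tau: float) -> None:
    """Polyak update: target <- tau * online + (1 - tau) * target."""
    with torch.no_grad():
        for t_param, o_param in zip(target_net.parameters(), online_net.parameters()):
            t_param.mul_(1.0 - tau).add_(tau * o_param)
```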
Run the following command for PPO tuning:
python generate_bash.py PPO.sh YOUR_DIR/PPO_multi_heads_full_action.py \
--reward="reward_gym","min_max" \
--lr=1e-5,3e-5,1e-4,3e-4 \
--epochs=10,15,20 \
--max-grad-norm=0.5,0.7,0.9 \
--entropy-coef=0.05,0.01,0.1,0.5 \
--clip-eps=0.2,0.4,0.5 \
--value-func-coef=0.5,0.7,0.9 \
--seed=0,1,2
After running the tuning scripts, use the get_best_hyper notebook to identify the best hyperparameters, using the average over the last 100 steps as the selection metric.
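A minimal sketch of that selection metric is shown below: for each configuration, average the last 100 logged values and keep the configuration with the highest score. The directory layout and column name are assumptions; the actual logic lives in the get_best_hyper notebook.

```python
import glob

import numpy as np
import pandas as pd

# Assumed layout: one CSV of logged returns per hyperparameter configuration.
# The "results/*/progress.csv" pattern and the "return" column are placeholders.
scores = {}
for path in glob.glob("results/*/progress.csv"):
    returns = pd.read_csv(path)["return"].to_numpy()
    scores[path] = np.mean(returns[-100:])  # average over the last 100 steps

best = max(scores, key=scores.get)
print("best configuration:", best, "score:", scores[best])
```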
- Ensure all dependencies are installed before running experiments.
- The generate_bash.py script simplifies job submission to Compute Canada.
- PPO tuning involves a two-stage process for efficient optimization.
For issues or suggestions, raise an issue or contribute to this repository.