This repository provides comprehensive instructions to reproduce the results from our study on hyperparameter tuning for DQN, PPO, and SAC in hybrid-action environments. The algorithms were adapted to address the unique challenges posed by such environments:
- PPO and SAC: Modified using the PFRL framework to handle hybrid actions.
- DQN: The continuous action space is discretized into 24 discrete actions, derived from 8 predefined directions combined with 3 discrete action types (see the sketch after the direction list below).
The 8 predefined directions used for discretization are:
- Up: (0, 1)
- Up-Right: (1, 1)
- Right: (1, 0)
- Down-Right: (1, -1)
- Down: (0, -1)
- Down-Left: (-1, -1)
- Left: (-1, 0)
- Up-Left: (-1, 1)
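A minimal sketch of how such a discretization can be enumerated is shown below. The direction vectors follow the list above; the three action-type labels and the decoding helper are illustrative assumptions, not the repository's exact implementation.

```python
from itertools import product

# The 8 predefined movement directions listed above.
DIRECTIONS = [
    (0, 1),    # Up
    (1, 1),    # Up-Right
    (1, 0),    # Right
    (1, -1),   # Down-Right
    (0, -1),   # Down
    (-1, -1),  # Down-Left
    (-1, 0),   # Left
    (-1, 1),   # Up-Left
]

# Three discrete action types; the integer labels here are placeholders,
# not the identifiers used by AgarCL.
ACTION_TYPES = [0, 1, 2]

# Enumerate the 8 x 3 = 24 discrete actions DQN selects from.
DISCRETE_ACTIONS = list(product(DIRECTIONS, ACTION_TYPES))
assert len(DISCRETE_ACTIONS) == 24

def decode(action_index: int):
    """Map a DQN action index (0..23) back to a (direction, action_type) pair."""
    return DISCRETE_ACTIONS[action_index]
```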
These adaptations were critical for enabling the algorithms to operate effectively in the hybrid-action environment. For further details, refer to the accompanying paper.
If using Compute Canada, load the required modules with the following commands:
module load clang/17.0.6
module load cmake
module load python/3.11.5
module load scipy-stack
module load glfw
module load cuda/12.2
module load opencv/4.11.0
For other environments, refer to the AgarCL installation guide.
Install the required Python packages:
pip install -r requirements.txt
Navigate to the AgarLE directory and install the package:
cd AgarCL_DIR
python setup.py install --user # or
pip install -e .
Note: These installations are one-time processes and do not need to be repeated for subsequent experiments.
The hyperparameter tuning process is automated using scripts that facilitate job submission to cluster environments.
Use the following command to generate and submit the DQN tuning script:
python SubmitParallelJobs.py DQN.sh YOUR_DIR/DQN_full_action_set.py \
--lr=1e-5,3e-5,1e-4,3e-4 \
--batch_accumulator="sum","mean" \
--minibatch_size=32,64 \
--tau=1e-2,1e-3,5e-3 \
--seed=0,1,2
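As a rough illustration of what such a sweep expands to, the sketch below enumerates the Cartesian product of the values above and prints one training command per combination. The loop body and file names are assumptions for illustration only, not the actual contents of SubmitParallelJobs.py.

```python
from itertools import product

# Hyperparameter grid taken from the DQN tuning command above.
grid = {
    "lr": [1e-5, 3e-5, 1e-4, 3e-4],
    "batch_accumulator": ["sum", "mean"],
    "minibatch_size": [32, 64],
    "tau": [1e-2, 1e-3, 5e-3],
    "seed": [0, 1, 2],
}

# One training command per combination (4 * 2 * 2 * 3 * 3 = 144 jobs).
# The real script writes these into a batch file (e.g. DQN.sh) and submits it.
for values in product(*grid.values()):
    args = " ".join(f"--{key}={value}" for key, value in zip(grid, values))
    print(f"python YOUR_DIR/DQN_full_action_set.py {args}")
```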
Run the following command for SAC tuning:
python generate_bash.py SAC.sh SAC_full_action_set.py \
--reward="reward_gym","min_max" \
--lr=3e-5,1e-4,1e-5 \
--soft-update-tau=0.01,0.005,0.001 \
--temperature-lr=1e-4,1e-5 \
--max-grad-norm=0.5,0.7,0.9 \
--seed=0,1,2
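The --soft-update-tau values control how quickly SAC's target networks track the online networks. The snippet below is a generic Polyak (soft) update shown only to illustrate what this coefficient does; it is not PFRL's implementation.

```python
import torch

def soft_update(target_net: torch.nn.Module, online_net: torch.nn.Module, tau: float) -> None:
    """Polyak update: target <- tau * online + (1 - tau) * target."""
    with torch.no_grad():
        for t_param, o_param in zip(target_net.parameters(), online_net.parameters()):
            t_param.mul_(1.0 - tau).add_(tau * o_param)
```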
Run the following command for PPO tuning:
python generate_bash.py PPO.sh YOUR_DIR/PPO_multi_heads_full_action.py \
--reward="reward_gym","min_max" \
--lr=1e-5,3e-5,1e-4,3e-4 \
--epochs=10,15,20 \
--max-grad-norm=0.5,0.7,0.9 \
--entropy-coef=0.05,0.01,0.1,0.5 \
--clip-eps=0.2,0.4,0.5 \
--value-func-coef=0.5,0.7,0.9 \
--seed=0,1,2
After running the tuning scripts, use the get_best_hyper notebook to identify the best hyperparameters, using the average over the last 100 steps as the selection metric.
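A minimal sketch of that selection metric is shown below: for each configuration, average the last 100 logged values and keep the configuration with the highest score. The directory layout and column name are assumptions; the actual logic lives in the get_best_hyper notebook.

```python
import glob

import numpy as np
import pandas as pd

# Assumed layout: one CSV of logged returns per hyperparameter configuration.
# The "results/*/progress.csv" pattern and the "return" column are placeholders.
scores = {}
for path in glob.glob("results/*/progress.csv"):
    returns = pd.read_csv(path)["return"].to_numpy()
    scores[path] = np.mean(returns[-100:])  # average over the last 100 steps

best = max(scores, key=scores.get)
print("best configuration:", best, "score:", scores[best])
```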
- Ensure all dependencies are installed before running experiments.
- The generate_bash.py script simplifies job submission to Compute Canada.
- PPO tuning involves a two-stage process for efficient optimization.
For issues or suggestions, raise an issue or contribute to this repository.