Status: Under development (expect bug fixes and huge updates)
ShinRL is an open-source JAX library specialized for evaluating reinforcement learning (RL) algorithms from both theoretical and practical perspectives. Please see the paper for details, and try ShinRL with experiments/QuickStart.ipynb:
```python
import gym
import matplotlib.pyplot as plt

from shinrl import DiscreteViSolver

# make an env & a config
env = gym.make("ShinPendulum-v0")
config = DiscreteViSolver.DefaultConfig(
    explore="eps_greedy", approx="nn", steps_per_epoch=10000
)

# make & run a solver
mixins = DiscreteViSolver.make_mixins(env, config)
dqn_solver = DiscreteViSolver.factory(env, config, mixins)
dqn_solver.run()

# plot performance
returns = dqn_solver.scalars["Return"]
plt.plot(returns["x"], returns["y"])

# plot learned q-values (action == 0)
q0 = dqn_solver.data["Q"][:, 0]
env.plot_S(q0, title="Learned")
```

## ShinEnv

- `ShinEnv` provides small environments with oracle methods that can compute exact quantities (a hedged usage sketch follows the table below).
- Some environments support continuous action spaces and image observations:
- See the tutorial for details: experiments/Tutorials/ShinEnvTutorial.ipynb.
| Environment | Discrete action | Continuous action | Image Observation | Tuple Observation |
|---|---|---|---|---|
| ShinMaze | ✔️ | ❌ | ❌ | ✔️ |
| ShinMountainCar-v0 | ✔️ | ✔️ | ✔️ | ✔️ |
| ShinPendulum-v0 | ✔️ | ✔️ | ✔️ | ✔️ |
| ShinCartPole-v0 | ✔️ | ✔️ | ❌ | ✔️ |
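To make the oracle idea concrete, here is a minimal sketch of using a `ShinEnv` to compute an exact quantity. The method name `calc_optimal_q` and the registration side effect of `import shinrl` are assumptions for illustration, not confirmed API; see experiments/Tutorials/ShinEnvTutorial.ipynb for the actual methods.

```python
import gym
import shinrl  # assumed to register the Shin* environments on import

env = gym.make("ShinPendulum-v0")

# Oracle methods compute exact quantities without sampling.
# `calc_optimal_q` is a hypothetical name for the exact optimal Q-table.
q_star = env.calc_optimal_q()

# plot_S appears in the QuickStart above; plot values for action == 0.
env.plot_S(q_star[:, 0], title="Optimal")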
## Solver

- A `Solver` solves an environment with a specified algorithm.
- A "mixin" is a class that defines and implements a single feature. ShinRL's solvers are instantiated by mixing several mixins; a short sketch follows this list.
- See the tutorial for details: experiments/Tutorials/SolverTutorial.ipynb.
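The sketch below uses only the calls shown in the QuickStart above (`DefaultConfig`, `make_mixins`, `factory`); iterating over the returned mixins and printing `__name__` assumes they are plain Python classes.

```python
import gym
from shinrl import DiscreteViSolver

env = gym.make("ShinPendulum-v0")
config = DiscreteViSolver.DefaultConfig(explore="eps_greedy", approx="nn")

# make_mixins selects one mixin per feature based on the config
mixins = DiscreteViSolver.make_mixins(env, config)
for mixin in mixins:
    print(mixin.__name__)  # assumed: mixins are plain classes

# factory composes the mixins into a concrete solver instance
solver = DiscreteViSolver.factory(env, config, mixins)
```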
- The table below lists the implemented popular algorithms; a hedged configuration sketch follows the table's footnote.
- Note that it does not list all the implemented algorithms (e.g., the DDP <sup>1</sup> version of the DQN algorithm). See the `make_mixins` functions of solvers for implemented variants.
- Note that the implemented algorithms may differ from the original implementations for simplicity (e.g., Discrete SAC). See the source code of solvers for details.
| Algorithm | Solver | Configuration | Type <sup>1</sup> |
|---|---|---|---|
| Value Iteration (VI) | DiscreteViSolver | `approx == "tabular" & explore == "oracle"` | TDP |
| Policy Iteration (PI) | DiscretePiSolver | `approx == "tabular" & explore == "oracle"` | TDP |
| Conservative Value Iteration (CVI) | DiscreteViSolver | `approx == "tabular" & explore == "oracle" & er_coef != 0 & kl_coef != 0` | TDP |
| Tabular Q Learning | DiscreteViSolver | `approx == "tabular" & explore != "oracle"` | TRL |
| SARSA | DiscretePiSolver | `approx == "tabular" & explore != "oracle" & eps_decay_target_pol > 0` | TRL |
| Deep Q Network (DQN) | DiscreteViSolver | `approx == "nn" & explore != "oracle"` | DRL |
| Soft DQN | DiscreteViSolver | `approx == "nn" & explore != "oracle" & er_coef != 0` | DRL |
| Munchausen-DQN | DiscreteViSolver | `approx == "nn" & explore != "oracle" & er_coef != 0 & kl_coef != 0` | DRL |
| Double-DQN | DiscreteViSolver | `approx == "nn" & explore != "oracle" & use_double_q == True` | DRL |
| Discrete Soft Actor Critic | DiscretePiSolver | `approx == "nn" & explore != "oracle" & er_coef != 0` | DRL |
| Deep Deterministic Policy Gradient (DDPG) | ContinuousDdpgSolver | `approx == "nn" & explore != "oracle"` | DRL |
<sup>1</sup> Algorithm Type:

- TDP (`approx == "tabular" & explore == "oracle"`): Tabular Dynamic Programming algorithms. No exploration, no approximation, and the complete specification of the MDP is given.
- TRL (`approx == "tabular" & explore != "oracle"`): Tabular Reinforcement Learning algorithms. No approximation, and the dynamics and reward functions are unknown.
- DDP (`approx == "nn" & explore == "oracle"`): Deep Dynamic Programming algorithms. Same as TDP, except that neural networks approximate computed values.
- DRL (`approx == "nn" & explore != "oracle"`): Deep Reinforcement Learning algorithms. Same as TRL, except that neural networks approximate computed values.
## Installation

```bash
git clone git@github.com:omron-sinicx/ShinRL.git
cd ShinRL
pip install -e .
```

## Test

```bash
cd ShinRL
make test
```

## Format

```bash
cd ShinRL
make format
```

## Docker

```bash
cd ShinRL
docker-compose up
```

## Citation

```bibtex
# NeurIPS DRL WS 2021 version (pytorch branch)
@inproceedings{toshinori2021shinrl,
author = {Kitamura, Toshinori and Yonetani, Ryo},
title = {ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectives},
year = {2021},
booktitle = {Proceedings of the NeurIPS Deep RL Workshop},
}
# arXiv version (commit 2d3da)
@article{toshinori2021shinrlArxiv,
  author = {Kitamura, Toshinori and Yonetani, Ryo},
  title = {ShinRL: A Library for Evaluating RL Algorithms from Theoretical and Practical Perspectives},
  year = {2021},
  journal = {arXiv preprint arXiv:2112.04123},
  url = {https://arxiv.org/abs/2112.04123},
}
```



