Why does GAIL get lower rewards the more it is trained?

Hi, thank you for the baseline code; it has helped me a lot. I have run into a problem when running it, though. I first sample data from the trained expert policy and then provide it to GAIL, but in the Ant-v2 and Hopper-v2 environments the rewards get lower and lower as the number of training iterations increases. My setup is mujoco.py=2.0.8 with mujoco200. I would be very grateful if you could take the time to look into this.
![16571687510554_ pic](https://github.com/Khrylx/PyTorch-RL/assets/46175680/6ddab97b-dd7c-453b-a5ad-b6e104ca3a5e)
![16401687509779_ pic](https://github.com/Khrylx/PyTorch-RL/assets/46175680/6ddba10b-d71f-4221-9780-fbc339d14645)

Why does GAIL get lower rewards the more it is trained? #36