Open
Description
While reviewing the kl_dist calculation, I could not find any policy update before "logp_newpolicy_oldac" is computed. If the policy is unchanged, it assigns the same probabilities to the same actions under the same observations, so kl_dist comes out as 0.0 every time. Could you please review the code and let me know whether this is intended? I may be missing something. We can also discuss this in a meeting if you prefer.
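To illustrate the concern, here is a minimal sketch of what I mean. It does not use the repository's code: the linear softmax policy, `logp_of_actions`, `approx_kl`, and the sample-mean KL estimate are all assumptions made only for this example. Only the name `logp_newpolicy_oldac` is taken from the code in question, and `logp_oldpolicy_oldac` is a hypothetical counterpart.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def logp_of_actions(weights, obs, actions):
    # Hypothetical linear softmax policy: logits = obs @ weights.
    logp_all = log_softmax(obs @ weights)
    return logp_all[np.arange(len(actions)), actions]

def approx_kl(logp_old, logp_new):
    # Sample-based estimate of KL(old || new) on actions drawn from the old policy.
    return float(np.mean(logp_old - logp_new))

rng = np.random.default_rng(0)
obs = rng.normal(size=(64, 4))       # batch of observations
weights = rng.normal(size=(4, 3))    # "old" policy parameters

# Sample the old actions from the old policy.
probs = np.exp(log_softmax(obs @ weights))
actions = np.array([rng.choice(3, p=p) for p in probs])

logp_oldpolicy_oldac = logp_of_actions(weights, obs, actions)

# Case 1: no update between the two evaluations, so both log-probs are identical.
logp_newpolicy_oldac = logp_of_actions(weights, obs, actions)
kl_before_update = approx_kl(logp_oldpolicy_oldac, logp_newpolicy_oldac)

# Case 2: parameters changed (a random perturbation stands in for a gradient step).
updated_weights = weights + 0.1 * rng.normal(size=weights.shape)
logp_newpolicy_oldac = logp_of_actions(updated_weights, obs, actions)
kl_after_update = approx_kl(logp_oldpolicy_oldac, logp_newpolicy_oldac)

print("KL without update:", kl_before_update)
print("KL after update:", kl_after_update)
```

In this sketch the first value is exactly 0.0 because both log-probabilities come from the same parameters, which is the behavior I believe I am seeing. If the actual code performs the update somewhere I have overlooked, the second case would apply instead.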