kl dist calculation #1

@kyourin

In the kl_dist calculation, I see no update to the policy before "logp_newpolicy_oldac" is computed. As a result, kl_dist is 0.0 every time: the same policy gives the same probability distribution under the same observations. Could you please review the code and give feedback on this? I may be missing something. We can also discuss it in a meeting if you prefer.
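
To make the concern concrete, here is a minimal sketch of the situation, not the repository's actual code. It assumes a categorical softmax policy over a single observation and a sample-based KL estimate (the mean of old log-probabilities minus new log-probabilities). Everything except the name logp_newpolicy_oldac is hypothetical: the logits, the sampled actions, the parameter change, and the helper functions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def approx_kl(logp_old, logp_new):
    """Sample-based KL estimate: mean of log pi_old(a|s) - log pi_new(a|s)."""
    return sum(o - n for o, n in zip(logp_old, logp_new)) / len(logp_old)


# Hypothetical policy parameters (logits) and actions sampled under the old policy.
logits = [0.2, -0.1, 0.5]
actions = [0, 2, 2, 1]

logp_oldpolicy_oldac = [log_softmax(logits)[a] for a in actions]

# Case 1: the policy is NOT updated before the "new" log-probabilities are computed.
# Both sides come from identical parameters, so every difference is exactly zero.
logp_newpolicy_oldac = [log_softmax(logits)[a] for a in actions]
kl_no_update = approx_kl(logp_oldpolicy_oldac, logp_newpolicy_oldac)

# Case 2: the parameters change (hypothetical update) before re-evaluating.
updated_logits = [l + d for l, d in zip(logits, [0.3, 0.0, -0.3])]
logp_newpolicy_oldac = [log_softmax(updated_logits)[a] for a in actions]
kl_after_update = approx_kl(logp_oldpolicy_oldac, logp_newpolicy_oldac)

print("kl without update:", kl_no_update)
print("kl after update:  ", kl_after_update)
```

If the code follows the pattern of Case 1, kl_dist can only be zero, so the likely fix would be to compute "logp_newpolicy_oldac" after the policy update step rather than before it.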
