Reward sign in Eq. (8) seems inverted: node-death “penalty” has the highest positive value #2

@mohammed-menaplatforms

Description

While reading the reward design in the documentation/paper, I noticed a potential sign inconsistency in Eq. (8). The node-death case (when E_{n,t} ≤ 0 for some node n) receives the largest positive value (+2), although it is described as a penalty.

Screenshot for reference, Eq. (8) as written:

[screenshot of Eq. (8)]

Why this is problematic (for standard RL maximization)

Most RL algorithms (e.g., Q-learning, policy gradient) maximize expected return. Assigning the largest positive value to node death raises the action values of behaviors that cause death, which contradicts the stated intent of a penalty: under maximization, 2 is better than 1.1 and 1 (the toy example after the suggested fix below makes this concrete).

Possible explanations (please clarify which applies)

• R is actually a cost that the algorithm minimizes (e.g., added to a loss).
• There is a sign flip later (e.g., the code uses -R in the update), making the death term effectively -2.
• A typographical error in the equation; the death case should be negative (e.g., -2), or at least strictly smaller than the alive cases.

Suggested fix (if the objective is to maximize reward)

Use a negative value for node death, or make it strictly the smallest return:

[screenshot of proposed equation]
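To make the maximization point concrete, here is a toy illustration (not PyNetSim code; the action names and everything except the 2 / 1.1 / 1 values are hypothetical) showing that a plain Q-learning update drives a greedy policy toward the action that kills the node:

```python
import random

# Rewards as written in Eq. (8): the "penalty" case is numerically the largest.
rewards = {
    "action_kills_node": 2.0,  # stated penalty, but the best outcome under max
    "alive_case_a": 1.1,
    "alive_case_b": 1.0,
}

q = {a: 0.0 for a in rewards}  # action values
alpha = 0.1                    # learning rate

# Single-step problem, so the update reduces to Q(a) <- Q(a) + alpha * (r - Q(a)).
for _ in range(2000):
    a = random.choice(list(rewards))  # uniform exploration
    q[a] += alpha * (rewards[a] - q[a])

print(q)                  # Q-values converge toward the raw rewards
print(max(q, key=q.get))  # -> "action_kills_node": the greedy policy prefers death
```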

Optional shaping alternative:

[screenshot of the shaping alternative]
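Since the screenshot is not reproduced here, the following is only a generic sketch of a shaped reward that keeps node death strictly worst, not a reconstruction of the image; all names, constants, and the energy normalization are assumptions:

```python
DEATH_PENALTY = -2.0  # strictly smaller than any alive-case reward

def shaped_reward(energy_fraction: float, node_died: bool) -> float:
    """Reward in [0, 1] while the node lives, DEATH_PENALTY on death.

    energy_fraction is the node's remaining energy E_{n,t} / E_{n,0}
    (hypothetical normalization).
    """
    if node_died:
        return DEATH_PENALTY
    # Reward keeping energy high instead of a flat +1 / +1.1.
    return max(0.0, min(1.0, energy_fraction))

print(shaped_reward(0.8, node_died=False))  # 0.8
print(shaped_reward(0.0, node_died=True))   # -2.0
```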

Request
• Could you please confirm whether Eq. (8) is intended as a cost (to minimize) or a reward (to maximize)?
• If it is a reward to be maximized, would you consider changing the death case to a negative value (or documenting the sign flip in code, as sketched below)?
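For completeness, a minimal sketch (function names hypothetical, not from the PyNetSim codebase) of how the two conventions line up once the sign flip is applied consistently:

```python
def td_update_reward(q: float, r: float, alpha: float = 0.1) -> float:
    """Maximization convention: larger reward r is better."""
    return q + alpha * (r - q)

def td_update_cost(q: float, c: float, alpha: float = 0.1) -> float:
    """Minimization convention: cost c enters the update as -c."""
    return q + alpha * (-c - q)

# A death cost of +2 behaves exactly like a death reward of -2.
assert td_update_reward(0.0, -2.0) == td_update_cost(0.0, 2.0)
```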
Thanks for the great work on PyNetSim!
