While reading the reward design in the documentation/paper, I noticed a potential sign inconsistency in Eq. (8). The node-death case (when $E_{n,t} \le 0$ for some node $n$) receives the largest positive value (2), even though it is described as a penalty.
Equation (8) as written (the screenshot is not reproduced here) assigns the reward value 2 to the node-death case and the values 1.1 and 1 to the alive cases.
Why this is problematic (for standard RL maximization)
Most RL algorithms (e.g., Q-learning, policy gradient) maximize expected return. Assigning the largest positive value to node death increases the action values of behaviors that cause death, which contradicts the stated intention of a penalty: under maximization, 2 is better than 1.1 and 1.
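To make this concrete, here is a minimal tabular Q-learning sketch on a hypothetical one-state MDP (only the reward values 2 / 1.1 / 1 are taken from Eq. (8) as written; the action labels are my own for illustration). A maximizing agent converges to the action associated with node death:

```python
import numpy as np

# Hypothetical one-state MDP; only the reward magnitudes come from Eq. (8).
# Action 0: keeps all nodes alive                  -> reward 1
# Action 1: energy-efficient alive behavior        -> reward 1.1
# Action 2: drains a node to death ("penalty")     -> reward 2 (as written)
rewards = np.array([1.0, 1.1, 2.0])

q = np.zeros(3)
alpha, gamma, eps = 0.1, 0.9, 0.2
rng = np.random.default_rng(0)

for _ in range(10_000):
    # Epsilon-greedy action selection over the three behaviors.
    a = rng.integers(3) if rng.random() < eps else int(np.argmax(q))
    # Single-state problem: the next state is the same state.
    q[a] += alpha * (rewards[a] + gamma * np.max(q) - q[a])

print(q)                  # Q-values end up ordered like the rewards
print(int(np.argmax(q)))  # -> 2: the node-death action wins under maximization
```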
Possible explanations (please clarify which applies)
• R is actually a cost that the algorithm minimizes (e.g., added to a loss).
• There is a sign flip later (e.g., the code uses -R in the update), making the death term effectively -2; see the sketch after this list.
• Typographical error in the equation; the death case should be negative (e.g., -2) or at least strictly smaller than the alive cases.
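For the second possibility, a sign flip in the update would resolve the inconsistency. The following is a hypothetical snippet (not taken from the PyNetSim source) showing how negating the reward makes the written +2 behave as an effective -2:

```python
def td_update(q, state, action, reward, max_next_q,
              alpha=0.1, gamma=0.9, minimize_cost=True):
    """One tabular TD update (hypothetical helper, not PyNetSim code).

    With minimize_cost=True the reward is negated before forming the
    target, so the written +2 death reward acts as an effective -2.
    """
    target = (-reward if minimize_cost else reward) + gamma * max_next_q
    q[state, action] += alpha * (target - q[state, action])
    return q
```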
Suggested fix (if the objective is to maximize reward)
Use a negative value for node death, or make it strictly the smallest return:
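Concretely, keeping the alive-case values from Eq. (8) and flipping only the sign of the death case (the alive-case conditions are abbreviated here, since the screenshot is not reproduced):

$$
R_t =
\begin{cases}
-2, & \text{if } E_{n,t} \le 0 \text{ for some node } n \quad (\text{node death})\\
1.1, & \text{first alive case of Eq. (8)}\\
1, & \text{second alive case of Eq. (8)}
\end{cases}
$$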
Optional shaping alternative:
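One shaping choice that would also work (an assumption on my part, not taken from the paper; $\mathcal{A}_t$, $N$, and $\lambda$ are symbols I introduce here) is to reward the fraction of alive nodes and subtract an explicit death penalty:

$$
R_t = \frac{\lvert \mathcal{A}_t \rvert}{N} - \lambda\, \mathbb{1}\!\left[\exists\, n:\ E_{n,t} \le 0\right], \qquad \lambda > 0,
$$

where $\mathcal{A}_t$ is the set of alive nodes at time $t$ and $N$ is the total number of nodes. This keeps every return from a fully alive network strictly above any return that involves a death, and it degrades smoothly as nodes deplete.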
Request
• Could you please confirm whether Eq. (8) is intended as a cost (to minimize) or a reward (to maximize)?
• If it is a reward to be maximized, would you consider changing the death case to a negative value (or documenting the sign flip in code)?
Thanks for the great work on PyNetSim!