@cwing-nvidia
Contributor

Add Numeric Dueling Environment

Overview

Adds a new probabilistic, simultaneous-move, multi-round number-selection game designed to train LLM capabilities in numerical reasoning, risk assessment, and strategic thinking under uncertainty.

What's Implemented

Game engine with 9 rule variants (3 bust rules × 3 win rules):

  • 3 bust rules (Standard, Soft, Probabilistic)
  • 3 win rules (Highest, Closest to R, Cumulative)
  • Flexible configuration (rounds, number ranges, opponent types)
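As a rough illustration of how the 9 variants arise, the sketch below enumerates every pairing of a bust rule with a win rule. The enum and function names (`BustRule`, `WinRule`, `all_rule_variants`) are hypothetical and are not taken from the PR's code; only the rule names themselves come from the list above.

```python
from enum import Enum
from itertools import product


class BustRule(Enum):
    # Rule names from the PR description; the enum itself is illustrative.
    STANDARD = "standard"
    SOFT = "soft"
    PROBABILISTIC = "probabilistic"


class WinRule(Enum):
    HIGHEST = "highest"
    CLOSEST_TO_R = "closest_to_r"
    CUMULATIVE = "cumulative"


def all_rule_variants():
    """Return every (bust rule, win rule) pairing as a list of tuples."""
    return list(product(BustRule, WinRule))


if __name__ == "__main__":
    for bust, win in all_rule_variants():
        print(f"{bust.value} + {win.value}")
```

Enumerating the pairings this way is also a convenient basis for parameterizing tests across all rule combinations.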

Infrastructure:

  • Session-based state management for multi-round games
  • Dynamic prompt generation based on game state
  • Pluggable opponent interface (Random, Fixed, Adaptive)
  • Per-round verification with immediate reward signals
  • Full game history tracking
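One way a pluggable opponent interface could look is sketched below. This is an assumption about the shape of the interface, not the PR's actual code: the `Opponent` protocol, the `choose` method, and its parameters are all hypothetical. The Adaptive opponent is left out because the description does not say how it adapts.

```python
import random
from typing import List, Optional, Protocol


class Opponent(Protocol):
    """Hypothetical interface: pick a number in [low, high] for this round."""

    def choose(self, low: int, high: int, history: List[dict]) -> int:
        ...


class RandomOpponent:
    """Picks uniformly at random within the allowed range."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self._rng = random.Random(seed)

    def choose(self, low: int, high: int, history: List[dict]) -> int:
        return self._rng.randint(low, high)


class FixedOpponent:
    """Always plays the same number, clamped to the allowed range."""

    def __init__(self, value: int) -> None:
        self._value = value

    def choose(self, low: int, high: int, history: List[dict]) -> int:
        return max(low, min(high, self._value))
```

Because the interface is structural, a future opponent type (for example, one backed by another LLM) would only need to provide a compatible `choose` method.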

Testing:

  • 42 automated tests (18 unit + 13 integration + 11 error handling) - all passing
  • Human interactive client for manual testing
  • Tested all 9 rule variants with GPT-4

Documentation:

  • Comprehensive README with game spec
  • Detailed usage instructions
  • Test documentation

Request for Guidance: Multi-Round Support

This environment is designed for multi-round gameplay (e.g., 5 rounds per game), where the agent should:

  1. Call /seed_session (initialize game)
  2. Loop for N rounds:
    • Call /get_prompt (get current round's dynamic prompt)
    • Send prompt to LLM
    • Call /verify (submit LLM response, get reward)
  3. Game state persists across rounds via session management
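The loop above can be sketched as follows. The endpoint paths come from this description, but the payload and response fields (`session_id`, `prompt`, `response`, `reward`) are assumptions made for illustration, and `post` and `llm` are hypothetical callables injected by the caller rather than NeMo Gym APIs.

```python
from typing import Any, Callable, Dict, List

# Hypothetical transport: takes an endpoint path and a JSON payload,
# returns the decoded JSON response.
Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]
# Hypothetical model call: takes a prompt string, returns the model's reply.
Llm = Callable[[str], str]


def play_episode(post: Post, llm: Llm, num_rounds: int, session_id: str) -> List[float]:
    """Run one multi-round game and return the per-round rewards."""
    # Step 1: initialize the game for this session.
    post("/seed_session", {"session_id": session_id})

    rewards: List[float] = []
    for _ in range(num_rounds):
        # Step 2a: fetch the prompt for the current round.
        prompt = post("/get_prompt", {"session_id": session_id})["prompt"]
        # Step 2b: ask the model for its move.
        response = llm(prompt)
        # Step 2c: submit the move and collect the round's reward.
        result = post("/verify", {"session_id": session_id, "response": response})
        rewards.append(result["reward"])
    return rewards
```

Injecting `post` keeps the sketch independent of any particular HTTP client, so the same loop can be exercised against a stub in tests or against the running server.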

Observed Behavior:

  • ng_collect_rollouts + simple_agent calls /verify once per JSONL row
  • Human interactive client successfully loops through all rounds by repeatedly calling /verify
  • Server correctly maintains state across multiple /verify calls

Questions:

  • Does NeMo Gym currently support multi-turn environments where agents need to call /verify multiple times per episode (not just multiple tool calls within one turn)?
  • Are there existing environments with similar multi-round patterns I can reference?
  • What's the intended pattern for environments where:
    • Prompts are dynamically generated based on evolving game state (via /get_prompt)
    • State persists across multiple agent-environment interaction cycles
    • Multiple verification steps are needed per episode

The environment itself is fully functional - it's the agent workflow integration that needs guidance.

…ssessment and multi-step planning

Add numeric_dueling environment with:
- Flexible game engine supporting various rule combinations (3 bust × 3 win rules)
- Multi-round gameplay with stateful session management
- Opponent types: Random, Fixed, Adaptive (extensible for future LLM vs LLM)
- Human testing client for interactive gameplay
- 42 automated tests (18 unit + 13 integration + 11 error handling)
- Comprehensive documentation

Signed-off-by: Chris Wing <[email protected]>
@cwing-nvidia added the resource-server label on Nov 9, 2025
@copy-pr-bot
copy-pr-bot bot commented Nov 9, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Labels

resource-server Resource servers (math, code, etc.)

2 participants