Add game environment: numeric dueling #285

cwing-nvidia · 2025-11-09T05:10:35Z

Add Numeric Dueling Environment

Overview

Adds a new probabilistic, simultaneous-move, multi-round number-selection game designed to train LLM capabilities in numerical reasoning, risk assessment, and strategic thinking under uncertainty.

What's Implemented

Game engine with 9 rule variants (3 bust rules × 3 win rules):

3 bust rules (Standard, Soft, Probabilistic)
3 win rules (Highest, Closest to R, Cumulative)
Flexible configuration (rounds, number ranges, opponent types)
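
For reviewers, the 9 variants are the cross product of the bust and win rules listed above. A minimal sketch of that enumeration follows; the enum and function names are illustrative assumptions, not the identifiers used in this PR.

```python
from enum import Enum
from itertools import product


class BustRule(Enum):
    # Rule names come from the PR description; identifiers are hypothetical.
    STANDARD = "standard"
    SOFT = "soft"
    PROBABILISTIC = "probabilistic"


class WinRule(Enum):
    HIGHEST = "highest"
    CLOSEST_TO_R = "closest_to_r"
    CUMULATIVE = "cumulative"


def all_rule_variants():
    """Return every (bust rule, win rule) pair as a list of tuples."""
    return list(product(BustRule, WinRule))


variants = all_rule_variants()
for bust, win in variants:
    print(f"{bust.value} + {win.value}")
```

Iterating over `variants` is one way a parametrized test could cover each combination.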

Infrastructure:

Session-based state management for multi-round games
Dynamic prompt generation based on game state
Pluggable opponent interface (Random, Fixed, Adaptive)
Per-round verification with immediate reward signals
Full game history tracking
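
To illustrate what a pluggable opponent interface could look like, here is a rough sketch. The `Opponent` protocol, the `choose` signature, and in particular the adaptive strategy (mirroring the agent's average past pick) are assumptions made for illustration; the behaviour of the actual classes in this PR may differ.

```python
import random
from typing import Optional, Protocol, Sequence


class Opponent(Protocol):
    """Hypothetical interface: pick a number in [low, high] for this round."""

    def choose(self, agent_history: Sequence[int], low: int, high: int) -> int: ...


class RandomOpponent:
    def __init__(self, seed: Optional[int] = None) -> None:
        # A private RNG keeps games reproducible when a seed is given.
        self._rng = random.Random(seed)

    def choose(self, agent_history: Sequence[int], low: int, high: int) -> int:
        return self._rng.randint(low, high)


class FixedOpponent:
    def __init__(self, value: int) -> None:
        self.value = value

    def choose(self, agent_history: Sequence[int], low: int, high: int) -> int:
        # Clamp so a misconfigured value never leaves the legal range.
        return max(low, min(high, self.value))


class AdaptiveOpponent:
    """Assumed strategy: play the rounded mean of the agent's past picks."""

    def choose(self, agent_history: Sequence[int], low: int, high: int) -> int:
        if not agent_history:
            return (low + high) // 2  # no information yet: play the midpoint
        guess = round(sum(agent_history) / len(agent_history))
        return max(low, min(high, guess))
```

Because the game engine would depend only on `choose`, a future LLM-backed opponent could be added without touching the engine.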

Testing:

42 automated tests (18 unit + 13 integration + 11 error handling) - all passing
Human interactive client for manual testing
Tested all 9 rule variants with GPT-4

Documentation:

Comprehensive README with game spec
Detailed usage instructions
Test documentation

Request for Guidance: Multi-Round Support

This environment is designed for multi-round gameplay (e.g., 5 rounds per game), where the agent should:

Call /seed_session (initialize game)
Loop for N rounds:
- Call /get_prompt (get current round's dynamic prompt)
- Send prompt to LLM
- Call /verify (submit LLM response, get reward)
Game state persists across rounds via session management
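
The loop above can be sketched as follows. Only the three endpoint paths come from this PR; the payload fields (`session_id`, `num_rounds`, `prompt`, `response`, `reward`), the `post` transport callable, and the in-memory fake server are assumptions so that the example runs without a live resource server.

```python
from typing import Callable, Dict, List

# (path, json_body) -> json_response; stands in for an HTTP client.
Post = Callable[[str, dict], dict]


def run_episode(post: Post, llm: Callable[[str], str], num_rounds: int) -> List[float]:
    """Play one multi-round game and return the per-round rewards."""
    session_id = post("/seed_session", {"num_rounds": num_rounds})["session_id"]
    rewards = []
    for _ in range(num_rounds):
        # The prompt is regenerated each round from the current game state.
        prompt = post("/get_prompt", {"session_id": session_id})["prompt"]
        response = llm(prompt)
        result = post("/verify", {"session_id": session_id, "response": response})
        rewards.append(result["reward"])
    return rewards


def make_fake_server() -> Post:
    """In-memory stand-in that tracks the round counter per session."""
    rounds_played: Dict[str, int] = {}

    def post(path: str, body: dict) -> dict:
        if path == "/seed_session":
            sid = f"s{len(rounds_played)}"
            rounds_played[sid] = 0
            return {"session_id": sid}
        sid = body["session_id"]
        if path == "/get_prompt":
            return {"prompt": f"Round {rounds_played[sid] + 1}: pick a number."}
        if path == "/verify":
            rounds_played[sid] += 1
            # Toy reward: 1.0 if the response parses as a number, else 0.0.
            return {"reward": 1.0 if body["response"].strip().isdigit() else 0.0}
        raise ValueError(f"unknown path: {path}")

    return post


rewards = run_episode(make_fake_server(), lambda prompt: "7", num_rounds=5)
print(rewards)
```

The point of the sketch is that `/verify` is called once per round inside a single episode, which is the pattern the questions below ask about.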

Observed Behavior:

The ng_collect_rollouts + simple_agent pipeline calls /verify only once per JSONL row
Human interactive client successfully loops through all rounds by repeatedly calling /verify
Server correctly maintains state across multiple /verify calls

Questions:

Does NeMo Gym currently support multi-turn environments where agents need to call /verify multiple times per episode (not just multiple tool calls within one turn)?
Are there existing environments with similar multi-round patterns I can reference?
What's the intended pattern for environments where:
- Prompts are dynamically generated based on evolving game state (via /get_prompt)
- State persists across multiple agent-environment interaction cycles
- Multiple verification steps are needed per episode

The environment itself is fully functional - it's the agent workflow integration that needs guidance.

…ssessment and multi-step planning

Add numeric_dueling environment with:
- Flexible game engine supporting various rule combinations (3 bust × 3 win rules)
- Multi-round gameplay with stateful session management
- Opponent types: Random, Fixed, Adaptive (extensible for future LLM vs LLM)
- Human testing client for interactive gameplay
- 42 automated tests (18 unit + 13 integration + 11 error handling)
- Comprehensive documentation

Signed-off-by: Chris Wing <[email protected]>

copy-pr-bot · 2025-11-09T05:10:40Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

cwing-nvidia added 3 commits November 8, 2025 20:02

add LLM testing with example rollouts

78e87f2

Signed-off-by: Chris Wing <[email protected]>

docs: clean up numeric_dueling README intro for clarity

c04252d

Signed-off-by: Chris Wing <[email protected]>

cwing-nvidia requested review from bxyu-nvidia and fsiino-nvidia November 9, 2025 05:10

cwing-nvidia added the resource-server Resource servers (math, code, etc.) label Nov 9, 2025
