[feat]: added one-logger support for NeMo RL #1631

therealnaveenkamal · 2025-12-14T01:09:51Z

What does this PR do ?

Adds OneLogger (https://github.com/NVIDIA/nv-one-logger) support for NeMo-RL

Issues

Addresses #1152

Usage

You can potentially add a usage example below

uv run examples/run_sft.py logger.one_logger_enabled=True logger.one_logger.project=my-nemo-project logger.one_logger.name=my-rl-run

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Summary by CodeRabbit

Release Notes

New Features
- Added One Logger as a new logging backend for experiment tracking, metrics, and hyperparameters.
- Supports configuration alongside existing logging backends (Weights & Biases, TensorBoard, MLflow).
- Includes new configuration options to enable and customize One Logger in training runs.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Signed-off-by: Naveenraj Kamalakannan <[email protected]>

coderabbitai · 2025-12-14T01:15:42Z

📝 Walkthrough

Walkthrough

This pull request integrates NV-One-Logger as a new logging backend into the NeMo RL framework. It adds the nv-one-logger submodule, creates wrapper packages with setup and configuration files, introduces the OneLogger class implementing the LoggerInterface, and updates project dependencies and example configurations to support the new logger backend.

Changes

Cohort / File(s)	Change Summary
Submodule and Workspace Setup `.gitmodules`, `3rdparty/NV-One-Logger-workspace/nv-one-logger`	Adds new git submodule reference for NV-One-Logger at `3rdparty/NV-One-Logger-workspace/nv-one-logger` pointing to `[email protected]:NVIDIA/nv-one-logger.git`
NV-One-Logger Root Workspace `3rdparty/NV-One-Logger-workspace/pyproject.toml`, `3rdparty/NV-One-Logger-workspace/setup.py`, `3rdparty/NV-One-Logger-workspace/is_nvlogger_installed.py`	Adds project configuration and setup script for the core nv-one-logger workspace, including dependency resolution from nested pyproject.toml and package discovery; adds utility script to check nv_one_logger installation status
Wrapper Package Configurations `3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-otel/`, `3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-training-telemetry/`, `3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-wandb/*`	Adds pyproject.toml and setup.py files for three wrapper packages (otel, training-telemetry, wandb), each with Poetry dependency parsing and setuptools package configuration
Logger Module Integration `nemo_rl/utils/logger.py`	Introduces OneLogger class implementing LoggerInterface, OneLoggerConfigDict TypedDict, extends LoggerConfig with one_logger configuration options, integrates OneLogger backend initialization into Logger class with conditional enablement per rank
Example Configurations `examples/configs/grpo_math_1B.yaml`, `examples/configs/sft.yaml`	Adds optional one_logger configuration blocks with project and name fields to logger section in example config files
Project Dependencies `pyproject.toml`	Adds four nv-one-logger packages (core, otel, training-telemetry, wandb) as dependencies and optional dependencies, registers workspace entries for UV package manager, and includes packages in no-build-isolation list

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

nemo_rl/utils/logger.py: New OneLogger class and LoggerConfig TypedDict extensions introduce public API surface; apparent code duplication at end of file; GPU-event integration scaffolding uses new API patterns
Setup.py files (3rdparty/*/setup.py): Dependency parsing logic converts Poetry format to pip install_requires across four similar files; potential edge cases in version handling and path resolution
pyproject.toml workspace configuration: Multiple new dependencies and workspace entries; verify correctness of workspace member paths and dependency declarations
Config file additions: Straightforward additions but verify schema compatibility with logger configuration expectations

Suggested labels

CI, Run CICD

Suggested reviewers

terrykong
parthchadha
yfw

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
Test Results For Major Changes	⚠️ Warning	PR introduces OneLogger class with full logging functionality but no tests were added for it; existing test file contains no OneLogger references.	Add comprehensive unit tests for OneLogger class to tests/unit/utils/test_logger.py covering initialization, logging methods, and Logger class integration.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'added one-logger support for NeMo RL' directly and clearly summarizes the main change: integrating OneLogger (nv-one-logger) into the NeMo RL framework. It matches the core objective described in the PR.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

nemo_rl/utils/logger.py (1)
948-978: Require explicit cfg["one_logger"] configuration when one_logger_enabled is True, consistent with other backends.

The one_logger initialization uses cfg.get("one_logger", {}) which allows silent fallback to an empty config dict, whereas wandb, swanlab, tensorboard, and mlflow all require their config keys to be present (using direct cfg[key] access). This inconsistency can result in OneLogger being silently initialized with defaults instead of failing fast when configuration is missing.

Change from:
if cfg["one_logger_enabled"]:
    # OneLogger needs no log dir setup
    self.one_logger = OneLogger(cfg.get("one_logger", {}), log_dir=self.base_log_dir)
    self.loggers.append(self.one_logger)
To:
if cfg["one_logger_enabled"]:
    one_logger_cfg = cfg["one_logger"]  # Require key to be present
    # OneLogger needs no log dir setup
    self.one_logger = OneLogger(one_logger_cfg, log_dir=self.base_log_dir)
    self.loggers.append(self.one_logger)
This matches the pattern used by all other logging backends and ensures configuration is explicitly provided.

🧹 Nitpick comments (7)

pyproject.toml (1)
191-191: Minor clarity improvement for the comment.

The comment starting with "If we don't do this" could be slightly clearer about the consequence.

Consider this diff for improved clarity:
-  # If we don't do this, the research projects do not see the global uv.lock, and may mistakenly
+  # If we don't do this, research projects will not see the global uv.lock and may mistakenly
3rdparty/NV-One-Logger-workspace/pyproject.toml (1)

5-9: Consider aligning Python version requirement with project standards.

The requires-python = ">=3.8" may conflict with the project's Python 3.12+ guideline. If this wrapper is intended for use only within NeMo-RL (which targets 3.12+), consider updating to requires-python = ">=3.12". However, if maintaining compatibility with the upstream nv-one-logger package is intentional, this can remain as-is.
3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-otel/setup.py (2)
26-30: Poetry caret conversion loses upper-bound semantics.

Poetry's ^ specifier (e.g., ^1.2.3) means >=1.2.3, <2.0.0 for major versions. Converting to just >=1.2.3 removes the upper bound, potentially allowing breaking changes. If strict version compatibility is needed, consider using the poetry-core or poetry-semver libraries for accurate conversion, or explicitly set upper bounds.

17-59: Consider extracting shared logic to reduce duplication.

This dependency extraction and setup logic is duplicated across all four setup.py files (core, otel, wandb, training-telemetry). Consider creating a shared utility module (e.g., _setup_utils.py) that handles Poetry-to-setuptools conversion, then have each setup.py import and call it with package-specific parameters.

Example shared utility approach:
# 3rdparty/NV-One-Logger-workspace/_setup_utils.py
def setup_nv_logger_package(src_dir, name, description):
    """Shared setup logic for nv-one-logger wrappers."""
    # ... common logic here ...
Then each setup.py becomes:
from _setup_utils import setup_nv_logger_package
setup_nv_logger_package(
    src_dir=Path(__file__).parent.parent.parent / "nv-one-logger" / "nv_one_logger" / "one_logger_otel",
    name="nv-one-logger-otel",
    description="Extensions to onelogger library to use Open telemetry (OTEL) as a backend.",
)
3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-wandb/setup.py (1)

6-59: Duplicated setup logic.

This file duplicates the same Poetry-to-setuptools conversion logic found in the other wrapper setup.py files. Please consider the shared utility refactor suggested in the otel wrapper review to improve maintainability.

3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-training-telemetry/setup.py (1)

6-59: Duplicated setup logic.

Same duplication concern as other wrappers. Please consider the shared utility refactor suggested in the otel wrapper review.

3rdparty/NV-One-Logger-workspace/setup.py (1)

6-59: Duplicated setup logic - consider this file as the host for shared utility.

This root setup.py shares identical logic with all wrapper setup.py files. Consider placing a shared _setup_utils.py module here in 3rdparty/NV-One-Logger-workspace/ that all setup.py files can import from.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7e5df0c and 046c049.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (15)

.gitmodules (1 hunks)
3rdparty/NV-One-Logger-workspace/is_nvlogger_installed.py (1 hunks)
3rdparty/NV-One-Logger-workspace/nv-one-logger (1 hunks)
3rdparty/NV-One-Logger-workspace/pyproject.toml (1 hunks)
3rdparty/NV-One-Logger-workspace/setup.py (1 hunks)
3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-otel/pyproject.toml (1 hunks)
3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-otel/setup.py (1 hunks)
3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-training-telemetry/pyproject.toml (1 hunks)
3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-training-telemetry/setup.py (1 hunks)
3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-wandb/pyproject.toml (1 hunks)
3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-wandb/setup.py (1 hunks)
examples/configs/grpo_math_1B.yaml (1 hunks)
examples/configs/sft.yaml (1 hunks)
nemo_rl/utils/logger.py (7 hunks)
pyproject.toml (4 hunks)

🧰 Additional context used

📓 Path-based instructions (4)

!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-training-telemetry/pyproject.toml
.gitmodules
examples/configs/grpo_math_1B.yaml
3rdparty/NV-One-Logger-workspace/setup.py
3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-wandb/setup.py
pyproject.toml
3rdparty/NV-One-Logger-workspace/nv-one-logger
3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-otel/pyproject.toml
3rdparty/NV-One-Logger-workspace/pyproject.toml
3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-training-telemetry/setup.py
3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-wandb/pyproject.toml
3rdparty/NV-One-Logger-workspace/is_nvlogger_installed.py
examples/configs/sft.yaml
3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-otel/setup.py
nemo_rl/utils/logger.py

**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Conform code to Python 3.12+
Indent code with 4 spaces. Do not use tabs
Use snake_case for file names
Use PascalCase for class names
Use snake_case for function and method names
Use snake_case for local variables
Prefix variable names that start with a number with 'k' (e.g., k_99th_percentile)
Use upper snake_case with 'G' prefix for global variables (e.g., G_MY_GLOBAL)
Use upper snake_case for constants
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
Prefer docstrings over comments for interfaces that may be used outside a file
Reserve comments for code within a function or interfaces that are local to a file
If a piece of code is commented out, include a comment describing its usage and why it's commented out. Remove debug comments before merging
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx
Avoid using reflection when functionality can be easily achieved without reflection
When using try-except blocks, limit the except clause to the smallest set of specific errors possible
When using try-except blocks for duck-typing, keep the body of the try as small as possible and use the else block for logic
YAML is the single source of truth for configuration defaults. Do not set non-None defaults in code for configuration values
For required configuration attributes, access config directly and expect presence (e.g., policy_cfg['precision']) without hidden defaults
Use typing.NotRequired to mark optional attributes in TypedDict for configuration
When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml
Follow the Google Python Style Guide for Python code

Files:

3rdparty/NV-One-Logger-workspace/setup.py
3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-wandb/setup.py
3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-training-telemetry/setup.py
3rdparty/NV-One-Logger-workspace/is_nvlogger_installed.py
3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-otel/setup.py
nemo_rl/utils/logger.py

**/*.{py,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

The NVIDIA copyright header should appear at the top of all Python files and shell scripts (excluding tests)

Files:

3rdparty/NV-One-Logger-workspace/setup.py
3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-wandb/setup.py
3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-training-telemetry/setup.py
3rdparty/NV-One-Logger-workspace/is_nvlogger_installed.py
3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-otel/setup.py
nemo_rl/utils/logger.py

nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

For any source file under nemo_rl/*.py that defines a class or function decorated with @ray.remote, add a coverage pragma (# pragma: no cover) because these run in separate Ray processes

Files:

nemo_rl/utils/logger.py

🪛 Ruff (0.14.8)

3rdparty/NV-One-Logger-workspace/setup.py

10-10: Avoid specifying long messages outside the exception class

(TRY003)

3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-wandb/setup.py

10-10: Avoid specifying long messages outside the exception class

(TRY003)

3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-training-telemetry/setup.py

10-10: Avoid specifying long messages outside the exception class

(TRY003)

3rdparty/NV-One-Logger-workspace/is_nvlogger_installed.py

15-15: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-otel/setup.py

10-10: Avoid specifying long messages outside the exception class

(TRY003)

nemo_rl/utils/logger.py

467-467: Unused method argument: log_dir

(ARG002)

512-512: Unused method argument: step_metric

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Post submodule check comment / Comment on PR
GitHub Check: Post automodel integration comment / Comment on PR

🔇 Additional comments (5)

examples/configs/sft.yaml (1)

209-212: LGTM! Consistent logger configuration pattern.

The new one_logger configuration follows the established pattern used by other loggers (wandb, swanlab) with appropriate variable interpolation and safe default (one_logger_enabled: false).

examples/configs/grpo_math_1B.yaml (1)

286-289: LGTM! Consistent with sft.yaml configuration.

The one_logger configuration is consistent with the pattern established in sft.yaml and other logger configurations, ensuring uniformity across example configs.

3rdparty/NV-One-Logger-workspace/is_nvlogger_installed.py (1)

1-21: LGTM! Copyright header and structure are correct.

The file follows coding guidelines correctly:

NVIDIA copyright header with current year (2025)

Upper snake case for module-level constant INSTALLED

Clean try/except pattern for import detection

The Ruff RUF100 hint about the unused noqa directive is a false positive—the noqa: F401 is necessary since F401 (unused import) is enabled in the project's ruff configuration (line 316 of root pyproject.toml), and the import is intentionally unused as it's only checked for availability.

pyproject.toml (2)

158-161: LGTM! Proper workspace configuration.

The workspace sources, members, and no-build-isolation-package entries are correctly configured and follow the same pattern as other workspace packages (e.g., megatron-core, megatron-bridge).

Also applies to: 185-189, 218-221

52-55: The nv-one-logger packages should remain as required dependencies in dependencies, not optional.

The packages are unconditionally imported at the top level of nemo_rl/utils/logger.py (lines 57–63), which itself is unconditionally imported across all main example scripts and algorithm modules (sft.py, distillation.py, dpo.py, grpo.py, rm.py, etc.). If nv-one-logger were optional, importing any of these modules would fail if the packages were not installed. The one_logger_enabled configuration flag only controls whether the logger is used, not whether it can be imported.

coderabbitai · 2025-12-14T01:15:45Z

.gitmodules

+[submodule "3rdparty/NV-One-Logger-workspace/nv-one-logger"]
+	path = 3rdparty/NV-One-Logger-workspace/nv-one-logger
+	url = [email protected]:NVIDIA/nv-one-logger.git


🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n .gitmodules

Repository: NVIDIA-NeMo/RL

Length of output: 1102

🏁 Script executed:

curl -s -o /dev/null -w "%{http_code}" https://github.com/NVIDIA/nv-one-logger

Repository: NVIDIA-NeMo/RL

Length of output: 61

Add branch and shallow clone configuration for consistency.

The new submodule entry lacks branch and shallow properties that are present in all four existing submodules. Additionally, it uses an SSH URL instead of HTTPS like all other NVIDIA submodules. This creates inconsistency and may lead to unintended behavior:

Without shallow = true, the submodule will perform full clones instead of shallow clones, increasing download size and time.

Without a branch specification, the submodule will track HEAD rather than a stable, pinned branch.

The SSH URL ([email protected]:) differs from the HTTPS URLs used for all other NVIDIA submodules in this file.

Apply this diff to align with existing submodule configurations:

[submodule "3rdparty/NV-One-Logger-workspace/nv-one-logger"] path = 3rdparty/NV-One-Logger-workspace/nv-one-logger - url = [email protected]:NVIDIA/nv-one-logger.git + url = https://github.com/NVIDIA/nv-one-logger.git + branch = main + shallow = true

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

[submodule "3rdparty/NV-One-Logger-workspace/nv-one-logger"]

path = 3rdparty/NV-One-Logger-workspace/nv-one-logger

url = [email protected]:NVIDIA/nv-one-logger.git

[submodule "3rdparty/NV-One-Logger-workspace/nv-one-logger"]

path = 3rdparty/NV-One-Logger-workspace/nv-one-logger

url = https://github.com/NVIDIA/nv-one-logger.git

branch = main

shallow = true

🤖 Prompt for AI Agents

In .gitmodules around lines 21-23, the NV-One-Logger submodule entry is missing the branch and shallow settings and uses an SSH URL; update the entry to match the other NVIDIA submodules by changing the URL to the HTTPS form, adding a branch line matching the other NVIDIA submodules (e.g., branch = main or the same branch used by the other entries), and add shallow = true so the submodule is shallow-cloned.

coderabbitai · 2025-12-14T01:15:45Z

3rdparty/NV-One-Logger-workspace/nv-one-logger

@@ -0,0 +1 @@
+Subproject commit c382dd5193ae987233c7a61de33fb16f91724b7d


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash echo "=== Checking repository structure ===" ls -la 3rdparty/ 2>/dev/null | head -20 || echo "3rdparty directory listing failed" echo -e "\n=== Checking if submodule file exists ===" if [ -f "3rdparty/NV-One-Logger-workspace/nv-one-logger" ]; then echo "File found. Content:" cat "3rdparty/NV-One-Logger-workspace/nv-one-logger" else echo "File not found at specified path" fi echo -e "\n=== Checking .gitmodules configuration ===" if [ -f .gitmodules ]; then echo ".gitmodules content:" cat .gitmodules else echo ".gitmodules file not found" fi echo -e "\n=== Searching for OneLogger references in codebase ===" rg -i "onelogger|one.?logger" --type py --type sh 2>/dev/null | head -30 || echo "No OneLogger references found" echo -e "\n=== Searching for test files ===" find . -type f \( -name "*test*.py" -o -name "*test*.sh" \) 2>/dev/null | grep -i "logger\|log" || echo "No logger-related test files found" echo -e "\n=== Searching for documentation files mentioning OneLogger ===" rg -i "onelogger|one.?logger" --type md --type txt 2>/dev/null | head -30 || echo "No documentation found mentioning OneLogger" echo -e "\n=== Listing Python files (excluding tests) to check for imports ===" fd -e py . --exclude tests --exclude test_\* 2>/dev/null | xargs grep -l "onelogger\|nv_one_logger" 2>/dev/null || echo "No Python files import OneLogger"

Repository: NVIDIA-NeMo/RL

Length of output: 4838

🏁 Script executed:

head -100 tests/unit/utils/test_logger.py

Repository: NVIDIA-NeMo/RL

Length of output: 3519

🏁 Script executed:

rg -n "OneLogger" tests/unit/utils/test_logger.py

Repository: NVIDIA-NeMo/RL

Length of output: 40

Add OneLogger integration tests and documentation.

OneLogger is integrated in nemo_rl/utils/logger.py but lacks test coverage and documentation. Before merging:

Add unit tests for the OneLogger class in tests/unit/utils/test_logger.py, including configuration, initialization, and logging functionality.

Add documentation describing OneLogger setup, configuration options, and usage examples (e.g., README or integration guide).

Verify the submodule commit hash in .gitmodules is accessible and stable in the NVIDIA/nv-one-logger repository.

🤖 Prompt for AI Agents

In 3rdparty/NV-One-Logger-workspace/nv-one-logger around lines 1 to 1, tests and documentation for OneLogger are missing; add unit tests at tests/unit/utils/test_logger.py that instantiate OneLogger with various valid/invalid configurations, assert initialization state, and verify that log methods emit expected messages (use a temporary config and capture logs or mock handlers), create documentation (README or integration guide) detailing setup steps, available configuration options and usage examples showing initialization and common logging calls, and finally confirm the .gitmodules submodule commit hash points to an accessible, stable commit in the NVIDIA/nv-one-logger repo and update it if necessary.

coderabbitai · 2025-12-14T01:15:45Z

3rdparty/NV-One-Logger-workspace/setup.py

+import sys
+import toml
+from pathlib import Path
+import setuptools


🛠️ Refactor suggestion | 🟠 Major

Missing NVIDIA copyright header and unused import.

Same issues as the wrapper setup.py files: add the NVIDIA copyright header and remove the unused sys import. As per coding guidelines for Python files.

Apply this diff:

+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + -import sys import toml from pathlib import Path import setuptools

🤖 Prompt for AI Agents

In 3rdparty/NV-One-Logger-workspace/setup.py around lines 1-4, the file is missing the required NVIDIA copyright header and contains an unused import of sys; remove the unused "import sys" line and prepend the standard NVIDIA copyright header comment block at the top of the file (matching other wrapper setup.py files in the repo) so the file begins with the header followed by the remaining imports (toml, Path, setuptools).

coderabbitai · 2025-12-14T01:15:46Z

3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-otel/pyproject.toml

+[project]
+name = "nv-one-logger-otel"
+description = "Wrapper for nv-one-logger-otel"
+requires-python = ">=3.8"


⚠️ Potential issue | 🟡 Minor

Inconsistent Python version requirement.

The requires-python = ">=3.8" setting is inconsistent with the root pyproject.toml which specifies requires-python = ">=3.12" (line 16) and the coding guidelines which require Python 3.12+ conformance. This could allow installation in unsupported Python environments.

Apply this diff to align with project standards:

-requires-python = ">=3.8" +requires-python = ">=3.12"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

requires-python = ">=3.8"

requires-python = ">=3.12"

🤖 Prompt for AI Agents

In 3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-otel/pyproject.toml around line 8, update the Python requirement from ">=3.8" to ">=3.12" to match the root pyproject.toml and project guidelines; edit the requires-python field to ">=3.12" so the package cannot be installed on older, unsupported Python versions and ensure any related CI/metadata remains consistent.

coderabbitai · 2025-12-14T01:15:46Z

3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-otel/setup.py

+import sys
+import toml
+from pathlib import Path
+import setuptools


🛠️ Refactor suggestion | 🟠 Major

Missing NVIDIA copyright header and unused import.

Per coding guidelines, Python files require the NVIDIA copyright header. Also, sys is imported but never used.

Apply this diff:

+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + -import sys import toml from pathlib import Path import setuptools

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

import sys

import toml

from pathlib import Path

import setuptools

# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

# SPDX-License-Identifier: Apache-2.0

#

# Licensed under the Apache License, Version 2.0 (the "License");

# you may not use this file except in compliance with the License.

# You may obtain a copy of the License at

#

# http://www.apache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

import toml

from pathlib import Path

import setuptools

🤖 Prompt for AI Agents

In 3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-otel/setup.py lines 1-4, the file is missing the required NVIDIA copyright header and imports an unused module (`sys`); add the standard NVIDIA Python file header comment block at the very top of the file and remove the unused `import sys` line so the file only imports required modules (toml, Path, setuptools). Ensure the header is placed before any imports and keep the remaining imports unchanged.

coderabbitai · 2025-12-14T01:15:46Z

nemo_rl/utils/logger.py

+# Add 3rdparty to path to support vendored nv-one-logger
+import sys
+_3rd_party_root = os.path.abspath(os.path.join(os.path.dirname(__file__), "../../3rdparty/nv-one-logger/nv_one_logger"))
+if os.path.isdir(_3rd_party_root):
+    # We need to add the src directories of the individual packages to sys.path
+    _core_src = os.path.join(_3rd_party_root, "one_logger_core/src")
+    _otel_src = os.path.join(_3rd_party_root, "one_logger_otel/src")
+
+    if os.path.isdir(_core_src) and _core_src not in sys.path:
+        sys.path.append(_core_src)
+    if os.path.isdir(_otel_src) and _otel_src not in sys.path:
+        sys.path.append(_otel_src)
+
+from nv_one_logger.api.config import OneLoggerConfig as BaseOneLoggerConfig
+from nv_one_logger.api.one_logger_provider import OneLoggerProvider
+from nv_one_logger.core.attributes import Attributes
+from nv_one_logger.core.event import Event
+from nv_one_logger.core.span import StandardSpanName
+from nv_one_logger.otel.exporter.otel_exporter import OTelExporter
+from nv_one_logger.recorder.default_recorder import DefaultRecorder
+from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
+
 from nemo_rl.data.interfaces import LLMMessageLogType
 from nemo_rl.distributed.batched_data_dict import BatchedDataDict
+HAVE_ONE_LOGGER = True



⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, locate the file and check its size find . -name "logger.py" -path "*/nemo_rl/utils/*" -type f

Repository: NVIDIA-NeMo/RL

Length of output: 83

🏁 Script executed:

# Get the file size to determine how to read it wc -l nemo_rl/utils/logger.py 2>/dev/null || echo "File not found, trying alternate path"

Repository: NVIDIA-NeMo/RL

Length of output: 86

🏁 Script executed:

# Read the relevant section and context around lines 44-69 sed -n '1,100p' nemo_rl/utils/logger.py | cat -n

Repository: NVIDIA-NeMo/RL

Length of output: 3945

🏁 Script executed:

# Search for HAVE_ONE_LOGGER usage throughout the file rg "HAVE_ONE_LOGGER" nemo_rl/utils/logger.py -n

Repository: NVIDIA-NeMo/RL

Length of output: 233

🏁 Script executed:

# Check the OneLogger class initialization and how it's used rg -A 20 "class OneLogger" nemo_rl/utils/logger.py

Repository: NVIDIA-NeMo/RL

Length of output: 1404

🏁 Script executed:

# Look for any import error handling or try-except blocks around imports sed -n '40,100p' nemo_rl/utils/logger.py | cat -n

Repository: NVIDIA-NeMo/RL

Length of output: 2512

🏁 Script executed:

# Check Logger.__init__ to see how OneLogger backend is instantiated sed -n '900,1000p' nemo_rl/utils/logger.py | cat -n

Repository: NVIDIA-NeMo/RL

Length of output: 5019

🏁 Script executed:

# Look for where LoggerConfig one_logger field is accessed rg -n "one_logger" nemo_rl/utils/logger.py | head -30

Repository: NVIDIA-NeMo/RL

Length of output: 1620

🏁 Script executed:

# Check if there's any env var or fallback for one_logger config rg -B5 -A5 "OneLogger(" nemo_rl/utils/logger.py

Repository: NVIDIA-NeMo/RL

Length of output: 128

🏁 Script executed:

# Check OneLogger.__init__ signature and usage of log_dir parameter sed -n '464,520p' nemo_rl/utils/logger.py | cat -n

Repository: NVIDIA-NeMo/RL

Length of output: 2558

🏁 Script executed:

# Verify OneLoggerConfigDict is correctly defined and its usage sed -n '95,125p' nemo_rl/utils/logger.py | cat -n

Repository: NVIDIA-NeMo/RL

Length of output: 1073

Wrap nv-one-logger imports in try-except to prevent import-time crash when package is unavailable.

Currently, the unconditional imports at lines 57–64 will raise ImportError before line 68 can execute, making HAVE_ONE_LOGGER ineffective and preventing nemo_rl/utils/logger.py from importing at all if nv-one-logger is neither vendored nor installed. The check in OneLogger.__init__ (line 468) cannot protect against this module-level failure.

# Add 3rdparty to path to support vendored nv-one-logger import sys _3rd_party_root = os.path.abspath(os.path.join(os.path.dirname(__file__), "../../3rdparty/nv-one-logger/nv_one_logger")) if os.path.isdir(_3rd_party_root): # We need to add the src directories of the individual packages to sys.path _core_src = os.path.join(_3rd_party_root, "one_logger_core/src") _otel_src = os.path.join(_3rd_party_root, "one_logger_otel/src") - if os.path.isdir(_core_src) and _core_src not in sys.path: - sys.path.append(_core_src) - if os.path.isdir(_otel_src) and _otel_src not in sys.path: - sys.path.append(_otel_src) + if os.path.isdir(_core_src) and _core_src not in sys.path: + sys.path.insert(0, _core_src) + if os.path.isdir(_otel_src) and _otel_src not in sys.path: + sys.path.insert(0, _otel_src) -from nv_one_logger.api.config import OneLoggerConfig as BaseOneLoggerConfig -from nv_one_logger.api.one_logger_provider import OneLoggerProvider -from nv_one_logger.core.attributes import Attributes -from nv_one_logger.core.event import Event -from nv_one_logger.core.span import StandardSpanName -from nv_one_logger.otel.exporter.otel_exporter import OTelExporter -from nv_one_logger.recorder.default_recorder import DefaultRecorder -from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter +try: + from nv_one_logger.api.config import OneLoggerConfig as BaseOneLoggerConfig + from nv_one_logger.api.one_logger_provider import OneLoggerProvider + from nv_one_logger.core.attributes import Attributes + from nv_one_logger.core.event import Event + from nv_one_logger.core.span import StandardSpanName + from nv_one_logger.otel.exporter.otel_exporter import OTelExporter + from nv_one_logger.recorder.default_recorder import DefaultRecorder + from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter +except ImportError: + HAVE_ONE_LOGGER = False +else: + HAVE_ONE_LOGGER = True from nemo_rl.data.interfaces import LLMMessageLogType from nemo_rl.distributed.batched_data_dict import BatchedDataDict -HAVE_ONE_LOGGER = True

🤖 Prompt for AI Agents

In nemo_rl/utils/logger.py around lines 44 to 69, the unconditional imports from nv_one_logger will raise ImportError at module import time and make HAVE_ONE_LOGGER ineffective; wrap the nv_one_logger imports in a try/except ImportError block so that if the package is unavailable you catch the ImportError, set HAVE_ONE_LOGGER = False (and optionally record a debug/warning via the standard logging module), and only set HAVE_ONE_LOGGER = True inside the try block after successful imports; keep the existing sys.path adjustments outside the try so vendored paths are still attempted before importing.

coderabbitai · 2025-12-14T01:15:46Z

nemo_rl/utils/logger.py

+class OneLoggerConfigDict(TypedDict):
+    project: str
+    name: str
+    enable_for_current_rank: NotRequired[bool]
+


⚠️ Potential issue | 🟠 Major

Align TypedDict requirements with runtime behavior (avoid hidden defaults; enforce config presence when enabled).
OneLoggerConfigDict requires project/name, but Logger passes {} and OneLogger supplies fallback defaults in code, which conflicts with the “YAML is source of truth / no hidden defaults” guideline.

class OneLoggerConfigDict(TypedDict): project: str name: str enable_for_current_rank: NotRequired[bool] class LoggerConfig(TypedDict): @@ one_logger_enabled: bool @@ one_logger: NotRequired[OneLoggerConfigDict]

And then enforce presence when enabled (see Logger.init comment below) and access cfg["project"] / cfg["name"] (see OneLogger.init comment below).

Also applies to: 106-118

🤖 Prompt for AI Agents

In nemo_rl/utils/logger.py around lines 95-99 (and similarly 106-118), the TypedDict currently requires project/name but runtime passes {} and OneLogger supplies fallbacks; update the TypedDict so project and name are NotRequired (make them optional), then add a validation in Logger.__init__ that if logging is enabled it raises a descriptive error when cfg lacks "project" or "name", and finally update OneLogger.__init__ to assume presence and access cfg["project"] and cfg["name"] (instead of using .get) after the Logger check ensures they exist.

coderabbitai · 2025-12-14T01:15:46Z

nemo_rl/utils/logger.py

+class OneLogger(LoggerInterface):
+    """OneLogger backend."""
+
+    def __init__(self, cfg: OneLoggerConfigDict, log_dir: Optional[str] = None):
+        if not HAVE_ONE_LOGGER:
+            print("Warning: nv_one_logger not installed. OneLogger will be disabled.")
+            return
+
+        self.provider = OneLoggerProvider.instance()
+        project = cfg.get("project", "nemo_rl")
+        name = cfg.get("name", "nemo_rl_run")
+
+        if not self.provider.one_logger_ready:
+            # Configure OneLogger
+            one_logger_config = BaseOneLoggerConfig(
+                application_name=project,
+                session_tag_or_fn=name,
+                world_size_or_fn=lambda: int(os.environ.get("WORLD_SIZE", 1)),
+                enable_for_current_rank=cfg.get("enable_for_current_rank", True),
+            )
+
+            # Setup OTLP exporter
+            otel_endpoint = os.environ.get(
+                "OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318/v1/traces"
+            )
+            os.environ["OTEL_SERVICE_NAME"] = project
+            otel_exporter = OTelExporter(
+                otel_span_exporter=OTLPSpanExporter(endpoint=otel_endpoint)
+            )
+
+            recorder = DefaultRecorder(exporters=[otel_exporter])
+            self.provider.configure(one_logger_config, recorder)
+            print(f"Initialized OneLogger for project {project}, run {name}")
+
+        self.recorder = self.provider.recorder
+
+        # Ensure we have an active application span
+        app_spans = self.recorder.get_active_spans_by_name(StandardSpanName.APPLICATION)
+        if app_spans:
+            self.app_span = app_spans[0]
+        else:
+            self.app_span = self.recorder.start(StandardSpanName.APPLICATION)
+
+    def log_metrics(
+        self,
+        metrics: dict[str, Any],
+        step: int,
+        prefix: Optional[str] = "",
+        step_metric: Optional[str] = None,
+    ) -> None:
+        if not HAVE_ONE_LOGGER or not self.app_span:
+            return
+
+        attributes_dict = {"step": step}
+
+        for name, value in metrics.items():
+            if prefix:
+                name = f"{prefix}/{name}"
+            # OneLogger attributes support basic types.
+            if isinstance(value, (int, float, str, bool)):
+                attributes_dict[name] = value
+            elif isinstance(value, torch.Tensor) and value.numel() == 1:
+                attributes_dict[name] = value.item()
+
+        # Record as an event
+        self.recorder.event(
+            span=self.app_span,
+            event=Event.create(
+                name="metrics_update",
+                attributes=Attributes(attributes_dict),
+            ),
+        )
+
+    def log_hyperparams(self, params: Mapping[str, Any]) -> None:
+        if not HAVE_ONE_LOGGER or not self.app_span:
+            return
+
+        flat_params = flatten_dict(params)
+        # Convert to supported types
+        valid_params = {}
+        for k, v in flat_params.items():
+            if isinstance(v, (int, float, str, bool)):
+                valid_params[k] = v
+            else:
+                valid_params[k] = str(v)
+
+        self.app_span.attributes.add_attributes(Attributes(valid_params))
+
+    def log_histogram(self, histogram: list[Any], step: int, name: str) -> None:
+        pass
+
+    def log_plot(self, figure: plt.Figure, step: int, name: str) -> None:
+        pass
+


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's check the file structure and see if it exists git ls-files nemo_rl/utils/logger.py

Repository: NVIDIA-NeMo/RL

Length of output: 81

🏁 Script executed:

# Read the specific file to verify current state wc -l nemo_rl/utils/logger.py

Repository: NVIDIA-NeMo/RL

Length of output: 86

🏁 Script executed:

# Read the OneLogger class implementation sed -n '464,557p' nemo_rl/utils/logger.py | cat -n

Repository: NVIDIA-NeMo/RL

Length of output: 4085

🏁 Script executed:

# Check for config definitions - likely a TypedDict rg "OneLoggerConfigDict" -A 5 -B 2

Repository: NVIDIA-NeMo/RL

Length of output: 1301

🏁 Script executed:

# Check if there's a copyright header at the top of the file head -20 nemo_rl/utils/logger.py

Repository: NVIDIA-NeMo/RL

Length of output: 728

🏁 Script executed:

# Check for @ray.remote decorators in the file rg "@ray.remote" nemo_rl/utils/logger.py

Repository: NVIDIA-NeMo/RL

Length of output: 40

🏁 Script executed:

# Check for OpenTelemetry dependency version in pyproject.toml or requirements fd -e toml -e txt | xargs grep -l "opentelemetry\|otel" 2>/dev/null | head -5

Repository: NVIDIA-NeMo/RL

Length of output: 148

🏁 Script executed:

# Look for config schema or YAML files that define defaults fd -e yaml | xargs grep -l "onelogger\|one_logger" -i 2>/dev/null

Repository: NVIDIA-NeMo/RL

Length of output: 118

🏁 Script executed:

# Check YAML examples for one_logger configuration cat examples/configs/grpo_math_1B.yaml | grep -A 10 "one_logger"

Repository: NVIDIA-NeMo/RL

Length of output: 436

🏁 Script executed:

# Check sft.yaml cat examples/configs/sft.yaml | grep -A 10 "one_logger"

Repository: NVIDIA-NeMo/RL

Length of output: 508

🏁 Script executed:

# Check pyproject.toml for OpenTelemetry dependency version grep -A 2 -B 2 "opentelemetry\|otel" pyproject.toml

Repository: NVIDIA-NeMo/RL

Length of output: 837

🏁 Script executed:

# Check if log_dir is used anywhere in the class or as base class method rg "log_dir" nemo_rl/utils/logger.py

Repository: NVIDIA-NeMo/RL

Length of output: 2942

🏁 Script executed:

# Verify step_metric usage in log_metrics rg "step_metric" nemo_rl/utils/logger.py -A 3 -B 3

Repository: NVIDIA-NeMo/RL

Length of output: 6620

Initialize all class members in __init__ path; don't skip initialization on early return.

On early return (line 5-7), self.app_span and self.recorder are never initialized. Later calls (lines 51, 66, 75, 87) then fail with AttributeError. Always initialize these members before any early return.

Also: For required config keys project and name, use direct access (cfg["project"]) instead of cfg.get() with defaults—the TypedDict marks them required. The unused step_metric parameter should be explicitly ignored with _ = step_metric.

🧰 Tools

🪛 Ruff (0.14.8)

467-467: Unused method argument: log_dir

(ARG002)

512-512: Unused method argument: step_metric

(ARG002)

coderabbitai · 2025-12-14T01:15:46Z

nemo_rl/utils/logger.py

        Parameters:
            cfg (LoggerConfig): Configuration mapping. Expected keys include:
                - "log_dir": base directory for backend logs.
-                - "wandb_enabled", "swanlab_enabled", "tensorboard_enabled", "mlflow_enabled": booleans to enable backends.
-                - "wandb", "swanlab", "tensorboard", "mlflow": per-backend configuration dicts.
+                - "log_dir": base directory for backend logs.
+                - "wandb_enabled", "swanlab_enabled", "tensorboard_enabled", "mlflow_enabled", "one_logger_enabled": booleans to enable backends.
+                - "wandb", "swanlab", "tensorboard", "mlflow", "one_logger": per-backend configuration dicts.
                - "monitor_gpus": boolean to enable Ray GPU monitoring.


⚠️ Potential issue | 🟡 Minor

Docstring has duplicated bullets (“log_dir” repeated).
Clean up to avoid confusing rendered docs.

cfg (LoggerConfig): Configuration mapping. Expected keys include: - "log_dir": base directory for backend logs. - - "log_dir": base directory for backend logs. - "wandb_enabled", "swanlab_enabled", "tensorboard_enabled", "mlflow_enabled", "one_logger_enabled": booleans to enable backends. - "wandb", "swanlab", "tensorboard", "mlflow", "one_logger": per-backend configuration dicts.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Parameters:

cfg (LoggerConfig): Configuration mapping. Expected keys include:

- "log_dir": base directory for backend logs.

- "wandb_enabled", "swanlab_enabled", "tensorboard_enabled", "mlflow_enabled": booleans to enable backends.

- "wandb", "swanlab", "tensorboard", "mlflow": per-backend configuration dicts.

- "log_dir": base directory for backend logs.

- "wandb_enabled", "swanlab_enabled", "tensorboard_enabled", "mlflow_enabled", "one_logger_enabled": booleans to enable backends.

- "wandb", "swanlab", "tensorboard", "mlflow", "one_logger": per-backend configuration dicts.

- "monitor_gpus": boolean to enable Ray GPU monitoring.

Parameters:

cfg (LoggerConfig): Configuration mapping. Expected keys include:

- "log_dir": base directory for backend logs.

- "wandb_enabled", "swanlab_enabled", "tensorboard_enabled", "mlflow_enabled", "one_logger_enabled": booleans to enable backends.

- "wandb", "swanlab", "tensorboard", "mlflow", "one_logger": per-backend configuration dicts.

- "monitor_gpus": boolean to enable Ray GPU monitoring.

🤖 Prompt for AI Agents

In nemo_rl/utils/logger.py around lines 951 to 957, the docstring parameter list contains a duplicated bullet for "log_dir"; remove the duplicate entry (or replace it with the correct intended key if a different config key was meant) so the docstring lists each expected configuration key only once and update any wording to keep the list consistent and unambiguous.

nemo_rl/utils/logger.py

therealnaveenkamal · 2025-12-14T01:15:51Z

@terrykong I've integrated nv-one-logger:

added as a git submodule at 3rdparty/NV-One-Logger-workspace/nv-one-logger
created wrapper packages nv-one-logger-core, -otel, -training-telemetry, -wandb
tested with an OTEL collector and confirmed metrics are being sent successfully

let me know what you think, thanks!

terrykong

thanks @therealnaveenkamal for the help adding one-logger. please see my review below and let me know if you have any questions

terrykong · 2025-12-15T05:29:49Z

3rdparty/NV-One-Logger-workspace/wrappers/nv-one-logger-otel/pyproject.toml

so for all the one logger dependencies, can we switch to VCS installs in pyproject.toml? see https://github.com/NVIDIA-NeMo/RL/blob/main/pyproject.toml#L154 for example

for context, the 3rdparty repo contains dependencies we expect to change a lot and require us to edit the code to have a monorepo experience. considering logging is probably not something we will change very often, i think it will be easier to maintain this as a regular python package.

terrykong · 2025-12-15T05:30:43Z

nemo_rl/utils/logger.py

+# Add 3rdparty to path to support vendored nv-one-logger
+import sys
+_3rd_party_root = os.path.abspath(os.path.join(os.path.dirname(__file__), "../../3rdparty/nv-one-logger/nv_one_logger"))
+if os.path.isdir(_3rd_party_root):
+    # We need to add the src directories of the individual packages to sys.path
+    _core_src = os.path.join(_3rd_party_root, "one_logger_core/src")
+    _otel_src = os.path.join(_3rd_party_root, "one_logger_otel/src")
+
+    if os.path.isdir(_core_src) and _core_src not in sys.path:
+        sys.path.append(_core_src)
+    if os.path.isdir(_otel_src) and _otel_src not in sys.path:
+        sys.path.append(_otel_src)


assuming this is moved to pyproject.toml as a regular VCS install, i think we can just assume one logger is installed to simplify this

terrykong · 2025-12-15T05:31:31Z

nemo_rl/utils/logger.py

+    """OneLogger backend."""
+
+    def __init__(self, cfg: OneLoggerConfigDict, log_dir: Optional[str] = None):
+        if not HAVE_ONE_LOGGER:


terrykong · 2025-12-15T05:32:10Z

nemo_rl/utils/logger.py

+        project = cfg.get("project", "nemo_rl")
+        name = cfg.get("name", "nemo_rl_run")


these default should come from yaml

Suggested change

project = cfg.get("project", "nemo_rl")

name = cfg.get("name", "nemo_rl_run")

project = cfg["project"]

name = cfg["name"]

terrykong · 2025-12-15T05:36:07Z

nemo_rl/utils/logger.py

+            one_logger_config = BaseOneLoggerConfig(
+                application_name=project,
+                session_tag_or_fn=name,
+                world_size_or_fn=lambda: int(os.environ.get("WORLD_SIZE", 1)),


WORLD_SIZE doesn't have a value in this context b/c this is executed on the driver and there may be multiple "worlds" b/c multiple policy worker groups and the generation worker groups. Probably best thing to do here is num_nodes * gpus_per_node

nemo_rl/utils/logger.py

terrykong · 2025-12-15T05:42:19Z

.gitmodules

 	url = https://github.com/NVIDIA-NeMo/Gym.git
 	branch = main
 	shallow = true
+[submodule "3rdparty/NV-One-Logger-workspace/nv-one-logger"]


shouldn't be needed

terrykong · 2025-12-15T05:43:15Z

pyproject.toml

+nv-one-logger-core = { workspace = true }
+nv-one-logger-otel = { workspace = true }
+nv-one-logger-training-telemetry = { workspace = true }
+nv-one-logger-wandb = { workspace = true }


for these vcs installs can probably use subdir syntax https://docs.astral.sh/uv/concepts/projects/dependencies/#git

terrykong · 2025-12-15T05:44:14Z

nemo_rl/utils/logger.py

+
+            # Setup OTLP exporter
+            otel_endpoint = os.environ.get(
+                "OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4318/v1/traces"


what's this for?

terrykong · 2025-12-15T05:44:50Z

nemo_rl/utils/logger.py

        return


+class OneLogger(LoggerInterface):


could you please add unit tests for this?

added one-logger support for NeMo RL

046c049

Signed-off-by: Naveenraj Kamalakannan <[email protected]>

therealnaveenkamal requested review from a team as code owners December 14, 2025 01:09

therealnaveenkamal changed the title ~~added one-logger support for NeMo RL~~ [feat]: added one-logger support for NeMo RL Dec 14, 2025

coderabbitai bot reviewed Dec 14, 2025

View reviewed changes

github-actions bot added the community-request label Dec 14, 2025

terrykong reviewed Dec 15, 2025

View reviewed changes

		@@ -0,0 +1 @@
		Subproject commit c382dd5193ae987233c7a61de33fb16f91724b7d

-import sys
-import toml
-from pathlib import Path
-import setuptools
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import toml
+from pathlib import Path
+import setuptools

		project = cfg.get("project", "nemo_rl")
		name = cfg.get("name", "nemo_rl_run")

[feat]: added one-logger support for NeMo RL #1631

Are you sure you want to change the base?

[feat]: added one-logger support for NeMo RL #1631

Uh oh!

Conversation

therealnaveenkamal commented Dec 14, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do ?

Issues

Usage

Before your PR is "Ready for review"

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Dec 14, 2025

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 14, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 14, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 14, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 14, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 14, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 14, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 14, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 14, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 14, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

therealnaveenkamal commented Dec 14, 2025

Uh oh!

terrykong left a comment

Choose a reason for hiding this comment

Uh oh!

terrykong Dec 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

therealnaveenkamal commented Dec 14, 2025 •

edited by coderabbitai bot

Loading

terrykong Dec 15, 2025 •

edited

Loading