Skip to content

Conversation

@Shr1ftyy
Copy link
Collaborator

@Shr1ftyy Shr1ftyy commented Feb 11, 2026

Summary

  • Batch GPU→CPU state transfers (robot + object reads) via torch.cat to minimize sync overhead
  • Add render_depth param/env-var to disable depth rendering when unused by policy
  • Pre-compute actuated_dof_idx and physics_steps once in __init__ instead of per-step
  • Log errors in fallback object-state reads instead of silently swallowing them

Benchmark Results

Configuration Before After Improvement
Default (all rendering) 31.93 FPS 35.74 FPS +11.9%

Per-component gains

Component Before After Saved
Robot state read 4.11 ms 2.58 ms -1.53 ms
Object state read 1.34 ms 1.05 ms -0.29 ms
Action apply 2.29 ms 1.54 ms -0.75 ms

API Changes

One new optional parameter added to GenesisBaseEnvironment.__init__() and G1Environment.__init__():

  • render_depth: bool = True — enable/disable depth rendering (default: enabled, backward-compatible)
  • Also configurable via GENESIS_RENDER_DEPTH env var inside evaluation containers

Benchmark script gains --no-depth CLI flag.

Testing

  • ruff check .
  • ty check .
  • pytest tests/unit/test_genesis.py — 72 passed ✅

Summary by CodeRabbit

Release Notes

  • New Features

    • Added optional depth rendering control for Genesis environments; disable via --no-depth CLI flag or environment variable for improved step performance.
  • Documentation

    • Added Genesis rendering configuration reference guide covering depth rendering controls and environment setup.
  • Chores

    • Updated NumPy dependency constraints for improved compatibility.

Batch GPU→CPU state transfers, add configurable render frame-skip,
optional depth rendering, cached image encoding, and pre-computed
per-step constants. Brings env.step() from ~32 FPS to ~47.5 FPS
(with render_interval=2), closing 95% of the gap to 50 Hz realtime.

Key changes:
- Batch 6 robot state reads into single torch.cat().cpu() transfer
- Batch object state reads similarly
- Add render_interval param: skip camera render on intermediate steps
- Add render_depth param: disable depth when unused by policy
- Cache encoded images across frame-skip steps (avoid redundant base64)
- Pre-compute actuated_dof_idx and physics_steps in __init__
- Update benchmark script with --render-interval and --no-depth flags
@coderabbitai
Copy link

coderabbitai bot commented Feb 11, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Added optional depth-rendering control (constructor arg + GENESIS_RENDER_DEPTH), batched GPU→CPU state/object reads, precomputed actuated-DOF indices and physics-step reuse, camera move-to-attached-link before capture, conditional depth capture and image caching/encoding changes.

Changes

Cohort / File(s) Summary
Base Environment
kinitro/environments/genesis/base.py
Added render_depth ctor arg and env fallback (_render_depth); precompute _actuated_dof_idx and _physics_steps; batch GPU→CPU reads for robot and objects; use precomputed indices in _apply_action; conditional depth capture; maintain explicit RGB/Depth encoded caches.
Environment Subclass
kinitro/environments/genesis/envs/g1_humanoid.py
Propagated new `render_depth: bool
CLI / Benchmark
scripts/benchmark_genesis.py
Benchmark CLI and benchmark() accept depth toggle (--no-depth); pass render_depth to G1Environment; print depth-rendering state.
Env Example
.env.example
Added commented GENESIS_RENDER_DEPTH example line in Genesis rendering section.
Packaging / Static Analysis
pyproject.toml
Capped numpy to <2.3 and added torch to tool.ty.analysis.allowed-unresolved-imports.
Docs
docs/backend-guide.md, environments/README.md
Documented GENESIS_RENDER_DEPTH, constructor precedence, and new executor verification env vars and behavior.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Agent
    participant Env as GenesisEnv
    participant Camera
    participant GPU
    participant CPU
    participant Encoder

    Agent->>Env: step(action)
    Env->>Env: _apply_action(action using _actuated_dof_idx)
    Env->>GPU: simulate(steps = _physics_steps)
    Env->>Env: decide_render(_render_depth, caches)
    alt render required
        Env->>Camera: move_to_attached_link()
        Camera->>GPU: render_frame(RGB and maybe Depth)
        GPU->>CPU: batch_read_frames (torch.cat)
        CPU->>Encoder: encode_base64_if_needed
        Encoder->>Env: store_encoded_images
    else reuse cached frames
        Env->>Env: return cached encoded images
    end
    Env->>Agent: observation(state + encoded images)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

"🐇 I hop through frames with careful cheer,

Batching reads and keeping near,
Depth optional, joints in line,
Cached pixels saved just fine.
A rabbit nudges render clear."

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 72.73% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: performance optimizations through batched state reads and a render_depth toggle feature for the Genesis environment.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch perf/genesis-step-optimizations

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

torch is a runtime-only dependency bundled with Genesis in the Docker
container and not installed in the CI environment.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@kinitro/environments/genesis/base.py`:
- Around line 759-761: The cached encoded payloads are never invalidated when a
fresh render updates pixel buffers, so add invalidation in _capture_camera: when
you assign self._cached_rgb and/or self._cached_depth (the fresh-render path in
_capture_camera), also clear self._cached_rgb_encoded and
self._cached_depth_encoded (e.g., set to None or empty) so subsequent calls to
_build_observation will re-encode the new buffers; locate these members in
_capture_camera and ensure you only invalidate the encoded cache when you
replace the raw buffer references.
🧹 Nitpick comments (4)
kinitro/environments/genesis/base.py (3)

266-285: LGTM on precomputed indices and caching fields.

Clean initialization of rendering caches and precomputed action indices. One minor note:

Line 281: int(self.CONTROL_DT / self.SIM_DT) truncates rather than rounds. With the current constants (0.02 / 0.01 = 2.0) this is fine, but if either constant is ever changed to a value where floating-point division doesn't land exactly, int() will silently drop a substep. round(...) would be a safer choice.

Safer rounding for physics substeps
-        self._physics_steps = max(1, int(self.CONTROL_DT / self.SIM_DT))
+        self._physics_steps = max(1, round(self.CONTROL_DT / self.SIM_DT))

720-725: Silent except Exception: pass swallows per-object read failures.

In the fallback path, individual object read failures are silently discarded with no logging. This can mask real issues (e.g., an entity becoming invalid mid-episode). At minimum, add a logger.debug to match the outer handler.

As per coding guidelines, "Avoid broad except Exception unless you re-raise or return a clear error."

Add debug logging for individual object failures
                 except Exception:
-                    pass
+                    logger.debug("object_state_read_failed", object_id=obj_id)

786-795: Identity-based cache check is fragile (downstream of the caching bug above).

The cam_rgb is not self._cached_rgb check on Line 787 will always be False for fresh renders (same reference), so the guard relies entirely on the is None fallback. With the fix above (resetting _cached_rgb_encoded = None in _capture_camera), this works correctly — but the is not arm is dead code. Consider simplifying to just check for None:

Simplify to a None-only check
-            if cam_rgb is not self._cached_rgb or self._cached_rgb_encoded is None:
+            if self._cached_rgb_encoded is None:
                 self._cached_rgb_encoded = encode_image(cam_rgb)
             rgb["ego"] = self._cached_rgb_encoded
         depth: dict[str, dict | list] = {}
         if cam_depth is not None:
-            if cam_depth is not self._cached_depth or self._cached_depth_encoded is None:
+            if self._cached_depth_encoded is None:
                 self._cached_depth_encoded = encode_image(cam_depth)
             depth["ego"] = self._cached_depth_encoded
scripts/benchmark_genesis.py (1)

207-228: Decomposed benchmark loop doesn't reflect --render-interval / --no-depth.

The decomposed profiling loop (Lines 208–228) always renders every step with depth=True, so its timings represent the unoptimized baseline. The env.step() loop (Lines 252–258) captures the optimized path. This is fine for component-level profiling, but could confuse users comparing the two sections — consider adding a brief note in the output (e.g., "Decomposed timings use default rendering settings") to set expectations.

When _capture_camera produces new pixel data, the base64-encoded
caches were not cleared, causing _build_observation to serve stale
encoded frames from the first render for the entire episode.
…depth

Docker eval containers can now set GENESIS_RENDER_INTERVAL and
GENESIS_RENDER_DEPTH environment variables to tune rendering without
any executor protocol changes. Constructor args still take precedence
when passed explicitly.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@kinitro/environments/genesis/base.py`:
- Around line 266-293: The code currently calls
int(os.environ.get("GENESIS_RENDER_INTERVAL", "1")) which will raise ValueError
on malformed env values; update the constructor logic around render_interval
(the branch that reads GENESIS_RENDER_INTERVAL and sets self._render_interval)
to defensively parse the env var with a try/except: attempt to convert the env
string to int, on ValueError fall back to the default (1) and proceed to apply
max(1, render_interval); optionally emit a warning via the module/class logger
when the env var is invalid so developers can notice misconfiguration.
🧹 Nitpick comments (2)
kinitro/environments/genesis/base.py (2)

726-732: Silent pass in fallback swallows per-object read errors with zero observability.

The outer batch failure is logged at debug level, but individual object failures in the fallback are silently discarded. If all per-object reads fail, callers receive an empty dict with no indication. Consider at minimum a debug-level log.

♻️ Add minimal logging
                 except Exception:
-                    pass
+                    logger.debug("object_state_read_failed", object_id=obj_id)

795-806: Identity-based cache reuse for encoded images is correct but subtle — worth a brief inline comment.

The is not check works because _capture_camera stores the reference to _cached_rgb and invalidates _cached_rgb_encoded on fresh renders, while frame-skip steps return the same cached reference with the encoded cache intact. The logic is correct, but the two conditions interact across two methods. A one-line comment explaining why is is safe here would help future readers.

…ENESIS_RENDER_DEPTH)

Add configuration docs to backend guide, environments README, and
.env.example for the new container-level rendering knobs.
…ender interval example

Replace silent 'except Exception: pass' in the individual object read
fallback loop with a debug-level structured log so failures are observable
without breaking the fallback behavior.

Update .env.example GENESIS_RENDER_INTERVAL to recommend 2 as the default
and rewrite the comment to explain the frame-skip tradeoff clearly.
Remove GENESIS_RENDER_INTERVAL and all associated render frame-skip
machinery. The camera now renders every step unconditionally, eliminating
the cached-frame fields, identity-based encode caching, and the
render_interval constructor/env-var plumbing across base.py,
g1_humanoid.py, benchmark script, and documentation.

GENESIS_RENDER_DEPTH remains as the sole rendering knob.
@Shr1ftyy Shr1ftyy changed the title perf(genesis): optimize step loop for ~49% faster env.step() throughput perf(genesis): optimize step loop with batched state reads and render_depth toggle Feb 11, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/benchmark_genesis.py (1)

213-215: ⚠️ Potential issue | 🟡 Minor

Decomposed benchmark loop ignores render_depth flag.

The manual camera.render(rgb=True, depth=True) on Line 214 always renders depth, even when --no-depth is passed. This means the per-component timings won't reflect the actual depth-disabled performance, while the env.step() comparison loop (Line 253) will skip depth — making the two measurements inconsistent.

Proposed fix
-            rgb, depth, _seg, _normal = camera.render(rgb=True, depth=True)
+            rgb, depth, _seg, _normal = camera.render(rgb=True, depth=render_depth)
🧹 Nitpick comments (3)
kinitro/environments/genesis/base.py (3)

730-733: Accessing _camera._attached_link is fragile.

This relies on a private attribute of the Genesis camera object. If the Genesis library changes its internal API, this will break silently or raise AttributeError.

Consider wrapping in hasattr or getattr(..., None) for resilience:

Proposed defensive check
-            if self._camera._attached_link is not None:
+            if getattr(self._camera, "_attached_link", None) is not None:

746-748: Redundant depth suppression after render call.

depth is already None when render_depth=False is passed to camera.render() (most renderers skip computation and return None). If Genesis does return a non-None depth tensor even when depth=False, this guard is necessary — but it would be worth a comment explaining why it's needed so future readers don't remove it as dead code.


766-772: Type annotation dict[str, dict | list] is overly broad.

The actual values are always dict (from encode_image). Consider dict[str, dict[str, Any]] for accuracy if you want to keep it simple, or just dict[str, Any].

…prehension

Replace index-based parallel iteration with zip pairs and a dict
comprehension for the batch path. The fallback loop now iterates
over (obj, entity) tuples directly instead of manual indexing.
Replace manual offset arithmetic with np.split(packed, cumsum(sizes))
so split points are derived from actual tensor shapes. Removes all
hand-computed index variables (dof_start, dof_end, vel_start).
Depth rendering is rarely used by current policies and adds overhead.
Default GENESIS_RENDER_DEPTH to false; operators can opt in with
GENESIS_RENDER_DEPTH=true if needed.
Replace string-matching boolean parsing (true/false/yes/no) with a
simple int() cast. Cleaner, fails fast on bad input, and matches
common env var conventions.
@Shr1ftyy Shr1ftyy merged commit de97b9d into main Feb 11, 2026
2 checks passed
@Shr1ftyy Shr1ftyy deleted the perf/genesis-step-optimizations branch February 11, 2026 12:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant