-
Notifications
You must be signed in to change notification settings - Fork 0
perf(genesis): optimize step loop with batched state reads and render_depth toggle #37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Batch GPU→CPU state transfers, add configurable render frame-skip, optional depth rendering, cached image encoding, and pre-computed per-step constants. Brings env.step() from ~32 FPS to ~47.5 FPS (with render_interval=2), closing 95% of the gap to 50 Hz realtime. Key changes: - Batch 6 robot state reads into single torch.cat().cpu() transfer - Batch object state reads similarly - Add render_interval param: skip camera render on intermediate steps - Add render_depth param: disable depth when unused by policy - Cache encoded images across frame-skip steps (avoid redundant base64) - Pre-compute actuated_dof_idx and physics_steps in __init__ - Update benchmark script with --render-interval and --no-depth flags
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdded optional depth-rendering control (constructor arg + GENESIS_RENDER_DEPTH), batched GPU→CPU state/object reads, precomputed actuated-DOF indices and physics-step reuse, camera move-to-attached-link before capture, conditional depth capture and image caching/encoding changes. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Agent
participant Env as GenesisEnv
participant Camera
participant GPU
participant CPU
participant Encoder
Agent->>Env: step(action)
Env->>Env: _apply_action(action using _actuated_dof_idx)
Env->>GPU: simulate(steps = _physics_steps)
Env->>Env: decide_render(_render_depth, caches)
alt render required
Env->>Camera: move_to_attached_link()
Camera->>GPU: render_frame(RGB and maybe Depth)
GPU->>CPU: batch_read_frames (torch.cat)
CPU->>Encoder: encode_base64_if_needed
Encoder->>Env: store_encoded_images
else reuse cached frames
Env->>Env: return cached encoded images
end
Env->>Agent: observation(state + encoded images)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
torch is a runtime-only dependency bundled with Genesis in the Docker container and not installed in the CI environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@kinitro/environments/genesis/base.py`:
- Around line 759-761: The cached encoded payloads are never invalidated when a
fresh render updates pixel buffers, so add invalidation in _capture_camera: when
you assign self._cached_rgb and/or self._cached_depth (the fresh-render path in
_capture_camera), also clear self._cached_rgb_encoded and
self._cached_depth_encoded (e.g., set to None or empty) so subsequent calls to
_build_observation will re-encode the new buffers; locate these members in
_capture_camera and ensure you only invalidate the encoded cache when you
replace the raw buffer references.
🧹 Nitpick comments (4)
kinitro/environments/genesis/base.py (3)
266-285: LGTM on precomputed indices and caching fields.Clean initialization of rendering caches and precomputed action indices. One minor note:
Line 281:
int(self.CONTROL_DT / self.SIM_DT)truncates rather than rounds. With the current constants (0.02 / 0.01 = 2.0) this is fine, but if either constant is ever changed to a value where floating-point division doesn't land exactly,int()will silently drop a substep.round(...)would be a safer choice.Safer rounding for physics substeps
- self._physics_steps = max(1, int(self.CONTROL_DT / self.SIM_DT)) + self._physics_steps = max(1, round(self.CONTROL_DT / self.SIM_DT))
720-725: Silentexcept Exception: passswallows per-object read failures.In the fallback path, individual object read failures are silently discarded with no logging. This can mask real issues (e.g., an entity becoming invalid mid-episode). At minimum, add a
logger.debugto match the outer handler.As per coding guidelines, "Avoid broad
except Exceptionunless you re-raise or return a clear error."Add debug logging for individual object failures
except Exception: - pass + logger.debug("object_state_read_failed", object_id=obj_id)
786-795: Identity-based cache check is fragile (downstream of the caching bug above).The
cam_rgb is not self._cached_rgbcheck on Line 787 will always beFalsefor fresh renders (same reference), so the guard relies entirely on theis Nonefallback. With the fix above (resetting_cached_rgb_encoded = Nonein_capture_camera), this works correctly — but theis notarm is dead code. Consider simplifying to just check forNone:Simplify to a None-only check
- if cam_rgb is not self._cached_rgb or self._cached_rgb_encoded is None: + if self._cached_rgb_encoded is None: self._cached_rgb_encoded = encode_image(cam_rgb) rgb["ego"] = self._cached_rgb_encoded depth: dict[str, dict | list] = {} if cam_depth is not None: - if cam_depth is not self._cached_depth or self._cached_depth_encoded is None: + if self._cached_depth_encoded is None: self._cached_depth_encoded = encode_image(cam_depth) depth["ego"] = self._cached_depth_encodedscripts/benchmark_genesis.py (1)
207-228: Decomposed benchmark loop doesn't reflect--render-interval/--no-depth.The decomposed profiling loop (Lines 208–228) always renders every step with
depth=True, so its timings represent the unoptimized baseline. Theenv.step()loop (Lines 252–258) captures the optimized path. This is fine for component-level profiling, but could confuse users comparing the two sections — consider adding a brief note in the output (e.g., "Decomposed timings use default rendering settings") to set expectations.
When _capture_camera produces new pixel data, the base64-encoded caches were not cleared, causing _build_observation to serve stale encoded frames from the first render for the entire episode.
…depth Docker eval containers can now set GENESIS_RENDER_INTERVAL and GENESIS_RENDER_DEPTH environment variables to tune rendering without any executor protocol changes. Constructor args still take precedence when passed explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@kinitro/environments/genesis/base.py`:
- Around line 266-293: The code currently calls
int(os.environ.get("GENESIS_RENDER_INTERVAL", "1")) which will raise ValueError
on malformed env values; update the constructor logic around render_interval
(the branch that reads GENESIS_RENDER_INTERVAL and sets self._render_interval)
to defensively parse the env var with a try/except: attempt to convert the env
string to int, on ValueError fall back to the default (1) and proceed to apply
max(1, render_interval); optionally emit a warning via the module/class logger
when the env var is invalid so developers can notice misconfiguration.
🧹 Nitpick comments (2)
kinitro/environments/genesis/base.py (2)
726-732: Silentpassin fallback swallows per-object read errors with zero observability.The outer batch failure is logged at
debuglevel, but individual object failures in the fallback are silently discarded. If all per-object reads fail, callers receive an empty dict with no indication. Consider at minimum adebug-level log.♻️ Add minimal logging
except Exception: - pass + logger.debug("object_state_read_failed", object_id=obj_id)
795-806: Identity-based cache reuse for encoded images is correct but subtle — worth a brief inline comment.The
is notcheck works because_capture_camerastores the reference to_cached_rgband invalidates_cached_rgb_encodedon fresh renders, while frame-skip steps return the same cached reference with the encoded cache intact. The logic is correct, but the two conditions interact across two methods. A one-line comment explaining whyisis safe here would help future readers.
…ENESIS_RENDER_DEPTH) Add configuration docs to backend guide, environments README, and .env.example for the new container-level rendering knobs.
…ender interval example Replace silent 'except Exception: pass' in the individual object read fallback loop with a debug-level structured log so failures are observable without breaking the fallback behavior. Update .env.example GENESIS_RENDER_INTERVAL to recommend 2 as the default and rewrite the comment to explain the frame-skip tradeoff clearly.
Remove GENESIS_RENDER_INTERVAL and all associated render frame-skip machinery. The camera now renders every step unconditionally, eliminating the cached-frame fields, identity-based encode caching, and the render_interval constructor/env-var plumbing across base.py, g1_humanoid.py, benchmark script, and documentation. GENESIS_RENDER_DEPTH remains as the sole rendering knob.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/benchmark_genesis.py (1)
213-215:⚠️ Potential issue | 🟡 MinorDecomposed benchmark loop ignores
render_depthflag.The manual
camera.render(rgb=True, depth=True)on Line 214 always renders depth, even when--no-depthis passed. This means the per-component timings won't reflect the actual depth-disabled performance, while theenv.step()comparison loop (Line 253) will skip depth — making the two measurements inconsistent.Proposed fix
- rgb, depth, _seg, _normal = camera.render(rgb=True, depth=True) + rgb, depth, _seg, _normal = camera.render(rgb=True, depth=render_depth)
🧹 Nitpick comments (3)
kinitro/environments/genesis/base.py (3)
730-733: Accessing_camera._attached_linkis fragile.This relies on a private attribute of the Genesis camera object. If the Genesis library changes its internal API, this will break silently or raise
AttributeError.Consider wrapping in
hasattrorgetattr(..., None)for resilience:Proposed defensive check
- if self._camera._attached_link is not None: + if getattr(self._camera, "_attached_link", None) is not None:
746-748: Redundant depth suppression after render call.
depthis alreadyNonewhenrender_depth=Falseis passed tocamera.render()(most renderers skip computation and returnNone). If Genesis does return a non-None depth tensor even whendepth=False, this guard is necessary — but it would be worth a comment explaining why it's needed so future readers don't remove it as dead code.
766-772: Type annotationdict[str, dict | list]is overly broad.The actual values are always
dict(fromencode_image). Considerdict[str, dict[str, Any]]for accuracy if you want to keep it simple, or justdict[str, Any].
…prehension Replace index-based parallel iteration with zip pairs and a dict comprehension for the batch path. The fallback loop now iterates over (obj, entity) tuples directly instead of manual indexing.
Replace manual offset arithmetic with np.split(packed, cumsum(sizes)) so split points are derived from actual tensor shapes. Removes all hand-computed index variables (dof_start, dof_end, vel_start).
Depth rendering is rarely used by current policies and adds overhead. Default GENESIS_RENDER_DEPTH to false; operators can opt in with GENESIS_RENDER_DEPTH=true if needed.
Replace string-matching boolean parsing (true/false/yes/no) with a simple int() cast. Cleaner, fails fast on bad input, and matches common env var conventions.
Summary
torch.catto minimize sync overheadrender_depthparam/env-var to disable depth rendering when unused by policyactuated_dof_idxandphysics_stepsonce in__init__instead of per-stepBenchmark Results
Per-component gains
API Changes
One new optional parameter added to
GenesisBaseEnvironment.__init__()andG1Environment.__init__():render_depth: bool = True— enable/disable depth rendering (default: enabled, backward-compatible)GENESIS_RENDER_DEPTHenv var inside evaluation containersBenchmark script gains
--no-depthCLI flag.Testing
ruff check .✅ty check .✅pytest tests/unit/test_genesis.py— 72 passed ✅Summary by CodeRabbit
Release Notes
New Features
--no-depthCLI flag or environment variable for improved step performance.Documentation
Chores