Conversation
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request undertakes a major overhaul of the project's documentation, aiming to provide a more structured, comprehensive, and accessible knowledge base. The changes centralize development guidelines, introduce detailed design specifications for core components, and improve the overall navigation and clarity of the project's technical information. This refactor will significantly aid new contributors in onboarding and help all developers understand the system's architecture and component interactions more deeply.
Code Review
This pull request significantly updates and expands the project's documentation, introducing detailed design specifications for various components in new Design.md files, revising development setup and contributing guidelines, and clarifying CLI usage. Notable changes include a more comprehensive "Documentation" section in README.md with links to these new design specs, updated development environment setup in DEVELOPMENT.md to reflect new modules and pre-commit hook details, and refined CLI examples in CLI_QUICK_REFERENCE.md and LOCAL_TESTING.md. Review feedback indicates a potential documentation duplication for the "Endpoint client" component, suggesting a need to clarify or deprecate the older document, and also highlights an inconsistency in the CLI_QUICK_REFERENCE.md regarding the --report-dir option for the from-config subcommand, requiring clarification.
Pull request overview
This PR performs a broad documentation cleanup and restructuring, adding per-component “Design Spec” documents and refreshing contributor and usage guides to reflect the current repository structure and CLI workflows.
Changes:
- Add new component-level design specs under docs/<component>/Design.md to describe architecture, responsibilities, and integration points.
- Refresh contributor and usage documentation (local testing, development workflow, CLI reference, GitHub setup) to match current commands and repository links.
- Expand/curate example listings and cross-link documentation from the root README.md.
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| examples/README.md | Adds links/descriptions for new example directories. |
| docs/utils/Design.md | New utils design spec (currently contains several API/behavior mismatches vs code). |
| docs/testing/Design.md | New testing utilities design spec (echo/max/variable throughput servers). |
| docs/sglang/Design.md | New SGLang adapter design spec (currently documents non-existent adapter/accumulator methods). |
| docs/profiling/Design.md | New profiling design spec (line_profiler + pytest flag). |
| docs/plugins/Design.md | New plugin namespace design spec (currently overstates existing plugin registration API). |
| docs/openai/Design.md | New OpenAI adapter design spec (currently documents non-existent adapter/accumulator methods). |
| docs/metrics/Design.md | New metrics design spec (EventRecorder API signature section currently mismatches implementation). |
| docs/load_generator/Design.md | New load generator design spec (session/scheduler architecture). |
| docs/evaluation/Design.md | New evaluation design spec (accuracy scoring + LiveCodeBench). |
| docs/endpoint_client/Design.md | New endpoint client design spec (worker pool, ZMQ IPC, adapters). |
| docs/dataset_manager/Design.md | New dataset manager design spec (loader/transforms/presets; format inference list currently incomplete). |
| docs/core/Design.md | New core types design spec (currently mismatches actual Query/QueryResult struct fields). |
| docs/config/Design.md | New config design spec (YAML/CLI → RuntimeSettings, templates, rulesets). |
| docs/commands/Design.md | New commands layer design spec (CLI layout and command flow). |
| docs/async_utils/services/metrics_aggregator/Design.md | Adds a clearer one-line summary of the metrics aggregator service. |
| docs/async_utils/services/event_logger/Design.md | Adds a clearer one-line summary of the event logger service. |
| docs/async_utils/services/Design.md | Adds a clearer one-line summary of the pub/sub system design doc. |
| docs/async_utils/Design.md | New async_utils design spec (loop manager, ZMQ transport, pub/sub services). |
| docs/LOCAL_TESTING.md | Updates local testing instructions (dataset path, init syntax, default duration, supported formats list). |
| docs/GITHUB_SETUP.md | Updates GitHub workflow descriptions and branch protection checklist. |
| docs/ENDPOINT_CLIENT.md | Adds a short introductory summary line for the document. |
| docs/DEVELOPMENT.md | Major rewrite of development workflow guide (fork/upstream flow, tooling, formatting, links). |
| docs/CLI_QUICK_REFERENCE.md | Replaces intro text and adjusts examples/wording for config-driven usage. |
| README.md | Updates docs links, architecture diagram, LiveCodeBench link, and minor command snippets. |
| CONTRIBUTING.md | Adds pointer to docs/DEVELOPMENT.md for standards. |
| AGENTS.md | Simplifies setup section and updates repo structure/tooling references. |
4ddf686 to c5e2ed6
Pull request overview
Copilot reviewed 27 out of 27 changed files in this pull request and generated 4 comments.
nvzhihanj left a comment
Review Council: Multi-AI Code Review
Reviewed by: Claude (Codex unavailable; branch checkout failed) | Depth: thorough
Found 9 issues across 5 files. All verified against actual source code.
17 existing inline comments from a prior review were already present. Overlapping issues have been excluded from this review.
🔴 Must Fix (high): 3 issues
Issues where the documentation will actively mislead readers about how the code works.
🟡 Should Fix (medium): 4 issues
Real inaccuracies that could cause confusion under specific circumstances.
🔵 Consider (low): 2 issues
Valid improvements that could be follow-ups.
🤖 Generated with Claude Code — Review Council |
0db4683 to 7196894
Pull request overview
Copilot reviewed 27 out of 27 changed files in this pull request and generated 5 comments.
Pull request overview
Copilot reviewed 27 out of 27 changed files in this pull request and generated 1 comment.
- Add Design.md specs for all 15 top-level components under src/inference_endpoint/
- Restructure AGENTS.md: move code style details to DEVELOPMENT.md, update component table with runner.py and async_utils services
- Update README.md: add Component Design Specs table, use python3 in examples
- Reformat DEVELOPMENT.md: remove emojis, add commit type list, exact-version pinning guidance
- Update CLI_QUICK_REFERENCE.md, LOCAL_TESTING.md, ENDPOINT_CLIENT.md, GITHUB_SETUP.md for consistency
- Fix stale references: pkl→jsonl throughout, CLIError for eval mode, dataset_manager Design.md reflects current supported formats
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
465f559 to 9a7697b
Pull request overview
Copilot reviewed 27 out of 27 changed files in this pull request and generated no new comments.
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Pull request overview
Copilot reviewed 35 out of 36 changed files in this pull request and generated 2 comments.
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Pull request overview
Copilot reviewed 34 out of 35 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
docs/async_utils/services/Design.md:22
- The out-of-process subscriber description suggests using a “shared socket directory” like socket_dir=log_dir.parent, but IPC subscribers must use the publisher’s ManagedZMQContext.socket_dir (the directory where the PUB socket was actually bound). If the directory differs, ctx.connect(..., socket_name) will point at a non-existent IPC path. Recommend rewording this to emphasize that the parent process must pass the publisher’s socket_dir to child processes via --socket-dir.
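The point about socket directories can be sketched as follows. This is a minimal illustration using only the standard library, not the project's ManagedZMQContext API: the helper names (ipc_address, child_argv, parse_child_args), the subscriber.py script, the example directory, and the --socket-name flag are all hypothetical; only --socket-dir comes from the comment above. It shows that when the parent forwards the directory where the PUB socket was bound, the child resolves the same IPC endpoint.

```python
import argparse
import sys
from pathlib import Path


def ipc_address(socket_dir: Path, socket_name: str) -> str:
    """Build a ZeroMQ-style ipc:// endpoint from a directory and a socket name."""
    return f"ipc://{socket_dir / socket_name}"


def child_argv(socket_dir: Path, socket_name: str) -> list:
    """Command line a parent might use to launch a subscriber (hypothetical script)."""
    return [
        sys.executable, "subscriber.py",
        "--socket-dir", str(socket_dir),
        "--socket-name", socket_name,  # hypothetical flag, for illustration only
    ]


def parse_child_args(argv: list) -> argparse.Namespace:
    """Argument parsing as the child process would do it."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--socket-dir", type=Path, required=True)
    parser.add_argument("--socket-name", required=True)
    return parser.parse_args(argv)


# Directory where the publisher bound its PUB socket (example value).
publisher_dir = Path("/tmp/run-1234/sockets")
publisher_endpoint = ipc_address(publisher_dir, "events")

# The parent forwards that exact directory; the child rebuilds the endpoint from it.
argv = child_argv(publisher_dir, "events")
args = parse_child_args(argv[2:])
subscriber_endpoint = ipc_address(args.socket_dir, args.socket_name)
```

If the child instead derived its directory independently (for example from a log directory), the two endpoints could differ and the connect would target a path that no socket is bound to.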
Design for hybrid overlap warmup that primes server-side (CUDA graphs, KV cache, batch scheduler) and client-side (connection pools, workers) before steady-state measurement. Empirically validated against a 13-level concurrency sweep on B200 (GPT OSS 120B) showing 2-43x TTFT cold-start inflation at c>=64.
Key decisions:
- User-provided warmup dataset (similar ISL/OSL, different content)
- Fixed sample count (default = target concurrency)
- Same load pattern as performance phase
- Hybrid overlap: no batch drain gap, START_PERFORMANCE_TRACKING fires on last warmup completion
- Single EventRecorder/SQLite DB with time-window filtering
Includes analysis script (plot_warmup_analysis.py) generating 5 figure sets comparing no-warmup, drain, and hybrid strategies.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
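The hybrid overlap idea in this commit message could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's implementation: WarmupTracker, on_complete, and in_window are hypothetical names, events are plain tuples rather than EventRecorder rows, and filtering on issue time (rather than completion time) is an assumption about how the time window is applied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WarmupTracker:
    """Records when the last warmup sample completes (hypothetical helper)."""
    warmup_total: int
    warmup_done: int = 0
    tracking_start: Optional[float] = None

    def on_complete(self, is_warmup: bool, timestamp: float) -> None:
        # Only warmup completions matter, and only until tracking has started.
        if not is_warmup or self.tracking_start is not None:
            return
        self.warmup_done += 1
        if self.warmup_done == self.warmup_total:
            # Stand-in for firing START_PERFORMANCE_TRACKING.
            self.tracking_start = timestamp


def in_window(events, start):
    """Keep (sample_id, issued_at, completed_at) rows issued at or after start."""
    return [e for e in events if e[1] >= start]


# Performance samples (p*) are issued while warmup samples (w*) are still in
# flight, so there is no drain gap between the two phases.
events = [("w0", 0.0, 1.0), ("p0", 0.5, 1.8), ("w1", 0.1, 1.5), ("p1", 1.6, 2.4)]

tracker = WarmupTracker(warmup_total=2)
for sample_id, issued_at, completed_at in sorted(events, key=lambda e: e[2]):
    tracker.on_complete(sample_id.startswith("w"), completed_at)

measured = in_window(events, tracker.tracking_start)
```

Here only p1 falls inside the measurement window, because it was issued after the last warmup completion; p0 overlapped the warmup phase and is excluded by the time filter rather than by a separate database.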
nit: Can we use all-caps or all-lowercase for |
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Pull request overview
Copilot reviewed 35 out of 36 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (3)
scripts/zmq_pubsub_demo.py:248
- event_log_dir is a fixed path under the system temp directory. When the demo is run multiple times, events.db can persist and accumulate old rows, which makes the SQLWriter verification output misleading (it may show stale data even for a run that didn't use the SQL writer). Consider using a unique per-run directory (e.g. tempfile.TemporaryDirectory() or adding a UUID suffix) and/or deleting any existing events.db before starting the subprocess.
scripts/zmq_pubsub_demo.py:289
- LoopManager() is instantiated inside a coroutine that is executed via asyncio.run(). LoopManager.__init__ creates and sets a new "default" event loop for the current thread (asyncio.set_event_loop(loop)), which can interfere with asyncio.run()'s running loop and lead to tasks being scheduled on the wrong loop. It would be safer to avoid creating LoopManager under asyncio.run() (e.g., run the demo on LoopManager().default_loop.run_until_complete(main()) or use the project’s async runner utility).
scripts/zmq_pubsub_demo.py:110
- In DurationSubscriber.process(), entries added to self.start_times on ISSUED are never removed after COMPLETE. In a longer-running demo this will grow without bound. After computing a duration, consider deleting the corresponding start_times entry (or using pop()), and optionally dropping self.durations too if you only need aggregate stats.
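Two of the suggestions above (a unique per-run directory and removing start-time entries once a duration is computed) can be sketched as follows. This is a simplified stand-in, not the demo script itself: DurationTracker, on_issued, and on_complete are hypothetical names that only mirror the start_times/durations attributes mentioned in the comment, and the directory prefix is made up.

```python
import tempfile
from pathlib import Path


class DurationTracker:
    """Tracks per-sample durations without letting start_times grow unbounded."""

    def __init__(self) -> None:
        self.start_times = {}
        self.durations = []

    def on_issued(self, sample_id: str, timestamp: float) -> None:
        self.start_times[sample_id] = timestamp

    def on_complete(self, sample_id: str, timestamp: float) -> None:
        # pop() removes the entry, so completed samples do not accumulate.
        started = self.start_times.pop(sample_id, None)
        if started is not None:
            self.durations.append(timestamp - started)


# A fresh directory per run means events.db cannot contain rows from earlier runs.
with tempfile.TemporaryDirectory(prefix="zmq_pubsub_demo_") as run_dir:
    db_path = Path(run_dir) / "events.db"
    db_existed_before_run = db_path.exists()

tracker = DurationTracker()
tracker.on_issued("q1", 10.0)
tracker.on_complete("q1", 12.5)
```

The temporary directory is deleted when the with block exits, so a real demo would keep the subprocess and any verification step inside that block.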
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
What does this PR do?
Update the docs.
Type of change
Related issues
Testing
Checklist