Add live-docs-lookup skill for AI SDK doc grounding#646
Add live-docs-lookup skill for AI SDK doc grounding#646
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds documentation-only "live-docs-lookup" skill: automatic provider detection and per-provider live documentation fetching (Anthropic, OpenAI, Google/Gemini), concise grounding summaries for models/APIs, a creation log, and academic/pressure tests illustrating expected agent behavior changes. Changes
Sequence Diagram(s)sequenceDiagram
participant Agent
participant ProviderDetector as ProviderDetector
participant DocFetcher as DocFetcher
participant WebSearch as WebSearchFallback
participant Summarizer as Summarizer
Agent->>ProviderDetector: supply prompt / code / env
ProviderDetector->>Agent: detected provider(s)
ProviderDetector->>DocFetcher: request provider-specific docs
DocFetcher->>DocFetcher: fetch model/docs pages (parallel)
DocFetcher-->>WebSearchFallback: fallback restricted web search if missing
WebSearchFallback-->>DocFetcher: return candidate pages
DocFetcher->>Summarizer: deliver fetched docs
Summarizer->>Agent: return 5–10 bullet grounding + suggested fixes
Agent->>Agent: continue original task using grounded context
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (1)
skills/live-docs-lookup/SKILL.md (1)
1-9: Add this skill to the main skills index for discoverability.
README.md:88-115enumerates the available skills, but this new one is missing there. Shipping the skill without updating the central list makes it much harder to discover than the rest of the library.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/live-docs-lookup/SKILL.md` around lines 1 - 9, The new skill "live-docs-lookup" is missing from the central skills index, so add an entry for it into the README skills list (the section that enumerates available skills) matching the format used for other skills: include the skill name "live-docs-lookup", a one-line description pulled from SKILL.md, and a link/reference to the SKILL.md file so it appears in discoverability lists; ensure the entry follows the same ordering/format and syntax as existing skill entries in the README skills index.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@skills/live-docs-lookup/CREATION-LOG.md`:
- Around line 18-28: Update the wording so the docs do not present
web_search_20260209 / web_fetch_20260209 as canonical tool identifiers; instead
state that Anthropic provides native server-hosted web search and web fetch
tools (e.g., historically referenced as web_search_20250305, web_fetch_20250910,
etc.) and add a note instructing developers to verify the exact current `type`
identifier in the official docs before implementation.
In `@skills/live-docs-lookup/SKILL.md`:
- Around line 46-49: Update the fallback instruction in the "If a URL fails or
returns a redirect/404" section to require domain-scoped searches: when a doc
URL fails, search only the vendor's official docs domain (e.g.,
platform.claude.com/docs, developers.openai.com/api/docs, ai.google.dev) using
the doc topic and provider name instead of a general web search; change the text
to explicitly instruct the agent to restrict queries to the provider's canonical
docs domain and include example domain patterns to enforce grounding.
- Around line 30-31: The provider detection list is missing the modern Google
Gen AI import path; add detection for the import pattern "from google import
genai" alongside the existing "import google.generativeai" and "import vertexai"
entries so the skill recognizes projects using the official google-genai SDK.
Update the Google row in SKILL.md to include the `from google import genai`
token (and any equivalent identifier like `genai`) so functions that parse these
strings (provider detection logic) will trigger for the new SDK import style.
In `@skills/live-docs-lookup/test-pressure-3.md`:
- Around line 21-30: The test hardcodes an outdated model ID
(`gemini-2.0-flash`) in the expectation block; update the test in
skills/live-docs-lookup/test-pressure-3.md to either replace that hardcoded ID
with the current recommended IDs (`gemini-3-flash` and `gemini-3.1-pro`)
or—preferably—change the assertion logic referenced by the "whichever is current
at fetch time" check to validate that the fetched model recommendation is newer
than `gemini-2.0-flash` (i.e., a freshness comparison) instead of exact string
matching, so the test enforces dynamic live-docs fetching rather than a fixed
model name.
- Around line 36-50: The test expects the deprecated Python SDK
`google-generativeai`; update it to the current GenAI SDK by replacing
references to `google-generativeai` with `google-genai` and change import
guidance to the new syntax `from google import genai`; also update any example
model IDs from legacy names like `gemini-pro` to current aliases such as
`gemini-2.0-flash` (or the live recommended alias) so the test reflects live-doc
grounding and non-deprecated usage.
---
Nitpick comments:
In `@skills/live-docs-lookup/SKILL.md`:
- Around line 1-9: The new skill "live-docs-lookup" is missing from the central
skills index, so add an entry for it into the README skills list (the section
that enumerates available skills) matching the format used for other skills:
include the skill name "live-docs-lookup", a one-line description pulled from
SKILL.md, and a link/reference to the SKILL.md file so it appears in
discoverability lists; ensure the entry follows the same ordering/format and
syntax as existing skill entries in the README skills index.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: bb5b2716-5e80-474b-a38f-068d40fd5873
📒 Files selected for processing (6)
skills/live-docs-lookup/CREATION-LOG.mdskills/live-docs-lookup/SKILL.mdskills/live-docs-lookup/test-academic.mdskills/live-docs-lookup/test-pressure-1.mdskills/live-docs-lookup/test-pressure-2.mdskills/live-docs-lookup/test-pressure-3.md
- Add `from google import genai` to Google provider detection (google-genai SDK) - Domain-scope fallback URL search to official docs domains (platform.claude.com/docs, platform.openai.com/docs, ai.google.dev) - test-pressure-3: update SDK references to recommend google-genai over legacy google-generativeai; remove pinned model ID expectation in favor of "whichever is current at fetch time" - CREATION-LOG: clarify that tool type identifiers (e.g. web_search_20260209) are versioned and should be verified in official docs - README: add live-docs-lookup to skills index under AI SDK Grounding category Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
skills/live-docs-lookup/test-pressure-3.md (1)
21-25:⚠️ Potential issue | 🟠 MajorRemove the baked-in “current” Gemini model from the baseline narrative.
Line 23 hardcodes
gemini-2.0-flashas the default “as of March 2026”, but Lines 47-48 and 55-56 say this test should validate whichever model is current at fetch time. That makes the doc stale again as soon as Google updates its recommendation and undercuts the PR’s goal of removing pinned model-ID expectations.Suggested doc update
-**Confirmed failure — stale model IDs:** -The baseline recommended `gemini-1.5-flash` and `gemini-1.5-pro` as current -models. As of March 2026, `gemini-2.0-flash` is the recommended default and -the 1.5 series has been superseded. The baseline even acknowledged its own +**Confirmed failure — stale model IDs:** +The baseline recommended `gemini-1.5-flash` and `gemini-1.5-pro` as current +models instead of checking the live models page for the fetch-time +recommendation. The baseline even acknowledged its own uncertainty ("check the docs for latest models") — but still gave 1.5-series IDs, which developers will copy-paste into production code.Google Gemini API models page current recommended model March 2026🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/live-docs-lookup/test-pressure-3.md` around lines 21 - 25, Remove the baked-in “current” Gemini model string and date from the baseline narrative: delete the hardcoded "gemini-2.0-flash" token and the "as of March 2026" phrasing and replace with a short, neutral sentence that says the test will validate whichever Gemini model is returned by the API at fetch time (i.e., follow live docs). Update the narrative text that currently contrasts with the later lines (the passages referencing model validation) so it no longer asserts a specific recommended model and instead directs readers to query the official Google Gemini models page for the current recommendation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@skills/live-docs-lookup/SKILL.md`:
- Around line 132-154: The checklist in SKILL.md's "Common things to verify"
hardcodes rapidly changing provider facts (e.g., Anthropic bullets like
budget_tokens, beta header values, model suffix guidance, OpenAI/Google model
examples), so replace concrete assertions with directive guidance and doc
pointers: edit the Anthropic, OpenAI and Google subsections (the listed bullet
items) to state what to verify (e.g., "confirm current token/format flags,
required beta headers, and exact model IDs") and instruct the agent to consult
the provider's official docs or models page rather than embedding specific
values; remove or reword all specific deprecated flag names and header strings
into generic checks and add a short note like "see provider docs" for each
provider. Ensure the updated text preserves the "what to verify" intent and
removes mutable factual claims.
---
Duplicate comments:
In `@skills/live-docs-lookup/test-pressure-3.md`:
- Around line 21-25: Remove the baked-in “current” Gemini model string and date
from the baseline narrative: delete the hardcoded "gemini-2.0-flash" token and
the "as of March 2026" phrasing and replace with a short, neutral sentence that
says the test will validate whichever Gemini model is returned by the API at
fetch time (i.e., follow live docs). Update the narrative text that currently
contrasts with the later lines (the passages referencing model validation) so it
no longer asserts a specific recommended model and instead directs readers to
query the official Google Gemini models page for the current recommendation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: a746d69d-dcd6-416b-9618-8e6e6de2f7d5
📒 Files selected for processing (4)
README.mdskills/live-docs-lookup/CREATION-LOG.mdskills/live-docs-lookup/SKILL.mdskills/live-docs-lookup/test-pressure-3.md
🚧 Files skipped from review as they are similar to previous changes (1)
- skills/live-docs-lookup/CREATION-LOG.md
…line
SKILL.md: Replace specific deprecated-fact assertions in "Common things to
verify" with durable watch-list items — keeps named parameters as search
anchors (budget_tokens, output_format, beta headers) but removes current-state
claims that would age. Apply CodeRabbit's OpenAI and Google sections as-is.
test-pressure-3.md: Remove hardcoded gemini-2.0-flash from baseline narrative
("as of March 2026, gemini-2.0-flash is the recommended default") — now says
"at least one generation behind at time of testing", consistent with the test
assertion that the skill should recommend whichever model is current at fetch time.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@skills/live-docs-lookup/SKILL.md`:
- Around line 46-55: Update the OpenAI docs domain used in the fallback search
guidance in SKILL.md: replace the `site:platform.openai.com/docs` entry with
`site:developers.openai.com` so fallback queries target the accessible canonical
documentation domain; ensure the example query (`site:platform.openai.com/docs
responses API`) is updated to use `site:developers.openai.com` as well and keep
the surrounding text/format intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: d5e7606a-5ee0-4e1e-a07b-a2e9274709fa
📒 Files selected for processing (2)
skills/live-docs-lookup/SKILL.mdskills/live-docs-lookup/test-pressure-3.md
Use site:developers.openai.com instead — verified accessible (HTTP 200) and is OpenAI's canonical documentation domain. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The original failure obra#2 ("treated web search and web fetch as custom tools") was not accurate — the agent did use native server tools correctly. Reduced to two documented failures. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@skills/live-docs-lookup/SKILL.md`:
- Around line 78-91: The OpenAI doc URLs under the "OpenAI" section in SKILL.md
are returning 403; update the listed links (the Model IDs, Chat completions,
Responses API, Function calling, Streaming, Structured outputs, Assistants API,
Batch API, and Rate limits entries) to use the accessible developers.openai.com
domain instead of platform.openai.com (e.g., replace
https://platform.openai.com/docs/... with the corresponding
https://developers.openai.com/docs/... URLs) so the skill can fetch those pages
directly rather than falling back to search.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 475ea557-21a4-4a67-a0e9-45d141745c47
📒 Files selected for processing (2)
skills/live-docs-lookup/CREATION-LOG.mdskills/live-docs-lookup/SKILL.md
platform.openai.com/docs returns HTTP 403 to non-browser fetches. Updated all OpenAI URLs to the accessible developers.openai.com domain with verified paths. Also removed Assistants API row (now labeled legacy). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Ready for review. Done with changes based on CodeRabbit's feedback. |
Fetches live documentation from Anthropic/OpenAI/Google before answering AI SDK questions, preventing stale training data from causing bugs. Upstream PR: obra#646 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Hi! I'd encourage you to publish this skill as its own plugin. |
Motivation and Context
Adds a new skill that fetches live documentation from Anthropic, OpenAI, or Google before answering any AI SDK question so stale training data doesn't cause bugs.
Motivation and Context
Built after a real failure: used superpowers (brainstorming → writing-plans → executing-plans) to build a research tool calling the Anthropic Messages API on 1000 companies. The agent made three confident silent mistakes — all caused by stale training data:
web_search_20260209) and web fetch (web_fetch_20260209)No errors, no hedges — just wrong code.
How Has This Been Tested?
Three pressure tests with empirical baseline runs (March 2026):
test-pressure-1(Anthropic):budget_tokensdeprecation on Opus 4.6 — quantitative benchmark: 100% pass rate with skill vs 45% withouttest-pressure-2(OpenAI): unverified model IDs in Responses API guidancetest-pressure-3(Google):gemini-1.5recommended whengemini-2.0is currentSee
CREATION-LOG.mdfor full details.Breaking Changes
None — additive only.
Types of changes
Checklist
Additional context
Supports Anthropic, OpenAI, and Google. Provider detected automatically from imports, env vars, or model names. Falls back to web search if a doc URL has moved.