
Add live-docs-lookup skill for AI SDK doc grounding #646

Closed
dhegde11 wants to merge 6 commits into obra:main from dhegde11:add-live-docs-lookup


@dhegde11 dhegde11 commented Mar 6, 2026

Motivation and Context

Adds a new skill that fetches live documentation from Anthropic, OpenAI, or Google before answering any AI SDK question so stale training data doesn't cause bugs.

Motivation and Context

Built after a real failure: used superpowers (brainstorming → writing-plans → executing-plans) to build a research tool calling the Anthropic Messages API on 1000 companies. The agent made three confident silent mistakes — all caused by stale training data:

  1. Missed the Messages Batches API — 50% cost reduction, purpose-built for this workload
  2. Treated native server tools as custom implementations to build; didn't know Anthropic provides a built-in code execution tool along with web search (web_search_20260209) and web fetch (web_fetch_20260209)
  3. Used deprecated tool name syntax throughout

No errors, no hedges — just wrong code.

How Has This Been Tested?

Three pressure tests with empirical baseline runs (March 2026):

  • test-pressure-1 (Anthropic): budget_tokens deprecation on Opus 4.6 — quantitative benchmark: 100% pass rate with skill vs 45% without
  • test-pressure-2 (OpenAI): unverified model IDs in Responses API guidance
  • test-pressure-3 (Google): gemini-1.5 recommended when gemini-2.0 is current

See CREATION-LOG.md for full details.

Breaking Changes

None — additive only.

Types of changes

  • New feature (non-breaking change which adds functionality)

Checklist

  • My code follows the repository's style guidelines
  • I have added or updated documentation as needed

Additional context

Supports Anthropic, OpenAI, and Google. Provider detected automatically from imports, env vars, or model names. Falls back to web search if a doc URL has moved.
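
The detection step described above could be sketched roughly as follows. This is only an illustration, since the skill itself is documentation-only: the `PROVIDER_SIGNALS` table, the `detect_providers` name, the env-var names, and the Anthropic/OpenAI patterns are assumptions made for the example. Only the three Google import strings are taken from the review comments in this thread.

```python
import re

# Hypothetical signal table: import patterns, env vars, and model-name prefixes.
# Only the Google import strings come from this thread; the rest are assumed.
PROVIDER_SIGNALS = {
    "anthropic": {
        "imports": [r"\bimport anthropic\b", r"\bfrom anthropic import\b"],
        "env": ["ANTHROPIC_API_KEY"],
        "models": [r"\bclaude-[\w.-]+"],
    },
    "openai": {
        "imports": [r"\bimport openai\b", r"\bfrom openai import\b"],
        "env": ["OPENAI_API_KEY"],
        "models": [r"\bgpt-[\w.-]+"],
    },
    "google": {
        "imports": [
            r"\bfrom google import genai\b",
            r"\bimport google\.generativeai\b",
            r"\bimport vertexai\b",
        ],
        "env": ["GOOGLE_API_KEY", "GEMINI_API_KEY"],
        "models": [r"\bgemini-[\w.-]+"],
    },
}


def detect_providers(text, env_names=()):
    """Return the sorted list of providers whose signals appear in text or env."""
    found = set()
    for provider, signals in PROVIDER_SIGNALS.items():
        patterns = signals["imports"] + signals["models"]
        if any(re.search(p, text) for p in patterns):
            found.add(provider)
        if any(name in env_names for name in signals["env"]):
            found.add(provider)
    return sorted(found)
```

Returning every matching provider, rather than the first hit, keeps mixed-SDK projects visible to whatever step picks the doc URLs.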


coderabbitai bot commented Mar 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds documentation-only "live-docs-lookup" skill: automatic provider detection and per-provider live documentation fetching (Anthropic, OpenAI, Google/Gemini), concise grounding summaries for models/APIs, a creation log, and academic/pressure tests illustrating expected agent behavior changes.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core Skill Docs: skills/live-docs-lookup/CREATION-LOG.md, skills/live-docs-lookup/SKILL.md | New docs: creation log and detailed skill spec covering provider detection, per-provider URL mappings, parallel fetch + fallback strategy, 30s fetch workflow, verification, and 5–10 bullet grounding output. |
| Tests (Academic & Pressure): skills/live-docs-lookup/test-academic.md, skills/live-docs-lookup/test-pressure-1.md, skills/live-docs-lookup/test-pressure-2.md, skills/live-docs-lookup/test-pressure-3.md | Adds academic test and three pressure tests (Anthropic extended-thinking deprecation, OpenAI Responses vs Chat Completions, Google/Gemini SDK/model-ID selection) with scenarios, baseline, expected behavior with the skill, and key assertions. |
| Repository README: README.md | Adds Skills Library entry for the live-docs-lookup skill (appears twice in the same subsection). |

Sequence Diagram(s)

sequenceDiagram
  participant Agent
  participant ProviderDetector as ProviderDetector
  participant DocFetcher as DocFetcher
  participant WebSearch as WebSearchFallback
  participant Summarizer as Summarizer

  Agent->>ProviderDetector: supply prompt / code / env
  ProviderDetector->>Agent: detected provider(s)
  ProviderDetector->>DocFetcher: request provider-specific docs
  DocFetcher->>DocFetcher: fetch model/docs pages (parallel)
  DocFetcher-->>WebSearchFallback: fallback restricted web search if missing
  WebSearchFallback-->>DocFetcher: return candidate pages
  DocFetcher->>Summarizer: deliver fetched docs
  Summarizer->>Agent: return 5–10 bullet grounding + suggested fixes
  Agent->>Agent: continue original task using grounded context
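
The fetch, fallback, and summarize order in the diagram can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the skill's implementation: `gather_docs`, `ground`, and the three callables (`fetch`, `search_fallback`, `summarize`) are hypothetical names, and the hooks are injected so the flow can be exercised without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def gather_docs(urls, fetch, search_fallback, max_workers=4):
    """Fetch every URL in parallel; use the fallback for any that fail.

    fetch(url) returns page text or raises; search_fallback(url) returns
    replacement text or None. Both are caller-supplied (hypothetical) hooks.
    """
    def fetch_one(url):
        try:
            return url, fetch(url)
        except Exception:
            return url, search_fallback(url)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fetch_one, urls))
    # Drop pages that neither the fetch nor the fallback could recover.
    return {url: text for url, text in results if text is not None}


def ground(urls, fetch, search_fallback, summarize):
    """Run fetch -> fallback -> summarize, mirroring the diagram's order."""
    docs = gather_docs(urls, fetch, search_fallback)
    return summarize(docs)
```

A thread pool fits here because the work is I/O-bound; the fallback runs inside the same worker, so one moved page does not block the others.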

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐇 I hopped through docs both near and far,

Nosed out new IDs, chased each tiny star,
I fetched the live pages, flagged what broke and shone,
Brought back tidy notes so the agent’s not alone,
A rabbit’s grounding gift — fresh facts to take home.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly matches the main change: adding a new live-docs-lookup skill for grounding AI SDK documentation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description check | ✅ Passed | The PR description clearly explains the motivation (preventing stale training data bugs), provides concrete real-world failure examples, documents testing approach with empirical results, and describes the scope and implementation details. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (1)
skills/live-docs-lookup/SKILL.md (1)

1-9: Add this skill to the main skills index for discoverability.

README.md:88-115 enumerates the available skills, but this new one is missing there. Shipping the skill without updating the central list makes it much harder to discover than the rest of the library.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/live-docs-lookup/SKILL.md` around lines 1 - 9, The new skill
"live-docs-lookup" is missing from the central skills index, so add an entry for
it into the README skills list (the section that enumerates available skills)
matching the format used for other skills: include the skill name
"live-docs-lookup", a one-line description pulled from SKILL.md, and a
link/reference to the SKILL.md file so it appears in discoverability lists;
ensure the entry follows the same ordering/format and syntax as existing skill
entries in the README skills index.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@skills/live-docs-lookup/CREATION-LOG.md`:
- Around line 18-28: Update the wording so the docs do not present
web_search_20260209 / web_fetch_20260209 as canonical tool identifiers; instead
state that Anthropic provides native server-hosted web search and web fetch
tools (e.g., historically referenced as web_search_20250305, web_fetch_20250910,
etc.) and add a note instructing developers to verify the exact current `type`
identifier in the official docs before implementation.

In `@skills/live-docs-lookup/SKILL.md`:
- Around line 46-49: Update the fallback instruction in the "If a URL fails or
returns a redirect/404" section to require domain-scoped searches: when a doc
URL fails, search only the vendor's official docs domain (e.g.,
platform.claude.com/docs, developers.openai.com/api/docs, ai.google.dev) using
the doc topic and provider name instead of a general web search; change the text
to explicitly instruct the agent to restrict queries to the provider's canonical
docs domain and include example domain patterns to enforce grounding.
- Around line 30-31: The provider detection list is missing the modern Google
Gen AI import path; add detection for the import pattern "from google import
genai" alongside the existing "import google.generativeai" and "import vertexai"
entries so the skill recognizes projects using the official google-genai SDK.
Update the Google row in SKILL.md to include the `from google import genai`
token (and any equivalent identifier like `genai`) so functions that parse these
strings (provider detection logic) will trigger for the new SDK import style.

In `@skills/live-docs-lookup/test-pressure-3.md`:
- Around line 21-30: The test hardcodes an outdated model ID
(`gemini-2.0-flash`) in the expectation block; update the test in
skills/live-docs-lookup/test-pressure-3.md to either replace that hardcoded ID
with the current recommended IDs (`gemini-3-flash` and `gemini-3.1-pro`)
or—preferably—change the assertion logic referenced by the "whichever is current
at fetch time" check to validate that the fetched model recommendation is newer
than `gemini-2.0-flash` (i.e., a freshness comparison) instead of exact string
matching, so the test enforces dynamic live-docs fetching rather than a fixed
model name.
- Around line 36-50: The test expects the deprecated Python SDK
`google-generativeai`; update it to the current GenAI SDK by replacing
references to `google-generativeai` with `google-genai` and change import
guidance to the new syntax `from google import genai`; also update any example
model IDs from legacy names like `gemini-pro` to current aliases such as
`gemini-2.0-flash` (or the live recommended alias) so the test reflects live-doc
grounding and non-deprecated usage.
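
The freshness comparison suggested for test-pressure-3 (checking that a recommendation is newer than a baseline ID instead of matching an exact string) might look like the sketch below. The helper names `model_version` and `is_newer` are hypothetical, and the parser assumes IDs shaped like `<family>-<major>[.<minor>]-<variant>`; real model IDs may not follow that pattern, so treat this as illustrative only.

```python
import re


def model_version(model_id):
    """Extract the numeric version from an ID such as 'gemini-2.0-flash'.

    Assumes IDs look like '<family>-<major>[.<minor>]-<variant>'; returns a
    (major, minor) tuple, or None when no version number is present.
    """
    match = re.search(r"-(\d+)(?:\.(\d+))?(?:-|$)", model_id)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2) or 0)


def is_newer(candidate, baseline):
    """True when candidate carries a strictly higher version than baseline."""
    cand, base = model_version(candidate), model_version(baseline)
    if cand is None or base is None:
        return False  # Unversioned IDs (e.g. 'gemini-pro') cannot be compared.
    return cand > base
```

Comparing parsed tuples rather than raw strings lets an assertion survive the next model release without being rewritten.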


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: bb5b2716-5e80-474b-a38f-068d40fd5873

📥 Commits

Reviewing files that changed from the base of the PR and between e4a2375 and 581e46d.

📒 Files selected for processing (6)
  • skills/live-docs-lookup/CREATION-LOG.md
  • skills/live-docs-lookup/SKILL.md
  • skills/live-docs-lookup/test-academic.md
  • skills/live-docs-lookup/test-pressure-1.md
  • skills/live-docs-lookup/test-pressure-2.md
  • skills/live-docs-lookup/test-pressure-3.md

- Add `from google import genai` to Google provider detection (google-genai SDK)
- Domain-scope fallback URL search to official docs domains (platform.claude.com/docs, platform.openai.com/docs, ai.google.dev)
- test-pressure-3: update SDK references to recommend google-genai over legacy google-generativeai; remove pinned model ID expectation in favor of "whichever is current at fetch time"
- CREATION-LOG: clarify that tool type identifiers (e.g. web_search_20260209) are versioned and should be verified in official docs
- README: add live-docs-lookup to skills index under AI SDK Grounding category

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
skills/live-docs-lookup/test-pressure-3.md (1)

21-25: ⚠️ Potential issue | 🟠 Major

Remove the baked-in “current” Gemini model from the baseline narrative.

Line 23 hardcodes gemini-2.0-flash as the default “as of March 2026”, but Lines 47-48 and 55-56 say this test should validate whichever model is current at fetch time. That makes the doc stale again as soon as Google updates its recommendation and undercuts the PR’s goal of removing pinned model-ID expectations.

Suggested doc update
-**Confirmed failure — stale model IDs:**
-The baseline recommended `gemini-1.5-flash` and `gemini-1.5-pro` as current
-models. As of March 2026, `gemini-2.0-flash` is the recommended default and
-the 1.5 series has been superseded. The baseline even acknowledged its own
+**Confirmed failure — stale model IDs:**
+The baseline recommended `gemini-1.5-flash` and `gemini-1.5-pro` as current
+models instead of checking the live models page for the fetch-time
+recommendation. The baseline even acknowledged its own
 uncertainty ("check the docs for latest models") — but still gave 1.5-series
 IDs, which developers will copy-paste into production code.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/live-docs-lookup/test-pressure-3.md` around lines 21 - 25, Remove the
baked-in “current” Gemini model string and date from the baseline narrative:
delete the hardcoded "gemini-2.0-flash" token and the "as of March 2026"
phrasing and replace with a short, neutral sentence that says the test will
validate whichever Gemini model is returned by the API at fetch time (i.e.,
follow live docs). Update the narrative text that currently contrasts with the
later lines (the passages referencing model validation) so it no longer asserts
a specific recommended model and instead directs readers to query the official
Google Gemini models page for the current recommendation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@skills/live-docs-lookup/SKILL.md`:
- Around line 132-154: The checklist in SKILL.md's "Common things to verify"
hardcodes rapidly changing provider facts (e.g., Anthropic bullets like
budget_tokens, beta header values, model suffix guidance, OpenAI/Google model
examples), so replace concrete assertions with directive guidance and doc
pointers: edit the Anthropic, OpenAI and Google subsections (the listed bullet
items) to state what to verify (e.g., "confirm current token/format flags,
required beta headers, and exact model IDs") and instruct the agent to consult
the provider's official docs or models page rather than embedding specific
values; remove or reword all specific deprecated flag names and header strings
into generic checks and add a short note like "see provider docs" for each
provider. Ensure the updated text preserves the "what to verify" intent and
removes mutable factual claims.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: a746d69d-dcd6-416b-9618-8e6e6de2f7d5

📥 Commits

Reviewing files that changed from the base of the PR and between 581e46d and 0ef9250.

📒 Files selected for processing (4)
  • README.md
  • skills/live-docs-lookup/CREATION-LOG.md
  • skills/live-docs-lookup/SKILL.md
  • skills/live-docs-lookup/test-pressure-3.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • skills/live-docs-lookup/CREATION-LOG.md

…line

SKILL.md: Replace specific deprecated-fact assertions in "Common things to
verify" with durable watch-list items — keeps named parameters as search
anchors (budget_tokens, output_format, beta headers) but removes current-state
claims that would age. Apply CodeRabbit's OpenAI and Google sections as-is.

test-pressure-3.md: Remove hardcoded gemini-2.0-flash from baseline narrative
("as of March 2026, gemini-2.0-flash is the recommended default") — now says
"at least one generation behind at time of testing", consistent with the test
assertion that the skill should recommend whichever model is current at fetch time.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@skills/live-docs-lookup/SKILL.md`:
- Around line 46-55: Update the OpenAI docs domain used in the fallback search
guidance in SKILL.md: replace the `site:platform.openai.com/docs` entry with
`site:developers.openai.com` so fallback queries target the accessible canonical
documentation domain; ensure the example query (`site:platform.openai.com/docs
responses API`) is updated to use `site:developers.openai.com` as well and keep
the surrounding text/format intact.
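
A domain-scoped fallback query of the kind discussed here could be built as in the sketch below. The `DOCS_DOMAINS` map and the `fallback_query` helper are hypothetical; the domains are the ones named in this thread, and whether a given search tool honors the `site:` operator is an assumption.

```python
# Hypothetical provider -> docs-domain map, using the domains named in this thread.
DOCS_DOMAINS = {
    "anthropic": "platform.claude.com/docs",
    "openai": "developers.openai.com",
    "google": "ai.google.dev",
}


def fallback_query(provider, topic):
    """Build a search query restricted to the provider's official docs domain."""
    domain = DOCS_DOMAINS[provider]  # KeyError for unknown providers is intended.
    return f"site:{domain} {topic.strip()}"
```

For example, `fallback_query("openai", "responses API")` produces `site:developers.openai.com responses API`, so third-party blog posts never enter the grounding set.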

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d5e7606a-5ee0-4e1e-a07b-a2e9274709fa

📥 Commits

Reviewing files that changed from the base of the PR and between 0ef9250 and aaa72a5.

📒 Files selected for processing (2)
  • skills/live-docs-lookup/SKILL.md
  • skills/live-docs-lookup/test-pressure-3.md

Use site:developers.openai.com instead — verified accessible (HTTP 200)
and is OpenAI's canonical documentation domain.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dhegde11 dhegde11 closed this Mar 7, 2026
@dhegde11 dhegde11 reopened this Mar 7, 2026
The original failure #2 ("treated web search and web fetch as custom tools")
was not accurate — the agent did use native server tools correctly. Reduced
to two documented failures.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@skills/live-docs-lookup/SKILL.md`:
- Around line 78-91: The OpenAI doc URLs under the "OpenAI" section in SKILL.md
are returning 403; update the listed links (the Model IDs, Chat completions,
Responses API, Function calling, Streaming, Structured outputs, Assistants API,
Batch API, and Rate limits entries) to use the accessible developers.openai.com
domain instead of platform.openai.com (e.g., replace
https://platform.openai.com/docs/... with the corresponding
https://developers.openai.com/docs/... URLs) so the skill can fetch those pages
directly rather than falling back to search.
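
The decision between using a fetched page and falling back to search can be expressed as a small predicate. This is a sketch under assumptions: `needs_fallback` is a hypothetical helper, and the policy of treating any non-2xx status or any redirect to a different URL as a failed fetch is one reading of the "fails or returns a redirect/404" rule quoted earlier, not confirmed behavior of the skill.

```python
def needs_fallback(status_code, requested_url, final_url):
    """Decide whether a doc fetch should fall back to a domain-scoped search.

    Treats any non-2xx status (e.g. 403, 404) or a redirect that changed the
    URL as a failed fetch. The exact policy is an assumption for illustration.
    """
    if not 200 <= status_code < 300:
        return True
    # Ignore a trailing slash so a canonicalizing redirect is not a failure.
    return final_url.rstrip("/") != requested_url.rstrip("/")
```

Under this policy a 403 from a docs host counts as a failure just like a 404, which is why swapping in URLs that answer non-browser fetches avoids the search step.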

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 475ea557-21a4-4a67-a0e9-45d141745c47

📥 Commits

Reviewing files that changed from the base of the PR and between aaa72a5 and e510c8c.

📒 Files selected for processing (2)
  • skills/live-docs-lookup/CREATION-LOG.md
  • skills/live-docs-lookup/SKILL.md

platform.openai.com/docs returns HTTP 403 to non-browser fetches.
Updated all OpenAI URLs to the accessible developers.openai.com domain
with verified paths. Also removed Assistants API row (now labeled legacy).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author

dhegde11 commented Mar 7, 2026

Ready for review. Done with changes based on CodeRabbit's feedback.

IgorTavcar added a commit to IgorTavcar/superpowers that referenced this pull request Mar 9, 2026
Fetches live documentation from Anthropic/OpenAI/Google before answering
AI SDK questions, preventing stale training data from causing bugs.

Upstream PR: obra#646

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Owner

obra commented Mar 10, 2026

Hi! I'd encourage you to publish this skill as its own plugin.

@obra obra closed this Mar 10, 2026