
services: add Surf Inference - OpenAI-compatible LLM inference with streaming sessions#406

Open
tenequm wants to merge 1 commit into tempoxyz:main from cascade-protocol:feat/surf-inference

Conversation


@tenequm tenequm commented Mar 21, 2026

Service: Surf Inference

URL: https://inference.surf.cascade.fyi
Integration: First-party (native MPP, not proxied)
Payment: Tempo USDC
Intent: Session (per-token streaming) + Charge (one-shot)

What it does

OpenAI-compatible LLM inference API with per-token streaming sessions via MPP payment channels. Supports Claude (Sonnet/Opus 4.5-4.6), Grok (4.1/4.20), Kimi K2.5, MiniMax M2.5/M2.7, Qwen 2.5, and GLM-5.
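Since the endpoint is OpenAI-compatible, requests follow the standard chat-completions schema. A minimal sketch of building such a payload (the model name matches the verification curl later in this PR; everything else is the generic OpenAI request shape, not Surf-specific):

```python
import json

def build_chat_request(model: str, user_message: str, stream: bool = False) -> str:
    """Build an OpenAI-style chat-completions request body as a JSON string."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        # When streaming, tokens would be billed per-token over an MPP session.
        "stream": stream,
    }
    return json.dumps(payload)

# Same payload the 402 check below sends via curl
body = build_chat_request("moonshotai/kimi-k2.5", "hi")
```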

Endpoints

Route                       Pricing
POST /v1/chat/completions   Dynamic ($0.001–$25 per 1M tokens, depending on model)
GET /v1/models              Free
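For a sense of scale, per-request cost at the quoted bounds is tokens × (price per 1M tokens) ÷ 1,000,000. A small sketch (the two rates are the range bounds from the table above; actual per-model rates are not listed in this PR):

```python
def token_cost_usdc(tokens: int, price_per_million: float) -> float:
    """Cost in USDC for `tokens` tokens at a given per-1M-token rate."""
    return tokens * price_per_million / 1_000_000

# Range bounds from the pricing table
cheapest = token_cost_usdc(1000, 0.001)  # 1k tokens at the $0.001/M floor
priciest = token_cost_usdc(1000, 25.0)   # 1k tokens at the $25/M ceiling
```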

Why it's novel

  • Native MPP with streaming sessions: Per-token billing via payment channels, not one-shot charges. First request opens a channel on-chain, subsequent tokens use off-chain vouchers with sub-100ms latency.
  • Multi-model: Single endpoint, 13+ models across 6 providers.
  • First-party: MPP is built into the service, not proxied through mpp.tempo.xyz.
  • Also supports MCP (Model Context Protocol) for direct AI agent tool calls.
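The streaming-session flow described above can be sketched roughly as follows. This is an illustrative toy model only, assuming cumulative off-chain vouchers over a single on-chain channel; the actual MPP wire format and channel contract are not part of this PR:

```python
from dataclasses import dataclass, field

@dataclass
class PaymentChannel:
    """Toy model of an MPP streaming session: one on-chain open,
    then cheap off-chain vouchers as tokens stream, one on-chain settle."""
    deposit_usdc: float
    on_chain_ops: int = 0
    vouchers: list = field(default_factory=list)  # cumulative amounts

    def open(self) -> None:
        self.on_chain_ops += 1  # first request opens the channel on-chain

    def stream_token(self, price_usdc: float) -> None:
        prev = self.vouchers[-1] if self.vouchers else 0.0
        self.vouchers.append(prev + price_usdc)  # off-chain voucher, sub-100ms

    def close(self) -> float:
        self.on_chain_ops += 1  # settle the latest voucher on-chain
        return self.vouchers[-1] if self.vouchers else 0.0

ch = PaymentChannel(deposit_usdc=1.0)
ch.open()
for _ in range(500):                 # 500 streamed tokens
    ch.stream_token(25 / 1_000_000)  # top-end rate from the pricing table
settled = ch.close()
```

The point of the channel shape: only two on-chain operations regardless of how many tokens stream, with each voucher superseding the last.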

Checklist

  • Service is live and accepting MPP payments
  • schemas/services.ts updated
  • node scripts/generate-discovery.ts run, schemas/discovery.json committed
  • node scripts/gen-icons.cjs run, icons committed
  • pnpm check:types passes
  • pnpm build succeeds

Verify it's live

# Check 402 challenge
curl -s https://inference.surf.cascade.fyi/v1/chat/completions \
  -X POST -H "Content-Type: application/json" \
  -d '{"model":"moonshotai/kimi-k2.5","messages":[{"role":"user","content":"hi"}]}' \
  -o /dev/null -w "%{http_code}"
# Returns: 402

# Free endpoint
curl -s https://inference.surf.cascade.fyi/v1/models | head -c 200

OpenAI-compatible LLM inference with per-token streaming sessions.
First-party MPP integration accepting Tempo USDC payments natively.

Models: Claude (Sonnet/Opus), Grok, Kimi, MiniMax, Qwen, GLM
Endpoints: POST /v1/chat/completions (dynamic pricing), GET /v1/models

vercel bot commented Mar 21, 2026

@tenequm is attempting to deploy a commit to the Tempo Team on Vercel.

A member of the Team first needs to authorize it.

