
services: add Surf Inference - OpenAI-compatible LLM inference with streaming sessions#406

Open
tenequm wants to merge 1 commit into tempoxyz:main from cascade-protocol:feat/surf-inference

Conversation


@tenequm tenequm commented Mar 21, 2026

Service: Surf Inference

URL: https://inference.surf.cascade.fyi
Integration: First-party (native MPP, not proxied)
Payment: Tempo USDC
Intent: Session (per-token streaming) + Charge (one-shot)

What it does

OpenAI-compatible LLM inference API with per-token streaming sessions via MPP payment channels. Supports Claude (Sonnet/Opus 4.5-4.6), Grok (4.1/4.20), Kimi K2.5, MiniMax M2.5/M2.7, Qwen 2.5, and GLM-5.
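Since the endpoint is OpenAI-compatible, requests follow the standard chat-completions schema. A minimal sketch of building such a payload (the model name matches the verification curl later in this PR; everything else is the generic OpenAI request shape, not Surf-specific):

```python
import json

def build_chat_request(model: str, user_message: str, stream: bool = False) -> str:
    """Build an OpenAI-style chat-completions request body as a JSON string."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        # When streaming, tokens would be billed per-token over an MPP session.
        "stream": stream,
    }
    return json.dumps(payload)

# Same payload the 402 check below sends via curl
body = build_chat_request("moonshotai/kimi-k2.5", "hi")
```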

Endpoints

Route                       Pricing
POST /v1/chat/completions   Dynamic ($0.001–$25 per 1M tokens, depending on model)
GET /v1/models              Free
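For a sense of scale, per-request cost at the quoted bounds is tokens × (price per 1M tokens) ÷ 1,000,000. A small sketch (the two rates are the range bounds from the table above; actual per-model rates are not listed in this PR):

```python
def token_cost_usdc(tokens: int, price_per_million: float) -> float:
    """Cost in USDC for `tokens` tokens at a given per-1M-token rate."""
    return tokens * price_per_million / 1_000_000

# Range bounds from the pricing table
cheapest = token_cost_usdc(1000, 0.001)  # 1k tokens at the $0.001/M floor
priciest = token_cost_usdc(1000, 25.0)   # 1k tokens at the $25/M ceiling
```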

Why it's novel

  • Native MPP with streaming sessions: Per-token billing via payment channels, not one-shot charges. First request opens a channel on-chain, subsequent tokens use off-chain vouchers with sub-100ms latency.
  • Multi-model: Single endpoint, 13+ models across 6 providers.
  • First-party: MPP is built into the service, not proxied through mpp.tempo.xyz.
  • Also supports MCP (Model Context Protocol) for direct AI agent tool calls.
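The streaming-session flow described above can be sketched roughly as follows. This is an illustrative toy model only, assuming cumulative off-chain vouchers over a single on-chain channel; the actual MPP wire format and channel contract are not part of this PR:

```python
from dataclasses import dataclass, field

@dataclass
class PaymentChannel:
    """Toy model of an MPP streaming session: one on-chain open,
    then cheap off-chain vouchers as tokens stream, one on-chain settle."""
    deposit_usdc: float
    on_chain_ops: int = 0
    vouchers: list = field(default_factory=list)  # cumulative amounts

    def open(self) -> None:
        self.on_chain_ops += 1  # first request opens the channel on-chain

    def stream_token(self, price_usdc: float) -> None:
        prev = self.vouchers[-1] if self.vouchers else 0.0
        self.vouchers.append(prev + price_usdc)  # off-chain voucher, sub-100ms

    def close(self) -> float:
        self.on_chain_ops += 1  # settle the latest voucher on-chain
        return self.vouchers[-1] if self.vouchers else 0.0

ch = PaymentChannel(deposit_usdc=1.0)
ch.open()
for _ in range(500):                 # 500 streamed tokens
    ch.stream_token(25 / 1_000_000)  # top-end rate from the pricing table
settled = ch.close()
```

The point of the channel shape: only two on-chain operations regardless of how many tokens stream, with each voucher superseding the last.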

Checklist

  • Service is live and accepting MPP payments
  • schemas/services.ts updated
  • node scripts/generate-discovery.ts run, schemas/discovery.json committed
  • node scripts/gen-icons.cjs run, icons committed
  • pnpm check:types passes
  • pnpm build succeeds

Verify it's live

# Check 402 challenge
curl -s https://inference.surf.cascade.fyi/v1/chat/completions \
  -X POST -H "Content-Type: application/json" \
  -d '{"model":"moonshotai/kimi-k2.5","messages":[{"role":"user","content":"hi"}]}' \
  -o /dev/null -w "%{http_code}"
# Returns: 402

# Free endpoint
curl -s https://inference.surf.cascade.fyi/v1/models | head -c 200

OpenAI-compatible LLM inference with per-token streaming sessions.
First-party MPP integration accepting Tempo USDC payments natively.

Models: Claude (Sonnet/Opus), Grok, Kimi, MiniMax, Qwen, GLM
Endpoints: POST /v1/chat/completions (dynamic pricing), GET /v1/models

vercel bot commented Mar 21, 2026

@tenequm is attempting to deploy a commit to the Tempo Team on Vercel.

A member of the Team first needs to authorize it.

