diff --git a/README.md b/README.md
index 3690013c..f06b437b 100644
--- a/README.md
+++ b/README.md
@@ -1,256 +1,50 @@
 # AgentRemote
 
-AgentRemote brings all your agents into one app.
+AgentRemote connects Beeper to self-hosted agent runtimes.
 
-Beeper becomes the universal remote for agents.
+It gives Matrix/Beeper chats a bridge layer for full history, live streaming, approvals, and remote access, while the actual runtime stays on your machine or network.
 
-Connect agent runtimes to Beeper with full history, live streaming, tool approvals, and encrypted delivery.
+This repository is still experimental.
 
-Run the bridge next to your agent, then talk to it from Beeper on your phone or desktop.
+## Included bridges
 
-## Why Use It
-
-- Keep agents on your own machine, server, or private network
-- Use Beeper instead of building a separate web UI
-- Stream responses and approve tool calls in the same chat
-- Reach your agents from anywhere Beeper runs
-
-## Open Source Focus
-
-This repository is centered on the self-hosted path.
-
-That means:
-
-- local developer machines
-- homelabs
-- office servers
-- runtimes behind a firewall
-- private deployments that still want a polished remote interface
-
-There is a broader product direction around richer AI chats and more opinionated agent experiences. Open source here is focused on making the bridge layer for private deployments easy to run and hard to break.
-
-## AgentRemote SDK
-
-If you want to build your own bridge, start with the SDK in [`sdk/`](./sdk).
-
-The SDK handles the Matrix and Beeper side of the bridge for you:
-
-- bridge bootstrapping and registration
-- room and conversation wrappers
-- streaming turn lifecycle
-- tool approval UI
-- agent identity and capability metadata
-
-The main entrypoint is `sdk.New(sdk.Config{...})`.
-
-In practice, most custom bridges only need three things:
-
-- an `sdk.Agent` that represents the remote assistant in Beeper
-- an `OnConnect` hook that builds whatever runtime client you need
-- an `OnMessage` hook that turns an incoming Beeper message into model output
-
-### Minimal SDK Shape
-
-This is the smallest useful shape of a bridge:
-
-```go
-bridge := sdk.New(sdk.Config{
-	Name: "my-bridge",
-	Agent: &sdk.Agent{
-		ID: "my-agent",
-		Name: "My Agent",
-		Description: "A custom agent exposed through Beeper",
-		ModelKey: "openai/gpt-5-mini",
-		Capabilities: sdk.BaseAgentCapabilities(),
-	},
-	OnConnect: func(ctx context.Context, login *sdk.LoginInfo) (any, error) {
-		return newRuntimeClient(), nil
-	},
-	OnMessage: func(session any, conv *sdk.Conversation, msg *sdk.Message, turn *sdk.Turn) error {
-		turn.WriteText("hello from my bridge")
-		turn.End("stop")
-		return nil
-	},
-})
-
-bridge.Run()
-```
-
-`turn` is the important piece here. You can write text and reasoning deltas into it, request approvals, attach sources/files, and then finalize the message with `turn.End(...)` or `turn.EndWithError(...)`.
-
-### Simple OpenAI SDK Bridge
-
-The example below is intentionally minimal. It uses the Go OpenAI SDK directly and lets AgentRemote handle the chat room, sender identity, and message lifecycle.
-
-```go
-package main
-
-import (
-	"context"
-	"fmt"
-	"log"
-	"os"
-
-	"github.com/beeper/agentremote/sdk"
-	"github.com/openai/openai-go/v3"
-	"github.com/openai/openai-go/v3/option"
-)
-
-func main() {
-	if os.Getenv("OPENAI_API_KEY") == "" {
-		log.Fatal("OPENAI_API_KEY is required")
-	}
-
-	bridge := sdk.New(sdk.Config{
-		Name: "openai-simple",
-		Description: "A minimal OpenAI-backed AgentRemote bridge",
-		Agent: &sdk.Agent{
-			ID: "openai-simple-agent",
-			Name: "OpenAI Simple",
-			Description: "Minimal bridge example using openai-go",
-			ModelKey: "openai/gpt-4o-mini",
-			Capabilities: sdk.BaseAgentCapabilities(),
-		},
-		OnConnect: func(ctx context.Context, login *sdk.LoginInfo) (any, error) {
-			return openai.NewClient(option.WithAPIKey(os.Getenv("OPENAI_API_KEY"))), nil
-		},
-		OnMessage: func(session any, conv *sdk.Conversation, msg *sdk.Message, turn *sdk.Turn) error {
-			client := session.(*openai.Client)
-
-			resp, err := client.Chat.Completions.New(turn.Context(), openai.ChatCompletionNewParams{
-				Model: "gpt-4o-mini",
-				Messages: []openai.ChatCompletionMessageParamUnion{
-					openai.SystemMessage("You are a helpful assistant replying through Beeper."),
-					openai.UserMessage(msg.Text),
-				},
-			})
-			if err != nil {
-				turn.EndWithError(err.Error())
-				return err
-			}
-			if len(resp.Choices) == 0 {
-				err := fmt.Errorf("openai returned no choices")
-				turn.EndWithError(err.Error())
-				return err
-			}
-
-			turn.WriteText(resp.Choices[0].Message.Content)
-			turn.End(resp.Choices[0].FinishReason)
-			return nil
-		},
-	})
-
-	bridge.Run()
-}
-```
-
-Useful details from that example:
-
-- `OnConnect` returns the session object that will be passed back into every `OnMessage` call.
-- `sdk.Message` already gives you the normalized incoming Beeper message text.
-- `sdk.Turn` is where you stream or finalize the assistant reply.
-- If you want live token streaming later, switch the OpenAI call to `client.Chat.Completions.NewStreaming(...)` or `client.Responses.NewStreaming(...)` and forward deltas with `turn.WriteText(...)`.
-
-## Included Bridges
-
-Each bridge has its own README with setup details and scope:
-
-| Bridge | Purpose |
+| Bridge | What it connects |
 | --- | --- |
-| `ai` | AI Chats bridge surface used by the project |
-| [`codex`](./bridges/codex/README.md) | Connect the Codex CLI app-server to Beeper |
-| [`openclaw`](./bridges/openclaw/README.md) | Connect a self-hosted OpenClaw gateway to Beeper |
-| [`opencode`](./bridges/opencode/README.md) | Connect a self-hosted OpenCode server to Beeper |
-
-## Quick Start
+| `ai` | The built-in Beeper AI chat surface in this repo |
+| [`codex`](./bridges/codex/README.md) | A local `codex app-server` runtime |
+| [`opencode`](./bridges/opencode/README.md) | A remote OpenCode server or a bridge-managed local OpenCode process |
+| [`openclaw`](./bridges/openclaw/README.md) | A self-hosted OpenClaw gateway |
 
-Log into Beeper and start a bridge:
+## Quick start
 
 ```bash
 ./tools/bridges login --env prod
+./tools/bridges list
 ./tools/bridges run codex
 ```
 
-Then open Beeper and use the connected bridge from chat.
-
-For a local Beeper environment:
-
-```bash
-./tools/bridges login --env local
-./tools/bridges whoami
-./tools/bridges run codex
-```
-
-Configured instances live under `~/.config/agentremote/profiles//instances/`:
-
-- `ai`
-- `codex`
-- `openclaw`
-- `opencode`
-
-Run any of them directly:
-
-```bash
-./tools/bridges run ai
-./tools/bridges run codex
-./tools/bridges run openclaw
-./tools/bridges run opencode
-```
-
-Or use the wrapper:
+Useful commands:
 
-```bash
-./run.sh ai
-./run.sh codex
-./run.sh openclaw
-./run.sh opencode
-```
+- `./tools/bridges up ` starts a bridge in the background
+- `./tools/bridges status` shows local and remote bridge state
+- `./tools/bridges logs --follow` tails logs
+- `./tools/bridges stop ` stops a running instance
 
-## Bridge Manager
+Instance state lives under `~/.config/agentremote/profiles//instances/`.
 
-Common commands:
+## SDK
 
-```bash
-./tools/bridges list
-./tools/bridges status
-./tools/bridges logs codex --follow
-./tools/bridges restart codex
-./tools/bridges down codex
-./tools/bridges whoami
-```
+Custom bridges in this repo are built on [`sdk/`](./sdk), using:
 
-Reset all local bridge state and registrations:
+- `bridgesdk.NewStandardConnectorConfig(...)`
+- `bridgesdk.NewConnectorBase(...)`
+- `sdk.Config`, `sdk.Agent`, `sdk.Conversation`, and `sdk.Turn`
 
-```bash
-./tools/bridges delete ai
-./tools/bridges delete codex
-./tools/bridges delete openclaw
-./tools/bridges delete opencode
-./tools/bridges logout
-```
+See [`bridges/dummybridge`](./bridges/dummybridge) for a minimal bridge example.
 
 ## Docs
 
-- [`docs/bridge-orchestrator.md`](./docs/bridge-orchestrator.md): local bridge management workflow
-- [`docs/matrix-ai-matrix-spec-v1.md`](./docs/matrix-ai-matrix-spec-v1.md): Matrix transport profile for streaming, approvals, state, and AI payloads
-- [`bridges/codex/README.md`](./bridges/codex/README.md): Codex bridge details
-- [`bridges/openclaw/README.md`](./bridges/openclaw/README.md): OpenClaw bridge details
-- [`bridges/opencode/README.md`](./bridges/opencode/README.md): OpenCode bridge details
-
-## Status
-
-Experimental and evolving quickly. The transport and bridge surfaces are real, but the project is still early.
-
-## Build
-
-Requires `libolm` for encryption support.
-
-```bash
-./build.sh
-```
-
-Or with Docker:
-
-```bash
-docker build -t agentremote .
-```
+- CLI reference: [`docs/bridge-orchestrator.md`](./docs/bridge-orchestrator.md)
+- Matrix transport surface: [`docs/matrix-ai-matrix-spec-v1.md`](./docs/matrix-ai-matrix-spec-v1.md)
+- Streaming note: [`docs/msc/com.beeper.mscXXXX-streaming.md`](./docs/msc/com.beeper.mscXXXX-streaming.md)
+- Command profile: [`docs/msc/com.beeper.mscXXXX-commands.md`](./docs/msc/com.beeper.mscXXXX-commands.md)
diff --git a/bridges/codex/README.md b/bridges/codex/README.md
index 50bbd97f..df8cf6f3 100644
--- a/bridges/codex/README.md
+++ b/bridges/codex/README.md
@@ -1,38 +1,28 @@
-# Codex Companion
+# Codex Bridge
 
-The Codex Companion bridge connects a local Codex CLI runtime to Beeper through AgentRemote.
+The Codex bridge connects Beeper to a local Codex CLI runtime.
 
-This is the bridge for people who want to run Codex on a workstation, laptop, or remote machine and use Beeper as the chat client. It exposes Codex conversations in Beeper with streaming responses, history, and tool approval flows, while keeping the actual runtime close to the code and credentials it needs.
+It fits setups where Codex stays on the machine that already has the checkout, credentials, and tools.
 
-## What It Does
+## What it does
 
-- Starts or connects to a local `codex app-server` process
-- Bridges Codex threads into Beeper rooms
-- Streams assistant output into chat as it is generated
-- Preserves conversation history
-- Surfaces tool calls and approval requests in Beeper
+- starts or connects to `codex app-server`
+- maps Codex conversations into Beeper rooms
+- streams replies into chat
+- carries approvals and tool activity through the same room
 
-## Login Model
+## Login modes
 
-The bridge supports Codex-backed logins through:
+The bridge supports:
 
-- ChatGPT-based auth
-- OpenAI API key auth
-- Externally managed ChatGPT tokens
+- ChatGPT login
+- OpenAI API key login
+- externally managed ChatGPT tokens
+- host-auth auto-detection when Codex is already logged in on the machine
 
-If Codex is already authenticated on the host, the bridge can auto-provision a login from the existing local Codex state.
+Managed logins use an isolated `CODEX_HOME` per login. Host-auth uses the machine's existing Codex auth state.
 
-## Best Fit
-
-Use this bridge when:
-
-- Your agent already runs through the Codex CLI
-- You want a phone-friendly interface for coding agents
-- You want to keep execution on your own machine or behind your own network boundary
-
-## Run It
-
-From the repo root:
+## Run
 
 ```bash
 ./tools/bridges run codex
@@ -43,16 +33,3 @@ Or:
 ```bash
 ./run.sh codex
 ```
-
-For local Beeper environments:
-
-```bash
-./tools/bridges login --env local
-./tools/bridges run codex
-```
-
-## Notes
-
-- The bridge uses a dedicated Codex surface rather than the generic AI connector.
-- Auth tokens are managed by Codex itself when using the local Codex home flow.
-- This bridge is part of the self-hosted AgentRemote story: Beeper is the remote control, Codex stays where the work happens.
diff --git a/bridges/openclaw/README.md b/bridges/openclaw/README.md
index c38f18c2..b1cff907 100644
--- a/bridges/openclaw/README.md
+++ b/bridges/openclaw/README.md
@@ -1,39 +1,24 @@
-# OpenClaw Gateway
+# OpenClaw Bridge
 
-The OpenClaw Gateway bridge connects a self-hosted OpenClaw gateway to Beeper through AgentRemote.
+The OpenClaw bridge connects Beeper to a self-hosted OpenClaw gateway.
 
-This is the most direct way to expose OpenClaw sessions in Beeper while keeping the agent runtime on infrastructure you control. Run the gateway on a local machine, server, or private network, then use Beeper from mobile or desktop to talk to those agents remotely.
+## What it does
 
-## What It Does
+- connects to a gateway over `ws`, `wss`, `http`, or `https`
+- syncs OpenClaw sessions into Beeper rooms
+- streams replies, approvals, and session updates into chat
 
-- Connects to an OpenClaw gateway over `ws`, `wss`, `http`, or `https`
-- Syncs OpenClaw sessions into Beeper rooms
-- Streams responses and updates live
-- Carries tool calls, approvals, and agent state into chat
-- Preserves per-session metadata, usage, and history context
-
-## Login Model
+## Login flow
 
 The bridge asks for:
 
-- Gateway URL
-- Optional gateway token
-- Optional gateway password
-- Optional label for distinguishing multiple gateways
-
-That makes it a good fit for private deployments where the gateway is reachable only on a LAN, VPN, Tailscale network, or internal hostname.
-
-## Best Fit
-
-Use this bridge when:
+- gateway URL
+- auth mode: none, token, or password
+- optional label
 
-- You already run OpenClaw and want Beeper as the client
-- Your agents live behind a firewall and should stay there
-- You want streaming and approvals without building a separate mobile UI
+If the gateway requires device pairing, the login waits for approval and surfaces the request ID.
 
-## Run It
-
-From the repo root:
+## Run
 
 ```bash
 ./tools/bridges run openclaw
@@ -44,8 +29,3 @@ Or:
 ```bash
 ./run.sh openclaw
 ```
-
-## Notes
-
-- The bridge is intentionally focused on OpenClaw as a remote runtime, not a hosted SaaS workflow.
-- It is a core example of the AgentRemote model: keep the gateway private, use Beeper as the interface.
diff --git a/bridges/opencode/README.md b/bridges/opencode/README.md
index 01878609..d21867ab 100644
--- a/bridges/opencode/README.md
+++ b/bridges/opencode/README.md
@@ -1,38 +1,32 @@
-# OpenCode Companion
+# OpenCode Bridge
 
-The OpenCode Companion bridge connects a self-hosted OpenCode server to Beeper through AgentRemote.
+The OpenCode bridge connects Beeper to OpenCode.
 
-It is built for setups where OpenCode is already running on a machine you trust and you want Beeper to become the front end. That can be a local development machine, a lab box, or an office server that you reach from your phone.
+It supports two modes:
 
-## What It Does
+- remote: connect to an existing OpenCode server over HTTP
+- managed: let the bridge launch `opencode` locally and keep a default working directory
 
-- Connects to an OpenCode server over HTTP
-- Subscribes to the OpenCode event stream for live updates
-- Maps OpenCode sessions into Beeper rooms
-- Streams responses, titles, and session events into chat
-- Keeps the bridge usable even when the remote instance temporarily disconnects
+## What it does
 
-## Login Model
+- maps OpenCode sessions into Beeper rooms
+- streams replies and session updates into chat
+- keeps reconnect logic inside the bridge instead of requiring a separate UI
 
-The bridge asks for:
+## Login flow
 
-- Server URL
-- Optional username
-- Optional password for HTTP basic auth
+Remote mode asks for:
 
-Multiple OpenCode instances can be tracked per login, which is useful if you talk to different machines or environments.
+- server URL
+- optional basic-auth username
+- optional basic-auth password
 
-## Best Fit
+Managed mode asks for:
 
-Use this bridge when:
+- path to the `opencode` binary
+- default working directory
 
-- You run OpenCode yourself and want Beeper access from anywhere
-- You want a simple remote interface for agent sessions without exposing a separate UI
-- You want to keep the runtime and credentials on the host machine
-
-## Run It
-
-From the repo root:
+## Run
 
 ```bash
 ./tools/bridges run opencode
@@ -43,8 +37,3 @@ Or:
 ```bash
 ./run.sh opencode
 ```
-
-## Notes
-
-- OpenCode uses an HTTP API plus event streaming rather than the local Codex app-server flow.
-- In AgentRemote terms, this is the bridge for turning a private OpenCode deployment into a Beeper-accessible agent endpoint.
diff --git a/docs/bridge-orchestrator.md b/docs/bridge-orchestrator.md
index 20717c11..aea2a1ae 100644
--- a/docs/bridge-orchestrator.md
+++ b/docs/bridge-orchestrator.md
@@ -1,53 +1,60 @@
-# Bridge Orchestrator
+# AgentRemote CLI
 
-`tools/bridges` is a thin wrapper around `agentremote`, which manages isolated bridgev2 instances for Beeper from this repo.
+`./tools/bridges` is the local entrypoint for `agentremote`.
 
-## Auth
-
-Use one of:
-
-- `./tools/bridges login --env prod` for the email and code flow
-- `./tools/bridges auth set-token --token syt_... --env prod`
-- Environment variables: `BEEPER_ACCESS_TOKEN`, optional `BEEPER_ENV`, `BEEPER_USERNAME`
-
-## One-command startup
+It wraps:
 
 ```bash
-./tools/bridges up ai
+go run ./cmd/agentremote ...
 ```
 
-This will:
+## Authentication
+
+Use one of:
 
-1. Create instance state under `~/.config/agentremote/profiles/default/instances//`
-2. Generate config from the bridge binary with `-e` if needed
-3. Ensure Beeper appservice registration and sync config tokens
-4. Start the bridge process and write PID and log files
+- `./tools/bridges login --env prod`
+- `./tools/bridges auth set-token --token syt_... --env prod`
+- `./tools/bridges whoami`
 
-## Core commands
+Profiles default to `default`.
+
+## Bridge lifecycle
 
 - `./tools/bridges list`
-- `./tools/bridges login`
-- `./tools/bridges logout`
-- `./tools/bridges whoami [--output json]`
-- `./tools/bridges profiles`
+- `./tools/bridges run `
 - `./tools/bridges up `
 - `./tools/bridges start `
-- `./tools/bridges run `
-- `./tools/bridges init `
-- `./tools/bridges register `
-- `./tools/bridges status [instance]`
-- `./tools/bridges instances`
-- `./tools/bridges logs [--follow]`
-- `./tools/bridges down `
 - `./tools/bridges stop `
-- `./tools/bridges stop-all`
+- `./tools/bridges down `
 - `./tools/bridges restart `
 - `./tools/bridges delete [instance]`
+
+`up` is an alias of `start`. `down` is an alias of `stop`.
+
+## Inspection
+
+- `./tools/bridges status [instance...]`
+- `./tools/bridges instances`
+- `./tools/bridges logs --follow`
 - `./tools/bridges doctor`
+
+## Setup helpers
+
+- `./tools/bridges init `
+- `./tools/bridges register `
 - `./tools/bridges completion `
 
-Shortcut wrapper:
+## Quick examples
 
-- `./run.sh ai|codex|opencode|openclaw`
-  - checks login and prompts with `login` if needed
-  - then runs the selected bridge instance
+```bash
+./tools/bridges login --env prod
+./tools/bridges up codex --wait
+./tools/bridges status codex
+./tools/bridges logs codex --follow
+```
+
+Local instance data is stored under:
+
+```text
+~/.config/agentremote/profiles//instances//
+```
diff --git a/docs/matrix-ai-matrix-spec-v1.md b/docs/matrix-ai-matrix-spec-v1.md
index 4cd3cd21..389efc1b 100644
--- a/docs/matrix-ai-matrix-spec-v1.md
+++ b/docs/matrix-ai-matrix-spec-v1.md
@@ -1,491 +1,106 @@
-# Real-time AI with Matrix?
+# Matrix AI Transport v1
 
-## Matrix AI Transport Spec v1
+Status: experimental and unstable.
 
-> [!WARNING]
-> Status: *Draft* (unreleased), proposed v1.
-> This is a highly experimental profile.
-> It relies on homeserver/client support for custom event types and rendering/consumption.
-> Streaming transport is message-anchored: a placeholder `m.room.message` advertises `com.beeper.stream`, live deltas flow over `to_device`, and completion is signaled by a final timeline edit.
-> This repo contains one experimental implementation, but the transport profile is not bridge-specific: any Matrix bot/client/bridge can emit and consume these events.
+## What the code emits
 
-## Contents
-- [Scope](#scope)
-- [Compatibility](#compatibility)
-- [Terminology](#terminology)
-- [Inventory](#inventory)
-- [Canonical Assistant Message](#canonical)
-- [Streaming](#streaming)
-- [Timeline Projections](#projections)
-- [State Events](#state)
-- [Tool Approvals](#approvals)
-- [Other Matrix Keys](#other-keys)
-- [Implementation Notes](#impl-notes)
-- [Forward Compatibility](#forward-compat)
+### 1. Canonical assistant messages
 
-
-## Scope
-This document specifies a Matrix transport profile for real-time AI:
-- Canonical assistant content in `m.room.message` (`com.beeper.ai` as AI SDK-compatible `UIMessage`).
-- Streaming deltas via message-anchored transport:
-  - placeholder `m.room.message` carrying `com.beeper.stream`
-  - `to_device` subscription and update events (`com.beeper.stream.subscribe`, `com.beeper.stream.update`)
-  - final `m.replace` timeline edit with canonical content
-- `com.beeper.ai.*` timeline projection events (tool call/result, compaction status, etc).
-- standard Matrix room features for capability advertising.
-- Tool approvals (MCP approvals + selected builtin tools).
-- Auxiliary `com.beeper.ai*` keys used for routing/metadata.
+Assistant turns are stored as normal `m.room.message` events with:
 
-This spec is intended to be usable by any Matrix bot/client/bridge. Where this document references "the bridge", it refers to the producing implementation (for this repo, `AI Chats`).
+- standard Matrix fallback fields such as `msgtype` and `body`
+- `com.beeper.ai`, which carries an AI SDK-style `UIMessage`
 
-Upstream reference (AI SDK):
-- Normative message model target: Vercel AI SDK `ai@6.0.121`.
-- Core types:
-  - `packages/ai/src/ui/ui-messages.ts`
-  - `packages/ai/src/ui-message-stream/ui-message-chunks.ts`
-  - `packages/ai/src/ui-message-stream/json-to-sse-transform-stream.ts`
+Current shape:
 
-Reference implementation in this repo (AI Chats):
-- Event type identifiers: `pkg/matrixevents/matrixevents.go`
-- Event payload structs (where defined): `bridges/ai/events.go`
-- Streaming envelope and emission: `pkg/matrixevents/matrixevents.go`, `turns/session.go`, `sdk/turn.go`
-- Tool call/result projections: `bridges/ai/tool_execution.go`
-- Compaction status emission: `bridges/ai/response_retry.go`
-- State broadcast: `bridges/ai/chat.go`
-- Approvals: `bridges/ai/tool_approvals*.go`, `bridges/ai/handlematrix.go`, `bridges/ai/handler_interfaces.go`, `bridges/ai/streaming_ui_tools.go`
-- Shared approval manager and reaction handling: `approval_manager.go`, `approval_decision.go`
-
-## Compatibility
-- Homeserver support for custom event types is required.
-- Clients that want live streaming must implement `com.beeper.stream` descriptor handling plus `to_device` subscribe/update flows.
-- Non-supporting clients still interoperate through placeholder fallback text and the final timeline edit.
-- Non-supporting clients should fall back to `m.room.message.body` where available.
-
-## Terminology
-- `turn_id`: Unique ID for a single assistant response "turn".
-- `seq`: Per-turn monotonic sequence number for streamed deltas.
-- `call_id` / `toolCallId`: Tool invocation identifier.
-- `timeline`: persisted Matrix events.
-- `stream descriptor`: `com.beeper.stream` object attached to the placeholder timeline event.
-- `subscription`: A short-lived `to_device` request to receive live updates for one placeholder event.
-- `m.reference`: relation used to link events to a target event ID.
-- `m.replace`: relation used to edit/replace an earlier timeline message.
-
-## Inventory
-Authoritative identifiers are defined in `pkg/matrixevents/matrixevents.go`.
-
-### Event Types
-| Event type | Class | Persistence | Primary purpose | Spec section |
-| --- | --- | --- | --- | --- |
-| `m.room.message` | message | timeline | Canonical assistant message carrier; placeholder also carries `com.beeper.stream` | [Canonical](#canonical) |
-| `com.beeper.stream.subscribe` | to-device | transient | Subscribe one device to a placeholder-backed live stream | [Streaming](#streaming) |
-| `com.beeper.stream.update` | to-device | transient | Deliver buffered or incremental stream deltas | [Streaming](#streaming) |
-| `com.beeper.ai.compaction_status` | message | timeline | Context compaction lifecycle/status | [Projections](#projection-compaction) |
-| `com.beeper.ai.agents` | state | state | Agent definitions for the room | — |
-
-### Content Keys (Inside Standard Events)
-| Key | Where it appears | Purpose | Spec section |
-| --- | --- | --- | --- |
-| `com.beeper.ai` | `m.room.message` | Canonical assistant `UIMessage` | [Canonical](#canonical) |
-| `com.beeper.stream` | `m.room.message` | Active live-stream descriptor for a placeholder message | [Streaming](#streaming) |
-| `com.beeper.ai.model_id` | `m.room.message` | Routing/display hint | [Other keys](#other-keys-routing) |
-| `com.beeper.ai.agent` | `m.room.message`, `m.room.member` | Routing hint or agent definition | [Other keys](#other-keys-agent) |
-| `com.beeper.ai.image_generation` | `m.room.message` (image) | Generated-image tag/metadata | [Other keys](#other-keys-media) |
-| `com.beeper.ai.tts` | `m.room.message` (audio) | Generated-audio tag/metadata | [Other keys](#other-keys-media) |
-
-## Canonical Assistant Message
-Canonical assistant content is carried in a standard `m.room.message` event.
-
-Requirements:
-- MUST include standard Matrix fallback fields (`msgtype`, `body`) for non-AI clients.
-- MUST include `com.beeper.ai` and it MUST be an AI SDK-compatible `UIMessage`.
-
-### UIMessage Shape
-`com.beeper.ai`:
-- `id: string`
-- `role: "assistant"`
-- `metadata?: object`
-- `parts: UIMessagePart[]`
-
-Recommended `metadata` keys:
-- `turn_id`, `agent_id`, `model`, `finish_reason`
-- `usage` (`prompt_tokens`, `completion_tokens`, `reasoning_tokens`, `total_tokens?`)
-- `timing` (`started_at`, `first_token_at`, `completed_at`, unix ms)
-
-Example:
 ```json
 {
   "msgtype": "m.text",
-  "body": "Thinking...",
+  "body": "...",
   "com.beeper.ai": {
     "id": "turn_123",
     "role": "assistant",
-    "metadata": { "turn_id": "turn_123" },
+    "metadata": {
+      "turn_id": "turn_123"
+    },
     "parts": []
   }
 }
 ```
 
-### Assistant Turn Encoding
-Send assistant turns as standard `m.room.message` events:
-- `msgtype` and `body` for Matrix fallback.
-- Full AI payload in `com.beeper.ai` as `UIMessage`.
-- Turn-level metadata in `com.beeper.ai.metadata` (for example: `turn_id`, `agent_id`, `model`, `finish_reason`, `usage`, `timing`).
+The final edit keeps `com.beeper.ai` as the canonical payload. Streaming-only UI parts are compacted before final persistence.
 
-
-## Streaming
-Streaming uses a placeholder timeline event plus `to_device` subscribe/update traffic.
+### 2. Message-anchored live streaming
 
-### Placeholder Descriptor
-The sender starts the turn by sending a placeholder `m.room.message`. While the turn is live, that message carries `com.beeper.stream`:
+When live streaming is available, the placeholder message also carries `com.beeper.stream`.
 
-```json
-{
-  "msgtype": "m.text",
-  "body": "Thinking...",
-  "com.beeper.ai": {
-    "id": "turn_123",
-    "role": "assistant",
-    "metadata": { "turn_id": "turn_123" },
-    "parts": []
-  },
-  "com.beeper.stream": {
-    "user_id": "@aibot:beeper.local",
-    "device_id": "ABCD1234",
-    "type": "com.beeper.llm",
-    "expiry_ms": 1800000
-  }
-}
-```
-
-Descriptor fields:
-- `user_id: string` (REQUIRED)
-- `device_id: string` (REQUIRED)
-- `type: string` (REQUIRED, currently `com.beeper.llm`)
-- `expiry_ms?: integer` (milliseconds; clients SHOULD stop subscribing after this age)
-- `encryption?: object` (OPTIONAL custom symmetric encryption descriptor; see MSC doc)
-
-If the most recent assistant placeholder in a room still contains `com.beeper.stream`, clients MAY render a preview such as "Generating response...".
+The bridge code does not hardcode the transport backend. It asks a `BeeperStreamPublisher` for a descriptor, registers the placeholder event, and emits live deltas against that target.
 
-### Subscription
-Clients subscribe with `to_device` event type `com.beeper.stream.subscribe`:
+Live delta payloads use the stable `com.beeper.llm` envelope:
 
 ```json
 {
-  "type": "com.beeper.stream.subscribe",
-  "content": {
-    "room_id": "!meow",
-    "event_id": "$foobar",
-    "device_id": "4321EFGH",
-    "expiry_ms": 300000
+  "turn_id": "turn_123",
+  "seq": 7,
+  "part": {
+    "type": "text-delta",
+    "id": "text-turn_123",
+    "delta": "hello"
+  },
+  "m.relates_to": {
+    "rel_type": "m.reference",
+    "event_id": "$placeholder"
   }
 }
 ```
 
-Content:
-- `room_id: string` (REQUIRED)
-- `event_id: string` (REQUIRED; placeholder event ID)
-- `device_id: string` (REQUIRED; subscriber device)
-- `expiry_ms?: integer` (OPTIONAL requested subscription lifetime in milliseconds)
+Envelope fields:
 
-### Update Delivery
-The sender replies with `to_device` event type `com.beeper.stream.update`.
+- `turn_id`
+- `seq`
+- `part`
+- `m.relates_to`
+- optional `agent_id`
 
-Content:
-- `room_id: string` (REQUIRED)
-- `event_id: string` (REQUIRED)
-- `com.beeper.llm.deltas: object[]` (REQUIRED for `type = "com.beeper.llm"`)
+`part` follows the AI SDK `UIMessageChunk` model.
 
-The sender MUST first send buffered state accumulated so far to the new subscriber, then MAY continue with incremental updates while the subscription is active.
+### 3. Finalization
 
-### `com.beeper.llm` Delta Envelope
-Each entry in `com.beeper.llm.deltas` is:
-- `turn_id: string` (REQUIRED)
-- `seq: integer` (REQUIRED, starts at 1, strictly increasing per `turn_id`)
-- `part: UIMessageChunk` (REQUIRED)
-- `m.relates_to: { rel_type: "m.reference", event_id: string }` (REQUIRED)
-- `agent_id?: string` (OPTIONAL)
+When a turn completes, the placeholder is edited with the final assistant content. The final event is authoritative. The stream descriptor is no longer present after finalization.
 
-### SSE Mapping
-AI SDK UI streams emit SSE frames:
-- `data: `
-- terminal sentinel `data: [DONE]`
+### 4. Compaction status events
 
-Mapping:
-1. For each SSE JSON chunk, append one entry to `com.beeper.llm.deltas` with `part = `.
-2. `data: [DONE]` is transport-level termination and does not require a Matrix event.
+The AI bridge may emit `com.beeper.ai.compaction_status` timeline events while retrying after context compaction.
 
-Implications:
-- Producers MUST NOT remap chunk payload schemas.
-- Consumers MUST process each delta `part` as AI SDK `UIMessageChunk`.
+Current fields are:
 
-### Chunk Compatibility
-Producers MAY emit any valid AI SDK `UIMessageChunk` type:
-- `start`
-- `start-step`
-- `finish-step`
-- `message-metadata`
-- `text-start`
-- `text-delta`
-- `text-end`
-- `reasoning-start`
-- `reasoning-delta`
-- `reasoning-end`
-- `tool-input-start`
-- `tool-input-delta`
-- `tool-input-available`
-- `tool-input-error`
-- `tool-approval-request`
-- `tool-approval-response`
-- `tool-output-available`
-- `tool-output-error`
-- `tool-output-denied`
-- `source-url`
-- `source-document`
-- `file`
-- `data-*`
-- `finish`
-- `abort`
+- `type`
+- `session_id`
+- `messages_before`
+- `messages_after`
+- `tokens_before`
+- `tokens_after`
+- `summary`
+- `will_retry`
 - `error`
 
-Consumer requirements:
-- MUST accept and safely handle all valid AI SDK chunk types.
-- MUST ignore unknown future chunk types.
-- MUST NOT persist `data-*` chunks with `transient: true`.
-- MUST treat `start`, `finish`, `abort`, and `message-metadata` as stream-only events, not persisted parts.
-- MUST merge payload data from stream-only terminal and metadata chunks into the final canonical `UIMessage.metadata` during finalization or replay assembly. This includes fields such as `finish_reason`, `usage`, and `timing`.
-- MUST persist `start-step` as a `step-start` part in the canonical `UIMessage`.
-
-### Bridge-specific `data-*` chunks
-This bridge emits some `data-*` chunks in `part` for UI coordination. Clients that do not recognize them SHOULD ignore them.
-
-| Chunk type | Transient | Payload |
-| --- | --- | --- |
-| `data-tool-progress` | yes | `data.call_id`, `data.tool_name`, `data.status`, `data.progress` |
-| `data-image_generation_partial` | yes | `data.item_id`, `data.index`, `data.image_b64` |
-| `data-annotation` | yes | `data.annotation`, `data.index` |
-
-### Ordering and Lifecycle
-Per turn:
-- `seq` MUST be strictly increasing.
-- Duplicate/stale deltas (`seq <= last_applied_seq`) MUST be ignored.
-- Out-of-order deltas SHOULD be buffered briefly and applied in `seq` order. -- Producers MUST NOT advertise or publish a live stream until the canonical assistant placeholder has a concrete Matrix event ID. -- Producers MUST buffer the final timeline edit until the placeholder's Matrix event ID is resolved, because `m.replace` requires `m.relates_to.event_id`. -- If neither a bridge-side message ID nor a Matrix event ID exists, producers MUST buffer or fail the turn and MUST NOT start live delivery. +### 5. Command descriptions -Required lifecycle: -1. Send initial placeholder `m.room.message` with seed `com.beeper.ai` and `com.beeper.stream`. -2. Resolve/store the placeholder's Matrix event ID. -3. Accept `com.beeper.stream.subscribe` requests for that placeholder. -4. Send buffered `com.beeper.stream.update` state, then incremental updates (monotonic `seq`) while subscribed. -5. Emit final timeline edit (`m.replace`) containing final fallback text + full final `com.beeper.ai`, and remove `com.beeper.stream`. - -Terminal chunks: -- The stream SHOULD end with one of: `finish`, `abort`, `error`. 
- -Mermaid (conceptual): -```mermaid -sequenceDiagram - participant C as Client - participant H as Homeserver - participant B as Bridge - - B->>H: m.room.message (placeholder + com.beeper.ai + com.beeper.stream) - H->>C: timeline placeholder - C->>B: to_device com.beeper.stream.subscribe - loop subscribed updates - B->>C: to_device com.beeper.stream.update (com.beeper.llm.deltas) - end - B->>H: m.room.message (m.replace final + com.beeper.ai final) - H->>C: timeline edit -``` - -### Streaming Example -```json -{ - "room_id": "!meow", - "event_id": "$foobar", - "com.beeper.llm.deltas": [ - { - "turn_id": "turn_123", - "seq": 7, - "m.relates_to": { "rel_type": "m.reference", "event_id": "$foobar" }, - "part": { "type": "text-delta", "id": "text-turn_123", "delta": "hello" } - } - ] -} -``` - - -## Additional Timeline Status - - -### `com.beeper.ai.compaction_status` -Status events emitted during context compaction/retry. - -Schema (event content): -- `type: "compaction_start"|"compaction_end"` (required) -- `session_id?: string` -- `messages_before?: number` -- `messages_after?: number` -- `tokens_before?: number` -- `tokens_after?: number` -- `summary?: string` -- `will_retry?: boolean` -- `error?: string` -- `duration_ms?: number` - -Example: -```json -{ - "type": "compaction_end", - "session_id": "main", - "messages_before": 50, - "messages_after": 20, - "tokens_before": 80000, - "tokens_after": 30000, - "summary": "...", - "will_retry": true, - "duration_ms": 742 -} -``` - - -## State Events -This bridge no longer uses custom room state for editable AI configuration. Room target selection is determined by ghost identity and membership, while room-level capability advertising uses standard Matrix room features. - - -## Tool Approvals -Approvals are an owner-only gate for: -- MCP approvals (OpenAI Responses `mcp_approval_request` items). -- Selected builtin tool actions, configured via `network.tool_approvals.require_for_tools`. 
- -Config (see `config.example.yaml` and `bridges/ai/integrations_config.go`): -- `network.tool_approvals.enabled` (default true) -- `network.tool_approvals.ttl_seconds` (default 600) -- `network.tool_approvals.require_for_mcp` (default true) -- `network.tool_approvals.require_for_tools` (default list in code) - -### Approval Request Emission -When approval is needed, the bridge emits: -1. A live stream delta delivered in `com.beeper.stream.update`, where one `com.beeper.llm.deltas[*].part.type = "tool-approval-request"` and contains: - - `approvalId: string` - - `toolCallId: string` -2. A timeline-visible canonical approval notice. - - The notice is an `m.room.message` with `msgtype = "m.notice"`, SHOULD reply to the originating assistant turn via `m.relates_to.m.in_reply_to`, and includes a complete `com.beeper.ai` `UIMessage` using the canonical shape defined above (`id`, `role`, optional `metadata`, `parts`). - - The notice body MUST list the canonical reaction keys for the available options. - - The bridge MUST send bridge-authored placeholder `m.reaction` events on the notice, one for each allowed option key, using `m.annotation` as the relation type. - - `UIMessage.metadata.approval` SHOULD include: - - `id: string` - - `options: [{ id, key, label, approved, always?, reason? }]` - - `presentation` - - `expiresAt` when known - - The `dynamic-tool` part contains: - - `state = "approval-requested"` - - `toolCallId: string` - - `toolName: string` - - `approval: { id: string }` - -Canonical approval data in persisted `dynamic-tool` parts follows the AI SDK: -- pending approval: `approval: { id: string }` -- responded approval: `approval: { id: string, approved: boolean, reason?: string }` - - -### Approving / Denying -Approvals are resolved through reactions on the canonical approval notice: - -1. **Bridge sends** the canonical approval notice and placeholder reactions for the allowed option keys. -2. 
**Owner reacts** to that notice using one of the advertised option keys: - -```json -{ - "type": "m.reaction", - "content": { - "m.relates_to": { - "rel_type": "m.annotation", - "event_id": "$approval_notice", - "key": "approval.allow_once" - } - } -} -``` - -Rules: -- The approval notice is the canonical Matrix artifact. Rich clients MAY also observe mirrored `tool-approval-request` and `tool-approval-response` stream parts inside `com.beeper.stream.update`. A `tool-approval-response` chunk carries `approvalId`, `toolCallId`, `approved`, and optional `reason`. -- Clients MUST NOT send legacy timeline approval decision payloads such as `com.beeper.ai.approval_decision`; owner reactions on the approval notice are the only Matrix approval action. -- Only owner reactions with an advertised option key can resolve the approval. -- Non-owner reactions and invalid keys MUST be rejected and SHOULD be redacted. -- On terminal completion, the bridge MUST edit the approval notice into its final state and redact all bridge-authored placeholder reactions. -- The resolving owner reaction MUST remain visible. -- If the approval was resolved outside Matrix, the bridge SHOULD mirror the owner's chosen reaction into Matrix before terminal cleanup so the notice stays in sync. -- Approval notices and their terminal edits remain excluded from provider replay history. - -Always-allow: -- Reacting with the `allow always` option persists an allow rule in login metadata, scoped to the current login/account for the current bridge implementation. -- A stored rule matches on the approval target identity emitted by the bridge for that login: at minimum `toolName`, plus any bridge-emitted qualifier needed to distinguish separate approval surfaces for that login (for example agent/model or room-scoped tool routing). -- Rules are allow-only. If multiple stored rules match, the most specific rule for the current login wins; otherwise any matching allow rule MAY be applied. 
-- Approval events themselves remain the audit record for the concrete `approvalId`; persisted allow rules are derived from those events and do not change canonical replay history. - -TTL: -- Pending approvals expire after `ttl_seconds`. - - -## Other Matrix Keys - - -### Routing/Display Hints on `m.room.message` -The bridge may set: -- `com.beeper.ai.model_id: string` -- `com.beeper.ai.agent: string` - - -### Agent Definitions in `m.room.member` (Builder room) -Agent definitions can be stored in member state (see `AgentMemberContent` in `bridges/ai/events.go`): -- `com.beeper.ai.agent: AgentDefinitionContent` - -Example: -```json -{ - "membership": "join", - "displayname": "Researcher", - "avatar_url": "mxc://example.org/abc", - "com.beeper.ai.agent": { - "id": "researcher", - "name": "Researcher", - "model": "openai/gpt-5", - "created_at": 1738970000000, - "updated_at": 1738970000000 - } -} -``` +AI rooms broadcast `org.matrix.msc4391.command_description` state events for the user-facing commands implemented by the bridge. See [`docs/msc/com.beeper.mscXXXX-commands.md`](./msc/com.beeper.mscXXXX-commands.md). - -### AI-Generated Media Tags -Generated media messages may include minimal metadata: -- `com.beeper.ai.image_generation: { "turn_id": "..." }` -- `com.beeper.ai.tts: { "turn_id": "..." 
}` +## Extra keys -### Unstable HTTP Namespace -For the Beeper provider, base URLs may be formed with: -- `/_matrix/client/unstable/com.beeper.ai` +These keys appear as metadata or rendering hints on Matrix events: -Examples: -- `https:///_matrix/client/unstable/com.beeper.ai/openrouter/v1` -- `https:///_matrix/client/unstable/com.beeper.ai/openai/v1` -- `https:///_matrix/client/unstable/com.beeper.ai/exa` +- `com.beeper.ai` +- `com.beeper.stream` +- `com.beeper.ai.model_id` +- `com.beeper.ai.agent` +- `com.beeper.ai.image_generation` +- `com.beeper.ai.tts` - -## Implementation Notes -- Desktop consumes `com.beeper.llm.deltas[*].part` as an AI SDK `UIMessageChunk` and reconstructs a live `UIMessage`. -- Matrix envelope concerns (`turn_id`, `seq`, `m.relates_to`) remain bridge/client responsibilities inside each delta entry. -- Consumers should prefer AI SDK-compatible chunk semantics (metadata merge, tool partial JSON handling, step boundaries). +## Notes - -## Forward Compatibility -- Clients MUST ignore unknown `com.beeper.ai.*` event types and unknown fields. -- Clients MUST ignore unknown future streaming chunk types. +- Custom agents are stored in login metadata, not published as room state events. +- `com.beeper.ai.info` is registered as a known state type, but it is not actively broadcast. +- Room capability state is sent through standard Beeper room-feature state, not a custom AI state event. diff --git a/docs/msc/com.beeper.mscXXXX-commands.md b/docs/msc/com.beeper.mscXXXX-commands.md index a09019d5..74565927 100644 --- a/docs/msc/com.beeper.mscXXXX-commands.md +++ b/docs/msc/com.beeper.mscXXXX-commands.md @@ -1,98 +1,34 @@ -# MSC: AI Chats MSC4391 Command Profile +# MSC: AI Command Profile -## Summary +Status: implemented for the AI bridge in this repo. -This document defines the specific command set that AI Chats advertises via [MSC4391] bot command descriptions. 
Rather than introducing a custom `com.beeper.*` command system, AI Chats adopts MSC4391 directly — broadcasting `org.matrix.msc4391.command_description` state events so that supporting clients can render slash commands with autocomplete and typed parameters. +## Transport -This is a profile document, not a new MSC. It specifies which commands AI Chats publishes via MSC4391. +Room state: -## Motivation +- `org.matrix.msc4391.command_description` -Text-based bot commands (`!ai status`, `!ai reset`) have several problems: +Structured invocation: -- **Undiscoverable:** Users must read documentation or type `!ai help` to learn available commands. There is no in-client autocomplete or parameter hinting. -- **Fragile parsing:** Free-text command parsing leads to ambiguous inputs and poor error messages. Typed parameters eliminate this class of bugs. -- **No validation:** Without structured schemas, clients cannot validate arguments before sending. Invalid commands waste a round-trip. +- `org.matrix.msc4391.command` inside `m.room.message` -[MSC4391] solves these problems by letting bots advertise commands as room state events. Clients that support MSC4391 render them as slash commands with autocomplete. AI Chats adopts this directly. +When both structured data and plain text are present, the structured command wins. -## Proposal +## Built-in user-facing commands -### State Event +The AI bridge currently publishes these stable user-facing commands: -Type: `org.matrix.msc4391.command_description` +| Command | Meaning | +| --- | --- | +| `new` | Create a new chat of the same type, optionally targeting an agent | +| `status` | Show current session status | +| `reset` | Start a new session or thread in the current room | +| `stop` | Abort the active run and clear the pending queue | -The bot MUST broadcast one state event per command when it joins a room. The `state_key` is the command name. 
- -```json -{ - "type": "org.matrix.msc4391.command_description", - "state_key": "status", - "content": { - "description": "Show current session status", - "arguments": {} - } -} -``` - -```json -### Structured Invocation - -When a client sends a command, it MUST include the `org.matrix.msc4391.command` field in the message content: - -```json -{ - "type": "m.room.message", - "content": { - "msgtype": "m.text", - "body": "!ai status", - "org.matrix.msc4391.command": { - "command": "status", - "arguments": {} - } - } -} -``` - -The `body` field MUST contain a text fallback for clients without MSC4391 support. When `org.matrix.msc4391.command` is present, the bot MUST use the structured field and ignore the `body` for command parsing. - -### Command List - -Commands broadcast by AI Chats: - -| Command | Description | Arguments | -|---------|-------------|-----------| -| `new` | Create a new chat of the same type | `agent?: string` | -| `status` | Show current session status | — | -| `reset` | Start a new session/thread | — | -| `stop` | Abort current run and clear queue | — | - -Dynamic commands from integrations and modules are also broadcast as state events. +Integration modules may register more commands at runtime. Those are also broadcast through MSC4391 when available. ## Fallback -Clients without MSC4391 support MAY send commands as `!ai ` text messages. The bot MUST parse `!ai` prefixed text as a fallback when the `org.matrix.msc4391.command` field is absent. - -When both are present, the structured `org.matrix.msc4391.command` field takes precedence over the text `body`. - -## Security Considerations - -- **Command authorization:** The bot SHOULD check room power levels before executing commands that modify room or session state. -- **Argument validation:** The bot MUST validate structured arguments against the published schema before execution. Malformed arguments MUST be rejected with an error message. 
- -## Unstable Prefix - -This profile uses the MSC4391 unstable prefix directly: - -| Unstable | Stable (future) | -|----------|----------------| -| `org.matrix.msc4391.command_description` | `m.command_description` | -| `org.matrix.msc4391.command` | `m.command` | - -No `com.beeper.*` variant is needed — MSC4391 is adopted as-is. - -## Dependencies - -- [MSC4391]: Bot command descriptions — the underlying protocol this profile builds on. +Clients without MSC4391 support can still send plain-text commands using the room command prefix. -[MSC4391]: https://github.com/matrix-org/matrix-spec-proposals/pull/4391 +The default command prefix is `!ai`. diff --git a/docs/msc/com.beeper.mscXXXX-ephemeral.md b/docs/msc/com.beeper.mscXXXX-ephemeral.md index ee1e2fd6..6b3c6f73 100644 --- a/docs/msc/com.beeper.mscXXXX-ephemeral.md +++ b/docs/msc/com.beeper.mscXXXX-ephemeral.md @@ -1,139 +1,9 @@ # MSC: Custom Room Ephemeral Events -## Summary +Current status: -`com.beeper.ephemeral` provides a transport for custom ephemeral events in Matrix rooms. This is an implementation of [MSC2477] with a `com.beeper` unstable prefix, plus transparent E2EE support following the [MSC3673] pattern. +- no bridge here implements `com.beeper.ephemeral` +- live AI output uses the message-anchored streaming model in [`com.beeper.mscXXXX-streaming.md`](./com.beeper.mscXXXX-streaming.md) +- timeline state remains the source of truth -Ephemeral events are short-lived, non-persisted events delivered via `/sync` to joined room members. They are useful for real-time features like room-scoped AI telemetry, live indicators, and collaborative cursors. - -## Motivation - -Matrix currently provides only a limited set of built-in ephemeral events — primarily typing indicators (`m.typing`) and read receipts (`m.receipt`). Applications that need real-time, non-persisted data delivery within a room have no standard mechanism available. 
- -Use cases that require custom ephemeral events include: - -- **Transient room-scoped AI telemetry:** Implementations that want every joined client to observe non-persisted AI status can use ephemeral events. The AI streaming profile in this repo now prefers message-anchored `to_device` delivery instead of room ephemerals. -- **Collaborative cursors:** Real-time cursor position sharing in shared editing contexts. -- **Custom presence:** Application-specific presence or activity indicators beyond `m.presence`. - -[MSC2477] proposes user-defined ephemeral events but has not yet been merged into the Matrix specification. This proposal implements the same concept with a `com.beeper` unstable prefix to unblock real-time features today. - -## Proposal - -### Differences from MSC2477 - -| Aspect | MSC2477 | com.beeper.ephemeral | -|--------|---------|---------------------| -| Unstable prefix | `org.matrix.msc2477` | `com.beeper.ephemeral` | -| Endpoint | `PUT /_matrix/client/unstable/org.matrix.msc2477/rooms/{roomId}/ephemeral/{type}/{txnId}` | `PUT /_matrix/client/unstable/com.beeper.ephemeral/rooms/{roomId}/ephemeral/{type}/{txnId}` | -| Power levels key | `ephemeral` + `ephemeral_default` (default 50) | Same concept — checked via power levels | -| TTL | Not specified | Servers SHOULD expire events. Recommended TTL: 2 minutes. | -| Timestamp | `origin_server_ts` on event | `?ts=` query param on PUT, stored as `origin_server_ts` | -| Response | `{}` | `{}` (empty body) | -| Built-in type blocking | Rejects `m.*` types | Rejects built-in `m.*` ephemeral types except `m.room.encrypted` | -| Sync delivery | `ephemeral` section of `/sync` rooms | Same — delivered in `rooms.join.{roomId}.ephemeral.events[]` | - -### Client-Server API - -#### Sending - -``` -PUT /_matrix/client/unstable/com.beeper.ephemeral/rooms/{roomId}/ephemeral/{eventType}/{txnId} -``` - -**Request body:** Arbitrary JSON content. 
- -**Query parameters:** - -| Parameter | Type | Required | Description | -|-----------|------|----------|-------------| -| `ts` | integer | no | Unix millisecond timestamp for `origin_server_ts`. If omitted, the server MUST use the current time. | - -**Authentication:** Standard Matrix access token. The sender MUST be joined to the room. - -**Power levels:** The server MUST check the sender's power level against the room's ephemeral event power level for the given `eventType`. - -**Constraints:** -- Maximum content size: 64KB. Servers MUST reject requests exceeding this limit with `M_TOO_LARGE`. -- Event types: Servers MUST accept `m.room.encrypted` and custom non-`m.*` event types. Servers MUST reject other built-in `m.*` ephemeral event types. -- Deduplication: Servers MUST deduplicate on the composite key `(room_id, sender, event_type, txn_id)`. Duplicate sends MUST be silently accepted and return `200 OK`. - -**Response:** `200 OK` -```json -{} -``` - -#### Receiving via /sync - -Ephemeral events appear in the `/sync` response under `rooms.join.{roomId}.ephemeral.events[]`: - -```json -{ - "type": "com.example.custom", - "sender": "@user:server", - "origin_server_ts": 1709123456000, - "room_id": "!room:server", - "content": { ... } -} -``` - -Servers MUST only deliver ephemeral events to users with `membership: join` in the room. - -#### TTL and Expiry - -Servers SHOULD expire ephemeral events after a configured TTL. The recommended TTL is 2 minutes. Servers SHOULD run periodic cleanup to remove expired events. The `/sync` endpoint MUST NOT deliver expired events. - -### E2EE - -When a room is encrypted, clients MUST encrypt ephemeral event content using the room's Megolm session before sending: - -1. The client checks whether the room is encrypted. -2. If encrypted: the client wraps the content with Megolm encryption and sets `eventType` to `m.room.encrypted`. -3. The encrypted event is sent via `PUT .../ephemeral/m.room.encrypted/{txnId}`. -4. 
The server stores the event content-agnostically. -5. `/sync` delivers the encrypted event. Receiving clients decrypt with shared Megolm room keys. - -This reuses existing room Megolm sessions — no separate key management is required. This follows the [MSC3673] pattern for encrypted ephemeral data units. - -## Potential Issues - -- **No delivery guarantee:** Ephemeral events are best-effort. Clients MUST NOT rely on ephemeral events as the sole delivery mechanism for critical data. Applications SHOULD provide a persisted fallback (e.g. timeline edits for streaming). -- **TTL semantics are server-defined:** The TTL is a server implementation detail, not a client-controlled parameter. Different servers MAY use different TTL values, which could affect applications that assume a specific event lifetime. -- **Dedup key constraints:** The composite dedup key `(room_id, sender, event_type, txn_id)` means that two different senders MAY use the same `txn_id` for the same `event_type` without conflict, but a single sender reusing a `txn_id` will have the second event silently dropped. - -## Alternatives - -### `to_device` events - -`to_device` events provide direct device-to-device messaging but bypass room semantics entirely. They require the sender to enumerate target devices, do not benefit from server-side room membership filtering, and cannot be delivered to all room members via a single API call. - -### Reusing `m.typing` - -The existing `m.typing` mechanism is limited to a single boolean per user per room. It cannot carry arbitrary payloads, custom types, or per-event content. Extending `m.typing` to support custom data would be a breaking change to a well-established API. - -### MSC2477 directly - -Adopting [MSC2477] with its `org.matrix.msc2477` prefix is the eventual goal. The `com.beeper.ephemeral` prefix is used in the interim because MSC2477 has not yet been merged, and we need to ship real-time features today. 
The protocol semantics are intentionally aligned to make migration straightforward. - -## Security Considerations - -- **Power level enforcement:** Servers MUST check the sender's power level before accepting ephemeral events. Without power level checks, any joined user could flood a room with ephemeral events. -- **Content size limits:** Servers MUST enforce the 64KB content size limit. Unbounded content could be used for denial-of-service attacks on the `/sync` pipeline. -- **E2EE requirement for sensitive data:** Applications sending sensitive data (e.g. tool call parameters, user input) via ephemeral events in encrypted rooms MUST encrypt the content per the E2EE section above. Sending plaintext ephemeral events in encrypted rooms leaks data to the server. -- **Rate limiting:** Servers SHOULD apply rate limits to the ephemeral event endpoint. High-frequency streaming use cases (e.g. AI token-by-token output) can generate significant load. - -## Unstable Prefix - -While this proposal is not yet part of the Matrix specification, implementations MUST use the following unstable prefix: - -| Unstable | Stable (future) | -|----------|----------------| -| `com.beeper.ephemeral` (endpoint path) | Aligned with [MSC2477] — `org.matrix.msc2477` or future `m.ephemeral` | - -## Dependencies - -- [MSC2477]: User-defined ephemeral events — the upstream proposal this implementation is based on. -- [MSC3673]: Encrypted ephemeral data units — the pattern for E2EE ephemeral events. - -[MSC2477]: https://github.com/matrix-org/matrix-spec-proposals/pull/2477 -[MSC3673]: https://github.com/matrix-org/matrix-spec-proposals/pull/3673 +If room-scoped custom ephemerals are added later, they should be documented separately from the current bridge surface. 
diff --git a/docs/msc/com.beeper.mscXXXX-streaming.md b/docs/msc/com.beeper.mscXXXX-streaming.md index bfe24ef1..322b4a68 100644 --- a/docs/msc/com.beeper.mscXXXX-streaming.md +++ b/docs/msc/com.beeper.mscXXXX-streaming.md @@ -1,255 +1,75 @@ # MSC: Message-Anchored AI Streaming -## Summary +Status: experimental. -This proposal defines an application-level streaming profile for real-time AI output in Matrix rooms. +## Current model -Instead of broadcasting every token into room-scoped ephemeral events, the sender publishes a normal placeholder `m.room.message` that carries a `com.beeper.stream` descriptor. Clients that care about live progress subscribe to that descriptor over `to_device`, and the sender delivers buffered and incremental updates directly to those devices. The final assistant message still lands in the room timeline as a normal edit of the placeholder. +The bridge starts a turn with a normal placeholder `m.room.message`. -The profile covers transport, subscription, completion, and optional custom encryption. The authoritative chunk catalog for `com.beeper.llm` remains in the [AI Matrix Spec](../matrix-ai-matrix-spec-v1.md#streaming). +That placeholder may include: -## Motivation +- `com.beeper.ai` for canonical assistant state +- `com.beeper.stream` for live-stream attachment -AI model responses are generated token-by-token and can take tens of seconds to complete. Users should see progress quickly, but room-wide streaming transport has a few practical problems: +While the turn is active, the bridge emits `com.beeper.llm` delta envelopes anchored to the placeholder event. -- **Unnecessary fanout:** Most joined devices are not actively viewing the room. -- **Server support burden:** Custom room-ephemeral support is not universally available. -- **Per-room delivery overhead:** High-frequency token traffic does not need to be delivered to every client. 
+When the turn finishes, the placeholder is replaced by a final edit and the live stream is considered complete. -Anchoring the stream in a timeline placeholder solves those problems: - -- **Timeline-first UX:** Clients can render a room preview such as "Generating response..." from the placeholder alone. -- **Opt-in live delivery:** Only actively viewing devices subscribe. -- **Strong completion signal:** The final `m.replace` edit removes the stream descriptor, so even non-subscribed clients can tell the stream ended. - -## Proposal - -### Placeholder Descriptor - -The sender starts by sending a placeholder `m.room.message` in the room timeline. The message includes a `com.beeper.stream` object: - -```json -{ - "type": "m.room.message", - "room_id": "!meow", - "event_id": "$foobar", - "sender": "@ai_chatgpt:beeper.local", - "content": { - "msgtype": "m.text", - "body": "Pondering...", - "com.beeper.stream": { - "user_id": "@aibot:beeper.local", - "device_id": "ABCD1234", - "type": "com.beeper.llm", - "expiry_ms": 1800000 - } - } -} -``` - -Fields: - -| Field | Type | Required | Description | -|-------|------|----------|-------------| -| `user_id` | string | yes | Matrix user that accepts subscriptions and publishes updates. This may differ from the placeholder message sender when bridge bot/device identities differ. | -| `device_id` | string | yes | Device that accepts subscriptions and sends updates. | -| `type` | string | yes | Stream payload family. This proposal currently defines `com.beeper.llm`. | -| `expiry_ms` | integer | no | Maximum age in milliseconds for treating the descriptor as live. Clients SHOULD ignore stale descriptors after this window. | -| `encryption` | object | no | Optional custom symmetric encryption parameters. See [Custom encryption](#custom-encryption). | - -If a message containing `com.beeper.stream` is the latest relevant event in a room, clients MAY show a room-list or timeline preview such as "Generating response...". 
- -### Subscription Request - -When a client opens the room and sees an unexpired stream descriptor, it subscribes with a `to_device` event: - -```json -{ - "type": "com.beeper.stream.subscribe", - "sender": "@you:beeper.com", - "to_user_id": "@aibot:beeper.local", - "to_device_id": "ABCD1234", - "content": { - "room_id": "!meow", - "event_id": "$foobar", - "device_id": "4321EFGH", - "expiry_ms": 300000 - } -} -``` - -Fields: - -| Field | Type | Required | Description | -|-------|------|----------|-------------| -| `room_id` | string | yes | Room containing the placeholder message. | -| `event_id` | string | yes | Placeholder event ID being subscribed to. | -| `device_id` | string | yes | Subscriber device that should receive updates. | -| `expiry_ms` | integer | no | Requested subscription lifetime in milliseconds. Clients SHOULD renew before expiry if still viewing the stream. | - -The sender SHOULD verify that the subscription targets a live placeholder message it controls and SHOULD clamp the granted expiry to a sender-defined maximum. - -### Stream Update Delivery - -After receiving a valid subscription, the sender sends a buffered snapshot of stream state so far to the subscribing device, then continues sending incremental updates while the subscription is active: - -```json -{ - "type": "com.beeper.stream.update", - "sender": "@aibot:beeper.local", - "to_user_id": "@you:beeper.com", - "to_device_id": "4321EFGH", - "content": { - "room_id": "!meow", - "event_id": "$foobar", - "com.beeper.llm.deltas": [ - { - "turn_id": "turn_123", - "seq": 7, - "part": { - "type": "text-delta", - "id": "text-turn_123", - "delta": "hello" - }, - "m.relates_to": { - "rel_type": "m.reference", - "event_id": "$foobar" - } - } - ] - } -} -``` - -For a descriptor with `type = X`, update content uses the field `X + ".deltas"`. This proposal defines `com.beeper.llm.deltas` for AI SDK-compatible streaming chunks. 
-
-Each entry in `com.beeper.llm.deltas` uses the stable envelope defined by the AI profile:
-
-| Field | Type | Required | Description |
-|-------|------|----------|-------------|
-| `turn_id` | string | yes | Identifier for the assistant turn. |
-| `seq` | integer | yes | Monotonically increasing per `turn_id`. |
-| `part` | object | yes | AI SDK-compatible streaming chunk. |
-| `m.relates_to` | object | yes | `m.reference` pointing at the placeholder event. |
-| `agent_id` | string | no | Multi-agent routing hint. |
-
-For `com.beeper.llm`, producers SHOULD send buffered deltas in-order and receivers SHOULD ignore duplicates where `seq <= last_applied_seq`.
-
-### Completion
-
-When the stream is complete, the sender edits the original message:
+## Placeholder shape
 ```json
 {
-  "type": "m.room.message",
-  "room_id": "!meow",
-  "sender": "@ai_chatgpt:beeper.local",
-  "content": {
-    "m.relates_to": {
-      "rel_type": "m.replace",
-      "event_id": "$foobar"
+  "msgtype": "m.text",
+  "body": "...",
+  "com.beeper.ai": {
+    "id": "turn_123",
+    "role": "assistant",
+    "metadata": {
+      "turn_id": "turn_123"
     },
-    "m.new_content": {
-      "msgtype": "m.text",
-      "body": "Result of pondering is here"
-    }
+    "parts": []
+  },
+  "com.beeper.stream": {
+    "...": "publisher-defined descriptor"
   }
 }
 ```
-The terminal edit is authoritative. It SHOULD remove `com.beeper.stream` from the message content and include the finalized assistant state. Clients MUST treat the removal of `com.beeper.stream`, or the arrival of the final edit, as the end of the live stream.
-
-### Client Behavior
-
-1. Observe placeholder `m.room.message` events for `com.beeper.stream`.
-2. If the descriptor is unexpired and the room is actively viewed, send `com.beeper.stream.subscribe` to the advertised `user_id` and `device_id`.
-3. Apply the initial buffered `com.beeper.stream.update`, then subsequent incremental updates.
-4. Re-subscribe before subscription expiry if the room remains active.
-5. Stop rendering the stream when the placeholder is edited to remove `com.beeper.stream`, when the descriptor has expired, or when the client leaves the room.
-
-## Custom Encryption
+The descriptor comes from the active `BeeperStreamPublisher`. Transport details are publisher-defined.
-`to_device` updates can use normal Olm encryption. In encrypted rooms, that is the default and recommended transport.
-
-As an optional optimization, the placeholder descriptor MAY expose a symmetric key:
-
-```json
-{
-  "com.beeper.stream": {
-    "user_id": "@aibot:beeper.local",
-    "device_id": "ABCD1234",
-    "type": "com.beeper.llm",
-    "expiry_ms": 1800000,
-    "encryption": {
-      "algorithm": "com.beeper.stream.v1.aes-gcm",
-      "key": "57v+6jXy1NOiFzkrrg+nga0VN7+RURdrCEbm+8OrCDA"
-    }
-  }
-}
-```
+## Delta envelope
-When using this mode, the sender encrypts the `com.beeper.stream.update` payload once and sends the same ciphertext to every subscriber:
+Each streamed delta is wrapped as:
 ```json
 {
-  "type": "m.room.encrypted",
-  "content": {
-    "algorithm": "com.beeper.stream.v1.aes-gcm",
-    "room_id": "!meow",
-    "event_id": "$foobar",
-    "iv": "svNAxzmSqyRdMU3O",
-    "ciphertext": "vrKgF7jsQyd9CKnXLqVjAI9mSLH1okmtu0Puu4Tl4uh+HjrR4JhhD0DhT2ioxiUZMaqgYuERuXThAkpebpFFs0kwT0Bp8sC+NyCXHw8apLWxbUxMZ1FMUvyV5fIR6l6RXS50gA"
+  "turn_id": "turn_123",
+  "seq": 7,
+  "part": {
+    "type": "text-delta",
+    "delta": "hello"
+  },
+  "m.relates_to": {
+    "rel_type": "m.reference",
+    "event_id": "$placeholder"
   }
 }
 ```
-Requirements:
-
-- `key` is 32 random bytes encoded as unpadded standard base64.
-- `room_id` and `event_id` are included in the encrypted event envelope so receivers can route the payload to the correct stream key without trial-decrypting every active stream.
-- `iv` is 12 random bytes encoded as unpadded standard base64.
-- `ciphertext` is AES-GCM ciphertext followed by the 16-byte authentication tag, encoded as unpadded standard base64.
-
-This is an optimization, not the baseline transport.
-
-## Potential Issues
-
-- **Sender-side subscriber tracking:** The sender must keep short-lived subscriber state per placeholder event.
-- **Metadata exposure:** The placeholder reveals that a stream exists and identifies the serving device.
-- **Late subscribers:** Clients may receive only buffered state retained by the sender, not an authoritative replay log.
-- **Descriptor staleness:** If the sender crashes and never edits the placeholder, clients rely on `expiry_ms` to stop subscribing.
-
-## Alternatives
-
-### Room ephemerals
-
-Room-scoped ephemeral events can broadcast updates to all joined clients, but they require homeserver support and deliver high-frequency traffic to devices that may not be viewing the room.
-
-### Timeline edits only
-
-Streaming entirely through `m.replace` edits would persist every intermediate state and create unnecessary room traffic. The placeholder-plus-subscription model keeps the timeline authoritative without persisting every token.
-
-## Security Considerations
+Envelope rules:
-- **Authorization:** Senders SHOULD only honor subscriptions from users who are entitled to view the placeholder message.
-- **Validation:** `room_id` and `event_id` in subscriptions and updates MUST match the anchored placeholder.
-- **Expiry enforcement:** Senders SHOULD cap subscription lifetimes and discard expired subscribers.
-- **Custom AES mode:** Anyone who can read the placeholder descriptor can decrypt stream updates when the symmetric key mode is used. This is acceptable only because anyone who can read the placeholder is also allowed to subscribe.
-- **Key/IV reuse:** AES-GCM senders MUST generate a fresh random IV for every encrypted update. Implementations that approach AES-GCM limits for a single key MUST rotate keys.
+- `turn_id` is required
+- `seq` is strictly positive and monotonic per turn
+- `part` is required
+- `m.relates_to.event_id` must point at the placeholder event
+- `agent_id` may be included when the sender wants multi-agent routing hints
-## Unstable Prefix
+## Final message
-While this proposal is not yet part of the Matrix specification, implementations MUST use the following unstable identifiers:
+The final timeline edit is the canonical result.
-| Unstable | Stable (future) |
-|----------|----------------|
-| `com.beeper.stream` | `m.stream` |
-| `com.beeper.stream.subscribe` | `m.stream.subscribe` |
-| `com.beeper.stream.update` | `m.stream.update` |
-| `com.beeper.stream.v1.aes-gcm` | `m.stream.v1.aes-gcm` |
+The final `com.beeper.ai` payload is compacted before it is attached to the edit, dropping live-only parts that are useful during streaming but not in the stored message.
-## Dependencies
+## Out of scope
-- Matrix timeline messaging (`m.room.message`, `m.replace`) for the placeholder and final state.
-- Matrix `to_device` delivery for subscriptions and live updates.
-- Standard Olm `to_device` encryption, or the optional AES-GCM mode defined above.
+This document does not define the wire protocol behind the stream publisher abstraction. For the broader Matrix event surface, see [`docs/matrix-ai-matrix-spec-v1.md`](../matrix-ai-matrix-spec-v1.md).