Add source@v1 role for audio input devices #52
base: main
Commits: e958810, 7036dde, e2aac33, 05ab13d, 8352361, 49cb037, c8fda54
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -515,6 +515,212 @@ Binary messages should be rejected if there is no active stream. | |
|
|
||
| The timestamp indicates when the first audio sample in this chunk should be output. Clients must translate this server timestamp to their local clock using the offset computed from clock synchronization. Clients should compensate for any known processing delays (e.g., DAC latency, audio buffer delays, amplifier delays) by accounting for these delays when submitting audio to the hardware. | ||
|
|
||
| ## Sources | ||
|
|
||
| Sendspin can also represent **audio inputs** (e.g., AUX/line-in, turntable preamp, Bluetooth receiver, microphone/voice satellite) as first-class, selectable **sources**. | ||
|
|
||
| A **source** is implemented as a Sendspin client role that streams audio **to** the server. The server remains the single place that performs heavy work such as resampling, transcoding, equalization, mixing, buffering, visualization and distribution to output players. | ||
|
|
||
| Sources are intended to be simple: | ||
| - capture/encode audio | ||
| - optionally provide basic signal presence information (level / line sensing) | ||
| - stream audio frames with timestamps | ||
|
|
||
| A device may implement both `source` and `player` roles (e.g., a speaker with a local AUX input that can be forwarded into Sendspin). | ||
|
|
||
| The server may also expose **built-in inputs** (e.g., a line-in on the server host, or an HDMI capture device connected to the server) as a **virtual source client**. Virtual sources participate in the same source selection and state model as regular source clients and appear in the controller `sources` list. | ||
|
|
||
| ## Source messages | ||
|
|
||
| This section describes messages specific to clients with the `source` role, which capture audio from a local input and stream it to the server. | ||
|
|
||
| A source client uses the same clock synchronization mechanism as all clients. Binary source audio messages are timestamped in the **server time domain** using the clock offset learned from `client/time`/`server/time`. | ||
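A minimal sketch of the timestamp translation described above, assuming the client has already estimated an offset (server time minus local time, in microseconds) from the `client/time`/`server/time` exchange. The function names and the `input_latency_us` parameter are illustrative and not part of the spec.

```python
def to_server_time_us(local_time_us: int, offset_us: int) -> int:
    """Translate a local clock reading into the server time domain.

    offset_us is assumed to be (server_time - local_time) in microseconds.
    """
    return local_time_us + offset_us


def capture_timestamp_us(local_now_us: int, offset_us: int,
                         input_latency_us: int = 0) -> int:
    """Server-domain time at which the first sample of a chunk was captured.

    input_latency_us is a hypothetical, locally known ADC/buffer delay:
    the sample was captured that long before the client read it.
    """
    return to_server_time_us(local_now_us - input_latency_us, offset_us)
```

Subtracting a known input latency is one possible way to keep capture timestamps close to the real acquisition time; the spec text itself only requires that the timestamp be in the server time domain.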
|
|
||
| ### Client → Server: `client/hello` source@v1 support object | ||
|
|
||
| The `source@v1_support` object in [`client/hello`](#client--server-clienthello) has this structure: | ||
|
|
||
| - `source@v1_support`: object | ||
|   - `supported_formats`: object[] - list of supported capture/encode formats in priority order (first is preferred) | ||
|     - `codec`: 'opus' | 'flac' | 'pcm' - codec identifier | ||
|     - `channels`: integer - number of channels (e.g., 1 = mono, 2 = stereo) | ||
|     - `sample_rate`: integer - sample rate in Hz (e.g., 44100, 48000) | ||
|     - `bit_depth`: integer - bit depth (e.g., 16, 24) | ||
|   - `controls?`: string[] - optional source control commands supported by this client (subset of: 'play' | 'pause' | 'next' | 'previous' | 'activate' | 'deactivate') | ||
|   - `features?`: object - optional feature hints | ||
|     - `level?`: boolean - true if source reports `level` | ||
|     - `line_sense?`: boolean - true if source reports `signal` | ||
|
|
||
| **Note:** Servers must support all audio codecs: 'opus', 'flac', and 'pcm'. | ||
| **Note:** Servers should request only formats listed in the client's `supported_formats` and must not request formats the client did not advertise. | ||
|
|
||
| Example `client/hello` excerpt: | ||
| ```json | ||
| { | ||
| "type": "client/hello", | ||
| "payload": { | ||
| "client_id": "kitchen-linein", | ||
| "name": "Kitchen Line-In", | ||
| "version": 1, | ||
| "supported_roles": ["source@v1"], | ||
| "source@v1_support": { | ||
| "supported_formats": [ | ||
| { | ||
| "codec": "opus", | ||
| "channels": 2, | ||
| "sample_rate": 48000, | ||
| "bit_depth": 16 | ||
| }, | ||
| { | ||
| "codec": "pcm", | ||
| "channels": 2, | ||
| "sample_rate": 48000, | ||
| "bit_depth": 16 | ||
| } | ||
| ], | ||
| "controls": ["play", "pause", "next", "previous", "activate", "deactivate"], | ||
| "features": { | ||
| "line_sense": true, | ||
| "level": true | ||
| } | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
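As a rough illustration of how a client might assemble the `source@v1_support` object, the sketch below validates codecs and controls against the value lists in the structure above. The helper name `build_source_support` and its keyword arguments are hypothetical.

```python
VALID_CODECS = {"opus", "flac", "pcm"}
VALID_CONTROLS = {"play", "pause", "next", "previous", "activate", "deactivate"}


def build_source_support(formats, controls=None, level=False, line_sense=False):
    """Build a source@v1_support object (hypothetical helper, not a spec API).

    formats must already be in priority order; the first entry is preferred.
    """
    for fmt in formats:
        if fmt["codec"] not in VALID_CODECS:
            raise ValueError(f"unsupported codec: {fmt['codec']}")
    support = {"supported_formats": list(formats)}
    if controls:
        unknown = set(controls) - VALID_CONTROLS
        if unknown:
            raise ValueError(f"unknown controls: {sorted(unknown)}")
        support["controls"] = list(controls)
    # Only emit the optional features object when something is advertised.
    if level or line_sense:
        support["features"] = {"level": level, "line_sense": line_sense}
    return support
```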
|
|
||
| ### Client → Server: `client/state` source object | ||
|
|
||
| The `source` object in [`client/state`](#client--server-clientstate) has this structure: | ||
|
|
||
| - `source`: object | ||
|   - `state`: 'idle' | 'streaming' | 'error' | ||
|   - `level?`: number - optional normalized RMS/peak level (0.0-1.0), only if 'level' is supported | ||
|   - `signal?`: 'unknown' | 'present' | 'absent' - optional line sensing/signal presence, only if 'line_sense' is supported | ||
|
Comment on lines +597 to +598
Member
What is the use case of
Author
It isn’t strictly required. We could simplify by only using present/absent, and treat signal as “unknown” when it’s omitted (or when line_sense=false). The only reason to keep unknown is semantic clarity for clients that do support line sensing but can’t determine it yet (startup, device not ready, no samples). If we want to keep the spec minimal, dropping unknown is perfectly fine.
Member
I think we can drop |
||
|
|
||
| Example `client/state` excerpt: | ||
| ```json | ||
| { | ||
| "type": "client/state", | ||
| "payload": { | ||
| "source": { | ||
| "state": "streaming", | ||
| "signal": "present", | ||
| "level": 0.42 | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
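The `level` field is described only as a normalized RMS/peak value. One way a client could derive it is sketched below, assuming signed 16-bit little-endian PCM capture buffers; the sample layout and both function names are assumptions for illustration.

```python
import math
import struct


def rms_level_16bit(pcm: bytes) -> float:
    """Normalized RMS level (0.0-1.0) of signed 16-bit little-endian samples."""
    count = len(pcm) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    mean_square = sum(s * s for s in samples) / count
    return min(1.0, math.sqrt(mean_square) / 32768.0)


def build_source_state(state: str, level=None, signal=None) -> dict:
    """Build a client/state message; optional fields are omitted when None."""
    source = {"state": state}
    if level is not None:
        source["level"] = round(level, 2)
    if signal is not None:
        source["signal"] = signal
    return {"type": "client/state", "payload": {"source": source}}
```

A client that did not advertise `features.level` or `features.line_sense` would simply leave the corresponding argument as `None`.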
|
|
||
| ### Client → Server: `client/command` source object | ||
|
|
||
| Source clients may send commands to inform the server about user-initiated capture actions (implementation-defined). | ||
|
|
||
| - `source`: object | ||
|   - `command`: 'started' | 'stopped' | ||
|
|
||
| ### Server → Client: `server/command` source object | ||
|
|
||
| The `source` object in [`server/command`](#server--client-servercommand) has this structure: | ||
|
|
||
| - `source`: object | ||
|   - `command?`: 'start' | 'stop' | ||
|   - `control?`: 'play' | 'pause' | 'next' | 'previous' | 'activate' | 'deactivate' - optional source control command; ignored if unsupported by the client | ||
maximmaxim345 marked this conversation as resolved.
|
||
|   - `vad?`: object - optional VAD settings hint | ||
|     - `threshold_db?`: number - signal threshold in dB | ||
|     - `hold_ms?`: integer - hold time in milliseconds | ||
|
|
||
| All fields are optional. The server may send any subset (`command`, `control`, and/or `vad`) in one message. | ||
|
|
||
| #### Source command semantics | ||
|
|
||
| - `command` controls Sendspin ingest lifecycle for this source: | ||
|   - `start`: server requests ingest to become active. The client should transition to `state: "streaming"`, send `input_stream/start`, and then send source audio chunks. | ||
|   - `stop`: server requests ingest to become inactive. The client should send `input_stream/end`, stop sending source audio chunks, and transition to `state: "idle"`. | ||
| - `control` is optional upstream-device control intent and only applies when advertised in `source@v1_support.controls`. | ||
|   - `play` | `pause` | `next` | `previous`: control content playback behavior on the upstream source device (if supported). | ||
|
Contributor
If we include this, then we should also include the state of the source? (Basically the same info we send about Sendspin being played) I wonder if this is scope creep and we shouldn't include this for now. What is the use case? |
||
|   - `activate` | `deactivate`: prepare or power-manage the upstream source path (for example power on/off, wake/sleep, input enable/disable). | ||
|
Contributor
How would the server know when to call this? Why can't the source do this automatically on play using a hook? |
||
|
|
||
| `start`/`stop` and `play`/`pause` are independent: | ||
|
|
||
| - `start`/`stop` govern whether Sendspin ingest is active. | ||
| - `play`/`pause` govern upstream content playback behavior. | ||
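The command semantics above can be sketched as a small client-side handler. This is an illustration under stated assumptions: `send` is a hypothetical callable that transmits one JSON message, `input_stream/end` is sent without a payload because the diff does not define one, and `handled_controls` merely records controls where a real client would act on the upstream device.

```python
class SourceClient:
    """Tracks ingest state and emits the messages described above via `send`."""

    def __init__(self, send, stream_format, controls=()):
        self.send = send                    # hypothetical transport callback
        self.stream_format = stream_format  # format announced in input_stream/start
        self.controls = set(controls)       # as advertised in source@v1_support.controls
        self.state = "idle"                 # ingest is inactive after the handshake
        self.handled_controls = []

    def _send_state(self):
        self.send({"type": "client/state",
                   "payload": {"source": {"state": self.state}}})

    def handle_server_command(self, payload):
        source = payload.get("source", {})
        command = source.get("command")
        if command == "start" and self.state != "streaming":
            self.state = "streaming"
            self._send_state()
            self.send({"type": "input_stream/start",
                       "payload": {"source": self.stream_format}})
        elif command == "stop" and self.state == "streaming":
            self.send({"type": "input_stream/end"})
            self.state = "idle"
            self._send_state()
        # control is independent of start/stop and ignored unless advertised.
        control = source.get("control")
        if control in self.controls:
            self.handled_controls.append(control)
```

Keeping `command` and `control` in separate branches mirrors the independence stated above: a `pause` does not end ingest, and a `stop` does not touch upstream playback.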
|
|
||
| #### Default ingest behavior | ||
|
|
||
| - Effective default after handshake is `stop` (ingest inactive). | ||
| - Server ingest interest is represented by `command: "start"` / `command: "stop"`. | ||
| - Server implementations should ignore/drop source binary chunks while ingest is not active. | ||
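On the server side, the default-stop rule could be enforced with a small gate in front of the audio pipeline. The sketch below is one possible shape; the class name `IngestGate` and the use of a source id as key are assumptions.

```python
class IngestGate:
    """Drops source audio chunks unless the server has requested ingest."""

    def __init__(self):
        # source id -> True when the last command was "start".
        # A missing entry means the default after handshake: "stop".
        self.active = {}

    def set_command(self, source_id: str, command: str) -> None:
        if command not in ("start", "stop"):
            raise ValueError(f"unknown command: {command}")
        self.active[source_id] = command == "start"

    def accept_chunk(self, source_id: str, chunk: bytes):
        """Return the chunk if ingest is active, otherwise None (dropped)."""
        return chunk if self.active.get(source_id, False) else None
```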
|
|
||
| #### `vad` semantics | ||
|
|
||
| `vad` is an optional server hint for source-side line-sense behavior (`threshold_db`, `hold_ms`). It allows centralized tuning and consistent behavior across sources/groups. Clients may ignore unsupported hints. | ||
|
Contributor
This feels out of scope and more like a fleet management feature. This should just be locally configured?
Member
I'm also in favor of moving the VAD configuration outside the Sendspin protocol. |
||
|
|
||
| Example `server/command` to start capture: | ||
| ```json | ||
| { | ||
| "type": "server/command", | ||
| "payload": { | ||
| "source": { | ||
| "command": "start" | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| ### Client → Server: `input_stream/start` | ||
|
|
||
| The `input_stream/start` message announces the active input stream format and provides any required codec header data. | ||
|
|
||
| - `source`: object | ||
|   - `codec`: 'opus' | 'flac' | 'pcm' | ||
|   - `channels`: integer | ||
|   - `sample_rate`: integer | ||
|   - `bit_depth`: integer | ||
|   - `codec_header?`: string - Base64 encoded codec header (required for Opus/FLAC) | ||
|
|
||
| Example `input_stream/start`: | ||
| ```json | ||
| { | ||
| "type": "input_stream/start", | ||
| "payload": { | ||
| "source": { | ||
| "codec": "flac", | ||
| "channels": 2, | ||
| "sample_rate": 48000, | ||
| "bit_depth": 16, | ||
| "codec_header": "BASE64..." | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
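A short sketch of building this message, including the Base64 encoding of the codec header. The helper name is hypothetical, and the header bytes are whatever the encoder produces; nothing here inspects or generates real FLAC or Opus headers.

```python
import base64


def build_input_stream_start(codec, channels, sample_rate, bit_depth,
                             codec_header=None):
    """Build an input_stream/start message (hypothetical helper)."""
    if codec in ("opus", "flac") and codec_header is None:
        raise ValueError(f"codec_header is required for {codec}")
    source = {
        "codec": codec,
        "channels": channels,
        "sample_rate": sample_rate,
        "bit_depth": bit_depth,
    }
    if codec_header is not None:
        # Raw header bytes are carried as a Base64 string in JSON.
        source["codec_header"] = base64.b64encode(codec_header).decode("ascii")
    return {"type": "input_stream/start", "payload": {"source": source}}
```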
|
|
||
| ### Server → Client: `input_stream/request-format` | ||
|
Contributor
One of the things I miss in the spec is what if a client starts streaming data to the server, but the server doesn't care? We would want a way to specify that. Maybe a request-format message to ask for
Contributor
The use cases I want to make sure are covered by this role:
Author
Great point. This is already the intended behavior, and it is how the implementation works. The model is: source.command (start / stop) defines server ingest interest. For the two use cases: Turntable auto-start
Computer output selectable from server
I agree we should make this explicit in the spec text (default stop + ignore/drop when not started), but no new mechanism is required.
Member
I have another idea for the turntable use case: can't the client just directly start the input_stream? I think it's a good idea to make this explicit, can you update the text @rudyberends? |
||
|
|
||
| The server can request a different input stream format. Clients should respond by reconfiguring capture (if supported) and sending a new `input_stream/start` with the updated format and header. | ||
|
|
||
| - `source`: object | ||
|   - `codec?`: 'opus' | 'flac' | 'pcm' | ||
|   - `channels?`: integer | ||
|   - `sample_rate?`: integer | ||
|   - `bit_depth?`: integer | ||
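Because every field in the request is optional, a client needs some rule for combining it with the current format. The sketch below assumes that omitted fields keep their current value and that the result must match an advertised entry in `supported_formats`; neither rule is stated in the diff, and the function name is hypothetical.

```python
FORMAT_KEYS = ("codec", "channels", "sample_rate", "bit_depth")


def resolve_requested_format(current, requested, supported_formats):
    """Return the new stream format, or None if it was never advertised.

    Assumption: fields missing from `requested` fall back to `current`.
    """
    candidate = {key: requested.get(key, current[key]) for key in FORMAT_KEYS}
    for fmt in supported_formats:
        if all(fmt[key] == candidate[key] for key in FORMAT_KEYS):
            return candidate
    return None
```

When a format is returned, the client would reconfigure capture and send a new `input_stream/start` carrying it.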
|
|
||
| ### Client → Server: `input_stream/end` | ||
|
|
||
| The client ends the current input stream. After this message, no more source audio chunks should be sent until a new `input_stream/start`. | ||
|
|
||
| ### Client → Server: Source Audio Chunks (Binary) | ||
|
|
||
| Binary messages should be rejected by the server if the source is not in `state: 'streaming'`. | ||
| Clients must send `input_stream/start` before the first audio chunk. | ||
|
|
||
| - Byte 0: message type `12` (uint8) | ||
| - Bytes 1-8: timestamp (big-endian int64) - server clock time in microseconds when the first sample was captured | ||
| - Rest of bytes: encoded audio frame | ||
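The byte layout above maps directly onto a fixed 9-byte header followed by the encoded frame. A minimal sketch using Python's `struct` module (function and constant names are illustrative):

```python
import struct

SOURCE_AUDIO_CHUNK = 12
# ">Bq": big-endian, uint8 message type, int64 timestamp, no padding (9 bytes).
_HEADER = struct.Struct(">Bq")


def pack_source_chunk(timestamp_us: int, frame: bytes) -> bytes:
    """Prefix an encoded audio frame with the message type and timestamp."""
    return _HEADER.pack(SOURCE_AUDIO_CHUNK, timestamp_us) + frame


def unpack_source_chunk(message: bytes):
    """Return (timestamp_us, frame) or raise ValueError on a malformed message."""
    if len(message) < _HEADER.size:
        raise ValueError("message shorter than 9-byte header")
    msg_type, timestamp_us = _HEADER.unpack_from(message)
    if msg_type != SOURCE_AUDIO_CHUNK:
        raise ValueError(f"unexpected message type: {msg_type}")
    return timestamp_us, message[_HEADER.size:]
```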
|
|
||
| The timestamp indicates when the first audio sample in this chunk was captured (in server time domain). The server may resample/transcode and then distribute the audio to players with its normal buffering and synchronization strategy. | ||
maximmaxim345 marked this conversation as resolved.
|
||
|
|
||
| **Note:** Source timestamps are derived from the client's clock offset and may show small discontinuities or drift (e.g., ADC clock variance). Server implementations should not assume perfectly continuous timestamps; the audio sample stream itself should remain continuous. | ||
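One way a server might observe the drift mentioned in this note is to compare each chunk's timestamp with the time implied by the number of samples received so far. This is only a sketch of the measurement; how a server reacts to the result (for example by resampling) is left open, and both function names are hypothetical.

```python
def expected_timestamp_us(first_timestamp_us: int, samples_received: int,
                          sample_rate: int) -> int:
    """Timestamp implied by the sample count since the first chunk."""
    return first_timestamp_us + (samples_received * 1_000_000) // sample_rate


def timestamp_drift_us(first_timestamp_us: int, samples_received: int,
                       sample_rate: int, chunk_timestamp_us: int) -> int:
    """Positive when the chunk timestamp is later than the sample count implies."""
    return chunk_timestamp_us - expected_timestamp_us(
        first_timestamp_us, samples_received, sample_rate)
```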
|
|
||
| ## Controller messages | ||
| This section describes messages specific to clients with the `controller` role, which enables the client to control the Sendspin group this client is part of, and switch between groups. | ||
|
|
||
|
|
@@ -584,6 +790,14 @@ The `controller` object in [`server/state`](#server--client-serverstate) has thi | |
| - `supported_commands`: string[] - subset of: 'play' | 'pause' | 'stop' | 'next' | 'previous' | 'volume' | 'mute' | 'repeat_off' | 'repeat_one' | 'repeat_all' | 'shuffle' | 'unshuffle' | 'switch' | ||
| - `volume`: integer - volume of the whole group, range 0-100 | ||
| - `muted`: boolean - mute state of the whole group | ||
| - `sources?`: object[] - list of available/known sources on the server | ||
|
Member
Let's remove the Just an idea: Maybe that future role will also allow you to see your library and select an album or playlist for playback? But that's something for later.
Author
Agreed, removed select_source from this PR and left it for a future “media/inputs” role. The reference implementation has been updated accordingly (no controller command, no select/clear CLI; only source listing remains).
Member
Can you drop |
||
|   - `id`: string - stable identifier of the source (typically the source client_id) | ||
|   - `name`: string - friendly name | ||
|   - `state`: 'idle' | 'streaming' | 'error' | ||
|   - `signal?`: 'unknown' | 'present' | 'absent' - optional line sensing/signal presence | ||
|   - `selected?`: boolean - whether this source is currently selected for this group | ||
|   - `last_event?`: 'started' | 'stopped' - last source event (optional) | ||
|   - `last_event_ts_us?`: integer - server time in microseconds for last event (optional) | ||
|
|
||
| **Reading group volume:** Group volume is calculated as the average of all player volumes in the group. | ||
|
|
||
|
|
||