Co-authored-by: NichUK <346792+NichUK@users.noreply.github.com> Agent-Logs-Url: https://github.com/NichUK/openclaw-windows-node/sessions/b0f37bbe-5816-430c-9069-9ebbdd02b0a1
…indefinite hangs Co-authored-by: NichUK <346792+NichUK@users.noreply.github.com> Agent-Logs-Url: https://github.com/NichUK/openclaw-windows-node/sessions/0cc237fa-b2b8-427a-83e8-4375e2c3f2fc
Propagate CancellationToken through WebSocket TTS call chain
…/sub-pr-2 # Conflicts: # src/OpenClaw.Tray.WinUI/Services/Voice/VoiceCloudTextToSpeechClient.cs
Co-authored-by: NichUK <346792+NichUK@users.noreply.github.com> Agent-Logs-Url: https://github.com/NichUK/openclaw-windows-node/sessions/556f63ce-3524-4508-9ac9-5b05a7697956
Co-authored-by: NichUK <346792+NichUK@users.noreply.github.com> Agent-Logs-Url: https://github.com/NichUK/openclaw-windows-node/sessions/368e6f83-a2f3-412c-bac7-47d57ddd4d92
Thread CancellationToken + timeout through WebSocket TTS operations
…/voice-mode # Conflicts: # src/OpenClaw.Tray.WinUI/Services/Voice/VoiceCloudTextToSpeechClient.cs
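The two commits above describe threading a CancellationToken plus a timeout through the WebSocket TTS calls to avoid indefinite hangs. A minimal sketch of that general pattern follows. It is not the code from VoiceCloudTextToSpeechClient.cs, and `TimeoutSketch` and `RunWithTimeoutAsync` are hypothetical names used only for illustration:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; the PR's actual implementation may look different.
public static class TimeoutSketch
{
    // Runs an operation under a token that fires when the caller cancels
    // OR when the timeout elapses, whichever happens first.
    public static async Task<T> RunWithTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        TimeSpan timeout,
        CancellationToken cancellationToken)
    {
        using var linked = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        linked.CancelAfter(timeout);
        try
        {
            // The operation must pass this token to every await it performs.
            return await operation(linked.Token);
        }
        catch (OperationCanceledException)
            when (linked.IsCancellationRequested && !cancellationToken.IsCancellationRequested)
        {
            // Only the timeout fired: surface it as a timeout, not a user cancel.
            throw new TimeoutException($"Operation did not complete within {timeout}.");
        }
    }
}
```

A WebSocket receive could then be wrapped as `RunWithTimeoutAsync(ct => socket.ReceiveAsync(buffer, ct), timeout, callerToken)`. One caveat worth checking against the .NET documentation: cancelling a pending receive on a ClientWebSocket typically leaves the socket aborted, so the connection should be treated as unusable afterwards.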
Pull request overview
Adds a first-pass Windows “Voice Mode” feature set to the WinUI tray app, introducing voice runtime/configuration plumbing, UI surfaces (status + repeater), and provider-catalog driven STT/TTS support, plus associated schema/capability work in OpenClaw.Shared.
Changes:
- Add WinUI voice UI: settings panel integration, voice status window, and repeater window (with persisted placement/options).
- Introduce provider catalog/config store + cloud TTS client scaffolding; add Windows.Media STT route and capture service.
- Extend shared schema/capability surface for voice commands and enhance gateway chat handling for session-key normalization + preview-based final assistant message recovery.
Reviewed changes
Copilot reviewed 50 out of 52 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/OpenClaw.Tray.Tests/WebChatWindowDomBridgeTests.cs | Adds tests for WebChat DOM voice bridge scripting (currently mismatched vs implementation). |
| tests/OpenClaw.Tray.Tests/VoiceServiceTransportTests.cs | Adds unit tests covering internal voice transport/decision helpers. |
| tests/OpenClaw.Tray.Tests/VoiceProviderCatalogServiceTests.cs | Adds tests for voice provider catalog loading + icon generation. |
| tests/OpenClaw.Tray.Tests/VoiceCloudTextToSpeechClientTests.cs | Adds tests for cloud TTS client cancellation + decoding helpers. |
| tests/OpenClaw.Tray.Tests/VoiceChatCoordinatorTests.cs | Adds tests for coordinating draft/turn mirroring across attached windows. |
| tests/OpenClaw.Tray.Tests/SettingsRoundTripTests.cs | Extends settings round-trip/back-compat tests for voice settings + provider config migration. |
| tests/OpenClaw.Tray.Tests/OpenClaw.Tray.Tests.csproj | Updates test TFM to Windows and references WinUI project. |
| tests/OpenClaw.Shared.Tests/VoiceModeSchemaTests.cs | Adds schema/enum/default tests for the shared voice model. |
| tests/OpenClaw.Shared.Tests/OpenClawGatewayClientTests.cs | Adds tests for new gateway chat session normalization + preview handling. |
| tests/OpenClaw.Shared.Tests/CapabilityTests.cs | Adds VoiceCapability behavior tests. |
| src/OpenClaw.Tray.WinUI/Windows/WebChatWindow.xaml.cs | Implements IVoiceChatWindow draft injection into WebChat via WebView2 script. |
| src/OpenClaw.Tray.WinUI/Windows/WebChatVoiceDomState.cs | Adds minimal state container for pending draft mirroring into WebChat DOM. |
| src/OpenClaw.Tray.WinUI/Windows/WebChatVoiceDomBridge.cs | Adds injected DOM bridge script + helper to build draft-setting JS. |
| src/OpenClaw.Tray.WinUI/Windows/VoiceRepeaterWindow.xaml.cs | Adds compact repeater window for transcript/replies + controls + persistence. |
| src/OpenClaw.Tray.WinUI/Windows/VoiceRepeaterWindow.xaml | Adds repeater window layout and settings flyout UI. |
| src/OpenClaw.Tray.WinUI/Windows/VoiceModeWindow.xaml.cs | Adds “Voice Mode” status/config summary window. |
| src/OpenClaw.Tray.WinUI/Windows/VoiceModeWindow.xaml | Adds layout for the voice status/config window. |
| src/OpenClaw.Tray.WinUI/Windows/SettingsWindow.xaml.cs | Integrates VoiceSettingsPanel and makes save flow async to apply voice config. |
| src/OpenClaw.Tray.WinUI/Windows/SettingsWindow.xaml | Adds VoiceSettingsPanel control to Settings UI. |
| src/OpenClaw.Tray.WinUI/Services/Voice/WindowsMediaSpeechToTextRoute.cs | Adds Windows.Media dictation recognizer route. |
| src/OpenClaw.Tray.WinUI/Services/Voice/VoiceSpeechToTextRouteResources.cs | Adds resource container for STT route assets (recognizer/capture). |
| src/OpenClaw.Tray.WinUI/Services/Voice/VoiceSpeechToTextRouteKind.cs | Adds route-kind enum for selecting STT pipeline. |
| src/OpenClaw.Tray.WinUI/Services/Voice/VoiceSpeechToTextRouteFactory.cs | Adds factory to select STT route kind based on provider/runtime. |
| src/OpenClaw.Tray.WinUI/Services/Voice/VoiceProviderCatalogService.cs | Adds catalog loader/normalizer and provider runtime support checks. |
| src/OpenClaw.Tray.WinUI/Services/Voice/VoiceDisplayHelper.cs | Adds UI-friendly labels for voice mode/state/runtime. |
| src/OpenClaw.Tray.WinUI/Services/Voice/VoiceCloudTextToSpeechClient.cs | Adds HTTP/WebSocket cloud TTS client using provider contracts + templating. |
| src/OpenClaw.Tray.WinUI/Services/Voice/VoiceChatCoordinator.cs | Adds dispatcher-based coordination of draft/turn updates across windows. |
| src/OpenClaw.Tray.WinUI/Services/Voice/VoiceChatContracts.cs | Adds interfaces for voice runtime/config/control + window mirroring abstractions. |
| src/OpenClaw.Tray.WinUI/Services/Voice/VoiceCaptureService.cs | Adds AudioGraph-based capture service with peak/signal helpers. |
| src/OpenClaw.Tray.WinUI/Services/Voice/SherpaOnnxSpeechToTextRoute.cs | Adds scaffold route for sherpa-onnx STT (not implemented). |
| src/OpenClaw.Tray.WinUI/Services/Voice/IVoiceSpeechToTextRoute.cs | Adds STT route interface. |
| src/OpenClaw.Tray.WinUI/Services/Voice/AudioGraphStreamingSpeechToTextRoute.cs | Adds scaffold route for streaming STT (not implemented). |
| src/OpenClaw.Tray.WinUI/Services/SettingsManager.cs | Persists voice settings, repeater window prefs, and provider configuration store. |
| src/OpenClaw.Tray.WinUI/Services/NodeService.cs | Wires VoiceCapability into node and starts/stops voice with node connect lifecycle. |
| src/OpenClaw.Tray.WinUI/Services/GlobalHotkeyService.cs | Adds second global hotkey (Ctrl+Alt+Shift+V) for voice pause/resume. |
| src/OpenClaw.Tray.WinUI/Properties/AssemblyInfo.cs | Adds InternalsVisibleTo for OpenClaw.Tray.Tests. |
| src/OpenClaw.Tray.WinUI/Helpers/IconHelper.cs | Adds voice tray icon state variants and generated icon caching. |
| src/OpenClaw.Tray.WinUI/Controls/VoiceSettingsPanel.xaml.cs | Adds voice settings UI logic (mode/providers/devices/provider settings draft/apply). |
| src/OpenClaw.Tray.WinUI/Controls/VoiceSettingsPanel.xaml | Adds voice settings UI layout and provider settings editor controls. |
| src/OpenClaw.Tray.WinUI/Assets/voice-providers.json | Adds provider catalog describing STT/TTS options and contracts. |
| src/OpenClaw.Tray.WinUI/Assets/voice-mode-feature.png | Adds README feature icon asset for Voice Mode. |
| src/OpenClaw.Shared/VoiceProviderConfigurationStoreExtensions.cs | Adds config-store helpers + clone + legacy credential migration. |
| src/OpenClaw.Shared/VoiceModeSchema.cs | Adds shared voice command/schema models + JSON converters and provider contract types. |
| src/OpenClaw.Shared/SettingsData.cs | Adds voice settings + repeater prefs + provider config + legacy JSON migration shim. |
| src/OpenClaw.Shared/OpenClawGatewayClient.cs | Extends gateway client for chat session defaults/normalization and preview-based final assistant message capture. |
| src/OpenClaw.Shared/Models.cs | Adds ChatMessageEventArgs event payload type. |
| src/OpenClaw.Shared/Capabilities/VoiceCapability.cs | Adds node capability implementation for voice commands. |
| README.md | Documents Voice Mode feature and adds to feature list/parity table. |
| .gitignore | Ignores .env and repo-local tool/workspace cache directories. |
Comments suppressed due to low confidence (2)
tests/OpenClaw.Tray.Tests/WebChatWindowDomBridgeTests.cs:28
- This test uses WebChatWindow.BuildTurnsScript(...) and asserts for setTurns/direction/text serialization, but there is no BuildTurnsScript (and the injected setTurns() in WebChatVoiceDomBridge.DocumentCreatedScript currently just clears a legacy host). Either implement turns rendering/serialization or update/remove this test so it matches the current DOM bridge behavior.
tests/OpenClaw.Tray.Tests/WebChatWindowDomBridgeTests.cs:35
- This assertion checks WebChatWindow.TrayVoiceIntegrationScript for DOM anchor logic, but no such script exists in the current implementation (the injected script is WebChatVoiceDomBridge.DocumentCreatedScript and it does not contain getTurnsAnchor/insertBefore). Update the test to assert against the actual injected script, or add the missing integration script if it's still intended.
    public void BuildDraftScript_ClearsWhenDraftIsBlank()
    {
        var script = WebChatWindow.BuildDraftScript(string.Empty);

        Assert.Equal("window.__openClawTrayVoice?.clearDraft?.();", script);
    }
These tests reference WebChatWindow.BuildDraftScript(...), but WebChatWindow no longer exposes that API (draft scripting is implemented in WebChatVoiceDomBridge.BuildSetDraftScript). As written, this test file will not compile; update the tests to call WebChatVoiceDomBridge.BuildSetDraftScript and assert against that output instead.
This issue also appears in the following locations of the same file:
- line 15
- line 30
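To illustrate the shape the updated tests could target, here is a self-contained sketch. `WebChatVoiceDomBridgeSketch` is a hypothetical stand-in, not the real `WebChatVoiceDomBridge`: the parameter type, the `setDraft` function name, and the exact output for non-blank drafts are all assumptions. Only the `clearDraft` string comes from the existing test.

```csharp
using System.Text.Json;

// Hypothetical stand-in; the real BuildSetDraftScript signature and output may differ.
public static class WebChatVoiceDomBridgeSketch
{
    public static string BuildSetDraftScript(string? draft)
    {
        // Blank drafts clear the mirrored text, matching the string the current test expects.
        if (string.IsNullOrWhiteSpace(draft))
            return "window.__openClawTrayVoice?.clearDraft?.();";

        // JsonSerializer emits a quoted, escaped string that is also a valid JS string literal.
        // "setDraft" is an assumed name for the bridge function.
        return $"window.__openClawTrayVoice?.setDraft?.({JsonSerializer.Serialize(draft)});";
    }
}
```

With a helper of this shape, the blank-draft test would keep its existing `Assert.Equal` on the `clearDraft` string and only change the call site.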
@copilot apply changes based on this feedback
    <TargetFramework>net10.0-windows10.0.19041.0</TargetFramework>
    <RuntimeIdentifier>win-x64</RuntimeIdentifier>
    <PlatformTarget>x64</PlatformTarget>
Hard-coding <RuntimeIdentifier>win-x64</RuntimeIdentifier> and <PlatformTarget>x64</PlatformTarget> makes this test project impossible to build/run on win-arm64 (and forces RID-specific restore even when not needed). If the windows TFM is sufficient, consider removing these, or make them conditional so the project can run on both x64 and arm64.
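One possible way to make these settings conditional is sketched below. It assumes the referenced WinUI project does not itself require a fixed RID at restore time, which should be verified before adopting it; the layout is illustrative, not the project's actual file.

```xml
<PropertyGroup>
  <TargetFramework>net10.0-windows10.0.19041.0</TargetFramework>
  <!-- Declare both RIDs for restore instead of pinning a single one. -->
  <RuntimeIdentifiers>win-x64;win-arm64</RuntimeIdentifiers>
</PropertyGroup>

<!-- Pin the platform only when a RID is supplied, e.g. dotnet test -r win-arm64. -->
<PropertyGroup Condition="'$(RuntimeIdentifier)' == 'win-x64'">
  <PlatformTarget>x64</PlatformTarget>
</PropertyGroup>
<PropertyGroup Condition="'$(RuntimeIdentifier)' == 'win-arm64'">
  <PlatformTarget>ARM64</PlatformTarget>
</PropertyGroup>
```

With this shape, a build that passes no RID leaves `PlatformTarget` at its default, while x64 and arm64 builds each get a matching platform.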
@copilot apply changes based on this feedback
Summary
This PR adds the first-pass Windows Voice Mode implementation to the tray app. It's by no means finished, but the first feature set is working. I apologise for the hugeness... There was also quite a lot of experimentation and reversion, so it's not quite as bad as it looks...
What works now
What didn't work
I tried to fully integrate with the WebChat UI, but couldn't achieve it without nasty local DOM writes, which is very hacky. The Windows STT (Windows.Media.SpeechRecognition.SpeechRecognizer) also works pretty well, but it has to control the entire pipeline, and we can't select an input device without changing the default devices.
Coming Next
Notes
I kept the architecture intentionally close to the existing tray/node model and documented the current and planned states, as well as the architecture, in docs/VOICE-MODE.md. I also made as few touch points to the existing app as possible to minimise change risk. Happy to receive notes/change requests before merging, etc., and attempt to deal with issues if anyone actually uses it! :)