@jpvajda
Contributor

@jpvajda jpvajda commented Dec 18, 2025

PR Summary

Built a Node.js voice agent application with a WebSocket proxy to the Deepgram Agent API, including a full frontend UI based on the Deepgram design system and cross-browser audio support.

Backend (server.js)

  • Implemented Express server with WebSocket proxy to Deepgram Agent API using @deepgram/sdk
  • Added proper event forwarding (Welcome, SettingsApplied, ConversationText, Audio, Error, etc.)
  • Implemented error handling for missing API key and audio format errors per AsyncAPI spec
  • Added CONFIG object for environment-based configuration (dev/prod modes)
  • Configured http-proxy-middleware for Vite dev server proxying
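A minimal sketch of how the environment-based configuration and the missing-API-key check listed above could fit together. This is not the code from server.js: the helper name `loadConfig` and the variable names `DEEPGRAM_API_KEY`, `PORT`, and `VITE_PORT` are assumptions made for illustration.

```javascript
// Hypothetical helper: builds a CONFIG-style object from environment variables.
// All variable names here are assumptions, not taken from server.js.
function loadConfig(env = process.env) {
  const apiKey = env.DEEPGRAM_API_KEY;
  if (!apiKey) {
    // Fail fast so the server never starts proxying without credentials.
    throw new Error("DEEPGRAM_API_KEY is not set");
  }

  // Anything other than an explicit "production" is treated as dev mode.
  const isDevelopment = env.NODE_ENV !== "production";

  return {
    apiKey,
    isDevelopment,
    port: Number(env.PORT) || 3000,
    // Only relevant in dev mode, where requests are proxied to Vite.
    // 5173 is Vite's default dev server port.
    vitePort: Number(env.VITE_PORT) || 5173,
  };
}
```

Taking `env` as a parameter keeps the function testable without mutating `process.env`.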

Frontend (index.html + main.js)

  • UI Components: Integrated Deepgram design system (@deepgram/styles)
    • Header with Font Awesome icons and Docs link
    • Status banner using dg-status components (success/error/info states)
    • Action buttons with proper styling (dg-btn classes)
    • Chat-style conversation history (user left-aligned, agent right-aligned)
  • Audio Pipeline:
    • AudioContext management with 24000 Hz sample rate for Chrome/Safari
    • Browser-specific audio constraints (Firefox vs Chrome)
    • ScriptProcessorNode for real-time microphone streaming
    • Audio queue management for smooth playback
    • Int16 ↔ Float32 conversion with proper scaling
  • WebSocket Communication:
    • Settings message with audio input/output configuration
    • Binary audio data handling (Blob and ArrayBuffer support)
    • Message type handling (Welcome, SettingsApplied, ConversationText, Error events)
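For reference, the Int16 ↔ Float32 conversion mentioned in the audio pipeline can be sketched as below. The function names are illustrative and not necessarily those used in main.js; the scaling shown (asymmetric on the way out, divide by 32768 on the way in) is one common convention.

```javascript
// Convert Web Audio samples (Float32, nominally in [-1, 1]) to 16-bit signed PCM.
function floatTo16BitPCM(float32) {
  const out = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    // Clamp first so out-of-range samples saturate instead of wrapping around.
    const s = Math.max(-1, Math.min(1, float32[i]));
    // Negative range reaches -32768, positive range only 32767.
    out[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return out;
}

// Convert 16-bit signed PCM back to Float32 in [-1, 1) for playback.
function int16ToFloat32(int16) {
  const out = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) {
    out[i] = int16[i] / 0x8000;
  }
  return out;
}
```

Skipping the clamp is a typical source of loud pops, because an out-of-range value wraps when it is stored into an `Int16Array`.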

Cross-Browser Fixes

  • Firefox:
    • Native sample rate AudioContext to avoid mismatch errors
    • AudioContext resume logic with 100ms initialization delay
    • Minimal audio constraints (Firefox ignores most)
  • Chrome/Safari:
    • Forced 24000 Hz AudioContext
    • Enhanced audio constraints with echo cancellation and noise suppression
    • Google-specific constraint support
  • Audio Quality: Enabled echo cancellation and noise suppression for desktop microphones
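The browser split described above could be expressed as a small pure function that picks the AudioContext options and microphone constraints. This is a sketch under assumptions: `pickAudioSettings` is a hypothetical name, user-agent sniffing is only one way to detect Firefox, and the legacy Google-specific constraints are left out.

```javascript
// Hypothetical helper: choose audio settings per browser.
function pickAudioSettings(userAgent) {
  const isFirefox = /firefox/i.test(userAgent);

  if (isFirefox) {
    return {
      // Empty options: let the AudioContext use the device's native rate,
      // which avoids sample rate mismatch errors.
      contextOptions: {},
      // Minimal constraints.
      constraints: { audio: true },
    };
  }

  return {
    // Chrome/Safari: force the 24000 Hz rate used for agent audio.
    contextOptions: { sampleRate: 24000 },
    constraints: {
      audio: {
        channelCount: 1,
        echoCancellation: true,
        noiseSuppression: true,
      },
    },
  };
}

// In the browser this would be used roughly as:
//   const settings = pickAudioSettings(navigator.userAgent);
//   const ctx = new AudioContext(settings.contextOptions);
//   const stream = await navigator.mediaDevices.getUserMedia(settings.constraints);
```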

Test Plan

  • pnpm run security-check - no issues.
  • Chrome: Full functionality (mic, agent responses, UI)
  • Firefox: Audio capture working, no sample rate errors
  • Safari: Microphone and playback functional
  • UI rendering correctly across browsers
  • Conversation history displaying properly
  • Status indicators working
  • Error handling functional
  • App works in Dev and Prod Modes
  • Contract Compliance passes: WS_URL=ws://localhost:3000 pnpm run test:agent

@jpvajda jpvajda changed the title from "Feat/starter refactor" to "feat: starter refactor" Dec 18, 2025
@jpvajda
Contributor Author

jpvajda commented Dec 18, 2025

Current state of mic / audio quality for the Agent:

Chrome / Safari

  • Mic input is decent, but it sometimes fails to detect spoken words
  • Audio output is a bit distorted, with some crackling and graininess

Firefox

  • Mic input is pretty poor and seems very delayed
  • Audio output is a bit distorted, with some crackling and graininess; the Agent also speaks slower than in Chrome / Safari

@jpvajda
Contributor Author

jpvajda commented Dec 18, 2025

Review Feedback:

  • Maybe move controls down under conversation ✅
  • allow a user to inject a message ✅
  • add a simple welcome message ✅
  • Look at EmilyAI, for reference ✅
  • Don't use browser-agent, it's a black box ✅
  • Don't over architect it ✅
  • Ask Dan about what they've done to solve agent playback issue ✅

Notes:

  • I figured out the mic issue from looking at EmilyAI:
    the sampleRate was 24000 and needs to be 16000. It works much better now.

  • Still need to figure out the click we hear in the audio when the agent speaks. It might be a header or container issue: container is set to none, but linear16 may default to WAV, which could be causing it.

Waveform of the click:

[image: waveform]
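One way to test the header hypothesis from the notes above is to check whether the first audio chunk starts with a RIFF/WAVE header and, if so, drop it before treating the bytes as raw PCM. This is only a diagnostic sketch: `stripWavHeader` is a hypothetical helper, and it assumes the canonical 44-byte header layout, which real WAV streams do not always follow.

```javascript
// Hypothetical helper: if a chunk begins with a canonical WAV header,
// return the bytes after it; otherwise return the chunk unchanged.
function stripWavHeader(chunk) {
  const bytes = chunk instanceof Uint8Array ? chunk : new Uint8Array(chunk);
  if (bytes.length < 44) return bytes;

  // Decode four bytes at an offset as an ASCII tag.
  const tag = (offset) =>
    String.fromCharCode(bytes[offset], bytes[offset + 1], bytes[offset + 2], bytes[offset + 3]);

  // "RIFF" at offset 0 and "WAVE" at offset 8 mark a WAV container.
  const looksLikeWav = tag(0) === "RIFF" && tag(8) === "WAVE";

  // Assumption: a fixed 44-byte header with no extra chunks before the data.
  return looksLikeWav ? bytes.subarray(44) : bytes;
}
```

If the function never strips anything on the first chunk, the stream is already headerless and the click has another cause.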

@jpvajda
Contributor Author

jpvajda commented Dec 19, 2025

Where I left off: everything is done 🥳, but I'm still getting that click at the start of the agent response audio. I tried a function that builds a standard 44-byte PCM WAV header (little-endian) for uncompressed audio and returns it as a Node Buffer, but the click was still there. 😢
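For readers following along, a header builder of the kind described could look like the sketch below. It is not the function from this PR: the name `buildWavHeader` and the default values (24000 Hz, mono, 16-bit) are assumptions for illustration.

```javascript
// Hypothetical sketch: build a canonical 44-byte PCM WAV header (little-endian).
function buildWavHeader(dataLength, sampleRate = 24000, channels = 1, bitsPerSample = 16) {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second

  const header = Buffer.alloc(44);
  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + dataLength, 4); // file size minus the first 8 bytes
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16);             // fmt chunk size for PCM
  header.writeUInt16LE(1, 20);              // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36, "ascii");
  header.writeUInt32LE(dataLength, 40);     // size of the PCM payload in bytes
  return header;
}
```

A header like this only helps when the consumer parses WAV. If the client interprets incoming bytes as raw samples, the 44 header bytes would themselves be played as audio.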
