feat: starter refactor #12

jpvajda · 2025-12-18T00:16:44Z

PR Summary

Built a Node.js voice agent application with a WebSocket proxy to the Deepgram Agent API, including a full frontend UI built on the Deepgram design system and cross-browser audio support.

Backend (`server.js`)

Implemented Express server with WebSocket proxy to Deepgram Agent API using @deepgram/sdk
Added proper event forwarding (Welcome, SettingsApplied, ConversationText, Audio, Error, etc.)
Implemented error handling for missing API key and audio format errors per AsyncAPI spec
Added CONFIG object for environment-based configuration (dev/prod modes)
Configured http-proxy-middleware for Vite dev server proxying
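
For reviewers, the missing-API-key check and the environment-based CONFIG object could look roughly like the sketch below. It is only an illustration: `loadConfig`, the `DEEPGRAM_API_KEY`, `NODE_ENV`, `PORT` and `VITE_PORT` variable names, the default ports, and the field names are all assumptions, not the values used in `server.js`.

```javascript
// Hypothetical helper: builds a CONFIG-style object from environment variables.
// Variable names, default ports, and field names are assumptions for illustration.
function loadConfig(env) {
  const apiKey = env.DEEPGRAM_API_KEY;
  if (!apiKey) {
    // Fail fast so the proxy never tries to open an unauthenticated upstream socket.
    throw new Error("DEEPGRAM_API_KEY is not set");
  }
  const isDevelopment = env.NODE_ENV !== "production";
  return {
    apiKey,
    isDevelopment,
    port: Number(env.PORT) || 8080,
    // Only dev mode needs a Vite dev server port to proxy frontend requests to.
    vitePort: isDevelopment ? Number(env.VITE_PORT) || 5173 : null,
  };
}
```

A server would typically call `loadConfig(process.env)` once at startup and exit with a clear message if it throws.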

Frontend (`index.html` + `main.js`)

UI Components: Integrated Deepgram design system (@deepgram/styles)
- Header with Font Awesome icons and Docs link
- Status banner using dg-status components (success/error/info states)
- Action buttons with proper styling (dg-btn classes)
- Chat-style conversation history (user left-aligned, agent right-aligned)
Audio Pipeline:
- AudioContext management with 24000 Hz sample rate for Chrome/Safari
- Browser-specific audio constraints (Firefox vs Chrome)
- ScriptProcessorNode for real-time microphone streaming
- Audio queue management for smooth playback
- Int16 ↔ Float32 conversion with proper scaling
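
A minimal sketch of the Int16 ↔ Float32 conversion mentioned in the last bullet, assuming 16-bit signed PCM on the wire and Web Audio's Float32 samples in the range [-1, 1]. The function names are hypothetical and the scaling in `main.js` may differ.

```javascript
// Convert Float32 samples in [-1, 1] (Web Audio) to 16-bit signed PCM.
function floatTo16BitPCM(float32) {
  const out = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const s = Math.max(-1, Math.min(1, float32[i]));
    // Negative and positive ranges differ by one step (-32768..32767).
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}

// Convert 16-bit signed PCM back to Float32 samples in [-1, 1).
function int16ToFloat32(int16) {
  const out = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) {
    out[i] = int16[i] / 0x8000;
  }
  return out;
}
```

Skipping the clamp is a common source of harsh distortion, because a sample slightly above 1.0 wraps to a large negative value when stored as Int16.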
WebSocket Communication:
- Settings message with audio input/output configuration
- Binary audio data handling (Blob and ArrayBuffer support)
- Message type handling (Welcome, SettingsApplied, ConversationText, Error events)
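
The message handling above could be sketched as follows: text frames are parsed as JSON events and binary frames are treated as agent audio. `routeMessage` and its return shape are hypothetical; only the event names come from this PR.

```javascript
// Hypothetical router for incoming WebSocket frames.
// Text frames carry JSON events; binary frames carry agent audio.
function routeMessage(data) {
  if (typeof data === "string") {
    let event;
    try {
      event = JSON.parse(data);
    } catch (err) {
      return { kind: "invalid", reason: "not JSON" };
    }
    if (event === null || typeof event !== "object") {
      return { kind: "invalid", reason: "not an object" };
    }
    const known = ["Welcome", "SettingsApplied", "ConversationText", "Error"];
    return {
      kind: known.includes(event.type) ? "event" : "unknown-event",
      type: event.type,
      payload: event,
    };
  }
  if (data instanceof ArrayBuffer || ArrayBuffer.isView(data)) {
    return { kind: "audio", needsRead: false };
  }
  if (typeof Blob !== "undefined" && data instanceof Blob) {
    // A Blob must be read with `await data.arrayBuffer()` before playback.
    return { kind: "audio", needsRead: true };
  }
  return { kind: "invalid", reason: "unsupported frame type" };
}
```

In the browser, setting `socket.binaryType = "arraybuffer"` makes binary frames arrive as `ArrayBuffer`, so the Blob branch only matters when the default `"blob"` binary type is left in place.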

Cross-Browser Fixes

Firefox:
- Native sample rate AudioContext to avoid mismatch errors
- AudioContext resume logic with 100ms initialization delay
- Minimal audio constraints (Firefox ignores most)
Chrome/Safari:
- Forced 24000 Hz AudioContext
- Enhanced audio constraints with echo cancellation and noise suppression
- Google-specific constraint support
Audio Quality: Enabled echo cancellation and noise suppression for desktop microphones
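
The browser split above can be summarized in a small helper. This is a sketch under assumptions: `pickAudioSetup`, its return shape, and user-agent sniffing as the detection method are hypothetical, and the Google-specific constraints are left out because their exact names are not given here.

```javascript
// Hypothetical helper: picks AudioContext options and getUserMedia constraints
// from a user-agent string. Detection by user agent is an assumption.
function pickAudioSetup(userAgent) {
  const isFirefox = /firefox/i.test(userAgent);
  if (isFirefox) {
    return {
      browser: "firefox",
      // Empty options let AudioContext use the device's native sample rate.
      contextOptions: {},
      // Minimal constraints, since most others are reported as ignored.
      constraints: { audio: true },
      // Delay before resuming the context, per the Firefox fix above.
      resumeDelayMs: 100,
    };
  }
  return {
    browser: "chrome-or-safari",
    contextOptions: { sampleRate: 24000 },
    constraints: {
      audio: { echoCancellation: true, noiseSuppression: true },
    },
    resumeDelayMs: 0,
  };
}
```

The result would be passed to `new AudioContext(setup.contextOptions)` and `navigator.mediaDevices.getUserMedia(setup.constraints)`.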

Test Plan

jpvajda · 2025-12-18T00:18:05Z

Current state of mic / audio quality for the Agent:

Chrome / Safari

Mic input is decent, but it sometimes fails to detect spoken words
Audio output is slightly distorted, with some crackling and graininess

Firefox

Mic input is pretty poor and seems very delayed
Audio output is slightly distorted, with some crackling and graininess, and the agent speaks more slowly than in Chrome / Safari

jpvajda · 2025-12-18T20:50:19Z

Review Feedback:

Maybe move controls down under conversation ✅
allow a user to inject a message ✅
add a simple welcome message ✅
Look at EmilyAI, for reference ✅
Don't use browser-agent, it's a black box ✅
Don't over architect it ✅
Ask Dan about what they've done to solve agent playback issue ✅

Notes:

I figured out the mic issue by looking at EmilyAI: the `sampleRate` was 24000 and needs to be 16000. It works much better now.
Still open: figure out the click we hear in the audio when the agent speaks. It might be a header or container issue. The container is set to `none`, but linear16 defaults to WAV, which could be causing it.

Waveform of the click: [image]
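
One way to test the header hypothesis is to check whether the first audio chunk actually starts with a RIFF/WAVE header, since header bytes played back as PCM samples would be audible as a short click. The helpers below are a sketch for that check only. `hasWavHeader` and `stripWavHeader` are hypothetical names, and the 44-byte length assumes a canonical header with no extra chunks.

```javascript
// Returns true when a chunk starts with a RIFF/WAVE header.
// Accepts an ArrayBuffer, a typed array, or a Node Buffer.
function hasWavHeader(chunk) {
  const bytes = ArrayBuffer.isView(chunk)
    ? new Uint8Array(chunk.buffer, chunk.byteOffset, chunk.byteLength)
    : new Uint8Array(chunk);
  if (bytes.length < 12) return false;
  const tag = (offset) =>
    String.fromCharCode(...bytes.subarray(offset, offset + 4));
  return tag(0) === "RIFF" && tag(8) === "WAVE";
}

// Hypothetical use: drop the header so only raw PCM reaches the playback queue.
// Assumes the canonical 44-byte header; real files can carry extra chunks.
function stripWavHeader(chunk, headerBytes = 44) {
  if (!hasWavHeader(chunk)) return chunk;
  return chunk.slice(headerBytes);
}
```

If `hasWavHeader` never returns true for the first chunk, the click probably has another cause, such as a discontinuity at the start of playback.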

jpvajda · 2025-12-19T23:40:19Z

Where I left off: everything is done 🥳, but I'm still getting that click at the start of the agent response audio. I tried using a function that builds a standard 44‑byte PCM WAV header (little-endian) for uncompressed audio and returns it as a Node Buffer, but the click was still there. 😢
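
For reference, a function of the kind described above might look like the sketch below. It follows the canonical 44-byte PCM WAV layout. The name `buildWavHeader`, the parameter order, and the defaults (24000 Hz, mono, 16-bit) are assumptions, not the function from this PR.

```javascript
// Builds a canonical 44-byte PCM WAV header (little-endian) as a Node Buffer.
// Parameter names and defaults are assumptions, not the PR's actual function.
function buildWavHeader(dataLength, sampleRate = 24000, channels = 1, bitsPerSample = 16) {
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const header = Buffer.alloc(44);
  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + dataLength, 4); // total file size minus 8 bytes
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16);             // fmt chunk size for PCM
  header.writeUInt16LE(1, 20);              // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36, "ascii");
  header.writeUInt32LE(dataLength, 40);     // size of the PCM payload in bytes
  return header;
}
```

A header like this only helps when the player decodes the stream as a WAV file. If chunks are converted straight from Int16 to Float32 and queued, the header bytes would themselves be played as samples.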

jpvajda added 7 commits December 16, 2025 14:15

updates cursor rules

f0f92e2

app-build-1

45cf361

gets websocket working

c43e542

improves UI

98777b1

audio input issues

e753ed4

fixes firefox issues

80169d4

removes some audioConstraints

3715681

jpvajda changed the title ~~Feat/starter refactor~~ feat: starter refactor Dec 18, 2025

jpvajda requested review from lukeocodes and naomi-deepgram December 18, 2025 00:38

jpvajda added 5 commits December 18, 2025 14:10

fixes mic issues

b4dcb26

fixed grainy audio

ec3e0a1

adds inject message

560366f

helper content

bf2f2f5

removes debugging

0ef514e
