
@rovo89 rovo89 commented Jan 15, 2026

While built-in speech support is nice, consistency is also important.

This option gives developers a way to use the MediaRecorder implementation for all supported browsers, so they can use e.g. OpenAI transcriptions with automated language detection everywhere.

I have not updated any docs yet; I would like to get your feedback first.

@vercel
Contributor

vercel bot commented Jan 15, 2026

@rovo89 is attempting to deploy a commit to the Vercel Team on Vercel.

A member of the Team first needs to authorize it.

@haydenbleasel
Member

Nice one @rovo89 - we could possibly(?) make it more flexible by having prefer = "speech-recognition" | "media-recorder" or something along these lines. WDYT?

@haydenbleasel
Member

Or even better, rather than assuming and falling back on preset functionality, let's just accept a mode which defaults to the most supported API, then you can change it easily.

@rovo89
Author

rovo89 commented Jan 16, 2026

Let's see why devs might prefer either combination:

  1. only Web Speech API: I'll take it if it's easy and free. I have no special requirements and am fine with ignoring unsupported browsers.
  2. both (prefer Web Speech API): I want to offer voice input to more users, so I'm willing to pay for those without native support (but I'm glad that a certain share of users doesn't cost me anything). I don't have special requirements and don't care about differences between the two implementations.
  3. both (prefer Media Recorder): I don't think the fallback would ever be taken, since Media Recorder is more widely available.
  4. only Media Recorder: I have special requirements, such as choice of the provider/model (more predictable quality), automatic language detection, additional prompts with context (e.g. it's a medical setting) or even just offering the same experience to every user. I'm willing to pay for everyone.

Web Speech API might also be available offline. However, when I tried out the new component on the train, it always failed due to network interruptions, so offline support is not guaranteed. The "no control" part would still apply in this case; running locally would only make it faster.

In order to turn off Media Recorder, we can just not pass onAudioRecorded.

I think prefer = "media-recorder" sounds like it would try Web Speech API as a fallback - which doesn't make sense to me (case 3). Similarly, my suggestion of preferWebSpeechApi makes less sense if onAudioRecorded isn't provided - in that case, it's not about preferring Web Speech API, it's the only thing to try (case 1).

What about one option with values "auto" | "web-speech" | "media-recorder"? That would make it more explicit. Not sure if that's what you meant, but I think it's important to have an "auto" option, otherwise the caller has to figure it out themselves.
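To make the proposal concrete, here is a rough sketch of how such an option could resolve to an implementation. All names (`SpeechMode`, `resolveSpeechMode`, `Capabilities`) are hypothetical and not taken from the component. It also assumes that "auto" means "prefer Web Speech API, fall back to Media Recorder" (case 2), which is only one possible reading of "auto".

```typescript
// Hypothetical names throughout; this is not the component's actual API.
type SpeechMode = "auto" | "web-speech" | "media-recorder";
type ResolvedMode = "web-speech" | "media-recorder" | "none";

interface Capabilities {
  webSpeech: boolean; // browser exposes the Web Speech API
  mediaRecorder: boolean; // browser exposes MediaRecorder
  hasOnAudioRecorded: boolean; // caller passed an onAudioRecorded callback
}

function resolveSpeechMode(mode: SpeechMode, caps: Capabilities): ResolvedMode {
  // Media Recorder is only usable if the caller handles the recorded audio.
  const canRecord = caps.mediaRecorder && caps.hasOnAudioRecorded;

  if (mode === "web-speech") {
    return caps.webSpeech ? "web-speech" : "none";
  }
  if (mode === "media-recorder") {
    return canRecord ? "media-recorder" : "none";
  }
  // "auto" (assumption): prefer the free built-in API, then fall back.
  if (caps.webSpeech) {
    return "web-speech";
  }
  return canRecord ? "media-recorder" : "none";
}
```

With this shape, case 1 is "auto" or "web-speech" without `onAudioRecorded`, case 2 is "auto" with `onAudioRecorded`, and case 4 is "media-recorder". Case 3 has no equivalent, which matches the point that it doesn't make sense.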


By the way, did you consider offloading the state and actions into a provider, similar to <PromptInputProvider>, so that devs can decide to completely hide the button if not available, disable other controls while listening or even bring their own button (a <PromptInputButton> for example)?
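As a very rough illustration of what I mean, here is a framework-free sketch of the shared state such a provider could expose. Everything here is hypothetical (`createSpeechInputStore`, `SpeechInputState`); a real provider would presumably wrap something like this in React context, and the actual start/stop logic is left out.

```typescript
// Hypothetical sketch; none of these names exist in the library.
interface SpeechInputState {
  available: boolean; // false: callers may hide the button entirely
  listening: boolean; // true: callers may disable other controls
}

type Listener = (state: SpeechInputState) => void;

function createSpeechInputStore(available: boolean) {
  let state: SpeechInputState = { available, listening: false };
  const listeners = new Set<Listener>();

  // Replace the state object and notify every subscriber.
  const setState = (next: SpeechInputState): void => {
    state = next;
    listeners.forEach((listener) => listener(state));
  };

  return {
    getState: (): SpeechInputState => state,
    // Returns an unsubscribe function.
    subscribe(listener: Listener): () => void {
      listeners.add(listener);
      return () => {
        listeners.delete(listener);
      };
    },
    // A custom button (or the default one) would call these.
    start(): void {
      if (state.available && !state.listening) {
        setState({ ...state, listening: true });
      }
    },
    stop(): void {
      if (state.listening) {
        setState({ ...state, listening: false });
      }
    },
  };
}
```

Any control could then read `available` and `listening` from the same place instead of the button owning that state internally.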
