Skip to content

PatelUtkarsh/audio-type

Repository files navigation

AudioType

A native macOS menu bar app for voice-to-text. Hold fn to record, release to transcribe and type.

Features

  • Hold fn key to record voice, release to transcribe and insert text
  • Local macOS transcription — Apple Speech on-device recognition (no API key, no internet, fully private)
  • Cloud providers for higher accuracyGroq and OpenAI Whisper APIs
  • Works in any app — types transcribed text into the focused application
  • Self-serve — bring your own API key (Groq free tier, or OpenAI)
  • Lightweight — runs in menu bar, no dock icon

Privacy & Data

AudioType offers two transcription modes with different privacy trade-offs:

Local macOS Speech API (private by default)

  • Uses Apple's built-in SFSpeechRecognizer — runs entirely on-device when on-device recognition is available (macOS 13+)
  • No audio leaves your machine, no API key needed, no internet required
  • Works out of the box with zero configuration
  • Accuracy is good for everyday dictation, though cloud providers may perform better for specialized vocabulary

Cloud providers (higher accuracy)

  • When using Groq or OpenAI, audio recordings are sent to the provider's servers for transcription
  • An internet connection is required for cloud transcription
  • Your API keys are stored locally in the macOS Keychain
  • No audio is saved to disk locally — it is recorded in memory, sent to the cloud provider, and discarded
  • See Groq's data policy or OpenAI's data policy for how they handle your data

Note: AudioType previously bundled a local Whisper model via whisper.cpp. We replaced it with Apple's native Speech API, which provides better system integration and comparable on-device accuracy without bundling a large model. If you still want the whisper.cpp version, see AudioType v1.1.1.

Requirements

  • macOS 14.0 (Sonoma) or later
  • Apple Silicon or Intel Mac
  • Internet connection (for cloud engines; not needed for Apple Speech)
  • A cloud API key (optional — app works without one using Apple Speech):

Setup

1. Get an API Key (optional)

AudioType works out of the box using Apple's on-device speech recognition. For higher accuracy, configure a cloud provider:

Option A: Groq (free tier)

  1. Go to console.groq.com/keys
  2. Create an account or sign in
  3. Generate a new API key
  4. Copy the key — you'll paste it into AudioType on first launch

Groq's free tier is generous enough for typical dictation use. See Groq's rate limits for current details.

Option B: OpenAI

  1. Go to platform.openai.com/api-keys
  2. Create an account or sign in
  3. Generate a new API key
  4. Copy the key — you'll paste it into AudioType Settings

2. Install AudioType

Download Release

  1. Download the latest .dmg from Releases
  2. Open the DMG and drag AudioType to Applications
  3. First launch — Right-click the app and select "Open" (required for unsigned apps)
  4. Click "Open" in the dialog to confirm

Note: Since this app is not notarized, macOS will block it on first launch. You can also bypass this via Terminal:

xattr -cr /Applications/AudioType.app

Build from Source

# Clone the repository
git clone https://github.com/PatelUtkarsh/audio-type.git
cd audio-type

# Build and create app bundle
make app

# Run the app
open AudioType.app

3. First Launch

On first launch, AudioType will ask you to:

  1. Grant Microphone access — to record your voice
  2. Grant Accessibility access — to type text into other apps
  3. Grant Speech Recognition — for on-device Apple Speech
  4. Enter a Groq API key (optional) — for cloud transcription

You can skip the API key step to use Apple Speech. Additional cloud providers (OpenAI) can be configured later in Settings.

Permissions

Permission Purpose
Microphone Record voice for transcription
Accessibility Detect fn key and type text into apps
Speech Recognition On-device Apple Speech transcription
Internet Send audio to cloud provider (Groq or OpenAI)

Usage

  1. Launch AudioType — appears in menu bar with a waveform icon
  2. Hold fn key — starts recording (overlay shows waveform)
  3. Release fn key — sends audio to the active engine and types the result
  4. Click menu bar icon — access Settings or Quit

Settings

  • Engine Selection:
    • Auto (default) — uses Groq if configured, then OpenAI, then Apple Speech (local)
    • Groq Whisper — always use Groq (requires API key)
    • OpenAI Whisper — always use OpenAI (requires API key)
    • Apple Speech — always use on-device recognition
  • Groq API Key — add or update your Groq key
  • OpenAI API Key — add or update your OpenAI key
  • Model Selection:
    • Groq: Whisper Large V3 Turbo (default, faster) or Whisper Large V3 (most accurate)
    • OpenAI: GPT-4o Mini Transcribe (default, balanced), GPT-4o Transcribe (best), or Whisper V2 (cheapest)
  • Language — auto-detect or choose from 25+ languages

How It Works

fn key held -> Record audio -> Release fn key
                                    |
                                    v
                            Encode audio as WAV
                                    |
                                    v
                            EngineResolver picks engine
                            (Groq / OpenAI / Apple Speech)
                                    |
                                    v
                            Text post-processing
                            (capitalization, corrections)
                                    |
                                    v
                            Simulate keyboard typing
                            into focused app

Tech Stack

  • Swift — native macOS app
  • Groq API — cloud speech-to-text (Whisper Large V3)
  • OpenAI API — cloud speech-to-text (GPT-4o Transcribe / Whisper)
  • Apple Speech — on-device speech-to-text (SFSpeechRecognizer)
  • AVAudioEngine — low-latency audio capture
  • CGEvent — global hotkey detection and keyboard simulation
  • macOS Keychain — secure API key storage

Troubleshooting

App doesn't respond to fn key

  • Check Accessibility permission in System Settings > Privacy & Security > Accessibility
  • Try removing and re-adding AudioType from the list

No audio captured

  • Check Microphone permission in System Settings > Privacy & Security > Microphone
  • Ensure your microphone is working in other apps

Transcription fails

  • Check your internet connection (for cloud engines)
  • Verify your API key is valid in Settings
  • If you see "Rate limited", wait a moment and try again
  • Check Groq status or OpenAI status for service issues

"API key required" error

  • Open Settings from the menu bar icon and enter your API key
  • Get a free Groq key at console.groq.com/keys
  • Or use Apple Speech (no key required) by setting engine to Auto or Apple Speech

Rate Limits

Groq offers a free tier that is generous enough for typical dictation use. For current limits and pricing, see Groq's rate limits and pricing.

OpenAI uses pay-as-you-go pricing. See OpenAI's pricing for current rates.

License

MIT

Acknowledgments

  • Groq for fast cloud inference
  • OpenAI for Whisper and GPT-4o transcription models
  • This project is entirely vibe coded with AI assistance

About

Voice-to-text for macOS powered by Groq / Openai / cloud transcription or use Mac OS native.

Topics

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors