You speak 3x faster than you type. SpeakFlow is a keyboard you talk to.
A macOS menu bar app that turns your voice into text — anywhere.
Press a hotkey, speak naturally, and your words appear in whatever app you're using.
| | Deepgram Nova-3 — Real-time | ChatGPT (GPT-4o) — Batch |
|---|---|---|
| How it works | Audio streams to Deepgram over WebSocket; text appears as you speak | Audio is recorded locally, then sent for transcription when you stop |
| Latency | Words appear in ~300ms | Text appears after you finish speaking |
| Best for | Live dictation, long-form writing, conversations | Short notes, high-accuracy single takes |
| Requires | Deepgram API key | ChatGPT login |
Switch between modes from the Transcription tab in the settings window.
Deepgram offers a free $200 credit — no credit card required.
- Sign up at deepgram.com/pricing
- Create an API key in the Deepgram console
- Paste it into SpeakFlow via the Accounts tab in settings
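If you want to confirm a key works before pasting it in, one authenticated request against Deepgram's REST API is enough. The sketch below is not part of SpeakFlow; it simply calls the standard `/v1/projects` endpoint, which answers 200 for a valid key and 401 otherwise:

```swift
import Foundation

// Quick sanity check for a Deepgram API key (not part of SpeakFlow).
// GET /v1/projects lists the projects the key can access.
let apiKey = ProcessInfo.processInfo.environment["DEEPGRAM_API_KEY"] ?? ""
var request = URLRequest(url: URL(string: "https://api.deepgram.com/v1/projects")!)
request.setValue("Token \(apiKey)", forHTTPHeaderField: "Authorization")

let (data, response) = try await URLSession.shared.data(for: request)
let status = (response as? HTTPURLResponse)?.statusCode ?? -1
print("HTTP \(status)")                          // 200 = valid key, 401 = bad key
print(String(data: data, encoding: .utf8) ?? "")
```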
- Download `SpeakFlow.dmg` from the Releases page
- Open the DMG and drag SpeakFlow to Applications
- Launch SpeakFlow — the settings window opens automatically
- Grant Accessibility and Microphone permissions from the General tab
- Log in to ChatGPT or add a Deepgram API key from the Accounts tab
```bash
git clone https://github.com/rezkam/SpeakFlow.git
cd SpeakFlow
swift build -c release --product SpeakFlow

# Or build the full .app bundle + DMG:
bash scripts/build-release.sh 0.1.0
```

Requires macOS 15+ and Swift 6.1+ (Xcode 16.4 or later).
SpeakFlow needs two permissions, both granted from the General tab in settings:
- Accessibility — required to type transcribed text into any app. Clicking "Grant Access" opens System Settings where you toggle SpeakFlow on.
- Microphone — required to hear your voice. Clicking "Grant Access" shows the macOS permission dialog.
Permissions are never requested automatically on launch — you choose when to grant them.
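Both prompts map to standard macOS APIs. Here is a minimal sketch of the two checks (this is what a "Grant Access" button ultimately triggers in any dictation app of this kind; SpeakFlow's exact calls may differ):

```swift
import ApplicationServices
import AVFoundation

// Accessibility: required before the app may post synthetic keystrokes.
// The prompt option asks macOS to show the grant dialog if not yet trusted.
let options = [kAXTrustedCheckOptionPrompt.takeUnretainedValue() as String: true]
let axTrusted = AXIsProcessTrustedWithOptions(options as CFDictionary)
print("Accessibility granted:", axTrusted)

// Microphone: the first call triggers the standard macOS permission dialog.
AVCaptureDevice.requestAccess(for: .audio) { granted in
    print("Microphone granted:", granted)
}
```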
SpeakFlow is self-signed (ad-hoc), so macOS ties the accessibility trust to the exact binary. If you rebuild the app or install a new version and accessibility stops working:
- Open System Settings → Privacy & Security → Accessibility
- Remove the old SpeakFlow entry (select it and click the minus button)
- Re-add `SpeakFlow.app` from `/Applications` (click the plus button)
- Restart SpeakFlow
This is required because macOS revokes accessibility trust when the binary changes for apps without an Apple Developer ID signature.
- Real-time streaming transcription — words appear as you speak with Deepgram Nova-3; interim results refine in place using a smart diff, so only the changed characters are retyped and nothing flickers (see the sketch after this list)
- Batch transcription — record first, transcribe after with GPT-4o via ChatGPT
- On-device voice activity detection — a neural network model runs locally on Apple Silicon to distinguish speech from silence in real-time; no audio leaves your machine until speech is confirmed
- Automatic turn detection — when you stop speaking, silence is detected and the session ends automatically (works in both modes — local VAD for batch, server-side for streaming)
- Smart chunking — in batch mode, audio is split at natural sentence boundaries detected by silence analysis
- Noise filtering — silent and noise-only chunks are filtered before transcription, saving API calls and improving accuracy
- Universal text insertion — transcribed text is typed into the focused app via macOS Accessibility
- Launch at login — runs quietly in the menu bar
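Two of those features, smart diff and universal text insertion, are easiest to understand together. The sketch below is an illustration rather than SpeakFlow's actual code: it finds the common prefix between the previous and updated interim transcript, erases the stale suffix with synthetic Delete presses, and types the new suffix through `CGEvent`'s Unicode payload (the mechanism the Accessibility permission unlocks).

```swift
import CoreGraphics
import Foundation

/// Retype only the changed suffix of an interim transcript.
/// Sketch only; it counts grapheme clusters and posts one event per scalar,
/// which a production implementation would likely batch.
func retype(previous: String, updated: String) {
    let prefix = previous.commonPrefix(with: updated)
    let deletions = previous.count - prefix.count
    let additions = String(updated.dropFirst(prefix.count))
    let source = CGEventSource(stateID: .combinedSessionState)

    // Erase the stale suffix with Delete key presses (virtual key 51).
    for _ in 0..<deletions {
        CGEvent(keyboardEventSource: source, virtualKey: 51, keyDown: true)?.post(tap: .cghidEventTap)
        CGEvent(keyboardEventSource: source, virtualKey: 51, keyDown: false)?.post(tap: .cghidEventTap)
    }

    // Type the new suffix as Unicode text attached to key events.
    for scalar in additions.unicodeScalars {
        var utf16 = Array(String(scalar).utf16)
        let down = CGEvent(keyboardEventSource: source, virtualKey: 0, keyDown: true)
        down?.keyboardSetUnicodeString(stringLength: utf16.count, unicodeString: &utf16)
        down?.post(tap: .cghidEventTap)
        CGEvent(keyboardEventSource: source, virtualKey: 0, keyDown: false)?.post(tap: .cghidEventTap)
    }
}

// "hello wold" -> "hello world": 2 deletions ("ld"), then "rld" is typed.
retype(previous: "hello wold", updated: "hello world")
```

In streaming mode, the pipeline that feeds this insertion step looks like this: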
- Capture — 16 kHz, mono, 16-bit PCM from the system microphone
- Stream — raw audio is sent over WebSocket to Deepgram Nova-3 (English, monolingual); a sketch of this hop follows the list
- Interim results — partial transcriptions appear immediately as you speak
- Smart diff — when text updates, only the changed suffix is retyped (common prefix is preserved)
- Final results — server finalizes each utterance with punctuation and formatting
- Auto-end — after 5+ seconds of server-detected silence following speech, the session ends
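The capture-to-server hop in that pipeline can be sketched with `URLSessionWebSocketTask`. The URL, query parameters, and `Token` auth header below follow Deepgram's public streaming API; buffering and error handling are pared down, and SpeakFlow's real networking layer may differ:

```swift
import Foundation

// Stream 16 kHz / mono / 16-bit PCM to Deepgram Nova-3 (sketch).
let key = ProcessInfo.processInfo.environment["DEEPGRAM_API_KEY"] ?? ""
var request = URLRequest(url: URL(string:
    "wss://api.deepgram.com/v1/listen?model=nova-3&encoding=linear16&sample_rate=16000&interim_results=true")!)
request.setValue("Token \(key)", forHTTPHeaderField: "Authorization")

let socket = URLSession.shared.webSocketTask(with: request)
socket.resume()

// Send raw PCM frames as binary messages as the microphone produces them.
func send(_ pcm: Data) {
    socket.send(.data(pcm)) { error in
        if let error { print("send failed:", error) }
    }
}

// Receive JSON results (interim and final) as text messages.
func listen() {
    socket.receive { result in
        if case .success(.string(let json)) = result {
            print(json)   // transcript lives at channel.alternatives[0].transcript
            listen()      // re-arm the one-shot receive
        }
    }
}
listen()
RunLoop.main.run()        // keep the script alive; a real app has its own run loop
```

Batch mode takes a different path: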
- Capture — 16 kHz, mono, 16-bit PCM from the system microphone
- Voice activity detection — a lightweight neural model runs on Apple Neural Engine, classifying 32ms frames as speech or silence
- Speech segmentation — hysteresis thresholds with a 3-second silence debounce avoid false ends during natural pauses (sketched after this list)
- Chunking — audio is buffered and sent at the configured interval, waiting for natural pauses
- Auto-end — after 5+ seconds of confirmed silence following speech, the session ends
- Transcription — speech chunks are sent to the transcription backend (GPT-4o via ChatGPT); results arrive in order and are typed into the active app
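The segmentation rules above (hysteresis, the 3-second debounce, the 5-second auto-end) fit in a small state machine. In the sketch below the enter/exit probability thresholds are illustrative assumptions; only the frame size, debounce, and auto-end timings come from this README:

```swift
/// Hysteresis + debounce segmentation over per-frame VAD probabilities (sketch).
struct SpeechSegmenter {
    let enterThreshold: Float = 0.6    // assumed: confidence needed to open a segment
    let exitThreshold: Float = 0.4     // assumed: lower bound of the hysteresis band
    let frameDuration = 0.032          // one 32 ms VAD frame per call
    let debounce = 3.0                 // pause tolerated inside a segment
    let autoEnd = 5.0                  // silence after speech that ends the session

    private(set) var inSpeech = false
    private var everSpoke = false
    private var silence = 0.0          // consecutive seconds classified as silence

    enum Event { case none, chunkEnded, sessionEnded }

    mutating func process(speechProbability p: Float) -> Event {
        if p >= enterThreshold {                       // confident speech: (re)open segment
            inSpeech = true
            everSpoke = true
            silence = 0
            return .none
        }
        guard p < exitThreshold else { return .none }  // ambiguous band: hold current state
        silence += frameDuration
        if inSpeech && silence >= debounce {
            inSpeech = false                           // natural pause exceeded: flush chunk
            return .chunkEnded
        }
        if !inSpeech && everSpoke && silence >= autoEnd {
            return .sessionEnded                       // 5+ s of confirmed silence after speech
        }
        return .none
    }
}
```

The hysteresis band is the point: a frame whose probability falls between the two thresholds never flips the state, so brief dips in confidence mid-sentence cannot split a chunk.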
Apache License, Version 2.0
