SpeakFlow

You speak 3x faster than you type. SpeakFlow is a keyboard you talk to.

A macOS menu bar app that turns your voice into text — anywhere.
Press a hotkey, speak naturally, and your words appear in whatever app you're using.

Two Transcription Modes

| | Deepgram Nova-3: real-time | ChatGPT (GPT-4o): batch |
|---|---|---|
| How it works | Audio streams to Deepgram over WebSocket; text appears as you speak | Audio is recorded locally, then sent for transcription when you stop |
| Latency | Words appear in ~300 ms | Text appears after you finish speaking |
| Best for | Live dictation, long-form writing, conversations | Short notes, high-accuracy single takes |
| Requires | Deepgram API key | ChatGPT login |

Switch between modes from the Transcription tab in the settings window.

Deepgram API Key

Deepgram offers a free $200 credit — no credit card required.

1. Sign up at deepgram.com/pricing
2. Create an API key in the Deepgram console
3. Paste it into SpeakFlow via the Accounts tab in settings
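
For anyone building from source, the following is a minimal sketch of how a client would typically present that key when opening a streaming session. It assumes Deepgram's documented `wss://api.deepgram.com/v1/listen` endpoint and `Token` authorization scheme. The helper name `deepgramStreamingRequest` is hypothetical, and the query options simply mirror the audio format described in the Audio Pipeline section; the exact options SpeakFlow sends are not stated in this README.

```swift
import Foundation

// Hypothetical helper (not taken from the SpeakFlow sources): builds the
// WebSocket URL and headers for a Deepgram live-transcription session.
func deepgramStreamingRequest(apiKey: String) -> (url: URL, headers: [String: String])? {
    var components = URLComponents()
    components.scheme = "wss"
    components.host = "api.deepgram.com"
    components.path = "/v1/listen"
    // Assumed option set, chosen to match the capture format in this README.
    components.queryItems = [
        URLQueryItem(name: "model", value: "nova-3"),
        URLQueryItem(name: "language", value: "en"),
        URLQueryItem(name: "encoding", value: "linear16"),     // 16-bit PCM
        URLQueryItem(name: "sample_rate", value: "16000"),     // 16 kHz
        URLQueryItem(name: "channels", value: "1"),            // mono
        URLQueryItem(name: "interim_results", value: "true"),  // partial transcripts
    ]
    guard let url = components.url else { return nil }
    // Deepgram API keys are sent as "Token <key>" in the Authorization header.
    return (url, ["Authorization": "Token \(apiKey)"])
}

if let request = deepgramStreamingRequest(apiKey: "YOUR_DEEPGRAM_API_KEY") {
    print(request.url.absoluteString)
}
```

The key itself never appears in the URL, only in the header, which keeps it out of most logs.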

Installation

From DMG (recommended)

1. Download SpeakFlow.dmg from the Releases page
2. Open the DMG and drag SpeakFlow to Applications
3. Launch SpeakFlow; the settings window opens automatically
4. Grant Accessibility and Microphone permissions from the General tab
5. Log in to ChatGPT or add a Deepgram API key from the Accounts tab

Build from Source

git clone https://github.com/rezkam/SpeakFlow.git
cd SpeakFlow
swift build -c release --product SpeakFlow

# Or build the full .app bundle + DMG:
bash scripts/build-release.sh 0.1.0

Requires macOS 15+ and Swift 6.1+ (Xcode 16.4 or later).

Permissions

SpeakFlow needs two permissions, both granted from the General tab in settings:

Accessibility — required to type transcribed text into any app. Clicking "Grant Access" opens System Settings where you toggle SpeakFlow on.
Microphone — required to hear your voice. Clicking "Grant Access" shows the macOS permission dialog.

Permissions are never requested automatically on launch — you choose when to grant them.

Troubleshooting Accessibility

SpeakFlow is ad-hoc signed, so macOS ties the Accessibility trust to the exact binary. If you rebuild the app or install a new version and Accessibility stops working:

1. Open System Settings → Privacy & Security → Accessibility
2. Remove the old SpeakFlow entry (select it and click the minus button)
3. Re-add SpeakFlow.app from /Applications (click the plus button)
4. Restart SpeakFlow

This is required because macOS revokes accessibility trust when the binary changes for apps without an Apple Developer ID signature.

Features

Real-time streaming transcription — words appear as you speak with Deepgram Nova-3; interim results refine in-place using smart diff (only changed characters are retyped, no flickering)
Batch transcription — record first, transcribe after with GPT-4o via ChatGPT
On-device voice activity detection — a neural network model runs locally on Apple Silicon to distinguish speech from silence in real-time; no audio leaves your machine until speech is confirmed
Automatic turn detection — when you stop speaking, silence is detected and the session ends automatically (works in both modes — local VAD for batch, server-side for streaming)
Smart chunking — in batch mode, audio is split at natural sentence boundaries detected by silence analysis
Noise filtering — silent and noise-only chunks are filtered before transcription, saving API calls and improving accuracy
Universal text insertion — transcribed text is typed into the focused app via macOS Accessibility
Launch at login — runs quietly in the menu bar
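
The noise-filtering idea in the list above can be illustrated with a simple energy gate. This README does not say how SpeakFlow decides that a chunk is noise-only, so the sketch below swaps in a plain RMS threshold as a stand-in. The function names and the threshold value are assumptions chosen for illustration.

```swift
import Foundation

// Root-mean-square level of 16-bit PCM samples, normalized to 0.0...1.0.
func rmsLevel(of samples: [Int16]) -> Double {
    guard !samples.isEmpty else { return 0 }
    let sumOfSquares = samples.reduce(0.0) { partial, sample in
        let normalized = Double(sample) / 32_768.0
        return partial + normalized * normalized
    }
    return (sumOfSquares / Double(samples.count)).squareRoot()
}

// Hypothetical gate: 0.01 is roughly -40 dBFS, an illustrative value only.
func shouldTranscribe(_ chunk: [Int16], threshold: Double = 0.01) -> Bool {
    return rmsLevel(of: chunk) >= threshold
}

// One 32 ms frame at 16 kHz is 512 samples.
let silence = [Int16](repeating: 0, count: 512)
let tone: [Int16] = (0..<512).map { index in
    let phase = 2.0 * Double.pi * 440.0 * Double(index) / 16_000.0
    return Int16(8_000.0 * sin(phase))
}

print(shouldTranscribe(silence)) // false
print(shouldTranscribe(tone))    // true
```

An energy gate like this cannot tell speech from loud non-speech noise, which is one reason a model-based detector is the stronger choice in practice.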

Audio Pipeline

Streaming (Deepgram)

Capture — 16 kHz, mono, 16-bit PCM from the system microphone
Stream — raw audio is sent over WebSocket to Deepgram Nova-3 (English, monolingual)
Interim results — partial transcriptions appear immediately as you speak
Smart diff — when text updates, only the changed suffix is retyped (common prefix is preserved)
Final results — server finalizes each utterance with punctuation and formatting
Auto-end — after 5+ seconds of server-detected silence following speech, the session ends
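
The smart diff step above can be sketched as a common-prefix comparison. This is a minimal illustration, not SpeakFlow's actual code: the function name `suffixEdit` and its return shape are hypothetical. It computes how many already-typed characters to delete and what to type in their place.

```swift
// Returns how many characters to delete from the end of the text already
// typed, and what to type afterwards, so only the changed suffix is redone.
func suffixEdit(from typed: String, to updated: String) -> (deleteCount: Int, insert: String) {
    let oldChars = Array(typed)
    let newChars = Array(updated)

    // Length of the shared prefix, compared character by character.
    var shared = 0
    while shared < oldChars.count,
          shared < newChars.count,
          oldChars[shared] == newChars[shared] {
        shared += 1
    }

    return (oldChars.count - shared, String(newChars[shared...]))
}

// An interim result is revised: "right" becomes "write" and more text arrives.
let edit = suffixEdit(from: "I want to right", to: "I want to write a note")
print(edit.deleteCount, edit.insert) // 5 write a note
```

Because the shared prefix "I want to " is left untouched, the focused app only sees five deletions followed by the new suffix, which is what avoids visible flicker.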

Batch (ChatGPT)

Capture — 16 kHz, mono, 16-bit PCM from the system microphone
Voice activity detection — a lightweight neural model runs on Apple Neural Engine, classifying 32ms frames as speech or silence
Speech segmentation — hysteresis thresholds with 3-second silence debounce avoid false ends during natural pauses
Chunking — audio is buffered and sent at the configured interval, waiting for natural pauses
Auto-end — after 5+ seconds of confirmed silence following speech, the session ends
Transcription — speech chunks are sent to the Whisper API; results arrive in order and are typed into the active app
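
The segmentation and auto-end steps above can be sketched as a small state machine driven by per-frame speech probabilities. The 32 ms frame length, 3-second debounce, and 5-second auto-end come from the list above. The type name `SpeechSegmenter`, the event names, and both threshold values are assumptions for illustration, and the VAD model itself is replaced by hand-written probabilities.

```swift
// Hypothetical segmenter, not taken from the SpeakFlow sources.
struct SpeechSegmenter {
    enum Event: Equatable { case speechStarted, speechEnded, sessionEnded }

    let frameDuration = 0.032   // 32 ms per VAD frame
    let startThreshold = 0.6    // assumed: probability needed to enter speech
    let endThreshold = 0.35     // assumed: below this a frame counts as silence
    let silenceDebounce = 3.0   // seconds of silence before a segment ends
    let autoEndSilence = 5.0    // seconds of silence before the session ends

    private(set) var isSpeaking = false
    private var heardSpeech = false
    private var silentFrames = 0

    private var silentSeconds: Double { Double(silentFrames) * frameDuration }

    // Feed one speech probability (0...1) per frame; returns an event on state changes.
    mutating func process(probability: Double) -> Event? {
        if isSpeaking {
            // Hysteresis: once speaking, only a drop below the lower threshold counts as silence.
            if probability >= endThreshold { silentFrames = 0; return nil }
            silentFrames += 1
            if silentSeconds >= silenceDebounce { isSpeaking = false; return .speechEnded }
            return nil
        }
        if probability >= startThreshold {
            isSpeaking = true
            heardSpeech = true
            silentFrames = 0
            return .speechStarted
        }
        // Auto-end only applies after some speech has been heard.
        guard heardSpeech else { return nil }
        silentFrames += 1
        if silentSeconds >= autoEndSilence { heardSpeech = false; return .sessionEnded }
        return nil
    }
}

var probabilities: [Double] = []
probabilities += Array(repeating: 0.9, count: 31)   // about 1 s of speech
probabilities += Array(repeating: 0.1, count: 31)   // about 1 s pause, shorter than the debounce
probabilities += Array(repeating: 0.9, count: 31)   // about 1 s of speech
probabilities += Array(repeating: 0.1, count: 188)  // about 6 s of silence

var segmenter = SpeechSegmenter()
var events: [SpeechSegmenter.Event] = []
for probability in probabilities {
    if let event = segmenter.process(probability: probability) {
        events.append(event)
    }
}
print(events)
```

In this run the one-second pause produces no event, the segment ends after three seconds of continuous silence, and the session ends once that silence reaches five seconds.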

License

Apache License, Version 2.0
