
devmobasa/SpeechToText


SpeechToText

Low-latency push-to-talk speech-to-text for Linux, built on faster-whisper with GPU acceleration and a CPU fallback. Works on Wayland and X11; hotkeys are bound through your window manager or a keybinding tool.

Why this exists

This repo is a cleaned, production-ready version of a working F7/F9 push-to-talk workflow. It focuses on:

  • Fast local transcription (GPU when available, CPU fallback when not)
  • Configurable capture + output (type, clipboard, file, stdout)
  • Wayland + X11 support
  • Distro-agnostic setup with clear dependencies

Quick Start

git clone <your-repo-url>
cd SpeechToText
./scripts/install.sh

# Bind hotkeys (example: Hyprland)
# bind = , F7, exec, stt-ptt press en
# bindr = , F7, exec, stt-ptt release en

If you're using the whisper.cpp backend and want to skip Python setup:

STT_SKIP_PYTHON=1 ./scripts/install.sh

Usage

Push-to-talk commands (run press on key down and release on key up):

stt-ptt press en
stt-ptt release en

Check status / cleanup:

stt-ptt status
stt-ptt cleanup
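The press/release/status/cleanup cycle can be sketched in bash as below. This is a minimal illustration, not the actual stt-ptt script: it reuses the /tmp/speech-to-text directory from the Privacy section, while PID_FILE, the ptt_* function names, and passing the recorder command as arguments are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical push-to-talk state handling (not the project's stt-ptt).
STATE_DIR=${STATE_DIR:-/tmp/speech-to-text}
PID_FILE=${PID_FILE:-$STATE_DIR/recorder.pid}   # hypothetical file name

ptt_status() {
  # "recording" only if the PID file exists and that process is alive.
  local pid
  if pid=$(cat "$PID_FILE" 2>/dev/null) && kill -0 "$pid" 2>/dev/null; then
    echo "recording (pid $pid)"
    return 0
  fi
  echo "idle"
  return 1
}

ptt_press() {
  # Key down: start the recorder ("$@") in the background, remember its PID.
  mkdir -p "$STATE_DIR"
  ptt_status >/dev/null && return 0   # already recording (e.g. key auto-repeat)
  "$@" >/dev/null 2>&1 &
  echo "$!" > "$PID_FILE"
}

ptt_release() {
  # Key up: stop the recorder. SIGTERM rather than SIGINT, because background
  # jobs started by a non-interactive shell ignore SIGINT.
  local pid i
  pid=$(cat "$PID_FILE" 2>/dev/null) || return 1
  kill -TERM "$pid" 2>/dev/null || true
  # Poll for up to ~2 s so the recorder can finish writing its file.
  for ((i = 0; i < 40; i++)); do
    kill -0 "$pid" 2>/dev/null || break
    sleep 0.05
  done
  if kill -0 "$pid" 2>/dev/null; then
    kill -KILL "$pid"   # last resort
  fi
  rm -f "$PID_FILE"
}

ptt_cleanup() {
  # Stop any leftover recorder and clear stale state.
  ptt_release 2>/dev/null || true
  rm -f "$PID_FILE"
}
```

Because press and release run as separate processes, release polls with kill -0 instead of using wait, which only works for children of the current shell.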

Configuration

Copy the sample config:

mkdir -p ~/.config/speech-to-text
cp config/config.env.example ~/.config/speech-to-text/config.env

Key settings:

  • VOICE_MODEL: whisper model (e.g. large-v3-turbo)
  • VOICE_DEVICE: cuda, cpu, or auto
  • VOICE_SOURCE: Pulse/PipeWire source name
  • VOICE_OUTPUT: type, clipboard, file, or stdout
  • VOICE_ENGINE: faster-whisper (default) or whispercpp
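A minimal sketch of how a script might load config.env and validate these settings, assuming the file contains plain KEY=value shell assignments like the sample. Only VOICE_ENGINE=faster-whisper is documented as a default; the other defaults and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical config loader; only the faster-whisper default is documented.
load_stt_config() {
  local config_file=${1:-$HOME/.config/speech-to-text/config.env}
  if [ -r "$config_file" ]; then
    set -a                      # export every variable the file assigns
    # shellcheck source=/dev/null
    . "$config_file"
    set +a
  fi
  : "${VOICE_ENGINE:=faster-whisper}"   # documented default
  : "${VOICE_DEVICE:=auto}"             # hypothetical default
  : "${VOICE_OUTPUT:=type}"             # hypothetical default
  : "${VOICE_MODEL:=large-v3-turbo}"    # hypothetical default
}

# Reject values outside the sets listed above.
validate_stt_config() {
  case "$VOICE_DEVICE" in
    cuda|cpu|auto) ;;
    *) echo "bad VOICE_DEVICE: $VOICE_DEVICE" >&2; return 1 ;;
  esac
  case "$VOICE_OUTPUT" in
    type|clipboard|file|stdout) ;;
    *) echo "bad VOICE_OUTPUT: $VOICE_OUTPUT" >&2; return 1 ;;
  esac
  case "$VOICE_ENGINE" in
    faster-whisper|whispercpp) ;;
    *) echo "bad VOICE_ENGINE: $VOICE_ENGINE" >&2; return 1 ;;
  esac
}
```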

Find audio sources:

pactl list sources short
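pactl list sources short prints one tab-separated line per source (index, name, driver, sample spec, state); the name column is the value for VOICE_SOURCE. Sources ending in .monitor capture playback rather than a microphone. A small hypothetical helper that keeps only input sources:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `pactl list sources short` on stdin and print
# the source names (column 2), skipping ".monitor" playback sources.
list_input_sources() {
  awk -F'\t' '$2 !~ /\.monitor$/ { print $2 }'
}

# Usage: pactl list sources short | list_input_sources
```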

Dependencies

Recording backends (any one works):

  • pw-record (PipeWire)
  • ffmpeg (Pulse/PipeWire)
  • parec (PulseAudio / PipeWire-pulse)
  • arecord (ALSA)
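Since any one recording backend works, a script can take the first one found on PATH. The sketch below assumes that search order, hypothetical function names, and 16 kHz mono WAV output (the sample rate Whisper models use); the project's own choices may differ.

```bash
#!/usr/bin/env bash
# Hypothetical backend selection; order and names are assumptions.
pick_recorder() {
  local tool
  for tool in pw-record ffmpeg parec arecord; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      return 0
    fi
  done
  echo "no recording backend found" >&2
  return 1
}

# Fill the global array REC_CMD; run it later as "${REC_CMD[@]}".
build_recorder_cmd() {
  local tool=$1 source=$2 out=$3
  case "$tool" in
    pw-record) REC_CMD=(pw-record --target "$source" --rate 16000 --channels 1 "$out") ;;
    ffmpeg)    REC_CMD=(ffmpeg -nostdin -loglevel error -y -f pulse -i "$source" -ac 1 -ar 16000 "$out") ;;
    parec)     REC_CMD=(parec -d "$source" --rate=16000 --channels=1 --file-format=wav "$out") ;;
    # arecord expects an ALSA device name (e.g. "default"), not a Pulse source.
    arecord)   REC_CMD=(arecord -q -D "$source" -f S16_LE -r 16000 -c 1 "$out") ;;
    *) return 1 ;;
  esac
}
```

Storing the command in an array rather than a string keeps source names and paths with spaces intact.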

Output tools (install the ones that match your session and VOICE_OUTPUT mode):

  • Wayland: wtype or ydotool
  • X11: xdotool
  • Clipboard: wl-clipboard, xclip, or xsel
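One way to route a transcript to these tools, sketched under assumptions: Wayland is detected through WAYLAND_DISPLAY, the default mode of type is hypothetical, and file mode appends to a hypothetical VOICE_OUTPUT_FILE variable.

```bash
#!/usr/bin/env bash
# Hypothetical output dispatcher for the VOICE_OUTPUT modes.
have() { command -v "$1" >/dev/null 2>&1; }

deliver_text() {
  local text=$1 mode=${2:-${VOICE_OUTPUT:-type}}
  case "$mode" in
    stdout) printf '%s\n' "$text" ;;
    file)
      [ -n "${VOICE_OUTPUT_FILE:-}" ] || { echo "VOICE_OUTPUT_FILE is not set" >&2; return 1; }
      printf '%s\n' "$text" >> "$VOICE_OUTPUT_FILE"
      ;;
    clipboard)
      if [ -n "${WAYLAND_DISPLAY:-}" ] && have wl-copy; then printf '%s' "$text" | wl-copy
      elif have xclip; then printf '%s' "$text" | xclip -selection clipboard
      elif have xsel; then printf '%s' "$text" | xsel --clipboard --input
      else echo "no clipboard tool found" >&2; return 1
      fi
      ;;
    type)
      if [ -n "${WAYLAND_DISPLAY:-}" ]; then
        if have wtype; then wtype "$text"
        elif have ydotool; then ydotool type "$text"   # ydotool also needs ydotoold running
        else echo "install wtype or ydotool" >&2; return 1
        fi
      elif have xdotool; then xdotool type --clearmodifiers "$text"
      else echo "install xdotool" >&2; return 1
      fi
      ;;
    *) echo "unknown output mode: $mode" >&2; return 1 ;;
  esac
}
```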

Python:

  • Python 3.10+
  • faster-whisper (installed via requirements.txt)
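A quick pre-flight check for these two requirements might look like this; check_python is a hypothetical name, and you can pass the path of a virtualenv interpreter instead of python3.

```bash
#!/usr/bin/env bash
# Hypothetical check: returns 1 if Python < 3.10 (or missing),
# 2 if faster_whisper cannot be imported, 0 if both are fine.
check_python() {
  local py=${1:-python3}
  if ! "$py" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 10) else 1)' 2>/dev/null; then
    echo "need Python 3.10+ ($py)" >&2
    return 1
  fi
  if ! "$py" -c 'import faster_whisper' 2>/dev/null; then
    echo "faster_whisper is not importable with $py" >&2
    return 2
  fi
}
```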

Examples

  • Hyprland: examples/hyprland.conf
  • Sway: examples/sway.conf
  • GNOME: examples/gnome.md
  • KDE: examples/kde.md
  • keyd: examples/keyd.conf

whisper.cpp backend (optional)

If you want a Python-free backend, you can use whisper.cpp instead of faster-whisper.

  1. Build or install whisper.cpp and get a ggml model file (whisper.cpp models are usually named ggml-<model>.bin).
  2. Set these in ~/.config/speech-to-text/config.env:
VOICE_ENGINE=whispercpp
VOICE_WHISPER_CPP_BIN=whisper-cli
VOICE_WHISPER_CPP_MODEL=/path/to/ggml-model.bin
# Optional:
# VOICE_WHISPER_CPP_ARGS="-nt"

Notes:

  • By default the script asks whisper.cpp to write the transcript to a text file with -otxt/-of, if the binary supports those flags.
  • If your build doesn't support them, set VOICE_WHISPER_CPP_OUTPUT_MODE=stdout and pass the appropriate output flags through VOICE_WHISPER_CPP_ARGS.
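The two output modes can be sketched as follows. -m, -f, -l, -otxt and -of are standard whisper-cli options (-of takes a base name and -otxt writes BASE.txt); the function name, the en default language, and deriving the base name from the WAV path are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical whisper.cpp wrapper driven by the VOICE_WHISPER_CPP_* settings.
transcribe_whispercpp() {
  local wav=$1 lang=${2:-en}
  local bin=${VOICE_WHISPER_CPP_BIN:-whisper-cli}
  if [ -z "${VOICE_WHISPER_CPP_MODEL:-}" ]; then
    echo "VOICE_WHISPER_CPP_MODEL is not set" >&2
    return 1
  fi
  # Word-split the extra args (quoted arguments are not supported).
  local -a extra=()
  read -r -a extra <<< "${VOICE_WHISPER_CPP_ARGS:-}"
  local -a cmd=("$bin" -m "$VOICE_WHISPER_CPP_MODEL" -f "$wav" -l "$lang" "${extra[@]}")
  if [ "${VOICE_WHISPER_CPP_OUTPUT_MODE:-}" = stdout ]; then
    # Transcript comes from stdout; add e.g. -nt via VOICE_WHISPER_CPP_ARGS
    # so lines are not prefixed with timestamps.
    "${cmd[@]}" 2>/dev/null
  else
    # Default mode: whisper-cli writes BASE.txt next to the recording.
    local base=${wav%.wav}
    "${cmd[@]}" -otxt -of "$base" >/dev/null 2>&1 || return 1
    cat "$base.txt"
    rm -f "$base.txt"
  fi
}
```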

Troubleshooting

Run diagnostics:

./scripts/diagnose.sh

Common issues:

  • No audio captured → check that VOICE_SOURCE matches a name listed by pactl list sources short
  • No typing → install wtype (Wayland) or xdotool (X11)
  • CUDA errors → set VOICE_DEVICE=cpu, or install the CUDA libraries faster-whisper needs (cuBLAS and cuDNN)
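To see whether the GPU is visible at all, you can ask CTranslate2, the inference library faster-whisper runs on. The sketch below resolves VOICE_DEVICE=auto that way; the function names and the resolution rule are hypothetical, not necessarily what the project does.

```bash
#!/usr/bin/env bash
# Hypothetical resolution of VOICE_DEVICE=auto via CTranslate2.
cuda_device_count() {
  python3 -c 'import ctranslate2; print(ctranslate2.get_cuda_device_count())' 2>/dev/null || echo 0
}

resolve_device() {
  local requested=${1:-auto} count
  case "$requested" in
    cpu|cuda) echo "$requested" ;;
    auto)
      count=$(cuda_device_count)
      case "$count" in ''|*[!0-9]*) count=0 ;; esac   # non-numeric output means no GPU
      if [ "$count" -gt 0 ]; then echo cuda; else echo cpu; fi
      ;;
    *) echo "unknown VOICE_DEVICE: $requested" >&2; return 1 ;;
  esac
}

# Usage: resolve_device "${VOICE_DEVICE:-auto}"
```

A non-zero device count does not prove cuDNN is installed, so forcing VOICE_DEVICE=cpu remains the quickest way to rule out a broken CUDA setup.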

Privacy

Audio is recorded to a temporary file in /tmp/speech-to-text and deleted after transcription. Set VOICE_KEEP_AUDIO=1 to keep the last recording for debugging.
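The delete-unless-VOICE_KEEP_AUDIO behavior can be sketched like this, assuming GNU mktemp. with_temp_audio, STT_TMP_DIR, and keeping the last recording as last.wav are hypothetical names for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run "$@" with a fresh temp WAV path appended, then
# delete the file unless VOICE_KEEP_AUDIO=1.
with_temp_audio() {
  local dir=${STT_TMP_DIR:-/tmp/speech-to-text} wav status=0
  mkdir -p -m 700 "$dir" || return 1
  wav=$(mktemp --suffix=.wav "$dir/rec.XXXXXX") || return 1   # GNU mktemp
  "$@" "$wav" || status=$?          # e.g. record, then transcribe "$wav"
  if [ "${VOICE_KEEP_AUDIO:-0}" = 1 ]; then
    mv -f "$wav" "$dir/last.wav"    # keep only the most recent recording
  else
    rm -f "$wav"
  fi
  return "$status"
}
```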

License

MIT
