Best Open Source Python Speech Software

Browse free open source Python Speech Software and projects below.

1

DeepSpeech

Open source embedded speech-to-text engine

DeepSpeech is an open source, embedded (offline, on-device) speech-to-text engine that can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers. It uses a model trained with machine learning techniques based on Baidu's Deep Speech research paper, and it is implemented with Google's TensorFlow. A pre-trained English model, along with the other material needed for inference, can be downloaded from the DeepSpeech releases page by following the instructions in the usage docs.

Downloads: 34 This Week

Last Update: 2021-04-08
See Project
2

SpeechRecognition

Speech recognition module for Python

A library for performing speech recognition, with support for several engines and APIs, online and offline. It can recognize speech input from a microphone, transcribe an audio file, save audio data to an audio file, show extended recognition results, calibrate the recognizer energy threshold for ambient noise levels (see recognizer_instance.energy_threshold for details), and listen to a microphone in the background, among other recognizer features. The easiest way to install it is with pip install SpeechRecognition. The library requires Python 2.6, 2.7, or 3.3+. PyAudio is required if and only if you want to use microphone input (Microphone); version 0.2.11+ is needed, as earlier versions have known memory management bugs when recording from microphones in certain situations. To hack on the library, first make sure you have everything listed in the "Requirements" section.

Downloads: 10 This Week

Last Update: 2025-03-23
See Project
3

ASR-Builder

ASR-Builder provides an easy-to-use interface to the HTK toolkit that allows users to build ASR systems. It handles housekeeping tasks when using HTK and provides default training, testing, and recognition scripts.

Downloads: 1 This Week

Last Update: 2013-04-26
See Project
4

Annotation Graph Toolkit

AGTK is a suite of software components for building tools for annotating linguistic signals, that is, time-series data that documents any kind of linguistic behavior (e.g. audio, video). The internal data structures are based on annotation graphs.

Downloads: 1 This Week

Last Update: 2013-04-25
See Project
5

Book2m4b

This Linux project is a front end to cdparanoia, sox, and ffmpeg that aims to make it simple to rip many audiobook CDs into a single mono audiobook (m4b) file for use in audio players capable of playing audiobooks.

Downloads: 1 This Week

Last Update: 2019-03-16
See Project
6

FM2TXT

Listens to radio via RTL-SDR, recognizes the audio, and writes a text log file

Log speech from your favorite FM station to a text file using an rtl-sdr dongle and speech recognition. Cross-platform tool. For Windows installation, follow the README on the download page: https://sourceforge.net/projects/fm2txt-rtlsdr/files/ If you prefer the source on GitHub rather than SourceForge: https://github.com/randaller/fm2txt For those who want to recognize audio from a sound card instead of rtl-sdr (this allows transcribing NFM etc.): https://github.com/randaller/souncard2txt

Downloads: 1 This Week

Last Update: 2017-12-17
See Project
7

Voice keyboard

Voice keyboard/dictation. Aims to be a total substitute for a keyboard. Spell out words letter by letter using code words (alpha, bravo, ...). Arrow keys and modifiers work. You can speak whole words, but whole-word accuracy is not good. Commands can be attached to chosen words.
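The letter-by-letter spelling mode described above can be pictured as a lookup from code words to characters. The sketch below is only an illustration, not the project's code: it assumes the recognizer delivers a list of words, uses the NATO spelling alphabet, and the `decode_spelling` helper and the "space" command word are hypothetical.

```python
# Hypothetical sketch: turn recognized code words into typed characters.
NATO = (
    "alpha bravo charlie delta echo foxtrot golf hotel india juliett kilo lima "
    "mike november oscar papa quebec romeo sierra tango uniform victor whiskey "
    "xray yankee zulu"
).split()

# Each code word maps to its first letter ("alpha" -> "a").
CODE_TO_CHAR = {word: word[0] for word in NATO}
CODE_TO_CHAR["space"] = " "  # assumed extra command word, not from the project


def decode_spelling(words):
    """Convert recognized words to text, skipping words that are not code words."""
    lowered = (w.lower() for w in words)
    return "".join(CODE_TO_CHAR[w] for w in lowered if w in CODE_TO_CHAR)


if __name__ == "__main__":
    # prints "hi bob"
    print(decode_spelling(["hotel", "india", "space", "bravo", "oscar", "bravo"]))
```

Unknown words are silently dropped here; a real dictation tool would more likely route them to whole-word recognition or to a command handler.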

Downloads: 1 This Week

Last Update: 2015-04-20
See Project
8

Sayz Me

Sayz Me is a text-to-speech application for Windows. Text can be typed in or read from clipboard. Words are highlighted when spoken. Select voice, adjust reading speed, voice pitch, font and color. Simple and easy to use.

2 Reviews

Downloads: 1 This Week

Last Update: 2013-04-11
See Project
9

A.L.V.I. Bot

A.L.V.I. was created to be a simple but modular bot, able to interact with humans through natural language and to perform various tasks, such as reading mail, news, and feeds aloud. All in Italian!

Downloads: 0 This Week

Last Update: 2014-12-21
See Project
10

AIChatbot

An extensible (by plugin) chatbot project

Downloads: 0 This Week

Last Update: 2015-07-02
See Project
11

ASTA - Auto. Subtitle Timing Annotator

A collection of scripts and programs to automatically annotate video/audio for subtitles. It relies on a MARSYAS (Music Analysis, Retrieval and Synthesis for Audio Signals) plug-in to detect human voice in polyphonic recordings.

Downloads: 0 This Week

Last Update: 2014-04-24
See Project
12

AarTon

AarTon is an automated text-to-speech application. It allows users to enter text in a web-based front end and renders that text via a multi-channel sound card.

Downloads: 0 This Week

Last Update: 2013-11-14
See Project
13

Audio Trigger

Performs actions on detected volume threshold. Examples:
- Launch music on clap
- Launch speech recording when you start speaking
- Launch guard webcam when a significant sound is detected
- Increase or decrease headphones volume when ambient noise pass
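A volume-threshold trigger of the kind described above can be sketched with the standard library alone. This is not Audio Trigger's implementation: the `rms_level`, `watch`, and `tone` helpers are hypothetical, the audio is assumed to be signed 16-bit little-endian mono PCM, and synthetic sine chunks stand in for microphone input.

```python
import math
import struct


def rms_level(pcm_bytes):
    """Root-mean-square amplitude of signed 16-bit little-endian mono PCM."""
    count = len(pcm_bytes) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, pcm_bytes[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def watch(chunks, threshold, action):
    """Call action(index, level) each time the level rises across the threshold."""
    above = False
    for index, chunk in enumerate(chunks):
        level = rms_level(chunk)
        if level >= threshold and not above:
            action(index, level)
        above = level >= threshold


def tone(amplitude, n=800):
    """Synthetic chunk: a 440 Hz sine sampled at 8 kHz with the given peak."""
    values = (int(amplitude * math.sin(2 * math.pi * 440 * i / 8000)) for i in range(n))
    return struct.pack("<%dh" % n, *values)


if __name__ == "__main__":
    # Quiet, quiet, loud, loud, quiet: the action fires once, at chunk 2.
    stream = [tone(100), tone(100), tone(20000), tone(20000), tone(100)]
    watch(stream, threshold=5000, action=lambda i, lvl: print("triggered at chunk", i))
```

Firing only on the rising edge keeps a sustained loud sound from launching the action again on every chunk.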

Downloads: 0 This Week

Last Update: 2013-04-01
See Project
14

BATS (Blind Audio Tactile Mapping System)

The Blind Audio Tactile Mapping System (BATS) attempts to address the lack of spatial information available for visually impaired students.

Downloads: 0 This Week

Last Update: 2014-05-09
See Project
15

DJBorg

DJBorg turns your MP3 playlist into a personalized radio station, adding randomly-generated DJ banter between tracks. Song information (based on ID3 tags), news, weather, and headlines are announced via a text-to-speech engine.

Downloads: 0 This Week

Last Update: 2013-03-22
See Project
16

Defox text to speech and downloader

Read written or imported text aloud offline, or download it as speech online.

This software is designed to convert text to speech and to download the converted speech. Description:
• Installation setup in two languages (English, French)
• Two areas: text reading and speech downloading
• Many languages supported in the download center
Note 1: I am still a student and do not work in the software industry, so my software development skills may be limited.
Note 2: The software may take a few seconds to open after you double-click it. This is because it is written in Python, which can be slow to start.

1 Review

Downloads: 0 This Week

Last Update: 2019-09-27
See Project
17

Eve

Eve is an AI project written in Python that takes verbal or text commands to control the computer and everyday functions.

Downloads: 0 This Week

Last Update: 2013-04-03
See Project
18

InproTK

An Incremental Spoken Dialogue Processing Toolkit

InproTK is an Incremental Spoken Dialogue Processing Toolkit, that is, a toolkit to help you build dialogue systems that listen and talk incrementally, allowing for advanced interactional behaviour. Please see our Wiki for more information: http://sourceforge.net/p/inprotk/wiki/

Downloads: 0 This Week

Last Update: 2015-06-16
See Project
19

Moshi

A speech-text foundation model for real time dialogue

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec. Mimi compresses 24 kHz audio down to a 12.5 Hz representation with a bandwidth of 1.1 kbps in a fully streaming manner (with a latency of 80 ms, the frame size), yet performs better than existing non-streaming codecs such as SpeechTokenizer (50 Hz, 4 kbps) or SemantiCodec (50 Hz, 1.3 kbps). Moshi models two streams of audio: one corresponds to Moshi, and the other to the user. At inference, the user's stream is taken from the audio input, and Moshi's stream is sampled from the model's output. Alongside these two audio streams, Moshi predicts text tokens corresponding to its own speech (its inner monologue), which greatly improves the quality of its generation. A small Depth Transformer models inter-codebook dependencies for a given time step, while a large 7B-parameter Temporal Transformer models the temporal dependencies.
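The codec figures quoted above are consistent with each other, as a short back-of-the-envelope check shows. The sample rate, frame rate, and bitrate come from the description; the split into 8 codebooks of 11 bits each is an assumption made only to illustrate one configuration that matches 1.1 kbps, and is not stated in the description.

```python
SAMPLE_RATE_HZ = 24_000  # input audio rate (from the description)
FRAME_RATE_HZ = 12.5     # latent frame rate (from the description)
BITRATE_BPS = 1_100      # 1.1 kbps (from the description)

frame_ms = 1000 / FRAME_RATE_HZ                     # duration of one frame
samples_per_frame = SAMPLE_RATE_HZ / FRAME_RATE_HZ  # audio samples per frame
bits_per_frame = BITRATE_BPS / FRAME_RATE_HZ        # bit budget per frame

# Assumption for illustration only: 8 codebooks, 11 bits (2048 entries) each.
ASSUMED_CODEBOOKS = 8
ASSUMED_BITS_PER_CODE = 11
assumed_bitrate = ASSUMED_CODEBOOKS * ASSUMED_BITS_PER_CODE * FRAME_RATE_HZ

print(frame_ms)           # 80.0 ms, matching the stated latency
print(samples_per_frame)  # 1920.0 samples
print(bits_per_frame)     # 88.0 bits
print(assumed_bitrate)    # 1100.0 bits per second
```

The 80 ms latency is simply the reciprocal of the 12.5 Hz frame rate, which is why the description equates latency with frame size.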

Downloads: 0 This Week

Last Update: 2024-11-05
See Project
20

Open Interface for Speech Synthesis

The Open Interface for Speech Synthesis (OISS) provides an interface to speech synthesis hardware and software for end-user applications under Unix.

Downloads: 0 This Week

Last Update: 2013-02-21
See Project
21

PhoneBlogger

PhoneBlogger allows you to post to a weblog by phone. PhoneBlogger is written in VoiceXML, Python, and JavaScript.

Downloads: 0 This Week

Last Update: 2016-08-20
See Project
22

Python Gutenberg E-text Project

The PyGE (Python Gutenberg E-text) project is a suite of GUI desktop utilities written in Python to promote and facilitate awareness and enjoyment of works of literature that are available from the archives of Project Gutenberg.

Downloads: 0 This Week

Last Update: 2013-03-22
See Project
23

QWave

QWave: Qt-based waveform display and audio playback class library.

Downloads: 0 This Week

Last Update: 2013-05-01
See Project
24

RNNLIB

RNNLIB is a recurrent neural network library for sequence learning problems. Applicable to most types of spatiotemporal data, it has proven particularly effective for speech and handwriting recognition. Full installation and usage instructions are given at http://sourceforge.net/p/rnnl/wiki/Home/

2 Reviews

Downloads: 0 This Week

Last Update: 2016-11-28
See Project
25

SWIPE' pitch extractor

This is a fast C implementation of Arturo Camacho's SWIPE' pitch extraction algorithm. See the project homepage for more about the advantages of the SWIPE' algorithm. swipe-1.0.tar.gz contains the current source, which should compile quite neatly.

Downloads: 0 This Week

Last Update: 2013-04-11
See Project