Labs 9

This document outlines an experiment focused on implementing Text to Speech (TTS) recognition and synthesis using APIs. It provides a detailed description of speech synthesis and recognition, including steps to use the gTTS library for converting text to speech and the SpeechRecognition library for transcribing audio files in Google Colab. Additionally, it discusses speech segmentation and presents tasks for generating audio and comparing transcriptions.


EXPERIMENT 9: Text-to-Speech Synthesis and Speech Recognition through APIs

Aim:
To implement text-to-speech synthesis and speech recognition through APIs.

Description:
1. Speech synthesis is the artificial production of human speech. A computer system used
for this purpose is called a speech synthesizer, and can be implemented in software or
hardware products. A text-to-speech (TTS) system converts normal language text into
speech; other systems render symbolic linguistic representations like phonetic
transcriptions into speech. The reverse process is speech recognition.
Synthesized speech can be created by concatenating pieces of recorded speech that are
stored in a database. Systems differ in the size of the stored speech units; a system that
stores phones or diphones provides the largest output range, but may lack clarity. For
specific usage domains, the storage of entire words or sentences allows for high-quality
output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other
human voice characteristics to create a completely "synthetic" voice output.
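The concatenative approach described above can be illustrated with a toy sketch in which the stored "units" are just small sample arrays that get joined in order to form an utterance. The unit inventory below is a made-up illustration, not real recorded speech, and real systems store thousands of recorded diphones with smoothing at the joins.

```python
# Toy concatenative synthesis: stitch stored waveform units together.
# The "units" below are fabricated sample lists standing in for recorded diphones.
units = {
    "he": [0.1, 0.3, 0.2],
    "el": [0.2, 0.4],
    "lo": [0.1, -0.2, 0.0],
}

def synthesize(unit_names, inventory):
    """Concatenate stored units, in order, into one waveform."""
    waveform = []
    for name in unit_names:
        waveform.extend(inventory[name])
    return waveform

print(synthesize(["he", "el", "lo"], units))
# -> [0.1, 0.3, 0.2, 0.2, 0.4, 0.1, -0.2, 0.0]
```

The trade-off mentioned above shows up directly here: smaller units (phones, diphones) cover more possible utterances but join more often, and each join is a chance for an audible artifact.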

Here’s how you can use Google Text-to-Speech (gTTS) in Google Colab to convert text to
speech. The gTTS library is straightforward and well-suited for generating speech in multiple
languages.

Step 1: Install the gTTS Library

First, install the gTTS library.

# Install gTTS
!pip install gTTS

Step 2: Convert Text to Speech and Save as an Audio File

Now, you can use the gTTS library to convert text into speech and save it as an MP3 file.

from gtts import gTTS

# Specify the text and language
text = "Hello, my name is (your full name)."
language = 'en'  # Change to any supported language code, e.g. 'es' for Spanish, 'fr' for French

# Create a gTTS object
speech = gTTS(text=text, lang=language, slow=False)

# Save the audio file
speech.save("output_audio.mp3")

Step 3: Play the Audio (Optional)

You can play the audio file directly in Colab using the IPython.display.Audio function.

from IPython.display import Audio

# Play the saved audio file
Audio("output_audio.mp3")

This code will convert your text into speech, save it as an MP3 file, and allow you to play or
download it directly from Colab.

2. Speech recognition, also known as automatic speech recognition (ASR), computer
speech recognition, or speech-to-text, is a capability that enables a program to
process human speech into a written format. While it is commonly confused with voice
recognition, speech recognition focuses on translating speech from a verbal format
to a text one, whereas voice recognition seeks only to identify an individual user's
voice.
Key features of effective speech recognition
Many speech recognition applications and devices are available, but the more advanced
solutions use AI and machine learning. They integrate the grammar, syntax, structure, and
composition of audio and voice signals to understand and process human speech.
Ideally, they learn as they go, evolving their responses with each interaction.
The best systems also allow organizations to customize and adapt the technology
to their specific requirements, from language and nuances of speech to
brand recognition. For example:
- Language weighting: improve precision by weighting specific words that are
spoken frequently (such as product names or industry jargon), beyond terms already
in the base vocabulary.
- Speaker labeling: output a transcription that cites or tags each speaker's
contributions to a multi-participant conversation.
- Acoustics training: attend to the acoustic side of the business. Train the system to
adapt to an acoustic environment (such as the ambient noise in a call center) and to
speaker styles (voice pitch, volume, and pace).
- Profanity filtering: use filters to identify certain words or phrases and sanitize
speech output.
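As a toy illustration of the profanity-filtering idea above, a transcript can be post-processed against a blocklist. The blocklist, the masking scheme, and the `filter_transcript` helper here are illustrative assumptions for this sketch, not part of any speech API.

```python
import re

def filter_transcript(text, blocklist):
    """Mask each blocklisted word in the transcript with asterisks."""
    for word in blocklist:
        # \b word boundaries keep us from masking substrings inside longer words
        pattern = re.compile(r'\b' + re.escape(word) + r'\b', re.IGNORECASE)
        text = pattern.sub('*' * len(word), text)
    return text

print(filter_transcript("darn this darned machine", ["darn"]))
# -> **** this darned machine
```

Production systems typically apply such filters on the recognizer's output text, exactly as this sketch does, so the audio itself is never altered.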

Here’s a Python code snippet to perform speech recognition in Google Colab using the
SpeechRecognition library. This code will allow you to transcribe speech from an audio
file.

Step 1: Install Required Libraries

First, you need to install the SpeechRecognition and pydub libraries. pydub is useful for
handling audio files in different formats.

# Install the required libraries


!pip install SpeechRecognition pydub
Step 2: Import and Set Up Libraries

Import the necessary libraries and set up the recognizer.

import speech_recognition as sr
from pydub import AudioSegment

Step 3: Upload an Audio File

Google Colab allows you to upload files directly. Use the following code to upload an audio
file.

from google.colab import files

# Upload an audio file
uploaded = files.upload()

Make sure the audio file is in a format supported by SpeechRecognition (e.g., .wav, .aiff,
or .flac). If it is in another format, such as .mp3, convert it to .wav using pydub.

Step 4: Convert Audio to WAV Format (if needed)

If your file is in .mp3 format, you can convert it to .wav using pydub.

# Convert the uploaded audio file to WAV format
audio_file = next(iter(uploaded.keys()))  # Get the uploaded file name
sound = AudioSegment.from_file(audio_file)
wav_file = "converted_audio.wav"
sound.export(wav_file, format="wav")

Step 5: Perform Speech Recognition

Use the SpeechRecognition library to transcribe the audio.

# Initialize the recognizer
recognizer = sr.Recognizer()

# Load the audio file and record its contents
with sr.AudioFile(wav_file) as source:
    audio_data = recognizer.record(source)

# Recognize speech using the Google Web Speech API
try:
    text = recognizer.recognize_google(audio_data)
    print("Transcribed Text:")
    print(text)
except sr.UnknownValueError:
    print("Speech Recognition could not understand the audio.")
except sr.RequestError as e:
    print(f"Could not request results; {e}")

Step 6: Optional - Display the Transcribed Text

The transcribed text will be printed directly in the Colab output cell.

3. Speech segmentation is the process of identifying the boundaries between words,
syllables, or phonemes in spoken natural languages. The term applies both to the
mental processes used by humans and to artificial processes of natural language
processing.
Speech segmentation is a subfield of general speech perception and an important
subproblem of the technologically focused field of speech recognition, and it cannot be
adequately solved in isolation. As in most natural language processing problems, one
must take into account context, grammar, and semantics; even so, the result is often
a probabilistic division (statistically based on likelihood) rather than a categorical one.
Coarticulation, a phenomenon that can occur between adjacent words just as easily as
within a single word, appears to present the main challenge in speech segmentation
across languages.
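To make the probabilistic flavour of segmentation concrete, here is a toy dynamic-programming sketch that splits an unspaced character string into the highest-scoring word sequence. The vocabulary, the scores, and the fallback penalty are all illustrative assumptions, and real speech segmenters operate on acoustic features rather than text, but the "pick the most likely division" structure is the same.

```python
def segment(text, scores):
    """Split an unspaced string into the highest-scoring word sequence.

    best[i] holds (score, words) for the best segmentation of text[:i].
    """
    best = [(0.0, [])]
    for i in range(1, len(text) + 1):
        candidates = []
        for j in range(i):
            word = text[j:i]
            if word in scores:
                prev_score, prev_words = best[j]
                candidates.append((prev_score + scores[word], prev_words + [word]))
        # Fall back to a heavily penalised single character if nothing matches
        if not candidates:
            prev_score, prev_words = best[i - 1]
            candidates.append((prev_score - 10.0, prev_words + [text[i - 1]]))
        best.append(max(candidates, key=lambda c: c[0]))
    return best[-1][1]

scores = {"speech": 2.0, "recognition": 3.0, "rec": 1.0, "speechrec": 1.5}
print(segment("speechrecognition", scores))  # -> ['speech', 'recognition']
```

Note that the result depends entirely on the score table: raising the score of "speechrec" high enough would change the preferred division, which is the probabilistic behaviour described above.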

Task 1: Implement the gTTS code and provide the link. You must use your full name as the
input text and generate the audio in English. (4 marks)

(Optional: try text in your mother tongue and generate an audio file.)

Task 2: Speech recognition. Implement the SpeechRecognition code and provide the link.
You must use the audio generated in Task 1 as the input to Task 2.

Compare the text input given in Task 1 with the transcribed text obtained in Task 2. Test
more sentences and write your inference. (6 marks)
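For the comparison in Task 2, one simple option (an assumption for this sketch, not a requirement of the lab) is to compute a similarity ratio between the original sentence and the transcription using Python's standard difflib module. Lower-casing both strings first keeps capitalization differences, which the recognizer does not preserve reliably, from dominating the score.

```python
import difflib

def similarity(reference, hypothesis):
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return difflib.SequenceMatcher(None, reference.lower(), hypothesis.lower()).ratio()

# Example: a transcription that dropped the punctuation of the input sentence
reference = "Hello, my name is Jane Doe."
hypothesis = "hello my name is jane doe"
print(round(similarity(reference, hypothesis), 2))  # -> 0.96
```

A score near 1.0 suggests the recognizer reproduced the sentence almost verbatim; lower scores usually come from misheard words, which is worth noting in your inference.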
