Labs 9
Aim:
To implement text-to-speech synthesis and speech recognition through APIs.
Description:
Speech synthesis is the artificial production of human speech. A computer system used
for this purpose is called a speech synthesizer, and can be implemented in software or
hardware products. A text-to-speech (TTS) system converts normal language text into
speech; other systems render symbolic linguistic representations like phonetic
transcriptions into speech. The reverse process is speech recognition.
Synthesized speech can be created by concatenating pieces of recorded speech that are
stored in a database. Systems differ in the size of the stored speech units; a system that
stores phones or diphones provides the largest output range, but may lack clarity. For
specific usage domains, the storage of entire words or sentences allows for high-quality
output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other
human voice characteristics to create a completely "synthetic" voice output.
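The concatenative approach described above can be illustrated with a toy sketch. It assumes a tiny, hypothetical unit database in which short sine bursts stand in for recorded diphones; the names UNIT_DB, tone and synthesize are illustrative only, and a real system would store recorded speech and smooth the joins between units.

```python
import math

SAMPLE_RATE = 8000  # samples per second (illustrative value)

def tone(freq_hz, duration_ms=50):
    """Return a short sine burst standing in for one recorded speech unit."""
    n = SAMPLE_RATE * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

# Hypothetical unit database: unit label -> stored waveform.
UNIT_DB = {
    "h-e": tone(220),
    "e-l": tone(330),
    "l-o": tone(440),
}

def synthesize(units):
    """Concatenate the stored waveforms for a sequence of unit labels."""
    samples = []
    for label in units:
        samples.extend(UNIT_DB[label])  # raises KeyError if a unit is not stored
    return samples

waveform = synthesize(["h-e", "e-l", "l-o"])
```

Smaller units such as diphones let a small database cover many words, which is the trade-off between output range and clarity mentioned above.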
Here’s how you can use Google Text-to-Speech (gTTS) in Google Colab to convert text to
speech. The gTTS library is straightforward and well-suited for generating speech in multiple
languages.
# Install gTTS
!pip install gTTS
Now, you can use the gTTS library to convert text into speech and save it as an MP3 file.
You can play the audio file directly in Colab using the IPython.display.Audio function.
from IPython.display import Audio
This code will convert your text into speech, save it as an MP3 file, and allow you to play or
download it directly from Colab.
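The steps described above might look like the following minimal sketch. The helper name text_to_mp3 and the default file name output.mp3 are assumptions made for illustration; the gTTS class and its save method come from the gTTS library, which needs network access because it calls Google's service.

```python
def text_to_mp3(text, lang="en", out_path="output.mp3"):
    """Synthesize `text` with gTTS and save it as an MP3 file."""
    # Imported here so the function can be defined before gTTS is installed.
    from gtts import gTTS

    tts = gTTS(text=text, lang=lang)  # prepares a request to Google's TTS service
    tts.save(out_path)                # fetches the audio and writes it to disk
    return out_path
```

In a Colab cell, calling text_to_mp3("Hello from Colab") and then ending the cell with Audio(filename="output.mp3") should display a player for the generated file. Passing a different language code as lang (for example "hi" or "fr") is how the optional mother-tongue variant could be tried.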
Here’s a Python code snippet to perform speech recognition in Google Colab using the
SpeechRecognition library. This code will allow you to transcribe speech from an audio
file.
First, you need to install the SpeechRecognition and pydub libraries. pydub is useful for
handling audio files in different formats.
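In Colab the installation is normally a single cell containing !pip install SpeechRecognition pydub. As a small follow-up check, the sketch below reports which of the two packages still cannot be imported; the helper name missing_packages is hypothetical. Note that the import names differ from the pip names.

```python
import importlib.util

def missing_packages(import_names):
    """Return the names from `import_names` that cannot be imported."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]

# Import names differ from pip names: SpeechRecognition -> speech_recognition.
print(missing_packages(["speech_recognition", "pydub"]))
```

An empty list means both libraries are available in the current runtime.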
Google Colab allows you to upload files directly. Use the following code to upload an audio
file.
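A minimal sketch of the upload step follows, assuming the code runs inside a Colab runtime (the google.colab module is not available elsewhere). The helper name upload_audio is illustrative.

```python
def upload_audio():
    """Open Colab's upload dialog and return the first uploaded file name."""
    # google.colab exists only inside a Colab runtime, so import it lazily.
    from google.colab import files

    uploaded = files.upload()  # dict mapping file name -> file contents (bytes)
    if not uploaded:
        raise ValueError("no file was uploaded")
    return next(iter(uploaded))
```

The uploaded file should also be saved in the runtime's working directory, so the returned name can be used as a path in later steps.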
Make sure the audio file is in a format that SpeechRecognition can read directly (.wav,
.aiff, or .flac). An .mp3 file, such as the one gTTS produces, must first be converted
to .wav, which you can do with pydub.
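The conversion could be sketched as below. The helper name mp3_to_wav is an assumption for illustration; AudioSegment.from_mp3 and export are pydub calls, and pydub needs an ffmpeg (or libav) binary to decode MP3, which Colab runtimes usually provide.

```python
import os

def mp3_to_wav(mp3_path):
    """Convert an MP3 file to a WAV file beside it and return the new path."""
    # Lazy import: pydub must be installed, with ffmpeg available for MP3 decoding.
    from pydub import AudioSegment

    wav_path = os.path.splitext(mp3_path)[0] + ".wav"
    AudioSegment.from_mp3(mp3_path).export(wav_path, format="wav")
    return wav_path
```

For example, mp3_to_wav("output.mp3") would write output.wav and return that name.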
import speech_recognition as sr

# Initialize recognizer
recognizer = sr.Recognizer()
The transcribed text will be printed directly in the Colab output cell.
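One way the full transcription step might look is sketched here. The function name transcribe and the choice to return None for unintelligible audio are assumptions; recognize_google uses Google's free web speech API, so it needs network access and may be rate limited.

```python
def transcribe(wav_path, language="en-US"):
    """Return the transcript of a WAV file, or None if nothing was recognized."""
    import speech_recognition as sr  # lazy import: needs the package installed

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    try:
        # Sends the audio to Google's web speech API (network required).
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None  # the speech could not be understood
    except sr.RequestError as err:
        raise RuntimeError(f"recognition service unavailable: {err}") from err
```

Calling print(transcribe("output.wav")) in a Colab cell would then show the transcript in the output cell.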
Task 1: Implement the gTTS code and provide the link. You must use your full name as
input text and generate audio in English. (4 marks)
(optional: Try text in your mother tongue and generate audio file)
Task 2: Speech recognition. Implement the SpeechRecognition code and provide the link.
You must use the generated audio in Task 1 as input to Task 2.
Compare the text input given in Task 1 with the transcribed text obtained from Task 2. Test
more sentences and write your inference. (6 marks)
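For the comparison, one common measure is the word error rate (WER): the word-level edit distance between the input text and the transcript, divided by the number of input words. The sketch below is one possible way to compute it, not a required method; the function names and the normalization (lowercasing and stripping punctuation) are assumptions.

```python
import re

def normalize(text):
    """Lowercase, drop punctuation, and split into words."""
    return re.sub(r"[^\w\s]", "", text.lower()).split()

def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference text must contain at least one word")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

A value of 0.0 means the transcript matches the input word for word after normalization; names and uncommon words tend to raise the score.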