Speech Recognition in Python using Google Speech API

Speech recognition is one of the most useful features in applications such as home automation and AI. In this section we will see how speech recognition can be done using Python and Google’s Speech API.

In this case we will supply audio through a microphone for recognition. The microphone is configured through a few parameters.

To do this, we have to install the SpeechRecognition module. There is another module called PyAudio, which is needed for microphone input and is optional only when working with audio files.

sudo pip3 install SpeechRecognition
sudo apt-get install python3-pyaudio

The first parameter is the microphone device itself. For external or USB microphones, we need to specify the exact microphone to avoid any difficulties. On Linux, the ‘lsusb’ command shows information about the connected USB devices.
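As a minimal sketch of picking a specific microphone from Python: the SpeechRecognition module offers Microphone.list_microphone_names(), and the position of a name in that list can be passed as device_index. The helper names find_device_index and open_microphone, and the sample device names, are made up for this illustration.

```python
def find_device_index(names, keyword):
    """Return the index of the first device name containing keyword, or None."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        if name and keyword in name.lower():
            return index
    return None


def open_microphone(keyword):
    """Build a Microphone for the first device matching keyword (needs PyAudio)."""
    # Imported here so that find_device_index can be tried without the module.
    import speech_recognition as spreg
    names = spreg.Microphone.list_microphone_names()
    index = find_device_index(names, keyword)
    if index is None:
        raise ValueError("No microphone matching {!r} in {}".format(keyword, names))
    return spreg.Microphone(device_index=index)


# Made-up device list for illustration; real names come from list_microphone_names().
sample_names = ["HDA Intel PCH: ALC3246 Analog", "USB Audio Device: - (hw:1,0)"]
print(find_device_index(sample_names, "usb"))
```

On a machine with PyAudio and a USB microphone attached, open_microphone('usb') should return a Microphone object that can be used in a with statement.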

The second parameter is the chunk size. It specifies how much data we want to read at once, and is usually a power of 2 such as 1024 or 2048.

We also have to specify the sampling rate, which determines how often the audio is sampled for processing.
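Chunk size and sampling rate together decide how much audio each read covers. The following is a small worked calculation, assuming the chunk size is counted in samples of a single channel; chunk_duration is a hypothetical helper name.

```python
def chunk_duration(chunk_size, sample_rate):
    """Seconds of audio held in one chunk of chunk_size samples."""
    return chunk_size / sample_rate


# A few common combinations, including the one used later in this section.
for chunk_size, sample_rate in [(1024, 16000), (2048, 44100), (8192, 48000)]:
    seconds = chunk_duration(chunk_size, sample_rate)
    print("{} samples at {} Hz = {:.1f} ms".format(chunk_size, sample_rate, seconds * 1000))
```

A larger chunk means fewer reads per second, but each read waits longer for its data.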

As there may be some unavoidable noise in the surroundings, we have to adjust for the ambient noise so that the voice is captured accurately.
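To give a rough idea of what such an adjustment involves, here is a simplified sketch of an energy threshold: measure the loudness of noise-only audio, then treat anything clearly louder as speech. This is an illustration only, not the module's actual implementation; in SpeechRecognition the calibration is done by adjust_for_ambient_noise(), which sets the recognizer's energy_threshold. The function names, the margin value and the sample numbers are all assumptions.

```python
import math


def rms(samples):
    """Root-mean-square energy of a sequence of integer audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate_threshold(noise_chunks, margin=1.5):
    """Hypothetical calibration: average noise energy scaled by a safety margin."""
    energies = [rms(chunk) for chunk in noise_chunks]
    return margin * sum(energies) / len(energies)


def is_speech(chunk, threshold):
    """Treat a chunk as speech when its energy exceeds the threshold."""
    return rms(chunk) > threshold


# Made-up samples: two quiet background chunks, then one loud and one quiet chunk.
noise = [[10, -12, 8, -9], [11, -10, 9, -11]]
threshold = calibrate_threshold(noise)
print(is_speech([300, -280, 310, -295], threshold))
print(is_speech([9, -10, 10, -8], threshold))
```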

Steps to Recognize the speech

Gather the microphone-related information.
Configure the microphone using chunk size, sampling rate, ambient noise adjustments etc.
Wait for some time to get the voice.
When the voice is recognized, try to convert it into text, otherwise raise an error.
Stop the process.

Example Code

import speech_recognition as spreg

# Set up the sampling rate and the chunk size
sample_rate = 48000
data_size = 8192
recog = spreg.Recognizer()
with spreg.Microphone(sample_rate = sample_rate, chunk_size = data_size) as source:
   # Calibrate against the background noise before listening
   recog.adjust_for_ambient_noise(source)
   print('Tell Something: ')
   speech = recog.listen(source)
try:
   # Send the captured audio to Google's Speech API
   text = recog.recognize_google(speech)
   print('You have said: ' + text)
except spreg.UnknownValueError:
   print('Unable to recognize the audio')
except spreg.RequestError as e:
   print("Request error from Google Speech Recognition service; {}".format(e))

Output

$ python3 318.speech_recognition.py
Tell Something: 
You have said: here we are considering the asymptotic notation Pico to calculate the upper bound 
of the time complexity so then the definition of the big O notation is like this one
$

Without using the microphone, we can also take an audio file as input and convert the speech in it to text.

Example Code

import speech_recognition as spreg
sound_file = 'sample_audio.wav'
recog = spreg.Recognizer()
with spreg.AudioFile(sound_file) as source:
   speech = recog.record(source) # use record instead of listen
   try:
      text = recog.recognize_google(speech)
      print('The file contains: ' + text)
   except spreg.UnknownValueError:
      print('Unable to recognize the audio')
   except spreg.RequestError as e: 
      print("Request error from Google Speech Recognition service; {}".format(e))

Output

$ python3 318a.speech_recognition_file.py 
The file contains: staying ahead of the curve demand planning new technology it also helps you progress in your career
$
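
Before passing a file to AudioFile, it can help to check its basic properties. The sketch below uses Python's standard wave module to read the channel count, sampling rate and duration of a PCM WAV file. The wav_info helper and the demo_silence.wav file are made up for this illustration; the silent demo file is generated only so that the sketch runs without real audio.

```python
import os
import tempfile
import wave


def wav_info(path):
    """Return (channels, sample_rate, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return wav.getnchannels(), rate, frames / rate


# Build a one-second silent mono file in a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_silence.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)      # 16-bit samples
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)

print(wav_info(demo_path))
```

The same helper could be pointed at 'sample_audio.wav' from the example above to see how long the recording is before sending it for recognition.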