In this tutorial, we are going to work with audio files. We will break the audio down into chunks, recognize the speech in each chunk, and store the recognized text in a text file. Install the following modules using the commands below.
pip install pydub
If you run the above command, you will get the following success message.
Collecting pydub
  Downloading https://files.pythonhosted.org/packages/79/db/eaf620b73a1eec3c8c6f8f5b0b236a50f9da88ad57802154b7ba7664d0b8/pydub-0.23.1-py2.py3-none-any.whl
Installing collected packages: pydub
Successfully installed pydub-0.23.1
pip install audioread
If you run the above command, you will get the following success message.
Collecting audioread
  Downloading https://files.pythonhosted.org/packages/2e/0b/940ea7861e0e9049f09dcfd72a90c9ae55f697c17c299a323f0148f913d2/audioread-2.1.8.tar.gz
Building wheels for collected packages: audioread
  Building wheel for audioread (setup.py): started
  Building wheel for audioread (setup.py): finished with status 'done'
  Created wheel for audioread: filename=audioread-2.1.8-cp37-none-any.whl size=23098 sha256=92b6f46d6b4726e7a13233dc9d84744ba74e23187123e67f663650f24390dc9d
  Stored in directory: C:\Users\hafeezulkareem\AppData\Local\pip\Cache\wheels\b9\64\09\0b6417df9d8ba8bc61a7d2553c5cebd714ec169644c88fc012
Successfully built audioread
Installing collected packages: audioread
Successfully installed audioread-2.1.8
pip install SpeechRecognition
If you run the above command, you will get the following success message.
Collecting SpeechRecognition
  Downloading https://files.pythonhosted.org/packages/26/e1/7f5678cd94ec1234269d23756dbdaa4c8cfaed973412f88ae8adf7893a50/SpeechRecognition-3.8.1-py2.py3-none-any.whl (32.8MB)
Installing collected packages: SpeechRecognition
Successfully installed SpeechRecognition-3.8.1
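Before moving on, it can help to confirm that the three modules are importable. The following is a minimal sketch, not part of the original tutorial. The helper name missing_modules is hypothetical. Note that the SpeechRecognition package is imported under the name speech_recognition.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# import names of the modules installed above
required = ['pydub', 'audioread', 'speech_recognition']

missing = missing_modules(required)
if missing:
    print('Not installed:', ', '.join(missing))
else:
    print('All modules are installed')
```

If a name is reported as not installed, run the matching pip command again in the same Python environment that runs your script.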
The process has two steps.
1. Break the audio into chunks.
2. Extract the content of each chunk using SpeechRecognition.
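The first step can be sketched on its own, without any audio, to see where the chunks begin and end. This is only an illustration under assumptions: the generator name chunk_bounds is hypothetical, and the defaults of 60000 ms per chunk with an 8000 ms overlap are the values used in the example code of this tutorial.

```python
def chunk_bounds(audio_length, point=60000, rem=8000):
    """Yield (start, end) pairs in milliseconds for overlapping chunks."""
    if point <= rem:
        raise ValueError('point must be larger than rem')
    start = 0
    while True:
        # a chunk is point milliseconds long, except the last one
        end = min(start + point, audio_length)
        yield start, end
        if end >= audio_length:
            break
        # the next chunk starts rem milliseconds before this one ended
        start = end - rem


# boundaries for an audio that is 150000 milliseconds long
for number, (start, end) in enumerate(chunk_bounds(150000), start=1):
    print(f'chunk_{number} start: {start} end: {end}')
# chunk_1 start: 0 end: 60000
# chunk_2 start: 52000 end: 112000
# chunk_3 start: 104000 end: 150000
```

The overlap means that a word cut at the end of one chunk has a chance to be heard whole at the beginning of the next one.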
Take one WAV audio file from your library and save it as audio.wav in the folder of your script. Let's start the code.
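If you are not sure whether your file is a readable WAV file, you can check it with the wave module from the standard library. The sketch below is an assumption-laden illustration: the helper wav_length_ms and the file name demo_silence.wav are hypothetical, and the demo file is generated only so that the example runs without any audio of your own.

```python
import wave


def wav_length_ms(path):
    """Return the length of a WAV file in milliseconds."""
    with wave.open(path, 'rb') as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
    return round(1000 * frames / rate)


# writing one second of silence to a demo file (hypothetical file name)
with wave.open('demo_silence.wav', 'wb') as demo:
    demo.setnchannels(1)      # mono
    demo.setsampwidth(2)      # 16-bit samples
    demo.setframerate(8000)   # 8000 frames per second
    demo.writeframes(b'\x00\x00' * 8000)

print(wav_length_ms('demo_silence.wav'))  # 1000
```

Calling wav_length_ms('audio.wav') on your own file should give a value close to the audio length printed by the example. If the file is not a valid WAV file, wave.open raises wave.Error.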
Example
# importing the modules
import pydub
import speech_recognition

# getting the audio file
audio = pydub.AudioSegment.from_wav('audio.wav')

# length of the audio in milliseconds
audio_length = len(audio)
print(f'Audio Length: {audio_length}')

# chunk counter
chunk_counter = 1

# text file that collects the recognized content
audio_text = open('audio_text.txt', 'w+')

# length of each chunk in milliseconds
point = 60000

# overlap between two consecutive chunks in milliseconds
rem = 8000

# initialising variables to track chunks and ending
flag = 0
start = 0
end = 0

# iterating through the audio; the range is doubled because the
# overlap produces more chunks than audio_length / point
for i in range(0, 2 * audio_length, point):

    # in the first iteration the chunk starts at the beginning
    if i == 0:
        start = 0
        end = point
    else:
        # other iterations step back by the overlap
        start = end - rem
        end = start + point

    # if end is greater than audio_length
    if end >= audio_length:
        end = audio_length
        # to indicate stop
        flag = 1

    # getting a chunk from the audio
    chunk = audio[start:end]

    # chunk name
    chunk_name = f'chunk_{chunk_counter}'

    # storing the chunk to local storage
    chunk.export(chunk_name, format='wav')

    # printing the chunk
    print(f'{chunk_name} start: {start} end: {end}')

    # incrementing chunk counter
    chunk_counter += 1

    # recognising text from the audio
    # initialising the recognizer
    recognizer = speech_recognition.Recognizer()

    # reading the whole chunk; record() reads the file to its end,
    # while listen() would stop at the first pause in the speech
    with speech_recognition.AudioFile(chunk_name) as chunk_audio:
        chunk_listened = recognizer.record(chunk_audio)

    # recognizing content from the audio
    try:
        # getting content from the chunk
        content = recognizer.recognize_google(chunk_listened)

        # writing to the file
        audio_text.write(content + '\n')

    # if not recognized
    except speech_recognition.UnknownValueError:
        print('Audio is not recognized')

    # internet error
    except speech_recognition.RequestError:
        print('Can\'t connect to the internet')

    # checking the flag
    if flag == 1:
        audio_text.close()
        break
Output
If you run the above code, you will get the following results.
Audio Length: 480052
chunk_1 start: 0 end: 60000
chunk_2 start: 52000 end: 112000
chunk_3 start: 104000 end: 164000
chunk_4 start: 156000 end: 216000
chunk_5 start: 208000 end: 268000
chunk_6 start: 260000 end: 320000
chunk_7 start: 312000 end: 372000
chunk_8 start: 364000 end: 424000
chunk_9 start: 416000 end: 476000
chunk_10 start: 468000 end: 480052
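The number of chunks in the output follows from simple arithmetic. After the first chunk, every further chunk moves forward by 60000 - 8000 = 52000 milliseconds. The sketch below works this out for the audio length printed above. The function name chunk_count is hypothetical.

```python
import math


def chunk_count(audio_length, point=60000, rem=8000):
    """Return how many overlapping chunks cover the whole audio."""
    if audio_length <= point:
        return 1
    # each chunk after the first advances by this many milliseconds
    step = point - rem
    return 1 + math.ceil((audio_length - point) / step)


# (480052 - 60000) / 52000 is about 8.08, which rounds up to 9
print(chunk_count(480052))  # 10
```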
Checking the file content.
# opening the file in read mode
with open('audio_text.txt', 'r') as file:
    print(file.read())
If you run the above code, you will get a result similar to the following. The exact text depends on your audio file and on how well the speech is recognized.
English and I am here in San Francisco I am back in San Francisco last week we were in Texas at a teaching country and The Reader of the teaching conference was a plan e Re improve teaching as a result you are house backup file with bad it had some English is coming soon one day only time 12 o1 a.m. everything about her English now or powering on my email list sports in your city check your email email Harjeet girlfriend next Tuesday checking the year enjoying office English keep listening keep smiling keep enjoying your English learning
Conclusion
If you have any doubts regarding the tutorial, mention them in the comment section.