Introduction to
audio data in Python
S P OK EN LAN GUAGE P ROCES S IN G IN P YTH ON
Daniel Bourke
Machine Learning Engineer/YouTube
Creator
Dealing with audio les in Python
Different kinds all of audio les
mp3
wav
m4a
ac
Digital sounds measured in frequency (kHz)
1 kHz = 1000 pieces of information per second
SPOKEN LANGUAGE PROCESSING IN PYTHON
Frequency examples
Streaming songs have a frequency of 32 kHz
Audiobooks and spoken language are between 8 and 16 kHz
We can't see audio les so we have to transform them rst
import wave
SPOKEN LANGUAGE PROCESSING IN PYTHON
Opening an audio le in Python
Audio le saved as good-morning.wav
# Import audio file as wave object
good_morning = wave.open("good-morning.wav", "r")
# Convert wave object to bytes
good_morning_soundwave = good_morning.readframes(-1)
# View the wav file in byte form
good_morning_soundwave
b'\xfd\xff\xfb\xff\xf8\xff\xf8\xff\xf7\...
SPOKEN LANGUAGE PROCESSING IN PYTHON
Working with audio is different
Have to convert the audio to something useful
Small sample of audio = large amount of information
SPOKEN LANGUAGE PROCESSING IN PYTHON
Let's practice!
S P OK EN LAN GUAGE P ROCES S IN G IN P YTH ON
Converting sound
wave bytes to
integers
S P OK EN LAN GUAGE P ROCES S IN G IN P YTH ON
Daniel Bourke
Machine Learning Engineer/YouTube
Creator
Converting bytes to integers
Can't use bytes
Convert bytes to integers using numpy
import numpy as np
# Convert soundwave_gm from bytes to integers
signal_gm = np.frombuffer(soundwave_gm, dtype='int16')
# Show the first 10 items
signal_gm[:10]
array([ -3, -5, -8, -8, -9, -13, -8, -10, -9, -11], dtype=int16)
SPOKEN LANGUAGE PROCESSING IN PYTHON
Finding the frame rate
Frequency (Hz) = length of wave object array/duration of audio le (seconds)
# Get the frame rate
framerate_gm = good_morning.getframerate()
# Show the frame rate
framerate_gm
48,000
Duration of audio le (seconds) = length of wave object array/frequency (Hz)
SPOKEN LANGUAGE PROCESSING IN PYTHON
Finding sound wave timestamps
# Return evenly spaced values between start and stop
np.linspace(start=1, stop=10, num=10)
array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
# Get the timestamps of the good morning sound wave
time_gm = np.linspace(start=0,
stop=len(soundwave_gm)/framerate_gm,
num=len(soundwave_gm))
SPOKEN LANGUAGE PROCESSING IN PYTHON
Finding sound wave timestamps
# View first 10 time stamps of good morning sound wave
time_gm[:10]
array([0.00000000e+00, 2.08334167e-05, 4.16668333e-05, 6.25002500e-05,
8.33336667e-05, 1.04167083e-04, 1.25000500e-04, 1.45833917e-04,
1.66667333e-04, 1.87500750e-04])
SPOKEN LANGUAGE PROCESSING IN PYTHON
Let's practice!
S P OK EN LAN GUAGE P ROCES S IN G IN P YTH ON
Visualizing sound
waves
S P OK EN LAN GUAGE P ROCES S IN G IN P YTH ON
Daniel Bourke
Machine Learning Engineer/YouTube
Creator
Adding another sound wave
New audio le: good_afternoon.wav
Both are 48 kHz
Same data transformations to all audio les
SPOKEN LANGUAGE PROCESSING IN PYTHON
Setting up a plot
import matplotlib.pyplot as plt
# Initialize figure and setup title
plt.title("Good Afternoon vs. Good Morning")
# x and y axis labels
plt.xlabel("Time (seconds)")
plt.ylabel("Amplitude")
# Add good morning and good afternoon values
plt.plot(time_ga, soundwave_ga, label ="Good Afternoon")
plt.plot(time_gm, soundwave_gm, label="Good Morning",
alpha=0.5)
# Create a legend and show our plot
plt.legend()
plt.show()
SPOKEN LANGUAGE PROCESSING IN PYTHON
SPOKEN LANGUAGE PROCESSING IN PYTHON
Time to visualize!
S P OK EN LAN GUAGE P ROCES S IN G IN P YTH ON