0% found this document useful (0 votes)
3 views17 pages

Chapter 1

The document provides an introduction to handling audio data in Python, covering various audio file formats and their frequency characteristics. It explains how to open audio files, convert sound wave bytes to integers, find frame rates, and visualize sound waves using libraries like NumPy and Matplotlib. Practical examples are included to demonstrate the process of working with audio data effectively.

Uploaded by

Andreu Orestes
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views17 pages

Chapter 1

The document provides an introduction to handling audio data in Python, covering various audio file formats and their frequency characteristics. It explains how to open audio files, convert sound wave bytes to integers, find frame rates, and visualize sound waves using libraries like NumPy and Matplotlib. Practical examples are included to demonstrate the process of working with audio data effectively.

Uploaded by

Andreu Orestes
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 17

Introduction to audio

data in Python
SPOKEN LANGUAGE PROCESSING IN PYTHON

Daniel Bourke
Machine Learning Engineer/YouTube
Creator
Dealing with audio files in Python
Different kinds all of audio files
mp3

wav

m4a

flac

Digital sounds measured in frequency (kHz)


1 kHz = 1000 pieces of information per second

SPOKEN LANGUAGE PROCESSING IN PYTHON


Frequency examples
Streaming songs have a frequency of 32 kHz

Audiobooks and spoken language are between 8 and 16 kHz

We can't see audio files so we have to transform them first

import wave

SPOKEN LANGUAGE PROCESSING IN PYTHON


Opening an audio file in Python
Audio file saved as good-morning.wav

# Import audio file as wave object


good_morning = wave.open("good-morning.wav", "r")

# Convert wave object to bytes


good_morning_soundwave = good_morning.readframes(-1)

# View the wav file in byte form


good_morning_soundwave

b'\xfd\xff\xfb\xff\xf8\xff\xf8\xff\xf7\...

SPOKEN LANGUAGE PROCESSING IN PYTHON


Working with audio is different
Have to convert the audio to something useful

Small sample of audio = large amount of information

SPOKEN LANGUAGE PROCESSING IN PYTHON


Let's practice!
SPOKEN LANGUAGE PROCESSING IN PYTHON
Converting sound
wave bytes to
integers
SPOKEN LANGUAGE PROCESSING IN PYTHON

Daniel Bourke
Machine Learning Engineer/YouTube
Creator
Converting bytes to integers
Can't use bytes
Convert bytes to integers using numpy

import numpy as np
# Convert soundwave_gm from bytes to integers
signal_gm = np.frombuffer(soundwave_gm, dtype='int16')
# Show the first 10 items
signal_gm[:10]

array([ -3, -5, -8, -8, -9, -13, -8, -10, -9, -11], dtype=int16)

SPOKEN LANGUAGE PROCESSING IN PYTHON


Finding the frame rate
Frequency (Hz) = length of wave object array/duration of audio file (seconds)

# Get the frame rate


framerate_gm = good_morning.getframerate()
# Show the frame rate
framerate_gm

48,000

Duration of audio file (seconds) = length of wave object array/frequency (Hz)

SPOKEN LANGUAGE PROCESSING IN PYTHON


Finding sound wave timestamps
# Return evenly spaced values between start and stop
np.linspace(start=1, stop=10, num=10)

array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])

# Get the timestamps of the good morning sound wave


time_gm = np.linspace(start=0,
stop=len(soundwave_gm)/framerate_gm,
num=len(soundwave_gm))

SPOKEN LANGUAGE PROCESSING IN PYTHON


Finding sound wave timestamps
# View first 10 time stamps of good morning sound wave
time_gm[:10]

array([0.00000000e+00, 2.08334167e-05, 4.16668333e-05, 6.25002500e-05,


8.33336667e-05, 1.04167083e-04, 1.25000500e-04, 1.45833917e-04,
1.66667333e-04, 1.87500750e-04])

SPOKEN LANGUAGE PROCESSING IN PYTHON


Let's practice!
SPOKEN LANGUAGE PROCESSING IN PYTHON
Visualizing sound
waves
SPOKEN LANGUAGE PROCESSING IN PYTHON

Daniel Bourke
Machine Learning Engineer/YouTube
Creator
Adding another sound wave
New audio file: good_afternoon.wav
Both are 48 kHz

Same data transformations to all audio files

SPOKEN LANGUAGE PROCESSING IN PYTHON


Setting up a plot
import matplotlib.pyplot as plt
# Initialize figure and setup title
plt.title("Good Afternoon vs. Good Morning")
# x and y axis labels
plt.xlabel("Time (seconds)")
plt.ylabel("Amplitude")
# Add good morning and good afternoon values
plt.plot(time_ga, soundwave_ga, label ="Good Afternoon")
plt.plot(time_gm, soundwave_gm, label="Good Morning",
alpha=0.5)
# Create a legend and show our plot
plt.legend()
plt.show()

SPOKEN LANGUAGE PROCESSING IN PYTHON


SPOKEN LANGUAGE PROCESSING IN PYTHON
Time to visualize!
SPOKEN LANGUAGE PROCESSING IN PYTHON

You might also like