10/06/2024, 13:03 Use Frequency More Frequently, a guide to using the fast fourier transform for data scientists


Use Frequency More Frequently


A handbook from simple to advanced frequency analysis: exploring a vital tool which
is widely underutilized in data science

Daniel Warfield

Published in Towards Data Science
20 min read · May 19, 2023

Frequency analysis is extremely useful in a vast number of domains, from audio, to
mechanical systems, to natural language processing and unsupervised learning. For
many scientists and engineers it’s a vital tool, but for many data scientists and
developers it’s hardly understood, if at all. If you don’t know about frequency
analysis, don’t fret: you just found your handbook.

Image by Daniel Warfield using p5.js. All images in this document are either created with p5.js or Python’s
Matplotlib library unless otherwise specified.

Who is this useful for? Anyone who works with virtually any signal, sensor, image,
or AI/ML model.

How advanced is this post? This post is accessible to beginners and contains
examples that will interest even the most advanced users of frequency analysis. You
will likely get something out of this article regardless of your skill level.

https://towardsdatascience.com/use-frequency-more-frequently-14715714de38

What will you get from this post? Both a conceptual and mathematical
understanding of waves and frequencies, a practical understanding of how to
employ those concepts in Python, some common use cases, and some more
advanced use cases.

Note: To help you skim through, I’ve labeled subsections as Basic, Intermediate,
and Advanced. This is a long article designed to get someone from zero to hero.
However, if you already have education or experience in the frequency domain, you
can probably skim the intermediate sections or jump right to the advanced topics.

I’ve also set up links so you can click to navigate to and from the table of contents.

Table Of Contents
Click the links to navigate to specific sections

1) The Frequency Domain


1.1) The Basics of the Frequency Domain (Basic)
1.2) The Specifics of the Frequency Domain (Intermediate)
1.3) A Simple Example in Python (Intermediate)
2) Common Uses of the Frequency Domain
2.1) De-trending and Signal Processing (Intermediate)
2.2) Vibration Analysis (Advanced)
3) Advanced Uses of the Frequency Domain
3.1) Data Augmentation (Advanced)
3.2) Embedding and Clustering (Advanced)
3.3) Compression (Intermediate)
4) Conceptual Takeaways for Data Scientists
5) Summary

1) The Frequency Domain

1.1) The Basics of the Frequency Domain (Basic)


(Back To Table of Contents)


First, what is a domain? Imagine you want to understand temperature changes over
time. Just reading that sentence, you probably imagined a graph like this:

What you might be imagining when you think of temperature over some period of time

Maybe you imagine time progressing from left to right, and greater temperatures
corresponding to higher vertical points. Congratulations, you’ve taken data and
mapped it to a 2d time domain. In other words, you’ve taken temperature readings,
recorded at certain times, and mapped that information to a space where time is
one axis, and the value is another.

There are other ways to represent our temperature vs time data. As you can see,
there’s a “periodic” nature to this data, meaning it oscillates back and forth. A lot of
data behaves this way: sound, ECG data from heartbeats, movement sensors like
accelerometers, and even images. In one way or another, a lot of things have data
that goes to and fro periodically.

“If you want to find the secrets of the universe, think in terms of energy,
frequency and vibration.” ― Nikola Tesla
I could get to this point in a circuitous way, but a picture speaks 1000 words. In
essence, we can disassemble our temperature graph into a bunch of simple waves,
with various frequencies and amplitudes (frequency being how quickly a wave goes back
and forth, and amplitude being how high and low it goes), and use that to describe the
data.


All the waves, of various frequencies and amplitudes, which go into making our original wave. You might
notice that there’s one wave which is more subtle than the other two and is practically impossible to see in
the original. Finding this hidden information is one benefit of frequency analysis.

These waves are extracted using a Fourier Transform, which maps our original wave
from the time domain to the frequency domain. Instead of value vs time, the
frequency domain is amplitude vs frequency.

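Each of the extracted waves has a frequency and amplitude. If we plot frequency on the x axis, and amplitude
on the y axis, we have plotted a frequency spectrum, which this article refers to as a spectrogram (a term more
strictly used for plots of frequency content over time).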

So, to summarize, the Fourier Transform maps data (usually, but not always, in the
time domain) into the frequency domain. The frequency domain describes all of the
waves, with different frequencies and amplitudes, which when added together
reconstruct the original wave.

The original wave, in the time domain, and the frequency content in the frequency domain. These both
describe the same signal
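To make the idea of "disassembling" a wave concrete, here is a minimal sketch. It builds a signal from three sine waves and checks that the Fourier transform recovers their frequencies and amplitudes. The sample rate, duration, and wave parameters are made up for illustration; they are not the data behind the figures.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq

# Hypothetical signal: three sine waves (all values are assumptions)
samplerate = 1000            # samples per second
duration = 2.0               # seconds
N = int(samplerate * duration)
t = np.arange(N) / samplerate

components = [(5.0, 3.0), (20.0, 1.5), (120.0, 0.2)]  # (frequency in Hz, amplitude)
signal = sum(a * np.sin(2 * np.pi * f * t) for f, a in components)

# Map to the frequency domain; rfft keeps only the non-negative frequencies
spectrum = rfft(signal)
freqs = rfftfreq(N, d=1 / samplerate)
amplitudes = 2.0 / N * np.abs(spectrum)   # rescale raw magnitudes to wave amplitudes

# The three largest peaks should match the waves we put in
top = np.argsort(amplitudes)[-3:]
recovered = sorted((float(freqs[i]), float(amplitudes[i])) for i in top)
print(recovered)
```

The weakest wave (amplitude 0.2) is hard to spot in the time domain but shows up as its own peak here, which is the "hidden information" point made above.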

1.2) The Specifics of the Frequency Domain (Intermediate)


(Back To Table of Contents)

The sin function is the ratio of the length of the side opposite an angle in a right
triangle to the length of that triangle's hypotenuse.


θ(theta) is an angle of a right triangle, a is the length of the opposite side of θ, and c is the length of the
hypotenuse

The sin wave is what you get when you plot a/c for different values of θ (Different
Angles), and is used in virtually all scientific disciplines as the most fundamental
wave.

The relationship between the sin function, right triangles, and the sin wave

Often sin(θ) is expanded to A*sin(ωθ+ϕ).

ω(omega) represents frequency (larger values of ω mean the sin wave oscillates
more quickly)

ϕ(phi) represents phase (changing ϕ shifts the wave to the right or left)

A scales the function, which defines the amplitude (how large the oscillations are).


“A” controls the amplitude (height), “omega” controls the frequency (speed of oscillation), and “phi” controls
the phase (shift from side to side)
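As a quick illustration of the three parameters, the sketch below evaluates A*sin(ωθ+ϕ) with numpy. The function name `sine_wave` and the specific parameter values are just for this example.

```python
import numpy as np

def sine_wave(theta, A=1.0, omega=1.0, phi=0.0):
    """Evaluate A*sin(omega*theta + phi) for an array of angles (in radians)."""
    return A * np.sin(omega * theta + phi)

theta = np.linspace(0, 2 * np.pi, 1000)

base = sine_wave(theta)
taller = sine_wave(theta, A=2.0)             # twice the amplitude
faster = sine_wave(theta, omega=3.0)         # three oscillations over the same span
shifted = sine_wave(theta, phi=np.pi / 2)    # shifted left by a quarter cycle

# A quarter-cycle shift turns a sine into a cosine
print(taller.max(), np.allclose(shifted, np.cos(theta)))
```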

When I explained the frequency domain I presented a simplified representation,
where the horizontal axis is frequency, and the vertical axis is amplitude. In
actuality the frequency domain is not two-dimensional but three-dimensional: one
dimension for frequency, one for amplitude, and one for phase. A spectrogram can be
of even higher dimension for higher dimensional signals (like images).

A Traditional Amplitude vs frequency spectrogram (left) vs a more descriptive amplitude, frequency, and
phase plot.

When converting a signal to the frequency domain (using a library like scipy, for
instance) you’ll get a list of complex numbers.

[1.13-1.56j, 2.34+2.6j, 7.4-3.98j, ...]

If you’re not familiar with complex numbers, don’t worry about it. You can imagine
these lists as points, where the index of the list corresponds to frequency, and
each complex number is a pair of values (its real and imaginary parts) which together
encode amplitude and phase. The amplitude is proportional to the point's distance
from the origin, sqrt(real² + imaginary²), and the phase is its angle,
atan2(imaginary, real).

[(1.13, -1.56), (2.34, 2.6), (7.4, -3.98), ...]
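In numpy terms, the magnitude and angle of each complex value can be pulled out with `np.abs` and `np.angle`. The short sketch below uses the three example values from the list above; it only illustrates the relationship and is not tied to any particular signal.

```python
import numpy as np

# The example values from the list above, written as complex numbers
coeffs = np.array([1.13 - 1.56j, 2.34 + 2.6j, 7.4 - 3.98j])

amplitude = np.abs(coeffs)     # sqrt(real**2 + imag**2), proportional to wave amplitude
phase = np.angle(coeffs)       # atan2(imag, real), in radians

# Magnitude and phase together carry the same information as the complex value
rebuilt = amplitude * np.exp(1j * phase)
print(np.allclose(rebuilt, coeffs))
```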

I haven’t talked about the units of these numbers. Because units are, essentially,
linear transformations to all data, they can often be disregarded from a data science
perspective. However, if you do use the frequency domain in the future, you will
likely encounter words like Hertz (Hz), Period (T), and other frequency domain-
specific concepts. You will see these units explored in the examples.
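As a rough guide to how these units relate in practice, the sketch below derives the sampling period, the spacing between frequency bins, and the highest representable frequency from a sample rate and a sample count. The two input values are assumptions chosen for illustration.

```python
import numpy as np
from scipy.fft import rfftfreq

samplerate = 1000          # samples per second, i.e. Hz (assumed value)
N = 3000                   # number of samples analysed (assumed value)

T = 1 / samplerate         # sampling period: seconds between samples
duration = N * T           # seconds of signal covered by the N samples

freqs = rfftfreq(N, d=T)           # frequency, in Hz, of each non-negative FFT bin
resolution = freqs[1] - freqs[0]   # spacing between bins, equal to samplerate / N
nyquist = samplerate / 2           # highest frequency the samples can represent

print(duration, resolution, freqs[-1])
```

Analysing a longer window (larger N) gives finer frequency resolution, while a higher sample rate raises the highest frequency that can be observed.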

If you want to learn more about units in general, and how to deal with them as a
data scientist, I have an article all about it here.

1.3) A Simple Example in Python (Intermediate)


(Back To Table of Contents)

In this example, we load a snippet of trumpet music, convert it to the frequency
domain, plot the frequency spectrogram, and use the spectrogram to understand
the original signal.

First, we’ll load and plot the sound data, which is an amplitude over time. This data
is used to control the location of the diaphragm within a speaker, the oscillation of
which generates sound.

"""
Loading a sample waveform, and plotting it in the time domain
"""

#importing dependencies
import matplotlib.pyplot as plt #for plotting
from scipy.io import wavfile #for reading audio file
import numpy as np #for general numerical processing

#reading a .wav file containing audio data.
#This is stereo data, so there's a left and right audio channel
samplerate, data = wavfile.read('trumpet_snippet.wav')

#creating wide figure
plt.figure(figsize=(18,6))

#defining number of samples we will explore
N = 3000

#calculating time of each sample, in seconds (sample i is taken at i/samplerate)
x = np.arange(N) / samplerate

#plotting channel 0
plt.subplot(2, 1, 1)
plt.plot(x,data[:N,0])

#plotting channel 1
plt.subplot(2, 1, 2)
plt.plot(x,data[:N,1])

#rendering
plt.show()

The left and right sound waves from a snippet of stereo trumpet music, in the time domain. The X axis
corresponds to time, in seconds, and the y axis corresponds to the amplitude of the signal, which controls
the location of a speaker diaphragm, generating sound. (Raw trumpet data from storyblocks.com)

Let's convert these waveforms to the frequency domain.

"""
Converting the sample waveform to the frequency domain, and plotting it

This is basically directly from the scipy documentation
https://docs.scipy.org/doc/scipy/tutorial/fft.html
"""

#importing dependencies
from scipy.fft import fft, fftfreq #for computing frequency information

#calculating the period, which is the amount of time between samples
T = 1/samplerate

#defining the number of samples to be used in the frequency calculation
N = 3000

#calculating the amplitudes and frequencies using fft
#fftfreq returns positive then negative frequencies; [:N//2] keeps the positive half
yf0 = fft(data[:N,0])
yf1 = fft(data[:N,1])
xf = fftfreq(N, T)[:N//2]

#creating wide figure
plt.figure(figsize=(18,6))

#plotting only frequency and amplitude for the 1st channel
#2.0/N rescales the raw fft magnitudes to wave amplitudes
plt.subplot(2, 1, 1)
plt.plot(xf, 2.0/N * np.abs(yf0[0:N//2]))
plt.xlim([0, 6000])

#plotting only frequency and amplitude for the 2nd channel
plt.subplot(2, 1, 2)
plt.plot(xf, 2.0/N * np.abs(yf1[0:N//2]))
plt.xlim([0, 6000])

#rendering
plt.show()

The frequency domain representation of the previously loaded trumpet audio. The X axis is the frequency (in
Hz, which is oscillations/second), and the y axis is the amplitude of the signal.

Just by visualizing this graph, a few insights can be made.

1. Both signals contain very similar frequency content, which makes sense
because they’re both from the same recording. Often stereo recordings are
recorded with two separate microphones simultaneously.

2. The dominant frequency is around 523Hz, which corresponds to a C5 note.

3. There is a lot of sympathetic resonance, in the form of harmonics, which can be
seen as spikes at frequencies that are integer multiples of the base frequency.
This trait is critical in making an instrument sound good and is the result of
various pieces of the instrument resonating at different frequencies, induced by
the primary vibration.

4. This is a very clear sound: the spikes are not muddled by a lot of unrelated
frequency content.

5. This is an organic sound. There is some frequency content which is not related
to the base frequency. This can be thought of as the timbre of the instrument
and makes it sound like a trumpet, rather than some other instrument
performing the same note.
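Insight 2 in the list above can also be checked in code. The sketch below is one possible way to do it: find the largest peak in the spectrum and map it to the nearest note, assuming equal temperament with A4 = 440 Hz. The helper names and the synthetic 523.25 Hz tone (standing in for the trumpet recording, which isn't available here) are assumptions.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def dominant_frequency(samples, samplerate):
    """Return the frequency (Hz) of the largest non-DC peak in the spectrum."""
    spectrum = np.abs(rfft(samples))
    freqs = rfftfreq(len(samples), d=1 / samplerate)
    peak = np.argmax(spectrum[1:]) + 1      # skip bin 0 (the constant offset)
    return float(freqs[peak])

def nearest_note(freq_hz):
    """Map a frequency to the nearest equal-tempered note, with A4 = 440 Hz."""
    midi = int(round(69 + 12 * np.log2(freq_hz / 440.0)))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"

# Stand-in for the trumpet recording: a 523.25 Hz tone plus two weaker harmonics
samplerate = 44100
t = np.arange(samplerate) / samplerate      # one second of samples
tone = (np.sin(2 * np.pi * 523.25 * t)
        + 0.5 * np.sin(2 * np.pi * 2 * 523.25 * t)
        + 0.25 * np.sin(2 * np.pi * 3 * 523.25 * t))

f0 = dominant_frequency(tone, samplerate)
print(f0, nearest_note(f0))
```

With one second of audio the frequency bins are 1 Hz apart, so the detected peak can only land within about half a hertz of the true pitch; a longer window would tighten that.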

In section 2 we’ll explore how the frequency domain is used commonly in time
series signal processing. In section 3 we’ll explore more advanced topics.

2) Common Uses of the Frequency Domain

2.1) De-trending and Signal Processing (Intermediate)


(Back To Table of Contents)

Let’s say you have an electrical system, and you want to understand the minute-by-
minute voltage changes in that system over the course of a day. You set up a voltage
meter, capture, and plot the voltage information over time.

Let’s say, for the purposes of this example, we only care about the
minute-by-minute data: we consider waves of too high a frequency to
be noise, and waves of too low a frequency to be a trend that we want to
ignore.


We don’t care about the long term trends which take place over the course of hours. We’re interested in
minute-by-minute data (raw data synthetically generated by the author)

We care about the trends going on around this time frame

We don’t care about waves which oscillate too quickly; these are considered noise in the signal

So, for this example, we only care about observing content which oscillates slower
than once per second, and faster than once every 5 minutes. We can convert our
data to the frequency domain, remove all but the frequencies we’re interested in
observing, then convert back to the time domain, so we can visualize the wave
including only the trends we’re interested in.

First, let’s observe the frequency domain unaltered:

"""
Plotting the entire frequency domain spectrogram for the mock electrical data
"""

#load electrical data, which is a numpy array of values taken at 1000Hz sampling
#(load_electrical_data is the author's helper; its definition is not shown here)
x, y = load_electrical_data()
samplerate = 1000
N = len(y)

#calculating the period, which is the amount of time between samples
T = 1/samplerate

#calculating the amplitudes and frequencies using fft
yf = fft(y)
xf = fftfreq(N, T)[:N//2]

#creating wide figure
plt.figure(figsize=(18,6))

#plotting only frequency and amplitude
plt.plot(xf, 2.0/N * np.abs(yf[0:N//2]))

#marking units of the two axes
plt.xlabel('fq (Frequency in Hz)')
plt.ylabel('V (Volts)')

#setting the vertical axis as logarithmic, for better visualization
plt.gca().set_yscale('log')

#rendering
plt.show()


This is the complete, unfiltered spectrogram for the electrical system we are analyzing

We can set all the frequency content we are not interested in to zero. Often you would use a
special filter, like a Butterworth filter, to do this, but we’ll keep it simple.

"""
converting the data to the frequency domain, and filtering out
unwanted frequencies
"""

#defining low frequency cutoff, in Hz (one oscillation every 5 minutes)
lowfq = 1/(5*60)

#defining high frequency cutoff, in Hz (one oscillation per second)
highfq = 1

#calculating the amplitudes and frequencies, preserving all information
#(positive and negative frequencies) so the inverse fft can work
yf = fft(y)
xf = fftfreq(N, T)

#applying naive filter, which will likely create some artifacts, but will
#filter out the data we don't want
yf[np.abs(xf) < lowfq] = 0
yf[np.abs(xf) > highfq] = 0

#creating wide figure
plt.figure(figsize=(18,6))

#plotting only frequency and amplitude
plt.plot(xf[:N//2], 2.0/N * np.abs(yf[0:N//2]))

#marking units of the two axes
plt.xlabel('fq (Frequency in Hz)')
plt.ylabel('V (Volts)')

#setting the vertical axis as logarithmic, for better visualization
plt.gca().set_yscale('log')

#zooming into the frequency range we care about
plt.xlim([-0.1, 1.1])

#rendering
plt.show()

The plot of the frequency domain we’re isolating, with all other frequency information set to zero

Now we can perform an inverse Fast Fourier Transform to reconstruct the wave,
including only the data we care about.

"""
Reconstructing the wave with the filtered frequency information
"""

#importing dependencies
from scipy.fft import ifft #for computing the inverse fourier transform

#computing the inverse fourier transform
#ifft returns complex values; the filter treats positive and negative
#frequencies identically, so the imaginary part is only rounding error
y_filt = ifft(yf).real

#creating wide figure
plt.figure(figsize=(18,6))

#plotting
plt.plot(x,y_filt)

#defining x and y axis
plt.xlabel('t (seconds)')
plt.ylabel('V (volts)')

#looking at a few minutes of data, not looking at
#the beginning or end of the data to avoid filtration artifacts
plt.xlim([60*2,60*10])

#rendering
plt.show()

A few minutes of data, with our filter enabled. We have removed excessively high frequency content, and
brought the wave to center around 0 by removing excessively low frequency content.
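Since the electrical data loader isn't shown, here is a self-contained sketch of the same filter-and-invert round trip on synthetic data. The function name `fft_bandpass`, the sample rate, and the three sine components are all assumptions made for illustration. It uses `rfft`/`irfft`, which handle real-valued signals directly and avoid the complex output of `ifft`.

```python
import numpy as np
from scipy.fft import rfft, irfft, rfftfreq

def fft_bandpass(y, samplerate, lowfq, highfq):
    """Zero every frequency outside [lowfq, highfq] Hz and return the filtered signal."""
    yf = rfft(y)
    xf = rfftfreq(len(y), d=1 / samplerate)
    yf[(xf < lowfq) | (xf > highfq)] = 0
    return irfft(yf, n=len(y))

# Synthetic stand-in for the voltage data (all values are assumptions)
samplerate = 20                                    # Hz
t = np.arange(3600 * samplerate) / samplerate      # one hour of samples
trend = 5.0 * np.sin(2 * np.pi * t / 3600)         # one slow cycle per hour
wanted = 1.0 * np.sin(2 * np.pi * t / 60)          # one cycle per minute
noise = 0.3 * np.sin(2 * np.pi * 5.0 * t)          # fast 5 Hz "noise"
y = trend + wanted + noise

# Same cutoffs as the article: slower than 1 Hz, faster than once per 5 minutes
y_filt = fft_bandpass(y, samplerate, lowfq=1 / (5 * 60), highfq=1.0)
print(np.max(np.abs(y_filt - wanted)))
```

The recovery is nearly exact here only because every synthetic component completes a whole number of cycles in the window. Real data will leak across frequency bins, which is one source of the edge artifacts the article avoids by not plotting the start and end of the signal.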

And that’s it. We have successfully removed high-frequency information we don’t
care about, and centered the data we do care about around zero by removing
low-frequency trends. We can now use this minute-by-minute data to home in on
understanding the electrical system we’re measuring.
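For comparison with the naive filter used in this section, here is a rough sketch of the Butterworth approach mentioned earlier, using `scipy.signal.butter` and `scipy.signal.sosfiltfilt`. The synthetic signal, the filter order, and the cutoff frequencies are assumptions chosen for the example, not the article's data or settings.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# Synthetic stand-in signal (all values are assumptions)
samplerate = 50                                   # Hz
t = np.arange(600 * samplerate) / samplerate      # ten minutes of samples
trend = 5.0 * np.sin(2 * np.pi * 0.002 * t)       # slow drift
wanted = 1.0 * np.sin(2 * np.pi * 0.1 * t)        # content we want to keep
noise = 0.5 * np.sin(2 * np.pi * 10.0 * t)        # fast "noise"
y = trend + wanted + noise

# 2nd-order Butterworth band-pass between 0.02 Hz and 1 Hz.
# Second-order sections (sos) are numerically safer than (b, a) coefficients
# when the cutoffs are tiny compared with the sample rate.
sos = butter(2, [0.02, 1.0], btype="bandpass", fs=samplerate, output="sos")

# sosfiltfilt runs the filter forwards and backwards, so there is no phase shift
y_filt = sosfiltfilt(sos, y)

# Compare against the wanted component away from the edges
middle = slice(len(t) // 4, 3 * len(t) // 4)
print(np.corrcoef(y_filt[middle], wanted[middle])[0, 1])
```

Unlike zeroing FFT bins, a Butterworth filter rolls off gradually around its cutoffs, which tends to produce fewer ringing artifacts at the cost of a less sharp frequency boundary.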

2.2) Vibration Analysis (Advanced)


(Back To Table of Contents)

I covered vibration analysis in a previous example in the form of analyzing a sound
wave. In this example, I’ll discuss analyzing vibrations in physical systems, like a
motor in a factory.

It can be difficult to predict when certain motors require maintenance. Often,
simple issues like a misalignment can cascade into much more severe issues, like a
complete engine failure. We can use frequency recordings, collected periodically
over time, to help us understand when a motor is operating differently, allowing us
to diagnose issues within an engine before they cascade into larger issues.


Vibration data taken over a period of time where the engine experienced a minor failure. In the time domain
it’s virtually impossible to see the time of failure. (raw data synthetically generated by the author)

To analyze this data, we will compute and render what is called a mel spectrogram.
Instead of computing the frequency content across the entire waveform, as we did in
the previous examples, we extract the frequency content from small rolling windows
of the signal. This allows us to plot how the frequency content changes over time.
The “mel” part refers to the mel scale, a perceptual pitch scale used to space the
frequency axis.

"""
plotting a mel-spectrogram of motor vibration to diagnose the point of failure

note: if you don't want to use librosa, you can construct a plain (non-mel)
spectrogram using scipy's fft function across a rolling window, allowing for
more granular calculation, and matplotlib's imshow function for more granular
rendering
"""

#importing dependencies
import librosa #for calculating the mel-spectrogram
import librosa.display #for plotting the mel spectrogram

#loading the motor data
#(load_motor_data is the author's helper; its definition is not shown here)
y = load_motor_data()
samplerate = 1000 #in Hz

#calculating the mel spectrogram, as per the librosa documentation
D = np.abs(librosa.stft(y))**2
S = librosa.feature.melspectrogram(S=D, sr=samplerate)

#creating wide figure
fig = plt.figure(figsize=(18,6))

#plotting the mel spectrogram
#fmax is set to samplerate/2 (the Nyquist frequency), which is the highest
#frequency melspectrogram uses by default, so the axis labels match the data
ax = fig.subplots()
S_dB = librosa.power_to_db(S, ref=np.max)
img = librosa.display.specshow(S_dB, x_axis='time',
                               y_axis='mel', sr=samplerate,
                               fmax=samplerate/2, ax=ax)
fig.colorbar(img, ax=ax, format='%+2.0f dB')
ax.set(title='Mel-frequency spectrogram')

#rendering
plt.show()

A mel spectrogram of the motor data. Instead of a 2d frequency spectrogram, mel spectrograms are 3d:
the vertical axis is the frequency of oscillation, the x axis is the time (in this case a percentage of time) at which the
frequency content was calculated, and the color represents amplitude, which is measured in a unit called
decibels. Note that at time 0.2, the frequency content of the motor suddenly changes.

In a Mel Spectrogram, each vertical slice represents a region of time, with high-
frequency content being shown higher up, and low-frequency content being shown
lower down in the plot. It’s easy to see that at time 0.2 (20% through our data), the
frequency content changed dramatically. At this point a balancing weight became
loose, causing the engine to become unbalanced. Maintenance at this point may
save the engine from excess wear in the future.
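If you'd rather not depend on librosa, a plain (non-mel) rolling-window spectrogram can be sketched with `scipy.signal.spectrogram`. The synthetic signal below, a 50 Hz vibration that jumps to 120 Hz a fifth of the way through, is an assumption standing in for the motor data, which isn't available here.

```python
import numpy as np
from scipy.signal import spectrogram

# Synthetic stand-in for the motor data (all values are assumptions):
# a 50 Hz vibration that jumps to 120 Hz 20% of the way through
samplerate = 1000                                   # Hz
t = np.arange(10 * samplerate) / samplerate         # ten seconds
change = int(0.2 * len(t))
y = np.where(np.arange(len(t)) < change,
             np.sin(2 * np.pi * 50 * t),
             np.sin(2 * np.pi * 120 * t))

# Frequency content of short windows:
# f = frequencies (Hz), times = window centres (s), Sxx[i, j] = power at f[i], times[j]
f, times, Sxx = spectrogram(y, fs=samplerate, nperseg=256)

# Strongest frequency in each window, which shows when the content changes
dominant = f[np.argmax(Sxx, axis=0)]
change_time = times[dominant > 85][0]
print(dominant[0], dominant[-1], change_time)
```

`Sxx` can be passed to matplotlib's `imshow` or `pcolormesh` to get an image like the one above, with linear rather than mel-spaced frequencies.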

A simple yet effective way to employ this principle is with scheduled vibration
readings. A worker sticks an accelerometer on the body of a motor with a magnet
and records the vibration once or twice a month. Those windows of
vibration data are then converted to the frequency domain, where certain key
features are extracted. A common extracted feature is band power, which is
essentially the area under the power spectral density curve over certain regions
of frequencies. Extracted features can be plotted over
several weeks of recordings and used as a proxy for overall motor health.
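One way such a feature might be computed is sketched below, using `scipy.signal.welch` to estimate the power spectral density and summing it over a frequency band. The function name `band_power`, the band limits, and the two synthetic recordings are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import welch

def band_power(y, samplerate, f_lo, f_hi):
    """Approximate power between f_lo and f_hi (Hz) as the area under the PSD."""
    freqs, psd = welch(y, fs=samplerate, nperseg=1024)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    df = freqs[1] - freqs[0]
    return float(np.sum(psd[band]) * df)     # rectangle-rule integral

# Two hypothetical vibration recordings: healthy, then with an extra 120 Hz component
samplerate = 1000
t = np.arange(5 * samplerate) / samplerate
healthy = np.sin(2 * np.pi * 50 * t)
faulty = healthy + 0.5 * np.sin(2 * np.pi * 120 * t)

p_healthy = band_power(healthy, samplerate, 100, 140)
p_faulty = band_power(faulty, samplerate, 100, 140)
print(p_healthy, p_faulty)
```

Tracking a handful of band powers like this across monthly readings gives a compact time series per motor, and a jump in one band is the kind of change the mel spectrogram made visible above.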

3) Advanced Uses of the Frequency Domain

3.1) Data Augmentation (Advanced)


(Back To Table of Contents)

Data augmentation is the process of creating fake data from real data. The
quintessential example is image classification, where augmented copies of existing
images bolster a data set for classifying whether images are of a dog or a cat.

Example of image augmentation, where a single image can be used to generate multiple images for a
machine learning model to learn from. Created with Affinity Designer 2, stock photo from storyblocks.com

Augmentation can be an incredibly powerful tool, but what if you don’t have
images? What if you have sound, motion, temperature, or some other signal? How
can one sensibly augment these types of data? In the time domain, augmentation
strategies look a lot more like regularization strategies: add a bit of noise here, and
shift the data up or down there. They add random information to data, which can be
useful, but they don’t really make new examples.
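For reference, the two time-domain strategies mentioned above might look something like this. The function names, noise scale, and shift range are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seeded so the sketch is repeatable

def add_noise(y, scale=0.05):
    """Add Gaussian noise whose standard deviation is a fraction of the signal's."""
    return y + rng.normal(0.0, scale * np.std(y), size=y.shape)

def shift_offset(y, max_shift=0.1):
    """Move the whole signal up or down by a random constant."""
    return y + rng.uniform(-max_shift, max_shift)

# Hypothetical one-second signal
t = np.linspace(0, 1, 500, endpoint=False)
y = np.sin(2 * np.pi * 5 * t)

augmented = [add_noise(y), shift_offset(y)]
print([a.shape for a in augmented])
```

Both outputs are still recognisably the same wave, which is the limitation being described: they perturb an example rather than produce a new one.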

We can steal something from the music production scene: a wavetable. The idea
behind a wavetable is to convert two waves to the frequency domain, interpolate
between the two in the frequency domain, then convert the interpolation back to
the time domain. I don’t mean blending, where you overlay one signal over the
other, but making a completely new wave which contains frequency content from
two (or more) other waves.

Let’s imagine we’re trying to build a model to detect if people are talking or not in an
audio snippet. We have a bunch of samples of audio where people are talking, and a
bunch of samples where people aren’t, both in a variety of situations. This data
requires someone to go out with a collection of different microphones and capture
sounds, and then manually flag if the data contains someone talking or not, in a
variety of situations. Let’s say the model has to be very robust, and very accurate, and
recording sufficient data to reach desired performance levels is not financially
feasible.

In theory, the thing that makes human speech sound the way it does is frequency
content. A blend of frequency content from one snippet of talking and another
snippet of talking should still sound like someone talking. We can use a wavetable
to construct these artificial waves, thus making more data for free (besides a data
scientist's salary and big old expensive computing resources on the cloud).

"""
loading and plotting two waveforms recorded in two separate environments,
both including people talking
"""

#loading two waveforms
#(the second sample rate is discarded, so both files are assumed to share one)
samplerate, y1 = wavfile.read('crowd.wav')
_, y2 = wavfile.read('citycenter.wav')

#creating x axis for both waveforms (time of each sample, in seconds)
N = 1000000
x1 = np.arange(N) / samplerate
x2 = np.arange(N) / samplerate

#creating wide figure
plt.figure(figsize=(18,6))

#skipping the first 1,000,000 samples of each recording
offset = 1000000

#plotting waveform 1
plt.subplot(2, 1, 1)
plt.plot(x1,y1[offset:offset+N])
plt.xlabel('t (seconds)')
plt.ylabel('A (amplitude)')

#plotting waveform 2
plt.subplot(2, 1, 2)
plt.plot(x2, y2[offset:offset+N])
plt.xlabel('t (seconds)')
plt.ylabel('A (amplitude)')

#rendering
plt.show()

Two waveforms, both prominently including people talking. (raw data from storyblocks.com)

We can convert both of these waves to the frequency domain, and create several
frequency representations which are interpolations between the two waves.

"""
Converting both waves to the frequency domain, constructing a wave table, and r
"""

#calculating the frequency content for both waves.
#Only analyzing 1 of the 2 stereo channels
fq1 = fft(y1[offset:offset+N,0])
fq2 = fft(y2[offset:offset+N,0])

#defining frequency axis
T = 1/samplerate
xf = fftfreq(N, T)

#creating wide figure with a 3d axis
fig = plt.figure(figsize=(18,6))
ax = fig.add_subplot(111, projection='3d')

#plotting source waves
ax.plot(xf[:N//2], np.full(N//2, 1.0), 2.0/N * np.abs(fq1[0:N//2]))
ax.plot(xf[:N//2], np.full(N//2, 0.0), 2.0/N * np.abs(fq2[0:N//2]))

fq_interp = []
#creating interpolations
#note: this is a weighted sum of the complex fft values; because the fft is
#linear, its inverse is the same weighted sum of the two time domain waves
for per in np.linspace(0.1,0.9,9):
    thisfq = (fq1*per) + (fq2*(1-per))
    fq_interp.append((per, thisfq))

    #plotting interpolation
    ax.plot(xf[:N//2], np.full(N//2, per), 2.0/N * np.abs(thisfq[0:N//2]))

plt.show()

Frequency spectrograms for both the original waveforms (at the extremes) and the interpolated waveforms in the middle.
Note that the plot shows the spectrogram as frequency vs amplitude, but the interpolation is applied to the
complex values, so it affects phase as well.

We can now compute the inverse Fast Fourier Transform on all of these interpolated
frequency domains, and extract our table of waves.


"""
Computing the inverse fft on the frequency content, and constructing the final
"""

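#creating wide figure with a 3d axis
fig = plt.figure(figsize=(18,6))
ax = fig.add_subplot(111, projection='3d')

#time axis for the N samples being analyzed (x1 was defined when loading the data)
x = x1

#plotting source waves
ax.plot(x, np.full(len(x), 1.0), y1[offset:offset+N,0])
ax.plot(x, np.full(len(x), 0.0), y2[offset:offset+N,0])

#creating interpolations
for per, interp in fq_interp:

    #the inputs were real, so the imaginary part of the inverse is only rounding error
    waveform = ifft(interp).real

    ax.plot(x, np.full(len(x), per), waveform)

plt.show()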

The final wave table. The extreme waves are the source waves, while the ones in between are interpolations
in the frequency domain.

And there we go. From 2 waves of people talking, we now have 11: the 2 originals
plus 9 interpolations. Data augmentation can be a tricky task, as you can easily create data which
is not actually indicative of the data you’re trying to emulate. When employing a
similar augmentation strategy, you can use augmentations which are closer to the
source waves (80% one wave, 20% another). These will be more likely to be realistic
than waves closer to the center (50%, 50%).
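One caveat worth checking before relying on this: the Fourier transform is linear, so a weighted sum of two complex spectra inverts to the same weighted sum of the two time-domain signals. The sketch below verifies that on random stand-in data, then shows one possible alternative that interpolates magnitude and phase separately and so produces a genuinely different wave. The alternative is an illustration under stated assumptions, not the wavetable code above, and whether its output sounds realistic would need to be judged by ear.

```python
import numpy as np
from scipy.fft import fft, ifft

rng = np.random.default_rng(seed=0)
y1 = rng.normal(size=1024)      # stand-ins for two recordings (assumed data)
y2 = rng.normal(size=1024)
fq1, fq2 = fft(y1), fft(y2)
per = 0.3

# Weighted sum of the complex spectra, as in the wavetable code above
linear = ifft(fq1 * per + fq2 * (1 - per)).real
# Because the FFT is linear, this equals the same weighted sum in the time domain
same_as_mix = np.allclose(linear, y1 * per + y2 * (1 - per))

# Alternative: interpolate magnitude and phase separately.
# Phase is interpolated along the shortest arc between the two angles.
mag = np.abs(fq1) * per + np.abs(fq2) * (1 - per)
dphi = np.angle(fq1 * np.conj(fq2))              # phase of fq1 relative to fq2
phase = np.angle(fq2) + per * dphi
polar = ifft(mag * np.exp(1j * phase)).real      # .real drops any imaginary residue

print(same_as_mix, np.allclose(polar, linear))
```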

3.2) Embedding and Clustering (Advanced)


(Back To Table of Contents)

For this example, we’ll use the output from a sentiment analysis model to cluster
different products based on their customer sentiment over time. Let’s say we run a
store with reviews, and those reviews fluctuate between positive and negative. We
notice that the reviews for some products correlate with one another. We want to find
products which have similar sentiment trends, such that they can be
grouped together and further understood.

First, let’s look at our data:

"""
loading 1000 average sentiment scores over the course of a year,
and plotting the first 10 of them
"""

#loading sentiment data
sentiments = load_sentiments()

#creating wide figure
plt.figure(figsize=(18,6))

#plotting first 10 sentiments
for i in range(10):
    plt.plot(sentiments[i])

#rendering
plt.xlabel('days')
plt.ylabel('sentiment (low to high)')
plt.show()

first 10 examples of sentiment (data synthetically generated by the author)
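The article does not show how `load_sentiments` is implemented. To follow along, a stand-in could look something like the sketch below. Everything in it is an assumption made for illustration: the parameters, the set of oscillation rates, and the noise level are invented, and it is not the author's generator.

```python
import numpy as np

def load_sentiments(n_products=1000, n_days=365, seed=0):
    """Hypothetical stand-in: one noisy daily sentiment series per product."""
    rng = np.random.default_rng(seed)
    days = np.arange(n_days)

    # each product oscillates at one of a few rates (cycles per year)
    cycles = rng.choice([2, 4, 6, 12], size=n_products)
    phases = rng.uniform(0, 2 * np.pi, size=n_products)

    # each product also has its own average sentiment level
    offsets = rng.uniform(-0.5, 0.5, size=n_products)

    waves = np.sin(2 * np.pi * cycles[:, None] * days[None, :] / n_days
                   + phases[:, None])
    noise = rng.normal(scale=0.3, size=(n_products, n_days))
    return offsets[:, None] + waves + noise

sentiments = load_sentiments()
```

The result is a 2D array with one row per product and one column per day, so `sentiments[i]` is a single series, matching how the data is indexed in the surrounding code.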

As you can see, we have many examples of user sentiment, averaged on a per-day
basis. We can remove the very low-frequency content, which captures very long-term
trends (like the overall average), and the very high-frequency
content, which is mostly noise and is unlikely to create useful clusters.

"""
Converting to the frequency domain, removing very low and high frequency conten
We do this, so we can visually understand the frequency content which we deem i
"""

#importing dependencies
from scipy.fft import fft, ifft, fftfreq #for computing frequency information

#creating wide figure
plt.figure(figsize=(18,6))

#defining the low frequency and high frequency cutoffs, in cycles per day
#because lowfq is so low, it effectively only cuts off the wave
#with a frequency of zero, which controls the vertical offset of the data
lowfq = 0.0001
highfq = 0.05

#plotting first 10 sentiments
for i in range(10):

    #getting signal
    sig = sentiments[i]

    #calculating the frequency domain
    yf = fft(sig)
    T = 1 #sample spacing: one sample per day
    N = len(sig)
    xf = fftfreq(N, T)

    #applying naive filter
    yf[np.abs(xf) < lowfq] = 0
    yf[np.abs(xf) > highfq] = 0

    #converting back to the time domain, and plotting
    #(taking the real part, as the inverse fft returns complex values)
    y = ifft(yf).real
    plt.plot(y)

#rendering
plt.show()

Ultimately, we will be clustering data in the frequency domain. We generate this plot just so we can confirm
that we’re preserving the type of content we care about: not too low frequency, and not too high frequency.
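To make the cutoffs concrete, the short check below lists which frequency bins survive the filter. It assumes each series has 365 daily samples (the data is described as covering a year, but the exact length is not stated), and reuses the cutoff values from the code above.

```python
import numpy as np
from scipy.fft import fftfreq

N = 365          # assumed number of daily samples per series
lowfq = 0.0001   # cycles per day
highfq = 0.05    # cycles per day

# frequency (in cycles per day) of every fft bin
xf = fftfreq(N, 1)

# bins which are NOT zeroed out by the naive filter
kept = np.flatnonzero((np.abs(xf) >= lowfq) & (np.abs(xf) <= highfq))
positive_bins = kept[xf[kept] > 0]

# the fastest oscillation which survives, expressed as a period in days
shortest_period = 1 / xf[positive_bins].max()
```

Under these assumptions, the filter keeps positive bins 1 through 18 (plus their negative-frequency mirrors), which correspond to oscillations with periods from one year down to roughly 20 days. Bin 0, the vertical offset, is removed.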

Now we’re done with the time domain, and will begin working on building up our
clustering in the frequency domain. Let’s look at our filtered frequency domain
plots:

The frequency content used to construct the waves above

The input to our clustering operation will be a list of amplitudes, each of which
corresponds to a specific frequency. We could feed this data to our clustering
algorithm, but there is an additional step which can create significant
improvements. Imagine we are trying to cluster four simple sin waves, with
frequency domain content which looks like this:

a representation of four sin waves, plotted in the frequency domain, for demonstrative purposes

You would expect the waves on the left to cluster together, and the waves on the
right to cluster closely together. However, the vectors which describe this data look
like this:

[0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0]
[0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0]
[0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0]
[0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0]

From the perspective of t-SNE, all of these waves are equally far from one another:
the vectors are mutually orthogonal, as no two of them share a non-zero value along
the same axis. We can get around this issue by making the frequency domain “fuzzy”;
we can apply a moving average to this data such that frequency content bleeds into
adjacent regions.

our sample data, with an exponential moving average applied in both directions, causing similar frequency
content to bleed into one another.
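The effect of this “fuzzy” step can be checked numerically on the four example vectors. The sketch below is illustrative only: the helper names are made up, and Euclidean distance is used as a simple proxy for how a clustering or embedding algorithm sees the data. The smoothing uses the same pandas `ewm` call as the rest of this section.

```python
import numpy as np
import pandas as pd

# the four one-hot "spectra" from above, with spikes at bins 2, 4, 11 and 12
spikes = [2, 4, 11, 12]
vectors = np.zeros((4, 16))
for row, k in enumerate(spikes):
    vectors[row, k] = 1.0

def pairwise_distances(m):
    """Euclidean distance between every pair of rows."""
    diff = m[:, None, :] - m[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def smooth_both_directions(m, span=3):
    """Exponential moving average along the frequency axis, forward plus backward."""
    df = pd.DataFrame(m.T)  # rows are frequency bins, columns are signals
    forward = df.ewm(span=span, adjust=False).mean()
    backward = df.iloc[::-1].ewm(span=span, adjust=False).mean().iloc[::-1]
    return forward.add(backward).to_numpy().T

raw_dist = pairwise_distances(vectors)
smooth_dist = pairwise_distances(smooth_both_directions(vectors))
```

Before smoothing, every pair of vectors is exactly the same distance apart, so there is nothing to cluster on. After smoothing, the two low-frequency waves are closer to each other than to either high-frequency wave, and vice versa.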

This data is significantly more likely to yield good clustering results, as similar
frequency content spikes are more apt to bleed into one another. Let’s apply this
concept to our sample plot of sentiment data:

"""
Converting data to the frequency domain, and applying an exponential
moving average in both directions. This is the data we will be clustering.
"""

#importing dependencies
import pandas as pd

#converting data to frequencies, and filtering out content
#(keeping the amplitudes of frequency bins 1 through 19)
wfs = [np.abs(fft(y)[0:N//2][1:20]) for y in sentiments[:10]]

#loading sample data into a pandas dataframe
#(after transposing, each column is one product and each row is a frequency bin)
df = pd.DataFrame(wfs).T

#applying an exponential moving average in both directions, and adding them
df_plt = df.iloc[::-1].ewm(span=3, adjust=False).mean().iloc[::-1]
df_plt = df.ewm(span=3, adjust=False).mean().add(df_plt)

#creating wide figure
plt.figure(figsize=(18,6))
plt.plot(df_plt)

Filtered amplitude over frequency data, for clustering. Keep in mind, there are numerous changes that can
be made to this general approach. Different high and low frequencies can be used, different spans of the
exponential average can be used, the frequency domain can be normalized such that relative amplitudes are
similar, etc.

As a result of our processing steps, this data is significantly more likely to create
clusters of data we actually care about. Now we can tie all this together, and create
our final clustering:

"""
Converting all the sentiment waveforms to the frequency domain,
applying filtration, and embedding in 2d with TSNE
"""

#importing dependencies
from sklearn.manifold import TSNE

#converting data to frequencies, and filtering out content, for al product sent
wfs = [np.abs(fft(y)[0:N//2][1:20]) for y in sentiments]

#loading sample data into a pandas dataframe
df = pd.DataFrame(wfs).T

#applying an exponential moving average in both directions, and adding them
df_plt = df.iloc[::-1].ewm(span=3, adjust=False).mean().iloc[::-1]
df_plt = df.ewm(span=3, adjust=False).mean().add(df_plt)

#creating wide figure
plt.figure(figsize=(18,6))

#embedding the data
embedding = TSNE(n_components=2 ,init='random', perplexity=20).fit_transform(df

#plotting
plt.scatter(embedding[:,0],embedding[:,1])

t-SNE plot of the filtered frequency domain for all user sentiment product reviews.
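For readers who want to run the whole pipeline, here is a self-contained sketch on a small synthetic data set. The data (60 noisy series oscillating at one of three made-up rates) is a hypothetical stand-in for the sentiment data, and passing the features to t-SNE with one row per product is an assumption about the intended layout; `random_state` is added only to make the sketch repeatable.

```python
import numpy as np
import pandas as pd
from scipy.fft import fft
from sklearn.manifold import TSNE

# synthetic stand-in for the sentiment data (hypothetical, for illustration)
rng = np.random.default_rng(0)
n_products, n_days = 60, 365
days = np.arange(n_days)
cycles = rng.choice([3, 8, 15], size=n_products)
phases = rng.uniform(0, 2 * np.pi, size=n_products)
signals = np.sin(2 * np.pi * cycles[:, None] * days / n_days + phases[:, None])
signals += rng.normal(scale=0.3, size=signals.shape)

# amplitude of frequency bins 1 through 19 for every product
wfs = [np.abs(fft(y)[0:n_days // 2][1:20]) for y in signals]

# rows are frequency bins, columns are products
df = pd.DataFrame(wfs).T

# exponential moving average along the frequency axis, in both directions
backward = df.iloc[::-1].ewm(span=3, adjust=False).mean().iloc[::-1]
features = df.ewm(span=3, adjust=False).mean().add(backward)

# t-SNE expects one row per sample, so the frame is transposed back
# (assumption: each product is one sample, each frequency bin one feature)
embedding = TSNE(n_components=2, init='random', perplexity=20,
                 random_state=0).fit_transform(features.T.to_numpy())
```

`embedding` has one 2D point per product, which can be passed to `plt.scatter` as in the code above. Note that `perplexity` must be smaller than the number of samples.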

And that’s it! Naturally, for a practical application, a lot of work has to be done after
this graph is generated. Likely, these clouds of data would have to be explored, and
potentially labeled, and further refinement of key parameters would have to be
done to gain further insights. For this example, though, we have used the frequency
domain to apply a clustering algorithm to time series data, allowing us to see which
sentiments oscillate in similar ways. This type of analysis could inform product
recommendations within a website, for instance.

3.3) Compression (Intermediate)


(Back To Table of Contents)

Signals contain a lot of data. Sampling at 96,000 samples per second for a few hours
yields massive audio files. These raw recordings are useful for high-quality audio
processing, but when you’re done and want to send a sample to a friend, you’re
willing to sacrifice a bit of audio quality for speed and size. You can down-sample to
a point (send fewer samples per second); however, that will limit the maximum
frequency you can send (if you’re only sending 200 samples/second,
you can’t send any frequency higher than 100 Hz). Instead, you can convert your
sample to the frequency domain, compress similar frequencies together, then send
the frequency domain along with the sampling rate. The recipient can then rebuild
the compressed audio via a transform from the frequency domain to the time
domain. This allows you to send arbitrarily high frequencies without needing to
send an arbitrarily large amount of data. The reason mp3 files, for instance, are so
much smaller than .wav files is that they use a Fourier-related transform (the
modified discrete cosine transform) prominently in their encoding.
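A toy version of this idea is sketched below. It is not how mp3 works; it simply keeps the few largest frequency components and discards the rest. The `compress` and `decompress` helpers and the test signal are made up for the example.

```python
import numpy as np
from scipy.fft import rfft, irfft

def compress(signal, keep):
    """Keep only the `keep` largest frequency components (hypothetical helper)."""
    spectrum = rfft(signal)
    idx = np.argsort(np.abs(spectrum))[-keep:]
    return idx, spectrum[idx], len(signal)

def decompress(idx, coeffs, n):
    """Rebuild a time domain signal from the components which were kept."""
    spectrum = np.zeros(n // 2 + 1, dtype=complex)
    spectrum[idx] = coeffs
    return irfft(spectrum, n=n)

# one second of audio-like data: a 50 Hz tone plus a quieter 120 Hz tone
sample_rate = 1000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# send 2 complex coefficients (plus their indices) instead of 1000 samples
idx, coeffs, n = compress(signal, keep=2)
restored = decompress(idx, coeffs, n)
```

This signal is built from exactly two tones, so two coefficients are enough to rebuild it almost perfectly. Real audio needs far more components, and practical codecs add further tricks on top, but the principle of trading a little fidelity for a much smaller payload is the same.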

4) Conceptual Takeaways for Data Scientists


(Back To Table of Contents)

Using frequency analysis directly as a tool can be vital for solving certain problems,
as we’ve seen in previous examples. What often goes unappreciated is the usage of
the frequency domain as a concept. As a data scientist, it might be difficult to wrap
your brain around self-similar modeling strategies like recurrent and convolutional
networks, especially when solving specific, subtle problems. Sometimes, thinking of
these problems as a quasi-frequency domain extraction can be more useful.

Convolutional networks, for instance, use wavelet-like kernels (convolutions) that
propagate over data. The result then gets pooled, reducing the resolution of the data,
and further kernels get applied. You can think of convolutions as extracting varying
frequencies of information, often progressing from high-frequency information to
low-frequency information. Keeping this in mind can lead to a more intuitive
understanding of stride, kernel size, and other hyperparameters.
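One way to see the connection between convolutions and frequency is to look at which frequencies a kernel lets through. The sketch below uses two hand-picked kernels (an averaging kernel and a differencing kernel, both chosen for illustration rather than learned by a network) and computes their frequency response with a zero-padded FFT.

```python
import numpy as np
from scipy.fft import rfft

n = 256  # number of points used to evaluate the frequency response

# averaging kernel: smooths the data, so it should favor low frequencies
smooth_kernel = np.ones(5) / 5

# differencing kernel: responds to change, so it should favor high frequencies
edge_kernel = np.array([1.0, -1.0])

def frequency_response(kernel, n):
    """Gain applied by the kernel at each frequency, from 0 up to Nyquist."""
    return np.abs(rfft(kernel, n=n))

smooth_response = frequency_response(smooth_kernel, n)
edge_response = frequency_response(edge_kernel, n)
```

The first entry of each response is the gain at zero frequency and the last entry is the gain at the highest representable frequency. The averaging kernel passes slow trends and suppresses fast wiggles, while the differencing kernel does the opposite, which is one way to build intuition for what different kernels in a convolutional network pick up.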

5) Summary
(Back To Table of Contents)

In this article we covered the frequency domain, how it relates to signals and sin
waves, and saw a few examples of frequency domain representations. We saw how a
time-series signal can be converted to the frequency domain, and vice versa, and
saw several examples of how, by converting to the frequency domain, several
classes of problems can be solved.

Follow For More!


In a future post, I’ll describe how the frequency domain can be applied to higher
dimensional signals, like images and video, and how that can be used to great effect
in machine learning/data science applications. I’ll also be describing several
landmark papers in the ML space, with an emphasis on practical and intuitive
explanations.

Attribution: All of the images in this document were created by Daniel Warfield. You
can use any images in this post for your own non-commercial purposes, so long as
you reference this article, https://danielwarfield.dev, or both.

P.S. — Join me on RoundtableML


RoundtableML is a vibrant community where ambitious and driven individuals
come together to collaborate and push the boundaries of ML and AI applications in a
safe and responsible way. If you're eager to expand your knowledge of ML, engage in
open research, dive into scientific papers, and work on ML projects within small,
intimate groups, this is the place for you!

You can join using this discord invite.

