Audio Signal Processing
[Diagram: Input Signal → Audio Signal Processing → Output Signal (data with meaning)]
Audio Processing in HCI
Some HCI applications involving audio signal
processing are:
• Speech Emotion Recognition
• Speaker Recognition
▫ Speaker Verification
▫ Speaker Identification
• Voice Commands
• Speech to Text
• Etc.
Audio Signals
You can find audio signals represented in either
digital or analog format.
Analog Signal → Sample & Hold → Quantization → Encoding → Digital Signal
• Analog Signal: continuous in time, continuous in amplitude
• After Sample & Hold: discrete in time, continuous in amplitude
• After Quantization: discrete in time, discrete in amplitude
• After Encoding: discrete in time, discrete in amplitude
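The conversion chain above can be sketched in a few lines of Python. This is a minimal illustration and not part of the original slides: the 1 kHz sine input, the 8 kHz sample rate, and the helper names `sample` and `quantize` are assumptions chosen for the example.

```python
import math

def sample(signal, sample_rate, duration):
    """Sample & hold: evaluate a continuous-time function at t = n / sample_rate."""
    num_samples = int(round(sample_rate * duration))
    return [signal(n / sample_rate) for n in range(num_samples)]

def quantize(samples, bits_per_sample):
    """Quantization + encoding: map amplitudes in [-1, 1] to signed integer codes."""
    max_code = 2 ** (bits_per_sample - 1) - 1   # e.g. 32767 for 16 bits
    codes = []
    for x in samples:
        x = max(-1.0, min(1.0, x))              # clip to the valid range
        codes.append(int(round(x * max_code)))
    return codes

# Hypothetical input: a 1 kHz sine, sampled at 8 kHz for 1 ms (8 samples).
analog = lambda t: math.sin(2 * math.pi * 1000 * t)
discrete_time = sample(analog, sample_rate=8000, duration=0.001)
digital = quantize(discrete_time, bits_per_sample=16)
print(digital)
```

After `sample` the values are discrete in time but still real-valued; only after `quantize` are they also discrete in amplitude.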
1) $\text{ByteRate} = \text{SampleRate} \cdot \text{NumChannels} \cdot \frac{\text{BitsPerSample}}{8}$
2) $\text{BlockAlign} = \text{NumChannels} \cdot \frac{\text{BitsPerSample}}{8}$
Waveform Audio File Format (WAV)
Endianness  Byte Offset  Field Name      Field Size (bytes)  Section
Big         0            ChunkID         4                   RIFF Chunk Descriptor
Little      4            ChunkSize       4                   RIFF Chunk Descriptor
Big         8            Format          4                   RIFF Chunk Descriptor
Big         12           SubChunk1ID     4                   Format SubChunk
Little      16           SubChunk1Size   4                   Format SubChunk
Little      20           AudioFormat     2                   Format SubChunk
Little      22           NumChannels     2                   Format SubChunk
Little      24           SampleRate      4                   Format SubChunk
Little      28           ByteRate        4                   Format SubChunk
Little      32           BlockAlign      2                   Format SubChunk
Little      34           BitsPerSample   2                   Format SubChunk
Big         36           SubChunk2ID     4                   Data SubChunk
Little      40           SubChunk2Size   4                   Data SubChunk
Little      44           Data            SubChunk2Size       Data SubChunk

Field descriptions:
• BitsPerSample: 8 bits = 8, 16 bits = 16, etc.
• SubChunk2ID: contains the letters «data» in ASCII form (0x64617461 in big-endian form).
• SubChunk2Size: the number of bytes in the Data field. If AudioFormat=PCM, then you can compute the number of samples (see Equation 3).
3) $\text{NumOfSamples} = \frac{8 \cdot \text{SubChunk2Size}}{\text{NumChannels} \cdot \text{BitsPerSample}}$
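As a quick check of Equations 1 to 3, the sketch below computes the derived header fields for a made-up file. The function name `wav_derived_fields` and the example values (one second of 16-bit stereo audio at 44.1 kHz) are assumptions for illustration; the code also assumes that BitsPerSample is a multiple of 8.

```python
def wav_derived_fields(sample_rate, num_channels, bits_per_sample, subchunk2_size):
    """Apply Equations 1-3 to the values stored in a PCM WAV header."""
    bytes_per_sample = bits_per_sample // 8      # assumes a multiple of 8 bits
    byte_rate = sample_rate * num_channels * bytes_per_sample   # Equation 1
    block_align = num_channels * bytes_per_sample               # Equation 2
    # Equation 3: number of samples per channel stored in the Data field
    num_samples = (8 * subchunk2_size) // (num_channels * bits_per_sample)
    return byte_rate, block_align, num_samples

# Hypothetical file: 1 second of 16-bit stereo audio at 44.1 kHz (176400 data bytes).
byte_rate, block_align, num_samples = wav_derived_fields(44100, 2, 16, 176400)
print(byte_rate, block_align, num_samples)
```

Note that Equation 3 counts samples per channel, so dividing the result by SampleRate gives the duration in seconds.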
Example of wave header
AudioFormat = 1 (PCM)
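Solution
#include <cstdint>
#include <fstream>

// Canonical 44-byte PCM WAV header. Every field is naturally aligned,
// so the compiler inserts no padding between members.
typedef struct header_file
{
    char chunk_id[4];          // "RIFF"
    int32_t chunk_size;
    char format[4];            // "WAVE"
    char subchunk1_id[4];      // "fmt "
    int32_t subchunk1_size;
    int16_t audio_format;      // 1 = PCM
    int16_t num_channels;
    int32_t sample_rate;
    int32_t byte_rate;
    int16_t block_align;
    int16_t bits_per_sample;
    char subchunk2_id[4];      // "data"
    int32_t subchunk2_size;    // number of bytes in the Data field
} header;

// Read the header in one call (assumes a little-endian host).
header meta;
std::ifstream infile;
infile.open("foo.wav", std::ios::in | std::ios::binary);
infile.read((char*)&meta, sizeof(header));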
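The same header read can be sketched in Python with the standard `struct` module, which makes the byte layout explicit. This is an illustrative sketch, not the slides' solution: the names `HEADER_FORMAT`, `FIELD_NAMES`, and `read_wav_header` are hypothetical, and the file is built in memory with the standard `wave` module instead of reading `foo.wav`. It assumes the canonical 44-byte layout; real files may carry extra chunks before the data chunk.

```python
import io
import struct
import wave

# '<' = little-endian without padding; 4s = 4-byte tag, I = uint32, H = uint16.
HEADER_FORMAT = "<4sI4s4sIHHIIHH4sI"
FIELD_NAMES = (
    "chunk_id", "chunk_size", "format",
    "subchunk1_id", "subchunk1_size", "audio_format", "num_channels",
    "sample_rate", "byte_rate", "block_align", "bits_per_sample",
    "subchunk2_id", "subchunk2_size",
)

def read_wav_header(stream):
    """Read the first 44 bytes of a binary stream and name each header field."""
    raw = stream.read(struct.calcsize(HEADER_FORMAT))
    return dict(zip(FIELD_NAMES, struct.unpack(HEADER_FORMAT, raw)))

# Build a tiny mono, 16-bit, 8 kHz WAV file in memory (hypothetical test data).
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)          # 2 bytes = 16 bits per sample
    writer.setframerate(8000)
    writer.writeframes(b"\x00\x00" * 100)

buffer.seek(0)
header = read_wav_header(buffer)
print(header)
```

The four-character tags are read as raw bytes, so they compare against `b"RIFF"`, `b"WAVE"`, `b"fmt "`, and `b"data"` regardless of host endianness.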
• Create AVFrame
▫ This structure describes decoded (raw) audio or
video data.
AVPacket packet;
av_init_packet(&packet);
…
AVFrame* frame = avcodec_alloc_frame();
A little bit of code …
Step 5
• Read packets
▫ Packets are read from the AVFormatContext
• Decode packets
▫ Frames are decoded with the AVCodecContext
// Read the packets in a loop
while (av_read_frame(formatContext, &packet) == 0)
{
…
avcodec_decode_audio4(codecContext, frame, &frameFinished, &packet);
…
src_data = frame->data[0];
}
Problems with FFmpeg
• Update issues (after a library update, your existing code might no longer work)
▫ Methods get deprecated;
▫ Function names or parameters can change.
• Poor documentation (still the case today)
Example of migration:
• avcodec_open (AVCodecContext *avctx, const AVCodec *codec)
• avcodec_open2 (AVCodecContext *avctx, const AVCodec *codec,
AVDictionary **options)
Audio Processing with Matlab
• Matlab provides many built-in functions to
read, play, manipulate, and save audio files.
• It also offers the Signal Processing Toolbox and
the DSP System Toolbox
filename = './test.wav';
[data,fs] = wavread(filename); % reads WAV files only (removed in newer Matlab releases; use audioread there)
% play file
sound(data,fs);
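% Split the signal into frames (one frame per column of Y)
if overlap == 0
    Y = buffer(data,sampleForWindow);
else
    % number of samples shared by consecutive frames
    sampleToJump = sampleForWindow - timeStep * fs;
    Y = buffer(data,sampleForWindow,ceil(sampleToJump));
end
old_Y = Y; % keep the unwindowed frames
% Apply the Hann window to every frame
for i=1:numFrames
    Y(:,i)=Y(:,i).*w_hann;
end
% Short-time energy of each unwindowed frame
for i=1:numFrames
    energy(i)=sum(abs(old_Y(:,i)).^2);
end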
figure, plot(energy)
$E = \sum_{i=1}^{N} |x(i)|^2$, where $N$ is the number of samples in the frame
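The framing, windowing, and energy steps can also be sketched in plain Python. This is not a port of Matlab's `buffer` (which pads the first and last frames with zeros); it keeps only complete frames. The helper names, the 440 Hz test tone, and the 20 ms frame / 10 ms step sizes are assumptions for illustration.

```python
import math

def frame_signal(samples, frame_length, hop_length):
    """Split samples into overlapping frames, keeping only complete frames."""
    last_start = len(samples) - frame_length
    return [samples[start:start + frame_length]
            for start in range(0, last_start + 1, hop_length)]

def hann_window(length):
    """Symmetric Hann window: w[n] = 0.5 * (1 - cos(2*pi*n / (length - 1)))."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * n / (length - 1)))
            for n in range(length)]

def frame_energy(frame):
    """Short-time energy: E = sum over the frame of |x(i)|^2."""
    return sum(abs(x) ** 2 for x in frame)

# Hypothetical signal: 50 ms of a 440 Hz sine at 8 kHz,
# split into 20 ms frames (160 samples) with a 10 ms step (80 samples).
fs = 8000
signal = [math.sin(2 * math.pi * 440 * n / fs) for n in range(400)]
frames = frame_signal(signal, frame_length=160, hop_length=80)
window = hann_window(160)
windowed = [[x * w for x, w in zip(frame, window)] for frame in frames]
energies = [frame_energy(frame) for frame in frames]  # unwindowed, like old_Y
print(len(frames), energies)
```

As in the Matlab fragment, the energy is computed on the unwindowed frames, while the windowed frames are kept for the spectral analysis.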
Fast Fourier Transform (FFT)
%% Fast Fourier Transform (on the whole signal)
% Section ID = 7
% PLOT
plot(f,abs(freqSignal(1:NFFT/2+1)))
title('Single-Sided Amplitude Spectrum of y(t)')
xlabel('Frequency (Hz)')
ylabel('|Y(f)|')
for i=1:numFrames
STFT(:,i)=fft(Y(:,i),NFFT);
end
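To make the two uses of the FFT above concrete (once on the whole signal, once per frame), here is a small self-contained sketch. It uses a textbook recursive radix-2 Cooley-Tukey FFT rather than Matlab's `fft`, so the transform length must be a power of two. The helper names and the 1 kHz test tone are assumptions for illustration.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def zero_pad(x, nfft):
    """Truncate or zero-pad x to exactly nfft samples."""
    return list(x[:nfft]) + [0.0] * (nfft - len(x))

def single_sided_amplitude(x, nfft):
    """Return |X[k]| for k = 0 .. nfft/2 (the non-negative frequencies)."""
    spectrum = fft(zero_pad(x, nfft))
    return [abs(value) for value in spectrum[: nfft // 2 + 1]]

def stft(frames, nfft):
    """Short-time Fourier transform: one complex spectrum per frame."""
    return [fft(zero_pad(frame, nfft)) for frame in frames]

# Hypothetical test signal: a 1 kHz sine sampled at 8 kHz, 64 samples long.
fs, nfft = 8000, 64
signal = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(64)]
magnitudes = single_sided_amplitude(signal, nfft)
frequencies = [k * fs / nfft for k in range(nfft // 2 + 1)]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
spectra = stft([signal[:32], signal[32:]], nfft)
print(frequencies[peak_bin])
```

Bin `k` corresponds to the frequency `k * fs / nfft`, which is how the frequency axis `f` of the single-sided plot is built.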
% PLOT
plot(autoCorr(sampleForWindow:end,i))
else
disp('Unable to create plot');
end
clear indexToPlot
A system for doing phonetics: Praat
• PRAAT is a comprehensive
speech analysis, synthesis, and
manipulation package
developed by Paul Boersma
and David Weenink at the
Institute of Phonetic Sciences
of the University of
Amsterdam, The Netherlands.
Pitch with Praat
Formants with Praat
[Figure: Praat display with the 1st to 5th formants labeled]
Other features with Praat
• Intensity;
• Mel-Frequency Cepstral Coefficients (MFCC);
• Linear Predictive Coding (LPC) coefficients;
• Harmonics-to-Noise Ratio (HNR);
• and many others.
Scripting in Praat
• Praat can run scripts that call any of the commands available in its
environment, so every operation that can be performed interactively
can also be automated.
fileName$ = "test.wav"
Read from file... 'fileName$'
name$ = fileName$ - ".wav"
select Sound 'name$'
To Pitch (ac)... 0.0 50.0 15 off 0.1 0.60 0.01 0.35 0.14 500.0