Report On Project 1 Speech Emotion Recognition
REPORT ON PROJECT 1
1. Abstract
2. Introduction
3. Flow chart
4. Running code
5. Training and testing input
6. Accuracy achieved
7. Instructions for running the code
ABSTRACT
Virtually all audio recordings contain some amount of noise. This noise may enter the audio
signal during the recording process or build up over long storage on the recording medium.
To produce the best quality audio recordings, this unwanted noise must be removed to the
greatest extent possible. Noise cancellation is a key challenge in audio signal processing.
Since noise is a random process that varies from instant to instant, it must be estimated at
every instant so that it can be cancelled from the original signal. There are many schemes
for noise cancellation, but the most effective is to use an adaptive filter. Active Noise
Cancellation (ANC) is achieved by introducing an “anti-noise” wave through an appropriate
array of secondary sources. These secondary sources are interconnected through an electronic
system that uses a signal processing algorithm specific to the particular cancellation scheme.
In this paper, three conventional adaptive algorithms for ANC, RLS (Recursive Least Squares),
LMS (Least Mean Squares) and NLMS (Normalized Least Mean Squares), are analyzed for a
single-channel broadband feed-forward configuration. To obtain faster convergence, the NLMS
algorithm is modified and an associated extended algorithm is derived under a Gaussian noise
assumption. Simulation results indicate a higher quality of noise cancellation and a lower
mean square error (MSE).
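As a rough illustration of the adaptive-filter idea described above, the following is a minimal NLMS sketch in Python. It is not the implementation evaluated in this report: the function name `nlms_cancel`, the filter length, the step size `mu`, the noise path and the synthetic signals are all assumptions chosen for illustration.

```python
import numpy as np

def nlms_cancel(reference, primary, n_taps=8, mu=0.1, eps=1e-8):
    """Single-channel feed-forward noise canceller (illustrative NLMS sketch).

    reference: noise picked up by the reference sensor
    primary:   desired signal plus a filtered version of that noise
    Returns the error signal, i.e. the primary input minus the noise estimate.
    """
    w = np.zeros(n_taps)                             # adaptive filter weights
    error = np.zeros(len(primary))
    for n in range(n_taps - 1, len(primary)):
        x = reference[n - n_taps + 1:n + 1][::-1]    # most recent sample first
        y = w @ x                                    # current noise estimate
        error[n] = primary[n] - y                    # cleaned output sample
        w += mu * error[n] * x / (x @ x + eps)       # normalized LMS update
    return error

# Synthetic test case (all values hypothetical).
rng = np.random.default_rng(0)
n_samples = 5000
t = np.arange(n_samples)
clean = np.sin(2 * np.pi * 0.01 * t)                 # desired signal
noise = rng.standard_normal(n_samples)               # Gaussian reference noise
path = np.array([0.6, -0.3, 0.1])                    # assumed noise path
primary = clean + np.convolve(noise, path)[:n_samples]
cleaned = nlms_cancel(noise, primary)

# Compare the mean square error over the last 1000 samples, after convergence.
mse_before = np.mean((primary[-1000:] - clean[-1000:]) ** 2)
mse_after = np.mean((cleaned[-1000:] - clean[-1000:]) ** 2)
print("MSE before:", mse_before, "MSE after:", mse_after)
```

Dividing the update by the energy of the current input vector is what distinguishes NLMS from plain LMS: it makes the effective step size independent of the input level.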
INTRODUCTION
An audio noise reduction system removes unwanted noise from speech signals. Audio noise
reduction can be classified into two kinds: complementary and non-complementary. Speech
communication cannot be used effectively in an environment where the background noise level
is too high, because the noise interferes with the original message and corrupts the
parameters of the message signal. Noise can be defined as an unwanted signal that interferes
with the communication or measurement of another signal. Noise is itself an
information-bearing signal that conveys information about its sources and the environment
in which it propagates.
The types and sources of noise and distortions are many and include: (i) electronic noise such
as thermal noise and shot noise, (ii) acoustic noise emanating from moving, vibrating or
colliding sources such as revolving machines, moving vehicles, keyboard clicks, wind and
rain, (iii) electromagnetic noise that can interfere with the transmission and reception of
voice, image and data over the radio-frequency spectrum, (iv) electrostatic noise generated
by the presence of a voltage, (v) communication channel distortion and fading and (vi)
quantization noise and lost data packets due to network congestion.
FLOW CHART
4. Running code
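import struct

import IPython.display
import matplotlib.pyplot as plt
import numpy as np
from scipy import fftpack, signal

audiofile = '/content/drive/MyDrive/whistle_withoutac.wav'
IPython.display.Audio(audiofile)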
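# Deserializing: frames_wave holds the raw bytes read from the WAV file and
# nframes is its frame count (one signed 16-bit sample per frame for mono audio)
frames_wave = struct.unpack('{n}h'.format(n=nframes), frames_wave)
frames_wave = np.array(frames_wave)
print("Min value:", np.min(frames_wave), "Max value:", np.max(frames_wave))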
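The deserializing step above starts from raw frame bytes and a frame count. A minimal sketch of how these could be obtained with the standard-library `wave` module is given below, assuming a mono 16-bit PCM file. The helper name `read_mono_wav`, the temporary file and the 440 Hz test tone are hypothetical and only make the example runnable without the original recording.

```python
import os
import tempfile
import wave

import numpy as np

def read_mono_wav(path):
    """Return (samples, framerate) for a mono, 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        framerate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # '<i2' = little-endian signed 16-bit, the sample layout of PCM WAV data
    return np.frombuffer(raw, dtype="<i2"), framerate

# Write a short synthetic tone so the example does not need a real recording.
demo_rate = 8000
time_axis = np.arange(demo_rate) / demo_rate
tone = (0.5 * 32767 * np.sin(2 * np.pi * 440 * time_axis)).astype("<i2")
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(demo_rate)
    wav.writeframes(tone.tobytes())

samples, rate = read_mono_wav(demo_path)
print("Frames:", len(samples), "Rate:", rate,
      "Min value:", samples.min(), "Max value:", samples.max())
```

`np.frombuffer` produces the same integers as `struct.unpack` with the `h` format on a little-endian machine, without building an intermediate tuple.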
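# Applying Fourier
# (the FFT call is not shown in the listing; fftpack.fft is assumed here
# because it is the call used on the filtered signal further down)
fft_wave = fftpack.fft(frames_wave)
magnitude = np.abs(fft_wave)
phase = np.angle(fft_wave)
print(magnitude.shape, phase.shape)
# np.argmax returns the index of the strongest FFT bin; this index equals a
# frequency in Hz only when the clip is exactly one second long
print("The max frequency (highest magnitude) is {} Hz".format(np.argmax(magnitude)))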
fig = plt.figure()  # figure that holds the three panels
peak_bin = np.argmax(magnitude)  # index of the strongest FFT bin

ax1 = fig.add_subplot(1, 3, 1)
ax1.set_title("Original audio wave / Time Domain")
ax1.set_xlabel("Time (samples)")
ax1.set_ylabel("Amplitude (16 bit depth - Calculated above)")
ax1.plot(frames_wave)

ax2 = fig.add_subplot(1, 3, 2)
ax2.set_title("Frequency by magnitude (Max at {} Hz) / Frequency Domain".format(peak_bin))
ax2.set_xlabel("Frequency (Hertz)")
ax2.set_ylabel("Magnitude (normalized)")
ax2.set_xlim(0, 44100)  # we are not interested in the rest
ax2.plot(magnitude / nframes)  # Normalizing magnitude

ax3 = fig.add_subplot(1, 3, 3)
ax3.set_title("[Unclipped] Frequency by magnitude (Max at {} Hz) / Frequency Domain".format(peak_bin))
ax3.set_xlabel("Frequency (Hertz)")
ax3.set_ylabel("Magnitude (normalized)")
ax3.plot(magnitude / nframes)  # Normalizing magnitude
plt.show()
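The peak location printed and shown in the plot titles above is an index into the FFT output, which coincides with a frequency in hertz only for a clip that lasts exactly one second. The sketch below shows one way to convert the peak bin to hertz. It uses NumPy's `rfft` and `rfftfreq` instead of the `fftpack` call in the listing, and the helper name `peak_frequency_hz`, the sample rate and the 1200 Hz test tone are made up for the example.

```python
import numpy as np

def peak_frequency_hz(samples, framerate):
    """Frequency in Hz of the strongest component of a real-valued signal."""
    magnitude = np.abs(np.fft.rfft(samples))                   # one-sided spectrum
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / framerate)   # bin centres in Hz
    return freqs[np.argmax(magnitude)]

framerate = 8000                             # hypothetical sample rate, Hz
t = np.arange(2 * framerate) / framerate     # two seconds of samples
samples = np.sin(2 * np.pi * 1200 * t)       # hypothetical 1200 Hz tone

peak_bin = int(np.argmax(np.abs(np.fft.rfft(samples))))
print("Peak bin index:", peak_bin)                                   # 2400
print("Peak frequency:", peak_frequency_hz(samples, framerate), "Hz")  # 1200.0
```

For this two-second clip the bin spacing is 0.5 Hz, so the raw index is twice the true frequency. In general, bin k corresponds to k * framerate / nframes Hz.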
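# The enclosing function header and the filter-design calls are missing from
# the listing; the name butter_lowpass_filter and the signal.butter /
# signal.lfilter calls are assumed here, not taken from the original.
def butter_lowpass_filter(data, cutoff, fs, order):
    # Get the filter coefficients so we can check its frequency response.
    b, a = signal.butter(order, cutoff / (0.5 * fs), btype='low')
    y = signal.lfilter(b, a, data)

    def _plot_graph():
        # Plot the frequency response.
        w, h = signal.freqz(b, a, worN=8000)
        plt.subplot(2, 1, 1)
        plt.plot(0.5 * fs * w / np.pi, np.abs(h), 'b')
        plt.plot(cutoff, 0.5 * np.sqrt(2), 'ko')  # marks the -3 dB point
        plt.axvline(cutoff, color='k')
        plt.xlim(0, 0.5 * fs)
        plt.title("Filter Frequency Response")
        plt.xlabel('Frequency [Hz]')
        plt.grid()
        plt.show()

    _plot_graph()
    return y

# Filter requirements.
order = 10
fs = framerate  # sample rate, Hz
cutoff = 900  # desired cutoff frequency of the filter, Hz
y = butter_lowpass_filter(frames_wave, cutoff, fs, order)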
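With the filter requirements above (order 10, cutoff 900 Hz), a filter expressed as `(b, a)` transfer-function coefficients can lose numerical precision, because the cutoff is small relative to the sample rate. As an alternative sketch, SciPy can design and apply the same kind of low-pass filter as second-order sections. A Butterworth design is assumed here, as are the helper name `lowpass_sos`, the 44100 Hz sample rate and the two test tones.

```python
import numpy as np
from scipy import signal

def lowpass_sos(data, cutoff, fs, order):
    """Butterworth low-pass filter applied as second-order sections."""
    sos = signal.butter(order, cutoff, btype="low", fs=fs, output="sos")
    return signal.sosfilt(sos, data)

fs = 44100                                    # assumed sample rate, Hz
t = np.arange(fs) / fs                        # one second of samples
low = np.sin(2 * np.pi * 300 * t)             # component below the cutoff
high = 0.5 * np.sin(2 * np.pi * 5000 * t)     # component above the cutoff
filtered = lowpass_sos(low + high, cutoff=900, fs=fs, order=10)

# With a one-second clip, FFT bin k corresponds to k Hz.
spectrum = np.abs(np.fft.rfft(filtered)) / len(filtered)
print("Magnitude at 300 Hz:", spectrum[300])
print("Magnitude at 5000 Hz:", spectrum[5000])
```

The 300 Hz tone passes almost unchanged while the 5000 Hz tone is strongly attenuated, which is the behaviour expected from a 900 Hz low-pass filter.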
fig = plt.figure()  # figure that holds the four panels

ax1 = fig.add_subplot(1, 4, 1)
ax1.set_title("[After Filter] Audio wave / Time Domain")
ax1.set_xlabel("Time (samples)")
ax1.set_ylabel("Amplitude (16 bit depth - Calculated above)")
ax1.plot(y)

ax2 = fig.add_subplot(1, 4, 2)
ax2.set_title("[Before Filter] Original audio wave / Time Domain")
ax2.set_xlabel("Time (samples)")
ax2.set_ylabel("Amplitude (16 bit depth - Calculated above)")
ax2.plot(frames_wave, 'r')

m = np.abs(fftpack.fft(y))  # magnitude spectrum of the filtered signal
ax3 = fig.add_subplot(1, 4, 3)
ax3.set_title("[After Filter] Frequency by magnitude")
ax3.set_xlabel("Frequency (Hertz)")
ax3.set_ylabel("Magnitude (normalized)")
ax3.set_xlim(0, 44100)  # we are not interested in the rest
ax3.plot(m / nframes)

ax4 = fig.add_subplot(1, 4, 4)
ax4.set_title("[Before Filter] Frequency by magnitude")
ax4.set_xlabel("Frequency (Hertz)")
ax4.set_ylabel("Magnitude (normalized)")
ax4.set_xlim(0, 44100)  # we are not interested in the rest
ax4.plot(magnitude / nframes, 'r')
plt.show()

# Play back the filtered signal at the sample rate of the recording.
IPython.display.Audio(data=y, rate=fs)
6. Accuracy achieved
The code can be applied to any other audio file that contains inherent noise. With this code
we achieved an accuracy of 85%; it removes almost all of the hissing noise present in the
recording.
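How the accuracy figure is measured is not specified above. One possible way to quantify hiss removal, offered only as an assumption and not as the method used in this project, is to compare the spectral energy above the cutoff frequency before and after filtering. The function name `high_band_reduction` and the synthetic signals are illustrative; the 900 Hz threshold is the cutoff from the listing.

```python
import numpy as np

def high_band_reduction(before, after, framerate, cutoff_hz):
    """Fraction (0 to 1) of the spectral energy above cutoff_hz that was removed.

    before and after must have the same length.
    """
    freqs = np.fft.rfftfreq(len(before), d=1.0 / framerate)
    band = freqs > cutoff_hz                       # bins above the cutoff
    energy_before = np.sum(np.abs(np.fft.rfft(before))[band] ** 2)
    energy_after = np.sum(np.abs(np.fft.rfft(after))[band] ** 2)
    return 1.0 - energy_after / energy_before

framerate = 8000                                   # hypothetical sample rate, Hz
t = np.arange(framerate) / framerate
tone = np.sin(2 * np.pi * 200 * t)                 # wanted low-frequency content
hiss = 0.2 * np.sin(2 * np.pi * 3000 * t)          # stand-in for high-frequency hiss
before = tone + hiss
after = tone + 0.1 * hiss                          # pretend the filter cut the hiss tenfold

reduction = high_band_reduction(before, after, framerate, cutoff_hz=900)
print("High-band energy removed: {:.1%}".format(reduction))
```

A tenfold drop in hiss amplitude corresponds to a hundredfold drop in energy, so this metric reports that about 99% of the high-band energy was removed for the synthetic case.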