Speech Emotion Recognition With Deep Learning
3 authors, including:
Pavol Harár
University of Vienna
All content following this page was uploaded by Pavol Harár on 06 April 2020.
B. DNN architecture
The first two layers of our model were a convolutional
layer [16] with 32 kernels of size 7 x 1 followed by an average
pooling layer [17]. The third and fourth layers were again a
convolutional layer, with 32 kernels of size 13 x 1, followed
by average pooling. The last two convolutional layers each had 16
kernels, again of size 13 x 1. After the last convolutional layer
we split the network into two branches, each consisting of
a single pooling layer: one with average pooling and the other
with max pooling [17]. The outputs of both branches were then
flattened and concatenated back into the main branch.
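To make the layer sequence concrete, the following minimal sketch traces how the signal length shrinks through the convolutional part of the network. It relies on several assumptions that are not stated in this section: convolutions use stride 1, pooling is non-overlapping (stride equal to the pool size of 2), no pooling sits between the last two convolutional layers, and the input length of 320 samples is purely hypothetical. The function names are illustrative only.

```python
def conv_valid(length, kernel):
    """Output length of a stride-1 convolution with 'valid' borders."""
    if length < kernel:
        raise ValueError("input shorter than kernel")
    return length - kernel + 1


def pool(length, size=2):
    """Output length of non-overlapping pooling (assumed stride == pool size)."""
    return length // size


def trace_shapes(input_length):
    """Return [(stage, length, channels), ...] and the concatenated feature count."""
    stages = []
    n = input_length
    # (kernel size, number of kernels, pooled afterwards?) per convolutional layer
    conv_plan = [(7, 32, True), (13, 32, True), (13, 16, False), (13, 16, False)]
    for kernel, channels, pooled in conv_plan:
        n = conv_valid(n, kernel)
        stages.append((f"conv {channels} x ({kernel} x 1)", n, channels))
        if pooled:
            n = pool(n)
            stages.append(("average pooling", n, channels))
    # Two parallel branches, each a single pooling layer, then flattened.
    branch = pool(n)
    stages.append(("branch: average pooling", branch, channels))
    stages.append(("branch: max pooling", branch, channels))
    # Concatenating the two flattened branches doubles the feature count.
    flat = 2 * branch * channels
    return stages, flat


stages, flat = trace_shapes(320)  # 320 is a hypothetical input length
for name, length, channels in stages:
    print(name, length, channels)
print("concatenated features:", flat)
```

Under these assumptions each branch yields the same number of features, so the concatenated vector is twice the size of a single flattened branch.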
From this point on, the DNN consisted only of fully
connected layers: the first of size 480, the second of size
240, and the last an output Softmax layer [18] with 3
output neurons. All pooling layers used a pool size of 2.
The border mode of all convolutional layers was set to 'valid',
so no zero padding was performed at the borders.
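As a rough illustration of the fully connected part, the sketch below counts the trainable weights of the three dense layers and shows the softmax normalisation applied to the 3 output neurons. The number of features entering the first dense layer depends on the input length, which this section does not give, so `flat_features` is a free parameter and the value 768 used in the example is hypothetical. Bias terms on every layer are also an assumption, and the helper names are illustrative.

```python
import math


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer (biases assumed)."""
    return n_in * n_out + n_out


def head_params(flat_features):
    """Parameter counts for the 480 -> 240 -> 3 fully connected stack."""
    sizes = [flat_features, 480, 240, 3]
    return [dense_params(a, b) for a, b in zip(sizes, sizes[1:])]


def softmax(logits):
    """Numerically stable softmax: outputs are positive and sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical flattened feature count; the real value depends on the input length.
print(head_params(768))
# Three arbitrary example activations, one per output neuron.
print(softmax([0.5, 1.5, -0.2]))
```

The softmax output can be read as a probability distribution over the 3 output classes, with the largest entry taken as the predicted class.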