
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 12, Issue VIII, August 2024
DOI: https://doi.org/10.22214/ijraset.2024.63859

Speech Emotion Recognition Using Convolutional Neural Networks

Dr. N. V. Rajasekhar Reddy1, Sriyash Kulkarni2, Thangella Sainikhil3, Shreyas Vala4
1Head of the Department, Department of IT, MLR Institute of Technology
2, 3, 4Research Student, Department of IT, MLR Institute of Technology

Abstract: Speech is a powerful way to express our thoughts and feelings, and it can give us valuable insights into human emotions. Speech emotion recognition (SER) is a crucial tool in fields such as human-computer interaction (HCI), medical diagnosis, and lie detection. Understanding emotions from speech, however, is challenging.
This research aims to address that challenge. It uses multiple datasets, including CREMA-D, RAVDESS, TESS, and SAVEE, to identify different emotional states. We reviewed existing literature to inform the methodology and used spectrograms and mel spectrograms extracted from the speech data to capture the acoustic features most relevant for emotion recognition. Convolutional Neural Networks (CNNs), a state-of-the-art machine learning approach, were then applied to decipher the subtle emotional cues contained in speech. Accurate speech emotion recognition has important ramifications: it can lead to more effective forensic investigations, better medical diagnosis, and enhanced human-computer interface experiences. With the potential to improve several sectors, this research advances affective computing, the field that aims to understand the complex link between speech and emotion.
CSS Concepts
1) Clean User Interface (UI): Design a simple, intuitive UI with CSS for easy interaction.
2) Responsive Design: Ensure the UI adapts smoothly to different screen sizes using CSS media queries.
3) Engaging Feedback: Use CSS animations for user feedback, such as loading indicators.
4) Consistent Branding: Apply CSS theming for a unified visual identity across the application.
Keywords: Convolutional Neural Networks, Spectrograms, Mel Spectrograms, Machine Learning

I. INTRODUCTION
People interact by talking to each other, and when they communicate they express their emotions and feelings. By understanding the emotions of different speakers, we can tell whether they are satisfied or not. This helps companies improve their services to customers, which in turn supports their growth. This idea is the basis for our project, "Speech Emotion Recognition Using Convolutional Neural Networks." Speech Emotion Recognition (SER) is an emerging technology in the field of Artificial Intelligence (AI). In recent times, SER has found applications in areas such as human-computer interaction, call centers, and forensics.

A. Spectrograms and Mel Spectrograms


Spectrograms and mel spectrograms are commonly used representations in Speech Emotion Recognition. A spectrogram is a visual representation of a sound wave that shows the intensity and frequency of its components over time. It is created by applying a Fourier transform to the audio signal: time is represented on the x-axis, frequency on the y-axis, and signal amplitude by colour. A mel spectrogram is similar, but it maps the frequencies onto a different scale, the Mel scale. Mel spectrograms are widely used to extract features that better reflect how humans perceive sound, because the relationship between frequency and perceived pitch is not linear.
To compute either representation, the sound signal is divided into small segments called frames, and the frequencies in each frame are analysed. For mel spectrograms, a technique called mel scaling groups similar frequencies and displays them on a logarithmic scale; this matches human hearing, which is more sensitive to changes at lower frequencies. The result is displayed as an image with frequency on the vertical axis and time on the horizontal axis, where brighter colours indicate more energy at that particular time and frequency. Both representations provide valuable insight into the frequency content of speech and how it changes over time, which is important for many speech processing applications. A short code sketch and some example images are given below.
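As an illustration, here is a minimal sketch of how both representations can be computed with the Librosa library (used later in this work); the file name, sampling rate, and FFT parameters are illustrative assumptions, not the exact settings of our experiments.

import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt

# Illustrative path: any WAV file from CREMA-D, RAVDESS, TESS, or SAVEE would work here.
y, sr = librosa.load("anger_sample.wav", sr=22050)

# Ordinary spectrogram: magnitude of the short-time Fourier transform, in decibels.
stft = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))
spec_db = librosa.amplitude_to_db(stft, ref=np.max)

# Mel spectrogram: the same short-time energies mapped onto a Mel-scaled filter bank.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024, hop_length=256, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
librosa.display.specshow(spec_db, sr=sr, hop_length=256, x_axis="time", y_axis="hz", ax=axes[0])
axes[0].set_title("Spectrogram")
librosa.display.specshow(mel_db, sr=sr, hop_length=256, x_axis="time", y_axis="mel", ax=axes[1])
axes[1].set_title("Mel spectrogram")
plt.tight_layout()
plt.show()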


Spectrogram of an audio file of anger emotion

Mel Spectrogram of an audio file of anger emotion

II. THE APPROACH IN BASE PAPER


Spectrograms and mel spectrograms were created from brief audio recordings, and both were analysed to determine which was the more effective feature-extraction method. According to prior studies, Convolutional Neural Networks (CNNs) are well suited to Speech Emotion Recognition (SER). However, the priority was to resolve the researchers' concerns rather than to build the best possible model. A model evaluated on a single dataset gives little evidence of how well it would work in the real world; consequently, data from several databases was combined, an approach rarely taken in earlier work.

III. EXISTING PROBLEMS


1) DeepEmotion: DeepEmotion is a system that extracts emotions from audio inputs using sophisticated machine learning methods, in particular convolutional neural networks (CNNs). The audio input is first transformed into spectrograms, which are then passed through CNN architectures that extract features and categorise emotions. The system typically consists of several convolutional and pooling layers followed by fully connected layers for emotion classification. While this strategy can be effective, it has some potential disadvantages:
2) Limited Generalisation: DeepEmotion may overfit to its particular training set and therefore fail to generalise to new datasets or emotional expressions. If CNN models are not appropriately regularised or trained on a variety of data sets, they may be unable to recognise emotions or speaker traits that were underrepresented in the original training data.
3) Computational Complexity: DeepEmotion's CNN topologies can be computationally costly, particularly when they have many layers or parameters. In some real-world applications, training these models may be time-consuming and computationally demanding.
Overall, DeepEmotion represents an advanced approach to recognising emotion from speech, but it also faces technical challenges that need to be addressed for optimal performance and deployment.
In our project, we address the issue of limited generalisation with regularisation techniques such as dropout, batch normalisation, and data augmentation. These methods help prevent overfitting and encourage the model to learn more robust and versatile features from the spectrogram data.
Instead of designing complex CNN architectures from scratch, we also use pre-trained models such as VGG16 and VGG19. These models were pre-trained on large image datasets like ImageNet, so they have already learned generic features that can be fine-tuned for speech emotion recognition. Through transfer learning we benefit from these learned representations and reduce the computational cost of training; a sketch of this idea is given below.
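The following is a minimal TensorFlow/Keras sketch of this transfer-learning idea; the input shape, class count, and head layers are illustrative assumptions rather than the exact configuration trained in our experiments.

import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 6              # e.g. the six-emotion setup; illustrative
INPUT_SHAPE = (224, 224, 3)  # spectrogram images resized to the VGG input size

# Load VGG16 pre-trained on ImageNet, without its classification head.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False, input_shape=INPUT_SHAPE)
base.trainable = False       # freeze the generic ImageNet features; fine-tune later if desired

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),                          # regularisation against overfitting
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()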


A. SERCNN
SERCNN is an existing system that focuses on using convolutional neural networks (CNNs) for speech emotion recognition (SER). It employs different CNN architectures, such as 1D-CNN or 2D-CNN, to process audio spectrograms representing speech signals. These CNN models are trained on labeled emotion datasets to learn distinctive features and accurately classify different emotional states.
One drawback of SERCNN is that it may struggle to capture long-term temporal dependencies in speech signals if it relies solely on CNN architectures. Emotions are often expressed dynamically over time, and CNNs, which operate on fixed-size windows, may not effectively capture such temporal nuances. Another limitation is that training CNN-based models like SERCNN typically requires a large amount of labeled data, which may not always be readily available, especially for specific emotion categories or diverse speaker demographics. This lack of data efficiency can be a challenge.
To address the constrained temporal context, CNNs can be combined with recurrent neural networks (RNNs): CNNs extract spatial characteristics from spectrograms, while RNNs model temporal dependencies, which helps capture subtle emotional expressions across time.
To improve data efficiency, data augmentation techniques such as time stretching, pitch shifting, and noise injection can be applied when building the spectrogram and mel spectrogram data (a sketch of these augmentations is given below). In addition, transfer learning can be leveraged by fine-tuning pretrained VGG16 and VGG19 models on the SER datasets. By initialising the models with weights learned from generic image data, they can be trained effectively with smaller amounts of labeled speech data, improving data efficiency and model performance.
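Below is a minimal sketch of such augmentation using Librosa, assuming the augmentations are applied to the raw waveform before the (mel) spectrograms are extracted; the file name and parameter ranges are illustrative assumptions.

import numpy as np
import librosa

def augment_waveform(y, sr, rng=np.random.default_rng(0)):
    """Return simple augmented variants of a waveform; parameters are illustrative."""
    augmented = {}

    # Time stretching: speed the clip up or slow it down without changing pitch.
    rate = rng.uniform(0.9, 1.1)
    augmented["time_stretch"] = librosa.effects.time_stretch(y, rate=rate)

    # Pitch shifting: move the pitch up or down by up to two semitones.
    steps = rng.uniform(-2.0, 2.0)
    augmented["pitch_shift"] = librosa.effects.pitch_shift(y, sr=sr, n_steps=steps)

    # Noise injection: add low-amplitude Gaussian noise.
    noise = rng.normal(0.0, 0.005, size=y.shape)
    augmented["noise"] = y + noise

    return augmented

# Usage: each augmented waveform is then converted to a (mel) spectrogram as before.
y, sr = librosa.load("anger_sample.wav", sr=22050)  # illustrative file name
variants = augment_waveform(y, sr)
mels = {name: librosa.feature.melspectrogram(y=v, sr=sr) for name, v in variants.items()}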

IV. IMPLEMENTATION
A. Dataset Details
The CREMA-D dataset is a multimodal dataset consisting of 7443 video clips featuring 91 actors. The emotions portrayed in the dataset are neutral, happiness, anger, disgust, fear, and sadness. The dataset was labelled through crowdsourcing, with 2443 raters providing the emotion labels, which makes it a reliable resource. In contrast, the RAVDESS dataset contains 7356 files covering both song and speech, of which 1440 are speech-only audio files. RAVDESS portrays the following emotions: calm, happy, sad, angry, fearful, surprised, and disgusted. Both datasets offer valuable data for teaching neural networks to recognise various emotional expressions. Not every emotion from the RAVDESS dataset was used: the surprised and calm classes were removed, since they are uncommon in the other datasets and RAVDESS is too small to be used exclusively for deep neural network (DNN) training. As a result, the numbers of examples per emotion are somewhat uneven, particularly for the "neutral" class, which contains only 48 files.
The TESS (Toronto Emotional Speech Set) dataset represents emotions including neutral, pleasant surprise, anger, disgust, fear, and happiness. It contains 2800 audio files recorded by two female speakers aged 26 and 64. Because only two actors participated, using the TESS dataset alone for CNN training would be nearly impossible, as half of the dataset would need to be reserved for testing. For this reason, TESS is usually combined with other datasets such as CREMA-D, RAVDESS, and SAVEE. SAVEE stands for the "Surrey Audio-Visual Expressed Emotion" dataset. It features audio, visual, and audiovisual recordings of four English male actors portraying seven emotions: anger, disgust, fear, happiness, neutral, sadness, and surprise.
With 90 instances of the "neutral" emotion (double the number of the other emotions), the dataset has an imbalanced distribution. It is not large enough on its own for artificial neural network (ANN) training, but it can be paired with the larger RAVDESS dataset, which has fewer examples of the "neutral" emotion.
Convolutional Neural Networks (CNNs) are a type of deep learning model that extends artificial neural networks. They have proven highly effective in areas such as image recognition and audio/video analysis.
The process starts by taking an image as input and passing it through the core components of a CNN. The key idea behind CNNs is to apply filters, called kernels, to the input image in order to extract the features most relevant to the task at hand.
The main layers in a CNN are:
1) Convolutional Layer: Applies the filters to the image to extract features.
2) Pooling Layer: Reduces the size of the feature maps to make the model more efficient.
3) Flattening Layer: Converts the 2D feature maps into a 1D vector.
4) Fully Connected Layer: Processes the flattened features to make the final predictions.
5) Output Layer: Provides the final output, such as the predicted class of the image.


By carefully structuring these layers, CNNs can effectively learn to recognize complex patterns in visual data, making them a
powerful tool for various computer vision applications.

B. Convolution
Convolution is a crucial layer in deep learning models. It is used to identify significant features in input data such as images. To do this it uses filters, also referred to as kernels, which perform a dot product with the underlying pixel values and produce a feature map indicating where specific features occur.
The activation function plays a vital role in convolution. It introduces non-linearity, meaning the output is not simply proportional to the input; this is important because it allows the model to learn complex patterns beyond basic classification. Padding is sometimes used in convolution to avoid losing information at the edges, since the feature map can shrink after the convolution operation.
Pooling is another important layer that follows convolution. It downsamples the feature maps, reducing the computational load and memory requirements, and extracts the most salient features so that the model focuses on the most important information. Taken together, convolution and pooling allow deep learning models to efficiently identify and extract significant features from incoming data. Several forms of pooling exist, including max pooling, sum pooling, and average pooling; a small sketch of the difference is shown below.
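To make the difference between these pooling variants concrete, here is a small self-contained NumPy sketch (the feature-map values are arbitrary):

import numpy as np

# A tiny 4x4 feature map used to illustrate 2x2 pooling with stride 2.
feature_map = np.array([
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [0, 2, 5, 7],
    [1, 1, 3, 4],
], dtype=float)

def pool2x2(fmap, reducer):
    """Apply a 2x2 pooling window with stride 2 using the given reducer (max, mean, or sum)."""
    h, w = fmap.shape
    out = np.empty((h // 2, w // 2))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            out[i // 2, j // 2] = reducer(fmap[i:i + 2, j:j + 2])
    return out

print(pool2x2(feature_map, np.max))   # max pooling     -> [[6. 2.] [2. 7.]]
print(pool2x2(feature_map, np.mean))  # average pooling -> [[3.5 1.25] [1. 4.75]]
print(pool2x2(feature_map, np.sum))   # sum pooling     -> [[14. 5.] [4. 19.]]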

C. Flattening
The Flattening layer is usually added right before the fully connected layer. It converts the 2D feature maps into a single-column vector, which can then be fed as input to the fully connected layer.

D. AlexNet
AlexNet is a powerful deep learning network with eight learned layers: five convolutional layers, interleaved with max-pooling layers, followed by three fully connected layers. The network is trained on ImageNet, a sizeable image database.
The AlexNet layers are:
1) Convolutional layer 1: 96 filters of size 11x11, stride 4, ReLU activation, followed by max pooling with a 3x3 pool size and stride 2.
2) Convolutional layer 2: 256 filters of size 5x5, stride 1, padding 2, ReLU activation, followed by max pooling with a 3x3 pool size and stride 2.
3) Convolutional layer 3: 384 filters of size 3x3, stride 1, padding 1, ReLU activation.
4) Convolutional layer 4: 384 filters of size 3x3, stride 1, padding 1, ReLU activation.
5) Convolutional layer 5: 256 filters of size 3x3, ReLU activation, followed by max pooling.
Because of its deep and carefully structured architecture, AlexNet is exceptionally good at computer vision tasks such as image classification; a Keras sketch of this layer stack is given below.
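The following is a minimal Keras sketch of this AlexNet-style stack. The input shape, the use of "same" padding (which only approximates the padding values listed above), the 4096-unit dense layers taken from the original AlexNet, and the six-class output are assumptions for illustration.

import tensorflow as tf
from tensorflow.keras import layers, models

# AlexNet-style stack as described above; input shape and class count are illustrative.
alexnet = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(96, 11, strides=4, activation="relu"),
    layers.MaxPooling2D(pool_size=3, strides=2),
    layers.Conv2D(256, 5, strides=1, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=3, strides=2),
    layers.Conv2D(384, 3, strides=1, padding="same", activation="relu"),
    layers.Conv2D(384, 3, strides=1, padding="same", activation="relu"),
    layers.Conv2D(256, 3, strides=1, padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=3, strides=2),
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),   # fully connected sizes from the original AlexNet
    layers.Dense(4096, activation="relu"),
    layers.Dense(6, activation="softmax"),   # six emotion classes, illustrative
])
alexnet.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])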

V. ARCHITECTURE
We first downloaded the CREMA-D, RAVDESS, SAVEE, and TESS datasets from Kaggle and organised them according to the emotions they represent. We then used the Librosa library to extract spectrograms and mel spectrograms from the audio files and split these features into training and testing sets; a minimal sketch of this step is given below.
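A minimal sketch of the organise-and-split step, assuming the extracted spectrograms have already been saved as equally sized NumPy arrays in one folder per emotion; the directory layout, emotion list, and 80/20 split ratio are illustrative assumptions.

import numpy as np
from pathlib import Path
from sklearn.model_selection import train_test_split

# Assumed layout: one .npy spectrogram per clip, stored under a folder per emotion,
# e.g. features/angry/1001_DFA_ANG_XX.npy (paths are illustrative).
FEATURE_DIR = Path("features")
EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad"]

X, y = [], []
for label, emotion in enumerate(EMOTIONS):
    for npy_file in sorted((FEATURE_DIR / emotion).glob("*.npy")):
        X.append(np.load(npy_file))   # assumes all spectrograms share the same shape
        y.append(label)

X = np.stack(X)   # shape: (num_clips, n_mels, time_frames)
y = np.array(y)

# Stratified 80/20 split so each emotion keeps the same proportion in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
print(X_train.shape, X_test.shape)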

Work Flow of the project


We constructed several models, including a plain CNN, AlexNet, VGG-16, and VGG-19, and trained and tested each of them to compare their accuracy. Recent studies have demonstrated the effectiveness of Convolutional Neural Networks (CNNs) in speech emotion recognition (SER), but rather than trying to build the best possible model, our objective was to investigate the use of these techniques. The sections that follow summarise the completed work, address specific issues, and answer common concerns. Based on the information above, we created a basic model architecture and used it in all experiments, with a few small adjustments to allow for successful training. The base model consists of several convolutional layers with max-pooling and a dropout layer, followed by a flattening layer and two dense layers; a sketch of this base model is given below.
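The following is a minimal Keras sketch of such a base model, assuming 128x128 single-channel mel spectrogram inputs and four emotion classes; the filter counts and dense-layer sizes are illustrative assumptions, not the exact trained configuration.

import tensorflow as tf
from tensorflow.keras import layers, models

# Base model: stacked convolution + max-pooling blocks, dropout, then flatten and two dense layers.
base_model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),            # mel spectrogram treated as a 1-channel image
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(2),
    layers.Conv2D(128, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(2),
    layers.Dropout(0.3),                          # regularisation against overfitting
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(4, activation="softmax"),        # four emotion classes; six in the other setup
])

base_model.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
# base_model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=30, batch_size=32)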

VI. SOLUTION OF THE PROBLEM


A follow-up study was carried out to validate these results. In this study, recordings from the CREMA-D dataset were categorised by human listeners according to the emotions they conveyed. CREMA-D was selected because it is among the largest and most recent datasets available; a thorough description of its data-gathering procedure and statistical analysis was previously published by Cao et al. It was also chosen because the data was labelled through crowdsourcing, with volunteers contributing the categories.

The study involved 54 Polish volunteers, ranging in age from 22 to 58. They were asked to identify thirty audio recordings, five for each emotion, and could listen to each recording as many times as necessary during the online study. The table presents the findings. Disgust and sadness proved the most difficult emotions to identify, receiving the fewest correct responses, while anger was identified with the highest accuracy (76%). Individual scores ranged from 2 to 21 out of a maximum attainable score of 30; the overall average score was 14.63 (48.76% correct) and the median score was 15. Notably, six discrepancies were found when comparing the labels from this study with those from the CREMA-D crowdsourcing, which is significant given the sample size of thirty. The fact that only 7-11 people annotated each audio file in CREMA-D, as opposed to the 54 in the study described above, raises questions about the number of annotators employed. These findings also highlight how difficult data preparation is, because there is no single set of guidelines that specifies exactly what has to be done to create a high-quality dataset for artificial intelligence (AI) models.


VII. RESULT ANALYSIS


For the CNN architecture, with spectrograms, we obtained an accuracy of 74.69% in identifying 4 emotions and 62.17% in identifying 6 emotions.

Spectrograms 4 emotions accuracy for CNN

Spectrograms 6 emotions accuracy for CNN

Mel Spectrograms 4 emotions accuracy for CNN

Mel Spectrograms 6 emotions accuracy for CNN

Mel Spectrograms 4 emotions accuracy for VGG 16

Mel Spectrograms 6 emotions accuracy for VGG 16

For the same CNN architecture, with mel spectrograms, we obtained an accuracy of 75.87% in recognising 4 emotions and 64.06% in identifying 6 emotions. Since mel spectrograms performed better than spectrograms, we implemented the remaining architectures (AlexNet, VGG-16 and VGG-19) with mel spectrograms. With the AlexNet architecture, we obtained an accuracy of 72.39% for 4 emotions and 62.13% for 6 emotions. With VGG-16, we obtained 83.57% for 4 emotions and 72.73% for 6 emotions. With VGG-19, we obtained 87.48% for 4 emotions and 76.97% for 6 emotions.

Mel Spectrograms 4 emotions accuracy for VGG 19


Mel Spectrograms 6 emotions accuracy for VGG 19

VIII. CONCLUSION AND FURTHER SCOPE


The CNN model demonstrated an accuracy of 74.69% in recognizing 4 emotions and 62.17% in recognizing 6 emotions when using
spectrograms. We also split the mel spectrograms into training and testing sets, which resulted in accuracies of 75.87% and 64.06%
respectively for 4 and 6 emotions. This suggests that mel spectrograms performed better than regular spectrograms in identifying
emotions. Additionally, we experimented with different architectures like AlexNet, VGG-16, and VGG-19, and found that VGG-19
provided the highest accuracy in recognizing both 4 and 6 emotions.
Our research emphasises how crucial it is to partition datasets appropriately for training and testing AI models. While many studies report impressive speech emotion recognition (SER) results, interdependent data splitting is a problem that is frequently overlooked. This can make it difficult to verify and compare results directly, particularly when the software is not publicly available to the research community. To address this concern, we carried out experimental comparisons that highlight the importance of sound dataset-splitting techniques.
In summary, this study shows that mel spectrograms are an effective feature-extraction method for convolutional neural networks (CNNs) in speech emotion recognition tasks. The benefit of mel spectrograms is evident in our quantitative visualisations, even though plain spectrograms are still used in the literature. Going ahead, it is essential

REFERENCES
[1] Zielonka, M.; Piastowski, A.; Czyżewski, A.; Nadachowski, P.; Operlejn, M.; Kaczor, K. "Recognition of Emotions in Speech Using Convolutional Neural
Networks on Different Datasets." Electronics 2022, 11, 3831.
[2] Abeer Ali Alnuaim, Mohammed Zakariah, Prashant Kumar Shukla, Aseel Alhadlaq, Wesam Atef Hatamleh, Hussam Tarazi, R. Sureshbabu, Rajnish Ratna,
"Human-Computer Interaction for Recognizing Speech Emotions Using Multilayer Perceptron Classifier", Journal of Healthcare Engineering, vol. 2022,
Article ID 6005446, 12 pages, 2022.
[3] Singh, A., Srivastava, K. K., & Murugan, H. "Speech Emotion Recognition Using Convolutional Neural Network (CNN)." International Journal of
Psychosocial Rehabilitation, Vol. 24, Issue 08, 2020.
[4] Anvarjon, T.; Mustaqeem; Kwon, S. "Deep-Net: A Lightweight CNN-Based Speech Emotion Recognition System Using Deep Frequency Features." Sensors
2020, 20, 5212.
[5] F. Andayani, L. B. Theng, M. T. Tsun and C. Chua, "Hybrid LSTM-Transformer Model for Emotion Recognition From Speech Audio Files," in IEEE Access,
vol. 10, pp. 36018-36027, 2022.
[6] L. Yunxiang and Z. Kexin, "Design of Efficient Speech Emotion Recognition Based on Multi Task Learning," in IEEE Access, vol. 11, pp. 5528-5537, 2023.
[7] M. B. Er, "A Novel Approach for Classification of Speech Emotions Based on Deep and Acoustic Features," in IEEE Access, vol. 8, pp. 221640-221653, 2020.
[8] K. V. Krishna, N. Sainath and A. M. Posonia, "Speech Emotion Recognition using Machine Learning," 2022 6th International Conference on Computing
Methodologies and Communication (ICCMC), Erode, India, 2022, pp. 1014-1018.
[9] Eyben, F., Wöllmer, M., & Schuller, B. "Opensmile: the Munich versatile and fast open-source audio feature extractor." Proceedings of the 18th ACM
international conference on Multimedia. ACM, 2010.
[10] Schuller, B., Steidl, S., Batliner, A., Vinciarelli, A., Scherer, K., Ringeval, F., ... & Burkhardt, F. "The INTERSPEECH 2013 computational paralinguistics
challenge: social signals, conflict, emotion, autism." Proceedings INTERSPEECH, 2013.
[11] Goodfellow, I., Bengio, Y., & Courville, A. "Deep learning." MIT press, 2016.
[12] LeCun, Y., Bengio, Y., & Hinton, G. "Deep learning." Nature, 521(7553), 436-444, 2015.
[13] Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. "Imagenet: A large-scale hierarchical image database." IEEE conference on computer vision and
pattern recognition, 2009.
[14] He, K., Zhang, X., Ren, S., & Sun, J. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern
recognition, 2016.
[15] Simonyan, K., & Zisserman, A. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556, 2014.
[16] Hinton, G., Deng, L., Yu, D., Dahl, G. E., Mohamed, A. R., Jaitly, N., ... & Kingsbury, B. "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups." IEEE Signal Processing Magazine, 29(6), 82-97, 2012.
[17] Graves, A., & Schmidhuber, J. "Framewise phoneme classification with bidirectional LSTM and other neural network architectures." Neural Networks, 18(5-6), 602-610, 2005.
[18] Kingma, D. P., & Ba, J. "Adam: A method for stochastic optimization." arXiv preprint arXiv:1412.6980, 2014.
[19] Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., ... & Zheng, X. "TensorFlow: Large-scale machine learning on heterogeneous systems."
Software available from tensorflow.org, 2015.
[20] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., ... & Desmaison, A. "PyTorch: An imperative style, high-performance deep learning library." Advances in Neural Information Processing Systems, 32, 8024-8035, 2019.
