



Sentiment Analysis through Voice Using Rule-Based, Pre-existing Emotion Recognition, and Keyword Spotting Approaches
Prabhat Kumar Srivastava
Department of Computer Science and Engineering
IMS Engineering College, Ghaziabad, Uttar Pradesh, India
[email protected]

Basudeo Singh Roohani
Department of Computer Science and Engineering
IMS Engineering College, Ghaziabad, Uttar Pradesh, India
[email protected]

Abstract—Speech Emotion Recognition (SER) through voice is a topic of significant interest, particularly in the field of psychology, owing to its considerable potential. Machine learning algorithms are constantly advancing, and SER has numerous practical applications. Human speech carries nonverbal cues that can be captured through factors such as pitch, energy, and Mel Frequency Cepstral Coefficients (MFCCs). SER typically involves three main processes: signal preprocessing, feature selection/extraction, and classification based on the underlying assumptions. The combination of these steps with the unique characteristics of human speech makes machine learning an effective approach for SER. Recent studies have applied various machine learning methods to SER tasks. However, only a limited number of studies have explored the technologies and methods that can aid in the three primary steps of SER implementation, and the challenges associated with these procedures are often overlooked or only briefly addressed. This article provides a comprehensive review of past research on SER using machine learning, focusing on the three steps of SER implementation. It also addresses the challenges and solutions related to these steps, including the inclusion of minority populations in speaker-independent experiments. Furthermore, guidance on SER evaluation is provided, emphasizing the principles and measures for testing. It is our hope that this article will serve as a valuable resource for SER researchers, enabling them to leverage machine learning techniques to enhance SER solutions, identify areas for improvement in existing models, and drive the development of new technologies that improve SER performance.

Keywords—Speech Emotion Recognition (SER), Machine Learning in Emotion Recognition, Emotion Detection, Mood Classification, Ensemble Methods, Support Vector Machines (SVM), Naive Bayes, Decision Trees, Neural Networks, Random Forest, Sentiment Analysis Datasets, Labeled Text Corpora, Social Media Data, Handling Negation, Sarcasm Detection, Domain Adaptation, Multilingual Sentiment Analysis, Product Reviews, Social Media Sentiment, Customer Feedback

I. INTRODUCTION

Speech is a widely used form of human communication that conveys both words and expression. The spoken words carry the content of the conversation, while paralinguistic cues convey the speaker's gender, mood, age, and other characteristics. Many studies have shown that speech can serve as a natural channel connecting machines and humans [1]. This, however, requires a machine that recognizes the human voice and accurately infers the speaker's state, just as humans do. Hence the interest in the field of Speech Emotion Recognition (SER), which focuses on identifying a speaker's emotional state from their voice.

SER is an important field of research with many applications, including telephone conversations [2,3], human-computer interaction (HCI) [4], automatic translation systems, car operation [5,6], and medicine. In healthcare, for example, the emotional state of a patient can be inferred from the patient's voice, enabling appropriate facilities and support [7,8]. However, due to differences in people's speech and cultural backgrounds, the selection of acoustic features for emotion recognition is difficult and laborious. The features currently used for SER are divided into continuous features (formants, energy, tone), spectral features, Teager Energy Operator features, and qualitative (voice quality) features [9]. However, the information derived from these features often relies on expert knowledge and is frequently too low-level to capture emotions in difficult situations. More importantly, the main limitations of expert-identified features are:


- Inability to accurately identify emotional changes in different situations, e.g., dissimilarities between speakers, differences in utterance, and environmental influences [10].
- Training effective machine learning models requires a significant amount of time, financial resources, and human skill [11].
- There is a lack of well-developed algorithms that can extract features for cognitive purposes [12].

To solve these problems, an SER system needs the ability to extract the relevant features that matter for its operation. Many studies have proposed techniques for inferring features automatically from speech signals. For example, one study used a single-layer CNN to learn features automatically, while another fed a two-layer CNN into a long short-term memory (LSTM) layer for the SER system. However, shallow models such as one- and two-layer CNNs may fail to identify important features. Ref. [15] used a deep CNN that achieves frequency separation through a combination of filters for SER; the proposed deep CNN learned its features directly from the audio data.

Fig. 1: Sentiment Analysis at low level


This research presents a new SER technique that combines MFCCs with time features obtained from every input in the data set. The method has four main parts: data collection, feature extraction, model training, and prediction, as shown in Figure 1. In the extraction process, the features always include the MFCCs, together with time features extracted from each recording. In the feature-combining phase, the extracted features are concatenated. In the last step, a CNN model containing three 1D convolutional layers (each followed by max-pooling) and fully connected (FC) layers is used for SER.
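To make this architecture concrete, the following is a minimal sketch of such a network in Python with Keras. The filter counts, kernel sizes, input dimensionality (frames of MFCC-plus-time features), and number of classes are illustrative assumptions, not the exact configuration used in this work.

# Minimal sketch of a 1D-CNN for SER: three Conv1D blocks with
# max-pooling, followed by fully connected layers. All sizes below
# are illustrative assumptions, not this paper's exact configuration.
from tensorflow.keras import layers, models

NUM_FRAMES = 200     # assumed frames per utterance (padded/truncated)
NUM_FEATURES = 44    # assumed: 40 MFCCs + 4 time features per frame
NUM_CLASSES = 3      # positive / negative / neutral

model = models.Sequential([
    layers.Input(shape=(NUM_FRAMES, NUM_FEATURES)),
    layers.Conv1D(64, kernel_size=5, activation="relu", padding="same"),
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(128, kernel_size=5, activation="relu", padding="same"),
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(256, kernel_size=3, activation="relu", padding="same"),
    layers.MaxPooling1D(pool_size=2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()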
II. Related Work

Classification in SER consists of two main steps. The first stage involves extracting appropriate, discriminative features from the voice signal. The second stage requires the selection of a classifier that can recognize the distinctive character of those features. The following subsections briefly describe the two phases of SER.
A. Feature Extraction

Feature extraction is one of the most commonly used methods for obtaining the characteristics of the audio signal. The first step is to mitigate the poor quality of individual frames by splitting the signal into speech segments of a specific duration. Features used for SER generally fall into four categories: speech features, situational features, aesthetic features, and variable features. Among these, acoustic properties are the most popular and the best suited for SER. They include voice quality features such as jitter, shimmer, the first three formants, and the harmonics-to-noise ratio, as well as prosodic features such as pitch and loudness.
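As a concrete illustration of acoustic feature extraction, the sketch below computes MFCCs and simple prosodic statistics with the librosa library; the sample rate, the choice of 40 coefficients, and the file name are assumptions made for illustration.

# Sketch: extracting MFCCs and simple prosodic features with librosa.
# Parameter choices (16 kHz, 40 coefficients) are illustrative assumptions.
import numpy as np
import librosa

def extract_features(path, sr=16000, n_mfcc=40):
    y, sr = librosa.load(path, sr=sr)                        # load and resample
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    rms = librosa.feature.rms(y=y)                           # frame energy (loudness proxy)
    f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)            # pitch track in Hz
    # Summarize over time so every clip yields a fixed-size vector.
    return np.concatenate([
        mfcc.mean(axis=1), mfcc.std(axis=1),
        [rms.mean(), rms.std(), np.nanmean(f0), np.nanstd(f0)],
    ])

features = extract_features("clip_0001.wav")                 # hypothetical file
print(features.shape)                                        # (84,) with these settings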

Speech Emotion Recognition (SER) has also been implemented using deep learning (DL) architectures; the difficulty classical machine learning algorithms have with large data sets, together with the rapid development of computing hardware, has increased interest in DL. Akçay and Oğuz [17] presented a comprehensive review of SER, described effective methods for feature analysis, and emphasized the importance of using optimal classification methods to increase the power of SER systems. Deep learning approaches were reviewed by Jahangir et al. [9], who grouped them into discriminative, generative, and hybrid architectures and studied their optimization for SER. Reference [18] proposed a method that combines multiple CNN-based models, achieving an accuracy of 35.7% on AFEW5.0 and 44.1% on the BAUM-1 dataset. To improve accuracy while reducing processing time and computational cost, Ref. [21] proposed a clustering-based approach: a k-means clustering algorithm selects the most informative segments of the signal, and the Short-Time Fourier Transform (STFT) generates a spectrogram from them. A ResNet CNN model then extracts features from the spectrogram, and a BiLSTM model uses these features to estimate the emotion.

B. Classification Methods

Researchers use various machine learning techniques to measure emotion in audio files. Classifiers such as Random Forest (RF) [24], Multilayer Perceptron (MLP) [25], Support Vector Machine (SVM), Hidden Markov Model (HMM) [26], Gaussian Mixture Model (GMM), and k-NN are often used to solve voice-related problems, including sentiment analysis [11, 27, 28] and speaker identification [29-31].
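The snippet below sketches how several of these classifiers could be compared on fixed-size acoustic feature vectors with scikit-learn. It is a generic illustration rather than the evaluation protocol of any cited study; the feature matrix and labels are random placeholders.

# Sketch: comparing common SER classifiers with scikit-learn.
# X would hold acoustic feature vectors (one row per clip) and y the
# sentiment labels; random placeholders are used here.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 84))      # placeholder feature matrix
y = rng.integers(0, 3, size=200)    # placeholder labels (3 classes)

classifiers = {
    "RF":  RandomForestClassifier(n_estimators=200, random_state=0),
    "MLP": make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "kNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validated accuracy
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")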

III. Methodology

A. Rule-Based Approach

A.1. Data Collection:
Start from a set of audio recordings annotated with sentiment labels that indicate whether the speech is positive, negative, or neutral.

A.2. Preprocessing:
Convert the audio into a format suitable for analysis and, if needed, apply noise reduction techniques. Split the audio into chunks or frames that can be handled.

A.3. Speech-to-Text Conversion:
Convert the spoken words to text with a speech-to-text (STT) system. Rule-based analysis relies heavily on rules that are usually expressed as textual patterns.

A.4. Feature Extraction:
Obtain relevant features from the transcribed text. These could be word frequencies, sentiment-related words, or even acoustic features derived from the audio signal.

A.5. Rule Definition:
Linguistic patterns, sentiment-related words, or any other criteria that correlate with the target sentiment can form the rule set. Rule formulation should consider language nuances, context, and whether the task is domain-specific.

A.6. Rule Application:
Apply the rules, with some freedom, to the features extracted from the text (or directly to the transcribed text) to assign a sentiment polarity (positive, negative, or neutral).

A.7. Rule Refinement:
Refine the rules based on the observed performance of the sentiment analysis. This may include adjusting thresholds, adding new rules, or changing existing ones.

A.8. Evaluation:
Evaluate the accuracy of the method using appropriate evaluation metrics; for example, the results can be compared against a manually annotated dataset.

A.9. Optimization:
Fine-tune parameters and rules to improve accuracy and handle different linguistic variations.

A.10. Testing and Validation:
Validate the rule-based approach on a separate set of data to ensure its generalizability and robustness. A minimal sketch of steps A.5-A.8 on already-transcribed speech follows this list.
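Here is that sketch. It assumes transcription has already been produced by an STT system; the transcripts, the tiny rule set, and the gold labels are hypothetical placeholders.

# Sketch of the rule-based steps on already-transcribed speech.
# Transcripts, rules, and gold labels are hypothetical examples.
import re

RULES = [  # (pattern, polarity): first matching rule wins
    (re.compile(r"\b(?:not|never|no)\s+(?:good|great|happy)\b"), "negative"),
    (re.compile(r"\b(?:good|great|happy|love)\b"), "positive"),
    (re.compile(r"\b(?:bad|terrible|sad|hate)\b"), "negative"),
]

def classify(transcript):
    text = transcript.lower()
    for pattern, polarity in RULES:
        if pattern.search(text):
            return polarity
    return "neutral"            # fallback when no rule fires

transcripts = ["I am not happy with this", "What a great day", "It is Tuesday"]
gold = ["negative", "positive", "neutral"]

predictions = [classify(t) for t in transcripts]
accuracy = sum(p == g for p, g in zip(predictions, gold)) / len(gold)
print(predictions, f"accuracy={accuracy:.2f}")

Placing the negation rule first keeps phrases like "not happy" from matching the plain positive-keyword rule, which is the kind of decision the rule refinement step (A.7) refers to.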
Fig. 2: Mel-frequency cepstral coefficients

B. Emotion Recognition Systems:
Utilize pre-existing emotion recognition systems that are trained to recognize specific emotions in speech and can be adapted for sentiment analysis.

B.1. System Selection:
Choose or develop a predefined sentiment analysis system. This could be a rule-based system, a machine learning model, or a combination of both.

B.2. Data Collection:
Collect a dataset of text samples for sentiment analysis. This dataset should be representative of the type of data the system is expected to analyze.

B.3. Preprocessing:
Clean and preprocess the text data. This involves tasks like removing irrelevant characters, converting text to lowercase, and handling issues like spelling mistakes.

B.4. System Configuration:
If applicable, configure the predefined system with any specific settings, parameters, or rules. This step might involve setting thresholds, adjusting weights, or defining specific criteria.

B.5. Feature Extraction:
Extract features from the text data. For a rule-based system, this may involve identifying words or patterns associated with the different sentiments; for machine learning models, there may be a step that converts text features into numerical features.

B.6. Sentiment Analysis:
Perform sentiment analysis on the preprocessed data by applying the predefined system. This could mean running the text through a set of rules, using a pre-trained machine learning model, or combining both methods.

B.7. Evaluation:
Test the performance of the system using appropriate metrics, with accuracy, precision, and F1 score as common measures. This step shows how the system behaves on the dataset.

B.8. Fine-Tuning:
If necessary, fine-tune the system according to the evaluation results. This could consist of changing or fixing rules, retraining machine learning models, or adjusting system parameters to make the model more precise.

B.9. Validation:
Check that the system is accurate and precise on a different dataset to ensure generalizability to unseen data.

B.10. Integration (if applicable):
If the system is part of a bigger application or pipeline, integrate it accordingly. Make sure that both input and output use a format that suits the overall system requirements.

B.11. Documentation:
Document the system, dataset, preprocessing steps, and configurations made. This allows for reproduction and reference later on.

B.12. Monitoring and Maintenance:
If the system is deployed in a real-world scenario, set up monitoring mechanisms to track its performance over time, and keep the system updated as data patterns or requirements change. A short sketch of step B.6 with a pre-trained library follows this list.
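As one concrete instance of step B.6, the sketch below applies the pre-trained sentiment model from the TextBlob library (the library shown in Fig. 3) to transcribed speech; the polarity cut-offs used to map scores onto three classes are illustrative assumptions.

# Sketch: sentiment via a pre-existing model (TextBlob, as in Fig. 3).
# TextBlob returns polarity in [-1, 1]; the 0.1 cut-offs below are
# illustrative assumptions for mapping scores onto three classes.
from textblob import TextBlob

def textblob_sentiment(transcript, threshold=0.1):
    polarity = TextBlob(transcript).sentiment.polarity
    if polarity > threshold:
        return "positive", polarity
    if polarity < -threshold:
        return "negative", polarity
    return "neutral", polarity

for text in ["I really love this product", "This was a terrible experience"]:
    label, score = textblob_sentiment(text)
    print(f"{label:8s} ({score:+.2f}) <- {text}")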


Fig. 3: Emotion Recognition using the TextBlob library
C. Keyword Spotting:
Identify specific keywords or phrases associated with
different sentiments. This approach is rule-based and relies
on predefined lists of words.

C.1. Define Sentiment Categories:
Clearly define the sentiment categories you want to identify, such as positive, negative, or neutral.
C.2. Select Keywords:
Identify keywords or phrases associated with each sentiment category. These keywords act as indicators of the sentiment expressed in the text.
C.3. Create a Keyword List:
Compile a list of keywords for each sentiment category. This list serves as the basis for spotting sentiments in the text.

C.4. Preprocessing:
Preprocess the text data by removing irrelevant information, handling punctuation, and converting the text to lowercase for consistency.

C.5. Tokenization:
Tokenize the text into words or phrases. This step is important for matching the input text against the predefined keyword list.

C.6. Keyword Matching:
Write a keyword-matching algorithm that checks the tokenized text for keywords. This could use simple string matching or more complex techniques such as regular expressions.

C.7. Scoring System:
Create a scoring system in which each keyword occurrence contributes to the score and its relevance determines the weight of the contribution. Depending on the matched keywords, assign a score indicating positive, negative, or neutral sentiment.

C.8. Threshold Setting:
Establish thresholds for sentiment classification. For instance, if the total number of positive keywords exceeds a set threshold, classify the overall sentiment as positive.

C.9. Handling Negations and Intensifiers:
Account for negations (e.g., "not happy") and intensifiers (e.g., "very good") that can alter the sentiment, and adjust scores accordingly.

C.10. Testing and Validation:
Apply the keyword spotting technique to a known dataset and check its accuracy, adjusting keywords and thresholds for best performance. A sketch covering steps C.5-C.9 follows this list.

Fig. 4: Emotion Recognition using Keyword Spotting

IV. Discussion

This study investigated sentiment analysis through voice under three separate approaches, namely rule-based analysis, pre-existing emotion recognition, and keyword spotting. This section discusses each approach, including its limitations and implications and the future research avenues. Sentiment analysis through voice can be carried out in a straightforward manner with the rule-based approach: by establishing fixed rules for interpreting vocal cues, it offers transparency and interpretability. It may, however, struggle with subtle nuances in emotion and with complex linguistic structures, which limits its effectiveness in capturing nuanced sentiment.

Second, the pre-existing emotion recognition approach relies on existing emotion models or datasets to identify sentiment in voice recordings. Whereas rule-based approaches are confined to a fixed rule set, this method can capture a wider range of emotions thanks to extensive emotional lexicons and trained models. Its effectiveness suffers greatly, however, when the available data are not a good match for the desired context or the demography of the target population. Finally, keyword spotting identifies specific keywords and expressions that indicate certain sentiments in the voice data. It makes sentiment analysis simple and efficient when the expression of emotion is associated with clear linguistic markers, but it may miss details of tone, intonation, or context that complete the sentiment being conveyed.

Although the benefits and limitations of each approach are apparent, the approaches can be combined or integrated to provide more powerful sentiment analysis systems.

For example, sentiment analysis pipelines can gain interpretability and flexibility by integrating rule-based heuristics with existing emotion recognition models. In addition, blending keyword spotting with machine learning models could increase the accuracy and flexibility of the sentiment classifier.
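As an illustration of such a combination, the sketch below averages a simple keyword-based score with TextBlob's pre-trained polarity; the equal weights and the decision threshold are assumptions made for demonstration, not a tuned configuration.

# Sketch: combining a keyword-based score with a pre-trained model.
# Weights and threshold are illustrative assumptions.
from textblob import TextBlob

POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "sad"}

def rule_score(text):
    tokens = text.lower().split()
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens) / max(len(tokens), 1)

def combined_sentiment(transcript, w_rules=0.5, w_model=0.5, threshold=0.2):
    score = (w_rules * rule_score(transcript)
             + w_model * TextBlob(transcript).sentiment.polarity)
    if score > threshold:
        return "positive"
    return "negative" if score < -threshold else "neutral"

print(combined_sentiment("I really love this, it is very good"))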
However, this research has limitations that point to future work. First, the findings may be limited in their generalizability because of the specific datasets and contexts used; future research should validate these approaches across different demographics, languages, and cultural backgrounds. Second, the accuracy and precision of these methods may vary with the quality and type of voice data, since background noise, accent variations, and speech impediments differ across recordings. Working on such challenges by developing robust preprocessing techniques would make sentiment analysis systems more reliable.

Additionally, the integration of multimodal data sources (such as facial expressions, body language, and physiological signals) could give a more complete picture of human emotions and sentiments. It is also critical to investigate the ethical implications of sentiment analysis through voice, specifically privacy, consent, and potential human bias, to ensure its safe deployment in the real world.

V. Conclusion

This research paper concludes by exploring the domain of sentiment analysis through voice with a multi-pronged approach using rule-based techniques, existing emotion recognition techniques, and keyword spotting algorithms. By investigating these methodologies, we established their properties and their limitations in capturing and interpreting sentiment in vocal expression.

The rule-based approach provides a structured method for sentiment analysis by defining explicit rules to recognize and examine emotions according to set criteria. While this way is transparent and interpretable, its effectiveness depends on predefined rules that may not capture the nuances of human emotion. By contrast, pre-existing emotion recognition systems employ machine learning models trained on vast datasets to automatically decode and categorize emotions from voice recordings. These techniques work very well at capturing subtle variations in vocal expression and are well suited to different lingual and cultural contexts; they may, however, require huge amounts of annotated data for training and may lack transparency about how decisions are made. Additionally, integrating keyword spotting makes it possible to identify particular keywords or phrases tied to sentiments in a voice recording, boosting sentiment analysis with contextually significant cues and making emotion classification more precise.

In any case, combining these approaches provides a comprehensive solution to sentiment analysis in voice by using the advantages of each methodology to limit the disadvantages of the others. With rule-based analysis, pre-existing emotion recognition, and keyword spotting together, researchers and practitioners can build robust sentiment analysis systems that accurately interpret emotions from vocal expressions across various domains and applications.

In the fast-developing fields of NLP and ML, it is reasonable to expect more sophisticated and innovative methodologies and technologies for analyzing sentiment through voice, leading to an increasingly better understanding and interpretation of human emotions in spoken language.

REFERENCES

1. Chen, L.; Su, W.; Feng, Y.; Wu, M.; She, J.; Hirota, K. Two-layer fuzzy multiple random forest for speech emotion recognition in human-robot interaction. Inf. Sci. 2020, 509, 150–163.

2. Hansen, J.H.; Cairns, D.A. Icarus: Source generator based real-time recognition of speech in noisy stressful and Lombard effect environments. Speech Commun. 1995, 16, 391–422.

3. Koduru, A.; Valiveti, H.B.; Budati, A.K. Feature extraction algorithms to improve the speech emotion recognition rate. Int. J. Speech Technol. 2020, 23, 45–55.

4. Zheng, W.; Zheng, W.; Zong, Y. Multi-scale discrepancy adversarial network for cross-corpus speech emotion recognition. Virtual Real. Intell. Hardw. 2021, 3, 65–75.

5. Schuller, B.; Rigoll, G.; Lang, M. Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine-belief network architecture. In Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, QC, Canada, 17–21 May 2004; pp. 577–580.

6. Spencer, C.; Koç, İ.A.; Suga, C.; Lee, A.; Dhareshwar, A.M.; Franzén, E.; Iozzo, M.; Morrison, G.; McKeown, G. A Comparison of Unimodal and Multimodal Measurements of Driver Stress in Real-World Driving Conditions; ACM: New York, NY, USA, 2020.

7. France, D.J.; Shiavi, R.G.; Silverman, S.; Silverman, M.; Wilkes, M. Acoustical properties of speech as indicators of depression and suicidal risk. IEEE Trans. Biomed. Eng. 2000, 47, 829–837.

8. Uddin, M.Z.; Nilsson, E.G. Emotion recognition using speech and neural structured learning to facilitate edge intelligence. Eng. Appl. Artif. Intell. 2020, 94, 103775.

9. Jahangir, R.; Teh, Y.W.; Hanif, F.; Mujtaba, G. Deep learning approaches for speech emotion recognition: State of the art and research challenges. Multimed. Tools Appl. 2021, 80, 23745–23812.

10. Fahad, M.S.; Ranjan, A.; Yadav, J.; Deepak, A. A survey of speech emotion recognition in natural environment. Digit. Signal Process. 2021, 110, 102951.

11. Jahangir, R.; Teh, Y.W.; Mujtaba, G.; Alroobaea, R.; Shaikh, Z.H.; Ali, I. Convolutional neural network-based cross-corpus speech emotion recognition with data augmentation and features fusion. Mach. Vis. Appl. 2022, 33, 41.


12. Ayadi, M.E.; Kamel, M.S.; Karray, F. Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognit. 2011, 44, 572–587.

13. Abdel-Hamid, O.; Mohamed, A.-R.; Jiang, H.; Deng, L.; Penn, G.; Yu, D. Convolutional neural networks for speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 2014, 22, 1533–1545.

14. Trigeorgis, G.; Ringeval, F.; Brueckner, R.; Marchi, E.; Nicolaou, M.A.; Schuller, B.; Zafeiriou, S. Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network. In Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016; pp. 5200–5204.

15. Anvarjon, T.; Kwon, S. Deep-net: A lightweight CNN-based speech emotion recognition system using deep frequency features. Sensors 2020, 20, 5212.

16. Rybka, J.; Janicki, A. Comparison of speaker dependent and speaker independent emotion recognition. Int. J. Appl. Math. Comput. Sci. 2013, 23, 797–808.

17. Akçay, M.B.; Oğuz, K. Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers. Speech Commun. 2020, 116, 56–76.

18. Zhang, S.; Tao, X.; Chuang, Y.; Zhao, X. Learning deep multimodal affective features for spontaneous speech emotion recognition. Speech Commun. 2021, 127, 73–81.

19. Pawar, M.D.; Kokate, R.D. Convolution neural network based automatic speech emotion recognition using Mel-frequency cepstrum coefficients. Multimed. Tools Appl. 2021, 80, 15563–15587.

20. Issa, D.; Demirci, M.F.; Yazici, A. Speech emotion recognition with deep convolutional neural networks. Biomed. Signal Process. Control 2020, 59, 101894.

21. Sajjad, M.; Kwon, S. Clustering-based speech emotion recognition by incorporating learned features and deep BiLSTM. IEEE Access 2020, 8, 79861–79875.

22. Badshah, A.M.; Rahim, N.; Ullah, N.; Ahmad, J.; Muhammad, K.; Lee, M.Y.; Kwon, S.; Baik, S.W. Deep features-based speech emotion recognition for smart affective services. Multimed. Tools Appl. 2019, 78, 5571–5589.

23. Er, M.B. A novel approach for classification of speech emotions based on deep and acoustic features. IEEE Access 2020, 8, 221640–221653.

24. Noroozi, F.; Sapiński, T.; Kamińska, D.; Anbarjafari, G. Vocal-based emotion recognition using random forests and decision tree. Int. J. Speech Technol. 2017, 20, 239–246.

25. Nicholson, J.; Takahashi, K.; Nakatsu, R. Emotion recognition in speech using neural networks. Neural Comput. Appl. 2000, 9, 290–296.

26. Nwe, T.L.; Foo, S.W.; Silva, L.C.D. Speech emotion recognition using hidden Markov models. Speech Commun. 2003, 41, 603–623.

27. Aljuhani, R.H.; Alshutayri, A.; Alahdal, S. Arabic speech emotion recognition from Saudi dialect corpus. IEEE Access 2021, 9, 127081–127085.

28. Al-onazi, B.B.; Nauman, M.A.; Jahangir, R.; Malik, M.M.; Alkhammash, E.H.; Elshewey, A.M. Transformer-based multilingual speech emotion recognition using data augmentation and feature fusion. Appl. Sci. 2022, 12, 9188.

29. Jahangir, R.; Teh, Y.W.; Memon, N.A.; Mujtaba, G.; Zareei, M.; Ishtiaq, U.; Akhtar, M.Z.; Ali, I. Text-independent speaker identification through feature fusion and deep neural network. IEEE Access 2020, 8, 32187–32202.

30. Jahangir, R.; Teh, Y.W.; Nweke, H.F.; Mujtaba, G.; Al-Garadi, M.A.; Ali, I. Speaker identification through artificial intelligence techniques: A comprehensive review and research challenges. Expert Syst. Appl. 2021, 171, 114591.

31. Khan, A.A.; Jahangir, R.; Alroobaea, R.; Alyahyan, S.Y.; Almulhi, A.H.; Alsafyani, M. An efficient text-independent speaker identification using feature fusion and transformer model. Comput. Mater. Contin. 2023, 75, 4085–4100.
