PKS Sentiment Paper
Ashish Dixit
Abstract—Speech Emotion Recognition (SER) through voice is a topic of significant interest, particularly in the field of psychology, due to its considerable potential. Machine learning algorithms are constantly advancing, and SER has numerous practical applications. Human speech includes nonverbal cues that can be captured through factors such as pitch, energy, and Mel Frequency Cepstral Coefficients (MFCCs). SER typically involves three main processes: signal preprocessing, feature selection/extraction, and classification based on underlying assumptions. The combination of these steps with the unique characteristics of human speech makes machine learning an effective approach for SER. Recent studies have applied various machine learning methods to SER tasks. However, only a limited number of studies have explored the technologies and methods that can aid in the three primary steps of SER implementation, and the challenges associated with these procedures are often overlooked or only briefly addressed. This article provides a comprehensive review of past research on SER using machine learning, focusing on the three steps of SER implementation. It also addresses the challenges and solutions related to these steps, including the inclusion of minority populations in speaker-independent experiments. Furthermore, guidance on SER evaluation is provided, emphasizing the principles and measures used for testing. It is our hope that this article will serve as a valuable resource for SER researchers, enabling them to leverage machine learning techniques to enhance SER solutions, identify areas for improvement in existing models, and drive the development of new technologies that improve SER performance.

Keywords—Speech Emotion Recognition (SER), Machine Learning in Emotion Recognition, Emotion Detection, Mood Classification, Ensemble Methods, Support Vector Machines (SVM), Naive Bayes, Decision Trees, Neural Networks, Random Forest, Sentiment Analysis Datasets, Labeled Text Corpora, Social Media Data, Handling Negation, Sarcasm Detection, Domain Adaptation, Multilingual Sentiment Analysis, Product Reviews, Social Media Sentiment, Customer Feedback

I. INTRODUCTION
Speech is a widely used form of human communication that carries both words and expression. The spoken words convey the content of the conversation, while nonverbal cues convey the speaker's gender, mood, age, and other characteristics. Many studies have shown that speech can serve as a simple means of connecting machines and humans [1]. But this requires a machine that recognizes the human voice and predicts the speaker's state as accurately as humans do. This has created interest in the field of Speech Emotion Recognition (SER), which focuses on identifying a speaker's emotional state from their voice.
SER is a very important field of research with many applications, including telephone conversations [2,3], human-computer interaction (HCI) [4], automatic translation systems, car operation [5,6], and medicine. In healthcare, for example, the emotional state of a patient can be determined from the patient's voice, allowing appropriate facilities and support to be provided [7,8]. However, due to differences in people's speech and cultural backgrounds, the selection of acoustic features for emotion recognition is difficult and laborious. The speech features currently used for SER are divided into continuous features (formants, energy, pitch), spectral features, Teager Energy Operator features, and qualitative (voice quality) features [9]. However, the information derived from these features often relies on expert knowledge and is frequently too low-level to capture emotions in difficult situations. More importantly, the main limitations of expert-defined features are:
- Inability to accurately identify emotional changes in different situations, e.g., differences between speakers, differences in utterance, and external influences [10].
- Inability to identify changes of mind in different situations, such as dissimilarities between speakers, adaptations, and environmental influences [10].
- Training effective machine learning models requires a significant amount of time, financial resources, and human expertise [11].
- There is a lack of well-developed algorithms that can extract features for cognitive purposes [12].
To solve these problems, SER needs the ability to extract the relevant features that are important for its operation. Many studies have proposed techniques for inferring features automatically from speech signals. For example, one study used a single-layer CNN to learn features automatically, while another used a two-layer CNN feeding a long short-term memory (LSTM) layer for the SER system. However, shallow models such as one- and two-layer CNNs may fail to identify important features. Ref. [15] used a deep CNN to achieve frequency separation using a combination of filters and SER technology. The proposed deep CNN learned the features of the audio data.
… features, and variable features. Among these, acoustic features are the most popular and best suited for SER. They include voice-quality features such as jitter, shimmer, the first three formants, and the harmonics-to-noise ratio, as well as prosodic features such as pitch and loudness.
Speech Emotion Recognition (SER) has also been implemented using deep learning (DL) architectures. The difficulty classical machine learning algorithms have in processing large datasets, together with the rapid development of computing technology, has increased this interest. Akçay and Oğuz [17] presented a comprehensive review of SER, described three main methods for feature analysis, and emphasized the importance of using optimal classification methods to increase the power of SER systems. Deep learning approaches have been reviewed by Jahangir et al. [9], who divided them into separation, reproduction, and combination approaches and studied their optimization for SER. Reference [18] proposed a method that combines different models using multiple CNNs, achieving an accuracy of 35.7% on AFEW5.0 and 44.1% on the BAUM-1 dataset. To improve accuracy and reduce processing time and computational cost, Ref. [21] uses k-means clustering to select the most informative parts of the signal and the Short-Time Fourier Transform (STFT) to generate a spectrogram. A ResNet CNN model is then used to extract features from the spectrogram, and a BiLSTM model uses these features to predict the emotion.
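As a point of reference for the spectrogram- and MFCC-based features discussed above (and illustrated later in Fig. 2), the following is a minimal sketch of frame-level feature extraction using the librosa library; the file path, sampling rate, and frame parameters are illustrative assumptions rather than values taken from the cited works.

# Minimal sketch: STFT spectrogram and MFCC extraction for SER front-ends.
# Assumes librosa and numpy are installed; "speech.wav" is a placeholder path.
import librosa
import numpy as np

# Load audio at a 16 kHz sampling rate (common for speech processing).
y, sr = librosa.load("speech.wav", sr=16000)

# Short-Time Fourier Transform: 25 ms windows with 10 ms hops.
stft = librosa.stft(y, n_fft=512, win_length=400, hop_length=160)
spectrogram = np.abs(stft) ** 2

# Log-mel spectrogram, a typical input for CNN-based SER models.
mel = librosa.feature.melspectrogram(S=spectrogram, sr=sr, n_mels=64)
log_mel = librosa.power_to_db(mel, ref=np.max)

# 13 Mel Frequency Cepstral Coefficients per frame (cf. Fig. 2).
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=512, hop_length=160)

print("log-mel shape:", log_mel.shape)  # (n_mels, n_frames)
print("MFCC shape:", mfcc.shape)        # (13, n_frames)

Either representation can then be fed to a CNN (with a recurrent layer such as a BiLSTM on top, as in the pipeline described above) for emotion classification.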
A.4. Rule Definition:
Linguistic patterns, sentiment-related words, or any other criteria that correlate with the sentiment indicator to be identified can form the rule set. Rule formulation should consider language nuances, context, and whether the task is domain-specific.
A.5. Rule Application:
The rules are applied, with a certain amount of freedom, to the features extracted from the text (or directly to the transcribed text) to interpret sentiment polarity (positive, negative, neutral).
A.6. Rule Refinement:
Refine the rules based on the performance of the sentiment analysis. This may include adjusting thresholds, adding new rules, or changing existing ones.
A.7. Evaluation:
Evaluate the accuracy of the method using appropriate evaluation metrics and measure its performance. In this case, the results could be compared against a manually annotated dataset.
A.8. Optimization:
Fine-tune parameters and rules to improve accuracy and handle different linguistic variations.
A.9. Testing and Validation:
Validate the rule-based approach on a separate set of data to ensure its generalizability and robustness.
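To make steps A.4, A.5, and A.7 concrete, the sketch below applies a small, hand-defined rule set to transcribed text and labels it positive, negative, or neutral; the lexicon, the negation rule, and the threshold are hypothetical examples, not the rules used in this study.

# Minimal rule-based sentiment sketch (steps A.4-A.7): hypothetical lexicon and rules.
import re

POSITIVE = {"good", "great", "happy", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "sad", "hate", "poor"}
NEGATIONS = {"not", "never", "no"}

def rule_based_sentiment(text: str, threshold: int = 1) -> str:
    """Score transcribed text with simple word rules and basic negation flipping."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        # Rule: a negation word directly before a sentiment word flips its polarity.
        if polarity and i > 0 and tokens[i - 1] in NEGATIONS:
            polarity = -polarity
        score += polarity
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

# Step A.7: compare predictions against a (hypothetical) manually annotated sample.
examples = [("I love this, it is great", "positive"),
            ("This is not good at all", "negative")]
correct = sum(rule_based_sentiment(t) == gold for t, gold in examples)
print(f"accuracy on sample: {correct}/{len(examples)}")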
Fig. 2: Mel-frequency cepstral coefficients

B. Emotion Recognition Systems:
Utilize pre-existing emotion recognition systems that are trained to recognize specific emotions in speech and can be adapted for sentiment analysis.
B.1. System Selection:
Choose or develop a predefined sentiment analysis system. This could be a rule-based system, a machine learning model, or a combination of both.
B.2. Data Collection:
Collect a dataset of text samples for sentiment analysis. This dataset should be representative of the type of data the system is expected to analyze.
B.3. Preprocessing:
Clean and preprocess the text data. This involves tasks such as removing irrelevant characters, converting text to lowercase, and handling issues such as spelling mistakes.
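The following is a small example of the kind of cleaning described in step B.3; the regular expression, the lowercasing choice, and the tiny spelling-correction map are illustrative assumptions.

# Minimal text preprocessing sketch for step B.3: cleaning transcribed text.
import re

# Hypothetical map of common transcription/spelling mistakes to corrections.
SPELLING_FIXES = {"gr8": "great", "recieve": "receive", "teh": "the"}

def preprocess(text: str) -> str:
    """Lowercase, strip irrelevant characters, collapse whitespace, fix known typos."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s']", " ", text)   # drop punctuation and symbols
    text = re.sub(r"\s+", " ", text).strip()    # collapse repeated whitespace
    tokens = [SPELLING_FIXES.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)

print(preprocess("Teh service was GR8!!  :)"))  # -> "the service was great"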
B.4. System Configuration:
If applicable, configure the predefined system with any specific settings, parameters, or rules. This step might involve setting thresholds, adjusting weights, or defining specific criteria.
B.5. Feature Extraction:
Extract features from the text data. In a rule-based system, this may involve identifying words or patterns associated with the different sentiments; with machine learning models, it may include a step that converts text features into numerical features.
B.6. Sentiment Analysis:
Perform sentiment analysis on the preprocessed data by applying the predefined system. This may mean running the text through a set of rules, using a pre-trained machine learning model, or combining both methods.
B.7. Evaluation:
Test the performance of the system using appropriate metrics, with precision, accuracy, and F1 score as common measures. This step allows us to check how the system behaves on the dataset.
B.8. Fine-Tuning:
If necessary, fine-tune the system according to the evaluation results. This could consist of changing or adding rules, retraining machine learning models, or adjusting system parameters to make the model more precise.
B.9. Validation:
Check that the system remains accurate and precise on a different dataset to ensure generalizability to unseen data.
B.10. Integration (if applicable):
If the system is part of a larger application or pipeline, integrate it accordingly. Make sure that both input and output follow a format that suits the overall system requirements.
B.11. Documentation:
Document the system, the dataset, the preprocessing steps, and the configurations made. This allows for reproduction and later reference.
B.12. Monitoring and Maintenance:
If the system is deployed in a real-world scenario, set up monitoring mechanisms to track its performance over time. Keep the system updated regularly as data patterns or requirements change.
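As a compact illustration of steps B.5–B.7 with a machine-learning back-end, the sketch below converts texts to numerical features, trains a classifier, and reports precision, recall, and F1; the toy labelled corpus and the scikit-learn model choice are assumptions for demonstration only.

# Sketch of steps B.5-B.7: feature extraction, sentiment classification, evaluation.
# Assumes scikit-learn is installed; the tiny labelled corpus below is made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = ["i love this product", "absolutely great experience",
         "this is terrible", "i hate the delays",
         "works fine", "not bad at all",
         "awful support", "really happy with it"]
labels = ["positive", "positive", "negative", "negative",
          "positive", "positive", "negative", "positive"]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels)

# B.5: TF-IDF turns text into numerical features; B.6: logistic regression classifies.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# B.7: evaluate with precision, recall, and F1 on the held-out split.
print(classification_report(y_test, model.predict(X_test), zero_division=0))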
… algorithms can be enhanced with the interpretability and plasticity of sentiment analysis through the integration of rule-based heuristics with existing emotion recognition models. In addition, blending keyword spotting with the machine learning model could increase the accuracy and flexibility of the sentiment classifier.
However, this research has limitations. First, the findings may be limited in their generalizability due to the specific datasets and contexts used; future research should validate these approaches across various demographics, languages, and cultural backgrounds. Second, the accuracy and precision of these methods may vary with the quality and type of voice data, such as background noise, accent variations, and speech impediments. Addressing these challenges by developing robust preprocessing techniques would make sentiment analysis systems more reliable.
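As a sketch of the score-fusion idea mentioned above (blending keyword spotting with a machine-learning sentiment model), the function below combines a lexicon-derived score with a classifier's predicted probability using a weighted average; the keyword lists, the weight, and the probability input are hypothetical.

# Hypothetical late-fusion sketch: blend a keyword-spotting score with an ML probability.
POSITIVE_KEYWORDS = {"great", "love", "excellent", "happy"}
NEGATIVE_KEYWORDS = {"terrible", "hate", "awful", "angry"}

def keyword_score(transcript: str) -> float:
    """Return a score in [0, 1]; 1.0 means positive keywords dominate."""
    tokens = transcript.lower().split()
    pos = sum(tok in POSITIVE_KEYWORDS for tok in tokens)
    neg = sum(tok in NEGATIVE_KEYWORDS for tok in tokens)
    if pos + neg == 0:
        return 0.5  # no sentiment keywords spotted: stay neutral
    return pos / (pos + neg)

def fused_sentiment(transcript: str, model_positive_prob: float,
                    keyword_weight: float = 0.3) -> str:
    """Weighted blend of the keyword score and the classifier's positive probability."""
    blended = (keyword_weight * keyword_score(transcript)
               + (1.0 - keyword_weight) * model_positive_prob)
    if blended > 0.6:
        return "positive"
    if blended < 0.4:
        return "negative"
    return "neutral"

# Example: keywords lean positive, the (assumed) classifier is uncertain at 0.55.
print(fused_sentiment("I love how the call went, support was great", 0.55))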
Additionally, the integration of multimodal data sources (such as facial expressions, body language, and physiological signals) could give deeper insight into human emotions and sentiments. It is also critical to investigate the ethical implications of voice-based sentiment analysis, specifically privacy, consent, and potential bias, so that such systems can be deployed safely in the real world.
V. Conclusion
This research paper concludes by exploring the domain of sentiment analysis through voice with a multi-pronged approach using rule-based techniques, existing emotion recognition techniques, and keyword spotting algorithms. By investigating the properties and effectiveness of these methodologies, we established their strengths and their limitations in capturing and interpreting sentiment from vocal expression.
The rule-based approach provides a structured method for sentiment analysis by defining explicit rules to recognize and examine emotions according to set criteria. While this approach is transparent and interpretable, its effectiveness depends on predefined rules that may not capture the nuances of human emotion. In contrast, the emotion recognition systems in current use employ machine learning models trained on vast datasets to automatically decode and categorize emotions from voice recordings. These techniques work very well in capturing subtle variations in vocal expression and are well suited to different linguistic and cultural contexts. They may, however, require huge amounts of annotated data for training and may lack transparency about how decisions are made. Additionally, integrating keyword spotting makes it possible to identify particular keywords or phrases associated with sentiment in voice recordings; this approach boosts sentiment analysis with contextually significant cues and makes emotion classification more precise.
In any case, combining these approaches provides a comprehensive solution to sentiment analysis in voice, utilizing the advantages of each methodology to limit the disadvantages of the others. By using rule-based methods, pre-existing emotion recognition, and keyword spotting together, researchers and practitioners can build robust sentiment analysis systems that accurately interpret emotions from vocal expressions across various domains and applications.
In the fast-developing fields of NLP and ML, it is reasonable to expect ever more sophisticated and innovative methodologies and technologies for analyzing sentiment via voice, leading to an increasingly better understanding and interpretation of human emotions in spoken language.

REFERENCES
1. Chen, L.; Su, W.; Feng, Y.; Wu, M.; She, J.; Hirota, K. Two-layer fuzzy multiple random forest for speech emotion recognition in human-robot interaction. Inf. Sci. 2020, 509, 150–163.
2. Hansen, J.H.; Cairns, D.A. ICARUS: Source generator based real-time recognition of speech in noisy stressful and Lombard effect environments. Speech Commun. 1995, 16, 391–422.
3. Koduru, A.; Valiveti, H.B.; Budati, A.K. Feature extraction algorithms to improve the speech emotion recognition rate. Int. J. Speech Technol. 2020, 23, 45–55.
4. Zheng, W.; Zheng, W.; Zong, Y. Multi-scale discrepancy adversarial network for cross-corpus speech emotion recognition. Virtual Real. Intell. Hardw. 2021, 3, 65–75.
5. Schuller, B.; Rigoll, G.; Lang, M. Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine-belief network architecture. In Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, QC, Canada, 17–21 May 2004; pp. 577–580.
6. Spencer, C.; Koç, İ.A.; Suga, C.; Lee, A.; Dhareshwar, A.M.; Franzén, E.; Iozzo, M.; Morrison, G.; McKeown, G. A Comparison of Unimodal and Multimodal Measurements of Driver Stress in Real-World Driving Conditions; ACM: New York, NY, USA, 2020.
7. France, D.J.; Shiavi, R.G.; Silverman, S.; Silverman, M.; Wilkes, M. Acoustical properties of speech as indicators of depression and suicidal risk. IEEE Trans. Biomed. Eng. 2000, 47, 829–837.
8. Uddin, M.Z.; Nilsson, E.G. Emotion recognition using speech and neural structured learning to facilitate edge intelligence. Eng. Appl. Artif. Intell. 2020, 94, 103775.
9. Jahangir, R.; Teh, Y.W.; Hanif, F.; Mujtaba, G. Deep learning approaches for speech emotion recognition: State of the art and research challenges. Multimed. Tools Appl. 2021, 80, 23745–23812.
10. Fahad, M.S.; Ranjan, A.; Yadav, J.; Deepak, A. A survey of speech emotion recognition in natural environment. Digit. Signal Process. 2021, 110, 102951.
11. Jahangir, R.; Teh, Y.W.; Mujtaba, G.; Alroobaea, R.; Shaikh, Z.H.; Ali, I. Convolutional neural network-based cross-corpus speech emotion recognition with data augmentation and features fusion. Mach. Vis. Appl. 2022, 33, 41.
12. Ayadi, M.E.; Kamel, M.S.; Karray, F. Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognit. 2011, 44, 572–587.
13. Abdel-Hamid, O.; Mohamed, A.-R.; Jiang, H.; Deng, L.; Penn, G.; Yu, D. Convolutional neural networks for speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 2014, 22, 1533–1545.
14. Trigeorgis, G.; Ringeval, F.; Brueckner, R.; Marchi, E.; Nicolaou, M.A.; Schuller, B.; Zafeiriou, S. Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network. In Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016; pp. 5200–5204.
15. Anvarjon, T.; Kwon, S. Deep-net: A lightweight CNN-based speech emotion recognition system using deep frequency features. Sensors 2020, 20, 5212.
16. Rybka, J.; Janicki, A. Comparison of speaker dependent and speaker independent emotion recognition. Int. J. Appl. Math. Comput. Sci. 2013, 23, 797–808.
18. Zhang, S.; Tao, X.; Chuang, Y.; Zhao, X. Learning deep multimodal affective features for spontaneous speech emotion recognition. Speech Commun. 2021, 127, 73–81.
20. Issa, D.; Demirci, M.F.; Yazici, A. Speech emotion recognition with deep convolutional neural networks. Biomed. Signal Process. Control 2020, 59, 101894.
22. Badshah, A.M.; Rahim, N.; Ullah, N.; Ahmad, J.; Muhammad, K.; Lee, M.Y.; Kwon, S.; Baik, S.W. Deep features-based speech emotion recognition for smart affective services. Multimed. Tools Appl. 2019, 78, 5571–5589.
24. Noroozi, F.; Sapiński, T.; Kamińska, D.; Anbarjafari, G. Vocal-based emotion recognition using random forests and decision tree. Int. J. Speech Technol. 2017, 20, 239–246.
26. Nwe, T.L.; Foo, S.W.; Silva, L.C.D. Speech emotion recognition using hidden Markov models. Speech Commun. 2003, 41, 603–623.
28. Al-onazi, B.B.; Nauman, M.A.; Jahangir, R.; Malik, M.M.; Alkhammash, E.H.; Elshewey, A.M. Transformer-based multilingual speech emotion recognition using data augmentation and feature fusion. Appl. Sci. 2022, 12, 9188.
29. Jahangir, R.; Teh, Y.W.; Memon, N.A.; Mujtaba, G.; Zareei, M.; Ishtiaq, U.; Akhtar, M.Z.; Ali, I. Text-independent speaker identification through feature fusion and deep neural network. IEEE Access 2020, 8, 32187–32202.
30. Jahangir, R.; Teh, Y.W.; Nweke, H.F.; Mujtaba, G.; Al-Garadi, M.A.; Ali, I. Speaker identification through artificial intelligence techniques: A comprehensive review and research challenges. Expert Syst. Appl. 2021, 171, 114591.
31. Khan, A.A.; Jahangir, R.; Alroobaea, R.; Alyahyan, S.Y.; Almulhi, A.H.; Alsafyani, M. An efficient text-independent speaker identification using feature fusion and transformer model. Comput. Mater. Contin. 2023, 75, 4085–4100.