Report On Sentiment Analysis
A PROJECT REPORT
Submitted by
ABHAY CHOUDHARY (21BCS3122)
BACHELOR OF ENGINEERING
IN
ELECTRONICS ENGINEERING
Chandigarh University
Nov 2023
BONAFIDE CERTIFICATE
Certified that this project report “Sentiment analysis Project” is the bonafide
work of “ABHAY CHOUDHARY” who carried out the project work under
my/our supervision.
SIGNATURE SIGNATURE
CHAPTER 1. INTRODUCTION
1.1. Identification of Client/Need/Relevant Contemporary Issue
1.4. Timeline
2.6. Goals/Objectives
REFERENCES
Acknowledgments
I would like to thank everyone who contributed to the development and implementation of the
Sentiment Analysis project. The project was made possible with the cooperation and support of
many people and organizations.
I would like to thank Er Prashant Ahluwalia for providing the necessary resources, infrastructure,
and support throughout the project. That commitment to innovation and technological
advancement was crucial to the success of this Sentiment Analysis system.
Special thanks to the team of developers, engineers, and technicians who worked tirelessly to
design, code, and test the Sentiment Analysis algorithms and software. Their skills and hard work
played an important role in ensuring the accuracy and efficiency of the system.
I am also grateful to Er Pooja and Er Kamal Kumar for their valuable ideas and expertise during the
development process. Their insight improved the efficiency and effectiveness of the Sentiment
Analysis system.
This Sentiment Analysis system is a testament to the cooperation and dedication of everyone
involved. I am grateful for the collaboration that made this project a reality.
Thank you.
Abhay Choudhary
Abstract:
Sentiment Analysis (SA) systems have emerged as essential tools in understanding and managing
emotions expressed in text data across various applications. Leveraging natural language
processing (NLP) techniques and machine learning algorithms, SA systems extract, interpret, and
categorize sentiments expressed in text, aiding businesses in gauging customer feedback, brand
perception, and market trends.
Much as Automatic Number Plate Recognition (ANPR) systems have reshaped traffic management
and surveillance, SA systems play a central role in deciphering the emotional tone of textual
content. Using machine learning and deep learning models, SA systems analyze text data to
classify its sentiment as positive, negative, or neutral.
The core components of an SA system involve text preprocessing, sentiment classification, and
sentiment aggregation. During preprocessing, text data undergoes cleaning and normalization to
enhance analysis accuracy. Sentiment classification employs machine learning classifiers or deep
learning architectures to assign sentiment labels to text inputs. Finally, sentiment aggregation
techniques amalgamate individual sentiment scores to derive overall sentiment insights.
SA technology relies on the principles of sentiment lexicons, machine learning models, and neural
networks to achieve accurate sentiment analysis. Just as ANPR systems utilize optical character
recognition (OCR) and convolutional neural networks (CNNs) for license plate identification, SA
systems harness similar deep learning techniques for sentiment classification, ensuring robust
performance across diverse textual datasets.
Research conducted in collaboration with stakeholders helps uncover nuanced sentiments and
preferences. Analyzing data gathered through surveys, interviews, or social media monitoring
enables businesses to identify sentiment-related trends and adjust strategies accordingly.
Leveraging statistical insights, research findings, and expert consultation, businesses can gain a
comprehensive understanding of sentiment-related issues and devise strategies to enhance customer
satisfaction and engagement.
Identifying challenges in sentiment analysis is crucial for developing robust methodologies and
tools that accurately capture and analyze sentiment data.
Biases in data collection can skew sentiment analysis results. Biases may arise from sampling
methods, survey design, or the demographics of the target audience, leading to inaccurate or
misleading insights.
Technological limitations, such as the inability to accurately analyze sentiment in certain languages
or dialects, pose challenges for sentiment analysis. Advancements in natural language processing
and machine learning are necessary to overcome these limitations and improve the accuracy of
sentiment analysis.
Addressing these challenges requires collaboration between data scientists, researchers, and
industry stakeholders to develop innovative solutions and methodologies that accurately capture
and analyze sentiment data, thereby informing decision-making processes and enhancing customer
satisfaction.
In this chapter, we explore the challenges and opportunities in sentiment analysis, highlighting the
importance of understanding client needs, identifying sentiment-related concerns, and developing
strategies to address them effectively. By leveraging data-driven insights and collaborative research
efforts, businesses can gain valuable insights into customer sentiment and enhance their
competitive advantage in the marketplace.
1.4 Timeline
Task 2.1: Timeline of the reported problem (September 6th - September 16th)
Task 2.2: Proposed solutions (September 17th - September 26th)
Task 3.1: Evaluation & Selection of Specifications/Features (September 27th - October 2nd)
Task 3.2: Design Constraints (October 3rd - October 7th)
Task 3.3: Analysis and Feature finalization subject to constraints (October 8th - October 17th)
Definition of the broad problem requiring resolution: The primary challenge is to develop an
Automated Traffic Control System (ATCS) that leverages advanced technologies for real-time
traffic monitoring, efficient signal control, and adaptive route optimization.
Exclusion of any hints towards a solution: This chapter strictly focuses on identifying and defining
the problem without delving into specific solutions or technical details.
Define and differentiate the tasks needed to identify, build, and test the solution: Tasks include
requirement analysis, technology evaluation, system design, software development, hardware
integration, and comprehensive testing protocols.
Framework outlining chapters, headings, and subheadings: This report will follow a structured
framework encompassing six chapters, each addressing a crucial aspect of the project, as outlined
in the initial provided framework.
1.5.4 Timeline
Definition of the project timeline, preferably using a Gantt chart: The project timeline spans from
August 15th to November 10th, allowing ample time for each phase, including research, design,
implementation, and testing.
CHAPTER 2
LITERATURE REVIEW
The evolution of sentiment analysis has unfolded over several decades, marked by key
developments and incidents:
The concept of sentiment analysis began to emerge in the 1980s and 1990s with early
research focusing on text analysis and opinion mining.
Initial applications of sentiment analysis were seen in market research and customer
feedback analysis, albeit with limited technology and methodologies.
Growing concerns emerged regarding privacy and data protection, especially with the
increasing use of social media data for sentiment analysis.
Governments and regulatory bodies began addressing legal and ethical implications of
sentiment analysis, particularly concerning user data privacy and consent.
Rise of social media platforms (2000s): The proliferation of social media provided
abundant data for sentiment analysis, revolutionizing the field.
Cambridge Analytica scandal (2018): The misuse of personal data for targeted advertising
highlighted ethical concerns and the need for stricter regulations.
GDPR implementation (2018): The General Data Protection Regulation introduced
stringent requirements for data handling and privacy protection, impacting sentiment
analysis practices.
These milestones underscore the evolving landscape of sentiment analysis and the multifaceted
challenges it faces in terms of accuracy, privacy, and ethical considerations.
2.2 Suggestions
Improve Data Collection Methods: Enhance data collection techniques to ensure diverse
and representative datasets for more accurate sentiment analysis.
Mitigate Bias in Models: Implement algorithms and methodologies to identify and mitigate
biases in sentiment analysis models, ensuring fairness and impartiality.
Strengthen Data Protection Measures: Enhance privacy protocols and obtain explicit
consent for data usage to address privacy concerns and regulatory requirements.
Ensure Model Transparency: Employ techniques to make sentiment analysis models more
interpretable and transparent, enabling users to understand and trust the results.
Account for Cultural Differences: Incorporate cultural sensitivity into sentiment analysis
algorithms to accurately capture sentiment across diverse demographics and regions.
Establish Ethical Guidelines: Define ethical standards and regulatory oversight mechanisms
to govern the ethical use of sentiment analysis technology.
Promote Awareness: Educate users and stakeholders about the capabilities, limitations, and
ethical considerations of sentiment analysis to foster responsible usage.
These suggested strategies aim to address the diverse challenges in sentiment analysis, promoting
accuracy, fairness, privacy, and ethical practice in the field. Implementation of these
recommendations can contribute to the advancement and responsible use of sentiment analysis
technology.
An analysis of proposed solutions for issues in Sentiment Analysis systems sheds light on key
features, effectiveness, and drawbacks:
Key Features:
Effectiveness:
Key Features:
Effectiveness:
Highly effective in handling diverse text styles, ensuring accurate sentiment interpretation.
Achieves high accuracy rates in character recognition tasks.
Drawbacks:
Key Features:
Adjusts image attributes like brightness and contrast to improve sentiment analysis.
Utilizes techniques such as histogram equalization for optimal image quality.
Effectiveness:
Drawbacks:
Key Features:
Protects user privacy by anonymizing or encrypting sensitive information.
Implements secure storage and transmission protocols to safeguard data.
Effectiveness:
Drawbacks:
Potential trade-off with sentiment analysis accuracy due to data anonymization noise.
Adds computational overhead for encryption and decryption processes.
Key Features:
Effectiveness:
Drawbacks:
Key Features:
Equips cameras with weather-proof features for sentiment analysis in adverse conditions.
Applies specialized image processing filters to mitigate weather effects.
Effectiveness:
Drawbacks:
Adds complexity and cost to hardware setup and maintenance.
May not completely eliminate adverse weather effects in extreme conditions.
Key Features:
Establishes clear guidelines for sentiment analysis data collection, storage, and usage.
Obtains explicit user consent for data processing to ensure compliance and trust.
Effectiveness:
Builds transparency and trust with users, ensuring ethical sentiment analysis practices.
Mitigates privacy concerns and legal risks associated with sentiment data processing.
Drawbacks:
Key Features:
Effectiveness:
Drawbacks:
These solutions offer promising avenues for enhancing sentiment analysis systems, each with its
own strengths and considerations. By carefully considering and integrating these approaches, we
can develop more robust and accurate sentiment analysis technology to meet the demands of
various applications and environments.
In sentiment analysis systems, the selection of key functions is paramount for achieving accuracy
and reliability. Here's a breakdown of important measures and features essential for the
implementation of a sentiment analysis system:
Critical Evaluation: Preprocessing plays a crucial role in standardizing text input for
analysis.
Ideally Required: Techniques such as tokenization, lowercasing, and removal of
punctuation marks.
Critical Evaluation: The sentiment classification model serves as the core component,
determining the sentiment of the text.
Ideally Required: High-performance models capable of handling various languages, text
lengths, and sentiment nuances.
Critical Evaluation: The quality and diversity of training data significantly impact the
performance of the sentiment analysis model.
Ideally Required: Diverse datasets covering a wide range of topics, domains, and
sentiments.
Critical Evaluation: Metrics for evaluating model performance are essential to assess
accuracy and generalization capabilities.
Ideally Required: Metrics such as accuracy, precision, recall, F1-score, and confusion
matrix analysis.
Critical Evaluation: The system should be scalable to handle large volumes of text data
efficiently.
Ideally Required: High-performance computing infrastructure capable of handling increased
workloads.
Critical Evaluation: The system should be adaptable to different domains and industries,
each with its unique language and sentiment expressions.
Ideally Required: Transfer learning techniques or domain-specific fine-tuning capabilities.
Critical Evaluation: Multilingual support enhances the system's usability and applicability
across diverse linguistic contexts.
Ideally Required: Models capable of understanding and analyzing sentiments in multiple
languages.
Critical Evaluation: Effective sentiment analysis should account for nuances like sarcasm
and irony, which may convey sentiments opposite to literal meaning.
Ideally Required: Advanced algorithms and linguistic analysis techniques for detecting and
interpreting sarcastic or ironic expressions.
Critical Evaluation: Transparent and interpretable models are crucial for understanding how
sentiment predictions are made.
Ideally Required: Techniques for model interpretability, such as attention mechanisms or
explanation generation.
Critical Evaluation: Integration with feedback mechanisms enables continuous learning and
improvement of the sentiment analysis model.
Ideally Required: Feedback loops for collecting user feedback and updating the model
accordingly.
Critical Evaluation: Continuous monitoring and maintenance are necessary to address model
drift and maintain optimal performance.
Ideally Required: Automated monitoring tools and periodic model retraining to keep pace
with evolving language trends and user behaviors.
The effectiveness of a sentiment analysis system depends on the seamless integration of these
features, prioritized based on specific requirements and use cases. Regular updates and
maintenance are essential to keep the system performing optimally over time.
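The monitoring and retraining requirement above can be illustrated with a rolling accuracy check. This is only a sketch under stated assumptions: the class name AccuracyMonitor, the window size, and the threshold are hypothetical and are not part of the project's code, and it presumes that true labels for recent predictions become available (for example through user feedback).

```python
from collections import deque


class AccuracyMonitor:
    """Hypothetical helper: tracks accuracy over the most recent predictions."""

    def __init__(self, window=100, threshold=0.8):
        # deque with maxlen silently drops the oldest result when full
        self.results = deque(maxlen=window)
        self.threshold = threshold

    def record(self, true_label, predicted_label):
        """Store whether one prediction matched its true label."""
        self.results.append(true_label == predicted_label)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def needs_retraining(self):
        """True when windowed accuracy has fallen below the threshold."""
        acc = self.accuracy()
        return acc is not None and acc < self.threshold


monitor = AccuracyMonitor(window=4, threshold=0.75)
for true, pred in [("pos", "pos"), ("neg", "neg"), ("pos", "neg"), ("neg", "pos")]:
    monitor.record(true, pred)
print(monitor.accuracy(), monitor.needs_retraining())
```

A fixed window keeps the check cheap and makes it react to recent language changes rather than to the model's lifetime average.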
Designing a sentiment analysis system involves considering various constraints to ensure its
effectiveness and reliability. Here are some common design constraints for a sentiment analysis
system:
Ensuring the availability of high-quality and diverse training data is essential for building
accurate sentiment analysis models.
Real-time sentiment analysis applications require fast processing speeds to provide timely
insights.
3.2.3 Scalability:
The system should be able to scale efficiently to handle increasing volumes of text data
without sacrificing performance.
Support for multiple languages may pose challenges in terms of linguistic diversity and
cultural nuances.
3.2.5 Interpretability:
Interpretable models are necessary for understanding how sentiment predictions are made
and gaining user trust.
Compliance with data privacy regulations and ensuring the security of user data are critical
considerations.
Limited computing resources may impact the system's ability to process large amounts of
text data efficiently.
Sentiment analysis models may need to be adapted or fine-tuned for specific domains or
industries to achieve optimal performance.
Addressing bias in sentiment analysis models is crucial to ensure fairness and mitigate
potential ethical concerns.
Seamless integration with existing software applications or platforms may be necessary for
broader adoption and usability.
Implementing mechanisms for continuous model improvement based on user feedback and
evolving language trends is essential.
By considering these constraints during the design phase, a sentiment analysis system can be
developed to effectively meet user needs while ensuring reliability and performance.
3.3 Analysis and Feature Finalization for Sentiment Analysis
In the context of sentiment analysis, it's essential to identify and finalize features considering
specific constraints to ensure accurate and reliable sentiment classification. Let's analyze and
finalize the features:
Analysis: Given the diverse nature of text data, preprocessing is crucial to standardize and clean
the text for effective sentiment analysis. Finalization: Retain robust text preprocessing techniques,
including lowercasing, punctuation removal, and stop word removal, to ensure consistency and
improve model performance.
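A minimal sketch of the preprocessing steps named above (lowercasing, punctuation removal, stop word removal), using only the Python standard library. The stop-word list here is a short hypothetical one chosen for illustration; a real system would more likely take a fuller list from a library such as NLTK or spaCy.

```python
import string

# Hypothetical, deliberately short stop-word list (assumption for illustration).
# Negation words such as "not" are kept because they carry sentiment.
STOP_WORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "or", "to", "of"}


def preprocess(text):
    """Lowercase, strip ASCII punctuation, split on whitespace, drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in no_punct.split() if tok not in STOP_WORDS]


print(preprocess("This movie was GREAT, and the ending is superb!"))
# ['movie', 'great', 'ending', 'superb']
```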
Analysis: Extracting relevant features from text data is essential for sentiment analysis models to
capture sentiment-related information effectively. Finalization: Emphasize feature extraction
techniques such as bag-of-words, TF-IDF, and word embeddings (e.g., Word2Vec, GloVe) to
represent text data in a format suitable for sentiment classification.
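To make the bag-of-words and TF-IDF representations concrete, the sketch below computes both from scratch. It assumes one common TF-IDF variant (term frequency as count divided by document length, inverse document frequency as log(N / df)); libraries such as scikit-learn use slightly different smoothing, so the numbers will not match theirs exactly. The function names are illustrative only.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Map each term to its raw count in one document."""
    return Counter(tokens)


def tf_idf(docs):
    """docs is a list of token lists; returns one {term: weight} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in docs:
        counts = bag_of_words(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


docs = [["great", "movie"], ["bad", "movie"]]
for vector in tf_idf(docs):
    print(vector)
```

In this toy corpus "movie" appears in every document, so its weight is zero, while "great" and "bad" receive positive weights. This is the property that makes TF-IDF favor terms that discriminate between documents.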
Analysis: Choosing the right sentiment analysis model is critical for achieving accurate sentiment
classification results. Finalization: Prioritize models such as Naive Bayes, Support Vector
Machines (SVM), Recurrent Neural Networks (RNN), or Transformer-based architectures (e.g.,
BERT, GPT) based on the complexity of the sentiment analysis task and available computational
resources.
Analysis: The quality and diversity of the training data directly impact the performance of
sentiment analysis models. Finalization: Ensure the availability of high-quality, labeled training
data covering a wide range of sentiments, topics, and domains to improve model generalization and
robustness.
Analysis: Selecting appropriate evaluation metrics is crucial for assessing the performance of
sentiment analysis models accurately. Finalization: Utilize evaluation metrics such as accuracy,
precision, recall, F1-score, and confusion matrix to measure the model's effectiveness in sentiment
classification across different sentiment categories.
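The metrics listed above can be computed directly from the true and predicted labels. The following is a small standard-library sketch; the label names and the sample data are made up for illustration and are not results from this project.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Return counts keyed by (true_label, predicted_label)."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, label):
    """Precision, recall, and F1-score for one sentiment class."""
    cm = confusion_matrix(y_true, y_pred)
    tp = cm[(label, label)]
    fp = sum(c for (t, p), c in cm.items() if p == label and t != label)
    fn = sum(c for (t, p), c in cm.items() if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative labels only (not project data)
y_true = ["pos", "pos", "neg", "neg", "neu", "pos"]
y_pred = ["pos", "neg", "neg", "neg", "neu", "pos"]
print(accuracy(y_true, y_pred))
print(per_class_metrics(y_true, y_pred, "pos"))
```

Reporting precision and recall per class matters when the classes are imbalanced, since overall accuracy can look high even when a minority class is mostly misclassified.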
Analysis: Understanding the types of errors made by sentiment analysis models provides insights
into areas for improvement. Finalization: Conduct thorough error analysis to identify common
error patterns (e.g., misclassification of sarcasm, negation handling) and refine the model
accordingly to enhance performance.
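One simple way to begin such an error analysis is to bucket misclassified examples by a surface pattern. The sketch below counts errors that contain a negation word; the word list and the function name are assumptions made for illustration. Sarcasm cannot be detected with a rule this simple, so those cases would still need manual review.

```python
from collections import Counter

# Hypothetical negation cue list (assumption for illustration)
NEGATION_WORDS = {"not", "no", "never", "cannot"}


def error_buckets(examples):
    """examples: iterable of (text, true_label, predicted_label) tuples.

    Returns a Counter of misclassified examples, split into those that
    contain a negation word and all other errors.
    """
    buckets = Counter()
    for text, true_label, predicted_label in examples:
        if true_label == predicted_label:
            continue  # only errors are of interest here
        tokens = set(text.lower().split())
        if tokens & NEGATION_WORDS:
            buckets["negation"] += 1
        else:
            buckets["other"] += 1
    return buckets


examples = [
    ("I do not like it", "negative", "positive"),
    ("great phone", "positive", "positive"),
    ("oh sure, fantastic service", "negative", "positive"),
]
print(error_buckets(examples))
```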
3.3.7. Real-time Processing:
Analysis: Real-time sentiment analysis is essential for applications requiring immediate feedback
or response. Finalization: Optimize sentiment analysis models and processing pipelines for real-
time inference, considering factors such as computational efficiency and latency constraints.
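Latency constraints are easier to reason about once they are measured. The sketch below times a prediction function per input; `measure_latency` and the stand-in predictor are hypothetical names, and the timings depend entirely on the machine and model in use.

```python
import statistics
import time


def measure_latency(predict, texts):
    """Time predict(text) for each text and summarize in milliseconds.

    texts must be non-empty, otherwise statistics.mean raises an error.
    """
    latencies_ms = []
    for text in texts:
        start = time.perf_counter()
        predict(text)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return {"mean_ms": statistics.mean(latencies_ms), "max_ms": max(latencies_ms)}


# Stand-in predictor used only to make the example runnable
stats = measure_latency(lambda text: "neutral", ["good", "bad", "fine"])
print(stats)
```

Tracking the maximum alongside the mean is a deliberate choice, since real-time applications are usually limited by their slowest responses rather than by the average.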
Analysis: Sentiment analysis may need to support multiple languages and modalities (e.g., text,
images, audio) to handle diverse data sources. Finalization: Incorporate techniques for
multilingual sentiment analysis and explore multimodal approaches (e.g., text-image fusion) to
improve sentiment understanding across different data types.
Analysis: Bias and fairness considerations are crucial to ensure equitable sentiment analysis
outcomes across different demographic groups. Finalization: Implement measures to detect and
mitigate biases in sentiment analysis models, such as debiasing techniques, fairness-aware training,
and diverse dataset curation.
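A basic bias check is to compare accuracy across demographic or linguistic groups. The sketch below assumes each evaluation record carries a group tag; the group names and data are invented for illustration, and an accuracy gap is only one of several possible fairness measures.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, true_label, predicted_label in records:
        total[group] += 1
        correct[group] += int(true_label == predicted_label)
    return {group: correct[group] / total[group] for group in total}


def max_accuracy_gap(records):
    """Difference between the best and worst per-group accuracy."""
    per_group = accuracy_by_group(records)
    return max(per_group.values()) - min(per_group.values())


# Illustrative records only (not project data)
records = [
    ("group_a", "pos", "pos"),
    ("group_a", "neg", "neg"),
    ("group_b", "pos", "neg"),
    ("group_b", "neg", "neg"),
]
print(accuracy_by_group(records))
print(max_accuracy_gap(records))
```

A large gap suggests the training data under-represents one group, which points back to the dataset curation step rather than to the model alone.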
Analysis: Understanding how sentiment analysis models make predictions is essential for building
trust and transparency. Finalization: Prioritize the interpretability and explainability of sentiment
analysis models by incorporating techniques such as attention mechanisms, feature importance
analysis, and model-agnostic explanations.
By finalizing the features considering these constraints, the sentiment analysis system will be
better equipped to accurately analyze and classify sentiments in diverse text data.
Here are two alternative design flows for implementing a sentiment analysis system:
1. Data Acquisition: Collect text data from various sources such as social media, customer
reviews, or survey responses.
2. Text Preprocessing: Clean and preprocess the text data to remove noise and standardize
the text format.
3. Feature Extraction: Extract relevant features from the preprocessed text data using
techniques like bag-of-words or TF-IDF.
4. Rule-based Classification: Define rules or patterns to classify text data into sentiment
categories (e.g., positive, negative, neutral) based on extracted features.
5. Post-processing: Apply post-processing techniques to refine classification results and
handle edge cases or ambiguities.
6. Model Integration: Integrate the rule-based sentiment analysis system with other
applications or systems for sentiment monitoring and analysis.
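The rule-based classification step in this flow can be sketched with a small sentiment lexicon and a single negation rule. The word lists below are hypothetical and far smaller than a real lexicon, and the negation handling (flip the next sentiment word in the same clause) is one simple rule among many possible ones.

```python
# Hypothetical mini-lexicons (assumptions for illustration)
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "terrible", "poor", "hate", "sad"}
NEGATIONS = {"not", "no", "never"}


def classify(text):
    """Label text as positive, negative, or neutral using lexicon rules."""
    score = 0
    negate = False
    for token in text.lower().split():
        word = token.strip(".,!?;:")
        if word in NEGATIONS:
            negate = True
        else:
            # +1 for a positive word, -1 for a negative word, 0 otherwise
            polarity = (word in POSITIVE) - (word in NEGATIVE)
            if polarity:
                score += -polarity if negate else polarity
                negate = False
        if token != word:
            # punctuation attached to the token ends the negation scope
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("The service was great!"))  # positive
print(classify("not very good"))           # negative
```

Rules like these are transparent and need no training data, but every new domain or expression requires manual additions to the lexicon.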
1. Data Collection and Labeling: Gather a large dataset of labeled text data covering various
sentiment categories.
2. Text Preprocessing: Clean and preprocess the text data to prepare it for feature extraction.
3. Feature Extraction: Extract features from the preprocessed text data using techniques like
word embeddings or n-grams.
4. Model Selection and Training: Choose a machine learning model (e.g., Naive Bayes,
SVM, LSTM) and train it on the labeled data to learn the relationship between features and
sentiment labels.
5. Model Evaluation: Evaluate the trained model's performance using metrics such as
accuracy, precision, recall, and F1-score.
6. Model Deployment: Deploy the trained sentiment analysis model in production for real-
time sentiment classification of incoming text data.
7. Continuous Monitoring and Improvement: Monitor the deployed model's performance
over time, collect feedback, and retrain the model with new data to improve accuracy and
adaptability.
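Steps 4 and 5 of this flow can be illustrated with a multinomial Naive Bayes classifier written from scratch. This is a teaching sketch on a four-document toy dataset, assuming Laplace (add-one) smoothing and pre-tokenized input; it is not the project's trained model, and a production system would more likely use an established library implementation.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Minimal multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(count / n_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # words never seen in training are skipped
                    likelihood = (self.word_counts[label][tok] + 1) / (total_words + vocab_size)
                    score += math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training and test data (illustrative only)
train_docs = [["great", "movie"], ["loved", "it"], ["terrible", "movie"], ["hated", "it"]]
train_labels = ["positive", "positive", "negative", "negative"]
test_docs = [["great", "it"], ["hated", "movie"]]
test_labels = ["positive", "negative"]

model = NaiveBayes().fit(train_docs, train_labels)
predictions = [model.predict(doc) for doc in test_docs]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy)
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the smoothing term keeps a single unseen word-class pair from forcing a probability of zero.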
Based on the analysis of the two design flows, the selection should be based on the specific
requirements, resources, and constraints of the sentiment analysis project. Let's revisit the strengths
and weaknesses of each design:
Strengths:
Weaknesses:
Strengths:
Weaknesses:
Design Selection:
Based on the analysis, the recommended design is Design Flow 2: Machine Learning Approach.
Accuracy and Adaptability: The machine learning approach offers higher potential for
accuracy and adaptability to diverse sentiment patterns and data sources.
Generalization: Machine learning models can generalize well to unseen data, making them
suitable for handling complex sentiment analysis tasks.
Continuous Improvement: Machine learning models can be continuously refined and
improved with new data, ensuring optimal performance over time.
CHAPTER 4
RESULTS ANALYSIS AND VALIDATION
1. Analysis:
Tools Used:
Python for Data Analysis: Utilize libraries like Pandas, NumPy, and Matplotlib for
statistical analysis and visualization of sentiment data.
Jupyter Notebooks: Create interactive notebooks to explore and document the
sentiment analysis process.
Natural Language Processing (NLP) Libraries: Utilize NLTK or SpaCy for text
preprocessing and feature extraction.
Tools Used:
Concept Maps or Mind Maps: Illustrate the sentiment analysis workflow, including
data preprocessing, feature extraction, model training, and evaluation.
UML Diagrams: Design the architecture of sentiment analysis models, depicting the
flow of data and processing steps.
3. Document Preparation:
Tools Used:
LaTeX or Microsoft Word: Prepare detailed documentation describing the
methodology, algorithms, and results of sentiment analysis.
Markdown: Create README files or project documentation for easy versioning and
sharing.
Data Visualization Tools: Generate visualizations using tools like Tableau or
Matplotlib to present sentiment analysis results effectively.
Tools Used:
Project Management Platforms: Utilize platforms like Trello or Asana for task
management, progress tracking, and collaboration.
Communication Tools: Foster communication among team members using Slack,
Microsoft Teams, or other messaging platforms.
Version Control Systems: Employ Git for version control and collaboration on code
repositories.
5. Testing/Characterization/Interpretation/Data Validation:
Tools Used:
Testing Frameworks: Conduct unit tests using frameworks like pytest to ensure the
functionality and accuracy of sentiment analysis algorithms.
Evaluation Metrics: Calculate metrics such as accuracy, precision, recall, and F1-
score to evaluate the performance of sentiment analysis models.
Data Visualization Tools: Visualize sentiment analysis results through charts,
graphs, and word clouds to facilitate interpretation and insights.
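As an illustration of the unit testing mentioned above, the sketch below shows tests written in the style that pytest discovers automatically (plain functions whose names start with test_). The helper normalize and the file name are hypothetical and stand in for whatever preprocessing function the project actually tests.

```python
def normalize(text):
    """Hypothetical helper: lowercase the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def test_normalize_lowercases():
    assert normalize("GREAT Phone") == "great phone"


def test_normalize_collapses_whitespace():
    assert normalize("  so   good \n") == "so good"


def test_normalize_empty_string():
    assert normalize("") == ""
```

Saved as, for example, test_normalize.py, these tests would run with the command `pytest test_normalize.py`. Each test checks one behavior, so a failure points directly at the rule that broke.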
Other Considerations:
Ensure that the chosen tools and technologies align with project requirements and facilitate
effective collaboration among team members. Regularly update documentation and communicate
progress to stakeholders to ensure transparency and alignment with project goals.
CHAPTER 5
CONCLUSION AND FUTURE WORK
5.1. Conclusion
The journey from the initial assessment to the implementation of a robust sentiment analysis system
has been marked by significant achievements and insights. Here, we summarize the key findings,
conclusions, and implications of our sentiment analysis project:
Effectiveness:
Accuracy and Efficiency: Our sentiment analysis system has demonstrated high accuracy in
analyzing sentiment even under challenging conditions, such as analyzing diverse text sources and
varying sentiment expressions. By leveraging machine learning algorithms trained on
comprehensive datasets, we have achieved superior performance in sentiment classification.
Utilization of Modern Tools: The use of computer-aided design (CAD) tools has facilitated the
visualization of our sentiment analysis system's architecture and components. Detailed diagrams
and visual models have provided stakeholders with a clear understanding of the system's design,
enhancing communication and collaboration.
Streamlined Workflow: Modern project management and communication tools, such as Trello
and Slack, have played a crucial role in streamlining our workflow. These platforms have enabled
effective communication, task tracking, and a structured approach to project development, leading
to increased productivity and efficiency.
Rigorous Testing: Our sentiment analysis system underwent rigorous testing, both automated and
manual, to validate its functionality and reliability. Tools like pytest and Postman have facilitated
comprehensive testing of code components and APIs, ensuring robustness and error-free operation.
Future Directions:
Continuous Improvement: While we celebrate the success of our current sentiment analysis
system, there are avenues for future enhancements and expansion. Continuous improvement
through regular updates and refinements to machine learning models will further enhance accuracy
and performance.
Scalability: Considerations for scaling the system to handle increased data volume and expanding
its capabilities to support additional languages or domains will be vital for meeting growing
demands and addressing diverse user needs.
Security and Privacy: Ongoing efforts to enhance system security, including data encryption and
compliance with privacy regulations, will ensure the protection of sensitive information and foster
user trust.
Overall Impact:
The successful implementation of our sentiment analysis system holds significant promise for
various applications, including market research, brand sentiment analysis, and social media
monitoring. By efficiently analyzing sentiment data, our system contributes to informed decision-
making and enhances user experiences in the digital landscape.
Based on our experience and insights gained from sentiment analysis, there are several potential
areas for future improvement and innovation:
Explore advancements in deep learning and neural networks to enhance the accuracy and
robustness of sentiment analysis models. Techniques such as transformers and attention
mechanisms hold promise for capturing nuanced sentiment expressions and context.
Improve the real-time processing capabilities of the sentiment analysis system to enable rapid
analysis and response to incoming data streams. This is particularly important for applications
requiring timely insights, such as social media monitoring and customer feedback analysis.
Enhance the system's ability to analyze sentiment across diverse data sources, including text,
images, and audio. Developing multimodal sentiment analysis models capable of processing
multiple data modalities will enable a more comprehensive understanding of sentiment.
Explore methods for enhancing the interpretability and explainability of sentiment analysis models.
Incorporating techniques such as attention visualization and model explanations will increase
transparency and trust in model predictions.
Address ethical considerations and biases in sentiment analysis by designing algorithms that
prioritize fairness, transparency, and inclusivity. Implementing fairness-aware learning techniques
and bias mitigation strategies will ensure equitable treatment across diverse user groups.
Integrate sentiment analysis capabilities into decision support systems to provide actionable
insights for stakeholders. Leveraging sentiment analysis to inform strategic decision-making
processes will enable organizations to respond effectively to changing market dynamics and
consumer sentiment.
Implement a framework for continuous model training and updating to adapt to evolving language
trends and sentiment expressions. Regularly retraining sentiment analysis models with fresh data
will maintain model relevance and accuracy over time.
Foster collaboration and knowledge sharing within the sentiment analysis community through
open-source initiatives and collaborative research projects. Encouraging the exchange of ideas and
resources will accelerate innovation and drive advancements in sentiment analysis technology.
Explore opportunities to integrate sentiment analysis capabilities into smart platforms and
intelligent systems. Leveraging sentiment analysis in conjunction with IoT devices, virtual
assistants, and smart analytics platforms will enable context-aware decision-making and
personalized user experiences.
Prioritize user-centric design principles and solicit feedback from end-users to ensure the sentiment
analysis system meets their needs and preferences. Incorporating user feedback into system design
and iteration cycles will enhance usability and user satisfaction.
In conclusion, the future of sentiment analysis holds exciting possibilities for innovation and
advancement. By embracing emerging technologies, addressing ethical considerations, and
prioritizing user needs, we can continue to unlock the full potential of sentiment analysis in various
domains and applications.
REFRENCES
1. Nasukawa Y (2003) Sentiment analysis: capturing favorability using natural language processing, IBM Almaden Research Center, CA 95120, https://doi.org/10.1145/945645.945658
2. Mohey D (2016) A survey on sentiment analysis challenges. J King Saud Univ Eng, https://doi.org/10.1016/j.jksues.2016.04.002
3. Alessia D (2015) Approaches, tools and applications for sentiment analysis implementation. Int J Comput Appl 125(3)
4. Xu W, Ritter A, Grishman R (2013) Gathering and generating paraphrases from twitter with application to normalization
5. Hazra TK (2015) Mitigating the adversities of social media through real time tweet extraction system, IEEE, https://doi.org/10.1109/iemcon.2015.7344483
6. Semih Y (2014) Tagging accuracy analysis on part-of-speech taggers. J Comput Commun 2:157–162, https://doi.org/10.4236/jcc.2014.24021
7. El-Din DM (2015) Online paper review analysis. Int J Adv Comput Sci Appl 6(9)
8. Kaushik L (2013) Sentiment extraction from natural audio streams, IEEE, https://doi.org/10.1109/icassp.2013.6639321
9. Vaghela VB (2016) Analysis of various sentiment classification techniques. Int J Comput Appl 140(3)
10. Biltawi M (2016) Sentiment classification techniques for Arabic language: a survey, IEEE, https://doi.org/10.1109/iacs.2016.7476075
11. Goel A (2016) Real time sentiment analysis of tweets using naive bayes, IEEE, https://doi.org/10.1109/ngct.2016.7877424
12. Hu M, Liu B (2004) Mining and summarizing customer reviews, Seattle, Washington, USA, https://doi.org/10.1145/1014052.1014073
13. Rob Mulla
14. Kim S-M (2004) Determining the sentiment of opinions, ACM Digital Library, https://doi.org/10.3115/1220355.1220555
15. Mohammad S (2009) Generating high-coverage semantic orientation lexicons from overtly marked words and a thesaurus. In: Conference on empirical methods in natural language processing, pp 599–608
16. Miller GA (1993) Introduction to WordNet: an on-line lexical database
16. Hatzivassiloglou V, McKeown R (1998) Predicting the semantic orientation of adjectives, New York, N.Y. 10027, USA
17. Medhat W (2014) Sentiment analysis algorithms and applications: a survey. Ain Shams Eng J (Elsevier B.V.) 5(4):1093–1113
18. Kim S-M (2004) Determining the sentiment of opinions, International Journal, doi=10.1.1.68.1034
19. Pang B, Lee L (2008) Opinion mining and sentiment analysis. https://doi.org/10.1561/1500000011
20. Niu Y (2005) Analysis of polarity information in medical text, PMC Journal
21. Park S (2016) Building thesaurus lexicon using dictionary based approach for sentiment classification, IEEE, https://doi.org/10.1109/sera.2016.7516126
22. Ramsingh J (2016) Data analytic on diabetic awareness with Hadoop streaming using map reduce in Python, IEEE, https://doi.org/10.1109/icaca.2016.7887979
23. Kim S-M, Hovy E (2006) Automatic identification of pro and con reasons in online reviews, ACM Digital Library
24. Trupthi M (2017) Sentiment analysis on twitter using streaming API, IEEE, https://doi.org/10.1109/iacc.2017.0186
25. Cambria E, Hussain A (2015) Group Using Lexicon Based Approach. Springer J, https://doi.org/10.1007/978-3-319-23654-4
26. Akter S (2016) Sentiment analysis on Facebook group using lexicon based approach, IEEE, https://doi.org/10.1109/ceeict.2016.7873080
27. Yoshizawa A (2016) Machine-learning approach to analysis of driving simulation data, IEEE, https://doi.org/10.1109/icci-cc.2016.7862067
28. Istiaq Ahsan MN (2016) An ensemble approach to detect review spam using hybrid machine learning technique, IEEE, https://doi.org/10.1109/iccitechn.2016.7860229
29. Kumar M (2016) Analyzing Twitter sentiments through big data, IEEE, https://doi.org/10.1109/sysmart.2016.7894530
30. Abhinandan P, Shirahatti (2015) Sentiment analysis on Twitter data using Hadoop. Int J Eng Res Gen Sci 3(6)
USER MANUAL
Sentiment Analysis in Python
This notebook is part of a video tutorial available on YouTube.
In this notebook we will be doing some sentiment analysis in Python using the following techniques:
1. VADER (Valence Aware Dictionary and sEntiment Reasoner) - Bag of words approach
2. RoBERTa pretrained model from Hugging Face 🤗
3. Hugging Face pipeline
Step 0. Read in Data and NLTK Basics
[1]
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('ggplot')
import nltk
[2]
# Read in data
df = pd.read_csv('../input/amazon-fine-food-reviews/Reviews.csv')
print(df.shape)
df = df.head(500)
print(df.shape)
(568454, 10)
(500, 10)
[3]
df.head()
Quick EDA
[4]
ax = df['Score'].value_counts().sort_index() \
    .plot(kind='bar',
          title='Count of Reviews by Stars',
          figsize=(10, 5))
ax.set_xlabel('Review Stars')
plt.show()
Basic NLTK
[5]
example = df['Text'][50]
print(example)
This oatmeal is not good. Its mushy, soft, I don't like it. Quaker Oats is the way to go.
[6]
tokens = nltk.word_tokenize(example)
tokens[:10]
['This', 'oatmeal', 'is', 'not', 'good', '.', 'Its', 'mushy', ',', 'soft']
[7]
tagged = nltk.pos_tag(tokens)
tagged[:10]
[('This', 'DT'),
('oatmeal', 'NN'),
('is', 'VBZ'),
('not', 'RB'),
('good', 'JJ'),
('.', '.'),
('Its', 'PRP$'),
('mushy', 'NN'),
(',', ','),
('soft', 'JJ')]
[8]
entities = nltk.chunk.ne_chunk(tagged)
entities.pprint()
(S
This/DT
oatmeal/NN
is/VBZ
not/RB
good/JJ
./.
Its/PRP$
mushy/NN
,/,
soft/JJ
,/,
I/PRP
do/VBP
n't/RB
like/VB
it/PRP
./.
(ORGANIZATION Quaker/NNP Oats/NNPS)
is/VBZ
the/DT
way/NN
to/TO
go/VB
./.)
Step 1. VADER Sentiment Scoring
We will use NLTK's SentimentIntensityAnalyzer to get the neg/neu/pos scores of the text.
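This uses a lexicon-based "bag of words" approach:
1. Each word is looked up in a sentiment lexicon; words that are not in it, such as most stop words, count as neutral.
2. The scores of the individual words are combined into a total score.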
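The bag-of-words idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `TOY_LEXICON` and `toy_compound` are hypothetical names, the word scores are made up, and the squashing step is modelled on the normalization VADER is generally documented to use for its compound score (x / sqrt(x^2 + 15)).

```python
import math

# Hypothetical mini-lexicon, for illustration only. VADER's real lexicon is
# far larger and its scores differ from these made-up values.
TOY_LEXICON = {"happy": 2.5, "good": 2.0, "love": 3.0, "worst": -3.0, "mushy": -1.0}


def toy_compound(text, alpha=15.0):
    """Sum per-word lexicon scores and squash the total into (-1, 1)."""
    tokens = [tok.strip(".,!?").lower() for tok in text.split()]
    # Words missing from the lexicon (including stop words) contribute 0.
    total = sum(TOY_LEXICON.get(tok, 0.0) for tok in tokens)
    return total / math.sqrt(total * total + alpha)


print(toy_compound("I am so happy!"))                 # a positive value
print(toy_compound("This is the worst thing ever."))  # a negative value
```

Because word order is ignored, this sketch would score "not good" as positive. The real VADER analyzer adds rules on top of the lexicon (for example for negation), so its scores will not match this toy version.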
[9]
from nltk.sentiment import SentimentIntensityAnalyzer
from tqdm.notebook import tqdm
sia = SentimentIntensityAnalyzer()
/opt/conda/lib/python3.7/site-packages/nltk/twitter/__init__.py:20: UserWarning: The twython library has not
been installed. Some functionality from the twitter package will not be available.
warnings.warn("The twython library has not been installed. "
[10]
sia.polarity_scores('I am so happy!')
{'neg': 0.0, 'neu': 0.318, 'pos': 0.682, 'compound': 0.6468}
[11]
sia.polarity_scores('This is the worst thing ever.')
{'neg': 0.451, 'neu': 0.549, 'pos': 0.0, 'compound': -0.6249}
[12]
sia.polarity_scores(example)
{'neg': 0.22, 'neu': 0.78, 'pos': 0.0, 'compound': -0.5448}
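The compound value in the outputs above is a single score between -1 and 1. A common convention, taken from VADER's own documentation rather than from this report, is to call a text positive when compound is at least 0.05 and negative when it is at most -0.05. The helper name `label_from_compound` below is hypothetical, and the threshold is an assumption that can be tuned.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Compound scores copied from the three outputs above.
for score in (0.6468, -0.6249, -0.5448):
    print(score, label_from_compound(score))
```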
[13]
# Run the polarity score on the entire dataset
res = {}
for i, row in tqdm(df.iterrows(), total=len(df)):
    text = row['Text']
    myid = row['Id']
    # Store the VADER scores keyed by review Id
    res[myid] = sia.polarity_scores(text)
[14]
vaders = pd.DataFrame(res).T
vaders = vaders.reset_index().rename(columns={'index': 'Id'})
vaders = vaders.merge(df, how='left')
[15]
# Now we have sentiment score and metadata
vaders.head()
[17]
fig, axs = plt.subplots(1, 3, figsize=(12, 3))
sns.barplot(data=vaders, x='Score', y='pos', ax=axs[0])
sns.barplot(data=vaders, x='Score', y='neu', ax=axs[1])
sns.barplot(data=vaders, x='Score', y='neg', ax=axs[2])
axs[0].set_title('Positive')
axs[1].set_title('Neutral')
axs[2].set_title('Negative')
plt.tight_layout()
plt.show()
[20]
# VADER results on example
print(example)
sia.polarity_scores(example)
This oatmeal is not good. Its mushy, soft, I don't like it. Quaker Oats is the way to go.
{'neg': 0.22, 'neu': 0.78, 'pos': 0.0, 'compound': -0.5448}
[21]
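# Run the RoBERTa model on the example
# (tokenizer, model and softmax are set up in cells that are not shown in this listing)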
encoded_text = tokenizer(example, return_tensors='pt')
output = model(**encoded_text)
scores = output[0][0].detach().numpy()
scores = softmax(scores)
scores_dict = {
    'roberta_neg' : scores[0],
    'roberta_neu' : scores[1],
    'roberta_pos' : scores[2]
}
print(scores_dict)
{'roberta_neg': 0.9763551, 'roberta_neu': 0.020687457, 'roberta_pos': 0.0029573673}
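The cell above passes the model's raw output scores (logits) through `softmax` so that the three values become probabilities that sum to 1. The sketch below shows what that step does, using only the standard library. `softmax_sketch` is a hypothetical name and the logits are invented for illustration; the notebook presumably uses a library implementation such as `scipy.special.softmax`, but the import is not shown.

```python
import math


def softmax_sketch(logits):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum first keeps exp() from overflowing.
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits in (negative, neutral, positive) order, not real model output.
probs = softmax_sketch([3.0, -1.0, -2.5])
print(probs)
print(sum(probs))
```

The largest logit always receives the largest probability, which is why the strongly negative oatmeal review ends up with most of its mass on `roberta_neg`.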
[22]
def polarity_scores_roberta(example):
    # Tokenize the text and run it through the model
    encoded_text = tokenizer(example, return_tensors='pt')
    output = model(**encoded_text)
    # Convert the raw logits to probabilities
    scores = output[0][0].detach().numpy()
    scores = softmax(scores)
    scores_dict = {
        'roberta_neg' : scores[0],
        'roberta_neu' : scores[1],
        'roberta_pos' : scores[2]
    }
    return scores_dict
[23]
res = {}
for i, row in tqdm(df.iterrows(), total=len(df)):
    try:
        text = row['Text']
        myid = row['Id']
        vader_result = sia.polarity_scores(text)
        # Prefix the VADER keys so they do not clash with the RoBERTa keys
        vader_result_rename = {}
        for key, value in vader_result.items():
            vader_result_rename[f"vader_{key}"] = value
        roberta_result = polarity_scores_roberta(text)
        both = {**vader_result_rename, **roberta_result}
        res[myid] = both
    except RuntimeError:
        print(f'Broke for id {myid}')
Broke for id 83
Broke for id 187
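The loop above skips any review whose scoring raises a RuntimeError and prints its Id. The report does not state why those two reviews failed; with transformer models a common cause is text longer than the model's maximum input length, but that is an assumption here. The sketch below shows the same skip-and-continue pattern with the failed Ids collected in a list for later inspection. `score_all` and `fake_scorer` are hypothetical stand-ins, not part of the notebook.

```python
def score_all(rows, scorer):
    """Score every (id, text) pair, collecting ids whose scoring fails."""
    results, failed_ids = {}, []
    for row_id, text in rows:
        try:
            results[row_id] = scorer(text)
        except RuntimeError:
            # Keep going, but remember which rows could not be scored.
            failed_ids.append(row_id)
    return results, failed_ids


def fake_scorer(text, max_words=5):
    """Stand-in for a real model: rejects inputs longer than max_words."""
    n_words = len(text.split())
    if n_words > max_words:
        raise RuntimeError("input too long")
    return {"n_words": n_words}


rows = [(1, "short review"), (2, "a much longer review that exceeds the toy limit")]
results, failed_ids = score_all(rows, fake_scorer)
print(results)
print(failed_ids)
```

Keeping the failed Ids in a list makes it easy to check later how many rows were dropped before comparing the two models.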
[24]
results_df = pd.DataFrame(res).T
results_df = results_df.reset_index().rename(columns={'index': 'Id'})
results_df = results_df.merge(df, how='left')
Compare Scores between models
[25]
results_df.columns
Index(['Id', 'vader_neg', 'vader_neu', 'vader_pos', 'vader_compound',
'roberta_neg', 'roberta_neu', 'roberta_pos', 'ProductId', 'UserId',
'ProfileName', 'HelpfulnessNumerator', 'HelpfulnessDenominator',
'Score', 'Time', 'Summary', 'Text'],
dtype='object')
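With both sets of scores in one table, one simple way to compare the models is to reduce each model's scores to a single label and count how often the labels match. This is a minimal sketch, not necessarily the comparison made in the original notebook. The column names come from `results_df` above; the helper names and the second row are hypothetical, and the 0.05 threshold is an assumed convention.

```python
def vader_label(row, threshold=0.05):
    """Label a row from its VADER compound score."""
    compound = row["vader_compound"]
    if compound >= threshold:
        return "pos"
    if compound <= -threshold:
        return "neg"
    return "neu"


def roberta_label(row):
    """Label a row by its highest RoBERTa probability."""
    keys = ("roberta_neg", "roberta_neu", "roberta_pos")
    best = max(keys, key=lambda k: row[k])
    return best.split("_")[1]


def agreement(rows):
    """Fraction of rows on which both models give the same label."""
    matches = sum(vader_label(r) == roberta_label(r) for r in rows)
    return matches / len(rows)


rows = [
    # Scores for the oatmeal example shown earlier in this manual.
    {"vader_compound": -0.5448,
     "roberta_neg": 0.9763551, "roberta_neu": 0.020687457, "roberta_pos": 0.0029573673},
    # Made-up row where the two models disagree.
    {"vader_compound": 0.40,
     "roberta_neg": 0.70, "roberta_neu": 0.20, "roberta_pos": 0.10},
]
print(agreement(rows))
```

On a real DataFrame the same functions could be applied row by row, for example with `results_df.apply(vader_label, axis=1)`.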
[33]
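# sent_pipeline is created in a cell not shown here (presumably with transformers.pipeline("sentiment-analysis"))
sent_pipeline('I love sentiment analysis!')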
[{'label': 'POSITIVE', 'score': 0.9997853636741638}]
[34]
sent_pipeline('Make sure to like and subscribe!')
[{'label': 'POSITIVE', 'score': 0.9991742968559265}]
[35]
sent_pipeline('booo')
[{'label': 'NEGATIVE', 'score': 0.9936267137527466}]
The End