
Report On Sentiment Analysis


SENTIMENT ANALYSIS

A PROJECT REPORT

Submitted by

ABHAY CHOUDHARY(21BCS3122)

in partial fulfillment for the award of the degree of

BACHELOR OF ENGINEERING

IN

ELECTRONICS ENGINEERING

Chandigarh University

Nov 2023
BONAFIDE CERTIFICATE

Certified that this project report “Sentiment analysis Project” is the bonafide
work of “ABHAY CHOUDHARY” who carried out the project work under
my/our supervision.

SIGNATURE SIGNATURE

Dr Sandeep Kang Er Prashant Ahluwalia


SUPERVISOR
HEAD OF THE DEPARTMENT

Submitted for the project viva-voce examination held on

INTERNAL EXAMINER EXTERNAL EXAMINER


TABLE OF CONTENTS
List of Images

CHAPTER 1. INTRODUCTION
1.1. Identification of Client/Need/Relevant Contemporary Issue
1.2. Identification of Problem
1.3. Identification of Tasks
1.4. Timeline
1.5. Organization of the Report

CHAPTER 2. LITERATURE REVIEW/BACKGROUND STUDY
2.1. Timeline of the reported problem
2.2. Existing solutions
2.3. Bibliometric analysis
2.4. Review Summary
2.5. Problem Definition
2.6. Goals/Objectives

CHAPTER 3. DESIGN FLOW/PROCESS
3.1. Evaluation & Selection of Specifications/Features
3.2. Design Constraints
3.3. Analysis of Features and finalization subject to constraints
3.4. Design Flow
3.5. Design selection
3.6. Implementation plan/methodology

CHAPTER 4. RESULTS ANALYSIS AND VALIDATION
4.1. Implementation of solution

CHAPTER 5. CONCLUSION AND FUTURE WORK
5.1. Conclusion
5.2. Future work

REFERENCES
Acknowledgments

I would like to thank everyone who contributed to the development and implementation of the
Sentiment Analysis project. The project was made possible with the cooperation and support of
many people and organizations.

I would like to thank Er Prashant Ahluwalia for providing the necessary resources, infrastructure,
and support throughout the project. This commitment to innovation and technological
advancement was crucial to the success of the Sentiment Analysis system.

Special thanks to the team of developers, engineers, and technicians who worked tirelessly to
design, code, and test the Sentiment Analysis algorithms and software. Their skills and hard work
played an important role in ensuring the accuracy and efficiency of the system.

I am also grateful to Er Pooja and Er Kamal Kumar for their valuable ideas and expertise during the
development process. Their insight improved the efficiency and effectiveness of the Sentiment
Analysis system.

This Sentiment Analysis system is a testament to the cooperation and dedication of everyone
involved. I am grateful for the collaboration that made this project a reality.

Thank you.

Abhay Choudhary
Abstract:
Sentiment Analysis (SA) systems have emerged as essential tools in understanding and managing
emotions expressed in text data across various applications. Leveraging natural language
processing (NLP) techniques and machine learning algorithms, SA systems extract, interpret, and
categorize sentiments expressed in text, aiding businesses in gauging customer feedback, brand
perception, and market trends.
Similar to Automatic Number Plate Recognition (ANPR) systems, which have revolutionized
traffic management and surveillance, SA systems play a pivotal role in deciphering the emotional
tone of textual content. Through advanced algorithms and deep learning models, SA systems
analyze text data to discern sentiments such as positivity, negativity, or neutrality.

The core components of an SA system involve text preprocessing, sentiment classification, and
sentiment aggregation. During preprocessing, text data undergoes cleaning and normalization to
enhance analysis accuracy. Sentiment classification employs machine learning classifiers or deep
learning architectures to assign sentiment labels to text inputs. Finally, sentiment aggregation
techniques amalgamate individual sentiment scores to derive overall sentiment insights.

SA technology relies on the principles of sentiment lexicons, machine learning models, and neural
networks to achieve accurate sentiment analysis. Just as ANPR systems utilize optical character
recognition (OCR) and convolutional neural networks (CNNs) for license plate identification, SA
systems harness similar deep learning techniques for sentiment classification, ensuring robust
performance across diverse textual datasets.

Challenges in SA system development encompass linguistic nuances, context dependencies, and
domain-specific lexicons. However, ongoing advancements in NLP, coupled with the availability
of large-scale labeled datasets, contribute to the refinement of SA algorithms and models,
enhancing their applicability and accuracy.

In conclusion, SA systems serve as indispensable tools for understanding and interpreting
sentiments expressed in textual data, akin to the transformative impact of ANPR systems on traffic
management. As businesses strive to harness the power of textual data for informed decision-making,
the evolution of SA technology remains pivotal in unlocking actionable insights from the
vast expanse of textual content available across digital platforms.
CHAPTER 1
INTRODUCTION
1.1. Identification of Client/Need/Relevant Contemporary Issue

1.1.1. Insight from Statistical Analysis and Data
Sentiment analysis serves as a crucial tool for understanding public perception across various
domains. Statistical analyses provide valuable insights into the sentiments expressed by individuals
or groups. For instance, surveys conducted by reputable organizations often reveal trends and
patterns in sentiment towards specific topics or products.

1.1.2. Addressing Consultation Concerns

Consulting with stakeholders offers invaluable insights into sentiment-related concerns.
Understanding the sentiments of customers, users, or target demographics is essential for tailoring
products or services to meet their needs and preferences effectively.

1.1.3. Research Insights

Research conducted in collaboration with stakeholders helps uncover nuanced sentiments and
preferences. Analyzing data gathered through surveys, interviews, or social media monitoring
enables businesses to identify sentiment-related trends and adjust strategies accordingly.

1.1.4. Current Challenges Highlighted

Challenges related to sentiment analysis include interpreting ambiguous or contradictory
sentiments, managing biases in data collection, and addressing privacy concerns. Keeping abreast
of current issues in sentiment analysis ensures that methodologies remain relevant and effective in
capturing and analyzing sentiment data accurately.

Leveraging statistical insights, research findings, and expert consultation, businesses can gain a
comprehensive understanding of sentiment-related issues and devise strategies to enhance customer
satisfaction and engagement.

1.2. Problem Identification

Identifying challenges in sentiment analysis is crucial for developing robust methodologies and
tools that accurately capture and analyze sentiment data.

1.2.1. Ambiguity in Sentiment Interpretation

One significant challenge is the ambiguity inherent in sentiment interpretation. Sentiments
expressed by individuals or groups may be nuanced or contradictory, requiring sophisticated
analysis techniques to accurately capture sentiment trends.

1.2.2. Biases in Data Collection

Biases in data collection can skew sentiment analysis results. Biases may arise from sampling
methods, survey design, or the demographics of the target audience, leading to inaccurate or
misleading insights.

1.2.3. Privacy Concerns


Privacy concerns surrounding the collection and analysis of sentiment data are another challenge.
Ensuring compliance with data protection regulations and addressing concerns about data security
and confidentiality are essential for maintaining trust and credibility in sentiment analysis efforts.

1.2.4. Technological Limitations

Technological limitations, such as the inability to accurately analyze sentiment in certain languages
or dialects, pose challenges for sentiment analysis. Advancements in natural language processing
and machine learning are necessary to overcome these limitations and improve the accuracy of
sentiment analysis.

Addressing these challenges requires collaboration between data scientists, researchers, and
industry stakeholders to develop innovative solutions and methodologies that accurately capture
and analyze sentiment data, thereby informing decision-making processes and enhancing customer
satisfaction.

1.3. Introduction to the Study

In this chapter, we explore the challenges and opportunities in sentiment analysis, highlighting the
importance of understanding client needs, identifying sentiment-related concerns, and developing
strategies to address them effectively. By leveraging data-driven insights and collaborative research
efforts, businesses can gain valuable insights into customer sentiment and enhance their
competitive advantage in the marketplace.

1.4 Timeline

Project Timeline: August 15th to November 10th

Chapter 1: Introduction (August 15th - September 5th)

Task 1.1: Client Identification/Need Identification (August 15th - August 25th)
Task 1.2: Identification of Problem (August 26th - September 5th)

Chapter 2: Literature Review/Background Study (September 6th - September 26th)

Task 2.1: Timeline of the reported problem (September 6th - September 16th)
Task 2.2: Proposed solutions (September 17th - September 26th)

Chapter 3: Design Flow/Process (September 27th - October 17th)

Task 3.1: Evaluation & Selection of Specifications/Features (September 27th - October 2nd)
Task 3.2: Design Constraints (October 3rd - October 7th)
Task 3.3: Analysis and Feature finalization subject to constraints (October 8th - October 17th)

Chapter 4: Results Analysis and Validation (October 18th - November 2nd)

Task 4.1: Implementation of solution (October 18th - October 27th)
Task 4.2: Testing/characterization/interpretation/data validation (October 28th - November 2nd)

Chapter 5: Conclusion and Future Work (November 3rd - November 10th)

Task 5.1: Conclusion (November 3rd - November 6th)
Task 5.2: Future Work (November 7th - November 10th)

1.5 Organization of the report


1.5.1 Introduction to the project's client or stakeholder:
The client for this project is a prominent urban planning agency tasked with optimizing traffic
management in densely populated metropolitan areas.
Justification of the problem's existence through statistics and documentation: According to the
United Nations' World Urbanization Prospects report, 68% of the global population is projected to
live in urban areas by 2050, posing significant challenges to traffic management.
Establishment of the problem as one requiring resolution (consultancy problem): The rapid
urbanization trend necessitates innovative solutions to alleviate traffic congestion, improve road
safety, and enhance overall urban mobility.
Support for the need through a survey or reported findings: A comprehensive survey conducted by
the agency revealed that over 75% of urban residents face daily commuting challenges,
highlighting the urgency for a sophisticated traffic management system.

1.5.2. Identification of Problem

Definition of the broad problem requiring resolution: The primary challenge is to develop an
Automated Traffic Control System (ATCS) that leverages advanced technologies for real-time
traffic monitoring, efficient signal control, and adaptive route optimization.
Exclusion of any hints towards a solution: This chapter strictly focuses on identifying and defining
the problem without delving into specific solutions or technical details.

1.5.3 Identification of Tasks

Define and differentiate the tasks needed to identify, build, and test the solution: Tasks include
requirement analysis, technology evaluation, system design, software development, hardware
integration, and comprehensive testing protocols.
Framework outlining chapters, headings, and subheadings: This report follows a structured
framework encompassing five chapters, each addressing a crucial aspect of the project, as outlined
in the framework provided at the outset.
1.5.4 Timeline

Definition of the project timeline, preferably using a Gantt chart: The project timeline spans from
August 15th to November 10th, allowing ample time for each phase, including research, design,
implementation, and testing.
CHAPTER 2
LITERATURE REVIEW

2.1 Timeline of the Reported Problem

The evolution of sentiment analysis has unfolded over several decades, marked by key
developments and incidents:

2.1.1 Early Developments (1980s - 1990s):

• The concept of sentiment analysis began to emerge in the 1980s and 1990s with early
research focusing on text analysis and opinion mining.

2.1.2 First Implementations (1990s):

• Initial applications of sentiment analysis were seen in market research and customer
feedback analysis, albeit with limited technology and methodologies.

2.1.3 Technical Challenges (2000s):

• Sentiment analysis encountered challenges such as accuracy issues due to language
nuances, ambiguity in text interpretation, and limited data availability.

2.1.4 Privacy Issues (2000s - 2010s):

• Growing concerns emerged regarding privacy and data protection, especially with the
increasing use of social media data for sentiment analysis.

2.1.5 Adversarial Attacks (2010s):

• Researchers identified vulnerabilities in sentiment analysis models, including susceptibility
to manipulation and bias in training data.

2.1.6 Legal and Regulatory Decisions (2010s):

• Governments and regulatory bodies began addressing legal and ethical implications of
sentiment analysis, particularly concerning user data privacy and consent.

2.1.7 Notable Events and Observations:

• Rise of social media platforms (2000s): The proliferation of social media provided
abundant data for sentiment analysis, revolutionizing the field.
• Cambridge Analytica scandal (2018): The misuse of personal data for targeted political
advertising highlighted ethical concerns and the need for stricter regulations.
• GDPR implementation (2018): The General Data Protection Regulation introduced
stringent requirements for data handling and privacy protection, impacting sentiment
analysis practices.

These milestones underscore the evolving landscape of sentiment analysis and the multifaceted
challenges it faces in terms of accuracy, privacy, and ethical considerations.

2.2 Suggestions

2.2.1 Data Quality and Quantity:

• Improve Data Collection Methods: Enhance data collection techniques to ensure diverse
and representative datasets for more accurate sentiment analysis.

2.2.2 Language Nuances and Context:

• Develop Contextual Understanding: Utilize advanced natural language processing
techniques to capture and interpret language nuances and context accurately.

2.2.3 Bias and Fairness:

• Mitigate Bias in Models: Implement algorithms and methodologies to identify and mitigate
biases in sentiment analysis models, ensuring fairness and impartiality.

2.2.4 Privacy and Consent:

• Strengthen Data Protection Measures: Enhance privacy protocols and obtain explicit
consent for data usage to address privacy concerns and regulatory requirements.

2.2.5 Interpretability and Transparency:

• Ensure Model Transparency: Employ techniques to make sentiment analysis models more
interpretable and transparent, enabling users to understand and trust the results.

2.2.6 Cross-Cultural Sensitivity:

• Account for Cultural Differences: Incorporate cultural sensitivity into sentiment analysis
algorithms to accurately capture sentiment across diverse demographics and regions.

2.2.7 Real-Time Analysis:

• Enhance Real-Time Processing: Develop efficient algorithms and infrastructure for
real-time sentiment analysis to enable timely insights and decision-making.

2.2.8 Ethical Guidelines and Oversight:

• Establish Ethical Guidelines: Define ethical standards and regulatory oversight mechanisms
to govern the ethical use of sentiment analysis technology.

2.2.9 Collaboration and Knowledge Sharing:

• Foster Collaboration: Encourage collaboration between researchers, industry stakeholders,
and regulatory bodies to address emerging challenges and share best practices.

2.2.10 Education and Awareness:

• Promote Awareness: Educate users and stakeholders about the capabilities, limitations, and
ethical considerations of sentiment analysis to foster responsible usage.

These suggested strategies aim to address the diverse challenges in sentiment analysis, promoting
accuracy, fairness, privacy, and ethical practice in the field. Implementation of these
recommendations can contribute to the advancement and responsible use of sentiment analysis
technology.

2.3 Bibliometric Analysis

An analysis of proposed solutions for issues in Sentiment Analysis systems sheds light on key
features, effectiveness, and drawbacks:

2.3.1 Super-Resolution Techniques:

Key Features:

• Enhances image resolution for improved sentiment extraction.
• Utilizes deep learning algorithms for high-quality image enhancement.

Effectiveness:

• Highly effective in enhancing image clarity and sentiment analysis accuracy.
• Particularly useful for improving analysis in low-quality or pixelated images.

Drawbacks:

• Computationally intensive, requiring substantial hardware resources.
• May not always fully restore fine details, leading to potential loss of nuanced sentiment.

2.3.2 Robust Character Recognition Algorithms:

Key Features:

• Recognizes characters across various fonts, sizes, and styles.
• Employs advanced machine learning techniques for precise character detection.

Effectiveness:

• Highly effective in handling diverse text styles, ensuring accurate sentiment interpretation.
• Achieves high accuracy rates in character recognition tasks.

Drawbacks:

• Requires extensive labeled data for robust model training.
• May struggle with extremely stylized or distorted text, affecting sentiment analysis
accuracy.

2.3.3 Adaptive Image Enhancement:

Key Features:

• Adjusts image attributes like brightness and contrast to improve sentiment analysis.
• Utilizes techniques such as histogram equalization for optimal image quality.

Effectiveness:

• Enhances visibility and clarity of sentiment-bearing text in varied lighting conditions.
• Improves sentiment analysis accuracy in challenging visual environments.

Drawbacks:

• Risk of over-enhancement or introduction of artifacts in certain scenarios.
• Requires careful parameter tuning for consistent performance across different contexts.

2.3.4 Anonymization and Data Encryption:

Key Features:

• Protects user privacy by anonymizing or encrypting sensitive information.
• Implements secure storage and transmission protocols to safeguard data.

Effectiveness:

• Provides robust privacy protection for sentiment analysis data.
• Ensures compliance with data protection regulations.

Drawbacks:

• Potential trade-off with sentiment analysis accuracy due to data anonymization noise.
• Adds computational overhead for encryption and decryption processes.

2.3.5 Adversarial Training and Robust Models:

Key Features:

• Trains models to withstand adversarial attacks.
• Incorporates robust optimization techniques for enhanced model resilience.

Effectiveness:

• Improves system's ability to detect and mitigate manipulated sentiment data.
• Enhances sentiment analysis reliability in the face of hostile inputs.

Drawbacks:

• Requires access to diverse adversarial datasets for effective training.
• Increases computational demands during model training and deployment.

2.3.6 Weather-Resistant Cameras and Filters:

Key Features:

• Equips cameras with weather-proof features for sentiment analysis in adverse conditions.
• Applies specialized image processing filters to mitigate weather effects.

Effectiveness:

• Enhances sentiment analysis performance in challenging weather environments.
• Improves system reliability and accuracy in real-world applications.

Drawbacks:

• Adds complexity and cost to hardware setup and maintenance.
• May not completely eliminate adverse weather effects in extreme conditions.

2.3.7 Transparent Policies and Consent:

Key Features:

• Establishes clear guidelines for sentiment analysis data collection, storage, and usage.
• Obtains explicit user consent for data processing to ensure compliance and trust.

Effectiveness:

• Builds transparency and trust with users, ensuring ethical sentiment analysis practices.
• Mitigates privacy concerns and legal risks associated with sentiment data processing.

Drawbacks:

• Requires ongoing efforts to maintain compliance with evolving privacy regulations.
• Relies on user understanding and awareness of data privacy issues.

2.3.8 Pre-Processing Techniques for Standardization:

Key Features:

• Applies preprocessing steps to standardize sentiment-bearing text before analysis.
• Ensures consistent sentiment interpretation across varied input formats.

Effectiveness:

• Improves sentiment analysis accuracy by standardizing text appearance.
• Enhances system performance in recognizing sentiment from diverse sources.

Drawbacks:

• May encounter challenges with heavily distorted or non-standard text inputs.
• Requires careful parameter tuning for optimal sentiment analysis performance.

These solutions offer promising avenues for enhancing sentiment analysis systems, each with its
own strengths and considerations. By carefully considering and integrating these approaches, we
can develop more robust and accurate sentiment analysis technology to meet the demands of
various applications and environments.

2.4 Review Summary


Leveraging insights from the literature review, we aim to enhance the capabilities and
effectiveness of sentiment analysis systems. The comprehensive analysis of proposed solutions
offers valuable strategies to address challenges in sentiment analysis, ensuring improved accuracy,
reliability, and ethical practice. By integrating super-resolution techniques, robust character
recognition algorithms, and adaptive image enhancement, we can enhance sentiment analysis
accuracy and performance across diverse datasets. Additionally, implementing anonymization and
data encryption measures, adversarial training, and weather-resistant hardware will bolster privacy
protection and system resilience. Transparent policies and consent mechanisms will ensure ethical
sentiment analysis practices and compliance with regulatory requirements. Moreover, pre-
processing techniques for standardization will enhance consistency and reliability in sentiment
analysis results. By strategically combining these solutions, we aim to create
CHAPTER 3
DESIGN PROCESS
3.1 Evaluating and Selecting Specifications/Functionality

In sentiment analysis systems, the selection of key functions is paramount for achieving accuracy
and reliability. Here's a breakdown of important measures and features essential for the
implementation of a sentiment analysis system:

3.1.1 Text Preprocessing:

• Critical Evaluation: Preprocessing plays a crucial role in standardizing text input for
analysis.
• Ideally Required: Techniques such as tokenization, lowercasing, and removal of
punctuation marks.

3.1.2 Feature Extraction:

• Critical Evaluation: Extracting informative features is essential for capturing nuanced
sentiments.
• Ideally Required: Methods like TF-IDF or word embeddings for effective feature
representation.

3.1.3 Sentiment Classification Model:

• Critical Evaluation: The sentiment classification model serves as the core component,
determining the sentiment of the text.
• Ideally Required: High-performance models capable of handling various languages, text
lengths, and sentiment nuances.

3.1.4 Training Data Selection:

• Critical Evaluation: The quality and diversity of training data significantly impact the
performance of the sentiment analysis model.
• Ideally Required: Diverse datasets covering a wide range of topics, domains, and
sentiments.

3.1.5 Model Evaluation Metrics:

• Critical Evaluation: Metrics for evaluating model performance are essential to assess
accuracy and generalization capabilities.
• Ideally Required: Metrics such as accuracy, precision, recall, F1-score, and confusion
matrix analysis.
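
The metrics named above can be illustrated with a short sketch. It assumes a binary task with hypothetical labels "pos" and "neg" and computes the confusion-matrix counts by hand. The function names are illustrative and are not taken from the project's code.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN, treating `positive` as the class of interest."""
    pairs = Counter((t == positive, p == positive) for t, p in zip(y_true, y_pred))
    tp = pairs[(True, True)]
    fn = pairs[(True, False)]
    fp = pairs[(False, True)]
    tn = pairs[(False, False)]
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive="pos"):
    """Return accuracy, precision, recall and F1 for a binary labelling."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold labels and predictions, for illustration only.
y_true = ["pos", "pos", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "pos"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With a third "neutral" class, the same counts would normally be computed once per class and then averaged.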

3.1.6 Real-Time Analysis Capability:

• Critical Evaluation: Real-time analysis capability is crucial for applications requiring
immediate sentiment insights.
• Ideally Required: Efficient algorithms and processing techniques for real-time sentiment
analysis.

3.1.7 Scalability and Performance:

• Critical Evaluation: The system should be scalable to handle large volumes of text data
efficiently.
• Ideally Required: High-performance computing infrastructure capable of handling increased
workloads.

3.1.8 Domain Adaptability:

• Critical Evaluation: The system should be adaptable to different domains and industries,
each with its unique language and sentiment expressions.
• Ideally Required: Transfer learning techniques or domain-specific fine-tuning capabilities.

3.1.9 Multilingual Support:

• Critical Evaluation: Multilingual support enhances the system's usability and applicability
across diverse linguistic contexts.
• Ideally Required: Models capable of understanding and analyzing sentiments in multiple
languages.

3.1.10 Handling Sarcasm and Irony:

• Critical Evaluation: Effective sentiment analysis should account for nuances like sarcasm
and irony, which may convey sentiments opposite to literal meaning.
• Ideally Required: Advanced algorithms and linguistic analysis techniques for detecting and
interpreting sarcastic or ironic expressions.

3.1.11 Interpretability and Explainability:

• Critical Evaluation: Transparent and interpretable models are crucial for understanding how
sentiment predictions are made.
• Ideally Required: Techniques for model interpretability, such as attention mechanisms or
explanation generation.

3.1.12 Integration with Feedback Mechanisms:

• Critical Evaluation: Integration with feedback mechanisms enables continuous learning and
improvement of the sentiment analysis model.
• Ideally Required: Feedback loops for collecting user feedback and updating the model
accordingly.

3.1.13 Privacy and Data Protection:

• Critical Evaluation: Compliance with privacy regulations is essential to protect user data
and ensure ethical usage of sentiment analysis technology.
• Ideally Required: Robust mechanisms for data anonymization, encryption, and adherence to
privacy laws.
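
As one possible illustration of the anonymization requirement, the sketch below pseudonymizes user identifiers with a keyed hash and masks e-mail addresses and @mentions in the text before analysis. The key, the regular expressions, and the function names are assumptions made for this example and do not describe the project's actual mechanism. Pseudonymized data may still count as personal data under privacy law, so this is a starting point rather than a complete solution.

```python
import hashlib
import hmac
import re

# Hypothetical secret key; in practice it would be loaded from a secrets store.
SECRET_KEY = b"replace-with-a-real-secret"

# Simplified patterns, assumed for illustration; real e-mail matching is looser.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
MENTION_RE = re.compile(r"@\w+")


def pseudonymize_user(user_id: str) -> str:
    """Replace a user ID with a keyed hash so records stay linkable but unreadable."""
    digest = hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def redact_text(text: str) -> str:
    """Mask e-mail addresses first, then @mentions, before storage or analysis."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return MENTION_RE.sub("[USER]", text)


print(redact_text("Thanks @alice, mail me at bob.smith@example.com!"))
print(pseudonymize_user("user-42"))
```

E-mail addresses are masked before mentions so that the domain part of an address is not mistaken for a mention.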

3.1.14 Continuous Model Monitoring and Maintenance:

• Critical Evaluation: Continuous monitoring and maintenance are necessary to address model
drift and maintain optimal performance.
• Ideally Required: Automated monitoring tools and periodic model retraining to keep pace
with evolving language trends and user behaviors.
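
A minimal sketch of automated monitoring is given below, assuming that a stream of predictions can later be compared against human-verified labels. It tracks accuracy over a sliding window and raises a retraining flag when accuracy falls below a threshold. The class name, window size, and threshold are hypothetical choices for this example.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent `window` labelled predictions."""

    def __init__(self, window: int = 100, threshold: float = 0.8):
        self.results = deque(maxlen=window)  # oldest results drop out automatically
        self.threshold = threshold

    def record(self, predicted: str, actual: str) -> None:
        """Store whether one prediction matched its verified label."""
        self.results.append(predicted == actual)

    def accuracy(self) -> float:
        """Accuracy over the current window (1.0 while no data has arrived)."""
        return sum(self.results) / len(self.results) if self.results else 1.0

    def needs_retraining(self) -> bool:
        # Only raise the flag once the window is full, to avoid noisy early alerts.
        return len(self.results) == self.results.maxlen and self.accuracy() < self.threshold


monitor = AccuracyMonitor(window=4, threshold=0.75)
for predicted, actual in [("pos", "pos"), ("neg", "neg"), ("pos", "neg"), ("neg", "pos")]:
    monitor.record(predicted, actual)
print(monitor.accuracy(), monitor.needs_retraining())
```

A falling windowed accuracy is only one possible drift signal; shifts in the distribution of predicted labels could be tracked in the same way when verified labels are scarce.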

The effectiveness of a sentiment analysis system depends on the seamless integration of these
features, prioritized based on specific requirements and use cases. Regular updates and
maintenance are essential to keep the system performing optimally over time.

3.2 Design Constraints

Designing a sentiment analysis system involves considering various constraints to ensure its
effectiveness and reliability. Here are some common design constraints for a sentiment analysis
system:

3.2.1 Data Quality and Quantity:

• Ensuring the availability of high-quality and diverse training data is essential for building
accurate sentiment analysis models.

3.2.2 Processing Speed:

• Real-time sentiment analysis applications require fast processing speeds to provide timely
insights.

3.2.3 Scalability:

• The system should be able to scale efficiently to handle increasing volumes of text data
without sacrificing performance.

3.2.4 Multilingual Support:

• Support for multiple languages may pose challenges in terms of linguistic diversity and
cultural nuances.

3.2.5 Interpretability:

• Interpretable models are necessary for understanding how sentiment predictions are made
and gaining user trust.

3.2.6 Privacy and Security:

• Compliance with data privacy regulations and ensuring the security of user data are critical
considerations.

3.2.7 Resource Constraints:

• Limited computing resources may impact the system's ability to process large amounts of
text data efficiently.

3.2.8 Domain Specificity:

• Sentiment analysis models may need to be adapted or fine-tuned for specific domains or
industries to achieve optimal performance.

3.2.9 Model Bias and Fairness:

• Addressing bias in sentiment analysis models is crucial to ensure fairness and mitigate
potential ethical concerns.

3.2.10 User Interface Requirements:

• Designing user-friendly interfaces for presenting sentiment analysis results enhances
usability and adoption.

3.2.11 Integration with Existing Systems:

• Seamless integration with existing software applications or platforms may be necessary for
broader adoption and usability.

3.2.12 Continuous Improvement:

• Implementing mechanisms for continuous model improvement based on user feedback and
evolving language trends is essential.

By considering these constraints during the design phase, a sentiment analysis system can be
developed to effectively meet user needs while ensuring reliability and performance.

3.3 Analysis and Feature Finalization for Sentiment Analysis

In the context of sentiment analysis, it's essential to identify and finalize features considering
specific constraints to ensure accurate and reliable sentiment classification. Let's analyze and
finalize the features:

3.3.1. Text Preprocessing:

Analysis: Given the diverse nature of text data, preprocessing is crucial to standardize and clean
the text for effective sentiment analysis. Finalization: Retain robust text preprocessing techniques,
including lowercasing, punctuation removal, and stop word removal, to ensure consistency and
improve model performance.
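
The preprocessing steps named above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the `preprocess` function and the tiny `STOP_WORDS` set are hypothetical, and a real system would more likely take its stop word list from NLTK or spaCy.

```python
import string

# Hypothetical, deliberately tiny stop word list for illustration only.
# Negation words such as "not" are kept on purpose: removing them would
# flip the meaning of phrases like "not good".
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "to", "i"}


def preprocess(text):
    """Lowercase, strip punctuation, split on whitespace, drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in no_punct.split() if tok not in STOP_WORDS]


print(preprocess("This oatmeal is not good. I don't like it."))
# ['oatmeal', 'not', 'good', 'dont', 'like']
```

Note that stripping punctuation turns "don't" into "dont"; whether that is acceptable depends on how negation is handled later in the pipeline.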

3.3.2. Feature Extraction:

Analysis: Extracting relevant features from text data is essential for sentiment analysis models to
capture sentiment-related information effectively. Finalization: Emphasize feature extraction
techniques such as bag-of-words, TF-IDF, and word embeddings (e.g., Word2Vec, GloVe) to
represent text data in a format suitable for sentiment classification.
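
To make the bag-of-words and TF-IDF representations concrete, the sketch below computes both with the standard library only. The function names are illustrative, and the weighting shown (term frequency divided by document length, times log(N / document frequency)) is just one common TF-IDF variant; libraries such as scikit-learn use slightly different smoothing.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Map each token to the number of times it occurs in the document."""
    return Counter(tokens)


def tf_idf(docs):
    """docs is a list of token lists; returns one {term: weight} dict per doc."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = bag_of_words(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = [["good", "oatmeal"], ["bad", "oatmeal"]]
weights = tf_idf(docs)
print(weights[0])  # "oatmeal" appears in every document, so its weight is 0.0
```

A term that occurs in every document carries no discriminating information, which is why its weight drops to zero under this variant.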

3.3.3. Model Selection:

Analysis: Choosing the right sentiment analysis model is critical for achieving accurate sentiment
classification results. Finalization: Prioritize models such as Naive Bayes, Support Vector
Machines (SVM), Recurrent Neural Networks (RNN), or Transformer-based architectures (e.g.,
BERT, GPT) based on the complexity of the sentiment analysis task and available computational
resources.

3.3.4. Training Data Quality:

Analysis: The quality and diversity of the training data directly impact the performance of
sentiment analysis models. Finalization: Ensure the availability of high-quality, labeled training
data covering a wide range of sentiments, topics, and domains to improve model generalization and
robustness.

3.3.5. Model Evaluation Metrics:

Analysis: Selecting appropriate evaluation metrics is crucial for assessing the performance of
sentiment analysis models accurately. Finalization: Utilize evaluation metrics such as accuracy,
precision, recall, F1-score, and confusion matrix to measure the model's effectiveness in sentiment
classification across different sentiment categories.
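
As a worked illustration of these metrics, the sketch below computes accuracy and per-class precision, recall, and F1-score from two label lists. The labels and the helper names are made up for the example and are not results from this project.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for one sentiment class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical labels, for illustration only
y_true = ["pos", "pos", "neg", "neg", "neu"]
y_pred = ["pos", "neg", "neg", "neg", "pos"]

print(accuracy(y_true, y_pred))                  # 0.6
print(per_class_metrics(y_true, y_pred, "pos"))  # (0.5, 0.5, 0.5)
```

Reporting the metrics per class matters when the classes are imbalanced, since overall accuracy can look good while a minority class is rarely predicted correctly.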

3.3.6. Error Analysis:

Analysis: Understanding the types of errors made by sentiment analysis models provides insights
into areas for improvement. Finalization: Conduct thorough error analysis to identify common
error patterns (e.g., misclassification of sarcasm, negation handling) and refine the model
accordingly to enhance performance.

3.3.7. Real-time Processing:

Analysis: Real-time sentiment analysis is essential for applications requiring immediate feedback
or response. Finalization: Optimize sentiment analysis models and processing pipelines for real-
time inference, considering factors such as computational efficiency and latency constraints.

3.3.8. Multilingual and Multimodal Support:

Analysis: Sentiment analysis may need to support multiple languages and modalities (e.g., text,
images, audio) to handle diverse data sources. Finalization: Incorporate techniques for
multilingual sentiment analysis and explore multimodal approaches (e.g., text-image fusion) to
improve sentiment understanding across different data types.

3.3.9. Bias and Fairness:

Analysis: Bias and fairness considerations are crucial to ensure equitable sentiment analysis
outcomes across different demographic groups. Finalization: Implement measures to detect and
mitigate biases in sentiment analysis models, such as debiasing techniques, fairness-aware training,
and diverse dataset curation.

3.3.10. Interpretability and Explainability:

Analysis: Understanding how sentiment analysis models make predictions is essential for building
trust and transparency. Finalization: Prioritize the interpretability and explainability of sentiment
analysis models by incorporating techniques such as attention mechanisms, feature importance
analysis, and model-agnostic explanations.

3.3.11. Continuous Model Monitoring:

Analysis: Continuous monitoring of sentiment analysis models is necessary to detect performance
degradation and concept drift over time. Finalization: Establish mechanisms for ongoing model
monitoring, including performance metrics tracking, anomaly detection, and regular model
retraining to maintain optimal performance.
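
One simple way to track performance over time is a rolling-window accuracy check, sketched below. The `AccuracyMonitor` class, its window size, and its threshold are assumptions chosen for illustration; a production setup would likely also track input drift and use a dedicated monitoring tool.

```python
from collections import deque


class AccuracyMonitor:
    """Tracks accuracy over the most recent labeled predictions."""

    def __init__(self, window=100, threshold=0.8):
        self.outcomes = deque(maxlen=window)  # True for each correct prediction
        self.threshold = threshold

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """Flag the model once a full window falls below the threshold."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.threshold


monitor = AccuracyMonitor(window=4, threshold=0.75)
for predicted, actual in [("pos", "pos"), ("neg", "pos"),
                          ("neg", "neg"), ("pos", "neg")]:
    monitor.record(predicted, actual)

print(monitor.accuracy())          # 0.5
print(monitor.needs_retraining())  # True
```

Waiting for a full window before raising the flag avoids reacting to the first few predictions, at the cost of a short delay in detection.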

By finalizing the features considering these constraints, the sentiment analysis system will be
better equipped to accurately analyze and classify sentiments in diverse text data.

3.4 Design Flow for Sentiment Analysis

Here are two alternative design flows for implementing a sentiment analysis system:

Design Flow 1: Rule-based Approach

1. Data Acquisition: Collect text data from various sources such as social media, customer
reviews, or survey responses.
2. Text Preprocessing: Clean and preprocess the text data to remove noise and standardize
the text format.
3. Feature Extraction: Extract relevant features from the preprocessed text data using
techniques like bag-of-words or TF-IDF.
4. Rule-based Classification: Define rules or patterns to classify text data into sentiment
categories (e.g., positive, negative, neutral) based on extracted features.
5. Post-processing: Apply post-processing techniques to refine classification results and
handle edge cases or ambiguities.
6. Model Integration: Integrate the rule-based sentiment analysis system with other
applications or systems for sentiment monitoring and analysis.
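
The classification step of the rule-based flow can be sketched as a lexicon lookup with one negation rule. The lexicon, the negation list, and the `rule_based_sentiment` function are hypothetical and far smaller than anything usable in practice.

```python
# Hypothetical mini-lexicon: word -> sentiment weight
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "worst": -2, "mushy": -1}
NEGATIONS = {"not", "never", "no"}


def rule_based_sentiment(text):
    """Sum lexicon weights, flipping the sign of a word that follows a negation."""
    tokens = text.lower().replace(".", " ").replace(",", " ").split()
    score = 0
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        value = LEXICON.get(tok, 0)
        score += -value if negate else value
        negate = False  # the negation rule only covers the next token
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(rule_based_sentiment("This oatmeal is not good."))  # negative
print(rule_based_sentiment("I love it"))                  # positive
```

The rule here only looks one token ahead, so "not very good" would be missed. Gaps like this are typical of the edge cases the post-processing step has to handle.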

Design Flow 2: Machine Learning Approach

1. Data Collection and Labeling: Gather a large dataset of labeled text data covering various
sentiment categories.
2. Text Preprocessing: Clean and preprocess the text data to prepare it for feature extraction.
3. Feature Extraction: Extract features from the preprocessed text data using techniques like
word embeddings or n-grams.
4. Model Selection and Training: Choose a machine learning model (e.g., Naive Bayes,
SVM, LSTM) and train it on the labeled data to learn the relationship between features and
sentiment labels.
5. Model Evaluation: Evaluate the trained model's performance using metrics such as
accuracy, precision, recall, and F1-score.
6. Model Deployment: Deploy the trained sentiment analysis model in production for real-
time sentiment classification of incoming text data.
7. Continuous Monitoring and Improvement: Monitor the deployed model's performance
over time, collect feedback, and retrain the model with new data to improve accuracy and
adaptability.
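
To illustrate the training and prediction steps of this flow, the sketch below implements a small multinomial Naive Bayes classifier with add-one (Laplace) smoothing using only the standard library. The class name and the four training examples are invented for the illustration; a real project would more likely use a library implementation and a much larger labeled dataset.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            logp = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                # Add-one smoothing keeps unseen words from zeroing the product
                logp += math.log((count + 1) / (total_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical labeled data, for illustration only
train_docs = [["love", "this"], ["great", "taste"],
              ["not", "good"], ["worst", "ever"]]
train_labels = ["pos", "pos", "neg", "neg"]

clf = NaiveBayes().fit(train_docs, train_labels)
print(clf.predict(["great", "love"]))  # pos
print(clf.predict(["worst"]))          # neg
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.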

3.5 Design Selection for Sentiment Analysis

Based on the analysis of the two design flows, the selection should be based on the specific
requirements, resources, and constraints of the sentiment analysis project. Let's revisit the strengths
and weaknesses of each design:

Design Flow 1: Rule-based Approach

Strengths:

 Transparency and interpretability.
 Flexibility to customize rules and patterns.
 Low computational requirements.

Weaknesses:

 Limited scalability and adaptability to complex data.
 Dependency on predefined rules may lead to suboptimal performance.

Design Flow 2: Machine Learning Approach

Strengths:

 Ability to capture complex patterns and relationships in data.
 Higher potential for accuracy and generalization.
 Adaptability to evolving language and sentiment expressions.

Weaknesses:

 Requires large amounts of labeled training data.
 Higher computational and resource requirements.
 Less transparent compared to rule-based approaches.

Design Selection:

Based on the analysis, the recommended design is Design Flow 2: Machine Learning Approach.

Here's the rationale for this selection:

 Accuracy and Adaptability: The machine learning approach offers higher potential for
accuracy and adaptability to diverse sentiment patterns and data sources.
 Generalization: Machine learning models can generalize well to unseen data, making them
suitable for handling complex sentiment analysis tasks.
 Continuous Improvement: Machine learning models can be continuously refined and
improved with new data, ensuring optimal performance over time.

CHAPTER 4
RESULTS ANALYSIS AND VALIDATION

4.1. Result Analysis and Validation


Implementing sentiment analysis involves comprehensive analysis, meticulous design, and
thorough testing. Here's a detailed breakdown of the result analysis and validation process using
modern tools:

1. Analysis:

 Tools Used:
 Python for Data Analysis: Utilize libraries like Pandas, NumPy, and Matplotlib for
statistical analysis and visualization of sentiment data.
 Jupyter Notebooks: Create interactive notebooks to explore and document the
sentiment analysis process.
 Natural Language Processing (NLP) Libraries: Utilize NLTK or SpaCy for text
preprocessing and feature extraction.

2. Design (Drawings/Schematics/Solid Models):

 Tools Used:
 Concept Maps or Mind Maps: Illustrate the sentiment analysis workflow, including
data preprocessing, feature extraction, model training, and evaluation.
 UML Diagrams: Design the architecture of sentiment analysis models, depicting the
flow of data and processing steps.

3. Document Preparation:

 Tools Used:
 LaTeX or Microsoft Word: Prepare detailed documentation describing the
methodology, algorithms, and results of sentiment analysis.
 Markdown: Create README files or project documentation for easy versioning and
sharing.
 Data Visualization Tools: Generate visualizations using tools like Tableau or
Matplotlib to present sentiment analysis results effectively.

4. Management and Communication:

 Tools Used:
 Project Management Platforms: Utilize platforms like Trello or Asana for task
management, progress tracking, and collaboration.
 Communication Tools: Foster communication among team members using Slack,
Microsoft Teams, or other messaging platforms.
 Version Control Systems: Employ Git for version control and collaboration on code
repositories.

5. Testing/Characterization/Interpretation/Data Validation:

 Tools Used:
 Testing Frameworks: Conduct unit tests using frameworks like pytest to ensure the
functionality and accuracy of sentiment analysis algorithms.
 Evaluation Metrics: Calculate metrics such as accuracy, precision, recall, and F1-
score to evaluate the performance of sentiment analysis models.
 Data Visualization Tools: Visualize sentiment analysis results through charts,
graphs, and word clouds to facilitate interpretation and insights.
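
A unit test for a sentiment component might look like the sketch below. The `polarity` function and its word lists are hypothetical stand-ins for the project's own code. The test functions use plain `assert` statements, so pytest can collect them when the file is run with the `pytest` command.

```python
# Hypothetical word lists and scoring function, standing in for project code
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "worst", "hate"}


def polarity(text):
    """Return the number of positive words minus the number of negative words."""
    tokens = text.lower().split()
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def test_positive_text_scores_above_zero():
    assert polarity("I love this great product") > 0


def test_negative_text_scores_below_zero():
    assert polarity("worst purchase ever") < 0


def test_empty_text_is_neutral():
    assert polarity("") == 0
```

Small, deterministic cases like these check the behavior of individual functions. They complement the accuracy-style metrics, which describe the model's behavior on a whole dataset.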

Other Considerations:

 Virtualization and Containerization: Use Docker for containerizing sentiment analysis
applications, ensuring portability and consistency across different environments.
 Continuous Integration/Continuous Deployment (CI/CD): Implement CI/CD pipelines
for automated testing, deployment, and monitoring of sentiment analysis models.

Ensure that the chosen tools and technologies align with project requirements and facilitate
effective collaboration among team members. Regularly update documentation and communicate
progress to stakeholders to ensure transparency and alignment with project goals.

CHAPTER 5
CONCLUSION AND FUTURE WORK

5.1. Conclusion
The journey from the initial assessment to the implementation of a robust sentiment analysis system
has been marked by significant achievements and insights. Here, we summarize the key findings,
conclusions, and implications of our sentiment analysis project:

Effectiveness:

Accuracy and Efficiency: Our sentiment analysis system has demonstrated high accuracy in
analyzing sentiment even under challenging conditions, such as analyzing diverse text sources and
varying sentiment expressions. By leveraging machine learning algorithms trained on
comprehensive datasets, we have achieved superior performance in sentiment classification.

Design and Visualization:

Utilization of Modern Tools: The use of diagramming tools, such as concept maps and UML
diagrams, has facilitated the visualization of our sentiment analysis system's architecture and
components. Detailed diagrams
and visual models have provided stakeholders with a clear understanding of the system's design,
enhancing communication and collaboration.

Project Management and Collaboration:

Streamlined Workflow: Modern project management and communication tools, such as Trello
and Slack, have played a crucial role in streamlining our workflow. These platforms have enabled
effective communication, task tracking, and a structured approach to project development, leading
to increased productivity and efficiency.

Testing and Validation:

Rigorous Testing: Our sentiment analysis system underwent rigorous testing, both automated and
manual, to validate its functionality and reliability. Tools like pytest and Postman have facilitated
comprehensive testing of code components and APIs, ensuring robustness and error-free operation.

Future Directions:

Continuous Improvement: While we celebrate the success of our current sentiment analysis
system, there are avenues for future enhancements and expansion. Continuous improvement
through regular updates and refinements to machine learning models will further enhance accuracy
and performance.

Scalability: Considerations for scaling the system to handle increased data volume and expanding
its capabilities to support additional languages or domains will be vital for meeting growing
demands and addressing diverse user needs.

Security and Privacy: Ongoing efforts to enhance system security, including data encryption and
compliance with privacy regulations, will ensure the protection of sensitive information and foster
user trust.

Overall Impact:

The successful implementation of our sentiment analysis system holds significant promise for
various applications, including market research, brand sentiment analysis, and social media
monitoring. By efficiently analyzing sentiment data, our system contributes to informed decision-
making and enhances user experiences in the digital landscape.

In conclusion, our work on sentiment analysis reflects a commitment to technological innovation,
collaborative development, and a forward-thinking approach. We remain dedicated to advancing
the capabilities of our sentiment analysis system and contributing to the evolution of intelligent
solutions for real-world challenges.

5.2. Future Work

Based on our experience and insights gained from sentiment analysis, there are several potential
areas for future improvement and innovation:

5.2.1. Advanced Machine Learning Techniques:

Explore advancements in deep learning and neural networks to enhance the accuracy and
robustness of sentiment analysis models. Techniques such as transformers and attention
mechanisms hold promise for capturing nuanced sentiment expressions and context.

5.2.2. Real-Time Processing:

Improve the real-time processing capabilities of the sentiment analysis system to enable rapid
analysis and response to incoming data streams. This is particularly important for applications
requiring timely insights, such as social media monitoring and customer feedback analysis.

5.2.3. Adaptability to Diverse Data Sources:

Enhance the system's ability to analyze sentiment across diverse data sources, including text,
images, and audio. Developing multimodal sentiment analysis models capable of processing
multiple data modalities will enable a more comprehensive understanding of sentiment.

5.2.4. Domain-Specific Sentiment Analysis:

Investigate domain-specific sentiment analysis techniques tailored to specific industries or
applications. Customizing sentiment analysis models for domains such as healthcare, finance, or
hospitality can improve accuracy and relevance in specialized contexts.

5.2.5. Explainable AI and Interpretability:

Explore methods for enhancing the interpretability and explainability of sentiment analysis models.
Incorporating techniques such as attention visualization and model explanations will increase
transparency and trust in model predictions.

5.2.6. Ethical and Fair AI:

Address ethical considerations and biases in sentiment analysis by designing algorithms that
prioritize fairness, transparency, and inclusivity. Implementing fairness-aware learning techniques
and bias mitigation strategies will ensure equitable treatment across diverse user groups.

5.2.7. Integration with Decision Support Systems:

Integrate sentiment analysis capabilities into decision support systems to provide actionable
insights for stakeholders. Leveraging sentiment analysis to inform strategic decision-making
processes will enable organizations to respond effectively to changing market dynamics and
consumer sentiment.

5.2.8. Continuous Model Training and Updating:

Implement a framework for continuous model training and updating to adapt to evolving language
trends and sentiment expressions. Regularly retraining sentiment analysis models with fresh data
will maintain model relevance and accuracy over time.

5.2.9. Collaboration and Knowledge Sharing:

Foster collaboration and knowledge sharing within the sentiment analysis community through
open-source initiatives and collaborative research projects. Encouraging the exchange of ideas and
resources will accelerate innovation and drive advancements in sentiment analysis technology.

5.2.10. Integration with Smart Platforms:

Explore opportunities to integrate sentiment analysis capabilities into smart platforms and
intelligent systems. Leveraging sentiment analysis in conjunction with IoT devices, virtual
assistants, and smart analytics platforms will enable context-aware decision-making and
personalized user experiences.

5.2.11. User-Centric Design and Feedback:

Prioritize user-centric design principles and solicit feedback from end-users to ensure the sentiment
analysis system meets their needs and preferences. Incorporating user feedback into system design
and iteration cycles will enhance usability and user satisfaction.

In conclusion, the future of sentiment analysis holds exciting possibilities for innovation and
advancement. By embracing emerging technologies, addressing ethical considerations, and
prioritizing user needs, we can continue to unlock the full potential of sentiment analysis in various
domains and applications.
USER MANUAL
Sentiment Analysis in Python
This notebook is part of a video tutorial.
In this notebook we will do some sentiment analysis in Python using two different techniques,
followed by a quick look at the Hugging Face pipeline:

1. VADER (Valence Aware Dictionary and sEntiment Reasoner) - Bag of words approach
2. RoBERTa pretrained model from Hugging Face 🤗
3. Hugging Face pipeline
Step 0. Read in Data and NLTK Basics
[1]
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('ggplot')
import nltk
[2]
# Read in data
df = pd.read_csv('../input/amazon-fine-food-reviews/Reviews.csv')
print(df.shape)
df = df.head(500)
print(df.shape)
(568454, 10)
(500, 10)
[3]
df.head()

Quick EDA
[4]
ax = df['Score'].value_counts().sort_index() \
    .plot(kind='bar',
          title='Count of Reviews by Stars',
          figsize=(10, 5))
ax.set_xlabel('Review Stars')
plt.show()

Basic NLTK
[5]
example = df['Text'][50]
print(example)
This oatmeal is not good. Its mushy, soft, I don't like it. Quaker Oats is the way to go.
[6]
tokens = nltk.word_tokenize(example)
tokens[:10]
['This', 'oatmeal', 'is', 'not', 'good', '.', 'Its', 'mushy', ',', 'soft']
[7]
tagged = nltk.pos_tag(tokens)
tagged[:10]
[('This', 'DT'),
('oatmeal', 'NN'),
('is', 'VBZ'),
('not', 'RB'),
('good', 'JJ'),
('.', '.'),
('Its', 'PRP$'),
('mushy', 'NN'),
(',', ','),
('soft', 'JJ')]
[8]
entities = nltk.chunk.ne_chunk(tagged)
entities.pprint()
(S
This/DT
oatmeal/NN
is/VBZ
not/RB
good/JJ
./.
Its/PRP$
mushy/NN
,/,
soft/JJ
,/,
I/PRP
do/VBP
n't/RB
like/VB
it/PRP
./.
(ORGANIZATION Quaker/NNP Oats/NNPS)
is/VBZ
the/DT
way/NN
to/TO
go/VB
./.)
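Step 1. VADER Sentiment Scoring

We will use NLTK's SentimentIntensityAnalyzer to get the neg/neu/pos scores of the text.
 This uses a "bag of words" approach:
1. Each word is looked up in a sentiment lexicon and scored on its own. Words that are not in
the lexicon, which includes most stop words, contribute no sentiment.
2. The word scores are combined into a total score.
 Relationships between words are mostly ignored, apart from a few simple heuristics such as
negation and intensifiers.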
[9]
from nltk.sentiment import SentimentIntensityAnalyzer
from tqdm.notebook import tqdm
sia = SentimentIntensityAnalyzer()
/opt/conda/lib/python3.7/site-packages/nltk/twitter/__init__.py:20: UserWarning: The twython library has not
been installed. Some functionality from the twitter package will not be available.
warnings.warn("The twython library has not been installed. "
[10]
sia.polarity_scores('I am so happy!')
{'neg': 0.0, 'neu': 0.318, 'pos': 0.682, 'compound': 0.6468}
[11]
sia.polarity_scores('This is the worst thing ever.')
{'neg': 0.451, 'neu': 0.549, 'pos': 0.0, 'compound': -0.6249}
[12]
sia.polarity_scores(example)
{'neg': 0.22, 'neu': 0.78, 'pos': 0.0, 'compound': -0.5448}
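
The compound score can be turned into a discrete label with a simple threshold rule. The sketch below uses cut-offs of +0.05 and -0.05, which are commonly used with VADER. The function name is illustrative, and the thresholds should be treated as a tunable assumption rather than part of this notebook.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Compound scores taken from the three examples above, plus a neutral 0.0
for compound in (0.6468, -0.6249, -0.5448, 0.0):
    print(compound, label_from_compound(compound))
```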
[13]
# Run the polarity score on the entire dataset
res = {}
for i, row in tqdm(df.iterrows(), total=len(df)):
    text = row['Text']
    myid = row['Id']
    # Store the VADER scores keyed by review Id
    res[myid] = sia.polarity_scores(text)

[14]
vaders = pd.DataFrame(res).T
vaders = vaders.reset_index().rename(columns={'index': 'Id'})
vaders = vaders.merge(df, how='left')
[15]
# Now we have sentiment score and metadata
vaders.head()

Plot VADER results


[16]
ax = sns.barplot(data=vaders, x='Score', y='compound')
ax.set_title('Compound Score by Amazon Star Review')
plt.show()

[17]
fig, axs = plt.subplots(1, 3, figsize=(12, 3))
sns.barplot(data=vaders, x='Score', y='pos', ax=axs[0])
sns.barplot(data=vaders, x='Score', y='neu', ax=axs[1])
sns.barplot(data=vaders, x='Score', y='neg', ax=axs[2])
axs[0].set_title('Positive')
axs[1].set_title('Neutral')
axs[2].set_title('Negative')
plt.tight_layout()
plt.show()

Step 2. Roberta Pretrained Model

 Use a model trained on a large corpus of data.
 Transformer model accounts for the words but also the context related to other words.
[18]
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification
from scipy.special import softmax
[19]
MODEL = "cardiffnlp/twitter-roberta-base-sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

[20]
# VADER results on example
print(example)
sia.polarity_scores(example)
This oatmeal is not good. Its mushy, soft, I don't like it. Quaker Oats is the way to go.
{'neg': 0.22, 'neu': 0.78, 'pos': 0.0, 'compound': -0.5448}
[21]
# Run for Roberta Model
encoded_text = tokenizer(example, return_tensors='pt')
output = model(**encoded_text)
scores = output[0][0].detach().numpy()
scores = softmax(scores)
scores_dict = {
    'roberta_neg' : scores[0],
    'roberta_neu' : scores[1],
    'roberta_pos' : scores[2]
}
print(scores_dict)
{'roberta_neg': 0.9763551, 'roberta_neu': 0.020687457, 'roberta_pos': 0.0029573673}
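
The model returns raw scores (logits), and the softmax step turns them into probabilities that sum to 1. The pure-Python sketch below shows the computation that `scipy.special.softmax` performs on a one-dimensional input. The function name `softmax_py` and the logit values are made up for illustration.

```python
import math


def softmax_py(logits):
    """Numerically stable softmax: exp(x_i - max) / sum_j exp(x_j - max)."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [3.0, -1.0, -2.5]  # hypothetical (neg, neu, pos) logits
probs = softmax_py(logits)
print(probs)
print(sum(probs))
```

Because softmax preserves the ordering of the logits, the class with the largest logit also gets the largest probability.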
[22]
def polarity_scores_roberta(example):
    # Tokenize the text and run it through the model
    encoded_text = tokenizer(example, return_tensors='pt')
    output = model(**encoded_text)
    # Convert the logits to probabilities
    scores = output[0][0].detach().numpy()
    scores = softmax(scores)
    scores_dict = {
        'roberta_neg' : scores[0],
        'roberta_neu' : scores[1],
        'roberta_pos' : scores[2]
    }
    return scores_dict
[23]
res = {}
for i, row in tqdm(df.iterrows(), total=len(df)):
    try:
        text = row['Text']
        myid = row['Id']
        vader_result = sia.polarity_scores(text)
        # Prefix the VADER keys so they do not clash with the RoBERTa keys
        vader_result_rename = {}
        for key, value in vader_result.items():
            vader_result_rename[f"vader_{key}"] = value
        roberta_result = polarity_scores_roberta(text)
        both = {**vader_result_rename, **roberta_result}
        res[myid] = both
    except RuntimeError:
        # Most likely raised for reviews longer than the model's maximum input length
        print(f'Broke for id {myid}')
Broke for id 83
Broke for id 187

[24]
results_df = pd.DataFrame(res).T
results_df = results_df.reset_index().rename(columns={'index': 'Id'})
results_df = results_df.merge(df, how='left')
Compare Scores between models
[25]
results_df.columns
Index(['Id', 'vader_neg', 'vader_neu', 'vader_pos', 'vader_compound',
'roberta_neg', 'roberta_neu', 'roberta_pos', 'ProductId', 'UserId',
'ProfileName', 'HelpfulnessNumerator', 'HelpfulnessDenominator',
'Score', 'Time', 'Summary', 'Text'],
dtype='object')

Step 3. Combine and compare


[26]
sns.pairplot(data=results_df,
             vars=['vader_neg', 'vader_neu', 'vader_pos',
                   'roberta_neg', 'roberta_neu', 'roberta_pos'],
             hue='Score',
             palette='tab10')
plt.show()
Step 4: Review Examples:

 Positive 1-Star and Negative 5-Star Reviews

Let's look at some examples where the model scoring and review score differ the most.
[27]
results_df.query('Score == 1') \
.sort_values('roberta_pos', ascending=False)['Text'].values[0]
'I felt energized within five minutes, but it lasted for about 45 minutes. I paid $3.99 for this drink. I
could have just drunk a cup of coffee and saved my money.'
[28]
results_df.query('Score == 1') \
.sort_values('vader_pos', ascending=False)['Text'].values[0]
'So we cancelled the order. It was cancelled without any problem. That is a positive note...'
[29]
# Negative sentiment 5-star review
[30]
results_df.query('Score == 5') \
.sort_values('roberta_neg', ascending=False)['Text'].values[0]
'this was sooooo deliscious but too bad i ate em too fast and gained 2 pds! my fault'
[31]
results_df.query('Score == 5') \
.sort_values('vader_neg', ascending=False)['Text'].values[0]
'this was sooooo deliscious but too bad i ate em too fast and gained 2 pds! my fault'

Extra: The Transformers Pipeline

 Quick & easy way to run sentiment predictions


[32]
from transformers import pipeline
sent_pipeline = pipeline("sentiment-analysis")
No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english
(https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)

[33]
sent_pipeline('I love sentiment analysis!')
[{'label': 'POSITIVE', 'score': 0.9997853636741638}]
[34]
sent_pipeline('Make sure to like and subscribe!')
[{'label': 'POSITIVE', 'score': 0.9991742968559265}]
[35]
sent_pipeline('booo')
[{'label': 'NEGATIVE', 'score': 0.9936267137527466}]

The End
