Sentiment Analysis for Social Media Presence
Sashank Saya
Dayananda Sagar Academy of Technology and Management
Shreelaxmi K Malawade
Dayananda Sagar Academy of Technology and Management
Smaranya Vijaya Krishna
Dayananda Sagar Academy of Technology and Management
Vaishnavi K S
Dayananda Sagar Academy of Technology and Management
Shylaja B
Dayananda Sagar Academy of Technology and Management
Research Article
Keywords: Sentiment Analysis, Flask Framework, Multi-Modal Analysis, Textual Data Processing, Image Data Processing, Natural Language Processing (NLP), Deep Learning, Machine Learning, Social Media Monitoring, VGG16 Model, Gradient Boosting Classifier, TextBlob
Posted Date: June 13th, 2024
DOI: https://doi.org/10.21203/rs.3.rs-4496017/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License. Read Full License
Additional Declarations: No competing interests reported.
Abstract
This Flask-React application offers a versatile platform for sentiment analysis on both text and image data. It includes functionalities for batch text sentiment prediction, single text input analysis, and image sentiment prediction with generated captions. Leveraging machine learning and deep learning techniques, the app provides insights into sentiment distribution, catering to diverse use cases such as social media monitoring and content moderation. Through this project, we provide real-time solutions for the analysis of comments and images using a Gradient Boosting Classifier and the VGG16 convolutional model, with an accuracy of 82% when compared with the output of the sentiment module present in TextBlob.
1. INTRODUCTION
In today's digital age, the ability to understand and analyze sentiment from textual and visual content is essential for businesses and organizations across
various domains. Sentiment analysis, a branch of natural language processing (NLP), aims to extract subjective information from text and images, providing
valuable insights into user opinions, emotions, and attitudes. This project introduces a versatile Flask-based application designed to facilitate multi-modal
sentiment analysis, encompassing both textual and image data. The application offers a comprehensive suite of functionalities tailored to address the
growing need for sentiment analysis in diverse contexts. Leveraging state-of-the-art machine learning and deep learning techniques, the application enables
users to extract actionable insights from textual data sources such as social media comments, customer feedback, and reviews, as well as from visual
content like images and their associated captions. By combining the strengths of NLP with advanced image processing methodologies, the application
provides a holistic approach to sentiment analysis, catering to a wide range of use cases and applications.
Key features of the application include batch sentiment prediction from CSV files, real-time sentiment analysis of single text inputs, and sentiment
assessment of images accompanied by automatically generated captions. Through an intuitive user interface and streamlined processing pipelines, users
can seamlessly upload, analyze, and interpret sentiment across multiple data modalities. By offering a flexible and
accessible platform for sentiment analysis, the application empowers organizations to make informed decisions, enhance customer engagement strategies,
and gain a deeper understanding of user sentiment in the digital landscape.
2. RELATED WORKS
Methodologies of the reviewed papers:
The methodologies employed across the reviewed papers encompass a range of traditional and advanced analytic techniques. These include the use of
dictionaries, neural networks, support vector machines, and sophisticated transformer-based systems aimed at understanding sentiments on various social
media platforms. A systematic approach is also evident, with the PRISMA framework being utilized for a structured, systematic review of sentiment analysis
literature, ensuring methodological rigor. Machine learning and deep learning play critical roles, employing models like Multinomial Naive Bayes, LSTM, CNN,
and BiLSTM to detect sentiments and specific conditions such as depression from diverse data forms including text, emoticons, and emojis. The approach to
multimodal sentiment analysis integrates analysis from multiple data types, including text, images, and GIFs, utilizing tools like VADER for text and fine-tuned
CNNs for visual content. Additionally, some systems focus on structured sentiment identification, developing capabilities to identify sentiment holders,
assess sentiment polarity, and combine sentiments within sentences using pre-defined classification models.
Pros:
The range of methodologies provides comprehensive coverage and insights into different applications of sentiment analysis, from general social media
analysis to specific applications like depression detection, enhancing understanding and capabilities in these areas. The incorporation of multimodal data
sources and advanced computational models has significantly improved the accuracy and precision of sentiment analyses. Innovations in methodology,
particularly the use of cutting-edge techniques such as deep learning and systematic review frameworks, introduce new and effective ways to manage
complex sentiment analysis tasks. Several studies have also offered practical solutions and structured methodologies to tackle common challenges in
sentiment analysis, ensuring their findings are applicable in real-world scenarios. Furthermore, efforts to automate and simplify the identification and
analysis of sentiments are notable, helping scale applications efficiently.
Cons:
Despite the advancements, there are significant drawbacks. The adoption of advanced models and multimodal frameworks can be costly and
computationally intensive, which may restrict their use to well-resourced organizations. There are also reproducibility issues, particularly with newer models
like transformer-based systems, which can impede the validation and broader application of research findings. Data collection and processing challenges,
along with inherent biases in sentiment analysis methods, could negatively affect the accuracy and generalizability of results. Ethical and privacy concerns
are especially prominent in studies involving sensitive data from social media, such as those used for detecting mental health conditions. Lastly, a heavy
reliance on specific analytical models and data types may limit the flexibility and adaptability of sentiment analysis frameworks to various contexts or
emerging challenges.
3. METHOD
The project is composed of two main components, text analysis and image analysis, both of which run on a Flask server backend. They are detailed below:
3.1 Textual Data Processing
Textual data undergoes a series of preprocessing steps to ensure its suitability for sentiment analysis. These steps typically include:
1. Noise Removal: Eliminating irrelevant information such as URLs, special characters, and punctuation marks.
2. Tokenization: Splitting the text into individual words or tokens.
3. Part-of-Speech (POS) Tagging: Assigning grammatical tags to each token to identify its role in the sentence.
4. Stemming: Reducing words to their base or root form to normalize variations (e.g., "running" to "run").
5. Bag-of-Words (BoW) Representation[1]: After preprocessing, textual data is transformed into a numerical representation using the Bag-of-Words (BoW)
model. This model represents each document as a vector of word frequencies, where each dimension corresponds to a unique word in the vocabulary.
This representation captures the presence and frequency of words in the text but disregards their order or sequence.
6. Machine Learning Classification: The BoW representations[1] of textual data are used as input features for a Gradient Boosting Classifier[2], which is employed for sentiment analysis. This classifier learns to predict sentiment labels (positive, negative, and neutral) based on the patterns and relationships present
in the input data. The classifier is trained on labelled training data, where each sample is associated with a sentiment label. The Gradient Boosting
Classifier employs decision trees as its base learners, which are sequentially constructed to address the
errors of previous trees. Through gradient descent optimization, the algorithm minimizes a loss function (e.g., cross-entropy loss for classification) by
iteratively fitting new trees to the residuals of the ensemble's predictions. This process updates the model parameters in a direction that minimizes the loss.
In this optimization framework, each decision tree is trained to predict the residuals of the ensemble, capturing the difference between the target values and
the current predictions. By combining these weak learners, the algorithm gradually enhances the overall prediction accuracy. Throughout multiple boosting
rounds, new decision trees are added to the ensemble, and their weights are adjusted based on their effectiveness in reducing the overall loss. The boosting
process iteratively refines the ensemble model by incorporating the predictions of multiple decision trees and updating their contributions to the final
prediction. The learning rate parameter controls the impact of each tree on the ensemble's predictions, balancing the trade-off between model complexity
and prediction accuracy. Overall, this process yields a robust predictive model capable of capturing complex relationships in the data and achieving high performance in classification tasks. A minimal sketch of this pipeline is shown below.
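As a concrete illustration, the following is a minimal scikit-learn sketch of the BoW-plus-gradient-boosting pipeline described above; the file name and the column names ("comment", "sentiment") are placeholders rather than the project's actual schema.

```python
# Minimal sketch of the BoW + Gradient Boosting pipeline described above.
# The CSV path and column names are placeholders, not the project's schema.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("train.csv")  # labelled comments: positive / negative / neutral

# Bag-of-Words: each comment becomes a sparse vector of word frequencies.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df["comment"])
y = df["sentiment"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Boosted decision trees; the learning rate scales each tree's contribution
# to the ensemble's predictions.
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
clf.fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```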
3.2 Image Data Processing
Image data is processed using deep learning techniques to extract relevant features for sentiment analysis. A pre-trained convolutional neural network (CNN)
model, such as VGG16[3], is utilized to extract high-level visual features from images. These features capture important visual patterns and semantics,
enabling the model to understand the content and context of images.
Since sentiment analysis on images is performed on their textual captions, a deep learning-based captioning model is employed to generate captions automatically. This model generates descriptive text that summarizes the content of the image, providing context for sentiment analysis. The key components of the underlying VGG16 model are as follows:
1. Input Layer (input_3): The shape of this layer is (None, 224, 224, 3). The input layer expects images with a height and width of 224 pixels and three colour channels (RGB).
2. Convolutional Blocks (block1_conv1 to block5_conv3): These blocks contain multiple convolutional layers (Conv2D) followed by rectified linear activation functions (ReLU). Each convolutional layer extracts features from the input image using a set of learnable filters. The number of filters increases as the network goes deeper into the architecture, capturing increasingly complex features.
3. Pooling Layers (block1_pool to block5_pool): After each convolutional block, a max-pooling layer (MaxPooling2D) reduces the spatial dimensions of the feature maps, preserving the most important information. Max-pooling helps to reduce the computational cost and control overfitting by down-sampling the feature maps.
4. Flatten Layer (flatten): This layer converts the output of the last convolutional block into a one-dimensional vector. It prepares the feature maps for input into the fully connected layers.
5. Fully Connected Layers (fc1 and fc2): These layers consist of densely connected neurons. They perform classification based on the high-level features extracted by the convolutional layers. The output layer typically contains neurons corresponding to the number of classes in the classification task.
6. Total Parameters: The VGG16 model has a total of 134,260,544 parameters, making it a large and computationally intensive model. These parameters include weights and biases associated with the convolutional and fully connected layers.
7. Trainable Parameters: All parameters in the VGG16 model are trainable, meaning they are updated during the training process based on the optimization objective (e.g., minimizing classification error).
8. Non-Trainable Parameters: The VGG16 model does not contain any non-trainable parameters in this configuration.
Fig. 1 The flowchart depicts the sequential processing steps of the VGG16 model, from input image through convolutional and fully connected layers to
classification output.
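As an illustration, the following is a minimal Keras sketch of extracting fc2 features of the kind that feed a captioning decoder; the image path is a placeholder and the decoder itself is omitted.

```python
# Minimal sketch of VGG16 feature extraction for captioning (Keras/TensorFlow).
# "photo.jpg" is a placeholder path; the captioning decoder is omitted.
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import img_to_array, load_img

# Keep everything up to fc2 so the model outputs 4096-dimensional features
# instead of the 1000-way ImageNet class scores.
base = VGG16()
extractor = Model(inputs=base.inputs, outputs=base.layers[-2].output)

img = load_img("photo.jpg", target_size=(224, 224))  # matches the input layer
x = preprocess_input(np.expand_dims(img_to_array(img), axis=0))
features = extractor.predict(x)  # shape (1, 4096)
```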
3.3 Sentiment Analysis Integration
Once textual and/or visual data have been processed and features extracted, sentiment analysis is performed using appropriate techniques. For textual data,
sentiment labels are predicted directly using the Gradient Boosting Classifier. For images, sentiment analysis is performed on the generated captions.
3.4 Flask Application Development
Finally, the methods and models described above are integrated into a Flask backend serving a React frontend. This application provides a user-friendly interface for
interacting with the sentiment analysis functionality, allowing users to input text or upload images for sentiment analysis, visualize sentiment results, and
obtain insights into sentiment distribution across different data modalities.
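As a sketch of what one of these endpoints might look like, the snippet below handles a single-text request; the route name and the predict_sentiment stub are illustrative, not the project's actual API.

```python
# Minimal sketch of a single-text sentiment endpoint; the route name and
# the predict_sentiment stub are illustrative, not the project's actual API.
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_sentiment(text):
    # Stub: the real application would preprocess the text, encode it with
    # the fitted BoW vectorizer, and call the trained Gradient Boosting
    # Classifier here.
    return "neutral"

@app.route("/predict_text", methods=["POST"])
def predict_text():
    text = request.json.get("text", "")
    return jsonify({"text": text, "sentiment": predict_sentiment(text)})

if __name__ == "__main__":
    app.run(debug=True)
```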
4. EXPERIMENTS AND RESULTS
4.1 Model Comparisons for Analysis Accuracy
1) Text Analysis: We compare the accuracy of six models against the results obtained from the sentiment function in the TextBlob module. From this comparison, we conclude that the Gradient Boosting Classifier has the highest accuracy. Table 1 below lists the accuracy percentages of the candidate classifiers: Decision Tree, Random Forest, Support Vector Machine, Logistic Regression, Multinomial Naïve Bayes, and Gradient Boosting.
Classifier Accuracy %
Decision Tree 62.15
Random Forest 67.24
Support Vector Machine 70.01
Logistic Regression 70.04
Multinomial Naïve Bayes 56.59
Gradient Boosting 62.39
Table 1. Accuracy percentages of the candidate classifiers for text analysis.
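As a hedged sketch of the comparison against TextBlob described above, the snippet below maps TextBlob polarity scores to the three labels and measures agreement with a classifier's predictions; the zero thresholds are an assumption, not the paper's stated procedure.

```python
# Sketch of scoring classifier predictions against TextBlob's sentiment
# module; the polarity thresholds are assumptions.
from textblob import TextBlob

def textblob_label(text):
    polarity = TextBlob(text).sentiment.polarity
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"

def agreement(texts, predictions):
    # Fraction of comments where the classifier matches the TextBlob label.
    matches = sum(p == textblob_label(t) for t, p in zip(texts, predictions))
    return matches / len(texts)
```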
2) Image Analysis: We compared the CLIP model and the VGG16 model for caption generation. Image captioning was initially attempted with a CLIP model, but CLIP does not readily support saving a trained model to an HDF5 file, whereas the VGG16 model makes it easy to store the model weights and the features extracted from the training dataset (the Flickr30k dataset). We therefore use the VGG16 model for image captioning.
4.2 Data Preparation
1) Text Analysis: The training dataset is a CSV file containing comments and their classification into positive, negative, and neutral, taken from Twitter[5]. The
test dataset can be a CSV file or a single line of text.
2) Image Analysis: The training dataset is the Flickr30k[6] dataset, which consists of 31,784 images and their captions. This dataset is used to generate captions for any test image.
4.3 Analysis Procedure
1. Text Analysis: The dataset is first preprocessed by removing user mentions and URLs from each comment, applying POS tagging and retaining the words that contribute to sentiment analysis (such as adjectives), and then stemming with the Snowball Stemmer, which works for a variety of languages (a sketch of this preprocessing is given after this list). After this preprocessing, the classifier is fit on the training dataset. When required to analyze a single line of text, we preprocess it first and then predict the sentiment of the text.
2. Image Analysis: The VGG16 model is first trained on the Flickr30k dataset of images, from which features are extracted using TensorFlow. It can then be used to generate the caption of an image, which is passed into the text analysis module, thereby giving us the sentiment of the image. Fig. 2 below is a graph that summarizes Table 1, depicting the accuracy results of the various algorithms.
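A minimal sketch of the preprocessing in step 1, assuming NLTK; the regular expression and the adjective-only filter are illustrative choices.

```python
# Minimal sketch of the preprocessing in step 1 using NLTK; the regex and
# the adjective-only filter are illustrative choices.
# Requires: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
import re

import nltk
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("english")

def preprocess(comment):
    # Noise removal: strip user mentions and URLs.
    comment = re.sub(r"@\w+|https?://\S+", "", comment)
    # Tokenize and POS-tag, keeping sentiment-bearing words such as
    # adjectives (Penn Treebank tags starting with "JJ").
    tagged = nltk.pos_tag(nltk.word_tokenize(comment))
    kept = [word for word, tag in tagged if tag.startswith("JJ")]
    # Stem to normalize inflected forms, e.g. "amazing" -> "amaz".
    return " ".join(stemmer.stem(word.lower()) for word in kept)

print(preprocess("@user The new update is amazing https://t.co/abc"))  # -> "new amaz"
```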
5. CONCLUSION AND CHALLENGES
In conclusion, the integration of the Gradient Boosting Classifier for text analysis and the VGG16 model for image caption generation has enabled a
comprehensive approach to sentiment analysis for both textual and visual content. By leveraging advanced machine learning and deep learning techniques,
the project has demonstrated the ability to extract meaningful insights from diverse data modalities, offering a holistic perspective on user sentiment.
Despite the project's success in integrating text analysis and image captioning for sentiment analysis, several challenges were encountered along the way.
These challenges include:
1. Data Integration: Integrating textual and visual data sources and ensuring compatibility between different data modalities posed a significant challenge.
Overcoming data heterogeneity and preprocessing variations required careful consideration and data transformation techniques.
2. Model Complexity: Managing the complexity of the Gradient Boosting Classifier and the VGG16 model, both of which contain a large number of
parameters, required efficient optimization strategies and computational resources. Balancing model performance with computational cost was a key
challenge throughout the project.
3. Interpretability of Sarcasm and Slang: Interpreting sarcasm and slang poses a significant challenge in sentiment analysis due to their ambiguity, context
dependency, semantic complexity, data sparsity, and lack of explicit signals. Machine learning models, such as the Gradient Boosting Classifier and the
VGG16 model, may struggle to accurately detect sarcasm or understand slang expressions without sufficient contextual information and diverse training
data. Addressing this challenge requires innovative approaches that leverage contextual information, domain-specific knowledge, and advanced natural
language processing techniques to enhance the models' ability to interpret nuanced language features accurately. Additionally, incorporating user
feedback mechanisms and human-in-the-loop approaches can help refine machine learning models and improve their ability to interpret sarcasm and
slang effectively.
4. Performance Evaluation: Evaluating the performance of the sentiment analysis system across multiple data modalities and ensuring consistent evaluation metrics presented challenges. Addressing issues such as class imbalance, data bias, and domain specificity required rigorous evaluation methodologies and validation techniques. Fig. 3 below shows the flowchart of the entire system architecture of the project: a multi-component system for automatically analyzing the sentiment (positive, negative, or neutral) expressed in text data from sources like social media, news, and reviews. It is made up of three main APIs: one for predicting sentiments for comments in a CSV file, one for predicting the sentiment of a single text, and a third for predicting the sentiment of a single image by generating its caption. The system ingests text, preprocesses it, extracts relevant features, applies sentiment analysis techniques such as machine learning models, generates ratings and summaries based on the analysis, monitors performance, and utilizes reference data to improve the sentiment analysis algorithms over time.
Declarations
Author Contribution
S.V.K. and S.K.M. wrote the main manuscript text. V.K.S. and S.S. prepared Figures 1-3 and Table 1. All authors (S.V.K., S.K.M., V.K.S., S.S., and S.B.) reviewed
the manuscript.
Data Availability
The code for the entire backend is available at: https://github.com/smaranya/Sentify_Backend
The dataset for the images has been taken from: https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset
These are the train.csv and test.csv links, respectively, for training the text analysis model:
https://raw.githubusercontent.com/dD2405/Twitter_Sentiment_Analysis/master/train.csv
https://raw.githubusercontent.com/dD2405/Twitter_Sentiment_Ana
References
1. M. Rodriguez-Ibanez et al., "A review on sentiment analysis from social media platforms," Expert Systems With Applications, 2023.
2. V. Athanasiou and M. Maragoudakis, "A Novel, Gradient Boosting Framework for Sentiment Analysis in Languages where NLP Resources Are Not Plentiful: A Case Study for Modern Greek," Algorithms, vol. 10, no. 1, p. 34, Mar. 2017, doi: 10.3390/a10010034.
3. S. Tammina, "Transfer Learning using VGG-16 with Deep Convolutional Neural Network for Classifying Images," International Journal of Scientific and Research Publications (IJSRP), vol. 9, 2019, p9420.
4. A. Shirzad, H. Zare, and M. Teimouri, "Deep Learning approach for text, image and GIF multimodal sentiment analysis," ICCKE, 10th Conference, 2020.
5. Twitter sentiment dataset: https://raw.githubusercontent.com/dD2405/Twitter_Sentiment_Analysis/master/train.csv
6. B. A. Plummer, L. Wang, C. M. Cervantes, J. C. Caicedo, J. Hockenmaier, and S. Lazebnik, "Flickr30K Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models," IJCV, (1):74-93, 2017.
Figures
Figure 1
The flowchart depicts the sequential processing steps of the VGG16 model, from input image through convolutional and fully connected layers to
classification output.
Figure 2
Graph depicting the accuracy results of various algorithms
Figure 3
Flowchart of the entire system architecture of the project.