Project
A PROJECT REPORT
Submitted by
BACHELOR OF ENGINEERING
IN
Chandigarh University
November 2024
BONAFIDE CERTIFICATE
Certified that this project report “Social Media Analytics” is the bonafide work of
“Yashaswi Soni, Anshul Somani, Garv Jindal, Naman Sahni, Deepanshu Sharma”
who carried out the project work under my/our supervision.
SIGNATURE SIGNATURE
Submitted for the project viva-voce examination held on Nov 13, 2024.
We are immensely grateful to Er. Shivani Sharma, our project supervisor, for her invaluable
guidance, patience, and unwavering support throughout the duration of this project. Her profound
insights and expertise in data analytics and social media technologies were instrumental in shaping
the direction of our research and ensuring the successful completion of this project.
We would like to extend our deepest gratitude to Dr. Sushil Kumar Mishra, Head of the Computer
Science Engineering department at Chandigarh University, for fostering a nurturing academic
environment that emphasizes innovation and research. His support in providing essential resources
and continuous encouragement played a crucial role in our academic growth.
We are also thankful to the faculty of the Computer Science Engineering department for their
invaluable feedback and suggestions throughout the various stages of the project. Their constructive
criticism helped us enhance the quality and scope of our work.
We wish to acknowledge our peers and classmates for their collaboration and insights, particularly
during the brainstorming sessions, which enriched our understanding of the subject. Their constant
exchange of ideas and technical assistance helped refine our project methodology.
This project is the result of the collective effort of many, and we deeply appreciate everyone who
contributed to making it a success.
TABLE OF CONTENTS
Abstract
List of Tables
List of Standards
Symbols
REFERENCES
APPENDIX
1. Plagiarism Report
Social media platforms like Facebook, Twitter, and Instagram generate vast amounts of data every day.
Analyzing this data effectively is crucial for businesses to gain insights into user engagement, sentiment, and
content performance. This project focuses on developing an interactive Tableau dashboard for Social Media
Analytics, enabling organizations to visualize key metrics such as likes, shares, comments, and follower growth.
Using API integration and web scraping techniques, data was collected from multiple platforms, followed by
preprocessing steps like data cleaning and Natural Language Processing (NLP) for sentiment analysis. The
Tableau dashboard presents real-time insights, allowing businesses to track and analyze engagement trends.
The project demonstrates the value of visual analytics in simplifying complex data and providing actionable
insights to improve social media strategies, ultimately helping businesses enhance their digital presence and
customer interactions.
GRAPHICAL ABSTRACT
List of Figures
List of Tables
List of Standards
Standard: IEEE 12207
Publishing Agency: IEEE
About the Standard: IEEE 12207 establishes a framework for the life cycle processes of software,
which is beneficial for systematically developing, testing, and deploying NLP applications.
Page No.: 56
ABBREVIATIONS
1. AI - Artificial Intelligence
2. API - Application Programming Interface
3. CNN - Convolutional Neural Network
4. CRF - Conditional Random Fields
5. DL - Deep Learning
6. ELMo - Embeddings from Language Models
7. EM - Expectation-Maximization
8. GPU - Graphics Processing Unit
9. LSTM - Long Short-Term Memory
10. ML - Machine Learning
11. NLP - Natural Language Processing
12. RNN - Recurrent Neural Network
13. RoBERTa - Robustly Optimized BERT Approach
14. SA - Sentiment Analysis
15. SVM - Support Vector Machine
16. TF-IDF - Term Frequency-Inverse Document Frequency
17. TPU - Tensor Processing Unit
18. VADER - Valence Aware Dictionary and sEntiment Reasoner
19. Bi-LSTM - Bidirectional Long Short-Term Memory
20. BERT - Bidirectional Encoder Representations from Transformers
21. TF - TensorFlow
SYMBOLS
CHAPTER - 1
INTRODUCTION
1. Expanding Role of Social Media in Business and Society: Social media platforms like
Facebook, Twitter, and Instagram have become vital channels for businesses, influencers, and
public institutions to connect with audiences, gather insights, and adapt strategies. The ability
to analyze social media data is essential for understanding public sentiment, customer
engagement, and emerging trends.
b. Application in Multiple Sectors: Social media analytics plays an essential role in various
fields, from business intelligence—where it informs brand strategy and customer satisfaction—
to public health and politics, where it helps monitor societal trends and public opinion.
2. Challenges in Social Media Analytics: Analyzing diverse content from multiple social media
sources presents challenges, especially due to the mix of formal and informal language,
widespread use of slang, abbreviations, emojis, and the lack of standardized structure in posts.
a. Data Noise and Informality: Social media posts often include irrelevant or unstructured
information, complicating sentiment analysis and data accuracy. To tackle this, effective data
preprocessing and cleaning are critical.
b. Dynamic Content and Sentiment Shifts: Sentiment analysis on social media is complex,
as public sentiment can shift rapidly in response to events, requiring real-time processing to
capture trends accurately.
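The preprocessing and cleaning step mentioned above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the function name `clean_post`, the regular expressions, and the small `SLANG` table are all assumptions introduced here.

```python
import re

# Hypothetical slang table for illustration; a real project would curate its own.
SLANG = {"u": "you", "gr8": "great", "idk": "i do not know"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # three or more of the same character in a row


def clean_post(text: str) -> str:
    """Normalize one social media post for downstream sentiment analysis."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # drop links
    text = MENTION_RE.sub(" ", text)     # drop @user mentions
    text = text.replace("#", " ")        # keep the hashtag word, drop the symbol
    text = REPEAT_RE.sub(r"\1\1", text)  # shorten runs such as "sooooo" to "soo"
    # Keep only ASCII word tokens; note that this simple rule also discards emojis.
    tokens = re.findall(r"[a-z0-9']+", text)
    tokens = [SLANG.get(tok, tok) for tok in tokens]
    return " ".join(tokens)


print(clean_post("@brand Sooooo gr8!!! see https://t.co/x #Happy"))
```

Because this sketch drops emojis, a pipeline that treats them as sentiment signals would need to map them to tokens before the final filtering step.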
1. Trends in Social Media Use: Studies reveal that more than 4 billion users worldwide interact
on social media, with younger demographics showing high engagement rates. This widespread
interaction presents a major opportunity for insights, especially for businesses aiming to
understand consumer preferences and sentiments.
a. User Behavior and Preferences: Users increasingly blend languages and employ informal
expressions, especially in multilingual societies. This trend reflects the natural evolution of
digital communication toward linguistic inclusivity.
b. The Role of Real-Time Data: Real-time data is essential for organizations aiming to
respond quickly to emerging trends and crises, especially in sectors where timely information
is crucial, such as public health and crisis management.
2. Complications in Data Interpretation: Abbreviations, slang, and emojis often complicate the
accuracy of sentiment analysis, underscoring the need for models that can handle both formal
and informal language.
a. Example of Interpretation Challenges: Phrases like “That’s unbelievable, lol!” may imply
sarcasm, but without nuanced models, sentiment interpretation can be ambiguous. This points
to the need for more sophisticated sentiment models capable of recognizing informal language
and contextual cues.
1. Businesses and Brand Managers: Social media analysts and marketers depend on insights
into public sentiment to gauge brand perception and tailor marketing strategies.
a. Brand Reputation Analysis: Brands can analyze customer feedback on social media to
detect trends in satisfaction, allowing for proactive customer engagement.
b. Targeted Marketing: Emotion and sentiment analysis can help brands adapt campaigns
to resonate with regional audiences, especially where cultural and linguistic factors play a
role.
2. Mental Health and Crisis Management: Public health professionals and psychologists
utilize social media data to monitor public sentiment, detect signs of distress, and provide
timely interventions.
a. Real-Time Support for Mental Health: Social media sentiment analysis offers health
organizations the ability to track public sentiment trends, assisting in identifying
communities in need of support.
b. Crisis Response: During public crises or disasters, timely and accurate sentiment
analysis enables authorities to understand public sentiment and respond with appropriate
support.
3. Limitations of Traditional Models: Conventional emotion detection tools are designed for
monolingual analysis and lack the sophistication to handle code-mixed expressions.
a. Language-Specific Challenges: Certain languages have unique ways of expressing emotions that
may not directly translate, causing conventional models to misinterpret the tone or intent of the
post.
b. Example of Model Limitations: Traditional models trained on English may miss emotional
nuances conveyed in a mix of English and Punjabi, such as “Missing home so much yaar,” where
“yaar” adds an emotional undertone typical in Punjabi.
1. Industry Demand for Social Media Analytics: A recent survey by the Social Media Analysis
Institute highlights the growing importance of advanced analytics tools for understanding
social media metrics and audience sentiment.
a. Challenges in Analyzing Social Media Data: Over 75% of marketing and tech firms
surveyed expressed difficulty in accurately interpreting social media data due to the informal
and diverse language used across platforms. This has driven the demand for specialized
analytics tools, such as dashboards, that can provide a clear view of engagement, sentiment,
and trends.
2. Implications for Business and Public Relations: The importance of analyzing social media
data is especially pronounced in business, where understanding audience engagement and
sentiment is central to brand strategy and reputation management.
b. Crisis Management and Public Sentiment: Social media has become a critical channel for
public relations during crises. Real-time analytics allow organizations to monitor public
sentiment as situations unfold, enabling them to communicate more effectively and address
concerns promptly.
1. Enhancing Customer Experience: The ability to accurately detect sentiment and engagement
trends across social media platforms can significantly enhance the customer experience.
Businesses can better understand their audience and adjust their strategies to meet customer
needs.
2. Customer Retention and Engagement: By using social media analytics, companies can
understand customer preferences, predict trends, and adjust marketing campaigns accordingly.
Retention strategies benefit as companies leverage insights from customer interactions to
improve loyalty and satisfaction.
3. Real-Time Feedback Interpretation: Social media analytics tools that provide real-time
insights empower companies to respond instantly to customer feedback, addressing concerns
before they escalate. This improves brand loyalty and strengthens the customer relationship.
4. Public Health Monitoring and Mental Well-Being: Beyond business, social media analytics
is valuable in public health. Public health organizations use it to monitor sentiment around
health campaigns or emerging crises, such as outbreaks. Understanding the tone of social
media discussions can guide timely, relevant responses to community concerns.
5. Crisis Management: Social media platforms are often the first line of communication during
emergencies. Analytics tools help organizations monitor public sentiment, understand the
extent of crises, and manage their response strategies effectively. This capability is crucial for
delivering timely support and addressing misinformation.
7. Broader Economic Impact: Effective use of social media analytics leads to more informed
decision-making, giving companies a competitive advantage. The insights generated help
foster an inclusive, data-driven approach to digital engagement, benefiting businesses and
users alike.
The problem of analyzing social media data lies in the unique challenges presented by the vast and
unstructured nature of social media content, especially with respect to the mixed formats and rapidly
evolving language patterns.
Informal Language: Social media language is informal and unpredictable, with a mix of
emojis, slang, and abbreviations. This dynamic language use demands robust data-cleaning
methods and sophisticated analysis tools.
Lack of Unified Datasets: Analyzing mixed-format data from various platforms requires a
consolidated dataset. However, such datasets are rare, posing a challenge for creating
reliable sentiment models.
Real-Time Analysis Needs: The constantly changing sentiment on social media calls for
real-time analysis to capture shifts accurately, especially during events that provoke strong
public reaction.
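One way to picture the real-time requirement is a sliding-window average over incoming sentiment scores. The sketch below is an assumption-laden illustration: the class name `RollingSentiment`, the `window_seconds` parameter, and the convention that scores lie between -1 and 1 are introduced here, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class RollingSentiment:
    """Average of the sentiment scores seen in the last `window_seconds`."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._items = deque()  # (timestamp, score) pairs, oldest first
        self._total = 0.0

    def add(self, timestamp: float, score: float) -> float:
        """Record a score and return the current windowed average."""
        self._items.append((timestamp, score))
        self._total += score
        # Evict scores that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._items and self._items[0][0] <= cutoff:
            _, old_score = self._items.popleft()
            self._total -= old_score
        return self._total / len(self._items)


tracker = RollingSentiment(window_seconds=60)
for ts, score in [(0, 1.0), (30, -1.0), (70, -1.0)]:
    print(ts, tracker.add(ts, score))
```

A short window reacts quickly to a sudden shift in sentiment but is noisier; a longer one is smoother but slower to reflect an event.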
1.3. Timeline
The project follows a structured timeline from August to November, encompassing all
major phases of analysis, design, implementation, and deployment.
August-September: Analysis phase, requirement gathering, and design architecture.
October: Coding phase, developing the main sentiment analysis components.
October-November: Testing and validation to ensure the accuracy of the sentiment
models.
November: Deployment and documentation for usability.
This timeline provides a structured overview to keep the project on schedule and organized.
1.4. Identification of Tasks
Table I: Identification of Tasks
1.5. Organization of the Report
Chapter 1: Introduction
Overview: Introduces the Social Media Analytics project, including its background,
relevance, and the motivations driving its development, especially the importance of
analyzing user interactions on platforms like Twitter, Facebook, and Instagram.
Report Structure: Outlines the roadmap of the report, detailing the organization and
flow of chapters.
Review of Existing Solutions: Summarizes current methods and tools available for
social media data analysis, including approaches to sentiment analysis, data
visualization, and engagement metrics tracking.
Gap Analysis: Identifies gaps in current research and technology that the project
aims to address, such as the need for real-time, interactive dashboards and handling
code-mixed language.
Chapter 3: Design Flow and Process
Feature Selection and Evaluation: Details the criteria used to evaluate and select
key features and metrics for social media analysis, such as engagement rates,
sentiment scores, and follower growth.
Design Flow: Provides a step-by-step illustration of the project’s design flow, from
data collection and cleaning to visualization and dashboard assembly in Tableau.
Design Approach: Outlines the chosen approach for implementing the social media
analytics dashboard, including methods for integrating real-time data and creating
user-friendly visualizations.
Future Work: Recommends directions for future research and development, such as
expanding multilingual support, adding predictive analytics capabilities, and
integrating more advanced NLP features.
Guiding Structure: Each chapter systematically guides the reader through the
project’s methodology, findings, and implications, offering a structured approach to
understanding the project’s contribution to social media analytics.
CHAPTER - 2
3. Introduction of Sentiment Analysis for Social Media (2015-2017)
Timeline: 2015-2017
Context: As social media data became central to marketing strategies, companies sought to
understand not only engagement but also sentiment. Early sentiment analysis models were
introduced but were primarily designed for English text and struggled to interpret the informal
language commonly used on social media.
Incident/Observation: Research into sentiment analysis gained traction as businesses wanted to
analyze customer sentiment accurately. However, these early models, which were largely rule-
based, often misinterpreted sentiment, particularly in posts with abbreviations, slang, or informal
language.
Proof: A pivotal study by Pang and Lee (2016) evaluated sentiment analysis tools, identifying
significant limitations in handling social media text accurately. Another study in 2017 by the
Association for Computational Linguistics pointed out the challenges in adapting these models for
real-time sentiment monitoring.
4. Emergence of Advanced NLP and Machine Learning Models (2017-2020)
Timeline: 2017-2020
Context: With advancements in machine learning, particularly deep learning and NLP, companies
began developing more sophisticated language models capable of processing complex text data.
Transformer-based models like BERT and GPT-2 showed potential in improving sentiment
analysis on social media.
Incident/Observation: NLP models were able to capture context more accurately, which
improved sentiment analysis performance. Yet, these models faced difficulties in distinguishing
nuanced emotions and processing non-standard language prevalent on social media.
Proof: Google and Facebook released advanced models like BERT and RoBERTa, which
demonstrated improvements in NLP tasks but acknowledged difficulties in social media sentiment
analysis. A study in 2019 from Facebook AI Research specifically highlighted that despite
advancements, accurately interpreting social media text remains a challenge due to its informal
and evolving nature.
5. Increased Demand for Real-Time Social Media Analytics (2020-Present)
Timeline: 2020-Present
Context: Social media analytics has become essential in real-time sentiment and trend monitoring,
especially with the rise in e-commerce, brand management, and public health awareness.
Organizations now require analytics solutions that provide instant insights to respond quickly to
public sentiment and events.
Incident/Observation: The COVID-19 pandemic and increasing use of social media for customer
feedback spurred demand for real-time sentiment analysis and engagement tracking. Companies
and public health organizations recognized the need for tools that could capture and analyze real-
time social media data accurately.
Proof: A 2021 Deloitte report highlighted the need for advanced social media analytics tools
capable of processing real-time data to track sentiment trends, especially during crises.
Additionally, research from McKinsey (2021) noted that real-time analytics has become a critical
tool for businesses and public organizations to make data-driven decisions based on social media
insights.
In the field of social media analytics, various methodologies have been employed to analyze and
extract valuable insights from user interactions, engagement patterns, and sentiment expressed in
posts. These solutions span across different techniques, ranging from traditional methods like
keyword analysis to modern machine learning-based approaches. Below are some of the key
approaches used for social media data analysis:
a. Keyword Analysis: This approach focuses on identifying key terms and phrases in social
media posts and categorizing them into sentiment categories (e.g., positive, negative, or
neutral). It is widely used in brand monitoring to track mentions of specific products or
services.
b. Engagement Metrics Calculation: Traditional statistical methods calculate engagement
rates, including likes, shares, comments, and followers. These metrics provide businesses
with a high-level overview of social media performance.
Limitations:
a. Rule-based systems are generally not flexible enough to handle nuanced or evolving
language, making them less effective for analyzing informal language, slang, or new
terminologies on social media platforms.
b. They also lack the ability to handle multilingual or code-mixed data effectively.
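The two rule-based approaches above, and their limitations, can be illustrated with a short sketch. The engagement-rate formula shown is one common definition (interactions per follower), not necessarily the one used in this project, and the keyword sets are hypothetical.

```python
def engagement_rate(likes: int, shares: int, comments: int, followers: int) -> float:
    """Interactions per follower, expressed as a percentage."""
    if followers <= 0:
        return 0.0
    return 100.0 * (likes + shares + comments) / followers


# Hypothetical keyword lists for illustration only.
POSITIVE = {"love", "great", "excellent", "happy"}
NEGATIVE = {"hate", "bad", "terrible", "angry"}


def keyword_sentiment(post: str) -> str:
    """Label a post by counting positive and negative keywords."""
    words = post.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(engagement_rate(likes=120, shares=30, comments=50, followers=10_000))
print(keyword_sentiment("I love this great phone"))
print(keyword_sentiment("not bad at all"))  # negation is missed by keyword counting
print(keyword_sentiment("this is gr8"))     # slang is missed as well
```

The last two calls show the inflexibility described under Limitations: the keyword rule labels "not bad at all" as negative and cannot score the slang form "gr8".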
a. Supervised Learning: Algorithms like Support Vector Machines (SVM) and Random
Forest are used to classify social media posts into categories (e.g., sentiment analysis:
positive, negative, or neutral). These models are trained on labeled datasets where the
sentiment of posts is already known.
b. Unsupervised Learning: Clustering techniques such as k-means or DBSCAN help in
identifying patterns and grouping similar posts based on content features. These models can
be used to uncover emerging topics or trends from user-generated content.
Challenges:
a. These models require large labeled datasets, which are difficult to obtain, especially for
niche topics or languages.
b. They also struggle with informal language, emojis, and mixed-language content commonly
found in social media posts.
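Classifiers such as SVM and clustering methods such as k-means operate on numeric features rather than raw text. A common choice is TF-IDF weighting, sketched below in plain Python. The function name `tfidf` and the specific variant (relative term frequency multiplied by log(N / df)) are assumptions made for illustration; libraries typically apply additional smoothing and normalization.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} mapping per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


posts = [["great", "phone"], ["bad", "phone"]]
for vector in tfidf(posts):
    print(vector)
```

Terms that occur in every post, such as "phone" here, receive a weight of zero, so the classifier's attention goes to the words that distinguish one post from another.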
a. Sentiment Analysis: NLP models use techniques like tokenization, part-of-speech tagging,
and named entity recognition (NER) to assess the sentiment of social media posts. These
methods rely on both lexical and contextual analysis to determine whether the text is positive,
negative, or neutral.
b. Emotion Detection: More advanced NLP models detect specific emotions such as joy,
anger, sadness, or surprise. These models often use emotion lexicons combined with machine
learning models to classify posts into specific emotional categories.
Limitations:
a. NLP-based systems can misinterpret informal language, emojis, sarcasm, or idiomatic
expressions common in social media, which can lead to inaccurate sentiment detection.
b. Language-specific nuances and mixed-language posts pose a significant challenge for
NLP-based models, requiring constant updates and adaptations.
a. Recurrent Neural Networks (RNNs): RNNs are used to analyze the sequential nature of
social media posts, especially in cases where the sentiment or meaning of a post depends on
its context or previous parts. RNNs are particularly useful for time-series analysis of posts,
such as monitoring trends over time.
b. Long Short-Term Memory (LSTM): An advanced type of RNN, LSTMs can capture
long-range dependencies, making them more effective at understanding longer posts or
comments that contain mixed sentiments and emotional shifts.
Advantages:
a. Deep learning models, especially LSTMs and attention-based models, excel at processing
large datasets and can automatically learn patterns from raw text data, reducing the need
for manual feature engineering.
b. These models can handle a more diverse range of social media data, from posts to
comments and interactions, improving their effectiveness in sentiment analysis.
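To make the idea of long-range dependencies concrete, the sketch below computes one step of an LSTM cell with scalar inputs and states. It follows the standard LSTM gate equations; the function name `lstm_step` and the dictionary layout of the weights are assumptions for illustration, and real models use vectors and learned weight matrices.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell.

    `w` maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre(gate):
        wx, wh, b = w[gate]
        return wx * x + wh * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much new information to write
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate value to write
    c = f * c_prev + i * g            # updated long-term memory
    h = o * math.tanh(c)              # updated hidden state
    return h, c


# With all-zero weights every gate is 0.5, so half of the old memory survives.
zero_weights = {name: (0.0, 0.0, 0.0) for name in ("forget", "input", "output", "candidate")}
print(lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, w=zero_weights))
```

The forget gate is what lets the cell carry information across many tokens: when it stays close to 1, the cell state passes through nearly unchanged from step to step.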
5. Transformer-Based Models
Transformer models like BERT (Bidirectional Encoder Representations from Transformers)
and its multilingual variant mBERT have revolutionized the field of social media analytics.
These models can process entire sequences of text in parallel, capturing contextual meaning
across long distances within text.
a. BERT and mBERT: BERT is pretrained on large amounts of English text, while mBERT is
pretrained on text in roughly 100 languages, enabling it to model context in both monolingual
and multilingual posts. This makes mBERT well suited to analyzing sentiment and emotion in
code-mixed or multilingual social media posts.
b. Fine-Tuning for Specific Tasks: Transformer models like BERT can be fine-tuned on
domain-specific datasets, such as sentiment analysis for specific industries, brand monitoring,
or political sentiment tracking, ensuring that they provide accurate insights for social media
applications.
Advantages:
a. Transformer models like BERT are highly accurate for understanding the nuances of
language in a social media context and are particularly effective at interpreting code-mixed or
multilingual posts.
b. Their ability to capture relationships between words across different languages allows
them to outperform traditional NLP models in complex, real-time applications.
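The mechanism that lets transformers relate words across long distances is scaled dot-product attention. The following is a minimal single-head sketch in plain Python; the function names are illustrative assumptions, and models such as BERT add learned projection matrices, multiple heads, and many stacked layers on top of this core operation.

```python
import math


def softmax(scores):
    """Convert raw scores to weights that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every position, near or far.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Two identical keys receive equal weight, so the output is the mean of the values.
print(attention(queries=[[1.0, 0.0]],
                keys=[[1.0, 0.0], [1.0, 0.0]],
                values=[[0.0, 2.0], [4.0, 6.0]]))
```

Each query is scored against all keys independently, which is why the computation can run in parallel over the whole sequence instead of token by token as in an RNN.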
6. Hybrid Models
Hybrid models combine the strengths of machine learning, NLP, and deep learning
techniques to provide more accurate and reliable social media analytics solutions. These
models often integrate rule-based sentiment lexicons with machine learning or deep learning
models to improve emotion detection accuracy.
a. Lexicon + Machine Learning: Hybrid models first use lexicon-based approaches to
assign initial sentiment scores or labels to posts, which are then refined using machine
learning classifiers like SVM or decision trees for better accuracy.
b. Multimodal Analysis: Some hybrid models integrate non-textual data (e.g., emojis,
images, and hashtags) into the analysis to capture a fuller picture of social media interactions.
Emojis and hashtags often carry significant emotional or contextual meaning, enhancing the
sentiment classification process.
Advantages:
a. Hybrid models provide a better overall performance by combining the strengths of both
rule-based lexicons and machine learning or deep learning techniques, especially in
processing informal, evolving social media language.
b. They also offer greater flexibility, adapting to various languages, social media platforms,
and different types of engagement (comments, posts, retweets, etc.).
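A lexicon-plus-classifier hybrid can be sketched as follows: word and emoji lexicons produce initial scores, and a learned model decides how to weigh them. The text above names SVM and decision trees as the refining classifier; this sketch substitutes a simple perceptron to stay self-contained, and the lexicons, training posts, and function names are all hypothetical.

```python
# Hypothetical lexicons for illustration only.
WORD_LEXICON = {"love": 1.0, "great": 1.0, "hate": -1.0, "awful": -1.0}
EMOJI_LEXICON = {"😊": 1.0, "😍": 1.0, "😡": -1.0, "😢": -1.0}


def features(post: str) -> list[float]:
    """[word-lexicon score, emoji score, bias term] for one post."""
    words = post.lower().split()
    word_score = sum(WORD_LEXICON.get(w, 0.0) for w in words)
    emoji_score = sum(EMOJI_LEXICON.get(ch, 0.0) for ch in post)
    return [word_score, emoji_score, 1.0]


def train_perceptron(samples, epochs: int = 10) -> list[float]:
    """Learn feature weights from (post, label) pairs with label +1 or -1."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for post, label in samples:
            x = features(post)
            activation = sum(w * xi for w, xi in zip(weights, x))
            if label * activation <= 0:  # misclassified: nudge toward the label
                weights = [w + label * xi for w, xi in zip(weights, x)]
    return weights


def predict(weights, post: str) -> int:
    x = features(post)
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else -1


# Tiny made-up training set; "great 😢" is labeled negative to mimic sarcasm.
training = [("love it 😊", 1), ("hate it 😡", -1), ("great 😢", -1), ("ok 😊", 1)]
weights = train_perceptron(training)
print(weights, predict(weights, "great 😢"))
```

On this toy data the classifier learns to trust the emoji score over the word score, which is the kind of refinement a lexicon alone cannot make.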
Bibliometric analysis involves analyzing various bibliographic data such as publication counts,
citations, and co-authorship networks. It is widely used to identify the progression of scientific
research, the relationships between different areas of study, and the key contributors to a particular
field.
1. Publication Trends:
Bibliometric analysis helps in tracking the number of publications in the area of social media
analytics over time, identifying periods of significant growth or decline. This can reveal how
the field has evolved, whether through the rise of new research topics or the shift in focus of
existing ones.
2. Citation Analysis:
Citation counts are one of the most important metrics in bibliometric analysis. High citation
counts typically indicate influential research. In social media analytics, citation analysis can be
used to identify seminal papers, authors, and key theories that have shaped the field.
3. Co-authorship Networks:
This analysis examines the collaboration patterns between authors. It identifies how scholars
in social media analytics collaborate on research, which research institutions dominate the
field, and how knowledge is shared within the academic community.
6. Impact Factor:
Analyzing the impact factor of journals and citations of specific publications provides insight
into the scientific influence and relevance of certain research in the field of social media
analytics.
research. By tracking the evolution of topics over time, researchers can recognize which
methodologies (e.g., machine learning, deep learning, NLP) have gained prominence and
which have faded from interest.
5. Comparative Insights:
Bibliometric analysis helps compare research outputs across various subfields of social media
analytics. For instance, the number of papers on sentiment analysis versus those on social
media engagement tracking can provide insights into what areas are prioritized in the field.
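The publication-trend, citation, and co-authorship measures described in this section reduce to simple counting over bibliographic records. The sketch below uses three placeholder records with made-up authors and citation counts; the record layout and function names are assumptions, and real data would come from a citation database export.

```python
from collections import Counter
from itertools import combinations

# Placeholder records for illustration; these are not real papers.
PAPERS = [
    {"year": 2019, "authors": ["Author A", "Author B"], "citations": 40},
    {"year": 2020, "authors": ["Author A", "Author C"], "citations": 12},
    {"year": 2020, "authors": ["Author A", "Author B", "Author C"], "citations": 5},
]


def publications_per_year(papers):
    """Publication trend: number of papers per year."""
    return Counter(p["year"] for p in papers)


def coauthor_pairs(papers):
    """Co-authorship network edges: how often each pair shares a paper."""
    pairs = Counter()
    for p in papers:
        for a, b in combinations(sorted(p["authors"]), 2):
            pairs[(a, b)] += 1
    return pairs


def citations_per_author(papers):
    """Citation analysis: total citations across each author's papers."""
    totals = Counter()
    for p in papers:
        for author in p["authors"]:
            totals[author] += p["citations"]
    return totals


print(publications_per_year(PAPERS))
print(coauthor_pairs(PAPERS).most_common(1))
print(citations_per_author(PAPERS).most_common(1))
```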
into social media analytics may come from sources outside the academic literature. For
example, industry reports, white papers, and conference proceedings may offer valuable
information not captured in citation databases.
5. Lack of Context:
While bibliometric analysis can tell you how many times a paper has been cited or its impact
factor, it does not provide context. It is difficult to gauge the actual contribution of a paper in
terms of its novelty or real-world application from citation data alone.
Table II: Aspect, Effectiveness & Drawbacks

Aspect          Details
Key Features    - Trends in Research Output: Growth in publications, emerging research focus areas.
                - Research Methodologies: Use of machine learning, deep learning, hybrid models,
                  and lexicon-based methods.
                - Key Researchers and Institutions: Identification of prominent authors and
                  research institutions.
                - Keywords and Themes: Common keywords like "code-mixed sentiment analysis",
                  "social media mining", "deep learning for bilingual texts".
                - Publication Venues: Journals and conferences with a significant number of papers
                  in the field.
Effectiveness   - Identifying Emerging Trends: Understanding shifts toward deep learning and
                  multimodal approaches.
                - Research Gaps: Revealing shortcomings in existing methodologies, such as
                  informal language handling.
                - Quality of Research: Impact based on citation count and journal ranking.
                - Assessment of Tools and Techniques: Most used sentiment lexicons and deep
                  learning frameworks.
Drawbacks       - Citation Bias: Overreliance on citation counts, possibly ignoring novel but
                  under-cited work.
                - Exclusion of Non-Published Work: Missing out on valuable research in
                  conferences, white papers, or dissertations.
                - Inability to Measure Practical Impact: Focusing on academic impact, not
                  real-world application.
                - Limited to Textual Features: Ignoring the role of multimedia in sentiment analysis
                  on social media.
2.4. Review Summary
In the literature review, several key insights were identified that are directly relevant to the Sentiment
Analysis in English-Punjabi Mixed Social Media Posts project. These findings form the
foundation for the development of the proposed solution and help address specific challenges that
have been highlighted in previous research. Here is how the findings are linked to the project at hand:
Literature Insight: Code-mixing, particularly in social media posts, involves the blending
of two or more languages, which significantly complicates sentiment analysis. Research has
shown that traditional models often fail to effectively handle the nuances of code-mixed
content, especially when it involves informal language, slang, and unique expressions that are
prevalent in social media.
Link to the Project: This insight is directly applicable to the project, which focuses on
improving sentiment analysis for English-Punjabi mixed texts. The project aims to enhance
existing models to handle these language complexities, ensuring more accurate detection of
emotions such as positivity, negativity, or neutrality in posts containing both English and
Punjabi.
Literature Insight: Deep learning models like LSTM and BERT have been increasingly used
for sentiment analysis due to their ability to understand context and capture long-term
dependencies in code-mixed texts. These models have proven to be more effective than
traditional methods (such as Naive Bayes or SVM) when it comes to understanding the
dynamics of mixed-language data.
Link to the Project: The project will leverage deep learning techniques such as Recurrent
Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and BERT models.
By using pretrained multilingual embeddings, the project aims to enhance sentiment
classification by capturing the contextual meaning of words in both English and Punjabi
within social media posts.
Literature Insight: Social media language is informal, with frequent use of abbreviations,
slang, and emojis. These aspects make sentiment analysis challenging as they often deviate
from formal grammar and vocabulary. Existing lexicons, like SentiWordNet, have limitations
when applied to such informal language.
Link to the Project: The project addresses these challenges by focusing on data pre-
processing techniques, including slang detection and contextual language normalization, to
better handle the informal nature of code-mixed posts. Moreover, custom sentiment lexicons
tailored for English-Punjabi text will be developed to ensure better accuracy in understanding
slang and informal expressions in the posts.
Literature Insight: Transformer models like BERT and mBERT have shown strong
performance in multilingual sentiment analysis tasks because they can understand the broader
context of words and phrases, rather than relying solely on n-grams or individual words.
Link to the Project: The Sentiment Analysis in English-Punjabi Mixed Social Media Posts
project will make use of multilingual BERT (mBERT) or XLM-R (XLM-RoBERTa, a cross-lingual
language model) to enhance sentiment analysis. These models have been pretrained on large
multilingual datasets, enabling them to grasp the semantics of mixed-language posts, which is
key to accurate sentiment detection.
Literature Insight: Hybrid approaches, combining machine learning models with lexicon-
based methods, have been shown to improve sentiment classification accuracy. This is
because hybrid models can leverage the strengths of both rule-based systems (which
understand sentiment-bearing words) and data-driven methods (which learn complex patterns
from large datasets).
Link to the Project: The project will explore hybrid techniques, combining lexicon-based
sentiment analysis with deep learning models. This will help capture both explicit emotional
cues from the lexicon and nuanced emotional patterns that can only be learned from large,
labeled datasets.
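One way to picture the hybrid idea is a weighted blend of a lexicon score and a model score. The sketch below is only an assumption about how such a combination could look: the `LEXICON` entries, the `weight` and `threshold` values, and the convention that the model outputs a polarity in [-1, 1] are all illustrative.

```python
# Hypothetical mini-lexicon mapping words to a polarity in [-1, 1].
LEXICON = {"great": 1.0, "vadiya": 1.0, "bad": -1.0, "bekaar": -1.0}

def lexicon_score(tokens):
    """Mean polarity of the tokens found in the lexicon; 0.0 if none match."""
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def hybrid_label(tokens, model_score, weight=0.5, threshold=0.2):
    """Blend a model polarity score in [-1, 1] with the lexicon score."""
    score = weight * model_score + (1 - weight) * lexicon_score(tokens)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

For example, `hybrid_label(["movie", "vadiya"], 0.1)` lets the explicit lexicon cue lift a weak model score into the positive class.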
Literature Insight: Research has demonstrated that sentiment analysis in code-mixed social
media posts has significant real-world applications, especially in customer experience
management, mental health monitoring, and targeted advertising. However, there is a gap in
tools capable of accurately processing mixed-language content.
Link to the Project: This project is positioned to address this gap, with the aim of creating a
tool that can effectively analyze emotions in bilingual or multilingual social media posts. The
tool could be used in customer feedback analysis, mental health assessment, and other
domains where emotional insights from social media are crucial for decision-making.
Accuracy of different models on different datasets:
Model | Dataset | Technique | Accuracy
GRU | CREMA-D (SER) | Self-supervised representation learning | 55.01%
EmoAffectNet | CREMA-D and AffectNet (FER) | CNN-LSTM | 79%
MMLatch | CMU-MOSEI (Multimodal) | LSTM, RNNs, and Transformers | 82.40%
Figure 4: Accuracy of Models on different Datasets
The goal of this project is to develop an efficient sentiment analysis model tailored to code-mixed
social media content, specifically posts that combine multiple languages, such as English and
Punjabi. The scope covers:
Handling Code-Mixing: Code-mixing in social media is a common practice, especially in
multilingual communities. This project aims to address the challenges posed by code-
switching, where users blend multiple languages within a sentence or even a single word.
Language Diversity: While the primary focus is on English-Punjabi code-mixing, the
approach should be adaptable to handle other bilingual or multilingual combinations that are
prevalent on social media.
Contextual Sensitivity: The sentiment classification model will need to understand both the
textual content and context, including handling nuances like sarcasm, irony, and emotional
undertones often used in informal communication.
Sentiment analysis in code-mixed social media content presents several unique challenges that must
be addressed:
Linguistic Complexity: Code-mixed content often features words, phrases, and constructs
from multiple languages, which makes it difficult to process for traditional sentiment
analysis models that are typically designed for a single language. For example, the emotional
tone of a post can depend on the language used in different segments of the sentence.
Informal Language: Social media is filled with slang, abbreviations, and creative language
usage (e.g., emojis, acronyms like "LOL," and internet-specific expressions) that complicate
the extraction of clear sentiments.
Contextual Interpretation: Sentiment on social media is not always straightforward. The
same word can have different meanings depending on its context, making sentiment detection
more challenging. Sarcasm, humor, and irony are often used, where a positive sentiment word
like "great" might carry a negative connotation when used sarcastically.
Data Scarcity and Labeling Issues: There is a shortage of large, labeled datasets of code-
mixed social media content for training machine learning models. The limited availability of
such datasets affects the robustness of sentiment models.
Multilingual Models: Code-mixed text may span multiple languages (e.g., English and
Punjabi), and most existing models do not effectively handle multiple languages
simultaneously. While multilingual models like mBERT exist, they still face challenges in
understanding the specific nuances of code-mixed content.
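To make the code-mixing challenge concrete, the sketch below profiles a post by the script of each token, using the Unicode Gurmukhi block (U+0A00 to U+0A7F). It is a rough illustration only: the helper names are hypothetical, and Punjabi written in Roman letters would be counted as "latin", so script detection alone does not identify the language.

```python
def token_script(token: str) -> str:
    """Label a token by the script of the letters it contains."""
    has_gurmukhi = any("\u0A00" <= ch <= "\u0A7F" for ch in token)
    has_latin = any(ch.isascii() and ch.isalpha() for ch in token)
    if has_gurmukhi and has_latin:
        return "mixed"
    if has_gurmukhi:
        return "gurmukhi"
    if has_latin:
        return "latin"
    return "other"  # digits, emojis, punctuation, other scripts

def script_profile(post: str) -> dict:
    """Count whitespace-separated tokens per script label."""
    counts = {}
    for tok in post.split():
        label = token_script(tok)
        counts[label] = counts.get(label, 0) + 1
    return counts
```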
3. Expected Outcomes
The expected outcome of this project is the creation of a sentiment analysis system that can:
Accurately Classify Emotions: The system should classify the sentiment expressed in code-
mixed posts into one of three categories: positive, negative, or neutral.
Handle Code-Switching: The model should effectively process content that contains
multiple languages, even when they are intermixed within sentences or phrases.
Improve Accuracy with Contextual Understanding: By incorporating deep learning
models like RNNs, LSTMs, or transformers (such as mBERT), the system should better
understand the context of mixed-language posts and improve sentiment accuracy compared
to traditional methods.
Provide Real-World Applications: The developed system can be deployed for use in real-
time social media monitoring tools, enabling brands, marketers, and mental health
professionals to understand user sentiments in multilingual digital spaces.
Continual Adaptability: The system should be capable of being retrained as new slang,
abbreviations, and linguistic patterns emerge on social media, ensuring long-term
effectiveness.
2.6. Goals/Objectives
The following objectives set clear milestones for the sentiment analysis project targeting code-mixed
social media posts. These objectives outline what is to be learned, performed, and achieved during
the course of the project.
1. Data Collection and Preprocessing
Objective: Develop multiple machine learning models for sentiment analysis, including
traditional models (Naive Bayes, SVM), lexicon-based methods, and deep learning
approaches (LSTM, BERT).
o Milestone: Implement baseline models by the end of Month 3.
o Measure: Performance of these models will be validated using a cross-validation
technique to assess their ability to classify sentiment correctly.
Objective: Train deep learning models, such as LSTM or mBERT, to handle code-mixed text
and improve sentiment analysis accuracy.
o Milestone: Train and evaluate LSTM and mBERT models by the end of Month 4.
o Measure: Achieve a minimum of 80% accuracy on the validation set for both models.
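The cross-validation and accuracy measures named in these objectives can be sketched in plain Python. This is an illustrative outline, not the project's evaluation code: the helper names are assumptions, the folds are contiguous and unshuffled, and the 0.80 default mirrors the 80% target stated above.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) index lists for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def meets_target(fold_accuracies, target=0.80):
    """True if the mean accuracy across folds reaches the target."""
    return sum(fold_accuracies) / len(fold_accuracies) >= target
```

In practice the data would usually be shuffled (and often stratified by label) before the folds are cut, so that each fold has a similar class balance.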
3. Model Evaluation and Optimization
Objective: Deploy the sentiment analysis model for real-time sentiment classification on
social media posts.
o Milestone: Implement a prototype sentiment analysis tool and test it on real-time
social media feeds by the end of Month 7.
o Measure: Ensure the deployed tool can classify sentiments with at least 80% accuracy
on live data.
Objective: Conduct user testing to assess the tool's effectiveness for social media analysts,
marketers, or mental health professionals.
o Milestone: Gather feedback and evaluate the tool's real-world utility by the end of
Month 8.
o Measure: Achieve a positive feedback rate of over 75% from users in terms of
accuracy and usability.
5. Reporting and Documentation
Objective: Identify areas for future improvements, such as handling additional languages or
enhancing contextual understanding for sarcasm and irony.
o Milestone: Document potential future improvements in the final report.
o Measure: Propose at least three new directions for future research or model
enhancement.
CHAPTER - 3
DESIGN FLOW/PROCESS
The design and development of the Social Media Analytics Dashboard involved multiple phases,
from the evaluation of features and selection of appropriate technologies to the implementation
and testing of the final solution. This chapter details the steps taken in the design process, the
constraints encountered, and the methodology used to ensure the system met the desired
requirements effectively.
1. Social Media Platforms Coverage: We identified the primary platforms from which we
needed to gather data: Facebook, Instagram, and Twitter. These platforms were chosen due to
their widespread use and the availability of APIs that allow for data collection.
2. Metrics to Track: Based on the project’s goal to analyze user engagement and sentiment, the
following metrics were selected:
a. Likes, Shares, Comments, and Retweets: These metrics indicate user interaction and
engagement with the content.
b. Follower Growth: This tracks how the audience is growing over time.
c. Sentiment Analysis: By analyzing user comments and posts, we assess overall
sentiment (positive, neutral, or negative).
d. Top Performing Content: This involves identifying posts that have the highest
engagement across different platforms.
3. Data Collection Methods: API integration was chosen for platforms such as Twitter and
Instagram, which provide structured data through their developer APIs. Where API access is
limited, as with Instagram’s deeper analytics, web scraping tools such as BeautifulSoup and
Selenium were selected to collect the necessary data.
4. Real-Time Data Visualization: The dashboard needed to provide real-time updates to allow
businesses to monitor social media performance instantly. This led to the selection of Tableau for
data visualization due to its powerful real-time analytics capabilities and user-friendly interface.
5. Data Preprocessing: We selected Natural Language Processing (NLP) tools for cleaning the
text data (such as removing noise, correcting misspellings, and handling special characters), and
for conducting sentiment analysis to understand user opinions and emotions better.
1. Data Access Limitations: Not all social media platforms provide complete access to their data
through APIs. For instance, Instagram’s API restricts certain types of data, requiring the use of
web scraping for full data extraction. Additionally, privacy policies limit the kind of personal user
information that can be accessed.
2. API Rate Limiting: APIs like Twitter’s impose rate limits on how frequently data can be
retrieved, meaning we had to optimize our data extraction processes to stay within these limits
without missing out on crucial data.
3. Real-Time Data Processing: Achieving real-time data updates posed a challenge because
slow API responses combined with constant data refreshing introduce latency. This required
us to balance the frequency of data pulls with server capacity and responsiveness.
4. Data Volume and Storage: The high volume of social media data can create storage and
performance bottlenecks. Storing and processing large amounts of unstructured data required
careful planning around database structures, which led to the use of cloud storage solutions to
handle scalability.
a. Platform Selection and API Use: While Facebook, Instagram, and Twitter remained the key
platforms, API rate limits led us to implement a tiered data retrieval system. More frequent
updates were scheduled for high-priority data (e.g., trending hashtags and recent posts), while less
frequently accessed data (e.g., follower demographics) were updated on a slower cycle.
c. Visualization Constraints: Given the volume of data to be visualized, Tableau was chosen not
only for its robust visualization capabilities but also for its performance optimization features that
allow handling large datasets. We also limited certain visualizations (e.g., historical data) to
prevent the dashboard from becoming too cluttered.
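The tiered data retrieval idea described under Platform Selection and API Use can be sketched as a small scheduler. The tier names, the intervals, and the assumption that each refresh costs one API request are illustrative and not taken from the project's configuration.

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    interval_s: int                  # seconds between refreshes (assumed values)
    last_run: float = float("-inf")  # never refreshed yet

def due_tiers(tiers, now, budget):
    """Return names of tiers to refresh at time `now`, in priority order,
    without exceeding the remaining request budget."""
    selected = []
    for tier in tiers:               # list order is treated as priority order
        if len(selected) >= budget:
            break
        if now - tier.last_run >= tier.interval_s:
            tier.last_run = now
            selected.append(tier.name)
    return selected
```

High-priority data such as trending hashtags would get a short interval, while slowly changing data such as follower demographics would get a long one, so the request budget is spent where freshness matters most.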
1. Data Collection: Data was collected from various social media platforms using APIs and web
scraping tools. APIs like Twitter’s provided structured data, while web scraping allowed us to
collect user comments, post metrics, and other engagement data from platforms with restricted
API access.
2. Data Preprocessing: The raw data was cleaned and prepared for analysis. This included
removing duplicates, handling missing values, and normalizing text data for further processing.
Sentiment analysis was then applied using NLP techniques to classify user comments as positive,
negative, or neutral.
3. Data Integration: Data from multiple sources (different social media platforms) was merged
to create a unified dataset. This allowed for cross-platform comparisons and tracking of overall
trends, such as total engagement or average sentiment across all platforms.
4. Visualization in Tableau: Processed data was sent to Tableau, where interactive visualizations
were created. The dashboard included bar charts, line graphs, and heatmaps to represent user
engagement, sentiment analysis, follower growth, and content performance across different time
periods and platforms.
5. User Interaction: The dashboard was designed to be interactive, allowing users to filter by
time range, platform, or content type. This flexibility enabled businesses to focus on specific
campaigns or content types and drill down into the data for deeper analysis.
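The data integration step above, merging per-platform data into a unified dataset and computing cross-platform figures, might look like the following sketch. The record fields (`likes`, `shares`, `comments`, `sentiment`) and the encoding of sentiment as -1, 0, or 1 are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

def unify(platform_records):
    """Flatten {platform: [record, ...]} into one list with a platform field."""
    return [
        {"platform": platform, **record}
        for platform, records in platform_records.items()
        for record in records
    ]

def summarize(rows):
    """Per-platform total engagement and mean sentiment score."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["platform"]].append(row)
    return {
        platform: {
            "total_engagement": sum(
                r["likes"] + r["shares"] + r["comments"] for r in items
            ),
            "avg_sentiment": mean(r["sentiment"] for r in items),
        }
        for platform, items in grouped.items()
    }
```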
To ensure the system’s flexibility, we designed the dashboard with a modular structure, allowing
for the easy addition of new metrics or platforms as needed. The cloud storage solution was
chosen to handle large data volumes efficiently, ensuring the system could scale as needed
without sacrificing performance.
3.6 Implementation Plan/Methodology
The project was implemented using the Agile methodology, ensuring iterative development and
allowing for frequent feedback and improvements. The implementation was broken down into the
following phases:
a. Cleaning and Filtering: Duplicate entries and irrelevant data points (such as bot interactions)
were removed. Missing values were handled, and text data was normalized to prepare it for sentiment
analysis.
b. Sentiment Analysis: Using NLP techniques, each post and comment was classified into positive,
neutral, or negative sentiment categories. This step was crucial for identifying trends in user
emotions across different posts and campaigns.
c. Data Merging: Data from different platforms was merged to provide a comprehensive view of
engagement metrics across all platforms. This allowed for cross-platform comparisons, showing how
a campaign performs on Twitter compared to Instagram, for example.
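The cleaning and filtering step can be sketched as below. This is a hedged illustration: the record fields, the `bot_accounts` set, and the policy of dropping rows that have no text are assumptions, since the report does not specify how missing values were handled.

```python
def clean_records(records, bot_accounts=frozenset()):
    """Drop duplicates, bot posts, and rows without text; normalize whitespace."""
    seen = set()
    cleaned = []
    for rec in records:
        text = " ".join((rec.get("text") or "").split())  # collapse whitespace
        if not text or rec.get("author") in bot_accounts:
            continue
        key = (rec.get("platform"), rec.get("post_id"))   # assumed unique key
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({**rec, "text": text})
    return cleaned
```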
Engagement Trends: Line graphs and bar charts were used to show trends in likes, shares, retweets,
comments, and overall engagement over time.
Sentiment Analysis: The dashboard includes a visual breakdown of the sentiment (positive, neutral,
negative) for posts and user comments, allowing businesses to gauge public reaction to specific
campaigns or posts.
Top-Performing Content: The dashboard highlights the posts with the highest engagement,
enabling businesses to identify which types of content resonate most with their audience.
Platform Comparisons: Visualizations that compare performance across different platforms (e.g.,
Instagram vs. Twitter) help businesses understand where they should focus their efforts.
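As an illustration of the Top-Performing Content view, posts can be ranked by a simple engagement score. The equal weighting of likes, shares, and comments is an assumption; the report does not state how engagement was scored.

```python
import heapq

def engagement(post):
    """Unweighted engagement score (assumed definition)."""
    return post["likes"] + post["shares"] + post["comments"]

def top_posts(posts, n=3):
    """Return the n posts with the highest engagement, best first."""
    return heapq.nlargest(n, posts, key=engagement)
```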
Real-Time Data Updates: We tested the frequency of data refreshes to ensure that the dashboard
could handle real-time updates without significant delays or data inconsistencies. The data refresh
rates were optimized to avoid overloading the API limits while still providing timely insights.
User Feedback: The dashboard was tested by a group of users, including social media managers and
marketers, who provided feedback on its usability, clarity of visualizations, and the relevance of the
insights generated. Their feedback helped refine the design and improve the user experience.
The successful validation of these results demonstrates the effectiveness of the solution in providing
actionable insights from social media data.
CHAPTER - 5
5.1 Conclusion
The project titled Social Media Analytics with Tableau Dashboard aimed to develop an interactive
tool for analyzing user engagement, sentiment, and content performance across major social media
platforms, including Facebook, Twitter, and Instagram. The primary objective was to create a
dashboard that aggregates data from these platforms, providing businesses and marketers with
actionable insights to optimize their social media strategies.
The dashboard successfully visualized key metrics such as likes, shares, comments, and follower
growth, and performed sentiment analysis on user comments and posts. This gave businesses an
understanding of public sentiment toward their content and helped identify trends that
influence engagement. The project also used Natural Language Processing (NLP) techniques to
analyze user sentiment and identified top-performing content, offering critical insights
for improving future campaigns.
Through the use of Tableau for real-time data visualization and API integration for continuous data
updates, the project demonstrated the potential of visual analytics in simplifying large datasets and
offering a user-friendly platform for stakeholders. The project achieved its goal by providing
businesses with an efficient tool to monitor social media performance and improve their digital
presence.
The project achieved these goals by successfully implementing a user-friendly dashboard that met
the expectations for data aggregation, visualization, and sentiment analysis. The dashboard's
interactive features, such as filters for specific time periods and platforms, allowed for customized
insights, making it an effective tool for real-time social media monitoring.
API Limitations: Platforms like Instagram restrict access to certain types of data through their APIs.
This limitation sometimes led to incomplete data for analysis, requiring web scraping as a secondary
method to fill in gaps.
Real-Time Updates: While the system aimed to provide real-time data updates, the frequency of
these updates was occasionally constrained by API rate limits, particularly on platforms like Twitter.
As a result, some data points could not be refreshed as frequently as initially planned.
Despite these challenges, the overall system performed well and provided meaningful insights, albeit
with slight limitations in data availability and update frequency.
Enhanced API Integration: Further integration with social media APIs and improving existing web
scraping techniques could help gather data more efficiently and minimize delays in data updates.
Streaming Data Technologies: Implementing real-time data streaming technologies, such as Kafka or
Amazon Kinesis, would allow for more immediate data analysis and reporting, especially for time-
sensitive campaigns.
5.2.2 Expanding Sentiment Analysis Capabilities
The current sentiment analysis focuses on classifying user comments and posts into positive, neutral,
or negative categories. Future work can extend these capabilities to provide more fine-grained
sentiment analysis and emotion detection:
Emotion Detection: Expanding the system to classify emotions such as joy, sadness, anger, or
surprise would offer a deeper understanding of user reactions, which could be particularly useful for
brand management and customer support.
Multi-Language Sentiment Analysis: Incorporating multi-language sentiment analysis to handle
posts in languages other than English would make the tool more versatile, particularly for global
brands.
LinkedIn and YouTube: Expanding the system to include LinkedIn for professional content
analysis and YouTube for video performance metrics would provide a more comprehensive view of
social media presence.
TikTok Analytics: Incorporating TikTok analytics could help brands better understand the younger
audience, providing valuable insights into engagement trends on emerging platforms.
Machine Learning Models: By training machine learning models on historical social media data, the
system could predict which types of content are likely to generate the most engagement, allowing
businesses to plan their social media campaigns more effectively.
Trend Detection: Predicting emerging social media trends based on historical data would allow
businesses to stay ahead of the competition and adjust their strategies in real-time.
5.2.5 Building a Real-Time Sentiment Monitoring System
Expanding the current system into a real-time sentiment monitoring tool could add significant value
to businesses looking to respond to user feedback instantly:
Real-Time Alerts: Adding a feature that triggers real-time alerts for significant shifts in sentiment
(e.g., a sudden increase in negative comments) would help businesses address potential issues
proactively, improving brand reputation management.
Sentiment Visualization Over Time: Implementing time-series analysis of sentiment trends could
help businesses understand how their audience's emotions change over time, providing insights into
long-term brand perception.
Custom Dashboards: Allowing users to create custom dashboards based on specific metrics or
campaigns could make the tool more flexible for different use cases.
Actionable Recommendations: Incorporating automated recommendations based on the data could
guide businesses on how to improve engagement, optimize posting schedules, or adjust content
strategies to maximize results.
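The Real-Time Alerts item in Section 5.2.5 could be prototyped with a sliding window over incoming sentiment labels. The sketch below is one possible design, not a planned implementation; the class name, window size, and threshold are hypothetical.

```python
from collections import deque

class NegativeShareAlert:
    """Flag when the share of negative labels in a sliding window
    exceeds a threshold."""

    def __init__(self, window=50, threshold=0.4):
        self.labels = deque(maxlen=window)
        self.threshold = threshold

    def update(self, label):
        """Add one label; return True once the window is full and the
        negative share is above the threshold."""
        self.labels.append(label)
        if len(self.labels) < self.labels.maxlen:
            return False
        negative = sum(1 for lab in self.labels if lab == "negative")
        return negative / len(self.labels) > self.threshold
```

Waiting until the window is full avoids alerts triggered by the first few comments on a post, at the cost of a short delay before the first alert can fire.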
APPENDIX
1. Plagiarism Report
2. Design Checklist
1. Data Preparation
✅ Collected data from social media platforms (Twitter, Instagram, Facebook) via API integration
and web scraping techniques.
✅ Performed data cleaning (removed duplicates, noise, irrelevant data like bot activity).
✅ Preprocessed text for sentiment analysis (normalized text, removed special characters, and handled
missing data).
✅ Split the dataset into training, validation, and testing sets for sentiment analysis and performance
evaluation.
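The dataset split listed in the checklist can be sketched as follows. The 70/15/15 proportions and the fixed seed are assumptions for illustration; the report does not state the ratios used.

```python
import random

def split_dataset(items, train=0.7, val=0.15, seed=42):
    """Shuffle a copy of the data and cut it into train/validation/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # seeded for reproducibility
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],         # remainder becomes the test set
    )
```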
USER MANUAL
Prerequisites
Before getting started, ensure the following software is installed on your machine:
2. Run the following command to clone the repository containing the code and data for your
social media analytics project:
On Windows:
On Mac:
After activation, ensure that you see (env) at the start of the command line, indicating the virtual
environment is active.
This will install all the required libraries, including Pandas, NLP tools, and other
essential packages for data processing and visualization. Tableau itself is a separate
application and is not installed through pip.
Ensure all libraries are installed. If there are any issues, you can update pip and retry:
Collect data from various social media platforms (e.g., Twitter, Instagram, Facebook).
Preprocess the data (e.g., cleaning, normalization, sentiment analysis).
Push the processed data to Tableau for visualization.
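The final step, handing processed data to Tableau, could be done by writing a CSV file that Tableau opens as a text data source. The sketch below assumes that approach; the column list and the `to_csv` helper are hypothetical and may differ from what the project's script actually produces.

```python
import csv
import io

# Assumed output columns for the processed dataset.
FIELDS = ["platform", "post_id", "text", "sentiment", "likes", "shares", "comments"]

def to_csv(rows, stream):
    """Write processed rows as CSV with a header; extra keys are ignored."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)

# Usage: write to an in-memory buffer (a real run would open a file with newline="").
buffer = io.StringIO()
to_csv(
    [{"platform": "twitter", "post_id": 1, "text": "hi, there",
      "sentiment": "positive", "likes": 1, "shares": 0, "comments": 2}],
    buffer,
)
```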
4. Once opened, the dashboard will load the processed data, and you’ll be able to interact with
various visualizations, including:
Sentiment Analysis: View the sentiment distribution (positive, negative, neutral) across different
social media platforms.
Engagement Metrics: Track likes, comments, shares, and follower growth.
Top-Performing Content: Identify posts with the highest engagement.
Time Filters: Use filters to explore data over specific time ranges.
a. Platform Filters: Filter the visualizations by platform (e.g., Twitter, Instagram, Facebook) to
focus on specific social media accounts.
b. Time Range Selector: Adjust the time frame to analyze engagement and sentiment trends over
specific periods (e.g., last week, last month).
c. Content Performance: Dive deeper into the performance of individual posts and campaigns,
identifying which content drives the most interaction.
d. Sentiment Drill-Down: Explore how different types of posts (e.g., video, image, text) affect
user sentiment over time.
1. Once you’ve customized the dashboard with filters and explored the insights, you can save
your work:
Go to File > Save As to save a customized version of the dashboard.
2. To export a report:
Use File > Export to create a PDF or image file summarizing the key insights from your social
media data.
Troubleshooting
Solution: Ensure that Python and pip are installed correctly. If needed, update pip with
the command pip install --upgrade pip.
Solution: Ensure that the process_data.py script has been executed correctly and that the
output file is correctly formatted for Tableau.