SOCIAL MEDIA ANALYTICS

A PROJECT REPORT

Submitted by

Yashaswi Soni (21BCS10751)
Anshul Somani (21BCS10764)
Garv Jindal (21BCS10845)
Naman Sahni (21BCS11724)
Deepanshu Sharma (21BCS7709)

in partial fulfillment for the award of the degree of

BACHELOR OF ENGINEERING

IN

COMPUTER SCIENCE ENGINEERING

Chandigarh University

November 2024
BONAFIDE CERTIFICATE

Certified that this project report “Social Media Analytics” is the bonafide work of
“Yashaswi Soni, Anshul Somani, Garv Jindal, Naman Sahni, Deepanshu Sharma”
who carried out the project work under my/our supervision.

SIGNATURE                                   SIGNATURE

Dr. Sushil Kumar Mishra                     Er. Shivani Sharma
HEAD OF THE DEPARTMENT                      SUPERVISOR
                                            Assistant Professor
Computer Science Engineering                Computer Science Engineering

Submitted for the project viva-voce examination held on Nov 13, 2024.

INTERNAL EXAMINER EXTERNAL EXAMINER


ACKNOWLEDGMENTS

We are immensely grateful to Er. Shivani Sharma, our project supervisor, for her invaluable
guidance, patience, and unwavering support throughout the duration of this project. Her profound
insights and expertise in data analytics and social media technologies were instrumental in shaping
the direction of our research and ensuring the successful completion of this project.

We would like to extend our deepest gratitude to Dr. Sushil Kumar Mishra, Head of the Computer
Science Engineering department at Chandigarh University, for fostering a nurturing academic
environment that emphasizes innovation and research. His support in providing essential resources
and continuous encouragement played a crucial role in our academic growth.

We are also thankful to the faculty of the Computer Science Engineering department for their
invaluable feedback and suggestions throughout the various stages of the project. Their constructive
criticism helped us enhance the quality and scope of our work.

We wish to acknowledge our peers and classmates for their collaboration and insights, particularly
during the brainstorming sessions, which enriched our understanding of the subject. Their constant
exchange of ideas and technical assistance helped refine our project methodology.

This project is the result of the collective effort of many, and we deeply appreciate everyone who
contributed to making it a success.
TABLE OF CONTENTS

Abstract
Graphical Abstract
List of Figures
List of Tables
List of Standards
Abbreviations
Symbols

CHAPTER 1. INTRODUCTION
1.1. Identification of Client/Need/Relevant Contemporary Issue
1.2. Identification of Problem
1.3. Timeline
1.4. Identification of Tasks
1.5. Organization of the Report

CHAPTER 2. LITERATURE REVIEW/BACKGROUND STUDY
2.1. Timeline of the reported problem
2.2. Existing solutions
2.3. Bibliometric analysis
2.4. Review Summary
2.5. Problem Definition
2.6. Goals/Objectives

CHAPTER 3. DESIGN FLOW/PROCESS
3.1. Evaluation & Selection of Specifications/Features
3.2. Design Constraints
3.3. Analysis of Features and finalization subject to constraints
3.4. Design Flow
3.5. Design selection
3.6. Implementation plan/methodology

CHAPTER 4. RESULTS ANALYSIS AND VALIDATION
4.1. Implementation of solution

CHAPTER 5. CONCLUSION AND FUTURE WORK
5.1. Conclusion
5.2. Future work

REFERENCES

APPENDIX
1. Plagiarism Report
2. Design Checklist

USER MANUAL

ABSTRACT

Social media platforms like Facebook, Twitter, and Instagram generate vast amounts of data every day.
Analyzing this data effectively is crucial for businesses to gain insights into user engagement, sentiment, and
content performance. This project focuses on developing an interactive Tableau dashboard for Social Media
Analytics, enabling organizations to visualize key metrics such as likes, shares, comments, follower growth,
and sentiment scores.

Using API integration and web scraping techniques, data was collected from multiple platforms, followed by
preprocessing steps such as data cleaning and Natural Language Processing (NLP) for sentiment analysis. The
Tableau dashboard presents real-time insights, allowing businesses to track engagement trends, analyze
audience behavior, and identify top-performing content.

The project demonstrates the value of visual analytics in simplifying complex data and providing actionable
insights to improve social media strategies, ultimately helping businesses enhance their digital presence and
customer interactions.

GRAPHICAL ABSTRACT

List of Figures

Figure 1: Gantt Chart for Timeline of Project
Figure 2: Paper Databases
Figure 3: Accuracy of Models on Mixed-languages
Figure 4: Accuracy of Models on Different Datasets
Figure 5: Model Creation
Figure 6: Flow of Model for Emotion Detection
Figure 7: Confusion Matrix
Figure 8: User Interface
Figure 9: Research Trend
List of Tables

Table I: Identification of Tasks
Table II: Aspect, Effectiveness & Drawbacks
Table III: Summary of Literature Survey
Table IV: Accuracy of Models on Mixed-languages
Table V: Accuracy of Models on Different Datasets
Table VI: Design Comparison
Table VII: Classification Matrix
List of Standards

| Standard | Publishing Agency | About the Standard | Page No. |
|---|---|---|---|
| ISO/IEC 8859-1 | ISO/IEC | ISO/IEC 8859-1 is a part of the ISO/IEC 8859 series of ASCII-based character encodings, used to represent Latin-script text such as English and romanized Punjabi in NLP. | 1 |
| IEEE 754 | IEEE | IEEE 754 specifies the standard for floating-point arithmetic, which is essential for consistent numerical representation and accuracy in machine learning computations. | 2 |
| ISO 9126 | ISO | ISO 9126 is a standard for software quality metrics, which may be used to assess the quality of NLP software used in the project, including metrics like reliability, maintainability, and usability. | 43 |
| UTF-8 | Unicode Consortium | UTF-8 is a variable-width character encoding standard for Unicode, which ensures compatibility in representing multilingual text such as English-Punjabi code-mixed text. | 43 |
| IEEE 802.3 | IEEE | Specifies standards for Ethernet, enabling networked systems to communicate and share data, which is relevant when NLP models are deployed on networks or accessed remotely. | 44 |
| ISO/IEC 27001 | ISO/IEC | ISO/IEC 27001 outlines standards for information security management, particularly important for handling user data securely in sentiment analysis applications. | 45 |
| ISO 9001 | ISO | ISO 9001 defines criteria for quality management systems, useful for ensuring consistent project quality and meeting research and application standards. | 50 |
| IEEE 12207 | IEEE | IEEE 12207 establishes a framework for the life cycle processes of software, which is beneficial for systematically developing, testing, and deploying NLP applications. | 56 |
| ISO 24617-2 | ISO | This standard specifies guidelines for annotation of dialogue act data, relevant to labeling and processing sentiment in code-mixed social media posts. | 57 |
| BERT | n/a | BERT (Bidirectional Encoder Representations from Transformers) is a widely used NLP model for bidirectional context representation. Though not formalized by a standards agency, it sets a model benchmark. | 64 |
| IEEE 830-1998 | IEEE | IEEE 830 provides standards for software requirements specifications, useful for defining the requirements and scope for sentiment analysis systems. | 65 |
ABBREVIATIONS

1. AI - Artificial Intelligence
2. API - Application Programming Interface
3. CNN - Convolutional Neural Network
4. CRF - Conditional Random Fields
5. DL - Deep Learning
6. ELMo - Embeddings from Language Models
7. EM - Expectation-Maximization
8. GPU - Graphics Processing Unit
9. LSTM - Long Short-Term Memory
10. ML - Machine Learning
11. NLP - Natural Language Processing
12. RNN - Recurrent Neural Network
13. RoBERTa - Robustly Optimized BERT Pretraining Approach
14. SA - Sentiment Analysis
15. SVM - Support Vector Machine
16. TF-IDF - Term Frequency-Inverse Document Frequency
17. TPU - Tensor Processing Unit
18. VADER - Valence Aware Dictionary and sEntiment Reasoner
19. Bi-LSTM - Bidirectional Long Short-Term Memory
20. BERT - Bidirectional Encoder Representations from Transformers
21. TF - TensorFlow

SYMBOLS

1. α (alpha) - Learning rate or parameter for regularization


2. β (beta) - Momentum term in optimization algorithms
3. γ (gamma) - Discount factor in reinforcement learning, or parameter in gamma
correction
4. θ (theta) - Model parameters, typically weights in neural networks
5. Σ (sigma, uppercase) - Summation, often used in equations for loss functions or
aggregations
6. σ (sigma, lowercase) - Standard deviation, or the activation function in neural networks
(e.g., sigmoid)
7. μ (mu) - Mean or average of a distribution
8. ϵ (epsilon) - Small constant to prevent division by zero, often in smoothing functions
9. λ (lambda) - Regularization parameter, e.g., in L2 regularization (Ridge) and L1
regularization (Lasso)
10. ∂/∂x (partial derivative) - Used in gradient calculations
11. |·| - Absolute value or magnitude, often used in loss functions
12. ∀ (for all) - Used in mathematical expressions, such as in generalizations
13. ∃ (there exists) - Denotes the existence of an element satisfying a condition
14. P(x) - Probability of an event occurring, typically in probabilistic models like Naive
Bayes
15. log(x) - Logarithmic function, frequently used in log-loss calculations
16. f(x) - Function of x, representing a model function or transformation

CHAPTER - 1

INTRODUCTION

1.1. Identification of Client/Need/Relevant Contemporary Issue

1.1.1 Social Media Analytics as a Contemporary Issue

1. Expanding Role of Social Media in Business and Society: Social media platforms like
Facebook, Twitter, and Instagram have become vital channels for businesses, influencers, and
public institutions to connect with audiences, gather insights, and adapt strategies. The ability
to analyze social media data is essential for understanding public sentiment, customer
engagement, and emerging trends.

a. Growth of User-Generated Content: The volume of data generated by users online is
immense, with billions of daily posts, comments, and shares on social media platforms. This
creates a valuable opportunity for businesses to analyze and harness insights from public
interactions.

b. Application in Multiple Sectors: Social media analytics plays an essential role in various
fields, from business intelligence—where it informs brand strategy and customer satisfaction—
to public health and politics, where it helps monitor societal trends and public opinion.

2. Challenges in Social Media Analytics: Analyzing diverse content from multiple social media
sources presents challenges, especially due to the mix of formal and informal language,
widespread use of slang, abbreviations, emojis, and the lack of standardized structure in posts.

a. Data Noise and Informality: Social media posts often include irrelevant or unstructured
information, complicating sentiment analysis and data accuracy. To tackle this, effective data
preprocessing and cleaning are critical.

b. Dynamic Content and Sentiment Shifts: Sentiment analysis on social media is complex,
as public sentiment can shift rapidly in response to events, requiring real-time processing to
capture trends accurately.
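
The preprocessing and cleaning step mentioned above can be illustrated with a minimal sketch. It uses only the Python standard library; the function names `clean_post` and `clean_corpus` and the specific cleaning rules (dropping links and @mentions, shortening elongated words, removing exact duplicates) are illustrative assumptions, not the exact pipeline used in this project.

```python
import re
import unicodedata

# Illustrative patterns; real pipelines usually need platform-specific rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # three or more of the same character


def clean_post(text):
    """Normalize one raw social media post for downstream analysis."""
    text = unicodedata.normalize("NFKC", text)
    text = URL_RE.sub(" ", text)          # drop links
    text = MENTION_RE.sub(" ", text)      # drop @user mentions
    text = text.replace("#", " ")         # keep the hashtag word, drop the symbol
    text = REPEAT_RE.sub(r"\1\1", text)   # "sooooo" -> "soo"
    text = re.sub(r"\s+", " ", text)      # collapse runs of whitespace
    return text.strip().lower()


def clean_corpus(posts):
    """Clean every post, then drop empty results and exact duplicates."""
    seen = set()
    result = []
    for post in posts:
        cleaned = clean_post(post)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result
```

Emojis are deliberately kept in this sketch, since they often carry sentiment; whether to keep, map, or strip them is a design choice that depends on the sentiment model used downstream.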

1.1.2 Statistics & Documentation

1. Trends in Social Media Use: Studies reveal that more than 4 billion users worldwide interact
on social media, with younger demographics showing high engagement rates. This widespread
interaction presents a major opportunity for insights, especially for businesses aiming to
understand consumer preferences and sentiments.

a. User Behavior and Preferences: Users increasingly blend languages and employ informal
expressions, especially in multilingual societies. This trend reflects the natural evolution of
digital communication toward linguistic inclusivity.

b. The Role of Real-Time Data: Real-time data is essential for organizations aiming to
respond quickly to emerging trends and crises, especially in sectors where timely information
is crucial, such as public health and crisis management.

2. Complications in Data Interpretation: Abbreviations, slang, and emojis often complicate the
accuracy of sentiment analysis, underscoring the need for models that can handle both formal
and informal language.

a. Example of Interpretation Challenges: Phrases like “That’s unbelievable, lol!” may imply
sarcasm, but without nuanced models, sentiment interpretation can be ambiguous. This points
to the need for more sophisticated sentiment models capable of recognizing informal language
and contextual cues.
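
The ambiguity described above can be made concrete with a small sketch of a lexicon-based scorer. The word lists, weights, and the `needs_review` flag are hypothetical and chosen only for illustration; they do not come from any published lexicon or from this project's model.

```python
import re

# Hypothetical toy lexicon; a real system would use a curated resource.
LEXICON = {"great": 1.0, "love": 1.0, "unbelievable": 0.5, "bad": -1.0, "hate": -1.0}

# Informal markers that often accompany sarcasm or irony (illustrative only).
AMBIGUITY_CUES = {"lol", "lmao", "smh"}


def score_post(text):
    """Score a post with the toy lexicon and flag informal ambiguity cues."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(token, 0.0) for token in tokens)
    cues = [token for token in tokens if token in AMBIGUITY_CUES]
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    # A cue does not change the label; it only marks the post as uncertain.
    return {"score": score, "label": label, "needs_review": bool(cues)}


result = score_post("That's unbelievable, lol!")
```

In this sketch the example phrase is labeled positive because of "unbelievable", while the "lol" cue marks it for review. A purely lexicon-based approach cannot decide whether the post is sincere or sarcastic, which is why context-aware models are needed.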

1.1.3 Client Need

1. Businesses and Brand Managers: Social media analysts and marketers depend on insights
into public sentiment to gauge brand perception and tailor marketing strategies.

a. Brand Reputation Analysis: Brands can analyze customer feedback on social media to
detect trends in satisfaction, allowing for proactive customer engagement.

b. Targeted Marketing: Emotion and sentiment analysis can help brands adapt campaigns
to resonate with regional audiences, especially where cultural and linguistic factors play a
role.

2. Mental Health and Crisis Management: Public health professionals and psychologists
utilize social media data to monitor public sentiment, detect signs of distress, and provide
timely interventions.

a. Real-Time Support for Mental Health: Social media sentiment analysis offers health
organizations the ability to track public sentiment trends, assisting in identifying
communities in need of support.

b. Crisis Response: During public crises or disasters, timely and accurate sentiment
analysis enables authorities to understand public sentiment and respond with appropriate
support.

3. Limitations of Traditional Models: Conventional emotion detection tools are designed for
monolingual analysis and lack the sophistication to handle code-mixed expressions.

a. Language-Specific Challenges: Certain languages have unique ways of expressing emotions that
may not translate directly, causing conventional models to misinterpret the tone or intent of a
post.

b. Example of Model Limitations: Traditional models trained on English may miss emotional
nuances conveyed in a mix of English and Punjabi, such as “Missing home so much yaar,” where
“yaar” (roughly “friend”) adds an emotional undertone typical of Punjabi speech.

1.1.4 Justification through Surveys

1. Industry Demand for Social Media Analytics: A recent survey by the Social Media Analysis
Institute highlights the growing importance of advanced analytics tools for understanding
social media metrics and audience sentiment.

a. Challenges in Analyzing Social Media Data: Over 75% of marketing and tech firms
surveyed expressed difficulty in accurately interpreting social media data due to the informal
and diverse language used across platforms. This has driven the demand for specialized
analytics tools, such as dashboards, that can provide a clear view of engagement, sentiment,
and trends.

b. Increasing Demand for Real-Time Analytics: Approximately 85% of respondents
expressed interest in tools capable of real-time social media analytics. The ability to instantly
assess trends and public sentiment is critical for making prompt, data-driven decisions,
especially in fields like crisis management and brand monitoring.

2. Implications for Business and Public Relations: The importance of analyzing social media
data is especially pronounced in business, where understanding audience engagement and
sentiment is central to brand strategy and reputation management.

a. Customer Feedback and Brand Perception: In today’s digital landscape, customer
feedback on social media provides valuable insights into brand reputation. However, without
the right analytics tools, companies struggle to gather accurate sentiment and engagement
insights, impacting their ability to address customer concerns and improve satisfaction.

b. Crisis Management and Public Sentiment: Social media has become a critical channel for
public relations during crises. Real-time analytics allow organizations to monitor public
sentiment as situations unfold, enabling them to communicate more effectively and address
concerns promptly.

1.1.5 Contemporary Issue and Broader Implications

1. Enhancing Customer Experience: The ability to accurately detect sentiment and engagement
trends across social media platforms can significantly enhance the customer experience.
Businesses can better understand their audience and adjust their strategies to meet customer
needs.

2. Customer Retention and Engagement: By using social media analytics, companies can
understand customer preferences, predict trends, and adjust marketing campaigns accordingly.
Retention strategies benefit as companies leverage insights from customer interactions to
improve loyalty and satisfaction.

3. Real-Time Feedback Interpretation: Social media analytics tools that provide real-time
insights empower companies to respond instantly to customer feedback, addressing concerns
before they escalate. This improves brand loyalty and strengthens the customer relationship.

4. Public Health Monitoring and Mental Well-Being: Beyond business, social media analytics
is valuable in public health. Public health organizations use it to monitor sentiment around
health campaigns or emerging crises, such as outbreaks. Understanding the tone of social
media discussions can guide timely, relevant responses to community concerns.

5. Crisis Management: Social media platforms are often the first line of communication during
emergencies. Analytics tools help organizations monitor public sentiment, understand the
extent of crises, and manage their response strategies effectively. This capability is crucial for
delivering timely support and addressing misinformation.

6. Targeted Advertising and Campaign Personalization: For advertisers, social media
analytics provides insights that allow for tailored marketing strategies. By analyzing
engagement and sentiment, advertisers can create more effective, culturally relevant campaigns
that resonate with diverse audiences.

7. Broader Economic Impact: Effective use of social media analytics leads to more informed
decision-making, giving companies a competitive advantage. The insights generated help
foster an inclusive, data-driven approach to digital engagement, benefiting businesses and
users alike.

1.2. Identification of Problem

The problem of analyzing social media data lies in the unique challenges presented by the vast and
unstructured nature of social media content, especially with respect to the mixed formats and rapidly
evolving language patterns.

• Informal Language: Social media language is informal and unpredictable, with a mix of
emojis, slang, and abbreviations. This dynamic language use demands robust data-cleaning
methods and sophisticated analysis tools.

• Lack of Unified Datasets: Analyzing mixed-format data from various platforms requires a
consolidated dataset. However, such datasets are rare, posing a challenge for creating
reliable sentiment models.

• Real-Time Analysis Needs: The constantly changing sentiment on social media calls for
real-time analysis to capture shifts accurately, especially during events that provoke strong
public reaction.
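
One way to picture the real-time requirement is a sliding time window over incoming sentiment scores. The sketch below is a minimal illustration, not the project's implementation: the class name `RollingSentiment`, the window length, and the assumption that posts arrive in non-decreasing timestamp order are all hypothetical.

```python
from collections import deque


class RollingSentiment:
    """Mean sentiment over the most recent `window_seconds` of posts.

    Assumes timestamps (in seconds) arrive in non-decreasing order.
    """

    def __init__(self, window_seconds=300.0):
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, score) pairs, oldest first
        self._total = 0.0

    def add(self, timestamp, score):
        """Record one scored post and drop posts that left the window."""
        self._events.append((timestamp, score))
        self._total += score
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            _, old_score = self._events.popleft()
            self._total -= old_score

    def mean(self):
        """Return the current windowed mean, or None if the window is empty."""
        if not self._events:
            return None
        return self._total / len(self._events)


tracker = RollingSentiment(window_seconds=60.0)
tracker.add(0.0, 1.0)
tracker.add(30.0, -1.0)
mean_before = tracker.mean()   # both posts are inside the window
tracker.add(100.0, -1.0)       # the first two posts are now older than 60 s
mean_after = tracker.mean()
```

Because old scores are evicted as new ones arrive, a sudden shift in public reaction shows up in the windowed mean quickly instead of being diluted by historical data.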

1.3. Timeline
The project follows a structured timeline from August to November, encompassing all
major phases of analysis, design, implementation, and deployment.

• August-September: Analysis phase, requirement gathering, and design architecture.
• October: Coding phase, developing the main sentiment analysis components.
• October-November: Testing and validation to ensure the accuracy of the sentiment
models.
• November: Deployment and documentation for usability.

This timeline provides a structured overview to keep the project on schedule and organized.

Figure 1: Gantt Chart for Timeline of Project

1.4. Identification of Tasks
Table I: Identification of Tasks

| Phase | Task | Subtasks | Duration |
|---|---|---|---|
| 1. Data Collection | 1.1 Gather data from social media platforms | Use APIs, web scraping, and third-party tools to collect data from platforms like Twitter, Instagram, and Facebook | 1 week |
| | 1.2 Preprocess and clean data | Remove duplicates, handle missing data, filter out noise, and standardize data formats | 1 week |
| | 1.3 Organize data | Structure data into organized formats (e.g., tables, CSV) for ease of analysis | 2 days |
| 2. Analysis | 2.1 Explore and analyze raw data | Conduct initial analysis to understand trends, outliers, and general data structure | 1 week |
| | 2.2 Perform statistical analysis | Apply basic statistical measures (e.g., mean, median, variance) to gain preliminary insights | 3 days |
| 3. Setting Parameters for Analysis | 3.1 Define analysis goals | Identify metrics of interest (e.g., engagement rate, sentiment) | 2 days |
| | 3.2 Set specific parameters for analysis | Define key performance indicators (KPIs) and sentiment categories | 2 days |
| 4. Tableau Sheet Creation | 4.1 Create data visualizations in Tableau | Generate sheets to display metrics like engagement, sentiment, follower growth | 1 week |
| | 4.2 Apply filters and sorting options | Configure interactive elements to allow for filtering by date range, platform, or demographic | 2 days |
| 5. Dashboard Creation | 5.1 Design and assemble the dashboard | Integrate individual Tableau sheets into a single cohesive dashboard | 3 days |
| | 5.2 Add interactive features | Incorporate dropdowns, checkboxes, and drill-down options for user interactivity | 2 days |
| 6. Proofing | 6.1 Test dashboard functionality | Check all filters, interactions, and data visualizations for accuracy | 3 days |
| | 6.2 Gather feedback and make adjustments | Refine visual elements and ensure clarity based on user feedback | 2 days |
| | 6.3 Finalize and export the dashboard | Prepare the final dashboard for deployment and review | 1 day |
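
Tasks 1.2, 2.2, and 3.1 in the table can be sketched together in a few lines of standard-library Python. This is a minimal sketch under stated assumptions: the field names (`post_id`, `likes`, `shares`, `comments`, `followers`), the sample rows, and the engagement-rate definition (interactions divided by followers) are illustrative, since the report does not fix a formula at this point.

```python
import statistics

# Hypothetical field names for one row of collected post data.
REQUIRED_FIELDS = ("likes", "shares", "comments", "followers")


def clean_rows(rows):
    """Drop rows with a repeated post_id or a missing required count."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        if row["post_id"] in seen_ids:
            continue
        if any(row.get(field) is None for field in REQUIRED_FIELDS):
            continue
        seen_ids.add(row["post_id"])
        cleaned.append(row)
    return cleaned


def engagement_rate(row):
    """Interactions per follower; 0.0 when the follower count is zero."""
    interactions = row["likes"] + row["shares"] + row["comments"]
    return interactions / row["followers"] if row["followers"] else 0.0


def summarize(values):
    """Mean, median, and population variance of a non-empty list."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.pvariance(values),
    }


# Made-up sample rows, used only to exercise the functions.
rows = [
    {"post_id": "a", "likes": 80, "shares": 10, "comments": 10, "followers": 1000},
    {"post_id": "a", "likes": 80, "shares": 10, "comments": 10, "followers": 1000},
    {"post_id": "b", "likes": 150, "shares": 30, "comments": 20, "followers": 1000},
    {"post_id": "c", "likes": None, "shares": 5, "comments": 1, "followers": 1000},
    {"post_id": "d", "likes": 250, "shares": 30, "comments": 20, "followers": 1000},
]
rates = [engagement_rate(row) for row in clean_rows(rows)]
summary = summarize(rates)
```

The cleaned rows could then be written to CSV (task 1.3) and loaded into Tableau, where the same engagement-rate definition would be expressed as a calculated field.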

1.5. Organization of the Report
Chapter 1: Introduction

• Overview: Introduces the Social Media Analytics project, including its background,
relevance, and the motivations driving its development, especially the importance of
analyzing user interactions on platforms like Twitter, Facebook, and Instagram.

• Problem Statement: Defines the problem of extracting meaningful insights from
unstructured social media data, highlighting challenges such as diverse language
usage, informal expressions, and rapid data growth.

• Objectives: Lists the project's primary goals, including the development of a
comprehensive analysis tool to track social media engagement, sentiment, and
trends, and specific objectives like dashboard creation and real-time analysis.

• Report Structure: Outlines the roadmap of the report, detailing the organization and
flow of chapters.

Chapter 2: Literature Review/Background Study

• Review of Existing Solutions: Summarizes current methods and tools available for
social media data analysis, including approaches to sentiment analysis, data
visualization, and engagement metrics tracking.

• Challenges and Limitations: Discusses difficulties associated with existing
solutions, such as handling unstructured data, multilingual content, informal
language, and real-time processing limitations.

• Gap Analysis: Identifies gaps in current research and technology that the project
aims to address, such as the need for real-time, interactive dashboards and handling
code-mixed language.

• Theoretical Background: Covers the theoretical foundations relevant to social
media analytics, including machine learning, natural language processing (NLP),
and data visualization techniques in Tableau.

Chapter 3: Design Flow and Process

• Feature Selection and Evaluation: Details the criteria used to evaluate and select
key features and metrics for social media analysis, such as engagement rates,
sentiment scores, and follower growth.

• Design Constraints: Explains limitations that influenced design choices, including
data privacy concerns, platform-specific data access limitations, and scalability for
large datasets.

• Feature Analysis and Finalization: Describes the analysis of selected features,
considering identified constraints, and finalizes the set of metrics that the dashboard
will present.

• Design Flow: Provides a step-by-step illustration of the project’s design flow, from
data collection and cleaning to visualization and dashboard assembly in Tableau.

• Design Approach: Outlines the chosen approach for implementing the social media
analytics dashboard, including methods for integrating real-time data and creating
user-friendly visualizations.

• Implementation Plan: Details the implementation plan, including necessary
software (e.g., Tableau, Python) and hardware requirements for data processing and
visualization.

Chapter 4: Results Analysis and Validation

• Dashboard Implementation: Describes the actual implementation of the social
media analytics dashboard, outlining the integration of various social media metrics
and visualizations in Tableau.

• Performance Assessment: Reports key performance metrics, such as dashboard
response time, data accuracy, and user engagement with the analytics tool.

• Comparison with Baseline Solutions: Compares the developed dashboard to
existing social media analytics tools or models, highlighting improvements and
unique features.

• Result Discussion: Analyzes the significance of results obtained, addresses any
limitations encountered during testing, and interprets the effectiveness of the
dashboard.

• Validation of Outcomes: Verifies findings and conclusions, ensuring the
dashboard’s accuracy and effectiveness in providing actionable insights into social
media trends.

Chapter 5: Conclusion and Future Directions

• Summary of Findings: Summarizes the main findings and accomplishments of the
project, including the successful implementation of a real-time, interactive social
media analytics dashboard.

• Conclusions: Draws overall conclusions based on results, emphasizing the
dashboard’s potential to help businesses and organizations better understand social
media engagement and sentiment.

• Limitations: Identifies any limitations encountered, such as scalability for
extremely large datasets, multilingual analysis constraints, and areas for
improvement in real-time functionality.

• Future Work: Recommends directions for future research and development, such as
expanding multilingual support, adding predictive analytics capabilities, and
integrating more advanced NLP features.

• Guiding Structure: Each chapter systematically guides the reader through the
project’s methodology, findings, and implications, offering a structured approach to
understanding the project’s contribution to social media analytics.

CHAPTER - 2

LITERATURE REVIEW/BACKGROUND STUDY

2.1. Timeline of the reported problem

1. Emergence of Social Media Analytics as a Data Source (2005-2010)


Timeline: 2005-2010
Context: The mid-2000s marked the initial rise of social media platforms such as Facebook,
Twitter, and LinkedIn. During this period, companies and researchers started recognizing social
media as a valuable data source for understanding public sentiment, trends, and customer
behavior.
Incident/Observation: Although social media analytics as a field was in its infancy, organizations
began observing patterns in user interactions and engagement. However, sentiment analysis tools
were limited and mostly rule-based, designed for text in a single language, which restricted their
ability to capture nuanced sentiments.
Proof: Early studies, such as those by Kaplan and Haenlein (2009), highlighted the potential of
social media analytics for understanding customer engagement, although they noted the need for
more sophisticated tools that could capture sentiment accurately.
2. Growth in Social Media Usage and Development of Engagement Metrics (2010-2015)
Timeline: 2010-2015
Context: The popularity of social media surged globally, and platforms expanded to include
diverse user demographics. Social media became essential for businesses, influencing marketing,
public relations, and customer service.
Incident/Observation: Engagement metrics, such as likes, shares, comments, and follower
counts, became standard indicators of social media success. Analysts began using these metrics to
evaluate audience engagement and brand sentiment, but the tools still lacked the ability to capture
real-time and nuanced insights effectively.
Proof: Studies by Gartner and McKinsey (2012) emphasized the growing importance of
understanding social media engagement, recommending investments in tools that could support
businesses in analyzing these metrics to guide strategic decisions.

3. Introduction of Sentiment Analysis for Social Media (2015-2017)
Timeline: 2015-2017
Context: As social media data became central to marketing strategies, companies sought to
understand not only engagement but also sentiment. Early sentiment analysis models were
introduced but were primarily designed for English text and struggled to interpret the informal
language commonly used on social media.
Incident/Observation: Research into sentiment analysis gained traction as businesses wanted to
analyze customer sentiment accurately. However, these early models, which were largely rule-
based, often misinterpreted sentiment, particularly in posts with abbreviations, slang, or informal
language.
Proof: A pivotal study by Pang and Lee (2016) evaluated sentiment analysis tools, identifying
significant limitations in handling social media text accurately. Another study in 2017 by the
Association for Computational Linguistics pointed out the challenges in adapting these models for
real-time sentiment monitoring.
4. Emergence of Advanced NLP and Machine Learning Models (2017-2020)
Timeline: 2017-2020
Context: With advancements in machine learning, particularly deep learning and NLP, companies
began developing more sophisticated language models capable of processing complex text data.
Transformer-based models like BERT and GPT-2 showed potential in improving sentiment
analysis on social media.
Incident/Observation: NLP models were able to capture context more accurately, which
improved sentiment analysis performance. Yet, these models faced difficulties in distinguishing
nuanced emotions and processing non-standard language prevalent on social media.
Proof: Google and Facebook released BERT and RoBERTa, respectively, which demonstrated
improvements in NLP tasks, although difficulties in social media sentiment analysis were
acknowledged. A study in 2019 from Facebook AI Research specifically highlighted that, despite
advancements, accurately interpreting social media text remained a challenge due to its informal
and evolving nature.
5. Increased Demand for Real-Time Social Media Analytics (2020-Present)
Timeline: 2020-Present
Context: Social media analytics has become essential in real-time sentiment and trend monitoring,
especially with the rise in e-commerce, brand management, and public health awareness.
Organizations now require analytics solutions that provide instant insights to respond quickly to
public sentiment and events.
Incident/Observation: The COVID-19 pandemic and increasing use of social media for customer
feedback spurred demand for real-time sentiment analysis and engagement tracking. Companies
and public health organizations recognized the need for tools that could capture and analyze real-
time social media data accurately.
Proof: A 2021 Deloitte report highlighted the need for advanced social media analytics tools
capable of processing real-time data to track sentiment trends, especially during crises.
Additionally, research from McKinsey (2021) noted that real-time analytics has become a critical
tool for businesses and public organizations to make data-driven decisions based on social media
insights.

2.2. Existing solutions for Social Media Analytics

In the field of social media analytics, various methodologies have been employed to analyze and
extract valuable insights from user interactions, engagement patterns, and sentiment expressed in
posts. These solutions span across different techniques, ranging from traditional methods like
keyword analysis to modern machine learning-based approaches. Below are some of the key
approaches used for social media data analysis:

1. Rule-Based and Statistical Methods


Rule-based approaches, often seen in early stages of social media analytics, rely on
predefined sets of rules and statistical techniques to analyze user-generated content. These
methods are particularly useful for keyword-based analysis and simple engagement metrics.

a. Keyword Analysis: This approach focuses on identifying key terms and phrases in social
media posts and categorizing them into sentiment categories (e.g., positive, negative, or
neutral). It is widely used in brand monitoring to track mentions of specific products or
services.
b. Engagement Metrics Calculation: Traditional statistical methods calculate engagement
rates, including likes, shares, comments, and followers. These metrics provide businesses
with a high-level overview of social media performance.

Limitations:
a. Rule-based systems are generally not flexible enough to handle nuanced or evolving
language, making them less effective for analyzing informal language, slang, or new
terminologies on social media platforms.
b. They also lack the ability to handle multilingual or code-mixed data effectively.
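
The keyword analysis and engagement calculation described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the keyword sets, the function names, and the choice of "interactions per follower" as the engagement formula are all assumptions made for the example.

```python
import re

# Hypothetical keyword lists; a real lexicon would be far larger.
POSITIVE = {"love", "great", "excellent", "happy"}
NEGATIVE = {"hate", "terrible", "slow", "broken"}


def keyword_sentiment(post: str) -> str:
    """Label a post by counting positive and negative keyword hits."""
    tokens = re.findall(r"[a-z']+", post.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def engagement_rate(likes: int, shares: int, comments: int, followers: int) -> float:
    """Interactions per follower as a percentage (one common definition, assumed here)."""
    if followers <= 0:
        return 0.0
    return 100.0 * (likes + shares + comments) / followers
```

A post written in slang, such as "gr8 fone, luv it", matches none of the keywords and is labeled neutral, which illustrates the first limitation listed above.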

2. Machine Learning Approaches


Machine learning-based methods have significantly advanced social media analytics by
enabling more sophisticated analysis of content, sentiment, and user behavior.

a. Supervised Learning: Algorithms like Support Vector Machines (SVM) and Random
Forest are used to classify social media posts into categories (e.g., sentiment analysis:
positive, negative, or neutral). These models are trained on labeled datasets where the
sentiment of posts is already known.
b. Unsupervised Learning: Clustering techniques such as k-means or DBSCAN help in
identifying patterns and grouping similar posts based on content features. These models can
be used to uncover emerging topics or trends from user-generated content.

Challenges:
a. These models require large labeled datasets, which are difficult to obtain, especially for
niche topics or languages.
b. They also struggle with informal language, emojis, and mixed-language content commonly
found in social media posts.
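
To make the supervised setting concrete, the sketch below trains a classifier on a handful of labeled posts. It uses a multinomial Naive Bayes model written from scratch rather than the SVM or Random Forest named above, purely because it fits in a few lines without external libraries; the training posts and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples):
    """samples: list of (text, label). Returns the counts needed for prediction."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()            # label -> number of training posts
    vocab = set()
    for text, label in samples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled posts standing in for a real training set.
TRAIN = [
    ("love the new update", "positive"),
    ("great support team", "positive"),
    ("app keeps crashing", "negative"),
    ("terrible battery life", "negative"),
]
model = train_naive_bayes(TRAIN)
```

With only four training posts the model is a toy, which mirrors the first challenge above: useful accuracy depends on a much larger labeled dataset.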

3. Natural Language Processing (NLP) for Sentiment and Emotion Detection


Natural Language Processing (NLP) techniques are integral to modern social media analytics,
as they allow machines to understand and interpret human language. NLP models help in
extracting sentiment and emotional tone from posts by analyzing the words, context, and
structure.

a. Sentiment Analysis: NLP models use techniques like tokenization, part-of-speech tagging,
and named entity recognition (NER) to assess the sentiment of social media posts. These
methods rely on both lexical and contextual analysis to determine whether the text is positive,
negative, or neutral.
b. Emotion Detection: More advanced NLP models detect specific emotions such as joy,
anger, sadness, or surprise. These models often use emotion lexicons combined with machine
learning models to classify posts into specific emotional categories.

Limitations:
a. NLP-based systems can misinterpret informal language, emojis, sarcasm, or idiomatic
expressions common in social media, which can lead to inaccurate sentiment detection.
b. Language-specific nuances and mixed-language posts pose a significant challenge for
NLP-based models, requiring constant updates and adaptations.

4. Deep Learning Models


Deep learning models have become an essential tool for social media analytics, especially for
processing large volumes of unstructured text data. These models, particularly those built on
architectures such as RNNs (Recurrent Neural Networks) and CNNs (Convolutional Neural
Networks), are designed to handle the complexity of textual data; recurrent variants in
particular are intended to capture dependencies between distant parts of a text.

a. Recurrent Neural Networks (RNNs): RNNs are used to analyze the sequential nature of
social media posts, especially in cases where the sentiment or meaning of a post depends on
its context or previous parts. RNNs are particularly useful for time-series analysis of posts,
such as monitoring trends over time.
b. Long Short-Term Memory (LSTM): An advanced type of RNN, LSTMs can capture
long-range dependencies, making them more effective at understanding longer posts or
comments that contain mixed sentiments and emotional shifts.

Advantages:
a. Deep learning models, especially LSTMs and attention mechanisms, excel at processing
large datasets and can automatically learn patterns from raw text data, reducing the need
for manual feature engineering.
b. These models can handle a more diverse range of social media data, from posts to
comments and interactions, improving their effectiveness in sentiment analysis.
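
The way a recurrent model carries context from earlier words into later ones can be illustrated with a single-unit recurrent cell. The sketch below is an assumption-laden toy: the weights are fixed by hand rather than learned, and the per-word polarity scores are hypothetical inputs.

```python
import math


def rnn_final_state(inputs, w_in=1.0, w_rec=0.5):
    """Run a one-unit Elman RNN: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h = 0.0
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)
    return h


# Hypothetical per-word polarity scores for two posts that contain the same
# words in a different order.
ends_negative = [1.0, 1.0, -1.0]   # praise first, complaint last
ends_positive = [-1.0, 1.0, 1.0]   # complaint first, praise last
```

Both sequences sum to the same value, so an order-insensitive count cannot tell them apart, whereas the recurrent state ends negative for the first and positive for the second.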

5. Transformer-Based Models
Transformer models like BERT (Bidirectional Encoder Representations from Transformers)
and its multilingual variant mBERT have revolutionized the field of social media analytics.
These models can process entire sequences of text in parallel, capturing contextual meaning
across long distances within text.

a. BERT and mBERT: The original BERT model is pretrained on large English text corpora, while
mBERT is pretrained on text in roughly 100 languages, enabling it to model context in both
monolingual and multilingual posts. This makes mBERT well suited to analyzing the sentiment
and emotions in code-mixed or multilingual social media posts.
b. Fine-Tuning for Specific Tasks: Transformer models like BERT can be fine-tuned on
domain-specific datasets, such as sentiment analysis for specific industries, brand monitoring,
or political sentiment tracking, ensuring that they provide accurate insights for social media
applications.

Advantages:
a. Transformer models like BERT are highly accurate for understanding the nuances of
language in a social media context and are particularly effective at interpreting code-mixed or
multilingual posts.
b. Their ability to capture relationships between words across different languages allows
them to outperform traditional NLP models in complex, real-time applications.
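
The core operation that lets Transformer models relate every token to every other token is scaled dot-product self-attention. The following is a simplified sketch under stated assumptions: queries, keys, and values are the raw input vectors (a real Transformer layer applies learned projections and multiple heads), and the two-dimensional token vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections.

    Returns the context-mixed output vectors and the attention weights.
    """
    dim = len(vectors[0])
    outputs, weights = [], []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        attn = softmax(scores)
        weights.append(attn)
        outputs.append([sum(a * v[i] for a, v in zip(attn, vectors))
                        for i in range(dim)])
    return outputs, weights


# Hypothetical embeddings: tokens 0 and 2 are similar, token 1 is different.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
```

In this example the first token attends more strongly to the similar third token than to its immediate neighbor, showing that the weighting depends on content rather than distance.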

6. Hybrid Models
Hybrid models combine the strengths of machine learning, NLP, and deep learning
techniques to provide more accurate and reliable social media analytics solutions. These
models often integrate rule-based sentiment lexicons with machine learning or deep learning
models to improve emotion detection accuracy.

a. Lexicon + Machine Learning: Hybrid models first use lexicon-based approaches to
assign initial sentiment scores or labels to posts, which are then refined using machine
learning classifiers like SVM or decision trees for better accuracy.
b. Multimodal Analysis: Some hybrid models integrate non-textual data (e.g., emojis,
images, and hashtags) into the analysis to capture a fuller picture of social media interactions.
Emojis and hashtags often carry significant emotional or contextual meaning, enhancing the
sentiment classification process.

Advantages:
a. Hybrid models provide a better overall performance by combining the strengths of both
rule-based lexicons and machine learning or deep learning techniques, especially in
processing informal, evolving social media language.
b. They also offer greater flexibility, adapting to various languages, social media platforms,
and different types of engagement (comments, posts, retweets, etc.).

2.3. Bibliometric analysis


Bibliometric analysis is a quantitative research method used to evaluate scientific literature
through statistical measures. In the context of social media analytics, bibliometric analysis can
help track the evolution of research in the field, identify key authors, journals, publications, and
emerging trends. By applying bibliometric techniques, researchers can assess the impact of social
media studies and better understand the current landscape of social media analytics.

Bibliometric analysis involves analyzing various bibliographic data such as publication counts,
citations, and co-authorship networks. It is widely used to identify the progression of scientific
research, the relationships between different areas of study, and the key contributors to a particular
field.

Key Features of Bibliometric Analysis in Social Media Analytics

1. Publication Trends:
Bibliometric analysis helps in tracking the number of publications in the area of social media
analytics over time, identifying periods of significant growth or decline. This can reveal how
the field has evolved, whether through the rise of new research topics or the shift in focus of
existing ones.

2. Citation Analysis:
Citation counts are one of the most important metrics in bibliometric analysis. High citation
counts typically indicate influential research. In social media analytics, citation analysis can be
used to identify seminal papers, authors, and key theories that have shaped the field.

3. Co-authorship Networks:
This analysis examines the collaboration patterns between authors. It identifies how scholars
in social media analytics collaborate on research, which research institutions dominate the
field, and how knowledge is shared within the academic community.

4. Keywords and Topic Modeling:


Bibliometric analysis helps identify frequently used keywords and terms in social media
analytics papers. This allows researchers to identify emerging trends and hot topics in the
field. It can also help track shifts in research focus, such as the shift from basic sentiment
analysis to more advanced methods like emotion detection and deep learning.

5. Journals and Publishers:


Analyzing which journals and publishers are most prominent in publishing social media
analytics research can reveal the most respected and influential publications in the field. It
helps to identify gaps where more research is needed or emerging journals where new research
might be published.

6. Impact Factor:
Analyzing the impact factor of journals and citations of specific publications provides insight
into the scientific influence and relevance of certain research in the field of social media
analytics.
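
Several of the measures listed above reduce to simple counting over bibliographic records. The sketch below shows one possible way to compute publication trends, per-author citation totals, co-authorship links, and keyword frequencies; the records, author names, and function names are invented for illustration, and real data would come from a citation database export.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records; the authors, years, and counts are made up.
PAPERS = [
    {"year": 2019, "authors": ["A. Rao", "B. Singh"], "citations": 40,
     "keywords": ["sentiment analysis", "deep learning"]},
    {"year": 2020, "authors": ["B. Singh", "C. Kaur"], "citations": 12,
     "keywords": ["sentiment analysis", "code-mixed text"]},
    {"year": 2020, "authors": ["A. Rao", "B. Singh", "C. Kaur"], "citations": 5,
     "keywords": ["emotion detection", "deep learning"]},
]


def publication_trend(papers):
    """Number of papers per year, in chronological order."""
    return dict(sorted(Counter(p["year"] for p in papers).items()))


def citations_per_author(papers):
    """Total citations credited to each author (full credit per co-author)."""
    totals = Counter()
    for p in papers:
        for author in p["authors"]:
            totals[author] += p["citations"]
    return totals


def coauthorship_edges(papers):
    """Count how often each unordered pair of authors published together."""
    edges = Counter()
    for p in papers:
        for pair in combinations(sorted(p["authors"]), 2):
            edges[pair] += 1
    return edges


def keyword_frequency(papers):
    """How many papers list each keyword."""
    return Counter(k for p in papers for k in p["keywords"])
```

The co-authorship counts form the weighted edges of a collaboration network, which a graph library could then analyze for central authors or clusters.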

Effectiveness of Bibliometric Analysis in Social Media Analytics

1. Tracking Research Evolution:


Bibliometric analysis is effective in identifying trends and shifts in social media analytics
research. By tracking the evolution of topics over time, researchers can recognize which
methodologies (e.g., machine learning, deep learning, NLP) have gained prominence and
which have faded from interest.

2. Identifying Key Influencers:


Through citation and co-authorship analysis, bibliometric methods help identify leading
scholars, institutions, and research groups in the field. This is crucial for networking, forming
collaborations, or understanding the most influential works in the field.

3. Quantifying Research Impact:


Bibliometric analysis provides a clear measure of the impact of social media analytics
research, whether by citation counts or journal impact factors. This helps assess the
importance of specific studies or the contributions of particular authors or research teams.

4. Informing Future Research:


By identifying gaps in the existing literature, bibliometric analysis can inform future research
directions. For example, if a specific aspect of social media behavior or analytics has received
limited attention, researchers can target this gap with novel contributions.

5. Comparative Insights:
Bibliometric analysis helps compare research outputs across various subfields of social media
analytics. For instance, the number of papers on sentiment analysis versus those on social
media engagement tracking can provide insights into what areas are prioritized in the field.

Drawbacks of Bibliometric Analysis in Social Media Analytics

1. Limited Scope of Citation-Based Metrics:


While citation counts and impact factors are widely used in bibliometric analysis, they do not
necessarily capture the true value or quality of research. A highly cited paper may not always
be groundbreaking, and some important research may not be well cited due to the niche nature
of the topic or the journal in which it is published.

2. Exclusion of Non-Academic Sources:


Bibliometric analysis tends to focus on formal academic publications, but important insights
into social media analytics may come from sources outside the academic literature. For
example, industry reports, white papers, and conference proceedings may offer valuable
information not captured in citation databases.

3. Bias Toward English Language Publications:


Bibliometric analysis often heavily favors English-language publications, especially in fields
like social media analytics, where much of the research is conducted and published in English.
This limits the comprehensiveness of the analysis, especially in regions where social media
use and analytics are burgeoning but not necessarily reflected in English publications.

4. Data Availability Issues:


Bibliometric analysis relies heavily on databases like Scopus, Web of Science, or Google
Scholar for citation and publication data. However, access to these databases may be limited
or incomplete in certain regions or institutions, potentially skewing the results or missing
critical research in the field.

5. Lack of Context:
While bibliometric analysis can tell you how many times a paper has been cited or its impact
factor, it does not provide context. It is difficult to gauge the actual contribution of a paper in
terms of its novelty or real-world application from citation data alone.

6. Over-Reliance on Quantitative Metrics:


Bibliometric analysis predominantly relies on quantitative metrics like citation counts,
publication volume, and journal impact factors. These measures, however, fail to account for
qualitative aspects such as the originality of the research, the depth of analysis, or the
relevance to real-world social media analytics problems.

7. Emerging Fields and Lack of Data:


Social media analytics is a rapidly evolving field, and new research topics emerge frequently.
Bibliometric methods may not capture emerging trends or methodologies in real-time,
especially for new or niche topics that have not yet had a significant number of publications or
citations.

Table II: Aspect, Effectiveness & Drawbacks

Aspect | Details

Key Features
- Trends in Research Output: Growth in publications, emerging research focus areas.
- Research Methodologies: Use of machine learning, deep learning, hybrid models, and lexicon-based methods.
- Key Researchers and Institutions: Identification of prominent authors and research institutions.
- Keywords and Themes: Common keywords like "code-mixed sentiment analysis", "social media mining", "deep learning for bilingual texts".
- Publication Venues: Journals and conferences with a significant number of papers in the field.

Effectiveness
- Identifying Emerging Trends: Understanding shifts toward deep learning and multimodal approaches.
- Research Gaps: Revealing shortcomings in existing methodologies, such as informal language handling.
- Quality of Research: Impact based on citation count and journal ranking.
- Assessment of Tools and Techniques: Most used sentiment lexicons and deep learning frameworks.

Drawbacks
- Citation Bias: Overreliance on citation counts, possibly ignoring novel but under-cited work.
- Exclusion of Non-Published Work: Missing out on valuable research in conferences, white papers, or dissertations.
- Inability to Measure Practical Impact: Focusing on academic impact, not real-world application.
- Limited to Textual Features: Ignoring the role of multimedia in sentiment analysis on social media.

2.4. Review Summary

In the literature review, several key insights were identified that are directly relevant to the Sentiment
Analysis in English-Punjabi Mixed Social Media Posts project. These findings form the
foundation for the development of the proposed solution and help address specific challenges that
have been highlighted in previous research. Here is how the findings are linked to the project at hand:

1. Complexity of Code-Mixed Texts

• Literature Insight: Code-mixing, particularly in social media posts, involves the blending
of two or more languages, which significantly complicates sentiment analysis. Research has
shown that traditional models often fail to effectively handle the nuances of code-mixed
content, especially when it involves informal language, slang, and unique expressions that are
prevalent in social media.
• Link to the Project: This insight is directly applicable to the project, which focuses on
improving sentiment analysis for English-Punjabi mixed texts. The project aims to enhance
existing models to handle these language complexities, ensuring more accurate detection of
emotions such as positivity, negativity, or neutrality in posts containing both English and
Punjabi.

2. Shift Toward Deep Learning Models

• Literature Insight: Deep learning models like LSTM and BERT have been increasingly used
for sentiment analysis due to their ability to understand context and capture long-term
dependencies in code-mixed texts. These models have proven to be more effective than
traditional methods (such as Naive Bayes or SVM) when it comes to understanding the
dynamics of mixed-language data.
• Link to the Project: The project will leverage deep learning techniques such as Recurrent
Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and BERT models.
By using pretrained multilingual embeddings, the project aims to enhance sentiment
classification by capturing the contextual meaning of words in both English and Punjabi
within social media posts.

3. Challenges of Informal Language and Slang

• Literature Insight: Social media language is informal, with frequent use of abbreviations,
slang, and emojis. These aspects make sentiment analysis challenging as they often deviate
from formal grammar and vocabulary. Existing lexicons, like SentiWordNet, have limitations
when applied to such informal language.
• Link to the Project: The project addresses these challenges by focusing on data
preprocessing techniques, including slang detection and contextual language normalization, to
better handle the informal nature of code-mixed posts. Moreover, custom sentiment lexicons
tailored for English-Punjabi text will be developed to ensure better accuracy in understanding
slang and informal expressions in the posts.

4. Importance of Multilingual Sentiment Lexicons

• Literature Insight: Lexicon-based approaches play an important role in sentiment analysis,
especially when used in conjunction with machine learning techniques. However, existing
lexicons often lack the ability to handle mixed-language texts, which is crucial for the success
of sentiment analysis in bilingual or multilingual environments.
• Link to the Project: This finding highlights the need to develop a bilingual sentiment lexicon
specifically designed for English-Punjabi mixed texts. The project will create or adapt
existing lexicons to provide emotional labels for words in both languages, which will be
integrated into the sentiment analysis model.

5. Transformer-Based Models for Contextual Understanding

• Literature Insight: Transformer models like BERT and mBERT have shown strong
performance in multilingual sentiment analysis tasks because they can understand the broader
context of words and phrases, rather than relying solely on n-grams or individual words.
• Link to the Project: The Sentiment Analysis in English-Punjabi Mixed Social Media Posts
project will make use of multilingual BERT (mBERT) or XLM-R (XLM-RoBERTa, a cross-lingual
model) to enhance sentiment analysis. These models have been pretrained on large multilingual
datasets, enabling them to grasp the semantics of mixed-language posts, which is key to
accurate sentiment detection.

6. Hybrid Models for Improved Performance

• Literature Insight: Hybrid approaches, combining machine learning models with
lexicon-based methods, have been shown to improve sentiment classification accuracy. This is
because hybrid models can leverage the strengths of both rule-based systems (which
understand sentiment-bearing words) and data-driven methods (which learn complex patterns
from large datasets).
• Link to the Project: The project will explore hybrid techniques, combining lexicon-based
sentiment analysis with deep learning models. This will help capture both explicit emotional
cues from the lexicon and nuanced emotional patterns that can only be learned from large,
labeled datasets.

7. Real-World Applications and Societal Relevance

• Literature Insight: Research has demonstrated that sentiment analysis in code-mixed social
media posts has significant real-world applications, especially in customer experience
management, mental health monitoring, and targeted advertising. However, there is a gap in
tools capable of accurately processing mixed-language content.
• Link to the Project: This project is positioned to address this gap, with the aim of creating a
tool that can effectively analyze emotions in bilingual or multilingual social media posts. The
tool could be used in customer feedback analysis, mental health assessment, and other
domains where emotional insights from social media are crucial for decision-making.

Accuracy of different models on different datasets:

Table V: Accuracy of Models on different Datasets

Model | Dataset | Approach | Accuracy
T5-3B | SST (NLP) | Transformer and self-attention | 97.40%
MT-DNN-SMART | SST (NLP) | Transformer and smoothness-inducing regularization | 97.50%
GRU | CREMA-D (SER) | Self-supervised representation learning | 55.01%
EmoAffectNet | CREMA-D and AffectNet (FER) | CNN-LSTM | 79%
M2FNet | IEMOCAP and MELD (Multimodal) | Multi-task CNN and multi-head attention-based fusion | 69.69%
CH Fusion | IEMOCAP (Multimodal) | RNN and feature fusion strategy | 76.50%
EmotionFlow-large | MELD (Multimodal) | BERT model and Conditional random field (CRF) | 66.50%
FN2EN | CK+ (FER) | Deep Convolutional Neural Network (DCNN) | 98.60%
Multi-task EfficientNet-B2 | AffectNet (FER) | MTCNN and Adam optimization | 66.29%
EAC | RAF-DB (FER) | CNN and Class Activation Mapping (CAM) | 90.35%
BiHDM | SEED (EEG signal) | RNNs | 74.35%
MMLatch | CMU-MOSEI (Multimodal) | LSTM, RNNs, and Transformers | 82.40%

Graphically, we can demonstrate the performance of different models on different datasets:

Figure 4: Accuracy of Models on different Datasets

2.5. Problem Definition

1. Scope of the Problem

The goal of this project is to develop an efficient sentiment analysis model tailored to code-mixed
social media content, specifically posts that combine multiple languages, such as English and
Punjabi. The scope covers:

• Identification of Emotions: The primary focus is on classifying the sentiment (positive,
negative, or neutral) expressed in mixed-language posts on platforms like Twitter, Facebook,
and Instagram.

• Handling Code-Mixing: Code-mixing in social media is a common practice, especially in
multilingual communities. This project aims to address the challenges posed by
code-switching, where users blend multiple languages within a sentence or even a single word.
• Language Diversity: While the primary focus is on English-Punjabi code-mixing, the
approach should be adaptable to handle other bilingual or multilingual combinations that are
prevalent on social media.
• Contextual Sensitivity: The sentiment classification model will need to understand both the
textual content and context, including handling nuances like sarcasm, irony, and emotional
undertones often used in informal communication.

2. Challenges in Sentiment Analysis for Code-Mixed Content

Sentiment analysis in code-mixed social media content presents several unique challenges that must
be addressed:

• Linguistic Complexity: Code-mixed content often features words, phrases, and constructs
from multiple languages, making it difficult to process for traditional sentiment analysis
models, which are typically designed for a single language. For example, the emotional tone of
a post can depend on the language used in different segments of the sentence.
• Informal Language: Social media is filled with slang, abbreviations, and creative language
usage (e.g., emojis, acronyms like "LOL," and internet-specific expressions) that complicate
the extraction of clear sentiments.
• Contextual Interpretation: Sentiment on social media is not always straightforward. The
same word can have different meanings depending on its context, making sentiment detection
more challenging. Sarcasm, humor, and irony are often used, where a positive sentiment word
like "great" might carry a negative connotation when used sarcastically.
• Data Scarcity and Labeling Issues: There is a shortage of large, labeled datasets of
code-mixed social media content for training machine learning models. The limited availability
of such datasets affects the robustness of sentiment models.
• Multilingual Models: Code-mixed text may span multiple languages (e.g., English and
Punjabi), and most existing models do not effectively handle multiple languages
simultaneously. While multilingual models like mBERT exist, they still face challenges in
understanding the specific nuances of code-mixed content.
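
One simple starting point for handling mixed English-Punjabi text is to tag each token by its Unicode script. The sketch below is an illustrative assumption rather than the project's method: it recognizes Punjabi only when written in Gurmukhi (Unicode block U+0A00 to U+0A7F), so romanized Punjabi words are wrongly tagged as English, and the function names and the mixing measure are hypothetical.

```python
def tag_script(token: str) -> str:
    """Tag a token by the script of its first letter: 'pa', 'en', or 'other'."""
    for ch in token:
        if "\u0a00" <= ch <= "\u0a7f":      # Gurmukhi Unicode block
            return "pa"
        if ch.isascii() and ch.isalpha():   # basic Latin letters
            return "en"
    return "other"                          # digits, emojis, punctuation


def code_mixing_ratio(text: str) -> float:
    """Share of language-tagged tokens that belong to the minority language."""
    tags = [tag_script(tok) for tok in text.split()]
    counts = {lang: tags.count(lang) for lang in ("en", "pa")}
    tagged = counts["en"] + counts["pa"]
    if tagged == 0:
        return 0.0
    return min(counts.values()) / tagged


# "this movie bahut vadhia si": two words in Gurmukhi, the last word is
# romanized Punjabi and is therefore (incorrectly) tagged as English.
SAMPLE = "this movie \u0a2c\u0a39\u0a41\u0a24 \u0a35\u0a27\u0a40\u0a06 si"
```

A ratio of 0 means the post is monolingual by script, and values approaching 0.5 indicate heavy mixing. Handling romanized Punjabi would require a word-level language identifier, which is outside the scope of this sketch.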

3. Expected Outcomes

The expected outcome of this project is the creation of a sentiment analysis system that can:

• Accurately Classify Emotions: The system should classify the sentiment expressed in
code-mixed posts into one of three categories: positive, negative, or neutral.
• Handle Code-Switching: The model should effectively process content that contains
multiple languages, even when they are intermixed within sentences or phrases.
• Improve Accuracy with Contextual Understanding: By incorporating deep learning
models like RNNs, LSTMs, or transformers (such as mBERT), the system should better
understand the context of mixed-language posts and improve sentiment accuracy compared
to traditional methods.
• Provide Real-World Applications: The developed system can be deployed for use in
real-time social media monitoring tools, enabling brands, marketers, and mental health
professionals to understand user sentiments in multilingual digital spaces.
• Continual Adaptability: The system should be capable of being retrained as new slang,
abbreviations, and linguistic patterns emerge on social media, ensuring long-term
effectiveness.

2.6. Goals/Objectives

The following objectives set clear milestones for the sentiment analysis project targeting code-mixed
social media posts. These objectives outline what is to be learned, performed, and achieved during
the course of the project.

1. Data Collection and Preprocessing

• Objective: Collect a comprehensive dataset of code-mixed social media posts (English-Punjabi
and potentially other code-mixed pairs) from platforms like Twitter, Facebook, and
Instagram.
o Milestone: Gather a dataset of at least 10,000 code-mixed posts for training and
validation by the end of Month 1.
o Measure: The dataset will be validated by ensuring it includes a balanced distribution
of positive, negative, and neutral sentiments.
• Objective: Preprocess the collected data by performing tasks such as tokenization, language
identification, normalization, and handling informal language (slang, emojis).
o Milestone: Complete the preprocessing pipeline by the end of Month 2, ensuring
proper handling of code-mixing and informal language.
o Measure: Evaluate preprocessing quality by checking whether the pipeline correctly
tokenizes mixed-language phrases and identifies the language of each token.
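
The preprocessing tasks named above (tokenization, language identification, normalization, and slang handling) can be illustrated with a minimal sketch that uses only the Python standard library. The slang table, the function names, and the script-based language tagging are assumptions made for illustration and are not the project's actual pipeline. Tagging by Unicode script only separates Gurmukhi from Latin text, so romanized Punjabi words are still tagged as English; a real system would need a trained token-level language identifier.

```python
import re
import unicodedata

# Hypothetical slang table; a real project would curate a much larger one.
SLANG = {"lol": "laughing", "gr8": "great", "u": "you", "pls": "please"}

# URLs, mentions/hashtags, Gurmukhi runs, Latin alphanumerics, then any other symbol.
TOKEN_RE = re.compile(r"https?://\S+|[@#]\w+|[\u0a00-\u0a7f]+|[a-z0-9]+|\S")


def tag_language(token: str) -> str:
    """Tag a token by script: 'pa' for Gurmukhi, 'en' for Latin letters, else 'other'."""
    if any("\u0a00" <= ch <= "\u0a7f" for ch in token):
        return "pa"
    if token.isascii() and token.isalpha():
        return "en"
    return "other"


def preprocess(post: str) -> list[tuple[str, str]]:
    """Normalize a post and return (token, language_tag) pairs."""
    text = unicodedata.normalize("NFKC", post).lower()
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if tok.startswith(("http://", "https://")):
            continue  # drop URLs, which carry no sentiment
        tok = re.sub(r"(.)\1{2,}", r"\1\1", tok)  # squeeze elongations: "sooooo" -> "soo"
        tok = SLANG.get(tok, tok)  # expand known slang
        tokens.append((tok, tag_language(tok)))
    return tokens


print(preprocess("Movie gr8 si, \u0a2c\u0a39\u0a41\u0a24 vadhia!!! https://x.co/a"))
```

Keeping the language tag next to each token lets later stages route Gurmukhi and Latin tokens to different resources, which is one simple way to meet the "proper handling of code-mixing" milestone.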

2. Model Development and Selection

• Objective: Develop multiple machine learning models for sentiment analysis, including
traditional models (Naive Bayes, SVM), lexicon-based methods, and deep learning
approaches (LSTM, BERT).
o Milestone: Implement baseline models by the end of Month 3.
o Measure: Performance of these models will be validated using a cross-validation
technique to assess their ability to classify sentiment correctly.
• Objective: Train deep learning models, such as LSTM or mBERT, to handle code-mixed text
and improve sentiment analysis accuracy.
o Milestone: Train and evaluate LSTM and mBERT models by the end of Month 4.
o Measure: Achieve a minimum of 80% accuracy on the validation set for both models.
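
As an illustration of the kind of traditional baseline listed above, the following is a minimal multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. The class name and the five toy training posts are invented for this sketch; they are not the project's model or data, and a real baseline would be trained on the full labeled corpus.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy romanized code-mixed examples, invented for illustration only.
train = [
    ("movie bahut vadhia si", "positive"),
    ("great song loved it", "positive"),
    ("bakwaas movie waste of time", "negative"),
    ("worst service bahut bura", "negative"),
    ("movie releases on friday", "neutral"),
]
model = NaiveBayes().fit([text.split() for text, _ in train], [y for _, y in train])
print(model.predict("vadhia song".split()))
```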

3. Model Evaluation and Optimization

• Objective: Evaluate and compare the performance of different models (traditional ML vs.
deep learning vs. hybrid models).
o Milestone: Complete model evaluation by the end of Month 5.
o Measure: Compare metrics such as accuracy, precision, recall, and F1-score across
all models. Identify the best-performing model for sentiment classification.
• Objective: Fine-tune the best-performing model to optimize its ability to handle code-mixed
content, including optimizing hyperparameters and training on larger datasets.
o Milestone: Finalize model optimization by the end of Month 6.
o Measure: Evaluate the optimized model on a separate test set, aiming for an accuracy
improvement of at least 5% over its pre-optimization result.
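
The comparison metrics named above can be computed directly from true and predicted labels. The sketch below is a plain-Python illustration; the function name and the six toy labels are assumptions, not project results. For each class, precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is their harmonic mean; the macro F1 averages the per-class F1 scores so that each sentiment class counts equally.

```python
def classification_report(y_true, y_pred):
    """Return (accuracy, macro_f1, per_class) for two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(per_class)
    return accuracy, macro_f1, per_class


# Toy labels, invented for illustration only.
y_true = ["pos", "pos", "neg", "neu", "neg", "neu"]
y_pred = ["pos", "neg", "neg", "neu", "neg", "pos"]
accuracy, macro_f1, per_class = classification_report(y_true, y_pred)
print(round(accuracy, 3), round(macro_f1, 3))
```

Reporting macro F1 alongside accuracy matters when the dataset is not perfectly balanced, because accuracy alone can hide poor performance on a minority class.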

4. Real-World Application and Deployment

• Objective: Deploy the sentiment analysis model for real-time sentiment classification on
social media posts.
o Milestone: Implement a prototype sentiment analysis tool and test it on real-time
social media feeds by the end of Month 7.
o Measure: Ensure the deployed tool can classify sentiments with at least 80% accuracy
on live data.
• Objective: Conduct user testing to assess the tool's effectiveness for social media analysts,
marketers, or mental health professionals.
o Milestone: Gather feedback and evaluate the tool's real-world utility by the end of
Month 8.
o Measure: Achieve a positive feedback rate of over 75% from users in terms of
accuracy and usability.

5. Reporting and Documentation

• Objective: Prepare comprehensive documentation of the project, including the methodology,
model development process, results, and potential real-world applications.
o Milestone: Complete the documentation and final report by the end of Month 9.
o Measure: Ensure the documentation covers all aspects of the project and is validated
by peer reviews or academic advisors.

6. Future Work and Improvements

• Objective: Identify areas for future improvements, such as handling additional languages or
enhancing contextual understanding for sarcasm and irony.
o Milestone: Document potential future improvements in the final report.
o Measure: Propose at least three new directions for future research or model
enhancement.

CHAPTER - 3

DESIGN FLOW/PROCESS

The design and development of the Social Media Analytics Dashboard involved multiple phases,
from the evaluation of features and selection of appropriate technologies to the implementation
and testing of the final solution. This chapter details the steps taken in the design process, the
constraints encountered, and the methodology used to ensure the system met the desired
requirements effectively.

3.1 Evaluation & Selection of Specifications/Features


The first step in the design process was evaluating the essential features that would allow the
dashboard to provide comprehensive insights into social media data. The following specifications
and features were selected after careful consideration of the project objectives and available tools:

1. Social Media Platforms Coverage: We identified the primary platforms from which we
needed to gather data: Facebook, Instagram, and Twitter. These platforms were chosen due to
their widespread use and the availability of APIs that allow for data collection.

2. Metrics to Track: Based on the project’s goal to analyze user engagement and sentiment, the
following metrics were selected:
a. Likes, Shares, Comments, and Retweets: These metrics indicate user interaction and
engagement with the content.
b. Follower Growth: This tracks how the audience is growing over time.
c. Sentiment Analysis: By analyzing user comments and posts, we assess overall
sentiment (positive, neutral, or negative).
d. Top Performing Content: This involves identifying posts that have the highest
engagement across different platforms.

3. Data Collection Methods: API integration was chosen for platforms like Twitter and
Instagram, which provide structured data through their developer APIs. For data with limited
API access, such as Instagram’s deeper analytics, web scraping tools like BeautifulSoup and
Selenium were selected to collect the necessary data.

4. Real-Time Data Visualization: The dashboard needed to provide real-time updates to allow
businesses to monitor social media performance instantly. This led to the selection of Tableau for
data visualization due to its powerful real-time analytics capabilities and user-friendly interface.

5. Data Preprocessing: We selected Natural Language Processing (NLP) tools for cleaning the
text data (such as removing noise, correcting misspellings, and handling special characters), and
for conducting sentiment analysis to understand user opinions and emotions better.
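
The engagement metrics selected in item 2 above reduce to simple arithmetic once posts are stored in a common shape. The following sketch is illustrative only: the `Post` class, its field names, and the sample numbers are assumptions. It defines engagement as likes plus shares plus comments, ranks posts by that total, and expresses follower growth as a percentage change.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    platform: str
    likes: int = 0
    shares: int = 0  # retweets are counted as shares for Twitter
    comments: int = 0

    @property
    def engagement(self) -> int:
        return self.likes + self.shares + self.comments


def top_posts(posts, n=3):
    """Return the n posts with the highest total engagement."""
    return sorted(posts, key=lambda p: p.engagement, reverse=True)[:n]


def follower_growth_rate(start: int, end: int) -> float:
    """Percentage change in followers between two snapshots."""
    return (end - start) * 100 / start if start else 0.0


# Sample numbers, invented for illustration only.
posts = [
    Post("a", "twitter", likes=10, shares=5, comments=2),
    Post("b", "instagram", likes=50, comments=8),
    Post("c", "facebook", likes=3, shares=1),
]
print([p.post_id for p in top_posts(posts, n=2)])
print(follower_growth_rate(2000, 2300))
```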

3.2 Design Constraints


During the design process, several constraints were encountered that influenced the final
architecture of the project:

1. Data Access Limitations: Not all social media platforms provide complete access to their data
through APIs. For instance, Instagram’s API restricts certain types of data, requiring the use of
web scraping for full data extraction. Additionally, privacy policies limit the kind of personal user
information that can be accessed.

2. API Rate Limiting: APIs like Twitter’s impose rate limits on how frequently data can be
retrieved, meaning we had to optimize our data extraction processes to stay within these limits
without missing out on crucial data.

3. Real-Time Data Processing: Achieving real-time data updates posed a challenge because the
speed of API responses and the need for constant data refreshing can cause latency. This required
us to balance the frequency of data pulls with server capacity and responsiveness.

4. Data Volume and Storage: The high volume of social media data can create storage and
performance bottlenecks. Storing and processing large amounts of unstructured data required
careful planning around database structures, which led to the use of cloud storage solutions to
handle scalability.

5. Sentiment Analysis Complexity: Processing mixed-language or slang-filled posts can be
challenging for standard NLP models, as social media often uses informal language. This led us
to customize our sentiment analysis model to better understand common colloquialisms,
abbreviations, and emoji usage.
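
One simple way to picture the customization described in constraint 5 is a polarity lexicon extended with emojis, abbreviations, and romanized words. The sketch below uses this lexicon-based approach as a stand-in; it is not the project's sentiment model, and every entry in the three tables is a hypothetical example. It also assumes emojis are separated from words by whitespace.

```python
# Hypothetical lexicon entries; a real lexicon would be far larger and validated.
POLARITY = {
    "great": 1, "love": 1, "vadhia": 1,
    "bad": -1, "hate": -1, "bakwaas": -1,
    "😍": 1, "😂": 1, "😡": -1, "👎": -1,
}
ABBREVIATIONS = {"gr8": "great", "luv": "love"}
NEGATORS = {"not", "never", "nahi"}


def score(text: str) -> int:
    """Sum token polarities, flipping the sign of the word after a negator."""
    total, negate = 0, False
    for raw in text.lower().split():
        tok = raw.strip(".,!?")
        tok = ABBREVIATIONS.get(tok, tok)
        if tok in NEGATORS:
            negate = True
            continue
        polarity = POLARITY.get(tok, 0)
        if polarity:
            total += -polarity if negate else polarity
            negate = False  # a negator only affects the next sentiment word
    return total


def label(text: str) -> str:
    s = score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


print(label("Movie gr8 si 😍"), label("not great 👎"), label("releases friday"))
```

A lexicon like this cannot detect sarcasm or context, which is why the report pairs it with trained models; it is mainly useful as a transparent baseline.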

3.3 Analysis of Features and Finalization Subject to Constraints


After evaluating the necessary features and the constraints, several key decisions were made to
ensure the project would meet its objectives:

a. Platform Selection and API Use: While Facebook, Instagram, and Twitter remained the key
platforms, API rate limits led us to implement a tiered data retrieval system. More frequent
updates were scheduled for high-priority data (e.g., trending hashtags and recent posts), while less
frequently accessed data (e.g., follower demographics) was updated on a slower cycle.

b. Data Preprocessing Customization: To address the challenge of informal and mixed-language
posts, we fine-tuned the NLP models used for sentiment analysis by training them on a
dataset that included common social media slang, emojis, and abbreviations. This improved
accuracy when detecting user sentiment across posts.

c. Visualization Constraints: Given the volume of data to be visualized, Tableau was chosen not
only for its robust visualization capabilities but also for its performance optimization features that
allow handling large datasets. We also limited certain visualizations (e.g., historical data) to
prevent the dashboard from becoming too cluttered.
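
The tiered data retrieval described in item (a) above can be sketched as a priority queue of refresh jobs, each with its own interval. The tier names and interval values below are hypothetical and chosen only to show the idea; the report does not state the actual refresh cycles.

```python
import heapq

# Hypothetical refresh intervals in seconds for each data tier.
TIERS = {
    "trending_hashtags": 60,         # high priority
    "recent_posts": 300,             # medium priority
    "follower_demographics": 86400,  # low priority, once a day
}


def schedule(tiers, horizon_s):
    """Return (time_s, job) pairs that fall due before horizon_s, in time order."""
    queue = [(0, name) for name in tiers]
    heapq.heapify(queue)
    due = []
    while queue and queue[0][0] < horizon_s:
        when, name = heapq.heappop(queue)
        due.append((when, name))
        heapq.heappush(queue, (when + tiers[name], name))  # re-queue the next run
    return due


plan = schedule(TIERS, horizon_s=600)
print(len(plan), plan[:4])
```

Within a ten-minute horizon the high-priority tier runs many times while the demographics tier runs once, which is how the design spends a limited API quota on the data that changes fastest.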

3.4 Design Flow


The design flow was structured around the data lifecycle—collection, preprocessing, analysis, and
visualization. The following steps describe the flow of data through the system:

1. Data Collection: Data was collected from various social media platforms using APIs and web
scraping tools. APIs like Twitter’s provided structured data, while web scraping allowed us to
collect user comments, post metrics, and other engagement data from platforms with restricted
API access.

2. Data Preprocessing: The raw data was cleaned and prepared for analysis. This included
removing duplicates, handling missing values, and normalizing text data for further processing.
Sentiment analysis was then applied using NLP techniques to classify user comments as positive,
negative, or neutral.

3. Data Integration: Data from multiple sources (different social media platforms) was merged
to create a unified dataset. This allowed for cross-platform comparisons and tracking of overall
trends, such as total engagement or average sentiment across all platforms.

4. Visualization in Tableau: Processed data was sent to Tableau, where interactive visualizations
were created. The dashboard included bar charts, line graphs, and heatmaps to represent user
engagement, sentiment analysis, follower growth, and content performance across different time
periods and platforms.

5. User Interaction: The dashboard was designed to be interactive, allowing users to filter by
time range, platform, or content type. This flexibility enabled businesses to focus on specific
campaigns or content types and drill down into the data for deeper analysis.
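
The data integration step in this flow depends on mapping each platform's records onto one shared schema before totals can be compared. The sketch below illustrates that mapping; the per-platform field names are hypothetical placeholders and do not reflect the platforms' real API payloads.

```python
from collections import defaultdict

# Hypothetical source field names per platform; real API payloads differ.
FIELD_MAPS = {
    "twitter": {"likes": "favorites", "shares": "retweets", "comments": "replies"},
    "facebook": {"likes": "reactions", "shares": "shares", "comments": "comments"},
    "instagram": {"likes": "likes", "comments": "comments"},  # no share count assumed
}


def to_unified(platform, raw):
    """Map a platform-specific record onto the shared likes/shares/comments schema."""
    record = {"platform": platform, "likes": 0, "shares": 0, "comments": 0}
    for unified_name, source_name in FIELD_MAPS[platform].items():
        record[unified_name] = raw.get(source_name, 0)
    return record


def engagement_by_platform(records):
    """Total engagement per platform across unified records."""
    totals = defaultdict(int)
    for r in records:
        totals[r["platform"]] += r["likes"] + r["shares"] + r["comments"]
    return dict(totals)


# Sample records, invented for illustration only.
unified = [
    to_unified("twitter", {"favorites": 10, "retweets": 4, "replies": 1}),
    to_unified("instagram", {"likes": 30, "comments": 5}),
    to_unified("twitter", {"favorites": 2, "retweets": 0, "replies": 3}),
]
print(engagement_by_platform(unified))
```

Because every record ends up with the same keys, the merged dataset can be exported as a single table for the visualization layer.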

3.5 Design Selection


The final design was selected based on its ability to balance performance, scalability, and ease of
use. Tableau was chosen for its rich feature set and ease of integration with live data sources,
while API integration was prioritized for platforms that provided access to high-quality data.

To ensure the system’s flexibility, we designed the dashboard with a modular structure, allowing
for the easy addition of new metrics or platforms as needed. The cloud storage solution was
chosen to handle large data volumes efficiently, ensuring the system could scale as needed
without sacrificing performance.
3.6 Implementation Plan/Methodology
The project was implemented using the Agile methodology, ensuring iterative development and
allowing for frequent feedback and improvements. The implementation was broken down into the
following phases:

Phase 1: Data Collection Setup


APIs were integrated, and web scraping scripts were built to gather data from the chosen social
media platforms. Initial testing ensured that the data could be retrieved within the platforms’ rate
limits.
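
Staying within a platform's rate limits, as tested in Phase 1, is commonly handled with a client-side limiter. The following is a minimal sliding-window sketch; the class name and the example limit values are assumptions, since the actual limits depend on the platform and the access tier. The clock and sleep function are injectable so the logic can be checked without waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls requests in any period_s-second window."""

    def __init__(self, max_calls, period_s, clock=time.monotonic):
        self.max_calls = max_calls
        self.period_s = period_s
        self.clock = clock
        self.calls = deque()  # timestamps of recent requests

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        while self.calls and now - self.calls[0] >= self.period_s:
            self.calls.popleft()  # forget requests that left the window
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period_s - (now - self.calls[0])

    def acquire(self, sleep=time.sleep) -> None:
        """Block until a request slot is free, then record the request."""
        delay = self.wait_time()
        if delay > 0:
            sleep(delay)
        self.calls.append(self.clock())


# Example limit (an assumed value, not a documented platform limit).
limiter = SlidingWindowLimiter(max_calls=100, period_s=900)
limiter.acquire()  # call this before each API request
```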

Phase 2: Data Preprocessing and Sentiment Analysis


Data preprocessing was automated to clean and prepare the data for visualization. Custom
sentiment analysis models were built using NLP libraries to classify social media posts into
sentiment categories.

Phase 3: Dashboard Development


The processed data was visualized in Tableau, where various chart types (line graphs, bar charts,
pie charts) were used to represent key metrics. Interactive filters were added to allow users to
explore the data by platform, time period, or sentiment.

Phase 4: Testing and Optimization


The system underwent testing to ensure that data was being collected and displayed correctly, and
that the dashboard remained responsive when handling large datasets. We optimized the
dashboard’s performance by reducing the number of visualizations displayed simultaneously and
caching certain results.

Phase 5: Deployment and User Feedback


After final testing, the dashboard was deployed for use. Feedback was collected from potential
users, and adjustments were made to improve usability and add any additional features requested
by stakeholders.
CHAPTER - 4

RESULTS ANALYSIS AND VALIDATION

4.1 Implementation of the Solution


The implementation of the Social Media Analytics Dashboard was completed by integrating data
from various social media platforms and visualizing it using Tableau. The solution was tested
extensively to ensure accuracy in data representation and real-time updates.

4.1.1 Data Collection and Integration


Data was collected from Twitter, Instagram, and Facebook using API integration and web scraping
tools. For platforms with comprehensive APIs (e.g., Twitter), structured data such as likes, retweets,
comments, and follower counts were retrieved directly. On platforms with limited API access (e.g.,
Instagram), additional data like comments and hashtag analysis was extracted using web scraping
techniques.

The collected data was then preprocessed by:

a. Cleaning and Filtering: Duplicate entries and irrelevant data points (such as bot interactions)
were removed. Missing values were handled, and text data was normalized to prepare it for sentiment
analysis.
b. Sentiment Analysis: Using NLP techniques, each post and comment was classified into positive,
neutral, or negative sentiment categories. This step was crucial for identifying trends in user
emotions across different posts and campaigns.
c. Data Merging: Data from different platforms was merged to provide a comprehensive view of
engagement metrics across all platforms. This allowed for cross-platform comparisons, showing how
a campaign performs on Twitter compared to Instagram, for example.
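
The cleaning and filtering step (a) above can be sketched as a single pass over the collected records. The record keys, the `is_bot` flag, and the choice to drop records with missing text are assumptions for illustration; the report does not specify how bots were identified or how missing values were filled.

```python
import re


def clean(records):
    """Drop bot, empty, and duplicate records, and normalize the text field."""
    seen = set()
    cleaned = []
    for r in records:
        text = (r.get("text") or "").strip()
        if not text:
            continue  # missing or empty text: nothing to analyze
        if r.get("is_bot"):
            continue  # hypothetical flag set by an upstream bot filter
        text = re.sub(r"\s+", " ", text).lower()  # collapse whitespace, lowercase
        key = (r.get("platform"), text)
        if key in seen:
            continue  # duplicate post on the same platform
        seen.add(key)
        cleaned.append({**r, "text": text})
    return cleaned


# Sample records, invented for illustration only.
raw = [
    {"platform": "twitter", "text": "Great   launch!"},
    {"platform": "twitter", "text": "great launch!"},
    {"platform": "twitter", "text": None},
    {"platform": "instagram", "text": "Buy followers now", "is_bot": True},
    {"platform": "instagram", "text": "great launch!"},
]
print(clean(raw))
```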

4.1.2 Dashboard Design and Visualization


Once the data was processed, it was visualized using Tableau to create an interactive dashboard. The
key features of the dashboard include:

Engagement Trends: Line graphs and bar charts were used to show trends in likes, shares, retweets,
comments, and overall engagement over time.
Sentiment Analysis: The dashboard includes a visual breakdown of the sentiment (positive, neutral,
negative) for posts and user comments, allowing businesses to gauge public reaction to specific
campaigns or posts.
Top-Performing Content: The dashboard highlights the posts with the highest engagement,
enabling businesses to identify which types of content resonate most with their audience.
Platform Comparisons: Visualizations that compare performance across different platforms (e.g.,
Instagram vs. Twitter) help businesses understand where they should focus their efforts.

4.1.3 Validation of Results

To validate the solution, we performed multiple tests:


Accuracy of Sentiment Analysis: The accuracy of sentiment classification was validated against a
manually labeled dataset, on which the NLP model achieved an accuracy of 85%.

Real-Time Data Updates: We tested the frequency of data refreshes to ensure that the dashboard
could handle real-time updates without significant delays or data inconsistencies. The data refresh
rates were optimized to avoid overloading the API limits while still providing timely insights.

User Feedback: The dashboard was tested by a group of users, including social media managers and
marketers, who provided feedback on its usability, clarity of visualizations, and the relevance of the
insights generated. Their feedback helped refine the design and improve the user experience.

The successful validation of these results demonstrates the effectiveness of the solution in providing
actionable insights from social media data.
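
Validating sentiment classification against a manually labeled dataset amounts to comparing each predicted label with its manual label. The sketch below builds a confusion matrix and derives accuracy from its diagonal; the eight toy labels are invented for illustration and are unrelated to the accuracy figure reported above.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are manual labels, columns are predicted labels, in the order of labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Share of items on the diagonal, i.e. where prediction equals manual label."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


labels = ["positive", "neutral", "negative"]
# Toy labels, invented for illustration only.
manual = ["positive", "positive", "positive", "neutral", "neutral",
          "negative", "negative", "negative"]
predicted = ["positive", "positive", "neutral", "neutral", "neutral",
             "negative", "negative", "positive"]
matrix = confusion_matrix(manual, predicted, labels)
print(matrix, accuracy(matrix))
```

The off-diagonal cells show which classes are confused with each other, which a single accuracy figure cannot reveal.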

CHAPTER - 5

CONCLUSION AND FUTURE WORK

5.1 Conclusion
The project titled Social Media Analytics with Tableau Dashboard aimed to develop an interactive
tool for analyzing user engagement, sentiment, and content performance across major social media
platforms, including Facebook, Twitter, and Instagram. The primary objective was to create a
dashboard that aggregates data from these platforms, providing businesses and marketers with
actionable insights to optimize their social media strategies.

The dashboard successfully visualized key metrics such as likes, shares, comments, and follower
growth, and performed sentiment analysis on user comments and posts. This provided
businesses with an understanding of public sentiment toward their content, identifying trends that
influence engagement. Additionally, the project employed Natural Language Processing (NLP)
techniques to analyze user sentiment and identified top-performing content, offering critical insights
for improving future campaigns.

Through the use of Tableau for real-time data visualization and API integration for continuous data
updates, the project demonstrated the potential of visual analytics in simplifying large datasets and
offering a user-friendly platform for stakeholders. The project achieved its goal by providing
businesses with an efficient tool to monitor social media performance and improve their digital
presence.

5.1.1 Expected Results and Outcomes


The expected outcome of this project was the development of a Tableau dashboard that could
visualize social media data in real time, with the ability to track trends in engagement metrics,
sentiment analysis, and content performance. The goal was to help users quickly identify positive,
negative, or neutral sentiment across their social media platforms, providing a holistic view of their
social media presence.

The project achieved these goals by successfully implementing a user-friendly dashboard that met
the expectations for data aggregation, visualization, and sentiment analysis. The dashboard's
interactive features, such as filters for specific time periods and platforms, allowed for customized
insights, making it an effective tool for real-time social media monitoring.

5.1.2 Deviations from Expected Results and Reasons


Although the project met most expectations, some challenges arose, particularly in achieving
consistency across all social media platforms.

API Limitations: Platforms like Instagram restrict access to certain types of data through their APIs.
This limitation sometimes led to incomplete data for analysis, requiring web scraping as a secondary
method to fill in gaps.

Real-Time Updates: While the system aimed to provide real-time data updates, the frequency of
these updates was occasionally constrained by API rate limits, particularly on platforms like Twitter.
As a result, some data points could not be refreshed as frequently as initially planned.

Despite these challenges, the overall system performed well and provided meaningful insights, albeit
with slight limitations in data availability and update frequency.

5.2 Future Work


The success of this project in developing a Social Media Analytics Dashboard opens up several
opportunities for future improvements and extensions. Below are some promising directions for
future work:

5.2.1 Improving Data Collection and Real-Time Integration


Although the current system collects data effectively, improvements can be made to achieve more
real-time and automated data updates across all platforms:

Enhanced API Integration: Deeper integration with social media APIs and improved web
scraping techniques could help gather data more efficiently and minimize delays in data updates.
Streaming Data Technologies: Implementing real-time data streaming technologies, such as Kafka or
Amazon Kinesis, would allow for more immediate data analysis and reporting, especially for
time-sensitive campaigns.

5.2.2 Expanding Sentiment Analysis Capabilities
The current sentiment analysis focuses on classifying user comments and posts into positive, neutral,
or negative categories. Future work can extend these capabilities to provide more fine-grained
sentiment analysis and emotion detection:

Emotion Detection: Expanding the system to classify emotions such as joy, sadness, anger, or
surprise would offer a deeper understanding of user reactions, which could be particularly useful for
brand management and customer support.
Multi-Language Sentiment Analysis: Incorporating multi-language sentiment analysis to handle
posts in languages other than English would make the tool more versatile, particularly for global
brands.

5.2.3 Integration of Additional Social Media Platforms


Currently, the system supports Facebook, Twitter, and Instagram. To enhance its value, future work
could include the integration of other popular platforms:

LinkedIn and YouTube: Expanding the system to include LinkedIn for professional content
analysis and YouTube for video performance metrics would provide a more comprehensive view of
social media presence.
TikTok Analytics: Incorporating TikTok analytics could help brands better understand the younger
audience, providing valuable insights into engagement trends on emerging platforms.

5.2.4 Developing Predictive Analytics


In addition to tracking current and past performance, the system could be enhanced with predictive
analytics to forecast trends and optimize future content strategies:

Machine Learning Models: By training machine learning models on historical social media data, the
system could predict which types of content are likely to generate the most engagement, allowing
businesses to plan their social media campaigns more effectively.
Trend Detection: Predicting emerging social media trends based on historical data would allow
businesses to stay ahead of the competition and adjust their strategies in real time.
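
As a very small stand-in for the predictive models proposed here, a least-squares trend line fitted to past engagement can extrapolate the next value. This is only a sketch of the idea under the assumption of a roughly linear trend; the function names and the sample series are invented, and a real forecasting model would need seasonality handling and proper validation.

```python
def fit_trend(values):
    """Least-squares slope and intercept for values observed at x = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast(values, steps_ahead=1):
    """Extrapolate the fitted line steps_ahead periods past the last observation."""
    slope, intercept = fit_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


# Daily engagement totals, invented for illustration only.
daily_engagement = [100, 120, 140, 160]
print(forecast(daily_engagement))
```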

5.2.5 Building a Real-Time Sentiment Monitoring System
Expanding the current system into a real-time sentiment monitoring tool could add significant value
to businesses looking to respond to user feedback instantly:

Real-Time Alerts: Adding a feature that triggers real-time alerts for significant shifts in sentiment
(e.g., a sudden increase in negative comments) would help businesses address potential issues
proactively, improving brand reputation management.
Sentiment Visualization Over Time: Implementing time-series analysis of sentiment trends could
help businesses understand how their audience's emotions change over time, providing insights into
long-term brand perception.
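
The real-time alert idea can be sketched as a rolling window over the most recent sentiment labels that fires when the share of negative items crosses a threshold. The class name, window size, threshold, and minimum sample count below are hypothetical tuning values, not settings from the project.

```python
from collections import deque


class NegativeSpikeAlert:
    """Signal when the share of negative labels in a rolling window is too high."""

    def __init__(self, window=50, threshold=0.4, min_samples=10):
        self.recent = deque(maxlen=window)  # True where the label was negative
        self.threshold = threshold
        self.min_samples = min_samples

    def observe(self, label: str) -> bool:
        """Record one classified comment; return True if an alert should fire."""
        self.recent.append(label == "negative")
        if len(self.recent) < self.min_samples:
            return False  # too little data to judge
        return sum(self.recent) / len(self.recent) >= self.threshold


monitor = NegativeSpikeAlert(window=5, threshold=0.6, min_samples=5)
stream = ["positive", "negative", "negative", "positive", "negative"]
print([monitor.observe(label) for label in stream])
```

Requiring a minimum number of samples avoids alerts triggered by the first one or two negative comments after a quiet period.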

5.2.6 Custom Reporting and Actionable Insights


Developing customizable reporting features could provide even more value to businesses, allowing
them to tailor insights based on specific goals:

Custom Dashboards: Allowing users to create custom dashboards based on specific metrics or
campaigns could make the tool more flexible for different use cases.
Actionable Recommendations: Incorporating automated recommendations based on the data could
guide businesses on how to improve engagement, optimize posting schedules, or adjust content
strategies to maximize results.

APPENDIX

1. Plagiarism Report

2. Design Checklist

1. Data Preparation
✅ Collected data from social media platforms (Twitter, Instagram, Facebook) via API integration
and web scraping techniques.
✅ Performed data cleaning (removed duplicates, noise, irrelevant data like bot activity).
✅ Preprocessed text for sentiment analysis (normalized text, removed special characters, and handled
missing data).
✅ Split the dataset into training, validation, and testing sets for sentiment analysis and performance
evaluation.

2. Model Selection and Configuration


✅ Selected appropriate models for sentiment analysis (e.g., Naive Bayes, Logistic Regression, and
transformer-based models such as BERT).
✅ Configured model hyperparameters (e.g., learning rate, batch size, epochs) for optimized
sentiment classification.
✅ Applied feature extraction techniques using Natural Language Processing (NLP) for extracting
meaningful insights from text data (e.g., word embeddings, TF-IDF).
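As an illustration of the TF-IDF feature extraction mentioned above, the sketch below computes term weights with the standard library only. It uses the plain, unsmoothed formula tf × log(N / df) with whitespace tokenization; these choices and the function name are assumptions for the example, and library implementations (for instance scikit-learn's vectorizer) apply smoothing by default and will give different numbers.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document (whitespace tokenization)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1     # avoid division by zero on empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

A term that appears in every document (such as "phone" in a set of phone reviews) receives weight zero, so the classifier sees only the words that distinguish one post from another.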

3. Visualization Design and Metrics Selection


✅ Selected key social media metrics to track, such as engagement rates (likes, shares, retweets,
comments), follower growth, and sentiment trends.
✅ Designed interactive dashboards using Tableau, including visualizations like:
Time-series graphs for sentiment trends (positive, negative, neutral).
Bar charts for engagement metrics across platforms.
Comparative charts for performance across different social media platforms.
✅ Added filters and interactive elements to allow users to explore data by platform, time range, and
specific posts.
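The metrics listed above can be computed before the data reaches the dashboard. The sketch below is an assumption-laden example rather than the project's code: it defines engagement rate as interactions per follower (one common definition among several) and expects each post to be a dictionary with hypothetical 'date' and 'sentiment' keys.

```python
from collections import Counter, defaultdict

def engagement_rate(likes, shares, comments, followers):
    """Interactions per follower, expressed as a percentage."""
    if followers <= 0:
        return 0.0
    return 100.0 * (likes + shares + comments) / followers

def sentiment_trend(posts):
    """Count positive, negative, and neutral posts per day.

    Each post is a dict with the hypothetical keys 'date' and 'sentiment'.
    """
    trend = defaultdict(Counter)
    for post in posts:
        trend[post["date"]][post["sentiment"]] += 1
    # Sort by date so the result can feed a time-series chart directly.
    return {day: dict(counts) for day, counts in sorted(trend.items())}
```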

4. Dashboard Development and Integration


✅ Integrated real-time data updates from social media APIs to ensure up-to-date visualizations.
✅ Validated data presentation and accuracy in the Tableau dashboard, ensuring clear interpretation
of social media metrics.
✅ Implemented interactive features for users to drill down into specific posts and campaigns to
understand performance.

5. Evaluation and Testing


✅ Defined evaluation metrics for the project, including:
Sentiment classification accuracy (positive, negative, neutral).
Engagement trend analysis (growth in likes, shares, and follower count).
Content performance analysis (identifying top-performing posts).
✅ Tested dashboard performance under varying data loads to ensure it can handle large datasets
without compromising user experience.
✅ Performed error analysis and ensured that the data visualizations remained consistent and accurate
during real-time updates.
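Sentiment classification accuracy, the first evaluation metric above, can be computed together with per-class precision and recall. The following is a minimal sketch under the assumption that true and predicted labels are available as two equal-length lists; the function name and the report layout are illustrative only.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")

def evaluate(y_true, y_pred):
    """Return overall accuracy plus precision and recall for each label."""
    pairs = Counter(zip(y_true, y_pred))          # (true, predicted) -> count
    correct = sum(n for (t, p), n in pairs.items() if t == p)
    report = {"accuracy": correct / len(y_true)}
    for label in LABELS:
        tp = pairs[(label, label)]
        predicted = sum(n for (t, p), n in pairs.items() if p == label)
        actual = sum(n for (t, p), n in pairs.items() if t == label)
        report[label] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return report
```

Reporting per-class figures alongside accuracy matters when the classes are imbalanced, for example when neutral posts far outnumber negative ones.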

USER MANUAL

Prerequisites
Before getting started, ensure the following software is installed on your machine:

1. Python 3.8+: Make sure Python is installed on your system.


2. Git: Install Git for version control.
3. Tableau Desktop: You will need Tableau installed to view the dashboard.
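The first two prerequisites can be checked from Python itself. The helper below is a hypothetical convenience, not part of the project repository; it verifies the interpreter version and looks for git on the PATH. Tableau Desktop has no comparable command-line check, so it still has to be confirmed manually.

```python
import shutil
import sys

def check_prerequisites(min_version=(3, 8)):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_version:
        problems.append("Python %d.%d or newer is required" % min_version)
    if shutil.which("git") is None:
        problems.append("git was not found on PATH")
    return problems

if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```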

Step 1: Clone the GitHub Repository


1. Open your terminal (Command Prompt or Git Bash on Windows).

2. Clone the repository containing the code and data for the social media analytics project by
running git clone followed by the repository URL.

This will download the project files to your local machine.

3. Navigate to the project directory by running cd followed by the name of the cloned folder.

Step 2: Set Up a Virtual Environment (Optional but Recommended)


To isolate the project’s dependencies and avoid conflicts with other Python projects, it’s
recommended to set up a virtual environment:

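1. Create a virtual environment named env:

python -m venv env

2. Activate the virtual environment.

On Windows:

env\Scripts\activate

On macOS or Linux:

source env/bin/activate

After activation, ensure that you see (env) at the start of the command line, indicating the virtual
environment is active.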
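Besides the prompt prefix, activation can be confirmed from inside Python. This short check is an optional extra and not part of the project scripts; it relies on the fact that, in a virtual environment created by venv, sys.prefix points at the environment while sys.base_prefix points at the base installation.

```python
import sys

def in_virtual_environment():
    """True when the interpreter is running inside a venv-style environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

if __name__ == "__main__":
    print("Virtual environment active:", in_virtual_environment())
```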

Step 3: Install Required Dependencies


Inside the project directory, install the necessary Python libraries with pip, for example by running
pip install -r requirements.txt if the repository provides a requirements.txt file.

This installs the required Python libraries, including Pandas, NLP tools, and other essential
packages for data processing. Tableau Desktop is not installed by pip; it is a separate application
listed under Prerequisites.

Ensure all libraries are installed. If there are any issues, update pip with
python -m pip install --upgrade pip and retry the installation.

Step 4: Run the Social Media Analytics Script


Now, you can run the Python script responsible for processing the social media data and pushing
it to the Tableau Dashboard.

From the project directory, run the process_data.py script:

python process_data.py

This script will:

Collect data from various social media platforms (e.g., Twitter, Instagram, Facebook).
Preprocess the data (e.g., cleaning, normalization, sentiment analysis).
Push the processed data to Tableau for visualization.
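To make the last two steps concrete, the sketch below labels each post and writes the result to a CSV file, a format Tableau Desktop can open as a text-file data source. It is not the contents of process_data.py: the column names are hypothetical, the tiny word lists are a toy stand-in for the trained sentiment model, and the real script may hand data to Tableau differently.

```python
import csv

# Toy lexicon used only for illustration; the project uses trained models.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "poor", "hate"}

def label_sentiment(text):
    """Return 'positive', 'negative', or 'neutral' from simple word counts."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def export_for_tableau(posts, path):
    """Write one row per post to a CSV file and return the file path."""
    fields = ["platform", "date", "text", "likes", "sentiment"]  # hypothetical schema
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        for post in posts:
            # Missing keys become empty cells instead of raising an error.
            row = {key: post.get(key, "") for key in fields[:-1]}
            row["sentiment"] = label_sentiment(row["text"])
            writer.writerow(row)
    return path
```

Writing a flat file with one row per post and one column per field keeps the output easy to inspect by hand when a visualization looks wrong.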

Step 5: Visualize Data in Tableau

1. Open Tableau Desktop on your computer.

2. Go to the File menu and select Open.


3. Navigate to the project directory and open the provided Tableau Dashboard file (e.g.,
SocialMediaAnalyticsDashboard.twbx).

4. Once opened, the dashboard will load the processed data, and you’ll be able to interact with
various visualizations, including:

Sentiment Analysis: View the sentiment distribution (positive, negative, neutral) across different
social media platforms.
Engagement Metrics: Track likes, comments, shares, and follower growth.
Top-Performing Content: Identify posts with the highest engagement.
Time Filters: Use filters to explore data over specific time ranges.

Step 6: Explore the Interactive Dashboard


The Social Media Analytics Dashboard includes various features that allow you to interact with
and explore the data:

a. Platform Filters: Filter the visualizations by platform (e.g., Twitter, Instagram, Facebook) to
focus on specific social media accounts.
b. Time Range Selector: Adjust the time frame to analyze engagement and sentiment trends over
specific periods (e.g., last week, last month).
c. Content Performance: Dive deeper into the performance of individual posts and campaigns,
identifying which content drives the most interaction.
d. Sentiment Drill-Down: Explore how different types of posts (e.g., video, image, text) affect
user sentiment over time.

Step 7: Save and Export Reports

1. Once you’ve customized the dashboard with filters and explored the insights, you can save
your work:
Go to File > Save As to save a customized version of the dashboard.

2. To export a report:
Use File > Export to create a PDF or image file summarizing the key insights from your social
media data.

Troubleshooting

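1. Issue: "pip is not recognized" or missing dependencies.

Solution: Ensure that Python and pip are installed correctly and that Python is on your PATH. If
the pip command itself is not found, invoke it through Python and update it with the command
python -m pip install --upgrade pip, then reinstall the dependencies.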

2. Issue: Data not loading into Tableau.

Solution: Ensure that the process_data.py script has been executed correctly and that the
output file is correctly formatted for Tableau.
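When the data does not load, a quick look at the output file's header often reveals the cause. The helper below is a hypothetical diagnostic that assumes the script writes a CSV file; the required column names are placeholders and should be replaced with the fields the dashboard actually uses.

```python
import csv

def missing_columns(path, required=("platform", "date", "sentiment")):
    """Return the required column names that are absent from the CSV header."""
    with open(path, newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])   # empty file -> empty header
    return [name for name in required if name not in header]
```

An empty list means every expected column is present; any names returned point to fields the dashboard will not be able to find.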
