Final Report Mahnoor
1. Introduction
1.1 Background
1.2 Problem Statement
Data Diversity
Volume and Complexity
Sentiment and Engagement
Behavioral Patterns
1.3 Objectives
1.4 Datasets Used
Dataset 1: Social Media Posts Dataset
Dataset 2: Social Media Usage and Emotional Well-Being (Testing Dataset)
1.5 Report Structure
2. Literature Review
2.1 Introduction
2.2 Social Media as a Data Source
2.3 Main Fields of Social Media Data Analysis
2.3.1 Sentiment Analysis
2.3.2 Identifying Trends and Topics
2.3.3 User Interaction and Usage Pattern Study
2.3.4 Social Network Analysis (SNA)
2.4 Methods for the Analysis of Social Media Data
Foundations of Mathematics and Computing off Stream in Social Media Analysis
Polynomial Approximations
Homomorphic Encryption
Visualization and Modeling
2.4.1 Natural Language Processing (NLP)
2.4.2 Machine Learning and Artificial Intelligence
2.4.3 Data Visualization
2.5 Difficulties in Analyzing Social Media Data
2.6 Gaps in the Literature
2.7 Conclusion
3. Methodology
3.1 Data Collection
3.1.1 Twitter & Facebook Dataset
3.1.2 Social Media Usage and Emotional Well-being Dataset
3.1.3 Ethical Considerations
3.2 Data Preprocessing
3.2.1 Preprocessing of Twitter Data
3.2.2 Data Cleaning of the Emotional Well-Being Dataset
3.3 Sentiment Analysis
3.4 Engagement Analysis
3.4.1 Preprocessing Twitter & Facebook Data
3.4.2 Preprocessing the Emotional Well-Being Dataset
3.4.3 Sentiment Analysis
3.5 Advanced Computational Techniques
3.6 Social Network Analysis
3.7 Visualization
3.8 Reporting and Recommendations Encompassing
3.9 Limitations
4. Results, Discussion, and Recommendations
4.1 Results
4.1.1 Twitter Dataset Analysis
4.1.2 Well-Being Dataset Analysis
4.1.3 Facebook Dataset Analysis
4.2 Discussion
4.2.1 Engagement and Sentiment
4.2.2 Platform-Specific Usage
4.2.3 Topic Trends
4.3 Recommendations
4.3.1 For Businesses
4.3.2 For Policymakers
4.3.3 For Researchers
4.3.4 Conclusion
5. Conclusion
5.1 Overview of the Study
5.2 Key Findings
5.2.1 Engagement Patterns
5.2.2 Emotional States and Usage
5.2.3 Trends and Topics
5.3 Contributions of the Study
5.4 Limitations
5.4.1 Dataset Completeness
5.4.2 Platform Scope
5.4.3 Emotional Data Representation
5.4.4 Topic Modeling: Subjectivity
5.5 Recommendations
For Businesses
For Researchers
5.6 Future Directions
5.7 Final Remarks
6. Future work
6.1 Expanding Data Sources
6.2 Improving the Methods of Sentiment Analysis
6.3 Understanding the level of engagement and creating a user profile
6.4 Moving Average, Trends and Topics
6.5 Advanced Social Network Analysis
6.6 Key Applications of Real-time Analytics
6.7 Professional Standards and Ways to Eliminate Biases
6.8 Conclusion
References
Table of Figures
Figure 1: The overview of Twitter Dataset
Figure 2: Average Engagement Metrics Result of Data in MATLAB
Figure 3: Results of Sentiment Analysis of Data in MATLAB
Figure 4: Results of Average Sentiment Score of Data in MATLAB
Figure 5: Scatter Plot of Sentiment Analysis in MATLAB
Figure 6: Results of Word Cloud of Data in MATLAB
Figure 7: LDA Topic Modelling in MATLAB
Figure 8: Topic Modeling of Data in MATLAB
Figure 9: Hashtag Analysis of Data in MATLAB
Figure 10: Well-Being Dataset Overview using MATLAB
Figure 11: Analysis of Engagement Patterns in MATLAB
Figure 12: Correlation Analysis of Data in MATLAB
Figure 13: Facebook Dataset Overview using MATLAB
Figure 14: Temporal Trend of Open Analysis in MATLAB
Figure 15: Temporal Trend of Close Analysis in MATLAB
Figure 16: Temporal Trend of High Analysis in MATLAB
Figure 17: Temporal Trend of Low Analysis in MATLAB
Figure 18: Temporal Trend of Volume Analysis in MATLAB
1. Introduction
1.1 Background
Social networks such as Facebook, Twitter, Instagram, LinkedIn, and TikTok have become an
important part of people's daily lives (Bengtsson and Johansson, 2022). These platforms
generate a significant amount of user content, including text, images, videos, and metadata, all
of which enhance our knowledge of users' willingness, attitudes, and behaviour. However, this
abundance of data is also a major problem, primarily because of its volume, variety, and
heterogeneity. Such data can no longer be analyzed manually, so automated approaches are
needed to extract the desired information (Corvite and Hui, 2024).
Social media data analysis has emerged as a critical tool for organizations, marketers, and
researchers who seek to understand how users interact with content, the underlying affective
states of such engagement, and the processes occurring in social media (Horova et al., 2024).
Recent research has incorporated machine learning, natural language processing, and sentiment
analysis to analyze social media data. MATLAB is well suited to this kind of analysis because it
is an effective computational platform, especially for statistics, machine learning, and data
visualization (Chukwunweike et al., 2024). Using MATLAB's built-in functions, this study
analyzes user behaviors, perceptions, and interactivity across different social media platforms
and offers suggestions that organizations can use to improve their outcomes (Dunsin et al.,
2024).
This paper uses MATLAB to analyze social media data, with particular attention to user
interaction, sentiment analysis, and trends (Najafabadi, Skryzhadlovska and Valilai, 2024). The
research examines the nature of user engagement and how emotions affect users' behaviour. It
aims to draw practical conclusions, enhance user interaction on social media, and inform
effective business, marketing, and content strategies for content producers.
Data Diversity:
Social media data consists of different types, including text, images, video, and metadata, which
need to be analyzed and integrated in different ways (Afyouni, Aghbari and Razack, 2021).
Behavioral Patterns:
Recognizing regularities in user interactions and engagement levels, both in general and across
individual social networks, remains challenging because of cultural differences and because the
impact and effects of each network differ.
This research addresses these challenges by developing a MATLAB-based method for analyzing
social media data. The purpose is to create models that analyze user interactions, determine
user attitudes, and detect new network trends. Specifically, this study will be useful for
businesses, marketers, and researchers by offering MATLAB-based instruments that help them
understand social media interactions and make better-informed decisions (Bilinski, 2022).
1.3 Objectives
This research will analyze social media data to uncover patterns in user engagement, sentiment,
and trending topics. The specific objectives of this project are as follows:
To track common tendencies in user behavior by comparing activity levels across
different indicators (likes, shares, comments, and mentions) on different social
networks.
To analyze the correlation between sentiment and user engagement by classifying social
media posts as positive, negative, or neutral and relating that classification to the
level of shares, comments, and likes they generate.
To discover and categorize the currently most popular topics by using keyword
frequency analysis and topic modelling to determine which subjects are most actively
discussed and how these trends develop in
different social networks.
To compare engagement and positive and negative tone across user age, gender, and
geographic location, and to assess how these factors affect online behavior and social
media platform interaction.
To analyze and present the major results found in the data, such as users' overall
sentiment and frequency of activity, and to support the assessment of the discovered
patterns with charts, graphs, and word clouds.
To provide practical recommendations for businesses, marketers, and content producers
based on the analysis, such as content plans, interaction techniques, and enhancements to
the user experience on social networks.
The combination of these datasets enables the assessment of users' activity and mood, the
identification of patterns in the relationship between the two, and the comparison of
different users and platforms.
Chapter 2:
Literature Review: This chapter will present a review of the literature on social media data
analysis, with emphasis on sentiment analysis, topic modeling, and engagement analysis. It will
discuss the difficulties and approaches applied in the field and provide the background for this
study.
Chapter 3:
Methodology: This chapter will describe the data analysis process for the collected datasets,
including data cleaning, sentiment analysis, topic modeling, and demographic analysis. It will
also describe in detail the application of MATLAB's computational tools.
Chapter 4:
Results and Discussions: This chapter will present the findings of the analysis, highlighting
the main observations on user engagement, attitude, and trends. The findings will be discussed
in light of prior studies, and conclusions will be drawn concerning users' behavior and social
media usage.
Chapter 5:
Conclusion and Recommendations: This last chapter will synthesize the study's results, evaluate
the research's constraints, and provide recommendations for businesses, marketers, and content
producers based on the study's findings.
2. Literature Review
2.1 Introduction
Social sites such as Facebook, Twitter, Instagram, LinkedIn, and TikTok generate vast amounts
of user content daily. This data is diverse and dynamic; it records the ongoing interactions of
billions of users across the globe (Camacho et al., 2020). It captures users' activity,
behavior, perceptions, and trends through status updates, comments, likes, posts, links,
pictures, videos, and audio. Analyzing this data has therefore become an important activity for
businesses, researchers, and policymakers, who need to monitor the opinions and behavior of the
population and its interactions with products and services
(Habib and Raza, 2021).
The issue is that social media data is not uniform; it includes quantitative data such as
timestamps and user characteristics and qualitative data such as text, images, and videos. This
data is diverse and complex, and it needs to be analyzed using methods that go beyond
conventional approaches (Nasir et al., 2021). Basic computational tools are not enough to obtain
useful information and knowledge; more sophisticated techniques, including machine learning,
sentiment analysis, topic modeling, and social network analysis, are required. These techniques
support pattern mining, predictive modeling, and trend analysis, which are
important in fast-moving environments (Najafabadi, Skryzhadlovska and Valilai, 2024).
In addition, the data produced on social platforms is significant not only for the platforms
themselves but also for wider societal goals, including election campaigns, disaster response,
and public health efforts (Pierce et al., 2021). For instance, during the COVID-19 health
crisis, social media was widely used to inform the public and to gauge its opinions on
healthcare policies and vaccines. However, the scale and diversity of social media data require
a strong analytical framework and tools to realize its potential
(Nagaraj and J, 2021).
This chapter reviews the literature on social media data analysis with respect to methods,
tools, and issues. It also points out the areas in the literature that need improvement, thus
indicating a trajectory for the expansion of the research area (Dukovski et al., 2021). The goal
of this study is to help establish the current state of social media data analysis and
to help build a framework for extracting valuable information from social media data (Busalim,
Ghabban and Hussin, 2020).
Another important feature of social media data is its diversity. Accounts, posts, comments,
likes, shares, and multimedia content such as images and videos together form a
multidimensional dataset. Each of these elements offers a different view of user activity and
interaction. For instance, frequency analysis of hashtags can help identify hot topics or
trends, while users' comments may indicate the public's perception of a certain brand, event, or
policy (Chukwunweike et al., 2024).
However, this diversity is also a major problem. Most of
the data collected from social media is unstructured and, therefore, cannot be easily analyzed
using conventional tools. According to Corvite and Hui (2024), managing the amount and
diversity of this data is a challenging task and requires proper preprocessing and analysis tools.
Preprocessing techniques such as text cleansing, noise reduction, and feature creation
help transform raw input into usable forms.
Another important characteristic of social media as a data source is that it is produced in real
time. Whereas other types of datasets are static, social media data is dynamic: it captures a
process as it unfolds. That is why social media is so useful for monitoring current events,
popular topics, and conflicts. For example, people turn to social networks for updates during
natural disasters, and authorities can use this activity to identify regions requiring attention
and organize the necessary assistance (Nandi and Sharma, 2020).
However, the benefits of social media data can be realized only if several issues are
addressed. The data is frequently 'noisy' and contains many unrelated items, including
colloquialisms, acronyms, and sarcasm, which can skew the results if they are not handled. In
addition, the ethical issues of using data that can identify users and their personal details
must always be addressed.
Altogether, social media can be viewed as a major source of big data on people's
behavior and opinions and on overall trends in society (Rodriguez and Storer, 2019). However,
making use of it requires strong analytical tools that can capture the complexity of the data
and solve the problems associated with it. This highlights the value of platforms such as
MATLAB, which provide capabilities for managing, analyzing, and visualizing large volumes of
social media data for research (Camacho et al., 2020).
Hailong et al. (2014) point out that lexicon-based methods use sentiment dictionaries that
assign sentiment values to words. Although simpler than machine learning methods, they can be
less effective in capturing context, irony, and other linguistic patterns. Machine learning
techniques, on the other hand, use supervised learning algorithms for more accurate
sentiment prediction. Techniques such as Support Vector Machines (SVMs), Random
Forests, and Neural Networks have performed far better in sentiment classification (Bengtsson and
Johansson, 2022).
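The lexicon-based idea can be illustrated with a minimal MATLAB sketch. The word list, the weights, and the example posts are hypothetical and chosen only for illustration; this is not the lexicon or the scoring procedure used in the study.

```matlab
% Toy sentiment lexicon (hypothetical words and weights, for illustration only).
lexiconWords  = {'good', 'great', 'love', 'bad', 'terrible', 'hate'};
lexiconScores = [1, 2, 2, -1, -2, -2];

% Hypothetical example posts.
posts = {'I love this great phone', ...
         'Terrible service and bad support', ...
         'The package arrived on Monday'};

scores = zeros(1, numel(posts));
for i = 1:numel(posts)
    % Lowercase the post and keep runs of letters as tokens.
    tokens = regexp(lower(posts{i}), '[a-z]+', 'match');
    % Look up each token; words missing from the lexicon contribute 0.
    [found, idx] = ismember(tokens, lexiconWords);
    scores(i) = sum(lexiconScores(idx(found)));
end

% Map the numeric score to a sentiment label.
labels = cell(size(scores));
labels(scores > 0)  = {'positive'};
labels(scores < 0)  = {'negative'};
labels(scores == 0) = {'neutral'};
```

The sketch also shows the weakness noted above: a phrase such as "not bad" would receive a score of -1 because each word is scored without regard to context.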
Transformer-based models such as BERT (Bidirectional Encoder Representations from
Transformers) have recently become prominent in sentiment analysis. Contextual
embeddings and self-attention mechanisms allow BERT models to capture the dependencies between
the words in a sentence better than traditional models do. For example, sarcasm
identification, a well-known problem in sentiment analysis, has been reported to benefit from
such models (Bilinski, 2022).
For business applications, sentiment analysis is indispensable for monitoring customers'
opinions, observing the public mood during an election campaign, and tracing changes in
consumers' perceptions of a brand. For instance, during a product launch, firms can monitor the
social media response, look for problems that may arise, and then solve them. This dynamic
capability shows that sentiment analysis is valuable in the current business environment
(Chukwunweike et al.,
2024).
Keyword frequency analysis is one of the simplest ways to identify trends, based on how often
specific terms appear in a dataset. More complex methods such as Latent Dirichlet Allocation
(LDA) perform topic modeling by detecting latent thematic structure within
large text corpora. LDA groups words that often co-occur, revealing clusters of related themes
(Corvite and Hui, 2024).
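A minimal sketch of keyword frequency counting is given below, using only base MATLAB functions. The example posts are hypothetical. Topic modeling with LDA is not shown; in MATLAB it would typically rely on the Text Analytics Toolbox (for example `bagOfWords` and `fitlda`), which is assumed rather than demonstrated here.

```matlab
% Hypothetical example posts.
posts = {'Climate change is real', ...
         'Climate policy needs change', ...
         'New phone launch today'};

% Tokenize all posts into one flat list of lowercase words.
allTokens = {};
for i = 1:numel(posts)
    allTokens = [allTokens, regexp(lower(posts{i}), '[a-z]+', 'match')];
end

% Count how often each distinct term occurs.
[terms, ~, termIdx] = unique(allTokens);
counts = accumarray(termIdx(:), 1);

% Rank terms from most to least frequent.
[sortedCounts, order] = sort(counts, 'descend');
sortedTerms = terms(order);
```

Here the two most frequent terms are "change" and "climate", each appearing twice, which is the kind of signal a trend analysis would pick up before any topic model is fitted.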
In addition, temporal analysis enhances trend identification because it examines how the
frequency of topics changes over time. For instance, hashtags such as #MeToo or #ClimateChange
may show spikes in activity during certain events.
Knowing these temporal patterns enables an organization to address user concerns as they arise
(Afyouni, Aghbari and Razack, 2021).
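As a rough illustration of temporal analysis, the sketch below smooths daily hashtag counts with a trailing moving average and flags spike days. The counts, the 3-day window, and the "twice the median" threshold are all assumptions made for this example, not values taken from the study.

```matlab
% Hypothetical daily counts of posts containing one hashtag over ten days.
dailyCounts = [12 15 11 14 13 90 85 20 16 14];

% Trailing 3-day moving average; early days use however many days are available.
windowSize = 3;
smoothed = zeros(size(dailyCounts));
for d = 1:numel(dailyCounts)
    firstDay = max(1, d - windowSize + 1);
    smoothed(d) = mean(dailyCounts(firstDay:d));
end

% Flag days whose count exceeds twice the median daily count (assumed threshold).
baseline = median(dailyCounts);
spikeDays = find(dailyCounts > 2 * baseline);
```

With these numbers, days 6 and 7 are flagged as spikes. The median is used as the baseline because it is less affected by the spike itself than the mean would be.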
Research has also examined dynamic topic modeling, which tracks shifts in topic frequency over
time. When used with real-time monitoring tools, such methods can uncover trends as
they emerge, making them well suited to real-time decision-making under high
uncertainty, such as in marketing or disaster management (Gkikas et al., 2022).
However, social media data is beneficial only if several challenges are solved. In most
cases, the data is 'noisy': it contains a great deal of unwanted information, such as informal
language, slang, abbreviations, and irony. Moreover, the ethical considerations of using
users' information must be observed closely, especially when the analyzed data is
classified as private or contains an individual's personal information (Dunsin et al., 2024).
Demographics are another important factor in usage patterns. Audience segmentation
classifies users by age, sex, geographical location, and interests to
better understand the audience's communication needs. For example, short and funny
videos may be popular among young people on TikTok, while working people interested in
professional content may look for curated, serious posts on LinkedIn (Corvite and Hui,
2024).
Moreover, combining demographics and natural language processing provides more information
about user behaviour trends. For instance, genders may differ in the strength of their affective
reactions and emotional attitudes towards specific topics, which shapes how they relate to
content. This information is useful for developing targeted advertising messages and improving
customer retention (Afyouni, Aghbari and Razack, 2021).
Analyzing interaction with social media content requires focusing on user engagement and
usage behaviour. This field uses likes, shares, follows, comments, and retweets (Nandi and
Sharma, 2020). These metrics characterize the content and serve as quantitative measures of
its popularity and of its reach in the user community.
Network properties are usually described through parameters such as centrality,
density, and clustering coefficients. Centrality measures identify the most popular users, or
the users with the largest audience or the most connections in the network. Such users are
referred to as opinion leaders, and they strongly influence opinion and engagement.
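The three network measures named above can be computed directly from an adjacency matrix, as in the following minimal sketch. The five-user network is hypothetical. Base MATLAB also provides `graph` and `centrality` for larger networks; plain matrix operations are used here to make the definitions explicit.

```matlab
% Hypothetical undirected interaction network of five users.
% A(i,j) = 1 means users i and j interacted (reply, mention, or retweet).
A = [0 1 1 1 0;
     1 0 1 0 0;
     1 1 0 0 0;
     1 0 0 0 1;
     0 0 0 1 0];

n = size(A, 1);

% Degree centrality: number of direct connections per user.
degree = sum(A, 2)';

% Density: existing edges divided by the maximum possible, n*(n-1)/2.
numEdges = sum(A(:)) / 2;
density = numEdges / (n * (n - 1) / 2);

% Local clustering coefficient: fraction of a user's neighbour pairs that are
% themselves connected. diag(A^3) counts closed 3-step walks, which is twice
% the number of triangles through each node.
triangles = diag(A^3)' / 2;
possiblePairs = degree .* (degree - 1) / 2;
clustering = zeros(1, n);
hasPairs = possiblePairs > 0;
clustering(hasPairs) = triangles(hasPairs) ./ possiblePairs(hasPairs);

% The user with the highest degree is a candidate opinion leader.
[~, mostCentral] = max(degree);
```

In this example user 1 has the highest degree (3) and would be the candidate opinion leader, while the overall density of 0.5 indicates that half of all possible connections exist.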
SNA is most useful for identifying communities that share similar interests
or activity (Qalati et al., 2021). By identifying groups of connected users, researchers can
target them with more relevant content. SNA is also helpful in tracking
the spread of messages, whether positive or negative, which supports the development of
mechanisms that discourage the spread of negativity.
One of the more challenging applications of SNA is sentiment propagation analysis, which
examines how sentiments spread through a network. Such information can help an
organization predict a campaign's impact or identify potential problems before they emerge
(Chakraborty, Bhattacharyya and Bag, 2020).
Homomorphic Encryption
One of the biggest issues with social media analytics is the difficulty of preserving privacy.
Homomorphic encryption addresses this issue by allowing computation on encrypted data, which
protects users' confidentiality. Using encryption methods in MATLAB, intensive analysis of
relevant and often sensitive data, such as mental health indicators or demographic data, can be
performed safely (Anzano-Oto, Vázquez-Toledo and Latorre-Cosculluela, 2023).
Preprocessing steps such as stop word removal, stemming, and lemmatization
are essential for preparing text data for analysis. MATLAB has functions that make these
processes easier and more uniform to perform. The data can then be analyzed using a
lexicon-based method or, when the dataset is large, more complex machine learning classifiers
(Chukwunweike et al., 2024).
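A minimal sketch of text preprocessing with base MATLAB functions follows. The stop word list and the example post are assumptions made for illustration; stemming and lemmatization are omitted. In practice the Text Analytics Toolbox (for example `tokenizedDocument`, `removeStopWords`, and `normalizeWords`) would usually handle these steps, and the sketch does not claim to reproduce the study's pipeline.

```matlab
% Small illustrative stop word list (an assumption, not a standard list).
stopWords = {'the', 'is', 'a', 'and', 'of', 'to', 'in', 'this'};

% Hypothetical raw post.
rawPost = 'Check this out: the NEW phone is amazing!!! https://example.com @shop #launch';

% 1. Remove URLs and @mentions; hashtags are kept.
cleaned = regexprep(rawPost, 'https?://\S+', '');
cleaned = regexprep(cleaned, '@\w+', '');

% 2. Lowercase, then keep only words and hashtags as tokens.
%    Punctuation and other special characters are dropped by the pattern.
tokens = regexp(lower(cleaned), '#?[a-z]+', 'match');

% 3. Drop stop words.
tokens = tokens(~ismember(tokens, stopWords));
```

The remaining tokens are `check`, `out`, `new`, `phone`, `amazing`, and `#launch`, which can then be passed to a frequency count or a sentiment scorer.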
LDA and other topic models allow latent themes to be extracted from the text. When these
topics are visualized, researchers gain a better view of user and market trends (Lee,
Wood and Kim, 2021).
2.4.2 Machine Learning and Artificial Intelligence
Machine learning methods have brought major advances in social media analytics by enabling
predictive modeling and pattern analysis. MATLAB's machine learning tooling offers a
comprehensive set of algorithms, ranging from simpler ones such as Support Vector Machines to
complex neural networks (Kurani et al.,
2021).
Besides sentiment classification, machine learning models can estimate the level of user
engagement based on historical data. For instance, they can predict the chance that a
particular post will go viral. These predictions help businesses align their content
marketing strategies.
Line graphs, aided by color coding, show trends in sentiment or engagement over time as
influenced by events or changes in user behavior. Word clouds, in contrast, show which
terms or hashtags appear most often in the given topic area.
Graph plots are especially useful for SNA because they represent users and their relationships.
Through such networks, researchers can pinpoint the key opinion leaders, group the users, and
track the flow of information in social media (Camacho et al., 2020).
Real-Time Analysis: Social media is constantly changing, so data has to be collected live to
capture evolving trends. Building large systems that can process data in real-time is a technical
problem.
Ethical Concerns: Processing personal data raises privacy issues, especially under the GDPR.
Researchers must comply with the regulation and use anonymization measures to safeguard
users' identities (Bengtsson and Johansson, 2022).
Integration of Multimodal Data: Many works are based on textual data, while the potential of
images, videos, and emojis is not considered (Tembhurne and Diwan, 2020).
Real-Time Monitoring: Although post-event analysis is typical, monitoring social media activity
in real-time is still a relatively uncharted area.
Cross-Language and Cross-Platform Analysis: Little work has been done in the area of cross-
language and cross-platform analysis, which could give a better picture of the trends happening
around the world.
Demographic and Psychological Insights: Demographic segmentation is widely used, but more
needs to be known about the psychological factors that affect the use of social networks (Nasir et
al., 2021).
2.7 Conclusion
Social media data analysis is a relatively new and growing area of study with important
implications for organizations, scholars, and governments. Sentiment tracking, trend
identification, and social network mapping, among others, are useful approaches to understanding
users and social trends. Software such as MATLAB can handle large and
complex data and gives researchers visualization tools that reveal significant
relationships and patterns.
Nevertheless, difficulties such as noisy data, ethical issues, and quickly evolving trends
demonstrate the need for new methods. Further work on multimodal integration, real-time
monitoring, and
cross-platform analysis will contribute to improving the understanding of social media's role in
society.
3. Methodology
This research takes a systematic approach to examining social media data. It leverages three
datasets: (1) a Twitter dataset containing tweets, user activity, tweet-level engagement, and
trends related to a specific worldwide political event; (2) a more structured dataset collected
through surveys, which provides demographic information about users, their emotional state, and
their engagement habits; and (3) a Facebook dataset used to identify the users who could be
targeted to grow the business. These insights should help Facebook make informed decisions
about which users are most valuable and what to recommend to them. MATLAB is the main
computational environment and provides toolboxes for data manipulation, analysis, and
visualization. The workflow consists of data collection, detailed preprocessing, analysis with
advanced computational tools, and interpretation of the results.
During collection, the Twitter API's rate limits were handled through paginated requests, with
data extracted over several days. This approach provided sufficient temporal coverage of user
activity during the most important moments of the event, for instance, debates or the
declaration of the results.
Engagement Metrics: Number of posts, likes, comments, and messages sent daily.
Emotional States: Each user's dominant emotion, drawn from a predetermined set such as
happiness, sadness, and anxiety.
This dataset allowed for demographic segmentation, making it easy to compare engagement and
emotional differences among users. In contrast to the Twitter dataset, it needed little
preprocessing before analysis since it was already structured.
First, noise, including URLs, mentions (@username), and special characters, was removed from
the data. Hashtags were kept for trend analysis even though they are usually noisy. Emojis
carrying sentiment information were translated to text using a custom dictionary (e.g., 😊 →
happy), which made it possible to include them in the sentiment analysis.
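The cleaning step above can be sketched as follows. The report's processing was done in MATLAB; this is only an illustrative Python sketch, and the emoji dictionary, the regular expressions, and the function name `clean_tweet` are assumptions, since the report does not list its custom dictionary or exact rules.

```python
import re

# Hypothetical emoji-to-text dictionary; the report's custom dictionary is not given.
EMOJI_TO_TEXT = {"😊": "happy", "😢": "sad"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Keep word characters, whitespace, and '#' so hashtags survive for trend analysis.
SPECIAL_RE = re.compile(r"[^\w\s#]")


def clean_tweet(text):
    """Remove URLs, mentions, and special characters; map known emojis to words."""
    # Replace sentiment-bearing emojis first, before symbols are stripped.
    for emoji, word in EMOJI_TO_TEXT.items():
        text = text.replace(emoji, f" {word} ")
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    # Collapse the whitespace left behind by the substitutions.
    return " ".join(text.split())


print(clean_tweet("Loving the debate 😊 @user1 https://example.com #Elections!"))
# prints: Loving the debate happy #Elections
```

Emojis that are missing from the dictionary are dropped as special characters, so the dictionary's coverage directly limits how much emoji sentiment is retained.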
The text was then tokenized, separating it into individual words to allow word-by-word
analysis. The tokens were reduced to their base form, for example mapping 'running' and 'ran'
to 'run.' Stop words such as 'and' and 'the' were omitted so that only meaningful words were
considered. Lastly, to focus on tweets that convey an emotion, those with little semantic
content, such as tweets containing only URLs or vague words, were removed.
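A minimal Python sketch of tokenization, stop-word removal, and base-form reduction is given here for illustration; it is not the MATLAB routine used in the study. The stop-word subset, the `IRREGULAR` lookup, and the crude suffix rules are all assumptions. Note that plain suffix stripping cannot turn 'ran' into 'run', which is why the sketch needs a small lookup for irregular forms.

```python
import re

STOP_WORDS = {"and", "the", "a", "an", "of", "to", "is", "in"}  # illustrative subset
IRREGULAR = {"ran": "run"}  # hypothetical lookup for irregular verb forms


def stem(word):
    """Very crude suffix stripper; real stemmers and lemmatizers have many more rules."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            base = word[: -len(suffix)]
            # Undo consonant doubling, e.g. "runn" -> "run".
            if base[-1] == base[-2]:
                base = base[:-1]
            return base
    return word


def preprocess(text):
    """Lowercase, tokenize into alphabetic words, drop stop words, reduce to base form."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("She ran and the dogs were running"))
# prints: ['she', 'run', 'dog', 'were', 'run']
```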
Timestamps in both datasets were converted to MATLAB's datetime format and then grouped into
hourly, daily, and weekly bins. This standardization supported analysis of the temporal
dynamics of users' activity and sentiment.
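The binning idea can be illustrated with the sketch below, written in Python rather than the MATLAB datetime workflow the study used. The function name `bin_counts`, the choice of Monday as the start of a week, and the sample timestamps are assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta


def bin_counts(timestamps, freq):
    """Count events per bin; freq is 'hour', 'day', or 'week' (weeks start on Monday)."""
    counts = Counter()
    for ts in timestamps:
        if freq == "hour":
            key = ts.replace(minute=0, second=0, microsecond=0)
        elif freq == "day":
            key = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        elif freq == "week":
            day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
            key = day - timedelta(days=day.weekday())
        else:
            raise ValueError(f"unknown frequency: {freq}")
        counts[key] += 1
    return dict(sorted(counts.items()))


# Hypothetical post timestamps, parsed from strings into datetime objects.
raw = ["2024-03-04 09:15:00", "2024-03-04 09:50:00",
       "2024-03-05 18:05:00", "2024-03-12 08:00:00"]
stamps = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in raw]

for start, n in bin_counts(stamps, "week").items():
    print(start.date(), n)
```

The same function can be called with 'hour' or 'day' to obtain the finer-grained series used for temporal plots.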
3.4.3 Sentiment Analysis
Sentiment analysis classified the emotional polarity of tweets using a hybrid of lexicon-based
and machine learning methods.
The lexicon-based analysis used sentiment dictionaries available in MATLAB. The sentiment score
of each tweet was obtained by adding the sentiment scores of the words that made up the tweet.
For instance, a positive score was obtained from the word "love" and the "😊" emoji.
Nevertheless, a fixed lexicon was unsuitable for context-dependent phrases, such as sarcasm or
ambiguous language, which made machine learning classifiers more appropriate for those cases.
For the machine learning approach, features were extracted with Term Frequency-Inverse
Document Frequency (TF-IDF), which weights the importance of terms in the documents. Support
Vector Machine (SVM) and Random Forest classifiers, which depend on labelled datasets, were
trained to predict the sentiment category of each text (positive, negative, or neutral). These
models were assessed using the F1-score and Area Under the Curve (AUC), and the reported scores
were above 85%.
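To make the TF-IDF weighting concrete, the following Python sketch computes it from scratch. It is an illustration under stated assumptions, not the study's MATLAB feature extraction: it uses tf = count / document length and idf = ln(N / df), whereas toolboxes often apply smoothed variants, and the three sample documents are invented. The classifier training step is outside the scope of the sketch.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already tokenized documents.
docs = [["love", "this", "product"],
        ["hate", "this", "product"],
        ["love", "love", "it"]]
w = tf_idf(docs)
print(w[1])
```

In the second document, 'hate' receives a higher weight than 'this' or 'product' because it appears in only one document, which is the property that makes TF-IDF features useful to a downstream classifier.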
Several anonymization measures were integrated into the analysis to protect sensitive user
information. Specifically, we used CKKS homomorphic encryption to perform computations on
encrypted data so that raw user information was not revealed during the computation. For
example, demographic and sentiment data were encrypted before analysis so that they could be
processed while preserving the anonymity of the users.
Training was accelerated with optimizers such as Nesterov Accelerated Gradient (NAG), which
roughly halved the computational time needed for various models to converge when training on
large datasets. NAG was most beneficial in sentiment classification and engagement prediction,
where speed matters because multiple runs are performed.
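The NAG update rule can be illustrated on a toy problem. The Python sketch below shows only the look-ahead gradient step that distinguishes NAG from plain momentum; the learning rate, momentum value, step count, and the one-dimensional quadratic objective are assumptions chosen for the example and do not reproduce the study's models or its reported speed-up.

```python
def nag_minimize(grad, x0, lr=0.1, momentum=0.9, steps=200):
    """Minimize a one-dimensional function with Nesterov Accelerated Gradient."""
    x, v = x0, 0.0
    for _ in range(steps):
        # Evaluate the gradient at the look-ahead point, not at the current x.
        lookahead = x + momentum * v
        v = momentum * v - lr * grad(lookahead)
        x = x + v
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); the minimum is at x = 3.
x_min = nag_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(round(x_min, 6))
```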
Degree centrality and betweenness centrality were calculated to identify the key users. Degree
centrality measured the number of first-degree connections a user had, while betweenness
centrality measured a user's capacity to bridge different sections of the network. Users with
high centrality scores were identified as opinion leaders, that is, influential users. These
users were very active in sharing content and in encouraging other users to share the content
they posted.
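Both centrality measures can be sketched on a small graph. The Python code below is an illustration, not the study's MATLAB computation: it assumes an unweighted, undirected interaction graph stored as an adjacency dictionary, uses Brandes' algorithm for betweenness, and reports unnormalized scores. The five-user graph is hypothetical.

```python
from collections import deque


def degree_centrality(adj):
    """Number of direct neighbours per node."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair is counted from both endpoints, so halve the totals.
    return {v: score / 2 for v, score in bc.items()}


# Hypothetical interaction graph: user "c" bridges the a-b group and the d-e chain.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
print(degree_centrality(graph))
print(betweenness_centrality(graph))
```

In this example, user "c" has both the highest degree and the highest betweenness, which is the profile the text associates with opinion leaders.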
Clustering algorithms were applied to the network to detect communities. For instance, users
interacting with hashtags like #ClimateChange were grouped together because they share an
interest in climate issues. These clusters were then shown in modularity-based network graphs
to illustrate the users' connectedness.
Social network analysis (SNA) also helped identify content-sharing trends. For instance, the
study showed that the centrality of the users was a significant factor in the probability of
their tweets being retweeted, supporting the view that such users are opinion leaders.
3.7 Visualization
Visualization was an important part of the analysis as it offered easy-to-understand data
representations. Word clouds presented frequently used words and hashtags, highlighting the
most popular items. For instance, the word cloud for the tag #Elections consisted of words such
as vote, poll, and debate, which were the focus of the discussions.
Sentiment and engagement were analyzed over time with temporal plots. These plots showed
trends, such as increased positive tone during an important announcement or high activity
during sensitive discussions. The plots also gave the timing of user activity and the
progression of event-related discussions from start to end.
The social network diagrams depicted how the network was organized and how the users were
connected. These diagrams emphasized influential nodes (users) in the Twitter network and the
connections that guided the flow of information within it. When combined with the community
detection results, the diagrams clearly show the user activity and interaction.
For policymakers, the analysis pointed to the need to monitor sentiment in public discourse so
that they can intervene when misinformation spreads or the public is concerned. Recommendations
also involved improving users' emotional health through messages and content relevant to their
needs.
3.9 Limitations
Some limitations of the methodology were recognized. The data were collected from two social
media platforms where content is mostly text-based, so findings may differ on other social
networking sites. Encryption techniques caused computational delays that affected real-time
analysis, although optimized algorithms reduced these delays to some extent.
The LDA topics were labeled subjectively, since interpreting the thematic structures relied on
the researcher's judgment. To limit bias in topic labeling, cross-validation was performed and
the results were reviewed by independent personnel. Future research should address these
drawbacks by including cross-platform data and automated topic categorization methods.
4. Results, Discussion, and Recommendations
4.1 Results
Engagement Metrics
Average likes, retweets, and replies per post were computed to gauge the overall level of user
engagement with the content. The mean number of likes was 49.93 and the mean number of retweets
was 49.72, which can be considered low to moderate engagement. The mean number of replies
evaluated to NaN, which indicates that the replies column could not be averaged (for example,
because of missing values) and therefore says nothing about reply activity.
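A NaN mean typically arises when at least one entry in the column is itself NaN. The Python sketch below illustrates the effect and the usual remedy of skipping missing entries, which in MATLAB corresponds to `mean(x, 'omitnan')`. The reply counts are hypothetical, and whether missing values were the actual cause in this dataset is an assumption.

```python
import math


def mean(values, omit_nan=False):
    """Arithmetic mean; optionally skip NaN entries instead of propagating them."""
    if omit_nan:
        values = [v for v in values if not math.isnan(v)]
    if not values:
        return math.nan
    return sum(values) / len(values)


replies = [3.0, float("nan"), 5.0]    # hypothetical reply counts with one missing value
print(mean(replies))                  # nan: a single missing entry poisons the average
print(mean(replies, omit_nan=True))   # 4.0: average over the two observed values
```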
#COVID19
#BlackLivesMatter
#ClimateChange
#Bitcoin
#TechNews
Figure 2: Average Engagement Metrics Result of Data in MATLAB
These hashtags indicate that the dataset contains a large amount of discourse on international
health crises (COVID-19, etc.), social activism (Black Lives Matter, etc.), and innovations
(Bitcoin, AI, etc.). Their use suggests that these issues were of interest when the tweets were
gathered.
Sentiment Analysis
The tweets were also analyzed for sentiment to determine whether they were positive, negative,
or neutral. This study employed a positive and negative word list. Positive words were 'good',
'great', 'awesome', and 'happy', while negative words were 'bad', 'sad', 'hate', and 'terrible'.
The number of positive and negative words in each tweet was counted to arrive at a score for
each. If the tweet had more positive words than negative, it was considered positive, and if it had
more negative words than positive, it was considered negative. Tweets that had an equal number
of positive and negative words were considered neutral.
The sentiment scores were appended to the dataset as a new column called sentiment_score, with
positive sentiment assigned a score of 1, negative sentiment a score of -1 and neutral sentiment a
score of 0. This classification allowed us to analyze the correlation between sentiment and user
engagement in further stages.
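The word-count rule described above is simple enough to sketch directly. The Python code below uses the word lists quoted in the text and returns the same 1 / -1 / 0 scores. It is an illustrative reimplementation, not the MATLAB script used in the study, and the tokenization rule and function name are assumptions.

```python
import re

POSITIVE = {"good", "great", "awesome", "happy"}
NEGATIVE = {"bad", "sad", "hate", "terrible"}


def sentiment_score(text):
    """Return 1 (positive), -1 (negative), or 0 (neutral) from word-list counts."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos > neg) - (pos < neg)


for tweet in ["What a great and awesome day",
              "I hate this terrible, sad weather",
              "good start but bad ending",
              "Oh great, another delay"]:
    print(sentiment_score(tweet), tweet)
```

The last example is scored as positive even though a human reader would take it as sarcastic, which illustrates the known weakness of fixed word lists.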
Word Cloud
A word cloud was created from the cleaned tweet text to show the most frequently used terms in
the dataset. The words “production,” “politics,” and “development” stood out as appearing
frequently in the collected Twitter data.
Figure 6: Results of Word Cloud of Data in MATLAB
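A word cloud is driven by term frequencies, so the counts behind it can be sketched as below. This Python example is illustrative only: the study generated its word cloud in MATLAB, and the three cleaned tweets here are invented to show the counting step.

```python
from collections import Counter


def top_terms(cleaned_tweets, k=3):
    """Return the k most frequent words across already cleaned tweets."""
    counts = Counter(word for tweet in cleaned_tweets for word in tweet.lower().split())
    return counts.most_common(k)


# Hypothetical cleaned tweets (stop words assumed to be removed already).
tweets = ["politics production", "development production", "politics production growth"]
print(top_terms(tweets, k=2))
# prints: [('production', 3), ('politics', 2)]
```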
Topic Modeling
Latent Dirichlet Allocation (LDA) was used to identify five main topics in the Twitter
dataset. These topics represented distinct themes in user discussions, such as:
Political Opinions: Characterized by terms such as 'voting,' 'debate,' and 'government.'
Social Issues: Words such as 'community,' 'climate,' and 'justice' were often used.
User Reactions: Adjectives such as 'amazing,' 'frustrated,' and 'happy,' all of which express
emotive responses to certain occurrences.
Perplexity decreased as the LDA fitting iterated, suggesting that the model was converging on
coherent topics.
Figure 7: LDA Topic Modelling in MATLAB
Hashtag Analysis
The dataset did not include hashtags, which prevented analysis of user-driven trends and
keyword associations. This absence highlighted the dataset's incompleteness and the need to
include hashtags in subsequent analysis.
Engagement Patterns
The Well-Being dataset provided insights into user engagement metrics segmented by platform
and demographics:
Average Daily Usage Time: Users spent the most time on Instagram (120 minutes), followed by
Twitter (90 minutes) and Facebook (60 minutes).
Engagement Metrics: On Instagram, female users reported more likes and comments per day, while
on Twitter, male users reported more messages sent.
Emotion Distribution
The dominant emotions in the dataset were distributed as follows:
Correlation Analysis
Correlating dominant emotion with engagement metrics was inconclusive because the emotion data
were categorical. This limitation points to the need for more advanced statistical techniques
or numeric representations of emotional states to allow effective correlation analysis.
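One way to obtain such a numeric representation is to map each emotion label to a valence score and then compute a Pearson correlation. The Python sketch below illustrates the idea only: the valence mapping, the emotion labels, and the usage values are all made up for the example and are not taken from the Well-Being dataset.

```python
import math

# Hypothetical valence mapping; the report does not define one.
EMOTION_SCORE = {"Happiness": 1, "Neutral": 0, "Anxiety": -1, "Sadness": -1}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up records for illustration: dominant emotion and daily usage in minutes.
emotions = ["Happiness", "Anxiety", "Neutral", "Sadness", "Happiness"]
daily_minutes = [60, 150, 90, 140, 70]

scores = [EMOTION_SCORE[e] for e in emotions]
r = pearson(scores, daily_minutes)
print(round(r, 3))
```

A single valence axis imposes an ordering on the emotions that the data may not support, so group-wise comparisons of the engagement metrics per emotion category are a reasonable alternative.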
4.2 Discussion
At the same time, there is a strong correlation between negative emotions (e.g., Anxiety,
Sadness) and prolonged usage, which raises questions about the psychological side effects of
overexposure to social media. These results are consistent with previous work showing an
association between excessive social media usage and mental health issues, although they do
not establish causation.
The absence of hashtags in the Twitter dataset prevented insights into user-driven trends, so
future studies should collect more comprehensive data. Hashtags are useful indicators of what
is trending and how the audience is engaging.
4.3 Recommendations
Targeted Marketing: Use demographic insights to target specific campaigns. For instance,
Instagram suits campaigns aimed at younger users, especially females, whereas Twitter suits
quick, text-based interactions.
Monitoring Emotional Trends: Create ways to track changes in users' attitudes in order to
adjust marketing and customer service approaches.
Public Sentiment Analysis: Social media sentiment data can be used to analyze the public's
perception of policies and programs, allowing early responses to grievances.
Advanced Correlation Methods: Apply methods that can relate categorical emotional states to
numerical engagement indicators, for example, by converting emotions into sentiment scores.
Cross-Platform Studies: Broaden the study beyond the platforms covered here to increase data
coverage of user behavior and trends.
4.3.4 Conclusion
This chapter provided an overview of the analysis of social media datasets, including the level of
user engagement, sentiment, and topics. However, some limitations in the data prevented the
authors from performing some analyses. The recommendations are intended to address these
gaps and inform future research, business practice, and policymaking regarding the use of social
media data.
5. Conclusion
5.1 Overview of the Study
This study explored user behavior, sentiment trends, and engagement dynamics on social media
platforms using the "Social Media Usage and Emotional Well-Being" and Twitter datasets. It
used computational tools and techniques within MATLAB to explore patterns and gain insights
into users' interaction with social media, how sentiment would affect engagement, and how
demographic factors would influence user behavior. The result is a robust framework for future
social media analytics studies that combine social media analytics, computational modeling, and
behavioral research.
The datasets were analyzed using sentiment analysis, engagement metrics, topic modeling, and
demographic segmentation. The data's complexity and sensitivity were handled using advanced
methods such as Latent Dirichlet Allocation (LDA) and homomorphic encryption. Visualization
techniques such as word clouds, temporal plots, and network diagrams aided in interpreting the
findings.
5.5 Recommendations
For Businesses
Content Optimization: Rather than simply reporting news, highlight emotionally engaging,
positive content that sparks likes. Multimedia elements, such as videos and images, should be
added to increase visual appeal and interest.
Targeted Marketing: Apply demographic insights to platform-specific strategies. For instance,
a campaign for a younger audience should focus on Instagram, whereas a Twitter campaign should
focus on rapid, text-based engagement.
Sentiment Monitoring: Develop real-time tools to track public sentiment and adjust campaigns
based on user feedback.
For Policymakers
Public Sentiment Analysis: Capture social media sentiment data to monitor how the public
reacts to policies and initiatives. Early detection of negative sentiment enables timely
intervention on public concerns.
Mental Health Awareness: Positive and negative emotions relate differently to social media
usage, so usage should be balanced, particularly for users who report negative emotions.
Public health campaigns can raise awareness of the psychological implications of overuse and
of the mental well-being resources available to users.
For Researchers
Expand Dataset Coverage: Collect more comprehensive datasets that include all engagement
metrics (likes, shares, comments) and any hashtags, so that richer analyses become possible.
Enhance Emotional Metrics: Develop methods to quantify emotional states numerically,
enabling advanced correlation analyses and cross-platform comparisons.
Explore Cross-Platform Dynamics: Look at how users behave across different platforms and
how different platform designs affect engagement and user sentiment.
Moreover, trend analysis could move closer to real time if data were streamed from Twitter or
another social media feed. This would enable researchers to follow the development of
hashtags, topics, and sentiments as they unfold while providing useful information for
businesspeople, advertisers, and policymakers.
Another promising avenue would be to extend user profiling. By segmenting users on demographic
information (age, gender, location, and so on), behavior (influencers, casual users), and
psychological characteristics (sentiment types, levels of engagement, etc.), the analysis
could reveal which user groups are most receptive to engagement on particular topics. For
example, the influencer analysis could be developed further by not only identifying influencer
accounts but also examining the types of tweets they publish and how these affect the overall
sentiment and engagement of the CN.
Further, the temporal aspect of trend detection could be investigated in future research. This
would involve learning how topics change over time and finding out when a discussion is likely
to turn. Dynamic topic models could be applied to understand how topic prominence changes,
which may give insight as to what kind of events influence public discussions.
6.5 Advanced Social Network Analysis
For SNA, the algorithms used to identify influential users can be improved. Future work could
consider other centrality measures, as well as community detection algorithms such as Louvain
or Infomap, which would enable the identification of user clusters based on their
interactions. These methods could uncover 'buried' subgraphs and communities within the
large-scale Twitter network, improving the current understanding of how information spreads
within these groups.
In addition, sentiment propagation could be analyzed to understand how sentiment flows in the
social network. Knowledge of users' emotions, opinions, and information-sharing activities
might explain how the spread is facilitated, and which aspects of content make certain topics go
viral.
In turn, analyzing the real-time sentiment distribution could be beneficial in crises. For
instance, monitoring public mood on certain topics during natural disasters, political events,
or other events could help governments or organizations address public concerns more
efficiently and manage stakeholder engagement.
6.8 Conclusion
Future work in social media analytics has great potential to provide a deeper and more precise
understanding of users' behaviors, sentiment trends, and emerging patterns. More advanced
methodological approaches, a wider range of data sources, and solutions to the ethical
problems that may emerge will contribute to a clearer picture of the interactions mediated by
social media and their financial, political, and social implications.
References
Afyouni, I., Aghbari, Z.A. and Razack, R.A. (2021) 'Multi-feature, multi-modal, and multi-
source social event detection: A comprehensive survey,' Information Fusion, 79, pp. 279–
308. https://doi.org/10.1016/j.inffus.2021.10.013.
Anzano-Oto, S., Vázquez-Toledo, S. and Latorre-Cosculluela, C. (2023) 'Digital reality in
Compulsory Secondary Education: uses, purposes and profiles in social networks,' New
Review of Information Networking, 28(1–2), pp. 26–48.
https://doi.org/10.1080/13614576.2023.2219244.
Bengtsson, S. and Johansson, S. (2022) 'The meanings of social media use in everyday life:
filling empty slots, everyday transformations, and mood management,' Social Media +
Society, 8(4), p. 205630512211302. https://doi.org/10.1177/20563051221130292.
Bilinski, P. (2022) 'The content of tweets and the usefulness of YouTube and Instagram in
corporate communication,' European Accounting Review, 33(1), pp. 279–311.
https://doi.org/10.1080/09638180.2022.2084759.
Busalim, A.H., Ghabban, F. and Hussin, A.R.C. (2020) 'Customer engagement behaviour on
social commerce platforms: An empirical study,' Technology in Society, 64, p. 101437.
https://doi.org/10.1016/j.techsoc.2020.101437.
Camacho, D. et al. (2020) 'The four dimensions of social network analysis: An overview of
research methods, applications, and software tools,' Information Fusion, 63, pp. 88–120.
https://doi.org/10.1016/j.inffus.2020.05.009.
Chakraborty, K., Bhattacharyya, S. and Bag, R. (2020) 'A Survey of Sentiment Analysis from
Social Media Data,' IEEE Transactions on Computational Social Systems, 7(2), pp. 450–
464. https://doi.org/10.1109/tcss.2019.2956957.
Chukwunweike, N.J.N. et al. (2024) 'Integrating deep learning, MATLAB, and advanced CAD
for predictive root cause analysis in PLC systems: A multi-tool approach to enhancing
industrial automation and reliability,' World Journal of Advanced Research and Reviews,
23(2), pp. 2538–3557. https://doi.org/10.30574/wjarr.2024.23.2.2631.
Corvite, S. and Hui, J. (2024) 'Social Media as a Lens into Careers During a Changing World of
Work,' Proceedings of the ACM on Human-Computer Interaction, 8(CSCW2), pp. 1–27.
https://doi.org/10.1145/3687053.
Drivas, I.C. et al. (2022) 'Social Media Analytics and Metrics for improving users engagement,'
Knowledge, 2(2), pp. 225–242. https://doi.org/10.3390/knowledge2020014.
Dukovski, I. et al. (2021) 'A metabolic modeling platform for the computation of microbial
ecosystems in time and space (COMETS),' Nature Protocols, 16(11), pp. 5030–5082.
https://doi.org/10.1038/s41596-021-00593-3.
Dunsin, D. et al. (2024) 'A comprehensive analysis of the role of artificial intelligence and
machine learning in modern digital forensics and incident response,' Forensic Science
International Digital Investigation, 48, p. 301675.
https://doi.org/10.1016/j.fsidi.2023.301675.
Gheisari, M. et al. (2023) 'Deep learning: Applications, architectures, models, tools, and
frameworks: A comprehensive survey,' CAAI Transactions on Intelligence Technology,
8(3), pp. 581–606. https://doi.org/10.1049/cit2.12180.
Gkikas, D.C. et al. (2022) 'How do text characteristics impact user engagement in social media
posts: Modeling content readability, length, and hashtags number in Facebook,'
International Journal of Information Management Data Insights, 2(1), p. 100067.
https://doi.org/10.1016/j.jjimei.2022.100067.
Habib, A. and Raza, A.A. (2021) 'IoT-Based Pervasive Sentiment Analysis: A Fine-Grained Text
Normalization Framework for context aware Hybrid Applications,' in EAI/Springer
Innovations in Communication and Computing, pp. 201–226.
https://doi.org/10.1007/978-3-030-75123-4_10.
Horova, V. et al. (2024) 'In-Depth examination of the effective use of social networks for
communication in united Territorial communities: Navigating the Digital Landscape,' in
Lecture notes on data engineering and communications technologies, pp. 225–246.
https://doi.org/10.1007/978-3-031-62213-7_11.
Kordzadeh, N. and Young, D.K. (2020) 'How social media analytics can inform content
strategies,' Journal of Computer Information Systems, 62(1), pp. 128–140.
https://doi.org/10.1080/08874417.2020.1736691.
Kurani, A. et al. (2021) 'A comprehensive comparative study of Artificial Neural Network
(ANN) and Support Vector Machines (SVM) on stock forecasting,' Annals of Data
Science, 10(1), pp. 183–208. https://doi.org/10.1007/s40745-021-00344-x.
Lee, J.H., Wood, J. and Kim, J. (2021) 'Tracing the trends in sustainability and social media
research using topic modeling,' Sustainability, 13(3), p. 1269.
https://doi.org/10.3390/su13031269.
Nagaraj, N. and J, C. (2021) 'Sentence Classification using Machine Learning with Term
Frequency–Inverse Document Frequency with N-Gram,' in Soft Computing Research
Society eBooks, pp. 337–346. https://doi.org/10.52458/978-81-95502-00-4-35.
Najafabadi, A.J., Skryzhadlovska, A. and Valilai, O.F. (2024) 'Agile Product Development by
Prediction of Consumers’ Behaviour; using Neurobehavioral and Social Media Sentiment
Analysis Approaches,' Procedia Computer Science, 232, pp. 1683–1693.
https://doi.org/10.1016/j.procs.2024.01.166.
Nandi, G. and Sharma, R.K. (2020) Data Science Fundamentals and Practical Approaches:
Understand why data science is the next.
https://openlibrary.telkomuniversity.ac.id/pustaka/165518/data-science-fundamentals-
and-practical-approaches-understand-why-data-science-is-the-next.html.
Nasir, V.A. et al. (2021) 'Segmenting consumers based on social media advertising perceptions:
How does purchase intention differ across segments?,' Telematics and Informatics, 64, p.
101687. https://doi.org/10.1016/j.tele.2021.101687.
Pavarino, E.C. et al. (2023) 'mEMbrain: an interactive deep learning MATLAB tool for
connectomic segmentation on commodity desktops,' Frontiers in Neural Circuits, 17.
https://doi.org/10.3389/fncir.2023.952921.
Pierce, P.P. et al. (2021) 'Social network analysis: Exploring connections to advance military
nursing science,' Nursing Outlook, 69(3), pp. 311–321.
https://doi.org/10.1016/j.outlook.2020.12.013.
Qalati, S.A. et al. (2021) 'A mediated model on the adoption of social media and SMEs’
performance in developing countries,' Technology in Society, 64, p. 101513.
https://doi.org/10.1016/j.techsoc.2020.101513.
Rahate, A. et al. (2021) 'Multimodal Co-learning: Challenges, applications with datasets, recent
advances and future directions,' Information Fusion, 81, pp. 203–239.
https://doi.org/10.1016/j.inffus.2021.12.003.
Rodriguez, M.Y. and Storer, H. (2019) 'A computational social science perspective on qualitative
data exploration: Using topic models for the descriptive analysis of social media data*,'
Journal of Technology in Human Services, 38(1), pp. 54–86.
https://doi.org/10.1080/15228835.2019.1616350.
Tembhurne, J.V. and Diwan, T. (2020) 'Sentiment analysis in textual, visual and multimodal
inputs using recurrent neural networks,' Multimedia Tools and Applications, 80(5), pp.
6871–6910. https://doi.org/10.1007/s11042-020-10037-x.
Zion, G.D. and Tripathy, B.K. (2020) 'Comparative analysis of tools for big data visualization
and challenges,' in Springer eBooks, pp. 33–52.
https://doi.org/10.1007/978-981-15-2282-6_3.