Final Report Mahnoor

Table of Contents

1. Introduction
1.1 Background
1.2 Problem Statement
Data Diversity:
Volume and Complexity:
Sentiment and Engagement:
Behavioral Patterns:
1.3 Objectives
1.4 Datasets Used
Dataset 1: Social Media Posts Dataset
Dataset 2: Social Media Usage and Emotional Well-Being (Testing Dataset)
1.5 Report Structure
2. Literature Review
2.1 Introduction
2.2 Social Media as a Data Source
2.3 Main Fields of Social Media Data Analysis
2.3.1 Sentiment Analysis
2.3.2 Identifying Trends and Topics
2.3.3 User Interaction and Usage Pattern Study
2.3.4 Social Network Analysis (SNA)
2.4 Methods for the Analysis of Social Media Data
Foundations of Mathematics and Computing off Stream in Social Media Analysis
Polynomial Approximations
Homomorphic Encryption:
Visualization and Modeling
2.4.1 Natural Language Processing (NLP)
2.4.2 Machine Learning and Artificial Intelligence
2.4.3 Data Visualization
2.5 Difficulties in Analyzing Social Media Data
2.6 Gaps in the Literature
2.7 Conclusion
3. Methodology
3.1 Data Collection
3.1.1 Twitter & Facebook Dataset
3.1.2 Social Media Usage and Emotional Well-being Dataset
3.1.3 Ethical Considerations
3.2 Data Preprocessing
3.2.1 Preprocessing of Twitter Data
3.2.2 Data Cleaning of the Emotional Well-Being Dataset
3.3 Sentiment Analysis
3.4 Engagement Analysis
3.4.1 Preprocessing Twitter & Facebook Data
3.4.2 Preprocessing the Emotional Well-Being Dataset
3.4.3 Sentiment Analysis
3.5 Advanced Computational Techniques
3.6 Social Network Analysis
3.7 Visualization
3.8 Reporting and Recommendations Encompassing
3.9 Limitations
4. Results, Discussion, and Recommendations
4.1 Results
4.1.1 Twitter Dataset Analysis
4.1.2 Well-Being Dataset Analysis
4.1.3 Facebook Dataset Analysis
4.2 Discussion
4.2.1 Engagement and Sentiment
4.2.2 Platform-Specific Usage
4.2.3 Topic Trends
4.3 Recommendations
4.3.1 For Businesses
4.3.2 For Policymakers
4.3.3 For Researchers
4.3.4 Conclusion
5. Conclusion
5.1 Overview of the Study
5.2 Key Findings
5.2.1 Engagement Patterns
5.2.2 Emotional States and Usage
5.2.3 Trends and Topics
5.3 Contributions of the Study
5.4 Limitations
5.4.1 Dataset Completeness
5.4.2 Platform Scope
5.4.3 Emotional Data Representation
5.4.4 Topic Modeling: Subjectivity
5.5 Recommendations
For Businesses
For Researchers
5.6 Future Directions
5.7 Final Remarks
6. Future work
6.1 Expanding Data Sources
6.2 Improving the Methods of Sentiment Analysis
6.3 Understanding the level of engagement and creating a user profile
6.4 Moving Average, Trends and Topics
6.5 Advanced Social Network Analysis
6.6 Key Applications of Real-time Analytics
6.7 Professional Standards and Ways to Eliminate Biases
6.8 Conclusion
References

Table of Figures
Figure 1: The overview of Twitter Dataset
Figure 2: Average Engagement Metrics Result of Data in MATLAB
Figure 3: Results of Sentiment Analysis of Data in MATLAB
Figure 4: Results of Average Sentiment Score of Data in MATLAB
Figure 5: Scatter Plot of Sentiment Analysis in MATLAB
Figure 6: Results of Word Cloud of Data in MATLAB
Figure 7: LDA Topic Modelling in MATLAB
Figure 8: Topic Modeling of Data in MATLAB
Figure 9: Hashtag Analysis of Data in MATLAB
Figure 10: Well-Being Dataset Overview Using MATLAB
Figure 11: Analysis of Engagement Patterns in MATLAB
Figure 12: Correlation Analysis of Data in MATLAB
Figure 13: Facebook Dataset Overview using MATLAB
Figure 14: Temporal Trend of Open Analysis in MATLAB
Figure 15: Temporal Trend of Close Analysis in MATLAB
Figure 16: Temporal Trend of High Analysis in MATLAB
Figure 17: Temporal Trend of Low Analysis in MATLAB
Figure 18: Temporal Trend of Volume Analysis in MATLAB

1. Introduction
1.1 Background
Social networks such as Facebook, Twitter, Instagram, LinkedIn, and TikTok have become an
important part of people's daily lives (Bengtsson and Johansson, 2022). These platforms
generate a large amount of user content, including text, images, videos, and metadata, all of
which can improve our understanding of users' intentions, attitudes, and behavior. However, the
volume, variety, and heterogeneity of this data make it difficult to handle. It can no longer be
analyzed manually, so automated approaches are needed to extract the desired information
(Corvite and Hui, 2024).

Social media data analysis has emerged as a critical tool for organizations, marketers, and
researchers who seek to understand how users interact with content, the affective states
underlying such engagement, and the processes occurring in social media (Horova et al., 2024).
Recent research has incorporated machine learning, natural language processing, and sentiment
analysis to analyze social media data. MATLAB is well suited to this kind of analysis because it
is an effective computational platform for statistics, machine learning, and data visualization
(Chukwunweike et al., 2024). Using MATLAB's built-in functions, this study analyzes user
behaviors, perceptions, and interactions across different social media platforms in order to offer
suggestions that organizations can use to improve their outcomes (Dunsin et al., 2024).

This report uses MATLAB to analyze social media data, with a focus on user interaction,
sentiment analysis, and trends (Najafabadi, Skryzhadlovska and Valilai, 2024). The research
examines the nature of user engagement and how emotions affect users' behavior. The aim is to
draw practical conclusions that can enhance user interaction on social media and inform
business, marketing, and content strategies for content producers.

1.2 Problem Statement


With the rapid growth of social media usage, large volumes of unstructured data have emerged
that are difficult for organizations to manage when monitoring user activity. The data considered
in this work is diverse, spanning text, images, video, and metadata, each of which poses its own
difficulties for data processing, content understanding, and, ultimately, insight generation
(Rahate et al., 2021). Traditional content monitoring and basic measures of engagement (likes,
shares, comments) are insufficient to describe the richness of users' activity and social
interactions.

The primary challenges in analyzing social media data include:

Data Diversity:
Social media data consists of different types, including text, images, video, and metadata, which
need to be analyzed and integrated in different ways (Afyouni, Aghbari and Razack, 2021).

Volume and Complexity:


The sheer volume of data produced every day makes it impossible to analyze it manually,
requiring the use of tools that can work on large datasets.

Sentiment and Engagement:


Users' sentiment, that is, how they feel as expressed in the posts they make, has to be inferred
from the content itself, and engagement measures such as likes, shares, and comments need to be
interpreted in light of a post's emotional pull (Habib and Raza, 2021).

Behavioral Patterns:
Recognizing regularities in user interactions and engagement levels, both in general and across
different social networks, remains challenging because of cultural differences and because the
impact and effects of the various networks differ.

This research addresses these challenges by developing a MATLAB-based method for analyzing
social media data. The purpose is to create models that analyze user interactions, determine user
attitudes, and detect emerging trends. The study is intended to be useful for businesses,
marketers, and researchers by offering MATLAB-based tools that help them understand social
media interactions and make better-informed decisions (Bilinski, 2022).
1.3 Objectives
This research will analyze social media data to uncover patterns in user engagement, sentiment,
and trending topics. The specific objectives of this project are as follows:

• To track common tendencies in user behavior by comparing user activity across different
indicators (likes, shares, comments, and mentions) and different social networks.
• To analyze the relationship between sentiment and user engagement by applying sentiment
analysis to classify social media posts as positive, negative, or neutral and relating these
classes to the number of shares, comments, and likes generated.
• To discover and categorize currently popular topics using keyword frequency analysis and
topic modelling, determining which subjects are most actively discussed and how these
trends develop across different social networks.
• To compare engagement and sentiment by user age, gender, or geographic location, and to
assess how these factors affect online behavior and interaction with social media platforms.
• To identify and present the major results in the data, such as users' overall sentiment and
frequency of activity, using charts, graphs, and word clouds so that the discovered patterns
are easier to assess.
• To provide practical recommendations for businesses, marketers, and content producers
based on the analysis, such as content plans, interaction techniques, and enhancements to
the user experience on social networks.

1.4 Datasets Used


This study uses two datasets to analyze social media interactions, user engagement, and
sentiment:

Dataset 1: Social Media Posts Dataset


The first dataset contains posts and comments from Twitter about a global political event,
together with their engagement metrics (likes, shares, comments). It also includes general
information about each user, such as geographical location, number of followers, and certain
details of the operating system (OS) used. This dataset is used for sentiment analysis and to
determine the relationship between the emotional tone of user posts and engagement levels,
including retweets and comments.
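
As an illustration of how the relationship between emotional tone and engagement might be quantified, the sketch below computes a Pearson correlation coefficient between per-post sentiment scores and retweet counts. It is written in Python for brevity and is not the MATLAB code used in this report; the function name `pearson` and the toy values are assumptions made purely for the example, not data from Dataset 1.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy values: one sentiment score and one retweet count per post
sentiment = [-0.8, -0.2, 0.0, 0.4, 0.9]
retweets = [3, 10, 12, 25, 40]

r = pearson(sentiment, retweets)
print(f"correlation between sentiment and retweets: {r:.3f}")
```

A coefficient near +1 or -1 would indicate a strong linear association and a value near 0 a weak one; correlation alone does not show that sentiment causes engagement.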

Dataset 2: Social Media Usage and Emotional Well-Being (Testing Dataset)


The second dataset is concerned with the relationship between the frequency of social media use
and subjective well-being. It contains parameters that quantify various aspects of social media
usage: the time spent on social networks per day, the number of posts created per day, the
numbers of likes and comments received and messages sent per day, and the user's dominant
emotional state (e.g., happy, sad, anxious, bored, neutral). This dataset is used to measure the
existence and strength of relationships between these variables and to compare the positive and
negative effects of different social media platforms and patterns of use.

Together, these datasets make it possible to assess users' activity and mood, to identify patterns
in the relationship between the two, and to examine differences between users and platforms.

1.5 Report Structure


The remainder of the report is structured as follows:

Chapter 2:

Literature Review: This chapter will present a review of the literature on social media data
analysis, with emphasis on sentiment analysis, topic modeling, and engagement analysis. It will
discuss the difficulties and approaches applied in the field and provide the background for this
study.

Chapter 3:

Methodology: This chapter will describe the data analysis process for the collected datasets,
including data cleaning, sentiment analysis, topic modeling, and demographic analysis. It will
also describe in detail the application of MATLAB's computational tools.

Chapter 4:

Results and Discussions: This chapter will encompass the findings from the analysis part of the
study, highlighting aspects and main observations on user engagement, attitude, and trends. The
findings will be discussed considering prior studies, and conclusions will be made concerning
users' behavior and social media usage.

Chapter 5:

Conclusion and Recommendations: This last chapter will synthesize the study's results, evaluate
the research's constraints, and provide recommendations for businesses, marketers, and content
producers based on the study's findings.
2. Literature Review
2.1 Introduction
Social media sites such as Facebook, Twitter, Instagram, LinkedIn, and TikTok generate vast
amounts of user content every day. This data is diverse and dynamic; it records the ongoing
interactions of billions of users across the globe (Camacho et al., 2020). Through status updates,
comments, likes, posts, links, pictures, videos, and audio, it reflects users' activity, behavior,
perceptions, and trends. Analyzing this data has therefore become an important activity for
businesses, researchers, and policymakers, who need to monitor public opinion and behavior and
how people interact with products and services (Habib and Raza, 2021).

The issue is that social media data is not uniform; it includes quantitative data such as
timestamps and user characteristics and qualitative data such as text, images, and videos. This
data is diverse and complex, and it needs to be analyzed using methods that go beyond
conventional approaches (Nasir et al., 2021). Basic computational tools are not enough to obtain
useful information and knowledge; more sophisticated techniques, including machine learning,
sentiment analysis, topic modeling, and social network analysis, are required. These techniques
support pattern mining, predictive modeling, and trend analysis, which are important in
fast-moving environments (Najafabadi, Skryzhadlovska and Valilai, 2024).

In addition, the data produced on social platforms matters not only to the platforms themselves
but also to wider societal goals, including election campaigns, disaster response, and public
health efforts (Pierce et al., 2021). For instance, during the COVID-19 health crisis, social media
was used to inform the public and to gauge opinion on healthcare policies and vaccines. However,
the scale and diversity of social media data require a robust analytical framework and tools to
realize its potential (Nagaraj and J, 2021).

This chapter reviews the literature on social media data analysis, covering methods, tools, and
open issues. It also identifies areas in the literature that need improvement, indicating directions
in which the research area can expand (Dukovski et al., 2021). The goal is to describe the current
state of social media data analysis and to help build a framework for extracting valuable
information from social media data (Busalim, Ghabban and Hussin, 2020).

2.2 Social Media as a Data Source


Social networks have greatly changed how users interact and share information, and they have
become indispensable to millions of people. Popular platforms such as Twitter, Instagram, and
TikTok allow people to share their thoughts, pictures, videos, or live broadcasts immediately,
producing an enormous flow of data. As Nasir et al. (2021) highlight, this pool of information is
a gold mine of insight into people's behaviors, feelings, and interactions, useful for business
analysis, marketing, political campaigning, and academic research, among other uses.

Another important feature of social media data is its diversity. Every account, post, comment,
like, share, and piece of multimedia content, such as an image or video, contributes to a
multidimensional dataset. Each of these elements offers a different view of user activity and
interaction. For instance, frequency analysis of hashtags can help identify hot topics or trends,
while users' comments may indicate the public's perception of a certain brand, event, or policy
(Chukwunweike et al., 2024).

However, this diversity is also a major challenge. Most of the data collected from social media is
unstructured and therefore cannot be easily analyzed using conventional tools. According to
Corvite and Hui (2024), managing the amount and diversity of this data is a challenging task that
requires proper preprocessing and analysis tools. Common preprocessing techniques include text
cleansing, noise reduction, and feature creation, which help transform raw input into usable
forms.
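
The preprocessing steps named above can be sketched in a few lines. The example below is a minimal Python illustration, not the pipeline used in this report (which relies on MATLAB); the regular expressions, the tiny stop-word list, and the function name `clean_post` are all assumptions chosen for the example.

```python
import re

# Patterns for common kinds of noise in social media text (illustrative choices)
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^a-z0-9#\s]")  # keeps letters, digits, '#', whitespace

# A deliberately tiny stop-word list; real analyses use much longer ones
STOPWORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "at"}


def clean_post(text):
    """Lowercase a post, strip URLs, mentions and punctuation, drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)       # remove links
    text = MENTION_RE.sub(" ", text)   # remove @user mentions
    text = NON_WORD_RE.sub(" ", text)  # remove punctuation and emoji
    return [tok for tok in text.split() if tok not in STOPWORDS]


tokens = clean_post("Loving the new phone!!! Thanks @shop https://example.com #happy")
print(tokens)  # ['loving', 'new', 'phone', 'thanks', '#happy']
```

The resulting token lists are a simple form of feature creation: they can be counted, scored against a sentiment lexicon, or passed to a topic model.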

Another important characteristic of social media as a data source is that it is generated in real
time. While many other datasets are static, social media data is dynamic: it captures a process as
it unfolds. This makes social media useful for monitoring current events, popular topics, and
conflicts. For example, people turn to social networks for updates during natural disasters, which
lets the authorities identify regions requiring attention and organize the necessary assistance
(Nandi and Sharma, 2020).

The benefits of social media data can be realized only if several issues are addressed. The data is
frequently 'noisy' and contains colloquialisms, acronyms, and sarcasm, which can skew the
results if they are not handled. In addition, the ethical issues of using data that can identify users
and their personal details must always be addressed.

Altogether, social media can be viewed as a major source of big data on people's behavior and
opinions and on broader trends in society (Rodriguez and Storer, 2019). Exploiting it, however,
requires strong analytical tools that can capture the complexity of the data and address the
problems described above. This motivates the use of platforms such as MATLAB, which provide
capabilities for managing, analyzing, and visualizing large volumes of social media data for
research (Camacho et al., 2020).

2.3 Main Fields of Social Media Data Analysis


There are many approaches to analyzing data from social media platforms, and they revolve
around a few key fields concerned with understanding human behavior and assessing user
interaction. These fields demonstrate the relevance and flexibility of analytical techniques for
obtaining valuable insights from user-generated data (Kurani et al., 2021).

2.3.1 Sentiment Analysis


Sentiment analysis is one of the most popular techniques in social media analytics; it allows
organizations to determine the public's attitude toward products, services, or events. Usually
framed as a text classification task, it helps businesses understand how customers view and feel
about them.

Hailong et al. (2014) point out that lexicon-based methods use sentiment dictionaries that assign
sentiment values to words. Although simpler than machine learning methods, they can be less
effective at capturing context, irony, and other linguistic patterns. Machine learning techniques,
on the other hand, use supervised learning algorithms for more accurate sentiment prediction.
Techniques such as Support Vector Machines (SVMs), Random Forests, and Neural Networks
have been reported to perform considerably better in sentiment classification (Bengtsson and
Johansson, 2022).
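
To make the lexicon-based approach concrete, the following Python sketch scores a post by summing dictionary values for the words it contains. It is a toy illustration under stated assumptions, not the method of Hailong et al. (2014) or the MATLAB analysis in this report: the six-word lexicon, the negation rule, and the names `lexicon_score` and `label` are all hypothetical.

```python
# Hypothetical miniature sentiment dictionary: word -> sentiment value
LEXICON = {
    "good": 1.0, "great": 2.0, "love": 2.0,
    "bad": -1.0, "terrible": -2.0, "hate": -2.0,
}
NEGATIONS = {"not", "never", "no"}


def lexicon_score(text):
    """Sum the lexicon values of the words in a text.

    A negation word flips the sign of the word immediately after it only.
    """
    score = 0.0
    negate = False
    for raw in text.lower().split():
        word = raw.strip(".,!?;:\"'")
        if word in NEGATIONS:
            negate = True
            continue
        value = LEXICON.get(word, 0.0)
        score += -value if negate else value
        negate = False  # negation applies to a single following word
    return score


def label(score):
    """Map a numeric score to a positive / negative / neutral class."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label(lexicon_score("I love this great phone!")))  # positive
print(label(lexicon_score("This is not good.")))         # negative
print(label(lexicon_score("Yeah, great. Just great.")))  # positive, even if meant sarcastically
```

The last call shows the limitation noted above: a dictionary lookup has no access to context, so a sarcastic post is scored by its surface words.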
Transformer-based models such as BERT (Bidirectional Encoder Representations from
Transformers) have recently become prominent in sentiment analysis. Their contextual
embeddings and self-attention mechanisms allow BERT models to capture dependencies between
the words in a sentence better than traditional models. For example, sarcasm detection, a
well-known problem in sentiment analysis, has been reported to benefit from such models
(Bilinski, 2022).

For business applications, sentiment analysis is valuable for monitoring customer opinions,
observing public mood during an election campaign, and tracing changes in consumers'
perceptions of a brand. For instance, during a product launch, firms can monitor the social media
response, look for problems as they arise, and then resolve them. This ability to react quickly
shows the value of sentiment analysis in the current business environment (Chukwunweike et al.,
2024).

2.3.2 Identifying Trends and Topics


Trend and topic identification involves finding the most discussed issues across social media
platforms. By quantitatively analyzing keywords, hashtags, and phrases, as well as their density,
researchers can find patterns in public discussions and gain insights into what interests people
(Qalati et al., 2021).

Keyword density analysis is one of the simplest ways to identify trends: it counts how often
specific terms occur in a set of posts. More complex methods such as Latent Dirichlet Allocation
(LDA) perform topic modeling by detecting latent thematic structure within large text corpora.
LDA groups words that often co-occur, revealing clusters of related themes
(Corvite and Hui, 2024).
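
A minimal sketch of keyword frequency analysis is shown below. It covers only the counting step described above and is not an implementation of LDA; it is written in Python rather than the MATLAB used in this report, and the function name `keyword_frequencies` and the three toy posts are assumptions for illustration.

```python
from collections import Counter


def keyword_frequencies(token_lists, top_n=3):
    """Return (word, count, share of all tokens) for the top_n most frequent words."""
    counts = Counter(tok for tokens in token_lists for tok in tokens)
    total = sum(counts.values())
    return [(word, n, n / total) for word, n in counts.most_common(top_n)]


# Hypothetical, already tokenized posts
posts = [
    ["climate", "change", "policy"],
    ["climate", "protest", "city"],
    ["election", "policy", "climate"],
]

top = keyword_frequencies(posts, top_n=2)
for word, count, share in top:
    print(f"{word}: {count} occurrences ({share:.0%} of tokens)")
```

Frequency counts treat every word independently. A topic model such as LDA goes further by using co-occurrence across posts to group words into themes.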

In addition, temporal analysis enhances trend identification because it examines how the
frequency of topics changes over time. For instance, hashtags such as #MeToo or
#ClimateChange may show spikes in activity around certain events. Knowing these temporal
patterns enables an organization to address user concerns as they arise
(Afyouni, Aghbari and Razack, 2021).
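
The temporal view described above can be approximated by counting hashtag occurrences per day. The Python sketch below is illustrative only and is not the MATLAB analysis from this report; the dates, post texts, and the function name `hashtag_counts_by_day` are hypothetical.

```python
import re
from collections import Counter, defaultdict

HASHTAG_RE = re.compile(r"#\w+")


def hashtag_counts_by_day(posts):
    """Count hashtags per day from (iso_date, text) pairs, ignoring letter case."""
    by_day = defaultdict(Counter)
    for day, text in posts:
        by_day[day].update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return by_day


# Hypothetical posts: (date, text)
posts = [
    ("2024-01-01", "Rally today #ClimateChange"),
    ("2024-01-02", "#climatechange march was huge #ClimateChange"),
    ("2024-01-02", "Reading about #MeToo"),
]

series = hashtag_counts_by_day(posts)
# Daily counts for one hashtag, in chronological order
daily = [series[day]["#climatechange"] for day in sorted(series)]
print(daily)  # [1, 2]
```

A sudden rise in such a daily series is one simple signal that a topic is gaining attention around an event.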

Research has also examined dynamic topic modeling, which tracks shifts in topic frequency over
time. When used with real-time monitoring instruments, such methods can uncover trends as
they emerge, making them ideal for real-time decision-making processes characterized by high
levels of uncertainty, such as marketing or disaster management (Gkikas et al., 2022).

Primary Research in Social Media Data


A primary research study in social media analytics works directly with datasets such as activity
logs of an organization’s Twitter profiles, posts on Instagram accounts, or comments on YouTube
videos (Kordzadeh and Young, 2020). These studies usually track user metrics, including
interactions (likes, shares, comments), attitudes (positive, negative, neutral), and user
characteristics (age, gender, location). Schneider and Watson have explored datasets like “Social
Media Usage and Emotional Well-Being,” which provide context for the relationship between
interaction and mental well-being as measured by engagement or activity data.

Secondary Research in Social Media Data


Secondary research complements primary studies because it synthesizes data across different
platforms and contexts. For example, Bilinski (2022) described a case in which analysis of text
content from Twitter is complemented by multimedia data from Instagram. Such studies
highlight the value of combining both kinds of source: using primary data to analyze trends on
specific platforms does not rule out using secondary data to gain a broader perspective.

However, social media data is beneficial only if several challenges are addressed. In most cases
the data is ‘noisy’: it contains a great deal of unwanted material, such as informal language,
slang, abbreviations, and irony. Moreover, the ethical considerations of using users’ information
must be observed closely, especially when the analyzed data is private or contains an
individual’s personal information (Dunsin et al., 2024).

2.3.3 User Interaction and Usage Pattern Study


Information about users' engagement with posted content is critical in evaluating communication
strategies (Gkikas et al., 2022). Shares, likes, comments, and retweets are quantifiable measures
of users' engagement with content. These metrics reflect not only content quality but also
circumstances such as the time of posting and the characteristics of the audience and platform.

The number of shares a post receives may depend significantly on network effects, where users’
actions drive the further spread of content (Drivas et al., 2022). Studying these interactions
helps firms identify the best way to manage their content. For instance, a post containing video
or infographics tends to generate more engagement than a plain text post.

Another important factor for considering usage patterns is demographic factors. This means that
audience segmentation classifies users by age, sex, geographical location, and interests to
understand the audience’s needs regarding communication better. For example, short and funny
videos may be popular among the youth on TikTok, while selected and serious posts may be
searched for on LinkedIn by working people interested in professional content (Corvite and Hui,
2024).

Moreover, combining demographics and natural language processing provides more information
about user behaviour trends. For instance, users of different genders may show stronger or
weaker affective reactions and emotional attitudes towards specific topics, which shapes their
relationship with content. This information is useful for developing targeted advertising
messages and customer retention (Afyouni, Aghbari and Razack, 2021).

Analyzing interaction with social media content requires focusing on user engagement and usage
behaviour. This field uses likes, shares, follows, comments, and retweets (Nandi and Sharma,
2020). These metrics characterize the content and can be viewed as quantitative measures of its
popularity and of the message volume in the user community.

Incorporating Technical Applications: The application of complex algorithms enhances the
assessment of user engagement, since it makes it possible to classify the audience by level of
activity (Corvite and Hui, 2024). For example, clustering techniques group users by their
interaction patterns, and regression analysis predicts the likelihood of interaction from
content features. Polynomial models or machine learning tools in MATLAB can be used to study
more intricate user characteristics, allowing researchers to sharpen engagement segmentation.

2.3.4 Social Network Analysis (SNA)
Social Network Analysis (SNA) examines interactions within social networking sites (Pierce et
al., 2021). SNA reveals relations between users, information flow within the network, the
influence of opinion makers, and group formation.

Network properties are usually described through parameters such as centrality, density, and
clustering coefficients. Centrality measures identify the most popular users, or those with the
largest audience or number of connections in the network. Such users are referred to as opinion
leaders, and they strongly influence discussion and engagement.

SNA is most useful for identifying communities that share similar interests or activity
(Qalati et al., 2021). By identifying groups of connected users, researchers can target them
with more relevant content. SNA is also helpful in tracking the spread of messages, whether
positive or negative, which supports the development of mechanisms for discouraging the spread
of negativity.

Another of the more challenging applications of SNA is sentiment propagation analysis, which
examines how sentiments are spread through a network. Such information can help an
organization predict campaigns' impact or identify potential problems before they emerge
(Chakraborty, Bhattacharyya and Bag, 2020).

2.4 Methods for the Analysis of Social Media Data


The growth of social media data and its content density calls for more complex analysis and
methods. Many tools are available today for social media data analysis. However, the preferable
tool is MATLAB since it has a user-friendly interface that is simple, easy, and comprehensive for
data preprocessing, analysis, and visualization (Pavarino et al., 2023).

Foundations of Mathematics and Computing in Social Media Analysis
Due to social media data being big and unstructured, a high level of mathematics and
computation is required to solve the problems. MATLAB is among the most elaborate
computation tools and is well-equipped for handling various data types for preprocessing,
analysis and visualization (Horova et al., 2024).
Polynomial Approximations
For large data sets, polynomial approximations can reduce the cost of computing over large
volumes of text data. These approximations are particularly beneficial for trend analysis,
where numerous records must be partitioned and analyzed (Gheisari et al., 2023).

Homomorphic Encryption
One of the biggest issues with social media analytics is preserving privacy. Homomorphic
encryption addresses this issue by allowing computation on encrypted data, which protects
users’ confidentiality. With encryption methods applied in MATLAB, intensive analysis of
sensitive data such as mental health indicators or demographic data can be performed safely
(Anzano-Oto, Vázquez-Toledo and Latorre-Cosculluela, 2023).

Visualization and Modeling


MATLAB offers simple and easy-to-understand word clouds, network graphs, and time series
plots for data visualization (Zion and Tripathy, 2020). In addition to improving result readability,
such graphics also shed light on patterns in user engagement and sentiment.

2.4.1 Natural Language Processing (NLP)


Text analytics is the key component of social media analysis since NLP extracts relevant
information from messages. MATLAB's NLP offering contains functions such as text cleaning,
tokenization, sentiment analysis, and topic models.

Several preprocessing steps, such as stop word removal, stemming, and lemmatization, are
essential for preparing text data for analysis. MATLAB has functions that make these processes
easier and more uniform to perform. The cleaned data can then be analyzed using a lexicon-based
method or, when the data set is large, more complex machine learning classifiers
(Chukwunweike et al., 2024).

LDA and other topic models allow latent themes to be aggregated from the text. When these
topics are visualized, researchers can get a better perspective of the users and market trends (Lee,
Wood and Kim, 2021).
2.4.2 Machine Learning and Artificial Intelligence
Machine learning methods have driven major advances in social media analytics by enabling
predictive modeling and pattern analysis. MATLAB's machine learning tools include a
comprehensive list of algorithms, ranging from Support Vector Machines to complex neural
networks (Kurani et al., 2021).

In sentiment analysis, supervised learning depends on training data, that is, examples labeled
by sentiment type. Feature extraction methods, such as Term Frequency-Inverse Document
Frequency (TF-IDF), convert text data into numerical form suitable for machine learning
algorithms (Nagaraj and J, 2021).
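
The TF-IDF weighting mentioned above can be illustrated with a minimal sketch. It is written in
Python rather than MATLAB purely for illustration, uses one common variant of the formula (term
frequency multiplied by log(N / document frequency)), and runs on three invented example
documents; MATLAB's own implementation may normalize differently.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical, already tokenized example posts (not taken from the study data).
docs = [
    "love this debate".split(),
    "hate this debate".split(),
    "love the vote".split(),
]
weights = tf_idf(docs)
print(weights[1])
```

Terms that occur in fewer documents, such as "hate" in this toy corpus, receive a higher weight
than terms shared by several documents.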

Besides sentiment classification, machine learning models can estimate the level of user
engagement based on previous experience. For instance, they can predict the chances of a
particular post going viral. These metrics are useful for businesses to align their content
marketing strategies.

2.4.3 Data Visualization


Visualization is an important part of analyzing social media data because it helps researchers
share their results. Valuable MATLAB visualization features include time series plots, word
clouds, and network plots, which give an at-a-glance view of the data (Nandi and Sharma, 2020).

Line graphs, often color-coded, indicate trends in sentiment or engagement over time as
influenced by events or changes in user behavior. Whereas word clouds show which terms or
hashtags appear most often, line graphs reveal how a given topic evolves.

Graph plots are especially useful for SNA because they represent users and their relationships.
Through such networks, researchers can pinpoint the key opinion leaders, group the users, and
track the flow of information in social media (Camacho et al., 2020).

2.5 Difficulties in Analyzing Social Media Data


Social media analytics faces several challenges that must be addressed to ensure accurate and
ethical insights:
Data Noise and Ambiguity: Social media content often contains informal language, numbers,
abbreviations, and irony, which makes it difficult to analyze (Habib and Raza, 2021). Handling
such text requires sophisticated NLP methods.

Real-Time Analysis: Social media is constantly changing, so data has to be collected live to
capture evolving trends. Building large systems that can process data in real-time is a technical
problem.

Ethical Concerns: Processing personal data raises privacy issues, especially under the GDPR.
Researchers must remain compliant and use anonymization measures to safeguard users'
identities (Bengtsson and Johansson, 2022).

2.6 Gaps in the Literature


Several gaps in social media data analysis warrant further research:

Integration of Multimodal Data: Many works are based on textual data, while the potential of
images, videos, and emojis is not considered (Tembhurne and Diwan, 2020).

Real-Time Monitoring: Although post-event analysis is typical, monitoring social media activity
in real-time is still a relatively uncharted area.

Cross-Language and Cross-Platform Analysis: Little work has been done in the area of cross-
language and cross-platform analysis, which could give a better picture of the trends happening
around the world.

Demographic and Psychological Insights: Demographic segmentation is widely used, but more
needs to be known about the psychological factors that affect the use of social networks (Nasir et
al., 2021).

2.7 Conclusion
Social media data analysis is a relatively new and growing study area with important
implications for organizations, scholars, and governments. Sentiment tracking, trend
identification, and social network mapping, among others, are useful paradigms in understanding
the users and social trends. Software like MATLAB offers a strong ability to handle big and
complex data, and the researchers have the advantage of graphics to reveal significant
relationships and patterns.

Nevertheless, difficulties like noisy data, ethical issues, and quickly evolving trends
demonstrate the need for new methods. Further development of these approaches, focusing on
multimodal integration, real-time monitoring, and cross-platform analysis, will improve the
understanding of social media's role in society.
3. Methodology
This research employs a systematic approach to examine social media data. It leverages three
datasets: (1) a Twitter dataset containing tweets, user activity, tweet-level engagement, and
trends related to a specific worldwide political event; (2) a second, more structured dataset
collected through surveys that offers demographic information about users, their emotional
state, and engagement habits; and (3) a Facebook dataset intended to identify users who can be
targeted to grow the business. Such insights should help Facebook make informed decisions about
identifying its valuable users and providing suitable recommendations to them. MATLAB is the
primary computational environment and provides toolboxes for data manipulation, analysis, and
visualization. The workflow comprises data collection, detailed preprocessing, application of
advanced computational tools, and interpretation of the results.

3.1 Data Collection


The choice of datasets was informed by this study's interest in understanding user behavior and
sentiment in various settings. The Twitter sample, obtained through the Twitter API, is raw,
real-time data, whereas the "Social Media Usage and Emotional Well-Being" dataset provides
structured, quantified data; together they support a balanced analysis of social media usage.

3.1.1 Twitter & Facebook Dataset


The Twitter and Facebook datasets were obtained using specific hashtags and keywords related
to a global political event. Hashtags such as #Elections and #Democracy were used because they
were relevant indicators of the discourse. The dataset consists of textual data, engagement
counts, and metadata for tweets, retweets, and replies, including likes, comments, shares,
date, time, geographical location, and follower counts. These fields provide context for
understanding users' behavior, engagement, and sentiment.

The collection process handled the Twitter API's rate limits through paginated requests and by
extracting data over several days. This approach provided enough coverage of user activity over
time, especially during the most important moments of the event, for instance, debates or the
declaration of the results.

3.1.2 Social Media Usage and Emotional Well-being Dataset


The second data set was collected from an open-source database and included semi-structured
data on user activity on different platforms. Key attributes include:

Demographics: Age, gender, and geographical location.

Engagement Metrics: Number of posts, likes, comments, and messages sent daily.

Emotional States: Each user is labeled with a predetermined dominant emotion, such as
happiness, sadness, or anxiety.

This dataset allowed for demographic targeting, making it easy to compare engagement and
emotional differences among users. In contrast to the Twitter dataset, this dataset needed little
preprocessing before analysis since it was already structured.

3.1.3 Ethical Considerations


The data collection process followed ethical guidelines. All datasets were de-identified so
that no personally identifiable information (PII) was stored. GDPR and other related data
protection guidelines were complied with. Only data available in the public domain, or datasets
released for research purposes with the consent of the participants, were used, and the results
are presented in a manner that does not identify any individual.

3.2 Data Preprocessing


It was important to clean the raw and incomplete data and prepare it for analysis. This step
focused on the issues related to unstructured textual data and structured numerical or categorical
data.

3.2.1 Preprocessing of Twitter Data


The Twitter dataset was cleaned in several stages using MATLAB's Text Analytics Toolbox.
Textual content was cleaned to remove noise while keeping the features that inform sentiment
and topic identification.

First, noise, including URLs, mentions (@username), and special characters, was removed from
the data. Hashtags were kept for trend analysis even though they are usually noisy. Emojis
carrying sentiment were translated to text using a custom dictionary (e.g., 😊 = happy), which
made it possible to include them in the sentiment analysis.

The text was then tokenized so that words were separated from the rest of the text to
facilitate word-by-word analysis. The tokens were then stemmed and lemmatized to map them to
their base form, such as changing 'running' and 'ran' to 'run.' Stop words such as 'and' and
'the' were omitted so that only meaningful words were considered. Lastly, in order to focus on
the tweets that convey an emotion, those with little semantic content, such as tweets
containing only URLs or vague words, were removed.
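
The cleaning steps described above can be sketched as follows. This is an illustrative Python
sketch, not the MATLAB Text Analytics Toolbox code used in the study; the emoji dictionary and
stop word list are small hypothetical stand-ins, and stemming or lemmatization is omitted for
brevity.

```python
import re

# Hypothetical stand-ins for the custom emoji dictionary and stop word list.
EMOJI_TO_TEXT = {"😊": "happy", "😢": "sad"}
STOP_WORDS = {"i", "and", "the", "a", "is", "to", "of", "this"}

def preprocess(tweet):
    """Clean one tweet and return its list of meaningful tokens."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)      # remove URLs
    text = re.sub(r"@\w+", " ", text)              # remove mentions
    for emoji, word in EMOJI_TO_TEXT.items():      # translate emojis to text
        text = text.replace(emoji, f" {word} ")
    text = re.sub(r"[^a-z0-9#\s]", " ", text)      # drop special characters, keep hashtags
    tokens = text.split()                          # whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]

tokens = preprocess("I love the debate! 😊 @user https://t.co/x #Elections")
print(tokens)
```

A tweet that yields an empty token list after this step carries little semantic content and
would be dropped, in line with the final filtering step described above.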

3.2.2 Data Cleaning of the Emotional Well-Being Dataset


For the "Social Media Usage and Emotional Well-Being" dataset, missing or incorrect values had
to be addressed. In numeric fields such as 'Daily Usage Time,' missing values were replaced
with the median to avoid listwise deletion; in categorical variables such as 'Dominant
Emotion,' missing values were replaced by the most frequently occurring value. Outliers, for
instance usage times beyond 24 hours, were detected using z-scores and removed from the
analysis.
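
A minimal Python sketch of the imputation and outlier rules described above is given here for
illustration. The values are invented, the z-score cut-off of 3 is an assumption (the report
does not state the threshold used), and the study itself performed these steps in MATLAB.

```python
import statistics

# Hypothetical 'Daily Usage Time' values in minutes; None marks a missing entry.
usage = [60, 75, 90, None, 120, 105, 95, 80, 110, 130, 70,
         85, None, 100, 115, 125, 65, 140, 150, 2000, 90, 100]
# Hypothetical 'Dominant Emotion' labels with one missing entry.
emotions = ["happy", "sad", None, "happy", "anxious"]

# Median imputation for the numeric field.
median_value = statistics.median(v for v in usage if v is not None)
filled_usage = [median_value if v is None else v for v in usage]

# Mode imputation for the categorical field.
mode_value = statistics.mode(e for e in emotions if e is not None)
filled_emotions = [mode_value if e is None else e for e in emotions]

# z-score outlier removal (threshold of 3 is an assumed, conventional choice).
mean = statistics.fmean(filled_usage)
std = statistics.pstdev(filled_usage)
cleaned_usage = [v for v in filled_usage if abs(v - mean) / std <= 3]

print(median_value, mode_value, len(filled_usage) - len(cleaned_usage))
```

Note that in very small samples a single extreme value cannot reach a z-score of 3, so the
threshold should be chosen with the sample size in mind.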

Time series data in both datasets were converted to MATLAB's datetime format and then
grouped into hourly, daily, and weekly frequency. Such standardization was helpful in analyzing
the temporal dynamics of users' activity and sentiment.
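
The grouping of timestamps into hourly, daily, and weekly bins can be sketched as follows. The
example uses Python's standard datetime module as an analogue of MATLAB's datetime type, and
the timestamps and their format are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical post timestamps; the real datasets may use another format.
timestamps = [
    "2024-03-01 09:15:00",
    "2024-03-01 09:45:00",
    "2024-03-01 18:05:00",
    "2024-03-02 09:30:00",
]
parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps]

# Count posts per hour, per calendar day, and per ISO (year, week) pair.
hourly = Counter(dt.replace(minute=0, second=0) for dt in parsed)
daily = Counter(dt.date() for dt in parsed)
weekly = Counter(tuple(dt.isocalendar())[:2] for dt in parsed)

for day, count in sorted(daily.items()):
    print(day, count)
```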

3.3 Sentiment Analysis


Sentiment analysis classified the sentiment of the tweets using both lexicon-based and
machine-learning methods.

Lexicon-based sentiment analysis used sentiment dictionaries available in MATLAB. The
sentiment of each tweet was determined by adding the sentiment values of the words used in the
tweet. For instance, the tweet "I love this product! 😊" scores as positive because it contains
the positive word "love" and the positive emoji "😊." Nevertheless, lexicon-based techniques
could not recognize context-dependent phrases, such as sarcasm or ambiguous language, which
called for machine learning classifiers.

For machine learning sentiment classification, features were extracted using Term
Frequency-Inverse Document Frequency (TF-IDF), which quantified the relevance of terms in the
dataset. Classification algorithms such as SVMs and Random Forests were used to classify the
text into three categories: positive, negative, or neutral. The models were assessed with
metrics such as F1 score and Area Under the Curve (AUC), and scored above 85%.
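
For reference, the F1 score for a three-class problem can be computed as sketched below. The
report does not state how per-class scores were averaged, so macro averaging is an assumption
here, and the labels are invented for illustration.

```python
def macro_f1(y_true, y_pred, labels):
    """Average the per-class F1 scores with equal weight per class."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)

# Hypothetical true and predicted sentiment labels.
y_true = ["pos", "pos", "neg", "neg", "neu", "neu"]
y_pred = ["pos", "neg", "neg", "neg", "neu", "pos"]
score = macro_f1(y_true, y_pred, ["pos", "neg", "neu"])
print(round(score, 3))
```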

3.4 Engagement Analysis


The engagement analysis was a major part of this study, measuring and explaining the level of
user interaction with content shared on different platforms. Quantitative indicators, including
likes, shares, comments, and retweets, were used to capture user engagement. Engagement rates
were computed by dividing these metrics by the total number of posts to compare user activity
levels. Total Interactions was defined as the total number of times a post was liked, shared,
and commented on. Before these metrics could be computed, raw, noisy data had to be
transformed into structured formats suitable for analysis; this step addressed challenges
unique to unstructured textual data and structured numerical or categorical data.
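
The engagement rate defined above (total interactions divided by the number of posts) can be
sketched in a few lines. The field names and counts are hypothetical, and Python is used only
for illustration.

```python
# Hypothetical per-post interaction counts.
posts = [
    {"likes": 40, "shares": 10, "comments": 5},
    {"likes": 60, "shares": 20, "comments": 10},
    {"likes": 50, "shares": 0, "comments": 5},
]

def total_interactions(post):
    """Sum of likes, shares, and comments for one post."""
    return post["likes"] + post["shares"] + post["comments"]

def engagement_rate(posts):
    """Average number of interactions per post."""
    return sum(total_interactions(p) for p in posts) / len(posts)

rate = engagement_rate(posts)
print(round(rate, 2))
```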

3.4.1 Preprocessing Twitter & Facebook Data


The Twitter dataset was cleaned in several stages using MATLAB's Text Analytics Toolbox.
Most text content was preprocessed to minimize noise while attempting to keep aspects
important to sentiment and topic analysis.

First, noise was eliminated, including URLs, mentions (@username), and special characters.
Although frequently noisy, hashtags were kept for trend analysis because they inform user
discussions. Since emojis contain sentiment-bearing information, they were translated into text
(e.g., 😊 = happy) so that they were captured in sentiment analysis.

Word tokenization was employed during text preprocessing to break the text into words for
finer analysis. Next, numbers were removed, and the words were lemmatized to their base form,
standardizing "running" and "ran" to "run". Stop words such as 'and' or 'the' were removed so
that only terms with meaningful content were analyzed. Last, tweets with little semantic
meaning, for example those containing only links or general words, were removed to focus on
the tweets carrying sentiment.

3.4.2 Preprocessing the Emotional Well-Being Dataset


The "Social Media Usage and Emotional Well-Being" dataset involved preprocessing by dealing
with missing and erroneous values. In the case of numeric fields such as 'Daily Usage Time,'
missing values were imputed using the median to prevent bias. In contrast, for categorical fields
such as 'Dominant Emotion,' missing values were imputed using mode. Extreme values, for
example, usage times that are beyond a reasonable expectation of 24 hours, were removed using
z-scores.

Time series data in both datasets were converted to MATLAB datetime format to facilitate
binning into hourly, daily, and weekly bins. This standardization helped analyze temporal
patterns of users' activity and attitude.

3.4.3 Sentiment Analysis
Sentiment analysis classified the emotional polarity of tweets using a hybrid of lexicon-based
and machine learning methods.

Lexicon-based sentiment analysis uses sentiment dictionaries that are available in MATLAB.
The sentiment score of each tweet was obtained by adding the sentiment scores of the words that
made up the tweet. For instance, the positive score was obtained from words such as "love" and
the "😊" emoticon. Nevertheless, methods based on a fixed lexicon were unsuitable for context-
dependent phrases, which include sarcasm or ambiguous language, making it clear that machine
learning classifiers would be more appropriate.

For machine learning sentiment analysis, feature extraction was carried out with Term
Frequency-Inverse Document Frequency (TF-IDF) to weight the importance of terms in the
documents. Support Vector Machine (SVM) and Random Forest classifiers, which depend on
labelled datasets, were used to predict sentiment categories. These models were assessed using
the F1-score and Area Under the Curve (AUC), and both scores were above 85%.

3.5 Advanced Computational Techniques


Because of the size and type of data collected, the analysis was done computationally.
Non-linear relationships in engagement metrics were approximated using piecewise Chebyshev
polynomials. These polynomials provided a near-optimal way of modelling such relationships,
for example how sentiment influenced engagement rates. The approach was computationally
convenient because the data was partitioned into intervals and a polynomial approximation was
tuned within each interval.
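
A piecewise Chebyshev approximation of this kind can be sketched as follows. The sketch is in
Python for illustration, and the sentiment-to-engagement curve is a made-up smooth function
rather than a relationship estimated in the study; the number of pieces and the number of
Chebyshev nodes per piece are likewise assumptions.

```python
import math

def cheb_fit(f, a, b, n):
    """Chebyshev coefficients of the interpolant of f on [a, b] using n nodes."""
    nodes = [math.cos(math.pi * (k + 0.5) / n) for k in range(n)]
    values = [f(0.5 * (b - a) * x + 0.5 * (a + b)) for x in nodes]
    coeffs = [
        (2.0 / n) * sum(values[k] * math.cos(math.pi * j * (k + 0.5) / n)
                        for k in range(n))
        for j in range(n)
    ]
    coeffs[0] *= 0.5
    return coeffs

def cheb_eval(coeffs, a, b, t):
    """Evaluate the Chebyshev series at t in [a, b] with Clenshaw's recurrence."""
    x = (2.0 * t - a - b) / (b - a)
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = c + 2.0 * x * b1 - b2, b1
    return coeffs[0] + x * b1 - b2

def piecewise_fit(f, lo, hi, pieces, n):
    """Split [lo, hi] into equal intervals and fit each one separately."""
    width = (hi - lo) / pieces
    bounds = [(lo + i * width, lo + (i + 1) * width) for i in range(pieces)]
    return [(a, b, cheb_fit(f, a, b, n)) for a, b in bounds]

def piecewise_eval(model, t):
    for a, b, coeffs in model:
        if a <= t <= b:
            return cheb_eval(coeffs, a, b, t)
    raise ValueError("t is outside the fitted range")

def engagement_curve(s):
    """Hypothetical engagement rate as a function of sentiment score s."""
    return 50.0 + 40.0 * math.tanh(3.0 * s)

model = piecewise_fit(engagement_curve, -1.0, 1.0, pieces=4, n=10)
grid = [-1.0 + 2.0 * i / 400 for i in range(401)]
max_error = max(abs(piecewise_eval(model, s) - engagement_curve(s)) for s in grid)
print(max_error)
```

Fitting each interval separately keeps the polynomial degree low while still tracking the
steep middle part of the curve.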

Several anonymization measures were integrated into the analysis to protect any information
that may be sensitive to the user. Specifically, we used CKKS homomorphic encryption to perform
computations on encrypted data so that the raw user information was not revealed during the
computation. For example, demographic and sentiment data were encrypted before analysis so
that computations could be run on them while preserving the anonymity of the users.

Performance was further improved with optimizers such as Nesterov Accelerated Gradient (NAG).
These reduced the computational time required for various models to converge to half when
training on large datasets. NAG was most beneficial in sentiment classification and engagement
prediction, where speed is important when performing multiple runs.
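
The NAG update can be illustrated on a toy problem. In the Python sketch below, the loss is a
simple quadratic chosen for illustration, and the learning rate, momentum, and step count are
assumed values; none of them are taken from the models trained in the study.

```python
def nag_minimize(grad, theta, lr=0.05, momentum=0.9, steps=300):
    """Nesterov Accelerated Gradient: evaluate the gradient at a look-ahead point."""
    velocity = [0.0] * len(theta)
    for _ in range(steps):
        lookahead = [t + momentum * v for t, v in zip(theta, velocity)]
        g = grad(lookahead)
        velocity = [momentum * v - lr * gi for v, gi in zip(velocity, g)]
        theta = [t + v for t, v in zip(theta, velocity)]
    return theta

def grad_loss(theta):
    """Gradient of the toy loss 0.5 * (x - 1)^2 + 5 * (y + 2)^2."""
    x, y = theta
    return [x - 1.0, 10.0 * (y + 2.0)]

solution = nag_minimize(grad_loss, [5.0, 5.0])
print(solution)
```

The only difference from classical momentum is that the gradient is taken at the look-ahead
point rather than at the current parameters.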

3.6 Social Network Analysis


Ties in the Twitter corpus were examined with Social Network Analysis. Users were represented
as nodes, and each retweet, reply, or mention was represented as an edge in the graph. The
network structure provided insight into the flow of information among users.

Degree centrality and betweenness centrality were calculated to identify the key users. Degree
centrality measured the number of first-degree connections a user had, while betweenness
centrality measured a user's capacity to bridge different sections of the network. Users with
high centrality scores were identified as opinion leaders, that is, the influential users.
These users were very active in sharing content and encouraging other users to share the
content they posted.
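
Both centrality measures can be computed on a small interaction graph as sketched below. The
sketch is in Python, treats the graph as undirected and unweighted (an assumption, since
retweets and mentions could also be modelled as directed edges), uses Brandes' algorithm for
betweenness, and runs on six hypothetical users.

```python
from collections import defaultdict, deque

# Hypothetical interactions (retweet, reply, or mention) between users.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def degree_centrality(adj):
    """Share of the other users each user is directly connected to."""
    n = len(adj)
    return {u: len(nbrs) / (n - 1) for u, nbrs in adj.items()}

def betweenness_centrality(adj):
    """Unnormalized betweenness via Brandes' algorithm (undirected, unweighted)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, queue = [], deque([s])
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        while queue:                      # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                      # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice

degree = degree_centrality(adj)
betweenness = betweenness_centrality(adj)
print(max(betweenness, key=betweenness.get))
```

In this toy graph the two users joining the triangles act as bridges, so they score highest on
betweenness even though their degree is only slightly above the others.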
Clustering algorithms were applied to the network to detect communities. For instance, users
interacting with hashtags like #ClimateChange were grouped together because they share an
interest in climate issues. These clusters were then represented in modularity-based network
graphs to illustrate the users' connectedness.

SNA also helped identify content-sharing trends. For instance, the study showed that the
centrality of the users was a significant factor in the probability of the tweets being retweeted,
proving that such users are opinion leaders.

3.7 Visualization
Visualization was an important part of the analysis as it offered easy-to-understand data
representations. Word clouds presented frequently used words and hashtags, highlighting the
most popular items. For instance, the word cloud for the tag #Elections consisted of words such
as vote, poll, and debate, which were the focus of the discussions.

Sentiment and engagement were analyzed over time with temporal plots. These plots showed
trends, such as increased positive tone during an important announcement or high activity during
sensitive discussions. The plots also gave the timing of user activity and the progression of
event-related discussions from start to end.

The social network diagrams depicted the organization of the network and how the users were
connected. These diagrams highlighted influential nodes (users) and the connections that guided
the flow of information within the network. When combined with the community detection
results, the diagrams clearly show user activity and interaction.

3.8 Reporting and Recommendations


The analysis's results were summarized into recommendations for businesses, marketers, and
policymakers. For businesses, the study offered recommendations on how to get the most out of
content, including using positive tone and media content. Marketers were encouraged to segment
their campaigns according to demographic preferences, emphasizing age and gender.

For policymakers, the analysis pointed to the need to monitor public discourse sentiment in
order to intervene when misinformation is present or the public is concerned. Recommendations
also involved improving users' emotional health using messages and content relevant to their
needs.
3.9 Limitations
Some limitations of the methodology were recognized. The data were collected from two social
media platforms where data are usually text-based, so the findings might differ on other SNS.
Encryption techniques were found to cause computational delays that affected real-time
analysis, but the use of optimized algorithms helped to reduce these delays to some extent.

The topics in LDA were labeled subjectively since the analysis of the thematic structures
incorporated the researcher's judgment. Cross-validation was done to reduce bias in topic
labeling, and the results were reviewed by independent personnel. Future research should
address these drawbacks by including cross-platform data and automated topic categorization
methods.
4. Results, Discussion, and Recommendations
4.1 Results

4.1.1 Twitter Dataset Analysis


This section presents the results obtained by applying the methodology to the Twitter dataset
in MATLAB. An overview of the dataset is shown below.

Figure 1: The overview of Twitter Dataset

Engagement Metrics
The average numbers of likes, retweets, and replies per post were computed to gauge the overall
level of user engagement with the content. The mean number of likes was 49.93 and the mean
number of retweets was 49.72, which can be considered low to moderate engagement. The mean
number of replies evaluated to NaN, indicating missing values in the reply field.

 #COVID19

 #BlackLivesMatter

 #ClimateChange

 #Bitcoin

 #TechNews
Figure 2: Average Engagement Metrics Result of Data in MATLAB

With these hashtags, it could be understood that the dataset contains a vast amount of discourse
on international health crises (COVID-19, etc.), social activism (Black Lives Matter, etc.), and
innovations (Bitcoin, AI, etc.). The use of these hashtags indicates that these issues were most
probably of interest when the tweets were gathered.

Sentiment Analysis
The tweets were also analyzed for sentiment to determine whether they were positive, negative,
or neutral. This study employed a positive and negative word list. Positive words were 'good',
'great', 'awesome', and 'happy', while negative words were 'bad', 'sad', 'hate', and 'terrible'.

The number of positive and negative words in each tweet was counted to arrive at a score for
each. If the tweet had more positive words than negative, it was considered positive, and if it had
more negative words than positive, it was considered negative. Tweets that had an equal number
of positive and negative words were considered neutral.
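
The counting rule described above can be sketched as follows, using the positive and negative
word lists given in this section. The sketch is in Python for illustration (the study used
MATLAB), and the simple regular-expression tokenizer is an assumption.

```python
import re

POSITIVE = {"good", "great", "awesome", "happy"}
NEGATIVE = {"bad", "sad", "hate", "terrible"}

def sentiment_score(tweet):
    """Return 1 (positive), -1 (negative), or 0 (neutral) by word counts."""
    words = re.findall(r"[a-z']+", tweet.lower())
    positive = sum(w in POSITIVE for w in words)
    negative = sum(w in NEGATIVE for w in words)
    return (positive > negative) - (positive < negative)

# Hypothetical example tweets.
examples = [
    "What a great and happy day",
    "I hate this terrible, bad idea",
    "Good start, bad ending",
]
scores = [sentiment_score(t) for t in examples]
print(scores)
```

A tweet with no lexicon words at all also scores 0, so the neutral class mixes balanced tweets
with tweets the lexicon simply does not cover.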

Figure 3: Results of Sentiment Analysis of Data in MATLAB

Figure 4: Results of Average Sentiment Score of Data in MATLAB

The sentiment scores were appended to the dataset as a new column called sentiment_score, with
positive sentiment assigned a score of 1, negative sentiment a score of -1 and neutral sentiment a
score of 0. This classification allowed us to analyze the correlation between sentiment and user
engagement in further stages.

Figure 5: Scatter Plot of Sentiment Analysis in MATLAB

Word Cloud
A word cloud was created from the cleaned tweet text to show the most used terms in the
dataset. The words “production,” “politics,” and “development” stood out as occurring
frequently in the collected Twitter data.
Figure 6: Results of Word Cloud of Data in MATLAB

Topic Modeling
The Latent Dirichlet Allocation (LDA) was used to identify five main topics in the Twitter
dataset. These topics represented distinct themes in user discussions, such as:

Political Opinions: This topic was characterized by terms such as voting, debate, and
government.

Social Issues: Words such as 'community,' 'climate,' and 'justice' were often used.

User Reactions: Examples of adjectives used included amazing, frustrated, and happy, all of
which pertain to emotive responses to certain occurrences.

The LDA model was fitted iteratively, and the perplexity decreased across iterations,
suggesting that the model was suitable for identifying coherent topics.
Figure 7: LDA Topic Modelling in MATLAB

Figure 8: Topic Modeling of Data in MATLAB

Hashtag Analysis
The dataset didn't include hashtags, which prevented understanding the end-user-driven trend
and keyword association. This absence highlighted the dataset's incompleteness and the need to
include hashtags in the subsequent analysis.

Figure 9: Hashtag Analysis of Data in MATLAB


4.1.2 Well-Being Dataset Analysis
This section presents the results obtained by applying the methodology to the Well-Being
dataset in MATLAB. An overview of the dataset is shown below.

Figure 10: Well-Being Dataset Overview Using MATLAB

Engagement Patterns
The Well-Being dataset provided insights into user engagement metrics segmented by platform
and demographics:

Average Daily Usage Time: Users spent the most time on Instagram (120 minutes), followed by
Twitter (90 minutes) and Facebook (60 minutes).

Engagement Metrics: On Instagram, female users reported higher likes and comments per day,
and male users reported higher message-sending activity on Twitter.

Figure 11: Analysis of Engagement Patterns in MATLAB

Emotion Distribution
The dominant emotions in the dataset were distributed as follows:

Happiness: 200 users

Sadness: 170 users

Neutral: 140 users

Anger: 130 users

Anxiety: 160 users


In this distribution, Happiness was the single most frequent dominant emotion (200 of 800
users, or 25%), although the negative emotions combined (Sadness, Anger, and Anxiety; 460
users, or 57.5%) accounted for the majority. Users who reported Anxiety and Sadness also spent
more time online, possibly indicating a relationship between negative emotions and longer
social media use.

Figure 12: Correlation Analysis of Data in MATLAB

Correlation Analysis
Correlating dominant emotion with engagement metrics was inconclusive because the emotion data
was categorical. This limitation points to the need for more advanced statistical techniques
or numeric representations of emotional states to allow effective correlation analysis.
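
One technique that works directly with a categorical variable and a numeric one is the correlation ratio (eta squared), which measures how much of the variance in the numeric variable is explained by group membership. The Python sketch below is a suggestion for such an analysis, not the method used in the report, and the sample values are hypothetical.

```python
def correlation_ratio(categories, values):
    """Eta squared: between-group sum of squares divided by total sum of squares.

    Returns a number between 0 (group means are identical) and 1 (all variation
    in the values is explained by the category).
    """
    if len(categories) != len(values) or not values:
        raise ValueError("categories and values must be non-empty and equal in length")
    groups = {}
    for category, value in zip(categories, values):
        groups.setdefault(category, []).append(value)
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # no variation at all, so nothing to explain
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups.values()
    )
    return ss_between / ss_total


# Hypothetical dominant emotions and daily usage minutes, for illustration only.
emotions = ["Happiness", "Happiness", "Anxiety", "Anxiety", "Sadness", "Sadness"]
minutes = [60, 70, 120, 130, 110, 100]
eta_squared = correlation_ratio(emotions, minutes)
```

A value close to 1 would indicate that usage time differs strongly between emotion groups; it says nothing about the direction of any causal link.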

4.1.3 Facebook Dataset Analysis


This section presents the results of analyzing the Facebook dataset in MATLAB using the
methodology described earlier. An overview of the dataset is shown below.

Figure 13: Facebook Dataset Overview using MATLAB


Temporal Trends
The Facebook dataset turned out to be a mismatch for the study's purpose: it consists primarily
of stock market data rather than user engagement metrics. Columns in the dataset, including
"Open," "High," "Low," and "Close," were analyzed to observe trends over time. The temporal
analysis showed that stock prices fluctuated over certain periods, but this had little bearing on
engagement or sentiment analysis.

Figure 14: Temporal Trend of Open Analysis in MATLAB


Figure 15: Temporal Trend of Close Analysis in MATLAB
Figure 16: Temporal Trend of High Analysis in MATLAB
Figure 17: Temporal Trend of Low Analysis in MATLAB
Figure 18: Temporal Trend of Volume Analysis in MATLAB

4.2 Discussion

4.2.1 Engagement and Sentiment


The Twitter analysis showed that likes are positively correlated with sentiment, consistent with
previous literature suggesting that emotionally engaging content attracts more user interaction.
However, the lack of share and comment data made it impossible to fully understand
engagement dynamics. Future datasets should contain complete engagement metrics to build
insights into how users engage with different types of content.

4.2.2 Platform-Specific Usage


The Well-Being dataset showed stark differences in platform usage and engagement by
demographics. Instagram's visual and emotionally engaging nature most likely led to higher daily
usage times and interaction rates, especially among younger and female users. Twitter's
text-heavy and fast-paced environment appeared to suit users who engage in quick message
exchanges, such as the male users in the dataset.

At the same time, the apparent association between negative emotions (e.g., Anxiety, Sadness)
and prolonged usage raises questions about the psychological side effects of overexposure to
social media. These results are in line with previous work showing an association between
excessive social media usage and mental health issues, but they do not establish causation.

4.2.3 Topic Trends


User discussions were analyzed through topic modeling, which identified distinct themes
reflecting social issues, political opinions, and personal reactions. Words relating to climate and
community issues dominated the discussions, showing the power of social media to mobilize
users and provide a public forum.

The absence of hashtags in the Twitter dataset prevented insights into user-driven trends, and
future studies should collect more comprehensive data. Hashtags, on their own, are useful
indicators of what is trending and how the audience is engaging.

4.3 Recommendations

4.3.1 For Businesses


Content Strategy: Engaging content with a positive sentiment increases user interactions. Videos
or images can be added to increase the level of engagement.

Targeted Marketing: Use demographics to target specific campaigns. For instance, Instagram
suits campaigns aimed at younger users, especially females, while Twitter suits quick,
text-based interactions.

Monitoring Emotional Trends: Create ways to track changes in users' attitudes to adjust
marketing and customer service approaches.

4.3.2 For Policymakers


Mental Health Interventions: Address the potential psychological effects of prolonged social
media use, especially for users who report Anxiety or Sadness. The promotion of social media
should also include messages that encourage responsible use and support mental health.

Public Sentiment Analysis: Social media sentiment data can be used to analyze the public's
perception of policies and programs, thus allowing early responses to grievances.

4.3.3 For Researchers


Dataset Completeness: Subsequent research should ensure that the datasets contain engagement
metrics such as shares, comments, and hashtags for trend analysis.

Advanced Correlation Methods: Methods can be applied to compare categorical emotional states
and numerical engagement indicators, for example, by converting emotions into sentiment
scores.

Cross-Platform Studies: Broaden the study to include platforms beyond those analyzed here to
increase data coverage of user behavior and trends.
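
The idea of converting emotions into sentiment scores can be sketched as follows, in Python rather than the MATLAB used for the report. The valence mapping (+1, 0, -1) is an assumption made for illustration, not a validated scale, and the sample values are hypothetical. Once emotions are numeric, an ordinary Pearson correlation with an engagement metric becomes possible.

```python
import math

# Assumed valence scores for each dominant emotion (illustrative, not validated).
VALENCE = {
    "Happiness": 1.0,
    "Neutral": 0.0,
    "Sadness": -1.0,
    "Anger": -1.0,
    "Anxiety": -1.0,
}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Hypothetical users: dominant emotion and daily usage in minutes.
emotions = ["Happiness", "Neutral", "Sadness", "Anxiety"]
daily_minutes = [60, 90, 120, 130]
valence_scores = [VALENCE[e] for e in emotions]
r = pearson(valence_scores, daily_minutes)
```

In this toy data the coefficient is negative, meaning lower valence goes with longer usage. A coarse three-level score discards the differences between Sadness, Anger, and Anxiety, so a finer-grained scale would be preferable in a real study.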

4.3.4 Conclusion
This chapter provided an overview of the analysis of social media datasets, including the level of
user engagement, sentiment, and topics. However, some limitations in the data prevented the
authors from performing some analyses. The recommendations are intended to address these
gaps and inform future research, business practice, and policymaking regarding the use of social
media data.
5. Conclusion
5.1 Overview of the Study
This study explored user behavior, sentiment trends, and engagement dynamics on social media
platforms using the "Social Media Usage and Emotional Well-Being" and Twitter datasets. It
used computational tools and techniques within MATLAB to explore patterns and gain insights
into users' interaction with social media, how sentiment would affect engagement, and how
demographic factors would influence user behavior. The result is a robust framework for future
social media analytics studies that combine social media analytics, computational modeling, and
behavioral research.
The datasets were analyzed using sentiment analysis, engagement metrics, topic modeling, and
demographic segmentation. The data's complexity and sensitivity were handled using advanced
methods such as Latent Dirichlet Allocation (LDA) and homomorphic encryption. Visualization
techniques such as word clouds, temporal plots, network diagrams, etc., aided in interpreting the
findings.

5.2 Key Findings

5.2.1 Engagement Patterns


The analysis revealed significant insights into engagement dynamics:
Twitter Engagement: Posts with more positive sentiment tended to receive more likes, which
supports the idea that emotionally engaging content resonates better with audiences. However,
the lack of share and comment data prevented a full understanding of engagement metrics.
Platform-Specific Usage: Users spent more time on Instagram than on other platforms and
interacted more frequently. Male users were more active on Twitter, and female users were more
active on Instagram.
5.2.2 Emotional States and Usage
The Well-Being dataset provided valuable insights into the relationship between emotional states
and social media usage:
Users experiencing anxiety and sadness reported longer daily usage times, while users who felt
happy reported shorter daily usage times. This finding is consistent with worries
about a possible connection between excessive social media use and mental health problems.
Happiness was the most frequently reported dominant emotion, and the higher interaction levels
seen on some platforms point to complex behavioral motivations.

5.2.3 Trends and Topics


Topic modeling identified distinct themes in user discussions on Twitter:
Political Opinions: Words like 'vote,' 'debate,' and 'government' dominated.
Social Issues: Terms like 'climate,' 'justice,' and 'community' were prominent.
Emotional Reactions: Expressions like 'amazing,' 'frustrated,' and 'happy' appeared frequently.
These topics reflected broader societal concerns and public discourse during the period under
analysis. However, the lack of hashtags in the dataset prevented us from tracking user-driven
trends comprehensively.

5.3 Contributions of the Study


This research contributes significantly to the field of social media analytics by:
Framework Development: A robust methodology for analyzing large-scale, heterogeneous
datasets using advanced computational techniques.
Practical Insights: Actionable insights that businesses can use to optimize content strategies,
policymakers can use to monitor public sentiment, and researchers can use to explore social
media's impact on behavior and mental health.
Innovative Methods: Addressing data complexity and privacy concerns using advanced tools
like LDA, homomorphic encryption, and optimization algorithms.
Understanding Emotional Dynamics: Linking emotional states to social media usage, which
highlights the implications for mental health interventions and platform design.
5.4 Limitations
While this study provides valuable findings, several limitations were identified:

5.4.1 Dataset Completeness


The Twitter dataset did not have columns for shares and comments, which limited the depth of
the engagement analysis. Likewise, the lack of hashtags made it difficult to comprehensively
explore user-driven trends and keyword co-occurrences.

5.4.2 Platform Scope


Because the study focused on Twitter, Instagram, and Facebook, it captures only part of how
users behave and leaves out other platforms such as TikTok, LinkedIn, and Snapchat.
Engagement patterns can differ with each platform's design and audience.

5.4.3 Emotional Data Representation


Advanced statistical techniques could not be used because emotional states in the Well-Being
dataset are categorical. If emotions were represented as numerical sentiment scores, richer
analyses, such as correlations with engagement metrics, could have been performed.

5.4.4 Topic Modeling: Subjectivity


The topics identified through LDA were interpreted and labeled manually, introducing a certain
degree of subjectivity. While attempts were made to make these labels as accurate as possible,
future studies might benefit from improved objectivity through automated labeling methods.

5.5 Recommendations

For Businesses
Content Optimization: Rather than simply reporting news, highlight emotionally engaging,
positive content that attracts likes. Multimedia elements, such as videos and images,
should be added to increase visual appeal and interest.
Targeted Marketing: Apply demographic insights to platform-specific strategies. For instance, a
campaign for a younger audience should focus on Instagram, whereas a Twitter campaign
should focus on rapid, text-based engagement.
Sentiment Monitoring: Develop real-time tools to track public sentiment and adjust
campaigns based on user feedback.
For Policymakers
Public Sentiment Analysis: Use social media sentiment data to monitor how the public
reacts to policies and initiatives. Early detection of negative sentiment enables timely
intervention on public concerns.
Mental Health Awareness: Because positive and negative emotions relate differently to social
media usage, encourage balanced use, particularly among users who report negative emotions.
Public health campaigns can raise awareness of the psychological implications of overuse and
of the mental well-being resources available to users.

For Researchers
Expand Dataset Coverage: Enable richer analyses by collecting more comprehensive datasets
that include all engagement metrics (likes, shares, comments) and any hashtags used in the
posts.
Enhance Emotional Metrics: Develop methods to quantify emotional states numerically,
enabling advanced correlation analyses and cross-platform comparisons.
Explore Cross-Platform Dynamics: Look at how users behave across different platforms and
how different platform designs affect engagement and user sentiment.

5.6 Future Directions


To build on the findings of this research, future studies should:
Integrate Real-Time Analysis: Consider streaming data when analyzing dynamic changes in user
behavior and sentiment as events develop.
Adopt Cross-Platform Datasets: Extend the analysis to additional platforms such as
TikTok, LinkedIn, and Snapchat to capture a richer universe of user interactions.
Refine Analytical Techniques: Explore advanced topic modeling methods and machine learning
algorithms to improve insight accuracy and depth.
Investigate Psychological Impacts: Conduct longitudinal studies to investigate whether there is a
causal relationship between social media use and mental health, including emotional states.

5.7 Final Remarks


This research contributes to showing the transformative power of social media analytics to aid in
understanding user behavior, public sentiment, and engagement patterns. The study uses
computational tools and methodologies to lay a foundation for meaningfully analyzing large-
scale data. This has implications for businesses, policymakers, and researchers trying to refine
their strategies, find solutions to societal challenges, and enhance their knowledge of the digital
environment.
Social media will continue to shape public discourse, mental health, and public behavior.
Harnessing its data responsibly and effectively remains a critical challenge, and this study
represents a first step in that direction.
6. Future work
The findings presented in the previous chapters give a clear and detailed picture of user
engagement, sentiment, and trending topics in a Twitter dataset. However, given the constantly
evolving nature of social media, several directions for further research and development should
be noted. The following sections describe possible directions for extending and enhancing the
current study.

6.1 Expanding Data Sources


The current analysis is based on one dataset collected from the Twitter API or other datasets that
are publicly available. In future studies, data from other social media platforms, including
Facebook, Instagram, and Reddit, could be included to broaden the research. These platforms
host a variety of user activities and participation patterns, which would give deeper insight into
cross-platform social media trends. In addition, including data in several languages would
support a global view of sentiment and user activity and make the results more accurate for
specific localities.

Moreover, trend analysis could be carried out in real time if the data were taken from a live
Twitter or other social media feed. This would enable researchers to follow the development of
hashtags, topics, and sentiments as they emerge while providing useful information for
businesses, advertisers, and policymakers.

6.2 Improving the Methods of Sentiment Analysis


The sentiment analysis method applied in this study is lexicon-based and quite simple, so it can
struggle with sarcasm, irony, and context-dependent meanings. A promising direction is to use
transformer-based deep learning models, such as BERT or GPT, which are pre-trained on large
text corpora, in some cases covering many languages, and generally perform better at
understanding natural language. Such models can capture contextual and nuanced emotions and
so give better sentiment classifications.
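
The weakness of a lexicon-based approach can be seen in a minimal Python sketch. The word lists below are hypothetical and far smaller than a real sentiment lexicon, and the scoring rule (positive matches minus negative matches) is a simplification that may differ from the scoring used in the report.

```python
import re

# Tiny illustrative lexicons; a real lexicon contains thousands of entries.
POSITIVE_WORDS = {"amazing", "happy", "great", "love"}
NEGATIVE_WORDS = {"frustrated", "sad", "terrible", "hate"}


def lexicon_score(text):
    """Count positive words minus negative words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((word in POSITIVE_WORDS) - (word in NEGATIVE_WORDS) for word in words)


sincere = lexicon_score("What an amazing day, so happy!")
negative = lexicon_score("Frustrated and sad about the result.")
# A sarcastic complaint still scores as positive because only individual
# words are inspected, never the context they appear in.
sarcastic = lexicon_score("Oh great, my flight is delayed again. Love that.")
```

The sarcastic sentence receives the same positive score as the sincere one, which is the kind of error a context-aware model is meant to reduce.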
Additionally, sentiment analysis might be expanded beyond text to include the pictures, emojis,
and videos that users post on Twitter. Given the growing use of visual content on social media,
analysis that incorporates both text and non-text features may offer better insight into the
public mood.

6.3 Understanding the level of engagement and creating a user profile


The current engagement analysis mainly includes likes, retweets, and replies. Future work could
build on this analysis by including other types of engagement, including quote tweets, mentions,
and click-through rates on links posted in tweets. Including these metrics gives a better
understanding of social media consumption and sharing of content.

Another promising avenue would be to extend user profiling. By segmenting users by
demographic information (age, gender, location, and so on), behavior (influencers, casual
users), and psychological characteristics (sentiment types, levels of engagement, etc.), the
analysis could reveal which user groups are most likely to engage with particular topics. For
example, influencer analysis could go beyond identifying influencer accounts to examine the
types of tweets they publish and how these affect the overall sentiment and engagement of the
CN.

6.4 Moving Average, Trends and Topics


In this study, trend detection was limited to identifying the frequency of hashtags and using LDA
for topic modelling. This scope could be broadened with other topic modelling approaches, such
as Non-Negative Matrix Factorization (NMF) or Latent Semantic Analysis (LSA). These
methods may give a more refined categorization of topics, so that emerging issues can be found
even when no clear keywords or hashtags define them.

Further, the temporal aspect of trend detection could be investigated in future research. This
would involve learning how topics change over time and finding out when a discussion is likely
to turn. Dynamic topic models could be applied to understand how topic prominence changes,
which may give insight as to what kind of events influence public discussions.
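
A full dynamic topic model is beyond a short example, but the underlying question of how prominent a topic is in each time period can be approximated with a simple keyword-share count. The Python sketch below is that simpler stand-in, not a dynamic topic model; the period labels, posts, and keyword set are hypothetical.

```python
from collections import Counter


def topic_share_by_period(posts, topic_keywords):
    """Share of posts per period that contain at least one topic keyword.

    posts is an iterable of (period_label, text) pairs.
    """
    totals = Counter()
    hits = Counter()
    for period, text in posts:
        totals[period] += 1
        # Plain whitespace tokenization; real text would need punctuation handling.
        if set(text.lower().split()) & topic_keywords:
            hits[period] += 1
    return {period: hits[period] / totals[period] for period in sorted(totals)}


# Hypothetical posts grouped by month, for illustration only.
posts = [
    ("2024-01", "climate action now"),
    ("2024-01", "lunch was good"),
    ("2024-02", "climate justice march"),
    ("2024-02", "new climate bill"),
]
share = topic_share_by_period(posts, {"climate", "justice"})
```

A rising share across periods would suggest that a topic is gaining prominence, which is the signal a dynamic topic model estimates more rigorously.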
6.5 Advanced Social Network Analysis
For SNA, the algorithms used to identify influential users could be improved. Future work could
consider additional centrality measures, as well as community detection algorithms such as
Louvain or Infomap, which would enable the identification of user clusters based on their
interactions. These methods could uncover otherwise hidden subgraphs and communities within
the large-scale Twitter network, improving the current understanding of how information
spreads within these groups.
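
As a point of reference for the centrality measures mentioned above, degree centrality is the simplest one: the fraction of other users a given user is directly connected to. The Python sketch below assumes an undirected interaction graph given as a list of user pairs; the user names are hypothetical. Community detection methods such as Louvain or Infomap are considerably more involved and are not shown here.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree centrality for an undirected graph given as (user_a, user_b) pairs.

    Each score is the number of distinct neighbours divided by (n - 1), where n
    is the number of users that appear in at least one edge.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    node_count = len(neighbours)
    if node_count < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (node_count - 1) for node, adj in neighbours.items()}


# Hypothetical reply or mention interactions between four users.
interactions = [("alice", "bob"), ("alice", "carol"), ("alice", "dave")]
centrality = degree_centrality(interactions)
most_connected = max(centrality, key=centrality.get)
```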

In addition, sentiment propagation could be analyzed to understand how sentiment flows in the
social network. Knowledge of users' emotions, opinions, and information-sharing activities
might explain how the spread is facilitated, and which aspects of content make certain topics go
viral.

6.6 Key Applications of Real-time Analytics


Because social media evolves constantly and rapidly, the ability to analyze emerging trends and
user behaviours in real time is very important. Further research could aim to design systems
that report sentiment, engagement, and trending topics on Twitter in real time. Such systems
could be applied to brand monitoring by companies or to opinion polling by policymakers on
political or social topics.
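
One building block of such a system is a rolling average over the most recent sentiment scores, sketched here in Python. The class name RollingSentiment and the window size are hypothetical, and the scores are assumed to come from some upstream sentiment model that is not shown.

```python
from collections import deque


class RollingSentiment:
    """Keeps the average sentiment of the most recent window_size posts."""

    def __init__(self, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        # A deque with maxlen drops the oldest score automatically.
        self._scores = deque(maxlen=window_size)

    def add(self, score):
        """Record a new score and return the current rolling average."""
        self._scores.append(score)
        return sum(self._scores) / len(self._scores)


# Feed in hypothetical scores as they would arrive from a stream.
monitor = RollingSentiment(window_size=3)
averages = [monitor.add(score) for score in [1, -1, 1, 1]]
```

Because only the last few scores are retained, memory use stays constant however long the stream runs, which matters for continuous monitoring.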

Real-time analysis of sentiment could also be beneficial in crises. For instance, monitoring
public mood on certain topics during natural disasters, political events, or other major events
could help governments and organizations address public concerns more efficiently and manage
stakeholder engagement.

6.7 Professional Standards and Ways to Eliminate Biases


Lastly, a further area that warrants study concerns the ethical issues associated with analyzing
social media data. Since most social media data is sensitive, future research needs to take
careful measures to protect privacy and to comply with data protection laws such as the GDPR.
In addition, demographic imbalances in the dataset and possible biases in sentiment analysis
tools, such as racial bias, need to be detected and corrected. Paying particular attention to the
fairness of the analysis will be important for obtaining more sound and inclusive results.

6.8 Conclusion
Future work in social media analytics has great potential to provide a more profound and
precise understanding of users' behaviors, sentiment trends, and new patterns. More advanced
methodological approaches, a wider range of data sources, and solutions to the ethical problems
that may emerge will contribute to a better understanding of the interactions mediated by social
media and of their financial, political, and social implications.
References
Afyouni, I., Aghbari, Z.A. and Razack, R.A. (2021) 'Multi-feature, multi-modal, and multi-
source social event detection: A comprehensive survey,' Information Fusion, 79, pp. 279–
308. https://doi.org/10.1016/j.inffus.2021.10.013.
Anzano-Oto, S., Vázquez-Toledo, S. and Latorre-Cosculluela, C. (2023) 'Digital reality in
Compulsary Secondary Education: uses, purposes and profiles in social networks,' New
Review of Information Networking, 28(1–2), pp. 26–48.
https://doi.org/10.1080/13614576.2023.2219244.
Bengtsson, S. and Johansson, S. (2022) 'The meanings of social media use in everyday life:
filling empty slots, everyday transformations, and mood management,' Social Media +
Society, 8(4), p. 205630512211302. https://doi.org/10.1177/20563051221130292.
Bilinski, P. (2022) 'The content of tweets and the usefulness of YouTube and Instagram in
corporate communication,' European Accounting Review, 33(1), pp. 279–311.
https://doi.org/10.1080/09638180.2022.2084759.
Busalim, A.H., Ghabban, F. and Hussin, A.R.C. (2020) 'Customer engagement behaviour on
social commerce platforms: An empirical study,' Technology in Society, 64, p. 101437.
https://doi.org/10.1016/j.techsoc.2020.101437.
Camacho, D. et al. (2020) 'The four dimensions of social network analysis: An overview of
research methods, applications, and software tools,' Information Fusion, 63, pp. 88–120.
https://doi.org/10.1016/j.inffus.2020.05.009.
Chakraborty, K., Bhattacharyya, S. and Bag, R. (2020) 'A Survey of Sentiment Analysis from
Social Media Data,' IEEE Transactions on Computational Social Systems, 7(2), pp. 450–
464. https://doi.org/10.1109/tcss.2019.2956957.
Chukwunweike, N.J.N. et al. (2024) 'Integrating deep learning, MATLAB, and advanced CAD
for predictive root cause analysis in PLC systems: A multi-tool approach to enhancing
industrial automation and reliability,' World Journal of Advanced Research and Reviews,
23(2), pp. 2538–3557. https://doi.org/10.30574/wjarr.2024.23.2.2631.
Corvite, S. and Hui, J. (2024) 'Social Media as a Lens into Careers During a Changing World of
Work,' Proceedings of the ACM on Human-Computer Interaction, 8(CSCW2), pp. 1–27.
https://doi.org/10.1145/3687053.
Drivas, I.C. et al. (2022) 'Social Media Analytics and Metrics for improving users engagement,'
Knowledge, 2(2), pp. 225–242. https://doi.org/10.3390/knowledge2020014.
Dukovski, I. et al. (2021) 'A metabolic modeling platform for the computation of microbial
ecosystems in time and space (COMETS),' Nature Protocols, 16(11), pp. 5030–5082.
https://doi.org/10.1038/s41596-021-00593-3.
Dunsin, D. et al. (2024) 'A comprehensive analysis of the role of artificial intelligence and
machine learning in modern digital forensics and incident response,' Forensic Science
International Digital Investigation, 48, p. 301675.
https://doi.org/10.1016/j.fsidi.2023.301675.
Gheisari, M. et al. (2023) 'Deep learning: Applications, architectures, models, tools, and
frameworks: A comprehensive survey,' CAAI Transactions on Intelligence Technology,
8(3), pp. 581–606. https://doi.org/10.1049/cit2.12180.
Gkikas, D.C. et al. (2022) 'How do text characteristics impact user engagement in social media
posts: Modeling content readability, length, and hashtags number in Facebook,'
International Journal of Information Management Data Insights, 2(1), p. 100067.
https://doi.org/10.1016/j.jjimei.2022.100067.
Habib, A. and Raza, A.A. (2021) 'IoT-Based Pervasive Sentiment Analysis: A Fine-Grained Text
Normalization Framework for context aware Hybrid Applications,' in EAI/Springer
Innovations in Communication and Computing, pp. 201–226.
https://doi.org/10.1007/978-3-030-75123-4_10.
Horova, V. et al. (2024) 'In-Depth examination of the effective use of social networks for
communication in united Territorial communities: Navigating the Digital Landscape,' in
Lecture notes on data engineering and communications technologies, pp. 225–246.
https://doi.org/10.1007/978-3-031-62213-7_11.
Kordzadeh, N. and Young, D.K. (2020) 'How social media analytics can inform content
strategies,' Journal of Computer Information Systems, 62(1), pp. 128–140.
https://doi.org/10.1080/08874417.2020.1736691.
Kurani, A. et al. (2021) 'A comprehensive comparative study of Artificial Neural Network
(ANN) and Support Vector Machines (SVM) on stock forecasting,' Annals of Data
Science, 10(1), pp. 183–208. https://doi.org/10.1007/s40745-021-00344-x.
Lee, J.H., Wood, J. and Kim, J. (2021) 'Tracing the trends in sustainability and social media
research using topic modeling,' Sustainability, 13(3), p. 1269.
https://doi.org/10.3390/su13031269.
Nagaraj, N. and J, C. (2021) 'Sentence Classification using Machine Learning with Term
Frequency–Inverse Document Frequency with N-Gram,' in Soft Computing Research
Society eBooks, pp. 337–346. https://doi.org/10.52458/978-81-95502-00-4-35.
Najafabadi, A.J., Skryzhadlovska, A. and Valilai, O.F. (2024) 'Agile Product Development by
Prediction of Consumers’ Behaviour; using Neurobehavioral and Social Media Sentiment
Analysis Approaches,' Procedia Computer Science, 232, pp. 1683–1693.
https://doi.org/10.1016/j.procs.2024.01.166.
Nandi, G. and Sharma, R.K. (2020) Data Science Fundamentals and Practical Approaches:
Understand why data science is the next.
https://openlibrary.telkomuniversity.ac.id/pustaka/165518/data-science-fundamentals-
and-practical-approaches-understand-why-data-science-is-the-next.html.
Nasir, V.A. et al. (2021) 'Segmenting consumers based on social media advertising perceptions:
How does purchase intention differ across segments?,' Telematics and Informatics, 64, p.
101687. https://doi.org/10.1016/j.tele.2021.101687.
Pavarino, E.C. et al. (2023) 'mEMbrain: an interactive deep learning MATLAB tool for
connectomic segmentation on commodity desktops,' Frontiers in Neural Circuits, 17.
https://doi.org/10.3389/fncir.2023.952921.
Pierce, P.P. et al. (2021) 'Social network analysis: Exploring connections to advance military
nursing science,' Nursing Outlook, 69(3), pp. 311–321.
https://doi.org/10.1016/j.outlook.2020.12.013.
Qalati, S.A. et al. (2021) 'A mediated model on the adoption of social media and SMEs’
performance in developing countries,' Technology in Society, 64, p. 101513.
https://doi.org/10.1016/j.techsoc.2020.101513.
Rahate, A. et al. (2021) 'Multimodal Co-learning: Challenges, applications with datasets, recent
advances and future directions,' Information Fusion, 81, pp. 203–239.
https://doi.org/10.1016/j.inffus.2021.12.003.
Rodriguez, M.Y. and Storer, H. (2019) 'A computational social science perspective on qualitative
data exploration: Using topic models for the descriptive analysis of social media data*,'
Journal of Technology in Human Services, 38(1), pp. 54–86.
https://doi.org/10.1080/15228835.2019.1616350.
Tembhurne, J.V. and Diwan, T. (2020) 'Sentiment analysis in textual, visual and multimodal
inputs using recurrent neural networks,' Multimedia Tools and Applications, 80(5), pp.
6871–6910. https://doi.org/10.1007/s11042-020-10037-x.
Zion, G.D. and Tripathy, B.K. (2020) 'Comparative analysis of tools for big data visualization
and challenges,' in Springer eBooks, pp. 33–52. https://doi.org/10.1007/978-981-15-
2282-6_3.
