Donghyuk Shin
College of Business, Korea Advanced Institute of Science and Technology (KAIST)
Seoul, REPUBLIC OF KOREA
K. Hazel Kwon
Walter Cronkite School of Journalism and Mass Communication, Arizona State University
Phoenix, AZ, U.S.A.
Disinformation activities that aim to manipulate public opinion pose serious challenges to managing
online platforms. One of the most widely used disinformation techniques is bot-assisted fake social
engagement, which is used to falsely and quickly amplify the salience of information at scale. Based on
agenda-setting theory, we hypothesize that bot-assisted fake social engagement boosts public attention in
the manner intended by the manipulator. Leveraging a proven case of a bot-assisted fake social engagement
operation on a highly trafficked news portal, this study examines the impact of fake social engagement on
the digital public’s news consumption, search activities, and political sentiment. For that purpose, we
used ground-truth labels of the manipulator’s bot accounts, as well as real-time clickstream logs
generated by ordinary public users. Results show that bot-assisted fake social engagement operations
disproportionately increase the digital public’s attention to not only the topical domain of the
manipulator’s interest (i.e., political news) but also to specific attributes of the topic (i.e., political
keywords and sentiment) that align with the manipulator’s intention. We discuss managerial and policy
implications for increasingly cluttered online platforms.
Keywords: Opinion manipulation, disinformation, fake social engagement, political bots, online platforms,
econometrics, machine learning
Likoebe M. Maruping was the accepting senior editor for this paper.
Ofer Arazy served as the associate editor.
While existing user-centered studies have offered lessons on what makes users engage with or react to disinformation messages, these studies have predominantly examined message characteristics, providing little explanation about what happens to users when their attention is “hacked” by bots’ false amplification (Marwick & Lewis, 2017, p. 19). To summarize, the two branches of disinformation research, bot-based information diffusion studies and user effect studies, have rarely been integrated, leaving the question of how bots’ amplifying activities alter the digital public’s information consumption patterns open to further exploration.

Missing Pieces: Influence Spillover and Fake Social Engagement

To fill in this gap, we expand on two aspects of a real-world disinformation operation that have not yet been thoroughly explored by empirical research. First, in reality, disinformation never occurs in an isolated dyad between the manipulator (or manipulated content) and a user. On the contrary, the immediate context in which users are exposed to a perpetrator’s action is a subset of a larger information ecosystem. Therefore, the effect of a successful disinformation campaign is likely to extend beyond the direct interaction between the manipulator (or manipulated content) and the user and spill into other settings of information consumption. Several qualitative case studies have alluded to this point by describing how disinformation perpetrators work not in isolation but exploit existing media networks. For example, successful disinformation content created in an online troll community does not stay within the community but is picked up by mainstream media attention, reaching broad audiences (Phillips, 2015; Marwick & Lewis, 2017).

That said, systematic empirical analyses of disinformation effects have mostly focused only on direct interactions between manipulative content and users. This is understandable because it is rare to obtain data that represent the spillover of disinformation influence. Nevertheless, previous findings that disinformation is engaged only by niche audience groups are based on such limited measures, resulting in an incomplete representation of disinformation’s sphere of influence. For example, Nelson and Taneja (2018) measured fake news consumption by using site-visitation data, one of the most proactive measures of audience engagement. Based on this, they argued that broad users were seldom vulnerable to fake news. Similarly, Bail et al.’s Twitter study (2020) found that Russia’s disinformation accounts were engaged mostly by highly partisan users with high-frequency usage of Twitter, concluding that “Russian trolls might have failed to sow discord because they mostly interacted with those who were already highly polarized” (p. 243). However, the Bail et al. study was based on non-representative survey data matched with the metrics of direct engagement with the troll accounts or their messages. The findings of these studies are a partial snapshot of disinformation reality because disinformation influence can also be indirect: Users may be surreptitiously exposed to the manipulator’s intention, and even the simplest exposure could result in a ripple effect on the consumption of other informational sources without further direct interactions with the manipulators or their content.

Second, disinformation operations entail not only the creation of fake content but also the creation of fake engagement with existing content in the manipulator’s favor. Thus far, a considerable body of literature has focused on the effects of the former, for example, by examining what makes fake content persuasive, how it is propagated, and how it is detected (e.g., Vosoughi et al., 2018; Cresci, 2020), with little attention paid to the disruption of the digital information commons caused by fake social engagement. Thus far, disinformation studies that have examined social engagement have mainly focused on organic social engagement with fake content. For example, Edelson et al. (2021) found that about 70% of all user engagements across far-right news pages on Facebook were made with misinformation content. In another study, Freelon et al. (2022) showed that user engagement with disinformation tweets became disproportionately large when the tweets originated from fake accounts pretending to be Black activists. A handful of studies have paid attention to fake (mostly bot-assisted) social engagement activities (e.g., Boichak et al., 2021); however, to our knowledge, no study has taken a user-centered approach to examine how bot-assisted fake social engagement affects individual users’ informational behaviors.

The reasons for the dearth of user-centered studies in the bot literature are twofold. First, it is difficult to detect bot activities unless ground-truth labels are available. As a result, developing detection techniques is a complex scientific problem that demands considerable effort (e.g., Varol et al., 2017; Cresci, 2020). Second, the primary bot activities have thus far functioned as information brokers (i.e., algorithmic conduit brokerage) rather than original content creators. Since the consequence of conduit brokerage is more nuanced than content creation, it is difficult to empirically differentiate between users who are exposed to bot activities and those who are not.

Despite the challenges, understanding the effects of bot-assisted fake social engagement on individual users is imperative because of the phenomenon’s prevalence and significance. Bot-assisted fake social engagement is prevalent due to its cost-effectiveness (e.g., Jeong et al., 2020; Carman et al., 2018; Schäfer et al., 2017; Rossi et al., 2020). Also, it is a powerful tactic because social engagement metrics are pivotal indicators of content popularity fed into a platform’s content curation algorithms. Accordingly, we first ask the following question:

RQ: Does bot-assisted fake social engagement have spillover effects on public attention to information beyond the manipulated context?
Bot-Assisted Fake Social Engagement and Public Attention: An Agenda-Setting Theoretical Framework

Bot-assisted fake social engagement operations center on the interplay among human perpetrators, automation (bots), and platform algorithms to “manufacture consensus or to otherwise give the illusion of general support for a (perhaps controversial) political idea or policy, with the goal of creating a bandwagon effect” (Woolley & Howard, 2016, p. 4, emphasis added). In information consumption contexts, the bandwagon effect is manifest in the shift of public attention to certain types of information.

Agenda-setting theory (McCombs & Valenzuela, 2020) is a useful theoretical framework for explaining how fake social engagement influences public attention. It suggests that the media has the ability to influence audiences in terms of which issue to pay attention to as an important public agenda and which attributes of the issue to pay attention to in order to make sense of the issue (McCombs & Valenzuela, 2020). The agenda-setting effect of news media on the public’s mind has been well documented in the media and journalism literature since the seminal evidence of news media’s agenda-setting function a half-century ago. McCombs and Shaw (1972) found a significant association between the amount of news coverage of political agendas during an election campaign and the public ranking of the importance of agendas for the election. Numerous studies have since then confirmed that the public’s understanding of political reality is influenced by the salience of issues emphasized in news coverage.

Provided that the media’s agenda-setting effect occurs by increasing the salience of information, disinformation actors may also play the role of agenda setters by amplifying the salience of the information that conveys their preferences. A few studies have alluded to this point. For example, Guo and Vargo (2020) showed that fake news stories exaggerated politician attributes, such as moral quality, leadership quality, and intellectual ability, to affect public attitudes toward political candidates. Rojecki and Meraz (2016) examined conspiratorial information transmissions during the 2004 U.S. presidential campaigns. They found that while the visibility of conspiracies on the Google search results was not directly associated with users’ overall search trend (an indicator of the naturally occurring volume of online public attention), the visibility of conspiratorial information on the search results influenced traditional media’s coverage of it, which in turn was associated with users’ overall search trend. Vargo et al. (2018) analyzed big data from news archives, demonstrating that fake news sites had a stronger “intermedia” agenda-setting effect (p. 2030) on legitimate news coverage than fact-checking sites, particularly by transferring their agendas to partisan news outlets (e.g., Fox News).

While extant studies have alluded to the agenda-setting potential of disinformation, they have focused only on fake news sites and the transporting of their narratives to other mainstream media outlets. To our knowledge, no study has examined disinformation’s agenda-setting effect on general public users, particularly in terms of bot-assisted fake social engagement operations.

Bot-assisted fake social engagement operations facilitate a bandwagon of public attention through the mechanism of rapid scaling (Salge et al., 2022). The rapidly inflated engagement volume makes it look like a large number of “real” users are interested in the (falsely) amplified topic, which can, in turn, increase organic public attention to the topic. Technically speaking, social engagements can be manipulated solely by human workers. However, fake engagement operations would have little impact on rearranging the salience of information unless the metric is rapidly fabricated at scale. In other words, bots’ “rapid scaling” (Salge et al., 2022) of social engagement and the subsequent bandwagon effect resonate with the tenet of agenda-setting theory.

Agenda-setting theory includes two levels of media effects on shaping public attention to news agendas (McCombs & Valenzuela, 2020). The first-level agenda-setting effect, also known as “issue agenda setting” (Kim et al., 2002), refers to the media’s ability to determine the hierarchy of public agendas by informing the audience what topic (issue or object) it should pay more attention to. The frequency of topics in news articles influences how the audience prioritizes the importance of these topics (McCombs & Valenzuela, 2020). For example, if the media covers news about Samsung Galaxy smartphones more frequently than Apple iPhones, the audience will pay more attention to Samsung’s smartphones than Apple’s. The first-level agenda-setting effect can occur on an even more abstract topic domain. For example, if the media reports on foreign affairs more frequently than on the domestic economy, the audience will be likely to consider international politics to be a more important current issue than domestic economic conditions.

In other words, first-level agenda setting is about the media’s influence on public attention to a topic. In this study’s empirical context, where disinformation was related to a political issue, we posit a hypothesis that suggests the first-level agenda-setting effect of bot-assisted fake social engagement on public attention to a political topic. That is, when a bot-assisted fake social engagement operation targets political content, the intensity of exposure to the operation predicts a relative increase in public attention to political news compared to non-political news.

H1: The salience of bot-assisted fake social engagement predicts an increase in public attention to political news compared to non-political news.
Meanwhile, second-level agenda setting, also known as “attribute agenda setting” (Kim et al., 2002), focuses on the presentation of attributes, qualities, or characteristics of a certain topic and its effect on how the audience will subsequently perceive or feel about that topic (Kiousis, 2005; McCombs & Valenzuela, 2020). For example, if the media frequently focuses on Samsung Galaxy’s foldable design when reporting on smartphone features, the audience will begin to prioritize foldable design as the smartphone’s most important attribute and pay more attention to information related to this feature when they think about Samsung Galaxy smartphones. On the contrary, if the media frequently reports on Samsung Galaxy’s alleged benchmark manipulation with regard to the speed, battery life, and overall performance,4 the audience will prioritize benchmark manipulation as the smartphone’s most important attribute and will pay attention to information related to this feature when they think about Samsung Galaxy smartphones. In other words, the second-level agenda-setting effect is about the media’s ability to prime the audience’s attitudinal or emotional reaction to a topic, because the selective presentation of attributes transmits sentiment, whether intended or not, and subsequently influences the audience’s attitude toward the topic (Coleman & Wu, 2010; Kim et al., 2002).

Likewise, bot-assisted fake social engagement operations may engender the second-level agenda-setting effect by inflating the salience of certain attributes of the targeted topic, which may, in turn, increase public attention to these attributes. Recent advances in agenda-setting theory suggest that the agenda-setting effect occurs not only in the context of single attributes but impacts bundles of mental associations in a so-called network agenda-setting effect (e.g., Guo & Vargo, 2015; Vu et al., 2014). Drawing upon the recent theoretical development of the network agenda-setting model, we posit a second-level agenda-setting hypothesis based on associative keywords and texts resonating with the manipulator’s intention. Given this study’s empirical context, featuring a disinformation operation directed toward a political issue, we posit a second-level agenda-setting hypothesis that centers on political attributes:

H2: The salience of bot-assisted fake social engagement predicts an increase in public attention to manipulator-promoted political attributes compared to manipulator-demoted political attributes (e.g., political keywords and sentiment).

Figure 1 conceptually illustrates the fake social engagement-driven agenda-setting effect thesis that we propose. The lower part of Figure 1 illustrates that bot-assisted fake social engagement distorts the salience of topics and their attributes. In this example, Topic 1 is the target of manipulation in which Attribute A3 is promoted and Attribute A2 is demoted through bot-assisted fake social engagement. This manipulated salience then influences the distribution of public attention, as illustrated by the upper-right part of Figure 1, where Topic 1 becomes the most dominant topic and Attribute A3 emerges as the most salient attribute while A2 becomes the least salient attribute of Topic 1.
Digital Opinion Manipulation: The 2018 Druking Scandal

We focused on an online opinion-rigging scandal that occurred in South Korea. In 2018, South Korea experienced a major disinformation activity, widely referred to as the Druking scandal (Choe, 2018). “Druking” was the screen name for the disinformation operation team’s leader, who had been a popular blogger while secretly founding a shadow company that ran illegitimate internet political campaigns utilizing political trolls. The company operated during the 2017 South Korean presidential election campaign to influence public opinion. While its initial political position was aligned with the Democratic Party (the then ruling party), in 2018 it assumed an anti-government stance. In 2018, the Druking team was indicted for rigging online comments. The main locations where the Druking team operated were spaces for news comments on major Korean portal sites. Given that South Korea has a 96% internet penetration rate, with the vast majority of online news consumption occurring via portal sites and an active presence of news comment culture, such digital opinion manipulations can have substantial ramifications.5,6

One of the primary activities of the Druking team was to manipulate the ranking of comments on a news site. To this end, they used a programmable code called “KingCrab,” a macro-based bot that cast a massive number of up/down votes for certain targeted comments. The ranking of comments was important because the top-ranked comments achieved a higher degree of visibility than the rest of the comments. The Druking operation team’s key action involved the selection of target comments and the manipulation of their rankings by pushing their favored (disapproved) comments to the top (bottom) of the list.

As with Reddit and other online news aggregators, the focal platform determined the ranking of comments on a particular news page based on their net vote count (i.e., total upvotes minus total downvotes per comment). The platform used phone verification to authenticate users’ identities during the account registration process and permitted only one upvote or downvote per comment. Nonetheless, the Druking team managed to circumvent this by obtaining and leveraging thousands of legitimately created user accounts to create a large number of upvotes and downvotes in an attempt to manipulate news comment spaces in its favor.

The Druking accounts were established based on the verified documents issued by the law enforcement department and the details pertinent to general users are fully de-identified. The subject of the focal article gained traction: 39,827 comments were posted within 24 hours after the story was first published on January 17, 2018, at 9:35 a.m. Korea Standard Time (KST). It is worth noting that none of the comments came from the Druking accounts. That is, they were all posted by authentic users. As a result, the manipulator’s primary goal was to affect the popularity of comments created by others rather than to create its own comments. Druking’s operation was clearly directed at altering the upvote/downvote counts of existing comments. Over the 24-hour time span, some 2,300 Druking accounts were used approximately 1.2 million times to cast upvotes or downvotes in order to alter the ranking of the current comments targeted by the manipulator.

Data

We examined one of the leading online news platforms in South Korea. On this platform, each news article’s page was composed of the main article and the user comment space dedicated to the main article. The ranking of comments was determined by their popularity, measured as upvotes (i.e., the number of thumbs-ups it received) compared to downvotes. Therefore, manipulators could escalate (decrease) the ranking of comments they wanted to promote (demote) by generating a large number of upvotes (downvotes) on them using a programmable bot. An example screenshot of a news article and its user comments from the focal platform is shown in Figure 2.

In partnership with the platform’s company, we obtained access to its proprietary data on user behaviors and clickstream information, amounting to more than 108 million raw user log entries. The granularity of the data enabled us to observe how the user-generated comment section embedded in a news article’s page was shaped over time and how users’ news searching and viewing activities across the platform changed after the consumption of a news article’s page. In the following sections, we first describe how a bot-assisted fake social engagement operation influenced the real-time process through which the user-generated comment section of the focal article was created. Then we shift our focus to the behaviors of organic users and provide the descriptive statistics that illustrate their activities on the platform.
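The ranking rule described for the focal platform (net votes, with at most one vote per account per comment) can be illustrated with a minimal Python sketch. The class name `CommentBoard`, the account and comment identifiers, and all vote counts are hypothetical and chosen only for illustration; this is not the platform's actual implementation. The sketch shows why a one-vote-per-account rule alone may not prevent rank manipulation once many distinct accounts are available.

```python
from collections import defaultdict


class CommentBoard:
    """Toy comment board (hypothetical): one vote per (account, comment),
    comments ranked by net votes = upvotes minus downvotes."""

    def __init__(self):
        # (account_id, comment_id) -> +1 (upvote) or -1 (downvote)
        self.votes = {}

    def vote(self, account_id, comment_id, direction):
        """Record a vote; return False if this account already voted on the comment."""
        if direction not in (1, -1):
            raise ValueError("direction must be +1 or -1")
        key = (account_id, comment_id)
        if key in self.votes:
            return False
        self.votes[key] = direction
        return True

    def net_votes(self):
        """Return {comment_id: upvotes - downvotes}."""
        totals = defaultdict(int)
        for (_, comment_id), direction in self.votes.items():
            totals[comment_id] += direction
        return dict(totals)

    def ranking(self):
        """Comment ids ordered by net votes, highest first (ties broken by id)."""
        totals = self.net_votes()
        return sorted(totals, key=lambda c: (-totals[c], c))


board = CommentBoard()

# Organic users (made-up numbers): 30 upvote comment A, 20 upvote comment B.
for u in range(30):
    board.vote(f"user{u}", "A", 1)
for u in range(20):
    board.vote(f"user{u}", "B", 1)
before = board.ranking()

# A repeated vote from the same account on the same comment is rejected.
repeat_ok = board.vote("user0", "A", 1)

# 25 distinct manipulator-controlled accounts upvote B and downvote A.
for b in range(25):
    board.vote(f"bot{b}", "B", 1)
    board.vote(f"bot{b}", "A", -1)
after = board.ranking()

print(before, after, board.net_votes())
```

In this toy setting the per-account limit holds for every account, yet the order of the two comments flips because the limit constrains accounts rather than the operator behind them.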
Note: Contents of the article and user comments were translated from Korean to English using Google Translate.
The platform ranked user-generated comments in the order of popularity, as determined by the number of upvotes received minus the number of downvotes. As a result, either upvoting or downvoting on a particular comment would influence the comment’s salience by changing its relative position. Knowing this, the manipulator used programmable bots to increase the number of upvotes on comments he endorsed while producing downvotes on comments he wanted to suppress. Importantly, however, the manipulator did not have full control because a large portion of votes were generated by organic users with diverse viewpoints.

Figure 4 depicts the number of upvotes and downvotes created by bots, as well as organic users, over time. There was a total of 953,578 votes cast on 3,775 comments, with 719,609 upvotes (75.46%) and 233,969 downvotes (24.54%). The manipulator was responsible for 31.77% of the total upvotes and 20.92% of the total downvotes, and his voting activities increased two hours after the focal article was published. Organic users’ votes, on the other hand, appeared more quickly and had a longer tail than the manipulator’s votes. This suggests that the manipulator took some time to identify his target news page and the comments on it and prepare for the attack. Then, he ceased the operation when the effect of the votes became muted due to the large volume of accumulated votes.

Comment rankings fluctuated over time and eventually converged to a final rank, as illustrated in Figure 5, which presents 10 different comment convergence trends. The rankings fluctuated significantly within the first three hours and then steadily converged to their final positions at around five hours, which was to be anticipated given that the rankings were decided by the cumulative number of upvotes and downvotes. Although some comments shifted upward or held their status over time, others moved down due to downvotes, a surge of other comments, or the introduction of new comments.

More importantly, the manipulator’s vote distribution was clearly distinct from that of organic votes. Figure 6 shows the source of votes for the top 10 comments as of the last time point in our data, ordered by their total net upvotes, including votes from both manipulator and organic users. The manipulator created upvotes to promote six comments (C1, C3, C6, C7, C8, C9) and suppress four (C2, C4, C5, C10), demonstrating his goal-directed behavior. That said, the manipulator did not have complete control of the opinion landscape: While the operation succeeded in positioning six of his preferred comments among the top 10, he was unable to overcome the organic popularity of the other four comments that he disapproved of. Nonetheless, manipulative votes appeared to have a significant impact on the final ranking of the comments, even when organic votes outnumbered them. For example, if the manipulator had voted against comment C1 while supporting comment C2, the relative positions of the two comments would have been reversed. In total, the manipulator promoted 998 comments and suppressed 247 comments.
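The counterfactual reasoning about C1 and C2 can be made concrete with a small Python sketch that separates net votes by source. All vote counts below are invented for illustration and are not taken from Figure 6; the helper names `net_votes` and `rank_by_net` are likewise hypothetical. The point is only that a manipulator whose votes are outnumbered by organic votes can still decide the relative order of two comments.

```python
def net_votes(organic, manipulator):
    """Combine net votes (upvotes minus downvotes) from the two sources."""
    ids = set(organic) | set(manipulator)
    return {c: organic.get(c, 0) + manipulator.get(c, 0) for c in ids}


def rank_by_net(tallies):
    """Comment ids ordered by net votes, highest first (ties broken by id)."""
    return sorted(tallies, key=lambda c: (-tallies[c], c))


# Hypothetical net votes per comment, split by source (not the paper's data).
organic = {"C1": 900, "C2": 1000, "C3": 400}
manipulator = {"C1": 300, "C2": -200, "C3": 250}  # promotes C1, suppresses C2

# Ranking actually observed on the page: organic plus manipulator votes.
observed = rank_by_net(net_votes(organic, manipulator))

# Ranking that organic votes alone would have produced.
organic_only = rank_by_net(organic)

# Counterfactual: the manipulator votes against C1 and for C2 instead.
flipped = dict(manipulator, C1=-manipulator["C1"], C2=-manipulator["C2"])
counterfactual = rank_by_net(net_votes(organic, flipped))

print(observed, organic_only, counterfactual)
```

Here the manipulator's votes on C1 and C2 are each smaller in magnitude than the organic votes on those comments, yet C1 ranks above C2 only in the observed scenario.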
Note: Comments are ordered by their total net upvotes (upvotes minus downvotes), including those from both manipulator and organic users,
as of the last time point in our data.
Organic User Activities

The organic user activity dataset was made up of the full server log details of 23,735 general users from January 15, 2018, to January 18, 2018. We removed 384 users who had no activity during the period, were younger than eight years of age, or had missing demographic data. The samples were then divided into one of two categories. The first group, the treatment group, consisted of 17,335 users who visited the focal news article with a manipulator-targeted comment section. The second group, which we called the control group, consisted of 3,868 users who visited one of 34 articles that contained highly similar content to the focal article but had not been attacked by the manipulator.

To identify articles that were extremely similar to the focal article, we used the doc2vec model (Le & Mikolov, 2014), which has rapidly gained popularity in the IS literature (e.g., Qiao et al., 2020; Shin et al., 2020) due to its state-of-the-art performance in various natural language processing tasks. We fine-tuned a large-scale doc2vec model pre-trained on more than 6.3 GB of text data7 using a total of 342,567 articles that users visited during the sample period. Based on our doc2vec model, we first represented each article by its embedding vector in a latent feature space.8 Then, using the cosine similarity between their embedding vectors, we computed the similarity between the focal article and all other articles. Along with manual verification, we selected a total of 34 articles that were most similar to the focal article as control group articles.9 To avoid cross-contamination, we eliminated 2,148 users who visited both focal and control articles from our sample. As a result, the valid sample included 21,203 organic users.

Our data enabled us to examine both pre- and post-visit log files for each user account since the focal news article was published at 9:35 a.m. on January 17, 2018 (KST). An average user in both the treatment and control groups visited 153.2 pages and spent 3.1 hours per day on the platform over the course of four days (see Table 1). The two groups were comparable in terms of platform engagement, with no statistically significant differences in the number of logs, page views, votes, or amount of time spent during the sample period. However, there was a high degree of heterogeneity among users, which is indicated by large sample standard deviations. In addition, the user activities exhibited a clear pattern of temporal variation. Figure 7 depicts how an average user’s page views per hour changed over time. Users were more active in the afternoon and evening than late at night and early in the morning. The first three days contained a higher number of user page views than the remaining days.

Variables

Salience of Bot-Assisted Fake Social Engagement

the publication of the focal article and then steadied at a positive value of around 3.5 as the comment ranks stabilized. Figure 8(b) depicts the arrival time of organic users at the focal article and its comment section. Visitors to the focal article within the first five hours accounted for 25% of overall viewers of the focal article. If we extended the time window to the first ten hours, the percentage rose to 62%. The variation in users’ arrival times at the focal article resulted in a variation in the composition of the comment section to which each user was exposed.

Public Attention

We operationalized public attention by using page views. That is, we measured the total amount of user $i$’s attention to news in time $t$ by the number of news pages that user $i$ viewed during the corresponding window of one hour, which was denoted by $PV_{it}$. Further, we decomposed public attention to news by topical categories to investigate the shift in public attention caused by FSE (i.e., the first-level agenda-setting effect). In order to test the first-level agenda-setting effect (H1) given the political nature of the focal news article in our empirical context, we compared user attention to political and non-political news articles using the platform’s preset news categories. The non-political news sections included sports ($PV_{it}^{sports}$), entertainment ($PV_{it}^{enter}$),10 and other news ($PV_{it}^{other}$).
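The page-view measure can be sketched in Python as a count of news pages per user, per one-hour window, and per news category. The log layout (a tuple of user id, ISO timestamp, and category), the category labels, and the function names below are assumptions made for illustration; the platform's actual log schema is not described in the text.

```python
from collections import Counter
from datetime import datetime

# Assumed category labels; the text names sports, entertainment, and other news
# as the non-political sections of the platform's preset categories.
NON_POLITICAL = ("sports", "entertainment", "other")


def hourly_page_views(log):
    """Count page views keyed by (user_id, hour_start, category).

    `log` is an iterable of (user_id, iso_timestamp, category) tuples,
    a hypothetical stand-in for the clickstream records.
    """
    counts = Counter()
    for user_id, timestamp, category in log:
        # Truncate the timestamp to the start of its one-hour window.
        hour = datetime.fromisoformat(timestamp).replace(
            minute=0, second=0, microsecond=0
        )
        counts[(user_id, hour, category)] += 1
    return counts


def attention_split(counts, user_id, hour):
    """Return (political, non_political) page views for one user-hour."""
    political = counts[(user_id, hour, "politics")]
    non_political = sum(counts[(user_id, hour, c)] for c in NON_POLITICAL)
    return political, non_political


# Made-up log entries around the focal article's publication time.
log = [
    ("u1", "2018-01-17T09:40:00", "politics"),
    ("u1", "2018-01-17T09:55:00", "politics"),
    ("u1", "2018-01-17T09:58:00", "sports"),
    ("u1", "2018-01-17T10:05:00", "entertainment"),
    ("u2", "2018-01-17T09:41:00", "other"),
]
counts = hourly_page_views(log)
nine = datetime(2018, 1, 17, 9)
ten = datetime(2018, 1, 17, 10)
u1_at_nine = attention_split(counts, "u1", nine)
print(u1_at_nine, attention_split(counts, "u1", ten))
```

A `Counter` returns zero for user-hours with no views in a category, so hours in which a user read no political news still yield a well-defined comparison.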
Figure 8. Salience of Bot-Assisted Fake Social Engagement and the Number of Viewers of the Focal
To infer Druking’s keywords, we first computed the term frequency-inverse document frequency (TF-IDF) scores across all comments. Then, using the ground-truth labels for Druking’s bot accounts, we found 43 promoted keywords and 21 demoted keywords with the largest TF-IDF score differences across comments with and without fake social engagement from Druking accounts. The majority of upvoted (or promoted) keywords contained anti-government sentiments, such as innuendo, mockery, or insinuation about the government and president at the time, whereas the majority of downvoted (or demoted) keywords contained terms referring to opinion manipulation operations or investigations into harmful/fake comments. The identified upvoted (downvoted) keywords appeared in 90.5% (96.7%) of promoted (demoted) comments but not in any unmanipulated comments.

Given that general users may not always use search keywords that are identical to manipulator-promoted/-demoted keywords, we included other keywords that were deemed highly associative with the manipulator’s keyword list. To do this, we represented each keyword by its word embedding vector using the doc2vec model (explained in the Organic User Activities section). Then, we identified the top-100 most associative and semantically similar search keywords from over seven million distinct search queries by calculating their cosine similarity to the manipulator’s keyword list. The final keyword list had 143 keywords associated with manipulator-promoted attributes and 121 keywords associated with manipulator-demoted attributes. Using the list, for each user and time period, we counted the number of search activities containing manipulator-promoted keywords ($KS_{it}^{promo}$) and the number of searches containing manipulator-demoted keywords ($KS_{it}^{demo}$). We next operationalized the manipulator-intended political attribute by calculating the net count difference between the aforementioned two types of search activities.

Further, we examined a user’s subsequent viewing of news articles following the keyword search. Because a keyword search returns a list of relevant news articles, a user may select one or more of those articles from the list to get more insights, which we refer to as search-induced page views. To measure

page views offers additional operationalization of the manipulator-intended political attribute based on search-induced page views.

Passive public attention (unsearched page views of articles with similar headlines): Users do not always navigate news articles by conducting proactive keyword searches; rather, they frequently choose what to read due to incidental exposure to the headline of an article. Indeed, 54% of our sample did not perform any keyword searches during the sample period. To ascertain the effect of FSE on those who “passively” consumed news articles, we used our doc2vec model to examine the unsearched page views of all articles with headlines that were semantically similar to the aforementioned Druking’s keywords. Following that, page views were calculated based on the top-10% most similar articles. Finally, we operationalized passive attention to manipulator-intended political attributes by computing the difference in page views between the news articles with headlines that resonated with the manipulator’s promoting keywords ($PV_{it}^{promo}$) and news articles with headlines that were consonant with the manipulator’s demoting keywords ($PV_{it}^{demo}$).

Political sentiment (pro-government vs. anti-government): Additionally, we examined the second-level agenda-setting effect from the perspective of political position. Recall that Druking’s goal was to undermine the then-ruling party by promoting anti-government comments while limiting pro-government comments. Therefore, we hypothesize that the salience of FSE would predict a relative increase in public attention to news with anti-government sentiment compared to news with pro-government sentiment.

A crucial step for the analysis of political sentiment was determining the political leanings of news articles that users viewed following their exposure to the focal news page. To identify the political orientation, we adopted an advanced semi-supervised machine learning (ML) approach called label propagation (LP) (see Appendix A for details of our LP model). Semi-supervised learning is best suited for scenarios in which only a small number of labeled samples are available, whereas most of the data are unlabeled (Zhou et al., 2003; Fujiwara &
search-induced page views, we first identified news articles Irie, 2014). The LP model has been shown to achieve
with a headline that contained the search keywords of interest. considerably more accurate performances for various
Then, we counted the number of the identified news articles applications by combining both labeled and unlabeled samples
viewed, contingent on the user viewing them within an hour of together during training compared to supervised ML models
the keyword search. Specifically, we used the embedding that utilize only labeled samples (e.g., Tarvainen & Valpola,
vectors obtained from our doc2vec model to compute the cosine 2017; Iscen et al., 2019). Notably, semi-supervised learning is
similarity between search keywords and articles and counted becoming increasingly popular due to the high cost of expert
the number of views of the top-n% most similar articles. Based data labeling along with the increasing need for large-scale
on the similarity measures, we counted news page views driven training data. In the IS literature, while both supervised and
by the manipulator-promoted search keywords unsupervised ML models have been extensively studied and
( 𝑃𝑉𝑖𝑡 ) and by the manipulator-demoted search employed, the investigation of semi-supervised learning has
keywords (𝑃𝑉𝑖𝑡𝑑𝑒𝑚𝑜−𝑠𝑒𝑎𝑟𝑐ℎ ). The difference between these two been extremely limited, with the exception of the work by
Abbasi et al. (2012) on financial fraud detection.
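As a rough illustration of the similarity-based keyword expansion described earlier in this section (ranking candidate search queries by cosine similarity to the manipulator’s keyword list and keeping the top-k), the following minimal sketch may help. It is not the authors’ implementation: random vectors stand in for doc2vec embeddings, the function name `top_k_similar` is hypothetical, and scoring each candidate by its best match against the seed list is an assumption, since the text does not specify how similarities to multiple seed keywords were aggregated.

```python
import numpy as np

def top_k_similar(seed_vecs, cand_vecs, k):
    """Return indices and scores of the k candidates closest to the seed list."""
    # L2-normalize rows so that a dot product equals cosine similarity.
    seeds = seed_vecs / np.linalg.norm(seed_vecs, axis=1, keepdims=True)
    cands = cand_vecs / np.linalg.norm(cand_vecs, axis=1, keepdims=True)
    sims = cands @ seeds.T            # shape: (n_candidates, n_seeds)
    best = sims.max(axis=1)           # assumed aggregation: best match per candidate
    order = np.argsort(-best)[:k]     # highest similarity first
    return order, best[order]

# Synthetic stand-ins for embedding vectors (assumption, not real data).
rng = np.random.default_rng(0)
seed_vecs = rng.normal(size=(5, 16))      # manipulator keyword embeddings
cand_vecs = rng.normal(size=(200, 16))    # candidate search-query embeddings
cand_vecs[7] = 3.0 * seed_vecs[2]         # planted candidate parallel to a seed

idx, scores = top_k_similar(seed_vecs, cand_vecs, k=10)
print(idx[0], round(float(scores[0]), 3))
```

The same ranking logic would apply to headline embeddings when selecting the top-n% most similar articles; only the candidate set and the cutoff change.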
In our setting, we used the well-known political bias of partisan Korean news media (Lim et al., 2019) as the initial labels of articles, resulting in less than 18% of articles being labeled as either pro- or anti-government. A similar approach was used in David et al. (2016) to predict the political orientation of Facebook users based on posts from the pages of political parties. We note that our LP model achieved the best accuracy (F1-score of 0.913), compared to other representative ML models (see Appendix A). Finally, using the political orientations of articles identified by our LP model, we operationalized public attention to manipulator-intended political sentiment by the difference in the page views between the news articles with pro-government sentiment ($PV_{it}^{progov}$) and the news articles with anti-government sentiment ($PV_{it}^{antigov}$).

Spillover Effects of Bot-Assisted Fake Social Engagement on Public Attention

RQ: A Spillover Effect of Bot-Assisted FSE on Public Attention to News

Bot-assisted FSE operations target a comment section within a news article page. Hence, the effect of FSE cannot be accurately estimated unless the model accounts for the variance due to the exposure to the news article’s content. Furthermore, not all users who visit the attacked article’s comment space would be exposed to the same level of manipulation: Depending on when a user visits the article, the salience of FSE is different, as is its effect on the exposed user. Accordingly, we estimated the following two-way fixed effects regression model that controlled the content effect and the exposure effect, along with time and individual fixed effects:

$$PV_{it} = \beta_0 Post_{it} + \beta_1 Post_{it} Focal_i + \beta_2 Post_{it} Focal_i FSE_i + u_i + v_t + \varepsilon_{it}, \qquad (2)$$

where $Post_{it}$ is an indicator variable that indicates whether time $t$ occurred after the exposure to the news content or before (1: after, 0: before) in either the treatment or the control group. It is unique to each user since the user visits the focal news article or control news articles at various time periods. $Focal_i$ is an indicator variable that indicates whether user $i$ is in the treatment group (i.e., who visited the focal news article with FSE) or the control group, which was not affected by the manipulator (1: treatment group, 0: control group); $u_i$ is a fixed effect for user $i$; $v_t$ captures a fixed effect for time $t$; and $\varepsilon_{it}$ represents an idiosyncratic error term that follows a standard normal distribution. Note that the salience of FSE has subscript $i$ instead of subscript $t$. $FSE_i$ is the salience of fake social engagements that user $i$ was exposed to at the time of her visit to the focal news article (i.e., $FSE_i \equiv FSE_{t = i\text{’s arrival time}}$).

The beta parameters, namely, $\beta_0$, $\beta_1$, and $\beta_2$, measured the change in page views following a visit to the focal news article or a control news article. That is, $\beta_0$ captured the change in page views induced by the news content, $\beta_1$ represented the baseline effect of the exposure to the manipulated comment section, and $\beta_2$ denoted the moderating effect of the salience of FSE.

One challenge in estimating the effect of FSE on public attention is that comment visibility (i.e., comment rankings) is a function of both FSE and organic user engagement. That is, the FSE variable computed by Equation (1) would be affected not only by manipulated votes but also by organic votes cast by general users. Econometrically, this gives rise to an issue of endogeneity due to the correlation between the FSE variable and the idiosyncratic error term in Equation (2). The virality of the focal news, for example, might affect both the number of organic votes and public attention to news simultaneously. Another source of correlation might be reverse causation in that organic users’ news consumption might affect the FSE variable by increasing the number of organic votes. Thus, we use two-stage least squares (2SLS) estimation (Greene, 2017; Angrist & Pischke, 2008), using the following first-stage equation, to attribute FSE only to the effect of the salience of manipulation operations across comments:

$$FSE_i = \delta_0 + \delta_1 UV_i + \delta_2 DV_i + \xi_i, \qquad (3)$$

where $UV_i$ is the number of the manipulator’s upvotes for the comments to which user $i$ was exposed, $DV_i$ is the number of the manipulator’s downvotes for the comments to which user $i$ was exposed, and $\xi_i$ is a random error term.

Manipulative votes cast by a bot-assisted manipulator serve as valid instrumental variables for identifying the effect of FSE on organic users’ attention and news consumption. First, they are clearly correlated with the FSE variable measured by Equation (1), due to their direct influence on the ranking of comments according to the platform’s comment-ranking algorithm, satisfying the inclusion restriction (i.e., instrument relevance). Second, they are independent of the error term in the organic users’ news consumption model (i.e., Equation (2)). Because the manipulative votes were generated by bots that cast a vast number of upvotes and downvotes for targeted comments that match Druking’s keyword list, public users’ attention to news has no bearing on the generation of manipulative votes. In addition, public users are unable to detect or distinguish the presence of manipulative votes from organic votes, further confirming the independence between manipulative votes and the idiosyncratic error term for users’ attention to news, satisfying the exclusion restriction.
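The two-stage logic described above can be illustrated with a deliberately simplified sketch on synthetic data. This is not the paper’s specification: it drops the $Post_{it}$ and $Focal_i$ interactions and the user and time fixed effects, and every variable name, coefficient value, and the confounder labeled `virality` is an assumption made for illustration. The point is only to show how instrumenting an endogenous FSE measure with bot upvotes and downvotes can recover an effect that naive OLS overstates.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000

# Instruments: bot upvotes (UV) and downvotes (DV); assumed exogenous here.
uv = rng.poisson(5.0, n).astype(float)
dv = rng.poisson(3.0, n).astype(float)

# Unobserved confounder (e.g., article virality) drives both FSE and page views.
virality = rng.normal(size=n)
fse = 0.2 + 0.3 * uv - 0.2 * dv + 1.0 * virality + rng.normal(size=n)

true_effect = 0.5  # assumed causal effect of FSE on page views
page_views = 1.0 + true_effect * fse + 2.0 * virality + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients for y = X b + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones(n)

# First stage, in the spirit of Equation (3): regress FSE on the instruments.
Z = np.column_stack([const, uv, dv])
delta = ols(Z, fse)
fse_hat = Z @ delta

# Second stage: regress the outcome on the fitted (exogenous) part of FSE.
beta_2sls = ols(np.column_stack([const, fse_hat]), page_views)[1]

# Naive OLS for comparison; biased upward by the confounder in this setup.
beta_ols = ols(np.column_stack([const, fse]), page_views)[1]

print(f"first stage: UV={delta[1]:.3f}, DV={delta[2]:.3f}")
print(f"OLS={beta_ols:.3f}, 2SLS={beta_2sls:.3f}, true={true_effect}")
```

Running the two stages by hand gives the 2SLS point estimate, but the standard errors from the second regression are not valid as-is; a dedicated IV routine would be needed for inference.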
Table 2 shows the results of the 2SLS estimation. The first-stage estimation results reveal that, as expected, the bot-generated upvotes increased the relative salience of FSE while its downvotes decreased it. Furthermore, both the R-squared and F-statistic values of the regression were large (i.e., $R^2 = 0.390$, $F = 557$), alleviating the concern of weak instruments (Bound et al., 1995). Notably, the second-stage regression results show that the FSE effect was statistically significant and positive. That is, as the salience of FSE increased by one unit, users increased their subsequent news consumption on the platform by 0.332 pages per hour.

The effect of FSE may change over time. To distinguish its short-term and long-term effects, we introduced an interaction term between the main effect and a short-term dummy ($ST_{it}$) that represented the first three hours after leaving the focal news page.¹¹ Table 3 reveals that the short-term effect was greater than the long-term effect. For the first three hours, a one-unit increase in the salience of FSE increased a user’s subsequent hourly news consumption by 1.339 (= 0.195 + 1.144) page views. After the first three hours, its effect was still positive yet reduced to 0.195 page views per hour. Since our data spanned up to 39 hours from the publication of the focal news article, we were unable to empirically measure the effect’s longevity after 39 hours.

In addition, the effect of FSE might vary by demographic group. We investigated the effect’s user heterogeneity by incorporating interaction terms with a user’s gender and age, respectively. Table 4 shows that the magnitude of the FSE effect was smaller for female users than for male users, and its magnitude was larger for younger users (under the age of thirty) than for those in their sixties or older.

H1: First-Level Agenda Setting (Effect of Bot-Assisted FSE on Public Attention to Political News Over Non-Political News)

According to H1, the salience of FSE should draw more public attention to political news than non-political news. We tested this hypothesis by examining the effect of FSE on the difference in page views between political and non-political news articles (i.e., sports, entertainment, and other miscellaneous news):

$$PV_{it}^{poli} - PV_{it}^{nonpoli} = \beta_0 Post_{it} + \beta_1 Post_{it} Focal_i + \beta_2 Post_{it} Focal_i FSE_i + u_i + v_t + \varepsilon_{it} \qquad (4)$$

Overall, the results support H1, showing the positive and significant effect of the salience of FSE on the net difference in page views between the politics section and non-politics sections: $\beta_2 = 0.048$, p < 0.01 for the comparison with sports, $\beta_2 = 0.103$, p < 0.01 for the comparison with entertainment, $\beta_2 = 0.027$, p < 0.01 for the comparison with other news (see Table 5). The results suggest that the rate of increase in news consumption induced by FSE was greater in the political news domain compared to non-political news topics.

¹¹ The choice of a three-hour window for the short-term period was made empirically by experimenting with different time windows. Although the magnitude of the impact changes according to short-term durations, we observed a consistent pattern in which the impact of the manipulator is temporarily strong but significantly weakens in the long run.
Table 5. Impact of FSE on Public Attention to Political News over Non-political News

| Parameter | Independent variable | Dependent variable: Politics vs. sports ($PV_{it}^{poli} - PV_{it}^{sports}$) | Dependent variable: Politics vs. entertainment ($PV_{it}^{poli} - PV_{it}^{enter}$) | Dependent variable: Politics vs. other news ($PV_{it}^{poli} - PV_{it}^{other}$) |
|---|---|---|---|---|
| $\beta_0$ | $Post_{it}$ | 0.031 (0.008) | 0.042*** (0.007) | 0.088*** (0.007) |
| $\beta_1$ | $Post_{it} \times Focal_i$ | 0.059*** (0.006) | 0.059*** (0.006) | -0.026*** (0.006) |
| $\beta_2$ | $Post_{it} \times Focal_i \times \widehat{FSE}_i$ | 0.048*** (0.008) | 0.103*** (0.008) | 0.027*** (0.007) |

Note: *** p < 0.01, ** p < 0.05, * p < 0.1. Standard errors are in parentheses. Estimates of user and time fixed effects are omitted for brevity.
H2: Second-Level Agenda Setting (Effect of Bot-Assisted FSE on Public Attention to Manipulator-Promoted Compared to Manipulator-Demoted Political Attributes)

According to H2, the salience of FSE should direct more public attention to manipulator-promoted political attributes than manipulator-demoted political attributes. H2 was tested in three ways: by examining (a) proactive public attention, operationalized by keyword searches and search-induced page views; (b) passive public attention, operationalized by page views of articles whose headlines were semantically similar to the manipulator’s keywords (no search involved); and (c) political sentiment, operationalized by page views of anti- vs. pro-government news.

First, we regressed the difference in keyword search counts between searches containing manipulator-promoted keywords and manipulator-demoted keywords on the same set of independent variables as in Equation (2). Table 6 shows that the salience of FSE increased the above-mentioned difference in keyword searches ($\beta_2 = 0.040$, p < 0.01). Additionally, because a keyword search results in a list of pertinent news articles, we examined the influence of FSE on search-induced page views. We estimated the same fixed effects regression model with the difference in search-induced page views as a new dependent variable. The results show the salience of FSE increased the difference in search-induced page views between articles associative with manipulator-promoted search keywords and articles associative with manipulator-demoted search keywords ($\beta_2 = 0.027$, p < 0.01).

Second, we conducted the same fixed effects regression analysis using a different dependent variable: the net difference in page views for articles with and without headlines associated with the manipulator’s FSE keywords. The results are consistent with the results for the search-induced page views. That is, the salience of FSE increased the difference in page views between articles with similar headlines to the manipulator’s promoting keywords and articles with similar headlines to the manipulator’s demoting keywords ($\beta_2 = 0.021$, p < 0.01).

Lastly, we tested the second-level agenda-setting effect in terms of political sentiment. The results indicate that the salience of FSE increased the difference in page views between articles with anti-government sentiment and articles with pro-government sentiment ($\beta_2 = 0.007$, p < 0.01), which is well-aligned with the manipulator’s intention.

To summarize, all results in Table 6 demonstrate that the salience of FSE directed greater public attention to political attributes consistent with the manipulator’s goal, thus supporting H2.
Table 6. Impact of FSE on Public Attention to Political Attributes over Non-political Attributes

| Parameter | Independent variable | Proactive public attention: Search keywords ($KS_{it}^{promo} - KS_{it}^{demo}$) | Proactive public attention: Search-induced page views ($PV_{it}^{promo\text{-}search} - PV_{it}^{demo\text{-}search}$) | Passive public attention: Related page views ($PV_{it}^{promo} - PV_{it}^{demo}$) | Political sentiment ($PV_{it}^{antigov} - PV_{it}^{progov}$) |
|---|---|---|---|---|---|
| $\beta_0$ | $Post_{it}$ | 0.058*** (0.010) | 0.040*** (0.004) | 0.026*** (0.003) | 0.011*** (0.002) |
| $\beta_1$ | $Post_{it} \times Focal_i$ | -0.041*** (0.008) | -0.025*** (0.003) | 0.022*** (0.002) | -0.004** (0.002) |
| $\beta_2$ | $Post_{it} \times Focal_i \times \widehat{FSE}_i$ | 0.040*** (0.010) | 0.027*** (0.004) | 0.021*** (0.003) | 0.007*** (0.003) |

Note: *** p < 0.01, ** p < 0.05, * p < 0.1. Standard errors are in parentheses. Estimates of user and time fixed effects are omitted for brevity.
Lastly, we developed a multisite entry, relative time model (Angrist & Pischke, 2008; Autor, 2003) to conduct a Granger causality test. If the salience of FSE is a cause, a change in users’ news consumption would be predicted by past exposure to FSE (i.e., lag) but not by future exposure to FSE (i.e., lead). The following is the lead-lag regression equation:

$$PV_{it} = \beta_{2,-2} Lag2_{it} Focal_i FSE_i + \beta_{2,-1} Lag1_{it} Focal_i FSE_i + \beta_{2,0} Lag0_{it} Focal_i FSE_i + \beta_{2,+2} Lead2_{it} Focal_i FSE_i + u_i + v_t + \varepsilon_{it}, \qquad (5)$$

where $Lead2_{it}$ is a relative time dummy that indicates the time span from twelve to twenty-four hours prior to the exposure to the focal article’s comment section, and $Lag0_{it}$, $Lag1_{it}$, and $Lag2_{it}$ are relative time dummies for 0~12, 12~24, and 24~36 hours after manipulation exposure, respectively. Note that a dummy indicating zero to twelve hours before the arrival at the focal article (i.e., $Lead1_{it}$) is omitted as the base group. Tables 7 and 8 show the null effect of the lead variable. Only after the exposure to the manipulation does the FSE effect become statistically significant, lending support to the causal effect of FSE on
users’ news consumption. However, the positive effect does not last long, rapidly diminishing within a day.

In sum, our empirical results hold consistently across various conditions, as summarized in Tables 7 and 8. The robustness check analyses show that our empirical findings were neither driven by nor dependent on a particular choice of dependent variable, independent variable, and parameters for the machine learning procedure.

General Discussion

This study examines the spillover effect of bot-assisted fake social engagement (FSE), a widespread false amplification practice in the global disinformation industry, on manufacturing public attention in a large information ecosystem. Based on the algorithmic conduit brokerage perspective (Salge et al., 2022) and the agenda-setting framework (McCombs & Valenzuela, 2020), we pose a research question of whether FSE produces the spillover effect on public attention to information beyond the immediately manipulated context (RQ) and hypothesize that the salience of FSE shifts public attention in line with the manipulator’s intention (H1 and H2). This study advances disinformation research by integrating bot- and user-centered approaches to demonstrate that bots’ capacity for the rapid scaling of social engagement elicits a false bandwagon of public attention. We integrate the two approaches by empirically examining the spillover of bot operation effects into a broader information environment. Methodologically, this study leverages a unique large-scale user-behavioral data source and the ground truth of disinformation bot activities, coupled with advanced semi-supervised ML techniques.

Considering that disinformation campaigns have increasingly incorporated automation software, understanding the mechanism of bot-assisted FSE and its effect on the general public’s attention may offer theoretical and managerial insights into disinformation’s harms on digital information commons. This section discusses the study’s theoretical implications, methodological contributions, and managerial implications for scholars, practitioners, and policymakers.

Theoretical Contributions

Theoretically, bot-assisted FSE manifests functions of algorithmic conduit brokerage, particularly in terms of bots’ ability for social alerting and rapid scaling (Salge et al., 2022). In addition to an algorithmic conduit brokerage perspective, we use agenda-setting theory to explain a mechanism of how bot-assisted FSE helps the human manipulator (i.e., the programmer behind bots) game (deceptively) the process of public agenda setting on digital platforms. By deploying bots, the manipulator can rapidly amplify social engagement volume at scale, which in turn results in the rearrangement of information positions and eventually elicits a bandwagon of public attention in the manipulator’s favor. Further, this study contends that the influence of bot-assisted FSE does not just stay in the immediately manipulated space but leaks into a larger information consumption ecosystem. By adopting agenda-setting theory, this study elaborates a mechanism by which a manipulator plays the role of a public agenda setter by falsely amplifying the salience of selective messages. Importantly, deceptive agenda setting does not necessitate creating one’s own fake messages. Manipulators can manufacture public attention by rapidly scaling the visibility of existing genuine content in their favor. Despite its prevalence and significance due to cost-effectiveness, bot-assisted FSE has been largely overlooked in the literature due to the difficulty of obtaining compatible empirical data sources, which should ideally disambiguate inauthentic engagement from organic engagement. In this sense, this study’s focus on bot-assisted FSE uniquely advances disinformation research.

In addition to disinformation research, this study contributes to advancing agenda-setting theory by theorizing a deceptive agenda-setting mechanism and developing computational processes to empirically demonstrate it. In particular, our semi-supervised ML modeling approach to detecting and including associative textual cues as compositions of the issue attributes echoes the tenet of the network agenda-setting model, an advanced branch of agenda-setting theory that contends that audiences remember news not only as single issues/attributes but also as a bundle of mental associations (Vu et al., 2014; Guo & Vargo, 2015). To our knowledge, this study is the first attempt to incorporate advanced machine learning techniques to infer associative concepts that represent issue attributes.

The study’s findings suggest both first- and second-level agenda-setting effects of bot-assisted FSE on public attention. On the first level, we examined news domain-specific page views by comparing page views for the politics news section to those for non-politics sections. The findings reveal that bot-assisted FSE operations have a first-level agenda-setting effect on how the public allocates its attention: The exposed users directed greater attention to political news than to non-political news such as sports and entertainment. On the second level, we compared political attribute-specific news page views between articles that contained manipulator-promoted political attributes and those that contained manipulator-demoted attributes. The findings confirm the second-level agenda-setting effect, as the FSE effect was greater for page views with manipulator-promoted
attributes than for those with manipulator-demoted attributes. The results were consistent for proactive public attention (keyword searches and search-induced page views), passive public attention (other page views that occurred without search), and political sentiment-driven public attention. Altogether, the empirical findings attest to the spillover influence of bot-assisted FSE on the general public’s broader information (news) consumption beyond the immediate context targeted by a manipulator. Our findings of disinformation effects on general users’ information behaviors add new insights to existing knowledge that has thus far centered around subpopulation groups of ideologically like-minded and/or heavy platform users, based on a somewhat narrow definition of the sphere of disinformation influence within the immediate interaction context.

Methodological Contributions

our investigation into the second-level agenda-setting effect. Since text embedding models are not dependent on specific context or language characteristics (Grave et al., 2018), our proposed approach is generalizable to a wide range of languages. Our main approach, semi-supervised learning, is ideally suited for situations with a limited amount of labeled data that is mixed with abundant unlabeled data during training, resulting in substantial performance improvements (Tarvainen & Valpola, 2017; Iscen et al., 2019). Despite its advantages, semi-supervised learning has attracted little attention in the IS literature, with the exception of Abbasi et al. (2012). In this paper, we demonstrate how such an approach can achieve superior accuracy in predicting specific attributes of information (e.g., political orientation of articles). Our study is one of the first in the IS literature to apply semi-supervised learning in empirical research, broadening the ML spectrum beyond the dichotomy of unsupervised and supervised learning.
crafted false messages, in counteracting disinformation operations. Concerted efforts of online platforms and policy regulators will be necessary, and data-driven empirical insights, such as our findings, can serve as shared intelligence in the process.

Limitations and Future Research

subsequently manufacture public attention to information. This research contributes to the IS literature by broadening our theoretical understanding of a bot-assisted disinformation technique and by demonstrating how a computational and data-driven approach can help quantify its effects on general users’ informational behaviors. We hope this study will lead to more IS scholarly attention to the misuse/abuse of digital technologies and their ramifications on cybersocial security.
Initiative’s Center on Narrative, Disinformation, and Strategic
Appendix A
We describe our label propagation (LP) model used to infer the political sentiment of articles that users visited. Two main advantages of LP
are (1) local consistency: nearby data points are likely to have the same label, and (2) global consistency: data points on the same structure
(i.e., manifold or cluster) are likely to have the same label. The core idea of LP is to construct an affinity graph from all labeled and unlabeled
samples and then iteratively propagate the known labels to the unlabeled samples according to the graph structure. More formally, the
algorithm proceeds as follows:
1. Form an affinity graph $G$ and its corresponding adjacency matrix $W$, where nodes represent samples and edges capture their pairwise similarities (e.g., k-nearest neighbor graph).
2. Construct the normalized Laplacian $L = D^{-1/2} W D^{-1/2}$, where $D$ is the diagonal matrix of node degrees (necessary for convergence).
3. Iterate $F_{t+1} = \lambda L F_t + (1 - \lambda) Y$ until convergence, where $F_t$ represents the labels at the $t$-th iteration, $\lambda$ is a hyperparameter between 0 and 1 that balances the label information received from neighboring nodes (weight $\lambda$) against the initial label information retained (weight $1 - \lambda$), and $Y$ is the vector of initial known labels.
To construct an affinity graph with articles as nodes, we computed pairwise cosine similarities between 342,567 articles using their embedding
vectors obtained from our doc2vec model (described in the Organic User Activities section), which has been shown to be accurate in detecting
political biases in articles (e.g., Baly et al., 2020; Kang & Yang, 2020). From the pairwise similarities, we formed a sparse k-nearest neighbor
graph with 𝑘 = 15 as the affinity graph 𝐺. For the iterations in Step 3, we set 𝜆 = 0.4.
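The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors’ implementation: the function names `knn_affinity` and `label_propagation` are hypothetical, the toy points on a unit circle stand in for doc2vec article embeddings, labels are stored as one-hot rows (all zeros for unlabeled nodes), and dense matrices are used for readability even though a corpus of several hundred thousand articles would call for sparse matrices.

```python
import numpy as np

def knn_affinity(X, k):
    """Step 1: symmetric k-nearest-neighbor graph weighted by cosine similarity."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)                 # a node is never its own neighbor
    nbrs = np.argsort(-S, axis=1)[:, :k]         # k most similar nodes per row
    rows = np.repeat(np.arange(len(X)), k)
    cols = nbrs.ravel()
    W = np.zeros(S.shape)
    W[rows, cols] = np.clip(S[rows, cols], 0.0, None)  # keep weights non-negative
    return np.maximum(W, W.T)                    # symmetrize

def label_propagation(W, Y, lam=0.4, tol=1e-9, max_iter=1000):
    """Steps 2-3: normalize W, then iterate F <- lam * L F + (1 - lam) * Y."""
    d = W.sum(axis=1)
    d[d == 0] = 1.0                              # guard against isolated nodes
    inv_sqrt = 1.0 / np.sqrt(d)
    L = W * inv_sqrt[:, None] * inv_sqrt[None, :]   # D^{-1/2} W D^{-1/2}
    F = Y.astype(float).copy()
    for _ in range(max_iter):
        F_next = lam * (L @ F) + (1.0 - lam) * Y
        if np.abs(F_next - F).max() < tol:
            return F_next
        F = F_next
    return F

# Toy data (assumption): two groups of unit vectors at well-separated angles.
theta = np.concatenate([np.linspace(0.0, 0.3, 10), np.linspace(1.2, 1.5, 10)])
X = np.column_stack([np.cos(theta), np.sin(theta)])

# One labeled node per group; all other rows of Y stay zero (unlabeled).
Y = np.zeros((20, 2))
Y[0, 0] = 1.0    # node 0 labeled as class 0 (e.g., "P")
Y[10, 1] = 1.0   # node 10 labeled as class 1 (e.g., "A")

W = knn_affinity(X, k=3)
F = label_propagation(W, Y, lam=0.4)
pred = F.argmax(axis=1)
print(pred)
```

In this toy graph the two groups form separate connected components, so each unlabeled node inherits the label of the single seed in its own group. On real data the predicted class is likewise the column with the largest propagated score.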
Figure A1(a) shows an example of label propagation iterations. Starting from the nodes corresponding to Article 1 (labeled as “P”) and Article
2 (labeled as “A”), the initial known labels are propagated to other articles according to the affinity graph at each iteration. We also compare
our LP model to other representative supervised ML models, including feed-forward neural network (FNN), logistic regression (LR), gradient
boosting trees (GBT), and k-nearest neighbor (kNN) classifiers. Figure A1(b) depicts that the LP model yields the best prediction accuracy
(0.913) measured by the F1-score (a standard accuracy metric for classification tasks) averaged over multiple stratified 5-fold cross-
validations. We note that hyperparameters of the compared models are tuned with validation sets and F1-scores are reported using separate
test sets.
Note: (a) Shows an example of label propagation iterations where the initial known labels of Article 1 (“P”) and Article 2 (“A”) are spread to
other articles (i.e., nodes) according to the affinity graph. (b) Presents the prediction accuracies (F1-score) of different ML models showing
that label propagation (LP) achieves the best performance.
Figure A1. Example of Label Propagation Iterations and Performance Comparison