
RESEARCH ARTICLE

DISINFORMATION SPILLOVER: UNCOVERING THE RIPPLE EFFECT OF BOT-ASSISTED FAKE SOCIAL ENGAGEMENT ON PUBLIC ATTENTION1
Sanghak Lee
W. P. Carey School of Business, Arizona State University
Tempe, AZ, U.S.A. {[email protected]}

Donghyuk Shin
College of Business, Korea Advanced Institute of Science and Technology (KAIST)
Seoul, REPUBLIC OF KOREA {[email protected]}

K. Hazel Kwon
Walter Cronkite School of Journalism and Mass Communication, Arizona State University
Phoenix, AZ, U.S.A. {[email protected]}

Sang Pil Han
W. P. Carey School of Business, Arizona State University
Tempe, AZ, U.S.A. {[email protected]}

Seok Kee Lee
College of IT Engineering, Hansung University
Seoul, REPUBLIC OF KOREA {[email protected]}

Disinformation activities that aim to manipulate public opinion pose serious challenges to managing
online platforms. One of the most widely used disinformation techniques is bot-assisted fake social
engagement, which is used to falsely and quickly amplify the salience of information at scale. Based on
agenda-setting theory, we hypothesize that bot-assisted fake social engagement boosts public attention in
the manner intended by the manipulator. Leveraging a proven case of bot-assisted fake social engagement
operation in a highly trafficked news portal, this study examines the impact of fake social engagement on
the digital public’s news consumption, search activities, and political sentiment. For that purpose, we
used ground-truth labels of the manipulator’s bot accounts, as well as real-time clickstream logs
generated by ordinary public users. Results show that bot-assisted fake social engagement operations
disproportionately increase the digital public’s attention to not only the topical domain of the
manipulator’s interest (i.e., political news) but also to specific attributes of the topic (i.e., political
keywords and sentiment) that align with the manipulator’s intention. We discuss managerial and policy
implications for increasingly cluttered online platforms.
Keywords: Opinion manipulation, disinformation, fake social engagement, political bots, online platforms,
econometrics, machine learning

1 Likoebe M. Maruping was the accepting senior editor for this paper. Ofer Arazy served as the associate editor.

DOI: 10.25300/MISQ/2023/17195. MIS Quarterly Vol. 48 No. 3, pp. 847-872, September 2024.

Introduction

Disinformation, also known as adversarial information operation (Weedon et al., 2017) or network or computational propaganda (Benkler et al., 2018; Bradshaw & Howard, 2018), refers to deceptive informational activities that aim to manipulate public opinion (Freelon & Wells, 2020). Recent cases of disinformation operations have shown that employing programmable bots to disrupt digital information commons has become common globally (Bradshaw et al., 2021).

The current study aims to explore whether bot-assisted fake social engagement, a widely used technique of disinformation operation, influences public attention to information beyond the manipulated context. Fake social engagement operations falsely amplify the salience of given information by manipulating the volume of user engagement with it (e.g., up/down voting, liking, sharing). Fake social engagement operations are integral to false amplification in today's digital environment because user engagement metrics serve as fundamental signals for many digital platforms' content curation algorithms to determine what to prioritize for display.2 By deploying bots, manipulators can boost engagement metrics at scale with rapidity (Salge et al., 2022). Despite being a major prong of today's disinformation operation, how bot-assisted fake social engagement affects the networked public's information consumption patterns has not yet been empirically explored. This inquiry should be of particular interest to information systems (IS) scholars, as it suggests that bots can pollute information commons by abusing the platform's content curation policy (Mindel et al., 2018).

In addition to focusing on bot-assisted fake social engagement, this study broadens the understanding of disinformation influence by scoping the sphere of influence beyond the immediate context that disinformation actors disrupt. Existing studies have predominantly focused on users' direct engagement with manipulated content, for example, on how users accept (Effron & Raj, 2020) or share (or intend to share) fake information (e.g., Pennycook et al., 2021; Weidner et al., 2020). Few studies have evaluated the extent to which the manipulation effect continues even after users leave the context in which manipulation occurs. By exploring how falsely amplified messages insinuate themselves into organic user choices of nonmanipulated information, this study reflects the reality that disinformation operates not in isolation but in a broader information ecosystem.

To examine the spillover effect of bot-assisted fake social engagement on manufacturing users' attention to information, this study adopts a media theory, agenda-setting theory (McCombs & Valenzuela, 2020). Building on this theoretical framework, we maintain that bot-assisted fake social engagement amplifies the salience of not only the targeted topic (or issue) itself but also its specific traits. The amplified salience of the targeted topic and its traits, in turn, influence the distribution of public users' attention in a broader information consumption context. This theoretical framework is applied to a specific empirical case of South Korea's "Druking" scandal, one of the most infamous opinion rigging scandals in this country.3 The Druking scandal is the epitome of a bot-assisted fake social engagement, as described more later. Bot-assisted fake social engagement operation is not particular to the Druking case but is widely observed across various global contexts, for example, politicians creating fake likes to inflate the popularity of their posts on Facebook (Wong & Ernest, 2021).

To broaden the scope of disinformation influence, we look at how bot-assisted fake social engagement operations change general users' informational behaviors, regardless of their political affinity. Previous studies have illustrated the ways in which disinformation actors (and their contents) mobilize a small, niche group of ideologically likeminded users (e.g., Bail et al., 2020; Bastos & Mercea, 2019; Freelon et al., 2022), or disloyal heavy internet users (Nelson & Taneja, 2018). However, no study, to our knowledge, has focused on general public users, except one survey study that found no effect of disinformation campaigns on them on Twitter—currently known as X (Bail et al., 2020). Based on large-scale clickstream data, the current study shows how bot-assisted fake social engagement influences the general public's informational behaviors in terms of what types of news they subsequently view, what keywords they use to search, and what they actually click on. Empirical research about the effects of disinformation on the general public's broader information consumption patterns is scant, due to the rarity of data. Such research requires a natural experiment setting that captures general users' real-time access to information while contrasting between users who are exposed to a disinformation operation and nonexposed users. Our data source meets both conditions, offering a unique opportunity to examine disinformation effects on general users at scale.

In the following sections, we first review the current bot-assisted disinformation research to point out two gaps in the existing literature. We then introduce agenda-setting theory as a theoretical framework to explain how disinformation operation, particularly fake social engagement, can influence public attention to information, followed by the presentation of our empirical study on the Druking scandal, an exemplary case of bot-assisted fake social engagement operation.

2 For example, Facebook (https://blog.hootsuite.com/facebook-algorithm) and Twitter algorithms (https://blog.hootsuite.com/twitter-algorithm) are centered around engagement metrics.
3 https://en.wikipedia.org/wiki/2018_opinion_rigging_scandal_in_South_Korea


Literature Review

Disinformation Research: Connecting Bot-Centered and User-Centered Approaches

Digital opinion manipulations, largely known as disinformation, have evolved into a serious problem of cybersecurity, presenting substantial challenges to public communication and information systems. Increased academic attention has focused on understanding bots and their roles in disinformation diffusion. For example, Stella et al. (2018) examined how social bots (e.g., software-controlled social media accounts) maneuvered political opinion dynamics on Twitter during the 2017 Catalan referendum. Also, Gorodnichenko et al. (2021) described the diffusion of information in social media and the role of bots in shaping public opinion based on their analysis of Twitter data on Brexit and the U.S. presidential election in 2016. Other studies have inferred political bot activities based on the bot-ness measure or ephemerality of accounts to understand the impact of bot-like accounts on amplifying political messages (Bastos & Mercea, 2019; Boichak et al., 2021).

Whereas most literature on disinformation bots is based on the network-structural view, Salge et al. (2022) importantly suggested taking "a processual view of diffusion" (p. 230) based on the concept of algorithmic conduit brokerage. Algorithmic conduit brokerage refers to the deliberate design and programming of bots as information brokers. Bots are programmed to play the role of information broker in multiple ways, ranging from social information alerts to rearranging shapes, forms, and structures of information (reconfiguration), to adding/inserting new information (embellishment) and transmitting information (Salge et al., 2022). Considering that Twitter has been a dominant platform for bot research, the algorithmic conduit brokerage perspective is perhaps best suited to Twitter-like platforms. For example, Salge et al. (2022) emphasized the role of bots in "actually transferring information between parties" (p. 230). While such actual transmission can be observed on Twitter in a relatively obvious form (i.e., retweets), it may not be apparent in other types of platforms—for example, online comment sections.

Nevertheless, the algorithmic conduit brokerage perspective offers overarching insights into the theorizing of disinformation bots. For example, Salge et al. (2022) suggest "algorithmic social alertness" as the first step for bot activity through which bots are programmed to search and discover already existing content and curate it in the programmer's (i.e., human manipulator behind the bot operation) favor. Algorithmic social alertness is performed on a variety of web platforms, not just on platforms with dynamic social feeds, such as Twitter, but also in rather linearly designed platforms, such as discussion forums or comment sections. More importantly, the ability of algorithmic conduit brokerage for "rapid scaling" is integral to a wide spectrum of platforms wherein the main goal of bot activities is amplification. The "outcome [of bot actions] … is always high volume and not necessarily high reach, although both are certainly possible" (Salge et al., 2022, p. 247, italics original). In online comment sections, for example, bot-assisted fake social engagement may not necessarily increase audience reach but will generate high volumes of clicks or votes, which can help amplify the targeted information's visibility by rearranging the display of information. In this case, bot-assisted fake social engagement operations in online comment spaces exploit the bot's capacity for rapid scaling.

Whereas algorithmic conduit brokerage theory and related bot research have offered insights into the mechanism of bot-driven information diffusion, bot-centered disinformation research has been disconnected from another main branch of disinformation research that centers on the effects of disinformation on users' perceptions, attitudes, and behaviors. This "user-centered" research has revealed conditions in which users become vulnerable to falsehoods and evaluated how users interact with fake messages. For example, Pennycook et al.'s experimental study (2018) highlighted the "illusory truth effect" of fake news, one type of disinformation, showing that even a single prior exposure could enhance the (falsely) perceived accuracy of fake news. In the context of science communication, Scheufele and Krause (2019) examined the processes through which citizens become subject to scientific disinformation, concluding that vulnerability to scientific falsehoods should be determined not only by individual-level characteristics, such as the person's ability and motivation to detect falsehoods, but also by group-level and societal factors that facilitate access to correct(ive) information. Other studies (Carnahan & Garrett, 2020; Kahan et al., 2017) have shown that users not only accept deceptive information but also contribute to its propagation when the message affirms their cultural or political identity; conversely, users resist the correcting message if it is identity-threatening.

Overall, the user-centered disinformation research suggests that it is not the general population but specific audience groups that are prone to engaging with fake content. For example, Chen et al. (2021) showed that COVID-19 misinformation about the inefficacy of wearing masks and an election conspiracy theory of voter fraud was pushed by a "small but dense cluster of conservative users" on Twitter (p. 2). Nelson and Taneja (2018) analyzed audience visitation data of fake and real news sites during the 2016 U.S. presidential election campaigns, finding that heavy internet users who were not loyal to a mainstream news outlet were the main fake news consumers.


While existing user-centered studies have offered lessons on what makes users engage with or react to disinformation messages, these studies have predominantly examined message characteristics, providing little explanation about what happens to users when their attention is "hacked" by bots' false amplification (Marwick & Lewis, 2017, p. 19). To summarize, the two branches of disinformation research, bot-based information diffusion studies and user effect studies, have rarely been integrated, leaving the question of how bots' amplifying activities alter the digital public's information consumption patterns open to further exploration.

Missing Pieces: Influence Spillover and Fake Social Engagement

To fill in this gap, we expand on two aspects of a real-world disinformation operation that have not yet been thoroughly explored by empirical research. First, in reality, disinformation never occurs in an isolated dyad between the manipulator (or manipulated content) and a user. On the contrary, the immediate context in which users are exposed to a perpetrator's action is a subset of a larger information ecosystem. Therefore, the effect of a successful disinformation campaign is likely to extend beyond the direct interaction between the manipulator (or manipulated content) and the user and spill into other settings of information consumption. Several qualitative case studies have alluded to this point by describing how disinformation perpetrators work not in isolation but exploit existing media networks. For example, successful disinformation content created in an online troll community does not stay within the community but is picked up by mainstream media attention, reaching broad audiences (Phillips, 2015; Marwick & Lewis, 2017).

That said, systematic empirical analyses of disinformation effects have mostly focused only on direct interactions between manipulative content and users. This is understandable because it is rare to obtain data that represent the spillover of disinformation influence. Nevertheless, previous findings that disinformation is engaged only by niche audience groups are based on such limited measures, resulting in an incomplete representation of disinformation's sphere of influence. For example, Nelson and Taneja (2018) measured fake news consumption by using site-visitation data, one of the most proactive measures of audience engagement. Based on this, they argued that broad users were seldom vulnerable to fake news. Similarly, Bail et al.'s Twitter study (2020) found that Russia's disinformation accounts were engaged mostly by highly partisan users with high-frequency usage of Twitter, concluding that "Russian trolls might have failed to sow discord because they mostly interacted with those who were already highly polarized" (p. 243). However, the Bail et al. study was based on non-representative survey data matched with the metrics of direct engagement with the troll accounts or their messages. The findings of these studies are a partial snapshot of disinformation reality because disinformation influence can also be indirect: Users may be surreptitiously exposed to the manipulator's intention, and even the simplest exposure could result in a ripple effect on the consumption of other informational sources without further direct interactions with the manipulators or their content.

Second, disinformation operations entail not only the creation of fake content but also the creation of fake engagement with existing content in the manipulator's favor. Thus far, a considerable body of literature has focused on the effects of the former—for example, by examining what makes fake content persuasive, how it is propagated, and how it is detected (e.g., Vosoughi et al., 2018; Cresci, 2020), with little attention paid to the disruption of the digital information commons caused by fake social engagement. Thus far, disinformation studies that have examined social engagement have mainly focused on organic social engagement with fake content. For example, Edelson et al. (2021) found that about 70% of all user engagements across far-right news pages on Facebook were made with misinformation content. In another study, Freelon et al. (2022) showed that user engagement with disinformation tweets became disproportionately large when the tweets originated from fake accounts pretending to be Black activists. A handful of studies have paid attention to fake (mostly bot-assisted) social engagement activities (e.g., Boichak et al., 2021); however, to our knowledge, no study has taken a user-centered approach to examine how bot-assisted fake social engagement affects individual users' informational behaviors.

The reasons for the dearth of user-centered studies in the bot literature are twofold. First, it is difficult to detect bot activities unless ground-truth labels are available. As a result, developing detection techniques is a complex scientific problem that demands considerable effort (e.g., Varol et al., 2017; Cresci, 2020). Second, the primary bot activities have thus far functioned as information brokers (i.e., algorithmic conduit brokerage) rather than original content creators. Since the consequence of conduit brokerage is more nuanced than content creation, it is difficult to empirically differentiate between users who are exposed to bot activities and those who are not.

Despite the challenges, understanding the effects of bot-assisted fake social engagement on individual users is imperative because of the phenomenon's prevalence and significance. Bot-assisted fake social engagement is prevalent due to its cost-effectiveness (e.g., Jeong et al., 2020; Carman et al., 2018; Schäfer et al., 2017; Rossi et al., 2020). Also, it is a powerful tactic because social engagement metrics are pivotal indicators of content popularity fed into a platform's content curation algorithms. Accordingly, we first ask the following question:

RQ: Does bot-assisted fake social engagement have spillover effects on public attention to information beyond the manipulated context?


Bot-Assisted Fake Social Engagement and Public Attention: An Agenda-Setting Theoretical Framework

Bot-assisted fake social engagement operations center on the interplay among human perpetrators, automation (bots), and platform algorithms to "manufacture consensus or to otherwise give the illusion of general support for a (perhaps controversial) political idea or policy, with the goal of creating a bandwagon effect" (Woolley & Howard, 2016, p. 4, emphasis added). In information consumption contexts, the bandwagon effect is manifest in the shift of public attention to certain types of information.

Agenda-setting theory (McCombs & Valenzuela, 2020) is a useful theoretical framework for explaining how fake social engagement influences public attention. It suggests that the media has the ability to influence audiences in terms of which issue to pay attention to as an important public agenda and which attributes of the issue to pay attention to in order to make sense of the issue (McCombs & Valenzuela, 2020). The agenda-setting effect of news media on the public's mind has been well documented in the media and journalism literature since the seminal evidence of news media's agenda-setting function a half-century ago. McCombs and Shaw (1972) found a significant association between the amount of news coverage of political agendas during an election campaign and the public ranking of the importance of agendas for the election. Numerous studies have since then confirmed that the public's understanding of political reality is influenced by the salience of issues emphasized in news coverage.

Provided that the media's agenda-setting effect occurs by increasing the salience of information, disinformation actors may also play the role of agenda setters by amplifying the salience of the information that conveys their preferences. A few studies have alluded to this point. For example, Guo and Vargo (2020) showed that fake news stories exaggerated politician attributes, such as moral quality, leadership quality, and intellectual ability, to affect public attitudes toward political candidates. Rojecki and Meraz (2016) examined conspiratorial information transmissions during the 2004 U.S. presidential campaigns. They found that while the visibility of conspiracies on the Google search results was not directly associated with users' overall search trend—an indicator of the naturally occurring volume of online public attention—the visibility of conspiratorial information on the search results influenced traditional media's coverage of it, which in turn was associated with users' overall search trend. Vargo et al. (2018) analyzed big data from news archives, demonstrating that fake news sites had a stronger "intermedia" agenda-setting effect (p. 2030) on legitimate news coverage than fact-checking sites, particularly by transferring their agendas to partisan news outlets (e.g., Fox News).

While extant studies have alluded to the agenda-setting potential of disinformation, they have focused only on fake news sites and the transporting of their narratives to other mainstream media outlets. To our knowledge, no study has examined disinformation's agenda-setting effect on general public users, particularly in terms of bot-assisted fake social engagement operations.

Bot-assisted fake social engagement operations facilitate a bandwagon of public attention through the mechanism of rapid scaling (Salge et al., 2022). The rapidly inflated engagement volume makes it look like a large number of "real" users are interested in the (falsely) amplified topic, which can, in turn, increase organic public attention to the topic. Technically speaking, social engagements can be manipulated solely by human workers. However, fake engagement operations would have little impact on rearranging the salience of information unless the metric is rapidly fabricated at scale. In other words, bots' "rapid scaling" (Salge et al., 2022) of social engagement and the subsequent bandwagon effect resonates with the tenet of agenda-setting theory.

Agenda-setting theory includes two levels of media effects on shaping public attention to news agendas (McCombs & Valenzuela, 2020). The first-level agenda-setting effect, also known as "issue agenda setting" (Kim et al., 2002), refers to the media's ability to determine the hierarchy of public agendas by informing the audience what topic (issue or object) it should pay more attention to. The frequency of topics in news articles influences how the audience prioritizes the importance of these topics (McCombs & Valenzuela, 2020). For example, if the media covers news about Samsung Galaxy smartphones more frequently than Apple iPhones, the audience will pay more attention to Samsung's smartphones than Apple's. The first-level agenda-setting effect can occur on an even more abstract topic domain. For example, if the media reports on foreign affairs more frequently than on the domestic economy, the audience will be likely to consider international politics to be a more important current issue than domestic economic conditions.

In other words, first-level agenda setting is about the media's influence on public attention to a topic. In this study's empirical context, where disinformation was related to a political issue, we posit a hypothesis that suggests the first-level agenda-setting effect of bot-assisted fake social engagement on public attention to a political topic. That is, when a bot-assisted fake social engagement operation targets political content, the intensity of exposure to the operation predicts a relative increase in public attention to political news compared to non-political news.

H1: The salience of bot-assisted fake social engagement predicts an increase in public attention to political news compared to non-political news.


Meanwhile, second-level agenda setting, also known as "attribute agenda setting" (Kim et al., 2002), focuses on the presentation of attributes, qualities, or characteristics of a certain topic and its effect on how the audience will subsequently perceive or feel about that topic (Kiousis, 2005; McCombs & Valenzuela, 2020). For example, if the media frequently focuses on Samsung Galaxy's foldable design when reporting on smartphone features, the audience will begin to prioritize foldable design as the smartphone's most important attribute and pay more attention to information related to this feature when they think about Samsung Galaxy smartphones. On the contrary, if the media frequently reports on Samsung Galaxy's alleged benchmark manipulation with regard to the speed, battery life, and overall performance,4 the audience will prioritize benchmark manipulation as the smartphone's most important attribute and will pay attention to information related to this feature when they think about Samsung Galaxy smartphones. In other words, the second-level agenda-setting effect is about the media's ability to prime the audience's attitudinal or emotional reaction to a topic, because the selective presentation of attributes transmits sentiment, whether intended or not, and subsequently influences the audience's attitude toward the topic (Coleman & Wu, 2010; Kim et al., 2002).

Likewise, bot-assisted fake social engagement operations may engender the second-level agenda-setting effect by inflating the salience of certain attributes of the targeted topic, which may, in turn, increase public attention to these attributes. Recent advances in agenda-setting theory suggest that the agenda-setting effect occurs not only in the context of single attributes but impacts bundles of mental associations in a so-called network agenda-setting effect (e.g., Guo & Vargo, 2015; Vu et al., 2014). Drawing upon the recent theoretical development of the network agenda-setting model, we posit a second-level agenda-setting hypothesis based on associative keywords and texts resonating with the manipulator's intention. Given this study's empirical context, featuring a disinformation operation directed toward a political issue, we posit a second-level agenda-setting hypothesis that centers on political attributes:

H2: The salience of bot-assisted fake social engagement predicts an increase in public attention to manipulator-promoted political attributes compared to manipulator-demoted political attributes (e.g., political keywords and sentiment).

Figure 1 conceptually illustrates the fake social engagement-driven agenda-setting effect thesis that we propose. The lower part of Figure 1 illustrates that bot-assisted fake social engagement distorts the salience of topics and their attributes. In this example, Topic 1 is the target of manipulation in which Attribute A3 is promoted and Attribute A2 is demoted through bot-assisted fake social engagement. This manipulated salience then influences the distribution of public attention, as illustrated by the upper-right part of Figure 1, where Topic 1 becomes the most dominant topic and Attribute A3 emerges as the most salient attribute while A2 becomes the least salient attribute of Topic 1.

Figure 1. Agenda-Setting Effect in Bot-Assisted Fake Social Engagement Context

4 https://www.classaction.org/blog/samsung-phone-lagging-class-action-alleges-the-company-misled-consumers-on-speed-and-performance


Research Context

Digital Opinion Manipulation: The 2018 Druking Scandal

We focused on an online opinion-rigging scandal that occurred in South Korea. In 2018, South Korea experienced a major disinformation activity, widely referred to as the Druking scandal (Choe, 2018). "Druking" was the screen name for the disinformation operation team's leader, who had been a popular blogger while secretly founding a shadow company that ran illegitimate internet political campaigns utilizing political trolls. The company operated during the 2017 South Korean presidential election campaign to influence public opinion. While its initial political position was aligned with the Democratic Party (the then ruling party), in 2018 it assumed an anti-government stance. In 2018, the Druking team was indicted for rigging online comments. The main locations where the Druking team operated were spaces for news comments on major Korean portal sites. Given that South Korea has a 96% internet penetration rate, with the vast majority of online news consumption occurring via portal sites and an active presence of news comment culture, such digital opinion manipulations can have substantial ramifications.5,6

One of the primary activities of the Druking team was to manipulate the ranking of comments on a news site. To this end, they used a programmable code called "KingCrab," a macro-based bot that cast a massive number of up/down votes for certain targeted comments. The ranking of comments was important because the top-ranked comments achieved a higher degree of visibility than the rest of the comments. The Druking operation team's key action involved the selection of target comments and the manipulation of their rankings by pushing their favored (disapproved) comments to the top (bottom) of the list.

As with Reddit and other online news aggregators, the focal platform determined the ranking of comments on a particular news page based on their net vote count (i.e., total upvotes minus total downvotes per comment). The platform used phone verification to authenticate users' identities during the account registration process and permitted only one upvote or downvote per comment. Nonetheless, the Druking team managed to circumvent this by obtaining and leveraging thousands of legitimately created user accounts to create a large number of upvotes and downvotes in an attempt to manipulate news comment spaces in its favor.

Ground Truth of Opinion Manipulation

The Druking accounts were established based on the verified documents issued by the law enforcement department, and the details pertinent to general users are fully de-identified. The subject of the focal article gained traction: 39,827 comments were posted within 24 hours after the story was first published on January 17, 2018, at 9:35 a.m. Korea Standard Time (KST). It is worth noting that none of the comments came from the Druking accounts. That is, they were all posted by authentic users. As a result, the manipulator's primary goal was to affect the popularity of comments created by others rather than to create its own comments. Druking's operation was clearly directed at altering the upvote/downvote counts of existing comments. Over the 24-hour time span, some 2,300 Druking accounts were used approximately 1.2 million times to cast upvotes or downvotes in order to alter the ranking of the current comments targeted by the manipulator.

Data

We examined one of the leading online news platforms in South Korea. On this platform, each news article's page was composed of the main article and the user comment space dedicated to the main article. The ranking of comments was determined by their popularity, measured as upvotes (i.e., the number of thumbs-ups it received) compared to downvotes. Therefore, manipulators could escalate (decrease) the ranking of comments they wanted to promote (demote) by generating a large number of upvotes (downvotes) on them using a programmable bot. An example screenshot of a news article and its user comments from the focal platform is shown in Figure 2.

In partnership with the platform's company, we obtained access to its proprietary data on user behaviors and clickstream information, amounting to more than 108 million raw user log entries. The granularity of the data enabled us to observe how the user-generated comment section embedded in a news article's page was shaped over time and how users' news searching and viewing activities across the platform changed after the consumption of a news article's page. In the following sections, we first describe how a bot-assisted fake social engagement operation influenced the real-time process through which the user-generated comment section of the focal article was created. Then we shift our focus to the behaviors of organic users and provide the descriptive statistics that illustrate their activities on the platform.

5 https://www.digitalnewsreport.org/survey/2020/south-korea-2020
6 Korea Press Foundation (2018, Media Issue 5): User survey about portal news and comments. https://bit.ly/3KkdnVK


Note: Contents of the article and user comments were translated from Korean to English using Google Translate.

Figure 2. Example Screenshot of the Focal Platform

Figure 3. Dynamics of Comment Postings

Formation of a User-Generated Comment Section

When the focal article was published at 9:35 a.m. on January 17, 2018 (KST), the audience began to immediately utilize the comment section. Some commented directly on the article, others indicated their agreement with an existing comment with an upvote or their disagreement with a downvote, while still others viewed the comments passively without responding. The first part of our data showed how the top 1,000 comments on the focal article changed over time, recording the cumulative counts of upvotes and downvotes on each comment. Given the deterministic ranking mechanism of comments, this dataset enabled us to recover the entire process of the formation of the comment section.

A total of 3,775 comments were ranked in the top 1,000 at least once during our sample span. The first comment appeared four minutes after the news article was published, and the last comment was posted 25 hours later. Despite the fact that the news was published on a weekday morning, approximately 80% of the comments were generated within the first three hours, showing that users reacted to the news article quickly (see Figure 3).


The platform ranked user-generated comments in the order of popularity, as determined by the number of upvotes received minus the number of downvotes. As a result, either upvoting or downvoting on a particular comment would influence the comment's salience by changing its relative position. Knowing this, the manipulator used programmable bots to increase the number of upvotes on comments he endorsed while producing downvotes on comments he wanted to suppress. Importantly, however, the manipulator did not have full control because a large portion of votes were generated by organic users with diverse viewpoints.
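To make this net-vote ranking rule concrete, the following minimal sketch (with hypothetical comment IDs and vote counts, not data from the study) shows how a burst of injected votes can reorder which comments are displayed on top:

```python
# Illustrative sketch of a net-vote ranking rule (hypothetical data, not the platform's code).
def rank_comments(votes):
    """votes: {comment_id: (upvotes, downvotes)} -> list of ids, best-ranked first."""
    return sorted(votes, key=lambda c: votes[c][0] - votes[c][1], reverse=True)

votes = {"C1": (1200, 300), "C2": (1100, 250), "C3": (900, 200)}
print(rank_comments(votes))   # ['C1', 'C2', 'C3']

# A burst of bot upvotes on C3 and bot downvotes on C1 reorders the display,
# even though the organic votes themselves have not changed.
votes["C3"] = (votes["C3"][0] + 800, votes["C3"][1])
votes["C1"] = (votes["C1"][0], votes["C1"][1] + 700)
print(rank_comments(votes))   # ['C3', 'C2', 'C1']
```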
Figure 4 depicts the number of upvotes and downvotes created by bots, as well as organic users, over time. There was a total of 953,578 votes cast on 3,775 comments, with 719,609 upvotes (75.46%) and 233,969 downvotes (24.54%). The manipulator was responsible for 31.77% of the total upvotes and 20.92% of the total downvotes, and its voting activities increased two hours after the focal article was published. Organic users' votes, on the other hand, appeared more quickly and had a longer tail than the manipulator's votes. This suggests that the manipulator took some time to identify his target news page and the comments on it and prepare for the attack. Then, he ceased the operation when the effect of the votes became muted due to the large volume of accumulated votes.

Comment rankings fluctuated over time and eventually converged to a final rank, as illustrated in Figure 5, which presents 10 different comment convergence trends. The rankings fluctuated significantly within the first three hours and then steadily converged to their final positions at around five hours, which was to be anticipated given that the rankings were decided by the cumulative number of upvotes and downvotes. Although some comments shifted upward or held their status over time, others moved down due to downvotes, a surge of other comments, or the introduction of new comments.

More importantly, the manipulator's vote distribution was clearly distinct from that of organic votes. Figure 6 shows the source of votes for the top 10 comments as of the last time point in our data, ordered by their total net upvotes, including votes from both manipulator and organic users. The manipulator created upvotes to promote six comments (C1, C3, C6, C7, C8, C9) and suppress four (C2, C4, C5, C10), demonstrating his goal-directed behavior. That said, the manipulator did not have complete control of the opinion landscape: While the operation succeeded in positioning six of his preferred comments among the top 10, he was unable to overcome the organic popularity of the other four comments that he disapproved of. Nonetheless, manipulative votes appeared to have a significant impact on the final ranking of the comments, even when organic votes outnumbered them. For example, if the manipulator had voted against comment C1 while supporting comment C2, the relative positions of the two comments would have been reversed. In total, the manipulator promoted 998 comments and suppressed 247 comments.

Figure 4. Trends of Upvotes and Downvotes by Manipulators and Organic Users


Figure 5. Dynamics of Rankings for Ten Selected Comments

Note: Comments are ordered by their total net upvotes (upvotes minus downvotes), including those from both manipulator and organic users, as of the last time point in our data.

Figure 6. Sources of Votes for the Top 10 Comments

Organic User Activities

The organic user activity dataset was made up of the full server log details of 23,735 general users from January 15, 2018, to January 18, 2018. We removed 384 users who had no activity during the period, were younger than eight years of age, or had missing demographic data. The samples were then divided into one of two categories. The first group, the treatment group, consisted of 17,335 users who visited the focal news article with a manipulator-targeted comment section. The second group, which we called the control group, consisted of 3,868 users who visited one of 34 articles that contained highly similar content to the focal article but had not been attacked by the manipulator.

To identify articles that were extremely similar to the focal article, we used the doc2vec model (Le & Mikolov, 2014), which has rapidly gained popularity in the IS literature (e.g., Qiao et al., 2020; Shin et al., 2020) due to its state-of-the-art performance in various natural language processing tasks. We fine-tuned a large-scale doc2vec model pre-trained on more than 6.3 GB of text data7 using a total of 342,567 articles that users visited during the sample period. Based on our doc2vec model, we first represented each article by its embedding vector in a latent feature space.8 Then, using the cosine similarity between their embedding vectors, we computed the similarity between the focal article and all other articles. Along with manual verification, we selected a total of 34 articles that were most similar to the focal article as control group articles.9

7 https://ko-nlp.github.io/Korpora
8 We use d = 300 for the dimension of the embedding vectors. Other reasonable values of the dimension (100 ≤ d ≤ 500) yield similar empirical results.
9 All control articles have a cosine similarity larger than 0.85 with the focal article, which is at the top 0.0001% of all similarity scores.
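A minimal sketch of this control-article selection step is shown below. It is illustrative only: it trains a doc2vec model from scratch rather than fine-tuning the paper's pre-trained Korean model, and the `articles` dictionary, tokenization, and `focal_id` identifier are assumptions, not the authors' code.

```python
# Illustrative sketch: embed articles with doc2vec and rank candidates by cosine
# similarity to the focal article (assumed inputs: articles = {id: token_list}).
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

tagged = [TaggedDocument(words=toks, tags=[aid]) for aid, toks in articles.items()]

model = Doc2Vec(vector_size=300, min_count=5, epochs=20, workers=4)
model.build_vocab(tagged)
model.train(tagged, total_examples=model.corpus_count, epochs=model.epochs)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

focal_vec = model.infer_vector(articles[focal_id])

# Rank all other articles by similarity to the focal article; the paper combines
# such a ranking with manual verification before fixing the 34 control articles.
similarities = {aid: cosine(focal_vec, model.infer_vector(toks))
                for aid, toks in articles.items() if aid != focal_id}
control_candidates = sorted(similarities, key=similarities.get, reverse=True)[:34]
```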


To avoid cross-contamination, we eliminated 2,148 users who visited both focal and control articles from our sample. As a result, the valid sample included 21,203 organic users.

Our data enabled us to examine both pre- and post-visit log files for each user account since the focal news article was published at 9:35 a.m. on January 17, 2018 (KST). An average user in both the treatment and control groups visited 153.2 pages and spent 3.1 hours per day on the platform over the course of four days (see Table 1). The two groups were comparable in terms of platform engagement, with no statistically significant differences in the number of logs, page views, votes, or amount of time spent during the sample period. However, there was a high degree of heterogeneity among users, which is indicated by large sample standard deviations. In addition, the user activities exhibited a clear pattern of temporal variation. Figure 7 depicts how an average user's page views per hour changed over time. Users were more active in the afternoon and evening than late at night and early in the morning. The first three days contained a higher number of user page views than the remaining days.

Variables

Salience of Bot-Assisted Fake Social Engagement

To falsely amplify the visibility of preferred comments, the manipulator promoted some comments by upvoting them and demoted others by downvoting them. Accordingly, we measured the salience of bot-assisted fake social engagement (FSE henceforth) using the visibility difference between manipulator-promoted and demoted comments at time t:

FSE_t = \sum_{\forall k \in K_{promote}} \frac{1}{ranking_{kt}} - \sum_{\forall k \in K_{demote}} \frac{1}{ranking_{kt}}    (1)

where ranking_kt denotes the ranking of comment k at time t, and K_promote and K_demote represent the set of manipulator-promoted comments and the set of manipulator-demoted comments, respectively. That is, we subtracted the sum of the inverse rankings of manipulator-demoted comments from the sum of the inverse rankings of manipulator-promoted comments. By using this metric, we not only weighed higher-ranking comments (i.e., comments with a higher rank are more likely to be viewed and hence more salient than comments with a lower rank), but we also accounted for the volume of comments at t.
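Equation (1) reduces to a few lines of code. The sketch below is illustrative only; the input structures (a rank dictionary per time point and two sets of comment IDs) are assumptions about how the data might be organized, not the authors' implementation.

```python
# Illustrative sketch of the FSE salience metric in Equation (1).
# ranking_at_t: {comment_id: rank at time t, with 1 = most visible position}
# promoted / demoted: sets of manipulator-promoted / -demoted comment ids
def fse_salience(ranking_at_t, promoted, demoted):
    promote_term = sum(1.0 / ranking_at_t[k] for k in promoted if k in ranking_at_t)
    demote_term = sum(1.0 / ranking_at_t[k] for k in demoted if k in ranking_at_t)
    return promote_term - demote_term
```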
Figure 8(a) shows the temporal variation of the salience of FSE. It fluctuated between -4 and 4 for the first five hours following the publication of the focal article and then steadied at a positive value of around 3.5 as the comment ranks stabilized. Figure 8(b) depicts the arrival time of organic users at the focal article and its comment section. Visitors to the focal article within the first five hours accounted for 25% of overall viewers of the focal article. If we extended the time window to the first ten hours, the percentage rose to 62%. The variation in users' arrival times at the focal article resulted in a variation in the composition of the comment section to which each user was exposed.

Public Attention

We operationalized public attention by using page views. That is, we measured the total amount of user i's attention to news in time t by the number of news pages that user i viewed during the corresponding window of one hour, which was denoted by PV_it. Further, we decomposed public attention to news by topical categories to investigate the shift in public attention caused by FSE (i.e., the first-level agenda-setting effect). In order to test the first-level agenda-setting effect (H1) given the political nature of the focal news article in our empirical context, we compared user attention to political and non-political news articles using the platform's preset news categories. The non-political news sections included sports (PV_it^sports), entertainment (PV_it^enter), and other news (PV_it^other).10

Political Attributes

To test the second-level agenda-setting effect (H2), we examined changes in page views within the political news section. Given that Druking promoted and demoted certain political messages, we operationalized public attention to manipulator-intended political attributes by calculating the net difference in page views between the articles whose content matched manipulator-promoted attributes and articles whose content matched manipulator-demoted attributes. Specifically, we constructed two variables: (1) proactive public attention and (2) passive public attention.

Proactive public attention (keyword search and search-induced page views): One way for a user to proactively find news articles is through the use of search keywords. Some search keywords may resonate with the manipulator's intention. According to court proceedings, Druking automated FSE operations by selecting target comments using a list of keywords that aligned with his goal and then setting the desired number of upvotes and downvotes for the selected comments. Thus, the keywords used by Druking should be indicative of the political attributes that he either promoted or demoted.

10 Note that the portal site bundles the rest of other topics into a single category called "(other) news."


Table 1. Platform Activity Statistics of Treatment and Control Groups

Variable | Treatment group (users who visited the focal article) | Control group (users who visited the control articles) | Total
Number of users | 17,335 | 3,868 | 21,203
Number of logs per day | 332.0 (332.3) | 341.2 (324.7) | 333.7 (330.9)
Hours spent on platform per day | 3.1 (2.2) | 3.3 (2.2) | 3.1 (2.2)
Number of page views per day | 151.0 (121.2) | 163.2 (122.9) | 153.2 (121.6)
Number of news page views per day | 18.4 (19.6) | 25.8 (28.5) | 19.7 (21.5)
Number of upvotes per day | 14.8 (71.4) | 14.0 (62.3) | 14.7 (69.8)
Number of downvotes per day | 4.9 (36.1) | 5.2 (34.7) | 4.9 (35.9)
Note: Sample standard deviations are in parentheses.

Figure 7. Page Views over Time

Figure 8. Salience of Bot-Assisted Fake Social Engagement and the Number of Viewers of the Focal Article


To infer Druking’s keywords, we first computed the term page views offers additional operationalization of the
frequency-inverse document frequency (TF-IDF) scores across manipulator-intended political attribute based on search-
all comments. Then, using the ground-truth labels for Druking’s induced page views.
bot accounts, we found 43 promoted keywords and 21 demoted
keywords with the largest TF-IDF score differences across Passive public attention (unsearched page views of articles
comments with and without fake social engagement from with similar headlines): Users do not always navigate news
Druking accounts. The majority of upvoted (or promoted) articles by conducting proactive keyword searches; rather, they
keywords contained anti-government sentiments, such as frequently choose what to read due to incidental exposure to the
innuendo, mockery, or insinuation about the government and headline of an article. Indeed, 54% of our sample did not perform
president at the time, whereas the majority of downvoted (or any keyword searches during the sample period. To ascertain the
demoted) keywords contained terms referring to opinion effect of FSE on those who “passively” consumed news articles,
manipulation operations or investigations into harmful/fake we used our doc2vec model to examine the unsearched page
comments. The identified upvoted (downvoted) keywords views of all articles with headlines that were semantically similar
appeared in 90.5% (96.7%) of promoted (demoted) comments to the aforementioned Druking’s keywords. Following that, page
but not in any unmanipulated comments. views were calculated based on the top-10% most similar articles.
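A rough sketch of this keyword-inference step is shown below. It uses scikit-learn's TfidfVectorizer as a stand-in; the variable names, the whitespace tokenization (real Korean text would need a proper tokenizer), and the cutoffs are assumptions, not the authors' procedure.

```python
# Illustrative sketch: rank terms by the gap in mean TF-IDF between comments that
# received fake engagement from labeled Druking accounts and all other comments.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# comments: list of comment texts; promoted_mask / demoted_mask: boolean lists
# flagging comments that received fake upvotes / fake downvotes.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(comments)
terms = np.array(vectorizer.get_feature_names_out())

def top_keywords_by_tfidf_gap(target_mask, k):
    """Terms whose mean TF-IDF is highest in targeted comments relative to the rest."""
    target_mask = np.asarray(target_mask, dtype=bool)
    gap = (np.asarray(X[target_mask].mean(axis=0)).ravel()
           - np.asarray(X[~target_mask].mean(axis=0)).ravel())
    return terms[np.argsort(gap)[::-1][:k]]

promoted_keywords = top_keywords_by_tfidf_gap(promoted_mask, 43)
demoted_keywords = top_keywords_by_tfidf_gap(demoted_mask, 21)
```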
Given that general users may not always use search keywords that are identical to manipulator-promoted/-demoted keywords, we included other keywords that were deemed highly associative with the manipulator's keyword list. To do this, we represented each keyword by its word embedding vector using the doc2vec model (explained in the Organic User Activities section). Then, we identified the top-100 most associative and semantically similar search keywords from over seven million distinct search queries by calculating their cosine similarity to the manipulator's keyword list. The final keyword list had 143 keywords associated with manipulator-promoted attributes and 121 keywords associated with manipulator-demoted attributes. Using the list, for each user and time period, we counted the number of search activities containing manipulator-promoted keywords (KS_it^promo) and the number of searches containing manipulator-demoted keywords (KS_it^demo). We next operationalized the manipulator-intended political attribute by calculating the net count difference between the aforementioned two types of search activities.
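The sketch below illustrates the two pieces of this step: expanding the seed keyword list by embedding similarity and computing the net count KS_it^promo minus KS_it^demo for a user-hour. The `embed` lookup, the use of the maximum similarity to the closest seed keyword, and the substring matching are illustrative assumptions rather than the authors' exact choices.

```python
# Illustrative sketch (assumed inputs): seed_keywords from the TF-IDF step,
# all_queries = distinct search strings, embed(text) -> embedding vector.
import numpy as np

def expand_keywords(seed_keywords, all_queries, embed, top_k=100):
    """Return the top_k queries most similar (by cosine) to any seed keyword."""
    seed_vecs = np.array([embed(k) for k in seed_keywords])
    seed_norms = np.linalg.norm(seed_vecs, axis=1)
    scores = {}
    for q in all_queries:
        v = embed(q)
        sims = seed_vecs @ v / (seed_norms * np.linalg.norm(v))
        scores[q] = float(np.max(sims))          # similarity to the closest seed keyword
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

def net_keyword_searches(user_searches, promo_list, demo_list):
    """Net count (KS_promo - KS_demo) for one user-hour of search strings."""
    ks_promo = sum(any(k in s for k in promo_list) for s in user_searches)
    ks_demo = sum(any(k in s for k in demo_list) for s in user_searches)
    return ks_promo - ks_demo
```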
Further, we examined a user's subsequent viewing of news articles following the keyword search. Because a keyword search returns a list of relevant news articles, a user may select one or more of those articles from the list to get more insights, which we refer to as search-induced page views. To measure search-induced page views, we first identified news articles with a headline that contained the search keywords of interest. Then, we counted the number of the identified news articles viewed, contingent on the user viewing them within an hour of the keyword search. Specifically, we used the embedding vectors obtained from our doc2vec model to compute the cosine similarity between search keywords and articles and counted the number of views of the top-n% most similar articles. Based on the similarity measures, we counted news page views driven by the manipulator-promoted search keywords (PV_it^promo-search) and by the manipulator-demoted search keywords (PV_it^demo-search). The difference between these two page views offers additional operationalization of the manipulator-intended political attribute based on search-induced page views.

Passive public attention (unsearched page views of articles with similar headlines): Users do not always navigate news articles by conducting proactive keyword searches; rather, they frequently choose what to read due to incidental exposure to the headline of an article. Indeed, 54% of our sample did not perform any keyword searches during the sample period. To ascertain the effect of FSE on those who "passively" consumed news articles, we used our doc2vec model to examine the unsearched page views of all articles with headlines that were semantically similar to the aforementioned Druking's keywords. Following that, page views were calculated based on the top-10% most similar articles. Finally, we operationalized passive attention to manipulator-intended political attributes by computing the difference in page views between the news articles with headlines that resonated with the manipulator's promoting keywords (PV_it^promo) and news articles with headlines that were consonant with the manipulator's demoting keywords (PV_it^demo).

Political sentiment (pro-government vs. anti-government): Additionally, we examined the second-level agenda-setting effect from the perspective of political position. Recall that Druking's goal was to undermine the then-ruling party by promoting anti-government comments while limiting pro-government comments. Therefore, we hypothesize that the salience of FSE would predict a relative increase in public attention to news with anti-government sentiment compared to news with pro-government sentiment.

A crucial step for the analysis of political sentiment was determining the political leanings of news articles that users viewed following their exposure to the focal news page. To identify the political orientation, we adopted an advanced semi-supervised machine learning (ML) approach called label propagation (LP) (see Appendix A for details of our LP model). Semi-supervised learning is best suited for scenarios in which only a small number of labeled samples are available, whereas most of the data are unlabeled (Zhou et al., 2003; Fujiwara & Irie, 2014). The LP model has been shown to achieve considerably more accurate performances for various applications by combining both labeled and unlabeled samples together during training compared to supervised ML models that utilize only labeled samples (e.g., Tarvainen & Valpola, 2017; Iscen et al., 2019). Notably, semi-supervised learning is becoming increasingly popular due to the high cost of expert data labeling along with the increasing need for large-scale training data. In the IS literature, while both supervised and unsupervised ML models have been extensively studied and employed, the investigation of semi-supervised learning has been extremely limited, with the exception of the work by Abbasi et al. (2012) on financial fraud detection.
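As a generic illustration of label propagation (the study's own LP model is specified in its Appendix A, which is not reproduced here), a semi-supervised labeling step could be sketched with scikit-learn as below. The estimator, feature matrix, label coding, and hyperparameters are assumptions for illustration only, not the authors' configuration.

```python
# Illustrative sketch: propagate a small set of political-orientation labels to
# unlabeled articles over a similarity graph built from article embeddings.
import numpy as np
from sklearn.semi_supervised import LabelPropagation

X = np.asarray(article_vecs)      # e.g., doc2vec embeddings, one row per article
y = np.asarray(labels)            # 1 = pro-government, 0 = anti-government, -1 = unlabeled

lp = LabelPropagation(kernel="knn", n_neighbors=10, max_iter=1000)
lp.fit(X, y)                      # uses both labeled and unlabeled rows during training
predicted_orientation = lp.transduction_   # inferred label for every article
```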


In our setting, we used the well-known political bias of partisan Korean news media (Lim et al., 2019) as the initial labels of articles, resulting in less than 18% of articles being labeled as either pro- or anti-government. A similar approach was used in David et al. (2016) to predict the political orientation of Facebook users based on posts from the pages of political parties. We note that our LP model achieved the best accuracy (F1-score of 0.913), compared to other representative ML models (see Appendix A). Finally, using the political orientations of articles identified by our LP model, we operationalized public attention to manipulator-intended political sentiment by the difference in the page views between the news articles with pro-government sentiment (PV_it^{progov}) and the news articles with anti-government sentiment (PV_it^{antigov}).

Spillover Effects of Bot-Assisted Fake Social Engagement on Public Attention

RQ: A Spillover Effect of Bot-Assisted FSE on Public Attention to News

Bot-assisted FSE operations target a comment section within a news article page. Hence, the effect of FSE cannot be accurately estimated unless the model accounts for the variance due to the exposure to the news article's content. Furthermore, not all users who visit the attacked article's comment space would be exposed to the same level of manipulation: Depending on when a user visits the article, the salience of FSE is different, as is its effect on the exposed user. Accordingly, we estimated the following two-way fixed effects regression model that controlled for the content effect and the exposure effect, along with time and individual fixed effects:

PV_it = β0 Post_it + β1 Post_it Focal_i + β2 Post_it Focal_i FSE_i + u_i + v_t + ε_it,  (2)

where Post_it is an indicator variable that indicates whether time t occurred after the exposure to the news content or before (1: after, 0: before) in either the treatment or the control group. It is unique to each user since the user visits the focal news article or control news articles at various time periods. Focal_i is an indicator variable that indicates whether user i is in the treatment group (i.e., who visited the focal news article with FSE) or the control group, which was not affected by the manipulator (1: treatment group, 0: control group); u_i is a fixed effect for user i; v_t captures a fixed effect for time t; and ε_it represents an idiosyncratic error term that follows a standard normal distribution. Note that the salience of FSE has subscript i instead of subscript t. FSE_i is the salience of fake social engagements that user i was exposed to at the time of her visit to the focal news article (i.e., FSE_i ≡ FSE_{t = i's arrival time}).

The beta parameters, namely, β0, β1, and β2, measured the change in page views following a visit to the focal news article or a control news article. That is, β0 captured the change in page views induced by the news content, β1 represented the baseline effect of the exposure to the manipulated comment section, and β2 denoted the moderating effect of the salience of FSE.

One challenge in estimating the effect of FSE on public attention is that comment visibility (i.e., comment rankings) is a function of both FSE and organic user engagement. That is, the FSE variable computed by Equation (1) would be affected not only by manipulated votes but also by organic votes cast by general users. Econometrically, this gives rise to an issue of endogeneity due to the correlation between the FSE variable and the idiosyncratic error term in Equation (2). The virality of the focal news, for example, might affect both the number of organic votes and public attention to news simultaneously. Another source of correlation might be reverse causation in that organic users' news consumption might affect the FSE variable by increasing the number of organic votes. Thus, we use two-stage least squares (2SLS) estimation (Greene, 2017; Angrist & Pischke, 2008), using the following first-stage equation, to attribute FSE only to the effect of the salience of manipulation operations across comments:

FSE_i = δ0 + δ1 UV_i + δ2 DV_i + ξ_i,  (3)

where UV_i is the number of the manipulator's upvotes for the comments to which user i was exposed, DV_i is the number of the manipulator's downvotes for the comments to which user i was exposed, and ξ_i is a random error term.

Manipulative votes cast by a bot-assisted manipulator serve as valid instrumental variables for identifying the effect of FSE on organic users' attention and news consumption. First, they are clearly correlated with the FSE variable measured by Equation (1), due to their direct influence on the ranking of comments according to the platform's comment-ranking algorithm, satisfying the inclusion restriction. Second, they are independent of the error term in the organic users' news consumption model (i.e., Equation 2). Because the manipulative votes were generated by bots that cast a vast number of upvotes and downvotes for targeted comments that match Druking's keyword list, public users' attention to news has no bearing on the generation of manipulative votes. In addition, public users are unable to detect or distinguish the presence of manipulative votes from organic votes, further confirming the independence between manipulative votes and the idiosyncratic error term for users' attention to news, satisfying the exclusion restriction.
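To make the two-stage procedure concrete, the following is a minimal sketch of Equations (2) and (3) on a user-hour panel. The column names (pv, post, focal, fse, uv, dv, user_id, hour) are illustrative assumptions rather than the authors' actual schema, and the code is a simplified stand-in for the estimation pipeline.

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical user-hour panel: one row per user i and hour t, with assumed
# columns pv (page views), post, focal, fse, uv, dv, user_id, and hour.
panel = pd.read_csv("user_hour_panel.csv")

# First stage (Equation 3): regress the FSE salience on the manipulator's
# bot-generated upvotes and downvotes, which serve as instruments.
first_stage = smf.ols("fse ~ uv + dv", data=panel).fit()
panel["fse_hat"] = first_stage.fittedvalues

# Second stage (Equation 2): two-way fixed effects regression of page views on
# the post indicator, its interaction with treatment status, and the
# instrumented FSE salience; C(user_id) and C(hour) absorb the fixed effects.
second_stage = smf.ols(
    "pv ~ post + post:focal + post:focal:fse_hat + C(user_id) + C(hour)",
    data=panel,
).fit()

print(first_stage.params[["uv", "dv"]])
print(second_stage.params.filter(like="post"))

# Caveat: plugging fse_hat in by hand reproduces the 2SLS point estimates but
# not the corrected standard errors; a dedicated IV routine would be used for
# inference in practice.

In practice, the user and time dummies would typically be absorbed (within-transformed) rather than estimated explicitly, given the scale of a clickstream panel.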


Table 2. Impact of FSE on News Consumption

                           Parameter   Variable                             Estimate      SE
First-stage equation       δ0          Intercept                             0.219***     (0.015)
                           δ1          UV_i                                  0.646***     (0.006)
                           δ2          DV_i                                 -0.482***     (0.005)
Second-stage equation      β0          Post_it                               0.059***     (0.009)
                           β1          Post_it × Focal_i                     0.162***     (0.008)
                           β2          Post_it × Focal_i × \hat{FSE}_i       0.332***     (0.010)
Note: *** p < 0.01, ** p < 0.05, * p < 0.1. Estimates of user and time fixed effects are omitted for brevity.

Table 3. Time-varying Pattern of the FSE Effect

Parameter            Variable                                           Estimate      SE
β0                   Post_it                                             0.050***     (0.009)
β1                   Post_it × Focal_i                                   0.158***     (0.008)
β2,base              Post_it × Focal_i × \hat{FSE}_i                     0.195***     (0.010)
β3,short-term        Post_it × Focal_i × \hat{FSE}_i × ST_it             1.144***     (0.024)
Note: *** p < 0.01, ** p < 0.05, * p < 0.1. Estimates of user and time fixed effects are omitted for brevity.

Table 2 shows the results of the 2SLS estimation. The first-stage estimation results reveal that, as expected, the bot-generated upvotes increased the relative salience of FSE while its downvotes decreased it. Furthermore, both the R-squared and F-statistic values of the regression were large (i.e., R² = 0.390, F = 557), alleviating the concern of weak instruments (Bound et al., 1995). Notably, the second-stage regression results show that the FSE effect was statistically significant and positive. That is, as the salience of FSE increased by one unit, users increased their subsequent news consumption on the platform by 0.332 pages per hour.

The effect of FSE may change over time. To distinguish its short-term and long-term effects, we introduced an interaction term between the main effect and a short-term dummy (ST_it) that represented the first three hours after leaving the focal news page.11 Table 3 reveals that the short-term effect was greater than the long-term effect. For the first three hours, a one-unit increase in the salience of FSE increased a user's subsequent hourly news consumption by 1.339 (= 0.195 + 1.144) page views. After the first three hours, its effect was still positive yet reduced to 0.195 page views per hour. Since our data spanned up to 39 hours from the publication of the focal news article, we were unable to empirically measure the effect's longevity after 39 hours.

In addition, the effect of FSE might vary by demographic group. We investigated the effect's user heterogeneity by incorporating interaction terms with a user's gender and age, respectively. Table 4 shows that the magnitude of the FSE effect was smaller for female users than for male users, and its magnitude was larger for younger users (under the age of thirty) than for those in their sixties or older.

H1: First-Level Agenda Setting (Effect of Bot-Assisted FSE on Public Attention to Political News Over Non-Political News)

According to H1, the salience of FSE should draw more public attention to political news than non-political news. We tested this hypothesis by examining the effect of FSE on the difference in page views between political and non-political news articles (i.e., sports, entertainment, and other miscellaneous news):

PV_it^{poli} - PV_it^{nonpoli} = β0 Post_it + β1 Post_it Focal_i + β2 Post_it Focal_i FSE_i + u_i + v_t + ε_it  (4)

Overall, the results support H1, showing the positive and significant effect of the salience of FSE on the net difference in page views between the politics section and non-politics sections: β2 = 0.048, p < 0.01 for the comparison with sports, β2 = 0.103, p < 0.01 for the comparison with entertainment, and β2 = 0.027, p < 0.01 for the comparison with other news (see Table 5). The results suggest that the rate of increase in news consumption induced by FSE was greater in the political news domain than in non-political news topics.

11 The choice of a three-hour window for the short-term period was made empirically by experimenting with different time windows. Although the magnitude of the impact changes according to short-term durations, we observed a consistent pattern in which the impact of the manipulator is temporarily strong but significantly weakens in the long run.
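For concreteness, the following is a minimal sketch of how the short-term dummy ST_it and the difference-in-page-views dependent variable of Equation (4) can be assembled from a clickstream log; the column names and the hourly aggregation are illustrative assumptions, not the authors' exact preprocessing.

import pandas as pd

# Hypothetical clickstream log: one row per article view, with assumed columns
# user_id, view_time, section ("politics", "sports", ...), and focal_exit_time
# (when the user left the focal, manipulated article).
log = pd.read_csv("clickstream.csv", parse_dates=["view_time", "focal_exit_time"])
log["hour"] = log["view_time"].dt.floor("h")

# Hourly page views per user and section, then the H1 dependent variable:
# political minus non-political page views (Equation 4).
pv = (log.groupby(["user_id", "hour", "section"]).size()
         .unstack("section", fill_value=0))
pv["dv_h1"] = pv.get("politics", 0) - pv.drop(columns="politics", errors="ignore").sum(axis=1)

# Short-term dummy ST_it: 1 during the first three hours after leaving the
# focal news page, 0 afterwards.
exits = log.groupby("user_id")["focal_exit_time"].first().reset_index()
pv = pv.reset_index().merge(exits, on="user_id")
elapsed = (pv["hour"] - pv["focal_exit_time"]).dt.total_seconds() / 3600
pv["st"] = ((elapsed >= 0) & (elapsed < 3)).astype(int)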


Table 4. Heterogeneity of the FSE Effect

Parameter        Variable                                                   Estimate      SE
β0               Post_it                                                     0.059***     (0.009)
β1               Post_it × Focal_i                                           0.162***     (0.008)
β2,base          Post_it × Focal_i × \hat{FSE}_i                             0.302***     (0.044)
β2,female        Post_it × Focal_i × \hat{FSE}_i × D_female                 -0.043**      (0.021)
β2,age0119       Post_it × Focal_i × \hat{FSE}_i × D_age0119                 0.237***     (0.068)
β2,age2029       Post_it × Focal_i × \hat{FSE}_i × D_age2029                 0.143***     (0.047)
β2,age3039       Post_it × Focal_i × \hat{FSE}_i × D_age3039                 0.001        (0.046)
β2,age4049       Post_it × Focal_i × \hat{FSE}_i × D_age4049                -0.033        (0.047)
β2,age5059       Post_it × Focal_i × \hat{FSE}_i × D_age5059                 0.033        (0.051)
Note: *** p < 0.01, ** p < 0.05, * p < 0.1. Estimates of user and time fixed effects are omitted for brevity.

Table 5. Impact of FSE on Public Attention to Political News over Non-political News

Dependent variables: politics vs. sports (PV_it^{poli} - PV_it^{sports}), politics vs. entertainment (PV_it^{poli} - PV_it^{enter}), and politics vs. other news (PV_it^{poli} - PV_it^{other}).

Parameter   Independent variable                     Politics vs. sports    Politics vs. entertainment    Politics vs. other news
β0          Post_it                                   0.031*** (0.008)       0.042*** (0.007)              0.088*** (0.007)
β1          Post_it × Focal_i                         0.059*** (0.006)       0.059*** (0.006)             -0.026*** (0.006)
β2          Post_it × Focal_i × \hat{FSE}_i           0.048*** (0.008)       0.103*** (0.008)              0.027*** (0.007)
Note: *** p < 0.01, ** p < 0.05, * p < 0.1. Standard errors are in parentheses. Estimates of user and time fixed effects are omitted for brevity.

H2: Second-Level Agenda Setting (Effect of Bot-Assisted FSE on Public Attention to Manipulator-Promoted Compared to Manipulator-Demoted Political Attributes)

According to H2, the salience of FSE should direct more public attention to manipulator-promoted political attributes than manipulator-demoted political attributes. H2 was tested in three ways: by examining (a) proactive public attention, operationalized by keyword searches and search-induced page views; (b) passive public attention, operationalized by page views of articles whose headlines were semantically similar to the manipulator's keywords (no search involved); and (c) political sentiment, operationalized by page views of anti- vs. pro-government news.

First, we regressed the difference in keyword search counts between searches containing manipulator-promoted keywords and manipulator-demoted keywords on the same set of independent variables as in Equation (2). Table 6 shows that the salience of FSE increased the above-mentioned difference in keyword searches (β2 = 0.040, p < 0.01). Additionally, because a keyword search results in a list of pertinent news articles, we examined the influence of FSE on search-induced page views. We estimated the same fixed effects regression model with the difference in search-induced page views as a new dependent variable. The results show that the salience of FSE increased the difference in search-induced page views between articles associated with manipulator-promoted search keywords and articles associated with manipulator-demoted search keywords (β2 = 0.027, p < 0.01).

Second, we conducted the same fixed effects regression analysis using a different dependent variable: the net difference in page views for articles with and without headlines associated with the manipulator's FSE keywords. The results are consistent with the results for the search-induced page views. That is, the salience of FSE increased the difference in page views between articles with headlines similar to the manipulator-promoted keywords and articles with headlines similar to the manipulator-demoted keywords (β2 = 0.021, p < 0.01).

Lastly, we tested the second-level agenda-setting effect in terms of political sentiment. The results indicate that the salience of FSE increased the difference in page views between articles with anti-government sentiment and articles with pro-government sentiment (β2 = 0.007, p < 0.01), which is well-aligned with the manipulator's intention.

To summarize, all results in Table 6 demonstrate that the salience of FSE directed greater public attention to political attributes consistent with the manipulator's goal, thus supporting H2.
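The search-induced and related page-view measures above both rest on a semantic-matching step: doc2vec embeddings of the manipulator's keywords and of article headlines are compared by cosine similarity, and only views of the top-n% most similar articles are counted. A minimal sketch follows, using gensim's Doc2Vec as a stand-in for the embedding model; the model file, tokens, and the 10% cutoff are illustrative assumptions.

import numpy as np
from gensim.models.doc2vec import Doc2Vec

model = Doc2Vec.load("doc2vec_news.model")  # hypothetical pre-trained model

def embed(tokens):
    # Infer an embedding vector for a tokenized keyword or headline.
    return model.infer_vector(tokens)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_similar_articles(keyword_tokens, tokenized_headlines, top_pct=0.10):
    # Return indices of the top-n% headlines most similar to the keyword.
    kv = embed(keyword_tokens)
    sims = np.array([cosine(kv, embed(h)) for h in tokenized_headlines])
    k = max(1, int(len(tokenized_headlines) * top_pct))
    return np.argsort(sims)[::-1][:k]

# A page view counts toward PV^{promo-search} only when the viewed article is
# among the top-n% most similar to a manipulator-promoted keyword (and, for
# search-induced views, occurred within an hour of that keyword search).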


Table 6. Impact of FSE on Public Attention to Political Attributes over Non-political Attributes

Dependent variables: proactive public attention via search keywords (KS_it^{promo} - KS_it^{demo}) and search-induced page views (PV_it^{promo-search} - PV_it^{demo-search}); passive public attention via related page views (PV_it^{promo} - PV_it^{demo}); and political sentiment (PV_it^{antigov} - PV_it^{progov}).

Parameter   Independent variable                     Search keywords      Search-induced page views    Related page views    Political sentiment
β0          Post_it                                   0.058*** (0.010)     0.040*** (0.004)             0.026*** (0.003)      0.011*** (0.002)
β1          Post_it × Focal_i                        -0.041*** (0.008)    -0.025*** (0.003)             0.022*** (0.002)     -0.004** (0.002)
β2          Post_it × Focal_i × \hat{FSE}_i           0.040*** (0.010)     0.027*** (0.004)             0.021*** (0.003)      0.007*** (0.003)
Note: *** p < 0.01, ** p < 0.05, * p < 0.1. Standard errors are in parentheses. Estimates of user and time fixed effects are omitted for brevity.

Robustness Checks

To assess the robustness of the empirical findings, we performed a series of robustness checks by operationalizing the model components differently. We first investigated the sensitivity of our results to various ways of measuring FSE (i.e., independent variables) and public attention (i.e., dependent variable). Then, we examined the robustness of the second-level agenda-setting test results by exploring different parameter choices for the machine learning procedure. Lastly, we conducted a Granger causality test by developing a multisite entry, relative time model (Angrist & Pischke, 2008; Autor, 2003).

First, we developed three alternative measures of the salience of FSE. Note that the original FSE was computed by the difference in the sum of the inverse rankings between manipulator-promoted and manipulator-demoted comments (see Equation 1). The first alternative metric was based on averages rather than summation. Since the total number of default comments displayed using a mobile device screen setting was five, while it was 10 for PC users, the average of inverse rankings was not perfectly correlated with the original variable that employed the summation. The second metric summed rankings as they were, rather than taking an inverse and eliminating weights given in the reverse order of their positions. We expected the sign of the effect to be negative since lower values mean a greater salience of manipulator-promoted comments over manipulator-demoted comments. The third metric considered only the top-ranked comment, assigning it a value of 1 if the manipulator supported it, 0 if it was neutral, and -1 if the manipulator opposed it. When each of the three alternative measures of the FSE variable was applied, the results were consistent with the original results, as shown in Tables 7 and 8.

Second, we examined the sensitivity of our empirical findings with respect to the choice of the dependent variable. While the number of page views was used to measure public attention in the main analysis, the time users spent viewing news articles can be used as an alternative proxy for users' attention. When we conducted our analyses with this alternative dependent variable, the results remained consistent. We also tested if our results were driven by a small number of outliers by removing data points whose page views were over the 99th percentile of the page view distribution. The main results held regardless of the removal of outliers.

Third, we conducted sensitivity analyses with respect to the machine learning models we employed in the process of testing the second-level agenda-setting effect. Because the identification of search keywords associated with the manipulator's intention relied on a parameter of our choice, which determined the number of most aligned keywords, we tried different values (i.e., top 50, top 200) and confirmed that the second-level agenda-setting effects held regardless of the choice of the parameter. Similarly, we also needed to choose a cutoff value for identifying news articles that were in harmony with the manipulator's keywords based on their similarity. We confirmed the robustness of the main findings by exploring different cutoff values (i.e., top 5% and top 20%). In addition, the degree of alignment between the manipulator's intention and news articles can be computed in various ways. While the similarity was measured based on the manipulator's keywords and news titles as in the main analysis, we could use the manipulator-promoted (or demoted) comments instead of the manipulator's keywords or use news content instead of news titles. By employing different combinations in computing similarity, we found consistent support for the second-level agenda-setting effect.
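The original and the three alternative salience metrics can be written compactly. The sketch below assumes each comment a user was exposed to carries a display rank and a manipulator stance label (promoted, demoted, or neutral), and that "inverse ranking" means 1/rank; both are illustrative assumptions about fields the text does not spell out.

def fse_original(comments):
    # Equation (1): difference in summed inverse ranks, promoted minus demoted.
    return (sum(1.0 / c["rank"] for c in comments if c["stance"] == "promoted")
            - sum(1.0 / c["rank"] for c in comments if c["stance"] == "demoted"))

def fse_avg_inverse_rank(comments):
    # Alternative 1: average, rather than sum, of inverse ranks.
    promo = [1.0 / c["rank"] for c in comments if c["stance"] == "promoted"]
    demo = [1.0 / c["rank"] for c in comments if c["stance"] == "demoted"]
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(promo) - mean(demo)

def fse_sum_rank(comments):
    # Alternative 2: raw ranks summed as they are (lower values indicate more
    # salient promotion, hence the expected negative sign in Tables 7 and 8).
    return (sum(c["rank"] for c in comments if c["stance"] == "promoted")
            - sum(c["rank"] for c in comments if c["stance"] == "demoted"))

def fse_top_comment(comments):
    # Alternative 3: +1 / 0 / -1 according to the stance of the top-ranked comment.
    top = min(comments, key=lambda c: c["rank"])
    return {"promoted": 1, "demoted": -1}.get(top["stance"], 0)

# Example exposure: promoted comments at ranks 1 and 4, a demoted one at rank 2.
exposure = [{"rank": 1, "stance": "promoted"},
            {"rank": 2, "stance": "demoted"},
            {"rank": 4, "stance": "promoted"}]
print(fse_original(exposure), fse_top_comment(exposure))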


Table 7. Robustness Checks for Baseline and First-Level Agenda-Setting Hypothesis

                                    Baseline              Politics vs. sports    Politics vs. entertainment    Politics vs. other news
Original                            0.332*** (0.010)      0.048*** (0.008)       0.103*** (0.008)              0.027*** (0.007)
IV: Average of inverse rankings     1.758*** (0.033)      0.431*** (0.027)       0.694*** (0.026)              0.269*** (0.025)
IV: Sum of rankings                -0.007*** (0.001)     -0.006*** (0.001)      -0.006*** (0.001)             -0.004*** (0.001)
IV: Indicator for top ranking       0.435*** (0.008)      0.104*** (0.007)       0.170*** (0.006)              0.065*** (0.006)
DV: Time spent                      0.631*** (0.020)      0.095*** (0.017)       0.222*** (0.015)              0.032* (0.018)
DV: Without outliers                0.243*** (0.006)      0.048*** (0.005)       0.090*** (0.004)              0.005 (0.005)
Lead2                              -0.020 (0.012)        -0.017 (0.013)          0.005 (0.010)                 0.067*** (0.009)
Lag0                                0.473*** (0.015)      0.166*** (0.013)       0.198*** (0.012)              0.121*** (0.012)
Lag1                                0.251*** (0.017)      0.021 (0.015)          0.073*** (0.014)              0.047*** (0.013)
Lag2                                0.074*** (0.021)      0.023 (0.019)          0.031* (0.017)                0.048*** (0.016)
Note: *** p < 0.01, ** p < 0.05, * p < 0.1. Reported are the estimates of the parameter of interest (β2) and their standard errors in parentheses. Estimates of β0, β1, user and time fixed effects are omitted for brevity. Following Autor (2003), Lead2 is a relative time dummy that indicates the time span from twelve to twenty-four hours prior to the exposure to the focal article's comment section, and Lag0, Lag1, and Lag2 are relative time dummies for 0-12, 12-24, and 24-36 hours after manipulation exposure, respectively.

Table 8. Robustness Checks for Second-Level Agenda-Setting Hypothesis

                                    Search keywords       Search-induced page views    Related page views     Political sentiment
Original                            0.040*** (0.010)      0.027*** (0.004)             0.021*** (0.003)       0.007*** (0.003)
IV: Average of inverse rankings     0.236*** (0.034)      0.173*** (0.012)             0.156*** (0.010)       0.022** (0.009)
IV: Sum of rankings                -0.001 (0.001)        -0.001*** (0.000)            -0.002*** (0.000)       0.000 (0.000)
IV: Indicator for top ranking       0.058*** (0.009)      0.042*** (0.003)             0.038*** (0.002)       0.006*** (0.002)
DV: Time spent                      NA                    0.027*** (0.006)             0.037*** (0.009)       0.020 (0.014)
DV: Without outliers                0.018*** (0.002)      0.009*** (0.001)             0.018*** (0.002)       0.005*** (0.002)
Search keyword level: Top 50        0.038*** (0.011)      0.026*** (0.003)             NA                     NA
Search keyword level: Top 200       0.046*** (0.010)      0.027*** (0.004)             NA                     NA
Similarity level: Top 5%            NA                    0.021*** (0.003)             0.026*** (0.002)       0.004* (0.002)
Similarity level: Top 20%           NA                    0.043*** (0.005)             0.019*** (0.004)       0.013*** (0.004)
Article title × Abuser comment      NA                    0.030*** (0.003)             0.065*** (0.003)       NA
Article text × Abuser keyword       NA                    0.021*** (0.002)             0.023*** (0.002)       NA
Article text × Abuser comment       NA                    0.016*** (0.002)             0.009*** (0.002)       NA
Lead2                               0.001 (0.006)         0.001 (0.002)               -0.006 (0.004)           0.005 (0.004)
Lag0                                0.027*** (0.007)      0.023*** (0.003)             0.071*** (0.004)        0.012*** (0.004)
Lag1                                0.015* (0.008)        0.006** (0.003)              0.030*** (0.005)        0.011** (0.005)
Lag2                                0.004 (0.010)         0.002 (0.004)                0.026*** (0.006)        0.021*** (0.006)
Note: *** p < 0.01, ** p < 0.05, * p < 0.1. Reported are the estimates of the parameter of interest (β2) and their standard errors in parentheses. Estimates of β0, β1, user and time fixed effects are omitted for brevity. Following Autor (2003), Lead2 is a relative time dummy that indicates the time span from twelve to twenty-four hours prior to the exposure to the focal article's comment section, and Lag0, Lag1, and Lag2 are relative time dummies for 0-12, 12-24, and 24-36 hours after manipulation exposure, respectively. NA denotes "not applicable."

Lastly, we developed a multisite entry, relative time model (Angrist & Pischke, 2008; Autor, 2003) to conduct a Granger causality test. If the salience of FSE is a cause, a change in users' news consumption would be predicted by past exposure to FSE (i.e., lag) but not by future exposure to FSE (i.e., lead). The following is the lead-lag regression equation:

PV_it = β_{2,-2} Lag2_it Focal_i FSE_i + β_{2,-1} Lag1_it Focal_i FSE_i + β_{2,0} Lag0_it Focal_i FSE_i + β_{2,+2} Lead2_it Focal_i FSE_i + u_i + v_t + ε_it,  (5)

where Lead2_it is a relative time dummy that indicates the time span from twelve to twenty-four hours prior to the exposure to the focal article's comment section, and Lag0_it, Lag1_it, and Lag2_it are relative time dummies for 0-12, 12-24, and 24-36 hours after manipulation exposure, respectively. Note that the dummy indicating zero to twelve hours before the arrival at the focal article (i.e., Lead1_it) is omitted as the base group.
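The relative-time dummies in Equation (5) can be generated and interacted as in the sketch below; it again assumes a user-hour panel with illustrative column names, where hours_since_exposure is negative before the user reaches the focal article's comment section.

import pandas as pd
import statsmodels.formula.api as smf

# Assumed columns: pv, focal, fse_hat, hours_since_exposure, user_id, hour.
panel = pd.read_csv("user_hour_panel.csv")

# Relative-time dummies: Lead2 = 12-24 hours before exposure; Lag0, Lag1, Lag2 =
# 0-12, 12-24, and 24-36 hours after exposure; Lead1 (0-12 hours before) is the
# omitted base group.
windows = {"lead2": (-24, -12), "lag0": (0, 12), "lag1": (12, 24), "lag2": (24, 36)}
h = panel["hours_since_exposure"]
for name, (lo, hi) in windows.items():
    panel[name] = ((h >= lo) & (h < hi)).astype(int)

# Lead-lag (Granger-style) regression: the FSE interaction should be significant
# only for the lag dummies, not for the lead dummy.
leadlag = smf.ols(
    "pv ~ lag2:focal:fse_hat + lag1:focal:fse_hat + lag0:focal:fse_hat"
    " + lead2:focal:fse_hat + C(user_id) + C(hour)",
    data=panel,
).fit()
print(leadlag.params.filter(like="fse_hat"))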


Tables 7 and 8 show the null effect of the lead variable. Only after the exposure to the manipulation does the FSE effect become statistically significant, lending support to the causal effect of FSE on users' news consumption. However, the positive effect does not last long, rapidly diminishing within a day.

In sum, our empirical results hold consistently across various conditions, as summarized in Tables 7 and 8. The robustness check analyses show that our empirical findings were neither driven by nor dependent on a particular choice of dependent variable, independent variable, or parameters for the machine learning procedure.

General Discussion

This study examines the spillover effect of bot-assisted fake social engagement (FSE), a widespread false amplification practice in the global disinformation industry, on manufacturing public attention in a large information ecosystem. Based on the algorithmic conduit brokerage perspective (Salge et al., 2022) and the agenda-setting framework (McCombs & Valenzuela, 2020), we pose a research question of whether FSE produces a spillover effect on public attention to information beyond the immediately manipulated context (RQ) and hypothesize that the salience of FSE shifts public attention in line with the manipulator's intention (H1 and H2). This study advances disinformation research by integrating bot- and user-centered approaches to demonstrate that bots' capacity for the rapid scaling of social engagement elicits a false bandwagon of public attention. We integrate the two approaches by empirically examining the spillover of bot operation effects into a broader information environment. Methodologically, this study leverages a unique large-scale user-behavioral data source and the ground truth of disinformation bot activities, coupled with advanced semi-supervised ML techniques.

Considering that disinformation campaigns have increasingly incorporated automation software, understanding the mechanism of bot-assisted FSE and its effect on the general public's attention may offer theoretical and managerial insights into disinformation's harms on digital information commons. This section discusses the study's theoretical implications, methodological contributions, and managerial implications for scholars, practitioners, and policymakers.

Theoretical Contributions

Theoretically, bot-assisted FSE manifests functions of algorithmic conduit brokerage, particularly in terms of bots' ability for social alerting and rapid scaling (Salge et al., 2022). In addition to an algorithmic conduit brokerage perspective, we use agenda-setting theory to explain a mechanism of how bot-assisted FSE helps the human manipulator (i.e., the programmer behind bots) game (deceptively) the process of public agenda setting on digital platforms. By deploying bots, the manipulator can rapidly amplify social engagement volume at scale, which in turn results in the rearrangement of information positions and eventually elicits a bandwagon of public attention in the manipulator's favor. Further, this study contends that the influence of bot-assisted FSE does not just stay in the immediately manipulated space but leaks into a larger information consumption ecosystem. By adopting agenda-setting theory, this study elaborates a mechanism by which a manipulator plays the role of a public agenda setter by falsely amplifying the salience of selective messages. Importantly, deceptive agenda setting does not necessitate creating one's own fake messages. Manipulators can manufacture public attention by rapidly scaling the visibility of existing genuine content in their favor. Despite its prevalence and significance due to cost-effectiveness, bot-assisted FSE has been largely overlooked in the literature due to the difficulty of obtaining compatible empirical data sources, which should ideally disambiguate inauthentic engagement from organic engagement. In this sense, this study's focus on bot-assisted FSE uniquely advances disinformation research.

In addition to disinformation research, this study contributes to advancing agenda-setting theory by theorizing a deceptive agenda-setting mechanism and developing computational processes to empirically demonstrate it. In particular, our semi-supervised ML modeling approach to detecting and including associative textual cues as compositions of the issue attributes echoes the tenet of the network agenda-setting model, an advanced branch of agenda-setting theory that contends that the audience remembers news not only as single issues/attributes but also as a bundle of mental associations (Vu et al., 2014; Guo & Vargo, 2015). To our knowledge, this study is the first attempt to incorporate advanced machine learning techniques to infer associative concepts that represent issue attributes.
The study's findings suggest both first- and second-level agenda-setting effects of bot-assisted FSE on public attention. On the first level, we examined news domain-specific page views by comparing page views for the politics news section to those for non-politics sections. The findings revealed that bot-assisted FSE operations have a first-level agenda-setting effect on how the public allocates its attention: exposed users directed greater attention to political news than to non-political news such as sports and entertainment. On the second level, we compared political attribute-specific news page views between articles that contained manipulator-promoted political attributes and those that contained manipulator-demoted attributes. The findings confirm the second-level agenda-setting effect, as the FSE effect was greater for page views with manipulator-promoted attributes than for those with manipulator-demoted attributes. The results were consistent for proactive public attention (keyword searches and search-induced page views), passive public attention (other page views that occurred without search), and political sentiment-driven public attention. Altogether, the empirical findings attest to the spillover influence of bot-assisted FSE on the general public's broader information consumption beyond the immediate context targeted by a manipulator. Our findings of disinformation effects on general users' information behaviors add new insights to existing knowledge that has thus far centered around subpopulation groups of ideologically like-minded and/or heavy platform users, based on a somewhat narrow definition of the sphere of disinformation influence within the immediate interaction context.

Methodological Contributions

Disinformation research has employed ML techniques to tackle detection problems. The current study advances this line of research by demonstrating the utility of semi-supervised ML approaches to explore the effects of disinformation on the general public at scale. In particular, given the sheer scale of our data, we note that it is infeasible to manually code all articles, especially as this requires expert domain knowledge and familiarity with the political background. Prior work has primarily utilized supervised ML with carefully engineered features (e.g., Horne et al., 2018; Potthast et al., 2018; Gangula et al., 2019). In practice, the two major drawbacks of such models are that they require (1) vast quantities of curated labeled training data and (2) features unique to the context or characteristics (e.g., lexicon, style) of the focal language, which is usually English. The latter drawback makes it especially difficult to extend these models to other languages (e.g., Korean).

In this paper, we not only used the doc2vec model, an advanced ML technique that has gained popularity in the IS literature, but also demonstrated the utility of the label propagation model, a semi-supervised learning approach, by combining it with representation learning of text embeddings to effectively resolve the aforementioned two issues. Semi-supervised models are not completely new to online data-driven research. For example, studies have successfully used semi-supervised models to classify the political orientations of Twitter users using the retweet network (Badawy et al., 2018; Luceri et al., 2019). That being said, the application case in this study is distinct from previous studies in that we inferred political attributes of news articles using natural language processing.

Following the paradigm shift from manually engineering features to learning representations, we created data-driven text embeddings using the doc2vec model, which facilitated our investigation into the second-level agenda-setting effect. Since text embedding models are not dependent on specific context or language characteristics (Grave et al., 2018), our proposed approach is generalizable to a wide range of languages. Our main approach, semi-supervised learning, is ideally suited for situations with a limited amount of labeled (news) data that is mixed with abundant unlabeled data during training, resulting in substantial performance improvements (Tarvainen & Valpola, 2017; Iscen et al., 2019). Despite its advantages, semi-supervised learning has attracted little attention in the IS literature, with the exception of Abbasi et al. (2012). In this paper, we demonstrate how such an approach can achieve superior accuracy in predicting specific attributes of information (e.g., the political orientation of articles). Our study is one of the first in the IS literature to apply semi-supervised learning in empirical research, broadening the ML spectrum beyond the dichotomy of unsupervised and supervised learning.

Managerial and Policy Contributions

This research has practical implications for online platforms and policymakers. First, our results shed new light on the underlying mechanism of bot-assisted disinformation campaigns on online platforms. This knowledge can be particularly helpful in managing digital platforms that battle increasingly complex opinion manipulation by offering guidance in the design and development of manipulation detection algorithms. In particular, we point out that bot-assisted fake social engagements can substantially contribute to changing the visibility of messages by deploying massive engagements simultaneously at a rapid pace. The content visibility may, of course, not be fully controlled by the manipulator yet can nonetheless be altered to some extent. While a keyword-based deployment of bots is a rather simple technique, this disinformation tactic can be easily operated, making content curation vulnerable to attack, especially when the curation algorithm is simplistic (as in the case of the net-vote-based rank order used by the studied platform) and no rigorous monitoring protocol exists.

Second, considering the ever-expanding role of digital social conversations in setting the "climate" of public opinion in network societies, it is obvious that a compromised social engagement culture deteriorates the quality of deliberative democracy. Platforms thus must take some social responsibility for the conversational health of society. In particular, bot-assisted manipulation has become increasingly common globally. Intensified bot deployment is deeply problematic because it is scalable and can thus easily generate bandwagon effects (Caldarelli et al., 2020).
Our study reiterates the importance of paying managerial attention to bot-assisted false amplification, as well as aspects of human-crafted false messages, in counteracting disinformation operations. Concerted efforts of online platforms and policy regulators will be necessary, and data-driven empirical insights, such as our findings, can serve as shared intelligence in the process.

Limitations and Future Research

This research is subject to several limitations, which in turn highlight potential areas for future research. First, our work relies on observational data of a single event. While empirical analysis of observational data has its own merits (e.g., high external validity), it entails costs such as limited observations, unobservable confounders, and context dependency. Experimental studies where the effect of FSE can be clearly measured under various conditions would complement this research, allowing our findings to be generalized to broader contexts. Second, the context of this research limits us from examining how the effect of FSE might be influenced by social networking. The online news platform studied in this study is a news aggregator, similar to Yahoo News, rather than a social networking service, similar to Twitter or Facebook; thus, it provides limited data on how its users share specific news/information. It would be fascinating to examine the role of social interaction in the context of fake social engagement operations. Third, this research studied a particular type of FSE operated by fake votes on organic comments. Other important contexts of fake engagement, such as sharing articles/ads/posts/videos or paying for targeted ads, paired with relevant data would be very interesting for future work. For example, Bradshaw (2019) studied search engine optimization manipulation by junk news domains that targeted an increase in their discoverability on Google Search. Last, this research measured the short-term effect of FSE, which was manifested by users' news consumption behavior. Therefore, there are remaining questions, such as how persistent the effect would be and whether FSE would affect not only people's information search behavior but also their attitudes or beliefs. We leave these questions to future research.

Conclusion

Despite its limitations, this study has theoretical as well as practical implications for IS researchers, online platforms, and regulators. Many disinformation mechanisms still remain black-boxed, including those related to fake social engagement operations. To our knowledge, this study is the first attempt to unravel the workings of a fake social engagement operation and its broad effect on users. Through the lens of agenda-setting theory, the findings indicate that programmable bots increase the potential for perpetrators to falsely inflate the salience of certain messages and subsequently manufacture public attention to information. This research contributes to the IS literature by broadening our theoretical understanding of a bot-assisted disinformation technique and by demonstrating how a computational and data-driven approach can help quantify its effects on general users' informational behaviors. We hope this study will lead to more IS scholarly attention to the misuse/abuse of digital technologies and their ramifications on cybersocial security.

Acknowledgments

The third author is thankful for the mentoring received through the U.S.-Korea NextGen Scholar program under the sponsorship of the Korea Foundation. The fourth and fifth authors are co-corresponding authors for this paper. The third author's effort was partly supported by DEVCOM Army Research Laboratory-Army Research Office (Award Number: W911NF1910066), MIT-Lincoln Laboratory (Award Number: PO 7000506684), and the National Science Foundation (Award Number: 2210137). The fifth author's effort was financially supported by Hansung University.

References

Abbasi, A., Albrecht, C., Vance, A., & Hansen, J. (2012). MetaFraud: A meta-learning framework for detecting financial fraud. MIS Quarterly, 36(4), 1293-1327. https://doi.org/10.2307/41703508
Angrist, J. D., & Pischke, J.-S. (2008). Mostly harmless econometrics: An empiricist's companion. Princeton University Press.
Autor, D. H. (2003). Outsourcing at will: The contribution of unjust dismissal doctrine to the growth of employment outsourcing. Journal of Labor Economics, 21(1), 1-42. https://doi.org/10.1086/344122
Badawy, A., Ferrara, E., & Lerman, K. (2018). Analyzing the digital traces of political manipulation: The 2016 Russian interference Twitter campaign. In Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (pp. 258-265). https://doi.org/10.1109/ASONAM.2018.8508646
Bail, C. A., Guay, B., Maloney, E., Combs, A., Hillygus, D. S., Merhout, F., Freelon, D., & Volfovsky, A. (2020). Assessing the Russian Internet Research Agency's impact on the political attitudes and behaviors of American Twitter users in late 2017. Proceedings of the National Academy of Sciences, 117(1), 243-250. https://doi.org/10.1073/pnas.1906420116
Baly, R., Da San Martino, G., Glass, J., & Nakov, P. (2020). We can detect your bias: Predicting the political ideology of news articles. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 4982-4991). https://doi.org/10.18653/v1/2020.emnlp-main.404
Bastos, M. T., & Mercea, D. (2019). The Brexit Botnet and user-generated hyperpartisan news. Social Science Computer Review, 37(1), 38-54. https://doi.org/10.1177/0894439317734157
Benkler, Y., Faris, R., & Roberts, H. (2018). Network propaganda: Manipulation, disinformation, and radicalization in American politics. Oxford University Press.
Boichak, O., Hemsley, J., Jackson, S., Tromble, R., & Tanupabrungsun, S. (2021). Not the bots you are looking for: Patterns and effects of orchestrated interventions in the US and German elections. International Journal of Communication, 15, 814-839. https://ijoc.org/index.php/ijoc/article/view/14866
Bound, J., Jaeger, D. A., & Baker, R. M. (1995). Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. Journal of the American Statistical Association, 90(430), 443-450. https://doi.org/10.2307/2291055
Bradshaw, S. (2019). Disinformation optimised: Gaming search engine algorithms to amplify junk news. Internet Policy Review, 8(4), 1-24. https://doi.org/10.14763/2019.4.1442
Bradshaw, S., Bailey, H., & Howard, P. N. (2021). Industrialized disinformation: 2020 global inventory of organized social media manipulation. Computational Propaganda Project at the Oxford Internet Institute. https://demtech.oii.ox.ac.uk/research/posts/industrialized-disinformation
Bradshaw, S., & Howard, P. N. (2018). Challenging truth and trust: A global inventory of organized social media manipulation. Computational Propaganda Project at the Oxford Internet Institute. https://demtech.oii.ox.ac.uk/wp-content/uploads/sites/127/2018/07/ct2018.pdf
Caldarelli, G., Nicola, R. D., Vigna, F. D., Petrocchi, M., & Saracco, F. (2020). The role of bot squads in the political propaganda on Twitter. Communications Physics, 3(1), 1-15. https://doi.org/10.1038/s42005-020-0340-4
Carman, M., Koerber, M., Li, J., Choo, K. R., & Ashman, H. (2018). Manipulating visibility of political and apolitical threads on Reddit via score boosting. In Proceedings of the IEEE International Conference on Trust, Security and Privacy in Computing and Communications (pp. 184-190). https://doi.org/10.1109/TrustCom/BigDataSE.2018.00037
Carnahan, D., & Garrett, R. K. (2020). Processing style and responsiveness to corrective information. International Journal of Public Opinion Research, 32(3), 530-546. https://doi.org/10.1093/ijpor/edz037
Chen, E., Chang, H., Rao, A., Lerman, K., Cowan, G., & Ferrara, E. (2021). COVID-19 misinformation and the 2020 US presidential election. The Harvard Kennedy School Misinformation Review, 1, Article 7. https://doi.org/10.37016/mr-2020-57
Choe, S. (2018). Ally of South Korean leader conspired to rig online opinion, inquiry finds. The New York Times. https://www.nytimes.com/2018/08/27/world/asia/moon-jae-in-online-scandal.html
Coleman, R., & Wu, H. D. (2010). Proposing emotion as a dimension of affective agenda setting: Separating affect into two components and comparing their second-level effects. Journalism & Mass Communication Quarterly, 87(2), 315-327. http://dx.doi.org/10.1177/107769901008700206
Cresci, S. (2020). A decade of social bot detection. Communications of the ACM, 63(10), 72-83. https://doi.org/10.1145/3409116
David, E., Zhitomirsky-Geffet, M., Koppel, M., & Uzan, H. (2016). Utilizing Facebook pages of the political parties to automatically predict the political orientation of Facebook users. Online Information Review, 40(5), 610-623. http://dx.doi.org/10.1108/OIR-09-2015-0308
Edelson, L., Nguyen, M. K., Goldstein, I., Goga, O., McCoy, D., & Lauinger, T. (2021). Understanding engagement with (mis)information news sources on Facebook. In Proceedings of the 21st ACM Internet Measurement Conference (pp. 444-463). https://doi.org/10.1145/3487552.3487859
Effron, D. A., & Raj, M. (2020). Misinformation and morality: Encountering fake-news headlines makes them seem less unethical to publish and share. Psychological Science, 31(1), 75-87. https://doi.org/10.1177/0956797619887896
Freelon, D., Bossetta, M., Wells, C., Lukito, J., Xia, Y., & Adams, K. (2022). Black trolls matter: Racial and ideological asymmetries in social media disinformation. Social Science Computer Review, 40(3), 560-578. https://doi.org/10.1177/0894439320914853
Freelon, D., & Wells, C. (2020). Disinformation as political communication. Political Communication, 37(2), 145-156. https://doi.org/10.1080/10584609.2020.1723755
Fujiwara, Y., & Irie, G. (2014). Efficient label propagation. In Proceedings of the International Conference on Machine Learning (pp. 784-792).
Gangula, R. R. R., Duggenpudi, S. R., & Mamidi, R. (2019). Detecting political bias in news articles using headline attention. In Proceedings of the ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (pp. 77-84). https://doi.org/10.18653/v1/W19-4809
Gorodnichenko, Y., Pham, T., & Talavera, O. (2021). Social media, sentiment and public opinions: Evidence from #Brexit and #USElection. European Economic Review, 136, Article 103772. https://doi.org/10.1016/j.euroecorev.2021.103772
Grave, E., Bojanowski, P., Gupta, P., Joulin, A., & Mikolov, T. (2018). Learning word vectors for 157 languages. In Proceedings of the International Conference on Language Resources and Evaluation (pp. 3483-3487).
Greene, W. H. (2017). Econometric analysis. Pearson.
Guo, L., & Vargo, C. (2015). The power of message networks: A big-data analysis of the network agenda setting model and issue ownership. Mass Communication and Society, 18(5), 557-576. http://dx.doi.org/10.1080/15205436.2015.1045300
Guo, L., & Vargo, C. (2020). Fake news and emerging online media ecosystem: An integrated intermedia agenda-setting analysis of the 2016 U.S. presidential election. Communication Research, 47(2), 178-200. https://doi.org/10.1177/0093650218777177
Horne, B. D., Dron, W., Khedr, S., & Adali, S. (2018). Assessing the news landscape: A multi-module toolkit for evaluating the credibility of news. In Companion Proceedings of the World Wide Web Conference (pp. 235-238). https://doi.org/10.1145/3184558.3186987
Iscen, A., Tolias, G., Avrithis, Y., & Chum, O. (2019). Label propagation for deep semi-supervised learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 5070-5079). https://doi.ieeecomputersociety.org/10.1109/CVPR.2019.00521
Jeong, J., Kang, J.-H., & Moon, S. (2020). Identifying and quantifying coordinated manipulation of upvotes and downvotes in Naver news comments. In Proceedings of the International AAAI Conference on Web and Social Media, 14(1), 303-314. https://doi.org/10.1609/icwsm.v14i1.7301
Kahan, D. M., Landrum, A., Carpenter, K., Helft, L., & Jamieson, K. H. (2017). Science curiosity and political information processing. Political Psychology, 38(S1), 179-199. http://dx.doi.org/10.1111/pops.12396
Kang, H., & Yang, J. (2020). Quantifying perceived political bias of newspapers through a document classification technique. Journal of Quantitative Linguistics, 29(2), 127-150. https://doi.org/10.1080/09296174.2020.1771136
Kim, S.-H., Scheufele, D. A., & Shanahan, J. (2002). Think about it this way: Attribute agenda-setting function of the press and the public's evaluation of a local issue. Journalism & Mass Communication Quarterly, 79(1), 7-25. https://doi.org/10.1177/107769900207900102
Kiousis, S. (2005). Compelling arguments and attitude strength: Exploring the impact of second-level agenda setting on public opinion of presidential candidate images. Harvard International Journal of Press/Politics, 10(2), 3-27. https://doi.org/10.1177/1081180X05276095
Le, Q., & Mikolov, T. (2014). Distributed representations of sentences and documents. In Proceedings of the International Conference on Machine Learning (pp. 1188-1196).
Lim, W., Lee, C., & Choi, D. (2019). Opinion polarization in Korea: Its characteristics and drivers (Korea Development Institute (KDI) Research Monograph). https://doi.org/10.22740/kdi.rm.2019.03
Luceri, L., Deb, A., Badawy, A., & Ferrara, E. (2019). Red bots do it better: Comparative analysis of social bot partisan behavior. In Companion Proceedings of the World Wide Web Conference (pp. 1007-1012). https://doi.org/10.1145/3308560.3316735
Marwick, A., & Lewis, R. (2017). Media manipulation and disinformation online. Data & Society Research Institute. https://datasociety.net/library/media-manipulation-and-disinfo-online
McCombs, M., & Shaw, D. (1972). The agenda setting function of mass media. Public Opinion Quarterly, 36(2), 176-187. https://doi.org/10.1086/267990
McCombs, M., & Valenzuela, S. (2020). Setting the agenda: Mass media and public opinion. Polity Press.
Mindel, V., Mathiassen, L., & Rai, A. (2018). The sustainability of polycentric information commons. MIS Quarterly, 42(2), 607-632. http://dx.doi.org/10.25300/MISQ/2018/14015
Nelson, J. L., & Taneja, H. (2018). The small, disloyal fake news audience: The role of audience availability in fake news consumption. New Media & Society, 20(10), 3720-3737. https://doi.org/10.1177/1461444818758715
Pennycook, G., Cannon, T. D., & Rand, D. G. (2018). Prior exposure increases perceived accuracy of fake news. Journal of Experimental Psychology: General, 147(12), 1865-1880. http://dx.doi.org/10.2139/ssrn.2958246
Pennycook, G., Epstein, Z., Mosleh, M., Arechar, A. A., Eckles, D., & Rand, D. G. (2021). Shifting attention to accuracy can reduce misinformation online. Nature, 592, 590-595. https://doi.org/10.1038/s41586-021-03344-2
Phillips, W. (2015). This is why we can't have nice things: Mapping the relationship between online trolling and mainstream culture. MIT Press.
Potthast, M., Kiesel, J., & Reinartz, K. (2018). A stylometric inquiry into hyperpartisan and fake news. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 3528-3539). https://doi.org/10.18653/v1/P18-1022
Qiao, D., Lee, S.-Y., Whinston, A. B., & Wei, Q. (2020). Financial incentives dampen altruism in online prosocial contributions: A study of online reviews. Information Systems Research, 31(4), 1361-1375. https://doi.org/10.1287/isre.2020.0949
Rojecki, A., & Meraz, S. (2016). Rumors and factitious informational blends: The role of the web in speculative politics. New Media & Society, 18(1), 25-43. https://doi.org/10.1177/1461444814535724
Rossi, S., Rossi, M., Upreti, B., & Liu, Y. (2020). Detecting political bots on Twitter during the 2019 Finnish parliamentary election. In Proceedings of the Hawaii International Conference on System Sciences (pp. 2430-2439).
Salge, C., Karahanna, E., & Thatcher, J. B. (2022). Algorithmic processes of social alertness and social transmission: How bots disseminate information on Twitter. MIS Quarterly, 46(1), 229-260. https://doi.org/10.25300/MISQ/2021/15598
Schäfer, F., Evert, S., & Heinrich, P. (2017). Japan's 2014 general election: Political bots, right-wing internet activism, and Prime Minister Shinzō Abe's hidden nationalist agenda. Big Data, 5(4), 294-309. https://doi.org/10.1089/big.2017.0049
Scheufele, D. A., & Krause, N. M. (2019). Science audiences, misinformation, and fake news. Proceedings of the National Academy of Sciences, 116(16), 7662-7669. https://doi.org/10.1073/pnas.1805871115
Shin, D., He, S., Lee, G. M., Whinston, A. B., Cetintas, S., & Lee, K.-C. (2020). Enhancing social media analysis with visual data analytics: A deep learning approach. MIS Quarterly, 44(4), 1459-1492. https://doi.org/10.25300/misq/2020/14870
Stella, M., Ferrara, E., & De Domenico, M. (2018). Bots increase exposure to negative and inflammatory content in online social systems. Proceedings of the National Academy of Sciences, 115(49), 12435-12440. https://doi.org/10.1073/pnas.1803470115
Tarvainen, A., & Valpola, H. (2017). Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In Proceedings of the Advances in Neural Information Processing Systems (pp. 1195-1204).
Vargo, C., Guo, L., & Amazeen, M. A. (2018). The agenda-setting power of fake news: A big data analysis of the online media landscape from 2014 to 2016. New Media & Society, 20(5), 2028-2049. https://doi.org/10.1177/1461444817712086
Varol, O., Ferrara, E., Davis, C., Menczer, F., & Flammini, A. (2017). Online human-bot interactions: Detection, estimation, and characterization. Proceedings of the International AAAI Conference on Web and Social Media, 11(1), 280-289. https://doi.org/10.1609/icwsm.v11i1.14871
Vosoughi, S., Roy, D., & Aral, S. (2018). The spread of true and false news online. Science, 359(6380), 1146-1151. https://doi.org/10.1126/science.aap9559
Vu, H. T., Guo, L., & McCombs, M. (2014). Exploring the world outside and the pictures in our heads: A network agenda-setting study. Journalism & Mass Communication Quarterly, 91(4), 669-686. https://doi.org/10.1177/1077699014550090
Weedon, J., Nuland, W., & Stamos, A. (2017). Information operations and Facebook. Facebook. https://about.fb.com/wp-content/uploads/2017/04/facebook-and-information-operations-v1.pdf
Weidner, K., Beuk, F., & Bal, A. (2020). Fake news and the willingness to share: A schemer schema and confirmatory bias perspective. Journal of Product & Brand Management, 29(2), 180-187. https://doi.org/10.1108/JPBM-12-2018-2155
Wong, J. C., & Ernst, J. (2021). Facebook knew of Honduran president's manipulation campaign—and let it continue for 11 months. The Guardian. https://www.theguardian.com/technology/2021/apr/13/facebook-honduras-juan-orlando-hernandez-fake-engagement
Woolley, S., & Howard, P. (2016). Political communication, computational propaganda, and autonomous agents. International Journal of Communication, 10, 4882-4890.
Zhou, D., Bousquet, O., Lal, T. N., Weston, J., & Schölkopf, B. (2003). Learning with local and global consistency. In Proceedings of the Advances in Neural Information Processing Systems (pp. 321-328).

Author Biographies

Sanghak Lee is an associate professor of marketing at the W. P. Carey School of Business, Arizona State University. He holds a B.S. in Chemical Engineering from Seoul National University, an M.S. in Management Engineering from KAIST (Korean Advanced Institute of Science and Technology), and a Ph.D. in Marketing from the Ohio State University. His research primarily focuses on direct utility models, Bayesian econometrics, and choice modeling. His work has been published in prestigious journals, including Marketing Science and Management Science.

Donghyuk Shin is an associate professor of information systems at the College of Business, Korea Advanced Institute of Science and Technology (KAIST). Before joining KAIST, he held positions as an assistant professor at Arizona State University and a machine learning scientist at Amazon. He earned his Ph.D. in computer science from the University of Texas at Austin. His primary research interest is at the nexus of machine learning and information systems, with a focus on artificial intelligence, digital platforms, and business analytics. His research has been featured in MIS Quarterly, Management Science, and at leading machine learning conferences such as NeurIPS, ACM RecSys, and CIKM.

K. Hazel Kwon is a professor of digital audiences and the founder and lead researcher of the Media, Information, Data, and Society (MIDaS) Lab at the Walter Cronkite School of Journalism and Mass Communication, and an affiliate faculty with Global Security Initiative's Center on Narrative, Disinformation, and Strategic Influence at Arizona State University. She has received grants from the DoD, NSF, Social Science Research Council, and the Gates Foundation for her various research projects on social media and participation. She has won multiple awards including the AEJMC Emerging Scholar (2020), Top Faculty Papers from the Broadcast Education Association (2022) and Chinese Communication Association (2021), and the Herbert S. Dordick Dissertation Award (3rd place) from the International Communication Association (2012). In 2020-2021, she was selected as a U.S.-Korea NextGen Scholar (ORCiD Id: 0000-0001-7414-6959).

Sang-Pil Han is an associate professor of information systems at the W. P. Carey School of Business, Arizona State University. His research interests encompass artificial intelligence, digital platforms, and business analytics. Notably, his work has been published in esteemed journals such as Management Science, MIS Quarterly, Information Systems Research, and Journal of Marketing. Beyond academia, his insights have been showcased in media outlets like Harvard Business Review, The Wall Street Journal, and BBC News. Professor Han's research has garnered support from institutions including the Marketing Science Institute, NET Institute, and Hong Kong General Research Fund, as well as private enterprises. He has held educational leadership roles, notably as co-faculty director for the Master of Science in Business Analytics at ASU. Additionally, he served as an associate editor for Information Systems Research. Outside academia, his consultation spans from tech startups such as Mathpresso to nonprofits like Simple Steps.

Seok Kee Lee is a professor in the Department of Computer Engineering at Hansung University in South Korea. He received his Ph.D. in management engineering at KAIST (Korea Advanced Institute of Science and Technology). His current research interests include data analytics and artificial intelligence on consumer behavior. His articles have been published in academic journals including Information Sciences, International Journal of Consumer Studies, and Sustainability.

Appendix A
We describe our label propagation (LP) model used to infer the political sentiment of the articles that users visited. LP builds on two consistency assumptions: (1) local consistency, i.e., nearby data points are likely to have the same label, and (2) global consistency, i.e., data points on the same structure (i.e., manifold or cluster) are likely to have the same label. The core idea of LP is to construct an affinity graph from all labeled and unlabeled samples and then iteratively propagate the known labels to the unlabeled samples according to the graph structure. More formally, the algorithm proceeds as follows (a minimal code sketch of these steps is given after the list):

1. Form an affinity graph 𝐺 and its corresponding adjacency matrix 𝑊, where nodes represent samples and edges capture their pairwise
similarities (e.g., k-nearest neighbor graph).

2. Construct the symmetrically normalized affinity matrix 𝑆 = 𝐷^(−1/2) 𝑊 𝐷^(−1/2), where 𝐷 is the diagonal matrix of node degrees (this normalization is required for the iteration in Step 3 to converge).

3. Iterate 𝐹_(𝑡+1) = 𝜆𝑆𝐹_𝑡 + (1 − 𝜆)𝑌 until convergence, where 𝐹_𝑡 holds the label assignments at the 𝑡-th iteration, 𝜆 is a hyperparameter between 0 and 1 that weights the propagated neighborhood information against the initial label information (which receives weight 1 − 𝜆), and 𝑌 encodes the initial known labels, with zero entries for unlabeled samples.
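
For concreteness, the three steps above can be sketched in a few lines of Python/NumPy. This is an illustrative implementation under the notation above (𝑊, 𝑌, and 𝜆 as lam), not the exact code used in the study; it assumes a dense affinity matrix and one-hot initial labels.

    import numpy as np

    def label_propagation(W, Y, lam=0.4, max_iter=1000, tol=1e-6):
        # W: (n, n) symmetric affinity matrix; Y: (n, c) initial labels,
        # one-hot rows for labeled samples and all-zero rows for unlabeled ones.
        d = W.sum(axis=1)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
        S = D_inv_sqrt @ W @ D_inv_sqrt                # symmetrically normalized affinity (Step 2)
        F = Y.astype(float).copy()
        for _ in range(max_iter):
            F_next = lam * (S @ F) + (1.0 - lam) * Y   # Step 3 update
            if np.abs(F_next - F).max() < tol:         # stop once labels stabilize
                F = F_next
                break
            F = F_next
        return F.argmax(axis=1)                        # predicted class per sample

At the scale of our corpus, the same update would typically be run with sparse matrix operations (e.g., scipy.sparse), since a dense affinity matrix over all 342,567 articles would not fit in memory; the dense version is shown only for readability.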

To construct an affinity graph with articles as nodes, we computed pairwise cosine similarities between 342,567 articles using their embedding
vectors obtained from our doc2vec model (described in the Organic User Activities section), which has been shown to be accurate in detecting
political biases in articles (e.g., Baly et al., 2020; Kang & Yang, 2020). From the pairwise similarities, we formed a sparse k-nearest neighbor
graph with 𝑘 = 15 as the affinity graph 𝐺. For the iterations in Step 3, we set 𝜆 = 0.4.
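
As an illustration of this construction, the sketch below builds a sparse, symmetric k-nearest neighbor affinity graph from article embeddings using cosine similarity and feeds it to the label_propagation function sketched above. The embedding array, the number of articles, and the initial label matrix are random stand-ins for the actual doc2vec vectors and manually labeled articles; build_affinity_graph is a hypothetical helper, not part of the original pipeline.

    import numpy as np
    from sklearn.neighbors import kneighbors_graph

    def build_affinity_graph(emb, k=15):
        # Sparse kNN graph with cosine similarity as edge weights, symmetrized
        # so that W corresponds to an undirected affinity graph.
        A = kneighbors_graph(emb, n_neighbors=k, metric="cosine",
                             mode="distance", include_self=False)
        A.data = np.maximum(1.0 - A.data, 0.0)   # cosine distance -> similarity, clipped at 0
        return A.maximum(A.T)

    rng = np.random.default_rng(0)
    emb = rng.normal(size=(500, 300))            # stand-in for doc2vec article embeddings
    W = build_affinity_graph(emb, k=15).toarray()
    Y = np.zeros((500, 2))                       # two sentiment classes
    Y[:25, 0] = 1.0                              # 25 articles labeled with class 0
    Y[25:50, 1] = 1.0                            # 25 articles labeled with class 1
    pred = label_propagation(W, Y, lam=0.4)      # from the sketch above

scikit-learn's LabelSpreading (in sklearn.semi_supervised) implements the same Zhou et al. (2003) update and could be substituted here, although its built-in kNN kernel uses Euclidean rather than cosine distances over the embeddings.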

Figure A1(a) shows an example of label propagation iterations. Starting from the nodes corresponding to Article 1 (labeled “P”) and Article 2 (labeled “A”), the initial known labels are propagated to the other articles according to the affinity graph at each iteration. We also compare our LP model to other representative supervised ML models, including feed-forward neural network (FNN), logistic regression (LR), gradient boosting trees (GBT), and k-nearest neighbor (kNN) classifiers. Figure A1(b) shows that the LP model yields the best prediction accuracy (0.913), measured by the F1-score (a standard accuracy metric for classification tasks) averaged over multiple stratified 5-fold cross-validations. We note that the hyperparameters of the compared models were tuned on validation sets and that F1-scores are reported on separate held-out test sets.
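
For reference, the evaluation protocol can be sketched as a generic stratified 5-fold cross-validation loop reporting the mean F1-score, shown here with one of the baseline classifiers (logistic regression) and random stand-in data in place of the labeled article embeddings; this is not the exact tuning pipeline used in the study.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import StratifiedKFold

    def mean_cv_f1(model, X, y, n_splits=5, seed=0):
        # Mean F1-score over stratified k-fold cross-validation.
        skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
        scores = []
        for train_idx, test_idx in skf.split(X, y):
            model.fit(X[train_idx], y[train_idx])
            scores.append(f1_score(y[test_idx], model.predict(X[test_idx])))
        return float(np.mean(scores))

    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 300))              # stand-in for labeled article embeddings
    y = rng.integers(0, 2, size=400)             # stand-in binary sentiment labels
    print(mean_cv_f1(LogisticRegression(max_iter=1000), X, y))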

[Figure A1 appears here with two panels: (a) Example of Label Propagation Iterations; (b) Performance Comparison.]

Note: (a) Shows an example of label propagation iterations where the initial known labels of Article 1 (“P”) and Article 2 (“A”) are spread to other articles (i.e., nodes) according to the affinity graph. (b) Presents the prediction accuracies (F1-scores) of different ML models, showing that label propagation (LP) achieves the best performance.
Figure A1. Example of Label Propagation Iterations and Performance Comparison
