Chen Ch1
Information and Influence
Propagation in
Social Networks
Synthesis Lectures on Data
Management
Editor
M. Tamer Özsu, University of Waterloo
Synthesis Lectures on Data Management is edited by Tamer Özsu of the University of Waterloo.
The series will publish 50- to 125-page publications on topics pertaining to data management. The
scope will largely follow the purview of premier information and computer science conferences, such
as ACM SIGMOD, VLDB, ICDE, PODS, ICDT, and ACM KDD. Potential topics include, but
are not limited to: query languages, database system architectures, transaction management, data
warehousing, XML and databases, data stream systems, wide scale data distribution, multimedia
data management, data mining, and related subjects.
Semantics Empowered Web 3.0: Managing Enterprise, Social, Sensor, and Cloud-based
Data and Services for Advanced Applications
Amit Sheth and Krishnaprasad Thirunarayan
2012
Declarative Networking
Boon Thau Loo and Wenchao Zhou
2012
Probabilistic Databases
Dan Suciu, Dan Olteanu, Christopher Ré, and Christoph Koch
2011
Database Replication
Bettina Kemme, Ricardo Jimenez-Peris, and Marta Patino-Martinez
2010
Relational and XML Data Exchange
Marcelo Arenas, Pablo Barcelo, Leonid Libkin, and Filip Murlak
2010
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations
in printed reviews, without the prior permission of the publisher.
DOI 10.2200/S00527ED1V01Y201308DTM037
Lecture #37
Series Editor: M. Tamer Özsu, University of Waterloo
Synthesis Lectures on Data Management
Series ISSN: Print 2153-5418, Electronic 2153-5426
Information and Influence
Propagation in
Social Networks
Wei Chen
Microsoft Research Asia
Carlos Castillo
Qatar Computing Research Institute
Morgan & Claypool Publishers
ABSTRACT
Research on social networks has exploded over the last decade. To a large extent, this has been
fueled by the spectacular growth of social media and online social networking sites, which continue
growing at a very fast pace, as well as by the increasing availability of very large social network
datasets for purposes of research. A rich body of this research has been devoted to the analysis of
the propagation of information, influence, innovations, infections, practices and customs through
networks. Can we build models to explain the way these propagations occur? How can we
validate our models against any available real datasets consisting of a social network and propagation
traces that occurred in the past? These are just some questions studied by researchers in this area.
Information propagation models find applications in viral marketing, outbreak detection, finding
key blog posts to read in order to catch important stories, finding leaders or trendsetters,
information feed ranking, etc. A number of algorithmic problems arising in these applications have
been abstracted and studied extensively by researchers under the umbrella of influence maximization.
This book starts with a detailed description of well-established diffusion models,
including the independent cascade model and the linear threshold model, that have been successful at
explaining propagation phenomena. We describe their properties as well as numerous extensions
to them, introducing aspects such as competition, budget, and time-criticality, among many
others. We delve deep into the key problem of influence maximization, which selects key individuals
to activate in order to influence a large fraction of a network. Influence maximization in
classic diffusion models including both the independent cascade and the linear threshold models is
computationally intractable, more precisely #P-hard, and we describe several approximation
algorithms and scalable heuristics that have been proposed in the literature. Finally, we also deal
with key issues that need to be tackled in order to turn this research into practice, such as learning
the strength with which individuals in a network influence each other, as well as the practical
aspects of this research including the availability of datasets and software tools for facilitating
research. We conclude with a discussion of various research problems that remain open, both from
a technical perspective and from the viewpoint of transferring the results of research into
industry-strength applications.
KEYWORDS
social networks, social influence, information and influence diffusion, stochastic
diffusion models, influence maximization, learning of propagation models, viral
marketing, competitive influence diffusion, game theory, computational complexity,
approximation algorithms, heuristic algorithms, scalability.
To our families
Contents
Acknowledgments
1 Introduction
  1.1 Social Networks and Social Influence
    1.1.1 Examples of Social Networks
    1.1.2 Examples of Information Propagation
  1.2 Social Influence Examples
  1.3 Social Influence Analysis Applications
  1.4 The Flip Side
  1.5 Outline of This Book
3 Influence Maximization
  3.1 Complexity of Influence Maximization
  3.2 Greedy Approach to Influence Maximization
    3.2.1 Greedy Algorithm for Influence Maximization
    3.2.2 Empirical Evaluation of (G, k)
  3.3 Scalable Influence Maximization
    3.3.1 Reducing the Number of Influence Spread Evaluations
    3.3.2 Speeding Up Influence Computation
    3.3.3 Other Scalable Influence Maximization Schemes
4 Extensions to Diffusion Modeling and Influence Maximization
  4.1 A Data-Based Approach to Influence Maximization
  4.2 Competitive Influence Modeling and Maximization
    4.2.1 Model Extensions for Competitive Influence Diffusion
    4.2.2 Maximization Problems for Competitive Influence Diffusion
    4.2.3 Endogenous Competition
    4.2.4 A New Frontier – The Host Perspective
  4.3 Influence, Adoption, and Profit
    4.3.1 Influence vs. Adoption
    4.3.2 Influence vs. Profit
  4.4 Other Extensions
Bibliography
Index
Acknowledgments
The material presented here has been deeply influenced by the research we conducted with
many wonderful colleagues, students, and post-doctoral fellows as well as the numerous
stimulating discussions we have had with them. We would like to express our gratitude to our
collaborators, including: Smriti Bhagat, Francesco Bonchi, Alex Collins, Rachel Cummings,
Aristides Gionis, Amit Goyal, Xinran He, Dino Ienco, Qingye Jiang, Te Ke, Yanhua Li, Zhenming
Liu, Wei Lu, Michael Mathioudakis, David Rincón, Guojie Song, Xiaorui Sun, Antti Ukkonen,
Suresh Venkatasubramanian, Chi Wang, Yajun Wang, Wei Wei, Siyu Yang, Yifei Yuan, Li
Zhang, and Zhi-Li Zhang. We are grateful to Lewis Tseng, Yajun Wang, and Cheng Yang for
helpful discussions on some technical sections of the book, and to Tian Lin for running a few
additional evaluation tests beyond those in previously published papers and for generating relevant
plots. Thanks are due to Qiang Li, Tian Lin, Wei Lu, Lewis Tseng, Yajun Wang, and Cheng Yang
for their careful reading of and excellent comments on the manuscript, which greatly improved
our presentation. We thank Diane Cerra and Tamer Özsu for their assistance throughout the
preparation of this manuscript and for their patience. Tamer’s editorial comments were especially
helpful in improving the readability. Last but not least, we are indebted to our families, whose
patience and support throughout this project have been invaluable.
CHAPTER 1
Introduction
In this chapter we motivate the study of influence and information propagation by providing
numerous examples. In addition, we provide some basic definitions.
The study of information and influence propagation has found applications in several fields,
including viral marketing, social media analytics, the spread of rumors, stories, interest, trust,
referrals, the adoption of innovations in organizations, the study of human and non-human animal
epidemics, expert finding, behavioral targeting, feed ranking, “friends” recommendation, social
search, etc.
Among these, viral marketing, or word-of-mouth marketing as it is otherwise called, is a
“poster” application of influence analysis. The vision behind this is to activate a small number of
“influential” individuals in a social network through which a large number of other individuals can
be influenced by a viral propagation. Formally, consider a social network represented as a directed
graph $G = (V, E)$ with nodes $V$ corresponding to individuals and links $E \subseteq V \times V$ representing
social ties. Furthermore, suppose there is a function $p : E \to [0, 1]$ that associates a weight or
probability $p(u, v)$ with every link $(u, v)$, representing the influence exerted by user $u$ on $v$. This
informally captures the intuition that whenever $u$ performs an action, $v$ also performs the
action after $u$ with probability $p(u, v)$. The idea behind viral marketing is that by getting a small
set of users in $V$ (a seed set) to use a product, for instance by giving it to them for free or at
a discounted price, we can reach a much larger set of users through transitive propagation of
influence.
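To make the notation concrete, here is a minimal Python sketch of one way to read it. It assumes that each link (u, v) fires independently with probability p(u, v) the first time u becomes active (the independent cascade reading named in the abstract), and it estimates by repeated simulation how many users a seed set reaches. The dict-of-dicts graph encoding, the toy graph, and the function names simulate_cascade and estimate_spread are illustrative assumptions, not definitions from the text.

```python
import random


def simulate_cascade(graph, seeds, rng):
    """Run one random cascade and return the set of activated nodes.

    graph maps each node u to a dict {v: p(u, v)} of out-links.
    Assumption: every newly activated u gets a single, independent
    chance to activate each inactive out-neighbor v with probability p(u, v).
    """
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in graph.get(u, {}).items():
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_spread(graph, seeds, runs=1000, seed=0):
    """Average number of activated nodes over `runs` simulated cascades."""
    rng = random.Random(seed)
    total = sum(len(simulate_cascade(graph, seeds, rng)) for _ in range(runs))
    return total / runs


# Hypothetical toy network; probabilities 0.0 and 1.0 make the outcome deterministic.
toy = {"a": {"b": 1.0, "c": 0.0}, "b": {"d": 1.0}, "c": {}, "d": {}}
print(estimate_spread(toy, ["a"]))  # 3.0: a activates b, b activates d, c is never reached
```

With fractional probabilities the estimate varies with the number of runs and the random seed, so it should be read as an approximation of the expected reach of the seed set.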
Interestingly, a family of applications in seemingly different domains and settings falls into a
pattern similar to the one we describe for viral marketing. Consider the water distribution network
of a large metropolitan city [Leskovec et al., 2007, Ostfeld and Salomons, 2004, Ostfeld et al.,
2006]: accidental or deliberate interference can introduce viruses or other contaminants into the
water being distributed. There are sensors capable of detecting any outbreak, but each sensor is
expensive both in itself and in terms of its deployment and maintenance. A natural question is
whether we can find a small set of crucial junctions in the water distribution network in which to
place the sensors, so as to detect any outbreak as quickly as possible. Alternatively, we may wish
to minimize the size of the population potentially affected by the undetected contamination.
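The sensor placement question can be illustrated with a small Python sketch. It assumes, purely for illustration, that a list of contamination scenarios is available, each mapping a junction to the time at which the contaminant would reach it, and it picks junctions one at a time so as to reduce the total detection time. The greedy rule, the horizon penalty charged when no chosen sensor sees a scenario, and the names detection_time and greedy_sensors are assumptions of this sketch rather than a method stated in the text.

```python
def detection_time(times, sensors, horizon):
    """Earliest time any chosen sensor sees one scenario.

    times maps junction -> arrival time of the contaminant in that scenario.
    If no chosen sensor is reached, the scenario is charged `horizon`.
    """
    return min((times[s] for s in sensors if s in times), default=horizon)


def greedy_sensors(scenarios, candidates, k, horizon):
    """Pick up to k junctions, each time adding the one that lowers
    the summed detection time over all scenarios the most."""
    chosen = []
    for _ in range(k):
        best, best_cost = None, None
        for c in candidates:
            if c in chosen:
                continue
            cost = sum(detection_time(t, chosen + [c], horizon) for t in scenarios)
            if best_cost is None or cost < best_cost:
                best, best_cost = c, cost
        if best is None:  # fewer candidates than k
            break
        chosen.append(best)
    return chosen


# Hypothetical scenarios: junction -> hours until the contaminant arrives there.
scenarios = [
    {"j1": 1, "j2": 5},
    {"j2": 2, "j3": 4},
    {"j3": 1},
]
print(greedy_sensors(scenarios, ["j1", "j2", "j3"], k=2, horizon=10))  # ['j3', 'j1']
```

Replacing the summed detection time with the size of the population affected before detection would give the alternative objective mentioned above; the selection loop would stay the same.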
Similar ideas can also be applied to study the adoption of innovation in organizations and
the propagation of rumors and information in general through society. An important aspect of
this is the propagation of information through social media, including the blogosphere and
microblogging platforms. In social media, posts are linked to other posts, allowing us to study the
propagation of stories and to determine who is an expert or an influencer on a given topic.
A spate of startups has sprung up around the notion of social media influence, and we describe
a few examples here. Klout (http://klout.com/) claims to compute the overall influence
of users online based on their behavior and their followers’ behavior on Facebook, Twitter, and
LinkedIn. The “klout” score is computed as a function of the true reach, amplification probability,
and network influence. PeerIndex (http://peerindex.com/), on the other hand, identifies
authorities and determines authority scores over the social web on a per-topic basis. PeerIndex
and Influencer50 (http://influencer50.com/) explicitly work with a viral marketing business
model, calculating users’ influence scores by analyzing the likes and retweets they receive online
and recommending the influential people they find to brand-name companies. Users are encouraged
to try their best to be influential and in return receive rewards and offers from such
companies.