Information and Influence

Propagation in
Social Networks
Synthesis Lectures on Data
Management
Editor
M. Tamer Özsu, University of Waterloo
Synthesis Lectures on Data Management is edited by Tamer Özsu of the University of Waterloo.
The series will publish 50- to 125-page publications on topics pertaining to data management. The
scope will largely follow the purview of premier information and computer science conferences, such
as ACM SIGMOD, VLDB, ICDE, PODS, ICDT, and ACM KDD. Potential topics include, but
are not limited to: query languages, database system architectures, transaction management, data
warehousing, XML and databases, data stream systems, wide scale data distribution, multimedia
data management, data mining, and related subjects.

Information and Influence Propagation in Social Networks


Wei Chen, Laks V.S. Lakshmanan, and Carlos Castillo
2013

Data Cleaning: A Practical Perspective


Venkatesh Ganti and Anish Das Sarma
2013

Data Processing on FPGAs


Jens Teubner and Louis Woods
2013

Perspectives on Business Intelligence


Raymond T. Ng, Patricia C. Arocena, Denilson Barbosa, Giuseppe Carenini, Luiz Gomes, Jr.,
Stephan Jou, Rock Anthony Leung, Evangelos Milios, Renée J. Miller, John Mylopoulos, Rachel A.
Pottinger, Frank Tompa, and Eric Yu
2013

Semantics Empowered Web 3.0: Managing Enterprise, Social, Sensor, and Cloud-based
Data and Services for Advanced Applications
Amit Sheth and Krishnaprasad Thirunarayan
2012

Data Management in the Cloud: Challenges and Opportunities


Divyakant Agrawal, Sudipto Das, and Amr El Abbadi
2012
Query Processing over Uncertain Databases
Lei Chen and Xiang Lian
2012

Foundations of Data Quality Management


Wenfei Fan and Floris Geerts
2012

Incomplete Data and Data Dependencies in Relational Databases


Sergio Greco, Cristian Molinaro, and Francesca Spezzano
2012

Business Processes: A Database Perspective


Daniel Deutch and Tova Milo
2012

Data Protection from Insider Threats


Elisa Bertino
2012

Deep Web Query Interface Understanding and Integration


Eduard C. Dragut, Weiyi Meng, and Clement T. Yu
2012

P2P Techniques for Decentralized Applications


Esther Pacitti, Reza Akbarinia, and Manal El-Dick
2012

Query Answer Authentication


HweeHwa Pang and Kian-Lee Tan
2012

Declarative Networking
Boon au Loo and Wenchao Zhou
2012

Full-Text (Substring) Indexes in External Memory


Marina Barsky, Ulrike Stege, and Alex Thomo
2011

Spatial Data Management


Nikos Mamoulis
2011

Database Repairing and Consistent Query Answering


Leopoldo Bertossi
2011
Managing Event Information: Modeling, Retrieval, and Applications
Amarnath Gupta and Ramesh Jain
2011

Fundamentals of Physical Design and Query Compilation


David Toman and Grant Weddell
2011

Methods for Mining and Summarizing Text Conversations


Giuseppe Carenini, Gabriel Murray, and Raymond Ng
2011

Probabilistic Databases
Dan Suciu, Dan Olteanu, Christopher Ré, and Christoph Koch
2011

Peer-to-Peer Data Management


Karl Aberer
2011

Probabilistic Ranking Techniques in Relational Databases


Ihab F. Ilyas and Mohamed A. Soliman
2011

Uncertain Schema Matching


Avigdor Gal
2011

Fundamentals of Object Databases: Object-Oriented and Object-Relational Design


Suzanne W. Dietrich and Susan D. Urban
2010

Advanced Metasearch Engine Technology


Weiyi Meng and Clement T. Yu
2010

Web Page Recommendation Models: Theory and Algorithms


Sule Gündüz-Ögüdücü
2010

Multidimensional Databases and Data Warehousing


Christian S. Jensen, Torben Bach Pedersen, and Christian Thomsen
2010

Database Replication
Bettina Kemme, Ricardo Jimenez-Peris, and Marta Patino-Martinez
2010
Relational and XML Data Exchange
Marcelo Arenas, Pablo Barcelo, Leonid Libkin, and Filip Murlak
2010

User-Centered Data Management


Tiziana Catarci, Alan Dix, Stephen Kimani, and Giuseppe Santucci
2010

Data Stream Management


Lukasz Golab and M. Tamer Özsu
2010

Access Control in Data Management Systems


Elena Ferrari
2010

An Introduction to Duplicate Detection


Felix Naumann and Melanie Herschel
2010

Privacy-Preserving Data Publishing: An Overview


Raymond Chi-Wing Wong and Ada Wai-Chee Fu
2010

Keyword Search in Databases


Jeffrey Xu Yu, Lu Qin, and Lijun Chang
2009
Copyright © 2014 by Morgan & Claypool

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations
in printed reviews, without the prior permission of the publisher.

Information and Influence Propagation in Social Networks


Wei Chen, Laks V.S. Lakshmanan, and Carlos Castillo
www.morganclaypool.com

ISBN: 9781627051156 paperback


ISBN: 9781627051163 ebook

DOI 10.2200/S00527ED1V01Y201308DTM037

A Publication in the Morgan & Claypool Publishers series


SYNTHESIS LECTURES ON DATA MANAGEMENT

Lecture #37
Series Editor: M. Tamer Özsu, University of Waterloo
Series ISSN
Synthesis Lectures on Data Management
Print 2153-5418 Electronic 2153-5426
Information and Influence
Propagation in
Social Networks

Wei Chen
Microsoft Research Asia

Laks V.S. Lakshmanan


University of British Columbia

Carlos Castillo
Qatar Computing Research Institute

SYNTHESIS LECTURES ON DATA MANAGEMENT #37

Morgan & Claypool Publishers
ABSTRACT
Research on social networks has exploded over the last decade. To a large extent, this has been fu-
eled by the spectacular growth of social media and online social networking sites, which continue
growing at a very fast pace, as well as by the increasing availability of very large social network
datasets for purposes of research. A rich body of this research has been devoted to the analysis of
the propagation of information, influence, innovations, infections, practices and customs through
networks. Can we build models to explain the way these propagations occur? How can we vali-
date our models against any available real datasets consisting of a social network and propagation
traces that occurred in the past? These are just some questions studied by researchers in this area.
Information propagation models find applications in viral marketing, outbreak detection, finding
key blog posts to read in order to catch important stories, finding leaders or trendsetters, infor-
mation feed ranking, etc. A number of algorithmic problems arising in these applications have
been abstracted and studied extensively by researchers under the garb of influence maximization.
This book starts with a detailed description of well-established diffusion models, including
the independent cascade model and the linear threshold model, that have been successful at
explaining propagation phenomena. We describe their properties as well as numerous extensions
to them, introducing aspects such as competition, budget, and time-criticality, among many oth-
ers. We delve deep into the key problem of influence maximization, which selects key individuals
to activate in order to influence a large fraction of a network. Influence maximization in clas-
sic diffusion models including both the independent cascade and the linear threshold models is
computationally intractable, more precisely #P-hard, and we describe several approximation al-
gorithms and scalable heuristics that have been proposed in the literature. Finally, we also deal
with key issues that need to be tackled in order to turn this research into practice, such as learning
the strength with which individuals in a network influence each other, as well as the practical
aspects of this research including the availability of datasets and software tools for facilitating re-
search. We conclude with a discussion of various research problems that remain open, both from
a technical perspective and from the viewpoint of transferring the results of research into industry
strength applications.

KEYWORDS
social networks, social influence, information and influence diffusion, stochastic dif-
fusion models, influence maximization, learning of propagation models, viral mar-
keting, competitive influence diffusion, game theory, computational complexity, ap-
proximation algorithms, heuristic algorithms, scalability.

To our families

Jian, Joice Yitao, and Ellie Yiqing

Sarada, Sundaram, Sharada, and Kaavya

Fabiola and Felipe



Contents
Acknowledgments

1 Introduction
1.1 Social Networks and Social Influence
1.1.1 Examples of Social Networks
1.1.2 Examples of Information Propagation
1.2 Social Influence Examples
1.3 Social Influence Analysis Applications
1.4 The Flip Side
1.5 Outline of This Book

2 Stochastic Diffusion Models
2.1 Main Progressive Models
2.1.1 Independent Cascade Model
2.1.2 Linear Threshold Model
2.1.3 Submodularity and Monotonicity of Influence Spread Function
2.1.4 General Threshold Model and General Cascade Model
2.2 Other Related Models
2.2.1 Epidemic Models
2.2.2 Voter Model
2.2.3 Markov Random Field Model
2.2.4 Percolation Theory

3 Influence Maximization
3.1 Complexity of Influence Maximization
3.2 Greedy Approach to Influence Maximization
3.2.1 Greedy Algorithm for Influence Maximization
3.2.2 Empirical Evaluation of (G, k)
3.3 Scalable Influence Maximization
3.3.1 Reducing the Number of Influence Spread Evaluations
3.3.2 Speeding Up Influence Computation
3.3.3 Other Scalable Influence Maximization Schemes

4 Extensions to Diffusion Modeling and Influence Maximization
4.1 A Data-Based Approach to Influence Maximization
4.2 Competitive Influence Modeling and Maximization
4.2.1 Model Extensions for Competitive Influence Diffusion
4.2.2 Maximization Problems for Competitive Influence Diffusion
4.2.3 Endogenous Competition
4.2.4 A New Frontier – The Host Perspective
4.3 Influence, Adoption, and Profit
4.3.1 Influence vs. Adoption
4.3.2 Influence vs. Profit
4.4 Other Extensions

5 Learning Propagation Models
5.1 Basic Models
5.2 IC Model
5.3 Threshold Models
5.3.1 Static Models
5.3.2 Does Influence Remain Static?
5.3.3 Continuous Time Models
5.3.4 Discrete Time Models
5.3.5 Are All Objects Equally Influence Prone?
5.3.6 Algorithms
5.3.7 Experimental Validation
5.3.8 Discussion

6 Data and Software for Information/Influence: Propagation Research
6.1 Types of Datasets
6.2 Propagation of Information “Memes”
6.2.1 Microblogging
6.2.2 Newspapers/blogs/etc.
6.3 Propagation of Other Actions
6.3.1 Consumption/Appraisal Platforms
6.3.2 User-Generated Content Sharing/Voting
6.3.3 Community Membership as Action
6.3.4 Cross-Provider Data
6.3.5 Phone Logs
6.4 Network-Only Datasets
6.4.1 Citation Networks
6.4.2 Other Networks
6.5 Other Off-Line Datasets
6.6 Publishing Your Own Datasets
6.7 Software Tools
6.7.1 Graph Software Tools
6.7.2 Propagation Software Tools
6.7.3 Visualization
6.8 Conclusions

7 Conclusion and Challenges
7.1 Application-Specific Challenges
7.1.1 Prove Value for Advertising/Marketing
7.1.2 Learn to Design for Virality
7.1.3 Correct for Sampling Biases
7.1.4 Contribute to Other Applications
7.2 Technical Challenges
7.3 Conclusions

A Notational Conventions

Bibliography

Authors’ Biographies

Index

Acknowledgments
The material presented here has been deeply influenced by the research we conducted with
many wonderful colleagues, students, and post-doctoral fellows as well as the numerous stim-
ulating discussions we have had with them. We would like to express our gratitude to our col-
laborators, including: Smriti Bhagat, Francesco Bonchi, Alex Collins, Rachel Cummings, Aris-
tides Gionis, Amit Goyal, Xinran He, Dino Ienco, Qingye Jiang, Te Ke, Yanhua Li, Zhenming
Liu, Wei Lu, Michael Mathioudakis, David Rincón, Guojie Song, Xiaorui Sun, Antti Ukko-
nen, Suresh Venkatasubramanian, Chi Wang, Yajun Wang, Wei Wei, Siyu Yang, Yifei Yuan, Li
Zhang, and Zhi-Li Zhang. We are grateful to Lewis Tseng, Yajun Wang, and Cheng Yang for
helpful discussions on some technical sections of the book, and to Tian Lin for running a few addi-
tional evaluation tests on top of what previously published papers had and for generating relevant
plots. anks are due to Qiang Li, Tian Lin, Wei Lu, Lewis Tseng, Yajun Wang, and Cheng Yang
for their careful reading of and excellent comments on the manuscript which greatly improved
our presentation. We thank Diane Cerra and Tamer Özsu for their assistance throughout the
preparation of this manuscript and for their patience. Tamer’s editorial comments were especially
helpful in improving the readability. Last but not least, we are indebted to our families, whose
patience and support throughout this project have been invaluable.

Wei Chen, Laks V.S. Lakshmanan, and Carlos Castillo


August 2013

CHAPTER 1

Introduction
In this chapter we motivate the study of influence and information propagation by providing
numerous examples. In addition, we provide some basic definitions.

1.1 SOCIAL NETWORKS AND SOCIAL INFLUENCE


Social networks have been studied extensively by social scientists for decades (e.g., see [Barnes,
1954, Radcliffe-Brown, 1940, Wasserman and Faust, 1994]). Earlier studies have had to confine
themselves to extremely small datasets. Enabled by the Internet and sparked by the recent advent
of online social networking sites such as Facebook, LinkedIn, and Tumblr, research on social
networks is witnessing an unprecedented growth due to the ready availability of large scale social
network data. is has at once led to the development of many exciting applications of online
social networks and to the formulation and the subsequent study of many research questions.
A rich body of such studies has come to be classified as the analysis of influence and information
propagation in social networks.
It is our aim in this book to outline some of the key concepts, developments, and achievements
in this area, to study the driving applications that underlie this research, and to highlight
important challenges that remain open. For convenience and consistency of terminology,
we will use the term social influence analysis or just influence analysis to indicate the analysis
of the diffusion of information or influence through a social network.

1.1.1 EXAMPLES OF SOCIAL NETWORKS


By a social network,¹ we mean a possibly directed graph. A social network may be homogeneous,
where all nodes are of the same type, or heterogeneous, in which case the nodes fall into more
than one type.
Examples of homogeneous networks include the underlying graphs representing friendships
in nearly all social networking platforms (e.g., the list of “friends” in Facebook),
as well as the graphs representing co-authorship or co-worker relationships in collaboration
networks.
Examples of heterogeneous networks include rating networks consisting of people and objects
such as songs, movies, books, etc. Such networks can be found in media appraisal and
consumption platforms such as Last.fm and Flixster. Here, people may be connected to one another
via friendship or acquaintance, whereas objects (songs, movies, etc.) may be linked with one
another by means of similarity of their metadata. For example, two songs may be linked since both
are in the same genre or by the same artist. Similarly, two movies may be linked because they
share a director. In addition, links may be present between people and objects owing to their
rating relationship: e.g., user Sam rating a specific model of Nikon SLR Camera produces a link
between the corresponding nodes. Another example of a heterogeneous network is a scientific
collaboration network between authors, augmented with articles (that are the result of
collaboration), and the venues they are published in. This network consists of three types of
nodes (authors, articles, and venues) and links between nodes of the same type as well as between
nodes of different types.

¹Not to be confused with an online social networking site, which is a mobile or web-based software
platform that allows users to interact with their social connections.
In the bulk of this book, our focus will be on information propagation in homogeneous
networks. We briefly return to heterogeneous networks in Chapter 7.
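
To make the distinction between homogeneous and heterogeneous networks concrete, the following is
a minimal Python sketch and not code from the book. The class name `SocialNetwork`, the type labels
"person" and "object", and the example nodes are illustrative assumptions. The only ingredients
taken from the text are that a social network is a possibly directed graph and that it is
heterogeneous when its nodes fall into more than one type.

```python
from collections import defaultdict


class SocialNetwork:
    """A directed graph whose nodes carry a type label (hypothetical helper)."""

    def __init__(self):
        self.node_type = {}                # node -> type label, e.g. "person"
        self.out_links = defaultdict(set)  # node -> set of successor nodes

    def add_node(self, node, node_type):
        self.node_type[node] = node_type

    def add_link(self, u, v, directed=True):
        # Both endpoints must already be declared with a type.
        if u not in self.node_type or v not in self.node_type:
            raise KeyError("add both endpoints with add_node first")
        self.out_links[u].add(v)
        if not directed:
            # An undirected tie is stored as two directed links.
            self.out_links[v].add(u)

    def is_homogeneous(self):
        # Homogeneous: all nodes share a single type.
        return len(set(self.node_type.values())) <= 1


# A friendship network: every node is a person.
friends = SocialNetwork()
for name in ("Sally", "Sam"):
    friends.add_node(name, "person")
friends.add_link("Sally", "Sam", directed=False)

# A rating network: people and objects mixed in one graph.
ratings = SocialNetwork()
ratings.add_node("Sam", "person")
ratings.add_node("camera", "object")
ratings.add_link("Sam", "camera")  # Sam rated the camera

print(friends.is_homogeneous())  # True
print(ratings.is_homogeneous())  # False
```

Storing an undirected friendship as two directed links keeps a single representation for both
kinds of ties, which is one simple way to honor the "possibly directed" wording of the definition.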

1.1.2 EXAMPLES OF INFORMATION PROPAGATION


We begin with some concrete examples of propagations or information cascades in current online
social networking sites. Consider Facebook, where a user Sally updates her status or writes on a
friend’s wall about a new show in town that she enjoyed. Information about this action is typically
communicated to her friends. When some of Sally’s friends comment on her update, that infor-
mation is passed on to their friends and so on. In this way, information about the action taken
by Sally has the potential to propagate transitively through the network. Sam posts (“tweets”) in
Twitter about a nifty camera he bought which he is happy about. Some of his followers on Twitter
reply to his tweet while others retweet it. In a similar fashion, viewing of movies by users tends
to propagate on Flixster and MovieLens, information about users joining groups or communities
tends to spread through Flickr, adoption of songs and artists by listeners spreads through Last.fm,
and interest in research topics propagates through scientific collaboration networks.
Is there a pattern to these propagation phenomena? What can we learn from analyzing
them and how can we benefit from the results of such analysis? In this chapter, we will address
these questions.

1.2 SOCIAL INFLUENCE EXAMPLES


We begin with a brief overview of several real-life stories that motivate the study of information
propagation in social networks.
In a famous study published in the New England Journal of Medicine, Christakis and Fowler
[2007] analyzed the medical records of about 12,000 patients. They extracted a real offline (as
opposed to online) social network from these records, based on the relationships between the
patients, including friendship, sibling, spouse, immediate neighbor, etc. Their goal was to study
the relationship between non-infectious health conditions, including obesity, and one’s social
neighbors and understand the correlation between having obese social network neighbors and being
obese oneself. Among other things, they found that an individual with an obese friend is
171% more likely to be obese than a randomly chosen person. In cases of obese spouse
and obese sibling, the corresponding numbers were 37% and 40%, respectively. It is to be noted
that their study did not focus on causation but instead on correlation. Still, their study shows
that having obese social contacts is a good predictor of obesity.
The same authors, in an influential book [Christakis and Fowler, 2011], “present compelling
evidence for our profound influence on one another’s tastes, health, wealth, happiness, beliefs,
even weight, as they explain how social networks form and how they operate.” As specific
examples, they argue that back pain spread from West Germany to East Germany once the Berlin
Wall came down, that suicide spreads through communities, that specific sexual practices spread
through friendship networks among teenagers, and that political beliefs and convictions propagate
through networks, the conviction being more intense the denser one’s connections.
In the business area, a famous case demonstrating information propagation leading to
commercial success is the Hotmail phenomenon [Hugo and Garnsey, 2002]. In the mid-1990s,
Hotmail was a relatively unknown e-mail service provider. They had a simple idea, which was
appending to the end of each mail message sent by their users the text “Join the world’s largest
e-mail service with MSN Hotmail. http://www.hotmail.com.” This had the effect of building and
boosting a brand. In a mere 18 months, Hotmail became the number one e-mail provider, with 8
million users [Hugo and Garnsey, 2002]. The underlying phenomenon was that a fraction of the
recipients of Hotmail messages were inspired by the appended message to try it for themselves.
When they sent mail to others, a fraction of them felt a similar temptation. This phenomenon
propagated transitively and soon, adoption of Hotmail became viral.
Viral phenomena of the sort discussed above have sometimes changed lives, as in the
rags-to-riches story of Ted Williams [Zafar, 2012]. He was a homeless person in Columbus, Ohio,
USA, and had had many a brush with the law. In January 2011, a journalist found him at a street
corner and interviewed him. The interview was posted on YouTube, including details
that Williams was a former voice-over artist. Within months, the video attracted 11 million views
and triggered numerous messages of support, including job offers, changing his life forever.
On November 16, 2011, a song from the soundtrack of a then upcoming Indian (Tamil)
movie, called “Why this kolaveri di?”, was released. By November 21, it was a top trend on Twitter.
Within a week of its release, it had attracted 1.3 million views on YouTube and more than a
million “shares” on Facebook, reaching and propagating through many non-Tamil speakers. It
eventually went on to win the Gold Award from YouTube for most views (58 million as of
June 2012) and was featured in mainstream media such as Time, BBC, and CNN.
“Gangnam Style,” a South Korean song released in July 2012, became the first video to reach
1 billion views on YouTube on December 21, 2012. Within one year of its first release, it had
been viewed more than 1.745 billion times, even surpassing Justin Bieber’s “Baby!”
The power of online information diffusion has also been utilized by citizens responding
to natural or man-made disasters. When there was a coordinated terror attack in Mumbai in
November 2008, as the events were unfolding, tweets were being sent via SMS at the rate of
about 16 per second, including in them such information as eyewitness accounts, pleas for blood
donors, location of blood banks and hospitals, etc. A Wikipedia page was up in minutes, providing
a staggering amount of detail and extremely fast “live” updates. A newswire service Metroblog was
set up in short order, containing 112 Flickr photos by a journalist giving a firsthand account of the
aftermath. A Google map with main buildings involved in the attacks, with links to background
and news stories, was immediately set up. In Vancouver, Canada, in the summer of 2011, there
were riots following the Stanley Cup final. Rioters, many of them teens, looted and destroyed
properties in downtown. Many of them were bragging about it in social media, e.g., posing with
Gucci bags in front of burning cars. This triggered a widespread reaction of disgust and was
leveraged in mobilizing a cleanup effort. The amount of data made available for forensics was
staggering: contrasted with 100 h of VHS footage from the 1994 riots, there now was 5000 h worth
of 100 types of digital video available for forensic analysis. This, along with cooperation from the
public, enabled the police to apprehend most of the rioters.

1.3 SOCIAL INFLUENCE ANALYSIS APPLICATIONS

The study of information and influence propagation has found applications in several fields,
including viral marketing, social media analytics, the spread of rumors, stories, interest, trust,
referrals, the adoption of innovations in organizations, the study of human and non-human animal
epidemics, expert finding, behavioral targeting, feed ranking, “friends” recommendation, social
search, etc.
Among these, viral marketing, or word-of-mouth marketing as it is otherwise called, is a
“poster” application of influence analysis. The vision behind this is to activate a small number of
“influential” individuals in a social network through which a large number of other individuals can
be influenced by a viral propagation. Formally, consider a social network represented as a directed
graph $G = (V, E)$ with nodes $V$ corresponding to individuals and links $E \subseteq V \times V$
representing social ties. Furthermore, suppose there is a function $p : E \to [0, 1]$ that associates
a weight or probability $p(u, v)$ with every link $(u, v)$, representing the influence exerted by
user $u$ on $v$. This informally captures the intuition that whenever $u$ performs an action, $v$
also performs the action after $u$, with probability $p(u, v)$. The idea behind viral marketing is
that by getting a small set of users in $V$ (a seed set) to use a product, for instance by giving it
to them for free or at a discounted price, we can reach a much larger set of users through
transitive propagation of influence.
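
The formal setting above can be illustrated with a small simulation. The following is a minimal
Python sketch under stated assumptions, not the book's algorithm: the function names
`simulate_cascade` and `estimate_spread` and the toy network are hypothetical, and the rule that
each newly activated user gets exactly one chance to activate each neighbor is borrowed from the
independent cascade model that the book treats in Chapter 2. This section only gives the informal
intuition for $p(u, v)$.

```python
import random


def simulate_cascade(p, seeds, rng):
    """Run one random propagation from `seeds`; return the set of active nodes.

    `p` maps each link (u, v) to its influence probability p(u, v).
    Assumption: each newly activated u gets a single chance to activate v.
    """
    out_links = {}
    for (u, v), prob in p.items():
        out_links.setdefault(u, []).append((v, prob))

    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, prob in out_links.get(u, []):
                # v adopts with probability p(u, v) unless already active.
                if v not in active and rng.random() < prob:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_spread(p, seeds, runs=1000, seed=0):
    """Average number of active nodes over `runs` simulated cascades."""
    rng = random.Random(seed)
    total = sum(len(simulate_cascade(p, seeds, rng)) for _ in range(runs))
    return total / runs


# Toy network: a -> b -> c with certain links, and a -> d that never fires.
p = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "d"): 0.0}
print(estimate_spread(p, seeds={"a"}))  # 3.0: a, b, and c are always active
```

With probabilities strictly between 0 and 1 the estimate varies with the random seed and the
number of runs, so a fixed `seed` argument is used here only to make the sketch reproducible.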
Interestingly, a family of applications in seemingly different domains and settings fall into a
pattern similar to the one we describe for viral marketing. Consider the water distribution network
of a large metropolitan city [Leskovec et al., 2007, Ostfeld and Salomons, 2004, Ostfeld et al.,
2006]: accidental or deliberate interference can introduce viruses or other contaminants in the
water being distributed. There are sensors capable of detecting any outbreak, but each sensor is
expensive both in itself and in terms of its deployment and maintenance. A natural question is
whether we can find a small set of crucial junctions in the water distribution network in which to
place the sensors, so as to detect any outbreak as quickly as possible. Alternatively, we may wish
to minimize the size of the population potentially affected by the undetected contamination.
Similar ideas can also be applied to study the adoption of innovation in organizations and
the propagation of rumors and information in general through society. An important aspect of
this is the propagation of information through social media, including the blogosphere and
microblogging platforms. In social media, posts are linked to other posts, allowing us to study the
propagation of stories and to determine who is an expert or an influencer on a given topic.
A spate of startups has sprung up around the notion of social media influence, and we
describe a few examples here. Klout (http://klout.com/) claims to compute the overall influence
of users online based on their behavior and their followers’ behavior on Facebook, Twitter, and
LinkedIn. The “klout” score is computed as a function of the true reach, amplification probability,
and network influence. PeerIndex (http://peerindex.com/), on the other hand, identifies
authorities and determines authority scores over the social web on a per-topic basis. PeerIndex
and Influencer50 (http://influencer50.com/) explicitly work with a viral marketing business
model, calculating users’ influence scores by analyzing the likes and retweets they receive online
and recommending the influential people they find to brand-name companies. Users are
encouraged to try their best to be influential and in return receive rewards and offers from such
companies.

1.4 THE FLIP SIDE


The key hypothesis of viral marketing, namely that a small number of influential users can be
found and leveraged to reach a large audience through a social network, is fraught in practice
with many challenges.
Firstly, when and how can we say that there is influence between users? There are at least
two different phenomena surrounding users’ behavior that are different from influence, but may
appear to be such. There is the possibility of homophily, often explained with the popular phrase
“birds of a feather flock together.” What it means is that the tastes of two users who are connected
may be similar. For example, a person who smokes may be married to another smoker, which in
turn may make both of them more prone to developing respiratory diseases. If we observe that
one spouse develops asthma, followed by the other one, can we really claim that the health of
the first influenced that of the second? The existence of a social tie does not necessarily cause a
certain behavior to propagate: the observation may be explained by correlation as opposed to
causation.
The problem of homophily vs. influence has been tackled by some researchers. Anagnostopoulos
et al. [2008] describe a technique called the shuffle test for distinguishing between influence
and correlation. Aral et al. [2009] describe a statistical method for distinguishing between
influence and homophily. By analyzing the day-by-day mobile service adoption behavior of over 27
million Yahoo! users in the Yahoo! instant messaging network, they show that over 50% of what was
previously perceived as behavioral contagion is explained by homophily.
Secondly, researchers have asked whether influence can really drive substantial viral cascades
over real-world social networks. In a series of papers, Watts and Peretti [2007] and Goel et al.
[2012] have challenged the conventional notions and intuitions about social influence causing
large viral spread. On several datasets, they find that, empirically, most adoptions are not due to
peer influence and do not propagate beyond a first step. They argue that social epidemics are not
always responsible for dramatic, possibly sudden social change. While the existence of influence
can be difficult to detect, they do not altogether dismiss the role played by influence. They suggest
that instead of a small number of individuals driving epidemic-like viral cascades, it may be more
realistic to target a relatively large critical mass of users who can then carry the viral campaign.
They call this “big seed” marketing.
On the other hand, there have been other studies [Huang et al., 2012, Iyengar et al., 2011]
revealing the genuine existence of social contagion and influence. Huang et al. [2012] show that
even after removing the effects of homophily, there is clear evidence of influence.
For instance, they find that people rate items recommended by their friends higher than they
otherwise would. Iyengar et al. [2011] analyzed data from a pharmaceutical company
on drug prescriptions for chronic illnesses by physicians in three major U.S. metropolitan areas.
Unlike traditional products, prescription drugs for chronic illnesses involve various risks, e.g., of
the patients developing resistance to the drug. In this case, they found that even after controlling
for other mass media marketing efforts and global network-wide changes, there is genuine social
contagion at work. Gruhl et al. [2004] analyzed blogosphere data and showed, among other
things, that there are indeed influential individuals who are highly effective at contributing to the
spread of “infectious” topics in the blogosphere.
To summarize, the existence of influence and its effectiveness for applications such as viral
marketing depend on the dataset. Researchers have found evidence both supporting and challenging it
in different datasets. For a given situation, a careful analysis of the evidence in the available
datasets should be undertaken before deciding whether to adopt a viral marketing approach.

1.5 OUTLINE OF THIS BOOK


Propagation of information and its study is the central theme of this book. The next chapter
(Chapter 2) describes formally a general framework of stochastic diffusion models to understand
and model these phenomena, introduces the most studied models of propagation in the literature
(independent cascade and linear threshold), and discusses their relationship with some other
models that describe similar phenomena.
Chapter 3 studies the problem of influence maximization to which we have alluded in the
present chapter, specifically in Section 1.3: how to select a small set of seed users to reach virally
a large population through a social network. The complexity analysis of this problem shows that
it is #P-hard under the two main models of propagation. Starting with a greedy algorithm for
influence maximization which has provable approximation guarantees, we introduce a series of
improvements to obtain highly scalable influence maximization algorithms. Some new results
and analyses, such as the #P-hardness of even selecting a single best seed (Corollary 3.3) and the
running time analysis of the original greedy algorithm (Theorem 3.7), have not appeared before
in the literature and are included in this chapter.
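As a rough preview of the greedy idea mentioned above, the following Python sketch picks seeds one at a time, each time adding the node with the largest estimated gain in spread. It is a simplified illustration under stated assumptions, not the algorithm analyzed in Chapter 3. Spread is estimated by Monte Carlo simulation of a cascade in which each newly activated node gets one independent chance per outgoing link. All names (`cascade_size`, `greedy_seeds`, the `out_links` encoding) are hypothetical.

```python
import random


def cascade_size(out_links, seeds, rng):
    """Size of one random cascade started from seeds.

    out_links maps a node u to a list of (v, p) pairs, where p is the
    probability that u activates v (an assumed encoding of the graph).
    """
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, p in out_links.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(out_links, seeds, runs, rng):
    """Average cascade size over a number of simulated runs."""
    return sum(cascade_size(out_links, seeds, rng) for _ in range(runs)) / runs


def greedy_seeds(out_links, nodes, k, runs=200, seed=0):
    """Pick up to k seeds, each time adding the node with the best estimate."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best_node, best_value = None, -1.0
        for v in nodes:
            if v in chosen:
                continue
            value = estimate_spread(out_links, chosen + [v], runs, rng)
            if value > best_value:
                best_node, best_value = v, value
        if best_node is None:  # fewer than k candidate nodes
            break
        chosen.append(best_node)
    return chosen


if __name__ == "__main__":
    toy = {"a": [("b", 0.8), ("c", 0.8)], "x": [("y", 0.5)]}
    print(greedy_seeds(toy, ["a", "b", "c", "x", "y"], k=2))
```

Each round re-estimates the spread for every remaining candidate, so the cost grows with the number of nodes, the number of seeds, and the number of simulation runs. That cost is the motivation for the more scalable algorithms the chapter goes on to describe.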
Chapter 4 deals with variants of the influence maximization problem. Instead of performing
influence maximization on a model of the influence propagations (a model-based approach), we
describe a data-based approach in which the influence of the seed set is estimated from data about
past propagations containing those nodes. We also study influence maximization in a competitive
setting, where two or more ideas or viruses propagate simultaneously and compete with each other,
as well as other extensions of the main paradigm. We include in this chapter some new results
on the submodularity of homogeneous competitive independent cascade models under various
tie-breaking settings (Theorems 4.9 and 4.10) that have not appeared in the literature.
Chapter 5 studies how to learn the parameters of influence propagation models from past
observations. Given a social network and a set of actions that have propagated through it, we
would like to know who is influential over whom and to what extent. There are many aspects of
this problem that can be studied, including the fact that influence weights can change over time.
Chapter 6 addresses a pressing practical issue in this research: the need for appropriate
datasets for experimentation. In addition to describing the types of data that have been
used by research in this field, we overview a set of existing software tools that may be helpful to
researchers.
The last chapter (Chapter 7) describes key challenges for researchers in this area, many of
them related to transferring this research into actual technologies. We also outline a few
algorithmic problems that remain open.
The content of this book is based on the tutorial titled “Information and Influence Spread
in Social Networks,” given by the authors at the 18th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining (KDD) in August 2012, the slides being available
online [Castillo et al., 2012]. The tutorial slides can serve as a companion to this book, while the
book includes more comprehensive and in-depth coverage of various diffusion models, influence
maximization algorithms and their analysis, and also contains some more recent developments in
the area since the tutorial.
