Demonstration Paper CIKM’18, October 22-26, 2018, Torino, Italy

IM-Balanced: Influence Maximization Under Balance Constraints
Shay Gershtein, Tova Milo, Brit Youngmann and Gal Zeevi
Tel Aviv University
{shayg1,milo,brity,galzeevi}@post.tau.ac.il

ABSTRACT
Influence Maximization (IM) is the problem of finding a set of influential users in a social network, so that their aggregated influence is maximized. IM has natural applications in viral marketing and has been the focus of extensive recent research. One critical problem, however, is that while existing IM algorithms serve the goal of reaching a large audience, they may obliviously focus on certain well-connected populations, at the expense of key demographics, creating an undesirable imbalance, an illustration of a broad phenomenon referred to as algorithmic discrimination. Indeed, we demonstrate an inherent trade-off between two objectives: (1) maximizing the overall influence and (2) maximizing influence over a predefined “protected” demographic, with the optimal balance between the two being open to different interpretations. To this end, we present IM-Balanced, a system enabling end users to declaratively specify the desired trade-off between these objectives w.r.t. an emphasized population. IM-Balanced provides theoretical guarantees for the proximity to the optimal solution in terms of both objectives and ensures an efficient, scalable computation via careful adaptation of existing state-of-the-art IM algorithms. Our demonstration illustrates the effectiveness of our approach through real-life viral marketing scenarios in an academic social network.

KEYWORDS
Influence Maximization; Social Networks; Balance

ACM Reference Format:
Shay Gershtein, Tova Milo, Brit Youngmann and Gal Zeevi. 2018. IM-Balanced: Influence Maximization Under Balance Constraints. In The 27th ACM International Conference on Information and Knowledge Management (CIKM ’18), October 22–26, 2018, Torino, Italy. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3269206.3269212

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
CIKM ’18, October 22–26, 2018, Torino, Italy
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-6014-2/18/10.
https://doi.org/10.1145/3269206.3269212

1 INTRODUCTION
Social networks attracting millions of people, such as Facebook, LinkedIn, and Sina Weibo, have emerged recently as a prominent marketing medium. Influence Maximization (IM) is the problem of finding a set of influential users (termed a seed set) in the network, so that their aggregated influence is maximized [7]. IM has a natural application in viral marketing, where companies promote their brands through the word-of-mouth propagation. This has motivated extensive research [2, 9], emphasizing the development of scalable IM algorithms [6, 11].

The majority of IM works focus on maximizing the overall influence. While this serves the goal of reaching a large audience, IM algorithms may obliviously focus on certain well-connected populations, at the expense of key demographics. This may create an undesirable imbalance, a conspicuous illustration of a broad phenomenon referred to as algorithmic discrimination. In this work, we assume the existence of a boolean function(s) over user profile attributes, which identifies a protected user group(s). This function can model juridical definitions of protected demographics, or be any arbitrary boolean query over multiple attributes.

As an example, consider a high-tech company interested in recruiting researchers, opting for a social media campaign to inform as many candidates as possible, through opinion leaders in the field. Employing an IM algorithm may produce a campaign overlooking a protected group of users, e.g., characterized by gender, age and/or country. This latent, non-meritocratic discrimination harms both potential candidates and the company, impeding the promotion of a balanced environment and creating a vulnerability to discrimination lawsuits. With the pervasiveness of such automated processes, these concerns have substantial economic and moral repercussions.

A related line of research is concerned with targeted IM algorithms [10], which aim to find a seed set maximizing the influence over users relevant to a given topic/context [9]. While these works provide theoretical guarantees for this objective, they do not ensure that the overall influence is sufficiently large, compared to a non-targeted IM algorithm. Continuing with our example, while promoting the exposure to protected users is important, general large exposure is still essential to identify the best candidates, and is not guaranteed by existing targeted IM algorithms. Indeed, as we show in this work, given a seed set size requirement, there exists an inherent trade-off between the two objectives: (1) maximize the overall influence and (2) maximize influence over protected users; hindering a simultaneous optimization of both (see Section 2.2). This trade-off produces a spectrum of different possible combinations of emphases on each objective, with the particular choice being application dependent, corresponding to different interpretations of balance, fairness and diversity in the literature [3, 12, 13].

To the best of our knowledge, IM-Balanced is the first system enabling users to declaratively specify the desired trade-off between the two objectives. IM-Balanced allows users to specify the protected population and the notion of balance between the objectives that they wish to achieve. Our algorithm ensures an efficient, scalable computation, while providing theoretical guarantees for the proximity to the optimal solution in terms of both objectives. For simplicity of presentation, we assume in the next sections a single boolean function defining one protected group, but our definitions, results and system apply to multiple functions corresponding to possibly overlapping subsets of the populations.
The two key challenges that our system tackles are as follows.

Balanced IM. What is the correct trade-off between the objectives? As the definition is arguably subjective and context-dependent, it requires a flexible, tunable system that enables explicit management of this trade-off. IM-Balanced allows the user to prioritize the objectives and transform one into a parametrized constraint. For example, “Maximize the overall influence, while ensuring the influence over protected users is above a given threshold”, or, alternatively, “Maximize the influence over protected users, while ensuring the overall influence is above a given threshold”. That is, IM-Balanced enables a tunable definition of balance (to be formally defined in Section 2), where the user can declaratively specify: (1) the required size of the seed set; (2) the protected group of users; (3) the balancing criteria, which translate to customizable objective and constraint functions, along with a size threshold parameter.

Efficiency and Scalability. State-of-the-art IM algorithms are the product of decades of research focused on scalability, capable of managing billion-node networks. A key challenge is, thus, to provide an algorithm on par with existing IM algorithms in terms of performance. Given a user specification, IM-Balanced generates an algorithm instance suiting the user-defined notion of balance. The generated instances employ existing IM algorithms as a white box, with minor adaptations, thus supporting extensibility and facilitating performance comparable with top-performing IM algorithms. IM-Balanced is also assured to satisfy the constraint while providing theoretical guarantees for the objective.

Demonstration overview. We demonstrate the operation of IM-Balanced through the reenactment of a real-life viral marketing scenario, where the system is used to identify influential individuals in the research community, for the purpose of recruiting researchers for a high-tech company. For our illustration, influence is captured in terms of citations and collaborations, inferred from a real-life social network [1], and protected groups are illustratively defined in terms of various attributes such as gender and nationality. We present several examples of subpopulations that are indeed neglected by standard IM algorithms, alongside results obtained by our system, demonstrating the advantages of our approach. The audience will actively participate in the demonstration by selecting their desired protected groups and composing various balance definitions, then reviewing the results in contrast to those obtained by other baseline approaches. See more details in Section 3.

2 TECHNICAL BACKGROUND
We start by providing a brief overview of the Influence Maximization (IM) problem, then present our framework for Balanced-IM. Due to space constraints, proofs are deferred to our technical report [5].

2.1 Influence Maximization
We model a social network as a directed weighted graph $G = (V, E, W)$, where $V$ is the set of nodes and each edge $(u, v) \in E$ is associated with a weight $W(u, v) \in [0, 1]$, which models the probability that node $u$ will influence its neighbor $v$. Given a function $I(\cdot)$ dictating how influence is propagated in the network and a number $k$, IM is the problem of finding a seed set $O = \arg\max_{T \subseteq V,\, |T| = k} I(T)$, where $I(T)$ denotes the expected number of nodes influenced by $T$. The function $I(\cdot)$ is defined by an influence propagation model. The majority of existing IM algorithms apply to the Independent Cascade (IC) and the Linear Threshold (LT) models [2, 6], both proposed in [7]. Our results hold under both models, but, for simplicity of presentation, we focus on the IC model.

We refer to influenced nodes as covered. Initially only seed nodes are covered. In the IC model the propagation is carried out in discrete steps, s.t. each node covered in the preceding step attempts to influence its uncovered neighbors, with an independent probability indicated by the weight of the edge connecting them.

Selecting the optimal seed set is NP-hard, with inapproximability beyond a factor of $(1 - \frac{1}{e})$ [7]. Recent IM algorithms, based on the Reverse Influence Sampling (RIS) approach [2], achieve optimal accuracy in near-optimal time [11]. The RIS framework samples nodes independently and uniformly, then, for each sampled node, constructs a Reverse Reachability (RR) set consisting of its sampled sources of influence. Next, the problem is reduced to an instance of the Maximum Coverage problem, where $k$ nodes are selected with the goal of maximizing the number of covered RR sets.

Figure 1: Example colored social network.

2.2 Balanced IM
Our framework supports multiple (possibly overlapping) protected groups; however, for simplicity, we assume here a single such group. In our setting, nodes (users) are colored s.t. a subset of users (blue nodes) belong to a protected group, with all other users (red nodes) referred to as non-protected. For example, Figure 1 depicts a sample colored network. Recall that $I(S)$ denotes the expected number of users covered by $S$, and let $I_r(S)$, $I_b(S)$ denote the expected number of red/blue users covered by $S$, resp. Additionally, let $O$ denote the optimal $k$-size solution in terms of cover size. For example, in Figure 1, for $k = 2$, $O = \{e, g\}$, $I(O) = 4\frac{1}{2}$, $I_r(O) = 4\frac{1}{8}$ and $I_b(O) = \frac{3}{8}$. One can see that $O$ covers almost exclusively red users.

To obtain a more balanced solution, one may request that a larger number of blue nodes should be covered. However, this requirement alone, if not properly constrained, may lead to a drastic decrease in the overall cover size, rendering the solution undesirable. To illustrate, consider again Figure 1 with $k = 2$. As mentioned, the size of the optimal solution in terms of cover size is $I(O) = 4\frac{1}{2}$ and $I_b(O) = \frac{3}{8}$. Nonetheless, the optimal solution in terms of covered blue nodes is $B = \{d, f\}$, where $I(B) = I_b(B) = 2$ and $I_r(B) = 0$, which covers a greater number of blue nodes at the cost of significantly reducing the overall cover.

This simple example exposes the inherent trade-off between these two objectives, implying that instead of naively maximizing both simultaneously, one should prioritize the objectives and transform the secondary objective into a constraint strong enough to ensure balance, but weak enough to provide a necessary degree of freedom in optimizing the main objective. Towards this end, IM-Balanced enables the user to specify: (1) The number $k$ of seed
nodes; (2) The protected group of users; (3) The objective and the constraint functions, and (4) The threshold parameter $t \in [0, 1]$, that restricts the extent to which the solution is allowed to deviate from the optimum for the constraint.

To illustrate, consider the following example definition of Balanced IM, referred to as the protected-oriented definition: Given the parameters $k$ and $t$, find a seed set $B^*$ that maximizes the number of covered blue nodes, subject to a constraint on the overall cover size being above the specified fraction of its optimal (possibly unbalanced) maximal value. Namely,

$$B^* = \arg\max_{|T| = k,\; I(T) \ge t \cdot (1 - \frac{1}{e}) \cdot I(O)} I_b(T)$$

Recall that $O$ is the optimal solution in terms of cover size. Note that in the above formula the expected cover size of a given set is compared to $(1 - \frac{1}{e}) \cdot I(O)$, rather than to $I(O)$, since even for standard IM, unless P = NP, no polynomial algorithm can guarantee a cover size greater than $(1 - \frac{1}{e}) \cdot I(O)$ [7].

Similarly, one can choose to maximize the overall cover size, subject to the constraint that enough blue nodes are covered. We refer to this definition as size-oriented. Namely, find a set $O^*$ satisfying:

$$O^* = \arg\max_{|T| = k,\; I_b(T) \ge t \cdot (1 - \frac{1}{e}) \cdot I_b(B)} I(T)$$

where $B$ is the optimal $k$-size solution in terms of blue nodes. Here again, we compare $I_b(T)$ to $(1 - \frac{1}{e}) \cdot I_b(B)$ rather than to $I_b(B)$, as we can prove that the same complexity bound mentioned above holds for this variation as well. The user can similarly choose other balance definitions that, e.g., constrain the number of covered red nodes, or add constraints on the selected seed nodes.

2.3 Computing the Balanced IM solution
Given a user specification, IM-Balanced generates an algorithm instance suited for that notion of balance. As mentioned, the instance employs, as a white box, an existing RIS-based IM algorithm (e.g., [6, 11]). We start by briefly describing our generic modification of a given IM algorithm, followed by our full solution scheme.

Protected-aware IM. Given an (RIS-based) algorithm $A$, we define $A_b$ to be its protected-aware version, i.e., while $A$ maximizes the overall cover size, $A_b$ maximizes the number of covered blue nodes exclusively. Any RIS-based algorithm can be adapted to its protected-aware counterpart via a single modification: the RR sets are sampled from blue nodes only. We can prove that $A_b$ outputs a solution covering at least $(1 - \frac{1}{e}) \cdot I_b(B)$ blue nodes, where $B$ is the optimal $k$-size solution maximizing $I_b(\cdot)$ [5]. Analogously, we define $A_r$ to be a variant of an IM algorithm $A$, which maximizes the influence over the red nodes exclusively.

Generating a definition-dedicated algorithm instance. To ease the presentation, we first illustrate the algorithm template generated for the protected-oriented balance definition, then briefly explain how this generalizes to support customizable alternative definitions. The algorithm instance IM-Balanced generates for this definition is depicted in Algorithm 1. It runs two procedures independently: one ensures that the solution satisfies the constraint¹ (line 3.1), and the second maximizes the objective (line 3.2). It then returns the union $S$ of the selected seeds. If $S$ contains fewer than $k$ seeds, it runs $A_b$ on the residual problem to complete the seed set (lines 5-7).

¹ The rounding operation ensures the number of seeds is an integer.

Algorithm 1 Algorithm instance for the protected-oriented balance definition.
1: Input: The parameters $k$ and $t$ and an algorithm $A$.
2: Output: A $k$-size solution $S$.
3: Run the following two procedures independently:
   (1) $S_1 \leftarrow$ Run algorithm $A$ with $k' = \lceil t \cdot k \rceil$.
   (2) $S_2 \leftarrow$ Run algorithm $A_b$ with $k' = \lfloor (1 - t) \cdot k \rfloor$.
4: $S \leftarrow S_1 \cup S_2$
5: if $|S| < k$ then
6:    Run $A_b$ again until $k$ seeds are gathered.
7: end if
8: return $S$

Recall that $B^*$ denotes the $k$-size optimal seed set for the protected-oriented definition. We can prove that Algorithm 1 guarantees a $(1 - t) \cdot (1 - \frac{1}{e})$-approximation to the protected-oriented definition. That is: $I_b(S) \ge (1 - t) \cdot (1 - \frac{1}{e}) \cdot I_b(B^*)$ and $I(S) \ge t \cdot (1 - \frac{1}{e}) \cdot I(O)$. Note that the time complexity of the algorithm depends on that of $A$ (we run $A$ twice), which is nearly optimal [6, 11].

Finally, we conclude with a brief explanation of the algorithm instances generated for other balance definitions. Conceptually, given a balance definition, all that needs to be adjusted is the number of seeds required for each of the algorithms $A$, $A_b$ and $A_r$. For example, to comply with the size-oriented definition, we set algorithms $A$ and $A_b$ to return $\lceil (1 - t) \cdot k \rceil$ and $\lfloor t \cdot k \rfloor$ seeds, resp. As another example, one may ask to maximize the number of covered blue nodes, subject to a constraint on the number of covered red nodes. To support this definition, we run $A_b$ and $A_r$ to return $\lceil (1 - t) \cdot k \rceil$ and $\lfloor t \cdot k \rfloor$ seeds, resp. For more details see [5].

3 SYSTEM AND DEMONSTRATION
Implementation. IM-Balanced is implemented in Python 2.7. The IM algorithm our prototype employs is IMM [11]. The user specifies her balance definition via the UI, implemented in HTML5/CSS3, then the system generates, and efficiently runs, the suitable algorithm instance. The results are displayed on a results page, along with charts depicting various statistics. The user may examine and compare results, and correspondingly refine her inquiry.

Demonstration. We demonstrate the capabilities of IM-Balanced through the reenactment of a real-life viral marketing scenario, where the system is used for the advertisement of open research positions in a high-tech company. For our illustration, we have constructed a graph based on a social network of researchers extracted from [1], focusing on the Database and Data Mining community. The profile of a researcher includes details about her gender, country, etc. The mutual influence relations capture information on the authors’ past collaborations/citations (with weights reflecting the portion of collaborations/citations between two authors).

In our first demonstration scenario, we assume (for the sake of illustration) that the company wants to ensure that a large number of researchers are informed about the opening, while guaranteeing that enough young female researchers are informed as well. Using the system’s UI, as depicted in Figure 2a (from top to bottom), CIKM participants will choose: (1) which part of the network to consider (only the collaboration edges, only citations, or both); (2) the properties of the protected group - here, females under 40 (the cardinality of this group is displayed in real time); (3) the required seed set size $k$ (the expected influence of a standard IM algorithm
for this $k$ is also displayed); (4) a balance definition: the objective - here, maximize the overall influence, and the constraint - the size of the protected cover is at least 50% of the size of the optimal cover of protected users ($t = 0.5$). Once the “Compute a balanced solution” button is pressed, the system runs the instantiated algorithm. The results page, as portrayed in Figure 2b, is then presented.

Figure 2: IM-Balanced UI. (a) Balance definition builder. (b) Results page.

As one of our key objectives is to enable the audience to gauge the effectiveness of IM-Balanced, the results are displayed alongside those of previous IM algorithms that either focus solely on maximizing the overall influence [11] or on the protected group alone (targeted-IM) [9]. To illustrate, Figure 2b depicts such a comparison, showing that the solution returned by IM-Balanced influences almost the same number of users as the standard IM algorithm (7,076 vs. 9,433), while influencing almost twice as many protected users. We can see that IM-Balanced also comes very close to the targeted-IM solution w.r.t. the influenced protected users (2,972 vs. 3,302), while influencing overall significantly more users, and thus is clearly more advantageous. Additionally, the selected seeds (ordered alphabetically) of each algorithm are presented to the participants, allowing for the discovery, e.g., of which researchers have significant influence on young women while at the same time influencing the community at large. The audience will then be invited to formulate other balance notions by tuning the protected group definition and the balance criteria (e.g., switch between the objective and the constraint), and will examine how these affect the results.

Finally, we will present to the participants other viral marketing tasks, such as calls for nominations for awards/grants applications. In each scenario, we will examine various protected populations which are neglected by standard IM algorithms, consider several balancing criteria, and correspondingly examine how these affect the selected seeds and the size/type of the influenced populations. Last, to demonstrate the robustness of IM-Balanced, interested participants will be further allowed to examine the system’s operation on additional real-life social networks such as Pokec, a popular social network in Slovakia, and data extracted from Twitter [8], whose graph datasets were also ported into the system.

4 RELATED WORK
The study of algorithmic discrimination, fairness and diversity has been gaining popularity in recent years. Work on fairness, with the aim of remedying algorithmic bias against groups or individuals on unreasonable grounds, has focused largely on predictive tasks [12] and ranking [3, 13]. Diversity, i.e., ensuring that different kinds of objects are represented in the output of an algorithm as opposed to similar high-scoring results, has been studied in the context of search engines and recommender systems [4]. Each of these concepts is naturally subject to various context-dependent interpretations and plays a different role in studying algorithmic imbalances, hence the parametrization of our framework to accommodate a variety of applications. Our definition of protected group generalizes the definitions used in previous work [4, 13], by capturing both binary and non-binary attributes of possibly overlapping subsets of users.

IM has been studied extensively, with emphasis on scalable performance [6, 7]. IM-Balanced can employ any top-performing IM algorithm (e.g., [11]) in a white-box manner, match its performance and take advantage of all its optimizations (e.g., parallelized computation), while also retaining theoretical accuracy guarantees.

Acknowledgments. We thank Julia Stoyanovich for her insightful comments on this work. This work has been partially funded by the European Research Council under the FP7, ERC grant MoDaS, agreement 291071, and by grants from Intel, the Israel Innovation Authority and the Israel Science Foundation.

REFERENCES
[1] AMiner. 2018. AMiner. (2018). https://aminer.org/data.
[2] Borgs C., Brautbar M., Chayes J., and Lucier B. 2014. Maximizing Social Influence in Nearly Optimal Time. In SODA.
[3] Celis L. E., Straszak D., and Vishnoi N. K. 2017. Ranking with fairness constraints. arXiv preprint arXiv:1704.06840 (2017).
[4] Drosou M., Jagadish H. V., Pitoura E., and Stoyanovich J. 2017. Diversity in big data: A review. Big Data (2017).
[5] Gershtein S., Milo T., Youngmann B., and Zeevi G. 2018. Balanced IM: Technical Report. (2018). Tel Aviv University.
[6] Huang K., Wang S., Bevilacqua G., Xiao X., and Lakshmanan L. V. S. 2017. Revisiting the Stop-and-stare Algorithms for Influence Maximization. PVLDB (2017).
[7] Kempe D., Kleinberg J., and Tardos E. 2003. Maximizing the Spread of Influence Through a Social Network. In KDD. ACM.
[8] Leskovec J. and Krevl A. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data. (June 2014).
[9] Li Y., Zhang D., and Tan K.-L. 2015. Real-time Targeted Influence Maximization for Online Advertisements. PVLDB (2015).
[10] Song C., Hsu W., and Lee M. L. 2016. Targeted Influence Maximization in Social Networks. In CIKM. ACM.
[11] Tang Y., Shi Y., and Xiao X. 2015. Influence Maximization in Near-Linear Time: A Martingale Approach. In SIGMOD.
[12] Zafar M. B., Valera I., Gomez Rodriguez M., and Gummadi K. P. 2017. Fairness constraints: Mechanisms for fair classification. arXiv preprint arXiv:1507.05259 (2017).
[13] Zehlike M., Bonchi F., Castillo C., Hajian S., Megahed M., and Baeza-Yates R. 2017. FA*IR: A fair top-k ranking algorithm. In CIKM. ACM.
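
Illustrative sketch of Algorithm 1. To make the seed-budget split of Algorithm 1 (Section 2.3) concrete, the following minimal Python 3 sketch may help. It is not the authors' implementation: instead of IMM [11], which chooses its sample size adaptively, it uses a fixed number of RR sets followed by greedy maximum coverage, so it carries none of the formal guarantees stated in the paper. The function names, the `in_edges` adjacency format (node mapped to a list of `(source, weight)` pairs), the `n_samples` default and the toy graph are all assumptions made for illustration. The protected-aware variant $A_b$ is obtained exactly as described in Section 2.3, by drawing RR-set roots from blue nodes only.

```python
import math
import random


def sample_rr_set(in_edges, root, rng):
    """Reverse reachability set of `root` under IC: walk incoming edges,
    keeping each edge (u -> v) independently with probability w."""
    rr, frontier = {root}, [root]
    while frontier:
        v = frontier.pop()
        for u, w in in_edges.get(v, ()):
            if u not in rr and rng.random() < w:
                rr.add(u)
                frontier.append(u)
    return rr


def ris_select(nodes, in_edges, roots, k, n_samples, rng, chosen=()):
    """Greedy maximum coverage over RR sets whose roots are drawn uniformly
    from `roots`. RR sets already hit by `chosen` are discarded first, so a
    non-empty `chosen` corresponds to solving the residual problem."""
    chosen = set(chosen)
    rr_sets = [sample_rr_set(in_edges, rng.choice(roots), rng)
               for _ in range(n_samples)]
    uncovered = [rr for rr in rr_sets if not (rr & chosen)]
    candidates = [v for v in sorted(nodes) if v not in chosen]
    picked = []
    for _ in range(min(k, len(candidates))):
        gain = {v: 0 for v in candidates}
        for rr in uncovered:
            for u in rr:
                if u in gain:
                    gain[u] += 1
        best = max(candidates, key=gain.__getitem__)  # ties: smallest node id
        picked.append(best)
        candidates.remove(best)
        uncovered = [rr for rr in uncovered if best not in rr]
    return picked


def im_balanced_protected_oriented(nodes, in_edges, blue, k, t,
                                   n_samples=2000, seed=0):
    """Sketch of Algorithm 1 (protected-oriented definition)."""
    rng = random.Random(seed)
    all_roots, blue_roots = sorted(nodes), sorted(blue)
    # Line 3.1: ceil(t * k) seeds for A; round() guards against float noise.
    k1 = min(k, math.ceil(round(t * k, 9)))
    # Line 3.2: for integer k, k - ceil(t * k) equals floor((1 - t) * k).
    k2 = k - k1
    s1 = ris_select(nodes, in_edges, all_roots, k1, n_samples, rng)   # A
    s2 = ris_select(nodes, in_edges, blue_roots, k2, n_samples, rng)  # A_b
    seeds = list(dict.fromkeys(s1 + s2))  # line 4: union, order preserved
    if len(seeds) < k:  # lines 5-7: top up with A_b on the residual problem
        seeds += ris_select(nodes, in_edges, blue_roots, k - len(seeds),
                            n_samples, rng, chosen=seeds)
    return seeds


# Hypothetical toy graph: red hub 0 reaches red nodes 1-3,
# blue hub 4 reaches blue node 5; all edge weights are 1.0.
nodes = [0, 1, 2, 3, 4, 5]
in_edges = {1: [(0, 1.0)], 2: [(0, 1.0)], 3: [(0, 1.0)], 5: [(4, 1.0)]}
print(im_balanced_protected_oriented(nodes, in_edges, blue={4, 5}, k=2, t=0.5))
```

Under these assumptions, the size-oriented instance would only swap the two budgets ($\lceil (1 - t) \cdot k \rceil$ seeds for $A$ and $\lfloor t \cdot k \rfloor$ for $A_b$), and $A_r$ would be obtained by passing the red nodes as `roots`.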
