2020 Linkedin Ads Allocation in Feed Via Constrained Optimization
ABSTRACT
Social networks and content publishing platforms have newsfeed applications, which show both organic content to drive engagement, and ads to drive revenue. This paper focuses on the problem of ads allocation in a newsfeed to achieve an optimal balance of revenue and engagement. To the best of our knowledge, we are the first to report practical solutions to this business-critical and popular problem in industry.

The paper describes how a large-scale recommender system like feed ranking works, and why it is useful to consider ads allocation as a post-operation once the ranking of organic items and (separately) the ranking of ads are done. A set of computationally lightweight algorithms is proposed based on various sets of assumptions in the context of ads on the LinkedIn newsfeed. Through both offline simulation and online A/B tests, benefits of the proposed solutions are demonstrated. The best performing algorithm is currently fully deployed on the LinkedIn newsfeed and is serving all live traffic.

CCS CONCEPTS
• Information systems → Computational advertising; Online advertising; Social advertising; Rank aggregation.

KEYWORDS
Social Networks, Computational advertising, Constrained optimization, User feedback modeling

ACM Reference Format:
Jinyun Yan, Zhiyuan Xu, Birjodh Tiwana, Shaunak Chatterjee. 2020. Ads Allocation in Feed via Constrained Optimization. In 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’20), August 23–27, 2020, Virtual Event, USA. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3394486.3403391

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
KDD ’20, August 23–27, 2020, Virtual Event, USA
© 2020 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-7998-4/20/08...$15.00
https://doi.org/10.1145/3394486.3403391

1 INTRODUCTION
The newsfeed, also referred to simply as feed, is a popular product on many social network platforms. Facebook, Instagram, Pinterest, Twitter, TikTok, and LinkedIn are some examples with a central feed product, where hundreds of millions of users consume content on a daily basis. Users visit feed for relevant content, which includes friends’ and followees’ updates, company news, group activities and job postings. We refer to such content, where creators do not pay to have their content shown to users, as organic content. Users engage with such content because of their own interest in the content or in the creator or both. Showing relevant organic content helps retain users and grow their long-term engagement on the platform.

Monetization is another key consideration for most social network platforms and the common mechanism is to insert ads (which are “sponsored” updates in contrast to the aforementioned organic updates), in the feed. These ads have a native feel and often blend well with the surrounding organic updates. Such an ad product helps advertisers reach their target audience (thereby expanding and eventually earning profit), while simultaneously enabling the platform to build a viable business with the ads revenue. Search ads and display ads (often filled via demand side platforms (DSP) or ad networks) have a similar underpinning. There is very limited reported work on identifying the optimal number of search ads to insert at the top of the page [23].

To the best of our knowledge, there has been no reported work on how to determine optimal positions for ads in the feed, and the effectiveness of various approaches in a real-world large-scale application. This problem has more complexity than the search ads problem since ads can be inserted at any position. Given the popularity of social networks, and the importance of monetization via ads on feed, this is a very critical problem and is the focus of our current work. The methods we propose generalize to merging two (or more) content streams in a feed-like application, but certain artifacts are more likely to be observed in ads (e.g., the “gap effect” as explained in Section 4).

Organic items are the main driver for engagement, quantified by various affirmative user actions. The ranking objective for organic content is maximizing engagement. Ads, on the other hand, are ranked to maximize expected revenue and typically involve an auction, which determines pricing based on the order of ads. The importance of both these utilities and the vastly varying factors at play for each result in the following two outcomes in most large-scale feed ranking systems:
1. Individual, separate systems determine ranking of organic items and ads (often with tons of custom information).
2. The blending layer, which comes after the individual ranking is computed, is required to respect the original ranking among items scored by the same system.

These two objectives are often in conflict on the feed because there are limited impression slots. This would not happen if ads were more engaging than organic items, but that is rarely the case. The problem we address is how to allocate impressions to ads in feed efficiently to balance engagement and revenue. One solution is to simply use fixed slots. For instance, in search ads and display ads on publishers, as well as in some feed applications, ads are often allocated to pre-determined slots. However, such a solution can be quite suboptimal for feed as shown in the example in Figure 1.

One key requirement in blending two sets of results when they are optimizing for different utilities is a conversion factor among the utilities. In some cases, these could be specified by the business
Applied Data Science Track Paper B KDD ‘20, August 23–27, 2020, Virtual Event, USA
3 SYSTEM OVERVIEW
We first describe the overview of a large-scale recommender system like the Feed. It is based on how the LinkedIn Feed operates, but many design choices and requirements should be representative of Feed ranking systems in the industry. The overall objective of Feed ranking is to produce a combined list of organic items and ads to achieve an optimal balance of engagement and revenue.

3.1 Organic items ranking
Engagement on organic items is often quantified by users’ actions, including clicks, likes, comments, shares, conversion, and/or dwell time. Organic content is the main driver of user engagement, because the creator and content are relevant and of interest to viewers. The objective in ranking organic content is to maximize expected user engagement. There is some excellent work in literature covering the key factors to consider in building such models, both in terms of signals and from a system architecture perspective [4, 5, 20]. Ads also drive a small portion of engagement through users’ click and/or conversion activities.

The system complexity of such a recommender system could be high because of the number of features that need to be fetched, as well as the scoring complexity of the model used. In Figure 2, the “Organic ranking” module represents this component.

advertisers. Changing the ranking of either list is highly undesired. Hence, our blending algorithms take this as a constraint, which in turn helps simplify the solution. We treat the estimated utilities coming from the initial ranking layer as accurate and expect them to be monotonically decreasing in each ranked list.
• Modeling velocity. In many medium- to large-sized companies, there are big teams working on the organic ranking and ads ranking modules separately (often, there are multiple teams focusing on various aspects within each module). It is imperative to have a setup that allows each of those modules to iterate asynchronously. This was one of the biggest factors in our decision to adopt the two-phase ranking, and have most of the complexity of ranking nuances be handled in the respective ranking modules. In order to ensure seamless iterations for the upstream ranking modules, we have to automatically adjust (or “auto-tune”) certain parameters in our blending algorithm. We will revisit this detail in Section 5.4.
3.5 Additional desired properties
There are two other considerations specific to ads that a good blending algorithm should satisfy. Results in Sections 5.3.1 and 5.3.2 show how effective our algorithms are on these dimensions.

Adaptation to seasonality. Ads demand (i.e., the bids and budgets) may have a quarterly and annual periodic pattern. A good blending algorithm will increase (decrease) ad impressions when the revenue utilities are systematically higher (lower). In some applications, the organic engagement could also have such temporal patterns (e.g., higher engagement during holidays on platforms like Facebook) which the blending mechanism should help adapt to. This is in addition to the local fluctuations and variations in revenue and organic engagement utility that the blending algorithm is already helping to capitalize on.

Dynamic ads positions. Ads blindness is the term used to refer to the user behavior when a user gets used to seeing ads in a particular spot on the Feed or other surfaces. If users don’t find ads as engaging, they may develop blindness for that regular spot, which may lead to not noticing relevant ads. If the blending mechanism places ads at different spots in different sessions for each user (to achieve its revenue and engagement maximization objectives), then that is an extra benefit to help counter phenomena like ads blindness.

With this context, we will present a set of blending algorithms that will have the following properties:
• Low computational complexity.
• Preserve the order of input rankings.
• Respect user experience guardrails.
• Have the flexibility to adapt to seasonal and local changes in ads demand and/or organic engagement.

4 PROBLEM FORMULATION AND ALGORITHMS
4.1 Problem Formulation
In the paper, we will use $i$ to index impressions, and $j$ to denote requests. The two key utilities corresponding to the two objectives are defined as follows:

Definition 4.1. Expected Engagement Utility. The expected engagement utility for an item being considered for impression $i$ (in the feed request $j$) from a user is denoted as $u_i$. Particularly, $u_i^o$ if it is an organic item, and $u_i^a$ if it is an ad.

We omit the request index for brevity. Also, we do not introduce specific notations for items or item indices to keep the narrative clean. $u^o$ refers to the engagement utilities of a list of candidates, where the list will be clear from the context.

Definition 4.2. Expected Revenue Utility. The expected revenue utility of an item being considered for impression $i$ (from a user’s feed request $j$) is denoted as $r_i$. Particularly, $r_i^o$ for an organic item and $r_i^a$ for an ad. Note that organic content has no revenue utility: $r_i^o = 0$.

Suppose that there are $J$ requests in total. For every request $j$, there are $N_j^o$ organic content candidates, and $N_j^a$ ads to fill a total of $N_j = N_j^o + N_j^a$ slots. Organic candidates are ranked by expected engagement utility $u^o$, while ads are ranked by expected revenue utility $r^a$. Each item is associated with both utilities: $r$ and $u$. For every impression slot $i$, we need to decide whether to pick the top item from the organic ranked list, or the one from the ranked ads list, and then remove the winner from the corresponding ranked list. Let $x_i$ be the variable to decide whether to show an ad in the impression (or slot) $i$. We formulate the problem as a constrained optimization problem as shown in Equation 1.

$$
\begin{aligned}
\text{maximize} \quad & \sum_i x_i r_i - \frac{w}{2}\lVert x \rVert^2 \\
\text{s.t.} \quad & \sum_i \left( x_i u_i^a + (1 - x_i)\, u_i^o \right) \ge C \\
& 0 \le x_i \le 1, \quad \forall i \in I
\end{aligned}
\tag{1}
$$

This formulation is to maximize revenue across all impressions (which are spread over all requests) such that the total engagement is greater than or equal to some constant value ($C$ in this case). $C$ can be set to a fraction (i.e., $\delta$) of the maximum possible engagement (e.g., when there are no ads), but that choice is an orthogonal consideration. It should be noted that we could formulate the problem equivalently as an engagement maximization problem with a revenue constraint. By varying the value of $C$ above, and the revenue constraint in the alternate formulation, we would traverse the same Pareto-optimal curve between the two utilities.

We closely follow [3] (see specifically Section 3) to derive the solution. We cannot solve this constrained optimization on the fly since it is defined across several requests, some users may be new, and candidate items for each member are very dynamic. Hence, we use the Lagrangian dual from Equation 1 to obtain “optimal” primal serving plans for new requests as they arrive. The quadratic term in the objective function is added to introduce strong convexity into the problem and allow easy conversions from dual to primal solutions and vice-versa (the derivatives of the Lagrangian vanish in LPs [8]). Using the Lagrangian duals, the primal solution $x_i$ can be obtained by Equation 2 under two conditions: 1. $u_i^a$, $u_i^o$ and $r_i^a$ are drawn from the same distribution as was the historical data used to solve the primal and obtain the optimal duals, 2. $w \to 0$, which concentrates $x_i$ to one of the vertices in the simplex unless there is a tie. $w \to 0$ is also close to the original business problem.

$$
x_i =
\begin{cases}
1, & \text{if } r_i^a + \alpha u_i^a - \alpha u_i^o > 0 \\
0, & \text{otherwise}
\end{cases}
\tag{2}
$$

The parameter $\alpha$ (which is the optimal Lagrangian dual variable corresponding to the engagement constraint) is a function of $C$. Intuitively, this can be interpreted as a bid for engagement. This “shadow bid” converts engagement into an equivalent monetization amount to enable direct comparison against revenue. As we prioritize engagement with higher values of $C$, $\alpha$ will also increase. This ability to compare the two utilities on the basis of a unified currency is critical for any principled blending of organic items and ads. Table 1 shows the final value for both ads and organic content.

Item type          Score
Ad                 $r_i^a + \alpha u_i^a$
Organic content    $\alpha u_i^o$
Table 1: Score to compare ad and organic content at each position
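To make the roles of $w$ and $\alpha$ concrete, the following is a minimal Python sketch, not the production implementation. Setting the derivative of the Lagrangian of Equation 1 with respect to $x_i$ to zero gives $x_i = (r_i + \alpha (u_i^a - u_i^o)) / w$, clipped to $[0, 1]$ (with $r_i = r_i^a$, since organic items carry no revenue); as $w \to 0$ this collapses to the threshold rule of Equation 2. The function names and the sample utilities below are illustrative assumptions and do not come from the paper.

```python
def primal_from_dual(r_ad, u_ad, u_org, alpha, w):
    """Closed-form x_i for a given dual alpha and regularizer w > 0.

    Stationarity of the Lagrangian gives (r + alpha * (u_a - u_o)) / w;
    the box constraint 0 <= x_i <= 1 is enforced by clipping.
    """
    score = r_ad + alpha * (u_ad - u_org)
    return min(1.0, max(0.0, score / w))


def show_ad(r_ad, u_ad, u_org, alpha):
    """Equation 2: the w -> 0 limit, a hard threshold on the same score."""
    return 1 if r_ad + alpha * u_ad - alpha * u_org > 0 else 0


# Hypothetical utilities for a single slot (assumed values).
r_ad, u_ad, u_org = 0.05, 0.01, 0.03

# With a modest shadow bid the ad's revenue outweighs the engagement loss.
print(show_ad(r_ad, u_ad, u_org, alpha=2.0))
# A larger shadow bid (stricter engagement constraint) flips the decision.
print(show_ad(r_ad, u_ad, u_org, alpha=3.0))
# Shrinking w pushes the relaxed solution toward the same 0/1 decision.
print(primal_from_dual(r_ad, u_ad, u_org, alpha=2.0, w=1.0))
print(primal_from_dual(r_ad, u_ad, u_org, alpha=2.0, w=1e-3))
```

The two scores in Table 1 are simply the two sides of the inequality inside `show_ad`, which is why a single scalar $\alpha$ is enough to compare an ad against an organic item at serving time.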
3389
Applied Data Science Track Paper B KDD ‘20, August 23–27, 2020, Virtual Event, USA
4.2 Re-Rank Algorithm

Algorithm 1: Re-Ranker
Input: Ranked list of organic content $L^o$ with size $N^o$, ranked list of ads $L^a$ with size $N^a$. Each item is associated with a revenue utility and an engagement utility.
Input: min gap $M$, top slot $T$, shadow bid $\alpha$
Output: Merged list $L$ of organic content and ads.
  Initialize i = 0, j = 0, k = 0, L = [], prevIdx = 0;
  while i < N^o and j < N^a do
    if k > T and (k − prevIdx) > M and (r^a[j] + α u^a[j]) > α u^o[i] then
      L.append(L^a[j++])
      prevIdx = k
    else
      L.append(L^o[i++])
    k++
  while i < N^o do
    L.append(L^o[i++])
  Return L

Algorithm 1 describes a lightweight Re-Rank algorithm for the Re-Rank module in Figure 2. It is applied online and assumes that we have already obtained the value of the shadow bid. The algorithm is essentially a merge operation (akin to the merge operation in Merge Sort [12]) which combines two ranked lists without altering the ordering in either list. If all slots are identical, this algorithm provides the optimal solution, considering the objective (as per Equation 1) and the guardrails.

is not available at that time. These position-agnostic estimates are consistent with the “all slots are identical” assumption.

Let $w = (w_1, w_2, \ldots)$ where $1 \ge w_1 \ge w_2 \ge \ldots \ge w_n \ge 0$ be the positional effect on the utility estimates because of the aforementioned bias. If the position bias is invariant to item type, that is $w^a = w^o = w$, the Re-Rank algorithm described in Algorithm 1 is still valid, since $w_k$ is applied to both estimates in the comparison and cancels out. If $w^a \ne w^o$ or $w_k$ is not monotonically decreasing, special considerations will be needed and some sub-optimality could be incurred. In this paper, we will consider the case where position bias only depends on position and is non-increasing.

4.5 Gap Effect
Another important factor that impacts users’ response is the gap between consecutive ads. We formally define gap as follows.

Definition 4.3. Gap. Let $d = k' - k$ denote the gap between two consecutive ads placed at positions $k$ and $k'$ where $k < k'$.

It is a bit challenging to estimate the pure gap effect because there is a confounding position effect (as the position of the ad also varies when we change the gap). We tested with some randomized buckets to observe the effect of different fixed gaps on ads CTR at a fixed position. As Figure 3 shows, ads CTR drops with smaller gaps. We examined the impact on organic items as well. However, unlike the significant effect of the gap on ads’ CTR, we did not observe a significant impact on users’ feedback to organic items. For actual estimation of the gap effect, we added a gap feature in the ads CTR estimation process (which has the position term to handle position bias already) as follows.
Definition 5.2. Discounted Cumulative Engagement (DCE)

$$
\text{score}(E) = \sum_{j \in J} \sum_{k \in [N_j]} w_k \, u_{k,j}
$$

For the positional bias (introduced in Section 4.4), without loss of generality, we assume $w = (w_1, w_2, \ldots)$ are the non-increasing weights of positions, i.e., $1 \ge w_1 \ge w_2 \ge \ldots \ge w_n \ge 0$. This captures a global bias towards position of a list: an item placed in the top position of the list is more likely to be impressed and clicked than an item placed below it. The weight vector we use is as follows:

$$
w_k = \frac{1}{\log_2 (k + 1)}
$$

This is also the weight used in discounted cumulative gain (DCG) and normalized discounted cumulative gain (NDCG) [9], which are very popular metrics in evaluating ranking algorithms. Our evaluation metrics are essentially DCG of revenue and DCG of engagement. Each value of shadow bid $\alpha$ will produce a pair of metrics. By varying $\alpha$, we can obtain a Pareto-optimal frontier [10] for revenue and engagement. Offline replay can be used to generate or identify a good candidate set of shadow bids (or narrow down from a larger list). These promising shadow bids can then be evaluated in online A/B tests.

Data Setting. We randomly sample around 100,000 feed requests from the historical logs of the LinkedIn Feed. Each request has hundreds of organic content candidates and ads candidates. Revenue utilities and engagement utilities for each candidate item, estimated with a default fixed position, are generated before the Re-Rank layer, and are tracked in the logs. A better ad allocation algorithm should map to a better Pareto-optimal tradeoff curve. The value of $\beta$ (the gap effect estimator defined in Section 4.5) decides the shape of the curve, and the hyper-parameter $\alpha$ (i.e., the shadow bid) decides the operating point on the curve. We learn $\beta$ through the CTR prediction task, whose accuracy improves marginally when the gap effect is included.

We picked several values of $\beta$ to demonstrate the difference. For each $\beta$, we select a set of $\alpha$s. We re-rank feed requests based on the chosen $\beta$ and $\alpha$, then compute DCE and DCR. Figure 5 shows the performance curves with different values of $\beta$.

The results suggest that $\beta = 0$ is the most efficient value. This is because the evaluation metrics share the assumption that all slots are independent and there is no gap effect. To bridge the gap in the faulty evaluation metric, we compose a gap weight, which is similar to the position weight used in DCR, and define a more appropriate evaluation metric for this context in Definition 5.3. The DCE metric remains the same, since organic items did not demonstrate any gap effect.

Definition 5.3. Discounted Cumulative Revenue with Gap Effect

$$
\text{score}(R) = \sum_{j \in J} \sum_{k \in [N_j]} \frac{r_{k,j}}{\log_2 (k + 1)} \, \log_{10} (d + c)
$$

where $\log_{10}(d + c)$ is fitted with the observed gap effect in Figure 3. The exact value of $c$ is undisclosed since it reflects very specific user behavior information.

With this modified metric, we re-evaluate the trade-off for the same set of $\beta$ and $\alpha$. Figure 6 shows that $\beta = 0.41$ results in the best trade-off curve, and $\beta = 0$ has the worst performance.

Figure 6: Revenue and engagement tradeoff curve with the modified DCR. A gap effect aware evaluation metric is necessary to show the usefulness of gap effect aware algorithms.

As we see from the above results, different evaluation metrics can lead to diametrically opposite conclusions. The majority of existing literature relies on DCG (or NDCG), which has a strong assumption, to evaluate ranking problems. Blending different types of items can violate such assumptions in certain cases, hence it’s important to select an appropriate evaluation metric. To demonstrate the fidelity of the newly proposed evaluation metric, we now compare its online performance to the $\beta = 0$ variant (which performed best with DCG).
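The offline metrics used in this section can be computed in a few lines. The sketch below is an illustration under stated assumptions, not the authors' evaluation code: it implements the position weight $w_k = 1/\log_2(k+1)$, DCE as in Definition 5.2, and the gap-aware DCR of Definition 5.3. The data layout (one ranked list per request), the function names, and the value `c = 1.0` are hypothetical; the paper does not disclose $c$, and how $d$ is defined for the first ad in a request is left to the caller here.

```python
import math


def position_weight(k):
    """w_k = 1 / log2(k + 1) for a 1-based position k."""
    return 1.0 / math.log2(k + 1)


def dce(requests):
    """Discounted Cumulative Engagement.

    requests: list of ranked lists, each holding the engagement utility
    of the item shown at positions 1, 2, ... of one feed request.
    """
    return sum(position_weight(k) * u
               for slots in requests
               for k, u in enumerate(slots, start=1))


def dcr_with_gap(requests, c):
    """Discounted Cumulative Revenue with the log10(d + c) gap weight.

    requests: list of ranked lists of (revenue, gap) pairs. Organic items
    carry revenue 0 and gap None, so they contribute nothing.
    """
    total = 0.0
    for slots in requests:
        for k, (revenue, gap) in enumerate(slots, start=1):
            if revenue == 0:
                continue  # organic item: no revenue, no gap term
            total += position_weight(k) * revenue * math.log10(gap + c)
    return total


# One hypothetical request: organic, organic, then an ad at position 3.
engagement = [[0.4, 0.0, 0.2]]
revenue_and_gap = [[(0.0, None), (0.0, None), (2.0, 9)]]

print(dce(engagement))
print(dcr_with_gap(revenue_and_gap, c=1.0))  # c is an assumed placeholder
```

Sweeping $\alpha$ for a fixed $\beta$, re-ranking each logged request, and plotting the resulting (DCE, DCR) pairs would trace one trade-off curve of the kind described for Figures 5 and 6.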
We test four groups A, B, C, D, each of which is compared to a group with matched ads impressions from a baseline algorithm (no gap effect, i.e., $\beta = 0$). Each group corresponds to a certain number of ads impressions with gap effect (i.e., with the same $\beta$ ($\ne 0$) and different $\alpha$s). The baseline algorithm is the Re-Rank algorithm without gap effect (i.e., $\beta = 0$, with appropriate $\alpha$ values to match impression volume with its corresponding gap-effect bucket). We ran each variant on 2% of live traffic on LinkedIn for a week. Using group A’s impression volume as the baseline, group B has 4% more impressions than group A, group C has 10% more impressions than group A, and group D has 20% more impressions than group A. Table 3 shows the results. It is clear that Re-Rank with gap effect is much more efficient than the baseline.

Online results also directionally match the offline curves in Figure 6. The difference in the middle is bigger as shown in Groups B and C. In the extreme cases of all or no ads, all Re-Rank algorithms will achieve the same engagement and the same revenue (i.e., 0). Group D gets close to the extreme because of the guardrails and high ad impressions.

Test group (Impressions)   Engagement lift   Revenue lift
A (N)                      0.1%              2%
B (1.04N)                  0.8%              3.9%
C (1.1N)                   0.69%             5.32%
D (1.2N)                   0.37%             1.00%
Table 3: Online A/B test results of Re-Rank with gap effect vs. Re-Rank algorithm. Each row is a comparison under a fixed amount of ads impressions, and group A serves as the baseline. All numbers are statistically significant with p-val < 0.0001.

The Re-Rank with gap effect algorithm enables organic items with high engagement utilities to be placed in better positions that align better with users’ preference, so it achieves a moderate organic engagement lift. The lift number on engagement metrics looks small, but the absolute value is significant, especially since we do not modify the engagement estimate directly but only change ads allocation results. The revenue lift largely comes from the average cost per click (CPC) increase. Ads with high expected revenue utility have a better chance to show at high positions, which results in increased impressions for such ads. These ads with high revenue utility also often have higher CPC (revenue utility and CPC are highly correlated). We used the Mantel-Haenszel method [16] to compute a reweighted CPC that takes stratification into account to address Simpson’s paradox. The reweighted CPC is neutral, which indicates that the revenue lift is not from a systematic price increase, but due to more “expensive” ads getting more results.

requests per day were included. Such diverse experience can help in reducing ads blindness.

Figure 7: Most users experienced some diversity in the set of gaps between ads when Re-Rank with gap effect was used.

5.3.2 Elasticity to Demand. Our ads ecosystem, like on most other platforms, has a strong quarterly and annual seasonality pattern in demand. The increase in demand usually means an increase in total budget and the total number of campaigns. As a consequence, the proportion of ads with higher expected revenue utility per impression also increases. In contrast, the decrease in demand will reduce the portion of high revenue utility ads. If the quantity of ads impressions is fixed (e.g., with fixed slotting), the increase of demand will lead to more competition, hence price per impression will increase. In this case, advertisers’ return on investment (ROI) will get hurt. On the contrary, when the demand gets lower, we could provide a better user experience by showing fewer ads, compared to the same fixed amount of ads shown in high demand season.

It is critical to use an efficient ads allocation algorithm that is responsive to the change of demand. Without hurting engagement, when the demand is high, ads can win more (and better) positions, and when the demand is low, organic items can “take back” these coveted positions from ads. We conducted a six-month online A/B test for our Re-Rank with gap effect algorithm, and observed ads position distribution shifts between low demand and high demand periods, as shown in Figure 8.
3. Each ranking system operates and evolves on its own. Such decoupling facilitates modeling velocity for both organic content and ads, but also imposes an additional challenge on the Re-Rank layer. If the score distribution changes dramatically from either upstream ranking module, it could have a significant and inadvertent impact on the final ranking (e.g., ad CTR prediction goes up by 10%). As a result, we need to solve a calibration challenge.

If the utility score is a point-wise prediction, e.g., the probability of click, we leverage isotonic regression [24] to bring the predicted probability close to the observed user response. If the utility score is not a point-wise prediction but a combination of multiple point-wise predictions, we leverage Thompson Sampling [21] to find a global calibration factor for the new score distribution with online traffic. The objective is to match a metric of interest between a new model and the control model. We assume the metric to match is a function of the calibration factor, and that function is drawn from a Gaussian Process prior with a covariance function. The algorithm has explore and exploit stages, and repeatedly estimates the value of the factor until convergence. Our calibration framework works well in practice. It facilitates the process to ramp new models in the first rank layer and reduces manual tuning burden significantly.

6 CONCLUSION
In this paper, we discuss different approaches to allocate optimal positions to ads in a newsfeed application to obtain an optimal trade-off between revenue and engagement. Using the foundations of constrained optimization, we present a set of blending algorithms (termed Re-Rank), which are optimal under various assumptions (all slots i.i.d., positional bias, gap effect). We discuss many effects which are important considerations for slotting ads into feed through randomized online tests performed on the LinkedIn feed.

To the best of our knowledge, this is the first reported work on this problem, which is very critical in the internet industry today. We are optimistic that this will help practitioners design their feed applications with monetization considerations in a more optimal fashion, and also encourage future publications in this space.

In terms of future directions, devising algorithms for cases where the gap effect has more complex forms, or positional effects are non-monotonic, is an interesting challenge. Proving a general bound on performance for the current algorithm in the general setting would also be useful. Finally, ads allocation is the end component in a very dynamic ads system and there are feedback loops (e.g., fewer ad impressions can increase advertiser bids in an automated bidding system). Being more cognizant of those effects and eventually optimizing the bigger system can be very useful.

REFERENCES
[1] Gediminas Adomavicius, Nikos Manouselis, and YoungOk Kwon. 2011. Multi-criteria recommender systems. In Recommender systems handbook. Springer, 769–803.
[2] Deepak Agarwal, Shaunak Chatterjee, Yang Yang, and Liang Zhang. 2015. Constrained optimization for homepage relevance. In Proceedings of the 24th International Conference on World Wide Web. ACM, 375–384.
[3] Deepak Agarwal, Bee-Chung Chen, Pradheep Elango, and Xuanhui Wang. 2012. Personalized Click Shaping Through Lagrangian Duality for Online Recommendation. In SIGIR (Portland, Oregon, USA). ACM, New York, NY, USA, 485–494.
[4] Deepak Agarwal, Bee-Chung Chen, Rupesh Gupta, Joshua Hartman, Qi He, Anand Iyer, Sumanth Kolar, Yiming Ma, Pannagadatta Shivaswamy, Ajit Singh, et al. 2014. Activity ranking in LinkedIn feed. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 1603–1612.
[5] Deepak Agarwal, Bee-Chung Chen, Qi He, Zhenhao Hua, Guy Lebanon, Yiming Ma, Pannagadatta Shivaswamy, Hsiao-Ping Tseng, Jaewon Yang, and Liang Zhang. 2015. Personalizing LinkedIn feed. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1651–1660.
[6] Deepak Agarwal, Bo Long, Jonathan Traupman, Doris Xin, and Liang Zhang. 2014. Laser: A scalable response prediction platform for online advertising. In Proceedings of the 7th ACM international conference on Web search and data mining. 173–182.
[7] Jaime Arguello, Fernando Diaz, Jamie Callan, and Ben Carterette. 2011. A methodology for evaluating aggregated search results. In European Conference on Information Retrieval. Springer, 141–152.
[8] Stephen Boyd and Lieven Vandenberghe. 2004. Convex optimization. Cambridge university press.
[9] Christopher Burges, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Gregory N Hullender. 2005. Learning to rank using gradient descent. In Proceedings of the 22nd International Conference on Machine learning (ICML-05). 89–96.
[10] Yair Censor. 1977. Pareto optimality in multiobjective problems. Applied Mathematics and Optimization 4, 1 (1977), 41–59.
[11] Ye Chen, Pavel Berkhin, Bo Anderson, and Nikhil R Devanur. 2011. Real-time bidding algorithms for performance-based display ad allocation. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1307–1315.
[12] Thomas H Cormen, Charles E Leiserson, Ronald L Rivest, and Clifford Stein. 2009. Introduction to algorithms. MIT press.
[13] Benjamin Edelman, Michael Ostrovsky, and Michael Schwarz. 2007. Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords. American economic review 97, 1 (2007), 242–259.
[14] Yan Gao, Viral Gupta, Jinyun Yan, Changji Shi, Zhongen Tao, PJ Xiao, Curtis Wang, Shipeng Yu, Romer Rosales, Ajith Muralidharan, and Shaunak Chatterjee. 2018. Near Real-time Optimization of Activity-based Notifications. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 283–292.
[15] Mounia Lalmas. 2011. Aggregated search. In Advanced Topics in Information Retrieval. Springer, 109–123.
[16] Nathan Mantel and William Haenszel. 1959. Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute 22, 4 (1959), 719–748.
[17] Mario Rodriguez, Christian Posse, and Ethan Zhang. 2012. Multiple objective optimization in recommender systems. In Proceedings of the sixth ACM conference on Recommender systems. ACM, 11–18.
[18] Shanu Sushmita, Hideo Joho, Mounia Lalmas, and Robert Villa. 2010. Factors affecting click-through behavior in aggregated search interfaces. In Proceedings of the 19th ACM international conference on Information and knowledge management. 519–528.
[19] Krysta M Svore, Maksims N Volkovs, and Christopher JC Burges. 2011. Learning to rank with multiple objective functions. In Proceedings of the 20th international conference on World wide web. ACM, 367–376.
[20] Liang Tang, Bo Long, Bee-Chung Chen, and Deepak Agarwal. 2016. An empirical study on recommendation with multiple types of feedback. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 283–292.
[21] William R. Thompson. 1933. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 47 (1933), 285–294.
[22] Hal R Varian and Christopher Harris. 2014. The VCG auction in theory and practice. American Economic Review 104, 5 (2014), 442–45.
[23] Bo Wang, Zhaonan Li, Jie Tang, Kuo Zhang, Songcan Chen, and Liyun Ru. 2011. Learning to advertise: how many ads are enough?. In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 506–518.
[24] Bianca Zadrozny and Charles Elkan. 2002. Transforming classifier scores into accurate multiclass probability estimates. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 694–699.
[25] XianXing Zhang, Yitong Zhou, Yiming Ma, Bee-Chung Chen, Liang Zhang, and Deepak Agarwal. 2016. Glmix: Generalized linear mixed models for large-scale response prediction. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 363–372.
[26] Bo Zhao, Koichiro Narita, Burkay Orten, and John Egan. 2018. Notification Volume Control and Optimization System at Pinterest. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, 1012–1020.