Trends in Social Media: Persistence and Decay
[Figure 1 plots. (a) Four density panels; horizontal axis: relative tweet count (logarithmic scale); vertical axis: density. (b) Four Q-Q panels; horizontal axis: normal theoretical quantiles; vertical axis: data quantiles.]
Figure 1: (a) The densities of the ratios between cumulative tweet counts measured in two respective time frames. From left to right
in the figure, the indices of the time frames between which the ratios were taken are: (2, 10), (2, 14), (4, 10), and (4, 14), respectively.
The horizontal axis has been rescaled logarithmically, and the solid line in the plots shows the density estimates using a kernel
smoother. (b) The Q-Q plots of the cumulative tweet distributions with respect to normal distributions. If the random variables of
the data were a linear transformation of normal variates, the points would line up on the straight lines shown in the plots. The tails
of the empirical distributions are apparently heavier than in the normal case.
These figures immediately suggest that the ratios C_q(t_i, t_j) are distributed according to log-normal distributions, since the horizontal axes are logarithmically rescaled, and the histograms appear to be Gaussian functions. To check if this assumption holds, consider Fig. 1(b), where we show the Q-Q plots of the distributions of Fig. 1(a) in comparison to normal distributions. We can observe [...]

Here N_q(0) is the initial number of tweets in the earliest time step. Taking the logarithm of both sides of Eq. (3),

    \ln N_q(t) - \ln N_q(0) = \sum_{s=1}^{t} \ln\left[1 + \gamma(s)\,\xi(s)\right]    (4)
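As a minimal sketch of the identity in Eq. (4), the following Python snippet simulates a multiplicative growth process and checks that the log-ratio of the final and initial counts equals the sum of the individual log terms. The update rule N(t) = N(t-1)[1 + γ(t)ξ(t)] is assumed here from the form of Eq. (4); the particular γ values, the uniform noise with mean 1, and all function names are illustrative assumptions rather than the data or code used in the study.

```python
import math
import random


def simulate_counts(n0, gammas, noise):
    """Return [N(0), ..., N(t)] under N(t) = N(t-1) * (1 + gamma(t) * xi(t)).

    This multiplicative update is an assumption inferred from Eq. (4).
    """
    counts = [float(n0)]
    for g, xi in zip(gammas, noise):
        counts.append(counts[-1] * (1.0 + g * xi))
    return counts


def log_growth_terms(gammas, noise):
    """The individual ln[1 + gamma(s) * xi(s)] terms on the right of Eq. (4)."""
    return [math.log(1.0 + g * xi) for g, xi in zip(gammas, noise)]


rng = random.Random(0)                                   # fixed seed for repeatability
steps = 20
gammas = [1.0 / s for s in range(1, steps + 1)]          # illustrative decay
noise = [rng.uniform(0.5, 1.5) for _ in range(steps)]    # positive noise, mean 1 (assumption)

counts = simulate_counts(100, gammas, noise)
lhs = math.log(counts[-1]) - math.log(counts[0])         # left-hand side of Eq. (4)
rhs = sum(log_growth_terms(gammas, noise))               # right-hand side of Eq. (4)
```

Because the right-hand side is a sum of many independent terms, it is the quantity to which a central-limit argument for log-normality would apply.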
[Figure plot: γ on a logarithmic vertical axis.]
unbounded [11].

To measure the functional form of γ(t), we observe that the expected value of the noise term ξ(t) in Eq. (2) is 1. Thus averaging over the fractions between consecutive tweet counts yields γ(t):

    \gamma(t) = \left\langle \frac{N_q(t)}{N_q(t-1)} \right\rangle_q - 1.    (5)

The experimental values of γ(t) in time are shown in Fig. 2. It is interesting to notice that γ(t) follows a power-law decay very precisely with an exponent of −1, which means that γ(t) ∼ 1/t.

6. THE GROWTH OF TWEETS OVER TIME
The interesting fact about the decay function γ(t) = 1/t is that it results in a linear increase in the total number of tweets for a topic over time. To see this, we can again consider Eq. (4), and approximate the discrete sum of random variables with an integral of the operand of the sum, and substitute the noise term with its expectation value, ⟨ξ(t)⟩ = 1 as defined earlier (this is valid if γ(t) is changing slowly). These approximations yield the following:

    \ln \frac{N_q(t)}{N_q(0)} \approx \int_{\tau=0}^{t} \ln\left[1 + \gamma(\tau)\right] d\tau \approx \int_{\tau=0}^{t} \frac{1}{\tau}\, d\tau = \ln t.    (6)

In simplifying the logarithm above, we used the Taylor expansion of ln(1 + x) ≈ x, for small x, and also used the fact that γ(τ) = 1/τ as we found experimentally earlier.

It can be immediately seen then that N_q(t) ≈ N_q(0) t for the range of t where γ(t) is inversely proportional to t. In fact, it can be easily proven that no functional form for γ(t) would yield a linear increase in N_q(t) other than γ(t) ∼ 1/t (assuming that the above approximations are valid for the stochastic discrete case).

To check this, we first plotted a few representative examples of the cumulative number of tweets for a few topics in Fig. 3. It is apparent that all the topics (selected randomly) show an approximate initial linear growth in the number of tweets. We also checked if this is true in general. Figure 4 shows the second discrete derivative of the total number of tweets, which we expect to be 0 if the trend lines are linear on average. A positive second derivative would mean that the growth is superlinear, while a negative one suggests that it is sublinear. We point out that before taking the average of all second derivatives over the different topics in time, we divided the derivatives by the average of the total number of tweets of the given topics. We did this so as to account for the large difference between the ranges of the number of tweets across topics, since a simple averaging without prior normalization would likely bias the results towards topics with large tweet counts and their fluctuations. The averages are shown in Fig. 4.

We observe from the figure that when we consider all topics there is a very slight sublinear growth regime right after the topic starts trending, which then becomes mostly linear, as the derivatives data is distributed around 0. If we consider only very popular topics (that were on the trends site for more than 4 hours), we observe an even better linear trend. One reason for this may be that topics that trend only for short periods exhibit a concave curvature, since they lose popularity quickly, and are removed from among the Twitter trends by the system early on.

These results suggest that once a topic is highlighted as a trend on a very visible website, its growth becomes linear in time. The reason for this may be that as more and more visitors come to the site and see the trending topics there is a constant probability that they will also talk and tweet about it. This is in contrast to scenarios where the primary channel of information flow is more informal.
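The link between γ(t) = 1/t and linear growth in Section 6 can be checked numerically. The sketch below replaces the noise with its expectation (ξ = 1), iterates the multiplicative update assumed from Eq. (4), and then recovers γ(t) with the averaging of Eq. (5). Function names and the two starting counts are hypothetical. Note that in the discrete case the product of (1 + 1/s) telescopes to t + 1, so the counts equal N(0)(t + 1), which is linear in t and consistent with the continuous approximation N(0) t.

```python
def deterministic_growth(n0, steps):
    """Iterate N(t) = N(t-1) * (1 + 1/t) with the noise set to its mean of 1."""
    counts = [float(n0)]
    for t in range(1, steps + 1):
        counts.append(counts[-1] * (1.0 + 1.0 / t))
    return counts


def estimate_gamma(series_by_topic):
    """Eq. (5): average N_q(t) / N_q(t-1) over topics q, minus 1, for t = 1, 2, ..."""
    length = min(len(series) for series in series_by_topic)
    gammas = []
    for t in range(1, length):
        ratios = [series[t] / series[t - 1] for series in series_by_topic]
        gammas.append(sum(ratios) / len(ratios) - 1.0)
    return gammas


# Two hypothetical topics with different initial counts.
topics = [deterministic_growth(50, 10), deterministic_growth(200, 10)]
gamma_hat = estimate_gamma(topics)           # gamma_hat[t - 1] estimates gamma(t)
increments = [b - a for a, b in zip(topics[0], topics[0][1:])]
```

Here `increments` is constant across time steps, which is the discrete signature of linear growth.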
[Figure 4 plot: relative curvature of the tweet count.]
Figure 4: The average of the second derivative of the total number of tweets over all topics. For one topic, we first divided the derivative values by the mean of the tweet counts so as to minimize the differences between the wide range of topic popularities. The open circles show the derivatives obtained with this procedure for all topics, while the smaller red dots represent [...]

[Figure 5 plots: log-log histograms with linear-scale insets; axis labels: Number of topics, Frequency, Trend duration (hours).]
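The normalization described for Fig. 4 can be sketched as follows: take the second discrete difference of each topic's cumulative tweet count, divide by the mean of that topic's counts, and then average over topics at each time step. Exactly which mean the authors used is not fully specified in the text, so dividing by the mean of the cumulative series is an assumption here, and the function names are hypothetical.

```python
def relative_curvature(counts):
    """Second discrete difference of a cumulative series, divided by the series mean.

    Dividing by the mean of the cumulative counts is an assumption about the
    normalization described for Fig. 4.
    """
    mean = sum(counts) / len(counts)
    return [
        (counts[i + 1] - 2 * counts[i] + counts[i - 1]) / mean
        for i in range(1, len(counts) - 1)
    ]


def average_curvature(series_by_topic):
    """Average the normalized curvature over topics, time step by time step."""
    curves = [relative_curvature(series) for series in series_by_topic]
    length = min(len(curve) for curve in curves)
    return [sum(curve[i] for curve in curves) / len(curves) for i in range(length)]


linear = [10, 20, 30, 40]      # constant increments: zero curvature
concave = [0, 10, 18, 24]      # shrinking increments: sublinear growth
convex = [1, 4, 9, 16]         # growing increments: superlinear growth
```

Values near zero indicate linear growth, negative values sublinear growth, and positive values superlinear growth, matching the reading of Fig. 4 given in the text.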
In that case we expect that the growth will exhibit first a phase with accelerated growth and then slow down to a point when no one talks about the topic any more. Content that spreads through a social network or without external “driving” will follow such a course, as has been shown elsewhere [10, 12].

7. PERSISTENCE OF TRENDS
An important reason to study trending topics on Twitter is to understand why some of them remain at the top while others dissipate quickly. To see the general pattern of behavior on Twitter, we examined the lifetimes of the topics that trended in our study. From Fig. 5(a) we can see that while most topics occur continuously, around 34% of topics appear in more than one sequence. This means that they stop trending for a certain period of time before beginning to trend again.

A reason for this behavior may be the time zones that are involved. For instance, if a topic is a piece of news relevant to North American readers, a trend may first appear in the Eastern time zone, and 3 hours later in the Pacific time zone. Likewise, a trend may return the next morning if it was trending the previous evening, when more users check their accounts again after the night.

Given that many topics do not occur continuously, we examined the distribution of the sequence lengths for all topics. In Fig. 5(b) we show the length of the topic sequences. It can be observed that this is a power law, which means that most topic sequences are short and a few topics last for a very long time. This could be due to the fact that there are many topics competing for attention. Thus, the topics that make it to the top (the trend list) last for a short time. However, in many cases, the topics return to trend for more time, which is captured by the number of sequences shown in Fig. 5(a), as mentioned.

Figure 5: (a) The distribution of the number of sequences that a trending topic comprises. (b) The distribution of the lengths of each sequence. Both graphs are shown in the log-log scale with the inset giving the actual histograms in the linear scale.

7.1 Relation to authors and activity
We first examine the authors who tweet about given trending topics to see if the authors change over time or if it is the same people who keep tweeting to cause trends. When we computed the correlation of the number of unique authors for a topic with the duration (number of timestamps) that the topic trends, we noticed that the correlation is very strong (0.80). This indicates that as the number of authors increases so does the lifetime, suggesting that the propagation through the network causes the topic to trend.

To measure the impact of authors we compute for each topic the active-ratio a_q as:

    a_q = \frac{\text{Number of Tweets}}{\text{Number of Unique Authors}}    (7)

The correlation of active-ratio with trending duration is shown in Fig. 6. We observe that the active-ratio quickly saturates and varies little with time for any given topic. Since the authors change over time with the topic propagation, the correlation between number of tweets and authors is high (0.83).

7.2 Persistence of long trending topics
On Twitter each topic competes with the others to survive on the trending page. As we now show, for the long trending ones we can derive an expression for the distribution of their average length.
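As a concrete reading of two quantities used in Section 7, the sketch below splits the time indices at which a topic trended into sequences of consecutive steps (the sequences counted in Fig. 5) and computes the active-ratio of Eq. (7). Representing trending times as integer sampling indices and tweets as a list of author names are assumptions made for illustration; the function names are hypothetical.

```python
def split_into_sequences(time_indices):
    """Group integer sampling indices into runs of consecutive values."""
    runs = []
    for t in sorted(set(time_indices)):
        if runs and t - runs[-1][-1] == 1:
            runs[-1].append(t)      # continues the current trending sequence
        else:
            runs.append([t])        # a gap: the topic started trending again
    return runs


def active_ratio(tweet_authors):
    """Eq. (7): number of tweets divided by number of unique authors."""
    return len(tweet_authors) / len(set(tweet_authors))


# Hypothetical topic that trended in three separate sequences.
sequences = split_into_sequences([1, 2, 3, 7, 8, 12])
sequence_lengths = [len(run) for run in sequences]

# Hypothetical tweets on one topic, listed by author.
ratio = active_ratio(["a", "b", "a", "c", "a", "b"])
```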
[Figure 6 plot: scatter plot; axis labels: Trending Duration (mins), Active Ratio.]
Figure 6: Relation between the active-ratio and the length of the trend across all topics, showing that the active-ratio does not vary significantly with time.

[Figure 8 plot: axis labels: Probability (logarithmic scale), Trending Duration.]
Figure 8: Fit of trending duration to density in log scale. The straight line suggests an exponential family of the trending time distribution. The red line gives a fit with an R^2 of 0.9112.
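The straight line in the log-scale plot of Fig. 8 is what a constant per-interval stopping probability would produce. As a hedged illustration, assume a topic stops trending in each interval independently with the same probability p. The trending duration then follows a geometric distribution, and the logarithm of its probability mass decreases linearly with duration at slope ln(1 − p). The value p = 0.2 and the function name below are arbitrary choices for the sketch, not fitted values from the paper.

```python
import math


def geometric_pmf(p, k):
    """Probability that a topic stops trending exactly at interval k (k = 1, 2, ...)."""
    return (1.0 - p) ** (k - 1) * p


p = 0.2                                       # arbitrary illustrative value
log_density = [math.log(geometric_pmf(p, k)) for k in range(1, 11)]
# Differences between successive log-probabilities: constant for a geometric law.
slopes = [b - a for a, b in zip(log_density, log_density[1:])]
total_mass = sum(geometric_pmf(p, k) for k in range(1, 201))
```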
Since the ξs are independent and identically distributed random variables, φ_1, φ_2, ..., φ_t would be independent of each other. Thus the probability that a topic stops trending in a time interval s, where s is large, is equal to the probability that φ_s is lower than the threshold θ, which can be written as:

    p = \Pr(\phi_s < \theta) = \Pr(\log \phi_s < \log \theta) = \Pr(\xi_s < \log \theta) = F(\log \theta)    (9)

8.1 Sources
We examined the users who initiate the most trending topics. First, for each topic we extracted the first 100 users who tweeted about it prior to its trending. The distribution of these authors and the topics is a power law, as shown in Fig. 9. This shows that there are few authors who contribute to the creation of many different topics. To focus on these multi-tasking users, we considered only the authors who contributed to at least five trending topics.
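The selection of early authors described in Section 8.1 can be sketched as a counting step followed by a filter. The input layout (a mapping from topic to the chronological list of authors who tweeted before it trended), the function names, and the toy data are assumptions for illustration; only the limit of 100 users and the threshold of five topics come from the text.

```python
from collections import Counter


def topics_initiated(authors_by_topic, limit=100):
    """Count, per author, the topics where they are among the first `limit` distinct users."""
    counts = Counter()
    for authors in authors_by_topic.values():
        # dict.fromkeys removes repeat tweets by the same author while keeping order.
        early = list(dict.fromkeys(authors))[:limit]
        counts.update(early)
    return counts


def frequent_initiators(counts, min_topics=5):
    """Keep only authors who contributed early to at least `min_topics` topics."""
    return {author: n for author, n in counts.items() if n >= min_topics}


# Toy data with a small limit so the behaviour is easy to follow.
toy = {"t1": ["a", "b", "a", "c"], "t2": ["a", "c"], "t3": ["b", "a"]}
toy_counts = topics_initiated(toy, limit=2)
toy_frequent = frequent_initiators(toy_counts, min_topics=2)
```

A histogram of the values in `toy_counts`, computed over real data, would correspond to the distribution plotted in Fig. 9.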
[Figure 9 plot: log-log histogram with linear-scale inset; axis labels: Number of authors, Number of trending topics initiated.]
Figure 9: Distribution of the first 100 authors for each trending topic. The log-log plot shows a power-law distribution. The inset graph gives the actual histogram in the linear scale.

When we consider people who are influential in starting trends on Twitter, we can hypothesize two attributes: a high frequency of activity for these users, as well as a large follower network. To evaluate these hypotheses we measured these two attributes for these authors over these months.

Table 1: Top 22 retweeted users in at least 50 trending topics each

Author            Retweets  Topics  Retweet-Ratio
vovo_panico          11688      65         179.81
cnnbrk                8444      84         100.52
keshasuja             5110      51         100.19
LadyGonga             4580      54          84.81
BreakingNews          8406     100          84.06
MLB                   3866      62          62.35
nytimes               2960      59          50.17
HerbertFromFG         2693      58          46.43
espn                  2371      66          35.92
globovision           2668      75          35.57
huffingtonpost        2135      63          33.88
skynewsbreak          1664      52          32
el_pais               1623      52          31.21
stcom                 1255      51          24.60
la_patilla            1273      65          19.58
reuters                957      57          16.78
WashingtonPost         929      60          15.48
bbcworld               832      59          14.10
CBSnews                547      56           9.76
TelegraphNews          464      79           5.87
tweetmeme              342      97           3.52
nydailynews            173      51           3.39
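The Retweet-Ratio column of Table 1 appears to be the number of retweets divided by the number of topics. A short check on a few rows copied from the table illustrates this; the function name is hypothetical, and the tolerance of 0.01 allows for the two-decimal values printed in the table.

```python
def retweet_ratio(retweets, topics):
    """Ratio of an author's retweets to the number of trending topics they appear in."""
    return retweets / topics


# (author, retweets, topics, ratio as listed) copied from Table 1.
rows = [
    ("vovo_panico", 11688, 65, 179.81),
    ("cnnbrk", 8444, 84, 100.52),
    ("BreakingNews", 8406, 100, 84.06),
    ("skynewsbreak", 1664, 52, 32.0),
]

# Difference between the recomputed ratio and the value printed in the table.
deviations = {
    author: abs(retweet_ratio(retweets, topics) - listed)
    for author, retweets, topics, listed in rows
}
```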
Frequency: The tweet-rate can effectively measure the frequency of participation of a Twitter user. The mean tweet-rate for these users was 26.38 tweets per day, indicating that these authors tweeted fairly regularly. However, when we computed the correlation of the tweet-rate with the number of trending topics that they contributed to, the result was a weak positive correlation of 0.22. This indicates that although people who tweet a lot do tend to contribute to the trending topics, the rate by itself does not strongly determine the popularity of the topic. In fact, they happen to tweet on a variety of topics, many of which do not become trends. We found that a large number of them tended to tweet frequently about sporting events and the players and teams involved. When some sports-related topics begin to trend, these users are among the early initiators of them, by virtue of their high tweet-rate. This suggests that the nature of the content plays a strong role in determining if a topic trends, rather than the users who initiate it.

Audience: When we looked at the number of followers for these authors, we were surprised to find that they were almost completely uncorrelated (correlation of 0.01) with the number of trending topics, although the mean is fairly high (2481)¹. The absence of correlation indicates that the number of followers is not an indication of influence, similar to observations in earlier work [9].

¹ This is due to the fact that one of these authors has more than a million followers.

8.2 Propagators
We have observed previously that topics trend on Twitter mainly due to the propagation through the network. The main way to propagate information on Twitter is by retweeting. 31% of the tweets of trending topics are retweets. This reflects a high volume of propagation that garners popularity for these topics. Further, the number of retweets for a topic correlates very strongly (0.96) with the trend duration, indicating that a topic is of interest as long as there are people retweeting it.

Each retweet credits the original poster of the tweet. Hence, to identify the authors who are retweeted the most in the trending topics, we counted the number of retweets for each author on each topic.

Domination: We found that in some cases, almost all the retweets for a topic are credited to one single user. These are topics that are entirely based on the comments by that user. They can thus be said to be dominating the topic. The domination-ratio for a topic can be defined as the fraction of the retweets of that topic that can be attributed to the largest contributing user for that topic. However, we observed a negative correlation of −0.19 between the domination-ratio of a topic and its trending duration. This means that topics revolving around a particular author’s tweets do not typically last long. This is consistent with the earlier observed strong correlation between number of authors and the trend duration. Hence, for a topic to trend for a long time, it requires many people to contribute actively to it.

Influence: On the other hand, we observed that there were authors who contributed actively to many topics and were retweeted significantly in many of them. For each author, we computed the ratio of retweets to topics, which we call the retweet-ratio. The list of influential authors who are retweeted in at least 50 trending topics is shown in Table 1. We find that a large portion of these authors are popular news sources such as CNN, the New York Times and ESPN. This illustrates that social media, far from being an alternate source of news, functions more as a filter and an amplifier for interesting news from traditional media.

9. CONCLUSIONS
To study the dynamics of trends in social media, we have conducted a comprehensive study on trending topics on Twitter. We first derived a stochastic model to explain the growth of trending topics and showed that it leads to a lognormal distribution, which is validated by our empirical results. We also have found that most topics do not trend for long, and for those that are long-trending, their persistence obeys a geometric distribution.
When we considered the impact of the users of the network, we discovered that the number of followers and tweet-rate of users are not the attributes that cause trends. What proves to be more important in determining trends is retweeting by other users, which is more related to the content that is being shared than to the attributes of the users. Furthermore, we found that the content that trended was largely news from traditional media sources, which is then amplified by repeated retweets on Twitter to generate trends.
10. REFERENCES
[1] N. Agarwal, H. Liu, L. Tang, and P. S. Yu. Identifying the
Influential Bloggers in a Community. WSDM’08, 2008.
[2] S. Aral, L. Muchnik, and A. Sundararajan. Distinguishing
influence-based contagion from homophily-driven diffusion
in dynamic networks. Proceedings of the National Academy
of Sciences, 106(51):21544–21549, December 2009.
[3] M. Cha, H. Haddadi, F. Benevenuto, and K. P. Gummadi.
Measuring User Influence in Twitter: The Million Follower
Fallacy. In Fourth International AAAI Conference on
Weblogs and Social Media, May 2010.
[4] W. Galuba, D. Chakraborty, K. Aberer, Z. Despotovic, and
W. Kellerer. Outtweeting the Twitterers - Predicting
Information Cascades in Microblogs. In 3rd Workshop on
Online Social Networks (WOSN 2010), 2010.
[5] B. A. Huberman, D. M. Romero, and F. Wu. Social networks
that matter: Twitter under the microscope. ArXiv e-prints,
arXiv:0812.1045, December 2008.
[6] B. J. Jansen, M. Zhang, K. Sobel, and A. Chowdury. Twitter
power: Tweets as electronic word of mouth. J. Am. Soc. Inf.
Sci., 60(11):2169–2188, 2009.
[7] M. E. McCombs and D. L. Shaw. The Evolution of
Agenda-Setting Research: Twenty Five Years in the
Marketplace of Ideas. Journal of Communication,
43(2):68–84, 1993.
[8] M. Mitzenmacher. A brief history of generative models for
power law and lognormal distributions. Internet
Mathematics, 1:226–251, 2004.
[9] D. M. Romero, W. Galuba, S. Asur, and B. A. Huberman.
Influence and passivity in social media. In 20th International
World Wide Web Conference (WWW’11), 2011.
[10] G. Szabo and B. A. Huberman. Predicting the popularity of
online content. Commun. ACM, 53(8):80–88, 2010.
[11] F. Wu and B. A. Huberman. Novelty and collective attention.
Proceedings of the National Academy of Sciences of the
United States of America, 104(45):17599–17601, November
2007.
[12] J. Yang and J. Leskovec. Patterns of temporal variation in
online media. In Proceedings of the fourth ACM
international conference on Web search and data mining,
WSDM ’11, pages 177–186, 2011.