
Social Networks and Social Information Filtering on Digg∗


Kristina Lerman
University of Southern California
Information Sciences Institute
4676 Admiralty Way
Marina del Rey, California 90292
[email protected]

Abstract

The new social media sites (blogs, wikis, Flickr and Digg, among others) underscore the transformation of the Web to a participatory medium in which users are actively creating, evaluating and distributing information. Digg is a social news aggregator which allows users to submit links to, vote on and discuss news stories. Each day Digg selects a handful of stories to feature on its front page. Rather than rely on the opinion of a few editors, Digg aggregates opinions of thousands of its users to decide which stories to promote to the front page. Digg users can designate other users as “friends” and easily track friends’ activities: what new stories they submitted, commented on or read. The friends interface acts as a social filtering system, recommending to a user the stories his or her friends liked or found interesting. By tracking the votes received by newly submitted stories over time, we showed that social filtering is an effective information filtering approach. Specifically, we showed that (a) users tend to like stories submitted by friends and (b) users tend to like stories their friends read and liked. Social filtering is a promising new technology that can be used to personalize and tailor information to individual users: for example, through personal front pages.

Keywords
Social network analysis; collaborative filtering; social filtering

1. Introduction
Many Web sites that provide information (or sell products or services) use collaborative filtering technology to suggest relevant documents (or products and services) to their users. Collaborative filtering-based recommendation systems [1] try to find users with similar interests by comparing their opinions about products. They will then suggest new products that were liked by other users with similar opinions. Recommender systems based on social filtering, on the other hand, suggest new products or documents simply based on whether the user’s designated friends found these products or documents interesting. Researchers in the past have recognized that social networks present in the user base of the recommender system can be induced from the explicit and implicit declarations of user interest, and that these social networks can in turn be used to make new recommendations [2].

The new social media sites, such as the social news aggregator Digg,¹ allow users to explicitly build social networks by designating others as friends. Tracking the activities of friends is a common feature in many social media sites and is one of the major draws attracting users to these sites. It offers a new paradigm for interacting with information: social filtering. Rather than actively searching for new interesting content, or subscribing to a set of predefined topics, users can now put other people to the task of finding and filtering information for them. We show that social networks are being used on Digg for social filtering. Specifically, we show that Digg users tend to be interested in the news stories their friends find interesting. Although social filtering, as practiced by Digg, has recently come under fire for being susceptible to “gaming,” we believe it to be a promising technology that will lead to a new generation of personalization and recommendation algorithms.

∗ A full version of this paper is available at arxiv.org/abs/cs.HC/0612046
‡ This research is based on work supported in part by the National Science Foundation under Award Nos. IIS-0535182 and IIS-0413321. We are grateful to Dipsy Kapoor for helping with data analysis, and to Fetch Technologies (http://www.fetch.com) for providing wrapper building and execution tools.
¹ http://digg.com/technology
ICWSM’2007 Boulder, Colorado, USA

2. Structure of Digg
Digg’s functionality is very simple: Users submit links to stories they find online, and other users vote on these stories. When a story gets enough positive votes, or diggs, it is promoted to the front page. The front page is what users see on the Digg home page, while the newly submitted stories are less visible, being “hidden” in the Upcoming stories pages. Digg also allows users to designate other users as friends and makes it easy to track friends’ activities. A section of Digg’s home page summarizes the number of stories the friends have submitted, commented on or liked recently.

Each day Digg selects a handful of stories to feature on its heavily trafficked front page. Although the exact formula for how a story is promoted to the front page is kept secret, so as to prevent users from “gaming the system,” it appears to take into account the number of diggs a story gets and the rate at which it gets them. The mechanism by which the stories are promoted, therefore, does not depend on the decision of one or a few editors, but emerges from the activities of many users.

In order to study the role of social networks in filtering, we tracked both new and front page stories in the technology
category. We collected data in May 2006 by scraping the Digg site with the help of Web wrappers, created with tools provided by Fetch Technologies. We extracted 195 front page stories. For each story, we extracted the submitter’s name, story title, time submitted, number of diggs the story received and the list of the first 216 users who dugg the story (15,742 unique users total). We also collected information about the top 1020 ranked users. For each user, we extracted the list of friends and reverse friends or “people who have befriended this user.”

[Figure 1 plot; x-axis: number friends+1; y-axis: number reverse friends+1; both axes logarithmic]
Fig. 1: Scatter plot of the number of friends vs reverse friends for the top 1020 Digg users. Two of the biggest celebrities, kevinrose and diggnation, are marked a and b

3. Social filtering on Digg
To show that Digg users take advantage of the Friends interface to filter the tremendous number of new submissions, we analyze two sub-claims: (a) users digg stories their friends submit, and (b) users digg stories their friends digg.

Note that the “friend” relationship is not symmetric: if user A designates user B as a friend, user A can keep track of user B’s activities, but not vice versa. This makes A the reverse friend of B. Figure 1 shows the scatter plot of the number of friends vs reverse friends of the top 1020 Digg users as of May 2006. Black symbols correspond to the top 33 users. For the most part, users appear to take advantage of Digg’s social networking feature, with the top users having bigger social networks.

3.1 Users digg stories their friends submit
We compare the list of users who dugg the story, or any portion of it, with the list of reverse friends of the submitter. The submitter’s name is the first on the list. Figure 2 shows the number of diggers who are also among the reverse friends of the submitter. The dashed line shows the size of the social network (number of reverse friends) of the submitter. More than half of the stories (102) were submitted by users with one or more reverse friends, and the rest by unknown users. We use simple combinatorics to compute the probability that k of the submitter’s friends could have dugg the story purely by chance. The probability that after picking n = 215 users randomly from a pool of N = 15,742 you end up with k that came from a group of size K is $P(k, n) = \binom{n}{k} p^k (1 - p)^{n-k}$, where p = K/N. Using this formula, the probability (averaged over stories dugg by at least one friend) that the observed number of friends dugg the story by chance is P = 0.005, making it highly unlikely.

[Figure 2 plot; x-axis: stories (sorted); y-axes: number friends who dugg story, number reverse friends; legend: in all diggs, in first 25 diggs, reverse friends]
Fig. 2: Number of diggers who are also among the reverse friends of the user who submitted the story

Moreover, users digg stories submitted by their friends very quickly. The heavy solid line in Figure 2 shows the number of reverse friends who were among the first 25 diggers. The probability that these numbers could have been observed by chance is even lower: P = 0.003. We conclude that users digg stories their friends submit. A consequence of this conclusion is that users with active and large social networks are more successful in getting their stories promoted to the front page. We believe that this explains the success of top users.

3.2 Users digg stories their friends digg
Do social networks also help users discover interesting stories that were submitted by unknown users? In other words, do users digg stories their friends like?

We looked at the 25 diggs that came after the first m diggs to see how many came from friends of the first m diggers. Of the stories posted by “unknown” users, ten were dugg by submitter’s reverse friends (p = 0.005). After five more diggs (m = 6), 75 became visible to others through the friends interface, and of these 23 (p = 0.028) were dugg by friends. After 15 users dugg the story, 94 were visible and 37 (p = 0.060) were dugg by friends. After 25 diggs, all 96 stories were visible, and almost half of these were dugg by friends (p = 0.077). The probabilities that this many friends could have dugg the story by chance are above the 0.05 significance level after 15 and 25 diggs, possibly reflecting the story’s increased visibility on the front page. Although the effect is not quite as dramatic as the one in the previous section, we believe that the data shows that users do use the friends interface to find new interesting stories.

References
[1] J. A. Konstan, B. N. Miller, D. Maltz, J. L. Herlocker, L. R. Gordon, and J. Riedl. GroupLens: Applying collaborative filtering to Usenet news. Communications of the ACM, 40(3):77–87, 1997.
[2] S. Perugini, M. André Gonçalves, and E. A. Fox. Recommender systems research: A connection-centric survey. Journal of Intelligent Information Systems, 23(2):107–143, September 2004.
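
As an illustration of the chance-probability calculation described in Section 3.1, the following is a minimal Python sketch, not the code used in the study. It counts how many diggers of a story are reverse friends of the submitter and then evaluates the binomial expression with p = K/N. The function names, the user names, and the network size K = 50 are assumptions made for this example only; n = 215 and N = 15,742 are the values quoted in the text.

```python
from math import comb


def count_friend_diggs(diggers, reverse_friends):
    """Count diggers who are also reverse friends of the submitter.

    The submitter is first on the list of diggers, so that entry is skipped.
    """
    return sum(1 for user in diggers[1:] if user in reverse_friends)


def chance_probability(k, n, group_size, pool_size):
    """Binomial probability C(n, k) * p^k * (1 - p)^(n - k), with p = K / N."""
    p = group_size / pool_size
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical story: the user names and reverse-friend list are made up.
diggers = ["submitter", "alice", "bob", "carol", "dave"]
reverse_friends = {"alice", "carol", "erin"}
k = count_friend_diggs(diggers, reverse_friends)

# n and N are the values quoted in the text; K = 50 is an assumed network size.
n, N, K = 215, 15742, 50
print(f"k = {k}, P(k, n) = {chance_probability(k, n, K, N):.4f}")
```

The short digger list is only there to show the counting step; in the setting described in the text, k would be counted over the 215 diggers that follow the submitter on each story's list.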
