Predicting The Importance of Newsfeed Posts and Social Network Friends
All content following this page was uploaded by Michael Gamon on 15 December 2015.
preferentially weighting content based on keywords, or otherwise helping users rank their feed content.

Related Research

Given that Facebook data is not entirely public, little research has examined methods for content ranking in Facebook. However, several efforts have demonstrated the predictive qualities of Facebook data. First, in terms of leveraging social media to uncover relationships, Gilbert and Karahalios (2009) showed that properties such as the number of intimacy words exchanged between two users on their Facebook walls, and days since their last communication, can predict "tie strength" (Granovetter, 1973), or the strength of the relationship between any two users, with moderate to strong accuracy. In our study, although we had users rate the "importance" of their friends, it was in terms of how interested they were in knowing about their daily activities. Because it is possible to have weak tie strength and a strong interest in knowing about a friend's daily life (e.g., a boss), our focus here is more about news and less about social relationships, though we consider all such variables to be useful features for classification.

In addition to predicting tie strength, Facebook has been analyzed statistically to better understand a variety of properties of users and their behavior. For example, new Facebook users' photo posting behavior can be predicted by the photo posting behavior of friends in their network (Burke et al., 2009). The number of friends has been shown to have a curvilinear relationship with the social attractiveness and extraversion of the user (Tong et al., 2008). And Sun et al. (2009) demonstrated that information diffuses in small chains of users that may then merge, rather than starting at a single point.

In terms of triaging content more broadly, research in the email domain has demonstrated a number of benefits to leveraging social metadata. For example, Venolia et al. (2001) highlight a variety of social attributes of email that contribute to perceived importance, including whether it was addressed directly to the user as well as the relationship of the sender to the user (e.g., whether the email came from a manager). Given the potential usefulness of social metadata, prototype systems such as DriftCatcher (Lockerd, 2002), Bifrost (Balter & Sidner, 2002), and SNARF (Neustaedter et al., 2005) have incorporated social relationship information when organizing and presenting email to the user in order to facilitate triage. Finally, of notable relevance, Horvitz et al. (1999) demonstrated that machine learning could be leveraged to rank email content for near-automated triage.

In summary, the sheer number of posts most users see in Facebook highlights the need for better content ranking. While relatively little research exists specifically on ranking social network feed content, prior work has demonstrated that user behavior and Facebook content are predictive of a number of phenomena, and thus are good candidates on which to train statistical models for classification. From work in the email domain, we know that both social metadata and machine learning have been successfully leveraged to help triage incoming content. Here, we take a similar approach in what, to our knowledge, is the first research to apply machine learning to build predictive models of newsfeed importance, which in turn can be used to build interfaces that help users triage their flood of posts.

User Study

In order to obtain importance ratings for newsfeed posts and friends, we conducted a user study. We recruited 24 participants through an email solicitation sent to our organization. Participants were required to be active Facebook users who checked their newsfeed on a daily basis. All participants had at least 200 friends in their social network. They were also financially compensated for their involvement.

Data Collection Method

Participants were asked to download a Facebook application we developed called "Newsfeed Tagger" under Facebook's Terms of Service agreement. As shown in Figure 1, the application consists of two tabs: one to Rate News Feed and another to Rate Friends. For newsfeed posts, the application retrieved and displayed posts using the same markup language style as the newsfeed on the Facebook home page (Figure 1(a)). Participants were instructed to rate the importance of each post using a slider next to the post, where "the far right of the slider means that this item is very important and the far left means that you would skip the item." The sliders provided a continuous value from 0 to 100. For rating friends, participants received a list of friends in their network ranked according to a simple heuristic that took into account the last time users interacted with that friend and how frequently. As shown in Figure 1(b), because users
had over 200 friends, we also included a search box so that users could find friends. Participants were instructed to use the adjacent sliders to rate how "close" they were to the friend, where closeness was defined as "interest in knowing what is going on in their daily lives". Because many of the participants found it onerous to rate all their friends, we asked them to rate at least 100 friends.

Participants were asked to do the rating every day for a full business week. Because we allowed participants to submit their ratings at their own leisure, not all participants actively rated their newsfeed and friends. In all, we received 4989 newsfeed ratings and 4238 friend ratings.

Upon initiating the study, we downloaded whatever information was programmatically available for the participant's Facebook account through the beta version of the Facebook Open Stream API per the Terms of Service agreement. Because participants had extensive social networks, we did not download information about all the friends in their networks but only those they remembered enough to rate in the Rate Friends tab. Because participants rated friends who had not sent posts during the week of the study, and not all post senders were rated by the participants, only 3241 out of the 4989 posts (65%) had ratings for the sender, along with other downloaded information. We used this smaller dataset for model selection so that we could compare the effects of using different sets of features.

Data Analysis

Figure 2: (a) Newsfeed ratings histogram; (b) Friend ratings histogram; (c) Scatter plot of time since post creation by newsfeed rating.

In order to validate the need for newsfeed triage, we first examined descriptive statistics for the ratings. Figure 2(a) displays a histogram of all the newsfeed ratings. The mode of the ratings was 0 – hence the large spike in the left of the histogram. The average rating was 37.3 and the median 36. Note that ratings greater than 80 comprised the two smallest bins in the histogram. ¾ of the ratings were below 60. In short, the descriptive statistics demonstrate that most participants regarded the majority of newsfeed posts they received to be unimportant, though participants varied in their rating distributions, as we revisit later.

Figure 2(b) displays a histogram of the friend ratings. Similar to the newsfeed ratings, the two smallest bins consist of ratings 80 and above. The mode was 0 and ¾ of the ratings were below 60. The average friend rating was 42.4 and the median was 40. Hence, our participants considered the majority of their friends to be people for whom they had little to moderate interest in knowing about their daily affairs. This does not include the friends they could not remember.

Finally, because Facebook utilizes reverse chronological ordering of the newsfeed, we assessed to what extent timeliness, or urgency, was correlated with the ratings. In other words, we investigated whether participants considered the most recent newsfeed posts to be the most important. Figure 2(c) shows a scatter plot where the x-axis represents the time since the post was created in minutes and the y-axis represents the newsfeed rating. Note that instead of a left-leaning slope the scatter plot shows more of a vertical column; indeed, the Pearson correlation (r=.01) was not statistically significant. In short, for our participants, reverse chronological ordering did not suffice to surface the most important newsfeed posts.

While many Facebook users would have suspected much of the data analysis reported in this section, no prior research has, to our knowledge, provided any such empirical validation.

Model Selection Experiments

We conducted model selection experiments with two goals in mind: first, we sought to identify what kinds of features were predictive of the perceived importance of newsfeed posts and friends, and second, we sought to attain the maximum classification accuracy possible on the data. Given the successful track record of linear kernel SVM classifiers in the area of text classification (Joachims, 1998), and the fact that they can be trained relatively quickly over a very large number of features (e.g., n-grams), we decided to learn linear SVM classifiers using the Sequential Minimal Optimization (SMO) algorithm (Platt, 1999). For performance reasons, we discretized the values of the continuous predictor variables into 5 bins containing roughly the same number of cases in each bin. For our primary target variable, newsfeed rating, which is also continuous, we split the ratings into 2 bins, Important and Not Important, for several reasons. First, we intended to employ the models as a type of spam filter, which is typically binary. Second, finer-grained classification would have been difficult given the size of our dataset (3241 cases). Furthermore, although we could have set the target
variable threshold to the midpoint of the sliders (i.e., 50), given the skewed histogram in Figure 2(b) we decided to use the median rating (i.e., 35) instead. This allowed us to avoid modeling complications due to unbalanced classes.

Feature Engineering

Having downloaded all programmatically available content from participants' Facebook accounts, we engineered features from three types of information: social media properties, the message text and corpus, and shared background information.

Social media properties. Social media properties included any properties related to the newsfeed post and sender, excluding the actual text. In particular, we extracted: Whether the post was a wall post or feed post; Whether the post contained photos, links, and/or videos; Total number of comments by everyone; Total number of comments by friends (including multiple comments); Total number of comments by distinct friends; Total number of likes by everyone; Total number of likes by friends; Time elapsed since the post was created; Total number of words exchanged between the user and the sender on their respective walls (including comments); Total number of posts from the user to the sender; Total number of posts from the sender to the user; Time since the first exchange; Time since the most recent exchange; Total number of photos in which both the user and sender are tagged together; Total number of photos the user has of the friend and vice versa; Total number of friends overlapping in their respective networks.

For every post, we also had lists of Facebook account IDs that had provided comments, likes, etc. We created a set of features based on knowing the importance rating of the account IDs; in particular, the maximum friend rating of people who posted comments, put likes, or are otherwise tagged in photos. The intuition here is that even if users do not find the post content to be important, it may become important if someone they know and track with great interest commented on it. For mutual friends between the user and sender, we also extracted the maximum, minimum and average friend rating, along with its variance.

Message text and corpus. For text analysis features, we looked at two sources: the post and the corpus of all posts exchanged between the user and the sender. Because Facebook maintains only the most recent posts, for the corpus we were only able to retrieve posts up to roughly 2-3 months prior to the date of retrieval. In order to capture the linguistic content of the post and corpus, we extracted both n-gram features, with n ranging from 1 to 3, and features based on the Linguistic Inquiry and Word Count dictionary (LIWC, Pennebaker et al., 2007). N-gram features had binary values depending on whether the n-gram was present or not, whereas the LIWC features consisted of counts. Binary features that were observed 3 times or less in the corpus were eliminated. The LIWC features correspond to the counts of words in a text belonging to each of 80 categories in the LIWC dictionary. Note that Gilbert and Karahalios (2009) did not utilize any n-gram features and only looked at 13 emotion and intimacy related LIWC categories: Positive Emotion, Negative Emotion, Family, Friends, Home, Sexual, Swears, Work, Leisure, Money, Body, Religion and Health. Given our focus on news, we decided to include all other categories, such as Insight (e.g., "think", "know"), Assent (e.g., "agree", "OK") and Fillers (e.g., "you know", "I mean").

In addition, we also extracted a number of other text-oriented features from the post and corpus: Whether there were embedded URLs; Total number of stop words; Ratio of stop words to total words; Ratio of non-punctuation, non-alphanumeric characters to total characters; Sum and average of tf.idf (term frequency × inverse document frequency) of all words in a post or corpus, where tf.idf scores were computed on all the posts in the entire dataset; Sum and average of tf.idf of all words in a post or corpus, where tf.idf scores were computed on Wikipedia; Delta of the previous two tf.idf measures; Message length in tokens and characters.

Shared background information. Finally, for every participant and rated friend, we compared shared background information in terms of the following self-disclosed categories: Affiliations, Hometowns, Religion, Political Views, Current Location, Activities, Stated Interests, Music, Television, Movies, Books, Pre-College Education, College and Post-College Education, and "About Me" Profile. For each of these categories, after removing category-specific stop words (e.g., "high school" for Pre-College Education), we extracted the number of common words as well as the percent overlap.

Experimental Setup

All of our model selection experiments were conducted in the following manner. First, the dataset was split into five folds of training and test data. The training set of the first fold was then utilized to tune the optimal classifier parameter settings, as measured on the test set of the first fold. These settings were then used to learn SVM classifiers on the training files for the remaining four folds. Evaluation was performed on the test sets of the four folds. We conducted a grid search on the first fold to determine optimal values for the SVM cost parameter c, which trades off training error against model complexity (Joachims, 2002). For feature reduction of binary features, even after imposing a count cutoff, the number of n-gram features was in the tens of thousands. So, we reduced the number of features to the top 3K features in terms of log likelihood ratios (Dunning, 1993) as determined on the training set.
Figure 3: (a) Newsfeed importance classification; (b) Friend importance classification; (c) Histogram of the maximum rating differences
between participants for the same newsfeed post. Error bars represent standard errors about the mean.
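The preprocessing described in the Model Selection Experiments, equal-frequency discretization of continuous predictors into 5 bins and a median split of the newsfeed rating into Important vs. Not Important, can be sketched as follows. This is an illustration only: the function names, the tie-breaking rule, and the choice that the threshold value itself counts as Not Important are our assumptions, not details from the paper.

```python
def equal_frequency_bins(values, n_bins=5):
    """Map each value to a bin index 0..n_bins-1 so that each bin holds
    roughly the same number of cases (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        # Assign by rank so bin sizes differ by at most one case.
        bins[idx] = rank * n_bins // len(values)
    return bins

def binarize_ratings(ratings, threshold=35):
    """Split 0-100 slider ratings at the median threshold the paper
    reports (35): True = Important, False = Not Important."""
    return [r > threshold for r in ratings]
```

Binning by rank rather than by fixed value ranges is what keeps the bins balanced despite the heavily skewed rating histograms in Figure 2.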