INFORMS 2012 © 2012 INFORMS | ISBN 978-0-9843378-3-5

http://dx.doi.org/10.1287/educ.1120.0105

Mining Social Media: A Brief Introduction


Pritam Gundecha, Huan Liu
Arizona State University, Tempe, Arizona 85287 {[email protected], [email protected]}

Abstract The pervasive use of social media has generated unprecedented amounts of social
data. Social media provides an easily accessible platform for users to share information.
Mining social media has the potential to extract actionable patterns that can be
beneficial for businesses, users, and consumers. Social media data are vast, noisy,
unstructured, and dynamic in nature, and thus novel challenges arise. This tutorial
reviews the basics of data mining and social media, introduces representative research
problems of mining social media, illustrates the application of data mining to social
media using examples, and describes some projects that mine social media for
humanitarian assistance and disaster relief in real-world applications.
Keywords social media; data mining; social data; social media mining; social networking sites;
blogging; microblogging; crowdsourcing; HADR; privacy; trust

1. Introduction
Data mining research has successfully produced numerous methods, tools, and algorithms
for handling large amounts of data to solve real-world problems. Traditional data mining
has become an integral part of many application domains including bioinformatics, data
warehousing, business intelligence, predictive analytics, and decision support systems. Pri-
mary objectives of the data mining process are to effectively handle large-scale data, extract
actionable patterns, and gain insightful knowledge. Because social media is widely used
for various purposes, vast amounts of user-generated data exist and can be made available
for data mining. Data mining of social media can expand researchers’ capability of under-
standing new phenomena due to the use of social media and improve business intelligence
to provide better services and develop innovative opportunities. For example, data mining
techniques can help identify the influential people in the vast blogosphere, detect implicit
or hidden groups in a social networking site, sense user sentiments for proactive planning,
develop recommendation systems for tasks ranging from buying specific products to mak-
ing new friends, understand network evolution and changing entity relationships, protect
user privacy and security, or build and strengthen trust among users or between users and
entities. Mining social media is a burgeoning multidisciplinary area where researchers of dif-
ferent backgrounds can make important contributions that matter for social media research
and development.
The objective of this tutorial is to introduce social media, data mining, and their con-
fluence—mining social media. We attempt to achieve the goal by presenting representative
and interesting research issues and important social media tasks based on our experience
and research. This tutorial first reviews data mining, social media and its types, and the
importance of social media mining. In §2, we briefly introduce representative issues in social
media mining. In §3, we highlight the impact of social media mining using three examples
based on our current research. Section 4 illustrates how social media mining is applied in
some real-world applications—two projects on humanitarian assistance and disaster relief
(HADR) carried out in the Data Mining and Machine Learning Laboratory (DMML) at
Arizona State University (ASU). We conclude this tutorial in §5.
Gundecha and Liu: Mining Social Media: A Brief Introduction
Tutorials in Operations Research, © 2012 INFORMS

1.1. Data Mining


Data mining (Tan et al. [55]) is a process of discovering useful or actionable knowledge
in large-scale data. Data mining also means knowledge discovery from data (KDD) (Han
et al. [24]), which describes the typical process of extracting useful information from raw
data. The KDD process broadly consists of the following tasks: data preprocessing, data
mining, and postprocessing. These steps need not be separate tasks and can be combined
together. Data mining is an integral part of many related fields including statistics, machine
learning, pattern recognition, database systems, visualization, data warehouse, and infor-
mation retrieval (Han et al. [24]).
Data mining algorithms are broadly classified into supervised, unsupervised, and
semisupervised learning algorithms. Classification is a common example of a supervised
learning approach. For supervised learning algorithms, a given data set is typically
divided into two parts: training and testing data sets with known class labels.
Supervised algorithms build classification models from the training data and use the
learned models for prediction. To evaluate a classification model’s performance, the
model is applied to the test data to obtain classification accuracy. Typical supervised
learning methods include decision tree induction, k-nearest neighbors, naive Bayes
classification, and support vector machines.
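The train/test evaluation described above can be sketched in a few lines of Python. This is only a minimal illustration using k-nearest neighbors; the toy data, the function names, and the choice of k = 3 are assumptions made for the example and do not come from the cited works.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # train is a list of (feature_vector, label) pairs.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test instances whose predicted label equals the known label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)


# Made-up data: two well-separated classes in two dimensions.
train_set = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
             ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]
test_set = [((1.1, 0.9), "a"), ((5.1, 5.0), "b")]

print(accuracy(train_set, test_set))
```

The model is "trained" simply by storing train_set; the held-out test_set, whose labels are known, is used only to measure classification accuracy.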
Unsupervised learning algorithms are designed for data without class labels. Clustering
is a common example of unsupervised learning. For a given task, unsupervised learning
algorithms build the model based on the similarity or dissimilarity between data objects.
Similarity or dissimilarity between the data objects can be measured using proximity
measures including Euclidean distance, Minkowski distance, and Mahalanobis distance. Other
proximity measures such as simple matching coefficient, Jaccard coefficient, cosine similarity,
and Pearson’s correlation can also be used to calculate similarity or dissimilarity between
the data objects. K-means, hierarchical clustering (agglomerative or divisive methods),
and density-based clustering are typical examples of unsupervised learning.
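To make some of the proximity measures named above concrete, the following Python sketch implements Minkowski distance (with Euclidean distance as the special case p = 2), cosine similarity, and the Jaccard coefficient. The function names are our own, the Jaccard coefficient is written for sets (equivalent to the binary-attribute form), and inputs are assumed to be nonempty and, for cosine similarity, nonzero.

```python
import math


def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def euclidean(x, y):
    """Euclidean distance is the Minkowski distance with p = 2."""
    return minkowski(x, y, 2)


def cosine_similarity(x, y):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def jaccard(s, t):
    """Jaccard coefficient: size of the intersection over size of the union."""
    s, t = set(s), set(t)
    return len(s & t) / len(s | t)


print(euclidean((0, 0), (3, 4)))
print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))
```

Distances grow as objects become less alike, whereas cosine similarity and the Jaccard coefficient grow as objects become more alike, so a clustering algorithm must be told which kind of measure it is given.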
Semisupervised learning algorithms are most applicable where there exist small amounts
of labeled data and large amounts of unlabeled data. Two typical types of semisupervised
learning are semisupervised classification and semisupervised clustering. The former uses
labeled data to build a classifier and unlabeled data to refine the classification
boundaries further, and the latter uses labeled data to guide clustering. Cotraining is a
representative semisupervised learning algorithm. Active learning algorithms allow users to
play an active role in the learning process via labeling. Typically, users are domain
experts, and their skills are employed to label some data instances for which a machine
learning algorithm is not confident about its classification. Minimum marginal hyperplane
and maximum curiosity are two popular active learning algorithms.
Data mining includes other techniques such as association rule mining, anomaly detection,
feature selection, instance selection, and visual analytics. Additional details related to these
data mining techniques can be found in Han et al. [24], Tan et al. [55], Witten et al. [59],
Zhao and Liu [63], and Liu and Motoda [37].

1.2. Social Media


Social media (Kaplan and Haenlein [28]) is defined as a group of Internet-based applications
that build on the ideological and technological foundations of Web 2.0 and that allow the
creation and exchange of user-generated content. Social media is a conglomerate of different
types of social media sites including traditional media such as newspapers, radio, and
television and nontraditional media such as Facebook, Twitter, etc. Table 1 shows
characteristics of different types of social media.

Table 1. Characteristics of different types of social media.

Type                          Characteristics
Online social networking      Online social networks are Web-based services that allow
                              individuals and communities to connect with real-world friends
                              and acquaintances online. Users interact with each other
                              through status updates, comments, media sharing, messages,
                              etc. (e.g., Facebook, Myspace, LinkedIn).
Blogging                      A blog is a journal-like website for users, aka bloggers, to
                              contribute textual and multimedia content, arranged in reverse
                              chronological order. Blogs are generally maintained by an
                              individual or by a community (e.g., Huffington Post, Business
                              Insider, Engadget).
Microblogging                 Microblogs can be considered the same as blogs but with limited
                              content (e.g., Twitter, Tumblr, Plurk).
Wikis                         A wiki is a collaborative editing environment that allows
                              multiple users to develop Web pages (e.g., Wikipedia,
                              Wikitravel, Wikihow).
Social news                   Social news refers to the sharing and selection of news stories
                              and articles by a community of users (e.g., Digg, Slashdot,
                              Reddit).
Social bookmarking            Social bookmarking sites allow users to bookmark Web content
                              for storage, organization, and sharing (e.g., Delicious,
                              StumbleUpon).
Media sharing                 Media sharing is an umbrella term that refers to the sharing of
                              a variety of media on the Web including video, audio, and photo
                              (e.g., YouTube, Flickr, UstreamTV).
Opinion, reviews, and ratings The primary function of such sites is to collect and publish
                              user-submitted content in the form of subjective commentary on
                              existing products, services, entertainment, businesses, places,
                              etc. Some of these sites also provide product reviews (e.g.,
                              Epinions, Yelp, Cnet).
Answers                       These sites provide a platform for users seeking advice,
                              guidance, or knowledge to ask questions. Other users from the
                              community can answer these questions based on previous
                              experiences, personal opinions, or relevant research. Answers
                              are generally judged using ratings and comments (e.g., Yahoo!
                              answers, WikiAnswers).

Social media gives users an easy-to-use way to communicate and network with each other
on an unprecedented scale and at rates unseen in traditional media. The popularity of social
media continues to grow exponentially, resulting in an evolution of social networks, blogs,
microblogs, location-based social networks (LBSNs), wikis, social bookmarking applications,
social news, media (text, photo, audio, and video) sharing, product and business review
sites, etc. Facebook (footnote 1), the social networking site, recorded more than 845 million
active users as of December 2011. This number suggests that China (approximately 1.3 billion)
and India (approximately 1.1 billion) are the only two countries in the world that have larger
populations than Facebook. Facebook and Twitter together have accrued more than 1.2 billion
users (footnote 2), more than three times the population of the United States and more than
the population of any continent except Asia.

1.3. Mining Social Media


Vast amounts of user-generated content are created on social media sites every day. This
trend is likely to continue with exponentially more content in the future. Hence, it is critical
for producers, consumers, and service providers to figure out how to manage and use
massive user-generated data. Social media growth is driven by these challenges: (1) How
can a user be heard? (2) Which source of information should a user use? (3) How can user
experience be improved? Answers to these questions are hidden in the social media data.
These challenges present ample opportunities for data miners to develop new algorithms
and methods for social media.

1 http://newsroom.fb.com/content/default.aspx?NewsAreaId=22 (accessed March 2012).
2 Facebook and Twitter have many overlapping users. Hence, 1.2 billion users are not unique.

Data generated on social media sites are different from the conventional attribute-value
data of classic data mining. Social media data are largely user-generated content on social
media sites. Social media data are vast, noisy, distributed, unstructured, and dynamic. These
characteristics challenge data miners to invent new, efficient techniques and algorithms.
For example, Facebook (footnote 3) and Twitter (footnote 4) report Web traffic data from
approximately 149 million and 90 million unique U.S. visitors per month, respectively.
According to the video sharing site YouTube (footnote 5), more than 4 billion videos are
viewed per day, and 60 hours of video are uploaded every minute. The picture sharing site
Flickr (footnote 6), as of August 2011, hosts more than 6 billion photo images. Web-based,
collaborative, and multilingual Wikipedia (footnote 7) hosts over 20 million articles
attracting over 365 million readers.
Depending on the platform, social media data can often be very noisy. Removing the noise
from the data is essential before performing effective mining. Researchers have observed
that spammers (Yardi et al. [61], Chu et al. [12]) generate more data than legitimate
users. Social media data are distributed because there is no central authority that
maintains data from all social media sites. Distributed social media data make it a daunting
task for researchers to understand the information flows on social media. Social media
data are often unstructured. Making meaningful observations based on unstructured data
from various data sources is a big challenge. For example, social media sites like LinkedIn,
Facebook, and Flickr serve different purposes and meet different needs of users.
Social media sites are dynamic and continuously evolving. For example, Facebook recently
introduced many concepts including a user’s timeline, the creation of in-groups for a
user, and numerous user privacy policy changes. The dynamic nature of social media data
poses a significant challenge as social media sites evolve continuously and rapidly. There
are many additional interesting questions related to human behavior that can be studied
using social media data. Social media can also help advertisers find the influential people
to maximize the reach of their products within an advertising budget. Social media can help
sociologists uncover human behavior such as the in-group and out-group behaviors of
users. Recently, social media was reported to play an instrumental role in facilitating mass
movements such as the Arab Spring (footnote 8) and Occupy Wall Street (footnote 9).

2. Issues in Mining Social Media


In this section, we introduce some representative research issues in mining social media.

2.1. Community Analysis


A community is formed by individuals such that those within a group interact with each
other more frequently than with those outside the group. Depending on the context, a
community is also referred to as a group, cluster, cohesive subgroup, or module. Communities
can be observed via connections in social media because social media allows people to expand
social networks online (Tang and Liu [56]). Social media enables people to connect with
friends and find new users with similar interests. Communities found in social media are
broadly classified into explicit and implicit groups. Explicit groups are formed by user
subscriptions, whereas implicit groups emerge naturally through interactions. Community
analysts are generally faced with issues such as community detection, formation, and evolution.

3 http://www.quantcast.com/facebook.com (accessed March 2012).
4 http://www.quantcast.com/twitter.com (accessed March 2012).
5 http://www.youtube.com/t/press_statistics (accessed March 2012).
6 http://news.softpedia.com/news/Flickr-Boasts-6-Billion-Photo-Uploads-215380.shtml (accessed March 2012).
7 http://en.wikipedia.org/wiki/Wikipedia (accessed March 2012).
8 http://en.wikipedia.org/wiki/Arab_Spring (accessed March 2012).
9 http://en.wikipedia.org/wiki/Occupy_Wall_Street (accessed March 2012).

Community detection often refers to the extraction of implicit groups in a network. The
main challenges of community detection are that (1) the definition of a community can
be subjective, and (2) the lack of ground truth makes community evaluation difficult. Tang
and Liu [56] divided community detection methods into four categories: (1) node-centric
community detection, where each node satisfies certain properties such as complete
mutuality, reachability, node degrees, frequency of within and outside ties, etc. (examples
include cliques, k-cliques, and k-clubs); (2) group-centric community detection, where a
group needs to satisfy certain properties (for example, minimum group densities);
(3) network-centric community detection, where groups are formed by partitioning the network
into disjoint sets (examples are spectral clustering and modularity maximization); and
(4) hierarchy-centric community detection, where the goal is to build a hierarchical
structure of communities. This allows the analysis of a network at different resolutions.
Representative methods are divisive clustering and agglomerative clustering.
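As a small illustration of the network-centric view, the sketch below computes the standard Newman modularity score of a given partition of an undirected, unweighted graph: for each community, the fraction of edges inside it minus the squared fraction of edge endpoints attached to it. It only scores a partition; it does not search for the best one. The function name and the toy graph (two triangles joined by a single edge) are illustrative assumptions.

```python
def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph.

    edges is a list of (u, v) pairs; community_of maps each node to a label.
    """
    m = len(edges)
    intra = {}   # number of edges with both endpoints in the community
    degree = {}  # sum of node degrees in the community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Two triangles joined by the single edge (3, 4).
toy_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {node: "A" for node in range(1, 7)}

print(modularity(toy_edges, two_groups))
print(modularity(toy_edges, one_group))
```

Splitting the graph into its two triangles scores higher than treating all six nodes as one community, which matches the intuition that members of a community interact more with each other than with outsiders.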
Social media networks are highly dynamic. Communities can expand, shrink, or dissolve
in dynamic networks. Community evolution aims to discover the patterns of a community
over time with the presence of dynamic network interactions. Backstrom et al. [8] found
that the more friends you have in a group, the more likely you are to join, and communities
with cliques grow more slowly than those that are not tightly connected.

2.2. Sentiment Analysis and Opinion Mining


Sentiment analysis and opinion mining aim to automatically extract opinions expressed in
the user-generated content. Sentiment analysis and opinion mining tools allow businesses to
understand product sentiments, brand perception, new product perception, and reputation
management. These tools help users to perceive product opinions or sentiments on a global
scale. There are many social media sites reporting user opinions of products in many different
formats. Monitoring these opinions related to a particular company or product on social
media sites is a new challenge.
Sentiment analysis is hard because the languages used to create content are ambiguous.
Major steps of sentiment analysis are (1) finding relevant documents, (2) finding relevant
sections, (3) finding the overall sentiment, (4) quantifying the sentiment, and (5) aggregating
all sentiments to form an overview. Basic components of an opinion are (1) an object on
which an opinion is expressed, (2) an opinion expressed on an object, and (3) the opinion
holder. Objects are generally represented as a finite set of features, where each feature
represents a finite set of synonymous words or phrases. Opinion mining tasks can be
performed at the document level (Turney [58], Pang et al. [49]), sentence level (Riloff and
Wiebe [51], Yu and Hatzivassiloglou [62]), or feature level (Hu and Liu [25], Popescu and
Etzioni [50], Liu and Maes [36]). Extracting opinions expressed in comparative sentences is a
difficult task, and some preliminary work can be found in Jindal and Liu [26], Jindal and
Liu [27], Liu [35], and Pang and Lee [48]. Performance evaluation of sentiment analysis is
another challenge because of the lack of ground truth.
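Steps (3) through (5) above can be sketched with a naive word-counting scorer. This is a deliberately simplistic illustration and not a method from the cited works: the two word lists are a tiny hypothetical lexicon, and the scorer ignores negation, sarcasm, and context, which is exactly the kind of ambiguity that makes the task hard.

```python
# Hypothetical, hand-picked lexicon used only for this illustration.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def score(text):
    """Quantify sentiment as (#positive words) - (#negative words)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def aggregate(reviews):
    """Aggregate per-review sentiment into an overview of label counts."""
    summary = {"positive": 0, "negative": 0, "neutral": 0}
    for review in reviews:
        s = score(review)
        label = "positive" if s > 0 else "negative" if s < 0 else "neutral"
        summary[label] += 1
    return summary


reviews = ["Great phone, love it!", "Terrible battery.", "It arrived on Monday."]
print(aggregate(reviews))
```

Note that score("not good") is positive under this scheme, so even a two-word phrase is misjudged without some handling of negation.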

2.3. Social Recommendation


Traditional recommendation systems attempt to recommend items based on aggregated
ratings of objects from users or past purchase histories of users. A social recommendation
system makes use of a user’s social network and related information in addition to the
traditional recommendation means. Social recommendation is based on the hypothesis that
people who are socially connected are more likely to share the same or similar interests
(homophily), and that users can be easily influenced by the friends they trust and prefer
their friends’ recommendations to random recommendations. Objectives of social
recommendation systems are to improve the quality of recommendation and alleviate the
problem of information overload. Examples of social recommendation systems are book
recommendations based on friends’ reading lists on Amazon or friend recommendations on
Twitter and Facebook. More details on social recommendation systems can be found in Konstas
et al. [30], Ma et al. [38], and Backstrom and Leskovec [7].

2.4. Influence Modeling


Social scientists have been exploring influence and homophily in social networks for quite
some time (McPherson et al. [42], Lazarsfeld and Merton [34]). It is important to know
whether the underlying social network is influence driven or homophily driven. For example,
in the advertisement industry, if the social network is influence driven, then the influential
users should be identified and incentivized to promote the product or services to the members
of the social network. However, if the social network is homophily (similarity) driven, then
some individual users should be directly targeted to promote sales. Most social networks
have a mixture of both homophily and influence. Hence, distinguishing them is a challenge.
Aral et al. [5] and La Fond and Neville [33] gave details on distinguishing social influence and
homophily in social networks. Kempe et al. [29] studied the influence maximization problem.
For a given information propagation model, influence maximization aims to identify the
set of initial influential users from a given snapshot of a social network such that they can
influence the maximum number of other users within given budget constraints. Agarwal
et al. [3, 4] presented a preliminary model to identify influential bloggers in a community.
The blogosphere obeys a power law distribution with a few blogs being extremely influential
and a huge number of blogs being largely unknown. They reported that active bloggers are
not necessarily influential and proposed effective influence measures to identify influential
bloggers. A more general discussion on modeling and data mining in the blogosphere is given
by Agarwal and Liu [2].
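The influence maximization problem stated above can be illustrated with a simple greedy heuristic: repeatedly add the user whose inclusion yields the largest estimated spread, where spread is estimated by simulating the independent cascade model. This is a sketch under several assumptions of ours: the budget is taken to be the number of seed users, every edge has the same activation probability, and the graph, function names, and parameter defaults are made up for the example.

```python
import random


def simulate_cascade(graph, seeds, prob, rng):
    """One independent cascade run; returns the number of activated users."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            # Each newly active user gets one chance to activate each follower.
            for v in graph.get(u, ()):
                if v not in active and rng.random() < prob:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(graph, seeds, prob, runs, rng):
    """Monte Carlo estimate of the expected number of activated users."""
    return sum(simulate_cascade(graph, seeds, prob, rng) for _ in range(runs)) / runs


def greedy_seeds(graph, budget, prob=0.1, runs=200, seed=0):
    """Greedily pick up to `budget` seed users with the largest estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    chosen = []
    for _ in range(min(budget, len(nodes))):
        candidates = sorted(nodes - set(chosen))
        best = max(candidates,
                   key=lambda n: expected_spread(graph, chosen + [n], prob, runs, rng))
        chosen.append(best)
    return chosen


# Made-up directed graph: user -> users who see that user's posts.
toy_graph = {"a": ["b", "c", "d"], "b": ["c"], "e": ["f"]}
print(greedy_seeds(toy_graph, budget=2))
```

Each greedy step re-estimates the spread for every remaining candidate, so the cost grows quickly with network size; this sketch is meant to convey the idea, not to scale.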

2.5. Information Diffusion and Provenance


Researchers study how information diffuses and explore different models of information
diffusion, including the independent cascade model, the threshold model, the susceptible–
infected model, and the susceptible–infected–recovered model. Many such models have been
studied (Bailey [10], Granovetter [21], Mahajan [40], Macy [39], Berger [11]). Researchers
apply these models to analyze the spread of rumors, computer viruses, and diseases during
outbreaks. Two important problems from the social media viewpoint are (1) how information
spreads in a social media network and which factors affect the spread, and (2) what the
plausible sources are, given some information from social media. The first problem,
information diffusion, has received considerable attention from researchers. The second
problem, information provenance in social media, is still an open research problem and is
recognized as a key issue in differentiating rumors from truth. Because social media data are
distributed and dynamic, the conventional techniques used in classical provenance research
cannot be directly applied in social media.
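To illustrate how one of the diffusion models named above behaves, the following sketch simulates a simple threshold model: a user adopts a piece of information once the fraction of his or her neighbors who have already adopted it reaches that user's threshold. The synchronous update rule, the function name, and the toy network are assumptions chosen for the example rather than details taken from the cited works.

```python
def threshold_diffusion(neighbors, thresholds, seeds):
    """Simulate a threshold model with synchronous rounds.

    neighbors maps each user to a list of neighbors; thresholds maps each user
    to the fraction of active neighbors required for adoption. Returns a list
    of sets: the users newly activated in each round (round 0 is the seeds).
    """
    active = set(seeds)
    rounds = [set(seeds)]
    while True:
        newly = set()
        for node, nbrs in neighbors.items():
            if node in active or not nbrs:
                continue
            fraction = sum(n in active for n in nbrs) / len(nbrs)
            if fraction >= thresholds[node]:
                newly.add(node)
        if not newly:
            return rounds
        active |= newly
        rounds.append(newly)


# Made-up chain network a - b - c - d, everyone with threshold 0.5.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
uniform = {user: 0.5 for user in chain}
print(threshold_diffusion(chain, uniform, {"a"}))
```

Raising a single user's threshold can stop the spread entirely: if b required more than half of its neighbors to be active, the information would never leave a.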

2.6. Privacy, Security, and Trust


The low barrier to and pervasive use of social media give rise to concerns about user privacy
and security. New challenges arise due to users’ opposing needs: on one hand, a user would
like to have as many friends and share as much as possible, and on the other hand, a user
would like to be as private as possible when needed. Being gregarious requires
openness and transparency, whereas being private constricts one’s sharing. In addition, a
social networking site has a business need to encourage users to easily find each other and
expand their friendship networks as widely as possible (in other words, to be open). Hence,
social media poses new challenges in fending off security threats to users and organizations.
With the variety of personal information disclosed in user profiles (e.g., information about
other users and user networks may be indirectly accessible), individuals may put themselves
and members of their social networks at risk for a variety of attacks. Social media has been
the target of numerous passive as well as active attacks including stalking, cyberbullying,
malvertising, phishing, social spamming, scamming, and clickjacking.
Gross and Acquisti [22] showed that only a few users change the default privacy prefer-
ences on Facebook. In some cases, user profiles are completely public, making information
available and providing a communication mechanism to anyone who wants to access it. It
is no secret that when a profile is made public, malicious users including stalkers, spam-
mers, and hackers can use sensitive information for their personal gain. Sometimes malev-
olent users can even cause physical or emotional distress to other users (Rosenblum [52]).
Narayanan and Shmatikov [45, 46] demonstrated how users’ privacy can be weakened if an
attacker knows the presence of connections among users. Wondracek et al. [60] presented a
successful scheme to breach privacy by exploiting only the group membership information of
users.
Liu and Maes [36] pointed out a lack of privacy awareness and found a large number
of social network profiles in which people described themselves with a rich vocabulary in
terms of their passions and interests. Krishnamurthy and Wills [31] discussed the problem
of leakage of personally identifiable information and how it can be misused by third parties
(Narayanan and Shmatikov [46]). Squicciarini et al. [53] introduced a novel collective privacy
mechanism for better managing shared content between users. Fang and LeFevre [14] focused
on helping users to understand simple privacy settings, but did not consider additional
problems such as attribute inference (Zheleva and Getoor [64]) or shared data ownership
(Squicciarini et al. [53]). Zheleva and Getoor [64] showed how an adversary can exploit an
online social network with a mixture of public and private user profiles to predict the private
attributes of users. Baden et al. [9] presented a framework where users dictate who may
access their information based on public–private encryption–decryption algorithms.
Social trust depends on many factors that cannot be easily modeled in a computational
system. Many definitions of trust have been proposed in the literature
(Deutsch [13], Sztompka [54], Mui et al. [44], Olmedilla et al. [47], Grandison and
Sloman [20], Artz and Gil [6]). A highly cited thesis on trust computation (Marsh [41])
provides theoretical perspectives of modeling trust, but its complex nature makes it very
difficult to apply, especially to social networks (Golbeck and Hendler [18]). Trust between
any two people is observed to be affected by many factors including past experiences, opin-
ions expressed and actions taken, contributions to spreading rumors, influence by others’
opinions, and motives to gain something extra. Another important aspect of trust is the
trustworthiness of user-generated content. Moturu and Liu [43] provided an intuitive scoring
measure to quantify the trustworthiness of health-related user-generated content in social
media.

3. Illustrative Examples of Mining Social Media


In this section, we present some examples from our research to illustrate how to mine social
media data to address novel research issues. The first example is about assessing user
vulnerability on a social networking site to maintain user privacy. The second example
explores the importance of social and historical ties on a location-based social network. The
third example introduces a method that can take advantage of multifaceted trust for
predictive analytics.

3.1. Assessing User Vulnerability on a Social Networking Site (Gundecha et al. [23])

Attribute-value data are a principal data form in social media. Attributes available for every
user on a social networking site can be categorized into two major types: individual attributes
and community attributes. Individual attributes are those attributes that contain individual
user information. Individual attributes include personal information such as gender, birth
date, phone number, home address, etc. Community attributes are those attributes that
contain information about the friends of a user. Community attributes include friends that
are traceable from a user’s profile (i.e., a user’s friends list), tagged pictures, and wall
interactions. Using the privacy and security settings of a profile, a user can control the
visibility of most individual attributes but has limited control over the visibility of most
community attributes. For example, Facebook users these days can control photo tagging
and the sharing of their friend list with the public but still cannot control friends sharing
their friend lists or uploading photos of them from their profiles to the public.
Our earlier work (Gundecha et al. [23]) used a large-scale Facebook data set to assess
attribute visibility, which can be used to characterize the general behavior of Facebook
users. For example, Facebook users do not usually disclose their mobile phone number. Hence,
users who do disclose phone numbers tend to be more vulnerable because they disclose
more sensitive information in their profiles. A large portion of users are either not careful
or not aware of the consequences of their actions for the private information of their
friends. Thus, protecting community attributes is important in protecting user privacy.
A novel mechanism was proposed by Gundecha et al. [23] to enable users to protect against
vulnerability. Often users on a social networking site are unaware that they could pose a
threat to their friends because of their vulnerability. Gundecha et al. [23] showed that it
is feasible to measure a user’s vulnerability based on three factors: (1) the user’s privacy
settings that can reveal personal information, (2) the user’s action on a social networking
site that can expose his or her friends’ personal information, and (3) friends’ actions on
a social networking site that can reveal the user’s personal information. Based on these
factors, Gundecha et al. [23] formally presented one of the earliest models for vulnerability
reduction. They proposed a four-step procedure to estimate user vulnerability: (1) estimate
risk to privacy due to individual attributes (referred to as the I-index), (2) estimate risk to
privacy due to community attributes (referred to as the C-index), (3) estimate the visibility
of a user based on the I-index and C-index (referred to as the P -index), and (4) estimate
the user vulnerability based on the P -indexes of a user and his one-hop friends (referred to
as the V -index).
Figures 1(a) and 1(b) show the relationship between the I-index and C-index as well as
that between the P -index and V -index for 100,000 randomly chosen Facebook users. Note
that users are sorted in ascending order of their I-indexes and P -indexes. The x-axis and
y-axis show users and their index values, respectively.
A user’s vulnerable friend is defined as a friend whose unfriending will lower the V-index
score of the user. This definition of a vulnerable friend can be generalized to multiple vulnerable
friends. Figure 2 compares the V-index values of each user before (+) and after (◦)
unfriending the user’s k most vulnerable friends. For each graph in Figure 2, the x-axis
and y-axis indicate users and their V-index values, respectively. Without loss of generality,
we sorted all users in ascending order of their existing V-index values before we
plotted the graphs in Figure 2. We ran the experiment on 300,000 randomly selected users
of the Facebook data set for different values of k, including 1, 2, 10, and 50. As expected, vulnerability
decreases consistently as the value of k increases, as seen in Figures 2(a)–2(d).
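The definition of a vulnerable friend can be illustrated with a small sketch. It assumes, purely for illustration, that the V-index of a user is the mean of the P-index values of the user and his or her one-hop friends; the text above states only that the V-index is based on those P-indexes, so the averaging rule, the greedy selection, the function names, and the toy values are assumptions rather than the formulation of Gundecha et al. [23].

```python
from typing import Dict, List, Set


def v_index(user: str, friends: Set[str], p_index: Dict[str, float]) -> float:
    """Assumed V-index: mean P-index over the user and the given friends."""
    members = [user, *friends]
    return sum(p_index[m] for m in members) / len(members)


def most_vulnerable_friends(
    user: str, friends: Set[str], p_index: Dict[str, float], k: int
) -> List[str]:
    """Greedily pick up to k friends whose unfriending lowers the assumed V-index."""
    remaining = set(friends)
    removed: List[str] = []
    for _ in range(k):
        best, best_score = None, v_index(user, remaining, p_index)
        for friend in sorted(remaining):
            score = v_index(user, remaining - {friend}, p_index)
            if score < best_score:  # unfriending this friend lowers the V-index
                best, best_score = friend, score
        if best is None:  # no remaining friend is vulnerable
            break
        remaining.remove(best)
        removed.append(best)
    return removed


# Hypothetical P-index values for a user "u" and three friends.
p = {"u": 0.2, "a": 0.9, "b": 0.5, "c": 0.1}
print(most_vulnerable_friends("u", {"a", "b", "c"}, p, k=2))
```

Under this assumption, a friend is vulnerable exactly when his or her P-index exceeds the user's current V-index, which is why the loop stops early once no removal lowers the score.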
Gundecha and Liu: Mining Social Media: A Brief Introduction
Tutorials in Operations Research, c 2012 INFORMS 9

Figure 1. Relationship among index values for each user.
Panels: (a) I-index and C-index for each user; (b) P-index and V-index for each user. x-axis: users (×10^4); y-axis: index values (0 to 1.0).

3.2. Exploring Social–Historical Ties on Location-Based Social Networks (Gao et al. [16])
LBSNs have been a popular form of social media in recent years. They provide location-
related services that allow users to check in at geographical locations and share such expe-
rience with their friends. Millions of check-in records in LBSNs contain rich social and
geographical information and provide a unique opportunity for researchers to study users’

Figure 2. Performance comparison of V-index values for each user before (+) and after (◦)
unfriending the k most vulnerable friends from his or her social network.
Panels: (a) most vulnerable friend; (b) 2 most vulnerable; (c) 10 most vulnerable; (d) 50 most vulnerable. Each panel plots the Mk-index against the V-index. x-axis: users (×10^5); y-axis: index values.

social behavior from a spatial–temporal aspect, which in turn enables a variety of services
including place advertisement or recommendation, traffic forecasting, and disaster relief.
To understand a user’s check-in behavior, it is necessary to analyze the user’s check-in history,
because historical check-ins provide rich information about a user’s interests
and hints about when and where the user would like to go. In addition, social
correlation theory suggests considering users’ social ties, because human movement is usually
affected by social events, such as visiting friends, going out with colleagues, and so on.
These two relationship ties can shape the user’s check-in experience on LBSNs, and each
tie gives rise to a different probability of check-in activity, which indicates that people in
different spatial–temporal–social circles have different interactions.
The historical ties of a user’s check-in behavior have two properties on LBSNs. First,
a user’s check-in history approximately follows a power-law distribution; i.e., a user goes
to a few places many times and to many places a few times. Second, the historical ties
have a short-term effect. Taking advantage of the similarity between language modeling and
location-based social network mining, Gao et al. [16] introduced the Pitman–Yor
process to location-based social networks to model the historical ties of a user i for his or her
check-in behavior $c_{n+1} = l$ at time $n + 1$ and location $l$, specifically, the power-law distribution
and short-term effect of historical ties. This historical model (HM) is shown below:
\[
P_H^{i}(c_{n+1} = l) = P_{HPY}^{i,i}(c_{n+1} = l \mid u, t_{ul}, d_{|u|}, r_{|u|}, t_u),
\]

where $u$, $t_{ul}$, $d_{|u|}$, $r_{|u|}$, and $t_u$ are parameters. A social–historical model (SHM) is proposed
to explore user i’s check-in behavior by integrating both social and historical effects:

\[
P_{SH}^{i}(c_{n+1} = l) = \eta P_H^{i}(c_{n+1} = l) + (1 - \eta) P_S^{i}(c_{n+1} = l),
\]

where

\[
P_S^{i}(c_{n+1} = l) = \sum_{u_j \in N(u_i)} \mathrm{sim}(u_i, u_j) \, P_{HPY}^{i,j}(c_{n+1} = l),
\]

where $N(u_i)$ indicates user i’s set of neighbors.
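The way the social–historical model mixes the two effects can be sketched in a few lines. The sketch below is not the hierarchical Pitman–Yor model of Gao et al. [16]: as a loudly labeled stand-in, it replaces each Pitman–Yor probability with the relative frequency of a location in the relevant check-in history. The function names, the toy histories, and the similarity weights (assumed here to sum to 1 over a user's neighbors) are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def history_prob(history: List[str], location: str) -> float:
    """Stand-in for the Pitman-Yor probability: relative frequency of
    `location` in a check-in history (NOT the model used in the paper)."""
    if not history:
        return 0.0
    return Counter(history)[location] / len(history)


def social_prob(
    user: str,
    location: str,
    histories: Dict[str, List[str]],
    neighbors: Dict[str, List[str]],
    sim: Dict[Tuple[str, str], float],
) -> float:
    """Social term: similarity-weighted sum over the user's neighbors."""
    return sum(
        sim[(user, nb)] * history_prob(histories[nb], location)
        for nb in neighbors[user]
    )


def social_historical_prob(
    user: str,
    location: str,
    histories: Dict[str, List[str]],
    neighbors: Dict[str, List[str]],
    sim: Dict[Tuple[str, str], float],
    eta: float,
) -> float:
    """Mixture: eta * historical term + (1 - eta) * social term."""
    p_h = history_prob(histories[user], location)
    p_s = social_prob(user, location, histories, neighbors, sim)
    return eta * p_h + (1 - eta) * p_s


# Hypothetical data: user "i" with two neighbors "j" and "k".
histories = {"i": ["cafe", "cafe", "gym", "cafe"], "j": ["gym", "gym"], "k": ["cafe", "park"]}
neighbors = {"i": ["j", "k"]}
sim = {("i", "j"): 0.5, ("i", "k"): 0.5}
print(social_historical_prob("i", "gym", histories, neighbors, sim, eta=0.6))
```

The weight η controls how much the prediction relies on the user's own history versus the histories of his or her neighbors; setting η to 1 recovers the purely historical term.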


The experiments with location prediction on a real-world LBSN compare the proposed
methods (HM and SHM) with some baseline methods. The results are plotted in Figure 3,
demonstrating that the proposed methods best model users’ check-ins in terms of location
prediction; in other words, social and historical ties can help location prediction.
This work finds that a user’s friends can influence his or her next location, because users who
share friends tend to go to more similar locations than those who do not. The power-law property
and short-term effect are observed in historical ties; thus, a historical model is introduced to
capture these properties. The experimental results on location prediction demonstrate that
the proposed approach is suitable for capturing a user’s check-in behavior and outperforms
the baseline models.
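Two of the baselines named in the note to Figure 3 can be sketched to make the comparison concrete. The interpretations below are assumptions, since the text gives only the model names: the most frequent check-in (MFC) model is taken to predict the location a user has visited most often, and the order-1 Markov model is taken to predict the most frequent successor of the user's current location, falling back to MFC when the current location has no recorded successor.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def predict_mfc(history: List[str]) -> Optional[str]:
    """Assumed MFC baseline: the most frequently visited location."""
    if not history:
        return None
    return Counter(history).most_common(1)[0][0]


def predict_order1(history: List[str]) -> Optional[str]:
    """Assumed order-1 Markov baseline: most frequent successor of the last location."""
    if not history:
        return None
    transitions: Dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(history, history[1:]):
        transitions[prev][nxt] += 1
    successors = transitions.get(history[-1])
    if not successors:
        return predict_mfc(history)  # no successor seen yet: fall back to MFC
    return successors.most_common(1)[0][0]


# Hypothetical check-in sequence for one user.
checkins = ["home", "work", "cafe", "home", "work", "gym", "home"]
print(predict_mfc(checkins), predict_order1(checkins))
```

Neither baseline uses social ties, which is the information the SHM adds on top of a user's own history.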

3.3. Discerning Multifaceted Trust in a Connected World (Tang et al. [57])
The issue of trust has attracted increasing attention from the community of social media
research. Trust, as a social concept, naturally has multiple facets, indicating multiple and
heterogeneous trust relationships between users. Here is a multifaceted trust example from
Epinions. Figure 4(a) shows single trust relationships between user 1 and his 20 friends.
Here, we can see that user 7 is the friend most trusted by user 1. Figures 4(b) and 4(c) show their
multifaceted trust relationships in the categories “home and garden” and “restaurants,”
respectively. For the category “home and garden,” user 7 is not necessarily the most trusted

Figure 3. The performance comparison of prediction models demonstrates that by considering
social and historical ties, the proposed models can help location prediction.
Curves: MFC, MFT, Order-1, Order-2, HM, SHM. x-axis: fraction of training set (10 to 90); y-axis: prediction accuracy (0.15 to 0.40).
Note. MFC, most frequent check-in model; MFT, most frequent time model; Order-1, order-1 Markov model;
Order-2, order-2 Markov model.

friend of user 1. This shows that trust relationships in different categories vary. Thus, people
trust others differently in different facets.
Two challenges arise in obtaining multifaceted trust between users: first,
representing multiple and heterogeneous trust relationships between users, and
second, estimating the strength of multifaceted trust. Traditionally, trust is represented by
an adjacency matrix. However, a single matrix cannot capture multifaceted trust relations. Tang
et al. [57] developed a new algorithm, mTrust, that extends the matrix representation to
a tensor representation, adding an extra dimension for facet description. Previous work
observed a strong correlation between trust and user similarity in the context of rating
systems. Therefore, it is reasonable to embed trust strength inference in rating prediction.
Thus, to evaluate the usefulness of multifaceted trust, this work embeds the multifaceted trust
inference in the framework of rating prediction.
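The step from an adjacency matrix to a tensor can be illustrated with a plain data-structure sketch. This is not the mTrust algorithm, which infers trust strengths within rating prediction; it only shows how an extra facet dimension lets the most trusted friend differ by category. The averaging used to collapse facets into a single trust value, the function names, and the toy strengths are assumptions.

```python
from typing import List, Optional

# trust[truster][trustee][facet] = trust strength (hypothetical values)
Tensor = List[List[List[float]]]


def single_trust(trust: Tensor, truster: int) -> List[float]:
    """Collapse the facet dimension by averaging (an assumed aggregation)."""
    return [sum(facets) / len(facets) for facets in trust[truster]]


def most_trusted(trust: Tensor, truster: int, facet: Optional[int] = None) -> int:
    """Index of the most trusted other user, overall or within one facet."""
    if facet is None:
        scores = single_trust(trust, truster)
    else:
        scores = [facets[facet] for facets in trust[truster]]
    candidates = [j for j in range(len(scores)) if j != truster]
    return max(candidates, key=lambda j: scores[j])


# Three users, two facets: 0 = "home and garden", 1 = "restaurants".
trust = [
    [[0.0, 0.0], [0.2, 0.9], [0.6, 0.3]],  # user 0's trust in users 0, 1, 2
    [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
]
print(most_trusted(trust, 0), most_trusted(trust, 0, facet=0), most_trusted(trust, 0, facet=1))
```

In this toy tensor, user 1 is the most trusted overall and for restaurants, while user 2 is the most trusted for home and garden, mirroring the Epinions example above.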
Interesting findings from the experiments are that (1) more than 20% of reciprocal links are
heterogeneous, (2) more than 14% of transitive trust relations are heterogeneous, and (3) more

Figure 4. Single trust and multifaceted trust relationships of one user in Epinions.
Panels: (a) single trust; (b) trust in home and garden; (c) trust in restaurants. Each panel is a network drawing (Pajek) of nodes 1 to 21.
Note. The thickness of a line indicates the level of trust.



than 11% of cocitation trust relations are heterogeneous. With these findings, mTrust can
be applied to many online tasks such as improving rating prediction, enabling facet-sensitive
ranking, and making status theory applicable to reciprocal links.

4. Employing Social Media in Real-World Applications


Social media has been increasingly used in a wide range of domains; examples include
political campaigns (e.g., presidential elections), mass movements (e.g., organizing Occupy
Wall Street movements, Arab Spring), as well as disaster and crisis response and relief
coordination. In this section, we show how social media mining is used for HADR.
Government and nongovernmental organizations (NGOs) have been faced with challenges
to effectively respond to crises such as natural disasters (e.g., tsunamis, hurricanes, earth-
quakes). Providing efficient relief to the victims is a primary focus in saving lives, minimizing
further losses in the aftermath, and accelerating recovery. Crowdsourcing tools via social
media such as Twitter, Ushahidi, and Sahana have proven useful in gathering information
during and after a crisis. However, they are not designed to go beyond crowdsourcing: additional
capabilities are required for response coordination, secured collaboration, and trust
building among relief organizations. Goolsby [19] pointed out the need for such a social media
system and described the effort to build it to allow different organizations to share crowd-
sourced and groupsourced information, and analyze and visualize the processed information
for intelligent decision making. In this section, we describe two social media prototype sys-
tems designed to facilitate efficient collaboration among disparate organizations for effective
and coordinated responses to crises.

4.1. ASU Coordination Tracker (Gao et al. [17])


When natural disasters occur, the international community joins forces to provide
disaster relief and humanitarian assistance. Some prominent examples include the Haiti
earthquake and the subsequent cholera outbreak, and the devastating earthquake and tsunami in
Japan. Social media has revolutionized the use of traditional media and played an important
role in these events as an information collector and disseminator, and as a communication
and collaboration tool.
As one of the most important functions of social media, crowdsourcing is a collaborative
information sharing mechanism based on the principle of collective wisdom. Wikipedia (see footnote 10) is a
perfect example of crowdsourcing, where people collectively publish information on various
topics. Crowdsourcing is capable of leveraging participatory social media services and tools
to collect information, and it allows crowds to participate in various HADR tasks. Its
integration with crisis maps has been a very effective crowdsourcing application in HADR efforts.
One of the earliest such systems was Alive in Afghanistan (see footnote 11), where people in Afghanistan
submitted their reports on accidents and terrorist attacks across Afghanistan.
Although crowdsourcing applications can provide accurate and timely information about
a crisis for decision making, current crowdsourcing applications still fall short in supporting
disaster relief efforts (Gao et al. [15]). Most importantly, current applications do not pro-
vide a common mechanism specifically designed for collaboration and coordination between
disparate relief organizations. For example, relief organizations that work independently
can cause conflicts and complicate the relief efforts. If independent organizations duplicate
a response, it will draw resources away from the other areas in need that could use the
duplicated supplies. It can also result in delayed response to other disaster areas. Further-
more, because of the noisy and chaotic nature of crowdsourced data, current crowdsourcing
applications cannot provide readily useful information for disaster relief efforts.

10 http://www.wikipedia.org/ (accessed March 2012).
11 http://aliveinafghanistan.org/ (accessed March 2012).

To address these shortfalls of crowdsourcing to a certain extent, an online coordination
system, the ASU Coordination Tracker (ACT), was devised. The ACT is an event response
coordination system with the primary goal of facilitating multiorganization response (mili-
tary, governments, NGOs, etc.) to an event such as a disaster and providing relief organi-
zations the means for better collaboration and coordination during a crisis. It is a speedy
and effective approach with easy, open communication and streamlined coordination. The
major features and advantages of the ACT are summarized below:
• leveraging crowdsourcing information to provide the means for a groupsourcing response
for organizations to assist persons affected by a crisis effectively,
• developing novel strategies to analyze the requests and maximize the coordinated use
of crowdsourced data for disaster relief,
• visualizing the requests to give a global view of the request distribution to facilitate
responders, and
• increasing the coordination efficiency by optimally responding to requests.
The ACT consists of five functional modules: request collection, request analysis, response,
coordination, and situation awareness. Raw requests from users are collected via crowd-
sourcing and groupsourcing methods. The request analysis module takes advantage of both
data mining technology and expert knowledge to iteratively capture the essential content
of raw requests. Based on the demand and disaster background, raw requests are analyzed
and classified into various categories. Essential contents extracted from the raw data are
then stored in a requests pool. The system visualizes the requests pool on its crisis map,
with selective visibility bound to the access levels of users (organizations). The response
module is designed to allow relief organizations to contribute, receive, and evaluate differ-
ent response options and plan response actions. Relief organizations are able to respond
to requests through the crisis map directly. The coordination model uses the interagency
concept (Goolsby [19]) to avoid response conflicts while maintaining centralized control.
The ACT displays every available request on the crisis map. Each request is in one of the
following four states: available, in process, in delivery, or delivered. To avoid conflicts, relief
organizations are not able to respond to requests that are being fulfilled by another organi-
zation. A statistics module runs in the background to help track relief progress for situation
awareness, relief strategy adjustments, and further decision making.
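The conflict-avoidance rule described above can be sketched as a small state machine. The four state names come from the text; everything else is an assumption made for illustration, including the class and method names, the strictly linear order of transitions, and the rule that only the claiming organization may advance a request. It is not the ACT implementation.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    AVAILABLE = "available"
    IN_PROCESS = "in process"
    IN_DELIVERY = "in delivery"
    DELIVERED = "delivered"


class Request:
    """A relief request shown on the crisis map (hypothetical model)."""

    def __init__(self, description: str) -> None:
        self.description = description
        self.state = State.AVAILABLE
        self.responder: Optional[str] = None

    def claim(self, organization: str) -> bool:
        """Respond to a request; refused if another organization is fulfilling it."""
        if self.state is not State.AVAILABLE:
            return False
        self.state = State.IN_PROCESS
        self.responder = organization
        return True

    def advance(self, organization: str) -> bool:
        """Move in process -> in delivery -> delivered (assumed linear order)."""
        if organization != self.responder:
            return False
        if self.state is State.IN_PROCESS:
            self.state = State.IN_DELIVERY
            return True
        if self.state is State.IN_DELIVERY:
            self.state = State.DELIVERED
            return True
        return False


request = Request("drinking water for shelter")
print(request.claim("OrgA"), request.claim("OrgB"), request.state.value)
```

Refusing the second claim is what prevents two organizations from duplicating a response, the failure mode described earlier for uncoordinated relief efforts.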
The ACT is a developing system that enables coordination among organizations during
disaster relief. We are investigating approaches to improve collaboration efficiency and provide
differential security to relief organizations and decision makers. The disaster relief ASU Crisis
Response Game (Abbasi et al. [1]) has demonstrated that by both leveraging crowdsourcing
information and providing means for a groupsourcing response, organizations can effectively
assist victims.

4.2. TweetTracker (Kumar et al. [32])


TweetTracker (see footnote 12) is a Twitter-based analytic and visualization tool. The focus of the tool is to
help HADR organizations acquire situational awareness during disasters and
emergencies to aid disaster relief efforts (Kumar et al. [32]). New social media platforms, such as
Twitter microblogs, demonstrate their value and capability to provide information that is not
easily attainable from traditional media. For example, during the Mumbai blasts of 2011 (see footnote 13),
firsthand information from the affected region was available on Twitter moments after the
blast. TweetTracker is designed to help to track, analyze, review, and monitor tweets. This
is achieved through near-real-time tracking of tweets with specific keywords/hashtags and
tweets generated from the region affected by the crisis. The tool supports monitoring and

12 http://tweettracker.fulton.asu.edu/ (accessed March 2012).
13 http://ibnlive.in.com/news/mumbai-blasts-twitter-joins-hands-to-help/167345-3.html (accessed March 2012).

analysis of the collected tweets via real-time trending, data reduction, historical review, and
integrated data mining techniques.
TweetTracker consists of three main components: (1) a Twitter stream reader, (2) a data
storage module, and (3) a data mining and visualization module. The Twitter stream reader
is a data collection module that continually crawls tweets through the Twitter streaming
API (Application Programming Interface; see footnote 14). Tweets are filtered based on user-specified
keywords, hashtags, and geolocations. The data storage module is responsible for storing and
indexing the collected tweets into a relational database for use by the visualization module.
The data mining and visualization module is a Web-based user interface to the collected
tweets and a means to analyze the collected tweets. It provides geospatial visualization of
tweets related to a particular event on a map, summarizes the tweets, and visualizes the
trending keywords in the form of a word cloud, and it can identify popular resources (URLs)
and users mentioned in the tweets. The tool also includes built-in language translation
support for monitoring of multilingual tweets.
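The filtering and trending steps can be sketched on already-collected tweets, without calling the Twitter API. The tweet layout (a dict with a "text" string and an optional "coordinates" pair of latitude and longitude), the function names, and the choice to accept a tweet that matches either a keyword or the bounding box are assumptions for illustration, not TweetTracker's actual design.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

BoundingBox = Tuple[float, float, float, float]  # (south, west, north, east)


def matches(tweet: Dict, keywords: Iterable[str], box: Optional[BoundingBox]) -> bool:
    """Keep a tweet if it contains a keyword/hashtag or lies inside the box."""
    text = tweet.get("text", "").lower()
    if any(keyword.lower() in text for keyword in keywords):
        return True
    coords = tweet.get("coordinates")  # hypothetical field: (lat, lon) or None
    if box is not None and coords is not None:
        lat, lon = coords
        south, west, north, east = box
        return south <= lat <= north and west <= lon <= east
    return False


def trending_terms(tweets: Iterable[Dict], top_n: int = 3) -> List[Tuple[str, int]]:
    """Count words and hashtags, e.g., as input for a word cloud."""
    counts: Counter = Counter()
    for tweet in tweets:
        counts.update(re.findall(r"#?\w+", tweet.get("text", "").lower()))
    return counts.most_common(top_n)


# Hypothetical tweets and a rough bounding box around an affected region.
tweets = [
    {"text": "Flooding near the river #flood", "coordinates": None},
    {"text": "Lunch time", "coordinates": (19.07, 72.87)},
    {"text": "Nice weather", "coordinates": (40.7, -74.0)},
]
region = (18.8, 72.7, 19.3, 73.0)
kept = [t for t in tweets if matches(t, ["#flood"], region)]
print(len(kept), trending_terms(kept, top_n=2))
```

A production pipeline would also need stop-word removal and handling of bounding boxes that cross the antimeridian, both of which this sketch omits.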
TweetTracker has been used in tracking, visualizing, and analyzing activities including the
Arab Spring movement, the Occupy Wall Street movement, and various natural disasters
such as earthquakes and cholera outbreaks.

5. Summary
Valuable information is hidden in vast amounts of social media data, presenting ample
opportunities for social media mining to discover actionable knowledge that is otherwise difficult
to find. Social media data are vast, noisy, distributed, unstructured, and dynamic, which
poses novel challenges for data mining. In this tutorial, we offer a brief introduction to
mining social media, use illustrative examples to show that burgeoning social media mining
is spearheading social media research, and demonstrate its invaluable contributions to
real-world applications.
As a main type of “big data,” social media is finding many innovative uses, such as
political campaigns, job applications, business promotion and networking, and customer
services. Using and mining social media is reshaping business models, accelerating viral
marketing, and enabling the rapid growth of various grassroots communities. It also helps
in trend analysis and sales prediction. Social media data will continue their rapid growth
in the foreseeable future. We are faced with an increasing demand for new algorithms and
social media mining tools. Existing preliminary success in social media mining research
efforts convincingly demonstrates the promising future of the emerging social media mining
community and will help to expand research and development and explore online and off-line
human behavior and interaction patterns.

Acknowledgments
The authors thank Huiji Gao, Shamanth Kumar, Jiliang Tang, and DMML members for their
assistance and feedback in preparing this tutorial. Some projects described in this brief introductory
survey were sponsored by the Office of Naval Research [ONR N000141010091]; the Army Research
Office [ARO 025071]; and the National Science Foundation [Grant 0812551].

References
[1] M. A. Abbasi, S. Kumar, J. A. Andrade Filho, and H. Liu. Lessons learned in using social media
for disaster relief—ASU Crisis Response Game. Proceedings of the International Conference
on Social Computing, Behavioral-Cultural Modeling, and Prediction. Springer-Verlag, Berlin,
282–289, 2012.
[2] N. Agarwal and H. Liu. Modeling and Data Mining in Blogosphere. Morgan & Claypool Pub-
lishers, San Rafael, CA, 2009.

14 https://dev.twitter.com/docs/streaming-api (accessed March 2012).



[3] N. Agarwal, H. Liu, L. Tang, and P. S. Yu. Identifying the influential bloggers in a community.
Proceedings of the International Conference on Web Search and Web Data Mining. Association
for Computing Machinery, New York, 207–218, 2008.
[4] N. Agarwal, H. Liu, L. Tang, and P. S. Yu. Modeling blogger influence in a community. Social
Network Analysis and Mining 2(2):139–162, 2012.
[5] S. Aral, L. Muchnik, and A. Sundararajan. Distinguishing influence-based contagion from
homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sci-
ences of the United States of America 106(51):21544, 2009.
[6] D. Artz and Y. Gil. A survey of trust in computer science and the Semantic Web. Web Seman-
tics: Science, Services and Agents on the World Wide Web 5(2):58–71, 2007.
[7] L. Backstrom and J. Leskovec. Supervised random walks: Predicting and recommending links
in social networks. Proceedings of the Fourth ACM International Conference on Web Search
and Data Mining. Association for Computing Machinery, New York, 635–644, 2011.
[8] L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social
networks: Membership, growth, and evolution. Proceedings of the 12th ACM SIGKDD Inter-
national Conference on Knowledge Discovery and Data Mining. Association for Computing
Machinery, New York, 44–54, 2006.
[9] R. Baden, A. Bender, N. Spring, B. Bhattacharjee, and D. Starin. Persona: An online
social network with user-defined privacy. ACM SIGCOMM Computer Communication Review
39(4):135–146, 2009.
[10] N. T. J. Bailey. The Mathematical Theory of Infectious Diseases and Its Applications. Charles
Griffin, High Wycombe, UK, 1975.
[11] E. Berger. Dynamic monopolies of constant size. Journal of Combinatorial Theory, Series B
83(2):191–200, 2001.
[12] Z. Chu, S. Gianvecchio, H. Wang, and S. Jajodia. Who is tweeting on Twitter: Human, bot,
or cyborg? Proceedings of the 26th Annual Computer Security Applications Conference. Asso-
ciation for Computing Machinery, New York, 21–30, 2010.
[13] M. Deutsch. Cooperation and trust: Some theoretical notes. Nebraska Symposium on Motiva-
tion. University of Nebraska Press, Lincoln, 1962.
[14] L. Fang and K. LeFevre. Privacy wizards for social networking sites. Proceedings of the 19th
International Conference on World Wide Web. Association for Computing Machinery, New
York, 351–360, 2010.
[15] H. Gao, G. Barbier, and R. Goolsby. Harnessing the crowdsourcing power of social media for
disaster relief. IEEE Intelligent Systems 26(3):10–14, 2011.
[16] H. Gao, J. Tang, and H. Liu. Exploring social-historical ties on location-based social net-
works. Proceedings of the 6th International AAAI Conference on Weblogs and Social Media.
Association for the Advancement of Artificial Intelligence, Palo Alto, CA, 2012.
[17] H. Gao, X. Wang, G. Barbier, and H. Liu. Promoting coordination for disaster relief: From
crowdsourcing to coordination. Proceedings of the 4th Conference on Social Computing,
Behavioral-Cultural Modeling and Prediction. Springer-Verlag, Berlin, 197–204, 2011.
[18] J. Golbeck and J. Hendler. Inferring binary trust relationships in web-based social networks.
ACM Transactions on Internet Technology 6(4):497–529, 2006.
[19] R. Goolsby. Social media as crisis platform: The future of community maps/crisis maps. ACM
Transactions on Intelligent Systems and Technology 1(1):1–11, 2010.
[20] T. Grandison and M. Sloman. A survey of trust in Internet applications. IEEE Communications
Surveys & Tutorials 3(4):2–16, 2009.
[21] M. Granovetter. Threshold models of collective behavior. American Journal of Sociology
83(6):1420–1443, 1978.
[22] R. Gross and A. Acquisti. Information revelation and privacy in online social networks. Pro-
ceedings of the 2005 ACM Workshop on Privacy in the Electronic Society. Association for
Computing Machinery, New York, 71–80, 2005.
[23] P. Gundecha, G. Barbier, and H. Liu. Exploiting vulnerability to secure user privacy on a social
networking site. The 17th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining. Association for Computing Machinery, New York, 2011.

[24] J. Han, M. Kamber, and J. Pei. Data Mining: Concepts and Techniques. Morgan Kaufmann,
San Francisco, 2011.
[25] M. Hu and B. Liu. Mining and summarizing customer reviews. Proceedings of the Tenth ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for
Computing Machinery, New York, 168–177, 2004.
[26] N. Jindal and B. Liu. Identifying comparative sentences in text documents. Proceedings of the
29th Annual International ACM SIGIR Conference on Research and Development in Informa-
tion Retrieval. Association for Computing Machinery, New York, 244–251, 2006.
[27] N. Jindal and B. Liu. Opinion spam and analysis. Proceedings of the International Conference
on Web Search and Web Data Mining. Association for Computing Machinery, New York,
219–230, 2008.
[28] A. M. Kaplan and M. Haenlein. Users of the world, unite! The challenges and opportunities
of social media. Business Horizons 53(1):59–68, 2010.
[29] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social
network. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining. Association for Computing Machinery, New York, 137–146, 2003.
[30] I. Konstas, V. Stathopoulos, and J. M. Jose. On social networks and collaborative recom-
mendation. Proceedings of the 32nd International ACM SIGIR Conference on Research and
Development in Information Retrieval. Association for Computing Machinery, New York,
195–202, 2009.
[31] B. Krishnamurthy and C. E. Wills. On the leakage of personally identifiable information via
online social networks. ACM SIGCOMM Computer Communication Review 40(1):112–117,
2010.
[32] S. Kumar, R. Zafarani, and H. Liu. Understanding user migration patterns across social media.
Twenty-Fifth International Conference on Artificial Intelligence. Association for the Advance-
ment of Artificial Intelligence, Palo Alto, CA, 2011.
[33] T. La Fond and J. Neville. Randomization tests for distinguishing social influence and
homophily effects. Proceedings of the 19th International Conference on World Wide Web. Asso-
ciation for Computing Machinery, New York, 601–610, 2010.
[34] P. F. Lazarsfeld and R. K. Merton. Friendship as a social process: A substantive and method-
ological analysis. Freedom and Control in Modern Society 18:18–66, 1954.
[35] B. Liu. Sentiment analysis and subjectivity. Handbook of Natural Language Processing. CRC
Press, Boca Raton, FL, 627–666, 2010.
[36] H. Liu and P. Maes. InterestMap: Harvesting social network profiles for recommendations.
Workshop: Beyond Personalization, San Diego, 2005.
[37] H. Liu and H. Motoda. Computational Methods of Feature Selection. Chapman & Hall, Boca
Raton, FL, 2008.
[38] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King. Recommender systems with social regular-
ization. Proceedings of the Fourth ACM International Conference on Web Search and Data
Mining. Association for Computing Machinery, New York, 287–296, 2011.
[39] M. W. Macy. Chains of cooperation: Threshold effects in collective action. American Sociolog-
ical Review 56(6):730–747, 1991.
[40] V. Mahajan, E. Muller, and F. M. Bass. New product diffusion models in marketing: A review
and directions for research. Journal of Marketing 54(1):1–26, 1990.
[41] S. P. Marsh. Formalising trust as a computational concept. Ph.D. thesis, Department of
Computing Science and Mathematics, University of Stirling, Stirling, UK, 1994.
[42] M. McPherson, L. Smith-Lovin, and J. M. Cook. Birds of a feather: Homophily in social
networks. Annual Review of Sociology 27:415–444, 2001.
[43] S. T. Moturu and H. Liu. Quantifying the trustworthiness of social media content. Distributed
and Parallel Databases 29(3):239–260, 2011.
[44] L. Mui, M. Mohtashemi, and A. Halberstadt. A computational model of trust and reputa-
tion for E-businesses. Proceedings of the 35th Annual Hawaii Conference on System Sciences
(HICSS’02). IEEE Computer Society, Washington, DC, 2431–2439, 2002.

[45] A. Narayanan and V. Shmatikov. Robust de-anonymization of large sparse datasets.
Proceedings of the 2008 IEEE Symposium on Security and Privacy. IEEE Computer Society,
Washington, DC, 111–125, 2008.
[46] A. Narayanan and V. Shmatikov. De-anonymizing social networks. Proceedings of the 2009
IEEE Symposium on Security and Privacy. IEEE Computer Society, Washington, DC,
173–187, 2009.
[47] D. Olmedilla, O. Rana, B. Matthews, and W. Nejdl. Security and trust issues in semantic
grids. Proceedings of the Dagstuhl Seminar, Semantic Grid: The Convergence of Technologies
5271:896–902, 2005.
[48] B. Pang and L. Lee. Opinion mining and sentiment analysis. Foundations and Trends in
Information Retrieval 2(1–2):1–135, 2008.
[49] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? Sentiment classification using machine
learning techniques. Proceedings of the ACL-02 Conference on Empirical Methods in Natural
Language Processing, Vol. 10. Association for Computational Linguistics, Stroudsburg, PA,
79–86, 2002.
[50] A. M. Popescu and O. Etzioni. Extracting product features and opinions from reviews. Pro-
ceedings of the Conference on Human Language Technology and Empirical Methods in Natural
Language Processing. Association for Computational Linguistics, Stroudsburg, PA, 339–346,
2005.
[51] E. Riloff and J. Wiebe. Learning extraction patterns for subjective expressions. Proceedings of
the 2003 Conference on Empirical Methods in Natural Language Processing. Association for
Computational Linguistics, Stroudsburg, PA, 105–112, 2003.
[52] D. Rosenblum. What anyone can know: The privacy risks of social networking sites. IEEE
Security and Privacy 5(3):40–49, 2007.
[53] A. C. Squicciarini, M. Shehab, and F. Paci. Collective privacy management in social networks.
Proceedings of the 18th International Conference on World Wide Web. Association for Com-
puting Machinery, New York, 521–530, 2009.
[54] P. Sztompka. Trust: A Sociological Theory. Cambridge University Press, Cambridge, UK, 1999.
[55] P.-N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Pearson Addison Wesley,
Boston, 2006.
[56] L. Tang and H. Liu. Community Detection and Mining in Social Media, Vol. 2. Morgan &
Claypool Publishers, San Rafael, CA, 2010.
[57] J. Tang, H. Gao, and H. Liu. mTrust: Discerning multi-faceted trust in a connected world.
Proceedings of the Fifth ACM International Conference on Web Search and Data Mining. Asso-
ciation for Computing Machinery, New York, 93–102, 2012.
[58] P. D. Turney. Thumbs up or thumbs down?: Semantic orientation applied to unsupervised clas-
sification of reviews. Proceedings of the 40th Annual Meeting on Association for Computational
Linguistics. Association for Computational Linguistics, Stroudsburg, PA, 417–424, 2002.
[59] I. H. Witten, E. Frank, and M. A. Hall. Data Mining: Practical Machine Learning Tools and
Techniques. Morgan Kaufmann, San Francisco, 2011.
[60] G. Wondracek, T. Holz, E. Kirda, and C. Kruegel. A practical attack to de-anonymize social
network users. Proceedings of the 2010 IEEE Symposium on Security and Privacy. IEEE Com-
puter Society, Washington, DC, 223–238, 2010.
[61] S. Yardi, D. Romero, G. Schoenebeck, and D. Boyd. Detecting spam in a Twitter network.
First Monday 15(1):1–4, 2009.
[62] H. Yu and V. Hatzivassiloglou. Towards answering opinion questions: Separating facts from
opinions and identifying the polarity of opinion sentences. Proceedings of the 2003 Confer-
ence on Empirical Methods in Natural Language Processing. Association for Computational
Linguistics, Stroudsburg, PA, 129–136, 2003.
[63] Z. A. Zhao and H. Liu. Spectral Feature Selection for Data Mining. Chapman & Hall/CRC
Press, Virginia Beach, VA, 2012.
[64] E. Zheleva and L. Getoor. To join or not to join: The illusion of privacy in social networks
with mixed public and private user profiles. Proceedings of the 18th International Conference
on World Wide Web. Association for Computing Machinery, New York, 531–540, 2009.
