Mining Social Media: A Brief Introduction
http://dx.doi.org/10.1287/educ.1120.0105
Abstract The pervasive use of social media has generated unprecedented amounts of social
data. Social media provides an easily accessible platform for users to share information.
Mining social media has the potential to extract actionable patterns that can
be beneficial for business, users, and consumers. Social media data are vast, noisy,
unstructured, and dynamic in nature, and thus novel challenges arise. This tutorial
reviews the basics of data mining and social media, introduces representative research
problems of mining social media, illustrates the application of data mining to social
media using examples, and describes some projects of mining social media for human-
itarian assistance and disaster relief for real-world applications.
Keywords social media; data mining; social data; social media mining; social networking sites;
blogging; microblogging; crowdsourcing; HADR; privacy; trust
1. Introduction
Data mining research has successfully produced numerous methods, tools, and algorithms
for handling large amounts of data to solve real-world problems. Traditional data mining
has become an integral part of many application domains including bioinformatics, data
warehousing, business intelligence, predictive analytics, and decision support systems. Pri-
mary objectives of the data mining process are to effectively handle large-scale data, extract
actionable patterns, and gain insightful knowledge. Because social media is widely used
for various purposes, vast amounts of user-generated data exist and can be made available
for data mining. Data mining of social media can expand researchers’ capability of under-
standing new phenomena due to the use of social media and improve business intelligence
to provide better services and develop innovative opportunities. For example, data mining
techniques can help identify the influential people in the vast blogosphere, detect implicit
or hidden groups in a social networking site, sense user sentiments for proactive planning,
develop recommendation systems for tasks ranging from buying specific products to mak-
ing new friends, understand network evolution and changing entity relationships, protect
user privacy and security, or build and strengthen trust among users or between users and
entities. Mining social media is a burgeoning multidisciplinary area where researchers of dif-
ferent backgrounds can make important contributions that matter for social media research
and development.
The objective of this tutorial is to introduce social media, data mining, and their con-
fluence—mining social media. We attempt to achieve the goal by presenting representative
and interesting research issues and important social media tasks based on our experience
and research. This tutorial first reviews data mining, social media and its types, and the
importance of social media mining. In §2, we briefly introduce representative issues in social
media mining. In §3, we highlight the impact of social media mining using three examples
based on our current research. Section 4 illustrates how social media mining is applied in
some real-world applications—two projects on humanitarian assistance and disaster relief
(HADR) carried out in the Data Mining and Machine Learning Laboratory (DMML) at
Arizona State University (ASU). We conclude this tutorial in §5.
Gundecha and Liu: Mining Social Media: A Brief Introduction
Tutorials in Operations Research, © 2012 INFORMS
Type                           Characteristics
Online social networking       Online social networks are Web-based services that allow
                               individuals and communities to connect with real-world friends
                               and acquaintances online. Users interact with each other
                               through status updates, comments, media sharing, messages,
                               etc. (e.g., Facebook, Myspace, LinkedIn).
Blogging                       A blog is a journal-like website for users, aka bloggers, to
                               contribute textual and multimedia content, arranged in reverse
                               chronological order. Blogs are generally maintained by an
                               individual or by a community (e.g., Huffington Post, Business
                               Insider, Engadget).
Microblogging                  Microblogs can be considered the same as blogs but with limited
                               content (e.g., Twitter, Tumblr, Plurk).
Wikis                          A wiki is a collaborative editing environment that allows multiple
                               users to develop Web pages (e.g., Wikipedia, Wikitravel,
                               Wikihow).
Social news                    Social news refers to the sharing and selection of news stories and
                               articles by a community of users (e.g., Digg, Slashdot, Reddit).
Social bookmarking             Social bookmarking sites allow users to bookmark Web content
                               for storage, organization, and sharing (e.g., Delicious,
                               StumbleUpon).
Media sharing                  Media sharing is an umbrella term that refers to the sharing of
                               a variety of media on the Web including video, audio, and photo
                               (e.g., YouTube, Flickr, UstreamTV).
Opinion, reviews, and ratings  The primary function of such sites is to collect and publish
                               user-submitted content in the form of subjective commentary on
                               existing products, services, entertainment, businesses, places,
                               etc. Some of these sites also provide product reviews (e.g.,
                               Epinions, Yelp, Cnet).
Answers                        These sites provide a platform for users seeking advice, guidance,
                               or knowledge to ask questions. Other users from the
                               community can answer these questions based on previous
                               experiences, personal opinions, or relevant research. Answers
                               are generally judged using ratings and comments (e.g., Yahoo!
                               Answers, WikiAnswers).
can a user be heard? (2) Which source of information should a user use? (3) How can user
experience be improved? Answers to these questions are hidden in the social media data.
These challenges present ample opportunities for data miners to develop new algorithms
and methods for social media.
Data generated on social media sites differ from the conventional attribute-value data
of classic data mining. Social media data are largely user-generated content on social media
sites. Social media data are vast, noisy, distributed, unstructured, and dynamic. These
characteristics challenge data miners to invent new, efficient techniques and algorithms.
For example, Facebook and Twitter report Web traffic data from approximately
149 million and 90 million unique U.S. visitors per month, respectively. According to the
video sharing site YouTube, more than 4 billion videos are viewed per day, and 60 hours of
video are uploaded every minute. The picture sharing site Flickr, as of August 2011, hosted
more than 6 billion photo images. The Web-based, collaborative, and multilingual Wikipedia
hosts over 20 million articles, attracting over 365 million readers.
Depending on social media platforms, social media data can often be very noisy. Remov-
ing the noise from the data is essential before performing effective mining. Researchers
notice that spammers (Yardi et al. [61], Chu et al. [12]) generate more data than legiti-
mate users. Social media data are distributed because there is no central authority that
maintains data from all social media sites. Distributed social media data make it a daunting
task for researchers to understand information flows across social media. Social media
data are often unstructured. Making meaningful observations from unstructured data
drawn from various sources is a big challenge. For example, social media sites like LinkedIn,
Facebook, and Flickr serve different purposes and meet different needs of users.
Social media sites are dynamic and continuously evolving. For example, Facebook recently
introduced many concepts including a user’s timeline, the creation of in-groups for a
user, and numerous user privacy policy changes. The dynamic nature of social media data
is a significant challenge for mining continuously and rapidly evolving social media sites.
Many additional interesting questions related to human behavior can be studied using
social media data. Social media can also help advertisers find the influential people who
maximize the reach of their products within an advertising budget. Social media can help
sociologists uncover human behavior such as the in-group and out-group behaviors of
users. Recently, social media was reported to play an instrumental role in facilitating mass
movements such as the Arab Spring and Occupy Wall Street.
find new users of similar interests. Communities found in social media are broadly classified
into explicit and implicit groups. Explicit groups are formed by user subscriptions, whereas
implicit groups emerge naturally through interactions. Community analysts are generally
faced with issues such as community detection, formation, and evolution.
Community detection often refers to the extraction of implicit groups in a network. The
main challenges of community detection are that (1) the definition of a community can
be subjective, and (2) the lack of ground truth makes community evaluation difficult. Tang
and Liu [56] divided community detection methods into four categories: (1) node-centric
community detection, where each node satisfies certain properties such as complete mutual-
ity, reachability, node degrees, frequency of within and outside ties, etc. (examples include
cliques, k-cliques, and k-clubs); (2) group-centric community detection, where a group needs
to satisfy certain properties (for example, minimum group densities); (3) network-centric
community detection, where groups are formed based on partition of network into disjoint
sets (examples are spectral clustering and modularity maximization); and (4) hierarchy-
centric community detection, where the goal is to build a hierarchical structure of com-
munities. This allows the analysis of a network with different resolutions. Representative
methods are divisive clustering and agglomerative clustering.
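As an illustration of the network-centric category, the quantity that modularity maximization tries to increase can be computed directly for a given partition. The following is a minimal sketch, assuming an undirected, unweighted graph given as an edge list; the function name `modularity` and the toy graph are illustrative and do not come from the cited work.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of (internal_edges_c / m) - (degree_sum_c / 2m)^2,
    where m is the total number of edges.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints inside community c
    degree_sum = defaultdict(int)  # total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Toy graph (hypothetical): two triangles joined by the single edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {node: "A" for node in range(1, 7)}

q_two = modularity(edges, two_groups)  # partition into the two triangles
q_one = modularity(edges, one_group)   # everything in a single community
```

Splitting the toy graph into its two triangles scores higher than keeping all nodes in one community, which is the kind of comparison a modularity-maximizing method performs over many candidate partitions.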
Social media networks are highly dynamic. Communities can expand, shrink, or dissolve
in dynamic networks. Community evolution aims to discover the patterns of a community
over time with the presence of dynamic network interactions. Backstrom et al. [8] found
that the more friends you have in a group, the more likely you are to join, and communities
with cliques grow more slowly than those that are not tightly connected.
(homophily), and users can be easily influenced by the friends they trust and prefer their
friends’ recommendations to random recommendations. Objectives of social recommenda-
tion systems are to improve the quality of recommendation and alleviate the problem of
information overload. Examples of social recommendation systems are book recommenda-
tions based on friends’ reading lists on Amazon or friend recommendations on Twitter
and Facebook. More details on social recommendation systems can be found in Konstas
et al. [30], Ma et al. [38], and Backstrom and Leskovec [7].
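The idea of recommending items based on friends' lists can be sketched very simply. The example below is a minimal illustration, not the method of any cited paper: it assumes each user has a set of liked items and a friend list, and ranks unseen items by how many friends like them. All names and data are hypothetical.

```python
from collections import Counter


def recommend_from_friends(user, friends, liked, top_n=3):
    """Rank items liked by a user's friends that the user has not liked yet."""
    seen = liked.get(user, set())
    votes = Counter()
    for friend in friends.get(user, ()):
        for item in liked.get(friend, set()) - seen:
            votes[item] += 1
    # Sort by number of friends (descending), then by name for a stable order.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical reading lists.
friends = {"ann": ["bob", "cat", "dan"]}
liked = {
    "ann": {"book_a"},
    "bob": {"book_a", "book_b"},
    "cat": {"book_b", "book_c"},
    "dan": {"book_b", "book_c", "book_d"},
}
recs = recommend_from_friends("ann", friends, liked)
```

A real system would typically weight each friend by trust or similarity rather than counting every friend equally.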
their friendship networks as widely as possible—in other words, to be open. Hence, social
media poses new security challenges to fend off security threats to users and organizations.
With the variety of personal information disclosed in user profiles (e.g., information about
other users and user networks may be indirectly accessible), individuals may put themselves
and members of their social networks at risk for a variety of attacks. Social media has been
the target of numerous passive as well as active attacks including stalking, cyberbullying,
malvertising, phishing, social spamming, scamming, and clickjacking.
Gross and Acquisti [22] showed that only a few users change the default privacy prefer-
ences on Facebook. In some cases, user profiles are completely public, making information
available and providing a communication mechanism to anyone who wants to access it. It
is no secret that when a profile is made public, malicious users including stalkers, spam-
mers, and hackers can use sensitive information for their personal gain. Sometimes malev-
olent users can even cause physical or emotional distress to other users (Rosenblum [52]).
Narayanan and Shmatikov [45, 46] demonstrated how users’ privacy can be weakened if an
attacker knows the presence of connections among users. Wondracek et al. [60] presented a
successful scheme to breach privacy by exploiting only the group membership information of
users.
Liu and Maes [36] pointed out a lack of privacy awareness and found a large number
of social network profiles in which people described themselves with a rich vocabulary in
terms of their passions and interests. Krishnamurthy and Wills [31] discussed the problem
of leakage of personally identifiable information and how it can be misused by third parties
(Narayanan and Shmatikov [46]). Squicciarini et al. [53] introduced a novel collective privacy
mechanism for better managing shared content between users. Fang and LeFevre [14] focused
on helping users to understand simple privacy settings, but did not consider additional
problems such as attribute inference (Zheleva and Getoor [64]) or shared data ownership
(Squicciarini et al. [53]). Zheleva and Getoor [64] showed how an adversary can exploit an
online social network with a mixture of public and private user profiles to predict the private
attributes of users. Baden et al. [9] presented a framework where users dictate who may
access their information based on public–private encryption–decryption algorithms.
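To make the attribute inference risk concrete, the sketch below guesses a private attribute from the public profiles of a user's friends by majority vote. This is a deliberately simple baseline chosen for illustration, not the models studied by Zheleva and Getoor [64]; the function name and the data are hypothetical.

```python
from collections import Counter


def infer_private_attribute(user, friends, public_attr):
    """Guess a hidden attribute as the most common value among the user's
    friends whose profiles are public. Returns None if no friend is public."""
    values = [public_attr[f] for f in friends.get(user, ()) if f in public_attr]
    if not values:
        return None
    counts = Counter(values)
    best = max(counts.values())
    # Break ties alphabetically so the result is deterministic.
    return min(value for value, count in counts.items() if count == best)


# Hypothetical network: u1 hides the attribute, u5 has a private profile.
friends = {"u1": ["u2", "u3", "u4", "u5"]}
public_attr = {"u2": "Phoenix", "u3": "Phoenix", "u4": "Tempe"}
guess = infer_private_attribute("u1", friends, public_attr)
```

Even this naive vote can succeed when friends tend to share attributes, which is why a private profile offers limited protection in a network with many public ones.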
Social trust depends on many factors that cannot be easily modeled in a computational
system. Many different definitions of trust have been proposed in the literature
(Deutsch [13], Sztompka [54], Mui et al. [44], Olmedilla et al. [47], Grandison and
Sloman [20], Artz and Gil [6]). A highly cited thesis on trust computation (Marsh [41])
provides theoretical perspectives of modeling trust, but its complex nature makes it very
difficult to apply, especially to social networks (Golbeck and Hendler [18]). Trust between
any two people is observed to be affected by many factors including past experiences, opin-
ions expressed and actions taken, contributions to spreading rumors, influence by others’
opinions, and motives to gain something extra. Another important aspect of trust is the
trustworthiness of user-generated content. Moturu and Liu [43] provided an intuitive scoring
measure to quantify the trustworthiness of health-related user-generated content in social
media.
Figure 2. Performance comparison of V-index values for each user before (+) and after (◦)
unfriending the k most vulnerable friends from his or her social network.
[Plots not reproduced. Panels include (a) Most vulnerable and (b) 2 most vulnerable; x-axis: users; y-axis: index values.]
social behavior from a spatial–temporal aspect, which in turn enables a variety of services
including place advertisement or recommendation, traffic forecasting, and disaster relief.
To understand a user’s check-in behavior, it is necessary to analyze the user’s check-in
history, because historical check-ins provide rich information about a user’s interests
and hints about when and where a particular user would like to go. In addition, social
correlation theory suggests considering users’ social ties, because human movement is usually
affected by social events, such as visiting friends, going out with colleagues, and so on.
These two kinds of ties, historical and social, shape the user’s check-in experience on LBSNs, and each
tie gives rise to a different probability of check-in activity, which indicates that people in
different spatial–temporal–social circles have different interactions.
The historical ties of a user’s check-in behavior have two properties on LBSNs. First,
a user’s check-in history approximately follows a power-law distribution; i.e., a user goes
to a few places many times and to many places a few times. Second, the historical ties
have a short-term effect. Taking advantage of the similarity between language modeling and
location-based social network mining, the work of Gao et al. [16] introduced the Pitman–Yor
process to location-based social networks to model the historical ties of a user $i$ for his
check-in behavior $c_{n+1} = l$ at time $(n + 1)$ and location $l$, specifically, the power-law distribution
and short-term effect of historical ties, denoted as the historical model (HM) as shown below:
$$P_{H}^{i}(c_{n+1} = l) = P_{HPY}^{i,i}(c_{n+1} = l \mid u, t_{ul}, d_{|u|}, r_{|u|}, t_{u}),$$
where $u$, $t_{ul}$, $d_{|u|}$, $r_{|u|}$, and $t_{u}$ are parameters. A social–historical model (SHM) is proposed
to explore user $i$’s check-in behavior integrating both of the social and historical effects:
$$P_{SH}^{i}(c_{n+1} = l) = \eta P_{H}^{i}(c_{n+1} = l) + (1 - \eta) P_{S}^{i}(c_{n+1} = l),$$
where
$$P_{S}^{i}(c_{n+1} = l) = \sum_{u_j \in N(u_i)} \mathrm{sim}(u_i, u_j)\, P_{HPY}^{i,j}(c_{n+1} = l),$$
[Figure not reproduced: prediction accuracy (y-axis) versus fraction of training set (x-axis) for the
MFC, MFT, Order-1, Order-2, HM, and SHM models.]
Note. MFC, most frequent check-in model; MFT, most frequent time model; Order-1, order-1 Markov model;
Order-2, order-2 Markov model.
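The way the social–historical model combines its two components can be sketched in a few lines. The example below is only an illustration of the mixture: it swaps in a plain relative-frequency estimate for the Pitman–Yor-based probability used by Gao et al. [16], and it assumes the similarity weights over friends are already given. The function names, data, and the value of eta are hypothetical.

```python
from collections import Counter


def history_prob(history, location):
    """Stand-in for the Pitman-Yor estimate: relative frequency of
    `location` in a check-in history (an assumption for illustration)."""
    if not history:
        return 0.0
    return Counter(history)[location] / len(history)


def social_prob(friend_histories, similarity, location):
    """Social component: similarity-weighted sum of the estimates obtained
    from each friend's check-in history."""
    return sum(
        similarity[friend] * history_prob(history, location)
        for friend, history in friend_histories.items()
    )


def social_historical_prob(own_history, friend_histories, similarity, location, eta):
    """Mixture eta * P_H + (1 - eta) * P_S, with 0 <= eta <= 1."""
    p_h = history_prob(own_history, location)
    p_s = social_prob(friend_histories, similarity, location)
    return eta * p_h + (1 - eta) * p_s


# Hypothetical check-in data.
own_history = ["cafe", "cafe", "gym", "cafe"]
friend_histories = {"f1": ["cafe", "park"], "f2": ["park", "park"]}
similarity = {"f1": 0.5, "f2": 0.5}  # assumed to be precomputed
p_cafe = social_historical_prob(own_history, friend_histories, similarity, "cafe", eta=0.8)
```

Here the user's own history gives 0.75 for "cafe" and the friends give 0.25, so with eta = 0.8 the mixture is 0.8 × 0.75 + 0.2 × 0.25 = 0.65.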
friend of user 1. This shows that trust relationships in different categories vary. Thus, people
trust others differently in different facets.
There are two challenges in obtaining multifaceted trust between users: first,
representing multiple and heterogeneous trust relationships between users, and
second, estimating the strength of multifaceted trust. Traditionally, trust is represented by
an adjacency matrix. However, this cannot capture multifaceted trust relations. Tang
et al. [57] developed a new algorithm, mTrust, that extends the matrix representation to
a tensor representation, adding an extra dimension for facet description. Previous work
observed a strong correlation between trust and user similarity in the context of rating
systems. Therefore, to evaluate the usefulness of multifaceted trust, this work embeds
multifaceted trust strength inference in the framework of rating prediction.
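The difference between the matrix and tensor representations can be shown with a small data-structure sketch. It illustrates only the representation (users × users × facets), not the mTrust algorithm itself; the facet names echo the Epinions categories mentioned in this tutorial, while the users and strength values are hypothetical.

```python
def build_trust_tensor(relations, users, facets):
    """Return T where T[f][i][j] is the trust strength of user i in user j
    for facet f (0.0 when no relation is recorded)."""
    u_idx = {u: k for k, u in enumerate(users)}
    f_idx = {f: k for k, f in enumerate(facets)}
    n = len(users)
    tensor = [[[0.0] * n for _ in range(n)] for _ in facets]
    for truster, trustee, facet, strength in relations:
        tensor[f_idx[facet]][u_idx[truster]][u_idx[trustee]] = strength
    return tensor


def collapse_to_matrix(tensor):
    """Single-trust adjacency matrix: 1 if i trusts j in at least one facet."""
    n = len(tensor[0])
    return [
        [1 if any(layer[i][j] > 0 for layer in tensor) else 0 for j in range(n)]
        for i in range(n)
    ]


users = ["u1", "u2", "u3"]
facets = ["home_and_garden", "restaurants"]
relations = [
    ("u1", "u2", "home_and_garden", 0.9),  # hypothetical strengths
    ("u1", "u3", "restaurants", 0.8),
]
tensor = build_trust_tensor(relations, users, facets)
matrix = collapse_to_matrix(tensor)
```

The collapsed matrix records that u1 trusts both u2 and u3, but only the tensor preserves in which facet each relation holds.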
Interesting findings from the experiments are that (1) more than 20% of reciprocal links are
heterogeneous, (2) more than 14% of transitive trust relations are heterogeneous, and (3) more
than 11% of cocitation trust relations are heterogeneous. With these findings, mTrust can
be applied to many online tasks such as improving rating prediction, enabling facet-sensitive
ranking, and making status theory applicable to reciprocal links.
Figure 4. Single trust and multifaceted trust relationships of one user in Epinions.
[Network diagrams not reproduced. Panels: (a) Single trust; (b) Trust in home and garden; (c) Trust in restaurants.]
analysis of the collected tweets via real-time trending, data reduction, historical review, and
integrated data mining techniques.
TweetTracker consists of three main components: (1) a Twitter stream reader, (2) a data
storage module, and (3) a data mining and visualization module. The Twitter stream reader
is a data collection module that continually crawls tweets through the Twitter streaming
API (Application Programming Interface). Tweets are filtered based on user-specified
keywords, hashtags, and geolocations. The data storage module is responsible for storing and
indexing the collected tweets into a relational database for use by the visualization module.
The data mining and visualization module is a Web-based user interface to the collected
tweets and a means to analyze the collected tweets. It provides geospatial visualization of
tweets related to a particular event on a map, summarizes the tweets, and visualizes the
trending keywords in the form of a word cloud, and it can identify popular resources (URLs)
and users mentioned in the tweets. The tool also includes built-in language translation
support for monitoring of multilingual tweets.
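The filtering step of the stream reader can be sketched as a predicate over incoming tweets. This is a minimal illustration under stated assumptions, not TweetTracker's implementation: tweets are plain dictionaries with hypothetical fields `text`, `hashtags`, and `coordinates`, no Twitter API call is made, and a tweet is kept if it matches any one of the three filters (the tutorial does not say whether filters are combined with OR or AND).

```python
def matches(tweet, keywords=(), hashtags=(), bbox=None):
    """Return True if the tweet satisfies at least one supplied filter.

    bbox is (west, south, east, north) in degrees; tweet["coordinates"]
    is (longitude, latitude) or None. Field names are assumptions.
    """
    text = tweet.get("text", "").lower()
    if any(k.lower() in text for k in keywords):
        return True
    tags = {t.lower() for t in tweet.get("hashtags", [])}
    if tags & {h.lower().lstrip("#") for h in hashtags}:
        return True
    coords = tweet.get("coordinates")
    if bbox and coords:
        west, south, east, north = bbox
        lon, lat = coords
        return west <= lon <= east and south <= lat <= north
    return False


# Hypothetical tweets.
tweets = [
    {"id": 1, "text": "Earthquake felt downtown", "hashtags": [], "coordinates": None},
    {"id": 2, "text": "Lunch time", "hashtags": ["food"], "coordinates": (-112.07, 33.45)},
    {"id": 3, "text": "Stay safe everyone", "hashtags": ["Cholera"], "coordinates": None},
    {"id": 4, "text": "Nice weather", "hashtags": [], "coordinates": (2.35, 48.85)},
]
kept = [
    t["id"]
    for t in tweets
    if matches(
        t,
        keywords=["earthquake"],
        hashtags=["#cholera"],
        bbox=(-113.0, 33.0, -111.0, 34.0),
    )
]
```

Tweets 1, 2, and 3 pass via the keyword, bounding-box, and hashtag filters, respectively; tweet 4 matches none and is dropped.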
TweetTracker has been used in tracking, visualizing, and analyzing activities including the
Arab Spring movement, the Occupy Wall Street movement, and various natural disasters
such as earthquakes and cholera outbreaks.
5. Summary
Valuable information is hidden in vast amounts of social media data, presenting ample
opportunities for social media mining to discover actionable knowledge that is otherwise difficult
to find. Social media data are vast, noisy, distributed, unstructured, and dynamic, which
poses novel challenges for data mining. In this tutorial, we offer a brief introduction to
mining social media, use illustrative examples to show that burgeoning social media mining
is spearheading social media research, and demonstrate its invaluable contributions to
real-world applications.
As a main type of “big data,” social media is finding many innovative uses, such as
political campaigns, job applications, business promotion and networking, and customer
service. Using and mining social media is reshaping business models, accelerating viral
marketing, and enabling the rapid growth of various grassroots communities. It also helps
in trend analysis and sales prediction. Social media data will continue their rapid growth
in the foreseeable future. We are faced with an increasing demand for new algorithms and
social media mining tools. Existing preliminary success in social media mining research
efforts convincingly demonstrates the promising future of the emerging social media mining
community and will help to expand research and development and explore online and off-line
human behavior and interaction patterns.
Acknowledgments
The authors thank Huiji Gao, Shamanth Kumar, Jiliang Tang, and DMML members for their
assistance and feedback in preparing this tutorial. Some projects described in this brief introductory
survey were sponsored by the Office of Naval Research [ONR N000141010091]; the Army Research
Office [ARO 025071]; and the National Science Foundation [Grant 0812551].
References
[1] M. A. Abbasi, S. Kumar, J. A. Andrade Filho, and H. Liu. Lessons learned in using social media
for disaster relief—ASU Crisis Response Game. Proceedings of the International Conference
on Social Computing, Behavioral-Cultural Modeling, and Prediction. Springer-Verlag, Berlin,
282–289, 2012.
[2] N. Agarwal and H. Liu. Modeling and Data Mining in Blogosphere. Morgan & Claypool Pub-
lishers, San Rafael, CA, 2009.
[3] N. Agarwal, H. Liu, L. Tang, and P. S. Yu. Identifying the influential bloggers in a community.
Proceedings of the International Conference on Web Search and Web Data Mining. Association
for Computing Machinery, New York, 207–218, 2008.
[4] N. Agarwal, H. Liu, L. Tang, and P. S. Yu. Modeling blogger influence in a community. Social
Network Analysis and Mining 2(2):139–162, 2012.
[5] S. Aral, L. Muchnik, and A. Sundararajan. Distinguishing influence-based contagion from
homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sci-
ences of the United States of America 106(51):21544, 2009.
[6] D. Artz and Y. Gil. A survey of trust in computer science and the Semantic Web. Web Seman-
tics: Science, Services and Agents on the World Wide Web 5(2):58–71, 2007.
[7] L. Backstrom and J. Leskovec. Supervised random walks: Predicting and recommending links
in social networks. Proceedings of the Fourth ACM International Conference on Web Search
and Data Mining. Association for Computing Machinery, New York, 635–644, 2011.
[8] L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social
networks: Membership, growth, and evolution. Proceedings of the 12th ACM SIGKDD Inter-
national Conference on Knowledge Discovery and Data Mining. Association for Computing
Machinery, New York, 44–54, 2006.
[9] R. Baden, A. Bender, N. Spring, B. Bhattacharjee, and D. Starin. Persona: An online
social network with user-defined privacy. ACM SIGCOMM Computer Communication Review
39(4):135–146, 2009.
[10] N. T. J. Bailey. The Mathematical Theory of Infectious Diseases and Its Applications. Charles
Griffin, High Wycombe, UK, 1975.
[11] E. Berger. Dynamic monopolies of constant size. Journal of Combinatorial Theory, Series B
83(2):191–200, 2001.
[12] Z. Chu, S. Gianvecchio, H. Wang, and S. Jajodia. Who is tweeting on Twitter: Human, bot,
or cyborg? Proceedings of the 26th Annual Computer Security Applications Conference. Asso-
ciation for Computing Machinery, New York, 21–30, 2010.
[13] M. Deutsch. Cooperation and trust: Some theoretical notes. Nebraska Symposium on Motiva-
tion. University of Nebraska Press, Lincoln, 1962.
[14] L. Fang and K. LeFevre. Privacy wizards for social networking sites. Proceedings of the 19th
International Conference on World Wide Web. Association for Computing Machinery, New
York, 351–360, 2010.
[15] H. Gao, G. Barbier, and R. Goolsby. Harnessing the crowdsourcing power of social media for
disaster relief. IEEE Intelligent Systems 26(3):10–14, 2011.
[16] H. Gao, J. Tang, and H. Liu. Exploring social-historical ties on location-based social net-
works. Proceedings of the 6th International AAAI Conference on Weblogs and Social Media.
Association for the Advancement of Artificial Intelligence, Palo Alto, CA, 2012.
[17] H. Gao, X. Wang, G. Barbier, and H. Liu. Promoting coordination for disaster relief: From
crowdsourcing to coordination. Proceedings of the 4th Conference on Social Computing,
Behavioral-Cultural Modeling and Prediction. Springer-Verlag, Berlin, 197–204, 2011.
[18] J. Golbeck and J. Hendler. Inferring binary trust relationships in web-based social networks.
ACM Transactions on Internet Technology 6(4):497–529, 2006.
[19] R. Goolsby. Social media as crisis platform: The future of community maps/crisis maps. ACM
Transactions on Intelligent Systems and Technology 1(1):1–11, 2010.
[20] T. Grandison and M. Sloman. A survey of trust in Internet applications. IEEE Communications
Surveys & Tutorials 3(4):2–16, 2000.
[21] M. Granovetter. Threshold models of collective behavior. American Journal of Sociology
83(6):1420–1443, 1978.
[22] R. Gross and A. Acquisti. Information revelation and privacy in online social networks. Pro-
ceedings of the 2005 ACM Workshop on Privacy in the Electronic Society. Association for
Computing Machinery, New York, 71–80, 2005.
[23] P. Gundecha, G. Barbier, and H. Liu. Exploiting vulnerability to secure user privacy on a social
networking site. The 17th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining. Association for Computing Machinery, New York, 2011.
[24] J. Han, M. Kamber, and J. Pei. Data Mining: Concepts and Techniques. Morgan Kaufmann,
San Francisco, 2011.
[25] M. Hu and B. Liu. Mining and summarizing customer reviews. Proceedings of the Tenth ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for
Computing Machinery, New York, 168–177, 2004.
[26] N. Jindal and B. Liu. Identifying comparative sentences in text documents. Proceedings of the
29th Annual International ACM SIGIR Conference on Research and Development in Informa-
tion Retrieval. Association for Computing Machinery, New York, 244–251, 2006.
[27] N. Jindal and B. Liu. Opinion spam and analysis. Proceedings of the International Conference
on Web Search and Web Data Mining. Association for Computing Machinery, New York,
219–230, 2008.
[28] A. M. Kaplan and M. Haenlein. Users of the world, unite! The challenges and opportunities
of social media. Business Horizons 53(1):59–68, 2010.
[29] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social
network. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining. Association for Computing Machinery, New York, 137–146, 2003.
[30] I. Konstas, V. Stathopoulos, and J. M. Jose. On social networks and collaborative recom-
mendation. Proceedings of the 32nd International ACM SIGIR Conference on Research and
Development in Information Retrieval. Association for Computing Machinery, New York,
195–202, 2009.
[31] B. Krishnamurthy and C. E. Wills. On the leakage of personally identifiable information via
online social networks. ACM SIGCOMM Computer Communication Review 40(1):112–117,
2010.
[32] S. Kumar, R. Zafarani, and H. Liu. Understanding user migration patterns across social media.
Twenty-Fifth International Conference on Artificial Intelligence. Association for the Advance-
ment of Artificial Intelligence, Palo Alto, CA, 2011.
[33] T. La Fond and J. Neville. Randomization tests for distinguishing social influence and
homophily effects. Proceedings of the 19th International Conference on World Wide Web. Asso-
ciation for Computing Machinery, New York, 601–610, 2010.
[34] P. F. Lazarsfeld and R. K. Merton. Friendship as a social process: A substantive and method-
ological analysis. Freedom and Control in Modern Society 18:18–66, 1954.
[35] B. Liu. Sentiment analysis and subjectivity. Handbook of Natural Language Processing. CRC
Press, Boca Raton, FL, 627–666, 2010.
[36] H. Liu and P. Maes. InterestMap: Harvesting social network profiles for recommendations.
Workshop: Beyond Personalization, San Diego, 2005.
[37] H. Liu and H. Motoda. Computational Methods of Feature Selection. Chapman & Hall, Boca
Raton, FL, 2008.
[38] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King. Recommender systems with social regular-
ization. Proceedings of the Fourth ACM International Conference on Web Search and Data
Mining. Association for Computing Machinery, New York, 287–296, 2011.
[39] M. W. Macy. Chains of cooperation: Threshold effects in collective action. American Sociolog-
ical Review 56(6):730–747, 1991.
[40] V. Mahajan, E. Muller, and F. M. Bass. New product diffusion models in marketing: A review
and directions for research. Journal of Marketing 54(1):1–26, 1990.
[41] S. P. Marsh. Formalising trust as a computational concept. Ph.D. thesis, Department of
Computing Science and Mathematics, University of Stirling, Stirling, UK, 1994.
[42] M. McPherson, L. Smith-Lovin, and J. M. Cook. Birds of a feather: Homophily in social
networks. Annual Review of Sociology 27:415–444, 2001.
[43] S. T. Moturu and H. Liu. Quantifying the trustworthiness of social media content. Distributed
and Parallel Databases 29(3):239–260, 2011.
[44] L. Mui, M. Mohtashemi, and A. Halberstadt. A computational model of trust and reputa-
tion for E-businesses. Proceedings of the 35th Annual Hawaii Conference on System Sciences
(HICSS’02). IEEE Computer Society, Washington, DC, 2431–2439, 2002.
[45] A. Narayanan and V. Shmatikov. Robust de-anonymization of large sparse datasets. Proceedings
of the 2008 IEEE Symposium on Security and Privacy. IEEE Computer Society, Washington,
DC, 111–125, 2008.
[46] A. Narayanan and V. Shmatikov. De-anonymizing social networks. Proceedings of the 2009
IEEE Symposium on Security and Privacy. IEEE Computer Society, Washington, DC,
173–187, 2009.
[47] D. Olmedilla, O. Rana, B. Matthews, and W. Nejdl. Security and trust issues in semantic
grids. Proceedings of the Dagstuhl Seminar, Semantic Grid: The Convergence of Technologies
5271:896–902, 2005.
[48] B. Pang and L. Lee. Opinion mining and sentiment analysis. Foundations and Trends in
Information Retrieval 2(1–2):1–135, 2008.
[49] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? Sentiment classification using machine
learning techniques. Proceedings of the ACL-02 Conference on Empirical Methods in Natural
Language Processing, Vol. 10. Association for Computational Linguistics, Stroudsburg, PA,
79–86, 2002.
[50] A. M. Popescu and O. Etzioni. Extracting product features and opinions from reviews. Pro-
ceedings of the Conference on Human Language Technology and Empirical Methods in Natural
Language Processing. Association for Computational Linguistics, Stroudsburg, PA, 339–346,
2005.
[51] E. Riloff and J. Wiebe. Learning extraction patterns for subjective expressions. Proceedings of
the 2003 Conference on Empirical Methods in Natural Language Processing. Association for
Computational Linguistics, Stroudsburg, PA, 105–112, 2003.
[52] D. Rosenblum. What anyone can know: The privacy risks of social networking sites. IEEE
Security and Privacy 5(3):40–49, 2007.
[53] A. C. Squicciarini, M. Shehab, and F. Paci. Collective privacy management in social networks.
Proceedings of the 18th International Conference on World Wide Web. Association for Com-
puting Machinery, New York, 521–530, 2009.
[54] P. Sztompka. Trust: A Sociological Theory. Cambridge University Press, Cambridge, UK, 1999.
[55] P.-N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Pearson Addison Wesley,
Boston, 2006.
[56] L. Tang and H. Liu. Community Detection and Mining in Social Media, Vol. 2. Morgan &
Claypool Publishers, San Rafael, CA, 2010.
[57] J. Tang, H. Gao, and H. Liu. mTrust: Discerning multi-faceted trust in a connected world.
Proceedings of the Fifth ACM International Conference on Web Search and Data Mining. Asso-
ciation for Computing Machinery, New York, 93–102, 2012.
[58] P. D. Turney. Thumbs up or thumbs down?: Semantic orientation applied to unsupervised clas-
sification of reviews. Proceedings of the 40th Annual Meeting on Association for Computational
Linguistics. Association for Computational Linguistics, Stroudsburg, PA, 417–424, 2002.
[59] I. H. Witten, E. Frank, and M. A. Hall. Data Mining: Practical Machine Learning Tools and
Techniques. Morgan Kaufmann, San Francisco, 2011.
[60] G. Wondracek, T. Holz, E. Kirda, and C. Kruegel. A practical attack to de-anonymize social
network users. Proceedings of the 2010 IEEE Symposium on Security and Privacy. IEEE Com-
puter Society, Washington, DC, 223–238, 2010.
[61] S. Yardi, D. Romero, G. Schoenebeck, and D. Boyd. Detecting spam in a Twitter network.
First Monday 15(1):1–4, 2009.
[62] H. Yu and V. Hatzivassiloglou. Towards answering opinion questions: Separating facts from
opinions and identifying the polarity of opinion sentences. Proceedings of the 2003 Confer-
ence on Empirical Methods in Natural Language Processing. Association for Computational
Linguistics, Stroudsburg, PA, 129–136, 2003.
[63] Z. A. Zhao and H. Liu. Spectral Feature Selection for Data Mining. Chapman & Hall/CRC
Press, Virginia Beach, VA, 2012.
[64] E. Zheleva and L. Getoor. To join or not to join: The illusion of privacy in social networks
with mixed public and private user profiles. Proceedings of the 18th International Conference
on World Wide Web. Association for Computing Machinery, New York, 531–540, 2009.