Big Data Privacy Issues in Public Social Media
Abstract—Big Data is a new label given to a diverse field of data-intensive informatics in which the datasets are so large that they become hard to work with effectively. The term has been mainly used in two contexts, firstly as a technological challenge when dealing with data-intensive domains such as high energy physics, astronomy or internet search, and secondly as a sociological problem when data about us is collected and mined by companies such as Facebook, Google, mobile phone companies, retail chains and governments. In this paper we look at this second issue from a new perspective, namely how the user can gain awareness of the personally relevant part of Big Data that is publicly available in the social web. The amount of user-generated media uploaded to the web is expanding rapidly and it is beyond the capabilities of any human to sift through it all to see which media impacts our privacy. Based on an analysis of social media in Flickr, Locr, Facebook and Google+, we discuss privacy implications and potential of the emerging trend of geo-tagged social media. We then present a concept with which users can stay informed about which parts of the social Big Data deluge are relevant to them.

I. INTRODUCTION

Big Data is becoming a hot topic in many areas where datasets are so large that they can no longer be handled effectively or even completely [15]. Put differently, any task which is comparatively easy to execute when operating on a small but relevant set of data, but becomes unmanageable when dealing with the same problem on a large dataset, can be classified as a Big Data problem. Typical problems encountered when dealing with Big Data include capture, storage, dissemination, search, analytics and visualisation. The traditional data-intensive sciences such as astronomy, high energy physics, meteorology, genomics, biological and environmental research, in which peta- and exabytes of data are generated, are common domain examples. Here even the capture and storage of the data is a challenge. But there are also new domains encroaching on the Big Data paradigm: data warehousing, Internet and social web search, finance and business informatics. Here datasets can be small compared to the previous domains; however, the complexity of the data can still lead to the classification as a Big Data problem.

When looking at privacy issues in the Big Data domain we need to distinguish which of the many Big Data application domains we are discussing. The traditional Big Data applications such as astronomy and other e-sciences usually operate on non-personal information and as such usually do not have significant privacy issues. The privacy critical Big Data applications lie in the new domains of the social web, consumer and business analytics and governmental surveillance [6]. In these domains Big Data research is being used to create and analyse profiles of us, for example for market research, targeted advertisement, workflow improvement or national security. These are very contentious issues since it is entirely up to the controller of the Big Data sets whether the information gleaned is used for nefarious purposes or not.

In particular in the context of the social web there is an increasing awareness of the value, potential and risk of the personal data which we voluntarily upload to the web. Where privacy is concerned there has been a lot of work in the small data area, i.e. how users can control who has access to what they post themselves. However, the Big Data issue in this area has focused almost entirely on what the controlling companies do with this information. These concerns are being addressed by calls for regulatory intervention, i.e. regulating what companies are allowed to do with the data we give them or what data they are allowed to gather about us. A topic which has not received as much attention is the effect other people’s data has on us. This can be seen both in a social context, i.e. what happens if friends or acquaintances see this data, and also in what happens when companies with Big Data analytics harvest this information. Microsoft’s Scott Charney offered a very good example during his keynote speech at the RSA Conference 2012: If a friend takes a picture of me during a volleyball game, shares this picture with other friends and one of them uploads the picture to the web, my insurance company can find and use that picture against me.1 There have been reports that insurance companies are looking for just such information, which could be used to raise premiums or even deny claims.2 The same is true for banks and credit rating companies.3

In this paper we examine this side of the social media Big Data issue. We discuss how the growing proliferation and capabilities of mobile devices are creating a deluge of social media which can affect our privacy. Due to the vast amounts of data being uploaded every day it is next to impossible to be aware of everything which affects us. We also discuss a concept which can be used to regain control of some of the Big Data deluge created by other social web users.

1 Paraphrased from the Keynote at RSA 2012
2 http://abclocal.go.com/kabc/story?section=news/consumer&id=8422388
3 http://www.betabeat.com/2011/12/13/as-banks-start-nosing-around-facebook-and-twitter-the-wrong-friends-might-just-sink-your-credit/
VIII. CONCLUSION
In this paper we presented an analysis of the threat to an
individual’s privacy that is created by other people’s social
media. For this we presented a brief overview of the privacy
capabilities of common social media services with regard to
protecting users from other people’s activities.
We also conducted an analysis of privacy-related metadata,
particularly location data, contained in social media and
analysed over 28k real-world images from popular social media
sites. Based on this survey we analysed the Big Data privacy
implications and potential of the emerging trend of geo-tagged
social media. We then presented three concepts showing how this
location information can actually help users to stay in control