
Big Data Privacy Issues in Public Social Media

Matthew Smith, Christian Szongott
Distributed Computing & Security Group
Leibniz Universität Hannover
Hannover, Germany
Email: {smith,szongott}@dcsec.uni-hannover.de

Benjamin Henne, Gabriele von Voigt
L3S Research Center
Hannover, Germany
Email: {henne, vonvoigt}@l3s.de

Abstract—Big Data is a new label given to a diverse field of data-intensive informatics in which the datasets are so large that they become hard to work with effectively. The term has mainly been used in two contexts: firstly as a technological challenge when dealing with data-intensive domains such as high energy physics, astronomy or internet search, and secondly as a sociological problem when data about us is collected and mined by companies such as Facebook, Google, mobile phone companies, retail chains and governments. In this paper we look at this second issue from a new perspective, namely how the user can gain awareness of the personally relevant part of the Big Data that is publicly available in the social web. The amount of user-generated media uploaded to the web is expanding rapidly and it is beyond the capabilities of any human to sift through it all to see which media impacts our privacy. Based on an analysis of social media in Flickr, Locr, Facebook and Google+, we discuss the privacy implications and potential of the emerging trend of geo-tagged social media. We then present a concept with which users can stay informed about which parts of the social Big Data deluge are relevant to them.

I. INTRODUCTION

Big Data is becoming a hot topic in many areas where datasets are so large that they can no longer be handled effectively or even completely [15]. Put differently, any task which is comparatively easy to execute when operating on a small but relevant set of data, but which becomes unmanageable when dealing with the same problem on a large dataset, can be classified as a Big Data problem. Typical problems encountered when dealing with Big Data include capture, storage, dissemination, search, analytics and visualisation. The traditional data-intensive sciences such as astronomy, high energy physics, meteorology, genomics, biological and environmental research, in which peta- and exabytes of data are generated, are common domain examples. Here even the capture and storage of the data is a challenge. But there are also new domains encroaching on the Big Data paradigm: data warehousing, Internet and social web search, finance and business informatics. Here datasets can be small compared to the previous domains; however, the complexity of the data can still lead to the classification as a Big Data problem.

When looking at privacy issues in the Big Data domain we need to distinguish which of the many Big Data application domains we are discussing. The traditional Big Data applications such as astronomy and other e-sciences usually operate on non-personal information and as such usually do not have significant privacy issues. The privacy-critical Big Data applications lie in the new domains of the social web, consumer and business analytics and governmental surveillance [6]. In these domains Big Data research is being used to create and analyse profiles of us, for example for market research, targeted advertisement, workflow improvement or national security. These are very contentious issues, since it is entirely up to the controller of the Big Data sets whether the information gleaned is used for nefarious purposes or not.

In particular in the context of the social web there is an increasing awareness of the value, potential and risk of the personal data which we voluntarily upload to the web. Where privacy is concerned there has been a lot of work in the small data area, i.e. how users can control who has access to what they post themselves. However, the Big Data issue in this area has focused almost entirely on what the controlling companies do with this information. These concerns are being addressed by calls for regulatory intervention, i.e. regulating what companies are allowed to do with the data we give them or what data they are allowed to gather about us. A topic which has not received as much attention is the effect other people's data has on us. This can be seen both in a social context, i.e. what happens if friends or acquaintances see this data, and also in what happens when companies with Big Data analytics harvest this information. Microsoft's Scott Charney offered a very good example during his keynote speech at the RSA Conference 2012: if a friend takes a picture of me during a volleyball game, shares this picture with other friends and one of them uploads the picture to the web, my insurance company can find and use that picture against me.¹ There have been reports that insurance companies are looking for just such information, which could raise premiums or even deny claims.² The same is true for banks and credit rating companies.³

In this paper we examine this side of the social media Big Data issue. We discuss how the growing proliferation and capabilities of mobile devices are creating a deluge of social media which can affect our privacy. Due to the vast amounts of data being uploaded every day it is next to impossible to be aware of everything which affects us. We also discuss a concept which can be used to regain control of some of the Big Data deluge created by other social web users.

¹ Paraphrased from the keynote at RSA 2012
² http://abclocal.go.com/kabc/story?section=news/consumer&id=8422388
³ http://www.betabeat.com/2011/12/13/as-banks-start-nosing-around-facebook-and-twitter-the-wrong-friends-might-just-sink-your-credit/

978-1-4673-1703-0/12/$31.00 ©2013 IEEE


II. E NVIRONMENT & P ROBLEM S TATEMENT oversight and an issue which will gain importance as the
mobile smart device boom continues.
The amount of social media being uploaded into the web is III. T HREAT A NALYSIS
growing rapidly and there is still no end to this trend in sight.
We categorise privacy issues into two classes. Firstly, home-
The ease-of-use of modern smartphones and the proliferation
grown problems: Someone uploads a piece of compromising
of high-speed mobile networks is facilitating a culture of
media of himself with insufficient protection or forethought
spontaneous and carefree uploading of user-generated content.
which causes damage to his own privacy. A prime example
To give an idea of the scale of this phenomenon: Just in the
of this category is someone uploading compromising pictures
last two years the number of photos uploaded to Facebook
of himself into a public album instead of a private one or
per month has risen from 2 billion to over 6 billion [9], [10].
onto his Timeline instead of a message. The damage done in
From a personal perspective an overwhelming majority of
these cases is very obvious since the link between the content
these photos have no privacy relevance for oneself. Finding
and the user is direct and the audience (often the peer circle)
the few that are relevant is a daunting task.
has direct interest in the content. One special facet of this
While one’s own media is uploaded consciously, the flood problem is that what is considered damaging content by the
of media uploaded by others is so huge that it is almost user can and often does change over time. While this is a
impossible to stay aware of all media in which one might serious problem, especially amongst the Facebook generation,
be depicted. This can be classified as a Big Data problem on this issue is a small data problem and thus is not the focus of
the users side, however not on the provider side. Current social this work.
networks and photo-sharing sites mainly focus on the privacy Secondly we have the Big Data problems created by others:
of users’ own media in terms of access control, but do little An emerging threat to users’ online privacy comes from other
to deal with the privacy implications created by other users’ users’ media. What makes this threat particularly bad is the
media. There are ever more complex settings allowing a user fact that the person harmed is not involved in the uploading
to decide who is allowed to see what content but only content process and thus cannot take any pre-emptive precautions and
owned by the user [5], [7]. However, the issue of staying on the amount of data being uploaded is so vast it cannot be
top of what others are uploading (mostly in good faith), that manually sighted. Also there are currently no countermeasures,
might also be relevant to the user, is still very much outside except post-priory legal ones, to prevent others from uploading
the control of that user. Social networks which allow tagging potentially damaging content about someone. There are two
of users usually inform affected users when they are tagged. requirements for this form of privacy threat to have an effect:
However, if no such tagging is done by the uploader or a third Firstly, to cause harm to a person a piece of media needs
party, there are currently no mechanisms to inform users of to be able to be associated/linked to the person in some
relevant media. way. This link can either be non-technical, such as being
A second important emerging trend is the capability of many recognisable in a photo, or technical such as a profile being
modern devices to embed geo-data and other metadata into the (hyper-)linked to a photo. There is also the grey area of textual
created content. While the privacy issues of location-based references to a person near to the photo or embedded in the
services such as Foursquare or Qype have been discussed metadata of the photo. This metadata does not directly create a
at great length, the privacy issues of location information technical link to a profile, but it opens the possibility for search
embedded into uploaded media have not yet received much engines to index the information and make it searchable,
attention. There is one very significant difference between thus creating a technical link. Secondly, a piece of media in
these two categories. In the first, users reveal their current question must contain harmful content for the person linked
location to access online services, such as Google Maps, to it. This can again be non-technical such as being depicted
Yelp or Qype or the user actually publishes his location on in a compromising way. However, more interestingly it can
a social network site like Foursquare, Google Latitude or also be technical. In these cases metadata or associated data
Facebook Places. In this category, the user mainly affects his causes harm. For instance time and location data can indicate
own privacy. There is a large body of work examining privacy that a person has been at an embarrassing location, took part
preserving techniques to protect a user’s own privacy, ranging in a political event, or was not where he said he was.
from solutions which are installed locally on the user’s mobile Since the uploading of this type of damaging media cannot
device [2], to solutions which use online services relying on be effectively prevented, awareness is the key issue in com-
group-based anonymisation algorithms, as for instance mix bating this emerging privacy problem.
zones [3] or k-anonymity [13].
The second category is created by media which contains A. Awareness of Damaging Media in Big Datasets
location information. This can have all the same privacy Most popular social networks and media sharing sites allow
implications for the creator of the media, however, a critical users to tag objects and people in their uploaded media.
and hitherto often overlooked issue is the fact, that the location Additionally, some services also extract embedded metadata
and other metadata contained in pictures and videos can also and use this information for indexing and linking. Media is
affect other people than the uploader himself. This is a critical annotated with names, comments, or is directly linked to users’

978-1-4673-1703-0/12/$31.00 ©2013 IEEE


profiles. In particular the direct linking of profiles to pictures Picasa Web & Google+ store the complete EXIF metadata
was initially met with an outcry of privacy concerns since of all images. It is accessible by everyone who can access the
it greatly facilitated finding information about people. For image. The access to images is regulated on a per-album base.
this reason, social networks quickly introduced the option to It can be set to public, restricted to people who know the secret
prevent people from linking them in media. However, there URL to the album, or to the owner only. A noteworthy feature
is also a positive side to this form of direct linking since the is that geo-location data can be protected separately. Google+
linked person is usually made aware about the linked media and Picasa Web allow the tagging of people in images.
and can then take action to have unwanted content removed or Locr is a geo-tagging focused photo-sharing site. As such,
restrict the visibility of the link. While the privacy mechanism location information is included in most images. By default all
of current services are still limited, hidden and often confusing, metadata is retained in all images. Access control is set on a
once the link is made the affected people can take action. per image basis. Anybody who can see an image can also see
A more critical case in our view is the non-linked tagging the metadata. There are also extensive location-based search
of photos. In this case a free text tag contains identifying options. Geo-data is extracted from uploaded files or set by
information and/or malicious comments. However there is no people on the Locr website. Locr uses reverse geocoding to
automated mechanism to inform a user that he was named in add textual location information to images in its database.
or near a piece of media. The named person might not even Instagram and PicPlz are services/mobile apps that allow
be a member of the service where the media was uploaded. posting images in a Twitter like way. Resized images stripped
The threat of this kind of linking is significantly different to of metadata but with optional location data are stored by the
the one depicted above. While the immediate damage can be services. Additionally they allow the posting of photos to
smaller since no automated notification is sent to friends of different services like Flickr, Facebook, Dropbox, Foursquare
the user, the threat can remain hidden far longer. The person and more. Depending on the service used metadata is stored
can remain unaware of this media whereas others can stumble or discarded. For instance, when uploading a photo to Flickr
upon it or be informed by peers. metadata is stripped from the actual file, but title, description
The final case of damaging media does not contain any as well as geo location are extracted from the image and can
technical link. Without any link to the person in question be set by the user. In contrast, the Hipstamatic mobile app
this kind of media can only cause harm to the person if preserves in-file metadata when uploading images to Flickr.
someone having some contact to that person stumbles across
it and makes the connection. While the likelihood of causing V. S URVEY OF M ETADATA IN S OCIAL M EDIA
noticeable harm is smaller, it is still possible. The viral To underpin the growing prevalence of privacy-relevant
spreading of media has caused serious embarrassment and metadata and location data in particular and to judge potential
harm in real world cases. The critical issue here is that there dangers and benefits based on real-world data we analysed
is currently no way for a person to pro-actively search for this a set of 20,000 publicly available Flickr images and their
kind of media in the Big Data deluge to mitigate this threat. metadata. Flickr was chosen as the premiere photo-sharing
website, because it can be legally crawled, offers the full
IV. A NALYSIS OF S ERVICE P RIVACY extent of privacy mechanisms and does not remove metadata in
The following section overviews our privacy analysis of general. We crawled one photo each from 20k random Flickr
different media hosting sites. It includes media access control users. Of these, 68.8% were Pro users where the original file
as well as metadata handling. could be accessed as well. For the others only the metadata
Flickr provides the most fine-grained privacy/access control available via the Flickr API was accessed. This includes data
settings of all analysed services. Privacy settings can be de- automatically extracted from EXIF data during upload and
fined on metadata as well as the image itself. One particularly data manually added via website or semi-automatically set by
interesting feature of Flickr is the geo-fence. The geo-fence client applications. 23% of the 20k users denied access to their
feature enables users to define privacy regions on a map by extracted EXIF data in the Flickr database. We also took a set
putting a pin on it and setting a radius. Access to GPS data of 3,000 images made with a camera phone from 3k random
of the user’s photos inside these regions is only allowed for a mobile Flickr users. 46.8% of the mobile users were Pro users
restricted set of users (friends, family, contacts). Flickr allows and only 2% denied access to EXIF data in the Flickr database.
its users to tag and add people to images. If a user revokes a GPS location data was present in 19% of the 20k dataset
person tag of himself in an image, no one can add the person and in 34% of the 3k mobile phone dataset. While Flickr
to that image again. hosts many semi-professional DSLR photos, mobile phones
Facebook extracts the title and description of an image from are becoming the dominant photo generation tool with the
metadata during the upload process of some clients. All photos iPhone 4 currently being the most common camera on Flickr
are resized after upload and metadata is stripped off. Facebook [12]. Textual location information like street or city names
uses face recognition for friend tagging suggestions based are currently not used much on Flickr. However, as reverse
on already tagged friends. Access to images is restricted by geocoding becomes more common in client applications this
Facebook’s ever changing, complex and sometimes abstruse will change (cf. Locr in Figure 2). To evaluate the potential
privacy settings [5], [7]. privacy impact, we manual checked which photos contained

978-1-4673-1703-0/12/$31.00 ©2013 IEEE


50 %
Metadata in random Flickr Original Photos
Original In-File Metadata and analysed the metadata in the images plus the images’
HTML pages built from the Locr database. Figure 2 shows
40 %
the results from this dataset. Particularly interesting is the
30 % high rate of non-GPS based location information. This is a
trend to watch since most location stripping mechanisms only
20 % strip GPS information and leave other text-based tags intact.
Furthermore, the amount of camera ID meta-data is notable
10 %
since these IDs can be used to link different pictures and infer
0%
meta-data even if data has been stripped from some of the
GPS Location City People Artist Description Tags Headline photos.
5,761 random orig. photos (left) vs. 1,050 mobile user random orig. photos (right)
To summarise, one third of the pictures taken by dominant
camera devices contains GPS information. About one third
Fig. 1. Public privacy-related metadata in 5.7k random and 1k mobile user of these images depict people on it. Thus, about 10% of all
original Flickr photos
the photos could harm other peoples’ privacy without them
knowing about it.
people and geo-reference, but no user profile tags – i.e. images VI. L OCATION BASED B IG DATA H ANDLING
which could contain people who are unaware of the photo. In
the set of 20k images we found 16% and in in the set of 3k In the previous sections we discussed some of the privacy
mobile photos we found 28% fulfilling this criteria. issues created by the other peoples’ media in particular the
We further analysed the subset of images which were privacy issues created by location information. In this section
available from Pro users, since these can contain the unaltered we discuss how this information can also be used to improve
metadata from the camera. From the 20k dataset, 5761 images a user’s privacy by raising awareness about potentially com-
contained in-file metadata. From the 3k dataset, 1050 images promising media and thus enabling the user to stay on top
contained in-file metadata. Figure 1 shows the percentage of the Big Data wave. We propose leveraging the location
values for the different types of metadata contained in the tracking capability of modern smartphones to create smart
files. For the rest of the images metadata was either manually privacy zones in which the user is informed about media
removed by the uploader or the image never has had any in the events. As was shown above, the number of images which
first place. Of the 20k dataset only 3% of the in-file metadata contain location information is already substantial and it is
contained GPS data compared to 32% from the mobile 3k likely to grow further. If a user’s phone keeps a GPS record
dataset. This shows a clear dominance of mobile devices of where the person was at which time, these two pieces of
when it comes to publishing GPS metadata. This itself is information can be combined with the location data stored in
unsurprising since most compact and DSLR cameras currently the media to significantly reduce the amount of data which
do not have GPS receivers and only few photographers add could be relevant to the individual person.
external GPS devices to these cameras – but this will likely A. Design
change with future cameras. However, combined with the fact
The privacy awareness concept consists of a watchdog client
that mobile phones are becoming the dominant type of camera
a server side watchdog service. Using a GPS-enabled mobile
where it comes to the number of published pictures, it is to be
device a user can activate the watchdog client to locally track
expected that the amount of GPS data available for scrutiny,
his position during times he considers relevant to his privacy.
use and abuse will rise further.
Then his device can request the privacy watchdog to show
100 % him media that could potentially affect him whenever the user
HTML Page
is interested in the state of his online privacy. For this the
Metadata in 5,000 random Locr Photos

Image File

80 % watchdog client sends the location traces to the watchdog


server or servers which then respond with a list of media which
60 % was taken in the vicinity (both spatially and temporally) of the
user.
40 %
The watchdog service can be operated in three different
ways. The first two would be value-added services which can
20 %
be offered by the media sharing sites or social networks (SN)
0%
themselves. In both these cases the existing services would
Artist Camera Keywords
ID
GPS Location/
Location Street
City need to be extended by an API to allow users to search for
media by time and location.
Fig. 2. Public privacy-related metadata of 5k random photos from Locr The first type of service would do this via the regular user
service account. Thus, it would able to see all public pictures, but
also all pictures restricted in scope but visible to the user.
We also collected a 5k dataset of random photos from Locr The benefit of this service type is that through the integration

978-1-4673-1703-0/12/$31.00 ©2013 IEEE


with the account of the user pictures which aren’t publicly the fact that to request the relevant media a user must send
available can be searched. These private pictures are typically location information to the watchdog service.
from social network friends and thus the likelihood of pictures When using the first type of service, there is little which can
involving the user is higher and the scope of people able to be done to protect the location privacy of the user since the
view the pictures more relevant. This type of service also has correlation between the location query and the user account is
the benefit-of-sorts that the location information is valuable to direct. One option to protect the privacy to a degree can be an
the SN, so it has an incentive to offer this kind of value-added- obfuscation approach. For every true query a number of fake
service. The privacy downside of searching with the user’s queries could also be sent, making it less easy (but far from
account is that the SN receives the information of when and impossible) for the SN provider to ascertain the true location.
where a user was. While there are certainly users who would However, this approach does not scale well for two reasons.
not mind this information being sent to their SN if it means Firstly, it creates a higher load on the SN. However, it is more
they get to see and remove embarrassing photos in a timely critical that the likelihood deducing the true location rises, if
manner, there are also certainly users who do not wish their many queries are sent unless great care is taken in creating
SN to know when and where they were, particularly amongst the fake paths and masking the source IP addresses. As such
the clientele who wish to protect their online privacy. this type of service should only be used if the user is willing
The second type of watchdog service would also be operated to ”swap” their location data for the best possible update on
by the SN. However, it does not require a user account to uploaded media.
do the search and can be queried anonymously. This type of Protecting the user’s privacy in the second case is simpler.
service would thus have a smaller scope, since it can only Since the queries do not require an account the only way
access publicly available media. A further drawback of this a user could be tracked directly is his IP address. Using an
type of service is that there is less of an incentive for the SN anonymising service such as TOR sequential queries cannot
provider to implement such a service. While there are sites be linked together and creating a tracking profile becomes
such as Locr that allow such queries, most SN sites do not. significantly harder. The anonymous trace data can of course
Without outside pressure there is less intrinsic value for them still be used by the SN, but the missing link to the user makes
to include such a service compared to the first type. it less critical for the user himself. The third type of service is
probably the most interesting privacy wise, since the economic
The third type of service would be a stand-alone service
model behind the service will significantly impact the privacy
which can be operated by a third party. The stand-alone
techniques which can be applied to this model. Most payment
service operates like an indexing search machine, which
models would require user credentials to log in and thus would
crawls publicly available media and its metadata and allows
allow the watchdog service provider to track the user. In this
this database to be queried. Possible incentive models for
case it would have to be the reputation of the service provider
this approach include pay-per-use, subscription, ad-based or
which the user would have to trust, similar to the case of
community services. The visibility scope would be the same
commercial anonymising proxies. In an ad-based approach no
as for the second type of service.
user-credentials are needed, thus it would be possible to use
All three types of service are mainly focused on detect- the service anonymously via TOR. If so, the watchdog client
ing relevant media events and breaking down the Big Data problem to humanly manageable sizes. The concept is mainly focused on bringing possibly relevant media to the attention of the user without overburdening him. The system does not explicitly protect against malicious uploads with which the uploader intentionally tries to harm another person while attempting to conceal the activity at the same time. The watchdog service proposed here could, however, make such subterfuge harder for the malicious uploader. Even without full protection from malicious activity, we believe that such a watchdog would improve the current state of the art by enabling users to gain better awareness of the relevant part of the social media Big Dataset.
   1) Privacy Analysis: These different types of watchdog service can help to reduce the number of relevant pieces of media a user needs to keep an eye on if they do not want uncontrolled media of themselves to be online. However, the devil is in the detail, since this form of service can also have serious privacy implications itself if designed in the wrong way. Care must be taken to facilitate the different usage and privacy requirements of different users. The critical issue is where the collected information is stored and processed: a centralised service would need to be open source to ensure that no secret tracking information is stored there. In community-based approaches a privacy model like that in TOR can be used, to ensure that none of the participating nodes gets enough information to track a single user.
   Each of these proposed service types has different privacy benefits and disadvantages, and the trade-off between the two is an interesting area for research.

                        VII. RELATED WORK
   Two services worth mentioning which collect the type of information needed for a privacy watchdog are SocialCamera and Locaccino. SocialCamera [17] is a mobile app that detects faces in a picture and tries to recognise them with the help of Facebook profile pictures of people on the user's friends list. Recognised people can be automatically tagged and pictures instantly uploaded to Facebook. Locaccino [8] is a Foursquare-type application which allows users to upload location-based information into Facebook. These two apps show the willingness of users to share this kind of information in the social web.
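The community-based variant mentioned above can be illustrated with a minimal, onion-routing-style sketch. This is only a toy model under assumed names (the `wrap`/`unwrap` helpers and relay identifiers are hypothetical, not part of any implementation described in this paper): each relay peels one layer and learns only the next hop, so no single node sees both the originating user and the final query. Base64 here stands in for real cryptography; an actual deployment would encrypt each layer to the corresponding relay's public key, as TOR does.

```python
import base64
import json

def wrap(query, route):
    """Wrap a watchdog query in one layer per relay in `route`.

    The innermost layer holds the query itself; each outer layer
    names the relay that should receive the remaining blob.
    NOTE: base64 is an encoding, not encryption -- a real system
    would encrypt each layer to that relay's public key.
    """
    payload = json.dumps({"query": query})
    for hop in reversed(route):
        layer = json.dumps({"next": hop, "payload": payload})
        payload = base64.b64encode(layer.encode()).decode()
    return payload

def unwrap(blob):
    """Peel one layer, as a single relay would: return (next_hop, rest).

    The relay learns only `next_hop`; the remaining blob is opaque
    to it, so no participating node can link user and query.
    """
    layer = json.loads(base64.b64decode(blob))
    return layer["next"], layer["payload"]
```

A client would call `wrap(query, ["relayA", "relayB", "watchdog"])` and hand the result to the first relay; only the final node recovers the query, and only the first node knows the client.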

978-1-4673-1703-0/12/$31.00 ©2013 IEEE


   Ahern et al. analyse in their work [1] the privacy decisions of mobile users in the photo sharing process. They identify relationships between the location of the photo capture and the corresponding privacy settings. They recommend the use of context information to help users set privacy preferences and to increase the users' awareness of information aggregation. Work by Fang and LeFevre [11] focuses on helping the user to find appropriate privacy settings in social networks. They present a system where the user initially only needs to set up a few rules. Through the use of active machine learning algorithms, the system helps the user to protect private information based on the individual's behaviour and taste. In [14] Mannan et al. address the problem that private user data is not only shared within social networks, but also through personal web pages. In their work they focus on privacy-enabled web content sharing and utilise existing instant messaging friendship relations to create and enforce access policies.
   The three works above all focus on protecting a user's privacy against dangers created by the user himself while sharing media. They do not discuss how users can be protected from other people's media. The same holds for most of the research work done in this area.
   Besmer et al. [4] present work which allows users that are tagged in photos to send a request to the owner to hide the linked photo from certain people. This approach also follows the idea that forewarned is forearmed and that creating awareness of critical content is the first step towards solving the problem. However, the work relies on direct technical tags and as such does not cover the same scope as the privacy watchdog suggested in this paper.
   Work that also takes other users' media into account is presented by Squicciarini et al. [16]. They postulate that most shared data does not belong to a single user only. They therefore propose a system to share the ownership of media items and thereby strive to establish collaborative privacy management for shared content. Their prototype is implemented as a Facebook app and is based on game theory, rewarding users that promote co-ownership of media items. While this work does take other users' media into account, unlike our approach it does not cope with previously unknown and unrelated users.

                        VIII. CONCLUSION
   In this paper we presented an analysis of the threat to an individual's privacy that is created by other people's social media. For this we gave a brief overview of the privacy capabilities of common social media services regarding their ability to protect users from other people's activities. We also conducted an analysis of privacy-related metadata, particularly location data contained in social media, and analysed over 28k real-world images from popular social media sites. Based on this survey we analysed the Big Data privacy implications and potential of the emerging trend of geo-tagged social media. We then presented three concepts for how this location information can actually help users to stay in control of the flood of potentially harmful or interesting social media uploaded by others.

                        REFERENCES
[1] S. Ahern, D. Eckles, N. Good, S. King, M. Naaman, and R. Nair. Overexposed?: Privacy patterns and considerations in online and mobile photo sharing. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 357–366, 2007.
[2] C. A. Ardagna, M. Cremonini, S. De Capitani di Vimercati, and P. Samarati. An obfuscation-based approach for protecting location privacy. IEEE Transactions on Dependable and Secure Computing, 8(1):13–27, June 2009.
[3] A. Beresford and F. Stajano. Location privacy in pervasive computing. IEEE Pervasive Computing, 2(1):46–55, Jan. 2003.
[4] A. Besmer and H. Richter Lipford. Moving beyond untagging. In Proceedings of the 28th International Conference on Human Factors in Computing Systems - CHI '10, page 1563, Apr. 2010.
[5] D. Boyd. Facebook's privacy trainwreck: Exposure, invasion, and social convergence. Convergence: The International Journal of Research into New Media Technologies, 14(1):13–20, 2008.
[6] D. Boyd and K. Crawford. Six provocations for Big Data. SSRN eLibrary, 2011.
[7] D. Boyd and E. Hargittai. Facebook privacy settings: Who cares? First Monday, 15(8), 2010.
[8] Carnegie Mellon Mobile Commerce Laboratory. Locaccino - a user-controllable location-sharing tool. http://locaccino.org/, 2011.
[9] E. Eldon. New Facebook statistics show big increase in content sharing, local business pages. http://goo.gl/ebGQH, February 2010.
[10] Facebook. Statistics. http://www.facebook.com/press/info.php?statistics, 2011.
[11] L. Fang and K. LeFevre. Privacy wizards for social networking sites. In Proceedings of the 19th International Conference on World Wide Web - WWW '10, page 351. ACM Press, Apr. 2010.
[12] Flickr. Camera Finder. http://www.flickr.com/cameras, October 2011.
[13] B. Gedik and L. Liu. Protecting location privacy with personalized k-anonymity: Architecture and algorithms. IEEE Transactions on Mobile Computing, 7(1):1–18, Jan. 2008.
[14] M. Mannan and P. C. van Oorschot. Privacy-enhanced sharing of personal content on the web. In Proceedings of the 17th International Conference on World Wide Web - WWW '08, page 487. ACM Press, April 2008.
[15] Metadata Working Group. Gartner special report - pattern-based strategy: Getting value from big data. www.gartner.com/patternbasedstrategy, 2012.
[16] A. C. Squicciarini, M. Shehab, and F. Paci. Collective privacy management in social networks. In Proceedings of the 18th International Conference on World Wide Web - WWW '09, page 521. ACM Press, Apr. 2009.
[17] Viewdle. SocialCamera. http://www.viewdle.com/products/mobile/index.html.

