
Privacy and Security Combined Notes

This document provides an introductory lecture for a course on privacy and security in online social media. It discusses key terms like privacy, security, and online social media. It introduces the instructors and teaching assistants for the course. It also outlines the course content, which will cover topics like the growth of social media, different social media platforms, and case studies about privacy issues. Students are encouraged to participate actively by attending online lectures and tutorials.

Uploaded by basedwhytee
Copyright © All Rights Reserved

#Privacy #Security #OnlineSocialMedia

Introductory Lecture
Jan 21, 2021

Ponnurangam Kumaraguru (“PK”)


#ProfGiri CS, #DeanGiri Student Affairs IIIT Delhi
ACM India Council Member
TEDx & ACM Distinguished Speaker

pkatiiitd linkedin/in/ponguru @ponguru fb.com/ponnurangam.kumaraguru


Let's discuss these 3 words in the title of the course
● Privacy
- Information about me
- Physical
- Being watched

Let's discuss these 3 words in the title of the course
● Security

Let's discuss these 3 words in the title of the course
● Online Social Media

Now let's see
● Privacy + Security + Online Social Media

TAs
- Neha Kumari, PhD Student
- Avinash Tulasi, PhD Student
- Prashant Kodali, PhD Student
Six degrees of separation

● Random person -- You
● Small world phenomenon
● Facebook?

https://en.wikipedia.org/wiki/Six_degrees_of_separation

https://www.jstor.org/stable/pdf/2776392.pdf
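
To make the small-world idea concrete: the degrees of separation between two people is the length of the shortest path between them in the friendship graph. The sketch below uses a tiny hypothetical network (all names are made up), finds that length with breadth-first search (BFS), and averages it over all connected pairs. It illustrates the idea only; it is not the method used in the studies linked above.

```python
from collections import deque
from itertools import combinations


def degrees_of_separation(graph, source, target):
    """Hops on a shortest path from source to target in an undirected
    graph stored as an adjacency dict, or None if they are not connected."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def average_separation(graph):
    """Average shortest-path length over all pairs of people that are connected."""
    total, pairs = 0, 0
    for a, b in combinations(sorted(graph), 2):
        d = degrees_of_separation(graph, a, b)
        if d is not None:
            total += d
            pairs += 1
    return total / pairs if pairs else None


# Hypothetical toy network: a chain you - alice - bob - carol, plus dave linked to alice.
friends = {
    "you": ["alice"],
    "alice": ["you", "bob", "dave"],
    "bob": ["alice", "carol"],
    "carol": ["bob"],
    "dave": ["alice"],
}
print(degrees_of_separation(friends, "you", "carol"))  # 3
print(average_separation(friends))
```

On a real network with millions of users, the same measurement is usually estimated from sampled pairs rather than computed for every pair.
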
4 (5) V’s of Social Media?

Growth of Social Media

History of Social Media

What is a #? Why use it?
- SEO
- #BLRTraffic

From Social Media Data
● WHO
● WHEN
● WHERE
● WHAT
● WHY
● HOW (devices)

Potential Vision
Case Studies

Case Studies: Bal Thackeray

Case Studies: Shami Witness

Case Studies: UK Riots

Source: Netflix

The Great Hack
https://www.netflix.com/title/80117542

Congress - Mark Zuckerberg
https://www.youtube.com/watch?v=t-lMIGV-dUI

Shadow Profiles
ABC
Insta
https://www.youtube.com/watch?v=JiTQkbLzKUc

https://indianexpress.com/article/world/us-capitol-hill-siege-live-updates-7139212/

https://www.vox.com/recode/22221285/trump-online-capitol-riot-far-right-parler-twitter-facebook

https://arxiv.org/pdf/2101.06914.pdf
What should you aim for as part of the course?
Try out all the labs and exercises that we post

TAs will have office hours; please attend them

I am planning to do similar live / online lectures through the semester; please attend

Be active on the mailing list; I plan to use it to share a lot of ideas and thoughts

Picture time!

Thanks for attending the class!
[email protected]
http://precog.iiitd.edu.in/
pkatiiitd | linkedin/in/ponguru | @ponguru

Privacy and Security in Online
Social Media

Course on NPTEL
Week 1.2

Ponnurangam Kumaraguru (“PK”)


Full Professor
ACM Distinguished Speaker
fb/ponnurangam.kumaraguru, @ponguru
Overview of OSM

OSM Penetration in India

4 / 5 V’s of Online Social Media

Facebook

Twitter

Terminology
⚫Tweet
⚫Retweet
⚫Mention
⚫Like
⚫Hashtag
⚫Replies

Facebook v/s Twitter
- Private vs. Public
- Bidirectional links vs. Unidirectional links
- 5000 characters vs. 140 characters
- Like vs. Retweet / Favorite
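
The difference in link direction can be sketched with adjacency sets: a friendship adds an edge in both directions, while a follow adds a single directed edge. The user names and helper functions are hypothetical illustrations, not either platform's API.

```python
from collections import defaultdict


def add_friendship(graph, a, b):
    """Facebook-style friendship: one accepted request links both users to each other."""
    graph[a].add(b)
    graph[b].add(a)


def add_follow(graph, follower, followee):
    """Twitter-style follow: the link points one way only."""
    graph[follower].add(followee)


def followers(graph, user):
    """Everyone who has an outgoing edge to `user`."""
    return {u for u, outs in graph.items() if user in outs}


facebook = defaultdict(set)
twitter = defaultdict(set)

add_friendship(facebook, "asha", "ravi")
add_follow(twitter, "asha", "ravi")  # asha follows ravi; ravi need not follow back

print("ravi" in facebook["asha"], "asha" in facebook["ravi"])  # True True
print("ravi" in twitter["asha"], "asha" in twitter["ravi"])    # True False
```
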

YouTube

Pinterest

LinkedIn

Foursquare

Google+

Periscope
https://www.periscope.tv/

Periscope

Tinder

Perceptions!

Thank you
[email protected]
precog.iiitd.edu.in
fb/ponnurangam.kumaraguru
Privacy and Security in Online
Social Media

Course on NPTEL
Week 1.3

Ponnurangam Kumaraguru (“PK”)


Full Professor
ACM Distinguished Speaker
fb/ponnurangam.kumaraguru, @ponguru
2009

Role of OSN in global events: UK Riots

Hurricane Sandy: Fake Images

Other implications

Other implications

Other implications

Takeaways Week 1
⚫Growth of Online Social Media
⚫4 / 5 Vs of OSM
⚫Different OSMs
⚫Use of OSM
- Positive
- Negative

Tutorials online
⚫Linux
⚫Python

https://www.facebook.com/PreCog.IIITD/

http://precog.iiitd.edu.in/

Privacy and Security in Online
Social Media
Course on NPTEL
NOC21-CS28
Week 2.1

Ponnurangam Kumaraguru (“PK”)


Full Professor
ACM Distinguished Speaker
fb/ponnurangam.kumaraguru, @ponguru
Online discussion
⚫ https://groups.google.com/a/nptel.iitm.ac.in/g/noc21-cs28-discuss
⚫Please participate
⚫Read the posts before asking any questions

Assignment 1
⚫Hope it was simple and all of you were able
to do it satisfactorily

4 / 5 V’s of Online Social Media

Other implications

Tutorials online
⚫Linux
⚫Python

Frameworks / Platforms to know
⚫APIs of OSM (e.g. Facebook / Twitter API)
⚫A programming language to write code to extract data (e.g. Python / RoR)
⚫A database to store data (e.g. MySQL / MongoDB)
⚫A visualization tool to query and analyze data (e.g. PhpMyAdmin / RoboMongo)

Programming language:
⚫A high-level programming language to issue commands and facilitate data collection
⚫Supports libraries for reading URLs, parsing data, interacting with APIs, etc.

Application Programming Interface (API)
⚫An OSM API enables developers to interact with the OSM website programmatically
⚫We use APIs to extract data from Twitter, Facebook, etc.
⚫Rate limit: how many data requests can we make in a given time window?
⚫Each OSM has its own API and API rate limits
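
Because every OSM enforces its own rate limits, collection scripts usually throttle themselves on the client side. A minimal sketch, assuming a simple fixed-window limit (3 requests per 15 minutes here, purely for illustration; real limits and window types differ per platform and endpoint), is below. A fake clock is injected so the example runs instantly.

```python
import time


class FixedWindowLimiter:
    """Client-side throttle: at most `max_requests` calls per `window_seconds`.

    The numbers are assumptions for illustration; each platform documents
    its own limits, and they can differ per endpoint."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.sleep = sleep
        self.window_start = clock()
        self.used = 0

    def acquire(self):
        """Block until a request slot is available, then use it."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # The previous window has expired: start a fresh one.
            self.window_start, self.used = now, 0
        if self.used >= self.max_requests:
            # Out of quota: wait until the current window ends.
            self.sleep(self.window_seconds - (now - self.window_start))
            self.window_start, self.used = self.clock(), 0
        self.used += 1


class FakeClock:
    """Simulated time so the example runs instantly."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


# Hypothetical limit: 3 requests per 15-minute (900 s) window.
clock = FakeClock()
limiter = FixedWindowLimiter(3, 900, clock=clock.now, sleep=clock.sleep)
for i in range(4):
    limiter.acquire()  # the 4th call has to wait for the next window
    print(f"request {i + 1} sent at t={clock.t:.0f}s")
```

In a real collector the defaults (`time.monotonic`, `time.sleep`) would be used, and the limit values would come from the platform's documentation.
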

Data format
⚫API returns data in the following two formats:
- JSON
- XML

JSON
⚫JSON - JavaScript Object Notation
⚫Data structuring notation
⚫Sample:

JSON
⚫Viewing
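
Here is a short sketch of handling a JSON response with Python's standard `json` module. Parsing turns JSON objects into dicts and arrays into lists, and `json.dumps(..., indent=2)` gives a readable view. The response below is a made-up, tweet-like example; real API payloads have many more fields.

```python
import json

# Hypothetical API response body (field names are illustrative only).
raw = (
    '{"id": 1, "text": "Stay safe #Sandy", '
    '"user": {"screen_name": "example_user", "followers_count": 42}, '
    '"entities": {"hashtags": ["Sandy"]}}'
)

tweet = json.loads(raw)               # JSON text -> Python dict
print(tweet["user"]["screen_name"])   # nested objects become nested dicts
print(tweet["entities"]["hashtags"])  # arrays become lists

# "Viewing": pretty-print with indentation and sorted keys for easy reading.
print(json.dumps(tweet, indent=2, sort_keys=True))
```
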

MySQL
⚫Relational database to store data
⚫Data is stored in rows and columns
⚫Retrieve using SQL queries
⚫Sample:
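
As a sample of storing and querying collected posts relationally, the sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL (an assumption made so it runs without a database server). The table layout and rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; with a MySQL driver the SQL is similar,
# but the parameter placeholder style may differ (e.g. %s instead of ?).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (id INTEGER PRIMARY KEY, author TEXT, text TEXT, retweets INTEGER)"
)

# Hypothetical rows: each row is one tweet, each column one attribute.
conn.executemany(
    "INSERT INTO tweets (id, author, text, retweets) VALUES (?, ?, ?, ?)",
    [
        (1, "alice", "Flooding near the coast #Sandy", 120),
        (2, "bob", "Shark swimming in the street?! #Sandy", 5000),
        (3, "alice", "Power is back in our area", 3),
    ],
)
conn.commit()

# Retrieve with an SQL query: tweets mentioning #Sandy, most retweeted first.
matches = conn.execute(
    "SELECT author, text FROM tweets WHERE text LIKE ? ORDER BY retweets DESC",
    ("%#Sandy%",),
).fetchall()
for author, text in matches:
    print(author, "|", text)
```
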

MongoDB

PhpMyAdmin
⚫Access MySQL databases and query them using a browser
⚫Use SQL to query databases

RoboMongo

RoboMongo

All content in graph form
⚫Graph API
- Interface to extract data related to user profiles, activities, photos, pages, applications, etc.

[Figure: a USER node linked to Friend nodes, to items the user Uploads, and to Likes / comments]

Why is it called the Graph API?
⚫All objects are stored as nodes of a “graph”
⚫Connections (likes, friendship etc.) are edges
⚫All nodes have a unique numeric ID
- Users
- Pages
- Posts
-…
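
Graph API requests express the node / edge idea in the URL itself: `/{node-id}` reads an object, `/{node-id}/{edge}` reads its connections, and `fields` selects attributes. The helper below only builds such URLs and sends nothing. The version string, node ID and token are placeholders, and what a token can actually read depends on the app's permissions.

```python
from urllib.parse import urlencode

GRAPH_ROOT = "https://graph.facebook.com"
API_VERSION = "v19.0"  # assumption: use whichever version your app targets


def graph_url(node_id, edge=None, fields=None, access_token="YOUR_TOKEN"):
    """Build a Graph API request URL of the form /{version}/{node-id}[/{edge}]?...

    Nodes are objects (users, pages, posts); edges are their connections."""
    path = f"{GRAPH_ROOT}/{API_VERSION}/{node_id}"
    if edge:
        path += f"/{edge}"
    params = {"access_token": access_token}
    if fields:
        params["fields"] = ",".join(fields)
    return f"{path}?{urlencode(params)}"


print(graph_url("me", fields=["id", "name"]))
print(graph_url("1234567890", edge="posts"))  # hypothetical numeric node ID
```
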

Tutorials for this week
⚫Facebook API
⚫Reddit API

Privacy and Security in Online
Social Media
Course on NPTEL
NOC21-CS28
Week 2.2

Ponnurangam Kumaraguru (“PK”)


Full Professor
ACM Distinguished Speaker
fb/ponnurangam.kumaraguru, @ponguru
Topics that we will cover
⚫ Overview of OSM
⚫ Linux / Python / Twitter API / Mongo DB / MySQL [Hands-on]
⚫ Trust & Credibility
⚫ Privacy
⚫ Social Network Analysis, NLTK [Hands-on]
⚫ e-crime
⚫ Plotly / Highcharts / Geo-location analysis [Hands-on]
⚫ Policing
⚫ Identity resolution
⚫ What next – Deep learning, machine learning, NLP, Image analysis

Temporal Patterns

Fake content / rumors become viral in the first 7-8 hours after the event.
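
One way to see such a temporal pattern is to bucket tweets by whole hours elapsed since the event. The sketch assumes a list of tweet timestamps (the values here are hypothetical) and counts how many fall into each hour.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(event_time, tweet_times):
    """Count tweets per whole hour elapsed since the event (hour 0 = first 60 minutes).
    Tweets posted before the event are ignored."""
    counts = Counter()
    for t in tweet_times:
        elapsed = (t - event_time).total_seconds()
        if elapsed >= 0:
            counts[int(elapsed // 3600)] += 1
    return dict(sorted(counts.items()))


event = datetime(2012, 10, 29, 20, 0)  # hypothetical event timestamp
fake_image_tweets = [                   # hypothetical timestamps of tweets sharing a fake image
    datetime(2012, 10, 29, 20, 15),
    datetime(2012, 10, 29, 21, 5),
    datetime(2012, 10, 29, 21, 40),
    datetime(2012, 10, 30, 3, 30),
]
print(hourly_counts(event, fake_image_tweets))  # {0: 1, 1: 2, 7: 1}
```
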

Misinformation on Social Media

Misinformation on Social Media

Misinformation Tweets
FAKE
RUMORS

Background: Hurricane Sandy
⚫Dates: Oct 22 - 31, 2012
⚫Damages worth $75 billion
⚫Coast of NE America

Fake Image Tweets

Motivation

Methodology
⚫Data Collection and Filtering
⚫Data Characterization
⚫Classification Module: Obtaining Ground Truth, Feature Generation
⚫Evaluating Results

Data Description
Total tweets 1,782,526
Total unique users 1,174,266
Tweets with URLs 622,860

Data Filtering
⚫ Used a reputable online resource to separate fake and real images
- The Guardian collected and publicly distributed a list of fake and true images shared during Hurricane Sandy

Tweets with fake images 10,350
Users with fake images 10,215
Tweets with real images 5,767
Users with real images 5,678

⚫ One of the biggest fake content propagation datasets studied by researchers

Analysis
⚫Who
⚫When
⚫Where
⚫What
⚫Why
⚫How
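
For a single post, several of these questions map onto fields of the tweet record. The sketch below assumes a tweet-like dict whose field names are loosely modelled on Twitter's v1.1 tweet JSON (`user.screen_name`, `created_at`, `coordinates`, `text`, `source`); treat both the names and the mapping as illustrative. "Why" usually needs human or model interpretation of the content, so it is left out.

```python
def tweet_ws(tweet):
    """Map a tweet-like dict to the Who / When / Where / What / How questions.
    Field names are hypothetical, loosely modelled on Twitter's v1.1 tweet JSON."""
    coords = tweet.get("coordinates") or {}
    return {
        "who": tweet["user"]["screen_name"],
        "when": tweet["created_at"],
        "where": coords.get("coordinates"),     # None if the tweet is not geo-tagged
        "what": tweet["text"],
        "how": tweet.get("source", "unknown"),  # client / device used to post
    }


sample = {  # hypothetical tweet
    "user": {"screen_name": "example_user"},
    "created_at": "Mon Apr 15 18:54:00 +0000 2013",
    "coordinates": {"type": "Point", "coordinates": [-71.08, 42.35]},  # [longitude, latitude]
    "text": "Explosion at the finish line",
    "source": "Twitter for iPhone",
}
print(tweet_ws(sample))
```
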

Network Analysis
Tweet – Retweet graph for the spread of fake images at the n-th and (n+1)-th hour
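
A tweet-retweet graph can be built as directed edges from each retweeter to the account it retweeted, then compared hour by hour. The sketch below assumes hypothetical `(retweeter, author, hour)` records and takes cumulative snapshots at hours n and n+1; in-degree shows which accounts are driving the spread.

```python
from collections import defaultdict


def build_retweet_graph(records):
    """Directed edges retweeter -> original author, from (retweeter, author, hour) records."""
    graph = defaultdict(set)
    for retweeter, author, _hour in records:
        graph[retweeter].add(author)
    return graph


def snapshot(records, hour):
    """Graph of all retweets made up to and including the given hour (cumulative)."""
    return build_retweet_graph(r for r in records if r[2] <= hour)


def in_degree(graph):
    """Number of distinct users who retweeted each author."""
    counts = defaultdict(int)
    for targets in graph.values():
        for author in targets:
            counts[author] += 1
    return dict(counts)


# Hypothetical records: (retweeter, original author, hour since event)
records = [("u1", "src", 0), ("u2", "src", 0), ("u3", "u1", 1), ("u4", "src", 1)]
n = 0
print(in_degree(snapshot(records, n)))      # {'src': 2}
print(in_degree(snapshot(records, n + 1)))  # {'src': 3, 'u1': 1}
```
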

Classification (5-fold cross validation)
User Features [F1]
- Number of Friends
- Number of Followers
- Follower-Friend Ratio
- Number of times listed
- User has a URL
- User is a verified user
- Age of user account
Tweet Features [F2]
- Length of Tweet
- Number of Words
- Contains Question Mark?
- Contains Exclamation Mark?
- Number of Question Marks
- Number of Exclamation Marks
- Contains Happy Emoticon
- Contains Sad Emoticon
- Contains First-Person Pronoun
- Contains Second-Person Pronoun
- Contains Third-Person Pronoun
- Number of uppercase characters
- Number of negative sentiment words
- Number of positive sentiment words
- Number of mentions
- Number of hashtags
- Number of URLs
- Retweet count
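
Most of the tweet-based [F2] features are simple counts over the text. The sketch below computes a subset of them in plain Python. The emoticon lists and the regular expressions for mentions, hashtags and URLs are simplifying assumptions, and the pronoun and sentiment features (which need word lists) are left out.

```python
import re

HAPPY = (":)", ":-)", ":D")  # assumption: a tiny emoticon list for illustration
SAD = (":(", ":-(")


def tweet_features(text):
    """Compute a subset of the tweet-based [F2] features from raw tweet text."""
    words = text.split()
    return {
        "length": len(text),
        "num_words": len(words),
        "has_question": "?" in text,
        "has_exclamation": "!" in text,
        "num_question": text.count("?"),
        "num_exclamation": text.count("!"),
        "has_happy_emoticon": any(e in text for e in HAPPY),
        "has_sad_emoticon": any(e in text for e in SAD),
        "num_uppercase": sum(c.isupper() for c in text),
        "num_mentions": len(re.findall(r"@\w+", text)),
        "num_hashtags": len(re.findall(r"#\w+", text)),
        "num_urls": len(re.findall(r"https?://\S+", text)),
    }


print(tweet_features("OMG!! Shark on the street?? #Sandy http://example.com @friend"))
```

Feature dictionaries like these, one per tweet, are what a classifier such as a decision tree would be trained on.
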

Classification Results (accuracy)
Classifier      F1 (user)   F2 (tweet)   F1+F2
Naïve Bayes     56.32%      91.97%       91.52%
Decision Tree   53.24%      97.65%       96.65%

• The best results were obtained with the Decision Tree classifier: 97.65% accuracy in distinguishing fake images from real ones

• Tweet-based features are very effective in distinguishing fake-image tweets from real ones, while user-based features performed very poorly.

Boston Blasts
⚫ Twin blasts occurred during the Boston Marathon
- April 15th, 2013 at 18:50 GMT
⚫ 3 people were killed and 264 were injured
⚫ First Image on Twitter (within 4 mins)

Sample Fake Tweets
> 30,000 RTs
> 50,000 RTs

Data Description
Total tweets 7,888,374
Total users 3,677,531
Tweets with URLs 3,420,228
Tweets with Geo-tag 62,629
Retweets 4,464,201
Replies 260,627
Time of the blast Mon Apr 15 18:50 2013
Time of first tweet Mon Apr 15 18:53 2013
Time of first image Mon Apr 15 18:54 2013
Time of last tweet Thu Apr 25 01:23 2013
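
The timestamps in the table give the reaction times directly. A quick check with Python's `datetime` (times as listed above, in GMT):

```python
from datetime import datetime

FMT = "%a %b %d %H:%M %Y"  # matches strings like "Mon Apr 15 18:50 2013"
blast = datetime.strptime("Mon Apr 15 18:50 2013", FMT)
first_tweet = datetime.strptime("Mon Apr 15 18:53 2013", FMT)
first_image = datetime.strptime("Mon Apr 15 18:54 2013", FMT)
last_tweet = datetime.strptime("Thu Apr 25 01:23 2013", FMT)


def minutes(delta):
    """Length of a timedelta in minutes."""
    return delta.total_seconds() / 60


print(minutes(first_tweet - blast))  # 3.0
print(minutes(first_image - blast))  # 4.0
print(last_tweet - blast)            # 9 days, 6:33:00
```

This matches the "first image on Twitter within 4 minutes" observation, and shows the collection spans a little over nine days.
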

Data Description

Geo-Located Tweets

Identifying Rumor / True tweets
⚫ Tagged the 20 most viral tweets as
- Rumor / Fake
- True
- Generic (NA)

⚫ Six Rumors
- 130,690 Tweets / Retweets (29%)
- R.I.P. to the 8 year-old boy who died in Boston’s explosions, while running for the Sandy Hook kids. #prayforboston

⚫ Seven True news
- 116,454 Tweets / Retweets (20%)
- Doctors: bombs contained pellets, shrapnel and nails that hit victims #BostonMarathon @NBC6

⚫ Seven Generic
- 206,816 Tweets / Retweets (51%)
- #PrayForBoston

Fake Content User Profiles

                        Account 1     Account 2     Account 3     Account 4
No. of Followers        10            297           249           73,657
Profile Creation Date   Mar 24 2013   Apr 15 2013   Feb 07 2013   Dec 04 2008
Total No. of Statuses   2             2             294           7,411
No. of Fake Tweets      2             2             1             1
Current Status          Suspended     Suspended     Suspended     Active

Username: BostonMarathons

Tweet Source Analysis

Suspended Accounts
⚫31,919 new Twitter accounts were created during the Boston blasts and tweeted about the event
⚫Of these, 19% (6,073 accounts) were deleted or suspended by Twitter

Fake / Malicious Accounts

Network Analysis of Fake Accounts
Closed community

Architecture

Data Statistics
Event                               Tweets     Trending Topics
UK Riots                            542,685    #ukriots, #londonriots, #prayforlondon
Libya Crisis                        389,506    libya, tripoli
Earthquake in Virginia              277,604    #earthquake, Earthquake in SF
JanLokPal Bill Agitation            182,692    Anna Hazare, #janlokpal, #anna
Apple CEO Steve Jobs resigns        158,816    Steve Jobs, Tim Cook, Apple CEO
US Downgrading                      148,047    S&P, AAA to AA
Hurricane Irene                     90,237     Hurricane Irene, Tropical Storm Irene
Google acquires Motorola Mobility   68,527     Google, Motorola Mobility
News of the World Scandal           67,602     Rupert Murdoch, #murdoch
Abercrombie & Fitch stocks drop     54,763     Abercrombie & Fitch, A&F
Muppets Bert and Ernie were gay     52,401     Bert and Ernie
Indiana State Fair Tragedy          49,924     Indiana State Fair
Mumbai Blast, 2011                  32,156     #mumbaiblast, Dadar, #needhelp
New Facebook Messenger              28,206     Facebook Messenger

Annotation
⚫ Step 1
- R1. Contains information about the event
- R2. Is related to the event, but contains no information
- R3. Not related to the event
- R4. Skip tweet

⚫ Step 2
- C1. Definitely credible
- C2. Seems credible
- C3. Definitely incredible (not credible)
- C4. Skip tweet

Annotation Results
⚫ Each tweet annotated by 3 people
⚫ Inter-annotator agreement (Cronbach Alpha) = 0.748
⚫ 30% of tweets provide information (17% credible information), and 14% were spam
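
Cronbach's alpha treats each annotator as an "item": with k annotators, alpha = k / (k - 1) * (1 - (sum of each annotator's rating variance) / (variance of the per-tweet total)). A minimal sketch, assuming the credibility labels have been mapped to numbers (the ratings below are hypothetical, not the study's data):

```python
from statistics import pvariance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a ratings table: one row per tweet, one column per annotator.
    Raises ZeroDivisionError if every tweet gets the same total score."""
    k = len(ratings[0])
    annotator_vars = [pvariance([row[j] for row in ratings]) for j in range(k)]
    total_var = pvariance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(annotator_vars) / total_var)


# Hypothetical ratings by 3 annotators for 4 tweets (1 = credible ... 3 = not credible).
ratings = [
    [1, 1, 2],
    [2, 2, 2],
    [3, 3, 2],
    [1, 2, 1],
]
print(cronbach_alpha(ratings))  # 0.75
```

Population and sample variances give the same alpha as long as the same one is used throughout, because the ratio cancels the scaling.
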

Feature Sets
Message based features
- Length of the tweet
- Number of words
- Number of unique characters
- Number of hashtags
- Number of retweets
- Number of swear language words
- Number of positive sentiment words
- Number of negative sentiment words
- Tweet is a retweet
- Number of special symbols [$, !]
- Number of emoticons [:-), :-(]
- Tweet is a reply
- Number of @-mentions
- Time lapse since the query
- Has URL
- Number of URLs
- Use of URL shortener service
Source based features
- Registration age of the user
- Number of statuses
- Number of followers
- Number of friends
- Is a verified account
- Length of description
- Length of screen name
- Has URL
- Ratio of followers to followees
Evaluation Metric
Evaluation Metric: NDCG (Normalized Discounted Cumulative Gain)

NDCG is a standard metric for evaluating rankings against “graded” relevance judgments
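
For reference, DCG sums each result's graded relevance discounted by the log of its rank, DCG = sum over ranks i of rel_i / log2(i + 1), and NDCG divides it by the DCG of the ideally ordered list, so a perfect ranking scores 1. This sketch uses the linear-gain form (some implementations use 2^rel - 1 as the gain instead); the grades are hypothetical.

```python
from math import log2


def dcg(relevances):
    """Discounted cumulative gain: graded relevance divided by log2(rank + 1)."""
    return sum(rel / log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances, k=None):
    """DCG of the given ranking (top k) divided by the DCG of the ideal ordering."""
    ideal = sorted(relevances, reverse=True)[:k]
    best = dcg(ideal)
    return dcg(relevances[:k]) / best if best > 0 else 0.0


# Hypothetical credibility grades (2 = credible, 1 = seems credible, 0 = not credible)
# for tweets in the order a ranking model returned them.
grades = [2, 0, 1, 2]
print(round(ndcg(grades), 3))
print(ndcg(sorted(grades, reverse=True)))  # 1.0 for the ideal ordering
```
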

Ranking Results
• Both tweet-based and user-based features contribute to determining credibility: it matters “what you post and who you are”

TweetCred
⚫Available as a Chrome Extension
Live Demo of TweetCred

Features for Real-time Analysis (45 features)
- Tweet meta-data: Number of seconds since the tweet; Source of tweet (mobile / web / etc.); Tweet contains geo-coordinates
- Tweet content (simple): Number of characters; Number of words; Number of URLs; Number of hashtags; Number of unique characters; Presence of stock symbol; Presence of happy smiley; Presence of sad smiley; Tweet contains 'via'; Presence of colon symbol
- Tweet content (linguistic): Presence of swear words; Presence of negative emotion words; Presence of positive emotion words; Presence of pronouns; Mention of self words in tweet (I; my; mine)
- Tweet author: Number of followers; Number of friends; Time since the user has been on Twitter; etc.
- Tweet network: Number of retweets; Number of mentions; Tweet is a reply; Tweet is a retweet
- Tweet links: WOT score for the URL; Ratio of likes / dislikes for a YouTube video

Top Ten Features
⚫ No. of characters in tweet
⚫ Unique characters in tweet
⚫ No. of words in tweet
⚫ User has location in profile
⚫ Number of retweets
⚫ Age of tweet
⚫ Tweet contains URL
⚫ Tweet contains via
⚫ Statuses / Followers
⚫ Friends / Followers

Implementation

Feedback by Users

Users of TweetCred
Sample users:
- Emergency responders
- Firefighters
- Journalists / news media
- General users

Quick summary for Week 2
⚫ Frameworks / Platforms
- APIs – Twitter, Facebook & Reddit
- Python
- MySQL / MongoDB
- PhpMyAdmin
⚫ Rate limits
⚫ JSON
⚫ Graphs
⚫ Credibility
⚫ Data collection for an event
⚫ Who, When, Where, What, Why, and How
⚫ Network analysis

Takeaways / Questions?

Privacy and Security in Online
Social Media
Course on NPTEL
NOC21-CS28
Week 3.1

Ponnurangam Kumaraguru (“PK”)


Full Professor
ACM Distinguished Speaker
fb/ponnurangam.kumaraguru, @ponguru
Frameworks / Platforms to know
⚫APIs of OSM (e.g. Facebook / Twitter API)
⚫A programming language to write code to extract data (e.g. Python / RoR)
⚫A database to store data (e.g. MySQL / MongoDB)
⚫A visualization tool to query and analyze data (e.g. PhpMyAdmin / RoboMongo)

Tutorials for this week
⚫Facebook API
⚫Reddit API

Temporal Patterns

Fake content / rumors become viral in the first 7-8 hours after the event.

Misinformation Tweets
FAKE
RUMORS

Fake Image Tweets

Analysis
⚫Who
⚫When
⚫Where
⚫What
⚫Why
⚫How

Classification
User Features [F1]
- Number of Friends
- Number of Followers
- Follower-Friend Ratio
- Number of times listed
- User has a URL
- User is a verified user
- Age of user account
Tweet Features [F2]
- Length of Tweet
- Number of Words
- Contains Question Mark?
- Contains Exclamation Mark?
- Number of Question Marks
- Number of Exclamation Marks
- Contains Happy Emoticon
- Contains Sad Emoticon
- Contains First-Person Pronoun
- Contains Second-Person Pronoun
- Contains Third-Person Pronoun
- Number of uppercase characters
- Number of negative sentiment words
- Number of positive sentiment words
- Number of mentions
- Number of hashtags
- Number of URLs
- Retweet count

Sample Fake Tweets
> 30,000 RTs
> 50,000 RTs

Data Description
Total tweets 7,888,374
Total users 3,677,531
Tweets with URLs 3,420,228
Tweets with Geo-tag 62,629
Retweets 4,464,201
Replies 260,627
Time of the blast Mon Apr 15 18:50 2013
Time of first tweet Mon Apr 15 18:53 2013
Time of first image Mon Apr 15 18:54 2013
Time of last tweet Thu Apr 25 01:23 2013

Data Description

Geo-Located Tweets

Network Analysis of Fake Accounts
Closed community

Architecture

TweetCred
⚫Available as a Chrome Extension
Facebook
⚫Features are different
⚫Different network structure
- Friendship
FBI: Methodology
⚫Ground truth extraction
⚫Facebook Graph API
⚫Generating feature vectors
⚫RESTful API
⚫Supervised learning

Web of Trust scores
A URL (e.g., http://www.domain.com) is marked Malicious if:
- Reputation: Unsatisfactory / Poor / Very poor (less than 60) AND Confidence: High (greater than 10)
- OR Category: Negative
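
Read as a rule, the slide flags a URL when its reputation is low with high confidence, or when it falls in a negative category. The sketch below encodes that reading. Joining the reputation and confidence conditions with AND is our interpretation of the slide layout, and the inputs are assumed to be WOT-style 0-100 scores.

```python
def is_malicious(reputation, confidence, negative_category=False):
    """Flag a URL following the rule on the slide (as interpreted here):
    (reputation < 60 AND confidence > 10) OR the site is in a negative category."""
    low_reputation = reputation < 60 and confidence > 10
    return low_reputation or negative_category


print(is_malicious(45, 30))        # True: poor reputation, high confidence
print(is_malicious(45, 5))         # False: poor reputation, but low confidence
print(is_malicious(90, 50, True))  # True: negative category overrides a good score
```
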

Plugin
⚫https://chrome.google.com/webstore/detail/facebook-inspector/jlhjfkmldnokgkhbhgbnmiejokohmlfc
⚫https://addons.mozilla.org/en-US/firefox/addon/fbi-facebook-inspector/

Demo

Privacy and Security in Online
Social Media
Course on NPTEL
NOC21-CS28
Week 3.2

Ponnurangam Kumaraguru (“PK”)


Full Professor
ACM Distinguished Speaker
fb/ponnurangam.kumaraguru, @ponguru
App details
⚫API Key
⚫API Secret
⚫Access token
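
These credentials identify your app, so they are better kept out of source code and shared notebooks. One common approach, sketched here, reads them from environment variables. The variable names are hypothetical and not defined by any platform.

```python
import os


def load_credentials(env=os.environ):
    """Read app credentials from environment variables instead of hard-coding them.
    The variable names below are hypothetical; choose your own."""
    names = {
        "api_key": "OSM_API_KEY",
        "api_secret": "OSM_API_SECRET",
        "access_token": "OSM_ACCESS_TOKEN",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f"Missing credentials: {', '.join(missing)}")
    return {field: env[var] for field, var in names.items()}


# Usage with an explicit mapping (in practice, set the variables in your shell).
demo_env = {"OSM_API_KEY": "key", "OSM_API_SECRET": "secret", "OSM_ACCESS_TOKEN": "token"}
print(sorted(load_credentials(demo_env)))
```
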

Tutorial: Twitter API

Topics that we will cover
⚫ Overview of OSM
⚫ Linux / Python / Twitter API / Mongo DB / MySQL [Hands-on]
⚫ Trust & Credibility
⚫ Privacy
⚫ Social Network Analysis, NLTK [Hands-on]
⚫ e-crime
⚫ Plotly / Highcharts / Geo-location analysis [Hands-on]
⚫ Policing
⚫ Identity resolution
⚫ What next – Deep learning, machine learning, NLP, Image analysis
How do we define privacy?

Westin’s 3 categories
⚫Fundamentalists, 25%
⚫Pragmatists, 60%
⚫Unconcerned, 15%

You thought that on the Internet nobody knew you were a dog…

…but then you started getting personalized ads for your favorite brand of dog food

#privacyindia12
Methodology
⚫20 interviews, 4 focus group discussions (FGDs)
⚫10,427 surveys
⚫18 months!

Sample
[Figure: distribution of survey respondents (%); individual values not recoverable from the extracted text]

Demographics (% of respondents)
Age (N = 10,350)
- <18: 1.54
- 18-24: 21.31
- 25-29: 32.20
- 30-39: 25.90
- 40-49: 14.09
- 50-64: 4.46
- 65+: 0.50
Gender (N = 10,232)
- Male: 67.57
- Female: 32.43

Internet & Social Media
What do you feel about privacy of your personal information on your OSN?
(Q42, N = 6,855; % of respondents)
- It is not a concern at all: 19.30
- Since I have specified my privacy settings, my data is secure from a privacy breach: 42.13
- Even though I have specified my privacy settings, I am concerned about privacy of my data: 23.84
- It is a concern, but I still share personal information: 8.02
- It is a concern; hence I do not share personal data on OSN: 6.71

Internet & Social Media
If you receive a friendship request on your most frequently used OSN, which of the following people will you add as friends?
(Q43, N = 6,929; % of respondents)
- Person of opposite gender: 27.39
- People from my hometown: 19.51
- Person with nice profile picture: 10.12
- Strangers (people you do not know): 4.99
- Somebody whom you do not know or recognize but have mutual / common friends with: 8.31
- Anyone: 2.99

http://precog.iiitd.edu.in/research/privacyindia/

What are the different types of privacy issues on FB?

Privacy and Security in Online
Social Media
Course on NPTEL
NOC21-CS28
Week 4.1

Ponnurangam Kumaraguru (“PK”)


Full Professor
ACM Distinguished Speaker
fb/ponnurangam.kumaraguru, @ponguru
Topics that we will cover
⚫ Overview of OSM
⚫ Linux / Python / Twitter API / Mongo DB / MySQL [Hands-on]
⚫ Trust & Credibility
⚫ Privacy
⚫ Social Network Analysis, NLTK [Hands-on]
⚫ e-crime
⚫ Plotly / Highcharts / Geo-location analysis [Hands-on]
⚫ Policing
⚫ Identity resolution
⚫ What next – Deep learning, machine learning, NLP, Image analysis
Westin’s 3 categories
⚫Fundamentalists, 25%
⚫Pragmatists, 60%
⚫Unconcerned, 15%

Internet & Social Media
What do you feel about privacy of your personal information on your OSN?
(Q42, N = 6,855; % of respondents)
- It is not a concern at all: 19.30
- Since I have specified my privacy settings, my data is secure from a privacy breach: 42.13
- Even though I have specified my privacy settings, I am concerned about privacy of my data: 23.84
- It is a concern, but I still share personal information: 8.02
- It is a concern; hence I do not share personal data on OSN: 6.71

Internet & Social Media
If you receive a friendship request on your most frequently used OSN, which of the following people will you add as friends?
(Q43, N = 6,929; % of respondents)
- Person of opposite gender: 27.39
- People from my hometown: 19.51
- Person with nice profile picture: 10.12
- Strangers (people you do not know): 4.99
- Somebody whom you do not know or recognize but have mutual / common friends with: 8.31
- Anyone: 2.99

http://precog.iiitd.edu.in/research/privacyindia/

Hard to define
“Privacy is a value so complex, so entangled
in competing and contradictory dimensions,
so engorged with various and distinct
meanings, that I sometimes despair whether
it can be usefully addressed at all.”
Robert C. Post, Three Concepts of Privacy,
89 Geo. L.J. 2087 (2001).

Control over information
“Privacy is the claim of individuals, groups or
institutions to determine for themselves when,
how, and to what extent information about them is
communicated to others.”
“…each individual is continually engaged in a
personal adjustment process in which he balances
the desire for privacy with the desire for disclosure
and communication….”
Alan Westin, Privacy and Freedom, 1967

Forms of Privacy
⚫Information
- Internet
⚫Communication
- Telephone
⚫Territorial
- Living space
⚫Bodily
- Self

Background
⚫In 2000, 100 billion photos were shot worldwide
⚫In 2010, 2.5 billion photos per month were uploaded by Facebook users alone
⚫In 2015, 1.8 billion photos were uploaded every day on Facebook, Instagram, Flickr, Snapchat, and WhatsApp
⚫Facebook, Microsoft, Google, and Apple have acquired / licensed products that do face recognition

Many things are colluding
⚫Increasing public self-disclosures through online social networks
- Photos
⚫Improving accuracy in face recognition
⚫Cloud, ubiquitous computing
⚫Re-identification techniques are getting better

Question
⚫Can one combine publicly available online social network data with off-the-shelf face recognition technology for
- Individual re-identification
- Finding potentially sensitive information

Goal is to
⚫Use un-identified sources {Match.com, photos from Flickr, CCTVs, etc.} + identified sources {Facebook, LinkedIn, Govt. websites, etc.}
⚫To get some sensitive information about the individual {sexual orientation, SSN, Aadhaar #, etc.}

Latanya Sweeney

Experiment 1
⚫Online – Online
⚫Mined publicly available images from FB to re-identify profiles on one of the most popular dating sites in the US
⚫Used http://www.pittpatt.com/ (PittPatt) for face recognition
- PittPatt was acquired by Google
- Face detection
- Face recognition
⚫Today, one could use TensorFlow instead

Experiment 1: Data
⚫Identified
⚫Downloaded FB profiles from one city in the USA
⚫Profiles: 277,978
⚫Images: 274,540
⚫Faces detected: 110,984

Experiment 1: Data
⚫Un-Identified
⚫Downloaded profiles from one of the popular dating websites
⚫Users go by pseudonyms to protect their identities
⚫But their photos can be used to identify them
⚫The same city was used for the search
⚫Profiles: 5,818
⚫Faces detected: 4,959

Experiment 1: Approach
⚫Unidentified {Dating site photos} + Identified {FB photos} → Re-identified individual
⚫More than 500 million pairs compared
⚫Used only the best matching pair for each dating site picture
⚫PittPatt produces a score from -1.5 to 20
⚫Crowdsourced to Amazon Mechanical Turk workers (MTurkers) to validate PittPatt matches
⚫Likert scale, 1 – 5
⚫At least 5 Turkers for each pair
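
The matching step (keep only the best candidate per dating-site photo, then have several crowd workers rate it) can be sketched as follows. The scores, IDs, and the 4.0 cut-off on the averaged 1-5 Likert ratings are hypothetical; the study's actual thresholds are not given on the slide.

```python
from statistics import mean


def best_matches(scores):
    """For each dating-site face, keep only the highest-scoring Facebook candidate.
    `scores` maps probe_id -> {candidate_id: similarity score}."""
    return {
        probe: max(candidates.items(), key=lambda kv: kv[1])
        for probe, candidates in scores.items()
        if candidates
    }


def crowd_verdict(likert_ratings, min_raters=5, threshold=4.0):
    """Average 1-5 Likert ratings from crowd workers; None until enough raters.
    The threshold for calling a pair a likely match is an assumption."""
    if len(likert_ratings) < min_raters:
        return None  # not enough independent judgments yet
    return mean(likert_ratings) >= threshold


scores = {  # hypothetical recognizer scores (on the slide, PittPatt scores range from -1.5 to 20)
    "dating_1": {"fb_17": 12.4, "fb_42": 3.1},
    "dating_2": {"fb_08": -0.7, "fb_99": 1.2},
}
print(best_matches(scores))  # {'dating_1': ('fb_17', 12.4), 'dating_2': ('fb_99', 1.2)}
print(crowd_verdict([5, 4, 4, 5, 3]))  # True (mean 4.2)
```
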
Experiment 1: Results
⚫Highly likely matches: 6.3%
⚫Highly likely + Likely matches: 10.5%
⚫About 1 in 10 profiles from the dating site can be identified

Reactions?
⚫What could you do better if you were the attacker?

Experiment 2
⚫Offline to online
⚫Pictures from the FB college network used to identify students strolling on campus

Experiment 2: Data
⚫Webcam used to take 3 pictures per participant
⚫Collected over 2 days
⚫Facebook data for the university
- Profiles: 25,051
- Images: 26,262
- Faces detected: 114,745

Experiment 2: Process
⚫Pictures taken of individuals walking on campus
⚫Participants asked to fill out an online survey
⚫Pictures matched in the cloud while participants were filling out the survey
⚫The last page of the survey showed candidate pictures produced by the recognizer
⚫Participants asked to select the pictures that matched them most closely

Experiment 2: Process

Experiment 2: Process

Experiment 2: Results
⚫98 participants
- All were students and had FB accounts
⚫38.18% of participants were matched with the correct FB profile
- Including a participant who mentioned that he did not have a picture on FB
- Average computation time: less than 3 seconds

Experiment 3
⚫Predicted SSN from public data
⚫Faces / FB data + Public data → SSN
⚫27% of subjects’ first 5 SSN digits identified with four attempts, starting from their faces
⚫Predicted sensitive information like SSN

What can you think of doing in India?
⚫Aadhaar number?
⚫Other details?

References
⚫https://www.blackhat.com/docs/webcast/acquisti-face-BH-Webinar-2012-out.pdf
⚫http://www.heinz.cmu.edu/~acquisti/papers/privacy-facebook-gross-acquisti.pdf

Privacy and Security in Online
Social Media
Course on NPTEL
NOC21-CS28
Week 5.1

Ponnurangam Kumaraguru (“PK”)


Full Professor
ACM Distinguished Speaker
fb/ponnurangam.kumaraguru, @ponguru
Topics that we will cover
⚫ Overview of OSM
⚫ Linux / Python / Twitter API / Mongo DB / MySQL [Hands-on]
⚫ Trust & Credibility
⚫ Privacy
⚫ Social Network Analysis, NLTK [Hands-on]
⚫ e-crime
⚫ Plotly / Highcharts / Geo-location analysis [Hands-on]
⚫ Policing
⚫ Identity resolution
⚫ What next – Deep learning, machine learning, NLP, Image analysis

Experiments
⚫Online – Online
⚫Offline – Online
⚫Predicted SSN from public data

Interested?
⚫WWW http://www.www2017.com.au/
⚫ICWSM http://icwsm.org/2017/
⚫COSN http://cosn.acm.org/
⚫CSCW https://cscw.acm.org/2017/
⚫…

Topics that we will cover
⚫ Overview of OSM
⚫ Linux / Python / Twitter API / Mongo DB / MySQL [Hands-on]
⚫ Trust & Credibility
⚫ Privacy
⚫ Policing
⚫ Social Network Analysis, NLTK [Hands-on]
⚫ e-crime
⚫ Plotly / Highcharts / Geo-location analysis [Hands-on]
⚫ Identity resolution
⚫ What next – Deep learning, machine learning, NLP, Image analysis
How do you use OSM?
⚫How many friends and followers do you have on Facebook and Twitter?
⚫How many of you are friends with the police on your social network?
⚫How often do you use social networks to
- Post comments
- Interact with police

Ever wondered what makes the police use social media?

The Power: Social Media

Do you remember this picture?

#myNYPD

#myNYPD to #myLAPD

Multiple Police Dept. on OSN

BLR City Police on OSN
A typical post looks like this:
Keep citizens informed

Popular departments
Police Departments            Likes      Followers   Post   Joined
USA
New York                      383,372    147,000     No     2012
Boston                        137,403    312,000     No     2010
Baltimore                     36,530     70,400      Yes    2012
Metropolitan, Columbia        16,071     56,900      Yes    2008
Seattle                       12,912     103,000     No     2010
UK
Greater Manchester*           98,193     205,000     Yes    2011
West Midlands*                86,904     115,000     No     2008
Essex*                        66,461     85,300      No     2011
London*                       46,889     267,000     No     2011
Northern Ireland*             26,173     71,300      No     2009

Popular departments - India
Police Departments            Likes      Followers   Post   Joined
Bangalore Traffic             2,49,968   8,045       Yes    2012
Delhi Traffic                 2,02,858   2,59,000    Yes    2011
Hyderabad Traffic             1,88,480   1,361       Yes    2012
Bangalore City                1,05,463   12,100      Yes    2011
Kolkata Traffic               63,789     -           Yes    2010
Chennai                       50,979     1,108       Yes    2013
Gurgaon                       43,901     718         Yes    2013
Gurgaon Traffic               24,475     -           Yes    2010
Hyderabad                     13,602     537         Yes    2014
UP Police PR                  8,486      4,585       Yes    2013
Guwahati Police               3,255      295         Yes    2011

Bangalore Police
https://www.facebook.com/blrcitypolice

Delhi Police
https://www.facebook.com/pages/Delhi-Traffic-Police/117817371573308

Delhi fake account?
https://www.facebook.com/delhitraffic.polize?fref=ts

UP Police

Hyderabad Police

Hyderabad

Gurgaon Police

Guwahati Police

Kolkata Police
https://twitter.com/KolkataPolice

Pune Police

Jaipur Police

Like it or not, you are there!

Multiple accounts

Accounts in the country!
http://precog.iiitd.edu.in/research/osm-policing-india/handles.html

Analyzing Accounts
http://precog.iiitd.edu.in/research/osm-policing-india/analyse/
Thank you
[email protected]
precog.iiitd.edu.in
fb/ponnurangam.kumaraguru

Privacy and Security in Online Social Media
Course on NPTEL
NOC21-CS28
Week 5.2

Ponnurangam Kumaraguru (“PK”)
Full Professor
ACM Distinguished Speaker
fb/ponnurangam.kumaraguru, @ponguru
Objective of Study
To examine whether OSNs can help police obtain actionable information about crime and residents’ opinions about policing activities in urban cities of India.

Methodology: Data Collection
⚫ Collected public posts, 21 July – 21 Aug 2014
⚫ Filtered posts and comments: 1,600 comments on 255 posts
- Filtered = posts from citizens to police

Methodology: Data Coding
⚫ Thematic inductive analysis
- Content: 24 categories (public), e.g.
• Missing: complaints about missing people
• Query: asking how to get police assistance
• Traffic: reporting traffic menace on roads
- Style: 2 categories (both)
• Formal
• Informal
- Type: 4 categories (police)
• Acknowledge to: like or say thanks
• Reply to: suggest a solution
• Follow-up by: ask for further details
• Ignored by: no reply
⚫ Lexical analysis using word trees

Results
⚫Know persistent, direct, and apparent concerns
[Figure: Community reaction; % of posts vs. no. of comments and likes]

Actionable Information
⚫Spatial data
[Figure: unique places; intensity of complaints]
Actionable Information
⚫Temporal data

Saturday evening.
“Time – between 5.30 pm and 6pm. Location: The circle
between Freedom Park and the route that goes into
Cubbon Park, towards Century Club. Not a single police
posted here. I was waiting for an auto at the circle and
these two guys rode by asking if they could drop me. . . .
Please ensure there are police put here for safety …”

Communication Style
Formal: 68
“Dear Sir, Request to take action on Railway Station parking contractors they are not issuing parking slips . . . today @Yashwantpur Railway Station Tumkur Roadside Entrance Parking.”
Informal: 187
“Kudos to the Banasawadi Traffic Police Team. My Salute and sincere thanks to the Banasawadi Traffic Inspector XXX. . . . An ANGEL in disguise”
Police: always formal (instrumental approach)
“Dear XXX, Please provide the exact landmark..Thank you.”

Facebook can become an efficient instrument that helps police always stay visible and connected with residents.

Response Time
Time taken to respond:
Average time   30.53 hours
Maximum time   211.16 hours
Minimum time   4 minutes
Std. dev       41.26 hours
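
These are ordinary summary statistics over per-post reply delays. A minimal sketch in Python, assuming each delay is measured from the citizen's post to the police reply (the slide does not say exactly how delays were measured). The timestamps are hypothetical, not the study's data, and `stdev` is the sample standard deviation, since the slide does not say which form was used.

```python
from datetime import datetime
from statistics import mean, stdev


def delay_hours(posted: datetime, replied: datetime) -> float:
    """Hours elapsed between a citizen's post and the police reply."""
    return (replied - posted).total_seconds() / 3600


def summarize_delays(delays):
    """Average, maximum, minimum and sample standard deviation, in hours."""
    return {
        "average": mean(delays),
        "maximum": max(delays),
        "minimum": min(delays),
        "std_dev": stdev(delays),  # needs at least two delays
    }


# Hypothetical (posted, replied) timestamp pairs.
pairs = [
    (datetime(2014, 7, 21, 9, 0), datetime(2014, 7, 21, 9, 4)),    # 4 minutes
    (datetime(2014, 7, 22, 18, 0), datetime(2014, 7, 23, 12, 0)),  # 18 hours
    (datetime(2014, 7, 25, 10, 0), datetime(2014, 7, 28, 10, 0)),  # 72 hours
]
summary = summarize_delays([delay_hours(p, r) for p, r in pairs])
for name, value in summary.items():
    print(f"{name}: {value:.2f} hours")
```
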

Engagement Type (172 of 255 posts received a police response)
⚫Acknowledged (21.3%): “Dear XXX, We will take all possible legal measures in this regard. Thank you.”
⚫Reply:
- 22%: “Dear XXX, Please lodge a complaint at your nearest police station with the details and contact police inspector in this regard. They will assist you further in this. Thank you.”
- 44.3%: “Dear XXX, This post has been forwarded to appropriate Police Station for taking necessary action on this. Thank you.”
⚫Follow Up (10%): “Dear XXX, Please provide the police station details. Thank you.”
⚫Ignored: 255 – 172 = 83 posts received no reply; e.g., wishing to share something good about a task undertaken by Police in Kerala... http://www.youtube.com/watch?v
Understanding Victimization
Word #
Fear 7
Worried 6
Concerned 8
Notice of 13
Issue 22
Trouble 4

Direct v/s Indirect
Direct: “My Vehicle KA-02-HW-3183 white color Honda Dio was stolen from Kadamba Hotel (Near Modi Hospital), RajajiNagar on Friday (25th July) evening between 6:30-7:45PM. Please help in tracing my vehicle.”

Indirect: “Dear BCP, though I stay at JP Nagar, but being part of KSFC Layout RWA (Banaswadi Police station) , I got to know that there are frequent problem at KSFC Layout near BBMP Hall . . . 8-10 youths were creating a public nuisance (shouting, boozing, manhandling, etc.) yesterday i.e. on 27/7/14, Sunday night 9 pm. We need your support to help avoid a molestation scenario at KSFC Layout. Thank you”

Accountability
Mutual Accountability

Accountability
⚫Police respond and allow themselves to be held accountable
- Maintain a formal communication style even with frustrated people

Accountability
⚫Citizens accept that they are also accountable for making the city safe

Accountability
⚫Citizens hold police accountable
[Figure: Word Tree visualizations of posts in which residents questioned police using the word “why”.]
Understanding Needs / Wants
⚫Police can counter fear and anxiety if they know:
- Residents’ expectations, such as needs and wants
- Expectations, if met, can increase the feeling of safety

Understanding Needs

Understanding Wants

Discussion: The Way Forward
⚫Observable OSN for actionable information and collective action
- Tools to mark posts as follow-up, reply, etc. do not exist
⚫OSN for mutual accountability
- Much needed for overworked departments
⚫Understanding fear and victimization effects


Privacy and Security in Online Social Media
Course on NPTEL
NOC21-CS28
Week 5.3

Ponnurangam Kumaraguru (“PK”)
Full Professor
ACM Distinguished Speaker
fb/ponnurangam.kumaraguru, @ponguru
Measuring Human Behavior
- Exploring the feasibility of social media in quantifying attributes of communication
- Identifying behavioral attributes like affective expression, engagement, and social and cognitive response processes

Citizen to Citizen    Police to Citizen
Police to Police      Citizen to Police

Research Questions
⚫ RQ 1: Topical Characteristics
- Nature of content and topics that characterize social
media discussion threads
⚫ RQ 2: Engagement Characteristics
- How do citizens and police engage in social media
discussion threads?
⚫ RQ 3: Emotional Exchanges
- Nature of emotions and affective expression that
manifest on social media
⚫ RQ 4: Cognitive and Social Orientation
- What are the linguistic attributes that characterize
cognitive and social response processes?

Methodology
⚫ 85 public and official police departments
⚫ Average age: 3 years (from 2010 – April 2015)
⚫ 47,474 wall posts and 85,408 status updates

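Data Categorization
                  Total    Posts w/ ≥ 1 comment   P&C             C
Status updates    85,408   46,845                 PP&C: 5,519     PC: 41,326
Wall posts        47,474   24,984                 CP&C: 17,196    CC: 7,788
(P&C = commented on by police and citizens; C = commented on by citizens only)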
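
The four labels combine who started the thread (police status update or citizen wall post) with who commented on it. A minimal sketch of that labelling rule, assuming each thread is represented by its initiator role and the roles of its commenters (a hypothetical representation, not the study's data format), and assuming a single police comment is enough to place a thread in the P&C group:

```python
from collections import Counter
from typing import Iterable, Optional


def categorize_thread(initiator: str, commenter_roles: Iterable[str]) -> Optional[str]:
    """Label a thread PP&C, PC, CP&C or CC.

    initiator: "police" (status update) or "citizen" (wall post).
    commenter_roles: role of each commenter ("police" or "citizen").
    Returns None for threads with no comments.
    """
    if initiator not in ("police", "citizen"):
        raise ValueError(f"unknown initiator: {initiator!r}")
    roles = set(commenter_roles)
    if not roles:
        return None  # not counted among posts with >= 1 comment
    prefix = "P" if initiator == "police" else "C"
    # Assumption: any police comment puts the thread in the "P&C" group.
    suffix = "P&C" if "police" in roles else "C"
    return prefix + suffix


# Hypothetical threads: (initiator, commenter roles).
threads = [
    ("police", ["citizen", "citizen"]),
    ("police", ["citizen", "police"]),
    ("citizen", ["citizen"]),
    ("citizen", []),
]
labels = (categorize_thread(i, r) for i, r in threads)
print(Counter(label for label in labels if label))
```
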

Measures of Behavior
⚫ Topics
- N-gram analysis
- K-means clusters
⚫ Engagement
- No. of police and citizens who comment on posts
- Distinct citizens who comment on posts
- Average no. of likes and comments
⚫ Emotional (LIWC and ANEW dictionaries)
- Valence
- Arousal
⚫ Social and cognitive (LIWC dictionary)
- Interpersonal focus
- Social orientation
- Cognition
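
LIWC-style measures are word-category proportions: the share of a text's words that fall into a category such as anxiety or first-person singular ("i"). ANEW-style arousal averages per-word ratings. A minimal sketch, assuming simple regex tokenization. The word lists and ratings below are made up for illustration, because the real LIWC and ANEW dictionaries are licensed resources with far larger vocabularies.

```python
import re
from statistics import mean

# Made-up mini lexicons for illustration; NOT actual LIWC or ANEW entries.
CATEGORY_WORDS = {
    "anxiety": {"worried", "fear", "nervous", "afraid"},
    "anger": {"angry", "hate", "annoyed"},
    "i": {"i", "me", "my", "mine"},
}
AROUSAL_RATINGS = {"worried": 5.5, "fear": 6.9, "angry": 7.2, "safe": 3.9}  # 1-9 scale


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def category_proportions(text):
    """Share of tokens that belong to each category."""
    tokens = tokenize(text)
    if not tokens:
        return {category: 0.0 for category in CATEGORY_WORDS}
    return {category: sum(t in words for t in tokens) / len(tokens)
            for category, words in CATEGORY_WORDS.items()}


def mean_arousal(text):
    """Average rating of the rated words in the text, or None if none are rated."""
    ratings = [AROUSAL_RATINGS[t] for t in tokenize(text) if t in AROUSAL_RATINGS]
    return mean(ratings) if ratings else None


post = "I am just worried the traffic police make things worse"
print(category_proportions(post), mean_arousal(post))
```

Averaging these per-post scores over all comments in a thread type would give table entries like the Affect, Anxiety and Arousal averages reported in this lecture.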

Topic Characteristics
⚫ Focus on advisories and the status of different cases being investigated

Unigram      Freq.    Unigram      Freq.
rules        0.015    safety       0.012
safety       0.014    following    0.011
violations   0.014    notice       0.010
challans     0.011    prosecuted   0.009
please       0.011    movement     0.008
citizens     0.01     complaint    0.008

(U = 700, p < .05, z = −3.57)
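
The Freq. column reads like a relative frequency. A minimal n-gram sketch, assuming Freq. is an n-gram's count divided by the total number of n-grams in that group of posts, after lower-casing and optional stopword removal (the slides do not state the exact preprocessing). The posts and stopword list are hypothetical.

```python
import re
from collections import Counter


def ngram_frequencies(posts, n=1, stopwords=frozenset(), top_n=10):
    """Top n-grams with relative frequency (count / total n-grams)."""
    grams = []
    for post in posts:
        tokens = [t for t in re.findall(r"[a-z]+", post.lower()) if t not in stopwords]
        # Consecutive tokens form an n-gram; posts shorter than n contribute none.
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    if not grams:
        return []
    counts = Counter(grams)
    return [(g, round(c / len(grams), 3)) for g, c in counts.most_common(top_n)]


posts = [
    "Please follow traffic rules for your safety",
    "Violations of traffic rules will be prosecuted",
]
print(ngram_frequencies(posts, stopwords={"for", "your", "of", "will", "be"}))
```
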

Topic Characteristics
⚫ Most posts tend to request police to take action on their complaints

Unigram   Freq.    Unigram   Freq.
please    0.026    people    0.022
take      0.021    please    0.02
action    0.019    one       0.019
people    0.019    take      0.016
one       0.019    action    0.015
time      0.017    time      0.015
near      0.017    number    0.013

Higher reference to “people”

Clusters of Topics
⚫ Police-initiated discussions are more focused than citizen-initiated ones.
- Awareness drive / safety campaigns: “Road sense is the offspring of courtesy and the parent of safety”
- Prosecuted / action taken reports: “Action taken by [Withheld], Reg your tweet petition, @[withheld]; 33 parking tag & 6 no parking, 1 foot path parking. Cases booked on hospital road”
- Advisories on situations: “Good -- Morning to all the Commuters of Shillong City, there is heavy movement over NH - 40 – 44 and Madanrting down side, Lumdiengjri area stretch. Please do not overtake”

Clusters of Topics
⚫ Police-initiated discussions are more focused than citizen-initiated ones.
- Appreciation: “Heartiest congratulations to [withheld] police for nabbing [withheld] agent within 24hrs. wow!!! Kudos and respect”
- Newspaper articles: “Please ACT: http://timesofindia.indiatimes.com/videos/news/…”
- Citizen tips and complaints: “4th Nov 2014 [withheld]: Driving in wrong side at Teghoria U Turn”
- Neighbourhood problems: “Learn from the Delhi incident and ensure that no buses in Kolkata have tinted glasses. One such bus was spotted on Gariahat road Regn. #. [Withheld]. Kindly take appropriate action. Thank you”
- Missing people: “Sir plz help find my nephew, he is missing since today morning, he is from kodagu, contact [withheld]”

Research Questions
⚫ RQ 1: Topical Characteristics
- Nature of content and topics that characterize social
media discussion threads
⚫ RQ 2: Engagement Characteristics
- How do citizens and police engage in social media
discussion threads?
⚫ RQ 3: Emotional Exchanges
- Nature of emotions and affective expression that
manifest on social media
⚫ RQ 4: Cognitive and Social Orientation
- What are the linguistic attributes that characterize
cognitive and social response processes?

Engagement / Comments Characteristics
⚫Content Generators
Police Citizen

Police + Citizens 55,028 1,79,176 17,124 12,630 26% lower

Citizens Only 54,982 1,79,176 17,081 12,630

Engagement / Comments Characteristics
          Comments*                Likes**
          Avg.     Std. dev        Avg.     Std. dev
CP&C      3.34     19.19           9.4      253.85
CC        3.69     13.79           13.38    201.57
          (9.49% lower in CP&C)    (29.75% lower in CP&C)
Citizen post: “My family and I are getting the unwanted calls from the given number [withheld]. Especially he is misbehaving with a female member. My Number is - [withheld]”
Police reply: “Dear [withheld], Please visit at your nearest Police Station and lodge a complaint with details and they will assist you in this regard... Thankyou”
When police suggest an appropriate action, the discussion tends to close early, resulting in lower interaction.
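
The percentages on these slides are relative differences between the two citizen-initiated thread types. A minimal sketch that recomputes them from the table averages. `relative_difference` is an illustrative helper, and which group serves as the baseline is inferred from the numbers rather than stated on the slides.

```python
def relative_difference(value: float, baseline: float) -> float:
    """Percent by which `value` exceeds (+) or falls short of (-) `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100


# (label, CP&C average, CC average) taken from the tables in this lecture.
rows = [
    ("comments", 3.34, 3.69),
    ("likes", 9.4, 13.38),
    ("negative affect", 0.021, 0.018),
    ("arousal", 4.4, 3.9),
]
for label, cpc, cc in rows:
    print(f"{label}: CP&C vs CC {relative_difference(cpc, cc):+.2f}%")
```

With CC as the baseline this prints -9.49%, -29.75%, +16.67% and +12.82%, matching the slides. The anxiety row works the other way round: `relative_difference(0.003, 0.001)` is about 200, i.e. 200% higher in CC.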
Research Questions
⚫ RQ 1: Topical Characteristics
- Nature of content and topics that characterize social
media discussion threads
⚫ RQ 2: Engagement Characteristics
- How do citizens and police engage in social media
discussion threads?
⚫ RQ 3: Emotional Exchanges
- Nature of emotions and affective expression that
manifest on social media
⚫ RQ 4: Cognitive and Social Orientation
- What are the linguistic attributes that characterize
cognitive and social response processes?

Emotional Expressions
⚫ Negative sentiment higher in citizen-initiated threads
                   CP&C               CC
                   Avg     Std. dev   Avg     Std. dev
Negative Affect    0.021   0.03       0.018   0.04      (16.67% higher in CP&C)
Anxiety            0.001   0.01       0.003   0.02
Anger              0.006   0.02       0.005   0.02
Arousal            4.4     1.74       3.9     2.16

Emotional Expressions
⚫ Negative sentiment higher in citizen-initiated threads
           CP&C               CC
           Avg     Std. dev   Avg     Std. dev
NA**       0.021   0.03       0.018   0.04
Anx**      0.001   0.01       0.003   0.02      (200% higher in CC)
Anger**    0.006   0.02       0.005   0.02
Arousal**  4.4     1.74       3.9     2.16
“I am just worried if Hyderabad Traffic Police [HTP] makes things worse like always and create more chaos. Frankly speaking... it's the lower income group or the people who are not aware using high beams. Try to educate people on road.”

Emotional Expressions
⚫ Negative sentiment higher in citizen-initiated threads
           CP&C               CC
           Avg     Std. dev   Avg     Std. dev
NA**       0.021   0.03       0.018   0.04
Anx**      0.001   0.01       0.003   0.02
Anger**    0.006   0.02       0.005   0.02
Arousal**  4.4     1.74       3.9     2.16      (12.82% higher in CP&C)

Higher arousal and negative affect may be markers of sensitisation because of crime!

Research Questions
⚫ RQ 1: Topical Characteristics
- Nature of content and topics that characterize social
media discussion threads
⚫ RQ 2: Engagement Characteristics
- How do citizens and police engage in social media
discussion threads?
⚫ RQ 3: Emotional Exchanges
- Nature of emotions and affective expression that
manifest on social media
⚫ RQ 4: Cognitive and Social Orientation
- What are the linguistic attributes that characterize
cognitive and social response processes?

Social and Cognitive Orientation
⚫ Discussion threads involving just the citizens are highly self-attention focused
          CP&C               CC
          Avg     Std. dev   Avg     Std. dev
ppron     0.062   0.059      0.045   0.056
i         0.008   0.017      0.014   0.033     (75% more in CC)
shehe     0.002   0.01       0.003   0.003
they      0.005   0.013      0.008   0.008
Citizens likely mostly express their own concerns that they face with others.

“I have lived in the UK and all the time I have never heard anyone honking. Honking is not required if you know how to drive [...] Can anyone advise me where to complain if I see anyone who don't comply ?”

Why does it matter?
⚫ Helps police improve policing and community sensing
- Facebook can be used to record and sense behavioural attributes such as engagement, emotions, and social support
⚫ Enables police and the citizen community to enhance emotional support to residents experiencing safety issues
- Discussion threads with police and citizen commentary showed reduced levels of anxiety, suggesting that police interactions can be calming to citizens.

Technological Implications
⚫Helping communities make consensus-based decisions regarding the support and actions they seek from police
⚫Helping gauge changing emotions and behaviour among citizens
- Timely and early predictive analytical systems
⚫Sensing and recording the reactions of citizens and sharing these records with decision makers
- Take timely measures and gain better insights


Privacy and Security in Online Social Media
Course on NPTEL
NOC21-CS28
Week 6.1

Ponnurangam Kumaraguru (“PK”)
Full Professor
ACM Distinguished Speaker
fb/ponnurangam.kumaraguru, @ponguru
Topics that we will cover
⚫ Overview of OSM
⚫ Linux / Python / Twitter API / MongoDB / MySQL [Hands-on]
⚫ Trust & Credibility
⚫ Privacy
⚫ Policing
⚫ Social Network Analysis, NLTK [Hands-on]
⚫ e-crime
⚫ Plotly / Highcharts / Geo-location analysis [Hands-on]
⚫ Identity resolution
⚫ What next – Deep learning, machine learning, NLP, image analysis
Multiple Police Dept. on OSN

Objective of Study
To examine whether OSNs can help police obtain actionable information about crime and residents’ opinions about policing activities in urban cities of India.

Understanding Needs

Understanding Wants

Research Questions
⚫ RQ 1: Topical Characteristics
- Nature of content and topics that characterize social
media discussion threads
⚫ RQ 2: Engagement Characteristics
- How do citizens and police engage in social media
discussion threads?
⚫ RQ 3: Emotional Exchanges
- Nature of emotions and affective expression that
manifest on social media
⚫ RQ 4: Cognitive and Social Orientation
- What are the linguistic attributes that characterize
cognitive and social response processes?

Methodology
⚫ 85 public and official police departments
⚫ Average age: 3 years (from 2010 – April 2015)
⚫ 47,474 wall posts and 85,408 status updates

Technological Implications
⚫Helping communities make consensus-based decisions regarding the support and actions they seek from police
⚫Helping gauge changing emotions and behaviour among citizens
- Timely and early predictive analytical systems
⚫Sensing and recording the reactions of citizens and sharing these records with decision makers
- Take timely measures and gain better insights

Topics that we will cover
⚫ Overview of OSM
⚫ Linux / Python / Twitter API / MongoDB / MySQL [Hands-on]
⚫ Trust & Credibility
⚫ Privacy
⚫ Policing
⚫ Social Network Analysis, NLTK [Hands-on]
⚫ e-crime
⚫ Plotly / Highcharts / Geo-location analysis [Hands-on]
⚫ Identity resolution
⚫ What next – Deep learning, machine learning, NLP, image analysis
Different crimes on OSM
⚫Phishing
⚫Fake X
⚫Social reputation: Fake followers / Crowdturfing
⚫Clickbaiting
⚫Account compromise
⚫Account impersonation
⚫Work from home scam
⚫…

Phishing
⚫Act of tricking someone into handing over her login credentials in order to exploit personal information

Phishing
⚫Facebook Technical Support sent you a notification
⚫Facebook new login system
⚫…
⚫Facebook credentials are becoming important now!

Fake customer service accounts

Fake comments on popular posts

Fake live streaming videos

Fake online discounts

Fake online surveys and contests

Foursquare Spam: Fake Tip
Advertising / Marketing
Scam / Phishing
Social reputation

Social reputation manipulation

Clickbaiting

# hijacking

Compromised account

Impersonation

Work from home scam


Privacy and Security in Online Social Media
Course on NPTEL
NOC21-CS28
Week 6.2

Ponnurangam Kumaraguru (“PK”)
Full Professor
ACM Distinguished Speaker
fb/ponnurangam.kumaraguru, @ponguru
Link farming
⚫Search engines rank websites / webpages based on graph metrics such as PageRank
- High in-degree helps to get a high PageRank
⚫Link farming in the Web
- Websites exchange reciprocal links with other sites to improve their ranking by search engines
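
Why in-degree matters can be seen with a small PageRank computation. A minimal power-iteration sketch in plain Python, assuming the common textbook damping factor of 0.85 and spreading the rank of nodes without out-links evenly over all nodes. It illustrates the idea and is not the ranking any particular search engine or Twitter uses; the toy graph is hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank for a directed graph given as (src, dst) pairs."""
    nodes = {node for edge in edges for node in edge}
    out_links = {node: [] for node in nodes}
    for src, dst in set(edges):
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for src, targets in out_links.items():
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


# Three hypothetical accounts link to "target"; target links back to "a".
edges = [("a", "target"), ("b", "target"), ("c", "target"), ("target", "a")]
for node, score in sorted(pagerank(edges).items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.3f}")
```

`target`, which receives three in-links, ends up with the highest score; that is the effect a link farm tries to manufacture.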

Link farming
⚫A link farm is a form of spamming the index of a search engine (sometimes called spamdexing or spamexing).

Why link farming in Twitter?
⚫ Twitter has become a Web within the Web
- Vast amounts of information and real-time news
- Twitter search is becoming more and more common
- Search engines rank users by follower rank and PageRank to decide whose tweets to return as search results
- High indegree (#followers) is seen as a metric of influence
- Klout score is influenced by Twitter indegree
⚫ Link farming in Twitter
- Spammers follow other users and attempt to get them to follow back (reciprocity)
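
Reciprocity can be measured directly on the follow graph. A minimal sketch, assuming the graph is a list of (follower, followee) pairs and that the set of spam accounts is already known (in the study, suspended accounts that posted blacklisted URLs). The account names and the helper `reciprocation_rate` are hypothetical.

```python
def reciprocation_rate(follow_edges, source_accounts):
    """Fraction of follow links created by `source_accounts` that were
    reciprocated, i.e. the followed user followed back."""
    edges = set(follow_edges)  # (follower, followee) pairs
    outgoing = [(src, dst) for src, dst in edges if src in source_accounts]
    if not outgoing:
        return 0.0
    followed_back = sum((dst, src) in edges for src, dst in outgoing)
    return followed_back / len(outgoing)


follows = [
    ("spam1", "alice"), ("alice", "spam1"),  # alice followed back
    ("spam1", "bob"),                        # bob did not
    ("spam2", "carol"), ("carol", "spam2"),  # carol followed back
]
print(f"{reciprocation_rate(follows, {'spam1', 'spam2'}):.2f}")
```
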
Is link farming in the Web & Twitter similar?
⚫Motivation is similar
- Higher indegree gives better ranks in search results
⚫Who engages in link farming?
- Web – spammers
- Twitter – spammers + many legitimate, popular users !!!
⚫Additional factors in Twitter
- ‘Following back’ is considered social etiquette

Is link farming in Twitter spam at all?
⚫Your reactions?

Spam in Twitter
⚫ “five spam campaigns controlling 145 thousand accounts combined are able to persist for months at a time, with each campaign enacting a unique spamming strategy.”

Spam in Twitter
⚫ “We find that 8% of 25 million URLs posted to the site point to phishing, malware, and scams listed on popular blacklists.”
⚫ “We find that Twitter is a highly successful platform for coercing users to visit spam pages, with a clickthrough rate of 0.13%, compared to much lower rates previously reported for email spam”

Spam in Twitter
⚫ “finding that 16% of active accounts exhibit a high degree of automation.”
⚫ “find that 11% of accounts that appear to publish exclusively through the browser are in fact automated accounts that spoof the source of the updates.”

Dataset
⚫Complete snapshot of Twitter, 2009
⚫54 million users, 1.9 billion links! Largest dataset!

Nodes

Spammers
⚫379,340 accounts were suspended between Aug 2009 and Feb 2011
- Due to spam activity or long inactivity
⚫41,352 suspended accounts posted at least one blacklisted URL shortened by bit.ly or TinyURL

Spammers
⚫Number of spam-targets, spam-followers, and their overlap
⚫82% of spam-followers are also spam-targets

Spammers
⚫Number of spammers who rank within the top K according to PageRank
⚫7 spammers rank within the top 10,000, 304 within the top 100,000, and 2,131 within the top 1 million
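
Counting spammers within the top K is a rank-then-threshold step. A minimal sketch, assuming every account already has a score such as PageRank and the spammer set is known. `spammers_in_top_k` and the toy scores are hypothetical; the real analysis used cutoffs of 10,000, 100,000 and 1 million.

```python
def spammers_in_top_k(scores, spammers, cutoffs):
    """For each K in `cutoffs`, count known spammers among the K
    highest-scoring accounts (e.g. ranked by PageRank)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    rank_of = {account: i for i, account in enumerate(ranked)}  # 0 = top
    return {k: sum(1 for s in spammers if s in rank_of and rank_of[s] < k)
            for k in cutoffs}


scores = {"news": 0.40, "spam1": 0.30, "blogger": 0.20, "spam2": 0.10}
print(spammers_in_top_k(scores, {"spam1", "spam2", "spam3"}, [1, 2, 4]))
```
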


Privacy and Security in Online Social Media
Course on NPTEL
NOC21-CS28
Week 7.1

Ponnurangam Kumaraguru (“PK”)
Full Professor
ACM Distinguished Speaker
fb/ponnurangam.kumaraguru, @ponguru
Spammers
⚫Fraction of reciprocated in-links from spammers vs. spam-follower node rank

Spammers
⚫Top 100,000 spam-followers account for 60% of all links acquired by the spammers
⚫Top spam-followers tend to reciprocate all links established to them by spammers

Popular users are more likely to reciprocate
⚫Probability of response vs. indegree for all users targeted by spammers
⚫Users with low indegree do not reciprocate links from spammers
⚫Responsiveness increases with the number of followers
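
The response-probability curve groups targeted users by indegree and measures how many followed the spammer back. A minimal sketch, assuming one (indegree, followed_back) record per targeted user and power-of-ten indegree bins. The binning is an illustrative choice, not stated on the slide, and the records are hypothetical.

```python
from collections import defaultdict


def indegree_bin(indegree: int) -> int:
    """Lower edge of a power-of-ten bin: 0, 1, 10, 100, ..."""
    return 0 if indegree == 0 else 10 ** (len(str(indegree)) - 1)


def response_probability_by_indegree(targets):
    """targets: (indegree, followed_back) pairs for users a spammer followed.
    Returns {bin lower edge: fraction of users in that bin who followed back}."""
    totals, responded = defaultdict(int), defaultdict(int)
    for indegree, followed_back in targets:
        b = indegree_bin(indegree)
        totals[b] += 1
        responded[b] += int(followed_back)
    return {b: responded[b] / totals[b] for b in sorted(totals)}


records = [(0, False), (5, False), (7, True), (50, True), (80, True)]
print(response_probability_by_indegree(records))
```
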

Top 5 link farmers
⚫Twitter account bios
⚫Accounts with the most links to spammers and the highest PageRank
⚫Popular accounts

Top link farmers are not spammers
⚫Of the top 100,000 link farmers:
- 18,826 were suspended
- 4,768 were “not found”
- 76% were still active
- 235 are verified
⚫Manually checked 100 random users (volunteers)
- 86 were real accounts
- Business, internet marketing, ENT, money and social media

Node degree distribution: In
⚫ Top link farmers have very high indegree compared to spammers and a random sample

Node degree distribution: Out
⚫ Top link farmers have very high outdegree compared to spammers and a random sample

Node degree distribution: In/Out
⚫ Most of the top link farmers have an indegree/outdegree ratio near 1
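
A ratio near 1 means an account follows roughly as many users as follow it, which is what systematic following back produces. A minimal sketch computing indegree/outdegree from (follower, followee) pairs; the edge list and the helper name are hypothetical.

```python
from collections import Counter


def degree_ratios(follow_edges):
    """indegree / outdegree for every account that follows at least one
    other account, from (follower, followee) pairs."""
    edges = set(follow_edges)
    indegree = Counter(dst for _, dst in edges)
    outdegree = Counter(src for src, _ in edges)
    return {acct: indegree[acct] / outdegree[acct] for acct in outdegree}


edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"), ("d", "a")]
print(degree_ratios(edges))
```
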

Account bio of top 100,000 & random sample
⚫LF (link farmers): promote their own business, content, or trends in a domain; link to legitimate external sources
⚫RS (random sample): don’t tweet links to external sources

Conclusion
⚫Characteristics of link farmers
⚫Surprisingly, legitimate, popular, and highly active users, such as bloggers and experts, are the most likely to engage in link farming
⚫They do so to increase social capital and influence


Privacy and Security in Online Social Media
Course on NPTEL
NOC21-CS28
Week 7.2

Ponnurangam Kumaraguru (“PK”)
Full Professor
ACM Distinguished Speaker
fb/ponnurangam.kumaraguru, @ponguru
Cost of reading privacy policies
⚫What would happen if everyone read the privacy policy for each site they visited once each month?
⚫Time = 244 hours/year
⚫Cost = USD 3,534/year
⚫National opportunity cost for reading privacy policies = USD 781 billion

A. McDonald and L. Cranor. The Cost of Reading Privacy Policies. I/S: A Journal of Law and Policy for the Information Society, 2008 Privacy Year in Review Issue. http://lorrie.cranor.org/pubs/readingPolicyCost-authorDraft.pdf

Goals
⚫To help individuals avoid regrettable online disclosures

Facemail from MIT

Experimental setup
⚫Picture nudge
⚫“These people, your friends, and FRIENDS OF YOUR FRIENDS can see your post.”

Experimental setup
⚫Timer nudge

Experimental setup
⚫Sentiment nudge

Methodology
⚫Chrome browser
⚫Exit survey, follow-up interviews
⚫IRB approved
⚫Recruitment
- Craigslist, flyers, emails, etc.
⚫21 participants completed the field study and 13 participated in the interviews

Analysis metrics
⚫Number of changes in inline privacy settings
⚫Number of cancelled or edited posts
⚫Posts frequency
⚫Topic sensitivity

Profile picture nudge
⚫One participant changed the audience from “Friends” to “Friends except acquaintances” when she posted “Survived one of the craziest, most exhausting days ever!”
⚫Another participant ended up cancelling “a couple of posts” because of the profile picture nudge

Timer nudge
⚫One participant found it “at times annoying and at times handy”
- Waited for the timer to expire or hit “post now”
- Made it more public when it was a “venting” type post
⚫Another participant said it made them think about their posts
- Cancelled a few after thinking them over

Sentiment nudge
⚫The nudge was missing the context
- Errors in detecting the sentiment
⚫Many participants cancelled their posts because of the nudge
⚫Post frequency for sensitive information reduced, 13 → 7

Conclusion
⚫ Interventions help users make better decisions
⚫ More work is needed to understand which type of nudge works in which context


Privacy and Security in Online Social Media
Course on NPTEL
NOC21-CS28
Week 7.3

Ponnurangam Kumaraguru (“PK”)
Full Professor
ACM Distinguished Speaker
fb/ponnurangam.kumaraguru, @ponguru
Semantic Attacks
●“Target the way we, as humans, assign meaning to content.”
●System and mental model

http://groups.csail.mit.edu/uid/projects/phishing/proposal.
Semantic attacks
[Figure: taxonomy of security attacks]
Security attacks
- Physical
- Semantic
  - Phishing: Update info, Verification, Security alert, Mortgage
  - Mules
  - Nigerian
- Syntactic
Example phishing brands: eBay, BOA, Amazon, PayPal
An email that we get
Features in the email
●Subject: “eBay: Urgent Notification From Billing Department”
●Body: “We regret to inform you that you eBay account could be suspended if you don’t update your account information.”
●Link in the email: https://signin.ebay.com/ws/eBayISAPI.dll?SignIn&sid=verify&co_partnerid=2&sidteid=0
●Website to collect information: http://www.kusi.org/hcr/eBay/ws23/eBayISAPI.htm
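
The point of this example is that the link text reads like an eBay address while the information is actually collected on an unrelated site. A minimal sketch of that check in standard-library Python, assuming the email body is HTML with the eBay-looking URL as the visible text of the `<a>` element and the kusi.org page as its `href`. A site is approximated by the last two labels of the host name; a real filter would consult the Public Suffix List, because this approximation mishandles domains such as `example.co.uk`. All function and class names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


def approx_site(url: str) -> str:
    """Approximate a URL's site by the last two labels of its host name."""
    host = (urlparse(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


class LinkCollector(HTMLParser):
    """Collects (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def suspicious_links(html_body: str):
    """Links whose visible text is a URL on a different site than the href."""
    collector = LinkCollector()
    collector.feed(html_body)
    collector.close()
    return [(href, text) for href, text in collector.links
            if text.startswith(("http://", "https://"))
            and approx_site(text) != approx_site(href)]


email_html = (
    '<p>Click <a href="http://www.kusi.org/hcr/eBay/ws23/eBayISAPI.htm">'
    "https://signin.ebay.com/ws/eBayISAPI.dll?SignIn&amp;sid=verify&amp;co_partnerid=2&amp;sidteid=0"
    "</a> to update your account.</p>"
)
for href, text in suspicious_links(email_html):
    print(f"shown: {text}\nactually goes to: {href}")
```
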
Phishing Cost

Types of Phishing Attacks
●Phishing
●Context-aware phishing / spear phishing
●Whaling
●Vishing
●Smishing
●Social Phishing?

Work that we have seen until now
●Using voter databases
●Using medical health databases
●Using pictures from FB

Goal
●To see how phishing attacks can be performed by collecting personal information from social networks
- How easily or effectively can a phisher use this information?

Methodology
●Collected publicly available personal information using simple tools like the Perl LWP library
●Correlated this data with IU’s (Indiana University’s) address book database
●Launched in April 2005
●Targets aged between 18 and 24

Control vs. Experiment
●Control: email sent from an IU email ID, but from an unknown person
●Experiment: email from a friend at IU

● Blogging, social network, and other public data is harvested
● Data is correlated and stored in a relational database
● Heuristics are used to craft spoofed email message by Eve
“as Alice” to Bob (a friend)
● Message is sent to Bob
● Bob follows the link contained within the email message and
is sent to an unchecked redirect
● Bob is sent to attacker whuffo.com site
● Bob is prompted for his University credentials
17
Victims
●Control group rate was high because the sender’s email ID was an IU address
●Experimental condition consistent with other studies
Success rate
●70% of authentications occurred in the first 12 hours
●Takedown has to be successful
Repeated authentications
● Subjects tried multiple times
● Tried again because an “overload” message was shown
● Lower bound of users who fell for it and continued to be deceived
● Some tried 80 times
Gender
● 18,294 males and 19,527 females
● Overall, females were more often victims
● More successful if the email came from the opposite gender
● F to M (13%) was more effective than M to F (2%)
●Younger targets were more vulnerable

●All majors showed a significant difference between control and experimental conditions
●Maximum difference in Science
●Technology lowest #satisfying ☺
Reactions
●Anger
- Unethical, inappropriate, illegal, fraudulent
- Calls for the researchers to be fired
- Psychological cost
●Denial
- Nobody admitted that they fell for it
- Admitting our vulnerability is hard
●Misunderstanding about spoofed emails
●Underestimation of publicly available information

Conclusions
●Extensive educational campaigns
●Browser solutions
●Digitally signed emails
●OSM provides a lot more information for making the attack successful

References
●http://markus-jakobsson.com/papers/jakobsson-commacm07.pdf
●http://www.mpi-sws.org/~farshad/TwitterLinkfarming.pdf
●www.isical.ac.in/~acmsc/TMW2014/N_ganguly.ppt


Privacy and Security in Online Social Media
Course on NPTEL
NOC21-CS28
Week 8.1

Ponnurangam Kumaraguru (“PK”)
Full Professor
ACM Distinguished Speaker
fb/ponnurangam.kumaraguru, @ponguru
What is the difficulty in matching?
⚫https://www.facebook.com/ponnurangam.kumaraguru
⚫https://twitter.com/ponguru
⚫https://in.linkedin.com/in/ponguru

This lecture
⚫Tracking social footprints / identities across different social networks

Jain, P., Kumaraguru, P., and Joshi, A. Other Times, Other Values: Leveraging Attribute History to Link User Profiles across Online Social Networks.

Knowing this can be useful!

??

De-duplicating audience

Social audience = 437,632 + 153,000 + 805,097 (= 1,395,729) or less??
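
Summing follower counts counts a person once for every network on which they follow. A minimal sketch of the de-duplicated count, assuming profile linking has already produced a mapping from (network, account) to a person identifier. All names and the toy data are hypothetical.

```python
def unique_audience(followers_by_network, person_of):
    """Number of distinct people across networks.

    followers_by_network: {network: iterable of account IDs}
    person_of: {(network, account_id): person_id} produced by profile
               linking; accounts with no link count as separate people.
    """
    people = set()
    for network, accounts in followers_by_network.items():
        for account in accounts:
            people.add(person_of.get((network, account), (network, account)))
    return len(people)


followers = {"facebook": ["f1", "f2"], "twitter": ["t1", "t2"], "youtube": ["y1"]}
links = {("facebook", "f1"): "alice", ("twitter", "t1"): "alice", ("youtube", "y1"): "alice"}
naive = sum(len(accounts) for accounts in followers.values())
print(naive, unique_audience(followers, links))
```
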

Challenges
⚫ Heterogeneous OSNs: networks differ in the degree of detail they hold
- Opinion networks: little personal information, descriptive opinions
- Professional networks: quality and descriptive personal and professional information
- Dating networks: personal information
⚫ Attribute evolution over time: users register with the same information on both OSNs, but the information evolves on one and not on the other, e.g., {paridhij, New Delhi} → {jainpari, Bangalore}

Profile linking approach
⚫List common attributes
⚫Compare attribute values using syntactic, semantic, or graph-based methods
⚫High similarity indicates that the profiles refer to a single user
⚫Values considered here are the most recent (current) values of the attributes
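
A syntactic comparison of current usernames can be as simple as a normalized string-similarity score. A minimal sketch using the standard library's `difflib.SequenceMatcher`, with a hypothetical 0.8 threshold that would in practice be tuned on labelled pairs. It also exposes the limitation raised in this lecture: current values alone miss users who renamed an account.

```python
from difflib import SequenceMatcher


def normalize(username: str) -> str:
    return username.lower().lstrip("@")


def username_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two usernames, ignoring case and '@'."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def likely_same_user(a: str, b: str, threshold: float = 0.8) -> bool:
    # Hypothetical threshold, not taken from the lecture.
    return username_similarity(a, b) >= threshold


print(round(username_similarity("@nitinsgr", "@explorer_nitin"), 2))
```

`@nitinsgr` vs `@explorer_nitin` scores about 0.45, so a current-value comparison would miss the pair shown on the "Reality!" slide.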

But the values change!
⚫ Attribute: username
⚫ # of users tracked: 376 million [random]
⚫ Tracking period: 4 years

Values change
⚫ Attribute: username
⚫ # of users tracked: 8 million [random]
⚫ Tracking period: 2 months

Reality!
                    OSN 1        OSN 2
Registration: t1    @nitinsgr    @nitinsgr
Observation: t2     @nitinsgr    @explorer_nitin
Observation: t3     @nitinsgr    @logicalIndian
Attribute evolution leads to unmatching values.

Problem Statement
Given two user profiles and their respective username sets, each composed of past and current usernames, determine whether the profiles refer to a single individual.
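
With username sets, each comparison can look across all past and current names. The lecture reports 26 features without listing them. The four below are hypothetical examples in the spirit of the exact-match (b1) and substring-match (b2) baselines, computed on the Twitter and Instagram username sets of the sample user shown in this lecture.

```python
from difflib import SequenceMatcher
from itertools import product


def _normalize(username):
    return username.lower().lstrip("@")


def username_set_features(set_a, set_b):
    """Hypothetical pairwise features over two username sets
    (past + current usernames of two profiles)."""
    a = {_normalize(u) for u in set_a}
    b = {_normalize(u) for u in set_b}
    pairs = list(product(a, b))
    return {
        "exact_match": bool(a & b),                                  # like baseline b1
        "substring_match": any(x in y or y in x for x, y in pairs),  # like baseline b2
        "max_similarity": max((SequenceMatcher(None, x, y).ratio() for x, y in pairs),
                              default=0.0),
        "jaccard": len(a & b) / len(a | b) if a | b else 0.0,
    }


twitter = ["bigeasye_", "reezy11_", "epiceric_", "soulanola", "swampson_",
           "hebetheeeric", "swampkidd_"]
instagram = ["bigeasye_", "epiceric17", "swampson", "hebetheeeric"]
print(username_set_features(twitter, instagram))
```
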

Why only usernames?
⚫Unique attribute of a user
⚫Universally and publicly available attribute
⚫Homogeneous: character- and length-restricted
⚫History is easier to collect for usernames than for other attributes

Methodology
[Figure: pipeline from collected username sets to prediction]

Ground Truth Collection
⚫ Self-identification behavior [cross-referencing one’s OSN accounts]
⚫ Extrovert users
[Figure: Twitter profile showing the Twitter username and the Tumblr username in the URL]

Past Usernames Collection
[Figure: automated tracker of user identities (14 days) recording username changes]
- Twitter: http://twitter.com/yourgurlluzy (@yourgurlluzy); past usernames [@badluckgirl, @bieblerlover]
- Tumblr: http://girlonthesportingnewss.tumblr.com; past usernames [@badluckgirl, @hazzyonthesun], recovered from URL changes on Twitter (badluckgurl.tumblr.com, hazzyonthesun.tumblr.com)

Sample
• User ID: 595929421
• Past usernames on Twitter:
  ["bigeasye_", "reezy11_", "epiceric_", "soulanola", "swampson_", "hebetheeeric", "swampkidd_"]
• Past usernames on Instagram:
  ["bigeasye_", "epiceric17", "swampson", "hebetheeeric"]

Methodology
Assumption: consistent user behavior within and across networks over time
[Figure: pipeline from collected username sets to prediction]

Features

# Features: 26
Methodology
[Figure: pipeline from collected username sets to prediction]

Datasets
⚫Linking profiles
- Twitter – Instagram
- Twitter – Tumblr
- Twitter – Facebook
⚫Past usernames available for both profiles:
- 21,446 positive pairs, 21,449 negative pairs
⚫Past usernames available only on Twitter, but current username available on the other profile:
- 112,451 positive pairs, 112,451 negative pairs
Supervised Classification

1. Independent Supervised Framework

2. Cascaded Supervised Framework
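
The slides name the two frameworks without detailing them. A minimal sketch of one plausible reading: the independent framework applies the classifier to every pair, while the cascade first applies a baseline rule and passes only the pairs it does not accept to the classifier. Trusting the baseline's positives is reasonable here because exact and substring matching show 0.00 FPR in the results. `stand_in_classifier` is a hypothetical similarity threshold standing in for a trained model (the slides use Naive Bayes or a linear SVM).

```python
from difflib import SequenceMatcher


def exact_match(set_a, set_b):
    """Baseline b1: the two profiles share at least one username."""
    return bool({u.lower() for u in set_a} & {u.lower() for u in set_b})


def substring_match(set_a, set_b):
    """Baseline b2: some username of one profile contains one of the other."""
    return any(x.lower() in y.lower() or y.lower() in x.lower()
               for x in set_a for y in set_b)


def stand_in_classifier(set_a, set_b, threshold=0.7):
    """Placeholder for a trained model; thresholds the best pairwise similarity."""
    best = max((SequenceMatcher(None, x.lower(), y.lower()).ratio()
                for x in set_a for y in set_b), default=0.0)
    return best >= threshold


def independent_predict(pairs, classifier):
    return [classifier(a, b) for a, b in pairs]


def cascaded_predict(pairs, baseline, classifier):
    # Accept a pair outright when the baseline fires; otherwise ask the classifier.
    return [baseline(a, b) or classifier(a, b) for a, b in pairs]


pairs = [
    (["nitinsgr"], ["nitinsgr"]),        # exact match: decided by the baseline
    (["swampson_"], ["swampson"]),       # no exact match: classifier decides
    (["nitinsgr"], ["logicalindian"]),   # renamed account: still missed
]
print(cascaded_predict(pairs, exact_match, stand_in_classifier))
```
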

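Prediction

Framework config.              | Accuracy (%) | FNR (%) | FPR (%)
-------------------------------|--------------|---------|--------
Exact Match (b1)               | 55.38        | 89.34   | 0.00
Substring Match (b2)           | 60.99        | 78.46   | 0.00
Independent [Naive Bayes]      | 72.19        | 55.86   | 0.13
Cascaded [b1 → Naive Bayes]    | 72.48        | 55.27   | 0.14
Cascaded [b1 → SVM (Linear)]   | 76.74        | 45.16   | 1.65
Cascaded [b2 → Naive Bayes]    | 72.51        | 54.97   | 0.17
Cascaded [b2 → SVM (Linear)]   | 76.84        | 45.16   | 1.25

(FNR: false negative rate; FPR: false positive rate)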
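The three reported metrics follow from confusion-matrix counts, as in the sketch below (function name hypothetical). Because the datasets have almost equal numbers of positive and negative pairs, accuracy should be roughly 100 - (FNR + FPR)/2; for Exact Match (b1) that gives 100 - (89.34 + 0.00)/2 ≈ 55.33, close to the reported 55.38.

```python
from typing import Tuple


def rates(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float, float]:
    """Accuracy, false negative rate and false positive rate, all in percent."""
    accuracy = 100 * (tp + tn) / (tp + fn + fp + tn)
    fnr = 100 * fn / (tp + fn)  # true matches the method failed to link
    fpr = 100 * fp / (fp + tn)  # non-matching pairs wrongly linked
    return accuracy, fnr, fpr


acc, fnr, fpr = rates(tp=40, fn=60, fp=0, tn=100)
print(acc, fnr, fpr)  # 70.0 60.0 0.0
# With equal numbers of positive and negative pairs, accuracy = 100 - (FNR + FPR) / 2.
assert acc == 100 - (fnr + fpr) / 2
```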
Prediction

A comparison of cascaded framework accuracy with and without Twitter-Tumblr instances

Framework config. (history on both or one profile) | Accuracy (%) | FNR (%) | FPR (%)
---------------------------------------------------|--------------|---------|--------
Exact Match (b1)                                    | 55.38        | 89.34   | 0.00
Cascaded [all networks]                             | 76.74        | 45.16   | 1.65
Exact Match without Tumblr (b1)                     | 66.17        | 67.51   | 0.00
Cascaded [without Tumblr]                           | 91.20        | 16.60   | 0.96
Measuring Volume of Sentiments

Conclusion
⚫Profile linking may be necessary for many
organizations / needs
⚫Profile linking improves when the past history
of user handles is used
Activity
⚫Take 2 of your own accounts, or any 2 accounts
that you know belong to the same person, on 2
different social networks
⚫Find the various ways in which you can link
these 2 accounts
⚫ List the features
⚫List the things you would change in the
profiles to make the 2 accounts look
unrelated
References
⚫Paridhi Jain’s Ph.D. thesis work

Thank you
[email protected]
precog.iiitd.edu.in
fb/ponnurangam.kumaraguru
Privacy and Security in Online Social Media
Course on NPTEL
NOC21-CS28
Week 8.2

Ponnurangam Kumaraguru (“PK”)


Full Professor
ACM Distinguished Speaker
fb/ponnurangam.kumaraguru, @ponguru
Anonymous Networks
⚫4chan
⚫Whisper
⚫Secret
⚫Yik Yak
⚫Wickr

Why use Anonymous Networks?
⚫Increasing awareness of privacy
⚫Snowden disclosures
⚫PRISM surveillance program
⚫Bal Thackeray incident
⚫Many other incidents around the world

What is Whisper?

https://whisper.sh
What is Whisper?

https://www.youtube.com/watch?v=pX9I9kR2tTc
Hearts / Chat

Terminology / Claims
⚫ Whispers
⚫ Replies
⚫ Anonymous names
⚫ Does not associate any personal information
with user ID
⚫ Does not archive any user history
⚫ Does not support persistent social links between
users
⚫ “Heart” a message anonymously
⚫ Private messages
Whisper

Goals
⚫How do whisper users interact in an
anonymous environment?
⚫Do users form communities similar to those
in traditional social networks?
⚫Does Whisper’s lack of identities eliminate
strong ties between users?
⚫Does it eliminate the stickiness that is critical
to long-term engagement in traditional social
networks?
Data collection
⚫Feb 6th – May 1st 2014
⚫Collected the “Latest” list by scraping
⚫Data include
⚫ WhisperID
⚫ Timestamp
⚫ Plain text of the whisper
⚫ Author’s nickname
⚫ A location tag
⚫ Number of replies (shown with the whisper)
⚫ Likes
Data collection
⚫9,343,590 whispers
⚫15,268,964 replies
⚫1,038,364 GUIDs
⚫ Globally Unique Identifier
⚫ Made it possible to track users, but was removed
in June 2014
⚫Interacted with the Whisper team
Data

⚫55% of whispers receive no replies
⚫25% have a chain of at least 2 replies
Time between original &
reply

⚫ 54% of replies arrive within an hour of the original whisper
⚫ 94% of replies arrive within one day
⚫ 1.3% of replies arrive a week or more after the original
⚫ “If a whisper does not get attention shortly after posting, it is
unlikely to get attention later.”
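Shares like these can be computed from (whisper time, reply time) pairs as in the minimal sketch below; the input format and the function name `delay_shares` are assumptions, and the thresholds are treated as inclusive.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def delay_shares(pairs: List[Tuple[datetime, datetime]]) -> Tuple[float, float, float]:
    """Percent of replies arriving within 1 hour, within 1 day, and 1 week or more after the whisper."""
    delays = [reply - original for original, reply in pairs]
    n = len(delays)
    within_hour = 100 * sum(d <= timedelta(hours=1) for d in delays) / n
    within_day = 100 * sum(d <= timedelta(days=1) for d in delays) / n
    week_or_more = 100 * sum(d >= timedelta(weeks=1) for d in delays) / n
    return within_hour, within_day, week_or_more


t0 = datetime(2014, 2, 6, 12, 0)
replies = [timedelta(minutes=30), timedelta(hours=2), timedelta(days=3), timedelta(days=8)]
print(delay_shares([(t0, t0 + d) for d in replies]))  # (25.0, 50.0, 25.0)
```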
Posts per user

⚫ 80% of users post fewer than 10 whispers and replies in total
⚫ 15% of users only post replies but no original
whispers
⚫ 30% of users only post whispers but no replies
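These per-user statistics can be sketched by counting whispers and replies per user ID (the GUID, in this dataset). The `(user_id, kind)` input format and the function name are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def posting_profile(posts: Iterable[Tuple[str, str]]) -> Tuple[float, float, float]:
    """posts: (user_id, kind) with kind "whisper" or "reply".

    Returns the percent of users with fewer than 10 posts in total,
    the percent who only reply, and the percent who only post whispers.
    """
    whispers, replies = Counter(), Counter()
    for user, kind in posts:
        (whispers if kind == "whisper" else replies)[user] += 1
    users = set(whispers) | set(replies)
    n = len(users)
    # Counter returns 0 for users absent from one of the two counters.
    light = 100 * sum(whispers[u] + replies[u] < 10 for u in users) / n
    reply_only = 100 * sum(whispers[u] == 0 for u in users) / n
    whisper_only = 100 * sum(replies[u] == 0 for u in users) / n
    return light, reply_only, whisper_only


posts = [("a", "whisper")] * 12 + [("b", "reply"), ("c", "whisper"), ("c", "reply"), ("d", "whisper")]
print(posting_profile(posts))  # (75.0, 25.0, 50.0)
```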
Network analysis

⚫ High average degree: users interact with a large sample of other
users.
⚫ Whisper users are likely to interact with complete strangers who
are highly unlikely to interact with each other (low clustering
coefficient).
⚫ Average path length calculated over 100 random nodes: the
shortest average path among the 3.
⚫ The 3 observations above are used to infer that the graph is closer
to a random graph than to a “small world” graph.
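The metrics on this slide can be sketched in plain Python for an undirected interaction graph stored as an adjacency dict (node → set of neighbours). This is a minimal sketch, not the study's code: average shortest-path length is estimated by breadth-first search from up to 100 randomly sampled source nodes, mirroring the slide's sampling, and nodes with fewer than 2 neighbours contribute a clustering coefficient of 0. For comparison, a random graph with n nodes and average degree k has an expected clustering coefficient of about k/n.

```python
import random
from collections import deque
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected: node -> set of neighbours


def average_degree(adj: Graph) -> float:
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def average_clustering(adj: Graph) -> float:
    """Mean local clustering coefficient; nodes with < 2 neighbours count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        nbr_list = list(nbrs)
        # Count edges among this node's neighbours.
        links = sum(1 for i, u in enumerate(nbr_list) for v in nbr_list[i + 1:] if v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def sampled_path_length(adj: Graph, samples: int = 100, seed: int = 0) -> float:
    """Average shortest-path length from randomly sampled sources (BFS, unweighted)."""
    rng = random.Random(seed)
    sources = rng.sample(list(adj), min(samples, len(adj)))
    total, count = 0, 0
    for source in sources:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())  # the source's own distance is 0
        count += len(dist) - 1       # reachable nodes other than the source
    return total / count
```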
Network analysis

⚫Assortativity measures the tendency of nodes in
a graph to link to other nodes of similar degree.
⚫Close to zero → random graph
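Degree assortativity is usually computed as the Pearson correlation between the degrees at the two ends of each edge; a sketch for an undirected edge list follows (function name hypothetical). Values near 0 indicate no degree preference, as in a random graph, while a star graph, where the hub links only to degree-1 leaves, gives -1.

```python
from typing import Hashable, List, Tuple


def degree_assortativity(edges: List[Tuple[Hashable, Hashable]]) -> float:
    """Pearson correlation of the degrees at the two ends of each undirected edge."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Count each edge in both directions so the result does not depend on edge orientation.
    xs = [degree[u] for u, v in edges] + [degree[v] for u, v in edges]
    ys = [degree[v] for u, v in edges] + [degree[u] for u, v in edges]
    n = len(xs)
    mean = sum(xs) / n  # xs and ys hold the same values, so they share mean and variance
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return float("nan")  # undefined when all nodes have the same degree
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    return cov / var


star = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(degree_assortativity(star))  # -1.0
```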
Content moderation
⚫1.7 million whispers were deleted in 3
months
⚫18% of content deleted, compared to 4% on
Twitter
17
Content moderation:
Process
⚫Extracted keywords from all whispers
⚫Removed common stop-words
⚫Removed words that appear in fewer than
0.05% of whispers
⚫Computed the deletion ratio for each word
⚫ # of deleted whispers with this keyword / # of all
whispers with this keyword
⚫Ranked the words by deletion ratio
⚫Top and bottom keywords
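The ranking procedure can be sketched as below. Assumptions: each whisper is a `(text, was_deleted)` pair, words are lowercase alphabetic tokens counted at most once per whisper, and `STOP_WORDS` is a tiny hypothetical stand-in for a real stop-word list. The 0.05% cut-off appears as `min_share=0.0005`; the head of the returned list gives the top keywords and the tail the bottom ones.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical, deliberately tiny stop-word list; a real run would use a full list.
STOP_WORDS = {"a", "an", "and", "i", "is", "me", "my", "of", "the", "to"}


def deletion_ratios(whispers: List[Tuple[str, bool]], min_share: float = 0.0005) -> List[Tuple[str, float]]:
    """Rank keywords by (# deleted whispers with the word) / (# whispers with the word).

    Words appearing in fewer than min_share of all whispers (0.05% by default) are dropped.
    """
    total, deleted = Counter(), Counter()
    for text, was_deleted in whispers:
        words = set(re.findall(r"[a-z']+", text.lower())) - STOP_WORDS  # once per whisper
        total.update(words)
        if was_deleted:
            deleted.update(words)
    cutoff = min_share * len(whispers)
    ratios = {word: deleted[word] / n for word, n in total.items() if n >= cutoff}
    # Highest ratio first; ties broken alphabetically so the output is deterministic.
    return sorted(ratios.items(), key=lambda item: (-item[1], item[0]))
```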
Content moderation:
Process
⚫Run on all 9 million original whispers
⚫1.7 M were deleted
⚫2,324 keywords ranked by deletion ratio
⚫Manually grouped the keywords into categories
Content moderation

Deletion delay

⚫ 70% of deleted whispers are deleted within
one week of posting
⚫ 2% of the deleted whispers stay up for more than a month
⚫ Deletions are done by moderators
Deletion delay

⚫ Fine-grained analysis
⚫ Recrawled the 200K latest whispers
⚫ 32,153 were deleted
⚫ Deletion peaks 3–9 hours after posting
⚫ Most deletions happen within 24 hours
User interactions

⚫503K user pairs
⚫For 90% of pairs, the two users are located in
the same “State”
⚫75% of pairs are less than 40 miles apart
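Checking whether an interacting pair is within 40 miles can be sketched with the haversine great-circle formula, assuming each user's location tag can be resolved to latitude/longitude coordinates; the slide does not state how distances were computed, and the function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt
from typing import List, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius
LatLon = Tuple[float, float]


def distance_miles(a: LatLon, b: LatLon) -> float:
    """Great-circle (haversine) distance between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


def share_within(pairs: List[Tuple[LatLon, LatLon]], limit_miles: float = 40.0) -> float:
    """Percent of user pairs whose locations are less than limit_miles apart."""
    close = sum(distance_miles(a, b) < limit_miles for a, b in pairs)
    return 100 * close / len(pairs)
```

One degree of latitude comes out at roughly 69 miles, so two users one degree apart would fall outside the 40-mile bucket.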
User interactions

⚫The smaller the user population in a nearby area,
the higher the chance that two users encounter each other
⚫The more whispers 2 users post, the more likely they
are to encounter each other
User engagement

⚫Roughly 80K new users per week
⚫Daily new posts across the entire network remain
stable despite the new users (earlier conclusion)
⚫This shows that users “disengage”
User engagement

⚫ Number of whispers and replies by new and old users
⚫ New users contribute 20% of the content
⚫ Content by new users does not grow significantly
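The share of content from new users can be sketched by treating a user as “new” in the week they first appear in the data. The `(week, user_id)` input format and the function name are hypothetical; note that in the first crawl week every user counts as new, so the earliest weeks overstate the share.

```python
from typing import Dict, Hashable, List, Tuple


def new_user_share(posts: List[Tuple[int, Hashable]]) -> Dict[int, float]:
    """posts: (week, user_id) for each whisper or reply.

    Returns, for each week, the percent of that week's posts made by users
    first seen in that same week.
    """
    first_week: Dict[Hashable, int] = {}
    for week, user in posts:
        if user not in first_week or week < first_week[user]:
            first_week[user] = week
    shares = {}
    for week in sorted({w for w, _ in posts}):
        authors = [u for w, u in posts if w == week]
        new = sum(first_week[u] == week for u in authors)
        shares[week] = 100 * new / len(authors)
    return shares


posts = [(1, "a"), (1, "b"), (2, "a"), (2, "c"), (2, "a"), (2, "b")]
print(new_user_share(posts))  # {1: 100.0, 2: 25.0}
```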
Conclusions
⚫Whisper is clearly different from traditional social
networks
⚫Without strong user identities or persistent
social links, users interact with strangers
⚫Moderation is necessary
References
⚫http://www.cs.ucsb.edu/~ravenben/publications/pdf/whisper-imc14.pdf