DataCamp Analyzing Social Media Data in Python
ANALYZING SOCIAL MEDIA DATA IN PYTHON
Analyzing Twitter Data
Alex Hanna
Computational Social Scientist
Why Analyze Twitter Data?
Source: Conover et al. (2011)
What you can't analyze
Can't collect data on observers
Free level of access is restrictive
Can't collect historical data
Only a 1% (unverified) sample
What you can analyze
1% sample is still a few million tweets
Within a tweet
Text
User profile information
Geolocation
Retweets and quoted tweets
Let's review!
Collecting data through the Twitter API
Twitter API
API: Application Programming Interface
Method of accessing data
Twitter APIs
Search API
Ads API
Streaming API
Streaming API
Real-time tweets
Filter endpoint
Keywords
User IDs
Locations
Sample endpoint
Random sample
Using tweepy to collect data
tweepy
Python package for accessing the Streaming API
SListener
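from tweepy import API
# StreamListener is available in tweepy versions before 4.0
from tweepy.streaming import StreamListener
import time

class SListener(StreamListener):
    def __init__(self, api = None):
        # Write collected tweets to a timestamped file
        self.output = open('tweets_%s.json' %
                           time.strftime('%Y%m%d-%H%M%S'), 'w')
        # Fall back to an unauthenticated API object if none is given
        self.api = api or API()
        ...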
tweepy authentication
from tweepy import OAuthHandler
from tweepy import API
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = API(auth)
Collecting data with tweepy
from tweepy import Stream
listen = SListener(api)
stream = Stream(auth, listen)
stream.sample()
Let's practice!
Understanding Twitter JSON
Contents of Twitter JSON
{
    "created_at": "Thu Apr 19 14:25:04 +0000 2018",
    "id": 986973961295720449,
    "id_str": "986973961295720449",
    "text": "Writing out the script of my @DataCamp class and I can't help but mentally read it back to myself in @hugobowne's voice.",
    "retweet_count": 0,
    "favorite_count": 1,
    ...
}
How many retweets, favorites
Language
Reply to which tweet
Reply to which user
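The created_at field above is a plain string, so time-based analysis usually needs it converted first. A minimal sketch using only the standard library follows; the format string and the helper name parse_created_at are assumptions made for illustration, not part of the course material.

```python
from datetime import datetime

# Timestamp copied from the example tweet above.
created_at = "Thu Apr 19 14:25:04 +0000 2018"

# Assumed layout: weekday, month, day, time, UTC offset, year.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value):
    """Convert a created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, TWITTER_TIME_FORMAT)


parsed = parse_created_at(created_at)
print(parsed.isoformat())  # 2018-04-19T14:25:04+00:00
```

Once parsed, the value can be compared, sorted, or grouped by hour or day like any other datetime.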
Child JSON objects
{
    "user": {
        "id": 661613,
        "name": "Alex Hanna, Data Witch",
        "screen_name": "alexhanna",
        "location": "Toronto, ON",
        ...
    }
}
Places, retweets/quoted tweets, and 140+ tweets
place and coordinates
contain geolocation
extended_tweet
tweets over 140 characters
retweeted_status and quoted_status
contain all tweet information of retweets and quoted tweets
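One way these fields might be combined is a helper that returns the most complete text available for a tweet. This is only a sketch: the function name get_full_text is hypothetical, the sample dictionaries are made up, and it assumes the long text sits under a full_text key inside extended_tweet.

```python
def get_full_text(tweet):
    """Return the most complete text found in a tweet dictionary."""
    # For a retweet, look at the original tweet stored in retweeted_status.
    if "retweeted_status" in tweet:
        return get_full_text(tweet["retweeted_status"])
    # Assumption: tweets over 140 characters keep their complete
    # text under extended_tweet -> full_text.
    if "extended_tweet" in tweet:
        return tweet["extended_tweet"]["full_text"]
    # Otherwise the regular text field is all there is.
    return tweet["text"]


# Made-up examples, not real tweets.
short_tweet = {"text": "A short tweet"}
long_tweet = {
    "text": "A long tweet that was cut off",
    "extended_tweet": {"full_text": "A long tweet that was cut off, now complete"},
}
retweet = {"text": "RT @someone: A long tweet", "retweeted_status": long_tweet}

print(get_full_text(short_tweet))
print(get_full_text(long_tweet))
print(get_full_text(retweet))
```

Returning the original tweet's text for a retweet is a design choice here; an analysis that cares about the "RT @user" prefix would keep the outer text instead.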
Accessing JSON
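import json
# Read the file and close it once the contents are loaded
with open('tweet-example.json', 'r') as f:
    tweet_json = f.read()
# Convert the JSON string into a Python dictionary
tweet = json.loads(tweet_json)
tweet['text']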
Child tweet JSON
tweet['user']['screen_name']
tweet['user']['name']
tweet['user']['created_at']
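The same child-object access can be applied to many tweets at once. The sketch below assumes a file holding one tweet JSON object per line, which is a common layout for collected streams but is not confirmed by the slides; the helper name load_tweets and the sample tweets are made up for illustration.

```python
import json
import os
import tempfile


def load_tweets(path):
    """Read a file with one JSON object per line into a list of dicts."""
    tweets = []
    with open(path, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines between tweets
            tweets.append(json.loads(line))
    return tweets


# Made-up tweets written to a temporary file so the sketch runs on its own.
sample = [
    {"text": "first tweet", "user": {"screen_name": "user_a", "name": "User A"}},
    {"text": "second tweet", "user": {"screen_name": "user_b", "name": "User B"}},
]
path = os.path.join(tempfile.mkdtemp(), "tweets_sample.json")
with open(path, "w") as f:
    for t in sample:
        f.write(json.dumps(t) + "\n")

tweets = load_tweets(path)
# Pull a child field out of every tweet.
screen_names = [t["user"]["screen_name"] for t in tweets]
print(screen_names)  # ['user_a', 'user_b']
```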
Let's practice!