Analyzing Social Media Data in Python: Chapter 1

This document discusses analyzing Twitter data in Python. It describes how to collect Twitter data using the Twitter API and the tweepy Python package. Specifically, it explains how to authenticate with the API, filter tweets by keywords or users, and collect data in real-time using the Streaming API. It also covers how Twitter data is structured in JSON format and how to access fields like retweets, favorites, user information, and geolocation from the tweet JSON.

DataCamp Analyzing Social Media Data in Python

ANALYZING SOCIAL MEDIA DATA IN PYTHON

Analyzing Twitter Data

Alex Hanna
Computational Social Scientist

Why Analyze Twitter Data?


Source: Conover et al. (2011)



What you can't analyze


Can't collect data on observers
Free level of access is restrictive
Can't collect historical data
Only a 1% (unverified) sample

What you can analyze


1% sample is still a few million tweets
Within a tweet
    Text
    User profile information
    Geolocation
    Retweets and quoted tweets

Let's review!

Collecting data through the Twitter API

Alex Hanna
Computational Social Scientist

Twitter API
API: Application Programming Interface
Method of accessing data
Twitter APIs
    Search API
    Ads API
    Streaming API

Streaming API
Streaming API
    Real-time tweets
Filter endpoint
    Keywords
    User IDs
    Locations
Sample endpoint
    Random sample

Using tweepy to collect data


tweepy
    Python package for accessing the Streaming API



SListener
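# StreamListener is available in tweepy releases before 4.0
from tweepy import API
from tweepy.streaming import StreamListener
import time

class SListener(StreamListener):
    def __init__(self, api = None):
        # Collected tweets go to a file named after the start time
        self.output = open('tweets_%s.json' %
                           time.strftime('%Y%m%d-%H%M%S'), 'w')
        self.api = api or API()
        ...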
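
The body of SListener is elided on the slide and is left that way here. As a rough, tweepy-free sketch of the general idea (a callback receives raw tweet JSON and appends it to a timestamped file), something like the following could work. The class name RawTweetWriter, its on_data and close methods, and the one-tweet-per-line layout are all assumptions made for illustration; they are not the course's code and not part of tweepy.

```python
import os
import tempfile
import time


class RawTweetWriter:
    """Hypothetical stand-in for a stream listener; not part of tweepy."""

    def __init__(self, directory="."):
        # Timestamped file name, mirroring the pattern used in SListener.
        name = "tweets_%s.json" % time.strftime("%Y%m%d-%H%M%S")
        self.path = os.path.join(directory, name)
        self.output = open(self.path, "w", encoding="utf-8")
        self.count = 0

    def on_data(self, raw):
        # Assumption: each message is one JSON document; store one per line.
        self.output.write(raw.strip() + "\n")
        self.count += 1
        return True  # signal that collection should continue

    def close(self):
        self.output.close()


# Made-up messages standing in for the live stream.
writer = RawTweetWriter(tempfile.mkdtemp())
for message in ['{"id": 1, "text": "first"}', '{"id": 2, "text": "second"}\n']:
    writer.on_data(message)
writer.close()

with open(writer.path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```

Writing the raw string untouched keeps every field of the tweet, so decisions about which fields matter can wait until analysis.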

tweepy authentication
from tweepy import OAuthHandler
from tweepy import API

auth = OAuthHandler(consumer_key, consumer_secret)

auth.set_access_token(access_token, access_token_secret)

api = API(auth)

Collecting data with tweepy


from tweepy import Stream

listen = SListener(api)

stream = Stream(auth, listen)

stream.sample()

Let's practice!

Understanding Twitter JSON

Alex Hanna
Computational Social Scientist

Contents of Twitter JSON


{
    "created_at": "Thu Apr 19 14:25:04 +0000 2018",
    "id": 986973961295720449,
    "id_str": "986973961295720449",
    "text": "Writing out the script of my @DataCamp class and I can't help but mentally read it back to myself in @hugobowne's voice.",
    "retweet_count": 0,
    "favorite_count": 1,
    ...
}

How many retweets, favorites


Language
Reply to which tweet
Reply to which user
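
To make the fields listed above concrete, here is a small sketch that parses a trimmed tweet and reads them. Only created_at, id, retweet_count and favorite_count come from the slide. The keys lang, in_reply_to_status_id and in_reply_to_user_id are assumed names for the language and reply fields, and their values here are made up for the example.

```python
import json

# Trimmed tweet; the last three keys and their values are assumptions.
raw = """
{
  "created_at": "Thu Apr 19 14:25:04 +0000 2018",
  "id": 986973961295720449,
  "retweet_count": 0,
  "favorite_count": 1,
  "lang": "en",
  "in_reply_to_status_id": null,
  "in_reply_to_user_id": null
}
"""

tweet = json.loads(raw)

summary = {
    "retweets": tweet["retweet_count"],
    "favorites": tweet["favorite_count"],
    # .get() returns None instead of raising if a key is absent
    "language": tweet.get("lang"),
    "is_reply": tweet.get("in_reply_to_status_id") is not None,
}
print(summary)
```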

Child JSON objects


{
    "user": {
        "id": 661613,
        "name": "Alex Hanna, Data Witch",
        "screen_name": "alexhanna",
        "location": "Toronto, ON",
        ...
    }
}

Places, retweets/quoted tweets, and 140+ tweets


place and coordinates
    contain geolocation
extended_tweet
    tweets over 140 characters
retweeted_status and quoted_status
    contain all tweet information of retweets and quoted tweets
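
Because the text of a tweet can sit in different places, a small helper is handy when pulling text out of mixed data. The sketch below is one possible approach, not the course's method: the function name full_text is hypothetical, and it assumes the long text is stored under a full_text key inside extended_tweet, which the slides do not show.

```python
def full_text(tweet):
    """Return the most complete text available for a tweet dict."""
    # Retweets: defer to the original tweet carried in retweeted_status.
    if "retweeted_status" in tweet:
        return full_text(tweet["retweeted_status"])
    # Tweets over 140 characters: assumed to carry extended_tweet.full_text.
    extended = tweet.get("extended_tweet")
    if extended and "full_text" in extended:
        return extended["full_text"]
    # Otherwise fall back to the ordinary text field.
    return tweet.get("text", "")


# Made-up tweets covering the three cases.
short = {"text": "hello"}
long_tweet = {
    "text": "the whole lo...",
    "extended_tweet": {"full_text": "the whole long message"},
}
retweet = {"text": "RT @a: the whole lo...", "retweeted_status": long_tweet}

texts = [full_text(t) for t in (short, long_tweet, retweet)]
```

Using the original tweet's text for retweets drops the "RT @user:" prefix; keep the outer text field instead if that prefix matters for the analysis.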



Accessing JSON
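import json

# Read the raw JSON text; the with block closes the file afterwards
with open('tweet-example.json', 'r') as f:
    tweet_json = f.read()

# Parse the JSON string into a Python dictionary
tweet = json.loads(tweet_json)

tweet['text']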
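
A collected file usually holds many tweets, not one. Assuming the collector wrote one JSON document per line (an assumption about the file layout that the slides do not state), the file can be read line by line. The function name read_tweets and the temporary demo file are hypothetical.

```python
import json
import os
import tempfile


def read_tweets(path):
    """Load a file containing one tweet JSON document per line."""
    tweets = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines between tweets
            tweets.append(json.loads(line))
    return tweets


# Demo with a temporary file standing in for collected data.
path = os.path.join(tempfile.mkdtemp(), "tweets-example.json")
with open(path, "w", encoding="utf-8") as f:
    f.write('{"id": 1, "text": "first"}\n\n{"id": 2, "text": "second"}\n')

tweets = read_tweets(path)
texts = [t["text"] for t in tweets]
```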

Child tweet JSON


tweet['user']['screen_name']

tweet['user']['name']

tweet['user']['created_at']
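
Chained lookups like these raise a KeyError when a key is missing. A more defensive variant, sketched here with a hypothetical helper named user_summary and the user values from the earlier slide, falls back to None:

```python
def user_summary(tweet):
    """Collect a few user fields, tolerating a missing user object."""
    user = tweet.get("user") or {}
    return {
        "screen_name": user.get("screen_name"),
        "name": user.get("name"),
        "location": user.get("location"),
    }


tweet = {
    "text": "example",
    "user": {
        "id": 661613,
        "name": "Alex Hanna, Data Witch",
        "screen_name": "alexhanna",
        "location": "Toronto, ON",
    },
}

summary = user_summary(tweet)
empty = user_summary({})
```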

Let's practice!
