Twitter Data Scraping Jupyter Notebook Text Instruction
Abstract
This tutorial walks you through installing Anaconda, GetOldTweets3, and details how to scrape
data and then manipulate it within Excel to prepare the dataset for analysis.
Twitter Data Scraping Tutorial Amy Larner Giroux
Table of Contents
Introduction
Install Anaconda with Jupyter Notebook
Install GetOldTweets3 Library
Launch the Data Scraper Jupyter Notebook
Run the Data Scraper
    Initialization of the Process
    Text-based Query
    Username Query
Open the Dataset in Excel and Prepare it for Analysis
    Unique ID Changes
    Splitting Date and Time
    Fixing the Mentions Error
Concluding Thoughts
Introduction
This Twitter Data Scraping tutorial will step you through the process of setting up a Python
environment and how to use the supplied Jupyter Notebook to collect tweet data. The instructions
and screen shots are shown in a Microsoft Windows environment, but the programs used also exist
for Mac and Linux operating systems.
The concept behind using this Python data scraper is to remove the need for you to register for a
Twitter developer account, and also to give you access to all past Twitter data. Many of the other
methods of tweet collection limit you to retrieving only the past week’s tweets and you would need
to plan far ahead and set up a retriever, such as TAGS, to collect data over time.
This tutorial and the methods it will teach you will allow you to retrieve historical Twitter data from
any point since Twitter’s inception. The following is a snapshot of the first tweets of the developers
when Twitter launched in 2006. These tweets were retrieved using the methods detailed in this
tutorial.
This tutorial has a set of outcomes, and the instructional material is organized around these
goals:
1. Install Anaconda with Jupyter Notebook
2. Install GetOldTweets3 library
3. Launch the data scraper Jupyter Notebook
4. Run the data scraper
5. Open the dataset in Excel and prepare it for analysis
Some of the steps to follow are embedded in the main text of the instructional material, while others
are in the captions of the figures. Bolded text has been used to draw your attention to items to do.
In some of the illustrations, a yellow cursor is visible to indicate what to select.
Installation Type – If you share your computer with other people and have admin
privileges, select All Users, otherwise leave the default. Click Next
Completion Screen – Uncheck the tutorial/learn more boxes and select Finish
This completes the “Installation of Anaconda and Jupyter Notebook” section of the tutorial.
The web app for the notebook will launch in your default browser and display folder navigation
options. It defaults to the Desktop and if you unzipped the notebook there you should see it in the
list. If you placed it elsewhere, click on the folder next to the Desktop link and navigate to the
correct location.
This completes the “Launch the Data Scraper Jupyter Notebook” section of the tutorial.
While a cell is running, the brackets to its left display an asterisk (*). Since this cell is only two
lines of code, you may blink and miss the asterisk. You will notice it more later when the data is
being scraped.
Text-based Query
As each of the appropriate code cells is executed, the code is loaded into memory and becomes
available to other code within the notebook. The next code cell in the notebook contains a function
that runs the query on the Twitter data and creates a CSV file that contains the results.
This function will be run (loaded into memory) before we execute the code that defines what our
search parameters will contain.
By looking at the comments within this code cell, you will see that 4 parameters are passed to the
function: text_query (the search terms), start_date (beginning of date range), end_date (ending of
date range) and a count. The count constrains the number of tweets requested through the Twitter
API. There is a variable limit for the number of tweets you can ask for in a single query. Some
documentation says you can retrieve up to 18,000 per query. Typically, I can retrieve about 10,000
every 15-20 minutes without the process failing.
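The notebook's actual function is not reproduced in this text, but its general shape (accept the four parameters, fetch the tweets, write the fields out as rows) can be sketched with the Python standard library. Everything below is illustrative, not the notebook's real code: the fetch_tweets placeholder stands in for the GetOldTweets3 call, and the field names simply mirror the CSV columns described later in the tutorial.

```python
import csv

# Hypothetical stand-in for the notebook's query call; the real function
# uses the GetOldTweets3 library to fetch tweets from Twitter.
def fetch_tweets(text_query, start_date, end_date, count):
    return [
        {"id": 1239000000000000001, "datetime": "2020-03-15 09:30:00",
         "text": "Example tweet", "user": "someuser", "to": "",
         "retweets": 3, "favorites": 7, "mentions": "@other",
         "hashtags": "#covid19"},
    ]

def text_query_to_csv(text_query, start_date, end_date, count, path):
    """Run the query and save the results as a CSV file."""
    tweets = fetch_tweets(text_query, start_date, end_date, count)
    fields = ["id", "datetime", "text", "user", "to",
              "retweets", "favorites", "mentions", "hashtags"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()  # column names become the CSV header row
        writer.writerows(tweets)

text_query_to_csv("#covid19 realDonaldTrump",
                  "2020-03-01", "2020-03-16", 10000, "tweets.csv")
```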
As you work with this scraper and find you have criteria that push against this limit, think about
breaking the queries up by day, by a single hashtag, etc. to reduce the size of the dataset retrieved in
a single query. You can combine multiple datasets in Excel afterwards as described later in this
tutorial.
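One way to act on this advice is to generate the single-day date ranges programmatically rather than by hand. The helper below is an illustration, not part of the notebook; it turns a range into per-day (since, until) pairs, with until exclusive, matching how Twitter's until_date behaves as described in the caveats that follow.

```python
from datetime import date, timedelta

def daily_ranges(since, until):
    """Yield (since, until) pairs covering one day each.
    Both arguments are 'YYYY-MM-DD' strings; until is exclusive."""
    day = date.fromisoformat(since)
    stop = date.fromisoformat(until)
    while day < stop:
        yield (day.isoformat(), (day + timedelta(days=1)).isoformat())
        day += timedelta(days=1)

# Three single-day windows: 13, 14, and 15 March 2020
pairs = list(daily_ranges("2020-03-13", "2020-03-16"))
```

Each pair can then be fed to the query cell one at a time, and the resulting CSV files combined in Excel afterwards.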
Text query code cell – Click within this cell to make it active and Run (Ctrl-Enter)
Once the text query code cell has been run, we will set the criteria for the tweet scraping and retrieve
the data.
In the code cell below, you will see the 4 parameters to set. In this example, I am retrieving tweets
that use the hashtag #covid19 and include the text realDonaldTrump. This combination query looks
for Trump in any context: username, tweet content, or mentions, regardless of whether someone
used the @ username or the # hashtag symbol.
The date parameters require some caveats.
1. The until_date (Twitter’s variable name) needs to be your end date + 1. In the example
below, the last date in the range that the API will send back will be 15 March 2020.
2. The Twitter API will return query results from the until_date back towards the since_date
(i.e., end date to start date). This means that if your query hits the count limit before the
query finishes traversing all of the dates in your range, you may get, in this example, say
5,000 tweets from 15 March 2020, 5,000 tweets from 14 March 2020, and none from the rest
of the days in the range. You will need to examine your dataset if you are querying over
multiple days to ensure that all your requested data is retrieved. If you do not get all the
expected data, run single days individually and combine the data afterwards.
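The second caveat can be checked programmatically once the CSV is in hand. The sketch below is not part of the notebook; it lists the days in a range that have no tweets at all, given the dates pulled from the Datetime column.

```python
from datetime import date, timedelta

def missing_days(tweet_dates, since, until):
    """Return the days in [since, until) that have no tweets.
    tweet_dates is an iterable of 'YYYY-MM-DD' strings taken from
    the date portion of the CSV's Datetime column."""
    have = set(tweet_dates)
    gaps = []
    day = date.fromisoformat(since)
    stop = date.fromisoformat(until)
    while day < stop:
        if day.isoformat() not in have:
            gaps.append(day.isoformat())
        day += timedelta(days=1)
    return gaps

# Mirroring the example above: only 14-15 March came back, so the
# first thirteen days of the range are missing.
gaps = missing_days(["2020-03-14", "2020-03-15"],
                    "2020-03-01", "2020-03-16")
```

Any day that appears in the gaps list should be re-queried individually and merged into the dataset.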
As this function takes time to run, you will notice the [*] displayed as the code executes, and you
will see that it is complete when the asterisk is replaced with a number. This number is the cell's
execution count, which increments each time a cell is run during the session.
You will also notice a CSV file will be saved to your desktop. The name of the file will be your
text_query and the number (in thousands) of the count you requested (not the actual count of
returned tweets).
If the process has an error, the asterisk will be replaced by a number, but your CSV file will not
appear. If you look below the code cell, you will find the error information. As mentioned
previously, Twitter constrains the number of queries. If you stay within the 10,000 count and a
15–20 minute interval between large queries (ones that approach the 10,000-tweet boundary), you
shouldn’t have any issues.
If you see this error, you are running too many large queries too quickly. Wait 20 minutes and try
again.
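If you want to automate the waiting, a generic retry pattern can wrap the query call. This is an illustrative sketch only; the notebook itself does not include retry logic, and the 20-minute default simply mirrors the advice above.

```python
import time

def run_with_retry(query_fn, *, wait_seconds=20 * 60, attempts=2):
    """Call query_fn(); if it raises, wait and try again.
    After the final attempt, re-raise the error."""
    for attempt in range(attempts):
        try:
            return query_fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(wait_seconds)  # back off before retrying
```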
Username Query
The remaining two code cells in the notebook are used for queries by Twitter username. To collect
the tweets of a specific user, the code cell that runs the query by username needs to be loaded into
memory, as we did for the text-based query function.
The difference here is that the parameter is the username instead of the text criteria.
The last cell in the notebook is the function to set the query by username parameters and run the
actual retrieval. Modify the username, count, and date parameters as necessary for your research.
Open the Dataset in Excel and Prepare it for Analysis
The table below describes the content of these columns and some comments outlining manipulation
we will do to make them more useful/visually understandable before saving the data as an Excel
spreadsheet.
Col. Content   Comments
A    count     A zero-based sequence number counting the tweets in the file
B    ID        The unique ID of the tweet. This column is useful for de-duplicating data that
               has been collected via multiple queries; the technique is outlined later in the
               tutorial. When the CSV is first opened, this column appears in exponential
               notation.
C    Datetime  The date/time of the tweet. Typically, I am interested in just the date; how to
               split this column into its constituent parts is explained later.
D    Text      The textual content of the tweet, without emoji (these were excluded)
E    User      The username of the account sending the tweet
F    To        The username(s) the tweet was sent to in a reply
G    Retweets  The number of times the tweet was retweeted
H    Favorites The number of times the tweet was liked
I    Mentions  Other Twitter usernames mentioned in the tweet. The column in the example
               currently shows an Excel error of #NAME?; this is explained below.
J    Hashtags  A list of the hashtags included in the tweet
Understanding Digital Culture: Humanist Lenses for Internet Research
NEH Summer Institute, University of Central Florida, 1–5 June 2020
Unique ID Changes
The ID is shown in exponential notation. To change it, highlight the column, click on the format
dropdown and select Number, and then use the decrease-decimal button to remove the two
decimal places.
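The exponential display arises because Excel parses the long ID as a floating-point number. A Python analogy (not Excel itself, and the ID value is made up) shows why a 19-digit ID cannot survive that conversion intact:

```python
# A 19-digit tweet ID exceeds 2**53, the range in which 64-bit
# floats can represent every integer exactly.
tweet_id = 1239000123456789123           # hypothetical tweet ID
as_float = float(tweet_id)               # what a spreadsheet stores internally
assert int(as_float) != tweet_id         # precision has been lost
assert tweet_id > 2**53                  # outside the exact-integer range
```

Excel similarly keeps at most 15 significant digits, so if exact IDs matter to you (for example, to rebuild tweet URLs), importing the column as Text is the safer route; for sorting and de-duplicating within one spreadsheet, the Number format described above is sufficient.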
When combining multiple datasets in one spreadsheet, you will want to remove any duplicates. Copy
and paste the rows into one spreadsheet as below:
Remove Duplicates
When the Remove Duplicates dialog is shown, make sure that the My data has headers is
checked, and then uncheck (Column A) as that column is not unique.
Results of Deduplication
If you have merged multiple files together, Column A is no longer unique. You may delete the
column. The figures throughout the rest of the tutorial still have Column A in place.
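If you prefer to de-duplicate outside Excel, the same first-occurrence-wins logic is easy to express in Python. This sketch is illustrative: the sample rows use a simplified subset of the real columns, with the tweet ID in column B (index 1), as in the table above.

```python
def deduplicate(rows):
    """Keep the first occurrence of each tweet ID (column index 1).
    rows is a list of CSV rows, including the header row."""
    header, body = rows[0], rows[1:]
    seen = set()
    unique = [header]
    for row in body:
        if row[1] not in seen:    # first time we see this tweet ID
            seen.add(row[1])
            unique.append(row)
    return unique

merged = [
    ["count", "id", "text"],
    ["0", "1001", "first"],
    ["1", "1002", "second"],
    ["0", "1001", "first"],       # duplicate from a second query
]
# clean keeps the header plus the two unique tweets
clean = deduplicate(merged)
```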
Splitting Date and Time
First, select the Datetime column and change it to a Text format using the dropdown in the Home
tab. This action will help to retain the YYYY-MM-DD format of the date portion of the field.
Next, insert two empty columns to the right of the Datetime column as shown above. The Text to
Column function needs these columns to hold the separated data.
On the Data tab, select the Text to Columns function.
The Text to Columns Wizard will step you through the process of splitting the Datetime field. The
wizard will default to Delimited, which is fine since the two fields are separated by a space.
Step 1 of Text to Column – make sure Delimited is selected and click Next
Step 2 of Text to Column – check the box for Space and click Next
The third screen of the Text to Column wizard will require multiple changes. We need to set the
destination to the two columns we inserted and set the data format for the new columns.
Step 3(b) – highlight the two new columns (D & E) and once the =$D:$E appears in the bar, click
the down arrow to return to the wizard
Next, we need to format the two new columns so that the date format is retained.
Step 3(c) – Use Ctrl-click to select both columns in the preview and then click
on Text
After completing the three parts of Step 3, click on Finish and enter column names for the two
new columns.
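What the wizard does can be mirrored in Python: the Datetime value has a single space between the date and the time, so one split on that space yields both parts. A minimal illustrative sketch:

```python
def split_datetime(value):
    """Split a 'YYYY-MM-DD HH:MM:SS' string into its date and time
    parts, mirroring Text to Columns with a space delimiter."""
    day, time_part = value.split(" ", 1)
    return day, time_part

# day is "2020-03-15", time_part is "09:30:00"
day, time_part = split_datetime("2020-03-15 09:30:00")
```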
Fixing the Mentions Error
The #NAME? error in the Mentions column appears because Excel tries to interpret the cell
contents as a formula. Use Find and Replace (Ctrl-H) to remove the offending equal sign from the
column.
Find and Replace the equal sign to fix the Mentions formula error
Concluding Thoughts
By following the methods outlined in this tutorial, you will be able to create a dataset of tweets that
can be used as input for a textual analysis program such as Orange (https://orange.biolab.si/).
Once Anaconda and Jupyter Notebook have been installed, creating new datasets is as simple as
changing the text or user query criteria in the notebook and running the code cells as needed. It is
not a complicated process and the installation of the software is straightforward.
Scraping Twitter data is a simple process:
1. Decide whether to query by text criteria or username.
2. Run the first cell in the notebook to load the libraries.
3. If using a text-based query:
a. Run the “Using a text-based search to collect tweets” code cell
b. Modify the search parameters in the “Text query process” code cell and then run the
cell
4. If using a username-based query:
a. Run the “Using a username-based search to collect tweets” code cell
b. Modify the search parameters in the “Username query process” code cell and then run
the cell
5. Modify your CSV file to prepare it for data analysis.