API Data Collection

The document outlines a plan to collect health-related discussions from Twitter and Reddit, storing the data in MongoDB. It details the steps for setting up API access, fetching data using Tweepy and PRAW, and automating the data collection process. Additionally, it emphasizes the importance of testing API rate limits, filtering irrelevant data, and setting up logging for errors.

Uploaded by

manfredbaraka33

Phase 1: Collecting Data from Twitter & Reddit

✅ Goal: Gather health-related discussions from Twitter (X) and Reddit and store them in
MongoDB.

Step 1: Set Up API Access

1.1 Get API Keys

You need API access for both Twitter (X) and Reddit:

📌 Twitter API (X)

1. Sign up at the Twitter Developer Portal.

2. Create a project and get API keys & Bearer Token.
3. Use Tweepy or httpx to interact with the API.

📌 Reddit API

1. Go to Reddit Apps.
2. Click "Create App" and select "Script".
3. Save the client_id, client_secret, user_agent, username, and password.
4. Use PRAW (Python Reddit API Wrapper) to fetch data.
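Once the keys are saved, one common way to keep them out of the source code is to read them from environment variables. The sketch below is only an illustration: the variable names and the load_credentials helper are assumptions, not something the Twitter or Reddit APIs prescribe.

```python
import os

# Hypothetical variable names; pick whatever fits your deployment.
REQUIRED_VARS = (
    "TWITTER_BEARER_TOKEN",
    "REDDIT_CLIENT_ID",
    "REDDIT_CLIENT_SECRET",
    "REDDIT_USER_AGENT",
)


def load_credentials(env=None):
    """Return the required credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned values can then be passed to the API clients in place of hardcoded strings such as "your_bearer_token".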

Step 2: Fetch Twitter Data

2.1 Install Dependencies
pip install tweepy pymongo

2.2 Fetch Tweets

Use Tweepy with Twitter's API v2. You can search for recent tweets containing health-related
keywords.

import tweepy
import pymongo

# Twitter API credentials
BEARER_TOKEN = "your_bearer_token"
client = tweepy.Client(bearer_token=BEARER_TOKEN)

# MongoDB connection
mongo_client = pymongo.MongoClient("mongodb://localhost:27017/")
db = mongo_client["health_sentiment"]
tweets_collection = db["tweets"]

# Fetch recent English tweets that match the keywords, excluding retweets
query = "(mental health OR vaccine OR covid OR anxiety) lang:en -is:retweet"
tweets = client.search_recent_tweets(
    query=query,
    max_results=100,
    tweet_fields=["created_at", "text", "author_id"],
)

# Store in MongoDB (tweets.data is None when the search returns no results)
for tweet in tweets.data or []:
    tweets_collection.insert_one({
        "platform": "twitter",
        "author_id": tweet.author_id,
        "text": tweet.text,
        "timestamp": tweet.created_at,
    })

print("Tweets saved successfully!")
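Because the script is meant to run repeatedly, consecutive runs can return the same tweets, and insert_one would store them again. A minimal sketch of one way to avoid duplicates is shown below, assuming the tweet ID is used as the MongoDB _id. The helper names tweet_to_document and save_unique are hypothetical; update_one with upsert=True and the $setOnInsert operator are standard PyMongo and MongoDB features.

```python
def tweet_to_document(tweet_id, author_id, text, created_at):
    """Build a MongoDB document keyed by the tweet ID."""
    return {
        "_id": str(tweet_id),
        "platform": "twitter",
        "author_id": author_id,
        "text": text,
        "timestamp": created_at,
    }


def save_unique(collection, documents):
    """Store each document only if its _id is new; return how many were added."""
    inserted = 0
    for doc in documents:
        # Everything except _id goes into $setOnInsert; _id comes from the filter.
        fields = {key: value for key, value in doc.items() if key != "_id"}
        result = collection.update_one(
            {"_id": doc["_id"]},
            {"$setOnInsert": fields},
            upsert=True,
        )
        if result.upserted_id is not None:
            inserted += 1
    return inserted
```

With Tweepy, each tweet object exposes its ID as tweet.id, so the loop above could build documents with tweet_to_document(tweet.id, tweet.author_id, tweet.text, tweet.created_at) and pass them to save_unique(tweets_collection, ...).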

2.3 Automate Data Collection

• Run the script every 15 minutes using a cron job or a FastAPI background task.
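If cron is not available, the same schedule can be approximated inside Python. The sketch below is a plain loop rather than a cron job or FastAPI task; run_every is a hypothetical helper, and the sleep parameter exists only so the loop can be exercised without waiting.

```python
import time


def run_every(interval_seconds, job, max_runs=None, sleep=time.sleep):
    """Call job() repeatedly, pausing interval_seconds between runs.

    A failing run is reported and skipped so the loop keeps going.
    Returns the number of runs that finished without an exception.
    """
    completed = 0
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            job()
            completed += 1
        except Exception as exc:  # keep collecting even if one run fails
            print(f"Run {runs + 1} failed: {exc}")
        runs += 1
        # Skip the pause after the final run.
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return completed
```

Calling run_every(15 * 60, collect_tweets) would repeat a collect_tweets function (a hypothetical wrapper around the fetch-and-store code) every 15 minutes. With cron, the equivalent schedule expression is */15 * * * *.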

Step 3: Fetch Reddit Data

3.1 Install Dependencies
pip install praw pymongo

3.2 Fetch Reddit Posts

Use PRAW to fetch the current "hot" posts from relevant subreddits.

import praw
import pymongo

# Reddit API credentials (read-only access needs no username or password)
reddit = praw.Reddit(
    client_id="your_client_id",
    client_secret="your_client_secret",
    user_agent="health_sentiment_scraper",
)

# MongoDB connection (same database as the Twitter script)
mongo_client = pymongo.MongoClient("mongodb://localhost:27017/")
db = mongo_client["health_sentiment"]
reddit_collection = db["reddit_posts"]

# Fetch posts from health-related subreddits
subreddits = ["health", "Coronavirus", "mentalhealth"]
for subreddit in subreddits:
    for post in reddit.subreddit(subreddit).hot(limit=50):
        reddit_collection.insert_one({
            "platform": "reddit",
            "subreddit": subreddit,
            # post.author is None when the account has been deleted
            "author": post.author.name if post.author else "unknown",
            "text": post.title + " " + post.selftext,
            "timestamp": post.created_utc,
        })

print("Reddit posts saved successfully!")

3.3 Automate Data Collection

• Schedule this script to run every hour for fresh data.

Step 4: Store & Structure Data in MongoDB

We store data in two collections:

1. tweets
2. reddit_posts

Schema Example
{
    "platform": "twitter",
    "author_id": "123456",
    "text": "The new vaccine rollout is promising!",
    "timestamp": "2025-03-31T10:00:00Z"
}
{
    "platform": "reddit",
    "subreddit": "mentalhealth",
    "author": "user123",
    "text": "I've been feeling anxious about the vaccine lately...",
    "timestamp": 1711862400
}
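Note that the two examples store timestamps differently: the tweet uses an ISO 8601 string, while the Reddit post uses Unix epoch seconds (PRAW's created_utc). If both collections will be queried together, it may help to convert to one representation before inserting. The sketch below is one possible approach; normalize_timestamp is a hypothetical helper, and treating naive datetimes as UTC is an assumption.

```python
from datetime import datetime, timezone


def normalize_timestamp(value):
    """Return a timezone-aware UTC datetime from epoch seconds, an ISO 8601 string, or a datetime."""
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption of this sketch: naive datetimes are already in UTC.
            return value.replace(tzinfo=timezone.utc)
        return value.astimezone(timezone.utc)
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    if isinstance(value, str):
        # fromisoformat() accepts a trailing "Z" only from Python 3.11 on,
        # so rewrite it as an explicit UTC offset first.
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
        return normalize_timestamp(parsed)
    raise TypeError(f"Unsupported timestamp type: {type(value).__name__}")
```

Storing the result in the "timestamp" field of both collections would let MongoDB sort and range-filter tweets and Reddit posts on the same scale.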

Next Steps
🔹 Test API Rate Limits – Ensure you don’t get blocked.
🔹 Filter Irrelevant Data – Remove spammy or promotional posts.
🔹 Set Up Logging – Save API errors and failed requests.
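The filtering and logging steps above can be sketched roughly as follows. This is an illustration only: the spam phrases, the minimum length, and the helper names is_relevant and safe_call are assumptions to tune for real data. For rate limits on the Twitter side, tweepy.Client also accepts wait_on_rate_limit=True, which makes it wait until the limit resets instead of raising an error.

```python
import logging

# Logs go to stderr here; pass filename="..." to basicConfig to write a file instead.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger("collector")

# Hypothetical phrases that often mark promotional posts; tune for your data.
SPAM_MARKERS = ("buy now", "discount code", "promo code", "click here")


def is_relevant(text, min_length=20):
    """Return False for very short posts or posts containing a spam marker."""
    lowered = text.lower()
    if len(lowered.strip()) < min_length:
        return False
    return not any(marker in lowered for marker in SPAM_MARKERS)


def safe_call(func, *args, **kwargs):
    """Run an API call; log the error with its traceback and return None if it raises."""
    try:
        return func(*args, **kwargs)
    except Exception:
        name = getattr(func, "__name__", repr(func))
        logger.exception("API call %s failed", name)
        return None
```

A collection loop could then call is_relevant(text) before each insert and wrap each request in safe_call, skipping the batch when the result is None.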

