Host A Scheduled Scraper On AWS As An API Endpoint - Amen


Scheduled scraper with Flask as an API endpoint:

● Python libraries are among the most widely used web scraping tools available today.

● Beautiful Soup is the most popular Python web scraping library.

● We'll build a web scraper app with Flask, a lightweight Python web framework.
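To make the parsing step concrete, here is a minimal Beautiful Soup sketch. The HTML snippet is invented for illustration, but the `story-link` class matches the markup scraped later in this tutorial:

```python
from bs4 import BeautifulSoup

# Invented HTML standing in for a downloaded page.
html = '<html><body><a class="story-link" href="/post-1">First story</a></body></html>'
soup = BeautifulSoup(html, "html.parser")

# Find the first anchor tag carrying the "story-link" class.
link = soup.find("a", class_="story-link")
print(link.get("href"))  # -> /post-1
print(link.text)         # -> First story
```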

Step 1: Setup the environment

pip install flask flask_sqlalchemy requests beautifulsoup4 newspaper3k schedule

Step 2: Create a Flask app

app.py

from flask import Flask, render_template
from flask_sqlalchemy import SQLAlchemy
import os

# Store the SQLite database next to this file.
project_dir = os.path.dirname(os.path.abspath(__file__))
database_file = "sqlite:///{}".format(os.path.join(project_dir, "news_scrape.db"))

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = database_file
app.config["SQLALCHEMY_TRACK_MODIFICATIONS"] = False
app.config["SECRET_KEY"] = "newisthesecretofsecretscrape"
db = SQLAlchemy(app)

# One row per scraped article.
class Articlelist(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.Text)
    author = db.Column(db.Text)
    summary = db.Column(db.Text)

@app.route('/')
def index():
    # Render all scraped articles; requires a templates/index.html file.
    articles = Articlelist.query.all()
    return render_template("index.html", articles=articles)

if __name__ == "__main__":
    app.run()
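The index route above renders an HTML template, but since the goal is to expose the scraper as an API endpoint, a JSON route is worth sketching as well. This is a minimal, self-contained sketch: the `/api/articles` path and the in-memory `ARTICLES` list are assumptions for illustration; in the real app the rows would come from `Articlelist.query.all()`:

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical rows standing in for Articlelist.query.all().
ARTICLES = [
    {"id": 1, "title": "Sample headline", "author": "Jane Doe",
     "summary": "A short summary."},
]

@app.route("/api/articles")
def api_articles():
    # Serialize the article list as a JSON array.
    return jsonify(ARTICLES)

if __name__ == "__main__":
    app.run()
```

With the app running, requesting /api/articles returns the articles as JSON, which is what makes the scraper consumable as an API rather than only as a rendered page.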

Step 3: Create the Scrape function

scrape.py
Create a scraping function that fetches articles from two news sites, https://thehackernews.com and https://www.ehackingnews.com.
We'll use the Requests library to send HTTP requests and Beautiful Soup to parse the HTML.

from bs4 import BeautifulSoup
from newspaper import Article
import requests
import sqlite3
import schedule
import time

connection = sqlite3.connect('news_scrape.db')
cursor = connection.cursor()

class NewsArticle:
    def __init__(self, news_urls):
        self.news_urls = news_urls
        for news_url in self.news_urls:
            # Download and parse each article with newspaper3k.
            article = Article(news_url)
            article.download()
            article.parse()
            self.title = article.title
            # Some articles have no author metadata, so guard the lookup.
            self.author = article.authors[0] if article.authors else "Unknown"
            article.nlp()  # summarization; requires the NLTK "punkt" data
            self.summary = article.summary
            cursor.execute("""INSERT INTO Articlelist VALUES (NULL, ?, ?, ?)""",
                           (self.title, self.author, self.summary))
            connection.commit()

def scrape_news():
    # Rebuild the table from scratch on every run.
    cursor.execute("DROP TABLE IF EXISTS Articlelist")
    print("Dropped Table")
    print("Creating Table")
    cursor.execute(
        """CREATE TABLE Articlelist(
            id INTEGER PRIMARY KEY,
            title TEXT,
            author TEXT,
            summary TEXT
        )
        """)
    connection.commit()
    print("Created Table")

    print("SCRAPING SITE ONE")
    site1_content = requests.get('https://thehackernews.com')
    site1_data = site1_content.text
    soup1 = BeautifulSoup(site1_data, 'html.parser')
    news_urls = []
    story_links1 = soup1.find_all('a', class_="story-link")
    for story_link1 in story_links1:
        url = story_link1.get('href')
        news_urls.append(url)
    site1 = NewsArticle(news_urls)
    news_urls.clear()

    print("SCRAPING SITE TWO")
    site2_content = requests.get('https://www.ehackingnews.com/search/label/Cyber%20Crime?max-results=7')
    site2_data = site2_content.text
    soup2 = BeautifulSoup(site2_data, 'html.parser')
    blog_posts = soup2.find_all('article', class_="home-post")
    for blog_post in blog_posts:
        url = blog_post.h2.a.get('href')
        news_urls.append(url)
    site2 = NewsArticle(news_urls)
    print("DONE")

#schedule.every(5).minutes.do(scrape_news)
schedule.every().day.at("00:00").do(scrape_news)  # "24:00" is not a valid time
while True:
    schedule.run_pending()
    time.sleep(1)

Step 4: Testing the web scraper

Finally, test the scraper end to end. Run `python scrape.py` in one terminal so the scheduler can populate the database, then start the Flask application with `python app.py`. Open https://localhost:5000 in your browser to see the latest news headlines displayed in your app.
