
Retrieving Data from the Web

Retrieving data from the web in Python typically involves sending HTTP
requests to web servers and handling the responses. Extracting specific
data from the returned pages is generally called web scraping: it focuses
on retrieving useful content (like product prices, reviews, news headlines,
etc.) from HTML pages and saving it in a structured format such as CSV,
Excel, or a database.

Note: Web scraping comes under web content mining, which is a type of
web mining.

Using requests to Retrieve Web Data

The requests module in Python is used to send HTTP requests (GET, POST,
etc.) to web pages and to retrieve the response content (HTML, JSON, etc.).
Its most common methods and attributes:

1. requests.get(url) # Sends a GET request.
2. requests.post(url, data) # Sends a POST request (see the sketch after this list).
3. response.text # Returns the content of the response as Unicode text.
4. response.json() # Converts a JSON response to a Python dictionary.
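For a quick look at the POST side, here is a minimal sketch; it assumes
httpbin.org, a public request-echo service used purely for demonstration:

import requests

# httpbin.org/post simply echoes the submitted form data back as JSON
url = "https://httpbin.org/post"
response = requests.post(url, data={"name": "test", "value": "42"})

print("Status Code:", response.status_code)
print("Echoed form data:", response.json()["form"])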

Retrieving HTML from a Web Page


import requests

url = "https://google.com"
response = requests.get(url)

print("Status Code:", response.status_code)
print("Content:", response.text[:500])  # print the first 500 characters
• requests.get() sends a GET request.
• response.text contains the HTML content of the page (a status-check sketch follows).
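Before using response.text, it is worth confirming that the request
actually succeeded. A minimal sketch using raise_for_status(), a standard
requests method:

import requests

response = requests.get("https://google.com")

# Raises requests.exceptions.HTTPError for 4xx/5xx status codes
response.raise_for_status()
print("OK, received", len(response.text), "characters")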

Parsing HTML using BeautifulSoup (Web Scraping)

The BeautifulSoup class is available in the bs4 package and is used to parse and
navigate HTML or XML content. With it we can easily extract specific data from web pages.

1. BeautifulSoup(html, "html.parser") # Creates a parse tree from HTML.
2. soup.find(tag) # Finds the first occurrence of a tag.
3. soup.find_all(tag) # Finds all occurrences of a tag.
4. .text # Gets the text content inside a tag.
5. .attrs # Accesses tag attributes.

from bs4 import BeautifulSoup
import requests

url = "https://google.com"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Extract the title
title = soup.title.string
print("Page Title:", title)

# Extract all links
for link in soup.find_all('a'):
    print(link.get('href'))

• BeautifulSoup helps parse and navigate HTML/XML.
• We can search for tags, classes, IDs, and more (see the sketch below).
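To illustrate searching by class and saving results in a structured format,
here is a minimal sketch that scrapes quotes.toscrape.com, a public practice
site; the URL and the quote/text/author class names come from that site, not
from this document:

import csv
import requests
from bs4 import BeautifulSoup

url = "https://quotes.toscrape.com"
soup = BeautifulSoup(requests.get(url).text, "html.parser")

# Search by CSS class with the class_ keyword argument
rows = []
for quote in soup.find_all("div", class_="quote"):
    text = quote.find("span", class_="text").text
    author = quote.find("small", class_="author").text
    rows.append([text, author])

# Save the extracted data as CSV
with open("quotes.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["quote", "author"])
    writer.writerows(rows)

print("Saved", len(rows), "quotes to quotes.csv")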

Accessing JSON Data from a Public API

Many websites provide data through REST APIs in JSON format.

Consuming JSON from a Web API

import requests

url = "https://api.coindesk.com/v1/bpi/currentprice.json"
response = requests.get(url)
data = response.json()

# Access the Bitcoin price in USD
price_usd = data['bpi']['USD']['rate']
print("Bitcoin Price (USD):", price_usd)

• response.json() converts JSON to a Python dictionary.
• Useful for APIs that return structured data.

Downloading a File (like an Image or CSV)

Download an Image
import requests

img_url = "https://www.example.com/image.jpg"
response = requests.get(img_url)

with open("image.jpg", "wb") as f:
    f.write(response.content)

print("Image downloaded!")

• response.content is used for binary data (images, files); for large files, see the streaming sketch below.
• Open the file in 'wb' (write binary) mode to save it.
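For large files, holding the whole response in memory via response.content
can be wasteful. A minimal sketch using requests' streaming mode
(stream=True and iter_content() are standard requests features; the URL is
a placeholder):

import requests

file_url = "https://www.example.com/large_file.zip"  # placeholder URL

with requests.get(file_url, stream=True) as response:
    response.raise_for_status()
    with open("large_file.zip", "wb") as f:
        # Write the body in 8 KB chunks instead of loading it all at once
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)

print("File downloaded!")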

YouTube: Fetching Video Details Using YouTube Data API

Objective: Retrieve details of a specific YouTube video.

Steps:

Set Up Google Cloud Project:

Create a project on Google Cloud Console.

Enable the YouTube Data API v3.

Obtain an API key.

Install Required Libraries:

pip install google-api-python-client

Python Code to Fetch Video Details:

from googleapiclient.discovery import build

api_key = 'YOUR_API_KEY'
youtube = build('youtube', 'v3', developerKey=api_key)

request = youtube.videos().list(
    part='snippet,contentDetails,statistics',
    id='VIDEO_ID'
)
response = request.execute()

video = response['items'][0]
print(f"Title: {video['snippet']['title']}")
print(f"Views: {video['statistics']['viewCount']}")

• Introduction to REST APIs and JSON responses.
• Understanding API quotas and limits (see the error-handling sketch below).
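Exceeding the quota surfaces as an exception. A minimal sketch of catching
it with HttpError from the same google-api-python-client library
(interpreting a 403 as a quota problem is an assumption for illustration):

from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

youtube = build('youtube', 'v3', developerKey='YOUR_API_KEY')

try:
    response = youtube.videos().list(part='snippet', id='VIDEO_ID').execute()
    print(response['items'][0]['snippet']['title'])
except HttpError as e:
    # A 403 often means the daily quota is exhausted or the key is invalid
    print("API error:", e.resp.status, e)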

Facebook: Automating Posts Using Facebook Graph API

Objective: Automate posting to a Facebook page.

Steps:

Set Up Facebook Developer Account:

Create an app on Facebook Developers.

Obtain a Page Access Token.

Install Required Libraries:

pip install requests

Python Code to Post on Facebook Page:

import requests

page_id = 'YOUR_PAGE_ID'
access_token = 'YOUR_PAGE_ACCESS_TOKEN'
message = 'Hello, this is a test post from Python!'

url = f'https://graph.facebook.com/{page_id}/feed'
payload = {
    'message': message,
    'access_token': access_token
}

response = requests.post(url, data=payload)
print(response.json())
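The Graph API also reports failures inside the JSON body via an 'error'
key (its documented error envelope), so it is worth checking for it.
A minimal self-contained sketch:

import requests

page_id = 'YOUR_PAGE_ID'
access_token = 'YOUR_PAGE_ACCESS_TOKEN'

url = f'https://graph.facebook.com/{page_id}/feed'
payload = {'message': 'Hello from Python!', 'access_token': access_token}

result = requests.post(url, data=payload).json()
if 'error' in result:
    # e.g. an expired token or a missing pages_manage_posts permission
    print("Graph API error:", result['error'].get('message'))
else:
    print("Posted successfully, post id:", result.get('id'))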

Twitter: Fetching Tweets Using Tweepy

Objective: Retrieve recent tweets from a user's timeline.

Steps:

Set Up Twitter Developer Account:

Apply for a developer account on Twitter Developer Platform.

Obtain API keys and tokens.

Install Required Libraries:

pip install tweepy

Python Code to Fetch Tweets:

import tweepy

consumer_key = 'YOUR_CONSUMER_KEY'
consumer_secret = 'YOUR_CONSUMER_SECRET'
access_token = 'YOUR_ACCESS_TOKEN'
access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'

auth = tweepy.OAuth1UserHandler(consumer_key, consumer_secret,
                                access_token, access_token_secret)
api = tweepy.API(auth)

tweets = api.user_timeline(screen_name='twitter', count=5)
for tweet in tweets:
    print(f"{tweet.user.name} said: {tweet.text}")

LinkedIn: Retrieving Profile Information

Objective: Access basic profile information using LinkedIn API.

Note: LinkedIn's API access is limited. For educational purposes, consider
using Proxycurl as an alternative.

Steps:

Set Up LinkedIn Developer Account:

Create an app on LinkedIn Developers.

Obtain Client ID and Client Secret.

Install Required Libraries:

pip install requests


Python Code to Fetch Profile Data:

import requests

access_token = 'YOUR_ACCESS_TOKEN'
url = 'https://api.linkedin.com/v2/me'

headers = {
    'Authorization': f'Bearer {access_token}'
}

response = requests.get(url, headers=headers)
profile = response.json()
print(profile)
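To pull individual fields out of the response, a minimal sketch;
localizedFirstName and localizedLastName are standard /v2/me fields,
though their presence depends on the scopes granted to your token:

import requests

access_token = 'YOUR_ACCESS_TOKEN'
headers = {'Authorization': f'Bearer {access_token}'}

profile = requests.get('https://api.linkedin.com/v2/me', headers=headers).json()

# Field availability depends on the app's granted scopes
print(profile.get('localizedFirstName'), profile.get('localizedLastName'))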

Instagram: Accessing User Media Using Instagram Graph API

Objective: Retrieve recent media posts from an Instagram Business Account.

Steps:

Set Up Facebook Developer Account:

Create an app on Facebook Developers.

Link your Instagram Business Account.

Install Required Libraries:

pip install requests


Python Code to Fetch Instagram Media:

import requests

user_id = 'YOUR_USER_ID'
access_token = 'YOUR_ACCESS_TOKEN'
url = f'https://graph.instagram.com/{user_id}/media'

params = {
    'fields': 'id,caption,media_type,media_url,thumbnail_url,permalink,timestamp',
    'access_token': access_token
}

response = requests.get(url, params=params)
media = response.json()
for item in media['data']:
    print(item['media_url'])
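The media endpoint is paginated: when more posts exist, the response
includes a paging.next URL. A minimal pagination sketch under that
assumption:

import requests

user_id = 'YOUR_USER_ID'
access_token = 'YOUR_ACCESS_TOKEN'

url = f'https://graph.instagram.com/{user_id}/media'
params = {'fields': 'id,media_url', 'access_token': access_token}

# Follow paging.next links until every page of media has been fetched
while url:
    page = requests.get(url, params=params).json()
    for item in page.get('data', []):
        print(item.get('media_url'))
    url = page.get('paging', {}).get('next')
    params = None  # the next URL already embeds the query parameters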
