
Retrieving Data from the Web

Retrieving data from the web in Python typically involves sending HTTP
requests to web servers and handling the responses. Extracting specific
data from the returned pages is generally called web scraping: it focuses
on retrieving useful content (like product prices, reviews, news headlines,
etc.) from HTML pages and saving it in a structured format such as CSV,
Excel, or a database.

Note: Web scraping comes under web content mining, which is a type of
web mining.

Using requests to Retrieve Web Data

The requests module in Python is used to send HTTP requests (GET, POST,
etc.) to web pages and to retrieve the response content (HTML, JSON, etc.).
Its most common methods and attributes:

1. requests.get(url) # Sends a GET request.
2. requests.post(url, data) # Sends a POST request (see the sketch after this list).
3. response.text # Returns the content of the response as Unicode text.
4. response.json() # Converts a JSON response to a Python dictionary.
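For a quick look at the POST side, here is a minimal sketch; it assumes
httpbin.org, a public request-echo service used purely for demonstration:

import requests

# httpbin.org/post simply echoes the submitted form data back as JSON
url = "https://httpbin.org/post"
response = requests.post(url, data={"name": "test", "value": "42"})

print("Status Code:", response.status_code)
print("Echoed form data:", response.json()["form"])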

Retrieving HTML from a Web Page


import requests

url = "https://google.com"
response = requests.get(url)

print("Status Code:", response.status_code)
print("Content:", response.text[:500])  # print the first 500 characters
• requests.get() sends a GET request.
• response.text contains the HTML content of the page (a status-check sketch follows).
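Before using response.text, it is worth confirming that the request
actually succeeded. A minimal sketch using raise_for_status(), a standard
requests method:

import requests

response = requests.get("https://google.com")

# Raises requests.exceptions.HTTPError for 4xx/5xx status codes
response.raise_for_status()
print("OK, received", len(response.text), "characters")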

Parsing HTML using BeautifulSoup (Web Scraping)

The BeautifulSoup class is available in the bs4 package and is used to parse and
navigate HTML or XML content. With it we can easily extract specific data from web pages.

1. BeautifulSoup(html, "html.parser") # Creates a parse tree from HTML.
2. soup.find(tag) # Finds the first occurrence of a tag.
3. soup.find_all(tag) # Finds all occurrences of a tag.
4. .text # Gets the text content inside a tag.
5. .attrs # Accesses tag attributes.

from bs4 import BeautifulSoup
import requests

url = "https://google.com"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Extract the title
title = soup.title.string
print("Page Title:", title)

# Extract all links
for link in soup.find_all('a'):
    print(link.get('href'))

• BeautifulSoup helps parse and navigate HTML/XML.
• We can search for tags, classes, IDs, and more (see the sketch below).
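To illustrate searching by class and saving results in a structured format,
here is a minimal sketch that scrapes quotes.toscrape.com, a public practice
site; the URL and the quote/text/author class names come from that site, not
from this document:

import csv
import requests
from bs4 import BeautifulSoup

url = "https://quotes.toscrape.com"
soup = BeautifulSoup(requests.get(url).text, "html.parser")

# Search by CSS class with the class_ keyword argument
rows = []
for quote in soup.find_all("div", class_="quote"):
    text = quote.find("span", class_="text").text
    author = quote.find("small", class_="author").text
    rows.append([text, author])

# Save the extracted data as CSV
with open("quotes.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["quote", "author"])
    writer.writerows(rows)

print("Saved", len(rows), "quotes to quotes.csv")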

Accessing JSON Data from a Public API

Many websites provide data through REST APIs in JSON format.

Consuming JSON from a Web API

import requests

url = "https://api.coindesk.com/v1/bpi/currentprice.json"
response = requests.get(url)
data = response.json()

# Access the Bitcoin price in USD
price_usd = data['bpi']['USD']['rate']
print("Bitcoin Price (USD):", price_usd)

• response.json() converts JSON to a Python dictionary.
• Useful for APIs that return structured data.

Downloading a File (like an Image or CSV)

Download an Image
import requests

img_url = "https://www.example.com/image.jpg"
response = requests.get(img_url)

with open("image.jpg", "wb") as f:
    f.write(response.content)

print("Image downloaded!")

• response.content is used for binary data (images, files); for large files, see the streaming sketch below.
• Open the file in 'wb' (write binary) mode to save it.
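For large files, holding the whole response in memory via response.content
can be wasteful. A minimal sketch using requests' streaming mode
(stream=True and iter_content() are standard requests features; the URL is
a placeholder):

import requests

file_url = "https://www.example.com/large_file.zip"  # placeholder URL

with requests.get(file_url, stream=True) as response:
    response.raise_for_status()
    with open("large_file.zip", "wb") as f:
        # Write the body in 8 KB chunks instead of loading it all at once
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)

print("File downloaded!")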

YouTube: Fetching Video Details Using YouTube Data API

Objective: Retrieve details of a specific YouTube video.

Steps:

Set Up Google Cloud Project:

Create a project on Google Cloud Console.

Enable the YouTube Data API v3.

Obtain an API key.

Install Required Libraries:

pip install google-api-python-client

Python Code to Fetch Video Details:

from googleapiclient.discovery import build

api_key = 'YOUR_API_KEY'
youtube = build('youtube', 'v3', developerKey=api_key)

request = youtube.videos().list(
    part='snippet,contentDetails,statistics',
    id='VIDEO_ID'
)
response = request.execute()

video = response['items'][0]
print(f"Title: {video['snippet']['title']}")
print(f"Views: {video['statistics']['viewCount']}")

• Introduction to REST APIs and JSON responses.
• Understanding API quotas and limits (see the error-handling sketch below).
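Exceeding the quota surfaces as an exception. A minimal sketch of catching
it with HttpError from the same google-api-python-client library
(interpreting a 403 as a quota problem is an assumption for illustration):

from googleapiclient.discovery import build
from googleapiclient.errors import HttpError

youtube = build('youtube', 'v3', developerKey='YOUR_API_KEY')

try:
    response = youtube.videos().list(part='snippet', id='VIDEO_ID').execute()
    print(response['items'][0]['snippet']['title'])
except HttpError as e:
    # A 403 often means the daily quota is exhausted or the key is invalid
    print("API error:", e.resp.status, e)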

Facebook: Automating Posts Using Facebook Graph API

Objective: Automate posting to a Facebook page.

Steps:

Set Up Facebook Developer Account:

Create an app on Facebook Developers.

Obtain a Page Access Token.

Install Required Libraries:

pip install requests

Python Code to Post on Facebook Page:

import requests

page_id = 'YOUR_PAGE_ID'
access_token = 'YOUR_PAGE_ACCESS_TOKEN'
message = 'Hello, this is a test post from Python!'

url = f'https://graph.facebook.com/{page_id}/feed'
payload = {
    'message': message,
    'access_token': access_token
}

response = requests.post(url, data=payload)
print(response.json())
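The Graph API also reports failures inside the JSON body via an 'error'
key (its documented error envelope), so it is worth checking for it.
A minimal self-contained sketch:

import requests

page_id = 'YOUR_PAGE_ID'
access_token = 'YOUR_PAGE_ACCESS_TOKEN'

url = f'https://graph.facebook.com/{page_id}/feed'
payload = {'message': 'Hello from Python!', 'access_token': access_token}

result = requests.post(url, data=payload).json()
if 'error' in result:
    # e.g. an expired token or a missing pages_manage_posts permission
    print("Graph API error:", result['error'].get('message'))
else:
    print("Posted successfully, post id:", result.get('id'))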

Twitter: Fetching Tweets Using Tweepy

Objective: Retrieve recent tweets from a user's timeline.

Steps:

Set Up Twitter Developer Account:

Apply for a developer account on Twitter Developer Platform.

Obtain API keys and tokens.

Install Required Libraries:

pip install tweepy

Python Code to Fetch Tweets:

import tweepy

consumer_key = 'YOUR_CONSUMER_KEY'
consumer_secret = 'YOUR_CONSUMER_SECRET'
access_token = 'YOUR_ACCESS_TOKEN'
access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'

auth = tweepy.OAuth1UserHandler(consumer_key, consumer_secret,
                                access_token, access_token_secret)
api = tweepy.API(auth)

tweets = api.user_timeline(screen_name='twitter', count=5)
for tweet in tweets:
    print(f"{tweet.user.name} said: {tweet.text}")

LinkedIn: Retrieving Profile Information

Objective: Access basic profile information using LinkedIn API.

Note: LinkedIn's API access is limited. For educational purposes, consider
using Proxycurl as an alternative.

Steps:

Set Up LinkedIn Developer Account:

Create an app on LinkedIn Developers.

Obtain Client ID and Client Secret.

Install Required Libraries:

pip install requests


Python Code to Fetch Profile Data:

import requests

access_token = 'YOUR_ACCESS_TOKEN'
url = 'https://api.linkedin.com/v2/me'

headers = {
    'Authorization': f'Bearer {access_token}'
}

response = requests.get(url, headers=headers)
profile = response.json()
print(profile)
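To pull individual fields out of the response, a minimal sketch;
localizedFirstName and localizedLastName are standard /v2/me fields,
though their presence depends on the scopes granted to your token:

import requests

access_token = 'YOUR_ACCESS_TOKEN'
headers = {'Authorization': f'Bearer {access_token}'}

profile = requests.get('https://api.linkedin.com/v2/me', headers=headers).json()

# Field availability depends on the app's granted scopes
print(profile.get('localizedFirstName'), profile.get('localizedLastName'))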

Instagram: Accessing User Media Using Instagram Graph API

Objective: Retrieve recent media posts from an Instagram Business Account.

Steps:

Set Up Facebook Developer Account:

Create an app on Facebook Developers.

Link your Instagram Business Account.

Install Required Libraries:

pip install requests


Python Code to Fetch Instagram Media:

import requests

user_id = 'YOUR_USER_ID'
access_token = 'YOUR_ACCESS_TOKEN'
url = f'https://graph.instagram.com/{user_id}/media'

params = {
    'fields': 'id,caption,media_type,media_url,thumbnail_url,permalink,timestamp',
    'access_token': access_token
}

response = requests.get(url, params=params)
media = response.json()
for item in media['data']:
    print(item['media_url'])
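The media endpoint is paginated: when more posts exist, the response
includes a paging.next URL. A minimal pagination sketch under that
assumption:

import requests

user_id = 'YOUR_USER_ID'
access_token = 'YOUR_ACCESS_TOKEN'

url = f'https://graph.instagram.com/{user_id}/media'
params = {'fields': 'id,media_url', 'access_token': access_token}

# Follow paging.next links until every page of media has been fetched
while url:
    page = requests.get(url, params=params).json()
    for item in page.get('data', []):
        print(item.get('media_url'))
    url = page.get('paging', {}).get('next')
    params = None  # the next URL already embeds the query parameters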
