Retrieving Data From The Web
Retrieving data from the web in Python typically involves sending HTTP
requests to web servers and handling the responses. This activity is
commonly called web scraping: the process of extracting specific data from
websites. It focuses on retrieving useful content (such as product prices,
reviews, or news headlines) from HTML pages and saving it in a structured
format such as CSV, Excel, or a database.
Note: Web scraping falls under web content mining, which is a type of
web mining.
The BeautifulSoup class is available in the bs4 package and is used to parse and navigate
HTML or XML content. With it we can easily extract specific data from web pages.
import requests
from bs4 import BeautifulSoup

url = "https://google.com"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
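As a self-contained illustration, BeautifulSoup can pull specific elements out of a page without any network access. The HTML string, tag names, and class names below are made up for the example:

```python
from bs4 import BeautifulSoup

# A small hypothetical HTML page standing in for a downloaded response.text
html = """
<html>
  <head><title>Demo Shop</title></head>
  <body>
    <p class="price">$9.99</p>
    <p class="price">$4.50</p>
    <a href="/item/1">Item 1</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")
title = soup.title.string                                           # text of <title>
prices = [p.get_text() for p in soup.find_all("p", class_="price")]  # all price tags
links = [a["href"] for a in soup.find_all("a")]                      # all link targets

print(title)    # Demo Shop
print(prices)   # ['$9.99', '$4.50']
print(links)    # ['/item/1']
```

The same `find_all` / `get_text` calls work unchanged on `response.text` fetched with `requests`.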
Fetch JSON Data from an API
import requests

url = "https://api.coindesk.com/v1/bpi/currentprice.json"
response = requests.get(url)
data = response.json()   # parse the JSON body into a Python dict
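The dictionary returned by `response.json()` can be navigated like any Python dict. The sample payload below is a made-up fragment shaped like the CoinDesk response (the numeric value is invented), so no network call is needed:

```python
import json

# Hypothetical fragment shaped like the currentprice.json response
raw = '{"bpi": {"USD": {"code": "USD", "rate_float": 64321.55}}}'

data = json.loads(raw)                  # same result as response.json()
usd = data["bpi"]["USD"]                # drill into the nested structure
print(usd["code"], usd["rate_float"])   # USD 64321.55
```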
Download an Image
import requests

img_url = "https://www.example.com/image.jpg"
response = requests.get(img_url)
with open("image.jpg", "wb") as f:
    f.write(response.content)   # response.content holds the raw bytes
print("Image downloaded!")
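For large files it is better not to hold the whole body in memory at once. The helper below sketches the chunked-write pattern that `requests` supports via `stream=True` and `iter_content()`; here a BytesIO object stands in for the network stream so the sketch runs offline, and the function name is made up for the example:

```python
import io

def save_stream(stream, path, chunk_size=8192):
    """Write a file-like stream to disk one chunk at a time."""
    written = 0
    with open(path, "wb") as f:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:          # empty read means end of stream
                break
            f.write(chunk)
            written += len(chunk)
    return written

# Stand-in for the raw response body of a streamed download
fake_download = io.BytesIO(b"\x89PNG" + b"\x00" * 20000)
n = save_stream(fake_download, "image.png")
print(n)  # 20004
```

With a real request, `requests.get(img_url, stream=True)` plus iterating `response.iter_content(chunk_size=8192)` achieves the same effect.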
Fetch Video Details with the YouTube Data API
from googleapiclient.discovery import build

api_key = 'YOUR_API_KEY'
youtube = build('youtube', 'v3', developerKey=api_key)
request = youtube.videos().list(
    part='snippet,contentDetails,statistics',
    id='VIDEO_ID'
)
response = request.execute()
video = response['items'][0]
print(f"Title: {video['snippet']['title']}")
print(f"Views: {video['statistics']['viewCount']}")
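The dictionary lookups above assume the shape of the videos().list response. A made-up payload with that shape shows what the code is navigating (all field values are invented):

```python
# Hypothetical response shaped like youtube.videos().list(...).execute()
response = {
    "items": [
        {
            "snippet": {"title": "Sample Video"},
            "contentDetails": {"duration": "PT4M13S"},
            "statistics": {"viewCount": "1024"},
        }
    ]
}

video = response["items"][0]                          # first (only) matching video
print(f"Title: {video['snippet']['title']}")          # Title: Sample Video
print(f"Views: {video['statistics']['viewCount']}")   # Views: 1024
```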
Post to a Facebook Page with the Graph API
import requests

page_id = 'YOUR_PAGE_ID'
access_token = 'YOUR_PAGE_ACCESS_TOKEN'
message = 'Hello, this is a test post from Python!'
url = f'https://graph.facebook.com/{page_id}/feed'
payload = {
    'message': message,
    'access_token': access_token
}
response = requests.post(url, data=payload)   # send the post request
print(response.json())
Authenticate with the Twitter (X) API using Tweepy
import tweepy

consumer_key = 'YOUR_CONSUMER_KEY'
consumer_secret = 'YOUR_CONSUMER_SECRET'
access_token = 'YOUR_ACCESS_TOKEN'
access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'

auth = tweepy.OAuth1UserHandler(consumer_key, consumer_secret,
                                access_token, access_token_secret)
api = tweepy.API(auth)
Fetch Your Profile with the LinkedIn API
import requests

access_token = 'YOUR_ACCESS_TOKEN'
url = 'https://api.linkedin.com/v2/me'
headers = {
    'Authorization': f'Bearer {access_token}'
}
response = requests.get(url, headers=headers)
print(response.json())
Fetch Media with the Instagram API
import requests

user_id = 'YOUR_USER_ID'
access_token = 'YOUR_ACCESS_TOKEN'
url = f'https://graph.instagram.com/{user_id}/media'
params = {
    'fields': 'id,caption,media_type,media_url,thumbnail_url,permalink,timestamp',
    'access_token': access_token
}
response = requests.get(url, params=params)
print(response.json())
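A hedged sketch of iterating the media list such an endpoint returns; the payload below is invented but follows the fields requested in the params above:

```python
# Hypothetical JSON body matching the fields requested from the media endpoint
media_page = {
    "data": [
        {"id": "1", "media_type": "IMAGE", "permalink": "https://example.com/p/1",
         "caption": "first post", "timestamp": "2024-01-01T00:00:00+0000"},
        {"id": "2", "media_type": "VIDEO", "permalink": "https://example.com/p/2",
         "caption": "second post", "timestamp": "2024-01-02T00:00:00+0000"},
    ]
}

# Walk the list of media items, one dict per post
for item in media_page["data"]:
    print(item["media_type"], item["permalink"])
```

With a live request, the same loop runs over `response.json()["data"]`.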