Python Web Scraping - Processing Images and Videos
https://www.tutorialspoint.com/python_web_scraping/python_web_scraping_processing_images_and_videos.htm
Copyright © tutorialspoint.com
Web scraping usually involves downloading, storing and processing the web media content. In this chapter, let us
understand how to process the content downloaded from the web.
Introduction
The web media content that we obtain during scraping can be images, audio and video files, in the form of
non-web pages as well as data files. But can we trust the downloaded data, especially the file extension of the
data we are going to download and store on our computer? This makes it essential to know the type of data we
are going to store locally.
First, we import the requests module, which we will use to download the media content −
import requests
Now, provide the URL of the media content we want to download and store locally.
url = "https://authoraditiagarwal.com/wp-content/uploads/2018/05/MetaSlider_ThinkBig-1080x180.jpg"
r = requests.get(url)
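With the help of the following lines of code, we can save the received content to a file named ThinkBig.png.
Note that the bytes are still JPEG data, because giving a file a different extension does not convert its format.
with open("ThinkBig.png", 'wb') as f:
    f.write(r.content)
After running the above Python script, we will get a file named ThinkBig.png, which contains the downloaded
image.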
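A quick way to check what a downloaded file really contains is to look at its first bytes rather than its name. The following is a minimal sketch, not part of the requests library: the function name sniff_image_type and the short signature table are illustrative assumptions, and only three common image formats are covered.

```python
from typing import Optional

# Leading "magic" bytes of a few common image formats.
_SIGNATURES = (
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
)


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return a format name based on the first bytes, or None if unknown."""
    for signature, name in _SIGNATURES:
        if data.startswith(signature):
            return name
    return None


# Hand-made byte strings stand in for real downloads here.
print(sniff_image_type(b"\xff\xd8\xff\xe0" + b"\x00" * 16))
print(sniff_image_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 16))
print(sniff_image_type(b"<html>not an image</html>"))
```

With a real response, sniff_image_type(r.content) could be called before deciding which extension to use for the saved file.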
With the help of the following Python script, using urlparse, we can extract the filename from a URL −
from urllib.parse import urlparse
import os
url = "https://authoraditiagarwal.com/wp-content/uploads/2018/05/MetaSlider_ThinkBig-1080x180.jpg"
a = urlparse(url)
a.path
'/wp-content/uploads/2018/05/MetaSlider_ThinkBig-1080x180.jpg'
os.path.basename(a.path)
'MetaSlider_ThinkBig-1080x180.jpg'
Once we run the above statements in an interactive Python session, we get the filename from the URL.
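Real URLs often carry a query string, a fragment or percent-encoded characters. The sketch below wraps the same urlparse idea in a small helper. The name filename_from_url, the fallback name download.bin and the example.com URLs are assumptions made for illustration only.

```python
import posixpath
from urllib.parse import unquote, urlparse


def filename_from_url(url: str, default: str = "download.bin") -> str:
    """Return the last path segment of a URL, ignoring query and fragment."""
    # urlparse separates the path from "?query" and "#fragment".
    path = urlparse(url).path
    # unquote turns "%20" into a space; posixpath always splits on "/".
    name = posixpath.basename(unquote(path))
    # A URL ending in "/" has no filename, so fall back to the default.
    return name or default


print(filename_from_url("https://example.com/media/My%20Photo.jpg?size=large#top"))
print(filename_from_url("https://example.com/"))
```

posixpath is used instead of os.path so that the result does not depend on the operating system the script runs on.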
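The response headers sent by the web server tell us more about the content than the URL alone. First, we import
the requests module −
import requests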
Now, we need to provide the URL of the media content we want to download and store locally.
url = "https://authoraditiagarwal.com/wp-content/uploads/2018/05/MetaSlider_ThinkBig-1080x180.jpg"
r = requests.get(url, allow_redirects=True)
Now, we can see what type of information about the content the web server provides by printing the names of the
response headers −
for header in r.headers:
    print(header)
Date
Server
Upgrade
Connection
Last-Modified
Accept-Ranges
Content-Length
Keep-Alive
Content-Type
With the help of the following line of code, we can get a particular piece of information about the content, say
Content-Type −
print(r.headers.get('content-type'))
image/jpeg
In the same way, we can ask for the ETag header, which this server does not send −
print(r.headers.get('ETag'))
None
The Content-Length header gives the size of the content in bytes −
print(r.headers.get('content-length'))
12636
With the help of the following line of code, we can get the name of the web server software from the Server header −
print(r.headers.get('Server'))
Apache
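The Content-Type header can then drive the choice of file extension instead of the URL. The following is a rough sketch under stated assumptions: the function name extension_from_headers and the small EXTENSIONS table are hypothetical, and the table lists only a few media types. It accepts any mapping of header names to values, such as r.headers or a plain dictionary.

```python
from typing import Mapping, Optional

# Small hand-written table; extend it for the media types you expect.
EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "video/mp4": ".mp4",
    "audio/mpeg": ".mp3",
}


def extension_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Pick a file extension from the Content-Type header, if it is known."""
    # Header names are case-insensitive, so normalise them for plain dicts.
    lowered = {key.lower(): value for key, value in headers.items()}
    content_type = lowered.get("content-type", "")
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return EXTENSIONS.get(media_type)


print(extension_from_headers({"Content-Type": "image/jpeg"}))
print(extension_from_headers({"content-type": "text/html; charset=utf-8"}))
```

Returning None for unknown types leaves the caller free to skip the file or fall back to the extension found in the URL.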
For this Python script, we need to install the Python library named Pillow, a fork of the Python Imaging Library
that has useful functions for manipulating images. It can be installed with the help of the following command −
pip install Pillow
The following Python script will create a thumbnail of the image and will save it to the current directory,
prefixing the thumbnail file name with Th_ −
import glob
from PIL import Image
for infile in glob.glob("ThinkBig.png"):
    # Skip files that are already thumbnails.
    if infile.startswith("Th_"):
        continue
    img = Image.open(infile)
    # Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter.
    img.thumbnail((128, 128), Image.LANCZOS)
    img.save("Th_" + infile, "png")
After running the above script, we can check for the thumbnail file Th_ThinkBig.png in the current directory.
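The thumbnail method keeps the aspect ratio of the image and never enlarges it, so the (128, 128) argument is an upper bound rather than the final size. The sketch below illustrates that calculation in plain Python. The helper name fit_within is an assumption, and Pillow's own rounding may differ from it by a pixel.

```python
from fractions import Fraction
from typing import Tuple


def fit_within(size: Tuple[int, int], box: Tuple[int, int]) -> Tuple[int, int]:
    """Scale size down to fit inside box while keeping the aspect ratio."""
    width, height = size
    max_width, max_height = box
    # The smaller ratio wins; capping at 1 means the image is never enlarged.
    scale = min(Fraction(max_width, width), Fraction(max_height, height), Fraction(1))
    # Keep each side at least one pixel long.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within((1080, 180), (128, 128)))  # (128, 21)
print(fit_within((100, 50), (128, 128)))    # (100, 50)
```

A wide 1080x180 banner therefore ends up about 128 pixels wide and 21 pixels high, not 128x128.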
After running the script, you can check your current directory for screenshot.png file.
Thumbnail Generation for Video
Suppose we have downloaded videos from a website and want to generate thumbnails for them, so that a specific
video can be selected by clicking its thumbnail. For generating thumbnails for videos, we need a simple tool called
ffmpeg, which can be downloaded from www.ffmpeg.org. After downloading, we need to install it as per the
specifications of our OS.
The following Python script will generate a thumbnail of the video and save it to our local directory −
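import subprocess
# Raw string, so that the backslashes in the Windows path are kept literally.
video_MP4_file = r"C:\Users\gaurav\desktop\solar.mp4"
thumbnail_image_file = 'thumbnail_solar_video.jpg'
# Grab a single frame at the 20-second mark; -y overwrites an existing output file.
subprocess.call(['ffmpeg', '-y', '-i', video_MP4_file, '-ss', '00:00:20.000',
                 '-vframes', '1', thumbnail_image_file])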
After running the above script, we will get the thumbnail named thumbnail_solar_video.jpg saved in our local
directory.
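When thumbnails are needed for many videos, it can help to build the ffmpeg argument list in one place and to fail loudly when ffmpeg reports an error. The following is a minimal sketch assuming ffmpeg is on the PATH. The function names build_thumbnail_command and make_thumbnail and the file name solar.mp4 are illustrative, while the ffmpeg options are the same ones used in the script above.

```python
import subprocess
from typing import List


def build_thumbnail_command(video_path: str, image_path: str,
                            timestamp: str = "00:00:20.000") -> List[str]:
    """Build the ffmpeg argument list that extracts a single frame."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it exists
        "-i", video_path,     # input video
        "-ss", timestamp,     # position of the frame to grab
        "-vframes", "1",      # write exactly one video frame
        image_path,
    ]


def make_thumbnail(video_path: str, image_path: str) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_thumbnail_command(video_path, image_path), check=True)


# Only the command is printed here; make_thumbnail would actually run ffmpeg.
print(build_thumbnail_command("solar.mp4", "thumbnail_solar_video.jpg"))
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems with file names that contain spaces.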
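Now, after successfully installing moviepy, we can convert an MP4 to MP3 with the help of the following script.
Note that in MoviePy 2.0 and later the moviepy.editor module was removed, so VideoFileClip is imported directly
from moviepy there.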
import moviepy.editor as mp
clip = mp.VideoFileClip(r"C:\Users\gaurav\Desktop\1234.mp4")
clip.audio.write_audiofile("movie_audio.mp3")
The above script will save the audio MP3 file in the local directory.