7Python Web Scraping Processing Images and Videos

Uploaded by

David Osei

PROCESSING IMAGES AND VIDEOS

https://www.tutorialspoint.com/python_web_scraping/python_web_scraping_processing_images_and_videos.htm
Copyright © tutorialspoint.com

Getting Media Content from Web Page

In this section, we are going to learn how to download media content and save it in a form that correctly represents its media type, based on the information sent by the web server. We can do this with the Python requests module, as we did in the previous chapter.

First, we need to import the necessary Python module as follows −

import requests

Now, provide the URL of the media content we want to download and store locally.

url = "https://authoraditiagarwal.com/wp-content/uploads/2018/05/MetaSlider_ThinkBig-1080x180.jpg"

Use the following code to create an HTTP response object.

r = requests.get(url)

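With the help of the following lines of code, we can save the received content to a file named ThinkBig.png. The bytes are written exactly as received, so the .png extension does not convert the image: this server reports the content as image/jpeg (see the Content-Type header later in this chapter), so the file actually holds JPEG data.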

with open("ThinkBig.png", 'wb') as f:
    f.write(r.content)

After running the above Python script, we will get a file named ThinkBig.png containing the downloaded image.
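Reading r.content keeps the whole file in memory, which is fine for a small image but wasteful for large media such as videos. As a sketch (save_stream is a hypothetical helper, not part of requests), the response can be requested with stream=True and written chunk by chunk from r.iter_content(). The helper writes to a temporary file in the same directory and renames it into place only after the last chunk, so an interrupted transfer never leaves a truncated file under the final name.

```python
import os
import tempfile
from typing import Iterable


def save_stream(chunks: Iterable[bytes], path: str) -> int:
    """Write byte chunks to path and return the number of bytes written.

    The data goes to a temporary file next to path first and is renamed
    to path only after the last chunk has been written.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    total = 0
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                if chunk:  # streamed responses can yield empty chunks; skip them
                    f.write(chunk)
                    total += len(chunk)
        os.replace(tmp_path, path)  # replaces any existing file; atomic on POSIX
    except BaseException:
        os.remove(tmp_path)  # discard the partial download
        raise
    return total


# Usage with requests (needs network access):
# r = requests.get(url, stream=True)
# r.raise_for_status()
# save_stream(r.iter_content(chunk_size=8192), "ThinkBig.png")
```

The temporary-file-then-rename design costs one extra rename but means a file with the final name is always complete.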

Extracting Filename from URL

After downloading content from a website, we often want to save it under the file name found in the URL. However, a URL can also carry additional parts besides the path, such as a query string or a fragment, so we need to extract the actual filename from the path of the URL.

With the help of the following Python script, run in an interactive Python session, we can extract the filename from the URL using urlparse from the urllib.parse module −

from urllib.parse import urlparse
import os

url = "https://authoraditiagarwal.com/wp-content/uploads/2018/05/MetaSlider_ThinkBig-1080x180.jpg"
a = urlparse(url)
a.path

You can observe the output as shown below −

'/wp-content/uploads/2018/05/MetaSlider_ThinkBig-1080x180.jpg'

Next, take the last component of the path −

os.path.basename(a.path)

You can observe the output as shown below −

'MetaSlider_ThinkBig-1080x180.jpg'

Once you run the above script, we get the filename from the URL.
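urlparse already separates the query string and the fragment from the path, so they never end up in the file name. A slightly more defensive sketch is shown below; the helper name filename_from_url and the fallback name download.bin are assumptions for illustration. It also decodes percent-escapes such as %20 and falls back to a default when the path has no usable last segment.

```python
import posixpath
from urllib.parse import unquote, urlparse


def filename_from_url(url: str, default: str = "download.bin") -> str:
    """Return the last path segment of url, ignoring any query or fragment."""
    path = urlparse(url).path  # "?query" and "#fragment" are split off here
    # Decode first, then take the basename, so an encoded "%2F" cannot smuggle
    # directory parts into the result. URL paths always use "/" separators.
    name = posixpath.basename(unquote(path))
    if name in ("", ".", ".."):  # e.g. "https://example.com/gallery/"
        return default
    return name
```

For example, filename_from_url("https://example.com/media/Solar%20System.mp4?size=large#t=20") gives "Solar System.mp4".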

Information about Type of Content from URL

While fetching content from a web server with a GET request, we can also inspect the information the server sends about it in the response headers. With the help of the following Python script, we can find out what the web server reports about the type of the content −

First, we need to import the necessary Python module as follows −

import requests

Now, we need to provide the URL of the media content we want to download and store locally.

url = "https://authoraditiagarwal.com/wp-content/uploads/2018/05/MetaSlider_ThinkBig-1080x180.jpg"

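The following line of code creates an HTTP response object. The argument allow_redirects=True tells requests to follow redirects; this is already the default for GET requests, so here it only makes the behavior explicit.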

r = requests.get(url, allow_redirects=True)

Now, we can list the names of the headers the web server sent along with the content −

for header in r.headers:
    print(header)

You can observe the output as shown below −

Date
Server
Upgrade
Connection
Last-Modified
Accept-Ranges
Content-Length
Keep-Alive
Content-Type

With the help of the following line of code, we can get a particular header, say Content-Type. The keys of r.headers are case-insensitive, so 'content-type' matches the Content-Type header listed above −

print(r.headers.get('content-type'))

You can observe the output as shown below −

image/jpeg
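Since this server reports image/jpeg, a file saved under the name ThinkBig.png actually holds JPEG bytes. A sketch of how the Content-Type header could be used to pick a matching extension, using the standard library's mimetypes module; the helper name name_for_content_type and the .bin fallback are assumptions. The exact extension returned can depend on the Python version and the system's MIME tables (for image/jpeg it is typically .jpg).

```python
import mimetypes
import os
from typing import Optional


def name_for_content_type(filename: str, content_type: Optional[str],
                          fallback: str = ".bin") -> str:
    """Replace the extension of filename with one matching content_type."""
    ext = None
    if content_type:
        # Drop parameters, e.g. "image/png; charset=binary" -> "image/png"
        media_type = content_type.split(";", 1)[0].strip().lower()
        ext = mimetypes.guess_extension(media_type)
    root, _old_ext = os.path.splitext(filename)
    return root + (ext or fallback)


# Usage with the response from above:
# local_name = name_for_content_type("ThinkBig.png", r.headers.get("content-type"))
```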

Similarly, we can ask for the ETag header. Because this server does not send one (it is missing from the header list above), get() returns None −

print(r.headers.get('ETag'))

You can observe the output as shown below −

None

The Content-Length header gives the size of the response body in bytes −

print(r.headers.get('content-length'))

You can observe the output as shown below −

12636
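Content-Length can be used to check whether the whole body arrived. A minimal sketch, with hypothetical function names: it compares the declared length with the bytes received. One caveat is that Content-Length counts the bytes actually sent, so when the server compresses the response (a Content-Encoding header is present), requests has already decompressed r.content and the two numbers no longer match; the sketch skips the check in that case.

```python
from typing import Mapping, Optional


def declared_length(headers: Mapping[str, str]) -> Optional[int]:
    """Return Content-Length as an int, or None if missing or not a number."""
    value = headers.get("Content-Length")
    if value is None or not value.strip().isdigit():
        return None
    return int(value)


def looks_complete(headers: Mapping[str, str], body: bytes) -> bool:
    """True unless the server declared a length that the body does not match.

    requests' r.headers is case-insensitive; a plain dict must use the exact
    key spelling "Content-Length" / "Content-Encoding".
    """
    if headers.get("Content-Encoding"):
        return True  # body was decoded, so its length cannot be compared
    expected = declared_length(headers)
    return expected is None or expected == len(body)


# Usage with the response from above:
# print(looks_complete(r.headers, r.content))
```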

Finally, the Server header names the web server software −

print(r.headers.get('Server'))

You can observe the output as shown below −

Apache

Generating Thumbnail for Images

A thumbnail is a small, reduced-size version of an image. A user may want to save only the thumbnail of a large image, or both the image and its thumbnail. In this section, we are going to create a thumbnail of the image ThinkBig.png downloaded in the previous section “Getting Media Content from Web Page”.

For this Python script, we need to install the Python library Pillow, a fork of the Python Imaging Library (PIL) with useful functions for manipulating images. It can be installed with the following command −

pip install pillow

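The following Python script creates a thumbnail of the image and saves it to the current directory, prefixing the thumbnail's file name with Th_ −

import glob
from PIL import Image

for infile in glob.glob("ThinkBig.png"):
    if not infile.startswith("Th_"):  # skip files that are already thumbnails
        img = Image.open(infile)
        # Shrink in place to fit within 128x128, keeping the aspect ratio.
        # Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the same filter.
        img.thumbnail((128, 128), Image.LANCZOS)
        img.save("Th_" + infile, "PNG")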

After running the script, you will find the thumbnail file Th_ThinkBig.png in the current directory.
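Image.thumbnail() resizes the image in place so that it fits inside the given box while keeping its aspect ratio, and it never enlarges an image that is already smaller than the box. The arithmetic can be sketched in plain Python as below; thumbnail_size is a hypothetical helper, and Pillow's own rounding may differ by a pixel in some cases. With this sketch, a 1080x180 image fits into a 128x128 box as 128x21.

```python
from typing import Tuple


def thumbnail_size(width: int, height: int, max_w: int, max_h: int) -> Tuple[int, int]:
    """Largest size with the same aspect ratio that fits in max_w x max_h.

    The scale factor is capped at 1.0, so small images are never enlarged.
    """
    scale = min(max_w / width, max_h / height, 1.0)
    # Keep at least one pixel in each dimension for very thin images
    return max(1, round(width * scale)), max(1, round(height * scale))
```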

Screenshot from Website

In web scraping, a common task is to take a screenshot of a website. For this, we are going to use Selenium WebDriver with Chrome, which needs the ChromeDriver executable. The following Python script takes a screenshot of a website and saves it to the current directory −

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Location of the ChromeDriver executable; Selenium 4 passes it via a Service object
path = r'C:\Users\gaurav\Desktop\Chromedriver'
browser = webdriver.Chrome(service=Service(executable_path=path))
browser.get('https://tutorialspoint.com/')
browser.save_screenshot('screenshot.png')  # save the visible page as a PNG file
browser.quit()  # the parentheses are needed to actually close the browser

When the script runs, Chrome may print a log message such as the following −

DevTools listening on ws://127.0.0.1:1456/devtools/browser/488ed704-9f1b-44f0-a571-892dc4c90eb7

After running the script, you can check your current directory for screenshot.png file.
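To confirm that the saved file really is a PNG image and to see its pixel size, it is enough to read the file header: every PNG file starts with a fixed 8-byte signature followed by the IHDR chunk, which stores the width and height as big-endian 32-bit integers. A standard-library-only sketch, where png_size is a hypothetical helper name:

```python
import struct
from typing import Optional, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path: str) -> Optional[Tuple[int, int]]:
    """Return (width, height) of a PNG file, or None if it is not a PNG."""
    with open(path, "rb") as f:
        # 8-byte signature, then IHDR: 4-byte length, 4-byte type, width, height
        header = f.read(24)
    if len(header) < 24 or not header.startswith(PNG_SIGNATURE):
        return None
    if header[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", header[16:24])
    return width, height


# Usage: print(png_size("screenshot.png"))
```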
Thumbnail Generation for Video
Suppose we have downloaded videos from a website and want to generate thumbnails for them, so that a user can pick a specific video by its thumbnail. To generate thumbnails for videos, we need the command-line tool ffmpeg, which can be downloaded from www.ffmpeg.org. After downloading, we need to install it according to the instructions for our OS; the ffmpeg executable must be on the PATH so that the script can find it.

The following Python script will generate thumbnail of the video and will save it to our local directory −

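import subprocess

# Raw string, so the backslashes in the Windows path are kept as-is
video_MP4_file = r"C:\Users\gaurav\desktop\solar.mp4"
thumbnail_image_file = 'thumbnail_solar_video.jpg'
# Save one frame taken at the 20-second mark; -y overwrites an existing file
subprocess.call(['ffmpeg', '-i', video_MP4_file, '-ss', '00:00:20.000',
                 '-vframes', '1', thumbnail_image_file, '-y'])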

After running the above script, we will get the thumbnail named thumbnail_solar_video.jpg saved in our local
directory.
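A variant of the same call, sketched as two hypothetical helpers: one builds the argument list (easy to inspect or test), the other checks with shutil.which() that ffmpeg is installed and runs it with check=True, so a failed conversion raises CalledProcessError instead of passing silently. Here -ss is placed before -i, which makes ffmpeg seek in the input instead of decoding every frame up to the timestamp; this is usually much faster for long videos. -frames:v 1 is the form FFmpeg's documentation recommends over the older alias -vframes 1.

```python
import shutil
import subprocess
from typing import List


def thumbnail_command(video: str, image: str, timestamp: str = "00:00:20.000") -> List[str]:
    """Build an ffmpeg argument list that saves one frame of video as image."""
    return [
        "ffmpeg",
        "-y",              # overwrite image if it already exists
        "-ss", timestamp,  # input seeking: jump to the timestamp before decoding
        "-i", video,
        "-frames:v", "1",  # write exactly one video frame
        image,
    ]


def make_thumbnail(video: str, image: str, timestamp: str = "00:00:20.000") -> None:
    """Run ffmpeg; raise if it is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg executable not found on PATH")
    subprocess.run(thumbnail_command(video, image, timestamp), check=True)


# Usage: make_thumbnail("solar.mp4", "thumbnail_solar_video.jpg")
```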

Ripping an MP4 video to an MP3

Suppose you have downloaded a video file from a website but only need its audio. This can be done in Python with the moviepy library, which can be installed with the following command −

pip install moviepy

After installing moviepy, we can convert an MP4 file to MP3 with the following script −

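import moviepy.editor as mp  # moviepy 1.x; moviepy 2.x removed moviepy.editor: use "from moviepy import VideoFileClip"

clip = mp.VideoFileClip(r"C:\Users\gaurav\Desktop\1234.mp4")
clip.audio.write_audiofile("movie_audio.mp3")  # write only the audio track
clip.close()  # release the file handles held by the clip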

You can observe the output as shown below −

[MoviePy] Writing audio in movie_audio.mp3

100%|██████████████████████████████| 674/674 [00:01<00:00, 476.30it/s]
[MoviePy] Done.

The above script will save the audio MP3 file in the local directory.
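moviepy itself relies on ffmpeg to read video files, so when ffmpeg is already installed (as in the thumbnail section), the audio can also be extracted by calling it directly. A sketch with hypothetical helper names, assuming an ffmpeg build that includes the libmp3lame MP3 encoder: -vn drops the video stream, and -q:a 2 selects a high-quality variable bit rate for LAME (lower numbers mean higher quality).

```python
import shutil
import subprocess
from typing import List


def mp3_command(video: str, audio: str) -> List[str]:
    """Build an ffmpeg argument list that writes only the audio of video as MP3."""
    return [
        "ffmpeg",
        "-y",                      # overwrite audio if it already exists
        "-i", video,
        "-vn",                     # ignore the video stream
        "-codec:a", "libmp3lame",  # encode with LAME (must be built into ffmpeg)
        "-q:a", "2",               # LAME VBR quality: 0 is best, 9 is smallest
        audio,
    ]


def extract_mp3(video: str, audio: str) -> None:
    """Run ffmpeg; raise if it is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg executable not found on PATH")
    subprocess.run(mp3_command(video, audio), check=True)


# Usage: extract_mp3("1234.mp4", "movie_audio.mp3")
```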