Python - Web Scraping Videos - Stack Overflow

The document discusses techniques for web scraping videos, specifically using Python libraries like BeautifulSoup and requests to extract video URLs from websites. It provides example code for downloading a specific episode of 'Bob's Burgers' and highlights the importance of locating the correct HTML tags, such as <video> and <source>, to retrieve the video file. Additionally, it addresses challenges faced when scraping video content and offers solutions for successfully downloading the videos.

Uploaded by

Louie Lu
Copyright
© © All Rights Reserved

Web Scraping Videos

Asked 4 years ago Modified 1 year, 8 months ago Viewed 15k times

I'm attempting to do a proof of concept by downloading a TV episode of Bob's Burgers at
https://www.watchcartoononline.com/bobs-burgers-season-9-episode-3-tweentrepreneurs.

I cannot figure out how to extract the video URL from this website. I used the Chrome and Firefox
web developer tools to figure out that it is in an iframe, but extracting src URLs with BeautifulSoup
by searching for iframes returns links that have nothing to do with the video. Where are the
references to the mp4 or flv files (which I can see in Developer Tools, even though clicking them is
forbidden)?

Any understanding on how to do video web scraping with BeautifulSoup and requests would
be appreciated.

Here is some code if needed. A lot of tutorials say to use 'a' tags, but I didn't receive any 'a'
tags.

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.watchcartoononline.com/bobs-burgers-season-9-episode-5-live-and-let-fly")
soup = BeautifulSoup(r.content, 'html.parser')
links = soup.find_all('iframe')
for link in links:
    print(link['src'])

python video screen-scraping

edited Nov 7, 2018 at 20:04 by petezurich
asked Nov 7, 2018 at 19:37 by user192085

Get the video source from the <video> tag. I've found it to be this one in your example:
cdn.cizgifilmlerizle.com/cizgi/… Then you can use Python requests with the stream=True parameter like this
– Lucas Wieloch Nov 7, 2018 at 19:41

Possible duplicate of Is there a way to download a video from a webpage with python? – Lucas Wieloch
Nov 7, 2018 at 19:41


2 Answers

import requests

url = "https://disk19.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e03.mp4?st=_EEVz36ktZOv7ZxlTaXZfg&e=1541637622"

def download_file(url, filename):
    # NOTE the stream=True parameter
    r = requests.get(url, stream=True)
    with open(filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
                # f.flush() commented by recommendation from J.F.Sebastian
    return filename

download_file(url, "bobs.burgers.s09e03.mp4")

This code will download this particular episode onto your computer. The video URL is the src
attribute of the <source> tag nested inside the <video> tag.

answered Nov 7, 2018 at 20:53 by Dimitriy Kruglikov

This did save a file named after your function, but it was invalid and only 162 bytes. Why didn't
BeautifulSoup find the video and source tags? I couldn't even locate the URL containing the extension
mp4 with bs4 or by simply searching the requests response text/content. – user192085 Nov 8, 2018
at 20:50

Background Information
(scroll all the way down for your answer)

This is only easily obtainable if the website you're trying to get the video from states the
video file explicitly in its HTML. For example, if you want to get a .mp4 file from the site of
your choice by referencing the .mp4 URL, take this site:
https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314. If we look for <video> in inspect
element, there will be a src containing the .mp4 URL.

Now if we were to try to grab the .mp4 URL from this website like this

import requests
from bs4 import BeautifulSoup

html_url = "https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314"
html_response = requests.get(html_url)
soup = BeautifulSoup(html_response.text, 'html.parser')

for mp4 in soup.find_all('video'):
    mp4 = mp4['src']

print(mp4)

We would get a KeyError: 'src'. This happens because the <video> tag itself has no src attribute;
the actual video URL is stored in the nested <source> element, which we can see if we print out
the values inside soup.find_all('video'):

import requests
from bs4 import BeautifulSoup

html_url = "https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314"
html_response = requests.get(html_url)
soup = BeautifulSoup(html_response.text, 'html.parser')

for mp4 in soup.find_all('video'):
    pass

print(mp4)

The output:

<video class="video-js vjs-default-skin vjs-big-play-centered" controls="" data-setup="{}" height="264" id="example_video_1" poster="" preload="none" width="640">
<source src="https://mountainoservo0002.animecdn.com/Yakunara-Mug-Cup-mo/Yakunara-Mug-Cup-mo-Episode-01.1-1080p.mp4" type="video/mp4"/>
</video>

So if we wanted to now download the .mp4, we would use the source element and get the
src from that instead.

import requests
import shutil  # - - This module helps to transfer information from 1 file to another
from bs4 import BeautifulSoup  # - - We could honestly do this without soup

# - - Get the url of the site you want to scrape
html_url = "https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314"
html_response = requests.get(html_url)
soup = BeautifulSoup(html_response.text, 'html.parser')

# - - Get the .mp4 url and the filename
for vid in soup.find_all('source'):
    url = vid['src']
    filename = vid['src'].split('/')[-1]

# - - Get the video
response = requests.get(url, stream=True)

# - - Make sure the status is OK
if response.status_code == 200:
    # - - Decode any gzip/deflate content encoding while reading the raw stream
    response.raw.decode_content = True

    with open(filename, 'wb') as f:
        # - - Copy what's in response.raw and transfer it into the file
        shutil.copyfileobj(response.raw, f)

(You could obviously simplify this by copying the source's src manually and using that as the
URL, without needing html_url at all. I just wanted to show that you can reference the .mp4,
i.e. the source's src, from the page itself.)
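
The code comment above notes that this could be done without soup. As a minimal sketch of that idea, the standard library's html.parser can collect the src of each <source> nested in a <video>. The class name SourceSrcParser and the sample markup (including its example.com URL) are made up for illustration and are not from the original answer.

```python
from html.parser import HTMLParser


class SourceSrcParser(HTMLParser):
    """Collect the src attribute of every <source> found inside a <video>."""

    def __init__(self):
        super().__init__()
        self._video_depth = 0  # greater than 0 while inside a <video> element
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "video":
            self._video_depth += 1
        elif tag == "source" and self._video_depth:
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

    def handle_endtag(self, tag):
        if tag == "video" and self._video_depth:
            self._video_depth -= 1


# Hypothetical markup modelled on the output shown earlier; the URL is a placeholder.
SAMPLE_HTML = """
<video class="video-js" controls="" id="example_video_1" preload="none">
<source src="https://cdn.example.com/show/episode-01.mp4" type="video/mp4"/>
</video>
"""

parser = SourceSrcParser()
parser.feed(SAMPLE_HTML)
video_url = parser.sources[0]
filename = video_url.split("/")[-1]
print(video_url, filename)
```

In real use, the text fed to the parser would be html_response.text instead of SAMPLE_HTML.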

Once again, not every site is this clear-cut. For this site in particular, we're fortunate that it is
this manageable. Other sites you may try to scrape a video from might require you to go from
Elements (in inspect element) to Network. There you'd have to find the embedded links to the
individual snippets of the video and download them all to make up the full video. That is not
always so easy, but the video for the site you requested is.
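
For the harder case, where the Network tab shows many small pieces, here is a rough sketch of stitching them together. It assumes the pieces can simply be concatenated byte for byte in order, which holds for some segmented formats but not all; others need a remuxing tool. The function join_segments, the fetch callable, and the segment names are hypothetical, and the demo uses an in-memory stand-in instead of the network.

```python
import os
import tempfile
from typing import Callable, Iterable


def join_segments(segment_urls: Iterable[str],
                  fetch: Callable[[str], bytes],
                  out_path: str) -> int:
    """Fetch each segment in order and append it to out_path.

    Returns the total number of bytes written.
    """
    total = 0
    with open(out_path, "wb") as out:
        for url in segment_urls:
            data = fetch(url)
            out.write(data)
            total += len(data)
    return total


# Demo with made-up segment names and bytes standing in for real downloads.
fake_server = {"seg-0.ts": b"AAAA", "seg-1.ts": b"BBBB", "seg-2.ts": b"CC"}
out_path = os.path.join(tempfile.mkdtemp(), "video.ts")
written = join_segments(["seg-0.ts", "seg-1.ts", "seg-2.ts"],
                        fake_server.__getitem__, out_path)

with open(out_path, "rb") as f:
    joined = f.read()
print(written, joined)
```

With requests, fetch could be something like lambda u: requests.get(u).content, and the list of URLs would come from what you see in the Network tab.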

YOUR ANSWER
Go to inspect element, click on Chromecast Player (2. Player) located at the top of the
video to view the HTML attributes and finally click on the embed that should look like this

pid=437035&amp;h=25424730eed390d0bb4634fa93a2e96c&amp;t=1618011716&amp;embed=cizgi

Once you've done that, click play, make sure inspect element is open, click the video to view
the attributes (or ctrl+f to filter for <video> ) and copy the src which should be

https://cdn.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e05.mp4?st=f9OWlOq1e-2M9eUVvhZa8A&e=1618019876
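
A side note on that URL: the st and e query parameters look like a time-limited token. This is an assumption, since the site does not document them, but the e value is 8160 seconds later than the t= value in the embed string above, which would fit an expiry timestamp. If so, a copied link stops working after a while and has to be copied again. A small sketch for inspecting the parameters:

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

url = ("https://cdn.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e05.mp4"
       "?st=f9OWlOq1e-2M9eUVvhZa8A&e=1618019876")

query = parse_qs(urlsplit(url).query)
token = query["st"][0]

# ASSUMPTION: "e" is an expiry time in Unix seconds. The thread does not confirm this.
expires_at = datetime.fromtimestamp(int(query["e"][0]), tz=timezone.utc)

print(token)
print(expires_at.isoformat())
```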

Now we can download it with python.

import requests
# - - This module helps to transfer information from 1 file to another
import shutil

url = "https://cdn.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e05.mp4?st=f9OWlOq1e-2M9eUVvhZa8A&e=1618019876"

response = requests.get(url, stream=True)

if response.status_code == 200:
    # - - Decode any gzip/deflate content encoding while reading the raw stream
    response.raw.decode_content = True

    with open('bobs-burgers.mp4', 'wb') as f:
        # - - Take the data from response.raw and transfer it to the file
        shutil.copyfileobj(response.raw, f)
    print('downloaded file')
else:
    print('Download failed')
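
The comment on the first answer reports a saved file of only 162 bytes. One possible explanation (an assumption, not confirmed in the thread) is that the server sent back a short error page instead of the video and it was written to disk anyway. The sketch below refuses to save a response unless the status is 200 and the Content-Type starts with video/. The Content-Type test is only a heuristic, since some servers label video files differently. save_video and FakeResponse are hypothetical names; FakeResponse only stands in for a requests.Response so the demo runs without a network.

```python
import os
import tempfile


def save_video(response, filename):
    """Save a streamed response to filename, refusing obvious error pages.

    `response` is expected to behave like a requests.Response obtained with
    stream=True. Returns the number of bytes written.
    """
    if response.status_code != 200:
        raise ValueError(f"unexpected status code {response.status_code}")
    content_type = response.headers.get("Content-Type", "")
    # Heuristic check: a video is usually served with a video/* Content-Type.
    if not content_type.startswith("video/"):
        raise ValueError(f"unexpected Content-Type {content_type!r}")
    written = 0
    with open(filename, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
            written += len(chunk)
    return written


class FakeResponse:
    """Hypothetical stand-in for requests.Response, used only for this demo."""

    def __init__(self, status_code, content_type, chunks):
        self.status_code = status_code
        self.headers = {"Content-Type": content_type}
        self._chunks = chunks

    def iter_content(self, chunk_size=1):
        return iter(self._chunks)


target = os.path.join(tempfile.mkdtemp(), "episode.mp4")
written = save_video(FakeResponse(200, "video/mp4", [b"abc", b"de"]), target)

try:
    save_video(FakeResponse(403, "text/html", [b"<html>Forbidden</html>"]), target)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(written, error_message)
```

With the real library, the call would look like save_video(requests.get(url, stream=True), 'bobs-burgers.mp4').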

answered Apr 9, 2021 at 23:50 by theletter_zee
