Python - Web Scraping Videos - Stack Overflow
Asked 4 years ago Modified 1 year, 8 months ago Viewed 15k times
Any understanding on how to do video web scraping with BeautifulSoup and requests would
be appreciated.
Here is some code if needed. A lot of tutorials say to use 'a' tags, but I didn't receive any 'a'
tags.
import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.watchcartoononline.com/bobs-burgers-season-9-episode-5-live-and-let-fly")
soup = BeautifulSoup(r.content, 'html.parser')
links = soup.find_all('iframe')
for link in links:
    print(link['src'])
edited Nov 7, 2018 at 20:04 by petezurich, asked Nov 7, 2018 at 19:37 by user192085
Get the video source from the <video> tag. I've found it to be this one in your example:
cdn.cizgifilmlerizle.com/cizgi/… Then you can use Python requests with the stream=True parameter like this
– Lucas Wieloch Nov 7, 2018 at 19:41
Possible duplicate of Is there a way to download a video from a webpage with python? – Lucas Wieloch
Nov 7, 2018 at 19:41
2 Answers
import requests

url = "https://disk19.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e03.mp4?st=_EEVz36ktZOv7ZxlTaXZfg&e=1541637622"

def download_file(url, filename):
    # NOTE the stream=True parameter
    r = requests.get(url, stream=True)
    with open(filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
                # f.flush() commented by recommendation from J.F.Sebastian
    return filename

download_file(url, "bobs.burgers.s09e03.mp4")
This code will download this particular episode onto your computer. The video URL is in the
<source> tag, which is nested inside the <video> tag.
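As a minimal sketch of that lookup, the snippet below parses a made-up HTML fragment (the markup and the example.com URL are assumptions, not the real page) and collects every src found on a <video> tag or on its <source> children. Using .get() avoids a KeyError when a tag has no src attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment for illustration; not taken from the real site.
SAMPLE_HTML = """
<div class="player">
  <video controls width="640">
    <source src="https://example.com/media/episode.mp4" type="video/mp4">
  </video>
</div>
"""

def find_video_urls(html):
    """Return every src found on <video> tags or their <source> children."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for video in soup.find_all("video"):
        # .get() returns None instead of raising KeyError when src is absent.
        if video.get("src"):
            urls.append(video["src"])
        for source in video.find_all("source"):
            if source.get("src"):
                urls.append(source["src"])
    return urls

print(find_video_urls(SAMPLE_HTML))
```

An empty list here means the <video> markup is not in the HTML that was parsed, for example because the player sits inside an iframe or is added by JavaScript.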
This did save a file named after your function, but it was invalid and only 162 bytes. Why didn't
BeautifulSoup find the video and source tags? I couldn't even locate the URL containing the extension
mp4 with bs4 or by simply searching the requests response text/content. – user192085 Nov 8, 2018
at 20:50
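A file of only a few hundred bytes is often an error page saved under a video file name, although nothing in this thread confirms that is what happened here. The sketch below shows two checks that could be made before writing to disk. Both helper names are hypothetical, and treating the e query parameter as a Unix expiry timestamp is a guess based on its shape, not something the site documents.

```python
from urllib.parse import urlparse, parse_qs

def link_expiry(url):
    """Return the integer value of the `e` query parameter, or None.

    Assumption: `e` is a Unix expiry timestamp. The thread never says so.
    """
    values = parse_qs(urlparse(url).query).get("e")
    if not values or not values[0].isdigit():
        return None
    return int(values[0])

def looks_like_video(status_code, content_type):
    """Rough check that a response is a media file rather than an error page."""
    if status_code != 200:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = (content_type or "").split(";")[0].strip().lower()
    return media_type.startswith("video/") or media_type == "application/octet-stream"

# Hypothetical URL with the same query shape as the links in this thread.
example_url = "https://example.com/video.mp4?st=abc&e=1541637622"
print(link_expiry(example_url))
print(looks_like_video(403, "text/html"))
```

With requests, the second helper could be called as looks_like_video(r.status_code, r.headers.get("Content-Type")) right after requests.get(url, stream=True) and before opening the output file.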
Background Information
(scroll all the way down for your answer)
This is only easily obtainable if the website you're trying to get the video from states the
video format explicitly in the HTML. If you want to, for example, get a .mp4 file from the site of your
choice by referencing the .mp4 URL, we can use this site as an example:
https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314
If we look for <video> in inspect element, there will be a src containing the .mp4
Now if we were to try to grab the .mp4 URL from this website like this
import requests
from bs4 import BeautifulSoup

html_url = "https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314"
html_response = requests.get(html_url)
soup = BeautifulSoup(html_response.text, 'html.parser')
mp4 = soup.find('video')['src']  # the <video> tag itself has no src attribute
print(mp4)
We would get a KeyError: 'src'. This happens because the actual video URL is stored in the
<source> element, which we can see if we print out the result of soup.find_all('video')
import requests
from bs4 import BeautifulSoup

html_url = "https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314"
html_response = requests.get(html_url)
soup = BeautifulSoup(html_response.text, 'html.parser')
mp4 = soup.find_all('video')
print(mp4)
The output:
So if we wanted to now download the .mp4, we would use the source element and get the
src from that instead.
import requests
import shutil  # - - This module helps to transfer information from 1 file to another
from bs4 import BeautifulSoup  # - - We could honestly do this without soup

# - - Get the url of the site you want to scrape
html_url = "https://4anime.to/yakunara-mug-cup-mo-episode-01-1?id=45314"
html_response = requests.get(html_url)
soup = BeautifulSoup(html_response.text, 'html.parser')

# - - The .mp4 URL is the src of the <source> element inside <video>
mp4_url = soup.find('video').find('source')['src']
(You could obviously simplify this by copying the source's src manually and using that as
the URL directly, without having to use html_url. I just wanted to show you that you can
reference the .mp4, i.e. the source's src, from code.)
Once again, not every site is this clear-cut. For this site in particular, we're fortunate that it is
this manageable. Other sites you try to scrape a video from might require you to
go from Elements (in inspect element) to Network. There you'd have to collect the
embedded links for the individual video snippets and download them all to make up the full video.
That is not always easy, but the video for the site you asked about is.
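One common form of such "snippets" is a segmented stream described by a playlist file. The sketch below assumes an HLS-style .m3u8 media playlist, which is a guess on my part since the answer does not name a format. The playlist text, the example.com URL and the segment_urls helper are all hypothetical. It only turns the playlist into an ordered list of absolute segment URLs.

```python
from urllib.parse import urljoin

# Hypothetical playlist text, shaped like an HLS (.m3u8) media playlist.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
seg-000.ts
#EXTINF:10.0,
seg-001.ts
#EXT-X-ENDLIST
"""

def segment_urls(playlist_text, playlist_url):
    """Return absolute URLs for every segment line, in playlist order."""
    urls = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments; blank lines carry nothing.
        if not line or line.startswith("#"):
            continue
        # Relative segment names are resolved against the playlist's own URL.
        urls.append(urljoin(playlist_url, line))
    return urls

for url in segment_urls(SAMPLE_PLAYLIST, "https://example.com/hls/ep1/index.m3u8"):
    print(url)
```

Each URL could then be fetched in order with the same streaming download used in the answers. Encrypted or DRM-protected streams are out of scope for this sketch.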
YOUR ANSWER
Go to inspect element, click on Chromecast Player (2. Player) located at the top of the
video to view the HTML attributes, and finally click on the embed, which should look like this:
pid=437035&h=25424730eed390d0bb4634fa93a2e96c&t=1618011716&embed=cizgi
Once you've done that, click play, make sure inspect element is open, click the video to view
its attributes (or Ctrl+F to filter for <video>) and copy the src, which should be
https://cdn.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e05.mp4?st=f9OWlOq1e-2M9eUVvhZa8A&e=1618019876
import requests
# - - This module helps to transfer information from 1 file to another
import shutil

url = "https://cdn.cizgifilmlerizle.com/cizgi/bobs.burgers.s09e05.mp4?st=f9OWlOq1e-2M9eUVvhZa8A&e=1618019876"
# - - stream=True is needed so that response.raw can be read
response = requests.get(url, stream=True)
if response.status_code == 200:
    # - - Make sure the file size is not 0
    response.raw.decode_content = True
    # - - Output file name chosen here for illustration
    with open("bobs.burgers.s09e05.mp4", "wb") as f:
        shutil.copyfileobj(response.raw, f)