Parsing The Web: Let's Find The Following Data For The First 100 Movies

The document describes how to parse web data to extract the release date, movie title, and production budget of the first 100 movies listed on a website. It uses the Beautiful Soup and pandas libraries in Python to request the target URL, parse the HTML response, extract the data from each table row into a dictionary, append that dictionary to a list named info, and convert the result into a pandas DataFrame for output.

Uploaded by Josue Sanchez
Copyright © All Rights Reserved

PARSING THE WEB

Let's find the following data for the first 100 movies:

- Release Date
- Movie
- Production Budget
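The mapping from table cells to these three fields can be checked offline before requesting the live page. The sketch below is illustrative only: `SAMPLE_HTML` and `parse_rows` are hypothetical names, and the markup is a hand-written placeholder (not real data from the site) that follows the layout the implemented code assumes, with the movie's position in the first `<td>`, then the release date, the movie, and the production budget.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one header row (<th> cells) and two data rows (<td> cells).
# The values are placeholders, not real figures from the site.
SAMPLE_HTML = """
<table>
  <tr><th>Rank</th><th>Release Date</th><th>Movie</th><th>Production Budget</th></tr>
  <tr><td>1</td><td>Jan 1, 2000</td><td>Example Movie A</td><td>$1,000,000</td></tr>
  <tr><td>2</td><td>Feb 2, 2001</td><td>Example Movie B</td><td>$500,000</td></tr>
</table>
"""


def parse_rows(html):
    """Return one dict per data row, mapping <td> positions to field names."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = tr.find_all("td")
        # The header row has only <th> cells, so cells is empty and the row is skipped
        if len(cells) < 4:
            continue
        rows.append({
            "Rank": cells[0].get_text(strip=True),
            "Release Date": cells[1].get_text(strip=True),
            "Movie": cells[2].get_text(strip=True),
            "Production Budget": cells[3].get_text(strip=True),
        })
    return rows


for row in parse_rows(SAMPLE_HTML):
    print(row)
```

Because the header row uses `<th>` cells, `find_all("td")` returns an empty list for it; the scraper relies on the same behavior to skip headers.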

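Implemented code:

```python
import requests
# Beautiful Soup parses the fetched HTML
from bs4 import BeautifulSoup
# pandas builds and prints the final table
import pandas as pd

TARGET_URL = 'https://www.the-numbers.com/movie/budgets/all'

info = []  # general list: one dictionary per movie
data = {}  # final dictionary

# timeout avoids hanging forever; raise_for_status() stops on HTTP errors (e.g. 403, 404)
myData = requests.get(TARGET_URL, timeout=30)
myData.raise_for_status()

# Use the Beautiful Soup library to parse the fetched data
soup = BeautifulSoup(myData.text, 'html.parser')
elements = soup.find_all("tr")
for elem in elements:
    itemtd = elem.find_all("td")
    # Header rows use <th> cells, so they yield no <td> and are skipped;
    # rows with fewer than four cells are skipped to avoid an IndexError
    if len(itemtd) >= 4:
        # Store the row in a dictionary; the first cell is the movie's numeric position
        dat = {
            "Rank": itemtd[0].text.strip(),
            "Release Date": itemtd[1].text.strip(),
            "Movie": itemtd[2].text.strip(),
            "Production Budget": itemtd[3].text.strip(),
        }
        # Append it to the general list used to build the final dictionary
        info.append(dat)

data["peliculas"] = info  # store the list under the key "peliculas" ("movies")

# Each dictionary becomes one DataFrame row, indexed by rank
dataFrame = pd.DataFrame(data["peliculas"]).set_index("Rank")
print(dataFrame)
```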

Result of running the code: [output screenshot not reproduced]
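The printed values are still text, so a budget such as "$1,000,000" cannot be summed or sorted numerically. Below is a minimal sketch of one way to convert the columns with pandas. It assumes the columns are named "Release Date" and "Production Budget", that budgets use a leading "$" with comma separators, and that dates look like "Jan 1, 2000"; the rows are hypothetical placeholders, not scraped data.

```python
import pandas as pd

# Hypothetical scraped rows: every value is still a string, as the parser returns it
df = pd.DataFrame({
    "Release Date": ["Jan 1, 2000", "Feb 2, 2001"],
    "Movie": ["Example Movie A", "Example Movie B"],
    "Production Budget": ["$1,000,000", "$500,000"],
})

# Strip "$" and "," and convert the budget text to numbers;
# errors="coerce" turns unparseable text into NaN instead of raising
df["Production Budget"] = pd.to_numeric(
    df["Production Budget"].str.replace(r"[$,]", "", regex=True),
    errors="coerce",
)

# Parse dates such as "Jan 1, 2000"; unparseable values become NaT
df["Release Date"] = pd.to_datetime(
    df["Release Date"], format="%b %d, %Y", errors="coerce"
)

print(df.dtypes)
print(df.sort_values("Production Budget", ascending=False))
```

Coercing bad values to NaN/NaT keeps one malformed row from stopping the whole conversion; those rows can then be inspected with `df.isna()`.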
