Implementing Web Scraping in Python with BeautifulSoup - GeeksforGeeks
# Here the user agent is for the Edge browser on Windows 10. You can find your
# browser's user agent string by searching for it online.
r = requests.get(url=URL, headers=headers)
print(r.content)
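The fragment above assumes a `headers` dictionary was defined earlier (that definition did not survive extraction). A minimal sketch of what it might look like, with an illustrative placeholder User-Agent string rather than a real browser's:

```python
# Sketch of the custom User-Agent header passed to requests.get().
# The User-Agent value below is a made-up placeholder, not a real browser string.
URL = "http://www.values.com/inspirational-quotes"

headers = {
    # Identifies the client to the server; some sites block the default
    # python-requests User-Agent, so a browser-like string is supplied.
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
}

# The request is then made as:
#   r = requests.get(url=URL, headers=headers)
# and r.content holds the raw HTML bytes of the page.
```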
Step 3: Parsing the HTML content
Python
# This will not run on an online IDE
import requests
from bs4 import BeautifulSoup

URL = "http://www.values.com/inspirational-quotes"
r = requests.get(URL)

soup = BeautifulSoup(r.content, 'html5lib')  # If this line causes an error, run 'pip install html5lib'
print(soup.prettify())
A really nice thing about the BeautifulSoup library is that it is built on top of HTML parsing libraries like html5lib, lxml, html.parser, etc., so a BeautifulSoup object can be created and the parser library specified at the same time. In the example above,
soup = BeautifulSoup(r.content, 'html5lib')
We create a BeautifulSoup object by passing two arguments:
* r.content : the raw HTML content.
* html5lib : the HTML parser we want to use.
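The parser choice above can be sketched on a tiny, made-up HTML snippet (the markup below is illustrative, not from the live page); html.parser is used here because it ships with Python, while html5lib and lxml require a separate install:

```python
from bs4 import BeautifulSoup

# Made-up snippet for illustration only.
html = "<div id='all_quotes'><h5>Believe</h5></div>"

# The second constructor argument selects the underlying parser.
# Swap in "html5lib" or "lxml" here if those packages are installed.
soup = BeautifulSoup(html, "html.parser")

print(soup.h5.text)  # Believe
```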
Now, when soup.prettify() is printed, it gives the visual representation of the parse tree created from the raw HTML content.
Step 4: Searching and navigating through the parse tree
Now, we would like to extract some useful data from the HTML content. The soup object contains all the data in a nested structure which can be extracted programmatically. In our example, we are scraping a webpage consisting of some quotes. So, we would like to create a program to save those quotes (and all relevant information about them).
Python
# Python program to scrape the website
# and save the quotes from the website
import requests
from bs4 import BeautifulSoup
import csv

URL = "http://www.values.com/inspirational-quotes"
r = requests.get(URL)

soup = BeautifulSoup(r.content, 'html5lib')

quotes = []  # a list to store quotes

table = soup.find('div', attrs={'id': 'all_quotes'})

for row in table.findAll('div', attrs={'class': 'col-6 col-lg-3 text-center margin-30px-bottom sm-margin-30px-top'}):
    quote = {}
    quote['theme'] = row.h5.text
    quote['url'] = row.a['href']
    quote['img'] = row.img['src']
    quote['lines'] = row.img['alt'].split(" #")[0]
    quote['author'] = row.img['alt'].split(" #")[1]
    quotes.append(quote)

filename = 'inspirational_quotes.csv'
with open(filename, 'w', newline='') as f:
    w = csv.DictWriter(f, ['theme', 'url', 'img', 'lines', 'author'])
    w.writeheader()
    for quote in quotes:
        w.writerow(quote)
Before moving on, we recommend going through the HTML content of the webpage, which we printed using the soup.prettify() method, and trying to find a pattern or a way to navigate to the quotes.
* It is noticed that all the quotes are inside a div container whose id is 'all_quotes'. So, we find that div element (termed as table in the above code) using the find() method:
table = soup.find('div', attrs = {'id': 'all_quotes'})
* The first argument is the HTML tag you want to search for, and the second argument is a dictionary specifying the additional attributes associated with that tag. The find() method returns the first matching element. You can try printing table.prettify() to get a sense of what this piece of code does.
* Now, within the table element, one can notice that each quote is inside a div container whose class is quote. So, we iterate through each div container whose class is quote. Here, we use the findAll() method, which is similar to the find() method in terms of arguments, but it returns a list of all matching elements. Each quote is then iterated using a variable called row.
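The sample row markup the article shows at this point did not survive extraction, but the find()/findAll() distinction described above can be sketched on a small, made-up snippet (the markup and the short class name below are illustrative, not the live page's):

```python
from bs4 import BeautifulSoup

# Made-up markup mimicking the container/row structure described above.
html = """
<div id="all_quotes">
  <div class="quote">First quote</div>
  <div class="quote">Second quote</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns only the FIRST element matching the tag and attributes.
table = soup.find('div', attrs={'id': 'all_quotes'})
print(table.find('div', attrs={'class': 'quote'}).text)  # First quote

# findAll() returns a list of ALL matching elements.
for row in table.findAll('div', attrs={'class': 'quote'}):
    print(row.text)
```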