Implementing Web Scraping in Python with BeautifulSoup - GeeksforGeeks

# Here the user agent is for the Edge browser on Windows 10. You can find your
# browser's user agent and use it instead.
r = requests.get(url=URL, headers=headers)
print(r.content)

Step 3: Parsing the HTML content

Python

# This will not run on an online IDE
import requests
from bs4 import BeautifulSoup

URL = "http://www.values.com/inspirational-quotes"
r = requests.get(URL)

soup = BeautifulSoup(r.content, 'html5lib')  # If this line causes an error, run 'pip install html5lib' or install html5lib
print(soup.prettify())

A really nice thing about the BeautifulSoup library is that it is built on top of HTML parsing libraries like html5lib, lxml, html.parser, etc., so a BeautifulSoup object can be created and the parser library specified at the same time. In the example above,

soup = BeautifulSoup(r.content, 'html5lib')

we create a BeautifulSoup object by passing two arguments:

* r.content : the raw HTML content.
* html5lib : the HTML parser we want to use.

When soup.prettify() is printed, it gives a visual representation of the parse tree created from the raw HTML content.
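As a quick illustration of the parser choice, here is a minimal sketch (the HTML string and variable names below are made up for demonstration and are not part of the original example) showing that the same markup can be handed to different parsers:

Python

# Minimal sketch: comparing parsers on a tiny, deliberately sloppy HTML snippet.
# 'html.parser' ships with Python; 'html5lib' must be installed separately
# (pip install html5lib).
from bs4 import BeautifulSoup

html = "<ul><li>One<li>Two</ul>"   # unclosed <li> tags on purpose

for parser in ("html.parser", "html5lib"):
    soup = BeautifulSoup(html, parser)
    # html5lib repairs the markup the way a browser would (adds <html> and
    # <body>, closes the <li> tags); html.parser is lighter but more literal.
    print(parser, "->", [li.text for li in soup.find_all("li")])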
Step 4: Searching and navigating through the parse tree

Now we would like to extract some useful data from the HTML content. The soup object contains all the data in a nested structure which can be extracted programmatically. In our example, we are scraping a webpage consisting of some quotes, so we would like to create a program to save those quotes (and all relevant information about them).

Python

# Python program to scrape the website
# and save the quotes it contains
import requests
from bs4 import BeautifulSoup
import csv

URL = "http://www.values.com/inspirational-quotes"
r = requests.get(URL)

soup = BeautifulSoup(r.content, 'html5lib')

quotes = []  # a list to store quotes

table = soup.find('div', attrs={'id': 'all_quotes'})

for row in table.findAll('div', attrs={'class': 'col-6 col-lg-3 text-center margin-30px-bottom sm-margin-30px-top'}):
    quote = {}
    quote['theme'] = row.h5.text
    quote['url'] = row.a['href']
    quote['img'] = row.img['src']
    quote['lines'] = row.img['alt'].split(" #")[0]
    quote['author'] = row.img['alt'].split(" #")[1]
    quotes.append(quote)

filename = 'inspirational_quotes.csv'
with open(filename, 'w', newline='') as f:
    w = csv.DictWriter(f, ['theme', 'url', 'img', 'lines', 'author'])
    w.writeheader()
    for quote in quotes:
        w.writerow(quote)

Before moving on, we recommend you go through the HTML content of the webpage (which we printed using the soup.prettify() method) and try to find a pattern or a way to navigate to the quotes.

* Notice that all the quotes are inside a div container whose id is 'all_quotes'. So, we find that div element (named table in the code above) using the find() method:

  table = soup.find('div', attrs={'id': 'all_quotes'})

* The first argument is the HTML tag you want to search for and the second argument is a dictionary specifying the additional attributes associated with that tag. The find() method returns the first matching element. You can print table.prettify() to get a sense of what this piece of code does.
* Now, in the table element, notice that each quote is inside a div container whose class is quote. So, we iterate through each such div container.

Here, we use the findAll() method, which is similar to the find() method in terms of arguments but returns a list of all matching elements. Each quote is then iterated over using a variable called row. Here is one sample row of HTML content for better understanding:
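The following is a minimal, self-contained sketch of how find() and findAll() navigate such a structure; all markup, class names, URLs, and text below are made up for illustration and are not the real values.com HTML.

Python

# Illustrative sketch only: made-up markup shaped like one quote row
# (a div#all_quotes container holding one div per quote).
from bs4 import BeautifulSoup

sample = """
<div id="all_quotes">
  <div class="col-6 col-lg-3 text-center">
    <a href="/quote/1"><img src="/img/1.jpg" alt="Keep going #perseverance"></a>
    <h5>Perseverance</h5>
  </div>
</div>
"""

soup = BeautifulSoup(sample, 'html5lib')

table = soup.find('div', attrs={'id': 'all_quotes'})         # first matching element
for row in table.findAll('div', attrs={'class': 'col-6'}):   # list of all matches
    print(row.h5.text)                    # Perseverance
    print(row.a['href'])                  # /quote/1
    print(row.img['alt'].split(" #")[0])  # Keep going
    print(row.img['alt'].split(" #")[1])  # perseverance

In the full program above, each such row is turned into a dictionary, appended to the quotes list, and finally written to the CSV file by csv.DictWriter.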
