Reading selected webpage content using Python Web Scraping
Last Updated: 11 Jul, 2022

Prerequisite: Downloading files in Python, Web Scraping with BeautifulSoup

Python is an easy programming language, but what makes it powerful is the large number of open-source libraries written for it. Requests is one of the most widely used of these libraries. It lets us open any HTTP/HTTPS website, do the kinds of things we normally do on the web, and also persist sessions, i.e. cookies. A webpage is just HTML code sent by the web server to our browser, which renders it into the page we see. We therefore need a mechanism to get hold of that HTML source and find particular tags in it, which is what the BeautifulSoup package is for.

Installation:

pip3 install requests
pip3 install beautifulsoup4

As an example, we read a news site, Hindustan Times. The code can be divided into three parts:

Requesting a webpage
Inspecting the tags
Printing the appropriate contents

Steps:

Requesting a webpage: First, right-click on the news text and view the page source.

Inspecting the tags: We need to figure out which part of the source code contains the news section we want to scrape. Here it is the ul (unordered list) with class "searchNews" that holds the news section. Note: the news text is in the text part of the anchor tags. A closer look shows that all the news items sit in li (list item) tags of that unordered list.

Printing the appropriate contents: The content is printed with the help of the code given below.

Python

import requests
from bs4 import BeautifulSoup

def news():
    # the target page we want to open
    url = 'http://www.hindustantimes.com/top-news'

    # open it with the GET method
    resp = requests.get(url)

    # HTTP response 200 means OK
    if resp.status_code == 200:
        print("Successfully opened the web page")
        print("The news are as follow :-\n")

        # we need a parser; Python's built-in HTML parser is enough
        soup = BeautifulSoup(resp.text, 'html.parser')

        # l is the ul element which contains all the text, i.e. the news
        l = soup.find("ul", {"class": "searchNews"})

        # we want to print only the text part of each anchor,
        # so find all the a (anchor) elements inside the list
        for i in l.findAll("a"):
            print(i.text)
    else:
        print("Error")

news()

Output

Successfully opened the web page
The news are as follow :-

Govt extends toll tax suspension, use of old notes for utility bills extended till Nov 14
Modi, Abe seal historic civil nuclear pact: What it means for India
Rahul queues up at bank, says it is to show solidarity with common man
IS kills over 60 in Mosul, victims dressed in orange and marked 'traitors'
Rock On 2 review: Farhan Akhtar, Arjun Rampal's band hasn't lost its magic
Rumours of shortage in salt supply spark panic among consumers in UP
Worrying truth: India ranks first in pneumonia, diarrhoea deaths among kids
To hell with romance, here's why being single is the coolest way to be
India vs England: Cheteshwar Pujara, Murali Vijay make merry with tons in Rajkot
Akshay-Bhumi, SRK-Alia, Ajay-Parineeti: Age difference doesn't matter anymore
Currency ban: Only one-third have bank access; NE, backward regions worst hit
Nepal's central bank halts transactions with Rs 500, Rs 1000 Indian notes
Political upheaval in Punjab after SC tells it to share Sutlej water
Let's not kid ourselves, with Trump, what we have seen is what we will get
Want to colour your hair? Try rose gold, the hottest hair trend this winter
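Note that the "searchNews" class and the page layout described above date from when this article was written and may no longer match the live site. As a rough sketch of how the same scrape can be made a little more defensive, the variant below adds a timeout and a User-Agent header, checks the response with raise_for_status(), and uses a CSS selector instead of find()/findAll(). The helper name fetch_headlines and the User-Agent string are illustrative, not part of the original code.

Python

import requests
from bs4 import BeautifulSoup

def fetch_headlines(url="http://www.hindustantimes.com/top-news",
                    list_class="searchNews"):
    # "searchNews" is the class used in the article above; it is an
    # assumption that it still exists on the live page.
    headers = {"User-Agent": "Mozilla/5.0 (scraping example)"}
    resp = requests.get(url, headers=headers, timeout=10)
    resp.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses

    soup = BeautifulSoup(resp.text, "html.parser")
    # CSS-selector equivalent of find("ul", {"class": "searchNews"})
    # followed by findAll("a"): every anchor inside the list items
    anchors = soup.select(f"ul.{list_class} li a")
    return [a.get_text(strip=True) for a in anchors]

if __name__ == "__main__":
    try:
        for headline in fetch_headlines():
            print(headline)
    except requests.RequestException as err:
        print("Could not fetch the page:", err)

If the selector returns an empty list, the class name has most likely changed and the page needs to be re-inspected as described in the steps above.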
References:

Requests
BeautifulSoup
HTTP status codes