Python | Parse a website with regex and urllib
Last Updated : 23 Jan, 2019

Let's discuss the concept of parsing using Python. Python offers many modules, but for parsing we only need urllib and re (regular expressions). Together, these two libraries let us fetch and search the content of web pages. Parsing a website here means fetching the entire source code behind a given URL; the raw result is a bulk of HTML that is hard to read directly, so we then search it for the parts we want. Let's walk through a demonstration step by step.

Code #1: Libraries needed

# importing libraries
import urllib.request
import urllib.parse
import re

Code #2:

url = 'https://www.geeksforgeeks.org/'
values = {'s': 'python programming', 'submit': 'search'}

We have defined a URL and some related values that we want to search for. Note that values is a dictionary; in this key-value pair we set 'python programming' as the term to search for on the defined URL.

Code #3:

data = urllib.parse.urlencode(values)
data = data.encode('utf-8')
req = urllib.request.Request(url, data)
resp = urllib.request.urlopen(req)
respData = resp.read()

In the first line we URL-encode the values defined earlier; in the second line we encode that string to UTF-8 bytes so it can be sent over the network. The third line builds a request for the defined URL carrying the encoded data, and urlopen() then opens the web document (the HTML). In the last line, read() reads the whole response body and assigns it to the variable respData.

Code #4:

paragraphs = re.findall(r'<p>(.*?)</p>', str(respData))
for eachP in paragraphs:
    print(eachP)

To extract the relevant data we apply a regular expression. The second argument to re.findall() must be a string, which is why respData (a bytes object) is wrapped in str(). To display the matches, we print each one in a simple loop.
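The key detail in the pattern above is the non-greedy quantifier (.*?), which stops at the first closing </p> so each paragraph is captured separately. Here is a minimal, self-contained sketch of the same regex applied to a small made-up HTML snippet (no network access needed; the html string is a stand-in for the page source that urllib would fetch):

```python
import re

# Hypothetical sample standing in for a fetched page's source.
html = "<p>First paragraph</p><div><p>Second paragraph</p></div>"

# Non-greedy (.*?) stops at the first </p>, so the two
# paragraphs come back as separate matches.
paragraphs = re.findall(r'<p>(.*?)</p>', html)
print(paragraphs)  # ['First paragraph', 'Second paragraph']
```

With a greedy (.*) instead, the single match would span from the first <p> to the last </p>, swallowing the </p><div><p> in between.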
Below are a few examples.

Example #1:

import urllib.request
import urllib.parse
import re

url = 'https://www.geeksforgeeks.org/'
values = {'s': 'python programming', 'submit': 'search'}

data = urllib.parse.urlencode(values)
data = data.encode('utf-8')
req = urllib.request.Request(url, data)
resp = urllib.request.urlopen(req)
respData = resp.read()

paragraphs = re.findall(r'<p>(.*?)</p>', str(respData))
for eachP in paragraphs:
    print(eachP)

Output:

Example #2:

import urllib.request
import urllib.parse
import re

url = 'https://www.geeksforgeeks.org/'
values = {'s': 'pandas', 'submit': 'search'}

data = urllib.parse.urlencode(values)
data = data.encode('utf-8')
req = urllib.request.Request(url, data)
resp = urllib.request.urlopen(req)
respData = resp.read()

paragraphs = re.findall(r'<p>(.*?)</p>', str(respData))
for eachP in paragraphs:
    print(eachP)

Output:

Author: jitender_1998
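One caveat worth knowing about the examples above: str(respData) does not decode the bytes, it produces the literal "b'...'" representation in which newlines become the two characters \n. A more robust habit is to decode the response (e.g. respData.decode('utf-8')) and, if paragraphs may span lines, pass re.DOTALL so . also matches newlines. A minimal offline sketch, using a made-up bytes literal in place of a real response body:

```python
import re

# Stand-in for resp.read(): a bytes object with a newline inside the tag.
raw = b"<p>hello\nworld</p>"

# str() on bytes gives the escaped repr, not real text.
print(str(raw))  # b'<p>hello\nworld</p>' rendered as a "b'...'" string

# Decoding yields proper text with a real newline in it.
text = raw.decode('utf-8')

# re.DOTALL lets . match the newline, so the paragraph is still found.
matches = re.findall(r'<p>(.*?)</p>', text, re.DOTALL)
print(matches)  # ['hello\nworld']
```

Without re.DOTALL, the pattern would find nothing here, because . does not match a newline by default.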