Python | Parse a website with regex and urllib

Last Updated : 23 Jan, 2019

Let's discuss the concept of parsing a website using Python. Python offers many modules, but for basic parsing we only need urllib and re (the regular-expression module). Together, these two libraries let us fetch the contents of a web page. "Parsing" a website here means fetching the whole source code at a given URL; the raw result is a bulk of HTML that is hard to read directly, so we then search it for the pieces we want. The demonstration below walks through each step.

Code #1: Libraries needed

```python
# importing libraries
import urllib.request
import urllib.parse
import re
```

Code #2:

```python
url = 'https://www.geeksforgeeks.org/'
values = {'s': 'python programming', 'submit': 'search'}
```

Here we define a URL and some related values we want to search for. The values are defined as a dictionary; in this key-value pair we set 'python programming' as the term to search for on the defined URL.

Code #3:

```python
data = urllib.parse.urlencode(values)
data = data.encode('utf-8')
req = urllib.request.Request(url, data)
resp = urllib.request.urlopen(req)
respData = resp.read()
```

In the first line we URL-encode the values defined earlier; in the second line we encode that string into UTF-8 bytes, the form the request machinery expects. The third line builds a request for the defined URL, and urlopen() then opens the web document (the HTML). In the last line, read() reads the whole response and assigns it to the variable respData.

Code #4:

```python
paragraphs = re.findall(r'<p>(.*?)</p>', str(respData))
for eachP in paragraphs:
    print(eachP)
```

To extract the relevant data we apply a regular expression. The second argument to re.findall() must be a string (respData is bytes, hence the str() conversion), and we print each match with a simple print() call.
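Before sending a real request, it can help to see what each step in Code #3 and Code #4 actually produces. The sketch below is offline: the HTML string is made up for illustration, and no network call is made, so only the urlencode and regex behaviour from the article is demonstrated.

```python
import urllib.parse
import re

# Build the query string exactly as Code #3 does, but inspect it
# instead of sending it in a request.
values = {'s': 'python programming', 'submit': 'search'}
data = urllib.parse.urlencode(values)
print(data)  # s=python+programming&submit=search

# Apply the same <p>(.*?)</p> pattern to a small hard-coded page,
# so the regex behaviour is visible without a network call.
html = "<html><body><p>First paragraph</p><p>Second one</p></body></html>"
paragraphs = re.findall(r'<p>(.*?)</p>', html)
print(paragraphs)  # ['First paragraph', 'Second one']
```

Note that `(.*?)` is a non-greedy group: it matches as little as possible, which is what keeps each match confined to a single `<p>...</p>` pair instead of spanning from the first `<p>` to the last `</p>`.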
Below are a few complete examples.

Example #1:

```python
import urllib.request
import urllib.parse
import re

url = 'https://www.geeksforgeeks.org/'
values = {'s': 'python programming', 'submit': 'search'}

data = urllib.parse.urlencode(values)
data = data.encode('utf-8')
req = urllib.request.Request(url, data)
resp = urllib.request.urlopen(req)
respData = resp.read()

paragraphs = re.findall(r'<p>(.*?)</p>', str(respData))
for eachP in paragraphs:
    print(eachP)
```

Example #2 is identical except that it searches for 'pandas' instead of 'python programming':

```python
values = {'s': 'pandas', 'submit': 'search'}
```

In both cases the output is the text of every paragraph found in the returned page.
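One caveat worth noting about the examples above: `resp.read()` returns bytes, and `str()` on a bytes object produces its repr (the text wrapped in `b'...'`), not the decoded HTML. The pattern still matches, but decoding explicitly is cleaner. This is a minimal offline sketch of the difference, using a made-up bytes value rather than a real response:

```python
import re

# A stand-in for what resp.read() returns: bytes, not str.
raw = b"<p>Hello</p>"

# str() on bytes yields the repr, complete with the b'' wrapper.
print(str(raw))  # b'<p>Hello</p>'

# Decoding gives the actual HTML text, which is safer to search.
text = raw.decode('utf-8')
paragraphs = re.findall(r'<p>(.*?)</p>', text)
print(paragraphs)  # ['Hello']
```

For a real response it is better to decode with the page's declared charset (available via `resp.headers.get_content_charset()`) rather than assuming UTF-8.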