Scrape Tables From Any Website Using Python
Last Updated: 06 Aug, 2021
Scraping is a useful skill for collecting data from websites. Scraping and parsing a table can be tedious with a general-purpose parser such as Beautiful Soup, because you have to locate the table and walk through its rows and cells yourself. This article describes a library, html-table-parser-python3, that extracts every HTML table it finds on a page. With this method you do not have to inspect the page's elements; you only provide the URL of the page. Tables that a page builds with JavaScript after loading are not part of the downloaded HTML, so they will not be found this way.
Installation
You can use pip to install this library:
pip install html-table-parser-python3
Getting Started
Step 1: Import the required libraries
# Library for opening url and creating
# requests
import urllib.request
# pretty-print python data structures
from pprint import pprint
# for parsing all the tables present
# on the website
from html_table_parser.parser import HTMLTableParser
# for converting the parsed data in a
# pandas dataframe
import pandas as pd
Step 2: Define a function to get the contents of the website
# Opens a website and reads its
# binary contents (HTTP response body)
def url_get_contents(url):
    # making a request to the website
    req = urllib.request.Request(url=url)
    f = urllib.request.urlopen(req)
    # reading the contents of the website
    return f.read()
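Some sites reject requests that carry urllib's default User-Agent, and a request without a timeout can hang indefinitely. The sketch below is one possible variant of url_get_contents, not part of the original tutorial: the names build_request and fetch_html, the header value, and the 10-second timeout are all assumptions chosen for illustration.

```python
import urllib.request

# Hypothetical header value; any browser-like string may work.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; table-scraper-example)"}


def build_request(url, headers=None):
    """Create a Request object that carries explicit headers."""
    return urllib.request.Request(url=url, headers=headers or DEFAULT_HEADERS)


def fetch_html(url, timeout=10):
    """Download a page and return its body decoded to text.

    The charset comes from the Content-Type response header when the
    server sends one; otherwise UTF-8 is assumed.
    """
    req = build_request(url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

With this variant, fetch_html(url) could stand in for url_get_contents(url).decode('utf-8') in the parsing step.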
Now that the function is ready, specify the URL of the website whose tables you want to parse.
Note: This article uses the moneycontrol.com website as the example because its pages contain many tables.
Step 3: Parse the tables
# fetching the HTML contents of a URL and decoding them to text
# ('Link' is a placeholder for the URL of the page to scrape)
xhtml = url_get_contents('Link').decode('utf-8')
# Defining the HTMLTableParser object
p = HTMLTableParser()
# feeding the html contents in the
# HTMLTableParser object
p.feed(xhtml)
# Now finally obtaining the data of
# the table required
pprint(p.tables[1])
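The index 1 in p.tables[1] simply selects the second table found on the page, so the right index has to be discovered by trial and error and may change when the page layout changes. A minimal sketch of a more descriptive lookup follows, assuming each table is a list of rows and each row a list of cell strings. The helper find_table_by_header is hypothetical, and the sample data is invented for illustration.

```python
def find_table_by_header(tables, keyword):
    """Return the first table whose first row mentions `keyword`.

    `tables` is expected to look like HTMLTableParser().tables:
    a list of tables, each a list of rows, each a list of strings.
    Returns None when no table matches.
    """
    keyword = keyword.lower()
    for table in tables:
        if not table:
            continue  # skip tables without any rows
        first_row = table[0]
        if any(keyword in cell.lower() for cell in first_row):
            return table
    return None


# Invented sample data standing in for p.tables
sample_tables = [
    [["Menu", "Links"]],
    [["Company", "Price"], ["ABC Ltd", "101.5"], ["XYZ Ltd", "88.0"]],
]

match = find_table_by_header(sample_tables, "price")
print(match[1])
```

The match is case-insensitive and only looks at the first row, so it would miss tables whose headers sit elsewhere.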
p.tables is a list of tables. Each table is a list of rows, and each row is a list of cell strings. This structure can be converted into a pandas DataFrame easily and used for further analysis.
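As a rough illustration of that conversion, the sketch below treats the first row of a parsed table as the column headers. The table contents are made up, and whether the first row really holds headers depends on the page being scraped.

```python
import pandas as pd

# Invented rows in the same shape as one entry of p.tables
table = [
    ["Company", "Price"],
    ["ABC Ltd", "101.5"],
    ["XYZ Ltd", "88.0"],
]

# First row becomes the column names, the rest become the data
df = pd.DataFrame(table[1:], columns=table[0])

# Cells are parsed as strings, so numeric columns need converting
df["Price"] = pd.to_numeric(df["Price"])

print(df)
```

Passing the whole table to pd.DataFrame without the columns argument, as the complete code does, keeps the header row as ordinary data under numeric column labels.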
Complete Code:
# Library for opening url and creating
# requests
import urllib.request
# pretty-print python data structures
from pprint import pprint
# for parsing all the tables present
# on the website
from html_table_parser.parser import HTMLTableParser
# for converting the parsed data in a
# pandas dataframe
import pandas as pd
# Opens a website and reads its
# binary contents (HTTP response body)
def url_get_contents(url):
    # making a request to the website
    req = urllib.request.Request(url=url)
    f = urllib.request.urlopen(req)
    # reading the contents of the website
    return f.read()
# defining the html contents of a URL.
xhtml = url_get_contents(
    'https://www.moneycontrol.com/india'
    '/stockpricequote/refineries/relianceindustries/RI'
).decode('utf-8')
# Defining the HTMLTableParser object
p = HTMLTableParser()
# feeding the html contents in the
# HTMLTableParser object
p.feed(xhtml)
# Now finally obtaining the data of
# the table required
pprint(p.tables[1])
# converting the parsed data to
# dataframe
print("\n\nPANDAS DATAFRAME\n")
print(pd.DataFrame(p.tables[1]))