
Save HTML Table Data to CSV in Python
Problem:
One of the most challenging tasks for a data scientist is collecting the data. There is plenty of data available on the web; the challenge is extracting it through automation.
Introduction
I wanted to extract the basic operators data, which is embedded in HTML tables, from https://www.tutorialspoint.com/python/python_basic_operators.htm.
Hmm, the data is scattered across many HTML tables. If there were only one table, I could simply copy and paste it into a .csv file.
However, when a single page has more than five tables, doing this by hand is a pain, isn't it?
How to do it
1. First, here is a quick look at how to create a CSV file with Python's built-in csv module.
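import csv

# Open the file in write mode; it is created if it does not exist.
# newline='' stops the csv module from writing blank lines between rows on Windows.
with open('test.csv', 'w', newline='') as csv_file:
    data = csv.writer(csv_file)
    # Header row
    data.writerow(('Column1', 'Column2', 'Column3'))
    # Write 20 data rows
    for i in range(20):
        data.writerow((i, i + 1, i + 2))
# The with block closes the file automatically.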
Output
When executed, the code above creates a test.csv file in the current working directory (the directory the script is run from).
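To confirm what was written, the file can be read back with csv.reader. This is a minimal sketch, assuming a temporary directory is used instead of the current one so that no existing test.csv is overwritten. Note that csv.reader returns every field as a string, so the numbers come back as '0', '1', and so on.

```python
import csv
import os
import tempfile

# Assumption: write into a temporary directory so a real test.csv is untouched.
path = os.path.join(tempfile.mkdtemp(), 'test.csv')

with open(path, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(('Column1', 'Column2', 'Column3'))
    for i in range(20):
        writer.writerow((i, i + 1, i + 2))

# Read the file back; every field is returned as a string.
with open(path, newline='') as f:
    rows = list(csv.reader(f))

print(rows[0])    # ['Column1', 'Column2', 'Column3']
print(rows[1])    # ['0', '1', '2']
print(len(rows))  # 21 (header + 20 data rows)
```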
2. Now let us retrieve an HTML table from https://www.tutorialspoint.com/python/python_dictionary.htm and write it to a CSV file.
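The first step is to import the required modules. csv and urllib are part of the standard library; BeautifulSoup comes from the third-party beautifulsoup4 package (install it with pip install beautifulsoup4).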
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'https://www.tutorialspoint.com/python/python_dictionary.htm'
Open the URL with urlopen and parse the returned HTML into a soup object with BeautifulSoup.
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')
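Some servers may reject requests that carry urllib's default User-Agent, and urlopen without a timeout can block for a long time on an unresponsive server. A minimal sketch, assuming a browser-like User-Agent string is acceptable to the site, builds a urllib.request.Request with an explicit header and passes a timeout. The helper names build_request and fetch_html and the 'Mozilla/5.0' value are hypothetical, not part of the original tutorial.

```python
from urllib.request import Request, urlopen


def build_request(url):
    # Hypothetical header value: a browser-like User-Agent, because some
    # servers refuse urllib's default "Python-urllib/3.x" agent.
    return Request(url, headers={'User-Agent': 'Mozilla/5.0'})


def fetch_html(url, timeout=10):
    # timeout is in seconds; without it urlopen may wait indefinitely.
    with urlopen(build_request(url), timeout=timeout) as response:
        # Use the charset announced by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset)
```

With this helper, the parsing step would become soup = BeautifulSoup(fetch_html(url), 'html.parser').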
Next, find the tables in the page and extract their rows. For demonstration purposes, I will extract only the first table (index [0]).
table = soup.find_all('table')[0]  # first table on the page
rows = table.find_all('tr')        # all of its rows
print(rows)
Output
[ Sr.No. Function with Description , 1 cmp(dict1, dict2)Compares elements of both dict.
, 2 len(dict)Gives the total length of the dictionary. This would be equal to the number of items in the dictionary.
, 3 str(dict)Produces a printable string representation of a dictionary
, 4 type(variable)Returns the type of the passed variable. If passed variable is dictionary, then it would return a dictionary type.
]
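To see the table, row, and cell structure without a network connection or BeautifulSoup, the same extraction can be sketched with the standard library's html.parser.HTMLParser. This is a swapped-in technique, not what the article uses; the class name TableRowParser and the sample HTML are hypothetical, and the sketch does not handle nested tables or omitted closing tags.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the cell text of every <tr>, grouped by <table>.

    Limitations (sketch only): nested tables and omitted closing tags
    such as a missing </td> are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>: a list of rows
        self._row = None   # cells of the <tr> currently open
        self._cell = None  # text fragments of the <td>/<th> currently open

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            self.tables.append([])
        elif tag == 'tr' and self.tables:
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            # Join the text fragments and trim surrounding whitespace
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


# Hypothetical sample markup shaped like the dictionary-functions table
sample = """
<table>
  <tr><th>Sr.No.</th><th>Function with Description</th></tr>
  <tr><td>1</td><td>len(dict)</td></tr>
</table>
"""
parser = TableRowParser()
parser.feed(sample)
first_table_rows = parser.tables[0]
print(first_table_rows)
# [['Sr.No.', 'Function with Description'], ['1', 'len(dict)']]
```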
3. Now write the rows to a CSV file.
Example
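# newline='' prevents blank lines between rows on Windows
with open('my_html_data_to_csv.csv', 'w', newline='', encoding='utf-8') as csv_file:
    data = csv.writer(csv_file)
    for row in rows:
        # Text of every data (td) and header (th) cell, surrounding whitespace trimmed
        filtered_row = [cell.get_text(strip=True) for cell in row.find_all(['td', 'th'])]
        data.writerow(filtered_row)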
4. The results are now saved in the my_html_data_to_csv.csv file.
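Why use the csv module instead of joining cell text with commas? Cells such as cmp(dict1, dict2) contain commas themselves. The sketch below uses hard-coded sample rows (not live scraped data) and an in-memory io.StringIO buffer in place of a file to show that csv.writer quotes such fields, so they survive a round trip through csv.reader.

```python
import csv
import io

# Two rows shaped like the table extracted above (hard-coded for illustration).
rows = [
    ['Sr.No.', 'Function with Description'],
    ['1', 'cmp(dict1, dict2) Compares elements of both dict.'],
]

# Naive approach: joining with commas. The comma inside the description
# makes this line look like three columns instead of two.
naive_line = ','.join(rows[1])
print(len(naive_line.split(',')))  # 3

# csv.writer quotes any field that contains the delimiter.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(repr(buffer.getvalue()))
# 'Sr.No.,Function with Description\r\n1,"cmp(dict1, dict2) Compares elements of both dict."\r\n'

# Reading it back with csv.reader recovers the original two columns.
parsed = list(csv.reader(io.StringIO(buffer.getvalue())))
print(parsed == rows)  # True
```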
We will put everything explained above together.
Example
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

# Set the URL
url = 'https://www.tutorialspoint.com/python/python_basic_syntax.htm'

# Open the URL and parse the HTML
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')

# Extract the first table and its rows
table = soup.find_all('table')[0]
rows = table.find_all('tr')

# Write the content to the CSV file
with open('my_html_data_to_csv.csv', 'w', newline='', encoding='utf-8') as csv_file:
    data = csv.writer(csv_file)
    for row in rows:
        filtered_row = [cell.get_text(strip=True) for cell in row.find_all(['td', 'th'])]
        data.writerow(filtered_row)
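The program above saves only the first table, but the original problem was a page with many tables. A minimal sketch, assuming each table has already been turned into a list of rows (each row a list of cell strings), writes every table to its own numbered CSV file. The function name write_tables_to_csv, the table_N.csv naming scheme, and the sample data are hypothetical. With BeautifulSoup, table.find_all('tr') also returns rows of any table nested inside, so nested layouts would need extra care.

```python
import csv
import os
import tempfile


def write_tables_to_csv(tables, out_dir='.', prefix='table'):
    """Write each table to its own CSV file (table_1.csv, table_2.csv, ...).

    `tables` is a list of tables; each table is a list of rows and each row
    is a list of cell strings. Returns the list of file paths written.
    """
    paths = []
    for number, rows in enumerate(tables, start=1):
        path = os.path.join(out_dir, f'{prefix}_{number}.csv')
        with open(path, 'w', newline='', encoding='utf-8') as f:
            csv.writer(f).writerows(rows)
        paths.append(path)
    return paths


# With BeautifulSoup, `tables` could be built from the parsed page like this:
#   tables = [
#       [[cell.get_text(strip=True) for cell in row.find_all(['td', 'th'])]
#        for row in table.find_all('tr')]
#       for table in soup.find_all('table')
#   ]
# Hard-coded sample data is used here so the sketch runs offline.
sample_tables = [
    [['Operator', 'Example'], ['+', 'a + b']],
    [['Function', 'Description'], ['len(dict)', 'Length of the dictionary']],
]
out_dir = tempfile.mkdtemp()  # temporary folder, so no files land in the cwd
written = write_tables_to_csv(sample_tables, out_dir)
print([os.path.basename(p) for p in written])  # ['table_1.csv', 'table_2.csv']
```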