BeautifulSoup is a Python library for pulling data out of HTML and XML files. With BeautifulSoup, we can also remove empty tags (tags that contain no text) from HTML or XML documents, which leaves cleaner, more readable markup.
First, we will install the BeautifulSoup library in our local environment. The example below uses the lxml parser, which is a separate package, so we install it as well: pip install beautifulsoup4 lxml
Example
```python
# Import the BeautifulSoup library
from bs4 import BeautifulSoup

# Get the HTML document
html_object = """
<p>Python is an interpreted, high-level and general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant indentation.</p>
"""

# Create the soup for the given HTML document (the "lxml" parser must be installed)
soup = BeautifulSoup(html_object, "lxml")

# Iterate over every tag in the document and remove those that contain no text
for x in soup.find_all():
    if len(x.get_text(strip=True)) == 0:
        x.extract()

print(soup)
```
Output
Running the above code prints the cleaned document. Any tag whose text is empty or only whitespace is removed; the paragraph here contains text, so it is kept. The lxml parser also wraps the fragment in <html> and <body> tags.
```html
<html><body><p>Python is an interpreted, high-level and general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant indentation.</p>
</body></html>
```
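The sample document above contains no empty tags, so nothing is actually removed. The sketch below runs the same text-based check on a fragment that does contain empty tags. It uses Python's built-in "html.parser" so lxml is not required. It also makes one assumption the original code does not: void elements such as <br> and <img> never contain text, so a pure text check would delete them. This sketch therefore skips them, and keeps any tag that wraps one. The helper names remove_empty_tags, is_empty and VOID_TAGS are hypothetical, and the void-tag list is illustrative rather than complete.

```python
from bs4 import BeautifulSoup

# Hypothetical, non-exhaustive list of void elements: they never hold text,
# so a text-only emptiness check would wrongly remove them
VOID_TAGS = ["br", "hr", "img", "input", "meta", "link"]


def is_empty(tag):
    """Return True if the tag has no visible text and holds no void element."""
    if tag.name in VOID_TAGS:
        return False
    if tag.get_text(strip=True):
        return False
    # Keep wrappers such as <p><br></p> that contain a void element
    return tag.find(VOID_TAGS) is None


def remove_empty_tags(soup):
    # find_all() builds the full list up front, so extracting tags
    # while looping over it does not skip any elements
    for tag in soup.find_all():
        if is_empty(tag):
            tag.extract()
    return soup


html = (
    "<div><p>Hello</p><p>   </p><span><b></b></span>"
    '<img src="a.png"><p><br></p></div>'
)
cleaned = remove_empty_tags(BeautifulSoup(html, "html.parser"))
print(cleaned)
```

Here the whitespace-only <p> and the <span><b></b></span> pair are removed, while the <img> and the <p> that wraps a <br> survive. Like the original, this uses extract(), which detaches a tag while leaving it intact. That means tags nested inside an already removed parent can still be checked safely later in the loop.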