Python – Convert HTML Characters To Strings

Last Updated : 15 Jan, 2025

Converting HTML characters to strings means decoding HTML entities such as &lt; into the characters they represent, such as <. This makes encoded HTML content readable as plain text. In this article, we will look at two ways to decode HTML entities in Python: the built-in html module and the BeautifulSoup library.

Using html.unescape()

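The html module in Python's standard library provides the unescape() function, which converts named entities (such as &Gamma;) and numeric character references in a string back into their Unicode characters. No installation is needed, so it is the simplest option when the input is an HTML-encoded string.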

Python
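# Import the standard-library html module
import html

# Text containing the HTML entity for the Greek capital letter gamma
text = '&Gamma;eeks for &Gamma;eeks'

# Decode HTML entities into their characters
print(html.unescape(text))

# Escape HTML-special characters (&, <, >, and quotes)
print(html.escape(text))

Output:

Γeeks for Γeeks
&amp;Gamma;eeks for &amp;Gamma;eeks

Explanation:

  • html.unescape() converts HTML entities such as &Gamma; back into their respective characters, such as Γ.
  • html.escape() replaces only the characters that are special in HTML (&, <, >, and by default both quote characters). Here it turns each & into &amp;, so &Gamma; becomes &amp;Gamma;. It does not convert a character such as Γ into &Gamma;.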
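
As a further illustration, here is a minimal sketch showing that html.unescape() also handles decimal and hexadecimal character references, and how the quote parameter of html.escape() changes the result. The sample strings and variable names are made up for this example.

```python
import html

# Named, decimal, and hexadecimal references are all decoded.
encoded = "&lt;p&gt;Caf&eacute; &#169; 2025 &#x3A9;&lt;/p&gt;"
decoded = html.unescape(encoded)
print(decoded)  # <p>Café © 2025 Ω</p>

# html.escape() leaves non-ASCII letters alone and escapes quotes by default.
sample = 'Γ < "Δ"'
with_quotes = html.escape(sample)
without_quotes = html.escape(sample, quote=False)
print(with_quotes)     # Γ &lt; &quot;Δ&quot;
print(without_quotes)  # Γ &lt; "Δ"
```

Passing quote=False is usually only appropriate when the escaped text will not be placed inside an HTML attribute value.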

Using BeautifulSoup

When we are already parsing web data, BeautifulSoup is a convenient choice because it decodes HTML entities automatically while building the parse tree. It is a third-party library (installed with pip install beautifulsoup4), so it is most useful when the input contains real HTML markup as well as entities.

Python
from bs4 import BeautifulSoup

# Sample HTML string
s = "Hello &lt;b&gt;World&lt;/b&gt;!"

# Parse the HTML string using BeautifulSoup
soup = BeautifulSoup(s, "html.parser")

# Convert HTML entities to plain text
res = soup.get_text()
print(res)

Output:

Hello <b>World</b>!

Explanation:

  • Import: The BeautifulSoup class from the bs4 package is used to parse and manipulate HTML or XML content.
  • Parse HTML: The string s is passed to BeautifulSoup with the parser "html.parser", which parses the markup and decodes its entities.
  • get_text() returns the text content of the parsed document, with entities such as &lt; and &gt; already converted to < and >.
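
The difference between the two approaches shows up when the input contains real tags as well as entities. The following sketch, which assumes the beautifulsoup4 package is installed and uses a made-up sample string, compares them: html.unescape() only decodes entities and leaves tags in place, while get_text() also drops the tags.

```python
import html
from bs4 import BeautifulSoup

# Sample markup with real tags and one entity (illustrative only)
markup = "<p>Tom &amp; <b>Jerry</b></p>"

# html.unescape() decodes the entity but keeps the tags
kept_tags = html.unescape(markup)
print(kept_tags)  # <p>Tom & <b>Jerry</b></p>

# BeautifulSoup parses the tags, and get_text() returns only the decoded text
plain = BeautifulSoup(markup, "html.parser").get_text()
print(plain)  # Tom & Jerry
```

Use html.unescape() when the tags must be preserved, and BeautifulSoup when only the readable text is needed.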
