Convert Unicode to ASCII in Python
Last Updated: 08 Feb, 2024
Unicode is a universal character set, a standard designed to support all the world's languages. It contains more than 140,000 characters covering 150+ scripts, along with various symbols. ASCII, on the other hand, is a subset of Unicode and the most widely compatible character set. It consists of 128 characters: English letters, digits, and punctuation, with the remainder being control characters. This article covers converting a wide range of Unicode characters to a simpler ASCII representation using the Python library anyascii.
The text is converted character by character. The mappings for each script are based on conventional schemes. Symbolic characters are converted based on their meaning or appearance. ASCII (American Standard Code for Information Interchange) characters in the input are left untouched; for every other character, the library attempts to find an ASCII replacement. Unknown characters are removed.
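The character-by-character behaviour described above can be sketched with a toy converter. This is not anyascii's implementation: the table `TOY_TABLE` and the function `toy_to_ascii` are hypothetical names made up for illustration, and the real library ships far larger mapping tables. The sketch only shows the three rules: ASCII passes through, known characters are replaced, unknown characters are dropped.

```python
from typing import Mapping

# Hypothetical mini table for illustration only (not anyascii's data).
TOY_TABLE = {
    "é": "e",
    "ö": "o",
    "ß": "ss",   # one character may map to several ASCII characters
    "☆": "*",    # symbol mapped by appearance
}

def toy_to_ascii(text: str, table: Mapping[str, str] = TOY_TABLE) -> str:
    """Convert text one character at a time using a lookup table."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            # ASCII code points (0-127) are left untouched.
            out.append(ch)
        else:
            # Known characters are replaced; unknown ones are removed.
            out.append(table.get(ch, ""))
    return "".join(out)

converted = toy_to_ascii("Héllo ☆ Wörld")
print(converted)
```

Because each character is handled independently, the work grows linearly with the length of the input.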
Installation:
To install this module, run the following command in the terminal:
pip install anyascii
Example 1: Working with Several Languages
In this example, strings written in several different scripts are passed as input, and the output is their ASCII conversion.
Python3
from anyascii import anyascii
# checking for Hindi script
hindi_uni = anyascii('नमस्ते विद्यार्थी')
print("The translation from hindi Script : "
+ str(hindi_uni))
# checking for Punjabi script
pun_uni = anyascii('ਸਤਿ ਸ੍ਰੀ ਅਕਾਲ')
print("The translation from Punjabi Script : "
+ str(pun_uni))
Output :
The translation from hindi Script : nmste vidyarthi
The translation from Punjabi Script : sti sri akal
Example 2: Working with Unicode Emojis and Symbols
The library also handles emojis and symbols, which are represented as Unicode characters.
Python3
from anyascii import anyascii

# working with emoji example (escapes for: sunglasses face, crown, red apple)
emoji_uni = anyascii('\U0001F60E \U0001F451 \U0001F34E')
print("The ASCII from emojis : " + emoji_uni)

# checking for Symbols
sym_uni = anyascii('➕ ☆ ℳ')
print("The ASCII from Symbols : " + sym_uni)
Output:
The ASCII from emojis : :sunglasses: :crown: :apple:
The ASCII from Symbols : :heavy_plus_sign: * M
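The shortcodes in the output above are emoji-style names, not the formal names the Unicode standard assigns. As a rough way to inspect a character before converting it, the standard-library `unicodedata` module can report the official name of each code point. This sketch does not use anyascii; the variable names `chars` and `names` are chosen here for illustration.

```python
import unicodedata

# Escapes for: sunglasses face, crown, red apple,
# heavy plus sign, white star, script capital M.
chars = ["\U0001F60E", "\U0001F451", "\U0001F34E", "\u2795", "\u2606", "\u2133"]

names = {}
for ch in chars:
    # The second argument is returned when a code point has no name.
    names[ch] = unicodedata.name(ch, "UNKNOWN")
    print(f"U+{ord(ch):04X}  {names[ch]}")
```

Comparing these names with the converted output helps show whether a symbol was mapped by meaning (the plus sign) or by appearance (the star and the script letter).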
Using the iconv Utility:
Approach:
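The iconv utility is a system command-line tool that converts text from one character encoding to another. You can use the subprocess module to call it from Python, provided iconv is installed on the system. The //TRANSLIT suffix asks iconv to substitute similar-looking ASCII characters where it can; the exact result depends on the iconv implementation in use.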
Python3
import subprocess

unicode_string = "Héllo, Wörld!"
# Pipe UTF-8 bytes through iconv; check=True raises if iconv exits with an error
result = subprocess.run(['iconv', '-f', 'utf-8', '-t', 'ascii//TRANSLIT'],
                        input=unicode_string.encode('utf-8'),
                        stdout=subprocess.PIPE, check=True)
ascii_string = result.stdout.decode('ascii')
print(ascii_string)
Time Complexity: O(n)
Auxiliary Space: O(n)
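Where iconv is not available, a similar result for accented Latin text can be approximated with the standard library alone. This is a different technique from both anyascii and iconv: it decomposes each character with NFKD normalization and then discards every non-ASCII byte, so it strips accents but does not transliterate. The function name `strip_to_ascii` is a hypothetical helper introduced for this sketch.

```python
import unicodedata

def strip_to_ascii(text: str) -> str:
    """Decompose characters, then drop whatever is still non-ASCII."""
    # NFKD splits "é" into "e" plus a combining accent mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # errors="ignore" silently discards the non-ASCII leftovers.
    return decomposed.encode("ascii", "ignore").decode("ascii")

result = strip_to_ascii("Héllo, Wörld!")
print(result)  # Hello, World!
```

Characters that have no decomposition, such as ß, are removed outright by this approach, so it is best suited to text that is mostly Latin letters with diacritics.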