unicode_literals in Python
Last Updated: 21 Jun, 2021
Unicode, also known as the Universal Character Set, assigns a unique number (a code point) to every character. ASCII uses 7 bits per character, usually stored in one 8-bit byte, so it defines only 128 (2^7) characters; 8-bit extensions raise that limit to 256 (2^8). ASCII therefore covers little more than English letters, digits, and punctuation. It has no room for other languages such as Hindi, Russian, or Chinese, or for emojis. This is where Unicode comes in: it provides a much larger table that contains ASCII as its first 128 entries and has space for other languages, symbols, and emojis.
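As a small illustration of the point above, the sketch below prints the code point of a few characters and checks whether each one fits in the ASCII range. It assumes Python 3, and the helper name `code_points` is made up for this example.

```python
def code_points(text):
    """Return the Unicode code point of each character in text."""
    return [ord(ch) for ch in text]


# 'A', 'ø', 'ℙ', '☂' written as escapes so the file itself stays ASCII
sample = "A\u00f8\u2119\u2602"

for ch, cp in zip(sample, code_points(sample)):
    # ASCII covers only code points 0..127
    label = "ASCII" if cp < 128 else "outside ASCII"
    print("U+{:04X}".format(cp), ch, label)
```

Only the first character falls inside ASCII; the others need the larger Unicode table.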
We cannot save text as Unicode directly, because Unicode is only an abstract mapping from characters to numbers (code points). To store or transmit text, we need an encoding that turns each code point into a sequence of bytes. If a character needs more than 1 byte (8 bits), those bytes must be packed together as a single unit (think of a box with more than one item). UTF-8 and UTF-16 are two such packing schemes. In UTF-8 a character occupies 1 to 4 bytes (a minimum of 8 bits), and in UTF-16 a character occupies 2 or 4 bytes (a minimum of 16 bits). A UTF (Unicode Transformation Format) is just an algorithm that turns Unicode code points into bytes and reads them back.
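The byte counts can be checked with `str.encode`. The following is a minimal sketch, assuming Python 3; the helper name `encoded_sizes` is hypothetical, and `utf-16-be` is used so that no byte-order mark is added to the count.

```python
def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


# 'A', 'ø', '☂', and an emoji outside the Basic Multilingual Plane
for ch in ["A", "\u00f8", "\u2602", "\U0001F600"]:
    utf8_len, utf16_len = encoded_sizes(ch)
    print("U+{:04X}: UTF-8 {} byte(s), UTF-16 {} byte(s)".format(
        ord(ch), utf8_len, utf16_len))

# Encoding and then decoding with the same scheme returns the original text
text = "A\u00f8\u2602"
assert text.encode("utf-8").decode("utf-8") == text
```

ASCII characters stay at one byte in UTF-8, which is one reason UTF-8 is a common choice for files and network traffic.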
Normally, in Python 2 all string literals are byte strings by default, whereas in Python 3 all string literals are Unicode strings by default. To make all string literals Unicode in Python 2, we use the following import:
from __future__ import unicode_literals
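If we are using Python 2, we need to import unicode_literals from the __future__ module. The import must appear near the top of the file, before any other code, and it affects only the string literals in that file. It makes Python 2 treat string literals the way Python 3 does, which helps keep the code compatible with both versions. In Python 3 the import is still accepted and has no effect.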
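A minimal sketch of the effect, written so that the same file should run under both Python 2 and Python 3. The variable names are only illustrative. Without the import, Python 2 would treat `"\u2602"` as a six-character byte string, because `\u` is not an escape sequence in byte-string literals.

```python
from __future__ import unicode_literals

# With the import in place (or on Python 3), this literal is one Unicode character.
umbrella = "\u2602"

# type("") is `unicode` on Python 2 with the import, and `str` on Python 3.
text_type = type("")

# Byte strings still have to be marked explicitly with a b prefix.
raw = b"abc"

print(len(umbrella))                    # number of characters, not bytes
print(isinstance(umbrella, text_type))
print(isinstance(raw, text_type))
```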
Python 2
Python
import sys
# checking the default encoding of strings
# (the call stays on the same line as the print statement so its value is printed)
print "The default encoding for python2 is:", sys.getdefaultencoding()
Output:
The default encoding for python2 is: ascii
Because the default encoding in Python 2 is ASCII, we need to encode Unicode strings to UTF-8 explicitly before printing them.
Python
from __future__ import unicode_literals
# creating variables to hold
# the letters of the word "Python"
p = "\u2119"
y = "\u01b4"
t = "\u2602"
h = "\u210c"
o = "\u00f8"
n = "\u1f24"
# joining the letters, encoding the result
# from Unicode to UTF-8 bytes, and printing it
print((p + y + t + h + o + n).encode("utf-8"))
Python 3
Python3
# In python3
# By default the encoding is "utf-8"
import sys
# printing the default encoding
print("The default encoding for python3 is:", sys.getdefaultencoding())
# in Python 3 every string literal is already Unicode,
# so the u"...." prefix is optional (accepted since Python 3.3)
p = u"\u2119"
y = u"\u01b4"
t = u"\u2602"
h = u"\u210c"
o = u"\u00f8"
n = u"\u1f24"
# printing Python
print(p+y+t+h+o+n)
Output:
The default encoding for python3 is: utf-8
ℙƴ☂ℌøἤ
Here,

| Sr. no. | Unicode | Description |
|---|---|---|
| 1. | U+2119 | It will display the double-struck capital P. |
| 2. | U+01B4 | It will display the Latin small letter Y with a hook. |
| 3. | U+2602 | It will display an umbrella. |
| 4. | U+210C | It will display the black-letter capital H. |
| 5. | U+00F8 | It will display the Latin small letter O with a stroke. |
| 6. | U+1F24 | It will display the Greek small letter eta with psili and oxia. |
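The official names of these characters can also be looked up with the standard `unicodedata` module. The sketch below assumes Python 3; the helper name `describe` is made up for this example.

```python
import unicodedata


def describe(text):
    """Return a (code point label, official Unicode name) pair per character."""
    return [("U+{:04X}".format(ord(ch)), unicodedata.name(ch)) for ch in text]


# the six characters used to spell "Python" in the example
word = "\u2119\u01b4\u2602\u210c\u00f8\u1f24"

for label, name in describe(word):
    print(label, name)
```

This is a convenient way to check what an unfamiliar escape such as `"\u1f24"` actually stands for.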