Python Programming UNIT 2
1. PYTHON STRING
So far, we have discussed numbers as standard data types in Python. In this section, we will
discuss one of the most popular data types in Python: the string.
A Python string is a sequence of characters surrounded by single quotes, double quotes,
or triple quotes. A computer does not understand characters directly; internally, it
stores each character as a combination of 0s and 1s.
Each character is encoded as a Unicode code point (ASCII characters are the first 128
code points of Unicode), so Python strings can also be described as sequences of
Unicode characters.
Syntax:
str = "Hi Python !"
Here, if we check the type of the variable str using a Python script, for example with
print(type(str)), the output is <class 'str'>.
In Python, strings are treated as sequences of characters. Python does not have a
separate character data type; instead, a single character written as 'p' is treated as a
string of length 1.
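Printing one string written in single quotes, one in double quotes, and one in triple
quotes gives the following output:
Output:
Hello Python
Hello Python
Triple quotes are generally used to
represent multiline strings or
docstrings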
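The original code for this output is not preserved in these notes. The following is a minimal sketch of the three quoting styles; the variable names and the triple-quoted text are illustrative, not taken from the original example. It also checks the point made earlier that a single character is just a string of length 1.

```python
# Single, double, and triple quotes all create objects of type str
single = 'Hello Python'
double = "Hello Python"
triple = '''Triple quotes can span
more than one line'''

print(single == double)   # True: the quote style does not change the value
print(type(single))       # <class 'str'>
print(triple)             # printed on two lines

# Python has no separate character type: 'p' is a str of length 1
ch = 'p'
print(type(ch), len(ch))  # <class 'str'> 1
```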
String indexing and slicing
As in many other languages, indexing of Python strings starts from 0. For example, the
string "HELLO" is indexed as follows: str[0] = 'H', str[1] = 'E', str[2] = 'L', str[3] = 'L',
and str[4] = 'O'.
str = "HELLO"
print(str[0])
print(str[1])
print(str[2])
print(str[3])
print(str[4])
# Raises IndexError because index 6 does not exist (valid indices are 0 to 4)
print(str[6])
Output:
H
E
L
L
O
IndexError: string index out of range
As shown above, the index operator [] is used to access individual characters of a
string. We can also use the : (colon) operator inside the brackets to access a substring
of the given string; this is called slicing. Consider the following example.
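Note that the upper bound given in a slice is always exclusive; that is, if str = 'HELLO',
then str[1:3] includes only str[1] = 'E' and str[2] = 'L', giving 'EL'.
str = 'JAVATPOINT'
print(str[0:])    # from index 0 to the end
print(str[1:5])   # indices 1 to 4
print(str[2:4])   # indices 2 and 3
print(str[:3])    # indices 0 to 2
print(str[4:7])   # indices 4 to 6
Output: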
JAVATPOINT
AVAT
VA
JAV
TPO
We can also use negative indices in a string; they count from the rightmost character,
which has index -1. The second rightmost character has index -2, and so on. Consider
the following example.
str = 'JAVATPOINT'
print(str[-1])
print(str[-3])
print(str[-2:])
print(str[-4:-1])
print(str[-7:-2])
# Reversing the given string
print(str[::-1])
# Raises IndexError: -12 is out of range for a 10-character string
print(str[-12])
Output:
T
I
NT
OIN
ATPOI
TNIOPTAVAJ
IndexError: string index out of range
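Unlike indexing, slicing does not raise IndexError when its bounds fall outside the string; Python clamps the bounds to the valid range instead. A short sketch using the same 'JAVATPOINT' string (the variable name s is illustrative):

```python
s = 'JAVATPOINT'

# Indexing outside the string raises IndexError ...
try:
    s[-12]
except IndexError as err:
    print('IndexError:', err)   # IndexError: string index out of range

# ... but slicing with out-of-range bounds just clamps them
print(s[-12:])    # JAVATPOINT (start clamped to 0)
print(s[5:100])   # POINT (stop clamped to len(s))
print(s[20:30])   # prints an empty line: the slice is '' and no error is raised
```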
Deleting the String
As we know, strings are immutable: we cannot delete or remove individual characters
from a string. However, we can delete the entire string using the del keyword.
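str = "JAVATPOINT"
del str[1]
Output:
TypeError: 'str' object doesn't support item deletion
Deleting the entire string, however, is allowed; after that, the name is no longer defined:
str1 = "JAVATPOINT"
del str1
print(str1)
Output:
NameError: name 'str1' is not defined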
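Because strings are immutable, "removing" a character really means building a new string without it. A minimal sketch using slicing and str.replace(); the helper name remove_char_at is hypothetical, not a built-in, and it assumes a non-negative index.

```python
def remove_char_at(text: str, index: int) -> str:
    """Hypothetical helper: return a new string without the character at a non-negative index."""
    return text[:index] + text[index + 1:]

original = "JAVATPOINT"
shorter = remove_char_at(original, 1)
print(shorter)                          # JVATPOINT
print(original)                         # JAVATPOINT (the original is unchanged)

# str.replace() also returns a new string; the third argument limits it to the first match
print(original.replace("A", "", 1))     # JVATPOINT
```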
String Operators
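Operator   Description
+          Known as the concatenation operator; joins the strings on either side of it.
*          Known as the repetition operator; concatenates multiple copies of the same string.
[]         Known as the index operator; accesses the character at a particular index.
[:]        Known as the range slice operator; accesses the characters in the specified range.
in         A membership operator; returns True if a particular substring is present in the
           specified string.
not in     Also a membership operator, and does the exact reverse of in. It returns True if a
           particular substring is not present in the specified string.
r/R        Specifies a raw string, in which escape sequences such as \n are not interpreted.
           Raw strings are useful for text such as Windows paths, e.g. r"C:\new". To define
           a raw string, place the character r or R before the opening quote.
%          Performs string formatting by substituting values into format specifiers such as
           %d and %s.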
Example
Consider the following example to understand how Python's string operators are used.
str = "Hello"
str1 = " world"
print(str*3)                # prints HelloHelloHello
print(str+str1)             # prints Hello world
print(str[4])               # prints o
print(str[2:4])             # prints ll
print('w' in str)           # prints False as w is not present in str
print('wo' not in str1)     # prints False as wo is present in str1
print(r'C:\new\python37')   # prints C:\new\python37 as it is written (\n is not a newline)
print("The string str : %s" % (str))   # prints The string str : Hello
Output:
HelloHelloHello
Hello world
o
ll
False
False
C:\new\python37
The string str : Hello
2. UNICODE
Unicode is a system designed to represent every character from every language.
It assigns each letter, character, or ideograph a unique number, called a code point.
Each number represents a unique character used in at least one of the world's
languages.
Characters that are used in multiple languages generally have the same number,
unless there is a good etymological reason not to.
Regardless, there is exactly 1 number per character, and exactly 1 character per
number.
The simplest way to store these numbers is the UTF-32 encoding, which stores every
character as a 4-byte (32-bit) number.
Advantage: the most important is that you can find the Nth character of a string in
constant time, because the Nth character starts at the 4×Nth byte.
Disadvantage: the most obvious is that it takes four bytes to store every single
character.
Even though there are a lot of Unicode characters, it turns out that most people
will never use anything beyond the first 65,536 (code points 0–65535). Thus, there is
another Unicode encoding, called UTF-16 (because 16 bits = 2 bytes).
UTF-16 encodes every character from 0–65535 as two bytes, then uses a four-byte
sequence called a surrogate pair if you actually need to represent the rarely used
"astral plane" Unicode characters beyond 65535.
Most obvious advantage: UTF-16 is twice as space-efficient as UTF-32, because
every character outside the astral planes requires only two bytes to store instead of four.
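But there are also non-obvious disadvantages to both UTF-32 and UTF-16. Different
computer systems store individual bytes in different orders (big-endian or little-endian),
so the character U+4E2D could be stored in UTF-16 as either 4E 2D or 2D 4E. For this
reason, such documents usually begin with a Byte Order Mark (BOM), a special
non-printable character that tells the reader which byte order was used.
Otherwise, the receiving system has no way of knowing whether the two-byte
sequence 4E 2D means U+4E2D or U+2D4E.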
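The size and byte-order points above can be checked with Python's built-in codecs. A small sketch: 'utf-32-be', 'utf-16-be', 'utf-16-le', and 'utf-16' are standard Python codec names, while the sample text is illustrative. The -be/-le codecs write no BOM; the plain 'utf-16' codec prepends one.

```python
text = "A\u4e2d\U0001F600"   # 'A', the CJK character U+4E2D, and an emoji (U+1F600)

print([hex(ord(ch)) for ch in text])   # ['0x41', '0x4e2d', '0x1f600']

utf32 = text.encode("utf-32-be")
print(len(utf32))                      # 12: 4 bytes per character

# Fixed width means character N starts at byte 4*N, so it can be sliced out directly
n = 1
print(utf32[4 * n:4 * (n + 1)].decode("utf-32-be") == text[n])   # True

utf16 = text.encode("utf-16-be")
print(len(utf16))                      # 8: 2 + 2 + 4 (the emoji needs a surrogate pair)

# The same character in the two byte orders
print("\u4e2d".encode("utf-16-be"))    # b'N-'  (bytes 4E 2D)
print("\u4e2d".encode("utf-16-le"))    # b'-N'  (bytes 2D 4E)

print(len("\u4e2d".encode("utf-16")))  # 4: 2-byte BOM + 2-byte character
```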
UTF-8
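UTF-8 is a variable-length encoding: different characters take up a different number of
bytes. For ASCII characters (A–Z, digits, and so on), UTF-8 uses just one byte per
character. In fact, it uses the exact same bytes; the first 128 characters (0–127) in
UTF-8 are indistinguishable from ASCII.
"Extended Latin" characters like ñ and ö end up taking two bytes, Chinese characters
like 中 take three bytes, and the rarely used "astral plane" characters take four bytes.
Also, there is bit-twiddling involved to encode characters into bytes and decode
bytes into characters; the bytes are not simply the code point, as they are in UTF-32.
A document encoded in UTF-8 uses the exact same stream of bytes on any
computer, so UTF-8 has no byte-order problem.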
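A quick way to see UTF-8's variable width is to encode one character from each range; the sample characters are illustrative.

```python
samples = ["A", "\u00f1", "\u4e2d", "\U0001F600"]   # 'A', 'ñ', '中', an emoji

for ch in samples:
    encoded = ch.encode("utf-8")
    print(hex(ord(ch)), len(encoded), encoded)
# the byte counts printed are 1, 2, 3, 4

# ASCII text produces exactly the same bytes in UTF-8 and in ASCII
print("Hello".encode("utf-8") == "Hello".encode("ascii"))   # True

print("\u00f1".encode("utf-8"))   # b'\xc3\xb1'
```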
3. PYTHON STRING FORMATTING
Escape Sequence
Suppose we need to write the text They said, "Hello what's going on?". This statement
cannot simply be written in single quotes or double quotes: because it contains both
single and double quotes, Python raises a SyntaxError.
Example
Consider the following example, which mixes both kinds of quotes:
str = "They said, "Hello what's going on?""
print(str)
Output:
SyntaxError
We can use triple quotes to solve this problem, but Python also provides escape
sequences.
The backslash (\) symbol begins an escape sequence. The backslash is followed by a
special character, and the pair is interpreted differently from the characters on their own.
Single quotes inside a single-quoted string must be escaped, and the same applies to
double quotes inside a double-quoted string.
Example
# using triple quotes
print('''They said, "What's there?"''')

# escaping single quotes
print('They said, "What\'s going on?"')

# escaping double quotes
print("They said, \"What's going on?\"")
Output:
They said, "What's there?"
They said, "What's going on?"
They said, "What's going on?"
Escape Sequence   Description   Example
\\                Backslash     print("\\")
Output:
\
print("C:\\Users\\DEVANSH SHARMA\\Python32\\Lib")
print("This is the \n multiline quotes")
print("This is \x48\x45\x58 representation")
Output:
C:\Users\DEVANSH SHARMA\Python32\Lib
This is the
multiline quotes
This is HEX representation
We can ignore the escape sequence from the given string by using the raw string. We
can do this by writing r or R in front of the string. Consider the following example.
print(r"C:\\Users\\DEVANSH SHARMA\\Python32")
Output:
C:\\Users\\DEVANSH SHARMA\\Python32
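One way to see what a raw string changes is to compare lengths: in a normal string a backslash pair is a single escape sequence, while in a raw string every backslash is kept as written. A small sketch with illustrative strings:

```python
normal = "C:\\new"    # '\\' is an escape sequence for one backslash
raw = r"C:\new"       # raw string: the backslash is kept, so \n is NOT a newline

print(normal == raw)            # True
print(len(raw))                 # 6: C, :, \, n, e, w
print(len("\n"), len(r"\n"))    # 1 2
```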
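Python Formatting Operator
Python supports C-style format specifiers such as %d (integer), %f (floating point), and
%s (string). The % operator is used as the interface between a string containing these
specifiers and the values to substitute into it.
Integer = 10
Float = 1.290
String = "Devansh"
print("Hi I am Integer ... My value is %d\nHi I am float ... My value is %f\nHi I am string ... My value is %s" % (Integer, Float, String))
Output:
Hi I am Integer ... My value is 10
Hi I am float ... My value is 1.290000
Hi I am string ... My value is Devansh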
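Besides the % operator, Python 3 offers str.format() and f-strings (Python 3.6+), which are generally preferred in new code. A sketch producing the same text three ways; the variable names are illustrative, and :f mirrors %f's default of six decimal places.

```python
integer = 10
floating = 1.290
string = "Devansh"

# Classic % formatting
old = "Integer %d, float %f, string %s" % (integer, floating, string)

# str.format(): {} placeholders are filled in order; :d and :f work like %d and %f
fmt = "Integer {:d}, float {:f}, string {}".format(integer, floating, string)

# f-string (Python 3.6+): expressions go directly inside the braces
new = f"Integer {integer:d}, float {floating:f}, string {string}"

print(old)                 # Integer 10, float 1.290000, string Devansh
print(old == fmt == new)   # True
```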
Method                                            Description
bytes.decode(encoding='utf-8', errors='strict')   Decodes the bytes using the codec registered
                                                  for encoding. In Python 3, decode() is a
                                                  method of bytes objects; str objects do not
                                                  have it.
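In Python 3, encode() turns a str into bytes and decode() turns bytes back into a str. A minimal round-trip sketch; the sample text is illustrative.

```python
text = "na\u00efve caf\u00e9"             # 'naïve café'
data = text.encode("utf-8")                # str -> bytes
print(type(data), len(text), len(data))    # <class 'bytes'> 10 12

back = data.decode("utf-8")                # bytes -> str
print(back == text)                        # True

# With the default errors='strict', decoding with the wrong codec raises an error
try:
    data.decode("ascii")
except UnicodeDecodeError as err:
    print("UnicodeDecodeError:", err.reason)   # UnicodeDecodeError: ordinal not in range(128)
```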
5. PYTHON BYTES()
The bytes() function returns an immutable bytes object initialized with the given size and data.
A bytes object is an immutable (cannot be modified) sequence of integers in the range
0 <= x < 256.
If you need a mutable version, use the bytearray() function.
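bytes() Parameters
bytes() takes three optional parameters: source, encoding, and errors. Depending on its
type, source initializes the bytes object as follows:
Type                       Description
String                     Converts the string to bytes using str.encode(); encoding must
                           also be given, and errors optionally
Integer                    Creates an array of the given size, with every byte set to 0 (null)
Object                     A read-only buffer of the object is used to initialize the byte array
Iterable                   Creates an array of size equal to the iterable count, initialized to
                           the iterable elements; must be an iterable of integers between
                           0 <= x < 256
No source (no arguments)   Creates an array of size 0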
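Return value: bytes() returns a bytes object of the given size and initialization values.
Example: convert a string to bytes
string = "Python is interesting."
arr = bytes(string, 'utf-8')   # the string is encoded with UTF-8
print(arr)
Output
b'Python is interesting.'
Example: create a bytes object of a given size
size = 5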
arr = bytes(size)
print(arr)
Output
b'\x00\x00\x00\x00\x00'
Example: convert an iterable list to bytes
rList = [1, 2, 3, 4, 5]
arr = bytes(rList)
print(arr)
Output
b'\x01\x02\x03\x04\x05'
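The difference between bytes and its mutable counterpart bytearray, mentioned at the start of this section, can be sketched as follows; the values are illustrative.

```python
data = bytes([1, 2, 3])

try:
    data[0] = 255                # bytes objects cannot be modified in place
except TypeError as err:
    print("TypeError:", err)     # TypeError: 'bytes' object does not support item assignment

mutable = bytearray(data)        # mutable copy with the same contents
mutable[0] = 255                 # item assignment works on bytearray
mutable.append(4)                # so does growing it
print(mutable)                   # bytearray(b'\xff\x02\x03\x04')
print(bytes(mutable))            # b'\xff\x02\x03\x04'
print(data)                      # b'\x01\x02\x03' (the original is unchanged)
```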
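String encode() Method
The encode() method encodes a string using the specified encoding; if no encoding is
given, UTF-8 is used.
Example
UTF-8 encode the string:
txt = "My name is Ståle"   # example string containing the non-ASCII character å
x = txt.encode()
print(x)
Output:
b'My name is St\xc3\xa5le'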
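Syntax
string.encode(encoding=encoding, errors=errors)
Parameter Values
Parameter   Description
encoding    Optional. A string specifying the encoding to use. Default is UTF-8
errors      Optional. A string specifying the error method. Legal values are:
            'backslashreplace' - uses a backslash instead of the character that could not be encoded
            'ignore' - ignores the characters that cannot be encoded
            'namereplace' - replaces the character with a text explaining the character
            'strict' - default, raises an error on failure
            'replace' - replaces the character with a question mark
            'xmlcharrefreplace' - replaces the character with an XML character reference
Example
These examples use ASCII encoding and a character that cannot be encoded, showing the
result with different errors:
txt = "My name is Ståle"
print(txt.encode(encoding="ascii",errors="backslashreplace"))
print(txt.encode(encoding="ascii",errors="ignore"))
print(txt.encode(encoding="ascii",errors="namereplace"))
print(txt.encode(encoding="ascii",errors="replace"))
print(txt.encode(encoding="ascii",errors="xmlcharrefreplace"))
Output:
b'My name is St\\xe5le'
b'My name is Stle'
b'My name is St\\N{LATIN SMALL LETTER A WITH RING ABOVE}le'
b'My name is St?le'
b'My name is St&#229;le'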
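With the default errors='strict', encode() raises UnicodeEncodeError instead of substituting anything. A sketch of catching that error and falling back to a lossy encoding; the example string is illustrative.

```python
txt = "My name is St\u00e5le"   # example string; '\u00e5' is 'å'

position = None
try:
    txt.encode(encoding="ascii", errors="strict")   # 'strict' is the default
except UnicodeEncodeError as err:
    # err.start is the index of the first character that could not be encoded
    position = err.start
    print("Cannot encode", ascii(txt[err.start]), "at position", err.start)
# prints: Cannot encode '\xe5' at position 13

# After catching the error, the program can fall back to a lossy encoding
safe = txt.encode("ascii", errors="replace")
print(safe)   # b'My name is St?le'
```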
7. REGULAR EXPRESSIONS
Regular expressions are a powerful and (mostly) standardized way of searching, replacing, and parsing
text with complex patterns of characters. Although the regular expression syntax is tight and unlike
normal code, the result can end up being more readable than a hand-rolled solution that uses a long chain
of string functions. There are even ways of embedding comments within regular expressions, so you can
include fine-grained documentation within them.
So far you've just been dealing with what I'll call “compact” regular expressions. As you've seen, they
are difficult to read, and even if you figure out what one does, that's no guarantee that you'll be able to
understand it six months later. What you really need is inline documentation.
Python allows you to do this with something called verbose regular expressions. A verbose regular
expression is different from a compact regular expression in two ways:
• Whitespace is ignored. Spaces, tabs, and carriage returns are not matched as spaces, tabs, and carriage
returns; they're not matched at all. (If you want to match a space in a verbose regular expression, you'll
need to escape it by putting a backslash in front of it.)
• Comments are ignored. A comment in a verbose regular expression is just like a comment in Python
code: it starts with a # character and goes until the end of the line. In this case it's a comment within a
multi-line string instead of within your source code, but it works the same way.
This will be clearer with an example. Let's revisit the compact regular expression you've been
working with and make it a verbose regular expression. This example shows how.
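The listing this walk-through refers to did not survive in these notes. The sketch below is a reconstruction consistent with the numbered explanations, assuming the compact expression being revisited is the Roman numeral validator ^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$; treat the exact pattern as an assumption.

```python
import re

# Assumed pattern: a Roman numeral validator written as a verbose regular expression
pattern = """
    ^                   # beginning of string
    M{0,3}              # thousands: 0 to 3 Ms
    (CM|CD|D?C{0,3})    # hundreds: 900 (CM), 400 (CD), 0-300 (0 to 3 Cs),
                        #           or 500-800 (D, followed by 0 to 3 Cs)
    (XC|XL|L?X{0,3})    # tens: 90 (XC), 40 (XL), 0-30 (0 to 3 Xs),
                        #       or 50-80 (L, followed by 0 to 3 Xs)
    (IX|IV|V?I{0,3})    # ones: 9 (IX), 4 (IV), 0-3 (0 to 3 Is),
                        #       or 5-8 (V, followed by 0 to 3 Is)
    $                   # end of string
"""

print(re.search(pattern, "M", re.VERBOSE))                # a match object
print(re.search(pattern, "MCMLXXXIX", re.VERBOSE))        # a match object
print(re.search(pattern, "MMMDCCCLXXXVIII", re.VERBOSE))  # a match object
print(re.search(pattern, "M"))                            # None: no re.VERBOSE flag
```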
1. The most important thing to remember when using verbose regular expressions is that you need to pass
an extra argument when working with them: re.VERBOSE is a constant defined in the re module that
signals that the pattern should be treated as a verbose regular expression. As you can see, this pattern has
quite a bit of whitespace (all of which is ignored) and several comments (all of which are ignored). Once
you ignore the whitespace and the comments, this is exactly the same regular expression as you saw in the
previous section, but it's a lot more readable.
2. This matches the start of the string, then one of a possible three M, then CM, then L and three of a
possible three X, then IX, then the end of the string.
3. This matches the start of the string, then three of a possible three M, then D and three of a possible
three C, then L and three of a possible three X, then V and three of a possible three I, then the end of the
string.
4. This does not match. Why? Because it doesn't have the re.VERBOSE flag, so the re.search function is
treating the pattern as a compact regular expression, with significant whitespace and literal hash marks.
Python can't auto-detect whether a regular expression is verbose or not. Python assumes every regular
expression is compact unless you explicitly state that it is verbose.
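Since the verbose setting must always be stated explicitly, it helps to know the ways of doing so. As a sketch, Python's re module offers three equivalent options: the re.VERBOSE flag, its short alias re.X, and the inline (?x) flag placed at the very start of the pattern. The phone-number-like pattern is illustrative.

```python
import re

verbose_text = r"""
    \d{3}   # three digits
    -       # a literal hyphen
    \d{4}   # four digits
"""

long_flag = re.compile(verbose_text, re.VERBOSE)
short_flag = re.compile(verbose_text, re.X)       # re.X is the short alias of re.VERBOSE
inline_flag = re.compile("(?x)" + verbose_text)   # inline flag; must come first in the pattern

for compiled in (long_flag, short_flag, inline_flag):
    print(bool(compiled.fullmatch("555-1234")))   # True each time

# With no verbose marker at all, the newlines, spaces, and comments are matched literally
print(re.fullmatch(verbose_text, "555-1234"))     # None
```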