Python Programming UNIT 2

1. The document discusses Python strings and string formatting.
2. Python strings can be created using single quotes, double quotes, or triple quotes and are treated as a sequence of characters.
3. Strings support operators like concatenation (+), repetition (*), slicing ([]), indexing, formatting and membership tests. Strings are immutable in Python.

Uploaded by

indu budamagunta
Copyright © All Rights Reserved

UNIT-II

1. PYTHON STRING
So far, we have discussed numbers as a standard data type in Python. In this section, we discuss the most popular data type in Python: the string.

A Python string is a sequence of characters surrounded by single quotes, double quotes, or triple quotes. The computer does not understand characters directly; internally, it stores each character as a combination of 0s and 1s.

Each character is encoded as an ASCII or Unicode value. So we can say that Python strings are sequences of Unicode characters.
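A quick way to see the numbers behind the characters is the built-in ord() function, which returns a character's Unicode code point; chr() goes the other way. In this sketch the sample text "Hi" is arbitrary, and each code point is also shown as 8 binary digits.

```python
# Each character of a Python 3 str is stored as a Unicode code point (a number).
text = "Hi"

for ch in text:
    code_point = ord(ch)               # e.g. 'H' -> 72
    bits = format(code_point, "08b")   # the same number written as 8 binary digits
    print(ch, code_point, bits)        # H 72 01001000, then i 105 01101001

print(chr(72))      # chr() is the inverse of ord(): prints H
print(ord("é"))     # 233: a Unicode character outside the ASCII range 0-127
```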

In Python, strings can be created by enclosing a character or a sequence of characters in quotes. Python allows us to use single quotes, double quotes, or triple quotes to create a string.

Consider the following example in Python to create a string.

Syntax:
s = "Hi Python !"

Here, if we check the type of the variable s using print(type(s)), it prints <class 'str'>.

In Python, strings are treated as sequences of characters. Python has no separate character data type; a single character such as 'p' is simply a string of length 1.
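A short check of this claim: a one-character literal and the result of indexing a longer string both have type str and length 1. The variable names are illustrative.

```python
ch = "p"
print(type(ch), len(ch))        # <class 'str'> 1

word = "python"
first = word[0]                 # indexing a string also gives a str, not a char type
print(type(first), first)       # <class 'str'> p
```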

Creating Strings in Python

We can create a string by enclosing the characters in single quotes or double quotes. Python also provides triple quotes to represent a string, but they are generally used for multiline strings or docstrings.

# Using single quotes
str1 = 'Hello Python'
print(str1)

# Using double quotes
str2 = "Hello Python"
print(str2)

# Using triple quotes
str3 = '''Triple quotes are generally used to
represent multiline strings or
docstrings'''
print(str3)

Output:

Hello Python
Hello Python
Triple quotes are generally used to
represent multiline strings or
docstrings

String indexing and slicing
As in many other languages, the indexing of Python strings starts from 0. For example, the characters of the string "HELLO" have the indices H = 0, E = 1, L = 2, L = 3, O = 4.

Consider the following example:

s = "HELLO"
print(s[0])
print(s[1])
print(s[2])
print(s[3])
print(s[4])
# Raises IndexError because index 6 doesn't exist (valid indices are 0 to 4)
print(s[6])

Output:

H
E
L
L
O
IndexError: string index out of range

As shown above, the index operator [] is used to access the individual characters of a string. We can also use the : (colon) inside the brackets to access a substring (a slice) of the given string.

Note that the upper bound of a slice is always exclusive: if s = 'HELLO', then s[1:3] includes s[1] = 'E' and s[2] = 'L' and nothing else, giving 'EL'.

Consider the following example:

# Given string
s = "JAVATPOINT"
# From index 0 to the end
print(s[0:])
# From index 1 to index 4
print(s[1:5])
# From index 2 to index 3
print(s[2:4])
# From index 0 to index 2
print(s[:3])
# From index 4 to index 6
print(s[4:7])

Output:

JAVATPOINT
AVAT
VA
JAV
TPO
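Two further slicing behaviours, sketched on the same "JAVATPOINT" string: a slice can take an optional third value, the step, and slice bounds past the end of the string are clipped instead of raising IndexError (unlike indexing a single position past the end).

```python
s = "JAVATPOINT"

# s[start:stop:step] takes every step-th character
print(s[0:10:2])    # JVTON  (indices 0, 2, 4, 6, 8)
print(s[::3])       # JAOT   (indices 0, 3, 6, 9)

# Slice bounds outside the string are clipped, not an error
print(s[5:100])     # POINT
print(s[20:30] == "")   # True: an empty slice
```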

We can also use negative indices; counting starts from the rightmost character, which has index -1. The second rightmost character has index -2, and so on. Consider the following example:

s = 'JAVATPOINT'
print(s[-1])
print(s[-3])
print(s[-2:])
print(s[-4:-1])
print(s[-7:-2])
# Reversing the given string
print(s[::-1])
# Raises IndexError: the valid negative indices are -1 to -10
print(s[-12])

Output:

T
I
NT
OIN
ATPOI
TNIOPTAVAJ
IndexError: string index out of range
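Why this works: for a string of length n, the negative index -k names the same position as the positive index n - k, so the valid negative indices run from -1 to -n (here -1 to -10, which is why -12 fails). A minimal sketch that checks this equivalence for every position:

```python
s = "JAVATPOINT"
n = len(s)                         # 10

# The negative index -k names the same character as the positive index n - k
for k in range(1, n + 1):
    assert s[-k] == s[n - k]

print(s[-1], s[n - 1])             # T T
print(s[-n], s[0])                 # J J  (-n is the most negative valid index)
```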
Deleting the String
As we know, strings are immutable, so we cannot delete or remove individual characters from a string. However, we can delete the entire string (more precisely, the variable that refers to it) using the del keyword.

s = "JAVATPOINT"
del s[1]

Output:

TypeError: 'str' object doesn't support item deletion

Now we delete the entire string.

str1 = "JAVATPOINT"
del str1
print(str1)

Output:

NameError: name 'str1' is not defined
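Immutability also means that assigning to a single position fails. A minimal sketch: the usual way to "change" a string is to build a new string (for example from a slice) and rebind the name to it. The variable names are illustrative.

```python
s = "JAVATPOINT"

assignment_failed = False
try:
    s[0] = "j"                     # strings do not support item assignment
except TypeError as err:
    assignment_failed = True
    print("TypeError:", err)

# To "change" a string, build a new one and rebind the name to it
s = "j" + s[1:]
print(s)                           # jAVATPOINT
```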

String Operators

Operator   Description
+          Concatenation operator: joins the strings on either side of the operator.
*          Repetition operator: concatenates multiple copies of the same string.
[]         Index operator: accesses the character at a given index of a string.
[:]        Range slice operator: accesses the characters in the specified range.
in         Membership operator: returns True if a particular substring is present in the specified string.
not in     Membership operator that does the exact reverse of in: returns True if a particular substring is not present in the specified string.
r/R        Raw string prefix: escape sequences are not interpreted, so backslashes are kept literally, as in r"C:\new". To define a raw string, write r or R before the opening quote.
%          String formatting operator: uses C-style format specifiers such as %d or %f to insert values into a string. Formatting is discussed later in this unit.

Example

Consider the following example to understand how these operators are used.

s = "Hello"
s1 = " world"
print(s * 3)                       # prints HelloHelloHello
print(s + s1)                      # prints Hello world
print(s[4])                        # prints o
print(s[2:4])                      # prints ll
print('w' in s)                    # prints False, as w is not present in s
print('wo' not in s1)              # prints False, as wo is present in s1
print(r'C:\new\python37')          # prints C:\new\python37; without r, \n would be a newline
print("The string s : %s" % (s))   # prints The string s : Hello

Output:

HelloHelloHello
Hello world
o
ll
False
False
C:\new\python37
The string s : Hello
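One detail the operator table leaves implicit is the operand types: + joins a str only with another str, so a number must be converted with str() first, and * takes an integer count, where a count of 0 or less gives an empty string. A minimal sketch with illustrative names:

```python
age = 21

# "+" joins str with str only; mixing in an int raises TypeError
concat_failed = False
try:
    message = "Age: " + age
except TypeError as err:
    concat_failed = True
    print("TypeError:", err)

# Convert the number explicitly with str() first
message = "Age: " + str(age)
print(message)                            # Age: 21

# "*" with a count of 0 or less produces an empty string
print(repr("ab" * 0), repr("ab" * -1))    # '' ''
```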

2. UNICODE
Unicode is a system designed to represent every character from every language.

Unicode represents each letter, character, or ideograph as a number (its code point) in the range 0 to 0x10FFFF; such a number fits in 4 bytes.

Each number represents a unique character used in at least one of the world's languages.

Characters that are used in multiple languages generally have the same number,
unless there is a good etymological reason not to.

Regardless, there is exactly 1 number per character, and exactly 1 character per
number.

There is a Unicode encoding that uses four bytes per character.

It's called UTF-32, because 32 bits = 4 bytes.

UTF-32 is a straightforward encoding; it takes each Unicode character (a 4-byte number) and represents the character with that same number.
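Because every character occupies exactly four bytes, character N of a UTF-32 byte string starts at byte 4×N (counting from 0). This sketch checks that with Python's built-in "utf-32-le" codec, chosen because it writes no byte order mark, so the first character starts at byte 0; the helper nth_char() is a hypothetical name used only for illustration.

```python
s = "Hello, 世界"
data = s.encode("utf-32-le")        # exactly 4 bytes per character, no BOM

print(len(s), len(data))            # 9 36


def nth_char(encoded: bytes, n: int) -> str:
    """Hypothetical helper: jump straight to byte offset 4*n and decode 4 bytes."""
    return encoded[4 * n : 4 * n + 4].decode("utf-32-le")


print(nth_char(data, 7))            # 世
```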

• Advantage: the most important is that you can find the Nth character of a string in constant time, because the Nth character (counting from 0) starts at byte 4×N.

• Disadvantage: the most obvious is that it takes four bytes to store every character, even characters that would fit in a single byte.

• Even though there are a lot of Unicode characters, it turns out that most people will never use anything beyond the first 65,536 (code points 0–65535). Thus, there is another Unicode encoding, called UTF-16 (because 16 bits = 2 bytes).

UTF-16 encodes every character from 0–65535 as two bytes, and uses surrogate pairs (two 2-byte units, four bytes in total) if you actually need to represent the rarely used "astral plane" Unicode characters beyond 65535.
Most obvious advantage: for characters up to 65535, UTF-16 is twice as space-efficient as UTF-32, because each of them requires only two bytes instead of four.

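But there are also non-obvious disadvantages to both UTF-32 and UTF-16. Different computer systems store the bytes of a multi-byte number in different orders (big-endian or little-endian), so the character U+4E2D could be stored in UTF-16 as either 4E 2D or 2D 4E. To solve this, the multi-byte Unicode encodings define a Byte Order Mark (BOM), the special character U+FEFF, which is written at the start of the text to tell the reader which byte order is used.

Otherwise, the receiving system has no way of knowing whether the two-byte sequence 4E 2D means U+4E2D or U+2D4E.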
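A sketch of the byte-order problem using the character 中, whose code point is U+4E2D. Python's "utf-16-be" and "utf-16-le" codecs fix the byte order explicitly, while the plain "utf-16" codec prepends a byte order mark in the machine's native order. The final lines use an arbitrary emoji to show that a character beyond 65535 takes four bytes in UTF-16.

```python
ch = "中"                             # code point U+4E2D
print(hex(ord(ch)))                   # 0x4e2d

big = ch.encode("utf-16-be")          # big-endian: most significant byte first
little = ch.encode("utf-16-le")       # little-endian: least significant byte first
print(big.hex(), little.hex())        # 4e2d 2d4e

# Plain "utf-16" writes the BOM (U+FEFF) first, in native byte order,
# so a decoder can tell which order the following bytes use
with_bom = ch.encode("utf-16")
print(with_bom[:2] in (b"\xff\xfe", b"\xfe\xff"))   # True
print(with_bom.decode("utf-16") == ch)              # True

# An "astral plane" character (beyond 65535) needs a surrogate pair: 4 bytes
print(len("😀".encode("utf-16-le")))                # 4
```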

UTF-8

• UTF-8 is a variable-length encoding system for Unicode; that is, different characters take up a different number of bytes.

• For ASCII characters (A–Z, etc.), UTF-8 uses just one byte per character. In fact, it uses the exact same bytes: the first 128 characters (0–127) in UTF-8 are indistinguishable from ASCII.

• "Extended Latin" characters like ñ and ö take two bytes, common Chinese characters take three bytes, and the rare "astral plane" characters take four bytes.

Disadvantages: because each character can take a different number of bytes, finding the Nth character is an O(N) operation; the longer the string, the longer it takes to find a specific character.

Also, there is bit-twiddling involved to encode characters into bytes and decode bytes into characters.

Advantages: very efficient encoding of common ASCII characters.

No worse than UTF-16 for extended Latin characters.

Better than UTF-32 for Chinese characters.

A document encoded in UTF-8 uses the exact same stream of bytes on any computer, so there is no byte-order problem.
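The byte counts described above can be observed directly with str.encode(). The sample characters are arbitrary picks: one ASCII letter, two extended Latin letters, one Chinese character, and one astral-plane emoji.

```python
samples = ["A", "ñ", "ö", "中", "😀"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # character, its code point, and how many bytes UTF-8 needs for it
    print(ch, hex(ord(ch)), len(encoded), encoded)

# ASCII text is byte-for-byte identical in ASCII and UTF-8
print("Hello".encode("utf-8") == "Hello".encode("ascii"))   # True
```

The lengths printed are 1, 2, 2, 3 and 4 bytes, matching the list above.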
3. PYTHON STRING FORMATTING
Escape Sequence

Let's suppose we need to write the text They said, "Hello what's going on?" as a string. It cannot simply be enclosed in single quotes or double quotes, because it contains both: the double quote before Hello would end a double-quoted string early, and the apostrophe in what's would end a single-quoted one. Either way, Python raises a SyntaxError.

Example

Consider the following example, which shows the problem.

s = "They said, "Hello what's going on?""
print(s)

Output:

SyntaxError: invalid syntax

We can use triple quotes to solve this problem, but Python also provides escape sequences.

The backslash (\) symbol denotes an escape sequence. A backslash followed by a special character is interpreted differently from the character on its own. Single quotes inside a single-quoted string must be escaped, and likewise double quotes inside a double-quoted string.

Example -
# using triple quotes
print('''They said, "What's there?"''')

# escaping single quotes
print('They said, "What\'s going on?"')

# escaping double quotes
print("They said, \"What's going on?\"")

Output:

They said, "What's there?"
They said, "What's going on?"
They said, "What's going on?"

The escape sequences are listed below:

Sr.  Escape     Description                     Example                          Output
     Sequence
1.   \newline   Ignores the newline             print("Python1 \                 Python1 Python2 Python3
                (line continuation)             Python2 \
                                                Python3")
2.   \\         Backslash                       print("\\")                      \
3.   \'         Single quote                    print('\'')                      '
4.   \"         Double quote                    print("\"")                      "
5.   \a         ASCII Bell (BEL)                print("\a")                      (may sound the terminal bell)
6.   \b         ASCII Backspace (BS)            print("Hello \b World")          Hello World
7.   \f         ASCII Formfeed (FF)             print("Hello \f World!")         Hello World!
8.   \n         ASCII Linefeed (LF)             print("Hello \n World!")         Hello
                                                                                  World!
9.   \r         ASCII Carriage Return (CR)      print("Hello \r World!")         World!
10.  \t         ASCII Horizontal Tab (TAB)      print("Hello \t World!")         Hello     World!
11.  \v         ASCII Vertical Tab (VT)         print("Hello \v World!")         Hello
                                                                                        World!
12.  \ooo       Character with octal value ooo  print("\110\145\154\154\157")    Hello
13.  \xHH       Character with hex value HH     print("\x48\x65\x6c\x6c\x6f")    Hello

Here is a simple example of escape sequences.

print("C:\\Users\\DEVANSH SHARMA\\Python32\\Lib")
print("This is the \n multiline quotes")
print("This is \x48\x45\x58 representation")

Output:

C:\Users\DEVANSH SHARMA\Python32\Lib
This is the
 multiline quotes
This is HEX representation

We can make Python ignore the escape sequences in a string by using a raw string: write r or R in front of the string. Consider the following example.

print(r"C:\\Users\\DEVANSH SHARMA\\Python32")

Output:

C:\\Users\\DEVANSH SHARMA\\Python32
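The difference between normal and raw strings shows up in their lengths and contents. In this sketch the path "C:\new_folder" is a made-up example, chosen because "\n" is also a valid escape sequence; the last line confirms that the octal and hex escapes from the table name the same character.

```python
normal = "C:\new_folder"    # "\n" is an escape here: it becomes a newline character
raw = r"C:\new_folder"      # raw string: the backslash and "n" are kept as they are

print(len("\n"), len(r"\n"))  # 1 2
print(normal)                 # prints "C:" and then "ew_folder" on the next line
print(raw)                    # C:\new_folder

# Octal and hex escapes both name code point 72, the letter H
print("\110" == "\x48" == "H")   # True
```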

The format() method


The format() method is one of the most flexible and useful ways to format strings. The curly braces {} are used as placeholders in the string and are replaced by the arguments of the format() method. Let's look at an example:

# Using curly braces
print("{} and {} are best friends".format("Devansh", "Abhishek"))

# Positional arguments
print("{1} and {0} are the best players".format("Virat", "Rohit"))

# Keyword arguments
print("{a},{b},{c}".format(a="James", b="Peter", c="Ricky"))

Output:

Devansh and Abhishek are best friends
Rohit and Virat are the best players
James,Peter,Ricky
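Inside the braces, a colon can be followed by a format specification that controls precision, width, alignment and digit grouping. A few common forms, sketched with arbitrary sample values:

```python
print("{:.2f}".format(3.14159))            # 3.14        (2 digits after the point)
print("[{:>8}]".format("right"))           # [   right]  (right-aligned, width 8)
print("[{:<8}]".format("left"))            # [left    ]  (left-aligned)
print("[{:^9}]".format("mid"))             # [   mid   ] (centered)
print("{:,}".format(1234567))              # 1,234,567   (thousands separator)
print("{0} {0} {1}".format("go", "now"))   # go go now   (an index can be reused)
```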

Python String Formatting Using % Operator


Python allows us to use the format specifiers of C's printf statement, and they are treated in much the same way as in C. Python uses the % operator as the interface between the format specifiers and their values: the left operand is the format string, and the right operand is a value or a tuple of values. In other words, % binds the format specifiers to the values.

Consider the following example.

integer = 10
floating = 1.290
string = "Devansh"
print("Hi I am Integer ... My value is %d\nHi I am float ... My value is %f\nHi I am string ... My value is %s" % (integer, floating, string))

Output:

Hi I am Integer ... My value is 10
Hi I am float ... My value is 1.290000
Hi I am string ... My value is Devansh
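As in printf, the % operator also accepts width, precision and flag characters between % and the conversion letter. A few sketched cases with arbitrary values; note that %% produces a literal percent sign.

```python
print("%.2f" % 3.14159)       # 3.14      (2 digits after the decimal point)
print("[%5d]" % 42)           # [   42]   (width 5, right-aligned)
print("[%-5s]" % "ab")        # [ab   ]   ("-" flag: left-aligned)
print("%x %o" % (255, 8))     # ff 10     (hexadecimal and octal)
print("100%% done" % ())      # 100% done ("%%" is a literal percent sign)
```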

4. PYTHON STRING FUNCTIONS


Python provides various built-in methods (and the built-in len() function) for string handling. The commonly used ones are listed below.

Method                                        Description
capitalize()                                  Returns a copy of the string with its first character capitalized and the rest lowercased.
casefold()                                    Returns a version of the string suitable for caseless comparisons.
center(width[, fillchar])                     Returns the string centered in a field of the given width, padded with fillchar (a space by default).
count(sub[, begin[, end]])                    Counts the non-overlapping occurrences of substring sub between the begin and end indices.
bytes.decode(encoding='utf-8', errors='strict')  In Python 3 this is a method of bytes, not str: it decodes the bytes using the codec registered for encoding.
encode(encoding='utf-8', errors='strict')     Encodes the string into bytes using the codec registered for encoding. The default encoding is 'utf-8'.
endswith(suffix[, begin[, end]])              Returns True if the string (between begin and end) ends with the given suffix, otherwise False.
expandtabs(tabsize=8)                         Replaces tab characters with spaces, using tab stops every tabsize columns (8 by default).
find(sub[, begin[, end]])                     Returns the lowest index at which substring sub is found between begin and end, or -1 if it is not found.
format(*args, **kwargs)                       Returns a formatted version of the string, using the passed values.
index(sub[, begin[, end]])                    Works like find(), but raises ValueError if the substring is not found.
isalnum()                                     Returns True if all characters are alphanumeric (letters or digits) and there is at least one character, otherwise False.
isalpha()                                     Returns True if all characters are alphabetic and there is at least one character, otherwise False.
isdecimal()                                   Returns True if all characters are decimal characters and there is at least one character, otherwise False.
isdigit()                                     Returns True if all characters are digits and there is at least one character, otherwise False.
isidentifier()                                Returns True if the string is a valid identifier.
islower()                                     Returns True if all cased characters are lowercase and there is at least one cased character, otherwise False.
isnumeric()                                   Returns True if all characters are numeric and there is at least one character, otherwise False.
isprintable()                                 Returns True if all characters are printable or the string is empty, otherwise False.
isspace()                                     Returns True if all characters are whitespace and there is at least one character, otherwise False.
istitle()                                     Returns True if the string is title-cased: each word starts with an uppercase character and its remaining characters are lowercase.
isupper()                                     Returns True if all cased characters are uppercase and there is at least one cased character, otherwise False.
join(iterable)                                Concatenates the strings in iterable, using the string itself as the separator.
len(string)                                   Built-in function (not a method) that returns the length of a string.
ljust(width[, fillchar])                      Returns the string left-justified in a field of the given width, padded with fillchar (a space by default).
lower()                                       Converts all characters of the string to lowercase.
lstrip([chars])                               Removes leading whitespace, or the leading characters found in chars if it is given.
maketrans(x[, y[, z]])                        Returns a translation table to be used with translate().
partition(sep)                                Searches for the separator sep and returns the part before it, the separator itself, and the part after it. If sep is not found, returns the string and two empty strings.
replace(old, new[, count])                    Replaces occurrences of old with new. If count is given, only the first count occurrences are replaced.
rfind(sub[, begin[, end]])                    Like find(), but returns the highest index, i.e. it searches from the end of the string.
rindex(sub[, begin[, end]])                   Like index(), but returns the highest index, i.e. it searches from the end of the string.
rjust(width[, fillchar])                      Returns the string right-justified in a field of the given width, padded with fillchar (a space by default).
rsplit(sep=None, maxsplit=-1)                 Like split(), but when maxsplit is given the splits are made from the right. If sep is not specified, the string is split on whitespace.
rstrip([chars])                               Removes trailing whitespace, or the trailing characters found in chars if it is given.
split(sep=None, maxsplit=-1)                  Splits the string at each occurrence of sep and returns the list of substrings (the separators are removed). If sep is not given, the string is split on runs of whitespace.
splitlines(keepends=False)                    Returns a list of the lines in the string, with line breaks removed unless keepends is True.
startswith(prefix[, begin[, end]])            Returns True if the string (between begin and end) starts with the given prefix, otherwise False.
strip([chars])                                Performs both lstrip() and rstrip() on the string.
swapcase()                                    Inverts the case of all characters in the string.
title()                                       Converts the string to title case, e.g. the string meEruT is converted to Meerut.
translate(table)                              Translates the string according to the translation table passed to the function (see maketrans()).
upper()                                       Converts all characters of the string to uppercase.
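Several of the methods in the table, exercised on made-up sample strings. The sketch highlights one contrast from the table in particular: find() returns -1 when the substring is missing, while index() raises ValueError.

```python
s = "  Hello, Python World  "

clean = s.strip()                        # remove leading and trailing whitespace
print(repr(clean))                       # 'Hello, Python World'

print(clean.find("Python"))              # 7
print(clean.find("Java"))                # -1 (not found)

index_failed = False
try:
    clean.index("Java")                  # index() raises instead of returning -1
except ValueError:
    index_failed = True
print(index_failed)                      # True

words = clean.split()                    # split on runs of whitespace
print(words)                             # ['Hello,', 'Python', 'World']
print("-".join(words))                   # Hello,-Python-World

print("banana".replace("a", "o", 2))     # bonona: only the first 2 are replaced
print("key=value=x".partition("="))      # ('key', '=', 'value=x')
print("meEruT".title())                  # Meerut
print("[" + "hi".center(6, "*") + "]")   # [**hi**]

table = str.maketrans("ae", "43")        # map 'a' -> '4' and 'e' -> '3'
print("release".translate(table))        # r3l34s3
```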

5. PYTHON BYTES()
The bytes() method returns an immutable bytes object initialized with the given size and data.

The syntax of bytes() method is:

bytes([source[, encoding[, errors]]])

bytes() method returns a bytes object which is an immutable (cannot be modified) sequence of
integers in the range 0 <=x < 256.
If you want to use the mutable version, use bytearray() method.

bytes() Parameters

bytes() takes three optional parameters:

• source (Optional) - source to initialize the array of bytes.
• encoding (Optional) - if the source is a string, the encoding of the string.
• errors (Optional) - if the source is a string, the action to take when the encoding conversion fails.
The source parameter can be used to initialize the byte array in the following ways:
Different source parameters

Type                       Description
String                     Converts the string to bytes using str.encode(). Must also provide encoding and, optionally, errors.
Integer                    Creates an array of the provided size, all initialized to null (zero) bytes.
Object                     A read-only buffer of the object will be used to initialize the byte array.
Iterable                   Creates an array of size equal to the iterable count, initialized to the iterable elements. Must be an iterable of integers between 0 <= x < 256.
No source (no arguments)   Creates an array of size 0.

Return value from bytes()

The bytes() method returns a bytes object of the given size and initialization values.

Example 1: Convert string to bytes

string = "Python is interesting."

# string with encoding 'utf-8'
arr = bytes(string, 'utf-8')
print(arr)

Output

b'Python is interesting.'

Example 2: Create bytes of a given integer size

size = 5

arr = bytes(size)
print(arr)

Output

b'\x00\x00\x00\x00\x00'

Example 3: Convert iterable list to bytes

rList = [1, 2, 3, 4, 5]

arr = bytes(rList)
print(arr)

Output

b'\x01\x02\x03\x04\x05'
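Two properties mentioned above, in a short sketch: indexing a bytes object gives an integer in the range 0 to 255, and only the bytearray version can be modified in place. The sample data is arbitrary.

```python
data = bytes("AB", "ascii")
print(data[0])                      # 65: indexing bytes gives an int, not a str

modify_failed = False
try:
    data[0] = 67                    # bytes objects are immutable
except TypeError:
    modify_failed = True

mutable = bytearray(data)           # mutable copy holding the same values
mutable[0] = 67                     # 67 is the code for 'C'
print(mutable)                      # bytearray(b'CB')
print(bytes(mutable).decode("ascii"))   # CB
```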

6. PYTHON STRING ENCODE() METHOD

Example
UTF-8 encode the string:

txt = "My name is Ståle"

x = txt.encode()

print(x)

Output:

b'My name is St\xc3\xa5le'

Definition and Usage


The encode() method encodes the string, using the specified encoding. If no encoding is specified,
UTF-8 will be used.
Syntax
string.encode(encoding=encoding, errors=errors)

Parameter Values

Parameter   Description
encoding    Optional. A string specifying the encoding to use. Default is UTF-8.
errors      Optional. A string specifying the error-handling scheme. Legal values are:
            'backslashreplace' - replaces the character that could not be encoded with a backslash escape sequence (such as \xe5)
            'ignore' - ignores the characters that cannot be encoded
            'namereplace' - replaces the character with a \N{...} escape containing its Unicode name
            'strict' - default; raises a UnicodeEncodeError on failure
            'replace' - replaces the character with a question mark
            'xmlcharrefreplace' - replaces the character with an XML character reference (such as &#229;)


More Examples
Example
These examples use ASCII encoding and a character that cannot be encoded, showing the result with different errors values:

txt = "My name is Ståle"

print(txt.encode(encoding="ascii", errors="backslashreplace"))
print(txt.encode(encoding="ascii", errors="ignore"))
print(txt.encode(encoding="ascii", errors="namereplace"))
print(txt.encode(encoding="ascii", errors="replace"))
print(txt.encode(encoding="ascii", errors="xmlcharrefreplace"))

Output:

b'My name is St\\xe5le'
b'My name is Stle'
b'My name is St\\N{LATIN SMALL LETTER A WITH RING ABOVE}le'
b'My name is St?le'
b'My name is St&#229;le'
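encode() has an inverse: bytes.decode() turns the bytes back into a str when given the same encoding. With the default errors="strict", an unencodable character raises UnicodeEncodeError, whose start attribute gives the position of the offending character. A minimal sketch with the same sample text:

```python
txt = "My name is Ståle"

encoded = txt.encode("utf-8")       # str -> bytes (å takes 2 bytes in UTF-8)
decoded = encoded.decode("utf-8")   # bytes -> str
print(len(txt), len(encoded))       # 16 17
print(decoded == txt)               # True

error_position = None
try:
    txt.encode("ascii")             # errors="strict" is the default
except UnicodeEncodeError as err:
    error_position = err.start      # index of the first unencodable character
print(error_position)               # 13
```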

7. REGULAR EXPRESSIONS
Regular expressions are a powerful and (mostly) standardized way of searching, replacing, and parsing
text with complex patterns of characters. Although the regular expression syntax is tight and unlike
normal code, the result can end up being more readable than a hand-rolled solution that uses a long chain
of string functions. There are even ways of embedding comments within regular expressions, so you can
include fine-grained documentation within them.
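A minimal sketch of the searching, replacing and parsing mentioned above, using the standard re module's search(), findall() and sub() functions. The sample sentence and the patterns are made up for illustration.

```python
import re

text = "Order 66 shipped on 2023-05-14; order 7 pending."

# search(): the first match of a pattern, here a date in YYYY-MM-DD form
match = re.search(r"\d{4}-\d{2}-\d{2}", text)
print(match.group())                                     # 2023-05-14

# findall(): every match; with one group, only the group's text is returned
print(re.findall(r"order (\d+)", text, re.IGNORECASE))   # ['66', '7']

# sub(): replace every run of digits with "#"
print(re.sub(r"\d+", "#", text))   # Order # shipped on #-#-#; order # pending.
```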

VERBOSE REGULAR EXPRESSIONS

So far you've just been dealing with what I'll call “compact” regular expressions. As you've seen, they are difficult to read, and even if you figure out what one does, that's no guarantee that you'll be able to understand it six months later. What you really need is inline documentation.

Python allows you to do this with something called verbose regular expressions. A verbose regular
expression is different from a compact regular expression in two ways:

• Whitespace is ignored. Spaces, tabs, and carriage returns are not matched as spaces, tabs, and carriage returns; they're not matched at all. (If you want to match a space in a verbose regular expression, you'll need to escape it by putting a backslash in front of it.)

• Comments are ignored. A comment in a verbose regular expression is just like a comment in Python code: it starts with a # character and goes until the end of the line. In this case it's a comment within a multi-line string instead of within your source code, but it works the same way.

This will be clearer with an example. Let's revisit the compact regular expression you've been working with, and make it a verbose regular expression. This example shows how.
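The example this passage refers to did not survive in the source. The sketch below is a hypothetical reconstruction: judging from the Roman numerals in the numbered notes, it assumes the compact pattern being revisited validates Roman numerals, and writes it in verbose form. It runs the four searches the notes describe, then shows the escaped space mentioned in the first bullet.

```python
import re

# Hypothetical reconstruction: a Roman-numeral validator written verbosely
pattern = """
    ^                   # beginning of string
    M{0,3}              # thousands - 0 to 3 Ms
    (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 Cs),
                        #            or 500-800 (D, followed by 0 to 3 Cs)
    (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 Xs),
                        #        or 50-80 (L, followed by 0 to 3 Xs)
    (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 Is),
                        #        or 5-8 (V, followed by 0 to 3 Is)
    $                   # end of string
    """

print(re.search(pattern, "M", re.VERBOSE))                # a match object
print(re.search(pattern, "MCMLXXXIX", re.VERBOSE))        # a match object
print(re.search(pattern, "MMMDCCCLXXXVIII", re.VERBOSE))  # a match object
print(re.search(pattern, "M"))                            # None: re.VERBOSE missing

# In verbose mode a literal space must be escaped with a backslash
print(re.fullmatch(r"hello\ world", "hello world", re.VERBOSE) is not None)  # True
print(re.fullmatch(r"hello world", "hello world", re.VERBOSE) is not None)   # False
```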

1. The most important thing to remember when using verbose regular expressions is that you need to pass an extra argument when working with them: re.VERBOSE is a constant defined in the re module that signals that the pattern should be treated as a verbose regular expression. As you can see, this pattern has quite a bit of whitespace (all of which is ignored), and several comments (all of which are ignored). Once you ignore the whitespace and the comments, this is exactly the same regular expression as you saw in the previous section, but it's a lot more readable.

2. This matches the start of the string, then one of a possible three M, then CM, then L and three of a
possible three X, then IX, then the end of the string.

3. This matches the start of the string, then three of a possible three M, then D and three of a possible
three C, then L and three of a possible three X, then V and three of a possible three I, then the end of the
string.
4. This does not match. Why? Because it doesn't have the re.VERBOSE flag, so the re.search function is treating the pattern as a compact regular expression, with significant whitespace and literal hash marks. Python can't auto-detect whether a regular expression is verbose or not. Python assumes every regular expression is compact unless you explicitly state that it is verbose.
