Python Strings & Characters
UNIT-III
CHAPTER-1
STRINGS AND CHARACTERS:
Since every string comprises several characters, Python handles strings and
characters almost in the same manner. There is no separate datatype to represent
individual characters in Python.
CREATING STRINGS
There is no difference between the single quotes and double quotes while creating the
strings. Both will work in the same manner.
Sometimes, we can use triple single quotes or triple double quotes to represent strings.
These quotation marks are useful when we want to represent a string that occupies
several lines as:
In the preceding statement, the string 'str' is created using triple single quotes.
Alternately, the above string can be created using triple double quotes as:
str = """welcome to Core Python, a book on Python
language that discusses all important concepts of Python
in a lucid and comprehensive manner. """
Thus, triple single quotes or triple double quotes are useful to create strings which span
into several lines.
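As a minimal sketch of the triple single quote form described above (the variable name `text` is an illustrative choice, used instead of `str` so that the built-in type is not shadowed):

```python
# Triple single quotes let one string literal span several lines.
text = '''welcome to Core Python, a book on Python
language that discusses all important concepts of Python
in a lucid and comprehensive manner.'''

# Each line break typed in the source becomes a \n character in the string.
print(text)
line_breaks = text.count('\n')
print(line_breaks)
```

The three source lines are stored as a single string with two newline characters in it.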
It is possible to display quotation marks to mark a sub string in a string. In that case, we
should use one type of quotes for the outer string and another type of quotes for the inner
string, as:
s1 = 'welcome to "Core Python" learning'
print (s1)
Here, the string ‘s1’ contains two strings. The outer string is enclosed in single quotes
and the inner string, i.e. "Core Python" is enclosed in double quotes. Alternately, we can
use double quotes for outer string and single quotes for inner string as:
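A small sketch of the alternate form, with double quotes outside and single quotes inside (the name `s2` is illustrative):

```python
# Outer string in double quotes, inner string in single quotes.
s2 = "welcome to 'Core Python' learning"
print(s2)
```

The single quotes are printed as part of the text because they are ordinary characters inside a double-quoted string.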
It is possible to use escape characters like \t or \n inside the strings. The escape
character \t moves the cursor to the next tab stop (commonly every 8 columns, depending
on the terminal) and the escape character \n moves the cursor to a new line.
To nullify the effect of escape characters, we can create the string as a ‘raw' string by
adding ‘r’ before the string as:
s1 = r"welcome to\tcore Python\nlearning"
print (s1)
This is not showing the effect of \t or \n. It means we could not see the horizontal tab
space or new line. Raw strings take escape characters, like \t, \n, etc., as ordinary
characters in a string and hence display them as they are.
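To see the difference between a normal string and a raw string side by side, the following sketch may help (the names `normal` and `raw` are illustrative):

```python
normal = "welcome to\tcore Python\nlearning"   # \t and \n are interpreted
raw = r"welcome to\tcore Python\nlearning"     # backslashes are kept literally

print(normal)
print(raw)

# The raw string is 2 characters longer: each escape stays as two characters.
print(len(normal), len(raw))
```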
In Python 3, every string is already a Unicode string, so adding 'u' at the beginning of
the string is optional and is kept mainly for compatibility with Python 2 code. Unicode
is a standard that assigns a code to the characters of various human languages so that
they can be used in programming languages like Python or Java. For example, it is
possible to display the alphabet of Hindi, French, and German languages using the
Unicode system. A Unicode character can be written inside a string as \u followed by 4
hexadecimal digits. The following statement displays 'Core Python' in Hindi using
Unicode characters. There are 8 Unicode escapes used for this purpose, separated into
two words by a space.
name = u'\u0915\u094b\u0930 \u092a\u0948\u0925\u0964\u0928'
print (name)
LENGTH OF A STRING
Length of a string represents the number of characters in a string. To know the length of
a string, we can use the len() function. This function gives the number of characters
including spaces in the string.
str = 'Core Python'
n = len(str)
print (n)
The preceding lines of code will display the following output:
11
INDEXING IN STRINGS
Index represents the position number. Index is written using square brackets [].
By specifying the position number through an index, we can refer to the individual
elements (or characters) of a string. For example, str[0] refers to the 0th element of the
string and str[1] refers to the 1st element of the string. Thus, str[i] can be used to refer to
the ith element of the string. Here, 'i' is called the string index because it specifies the
position number of the element in the string.
We can also use the for loop to access each element (or character) of a string. The
following for loop simply takes each element into a variable ‘i’ and displays it.
for i in str:
    print(i)
To display the string in the reverse order, we should use slicing operation on string. The
format of slicing is stringname[start: stop: stepsize]. If 'start' and 'stop' are not specified,
then it is taken from 0th to n-1th elements. If 'stepsize' is not written, then it is taken to
be 1. Hence, the following loop will display all the elements of the string:
for i in str[::]:
    print(i)
To get the elements in reverse order, we should use stepsize negative as: -1. This will
display the elements from last to first in steps of 1 in reverse order. The for loop in this
case looks like this:
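A minimal sketch of the reverse loop, assuming the string 'Core Python' used elsewhere in this chapter (the list `reversed_chars` is only there to collect the characters for inspection):

```python
text = 'Core Python'
reversed_chars = []
for ch in text[::-1]:        # step -1 walks from the last character to the first
    print(ch)
    reversed_chars.append(ch)
```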
If 'start' and 'stop' are not specified, then slicing is done from 0th to n-1th elements. If
'stepsize' is not written, then it is taken to be 1. See the following example:
Core Pyth
When 'stepsize' is 2, it accesses every other character starting from the 0th character.
Hence it retrieves the 0th, 2nd, 4th, 6th characters and so on.
str[0:9:2]
Cr yh
Some other examples are given below to have a better understanding of slicing.
Consider the following code snippet:
str[::2]
It is possible to use reverse slicing to retrieve the elements from the string in reverse
order. The 'start', 'stop' can be specified as negative numbers. For example,
str = 'Core Python'
str[-4:-1]
tho
Python
When stepsize is negative, then the elements are counted from right to left. See the
examples:
str[-1:-4:-1]
'noh'
Now, if you write the following statement:
str[-1::-1]
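The slicing forms discussed in this section can be collected in one sketch. The string is the 'Core Python' value used above. The slice `text[-6:]` is only one possible way to obtain 'Python' and is an assumption, since the original statement for that output is not shown.

```python
text = 'Core Python'

print(text[0:9])        # Core Pyth
print(text[0:9:2])      # Cr yh
print(text[::2])        # Cr yhn
print(text[-4:-1])      # tho
print(text[-6:])        # Python   (assumed form, see the note above)
print(text[-1:-4:-1])   # noh
print(text[-1::-1])     # nohtyP eroC  (the whole string reversed)
```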
The repetition operator is denoted by the '*' symbol and is useful to repeat a string
several times. For example, str*n repeats the string n times. See the example:
str =' Core Python'
print (str*2)
CONCATENATION OF STRINGS
We can use ‘+’ on strings to attach a string at the end of another string. This
operator ‘+’ is called addition operator when used on numbers. But, when used on
strings, it is called ‘concatenation' operator since it joins or concatenates the strings.
Similar result can be achieved using the join() method.
ASST PROF VEENA MORE A.S.P COLLEGE OF COMMERCE(AUTONOMOUS),VIJAYAPUR Page 8
INTRODUCTION TO PYTHON PROGRAMMING Unit III
s1='Core'
s2="Python"
s3=s1+s2
print (s3)
CorePython
CHECKING MEMBERSHIP
The operators 'in' and 'not in' check whether a sub string is present in a string, and
they make case sensitive comparisons. It means these operators treat upper case and
lower case letters differently while comparing the strings.
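A short sketch of the membership operators and their case sensitivity (the string and the search words are illustrative):

```python
text = 'Core Python'

found_exact = 'Python' in text       # True
found_lower = 'python' in text       # False: 'p' does not match 'P'
java_absent = 'Java' not in text     # True

print(found_exact, found_lower, java_absent)
```

If a case-insensitive check is wanted, one option is to compare lower-case copies, for example `'python' in text.lower()`.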
COMPARING STRINGS
We can use the relational operators like >, >=, <, <=, == or != operators to
compare two strings. They return Boolean value, i.e. either True or False depending on
the strings being compared.
This code returns 'Not same' as the strings are not the same. While comparing the strings,
the Python interpreter compares them character by character using the Unicode values of
the characters, which gives English dictionary order for letters of the same case. The
string which comes first in this order has a lower value than the string which comes
next. It means 'A' is less than 'B', which is less than 'C', and so on. In the above
example, the string 's1' comes before the string 's2' and hence s1 is less than s2, so the
comparison s1 < s2 gives True.
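The comparison rules can be tried out with a sketch like the one below. The values 'Box' and 'Boy' are illustrative assumptions, chosen so that the strings differ only in the last character.

```python
s1 = 'Box'
s2 = 'Boy'

if s1 == s2:
    result = 'Both are same'
else:
    result = 'Not same'
print(result)

print(s1 < s2)              # True: 'x' comes before 'y'
print('Zebra' < 'apple')    # True: upper case letters have lower code values
print(ord('Z'), ord('a'))   # 90 97
```

The last two lines show why the order is not purely alphabetical: every upper case letter has a smaller code value than every lower case letter.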
The output will be ‘Name not found’. In this way, spaces may lead to wrong results.
Hence such spaces should be removed from the strings before they are compared. This is
possible using rstrip(), lstrip() and strip() methods. The rstrip() method removes the
spaces which are at the right side of the string. The lstrip () method removes spaces
which are at the left side of the string. strip() method removes spaces from both the
sides of the strings. These methods do not remove spaces which are in the middle of the
string. Consider the following code snippet:
Mukesh Deshmukh
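A sketch of the three stripping methods follows. The padded value of `name` is an assumption, chosen to match the output shown above.

```python
name = '  Mukesh Deshmukh  '     # two spaces on each side (illustrative)

print(repr(name.rstrip()))   # '  Mukesh Deshmukh'
print(repr(name.lstrip()))   # 'Mukesh Deshmukh  '
print(repr(name.strip()))    # 'Mukesh Deshmukh'

# Stripping before comparing avoids a false mismatch.
print(name == 'Mukesh Deshmukh')           # False
print(name.strip() == 'Mukesh Deshmukh')   # True
```

The space between the two words is kept in every case, because only leading and trailing spaces are removed.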
The find(), rfind(), index() and rindex() methods are useful to locate sub strings
in a string.
The find() and index() methods search for the sub string from the beginning of the main
string and return the location of its first occurrence.
The rfind() and rindex() methods search for the sub string from right to left, i.e. in
backward order, and so return the location of its last occurrence.
The find() method returns -1 if the sub string is not found in the main string.
The index() method raises a 'ValueError' exception if the sub string is not found. The
format of find() method is:
mainstring.find(substring, beginning, ending)
In the above program, observe that the sub string position is displayed as 'n+1'.
Since find() method starts counting from the 0th position and we count from the 1st
position, we need to add 1 to the result given by find() method to get the correct
position number.
The same program can be rewritten using index() method. If the sub string is not
found, index() method raises a 'ValueError' exception, and we have to handle the
exception in our program. This is what we did in Program 5.
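The two approaches can be sketched as below. This is not the original Program 4 or Program 5. The function names and the sample string are illustrative assumptions.

```python
def position_with_find(main, sub):
    """Return the 1-based position of sub in main, or None if it is absent."""
    n = main.find(sub)            # find() gives -1 when sub is not present
    return None if n == -1 else n + 1


def position_with_index(main, sub):
    """Same result, but by handling the ValueError raised by index()."""
    try:
        return main.index(sub) + 1
    except ValueError:
        return None


sample = 'This is a book'
print(position_with_find(sample, 'is'))     # 3
print(position_with_index(sample, 'is'))    # 3
print(position_with_index(sample, 'pen'))   # None
print(sample.rfind('is') + 1)               # 6 (last occurrence)
```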
The find() and index() methods return only the first occurrence of the sub string.
When the sub string occurs several times in the main string, they cannot return all
those occurrences. To find all the occurrences of the sub string, we should develop
additional logic.
Initially, searching should start from the 0th character in the main string 'str' and go
up to the last character; the end position 'n' is given by len(str). So, 'i' value will start
initially at 0. When the find() method finds the position of the sub string, we should
display it. Suppose the sub string is found at the 2nd position, then we need not search
again for the sub string up to the 2nd position. This time, we should continue searching
from the 3rd character onwards. Thus, 'i' value will become 3, i.e. i = pos+1. If find()
method could not find the sub string, then normal incrementing of 'i' is done, i.e.
i = i+1. This logic is used in Program 6.
The above program can be simplified by taking ‘pos' value initially as -1 and rewriting
find() method as:
find() method will search from ‘pos+1’ till the end of string. In this way, when the sub
string position is found, find() method will continue searching from its next position
onwards. If the string is found not even once, then ‘pos' value will continue to be -1 and
hence we can break the loop as:
if pos == -1: break
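The simplified logic can be sketched as follows. This is an illustration under stated assumptions, not the original program. The function name `find_all` and the sample string are hypothetical.

```python
def find_all(main, sub):
    """Return the 1-based positions of every occurrence of sub in main."""
    positions = []
    pos = -1                      # so that the first search starts at index 0
    n = len(main)
    while True:
        pos = main.find(sub, pos + 1, n)   # resume just after the last match
        if pos == -1:                      # no further occurrence
            break
        positions.append(pos + 1)
    return positions


print(find_all('This is a book', 'is'))   # [3, 6]
print(find_all('This is a book', 'xy'))   # []
```

Because the next search starts at 'pos+1' rather than after the whole match, overlapping occurrences are also reported.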
The method count() is available to count the number of occurrences of a sub string
in a main string. The format of this method is:
stringname.count(substring)
This returns an integer number that represents how many times the substring is
found in the main string. We can limit our search by specifying beginning and ending
positions in the count() method so that the occurrences are counted only in that range.
Hence, the other form of count() method is:
stringname.count(substring, beginning, ending)
For example, we want to search for substring ‘Delhi' in the main string ‘New Delhi’ to
know how many times the substring appeared in the main string. We can use count( )
method as:
str = 'New Delhi'
n = str.count('Delhi')
print(n)
Suppose we want to know how many times ‘e' is repeated in the main string in the range
from 0th to 2nd characters, we can write:
n = str.count('e', 0, 3)
print (n)
The output of the preceding statements is as follows:
1
If we search for 'e' in the main string starting from the 0th character to the end of the
string, we can write:
n = str.count('e', 0, len(str))
print(n)
The output of the preceding statements is as follows:
2
Security: Since string objects are immutable, any attempt to modify an existing
string object will create a new object in memory. Thus the identity number of the new
object will be different, which lets the programmer understand that somebody modified
the original string. This is useful to enforce security where strings can be transported
from one application to another application without modifications.
In the following code, we create a string 'str' where we store 4 characters: 'abcd'. When
we display the 0th character, i.e. str[0], it will display 'a'. If we try to replace the 0th
character with a new character 'x', then there will be an error called 'TypeError'. This is
proof that strings are immutable.
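The experiment described above can be sketched as below (the name `text` is used instead of `str` so that the built-in `str()` remains available for converting the error to a message):

```python
text = 'abcd'
print(text[0])            # a

try:
    text[0] = 'x'         # item assignment is not supported by strings
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)

print(text)               # abcd (unchanged)
```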
We will take an ambiguous example. In this example, we are creating two strings 's1' and
's2', as:
s1='one'
s2='two'
Now, we are modifying the content of the string 's2' by storing the content of 's1' into it
as
s2 = s1 # store s1 into s2
If we display, the 's2' string, we can see the same content of 's1'.
print (s2) # display s2
If you write,
print (s1)
Then, the output of the preceding statement is as follows:
one
It seems that the content of 's2' is replaced by the content of 's1' and hence 's2' became
mutable. But this is wrong. When we write:
s2 = s1
The name 's2' will be adjusted to refer to the object that is referenced by 's1'. But the
original value of 's2', that is 'two', is not altered. Since 'two' is no longer referenced, the
garbage collector deletes that object from memory. Figure shows the immutability of string
objects:
Identity number of an object internally refers to the memory address of the object (in the
standard CPython interpreter) and is given by id() function. If we display identity numbers
using id() function, we can find that the identity numbers of 's2' and 's1' are the same
since they refer to one and the same object.
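A small sketch of this check, using the same two names as above (the variable `same_identity` is illustrative):

```python
s1 = 'one'
s2 = 'two'

s2 = s1                   # rebinds the name s2; no string object is modified

print(s2)                 # one
same_identity = id(s1) == id(s2)
print(same_identity)      # True: both names refer to the same object
print(s2 is s1)           # True: 'is' compares identities directly
```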
The replace() method is useful to replace a sub string with another sub string. The
format of using this method is:
stringname.replace(old,new)
This will replace all the occurrences of ‘old’ sub string in the main string. For example,
If we display the contents of ‘str’ and ‘str1’, we can understand that the original string
‘str’ is not modified. Consider the following statement:
print(str)
The output of the preceding statement is as follows:
That is a beautiful girl
If you write,
print(str1)
The output of the preceding statement is as follows:
That is a beautiful flower
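A sketch of the replace() call behind these outputs follows. The starting sentence and the replaced words are assumptions inferred from the outputs shown above.

```python
text = 'That is a beautiful girl'
text1 = text.replace('girl', 'flower')   # returns a new string

print(text)     # That is a beautiful girl    (original is unchanged)
print(text1)    # That is a beautiful flower
```

Since strings are immutable, replace() never alters `text`. The result has to be stored in another name (or assigned back to the same name) to be used.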
The split() method is used to break a string into pieces. These pieces are returned as a
list. For example, to break the string 'str' where a comma (,) is found, we can write
str.split(',')
Observe the comma inside the parentheses. It is called the separator and represents where
to separate or cut the string. Similarly, the separator will be a space if we want to cut the
string at spaces. In the following example, we are cutting the string 'str' wherever a
comma is found. The resultant pieces are stored in 'str1' which is a list.
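A minimal sketch of split() with a comma and with a space as separator (the sample strings and names are illustrative):

```python
text = 'one,two,three,four'
pieces = text.split(',')          # cut wherever a comma occurs
print(pieces)                     # ['one', 'two', 'three', 'four']

numbers = '10 20 30'.split(' ')   # a space as the separator
for item in numbers:
    print(item)
```

The pieces are always strings. If numbers are needed, each piece can be converted, for example with `int(item)`.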
In Program 8, we are accepting a group of numbers as a string from the user. The
numbers should be entered with space as separator. The numbers are stored by input()
function into a string 'str' which is split into pieces where a space is found. The group of
numbers are stored into a list ‘lst' from where we display them using a for loop.
When a group of strings are given, it is possible to join them all and make a single string.
For this purpose, we can use join() method as:
separator.join(str)
where the separator represents the character to be used between the strings in the
output. 'str' represents a tuple or list of strings. In the following example, we are taking a
tuple 'str' that contains 3 strings as:
str = ('one', 'two', 'three')
We want to join the three strings and form a single string. Also, we want to use a hyphen (-)
between the three strings in the output. The join() method can be written as:
str1 = "-".join(str)
print(str1)
The output of the preceding statements is as follows:
one-two-three
In the following example, we are taking a list comprising 4 strings and we are joining
them using a colon (:) between them.
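Such a join might look like the sketch below. The four strings in the list are illustrative assumptions.

```python
words = ['apple', 'guava', 'grapes', 'mango']   # illustrative values
joined = ':'.join(words)                        # colon between the strings
print(joined)                                   # apple:guava:grapes:mango
```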
If you write,
print (str.lower())
Then, the output will be:
python is the future
If you write,
print (str.swapcase())
Then, the output will be:
pYTHON IS THE FUTURE
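The case-changing methods can be compared in one sketch. The starting value is an assumption inferred from the outputs shown above.

```python
text = 'Python is the future'

print(text.upper())      # PYTHON IS THE FUTURE
print(text.lower())      # python is the future
print(text.swapcase())   # pYTHON IS THE FUTURE
print(text.title())      # Python Is The Future
```

Each method returns a new string and leaves `text` unchanged.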
The startswith() method is useful to know whether a string starts with a sub string or
not. The way to use this method is:
str.startswith(substring)
When the main string 'str' starts with the sub string, this method returns True.
Otherwise, it returns False. Consider the following statements:
str = 'This is Python'
print(str.startswith('This'))
The output will be:
True
Similarly, to check the ending of a string, we can use endswith() method. It returns True
if the string ends with the specified sub string, otherwise it returns False.
str.endswith(substring)
str = 'This is Python'
print(str.endswith('Python'))
The output of the preceding statements is as follows:
True
There are several methods to test the nature of characters in a string. These methods
return either True or False. For example, if a string has only numeric digits, then isdigit()
method returns True. These methods can also be applied to individual characters. Table
8.2 lists the string and character testing methods:
To understand how to use these methods on strings, let's take an example. In this
example, we take a string as:
str = 'Delhi999'
Now, we want to check if this string 'str' contains only alphabetic characters, i.e. A to Z,
a to z, and no other characters like digits or spaces. We will use isalpha() method on the
string as:
str.isalpha()
False
Since the string 'Delhi999' contains digits, the isalpha() method returned False. Another
example:
str = 'Delhi'
str.isalpha()
True
Formatting a string means presenting the string in a clearly understandable manner. The
format() method is used to format strings. This method is used as:
'format string with replacement fields'.format(values)
We should first understand the meaning of the first part, i.e. 'format string with
replacement fields'. The replacement fields are denoted by curly braces { } that contain
names or indexes. These names or indexes represent the order of the values. For
example, let's take employee details like id number, name and salary in 3 variables
'id', 'name' and 'sal'.
id=10
name='Shankar'
sal=19500.75
We want to create a format string by the name 'str' to display these 3 values. These 3
values or variables should be mentioned inside the format() method as format(id, name,
sal). The total format string can be written as:
str = '{},{},{}'.format(id, name, sal)
This string contains 3 replacement fields. The first field {} is replaced by the 'id' value,
the second field {} is replaced by the 'name' value and the third field {} is replaced by
the 'sal' value. So, if we display this string using print() function as given below:
print(str)
We can see the following output:
10,Shankar,19500.75
Suppose we do not want to display commas after each value, rather we want to display
hyphens (-). In that case, the format string can be written as:
Name = Shankar
Salary=19500.75
We can mention the escape characters like ‘\n' ‘\t’ inside the format string as shown in
the previous example. We can also mention the order numbers in the replacement fields
as 0, 1, 2, etc. Consider the following example:
By changing the numbers in the replacement fields, we can change the order of the
values being displayed as:
Please observe that the replacement fields {0} represented id number, {1} represented
name and {2} represented salary of the employee. These values are displayed in the
following order: {2}, {0} and {1} in the preceding statement.
We can also mention names in the replacement fields to represent the values. These
names should refer to the values in the format() method as:
str = 'Id= {one}, Name= {two}, Salary= {three}'.format(one=id, two=name, three=sal)
print(str)
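The different kinds of replacement fields can be put together in one sketch. The name `emp_id` is used in place of `id` to avoid shadowing the built-in function, and the result names are illustrative.

```python
emp_id = 10
name = 'Shankar'
sal = 19500.75

with_commas = '{},{},{}'.format(emp_id, name, sal)
with_hyphens = '{}-{}-{}'.format(emp_id, name, sal)
reordered = '{2},{0},{1}'.format(emp_id, name, sal)        # index based
named = 'Id= {one}, Name= {two}, Salary= {three}'.format(
    one=emp_id, two=name, three=sal)                        # name based

print(with_commas)    # 10,Shankar,19500.75
print(with_hyphens)   # 10-Shankar-19500.75
print(reordered)      # 19500.75,10,Shankar
print(named)          # Id= 10, Name= Shankar, Salary= 19500.75
```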
Formatting specification starts with a colon (:) and after that, we can specify the format
in the curly braces. We can use 'd' for decimal integer, 'c' for character, 's' for
string, 'f' or 'F' for floating point numbers. Unlike the % operator, format() does not
accept 'i'. If we do not use any type specifier, then the default presentation for the
value's own type is used. Also, 'x' or 'X' should be used for hexadecimal number, 'b'
for binary and 'o' for octal number. Consider the following example:
str = 'Id= {:d}, Name= {:s}, Salary= {:10.2f}'.format(id, name, sal)
print(str)
The preceding statements will give the following output:
Id= 10, Name= Shankar, Salary=   19500.75
Observe the third replacement field {:10.2f}. This represents that the 'sal' value should
be displayed in a field of 10 places. Among these 10 places, a decimal point and then 2
fraction digits should be displayed. Suppose we write {:.4f}, it means the 'sal' value
should be displayed with 4 fraction digits after the decimal point and, before the decimal
point, all the available digits should be displayed.
It is possible to align the value in the replacement field. '<' represents align left, '>'
represents align right, '^' (caret) represents align in the center and '=' places the
padding after the sign but before the digits of a number. A character written just before
the alignment symbol is the fill character, i.e. the unused places of the field will be filled
with that character. For example, we are going to display 'num' value which is 5000 right
aligned. We are going to allot 15 places, justify the value towards the right and fill the
remaining places with '*'.
num=5000
print('{:*>15d}'.format(num))
In the above example ‘>' aligns the value towards right. If we use '^', then the value is
aligned in the center. Consider the following statement:
print('{:*^15d}'.format (num))
Let's display a value in the form of hexadecimal number and a binary number in the next
example.
n1=1000
print('Hexadecimal= {:.>15X}\nBinary= {:.<15b}'.format(n1, n1))
The preceding statements will display the following output:
Hexadecimal= ............3E8
Binary= 1111101000.....
In the above example, the number 'n1' whose value is 1000 is converted into hexadecimal
and binary numbers and then displayed. Observe 'X' and 'b' in the format strings that
represent the hexadecimal and binary formats. Observe the output displayed by print()
function. It displayed '3E8' and '1111101000'. Suppose we want to display these
numbers with the appropriate prefixes 0X and 0b, we can add a hash (#) symbol
in the replacement field as:
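A sketch of the hash (#) form follows. The exact statement from the original is not shown, so the format string here is an assumption.

```python
n1 = 1000
line = 'Hexadecimal= {:#X}\nBinary= {:#b}'.format(n1, n1)
print(line)
# Hexadecimal= 0X3E8
# Binary= 0b1111101000
```

With 'X' the prefix is written in upper case (0X), while 'b' always produces the lower case prefix 0b.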
Characters are nothing but the individual elements of a string. As we know, a string
contains 1 or more characters. When the programmer is interested in working with
characters, he has to accept a string and then retrieve the characters from the string
using indexing or slicing. For example,
str = 'Hello'
To retrieve the 0th character, we can write str[0] and to retrieve the 1st character, we
can write str[1].
ch = str[0]
print(ch)
We can also retrieve the characters from the string using slicing as:
ch = str[0:1]
print (ch)
The preceding statements will give the following output:
H
We can apply the string testing methods mentioned in Table 8.2 for testing not only
strings but also individual characters. These methods are useful to test a character and
know which type of character it is. For example,
ch.isalpha()
SORTING STRINGS
We can sort a group of strings into alphabetical order using sort() method and
sorted() function. The sort() method is used in the following way:
str.sort()
Here, 'str' represents a list that contains a group of strings. When sort() method is
used, it will sort the original list, i.e. 'str'. So, the original order of strings will be lost
and we will have only one sorted list. To retain the original list even after sorting,
we can use sorted() function as:
str1 = sorted(str)
Here, 'str' is the original list whose elements are to be sorted. After sorting,
the sorted list will be referenced by 'str1'. So, the sorted strings appear in the list
'str1'. The original list 'str' is undisturbed.
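The difference between sorted() and sort() can be seen in a short sketch (the list contents and names are illustrative):

```python
names = ['delhi', 'chennai', 'mumbai', 'agra']   # illustrative values

ordered = sorted(names)      # new list; names keeps its original order
original_kept = list(names)  # copy taken only to show the order before sort()
print(names)                 # ['delhi', 'chennai', 'mumbai', 'agra']
print(ordered)               # ['agra', 'chennai', 'delhi', 'mumbai']

names.sort()                 # sorts in place and returns None
print(names)                 # ['agra', 'chennai', 'delhi', 'mumbai']
```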
The easiest way to search for a string in a group of 'n' strings is by using sequential
search or linear search technique. This is done by comparing the searching string 's' with
every string in the group. For this purpose, we can use a for loop that iterates from 0th to
n-1th string in the group. By comparing the searching string 's' with every one of these
strings, we can decide whether 's' is available in the group or not. The logic looks
something like this:
for i in range(len(str)):
    if s==str[i]:
        print('Found at:', i+1)   # illustrative action when a match is found
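The same logic can be wrapped in a function, as in the sketch below. The function name, the return convention (1-based position, or -1 when absent) and the sample data are assumptions made for illustration.

```python
def linear_search(group, s):
    """Return the 1-based position of s in group, or -1 if it is absent."""
    for i in range(len(group)):
        if s == group[i]:        # compare s with every string in turn
            return i + 1
    return -1


cities = ['delhi', 'chennai', 'mumbai']
print(linear_search(cities, 'mumbai'))   # 3
print(linear_search(cities, 'agra'))     # -1
```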
We have len() function that returns the number of characters in a string. Suppose we
want to find the length of a string without using len() function, we can use a for loop as:
i=0
for s in str:
    i+=1
This for loop repeats once for each character of the string 'str'. So, if the string has 10
characters, the for loop repeats for 10 times. Hence, by counting the number of
repetitions, we can find the number of characters. This is done by simply incrementing
a counting variable 'i’ inside the loop as shown in the preceding code.
To find the number of words in a string, we have to first find out the number of spaces.
For example, take a string: 'R Nageswara Rao'. The number of spaces here is 2, but there
are 3 words separated. Hence, we have to add 1 to the number of spaces to get the
number of words.
In many cases, there is a possibility of having more than 1 space between the words in
a string. In that case, we should not count all the spaces. When a space is counted, the
next immediate space should not be counted. For this purpose, we can take a Boolean
type variable 'flag'. When a space is encountered, we will make 'flag' as True, otherwise
False, as:
if str[i]==' ':
    flag=True
else:
    flag=False
We will count the space only when flag is False. It means if there was no space found
previously, then only the present space is counted. In this way, we can obtain the
number of words correctly.
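The flag logic described above can be sketched as follows. The function name is hypothetical, and two steps are added as assumptions that the text does not mention: leading and trailing spaces are stripped first, and an empty string is treated as having 0 words.

```python
def count_words(text):
    """Count words separated by one or more spaces, using a flag."""
    text = text.strip()          # assumption: ignore spaces at both ends
    if not text:
        return 0                 # assumption: empty string has no words
    spaces = 0
    flag = False                 # True when the previous character was a space
    for ch in text:
        if ch == ' ':
            if not flag:         # count only the first space of a run
                spaces += 1
            flag = True
        else:
            flag = False
    return spaces + 1            # words = counted spaces + 1


print(count_words('R Nageswara Rao'))        # 3
print(count_words('R   Nageswara  Rao'))     # 3
```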
Let's take a string with some characters. In the middle of the string, we want to
insert a sub string. This is not possible directly because of the immutable nature of
strings. But we can find a way around this problem. Let's assume the main string is 'str'
and the sub string is 'sub'. After inserting 'sub' into 'str', the total string will be 'str1'. To
represent this total string, we will declare an empty list 'str1' as:
str1 = []
If n is the position where the sub string is to be inserted, we will append the first n-1
characters from str into str1. Then the entire sub string will be appended to str1. In the
final step, we will append the remaining characters (from n till the end) from str to str1.
Thus, the total string will be available in str1 as a list. Figure 8.3 shows the insertion of a
sub string in a particular position into a main string.
Since a list contains characters as individual elements, we have to convert the list into a
string format so that we will have continuous flow of characters. See the difference
between elements of a list and of a string:
Above, the first line represents a list and the second line represents a string. We need the
result in the form of a string, hence we have to convert the list into a string. For this
purpose, we can use join() method with empty string as separator as:
str1 = ''.join(str1)
Since the separator is an empty string, the elements of str1 will be joined without any
gaps in between and we will have the final string into 'str1'. Another way to convert the
list 'str1' into a string 'str2' is by using concatenation operator (+) as:
str2 = ''
for i in str1:
    str2 = str2+i
Thus the final string will be available in 'str2’.
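The whole insertion procedure might be sketched as below. The function name, parameter names and sample strings are illustrative, and the sketch assumes that n is a 1-based position between 1 and len(main)+1.

```python
def insert_substring(main, sub, n):
    """Insert sub into main so that it starts at 1-based position n."""
    parts = []                            # list that collects the characters
    for i in range(n - 1):                # first n-1 characters of main
        parts.append(main[i])
    for ch in sub:                        # the entire sub string
        parts.append(ch)
    for i in range(n - 1, len(main)):     # remaining characters of main
        parts.append(main[i])
    return ''.join(parts)                 # list of characters -> one string


result = insert_substring('Core Python', 'and Advanced ', 6)
print(result)    # Core and Advanced Python
```

The same result can be obtained more directly with slicing, for example `main[:n-1] + sub + main[n-1:]`. The loop version is shown because it follows the steps described in the text.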