0% found this document useful (0 votes)

95 views41 pages

Module-3 Python 17CS664

This document discusses lists in Python. Lists are mutable ordered sequences that can contain elements of different types. Elements are accessed using indexes and lists support operations like concatenation, repetition, slicing, and various methods. Common list methods include append(), extend(), sort(), reverse(), count(), clear(), insert(), index() and more. Lists are a fundamental data type in Python.

Uploaded by

thejasraj

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

95 views41 pages

Module-3 Python 17CS664

Uploaded by

thejasraj

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

Python Application Programming (Open Elective - 17CS664)

MODULE – 3
3.1 LISTS
A list is an ordered sequence of values. It is a data structure in Python. The values inside
the lists can be of any type (like integer, float, strings, lists, tuples, dictionaries etc) and are
called as elements or items. The elements of lists are enclosed within square brackets. For
example,
ls1=[10,-4, 25, 13]
ls2=[“Tiger”, “Lion”, “Cheetah”]

Here, ls1 is a list containing four integers, and ls2 is a list containing three strings. A list
need not contain data of same type. We can have mixed type of elements in list. For
example,
ls3=[3.5, ‘Tiger’, 10, [3,4]]

Here, ls3 contains a float, a string, an integer and a list. This illustrates that a list can be
nested as well.

An empty list can be created any of the following ways –

>>> ls =[]
>>> type(ls)
<class 'list'>
or
>>> ls =list()
>>> type(ls)
<class 'list'>

In fact, list() is the name of a method (special type of method called as constructor –
which will be discussed in Module 4) of the class list. Hence, a new list can be created
using this function by passing arguments to it as shown below –

>>> ls2=list([3,4,1])
>>> print(ls2)
[3, 4, 1]

3.1.1 Lists are Mutable

The elements in the list can be accessed using a numeric index within square-brackets. It is
similar to extracting characters in a string.

>>> ls=[34, 'hi', [2,3],-5]

>>> print(ls[1])
hi
>>> print(ls[2])
[2, 3]

PDIT,Hosapete Page 1
Observe here that, the inner list is treated as a single element by outer list. If we would like
to access the elements within inner list, we need to use double-indexing as shown below –

>>> print(ls[2][0])
2
>>> print(ls[2][1])
3

Note that, the indexing for inner-list again starts from 0. Thus, when we are using double-
indexing, the first index indicates position of inner list inside outer list, and the second index
means the position particular value within inner list.

Unlike strings, lists are mutable. That is, using indexing, we can modify any value within list.
In the following example, the 3rd element (i.e. index is 2) is being modified –

>>> ls=[34, 'hi', [2,3],-5]

>>> ls[2]='Hello'
>>> print(ls)
[34, 'hi', 'Hello', -5]

The list can be thought of as a relationship between indices and elements. This relationship
is called as a mapping. That is, each index maps to one of the elements in a list.

The index for extracting list elements has following properties –

Any integer expression can be an index.
>>> ls=[34, 'hi', [2,3],-5]
>>> print(ls[2*1])
'Hello'
Attempt to access a non-existing index will throw and IndexError.
>>> ls=[34, 'hi', [2,3],-5]
>>> print(ls[4])
IndexError: list index out of range
A negative indexing counts from backwards.
>>> ls=[34, 'hi', [2,3],-5]
>>> print(ls[-1])
-5
>>> print(ls[-3])
hi

The in operator applied on lists will results in a Boolean value.

>>> ls=[34, 'hi', [2,3],-5]
>>> 34 in ls

>>> -2 in ls
False

PDIT,Hosapete Page 2
3.1.2 Traversing a List
A list can be traversed using for loop. If we need to use each element in the list, we can use
the for loop and in operator as below –
>>> ls=[34, 'hi', [2,3],-5]
>>> for item in ls:
print(item)

34
hi
Hello
-5
List elements can be accessed with the combination of range() and len() functions as well –

ls=[1,2,3,4]
for i in range(len(ls)):
ls[i]=ls[i]**2

print(ls) #output is [1, 4, 9, 16]

Here, we wanted to do modification in the elements of list. Hence, referring indices is

suitable than referring elements directly. The len() returns total number of elements in the
list (here it is 4). Then range() function makes the loop to range from 0 to 3 (i.e. 4-1). Then,
for every index, we are updating the list elements (replacing original value by its square).

3.1.3 List Operations

Python allows to use operators + and * on lists. The operator + uses two list objects and
returns concatenation of those two lists. Whereas * operator take one list object and one
integer value, say n, and returns a list by repeating itself for n times.

>>> ls1=[1,2,3]
>>> ls2=[5,6,7]
>>> print(ls1+ls2) #concatenation using +
[1, 2, 3, 5, 6, 7]

>>> ls1=[1,2,3]
>>> print(ls1*3) #repetition using *
[1, 2, 3, 1, 2, 3, 1, 2, 3]

>>> [0]4 #repetition using

[0, 0, 0, 0]

3.1.4 List Slices

Similar to strings, the slicing can be applied on lists as well. Consider a list t given below,
and a series of examples following based on this object.

PDIT,Hosapete Page 3
t=['a','b','c','d','e']

Extracting full list without using any index, but only a slicing operator –
>>> print(t[:])
['a', 'b', 'c', 'd', 'e']

Extracting elements from 2nd position –

>>> print(t[1:])
['b', 'c', 'd', 'e']

Extracting first three elements –

>>> print(t[:3])
['a', 'b', 'c']

Selecting some middle elements –

>>> print(t[2:4])
['c', 'd']

Using negative indexing –

>>> print(t[:-2])
['a', 'b', 'c']

Reversing a list using negative value for stride –

>>> print(t[::-1])
['e', 'd', 'c', 'b', 'a']

Modifying (reassignment) only required set of values –

>>> t[1:3]=['p','q']
>>> print(t)
['a', 'p', 'q', 'd', 'e']

Thus, slicing can make many tasks simple.

3.1.5 List Methods

There are several built-in methods in list class for various purposes. Here, we will discuss
some of them.

append(): This method is used to add a new element at the end of a list.
>>> ls=[1,2,3]
>>> [Link](‘hi’)
>>> [Link](10)
>>> print(ls)
[1, 2, 3, ‘hi’, 10]

PDIT,Hosapete Page 4
extend(): This method takes a list as an argument and all the elements in this list
are added at the end of invoking list.
>>> ls1=[1,2,3]
>>> ls2=[5,6]
>>> [Link](ls1)
>>> print(ls2)
[5, 6, 1, 2, 3]

Now, in the above example, the list ls1 is unaltered.

sort(): This method is used to sort the contents of the list. By default, the function
will sort the items in ascending order.

>>> ls=[3,10,5, 16,-2]

>>> [Link]()
>>> print(ls)
[-2, 3, 5, 10, 16]

When we want a list to be sorted in descending order, we need to set the argument
as shown –

>>> [Link](reverse=True)
>>> print(ls)
[16, 10, 5, 3, -2]

reverse(): This method can be used to reverse the given list.

>>> ls=[4,3,1,6]
>>> [Link]()
>>> print(ls)
[6, 1, 3, 4]

count(): This method is used to count number of occurrences of a particular value

within list.
>>> ls=[1,2,5,2,1,3,2,10]
>>> [Link](2)
3 #the item 2 has appeared 3 tiles in ls

clear(): This method removes all the elements in the list and makes the list empty.
>>> ls=[1,2,3]
>>> [Link]()
>>> print(ls)
[]

PDIT,Hosapete Page 5
insert(): Used to insert a value before a specified index of the list.
>>> ls=[3,5,10]
>>> [Link](1,"hi")
>>> print(ls)
[3, 'hi', 5, 10]

index(): This method is used to get the index position of a particular value in the list.
>>> ls=[4, 2, 10, 5, 3, 2, 6]
>>> [Link](2)
1
Here, the number 2 is found at the index position 1. Note that, this function will give
index of only the first occurrence of a specified value. The same function can be
used with two more arguments start and end to specify a range within which the
search should take place.

>>> ls=[15, 4, 2, 10, 5, 3, 2, 6]

>>> [Link](2)
2
>>> [Link](2,3,7)
6

If the value is not present in the list, it throws ValueError.

>>> ls=[15, 4, 2, 10, 5, 3, 2, 6]
>>> [Link](53)
ValueError: 53 is not in list

Few important points about List Methods:

1. There is a difference between append() and extend() methods. The former adds the
argument as it is, whereas the latter enhances the existing list. To understand this,
observe the following example –
>>> ls1=[1,2,3]
>>> ls2=[5,6]
>>> [Link](ls1)
>>> print(ls2)
[5, 6, [1, 2, 3]]

Here, the argument ls1 for the append() function is treated as one item, and made as
an inner list to ls2. On the other hand, if we replace append() by extend() then the
result would be –
>>> ls1=[1,2,3]
>>> ls2=[5,6]
>>> [Link](ls1)
>>> print(ls2)
[5, 6, 1, 2, 3]

PDIT,Hosapete Page 6
2. The sort() function can be applied only when the list contains elements of compatible
types. But, if a list is a mix non-compatible types like integers and string, the comparison
cannot be done. Hence, Python will throw TypeError. For example,

>>> ls=[34, 'hi', -5]

>>> [Link]()
TypeError: '<' not supported between instances of 'str' and 'int'

Similarly, when a list contains integers and sub-list, it will be an error.

>>> ls=[34,[2,3],5]
>>> [Link]()
TypeError: '<' not supported between instances of 'list' and 'int'

Integers and floats are compatible and relational operations can be performed on them.
Hence, we can sort a list containing such items.

>>> ls=[3, 4.5, 2]

>>> [Link]()
>>> print(ls)
[2, 3, 4.5]

3. The sort() function uses one important argument keys. When a list is containing tuples,
it will be useful. We will discuss tuples later in this Module.

4. Most of the list methods like append(), extend(), sort(), reverse() etc. modify the list
object internally and return None.
>>> ls=[2,3]
>>> ls1=[Link](5)
>>> print(ls)
[2,3,5]
>>> print(ls1)
None

3.1.6 Deleting Elements

Elements can be deleted from a list in different ways. Python provides few built-in methods
for removing elements as given below –
pop(): This method deletes the last element in the list, by default.
>>> ls=[3,6,-2,8,10]
>>> x=[Link]() #10 is removed from list and stored in x
>>> print(ls)
[3, 6, -2, 8]
>>> print(x)
10

PDIT,Hosapete Page 7
When an element at a particular index position has to be deleted, then we can give
that position as argument to pop() function.
>>> t = ['a', 'b', 'c']
>>> x = [Link](1) #item at index 1 is popped
>>> print(t)
['a', 'c']
>>> print(x)
b

remove(): When we don’t know the index, but know the value to be removed, then
this function can be used.

>>> ls=[5,8, -12,34,2]

>>> [Link](34)
>>> print(ls)
[5, 8, -12, 2]

Note that, this function will remove only the first occurrence of the specified value,
but not all occurrences.
>>> ls=[5,8, -12, 34, 2, 6, 34]
>>> [Link](34)
>>> print(ls)
[5, 8, -12, 2, 6, 34]

Unlike pop() function, the remove() function will not return the value that has been
deleted.

del: This is an operator to be used when more than one item to be deleted at a time.
Here also, we will not get the items deleted.

>>> ls=[3,6,-2,8,1]
>>> del ls[2] #item at index 2 is deleted

PDIT,Hosapete Page 8
>>> print(ls)
[3, 6, 8, 1]

>>> ls=[3,6,-2,8,1]
>>> del ls[1:4] #deleting all elements from index 1 to 3
>>> print(ls)
[3, 1]

Deleting all odd indexed elements of a list –

>>> t=[‘a’, ‘b’, ‘c’, ‘d’, ‘e’]
>>> del t[1::2]
>>> print(t)
['a', 'c', 'e']

3.1.7 Lists and Functions

The utility functions like max(), min(), sum(), len() etc. can be used on lists. Hence most of
the operations will be easy without the usage of loops.
>>> ls=[3,12,5,26, 32,1,4]
>>> max(ls) # prints 32
>>> min(ls) # prints 1
>>> sum(ls) # prints 83
>>> len(ls) # prints 7

>>> avg=sum(ls)/len(ls)
>>> print(avg)
11.857142857142858

When we need to read the data from the user and to compute sum and average of those
numbers, we can write the code as below –

ls= list()
while (True):
x= input('Enter a number: ')
if x== 'done':
break

x= float(x)
[Link](x)

average = sum(ls) / len(ls)

print('Average:', average)

In the above program, we initially create an empty list. Then, we are taking an infinite while-
loop. As every input from the keyboard will be in the form of a string, we need to convert x
into float type and then append it to a list. When the keyboard input is a string ‘done’, t hen
the loop is going to get terminated. After the loop, we will find the average of those
numbers with the help of built-in functions sum() and len().

3.1.8 Lists and Strings

PDIT,Hosapete Page 9
Though both lists and strings are sequences, they are not same. In fact, a list of characters
is not same as string. To convert a string into a list, we use a method list() as below –
>>> s="hello"
>>> ls=list(s)
>>> print(ls)
['h', 'e', 'l', 'l', 'o']

The method list() breaks a string into individual letters and constructs a list. If we want a list
of words from a sentence, we can use the following code –

PDIT,Hosapete Page 10
>>> s="Hello how are you?"
>>> ls=[Link]()
>>> print(ls)
['Hello', 'how', 'are', 'you?']

Note that, when no argument is provided, the split() function takes the delimiter as white
space. If we need a specific delimiter for splitting the lines, we can use as shown in
following example –

>>> dt="20/03/2018"
>>> ls=[Link]('/')
>>> print(ls)
['20', '03', '2018']

There is a method join() which behaves opposite to split() function. It takes a list of strings
as argument, and joins all the strings into a single string based on the delimiter provided.
For example –
>>> ls=["Hello", "how", "are", "you"]
>>> d=' '
>>> [Link](ls)
'Hello how are you'

Here, we have taken delimiter d as white space. Apart from space, anything can be taken
as delimiter. When we don’t need any delimiter, use empty string as delimiter.

3.1.9 Parsing Lines

In many situations, we would like to read a file and extract only the lines containing required
pattern. This is known as parsing. As an illustration, let us assume that there is a log file
containing details of email communication between employees of an organization. For all
received mails, the file contains lines as –
From [Link]@[Link] Fri Jan 5 [Link] 2018
From georgek@[Link] Sat Jan 6 [Link] 2018
………………
Apart from such lines, the log file also contains mail-contents, to-whom the mail has been
sent etc. Now, if we are interested in extracting only the days of incoming mails, then we
can go for parsing. That is, we are interested in knowing on which of the days, the mails
have been received. The code would be –
fhand = open(‘[Link]’)
for line in fhand:
line = [Link]()
if not [Link]('From '):
continue
words = [Link]()
print(words[2])

PDIT,Hosapete Page 11
Obviously, all received mails starts from the word From. Hence, we search for only such
lines and then split them into words. Observe that, the first word in the line would be From,
second word would be email-ID and the 3rd word would be day of a week. Hence, we will
extract words[2] which is 3rd word.
3.1.10 Objects and Values
Whenever we assign two variables with same value, the question arises – whether both the
variables are referring to same object, or to different objects. This is important aspect to
know, because in Python everything is a class object. There is nothing like elementary data
type.

Consider a situation –
a= “hi”
b= “hi”

Now, the question is whether both a and b refer to the same string. There are two possible
states –

a hi a
hi
b hi b

In the first situation, a and b are two different objects, but containing same value. The
modification in one object is nothing to do with the other. Whereas, in the second case,
both a and b are referring to the same object. That is, a is an alias name for b and vice-
versa. In other words, these two are referring to same memory location.

To check whether two variables are referring to same object or not, we can use is operator.

>>> a= “hi”
>>> b= “hi”
>>> a is b #result is True
>>> a==b #result is True

When two variables are referring to same object, they are called as identical objects.
When two variables are referring to different objects, but contain a same value, they are
known as equivalent objects. For example,
>>> s1=input(“Enter a string:”) #assume you entered hello
>>> s2= input(“Enter a string:”) #assume you entered hello
>>> s1 is s2 #check s1 and s2 are identical
False
>>> s1 == s2 #check s1 and s2 are equivalent
True
Here s1 and s2 are equivalent, but not identical.

PDIT,Hosapete Page 12
If two objects are identical, they are also equivalent, but if they are equivalent, they are not
necessarily identical.

String literals are interned by default. That is, when two string literals are created in the
program with a same value, they are going to refer same object. But, string variables read
from the key-board will not have this behavior, because their values are depending on the
user’s choice.

Lists are not interned. Hence, we can see following result –

>>> ls1=[1,2,3]
>>> ls2=[1,2,3]
>>> ls1 is ls2 #output is False
>>> ls1 == ls2 #output is True

3.1.11 Aliasing
When an object is assigned to other using assignment operator, both of them will refer to
same object in the memory. The association of a variable with an object is called as
reference.
>>> ls1=[1,2,3]
>>> ls2= ls1
>>> ls1 is ls2 #output is True

Now, ls2 is said to be reference of ls1. In other words, there are two references to the
same object in the memory.

An object with more than one reference has more than one name, hence we say that object
is aliased. If the aliased object is mutable, changes made in one alias will reflect the other.
>>> ls2[1]= 34
>>> print(ls1) #output is [1, 34, 3]

Strings are safe in this regards, as they are immutable.

3.1.12 List Arguments

When a list is passed to a function as an argument, then function receives reference to this
list. Hence, if the list is modified within a function, the caller will get the modified version.
Consider an example –

def del_front(t):
del t[0]
ls = ['a', 'b', 'c']
del_front(ls)
print(ls)
# output is ['b', 'c']

PDIT,Hosapete Page 13
Here, the argument ls and the parameter t both are aliases to same object.

One should understand the operations that will modify the list and the operations that
create a new list. For example, the append() function modifies the list, whereas the +
operator creates a new list.

>>> t1 = [1, 2]
>>> t2 = [Link](3)
>>> print(t1) #output is [1 2 3]
>>> print(t2) #prints None
>>> t3 = t1 + [5]
>>> print(t3) #output is [1 2 3 5]
>>> t2 is t3 #output is False
Here, after applying append() on t1 object, the t1 itself has been modified and t2 is not
going to get anything. But, when + operator is applied, t1 remains same but t3 will get the
updated result.

The programmer should understand such differences when he/she creates a function
intending to modify a list. For example, the following function has no effect on the original
list –
def test(t):
t=t[1:]
ls=[1,2,3]
test(ls)
print(ls) #prints [1, 2, 3]
One can write a return statement after slicing as below –

def test(t):
return t[1:]
ls=[1,2,3]
ls1=test(ls)
print(ls1) #prints [2, 3]
print(ls) #prints [1, 2, 3]
In the above example also, the original list is not modified, because a return statement
always creates a new object and is assigned to LHS variable at the position of function call.

PDIT,Hosapete Page 14
3.2 DICTIONARIES
A dictionary is a collection of unordered set of key:value pairs, with the requirement that
keys are unique in one dictionary. Unlike lists and strings where elements are accessed
using index values (which are integers), the values in dictionary are accessed using keys. A
key in dictionary can be any immutable type like strings, numbers and tuples. (The tuple
can be made as a key for dictionary, only if that tuple consist of string/number/ sub-tuples).
As lists are mutable – that is, can be modified using index assignments, slicing, or using
methods like append(), extend() etc, they cannot be a key for dictionary.

One can think of a dictionary as a mapping between set of indices (which are actually keys)
and a set of values. Each key maps to a value.

An empty dictionary can be created using two ways –

d= {}
OR
d=dict()

To add items to dictionary, we can use square brackets as –

>>> d={}
>>> d["Mango"]="Fruit"
>>> d["Banana"]="Fruit"
>>> d["Cucumber"]="Veg"
>>> print(d)
{'Mango': 'Fruit', 'Banana': 'Fruit', 'Cucumber': 'Veg'}

To initialize a dictionary at the time of creation itself, one can use the code like –
>>> tel_dir={'Tom': 3491, 'Jerry':8135}
>>> print(tel_dir)
{'Tom': 3491, 'Jerry': 8135}
>>> tel_dir['Donald']=4793
>>> print(tel_dir)
{'Tom': 3491, 'Jerry': 8135, 'Donald': 4793}

NOTE that the order of elements in dictionary is unpredictable. That is, in the above
example, don’t assume that 'Tom': 3491 is first item, 'Jerry': 8135 is second item
etc. As dictionary members are not indexed over integers, the order of elements inside it
may vary. However, using a key, we can extract its associated value as shown below –
>>> print(tel_dir['Jerry'])
8135

Here, the key 'Jerry' maps with the value 8135, hence it doesn’t matter where exactly it
is inside the dictionary.

PDIT,Hosapete Page 15
If a particular key is not there in the dictionary and if we try to access such key, then the
KeyError is generated.
>>> print(tel_dir['Mickey'])
KeyError: 'Mickey'
The len() function on dictionary object gives the number of key-value pairs in that object.
>>> print(tel_dir)
{'Tom': 3491, 'Jerry': 8135, 'Donald': 4793}
>>> len(tel_dir)
3
The in operator can be used to check whether any key (not value) appears in the dictionary
object.
>>> 'Mickey' in tel_dir #output is False
>>> 'Jerry' in tel_dir #output is True
>>> 3491 in tel_dir #output is False
We observe from above example that the value 3491 is associated with the key 'Tom' in
tel_dir. But, the in operator returns False.
The dictionary object has a method values() which will return a list of all the values
associated with keys within a dictionary. If we would like to check whether a particular value
exist in a dictionary, we can make use of it as shown below –

>>> 3491 in tel_dir.values() #output is True

The in operator behaves differently in case of lists and dictionaries as explained

hereunder–
When in operator is used to search a value in a list, then linear search algorithm is
used internally. That is, each element in the list is checked one by one sequentially.
This is considered to be expensive in the view of total time taken to process.
Because, if there are 1000 items in the list, and if the element in the list which we are
search for is in the last position (or if it does not exists), then before yielding result of
search (True or False), we would have done 1000 comparisons. In other words,
linear search requires n number of comparisons for the input size of n elements.
Time complexity of the linear search algorithm is O(n).
The keys in dictionaries of Python are basically hashable elements. The concept of
hashing is applied to store (or maintain) the keys of dictionaries. Normally hashing
techniques have the time complexity as O(log n) for basic operations like insertion,
deletion and searching. Hence, the in operator applied on keys of dictionaries works
better compared to that on lists. (Hashing technique is explained at the end of this
Section, for curious readers)

PDIT,Hosapete Page 16
3.2.1 Dictionary as a Set of Counters
Assume that we need to count the frequency of alphabets in a given string. There are
different methods to do it –
Create 26 variables to represent each alphabet. Traverse the given string and
increment the corresponding counter when an alphabet is found.
Create a list with 26 elements (all are zero in the beginning) representing alphabets.
Traverse the given string and increment corresponding indexed position in the list
when an alphabet is found.
Create a dictionary with characters as keys and counters as values. When we find a
character for the first time, we add the item to dictionary. Next time onwards, we
increment the value of existing item.

Each of the above methods will perform same task, but the logic of implementation will be
different. Here, we will see the implementation using dictionary.

s=input("Enter a string:") #read a string

d=dict() #create empty dictionary
for ch in s: #traverse through string
if ch not in d: #if new character found
d[ch]=1 #initialize counter to 1
else: #otherwise, increment counter
d[ch]+=1

print(d) #display the dictionary

The sample output would be –

Enter a string: Hello World

{'H': 1, 'e': 1, 'l': 3, 'o': 2, ' ': 1, 'W': 1, 'r': 1, 'd': 1}
It can be observed from the output that, a dictionary is created here with characters as keys
and frequencies as values. Note that, here we have computed histogram of counters.

Dictionary in Python has a method called as get(), which takes key and a default value as
two arguments. If key is found in the dictionary, then the get() function returns
corresponding value, otherwise it returns default value. For example,
>>> tel_dir={'Tom': 3491, 'Jerry':8135, 'Mickey':1253}
>>> print(tel_dir.get('Jerry',0))
8135
>>> print(tel_dir.get('Donald',0))
0
In the above example, when the get() function is taking 'Jerry' as argument, it returned
corresponding value, as 'Jerry' is found in tel_dir . Whereas, when get() is used with
'Donald' as key, the default value 0 (which is provided by us) is returned.

PDIT,Hosapete Page 17
The function get() can be used effectively for calculating frequency of alphabets in a string.
Here is the modified version of the program –

s=input("Enter a string:")
d=dict()
for ch in s:
d[ch]=[Link](ch,0)+1
print(d)

In the above program, for every character ch in a given string, we will try to retrieve a
value. When the ch is found in d, its value is retrieved, 1 is added to it, and restored. If ch
is not found, 0 is taken as default and then 1 is added to it.

3.2.2 Looping and Dictionaries

When a for-loop is applied on dictionaries, it will iterate over the keys of dictionary. If we
want to print key and values separately, we need to use the statemen ts as shown –
tel_dir={'Tom': 3491, 'Jerry':8135,
'Mickey':1253} for k in tel_dir:
print(k, tel_dir[k])

Output would be –
Tom 3491
Jerry 8135
Mickey 1253

Note that, while accessing items from dictionary, the keys may not be in order. If we want to
print the keys in alphabetical order, then we need to make a list of the keys, and then sort
that list. We can do so using keys() method of dictionary and sort() method of lists.
Consider the following code –

tel_dir={'Tom': 3491, 'Jerry':8135, 'Mickey':1253}

ls=list(tel_dir.keys())
print("The list of keys:",ls)
[Link]()
print("Dictionary elements in alphabetical order:")
for k in ls:
print(k, tel_dir[k])

The output would be –

The list of keys: ['Tom', 'Jerry', 'Mickey']
Dictionary elements in alphabetical order:
Jerry 8135
Mickey 1253
Tom 3491

PDIT,Hosapete Page 18
Note: The key-value pair from dictionary can be together accessed with the help of a
method items() as shown –
>>> d={'Tom':3412, 'Jerry':6781, 'Mickey':1294}
>>> for k,v in [Link]():
print(k,v)
Output:
Tom 3412
Jerry 6781
Mickey 1294

The usage of comma-separated list k,v here is internally a tuple (another data structure in
Python, which will be discussed later).

3.2.3 Dictionaries and Files

A dictionary can be used to count the frequency of words in a file. Consider a file [Link]
consisting of following text –
hello, how are you?
I am doing fine.
How about you?

Now, we need to count the frequency of each of the word in this file. So, we need to take
an outer loop for iterating over entire file, and an inner loop for traversing each line in a file.
Then in every line, we count the occurrence of a word, as we did before for a character.
The program is given as below –
fname=input("Enter file name:")
try:
fhand=open(fname)
except:
print("File cannot be opened")
exit()
d=dict()

for line in fhand:

for word in [Link]():
d[word]=[Link](word,0)+1
print(d)

The output of this program when the input file is [Link] would be –

Enter file name: [Link]

{'hello,': 1, 'how': 1, 'are': 1, 'you?': 2, 'I': 1, 'am': 1,
'doing': 1, 'fine.': 1, 'How': 1, 'about': 1}

PDIT,Hosapete Page 19
Notes for Python Application Programming (Open Elective - 15CS664)

Few points to be observed in the above output –

The punctuation marks like comma, full point, question mark etc. are also
considered as a part of word and stored in the dictionary. This means, when a
particular word appears in a file with and without punctuation mark, then there will be
multiple entries of that word.
The word ‘how’ and ‘How’ are treated as separate words in the above example
because of uppercase and lowercase letters.

While solving problems on text analysis, machine learning, data analysis etc. such kinds of
treatment of words lead to unexpected results. So, we need to be careful in parsing the text
and we should try to eliminate punctuation marks, ignoring the case etc. The procedure is
discussed in the next section.

3.2.4 Advanced Text Parsing

As discussed in the previous section, during text parsing, our aim is to eliminate
punctuation marks as a part of word. The string module of Python provides a list of all
punctuation marks as shown –

>>> import string

>>> [Link]
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

The str class has a method maketrans() which returns a translation table usable for another
method translate(). Consider the following syntax to understand it more clearly –
[Link]([Link](fromstr, tostr, deletestr))

The above statement replaces the characters in fromstr with the character in the same
position in tostr and delete all characters that are in deletestr. The fromstr and
tostr can be empty strings and the deletestr parameter can be omitted.
Using these functions, we will re-write the program for finding frequency of words in a file.

import string

fname=input("Enter file name:")

try:
fhand=open(fname)
except:
print("File cannot be opened")
exit()
d=dict()

PDIT,Hosapete Page 20
for line in fhand:
line=[Link]()
line=[Link]([Link]('','',[Link]))
line=[Link]()
for word in [Link]():
d[word]=[Link](word,0)+1
print(d)

Now, the output would be –

Enter file name:[Link]
{'hello': 1, 'how': 2, 'are': 1, 'you': 2, 'i': 1, 'am': 1,
'doing': 1, 'fine': 1, 'about': 1}

Comparing the output of this modified program with the previous one, we can make out that
all the punctuation marks are not considered for parsing and also the case of the alphabets
are ignored.

3.2.5 Debugging
When we are working with big datasets (like file containing thousands of pages), it is
difficult to debug by printing and checking the data by hand. So, we can follow any of the
following procedures for easy debugging of the large datasets –
Scale down the input: If possible, reduce the size of the dataset. For example if the
program reads a text file, start with just first 10 lines or with the smallest example you
can find. You can either edit the files themselves, or modify the program so it reads only
the first n lines. If there is an error, you can reduce n to the smallest value that
manifests the error, and then increase it gradually as you correct the errors.
Check summaries and types: Instead of printing and checking the entire dataset,
consider printing summaries of the data: for example, the number of items in a
dictionary or the total of a list of numbers. A common cause of runtime errors is a value
that is not the right type. For debugging this kind of error, it is often enough to print the
type of a value.
Write self-checks: Sometimes you can write code to check for errors automatically. For
example, if you are computing the average of a list of numbers, you could check that the
result is not greater than the largest element in the list or less than the smallest. This is
called a sanity check because it detects results that are “completely illogical”. Another
kind of check compares the results of two different computations to see if they are
consistent. This is called a consistency check.
Pretty print the output: Formatting debugging output can make it easier to spot an
error.

PDIT,Hosapete Page 21
Hashing Technique (For curious minds – Only for understanding, not
for Exams!!)
Hashing is a way of representing dictionaries (Not a Python data structure Dictionary!!).
Dictionary is an abstract data type with a set of operations searching, insertion and deletion
defined on its elements. The elements of dictionary can be numeric or characters or most
of the times, records. Usually, a record consists of several fields; each may be of different
data types. For example, student record may contain student id, name, gender, marks etc.
Every record is usually identified by some key. Hashing technique is very useful in
database management, because it is considered to be very efficient searching technique.

Here we will consider the implementation of a dictionary of n records with keys k1, k2
…kn. Hashing is based on the idea of distributing keys among a one-dimensional array
H[0…m-1], called hash table.

For each key, a value is computed using a predefined function called hash function. This
function assigns an integer, called hash address, between 0 to m-1 to each key. Based on
the hash address, the keys will be distributed in a hash table.

For example, if the keys k1, k2, …., kn are integers, then a hash function can be
h(K) = K mod m.

Let us take keys as 65, 78, 22, 30, 47, 89. And let hash function
be, h(k) = k%10.

Then the hash addresses may be any value from 0 to 9. For each key, hash address will be
computed as –
h(65) = 65 %10 = 5
h(78) = 78%10 = 8
h(22)= 22 % 10 =2
h(30)= 30 %10 =0
h(47) = 47 %10 = 7
h(89)=89 % 10 = 9

Now, each of these keys can be hashed into a hash table as –

0 1 2 3 4 5 6 7 8 9
30 22 65 47 78 89

In general, a hash function should satisfy the following requirements:

A hash function needs to distribute keys among the cells of hash table as evenly as
possible.
A hash function has to be easy to compute.

Hash Collisions: Let us have n keys and the hash table is of size m such that m<n. As
each key will have an address with any value between 0 to m-1, it is obvious that more than

PDIT,Hosapete Page 22
one key will have same hash address. That is, two or more keys need to be hashed into the
same cell of hash table. This situation is called as hash collision.

In the worst case, all the keys may be hashed into same cell of hash table. But, we can
avoid this by choosing proper size of hash table and hash function. Anyway, every hashing
scheme must have a mechanism for resolving hash collision. There are two methods for
hash collision resolution, viz.
Open hashing
closed hashing

Open Hashing (or Separate Chaining): In open hashing, keys are stored in linked lists
attached to cells of a hash table. Each list contains all the keys hashed to its cell. For
example, consider the elements
65, 78, 22, 30, 47, 89, 55, 42, 18, 29, 37.

If we take the hash function as h(k)= k%10, then the hash addresses will be –
h(65) = 65 %10 = 5 h(78) = 78%10 = 8
h(22)= 22 % 10 =2 h(30)= 30 %10 =0
h(47) = 47 %10 = 7 h(89)=89 % 10 = 9
h(55)=55%10 =5 h(42)=42%10 =2
h(18)=18%10 =8 h(29)=29%10=9
h(37)=37%10 =7

The hash table would be –

0 1 2 3 4 5 6 7 8 9

30 22 65 47 78 89

42 55 37 18 29
Operations on Hashing:
Searching: Now, if we want to search for the key element in a hash table, we need
to find the hash address of that key using same hash function. Using the obtained
hash address, we need to search the linked list by tracing it, till either the key is
found or list gets exhausted.
Insertion: Insertion of new element to hash table is also done in similar manner.
Hash key is obtained for new element and is inserted at the end of the list for that
particular cell.
Deletion: Deletion of element is done by searching that element and then deleting it
from a linked list.

PDIT,Hosapete Page 23
Closed Hashing (or Open Addressing): In this technique, all keys are stored in the
hash table itself without using linked lists. Different methods can be used to resolve hash
collisions. The simplest technique is linear probing.

This method suggests to check the next cell from where the collision occurs. If that cell is
empty, the key is hashed there. Otherwise, we will continue checking for the empty cell in a
circular manner. Thus, in this technique, the hash table size must be at least as large as
the total number of keys. That is, if we have n elements to be hashed, then the size of hash
table should be greater or equal to n.

Example: Consider the elements 65, 78, 18, 22, 30, 89, 37, 55, 42
Let us take the hash function as h(k)= k%10, then the hash addresses will be –
h(65) = 65 %10 = 5 h(78) = 78%10 = 8
h(18)=18%10 =8 h(22)= 22 % 10 =2
h(30)= 30 %10 =0 h(89)=89 % 10 = 9
h(37)=37%10 =7 h(55)=55%10 =5
h(42)=42%10 =2

Since there are 9 elements in the list, our hash table should at least be of size 9. Here
we are taking the size as 10.

Now, hashing is done as below –

0 1 2 3 4 5 6 7 8 9
30 89 22 42 65 55 37 78 18

Drawbacks:
Searching may become like a linear search and hence not efficient.

PDIT,Hosapete Page 24
3.3 TUPLES
A tuple is a sequence of items, similar to lists. The values stored in the tuple can be of any
type and they are indexed using integers. Unlike lists, tuples are immutable. That is, values
within tuples cannot be modified/reassigned. Tuples are comparable and hashable objects.
Hence, they can be made as keys in dictionaries.

A tuple can be created in Python as a comma separated list of items – may or may not be
enclosed within parentheses.
>>> t='Mango', 'Banana', 'Apple' #without parentheses
>>> print(t)
('Mango', 'Banana', 'Apple')
>>> t1=('Tom', 341, 'Jerry') #with parentheses
>>> print(t1)
('Tom', 341, 'Jerry')

Observe that tuple values can be of mixed types.

If we would like to create a tuple with single value, then just a parenthesis will not suffice.
For example,
>>> x=(3) #trying to have a tuple with single item
>>> print(x)
3 #observe, no parenthesis found
>>> type(x)
<class 'int'> #not a tuple, it is integer!!
Thus, to have a tuple with single item, we must include a comma after the item. That is,

>>> t=3, #or use the statement t=(3,)

>>> type(t) #now this is a tuple

PDIT,Hosapete Page 25
<class 'tuple'>
An empty tuple can be created either using a pair of parenthesis or using a function tuple()
as below –
>>> t1=()
>>> type(t1)
<class 'tuple'>
>>> t2=tuple()
>>> type(t2)
<class 'tuple'>

If we provide an argument of type sequence (a list, a string or tuple) to the method tuple(),
then a tuple with the elements in a given sequence will be created –

Create tuple using string:

>>> t=tuple('Hello')
>>> print(t)
('H', 'e', 'l', 'l', 'o')

Create tuple using list:

>>> t=tuple([3,[12,5],'Hi'])
>>> print(t)
(3, [12, 5], 'Hi')

Create tuple using another tuple:

>>> t=('Mango', 34, 'hi')

>>> t1=tuple(t)
>>> print(t1)
('Mango', 34, 'hi')
>>> t is t1
True
Note that, in the above example, both t and t1 objects are referring to same memory
location. That is, t1 is a reference to t.

Elements in the tuple can be extracted using square-brackets with the help of indices.
Similarly, slicing also can be applied to extract required number of items from tuple.

>>> t=('Mango', 'Banana', 'Apple')

>>> print(t[1])
Banana
>>> print(t[1:])
('Banana', 'Apple')
>>> print(t[-1])
Apple

Modifying the value in a tuple generates error, because tuples are immutable –
>>> t[0]='Kiwi'
TypeError: 'tuple' object does not support item assignment

PDIT,Hosapete Page 26
We wanted to replace ‘Mango’ by ‘Kiwi’, which did not work using assignment. But, a tuple
can be replaced with another tuple involving required modifications –
>>> t=('Kiwi',)+t[1:]
>>> print(t)
('Kiwi', 'Banana', 'Apple')

PDIT,Hosapete Page 27
3.3.1 Comparing Tuples
Tuples can be compared using operators like >, <, >=, == etc. The comparison happens
lexicographically. For example, when we need to check equality among two tuple objects,
the first item in first tuple is compared with first item in sec ond tuple. If they are same, 2nd
items are compared. The check continues till either a mismatch is found or items get over.
Consider few examples –

>>> (1,2,3)==(1,2,5
) False
>>> (3,4)==(3,4)
True
The meaning of < and > in tuples is not exactly less than and greater than, instead, it
means comes before and comes after. Hence in such cases, we will get results different
from checking equality (==).
>>> (1,2,3)<(1,2,5)
True
>>> (3,4)<(5,2
) True
When we use relational operator on tuples containing non-comparable types, then
TypeError will be thrown.

>>> (1,'hi')<('hello','world')
TypeError: '<' not supported between instances of 'int' and 'str'

The sort() function internally works on similar pattern – it sorts primarily by first element, in
case of tie, it sorts on second element and so on. This pattern is known as DSU –
Decorate a sequence by building a list of tuples with one or more sort keys
preceding the elements from the sequence,
Sort the list of tuples using the Python built-in sort(), and
Undecorate by extracting the sorted elements of the sequence.

Consider a program of sorting words in a sentence from longest to shortest, which

illustrates DSU property.
txt = 'Ram and Seeta went to forest with Lakshman'
words = [Link]()
t = list()
for word in words:
[Link]((len(word), word))
print(‘The list is:’,t)
[Link](reverse=True)
res = list()

PDIT,Hosapete Page 28
for length, word in t:
[Link](word)
print(‘The sorted list:’,res)

The output would be –

The list is: [(3, 'Ram'), (3, 'and'), (5, 'Seeta'), (4, 'went'),
(2, 'to'), (6, 'forest'), (4, 'with'), (8, 'Lakshman')]
The sorted list: ['Lakshman', 'forest', 'Seeta', 'went', 'with',
'and', 'Ram', 'to']
In the above program, we have split the sentence into a list of words. Then, a tuple
containing length of the word and the word itself are created and are appended to a list.
Observe the output of this list – it is a list of tuples. Then we are sorting this list in
descending order. Now for sorting, length of the word is considered, because it is a first
element in the tuple. At the end, we extract length and word in the list, and create another
list containing only the words and print it.

3.3.2 Tuple Assignment

Tuple has a unique feature of having it at LHS of assignment operator. This allows us to
assign values to multiple variables at a time.

>>> x,y=10,20
>>> print(x) #prints 10
>>> print(y) #prints 20
When we have list of items, they can be extracted and stored into multiple variables as
below –
>>> ls=["hello", "world"]
>>> x,y=ls
>>> print(x) #prints hello
>>> print(y) #prints world
This code internally means that –
x= ls[0]
y= ls[1]

The best known example of assignment of tuples is swapping two values as below –
>>> a=10
>>> b=20
>>> a, b = b, a
>>> print(a, b) #prints 20 10

PDIT,Hosapete Page 29
In the above example, the statement a, b = b, a is treated by Python as – LHS is a set
of variables, and RHS is set of expressions. The expressions in RHS are evaluated and
assigned to respective variables at LHS.

Giving more values than variables generates ValueError –

>>> a, b=10,20,5
ValueError: too many values to unpack (expected 2)

While doing assignment of multiple variables, the RHS can be any type of sequence like
list, string or tuple. Following example extracts user name and domain from an email ID.
>>> email='chetanahegde@[Link]'
>>> usrName, domain = [Link]('@')
>>> print(usrName) #prints chetanahegde
>>> print(domain) #prints [Link]

3.3.3 Dictionaries and Tuples

Dictionaries have a method called items() that returns a list of tuples, where each tuple is a
key-value pair as shown below –
>>> d = {'a':10, 'b':1, 'c':22}
>>> t = list([Link]())
>>> print(t)
[('b', 1), ('a', 10), ('c', 22)]

As dictionary may not display the contents in an order, we can use sort() on lists and then
print in required order as below –
>>> d = {'a':10, 'b':1, 'c':22}
>>> t = list([Link]())
>>> print(t)
[('b', 1), ('a', 10), ('c', 22)]
>>> [Link]()
>>> print(t)
[('a', 10), ('b', 1), ('c', 22)]

3.3.4 Multiple Assignment with Dictionaries

We can combine the method items(), tuple assignment and a for-loop to get a pattern for
traversing dictionary:
d={'Tom': 1292, 'Jerry': 3501, 'Donald':
8913} for key, val in list([Link]()):
print(val,key)

The output would be –

1292 Tom
3501 Jerry
8913 Donald

PDIT,Hosapete Page 30
This loop has two iteration variables because items() returns a list of tuples. And key, val
is a tuple assignment that successively iterates through each of the key-value pairs in the
dictionary. For each iteration through the loop, both key and value are advanced to the next
key-value pair in the dictionary in hash order.

Once we get a key-value pair, we can create a list of tuples and sort them –

d={'Tom': 9291, 'Jerry': 3501, 'Donald': 8913}

ls=list()
for key, val in [Link]():
[Link]((val,key)) #observe inner parentheses
print("List of tuples:",ls)
[Link](reverse=True)
print("List of sorted tuples:",ls)

The output would be –

List of tuples: [(9291, 'Tom'), (3501, 'Jerry'), (8913, 'Donald')]
List of sorted tuples: [(9291, 'Tom'), (8913, 'Donald'),
(3501, 'Jerry')]
In the above program, we are extracting key, val pair from the dictionary and appending
it to the list ls. While appending, we are putting inner parentheses to make sure that each
pair is treated as a tuple. Then, we are sorting the list in the descending order. The sorting
would happen based on the telephone number (val), but not on name (key), as first
element in tuple is telephone number (val).
3.3.5 The Most Common Words
We will apply the knowledge gained about strings, tuple, list and dictionary till here to solve
a problem – write a program to find most commonly used words in a text file.

The logic of the program is –

Open a file
Take a loop to iterate through every line of a file.
Remove all punctuation marks and convert alphabets into lower case (Reason
explained in Section 3.2.4)
Take a loop and iterate over every word in a line.
If the word is not there in dictionary, treat that word as a key, and initialize its value
as 1. If that word already there in dictionary, increment the value.
Once all the lines in a file are iterated, you will have a dictionary containing distinct
words and their frequency. Now, take a list and append each key-value (word-
frequency) pair into it.
Sort the list in descending order and display only 10 (or any number of) elements
from the list to get most frequent words.

PDIT,Hosapete Page 31
import string
fhand = open('[Link]')
counts = dict()
for line in fhand:
line = [Link]([Link]('', '',[Link]))
line = [Link]()

for word in [Link]():

if word not in counts:
counts[word] = 1
else:
counts[word] += 1

lst = list()
for key, val in list([Link]()):
[Link]((val, key))

[Link](reverse=True)
for key, val in lst[:10]:
print(key, val)

Run the above program on any text file of your choice and observe the output.

3.3.6 Using Tuples as Keys in Dictionaries

As tuples and dictionaries are hashable, when we want a dictionary containing composite
keys, we will use tuples. For Example, we may need to create a telephone directory where
name of a person is Firstname-last name pair and value is the telephone number. Our job
is to assign telephone numbers to these keys. Consider the program to do this task –

names=(('Tom','Cat'),('Jerry','Mouse'), ('Donald',
'Duck')) number=[3561, 4014, 9813]

telDir={}

for i in range(len(number)):
telDir[names[i]]=number[i]

for fn, ln in telDir:

print(fn, ln, telDir[fn,ln])

The output would be –

Tom Cat 3561
Jerry Mouse 4014
Donald Duck 9813

PDIT,Hosapete Page 32
3.3.7 Summary on Sequences: Strings, Lists and Tuples
Till now, we have discussed different types of sequences viz. strings, lists and tuples. In
many situations these sequences can be used interchangeably. Still, due their difference in
behavior and ability, we may need to understand pros and cons of each of them and then
to decide which one to use in a program. Here are few key points –

1. Strings are more limited compared to other sequences like lists and Tuples.
Because, the elements in strings must be characters only. Moreover, strings are
immutable. Hence, if we need to modify the characters in a sequence, it is better to
go for a list of characters than a string.
2. As lists are mutable, they are most common compared to tuples. But, in some
situations as given below, tuples are preferable.
a. When we have a return statement from a function, it is better to use tuples
rather than lists.
b. When a dictionary key must be a sequence of elements, then we must use
immutable type like strings and tuples
c. When a sequence of elements is being passed to a function as arguments,
usage of tuples reduces unexpected behavior due to aliasing.
3. As tuples are immutable, the methods like sort() and reverse() cannot be applied on
them. But, Python provides built-in functions sorted() and reversed() which will take
a sequence as an argument and return a new sequence with modified results.

3.3.8 Debugging
Lists, Dictionaries and Tuples are basically data structures. In real-time programming, we
may require compound data structures like lists of tuples, dictionaries containing tuples and
lists etc. But, these compound data structures are prone to shape errors – that is, errors
caused when a data structure has the wrong type, size, composition etc. For example,
when your code is expecting a list containing single integer, but you are giving a plain
integer, then there will be an error.

When debugging a program to fix the bugs, following are the few things a programmer can
try –

Reading: Examine your code, read it again and check that it says what you meant
to say.
Running: Experiment by making changes and running different versions. Often if
you display the right thing at the right place in the program, the problem becomes
obvious, but sometimes you have to spend some time to build scaffolding.
Ruminating: Take some time to think! What kind of error is it: syntax, runtime,
semantic? What information can you get from the error messages, or from the output
of the program? What kind of error could cause the problem you’re seeing? What did
you change last, before the problem appeared?
Retreating: At some point, the best thing to do is back off, undoing recent changes,
until you get back to a program that works and that you understand. Then you can
start rebuilding.

PDIT,Hosapete Page 33
Python Application Programming (Open Elective - 17CS664)

3.4 REGULAR EXPRESSIONS

Searching for required patterns and extracting only the lines/words matching the pattern is a
very common task in solving problems programmatically. We have done such tasks earlier
using string slicing and string methods like split(), find() etc. As the task of searching and
extracting is very common, Python provides a powerful library called regular expressions to
handle these tasks elegantly. Though they have quite complicated syntax, they provide
efficient way of searching the patterns.

The regular expressions are themselves little programs to search and parse strings. To use
them in our program, the library/module re must be imported. There is a search() function in
this module, which is used to find particular substring within a string. Consider the following
example –

import re
fhand = open('[Link]')
for line in fhand:
line = [Link]()
if [Link]('how', line):
print(line)

By referring to file [Link] that has been discussed in previous Chapters, the output would
be –

hello, how are you?

how about you?
In the above program, the search() function is used to search the lines containing a word
how.

One can observe that the above program is not much different from a program that uses
find() function of strings. But, regular expressions make use of special characters with
specific meaning. In the following example, we make use of caret (^) symbol, which indicates
beginning of the line.

import re
hand = open('[Link]')
for line in hand:
line = [Link]()
if [Link]('^how', line):
print(line)

The output would be –

how about you?

Here, we have searched for a line which starts with a string how. Again, this program will not
makes use of regular expression fully. Because, the above program would have been

PDIT,Hospete Page 1
written using a string function startswith(). Hence, in the next section, we will understand the
true usage of regular expressions.

3.4.1 Character Matching in Regular Expressions

Python provides a list of meta-characters to match search strings. Table 3.1 shows the
details of few important metacharacters. Some of the examples for quick and easy
understanding of regular expressions are given in Table 3.2.

Table 3.1 List of Important Meta-Characters

Character Meaning
^ (caret) Matches beginning of the line
$ Matches end of the line
. (dot) Matches any single character except newline. Using option m, then
newline also can be matched
[…] Matches any single character in brackets
[^…] Matches any single character NOT in brackets
re* Matches 0 or more occurrences of preceding expression.
re+ Matches 1 or more occurrence of preceding expression.
re? Matches 0 or 1 occurrence of preceding expression.
re{ n} Matches exactly n number of occurrences of preceding expression.
re{ n,} Matches n or more occurrences of preceding expression.
re{ n, m} Matches at least n and at most m occurrences of preceding expression.
a| b Matches either a or b.
(re) Groups regular expressions and remembers matched text.
\d Matches digits. Equivalent to [0-9].
\D Matches non-digits.
\w Matches word characters.
\W Matches non-word characters.
\s Matches whitespace. Equivalent to [\t\n\r\f].
\S Matches non-whitespace.
\A Matches beginning of string.
\Z Matches end of string. If a newline exists, it matches just before
newline.
\z Matches end of string.
\b Matches the empty string, but only at the start or end of a word.
\B Matches the empty string, but not at the start or end of a word.
( ) When parentheses are added to a regular expression, they are ignored
for the purpose of matching, but allow you to extract a particular subset
of the matched string rather than the whole string when using
findall()

PDIT,Hospete Page 2
Table 3.2 Examples for Regular Expressions
Expression Description
[Pp]ython Match "Python" or "python"
rub[ye] Match "ruby" or "rube"
[aeiou] Match any one lowercase vowel
[0-9] Match any digit; same as [0123456789]
[a-z] Match any lowercase ASCII letter
[A-Z] Match any uppercase ASCII letter
[a-zA-Z0-9] Match any of uppercase, lowercase alphabets and digits
[^aeiou] Match anything other than a lowercase vowel
[^0-9] Match anything other than a digit

Most commonly used metacharacter is dot, which matches any character. Consider the
following example, where the regular expression is for searching lines which starts with I
and has any two characters (any character represented by two dots) and then has a
character m.
import re
fhand = open('[Link]')
for line in fhand:
line = [Link]()
if [Link]('^I..m', line):
print(line)

The output would be –

I am doing fine.

Note that, the regular expression ^I..m not only matches ‘I am’, but it can match ‘Isdm’,
‘I*3m’ and so on. That is, between I and m, there can be any two characters.
In the previous program, we knew that there are exactly two characters between I and m.
Hence, we could able to give two dots. But, when we don’t know the exact number of
characters between two characters (or strings), we can make use of dot and + symbols
together. Consider the below given program –
import re
hand = open('[Link]')
for line in hand:
line = [Link]()
if [Link]('^h.+u', line):
print(line)

The output would be –

hello, how are you?
how about you?

PDIT,Hospete Page 3
Observe the regular expression ^h.+u here. It indicates that, the string should be starting
with h and ending with u and there may by any number of (dot and +) characters in-between.

Few examples:
To understand the behavior of few basic meta characters, we will see some examples. The
file used for these examples is [Link] which can be downloaded from –
[Link]

Use this as input and try following examples –

Pattern to extract lines starting with the word From (or from) and ending with edu:
import re
fhand = open('[Link]')
for line in fhand:
line = [Link]() pattern
= ‘^[Ff]rom.*edu$’ if
[Link](pattern, line):
print(line)

Here the pattern given for regular expression indicates that the line should start with
either From or from. Then there may be 0 or more characters, and later the line should
end with edu.

Pattern to extract lines ending with any digit:

Replace the pattern by following string, rest of the program will remain the same.
pattern = ‘[0-9]$’
Using Not :
pattern = ‘^[^a-z0-9]+’

Here, the first ^ indicates we want something to match in the beginning of a line. Then,
the ^ inside square-brackets indicate do not match any single character within bracket.
Hence, the whole meaning would be – line must be started with anything other than a
lower-case alphabets and digits. In other words, the line should not be started with
lowercase alphabet and digits.

Start with upper case letters and end with digits:

pattern = '^[A-Z].*[0-9]$'
Here, the line should start with capital letters, followed by 0 or more characters, but must
end with any digit.

PDIT,Hospete Page 4
3.4.2 Extracting Data using Regular Expressions
Python provides a method findall() to extract all of the substrings matching a regular
expression. This function returns a list of all non-overlapping matches in the string. If there is
no match found, the function returns an empty list. Consider an example of extracting
anything that looks like an email address from any line.
import re
s = 'A message from csev@[Link] to cwen@[Link] about meeting
@2PM'
lst = [Link]('\S+@\S+', s)
print(lst)

The output would be –

['csev@[Link]', 'cwen@[Link]']

Here, the pattern indicates at least one non-white space characters (\S) before @ and at
least one non-white space after @. Hence, it will not match with @2pm, because of a white-
space before @.

Now, we can write a complete program to extract all email-ids from the file.

import re
fhand = open('[Link]')
for line in fhand:
line = [Link]()
x = [Link]('\S+@\S+', line)
if len(x) > 0:
print(x)

Here, the condition len(x) > 0 is checked because, we want to print only the line which
contain an email-ID. If any line do not find the match for a pattern given, the findall() function
will return an empty list. The length of empty list will be zero, and hence we would like to print
the lines only with length greater than 0.

The output of above program will be something as below –

['[Link]@[Link]']
['<postmaster@[Link]>']
['<200801051412.m05ECIaH010327@[Link]>']
['<source@[Link]>;']
['<source@[Link]>;']
['<source@[Link]>;']
['apache@localhost)']
……………………………….
………………………………..

PDIT,Hospete Page 5
Note that, apart from just email-ID’s, the output contains additional characters (<, >, ; etc)
attached to the extracted pattern. To remove all that, refine the pattern. That is, we want
email-ID to be started with any alphabets or digits, and ending with only alphabets. Hence,
the statement would be –

x = [Link]('[a-zA-Z0-9]\S*@\S*[a-zA-Z]', line)

3.4.3 Combining Searching and Extracting

Assume that we need to extract the data in a particular syntax. For example, we need to
extract the lines containing following format –

X-DSPAM-Confidence: 0.8475
X-DSPAM-Probability: 0.0000
The line should start with X-, followed by 0 or more characters. Then, we need a colon and
white-space. They are written as it is. Then there must be a number containing one or more
digits with or without a decimal point. Note that, we want dot as a part of our pattern string,
but not as meta character here. The pattern for regular expression would be –
^X-.*: [0-9.]+

The complete program is –

import re
hand = open('[Link]')
for line in hand:
line = [Link]()
if [Link]('^X\S*: [0-9.]+', line):
print(line)

The output lines will as below –

X-DSPAM-Confidence: 0.8475
X-DSPAM-Probability: 0.0000
X-DSPAM-Confidence: 0.6178
X-DSPAM-Probability: 0.0000
X-DSPAM-Confidence: 0.6961
X-DSPAM-Probability: 0.0000
……………………………………………………
……………………………………………………

Assume that, we want only the numbers (representing confidence, probability etc) in the
above output. We can use split() function on extracted string. But, it is better to refine regular
expression. To do so, we need the help of parentheses.

When we add parentheses to a regular expression, they are ignored when matching the
string. But when we are using findall(), parentheses indicate that while we want the whole
expression to match, we only are interested in extracting a portion of the substring that
matches the regular expression.

PDIT,Hospete Page 6
import re
hand = open('[Link]')
for line in hand:
line = [Link]()
x = [Link]('^X-\S*: ([0-9.]+)', line)
if len(x) > 0:
print(x)

Because of the parentheses enclosing the pattern above, it will match the pattern starting
with X- and extracts only digit portion. Now, the output would be –
['0.8475']
['0.0000']
['0.6178']
['0.0000']
['0.6961']
…………………
………………..

Another example of similar form: The file [Link] contains lines like –
Details: [Link]

We may be interested in extracting only the revision numbers mentioned at the end of these
lines. Then, we can write the statement –

x = [Link]('^Details:.*rev=([0-9.]+)', line)

The regex here indicates that the line must start with Details:, and has something with
rev= and then digits. As we want only those digits, we will put parenthesis for that portion
of expression. Note that, the expression [0-9] is greedy, because, it can display very large
number. It keeps grabbing digits until it finds any other character than the digit. The output of
above regular expression is a set of revision numbers as given below –
['39772']
['39771']
['39770']
['39769']
………………………
………………………

Consider another example – we may be interested in knowing time of a day of each email.
The file [Link] has lines like –
From [Link]@[Link] Sat Jan 5 [Link] 2008

Here, we would like to extract only the hour 09. That is, we would like only two digits
representing hour. Hence, we need to modify our expression as –
x = [Link]('^From .* ([0-9][0-9]):', line)

PDIT,Hospete Page 7
Here, [0-9][0-9] indicates that a digit should appear only two times. The alternative way
of writing this would be -

x = [Link]('^From .* ([0-9]{2}):', line)

The number 2 within flower-brackets indicates that the preceding match should appear
exactly two times. Hence [0-9]{2} indicates there can be exactly two digits. Now, the
output would be –

['09']
['18']
['16']
['15']
…………………
…………………

3.4.4 Escape Character

As we have discussed till now, the character like dot, plus, question mark, asterisk, dollar
etc. are meta characters in regular expressions. Sometimes, we need these characters
themselves as a part of matching string. Then, we need to escape them using a back-slash.
For example,

import re
x = 'We just received $10.00 for cookies.'
y = [Link]('\$[0-9.]+',x)
Output:
['$10.00']

Here, we want to extract only the price $10.00. As, $ symbol is a metacharacter, we need to
use \ before it. So that, now $ is treated as a part of matching string, but not as
metacharacter.

3.4.5 Bonus Section for Unix/Linux Users

Support for searching files using regular expressions was built into the Unix OS. There is a
command-line program built into Unix called grep (Generalized Regular Expression Parser)
that behaves similar to search() function.
$ grep '^From:' [Link]
Output:
From: [Link]@[Link]
From: louis@[Link]
From: zqian@[Link]
From: rjlowe@[Link]
Note that, grep command does not support the non-blank character \S, hence we need to
use [^ ] indicating not a white-space.

PDIT,Hospete Page 8

Python - Week 1 PDF
No ratings yet
Python - Week 1 PDF
28 pages
Python 1
No ratings yet
Python 1
111 pages
Python Syntax and Data Types Overview
No ratings yet
Python Syntax and Data Types Overview
11 pages
Python Programming Basics Guide
No ratings yet
Python Programming Basics Guide
29 pages
GE8151 Notes
No ratings yet
GE8151 Notes
118 pages
1 Year Lab Python 2017 Manual
No ratings yet
1 Year Lab Python 2017 Manual
54 pages
Python Mastery Lessons
No ratings yet
Python Mastery Lessons
28 pages
Python Fundmentals Idrees 1
No ratings yet
Python Fundmentals Idrees 1
32 pages
LAB Python Basics Ver 7.0
No ratings yet
LAB Python Basics Ver 7.0
17 pages
Python Basics for CS41 Students
No ratings yet
Python Basics for CS41 Students
50 pages
JavaScript for Web Developers
No ratings yet
JavaScript for Web Developers
4 pages
Introduction to Python Programming
No ratings yet
Introduction to Python Programming
79 pages
Yet Another Insignificant Programming Notes
100% (1)
Yet Another Insignificant Programming Notes
98 pages
JavaScript Object References
No ratings yet
JavaScript Object References
45 pages
Basic Python For Scientists: Pim Schellart May 27, 2010
No ratings yet
Basic Python For Scientists: Pim Schellart May 27, 2010
21 pages
Grade XII Python & SQL Guide
No ratings yet
Grade XII Python & SQL Guide
64 pages
Introduction to Python Programming
No ratings yet
Introduction to Python Programming
48 pages
Python Guide Documentation: Release 0.0.1
No ratings yet
Python Guide Documentation: Release 0.0.1
167 pages
Javascript Overview
No ratings yet
Javascript Overview
2 pages
Python Numpy Tutorial
No ratings yet
Python Numpy Tutorial
22 pages
Python Basics for Anumati Sekar
No ratings yet
Python Basics for Anumati Sekar
60 pages
Modul 4 - Function Tuple Dictionaries and Data Processing
No ratings yet
Modul 4 - Function Tuple Dictionaries and Data Processing
20 pages
Python References
No ratings yet
Python References
64 pages
Data Science Program with Purdue
100% (1)
Data Science Program with Purdue
28 pages
Python Cheat Sheat
No ratings yet
Python Cheat Sheat
10 pages
Python 3
No ratings yet
Python 3
122 pages
M.Tech for IT Professionals
No ratings yet
M.Tech for IT Professionals
18 pages
Python Programming Lab Exercises
No ratings yet
Python Programming Lab Exercises
39 pages
Graph Algorithms: Representation & Search
No ratings yet
Graph Algorithms: Representation & Search
4 pages
Algorithm
No ratings yet
Algorithm
27 pages
04 Python 1
No ratings yet
04 Python 1
43 pages
SAP Profitability Analysis Guide
No ratings yet
SAP Profitability Analysis Guide
47 pages
Python Types Branching Functions: If Def Return
No ratings yet
Python Types Branching Functions: If Def Return
1 page
SAP Output Types and Print Programs
No ratings yet
SAP Output Types and Print Programs
10 pages
Python Quick Guide - Tutorialspoint
No ratings yet
Python Quick Guide - Tutorialspoint
199 pages
AI & Machine Learning Postgrad Program
No ratings yet
AI & Machine Learning Postgrad Program
30 pages
Python Lab Manual - STUDENTCOPY2
No ratings yet
Python Lab Manual - STUDENTCOPY2
19 pages
Data Analysis Tutorial
No ratings yet
Data Analysis Tutorial
152 pages
Configure Automatic Postings: Further Information
No ratings yet
Configure Automatic Postings: Further Information
13 pages
Python Essentials
No ratings yet
Python Essentials
181 pages
BP Optimizer: Natural Blood Pressure Solution
No ratings yet
BP Optimizer: Natural Blood Pressure Solution
3 pages
GE8161 Problem Solving and Python Programming Lab Manual
84% (31)
GE8161 Problem Solving and Python Programming Lab Manual
16 pages
ML Notesv1
100% (2)
ML Notesv1
300 pages
Understanding AI: Types and Applications
No ratings yet
Understanding AI: Types and Applications
9 pages
Python Programming Basics Guide
No ratings yet
Python Programming Basics Guide
169 pages
Introduction To Artificial Intelligence: Aiza Shabir Lecturer Institute of CS&IT The Women University Multan
No ratings yet
Introduction To Artificial Intelligence: Aiza Shabir Lecturer Institute of CS&IT The Women University Multan
19 pages
Introduction to Python Basics
No ratings yet
Introduction to Python Basics
5 pages
Module 3 - 3rd Edition
No ratings yet
Module 3 - 3rd Edition
71 pages
Lab 2 Variable and Data Type I
No ratings yet
Lab 2 Variable and Data Type I
9 pages
Settings in Capacity Planning: Answer
No ratings yet
Settings in Capacity Planning: Answer
2 pages
Python - The Basics
No ratings yet
Python - The Basics
6 pages
Python Virtual Machine Overview
No ratings yet
Python Virtual Machine Overview
148 pages
Python Programming Exercises
No ratings yet
Python Programming Exercises
45 pages
Python Functions and Data Structures Guide
No ratings yet
Python Functions and Data Structures Guide
132 pages
A Collection of Useful Scripts, Tutorials and Other Python Related Thing
0% (1)
A Collection of Useful Scripts, Tutorials and Other Python Related Thing
5 pages
Understanding Python Lists and Loops
No ratings yet
Understanding Python Lists and Loops
31 pages
Python Lists: Operations and Methods
No ratings yet
Python Lists: Operations and Methods
32 pages
3.1.1 Lists Are Mutable
No ratings yet
3.1.1 Lists Are Mutable
27 pages
18CS752 Python Code
No ratings yet
18CS752 Python Code
48 pages
17CS664 Module3 Lists Dictionaries Tuples (Python)
No ratings yet
17CS664 Module3 Lists Dictionaries Tuples (Python)
31 pages
Python Data Types Overview
No ratings yet
Python Data Types Overview
44 pages
Python Identifier Naming Rules
No ratings yet
Python Identifier Naming Rules
57 pages
Python Data Types Explained
No ratings yet
Python Data Types Explained
6 pages
DSA Practical
No ratings yet
DSA Practical
51 pages
Complete Ruby Data Processing: Using Map, Reduce, and Select 1st Edition Jay Godse PDF For All Chapters
100% (5)
Complete Ruby Data Processing: Using Map, Reduce, and Select 1st Edition Jay Godse PDF For All Chapters
55 pages
Introduction To Programming Using Python 1st Edition Schneider Test Bankinstant Download
100% (15)
Introduction To Programming Using Python 1st Edition Schneider Test Bankinstant Download
39 pages
Ip Practical File
No ratings yet
Ip Practical File
31 pages
Overview of Python Data Types
No ratings yet
Overview of Python Data Types
8 pages
Pandas DataFrame Basics
No ratings yet
Pandas DataFrame Basics
10 pages
Time and Space Complexity - 2023
No ratings yet
Time and Space Complexity - 2023
47 pages
Data Science Assignment 1
No ratings yet
Data Science Assignment 1
20 pages
Class XII Computer Science Practical List
No ratings yet
Class XII Computer Science Practical List
4 pages
Ge3151 Unit4 Print
No ratings yet
Ge3151 Unit4 Print
18 pages
1738315731-Toc Modular Python 2.0 Book-X
No ratings yet
1738315731-Toc Modular Python 2.0 Book-X
1 page
Simple Material Data Structures in Python
No ratings yet
Simple Material Data Structures in Python
20 pages
Dictionary
No ratings yet
Dictionary
30 pages
UNIT-4 Python Programming
No ratings yet
UNIT-4 Python Programming
18 pages
Unit 1
No ratings yet
Unit 1
181 pages
CS Practical File Hiya FINALLL
No ratings yet
CS Practical File Hiya FINALLL
61 pages
Wing Smart Contract Security Audit
No ratings yet
Wing Smart Contract Security Audit
16 pages
(100% Off) Complete Python Bootcamp For Everyone From Zero To Hero 2023
No ratings yet
(100% Off) Complete Python Bootcamp For Everyone From Zero To Hero 2023
29 pages
Quick Python Learning Guide
No ratings yet
Quick Python Learning Guide
13 pages
Ebook - Python From Beginner To Advanced
50% (2)
Ebook - Python From Beginner To Advanced
65 pages
Bloom Filter: Algorithm Description
No ratings yet
Bloom Filter: Algorithm Description
11 pages
Python Basics: Variables, Strings, Lists, and More
No ratings yet
Python Basics: Variables, Strings, Lists, and More
65 pages
B.Tech CSE & IT Semester 3-4 Scheme
No ratings yet
B.Tech CSE & IT Semester 3-4 Scheme
52 pages
Pyhton 4 Dictionaries
No ratings yet
Pyhton 4 Dictionaries
33 pages
Python Notes
No ratings yet
Python Notes
11 pages
List, Tuple, Dictionary
No ratings yet
List, Tuple, Dictionary
12 pages
Numbers String List Tuple Dictionary
No ratings yet
Numbers String List Tuple Dictionary
5 pages

Module-3 Python 17CS664

Uploaded by

Module-3 Python 17CS664

Uploaded by

Python Application Programming (Open Elective - 17CS664)

An empty list can be created any of the following ways –

3.1.1 Lists are Mutable

>>> ls=[34, 'hi', [2,3],-5]

>>> ls=[34, 'hi', [2,3],-5]

The index for extracting list elements has following properties –

The in operator applied on lists will results in a Boolean value.

print(ls) #output is [1, 4, 9, 16]

Here, we wanted to do modification in the elements of list. Hence, referring indices is

3.1.3 List Operations

>>> [0]*4 #repetition using *

3.1.4 List Slices

Extracting elements from 2nd position –

Extracting first three elements –

Selecting some middle elements –

Using negative indexing –

Reversing a list using negative value for stride –

Modifying (reassignment) only required set of values –

Thus, slicing can make many tasks simple.

3.1.5 List Methods

Now, in the above example, the list ls1 is unaltered.

>>> ls=[3,10,5, 16,-2]

reverse(): This method can be used to reverse the given list.

count(): This method is used to count number of occurrences of a particular value

>>> ls=[15, 4, 2, 10, 5, 3, 2, 6]

If the value is not present in the list, it throws ValueError.

Few important points about List Methods:

>>> ls=[34, 'hi', -5]

Similarly, when a list contains integers and sub-list, it will be an error.

>>> ls=[3, 4.5, 2]

3.1.6 Deleting Elements

>>> ls=[5,8, -12,34,2]

Deleting all odd indexed elements of a list –

3.1.7 Lists and Functions

average = sum(ls) / len(ls)

3.1.8 Lists and Strings

3.1.9 Parsing Lines

Lists are not interned. Hence, we can see following result –

Strings are safe in this regards, as they are immutable.

3.1.12 List Arguments

An empty dictionary can be created using two ways –

To add items to dictionary, we can use square brackets as –

>>> 3491 in tel_dir.values() #output is True

The in operator behaves differently in case of lists and dictionaries as explained

s=input("Enter a string:") #read a string

print(d) #display the dictionary

The sample output would be –

Enter a string: Hello World

3.2.2 Looping and Dictionaries

tel_dir={'Tom': 3491, 'Jerry':8135, 'Mickey':1253}

The output would be –

3.2.3 Dictionaries and Files

for line in fhand:

Enter file name: [Link]

Few points to be observed in the above output –

3.2.4 Advanced Text Parsing

>>> import string

fname=input("Enter file name:")

Now, the output would be –

Now, each of these keys can be hashed into a hash table as –

In general, a hash function should satisfy the following requirements:

The hash table would be –

Now, hashing is done as below –

Observe that tuple values can be of mixed types.

>>> t=3, #or use the statement t=(3,)

Create tuple using string:

Create tuple using list:

Create tuple using another tuple:

>>> t=('Mango', 34, 'hi')

>>> t=('Mango', 'Banana', 'Apple')

Consider a program of sorting words in a sentence from longest to shortest, which

The output would be –

3.3.2 Tuple Assignment

Giving more values than variables generates ValueError –

3.3.3 Dictionaries and Tuples

3.3.4 Multiple Assignment with Dictionaries

>>> [0]4 #repetition using