Python_Data_Structure
Li Yin (www.liyinscience.com)
Python Data Structures
0.1 Introduction
Python is an object-oriented programming language in which every object is
implemented in C in the backend (CPython, the reference interpreter, is
written in C). The built-in data types of C follow the abstract data
structures more rigidly. We could get by just learning how to use the Python
data types alone: their properties (immutable or mutable), their built-in
in-place operations such as append(), insert(), add(), remove(), and so on,
and the built-in functions and operations that offer additional ways to
manipulate a data structure, which is an object here. However, the behavior
of some data types does not obviously match an abstract data structure,
which makes it hard to assess and evaluate their efficiency.
In this chapter and the following three chapters, we start to learn Python
data structures by relating their C implementations to the abstract data
structures we have learned, and then introduce each one's properties,
built-in operations, and built-in functions. If Python is not a language you
are familiar with, please first read the section Understanding Object in the
Appendix (Python Knowledge Base) to study the properties of the built-in
data types.
• For linked list, stack, and queue, we either need to implement them with
built-in data types or use the Python modules that provide them.
All these sequence-type data structures share the common methods and
operations shown in Tables 4 and 5. Note that in Python, indexing starts
from 0.
Let us examine each type of sequence further to understand its performance
and its relation to array data structures.
0.2. ARRAY AND PYTHON SEQUENCE v
0.2.2 Range
Range Syntax
A range object has three attributes, start, stop, and step, and it can be
created as range(start, stop, step). These attributes need to be integers
(both negative and positive values work), and together they define the
half-open range [start, stop). stop is required; the default value for start
is 0 and for step is 1. For example:
    >>> a = range(10)
    >>> b = range(0, 10, 2)
    >>> a, b
    (range(0, 10), range(0, 10, 2))
Like other sequence types, range is iterable and can be indexed and sliced.
This is how the behavior of the range class is defined in the underlying C
code: it does not need to store every integer in the range, because each
value is computed from start, stop, and step when it is asked for.
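The lazy behavior described above can be checked directly. The sketch below is only an illustration: the variable names are made up for the example, and the size comparison relies on CPython's sys.getsizeof.

```python
import sys

small = range(10)
large = range(1_000_000)

# Both objects store only start, stop and step, so their sizes match.
same_size = sys.getsizeof(small) == sys.getsizeof(large)

evens = range(0, 10, 2)
third_even = evens[2]     # computed on demand as start + 2 * step
middle = small[2:5]       # slicing a range yields another range
print(same_size, third_even, middle, list(middle))
```

Slicing returns a new range rather than a list, which is why no integers are materialized until we ask for them with list().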
0.2.3 String
A string is a static array whose items are characters, represented using
ASCII or Unicode (see footnote 1). A string is immutable, which means that
once it is created we can no longer modify its content or extend its size.
A string is more compact than a list holding the same characters, because
its backing array is not allocated any extra space.
String Syntax
Strings can be created in Python by wrapping a sequence of characters in
single or double quotes. Multi-line strings can be created using three
quote characters.
Join The str.join() method concatenates the items of an iterable of strings,
placing the string it is called on between every two items. When the
argument is itself a string, its items are its characters, so we can use
str.join() to add whitespace between the characters of a string like so:
    balloon = "Sammy has a balloon."
    print(" ".join(balloon))
    # Output
    # S a m m y   h a s   a   b a l l o o n .
Split Just as we can join strings together, we can also split strings up using
the str.split() method. This method separates the string by whitespace
if no other parameter is given.
    print(balloon.split())
    # Output
    # ['Sammy', 'has', 'a', 'balloon.']
We can also use str.split() to remove certain parts of the original string
by passing the separator explicitly. For example, let's remove the letter
'a' from the string:
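A minimal sketch of this call, assuming balloon is still the string defined in the Join example:

```python
balloon = "Sammy has a balloon."

# Split at every occurrence of the letter 'a'; the separator is dropped.
parts = balloon.split("a")
print(parts)
# ['S', 'mmy h', 's ', ' b', 'lloon.']
```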
Footnote 1: In Python 3, all strings are represented in Unicode. In Python 2 are stored internally
Now the letter a has been removed, and the string has been split at each
place where the letter a occurred, with whitespace retained.
Replace The str.replace() method can take an original string and re-
turn an updated string with some replacement.
Let’s say that the balloon that Sammy had is lost. Since Sammy no
longer has this balloon, we will change the substring "has" from the original
string balloon to "had" in a new string:
    print(balloon.replace("has", "had"))
    # Output
    # Sammy had a balloon.
String Functions
Because string is one of the most fundamental built-in data types, it is
worth knowing the common built-in methods shown in Tables 1 and 2. Using
boolean methods to check whether characters are lower case, upper case, or
title case can help us sort our data appropriately, and gives us the
opportunity to standardize the data we collect by checking and then
modifying strings as needed.
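As a small illustration of the boolean methods mentioned above, the sketch below checks the case of a sample string (the string itself is made up for the example) and then standardizes it:

```python
name = "Sammy Shark"

# Each check returns a bool and leaves the string unchanged.
checks = (name.islower(), name.isupper(), name.istitle())
print(checks)

# Standardize before comparing or sorting.
normalized = name.lower()
print(normalized, normalized.islower())
```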
0.2.4 List
The underlying abstract data structure of the list data type is the dynamic
array, meaning we can add, delete, and modify items in the list. It supports
random access by indexing. List is the most widely used sequence type
because of its mutability.
Even though a list can hold items of arbitrary types, we prefer not to mix
types in one list. For a record of mixed types, a tuple or namedtuple is
better practice and makes the intent clearer.
    lst2 = [4 for i in range(5)]
    for idx, v in enumerate(lst1):
        lst1[idx] += 1
Add Item We can add items to a list through list.insert(index, value), which
inserts an item at a given position in the original list, or
list.append(value), which appends an item at the end of the list.
    # INSERTION
    lst.insert(0, 1)  # insert an element at index 0; since lst is empty, lst.insert(1, 1) has the same effect
    print(lst)

    lst2.insert(2, 3)
    print(lst2)
    # output
    # [1]
    # [2, 2, 3, 2, 2]
    # APPEND
    for i in range(2, 5):
        lst.append(i)
    print(lst)
    # output
    # [1, 2, 3, 4]
Delete Item
Get Size of the List We can use the built-in function len to find the
number of items stored in the list.
    print(len(lst2))
    # 4
Now let us get the memory size of lst_lst and of each list item in this
list.
    import sys
    for lst in lst_lst:
        print(sys.getsizeof(lst), end=' ')
    print(sys.getsizeof(lst_lst))
We can see that a list of integers takes the same memory as a list of
strings of equal length. This is because the backing array stores only
references to the items, and sys.getsizeof does not count the items
themselves.
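A quick way to see this for yourself, as a sketch with made-up sample lists (the exact byte counts depend on the CPython version and platform, so only the comparison is shown):

```python
import sys

ints = [1, 2, 3]
strs = ['a', 'b', 'c']

# getsizeof measures the list object and its array of references only,
# not the integer or string objects the references point to.
same_size = sys.getsizeof(ints) == sys.getsizeof(strs)
print(sys.getsizeof(ints), sys.getsizeof(strs), same_size)
```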
insert and append Whenever insert or append is called on a list of length n,
Python compares n + 1 with the allocated length. If the backing array is not
big enough, it must be expanded. When this happens, the backing array is
grown by approximately 12.5% plus a small constant, following this formula
(from the C source):
    new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
    size: 9 bytes: 16 id: 140682152394952
    size: 10 bytes: 16 id: 140682152394952
    size: 11 bytes: 16 id: 140682152394952
    size: 12 bytes: 16 id: 140682152394952
    size: 13 bytes: 16 id: 140682152394952
    size: 14 bytes: 16 id: 140682152394952
    size: 15 bytes: 16 id: 140682152394952
    size: 16 bytes: 16 id: 140682152394952
    size: 17 bytes: 25 id: 140682152394952
The output shows the growth pattern [0, 4, 8, 16, 25, 35, 46, 58, 72,
88, ...].
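The growth pattern can be reproduced by replaying the quoted formula in Python. This is a sketch of the arithmetic only: growth_pattern is a hypothetical helper, and the over-allocation rule differs between CPython versions, so a current interpreter may report other numbers.

```python
def growth_pattern(n_appends):
    """Replay the quoted formula for a list that receives n_appends appends."""
    allocated = 0
    pattern = [allocated]
    for newsize in range(1, n_appends + 1):
        if newsize > allocated:        # backing array is full: reallocate
            allocated = newsize + (newsize >> 3) + (3 if newsize < 9 else 6)
            pattern.append(allocated)
    return pattern

print(growth_pattern(73))
# [0, 4, 8, 16, 25, 35, 46, 58, 72, 88]
```

Because each reallocation adds capacity proportional to the current size, only a few of the 73 appends trigger a copy, which is where the amortized O(1) cost of append comes from.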
Amortized, append takes O(1). However, insert is O(n), because it first has
to shift all items in [pos, end] of the original list by one position and
then put the new item at pos with random access.
Two-dimensional List
A two-dimensional list is a list of lists. In this type of array the
position of a data element is referred to by two indices instead of one, so
it represents a table with rows and columns of data. For example, we can
declare the following 2-d array:
    ta = [[11, 3, 9, 1], [25, 6, 10], [10, 8, 12, 5]]
The scalar data in a two-dimensional list can be accessed using two indices:
one index refers to the main or parent list, and the other refers to the
position of the data in the inner list. If we give only one index, the
entire inner list at that position is returned. The example below
illustrates how it works.
    print(ta[0])
    print(ta[2][1])
In the above example, we create a 2-d list and initialize it with values.
There are also ways to create an empty 2-d list, or to fix the dimension of
the outer list and leave the inner lists empty:
    # empty two dimensional list
    empty_2d = [[]]

    # fix the outer dimension
    fix_out_d = [[] for _ in range(5)]
    print(fix_out_d)
All the other operations such as delete, insert, and update are the same as
for the one-dimensional list.
However, we cannot declare it in the following way, because the outer list
ends up holding several references to one and the same inner list, so
modifying an element through one row changes that position in every row.
Use this form only when that sharing is what the situation calls for.
    # wrong declaration
    m4 = [[0] * cols] * rows
    m4[1][2] = 1
    print(m4)
With output:
    [[0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 1, 0]]
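For contrast, one common way to get independent rows is a list comprehension that builds a fresh inner list per row. The sketch below assumes rows = 3 and cols = 4 to match the output above; the names m, aliased, and shared are illustrative.

```python
rows, cols = 3, 4

# One fresh inner list per row, so rows do not share storage.
m = [[0] * cols for _ in range(rows)]
m[1][2] = 1

# The aliased form for comparison: every row is the same object.
aliased = [[0] * cols] * rows
shared = all(row is aliased[0] for row in aliased)
print(m, shared)
```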
Access Rows and Columns In real problem solving, we might need to access
rows and columns. Accessing rows is easy, since it follows the declaration
of the two-dimensional list.
    # accessing row
    for row in m1:
        print(row)
There is also a handy idiom for transposing a nested list, turning columns
into rows:
    transposedM1 = list(zip(*m1))
    print(transposedM1)
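To make the column access concrete, here is a small sketch with an assumed 2 x 3 matrix m1 (the values are made up for the example):

```python
m1 = [[1, 2, 3],
      [4, 5, 6]]

# A single column: pick the same index from every row.
col1 = [row[1] for row in m1]

# All columns at once: zip(*m1) pairs up the i-th items of each row.
columns = list(zip(*m1))
print(col1, columns)
```

Note that zip produces tuples, so each column of the transposed result is immutable unless it is converted back with list().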
0.2.5 Tuple
A tuple has a static array as its backing abstract data structure in C, and
it is immutable: we cannot add, delete, or replace items once it is created
and assigned a value. You might ask: if list is a dynamic array and has none
of the tuple's restrictions, why would we need tuple at all?
Tuple VS List We list how we use each data type and why. The main benefit
of the tuple's immutability is that it is hashable (as long as all of its
items are hashable), so we can use tuples as keys in the hash table
(dictionary) types, whereas mutable types such as list cannot be used this
way. Besides, when the data does not need to change, the tuple's
immutability guarantees that the data remains write-protected, and
iterating over a tuple can be slightly faster than iterating over a list.
Also, we generally use tuple to store a record of mixed data types. For
example, in a class score system, for a student we might want to keep the
name, student id, and test score, which we can write as ('Bob', 12345, 89).
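The hashability point can be demonstrated with a short sketch; the dictionary scores and the flag list_key_ok are names chosen for the example.

```python
scores = {}
scores[('Bob', 12345)] = 89      # a tuple of hashable items works as a key

try:
    scores[['Bob', 12345]] = 89  # a list is mutable, hence unhashable
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(scores, list_key_ok)
```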
Tuple Syntax
New and Initialize Tuple Tuples are created by separating the items with
commas, and are commonly wrapped in parentheses for readability. A tuple can
also be created via the built-in function tuple(): if the argument is a
sequence, this creates a tuple of the elements of that sequence. This is
also how type conversion to tuple is done.
An empty tuple:
    tup = ()
    tup3 = tuple()
When there is only one item, put a comma after it; otherwise the parentheses
are treated as plain grouping and the result is the item itself (here a
string) rather than a tuple, which is a bit bizarre!
    tup2 = ('crack',)
    tup1 = ('crack', 'leetcode', 2018, 2019)
Converting a string to a tuple with each character separated:
    tup4 = tuple("leetcode")  # the sequence is passed as a tuple of elements
    # tup4: ('l', 'e', 'e', 't', 'c', 'o', 'd', 'e')
Converting a list to a tuple:
    tup5 = tuple(['crack', 'leetcode', 2018, 2019])  # same as tup1
If we print out these tuples, we get
    tup1: ('crack', 'leetcode', 2018, 2019)
    tup2: ('crack',)
    tup3: ()
    tup4: ('l', 'e', 'e', 't', 'c', 'o', 'd', 'e')
    tup5: ('crack', 'leetcode', 2018, 2019)
However, items that are themselves mutable can still be modified. For
example, we can use an index to access the list at the last position of a
tuple and modify that list.
    tup[-1][0] = 4
    # ('a', 'b', [4, 2, 3])
Understand Tuple
The backing structure is a static array, which means a tuple is structured
like a list except that it is write-protected. We will only brief on its
properties.
Tuple Object and Pointers The tuple object itself takes 48 bytes here (the
exact figure depends on the Python version and platform). Everything else
is similar to the corresponding section for list.
    lst_tup = [(), (1,), ('1',), (1, 2), ('1', '2')]
    import sys
    for tup in lst_tup:
        print(sys.getsizeof(tup), end=' ')
Named Tuples
In a named tuple, we can give the record type a name, say
'Computer_Science' to indicate the class name, and give each item a name,
say 'name', 'id', and 'score'. We need to import namedtuple from the
collections module. For example:
    record1 = ('Bob', 12345, 89)
    from collections import namedtuple
    Record = namedtuple('Computer_Science', 'name id score')
    record2 = Record('Bob', id=12345, score=89)
    print(record1, record2)
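What the names buy us is readable field access. The sketch below reuses the Record definition from the listing; bob, by_name, by_index, and as_dict are illustrative names.

```python
from collections import namedtuple

Record = namedtuple('Computer_Science', 'name id score')
bob = Record('Bob', id=12345, score=89)

by_name = bob.score            # field access by name
by_index = bob[2]              # still an ordinary tuple underneath
as_dict = bob._asdict()        # mapping of field name to value
print(by_name, by_index, as_dict['name'])
```

A named tuple still compares equal to the plain tuple with the same items, so it can replace record1 without changing code that indexes by position.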
0.2.6 Summary
All these sequence-type data structures share the common methods and
operations shown in Tables 4 and 5. Note that in Python, indexing starts
from 0.
Table 5: Common out-of-place operators for sequence data types in Python

    Operation              Description
    s + r                  Concatenates two sequences of the same type
    s * n                  Makes n copies of s, where n is an integer
    v1, v2, ..., vn = s    Unpacks n variables from s
    s[i]                   Indexing: returns the ith element of s
    s[i:j:stride]          Slicing: returns elements between i and j with optional stride
    x in s                 Returns True if element x is in s
    x not in s             Returns True if element x is not in s
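The operators in Table 5 can be tried on any sequence type. The sketch below uses two small lists made up for the example; every operation returns a new object and leaves s and r untouched.

```python
s = [1, 2, 3]
r = [4, 5]

joined = s + r            # new list; s and r are unchanged
repeated = s * 2
a, b, c = s               # unpacking needs exactly len(s) names
every_other = joined[0:5:2]
print(joined, repeated, (a, b, c), every_other, 3 in s, 9 not in s)
```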
0.2.7 Bonus
Circular Array The corresponding problems include:
0.2.8 Exercises
1. 985. Sum of Even Numbers After Queries (easy)
A linked list consists of nodes. For a singly linked list, each node has at
least two variables: val, which saves the data, and next, a pointer to the
successive node. The Node class is given as:
    class Node(object):
        def __init__(self, val=None):
            self.val = val
            self.next = None
In a singly linked list, we usually start with a head node which points to
the first node in the list; with this single node we are able to trace all
other nodes. For simplicity, we demonstrate the process without using a
class, but we provide a class implementation named SinglyLinkeList in our
online Python source code. Now, let us create an empty node named head.
    head = None
0.3. LINKED LIST xix
The first case is simply bad: we would generate a new node, and we cannot
track the head through an in-place operation. However, with the dummy node,
only the second case will appear. The code is:
    def append(head, val):
        node = Node(val)
        cur = head
        while cur.next:
            cur = cur.next
        cur.next = node
        return
Now, let us create the exact same linked list as in Fig. 1:
    for val in ['A', 'B', 'C', 'D']:
        append(head, val)

    for node in iter(head):
        print(node.val, end=' ')
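Putting the pieces together, the following is a self-contained sketch of the dummy-node approach. It assumes head is a dummy Node that holds no data, and it assumes a generator helper named gen (the name used by the later listings); its body here is a guess at a reasonable definition, not the author's code.

```python
class Node:
    def __init__(self, val=None):
        self.val = val
        self.next = None

def append(head, val):
    """Walk to the last node and hook a new node after it."""
    cur = head
    while cur.next:
        cur = cur.next
    cur.next = Node(val)

def gen(head):
    """Yield the real nodes, skipping the dummy head (assumed helper)."""
    cur = head.next
    while cur:
        yield cur
        cur = cur.next

head = Node()                      # dummy node; holds no data
for val in ['A', 'B', 'C', 'D']:
    append(head, val)
values = [node.val for node in gen(head)]
print(values)
```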
Search Operation For search, we find a node by value and return it;
otherwise, we return None.
    def search(head, val):
        for node in gen(head):
            if node.val == val:
                return node
        return None
Delete Operation For deletion, there are two scenarios: deleting a node
by value when we are given the head node and deleting a given node such
as the node we got from searching ’B’.
The first case requires us to first locate the node and then rewire the
pointers between the predecessor and successor of the node being deleted.
Again, if we do not have a dummy node, we would have two cases: if the node
is the head node, we repoint the head to the next node; otherwise, we
connect the previous node to the deleted node's next node, and the head
pointer remains untouched. With a dummy node, we only have the second
situation. In the process, we use an additional variable prev to track the
predecessor.
    def delete(head, val):
        cur = head.next  # head is the dummy node; start from the first real node
        prev = head
        while cur:
            if cur.val == val:
                # rewire
                prev.next = cur.next
                return
            prev = cur
            cur = cur.next
Now the output will indicate we only have two nodes left:
    C D
The second case might seem impossible, since we do not know the previous
node. The trick is to copy the value of the next node into the current node,
and then delete the next node instead by pointing the current node to the
node after the next node. That works only when the node being deleted is not
the last node. When it is, we have no way to completely delete it, but we
can make it "invalid" by setting its val and next to None.
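A minimal sketch of this trick follows. The function name deleteByNode matches the call used in the text, but the body is written from the description above rather than taken from the author's source, and the small list is built by hand for the example.

```python
class Node:
    def __init__(self, val=None):
        self.val = val
        self.next = None

def deleteByNode(node):
    """Remove `node` from its list without access to its predecessor."""
    if node.next:
        node.val = node.next.val       # take over the successor's value
        node.next = node.next.next     # and bypass the successor
    else:
        node.val = None                # last node: can only be invalidated
        node.next = None

# Build dummy -> A -> B -> C by hand, then delete the node holding 'B'.
head, a, b, c = Node(), Node('A'), Node('B'), Node('C')
head.next, a.next, b.next = a, b, c
deleteByNode(b)

remaining = []
cur = head.next
while cur:
    remaining.append(cur.val)
    cur = cur.next
print(remaining)
```

Note that after the call the object b now carries the value 'C', and the original node c is the one that was unlinked; any outside reference to c is therefore stale.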
Now, let us try deleting the node ’B’ via our previously found node.
    deleteByNode(node)
    for n in gen(head):
        print(n.val, end=' ')
Clear When we need to clear all the nodes of the linked list, we just set
the next pointer of the dummy head to None. In the class implementation,
which keeps head and size attributes, this becomes:
    def clear(self):
        self.head = None
        self.size = 0
Question: Some linked lists only allow inserting a node at the tail, which
is append, while others allow insertion at any location. To get the length
of the linked list in O(1), we need a variable to track the size.
Building on the singly linked list, a doubly linked list (dll) contains an
extra pointer in the node structure, typically called prev (short for
previous), which points back to the node's predecessor in the list. We
define the Node class as:
    class Node:
        def __init__(self, val, prev=None, next=None):
            self.val = val
            self.prev = prev  # reference to previous node in DLL
            self.next = next  # reference to next node in DLL

    def search(head, val):
        for node in gen(head):
            if node.val == val:
                return node
        return None
Comparison We can see there is a slight advantage of dll over sll, but it
comes at the cost of handling the extra prev pointer. This is only an
advantage when bidirectional searching is the dominant factor for
efficiency; otherwise, it is better to stick with sll.
Tips In our implementation, in some cases we still need to check whether a
node is the last node or not. The coding logic can be simplified further if
we also put a dummy node at the end of the linked list.
0.3.3 Bonus
Circular Linked List A circular linked list is a variation of linked list in
which the last node connects back to the first node. To make a circular
linked list from a normal linked list: in a singly linked list, we simply
set the last node's next pointer to the first node; in a doubly linked list,
besides setting the last node's next pointer, we also set the prev pointer
of the first node to the last node, making the list circular in both
directions.
Compared with a normal linked list, a circular linked list saves time when
we go to the first node from the last (both sll and dll) or to the last node
from the first node (in dll), because the extra connection does it in a
single step. Because it is a circle, whenever a search with a while loop is
needed, we need to take care of the end condition: we make sure we have
searched exactly one whole cycle by comparing the iterating node to the
starting node.
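The end condition can be sketched as follows for a circular singly linked list. The helper names make_circular and contains are hypothetical, introduced only for this illustration.

```python
class Node:
    def __init__(self, val=None):
        self.val = val
        self.next = None

def make_circular(values):
    """Build a circular singly linked list and return its first node."""
    first = last = Node(values[0])
    for v in values[1:]:
        last.next = Node(v)
        last = last.next
    last.next = first                  # close the circle
    return first

def contains(start, target):
    """Search one full cycle, stopping when we are back at `start`."""
    cur = start
    while True:
        if cur.val == target:
            return True
        cur = cur.next
        if cur is start:               # end condition: whole cycle visited
            return False

ring = make_circular([1, 2, 3])
print(contains(ring, 3), contains(ring, 7))
```

Comparing nodes with `is` rather than by value matters here, since two different nodes may hold equal values.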
    Input: 1->1->2
    Output: 1->2
    Example 2:
    Input: 1->1->2->3->3
    Output: 1->2->3
Analysis
This is a linear-complexity problem. The most straightforward way is to
iterate through the linked list and compare the current node's value with
the next node's value: (1) if they are equal, delete one of the nodes, here
the next node; (2) if not, we can safely move to the next node.
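The iteration described in the analysis can be sketched like this. It assumes the list is sorted and that head is the first real node (no dummy node); iterative and to_list are names chosen for the example.

```python
class Node:
    def __init__(self, val=None, next=None):
        self.val = val
        self.next = next

def iterative(head):
    """Remove duplicates from a sorted list; `head` is the first real node."""
    cur = head
    while cur and cur.next:
        if cur.val == cur.next.val:
            cur.next = cur.next.next   # (1) equal: unlink the next node
        else:
            cur = cur.next             # (2) different: advance
    return head

def to_list(head):
    """Collect the values of a linked list into a Python list."""
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out

lst = Node(1, Node(1, Node(2, Node(3, Node(3)))))
print(to_list(iterative(lst)))
```

In case (1) the current node stays where it is, so a run of three or more equal values is reduced one node at a time.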
Recursion If we use recursion and return the node at each step, we can
compare the current node with the returned node (the one located right
behind it), and the same logic applies. It helps to draw out an example,
say 1->1->1. The last 1 returns itself. At the second-to-last 1 we compare
the two; because they are equal, we delete the last 1. We then backtrack to
the first 1 with the second 1 as the returned node and compare again. The
code is the simplest among all solutions.
    def recursive(node):
        # an empty list or a single node has no duplicates
        if node is None or node.next is None:
            return node

        nxt = recursive(node.next)  # deduplicated rest of the list
        if nxt.val == node.val:
            node.next = node.next.next
        return node
0.3.5 Exercises
Basic operations:
1. 237. Delete Node in a Linked List (easy, delete only given current
node)
6. Sort List
7. Reorder List
Fast-slow pointers:
Stack The implementation of a stack with a list is simply adding and
deleting elements at the end.
    # stack
    s = []
    s.append(3)
    s.append(4)
    s.append(5)
    s.pop()
Queue For a queue, we can append at the end and always pop from the first
index, or insert at the first index and pop the last element. Either way,
the operation at the first index takes O(n) on a list, because the
remaining items have to be shifted.
    # queue
    # 1: use append and pop
    q = []
    q.append(3)
    q.append(4)
    q.append(5)
    q.pop(0)
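When the O(n) cost of q.pop(0) matters, collections.deque offers the same queue behavior with O(1) operations at both ends. A brief sketch of the equivalent code:

```python
from collections import deque

q = deque()
for item in (3, 4, 5):
    q.append(item)        # enqueue at the right end

first = q.popleft()       # dequeue from the left end in O(1)
print(first, list(q))
```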
Stack and Singly Linked List with top pointer Because a stack only adds or
deletes items at the rear, one pointer pointing at the rear item (the top)
is enough. Each node's next connects to the item below it, so the links run
from the top to the bottom.
    # stack with linked list
    '''a<-b<-c<-top'''
    class Stack:
        def __init__(self):
            self.top = None
            self.size = 0

        # push
        def push(self, val):
            node = Node(val)
            if self.top:  # connect top and node
                node.next = self.top
            # reset the top pointer
            self.top = node
            self.size += 1

        def pop(self):
            if self.top:
                val = self.top.val
                if self.top.next:
                    self.top = self.top.next  # reset top
                else:
                    self.top = None
                self.size -= 1
                return val

            else:  # no element to pop
                return None
Queue and Singly Linked List with Two Pointers For queue, we need
to access the item from each side, therefore we use two pointers pointing at
the head and the tail of the singly linked list. And the linking direction is
from the head to the tail.
    # queue with linked list
    '''head->a->b->tail'''
    class Queue:
        def __init__(self):
            self.head = None
            self.tail = None
            self.size = 0

        # push
        def enqueue(self, val):
            node = Node(val)
            if self.head and self.tail:  # connect tail and node
                self.tail.next = node
                self.tail = node
            else:
                self.head = self.tail = node

            self.size += 1

        def dequeue(self):
            if self.head:
                val = self.head.val
                if self.head.next:
                    self.head = self.head.next  # reset head
                else:
                    self.head = None
                    self.tail = None
                self.size -= 1
                return val

            else:  # no element to dequeue
                return None
Also, Python provides built-in support for this purpose: the deque class in
the collections module and the queue module. We will detail them in the
next section.
    # python 3
    import queue
    # implementing queue
    q = queue.Queue()
    for i in range(3, 6):
        q.put(i)

    import queue
    # implementing stack
    s = queue.LifoQueue()

    for i in range(3, 6):
        s.put(i)
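To see the difference between the two classes, we can take the items back out with get(). This is a small sketch that repeats the setup above; fifo_order and lifo_order are names chosen for the example.

```python
import queue

q = queue.Queue()
s = queue.LifoQueue()
for i in range(3, 6):
    q.put(i)
    s.put(i)

fifo_order = [q.get() for _ in range(3)]   # first in, first out
lifo_order = [s.get() for _ in range(3)]   # last in, first out
print(fifo_order, lifo_order, q.empty())
```

Keep in mind that get() blocks by default when the queue is empty, since these classes are designed for use between threads.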
0.4.4 Bonus
Circular Linked List and Circular Queue The circular queue is a linear data
structure in which the operations are performed on the FIFO principle and
the last position is connected back to the first position to make a circle.
It is also called a "Ring Buffer". A circular queue can be implemented with
either a list or a circular linked list. If we use a list, we initialize our
queue with a fixed size and None as each value. To find the position for
enqueue(), we use rear = (rear + 1) % size. Similarly, for dequeue(), we use
front = (front + 1) % size to find the next front position.
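A list-based sketch of these two index updates is given below. The class name CircularQueue and the count field (used to tell a full queue from an empty one) are assumptions made for this illustration, not part of the text.

```python
class CircularQueue:
    """Fixed-size ring buffer; `count` (an added field) tells full from empty."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.front = 0          # index of the oldest item
        self.rear = -1          # index of the newest item
        self.count = 0

    def enqueue(self, val):
        if self.count == self.size:
            return False                        # queue is full
        self.rear = (self.rear + 1) % self.size
        self.slots[self.rear] = val
        self.count += 1
        return True

    def dequeue(self):
        if self.count == 0:
            return None                         # queue is empty
        val = self.slots[self.front]
        self.slots[self.front] = None
        self.front = (self.front + 1) % self.size
        self.count -= 1
        return val

cq = CircularQueue(3)
results = [cq.enqueue(v) for v in (1, 2, 3, 4)]   # the fourth does not fit
oldest = cq.dequeue()
cq.enqueue(4)                                     # reuses the freed slot 0
print(results, oldest, cq.slots)
```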
0.4.5 Exercises
Queue and Stack
Analysis: This is a typical buffer problem. If the size grows larger than
the buffer, we squeeze out the earliest data. Thus, a queue can be used to
save each t, and each time we squeeze out any time not in the range
[t-3000, t]:
    import collections

    class RecentCounter:

        def __init__(self):
            self.ans = collections.deque()

        def ping(self, t):
            """
            :type t: int
            :rtype: int
            """
            self.ans.append(t)
            while self.ans[0] < t - 3000:
                self.ans.popleft()
            return len(self.ans)
Monotone Queue
Obvious applications:
Hash Set Design a HashSet without using any built-in hash table libraries.
To be specific, your design should include these functions (705. Design
HashSet):
    add(value): Insert a value into the HashSet.
    contains(value): Return whether the value exists in the HashSet or not.
    remove(value): Remove a value in the HashSet. If the value does not exist in the HashSet, do nothing.
For example:
    MyHashSet hashSet = new MyHashSet();
    hashSet.add(1);
    hashSet.add(2);
    hashSet.contains(1);    // returns true
    hashSet.contains(3);    // returns false (not found)
    hashSet.add(2);
    hashSet.contains(2);    // returns true
    hashSet.remove(2);
    hashSet.contains(2);    // returns false (already removed)
Note: (1) All values will be in the range of [0, 1000000]. (2) The number
of operations will be in the range of [1, 10000].
    class MyHashSet:

        def _h(self, k, i):
            # linear probing: position of the i-th probe for key k
            return (k + i) % 10001

        def __init__(self):
            """
            Initialize your data structure here.
            """
            self.slots = [None] * 10001
            self.size = 10001

        def add(self, key: 'int') -> 'None':
            i = 0
            while i < self.size:
                k = self._h(key, i)
                if self.slots[k] == key:
                    return
                # compare with None explicitly: a stored key 0 is falsy
                elif self.slots[k] is None or self.slots[k] == -1:
                    self.slots[k] = key
                    return
                i += 1
            # double size
            self.slots = self.slots + [None] * self.size
            self.size *= 2
            return self.add(key)

        def remove(self, key: 'int') -> 'None':
            i = 0
            while i < self.size:
                k = self._h(key, i)
                if self.slots[k] == key:
                    self.slots[k] = -1  # mark the slot as deleted
                    return
                elif self.slots[k] is None:
                    return
                i += 1
            return

        def contains(self, key: 'int') -> 'bool':
            """
            Returns true if this set contains the specified element
            """
            i = 0
            while i < self.size:
                k = self._h(key, i)
                if self.slots[k] == key:
                    return True
                elif self.slots[k] is None:
                    return False
                i += 1
            return False
Hash Map Design a HashMap without using any built-in hash table libraries.
To be specific, your design should include these functions (706. Design
HashMap (easy)):
• put(key, value): Insert a (key, value) pair into the HashMap. If the key
already exists in the HashMap, update the value.
Example:
    hashMap = MyHashMap()
    hashMap.put(1, 1);
    hashMap.put(2, 2);
    hashMap.get(1);       // returns 1
    hashMap.get(3);       // returns -1 (not found)
    hashMap.put(2, 1);    // update the existing value
    hashMap.get(2);       // returns 1
    hashMap.remove(2);    // remove the mapping for 2
    hashMap.get(2);       // returns -1 (not found)
    class MyHashMap:
        def _h(self, k, i):
            return (k + i) % 10001  # in [0, 10000]

        def __init__(self):
            """
            Initialize your data structure here.
            """
            self.size = 10002
            self.slots = [None] * self.size

        def put(self, key: 'int', value: 'int') -> 'None':
            """
            value will always be non-negative.
            """
            i = 0
            while i < self.size:
                k = self._h(key, i)
                # a slot is free if it is empty or marked deleted (key -1)
                if self.slots[k] is None or self.slots[k][0] in [key, -1]:
                    self.slots[k] = (key, value)
                    return
                i += 1
            # double size and try again
            self.slots = self.slots + [None] * self.size
            self.size *= 2
            return self.put(key, value)

        def get(self, key: 'int') -> 'int':
            """
            Returns the value to which the specified key is mapped, or -1 if this map contains no mapping for the key
            """
            i = 0
            while i < self.size:
                k = self._h(key, i)
                if self.slots[k] is None:
                    return -1
                elif self.slots[k][0] == key:
                    return self.slots[k][1]
                else:  # if it is deleted, keep probing
                    i += 1
            return -1

        def remove(self, key: 'int') -> 'None':
            """
            Removes the mapping of the specified key if this map contains a mapping for the key
            """
            i = 0
            while i < self.size:
                k = self._h(key, i)
                if self.slots[k] is None:
                    return
                elif self.slots[k][0] == key:
                    self.slots[k] = (-1, None)  # mark the slot as deleted
                    return
                else:  # if it is deleted, keep probing
                    i += 1
            return
Python 2.X VS Python 3.X In Python 2.X, we can index or slice the result of
keys() or items() of a dictionary directly. In Python 3.X, however, the same
syntax gives us TypeError: 'dict_keys' object does not support indexing.
Instead, we need to use the function list() to convert it to a list and
then index or slice it. For example:
    # Python 2.x
    dict.keys()[0]

    # Python 3.x
    list(dict.keys())[0]
set Data Type

    Method                           Description
    remove()                         Removes an element from the set
    add()                            Adds an element to the set
    copy()                           Returns a shallow copy of the set
    clear()                          Removes all elements from the set
    difference()                     Returns the difference of two sets
    difference_update()              Removes from the calling set the elements found in another set
    discard()                        Removes an element from the set if it is present
    intersection()                   Returns the intersection of two or more sets
    intersection_update()            Updates the calling set with the intersection of sets
    isdisjoint()                     Checks whether two sets are disjoint
    issubset()                       Checks if a set is a subset of another set
    issuperset()                     Checks if a set is a superset of another set
    pop()                            Removes and returns an arbitrary element
    symmetric_difference()           Returns the symmetric difference
    symmetric_difference_update()    Updates the set with the symmetric difference
    union()                          Returns the union of sets
    update()                         Adds elements to the set
If we want to put a string in a set, it should be like this:
>>> a = set('aardvark')
>>> a
{'d', 'v', 'a', 'r', 'k'}
>>> b = {'aardvark'}  # or set(['aardvark']), convert a list of strings to set
>>> b
{'aardvark'}
# or put a tuple in the set, where t is a tuple
>>> c = {t}  # or set([t])
Compare also the difference between {} and set() with a single word argument:
set('aardvark') iterates over the string and stores its characters, while
{'aardvark'} stores the whole string as one element.
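The difference between the two ways of building a set can be checked with a short sketch. The variable names here are only for illustration and are not from the text above:

```python
# set() iterates over its argument, so a string is split into characters
letters = set('aardvark')

# braces store each listed object as one element
word = {'aardvark'}

# tuples are hashable, so they can be set elements; duplicates collapse
pairs = {(1, 2), (1, 2), (3, 4)}

# empty braces create a dict, not a set; use set() for an empty set
empty_dict = {}
empty_set = set()

print(sorted(letters), word, len(pairs), type(empty_dict), type(empty_set))
```

Note that the printed order of a set's elements is not guaranteed, which is why the sketch sorts the letters before printing.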
dict Data Type

Method          Description
clear()         Removes all the elements from the dictionary
copy()          Returns a shallow copy of the dictionary
fromkeys()      Returns a dictionary with the specified keys and a shared value
get()           Returns the value of the specified key, or a default if the key is absent
items()         Returns a view containing a tuple for each key-value pair
keys()          Returns a view containing the dictionary's keys
pop()           Removes the element with the specified key and returns its value
popitem()       Removes and returns the last inserted key-value pair
setdefault()    Returns the value of the specified key. If the key does not exist: inserts the key with the specified value
update()        Updates the dictionary with the specified key-value pairs
values()        Returns a view of all the values in the dictionary
See use cases at https://www.programiz.com/python-programming/dictionary.
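As a quick illustration of several of the dictionary methods listed above, here is a minimal sketch. The keys and values are made up for the demonstration:

```python
d = dict.fromkeys(['a', 'b'], 0)   # {'a': 0, 'b': 0}
d['a'] += 1                        # {'a': 1, 'b': 0}

missing = d.get('z', -1)           # no KeyError, returns the default
d.setdefault('c', 10)              # inserts 'c' because it is absent
d.update({'b': 5})                 # overwrites the value of 'b'

first_key = list(d.keys())[0]      # views must be converted before indexing
popped = d.pop('a')                # removes 'a' and returns its value
last = d.popitem()                 # removes the last inserted pair

print(missing, first_key, popped, last, d)
```

The call to list() around d.keys() is needed in Python 3 because keys() returns a view rather than a list.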
Collections Module
OrderedDict Before Python 3.7, standard dictionaries were unordered, which
means that any time you looped through a dictionary, you would go through
every key, but you were not guaranteed to get them in any particular order.
Since Python 3.7, the built-in dict also preserves insertion order. The
OrderedDict from the collections module is a special type of dictionary that
keeps track of the order in which its keys were inserted and adds order-aware
operations such as move_to_end(). Iterating the keys of an OrderedDict has
predictable behavior. This can simplify testing and debugging by making all
the code deterministic.
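A minimal sketch of the order-aware behavior, using only OrderedDict methods from the standard library (the keys are arbitrary):

```python
from collections import OrderedDict

od = OrderedDict()
od['b'] = 2
od['a'] = 1
od['c'] = 3

# iteration follows insertion order, not key order
keys_in_order = list(od)

# move an existing key to the end, then pop from the front (FIFO)
od.move_to_end('b')
oldest = od.popitem(last=False)

print(keys_in_order, oldest, list(od))
```

The combination of move_to_end() and popitem(last=False) is what makes OrderedDict convenient for structures such as an LRU cache.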
dict[key] += 1
The defaultdict class from the collections module simplifies this process by
pre-assigning a default value when a key is not present. Different value
types have different default values; for example, for int the default value
is 0. A defaultdict works exactly like a normal dict, but it is initialized
with a function ("default factory") that takes no arguments and provides
the default value for a nonexistent key. Therefore, looking up a missing key
with d[key] on a defaultdict does not raise a KeyError; instead, the key is
inserted with the value returned by the default factory. For example, the
following code uses a lambda function to provide 'Vanilla' as the default
value when a key is not assigned, and the second code snippet functions as a
counter.
from collections import defaultdict
ice_cream = defaultdict(lambda: 'Vanilla')
ice_cream['Sarah'] = 'Chunky Monkey'
ice_cream['Abdul'] = 'Butter Pecan'
print(ice_cream['Sarah'])
# Chunky Monkey
print(ice_cream['Joe'])
# Vanilla

from collections import defaultdict
dict = defaultdict(int)  # default value for int is 0
dict['counter'] += 1
Counter
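Counter is a dict subclass in the collections module for counting hashable items. The sketch below counts the letters of a word first with defaultdict(int) and then with Counter, to show that they produce the same counts. The word and variable names are chosen only for illustration:

```python
from collections import Counter, defaultdict

word = 'aardvark'

# manual counting with a defaultdict
counts = defaultdict(int)
for ch in word:
    counts[ch] += 1

# the same counting done by Counter in one call
same = Counter(word)

# most_common(k) returns the k items with the highest counts
top = same.most_common(1)

print(dict(counts), top)
```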
0.5.3 Exercises
1. 349. Intersection of Two Arrays (easy)
Answer: Use a hash set of tuples to save each normalized sender email
address as (local name, domain name):
class Solution:
    def numUniqueEmails(self, emails):
        """
        :type emails: List[str]
        :rtype: int
        """
        if not emails:
            return 0
        num = 0
        handledEmails = set()
        for email in emails:
            local_name, domain_name = email.split('@')
            local_name = local_name.split('+')[0]
            local_name = local_name.replace('.', '')
            handledEmails.add((local_name, domain_name))
        return len(handledEmails)
presentation and implementation of the graph, but rather defer the searching
strategies to the principle part. Searching strategies in graphs are a
starting point in algorithmic problem solving; knowing and analyzing these
strategies in detail will make an independent chapter as a problem-solving
principle.
0.6.1 Introduction
Graph representations need to carry the full information of the graph itself,
G = (V, E), including its vertices, its edges, and its weights, and to
distinguish whether it is directed or undirected, weighted or unweighted.
There are generally four ways: (1) Adjacency Matrix, (2) Adjacency List,
(3) Edge List, and (4) optionally, Tree Structure, if the graph is a free
tree. Each is preferred in different situations. An example is shown in Fig. 3.
Adjacency Matrix
An adjacency matrix of a graph is a 2-D matrix of size |V| × |V|: each
dimension, row and column, is vertex-indexed. Assume our matrix is am; if
there is an edge between vertices 3 and 4 and the graph is unweighted, we
mark it by setting am[3][4] = 1. We do the same for all edges and leave all
other spots in the matrix zero-valued. For an undirected graph, the matrix is
symmetric along the main diagonal, as shown in A of Fig. 3; the matrix is its
own transpose: am = am^T. We can choose to store only the entries on and above
the diagonal of the matrix, thereby cutting the memory need roughly in half.
For an unweighted graph, the adjacency matrix is typically zero-and-one
valued. For a weighted graph, the adjacency matrix becomes a weight matrix,
with w(i, j) denoting the weight of edge (i, j); the weight can be negative,
positive, or even zero in practice, so we need a way to distinguish the
non-edge relation from the edge relation when the situation arises.
The Python code that implements the adjacency matrix for the graph
in the example is:
am = [[0] * 7 for _ in range(7)]
# set 8 edges
am[0][1] = am[1][0] = 1
am[0][2] = am[2][0] = 1
am[1][2] = am[2][1] = 1
am[1][3] = am[3][1] = 1
am[2][4] = am[4][2] = 1
am[3][4] = am[4][3] = 1
am[4][5] = am[5][4] = 1
am[5][6] = am[6][5] = 1
Applications An adjacency matrix usually fits dense graphs well, where the
number of edges is close to |V|^2, leaving only a small portion of the matrix
blank and unused. Checking whether an edge exists between two vertices takes
only O(1). However, an adjacency matrix requires Θ(|V|) time to enumerate
the neighbors of a vertex v (an operation commonly used in many graph
algorithms) even if vertex v has only a few neighbors. Moreover, when the
graph is sparse, an adjacency matrix is inefficient in both space and
iteration cost; a better option is the adjacency list.
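The two costs discussed above can be seen in a small sketch. The helper names has_edge and neighbors are hypothetical, introduced here only for illustration; the edges are the eight edges of the example graph:

```python
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (5, 6)]

# build the 7 x 7 adjacency matrix of the undirected example graph
am = [[0] * 7 for _ in range(7)]
for u, v in edges:
    am[u][v] = am[v][u] = 1

def has_edge(am, u, v):
    # a single lookup: O(1)
    return am[u][v] != 0

def neighbors(am, v):
    # must scan the whole row, even when v has few neighbors
    return [u for u, w in enumerate(am[v]) if w != 0]

print(has_edge(am, 3, 4), has_edge(am, 0, 6), neighbors(am, 4))
```

The neighbors() scan always visits all 7 entries of the row, which is the per-vertex cost that an adjacency list avoids.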
Adjacency List
An adjacency list is a more compact and space-efficient form of graph
representation compared with the above adjacency matrix. In an adjacency
list, we have a vertex-indexed list of |V| vertices, and for each vertex v we
store another list of its neighboring vertices, which can be represented with
an array or a linked list. For example, with the adjacency list
[[1, 2, 3], [3, 1], [4, 6, 1]], node 0 connects to 1, 2, 3, node 1 connects
to 3, 1, and node 2 connects to 4, 6, 1.
In Python, we can use a normal 2-D array (a list of lists) to represent the
adjacency list. For the same graph in the example, it is represented with the
following code:
al = [[] for _ in range(7)]
# set 8 edges (undirected, so each edge appears in the lists of both endpoints)
al[0] = [1, 2]
al[1] = [0, 2, 3]
al[2] = [0, 1, 4]
al[3] = [1, 4]
al[4] = [2, 3, 5]
al[5] = [4, 6]
al[6] = [5]
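Instead of typing each neighbor list by hand, the adjacency list can be built from the edges. This is a minimal sketch; build_adjacency_list is a hypothetical helper name, and the graph is assumed to be undirected:

```python
def build_adjacency_list(n, edges):
    """Build a vertex-indexed adjacency list for an undirected graph."""
    al = [[] for _ in range(n)]
    for u, v in edges:
        al[u].append(v)  # record v as a neighbor of u
        al[v].append(u)  # and u as a neighbor of v
    return al

edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
al = build_adjacency_list(7, edges)
print(al)
```

Enumerating the neighbors of a vertex v is now just iterating al[v], which costs time proportional to the degree of v rather than |V|.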
Edge List
The edge list is a one-dimensional list of edges, where the index in the list
does not relate to any vertex, and each edge is usually in the form of
(starting vertex, ending vertex, weight). We can use either a list or a tuple
to represent an edge. The edge list representation of the example (unweighted,
so the weight is omitted, with each undirected edge stored in both directions)
is given:
el = []
el.extend([[0, 1], [1, 0]])
el.extend([[0, 2], [2, 0]])
el.extend([[1, 2], [2, 1]])
el.extend([[1, 3], [3, 1]])
el.extend([[3, 4], [4, 3]])
el.extend([[2, 4], [4, 2]])
el.extend([[4, 5], [5, 4]])
el.extend([[5, 6], [6, 5]])
Applications The edge list is not as widely used as the AM and AL, and is
usually only needed in a subroutine of an algorithm implementation, such as
in Kruskal's algorithm for finding a Minimum Spanning Tree (MST), where we
might need to order the edges by their weights.
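Ordering edges by weight is straightforward with an edge list. In this sketch the weights are made up purely for illustration; the example graph in the text is unweighted:

```python
# (starting vertex, ending vertex, weight); the weights are hypothetical
weighted_el = [(0, 1, 4), (0, 2, 1), (1, 2, 2), (1, 3, 7), (2, 4, 3)]

# sort by the third field, the weight, as a preprocessing step
by_weight = sorted(weighted_el, key=lambda e: e[2])
print(by_weight)
```

Only the sorting step is shown here; the rest of Kruskal's algorithm (cycle detection while picking edges) is not covered by this sketch.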
Tree Structure
If a connected graph has no cycle and the number of edges satisfies
|E| = |V| - 1, it is essentially a tree. We can choose to represent it with
any one of the three representations above. Optionally, we can use a tree
structure formed as a rooted tree of nodes, where each node has a value and
pointers to its children. We will see later how this type of tree is
implemented in Python.
Weighted Graph If we need weights for each edge, we can use a two-dimensional
dictionary. We use 10 as the weight of all edges just to demonstrate.
from collections import defaultdict

dw = defaultdict(dict)
for v1, v2 in el:
    vn1 = chr(v1 + ord('a'))
    vn2 = chr(v2 + ord('a'))
    dw[vn1][vn2] = 10
print(dw)
We can access the edge and its weight through dw[v1][v2]. The output of
this structure is given:
defaultdict(<class 'dict'>, {'a': {'b': 10, 'c': 10}, 'b': {'a': 10,
'c': 10, 'd': 10}, 'c': {'a': 10, 'b': 10, 'e': 10}, 'd': {'b': 10,
'e': 10}, 'e': {'d': 10, 'c': 10, 'f': 10}, 'f': {'e': 10, 'g': 10},
'g': {'f': 10}})
0.7. TREE DATA STRUCTURES xlv
Binary Tree Node In a binary tree, each node has at most two children
pointers, which we define as left and right. The binary tree node is
defined as:
class BinaryNode:
    def __init__(self, val):
        self.left = None
        self.right = None
        self.val = val
N-ary Tree Node For an N-ary node, we initialize the length of the node's
children list with an additional argument n.
class NaryNode:
    def __init__(self, n, val):
        self.children = [None] * n
        self.val = val
Construct A Tree Now that we have defined the tree node, the process
of constructing the tree in the figure will be a series of operations:
        1
       / \
      2   3
     / \   \
    4   5   6
root = BinaryNode(1)
left = BinaryNode(2)
right = BinaryNode(3)
root.left = left
root.right = right
left.left = BinaryNode(4)
left.right = BinaryNode(5)
right.right = BinaryNode(6)
Now, we call this function and pass it our input array:
nums = [1, 2, 3, 4, 5, None, 6]
root = constructTree(nums, 0)
In the next section, we discuss tree traversal methods, and we will use those
methods to print out the tree we just built.
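One possible way to build a tree recursively from such an array is sketched below. It assumes the array stores the tree level by level with None marking a missing node, so that the children of index i sit at indices 2i + 1 and 2i + 2. The function name construct_tree is hypothetical and this is not necessarily the same implementation as the constructTree called above:

```python
class BinaryNode:
    def __init__(self, val):
        self.left = None
        self.right = None
        self.val = val

def construct_tree(nums, idx):
    # Assumption: children of index i are stored at 2*i + 1 and 2*i + 2
    if idx >= len(nums) or nums[idx] is None:
        return None
    node = BinaryNode(nums[idx])
    node.left = construct_tree(nums, 2 * idx + 1)
    node.right = construct_tree(nums, 2 * idx + 2)
    return node

nums = [1, 2, 3, 4, 5, None, 6]
root = construct_tree(nums, 0)
print(root.val, root.left.val, root.right.val, root.right.right.val)
```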
Example 1:
Input: root = [3,5,1,6,2,0,8,null,null,7,4], p = 5, q = 1
Output: 3
Explanation: The LCA of nodes 5 and 1 is 3.

Example 2:
Input: root = [3,5,1,6,2,0,8,null,null,7,4], p = 5, q = 4
Output: 5
Explanation: The LCA of nodes 5 and 4 is 5, since a node
             can be a descendant of itself
             according to the LCA definition.
Solution: Divide and Conquer. There are two cases for LCA: 1) the two
nodes are found in different subtrees, as in example 1; 2) the two nodes
are in the same subtree, as in example 2. During the tree traversal, we
compare the current node with p and q; if it equals either of them, we
return the current node. Therefore, in example 1, at node 3, the left
subtree returns node 5 and the right subtree returns node 1, thus node 3
is the LCA. In example 2, the traversal stops at node 5 and returns it;
back at node 3, the right subtree returns None, so node 5 is the only
valid return and becomes the final LCA. The time complexity is O(n).
def lowestCommonAncestor(self, root, p, q):
    """
    :type root: TreeNode
    :type p: TreeNode
    :type q: TreeNode
    :rtype: TreeNode
    """
    if not root:
        return None
    if root == p or root == q:
        return root  # found one valid node (case 1: stop at 5, 1; case 2: stop at 5)
    left = self.lowestCommonAncestor(root.left, p, q)
    right = self.lowestCommonAncestor(root.right, p, q)
    if left is not None and right is not None:  # p, q in different subtrees
        return root
    # at most one side found a node: return it, or None if neither did
    return left if left is not None else right
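To try the solution on the two examples, the following self-contained sketch rewrites it as a plain function and builds the example tree by hand. The TreeNode class and the function name lowest_common_ancestor are assumptions made for this demonstration:

```python
class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def lowest_common_ancestor(root, p, q):
    # stop at an empty subtree or at one of the two target nodes
    if root is None or root is p or root is q:
        return root
    left = lowest_common_ancestor(root.left, p, q)
    right = lowest_common_ancestor(root.right, p, q)
    if left is not None and right is not None:
        return root  # p and q were found in different subtrees
    return left if left is not None else right

# build the tree [3,5,1,6,2,0,8,null,null,7,4] by hand
n = {v: TreeNode(v) for v in [3, 5, 1, 6, 2, 0, 8, 7, 4]}
n[3].left, n[3].right = n[5], n[1]
n[5].left, n[5].right = n[6], n[2]
n[1].left, n[1].right = n[0], n[8]
n[2].left, n[2].right = n[7], n[4]

lca_1 = lowest_common_ancestor(n[3], n[5], n[1])
lca_2 = lowest_common_ancestor(n[3], n[5], n[4])
print(lca_1.val, lca_2.val)
```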
0.8 Heap
count = Counter(nums)
Heap is a tree-based data structure that satisfies the heap ordering
property. The ordering can be one of two types:
• the min-heap property: the value of each node is greater than or equal
to (≥) the value of its parent, with the minimum-value element at the
root.
• the max-heap property: the value of each node is less than or equal to
(≤) the value of its parent, with the maximum-value element at the
root.
Figure 4: A max-heap can be visualized with a binary tree structure (left)
and implemented with an array (right).
• and its parent is located at index k/2 (in Python 3, use integer division
k//2).
Figure 5: A Min-heap.
• if the parent is smaller than the added item, no action is needed and the
traversal terminates, e.g. adding item 18 will lead to no action.
• otherwise, swap the item with the parent, and set the node to its
parent so that the traversal can continue.
At each step we fix the heap ordering property for a subtree. The time
complexity is proportional to the height of the complete tree, which is
O(log n). To generalize the process, a _float() function is first
implemented, which enforces the min-heap ordering property on the path from
a given index to the root.
def _float(idx, heap):
    while idx // 2:
        p = idx // 2
        # Violation
        if heap[idx] < heap[p]:
            heap[idx], heap[p] = heap[p], heap[idx]
        else:
            break
        idx = p
    return
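With _float() in place, insertion can be sketched as appending the item and floating it up. The sketch assumes the array is 1-indexed with an unused placeholder at index 0, which is what the parent index k//2 implies. The insert helper and the sample values are hypothetical:

```python
def _float(idx, heap):
    # move heap[idx] up while it is smaller than its parent
    while idx // 2:
        p = idx // 2
        if heap[idx] < heap[p]:
            heap[idx], heap[p] = heap[p], heap[idx]
        else:
            break
        idx = p

def insert(heap, item):
    heap.append(item)            # place the item at the last position
    _float(len(heap) - 1, heap)  # restore the ordering on the path to the root

heap = [None]  # index 0 is an unused placeholder
for x in [21, 1, 45, 78, 3, 5]:
    insert(heap, x)
print(heap)
```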
• if the item is any other node instead of the root, say node 7 in our
example, the process is exactly the same. We move 12 to node 7's position.
By comparing 12 with its children 10 and 15, we find that 10 and 12 need to
be swapped. With this, the heap ordering property is sustained.
We first use a function _sink to implement the percolation down part
of the operation.
Figure 6: Left: delete node 5, and move node 12 to root. Right: 6 is the
smallest among 12, 6, and 7, swap node 6 with node 12.
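A minimal sketch of what such a percolate-down function could look like is given below. It assumes the same 1-indexed array with a placeholder at index 0, so the children of index k are at 2k and 2k + 1. The bodies of _sink and pop_min here are illustrative assumptions, not necessarily the implementation this chapter uses, and the sample heap is not the one in Figure 6:

```python
def _sink(idx, heap):
    size = len(heap) - 1  # heap[0] is an unused placeholder
    while 2 * idx <= size:
        child = 2 * idx
        # pick the smaller of the two children
        if child + 1 <= size and heap[child + 1] < heap[child]:
            child += 1
        if heap[child] < heap[idx]:
            heap[idx], heap[child] = heap[child], heap[idx]
            idx = child
        else:
            break

def pop_min(heap):
    smallest = heap[1]
    last = heap.pop()     # remove the last item
    if len(heap) > 1:
        heap[1] = last    # move it to the root, then sink it
        _sink(1, heap)
    return smallest

heap = [None, 1, 3, 5, 78, 21, 45]
drained = [pop_min(heap) for _ in range(6)]
print(drained)
```

Popping repeatedly returns the items in ascending order, which is a simple sanity check that the heap ordering property is restored after every removal.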
Min-Heap Given the exemplary list h = [21, 1, 45, 78, 3, 5], we call the
function heapify() to convert it to a min-heap in place.
from heapq import heappush, heappop, heapify
h = [21, 1, 45, 78, 3, 5]
heapify(h)
The heapified result is h = [1, 3, 5, 78, 21, 45]. Let's try heappop and heappush:
heappop(h)
heappush(h, 15)
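To see what the two calls do, the following sketch repeats the example and keeps the popped value. Only functions from the standard heapq module are used; the variable name smallest is ours:

```python
from heapq import heapify, heappop, heappush

h = [21, 1, 45, 78, 3, 5]
heapify(h)

smallest = heappop(h)  # removes and returns the minimum item, 1
heappush(h, 15)        # inserts 15 and keeps the heap ordering property

print(smallest, h[0], sorted(h))
```

After both calls, h[0] is always the current minimum, while the rest of the list is only partially ordered.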
2 https://docs.python.org/3.0/library/heapq.html
Printing ab directly only gives us a generator object with its address in
memory:
<generator object merge at 0x7fdc93b389e8>
We can use a list comprehension to iterate through ab and save the sorted
result in a list:
ab_lst = [n for n in ab]
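A self-contained sketch of this behavior with heapq.merge follows. The two input lists are made up for illustration and must already be sorted:

```python
from heapq import merge

a = [1, 3, 5, 7]
b = [2, 4, 6]

ab = merge(a, b)       # a lazy iterator, nothing is merged yet
ab_lst = list(ab)      # consuming it produces the merged sorted result
leftover = list(ab)    # the generator is exhausted after one pass

print(ab_lst, leftover)
```

Because ab is a generator, it can be iterated only once; a second pass yields nothing.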
However, if we have multiple tasks with the same priority, the relative
order of these tied tasks cannot be sustained. This is because list items
are compared element by element: the first elements are compared first, and
whenever there is a tie, the next elements are compared. For example,
consider the case where multiple items have 3 as the first value in the list.
h = [[3, 'e'], [3, 'd'], [10, 'c'], [5, 'b'], [3, 'a']]
heapify(h)
The printout indicates that the relative ordering of items [3, ’e’], [3, ’d’], [3,
’a’] is not kept:
0.9. PRIORITY QUEUE lvii
[[3, 'a'], [3, 'd'], [10, 'c'], [5, 'b'], [3, 'e']]
Keeping the relative order of tasks with the same priority is a requirement
for the priority queue abstract data structure. We will see in the next
section how a priority queue can be implemented with heapq.
Modify Items in heapq In the heap, we can change the value of any item
just as we can in a list. However, the heap ordering property may be
violated after the change, so we need a way to fix it. We have the following
two private functions to use according to the case of change:
• _siftdown(heap, startpos, pos): pos is where the new violation is.
startpos is the position up to which we want to restore the heap
invariant, which is usually set to 0. Because _siftdown() goes
backwards to compare this node with its parents, we can call this
function to fix the heap when an item's value is decreased.
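A small sketch of the decrease case is shown below. Note that _siftdown is a private helper of CPython's heapq module, so relying on it is an assumption about the implementation rather than a documented API; the index and the new value are chosen only for illustration:

```python
import heapq

h = [1, 3, 5, 78, 21, 45]  # already a valid min-heap

# decrease the item at index 3 from 78 to 0, which violates the heap
# ordering property with respect to its parents
h[3] = 0

# restore the invariant on the path from index 3 up to index 0
heapq._siftdown(h, 0, 3)

print(h)
```

After the call, the decreased item has moved up to the root and every parent is again no larger than its children.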
3. If two items have the same priority, they are served according to their
order in the queue.
1. CPU Scheduling,
• Sort stability: when we get two tasks with equal priorities, we return
them in the same order as they were originally added. A potential
solution is to modify the original 2-element list [priority, task]
into a 3-element list [priority, count, task]. A list is preferred
because a tuple does not allow item assignment. The entry count
indicates the original order of the task in the list and serves as a
tie-breaker, so that two tasks with the same priority are returned in
the same order as they were added, which preserves the sort stability.
Also, since no two entry counts are the same, the tasks themselves are
never directly compared with each other in the list comparison. For
example, using the same example as in the last section:
import itertools
counter = itertools.count()
h = [[3, 'e'], [3, 'd'], [10, 'c'], [5, 'b'], [3, 'a']]
h = [[p, next(counter), t] for p, t in h]
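To check that the count really acts as a tie-breaker, we can heapify the 3-element entries and pop them all. This sketch only adds heapify and heappop to the code above; the name order is ours:

```python
import itertools
from heapq import heapify, heappop

counter = itertools.count()
h = [[3, 'e'], [3, 'd'], [10, 'c'], [5, 'b'], [3, 'a']]
h = [[p, next(counter), t] for p, t in h]

heapify(h)
# pop everything and keep only the task names
order = [heappop(h)[2] for _ in range(len(h))]
print(order)
```

The three tasks with priority 3 come out as 'e', 'd', 'a', which is the order in which they were added.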
usage of a dictionary where the task serves as key and the 3-element
list as value possible (for a list, the value is just a pointer pointing to
the starting position of the list). With the dictionary to help search,
the time cost is thus decreased to constant. We name this dictionary
entry_finder here. Now, we modify the previous code. The
following code shows how to add items into a heap that is associated with
entry_finder:
# A heap associated with entry_finder
import itertools
from heapq import heapify

counter = itertools.count()
entry_finder = {}
h = [[3, 'e'], [3, 'd'], [10, 'c'], [5, 'b'], [3, 'a']]
heap = []
for p, t in h:
    item = [p, next(counter), t]
    heap.append(item)
    entry_finder[t] = item
heapify(heap)
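One way to use entry_finder for changing a task's priority is to mark the old entry as removed and push a new one, which is the pattern suggested in the implementation notes of the Python heapq documentation. The sketch below follows that pattern; the function names add_task, remove_task and pop_task and the REMOVED placeholder are illustrative choices:

```python
import itertools
from heapq import heappush, heappop

REMOVED = '<removed-task>'  # placeholder marking a removed entry
heap = []
entry_finder = {}
counter = itertools.count()

def add_task(task, priority):
    """Add a new task or update the priority of an existing one."""
    if task in entry_finder:
        remove_task(task)
    entry = [priority, next(counter), task]
    entry_finder[task] = entry
    heappush(heap, entry)

def remove_task(task):
    """Constant-time lookup, then mark the entry instead of deleting it."""
    entry = entry_finder.pop(task)
    entry[-1] = REMOVED

def pop_task():
    """Pop the lowest-priority-value task, skipping removed entries."""
    while heap:
        priority, count, task = heappop(heap)
        if task is not REMOVED:
            del entry_finder[task]
            return task
    raise KeyError('pop from an empty priority queue')

for p, t in [[3, 'e'], [3, 'd'], [10, 'c'], [5, 'b'], [3, 'a']]:
    add_task(t, p)
add_task('c', 1)  # change the priority of task 'c' from 10 to 1
served = [pop_task() for _ in range(5)]
print(served)
```

Marking entries rather than deleting them keeps the heap structure intact; the stale entries are simply discarded when they reach the top.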
while not pq.empty():
    process_order.append(pq.get())
Customized Object If we want a higher value to mean a higher priority, we
can use a customized object that defines two comparison operators, < and ==,
through the magic methods __lt__() and __eq__(). The code is:
class Job():
    def __init__(self, priority, task):
        self.priority = priority
        self.task = task
        return

    def __lt__(self, other):
        try:
            return self.priority > other.priority
        except AttributeError:
            return NotImplemented

    def __eq__(self, other):
        try:
            return self.priority == other.priority
        except AttributeError:
            return NotImplemented
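The effect of the reversed comparison can be seen by pushing such objects into a heapq heap. This sketch uses a shortened version of the class without the try/except handling, and the priorities and task names are made up:

```python
from heapq import heappush, heappop

class Job:
    def __init__(self, priority, task):
        self.priority = priority
        self.task = task

    def __lt__(self, other):
        # reversed on purpose: a larger priority value counts as "smaller"
        return self.priority > other.priority

    def __eq__(self, other):
        return self.priority == other.priority

jobs = []
for p, t in [(3, 'e'), (10, 'c'), (5, 'b')]:
    heappush(jobs, Job(p, t))

served = [heappop(jobs).task for _ in range(3)]
print(served)
```

Because heapq only relies on the < operator, reversing __lt__() turns the min-heap into a max-heap on priority.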
Hands-on Example
Top K Frequent Elements (L347, medium) Given a non-empty array
of integers, return the k most frequent elements.
Example 1:
Input: nums = [1,1,1,2,2,3], k = 2
Output: [1,2]

Example 2:
Input: nums = [1], k = 1
Output: [1]
Analysis: We first use a hashmap to record each item and its frequency.
Then the problem becomes obtaining the top k most frequent items in our
counter: we can either use sorting or use a heap. Our exemplary code here is
for the purpose of getting familiar with related Python modules.
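A minimal sketch of this analysis, combining Counter for the frequencies with heapq.nlargest for the selection, could look as follows. The function name top_k_frequent is ours, and this is one possible solution rather than the only one:

```python
from collections import Counter
import heapq

def top_k_frequent(nums, k):
    # item -> frequency
    count = Counter(nums)
    # pick the k keys with the largest frequencies using a heap
    return heapq.nlargest(k, count.keys(), key=count.get)

print(top_k_frequent([1, 1, 1, 2, 2, 3], 2))
print(top_k_frequent([1], 1))
```

The sorting alternative mentioned above corresponds to count.most_common(k), which returns (item, frequency) pairs.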
0.10 Bonus
Fibonacci heap With a Fibonacci heap, insert() and getHighestPriority()
can be implemented in O(1) amortized time, and deleteHighestPriority()
can be implemented in O(log n) amortized time.
0.11 Exercises
selection with key word: kth. These problems can be solved by
sorting, using a heap, or using quickselect.
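As an illustration of the first two approaches, here is a sketch that finds the kth largest item by sorting and by keeping a min-heap of size k. The function names and the sample input are hypothetical, and quickselect is not shown:

```python
import heapq

def kth_largest_sort(nums, k):
    # O(n log n): sort in descending order and index
    return sorted(nums, reverse=True)[k - 1]

def kth_largest_heap(nums, k):
    # O(n log k): keep the k largest items seen so far in a min-heap
    heap = nums[:k]
    heapq.heapify(heap)
    for x in nums[k:]:
        if x > heap[0]:
            heapq.heapreplace(heap, x)  # pop the smallest, push x
    return heap[0]  # the smallest of the k largest items

nums = [3, 2, 1, 5, 6, 4]
print(kth_largest_sort(nums, 2), kth_largest_heap(nums, 2))
```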