Python Data Structures

Li Yin (www.liyinscience.com)

November 14, 2019

0.1 Introduction
Python is an object-oriented programming language, and in CPython, the reference implementation, every object is implemented in C. The built-in data types of a language such as C++ follow the abstract data structures more rigidly. We could get by just learning how to use the Python data types alone: whether each is immutable or mutable; its built-in in-place operations, such as append(), insert(), add(), and remove(); and the built-in functions and operators that offer additional ways to manipulate the data structure, which is an object here. However, the behavior of some data types can be confused with that of abstract data structures, making it hard to assess and evaluate their efficiency.
In this chapter and the following three chapters, we start to learn Python data structures by relating their underlying implementation to the abstract data structures we have learned, and then introduce each one's properties, built-in operations, and built-in functions. If you are not familiar with Python, please first read the section Understanding Object in the Appendix (Python Knowledge Base) to study the properties of the built-in data types.

Python Built-in Data Types: In Python 3, we have four built-in scalar data types: int, float, complex, and bool. At a higher level, there are four sequence types: str (the string type), list, tuple, and range; one mapping type: dict; and two set types: set and frozenset. Among these 11 built-in data types, all but the scalar types represent some of the abstract data structures we have introduced.

Abstract Data Types with Python Data Types/Modules: To relate the abstract data types to the built-in data types, we have:

• The sequence types correspond to the array data structure: these include string, list, tuple, and range.

• dict, set, and frozenset map to hash tables.

• For linked list, stack, and queue, we either implement them with the built-in data types or use Python modules.
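As a minimal sketch of the last point, a stack can be built on a plain list and a queue on collections.deque from the standard library. The variable names and values here are made up for illustration.

```python
from collections import deque

# Stack (LIFO) on top of a built-in list: push and pop at the end.
stack = []
stack.append(1)
stack.append(2)
top = stack.pop()          # removes the most recently pushed item

# Queue (FIFO) using the deque type from the collections module.
queue = deque()
queue.append('a')
queue.append('b')
front = queue.popleft()    # removes the oldest item

print(top, front)          # 2 a
```

Popping from the end of a list and from the left of a deque both avoid shifting the remaining items, which is why these two pairings are the usual choice.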

0.2 Array and Python Sequence


We will see in the remaining contents of this part how the array-based Python data structures are used to implement the other data structures. On LeetCode, these two data structures are involved in about 25% of the problems.

0.2.1 Introduction to Python Sequence


In Python, sequences are defined as ordered collections of objects indexed by non-negative integers. We use an index to refer to an item, and indexing starts at 0 by default. Sequence types are iterable: an iterable is an object that can be iterated over, and an iterator is the agent that performs the iteration, which we obtain with the built-in function iter().
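The difference between an iterable and an iterator can be seen in a short sketch. The list contents are arbitrary; any sequence type would behave the same way.

```python
seq = ['a', 'b', 'c']      # any sequence type works: str, list, tuple, range

it = iter(seq)             # ask the iterable for an iterator
first = next(it)           # 'a'
second = next(it)          # 'b'
rest = list(it)            # the iterator resumes where it stopped: ['c']
exhausted = list(it)       # a consumed iterator yields nothing more: []

print(first, second, rest, exhausted)
print(list(seq))           # the sequence itself is untouched: ['a', 'b', 'c']
```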

• string is a sequence of characters. It is immutable, with a static array as its backing data structure in the C implementation.

• list and tuple are sequences of arbitrary objects, meaning they accept objects of different types, including the built-in data types listed above and any other objects. This sounds fancy and like magic! However, it does not change the fact that the backing data structure is an array (a dynamic array in the case of list). They can hold objects of arbitrary types because they store pointers to the objects, each pointing to an object's physical location, and each pointer takes a fixed number of bytes (4 bytes on a 32-bit system and 8 bytes on a 64-bit system).

• range: In Python 3, range is a type. However, range does not have a backing array to store a sequence of values; it computes them on demand. Thus we will introduce range first and get done with it before we focus on the other sequence types.

    >>> type(range)
    <class 'type'>

All these sequence types share the common methods and operations shown in Tables 4 and 5. Note that in Python, indexing starts from 0.
Let us examine each type of sequence further to understand its performance and its relation to the array data structures.

0.2.2 Range
Range Syntax
The range object has three attributes: start, stop, and step, and a range object can be created as range(start, stop, step). These attributes need to be integers (both negative and positive work) to define a range, which covers [start, stop). The default value for start is 0 and for step is 1; stop must always be given. For example:
    >>> a = range(10)
    >>> b = range(0, 10, 2)
    >>> a, b
    (range(0, 10), range(0, 10, 2))

Now, we print it out:


    >>> for i in a:
    ...     print(i, end=' ')
    ...
    0 1 2 3 4 5 6 7 8 9

And for b, it will be:

    >>> for i in b:
    ...     print(i, end=' ')
    ...
    0 2 4 6 8

Like any other sequence type, range is iterable and can be indexed and sliced.

What you do not see


The range object might be a little bizarre when we first learn it. Is it an iterator? A generator? The answer to both questions is no. What is it then? It is a sequence type that differs from its counterparts through its own unique properties:

• It is "lazy" in the sense that it does not generate every number that it "contains" when we create it. Instead, it gives those numbers to us as we need them when looping over it. Thus, it saves us space:

    >>> a = range(1_000_000)
    >>> b = [i for i in a]
    >>> a.__sizeof__(), b.__sizeof__()
    (48, 8697440)

  This is just how the behavior of the range class is defined in the C code. It does not need to store all the integers in the range; each one is computed when a function specifically asks for it.

• It is not an iterator; it won't get consumed. We can iterate over it multiple times. This is understandable given how it is implemented.
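The properties above can be checked with a small sketch. The particular range is arbitrary, and the contrast with an explicit iterator is added here for illustration.

```python
r = range(0, 10, 2)        # represents 0, 2, 4, 6, 8 without storing them

print(len(r))              # 5
print(r[1], r[-1])         # 2 8
print(r[1:3])              # range(2, 6, 2)
print(6 in r, 7 in r)      # True False

# Unlike an iterator, a range is not consumed by iteration.
first_pass = list(r)
second_pass = list(r)
print(first_pass == second_pass)   # True

# An iterator made from the range, in contrast, is consumed.
it = iter(r)
consumed_once = list(it)
consumed_twice = list(it)
print(consumed_once, consumed_twice)   # [0, 2, 4, 6, 8] []
```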

0.2.3 String
A string is a static array whose items are characters, represented using ASCII or Unicode [1]. A string is immutable, which means that once it is created we can no longer modify its content or extend its size. A string is more compact than the same characters stored in a list, because its backing array is not allocated any extra space.

String Syntax
Strings can be created in Python by wrapping a sequence of characters in single or double quotes. Multi-line strings can easily be created using three quote characters.

Create a String: We introduce some common and useful functions.

Join: The str.join() method concatenates the items of an iterable of strings, placing the string it is called on between each pair of items. Because a string is itself a sequence of characters, we can use str.join() to add whitespace between the characters of a string, like so:
    balloon = "Sammy has a balloon."
    print(" ".join(balloon))
    # Output
    # S a m m y   h a s   a   b a l l o o n .

The str.join() method is also useful to combine a list of strings into a new single string.
    print(",".join(["a", "b", "c"]))
    # Output
    # a,b,c

Split: Just as we can join strings together, we can also split strings up using the str.split() method. This method separates the string by whitespace if no other parameter is given.
    print(balloon.split())
    # Output
    # ['Sammy', 'has', 'a', 'balloon.']

We can also use str.split() to remove certain parts of an original string. For example, let's remove the letter 'a' from the string:
    print(balloon.split("a"))
    # Output
    # ['S', 'mmy h', 's ', ' b', 'lloon.']

Now the letter a has been removed and the string has been separated at each place where the letter a had been, with whitespace retained.

[1] In Python 3, all strings are represented in Unicode. In Python 2, strings are stored internally as 8-bit ASCII, hence it was required to attach the prefix 'u' to make a string Unicode. This is no longer necessary.

Replace: The str.replace() method takes an original string and returns an updated string with some replacement.
Let's say that the balloon that Sammy had is lost. Since Sammy no longer has this balloon, we will change the substring "has" from the original string balloon to "had" in a new string:
    print(balloon.replace("has", "had"))
    # Output
    # Sammy had a balloon.

We can also use the replace method to delete a substring:

    balloon.replace("has", '')

Using the string methods str.join(), str.split(), and str.replace() gives you greater control when manipulating strings in Python.
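Because strings are immutable, each of these methods returns a new string and leaves the original alone. A minimal sketch, reusing the balloon string from above:

```python
balloon = "Sammy has a balloon."

# replace() returns a new string; the original is left untouched.
updated = balloon.replace("has", "had")
print(balloon)    # Sammy has a balloon.
print(updated)    # Sammy had a balloon.

# In-place modification is rejected because str is immutable.
try:
    balloon[0] = "T"
    modified = True
except TypeError as err:
    modified = False
    print("TypeError:", err)

# split() and join() round-trip: break into words, then glue them back.
words = balloon.split()
rejoined = " ".join(words)
print(rejoined == balloon)   # True
```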

Conversion between Integer and Character: The function ord() returns the integer value of a character (its Unicode code point, which equals the ASCII value for ASCII characters). In case you want to convert back after playing with the number, the function chr() does the trick.
    print(ord('A'))  # given a string of length one, return its Unicode code point: 65
    print(chr(65))   # A
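A common use of this pair is to map letters to array indices and back. The sketch below assumes lowercase ASCII letters only, and shift_letter is a hypothetical helper name chosen for illustration.

```python
def shift_letter(ch, k):
    """Shift a lowercase letter k positions forward, wrapping around after 'z'."""
    offset = ord(ch) - ord('a')            # 0 for 'a', 1 for 'b', ..., 25 for 'z'
    return chr(ord('a') + (offset + k) % 26)

print(ord('a'), chr(97))       # 97 a
print(shift_letter('a', 1))    # b
print(shift_letter('z', 1))    # a
```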

String Functions
Because string is one of the most fundamental built-in data types, it is necessary to master its common built-in methods, shown in Tables 1 and 2. Using the boolean methods to check whether characters are lower case, upper case, or title case can help us sort our data appropriately, and gives us the opportunity to standardize the data we collect by checking and then modifying strings as needed.
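A few of the methods from Tables 1 and 2 in action. The sample strings are made up for illustration.

```python
s = "  Hello, World  "

stripped = s.strip()                 # remove leading/trailing whitespace
print(stripped)                      # Hello, World
print(stripped.count("l"))           # 3
print(stripped.find("World"))        # 7
print(stripped.find("Python"))       # -1 (not found)
print(stripped.lower(), stripped.upper())

print("abc123".isalnum())            # True
print("abc123".isalpha())            # False
print("Hello World".istitle())       # True
print("   ".isspace())               # True
```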

0.2.4 List
The underlying abstract data structure of the list data type is a dynamic array, meaning we can add, delete, and modify items in the list. It supports random access by indexing. List is the most widely used of the sequence types due to its mutability.
Even though a list supports data of arbitrary types, we prefer not to mix types in one list. Using a tuple or namedtuple for such data is better practice and offers better clarity.

Table 1: Common Methods of String

Method                             Description
count(substr, [start, end])        Counts the occurrences of a substring, with optional start and end positions
find(substr, [start, end])         Returns the index of the first occurrence of a substring, or -1 if the substring is not found
join(t)                            Joins the strings in sequence t with the current string between each item
lower()/upper()                    Converts the string to all lowercase or uppercase
replace(old, new)                  Replaces the old substring with the new substring
strip([characters])                Removes leading and trailing whitespace or the optional characters
split([characters], [maxsplit])    Splits a string separated by whitespace or an optional separator. Returns a list
expandtabs([tabsize])              Replaces tabs with spaces

Table 2: Common Boolean Methods of String

Boolean Method    Description
isalnum()         String consists of only alphanumeric characters (no symbols)
isalpha()         String consists of only alphabetic characters (no symbols)
islower()         String's alphabetic characters are all lower case
isnumeric()       String consists of only numeric characters
isspace()         String consists of only whitespace characters
istitle()         String is in title case
isupper()         String's alphabetic characters are all upper case

What You see: List Syntax


Create a List: We have multiple ways to create either an empty list or a list with initialized data. List comprehension is an elegant and concise way to create a new list from an existing iterable in Python.
    # create an empty list
    lst = []
    lst2 = [2, 2, 2, 2]  # create a list with initialization
    lst3 = [3] * 5       # create a list of size 5 with 3 as the initial value
    print(lst, lst2, lst3)
    # output
    # [] [2, 2, 2, 2] [3, 3, 3, 3, 3]

We can use a list comprehension to create a list and the enumerate function to loop over its items.
    lst1 = [3] * 5                 # create a list of size 5 with 3 as the initial value
    lst2 = [4 for i in range(5)]   # the same idea with a list comprehension
    for idx, v in enumerate(lst1):
        lst1[idx] += 1             # lst1 becomes [4, 4, 4, 4, 4]

Search: We use the method list.index() to obtain the index of the searched item.
    # assuming lst is [1, 2, 3, 4]
    print(lst.index(4))  # find 4 and return its index
    # output
    # 3

Calling print(lst.index(5)) will raise ValueError: 5 is not in list. Use the following code instead.
    if 5 in lst:
        print(lst.index(5))

Add Item: We can add items to a list through list.insert(index, value), which inserts an item at a position in the original list, or list.append(value), which appends an item at the end of the list.
    # INSERTION (lst starts empty, lst2 is [2, 2, 2, 2])
    lst.insert(0, 1)  # insert an element at index 0; since lst is empty, lst.insert(1, 1) has the same effect
    print(lst)

    lst2.insert(2, 3)
    print(lst2)
    # output
    # [1]
    # [2, 2, 3, 2, 2]
    # APPEND
    for i in range(2, 5):
        lst.append(i)
    print(lst)
    # output
    # [1, 2, 3, 4]

Delete Item
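Items can be removed with the methods listed in Table 3 (remove, pop, clear) or with the del statement. A minimal sketch, with list contents made up for illustration:

```python
lst = [1, 2, 3, 4, 3]      # made-up contents for illustration

lst.remove(3)              # removes the first item equal to 3
print(lst)                 # [1, 2, 4, 3]

last = lst.pop()           # removes and returns the last item
print(last, lst)           # 3 [1, 2, 4]

first = lst.pop(0)         # removes and returns the item at index 0
print(first, lst)          # 1 [2, 4]

del lst[0]                 # deletes by index without returning the item
print(lst)                 # [4]

remaining = list(lst)      # keep a copy before clearing
lst.clear()                # removes all items
print(lst)                 # []
```

Removing from the end with pop() does not move any other item, while remove(), pop(0), and del on an early index shift every later item one position to the left.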
Get Size of the List: We can use the built-in function len() to find out the number of items stored in the list.
    print(len(lst2))
    # 4 (when lst2 is [2, 2, 2, 2])

What you do not see: Understand List

To understand list, we need to start with its C implementation. We do not introduce the C source code, but instead use functions to access and evaluate its properties.

List Object and Pointers: On a 64-bit system (8-byte pointers), such as Google Colab, a pointer takes 8 bytes of space. In the Python 3 version used here, the empty list object itself takes 64 bytes, and each additional element takes 8 bytes. These figures vary across Python versions. In Python, we can use getsizeof() from the sys module to get an object's memory size, for example:
    lst_lst = [[], [1], ['1'], [1, 2], ['1', '2']]

And now, let us get the memory size of lst_lst and of each list item in this list.
    import sys
    for lst in lst_lst:
        print(sys.getsizeof(lst), end=' ')
    print(sys.getsizeof(lst_lst))

The output is:

    64 72 72 80 80 104

We can see that a list of integers takes the same memory size as a list of strings of equal length, because the list stores only pointers to its items.

insert and append: Whenever insert or append is called, and assuming the original length is n, Python compares n + 1 with the allocated length. If you append or insert to a Python list and the backing array is not big enough, the backing array must be expanded. When this happens, the backing array is over-allocated by roughly 12.5% (newsize >> 3) plus a small constant, following this formula from the CPython C source:
    new_allocated = (size_t)newsize + (newsize >> 3) +
                    (newsize < 9 ? 3 : 6);

We can do an experiment to see how it works. Here we use the id() function to obtain the address of the list object, which stays the same even when the backing array is reallocated. We compare the size of the list with the extra space taken by its backing array (with 8 bytes as the unit).
    a = []
    for size in range(17):
        a.insert(0, size)
        print('size:', len(a), 'bytes:', (sys.getsizeof(a) - 64) // 8, 'id:', id(a))

The output is:

    size: 1 bytes: 4 id: 140682152394952
    size: 2 bytes: 4 id: 140682152394952
    size: 3 bytes: 4 id: 140682152394952
    size: 4 bytes: 4 id: 140682152394952
    size: 5 bytes: 8 id: 140682152394952
    size: 6 bytes: 8 id: 140682152394952
    size: 7 bytes: 8 id: 140682152394952
    size: 8 bytes: 8 id: 140682152394952
    size: 9 bytes: 16 id: 140682152394952
    size: 10 bytes: 16 id: 140682152394952
    size: 11 bytes: 16 id: 140682152394952
    size: 12 bytes: 16 id: 140682152394952
    size: 13 bytes: 16 id: 140682152394952
    size: 14 bytes: 16 id: 140682152394952
    size: 15 bytes: 16 id: 140682152394952
    size: 16 bytes: 16 id: 140682152394952
    size: 17 bytes: 25 id: 140682152394952

The output shows the growth pattern [0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...].
Amortized, append takes O(1). However, insert takes O(n) because it first has to shift all items of the original list in [pos, end] by one position, and then puts the item at pos with random access.
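The amortized O(1) cost of append rests on reallocations becoming rarer as the list grows. A rough way to observe this, assuming CPython (where sys.getsizeof reflects the allocated backing array), is to record the lengths at which the reported size changes. The helper name capacity_changes is hypothetical, and the exact lengths printed depend on the Python version.

```python
import sys

def capacity_changes(n):
    """Return the list lengths at which sys.getsizeof reports a new size."""
    a = []
    changes = []
    last_size = sys.getsizeof(a)
    for i in range(n):
        a.append(i)
        size = sys.getsizeof(a)
        if size != last_size:          # the backing array was reallocated
            changes.append(len(a))
            last_size = size
    return changes

changes = capacity_changes(1000)
print(changes)
print(len(changes), "reallocations for 1000 appends")
```

Only a small fraction of the 1000 appends trigger a reallocation, and the gaps between them widen, which is what keeps the average cost per append constant.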

Common Methods of List


We have already seen how to use append and insert. Table 3 shows the common list methods, which are used as list.methodName().

Table 3: Common Methods of List

Method               Description
append()             Adds an element to the end of the list
extend(l)            Adds all elements of a list to another list
insert(index, val)   Inserts an item at the given index
pop(index)           Removes and returns the element at the given index
remove(val)          Removes the first matching item from the list
clear()              Removes all items from the list
index(val)           Returns the index of the first matched item
count(val)           Returns the number of occurrences of the item passed as an argument
sort()               Sorts the items in the list in ascending order
reverse()            Reverses the order of items in the list in place (list[::-1] gives a reversed copy)
copy()               Returns a shallow copy of the list (same as list[:])

Two-dimensional List
A two-dimensional list is a list within a list. In this type of array, the position of a data element is referred to by two indices instead of one. So it represents a table with rows and columns of data. For example, we can declare the following 2-d array:
    ta = [[11, 3, 9, 1], [25, 6, 10], [10, 8, 12, 5]]

The scalar data in a two-dimensional list can be accessed using two indices: one index refers to the main or parent array, and the other refers to the position of the data in the inner list. If we give only one index, then the entire inner list is returned for that index position. The example below illustrates how it works.
    print(ta[0])
    print(ta[2][1])

And the output is:

    [11, 3, 9, 1]
    8

In the above example, we created a 2-d list and initialized it with values. There are also ways to create an empty 2-d array, or to fix the dimension of the outer array and leave the inner arrays empty:
    # empty two-dimensional list
    empty_2d = [[]]

    # fix the outer dimension
    fix_out_d = [[] for _ in range(5)]
    print(fix_out_d)

All the other operations, such as delete, insert, and update, are the same as for the one-dimensional list.

Matrices: We are going to need the concept of a matrix, which is defined as a collection of numbers arranged into a fixed number of rows and columns. For example, a matrix of order 3×4 (read as 3 by 4) is a set of numbers arranged in 3 rows and 4 columns. The declarations of m1 and m2 below do the same thing.
    rows, cols = 3, 4
    m1 = [[0 for _ in range(cols)] for _ in range(rows)]  # rows * cols
    m2 = [[0] * cols for _ in range(rows)]                # rows * cols
    print(m1, m2)

The output is:

    [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]] [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

We assign the value 1 to m1 and m2 at index (1, 2):

    m1[1][2] = 1
    m2[1][2] = 1
    print(m1, m2)

And the output is:

    [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]] [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]

However, we cannot declare it in the following way, because we end up with several references to the same inner list; modifying one element in an inner list therefore changes the element at the corresponding position in all of them. Use this form only when that sharing is what the situation calls for.
    # wrong declaration
    m4 = [[0] * cols] * rows
    m4[1][2] = 1
    print(m4)

With output:
    [[0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 1, 0]]
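The sharing can be confirmed with the is operator, which tests whether two names refer to the same object. A small sketch, with variable names chosen here for illustration:

```python
rows, cols = 3, 4

shared = [[0] * cols] * rows                     # one inner list referenced 3 times
independent = [[0] * cols for _ in range(rows)]  # 3 distinct inner lists

print(shared[0] is shared[1])            # True
print(independent[0] is independent[1])  # False

shared[1][2] = 1
independent[1][2] = 1
print(shared)        # every row shows the change
print(independent)   # only row 1 shows the change
```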

Access Rows and Columns: In real problem solving, we might need to access rows and columns. Accessing rows is quite easy since it follows the declaration of the two-dimensional array.
    # accessing rows
    for row in m1:
        print(row)

With the output:

    [0, 0, 0, 0]
    [0, 0, 1, 0]
    [0, 0, 0, 0]

However, accessing columns is less straightforward. To get each column, we need another inner for loop or a list comprehension that goes through all rows and obtains the value in that column. This is usually a lot slower than accessing each row, because each row is reached through a single pointer while each column has to be collected from every row.
    # accessing columns
    for i in range(cols):
        col = [row[i] for row in m1]
        print(col)

The output is:

    [0, 0, 0]
    [0, 0, 0]
    [0, 1, 0]
    [0, 0, 0]

There is also a handy "idiom" for transposing a nested list, turning 'columns' into 'rows':
    transposedM1 = list(zip(*m1))
    print(transposedM1)

The output will be:

    [(0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 0)]

0.2.5 Tuple
A tuple has a static array as its backing abstract data structure in C, and it is immutable: we cannot add, delete, or replace items once it is created and assigned values. You might ask: if list is a dynamic array without the restrictions of the tuple, why would we need tuple then?

Tuple VS List: We list how we use each data type and why. The main benefit of a tuple's immutability is that it is hashable (as long as all of its items are hashable), so we can use tuples as keys in hash tables, that is, the dictionary types, whereas mutable types such as list cannot be used this way. Besides, in the case that the data does not need to change, the tuple's immutability guarantees that the data remains write-protected, and iterating over an immutable sequence can be slightly faster than over a mutable one, giving a slight performance boost. Also, we generally use a tuple to store a variety of data types. For example, in a class score system, for a student we might want to have the name, student id, and test score, which we can write as ('Bob', 12345, 89).
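The hashability point can be sketched as follows. The grid coordinates and labels are made up for illustration.

```python
# A tuple of hashable items can be a dictionary key.
grid = {}
grid[(0, 1)] = 'wall'
grid[(2, 3)] = 'door'
print(grid[(0, 1)])            # wall

# A list is mutable and therefore unhashable.
try:
    grid[[0, 1]] = 'wall'
    list_key_ok = True
except TypeError as err:
    list_key_ok = False
    print("TypeError:", err)

# A tuple that contains a mutable item is not hashable either.
try:
    hash(('a', [1, 2]))
    nested_ok = True
except TypeError as err:
    nested_ok = False
    print("TypeError:", err)
```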

Tuple Syntax
Create and Initialize a Tuple: Tuples are created by separating the items with commas. The items are commonly wrapped in parentheses for better readability. A tuple can also be created via the built-in function tuple(): if the argument to tuple() is a sequence, this creates a tuple of the elements of that sequence. This is also used to realize type conversion.
An empty tuple:
    tup = ()
    tup3 = tuple()
When there is only one item, put a comma after it so that it will not be interpreted as a string, which is a bit bizarre!
    tup2 = ('crack',)
    tup1 = ('crack', 'leetcode', 2018, 2019)
Converting a string to a tuple with each character separated:
    tup4 = tuple("leetcode")  # the sequence is turned into a tuple of its elements
    # tup4: ('l', 'e', 'e', 't', 'c', 'o', 'd', 'e')
Converting a list to a tuple:
    tup5 = tuple(['crack', 'leetcode', 2018, 2019])  # same as tup1
If we print out these tuples, we get
    tup1: ('crack', 'leetcode', 2018, 2019)
    tup2: ('crack',)
    tup3: ()
    tup4: ('l', 'e', 'e', 't', 'c', 'o', 'd', 'e')
    tup5: ('crack', 'leetcode', 2018, 2019)

Changing a Tuple: Assume we have the following tuple:

    tup = ('a', 'b', [1, 2, 3])

Suppose we want to change it to ('c', 'b', [4, 2, 3]). We cannot do the following operation because, as we said, a tuple cannot be changed in place once it has been assigned.
    tup = ('a', 'b', [1, 2, 3])
    # tup[0] = 'c'  # TypeError: 'tuple' object does not support item assignment

Instead, we initialize another tuple and assign it to the tup variable.

    tup = ('c', 'b', [4, 2, 3])

However, items that are themselves mutable can still be manipulated. For example, starting again from the original tuple, we can use an index to access the list item at the last position of the tuple and modify the list.
    tup[-1][0] = 4
    # ('a', 'b', [4, 2, 3])

Understand Tuple
The backing structure is a static array, which means a tuple is structured similarly to a list except that it is write-protected. We will only briefly describe its properties.

Tuple Object and Pointers: The empty tuple object itself takes 48 bytes in the Python version used here, and everything else is similar to the corresponding section on list.
    lst_tup = [(), (1,), ('1',), (1, 2), ('1', '2')]
    import sys
    for tup in lst_tup:
        print(sys.getsizeof(tup), end=' ')

The output will be:

    48 56 56 64 64

Named Tuples
In a named tuple, we can give the record type a name, say "Computer_Science" to indicate the class name, and give each item a name, say 'name', 'id', and 'score'. We need to import the namedtuple class from the collections module. For example:
    record1 = ('Bob', 12345, 89)
    from collections import namedtuple
    Record = namedtuple('Computer_Science', 'name id score')
    record2 = Record('Bob', id=12345, score=89)
    print(record1, record2)

The output will be:

    ('Bob', 12345, 89) Computer_Science(name='Bob', id=12345, score=89)
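The benefit of naming the fields shows up when reading them back. A short sketch using the same record as above; _replace and _asdict are standard namedtuple methods:

```python
from collections import namedtuple

Record = namedtuple('Computer_Science', 'name id score')
record = Record('Bob', id=12345, score=89)

print(record.name, record.score)     # Bob 89
print(record[1])                     # 12345 (positional access still works)

name, student_id, score = record     # unpacking, like a plain tuple
print(name, student_id, score)       # Bob 12345 89

updated = record._replace(score=95)  # returns a new record; the original is unchanged
print(record.score, updated.score)   # 89 95
print(updated._asdict())             # field names mapped to values
```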

0.2.6 Summary
All these sequence types share the common methods and operations shown in Tables 4 and 5. Note that in Python, indexing starts from 0.

Table 4: Common Methods for Sequence Data Type in Python

Function                            Description
len(s)                              Get the size of sequence s
min(s, [default=obj, key=func])     The minimum value in s (alphabetically for strings)
max(s, [default=obj, key=func])     The maximum value in s (alphabetically for strings)
sum(s, [start=0])                   The sum of elements in s (raises TypeError if s is not numeric)
all(s)                              Return True if all elements in s are true (similar to and)
any(s)                              Return True if any element in s is true (similar to or)

Table 5: Common Out-of-Place Operators for Sequence Data Type in Python

Operation              Description
s + r                  Concatenates two sequences of the same type
s * n                  Makes n copies of s, where n is an integer
v1, v2, ..., vn = s    Unpacks n variables from s
s[i]                   Indexing: returns the ith element of s
s[i:j:stride]          Slicing: returns the elements between i and j with an optional stride
x in s                 Return True if element x is in s
x not in s             Return True if element x is not in s
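The same functions and operators apply across the sequence types, as this sketch with made-up values shows. One caveat not covered by the tables: range supports indexing, slicing, and membership tests, but not + or *.

```python
seqs = ["abc", ['a', 'b', 'c'], ('a', 'b', 'c')]

for s in seqs:
    # Functions from Table 4 and operators from Table 5.
    print(len(s), min(s), max(s), s[0], s[::-1], 'b' in s, s + s, s * 2)

first, second, third = "abc"         # unpacking works on any sequence
print(first, second, third)          # a b c

r = range(5)
print(len(r), sum(r), r[::2], 3 in r)        # 5 10 range(0, 5, 2) True
print(all([1, 2, 3]), any([0, '', None]))    # True False
```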

0.2.7 Bonus
Circular Array: The corresponding problems include:

1. 503. Next Greater Element II

0.2.8 Exercises
1. 985. Sum of Even Numbers After Queries (easy)

2. 937. Reorder Log Files


You have an array of logs. Each log is a space-delimited string of words.
For each log, the first word is an alphanumeric identifier. Then, either:
• each word after the identifier consists only of lowercase letters, or
• each word after the identifier consists only of digits.
We will call these two varieties of logs letter-logs and digit-logs. It is guaranteed that each log has at least one word after its identifier.
Reorder the logs so that all of the letter-logs come before any digit-log. The letter-logs are ordered lexicographically ignoring the identifier, with the identifier used in case of ties. The digit-logs should be kept in their original order.
Return the final order of the logs.
    Example 1:

    Input: ["a1 9 2 3 1", "g1 act car", "zo4 4 7", "ab1 off key dog", "a8 act zoo"]
    Output: ["g1 act car", "a8 act zoo", "ab1 off key dog", "a1 9 2 3 1", "zo4 4 7"]

    Note:

    0 <= logs.length <= 100
    3 <= logs[i].length <= 100
    logs[i] is guaranteed to have an identifier, and a word after the identifier.

    def reorderLogFiles(self, logs):
        letters = []
        digits = []
        for log in logs:
            splited = log.split(' ')
            identifier = splited[0]   # the first word is the identifier
            first_word = splited[1]   # the first word after the identifier decides the log type

            if first_word.isnumeric():
                digits.append(log)
            else:
                letters.append((' '.join(splited[1:]), identifier))
        letters.sort()  # default sorting by the first element and then the second in the tuple

        return [identifier + ' ' + other for other, identifier in letters] + digits

    def reorderLogFiles(logs):
        digit = []
        letters = []
        info = {}
        for log in logs:
            if '0' <= log[-1] <= '9':
                digit.append(log)
            else:
                letters.append(log)
                index = log.index(' ')
                # sort key: the content after the identifier, then the identifier for ties
                info[log] = (log[index + 1:], log[:index])

        letters.sort(key=lambda x: info[x])
        return letters + digit

0.3 Linked List


Python does not have a built-in data type or module that offers a linked-list-like data structure; however, it is not hard to implement one ourselves.

0.3.1 Singly Linked List

Figure 1: Linked List Structure

A linked list consists of nodes, and for a singly linked list each node consists of at least two variables: val to save data and next, a pointer that points to the successive node. The Node class is given as:
    class Node(object):
        def __init__(self, val=None):
            self.val = val
            self.next = None

In a singly linked list, we usually start with a head node which points to the first node in the list; with only this single node we are able to trace all the other nodes. For simplicity, we demonstrate the process without using a class, but we provide a class implementation named SinglyLinkeList in our online Python source code. Now, let us create an empty node named head.
    head = None

We need to implement its standard operations, including insertion/append, delete, search, and clear. However, if we allow the head node to be None, there would be special cases to handle. Thus, we use a dummy node, a node with None as its value, as the head to simplify the coding. We point the head to a dummy node:
    head = Node(None)

Append Operation: Like the append function of list, we add a node at the very end of the linked list. Without the dummy node, there would be two cases:

• When head is an empty node, we assign the new node to head.

• When it is not empty, all we have available is the head pointer, so we first need to traverse all the nodes up to the very last node, whose next is None, and then connect the new node to the last node by assigning it to the last node's next pointer.

The first case is simply bad: we would generate a new node, and we cannot track the head through an in-place operation. However, with the dummy node, only the second case appears. The code is:
    def append(head, val):
        node = Node(val)
        cur = head
        while cur.next:
            cur = cur.next
        cur.next = node
        return

Now, let us create the exact same linked list as in Fig. 1:
    for val in ['A', 'B', 'C', 'D']:
        append(head, val)

Generator and Search Operations In order to traverse the linked list with
a for ... in statement, like any other sequence data type in Python, we
implement the gen() function, which returns a generator over all nodes of
the list. Because we have a dummy node, we always start at head.next.
    def gen(head):
        cur = head.next
        while cur:
            yield cur
            cur = cur.next

Now, let us print out the linked list we created:
    for node in gen(head):
        print(node.val, end=' ')

Here is the output:
    A B C D

Search Operation We find a node by value and return it; if no node holds
the value, we return None.
    def search(head, val):
        for node in gen(head):
            if node.val == val:
                return node
        return None

Now, we search for the value 'B' with:
    node = search(head, 'B')

Delete Operation For deletion, there are two scenarios: deleting a node by
value when we are given the head node, and deleting a given node, such as
the node we got from searching for 'B'.
The first case requires us to locate the node first, and then rewire the
pointers between the predecessor and successor of the node being deleted.
Again, if we did not have a dummy node, we would have two cases: if the
node is the head node, we repoint the head to the next node; otherwise, we
connect the previous node to the deleted node's next node, and the head
pointer remains untouched. With the dummy node, we only have the second
situation. In the process, we use an additional variable prev to track the
predecessor.
    def delete(head, val):
        cur = head.next  # first real node, right after the dummy node
        prev = head
        while cur:
            if cur.val == val:
                # rewire: skip over cur
                prev.next = cur.next
                return
            prev = cur
            cur = cur.next

Now, let us delete the node 'A' with this function.
    delete(head, 'A')
    for n in gen(head):
        print(n.val, end=' ')

Now the output shows that three nodes are left:
    B C D

The second case might seem impossible, since we do not know the previous
node. The trick is to copy the value of the next node into the current
node, and then delete the next node instead by pointing the current node to
the node after it. However, that only works when the node to delete is not
the last node. When it is, we have no way to completely delete it, but we
can make it "invalid" by setting its val and next to None.
    def deleteByNode(node):
        if node.next:
            # pull the next node's value into the current node
            node.val = node.next.val
            # then unlink the next node instead
            node.next = node.next.next
        else:
            # last node: it cannot be unlinked, so mark it invalid
            node.val = None
            node.next = None

Now, let us try deleting the node 'B' via our previously found node.
    deleteByNode(node)
    for n in gen(head):
        print(n.val, end=' ')

The output is:
    C D

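Clear When we need to clear all the nodes of the linked list, we just set
the next pointer of the dummy head to None. As a method of the class
implementation, which also tracks the size:
    def clear(self):
        self.head.next = None  # keep the dummy node, drop everything after it
        self.size = 0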

Question: Some linked lists only allow inserting a node at the tail, which
is Append; others might allow insertion at any location. To get the length
of the linked list in O(1), we need a variable to track the size.
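One possible answer to the question, as a minimal sketch rather than the class from the online source code: wrap the dummy-headed list in a class that supports insertion at any position and keeps a size counter. The class name SinglyLinkedList and the methods insert() and __len__() are assumptions made for illustration.

```python
class Node:
    def __init__(self, val=None):
        self.val = val
        self.next = None


class SinglyLinkedList:
    """Hypothetical wrapper; names are illustrative only."""

    def __init__(self):
        self.head = Node(None)  # dummy node
        self.size = 0           # number of real nodes

    def insert(self, index, val):
        """Insert val so that it ends up at position index (0-based)."""
        if not 0 <= index <= self.size:
            raise IndexError('index out of range')
        prev = self.head
        for _ in range(index):  # walk to the predecessor
            prev = prev.next
        node = Node(val)
        node.next = prev.next
        prev.next = node
        self.size += 1

    def append(self, val):
        self.insert(self.size, val)

    def __len__(self):
        return self.size        # O(1), no traversal needed

    def __iter__(self):
        cur = self.head.next
        while cur:
            yield cur.val
            cur = cur.next


sll = SinglyLinkedList()
for v in ['A', 'C', 'D']:
    sll.append(v)
sll.insert(1, 'B')
print(list(sll), len(sll))  # ['A', 'B', 'C', 'D'] 4
```

Every operation that adds or removes a node has to update size, which is the price of the O(1) length.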

0.3.2 Doubly Linked List

Figure 2: Doubly Linked List

Building on the singly linked list (SLL), the doubly linked list (DLL)
contains an extra pointer in the node structure, typically called prev
(short for previous), which points back to the node's predecessor in the
list. We define the Node class as:
    class Node:
        def __init__(self, val, prev=None, next=None):
            self.val = val
            self.prev = prev  # reference to previous node in DLL
            self.next = next  # reference to next node in DLL

Similarly, let us start with setting the dummy node as head:
    head = Node(None)

Now, instead of me continuing to implement all the operations, which are
slight variants of those of the singly linked list, why don't you implement
them? Do not worry, try it first; I also have the answer covered in the
Google Colab, enjoy!
Now, I assume that you have implemented those operations or checked the
solutions. We notice that for search() and gen() the code is exactly the
same, and for the other operations there are only one or two lines of code
that differ from SLL. Let's quickly list these operations:

Append Operation In DLL, we have to set the appending node's prev pointer
to the last node of the linked list. The code is:
    def append(head, val):
        node = Node(val)
        cur = head
        while cur.next:
            cur = cur.next
        cur.next = node
        node.prev = cur  # the only difference from SLL

Generator and Search Operations There is not much difference if we just
search through the next pointer. However, with the extra prev pointer, we
have two options when the given starting node is an arbitrary node: search
forward through next or backward through prev. For SLL this is not an
option, because we would not be able to conduct a complete search; we can
only search among the items behind the given node. Bidirectional search
makes sense when the data is ordered in some way, or when the program is
parallel.
    def gen(head):
        cur = head.next
        while cur:
            yield cur
            cur = cur.next

    def search(head, val):
        for node in gen(head):
            if node.val == val:
                return node
        return None
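The backward direction mentioned above can be sketched with a second generator that follows prev instead of next. The name gen_backward and its head argument (used to know where to stop) are assumptions for illustration, not part of the original code.

```python
class Node:
    def __init__(self, val, prev=None, next=None):
        self.val = val
        self.prev = prev
        self.next = next


def append(head, val):
    cur = head
    while cur.next:
        cur = cur.next
    cur.next = Node(val, prev=cur)


def gen_backward(node, head):
    """Yield nodes from `node` back toward the dummy head (exclusive)."""
    cur = node
    while cur is not head:
        yield cur
        cur = cur.prev


head = Node(None)  # dummy node
for v in ['A', 'B', 'C', 'D']:
    append(head, v)

last = head
while last.next:
    last = last.next
print([n.val for n in gen_backward(last, head)])  # ['D', 'C', 'B', 'A']
```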

Delete Operation To delete a node by value, we first find it in the linked
list, and the rewiring process needs to deal with the next node's prev
pointer if the next node exists.
    def delete(head, val):
        cur = head.next  # first real node, right after the dummy node
        while cur:
            if cur.val == val:
                # rewire in both directions
                cur.prev.next = cur.next
                if cur.next:
                    cur.next.prev = cur.prev
                return
            cur = cur.next

For deleteByNode, because we are cutting off node.next, we need to connect
node to node.next.next in both directions: first point the prev of the
later node to the current node, then point the current node's next to the
later node.
    def deleteByNode(node):
        if node.next:
            # pull the next node's value into the current node
            node.val = node.next.val
            if node.next.next:
                node.next.next.prev = node
            node.next = node.next.next
        else:  # last node: unlink it through its predecessor
            node.prev.next = None
        return node

Comparison We can see there is a slight advantage of DLL over SLL, but it
comes with the cost of handling the extra prev pointer. This is only an
advantage when bidirectional searching is the dominant factor in
efficiency; otherwise, better stick with SLL.

Tips In our implementation, in some cases we still need to check whether a
node is the last node or not. The coding logic can be further simplified if
we put a dummy node at the end of the linked list too.
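A minimal sketch of this tip, assuming a doubly linked list with a dummy node at both ends. The helper names make_list, delete_node, and values are hypothetical. With two sentinels every real node always has both neighbours, so the "is it the last node?" checks disappear and append becomes O(1).

```python
class Node:
    def __init__(self, val=None):
        self.val = val
        self.prev = None
        self.next = None


def make_list():
    """Return (head, tail): two dummy nodes linked to each other."""
    head, tail = Node(), Node()
    head.next, tail.prev = tail, head
    return head, tail


def append(tail, val):
    """O(1) append: insert just before the tail sentinel."""
    node = Node(val)
    node.prev, node.next = tail.prev, tail
    tail.prev.next = node
    tail.prev = node
    return node


def delete_node(node):
    """Unlink a real node; both neighbours always exist."""
    node.prev.next = node.next
    node.next.prev = node.prev


def values(head, tail):
    cur, out = head.next, []
    while cur is not tail:
        out.append(cur.val)
        cur = cur.next
    return out


head, tail = make_list()
nodes = [append(tail, v) for v in 'ABCD']
delete_node(nodes[3])  # the last real node needs no special case
delete_node(nodes[0])
print(values(head, tail))  # ['B', 'C']
```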

0.3.3 Bonus
Circular Linked List A circular linked list is a variation of the linked
list in which the last node connects back to the first node. To make a
circular linked list from a normal linked list: in a singly linked list, we
simply set the last node's next pointer to the first node; in a doubly
linked list, besides setting the last node's next pointer, we also set the
prev pointer of the first node to the last node, making the list circular
in both directions.
Compared with a normal linked list, a circular linked list saves us time
when going to the first node from the last (both SLL and DLL) or to the
last node from the first (in DLL), since the extra connection does it in a
single step. Because the list is a circle, whenever a search with a while
loop is needed, we have to take care of the end condition: we make sure we
have searched exactly one whole cycle by comparing the iterating node to
the starting node.
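The end condition can be sketched as follows for a singly linked circular list without a dummy node. The helper make_circular is hypothetical and assumes a non-empty list of values.

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.next = None


def make_circular(vals):
    """Build a circular singly linked list and return its first node."""
    first = cur = Node(vals[0])
    for v in vals[1:]:
        cur.next = Node(v)
        cur = cur.next
    cur.next = first  # close the circle
    return first


def search(start, val):
    """Visit each node exactly once, beginning at `start`."""
    cur = start
    while True:
        if cur.val == val:
            return cur
        cur = cur.next
        if cur is start:  # back at the start: a whole cycle is done
            return None


start = make_circular(['A', 'B', 'C'])
print(search(start, 'C').val)  # C
print(search(start, 'Z'))      # None
```

Comparing with `is` checks node identity rather than value, so duplicate values in the list do not end the loop early.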

Recursion Recursion offers an additional pass of traversal: besides the
top-down direction on the way in, we get a bottom-up pass as the calls
return. In practice, it often gives cleaner and simpler code compared with
iteration.
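To make the two passes concrete, here is a small sketch (the function visit and its two output lists are illustrative names only): work done before the recursive call happens top-down, and work done after it happens bottom-up.

```python
class Node:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def visit(node, top_down, bottom_up):
    if node is None:
        return
    top_down.append(node.val)   # before the recursive call
    visit(node.next, top_down, bottom_up)
    bottom_up.append(node.val)  # after the call returns


head = Node('A', Node('B', Node('C')))
down, up = [], []
visit(head, down, up)
print(down)  # ['A', 'B', 'C']
print(up)    # ['C', 'B', 'A']
```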

0.3.4 Hands-on Examples


Remove Duplicates (L83) Given a sorted linked list, delete all duplicates
such that each element appears only once.
Example 1:

Input: 1->1->2
Output: 1->2

Example 2:

Input: 1->1->2->3->3
Output: 1->2->3

Analysis
This is a linear complexity problem. The most straightforward way is to
iterate through the linked list and compare the current node's value with
the next node's value: (1) if they are equal, delete one of the nodes (here
we go for the next node); (2) if not, we can safely move to the next node.

Iteration without Dummy Node We start from the head in a while loop; if the
next node exists and the values are equal, we delete the next node.
However, after the deletion, we cannot move to the next node directly. Say
we have 1->1->1: when the second 1 is removed, if we moved on, we would be
at the last 1 and would fail to remove all possible duplicates. The code is
given:
    def deleteDuplicates(self, head):
        """
        :type head: ListNode
        :rtype: ListNode
        """
        if not head:
            return None

        def iterative(head):
            current = head
            while current:
                if current.next and current.val == current.next.val:
                    # delete next, and stay on current to re-check
                    current.next = current.next.next
                else:
                    current = current.next
            return head

        return iterative(head)

With Dummy Node With a dummy node, we put current.next in the while
condition, because we only need to compare values when the next node
exists. Besides, we no longer need to check this condition inside the while
loop.
    def iterative(head):
        dummy = ListNode(None)  # its None value never equals a real value
        dummy.next = head
        current = dummy
        while current.next:
            if current.val == current.next.val:
                # delete next
                current.next = current.next.next
            else:
                current = current.next
        return dummy.next

Recursion Now, if we use recursion and return the node, then at each step
we can compare our node with the returned node (the one located behind the
current node); the same logic applies. A good way to see this is to draw
out an example. With 1->1->1, the last 1 returns itself, and at the second
last 1 we compare the two; because they are equal, we delete the last 1.
Now we backtrack to the first 1 with the second last 1 as the returned
node, and we compare again. The code is the simplest among all solutions.
    def recursive(node):
        if node is None or node.next is None:
            return node

        nxt = recursive(node.next)  # the list behind node, already deduplicated
        if nxt.val == node.val:
            node.next = node.next.next
        return node

0.3.5 Exercises
Basic operations:

1. 237. Delete Node in a Linked List (easy, delete only given current
node)

2. 2. Add Two Numbers (medium)

3. 92. Reverse Linked List II (medium, reverse in one pass)

4. 83. Remove Duplicates from Sorted List (easy)

5. 82. Remove Duplicates from Sorted List II (medium)

6. Sort List

7. Reorder List

Fast-slow pointers:

1. 876. Middle of the Linked List (easy)

2. Two Pointers in Linked List

3. Merge K Sorted Lists

Recursive and linked list:

1. 369. Plus One Linked List (medium)

0.4 Stack and Queue


The stack data structure fits well for tasks that require us to check
previous states from the closest level to the furthest level. Here are some
exemplary applications: (1) reversing an array, (2) implementing DFS
iteratively, as we will see in Chapter ??, (3) keeping track of the return
address during function calls, (4) recording the previous states for
backtracking algorithms.
The queue data structure can be used to: (1) implement BFS, shown in
Chapter ??, (2) implement a queue buffer.
In the remainder of this section, we discuss implementations with the
built-in data types or built-in modules. After this, we will learn more
advanced queues and stacks: the priority queue and the monotone queue,
which can be used to solve medium to hard problems on LeetCode.

0.4.1 Basic Implementation


For the queue and stack data structures, the two essential operations are
adding and removing an item. In a stack, they are usually called PUSH and
POP. PUSH adds one item, and POP removes one item and returns its value.
These two operations should only take O(1) time. Sometimes we need another
operation called PEEK, which just returns the element that can be accessed
in the queue or stack without removing it. In a queue, the two operations
are named Enqueue and Dequeue.

The simplest implementation is to use a Python list with the functions
insert() (inserts an item at the given position), pop() (removes the
element at the given index, updates the list, and returns the value; the
default is to remove the last item), and append(). However, the list data
structure cannot always meet the time complexity requirement: append() and
pop() at the end are amortized O(1), but insert(0, x) and pop(0) at the
front take O(n) because all remaining items are shifted. We still show it
because the code is simple and saves you from using a specific module or
implementing a more complex structure.

Stack The implementation of a stack is simply adding and deleting elements
at the end.
    # stack
    s = []
    s.append(3)
    s.append(4)
    s.append(5)
    s.pop()

Queue For a queue, we can append at the end and always pop from the first
index. Or we can insert at the first index and pop the last element.
    # queue
    # 1: use append and pop
    q = []
    q.append(3)
    q.append(4)
    q.append(5)
    q.pop(0)

Running the above code and printing gives us the following output:
    print('stack:', s, ' queue:', q)
    stack: [3, 4]  queue: [4, 5]

The other way is to write a class and implement the operations using the
concept of a node, which shares the same definition as the linked list
node. Such an implementation can satisfy the O(1) time restriction. For
both the stack and the queue, we utilize the singly linked list data
structure.

Stack and Singly Linked List with top pointer Because in a stack we only
need to add or delete an item at one end, we use one pointer, top, pointing
at the most recent item, and each node's next is connected to the item
below it, in the direction from the top to the bottom.
    # stack with linked list
    '''a<-b<-c<-top'''
    class Stack:
        def __init__(self):
            self.top = None
            self.size = 0

        # push
        def push(self, val):
            node = Node(val)
            if self.top:  # connect the new node to the old top
                node.next = self.top
            # reset the top pointer
            self.top = node
            self.size += 1

        def pop(self):
            if self.top:
                val = self.top.val
                if self.top.next:
                    self.top = self.top.next  # reset top
                else:
                    self.top = None
                self.size -= 1
                return val
            else:  # no element to pop
                return None

Queue and Singly Linked List with Two Pointers For a queue, we need to
access items at both ends, therefore we use two pointers pointing at the
head and the tail of the singly linked list. The linking direction is from
the head to the tail.
    # queue with linked list
    '''head->a->b->tail'''
    class Queue:
        def __init__(self):
            self.head = None
            self.tail = None
            self.size = 0

        # push
        def enqueue(self, val):
            node = Node(val)
            if self.head and self.tail:  # connect tail and node
                self.tail.next = node
                self.tail = node
            else:
                self.head = self.tail = node
            self.size += 1

        def dequeue(self):
            if self.head:
                val = self.head.val
                if self.head.next:
                    self.head = self.head.next  # reset head
                else:
                    self.head = None
                    self.tail = None
                self.size -= 1
                return val
            else:  # no element to pop
                return None

Also, Python provides two built-in tools for this purpose: the deque class
in the collections module and the queue module. We will detail them in the
next sections.

0.4.2 Deque: Double-Ended Queue


The deque object is a supplementary container data type from Python's
collections module. It is a generalization of stacks and queues, and the
name is short for "double-ended queue". Deque is optimized for
adding/popping items at both ends of the container in O(1). Thus it is
preferred over list in some cases. To create a deque object, we use
deque([iterable[, maxlen]]). This returns a new deque object initialized
left-to-right with data from iterable. If maxlen is not specified or is set
to None, the deque may grow to an arbitrary length. Before using it, we
first learn the methods of the deque class in Table 6.

Table 6: Common Methods of Deque

Method                Description
append(x)             Add x to the right side of the deque.
appendleft(x)         Add x to the left side of the deque.
pop()                 Remove and return an element from the right side of the deque. If no elements are present, raises an IndexError.
popleft()             Remove and return an element from the left side of the deque. If no elements are present, raises an IndexError.
maxlen                Read-only attribute: maximum size of the deque, or None if unbounded.
count(x)              Count the number of deque elements equal to x.
extend(iterable)      Extend the right side of the deque by appending elements from the iterable argument.
extendleft(iterable)  Extend the left side of the deque by appending elements from iterable. Note, the series of left appends results in reversing the order of elements in the iterable argument.
remove(value)         Remove the first occurrence of value. If not found, raises a ValueError.
reverse()             Reverse the elements of the deque in-place and then return None.
rotate(n=1)           Rotate the deque n steps to the right. If n is negative, rotate to the left.

In addition to the above, deques support iteration, pickling, len(d),
reversed(d), copy.copy(d), copy.deepcopy(d), membership testing with the in
operator, and subscript references such as d[-1].

Now, we use deque to implement a basic stack and queue. The main methods we
need are: append(), appendleft(), pop(), popleft().
    '''Use deque from collections'''
    from collections import deque
    q = deque([3, 4])
    q.append(5)
    q.popleft()

    s = deque([3, 4])
    s.append(5)
    s.pop()

Printing out q and s:
    print('stack:', s, ' queue:', q)
    stack: deque([3, 4])  queue: deque([4, 5])

Deque and Ring Buffer A Ring Buffer, or Circular Queue, is defined as a
linear data structure in which the operations are performed based on the
FIFO (First In First Out) principle and the last position is connected back
to the first position to make a circle. This normally requires us to
predefine the maximum size of the queue. To implement a ring buffer, we can
use deque as a queue as demonstrated above, and set maxlen when we
initialize the object. Once a bounded-length deque is full, when new items
are added, a corresponding number of items are discarded from the opposite
end.
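A short sketch of this behavior, using only the documented maxlen argument of collections.deque; the values are arbitrary:

```python
from collections import deque

buf = deque(maxlen=3)    # ring buffer holding at most 3 items
for i in range(1, 6):
    buf.append(i)        # once full, the oldest item on the left is dropped
print(buf)               # deque([3, 4, 5], maxlen=3)

buf.appendleft(0)        # adding on the left discards from the right
print(list(buf))         # [0, 3, 4]
```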

0.4.3 Python built-in Module: Queue


The queue module provides thread-safe implementations of stack- and
queue-like data structures. It encompasses three types of queue, as shown
in Table 7. In Python 3 the module is named queue in lower case, whereas
Python 2.x uses Queue; in this book we use Python 3.

Table 7: Datatypes in the queue module. maxsize is an integer that sets the
upper bound on the number of items that can be placed in the queue.
Insertion will block once this size has been reached, until queue items are
consumed. If maxsize is less than or equal to zero, the queue size is
infinite.

Class                                  Data Structure
class queue.Queue(maxsize=0)           Constructor for a FIFO queue.
class queue.LifoQueue(maxsize=0)       Constructor for a LIFO queue.
class queue.PriorityQueue(maxsize=0)   Constructor for a priority queue.

Queue objects (Queue, LifoQueue, or PriorityQueue) provide the public
methods described below in Table 8.

Table 8: Methods of the three queue classes; here we focus on the
single-threaded setting.

Method                                 Description
Queue.put(item[, block[, timeout]])    Put item into the queue.
Queue.get([block[, timeout]])          Remove and return an item from the queue.
Queue.qsize()                          Return the approximate size of the queue.
Queue.empty()                          Return True if the queue is empty, False otherwise.
Queue.full()                           Return True if the queue is full, False otherwise.

Now, using Queue() and LifoQueue() to implement a queue and a stack
respectively is straightforward:

    # python 3
    import queue
    # implementing queue
    q = queue.Queue()
    for i in range(3, 6):
        q.put(i)

    import queue
    # implementing stack
    s = queue.LifoQueue()

    for i in range(3, 6):
        s.put(i)

Printing the objects directly only shows their addresses:
    print('stack:', s, ' queue:', q)
    stack: <queue.LifoQueue object at 0x000001A4062824A8>  queue: <queue.Queue object at 0x000001A4062822E8>

Instead we print with:
    print('stack:')
    while not s.empty():
        print(s.get(), end=' ')
    print('\nqueue:')
    while not q.empty():
        print(q.get(), end=' ')

which gives:
    stack:
    5 4 3
    queue:
    3 4 5
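The third class in Table 7, PriorityQueue, is used the same way. As a brief sketch, assuming items are (priority, task) tuples (the task names are made up), get() always returns the smallest item first:

```python
import queue

pq = queue.PriorityQueue()
for priority, task in [(2, 'write'), (1, 'read'), (3, 'test')]:
    pq.put((priority, task))   # tuples compare by priority first

order = []
while not pq.empty():
    order.append(pq.get())     # smallest item comes out first
print(order)  # [(1, 'read'), (2, 'write'), (3, 'test')]
```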

0.4.4 Bonus
Circular Linked List and Circular Queue The circular queue is a linear data
structure in which the operations are performed based on the FIFO principle
and the last position is connected back to the first position to make a
circle. It is also called a "Ring Buffer". A circular queue can be
implemented with either a list or a circular linked list. If we use a list,
we initialize our queue with a fixed size and None as each value. To find
the position for enqueue(), we use rear = (rear + 1) % size. Similarly, for
dequeue(), we use front = (front + 1) % size to find the next front
position.
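A minimal list-based sketch of this idea follows. The class name CircularQueue and the extra count field are assumptions: tracking the number of stored items is one common way to tell a full buffer from an empty one, and the rear position is derived from front and count using the same modulo arithmetic.

```python
class CircularQueue:
    def __init__(self, size):
        self.buf = [None] * size  # fixed-size storage
        self.size = size
        self.front = 0            # index of the oldest item
        self.count = 0            # number of stored items

    def enqueue(self, val):
        if self.count == self.size:
            return False          # full
        rear = (self.front + self.count) % self.size
        self.buf[rear] = val
        self.count += 1
        return True

    def dequeue(self):
        if self.count == 0:
            return None           # empty
        val = self.buf[self.front]
        self.buf[self.front] = None
        self.front = (self.front + 1) % self.size
        self.count -= 1
        return val


cq = CircularQueue(3)
print([cq.enqueue(v) for v in (1, 2, 3, 4)])  # [True, True, True, False]
print(cq.dequeue())                           # 1
print(cq.enqueue(4), cq.buf)                  # True [4, 2, 3]
```

The last line shows the wrap-around: the freed slot at index 0 is reused for the new item.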

0.4.5 Exercises
Queue and Stack

1. 225. Implement Stack using Queues (easy)

2. 232. Implement Queue using Stacks (easy)

3. 933. Number of Recent Calls (easy)

Queue fits well for buffering problem.

1. 933. Number of Recent Calls (easy)

2. 622. Design Circular Queue (medium)

Write a class RecentCounter to count recent requests.

It has only one method: ping(int t), where t represents some time in
milliseconds.

Return the number of pings that have been made from 3000 milliseconds ago
until now.

Any ping with time in [t - 3000, t] will count, including the current ping.

It is guaranteed that every call to ping uses a strictly larger value of t
than before.

Example 1:

Input: inputs = ["RecentCounter","ping","ping","ping","ping"], inputs = [[],[1],[100],[3001],[3002]]
Output: [null,1,2,3,3]

Analysis: This is a typical buffer problem. When the data no longer fits in
the buffer, we squeeze out the earliest data. Thus, a queue can be used to
save each t, and on each call we squeeze out any time not in the range
[t-3000, t]:
    import collections

    class RecentCounter:

        def __init__(self):
            self.ans = collections.deque()

        def ping(self, t):
            """
            :type t: int
            :rtype: int
            """
            self.ans.append(t)
            # drop pings that are older than t - 3000
            while self.ans[0] < t - 3000:
                self.ans.popleft()
            return len(self.ans)

Monotone Queue

1. 84. Largest Rectangle in Histogram

2. 85. Maximal Rectangle

3. 122. Best Time to Buy and Sell Stock II

4. 654. Maximum Binary Tree

5. 42. Trapping Rain Water

6. 739. Daily Temperatures

7. 321. Create Maximum Number

Obvious applications:

1. 496. Next Greater Element I

2. 503. Next Greater Element II

3. 121. Best Time to Buy and Sell Stock

0.5 Hash Table


0.5.1 Implementation
In this section, we practice the learned concepts and methods by
implementing a hash set and a hash map.

Hash Set Design a HashSet without using any built-in hash table libraries.
To be specific, your design should include these functions: (705. Design
HashSet)
add(value): Insert a value into the HashSet.
contains(value): Return whether the value exists in the HashSet or not.
remove(value): Remove a value in the HashSet. If the value does not exist
in the HashSet, do nothing.

For example:
MyHashSet hashSet = new MyHashSet();
hashSet.add(1);
hashSet.add(2);
hashSet.contains(1); // returns true
hashSet.contains(3); // returns false (not found)
hashSet.add(2);
hashSet.contains(2); // returns true
hashSet.remove(2);
hashSet.contains(2); // returns false (already removed)

Note: (1) All values will be in the range of [0, 1000000]. (2) The number
of operations will be in the range of [1, 10000].
    class MyHashSet:

        def _h(self, k, i):
            # linear probing: slot of the i-th probe for key k
            return (k + i) % self.size

        def __init__(self):
            """
            Initialize your data structure here.
            """
            self.size = 10001
            # None: never used, -1: deleted (keys are non-negative)
            self.slots = [None] * self.size

        def add(self, key: int) -> None:
            free = None  # first reusable slot met while probing
            for i in range(self.size):
                k = self._h(key, i)
                if self.slots[k] == key:
                    return  # already present
                if self.slots[k] == -1 and free is None:
                    free = k
                elif self.slots[k] is None:  # `not slot` would misread key 0
                    if free is None:
                        free = k
                    break
            if free is not None:
                self.slots[free] = key
                return
            # table is full of live keys: double the size and rehash
            old = self.slots
            self.size *= 2
            self.slots = [None] * self.size
            for x in old + [key]:
                self.add(x)

        def remove(self, key: int) -> None:
            for i in range(self.size):
                k = self._h(key, i)
                if self.slots[k] == key:
                    self.slots[k] = -1  # leave a deleted marker
                    return
                elif self.slots[k] is None:
                    return

        def contains(self, key: int) -> bool:
            """
            Returns true if this set contains the specified element
            """
            for i in range(self.size):
                k = self._h(key, i)
                if self.slots[k] == key:
                    return True
                elif self.slots[k] is None:
                    return False
            return False

Hash Map Design a HashMap without using any built-in hash table libraries.
To be specific, your design should include these functions: (706. Design
HashMap (easy))

• put(key, value): Insert a (key, value) pair into the HashMap. If the key
already exists in the HashMap, update the value.

• get(key): Returns the value to which the specified key is mapped, or -1
if this map contains no mapping for the key.

• remove(key): Remove the mapping for the key if this map contains a
mapping for the key.

Example:
hashMap = MyHashMap()
hashMap.put(1, 1);
hashMap.put(2, 2);
hashMap.get(1); // returns 1
hashMap.get(3); // returns -1 (not found)
hashMap.put(2, 1); // update the existing value
hashMap.get(2); // returns 1
hashMap.remove(2); // remove the mapping for 2
hashMap.get(2); // returns -1 (not found)

    class MyHashMap:
        def _h(self, k, i):
            # linear probing: slot of the i-th probe for key k
            return (k + i) % self.size

        def __init__(self):
            """
            Initialize your data structure here.
            """
            self.size = 10001
            # None: never used, (-1, None): deleted
            self.slots = [None] * self.size

        def put(self, key: int, value: int) -> None:
            """
            value will always be non-negative.
            """
            free = None  # first reusable slot met while probing
            for i in range(self.size):
                k = self._h(key, i)
                if self.slots[k] is None:
                    if free is None:
                        free = k
                    break
                if self.slots[k][0] == key:  # key exists: update in place
                    self.slots[k] = (key, value)
                    return
                if self.slots[k][0] == -1 and free is None:
                    free = k
            if free is not None:
                self.slots[free] = (key, value)
                return
            # table is full of live pairs: double the size and rehash
            old = self.slots
            self.size *= 2
            self.slots = [None] * self.size
            for k_, v_ in old + [(key, value)]:
                self.put(k_, v_)

        def get(self, key: int) -> int:
            """
            Returns the value to which the specified key is mapped,
            or -1 if this map contains no mapping for the key
            """
            for i in range(self.size):
                k = self._h(key, i)
                if self.slots[k] is None:
                    return -1
                elif self.slots[k][0] == key:
                    return self.slots[k][1]
                # another key or a deleted slot: keep probing
            return -1

        def remove(self, key: int) -> None:
            """
            Removes the mapping of the specified key if this map
            contains a mapping for the key
            """
            for i in range(self.size):
                k = self._h(key, i)
                if self.slots[k] is None:
                    return
                elif self.slots[k][0] == key:
                    self.slots[k] = (-1, None)
                    return
                # another key or a deleted slot: keep probing

0.5.2 Python Built-in Data Structures


SET and Dictionary
In Python, the standard built-in data structures dictionary and set are
implemented with hash tables. Accordingly, the requirements for set
elements are the same as those for dictionary keys; namely, that the object
defines both __eq__() and __hash__() methods. The built-in function
hash(object) applies the object's hashing function and returns an integer
hash value, provided the object defines __hash__(). A key must therefore be
hashable: its hash value must never change during its lifetime, and it must
be comparable to other objects (it has an __eq__() method; Python 2 also
accepted __cmp__()). In practice, the built-in immutable types are
hashable, whereas mutable containers such as list, dict, and set are not.
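As a small sketch of these requirements (the Point class is a made-up example): a user-defined class becomes usable as a set element or dictionary key once it defines consistent __eq__() and __hash__() methods, while a mutable list is rejected by hash().

```python
class Point:
    """Hypothetical example class; treat instances as immutable."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # equal points must produce equal hash values
        return hash((self.x, self.y))


seen = {Point(1, 2)}
print(Point(1, 2) in seen)  # True

list_is_hashable = True
try:
    hash([1, 2])            # lists are mutable, hence unhashable
except TypeError:
    list_is_hashable = False
print(list_is_hashable)     # False
```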

Python 2.X VS Python 3.X In Python 2.X, we can index or slice the result of
keys() or items() of a dictionary, because they return lists. However, in
Python 3.X they return view objects, and the same syntax gives us
TypeError: 'dict_keys' object does not support indexing. Instead, we need
to use the function list() to convert it to a list first. For example, for
a dictionary d:
    # Python 2.x
    d.keys()[0]

    # Python 3.x
    list(d.keys())[0]

set Data Type

Method                          Description
remove()                        Removes an element from the set (raises KeyError if it is missing)
add()                           Adds an element to the set
copy()                          Returns a shallow copy of the set
clear()                         Removes all elements from the set
difference()                    Returns the difference of two sets
difference_update()             Removes from the calling set all elements found in the other sets
discard()                       Removes an element from the set (no error if it is missing)
intersection()                  Returns the intersection of two or more sets
intersection_update()           Updates the calling set with the intersection of sets
isdisjoint()                    Checks whether two sets are disjoint
issubset()                      Checks if a set is a subset of another set
issuperset()                    Checks if a set is a superset of another set
pop()                           Removes and returns an arbitrary element
symmetric_difference()          Returns the symmetric difference
symmetric_difference_update()   Updates the set with the symmetric difference
union()                         Returns the union of sets
update()                        Adds elements to the set
If we want to put a whole string in a set as one element, it should be done
like this:
    >>> a = set('aardvark')   # a string is iterated character by character
    >>> a
    {'d', 'v', 'a', 'r', 'k'}
    >>> b = {'aardvark'}      # or set(['aardvark']), convert a list of strings to a set
    >>> b
    {'aardvark'}
    >>> c = {(1, 2)}          # or put a tuple in the set, same as set([(1, 2)])
    >>> c
    {(1, 2)}

Compare also the difference between {} and set() with a single word argument:
set('aardvark') builds a set of characters, whereas {'aardvark'} builds a set
holding one string.

dict Data Type

    Method         Description
    clear()        Removes all the elements from the dictionary
    copy()         Returns a shallow copy of the dictionary
    fromkeys()     Returns a dictionary with the specified keys and value
    get()          Returns the value of the specified key
    items()        Returns a view (a list in Python 2) containing a tuple for
                   each key-value pair
    keys()         Returns a view (a list in Python 2) of the dictionary's keys
    pop()          Removes the element with the specified key and returns its
                   value
    popitem()      Removes and returns the last inserted key-value pair
    setdefault()   Returns the value of the specified key. If the key does not
                   exist: insert the key, with the specified value
    update()       Updates the dictionary with the specified key-value pairs
    values()       Returns a view (a list in Python 2) of all the values in
                   the dictionary
See use cases at https://www.programiz.com/python-programming/dictionary.

Collections Module
OrderedDict. Before Python 3.7, standard dictionaries were unordered, which
means that any time you loop through a dictionary, you will go through every
key, but you are not guaranteed to get them in any particular order. (Since
Python 3.7, the built-in dict also preserves insertion order.) The OrderedDict
from the collections module is a special type of dictionary that keeps track
of the order in which its keys were inserted. Iterating the keys of an
OrderedDict has predictable behavior. This can simplify testing and debugging
by making all the code deterministic.
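As a small illustration of what OrderedDict offers beyond insertion order, here is a hedged sketch using its `move_to_end()` and `popitem(last=False)` methods. The keys and values are made up.

```python
from collections import OrderedDict

od = OrderedDict()
od["a"] = 1
od["b"] = 2
od["c"] = 3

od.move_to_end("a")               # 'a' becomes the last key
oldest = od.popitem(last=False)   # pop from the front instead of the back
keys = list(od)                   # remaining keys, in order

# Equality between two OrderedDicts is order-sensitive,
# while equality between plain dicts is not.
same_order = OrderedDict(x=1, y=2) == OrderedDict(y=2, x=1)
plain_same = dict(x=1, y=2) == dict(y=2, x=1)
print(oldest, keys, same_order, plain_same)
```

These two reordering operations are what make OrderedDict convenient for structures such as a least-recently-used cache.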

defaultdict. Dictionaries are useful for bookkeeping and tracking statistics.
One problem is that when we try to update an entry, we do not know whether
the key is already present, which requires us to check this condition every
time.
    d = {}
    key = "counter"
    if key not in d:
        d[key] = 0
    d[key] += 1

The defaultdict class from the collections module simplifies this process by
assigning a default value when a key is not present. The default depends on
the value type; for example, for int the default value is 0. A defaultdict
works exactly like a normal dict, but it is initialized with a function
(“default factory”) that takes no arguments and provides the default value
for a nonexistent key. Therefore, looking up a key with d[key] on a
defaultdict never raises a KeyError: any key that does not exist gets the
value returned by the default factory. For example, the following code uses a
lambda function to provide 'Vanilla' as the default value when a key is not
assigned, and the second code snippet functions as a counter.
    from collections import defaultdict
    ice_cream = defaultdict(lambda: 'Vanilla')
    ice_cream['Sarah'] = 'Chunky Monkey'
    ice_cream['Abdul'] = 'Butter Pecan'
    print(ice_cream['Sarah'])
    # Chunky Monkey
    print(ice_cream['Joe'])
    # Vanilla

    from collections import defaultdict
    d = defaultdict(int)  # default value for int is 0
    d['counter'] += 1

Time complexity of the search, insert, and delete operations: O(1) on
average.

Counter

0.5.3 Exercises
1. 349. Intersection of Two Arrays (easy)

2. 350. Intersection of Two Arrays II (easy)

929. Unique Email Addresses


    Every email consists of a local name and a domain name, separated by
    the @ sign.

    For example, in alice@leetcode.com, alice is the local name, and
    leetcode.com is the domain name.

    Besides lowercase letters, these emails may contain '.'s or '+'s.

    If you add periods ('.') between some characters in the local name part
    of an email address, mail sent there will be forwarded to the same
    address without dots in the local name. For example,
    "alice.z@leetcode.com" and "alicez@leetcode.com" forward to the same
    email address. (Note that this rule does not apply for domain names.)

    If you add a plus ('+') in the local name, everything after the first
    plus sign will be ignored. This allows certain emails to be filtered,
    for example m.y+name@email.com will be forwarded to my@email.com.
    (Again, this rule does not apply for domain names.)

    It is possible to use both of these rules at the same time.

    Given a list of emails, we send one email to each address in the list.
    How many different addresses actually receive mails?

    Example 1:

    Input: ["test.email+alex@leetcode.com",
            "test.e.mail+bob.cathy@leetcode.com",
            "testemail+david@lee.tcode.com"]
    Output: 2
    Explanation: "testemail@leetcode.com" and "testemail@lee.tcode.com"
    actually receive mails

    Note:
    1 <= emails[i].length <= 100
    1 <= emails.length <= 100
    Each emails[i] contains exactly one '@' character.

Answer: Use a hash-based set of tuples, each holding the normalized local
name and the domain name of the address that actually receives the mail:
    class Solution:
        def numUniqueEmails(self, emails):
            """
            :type emails: List[str]
            :rtype: int
            """
            if not emails:
                return 0
            handledEmails = set()
            for email in emails:
                local_name, domain_name = email.split('@')
                local_name = local_name.split('+')[0]
                local_name = local_name.replace('.', '')
                handledEmails.add((local_name, domain_name))
            return len(handledEmails)
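To check the answer against Example 1, the same logic can be run as a standalone function. This is only a sketch; the function name `num_unique_emails` is chosen here for illustration.

```python
def num_unique_emails(emails):
    """Count distinct receiving addresses after applying the '.' and '+' rules."""
    unique = set()
    for email in emails:
        local, domain = email.split('@')
        # Drop everything after the first '+', then remove the dots.
        local = local.split('+')[0].replace('.', '')
        unique.add((local, domain))
    return len(unique)


emails = [
    "test.email+alex@leetcode.com",
    "test.e.mail+bob.cathy@leetcode.com",
    "testemail+david@lee.tcode.com",
]
result = num_unique_emails(emails)
print(result)
```

Storing the (local name, domain name) pair as a tuple works because tuples of strings are hashable.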

0.6 Graph Representations


The graph data structure can be thought of as a superset of the array, linked
list, and tree data structures. In this section, we only introduce the
representation and implementation of the graph, and defer the searching
strategies to the principle part. Searching strategies on graphs are a
starting point of algorithmic problem solving; knowing and analyzing these
strategies in detail deserves an independent chapter as a problem solving
principle.

0.6.1 Introduction
A graph representation needs to carry the full information of the graph
itself, G = (V, E): its vertices, its edges, and its weights, so that we can
tell whether it is directed or undirected, weighted or unweighted. There are
generally four ways: (1) Adjacency Matrix, (2) Adjacency List, (3) Edge List,
and (4) optionally, Tree Structure, if the graph is a free tree. Each is
preferred in different situations. An example is shown in Fig. 3.

Figure 3: Four ways of graph representation, renumbered from 0. Redraw the
graph.

Double Edges in Undirected Graphs. In a directed graph, the number of edges
is denoted as |E|. In an undirected graph, however, one edge (u, v) only
means that vertices u and v are connected: we can reach v from u, and it also
works the other way around. To represent an undirected graph, we therefore
have to store each edge twice; the number of stored edges becomes 2|E| in all
of our representations.

Adjacency Matrix
An adjacency matrix of a graph is a 2-D matrix of size |V| × |V|: each
dimension, row and column, is vertex-indexed. Assume our matrix is am. If
there is an edge between vertices 3 and 4, and the graph is unweighted, we
mark it by setting am[3][4]=1; we do the same for all edges and leave all
other spots in the matrix zero-valued. For an undirected graph, the matrix is
symmetric along the main diagonal, as shown in A of Fig. 3; the matrix is its
own transpose: am = am^T. We can choose to store only the entries on and
above the diagonal of the matrix, thereby cutting the memory need in half.
For an unweighted graph, the adjacency matrix is typically zero-and-one
valued. For a weighted graph, the adjacency matrix becomes a weight matrix,
with w(i, j) denoting the weight of edge (i, j); the weight can be negative,
positive, or even zero in practice, so we need a way to distinguish the
non-edge relation from the edge relation when the situation arises.
The Python code that implements the adjacency matrix for the graph in the
example is:
    am = [[0] * 7 for _ in range(7)]

    # set 8 edges
    am[0][1] = am[1][0] = 1
    am[0][2] = am[2][0] = 1
    am[1][2] = am[2][1] = 1
    am[1][3] = am[3][1] = 1
    am[2][4] = am[4][2] = 1
    am[3][4] = am[4][3] = 1
    am[4][5] = am[5][4] = 1
    am[5][6] = am[6][5] = 1
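For a weighted graph, one possible way to tell a non-edge from an edge of weight zero is to fill the matrix with a sentinel instead of 0. The sketch below assumes `math.inf` as that sentinel; the weights and the helper names `has_edge` and `neighbors` are made up for illustration.

```python
import math

n = 7
# math.inf marks "no edge", so 0 and negative numbers remain valid weights.
wm = [[math.inf] * n for _ in range(n)]

# Hypothetical weights for three edges of the example graph.
weighted_edges = [(0, 1, 4), (0, 2, 0), (1, 2, -3)]
for u, v, w in weighted_edges:
    wm[u][v] = wm[v][u] = w   # undirected: store both directions


def has_edge(m, u, v):
    """O(1) edge check."""
    return m[u][v] != math.inf


def neighbors(m, u):
    """O(|V|) scan of row u, regardless of how many neighbors u has."""
    return [v for v, w in enumerate(m[u]) if w != math.inf]


print(has_edge(wm, 0, 2), neighbors(wm, 0))
```

Using `None` as the sentinel would work equally well; `math.inf` has the side benefit of behaving sensibly in shortest-path comparisons.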

Applications. An adjacency matrix usually fits dense graphs well, where the
number of edges is close to |V|^2, leaving only a small fraction of the matrix
blank and unused. Checking whether an edge exists between two vertices takes
only O(1). However, an adjacency matrix always requires O(|V|) time to
enumerate the neighbors of a vertex v, an operation commonly used in many
graph algorithms, even if vertex v has only a few neighbors. Moreover, when
the graph is sparse, an adjacency matrix is inefficient in both space and
iteration cost; a better option is the adjacency list.

Adjacency List
An adjacency list is a more compact and space efficient form of graph
representation compared with the above adjacency matrix. In an adjacency
list, we have a vertex-indexed list of |V| vertices, and for each vertex v we
store another list of its neighboring nodes, which can be represented with an
array or a linked list. For example, with the adjacency list
[[1, 2, 3], [3, 1], [4, 6, 1]], node 0 connects to 1, 2, 3, node 1 connects to
3, 1, and node 2 connects to 4, 6, 1.
In Python, we can use a normal 2-d array to represent the adjacency list; the
same graph in the example is represented with the following code:
    al = [[] for _ in range(7)]

    # set 8 edges (each undirected edge appears in both endpoints' lists)
    al[0] = [1, 2]
    al[1] = [0, 2, 3]
    al[2] = [0, 1, 4]
    al[3] = [1, 4]
    al[4] = [2, 3, 5]
    al[5] = [4, 6]
    al[6] = [5]

Applications. The upper bound of the space complexity of the adjacency list
is O(|V|^2). However, with an adjacency list, checking whether there is an
edge between nodes u and v takes O(|V|) time in the worst case, with a linear
scan of the list al[u]. If the graph is static, meaning we do not add more
vertices but may modify the current edges and their weights, we can use a set
or a dictionary Python data type on the second dimension of the adjacency
list. This change enables O(1) search of an edge, just as in the adjacency
matrix.
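The set-based variant described above could look like the following sketch. The variable names `edges` and `als` are chosen here for illustration; the eight edges are those of the example graph.

```python
# The 8 undirected edges of the example graph.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (4, 5), (5, 6)]

# Vertex-indexed outer list, with a set of neighbors per vertex.
als = [set() for _ in range(7)]
for u, v in edges:
    als[u].add(v)
    als[v].add(u)

# Average O(1) membership test instead of a linear scan of a list.
edge_1_3 = 3 in als[1]
edge_0_6 = 6 in als[0]
print(edge_1_3, edge_0_6)
```

Enumerating the neighbors of a vertex is still proportional to its degree, so the change does not give up the main advantage of the adjacency list.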

Edge List
The edge list is a list of edges (one-dimensional), where the index of the list
does not relate to vertex and each edge is usually in the form of (starting
vertex, ending vertex, weight). We can use either a list or a tuple to
represent an edge. The edge list representation of the example is given:
    el = []
    el.extend([[0, 1], [1, 0]])
    el.extend([[0, 2], [2, 0]])
    el.extend([[1, 2], [2, 1]])
    el.extend([[1, 3], [3, 1]])
    el.extend([[3, 4], [4, 3]])
    el.extend([[2, 4], [4, 2]])
    el.extend([[4, 5], [5, 4]])
    el.extend([[5, 6], [6, 5]])

Applications. The edge list is not as widely used as the adjacency matrix
(AM) and the adjacency list (AL), and is usually only needed in a subroutine
of an algorithm implementation, such as in Kruskal's algorithm for finding
the Minimum Spanning Tree (MST), where we might need to order the edges by
weight.
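To make the use of a weighted edge list concrete, here is a hedged sketch of Kruskal's algorithm with a simple union-find. It is one common way to write it, not necessarily the implementation used later in the book; the graph, its weights, and the names `kruskal` and `wel` are made up for illustration.

```python
def kruskal(num_vertices, edges):
    """Return the MST edges of a connected undirected graph.

    edges: list of (u, v, weight) triples, each undirected edge listed once.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Find the representative of x's component, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    # The edge list is ordered by weight; this is the step the text refers to.
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:          # the edge joins two different components
            parent[ru] = rv
            mst.append((u, v, w))
    return mst


# Hypothetical weighted edge list on 5 vertices.
wel = [(0, 1, 7), (0, 2, 3), (1, 2, 1), (1, 3, 9), (2, 4, 4), (3, 4, 2)]
mst = kruskal(5, wel)
print(mst)
```

Here each undirected edge is listed once, because the algorithm treats the pair (u, v) symmetrically.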

Tree Structure
If a connected graph has no cycle and its number of edges is |E| = |V| − 1,
it is essentially a tree. We can choose to represent it with any one of the
three representations above. Optionally, we can use a tree structure formed
as a rooted tree whose nodes hold a value and pointers to their children. We
will see later how this type of tree is implemented in Python.

0.6.2 Use Dictionary


In the last section, we always used vertex-indexed structures. This works but
might not be human-friendly; in practice a vertex usually comes with a
“name”, such as in a system of cities, where a vertex should be a city's
name. Another inconvenience arises when we have no idea of the total number
of vertices: using the index-numbering system requires us to first figure out
all vertices and number each, which is an overhead.
To avoid these two inconveniences, we can replace the adjacency list, which
is a list of lists, with an embedded dictionary structure, which is a
dictionary of dictionaries or sets.

Unweighted Graph. For example, we demonstrate how to give a “name” to each
vertex of the exemplary graph; we replace 0 with 'a', 1 with 'b', and the
others with {'c', 'd', 'e', 'f', 'g'}. We declare defaultdict(set): the outer
list is replaced by the dictionary, and the inner neighboring node list is
replaced with a set for O(1) access to any edge.
In the demo code, we simply construct this representation from the edge
list.
    from collections import defaultdict

    d = defaultdict(set)
    for v1, v2 in el:
        d[chr(v1 + ord('a'))].add(chr(v2 + ord('a')))
    print(d)

And the printed graph is as follows:


    defaultdict(<class 'set'>, {'a': {'b', 'c'}, 'b': {'d', 'c', 'a'},
    'c': {'b', 'e', 'a'}, 'd': {'b', 'e'}, 'e': {'d', 'c', 'f'},
    'f': {'e', 'g'}, 'g': {'f'}})

Weighted Graph. If we need a weight for each edge, we can use a
two-dimensional dictionary. We use 10 as the weight of all edges just to
demonstrate.
    dw = defaultdict(dict)
    for v1, v2 in el:
        vn1 = chr(v1 + ord('a'))
        vn2 = chr(v2 + ord('a'))
        dw[vn1][vn2] = 10
    print(dw)

We can access the edge and its weight through dw[v1][v2]. The output of
this structure is given:
    defaultdict(<class 'dict'>, {'a': {'b': 10, 'c': 10},
    'b': {'a': 10, 'c': 10, 'd': 10}, 'c': {'a': 10, 'b': 10, 'e': 10},
    'd': {'b': 10, 'e': 10}, 'e': {'d': 10, 'c': 10, 'f': 10},
    'f': {'e': 10, 'g': 10}, 'g': {'f': 10}})

0.7 Tree Data Structures


In this section, we focus on implementing a recursive tree structure, since a
free tree works the same way as the graph structure. Also, we have already
covered the implicit structure of a tree in the topic of heap. In this
section, we first implement the recursive tree data structure and the
construction of a tree. In the next section, we discuss the searching
strategies on the tree, namely tree traversal, including both its recursive
and iterative variants.
(Put a figure here of a binary and an N-ary tree.)
A tree is a hierarchical structure of a collection of nodes, represented here
recursively. We define two classes, one for the N-ary tree node and one for
the binary tree node. A node is composed of a variable val saving the data
and children pointers that connect the nodes in the tree.

Binary Tree Node. In a binary tree, there are at most two children pointers,
which we define as left and right. The binary tree node is defined as:
    class BinaryNode:
        def __init__(self, val):
            self.left = None
            self.right = None
            self.val = val

N-ary Tree Node. For the N-ary node, we initialize the length of the node's
children list with an additional argument n.
    class NaryNode:
        def __init__(self, n, val):
            self.children = [None] * n
            self.val = val

In this implementation, the children are ordered by their index in the list.
In real practice, there is a lot of flexibility. It is not necessary to
pre-allocate the length of the children list; we can start with an empty list
[] and append more nodes to it on the fly. We can also replace the list with
a dictionary data type, which might be a better and more space efficient way.

Construct A Tree. Now that we have defined the tree node, the process of
constructing the tree in the figure will be a series of operations:
            1
          /   \
         2     3
        / \     \
       4   5     6

    root = BinaryNode(1)
    left = BinaryNode(2)
    right = BinaryNode(3)
    root.left = left
    root.right = right
    left.left = BinaryNode(4)
    left.right = BinaryNode(5)
    right.right = BinaryNode(6)

We see that the above is not convenient in practice. A more practical way is
to represent the tree with a heap-like array, which treats the tree as a
complete tree. The above binary tree is not complete by definition, so we pad
the missing left child of node 3 with None in the list, and we have the array
[1, 2, 3, 4, 5, None, 6]. The root node has index 0, and given a node with
index i in an n-ary tree, its children are indexed with n ∗ i + j,
j ∈ [1, ..., n]. Thus, a better way to construct the above tree is to start
from the array and traverse the list recursively to build up the tree.
We define a recursive function with two arguments: a, the input array of
nodes, and idx, indicating the position of the current node in the array. At
each recursive call, we construct a BinaryNode and set its left and right
children to the nodes returned by two recursive calls of the same function.
Equivalently, we can say that these two subprocesses, constructTree(a, 2*idx
+ 1) and constructTree(a, 2*idx + 2), build up two subtrees rooted at nodes
2*idx+1 and 2*idx+2 respectively. When there are no items left in the array
to be used, the recursion naturally ends and the function returns None to
indicate an empty node. We give the following Python code:
    def constructTree(a, idx):
        '''
        a: input array of nodes
        idx: index to indicate the location of the current node
        '''
        # Past the end of the array, or a padded None: an empty node.
        # Comparing with None (not truthiness) keeps nodes whose value is 0.
        if idx >= len(a) or a[idx] is None:
            return None
        node = BinaryNode(a[idx])
        node.left = constructTree(a, 2 * idx + 1)
        node.right = constructTree(a, 2 * idx + 2)
        return node

Now, we call this function and pass it our input array:
    nums = [1, 2, 3, 4, 5, None, 6]
    root = constructTree(nums, 0)

Please write a recursive function to construct the N-ary tree given in
Fig. ???
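One possible sketch for this exercise follows the indexing rule n ∗ i + j, j ∈ [1, ..., n] stated earlier. The function name `construct_nary_tree` and the input array are made up for illustration; the array is not the tree of the referenced figure.

```python
class NaryNode:
    def __init__(self, n, val):
        self.children = [None] * n
        self.val = val


def construct_nary_tree(a, idx, n):
    """Build an n-ary tree from a heap-like array, padded with None."""
    if idx >= len(a) or a[idx] is None:
        return None
    node = NaryNode(n, a[idx])
    # The j-th child (1-based) of the node at index idx sits at n * idx + j.
    for j in range(1, n + 1):
        node.children[j - 1] = construct_nary_tree(a, n * idx + j, n)
    return node


# Hypothetical 3-ary tree: 1 has children 2, 3, 4; 2 has children 5, 6;
# 3 has child 7.
nums = [1, 2, 3, 4, 5, 6, None, 7]
root = construct_nary_tree(nums, 0, 3)
print([c.val for c in root.children])
```

With n = 2 the rule reduces to 2*idx + 1 and 2*idx + 2, so the binary version above is a special case.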

In the next section, we discuss tree traversal methods, and we will use those
methods to print out the tree we just built.

0.7.1 LeetCode Problems


To show the nodes at each level, we use the LevelOrder function to print out
the tree:
    def LevelOrder(root):
        q = [root]
        while q:
            new_q = []
            for n in q:
                if n is not None:
                    print(n.val, end=',')
                    if n.left:
                        new_q.append(n.left)
                    if n.right:
                        new_q.append(n.right)
            q = new_q
            print('\n')
    LevelOrder(root)
    # output (empty children are never enqueued, so None is not printed)
    # 1,
    #
    # 2,3,
    #
    # 4,5,6,

Lowest Common Ancestor. The lowest common ancestor is defined between two
nodes p and q as the lowest node in T that has both p and q as descendants
(where we allow a node to be a descendant of itself). There will be two cases
in the LCA problem, which will be demonstrated in the following example.

0.1 Lowest Common Ancestor of a Binary Tree (L236). Given a binary tree, find
the lowest common ancestor (LCA) of two given nodes in the tree. Given the
following binary tree: root = [3,5,1,6,2,0,8,null,null,7,4]
            _______3______
           /              \
        ___5__          ___1__
       /      \        /      \
      6       _2      0        8
             /  \
            7    4

    Example 1:
    Input: root = [3,5,1,6,2,0,8,null,null,7,4], p = 5, q = 1
    Output: 3
    Explanation: The LCA of nodes 5 and 1 is 3.

    Example 2:
    Input: root = [3,5,1,6,2,0,8,null,null,7,4], p = 5, q = 4
    Output: 5
    Explanation: The LCA of nodes 5 and 4 is 5, since a node can be a
    descendant of itself according to the LCA definition.

Solution: Divide and Conquer. There are two cases for LCA: 1) the two nodes
are found in different subtrees, as in example 1; 2) the two nodes are in the
same subtree, with one being an ancestor of the other, as in example 2.
During the tree traversal we compare the current node with p and q; if it
equals either of them, we return the current node. Therefore in example 1, at
node 3, the left subtree returns node 5 and the right subtree returns node 1,
thus node 3 is the LCA. In example 2, the traversal returns 5 at node 5
without going deeper; at node 3, the right subtree returns None, so the only
valid return, node 5, is the final LCA. The time complexity is O(n).
    def lowestCommonAncestor(self, root, p, q):
        """
        :type root: TreeNode
        :type p: TreeNode
        :type q: TreeNode
        :rtype: TreeNode
        """
        if not root:
            return None
        if root == p or root == q:
            # found one valid node (case 1: stop at 5, 1; case 2: stop at 5)
            return root
        left = self.lowestCommonAncestor(root.left, p, q)
        right = self.lowestCommonAncestor(root.right, p, q)
        if left is not None and right is not None:
            # p and q are in different subtrees of root
            return root
        # at most one side found a node: return it, or None if neither did
        return left if left is not None else right
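The solution can be tried on the two examples with a small self-contained harness. This is a sketch under stated assumptions: `TreeNode`, `build`, `find`, and `lca` are helper names chosen here, and `build` reads the input as a heap-like array, which happens to coincide with the problem's level-order list for this particular tree but not for every tree.

```python
class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None


def build(a, idx=0):
    """Build a binary tree from a heap-like array padded with None."""
    if idx >= len(a) or a[idx] is None:
        return None
    node = TreeNode(a[idx])
    node.left = build(a, 2 * idx + 1)
    node.right = build(a, 2 * idx + 2)
    return node


def find(root, val):
    """Return the node holding val (values are unique in this example)."""
    if root is None:
        return None
    if root.val == val:
        return root
    return find(root.left, val) or find(root.right, val)


def lca(root, p, q):
    """Divide and conquer LCA, same logic as the method above."""
    if root is None or root is p or root is q:
        return root
    left = lca(root.left, p, q)
    right = lca(root.right, p, q)
    if left is not None and right is not None:
        return root
    return left if left is not None else right


root = build([3, 5, 1, 6, 2, 0, 8, None, None, 7, 4])
example1 = lca(root, find(root, 5), find(root, 1)).val
example2 = lca(root, find(root, 5), find(root, 4)).val
print(example1, example2)
```

The harness compares nodes by identity (`is`) rather than by value, since the problem passes the nodes themselves as p and q.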

0.8 Heap
A heap is a tree-based data structure that satisfies the heap ordering
property. The ordering can be one of two types:
• the min-heap property: the value of each node is greater than or equal
  (≥) to the value of its parent, with the minimum-value element at the
  root.

• the max-heap property: the value of each node is less than or equal to
  (≤) the value of its parent, with the maximum-value element at the
  root.

Figure 4: A max-heap visualized with a binary tree structure on the left, and
implemented with an array on the right.

Binary Heap. A heap is not a sorted structure but can be regarded as
partially ordered. The maximum number of children of a node in a heap depends
on the type of heap. In the most commonly used heap type, a node has at most
two children, and the heap is known as a binary heap. A min-binary heap is
shown in Fig. 4. Throughout this section the word “heap” will always refer to
a min-heap.
The heap is commonly used to implement a priority queue, where each time the
item of the highest priority is popped out; this can be done in O(log n). As
we go through the book, we will find how often a priority queue is needed to
solve our problems. The heap can also be used in sorting, such as in the
heapsort algorithm.

Heap Representation. A binary heap is always a complete binary tree, in which
each level is fully filled before the next level starts to be filled.
Therefore a binary heap with n nodes has a height of log n. A complete binary
tree can be uniquely represented by storing its level order traversal in an
array. The array representation is more space efficient because no children
pointers are stored for each node.
In the array representation, index 0 is skipped for convenience of
implementation. Therefore, the root is located at index 1. Consider the k-th
item of the array; its parent and children are located as follows:

• its left child is located at index 2 ∗ k,

• its right child is located at index 2 ∗ k + 1,

• and its parent is located at index k/2 (in Python 3, use integer division
  k//2).
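The index relations above translate directly into three small helpers, which in turn give a simple check of the min-heap property. This is a sketch; the helper names and the sample array are illustrative and are not taken from the figures.

```python
def parent(k):
    return k // 2


def left(k):
    return 2 * k


def right(k):
    return 2 * k + 1


def is_min_heap(heap):
    """heap[0] is an unused placeholder; the root is at index 1."""
    # Every non-root node must be >= its parent.
    return all(heap[parent(k)] <= heap[k] for k in range(2, len(heap)))


# Illustrative 1-based min-heap array.
sample = [None, 5, 6, 7, 8, 9, 10, 15, 12]
print(left(3), right(3), parent(8), is_min_heap(sample))
```

Skipping index 0 is what makes the formulas this simple; with 0-based indexing the children of k are at 2k + 1 and 2k + 2 and the parent is at (k − 1)//2.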

0.8.1 Basic Implementation


The basic methods of a heap class should include: push–push an item into
the heap, pop–pop out the first item, and heapify–convert an arbitrary
array into a heap. In this section, we use the heap shown in Fig. 5 as our
example.

Figure 5: A Min-heap.

Push: Percolation Up. When we push an item in, the new item is initially
appended to the end of the heap (as the last element of the array). The heap
property is repaired by comparing the added item with its parent and moving
it up a level (swapping positions with the parent) when it is smaller. This
process is called percolation up. The comparison is repeated until the parent
is smaller than or equal to the percolating item, or the item reaches the
root. Assume the new item is smaller than the existing items in the heap,
such as 5 in our example; there will be violations of the heap property along
the path from the end of the heap to the root. To repair the violations, we
traverse the path and compare the added item with its parent:

• if the parent is smaller than the added item, no action is needed and the
  traversal is terminated, e.g. adding item 18 leads to no action.

• otherwise, swap the item with the parent, and set the node to its parent so
  that the traversal can continue.

Each step fixes the heap ordering property for a subtree. The time complexity
is proportional to the height of the complete tree, which is O(log n).
To generalize the process, a _float() function is first implemented, which
enforces the min-heap ordering property on the path from a given index to the
root.
    def _float(idx, heap):
        while idx // 2:
            p = idx // 2
            # Violation
            if heap[idx] < heap[p]:
                heap[idx], heap[p] = heap[p], heap[idx]
            else:
                break
            idx = p
        return

With _float(), the function push is implemented as:
    def push(heap, k):
        heap.append(k)
        _float(idx=len(heap) - 1, heap=heap)
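As a usage sketch, the two functions can be exercised by pushing a few items one at a time into an empty heap. The block repeats `_float` and `push` so that it runs on its own; the starting value `[None]` is the placeholder for the skipped index 0, and the input items are arbitrary.

```python
def _float(idx, heap):
    # Percolate the item at idx up while it is smaller than its parent.
    while idx // 2:
        p = idx // 2
        if heap[idx] < heap[p]:
            heap[idx], heap[p] = heap[p], heap[idx]
        else:
            break
        idx = p


def push(heap, k):
    heap.append(k)
    _float(idx=len(heap) - 1, heap=heap)


heap = [None]                    # index 0 is an unused placeholder
for item in [21, 1, 45, 78, 3, 5]:
    push(heap, item)
print(heap)
```

Building a heap this way costs O(log n) per push, so O(n log n) for n items in the worst case.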

Pop: Percolation Down. When we pop out an item, no matter whether it is the
root item or any other item in the heap, an empty spot appears at that
location. We first move the last item in the heap to this spot, and then
start to repair the heap ordering property by comparing the new item at this
spot with its children:
• if one of its children has a smaller value than this item, swap this item
  with the smaller child and set the location to that child's location. Then
  continue.

• otherwise, the process is done.

Similarly, this process is called percolation down. Its time complexity is
the same as that of the insert, O(log n). We demonstrate this process with
two cases:
• if the item is the root, which is the minimum item 5 in our min-heap
  example, we move 12 to the root first. Then we compare 12 with its two
  children, which are 6 and 7. We swap 12 with 6, and continue. The process
  is shown in Fig. 6.

• if the item is any other node instead of the root, say node 7 in our
  example, the process is exactly the same. We move 12 to node 7's position.
  Comparing 12 with its children 10 and 15, 10 and 12 are swapped. With this,
  the heap ordering property is sustained.
We first use a function _sink to implement the percolation down part of the
operation.

Figure 6: Left: delete node 5, and move node 12 to root. Right: 6 is the
smallest among 12, 6, and 7, swap node 6 with node 12.

    def _sink(idx, heap):
        size = len(heap)
        while 2 * idx < size:
            li = 2 * idx
            ri = li + 1
            mi = idx
            if heap[li] < heap[mi]:
                mi = li
            if ri < size and heap[ri] < heap[mi]:
                mi = ri
            if mi != idx:
                # swap index with mi
                heap[idx], heap[mi] = heap[mi], heap[idx]
            else:
                break
            idx = mi

The pop is implemented as:
    def pop(heap):
        val = heap[1]
        # Remove the last item and, unless it was the root itself,
        # move it into the root position
        last = heap.pop()
        if len(heap) > 1:
            heap[1] = last
            _sink(idx=1, heap=heap)
        return val

Heapify. Heapify is a procedure that converts a list to a heap. To heapify a
list, we can naively do it through a series of insertion operations over the
items in the list, which gives us an upper-bound time complexity of
O(n log n). However, a more efficient way is to treat the given list as a
tree and to heapify directly on the list.
To satisfy the heap property, we start from the smallest subtrees, which are
the leaf nodes. Leaf nodes have no children, so they satisfy the heap
property naturally. Therefore we can jump to the last parent node, which is
at position n//2 with 1-based indexing. We apply the percolation down process
as used in the pop operation, which compares the node with its children and
swaps them if the heap property is violated. At the end, the subtree rooted
at this particular node obeys the heap ordering property. We then repeat the
same process for all parent nodes in the list in the range [n/2, 1], that is,
in reversed order of [1, n/2], which guarantees that the final complete
binary tree is a binary heap. This follows a dynamic programming fashion. The
leaf nodes a[n/2 + 1, n] are naturally heaps. Then the subarrays are
heapified in the order a[n/2, n], a[n/2 − 1, n], ..., a[1, n] as we work on
nodes [n/2, 1]. Such a process gives us a tighter upper bound, which is O(n).
We show how the heapify process is applied on a = [21, 1, 45, 78, 3, 5] in
Fig. 7 to Fig. 9.

Figure 7: Heapify: The last parent node 45.

Figure 8: Heapify: On node 1

Figure 9: Heapify: On node 21.

Implementation-wise, the heapify function calls _sink as its subroutine. The
code is shown as:
    def heapify(lst):
        heap = [None] + lst
        n = len(lst)
        for i in range(n // 2, 0, -1):
            _sink(i, heap)
        return heap
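The pieces can be put together on the example list a = [21, 1, 45, 78, 3, 5]. The sketch below repeats `_sink` and `heapify` so that it is self-contained, and uses a `pop` that guards the one-item case (an assumption added here so that the heap can be drained completely).

```python
def _sink(idx, heap):
    # Percolate the item at idx down while a child is smaller.
    size = len(heap)
    while 2 * idx < size:
        li, ri, mi = 2 * idx, 2 * idx + 1, idx
        if heap[li] < heap[mi]:
            mi = li
        if ri < size and heap[ri] < heap[mi]:
            mi = ri
        if mi == idx:
            break
        heap[idx], heap[mi] = heap[mi], heap[idx]
        idx = mi


def heapify(lst):
    heap = [None] + lst              # index 0 is an unused placeholder
    for i in range(len(lst) // 2, 0, -1):
        _sink(i, heap)
    return heap


def pop(heap):
    val = heap[1]
    last = heap.pop()
    if len(heap) > 1:                # something other than the root remains
        heap[1] = last
        _sink(1, heap)
    return val


heap = heapify([21, 1, 45, 78, 3, 5])
after_heapify = list(heap)
# Popping repeatedly yields the items in ascending order (heapsort).
drained = [pop(heap) for _ in range(6)]
print(after_heapify, drained)
```

Draining the heap this way is the idea behind heapsort: O(n) to heapify plus O(log n) per pop.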

Which way is more efficient for building a heap from a list: using insertion
or heapify? What is the efficiency of each method? The experimental result
can be seen in the code.

Try to use the percolation up process to heapify the list.

0.8.2 Python Built-in Library: heapq


When we are solving a problem, unless an implementation is specifically
required, we can always use an existing Python module or package. heapq is
one of the most frequently used libraries in problem solving.
heapq (see footnote 2) is a built-in library in Python that implements the
heap queue algorithm. heapq implements a minimum binary heap on top of a
plain list, and it provides three main functions: heappush, heappop, and
heapify, similar to what we have implemented in the last section. The API
differs from our last section in one aspect: it uses zero-based indexing.
There are three other functions, nlargest, nsmallest, and merge, that come in
handy in practice. These functions are listed and described in Table 9.
Now, let's see some examples.

Min-Heap Given the exemplary list a = [21, 1, 45, 78, 3, 5], we call the
function heapify() to convert it to a min-heap.
    from heapq import heappush, heappop, heapify
    h = [21, 1, 45, 78, 3, 5]
    heapify(h)

The heapified result is h = [1, 3, 5, 78, 21, 45]. Let's try heappop and
heappush:
    heappop(h)
    heappush(h, 15)

The print out for h, after popping 1 and pushing 15, is:
    [3, 21, 5, 78, 45, 15]

2 https://docs.python.org/3.0/library/heapq.html

Table 9: Methods of heapq

Method                                        Description
heappush(h, x)                                Push x onto the heap, maintaining the heap invariant.
heappop(h)                                    Pop and return the smallest item from the heap, maintaining the heap invariant. If the heap is empty, IndexError is raised.
heappushpop(h, x)                             Push x onto the heap, then pop and return the smallest item from the heap. The combined action runs more efficiently than heappush() followed by a separate call to heappop().
heapreplace(h, x)                             Pop and return the smallest item from the heap, and also push the new item x. The pop happens first, so the returned item may be larger than x.
heapify(x)                                    Transform list x into a heap, in place, in linear time.
nlargest(k, iterable, key=fun)                Return a list with the k largest elements of the iterable, compared by key if it is given.
nsmallest(k, iterable, key=fun)               Return a list with the k smallest elements of the iterable, compared by key if it is given.
merge(*iterables, key=None, reverse=False)    Merge multiple sorted inputs into a single sorted output. Returns a generator over the sorted values.
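The difference between heappushpop and heapreplace is easy to miss in the table, so here is a small illustrative sketch. The values are arbitrary; only the documented heapq functions are used.

```python
import heapq

h = [1, 3, 5]
heapq.heapify(h)

# heappushpop pushes first, so an item smaller than the current minimum
# comes straight back and the heap is left unchanged.
a = heapq.heappushpop(h, 0)

# heapreplace pops first, so it returns the old minimum (1) even though
# the new item (0) is smaller.
b = heapq.heapreplace(h, 0)

print(a, b, h)  # 0 1 [0, 3, 5]
```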

nlargest and nsmallest Getting the n largest or smallest items with these
two functions does not require the list to be heapified first with heapify,
because the functions build the heap they need internally.

    from heapq import nlargest, nsmallest
    h = [21, 1, 45, 78, 3, 5]
    nl = nlargest(3, h)
    ns = nsmallest(3, h)

The printout for nl and ns is:

    [78, 45, 21]
    [1, 3, 5]
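Both functions also accept the key argument listed in Table 9. The following sketch, with made-up (priority, task) tuples, selects items by their first field only:

```python
from heapq import nlargest, nsmallest

# Hypothetical (priority, task) pairs used only for illustration.
tasks = [(3, 'e'), (10, 'c'), (5, 'b'), (8, 'd')]

top2 = nlargest(2, tasks, key=lambda t: t[0])
low2 = nsmallest(2, tasks, key=lambda t: t[0])

print(top2)  # [(10, 'c'), (8, 'd')]
print(low2)  # [(3, 'e'), (5, 'b')]
```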

Merge Multiple Sorted Arrays The function merge merges multiple sorted
iterables into a single sorted output, returned as a generator. It assumes
that every input is already sorted. For example:

    from heapq import merge
    a = [1, 3, 5, 21, 45, 78]
    b = [2, 4, 8, 16]
    ab = merge(a, b)

Printing ab directly only shows a generator object with its address in
memory:

    <generator object merge at 0x7fdc93b389e8>

We can use a list comprehension to iterate through ab and save the sorted
result in a list:

    ab_lst = [n for n in ab]

The printout for ab_lst is:

    [1, 2, 3, 4, 5, 8, 16, 21, 45, 78]
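The key and reverse parameters from Table 9 can be sketched as follows. This assumes Python 3.5 or later, where both parameters are available; the input lists are invented for illustration. With reverse=True every input must already be sorted from largest to smallest.

```python
from heapq import merge

# Inputs sorted in descending order, merged in descending order.
desc_a = [78, 45, 21]
desc_b = [16, 8, 4]
merged_desc = list(merge(desc_a, desc_b, reverse=True))
print(merged_desc)  # [78, 45, 21, 16, 8, 4]

# Inputs sorted by length, merged by length.
by_len = list(merge(['a', 'ccc'], ['bb', 'dddd'], key=len))
print(by_len)  # ['a', 'bb', 'ccc', 'dddd']
```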

Max-Heap As we can see, the default heap implemented in heapq enforces the
min-heap property. What if we want a max-heap instead? The library does
offer such functions, but they are intentionally hidden from users as private
helpers, accessed as heapq._[function]_max(). Because they are private, they
are not part of the documented API and may change between Python versions.
For now, we can build a max-heap with the function _heapify_max.

    from heapq import _heapify_max
    h = [21, 1, 45, 78, 3, 5]
    _heapify_max(h)

The printout for h is:

    [78, 21, 45, 1, 3, 5]

Also, in practice, a simple hack for a max-heap is to store the data
negated. Whenever we use the data, we convert it back to the original value.
For example:

    h = [21, 1, 45, 78, 3, 5]
    h = [-n for n in h]
    heapify(h)
    a = -heappop(h)

a will be 78, the largest item in the heap.
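The negation trick can be wrapped so that callers never see the negated values. The class below is a hypothetical helper, not part of heapq, and it only works for items that support unary minus, such as numbers.

```python
import heapq


class MaxHeap:
    """Hypothetical wrapper that stores negated numbers in a heapq min-heap."""

    def __init__(self, items=()):
        self._data = [-x for x in items]
        heapq.heapify(self._data)

    def push(self, x):
        heapq.heappush(self._data, -x)

    def pop(self):
        # Negate again to recover the original value.
        return -heapq.heappop(self._data)

    def __len__(self):
        return len(self._data)


mh = MaxHeap([21, 1, 45, 78, 3, 5])
mh.push(100)
largest_first = [mh.pop() for _ in range(len(mh))]
print(largest_first)  # [100, 78, 45, 21, 5, 3, 1]
```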

With Tuple/List or Customized Object as Items for Heap Any object that
supports the less-than comparison (__lt__()) can be used as an item in a heap
with heapq. When we want our item to include information such as “priority”
and “task”, we can put it in either a tuple or a list. For example, our item
is a list whose first element is the priority and whose second element is
the task id.

    heap = [[3, 'a'], [10, 'b'], [5, 'c'], [8, 'd']]
    heapify(heap)

The printout for heap is:

    [[3, 'a'], [8, 'd'], [5, 'c'], [10, 'b']]

However, if we have multiple tasks with the same priority, the relative
order of these tied tasks cannot be preserved. This is because list items
are compared with the whole list as the key: the first elements are compared
first, and whenever there is a tie, the next elements are compared. For
example, suppose multiple items have 3 as the first value in the list:

    h = [[3, 'e'], [3, 'd'], [10, 'c'], [5, 'b'], [3, 'a']]
    heapify(h)

The printout indicates that the relative ordering of the items [3, 'e'],
[3, 'd'], [3, 'a'] is not kept:

    [[3, 'a'], [3, 'd'], [10, 'c'], [5, 'b'], [3, 'e']]

Keeping the relative order of tasks with the same priority is a requirement of
the priority queue abstract data type. We will see in the next section how a
priority queue can be implemented with heapq.

Modify Items in heapq In the heap, we can change the value of any
item just as we can in a list. However, the change may violate the heap
ordering property, so we need a way to fix it. We have the following two
private functions to use, depending on the kind of change:
• _siftdown(heap, startpos, pos): pos is where the new violation is.
startpos marks how far up we want to restore the heap invariant and is
usually set to 0. Because _siftdown() goes backwards, toward the root, and
compares this node with its parents, we can call this function to fix the
heap when an item’s value is decreased.

• _siftup(heap, pos): In _siftup(), items starting from pos are compared
with their children so that smaller items are sifted up along the way.
Thus, we can call this function to fix the heap when an item’s value is
increased.
We show one example:

    import heapq
    heap = [[3, 'a'], [10, 'b'], [5, 'c'], [8, 'd']]
    heapq.heapify(heap)
    print(heap)

    # Increased value
    heap[0] = [6, 'a']
    heapq._siftup(heap, 0)
    print(heap)
    # Decreased value
    heap[2] = [3, 'a']
    heapq._siftdown(heap, 0, 2)
    print(heap)

The printout is:

    [[3, 'a'], [8, 'd'], [5, 'c'], [10, 'b']]
    [[5, 'c'], [8, 'd'], [6, 'a'], [10, 'b']]
    [[3, 'a'], [8, 'd'], [5, 'c'], [10, 'b']]
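Because _siftup and _siftdown are private, a more conservative option is to re-run the public heapify after changing an item. The sketch below shows that alternative; it costs O(n) per fix instead of O(log n), so it is a trade of speed for relying only on the documented API.

```python
import heapq

heap = [[3, 'a'], [10, 'b'], [5, 'c'], [8, 'd']]
heapq.heapify(heap)

heap[0] = [6, 'a']   # change a value in place, which breaks the invariant
heapq.heapify(heap)  # restore the invariant with the public API, O(n)

smallest = heap[0]
ordered = [heapq.heappop(heap) for _ in range(len(heap))]
print(smallest)  # [5, 'c']
print(ordered)   # [[5, 'c'], [6, 'a'], [8, 'd'], [10, 'b']]
```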

0.9 Priority Queue


A priority queue is an abstract data type (ADT) and an extension of the queue
with the following properties:
1. Each item in the queue has a priority associated with it.

2. An item with higher priority is served (dequeued) before an item with
lower priority.

3. If two items have the same priority, they are served according to their
order in the queue.

Priority queues are commonly applied in:

1. CPU scheduling,

2. Graph algorithms such as Dijkstra’s shortest path algorithm and Prim’s
minimum spanning tree,

3. All queue applications where priority is involved.

The properties of a priority queue demand sorting stability from our chosen
sorting mechanism or data structure. A heap is generally preferred over an
array or a linked list as the underlying data structure for a priority queue.
In fact, the Python class PriorityQueue() from the Python module queue uses
heapq under the hood too. Later, we will see how to implement a priority queue
with heapq and how to use the PriorityQueue() class for our purpose. By
default, the lower the value is, the higher the priority is, making the
min-heap the underlying data structure.

Implement with heapq Library


The core functions heapify(), heappush(), and heappop() of the heapq library
are used in our implementation. In order to implement a priority queue, our
binary heap needs to have the following features:

• Sort stability: when we get two tasks with equal priorities, we return
them in the same order as they were originally added. A potential
solution is to change the original 2-element list [priority, task]
into a 3-element list [priority, count, task]. A list is preferred
because a tuple does not allow item assignment. The entry count records
the original order of the task in the list and serves as a tie-breaker,
so that two tasks with the same priority are returned in the same order
as they were added, which preserves sort stability. Also, since no two
entry counts are the same, the comparison never reaches the task field,
so two tasks are never directly compared with each other. For example,
using the same example as in the last section:
    import itertools
    counter = itertools.count()
    h = [[3, 'e'], [3, 'd'], [10, 'c'], [5, 'b'], [3, 'a']]
    h = [[p, next(counter), t] for p, t in h]

The printout for h is:

    [[3, 0, 'e'], [3, 1, 'd'], [10, 2, 'c'], [5, 3, 'b'], [3, 4, 'a']]

Heapifying h gives us the same order as the original h. The relative
ordering of items tied in priority is preserved.

• Remove arbitrary items or update the priority of an item:
In situations where the priority of a task changes or a pending task needs
to be removed, we have to update or remove an item in the heap. We have
seen how to update an item’s value in O(log n) time with two functions,
_siftdown() and _siftup(). However, how do we remove an arbitrary item? We
could find it and replace it with the last item in the heap. Then,
depending on the comparison between the value of the deleted item and the
last item, one of the two mentioned functions can be applied to restore
the heap.
However, there is a more convenient alternative: whenever we “remove” an
item, we do not actually remove it but simply mark it as “removed”. These
“removed” items will eventually be popped out through the normal pop
operation that comes with the heap data structure, at the same time cost
of O(log n). With this alternative, when we update an item, we mark the
old item as “removed” and add the new item to the heap. Therefore, with
this marking method, we are able to implement a heap wherein arbitrary
removal and update are supported with just three common functions:
heapify, heappush, and heappop.
Let’s use the same example here. We first remove task ‘d’ and then update
the priority of task ‘b’ to 14. Then we use another list vh to get the
relative ordering of the tasks in the heap according to priority.
    from heapq import heappush, heappop
    REMOVED = '<removed-task>'
    # Remove task 'd'
    h[1][2] = REMOVED
    # Update the priority of task 'b' to 14
    h[3][2] = REMOVED
    heappush(h, [14, next(counter), 'b'])
    vh = []
    while h:
        item = heappop(h)
        if item[2] != REMOVED:
            vh.append(item)

The printout for vh is:

    [[3, 0, 'e'], [3, 4, 'a'], [10, 2, 'c'], [14, 5, 'b']]

• Search in constant time: Searching the heap for an arbitrary item,
whether it is the root or not, takes linear time. In practice, tasks
should have unique task ids to distinguish them from each other, which
makes it possible to use a dictionary where the task serves as the key and
the 3-element list as the value (for a list, the value is just a reference
to the list object, not a copy). With the dictionary to help the search,
the time cost decreases to constant. We name this dictionary entry_finder.
Now, we modify the previous code. The following code shows how to add
items to a heap that is associated with entry_finder:
    # A heap associated with entry_finder
    counter = itertools.count()
    entry_finder = {}
    h = [[3, 'e'], [3, 'd'], [10, 'c'], [5, 'b'], [3, 'a']]
    heap = []
    for p, t in h:
        item = [p, next(counter), t]
        heap.append(item)
        entry_finder[t] = item
    heapify(heap)

Then, we execute the remove and update operations with entry_finder.

    REMOVED = '<removed-task>'
    def remove_task(task_id):
        if task_id in entry_finder:
            entry_finder[task_id][2] = REMOVED
            entry_finder.pop(task_id)  # delete from the dictionary
        return

    # Remove task 'd'
    remove_task('d')
    # Update the priority of task 'b' to 14
    remove_task('b')
    new_item = [14, next(counter), 'b']
    heappush(heap, new_item)
    entry_finder['b'] = new_item

In the notebook, we provide a comprehensive class named PriorityQueue
that implements just what we have discussed in this section.
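The three features discussed in this section (sort stability, lazy removal, and constant-time lookup) can be combined into one small class. The sketch below is not the notebook’s class: the name TaskQueue and its method names are hypothetical, and it assumes that task ids are unique and hashable.

```python
import itertools
from heapq import heappush, heappop

REMOVED = '<removed-task>'  # placeholder marking a removed entry


class TaskQueue:
    """Hypothetical stable priority queue with removal and update."""

    def __init__(self):
        self.heap = []
        self.entry_finder = {}            # task id -> [priority, count, task]
        self.counter = itertools.count()  # tie-breaker keeping insertion order

    def add_task(self, task, priority):
        """Add a new task, or update the priority of an existing one."""
        if task in self.entry_finder:
            self.remove_task(task)
        entry = [priority, next(self.counter), task]
        self.entry_finder[task] = entry
        heappush(self.heap, entry)

    def remove_task(self, task):
        """Mark an existing task as removed; raise KeyError if it is absent."""
        entry = self.entry_finder.pop(task)
        entry[2] = REMOVED

    def pop_task(self):
        """Pop the task with the lowest priority value, skipping removed entries."""
        while self.heap:
            priority, _, task = heappop(self.heap)
            if task is not REMOVED:
                del self.entry_finder[task]
                return priority, task
        raise KeyError('pop from an empty priority queue')


q = TaskQueue()
for p, t in [[3, 'e'], [3, 'd'], [10, 'c'], [5, 'b'], [3, 'a']]:
    q.add_task(t, p)
q.remove_task('d')
q.add_task('b', 14)  # update the priority of task 'b'

order = []
while q.entry_finder:
    order.append(q.pop_task())
print(order)  # [(3, 'e'), (3, 'a'), (10, 'c'), (14, 'b')]
```

Tasks ‘e’ and ‘a’ share priority 3 and come out in the order they were added, which is the stability property required by the ADT.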

Implement with PriorityQueue class


The PriorityQueue() class has the same member functions as the classes Queue()
and LifoQueue(), which are shown in Table 8. Therefore, we skip the
introduction. First, we build a queue with:

    from queue import PriorityQueue
    data = [[3, 'e'], [3, 'd'], [10, 'c'], [5, 'b'], [3, 'a']]
    pq = PriorityQueue()
    for d in data:
        pq.put(d)

    process_order = []
    while not pq.empty():
        process_order.append(pq.get())

The printout for process_order, shown as follows, indicates that PriorityQueue
works the same as our heapq: ties in priority are broken by comparing the
task ids, not by insertion order.

    [[3, 'a'], [3, 'd'], [3, 'e'], [5, 'b'], [10, 'c']]

Customized Object If we want a higher value to mean a higher priority,
we can use a customized object that defines the two comparison operators
< and == through the magic functions __lt__() and __eq__(). The code is as
follows:

    class Job():
        def __init__(self, priority, task):
            self.priority = priority
            self.task = task
            return

        def __lt__(self, other):
            try:
                # Reversed on purpose: a larger priority sorts first
                return self.priority > other.priority
            except AttributeError:
                return NotImplemented

        def __eq__(self, other):
            try:
                return self.priority == other.priority
            except AttributeError:
                return NotImplemented
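A short usage sketch for such a class follows. It repeats a compact version of Job so that it runs on its own (the try/except guards are omitted for brevity), and the sample priorities are made up and deliberately distinct, so no tie-breaking is involved.

```python
from queue import PriorityQueue


class Job:
    def __init__(self, priority, task):
        self.priority = priority
        self.task = task

    def __lt__(self, other):
        # Reversed on purpose: a larger priority value counts as "smaller",
        # so the min-heap inside PriorityQueue serves it first.
        return self.priority > other.priority

    def __eq__(self, other):
        return self.priority == other.priority


pq = PriorityQueue()
for p, t in [(3, 'e'), (10, 'c'), (5, 'b')]:
    pq.put(Job(p, t))

served = []
while not pq.empty():
    served.append(pq.get().task)
print(served)  # ['c', 'b', 'e']
```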

Similarly, if we apply the wrapper shown in the heapq section, we can have a
priority queue that has sort stability, supports removing and updating
items, and has constant search time.

In single-threaded programming, is heapq or PriorityQueue more efficient?
In fact, the PriorityQueue implementation uses heapq under the hood
to do all the prioritisation work, with the base Queue class providing the
locking that makes it thread-safe. The heapq module, in contrast, offers
no locking and operates on standard list objects. This makes the heapq
module faster, since there is no locking overhead. In addition, you are
free to use the various heapq functions in different, novel ways, while
PriorityQueue only offers the straight-up queueing functionality.

Hands-on Example
Top K Frequent Elements (L347, medium) Given a non-empty array
of integers, return the k most frequent elements.

    Example 1:
    Input: nums = [1, 1, 1, 2, 2, 3], k = 2
    Output: [1, 2]

    Example 2:
    Input: nums = [1], k = 1
    Output: [1]

Analysis: We first use a hashmap to record each item and its
frequency. Then, the problem becomes obtaining the top k most frequent
items in our counter: we can use either sorting or a heap. Our example
code here is for the purpose of getting familiar with the related Python modules.

• Counter(). Counter() has the function most_common(k), which returns
the top k most frequent items. The time complexity is at most O(n log n);
in CPython, most_common(k) is implemented with heapq.nlargest, which
brings it down to O(n log k).

    from collections import Counter
    def topKFrequent(nums, k):
        return [x for x, _ in Counter(nums).most_common(k)]

• heapq.nlargest(). The complexity is O(n log k), which is better than
O(n log n) when k is much smaller than n.

    from collections import Counter
    import heapq
    def topKFrequent(nums, k):
        count = Counter(nums)
        # Use the count of each item as the comparison key
        return heapq.nlargest(k, count.keys(), key=lambda x: count[x])

key=lambda x: count[x] can also be replaced with key=count.get.

• PriorityQueue(): We put the negated count into the priority queue
so that it behaves as a max-heap.

    from collections import Counter
    from queue import PriorityQueue
    def topKFrequent(nums, k):
        count = Counter(nums)
        pq = PriorityQueue()
        for key, c in count.items():
            pq.put((-c, key))
        return [pq.get()[1] for i in range(k)]
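A further variant, not among the three approaches listed, keeps a min-heap of at most k entries while scanning the counter. It is offered here as a sketch: the function name top_k_frequent is hypothetical, and ties in frequency are broken by comparing the items themselves. With m distinct items the heap work is O(m log k) after the O(n) counting pass.

```python
from collections import Counter
import heapq


def top_k_frequent(nums, k):
    count = Counter(nums)
    heap = []  # min-heap of (frequency, item), never larger than k
    for item, freq in count.items():
        if len(heap) < k:
            heapq.heappush(heap, (freq, item))
        elif freq > heap[0][0]:
            # The new item beats the weakest survivor, so swap it in.
            heapq.heapreplace(heap, (freq, item))
    # Order the k survivors from most to least frequent.
    return [item for freq, item in sorted(heap, reverse=True)]


print(top_k_frequent([1, 1, 1, 2, 2, 3], 2))  # [1, 2]
print(top_k_frequent([1], 1))                 # [1]
```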

0.10 Bonus
Fibonacci heap With a Fibonacci heap, insert() and getHighestPriority()
can be implemented in O(1) amortized time and deleteHighestPriority()
can be implemented in O(log n) amortized time.

0.11 Exercises
Selection with the keyword “kth”. These problems can be solved by
sorting, using a heap, or using quickselect.

1. 703. Kth Largest Element in a Stream (easy)

2. 215. Kth Largest Element in an Array (medium)

3. 347. Top K Frequent Elements (medium)

4. 373. Find K Pairs with Smallest Sums (medium)

5. 378. Kth Smallest Element in a Sorted Matrix (medium)

Priority queue, quicksort, or quickselect

1. 23. Merge k Sorted Lists (hard)

2. 253. Meeting Rooms II (medium)

3. 621. Task Scheduler (medium)
