Prog 2 Chap 45

This document provides an overview of Python's memory model and linked data structures. It explains that in Python, variables store references to objects in memory rather than the objects themselves. These references are managed through a namespace dictionary that maps names to memory locations. The document then demonstrates how Python handles assigning names to objects and changing references through examples. It discusses how reference counting and garbage collection allow Python to efficiently manage memory.

Uploaded by

suh
Copyright
© © All Rights Reserved

Chapter 4 Linked Structures and Iterators

Objectives
• To understand Python's memory model and the concepts of names and references.

• To examine different designs for lists, evaluate when each one is appropriate,
and analyze the efficiency of the methods for each implementation.

• To learn how to write linked structures in Python.

• To understand the iterator design pattern and learn how to write iterators for
container classes in Python.

4.1 Overview
When you first began learning Python, you may not have concerned yourself with the
details of exactly how variables and their values are stored internally by the Python
interpreter. For many simple programs, all you need to know is that variables are
used to store values; however, as you write larger programs and begin to use more
advanced features, it's important to understand exactly what the Python interpreter
is doing when you assign a variable name to a value (an object). Understanding these
details will help you avoid certain kinds of mistakes, allow you to better understand
the efficiency of your code, and open the door to new ways of implementing data
structures. It will also make it easier for you to learn other programming languages
that support a similar memory model and understand the trade-offs when you learn
languages with differing models.


After we cover the details of Python's memory model, we will use that information
to implement lists in a new way, using a so-called linked structure. The linked
implementation makes some operations more efficient and other operations less
efficient than they are for the built-in Python list. Understanding these trade-offs
will allow you to choose the appropriate implementation techniques depending on
what operations are needed by your application. Along the way, we will also discuss
the iterator pattern, a technique that allows client programs to access items in a
collection without making any assumptions about how the collection is implemented.
If you already understand Python references and Python's memory model, you
may be tempted to skip the next section; however, we suggest you read through
it, as these concepts are crucial for understanding many of the topics covered later.
Unless you are a Python expert, you will likely learn something new in this material.

4.2 The Python Memory Model

In traditional programming languages, variables are often thought of as being named
memory locations. Applying that idea to Python, you might think of a variable in
Python as a place, a sort of cubbyhole, corresponding to a location in the computer's
memory where you can store an object. This way of thinking will work pretty well
for simple programs, but it's not a very accurate picture for how Python actually
manages things. In order to avoid confusion with other languages, some people prefer
to talk about names in Python rather than using the traditional term variables.
In Python, a name always refers to some object that is stored in memory. When
you assign a Python name to an object, internally the Python interpreter uses a
dictionary to map that name to the actual memory location where the object is
stored. This dictionary that maintains the mapping from names into objects is
called a namespace. If you later assign the same name to a different object, the
namespace dictionary is modified so that it maps the name to the new memory
location. We are going to walk through an interactive example that demonstrates
what is happening "under the hood." The details of this are a bit tedious, but if
you fully understand them, you will have a much easier time understanding many
of the topics discussed later.
Let's start with a couple of simple assignment statements.

>>> d = 'Dave'
>>> j = d

When the statement d = 'Dave' is executed, Python allocates a string object
containing Dave. The assignment statement j = d causes the name j to refer to the
same object as the name d; it does not create a new string object. A good analogy is
to think of assignment as placing a sticky note with the name written on it onto the
object. At this point, the data object Dave has two sticky notes on it: one with the
name d and one with the name j. Figure 4.1 should help clarify what is happening.
In diagrams such as this, we use an arrow as an intuitive way to show the "value"
of a reference; the computer actually stores a number that is the address of what
our arrow is pointing to.

Figure 4.1: Two variables assigned to an object

Of course, the Python interpreter can't use sticky notes; it keeps track of these
associations internally using the namespace dictionary. We can actually access that
dictionary with the built-in function called locals().

>>> print locals()
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__',
'j': 'Dave', '__doc__': None, 'd': 'Dave'}

In this example, you can see that the local dictionary includes some Python
special names (__builtins__, __name__, and __doc__), some of which you may
recognize. We're not really concerned about those here. The point is that our
assignment statements added the two names d and j to the dictionary. Notice, when
the dictionary is printed, Python shows us the names as keys and a representation
of the actual data objects to which they map as values. Keep in mind that the
namespace dictionary actually stores the address of the object (also called a reference
to the object). Since we usually care about the data, not locations, the Python
interpreter automatically shows us a representation of what is stored at the address,
not the address itself.
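The namespace behavior described above can be checked directly. The following is a minimal sketch written in Python 3 syntax (where print is a function, unlike the Python 2 statements used in this chapter); the function name show_namespace is a hypothetical name chosen only for illustration.

```python
def show_namespace():
    """Return the local namespace dictionary after two assignments."""
    d = 'Dave'
    j = d              # j now refers to the same object as d
    return locals()    # dictionary mapping local names to objects


namespace = show_namespace()
print(sorted(namespace))                   # the names stored as keys
print(namespace['d'] is namespace['j'])    # both names map to one object
```

Calling locals() inside a function keeps the result small, since a function's namespace holds only the names assigned in that function.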
If, out of curiosity, we should want to find the actual address of an object, we
can do that. The Python id function returns a unique identifier for each object;
in most versions of Python, the id function returns the memory address where the
object is stored.
>>> print id(d), id(j)
432128 432128

As you can see by the output of the id function, after the assignment statement
j = d, both the names j and d refer to the same data object. Internally, the Python
interpreter keeps track of the fact that there are two references to the string object
containing "Dave"; this is referred to as the reference count for the object.
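As a rough check of the reference count idea, the sketch below uses sys.getrefcount, which reports the count the interpreter keeps for an object. It is written in Python 3 syntax and assumes the standard CPython interpreter; the exact numbers are an implementation detail, so only the difference between the two readings is meaningful. A list is used instead of a string so that the interpreter does not share or cache the object behind the scenes.

```python
import sys

d = ['Dave']                   # a fresh object with one name
before = sys.getrefcount(d)    # includes a temporary reference made by the call itself
j = d                          # a second name for the same object
after = sys.getrefcount(d)

print(d is j)                  # True: one object, two names
print(id(d) == id(j))          # True: same identifier
print(after - before)          # 1 on CPython: the new name added one reference
```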
Continuing with the example, let's do a couple more assignments.

>>> j = 'John'
>>> print id(d), id(j)
432128 432256
>>> d = 'Smith'
>>> print id(d), id(j)
432224 432256

When we assign j = 'John', a new string object containing "John" is created.
Using our sticky note analogy, we have moved the sticky note j to the newly created
data object containing the string "John". The output of the id function following
the statement j = 'John' shows that the name d still refers to the same object as
before, but the name j now refers to an object at a different memory location. The
reference count for each of the two string objects is now one.
The statement d = 'Smith' makes the name d refer to a new string object
containing "Smith". Note that the address for the string object "Smith" is different
from the string object "Dave". Again, the address that the name maps to changes
when the name is assigned to a different object. This is an important point to note:
Assignment changes what object a variable refers to; it does not have any effect on
the object itself. In this case, the string "Dave" does not change into the string
"Smith", but rather a new string object is created that contains "Smith".
At this point, nothing refers to the string "Dave" so its reference count is now
zero. The Python interpreter automatically deallocates the memory for the string
object containing "Dave", since there is no longer a way to access it. By deallocating
objects that can no longer be accessed (when their reference count changes to zero),
the Python interpreter is able to reuse the same memory locations for new objects
later on. This process is known as garbage collection. Garbage collection adds
some overhead to the Python interpreter that slows down execution. The gain is
that it relieves the programmer from the burden of having to worry about memory
allocation and deallocation, a process that is notoriously knotty and error prone in
languages that do not have automatic memory management.
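One way to observe an object being reclaimed is with a weak reference, which lets us look at an object without counting as a reference to it. The following is a sketch in Python 3 syntax; the Person class is a hypothetical stand-in (plain strings cannot be weakly referenced), and the immediate reclamation shown is the behavior of the standard CPython interpreter rather than a guarantee of the language.

```python
import gc
import weakref


class Person(object):
    """Tiny class used only so that a weak reference can be created."""

    def __init__(self, name):
        self.name = name


d = Person('Dave')
j = d
watcher = weakref.ref(d)      # does not add to the reference count

d = Person('Smith')           # the 'Dave' object is still reachable through j
still_alive = watcher() is j

j = Person('John')            # the last reference to the 'Dave' object is gone
gc.collect()                  # not needed on CPython; may help on other interpreters

print(still_alive)            # True
print(watcher())              # None once the object has been reclaimed
```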
It is also possible for the programmer to explicitly remove the mapping for a
given name.

>>> del d
>>> print locals()
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__',
'j': 'John', '__doc__': None}

The statement del d removes the name d from the namespace dictionary so it can
no longer be accessed. Attempting to execute the statement print d now would
cause a NameError exception to be raised just as if we had never assigned an object
to d. Removing that name reduces the reference count for the string "Smith" from
one to zero so it will now also be garbage collected.
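The effect of del on a name can be confirmed with a short experiment. This sketch uses Python 3 syntax; the variable result is introduced here only to record what happened.

```python
d = 'Smith'
del d                      # removes the name d from the namespace

try:
    d                      # looking up a deleted name fails
except NameError as error:
    result = 'NameError: ' + str(error)
else:
    result = 'd is still defined'

print(result)
```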
There are a number of benefits to Python's memory model. Since a variable
just contains a reference to an object, all variables are the same size (the standard
address size of the computer, usually four or eight bytes). The data type information
is stored with the object. The technical term for this is dynamic typing. That means
the same name can refer to different types as a program executes and the name gets
reassigned. This also makes it very easy for containers such as lists, tuples, and
dictionaries to be heterogeneous (contain multiple types), since they also simply
maintain references to (addresses of) the contained objects.
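A brief illustration of both points, in Python 3 syntax: the same name is bound first to an int and then to a string, and a single list holds objects of several types. The variable names are chosen only for this example.

```python
value = 42
first_type = type(value).__name__      # the type travels with the object
value = 'forty-two'                    # same name, different type of object
second_type = type(value).__name__

mixed = [1, 'two', 3.0, [4, 5], None]  # one list, several element types
element_types = [type(item).__name__ for item in mixed]

print(first_type, second_type)         # int str
print(element_types)                   # ['int', 'str', 'float', 'list', 'NoneType']
```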
The Python memory model also makes assignment a very efficient operation.
An expression in Python always evaluates to a reference to some object. Assigning
the result to a name simply requires that the name be added to the namespace
dictionary (if it's not already present) along with the four- or eight-byte reference.
In a simple assignment like j = d the effect is to just copy d's reference over to j's
namespace entry.
It should be clear by now that Python's memory model makes it trivial (usual,
in fact) for multiple names to refer to the exact same object. This is known as
aliasing, and it can lead to some interesting situations. When multiple names refer
to the same object, changes to the object through one of the names will change the
data that all the names refer to. Thus, changes to the data using one name will be
visible via accesses through other names. Here's a simple illustration using lists.

>>> lst1 = [1, 2, 3]
>>> lst2 = lst1
>>> lst2.append(4)
>>> lst1
[1, 2, 3, 4]

Since lst1 and lst2 refer to the same object, appending 4 to lst2 also affects
lst1. Unless you understand the underlying semantics it seems like lst1 has
changed "magically," since there are no intervening uses of lst1 between the first
and last lines of the interaction. Of course these potentially surprising results of
aliasing crop up only when the shared object happens to be mutable. Things like
strings, ints, and floats simply can't change, so aliasing is not an issue for these
types.
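The contrast between mutable and immutable objects shows up clearly with the += operator. In this sketch (Python 3 syntax), the same operator changes a shared list in place but produces a new object for a string.

```python
lst1 = [1, 2, 3]
lst2 = lst1
lst2 += [4]                 # lists are mutable: += extends the shared object in place

s1 = 'abc'
s2 = s1
s2 += 'd'                   # strings are immutable: += builds a new object for s2

print(lst1, lst1 is lst2)   # [1, 2, 3, 4] True
print(s1, s1 is s2)         # abc False
```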
When we want to avoid the side effects of aliasing, we need to make separate
copies of an object so that changes to one copy won't affect the others. Of course
a complex object such as a list might itself contain references to other objects, and
we have to decide how to handle those references in the copying process. There are
two different types of copies known as shallow copies and deep copies. A shallow
copy has its own top-level references, but those references refer to the same objects
as the original. A deep copy is a completely separate copy that creates both new
references and, where necessary, new data objects at all levels. The Python copy
module contains useful functions for copying arbitrary Python objects. Here's an
interactive example using lists to demonstrate.

>>> import copy
>>> b = [1, 2, [3, 4], 6]
>>> c = b
>>> d = copy.copy(b)      # creates a shallow copy
>>> e = copy.deepcopy(b)  # creates a deep copy
>>> print b is c, b == c
True True
>>> print b is d, b == d
False True
>>> print b is e, b == e
False True

In this code, c is the same list as b, d is a shallow copy, and e is a deep copy. By the
way, there are numerous ways to get a shallow copy of a Python list. We could also
have used slicing (d = b[:]) or list construction (d = list(b)) to create a shallow
copy.
So what's up with the output? The Python is operator tests whether two
expressions refer to the exact same object, whereas the Python == operator tests to
see if two expressions yield equivalent data. That means a is b implies a == b but
not vice versa. In this example, you can see that assignment does not create a new
object since b is c holds after the initial assignment. However, both the shallow
copy d created by copy.copy and the deep copy e are distinct new objects that contain
equivalent data to b. While these copies contain equivalent data, their internal
structures are not identical. As depicted in Figure 4.2, the shallow copy simply
contains a copy of the references at the top level of the list, while the deep copy
contains a copy of the mutable parts of the structure at all levels. Notice that the
deep copy does not need to duplicate the immutable data items since, as mentioned
above, aliasing of immutable objects does not raise any special issues.

Figure 4.2: Pictorial representation of shallow and deep copies

Because of the residual sharing in the shallow copy, we can still get aliasing side
effects. Consider what happens when we start modifying some of these lists.

>>> b[0] = 0
>>> b.append(7)
>>> c[2].append(5)
>>> print b
[0, 2, [3, 4, 5], 6, 7]
>>> print c
[0, 2, [3, 4, 5], 6, 7]
>>> print d
[1, 2, [3, 4, 5], 6]
>>> print e
[1, 2, [3, 4], 6]

Based on Figure 4.2, this output should make sense to you. Changing the top level
of the list referred to by b causes c to change, since it refers to the same object. The
top-level changes have no effect on d or e, since they refer to separate objects that
are copies of b.
Things get interesting when we change the sublist [3, 4] through c. Of course
b sees these changes (since b and c are the same object). But now d also sees
those changes, since this sublist is still a shared substructure in the shallow copy.
Meanwhile, the deep copy e does not see any of these changes; since all of the mutable
structures have been copied at every level, no changes to the object referred to by
b will affect it. Figure 4.3 shows the memory picture at the end of this example.
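The shared substructure can also be verified with the is operator instead of a diagram. The sketch below (Python 3 syntax) rebuilds a list of the same shape and tests whether each copy holds the very same sublist object as the original; the two boolean variable names are introduced only for this illustration.

```python
import copy

b = [1, 2, [3, 4], 6]
d = copy.copy(b)        # shallow copy: new top-level list, same sublist object
e = copy.deepcopy(b)    # deep copy: the sublist is duplicated as well

shallow_shares_sublist = d[2] is b[2]
deep_shares_sublist = e[2] is b[2]

b[2].append(5)          # mutate the sublist through b

print(shallow_shares_sublist, deep_shares_sublist)   # True False
print(d)                # [1, 2, [3, 4, 5], 6]
print(e)                # [1, 2, [3, 4], 6]
```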
As a final note, the sort of complete, reference-based diagrams that we have
been using in this section can take up a lot of space and can sometimes be difficult
to interpret. Since the distinction between reference and value is not crucial in
the case of immutable objects, in an effort to keep our diagrams as straightforward
as possible, we will not generally draw immutable objects as separate data objects
when they are contained within another object. Figure 4.4 shows the same situation
as Figure 4.3 drawn in a more compact style.

Figure 4.3: Memory representation at end of shallow and deep copy example

Figure 4.4: Simplified memory representation at end of shallow and deep copy example

4.2.1 Passing Parameters

Although there seems to be confusion at times among programmers about Python's
parameter-passing mechanism, once you understand Python's memory model, parameter
passing in Python is very simple. Computer scientists use the terminology
actual parameters to refer to the names of the parameters provided when a function
is called and formal parameters to refer to the names given to the parameters in the
function definition. One way to remember this is the actual parameters are where
the function is actually called. In the following example, b, c, and d are the actual
parameters and e, f, and g are formal parameters.

# parameters.py
def func(e, f, g):
    e += 2
    f.append(4)
    g = [8, 9]
    print e, f, g

def main():
    b = 0
    c = [1, 2, 3]
    d = [5, 6, 7]
    func(b, c, d)
    print b, c, d

main()

The output of this example is

2 [1, 2, 3, 4] [8, 9]
0 [1, 2, 3, 4] [5, 6, 7]

The easy way to think of how parameters are passed in Python is that the formal
parameters are assigned to the actual parameters when the function is called. We
cannot do this ourselves because the names e, f, and g are accessible only within
func, while the names b, c, and d are accessible only inside main. But the Python
interpreter handles the assignment behind the scenes for us when main calls func.
The result is that e refers to the same object as b, f refers to the same object as
c, and g refers to the same object as d when the function starts executing. The
statement e += 2 causes the name e to refer to a new object while b still refers to
the object containing zero. Since f and c refer to the same object, when we append
4 onto that object, we see the result when c prints. We assigned the name g to a
new object so g and d now refer to different objects, and thus the printed value of
d remains unchanged.
It is important to note that a function can change the state of an object that
an actual parameter refers to; however, a function cannot change which object the
actual parameter refers to. So information can be communicated to the caller by
passing a mutable object and having the function mutate it via the corresponding
formal parameter. Keep in mind, however, that assigning a new object to a formal
parameter inside the function or method will never change the actual parameter in
any way, regardless of whether the actual parameter is mutable or not.
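The practical consequence is that a function has two ways to communicate a result: mutate an object it was handed, or return a new object for the caller to assign. The following sketch (Python 3 syntax) separates the three cases; the function names mutate, rebind, and add_two are hypothetical and chosen only for illustration.

```python
def mutate(values):
    values.append(4)       # changes the object the caller also refers to


def rebind(values):
    values = [8, 9]        # rebinds the formal parameter only
    return values


def add_two(number):
    return number + 2      # ints are immutable, so hand back a new object


b = 0
c = [1, 2, 3]
d = [5, 6, 7]

mutate(c)
rebind(d)                  # result discarded, so d is unaffected
b = add_two(b)             # the caller rebinds its own name to the result

print(c)                   # [1, 2, 3, 4]
print(d)                   # [5, 6, 7]
print(b)                   # 2
```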

4.3 A Linked Implementation of Lists


With this understanding of Python names and references, we are ready to take a
look at a new way of implementing sequential collections. As you learned in the
last chapter, Python lists are implemented using arrays. The drawback of an array
implementation is the expense of inserting and deleting items. Since the array is
maintained as a contiguous block of memory, inserting into the midst of the array
requires shifting the original contents down to make room for the new item. Deleting
results in a similar effort to close the gap. The fundamental problem here is that the
ordering of the sequence is maintained by using an ordered sequence of addresses in
memory.
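To make the cost of shifting concrete, here is a small sketch (Python 3 syntax) that performs an array-style insertion by hand and counts how many items had to move. It models the contiguous block as a fixed-size Python list with an unused cell at the end; the function insert_with_shift and its parameters are hypothetical, and this is not how the built-in list is actually coded.

```python
def insert_with_shift(cells, count, index, value):
    """Insert value at index in a block of cells holding count items.

    Assumes cells has at least one unused cell after the first count items.
    Returns the number of items that had to be shifted.
    """
    shifted = 0
    # Move items one cell toward the end, starting from the last item.
    for position in range(count, index, -1):
        cells[position] = cells[position - 1]
        shifted += 1
    cells[index] = value
    return shifted


cells = [10, 20, 30, 40, None]    # capacity 5, with 4 cells in use
moves = insert_with_shift(cells, 4, 1, 15)

print(cells)    # [10, 15, 20, 30, 40]
print(moves)    # 3
```

Inserting near the front moves almost every item, so the work grows in proportion to the length of the sequence.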
But this is not the only possible way to maintain sequence information. Instead
of maintaining the sequence information of an item implicitly by its position in
memory, we can instead represent the sequencing explicitly. That is, we can scatter
the elements of the sequence anywhere in memory and have each item "remember"
where the next one in the sequence resides. This approach produces a linked
sequence. To take a concrete example, suppose we have a sequence of numbers
called myNums. Figure 4.5 shows both a contiguous and a linked implementation of
the sequence.

Figure 4.5: Contiguous array on the left and linked version on the right

Notice that the linked version of the sequence does not use a single section of
memory; instead, we create a number of objects (often referred to as nodes) each
of which contains a reference to a data value and a pointer/reference to the next
element in the list. With the explicit references, a node can be stored at any location
in memory at all.
Given our linked implementation of myNums, we can perform all of the same
operations that we can do on the array-based version. For example, to print out all
the items in the sequence, we could use an algorithm like this:

current_node = myNums
while <current_node is not at the end of the sequence>:
    print current_node's data
    current_node = current_node's link to the next node

Implementing this algorithm requires a concrete representation for nodes that
includes a way to get ahold of the two pieces of information in the node (the data
and the link to the next item) and some way to know when we have reached the
end of the sequence. We could do this in a number of ways; probably the most
straightforward is to create a simple ListNode class that does the job.

# ListNode.py
class ListNode(object):

    def __init__(self, item=None, link=None):

        """creates a ListNode with the specified data value and link
        post: creates a ListNode with the specified data value and link"""

        self.item = item
        self.link = link

A ListNode object has an instance variable item to store the data associated
with the node and an instance variable link that stores the next item in the
sequence. Since Python supports dynamic types, the item instance variable can
be a reference to any data type. Thus, just as you can store any data type or a
mixture of data types in the built-in Python list, our linked implementation will also
be able to do that. That just leaves us with the issue of what to do with the link
field to indicate that we have come to the end of a sequence. The special Python
object None is generally used for this purpose.
Let's play around a bit with the ListNode class. The following code creates a
linked sequence of three items.

n3 = ListNode(3)
n2 = ListNode(2, n3)
n1 = ListNode(1, n2)

Figure 4.6: Three ListNodes linked together

Tracing the execution of this code produces the situation depicted in Figure 4.6.
Here each double box corresponds to a ListNode object with a data element and a
link to the next ListNode object. Notice, we have simplified the diagram by showing
the numbers (which are immutable) inside the ListNode boxes instead of drawing
a reference from the item part of the ListNode to the number object. Both n2 and
n1.link are references to the same ListNode object containing the data value 2 and
both n3 and n2.link are references to the same object containing the data value 3.
We can also access the ListNode object containing the data value 3 as n1.link.link
and its data value as n1.link.link.item. Normally, we do not want to write code
such as that, but it demonstrates how each object and data value can be reached
from the start of the linked structure. Typically we only store a reference to the
first ListNode object and then follow the links from the first item to access other
items in the list.
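Following the links from the first node can be written as a short loop. The sketch below (Python 3 syntax) repeats a compact version of the ListNode class so that it runs on its own; the function name items_in is hypothetical, and it collects the items in a Python list rather than printing them so the result is easy to inspect.

```python
class ListNode(object):
    """Node holding a data item and a link to the next node (or None)."""

    def __init__(self, item=None, link=None):
        self.item = item
        self.link = link


def items_in(first_node):
    """Collect the items reachable from first_node by following links."""
    items = []
    current_node = first_node
    while current_node is not None:         # None marks the end of the sequence
        items.append(current_node.item)
        current_node = current_node.link    # step to the next node
    return items


n3 = ListNode(3)
n2 = ListNode(2, n3)
n1 = ListNode(1, n2)

print(items_in(n1))         # [1, 2, 3]
print(n1.link.link.item)    # 3
```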
Suppose we want to insert the value 2.5 into this sequence so that the values
remain in order. The following code accomplishes this:

n25 = ListNode(2.5, n2.link)
n2.link = n25

Figure 4.7 shows this pictorially. The statement n25 = ListNode(2.5, n2.link)
allocates a new ListNode and calls its __init__ method. The first line of __init__,
self.item = item, sets a reference to 2.5 in the ListNode. The next line, self.link
= link, stores a reference to the link parameter that is the ListNode n3. After the
__init__ method finishes, the statement n2.link = n25 sets the link instance
variable of ListNode n2 so it refers to the newly created ListNode n25. None of
the references to ListNode n1 were changed as part of this. Inserting a node in a
linked structure only requires updating the link of the node before the one we are
inserting. Since insertion into a linked structure does not require moving any of the
existing data, it can be done very efficiently.
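The two insertion statements can be wrapped in a helper so that the order of the link updates is fixed in one place. This is a sketch in Python 3 syntax; insert_after is a hypothetical helper name, and a compact ListNode class is repeated so the example is self-contained.

```python
class ListNode(object):
    """Node holding a data item and a link to the next node (or None)."""

    def __init__(self, item=None, link=None):
        self.item = item
        self.link = link


def insert_after(node, item):
    """Insert a new node holding item immediately after node."""
    new_node = ListNode(item, node.link)   # first, point the new node at the old successor
    node.link = new_node                   # then, redirect the predecessor to the new node
    return new_node


n3 = ListNode(3)
n2 = ListNode(2, n3)
n1 = ListNode(1, n2)
n25 = insert_after(n2, 2.5)

values = []
current = n1
while current is not None:
    values.append(current.item)
    current = current.link

print(values)            # [1, 2, 2.5, 3]
print(n25.link is n3)    # True
```

No existing node other than n2 is touched, which is why the cost of the insertion does not depend on the length of the sequence.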
Note that the order in which we update the links is extremely important. If we
change our statements to insert 2.5 to the following, it will not work.

# Incorrect version. It won't work!
n25 = ListNode(2.5)
n2.link = n25
n25.link = n2.link

In this case, the statement n2.link = n25 results in the reference to the ListNode
containing the 3 being overwritten. The reference count for that ListNode will
be reduced by one and if there are no other references to it, the ListNode will be
deallocated. The statement n25.link = n2.link sets the link instance variable
in ListNode n25 to the ListNode n25. This breaks the connections in our linked
structure; it no longer contains the ListNode for 3. It also generates a cycle in our
linked structure. If we write a loop that starts at ListNode n1 and continues to
follow the link instance variables until a link with the value None is reached, we will
have an infinite loop, as the link for ListNode 2.5 refers to ListNode 2.5 itself.
We will just keep going around and around. Programming with linked structures
can get tricky, and the best way to make sure you have things correct is to trace
your code and draw the pictures.

Figure 4.7: Inserting a node in the linked structure (three panels: after self.item = item executes in __init__, after self.link = link executes in __init__, and after n2.link = n25 executes)

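The self-referencing link can be confirmed without getting stuck in the infinite loop by putting a limit on the number of steps taken. In this sketch (Python 3 syntax), walk is a hypothetical helper with an added limit parameter, and a compact ListNode class is repeated so the example runs on its own.

```python
class ListNode(object):
    """Node holding a data item and a link to the next node (or None)."""

    def __init__(self, item=None, link=None):
        self.item = item
        self.link = link


def walk(first_node, limit):
    """Follow links from first_node, stopping after at most limit items."""
    items = []
    current = first_node
    while current is not None and len(items) < limit:
        items.append(current.item)
        current = current.link
    return items


n3 = ListNode(3)
n2 = ListNode(2, n3)
n1 = ListNode(1, n2)

# The incorrect update order:
n25 = ListNode(2.5)
n2.link = n25
n25.link = n2.link          # n2.link is already n25, so n25 now links to itself

print(n25.link is n25)      # True
print(walk(n1, 6))          # [1, 2, 2.5, 2.5, 2.5, 2.5]
```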
Let's consider what has to happen in order to delete an item from our sequence.
To delete the number 2, we need to update the link field for the ListNode object
containing 1 so that it "hops over" the node for 2. The code n1.link = n25
accomplishes this. That's it; deleting from the sequence is even easier than inserting.
If there are no other references to the deleted node, as usual its memory will be
automatically deallocated.
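Deletion can be packaged the same way as insertion. The following sketch (Python 3 syntax) uses a hypothetical helper named delete_after, which assumes the given node actually has a successor; a compact ListNode class is repeated so the example is self-contained.

```python
class ListNode(object):
    """Node holding a data item and a link to the next node (or None)."""

    def __init__(self, item=None, link=None):
        self.item = item
        self.link = link


def delete_after(node):
    """Unlink the node that follows node and return its item.

    Assumes node.link is not None.
    """
    removed = node.link
    node.link = removed.link    # hop over the removed node
    return removed.item


# Build the sequence 1, 2, 2.5, 3.
n3 = ListNode(3)
n25 = ListNode(2.5, n3)
n2 = ListNode(2, n25)
n1 = ListNode(1, n2)

removed_item = delete_after(n1)

values = []
current = n1
while current is not None:
    values.append(current.item)
    current = current.link

print(removed_item)    # 2
print(values)          # [1, 2.5, 3]
```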

Chapter 5 Stacks and Queues

Objectives
• To understand the stack ADT and be familiar with various strategies for
building an efficient stack implementation.

• To gain familiarity with the behavior of a stack and understand and analyze
basic stack-based algorithms.

• To understand the queue ADT and be familiar with various strategies for
building an efficient queue implementation.

• To gain familiarity with the behavior of a queue and understand and analyze
basic queue-based algorithms.

5.1 Overview
In the past two chapters, we have looked in detail at the list data structure. As you
know, a list is a sequential structure. We have also looked at sorted lists, where the
ordering of the items in the list is dictated by the "value" of the item. Sometimes
it is useful for a sequential collection to be ordered according to the time at which
items are added, rather than what the particular item is. In this chapter, we'll take
a look at two simple examples of such structures, called stacks and queues.

5.2 Stacks
A stack is one of the simplest container classes. As you'll see, however,
despite its simplicity, a stack can be amazingly useful.

5.2.1 The Stack ADT


Imagine a list (a sequential data structure) where you have access to the data only
at one end. That is, you can insert and remove items from one end of the list. Also,
you can look at the contents of only the single item at the end of the list (called
the top). The rather restrictive data structure just described is called a stack. You
can think of it as modeling a real-world stack of items: you can only (safely) add
or remove an item at the top of a stack. And if things are stacked neatly, only the
top item is visible.
If you are into sweet confections, you might also think of a stack as the computer
science equivalent of a Pez candy dispenser. By convention our stacks are "spring
loaded," and so adding an item to a stack is called pushing the item onto the stack.
Removing the top item from a stack is called popping it. Notice that the last item
pushed on a stack must always be the first item to be popped back off again. Because
of this, a stack is also referred to as a last in, first out (LIFO) data structure. You
could also call it a FILO structure, and of course a stack of filo dough makes a
delicious pastry. The specification for a typical stack ADT looks like this.

class Stack(object):

    """post: creates an empty LIFO stack"""

    def push(self, x):

        """post: places x on top of the stack"""

    def pop(self):

        """pre: self.size() > 0
        post: removes and returns the top element of
              the stack"""

    def top(self):

        """pre: self.size() > 0
        post: returns the top element of the stack without
              removing it"""

    def size(self):

        """post: returns the number of elements in the stack"""
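One possible way to satisfy this specification is to store the elements in a built-in Python list and treat the end of the list as the top. The following is a sketch under that assumption, written in Python 3 syntax; the instance variable name items is chosen here for illustration, and other designs (such as a linked structure) would meet the same specification.

```python
class Stack(object):
    """LIFO stack backed by a Python list; the end of the list is the top."""

    def __init__(self):
        self.items = []

    def push(self, x):
        self.items.append(x)     # place x on top

    def pop(self):
        return self.items.pop()  # remove and return the top element

    def top(self):
        return self.items[-1]    # look at the top element without removing it

    def size(self):
        return len(self.items)


s = Stack()
for value in [1, 2, 3]:
    s.push(value)

print(s.top())             # 3
print(s.pop(), s.pop())    # 3 2
print(s.size())            # 1
```

The preconditions are left to the caller: calling pop or top on an empty stack raises IndexError from the underlying list.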

5.2.2 Simple Stack Applications


Even though they are very simple, stacks can be very handy. You have, no doubt,
already come across many uses of stacks in computing, but you may not have
recognized them. For example, you have probably used some applications that
include an "undo" feature. You might be editing a document in a word
processing program and accidentally delete a bunch of text; no problem, you quickly
go to the Edit menu and select the undo command and your text is "magically"
restored. Need to go back even further? Many applications allow you to keep
undoing commands to roll back to virtually any previous state. Internally, this is
accomplished using a stack. Each time an action is performed, the information
about that action is saved on a stack. When "undo" is selected, the last action is
popped off the stack and reversed. The size of the stack determines how many levels
you can undo.
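A toy version of the undo mechanism can be sketched with a Python list used as a stack. This example (Python 3 syntax) simplifies the idea by pushing whole previous states instead of records of individual actions; the Document class and its method names are hypothetical and are not taken from any real editor.

```python
class Document(object):
    """Hypothetical editor model that keeps earlier states on a stack."""

    def __init__(self):
        self.text = ''
        self.history = []                  # list used as a stack of previous states

    def edit(self, new_text):
        self.history.append(self.text)     # push the state being replaced
        self.text = new_text

    def undo(self):
        if self.history:                   # nothing to undo when the stack is empty
            self.text = self.history.pop() # restore the most recent saved state


doc = Document()
doc.edit('Hello')
doc.edit('Hello, world')
doc.edit('')                # the accidental deletion

doc.undo()
print(doc.text)             # Hello, world
doc.undo()
print(doc.text)             # Hello
```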
Another example of the use of stacks is inside the computer itself. You know that
functions are an important aspect of programming languages, and modern systems
provide hardware features to support programs that make extensive use of functions.
When a function is called, the information about the function such as the values
of local variables and the return address (where the program left off before calling
the function) is pushed on a so-called run-time stack. The last function called is
always the first to return, so when a function ends, its information is popped off
the run-time stack and the return address is used to tell the CPU the location of
the next instruction to execute. As functions are called, the stack grows; each time
a function returns, the stack shrinks back. You may notice that when you get an error
message in Python, the interpreter prints out a traceback that shows how the error
message arose. This traceback shows the contents of the run-time stack at the time
the exception was raised.
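
You can inspect the run-time stack yourself without raising an exception. The following sketch uses the standard traceback module; the function names inner and outer are made up for illustration.

```python
import traceback


def inner():
    # Snapshot of the run-time stack; the newest frame (this one) is last.
    frames = traceback.extract_stack()
    return [frame.name for frame in frames]


def outer():
    return inner()


names = outer()
print(names[-2:])   # ['outer', 'inner']
```

The most recently called function appears last, which is the order in which a Python traceback lists the calls.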
A stack is also important for the syntactic analysis of computer programs.
Programming language structures must always be properly nested. For example,
you can have an if completely inside of a loop or you can have it outside (before or
after) the loop, but it is not correct for an if to "straddle" a loop boundary. A stack
is the proper data structure for handling nested structures. We can illustrate this
using a simpler nesting example, namely parentheses. In mathematics, expressions
are often grouped using parentheses. Here's a simple example: ((x + y) * x)/(3 * z).
In a correct expression, the parentheses are always properly nested, or balanced.
Looking just at the parentheses, the structure of the previous expression is (())().
Every opening parenthesis has a matching closing one, and none of the opening-closing
pairs "interleave" with other pairs.

Suppose you were writing an algorithm to check that a sequence of parentheses is
properly balanced. How could that be done? Basically, we must guarantee that every
time we see a closing parenthesis, there has already been an opening parenthesis that
matches it. We can do this by checking that there is an equal number of opening
and closing parentheses and that we never have a sequence where more closings have
been seen than openings. One simple approach is to keep a "balance" of unmatched
opening parentheses and make sure that it never goes negative as we scan the string
from left to right and that it is zero at the end. Here's a simple Python function
that scans a string to determine whether the parentheses are balanced.

# parensBalance1.py
def parensBalance1(s):
    open = 0
    for ch in s:
        if ch == '(':
            open += 1
        elif ch == ')':
            open -= 1
            if open < 0:
                # there is no matching opener, so check fails
                return False
    return open == 0   # everything balances if no unmatched opens

So far, this doesn't look very stack-like. However, things get much more interesting
if we introduce different types of parentheses. For example, mathematicians
(and programming language designers) often use multiple types of grouping markers,
such as parentheses, (); square brackets, []; and curly braces, {}. Suppose these
are mixed in a string such as [(x + y) * x] / (3 * z) / [sin(x) + cos(y)]. Now our simple
counting approach doesn't work, as we have to ensure that each closing marker is
matched to the proper type of opening marker. Even though they have the same
number of opening and closing markers, an expression with the structure [()]() is
OK, but [(])() is not. Here is where a stack comes to the rescue.
To ensure proper balancing and nesting with multiple grouping symbols,
we have to check that when a closing marker is found, it matches the most recent
unmatched opening marker. This is a LIFO problem that is easily solved with a
stack. We just need to scan the string from left to right. When an opening marker
is found, it is pushed onto a stack. Each time a closing marker is found, the top item
of the stack must be the matching opening marker, which is then popped. When
we get all done, the stack should be empty. Here's some code to do it:

# parensBalance2.py

from Stack import Stack

def parensBalance2(s):
    stack = Stack()
    for ch in s:
        if ch in "([{":     # push an opening marker
            stack.push(ch)
        elif ch in ")]}":   # match closing with top of stack
            if stack.size() < 1:   # no pending open to match it
                return False
            else:
                opener = stack.pop()
                if opener+ch not in ["()", "[]", "{}"]:
                    # not a matching pair
                    return False
    return stack.size() == 0   # empty stack means everything matched up

Figure 5.1 shows the intermediate steps of tracing the execution of the algorithm
using the expression {[2 * (7 - 4) + 2] + 3} * 4. It shows five "snapshots"
with the characters processed so far and the current stack contents below them.
You should trace through the algorithm by hand to convince yourself that it works.

{[2*      {[2*(      {[2*(7-4)      {[2*(7-4)+2]      {[2*(7-4)+2]+3}*4

          (
[         [          [
{         {          {              {

Figure 5.1: Example of tracing through parentheses matching
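
If you want to check your hand trace, the following sketch records the stack contents after each grouping marker is processed. It is a variation on the algorithm above, not the book's code: the function name trace_balance is made up, a plain list stands in for the Stack class, and a dictionary lookup replaces the opener+ch test.

```python
def trace_balance(s):
    """Return (balanced, snapshots); snapshots holds the stack after each marker."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []                       # plain list standing in for the Stack class
    snapshots = []
    for ch in s:
        if ch in "([{":
            stack.append(ch)         # push an opening marker
        elif ch in ")]}":
            # the closer must match the most recent unmatched opener
            if not stack or stack.pop() != pairs[ch]:
                return False, snapshots
        else:
            continue                 # not a grouping marker; stack unchanged
        snapshots.append("".join(stack))
    return not stack, snapshots


ok, snaps = trace_balance("{[2 * (7 - 4) + 2] + 3} * 4")
print(ok)       # True
print(snaps)    # ['{', '{[', '{[(', '{[', '{', '']
```

Each string in the result lists the stack from bottom to top, so the tallest stack in the trace is '{[('.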

5.2.3 Implementing Stacks


In a language like Python, the simplest way to implement a stack is to use the
built-in list. Given the flexibility of the Python list, each stack operation translates
to a single line of code.

# Stack.py
class Stack(object):

    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        return self.items.pop()

    def top(self):
        return self.items[-1]

    def size(self):
        return len(self.items)

Recalling our discussion of Python lists, each of these operations is performed in
constant time, so a stack is very efficient. Of course, insertion at the end of a list can
occasionally require extra work to create a new array and copy all the values into the
new array, but Python does this automatically. As discussed in subsection 3.5.1,
the average amount of time to append onto the end of a list remains constant since
the array size is increased proportionally as necessary.
If a list type were not readily available, it would also be easy to implement a stack
using an array. A stack with a fixed maximum size can be handled by allocating an
array of the required maximum size and using an instance variable to keep track of
how many "slots" in the array are actually being used. If the maximum stack size
is unknown, then the push operation will have to handle allocating a larger array
and copying items over when the stack exceeds the current array size.
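
Here is one possible sketch of the fixed-maximum-size version. Python has no true fixed-size array, so a preallocated list stands in for one. The class name ArrayStack, the instance variable count, and the choice of exceptions are assumptions made for illustration.

```python
class ArrayStack:
    """Stack with a fixed maximum size (illustrative sketch)."""

    def __init__(self, capacity):
        self.items = [None] * capacity    # stand-in for a fixed-size array
        self.count = 0                    # number of slots actually in use

    def push(self, item):
        if self.count == len(self.items):
            raise OverflowError("stack is full")
        self.items[self.count] = item     # next free slot is at index count
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("pop from empty stack")
        self.count -= 1
        item = self.items[self.count]
        self.items[self.count] = None     # drop the reference to the popped item
        return item

    def top(self):
        # pre: self.size() > 0
        return self.items[self.count - 1]

    def size(self):
        return self.count
```

To handle an unknown maximum size, push would allocate a larger list and copy the items over instead of raising an exception.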
Another reasonable implementation strategy for a stack is to use a singly linked
list of nodes containing the stack data. A stack object would just need an instance
variable with a reference to the first node of the linked list, which would be the top
of the stack. Again, both pushing and popping are easily accomplished in constant
time using a linked structure. As with the pure array implementation, keeping track
of the size of the stack in an instance variable is advisable so that the size operation
does not have to traverse the list to count items.
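
A linked version might look like the following sketch. The Node class here is a hypothetical minimal node with item and link attributes; substitute whatever node class you already have.

```python
class Node:
    """Minimal singly linked node (hypothetical helper for this sketch)."""

    def __init__(self, item, link=None):
        self.item = item
        self.link = link


class LinkedStack:

    def __init__(self):
        self.head = None      # first node of the list is the top of the stack
        self.count = 0        # tracked so size() need not traverse the list

    def push(self, item):
        self.head = Node(item, self.head)   # new node links to the old top
        self.count += 1

    def pop(self):
        # pre: self.size() > 0
        item = self.head.item
        self.head = self.head.link          # unlink the old top
        self.count -= 1
        return item

    def top(self):
        # pre: self.size() > 0
        return self.head.item

    def size(self):
        return self.count
```

Both push and pop touch only the first node, which is why they take constant time regardless of the stack's size.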

5.3 Queues
Another common data structure that orders items according to when they arrive is
called a queue. Whereas a stack is a last in, first out structure, the ordering of a
queue is first in, first out (FIFO) . You are undoubtedly familiar with the concept
since you often spend time in a queue yourself. When you are standing in line at a
restaurant or store, you are in a queue. In fact, British English speakers don't stand
in line, they "wait on queue."

5.3.1 A Queue ADT


Conceptually, a queue is a sequential structure that allows restricted access at both
ends. Items are added at one end and removed from the other. As usual, computer
scientists have their own terminology for these operations. Adding an item to the
back of a queue is called an enqueue, and the operation to remove an item from the
front is called dequeue. As with stacks, it is also handy to be able to peek at the
item on the front of the queue without having to remove it. This is usually called
front, but other terms are sometimes used like head or first.
Here is a specification of the Queue ADT:

class Queue(object):

    """post: creates an empty FIFO queue"""

    def enqueue(self, x):

        """post: adds x at back of queue"""

    def dequeue(self):

        """pre: self.size() > 0
        post: removes and returns the front item"""

    def front(self):

        """pre: self.size() > 0
        postcondition: returns first item in queue"""

    def size(self):

        """postcondition: return number of items in queue"""

5.3.2 Simple Queue Applications


Queues are commonly used in computer programming as a sort of buffer between different
phases of a computing process. For example, when you print out a document,
your "job request" is placed on a queue in the computer operating system, and these
jobs are generally printed in a first come, first served order. In this case, the queue
is used to coordinate action across separate processes (the application that requests
the printing and the computer operating system that actually sends information to
the printer). Queues are also frequently used as intermediate data way stations
within a single computer program. For example, a compiler or interpreter might
need to make a series of "passes" over a program to translate it into machine code.
Often the first pass is a so-called lexical analysis that splits the program into its
meaningful pieces, the tokens. A queue is the perfect data structure to store the
sequence of tokens for subsequent processing by the next phase, which is typically
some sort of grammar-based syntactic analysis.
As an example of using a queue for an intermediate data structure, consider the
problem of determining whether or not a phrase is a palindrome. A palindrome is a
sentence or phrase that has the same sequence of letters when read either forward or
backward. Some famous examples are "Madam, I'm Adam" or "I prefer Pi." Some
words like "racecar" are palindromes all by themselves. Let's write a program to
analyze user input and validate it as a palindrome. The heart of the program will
be an isPalindrome function:

def isPalindrome(phrase):

    """pre: phrase is a string
    post: returns True if the alphabetic characters in phrase
          form the same sequence reading either left-to-right
          or right-to-left.
    """

The tricky part of the isPalindrome function is that the palindromeness of
a phrase is determined only by the letters; spaces, punctuation, and capitalization
don't matter. We need to see if the sequence of letters is the same in both directions.
One way to approach this issue is to break the problem down into phases. In the
first phase we strip away the extraneous portions and boil the expression down to
its constituent letters. Then a second pass can compare the letter sequence in both
the forward and backward directions to see whether they match up. Conveniently,
a queue data structure can be used to store the characters so they can be accessed
again in the original order, and a stack can be used to store them for access in a
reversed order (remember, a stack reverses its data) .
Recasting this two-phase algorithm as a Python program, we get the following:

# palindrome.py
from MyQueue import Queue
from Stack import Stack

def isPalindrome(phrase):
    forward = Queue()
    reverse = Stack()
    extractLetters(phrase, forward, reverse)
    return sameSequence(forward, reverse)

Now we just need to define the functions that implement the two phases:
extractLetters and sameSequence. The former must go through the phrase and
add each letter to both the intermediate stack and queue. Here's one way to do
that.
import string

def extractLetters(phrase, q, s):
    for ch in phrase:
        if ch.isalpha():
            ch = ch.lower()
            q.enqueue(ch)
            s.push(ch)

The sameSequence function needs to compare the letters on the stack and queue.
If all the letters match up, we have a palindrome. As soon as two letters fail to match,
we know that our phrase has failed the test.

def sameSequence(q, s):
    while q.size() > 0:
        ch1 = q.dequeue()
        ch2 = s.pop()
        if ch1 != ch2:
            return False
    return True

With the isPalindrome function in hand you should be able to easily complete
our palindrome checking program. Try it out on these two examples: "Able was I,
ere I saw Elba" and "Evil was I, ere I saw Elvis." Obviously, only one of these is
really a palindrome. A quick search on the Internet will yield lots of interesting test
data. Of course, you'll need an implementation of queues to get your program up
and running; read ahead for some hints.
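
If you would like to test the two phrases before writing your own queue, here is a self-contained sketch of the same two-phase idea. It is not the program above: collections.deque from the standard library stands in for the Queue class, a plain list stands in for the Stack, and the name is_palindrome is made up.

```python
from collections import deque


def is_palindrome(phrase):
    forward = deque()    # queue: letters come back out in the original order
    reverse = []         # stack: letters come back out in reversed order
    # Phase 1: keep only the letters, ignoring case.
    for ch in phrase:
        if ch.isalpha():
            ch = ch.lower()
            forward.append(ch)
            reverse.append(ch)
    # Phase 2: compare the forward and backward sequences.
    while forward:
        if forward.popleft() != reverse.pop():
            return False
    return True


print(is_palindrome("Able was I, ere I saw Elba"))    # True
print(is_palindrome("Evil was I, ere I saw Elvis"))   # False
```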

5.4 Queue Implementations
Implementing a queue with Python's built-in list is straightforward. We just need
to insert at one end of the list and remove from the other end. Since the Python
list is implemented as an array, inserting at the beginning is an inefficient operation
if the list is very long. Removing an element from the beginning of the list is also
inefficient; so neither option is ideal.
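
As a sketch of the list-based approach, the following class treats index 0 as the front of the queue and the end of the list as the back. That choice of ends is an assumption; the opposite arrangement works just as well and has the mirror-image cost.

```python
class Queue:
    """List-based FIFO queue (sketch; the front of the queue is index 0)."""

    def __init__(self):
        self.items = []

    def enqueue(self, x):
        self.items.append(x)         # add at the back of the list

    def dequeue(self):
        # pre: self.size() > 0
        return self.items.pop(0)     # remove the front; remaining items shift down

    def front(self):
        # pre: self.size() > 0
        return self.items[0]

    def size(self):
        return len(self.items)
```

Here enqueue is the cheap operation and dequeue is the one that must shift every remaining element.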
An alternative would be to use a linked implementation. The sequence of items
can be maintained as a singly linked list. The queue object itself then maintains
instance variables that point to the first and last nodes of the queue. As long as
we do insertions at the end of the linked list and removals from the front, both
of these operations can easily be done in constant (Θ(1)) time. Of course, the
linked implementation would be a lot trickier to code. Before pursuing this or
other options, it might be wise to consider a maxim popularized by Donald Knuth
and often attributed to Tony Hoare, both very famous computer scientists:
"Premature optimization is the root of all evil." There are a
number of justifications for this statement. It does not make sense to worry about
optimizing code until you are certain what the bottlenecks are (i.e. , where most of
the time is being spent) . If you double the speed of code that is 5% of the execution
time of your program, your program will execute only about 3% faster. But if
you double the speed of code that is 50% of the execution time, your program will
execute about 33% faster. As we have already seen with the binary search algorithm,
more efficient code is often more complex and more difficult to get correct. Before
you worry about making a specific section of code more efficient, you should make
certain that it will have a significant effect on the speed of your overall program.
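
The percentages quoted above are easy to check. In this sketch, the function name percent_faster and its parameters are made up; it takes the original running time as 1 and assumes the rest of the program is unaffected by the optimization.

```python
def percent_faster(fraction, factor):
    """Overall speed gain, in percent, when `fraction` of the running time
    is made `factor` times faster and the rest is unchanged."""
    new_time = (1 - fraction) + fraction / factor   # original total time is 1
    return (1 / new_time - 1) * 100


print(round(percent_faster(0.05, 2), 1))   # 2.6
print(round(percent_faster(0.50, 2), 1))   # 33.3
```

Doubling the speed of the 5% portion cuts the total time only from 1 to 0.975, while doubling the speed of the 50% portion cuts it to 0.75.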
In the case of implementing a queue in Python, there is the additional consideration
that the underlying Python list operations are coded in very efficient
C code and can take advantage of system-level calls that move blocks of memory
around very quickly. In theory, we may be able to write linked code with better
asymptotic (theta) behavior, but the queue sizes will have to be very large indeed
before our linked code overtakes the optimized Python list code. Coding a linked
implementation of a queue is a great exercise in using linked structures, but we have
yet to encounter a situation in practice when such a queue actually out-performed
one based on the built-in list.
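
For readers who want to try the exercise, here is one way the linked queue described earlier could be sketched. The Node class and the attribute names head, tail, and count are assumptions for illustration.

```python
class Node:
    """Minimal singly linked node (hypothetical helper for this sketch)."""

    def __init__(self, item):
        self.item = item
        self.link = None


class LinkedQueue:

    def __init__(self):
        self.head = None     # first node: the front of the queue
        self.tail = None     # last node: the back of the queue
        self.count = 0

    def enqueue(self, x):
        node = Node(x)
        if self.tail is None:        # empty queue: new node is also the front
            self.head = node
        else:
            self.tail.link = node    # link the old last node to the new one
        self.tail = node
        self.count += 1

    def dequeue(self):
        # pre: self.size() > 0
        item = self.head.item
        self.head = self.head.link
        if self.head is None:        # queue became empty: clear tail as well
            self.tail = None
        self.count -= 1
        return item

    def front(self):
        # pre: self.size() > 0
        return self.head.item

    def size(self):
        return self.count
```

The two special cases for an empty queue are the "trickier" part: forgetting either one leaves head and tail out of step.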
In languages such as C/C++ and Java that support fixed-size arrays, an array
is often the appropriate structure to use to implement a queue, particularly if the
maximum queue size is known ahead of time. Instead of performing the enqueue
and dequeue operations by shifting elements in the array, we can keep track of the
indices that represent both the front/head and back/tail of the queue. As long as
the maximum number of elements in the queue at any point in time does not exceed
the size of the array, this is an excellent method for implementing queues. Each
time an item is added to the queue, the tail index is increased by one. If we add
one and use the modulus operator we can easily make the index wrap around to the
beginning of the array, simulating a circular array representation. For an array of
size 10, we'd increment the tail like this:
    tail = (tail + 1) % 10

Since the index positions start at 0, the last position is index 9. When we add
1 to 9 we get 10, and 10 modulus (remainder) 10 is 0. This is a common technique
used in many computer algorithms to wrap around back to 0 after some maximum
value is reached. The same technique is used for incrementing head when an item
is dequeued. The effect is that the head index chases the tail index around and
around the array. As long as items remain in the queue, head never quite catches
tail.
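
A short loop shows the wraparound in action. The starting value 7 is arbitrary.

```python
capacity = 10
tail = 7
visited = []
for _ in range(5):
    tail = (tail + 1) % capacity    # advance one slot, wrapping 9 -> 0
    visited.append(tail)
print(visited)                      # [8, 9, 0, 1, 2]
```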
In Python, the circular array technique could also be used by simply starting
with a list of the appropriate size. List repetition provides an easy way to do this.

    self.items = [None] * 10

There is one subtlety in the circular array/list approach to queues. We need to
think carefully about the values for head and tail that indicate when the queue
is full or empty. Writing an invariant for the class that relates these values is an
excellent technique to make certain we get it right. We would like the head index
to indicate where the front item in the queue is located in the array. It makes sense
for the tail index to indicate either the position of the last item in the queue or
the following location where the next item inserted into the queue would be placed.
When the queue is empty, it is not clear what the values should be for head and
tail. Since we are using a circular array it is possible that the value for tail is less
than head. And after inserting a few items and then removing those items, head
and tail are in the middle of the array/list so we cannot use any fixed values of
head and tail to indicate an empty queue. Instead, we must rely on their relative
values.
Suppose we start with an empty queue having both head and tail set to index
0. Then clearly when head == tail the queue is empty. Suppose the size of the
circular array is n. Now consider what happens if we enqueue n items without
any dequeues. As the tail pointer is incremented n times, it will wrap around and
land back at 0. Thus, for a full queue, we once again have the condition head ==
tail. That's a problem. Since both a full queue and an empty queue "look" exactly
the same, we can't tell which we have by looking at the values of head and tail.
We could rescue the situation by simply agreeing that a "full" queue contains only
n - 1 items, in effect wasting one cell. However, a simpler approach is just to use
a separate instance variable that keeps track of the number of items in the queue.
This approach leads us to the following invariant:
1. The instance variable size indicates the number of items in the queue and
   0 <= size <= capacity, where capacity is the fixed size of the array/list.
2. If size > 0, the queue items are found at locations items[(head+i) % capacity],
   for i in range(size), where items[head] is the front of the queue and
   tail == (head+size-1) % capacity.
3. If size == 0, head == (tail+1) % capacity.
Using this invariant, you should be able to complete a circular list implementation
of a queue without too much effort.
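
To compare against your own solution, here is one sketch that maintains the invariant. The class name CircularQueue, the default capacity, and the exceptions raised are assumptions. The item counter is stored as count rather than size, because an instance variable named size would hide the size method.

```python
class CircularQueue:
    """Fixed-capacity queue stored in a circular list (illustrative sketch)."""

    def __init__(self, capacity=10):
        self.capacity = capacity
        self.items = [None] * capacity
        self.head = 0                 # index of the front item
        self.tail = capacity - 1      # index of the last item; when empty,
                                      # head == (tail + 1) % capacity
        self.count = 0                # the invariant's "size" variable

    def enqueue(self, x):
        if self.count == self.capacity:
            raise OverflowError("queue is full")
        self.tail = (self.tail + 1) % self.capacity
        self.items[self.tail] = x
        self.count += 1

    def dequeue(self):
        if self.count == 0:
            raise IndexError("dequeue from empty queue")
        item = self.items[self.head]
        self.items[self.head] = None  # drop the reference to the removed item
        self.head = (self.head + 1) % self.capacity
        self.count -= 1
        return item

    def front(self):
        # pre: self.size() > 0
        return self.items[self.head]

    def size(self):
        return self.count
```

Because count distinguishes full from empty, all capacity slots can hold items and no cell is wasted.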
