08-datastruct
Data Structures
15-122: Principles of Imperative Computation (Spring 2016)
Frank Pfenning, André Platzer, Rob Simmons
1 Introduction
In this lecture we introduce the idea of imperative data structures. So far, the
only interfaces we’ve used carefully are pixels and string bundles. Both of
these interfaces had the property that, once we created a pixel or a string
bundle, we weren’t interested in changing its contents. In this lecture, we’ll
talk about an interface that mimics the arrays that are primitively available
in C0.
To implement this interface, we’ll need to round out our discussion of
types in C0 by discussing pointers and structs, two great tastes that go great
together. We will discuss using contracts to ensure that pointer accesses
are safe, as well as the use of linked lists to implement the stack and queue
interfaces. The linked list implementation of stacks and queues allows us
to handle stacks and queues of any size.
Relating this to our learning goals, we have
Computational Thinking: We illustrate the power of abstraction by
considering both the client-side and library-side of the interface to a data
structure.
Algorithms and Data Structures: The abstract arrays will be one of our
first examples of abstract datatypes.
Programming: Introduction of structs and pointers, use and design of
interfaces.
2 Structs
So far in this course, we’ve worked with five different C0 types: int,
bool, char, string, and arrays t[] (there is an array type t[] for every type
t). The character, Boolean and integer values that we manipulate, store
locally, and pass to functions are just the values themselves. For arrays (and
strings), the things we store in assignable variables or pass to functions are
addresses, references to the place where the data stored in the array can be
accessed. An array allows us to store and access some number of values of
the same type (which we reference as A[0], A[1], and so on).
Therefore, when entering the following commands in Coin (the outputs
have been elided),
--> char c = '\n';
--> int i = 4;
--> string[] A = alloc_array(string, 4);
--> A[0] = "hi";
--> A[1] = "je";
--> A[2] = "ty";
--> A[3] = "lo";
the interpreter will store something like the following in its memory:
The next data structure we will consider is the struct. A struct can be
used to aggregate together different types of data, which helps us create
data structures. By contrast, an array is an aggregate of elements of the
same type.
Structs must be explicitly declared in order to define their “shape”. For
example, if we think of an image, we want to store an array of pixels along-
side the width and height of the image, and a struct allows us to do that:
typedef int pixel;

struct img_header {
  pixel[] data;
  int width;
  int height;
};
Here data, width, and height are fields of the struct. The declaration
expresses that every image has an array of data as well as a width and a
height.
We can write to the fields of a struct by using the arrow notation on the
left-hand side of an assignment.
--> IMG->data = alloc_array(pixel, 2);
IMG->data is 0xFFAFC130 (int[] with 2 elements)
--> IMG->width = 1;
IMG->width is 1 (int)
--> (*IMG).height = 2;
(*(IMG)).height is 2 (int)
--> IMG->data[0] = 0xFF00FF00;
IMG->data[0] is -16711936 (int)
--> IMG->data[1] = 0xFFFF0000;
IMG->data[1] is -65536 (int)
The notation (*p).f is a longer form of p->f. First, *p follows the
pointer to arrive at the struct in memory, then .f selects the field f. We will
rarely use this dot-notation (*p).f in this course, preferring the arrow-
notation p->f.
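To make the two notations concrete outside of Coin, here is a minimal sketch in plain C rather than C0, so that it compiles with a standard C compiler. It assumes that calloc is an acceptable stand-in for C0's alloc and alloc_array (both zero-initialize), and that the C0 array type pixel[] can be modeled as a pointer to the first pixel. The helper img_new is hypothetical and does not appear in these notes.

```c
#include <assert.h>
#include <stdlib.h>

typedef int pixel;

struct img_header {
  pixel *data;   /* C0's pixel[] modeled as a pointer to the first pixel */
  int width;
  int height;
};

/* Hypothetical helper: allocates a zeroed header and a zeroed pixel
   array, roughly what alloc and alloc_array do in C0. */
struct img_header *img_new(int width, int height) {
  assert(width > 0 && height > 0);
  struct img_header *IMG = calloc(1, sizeof(struct img_header));
  assert(IMG != NULL);
  IMG->data = calloc((size_t)width * (size_t)height, sizeof(pixel));
  assert(IMG->data != NULL);
  IMG->width = width;       /* arrow notation */
  (*IMG).height = height;   /* equivalent longer form: dereference, then select */
  return IMG;
}
```

Calling img_new(1, 2) produces a header with the same shape as the Coin session above: a two-element data array, width 1, and height 2. Either notation can then be used to read the fields back.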
An updated picture of memory, taking into account the initialization
above, looks like this:
3 Pointers
As we have seen in the previous section, a pointer is needed to refer to a
struct that has been allocated on the heap. It can also be used more
generally to refer to an element of arbitrary type that has been allocated
on the heap. For example:
--> int* ptr1 = alloc(int);
ptr1 is 0xFFAFC120 (int*)
--> *ptr1 = 16;
*(ptr1) is 16 (int)
--> *ptr1;
16 (int)
In general, given a pointer p, we refer to the value it points to using the
notation *p, either to read (when we use it inside an expression) or to
write (if we use it on the left-hand side of an assignment).
So we would be tempted to say that a pointer value is simply an ad-
dress. But this story, which was correct for arrays, is not quite correct for
pointers. There is also a special value NULL. Its main feature is that NULL is
not a valid address, so we cannot dereference it to obtain stored data. For
example:
--> int* ptr2 = NULL;
ptr2 is NULL (int*)
--> *ptr2;
Error: null pointer was accessed
Last position: <stdio>:1.1-1.3
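The usual defense against this error is to check for NULL before dereferencing. The following is a minimal sketch in plain C rather than C0; the helpers int_cell_new and safe_read are hypothetical names introduced only for illustration, and calloc is assumed as a stand-in for C0's alloc. One difference is worth keeping in mind: C0 reliably stops with an error on a NULL dereference, whereas in C the behavior is undefined, which makes the explicit check even more important there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Allocates one int cell on the heap, like alloc(int) in C0,
   and writes value into it through the pointer. */
int *int_cell_new(int value) {
  int *p = calloc(1, sizeof(int));
  assert(p != NULL);
  *p = value;   /* write: *p on the left-hand side of an assignment */
  return p;
}

/* Hypothetical helper: reads through p only when p is not NULL.
   Returns true and stores the value in *out on success;
   returns false and leaves *out unchanged otherwise. */
bool safe_read(const int *p, int *out) {
  if (p == NULL) return false;   /* NULL is not a valid address */
  *out = *p;                     /* read: safe because p != NULL */
  return true;
}
```

In C0 the same guard would more naturally be written as a contract such as //@requires p != NULL; on the function that performs the dereference.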
Graphically, NULL is sometimes represented with the ground symbol, so we
can represent our updated setting like this:
4 Creating an interface
The next ten lectures for this class will focus on building, analyzing, and
using different data structures. When we’re thinking about implementing
data structures, we will almost always use pointers to structs as the core of
our implementation.
We’ve also seen two kinds of interfaces in our programming assignments:
the pixels interface in the early programming assignments, and the
string bundle interface in the DosLingos programming assignment. For
this lecture, we will work through an intellectual exercise: what if C0 did
not provide arrays (we’ll limit ourselves to arrays of strings) as a primitive
type? If we wanted to use something like arrays, we’d have to introduce
them from scratch as an abstract type, like pixels or string bundles.
For this exercise, we’ll build an abstract data type that functions like an
array of strings; in fact, we will see our implementation will end up doing
a bit more than C0. The primitive operations that C0 provides on string
arrays are the ability to create a new array, to get a particular index of an
array, and to set a particular index in an array. We could capture these as
three functions that act on an abstract type arr_t:
typedef _______ arr_t;
arr_t arr_new(int size); // alloc_array(string, size)
string arr_get(arr_t A, int i); // A[i]
void arr_set(arr_t A, int i, string x); // A[i] = x
But this is not a complete picture! An interface needs to also capture the
preconditions necessary for using that abstract type safely. For instance, we
know that safety of array access requires that we only create arrays of
non-negative length and that we never try to access an array at a negative
index or at an index past its end:
int arr_len(arr_t A)
/*@requires A != NULL; @*/;
In both cases, the A != NULL precondition allows us to say that the A->limit
and A->data dereferences are safe. But how do we know A->data[i] is
not an out-of-bounds array access? We don’t — the second precondition of
arr_get just tells us that i is nonnegative and less than whatever arr_len
returns!
If we want to use the knowledge that arr_len(A) returns the length
of A->data, then we’d need to add \result == \length(A->data) as a
postcondition of arr_len...
...and we can only prove that postcondition true if we add the
precondition A->limit == \length(A->data) to arr_len...
...and if we do that, it changes the safety requirements for the call to
arr_len in the preconditions of arr_get, so we also have to add the
precondition A->limit == \length(A->data) to arr_get.
The user, remember, didn’t need to know anything about this, because
they were ignorant about the internal implementation details of the arr_t
type. As long as the user respects the interface, only creating arr_ts with
arr_new and only manipulating them with arr_len, arr_get, and arr_set,
they should be able to expect that the contracts on the interface are suffi-
cient to ensure safety. But we don’t have this luxury from the library per-
spective: all the functions in the library’s implementation are going to de-
pend on all the parts of the data structure making sense with respect to all
the other parts. We’ll capture this notion in a new kind of invariant, a data
structure invariant.
Functions that create new instances of the data structure should ensure
that the data structure invariants hold of their results, and functions that
modify data structures should have postconditions to ensure that none of
those data structure invariants have been violated.
arr_t* arr_new(int size)
//@requires 0 <= size;
//@ensures is_arr(\result);
{
  struct arr_header* AH = alloc(struct arr_header);
  AH->limit = size;
  AH->data = alloc_array(string, size);
  return AH;
}
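The pieces discussed in this section can be assembled into one small library. The sketch below is written in plain C rather than C0, under several assumptions that should be read as such: assert stands in for //@requires and //@ensures, calloc stands in for alloc and alloc_array, string is modeled as const char*, and the struct layout (fields limit and data) is inferred from the discussion above. Plain C cannot express \length(A->data), so this version of is_arr checks only what C can observe and is weaker than the C0 invariant A->limit == \length(A->data).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef const char *string;   /* stand-in for C0's string type */

struct arr_header {
  int limit;      /* number of elements in data */
  string *data;   /* C0: string[] data */
};
typedef struct arr_header *arr_t;

/* Data structure invariant, as far as plain C can check it. */
bool is_arr(arr_t A) {
  return A != NULL && A->limit >= 0 && A->data != NULL;
}

arr_t arr_new(int size) {
  assert(0 <= size);                               /* requires */
  arr_t AH = calloc(1, sizeof(struct arr_header));
  assert(AH != NULL);
  AH->limit = size;
  /* Allocate at least one slot so that data is non-NULL when size == 0. */
  AH->data = calloc(size > 0 ? (size_t)size : 1, sizeof(string));
  assert(AH->data != NULL);
  assert(is_arr(AH));                              /* ensures */
  return AH;
}

int arr_len(arr_t A) {
  assert(is_arr(A));
  return A->limit;
}

string arr_get(arr_t A, int i) {
  assert(is_arr(A));
  assert(0 <= i && i < arr_len(A));
  return A->data[i];
}

void arr_set(arr_t A, int i, string x) {
  assert(is_arr(A));
  assert(0 <= i && i < arr_len(A));
  A->data[i] = x;
  assert(is_arr(A));   /* the invariant still holds after the update */
}
```

A client would call arr_new(4), then arr_set(A, 0, "hi"), and read the value back with arr_get(A, 0), without ever touching limit or data directly. Unlike C0, where a fresh string array holds empty strings, entries in this C sketch are NULL until they are set.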