Dynamic Data Structures: Next
Systems Programming
More recently, we've focused on arrays of values, whose required size was only known at run-time.
In the case of dynamic arrays we've used C99's standard memory allocation functions, such as malloc, calloc, realloc, and free.
An extension to this idea is the use of dynamic data structures - collections of data whose required size is not known until run-time. Again, we'll use C99's
standard memory allocation functions whenever we require more memory.
However, unlike our use of realloc to grow (or shrink) a single data structure (there, an array), we'll see two significant differences:
we'll manage a complete data structure by allocating and deallocating its "pieces", and
we'll keep all of the "pieces" linked together by including, in each piece, a "link" to other pieces.
To implement these ideas in C99, we'll develop data structures that contain pointers to other data structures.
We'll begin with a stack. Such a data structure is also termed a first-in-last-out (FILO) data structure, because the first item added to the stack is the last item removed from it (not the sort of sequence you want while queueing for a bank's ATM!).
typedef struct _s {
    int         value;
    struct _s   *next;
} STACKITEM;

STACKITEM *stack = NULL;
Of note:
we haven't really defined a stack datatype, but a single item that will "go into" the stack.
the datatype STACKITEM contains a pointer field, named next, that will point to another item in the stack.
we've defined a new type, a structure named _s, so that the pointer field next can be of a type that already exists.
we've defined a single pointer variable, named stack, that will point to a stack of items.
The need to add an item to the stack is not known until run-time, and data (perhaps read from a file) will determine how large our stack eventually grows.
As its name suggests, we'll speak of pushing new items onto the stack when adding them, and of popping existing items from the stack when removing them.
....
The functions push_item and pop_item are quite simple, but in each case we must worry about the case when the stack is empty. We use a NULL pointer to represent the condition of the stack being empty.
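The body of push_item is not shown here. A minimal sketch of how it might look, assuming the STACKITEM type and the global stack pointer declared earlier (the body below is an assumption, not the original code):

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct _s {
    int         value;
    struct _s   *next;
} STACKITEM;

STACKITEM *stack = NULL;            // NULL represents an empty stack

// Sketch (assumed body): allocate a new item and link it in as the new top
void push_item(int newvalue)
{
    STACKITEM *item = malloc(sizeof(STACKITEM));

    if(item == NULL) {              // always check that allocation succeeded
        perror("push_item");
        exit(EXIT_FAILURE);
    }
    item->value = newvalue;
    item->next  = stack;            // the old top (NULL if the stack was empty)
    stack       = item;
}
```

Because an empty stack is just a NULL pointer, pushing onto an empty stack needs no special case: the new item's next field simply becomes NULL.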
In this example, the data held in each STACKITEM is just a single integer, but it could involve several fields of data. In that case, we may need more
complex functions to return all of the data (perhaps using a structure or pass-by-reference parameters to the pop_item function).
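As an illustration of the pass-by-reference approach, here is a sketch in which each item holds two fields. The field names id and weight are hypothetical, and, unlike the pop_item in these notes, this variant reports an empty stack by returning false rather than exiting:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical item holding two fields of data instead of one
typedef struct _s {
    int         id;
    double      weight;
    struct _s   *next;
} STACKITEM;

STACKITEM *stack = NULL;

// Assumed helper, present only so that the sketch can be exercised
bool push_item(int id, double weight)
{
    STACKITEM *item = malloc(sizeof(STACKITEM));

    if(item == NULL) {
        return false;
    }
    item->id     = id;
    item->weight = weight;
    item->next   = stack;
    stack        = item;
    return true;
}

// Both fields are returned through the caller's pointers
bool pop_item(int *id, double *weight)
{
    STACKITEM *old = stack;

    assert(id != NULL && weight != NULL);
    if(old == NULL) {               // nothing to pop from an empty stack
        return false;
    }
    *id     = old->id;
    *weight = old->weight;
    stack   = old->next;
    free(old);
    return true;
}
```

The alternative mentioned above, returning a whole structure by value, avoids the pointer parameters but then needs another way to signal an empty stack.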
Again, we must ensure that we don't attempt to remove (pop) an item from an empty stack:
int pop_item(void)
{
    STACKITEM *old;
    int oldvalue;

    if(stack == NULL) {
        fprintf(stderr, "attempt to pop from an empty stack\n");
        exit(EXIT_FAILURE);
    }
    oldvalue = stack->value;
    old      = stack;
    stack    = stack->next;
    free(old);
    return oldvalue;
}
We'll write our own function, print_stack, to traverse the stack and successively print each item using printf.
void print_stack(void)
{
    STACKITEM *thisitem = stack;

    while(thisitem != NULL) {
        printf("%i", thisitem->value);
        thisitem = thisitem->next;
        if(thisitem != NULL)
            printf(" -> ");
    }
    if(stack != NULL)
        printf("\n");
}
Again, our stack is simple because each node only contains a single integer. If each node were more complex, we could call a different function from within print_stack to perform the actual printing:
....
print_stack_item( thisitem );
Consider a program that uses our stack to evaluate arithmetic: each integer read from lines of a file is pushed onto the stack, while arithmetic operators pop 2 integers from the stack, perform some arithmetic, and push the result back onto the stack.
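The push-and-pop evaluation described above can be sketched as follows. To keep the example short, the hypothetical function evaluate reads whitespace-separated tokens from a string rather than from a file. The body of push_item is an assumption, and only non-negative integer literals are recognised:

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct _s {
    int         value;
    struct _s   *next;
} STACKITEM;

STACKITEM *stack = NULL;

void push_item(int newvalue)        // assumed body
{
    STACKITEM *item = malloc(sizeof(STACKITEM));

    if(item == NULL) {
        perror("push_item");
        exit(EXIT_FAILURE);
    }
    item->value = newvalue;
    item->next  = stack;
    stack       = item;
}

int pop_item(void)
{
    STACKITEM *old = stack;
    int oldvalue;

    if(old == NULL) {
        fprintf(stderr, "attempt to pop from an empty stack\n");
        exit(EXIT_FAILURE);
    }
    oldvalue = old->value;
    stack    = old->next;
    free(old);
    return oldvalue;
}

// Hypothetical evaluator; performs no check for division by zero
int evaluate(const char *expr)
{
    const char *p = expr;

    while(*p != '\0') {
        if(isspace((unsigned char)*p)) {
            ++p;
        }
        else if(isdigit((unsigned char)*p)) {
            char *end;

            push_item((int)strtol(p, &end, 10));
            p = end;                        // continue after the digits
        }
        else {
            int right = pop_item();         // the most recently pushed operand
            int left  = pop_item();

            switch(*p) {
            case '+': push_item(left + right); break;
            case '-': push_item(left - right); break;
            case '*': push_item(left * right); break;
            case '/': push_item(left / right); break;
            default:
                fprintf(stderr, "unknown operator '%c'\n", *p);
                exit(EXIT_FAILURE);
            }
            ++p;
        }
    }
    return pop_item();                      // the final result
}
```

For example, evaluate("3 4 + 2 *") pushes 3 and 4, replaces them with 7, pushes 2, and finally leaves 14 as the result.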
In particular, the whole stack was represented by a single global pointer variable, and all functions accessed or modified that global variable.
Ideally we'd re-write all of our functions, push_item, pop_item, and print_stack, so that they received the required stack as a parameter, and used or manipulated that stack.
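One way such a re-write might look, as a sketch only: each function receives the address of the caller's stack pointer, so that it can update that pointer. The signatures and the bool return values are assumptions, not part of the original functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct _s {
    int         value;
    struct _s   *next;
} STACKITEM;

// The caller passes the address of its own stack pointer
bool push_item(STACKITEM **stack, int newvalue)
{
    STACKITEM *item = malloc(sizeof(STACKITEM));

    assert(stack != NULL);
    if(item == NULL) {
        return false;
    }
    item->value = newvalue;
    item->next  = *stack;
    *stack      = item;             // update the caller's pointer
    return true;
}

// The popped value is returned through oldvalue; false means "empty"
bool pop_item(STACKITEM **stack, int *oldvalue)
{
    STACKITEM *old;

    assert(stack != NULL && oldvalue != NULL);
    old = *stack;
    if(old == NULL) {
        return false;
    }
    *oldvalue = old->value;
    *stack    = old->next;
    free(old);
    return true;
}
```

With no global variable involved, a program can hold several independent stacks, each starting as STACKITEM *s = NULL; and being passed as &s.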
Techniques on how, and why, to design and implement robust data structures are a focus of the unit CITS2200 Data Structures & Algorithms.
We term such a data structure a list, and its datatype declaration is very similar to our stack:
typedef struct _l {
    char        *string;
    struct _l   *next;
} LISTITEM;

LISTITEM *list = NULL;
As with the stack, we'll need to support empty lists, and will again employ a NULL pointer to represent it.
This time, each data item to be stored in the list is a string, and we'll often refer to such a structure as "a list of strings".
Notice how we needed to traverse the whole list to locate its end.
Such traversal can become expensive (in time) for very long lists.
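The traversal in question can be sketched as follows. The function name append_item and its body are assumptions, and the sketch stores its own copy of the string so that the list owns its storage:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct _l {
    char        *string;
    struct _l   *next;
} LISTITEM;

LISTITEM *list = NULL;

// Sketch (assumed name and body): append a private copy of newstring
void append_item(const char *newstring)
{
    LISTITEM *item = malloc(sizeof(LISTITEM));

    if(item == NULL) {
        perror("append_item");
        exit(EXIT_FAILURE);
    }
    item->string = malloc(strlen(newstring) + 1);   // +1 for the null byte
    if(item->string == NULL) {
        perror("append_item");
        exit(EXIT_FAILURE);
    }
    strcpy(item->string, newstring);
    item->next = NULL;                  // the new item will be the last one

    if(list == NULL) {                  // appending to an empty list
        list = item;
    }
    else {
        LISTITEM *p = list;

        while(p->next != NULL) {        // walk to the final item
            p = p->next;
        }
        p->next = item;
    }
}
```

The while loop visits every existing item on each call, so appending n items one at a time costs time proportional to n squared in total.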
Of course, we again need to be careful about the case of the empty list:
char *remove_item(void)
{
    LISTITEM *old = list;
    char *string;

    if(old == NULL) {
        fprintf(stderr, "cannot remove item from an empty list\n");
        exit(EXIT_FAILURE);
    }
    list   = list->next;
    string = old->string;
    free(old);
    return string;
}
Notice that we return the string (data value) to the caller, and deallocate the old node that was at the head of the list.
We say that the caller now owns the storage required to hold the string - even though the caller did not initially allocate that storage.
We're hoping to address the main problems that were exhibited by the stack and list data structures.
We'll address all of these by developing a similar first-in-first-out (FIFO) data structure, which we'll name a queue:
typedef struct _e {
    void        *data;
    size_t      datalen;
    struct _e   *next;
} ELEMENT;

typedef struct {
    ELEMENT     *head;
    ELEMENT     *tail;
} QUEUE;
Of note:
We've introduced a new datatype, ELEMENT, to hold each individual item of data.
Because we don't require our functions to "understand" the data they're queueing, each element will just hold a void pointer to the data it's holding,
and remember its length.
Our "traditional" datatype QUEUE now holds 2 pointers - one to the head of the list of items, one to the tail.
We thus need a function to allocate space for, and to initialize, a new queue:
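If we remember that:
the calloc function both allocates memory and sets all of its bytes to the zero-bit-pattern, and
(most) C99 implementations represent the NULL pointer as the zero-bit-pattern,
then a single call to calloc can both allocate the space for a new queue and initialize its head and tail pointers to NULL.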
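A minimal sketch of such an allocation function, assuming the name queue_new and relying on calloc's zero-filling to leave both pointers NULL:

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct _e {
    void        *data;
    size_t      datalen;
    struct _e   *next;
} ELEMENT;

typedef struct {
    ELEMENT     *head;
    ELEMENT     *tail;
} QUEUE;

// Assumed name: allocate and initialize a new, empty queue
QUEUE *queue_new(void)
{
    QUEUE *q = calloc(1, sizeof(QUEUE));    // all bytes zero: head, tail NULL

    if(q == NULL) {
        perror("queue_new");
        exit(EXIT_FAILURE);
    }
    return q;
}
```

On an implementation where NULL is not the zero-bit-pattern, the function would instead need to assign q->head = NULL and q->tail = NULL explicitly.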
To quickly add items - we don't wish appending to a very long queue to be slow.
We achieve this by remembering where the tail of the queue is, and quickly adding to it without searching.
To be able to queue data that we don't "understand".
We achieve this by treating all data as "a block of bytes", allocating memory for it, copying it (as we're told its length), all without ever interpreting its
contents.
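Both goals can be sketched in a single function. The name queue_add and its signature are assumptions; the sketch also assumes that datalen is greater than zero:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct _e {
    void        *data;
    size_t      datalen;
    struct _e   *next;
} ELEMENT;

typedef struct {
    ELEMENT     *head;
    ELEMENT     *tail;
} QUEUE;

// Assumed name: append a private copy of datalen bytes to the queue's tail
void queue_add(QUEUE *q, const void *data, size_t datalen)
{
    ELEMENT *e;

    assert(q != NULL && data != NULL && datalen > 0);

    e = malloc(sizeof(ELEMENT));
    if(e == NULL) {
        perror("queue_add");
        exit(EXIT_FAILURE);
    }
    e->data = malloc(datalen);
    if(e->data == NULL) {
        perror("queue_add");
        exit(EXIT_FAILURE);
    }
    memcpy(e->data, data, datalen);     // copied as bytes, never interpreted
    e->datalen = datalen;
    e->next    = NULL;

    if(q->tail == NULL) {               // the queue was empty
        q->head = e;
    }
    else {
        q->tail->next = e;              // link after the current tail
    }
    q->tail = e;                        // no traversal was required
}
```

Because the tail pointer is updated on every call, the cost of adding an item does not depend on the length of the queue.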
More common is to store data in a structure that embeds the relative magnitude or priority of the data. Doing so requires insertions to keep the data-structure ordered, but this makes searching much quicker as well.
Let's consider the type definition and insertion of data into a binary tree in C99:
Of note:
we've defined a data-structure containing two pointers to other instances of the data-structure.
the use of the struct _bt data type is temporary, and never used again.
here, each element of the data-structure, each node of the tree, holds a unique instance of a data value - here, a single integer - though it's very
common to hold multiple data values.
we insert into the tree with:
tree_root = tree_insert(tree_root, new_value);
the (magnitude of the) integer data value embeds the order of the structure - elements with lesser integer values are stored 'below' and to the left of
the current node, higher values to the right.
unlike some (more complicated) variants of the binary-tree, we've made no effort to keep the tree balanced. If we insert already sorted elements into
the tree, the tree will degenerate into a list, with every node having either a NULL left or a NULL right pointer.
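The type definition and insertion function discussed above might look as follows. This is a sketch: the field names value, left, and right match those used by the searching functions, but the body of tree_insert is an assumption, and tree_height is a hypothetical helper added only to show the effect of insertion order:

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct _bt {
    int         value;
    struct _bt  *left;
    struct _bt  *right;
} BINTREE;

// Sketch (assumed body): returns the root of the tree after the insertion
BINTREE *tree_insert(BINTREE *t, int value)
{
    if(t == NULL) {                     // found the place for a new node
        BINTREE *node = malloc(sizeof(BINTREE));

        if(node == NULL) {
            perror("tree_insert");
            exit(EXIT_FAILURE);
        }
        node->value = value;
        node->left  = NULL;
        node->right = NULL;
        return node;
    }
    if(value < t->value) {              // lesser values go to the left
        t->left  = tree_insert(t->left, value);
    }
    else if(value > t->value) {         // higher values go to the right
        t->right = tree_insert(t->right, value);
    }
    // an equal value is already present, and is stored only once
    return t;
}

// Hypothetical helper: the number of nodes on the longest path from the root
int tree_height(BINTREE *t)
{
    int l, r;

    if(t == NULL) {
        return 0;
    }
    l = tree_height(t->left);
    r = tree_height(t->right);
    return 1 + (l > r ? l : r);
}
```

Inserting 4, 2, 6, 1, 3, 5, 7 in that order produces a tree of height 3, whereas inserting 1 through 7 in sorted order produces a tree of height 7, in which every left pointer is NULL.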
bool find_recursively(BINTREE *t, int wanted)
{
    if(t != NULL) {
        int order = (t->value - wanted);

        if(order == 0) {
            return true;
        }
        else if(order > 0) {
            return find_recursively(t->left, wanted);
        }
        else {
            return find_recursively(t->right, wanted);
        }
    }
    return false;
}

bool find_iteratively(BINTREE *t, int wanted)
{
    while(t != NULL) {
        int order = (t->value - wanted);

        if(order == 0) {
            return true;
        }
        else if(order > 0) {
            t = t->left;
        }
        else {
            t = t->right;
        }
    }
    return false;
}
Of note:
we do not modify the tree when searching, we simply 'walk' over its elements, determining whether to go-left or go-right depending on the relative
value of each element's data to the wanted value.
some (more complicated) variants of the binary-tree re-balance the tree by moving recently found values (their nodes) closer to the root of the tree in
the hope that they'll be required again, soon.
if the required value is found, the searching functions return true; otherwise we keep walking the tree until we find the value or until we can no longer walk in the required direction (because either the left or the right pointer is NULL).