Data Structures and Algorithms

Data types define a set of values and operations that can be performed on those values. Common data types include integers, floating point numbers, characters, and Booleans. Arrays are a data structure that store multiple values of the same type in contiguous memory locations that can be individually accessed via indices. Multi-dimensional arrays store elements indexed by multiple integers, representing rows and columns. Proper choice of data structures is important for program design as it impacts performance and implementation difficulty.

DATA TYPE

In computer programming, a data type simply refers to a defined kind of data, that is, a
set of possible values and basic operations on those values. When applied in
programming languages, a data type defines a set of values and the allowable operations
on those values.

Data types are important in computer programmes because they classify data so that a
translator (compiler or interpreter) can reserve appropriate memory storage to hold all
possible values; for example, integers, real numbers, characters, strings, and Boolean values
all have very different representations in memory.

A data type consists of:

a. a domain (= a set of values)
b. a set of operations that may be applied to the values.

Data Type Classification


Some data items may be used singly whilst others may be combined together and
arranged to form other data items. The former are classified as ‘simple data types’
whereas the latter are classified as ‘data structures’. However, the following classification
is appropriate for study at this level. The simple data types are classified as follows:

a. Character
b. Numeric integer
c. Numeric real
d. Boolean (logical).

Examples of Data Types


Almost all programming languages explicitly include the notion of data type, though
different languages may use different terminology.
Common data types in programming languages include those that represent integers,
floating point numbers, and characters, and a language may support many more.

Example 1: the Boolean or logical data type, provided by most programming languages.


Two values: true, false.
Many operations, including: AND, OR, NOT, etc.

Example 2: In the Java programming language, the “int” type represents the set of 32-bit
integers ranging in value from -2,147,483,648 to 2,147,483,647, together with the operations,
such as addition, subtraction and multiplication, that can be performed on integers.

Abstract Data Type


An Abstract Data Type, commonly referred to as an ADT, is a collection of data objects
characterized by how the objects are accessed; it is an abstract human concept
meaningful outside of computer science. (Note that "object" here is a general abstract
concept as well: it can be an "element" (like an integer), a data structure (e.g. a list of
lists), or an instance of a class (e.g. a list of circles).) A data type is abstract in the sense
that it is independent of various concrete implementations.
Object-oriented languages such as C++ and Java provide explicit support for expressing
abstract data types by means of classes. A first-class abstract data type supports the
creation of multiple instances of the ADT, and the interface normally provides a constructor,
which returns an abstract handle to new data, and several operations, which are functions
accepting the abstract handle as an argument.
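For illustration, here is a minimal sketch in Java of how a class-based language can express an ADT. The hypothetical IntStack interface below (not from any particular library) names only the operations; any class implementing it supplies a concrete representation.

public interface IntStack {
    void push(int value);    // add a value to the top
    int pop();               // remove and return the top value
    boolean isEmpty();       // report whether the stack holds no values
}

A class implementing this interface may use an array, a linked structure, or anything else: the ADT is independent of the concrete implementation.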

Examples of Abstract Data Type


Common abstract data types (ADT) typically implemented in programming languages (or
their libraries) include: Arrays, Lists, Queues, Stacks and Trees.

What is a Data Structure?


A data structure is the implementation of an abstract data type in a
particular programming language. Data structures can also be referred to as “data
aggregates”. A carefully chosen data structure will allow the most efficient algorithm to be
used. Thus, a well-designed data structure allows a variety of critical operations to be
performed using as few resources, both execution time and memory space, as possible.

Classification of Data Structures


Data structures are broadly divided into two:
Linear Data Structures
Non-Linear Data Structures.

Linear Data Structures


Linear data structures are data structures in which individual data elements are stored and
accessed linearly in the computer memory. For the purpose of this course, the following
linear data structures will be studied: lists, stacks, queues and arrays, in order to
determine how information is processed during implementation.

Non-Linear Data Structures


A non-linear data structure, as the name implies, is a data structure in which the data
items are not stored linearly in the computer memory, but data items can be processed
using some techniques or rules. Typical non-linear data structures to be studied in this
course are Trees.

Data Structures and Programmes


The structure of data in the computer is very important in software programmes,
especially where the set of data is very large. When data is properly structured and stored
in the computer, the accessibility of data is easier and the software programme routines
that make use of the data are simpler; time and storage space are also reduced.

In the design of many types of programmes, the choice of data structures is a primary
design consideration, as experience in building large systems has shown that the
difficulty of implementation and the quality and performance of the final result depend
heavily on choosing the best data structure.

ARRAYS
In Computer Science, an array is a data structure consisting of a group of elements that
are accessed by indexing. Each data item of an array is known as an element, and the
elements are referenced by a common name known as the array name.

Arrays and Programming


In Java, as in most programming languages, an array is a structure that holds multiple
values of the same type. A Java array is itself an object. An array can contain data
of the primitive data types. As it is an object, an array must be declared and instantiated.
For example:

int[] anArray;
anArray = new int[10];

An array can also be created using a shortcut. For example:

int[] anArray = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

An array element can be accessed using an index value. For example:

int i = anArray[5];

The size of an array can be found using the length attribute. For example:

int len = anArray.length;
Before any array is used in the computer, some memory locations have to be created for
storage of the elements. This is often done by using the DIM instruction of BASIC
programming language or DIMENSION instruction of FORTRAN programming
language. For example, the instruction:
DIM LAGOS (45)
will create 45 memory locations for storage of the elements of the array called LAGOS.
In most programming languages, each element has the same data type
and the array occupies a contiguous area of storage. Most programming languages have a
built-in array data type. Some programming languages support array programming which
generalises operations and functions to work transparently over arrays as they do with
scalars, instead of requiring looping over array members.

Declaration of Arrays
Variables normally only store a single value but, in some situations, it is useful to have a
variable that can store a series of related values – using an array. For example, suppose a
programme is required that will calculate the average age among a group of six students.
The ages of the students could be stored in six integer variables in C:
int age1;
int age2;
int age3;
However, a better solution would be to declare a six-element array:
int age[6];
This creates a six element array; the elements can be accessed as age[0] through age[5] in
C.
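As a sketch (with illustrative sample values) of how the array removes the need for separate variables, the following Java fragment computes the average age with a loop:

int[] age = {18, 21, 19, 20, 22, 18};      // sample values, for illustration only
int sum = 0;
for (int k = 0; k < age.length; k++) {
    sum += age[k];                          // accumulate every element
}
double average = (double) sum / age.length; // average over all six ages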
A two-dimensional array (in which the elements are arranged into rows and columns)
declared by say DIM X(3,4) can be stored as linear arrays in the computer memory by
determining the product of the subscripts.
The above can thus be expressed as DIM X (3 * 4) or DIM X (12).

Multi-dimensional arrays can be stored as linear arrays in order to reduce the
computation time and memory.

Multi-dimensional Arrays
Ordinary arrays are indexed by a single integer. Also useful, particularly in numerical and
graphics applications, is the concept of a multi-dimensional array, in which we index into
the array using an ordered list of integers, such as in a[3,1,5]. The number of integers in
the list used to index into the multi-dimensional array is always the same and is referred
to as the array's dimensionality, and the bounds on each of these are called the array's
dimensions. An array with dimensionality k is often called k-dimensional. One-
dimensional arrays correspond to the simple arrays discussed thus far; two-dimensional
arrays are a particularly common representation for matrices. In practice, the
dimensionality of an array rarely exceeds three. Mapping a one-dimensional array into
memory is obvious, since memory is logically itself a (very large) one-dimensional array.
When we reach higher-dimensional arrays, however, the problem is no longer obvious.
Suppose we want to represent a simple two-dimensional array whose elements are arranged
into rows and columns. It is most common to index such an array using the RC-convention,
where elements are referred to in row, column fashion: a[i,j] denotes the element in row i
and column j.

Multi-dimensional arrays are typically represented by one-dimensional arrays of
references (Iliffe vectors) to other one-dimensional arrays. The sub-arrays can be either
the rows or columns.
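A minimal Java sketch of the row-major mapping described above, using the 3-by-4 shape from the DIM X(3,4) example; the element at (row, col) is stored at linear index row * cols + col:

int rows = 3, cols = 4;
int[] x = new int[rows * cols];   // linear storage for a conceptual 3x4 table

// store a value at "row 1, column 2" of the conceptual 2-D array
int row = 1, col = 2;
x[row * cols + col] = 99;         // row-major mapping

// read it back the same way
int value = x[row * cols + col];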

Classification of Arrays
Arrays can be classified as static arrays (i.e. whose size cannot change once their storage
has been allocated), or dynamic arrays, which can be resized.

Processing Arrays
Although array-based iteration is useful when dealing with very simple data structures, it
is quite difficult to construct generalized algorithms that do much more than process
every element of an array from start to finish. For example, suppose you want to process
only every second item; include or exclude specific values based on some selection
criteria; or even process the items in reverse order. Being tied to arrays also makes it
difficult to write applications that operate on databases or files without first copying the

data into an array for processing. Using simple array-based iteration not only ties
algorithms to using arrays, but also requires that the logic for determining which elements
stay, which go, and in which order to process them, is known in advance. Even worse, if
you need to perform the iteration in more than one place in your code, you will likely end
up duplicating the logic. This clearly isn’t a very extensible approach. Instead, what’s
needed is a way to separate the logic for selecting the data from the code that actually
processes it. An iterator (also known as an enumerator) solves these problems by
providing a generic interface for looping over a set of data so that the underlying data
structure or storage mechanism—such as an array, database, and so on—is hidden.
Whereas simple iteration generally requires you to write specific code to handle where
the data is sourced from or even what kind of ordering or preprocessing is required, an
iterator enables you to write simpler, more generic algorithms. An iterator provides a
number of operations for traversing and accessing data.
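As a sketch of the idea, the hypothetical class below wraps an array behind Java's standard java.util.Iterator interface, so that client code can loop over the data without knowing that an array is underneath:

import java.util.Iterator;

public class ArrayIterator<T> implements Iterator<T> {
    private final T[] values;
    private int position = 0;            // index of the next element to return

    public ArrayIterator(T[] values) { this.values = values; }

    public boolean hasNext() { return position < values.length; }

    public T next() { return values[position++]; }  // return current, then advance
    // (a sketch: a production version would throw NoSuchElementException when exhausted)
}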

A Reverse Iterator
Sometimes you will want to reverse the iteration order without changing the code that
processes the values. Imagine an array of names that is sorted in ascending order, A to Z,
and displayed to the user somehow. If the user chose to view the names sorted in
descending order, Z to A, you might have to re-sort the array or at the very least
implement some code that traversed the array backward from the end. With a reverse
iterator, however, the same behavior can be achieved without re-sorting and without
duplicated code. When the application calls first(), the reverse iterator actually calls last()
on the underlying iterator. When the application calls next(), the underlying iterator’s
previous() method is invoked, and so on. In this way, the behavior of the iterator can be
reversed without changing the client code that displays the results, and without re-sorting
the array, which could be quite processing intensive, as you will see when you write some
sorting algorithms.
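A minimal sketch of the same idea in Java, assuming the ArrayIterator above: this simplified ReverseArrayIterator walks the array from the end, so the client loop that calls hasNext() and next() is unchanged:

import java.util.Iterator;

// Walks the same array backward; client code using hasNext()/next() does not change.
public class ReverseArrayIterator<T> implements Iterator<T> {
    private final T[] values;
    private int position;                // next element to return, counting down

    public ReverseArrayIterator(T[] values) {
        this.values = values;
        this.position = values.length - 1;
    }

    public boolean hasNext() { return position >= 0; }

    public T next() { return values[position--]; }
}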

Applications of Arrays
Arrays are employed in many computer applications in which data items need to be saved
in the computer memory for subsequent reprocessing. Due to their performance
characteristics, arrays are used to implement other data structures, such as heaps, hash
tables, deques, queues, stacks and strings.

LIST DATA STRUCTURE


Lists are among the most fundamental data structures: most other data structures are built
upon them, and many algorithms operate on them. It's not hard to find examples of lists in the
real world: shopping lists, to-do lists, train timetables, order forms, even this “list of
lists.” Much like arrays, lists are generally useful in most applications you will write. A
list is an ordered collection of elements supporting random access to each element, much
like an array—you can query a list to get the value contained at any arbitrary element.
Lists also preserve insertion order so that, assuming there are no intervening
modifications, a given list will always return the same value for the same position. Like
arrays, lists make no attempt to preserve the uniqueness of values, meaning a list may
contain duplicate values. For example, if you had a list containing the values
“swimming”, “cycling”, and “dancing” and you were to add “swimming” again, you

would now find that the list had grown to include two copies of “swimming”. The major
difference between arrays and lists, however, is that whereas an array is fixed in size, lists
can resize—growing and shrinking—as necessary.

A list data structure is a sequential data structure, i.e. a collection of items accessible one
after the other, beginning at the head and ending at the tail. It is a widely used data
structure for applications which do not need random access. Lists differ from the stacks
and queues data structures in that additions and removals can be made at any position in
the list.

Elements of a List
The sentence “Dupe is not a boy” can be written as a list as follows:

DUPE IS NOT A BOY

Fig. 1: Elements of a list

We regard each word in the sentence above as a data-item or datum, which is linked to
the next datum, by a pointer. Datum plus pointer make one node of a list. The last pointer
in the list is called a terminator. It is often convenient to speak of the first item as the
head of the list, and the remainder of the list as the tail.

Operations
The main primitive operations of a list are known as:

Add adds a new node
Set updates the contents of a node
Remove removes a node
Get returns the value at a specified index
IndexOf returns the index in the list of a specified element

Additional primitives can be defined:

IsEmpty reports whether the list is empty
IsFull reports whether the list is full
Initialise creates/initialises the list
Destroy deletes the contents of the list (may be implemented by re-initialising the list)

Initialise
Creates the structure – i.e. ensures that the structure exists but contains no
elements. e.g. Initialise(L) creates a new empty list named L

Add
e.g. Add(1,X,L) adds the value X to list L at position 1 (the start of the list is position 0),
shifting subsequent elements up

L: A B C

Fig. 2: List before adding value

L: A X B C

Fig. 3: List after adding value

Set
e.g. Set(2,Z,L) updates the value at position 2 to be Z

L: A X Z C

Fig. 4: List after update

Remove
e.g. Remove(Z,L) removes the node with value Z

L: A X Z C

Fig. 5: List before removal

L: A X C

Fig. 6: List after removal

Get
e.g. Get(2,L) returns the value of the third node, i.e. C

IndexOf
e.g. IndexOf(X,L) returns the index of the node with value X, i.e. 1
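The same sequence of operations can be tried with Java's built-in java.util.List, whose method names happen to line up closely with the primitives above; a sketch:

import java.util.ArrayList;
import java.util.List;

List<String> l = new ArrayList<>(List.of("A", "B", "C"));
l.add(1, "X");               // Add(1,X,L)   -> A X B C
l.set(2, "Z");               // Set(2,Z,L)   -> A X Z C
l.remove("Z");               // Remove(Z,L)  -> A X C
String third = l.get(2);     // Get(2,L)     -> "C"
int where = l.indexOf("X");  // IndexOf(X,L) -> 1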

List Implementation
There are many ways to implement a list depending on how the programmer will use lists
in their programme. The two most common, are an array-based implementation and a
linked list.

1. Array List: As the name suggests, an array list uses an array to hold the values.
2. Linked List: A linked list, conversely, is a chain of elements in which each item has a
reference (or link) to the next (and optionally previous) element.

Array Lists
As the name suggests, an array list uses an array as the underlying mechanism for storing
elements. Because of this, the fact that you can index directly into arrays makes
implementing access to elements very easy. It also makes an array list the fastest
implementation for indexed and sequential access. The downside to using an array is that
each time you insert a new element; you need to shift any elements in higher positions
one place to the right by physically copying them. Similarly, when deleting an existing
element, you need to shift any objects in higher positions one place to the left to fill the
gap left by the deleted element. Additionally, because arrays are fixed in size, anytime

8
you need to increase the size of the list, you also need to reallocate a new array and copy
the contents over. This clearly affects the performance of insertion and deletion.

Properties of Array List:


1. The position of each element is given by an index from 0 to n-1, where n is the number
of elements.
2. Given any index, the element with that index can be accessed in constant time – i.e. the
time to access does not depend on the size of the list.
3. To add an element at the end of the list, the time taken does not depend on the size of
the list. However, the time taken to add an element at any other point in the list does
depend on the size of the list, as all subsequent elements must be shifted up. Additions
near the start of the list take longer than additions near the middle or end.
4. When an element is removed, subsequent elements must be shifted down, so removals
near the start of the list take longer than removals near the middle or end.
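A sketch of the shifting described in properties 3 and 4, assuming a plain int array with a separate size count; System.arraycopy does the physical copying:

int[] data = new int[10];   // backing array with spare capacity
int size = 3;               // suppose positions 0..2 are in use

// insert value 42 at index 1: shift elements 1..size-1 one place right
System.arraycopy(data, 1, data, 2, size - 1);
data[1] = 42;
size++;

// remove index 1 again: shift elements 2..size-1 one place left
System.arraycopy(data, 2, data, 1, size - 2);
size--;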

Linked List
The Linked List is stored as a sequence of linked nodes. Rather than use an array to hold
the elements, a linked list contains individual elements with links between them. As in
the case of the stack, each node in a linked list contains data AND a reference to the next
node. It also makes insertion and deletion much simpler than it is for an array list.

The Linked List has the following properties:

• The list can grow and shrink as needed.
• The position of each element is given by an index from 0 to n-1, where n is the number
of elements.
• Given any index, the time taken to access an element with that index depends on the
index. This is because each element of the list must be traversed until the required index
is found.
• The time taken to add an element at any point in the list does not depend on the size of
the list, as no shifts are required. It does, however, depend on the index. Additions near
the end of the list take longer than additions near the middle or start. The same applies to
the time taken to remove an element. A list needs a reference to the front node.
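A minimal sketch of a linked-list node in Java; the field names mirror the DataItem/NextNode terminology used later in this unit:

public class Node {
    int dataItem;     // the datum stored in this node
    Node nextNode;    // reference to the next node, or null at the tail

    Node(int dataItem) { this.dataItem = dataItem; }
}

// building a tiny list: head -> 1 -> 2
Node head = new Node(1);
head.nextNode = new Node(2);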
There are many variations on the Linked List data structure, including:

i. Singly Linked Lists


A singly linked list is a data structure in which the data items are chained (linked) in one
direction. Figure 7 shows an example of a singly linked list.

Figure 7: A singly linked list (nodes a1 … an, reached from a header reference and ending at the tail)

ii. Circularly Linked Lists


In a circularly linked list, the tail of the list always points to the head of the list.

iii. Doubly Linked Lists
This permits scanning or searching of the list in both directions. (To go backwards in a
simple list, it is necessary to go back to the start and scan forwards.) In this case, the node
structure is altered to have two links. This double linking makes it possible to traverse the
elements in either direction. It also makes insertion and deletion much simpler than it is
for an array list.

Figure 8: A doubly linked list (nodes a1 … an, reached from a header reference, with links in both directions)

As you might recall from the discussion on array lists, in most cases when deleting or
inserting, some portion of the underlying array needs to be copied. With a linked list,
however, each time you wish to insert or delete an element, you need only update the
references to and from the next and previous elements, respectively. This makes the cost
of the actual insertion or deletion almost negligible in all but the most extreme cases. For
lists with extremely large numbers of elements, the traversal time can be a performance
issue. A doubly linked list also maintains references to the first and last elements in the
list—often referred to as the head and tail, respectively. This enables you to access either
end with equal performance.

iv. Sorted Lists


Lists can be designed to be maintained in a given order. In this case, the Add method will
search for the correct place in the list to insert a new data item.

THE STACK DATA STRUCTURE


A stack is a linear data structure in which all insertions and deletions of data are made
only at one end of the stack, often called the top of the stack. For this reason, a stack is
referred to as a LIFO (last-in-first-out) structure.

Figure 9 shows a stack, with Push adding items at the top and Pop removing them from the top.

A frequently used metaphor is the idea of a stack of plates in a spring loaded cafeteria
stack. In such a stack, only the top plate is visible and accessible to the user, all other
plates remain hidden. As new plates are added, each new plate becomes the top of the
stack, hiding each plate below, pushing the stack of plates down. As the top plate is
removed from the stack, the plates pop back up, and the second plate becomes the top of
the stack.

Application of Stacks
Stacks are used extensively at every level of a modern computer system. For example, a
modern PC uses stacks at the architecture level, which are used in the basic design of an
operating system for interrupt handling and operating system function calls. Among other
uses, stacks are used to run a Java Virtual Machine, and the Java language itself has a
class called "Stack", which can be used by the programmer.

Stacks have many other applications. For example, as a processor executes a programme,
when a function call is made, the called function must know how to return to the
programme, so the current address of programme execution is pushed onto a stack. Once
the function is finished, the address that was saved is removed from the stack, and
execution of the programme resumes. If a series of function calls occurs, the successive
return addresses are pushed onto the stack in LIFO order so that each function can return
to its calling programme. Stacks support recursive function calls and subroutine calls,
especially when “reverse polish notation” is involved.
Solving a search problem, regardless of whether the approach is exhaustive or optimal,
needs stack space. Examples of exhaustive search methods are brute force and
backtracking. Examples of optimal search exploring methods are branch and bound and
heuristic solutions. All of these algorithms use stacks to remember the search nodes that
have been noticed but not explored yet.
Another common use of stacks at the architecture level is as a means of allocating and
accessing memory.

Fig. 10: Basic Architecture of a Stack

Operations on a Stack
The stack is usually implemented with two basic operations known as "push" and "pop".
Thus, two operations applicable to all stacks are:
A push operation, in which a data item is placed at the location pointed to by the stack
pointer and the address in the stack pointer is adjusted by the size of the data item; Push
adds a given node to the top of the stack leaving previous nodes below.
A pop or pull operation, in which a data item at the current location pointed to by the
stack pointer is removed, and the stack pointer is adjusted by the size of the data item.
Pop removes and returns the current top node of the stack.
The main primitives of a stack are known as:
Push adds a new node
Pop removes a node
Figure 11 shows the insertion of three data X, Y and Z to a stack and the removal of two
data, Z and Y, from the stack.

Fig. 11: Insertion and removal of data from a stack. The sequence is: Empty, Push X, Push Y, Push Z, Pop Z, Pop Y; the top pointer moves to X, then Y, then Z, and back to Y and X.

Additional primitives can be defined:


IsEmpty reports whether the stack is empty
IsFull reports whether the stack is full
Initialise creates/initialises the stack
Destroy deletes the contents of the stack
(may be implemented by re-initialising the stack)

Initialise
Creates the structure – i.e. ensures that the structure exists but contains
no elements. e.g. Initialise(S) creates a new empty stack named S

Push
e.g. Push(X,S) adds the value X to the top of the stack S

Fig. 12: Stack S after adding the value X

Pop

e.g. Pop(S) removes the TOP node and returns its value

Fig. 13: Stack after removing the top node
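Putting the primitives together, here is a minimal sketch of a static (array-based) stack in Java; the capacity of 100 is illustrative:

public class ArrayStack {
    private final int[] items = new int[100];  // fixed capacity: a static structure
    private int top = 0;                       // index of the next free slot

    public void push(int value) { items[top++] = value; }  // place item, adjust pointer

    public int pop() { return items[--top]; }  // adjust pointer, return item

    public boolean isEmpty() { return top == 0; }
    public boolean isFull()  { return top == items.length; }
}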

Stack Storage Modes


A stack can be stored in two ways:
a static data structure
OR
a dynamic data structure

Static Data Structures


These define collections of data which are fixed in size when the
programme is compiled.

Dynamic Data Structures


These define collections of data which are variable in size and structure.
They are created as the programme executes, and grow and shrink to accommodate the
data being stored.

THE QUEUE DATA STRUCTURE


Queues are an essential part of algorithms that manage the allocation and scheduling of
work, events, or messages to be processed. They are often used as a way of enabling
different processes— either on the same or different machines—to communicate with
one another.

Customers line up in a bank waiting to be served by a teller and in supermarkets waiting
to check out. No doubt you’ve been stuck waiting in a line to speak to a customer service
representative at a call center. In computing terms, however, a queue is a list of data
items stored in such a way that they can be retrieved in a definable order. The main
distinguishing feature between a queue and a list is that whereas all items in a list are
accessible—by their position within the list—the only item you can ever retrieve from a
queue is the one at the head. Which item is at the head depends on the specific queue
implementation.

More often than not, the order of retrieval is indeed the same as the order of insertion
(also known as first-in-first-out, or FIFO), but there are other possibilities as well. Some
of the more common examples include a last-in-first-out queue and a priority queue,
whereby retrieval is based on the relative priority of each item. You can even create a
random queue that effectively “shuffles” the contents. Queues are often described in
terms of producers and consumers. A producer is anything that stores data in a queue,
while a consumer is anything that retrieves data from a queue.

Queues can be either bounded or unbounded. Bounded queues have limits placed on the
number of items that can be held at any one time. These are especially useful when the
amount of available memory is constrained—for example, in a device such as a router or
even an in-memory message queue. Unbounded queues, conversely, are free to grow in
size as the limits of the hardware allow.

The queue data structure is characterised by the fact that additions are made at the end, or
tail, of the queue while removals are made from the front, or head of the queue. For this
reason, a queue is referred to as a FIFO structure (First-In First-Out). Figure 14 shows a
queue of some letters of the English alphabet.

Fig. 14: Example of a Queue (insertion of the last datum at the tail; deletion of the first datum from the head)

Application of Queues
Queues are very important structures in computer simulations, data processing,
information management, and in operating systems.
In simulations, queue structures are used to represent real-life events such as car queues
at traffic light junctions and petrol filling stations, queues of people at the check-out point
in supermarkets, queues of bank customers, etc.

In operating systems, queue structures are used to represent different programmes in the
computer memory in the order in which they are executed. For example, if a programme J
is submitted before programme K, then programme J is queued before programme K in
the computer memory and programme J is executed before programme K.

Operations on a Queue
The main primitive operations on a queue are known as:
Enqueue: Stores a value in the queue. The size of the queue will increase by one.
Dequeue: Retrieves the value at the head of the queue. The size of the queue will
decrease by one. Throws EmptyQueueException if there are no more items in the queue.
Clear: Deletes all elements from the queue. The size of the queue will be reset to zero
(0).
Size: Obtains the number of elements in the queue.
IsEmpty: Determines whether the queue is empty (size() = 0) or not.

Additional primitives can be defined thus:


IsFull reports whether the queue is full
Initialise creates/initialises the queue

Initialise
Creates the structure – i.e. ensures that the structure exists but contains no elements.
e.g. Initialise(Q) creates a new empty queue named Q
Add
e.g. Add(X,Q) adds the value X to the tail of Q

Q: X

Fig. 15: Queue after adding the value X to the tail of Q. Then, Add(Y,Q) adds the value
Y to the tail of Q:

X Y

Fig. 16: Queue after adding the value Y to the tail of Q

Remove
e.g. Remove(Q) removes the head node and returns its value

Q: Y

Fig. 17: Queue after removing X from the head node

Other Queue Operations


Action          Contents of queue Q after operation    Return value
Initialise(Q)   empty                                  -
Add(A,Q)        A                                      -
Add(B,Q)        A B                                    -
Add(C,Q)        A B C                                  -
Remove(Q)       B C                                    A
Add(F,Q)        B C F                                  -
Remove(Q)       C F                                    B
Remove(Q)       F                                      C
Remove(Q)       empty                                  F

Storing a Queue in a Static Data Structure


This implementation stores the queue in an array. The array indices at which the head and
tail of the queue are currently stored must be maintained. The head of the queue is not
necessarily at index 0. The array can be a “circular array” in which the queue “wraps
round” if the last index of the array is reached.
Figure 18 below is an example of storing a queue in an array of length 5:

Fig. 18: Storing a queue in an array
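A minimal Java sketch of such a circular array, using the modulo operator so that the head and tail indices wrap round to 0 after the last index; the length of 5 matches Figure 18, and checks for a full or empty queue are omitted:

public class CircularQueue {
    private final int[] items = new int[5];  // same length-5 array as Figure 18
    private int head = 0;   // index of the front element
    private int size = 0;   // how many elements are currently stored

    public void add(int value) {             // enqueue at the tail (assumes not full)
        items[(head + size) % items.length] = value;
        size++;
    }

    public int remove() {                    // dequeue from the head (assumes not empty)
        int value = items[head];
        head = (head + 1) % items.length;    // head "wraps round"
        size--;
        return value;
    }
}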

Storing a Queue in a Dynamic Data Structure
A queue requires a reference to the head node AND a reference to the tail node. The
following diagram describes the storage of a queue called Queue. Each node consists of
data (DataItem) and a reference (NextNode).

· The first node is accessed using the name Queue.Head.
· Its data is accessed using Queue.Head.DataItem.
· The second node is accessed using Queue.Head.NextNode.
· The last node is accessed using Queue.Tail.

Adding a Node (Add)


The new node is to be added at the tail of the queue. The reference Queue.Tail should
point to the new node, and the NextNode reference of the node previously at the tail of
the queue should point to the new node.

Removing a Node (Remove)


The value of Queue.Head.DataItem is returned. A temporary reference, Temp, is
declared and set to point to the head node in the queue (Temp = Queue.Head). Queue.Head
is then set to point to the second node instead of the first. The only reference to the
original head node is now Temp, and the memory used by this node can then be freed.
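A sketch of these two operations in Java, keeping the Queue.Head and Queue.Tail naming from the text; the remove operation assumes the queue is not empty:

public class LinkedQueue {
    static class Node {
        int dataItem;
        Node nextNode;
        Node(int d) { dataItem = d; }
    }

    private Node head;   // Queue.Head
    private Node tail;   // Queue.Tail

    public void add(int value) {                    // add at the tail
        Node newNode = new Node(value);
        if (tail != null) tail.nextNode = newNode;  // old tail points to the new node
        else head = newNode;                        // queue was empty
        tail = newNode;
    }

    public int remove() {                           // remove from the head
        Node temp = head;                           // the Temp reference from the text
        head = head.nextNode;                       // head now points to the second node
        if (head == null) tail = null;              // queue became empty
        return temp.dataItem;                       // old node is then garbage-collected
    }
}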

Blocking Queues
Queues are often used in multi-threaded environments as a form of interprocess
communication. Unfortunately, a plain FIFO queue is unsafe for use in situations where
multiple consumers would be accessing it concurrently. Instead, a blocking queue is one
way to provide a thread-safe implementation, ensuring that all access to the data is
correctly synchronized. The first main enhancement that a blocking queue offers over a
regular queue is that it can be bounded.

So far, we have only dealt with unbounded queues—those that continue to grow without
limit. The blocking queue enables you to set an upper limit on the size of the queue.
Moreover, when an attempt is made to store an item in a queue that has reached its limit,
the queue will, you guessed it, block the thread until space becomes available—either by
removing an item or by calling clear(). In this way, you guarantee that the queue will
never exceed its predefined bounds. The second major feature affects the behavior of
dequeue(). When an attempt is made to retrieve an item from an empty queue, a blocking
queue will block the current thread until an item is enqueued. This is good for
implementing work queues where multiple, concurrent consumers need to wait until there
are more tasks to perform.
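Java's standard library provides exactly this behaviour in java.util.concurrent; the sketch below uses the real ArrayBlockingQueue class, whose put() blocks when the queue is full and take() blocks when it is empty (both can throw InterruptedException, so real code would run inside a method that handles it):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

BlockingQueue<String> tasks = new ArrayBlockingQueue<>(10);  // bounded at 10 items

// producer thread: blocks if the queue already holds 10 items
tasks.put("work item");

// consumer thread: blocks until an item is available
String next = tasks.take();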

The Scheduler Model


The scheduler model consists of a minimum of 5 processors and a maximum of 10
processors. It involves a centralized dynamic scheduling scheme in which all tasks arrive
at a central processor, called the scheduler. The scheduler has a task queue attached to it;
the task queue holds newly arriving tasks. The role of the central scheduler is to
distribute the tasks to other processors in the system for execution. There is a dispatch
queue associated with each processor, and the communication between the scheduler and
the processors is through the dispatch queues.

The scheduler makes sure that each dispatch queue is filled with a minimum number of
tasks so that a processor can always find a task in its dispatch queue when it finishes
execution of a task. The scheduler determines a feasible schedule based on the worst-case
computation times of tasks, satisfying their timing and resource constraints. The
scheduling algorithm has full knowledge about the currently active set of tasks, but not
about the new set of tasks that may arrive while scheduling the current task set. The
objective of the dynamic scheduling is to minimize the makespan, thereby improving the
guarantee ratio. The guarantee ratio is the percentage of tasks that arrived in the system
whose deadlines are met. The scheduler must also guarantee that the tasks already
scheduled are going to meet their deadlines. The scheduler model is shown in Fig. 3.1.

Fig. 3.1: The Scheduler Model. The task queue feeds the central scheduler, which fills the dispatch queues of processors P1 to P10.
(Source: Oluwadare, 2009)
TREES DATA STRUCTURE
A tree is often used to represent a hierarchy. This is because the relationships
between the items in the hierarchy suggest the branches of a botanical tree.

In a diagram of a simple unordered tree, a node labeled 7 may have two children, labeled 2
and 6, and one parent, labeled 2; the root node, at the top, has no parent. In computer
science, a tree is a widely-used data structure that emulates a hierarchical tree structure
with a set of linked nodes. A node is a structure which may contain a value, a condition,
or represent a separate data structure (which could be a tree of its own). Each node in a
tree has zero or more child nodes, which are below it in the tree (by convention, trees are
drawn growing downwards). A node that has a child is called the child's parent node (or
ancestor node, or superior). A node has at most one parent. Nodes that do not have any
children are called leaf nodes; they are also referred to as terminal nodes.

The height of a node is the length of the longest downward path to a leaf from that node.
The height of the root is the height of the tree. The depth of a node is the length of the
path to its root (i.e., its root path). This is commonly needed in the manipulation of the
various self-balancing trees, AVL trees in particular. Conventionally, the value −1
corresponds to a subtree with no nodes, whereas zero corresponds to a subtree with one
node.

The topmost node in a tree is called the root node. Being the topmost node, the root node
will not have parents. It is the node at which operations on the tree commonly begin
(although some algorithms begin with the leaf nodes and work up ending at the root). All
other nodes can be reached from it by following edges or links. (In the formal definition,
each such path is also unique). In diagrams, it is typically drawn at the top. In some trees,
such as heaps, the root node has special properties. Every node in a tree can be seen as
the root node of the subtree rooted at that node.

An internal node or inner node is any node of a tree that has child nodes and is thus not
a leaf node. Similarly, an external node or outer node is any node that does not have
child nodes and is thus a leaf.

A subtree of a tree T is a tree consisting of a node in T and all of its descendants in T.
(This is different from the formal definition of subtree used in graph theory.) The subtree
corresponding to the root node is the entire tree; the subtree corresponding to any other
node is called a proper subtree (in analogy to the term proper subset).

Fig 1: General structure for a tree, showing the root node, edges or links, a right child
node, and leaf nodes. The tree shown forms a complete tree with height 3, where height is
defined as the number of links from the root to the deepest leaf.

Key terms

Root Node
Node at the "top" of a tree - the one from which all operations on the tree
commence. The root node may not exist (a NULL tree with no nodes in it) or
have 0, 1 or 2 children in a binary tree.
Leaf Node
Node at the "bottom" of a tree - farthest from the root. Leaf nodes have no
children.
Complete Tree
Tree in which each leaf is at the same distance from the root. A more precise and
formal definition of a complete tree is set out later.
Height
Number of nodes which must be traversed from the root to reach a leaf of a tree.

Binary Trees

The simplest form of tree is a binary tree. A binary tree consists of

a. a node (called the root node) and


b. left and right sub-trees.
c. Both the sub-trees are themselves binary trees.

A binary tree

The nodes at the lowest levels of the tree (the ones with no sub-trees) are called leaves.

In an ordered binary tree,

1. the keys of all the nodes in the left sub-tree are less than that of the root,
2. the keys of all the nodes in the right sub-tree are greater than that of the root,
3. the left and right sub-trees are themselves ordered binary trees.
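A minimal sketch of an ordered binary tree node in Java, with a recursive insertion that preserves the key-ordering rules above; duplicates are simply ignored in this sketch:

public class BstNode {
    int key;
    BstNode left, right;    // left and right sub-trees

    BstNode(int key) { this.key = key; }

    // insert a key so that smaller keys go left and greater keys go right
    static BstNode insert(BstNode root, int key) {
        if (root == null) return new BstNode(key);   // empty spot found
        if (key < root.key) root.left = insert(root.left, key);
        else if (key > root.key) root.right = insert(root.right, key);
        return root;                                  // duplicate keys are ignored
    }
}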

Traversal methods
There are many different applications of trees. As a result, there are many different
algorithms for manipulating them. However, many of the different tree algorithms
have in common the characteristic that they systematically visit all the nodes in the
tree. That is, the algorithm walks through the tree data structure and performs
some computation at each node in the tree. This process of walking through the
tree is called a tree traversal.

Stepping through the items of a tree, by means of the connections between parents and
children, is called walking the tree, and the action is a walk of the tree. Often, an
operation might be performed when a pointer arrives at a particular node. A walk in
which each parent node is traversed before its children is called a pre-order walk; a walk
in which the children are traversed before their respective parents are traversed is called a
post-order walk; a walk in which a node's left subtree, then the node itself, and then
finally its right subtree are traversed is called an in-order traversal. (This last scenario,
referring to exactly two subtrees, a left subtree and a right subtree, assumes specifically a
binary tree.)

Preorder Traversal
The first depth-first traversal method we consider is called preorder
traversal. Preorder traversal is defined recursively as follows: To do a
preorder traversal of a general tree:
1. Visit the root first; and then
2. Do a preorder traversal of each of the subtrees of the root one-by-one in the order
given.
Preorder traversal gets its name from the fact that it visits the root first.
In the case of a binary tree, the algorithm becomes:
1. Visit the root first; and then
2. Traverse the left subtree; and then
3. Traverse the right subtree.

Notice that the preorder traversal visits the nodes of the tree in precisely the same
order in which they are written. A preorder traversal is often done when it is
necessary to print a textual representation of a tree.

Postorder Traversal
The second depth-first traversal method we consider is postorder
traversal. In contrast with preorder traversal, which visits the root first, postorder
traversal visits the root last. To do a postorder traversal of a general tree:

1. Do a postorder traversal of each of the subtrees of the root one-by-one in the order
given; and then
2. Visit the root.
To do a postorder traversal of a binary tree
1. Traverse the left subtree; and then
2. Traverse the right subtree; and then
3. Visit the root.

Inorder Traversal
The third depth-first traversal method is inorder traversal. Inorder
traversal only makes sense for binary trees. Whereas preorder traversal visits the
root first and postorder traversal visits the root last, inorder traversal visits the root
in between visiting the left and right subtrees:
1. Traverse the left subtree; and then
2. Visit the root; and then
3. Traverse the right subtree.
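The three traversals translate almost line-for-line into recursive Java methods; a sketch, assuming a node type with key, left and right fields (such as the BstNode sketch earlier):

static void preorder(BstNode n) {
    if (n == null) return;
    System.out.println(n.key);   // visit the root first
    preorder(n.left);            // then the left subtree
    preorder(n.right);           // then the right subtree
}

static void inorder(BstNode n) {
    if (n == null) return;
    inorder(n.left);
    System.out.println(n.key);   // root in between the two subtrees
    inorder(n.right);
}

static void postorder(BstNode n) {
    if (n == null) return;
    postorder(n.left);
    postorder(n.right);
    System.out.println(n.key);   // root last
}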

Common operations on Trees

• Enumerating all the items
• Enumerating a section of a tree
• Searching for an item
• Adding a new item at a certain position on the tree
• Deleting an item
• Removing a whole section of a tree (called pruning)
• Adding a whole section to a tree (called grafting)
• Finding the root for any node

Common uses of Trees

• Manipulate hierarchical data
• Make information easy to search
• Manipulate sorted lists of data
• As a workflow for compositing digital images for visual effects

General n-ary trees

If we relax the restriction that each node can have only one key, we can reduce the
height of the tree.

An m-way search tree

a. is empty, or
b. consists of a root containing j (1 <= j < m) keys, k1 … kj, and
a set of sub-trees, Ti (i = 0..j), such that:
   i. if k is a key in T0, then k <= k1
   ii. if k is a key in Ti (0 < i < j), then ki <= k <= ki+1
   iii. if k is a key in Tj, then k > kj, and
   iv. all Ti are non-empty m-way search trees or all Ti are empty

Or, in plain English:

A node generally has m-1 keys and m children. Each node has alternating sub-tree
pointers and keys:

sub-tree | key | sub-tree | key | ... | key | sub-tree

   i. All keys in a sub-tree to the left of a key are smaller than it.
   ii. All keys in the node between two keys are between those two keys.
   iii. All keys in a sub-tree to the right of a key are greater than it.
   iv. This is the "standard" recursive part of the definition.

A B-tree of order m is an m-way tree in which

a. all leaves are on the same level, and
b. all nodes except for the root and the leaves have at least m/2 children and at most
m children. The root has at least 2 children and at most m children.

A variation of the B-tree, known as a B+-tree, considers all the keys in nodes except the
leaves as dummies. All keys are duplicated in the leaves. This has the advantage that, as
all the leaves are linked together sequentially, the entire tree may be scanned without
visiting the higher nodes at all.

Key Terms
n-ary trees (or n-way trees)
Trees in which each node may have up to n children.
B-tree
Balanced variant of an n-way tree.
B+-tree
B-tree in which all the leaves are linked to facilitate fast in order traversal.

AVL tree

An AVL tree is another balanced binary search tree. Named after their inventors,
Adelson-Velskii and Landis, they were the first dynamically balanced trees to be
proposed. An AVL tree is a self-balancing Binary Search Tree (BST) where the
difference between the heights of the left and right subtrees cannot be more than one for
all nodes. An AVL tree is a binary search tree which has the following properties:

1. The sub-trees of every node differ in height by at most one.

2. Every sub-tree is an AVL tree.

A binary search tree is an AVL tree if there is no node that has subtrees differing
in height by more than one.

A perfectly balanced binary tree is an AVL tree.

Adding or removing a leaf from an AVL tree may make many nodes violate the
AVL balance condition, but each violation of AVL balance can be restored by one
or two simple changes called rotations.
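A sketch of checking the AVL condition in Java, assuming the BstNode type from earlier; the balance factor of each node must stay within -1 to +1 (the rotations that repair a violation are not shown):

static int height(BstNode n) {
    if (n == null) return -1;    // convention: an empty subtree has height -1
    return 1 + Math.max(height(n.left), height(n.right));
}

static boolean isAvl(BstNode n) {
    if (n == null) return true;  // an empty tree is balanced
    int balance = height(n.left) - height(n.right);
    return Math.abs(balance) <= 1 && isAvl(n.left) && isAvl(n.right);
}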

B - TREE
In computer science, a B-tree is a tree data structure that keeps data sorted
and allows searches, sequential access, insertions and deletions in
logarithmic time. The B-tree is a generalization of a binary search tree in
that a node may have more than two children (Comer 1979, p.123). Unlike
self-balancing binary search trees, the B-tree is optimized for systems that
read and write large blocks of data. It is commonly used in databases
and file systems.

Fig 2: A B-tree of order 2 (Bayer & McCreight 1972) or order 5 (Knuth 1998).
As depicted in Fig 2, in a B-tree the internal (non-leaf) nodes can
have a variable number of child nodes within some pre-defined range. Any
time data is inserted into or removed from a node, the number of child
nodes changes; in order to maintain the pre-defined range, internal nodes
may be joined or split. Because a range of child nodes is permitted, B-trees do
not need re-balancing as frequently as other self-balancing search trees, but they
may waste some space, since nodes are not entirely full. The lower and
upper bounds on the number of child nodes are typically fixed for a
particular implementation. For instance, in a 2-3 B-tree (often referred to as
a 2-3 tree), each internal node may have only 2 or 3 child nodes.
Each internal node of a B-tree contains a number of keys. The keys act as
separation values which divide its sub-trees. A B-tree is made balanced
by keeping all leaf nodes at the same depth. This depth will increase slowly
as elements are added to the tree, but an increase in the overall depth is
infrequent, and results in all leaf nodes being one more node farther away from
the root. B-trees have substantial advantages over alternative
implementations when the time to access the data of a node greatly exceeds
the time spent processing the data, because then the cost of accessing the
node may be amortized over multiple operations within the node. This
usually occurs when the node data are in secondary storage
such as disk drives. By maximizing the number of keys within each internal
node, the height of the tree decreases and the number of expensive node
accesses is reduced. In addition, rebalancing of the tree occurs less often.
The maximum number of child nodes depends on the information that must
be stored for each child node and the size of a full disk block or an
analogous size in secondary storage. While 2-3 B-trees are easier to explain,
practical B-trees using secondary storage need large numbers of child nodes to
improve performance.
Note that the term B-tree may refer to a specific design or to a
general class of designs. In the narrow sense, a B-tree stores keys in its
internal nodes but need not store those keys in the records at the leaves.
The general class of B-trees includes variations such as the B+ tree, as we shall
see in the next section.

B + TREE
A B+ tree can be seen as a tree in which each node contains only keys (not
key-value pairs), and to which an additional level is added at the bottom
with linked leaves. In the simple B+ tree in the example below,
the tree links the keys 1-7 to data values d1-d7; the linked list of leaves
allows a rapid in-order traversal. The branching factor here is b = 4.

Fig 3: A B+ tree structure.
The primary value of a B+ tree resides in its ability to store data for
efficient retrieval in a block-oriented storage context, particularly in file
systems. This is primarily because, unlike a binary search tree, B+ trees have
a very high fanout (the number of pointers to the child nodes in a node, typically
on the order of 100 or more), which reduces the number of I/O
operations required to find an element in the tree.
The NTFS, ReiserFS, NSS, XFS, JFS, ReFS and BFS file systems all use
this type of tree for metadata indexing; BFS also uses B+ trees for storing
directories. Relational database management systems such as IBM DB2,
Informix, Microsoft SQL Server, Oracle 8, Sybase ASE and SQLite support
this type of tree for table indices. Key-value database management systems
such as CouchDB and Tokyo Cabinet support this type of tree for efficient
data access.
COMPARISON BETWEEN B-TREE AND B+ TREE

• Fan-out: a B-tree has lower fan-out; a B+ tree has very high fan-out (number of
pointers to child nodes in a node).
• Leaf linkage: in a B-tree, leaf nodes have no linkage (i.e. leaf nodes do not point to
other leaf nodes); in a B+ tree, leaf nodes are linked with each other.
• Data placement: B-trees contain data with each key; B+ trees don't have data
associated with interior nodes.

M-WAY TREE
An M-way tree is a multi-way tree in which a node can have more than two children. A
multi-way tree of order m (known as an m-way tree) is one in which a node
can have up to m children. As with the other trees that have been previously
mentioned, a node in an m-way tree is made up of key fields, in this
case m-1 key fields, and pointers to children. In order to make the processing
of an m-way tree easier, some type of order is imposed on the keys
within each node, resulting in a multi-way search tree of order m. Hence, by
definition, an m-way search tree is an m-way tree in which the following
conditions hold:

• Each node has at most m children and m-1 key fields.
• The keys in each node are in ascending order.
• The keys in the first i children are smaller than the ith key.
• The keys in the last m-i children are larger than the ith key.

The structure below depicts a typical m-way tree.

Fig 4: M-way tree structure

Storage Management
An executing program uses memory (storage) for many different purposes,
such as for the machine instructions that represent the executable part of the
program, the values of data objects, and the return location for a function
invocation.
1.2.1 Static Memory Management
When memory is allocated during compilation time, it is called ‘Static
Memory Management’. This memory is fixed and cannot be increased or
decreased after allocation. If more memory is allocated than required, then
memory is wasted. If less memory is allocated than required, then the
program will not run successfully. So exact memory requirements must be
known in advance.
1.2.2 Dynamic Memory Management
When memory is allocated during run/execution time, it is called ‘Dynamic
Memory Management’. This memory is not fixed and is allocated according
to our requirements; thus there is no wastage of memory, and there is no
need to know exact memory requirements in advance.

2.0 Phases of Storage Management
In general, storage is managed in three phases: 1) allocation, in which
needed storage is found from available (unused) storage and assigned to the
program; 2) recovery, in which storage that is no longer needed is made
available for reuse; and 3) compaction, in which blocks of storage that are in
use but are separated by blocks of unused storage are moved together in
order to provide larger blocks of available storage. (Compaction is desirable,
but usually it is difficult or impossible to do in practice so it is not often
done.) These phases may be repeated many times during the execution of a
program.

HASHING
Hashing is the technique used for performing almost constant-time search in the case
of insertion, deletion and find operations. As a very simple example, an
array with its index as the key is a hash table.

If one wants to store a certain set of similar objects and wants to quickly access a
given one (or come back with the result that it is unknown), the first idea would be
to store them in a list, possibly sorted for faster access. This, however, would still
need log(n) comparisons to find a given element or to decide that it is not yet
stored. Therefore one uses a much bigger array, together with a function from the space of
possible objects to integer values, to decide where in the array to store a certain
object. If this so-called hash function distributes the actually stored objects well
enough over the array, the access time is constant on average.

The idea of hashing is to distribute the entries (key/value pairs) across an array of
buckets. Given a key, the algorithm computes an index that suggests where the
entry can be found:

index = f(key, array_size)

Often this is done in two steps:

hash = hashfunc(key)
index = hash % array_size

In this method, the hash is independent of the array size, and it is then reduced to
an index (a number between 0 and array_size - 1) using the modulo operator (%).
In the case that the array size is a power of two, the remainder operation is reduced
to masking, which improves speed, but can increase problems with a poor hash
function.
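A small Java sketch of the two-step computation, using the language's built-in hashCode(); Math.floorMod avoids a negative index, and the power-of-two masking variant is shown at the end:

String key = "example";
int arraySize = 16;

int hash = key.hashCode();                    // step 1: hash, independent of table size
int index = Math.floorMod(hash, arraySize);   // step 2: reduce to 0..arraySize-1

// when arraySize is a power of two, the remainder reduces to masking:
int maskedIndex = hash & (arraySize - 1);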

HASH FUNCTIONS
A hash function employs some algorithm to compute a key K of a fixed size for each of
the data elements in the set U. The same key K can be used to map data to a hash table,
and all the operations like insertion, deletion and searching should be possible. The
values returned by a hash function are also referred to as hash values, hash codes,
hash sums, or hashes.

Here are some relatively simple hash functions that have been used:

• Division-remainder method: The number of items in the table is estimated. That
number is then used as a divisor into each original value or key to extract a quotient
and a remainder. The remainder is the hashed value. (Since this method is liable to
produce a number of collisions, any search mechanism would have to be able to
recognize a collision and offer an alternate search mechanism.)

• Folding method: This method divides the original value (digits in this case) into
several parts, adds the parts together, and then uses the last four digits (or some other
arbitrary number of digits that will work) as the hashed value or key.

• Radix transformation method: Where the value or key is digital, the number base (or
radix) can be changed, resulting in a different sequence of digits. (For example, a
decimal numbered key could be transformed into a hexadecimal numbered key.)
High-order digits could be discarded to fit a hash value of uniform length.

• Digit rearrangement method: This is simply taking part of the original value or key,
such as digits in positions 3 through 6, reversing their order, and then using that
sequence of digits as the hash value or key.

There are several well-known hash functions used in cryptography. These include
the message-digest hash functions MD2, MD4, and MD5, used for hashing digital
signatures into a shorter value called a message-digest, and the Secure Hash
Algorithm (SHA), a standard algorithm that makes a larger (160-bit) message
digest and is similar to MD4. A hash function that works well for database storage
and retrieval, however, might not work as well for cryptographic or error-checking
purposes.

HASH TABLES
It is a data structure where the data elements are stored (inserted), searched and
deleted based on the keys generated for each element, which are obtained from a
hashing function. In a hashing system the keys are stored in an array which is
called the Hash Table. A perfectly implemented hash table would always promise
an average insert/delete/retrieval time of O(1).

A hash table is a collection of items which are stored in such a way as to make it
easy to find them later. Each position of the hash table, often called a slot, can
hold an item and is named by an integer value starting at 0. For example, we will
have a slot named 0, a slot named 1, a slot named 2, and so on. Initially, the hash
table contains no items, so every slot is empty. We can implement a hash table by
using a list with each element initialized to the special Python value None. Figure
1 shows a hash table of size m = 11. In other words, there are m slots in the table,
named 0 through 10.

Figure 1: Hash Table with 11 Empty Slots

The mapping between an item and the slot where that item belongs in the hash
table is called the hash function. The hash function will take any item in the
collection and return an integer in the range of slot names, between 0 and m-1.
Assume that we have the set of integer items 54, 26, 93, 17, 77, and 31. Our first
hash function, sometimes referred to as the “remainder method,” simply takes an
item and divides it by the table size, returning the remainder as its hash value
(h(item) = item % 11). Table 1 gives all of the hash values for our example
items. Note that this remainder method (modulo arithmetic) will typically be
present in some form in all hash functions, since the result must be in the range of
slot names.

Table 1: Simple Hash Function Using Remainders


Item Hash Value
54 10
26 4
93 5
17 6
77 0
31 9

Once the hash values have been computed, we can insert each item into the hash
table at the designated position as shown in Figure 2. Note that 6 of the 11 slots
are now occupied. This is referred to as the load factor, and is commonly denoted
by

λ = (number of items) / (table size)

For this example, λ = 6/11.

Figure 2: Hash Table with Six Items

Now when we want to search for an item, we simply use the hash function to
compute the slot name for the item and then check the hash table to see if it is
present. This searching operation is O(1), since a constant amount of time is
required to compute the hash value and then index the hash table at that location.
If everything is where it should be, we have found a constant-time search
algorithm.

We can probably already see that this technique is going to work only if each item
maps to a unique location in the hash table. For example, if the item 44 had been
the next item in our collection, it would have a hash value of 0 (44 % 11 == 0).
Since 77 also had a hash value of 0, we would have a problem. According to the
hash function, two or more items would need to be in the same slot. This is
referred to as a collision (it may also be called a “clash”). Clearly, collisions create
a problem for the hashing technique.
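A small Java sketch that reproduces Table 1 and exposes the collision just described, using h(item) = item % 11:

int[] items = {54, 26, 93, 17, 77, 31, 44};   // 44 added to trigger the collision
for (int item : items) {
    System.out.println(item + " -> slot " + (item % 11));
}
// 77 -> slot 0 and 44 -> slot 0: a collision ("clash")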

HASHING TECHNIQUES

There are two types of hashing:

• Static hashing: In static hashing, the hash function maps search-key values to a
fixed set of locations.
• Dynamic hashing: In dynamic hashing, a hash table can grow to handle a variable
set of locations.

HASH COLLISION
A situation in which the resultant hashes for two or more data elements in the data set
U map to the same location in the hash table is called a hash collision. In such a
situation two or more data elements would qualify to be stored/mapped to the
same location in the hash table.

HASH COLLISION RESOLUTION

When hash collisions occur, the act of ensuring that two or more data elements are not
stored or mapped to the same location is referred to as hash collision resolution.
The techniques utilized in collision resolution include:

• Separate chaining
• Open addressing
SEPARATE CHAINING: a technique in which the data is not directly stored
at the hash key index (k) of the hash table. Rather, the entry at the key index (k) in
the hash table is a pointer to the head of the data structure where the data is
actually stored. In the simplest and most common implementations the data
structure adopted for storing the elements is a linked list.

In this technique, when data needs to be searched, it might become necessary
(in the worst case) to traverse all the nodes in the linked list to retrieve the data.
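A minimal sketch of separate chaining in Java: an array of linked lists, where every bucket holds all keys (assumed non-negative in this sketch) that hash to its index:

import java.util.LinkedList;

public class ChainedHashTable {
    private final LinkedList<Integer>[] buckets;

    @SuppressWarnings("unchecked")
    public ChainedHashTable(int size) {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++) buckets[i] = new LinkedList<>();
    }

    public void insert(int key) {
        buckets[key % buckets.length].add(key);   // colliding keys share one chain
    }

    public boolean contains(int key) {
        // worst case: the whole chain is traversed
        return buckets[key % buckets.length].contains(key);
    }
}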

SEPARATE CHAINING HAS SEVERAL ADVANTAGES OVER OPEN ADDRESSING:

• Collision resolution is simple and efficient.
• The hash table can hold more elements without the large performance deterioration of
open addressing (the load factor can be 1 or greater).
• The performance of chaining declines much more slowly than open addressing.
• Deletion is easy; no special flag values are necessary.
• Table size need not be a prime number.
• The keys of the objects to be hashed need not be unique.

DISADVANTAGES OF SEPARATE CHAINING:

• It requires the implementation of a separate data structure for chains, and code to
manage it.
• The main cost of chaining is the extra space required for the linked lists.
• For some languages, creating new nodes (for linked lists) is expensive and slows
down the system.

Open Addressing: In this technique a hash table with a pre-identified size is
considered. All items are stored in the hash table itself. In addition to the data,
each hash bucket also maintains one of three states: EMPTY, OCCUPIED, or
DELETED. While inserting, if a collision occurs, alternative cells are tried until an
empty bucket is found, for which one of the following techniques is adopted.

i. Linear probing: One of the simplest re-hashing functions is +1 (or -1) on a
collision. The function looks in the neighbouring slot in the table. It
calculates the new address extremely quickly and may be extremely
efficient on a modern RISC processor due to efficient cache utilization.

ii. Quadratic probing: Better behavior is usually obtained with quadratic
probing, where the secondary hash function depends on the re-hash index:

Address = h(key) + c * i^2

on the i-th re-hash. (A more complex function of i may also be used.) Since
keys which are mapped to the same value by the primary hash function
follow the same sequence of addresses, quadratic probing shows secondary
clustering. However, secondary clustering is not nearly as severe as the
clustering shown by linear probes.

iii. Double hashing: Re-hashing schemes use a second hashing operation when
there is a collision. If there is a further collision, we re-hash until an empty
"slot" in the table is found. The re-hashing function can either be a new
function or a re-application of the original one. As long as the functions are
applied to a key in the same order, then a sought key can always be located.
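A simplified sketch of open addressing with linear probing in Java; it models only the EMPTY and OCCUPIED states (a full implementation would also need the DELETED flag described above), and assumes non-negative keys and a table that never fills up:

public class ProbingHashTable {
    private static final int EMPTY = -1;          // sentinel marking an empty slot
    private final int[] slots;

    public ProbingHashTable(int size) {
        slots = new int[size];
        java.util.Arrays.fill(slots, EMPTY);
    }

    // insert a non-negative key, probing +1 on each collision
    public void insert(int key) {
        int index = key % slots.length;
        while (slots[index] != EMPTY) {           // occupied: try the neighbouring slot
            index = (index + 1) % slots.length;
        }
        slots[index] = key;
    }

    public boolean contains(int key) {
        int index = key % slots.length;
        for (int i = 0; i < slots.length; i++) {  // stop after scanning the whole table
            if (slots[index] == key) return true;
            if (slots[index] == EMPTY) return false;
            index = (index + 1) % slots.length;
        }
        return false;
    }
}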

ADVANTAGES OF OPEN ADDRESSING:

• All items are stored in the hash table itself. There is no need for another data
structure.
• Open addressing is more efficient storage-wise.

DISADVANTAGES OF OPEN ADDRESSING:

• The keys of the objects to be hashed must be distinct.
• Dependent on choosing a proper table size.
• Requires the use of a three-state (Occupied, Empty, or Deleted) flag in each cell.

APPLICATION AREAS OF HASH FUNCTIONS

• DATABASE SYSTEMS: Hash tables provide a way to locate data in a constant
amount of time.
• SYMBOL TABLES: Hash tables are used by compilers to maintain information
about symbols from a program.
• DATA DICTIONARIES: A data structure that supports adding, deleting and
searching for data.
