Fundamentals of Data Structures in C, 2nd Edition (Ellis Horowitz, Sartaj Sahni, Dinesh Mehta)

What is a Program?

 A Set of Instructions
 Data Structures + Algorithms
 Data Structure = A Container that stores Data
 Algorithm = Logic + Control
Functions of Data Structures
 Add
 Index
 Key
 Position
 Priority
 Get
 Change
 Delete
Common Data Structures
 Array
 Stack
 Queue
 Linked List
 Tree
 Heap
 Hash Table
 Priority Queue
How many Algorithms?
 Countless
Algorithm Strategies
 Greedy
 Divide and Conquer
 Dynamic Programming
 Exhaustive Search
Which Data Structure or Algorithm
is better?
 Must Meet Requirement
 High Performance
 Low RAM footprint
 Easy to implement
 Encapsulated
Chapter 1 Basic Concepts
 Overview: System Life Cycle
 Algorithm Specification
 Data Abstraction
 Performance Analysis
 Performance Measurement
1.1 Overview: system life cycle (1/2)
 Good programmers regard large-scale
computer programs as systems that
contain many complex interacting parts.
 As systems, these programs undergo a
development process called the system
life cycle.
1.1 Overview (2/2)
 We consider this cycle as consisting of five phases.
 Requirements
 Analysis: bottom-up vs. top-down
 Design: data objects and operations
 Refinement and Coding
 Verification
 Program Proving
 Testing
 Debugging
1.2 Algorithm Specification (1/10)
 1.2.1 Introduction
 An algorithm is a finite set of instructions that
accomplishes a particular task.
 Criteria
 input: zero or more quantities that are externally supplied
 output: at least one quantity is produced
 definiteness: clear and unambiguous
 finiteness: terminate after a finite number of steps
 effectiveness: instruction is basic enough to be carried out
 A program does not have to satisfy the finiteness criterion.
1.2 Algorithm Specification
(2/10)
 Representation
 A natural language, like English or Chinese.
 A graphic, like flowcharts.
 A computer language, like C.
 Algorithms + Data structures =
Programs [Niklaus Wirth]
 Sequential search vs. Binary search
1.2 Algorithm Specification
(3/10)
 Example 1.1 [Selection sort]:
 From those integers that are currently unsorted, find the
smallest and place it next in the sorted list.

 i   [0] [1] [2] [3] [4]
 -   30  10  50  40  20
 0   10  30  50  40  20
 1   10  20  50  40  30
 2   10  20  30  40  50
 3   10  20  30  40  50
1.2 (4/10)
 Program 1.3 contains
a complete program
which you may run on
your computer
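The slides reference Program 1.3 without reproducing it; a minimal sketch of the selection sort described above (the SWAP macro and function name are assumptions here, following the book's conventions):

#include <stdio.h>
#define SWAP(x, y, t) ((t) = (x), (x) = (y), (y) = (t))

void sort(int list[], int n)
{ /* sort list[0] ... list[n-1] into nondecreasing order */
  int i, j, min, temp;
  for (i = 0; i < n - 1; i++) {
    min = i; /* find the smallest of the unsorted integers */
    for (j = i + 1; j < n; j++)
      if (list[j] < list[min])
        min = j;
    SWAP(list[i], list[min], temp); /* place it next in the sorted list */
  }
}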
1.2 Algorithm Specification
(5/10)
 Example 1.2 [Binary search]:

      [0] [1] [2] [3] [4] [5] [6]
       8  14  26  30  43  50  52

 Searching for searchnum = 43:
 left right middle list[middle] : searchnum
   0    6     3         30      <    43
   4    6     5         50      >    43
   4    4     4         43      ==   43
 Searching for searchnum = 18:
   0    6     3         30      >    18
   0    2     1         14      <    18
   2    2     2         26      >    18
   2    1     (left > right: not found)

 Searching a sorted list


while (there are more integers to check) {
  middle = (left + right) / 2;
  if (searchnum < list[middle])
    right = middle - 1;
  else if (searchnum == list[middle])
    return middle;
  else
    left = middle + 1;
}

int binsearch(int list[], int searchnum, int left, int right)
{
  /* search list[0] <= list[1] <= ... <= list[n-1] for searchnum.
     Return its position if found. Otherwise return -1 */
  int middle;
  while (left <= right) {
    middle = (left + right) / 2;
    switch (COMPARE(list[middle], searchnum)) {
      case -1: left = middle + 1;
               break;
      case 0:  return middle;
      case 1:  right = middle - 1;
    }
  }
  return -1;
}

*Program 1.6: Searching an ordered list


1.2 Algorithm Specification
(7/10)
 1.2.2 Recursive algorithms
 Beginning programmers view a function as
something that is invoked (called) by another
function
 It executes its code and then returns control to the
calling function.
1.2 Algorithm Specification (8/10)
 This perspective ignores the fact that functions
can call themselves (direct recursion).
 They may call other functions that invoke the
calling function again (indirect recursion).
 extremely powerful
 frequently allow us to express an otherwise
complex process in very clear terms
 We should use a recursive algorithm
when the problem itself is defined recursively.
1.2 Algorithm Specification (9/10)
 Example 1.3 [Binary search]:
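The recursive version appears only as a figure; a sketch, assuming the same COMPARE macro as Program 1.6:

int binsearch(int list[], int searchnum, int left, int right)
{ /* recursive version: search list[left..right] for searchnum */
  int middle;
  if (left <= right) {
    middle = (left + right) / 2;
    switch (COMPARE(list[middle], searchnum)) {
      case -1: return binsearch(list, searchnum, middle + 1, right);
      case 0:  return middle;
      case 1:  return binsearch(list, searchnum, left, middle - 1);
    }
  }
  return -1;
}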
1.2 (10/10)
 Example 1.4 [Permutations]:
 [Trace of perm on "abc": at each level lv i, perm swaps list[i]
with list[j] for j = i, ..., n, recurses on level i+1, prints when
the last level is reached, then swaps back. The six permutations
are printed in the order: abc, acb, bac, bca, cba, cab.]
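A sketch of the recursive permutation generator traced above (the SWAP macro is the one assumed earlier):

#include <stdio.h>
#define SWAP(x, y, t) ((t) = (x), (x) = (y), (y) = (t))

void perm(char list[], int i, int n)
{ /* generate all permutations of list[i] ... list[n] */
  int j;
  char temp;
  if (i == n) { /* one permutation is complete: print it */
    for (j = 0; j <= n; j++)
      printf("%c", list[j]);
    printf("\n");
  } else {
    for (j = i; j <= n; j++) {
      SWAP(list[i], list[j], temp);
      perm(list, i + 1, n);         /* permute the remaining letters */
      SWAP(list[i], list[j], temp); /* undo the swap */
    }
  }
}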
1.3 Data abstraction (1/4)
 Data Type
A data type is a collection of objects and a set of
operations that act on those objects.
 For example, the data type int consists of the objects {0,
+1, -1, +2, -2, …, INT_MAX, INT_MIN} and the operations
+, -, *, /, and %.
 The data types of C
 The basic data types: char, int, float and double
 The group data types: array and struct
 The pointer data type
 The user-defined types
1.3 Data abstraction (2/4)
 Abstract Data Type
 An abstract data type (ADT) is a data type
that is organized in such a way that
the specification of the objects and
the operations on the objects is separated from
the representation of the objects and
the implementation of the operations.
 We know what it does, but not necessarily
how it will do it.
1.3 Data abstraction (3/4)
 Specification vs. Implementation
 An ADT is implementation independent
 Operation specification
 function name
 the types of arguments
 the type of the results
 The functions of a data type can be
classified into several categories:
 creator / constructor
 transformers
 observers / reporters
1.3 Data abstraction (4/4)
 Example 1.5 [Abstract data type Natural_Number]
 (the symbol ::= reads "is defined as")
1.4 Performance analysis (1/17)
 Criteria
 Is it correct?
 Is it readable?
 …
 Performance Analysis (machine independent)
 space complexity: storage requirement
 time complexity: computing time
 Performance Measurement (machine dependent)
1.4 Performance analysis (2/17)
 1.4.1 Space Complexity:
S(P) = C + S_P(I)
 Fixed Space Requirements (C)
Independent of the characteristics
of the inputs and outputs
 instruction space
 space for simple variables, fixed-size structured
variables, constants
 Variable Space Requirements (S_P(I))
depend on the instance characteristic I
 number, size, values of inputs and outputs
associated with I
 recursive stack space, formal parameters, local
variables, return address
1.4 Performance analysis (3/17)
 Examples:
 Example 1.6: In Program 1.9, S_abc(I) = 0.
 Example 1.7: In Program 1.10, S_sum(I) = S_sum(n) = 0.
Recall: the address of the first element of the array is
passed; all other parameters are passed by value.
1.4 Performance analysis (4/17)
 Example 1.8: Program 1.11 is a recursive
function for addition. Figure 1.1 shows the
number of bytes required for one recursive call.
S_sum(I) = S_sum(n) = 6n
1.4 Performance analysis (5/17)
 1.4.2 Time Complexity:
T(P) = C + T_P(I)
 The time, T(P), taken by a program, P, is the
sum of its compile time C and its run (or
execution) time, T_P(I)
 Fixed time requirements
 Compile time (C), independent of instance
characteristics
 Variable time requirements
 Run (execution) time T_P
 T_P(n) = c_a*ADD(n) + c_s*SUB(n) + c_l*LDA(n) + c_st*STA(n)
1.4 Performance analysis (6/17)
 A program step is a syntactically or
semantically meaningful program segment
whose execution time is independent of the
instance characteristics.
 Example
(each of the following counts as one program step; machine independent)
 abc = a + b + b * c + (a + b - c) / (a + b) + 4.0
 abc = a + b + c
 Methods to compute the step count
 Introduce variable count into programs
 Tabular method
 Determine the total number of steps contributed by
each statement: steps per execution * frequency
 add up the contribution of all statements
1.4 Performance analysis (7/17)
 Iterative summing of a list of numbers
 *Program 1.12: Program 1.10 with count statements (p.23)

float sum(float list[], int n)
{
  float tempsum = 0; count++;    /* for assignment */
  int i;
  for (i = 0; i < n; i++) {
    count++;                     /* for the for loop */
    tempsum += list[i]; count++; /* for assignment */
  }
  count++;                       /* last execution of for */
  count++;                       /* for return */
  return tempsum;
}                                /* 2n + 3 steps */
1.4 Performance analysis (8/17)
 Tabular Method
 *Figure 1.2: Step count table for Program 1.10 (p.26)
Iterative function to sum a list of numbers
(s/e = steps per execution)

Statement                        s/e  Frequency  Total steps
float sum(float list[], int n)    0       0          0
{                                 0       0          0
  float tempsum = 0;              1       1          1
  int i;                          0       0          0
  for (i = 0; i < n; i++)         1      n+1        n+1
    tempsum += list[i];           1       n          n
  return tempsum;                 1       1          1
}                                 0       0          0
Total                                              2n+3
1.4 Performance analysis (9/17)
 Recursive summing of a list of numbers
 *Program 1.14: Program 1.11 with count statements added (p.24)

float rsum(float list[], int n)
{
  count++; /* for if conditional */
  if (n) {
    count++; /* for return and rsum invocation */
    return rsum(list, n-1) + list[n-1];
  }
  count++;
  return list[0];
}            /* 2n + 2 steps */
1.4 Performance analysis (10/17)
• *Figure 1.3: Step count table for recursive summing function (p.27)

Statement                               s/e  Frequency  Total steps
float rsum(float list[], int n)          0       0          0
{                                        0       0          0
  if (n)                                 1      n+1        n+1
    return rsum(list, n-1) + list[n-1];  1       n          n
  return list[0];                        1       1          1
}                                        0       0          0
Total                                                     2n+2
1.4 Performance analysis (11/17)
 1.4.3 Asymptotic notation (O, Ω, Θ)
 Complexity of c1*n^2 + c2*n versus c3*n
 for sufficiently large values of n, c3*n is faster than
c1*n^2 + c2*n
 for small values of n, either could be faster
 c1=1, c2=2, c3=100 --> c1*n^2 + c2*n <= c3*n for n <= 98
 c1=1, c2=2, c3=1000 --> c1*n^2 + c2*n <= c3*n for n <= 998
 break even point
 no matter what the values of c1, c2, and c3, there is an n
beyond which c3*n is always faster than c1*n^2 + c2*n
1.4 Performance analysis
(12/17)
 Definition: [Big "oh"]
 f(n) = O(g(n)) iff there exist positive constants c and n0
such that f(n) <= c*g(n) for all n, n >= n0.
 Definition: [Omega]
 f(n) = Ω(g(n)) (read as "f of n is omega of g of n") iff there
exist positive constants c and n0 such that f(n) >= c*g(n) for
all n, n >= n0.
 Definition: [Theta]
 f(n) = Θ(g(n)) (read as "f of n is theta of g of n") iff there
exist positive constants c1, c2, and n0 such that c1*g(n) <=
f(n) <= c2*g(n) for all n, n >= n0.
1.4 Performance analysis (13/17)
 Theorem 1.2:
 If f(n) = a_m*n^m + ... + a_1*n + a_0, then f(n) = O(n^m).
 Theorem 1.3:
 If f(n) = a_m*n^m + ... + a_1*n + a_0 and a_m > 0, then f(n) = Ω(n^m).
 Theorem 1.4:
 If f(n) = a_m*n^m + ... + a_1*n + a_0 and a_m > 0, then f(n) = Θ(n^m).
1.4 Performance analysis (14/17)
 Examples
 f(n) = 3n + 2
 3n + 2 <= 4n for all n >= 2, so 3n + 2 = O(n)
 3n + 2 >= 3n for all n >= 1, so 3n + 2 = Ω(n)
 3n <= 3n + 2 <= 4n for all n >= 2, so 3n + 2 = Θ(n)
 f(n) = 10n^2 + 4n + 2
 10n^2 + 4n + 2 <= 11n^2 for all n >= 5, so 10n^2 + 4n + 2 = O(n^2)
 10n^2 + 4n + 2 >= n^2 for all n >= 1, so 10n^2 + 4n + 2 = Ω(n^2)
 n^2 <= 10n^2 + 4n + 2 <= 11n^2 for all n >= 5, so 10n^2 + 4n + 2 = Θ(n^2)
 100n + 6 = O(n) /* 100n + 6 <= 101n for n >= 10 */
 10n^2 + 4n + 2 = O(n^2) /* 10n^2 + 4n + 2 <= 11n^2 for n >= 5 */
 6*2^n + n^2 = O(2^n) /* 6*2^n + n^2 <= 7*2^n for n >= 4 */
1.4 Performance analysis (15/17)
 1.4.4 Practical complexity
 To get a feel for how the various functions grow
with n, you are advised to study Figures 1.7 and
1.8 very closely.
1.4 Performance analysis (16/17)
1.4 Performance analysis
(17/17)
 Figure 1.9 gives the time needed by a 1 billion
instructions per second computer to execute a
program of complexity f(n) instructions.
1.5 Performance measurement
(1/3)
 Although performance analysis gives us a powerful
tool for assessing an algorithm’s space and time
complexity, at some point we also must consider
how the algorithm executes on our machine.
 This consideration moves us from the realm of analysis
to that of measurement.
1.5 Performance measurement (2/3)
 Example 1.22 [Worst case performance of the selection
function]:
 The tests were conducted on an IBM compatible PC with
an 80386 cpu, an 80387 numeric coprocessor, and a
turbo accelerator. We used Borland's Turbo C compiler.
1.5 Performance measurement
(3/3)
Chapter 2 Arrays and Structures
 The array as an abstract data type
 Structures and Unions
 The polynomial Abstract Data Type
 The Sparse Matrix Abstract Data Type
 The Representation of Multidimensional
Arrays
2.1 The array as an ADT (1/6)
 Arrays
 Array: a set of pairs, <index, value>
 data structure
 For each index, there is a value associated with
that index.
 representation (possible)
 Implemented by using consecutive memory.
 In mathematical terms, we call this a
correspondence or a mapping.
2.1 The array as an ADT (2/6)
 When considering an ADT we are more
concerned with the operations that can be
performed on an array.
 Aside from creating a new array, most languages
provide only two standard operations for arrays, one
that retrieves a value, and a second that stores a
value.
 Structure 2.1 shows a definition of the array ADT
 The advantage of this ADT definition is that it clearly
points out the fact that the array is a more general
structure than “a consecutive set of memory
locations.”
2.1 The array as an ADT (3/6)
2.1 The array as an ADT (4/6)
 Arrays in C
 int list[5], *plist[5];
 list[5]: (five integers) list[0], list[1], list[2], list[3], list[4]
 *plist[5]: (five pointers to integers)
 plist[0], plist[1], plist[2], plist[3], plist[4]
 implementation of 1-D array
list[0]  base address = α
list[1]  α + sizeof(int)
list[2]  α + 2*sizeof(int)
list[3]  α + 3*sizeof(int)
list[4]  α + 4*sizeof(int)
2.1 The array as an ADT (5/6)
 Arrays in C (cont’d)
 Compare int *list1 and int list2[5] in C.
Same: list1 and list2 are pointers.
Difference: list2 reserves five locations.
 Notations:
list2 - a pointer to list2[0]
(list2 + i) - a pointer to list2[i] (&list2[i])
*(list2 + i) - list2[i]
2.1 The array as an ADT (6/6)
 Example: 1-dimension array addressing
 int one[] = {0, 1, 2, 3, 4};
 Goal: print out address and value

Address  Contents
  1228      0
  1230      1
  1232      2
  1234      3
  1236      4

void print1(int *ptr, int rows)
{
  /* print out a one-dimensional array using a pointer */
  int i;
  printf("Address Contents\n");
  for (i = 0; i < rows; i++)
    printf("%8u%5d\n", ptr + i, *(ptr + i));
  printf("\n");
}
2.2 Structures and Unions (1/6)
 2.2.1 Structures (records)
 Arrays are collections of data of the same type. In C
there is an alternate way of grouping data that permits
the data to vary in type.
 This mechanism is called the struct, short for structure.
 A structure is a collection of data items, where each
item is identified as to its type and name.
2.2 Structures and Unions (2/6)
 Create structure data type
 We can create our own structure data types by using
the typedef statement as below:
 This says that human_being is the name of the type defined
by the structure definition, and we may follow this definition
with declarations of variables such as:
human_being person1, person2;
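The structure definition itself survives only as a figure; the book's example is along these lines (field names as in the text):

typedef struct {
  char name[10];
  int age;
  float salary;
} human_being;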
2.2 Structures and Unions (3/6)
 We can also embed a structure within a structure.
 A person born on February 11, 1994, would have
values for the date struct set as shown in the
sketch below.
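A sketch of the embedded structure and the corresponding initialization (the field names month, day, and year follow the book's example):

typedef struct {
  int month;
  int day;
  int year;
} date;

typedef struct {
  char name[10];
  int age;
  float salary;
  date dob;
} human_being;

/* a person born on February 11, 1994 */
human_being person1;
person1.dob.month = 2;
person1.dob.day = 11;
person1.dob.year = 1994;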
2.2 Structures and Unions (4/6)
 2.2.2 Unions
 A union declaration is similar to a structure.
 The fields of a union must share their memory space.
 Only one field of the union is “active” at any given time
 Example: Add fields for male and female.

person1.sex_info.sex = male;
person1.sex_info.u.beard =
FALSE;
and
person2.sex_info.sex = female;
person2.sex_info.u.children = 4;
2.2 Structures and Unions (5/6)
 2.2.3 Internal implementation of structures
 The fields of a structure in memory will be stored in
the same way using increasing address locations in
the order specified in the structure definition.
 Holes or padding may actually occur
 Within a structure to permit two consecutive components to
be properly aligned within memory
 The size of a struct is at least the sum of the sizes of
its components, including any padding that may be
required; the size of a union is the amount of storage
necessary to represent its largest component.
2.2 Structures and Unions (6/6)
 2.2.4 Self-Referential Structures
 One or more of its components is a pointer to itself.

typedef struct list {
  char data;
  struct list *link;
} list;

 malloc: obtain a node (memory); free: release memory
 Construct a list with three nodes (a -> b -> c):

list item1, item2, item3;
item1.data = 'a';
item2.data = 'b';
item3.data = 'c';
item1.link = item2.link = item3.link = NULL;
item1.link = &item2;
item2.link = &item3;
2.3 The polynomial ADT (1/10)
 Ordered or Linear List Examples
 ordered (linear) list: (item1, item2, item3, …, itemn)
 (Sunday, Monday, Tuesday, Wednesday, Thursday, Friday,
Saturday)
 (Ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, King)
 (basement, lobby, mezzanine, first, second)
 (1941, 1942, 1943, 1944, 1945)
 (a1, a2, a3, …, an-1, an)
2.3 The polynomial ADT (2/10)
 Operations on Ordered List
 Finding the length, n , of the list.
 Reading the items from left to right (or right to left).
 Retrieving the i’th element.
 Storing a new value into the i’th position.
 Inserting a new element at the position i , causing
elements numbered i, i+1, …, n to become numbered
i+1, i+2, …, n+1
 Deleting the element at position i , causing elements
numbered i+1, …, n to become numbered i, i+1, …, n-1
 Implementation
 sequential mapping (1)~(4)
 non-sequential mapping (5)~(6)
2.3 The polynomial ADT (3/10)
 Polynomial examples:
 Two example polynomials are:
 A(x) = 3x20+2x5+4 and B(x) = x4+10x3+3x2+1
 Assume that we have two polynomials,
A(x) = Σ a_i x^i and B(x) = Σ b_i x^i, where x is the
variable, a_i is the coefficient, and i is the
exponent; then:
 A(x) + B(x) = Σ (a_i + b_i) x^i
 A(x) * B(x) = Σ_i (a_i x^i * Σ_j (b_j x^j))
 Similarly, we can define subtraction and division on
polynomials, as well as many other operations.
2.3 The polynomial ADT (4/10)
 An ADT definition
of a polynomial
2.3 The polynomial ADT (5/10)
 There are two ways to create the type
polynomial in C
 Representation I

#define MAX_degree 101 /* MAX degree of polynomial + 1 */
typedef struct {
  int degree;
  float coef[MAX_degree];
} polynomial;

 drawback: the first representation may waste space
when the polynomial is sparse.
 Polynomial Addition 2.3 (6/10)
 /* d = a + b, where a, b, and d are polynomials */
d = Zero()
while (! IsZero(a) && ! IsZero(b)) do {
  switch COMPARE(Lead_Exp(a), Lead_Exp(b)) {
    case -1:
      d = Attach(d, Coef(b, Lead_Exp(b)), Lead_Exp(b));
      b = Remove(b, Lead_Exp(b));
      break;
    case 0:
      sum = Coef(a, Lead_Exp(a)) + Coef(b, Lead_Exp(b));
      if (sum) {
        Attach(d, sum, Lead_Exp(a));
      }
      a = Remove(a, Lead_Exp(a));
      b = Remove(b, Lead_Exp(b));
      break;
    case 1:
      d = Attach(d, Coef(a, Lead_Exp(a)), Lead_Exp(a));
      a = Remove(a, Lead_Exp(a));
  }
}
insert any remaining terms of a or b into d

 advantage: easy implementation
 disadvantage: wastes space when sparse

*Program 2.4: Initial version of padd function (p.62)
2.3 The polynomial ADT (7/10)
 Representation II

#define MAX_TERMS 100 /* size of terms array */
typedef struct {
  float coef;
  int expon;
} polynomial;
polynomial terms[MAX_TERMS];
int avail = 0;
2.3 The polynomial ADT (8/10)
 Use one global array to store all polynomials
 Figure 2.2 shows how these polynomials are
stored in the array terms.

specification                   representation
poly                            <start, finish>
A(x) = 2x^1000 + 1              A <0, 1>
B(x) = x^4 + 10x^3 + 3x^2 + 1   B <2, 5>

 storage requirements: start, finish, 2*(finish - start + 1)
 non-sparse: twice as much as Representation I when all the terms are nonzero
2.3 The polynomial ADT (9/10)
 We would now like to
write a C function that
adds two polynomials,
A and B, represented
as above to obtain D
= A + B.
 To produce D(x), padd
(Program 2.5) adds A(x)
and B(x) term by term.
Analysis: O(n+m)
where n (m) is the number
of nonzeros in A (B).
2.3 The polynomial ADT (10/10)
 Problem: Compaction is required
when polynomials are no longer needed.
(data movement takes time.)
2.4 The sparse matrix ADT (1/18)
 2.4.1 Introduction
 In mathematics, a matrix contains m rows and
n columns of elements; we write m x n to
designate a matrix with m rows and n
columns.
 [Figure 2.3: a 5 x 3 matrix with all 15 of its 15 entries
nonzero, versus a 6 x 6 sparse matrix with only 8 of its
36 entries nonzero. What data structure suits the sparse one?]
2.4 The sparse matrix ADT (2/18)
 The standard representation of a matrix is a two-
dimensional array defined as a[MAX_ROWS][MAX_COLS].
 We can locate quickly any element by writing a[i][j]
 Sparse matrix wastes space
 We must consider alternate forms of representation.
 Our representation of sparse matrices should store only
nonzero elements.
 Each element is characterized by <row, col, value>.
2.4 The sparse matrix ADT (3/18)
 Structure 2.3
contains our
specification of
the matrix ADT.
 A minimal set of
operations
includes matrix
creation,
addition,
multiplication,
and transpose.
2.4 The sparse matrix ADT (4/18)
 We implement the Create operation as
below:
2.4 The sparse matrix ADT (5/18)
 Figure 2.4(a) shows how the sparse matrix of Figure
2.3(b) is represented in the array a.
 Represented by a two-dimensional array.
 Each element is characterized by <row, col, value>.
 a[0] holds <# of rows, # of columns, # of nonzero terms>;
the remaining triples are stored with row, and within each
row column, in ascending order. Figure 2.4(b) shows the transpose.
2.4 The sparse matrix ADT (6/18)
 2.4.2 Transpose a Matrix
 For each row i
 take element <i, j, value> and store it in element <j, i, value> of
the transpose.
 difficulty: where to put <j, i, value>
(0, 0, 15) ====> (0, 0, 15)
(0, 3, 22) ====> (3, 0, 22)
(0, 5, -15) ====> (5, 0, -15)
(1, 1, 11) ====> (1, 1, 11)
Move elements down very often.
 For all elements in column j,
place element <i, j, value> in element <j, i, value>
2.4 The sparse matrix ADT (7/18)
 This algorithm is incorporated in transpose
(Program 2.7).
 The outer loop runs over the "columns" of the original
matrix; the inner loop scans all "elements" triples, so
the array of triples is scanned "columns" times.
==> O(columns * elements)
2.4 The sparse matrix ADT (8/18)
 Discussion: compared with 2-D array
representation
 O(columns*elements) vs. O(columns*rows)
 elements --> columns * rows when non-sparse, giving
O(columns^2 * rows)
 Problem: Scan the array “columns” times.
 In fact, we can transpose a matrix represented as a
sequence of triples in O(columns + elements) time.
 Solution:
 First, determine the number of elements
in each column of the original matrix.
 Second, determine the starting positions of each row
in the transpose matrix.
2.4 The sparse matrix ADT (9/18)
 Compared with 2-D array representation:
O(columns + elements) vs. O(columns * rows)
 elements --> columns * rows gives O(columns * rows)
 Cost: additional row_terms and starting_pos arrays,
each of size "columns", are required.
 Let the two arrays row_terms and starting_pos be shared.
2.4 The sparse matrix ADT (10/18)
 After the execution of the third for loop, the
values of row_terms and starting_pos are:

               [0] [1] [2] [3] [4] [5]
row_terms    =  2   1   2   2   0   1
starting_pos =  1   3   4   6   8   8
2.4 The sparse matrix ADT (11/18)
 2.4.3 Matrix multiplication
 Definition:
Given A and B where A is m x n and B is n x p, the
product matrix D has dimension m x p. Its <i, j>
element is
    d_ij = Σ_{k=0}^{n-1} a_ik * b_kj
for 0 <= i < m and 0 <= j < p.
2.4 The sparse matrix ADT (12/18)
 Sparse Matrix Multiplication
 Definition: [D]m*p=[A]m*n* [B]n*p
 Procedure: Fix a row of A and find all elements
in column j of B for j=0, 1, …, p-1.
 Alternative 1.
Scan all of B to find all elements in j.
 Alternative 2.
Compute the transpose of B.
(Put all column elements consecutively)
 Once we have located the elements of row i of A and column
j of B we just do a merge operation similar to that used in the
polynomial addition of 2.2
2.4 The sparse matrix ADT (13/18)
 General case:
d_ij = a_i0*b_0j + a_i1*b_1j + ... + a_i(n-1)*b_(n-1)j
 Array A is grouped by i, and after transpose,
array B is also grouped by j.
 [Figure: row i of A holds terms a, b, c; the rows of B^T
hold terms d, e, f, g.]
 The generation produces at most the entries
ad, ae, af, ag, bd, be, bf, bg, cd, ce, cf, cg.
2.4 The sparse matrix ADT (14/18)
 An Example

A = |  1 0 2 |    B^T = | 3 -1 0 |    B = |  3 0 2 |
    | -1 4 6 |          | 0  0 0 |        | -1 0 0 |
                        | 2  0 5 |        |  0 0 5 |

       row col value         row col value        row col value
a[0]    2   3   5     bt[0]   3   3   4     b[0]   3   3   4
 [1]    0   0   1     bt[1]   0   0   3     b[1]   0   0   3
 [2]    0   2   2     bt[2]   0   1  -1     b[2]   0   2   2
 [3]    1   0  -1     bt[3]   2   0   2     b[3]   1   0  -1
 [4]    1   1   4     bt[4]   2   2   5     b[4]   2   2   5
 [5]    1   2   6
2.4 The sparse matrix ADT (15/18)
 The programs 2.9 and 2.10 can obtain the product matrix
D which multiplies matrices A and B.
2.4 The sparse matrix ADT (16/18)
2.4 The sparse matrix ADT (17/18)
 Analyzing the algorithm
 cols_b * terms_row1 + total_b +
cols_b * terms_row2 + total_b +
... +
cols_b * terms_rowp + total_b
= cols_b * (terms_row1 + terms_row2 + ... + terms_rowp) +
rows_a * total_b
= cols_b * total_a + rows_a * total_b
==> O(cols_b * total_a + rows_a * total_b)
2.4 The sparse matrix ADT (18/18)
 Compared with matrix multiplication using arrays

for (i = 0; i < rows_a; i++)
  for (j = 0; j < cols_b; j++) {
    sum = 0;
    for (k = 0; k < cols_a; k++)
      sum += (a[i][k] * b[k][j]);
    d[i][j] = sum;
  }

 O(rows_a * cols_a * cols_b) vs.
O(cols_b * total_a + rows_a * total_b)
 optimal case:
total_a < rows_a * cols_a and total_b < cols_a * cols_b
 worst case:
total_a --> rows_a * cols_a, or
total_b --> cols_a * cols_b
2.5 Representation of
multidimensional array (1/5)
 The internal representation of multidimensional
arrays requires a more complex addressing
formula.
 If an array is declared a[upper_0][upper_1]...[upper_{n-1}],
then it is easy to see that the number of elements in
the array is:
    ∏_{i=0}^{n-1} upper_i
where ∏ denotes the product of the upper_i's.
 Example:
 If we declare a as a[10][10][10], then we require 10*10*10 =
1000 units of storage to hold the array.
2.5 Representation of
multidimensional array (2/5)
 Represent multidimensional arrays:
row major order and column major order.
 Row major order stores multidimensional arrays
by rows.
 A[upper_0][upper_1] is stored as
upper_0 rows, row_0, row_1, ..., row_{upper_0 - 1},
each row containing upper_1 elements.
2.5 Representation of multidimensional array (3/5)
 Row major order:    A[i][j] is at α + i*upper_1 + j
 Column major order: A[i][j] is at α + j*upper_0 + i
 [Figure: in row major order row_0 = A[0][0..u_1-1] starts at α,
row_1 starts at α + u_1, and row_{u_0-1} starts at α + (u_0-1)*u_1.]
2.5 Representation of
multidimensional array (4/5)
 To represent a three-dimensional array,
A[upper_0][upper_1][upper_2], we interpret the array
as upper_0 two-dimensional arrays of dimension
upper_1 x upper_2.
 To locate a[i][j][k], we first obtain α + i*upper_1*upper_2
as the address of a[i][0][0] because there are i two-
dimensional arrays of size upper_1*upper_2 preceding
this element.
 α + i*upper_1*upper_2 + j*upper_2 + k
is the address of a[i][j][k].
2.5 Representation of
multidimensional array (5/5)
 Generalizing on the preceding discussion, we can
obtain the addressing formula for any element
A[i_0][i_1]...[i_{n-1}] in an n-dimensional array declared as
A[upper_0][upper_1]...[upper_{n-1}].
 The address for A[i_0][i_1]...[i_{n-1}] is:
    α + Σ_{j=0}^{n-1} i_j * a_j
where a_{n-1} = 1 and a_j = ∏_{k=j+1}^{n-1} upper_k for 0 <= j < n-1.
2.6 The String Abstract data
type(1/19)
2.6.1 Introduction
 The String: component elements are characters.
 A string has the form S = s_0, ..., s_{n-1}, where the s_i
are characters taken from the character set of the
programming language.
 If n = 0, then S is an empty or null string.
 Operations in ADT 2.4, p. 81
2.6 The String Abstract data
type(2/19)
 ADT String:
2.6 The String Abstract data
type(3/19)
 In C, we represent strings as character arrays
terminated with the null character \0.
 Figure 2.8 shows how these strings would be
represented internally in memory.
2.6 The String Abstract data
type(4/19)
 Now suppose we want to concatenate these
strings together to produce the new string:
 Two strings are joined together by strcat(s, t), which stores the
result in s. Although s has increased in length by five, we have
no additional space in s to store the extra five characters. Our
compiler handled this problem inelegantly: it simply overwrote
the memory to fit in the extra five characters. Since we declared
t immediately after s, this meant that part of the word “house”
disappeared.
2.6 The String Abstract data
type(5/19)
 C string functions
2.6 The String Abstract data
type(6/19)
 Example 2.2 [String insertion]:
 Assume that we have two strings, say string 1 and
string 2, and that we want to insert string 2 into string
1 starting at the i th position of string 1. We begin with
the declarations:
 In addition to creating the two strings, we also have
created a pointer for each string.
2.6 The String Abstract data
type(7/19)
 Now suppose that the first string contains
"amobile" and the second contains "uto".
 We want to insert "uto"
starting at position 1 of
the first string, thereby
producing the word
"automobile".
2.6 The String Abstract data
type(8/19)
 String insertion function:
 It should never be used in practice as it is wasteful in
its use of time and space.
2.6 The String Abstract data
type(9/19)
 2.6.2 Pattern Matching:
 Assume that we have two strings, string and pat, where pat is a
pattern to be searched for in string.
 Given declarations
char string[MAX_SIZE], pat[MAX_SIZE];
we can determine whether pat is in string with the C library
function strstr.
 If pat is not in string, this method has a computing time of
O(n*m) where n is the length of pat and m is the length of string.
2.6 The String Abstract data
type(10/19)
 We can improve on an exhaustive pattern
matching technique by quitting when strlen(pat)
is greater than the number of remaining
characters in the string.
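A sketch of this improved search (the book calls it nfind; this version also checks the last character of the current window first, and the variable names follow the text):

#include <string.h>

int nfind(char *string, char *pat)
{ /* return the starting index of pat in string, or -1;
     gives up as soon as fewer characters remain than strlen(pat) */
  int i, j, start = 0;
  int lasts = strlen(string) - 1;
  int lastp = strlen(pat) - 1;
  int endmatch = lastp; /* where the window's last character falls */
  for (; endmatch <= lasts; endmatch++, start++) {
    if (string[endmatch] == pat[lastp]) { /* last characters agree */
      for (j = 0, i = start; j < lastp && string[i] == pat[j]; i++, j++)
        ; /* compare the rest from the beginning */
      if (j == lastp)
        return start; /* successful match */
    }
  }
  return -1;
}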
2.6 The String Abstract data
type(11/19)
 Example 2.3 [Simulation of nfind]
 Suppose pat=“aab”
and
string=“ababbaabaa.”
 Analysis of nfind:
The computing time for
these strings is linear
in the length of the
string, O(m), but the
worst case is still
O(n*m).
2.6 The String Abstract data
type(12/19)
 Ideally, we would like an algorithm that works in
O(strlen(string) + strlen(pat)) time. This is optimal
for this problem, as in the worst case it is
necessary to look at all characters in the pattern
and string at least once.
 Knuth, Morris, and Pratt have developed a
pattern matching algorithm that works in this way
and has linear complexity.
2.6 The String Abstract data
type(13/19)
 Suppose pat = "a b c a b c a c a b". The failure function f gives,
for each j, the largest k < j with p_0...p_k = p_{j-k}...p_j (-1 if none):
 j:    0  1  2  3  4  5  6  7  8  9
 pat:  a  b  c  a  b  c  a  c  a  b
 f:   -1 -1 -1  0  1  2  3 -1  0  1
2.6 The String Abstract data
type(14/19)
 From the definition of the failure function, we arrive at
the following rule for pattern matching: if a partial
match is found such that s_{i-j} ... s_{i-1} = p_0 p_1 ... p_{j-1} and
s_i != p_j, then matching may be resumed by comparing
s_i and p_{f(j-1)+1} if j != 0. If j = 0, then we may continue
by comparing s_{i+1} and p_0.
2.6 The String Abstract data
type(15/19)
 This pattern matching rule translates into
function pmatch.
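The pmatch figure did not survive the export; a sketch under the conventions above (a global failure array, filled in by fail on a later slide, and null-terminated C strings):

#include <string.h>
#define MAX_STRING_SIZE 100
int failure[MAX_STRING_SIZE]; /* computed by fail */

int pmatch(char *string, char *pat)
{ /* Knuth-Morris-Pratt string matching */
  int i = 0, j = 0;
  int lens = strlen(string);
  int lenp = strlen(pat);
  while (i < lens && j < lenp) {
    if (string[i] == pat[j]) {
      i++; j++;             /* characters match: advance both */
    } else if (j == 0) {
      i++;                  /* mismatch at the start of the pattern */
    } else {
      j = failure[j-1] + 1; /* resume comparing s_i with p_{f(j-1)+1} */
    }
  }
  return ((j == lenp) ? (i - lenp) : -1);
}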
2.6 The String Abstract data
type(16/19)
 Analysis of pmatch:
 The while loop is iterated until the end of either the string or
the pattern is reached. Since i is never decreased, the lines
that increase i cannot be executed more than m =
strlen(string) times. The resetting of j to failure[j-1]+1
always decreases j. Each time the statement j++ is executed,
i is also incremented, so j cannot be incremented more than m
times; hence j cannot be decremented more than m times either.
The complexity of function pmatch is therefore
O(m) = O(strlen(string)).
2.6 The String Abstract data
type(17/19)
 If we can compute the failure function in O(strlen(pat))
time, then the entire pattern matching process will
have a computing time proportional to the sum of the
lengths of the string and pattern. Fortunately, there is
a fast way to compute the failure function. This is
based upon the following restatement of the failure
function:
    f(j) = -1                if j = 0
    f(j) = f^m(j-1) + 1      where m is the least integer k for which
                             p_{f^k(j-1)+1} = p_j
    f(j) = -1                if there is no such k
2.6 The String Abstract data
type(18/19)
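This slide lost its figure; a sketch of the failure-function computation it describes (same global failure array as pmatch):

#include <string.h>

void fail(char *pat)
{ /* compute the pattern's failure function */
  int i, j;
  int n = strlen(pat);
  failure[0] = -1;
  for (j = 1; j < n; j++) {
    i = failure[j-1];
    while ((pat[j] != pat[i+1]) && (i >= 0))
      i = failure[i]; /* follow the failure chain */
    if (pat[j] == pat[i+1])
      failure[j] = i + 1;
    else
      failure[j] = -1;
  }
}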
2.6 The String Abstract data
type(19/19)
 Analysis of fail:
 In each iteration of the while loop the value of i
decreases (by the definition of f). The variable i is reset
at the beginning of each iteration of the for loop.
However, it is either reset to -1 (initially or when the
previous iteration of the for loop goes through the last
else clause) or it is reset to a value 1 greater than its
terminal value on the previous iteration (i.e., when the
statement failure[j] = i+1 is executed). Since the for
loop is iterated only n-1 (n is the length of the pattern)
times, the value of i has a total increment of at most n-1.
Hence it cannot be decremented more than n-1 times.
 Consequently the while loop is iterated at most n-1 times
over the whole algorithm and the computing time of fail is
O(n) = O(strlen(pat)).
Chapter 3 Stacks and Queues
 The Stack Abstract Data Type
 The Queue Abstract Data Type
 A Mazing Problem
 Evaluation of Expressions
3.1 The stack ADT (1/5)
 A stack is an ordered list in which insertions and
deletions are made at one end called the top.
 If we add the elements A, B, C, D, E to the
stack, in that order, then E is the first element
we delete from the stack
 A stack is also known as a Last-In-First-Out
(LIFO) list.
3.1 The stack ADT (2/5)
An application of stack:
stack frame of a function call (activation record)
 fp: a pointer to the current stack frame
 [Figure: each frame holds the local variables, the return
address, and the old frame pointer of the invoking function;
main's frame sits at the bottom of the system stack, and a
new frame is pushed when a1 is invoked.]
*Figure 3.2: System stack after function call a1 (p.103)
3.1 The stack ADT (3/5)
 The ADT specification of the stack is shown in
Structure 3.1
3.1 The stack ADT (4/5)
 Implementation: using array
3.1 The stack ADT (5/5)
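The array implementation appears only as figures; a minimal global-array sketch (the book's version passes top explicitly and calls stack_full/stack_empty error handlers, simplified here):

#include <stdio.h>
#include <stdlib.h>
#define MAX_STACK_SIZE 100 /* maximum stack size */
typedef struct {
  int key;
} element;
element stack[MAX_STACK_SIZE];
int top = -1; /* top = -1 denotes an empty stack */

void push(element item)
{ /* add an item to the top of the stack */
  if (top >= MAX_STACK_SIZE - 1) {
    fprintf(stderr, "The stack is full\n"); /* stack_full() in the book */
    return;
  }
  stack[++top] = item;
}

element pop(void)
{ /* delete and return the top element of the stack */
  if (top == -1) {
    fprintf(stderr, "The stack is empty\n"); /* stack_empty() in the book */
    exit(1);
  }
  return stack[top--];
}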
3.2 The queue ADT (1/7)
 A queue is an ordered list in which all insertions take
place at one end, called the rear, and all deletions
take place at the opposite end, called the front
 If we insert the elements A, B, C, D, E, in that
order, then A is the first element we delete from the
queue
 A queue is also known as a First-In-First-Out (FIFO)
list
3.2 The queue ADT (2/7)
 The ADT specification of the queue appears in
Structure 3.2
3.2 The queue ADT (3/7)
 Implementation 1:
using a one dimensional array and
two variables, front and rear
3.2 The queue ADT (4/7)
 Problem: there may be available space when IsFullQ
is true, i.e., movement is required.
3.2 The queue ADT (5/7)
 Example 3.2 [Job scheduling]:
 Figure 3.5 illustrates how an operating system might
process jobs if it used a sequential representation for
its queue.

 As jobs enter and leave the system, the queue gradually shifts
to the right.
 In this case, queue_full should move the entire queue to the
left so that the first element is again at queue[0], front is at -1,
and rear is correctly positioned.
 Shifting an array is very time-consuming, queue_full has a
worst case complexity of O(MAX_QUEUE_SIZE).
3.2 The queue ADT (6/7)
 We can obtain a more efficient
representation if we regard the array
queue[MAX_QUEUE_SIZE] as circular.
 Implementation 2:
regard the array as a circular queue
 front: one position
counterclockwise from the
first element
 rear: current end
 Problem: one space is left empty when the
queue is full.
3.2 The queue ADT (7/7)
 Implementing addq and deleteq for a circular
queue is slightly more difficult since we must
assure that a circular rotation occurs.
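A sketch of circular addq/deleteq in the book's style (queue, front, and rear are globals; queue_full/queue_empty error handlers are replaced with simple messages):

#include <stdio.h>
#include <stdlib.h>
#define MAX_QUEUE_SIZE 100
element queue[MAX_QUEUE_SIZE];
int front = 0, rear = 0; /* the queue is empty when front == rear */

void addq(int front, int *rear, element item)
{ /* add an item to the circular queue */
  *rear = (*rear + 1) % MAX_QUEUE_SIZE; /* circular rotation */
  if (front == *rear) {
    fprintf(stderr, "The queue is full\n"); /* queue_full() in the book */
    exit(1);
  }
  queue[*rear] = item;
}

element deleteq(int *front, int rear)
{ /* remove the front element of the circular queue */
  if (*front == rear) {
    fprintf(stderr, "The queue is empty\n"); /* queue_empty() in the book */
    exit(1);
  }
  *front = (*front + 1) % MAX_QUEUE_SIZE;
  return queue[*front];
}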
3.3 A Mazing Problem (1/8)
 Representation of the maze
 The most obvious choice is a two-dimensional array
 0s mark the open paths and 1s the barriers
 Notice that not every position has eight neighbors.
 To avoid checking for these border conditions we can
surround the maze by a border of ones. Thus an m x p
maze will require an (m+2) x (p+2) array.
 The entrance is at position [1][1] and the exit at [m][p].

(entrance at top left)
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0 1 0 0 0 1 1 0 0 0 1 1 1 1 1 1
1 1 0 0 0 1 1 0 1 1 1 0 0 1 1 1 1
1 0 1 1 0 0 0 0 1 1 1 1 0 0 1 1 1
1 1 1 0 1 1 1 1 0 1 1 0 1 1 0 0 1
1 1 1 0 1 0 0 1 0 1 1 1 1 1 1 1 1
1 0 0 1 1 0 1 1 1 0 1 0 0 1 0 1 1
1 0 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1
1 0 0 1 1 0 1 1 0 1 1 1 1 1 0 1 1
1 1 1 0 0 0 1 1 0 1 1 0 0 0 0 0 1
1 0 0 1 1 1 1 1 0 0 0 1 1 1 1 0 1
1 0 1 0 0 1 1 1 1 1 0 1 1 1 1 0 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
(exit at bottom right)
3.3 A Mazing Problem (2/8)
 If X marks the spot of our current location,
maze[row][col], then Figure 3.9 shows the
possible moves from this position
3.3 A Mazing Problem (3/8)
 A possible implementation:
 Predefinition: the possible directions to move in an
array, move, as in Figure 3.10.
 Obtained from Figure 3.9
typedef struct {
short int vert;
short int horiz;
} offsets;
offsets move[8]; /*array of moves for each direction*/
 If we are at position maze[row][col] and we wish to find
the position of the next move, maze[next_row][next_col], we set:
next_row = row + move[dir].vert;
next_col = col + move[dir].horiz;
3.3 A Mazing Problem (4/8)
 Initial attempt at a maze traversal algorithm
 maintain a second two-dimensional array, mark, to record
the maze positions already checked
 use a stack to keep the path history

#define MAX_STACK_SIZE 100 /* maximum stack size */
typedef struct {
  short int row;
  short int col;
  short int dir;
} element;
element stack[MAX_STACK_SIZE];
3.3 A Mazing Problem (5/8)
 [Figure: a trace of the maze walk. Each stack entry records
(R: row, C: col, D: dir), starting with (1,1,1) at the entrance;
dead-end entries such as (3,14,5) and (3,13,6) are popped off
the stack. mark[1][1] is set to 1 initially at the entrance
maze[1][1]; the exit is maze[15][11].]
3.3 A Mazing Problem (6/8)
 Review of add and delete to a stack
3.3 A Mazing Problem (7/8)
 Analysis:
 The worst case computing time of path
is O(mp), where m and p are the number of
rows and columns of the maze respectively.
 [Figure: the eight direction numbers 0-7 arranged
clockwise from N; the maze interior runs from (1,1) to
(m,p) inside an (m+2)*(p+2) array.]
3.3 A Mazing Problem (8/8)
 The size of the stack
 *Figure 3.11: Simple maze with a long path (p.116)
 [Figure: a serpentine maze whose path snakes through
every other row, so on the order of m*p of the positions
lie on the path and the stack must be sized accordingly.]
3.4 Evaluation of Expressions
(1/14)
 3.4.1 Introduction
 The representation and evaluation of expressions is of
great interest to computer scientists.
 ((rear+1 == front) || ((rear == MAX_QUEUE_SIZE-1) && !front))   (3.1)
 x = a/b - c + d*e - a*c   (3.2)
 If we examine expression (3.1),
we notice that it contains:
 operators: ==, +, -, ||, &&, !
 operands: rear, front, MAX_QUEUE_SIZE
 parentheses: ( )
3.4 Evaluation of Expressions
(2/14)
 Understanding the meaning of these or any other
expressions and statements
 assume a = 4, b = c = 2, d = e = 3 in statement
(3.2), finding out the value of x = a/b - c + d*e - a*c
 Interpretation 1: ((4/2)-2) + (3*3) - (4*2) = 0 + 9 - 8 = 1
 Interpretation 2: (4/(2-2+3)) * (3-4) * 2 = (4/3)*(-1)*2 = -2.66666...
 we would have written (3.2) differently by using
parentheses to change the order of evaluation:
 x = ((a / (b - c + d)) * (e - a)) * c   (3.3)
 How to generate the machine instructions
corresponding to a given expression?
precedence rule + associative rule
3.4 (3/14)
 Precedence
hierarchy and
associative for
C
3.4 Evaluation of Expressions
(4/14)
 Evaluating postfix expressions
 The standard way of writing expressions is known as
infix notation
 binary operator in-between its two operands
 Infix notation is not the one used by compilers to
evaluate expressions
 Instead compilers typically use a parenthesis-free
notation referred to as postfix

Postfix:
no
parentheses,
no precedence
3.4 Evaluation of Expressions
(5/14)
 Evaluating postfix expressions is much simpler
than the evaluation of infix expressions:
 There are no parentheses to consider.
 To evaluate an expression we make a single left-to-
right scan of it.
 We can evaluate
an expression
easily by using
a stack
Figure 3.14 shows this
processing when the
input is nine character
string 6 2/3-4 2*+
3.4 Evaluation of Expressions (6/14)
 Representation
 We now consider the representation of both the stack and
the expression
3.4 Evaluation of Expressions
(7/14)
 Get Token
3.4 (8/14)
 Evaluation of
Postfix Expression
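The eval routine itself is only a figure in the slides; a sketch for single-digit operands (the book's version uses get_token and the global stack; here a self-contained local-stack simplification is assumed):

#define MAX_STACK_SIZE 100

int eval(char *expr)
{ /* evaluate a postfix expression with single-digit operands */
  int stack[MAX_STACK_SIZE];
  int top = -1, op1, op2, i = 0;
  char token;
  for (token = expr[i]; token != '\0'; token = expr[++i]) {
    if (token == ' ')
      continue;                   /* skip blanks */
    if (token >= '0' && token <= '9') {
      stack[++top] = token - '0'; /* operand: push its value */
    } else {
      op2 = stack[top--];         /* operator: pop two operands */
      op1 = stack[top--];
      switch (token) {
        case '+': stack[++top] = op1 + op2; break;
        case '-': stack[++top] = op1 - op2; break;
        case '*': stack[++top] = op1 * op2; break;
        case '/': stack[++top] = op1 / op2; break;
        case '%': stack[++top] = op1 % op2; break;
      }
    }
  }
  return stack[top--]; /* the final value */
}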
3.4 Evaluation of Expressions
(9/14)
string: 6 2 / 3 - 4 2 * +
 We make a single left-to-right scan of the string:
 '6' is not an operator: push it           stack: 6
 '2' is not an operator: push it           stack: 6 2
 '/' is an operator: pop two elements,
     push 6/2                              stack: 3
 '3' is not an operator: push it           stack: 3 3
 '-' is an operator: pop two elements,
     push 6/2-3                            stack: 0
 '4' is not an operator: push it           stack: 0 4
 '2' is not an operator: push it           stack: 0 4 2
 '*' is an operator: pop two elements,
     push 4*2                              stack: 0 8
 '+' is an operator: pop two elements,
     push 6/2-3 + 4*2                      stack: 8
 end of string: pop the stack and get the answer;
 now, top must be -1
3.4 Evaluation of Expressions
(10/14)
We can describe an algorithm for producing a postfix
expression from an infix one as follows:
(1) Fully parenthesize the expression
a/b - c + d*e - a*c
--> ((((a / b) - c) + (d * e)) - (a * c))
(2) Move all operators so that they replace their corresponding
right parentheses (this takes two passes)
((((a / b) - c) + (d * e)) - (a * c))
--> ((((a b) / c) - (d e) *) + (a c) *) -
(3) Delete all parentheses
ab/c-de*+ac*-
 The order of the operands is the same in infix and postfix
3.4 Evaluation of Expressions
 (11/14)
 Example 3.3 [Simple expression]: The simple expression a+b*c
yields abc*+ in postfix. '+' (icp 12) is stacked; '*' (icp 13)
is stacked on top of it, since its icp exceeds the isp of '+';
at end of string the stack is emptied, giving abc*+.
 Example 3.5 [Parenthesized expression]: The expression a*(b+c)*d
yields abc+*d* in postfix. '*' is stacked; '(' (icp 20) is always
stacked; '+' (icp 12) is stacked above the '(' (isp 0); the
matching ')' (isp 19) pops and prints '+' and discards '(';
the second '*' (icp 13) pops the first '*' (isp 13) before being
pushed; end of string empties the stack, giving abc+*d*.
3.4 Evaluation of Expressions

(12/14)
Algorithm to convert from infix to postfix
 Assumptions:
 operators: (, ), +, -, *, /, %
 operands: single digit integer or variable of one character
1. Operands are taken out immediately
2. Operators are taken out of the stack as long as their in-stack
precedence (isp) is higher than or equal to the incoming
precedence (icp) of the new operator
3. '(' has low isp, and high icp

op:   (   )   +   -   *   /   %  eos
isp:  0  19  12  12  13  13  13   0
icp: 20  19  12  12  13  13  13   0

precedence stack[MAX_STACK_SIZE];
/* isp and icp arrays -- index is value of precedence: lparen,
rparen, plus, minus, times, divide, mod, eos */
static int isp[] = {0, 19, 12, 12, 13, 13, 13, 0};
static int icp[] = {20, 19, 12, 12, 13, 13, 13, 0}; /* icp row of the table above */
3.4 Evaluation of Expressions
(13/14)
Trace for a * ( b + c ) / d:
 'a': operand, print it                          output: a
 '*': operator, push it                          stack: *
 '(': the icp of '(' is 20, push it              stack: * (
 'b': operand, print it                          output: a b
 '+': the isp of '(' is 0, push '+'              stack: * ( +
 'c': operand, print it                          output: a b c
 ')': pop and print until '(' is popped          stack: *    output: a b c +
 '/': the isp of '*' is 13 and the icp of '/' is 13,
      so pop and print '*', then push '/'        stack: /    output: a b c + *
 'd': operand, print it                          output: a b c + * d
 eos: pop and print the stack                    output: a b c + * d /
 now, top must be -1
3.4 Evaluation of Expressions
(14/14)
 Complexity: Θ(n)
 The total time spent
here is Θ(n) as the
number of tokens that
get stacked and
unstacked is linear in n
 where n is the number
of tokens in the
expression
3.5 MULTIPLE STACKS AND
QUEUE (1/5)
 Two stacks
m[0], m[1], ..., m[n-2], m[n-1]
 one stack grows from m[0] toward the right, the other
from m[n-1] toward the left (their bottommost positions
are at the two ends)
 More than two stacks (n)
 memory is divided into n equal segments
 boundary[stack_no], 0 <= stack_no < MAX_STACKS
 top[stack_no], 0 <= stack_no < MAX_STACKS
3.5 MULTIPLE STACKS AND
QUEUE (2/5)
 Initially, boundary[i] = top[i].
 [Figure: memory positions 0, ⌊m/n⌋, 2⌊m/n⌋, ..., m-1;
boundary[0..n] mark the segment boundaries and top[i]
starts at boundary[i].]
 All stacks are empty and divided into roughly equal segments.
3.5 MULTIPLE STACKS AND
QUEUE (3/5)
*(p.128)
#define MEMORY_SIZE 100 /* size of memory */
#define MAX_STACKS 10 /* max number of stacks plus 1 */
/* global memory declaration */
element memory[MEMORY_SIZE];
int top[MAX_STACKS];
int boundary[MAX_STACKS];
int n; /* number of stacks entered by the user */

*(p.129) To divide the array into roughly equal segments:
top[0] = boundary[0] = -1;
for (i = 1; i < n; i++)
  top[i] = boundary[i] = (MEMORY_SIZE / n) * i;
boundary[n] = MEMORY_SIZE - 1;
3.5 MULTIPLE STACKS AND
QUEUE (4/5)
*Program 3.12: Add an item to the ith stack (p.129)
void add(int i, element item)
{ /* add an item to the ith stack */
  if (top[i] == boundary[i+1])
    stack_full(i); /* there may still be unused storage elsewhere */
  memory[++top[i]] = item;
}

*Program 3.13: Delete an item from the ith stack (p.130)
element delete(int i)
{ /* remove top element from the ith stack */
  if (top[i] == boundary[i])
    return stack_empty(i);
  return memory[top[i]--];
}
3.5 MULTIPLE STACKS AND
QUEUE (5/5)
Find j, stack_no < j < n (toward the right),
such that top[j] < boundary[j+1],
or 0 <= j < stack_no (toward the left)
 [Figure: b[0] t[0] b[1] t[1] ... b[i] t[i] t[i+1] ... t[j] b[j+1] ... b[n],
where b = boundary and t = top; stacks i and i+1 meet, so we
search left or right for a stack with spare space and shift.]
*Figure 3.19: Configuration when stack i meets stack i+1, but the memory is not full (p.130)
Chapter 4 Lists
 Pointers
 Singly Linked Lists
 Dynamically Linked Stacks and Queues
 Polynomials
 Chain
 Circularly Linked Lists
 Equivalence Relations
 Doubly Linked Lists
Pointers (1/5)
 Consider the following alphabetized list of three
letter English words ending in at:
(bat, cat, sat, vat)
 If we store this list in an array
 Add the word mat to this list
 move sat and vat one position to the right before we insert mat.
 Remove the word cat from the list
 move sat and vat one position to the left
 Problems of a sequence representation (ordered
list)
 Arbitrary insertion and deletion from arrays can be very
time-consuming
Pointers (2/5)
 An elegant solution: using linked representation
 Items may be placed anywhere in memory.
 In a sequential representation the order of elements is
the same as in the ordered list, while in a linked
representation these two sequences need not be the
same.
 Store the address, or location, of the next element in
that list for accessing elements in the correct order
with each element.
 Thus, associated with each list element is a node
which contains both a data component and a pointer
to the next item in the list. The pointers are often
called links.
Pointers (3/5)
 C provides extensive supports for pointers.
 Two most important operators used with the pointer
type :
& the address operator
* the dereferencing (or indirection) operator
 Example:
If we have the declaration:
int i, *pi;
then i is an integer variable and pi is a pointer to an integer.
If we say:
pi = &i;
then &i returns the address of i and assigns it as the value of pi.
To assign a value to i we can say:
i = 10; or *pi = 10;
Pointers (4/5)
 Pointers can be dangerous
 Using pointers: high degree of flexibility and efficiency, but
dangerous as well.
 It is a wise practice to set all pointers to NULL when they are not
actually pointing to an object.
 Another: using explicit type cast when converting between pointer
types.
 Example:
pi = malloc(sizeof(int));/*assign to pi a pointer to
int*/
pf = (float *)pi; /*casts int pointer to float pointer*/
 In many systems, pointers have the same size as type int.
 Since int is the default type specifier, some programmers omit the
return type when defining a function.
 The return type defaults to int which can later be interpreted as a
pointer.
Pointers (5/5)
 Using dynamically allocated storage
 When programming, you may not know how much space
you will need, nor do you wish to allocate some vary large
area that may never be required.
 C provides heap, for allocating storage at run-time.
 You may call a function, malloc, and request the amount of
memory you need.
 When you no longer need an area of memory, you may free it by
calling another function, free, and return the area of memory to the
system.
 Example: malloc requests memory; free returns it.
Singly Linked Lists (1/15)
 Linked lists are drawn as an order sequence of
nodes with links represented as arrows (Figure
4.1).
 The name of the pointer to the first node in the list is the
name of the list. (the list of Figure 4.1 is called ptr.)
 Notice that we do not explicitly put in the values of pointers, but
simply draw arrows to indicate that they are there.
Singly Linked Lists (2/15)
 The nodes do not reside in sequential locations
 The locations of the nodes may change on
different runs
 [Figure: ptr -> node -> ... -> node -> NULL; each node has a
data field and a link field. Such a list is called a chain.]
Singly Linked Lists (3/15)
 Why it is easier to make arbitrary insertions and
deletions using a linked list?
 To insert the word mat between cat and sat, we must:
 Get a node that is currently unused; let its address be paddr.
 Set the data field of this node to mat.
 Set paddr’s link field to point to the address found in the link
field of the node containing cat.
 Set the link field of the node containing cat to point to paddr.
Singly Linked Lists (4/15)
 Delete mat from the list:
 We only need to find the element that immediately
precedes mat, which is cat, and set its link field to point
to mat’s link (Figure 4.3).
 We have not moved any data, and although the link
field of mat still points to sat, mat is no longer in the
list.
Singly Linked Lists (5/15)
 We need the following capabilities to make linked
representations possible:
 Defining a node’s structure, that is, the fields it
contains. We use self-referential structures, discussed
in Section 2.2 to do this.
 Create new nodes when we need them. (malloc)
 Remove nodes that we no longer need. (free)
Singly Linked Lists (6/15)
 2.2.4 Self-Referential Structures (review)
 One or more of its components is a pointer to itself.

typedef struct list {
  char data;
  struct list *link;
} list;

 malloc: obtain a node (memory); free: release memory
 Construct a list with three nodes (a -> b -> c):

list item1, item2, item3;
item1.data = 'a';
item2.data = 'b';
item3.data = 'c';
item1.link = item2.link = item3.link = NULL;
item1.link = &item2;
item2.link = &item3;
Singly Linked Lists (7/15)
 Example 4.1 [List of words ending in at]:
 Declaration
typedef struct list_node *list_pointer;
struct list_node {
  char data[4];
  list_pointer link;
};
 Creation
list_pointer ptr = NULL;
 Testing
#define IS_EMPTY(ptr) (!(ptr))
 Allocation
ptr = (list_pointer) malloc(sizeof(struct list_node));
 Return the space:
free(ptr);
Singly Linked Lists (8/15)
e -> name is shorthand for (*e).name

strcpy(ptr->data, "bat");
ptr->link = NULL;

 [Figure 4.4: Referencing the fields of a node (p.142): ptr holds
the address of the first node, whose data field contains "bat\0"
and whose link field is NULL.]
Singly Linked Lists (9/15)
 Example 4.2 [Two-node linked list]:

typedef struct list_node *list_pointer;
typedef struct list_node {
  int data;
  list_pointer link;
};
list_pointer ptr = NULL;

#define IS_FULL(ptr) (!(ptr))
/* malloc returns NULL if there is no more memory */

 Program 4.2: Create a two-node list

list_pointer create2()
{
  /* create a linked list with two nodes */
  list_pointer first, second;
  first = (list_pointer) malloc(sizeof(list_node));
  second = (list_pointer) malloc(sizeof(list_node));
  second->link = NULL;
  second->data = 20;
  first->data = 10;
  first->link = second;
  return first; /* ptr -> 10 -> 20 -> NULL */
}
Singly Linked Lists (10/15)
• Insertion
 Observation
 insert a new node temp with data = 50 into the list ptr
after node:
ptr -> 10 -> 20 -> NULL becomes ptr -> 10 -> 50 -> 20 -> NULL
Singly Linked Lists (11/15)
 Implement Insertion:

void insert(list_pointer *ptr, list_pointer node)
{
  /* insert a new node with data = 50 into the list ptr after node */
  list_pointer temp;
  temp = (list_pointer) malloc(sizeof(list_node));
  if (IS_FULL(temp)) {
    fprintf(stderr, "The memory is full\n");
    exit(1);
  }
  temp->data = 50;

Singly Linked Lists (12/15)

  if (*ptr) { /* nonempty list */
    temp->link = node->link;
    node->link = temp;
  } else {    /* empty list */
    temp->link = NULL;
    *ptr = temp;
  }
}

 [Figure: ptr -> 10 -> 20 -> NULL with temp (50) linked in after node 10.]
Singly Linked Lists (13/15)
 Deletion
 Observation: delete node (50) from the list

ptr -> 10 -> 50 -> 20 -> NULL   (trail points to 10, node to 50)
ptr -> 10 -> 20 -> NULL         (after deletion)
Singly Linked Lists (14/15)
 Implement Deletion:

void delete(list_pointer *ptr, list_pointer trail, list_pointer node)
{
  /* delete node from the list; trail is the preceding node,
     ptr is the head of the list */
  if (trail)
    trail->link = node->link;
  else
    *ptr = (*ptr)->link;
  free(node);
}
Singly Linked Lists (15/15)
 Print out a list (traverse a list)
 Program 4.5: Printing a list
void print_list(list_pointer ptr)
{
  printf("The list contains: ");
  for (; ptr; ptr = ptr->link)
    printf("%4d", ptr->data);
  printf("\n");
}
Dynamically Linked
Stacks and Queues (1/8)
 When several stacks and queues coexisted, there
was no efficient way to represent them
sequentially.
 Notice that direction of links for both stack and the queue
facilitate easy insertion and deletion of nodes.
 Easily add or delete a node form the top of the stack.
 Easily add a node to the rear of the queue and add or delete a
node at the front of a queue.
Dynamically Linked
Stacks and Queues (2/8)
 Represent n stacks
 [Figure: each stack is a chain: top -> (item, link) -> ... -> NULL.]
Dynamically Linked
Stacks and Queues (3/8)
 Push in the linked stack

void add(stack_pointer *top, element item)
{
  /* add an element to the top of the stack */
  stack_pointer temp = (stack_pointer) malloc(sizeof(stack));
  if (IS_FULL(temp)) {
    fprintf(stderr, "The memory is full\n");
    exit(1);
  }
  temp->item = item;
  temp->link = *top; /* new node points at the old top */
  *top = temp;
}
Dynamically Linked
Stacks and Queues (4/8)
 Pop from the linked stack

element delete(stack_pointer *top)
{
  /* delete an element from the stack */
  stack_pointer temp = *top;
  element item;
  if (IS_EMPTY(temp)) {
    fprintf(stderr, "The stack is empty\n");
    exit(1);
  }
  item = temp->item;
  *top = temp->link;
  free(temp);
  return item;
}
Dynamically Linked
Stacks and Queues (5/8)
 Represent n queues
 [Figure: each queue is a chain from front to rear;
deletions happen at front, additions at rear -> NULL.]
Dynamically Linked
Stacks and Queues (6/8)
 enqueue in the linked queue
 [Figure: a new node temp (item, NULL) is linked after rear,
and rear is advanced to temp.]
Dynamically Linked
Stacks and Queues (7/8)
 dequeue from the linked queue (similar to pop)
 [Figure: temp saves the old front node; front advances to
front->link and temp is freed.]
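Neither routine survives the slide export; a sketch matching the figures above (queue_pointer, element, and the IS_FULL/IS_EMPTY macros follow the stack code's conventions):

void addq(queue_pointer *front, queue_pointer *rear, element item)
{ /* add an element to the rear of the queue */
  queue_pointer temp = (queue_pointer) malloc(sizeof(queue));
  if (IS_FULL(temp)) {
    fprintf(stderr, "The memory is full\n");
    exit(1);
  }
  temp->item = item;
  temp->link = NULL;
  if (*front)
    (*rear)->link = temp; /* nonempty queue: link after the old rear */
  else
    *front = temp;        /* empty queue: temp is also the front */
  *rear = temp;
}

element deleteq(queue_pointer *front)
{ /* delete an element from the front of the queue */
  queue_pointer temp = *front;
  element item;
  if (IS_EMPTY(*front)) {
    fprintf(stderr, "The queue is empty\n");
    exit(1);
  }
  item = temp->item;
  *front = temp->link;
  free(temp);
  return item;
}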
Dynamically Linked
Stacks and Queues (8/8)
 The solution presented above to the n-stack, m-
queue problem is both computationally and
conceptually simple.
 We no longer need to shift stacks or queues to make
space.
 Computation can proceed as long as there is memory
available.
Polynomials (1/9)
 Representing Polynomials As Singly Linked Lists
 The manipulation of symbolic polynomials, has a classic example
of list processing.
 In general, we want to represent the polynomial:
A(x) = a_{m-1} x^{e_{m-1}} + ... + a_0 x^{e_0}
 where the a_i are nonzero coefficients and the e_i are
nonnegative integer exponents such that
e_{m-1} > e_{m-2} > ... > e_1 > e_0 >= 0.
 We will represent each term as a node containing coefficient and
exponent fields, as well as a pointer to the next term.
Polynomials (2/9)
 Assuming that the coefficients are integers, the type
declarations are:
typedef struct poly_node *poly_pointer;
typedef struct poly_node {
    int coef;
    int expon;
    poly_pointer link;
};
poly_pointer a, b, d;
 Example polynomials: a = 3x^14 + 2x^8 + 1 and b = 8x^14 - 3x^10 + 10x^6
 Draw poly_nodes as a box with three fields: coef | expon | link
Polynomials (3/9)
 Adding Polynomials
 To add two polynomials,we examine their terms
starting at the nodes pointed to by a and b.
 If the exponents of the two terms are equal
1. add the two coefficients
2. create a new term for the result.
 If the exponent of the current term in a is less than that of the current term in b
1. create a duplicate term of b
2. attach this term to the result, called d
3. advance the pointer to the next term in b.
 We take a similar action on a if a->expon > b->expon.
 Figure 4.12: generating the first three terms of
d = a + b (next page)
Polynomials (4/9)
(Figure 4.12: the first three terms of d are generated by walking a and b in parallel)
Polynomials (5/9)
 Add two polynomials (a code sketch follows below)
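The slide shows padd only as a figure. A sketch of the addition loop, assuming a COMPARE(x, y) macro that returns -1, 0, or 1 as x is less than, equal to, or greater than y, and the attach function shown on the next slide:
poly_pointer padd(poly_pointer a, poly_pointer b) {
    /* return the polynomial d = a + b */
    poly_pointer front, rear, temp;
    int sum;
    rear = (poly_pointer)malloc(sizeof(poly_node));
    if (IS_FULL(rear)) {
        fprintf(stderr, "The memory is full\n");
        exit(1);
    }
    front = rear;               /* temporary head node, removed at the end */
    while (a && b)
        switch (COMPARE(a->expon, b->expon)) {
        case -1:                /* a->expon < b->expon: copy the b term */
            attach(b->coef, b->expon, &rear);
            b = b->link;
            break;
        case 0:                 /* equal exponents: add the coefficients */
            sum = a->coef + b->coef;
            if (sum) attach(sum, a->expon, &rear);
            a = a->link; b = b->link;
            break;
        case 1:                 /* a->expon > b->expon: copy the a term */
            attach(a->coef, a->expon, &rear);
            a = a->link;
        }
    for (; a; a = a->link) attach(a->coef, a->expon, &rear);  /* leftovers */
    for (; b; b = b->link) attach(b->coef, b->expon, &rear);
    rear->link = NULL;
    temp = front;
    front = front->link;        /* drop the temporary head node */
    free(temp);
    return front;
}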
Polynomials (6/9)
 Attach a node to the end of a list
void attach(float coefficient, int exponent, poly_pointer *ptr) {
    /* create a new node with coef = coefficient and expon = exponent,
       attach it to the node pointed to by ptr; ptr is updated to point
       to this new node */
    poly_pointer temp;
    temp = (poly_pointer)malloc(sizeof(poly_node));  /* create new node */
    if (IS_FULL(temp)) {
        fprintf(stderr, "The memory is full\n");
        exit(1);
    }
    temp->coef = coefficient;   /* copy item to the new node */
    temp->expon = exponent;
    (*ptr)->link = temp;        /* attach */
    *ptr = temp;                /* move ptr to the end of the list */
}
Polynomials (7/9)
 Analysis of padd
A(x) = a(m-1) x^e(m-1) + … + a0 x^e0 and B(x) = b(n-1) x^f(n-1) + … + b0 x^f0
1. coefficient additions
0 ≦ additions ≦ min(m, n)
where m (n) denotes the number of terms in A (B).
2. exponent comparisons
extreme case:
e(m-1) > f(m-1) > e(m-2) > f(m-2) > … > e1 > f1 > e0 > f0
m+n-1 comparisons
3. creation of new nodes
extreme case: maximum number of terms in d is m+n
m+n new nodes
summary: O(m+n)
Polynomials (8/9)
 A suite of functions for polynomials: read_poly(), print_poly(), padd(), psub(), pmult()
e(x) = a(x) * b(x) + d(x)
poly_pointer a, b, d, e;
...
a = read_poly();
b = read_poly();
d = read_poly();
temp = pmult(a, b);   /* temp holds a partial result */
e = padd(temp, d);    /* by returning the nodes of temp, we may reuse them for other polynomials */
print_poly(e);
Polynomials (9/9)
 Erase Polynomials
 erase frees the nodes in temp
void erase(poly_pointer *ptr) {
    /* erase the polynomial pointed to by ptr */
    poly_pointer temp;
    while (*ptr) {
        temp = *ptr;
        *ptr = (*ptr)->link;
        free(temp);
    }
}
Chain (1/3)
 Chain:
 A singly linked list in which the last node has a null link
 Operations for chains
 Inverting a chain
 For a list of length ≧ 1 nodes, the while loop is executed
length times, so the computing time is linear, i.e., O(length).
(Figure: the links of the chain lead are reversed one node at a time, so lead ends up pointing to what was the last node)
Chain
(2/3)
 The inversion uses two extra pointers, middle and trail, that
follow one and two nodes behind lead as it advances (a sketch follows below).
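A minimal sketch of the inversion, following the two-extra-pointers idea above (this mirrors the book's invert function):
list_pointer invert(list_pointer lead) {
    /* invert the chain pointed to by lead */
    list_pointer middle, trail;
    middle = NULL;
    while (lead) {
        trail = middle;         /* trail follows one node behind middle */
        middle = lead;          /* middle follows one node behind lead */
        lead = lead->link;      /* advance lead */
        middle->link = trail;   /* reverse the link */
    }
    return middle;              /* middle now points to the old last node */
}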
Chain (3/3)
 Concatenates two chains, ptr1 and ptr2.
 Produces the list ptr1 followed by the list ptr2.
 O(length of list ptr1): a temporary pointer walks to the end of ptr1,
and that last node's NULL link is redirected to the first node of ptr2.
Circularly Linked Lists (1/10)
 Circular Linked list
 The link field of the last node points to the first
node in the list.
 Example
 Represent the polynomial ptr = 3x^14 + 2x^8 + 1 as a
circularly linked list.
(Figure: three nodes (3, 14), (2, 8), (1, 0); the last node's link points back to the first)
Circularly Linked Lists (2/10)
 Maintain an Available List
 We free nodes that are no longer in use so that we
may reuse these nodes later
 We can obtain an efficient erase algorithm for circular
lists, by maintaining our own list (as a chain) of nodes
that have been “freed”.
 Instead of using malloc and free, we now use
get_node (program 4.13) and ret_node (program
4.14).
(Figure: avail points to the list of freed nodes, a chain ending in NULL)
Circularly Linked Lists (3/10)
 Maintain an Available List (cont’d)
 When we need a new node, we examine this list.
 If the list is not empty, then we may use one of its
nodes.
 Only when the list is empty do we need to use
malloc to create a new node.
Circularly Linked Lists (4/10)
 Maintain an Available List (cont’d)
 Insert ptr to the front of this list
 Let avail be a variable of type poly_pointer that points
to the first node in our list of freed nodes.
 Henceforth, we call this list the available space list or
avail list.
 Initially, we set avail to NULL
Circularly Linked Lists (5/10)
 Maintain an Available List
 Erase a circular list in a fixed
amount (constant) of time, O(1), independent of the
number of nodes in the list, using cerase:
the entire circle is spliced onto the front of the avail list in one step
(Figure: ptr's nodes are relinked so that avail heads the chain of freed nodes, i.e., the chain formed by the red links in the slide)
Circularly Linked Lists (6/10)
 We must handle the zero polynomial as a special
case. To avoid it, we introduce a head node into
each polynomial
 each polynomial, zero or nonzero, contains one additional
node.
 The expon and coef fields of this node are irrelevant.
(Figure: a zero polynomial is a single head node whose link points to itself; a nonzero polynomial is the head node followed by its term nodes)
Circularly Linked
Lists (7/10)
 To fit the circular list with
head node representation:
 We may remove the test for (*ptr)
from cerase
 We change the original padd to cpadd: the head node carries
expon = -1, so every real exponent (≧ 0) compares greater than it,
and at the end the result's last node is linked back to its head node (the first node).
Circularly Linked Lists (8/10)
 Operations for circularly linked lists
 Question:
 What happens when we want to insert a new node
at the front of the circular linked list ptr?
(Figure: ptr points to the first node of the circular list x1 -> x2 -> x3)
 Answer:
 We must move down the entire length of ptr to reach the last node.
 Possible solution: let ptr point to the last node instead of the first.
(Figure: x1 -> x2 -> x3 with ptr at x3; the front node is then ptr->link)
Circularly Linked Lists (9/10)
 Insert a new
node at the front
of a circular list
 To insert node
at the rear, we
only need to add
the additional
statement *ptr =
node to the else
clause of
insert_front
(Figure: the new node is spliced between ptr, the last node, and the old front node x1; a sketch of insert_front and of finding the length follows the next slide)
Circularly Linked Lists (10/10)
 Finding the length of a circular list
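Neither routine appears as code on the slides above; a minimal sketch of both, assuming ptr points to the last node of the circular list (as proposed above):
void insert_front(list_pointer *ptr, list_pointer node) {
    /* insert node at the front of the circular list ptr,
       where ptr points to the last node in the list */
    if (!(*ptr)) {                      /* empty list: node points to itself */
        *ptr = node;
        node->link = node;
    }
    else {                              /* nonempty list */
        node->link = (*ptr)->link;      /* node becomes the new front */
        (*ptr)->link = node;
        /* to insert at the rear instead, also execute *ptr = node */
    }
}

int length(list_pointer ptr) {
    /* count the number of nodes in the circular list ptr */
    list_pointer temp;
    int count = 0;
    if (ptr) {
        temp = ptr;
        do {
            count++;
            temp = temp->link;
        } while (temp != ptr);
    }
    return count;
}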
Equivalence Relations (1/6)
 Reflexive Relation
 for any polygon x, x ≡ x (e.g., x is electrically equivalent to itself)

 Symmetric Relation
 for any two polygons x and y, if x ≡ y, then y ≡ x.

 Transitive Relation
 for any three polygons x, y, and z, if x ≡ y and y ≡ z, then x ≡ z.

 Definition:
 A relation over a set, S, is said to be an equivalence relation
over S iff it is symmetric, reflexive, and transitive over S.
 Example:
 “equal to” relationship is an equivalence relation
Equivalence Relations (2/6)
 Example:
 if we have 12 polygons numbered 0 through 11, and the pairs
0 ≡ 4, 3 ≡ 1, 6 ≡ 10, 8 ≡ 9, 7 ≡ 4, 6 ≡ 8, 3 ≡ 5, 2 ≡ 11, 11 ≡ 0
 we can partition the twelve polygons into the following
equivalence classes:
{0, 2, 4, 7, 11}; {1, 3, 5}; {6, 8, 9, 10}
 Two phases to determine equivalence
 First phase: the equivalence pairs (i, j) are read in and
stored.
 Second phase:
 we begin at 0 and find all pairs of the form (0, j).
Continue until the entire equivalence class containing 0 has
been found, marked, and printed.
 Next find another object not yet output, and repeat
the above process.
Equivalence Relation (3/6)
 Program to find equivalence classes
#include <stdio.h>
#define MAX_SIZE 24
#define IS_FULL(ptr) (!(ptr))
#define FALSE 0
#define TRUE 1

typedef struct node *node_pointer;
typedef struct node {
    int data;
    node_pointer link;
};

void main(void) {
    short int out[MAX_SIZE];
    node_pointer seq[MAX_SIZE];
    node_pointer x, y, top;
    int i, j, n;
    printf("Enter the size (<=%d) ", MAX_SIZE);
    scanf("%d", &n);
    for (i = 0; i < n; i++) {
        /* initialize seq and out */
        out[i] = TRUE;
        seq[i] = NULL;
    }
    /* Phase 1 */
    /* Phase 2 */
}
Equivalence Relations (4/6)
 Phase 1: read in and store the equivalence pairs <i, j>
(Figure: the adjacency lists seq[0..11] after reading the pairs; for each input pair (i, j), a node holding j is inserted at the front of seq[i] and a node holding i at the front of seq[j], e.g., seq[0] holds 11, 4 and seq[11] holds 0, 2)
0 ≡ 4, 3 ≡ 1, 6 ≡ 10, 8 ≡ 9, 7 ≡ 4, 6 ≡ 8, 3 ≡ 5, 2 ≡ 11, 11 ≡ 0
Equivalence Relations (5/6)
 Phase 2:
 begin at 0 and find all pairs of the form <0, j>, where 0
and j are in the same equivalence class
 by transitivity, all pairs of the form <j, k> imply that k
in the same equivalence class as 0
 continue this way until we have found, marked, and
printed the entire equivalent class containing 0
Equivalence Relations (6/6)
(Figure: phase 2 trace; x and y walk the adjacency lists, partially processed list nodes are pushed on the stack top, and out[] marks which objects have already been printed. Starting from 0, the indices 11, 4, 7, 2 are discovered in turn.)
New class: 0 11 4 7 2
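A sketch of the two phases, i.e., the bodies that would replace the /* Phase 1 */ and /* Phase 2 */ comments in the program above (this follows the book's outline; the phase-1 input loop is assumed to end with a negative pair):
/* Phase 1: read the pairs and build the adjacency lists */
printf("Enter a pair of numbers (-1 -1 to quit): ");
scanf("%d%d", &i, &j);
while (i >= 0) {
    x = (node_pointer)malloc(sizeof(node));
    if (IS_FULL(x)) { fprintf(stderr, "The memory is full\n"); exit(1); }
    x->data = j; x->link = seq[i]; seq[i] = x;   /* insert j on seq[i] */
    x = (node_pointer)malloc(sizeof(node));
    if (IS_FULL(x)) { fprintf(stderr, "The memory is full\n"); exit(1); }
    x->data = i; x->link = seq[j]; seq[j] = x;   /* insert i on seq[j] */
    scanf("%d%d", &i, &j);
}

/* Phase 2: output the equivalence classes */
for (i = 0; i < n; i++)
    if (out[i]) {                    /* i has not been printed yet */
        printf("\nNew class: %5d", i);
        out[i] = FALSE;
        x = seq[i]; top = NULL;      /* stack of partially processed lists */
        for (;;) {
            while (x) {              /* walk the current list */
                j = x->data;
                if (out[j]) {
                    printf("%5d", j);
                    out[j] = FALSE;
                    y = x->link; x->link = top; top = x;  /* push x */
                    x = y;
                }
                else x = x->link;
            }
            if (!top) break;
            x = seq[top->data];      /* process the list of a stacked index */
            top = top->link;
        }
    }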
Doubly Linked Lists (1/4)
 Singly linked lists pose problems because we
can move easily only in the direction of the links
(Figure: a chain … -> NULL; given only ptr, the node preceding it cannot be reached without traversing again from the front)
 Doubly linked list has at least three fields
 left link field(llink), data field(item), right link field(rlink).
 The necessary declarations:
typedef struct node *node_pointer;
typedef struct node{
node_pointer llink;
element item;
node_pointer rlink;
};
Doubly Linked Lists (2/4)
 Sample
 doubly linked circular with head node: (Figure 4.23)

 empty double linked circular list with head node


(Figure 4.24)

 suppose that ptr points to any node in a doubly


linked list, then:
 ptr = ptr -> llink -> rlink = ptr -> rlink -> llink
Doubly Linked Lists (3/4)
 Insert node
(Figure: the new node (llink | item | rlink) is inserted to the right of a node; the four link fields involved are redirected)
Doubly Linked Lists (4/4)
 Delete node
(Figure: deleting a node: its left neighbor's rlink and its right neighbor's llink are redirected around the deleted node)
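Neither operation is shown as code on these slides. A minimal sketch for a doubly linked circular list with a head node, using the node declarations above (this follows the book's dinsert/ddelete):
void dinsert(node_pointer node, node_pointer newnode) {
    /* insert newnode to the right of node */
    newnode->llink = node;
    newnode->rlink = node->rlink;
    node->rlink->llink = newnode;
    node->rlink = newnode;
}

void ddelete(node_pointer node, node_pointer deleted) {
    /* delete deleted from the list; node is the head node */
    if (node == deleted)
        printf("Deletion of head node not permitted.\n");
    else {
        deleted->llink->rlink = deleted->rlink;
        deleted->rlink->llink = deleted->llink;
        free(deleted);
    }
}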
Chapter 5 Trees: Outline
 Introduction
 Representation Of Trees
 Binary Trees
 Binary Tree Traversals
 Additional Binary Tree Operations
 Threaded Binary Trees
 Heaps
 Binary Search Trees
 Selection Trees
 Forests
Introduction (1/8)
 A tree structure means that the data are organized
so that items of information are related by branches
 Examples:
Introduction (2/8)
 Definition (recursively): A tree is a finite set of
one or more nodes such that
 There is a specially designated node called root.
 The remaining nodes are partitioned into n>=0 disjoint
set T1,…,Tn, where each of these sets is a tree. T1,
…,Tn are called the subtrees of the root.
 Every node in the tree is the root of some
subtree
Introduction (3/8)
 Some Terminology
 node: the item of information plus the branches to each
node.
 degree: the number of subtrees of a node
 degree of a tree: the maximum of the degree of the
nodes in the tree.
 terminal nodes (or leaf): nodes that have degree zero
 nonterminal nodes: nodes that are not terminal
nodes (i.e., whose degree is nonzero).
 children: the roots of the subtrees of a node X are the
children of X
 parent: X is the parent of its children.
Introduction (4/8)
 Some Terminology (cont’d)
 siblings: children of the same parent are said to be
siblings.
 Ancestors of a node: all the nodes along the path
from the root to that node.
 The level of a node: defined by letting the root be at
level one. If a node is at level l, then its children are at
level l+1.
 Height (or depth): the maximum level of any node in
the tree
Introduction (5/8)
 Example
Property: (# of edges) = (# of nodes) - 1
A is the root node
B is the parent of D and E
C is the sibling of B
D and E are the children of B
D, E, F, G, I are external nodes, or leaves
A, B, C, H are internal nodes
The level of E is 3
The height (depth) of the tree is 4
The degree of node B is 2
The degree of the tree is 3
The ancestors of node I are A, C, H
The descendants of node C are F, G, H, I
(Figure: the sample tree; level 1: A; level 2: B, C; level 3: D, E, F, G, H; level 4: I, a child of H)
Introduction (6/8)
 Representation Of Trees
 List Representation
 we can write the tree of Figure 5.2 as a list in which each of the
subtrees is also a list
( A ( B ( E ( K, L ), F ), C ( G ), D ( H ( M ), I, J ) ) )
 The root comes first,
followed by a list of sub-trees
Introduction (7/8)
 Representation Of
Trees (cont’d)
 Left Child-
Right Sibling
Representation
Introduction (8/8)
 Representation Of Trees (cont’d)
 Representation
As A Degree
Two Tree
Binary Trees (1/9)
 Binary trees are characterized by the fact that
any node can have at most two branches
 Definition (recursive):
 A binary tree is a finite set of nodes that is either
empty or consists of a root and two disjoint binary
trees called the left subtree and the right subtree
 Thus the left subtree and the right subtree are
distinguished
(Figure: two distinct binary trees, each with root A and a single child B, once as the left subtree and once as the right subtree)
 Any tree can be transformed into a binary tree
 by the left child-right sibling representation
Binary Trees (2/9)
 The abstract data type of binary tree
Binary Trees (3/9)
 Two special kinds of binary trees:
(a) skewed tree, (b) complete binary tree
 All the leaf nodes of a complete binary tree are on two adjacent levels
Binary Trees (4/9)
 Properties of binary trees
 Lemma 5.1 [Maximum number of nodes]:
1. The maximum number of nodes on level i of a binary
tree is 2^(i-1), i ≧ 1.
2. The maximum number of nodes in a binary tree of
depth k is 2^k - 1, k ≧ 1.
 Lemma 5.2 [Relation between number of leaf
nodes and degree-2 nodes]:
For any nonempty binary tree, T, if n0 is the number
of leaf nodes and n2 is the number of nodes of
degree 2, then n0 = n2 + 1.
 These lemmas allow us to define full and
complete binary trees
Binary Trees (5/9)
 Definition:
 A full binary tree of depth k is a binary tree of depth k
having 2^k - 1 nodes, k ≧ 0.
 A binary tree with n nodes and depth k is complete iff its
nodes correspond to the nodes numbered from 1 to n in
the full binary tree of depth k.
 From Lemma 5.1, the
height of a complete
binary tree with n nodes
is ⌈log2(n+1)⌉
Binary Trees (6/9)
 Binary tree representations (using array)
 Lemma 5.3: If a complete binary tree with n nodes
is represented sequentially, then for any node with
index i, 1 ≦ i ≦ n, we have:
1. parent(i) is at ⌊i/2⌋ if i ≠ 1.
If i = 1, i is at the root and has no parent.
2. left_child(i) is at 2i if 2i ≦ n.
If 2i > n, then i has no left child.
3. right_child(i) is at 2i+1 if 2i+1 ≦ n.
If 2i+1 > n, then i has no right child.
(Figure: a sample tree stored sequentially; array [1]=A, [2]=B, [3]=C, [4]=—, [5]=D, [6]=—, [7]=E, with levels 1–3 marked)
Binary Trees (7/9)
 Binary tree representations (using array)
 Wasted space: in the worst case, a skewed tree of depth
k requires 2^k - 1 spaces. Of these, only k will be
occupied
 Insertion or deletion
of nodes from the
middle of a tree
requires the
movement of
potentially many nodes
to reflect the change in
the level of these nodes
Binary Trees (8/9)
 Binary tree representations (using links)
(Figures: the node structure with left_child, data, and right_child pointers, and the linked representations of the sample trees)
Binary Tree Traversals (1/9)
 How to traverse a tree or visit each node in the
tree exactly once?
 There are six possible combinations of traversal
LVR, LRV, VLR, VRL, RVL, RLV
 Adopt convention that we traverse left before
right, only 3 traversals remain
LVR (inorder), LRV (postorder), VLR (preorder)
(Figure: node structure left_child | data | right_child; L means move left, V means visit the node, R means move right)
Binary Tree Traversals (2/9)
 Arithmetic Expression using binary tree
 inorder traversal (infix expression)
A/B*C*D+E
 preorder traversal (prefix expression)
+**/ABCDE
 postorder traversal
(postfix expression)
AB/C*D*E+
 level order traversal
+*E*D/CAB
Binary Tree Traversals (3/9)
 Inorder traversal (LVR) (recursive version)
output: A / B * C * D + E
(Figure: the recursion descends left (L), visits the node (V), then descends right (R))
Binary Tree Traversals (4/9)
 Preorder traversal (VLR) (recursive version)
output: + * * / A B C D E
Binary Tree Traversals (5/9)
 Postorder traversal (LRV) (recursive version)
output: A B / C * D * E +
(Code sketches of the three recursive traversals follow below)
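The slides show the three routines only as figures. A minimal sketch, with the linked representation used in this chapter spelled out (the struct name tnode and the int data field printed with %d are assumptions):
typedef struct tnode *tree_pointer;
typedef struct tnode {
    tree_pointer left_child;
    int data;
    tree_pointer right_child;
} tnode;

void inorder(tree_pointer ptr) {
    /* inorder tree traversal (LVR) */
    if (ptr) {
        inorder(ptr->left_child);
        printf("%d ", ptr->data);
        inorder(ptr->right_child);
    }
}

void preorder(tree_pointer ptr) {
    /* preorder tree traversal (VLR) */
    if (ptr) {
        printf("%d ", ptr->data);
        preorder(ptr->left_child);
        preorder(ptr->right_child);
    }
}

void postorder(tree_pointer ptr) {
    /* postorder tree traversal (LRV) */
    if (ptr) {
        postorder(ptr->left_child);
        postorder(ptr->right_child);
        printf("%d ", ptr->data);
    }
}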
Binary Tree Traversals (6/9)
 Iterative inorder traversal
 we use a stack to simulate recursion
5 4 11
8 3 14
2 17
1
A B
/ *C D
* E
+

L
V

output A / B*C * D + E node


:
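A sketch of the iterative version (the book's inorder2), here using a local array as the stack; MAX_STACK_SIZE is an assumed capacity and no overflow check is shown:
#define MAX_STACK_SIZE 100   /* assumed capacity */

void iter_inorder(tree_pointer node) {
    /* iterative inorder traversal, using an explicit stack */
    tree_pointer stack[MAX_STACK_SIZE];
    int top = -1;
    for (;;) {
        for (; node; node = node->left_child)
            stack[++top] = node;        /* push: keep moving left */
        if (top == -1) break;           /* stack empty: done */
        node = stack[top--];            /* pop */
        printf("%d ", node->data);      /* visit */
        node = node->right_child;       /* then traverse the right subtree */
    }
}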
Binary Tree Traversals (7/9)
 Analysis of inorder2 (Non-recursive Inorder
traversal)
 Let n be the number of nodes in the tree
 Time complexity: O(n)
 Every node of the tree is placed on and removed
from the stack exactly once
 Space complexity: O(h), where h is the depth of the tree;
in the worst case (a skewed tree) h = n
Binary Tree Traversals (8/9)
 Level-order traversal
 method:
 We visit the root first, then the root’s left child, followed by the
root’s right child.
 We continue in this manner, visiting the nodes at each new
level from the leftmost node to the rightmost nodes
 This traversal requires a queue to implement
Binary Tree Traversals (9/9)
 Level-order traversal (using a FIFO queue)
output: + * E * D / C A B
(Figure: trace of the queue contents during the traversal; a code sketch follows below)
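A sketch of the queue-based routine, using a simple array as the FIFO queue (MAX_QUEUE_SIZE is an assumed capacity; the book instead uses addq/deleteq helpers):
#define MAX_QUEUE_SIZE 100   /* assumed capacity */

void level_order(tree_pointer ptr) {
    /* level-order traversal, using an explicit FIFO queue */
    tree_pointer queue[MAX_QUEUE_SIZE];
    int front = 0, rear = 0;
    if (!ptr) return;
    queue[rear++] = ptr;                  /* enqueue the root */
    while (front < rear) {
        ptr = queue[front++];             /* dequeue */
        printf("%d ", ptr->data);         /* visit */
        if (ptr->left_child)
            queue[rear++] = ptr->left_child;
        if (ptr->right_child)
            queue[rear++] = ptr->right_child;
    }
}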
Additional Binary Tree Operations (1/7)
 Copying Binary Trees
 we can modify the postorder traversal algorithm only
slightly to copy the binary tree (the structure parallels Program 5.3)
Additional Binary Tree Operations (2/7)
 Testing Equality
 Binary trees are equivalent if they have the same
topology and the information in corresponding nodes
is identical (checked with a V-L-R traversal, as in Program 5.6)
Additional Binary Tree Operations (3/7)
 Variables: x1, x2, …, xn can hold only one of two
possible values, true or false
 Operators: ∧ (and), ∨ (or), ¬ (not)
 Propositional Calculus Expression
 A variable is an expression
 If x and y are expressions, then ¬x, x∧y, x∨y are
expressions
 Parentheses can be used to alter the normal order of
evaluation (¬ > ∧ > ∨)
 Example: x1 ∨ (x2 ∧ ¬x3)
Additional Binary Tree Operations (4/7)
 Satisfiability problem:
 Is there an assignment to make an expression true?
 Solution for the Example x1  (x2  ¬x3) :
 If x1 and x3 are false and x2 is true
 false  (true  ¬false) = false  true = true
 For n value of an expression, there are 2n
possible combinations of true and false
Additional Binary Tree Operations (5/7)
(x1  ¬x2)  (¬ x1  x3)  ¬x3

postorder traversal 

data  

value   X3

X1   X3

X2 X1
Additional Binary Tree Operations (6/7)
 node structure
 For the purpose of our evaluation algorithm, we
assume each node has four fields:
left_child | data | value | right_child
 We define this node structure in C as shown in the sketch below.
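The slide's declaration was lost in extraction; a sketch consistent with the four fields above (the book stores an expression-type code in data and the evaluation result in value):
typedef enum { not, and, or, true, false } logical;

typedef struct enode *expr_pointer;
typedef struct enode {
    expr_pointer left_child;
    logical data;          /* operator or constant at this node */
    short int value;       /* result of evaluating the subtree */
    expr_pointer right_child;
} enode;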
Additional Binary Tree Operations (7/7)
 Satisfiability function
 To evaluate the tree, we simply modify the original
recursive postorder traversal (L, then R, then V):
evaluate both subtrees, then apply the node's operator
(Figure: trace of the evaluation on the sample tree; TRUE/FALSE values propagate from the leaves up to the root)
Threaded Binary Trees (1/10)
 Threads
 Do you find any drawback of the above tree?
 Too many null pointers in current representation of
binary trees
n: number of nodes
number of non-null links: n-1
total links: 2n
null links: 2n-(n-1) = n+1
 Solution: replace these null pointers with some useful
“threads”
Threaded Binary Trees (2/10)
 Rules for constructing the threads
 If ptr->left_child is null,
replace it with a pointer to the node that would be
visited before ptr in an inorder traversal
 If ptr->right_child is null,
replace it with a pointer to the node that would be
visited after ptr in an inorder traversal
Threaded Binary Trees (3/10)
 A Threaded Binary Tree
(Figure: the threaded binary tree for nodes A–I; t marks a thread, f marks a child pointer; the left thread of H and the right thread of G are dangling)
 inorder traversal: H D I B E A F C G
Threaded Binary Trees (4/10)
 Two additional fields in the node structure:
left_thread and right_thread
 If ptr->left_thread = TRUE,
then ptr->left_child contains a thread;
 otherwise it contains a pointer to the left child.
 Similarly for the right_thread
Threaded Binary Trees (5/10)
 If we don’t want the left pointer of H and the right
pointer of G to be dangling pointers, we may
create root node and assign them pointing to the
root node
Threaded Binary Trees (6/10)
 Inorder traversal of a threaded binary tree
 By using of threads we can perform an inorder
traversal without making use of a stack (simplifying
the task)
 Now, we can follow the thread of any node, ptr, to
the “next” node of inorder traversal
1. If ptr->right_thread = TRUE, the inorder successor
of ptr is ptr->right_child by definition of the threads
2. Otherwise we obtain the inorder successor of ptr by
following a path of left-child links from the right-child
of ptr until we reach a node with left_thread = TRUE
Threaded Binary Trees (7/10)
 Finding the inorder successor (next node) of a node
threaded_pointer insucc(threaded_pointer tree) {
    threaded_pointer temp;
    temp = tree->right_child;
    if (!tree->right_thread)
        while (!temp->left_thread)
            temp = temp->left_child;
    return temp;
}
Threaded Binary Trees (8/10)
 Inorder traversal of a threaded binary tree
void tinorder(threaded_pointer tree) {
    /* traverse the threaded binary tree inorder */
    threaded_pointer temp = tree;
    for (;;) {
        temp = insucc(temp);
        if (temp == tree)
            break;
        printf("%3c", temp->data);
    }
}
output: H D I B E A F C G
Time complexity: O(n)
Threaded Binary Trees (9/10)
 Inserting A Node Into A Threaded Binary Tree
 Insert child as the right child of node parent
1. change parent->right_thread to FALSE
2. set child->left_thread and child->right_thread to
TRUE
3. set child->left_child to point to parent
4. set child->right_child to parent->right_child
5. change parent->right_child to point to child
Threaded Binary Trees (10/10)
 Right insertion in a threaded binary tree
void insert_right(threaded_pointer parent, threaded_pointer child) {
    /* insert child as the right child of parent in a threaded binary tree */
    threaded_pointer temp;
    child->right_child = parent->right_child;
    child->right_thread = parent->right_thread;
    child->left_child = parent;
    child->left_thread = TRUE;
    parent->right_child = child;
    parent->right_thread = FALSE;
    if (!child->right_thread) {      /* second case: parent had a right subtree */
        temp = insucc(child);        /* inorder successor of child */
        temp->left_child = child;
    }
}
(Figure: the first case inserts child below a node with no right subtree; the second case splices child above an existing right subtree and redirects the successor's left thread to child)
Heaps (1/6)
 The heap abstract data type
 Definition: A max(min) tree is a tree in which the key
value in each node is no smaller (larger) than the key
values in its children. A max (min) heap is a complete
binary tree that is also a max (min) tree
 Basic Operations:
 creation of an empty heap
 insertion of a new element into a heap
 deletion of the largest element from the heap
Heaps (2/6)
 The examples of max heaps and min heaps
 Property: The root of max heap (min heap) contains
the largest (smallest) element
Heaps (3/6)
 Abstract data type of Max Heap
Heaps (4/6)
 Queue in Chapter 3: FIFO
 Priority queues
 Heaps are frequently used to implement priority queues
 delete the element with highest (lowest) priority
 insert the element with arbitrary priority
 Heaps are the most efficient general-purpose way to implement a priority queue
 Example applications: machine service ordered by amount of
time (min heap) or by amount of payment (max heap); factory
scheduling by time tag
Heaps (5/6)
 Insertion Into A Max Heap
 Analysis of insert_max_heap
 The complexity of the insertion function is O(log2 n)
(Figure: trace of inserting the keys 5, 2, and 21 into the max heap [20, 15, 2, 14, 10]; the new item starts at the next free leaf and repeatedly swaps with its parent while it is larger, i.e., it bubbles up; a code sketch follows below)
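A sketch of the insertion routine, assuming the array representation heap[1..n] with element records holding an int key, plus MAX_ELEMENTS and a HEAP_FULL macro as in the book:
#define MAX_ELEMENTS 200
#define HEAP_FULL(n)  ((n) == MAX_ELEMENTS - 1)
#define HEAP_EMPTY(n) (!(n))
element heap[MAX_ELEMENTS];

void insert_max_heap(element item, int *n) {
    /* insert item into a max heap of current size *n */
    int i;
    if (HEAP_FULL(*n)) {
        fprintf(stderr, "The heap is full.\n");
        exit(1);
    }
    i = ++(*n);                          /* start at the new leaf */
    while ((i != 1) && (item.key > heap[i/2].key)) {
        heap[i] = heap[i/2];             /* pull the smaller parent down */
        i /= 2;                          /* move up one level */
    }
    heap[i] = item;
}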
Heaps (6/6)
 Deletion from a max heap
 After deletion, the heap is still a complete binary tree
 Analysis of delete_max_heap
 The complexity of the deletion function is O(log2 n)
(Figure: trace of deleting the max element 20; the last element of the heap trickles down from the root, swapping with its larger child, until the heap property is restored; a code sketch follows below)
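A sketch of the deletion routine under the same assumptions (array heap[1..n], HEAP_EMPTY macro from the previous sketch):
element delete_max_heap(int *n) {
    /* delete and return the element with the largest key */
    int parent, child;
    element item, temp;
    if (HEAP_EMPTY(*n)) {
        fprintf(stderr, "The heap is empty\n");
        exit(1);
    }
    item = heap[1];                      /* save the max element */
    temp = heap[(*n)--];                 /* last element must be reinserted */
    parent = 1;
    child = 2;
    while (child <= *n) {
        /* pick the larger of the two children */
        if ((child < *n) && (heap[child].key < heap[child+1].key))
            child++;
        if (temp.key >= heap[child].key)
            break;                       /* temp may sit at parent */
        heap[parent] = heap[child];      /* move the child up */
        parent = child;
        child *= 2;
    }
    heap[parent] = temp;
    return item;
}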
Binary Search Trees (1/8)
 Why do we need binary search trees?
 A heap is not suited for applications in which arbitrary
elements are to be deleted from the element list
 deleting the min (max) element: O(log2 n)
 deletion of an arbitrary element: O(n)
 search for an arbitrary element: O(n)
 Definition of binary search tree:
 Every element has a unique key
 The keys in a nonempty left subtree (right subtree) are
smaller (larger) than the key in the root of subtree
 The left and right subtrees are also binary search trees
Binary Search Trees (2/8)
 Example: (b) and (c) are binary search trees
(Figure: sample trees; in a binary search tree the root holds the medium key, with smaller keys in the left subtree and larger keys in the right subtree)
Binary Search Trees (3/8)
 Search: search(25) fails, search(76) succeeds
(Figure: a BST with root 44; each comparison moves left or right,
e.g., search(76) follows the path 44, 88, 65, 82, 76)
Binary Search Trees (4/8)
 Searching a binary search tree takes O(h) time,
where h is the height of the tree (a code sketch follows below)
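The slide's program was lost in extraction; a sketch of both a recursive and an iterative search, assuming nodes that carry an int key and left_child/right_child pointers (the struct names are assumptions):
typedef struct bnode *bst_pointer;
typedef struct bnode {
    int key;
    bst_pointer left_child, right_child;
} bnode;

bst_pointer search(bst_pointer root, int key) {
    /* recursive search: return the node with the given key, or NULL */
    if (!root) return NULL;
    if (key == root->key) return root;
    if (key < root->key)
        return search(root->left_child, key);
    return search(root->right_child, key);
}

bst_pointer search2(bst_pointer tree, int key) {
    /* iterative version of the same search */
    while (tree) {
        if (key == tree->key) return tree;
        tree = (key < tree->key) ? tree->left_child : tree->right_child;
    }
    return NULL;
}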
Binary Search Trees (5/8)
 Inserting into a binary search tree
 First search for the key; if the search fails, the new node is
inserted at the point where the search terminated (an empty subtree)
Binary Search Trees (6/8)
 Deletion from a binary search tree
 Three cases should be considered
 case 1: a leaf → simply delete it
 case 2: one child → delete the node and redirect the pointer
from its parent to this child
 case 3: two children → replace the node with either the smallest
element in the right subtree or the largest element in the left
subtree, then delete that replacement element from its subtree
Binary Search Trees (7/8)
 Height of a binary search tree
 The height of a binary search tree with n elements
can become as large as n.
 It can be shown that when insertions and deletions
are made at random, the height of the binary search
tree is O(log2n) on the average.
 Search trees with a worst-case height of O(log2n) are
called balanced search trees
Binary Search Trees (8/8)
 Time Complexity
 Searching, insertion, removal
 O(h), where h is the height of the tree
 Worst case - skewed binary tree
 O(n), where n is the # of internal nodes
 Prevent worst case
 rebalancing scheme
 AVL, 2-3, and Red-black tree
Selection Trees (1/6)
 Problem:
 suppose we have k ordered sequences, called runs, that
are to be merged into a single ordered sequence
 Solution:
 straightforward: k-1 comparisons per record output
 selection tree: ⌈log2 k⌉ + 1 levels → O(log2 k) comparisons per record
 There are two kinds of selection trees:
winner trees and loser trees
Selection Trees (2/6)
 Definition: (Winner tree)
 a selection tree is the binary tree where each node
represents the smaller of its two children
 root node is the smallest node in the tree
 a winner is the record with the smaller key
 Rules:
 a tournament is played between sibling nodes
 the winner (in a winner tree) or the loser (in a loser tree)
is recorded in the parent node
Selection Trees (3/6)
 Winner Tree
 stored with the sequential allocation scheme of a complete binary tree
 each node represents the smaller of its two children; the leaves
hold the current first records of the k ordered runs
Selection Trees (4/6)
 Analysis of merging runs using winner trees
 # of levels: ⌈log2 K⌉ + 1 → restructure time: O(log2 K)
 setup time: O(K)
 merge time: O(n log2 K)
 Slight modification: tree of losers
 consider the parent node only (vs. sibling nodes)
Selection Trees (5/6)
 After one record has been output, the tree is restructured along
the path from that leaf to the root (e.g., the next record of that
run, with key 15, replaces the output record with key 6)
Selection Trees (6/6)
 A tree of losers can be constructed from the winner tree
(Figure: each internal node of the loser tree records the loser of the match played there; the overall winner, 6, is kept above the root)
Forests (1/4)
 Definition:
 A forest is a set of n  0 disjoint trees
 Transforming a forest into a binary tree
 Definition: If T1,…,Tn is a forest of trees, then the
binary tree corresponding to this forest, denoted by
B(T1,…,Tn ):
 is empty, if n = 0
 has root equal to root(T1); has left subtree equal to
B(T11,T12,…,T1m); and has right subtree equal to
B(T2,T3,…,Tn)
 where T11,T12,…,T1m are the subtrees of root (T1)
Forests (2/4)
 Rotate the tree clockwise by 45 degrees
(Figure: the forest with trees rooted at A, E, G becomes one binary tree: the leftmost child of a node becomes its left child, the next sibling (or next tree's root) becomes its right child; rotating the picture clockwise by 45 degrees gives the binary tree)
 Forest traversals
Forests (3/4)
 Forest preorder traversal
(1) If F is empty, then return.
(2) Visit the root of the first tree of F.
(3) Traverse the subtrees of the first tree in tree preorder.
(4) Traverse the remaining trees of F in preorder.
 Forest inorder traversal
(1) If F is empty, then return.
(2) Traverse the subtrees of the first tree in tree inorder.
(3) Visit the root of the first tree of F.
(4) Traverse the remaining trees of F in inorder.
 Forest postorder traversal
(1) If F is empty, then return.
(2) Traverse the subtrees of the first tree in tree postorder.
(3) Traverse the remaining trees of F in postorder.
(4) Visit the root of the first tree of F.
preorder: A B C D E F G H I
inorder: B C A E D G H F I
Forests (4/4)
(Figure: reconstructing the binary tree from the two sequences: A is the root; B, C form its left subtree and E, D, G, H, F, I its right subtree; the construction recurses on each part)
Set Representation(1/13)
 S1 = {0, 6, 7, 8}, S2 = {1, 4, 9}, S3 = {2, 3, 5}
(Figure: each set is a tree with links pointing from children to the root, e.g., S1 has root 0 with children 6, 7, 8)
 Si ∩ Sj = ∅ (the sets are pairwise disjoint)
 Two operations considered here
 Disjoint set union: S1 ∪ S2 = {0, 6, 7, 8, 1, 4, 9}
 Find(i): find the set containing the element i,
e.g., 3 ∈ S3, 8 ∈ S1
Set Representation(2/13)
 Union and Find Operations
 Union: make one of the trees a subtree of the other
(Figure: two possible representations of S1 ∪ S2: the root of S2 (4) becomes a child of the root of S1 (0), or vice versa)
Set Representation(3/13)
(Figure 5.41: data representation of S1, S2, and S3 (p.240); a table maps each set name to a pointer to the root of its tree)
Set Representation(4/13)
 Array Representation for Set
i        [0] [1] [2] [3] [4] [5] [6] [7] [8] [9]
parent   -4   4  -3   2  -3   2   0   0   0   4

int find1(int i) {
    for (; parent[i] >= 0; i = parent[i])
        ;
    return i;
}
void union1(int i, int j) {
    parent[i] = j;
}
Program 5.18: Initial attempt at union-find functions (p.241)
Set Representation(5/13)
 Worst-case sequence: union(0,1), find(0), union(1,2), find(0), …,
union(n-2, n-1), find(0) builds a degenerate (chain-like) tree
 the n-1 union operations take O(n) in total
 the n-1 find operations take O(Σ i) = O(n^2) in total
(Figure 5.43: degenerate tree (p.242))
Set Representation(6/13)
 Weighting rule for union(i, j): if the number of nodes in tree i is
less than the number in tree j, then make j the parent of i; otherwise make i the parent of j
Set Representation(7/13)
 Modified Union Operation
void union2(int i, int j)
{
    /* union the sets with roots i and j using the weighting rule;
       a count is kept in each root: parent[root] = -count */
    int temp = parent[i] + parent[j];   /* -(total number of nodes) */
    if (parent[i] > parent[j]) {        /* tree i has fewer nodes */
        parent[i] = j;                  /* make j the new root */
        parent[j] = temp;
    }
    else {                              /* tree j has fewer nodes */
        parent[j] = i;                  /* make i the new root */
        parent[i] = temp;
    }
}
Set Representation(8/13)
(Figure 5.45: trees achieving the worst-case bound (p.245); with the weighting
rule the height of a tree with n nodes is at most ⌊log2 n⌋ + 1, e.g., ⌊log2 8⌋ + 1 = 4)
 The definition of Ackermann’s function used
here is :
A(p, q) = 2^q                   if p = 0
A(p, q) = 0                     if q = 0 and p ≧ 1
A(p, q) = 2                     if p ≧ 1 and q = 1
A(p, q) = A(p-1, A(p, q-1))     if p ≧ 1 and q ≧ 2
Set Representation(10/13)
 Modified Find(i) Operation (collapsing rule)
int find2(int i) {
    /* find the root of i, then collapse the path: if j is a node on
       the path from i to its root, make j a child of the root */
    int root, trail, lead;
    for (root = i; parent[root] >= 0; root = parent[root])
        ;
    for (trail = i; trail != root; trail = lead) {
        lead = parent[trail];
        parent[trail] = root;
    }
    return root;
}
Set Representation(11/13)
(Figure: the effect of find2(7) on a degenerate branch: the nodes on the path are made children of the root 0; the first find(7) goes up 3 links and resets 2 of them, and each of the next seven find(7) operations goes up only 1 link — 12 moves in total, versus 24 moves without collapsing)
Set Representation(12/13)
 Applications
 Processing equivalence pairs i ≡ j
 Find Si and Sj such that i ∈ Si and j ∈ Sj
(two finds)
 Si = Sj → do nothing
 Si ≠ Sj → union(Si, Sj)
 example
0 ≡ 4, 3 ≡ 1, 6 ≡ 10, 8 ≡ 9, 7 ≡ 4, 6 ≡ 8,
3 ≡ 5, 2 ≡ 11, 11 ≡ 0
{0, 2, 4, 7, 11}, {1, 3, 5}, {6, 8, 9, 10}
Set Representation(13/13)
Counting Binary trees(1/10)
 Distinct Binary Trees :
 If n=0 or n=1, there is only one binary tree.
 If n=2 and n=3,
Counting Binary trees(2/10)
 Stack Permutations
preorder: A B C D E F G H I
inorder: B C A E D G H F I
(Figure: the unique binary tree determined by the two sequences; A is the root, B, C build its left subtree and D, E, F, G, H, I its right subtree, recursively)
Counting Binary trees(3/10)
 Figure 5.49(c) with the node numbering
of Figure 5.50: its preorder permutation
is 1, 2, …, 9, and its inorder
permutation is 2, 3, 1, 5, 4, 7, 8, 6, 9.
(Figure: the numbered binary tree)
Counting Binary trees(4/10)
 If we start with the numbers 1, 2, 3, then the
possible permutations obtainable by a stack are:
(1,2,3) (1,3,2) (2,1,3) (2,3,1) (3,2,1)
 Obtaining (3,1,2) is impossible.
(Figure: the five binary trees with three nodes, one for each obtainable permutation)
Counting Binary trees(5/10)
 Matrix Multiplication
 Suppose that we wish to compute the product of n matrices:
M1 * M2 . . .* Mn
 Since matrix multiplication is associative, we can perform
these multiplications in any order. We would like to know
how many different ways we can perform these
multiplications . For example, If n =3, there are two
possibilities:
(M1*M2)*M3
M1*(M2*M3)
Counting Binary trees(6/10)
 Let bn be the number of different ways to compute the
product of n matrices. Then b2 = 1, b3 = 2, and b4 = 5.
 Let Mij, i ≦ j, be the product Mi * Mi+1 * … * Mj.
The product we wish to compute is M1n, obtained by computing
any one of the products M1i * M(i+1)n, 1 ≦ i < n.
The numbers of distinct ways to obtain M1i and M(i+1)n
are bi and b(n-i), respectively. Therefore, letting b1 = 1, we have:
bn = Σ (i=1 to n-1) bi * b(n-i), for n > 1
Counting Binary trees (7/10)
 Now instead let bn be the number of distinct binary
trees with n nodes. Again an expression for bn in
terms of n is what we want. We see that bn is the
sum over all the possible binary trees formed in the
following way: a root and two subtrees with bi and b(n-i-1)
nodes, for 0 ≦ i < n. This gives the recurrence
bn = Σ (i=0 to n-1) bi * b(n-i-1), n ≧ 1, with b0 = 1     (5.5)
Counting Binary trees (8/10)
 Number of Distinct Binary Trees:
 To obtain the number of distinct binary trees with n
nodes, we must solve the recurrence of Eq. (5.5). To
begin we let
B(x) = Σ (i ≧ 0) bi x^i     (5.6)
which is the generating function for the number of
binary trees. Next observe that the recurrence
relation gives the identity
x B(x)^2 = B(x) - 1
 Using the formula for solving quadratics and the fact
(Eq. (5.5)) that B(0) = b0 = 1, we get
B(x) = (1 - sqrt(1 - 4x)) / (2x)
Counting Binary trees (9/10)
 Number of Distinct Binary Trees:
 We can use the binomial theorem to expand
(1 - 4x)^(1/2) to obtain:
B(x) = (1/(2x)) (1 - Σ (n ≧ 0) C(1/2, n) (-4x)^n)
     = Σ (m ≧ 0) C(1/2, m+1) (-1)^m 2^(2m+1) x^m     (5.7)
Counting Binary trees (10/10)
 Number of Distinct Binary Trees:
 Comparing Eqs. (5.6) and (5.7) we see that bn, the
coefficient of x^n in B(x), is
bn = C(1/2, n+1) (-1)^n 2^(2n+1)
 Some simplification yields the more compact form
bn = (1/(n+1)) C(2n, n)
which is approximately bn = O(4^n / n^(3/2))
Chapter 7 Sorting: Outline
 Introduction
 Searching and List Verification
 Definitions
 Insertion Sort
 Quick Sort
 Merge Sort
 Heap Sort
 Counting Sort
 Radix Sort
 Shell Sort
 Summary of Internal Sorting
Introduction (1/9)
 Why efficient sorting methods are so important ?
 The efficiency of a searching strategy depends
on the assumptions we make about the
arrangement of records in the list
 No single sorting technique is the “best” for all
initial orderings and sizes of the list being sorted.
 We examine several techniques, indicating when
one is superior to the others.
Introduction (2/9)
 Sequential search
 We search the list by examining the key
values list[0].key, … , list[n-1].key.
 Example: List has n records.
4, 15, 17, 26, 30, 46, 48, 56, 58, 82, 90, 95
 In that order, until the correct record is
located, or we have examined all the records
in the list
 Unsuccessful search: n+1  O(n)
 Average successful search n 1

 (i  1) / n  (n  1) / 2  O(n)
i 0
Introduction (3/9)
 Binary search
 Binary search assumes that the list is ordered on the key
field such that list[0].key  list[1]. key  …  list[n-1]. key.
 This search begins by comparing searchnum (the search
key) with list[middle].key, where middle = (n-1)/2
Example list: 4, 15, 17, 26, 30, 46, 48, 56, 58, 82, 90, 95
(Figure 7.1: decision tree for binary search (p.323); the root is list[5] = 46, its children are list[2] = 17 and list[8] = 58, and so on down to the leaves)
Introduction (4/9)
 Binary search (cont’d)
 Analysis of binsearch: makes no more than O(log
n) comparisons
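A sketch of binsearch, assuming records with an int key field and the COMPARE macro (returning -1, 0, 1) used elsewhere in the book:
int binsearch(element list[], int searchnum, int n) {
    /* search list[0].key <= ... <= list[n-1].key for searchnum;
       return its index if found, -1 otherwise */
    int left = 0, right = n - 1, middle;
    while (left <= right) {
        middle = (left + right) / 2;
        switch (COMPARE(list[middle].key, searchnum)) {
        case -1: left = middle + 1;  break;   /* look in the right half */
        case 0:  return middle;               /* found */
        case 1:  right = middle - 1;          /* look in the left half */
        }
    }
    return -1;
}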
Introduction (5/9)
 List Verification
 Compare lists to verify that they are identical or
identify the discrepancies.
 Example
 Internal Revenue Service (IRS)
(e.g., employee records vs. employer records)
 Reports three types of errors:
 all records found in list1 but not in list2
 all records found in list2 but not in list1
 all records that are in list1 and list2 with the same key
but have different values for different fields
Introduction (6/9)
 Verifying using a sequential search (verify1)
 Check whether the elements in list1 are also in list2,
marking the matches as we go
 The elements of list2 that were never matched are then
reported as being in list2 but not in list1
Introduction (7/9)
 Fast verification of two lists (verify2): sort both lists first,
then scan them in parallel
 if the current element of list1 is smaller, it is not in list2
 if the two current elements match, compare their remaining
fields and advance both lists
 if the current element of list2 is smaller, it is not in list1
 once one list is exhausted, the remaining elements of the
other list are not members of it
Introduction (8/9)
 Complexities
 Assume the two lists are randomly arranged
 verify1: O(mn)
 verify2: sorts them before verification
O(tsort(n) + tsort(m) + m + n) = O(max[n log n, m log m])
 tsort(n): the time needed to sort the n records in list1
 tsort(m): the time needed to sort the m records in list2
 we will show it is possible to sort n records in O(n log n) time
 Definition
 Given (R0, R1, …, Rn-1), each Ri with a key value Ki,
find a permutation σ such that Kσ(i-1) ≦ Kσ(i), 0 < i ≦ n-1
 σ denotes a unique permutation
 Sorted: Kσ(i-1) ≦ Kσ(i), 0 < i < n-1
 Stable: if i < j and Ki = Kj, then Ri precedes Rj in the sorted list
Introduction (9/9)
 Two important applications of sorting:
 An aid to search
 Matching entries in lists
 Internal sort
 The list is small enough to sort entirely in main
memory
 External sort
 There is too much information to fit into main
memory
Insertion Sort (1/3)
 Concept:
 The basic step in this method is to insert a record R
into a sequence of ordered records, R1, R2, …, Ri (K1
 K2 , …,  Ki) in such a way that the resulting
sequence of size i is also ordered
 Variation
 Binary insertion sort
 reduce search time
 List insertion sort
 reduce insert time
Insertion Sort (2/3)
 Insertion sort program
(Figure: trace of insertion sort on the list 5, 2, 3, 1, 4; after the pass with index j, the first j records are in order; a code sketch follows below)
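A sketch of the routine, following the book's convention of sorting list[1..n] with a sentinel copied into list[0] (so the inner loop needs no bounds test):
void insertion_sort(element list[], int n) {
    /* sort list[1..n] into nondecreasing order of key */
    int i, j;
    element next;
    for (j = 2; j <= n; j++) {
        next = list[j];
        list[0] = next;                  /* sentinel stops the inner loop */
        for (i = j - 1; next.key < list[i].key; i--)
            list[i+1] = list[i];         /* shift larger records right */
        list[i+1] = next;
    }
}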
Insertion Sort (3/3)
 Analysis of insertion sort:
 A record Ri is left out of order (LOO) iff Ri < max{Rj : 0 ≦ j < i}
 If k is the number of LOO records, then the computing
time is O((k+1)n)
 The worst-case time is O(Σ i) = O(n^2)
 The average time is O(n^2)
 The best time is O(n) (an already sorted list has no LOO records)
Quick Sort (1/6)
 The quick sort scheme developed by C. A. R.
Hoare has the best average behavior among all
the sorting methods we shall be studying
 Given (R0, R1, …, Rn-1), let Ki denote the pivot key
 If Ki is placed in position s(i),
then Kj ≦ Ks(i) for j < s(i), and Kj ≧ Ks(i) for j > s(i).
 After a positioning has been made, the original
file is partitioned into two subfiles, {R0, …, Rs(i)-1},
Rs(i), {Rs(i)+1, …, Rn-1}, and they will be sorted
independently
Quick Sort (2/6)
 Quick Sort Concept
 select a pivot key
 interchange the elements to their correct positions
according to the pivot
 the original file is partitioned into two subfiles and they
will be sorted independently:
R0  R1  R2  R3  R4  R5  R6  R7  R8  R9
26   5  37   1  61  11  59  15  48  19
11   5  19   1  15  26  59  61  48  37
 1   5  11  19  15  26  59  61  48  37
 1   5  11  19  15  26  59  61  48  37
 1   5  11  15  19  26  59  61  48  37
 1   5  11  15  19  26  48  37  59  61
 1   5  11  15  19  26  37  48  59  61
 1   5  11  15  19  26  37  48  59  61
In-Place Partitioning Example
a:  6  2  8  5  11  10  4  1  9  7  3
a:  6  2  3  5  11  10  4  1  9  7  8
a:  6  2  3  5   1  10  4 11  9  7  8
a:  6  2  3  5   1   4 10 11  9  7  8
 When bigElement is no longer to the left of smallElement,
terminate the process and swap the pivot with smallElement:
a:  4  2  3  5   1   6 10 11  9  7  8
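A sketch of the quicksort routine in the book's style, assuming element records with an int key and a SWAP macro; the first key of the sublist is the pivot, and (as the book notes for this scheme) a sentinel record beyond the right end with a key ≧ all keys is assumed:
void quicksort(element list[], int left, int right) {
    /* sort list[left..right] into nondecreasing order of key;
       list[left].key is the pivot */
    int pivot, i, j;
    element temp;
    if (left < right) {
        i = left;
        j = right + 1;
        pivot = list[left].key;
        do {                                /* grow the two partitions inward */
            do i++; while (list[i].key < pivot);
            do j--; while (list[j].key > pivot);
            if (i < j) SWAP(list[i], list[j], temp);
        } while (i < j);
        SWAP(list[left], list[j], temp);    /* put the pivot in place */
        quicksort(list, left, j - 1);
        quicksort(list, j + 1, right);
    }
}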
Quick Sort (4/6)
 Analysis of Quick Sort
 Assume that each time a record is positioned, the list is
divided into two parts of roughly equal size.
 Positioning a list with n elements needs O(n)
 T(n) is the time taken to sort n elements
 T(n) ≦ cn + 2T(n/2), for some constant c
≦ cn + 2(cn/2 + 2T(n/4))
...
≦ cn log2 n + nT(1) = O(n log n)
 Time complexity
 Average case and best case: O(nlogn)
 Worst case: O(n2)
 Best internal sorting method considering the average case
 Unstable
Quick Sort (5/6)
 Lemma 7.1:
 Let Tavg(n) be the expected time for quicksort to sort a
file with n records. Then there exists a constant k such
that Tavg(n) ≦ k·n·loge(n) for n ≧ 2
 Space for Quick Sort
 If the smaller of the two subarrays is always sorted first,
the maximum stack space is O(logn)
 Stack space complexity:
 Average case and best case: O(logn)
 Worst case: O(n)
Quick Sort (6/6)
 Quick Sort Variations
 Quick sort using a median of three: Pick the median
of the first, middle, and last keys in the current sublist
as the pivot. Thus, pivot = median{Kl, K(l+r)/2, Kr}.
Median of Three Partitioning Example
a:  6  2  8  5  11  10  4  1  9  7   3     (median of 6, 10, 3 is 6)
a:  3  2  8  5  11  10  4  1  9  7   6
a:  3  2  8  5  11   6  4  1  9  7  10
a:  3  2  8  5  11   7  4  1  9  6  10
a:  3  2  1  5  11   7  4  8  9  6  10
a:  3  2  1  5   4   7 11  8  9  6  10
a:  3  2  1  5   4   6 11  8  9  7  10
Merge Sort (1/7)
 Before looking at the merge sort algorithm to sort
n records, let us see how one may merge two
sorted lists to get a single sorted list.
 Merging
 Uses O(n) additional space.
 It merges the sorted lists
(list[i], … , list[m]) and (list[m+1], …, list[n]),
into a single sorted list, (sorted[i], … , sorted[n]).
 Copy sorted[1..n] to list[1..n]
 Merge (using O(n) space)
Merge Sort (3/7 – 7/7)
 Recursive merge sort concept
(Figures: the list is split recursively into halves until sublists of size one remain; the sorted halves are then merged pairwise on the way back up. A code sketch follows below.)
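A sketch of the two routines. The merge follows the book's idea of copying into an auxiliary array sorted[]; the recursion is shown as a simple top-down merge_sort over list[left..right] rather than the book's pass-based version:
void merge(element list[], element sorted[], int i, int m, int n) {
    /* merge the sorted runs list[i..m] and list[m+1..n] into sorted[i..n] */
    int j = m + 1, k = i;
    while (i <= m && j <= n)
        sorted[k++] = (list[i].key <= list[j].key) ? list[i++] : list[j++];
    while (i <= m) sorted[k++] = list[i++];   /* copy any leftover run */
    while (j <= n) sorted[k++] = list[j++];
}

void merge_sort(element list[], element sorted[], int left, int right) {
    /* top-down merge sort of list[left..right], using sorted[] as scratch */
    int mid, t;
    if (left >= right) return;
    mid = (left + right) / 2;
    merge_sort(list, sorted, left, mid);
    merge_sort(list, sorted, mid + 1, right);
    merge(list, sorted, left, mid, right);
    for (t = left; t <= right; t++)           /* copy back for the next level */
        list[t] = sorted[t];
}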
Heap Sort (1/3)
 The challenges of merge sort
 The merge sort requires additional storage
proportional to the number of records in the file
being sorted.
 Heap sort
 Require only a fixed amount of additional storage
 Slightly slower than merge sort using O(n)
additional space
 The worst case and average computing time is
O(n log n), same as merge sort
 Unstable
Heap Sort (2/3)
 adjust
 adjust the binary tree rooted at root to establish the heap:
the root key trickles down, at each step swapping with the
larger child, until the max-heap property is restored
(Figure: trace of adjust with root = 1, n = 10 on the tree [26, 5, 77, 1, 61, 11, 59, 15, 48, 19]; the root key 26 sinks while 77 and 59 move up; a code sketch follows after the next slide)
Heap Sort (3/3)
 heapsort
 Phase 1 (bottom-up): turn list[1..n] into a max heap by
calling adjust on the interior nodes i = n/2, …, 1
 Phase 2 (top-down): for i = n-1, …, 1, swap list[1] (the current
maximum) with list[i+1] and adjust the reduced heap of size i;
this leaves the array in ascending order
(Figure: trace of heapsort on the 10-element example above; phase 1 builds the max heap bottom-up, and phase 2 repeatedly swaps the root with the last element of the shrinking heap and re-adjusts; a code sketch follows below)
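A sketch of the two routines, following the book's Programs 7.13/7.14 (array list[1..n], element records with an int key, SWAP macro):
void adjust(element list[], int root, int n) {
    /* adjust the binary tree rooted at root to satisfy the max-heap
       property; the left and right subtrees of root already do */
    int child, rootkey;
    element temp;
    temp = list[root];
    rootkey = list[root].key;
    child = 2 * root;                    /* left child of root */
    while (child <= n) {
        if (child < n && list[child].key < list[child+1].key)
            child++;                     /* take the larger child */
        if (rootkey > list[child].key)
            break;                       /* heap property holds */
        list[child/2] = list[child];     /* move the child up */
        child *= 2;
    }
    list[child/2] = temp;
}

void heapsort(element list[], int n) {
    /* sort list[1..n] into ascending order */
    int i;
    element temp;
    for (i = n/2; i > 0; i--)            /* phase 1: build the max heap */
        adjust(list, i, n);
    for (i = n-1; i > 0; i--) {          /* phase 2: repeatedly remove the max */
        SWAP(list[1], list[i+1], temp);
        adjust(list, 1, i);
    }
}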
Counting Sort
 For key values within a small range
 1. scan list[1..n] to count the frequency of
every key value
 2. prefix-sum the counts to find the proper index range
for each value x
 3. scan list[1..n] and place each record into sorted[]
 4. copy sorted back to list
 O(n) for time and space (a sketch follows below)
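A minimal sketch under the stated assumptions; RANGE is an illustrative upper bound on the integer key values, not something fixed by the text:
#define RANGE 100                 /* assumed: keys lie in 0..RANGE-1 */

void counting_sort(element list[], element sorted[], int n) {
    /* stable counting sort of list[1..n] on int keys in 0..RANGE-1 */
    int count[RANGE + 1] = {0};
    int i;
    for (i = 1; i <= n; i++)
        count[list[i].key + 1]++;          /* step 1: frequencies */
    for (i = 1; i <= RANGE; i++)
        count[i] += count[i-1];            /* step 2: starting positions */
    for (i = 1; i <= n; i++)
        sorted[1 + count[list[i].key]++] = list[i];   /* step 3: place */
    for (i = 1; i <= n; i++)
        list[i] = sorted[i];               /* step 4: copy back */
}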
Radix Sort (1/5)
 We considers the problem of sorting records that
have several keys
 These keys are labeled K0 (most significant key), K1,
… , Kr-1 (least significant key).
 Let Ki^j denote key Kj of record Ri.
 A list of records R0, …, Rn-1 is lexically sorted with
respect to the keys K0, K1, …, K(r-1) iff
(Ki^0, Ki^1, …, Ki^(r-1)) ≦ (K(i+1)^0, K(i+1)^1, …, K(i+1)^(r-1)) for 0 ≦ i < n-1
Radix Sort (2/5)
 Example
 sorting a deck of cards on two keys, suit and face
value, in which the keys have the ordering relation:
K0 [Suit]: ♣ < ♦ < ♥ < ♠
K1 [Face value]: 2 < 3 < 4 < … < 10 < J < Q < K < A
 Thus, a sorted deck of cards has the ordering:
2♣, …, A♣, …, 2♠, …, A♠
 Two approaches to sort:
1. MSD (Most Significant Digit) first: sort on K0, then K1, ...
2. LSD (Least Significant Digit) first: sort on Kr-1, then Kr-2, ...
Radix Sort (3/5)
 MSD first
1. MSD sort first, e.g., bin sort into four suit bins: ♣ ♦ ♥ ♠
2. LSD sort second, within each suit
 Result: 2♣, …, A♣, …, 2♠, …, A♠
Radix Sort (4/5)
 LSD first
1. LSD sort first, e.g., face sort into
13 bins: 2, 3, 4, …, 10, J, Q, K, A
2. MSD sort second (may not even be needed: we can simply
deal the 13 piles into 4 suit piles, keeping them in order)
 Simpler than the MSD approach because we do not have to
sort the subpiles independently
Result: 2♣, …, A♣, …, 2♠, …, A♠
Radix Sort (5/5)
 We also can use an LSD or MSD sort when we
have only one logical key, if we interpret this key
as a composite of several keys.
 Example:
 integer: the digit in the far right position is the least
significant and the most significant for the far left
position
 range: 0  K  999 MSD LSD
0-9 0-9 0-9
 using LSD or MSD sort for three keys (K0, K1, K2)
 since an LSD sort does not require the maintainence
of independent subpiles, it is easier to implement
Shell Sort
 For (h = magic1; h > 0; h /= magic2)
Insertion sort elements with distance h
 Idea: let data has chance to “long jump”
 Insertion sort is very fast for partially
sorted array
 The problem is how to find good magic?
 Several sets have been discussed
 Remember 3n+1
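A sketch of the loop structure using the 3h+1 gap sequence mentioned above (this particular gap choice is one common option, not the only one):
void shell_sort(element list[], int n) {
    /* shell sort of list[1..n] using the gaps 1, 4, 13, 40, ... */
    int h, i, j;
    element next;
    for (h = 1; h <= n / 9; h = 3*h + 1)
        ;                                /* find the largest starting gap */
    for (; h > 0; h /= 3)                /* gaps shrink: ... 13, 4, 1 */
        for (i = h + 1; i <= n; i++) {   /* gapped insertion sort */
            next = list[i];
            for (j = i - h; j >= 1 && next.key < list[j].key; j -= h)
                list[j + h] = list[j];
            list[j + h] = next;
        }
}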
Summary of Internal Sorting (1/2)
 Insertion Sort
 Works well when the list is already partially ordered
 The best sorting method for small n
 Merge Sort
 The best/worst case (O(nlogn))
 Require more storage than a heap sort
 Slightly more overhead than quick sort
 Quick Sort
 The best average behavior
 The worst-case complexity is O(n^2)
 Radix Sort
 Depend on the size of the keys and the choice of the radix
Summary of Internal Sorting (2/2)
 Analysis of the average running times
(Figure: table comparing the average running times of the sorting methods; the table was on the slide)
CS235102
Data Structures
Chapter 8 Hashing
Chapter 8 Hashing: Outline
 The Symbol Table Abstract Data Type
 Static Hashing
 Hash Tables
 Hashing Functions
 Mid-square
 Division
 Folding
 Digit Analysis
 Overflow Handling
 Linear Open Addressing, Quadratic probing, Rehashing
 Chaining
The Symbol Table ADT (1/3)
 Many example of dictionaries are found in many
applications, Ex. spelling checker
 In computer science, we generally use the term
symbol table rather than dictionary, when
referring to the ADT.
 We define the symbol table as a set of name-
attribute pairs.
 Example: In a symbol table for a compiler
 the name is an identifier
 the attributes might include an initial value
 a list of lines that use the identifier.
The Symbol Table ADT (2/3)
 Operations on symbol table:
 Determine if a particular name is in the table
 Retrieve/modify the attributes of that name
 Insert/delete a name and its attributes
 Implementations
 Binary search tree: the complexity is O(n)
 Some other binary trees (chapter 10): O(log n).
 Hashing
 A technique for search, insert, and delete operations
that has very good expected performance.
The Symbol Table ADT (3/3)
Search Techniques
 Search tree methods
 Identifier comparisons
 Hashing methods
 Relies on a formula called the hash function.
 Types of hashing
 Static hashing
 Dynamic hashing
Hash Tables (1/6)
 In static hashing, we store the identifiers in a
fixed size table called a hash table
 Arithmetic function, f
 To determine the address of an identifier, x, in the
table
 f(x) gives the hash, or home address, of x in the table
 Hash table, ht
 Stored in sequential memory locations that are
partitioned into b buckets, ht[0], …, ht[b-1].
 Each bucket has s slots
Hash Tables (2/6)
(Figure: the hash table ht as a b × s array; f(x) maps an identifier into one of the b buckets 0 … b-1, and each bucket holds s slots)
Hash Tables (3/6)
 The identifier density of a hash table
is the ratio n/T
 n is the number of identifiers in the table
 T is possible identifiers
 The loading density or loading factor
of a hash table is  = n/(sb)
 s is the number of slots
 b is the number of buckets
Hash Tables (4/6)
 Two identifiers, i1 and i2 are synonyms with
respect to f if f(i1) = f(i2)
 We enter distinct synonyms into the same bucket as
long as the bucket has slots available
 An overflow occurs when we hash a new
identifier into a full bucket
 A collision occurs when we hash two
non-identical identifiers into the same bucket.
 When the bucket size is 1, collisions and
overflows occur simultaneously.
Hash Tables (5/6)
 Example 8.1: Hash table
 b = 26 buckets and s = 2 slots. Number of distinct identifiers n = 10
 The loading factor, α, is 10/52 = 0.19
 Associate the letters a–z with the numbers 0–25, respectively
 Define a fairly simple hash function, f(x), as the
first character of x
 C library functions and their hash values f(x):
acos(0), define(3), float(5), exp(4),
char(2), atan(0), ceil(2), floor(5),
clock(2), ctime(2)
 synonyms: {acos, atan}, {char, ceil, clock, ctime}, {float, floor}
 overflow: clock, ctime (bucket 2 has only two slots)
Hash Tables (6/6)
 The time required to enter, delete, or search for
identifiers does not depend on the number of
identifiers n in use; it is O(1).
 Hash function requirements:
 Easy to compute and produces few collisions.
 Unfortunately, since the ratio b/T is usually small, we
cannot avoid collisions altogether.
=> Overflow handling mechanisms are needed
Hashing Functions (1/8)
 A hash function, f, transforms an identifier, x,
into a bucket address in the hash table.
 We want a hash function that is easy to compute
and that minimizes the number of collisions.
 Hashing functions should be unbiased.
 That is, if we randomly choose an identifier, x, from
the identifier space, the probability that f(x) = i is 1/b
for all buckets i.
 We call a hash function that satisfies unbiased
property a uniform hash function.
Mid-square, Division, Folding, Digit Analysis
Hashing Functions (2/8)
 Mid-square fm(x)=middle(x2):
 Frequently used in symbol table applications.
 We compute fm by squaring the identifier and then
using an appropriate number of bits from the middle
of the square to obtain the bucket address.
 The number of bits used to obtain the bucket address
depends on the table size. If we use r bits, the range
of the value is 2r.
 Since the middle bits of the square usually depend
upon all the characters in an identifier, there is high
probability that different identifiers will produce
different hash addresses.
Hashing Functions (3/8)
 Division fD(x) = x % M :
 Using the modulus (%) operator.
 We divide the identifier x by some number M and use
the remainder as the hash address for x.
 This gives bucket addresses that range from 0 to M - 1,
where M = that table size.
 The choice of M is critical.
 If M is divisible by 2, then odd keys hash to odd buckets
and even keys to even buckets. (biased!)
Hashing Functions (4/8)
 The choice of M is critical (cont’d)
 When many identifiers are permutations of each other, a biased use
of the table results.
 Example: X = x1x2 and Y = x2x1
Internal binary representation: x1 --> C(x1) and x2 --> C(x2)
Each character is represented by six bits
X: C(x1) * 2^6 + C(x2), Y: C(x2) * 2^6 + C(x1)
(fD(X) - fD(Y)) % p (where p is a prime number)
= (C(x1) * 2^6 % p + C(x2) % p - C(x2) * 2^6 % p - C(x1) % p) % p
with p = 3 and 2^6 = 64:
(64 % 3 * C(x1) % 3 + C(x2) % 3 - 64 % 3 * C(x2) % 3 - C(x1) % 3) % 3
= (C(x1) % 3 + C(x2) % 3 - C(x2) % 3 - C(x1) % 3) % 3 = 0
The same behavior can be expected when p = 7
 A good choice for M would be: M a prime number such that M does
not divide r^k ± a for small k and a.
Hashing Functions (5/8)
 Folding
 Partition identifier x into several parts
 All parts except for the last one have the same length
 Add the parts together to obtain the hash address
 Two possibilities (divide x into several parts)
 Shift folding:
Shift all parts except for the last one, so that the least
significant bit of each part lines up with corresponding
bit of the last part.
 x1=123, x2=203, x3=241, x4=112, x5=20, address=699
 Folding at the boundaries:
reverses every other partition before adding
 x1=123, x2=302, x3=241, x4=211, x5=20, address=897
Hashing Functions (6/8)
 Folding example:
x partitioned into P1 = 123, P2 = 203, P3 = 241, P4 = 112, P5 = 20
 shift folding: 123 + 203 + 241 + 112 + 20 = 699
 folding at the boundaries: every other partition is reversed
(MSD ↔ LSD) before adding: 123 + 302 + 241 + 211 + 20 = 897
Hashing Functions (7/8)
 Digit Analysis
 Used with static files
 A static files is one in which all the identifiers are known in
advance.
 Using this method,
 First, transform the identifiers into numbers using some radix,
r.
 Second, examine the digits of each identifier, deleting those
digits that have the most skewed distribution.
 We continue deleting digits until the number of remaining
digits is small enough to give an address in the range of the
hash table.
Hashing Functions (8/8)
 Digital Analysis example:
 All the identifiers are known in advance,
M=1~999
X1:d11 d12 … d1n
X2:d21 d22 … d2n

Xm:dm1 dm2 … dmn
 Select 3 digits from n
 Criterion:
Delete the digits having the most skewed distributions
 The method most suitable for general purpose
applications is the division method with a divisor,
M, such that M has no prime factors less than 20.
Overflow Handling (1/8)
 Linear open addressing (Linear probing)
 Compute f(x) for identifier x
 Examine the buckets:
ht[(f(x)+j)%TABLE_SIZE], 0  j  TABLE_SIZE
 The bucket contains x.
 The bucket contains the empty string (insert to it)
 The bucket contains a nonempty string other than x
(examine the next bucket) (circular rotation)
 Return to the home bucket ht[f(x)],
if the table is full we report an error condition and exit
Overflow Handling (2/8)
 Example: an additive transformation of the key followed by division
(Figure: a hash table with linear probing, 13 buckets, 1 slot per bucket, after a sequence of insertions)
Overflow Handling (3/8)
 Problem of Linear Probing
 Identifiers tend to cluster together
 Adjacent clusters tend to coalesce
 This increases the search time
 Example: suppose we enter the C built-in functions into a
26-bucket hash table in order. The hash function uses the first
character in each function name
 Enter sequence:
acos, atoi, char, define, exp,
ceil, cos, float, atol, floor, ctime
 # of key comparisons = 35/11 = 3.18 on average
(Figure: the hash table with linear probing, 26 buckets, 1 slot per bucket, after the insertions)
Overflow Handling (4/8)
 Alternative techniques to improve open
addressing approach:
 Quadratic probing
 rehashing
 random probing
 Rehashing
 Try f1, f2, …, fm in sequence if collision occurs
 disadvantage
 comparison of identifiers with different hash values
 use chain to resolve collisions
Overflow Handling (5/8)
 Quadratic Probing
 Linear probing searches buckets (f(x)+i)%b
 Quadratic probing uses a quadratic function of i as
the increment
 Examine buckets f(x), (f(x)+i^2)%b, (f(x)-i^2)%b, for
1 ≦ i ≦ (b-1)/2
 When b is a prime number of the form 4j+3, j an integer,
the quadratic search examines every bucket in the table
Prime   j      Prime   j
3       0      43      10
7       1      59      14
11      2      127     31
19      4      251     62
23      5      503     125
31      7      1019    254
Overflow Handling (6/8)
 Chaining
 Linear probing and its variations perform poorly
because inserting an identifier requires the
comparison of identifiers with different hash values.
 In this approach we maintain a list of synonyms for
each bucket.
 To insert a new element
 Compute the hash address f(x)
 Examine the identifiers in the list for f(x).
 Since we would not know the sizes of the lists in
advance, we should maintain them as linked chains
(a sketch follows below)
 Results of Hash Chaining
acos, atoi, char, define, exp, ceil, cos, float, atol, floor, ctime
f (x)=first character of x

# of key comparisons=21/11=1.91
Overflow Handling (8/8)
 Comparison:
 In Figure 8.7, the values in each column give the average
number of bucket accesses made in searching eight
different tables with 33575, 24050, 4909, 3072, 2241,
930, 762, and 500 identifiers each.
 Chaining performs better than linear open addressing.
 We can see that division is generally superior to the other hash functions.
(Figure 8.7: average number of bucket accesses per identifier retrieved)
Dynamic Hashing
 Dynamic hashing using directories
 Analysis of directory dynamic hashing
 simulation
 Directoryless dynamic hashing
Dynamic Hashing Using Directories
(figure-only slides)
Program 8.5: Dynamic hashing
Analysis of Directory Dynamic Hashing
Directoryless Dynamic Hashing
(figure-only slides)
CS235102
Data Structures
Chapter 9 Heap Structures
 Min-Max Heap
 Deaps
 Leftist Trees
 Binomial Heaps
 Fibonacci Heaps
MIN-MAX Heaps (1/10)
 Definition
 A double-ended priority queue is a data structure that
supports the following operations:
 Insert an element with arbitrary key
 Delete an element with the largest key
 Delete an element with the smallest key
 Min heap or Max heap:
 Only insertion and one of the two deletion operations are
supported
 Min-Max heap:
 Supports all of the operations just described.
Min-Max Heaps (2/10)
 Definition:
 A min-max heap is a complete binary tree such that, if it is
not empty, each element has a field called key.
 Alternating levels of this tree are min levels and max
levels, respectively.
 Let x be any node in a min-max heap. If x is on a min
(max) level, then the element in x has the minimum
(maximum) key from among
all elements in
the subtree with
root x. We call
this node a min
(max) node.
Min-Max Heaps (3/10)
 Insertion into a min-max heap (new node lands on a "max" level)
 If the new key is smaller than its parent (a min node), it can
only violate min nodes above, so we simply bubble it up along the
min ancestors; if it is greater, we bubble it up along the max ancestors
 There exists a similar approach when the new node lands on a "min" level
Min-Max Heaps (4/10)
 verify_max
 Follow the max nodes from node i up to the root and insert
item into its proper place along that path
(Figure: trace of inserting item with key 80; the slide's code used
#define MAX_SIZE 100, #define FALSE 0, #define TRUE 1,
#define SWAP(x,y,t) ((t)=(x), (x)=(y), (y)=(t)), and
typedef struct { int key; /* other fields */ } element;)
MIN-MAX Heaps (5/10)
 min_max_insert: insert item into the min-max heap
 complexity: O(log n)
(Figure: trace of min_max_insert on the sample min-max heap; the new element moves up past its min- and max-level ancestors until both properties hold)
MIN-MAX Heaps (6/10)
 Deletion of min element
 If we wish to delete the element with the smallest key,
then this element is in the root.
 In the general situation, we are to reinsert an element item
into a min-max heap, heap, whose root is empty.
 We consider the two cases:
1. The root has no children
 Item is to be inserted into the root.
2. The root has at least one child.
 The smallest key in the min-max-heap is in one of the children
or grandchildren of the root. We determine the node k has the
smallest key.
 The following possibilities need to be considered:
MIN-MAX Heaps (7/10)
a) item.key  heap[k].key
 No element in heap with key smaller than item.key
 Item may be inserted into the root.
b) item.key  heap[k].key, k is a child of the root
 Since k is a max node, it has no descendants with key
larger than heap[k].key. Hence, node k has no
descendants with key larger than item.key.
 heap[k] may be
moved to the
root and item
inserted into
node k.
MIN-MAX Heaps (8/10)
c) item.key > heap[k].key, k is a grandchild of the root
 In this case, heap[k] may be moved to the root; node k is now regarded as empty.
 Let parent be the parent of k.
 If item.key > heap[parent].key, then interchange them. This ensures that the max node parent contains the largest key in the sub-heap with root parent.
 At this point, we are faced with the problem of inserting item into the sub-heap with root k. Therefore, we repeat the above process.
MIN-MAX Heaps (9/10)
 delete_min: delete the minimum element from the min-max heap; complexity: O(log n)
 (Figure: a trace of delete_min on a 12-element heap; the last element, x.key = 12, is reinserted while i and k move down the tree.)
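A hedged C sketch of delete_min covering the three cases above; min_child_grandchild is a helper of our own (returning 0 when node i is a leaf), not the book's exact program.

#include <limits.h>

/* index of the smallest key among the children (2i, 2i+1) and
   grandchildren (4i..4i+3) of node i, or 0 if i is a leaf */
int min_child_grandchild(element heap[], int i, int n)
{
  int cand[6] = {2*i, 2*i + 1, 4*i, 4*i + 1, 4*i + 2, 4*i + 3};
  int t, k = 0, min = INT_MAX;
  for (t = 0; t < 6; t++)
    if (cand[t] <= n && heap[cand[t]].key < min) {
      min = heap[cand[t]].key;
      k = cand[t];
    }
  return k;
}

element delete_min(element heap[], int *n)
{ /* assumes *n >= 1 */
  int i = 1, k, parent;
  element min = heap[1], x = heap[(*n)--], t;
  while (1) {
    k = min_child_grandchild(heap, i, *n);
    if (!k || x.key <= heap[k].key) break;  /* no children, or case (a) */
    heap[i] = heap[k];                      /* smallest key moves up */
    if (k <= 2*i + 1) { i = k; break; }     /* case (b): k is a child */
    parent = k / 2;                         /* case (c): k is a grandchild */
    if (x.key > heap[parent].key) {         /* keep the max node dominant */
      t = heap[parent]; heap[parent] = x; x = t;
    }
    i = k;                                  /* repeat in the subtree rooted at k */
  }
  heap[i] = x;
  return min;
}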
MIN-MAX Heaps (10/10)
 Deletion of max element
1. Determine the children of the root (they lie on the first max level) and find the larger one; it holds the largest key in the min-max heap
2. Consider that node as the root of a max-min heap
3. Delete from it with an approach symmetric to the deletion of the min element described above
Deaps(1/8)
 Definition
 The root contains no element
 The left subtree is a min-heap
 The right subtree is a max-heap
 Constraint between the two trees:
 let i be any node in the left subtree and j the corresponding node in the right subtree;
 if j does not exist, let j correspond to the parent of i;
 then i.key <= j.key
Deaps(2/8)
 For a node at position n:
 i = min_partner(n) = n − 2^(⌊log2 n⌋ − 1)
 j = max_partner(n) = n + 2^(⌊log2 n⌋ − 1)
 if j > heapsize, then j /= 2 (use the parent’s partner)
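A small C rendering of these partner formulas (a sketch assuming the 1-based array layout with the empty root at index 1 and n >= 2; the Java-style insert code below calls the analogous minPartner/maxPartner methods):

#include <math.h>

int min_partner(int n)
{ /* corresponding position in the min heap (left subtree) */
  return n - (1 << ((int)log2((double)n) - 1));
}

int max_partner(int n, int heapsize)
{ /* corresponding position in the max heap; if it does not
     exist yet, fall back to the partner of n's parent */
  int j = n + (1 << ((int)log2((double)n) - 1));
  if (j > heapsize) j /= 2;
  return j;
}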
Deaps Insert(3/8)
public void insert(int x) {
  int i;
  if (++n == 2) { deap[2] = x; return; }
  if (inMaxHeap(n)) {
    i = minPartner(n);
    if (x < deap[i]) {
      deap[n] = deap[i];
      minInsert(i, x);
    } else maxInsert(n, x);
  } else {
    i = maxPartner(n);
    if (x > deap[i]) {
      deap[n] = deap[i];
      maxInsert(i, x);
    } else minInsert(n, x);
  }
}
Deaps(4/8)
 Insertion Into A Deap
Deaps(5/8)
Deaps(6/8)
Deaps delete min(7/8)
public int deleteMin() {
  int i, j, key = deap[2], x = deap[n--];
  // move the smaller child up until a leaf position i is reached
  for (i = 2; 2*i <= n; deap[i] = deap[j], i = j) {
    j = i * 2;
    if (j + 1 <= n && deap[j] > deap[j+1]) j++;
  }
  // try to put x at leaf i; check against its max partner
  j = maxPartner(i);
  if (x > deap[j]) {
    deap[i] = deap[j];
    maxInsert(j, x);
  } else {
    minInsert(i, x);
  }
  return key;
}
Deaps(8/8)
Leftist Trees(1/7)
 Supports combine (melding two leftist trees into one)
Leftist Trees(2/7)
 shortest(x) = 0 if x is an external node; otherwise
 shortest(x) = 1 + min{shortest(left(x)), shortest(right(x))}
Leftist Trees(3/7)
 Definition: for every internal node x, shortest(left(x)) >= shortest(right(x))
Leftist Trees(4/7)
 Algorithm for combine(a, b) — see the C sketch below
 assume a.data <= b.data
 if a.right is null, then make b the right child of a
 else combine(a.right, b)
 if shortest(a.right) > shortest(a.left), then exchange the children of a
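A C sketch of combine for min-leftist trees whose nodes store the shortest value — a reconstruction of the algorithm above, not the book's exact program:

typedef struct lnode {
  int key, shortest;
  struct lnode *left, *right;
} lnode;

static int s(lnode *x) { return x ? x->shortest : 0; }

lnode *combine(lnode *a, lnode *b)
{
  lnode *t;
  if (!a) return b;
  if (!b) return a;
  if (a->key > b->key) { t = a; a = b; b = t; } /* keep the smaller root at a */
  a->right = combine(a->right, b);              /* meld along the right spine */
  if (s(a->left) < s(a->right)) {               /* restore the leftist property */
    t = a->left; a->left = a->right; a->right = t;
  }
  a->shortest = s(a->right) + 1;
  return a;
}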
Leftist Trees(5/7)
Leftist Trees(6/7)
Leftist Trees(7/7)
Binomial Heaps(1/10)
 Cost Amortization
 every operation on leftist trees costs O(log n)
 the actual cost of a delete in a binomial heap can be O(n), but insert and combine are O(1)
 cost amortization charges part of the cost of a heavy operation to the lightweight operations
 the amortized cost of a binomial heap delete is O(log n)
 a tighter bound can be achieved for a sequence of operations:
 the actual cost of any sequence of i inserts, c combines, and dm delete-mins in binomial heaps is O(i + c + dm·log i)
Binomial Heaps(2/10)
 Definition of Binomial Heap
 Node fields: degree, child, left_link, right_link, data, parent
 roots are linked in a doubly linked circular list
 the heap pointer a points to the root with the smallest key
Binomial Heaps(3/10)
Binomial Heaps(4/10)
 Insertion Into A Binomial Heap
 make a new node and insert it into the doubly linked circular list pointed at by a
 set a to the root with the smallest key
 Combine two B-heaps a and b (see the C sketch below)
 combine the two doubly linked circular lists into one
 set a to the root with the smallest key
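A C sketch of the O(1) combine: splice the two circular doubly linked root lists and keep a pointer to the smaller root. The node fields follow the slide above; a singleton ring must have both links pointing at the node itself.

typedef struct bnode {
  int data, degree;
  struct bnode *child, *left_link, *right_link, *parent;
} bnode;

bnode *combine(bnode *a, bnode *b)
{
  bnode *ar, *br;
  if (!a) return b;
  if (!b) return a;
  ar = a->right_link;
  br = b->right_link;
  a->right_link = br; br->left_link = a;  /* splice the two rings */
  b->right_link = ar; ar->left_link = b;
  return (a->data <= b->data) ? a : b;    /* pointer to the smallest root */
}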
Binomial Heaps(5/10)
 Deletion Of Min Element
Binomial Heaps(6/10)
Binomial Heaps(7/10)
Binomial Heaps(8/10)
Binomial Heaps(9/10)
Binomial Heaps(10/10)
 The trees in a binomial heap are binomial trees
 B0 has exactly one node
 Bk, k > 0, consists of a root with degree k whose subtrees are B0, B1, …, Bk-1
 Bk has exactly 2^k nodes
 the actual cost of a delete is O(log n + s)
 s = number of min-trees in a (the original roots − 1) and y (the children of the removed node)
Fibonacci Heaps(1/8)
 Definition
 delete: delete the element in a specified node
 decrease key
 These two operations are followed by a cascading cut
Fibonacci Heaps(2/8)
 Deletion From An F-heap
 two cases: the deleted node is the min (a delete-min) or an arbitrary node (not the min)
Fibonacci Heaps(3/8)
 Decrease Key
 if the node is not the min and the decreased key is smaller than its parent’s, then cut the node and move it to the ring of roots
Fibonacci Heaps(4/8)
 To prevent the amortized cost of delete min from becoming O(n), each node may lose at most one child before it is itself cut.
 If two children of x have been deleted, then x must be cut and moved to the ring of roots.
 A flag (true or false) indicates whether one of x’s children has been cut (see the cascading-cut sketch below)
Fibonacci Heaps(5/8)
 Cascading Cut
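A hedged C sketch of the cascading-cut rule, assuming nodes carry a parent pointer and the child_cut flag described above; add_to_root_list and remove_from_child_list are assumed helpers, and the routine is called on the parent of a node that was just cut:

typedef struct fnode {
  int key;
  int child_cut;                 /* TRUE if this node has already lost a child */
  struct fnode *parent;
  /* child and sibling links omitted in this sketch */
} fnode;

void add_to_root_list(fnode *x);        /* assumed helper */
void remove_from_child_list(fnode *x);  /* assumed helper */

void cascading_cut(fnode *x)
{
  fnode *p;
  while (x && x->parent) {
    if (!x->child_cut) {         /* first loss: mark x and stop */
      x->child_cut = 1;
      return;
    }
    p = x->parent;               /* second loss: cut x as well */
    remove_from_child_list(x);
    x->parent = NULL;
    x->child_cut = 0;
    add_to_root_list(x);
    x = p;                       /* continue checking upward */
  }
}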
Fibonacci Heaps(6/8)
 Lemma
 the ith child of any node x in an F-heap has degree at least i − 2, except that the 1st child may have degree 0
 Corollary
 Let Sk be the minimum possible number of descendants of a node of degree k; then S0 = 1 and S1 = 2. From the lemma above we get:
 Sk = Σ(i=0..k−2) Si + 2 (the 2 counts the 1st child and the root itself)
Fibonacci Heaps(7/8)
 The Fibonacci numbers satisfy Fk+2 = Σ(i=2..k) Fi + 2, so
 Sk = Fk+2
 That’s why the data structure is called a Fibonacci heap
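A one-line induction (our addition, in LaTeX notation) showing why this recurrence produces Fibonacci numbers: for k ≥ 2,

S_k = 2 + \sum_{i=0}^{k-2} S_i = \Bigl(2 + \sum_{i=0}^{k-3} S_i\Bigr) + S_{k-2} = S_{k-1} + S_{k-2},

and with S_0 = 1 = F_2 and S_1 = 2 = F_3 this is exactly the Fibonacci recurrence, hence S_k = F_{k+2}.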
Fibonacci Heaps(8/8)
 Application Of F-heaps
CS235102
Data Structures
Chapter 10 Search Structures
Search Structures: Outline
 Optimal Binary Search Trees
 AVL Trees
 2-3 Trees
 2-3-4 Trees
 Red Black Trees
 B-Trees
Optimal binary search trees (1/14)
 In this section we look at the construction of binary search trees for a static set of identifiers
 We make no additions to or deletions from the set
 We only perform searches
 We examine the correspondence between a binary search tree and the binary search function
Optimal binary search trees (2/14)
 Example: a binary search on the sorted list (do, if, while) is equivalent to using the function search2 on the corresponding binary search tree
Optimal binary search trees (3/14)
 For a given static list, we must decide on a cost measure for search trees in order to define an optimal binary search tree
 Assume that we wish to search for an identifier at level k of a binary search tree.
 Generally, the number of iterations of binary search equals the level number of the identifier we seek.
 It is reasonable to use the level number of a node as its cost.
Optimal binary search trees (4/14)
 A full binary tree may not be an optimal binary search tree if the identifiers are searched for with different frequencies
 Consider these two search trees. If we search for each identifier with equal probability:
 In the first tree, the average number of comparisons for a successful search is (1+2+2+3+4)/5 = 2.4
 For the second tree it is (1+2+2+3+3)/5 = 2.2
 The second tree has
 a better worst-case search time than the first tree
 a better average behavior
Optimal binary search trees (5/14)
 In evaluating binary search trees, it is useful to add a special square node at every place there is a null link.
 We call these nodes external nodes.
 We also refer to the external nodes as failure nodes.
 The remaining nodes are internal nodes.
 A binary tree with external nodes added is an extended binary tree
Optimal binary search trees (6/14)
 External / internal path length
 The sum of the levels of all external / internal nodes.
 For example:
 Internal path length, I: I = 0 + 1 + 1 + 2 + 3 = 7
 External path length, E: E = 2 + 2 + 4 + 4 + 3 + 2 = 17
 For a binary tree with n internal nodes, E and I are related by the formula E = I + 2n
Optimal binary search trees (7/14)
 The maximum and minimum possible values for I with n internal nodes
 Maximum:
 The worst case occurs when the tree is skewed, that is, the tree has a depth of n.
 Minimum:
 We must have as many internal nodes as close to the root as possible in order to obtain trees with minimal I
 One tree with minimal internal path length is the complete binary tree, in which the distance of node i from the root is ⌊log2 i⌋.
Optimal binary search trees (8/14)
 In the binary search tree:
 The identifiers a1, a2, …, an with a1 < a2 < … < an
 The probability of searching for each ai is pi
 The total cost (when only successful searches are made) is:
 Σ(1≤i≤n) pi · level(ai)
 If we replace every null subtree by a failure node, we may partition the identifiers that are not in the binary search tree into n+1 classes Ei, 0 ≤ i ≤ n
 Ei contains all identifiers x such that ai < x < ai+1
 For all identifiers in a particular class Ei, the search terminates at the same failure node
Optimal binary search trees (9/14)
 We number the failure nodes from 0 to n, with node i corresponding to class Ei, 0 ≤ i ≤ n.
 If qi is the probability that the identifier we are searching for is in Ei, then the cost of failure node i is:
 qi · (level(failure node i) − 1)
 Therefore, the total cost of a binary search tree is:
 Σ(1≤i≤n) pi · level(ai) + Σ(0≤i≤n) qi · (level(Ei) − 1)    (10.1)
 An optimal binary search tree for the identifier set a1, …, an is one that minimizes Eq. (10.1)
 Since all searches must terminate either successfully or unsuccessfully, we have
 Σ(1≤i≤n) pi + Σ(0≤i≤n) qi = 1
Optimal binary search trees (10/14)
 The possible binary search trees for the identifier set (a1, a2, a3) = (do, if, while)
 With equal probabilities, pi = qj = 1/7 for all i, j:
 cost(tree a) = 15/7; cost(tree b) = 13/7 (optimal); cost(tree c) = 15/7; cost(tree d) = 15/7; cost(tree e) = 15/7
 With p1 = 0.5, p2 = 0.1, p3 = 0.05, q0 = 0.15, q1 = 0.1, q2 = 0.05, q3 = 0.05:
 cost(tree a) = 2.65; cost(tree b) = 1.9; cost(tree c) = 1.5 (optimal); cost(tree d) = 2.05; cost(tree e) = 1.6
Optimal binary search trees (11/14)
 How do we determine the optimal binary search tree for a given set of identifiers?
 We can make some observations about the properties of optimal binary search trees
 Tij: an optimal binary search tree for ai+1, …, aj, i < j.
 Tii is an empty tree for 0 ≤ i ≤ n, and Tij is not defined for i > j.
 cij: the cost of the search tree Tij. By definition, cii is 0.
 rij: the root of Tij
 wij: the weight of Tij, wij = qi + Σ(k=i+1..j) (qk + pk)
 By definition, rii = 0 and wii = qi, 0 ≤ i ≤ n.
 T0n is an optimal binary search tree for a1, …, an. Its cost is c0n, its weight is w0n, and its root is r0n
Optimal binary search trees (12/14)
 If Tij is an optimal binary search tree for ai+1, …, aj and rij = k, then k satisfies the inequality i < k ≤ j.
 Tij has ak at its root and two subtrees L and R.
 L is the left subtree, containing the identifiers ai+1, …, ak-1
 R is the right subtree, containing the identifiers ak+1, …, aj
 The cost cij of Tij is (using wij = pk + wi,k-1 + wkj):
 cij = pk + cost(L) + cost(R) + weight(L) + weight(R)
 = pk + ci,k-1 + ckj + wi,k-1 + wkj
 = wij + ci,k-1 + ckj
 = wij + min(i<l≤j) {ci,l-1 + clj}
 This shows us how to obtain T0n and c0n, starting from the trivial trees Tii with cii = 0 and wii = qi
Optimal binary search trees (13/14)
 Example
 Let n = 4, (a1, a2, a3, a4) = (do, for, void, while). Let (p1, p2, p3, p4) = (3, 3, 1, 1) and (q0, q1, q2, q3, q4) = (2, 3, 1, 1, 1).
 Initially wii = qi, cii = 0, and rii = 0, 0 ≤ i ≤ 4
 w01 = p1 + w00 + w11 = p1 + q1 + w00 = 8
 c01 = w01 + min{c00 + c11} = 8, r01 = 1
 w12 = p2 + w11 + w22 = p2 + q2 + w11 = 7
 c12 = w12 + min{c11 + c22} = 7, r12 = 2
 w23 = p3 + w22 + w33 = p3 + q3 + w22 = 3
 c23 = w23 + min{c22 + c33} = 3, r23 = 3
 w34 = p4 + w33 + w44 = p4 + q4 + w33 = 3
 c34 = w34 + min{c33 + c44} = 3, r34 = 4
Optimal binary search trees (14/14)
 Recurrences used, with (a1, a2, a3, a4) = (do, for, void, while), (p1, p2, p3, p4) = (3, 3, 1, 1), (q0, q1, q2, q3, q4) = (2, 3, 1, 1, 1):
 wii = qi, cii = 0, rii = 0
 wij = pk + wi,k-1 + wkj
 cij = wij + min(i<l≤j) {ci,l-1 + clj}
 rij = the l that minimizes the expression above
 The computation is carried out row-wise from row 0 to row 4
 The resulting optimal search tree has for at the root, with do and void as its children and while as the right child of void
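The recurrences above translate into a short dynamic program. Below is a hedged C sketch over the example data; the book further restricts the root search to r(i, j-1) ≤ k ≤ r(i+1, j) to reach O(n^2), which this plain O(n^3) version omits.

#define N 4
int p[N + 1] = {0, 3, 3, 1, 1};  /* p[1..N]: successful-search weights */
int q[N + 1] = {2, 3, 1, 1, 1};  /* q[0..N]: failure-node weights */
int w[N + 1][N + 1], c[N + 1][N + 1], r[N + 1][N + 1];

void obst(void)
{
  int i, j, k, l;
  for (i = 0; i <= N; i++) {       /* base cases: empty trees Tii */
    w[i][i] = q[i]; c[i][i] = 0; r[i][i] = 0;
  }
  for (l = 1; l <= N; l++)         /* trees on l identifiers */
    for (i = 0; i + l <= N; i++) {
      int bestk, bestc;
      j = i + l;
      w[i][j] = w[i][j - 1] + p[j] + q[j];  /* wij = qi + sum(qk + pk) */
      bestk = i + 1;               /* try each root ak, i < k <= j */
      bestc = c[i][i] + c[i + 1][j];
      for (k = i + 2; k <= j; k++)
        if (c[i][k - 1] + c[k][j] < bestc) {
          bestc = c[i][k - 1] + c[k][j];
          bestk = k;
        }
      c[i][j] = w[i][j] + bestc;   /* cij = wij + min{ci,k-1 + ckj} */
      r[i][j] = bestk;
    }
}

For the data above this computes c[0][4] = 32 and r[0][4] = 2, i.e. the optimal tree is rooted at a2 = for, matching the table built row-wise on the slide.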
AVL Trees (1/17)
 We may also maintain dynamic tables as binary search trees.
 Figure 10.8 shows the binary search tree obtained by entering the months January to December, in that order, into an initially empty binary search tree
 The maximum number of comparisons needed to search for any identifier in the tree of Figure 10.8 is six (for November).
 The average number of comparisons is 42/12 = 3.5
AVL Trees (2/17)
 Suppose that we now enter the months into an initially empty tree in alphabetical order
 The tree degenerates into a chain
 number of comparisons: maximum: 12, average: 6.5
 in the worst case, binary search trees correspond to sequential searching in an ordered list
 Another insertion sequence
 Entering the months in the order Jul, Feb, May, Aug, Jan, Mar, Oct, Apr, Dec, Jun, Nov, Sep produces the tree of Figure 10.9.
 The tree is well balanced and does not have any paths to leaf nodes that are much longer than others.
 Number of comparisons: maximum: 4, average: 37/12 ≈ 3.1.
 All intermediate trees created during the construction of Figure 10.9 are also well balanced
 If all permutations are equally probable, then we can prove that the average search and insertion time is O(log n) for an n-node binary search tree
AVL Trees (4/17)
 Since we have a dynamic environment, it is hard to achieve:
 adding new elements while maintaining a complete binary tree without significantly increasing the running time
 Adelson-Velskii and Landis introduced a binary tree structure (AVL trees):
 balanced with respect to the heights of the subtrees
 We can perform dynamic retrievals in O(log n) time for a tree with n nodes.
 We can enter an element into the tree, or delete an element from it, in O(log n) time. The resulting tree remains height balanced.
 As with binary trees, we may define AVL trees recursively
AVL Trees (5/17)
 Definition:
 An empty binary tree is height balanced. If T is a nonempty binary tree with TL and TR as its left and right subtrees, then T is height balanced iff
 TL and TR are height balanced, and
 |hL - hR| ≤ 1, where hL and hR are the heights of TL and TR, respectively.
 The definition of a height-balanced binary tree requires that every subtree also be height balanced
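This recursive definition translates directly into a checker; a minimal sketch of our own (not from the book) that returns the height of a tree, or -1 on a balance violation:

typedef struct bt_node {
  struct bt_node *left, *right;
} bt_node;

int balanced_height(bt_node *t)
{
  int hl, hr;
  if (!t) return 0;                   /* an empty binary tree is height balanced */
  hl = balanced_height(t->left);
  hr = balanced_height(t->right);
  if (hl < 0 || hr < 0) return -1;    /* propagate a violation upward */
  if (hl - hr > 1 || hr - hl > 1) return -1;  /* |hL - hR| must be <= 1 */
  return 1 + (hl > hr ? hl : hr);
}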
AVL Trees (6/17)
 This time we will insert the months into the tree in the
order
 Mar, May, Nov, Aug, Apr, Jan, Dec, Jul, Feb, Jun, Oct, Sep
 It shows the tree as it grows, and the restructuring
involved in keeping it balanced.
 The numbers by each node represent the difference
in heights between the left and right subtrees of that
node
 We refer to this as the balance factor of the node
 Definition:
 The balance factor, BF(T), of a node, T, in a binary tree is
defined as hL - hR, where hL(hR) are the heights of the
left(right) subtrees of T.
For any node T in an AVL tree BF(T) = -1, 0, or 1.
AVL Trees (7/17)
 Insertion into an AVL tree
AVL Trees (8/17)
 Insertion into an AVL tree (cont’d)
 Insertion into an AVL tree (cont’d)
 Insertion into an AVL tree (cont’d)
AVL Trees (11/17)
 We carried out the rebalancing using four different
kinds of rotations:
LL, RR, LR, and RL
 LL and RR are symmetric as are LR and RL
 These rotations are characterized by the nearest ancestor, A, of the inserted node, Y, whose balance factor becomes ±2.
 LL: Y is inserted in the left subtree of the left subtree of A.
 LR: Y is inserted in the right subtree of the left subtree of A
 RR: Y is inserted in the right subtree of the right subtree of A
 RL: Y is inserted in the left subtree of the right subtree of A
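As an illustration, a hedged C sketch of the LL rotation for the insertion case (RR is its mirror image; LR and RL compose two single rotations); the node fields are our assumptions, not the book's declarations:

typedef struct avl_node {
  int key, bf;                   /* bf = height(left) - height(right) */
  struct avl_node *left, *right;
} avl_node;

avl_node *rotate_LL(avl_node *a)
{ /* a: the nearest ancestor of the inserted node Y with bf = +2 */
  avl_node *b = a->left;
  a->left = b->right;            /* b's right subtree moves under a */
  b->right = a;                  /* b becomes the root of this subtree */
  a->bf = b->bf = 0;             /* after an insertion both become balanced */
  return b;
}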
AVL Trees (12/17)
 Rebalancing rotations
AVL Trees (13/17)
 Rebalancing rotations
AVL Trees (14/17)
 Rebalancing rotations (cont’d)
AVL Trees (15/17)
 Rebalancing rotations (cont’d)
 Rebalancing rotations (cont’d)
AVL Trees (17/17)
 Complexity:
 In the case of binary search trees, if there were n nodes in the tree, then h (the height of the tree) could be n, and the worst-case insertion time would be O(n).
 In the case of AVL trees, since h is at most O(log n), the worst-case insertion time is O(log n).
 Figure 10.13 compares the worst-case times of certain operations
2-3 Trees
2-3-4 Trees
Red-black Trees
B-Trees
Splay Trees
Digital Trees
Tries