CSE 211: Data Structures Lecture Notes
Algorithm Analysis and Performance Prediction and Recursion

1. Algorithm Analysis and Performance Prediction
An algorithm is a finite sequence of instructions that the computer follows to solve a problem. Each instruction in an algorithm has a clear meaning and can be performed with a finite amount of effort in a finite length of time. When you are faced with a problem, the first thing to do is to find an algorithm that solves it. Once you have determined that the algorithm is correct, the next step is to find out the resources (time and space) the algorithm will require. This is known as algorithm analysis. If your algorithm requires more resources than your computer has (such as gigabytes of main memory), it is useless. Data structures and algorithms are interrelated and should be studied together, because algorithms are the methods used in systematic problem solving. Without methods for storing data in them, retrieving data from them, and performing computational operations on that data, data structures are meaningless. Thus, we have to study algorithms as well. The computation time and memory space required by data structures and the algorithms that operate on them are important.
Sample Values

N          1    2    4    8    32     256      2048
Constant   1    1    1    1    1      1        1
log(N)     0    1    2    3    5      8        11
N log(N)   0    2    8    24   160    2048     22528
N^2        1    4    16   64   1024   65536    4194304
Simple statements: We assume the statement does not contain a function call. It takes a fixed amount of time to execute. We denote the performance by O(1); if we factor out the constant execution time we are left with 1.

Sequence of simple statements: It takes an amount of execution time equal to the sum of the execution times of the individual statements. If the performance of each individual statement is O(1), so is their sum.

Decision: For estimating the performance, the then and else parts of the algorithm are considered independently. The performance estimate of the decision is taken to be the larger of the two individual big-Os. For the case structure, we take the largest big-O of all the case alternatives.

Simple counting loop: This is the type of loop in which the counter is incremented or decremented each time the loop is executed (a for loop). If the loop contains simple statements and the number of times the loop executes is a constant, in other words independent of the problem size, then the performance of the whole loop is O(1). On the other hand, if the loop is like

Ex: for (i = 0; i < N; i++)

the number of trips depends on N, the input size, so the performance is O(N).

Nested loops: The performance depends on the counters of each nested loop. For example:

Ex: for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            sequence of simple statements
        }
    }

The number of times the inner body executes is

∑_{i=0}^{N-1} ∑_{j=0}^{N-1} 1 = ∑_{i=0}^{N-1} N = N·N = N²
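As a quick check, a minimal counting sketch in C (the value of N below is just illustrative, not from the notes) confirms this total:

#include <stdio.h>

int main(void) {
    int N = 8;              /* illustrative problem size */
    long count = 0;         /* counts executions of the inner body */

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            count++;        /* stands in for the sequence of simple statements */
        }
    }

    /* For N = 8 this prints 64, which equals N*N. */
    printf("inner body executed %ld times, N*N = %d\n", count, N * N);
    return 0;
}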
The outer loop trip count is N, but the inner loop executes N times for each outer iteration. So the body of the inner loop will execute N*N times and the overall performance will be O(N²).

Ex: for (i = 1; i <= N; i++) {
        for (j = 0; j < i; j++) {
            sequence of simple statements
        }
    }

Here the number of times the inner body executes is

∑_{i=1}^{N} ∑_{j=0}^{i-1} 1 = ∑_{i=1}^{N} i = N(N+1)/2 = (N² + N)/2

In this case the outer trip count is N, but the trip count of the inner loop depends not only on N but on the value of the outer loop counter as well. If the outer counter is 1, the inner loop has a trip count of 1, and so on. If the outer counter is N, the inner loop trip count is N. How many times will the body be executed? 1 + 2 + 3 + ... + (N-1) + N = N(N+1)/2 = (N² + N)/2. Therefore the performance is O(N²). For large N the contribution of the N/2 term is negligible.

Generalization: A structure with k nested counting loops, where each counter is just incremented or decremented by one, has performance O(N^k) if the trip counts depend only on the problem size.
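The same kind of sketch (again with an illustrative N) verifies the triangular count:

#include <stdio.h>

int main(void) {
    int N = 8;              /* illustrative problem size */
    long count = 0;         /* executions of the inner body */

    for (int i = 1; i <= N; i++) {
        for (int j = 0; j < i; j++) {
            count++;
        }
    }

    /* For N = 8 this prints 36, which equals N*(N+1)/2. */
    printf("inner body executed %ld times, N*(N+1)/2 = %d\n",
           count, N * (N + 1) / 2);
    return 0;
}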
While loops: The control variable is multiplied or divided each time a loop iteration is performed. Each loop has an initialization step, a termination condition, and a modification step indicating how the control variable should be changed. In the while structure, the termination condition is checked before the iteration. Let's consider the following:

control = 1;
while (control < n) {
    simple statements;
    control = 2 * control;
}

In the above example the performance depends on the problem size N. The control variable is multiplied by 2 until it reaches or exceeds N. The initial value of control is 1, so after k iterations we will have control = 2^k. In order to find k we take the log of both sides: log2(control) = log2(2^k), so lg(control) = k (writing lg for log2). Since the loop stops when control >= N, the performance of the algorithm is O(lg(N)).

Generalization: Assume that we multiply the control variable by some other constant, say fact. Then after k iterations control = fact^k, and so the performance is O(log(N)) where the log is taken to base fact. In considering the performance the base does not matter, since converting from one base to another only introduces a constant factor.
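A small sketch (with an illustrative n, not taken from the notes) shows the iteration count growing like lg(n):

#include <stdio.h>

int main(void) {
    long n = 2048;              /* illustrative problem size */
    long control = 1;
    int iterations = 0;

    while (control < n) {
        control = 2 * control;  /* control doubles every pass */
        iterations++;
    }

    /* For n = 2048 this prints 11 iterations, i.e. log2(2048). */
    printf("n = %ld, iterations = %d\n", n, iterations);
    return 0;
}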
1.4 General
Quadratic algorithms are impractical for input sizes exceeding a few thousand. Cubic algorithms are impractical for input sizes exceeding a few hundred.
2) Binary Search: If the input array is sorted, then we have an alternative: binary search. Divide the list in half and look at the name right in the middle. If it is the one you are looking for, then exit; if the name you are looking for is smaller than the name in the middle, then search the first half; otherwise search the second half. After k comparisons the data remaining to be searched has size at most n/2^k. Hence in the worst case this method requires O(log(N)) comparisons. For large values of N, binary search outperforms sequential search. For instance, if N is 1000, then on average a successful sequential search requires about 500 comparisons. The average binary search will require about 8 iterations, in total 16 comparisons, for a successful search. For small N, say 6, binary search may not be worth using.
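A minimal iterative binary search in C might look like the sketch below (the names binary_search, a, n, and key are illustrative, not from the notes):

#include <stdio.h>

/* Returns the index of key in the sorted array a[0..n-1], or -1 if absent. */
int binary_search(const int a[], int n, int key) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   /* name right in the middle */
        if (a[mid] == key)
            return mid;                     /* found it, exit */
        else if (key < a[mid])
            high = mid - 1;                 /* search the first half */
        else
            low = mid + 1;                  /* search the second half */
    }
    return -1;                              /* not found */
}

int main(void) {
    int a[] = {2, 5, 8, 12, 16, 23, 38, 56, 72, 91};
    printf("index of 23: %d\n", binary_search(a, 10, 23));  /* prints 5 */
    return 0;
}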
3. Recursion
In this lecture we will discuss recursion and recursive algorithms. A recursive algorithm is an algorithm that is defined in terms of itself; in other words, it either directly or indirectly makes a call to itself. Recursion is a powerful problem-solving tool. Many interesting programming problems can be expressed easily using a recursive formulation, but we must be careful not to create circular logic that may result in infinite loops.

A function is said to be recursive if, in the course of its execution, the function makes a call to itself. This call may occur inside the function itself, in which case the function is directly recursive. In other cases a function may call another function, which in turn makes a call to the first one. This situation is known as indirect recursion. The objective of a recursive function is for the program to proceed through a sequence of calls until, at a certain point, the sequence terminates. If the function is improperly defined, the program might cycle through a never-ending sequence. To ensure that recursive functions are well behaved, you should observe the following guidelines:

1. Every time a recursive function is called, the program should first check whether some basic condition, such as a particular parameter being equal to zero, is satisfied. If this is the case, the function should stop recursing.

2. Each time the function is recursively called, one or more of the arguments passed to the function should be reduced in size in some way; that is, the parameters should move nearer to the basic condition. For example, a positive integer may become smaller on each recursion so that eventually it reaches zero.

Sometimes mathematical functions are defined recursively. Two classical examples are:

1. The sum of the first n integers, sum(n): we can write sum(1) = 1 and sum(n) = sum(n-1) + n. Here we have defined the function sum in terms of itself. Remember that the recursive definition of sum is identical to the closed form n(n+1)/2, but the recursive definition is only defined for positive integers.

unsigned long sum(int n) {
    if (n == 1)
        return 1;
    else
        return sum(n-1) + n;
}
2. The factorial of a positive integer n, n! = fact(n): we can write a formal definition as follows: fact(1) = 1 and fact(n) = fact(n-1) * n. Observe that a workable recursive algorithm must always reduce the size of the data set it is working with each time it is recursively called, and must always provide a terminating condition, such as the first line in our definition.

Recursive calculation of 4!:
4! = 4 * 3!
   = 4 * (3 * 2!)
   = 4 * (3 * (2 * 1!))
   = 4 * (3 * (2 * 1))
   = 4 * (3 * 2)
   = 4 * 6
   = 24
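Following the same pattern as sum above, a recursive factorial in C could be sketched as:

unsigned long fact(int n) {
    if (n == 1)
        return 1;              /* terminating condition */
    else
        return fact(n-1) * n;  /* reduce the problem size by one each call */
}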
Recursion vs Iteration
As a general rule, avoid recursion when an iterative solution is easy to find. Do not use recursion as a substitute for a simple loop. Too much recursion can be dangerous. Do not do redundant work recursively; the program will be incredibly inefficient. Recursion should be preferred when the underlying data structures in the problem are themselves recursive, such as trees.

Let's see this with an example. The use of recursion in sum is poor because a simple loop would do the same thing. Another problem is illustrated by an attempt to calculate the Fibonacci numbers recursively. Let's assume that we have written a recursive algorithm to calculate the Fibonacci numbers:

long fib(int n) {
    if (n <= 1)
        return 1;
    else
        return fib(n-1) + fib(n-2);
}

This routine works but has a serious problem: it performs quite badly (fib(40) takes 4 minutes to compute). The problem is redundant calculation. To compute fib(n) we recursively compute fib(n-1); when this call returns we compute fib(n-2). But we have already computed fib(n-2) in the process of computing fib(n-1), so the call to fib(n-2) is wasted; it is a redundant calculation. Note that the redundancy increases for each recursive call.
(Recursion tree for fib(5): F5 calls F4 and F3; F4 calls F3 and F2; F3 calls F2 and F1; the subtrees for F3, F2, F1, and F0 are recomputed repeatedly.)
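One way around the redundancy, sketched here rather than taken from the notes, is to compute the sequence iteratively so each value is calculated exactly once (the name fib_iter is illustrative):

long fib_iter(int n) {
    long prev = 1, cur = 1;      /* fib(0) and fib(1), matching the recursive version */
    for (int i = 2; i <= n; i++) {
        long next = prev + cur;  /* each Fibonacci number is computed once */
        prev = cur;
        cur = next;
    }
    return cur;
}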
Compound Interest Rule: Never duplicate work by solving the same instance of a problem in separate recursive calls.

Example: Tower of Hanoi

The Tower of Hanoi puzzle was invented by the French mathematician Edouard Lucas in 1883. We are given a tower of eight disks, initially stacked in increasing size on one of three pegs. The objective is to transfer the entire tower to one of the other pegs, moving only one disk at a time and never placing a larger disk onto a smaller one.
(Figure: the three pegs, labeled Src, Aux, and Dst.)
The puzzle is well known to students of Computer Science since it appears in virtually any introductory text on data structures or algorithms. Its solution touches on two important topics discussed later on: recursive functions and stacks, and recurrence relations. Assume there is a function Solve with four arguments: the number of disks and the three pegs (source, intermediary, and destination, in this order). Then the body of the function might look like:
Solve(N, Src, Aux, Dst)
    if N is 0, exit
    Solve(N-1, Src, Dst, Aux)
    Move from Src to Dst
    Solve(N-1, Aux, Src, Dst)
This actually serves as the definition of the function Solve. The function is recursive in that it calls itself repeatedly with decreasing values of N until a terminating condition (in our case N = 0) has been met. To me the sheer simplicity of the solution is breathtaking. For N = 3 it translates into:

1. Move from Src to Dst
2. Move from Src to Aux
3. Move from Dst to Aux
4. Move from Src to Dst
5. Move from Aux to Src
6. Move from Aux to Dst
7. Move from Src to Dst
Of course "Move" means moving the topmost disk. For N=4 we get the following sequence
1. Move from Src to Aux
2. Move from Src to Dst
3. Move from Aux to Dst
4. Move from Src to Aux
5. Move from Dst to Src
6. Move from Dst to Aux
7. Move from Src to Aux
8. Move from Src to Dst
9. Move from Aux to Dst
10. Move from Aux to Src
11. Move from Dst to Src
12. Move from Aux to Dst
13. Move from Src to Aux
14. Move from Src to Dst
15. Move from Aux to Dst

Source: http://www.cut-the-knot.org/recurrence/hanoi.shtml
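For completeness, a direct C translation of Solve might look like the following sketch (the lowercase name solve and the single-character peg labels are just illustrative choices):

#include <stdio.h>

/* Move n disks from peg src to peg dst, using aux as the intermediary. */
void solve(int n, char src, char aux, char dst) {
    if (n == 0)
        return;                          /* terminating condition */
    solve(n - 1, src, dst, aux);         /* move the top n-1 disks out of the way */
    printf("Move from %c to %c\n", src, dst);
    solve(n - 1, aux, src, dst);         /* move the n-1 disks onto the largest one */
}

int main(void) {
    solve(3, 'S', 'A', 'D');             /* prints the seven moves listed above for N = 3 */
    return 0;
}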