What Is CS 301?: CS 301: Why Data Structures ??
SECTION-A
Complexity Analysis (4 Hrs): Time and space complexity of algorithms, asymptotic analysis, big O and other notations, importance of efficient algorithms, program performance measurement, data structures and algorithms.
Linear Lists (8 Hrs): Abstract data type, sequential and linked representations, comparison of insertion, deletion and search operations for sequential and linked lists, list and chain classes, doubly linked lists, circular linked lists, applications of lists in bin sort, radix sort, sparse tables.
Stacks and Queues (4 Hrs): Abstract data types, sequential and linked implementations, representative applications such as parenthesis matching, towers of Hanoi.
Sorting (7 Hrs): Bubble sort, selection sort, insertion sort, Shell sort, Quick sort, Heap sort, Merge sort, Radix sort; analysis of the sorting methods; selecting the top k elements.

SECTION-B
Trees (7 Hrs): Binary trees and their properties, terminology, sequential and linked implementations, tree traversal methods and algorithms, Heap data structure and its applications as priority queues, heap implementation, insertion and deletion operations, Heap-sort.
Search & Multi-way Trees (7 Hrs): Binary search trees, search efficiency, insertion and deletion operations, importance of balancing, AVL trees, B-trees, B+ trees.
Graphs (5 Hrs): Definition, terminology, directed and undirected graphs, properties, connectivity in graphs, applications, implementation – adjacency matrix and linked adjacency chains, graph traversal – breadth first and depth first, spanning trees.
Hashing (3 Hrs): Hashing as a search structure, hash table, collision avoidance, linear open addressing, chaining.

Instructional Material
• Y. Langsam, M. J. Augenstein, A. M. Tanenbaum: Data Structures using C and C++, 2nd Edition, Pearson Education.
• R. Kruse, C. L. Tondo, B. Leung, S. Mogalla: Data Structures & Program Design in C, 2nd Edition, Pearson Education.
References:
• E. Horowitz, S. Sahni, D. Mehta: Fundamentals of Data Structures in C++, 2nd Edition, Universities Press.
• Donald E. Knuth: The Art of Computer Programming, Volume 1: Fundamental Algorithms, 3rd Edition, Addison-Wesley.
• Donald E. Knuth: The Art of Computer Programming, Volume 3: Sorting and Searching, 2nd Edition, Addison-Wesley.
Online Material
• Computer Science – Data Structures and Algorithm – IIT Delhi
• Algorithms and data structure – MIT, Prof Surjit Das
https://www.youtube.com/watch?v=HtSuA80QTyo&list=PLxZdKEtmy3GRhETjatYq9v3O8VVt3YrNb
• Algorithms and Data Structures, Princeton University,
https://www.youtube.com/watch?v=8mYfZeHtdNc&list=PLxc4gS-_A5VDXUIOPkJkwQKYiT2T1t0I8
What 301 is NOT about
• This course is not about C / C++
– Although we will use C / C++ to implement many of the concepts
• This course is not about MATH
– Although we will use math to formalize many of the concepts
• Competency in both math and C/C++ is therefore assumed.
– C / C++: inheritance, overloading, overriding, files, reference/primitive types, multi-dimensional arrays
– Math: polynomials, logarithms, inductive proofs, logic

Data Structures and Algorithms
• Algorithm: Outline, the essence of a computational procedure, step by step instructions.
• Program: an implementation of an algorithm in some programming language (PL)
• Data structure: Organization of data to solve the problem
[Diagram: Specification of input; specification of output as function of input]
• Infinite number of input instances satisfying the specifications.
• For e.g. a sorted non-decreasing sequence of natural numbers of non-zero finite length:
1. 20,901,978,985…….

[Diagram: Input instance adhering to specifications; Algorithm; Output related to input as required]
• Algorithm describes actions on the input instance
• Infinitely many correct algorithms for the same algorithmic problem.
What is a Good Algorithm?
• Efficient
– Running time
– Space used
• Efficiency as function of input size:
– The number of bits in a number.
– Number of data elements (numbers, points).

Measuring the Running Time
• How should we measure the running time of an algorithm?
• Experimental study
– Write a program that implements the algorithm
– Run the program with data sets of varying size and composition
– Use a method like currenttime() or time_t time(time_t *timer); to get an accurate measure of actual running time.
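The experimental approach can be sketched in C as below. This is a minimal illustration, not the course's own code: `sum_to` is a hypothetical stand-in for "the algorithm", and the sketch uses the standard `clock()` from `<time.h>` instead of the calls named on the slide, because `time()` typically has only one-second resolution.

```c
#include <time.h>

/* Hypothetical workload standing in for "the algorithm":
   sums the integers 0 .. n-1. */
long long sum_to(long long n)
{
    long long total = 0;
    for (long long i = 0; i < n; i++)
        total += i;
    return total;
}

/* Runs sum_to(n) once, stores its result in *result, and returns
   the CPU time used in seconds, measured with clock(). */
double time_sum_to(long long n, long long *result)
{
    clock_t start = clock();
    *result = sum_to(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_sum_to` for several values of n (for example 10^6, 10^7, 10^8) and comparing the returned times is one way to run the "data sets of varying size" experiment.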
Pseudo Code
• A mixture of natural language and high level programming concepts that describes the main ideas behind a generic implementation of a data structure or algorithm.
• More structured than usual prose but less formal than a programming language.
• E.g. Algorithm arrayMax(A, n):
    Input: An array A storing n integers
    Output: Maximum element in A
    currentMax <-- A[0]
    for i <-- 1 to n-1 do
        if currentMax < A[i] then currentMax <-- A[i]
    return currentMax

Pseudo Code
• More structured than usual prose but less formal than a programming language
• Expressions: <-- for assignment, standard mathematical symbols for mathematical/boolean operators
• Method declarations:
– Algorithm name(parameter1, parameter2)
• Programming constructs
– Decision structures: if..then..else..
– While loops: while..do..
– Array indexing like A[i]..A[ii].. etc
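For comparison, the arrayMax pseudo code translates almost line for line into C. The function name `array_max` and the precondition that n is at least 1 are assumptions of this sketch.

```c
/* Returns the maximum element of a[0 .. n-1].
   Assumes n >= 1, as the pseudo code does when it reads A[0]. */
int array_max(const int a[], int n)
{
    int current_max = a[0];
    for (int i = 1; i <= n - 1; i++) {
        if (current_max < a[i])
            current_max = a[i];
    }
    return current_max;
}
```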
Analysis of Algorithm
• Primitive Operation: Low level operation

Example: Sorting
• INPUT: Sequence of numbers a1, a2, a3, …, an (e.g. 2 5 4 10 7)
• OUTPUT: A permutation of the sequence of numbers b1, b2, b3, …, bn (e.g. 2 4 5 7 10)

Insertion Sort
• Array is divided: sorted / unsorted
• At each step an element from unsorted part is inserted in its place in sorted part
Insertion Sort
void insertionSort(int list[], int n)   /* n = number of elements in list */
{
    int i;     /* index for outer loop */
    int hold;  /* here we'll store the element to be inserted */
    int j;     /* index for inner "moving" loop */
    for (i = 1; i < n; i++)
    {
        hold = list[i];   /* element to be inserted */
        /* Move up 1 position all elements greater than 'hold' */
        for (j = i - 1; j >= 0 && hold < list[j]; j--)
        {
            list[j + 1] = list[j];
        }
        list[j + 1] = hold;   /* inserting the element into its right place */
    }
}
Best/Worst/Average Case
• Worst case is usually used: it is an upper bound, and in certain application domains (e.g. aircraft control, surgery) knowing the worst case is critically important.
• Worst case occurs fairly often.
• Average case is often as bad as worst case.
• Finding average case can be very difficult.

Asymptotic Analysis
• Goal: to simplify analysis of running time by getting rid of details, which may be affected by specific implementation and hardware
– Like rounding: 1,000,003 ≈ 1,000,000
– 3n^2 ≈ n^2
• Capturing the essence: how the running time of algorithm increases with the size of input in the limit
– Asymptotically, more efficient algorithms are best for all but small inputs
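To make the best/worst case distinction concrete, the sketch below instruments an insertion sort so that it counts element comparisons. The function name `insertion_sort_count` and the choice of "comparisons of hold against list[j]" as the primitive operation are assumptions made for illustration.

```c
/* Sorts list[0 .. n-1] in nondecreasing order by insertion sort and
   returns how many times 'hold' was compared with an array element. */
long insertion_sort_count(int list[], int n)
{
    long comparisons = 0;
    for (int i = 1; i < n; i++) {
        int hold = list[i];          /* element to be inserted */
        int j = i - 1;
        while (j >= 0) {
            comparisons++;           /* one comparison hold < list[j] */
            if (!(hold < list[j]))
                break;               /* insertion point found */
            list[j + 1] = list[j];   /* shift the larger element up */
            j--;
        }
        list[j + 1] = hold;
    }
    return comparisons;
}
```

On an already sorted array of n elements this performs n-1 comparisons (best case), while on a reverse-sorted array it performs n(n-1)/2 (worst case), which is the gap the asymptotic notation is meant to capture.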
Algorithm Efficiency
• Let's look at another algorithm for initializing the values in a different array:
    final int N = 500;
    int[][] counts = new int[N][N];
    for (int i = 0; i < counts.length; i++)
        for (int j = 0; j < counts[i].length; j++)
            counts[i][j] = 0;
• The length of time the algorithm takes to execute still depends on the value of N

Algorithm Efficiency
• However, in the second algorithm, we have two nested loops to process the elements in the two dimensional array
• Intuitively:
– If N is half its value, we would expect the algorithm to take one quarter the time
– If N is twice its value, we would expect the algorithm to take quadruple the time
• That is true and we say that the algorithm efficiency relative to N is quadratic
Introduction to complexity
Internal
The algorithm's efficiency, in terms of:
• Time required to run
• Space (memory storage) required to run
External
• Size of the input to the algorithm
• Relative order of initial elements
• Speed of the computer on which it will run
• Quality of the compiler
Complexity measures the internal factors (usually more interested in time than space)
Growth rates and big-O notation
• Growth rates capture the essence of an algorithm's performance
• Big-O notation indicates the growth rate. It is the class of mathematical formula that best describes an algorithm's performance, and is discovered by looking inside the algorithm
• Big-O is a function with parameter N, where N is usually the size of the input to the algorithm
– For example, if an algorithm depending on the value n has performance an^2 + bn + c (for constants a, b, c) then we say the algorithm has performance O(n^2)
• For large N, the N^2 term dominates. Only the dominant term is included in big-O

Big-O Notation
• We use a shorthand mathematical notation to describe the efficiency of an algorithm relative to any parameter n as its "Order" or Big-O
– We can say that first algorithm of inserting elements is O(n)
– We can say that the second algorithm is O(n^2)
• For any algorithm that has a function g(n) of the parameter n that describes its length of time to execute, we can say the algorithm is O(g(n))
• We only include the fastest growing term and ignore any multiplying by or adding of constants
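The claim that an^2 + bn + c is O(n^2) can be checked with a short bound. The slides do not state a formal definition, so this sketch assumes the usual one: f(n) is O(g(n)) if there are constants C > 0 and n_0 such that |f(n)| <= C g(n) for all n >= n_0.

```latex
\[
|an^2 + bn + c| \;\le\; |a|\,n^2 + |b|\,n + |c|
\;\le\; \bigl(|a| + |b| + |c|\bigr)\,n^2
\qquad \text{for all } n \ge 1,
\]
```

since n <= n^2 and 1 <= n^2 whenever n >= 1. Taking C = |a| + |b| + |c| and n_0 = 1 satisfies the assumed definition, and the lower-order terms bn and c are absorbed into the constant.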
Order-of-Magnitude Analysis and Big O Notation

Comparison of running time
[Table: Running Time vs. Maximum Problem Size (n) solvable in 1 second, 1 minute, 1 hour]

[Figure: Quadratic family] Multiplicative constants do not affect the fundamental shape of a curve. Only the steepness of the curve is affected.
[Figure] Only the dominant terms of a polynomial matter in the long run. Lower-order terms fade to insignificance as the problem size increases.
Calculating the actual time taken by a program (example)
• A program takes 10ms to process one data item (i.e. to do one operation on the data item)
• How long would the program take to process 1000 data items, if time is proportional to:
– log10 N
– N
– N log10 N
– N^2
– N^3

Best, average, worst-case complexity
Generally three types of complexity are measured:
• Worst case: Maximum time taken by the algorithm to complete, over all possible inputs.
• Average case: The expected time taken by the algorithm to complete, averaged over all possible inputs.
• Best case: Minimum time taken by the algorithm to complete, over all possible inputs.
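Returning to the 10 ms exercise, the arithmetic can be sketched as below. The names `growth`, `ilog10`, `operations` and `total_ms` are hypothetical, and the sketch assumes the constant of proportionality is exactly 1 operation, so N = 1000 gives 3, 1000, 3000, 10^6 and 10^9 operations respectively.

```c
/* Growth functions from the exercise. */
enum growth { LOG_N, LINEAR, N_LOG_N, QUADRATIC, CUBIC };

/* floor(log10(n)) for n >= 1, computed without <math.h>. */
long long ilog10(long long n)
{
    long long k = 0;
    while (n >= 10) {
        n /= 10;
        k++;
    }
    return k;
}

/* Number of operations for input size n under growth function g. */
long long operations(enum growth g, long long n)
{
    switch (g) {
    case LOG_N:     return ilog10(n);
    case LINEAR:    return n;
    case N_LOG_N:   return n * ilog10(n);
    case QUADRATIC: return n * n;
    case CUBIC:     return n * n * n;
    }
    return 0;
}

/* Total running time in milliseconds at ms_per_op per operation. */
long long total_ms(enum growth g, long long n, long long ms_per_op)
{
    return operations(g, n) * ms_per_op;
}
```

With N = 1000 and 10 ms per operation this gives 30 ms, 10 s, 30 s, 10^4 s (about 2.8 hours) and 10^7 s (about 116 days).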
Arithmetic of Big-O Notation
3) If f1(n) is O(g1(n)) and f2(n) is O(g2(n)) then f1(n)*f2(n) is O(g1(n)*g2(n)).
– Example: what is order of (3n+1)*(2n+log n)?
    3n+1 is O(n)
    2n+log n is O(n)
    (3n+1)*(2n+log n) is O(n*n) = O(n^2)

Using Big O Notation
• It's not correct to say:
    f(n) = O(g(n))
• It's completely wrong to say:
    f(n) > O(g(n))
• Just use:
    f(n) is (in) O(g(n)), or
    f(n) is of order O(g(n)), or
    f(n) ∈ O(g(n))
Guideline 2: Nested loops
Analyse inside out. Total running time is the product of the sizes of all the loops.
    for (i=1; i<=n; i++) {          /* outer loop executed n times */
        for (j=1; j<=n; j++) {      /* inner loop executed n times */
            k = k+1;                /* constant time 'c' */
        }
    }
Total time = c * n * n = cn^2 = O(N^2)

Guideline 3: Consecutive statements
Add the time complexities of each statement.
    x = x + 1;                      /* constant time 'c0' */
    for (i=1; i<=n; i++) {          /* executed n times */
        m = m + 2;                  /* constant time 'c1' */
    }
    for (i=1; i<=n; i++) {          /* outer loop executed n times */
        for (j=1; j<=n; j++) {      /* inner loop executed n times */
            k = k+1;                /* constant time 'c2' */
        }
    }
Total time = c0 + c1n + c2n^2 = O(N^2)
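The two guidelines can be checked empirically by counting how many constant-time statements execute. In this sketch the function names are hypothetical and every constant (c, c0, c1, c2) is taken to be one step.

```c
/* Guideline 2: counts executions of the innermost statement
   of two nested loops that each run n times. */
long count_nested(long n)
{
    long k = 0;
    for (long i = 1; i <= n; i++)
        for (long j = 1; j <= n; j++)
            k = k + 1;
    return k;                      /* n * n */
}

/* Guideline 3: counts the steps of a single statement, followed by
   a simple loop, followed by a pair of nested loops. */
long count_consecutive(long n)
{
    long steps = 0;
    steps++;                       /* x = x + 1 */
    for (long i = 1; i <= n; i++)
        steps++;                   /* m = m + 2 */
    for (long i = 1; i <= n; i++)
        for (long j = 1; j <= n; j++)
            steps++;               /* k = k + 1 */
    return steps;                  /* 1 + n + n * n */
}
```

For n = 10 the counts are 100 and 111: the n^2 term already accounts for most of the second total, which is why both are O(N^2).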
Basic Search Algorithms and their Complexity Analysis

Binary Search (on sorted arrays)
• General case: search a sorted array a of size n looking for the value key
• Divide and conquer approach:
– Compute the middle index mid of the array
– If key is found at mid, we're done
– Otherwise repeat the approach on the half of the array that might still contain key
– etc…
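One possible iterative rendering of the divide and conquer steps is sketched below. The variable names `first`, `last` and `mid` follow the slides; the function name, the use of a while loop, and the -1 "not found" return value are assumptions of this sketch.

```c
/* Searches the sorted (nondecreasing) array a[0 .. n-1] for key.
   Returns an index holding key, or -1 if key is not present. */
int binary_search(const int a[], int n, int key)
{
    int first = 0;
    int last = n - 1;
    while (first <= last) {
        /* Written this way to avoid overflow in first + last. */
        int mid = first + (last - first) / 2;
        if (a[mid] == key)
            return mid;            /* key found at mid */
        if (a[mid] < key)
            first = mid + 1;       /* key can only be in the upper half */
        else
            last = mid - 1;        /* key can only be in the lower half */
    }
    return -1;                     /* first > last: nothing left to search */
}
```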
Analysis of Binary Search
• Each time through the loop, at least half of the remaining locations are rejected:
– After first time through, <= n/2 remain
– After second time through, <= n/4 remain
– After third time through, <= n/8 remain
– After kth time through, <= n/2^k remain

Analysis of Binary Search
• Worst case: key is not found in the array
• Suppose in the worst case that maximum number of times through the loop is k; we must express k in terms of n
• Exit the do..while loop when number of remaining possible locations is less than 1 (that is, when first > last): this means that n/2^k < 1
• Also, n/2^(k-1) >= 1; otherwise, looping would have stopped after k-1 iterations
• Combining the two: 2^(k-1) <= n < 2^k
• Next, take base-2 logarithms to get: k > log2(n) >= k-1
• So k grows as log2(n), and the worst-case time complexity of binary search is O(log n)
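The bound k > log2(n) >= k-1 can be checked numerically by counting how many times n can be halved before fewer than one location remains. The function name `halvings` is hypothetical, and integer division is used, which does not change the point at which the count stops.

```c
/* Returns the smallest k such that n / 2^k < 1, for n >= 1:
   the worst-case number of passes through the binary search loop. */
int halvings(int n)
{
    int k = 0;
    while (n >= 1) {
        n /= 2;    /* at least half of the remaining locations are rejected */
        k++;
    }
    return k;
}
```

For n = 1000 this gives k = 10, consistent with 10 > log2(1000) ≈ 9.97 >= 9.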
Linear Search: Example 1
• The problem: Search an array a of size n to determine whether the array contains the value key; return index if found, -1 if not found

    Set k to 0.
    While (k < n) and (a[k] is not key)
        Add 1 to k.
    If k == n Return –1.
    Return k.

Analysis of Linear Search
• Total amount of work done:
– Before loop: a constant amount a
– Each time through loop: 2 comparisons, an and operation, and an addition: a constant amount of work b
– After loop: a constant amount c
– In worst case, we examine all n array locations, so T(n) = a + b*n + c = b*n + d, where d = a+c, and time complexity is O(n)
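The linear search pseudo code maps directly onto C. The function name `linear_search` is an assumption of this sketch; the loop condition mirrors the two comparisons and the "and" operation counted in the analysis.

```c
/* Searches a[0 .. n-1] for key.
   Returns the first index holding key, or -1 if key is not present. */
int linear_search(const int a[], int n, int key)
{
    int k = 0;                       /* before loop: constant work */
    while (k < n && a[k] != key)     /* 2 comparisons and one "and" */
        k = k + 1;                   /* one addition */
    if (k == n)
        return -1;                   /* examined all n locations */
    return k;
}
```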
Binary vs. Linear Search

Performance isn't everything!