
CS 301: Why Data Structures?

CS 301 teaches students how to write efficient and elegant software by learning appropriate data structures and algorithms. Students will learn common data structures like arrays, linked lists, stacks, queues, trees and hash tables. They will learn sorting, searching and hashing algorithms to manipulate these data structures. The course aims to develop proficiency in specifying, representing and implementing different data types and structures. Students will learn to choose the best data structure and algorithm for problems involving sorting, searching, insertion and deletion. On completing the course, students should be able to apply data structures and algorithms to program design and problem solving.

What is CS 301?

• Learn how to really write good software!
– Similar to the difference between knowing English grammar and writing Hamlet!
• Learn to write efficient and elegant software
– How to choose between two algorithms
• Which to use? bubble-sort, insertion-sort, merge-sort
– How to choose appropriate data structures
• Which to use? array, vector, linked list, binary tree
– How to construct a library of interacting components
• Examples: STL and MFC in C++, Swing and Util packages in Java

Course Objectives

• To develop proficiency in the specification, representation, and implementation of data types and data structures.
• To understand the basic concepts of the fundamentals of different types of data structures.
• To demonstrate the ways of implementation of different types of data structures.
• To learn the techniques to solve problems like sorting, searching, insertion and deletion of data etc. related to data structures.

Course Outcomes

On completion of this course, a student must be able to
• Understand common data structures (such as arrays, linked lists, stacks, queues, priority queues, trees, heaps, hash tables, associative containers).
• Understand the algorithms that build and manipulate different types of data structures including sorting, searching, and hashing algorithms.
• Decide, apply and implement the appropriate data type and data structure for a given problem.
• Make appropriate data structure and algorithm design decisions with respect to program size, execution speed, and storage efficiency.

Syllabus (hours per topic in parentheses)

SECTION-A
Complexity Analysis: Time and space complexity of algorithms, asymptotic analysis, big O and other notations, importance of efficient algorithms, program performance measurement, data structures and algorithms. (4 hrs)
Linear Lists: Abstract data type, sequential and linked representations, comparison of insertion, deletion and search operations for sequential and linked lists, list and chain classes, doubly linked lists, circular linked lists, applications of lists in bin sort, radix sort, sparse tables. (8 hrs)
Stacks and Queues: Abstract data types, sequential and linked implementations, representative applications such as parenthesis matching, towers of Hanoi. (4 hrs)
Sorting: Bubble sort, selection sort, insertion sort, Shell sort, Quick sort, Heap sort, Merge sort, Radix sort, analysis of the sorting methods, selecting the top k elements. (7 hrs)

SECTION-B
Trees: Binary trees and their properties, terminology, sequential and linked implementations, tree traversal methods and algorithms, Heap data structure and its applications as priority queues, heap implementation, insertion and deletion operations, Heap-sort. (7 hrs)
Search & Multi-way Trees: Binary search trees, search efficiency, insertion and deletion operations, importance of balancing, AVL trees, B-trees, B+ trees. (7 hrs)
Graphs: Definition, terminology, directed and undirected graphs, properties, connectivity in graphs, applications, implementation – adjacency matrix and linked adjacency chains, graph traversal – breadth first and depth first, spanning trees. (5 hrs)
Hashing: hashing as a search structure, hash table, collision avoidance, linear open addressing, chaining. (3 hrs)

Instructional Material
• Y. Langsam, M. J. Augenstein, A. M. Tanenbaum: Data Structures using C and C++, 2nd Edition, Pearson Education
• R. Kruse, C. L. Tondo, B. Leung, S. Mogalla: Data Structures & Program Design in C, 2nd Edition, Pearson Education

References:
• E. Horowitz, S. Sahni, D. Mehta: Fundamentals of Data Structures in C++, 2nd Edition, Universities Press
• Donald E. Knuth: Art of Computer Programming, Volume 1: Fundamental Algorithms, 3rd Edition, Addison-Wesley
• Art of Computer Programming, Volume 3: Sorting and Searching, 2nd Edition, Addison-Wesley
• Online Material
• Computer Science – Data Structures and Algorithm – IIT Delhi
• Algorithms and data structure – MIT, Prof Surjit Das
https://www.youtube.com/watch?v=HtSuA80QTyo&list=PLxZdKEtmy3GRhETjatYq9v3O8VVt3YrNb
• Algorithms and Data Structures, Princeton University,
https://www.youtube.com/watch?v=8mYfZeHtdNc&list=PLxc4gS-_A5VDXUIOPkJkwQKYiT2T1t0I8

Academic Requirements

• End Semester Examination: 50 marks
• Sessional – 30 marks
• Quiz – 10 marks
• Attendance requirements: minimum 75%
• Surprise Quiz (5 marks, 2 at least)
• Rest 5:- How??

Why should you care?

• Complex data structures and algorithms are used in every substantial program
– Data compression uses trees: MP3, Gif, etc…
– Networking uses graphs: Routers and telephone networks
– Security uses complex math algorithms: GCD and large decimals
– Operating systems use queues and stacks: Scheduling and recursion
• Many problems can only be solved using complex data structures and algorithms.
• Required to graduate!!!

What 301 is NOT about
• This course is not about C / C++
– Although we will use C / C++ to implement many of the concepts
• This course is not about MATH
– Although we will use math to formalize many of the concepts
• Competency in both math and C/C++ is therefore assumed.
– C / C++: inheritance, overloading, overriding, files, reference/primitive types, multi-dimensional arrays
– Math: polynomials, logarithms, inductive proofs, logic

Data Structures and Algorithms
• Algorithm: Outline, the essence of a computational procedure, step by step instructions.
• Program: an implementation of an algorithm in some programming language (PL)
• Data structure: Organization of data to solve the problem

Algorithmic Problem

• Specification of input
• Specification of output as function of input
• Infinite number of input instances satisfying the specifications. E.g. a sorted non-decreasing sequence of natural numbers of non-zero finite length:
1. 20,901,978,985…….

Algorithmic Solution

• Input instance adhering to specifications
• Output related to input as required
– Algorithm describes actions on the input instance
– Infinitely many correct algorithms for the same algorithmic problem.

What is a Good Algorithm?
– Efficient
• Running time
• Space used
– Efficiency as a function of input size:
• The number of bits in a number
• Number of data elements (numbers, points)

Measuring the Running Time
• How should we measure the running time of an algorithm?
• Experimental study
– Write a program that implements the algorithm
– Run the program with data sets of varying size and composition
– Use a method like currenttime() or time_t time(time_t* timer); to get an accurate measure of actual running time.
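The experimental approach can be sketched in C++ as follows. This is only an illustration under stated assumptions: the workload sumFirstN and the helper timeRunSeconds are hypothetical names invented for this example, and std::chrono::steady_clock is used in place of the time() call named on the slide because it gives finer resolution.

```cpp
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical workload for illustration: sum n integers that are all 1.
long long sumFirstN(std::size_t n) {
    std::vector<int> data(n, 1);
    return std::accumulate(data.begin(), data.end(), 0LL);
}

// Returns the elapsed wall-clock time, in seconds, of one run on input size n.
double timeRunSeconds(std::size_t n) {
    auto start = std::chrono::steady_clock::now();
    volatile long long sink = sumFirstN(n);  // volatile keeps the call from being optimized away
    (void)sink;
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling timeRunSeconds for n = 1000, 10000, 100000 and so on, and tabulating the results, gives the "data sets of varying size" experiment. The measured values depend on the machine and compiler, which is exactly the limitation of experimental studies.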

Limitation of Experimental Studies

• It is necessary to implement and test the algorithm in order to determine its running time.
• Only a limited set of inputs can be tested, and these may not be indicative of the running time on other inputs not included.
• To compare algorithms, the same hardware and software environments must be used.

Beyond Experimental Studies

• Develop a general methodology for analyzing running time of algorithms. This
– Uses a high-level description of the algorithm instead of testing one of the implementations
– Takes into account all possible inputs.
– Allows one to evaluate efficiency of any algorithm in a way that is independent of hardware and software environment.

Pseudo Code
• A mixture of natural language and high-level programming concepts that describes the main ideas behind a generic implementation of a data structure or algorithm.
• More structured than usual prose but less formal than a programming language.
• E.g. Algorithm arrayMax(A, n):
    Input: An array A storing n integers
    Output: Maximum element in A
    currentMax ← A[0]
    for i ← 1 to n-1 do
        if currentMax < A[i] then currentMax ← A[i]
    return currentMax

Pseudo Code
• Expressions: ← for assignment, standard mathematical symbols for mathematical/boolean operators
• Method declarations:
– Algorithm name(parameter1, parameter2)
• Programming constructs
– Decision structures: if..then..else..
– While loops: while…do..
– Array indexing like A[i]..A[ii].. etc
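The arrayMax pseudo-code translates almost line for line into C++. The sketch below assumes a non-empty std::vector<int> as the array type, which is a choice made for this example (the slide only says "an array A storing n integers").

```cpp
#include <cstddef>
#include <vector>

// Returns the maximum element of A. Precondition: A is not empty.
int arrayMax(const std::vector<int>& A) {
    int currentMax = A[0];                       // currentMax <- A[0]
    for (std::size_t i = 1; i < A.size(); ++i) { // for i <- 1 to n-1 do
        if (currentMax < A[i]) {
            currentMax = A[i];                   // currentMax <- A[i]
        }
    }
    return currentMax;
}
```

For the array 2 5 4 10 7 the loop body runs four times and the function returns 10.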

Analysis of Algorithm
• Primitive operation: low-level operation independent of the programming language; can be defined in pseudo-code. For example:
– Data movement (assign) (i = 0 or i ← 0)
– Control (branch, subroutine call, return)
– Arithmetic and logic operations
• By inspecting pseudo-code we can count the number of primitive operations executed by an algorithm.

Example: Sorting
• INPUT: sequence of numbers a1, a2, a3, …, an (e.g. 2 5 4 10 7)
• OUTPUT: a permutation b1, b2, b3, …, bn of the input sequence (e.g. 2 4 5 7 10)
Correctness (requirements for the output): for any given input the algorithm halts with an output such that
• b1 ≤ b2 ≤ b3 ≤ … ≤ bn
• b1, b2, b3, …, bn is a permutation of a1, a2, a3, …, an

Example: Sorting (continued)
• It is not feasible to verify correctness by testing every permutation of the input: there are n! of them, and n! = O(n^n).

Insertion Sort
• Array is divided: sorted / unsorted
• At each step an element from the unsorted part is inserted in its place in the sorted part

Running Time
Depends on
• number of elements
• how (partially) sorted they are
• algorithm

Insertion Sort
void insertionSort(int list[], int n)  /* n = number of elements in list */
{
    int i;     /* index for outer loop */
    int hold;  /* here we'll store the element to be inserted */
    int j;     /* index for inner "moving" loop */
    /* arrays are 0-indexed, so the 2nd element is list[1] */
    for (i = 1; i < n; i++)
    {
        hold = list[i];  /* element to be inserted */
        /* Move up 1 position all elements greater than 'hold' */
        for (j = i - 1; j >= 0 && hold < list[j]; j--)
        {
            list[j + 1] = list[j];
        }
        list[j + 1] = hold;  /* inserting the element into its right place */
    }
}


Analysis of Insertion Sort

• Outer loop moves from left to right
• Worst case: reverse sorted order
• For i=2: comparisons = 1, movements = 1, total = 2*1
• For i=3: comparisons = 2, movements = 2, total = 2*2
• For i=4: comparisons = 3, movements = 3, total = 6
• For i=n: (n-1) comparisons and (n-1) movements, total 2(n-1)
Total = 2(1) + 2(2) + 2(3) + 2(4) + … + 2(n-1) = 2 * (sum of the first n-1 natural numbers) = n(n-1) = O(n^2)

Analysis of insertion sort:

• Best case: the array is already sorted, so each element needs one comparison:
f(n) = 1 + 1 + 1 + … = (n-1) = Omega(n)
• Worst case occurs when the array is in reverse order and the inner loop must use the maximum number of comparisons, i.e. K-1 for each Kth element. Hence
f(n) = 1 + 2 + 3 + … + (n-1) = n(n-1)/2 = O(n^2)
• For the average case the number of comparisons will be approximately (K-1)/2 for each Kth element. Hence
f(n) = (1/2) + (2/2) + (3/2) + … + (n-1)/2 = n(n-1)/4 = O(n^2)
Space required
• Three variables: hold, i, j; therefore constant space.
This algorithm is very slow when n is very large, so it is only used for small values of n.
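The best-case and worst-case comparison counts can be checked by instrumenting the sort. The sketch below is an illustration only: insertionSortCount is a hypothetical helper written for this example, it uses std::vector<int> rather than a raw array, and it counts only key comparisons (not movements).

```cpp
#include <cstddef>
#include <vector>

// Sorts a in place and returns the number of key comparisons performed.
std::size_t insertionSortCount(std::vector<int>& a) {
    std::size_t comparisons = 0;
    for (std::size_t i = 1; i < a.size(); ++i) {
        int hold = a[i];          // element to be inserted
        std::size_t j = i;        // a[j] is the currently free slot
        while (j > 0) {
            ++comparisons;        // one comparison of hold against a[j - 1]
            if (!(hold < a[j - 1])) break;
            a[j] = a[j - 1];      // shift the larger element one position up
            --j;
        }
        a[j] = hold;
    }
    return comparisons;
}
```

For n = 5, an already sorted input needs n-1 = 4 comparisons, while a reverse-sorted input needs n(n-1)/2 = 10, matching the formulas above.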

Best/Worst/Average Case
• Worst case is usually used: it is an upper bound, and in certain application domains (e.g. aircraft control, surgery) knowing the worst case is critically important.
• Worst case occurs fairly often.
• Average case is often as bad as the worst case.
• Finding the average case can be very difficult.

Asymptotic Analysis
• Goal: to simplify analysis of running time by getting rid of details, which may be affected by specific implementation and hardware
– Like rounding: 1,000,003 ≈ 1,000,000
– 3n^2 ≈ n^2
• Capturing the essence: how the running time of an algorithm increases with the size of the input in the limit
– Asymptotically more efficient algorithms are best for all but small inputs

Algorithm Efficiency

• Let’s look at the following algorithm for initializing the values in an array:
    final int N = 500;
    int[] counts = new int[N];
    for (int i = 0; i < counts.length; i++)
        counts[i] = 0;
• The length of time the algorithm takes to execute depends on the value of N

Algorithm Efficiency

• In that algorithm, we have one loop that processes all of the elements in the array
• Intuitively:
– If N was half of its value, we would expect the algorithm to take half the time
– If N was twice its value, we would expect the algorithm to take twice the time
• That is true, and we say that the algorithm efficiency relative to N is linear

Algorithm Efficiency
• Let’s look at another algorithm for initializing the values in a different array:
    final int N = 500;
    int[][] counts = new int[N][N];
    for (int i = 0; i < counts.length; i++)
        for (int j = 0; j < counts[i].length; j++)
            counts[i][j] = 0;
• The length of time the algorithm takes to execute still depends on the value of N

Algorithm Efficiency
• However, in the second algorithm, we have two nested loops to process the elements in the two-dimensional array
• Intuitively:
– If N is half its value, we would expect the algorithm to take one quarter the time
– If N is twice its value, we would expect the algorithm to take quadruple the time
• That is true, and we say that the algorithm efficiency relative to N is quadratic
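The "half the time / quadruple the time" intuition for the two initialization algorithms can be checked by counting loop-body executions instead of measuring time. The helper names below are hypothetical, written in C++ for this illustration; the step count stands in for running time.

```cpp
#include <cstddef>

// Number of assignments made by the single-loop (one-dimensional) initialization.
std::size_t singleLoopSteps(std::size_t N) {
    std::size_t steps = 0;
    for (std::size_t i = 0; i < N; ++i) {
        ++steps;  // stands for counts[i] = 0
    }
    return steps;
}

// Number of assignments made by the nested-loop (two-dimensional) initialization.
std::size_t nestedLoopSteps(std::size_t N) {
    std::size_t steps = 0;
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            ++steps;  // stands for counts[i][j] = 0
        }
    }
    return steps;
}
```

Doubling N from 500 to 1000 doubles singleLoopSteps (500 to 1000) but quadruples nestedLoopSteps (250000 to 1000000).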

Introduction to complexity

Complexity: a measure of the performance of an algorithm

An algorithm’s performance depends on internal and external factors.

Internal: the algorithm’s efficiency, in terms of:
• Time required to run
• Space (memory storage) required to run

External:
• Size of the input to the algorithm
• Relative order of initial elements
• Speed of the computer on which it will run
• Quality of the compiler

Complexity measures the internal factors (usually more interested in time than space)

Growth rates and big-O notation
• Growth rates capture the essence of an algorithm’s performance
• Big-O notation indicates the growth rate. It is the class of mathematical formula that best describes an algorithm’s performance, and is discovered by looking inside the algorithm
• Big-O is a function with parameter N, where N is usually the size of the input to the algorithm
– For example, if an algorithm depending on the value n has performance an^2 + bn + c (for constants a, b, c) then we say the algorithm has performance O(n^2)
• For large N, the N^2 term dominates. Only the dominant term is included in big-O

Big-O Notation
• We use a shorthand mathematical notation to describe the efficiency of an algorithm relative to any parameter n as its “Order” or Big-O
– We can say that the first algorithm, which initializes the elements of a one-dimensional array, is O(n)
– We can say that the second algorithm is O(n^2)
• For any algorithm that has a function g(n) of the parameter n that describes its length of time to execute, we can say the algorithm is O(g(n))
• We only include the fastest growing term and ignore any multiplying by or adding of constants

Common growth rates

Time complexity   Name          Example
O(1)              constant      Adding to the front of a linked list
O(log N)          log           Finding an entry in a sorted array
O(N)              linear        Finding an entry in an unsorted array
O(N log N)        n-log-n       Sorting n items by ‘divide-and-conquer’
O(N^2)            quadratic     Shortest path between two nodes in a graph
O(N^3)            cubic         Simultaneous linear equations
O(2^N)            exponential   The Towers of Hanoi problem
O(N!)             factorial     Exhaustive search

Order of growth of some common functions:
O(1) < O(log n) < O(n) < O(n * log n) < O(n^2) < O(n^3) < O(2^n)

Note on Constant Time

• We write O(1) to indicate something that takes a constant amount of time, i.e. the time is independent of n
– E.g. finding the minimum element of an ordered array takes O(1) time, because the min is either at the beginning or the end of the array
– Important: constants can be huge, and so in practice O(1) is not necessarily efficient; all it tells us is that the algorithm will run at the same speed no matter the size of the input we give it

Order-of-Magnitude Analysis and Big O Notation
[Figure: a comparison of growth-rate functions: a) in tabular form]

Comparison of running time (Source: NPTEL)

Running time   Maximum problem size (n)
               1 second   1 minute   1 hour
400n           2500       150000     9000000
20 n log n     4096       166666     7826087
2n^2           707        5477       42426
n^4            31         88         244
2^n            19         25         31

Order-of-Magnitude Analysis and Big O Notation
[Figure: a comparison of growth-rate functions: b) in graphical form]

• Basic shape of a polynomial is determined by its highest-valued exponent (order).
• Multiplicative constants do not affect the fundamental shape of a curve. Only the steepness of the curve is affected. (Quadratic family)
• Only the dominant terms of a polynomial matter in the long run. Lower-order terms fade to insignificance as the problem size increases.

Calculating the actual time taken by a program (example)
• A program takes 10ms to process one data item (i.e. to do one operation on the data item)
• How long would the program take to process 1000 data items, if time is proportional to:
– log10 N
– N
– N log10 N
– N^2
– N^3

Best, average, worst-case complexity
Generally three types of complexity are measured:
• Worst case: maximum time taken by the algorithm to complete, over all possible inputs.
• Average case: expected time taken by the algorithm to complete, averaged over all possible inputs.
• Best case: minimum time taken by the algorithm to complete, over all possible inputs.
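One way to work the 10 ms example is sketched below. It assumes that "proportional to f(N)" means total time = 10 ms × f(N), which is an interpretation of the slide rather than something it states; the function estimateMs and its string labels are hypothetical names for this illustration.

```cpp
#include <cmath>
#include <string>

// Assumed cost of one operation, in milliseconds (from the slide's 10 ms).
const double kMsPerOperation = 10.0;

// Estimated total time in ms for n items under the named growth rate.
// Returns -1.0 for an unrecognized label.
double estimateMs(const std::string& growth, double n) {
    if (growth == "log N")   return kMsPerOperation * std::log10(n);
    if (growth == "N")       return kMsPerOperation * n;
    if (growth == "N log N") return kMsPerOperation * n * std::log10(n);
    if (growth == "N^2")     return kMsPerOperation * n * n;
    if (growth == "N^3")     return kMsPerOperation * n * n * n;
    return -1.0;
}
```

Under that assumption, N = 1000 gives 30 ms for log10 N, 10 seconds for N, 30 seconds for N log10 N, 10,000 seconds (about 2.8 hours) for N^2, and 10,000,000 seconds (about 116 days) for N^3.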

Asymptotic Notation

• Simple rule: drop lower order terms and constant factors
– 50 n log n is O(n log n)
– 7n + 3 is O(n)
– 8n^2 log(n) + 5n^2 + n is O(n^2 log(n))
– log n < n < n^2 < n^3 < 2^n
• Caution: beware of very large constant factors. An algorithm running in time 1,000,000 n is still O(n) but might be less efficient than one running in time 2n^2, which is O(n^2).

Arithmetic of Big-O Notation

1) If f(n) is O(g(n)) then c.f(n) is O(g(n)), where c is a constant.
– Example: 23*log n is O(log n)
2) If f1(n) is O(g(n)) and f2(n) is O(g(n)) then also f1(n)+f2(n) is O(g(n))
– Example: what is order of n^2+n?
    n^2 is O(n^2)
    n is O(n) but also O(n^2)
    therefore n^2+n is O(n^2)

Arithmetic of Big-O Notation
3) If f1(n) is O(g1(n)) and f2(n) is O(g2(n)) then f1(n)*f2(n) is O(g1(n)*g2(n)).
– Example: what is order of (3n+1)*(2n+log n)?
    3n+1 is O(n)
    2n+log n is O(n)
    (3n+1)*(2n+log n) is O(n*n) = O(n^2)

Using Big O Notation
• It’s not correct to say:
    f(n) = O(g(n))
• It’s completely wrong to say:
    f(n) > O(g(n))
• Just use:
    f(n) is (in) O(g(n)), or
    f(n) is of order O(g(n)), or
    f(n) ∈ O(g(n))

How do we calculate big-O?

Five guidelines for finding out the time complexity of a piece of code:
1 Loops
2 Nested loops
3 Consecutive statements
4 If-then-else statements
5 Logarithmic complexity

Guideline 1: Loops

The running time of a loop is, at most, the running time of the statements inside the loop (including tests) multiplied by the number of iterations.
    for (i=1; i<=n; i++)      // executed n times
    {
        m = m + 2;            // constant time ‘c’
    }
Total time = a constant c * n = cn = O(N)

Guideline 2: Nested loops

Analyse inside out. Total running time is the product of the sizes of all the loops.
    for (i=1; i<=n; i++) {        // outer loop executed n times
        for (j=1; j<=n; j++) {    // inner loop executed n times
            k = k+1;              // constant time ‘c’
        }
    }
Total time = c * n * n = cn^2 = O(N^2)

Guideline 3: Consecutive statements

Add the time complexities of each statement.
    x = x + 1;                    // constant time ‘c0’
    for (i=1; i<=n; i++) {        // executed n times
        m = m + 2;                // constant time ‘c1’
    }
    for (i=1; i<=n; i++) {        // outer loop executed n times
        for (j=1; j<=n; j++) {    // inner loop executed n times
            k = k+1;              // constant time ‘c2’
        }
    }
Total time = c0 + c1n + c2n^2 = O(N^2)

Guideline 4: If-then-else statements

Worst-case running time: the test, plus either the then part or the else part (whichever is the larger).
    if (a != b) {                             // test: constant ‘c0’
        return false;                         // then part: constant ‘c1’
    }
    else {                                    // else part: (c2 + c3) * n
        for (int n = 0; n < length; n++) {
            if (d == e)                       // another if: constant ‘c2’
                return false;                 // + constant ‘c3’ (no else part)
        }
    }
Total time = c0 + c1 + (c2 + c3) * n = O(N)

Guideline 5: Logarithmic complexity

An algorithm is O(log N) if it takes a constant time to cut the problem size by a fraction (usually by ½)

Example algorithm (binary search): finding a word in a dictionary of n pages
• Look at the centre point in the dictionary
• Is word to left or right of centre?
• Repeat process with left or right part of dictionary until the word is found

Basic Search Algorithms and their Complexity Analysis

Binary Search (on sorted arrays)

• General case: search a sorted array a of size n looking for the value key
• Divide and conquer approach:
– Compute the middle index mid of the array
– If key is found at mid, we’re done
– Otherwise repeat the approach on the half of the array that might still contain key
– etc…

Example: Binary Search For Ordered List

    int binarySearch(m_container, key) {
        int first = 1, last = m_container.getLength();
        while (first <= last) { // start of while loop
            int mid = (first+last)/2;
            Item val = retrieve(mid);
            if (key < val) last = mid-1;
            else if (key > val) first = mid+1;
            else return mid;
        } // end of while loop
        return -1;
    }

Analysis of Binary Search

• The amount of work done before and after the loop is a constant, and independent of n
• The amount of work done during a single execution of the loop is constant
• Time complexity will therefore be proportional to number of times the loop is executed, so that’s what we’ll analyze

Analysis of Binary Search
• Worst case: key is not found in the array
• Each time through the loop, at least half of the remaining locations are rejected:
– After first time through, <= n/2 remain
– After second time through, <= n/4 remain
– After third time through, <= n/8 remain
– After kth time through, <= n/2^k remain

Analysis of Binary Search
• Suppose in the worst case that maximum number of times through the loop is k; we must express k in terms of n
• Exit the while loop when number of remaining possible locations is less than 1 (that is, when first > last): this means that n/2^k < 1

Analysis of Binary Search

• Also, n/2^(k-1) >= 1; otherwise, looping would have stopped after k-1 iterations
• Combining the two inequalities, we get:
    n/2^k < 1 <= n/2^(k-1)
• Invert and multiply through by n to get:
    2^k > n >= 2^(k-1)

Analysis of Binary Search

• Next, take base-2 logarithms to get:
    k > log2(n) >= k-1
• Which is equivalent to:
    log2(n) < k <= log2(n) + 1
• Thus, binary search algorithm is O(log2(n)) in terms of the number of array locations examined
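The bound k <= log2(n) + 1 can be observed by counting loop iterations in a runnable binary search. The sketch below is an assumption-laden illustration, not the slide's container-based code: it searches a 0-indexed std::vector<int>, and SearchResult and binarySearchCount are hypothetical names introduced here.

```cpp
#include <cstddef>
#include <vector>

// Index of the key (or -1 if absent) plus the number of loop iterations used.
struct SearchResult {
    int index;
    int iterations;
};

// Binary search on a sorted, 0-indexed vector, counting trips through the loop.
SearchResult binarySearchCount(const std::vector<int>& a, int key) {
    int first = 0;
    int last = static_cast<int>(a.size()) - 1;
    int iterations = 0;
    while (first <= last) {
        ++iterations;
        int mid = first + (last - first) / 2;  // same midpoint, avoids overflow of first + last
        if (key < a[mid]) {
            last = mid - 1;                    // key can only be in the left half
        } else if (key > a[mid]) {
            first = mid + 1;                   // key can only be in the right half
        } else {
            return {mid, iterations};
        }
    }
    return {-1, iterations};                   // first > last: no locations remain
}
```

For n = 1000, log2(n) + 1 is just under 11, so an unsuccessful search never needs more than 10 iterations, compared with up to 1000 for a linear scan.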

Linear Search: Example 1

• The problem: Search an array a of size n to determine whether the array contains the value key; return index if found, -1 if not found

    Set k to 0.
    While (k < n) and (a[k] is not key)
        Add 1 to k.
    If k == n Return -1.
    Return k.

Analysis of Linear Search

• Total amount of work done:
– Before loop: a constant amount a
– Each time through loop: 2 comparisons, an and operation, and an addition: a constant amount of work b
– After loop: a constant amount c
– In worst case, we examine all n array locations, so T(n) = a + b*n + c = b*n + d, where d = a+c, and time complexity is O(n)
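The linear search pseudo-code maps directly onto C++. This sketch assumes a std::vector<int> for the array a, which is a choice made for the example.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the first occurrence of key in a, or -1 if key is absent.
int linearSearch(const std::vector<int>& a, int key) {
    std::size_t k = 0;                          // Set k to 0.
    while (k < a.size() && a[k] != key) {       // While (k < n) and (a[k] is not key)
        ++k;                                    //     Add 1 to k.
    }
    if (k == a.size()) return -1;               // If k == n Return -1.
    return static_cast<int>(k);                 // Return k.
}
```

The loop condition contains the two comparisons and the "and" operation counted in the analysis, and ++k is the addition, so each trip through the loop costs the constant amount b.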

Analysis of Linear Search

• Simpler (less formal) analysis:
– Note that work done before and after loop is independent of n, and work done during a single execution of loop is independent of n
– In worst case, loop will be executed n times, so amount of work done is proportional to n, and algorithm is O(n)

Analysis of Linear Search

• Simpler approach to expected case:
– Add up the number of times the loop is executed in each of the n cases, and divide by the number of cases n
– (1+2+3+ … +(n-1)+n)/n = (n*(n+1)/2)/n = n/2 + 1/2; algorithm is therefore O(n)
Binary vs. Linear Search

[Figure: two plots of time t against n; search for one out of n ordered integers]
see demo: www.csd.uwo.ca/courses/CS1037a/demos.html

Performance isn’t everything!

• There can be a tradeoff between:
– Ease of understanding, writing and debugging
– Efficient use of time and space
• So, maximum performance is not always desirable
• However, it is still useful to compare the performance of different algorithms, even if the optimal algorithm may not be adopted