Presentation: Introduction to Data Structures


Data Structures - Introduction

• The data structure you pick needs to support the operations you need
• Ideally it supports the operations you will use most often in an efficient manner
• Examples of operations:
– A List with operations insert and delete
– A Stack with operations push and pop


• Abstract Data Type (ADT)
– Mathematical description of an object with a set of operations on the object. A useful building block.
• Algorithm
– A high-level, language-independent description of a step-by-step process
• Data structure
– A specific family of algorithms for implementing an abstract data type
• Implementation of data structure
– A specific implementation in a specific language

• A stack is an abstract data type supporting push, pop and isEmpty operations
• A stack data structure could use an array, a linked list, or anything that can hold data
• One stack implementation is java.util.Stack; another is java.util.LinkedList
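To make the ADT versus data structure distinction concrete, here is a minimal C++ sketch of one stack interface backed by two different structures. The names `IntStack`, `ArrayStack` and `ListStack` are hypothetical and chosen only for illustration; they are not taken from the slides or from any library.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Hypothetical ADT: only the operations named on the slide.
class IntStack {
public:
    virtual ~IntStack() = default;
    virtual void push(int x) = 0;
    virtual int pop() = 0;            // removes and returns the top element
    virtual bool isEmpty() const = 0;
};

// Implementation 1: contiguous array (std::vector); the top is the last slot.
class ArrayStack : public IntStack {
    std::vector<int> items;
public:
    void push(int x) override { items.push_back(x); }
    int pop() override {
        assert(!items.empty());       // popping an empty stack is a caller error
        int x = items.back();
        items.pop_back();
        return x;
    }
    bool isEmpty() const override { return items.empty(); }
};

// Implementation 2: linked list (std::list); the top is the front node.
class ListStack : public IntStack {
    std::list<int> items;
public:
    void push(int x) override { items.push_front(x); }
    int pop() override {
        assert(!items.empty());
        int x = items.front();
        items.pop_front();
        return x;
    }
    bool isEmpty() const override { return items.empty(); }
};
```

Code written against `IntStack` behaves the same with either implementation; only the time and space tradeoffs differ.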

Abstract:
• Pseudocode
• Algorithm
– A sequence of high-level, language-independent operations, which may act upon an abstracted view of data.
• Abstract Data Type (ADT)
– A mathematical description of an object and the set of operations on the object.

Concrete:
• Specific programming language
• Program
– A sequence of operations in a specific programming language, which may act upon real data in the form of numbers, images, sound, etc.
• Data structure
– A specific way in which a program’s data is represented, which reflects the programmer’s design choices/goals.

Ideal data structure: “fast”, “elegant”, memory efficient

Generates tensions:
• time vs. space
• performance vs. elegance
• generality vs. simplicity
• one operation’s performance vs. another’s

The study of data structures is the study of tradeoffs. That’s why we have so many of them!

• Introductions
• Administrative Info
• What is this course about?
• Review: Queues and stacks

• FIFO: First In First Out
• Queue operations
– create
– destroy
– enqueue
– dequeue
– is_empty

[Figure: G is enqueued at the back of the queue F E D C B, while A is dequeued from the front]

[Figure: array Q, indexed 0 to size - 1, holding b c d e f, with front and back markers]
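// front indexes the oldest element; back indexes the next free slot.
// Full and empty checks are omitted here.
void enqueue(Object x) {
  Q[back] = x;
  back = (back + 1) % size;
}
Object dequeue() {
  Object x = Q[front];
  front = (front + 1) % size;
  return x;
}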
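The pseudocode above can be turned into a small runnable C++ sketch. The class name `ArrayQueue`, the `count` field and the `is_full` check are assumptions added here so that a full queue can be told apart from an empty one; the slide itself does not show how that case is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of a circular-array queue of ints with a fixed capacity.
class ArrayQueue {
    std::vector<int> Q;      // fixed-capacity storage
    std::size_t front = 0;   // index of the oldest element
    std::size_t back = 0;    // index of the next free slot
    std::size_t count = 0;   // number of stored elements (added assumption)
public:
    explicit ArrayQueue(std::size_t size) : Q(size) {}

    bool is_empty() const { return count == 0; }
    bool is_full() const { return count == Q.size(); }

    void enqueue(int x) {
        assert(!is_full());
        Q[back] = x;
        back = (back + 1) % Q.size();    // wrap around to slot 0
        ++count;
    }

    int dequeue() {
        assert(!is_empty());
        int x = Q[front];
        front = (front + 1) % Q.size();  // wrap around to slot 0
        --count;
        return x;
    }
};
```

Both operations do a constant amount of work, and the modulo arithmetic lets the queue reuse slots freed at the start of the array.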

[Figure: linked list holding b c d e f, with front and back pointers]

void enqueue(Object x) {
  if (is_empty()) {
    front = back = new Node(x);
  } else {
    back->next = new Node(x);
    back = back->next;
  }
}

Object dequeue() {
  assert(!is_empty());
  return_data = front->data;
  temp = front;
  front = front->next;
  delete temp;
  return return_data;
}

bool is_empty() {
  return front == null;
}
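A runnable C++ version of the linked-list queue might look like the sketch below. The class name `ListQueue`, the `int` element type, the destructor and the deleted copy operations are assumptions added for illustration; they are not part of the slide.

```cpp
#include <cassert>

// Sketch of a singly linked queue of ints.
class ListQueue {
    struct Node {
        int data;
        Node* next;
        explicit Node(int d) : data(d), next(nullptr) {}
    };
    Node* front = nullptr;   // oldest element, removed by dequeue
    Node* back = nullptr;    // newest element, extended by enqueue
public:
    ListQueue() = default;
    ListQueue(const ListQueue&) = delete;             // copying is out of scope here
    ListQueue& operator=(const ListQueue&) = delete;
    ~ListQueue() { while (!is_empty()) dequeue(); }   // free any remaining nodes

    bool is_empty() const { return front == nullptr; }

    void enqueue(int x) {
        Node* n = new Node(x);
        if (is_empty()) {
            front = back = n;
        } else {
            back->next = n;
            back = n;
        }
    }

    int dequeue() {
        assert(!is_empty());
        int return_data = front->data;
        Node* temp = front;
        front = front->next;
        if (front == nullptr) back = nullptr;         // the queue became empty
        delete temp;
        return return_data;
    }
};
```

There is no capacity limit and no wrap-around arithmetic, at the cost of one allocation per element.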

Circular array:
• Too much space
• Kth element accessed “easily”
• Not as complex
• Could make array more robust

Linked list:
• Can grow as needed
• Can keep growing
• No back looping around to front
• Linked list code more complex

• LIFO: Last In First Out
• Stack operations
– create
– destroy
– push
– pop
– top
– is_empty

[Figure: A, B, C, D, E, F are pushed in that order; F, the last in, is popped first, leaving E D C B A on the stack]

• Function call stack
• Removing recursion
• Balancing symbols (parentheses)
• Evaluating Reverse Polish Notation
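As one example of these stack applications, symbol balancing can be sketched in C++ with `std::stack`. The function name `isBalanced` and the choice to check the three bracket kinds `()`, `[]` and `{}` are assumptions made for illustration.

```cpp
#include <stack>
#include <string>

// Returns true when every (, [ and { is closed by the matching symbol
// in last-in, first-out order. All other characters are ignored.
bool isBalanced(const std::string& text) {
    std::stack<char> open;
    for (char c : text) {
        if (c == '(' || c == '[' || c == '{') {
            open.push(c);
        } else if (c == ')' || c == ']' || c == '}') {
            if (open.empty()) return false;   // closer with nothing open
            char o = open.top();
            open.pop();
            bool match = (o == '(' && c == ')') ||
                         (o == '[' && c == ']') ||
                         (o == '{' && c == '}');
            if (!match) return false;
        }
    }
    return open.empty();   // leftover openers mean the text is unbalanced
}
```

The most recently opened symbol must be the first one closed, which is exactly the LIFO order a stack provides.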

• Correctness:
– Does the algorithm do what is intended?
• Performance:
– What is the running time of the algorithm?
– How much storage does it consume?
• Different algorithms may be correct
– Which should I use?

• Write a recursive function to find the sum of the first n integers stored in array v.

• Basis Step: The algorithm is correct for a base case or two by inspection.

• Inductive Hypothesis (n=k): Assume that the algorithm works correctly for the first k cases.

• Inductive Step (n=k+1): Given the hypothesis above, show that the k+1 case will be calculated correctly.

• Basis Step:
sum(v,0) = 0. ✓

• Inductive Hypothesis (n=k):
Assume sum(v,k) correctly returns the sum of the first k elements of v, i.e. v[0]+v[1]+…+v[k-1]

• Inductive Step (n=k+1):
sum(v,k+1) returns
v[k]+sum(v,k) = (by inductive hyp.)
v[k]+(v[0]+v[1]+…+v[k-1]) =
v[0]+v[1]+…+v[k-1]+v[k] ✓
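A recursive function of the kind this proof reasons about could be written as follows. This is a minimal C++ sketch, assuming `sum(v, n)` means the sum of `v[0]` through `v[n-1]`; the slides do not show the function body itself.

```cpp
#include <cassert>

// Sum of the first n integers of v, i.e. v[0] + ... + v[n-1].
int sum(const int v[], int n) {
    assert(n >= 0);
    if (n == 0) return 0;             // basis step: the empty sum
    return v[n - 1] + sum(v, n - 1);  // inductive step: last element plus the sum of the rest
}
```

For an array holding 3, 1, 4, 1, 5, the call `sum(v, 5)` adds `v[4]` to `sum(v, 4)` and so on down to the base case.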

• Proving correctness of an algorithm is very important
– a well designed algorithm is guaranteed to work correctly and its performance can be estimated
• Proving correctness of a program (an implementation) is fraught with weird bugs
– Abstract Data Types are a way to bridge the gap between mathematical algorithms and programs


GOAL: Sort a list of names

“I’ll buy a faster CPU”

“I’ll use C++ instead of Java – wicked fast!”

“Ooh look, the –O4 flag!”

“Who cares how I do it, I’ll add more memory!”

“Can’t I just get the data pre-sorted??”

• Ignores “details”
• What details?
– CPU speed
– Programming language used
– Amount of memory
– Compiler
– Order of input
– Size of input … sorta.

• Efficiency measure
– how long the program runs: time complexity
– how much memory it uses: space complexity
• Why analyze at all?
– Decide what algorithm to implement before actually doing it
– Given code, get a sense for where bottlenecks must be, without actually measuring it

• Complexity as a function of input size n
$T(n) = 4n + 5$
$T(n) = 0.5\, n \log n - 2n + 7$
$T(n) = 2^n + n^3 + 3n$

• What happens as n grows?

• Most algorithms are fast for small n
– Time difference too small to be noticeable
– External things dominate (OS, disk I/O, …)

• BUT n is often large in practice
– Databases, internet, graphics, …

• Difference really shows up as n grows!


[Figure: array containing 2 3 5 16 37 50 73 75 126]

bool ArrayFind( int array[], int n, int key ){
  // Insert your algorithm here
}

What algorithm would you choose to implement this code snippet?

Basic Java operations      Constant time
Consecutive statements     Sum of times
Conditionals               Larger branch plus test
Loops                      Sum of iterations
Function calls             Cost of function body
Recursive functions        Solve recurrence relation

bool LinearArrayFind( int array[], int n, int key ) {
  for( int i = 0; i < n; i++ ) {
    if( array[i] == key )
      // Found it!
      return true;
  }
  return false;
}

Best Case:
Worst Case:

bool BinArrayFind( int array[], int low, int high, int key ) {
  // The subarray is empty
  if( low > high ) return false;
  // Search this subarray recursively
  int mid = (high + low) / 2;
  if( key == array[mid] ) {
    return true;
  } else if( key < array[mid] ) {
    return BinArrayFind( array, low, mid-1, key );
  } else {
    return BinArrayFind( array, mid+1, high, key );
  }
}

Best case:
Worst case:

1. Determine the recurrence relation. What is/are the base case(s)?

2. “Expand” the original relation to find an equivalent general expression in terms of the number of expansions.

3. Find a closed-form expression by setting the number of expansions to a value which reduces the problem to a base case
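The three steps can be illustrated on the worst case of recursive binary search. The derivation below is a sketch under stated assumptions: n is a power of two, each call does 4 units of constant work before recursing on half the range, and the base case also costs 4. These constants are chosen so that the result lines up with the worst-case count of 4 log n + 4 quoted for binary search later in the deck.

```latex
\begin{align*}
T(1) &= 4 && \text{(base case)} \\
T(n) &= T(n/2) + 4 && \text{(recurrence, } n > 1\text{)} \\
     &= T(n/4) + 4 + 4 \\
     &= T(n/8) + 4 + 4 + 4 \\
     &= T(n/2^k) + 4k && \text{(after } k \text{ expansions)} \\
     &= T(1) + 4\log_2 n && \text{(set } k = \log_2 n\text{)} \\
     &= 4\log_2 n + 4
\end{align*}
```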

             Linear Search    Binary Search
Best Case    4 at [0]         4 at [middle]
Worst Case   3n+2             4 log n + 4

So … which algorithm is better?
What tradeoffs can you make?
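To get a feel for the worst-case counts, they can be evaluated for a few input sizes. This C++ sketch takes the formulas 3n + 2 and 4 log n + 4 as given; treating the logarithm as base 2 rounded down is an assumption, and the function names are hypothetical.

```cpp
#include <cassert>

// Worst-case step count for linear search: 3n + 2.
long linearWorst(long n) { return 3 * n + 2; }

// floor(log2 n), computed by repeated halving; assumes n >= 1.
long floorLog2(long n) {
    assert(n >= 1);
    long k = 0;
    while (n > 1) {
        n /= 2;
        ++k;
    }
    return k;
}

// Worst-case step count for binary search: 4 log n + 4.
long binaryWorst(long n) { return 4 * floorLog2(n) + 4; }
```

For n = 1024 this gives 3074 steps for linear search against 44 for binary search. The tradeoff is that binary search only works when the array is already sorted.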

• Asymptotic analysis looks at the order of the running time of the algorithm
– A valuable tool when the input gets “large”
– Ignores the effects of different machines or different implementations of an algorithm

• Intuitively, to find the asymptotic runtime, throw away the constants and low-order terms
– Linear search is $T(n) = 3n + 2 \in O(n)$
– Binary search is $T(n) = 4 \log_2 n + 4 \in O(\log n)$

Remember: the fastest algorithm has the slowest growing function for its runtime

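• Eliminate low order terms
– $4n + 5 \Rightarrow 4n$
– $0.5\, n \log n + 2n + 7 \Rightarrow 0.5\, n \log n$
– $n^3 + 2^n + 3n \Rightarrow 2^n$
• Eliminate coefficients
– $4n \Rightarrow n$
– $0.5\, n \log n \Rightarrow n \log n$
– $n \log n^2 = 2 n \log n \Rightarrow n \log n$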

• $\log AB = \log A + \log B$
– Proof: $A = 2^{\log_2 A}$, $B = 2^{\log_2 B}$
  $AB = 2^{\log_2 A} \cdot 2^{\log_2 B} = 2^{(\log_2 A + \log_2 B)}$
  $\therefore \log AB = \log A + \log B$
– Similarly:
  $\log(A/B) = \log A - \log B$
  $\log(A^B) = B \log A$

• Any log is equivalent to log-base-2 up to a constant factor: $\log_b n = \log_2 n / \log_2 b$


$f(n) = n^3 + 2n^2$
$g(n) = 100n^2 + 1000$

Although not yet apparent, as n gets “sufficiently large”, f(n) will be “greater than or equal to” g(n)
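The point where f(n) overtakes g(n) can be found by a direct scan. This is a small C++ sketch using the two functions defined above; the helper name `crossover` is hypothetical.

```cpp
// f(n) = n^3 + 2n^2 and g(n) = 100n^2 + 1000, as defined on the slide.
long long f(long long n) { return n * n * n + 2 * n * n; }
long long g(long long n) { return 100 * n * n + 1000; }

// Smallest n >= 0 with f(n) >= g(n), found by a linear scan.
long long crossover() {
    long long n = 0;
    while (f(n) < g(n)) ++n;
    return n;
}
```

The scan stops at n = 99. From there on f(n) - g(n) = n^2 (n - 98) - 1000 keeps growing, so f stays at or above g.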

• Upper bound: $T(n) = O(f(n))$ (Big-O)
There exist positive constants c and n’ such that $T(n) \le c\, f(n)$ for all $n \ge n'$

• Lower bound: $T(n) = \Omega(g(n))$ (Omega)
There exist positive constants c and n’ such that $T(n) \ge c\, g(n)$ for all $n \ge n'$

• Tight bound: $T(n) = \Theta(f(n))$ (Theta)
When both hold:
$T(n) = O(f(n))$
$T(n) = \Omega(f(n))$

$O(f(n))$: a set or class of functions

$g(n) \in O(f(n))$ iff there exist positive constants c and $n_0$ such that:

$g(n) \le c\, f(n)$ for all $n \ge n_0$

Example:
$100n^2 + 1000 \le 5(n^3 + 2n^2)$ for all $n \ge 19$

So $g(n) \in O(f(n))$
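The witness pair c = 5, n0 = 19 from this example can be sanity-checked numerically. The C++ sketch below scans a finite range only, so it supports the claim without proving it for all n; the helper name `witnessHolds` is hypothetical.

```cpp
// f(n) = n^3 + 2n^2 and g(n) = 100n^2 + 1000, as in the example.
long long f(long long n) { return n * n * n + 2 * n * n; }
long long g(long long n) { return 100 * n * n + 1000; }

// Checks g(n) <= c * f(n) for every n in [n0, limit].
// A finite scan is a sanity check, not a proof for all n.
bool witnessHolds(long long c, long long n0, long long limit) {
    for (long long n = n0; n <= limit; ++n) {
        if (g(n) > c * f(n)) return false;
    }
    return true;
}
```

With c = 5 the inequality holds from n = 19 upward over the scanned range and fails at n = 18, where g(18) = 33400 exceeds 5 f(18) = 32400.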

$100n^2 + 1000 \le 5(n^3 + 2n^2)$ for all $n \ge 19$

So $g(n) \in O(f(n))$, where $g(n) = 100n^2 + 1000$ and $f(n) = n^3 + 2n^2$

• Sometimes you’ll see
$g(n) = O(f(n))$
• This is equivalent to
$g(n) \in O(f(n))$

• What about the reverse?
$O(f(n)) = g(n)$

• constant: $O(1)$
• logarithmic: $O(\log n)$ ($\log_k n$, $\log n^2 \in O(\log n)$)
• linear: $O(n)$
• log-linear: $O(n \log n)$
• quadratic: $O(n^2)$
• cubic: $O(n^3)$
• polynomial: $O(n^k)$ (k is a constant)
• exponential: $O(c^n)$ (c is a constant > 1)

• $O(f(n))$ is the set of all functions asymptotically less than or equal to f(n)
– $o(f(n))$ is the set of all functions asymptotically strictly less than f(n)
• $\Omega(f(n))$ is the set of all functions asymptotically greater than or equal to f(n)
– $\omega(f(n))$ is the set of all functions asymptotically strictly greater than f(n)
• $\Theta(f(n))$ is the set of all functions asymptotically equal to f(n)

• $g(n) \in O(f(n))$ iff
There exist positive constants c and $n_0$ such that $g(n) \le c\, f(n)$ for all $n \ge n_0$
– $g(n) \in o(f(n))$ iff
For every c > 0 there exists an $n_0$ such that $g(n) < c\, f(n)$ for all $n \ge n_0$
Equivalent to: $\lim_{n \to \infty} g(n)/f(n) = 0$

• $g(n) \in \Omega(f(n))$ iff
There exist positive constants c and $n_0$ such that $g(n) \ge c\, f(n)$ for all $n \ge n_0$
– $g(n) \in \omega(f(n))$ iff
For every c > 0 there exists an $n_0$ such that $g(n) > c\, f(n)$ for all $n \ge n_0$
Equivalent to: $\lim_{n \to \infty} g(n)/f(n) = \infty$

• $g(n) \in \Theta(f(n))$ iff
$g(n) \in O(f(n))$ and $g(n) \in \Omega(f(n))$

Asymptotic Notation    Mathematics Relation
$O$                    $\le$
$\Omega$               $\ge$
$\Theta$               $=$
$o$                    $<$
$\omega$               $>$



• Running time may depend on actual data input, not just length of input
• Distinguish
– Worst Case: your worst enemy is choosing the input
– Best Case
– Average Case: assumes some probabilistic distribution of inputs
– Amortized: average time over many operations
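Amortized cost can be illustrated by counting the work done over a whole sequence of operations. The C++ sketch below assumes a growable array that starts with capacity 1 and doubles whenever it is full; this growth policy and the function name `copiesForAppends` are assumptions for illustration and do not come from the slides.

```cpp
#include <cstddef>

// Counts the element copies made by n appends to an array that starts
// with capacity 1 and doubles whenever it is full (assumed policy).
std::size_t copiesForAppends(std::size_t n) {
    std::size_t capacity = 1, size = 0, copies = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {
            copies += size;   // move every existing element to the new array
            capacity *= 2;
        }
        ++size;               // store the new element
    }
    return copies;
}
```

A single append can cost as many copies as there are elements, yet the total over n appends stays below 2n, so the average cost per append is constant.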

Two orthogonal axes:

• Bound Flavor
– Upper bound ($O$, $o$)
– Lower bound ($\Omega$, $\omega$)
– Asymptotically tight ($\Theta$)

• Analysis Case
– Worst Case (Adversary)
– Average Case
– Best Case
– Amortized

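$16n^3\log_8(10n^2) + 100n^2$
$\Rightarrow 16n^3\log_8(10n^2)$   (eliminate low-order terms)
$\Rightarrow n^3\log_8(10n^2)$   (eliminate constant coefficients)
$= n^3(\log_8 10 + \log_8 n^2)$
$= n^3\log_8 10 + n^3\log_8 n^2$
$\Rightarrow n^3\log_8 n^2$   (eliminate low-order terms)
$= 2n^3\log_8 n$
$\Rightarrow n^3\log_8 n$   (eliminate constant coefficients)
$= n^3\log_8(2)\log_2 n$
$= n^3\log_2(n)/3$
$\Rightarrow n^3\log n$   (eliminate constant coefficients)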
