Presentation: Introduction to Data Structures

Data Structures - Introduction

• The data structure you pick needs to support the operations you need
• Ideally, it supports the operations you will use most often in an efficient manner
• Examples of operations:
– A List with operations insert and delete
– A Stack with operations push and pop

• Abstract Data Type (ADT)
– Mathematical description of an object with a set of operations on the object. Useful building block.
• Algorithm
– A high-level, language-independent description of a step-by-step process
• Data structure
– A specific family of algorithms for implementing an abstract data type
• Implementation of data structure
– A specific implementation in a specific language

• A stack is an abstract data type supporting push, pop and isEmpty operations
• A stack data structure could use an array, a linked list, or anything that can hold data
• One stack implementation is java.util.Stack; another is java.util.LinkedList

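As a minimal C++ sketch of the ADT versus data structure distinction (the slides' code is C++-style; the names IntStack and ArrayStack are hypothetical, not from the slides), the abstract class below states only the stack operations, and ArrayStack is one data structure that provides them, assuming a std::vector as the backing array.

```cpp
#include <stdexcept>
#include <vector>

// The ADT: only the operations, no representation.
class IntStack {
public:
    virtual ~IntStack() = default;
    virtual void push(int x) = 0;
    virtual int pop() = 0;             // removes and returns the top element
    virtual bool isEmpty() const = 0;
};

// One data structure for the ADT: a growable array (std::vector).
class ArrayStack : public IntStack {
public:
    void push(int x) override { data_.push_back(x); }  // top is the last slot

    int pop() override {
        if (data_.empty()) throw std::out_of_range("pop on empty stack");
        int top = data_.back();
        data_.pop_back();
        return top;
    }

    bool isEmpty() const override { return data_.empty(); }

private:
    std::vector<int> data_;
};
```

Code written against IntStack& would keep working if ArrayStack were swapped for a linked-list class implementing the same three operations.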
Abstract
• Pseudocode
• Algorithm
– A sequence of high-level, language-independent operations, which may act upon an abstracted view of data.
• Abstract Data Type (ADT)
– A mathematical description of an object and the set of operations on the object.

Concrete
• Specific programming language
• Program
– A sequence of operations in a specific programming language, which may act upon real data in the form of numbers, images, sound, etc.
• Data structure
– A specific way in which a program’s data is represented, which reflects the programmer’s design choices/goals.

Ideal data structure: “fast”, “elegant”, memory efficient
Generates tensions:
• time vs. space
• performance vs. elegance
• generality vs. simplicity
• one operation’s performance vs. another’s
The study of data structures is the study of tradeoffs. That’s why we have so many of them!

• Introductions
• Administrative Info
• What is this course about?
• Review: Queues and stacks

• FIFO: First In First Out
• Queue operations
– create
– destroy
– enqueue
– dequeue
– is_empty
(Figure: G is enqueued at the back of the queue F E D C B, while A is dequeued from the front.)

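(Figure: a circular array Q with indices 0 to size - 1, holding b c d e f between the front and back indices.)
void enqueue(Object x) {
    Q[back] = x;
    back = (back + 1) % size;    // wrap around to index 0
}
Object dequeue() {
    Object x = Q[front];
    front = (front + 1) % size;  // wrap around to index 0
    return x;
}
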
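The enqueue/dequeue code above does not say how a full or an empty array is detected, and front == back can mean either. Below is a minimal C++ sketch of a complete circular-array queue, assuming int elements, a fixed capacity, and a separate element count (the class name CircularQueue is hypothetical); another common design leaves one slot unused instead of keeping a count.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Fixed-capacity FIFO queue stored in a circular array.
// Assumption: count_ distinguishes "full" from "empty",
// since front_ == back_ holds in both cases.
class CircularQueue {
public:
    explicit CircularQueue(std::size_t capacity) : q_(capacity) {}

    bool is_empty() const { return count_ == 0; }
    bool is_full() const { return count_ == q_.size(); }

    void enqueue(int x) {
        if (is_full()) throw std::overflow_error("queue is full");
        q_[back_] = x;
        back_ = (back_ + 1) % q_.size();    // wrap around to index 0
        ++count_;
    }

    int dequeue() {
        if (is_empty()) throw std::underflow_error("queue is empty");
        int x = q_[front_];
        front_ = (front_ + 1) % q_.size();  // wrap around to index 0
        --count_;
        return x;
    }

private:
    std::vector<int> q_;
    std::size_t front_ = 0;  // index of the oldest element
    std::size_t back_ = 0;   // index where the next element will go
    std::size_t count_ = 0;  // number of stored elements
};
```

Both operations are O(1); the cost of this design is that the capacity is fixed up front.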
(Figure: a linked list b → c → d → e → f, with front pointing at b and back at f.)
void enqueue(Object x) {
    if (is_empty()) {
        front = back = new Node(x);
    } else {
        back->next = new Node(x);
        back = back->next;
    }
}
Object dequeue() {
    assert(!is_empty());
    Object return_data = front->data;
    Node* temp = front;
    front = front->next;
    delete temp;
    return return_data;
}
bool is_empty() {
    return front == nullptr;
}

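A self-contained C++ version of the linked-list queue, assuming int elements (LinkedQueue is a hypothetical name). Compared with the slide's pseudocode, it also resets back when the last node is dequeued, so no pointer is left dangling, and its destructor frees any nodes still in the queue.

```cpp
#include <cassert>

// Singly linked FIFO queue: enqueue at back, dequeue at front, both O(1).
class LinkedQueue {
public:
    LinkedQueue() = default;
    LinkedQueue(const LinkedQueue&) = delete;             // copying disabled
    LinkedQueue& operator=(const LinkedQueue&) = delete;  // to keep it short
    ~LinkedQueue() {
        while (!is_empty()) dequeue();  // free every remaining node
    }

    bool is_empty() const { return front_ == nullptr; }

    void enqueue(int x) {
        Node* n = new Node{x, nullptr};
        if (is_empty()) {
            front_ = back_ = n;
        } else {
            back_->next = n;
            back_ = n;
        }
    }

    int dequeue() {
        assert(!is_empty());
        Node* temp = front_;
        int data = temp->data;
        front_ = front_->next;
        if (front_ == nullptr) back_ = nullptr;  // queue became empty
        delete temp;
        return data;
    }

private:
    struct Node {
        int data;
        Node* next;
    };
    Node* front_ = nullptr;
    Node* back_ = nullptr;
};
```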
Circular array:
• Too much space
• Kth element accessed “easily”
• Not as complex
• Could make array more robust
Linked list:
• Can grow as needed
• Can keep growing
• No back looping around to front
• Linked list code more complex

• LIFO: Last In First Out
• Stack operations
– create
– destroy
– push
– pop
– top
– is_empty
(Figure: a stack holding A through E, with F pushed onto and then popped off the top.)

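A sketch of the stack operations listed above on a singly linked list, in C++ with char elements to mirror the A to F figure (LinkedStack is a hypothetical name). create and destroy map to the constructor and destructor; push, pop and top all work at the head of the list, so each is O(1).

```cpp
#include <stdexcept>

// LIFO stack on a singly linked list; the head node is the top.
class LinkedStack {
public:
    LinkedStack() = default;                              // create
    LinkedStack(const LinkedStack&) = delete;
    LinkedStack& operator=(const LinkedStack&) = delete;
    ~LinkedStack() {                                      // destroy
        while (!is_empty()) pop();
    }

    bool is_empty() const { return head_ == nullptr; }

    void push(char x) { head_ = new Node{x, head_}; }     // new node becomes top

    char top() const {
        if (is_empty()) throw std::out_of_range("top of empty stack");
        return head_->data;
    }

    char pop() {
        if (is_empty()) throw std::out_of_range("pop of empty stack");
        Node* old = head_;
        char x = old->data;
        head_ = old->next;
        delete old;
        return x;
    }

private:
    struct Node {
        char data;
        Node* next;
    };
    Node* head_ = nullptr;
};
```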
• Function call stack
• Removing recursion
• Balancing symbols (parentheses)
• Evaluating Reverse Polish Notation

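Two of the applications above can be sketched with std::stack. The function names balanced and evalRPN are hypothetical; evalRPN assumes space-separated integer tokens and the four binary operators + - * /, with integer division and no division-by-zero check.

```cpp
#include <sstream>
#include <stack>
#include <stdexcept>
#include <string>

// Balancing symbols: push each opener; each closer must match the top.
bool balanced(const std::string& s) {
    std::stack<char> st;
    for (char c : s) {
        if (c == '(' || c == '[' || c == '{') {
            st.push(c);
        } else if (c == ')' || c == ']' || c == '}') {
            if (st.empty()) return false;       // closer with no opener
            char open = st.top();
            st.pop();
            if ((c == ')' && open != '(') || (c == ']' && open != '[') ||
                (c == '}' && open != '{'))
                return false;                   // mismatched pair
        }
    }
    return st.empty();  // leftover openers mean unbalanced
}

// Reverse Polish Notation: operands are pushed; an operator pops two
// operands and pushes the result.
long evalRPN(const std::string& expr) {
    std::stack<long> st;
    std::istringstream in(expr);
    std::string tok;
    while (in >> tok) {
        if (tok == "+" || tok == "-" || tok == "*" || tok == "/") {
            if (st.size() < 2) throw std::invalid_argument("missing operand");
            long rhs = st.top(); st.pop();
            long lhs = st.top(); st.pop();
            if (tok == "+") st.push(lhs + rhs);
            else if (tok == "-") st.push(lhs - rhs);
            else if (tok == "*") st.push(lhs * rhs);
            else st.push(lhs / rhs);
        } else {
            st.push(std::stol(tok));  // throws if the token is not a number
        }
    }
    if (st.size() != 1) throw std::invalid_argument("malformed expression");
    return st.top();
}
```

For example, "5 1 2 + 4 * + 3 -" is the RPN form of 5 + (1 + 2) * 4 - 3.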
• Correctness:
– Does the algorithm do what is intended?
• Performance:
– What is the running time of the algorithm?
– How much storage does it consume?
• Different algorithms may all be correct
– Which should I use?

• Write a recursive function to find the sum of the first n integers stored in array v.

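One possible answer, as a C++ sketch (taking v as a std::vector<int> is an assumption): the empty prefix sums to 0, and otherwise the sum of the first n elements is the last of them, v[n-1], plus the sum of the first n-1.

```cpp
#include <cstddef>
#include <vector>

// Returns v[0] + v[1] + ... + v[n-1].
// Assumes 0 <= n <= v.size().
int sum(const std::vector<int>& v, std::size_t n) {
    if (n == 0) return 0;             // base case: the empty sum is 0
    return v[n - 1] + sum(v, n - 1);  // last element plus sum of the first n-1
}
```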
• Basis Step: The algorithm is correct for a base case or two by inspection.
• Inductive Hypothesis (n=k): Assume that the algorithm works correctly for the first k cases.
• Inductive Step (n=k+1): Given the hypothesis above, show that the k+1 case will be calculated correctly.

• Basis Step:
sum(v,0) = 0 ✓
• Inductive Hypothesis (n=k):
Assume sum(v,k) correctly returns the sum of the first k elements of v, i.e. v[0]+v[1]+…+v[k-1]
• Inductive Step (n=k+1):
sum(v,k+1) returns
v[k]+sum(v,k) = (by inductive hyp.)
v[k]+(v[0]+v[1]+…+v[k-1]) =
v[0]+v[1]+…+v[k-1]+v[k] ✓

• Proving correctness of an algorithm is very important
– A well-designed algorithm is guaranteed to work correctly, and its performance can be estimated
• Proving correctness of a program (an implementation) is fraught with weird bugs
• Abstract Data Types are a way to bridge the gap between mathematical algorithms and programs

GOAL: Sort a list of names
“I’ll buy a faster CPU”
“I’ll use C++ instead of Java – wicked fast!”
“Ooh look, the -O4 flag!”
“Who cares how I do it, I’ll add more memory!”
“Can’t I just get the data pre-sorted??”

• Algorithm analysis ignores “details”
• What details?
– CPU speed
– Programming language used
– Amount of memory
– Compiler
– Order of input
– Size of input … sorta.

• Efficiency measure
– how long the program runs: time complexity
– how much memory it uses: space complexity
• Why analyze at all?
– Decide what algorithm to implement before actually doing it
– Given code, get a sense for where bottlenecks must be, without actually measuring it

• Complexity as a function of input size n
$T(n) = 4n + 5$
$T(n) = 0.5\,n \log n - 2n + 7$
$T(n) = 2^n + n^3 + 3n$
• What happens as n grows?

• Most algorithms are fast for small n
– Time difference too small to be noticeable
– External things dominate (OS, disk I/O, …)
• BUT n is often large in practice
– Databases, internet, graphics, …
• Difference really shows up as n grows!

2 3 5 16 37 50 73 75 126
bool ArrayFind(int array[], int n, int key) {
    // Insert your algorithm here
}
What algorithm would you choose to implement this code snippet?

Code construct             Analysis rule
Basic Java operations      Constant time
Consecutive statements     Sum of times
Conditionals               Larger branch plus test
Loops                      Sum of iterations
Function calls             Cost of function body
Recursive functions        Solve recurrence relation

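As a small worked example of the “Loops: sum of iterations” rule, this hypothetical C++ function counts how often the body of a pair-comparing nested loop runs. The inner loop runs n-1-i times for each i, so the total is (n-1) + (n-2) + … + 0 = n(n-1)/2, which is O(n^2).

```cpp
#include <cstdint>

// Counts executions of the constant-time inner body of a nested loop
// that visits every pair (i, j) with i < j.
std::uint64_t pairSteps(std::uint64_t n) {
    std::uint64_t steps = 0;
    for (std::uint64_t i = 0; i < n; i++) {          // outer loop: n iterations
        for (std::uint64_t j = i + 1; j < n; j++) {  // inner loop: n-1-i iterations
            steps++;                                 // constant-time body
        }
    }
    return steps;
}
```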
bool LinearArrayFind(int array[], int n, int key) {
    for (int i = 0; i < n; i++) {
        if (array[i] == key)
            return true;  // Found it!
    }
    return false;
}
Best Case:
Worst Case:

bool BinArrayFind(int array[], int low, int high, int key) {
    // The subarray is empty
    if (low > high) return false;
    // Search this subarray recursively
    int mid = low + (high - low) / 2;  // equals (high + low) / 2 for non-negative indices, without int overflow
    if (key == array[mid]) {
        return true;
    } else if (key < array[mid]) {
        return BinArrayFind(array, low, mid - 1, key);
    } else {
        return BinArrayFind(array, mid + 1, high, key);
    }
}
Best case:
Worst case:

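Because each call of BinArrayFind makes at most one recursive call, and that call is the last thing it does, the recursion can be removed by updating low and high in a loop (one instance of “removing recursion” from the stack applications). A C++ sketch, with IterBinArrayFind as a hypothetical name:

```cpp
// Iterative binary search over the sorted range array[low..high].
// Returns true if key is present.
bool IterBinArrayFind(const int array[], int low, int high, int key) {
    while (low <= high) {                  // subarray is non-empty
        int mid = low + (high - low) / 2;  // avoids overflow of high + low
        if (key == array[mid]) {
            return true;
        } else if (key < array[mid]) {
            high = mid - 1;                // continue in the left half
        } else {
            low = mid + 1;                 // continue in the right half
        }
    }
    return false;                          // subarray became empty
}
```

On the sorted array 2 3 5 16 37 50 73 75 126, IterBinArrayFind(a, 0, 8, 37) returns true and IterBinArrayFind(a, 0, 8, 4) returns false; no call stack grows with n.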
1. Determine the recurrence relation. What is/are the base case(s)?
2. “Expand” the original relation to find an equivalent general expression in terms of the number of expansions.
3. Find a closed-form expression by setting the number of expansions to a value which reduces the problem to a base case.

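Applying the three steps to BinArrayFind, under stated assumptions: n is a power of 2, each call does a constant 4 steps besides its recursive call, and the base case is T(1) = 4. These constants are chosen to agree with the worst-case count 4 log n + 4 quoted for binary search; a different cost model would change the constants but not the log n shape.

```latex
\begin{aligned}
\text{1. Recurrence:}\quad & T(n) = T(n/2) + 4 \quad (n > 1), \qquad T(1) = 4\\
\text{2. Expand:}\quad & T(n) = T(n/4) + 2\cdot 4 = T(n/8) + 3\cdot 4 = \dots = T(n/2^k) + 4k\\
\text{3. Closed form:}\quad & k = \log_2 n \;\Rightarrow\; n/2^k = 1 \;\Rightarrow\; T(n) = T(1) + 4\log_2 n = 4\log_2 n + 4
\end{aligned}
```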
              Linear Search    Binary Search
Best Case     4 at [0]         4 at [middle]
Worst Case    3n+2             4 log n + 4
So … which algorithm is better?
What tradeoffs can you make?

• Asymptotic analysis looks at the order of the running time of the algorithm
– A valuable tool when the input gets “large”
– Ignores the effects of different machines or different implementations of an algorithm
• Intuitively, to find the asymptotic runtime, throw away the constants and low-order terms
– Linear search is $T(n) = 3n + 2 \in O(n)$
– Binary search is $T(n) = 4 \log_2 n + 4 \in O(\log n)$
Remember: the fastest algorithm has the slowest growing function for its runtime

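• Eliminate low-order terms
– $4n + 5 \Rightarrow 4n$
– $0.5\,n \log n + 2n + 7 \Rightarrow 0.5\,n \log n$
– $n^3 + 2^n + 3n \Rightarrow 2^n$
• Eliminate coefficients
– $4n \Rightarrow n$
– $0.5\,n \log n \Rightarrow n \log n$
– $n \log n^2 \Rightarrow n \log n$ (since $\log n^2 = 2 \log n$)
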
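• $\log AB = \log A + \log B$
• Proof: $A = 2^{\log_2 A}$ and $B = 2^{\log_2 B}$, so
$AB = 2^{\log_2 A} \cdot 2^{\log_2 B} = 2^{\log_2 A + \log_2 B}$
Taking $\log_2$ of both sides: $\log AB = \log A + \log B$
• Similarly:
– $\log(A/B) = \log A - \log B$
– $\log(A^B) = B \log A$
• Any log is equivalent to log-base-2 up to a constant factor: $\log_k n = \log_2 n / \log_2 k$, so the base does not matter inside $O(\cdot)$
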
$f(n) = n^3 + 2n^2$
$g(n) = 100n^2 + 1000$
Although not yet apparent, as n gets “sufficiently large”, f(n) will be “greater than or equal to” g(n)

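A quick C++ check of where f overtakes g (the function names f, g and crossover are hypothetical). Algebraically, f(n) - g(n) = n^2(n - 98) - 1000, which is negative for n ≤ 98 and positive and increasing for every n ≥ 99, so “sufficiently large” here means n ≥ 99.

```cpp
#include <cstdint>

// f(n) = n^3 + 2n^2 and g(n) = 100n^2 + 1000 from the example.
std::int64_t f(std::int64_t n) { return n * n * n + 2 * n * n; }
std::int64_t g(std::int64_t n) { return 100 * n * n + 1000; }

// Smallest n >= 1 with f(n) >= g(n), found by linear scan.
std::int64_t crossover() {
    std::int64_t n = 1;
    while (f(n) < g(n)) n++;
    return n;
}
```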
• Upper bound: $T(n) = O(f(n))$ (Big-O)
Exist positive constants c and $n'$ such that $T(n) \le c\,f(n)$ for all $n \ge n'$
• Lower bound: $T(n) = \Omega(g(n))$ (Omega)
Exist positive constants c and $n'$ such that $T(n) \ge c\,g(n)$ for all $n \ge n'$
• Tight bound: $T(n) = \Theta(f(n))$ (Theta)
When both hold:
$T(n) = O(f(n))$
$T(n) = \Omega(f(n))$

$O(f(n))$: a set or class of functions
$g(n) \in O(f(n))$ iff there exist positive constants c and $n_0$ such that:
$g(n) \le c\,f(n)$ for all $n \ge n_0$
Example:
$100n^2 + 1000 \le 5(n^3 + 2n^2)$ for all $n \ge 19$
So $g(n) \in O(f(n))$

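The witness c = 5, n0 = 19 can be checked directly: 5(n^3 + 2n^2) - (100n^2 + 1000) = 5n^2(n - 18) - 1000, which equals 805 at n = 19 and only grows afterwards, while at n = 18 it is -1000, so 19 is the smallest n0 that works with c = 5. The hypothetical C++ helper below spot-checks the inequality over a finite range; it cannot by itself prove the “for all n” part.

```cpp
#include <cstdint>

// Checks 100n^2 + 1000 <= c * (n^3 + 2n^2) for every n in [n0, limit].
// A finite check only spot-checks the "for all n >= n0" claim.
bool witnessHolds(std::int64_t c, std::int64_t n0, std::int64_t limit) {
    for (std::int64_t n = n0; n <= limit; n++) {
        std::int64_t g = 100 * n * n + 1000;
        std::int64_t f = n * n * n + 2 * n * n;
        if (g > c * f) return false;  // inequality fails at this n
    }
    return true;
}
```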
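$100n^2 + 1000 \le 5(n^3 + 2n^2)$ for all $n \ge 19$
So is $f(n) \in O(g(n))$? No: this inequality bounds g by a multiple of f, not the other way around, and since $f(n)/g(n)$ grows without bound, no constant c gives $n^3 + 2n^2 \le c(100n^2 + 1000)$ for all large n. So $f(n) \notin O(g(n))$.
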
• Sometimes you’ll see
$g(n) = O(f(n))$
• This is equivalent to
$g(n) \in O(f(n))$
• What about the reverse?
$O(f(n)) = g(n)$

• constant: $O(1)$
• logarithmic: $O(\log n)$ ($\log_k n$, $\log n^2 \in O(\log n)$)
• linear: $O(n)$
• log-linear: $O(n \log n)$
• quadratic: $O(n^2)$
• cubic: $O(n^3)$
• polynomial: $O(n^k)$ (k is a constant)
• exponential: $O(c^n)$ (c is a constant > 1)

• $O(f(n))$ is the set of all functions asymptotically less than or equal to f(n)
• $o(f(n))$ is the set of all functions asymptotically strictly less than f(n)
• $\Omega(f(n))$ is the set of all functions asymptotically greater than or equal to f(n)
• $\omega(f(n))$ is the set of all functions asymptotically strictly greater than f(n)
• $\Theta(f(n))$ is the set of all functions asymptotically equal to f(n)

• $g(n) \in O(f(n))$ iff
there exist $c > 0$ and $n_0$ such that $g(n) \le c\,f(n)$ for all $n \ge n_0$
• $g(n) \in o(f(n))$ iff
for every $c > 0$ there exists an $n_0$ such that $g(n) < c\,f(n)$ for all $n \ge n_0$
Equivalent to: $\lim_{n \to \infty} g(n)/f(n) = 0$
• $g(n) \in \Omega(f(n))$ iff
there exist $c > 0$ and $n_0$ such that $g(n) \ge c\,f(n)$ for all $n \ge n_0$
• $g(n) \in \omega(f(n))$ iff
for every $c > 0$ there exists an $n_0$ such that $g(n) > c\,f(n)$ for all $n \ge n_0$
Equivalent to: $\lim_{n \to \infty} g(n)/f(n) = \infty$
• $g(n) \in \Theta(f(n))$ iff
$g(n) \in O(f(n))$ and $g(n) \in \Omega(f(n))$

Asymptotic notation    Mathematics relation
$O$                    $\le$
$\Omega$               $\ge$
$\Theta$               $=$
$o$                    $<$
$\omega$               $>$

• Running time may depend on actual data input, not just length of input
• Distinguish
– Worst Case: your worst enemy is choosing input
– Best Case
– Average Case: assumes some probabilistic distribution of inputs
– Amortized: average time over many operations

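A common illustration of amortized analysis, sketched here under the assumption of a dynamic array that starts with capacity 1 and doubles when full (real containers such as std::vector use an implementation-defined growth factor). The hypothetical function copiesForPushes counts element copies across n pushes: the copies total 1 + 2 + 4 + …, which stays below 2n, so each push costs O(1) amortized even though a single push that triggers a resize costs O(n).

```cpp
#include <cstddef>

// Simulates n pushes onto an array with capacity 1 that doubles when
// full, and returns the total number of element copies made by resizes.
std::size_t copiesForPushes(std::size_t n) {
    std::size_t capacity = 1, size = 0, copies = 0;
    for (std::size_t i = 0; i < n; i++) {
        if (size == capacity) {  // full: allocate 2x space, copy everything
            copies += size;
            capacity *= 2;
        }
        size++;                  // the push itself: constant time
    }
    return copies;
}
```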
Two orthogonal axes:
• Bound Flavor
– Upper bound ($O$, $o$)
– Lower bound ($\Omega$, $\omega$)
– Asymptotically tight ($\Theta$)
• Analysis Case
– Worst Case (Adversary)
– Average Case
– Best Case
– Amortized

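$$
\begin{aligned}
16n^3\log_8(10n^2) + 100n^2 &\Rightarrow 16n^3\log_8(10n^2) && \text{eliminate low-order terms}\\
&\Rightarrow n^3\log_8(10n^2) && \text{eliminate constant coefficients}\\
&= n^3\left(\log_8 10 + \log_8 n^2\right)\\
&= n^3\log_8 10 + n^3\log_8 n^2\\
&\Rightarrow n^3\log_8 n^2 && \text{eliminate low-order terms}\\
&= 2n^3\log_8 n\\
&\Rightarrow n^3\log_8 n && \text{eliminate constant coefficients}\\
&= n^3\,\log_8 2\,\log_2 n\\
&= \tfrac{1}{3}\,n^3\log_2 n\\
&\Rightarrow n^3\log n && \text{eliminate constant coefficients}
\end{aligned}
$$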