Data Structures Unit 1
• Data types
• Variables
• Operators
• Input/output
• Conditional and control statements
• Pointers
• Arrays
• Strings
• Good logical thinking
• Before writing code for a problem, first write pseudocode and then implement it.
Course Objectives
• To impart the basic concepts of linear and nonlinear data structures such as stacks, queues, lists, trees and graphs, and their operations.
Reference Books:
• Horowitz and Sahni, Fundamentals of Data Structures, Universities Press, 2nd Edition, 2008.
Applications of Linked Lists
• Images are linked with each other, so image viewer software uses a linked list to show the previous and next images via the previous and next buttons.
• Web pages can be accessed through previous and next URL links, which are linked using a linked list.
• Music players use the same technique to switch between tracks.
• To keep track of turns in a multiplayer game, a circular linked list is used.
Stacks
• Temporary storage structure for recursive operations
• Auxiliary storage for nested operations, function calls, and deferred/postponed functions
• Managing function calls
• Evaluation of arithmetic expressions in various programming languages
• Conversion of infix expressions into postfix expressions
• Checking the syntax of expressions in a programming environment
• Matching of parentheses
• String reversal
• All solutions based on backtracking
• Depth-first search in graph and tree traversal
• Operating system functions
• UNDO and REDO functions in an editor
• Infix: [a*(b-c)/d]+e
• With a=2, b=3, c=1, d=2, e=1: [2*(3-1)/2]+1 == 3
• Operators encountered: - / * +
• Postfix form: 231-*2/1+
• Parenthesis matching: check whether a string such as [[((()))]([)]] is balanced (it is not: ([)] is mismatched)
• INFIX: (B - C)    PREFIX: -BC    POSTFIX: BC-
Performance evaluation
– Performance analysis
– Performance measurement
Performance analysis (a priori)
– An important branch of CS: complexity theory
– Estimates time and space requirements
– Machine independent
Performance measurement (a posteriori)
– The actual time and space requirements
– Machine dependent
Space and time
– Does the program efficiently use primary and secondary storage?
– Is the program's running time acceptable for the task?
Evaluating a program generally
– Does the program meet the original specifications of the task?
– Does it work correctly?
– Does the program contain documentation that shows how to use it and how it works?
– Does the program effectively use functions to create logical units?
– Is the program's code readable?
In short, a good program should meet its specifications, work correctly, have a good user interface, be well documented, be readable, use functions effectively, have an acceptable running time, and use space efficiently.
Definition
– The space complexity of a program is the amount of memory it needs to run to completion.
The space needed is the sum of
– Fixed space and variable space
Fixed space
– Includes the instructions, variables, and constants
– Independent of the number and size of inputs and outputs
Variable space
– Includes dynamically allocated space and the recursion stack of functions
Total space of any program P
– S(P) = c + Sp(instance characteristics)
Examples of Evaluating Space Complexity

float abc(float a, float b, float c)
{
    return a + b + b*c + (a+b-c)/(a+b) + 4.00;
}

Fixed space only: three float parameters (12 bytes) + return value (4 bytes) + return address (4 bytes) = 20 bytes. The variable space Sabc(I) = 0.

float sum(float list[], int n)
{
    float fTmpSum = 0;
    int i;
    for (i = 0; i < n; i++)
        fTmpSum += list[i];
    return fTmpSum;
}

Ssum(n) = 0: the array is passed by reference and the locals occupy fixed space, so no variable space is needed.

float rsum(float list[], int n)
{
    if (n) return rsum(list, n-1) + list[n-1];
    return 0;
}

Each recursive call needs: parameter float list[] (4 bytes) + parameter int n (4 bytes) + return address (4 bytes) + return value (4 bytes) = 16 bytes. With n levels of recursion, Srsum(n) = 16n.
Time Complexity
Definition
– The time complexity, T(P), of a program P is the sum of its compile time and its run time.
Total time
– T(P) = compile time + run (or execution) time
       = c + tp(instance characteristics)
– Compile time does not depend on the instance characteristics.
How to evaluate?
– Use the system clock (machine dependent)
– Count the number of steps performed (machine independent)
Definition of a program step
– A program step is a syntactically or semantically meaningful program segment whose execution time is independent of the instance characteristics.
– (10 additions can be one step; 100 multiplications can also be one step.)
Types of Algorithm Analysis
Order of Growth
Hash Tables
Motivation
• A collision occurs when h(x) maps two keys to the same location.
[Figure: keys k1 … k5 from the universe of keys U (actual keys K) are hashed into table slots 0 … p-1; h(k2) = h(k5), a collision.]
Hash Functions
• A hash function h maps keys of a given type to integers in a fixed interval [0, N-1].
• Example: h(x) = x mod N is a hash function for integer keys.
• The integer h(x) is called the hash value of x.
• A hash table for a given key type consists of:
  • a hash function h
  • an array (called the table) of size N
• The goal is to store an item with key k at index i = h(k).
Example
• We design a hash table storing employee records using their social security number (SSN) as the key.
• An SSN is a nine-digit positive integer.
• Our hash table uses an array of size N = 10,000 and the hash function h(x) = last four digits of x.
• For example:
  index 1    → 025-612-0001
  index 2    → 981-101-0002
  index 4    → 451-229-0004
  …
  index 9998 → 200-751-9998
Example
• Our hash table uses an array of size N = 100.
• We have n = 49 employees.
• We need a method to handle collisions (e.g., 200-751-9998 and 176-354-9998 hash to the same index, 9998).
• As long as the chance of collision is low, we can achieve the goal.
• Setting N = 1000 and looking at the last four digits will reduce the chance of collision.
Collisions
• Can collisions be avoided?
• If my data is reversible, yes
• See perfect hashing for the case where the set of keys is static
• In general, no.
• Two primary techniques for resolving collisions:
• Chaining – keep a collection at each key slot.
• Open addressing – if the current slot is full use the next open one.
Chaining
• Chaining puts elements that hash to the same slot in a linked list:
[Figure: actual keys k1 … k8 from the universe of keys U; the slot chains are k1 → k4, k5 → k2 → k7, k3, and k8 → k6; all other slots are empty.]
Chaining
• How do we insert an element?
Chaining
• How do we delete an element?
• Do we need a doubly-linked list for efficient delete?
Chaining
• How do we search for an element with a given key?
Open Addressing
• Basic idea:
• To insert: if slot is full, try another slot, …, until an open slot is found (probing)
• To search, follow same sequence of probes as would be used when inserting
the element
• If reach element with correct key, return it
• If reach a NULL pointer, element is not in table
• Good for fixed sets
• Example: spell checking
Open Addressing
• The colliding item is placed in a different cell of the
table.
• No dynamic memory.
• Fixed Table size.
• Load factor: n/N, where n is the number of items to store and N the size of the hash table.
• Clearly, n ≤ N, so n/N ≤ 1.
• To get reasonable performance, keep n/N < 0.5.
Probing
• The key question is: which cell should be tried next?
• A random choice would be great, but we need to be able to repeat the sequence.
• Three common techniques:
• Linear Probing (useful for discussion only)
• Quadratic Probing
• Double Hashing
Linear Probing
• Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell.
• Each table cell inspected is referred to as a probe.
• Colliding items lump together, causing future collisions to produce longer sequences of probes.
• Example: h(x) = x mod 13; insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order:

  index:  0   1   2   3   4   5   6   7   8   9  10  11  12
  key:    -   -  41   -   -  18  44  59  32  22  31  73   -
Search with Linear Probing
• Consider a hash table A that uses linear probing.
• get(k):
  • We start at cell h(k).
  • We probe consecutive locations until one of the following occurs:
    • an item with key k is found, or
    • an empty cell is found, or
    • N cells have been unsuccessfully probed.
• To ensure efficiency, if k is not in the table, we want to find an empty cell as soon as possible. The load factor can NOT be close to 1.

Algorithm get(k):
    i ← h(k)
    p ← 0
    repeat
        c ← A[i]
        if c = ∅ then
            return null
        else if c.key() = k then
            return c.element()
        else
            i ← (i + 1) mod N
            p ← p + 1
    until p = N
    return null
Linear Probing
• Example: h(x) = x mod 13; insert keys 18, 41, 22, 44, 59, 32, 31, 73, 12, 20 in this order:

  index:  0   1   2   3   4   5   6   7   8   9  10  11  12
  key:   20   -  41   -   -  18  44  59  32  22  31  73  12

• Search for key = 20: h(20) = 20 mod 13 = 7. Probe ranks 7, 8, …, 12, then wrap to rank 0, where 20 is found.
• Search for key = 15: h(15) = 15 mod 13 = 2. Probe ranks 2 and 3; rank 3 is empty, so return null.
Double Hashing
• Use two hash functions.
• If M is prime, the probe sequence will eventually examine every position in the table.

double_hash_insert(K):
    if table is full, error
    probe = h1(K)
    offset = h2(K)
    while table[probe] is occupied:
        probe = (probe + offset) mod M
    table[probe] = K
Double Hashing
• Many of same (dis)advantages as linear probing
• Distributes keys more uniformly than linear probing does
• Notes:
• h2(x) should never return zero.
• M should be prime.
Double Hashing Example
• h1(K) = K mod 13
• h2(K) = 8 - (K mod 8)
• We want h2 to be an offset to add.
• Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order:

  index:  0   1   2   3   4   5   6   7   8   9  10  11  12
  key:   44   -  41  73   -  18  32  59  31  22   -   -   -
Open Addressing Summary
• In general, the hash function contains two arguments now:
• Key value
• Probe number
h(k,p), p=0,1,...,m-1
• Probe sequences
<h(k,0), h(k,1), ..., h(k,m-1)>
• Should be a permutation of <0,1,...,m-1>
• There are m! possible permutations
• Good hash functions should be able to produce all m! probe sequences
Open Addressing Summary
• None of the methods discussed can generate more than m² different probe sequences.
• Linear Probing:
  • The initial position determines the entire sequence, so only m distinct probe sequences.
• Quadratic Probing:
  • The initial key determines a fixed probe sequence, so only m distinct probe sequences.
• Double Hashing:
  • Each possible pair (h1(k), h2(k)) yields a distinct probe sequence, so Θ(m²) probe sequences.
Choosing A Hash Function
• Clearly choosing the hash function well is crucial.
• What will a worst-case hash function do?
• What will be the time to search in this case?
• What are desirable features of the hash function?
• Should distribute keys uniformly into slots
• Should not depend on patterns in the data
From Keys to Indices
• A hash function is usually the composition of two maps:
  • hash code map: key → integer
  • compression map: integer → [0, N - 1]
• An essential requirement of the hash function is to map equal keys to equal indices.
• A "good" hash function minimizes the probability of collisions.
Problem 1: