Data Structures Unit 1

Data Structures

Prerequisites for Data Structures

• Programming Languages For Data Structures

• Most students face a dilemma: is learning one programming language enough?

• Which programming languages should be learned?


• C
• C++
• Java
• Python
Prerequisites (contd.)

• Which concepts should be learned?

• Data types
• Variables
• Operators
• Input/ output
• Conditional and Control statements
• Pointers
• Arrays
• Strings
• Good Logical thinking

• Improve logical and analytical skills for approaching a problem.

• Logic building comes from practicing many problems.

• So, try to practice as many coding problems as you can every day.

• Before writing code for a problem, first write pseudocode and then implement it.
Course Objectives
• 1. To impart the basic concepts of linear and nonlinear data structures
such as stacks, queues, lists, trees and graphs and their operations

• 2. To compare static and dynamic memory allocation

• 3. To understand basic searching and sorting algorithms

• 4. To choose the appropriate data structure for a specified application


Course Outcomes
• At the end of the syllabus, students will be able to
1. Examine space and time complexity of iterative and recursive functions
Level : Analyze
2. Apply hashing techniques to avoid collisions.
Level : Apply
3. Illustrate linear data structures such as stack, queue, array, linked list for
illustrating fundamental data structure operations
Level : Apply
4. Illustrate nonlinear data structures such as trees, graphs for illustrating
fundamental data structure operations
Level : Apply
5. Implement searching and sorting techniques
Level : Apply
CA Methodologies

• CA – I : Offline Test (Based on Unit 1 and Unit 2)

• CA-II : Application based Assignments (Based on Unit 3 and Unit 4)


Text Book:

• 1. Weiss, Data Structures and Algorithm Analysis in C++, Pearson Education, 4th Edition, 2013.

Reference Books:
• Horowitz and Sahni, Fundamentals of Data Structures, Universities Press, 2nd Edition, 2008.

• Y. Langsam, M. Augenstein, A. Tenenbaum, Data Structures Using C and C++, Prentice Hall India Learning Private Limited, 2nd Edition, 1998.
Applications of Data Structures
• Arrays

• Storing a list of data elements of the same data type
• Auxiliary (supplementary) storage for other data structures
• Storage of binary tree elements of fixed count
• Storage of matrices
• For example, to store the contacts on a phone, the software can simply place all the contacts in an array.
Arrays (contd.)
Some other applications of arrays are:

• A game leaderboard can be built with an array that stores the scores and sorts them in descending order to show each player's rank.

• A simple question paper is an array of numbered questions, each assigned some marks.

• 2D arrays, commonly known as matrices, are used in image processing.

• Arrays are also used in speech processing, where each speech signal is an array.


Linked List
• Implementing stacks, queues, binary trees and graphs of predefined size.
• Implement dynamic memory management functions of operating system.
• Polynomial implementation for mathematical operations
• Circular linked list is used to implement OS or application functions that
require round robin execution of tasks.
• Circular linked list is used in a slide show where a user wants to go back to the
first slide after last slide is displayed.
• Circular linked list is used when a user presses Alt+Tab to cycle through the open applications and select the desired one.
• Doubly linked list is used in the implementation of forward and backward
buttons in a browser to move backwards and forward in the opened pages of
a website.
• Circular queue is used to maintain the playing sequence of multiple players in
a game.
Some other applications of the linked list are:

• Images are linked with each other. So, an image viewer software uses a
linked list to view the previous and the next images using the previous and
next buttons.

• Web pages can be accessed using the previous and the next URL links
which are linked using linked list.

• The music players also use the same technique to switch between music.

• To keep the track of turns in a multi player game, a circular linked list is
used.
Stacks
• Temporary storage structure for recursive operations
• Auxiliary storage structure for nested operations, function calls,
deferred/postponed functions
• Manage function calls
• Evaluation of arithmetic expressions in various programming languages
• Conversion of infix expressions into postfix expressions
• Checking syntax of expressions in a programming environment
• Matching of parenthesis
• String reversal
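• Solutions to problems based on backtracking.
• Used in depth-first search in graph and tree traversal.
• Operating system functions
• UNDO and REDO functions in an editor.
• Example infix expression: [a*(b-c)/d]+e
• With values substituted: [2*(3-1)/2]+1 == 3
• Operators: - / * +
• An equivalent postfix evaluation: 3 1 - 2 / 2 * 1 + (= 3)
• Bracket matching example (unbalanced): [[((()))]([)]]
• INFIX: (B – C)
• PREFIX: -BC
• POSTFIX: BC-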
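The parenthesis-matching use of a stack listed above can be sketched in C as follows. This is a minimal illustration, not a full expression parser: the function name `brackets_balanced`, the helper `opener_for`, and the fixed `MAX_DEPTH` limit are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 256  /* assumed upper bound on nesting depth */

/* Returns the opening bracket that matches a closing one, or 0. */
static char opener_for(char close)
{
    switch (close) {
    case ')': return '(';
    case ']': return '[';
    case '}': return '{';
    default:  return 0;
    }
}

/* Returns 1 if every bracket in expr is properly matched, 0 otherwise.
   Non-bracket characters (operands, operators) are ignored. */
int brackets_balanced(const char *expr)
{
    char stack[MAX_DEPTH];
    size_t top = 0;  /* number of items currently on the stack */

    assert(expr != NULL);
    for (; *expr != '\0'; expr++) {
        char c = *expr;
        if (c == '(' || c == '[' || c == '{') {
            if (top == MAX_DEPTH)
                return 0;      /* deeper than this sketch supports */
            stack[top++] = c;  /* push the opener */
        } else if (opener_for(c) != 0) {
            if (top == 0 || stack[top - 1] != opener_for(c))
                return 0;      /* closer without a matching opener */
            top--;             /* pop the matched opener */
        }
    }
    return top == 0;  /* leftover openers mean unbalanced */
}
```

Calling `brackets_balanced("[a*(b-c)/d]+e")` reports a balanced expression, while `brackets_balanced("[[((()))]([)]]")` reports an unbalanced one, because the `)` arrives while `[` is on top of the stack.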
Some Applications of a stack are:

• Converting infix to postfix expressions.


• Undo operation is also carried out through stacks.
• Syntaxes in languages are parsed using stacks.
• It is used in many virtual machines like JVM.
• Forward – backward surfing in browser
• History of visited websites
• Message logs: all the messages you get are arranged in a stack
• Call logs, e-mails, Google Photos or any gallery, YouTube downloads, notifications (the latest appears first)
Queues
• Used in breadth-first search of graphs.

• Job scheduler operations of the OS, such as a print buffer queue, or a keyboard buffer queue that stores the keys pressed by users

• Job scheduling, CPU scheduling, Disk Scheduling

• Priority queues are used in file downloading operations in a browser

• Data transfer between peripheral devices and CPU.

• Interrupts generated by the user applications for CPU

• Customer calls handled in a BPO (call center)


Some applications of a queue are:

• Operating System uses queue for job scheduling.


• Queues can be used to handle congestion in networking.
• Data packets in communication are arranged in queue format.
• An e-mail being sent is queued.
• A server queues requests while responding to them.
• When uploading or downloading photos, the one queued first completes first (unless threading is used).
• Most internet requests and processes use queues.
• While switching between multiple applications, Windows uses a circular queue.
Trees
• Implementing the hierarchical structures in computer systems like
directory and file system
• Implementing the navigation structure of a website
• Code generation like Huffman’s code
• Decision making in gaming applications
• Parsing of expressions and statements in programming language
compilers
• Spanning trees for routing decisions in computer and
communications networks
• Hash trees
• Path-finding algorithms in AI, robotics and video game applications
Some applications of the trees are:

• XML Parser uses tree algorithms.


• Decision tree algorithms in machine learning are built on tree structures.
• Databases also use tree data structures for indexing.
• The Domain Name System (DNS) also uses a tree structure.
• File explorers (My Computer) on mobile devices and computers
• BSTs are used in computer graphics
• On question-and-answer websites like Quora, the comments are children of the question
Graphs
• Representing networks and routes in communication, transportation and travel
applications
• Routes in GPS
• Interconnections in social networks and other network based applications
• Ecommerce applications to present user preferences
• Utility networks to identify the problems posed to municipal or local corporations
• Resource utilization and availability in an organization
• Document link map of a website to display connectivity between pages through
hyperlinks
• Robotic motion and neural networks
Some applications of a graph are:

• Facebook’s Graph API uses the structure of Graphs.


• Google's Knowledge Graph is also built on a graph structure.
• Dijkstra's algorithm (a shortest-path-first algorithm) uses a graph structure to find the shortest path between the nodes of a graph.
• GPS navigation systems also use shortest-path APIs.
• Networking components make heavy use of graphs.
• On Facebook, Instagram and other social networking sites, every user is a node.
• Data organization
Application of Hash Tables:
Some applications of a hash table are:
• Data stored in databases is generally of the key-value format which is
done through hash tables.
• Every time we type a search query into Google Chrome or another browser, producing the results relies on the principle of hashing.
• Message digests, a cryptographic function, also use hashing to produce output from which recovering the original input is practically impossible.
• Each file on a computer has two crucial pieces of information, its filename and its file path; hash tables are used to connect a filename to its corresponding file path.
Applications of a matrix
• In geology, matrices are used for making seismic surveys.
• Used for plotting graphs, statistics, and scientific studies and research in almost every field.
• Matrices are also used to represent real-world data such as population or infant mortality rate.
• They are among the best representation methods for plotting surveys.
Introduction to Data Structures
Performance Analysis

• Performance evaluation
– Performance analysis
– Performance measurement
• Performance analysis (a priori)
– An important branch of computer science: complexity theory
– Estimates time and space
– Machine independent
• Performance measurement (a posteriori)
– The actual time and space requirements
– Machine dependent
• Space and time
– Does the program use primary and secondary storage efficiently?
– Is the program's running time acceptable for the task?
• Evaluating a program in general
– Does the program meet the original specifications of the task?
– Does it work correctly?
– Does the program contain documentation that shows how to use it and how it works?
– Does the program effectively use functions to create logical units?
– Is the program's code readable?
• Evaluation criteria, in summary
– Meets specifications, works correctly, good user interface, well documented, readable, uses functions effectively, acceptable running time, efficient use of space
• How to achieve them?
– Good programming style, experience, and practice
– Discuss and think
Space Complexity

• Definition
– The space complexity of a program is the amount of memory that it needs to run to completion
• The space needed is the sum of
– Fixed space and variable space
• Fixed space
– Includes the instructions, variables, and constants
– Independent of the number and size of inputs and outputs
• Variable space
– Includes dynamic allocation and function recursion
• Total space of any program
– S(P) = c + S_P(instance characteristics)
Examples of Evaluating Space Complexity

Example 1: abc
float abc(float a, float b, float c)
{
    return a+b+b*c+(a+b-c)/(a+b)+4.00;
}
S_abc(I) = 0 (fixed space only: 12 + 4 + 4 = 20 bytes)

Example 2: sum (iterative)
float sum(float list[], int n)
{
    float fTmpSum = 0;
    int i;
    for (i = 0; i < n; i++)
        fTmpSum += list[i];
    return fTmpSum;
}
S_sum(I) = S_sum(n) = 0 (also shown: 4n + 12)

Example 3: rsum (recursive)
float rsum(float list[], int n)
{
    if (n) return rsum(list, n-1) + list[n-1];
    return 0;
}
Space per recursive call (bytes):
parameter: float (list[])    4
parameter: integer (n)       4
return address               4
return value                 4
S_rsum(n) = 4*n (also shown: 4n + 16)
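Why the recursive version needs variable space can be made visible by counting how many calls are active at once. The sketch below is an illustration only: `rsum_traced`, `rsum_max_depth` and the two counters are hypothetical names added for the example, and the code measures call depth, not bytes.

```c
#include <assert.h>

static int current_depth = 0;  /* rsum_traced calls active right now */
static int max_depth = 0;      /* deepest nesting observed so far */

/* Same logic as rsum, plus bookkeeping of how many calls are active. */
float rsum_traced(const float list[], int n)
{
    float result;

    assert(n >= 0);
    current_depth++;
    if (current_depth > max_depth)
        max_depth = current_depth;

    if (n)
        result = rsum_traced(list, n - 1) + list[n - 1];
    else
        result = 0;

    current_depth--;
    return result;
}

/* Runs rsum_traced once and reports the deepest call nesting reached. */
int rsum_max_depth(const float list[], int n)
{
    current_depth = 0;
    max_depth = 0;
    (void)rsum_traced(list, n);
    return max_depth;
}
```

For an array of n elements the depth reaches n + 1 (the calls for n, n-1, ..., 0), and each active call holds its own parameters and return address. The iterative sum uses a single call regardless of n.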
Time Complexity
Definition
The time T(P) taken by a program P is the sum of its compile time and its run time.
Total time
T(P) = compile time + run (or execution) time
= c + t_P(instance characteristics)
Compile time does not depend on the instance characteristics.
How to evaluate?
• Use the system clock (machine dependent)
• Count the number of steps performed (machine independent)
Definition of a program step
• A program step is a syntactically or semantically meaningful program segment whose execution time is independent of the instance characteristics
(10 additions can be one step; 100 multiplications can also be one step)
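Step counting can be illustrated by instrumenting the iterative sum with a counter. This is a sketch under one common counting convention (one step per assignment, per loop test, and per return). The convention and the name `sum_counted` are assumptions for the example.

```c
#include <assert.h>

/* Iterative sum that also counts program steps in *steps.
   The counting convention is an assumption made for illustration. */
float sum_counted(const float list[], int n, long *steps)
{
    float total = 0;
    int i;

    assert(n >= 0 && steps != 0);
    *steps = 0;
    (*steps)++;              /* total = 0 */
    for (i = 0; i < n; i++) {
        (*steps)++;          /* loop test that succeeded */
        total += list[i];
        (*steps)++;          /* the addition and assignment */
    }
    (*steps)++;              /* final loop test that failed */
    (*steps)++;              /* return */
    return total;
}
```

Under this convention the count is 2n + 3: it depends only on n, not on the machine or on the values stored in the array.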
Types of Algorithm's Analysis
Order of Growth
SPARSE MATRIX
Hash Tables
Motivation

• Arrays provide an indirect way to access a set.
• Many times we need an association between two sets, or a set of keys and associated data.
• Ideally we would like to access this data directly with the keys.
• We would like a data structure that supports fast search, insertion, and deletion.
• We do not usually care about sorting.
• The abstract data type is usually called a Dictionary, Map or Partial Map
Dictionaries

• What is the best way to implement this?
• Linked lists?
• Doubly linked lists?
• Queues?
• Stacks?
• Multiple indexed arrays (e.g., data[key[i]])?
• To answer this, ask what the complexity of each operation is:
• Insertion
• Deletion
• Search
Direct Addressing
• Let's look at an easy case. Suppose:
• The range of keys is 0..m-1
• Keys are distinct
• Possible solution
• Set up an array T[0..m-1] in which
• T[i] = x if x ∈ T and key[x] = i
• T[i] = NULL otherwise
• This is called a direct-address table
• Operations take O(1) time!
• So what's the problem?
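A direct-address table as described above can be sketched in a few lines of C. The key range `M`, the `struct item` layout and the `da_` function names are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define M 16  /* assumed key range: keys are 0..M-1 */

struct item {
    int key;           /* must be in 0..M-1 and distinct */
    const char *data;  /* associated data (hypothetical field) */
};

/* T[i] points at the item whose key is i, or is NULL if there is none.
   A static array of pointers starts out as all NULL. */
static struct item *T[M];

void da_insert(struct item *x)
{
    assert(x != NULL && x->key >= 0 && x->key < M);
    T[x->key] = x;
}

void da_delete(int key)
{
    assert(key >= 0 && key < M);
    T[key] = NULL;
}

struct item *da_search(int key)
{
    assert(key >= 0 && key < M);
    return T[key];
}
```

Every operation is a single array access, which is where the O(1) bound comes from. The cost is that the array needs one slot per possible key, whether or not that key is ever stored.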
Direct Addressing
• Direct addressing works well when the range m of keys is relatively small
• But what if the keys are 32-bit integers?
• Problem 1: the direct-address table will have 2^32 entries, more than 4 billion
• Problem 2: even if memory is not an issue, the time to initialize the elements to NULL may be prohibitive
• Solution: map keys to a smaller range 0..p-1
• Desire p = O(m).
Hash Table

• Hash tables provide O(1) expected-time support for all of these operations!
• The key idea: rather than indexing an array directly, index it through some function h(x), called a hash function.
• myArray[ h(index) ]
• Key questions:
• What is the set that x comes from?
• What is h() and what is its range?
Hash Table

• Consider this problem:
• If I know a priori the p keys from some finite set U, is it possible to develop a function h(x) that will uniquely map the p keys onto the set of numbers 0..p-1?
Hash Functions
• In general, this is a difficult problem. Try something simpler.
[Figure: h maps the actual keys K = {k1, k2, k3, k4, k5}, drawn from the universe of keys U, to table slots 0..p-1; h(k2) = h(k5)]
Hash Functions
• A collision occurs when h(x) maps two keys to the same location.
[Figure: the same mapping from keys in U to slots 0..p-1, with the collision h(k2) = h(k5) marked]
Hash Functions
• A hash function, h, maps keys of a given type to integers in a fixed interval [0, N - 1]
• Example:
h(x) = x mod N
is a hash function for integer keys
• The integer h(x) is called the hash value of x.
• A hash table for a given key type consists of
• Hash function h
• Array (called table) of size N
• The goal is to store item (k) at index i = h(k)
Example
• We design a hash table storing employee records using their social security number (SSN) as the key.
• The SSN is a nine-digit positive integer.
• Our hash table uses an array of size N = 10,000 and the hash function
h(x) = last four digits of x

Index   Key
0       (empty)
1       025-612-0001
2       981-101-0002
3       (empty)
4       451-229-0004
...
9997    (empty)
9998    200-751-9998
9999    (empty)
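The hash function from this example, h(x) = last four digits of x, is the same as x mod 10,000. A minimal C sketch follows. The name `ssn_hash` and the choice to pass the key as an integer with its dashes removed are assumptions for the illustration.

```c
#include <assert.h>

#define TABLE_SIZE 10000ULL  /* N = 10,000 slots, as in the example */

/* h(x) = last four digits of x, i.e. x mod 10,000.
   The key is the number with its dashes removed. */
unsigned int ssn_hash(unsigned long long ssn)
{
    unsigned int index = (unsigned int)(ssn % TABLE_SIZE);

    assert(index < TABLE_SIZE);  /* the result is always a valid slot */
    return index;
}
```

With the keys shown in the table, 025-612-0001 lands in slot 1 and 200-751-9998 in slot 9998. Any other key ending in 9998 would map to the same slot, which is exactly a collision.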

Example
• Our hash table uses an array of size N = 100.
• We have n = 49 employees.
• We need a method to handle collisions.
• As long as the chance of collision is low, we can achieve this goal.
• Setting N = 1000 and looking at the last four digits will reduce the chance of collision.

Index   Keys
0       (empty)
1       025-612-0001
2       981-101-0002
3       (empty)
4       451-229-0004
...
9997    (empty)
9998    200-751-9998, 176-354-9998 (collision)
9999    (empty)
Collisions
• Can collisions be avoided?
• If the set of keys is fixed and known in advance, yes
• See perfect hashing for the case where the set of keys is static
• In general, no.
• Two primary techniques for resolving collisions:
• Chaining – keep a collection at each key slot.
• Open addressing – if the current slot is full use the next open one.
Chaining
• Chaining puts elements that hash to the same slot in a linked list:
[Figure: keys k1..k8 from the universe U hashed into table slots with chains k1 → k4; k5 → k2 → k7; k3; k8 → k6; the remaining slots are empty]
Chaining
• How do we insert an element?
Chaining
• How do we delete an element?
• Do we need a doubly-linked list for efficient delete?
Chaining
• How do we search for an element with a given key?
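The three chaining questions (insert, delete, search) can be explored with a small C sketch. The table size `NUM_SLOTS`, the integer keys, the modulo hash in `slot_of` and the `chain_` function names are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_SLOTS 7  /* assumed table size for the sketch */

struct node {
    int key;
    struct node *next;
};

/* Each slot heads a singly linked chain; NULL means an empty slot. */
static struct node *table[NUM_SLOTS];

static unsigned int slot_of(int key)
{
    return (unsigned int)key % NUM_SLOTS;
}

/* Insert at the head of the chain: O(1). Duplicates are not checked.
   Returns 1 on success, 0 if memory could not be allocated. */
int chain_insert(int key)
{
    struct node *n;
    unsigned int s;

    assert(key >= 0);
    n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    s = slot_of(key);
    n->key = key;
    n->next = table[s];
    table[s] = n;
    return 1;
}

/* Walk the chain: time proportional to the chain length. */
struct node *chain_search(int key)
{
    struct node *n;

    assert(key >= 0);
    for (n = table[slot_of(key)]; n != NULL; n = n->next)
        if (n->key == key)
            return n;
    return NULL;
}

/* Delete by key. Returns 1 if a node was removed, 0 if not found. */
int chain_delete(int key)
{
    struct node **link;

    assert(key >= 0);
    for (link = &table[slot_of(key)]; *link != NULL; link = &(*link)->next) {
        if ((*link)->key == key) {
            struct node *dead = *link;
            *link = dead->next;  /* unlink, then release */
            free(dead);
            return 1;
        }
    }
    return 0;
}
```

On the doubly-linked-list question: when deleting by key, the chain has to be walked anyway, so a singly linked chain is enough, as here. A doubly linked chain pays off when the caller already holds a pointer to the node and wants to unlink it in O(1) without a search.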
Open Addressing
• Basic idea:
• To insert: if slot is full, try another slot, …, until an open slot is found (probing)
• To search, follow same sequence of probes as would be used when inserting
the element
• If we reach an element with the correct key, return it
• If we reach a NULL pointer, the element is not in the table
• Good for fixed sets
• Example: spell checking
Open Addressing
• The colliding item is placed in a different cell of the
table.
• No dynamic memory.
• Fixed Table size.
• Load factor: n/N, where n is the number of items to store and N the
size of the hash table.
• Clearly, n ≤ N, or n/N ≤ 1.
• To get a reasonable performance, n/N<0.5.
Probing
• The key question is: which cell should be tried next?
• Random would be great, but we need to be able to repeat it.
• Three common techniques:
• Linear Probing (useful for discussion only)
• Quadratic Probing
• Double Hashing
Linear Probing
• Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell.
• Each table cell inspected is referred to as a probe.
• Colliding items lump together, causing future collisions to cause a longer sequence of probes.
• Example:
• h(x) = x mod 13
• Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order

Index:  0   1   2   3   4   5   6   7   8   9   10  11  12
Key:    -   -   41  -   -   18  44  59  32  22  31  73  -
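The insertions in this example can be reproduced with a short C sketch of linear probing. The `EMPTY` marker, the restriction to non-negative integer keys, and the names `lp_init` and `lp_insert` are assumptions for the illustration.

```c
#include <assert.h>

#define TABLE_SIZE 13  /* N = 13, as in the example */
#define EMPTY (-1)     /* assumed marker for an unused cell */

/* Marks every cell as unused. */
void lp_init(int table[TABLE_SIZE])
{
    int i;

    for (i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Inserts key using h(x) = x mod 13 and linear probing.
   Returns the index used, or -1 if the table is full. */
int lp_insert(int table[TABLE_SIZE], int key)
{
    int i, p;

    assert(key >= 0);
    i = key % TABLE_SIZE;
    for (p = 0; p < TABLE_SIZE; p++) {
        if (table[i] == EMPTY) {
            table[i] = key;
            return i;
        }
        i = (i + 1) % TABLE_SIZE;  /* next cell, wrapping around */
    }
    return -1;
}
```

Inserting 18, 41, 22, 44, 59, 32, 31, 73 in that order places them at indices 5, 2, 9, 6, 7, 8, 10, 11. Keys 44, 32, 31 and 73 each collide and slide forward, which is the clustering the bullets describe.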
Search with Linear Probing
• Consider a hash table A that uses linear probing
• get(k)
• We start at cell h(k)
• We probe consecutive locations until one of the following occurs
• An item with key k is found, or
• An empty cell is found, or
• N cells have been unsuccessfully probed
• To ensure efficiency, if k is not in the table, we want to find an empty cell as soon as possible. The load factor can NOT be close to 1.

Algorithm get(k)
    i ← h(k)
    p ← 0
    repeat
        c ← A[i]
        if c = ∅
            return null
        else if c.key() = k
            return c.element()
        else
            i ← (i + 1) mod N
            p ← p + 1
    until p = N
    return null
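The get(k) algorithm translates almost line for line into C. In this sketch the table stores bare non-negative integer keys, an `EMPTY` marker stands in for the empty cell, and the function returns the index of the key instead of an element. All of these, and the name `lp_get`, are assumptions for the illustration.

```c
#include <assert.h>

#define TABLE_SIZE 13  /* N = 13, with h(k) = k mod 13 */
#define EMPTY (-1)     /* assumed marker for an unused cell */

/* Returns the index of key k in A, or -1 if k is not in the table. */
int lp_get(const int A[TABLE_SIZE], int k)
{
    int i, p;

    assert(k >= 0);
    i = k % TABLE_SIZE;                 /* i <- h(k) */
    for (p = 0; p < TABLE_SIZE; p++) {  /* stop after N probes */
        if (A[i] == EMPTY)
            return -1;                  /* empty cell: k is absent */
        if (A[i] == k)
            return i;                   /* found k */
        i = (i + 1) % TABLE_SIZE;       /* probe the next cell */
    }
    return -1;                          /* N cells probed, no match */
}
```

For a table holding 20, 41, 18, 44, 59, 32, 22, 31, 73, 12 at indices 0, 2, 5, 6, 7, 8, 9, 10, 11, 12, a search for 20 starts at index 7 and wraps around to index 0, while a search for 15 probes index 2, then finds index 3 empty and stops.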
Linear Probing
• Example:
• h(x) = x mod 13
• Insert keys 18, 41, 22, 44, 59, 32, 31, 73, 12, 20 in this order
• Search for key = 20
• h(20) = 20 mod 13 = 7.
• Go through rank 8, 9, …, 12, 0.
• Search for key = 15
• h(15) = 15 mod 13 = 2.
• Go through rank 2, 3 and return null.

Index:  0   1   2   3   4   5   6   7   8   9   10  11  12
Key:    20  -   41  -   -   18  44  59  32  22  31  73  12
Double Hashing
• Use two hash functions
• If M is prime, the probe sequence will eventually examine every position in the table
• double_hash_insert(K)
    if (table is full) error
    probe = h1(K)
    offset = h2(K)
    while (table[probe] occupied)
        probe = (probe + offset) mod M
    table[probe] = K
Double Hashing
• Many of same (dis)advantages as linear probing
• Distributes keys more uniformly than linear probing does
• Notes:
• h2(x) should never return zero.
• M should be prime.
Double Hashing Example
• h1(K) = K mod 13
• h2(K) = 8 - (K mod 8)
• We want h2 to be an offset to add
• Insert keys 18, 41, 22, 44, 59, 32, 31, 73

Index:  0   1   2   3   4   5   6   7   8   9   10  11  12
Key:    44  -   41  73  -   18  32  59  31  22  -   -   -
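The double hashing example can be checked with a short C sketch that uses the two hash functions given above. The `EMPTY` marker, the restriction to non-negative integer keys, and the names `dh_init` and `dh_insert` are assumptions for the illustration.

```c
#include <assert.h>

#define TABLE_SIZE 13  /* M = 13, a prime */
#define EMPTY (-1)     /* assumed marker for an unused cell */

static int h1(int k) { return k % TABLE_SIZE; }
static int h2(int k) { return 8 - k % 8; }  /* in 1..8, never zero */

/* Marks every cell as unused. */
void dh_init(int table[TABLE_SIZE])
{
    int i;

    for (i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Inserts k with double hashing.
   Returns the index used, or -1 if the table is full. */
int dh_insert(int table[TABLE_SIZE], int k)
{
    int probe, offset, tries;

    assert(k >= 0);
    probe = h1(k);
    offset = h2(k);
    for (tries = 0; tries < TABLE_SIZE; tries++) {
        if (table[probe] == EMPTY) {
            table[probe] = k;
            return probe;
        }
        probe = (probe + offset) % TABLE_SIZE;  /* jump by the key's own offset */
    }
    return -1;
}
```

Inserting 18, 41, 22, 44, 59, 32, 31, 73 in that order places them at indices 5, 2, 9, 0, 7, 6, 8, 3. Because 13 is prime and the offset is between 1 and 8, thirteen probes visit every cell, so the function returns -1 only when the table is full.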
Open Addressing Summary
• In general, the hash function contains two arguments now:
• Key value
• Probe number
h(k,p), p=0,1,...,m-1
• Probe sequences
<h(k,0), h(k,1), ..., h(k,m-1)>
• Should be a permutation of <0,1,...,m-1>
• There are m! possible permutations
• Good hash functions should be able to produce all m! probe sequences
Open Addressing Summary
• None of the methods discussed can generate more than m^2 different probe sequences.
• Linear Probing:
• Clearly, only m probe sequences.
• Quadratic Probing:
• The initial probe position determines a fixed probe sequence, so only m distinct probe sequences.
• Double Hashing
• Each possible pair (h1(k), h2(k)) yields a distinct probe sequence, so m^2 probe sequences.
Choosing A Hash Function
• Clearly choosing the hash function well is crucial.
• What will a worst-case hash function do?
• What will be the time to search in this case?
• What are desirable features of the hash function?
• Should distribute keys uniformly into slots
• Should not depend on patterns in the data
From Keys to Indices
• A hash function is usually the composition of two maps:
• hash code map: key → integer
• compression map: integer → [0, N - 1]
• An essential requirement of the hash function is to map equal keys to
equal indices
• A “good” hash function minimizes the probability of collisions
Problem 1:

• Consider a hash table of size 11 that uses open addressing with linear probing. Let h(k) = k mod 11 be the hash function used. A sequence of records with keys 43, 36, 92, 87, 11, 4, 71, 13, 14 is inserted into an initially empty hash table, the bins of which are indexed from 0 to 10. What is the index of the bin into which the last record is inserted?
Problem 2:
• Given a hash table T with 25 slots that stores 2000 elements, what
will be the load factor for table T here?
Problem 3:
• In a table of size 10, the key values are given as 79, 28, 39, 68. Apply double hashing to resolve collisions. What will be the respective index positions of the given key values after insertion? Use (7 - key % 7) as the second hash function.
