
Module 5 Graphs

This document covers data structures with a focus on graphs, including definitions, representations, classifications, and basic operations. It also discusses sorting algorithms like insertion sort and radix sort, as well as hashing techniques and their functions. Key traversal methods such as Breadth First Search and Depth First Search are explained, along with their algorithms.

Uploaded by

Ashish Shetty
Copyright
© © All Rights Reserved

DATA STRUCTURES AND APPLICATIONS

MODULE - 5
Graphs: Definitions, Terminologies, Matrix and Adjacency List Representation Of Graphs,
Elementary Graph operations, Traversal methods: Breadth First Search and Depth First Search.
Sorting and Searching: Insertion Sort, Radix sort, Address Calculation Sort.
Hashing: Hash Table organizations, Hashing Functions, Static and Dynamic Hashing.
Files and Their Organization: Data Hierarchy, File Attributes, Text Files and Binary Files, Basic
File Operations, File Organizations and Indexing.

5.1 Graphs - Terminology and Representation

• A graph is an abstract data structure that implements the mathematical concept of a graph. It is a collection of vertices (also called nodes) and edges that connect these vertices.
• A graph is often viewed as a generalization of the tree structure: instead of a purely parent-to-child relationship between nodes, any kind of relationship between nodes can exist.

Definitions: Graph, Vertices, Edges

• A graph is defined as a pair of two sets, G = (V, E), where
1. V = a set of vertices
2. E = a set of edges
• Vertices: Vertices, also called nodes, are drawn as circles and identified by labels.
• Edges: An edge is an arc or line joining two vertices.
Example:
V = {A,B,C,D,E}
E = {(A,B),(A,C),(A,D),(B,D),(C,D),(B,E),(D,E)}
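
As a minimal sketch, the example graph can be written down directly as two Python sets. The helper name `neighbours` is an illustrative assumption and not part of the original notes; the edges are treated as undirected.

```python
# Vertex and edge sets of the example graph G = (V, E).
V = {"A", "B", "C", "D", "E"}
E = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"),
     ("C", "D"), ("B", "E"), ("D", "E")}

def neighbours(v):
    """Return the vertices joined to v by an edge (edges are undirected)."""
    result = set()
    for x, y in E:
        if x == v:
            result.add(y)
        elif y == v:
            result.add(x)
    return result

print(sorted(neighbours("D")))  # ['A', 'B', 'C', 'E']
```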

• Examples of graph applications:

  • Cities connected by roads, with the distances as edge weights
  • Networks and shortest routes
  • Social networks
  • Electric circuits, project planning, and many more

5.2 Graph Classifications


There are several common kinds of graphs:
1. Directed or undirected
2. Multigraphs
3. Complete graph
4. Weighted or unweighted

Prof. Shrikant Pujar, Dept. of CS&E 1


1. Directed graph (digraph): A graph whose edges are directed (i.e., have a direction).
  • Each edge is drawn as an arrow.
  • An edge can be traversed only in the direction of its arrow.
  • Example: E = {(A,B), (B,C), (C,E), (E,D), (D,B), (E,F)}, where (A,B) denotes an edge from A to B.

Undirected graph: A graph with no implied direction on the edges between nodes.

  • In diagrams, edges have no direction (i.e., there are no arrows).
  • Edges can be traversed in either direction.

2. Multigraph: A graph with self-loops or parallel edges (multiple edges between the same two vertices) is called a multigraph.

3. Complete graph: A graph is said to be complete if there is an edge between every pair of distinct vertices. An undirected complete graph with n vertices has n(n-1)/2 edges.

4. Weighted graph: A graph in which a number is assigned to each edge is called a weighted graph. The weight typically represents the cost or distance of traversing the edge.
Example: the weights are distances between cities.

Unweighted graph: A graph in which no number is assigned to the edges is called unweighted. The edges simply show connections.

• Path: a sequence of vertices in which each pair of successive vertices is connected by an edge.
• Cycle: a path that starts and ends at the same vertex.
• Simple path: a path that does not cross itself.
  • That is, no vertex is repeated. (In a simple cycle, only the first and last vertices coincide.)
  • A simple path therefore cannot contain a cycle.
• Length of a path: the number of edges in the path.
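
The definitions above can be checked mechanically. The following is a small sketch on the example graph from Section 5.1, treated as undirected; the function names are illustrative assumptions rather than anything defined in the notes.

```python
# Edge set of the example graph from Section 5.1 (undirected).
E = {("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"),
     ("C", "D"), ("B", "E"), ("D", "E")}

def is_path(edges, seq):
    """True if every pair of successive vertices in seq is joined by an edge."""
    return all((u, v) in edges or (v, u) in edges
               for u, v in zip(seq, seq[1:]))

def is_cycle(edges, seq):
    """True if seq is a path that starts and ends at the same vertex."""
    return len(seq) > 1 and seq[0] == seq[-1] and is_path(edges, seq)

def is_simple_path(edges, seq):
    """True if seq is a path in which no vertex is repeated."""
    return is_path(edges, seq) and len(set(seq)) == len(seq)

def path_length(seq):
    """Length of a path = number of edges = number of vertices - 1."""
    return len(seq) - 1

print(is_simple_path(E, ["A", "B", "D", "E"]))  # True
print(is_cycle(E, ["A", "B", "D", "A"]))        # True
print(path_length(["A", "B", "D", "E"]))        # 3
```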

5.3 Matrix and Adjacency List Representation of Graphs:


Two common data structures for representing graphs are:
1. Adjacency matrix
2. Adjacency lists

5.3.1 Adjacency matrix

• An adjacency matrix records which nodes are adjacent to one another. By definition, two nodes are adjacent if there is an edge connecting them.
• In a directed graph G, node v is adjacent to node u if there is an edge from u to v, that is, if we can get from u to v by traversing one edge. For any graph G with n nodes, the adjacency matrix has dimension n × n.
• The rows and columns of the matrix are labeled by the graph's vertices. The entry a_ij is 1 if there is an edge from vertex v_i to vertex v_j, and 0 otherwise. For an undirected graph, a_ij = a_ji, so the matrix is symmetric.
• Because an adjacency matrix contains only 0s and 1s, it is called a bit matrix or a Boolean matrix. The entries depend on the ordering of the nodes in G, so a change in the order of the nodes results in a different adjacency matrix.
• Example: directed and undirected graphs with their adjacency matrices.
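
Since the figures are not reproduced here, the following sketch builds adjacency matrices for the undirected example graph of Section 5.1 and the digraph of Section 5.2. The function name `adjacency_matrix` and its `directed` flag are assumptions made for illustration.

```python
def adjacency_matrix(vertices, edges, directed=False):
    """Build an n x n matrix of 0s and 1s; row/column order follows `vertices`."""
    index = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[index[u]][index[v]] = 1
        if not directed:
            a[index[v]][index[u]] = 1  # undirected: the matrix is symmetric
    return a

# Undirected example graph from Section 5.1.
undirected = adjacency_matrix(
    ["A", "B", "C", "D", "E"],
    [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"),
     ("C", "D"), ("B", "E"), ("D", "E")])

# Digraph from Section 5.2.
digraph = adjacency_matrix(
    ["A", "B", "C", "D", "E", "F"],
    [("A", "B"), ("B", "C"), ("C", "E"), ("E", "D"), ("D", "B"), ("E", "F")],
    directed=True)

for row in undirected:
    print(row)
```

For the undirected graph the first printed row is `[0, 1, 1, 1, 0]`, because A is adjacent to B, C, and D.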

5.3.2 Adjacency list:
• An adjacency list is another way of representing a graph in the computer's memory. The structure consists of a list of all nodes in G, and every node is in turn linked to its own list containing the names of all nodes adjacent to it.
• The key advantages of using an adjacency list are:
  • It is easy to follow and clearly shows the adjacent nodes of a particular node.
  • It suits graphs with a small-to-moderate number of edges. That is, an adjacency list is preferred for sparse graphs, since it stores only the edges that exist; for dense graphs, an adjacency matrix is a good choice.
  • Adding a new node to G is easy and straightforward. Adding a node to an adjacency matrix is harder, because the size of the matrix must change and existing nodes may have to be reordered.
Example: directed and undirected graphs with their adjacency lists.

• Adjacency list and adjacency matrix representations of a weighted graph.
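
A minimal sketch of both list representations, using a Python dictionary that maps each node to the list of its adjacent nodes. The function names and the edge weights in the weighted example are hypothetical, chosen only for illustration.

```python
def adjacency_list(vertices, edges, directed=False):
    """Map each vertex to the list of vertices adjacent to it."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)
    return adj

def weighted_adjacency_list(vertices, edges, directed=False):
    """Like adjacency_list, but each entry is a (neighbour, weight) pair."""
    adj = {v: [] for v in vertices}
    for u, v, w in edges:
        adj[u].append((v, w))
        if not directed:
            adj[v].append((u, w))
    return adj

# Undirected example graph from Section 5.1.
adj = adjacency_list(
    ["A", "B", "C", "D", "E"],
    [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"),
     ("C", "D"), ("B", "E"), ("D", "E")])

# A small weighted graph; the weights are made up for this example.
wadj = weighted_adjacency_list(
    ["A", "B", "C"],
    [("A", "B", 4), ("A", "C", 2), ("B", "C", 5)])

print(adj["D"])   # ['A', 'B', 'C', 'E']
print(wadj["A"])  # [('B', 4), ('C', 2)]
```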

5.4 Basic Operations on Graphs
The basic operations on a graph are:
• Add Vertex: adds a vertex to the graph.
• Delete Vertex: deletes a vertex (and the edges incident on it) from the graph.
• Add Edge: adds an edge between two vertices of the graph.
• Delete Edge: deletes an edge from the graph.
• Display Vertex: displays a vertex of the graph.
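
These five operations can be sketched on top of an adjacency-list representation. The class below is an illustrative assumption (an undirected graph storing each neighbour set in a dictionary), not an implementation given in the notes.

```python
class Graph:
    """Undirected graph stored as {vertex: set of adjacent vertices}."""

    def __init__(self):
        self.adj = {}

    def add_vertex(self, v):
        self.adj.setdefault(v, set())

    def delete_vertex(self, v):
        # Remove v and every edge incident on it.
        for u in self.adj.pop(v, set()):
            if u != v:
                self.adj[u].discard(v)

    def add_edge(self, u, v):
        self.add_vertex(u)
        self.add_vertex(v)
        self.adj[u].add(v)
        self.adj[v].add(u)

    def delete_edge(self, u, v):
        self.adj.get(u, set()).discard(v)
        self.adj.get(v, set()).discard(u)

    def display_vertex(self, v):
        return f"{v}: {sorted(self.adj[v])}"

g = Graph()
g.add_edge("A", "B")
g.add_edge("A", "C")
print(g.display_vertex("A"))  # A: ['B', 'C']
g.delete_edge("A", "B")
g.delete_vertex("C")
```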

5.5 Traversal methods:


5.5.1. Breadth First Search:
• Breadth-first search (BFS) is a graph search algorithm that begins at a start node and explores all of its neighbouring nodes. Then, for each of those neighbours, it explores their unexplored neighbours, and so on, until it finds the goal or no unexplored nodes remain.
• That is, we first examine node A, then all the neighbours of A, then the neighbours of the neighbours of A, and so on. To do this we must keep track of the neighbours of each node and guarantee that every reachable node is processed and that no node is processed more than once. This is accomplished with a queue that holds the nodes waiting for further processing.
• Algorithm of BFS:

Example: Traverse the following graph by BFS and print all the vertices reachable from start vertex
a.
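
The graph for this example is a figure that is not reproduced here, so the following sketch runs BFS on the example graph from Section 5.1 instead. The function name `bfs` and the choice to visit neighbours in alphabetical order are assumptions made for illustration.

```python
from collections import deque

def bfs(adj, start):
    """Return the vertices reachable from start, in breadth-first order."""
    visited = {start}          # nodes already placed in the queue
    order = []
    queue = deque([start])     # nodes waiting for further processing
    while queue:
        node = queue.popleft()
        order.append(node)
        for nb in adj[node]:
            if nb not in visited:
                visited.add(nb)
                queue.append(nb)
    return order

# Adjacency list of the example graph from Section 5.1.
adj = {
    "A": ["B", "C", "D"],
    "B": ["A", "D", "E"],
    "C": ["A", "D"],
    "D": ["A", "B", "C", "E"],
    "E": ["B", "D"],
}

print(bfs(adj, "A"))  # ['A', 'B', 'C', 'D', 'E']
```

Marking a node as visited when it is enqueued, rather than when it is dequeued, is what keeps any node from entering the queue twice.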


5.5.2. Depth First Search:

• Depth-first search (DFS) begins at a starting node A, which becomes the current node. It then examines each node N along a path P that begins at A. That is, we process a neighbour of A, then a neighbour of that neighbour, and so on.
• During the execution of the algorithm, if the path leads to a node N that has already been processed, we backtrack to the current node. Otherwise, the unvisited (unprocessed) node becomes the current node. The algorithm proceeds like this until we reach a dead end (the end of path P). On reaching the dead end, we backtrack to find another path. The algorithm terminates when backtracking leads back to the starting node A and no unvisited neighbour of A remains.
• In this algorithm, edges that lead to a new vertex are called discovery edges, and edges that lead to an already visited vertex are called back edges. The algorithm is similar to the pre-order traversal of a binary tree, in which a node is processed before its subtrees. Its implementation is similar to that of breadth-first search, but it uses a stack instead of a queue.
• Algorithm of DFS:

Example: Traverse the following graph by DFS and print all the vertices reachable from start vertex
a.

The vertices reachable from start vertex a are b, d, f, g, e.
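
The figure for this example is not reproduced here, so the sketch below runs a stack-based DFS on the example graph from Section 5.1 rather than on the graph with vertices a to g. The function name `dfs` and the alphabetical neighbour order are assumptions for illustration.

```python
def dfs(adj, start):
    """Return the vertices reachable from start, in depth-first order."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue                      # already processed via another path
        visited.add(node)
        order.append(node)
        # Push neighbours in reverse so the first-listed neighbour is explored first.
        for nb in reversed(adj[node]):
            if nb not in visited:
                stack.append(nb)
    return order

# Adjacency list of the example graph from Section 5.1.
adj = {
    "A": ["B", "C", "D"],
    "B": ["A", "D", "E"],
    "C": ["A", "D"],
    "D": ["A", "B", "C", "E"],
    "E": ["B", "D"],
}

print(dfs(adj, "A"))  # ['A', 'B', 'D', 'C', 'E']
```

Compared with the BFS order on the same graph (A, B, C, D, E), DFS goes deep along A, B, D before returning to pick up C and E.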

5.6 Sorting
• Sorting means arranging the elements of an array in some relevant order, which may be either ascending or descending.

5.6.1 Insertion sort

• Insertion sort is a very simple sorting algorithm in which the sorted array (or list) is built one element at a time. The technique is familiar to most of us, as it is how we usually order a hand of cards. The main idea is to insert each item into its proper place in the final list.
• To save memory, most implementations work in place: they move the current data element past the already sorted values, repeatedly interchanging it with the preceding value until it is in its correct place.
• Insertion sort takes O(n²) time on average and in the worst case, so on large lists it is less efficient than more advanced algorithms such as quick sort, heap sort, and merge sort.

Technique:
1. The array of values to be sorted is divided into two sets: one that stores sorted values and another that contains unsorted values.
2. The sorting algorithm proceeds as long as there are elements in the unsorted set.
3. Suppose there are n elements in the array. Initially, the element with index 0 (assuming the lower bound LB = 0) is in the sorted set, and the rest of the elements are in the unsorted set.
4. The first element of the unsorted partition has array index 1 (if LB = 0).
5. During each iteration of the algorithm, the first element in the unsorted set is picked up and inserted into its correct position in the sorted set.

Example: Sort the following list using insertion sort – 77, 33, 44, 11, 88, 22, 66, 55

ALGORITHM INSERTION-SORT (ARR, N)

Step 1: Repeat Steps 2 to 5 for K = 1 to N-1
Step 2: SET TEMP = ARR[K]
Step 3: SET J = K - 1
Step 4: Repeat while J >= 0 AND TEMP < ARR[J]
            SET ARR[J + 1] = ARR[J]
            SET J = J - 1
        [END OF INNER LOOP]
Step 5: SET ARR[J + 1] = TEMP
        [END OF LOOP]
Step 6: EXIT

To insert an element A[K] in a sorted list A[0], A[1], ..., A[K–1], we need to compare A[K] with
A[K–1], then with A[K–2], A[K–3], and so on until we meet an element A[J] such that A[J] <=
A[K]. In order to insert A[K] in its correct position, we need to move elements A[K– 1], A[K–2], ...,
A[J] by one position and then A[K] is inserted at the (J+1)th location.
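
The insertion procedure can be sketched in Python as follows. The function also records the array after each pass so the example list can be traced; the name `insertion_sort` and the `passes` list are illustrative assumptions. The inner loop checks `j >= 0` so that the scan stops at the start of the array.

```python
def insertion_sort(values):
    """Return (sorted copy, list of array states after each pass)."""
    arr = list(values)
    passes = []
    for k in range(1, len(arr)):
        temp = arr[k]              # first element of the unsorted set
        j = k - 1
        # Shift larger sorted elements one position to the right.
        while j >= 0 and temp < arr[j]:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = temp          # insert at the (J+1)th location
        passes.append(list(arr))
    return arr, passes

result, passes = insertion_sort([77, 33, 44, 11, 88, 22, 66, 55])
for k, state in enumerate(passes, start=1):
    print(f"Pass {k}: {state}")
print(result)  # [11, 22, 33, 44, 55, 66, 77, 88]
```

The first pass moves 33 in front of 77, giving 33, 77, 44, 11, 88, 22, 66, 55; the list is fully sorted after seven passes.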
5.6.2 Radix Sort
• Radix sort is a linear-time sorting algorithm for integers. It uses the same idea as sorting names into alphabetical order.
• When we sort a list of names, the radix is 26 (that is, there are 26 buckets) because there are 26 letters in the English alphabet. Because it distributes items into buckets, radix sort is often described as a form of bucket sort. The names are first grouped according to their first letter: 26 classes are used, where the first class stores the names that begin with A, the second class the names that begin with B, and so on.
• During the second pass, names are grouped according to the second letter. After the second pass, the names are sorted on their first two letters. This process continues until the nth pass, where n is the length of the longest name.
• After every pass, all the names are collected in order of the buckets: first pick up the names in the first bucket, then the names in the second bucket, and so on.
• When radix sort is used on integers, sorting is done on each digit of the numbers, proceeding from the least significant digit to the most significant digit. There are ten buckets, one for each digit (0, 1, 2, …, 9), and the number of passes equals the number of digits in the largest number.

Algorithm for RadixSort (ARR, N)

Step 1: Find the largest number in ARR as LARGE
Step 2: [INITIALIZE] SET NOP = Number of digits in LARGE
Step 3: SET PASS = 0
Step 4: Repeat Steps 5 to 12 while PASS <= NOP-1
Step 5: SET I = 0 and INITIALIZE buckets
Step 6: Repeat Steps 7 to 10 while I <= N-1
Step 7: SET DIGIT = digit at PASSth place in ARR[I]
Step 8: Add ARR[I] to the bucket numbered DIGIT
Step 9: INCREMENT bucket count for bucket numbered DIGIT
Step 10: SET I = I + 1 [END OF INNER LOOP]
Step 11: Collect the numbers from the buckets, in bucket order, back into ARR
Step 12: SET PASS = PASS + 1 [END OF LOOP]
Step 13: END

Example: Sort the following list using Radix sort – 348, 143, 361, 423, 538, 128, 321, 543,366


Sorted numbers are- 128, 143, 321, 348, 361, 366, 423, 538, 543.
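
The passes that lead to this result can be traced with a short least-significant-digit radix sort. This is a sketch that assumes non-negative integers; the function name `radix_sort` and the recorded `passes` list are illustrative additions.

```python
def radix_sort(values):
    """LSD radix sort for non-negative integers.

    Returns (sorted copy, list of array states after each pass).
    """
    arr = list(values)
    passes = []
    if not arr:
        return arr, passes
    nop = len(str(max(arr)))                   # number of digits in the largest number
    for p in range(nop):
        buckets = [[] for _ in range(10)]      # one bucket per digit 0..9
        for x in arr:
            digit = (x // 10 ** p) % 10        # digit at the p-th place
            buckets[digit].append(x)
        # Collect the numbers in bucket order.
        arr = [x for bucket in buckets for x in bucket]
        passes.append(list(arr))
    return arr, passes

result, passes = radix_sort([348, 143, 361, 423, 538, 128, 321, 543, 366])
for k, state in enumerate(passes, start=1):
    print(f"Pass {k}: {state}")
```

After the units pass the list is 361, 321, 143, 423, 543, 366, 348, 538, 128; after the tens pass it is 321, 423, 128, 538, 143, 543, 348, 361, 366; the hundreds pass yields the sorted list. Appending to each bucket in arrival order keeps every pass stable, which is what allows earlier passes to survive later ones.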
 Difference between Graph and Tree

5.7 Hashing
• Hashing is the process of mapping a large set of data items to a smaller table with the help of a hashing function.
• A hashing function is also called a hashing algorithm; in cryptography it is known as a message digest function.
• It is a technique for converting a range of key values into a range of indexes of an array.
• It enables faster searching than linear search or binary search.
• Hashing allows a data entry to be updated or retrieved in constant time on average.
• Constant time means that the time taken by the operation does not depend on the size of the data.
• Hashing is used in databases to enable items to be retrieved more quickly.
• Hash functions are also used in generating and verifying digital signatures.

5.7.1 Hash Function:

• A fixed process that converts a key to a hash key is known as a hash function.
• The function takes a key and maps it to a value of a certain length, which is called a hash value or hash.
• The hash value represents the original string of characters, but is normally smaller than the original.
• When a message is sent with a digital signature, the sender computes the hash value of the message, and both the hash value and the signature are sent to the receiver. The receiver uses the same hash function to generate a hash value from the received message and compares it with the hash value received with the message.
• If the hash values are the same, the message was transmitted without errors.
Types of hash function

There are various types of hash function used to place data in a hash table:
1. Division method: The hash value is the remainder obtained when the key is divided by the table size.
H(key) = key % table size
For example, suppose the records 52, 68, 99, 84 are to be placed in a hash table of size 10.
Then H(key) = key % 10 gives:
H(52) = 52 % 10 = 2
H(68) = 68 % 10 = 8
H(99) = 99 % 10 = 9
H(84) = 84 % 10 = 4

2. Mid square method: The key is first squared, and then the middle part of the result is taken as the index.
H(key) = middle digits of key²
For example, suppose we want to place the record 3101 in a table of size 1000. Then 3101 * 3101 = 9616201, so H(3101) = 162 (the middle 3 digits).

3. Digit folding method: The key is divided into separate parts, and the parts are combined using a simple operation (such as addition) to produce the hash key.
H(k) = k1 + k2 + k3 + k4 + ... + kn
For example, the record 12465512 is divided into the parts 124, 655, and 12, which are then added together.
H(key) = 124 + 655 + 12 = 791
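
The three methods can be sketched as small Python functions that reproduce the worked examples. The function names and the parameters `digits` and `part_size` are assumptions for illustration; in practice the folded sum would usually be reduced again (for example with `% table_size`) when it can exceed the table size.

```python
def division_hash(key, table_size):
    """Division method: remainder of key divided by the table size."""
    return key % table_size

def mid_square_hash(key, digits):
    """Mid square method: the middle `digits` digits of key squared."""
    squared = str(key * key)
    start = (len(squared) - digits) // 2
    return int(squared[start:start + digits])

def folding_hash(key, part_size):
    """Digit folding: split the key into parts of `part_size` digits and add them."""
    s = str(key)
    parts = [int(s[i:i + part_size]) for i in range(0, len(s), part_size)]
    return sum(parts)

print([division_hash(k, 10) for k in (52, 68, 99, 84)])  # [2, 8, 9, 4]
print(mid_square_hash(3101, 3))                          # 162
print(folding_hash(12465512, 3))                         # 791
```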

5.7.2 Hash Table

• A hash table, or hash map, is a data structure used to store key-value pairs.
• It is a collection of items stored in a way that makes them easy to find later.
• It uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.
• When collisions are handled by chaining, it is an array of lists, where each list is known as a bucket.
• Each value is stored and looked up by its key.
• In Java, for example, the Hashtable class implements the Map interface and extends the Dictionary class.

Collision
• A collision is a situation in which the hash function returns the same hash key for more than one record. Resolving a collision can sometimes lead to an overflow condition, and a hash function that produces frequent collisions and overflows is a poor hash function.
Collision resolution technique
• When a collision occurs, it can be handled by applying one of several techniques, called collision resolution techniques.

1) Chaining: In this method an additional link field is stored with the data, and a chain is maintained at the home bucket. When a collision occurs, the colliding records are kept in a linked list.

Example: Consider a hash table of size 10 with the hash function H(key) = key % table size, and let the keys to be inserted be 31, 33, 77, 61. Keys 31 and 61 both hash to bucket 1, so that bucket holds two records, maintained as a linked list (that is, by chaining).
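
A minimal sketch of chaining for this example. Python lists stand in for the linked lists, and the class name `ChainedHashTable` is an assumption made for illustration.

```python
class ChainedHashTable:
    """Hash table with chaining; each bucket is a list of colliding keys."""

    def __init__(self, size=10):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def insert(self, key):
        # Append the key to the chain at its home bucket.
        self.buckets[key % self.size].append(key)

    def search(self, key):
        # Only the chain at the home bucket needs to be scanned.
        return key in self.buckets[key % self.size]

table = ChainedHashTable(10)
for key in (31, 33, 77, 61):
    table.insert(key)

print(table.buckets[1])  # [31, 61]
print(table.search(61))  # True
```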
2) Linear probing: This is a simple method of handling collisions. When a collision occurs, the colliding record is placed in the next empty slot found by scanning linearly down the table from its home bucket. The method suffers from clustering: blocks of occupied slots build up in parts of the hash table.
Example: Consider a hash table of size 10 with the hash function H(key) = key % table size, and let the keys to be inserted be 56, 64, 36, 71.

• Keys 56 and 36 both hash to bucket 6. With linear probing, the colliding record is placed in the next empty slot further down, so 36 is placed at index 7.
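
The same example can be traced with a short sketch. The function name `insert_linear_probing` is an assumption, and wrapping around to index 0 after the last slot is a common convention that the notes do not spell out.

```python
def insert_linear_probing(table, key):
    """Insert key into table (a list with None for empty slots); return its index."""
    size = len(table)
    home = key % size
    for step in range(size):
        index = (home + step) % size   # probe the next slot, wrapping around
        if table[index] is None:
            table[index] = key
            return index
    raise OverflowError("hash table is full")

table = [None] * 10
placed = [insert_linear_probing(table, key) for key in (56, 64, 36, 71)]

print(placed)  # [6, 4, 7, 1]
print(table)   # [None, 71, None, None, 64, None, 56, 36, None, None]
```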

5.7.3 Static and Dynamic Hashing:

Static Hashing: This is the process of mapping large amounts of data into a table whose size is fixed at compile time. The number of buckets cannot change while the program runs.

Dynamic Hashing: This is the process of mapping large amounts of data into a table whose size is assigned at run time, so the table can grow or shrink as records are inserted or deleted.

5.8 Files and Their Organization:


Data Hierarchy:
• Data hierarchy refers to the systematic organization of data. It involves data items such as data fields, records, files, and databases.

File Attributes:
• File attributes are settings associated with computer files that grant or deny certain rights over how a user or the operating system can access the file. For example, IBM-compatible computers running MS-DOS or Microsoft Windows support the read-only, archive, system, and hidden attributes.
  • Read-only - Allows a file to be read, but nothing can be written to the file or changed.
  • Archive - Tells Windows Backup to back up the file.
  • System - Marks the file as a system file.
  • Hidden - The file is not shown in a regular directory listing from DOS.
In operating systems like Linux, there are three main file permissions: read (r), write (w), and execute (x).
  • Read - Designated as an "r"; allows a file to be read. On its own, it does not allow the file to be written to or changed.
  • Write - Designated as a "w"; allows a file to be written to and changed.
  • Execute - Designated as an "x"; allows a file to be executed by users or the operating system.

Text Files and Binary Files:

• A text file stores data as letters, digits, and other special symbols by storing their character codes (such as ASCII values), and is in a human-readable format. Examples are files with a .txt or .c extension.
• A binary file contains a sequence of bytes that is not in a human-readable format. Examples are files with a .exe or .mp3 extension.
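
To make the difference concrete, the sketch below encodes the same integer both ways. The value 1234567 and the choice of a 4-byte little-endian layout (`"<i"`) are assumptions made for this example.

```python
import struct

value = 1234567

# Text form: one ASCII character code per digit, readable in any editor.
as_text = str(value).encode("ascii")

# Binary form: the raw 4-byte little-endian integer, not human readable.
as_binary = struct.pack("<i", value)

print(as_text, len(as_text))    # b'1234567' 7
print(len(as_binary))           # 4
print(struct.unpack("<i", as_binary)[0])  # 1234567
```

The text form grows with the number of digits, while the binary form always occupies four bytes for this layout.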

Basic File Operations:

1) Read: read the information stored in a file.
2) Write: insert new contents into a file.
3) Rename: change the name of a file.
4) Copy: copy a file from one location to another.
5) Sort: arrange the contents of a file in order.
6) Move (cut): move a file from one place to another.
7) Delete: remove a file.
8) Execute: run a file so that it produces its output.
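
Most of these operations map onto standard library calls. The sketch below works inside a temporary directory so that it leaves nothing behind; the file names and contents are hypothetical, and the execute operation is omitted because it depends on the operating system.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "notes.txt")

    # Write: insert new contents into a file.
    with open(original, "w") as f:
        f.write("banana\napple\ncherry\n")

    # Read: get the stored information back.
    with open(original) as f:
        lines = f.read().splitlines()

    # Rename: change the name of the file.
    renamed = os.path.join(folder, "fruits.txt")
    os.rename(original, renamed)

    # Copy: duplicate the file at another location.
    copy_path = os.path.join(folder, "fruits_copy.txt")
    shutil.copy(renamed, copy_path)

    # Sort: rewrite the copy with its lines in order.
    with open(copy_path, "w") as f:
        f.write("\n".join(sorted(lines)) + "\n")
    with open(copy_path) as f:
        sorted_lines = f.read().splitlines()

    # Move: relocate the sorted copy into a subfolder.
    archive = os.path.join(folder, "archive")
    os.mkdir(archive)
    shutil.move(copy_path, os.path.join(archive, "fruits_sorted.txt"))

    # Delete: remove the renamed original.
    os.remove(renamed)
    remaining = sorted(os.listdir(folder))

print(sorted_lines)  # ['apple', 'banana', 'cherry']
print(remaining)     # ['archive']
```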

File Organizations and Indexing:


There are three types of organizing the file:
1. Sequential access file organization
2. Direct access file organization
3. Indexed sequential access file organization

1. Sequential access file organization

• Storing records one after another in contiguous blocks within a file on tape or disk is called sequential access file organization.
• In sequential access file organization, all records are stored in sequential order, arranged in ascending or descending order of a key field.
• A search of a sequential file starts from the beginning of the file, and new records can be added only at the end of the file.
• It is not possible to add a record in the middle of a sequential file without rewriting the file.

Advantages of sequential file

• It is simple to program and easy to design.
• A sequential file makes the best use of storage space.

Disadvantages of sequential file
• Accessing records in a sequential file is a time-consuming process.
• It has high data redundancy.
• Random searching is not possible.

2. Direct access file organization

• Direct access file organization is also known as random access or relative file organization.
• In a direct access file, all records are stored on a direct access storage device (DASD), such as a hard disk. The records are placed randomly throughout the file.
• The records do not need to be in sequence, because they are updated directly and rewritten back to the same location.
• This file organization is useful for immediate access to large amounts of information. It is used in accessing large databases.
• It is also called hashing, because the location of a record is typically computed from its key.
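
A rough sketch of the idea, assuming fixed-length records and the division hash from Section 5.7.1 to pick a slot. An in-memory `io.BytesIO` buffer stands in for a disk file, and the record layout (a 4-byte id plus a 20-byte name), the function names, and the sample data are all hypothetical. Collisions are not handled here.

```python
import io
import struct

RECORD = struct.Struct("<i20s")   # 4-byte id + name padded to 20 bytes
SLOTS = 10

def write_record(f, rec_id, name):
    """Store the record directly at the slot computed from its key."""
    slot = rec_id % SLOTS
    f.seek(slot * RECORD.size)    # jump straight to the record's position
    f.write(RECORD.pack(rec_id, name.encode("ascii")))
    return slot

def read_record(f, slot):
    """Read the record stored in the given slot without scanning the file."""
    f.seek(slot * RECORD.size)
    rec_id, raw = RECORD.unpack(f.read(RECORD.size))
    return rec_id, raw.rstrip(b"\x00").decode("ascii")

# Pre-allocate an empty file of SLOTS fixed-length records.
f = io.BytesIO(bytes(RECORD.size * SLOTS))
write_record(f, 56, "Asha")
write_record(f, 71, "Ravi")

print(read_record(f, 56 % SLOTS))  # (56, 'Asha')
print(read_record(f, 71 % SLOTS))  # (71, 'Ravi')
```

Because every record has the same size, the byte offset of a slot is a simple multiplication, which is what makes immediate access possible.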

Advantages of direct access file organization

• Direct access files support online transaction processing (OLTP) systems, such as an online railway reservation system.
• In a direct access file, sorting of the records is not required.
• It accesses the desired records immediately.
• It updates several files quickly.
• It has better control over record allocation.

Disadvantages of direct access file organization

• A direct access file does not provide a backup facility.
• It is expensive.
• It uses storage space less efficiently than a sequential file.

3. Indexed sequential access file organization

• Indexed sequential access file organization combines both sequential and direct access file organization.
• In an indexed sequential access file, records are stored on a direct access device, such as a magnetic disk, in order of a primary key.
• Such a file can have multiple keys, which may be alphanumeric. The key by which the records are ordered is called the primary key.
• The data can be accessed either sequentially or randomly using the index. The index is stored in a file and read into memory when the file is opened.

Advantages of Indexed sequential access file organization

• In an indexed sequential access file, both sequential and random access are possible.
• It accesses records very quickly if the index table is properly organized.
• Records can be inserted in the middle of the file.
• It provides quick access for both sequential and direct processing.

Disadvantages of Indexed sequential access file organization
• An indexed sequential access file requires unique keys and periodic reorganization.
• Searching the index adds time to data access or retrieval.
• It requires more storage space, so it uses storage less efficiently than other file organizations.
• It is expensive because it requires special software.

QUESTION BANK

1. Define the following terminologies with examples.

i. Graph ii. Complete graph iii. Multigraph
iv. Directed and undirected graph v. Weighted and unweighted graph
2. Give the adjacency matrix and adjacency list representation of the following graphs.

3. Write algorithms for DFS and BFS graph traversals.

4. Apply the insertion sort technique to the following elements: 77, 33, 44, 11, 88, 22, 66, 55.
5. Explain hashing and collision. What methods are used to resolve collisions?
6. What are the basic operations that can be performed on a file? List the methods used for file organization (any two).
7. Write the differences between a graph and a tree.
8. What is a collision? Explain linear probing with an example.
9. Write an algorithm for insertion sort and sort the list: 50, 30, 10, 70, 40, 20, 60.
10. Explain radix sort with an example.
11. Define hashing and explain hashing functions.
