Unit 4
Hashing
• In a hashed search, the key, through an algorithmic function, determines the
location of the data.
• Because we are searching an array, we use a hashing algorithm to transform the key into the
index that contains the data we need to locate.
• Hashing is a key-to-address transformation in which the keys map to addresses in
a list.
• If the data that we insert into our list contain two or more synonyms (keys that
hash to the same address), we can have collisions.
• A collision occurs when a hashing algorithm produces an address for an insertion
key and that address is already occupied.
Collision
Hashing Methods
• Direct Method
• Modulo-division Method
• Midsquare Method
Modulo-division Method
• This method gives us the simple hashing algorithm shown below in which listSize
is the number of elements in the array:
• address = key MODULO listSize
• This algorithm works with any list size, but a list size that is a prime number
produces fewer collisions than other list sizes.
Modulo-division Method
121267/307 = 395 with remainder of 2
Therefore: hash(121267) = 2
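The calculation above can be checked with a minimal sketch. The function name modulo_division_hash is an assumption made for illustration; only the formula address = key MODULO listSize comes from the slides.

```python
def modulo_division_hash(key: int, list_size: int) -> int:
    """Return the home address of key: address = key MODULO listSize."""
    return key % list_size


# The slide example: key 121267 in a list of 307 elements.
print(modulo_division_hash(121267, 307))  # prints 2
```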
Midsquare Method
• In midsquare hashing the key is squared and the address is selected
from the middle of the squared number.
• The most obvious limitation of this method is the size of the key.
Given a key of six digits, the square can have up to 12 digits, which is
beyond the range of a 32-bit integer.
Square(9452) = 89340304: address is 3403
• As a variation on the midsquare method, we can select a portion of the
key, such as the middle three digits, and then use them rather than the
whole key.
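A minimal sketch of midsquare hashing follows. The function name midsquare_hash and the address_digits parameter are assumptions for illustration. Python integers do not overflow, so the key-size limitation described above does not show up here; in a language with fixed-size integers it would.

```python
def midsquare_hash(key: int, address_digits: int = 4) -> int:
    """Square the key and return the middle address_digits digits."""
    squared = str(key * key)
    # Index of the first digit of the middle slice (never negative).
    start = max(0, (len(squared) - address_digits) // 2)
    return int(squared[start:start + address_digits])


# The slide example: 9452 squared is 89340304, and the middle is 3403.
print(midsquare_hash(9452))  # prints 3403
```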
Collision Resolution
• Linear Probing or Closed Hashing
• Chaining or Open Hashing
• Bucket Hashing
Linear Probing
• This is the simplest way to resolve a collision: the record is placed
in the next available location in the array.
• Here the array is treated as circular, so the search wraps from the last
location back to the first.
• Example: If K, P and Y all hash to the same location, K takes that location
and P and Y take the next available slots.
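Linear probing over a circular array can be sketched as follows. This is an illustration under stated assumptions: integer keys, modulo-division as the hash function, None marking an empty slot, and the hypothetical function name linear_probe_insert.

```python
def linear_probe_insert(table: list, key: int) -> int:
    """Insert key into table and return the address where it was stored."""
    size = len(table)
    home = key % size  # home address from modulo-division hashing
    for step in range(size):
        address = (home + step) % size  # wrap around: the array is circular
        if table[address] is None:
            table[address] = key
            return address
    raise OverflowError("hash table is full")


table = [None] * 7
for key in (10, 17, 24):  # all three keys hash to address 3
    print(key, "->", linear_probe_insert(table, key))
```

The three synonyms end up in consecutive slots starting at their shared home address.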
Chaining
• A major disadvantage of open addressing is that each collision
resolution increases the probability of future collisions. This
disadvantage is eliminated in the second approach to collision
resolution: chaining.
• Chaining keeps a separate linked list for all records whose keys
hash to the same value.
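A minimal sketch of chaining, assuming integer keys and modulo-division hashing. The class names Node and ChainedHashTable are hypothetical, and new keys are inserted at the head of the list as one possible design choice.

```python
class Node:
    """One entry in a singly linked collision chain."""

    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


class ChainedHashTable:
    def __init__(self, size: int):
        self.heads = [None] * size  # one list head per address

    def insert(self, key: int) -> None:
        address = key % len(self.heads)
        # Link the new node in front of the existing chain.
        self.heads[address] = Node(key, self.heads[address])

    def contains(self, key: int) -> bool:
        node = self.heads[key % len(self.heads)]
        while node is not None:
            if node.key == key:
                return True
            node = node.next
        return False


table = ChainedHashTable(7)
for key in (10, 17, 24):  # synonyms: all hash to address 3
    table.insert(key)
print(table.contains(17), table.contains(3))
```

Because synonyms live in their own list, they never occupy another key's home address.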
Bucket Hashing
• Another approach to handling the collision problems is bucket
hashing, in which keys are hashed to buckets, nodes that
accommodate multiple data occurrences.
• Because a bucket can hold multiple data items, collisions are postponed
until the bucket is full.
• There are two problems with this concept.
• First, it uses significantly more space because many of the buckets are empty or
partially empty at any given time.
• Second, it does not completely resolve the collision problem. At some point a
collision occurs and needs to be resolved. When it does, a typical approach is to use a
linear probe, assuming that the next element has some empty space.
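Bucket hashing with a linear probe on overflow can be sketched as below. The capacity of 3, the modulo-division hash, and the name bucket_insert are assumptions chosen for illustration.

```python
BUCKET_CAPACITY = 3  # assumed number of data items per bucket


def bucket_insert(buckets: list, key: int) -> int:
    """Store key in its home bucket, or in the next bucket with free space."""
    count = len(buckets)
    home = key % count
    for step in range(count):
        index = (home + step) % count  # linear probe over the buckets
        if len(buckets[index]) < BUCKET_CAPACITY:
            buckets[index].append(key)
            return index
    raise OverflowError("all buckets are full")


buckets = [[] for _ in range(5)]
for key in (1, 6, 11, 16):  # all four keys hash to bucket 1
    print(key, "->", bucket_insert(buckets, key))
```

The first three synonyms share bucket 1; only the fourth one causes a collision that has to be resolved.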
Graphs
• Graphs are used in areas such as computer networks, the representation of
topological information (maps), precedence graphs, workflows, and semantic
networks (e.g., entity-relationship diagrams), and in problems such as finding a
minimum spanning tree or the shortest path.
Example
Graphs
• A graph is a pair G = (V, E) where V is a set of vertices and E is a binary relation on
V.
• E contains a pair (u, v) if there is an edge between the u and v vertices. If the
graph is directed, this pair is ordered: (u, v) is not the same as (v, u).
• Weighted graph: A weighted graph is a graph that has a value associated with each edge. This can be a
distance, a cost, or some other numeric value.
• Degree of a vertex: the number of edges joined to that vertex. If a vertex has no edges, its degree is zero.
• Path: A path is a sequence of vertices traversed by following the edges between them.
• Cycle: A cycle is a path that starts and ends at the same vertex. A graph without cycles is
acyclic. Directed acyclic graphs are called DAGs.
• Sparse graph: A sparse graph is a graph in which the number of edges is much less than the maximal
number of edges, i.e. |E| is much less than |V|^2.
• Dense graph: A dense graph is a graph in which the number of edges is close to the maximal number
of edges, i.e. |E| is close to |V|^2.
Representation of Graphs
• Set representation
• Adjacency list and matrix
Set Representation
• Set of vertices V
• Set of edges E
• V(G)= {0,1,2,3,4}
• E(G)= {(0,1), (1,4), (0,4), (1,3), (3,4), (1,2), (2,3)}
Adjacency Matrix
• The adjacency matrix keeps the information about which nodes are adjacent.
• It uses a vector (one-dimensional array) for the vertices and a
matrix (two-dimensional array) to store the edges.
• If two vertices are adjacent (that is, if there is an edge between them), the
matrix entry at their intersection has a value of 1; if there is no edge between
them, the entry is set to 0.
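A minimal sketch that builds the adjacency matrix for the V(G) and E(G) listed under Set Representation. It assumes the graph is undirected, so each edge is recorded in both directions; the function name build_adjacency_matrix is hypothetical.

```python
def build_adjacency_matrix(vertices: list, edges: list) -> list:
    """Return an n x n matrix with 1 where an edge joins two vertices."""
    index = {vertex: i for i, vertex in enumerate(vertices)}  # vertex vector
    n = len(vertices)
    matrix = [[0] * n for _ in range(n)]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1  # assumed undirected: mirror the entry
    return matrix


vertices = [0, 1, 2, 3, 4]
edges = [(0, 1), (1, 4), (0, 4), (1, 3), (3, 4), (1, 2), (2, 3)]
for row in build_adjacency_matrix(vertices, edges):
    print(row)
```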
Adjacency List
• The adjacency list uses a two-dimensional ragged array to store the edges.
• The vertex list is a singly linked list of the vertices in the graph.
• Depending on the application, it could also be implemented using doubly
linked lists or circular linked lists.
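The same idea can be sketched in Python, with a dictionary standing in for the vertex list and a plain list per vertex standing in for each linked list of edges. These substitutions, the undirected-graph assumption, and the name build_adjacency_list are choices made for this illustration.

```python
def build_adjacency_list(vertices: list, edges: list) -> dict:
    """Map each vertex to the list of vertices adjacent to it."""
    adjacency = {vertex: [] for vertex in vertices}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)  # assumed undirected: store both directions
    return adjacency


vertices = [0, 1, 2, 3, 4]
edges = [(0, 1), (1, 4), (0, 4), (1, 3), (3, 4), (1, 2), (2, 3)]
for vertex, neighbors in build_adjacency_list(vertices, edges).items():
    print(vertex, "->", neighbors)
```

Each row has only as many entries as the vertex has edges, which is why the structure is described as ragged.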
Graph Traversal Methods
• Depth First Search: In the depth-first traversal, we process all of a
vertex’s descendants before we move to an adjacent vertex.
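A recursive sketch of depth-first traversal. The adjacency-list dictionary and the function name depth_first are assumptions; the sample graph reuses the five-vertex graph from Set Representation, treated as undirected.

```python
def depth_first(adjacency: dict, start) -> list:
    """Return the vertices in the order a depth-first traversal visits them."""
    order = []
    seen = set()

    def visit(vertex):
        seen.add(vertex)
        order.append(vertex)
        for neighbor in adjacency[vertex]:
            if neighbor not in seen:
                visit(neighbor)  # go deeper before trying the next neighbor

    visit(start)
    return order


graph = {0: [1, 4], 1: [0, 2, 3, 4], 2: [1, 3], 3: [1, 2, 4], 4: [0, 1, 3]}
print(depth_first(graph, 0))  # prints [0, 1, 2, 3, 4]
```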
Graph Traversal Methods
• Breadth First Search: In the breadth-first traversal of a graph, we
process all adjacent vertices of a vertex before going to the next level.
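A queue-based sketch of breadth-first traversal. As before, the adjacency-list dictionary and the name breadth_first are assumptions, and the sample graph is the five-vertex graph from Set Representation, treated as undirected.

```python
from collections import deque


def breadth_first(adjacency: dict, start) -> list:
    """Return the vertices in the order a breadth-first traversal visits them."""
    order = []
    seen = {start}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        order.append(vertex)
        for neighbor in adjacency[vertex]:
            if neighbor not in seen:
                seen.add(neighbor)  # mark when enqueued to avoid duplicates
                queue.append(neighbor)
    return order


graph = {0: [1, 4], 1: [0, 2, 3, 4], 2: [1, 3], 3: [1, 2, 4], 4: [0, 1, 3]}
print(breadth_first(graph, 0))  # prints [0, 1, 4, 2, 3]
```

Both neighbors of vertex 0 are processed before any vertex at the next level.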
Weighted Graph
• A network is a graph whose lines are weighted. It is also known as a
weighted graph.
• The meaning of the weights depends on the application. For example, an
airline might use a graph to represent the routes between cities that it
serves.
• In this example the vertices represent the cities and each edge represents a
route between two cities.
• The edge’s weight could represent the miles between the two cities or the
price of the flight.
Spanning Tree
• We can derive one or more spanning trees from a connected
network. A spanning tree is a tree that contains all of the vertices in
the graph.
• One interesting algorithm derives the minimum spanning tree of a
network such that the sum of its weights is guaranteed to be minimal.
• If the weights in the network are unique, there is only one minimum spanning
tree.
Spanning Tree
• To create a minimum spanning tree in a connected network (that is, in a
network in which there is a path between any two vertices), the edges for
the minimum spanning tree are chosen so that the following properties hold:
1. Every vertex is included.
2. The total edge weight of the spanning tree is the minimum possible that
still provides a path between any two vertices.
• A minimum spanning tree (MST) exists only if the graph is connected.
• There are two famous algorithms for this problem:
• Prim’s Algorithm
• Kruskal’s Algorithm
Minimum Spanning Tree example
Minimum Spanning Tree (MST)
Kruskal’s algorithm
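Kruskal's algorithm considers the edges in order of increasing weight and keeps an edge only if it does not form a cycle with the edges already chosen. The sketch below uses a simple union-find structure for the cycle test. The function name kruskal, the (weight, u, v) edge format, and the sample network are assumptions for illustration, not the network from the slides.

```python
def kruskal(vertex_count: int, edges: list) -> list:
    """Return the minimum spanning tree edges of a connected network.

    edges is a list of (weight, u, v) with vertices numbered from 0.
    """
    parent = list(range(vertex_count))  # union-find forest

    def find(vertex):
        while parent[vertex] != vertex:
            parent[vertex] = parent[parent[vertex]]  # path halving
            vertex = parent[vertex]
        return vertex

    tree = []
    for weight, u, v in sorted(edges):  # lightest edge first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # different components, so no cycle
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


# Hypothetical 4-vertex network.
network = [(1, 0, 1), (3, 0, 2), (2, 1, 2), (4, 1, 3), (5, 2, 3)]
mst = kruskal(4, network)
print(mst, "total weight:", sum(weight for weight, _, _ in mst))
```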
• If an Euler path starts and ends at the same vertex, it is called an
Euler circuit.
Euler Graphs
• Eulerian path (tour): a path that contains every edge exactly once.
• Eulerian circuit: a path that contains every edge exactly once and starts and
ends at the same vertex.
• Eulerian graph: a graph that contains an Eulerian circuit.
• An Euler circuit exists if and only if the graph is connected and the degree
of each vertex is even.
• A connected undirected graph has an Euler path that is not a circuit if and only
if exactly two vertices have an odd degree; the path starts at one of them and
ends at the other.
• Start with any node, select any untraversed outgoing edge, and follow it. Repeat
until there are no more remaining unselected outgoing edges.
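Following edges greedily can return to the start before every edge has been used, so a complete construction has to splice in the remaining edges. The sketch below uses Hierholzer's stack-based method for that, which is a named substitute for the simple procedure above. It assumes a connected undirected graph with all degrees even, given as an adjacency-list dictionary; the function names and the sample graph are hypothetical.

```python
def has_even_degrees(adjacency: dict) -> bool:
    """Check the degree condition for an Euler circuit."""
    return all(len(neighbors) % 2 == 0 for neighbors in adjacency.values())


def euler_circuit(adjacency: dict, start) -> list:
    """Return an Euler circuit as a list of vertices (Hierholzer's method)."""
    remaining = {v: list(neighbors) for v, neighbors in adjacency.items()}
    stack, circuit = [start], []
    while stack:
        vertex = stack[-1]
        if remaining[vertex]:
            neighbor = remaining[vertex].pop()   # follow an unused edge
            remaining[neighbor].remove(vertex)   # mark it used in both directions
            stack.append(neighbor)
        else:
            circuit.append(stack.pop())          # dead end: emit and backtrack
    return circuit[::-1]


# Hypothetical graph: two triangles sharing vertex 2.
graph = {0: [1, 2], 1: [0, 2], 2: [1, 0, 3, 4], 3: [2, 4], 4: [3, 2]}
if has_even_degrees(graph):
    print(euler_circuit(graph, 0))
```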
Euler Graphs
• A legal Euler Circuit of this graph is 0 1 3 4 1 2 3 5 4 2 0.
Hamiltonian Graphs
• A path that visits each vertex of a connected graph G exactly once is called a
Hamiltonian path.
• A Hamiltonian circuit also ends at the vertex where it started. A graph that
contains a Hamiltonian circuit is said to be Hamiltonian.
• G has four vertices with odd degree, hence it is not traversable. By skipping the
internal edges, the graph has a Hamiltonian cycle passing through all the vertices.
Example
• On the left a graph which is Hamiltonian and non-Eulerian and on the
right a graph which is Eulerian and non-Hamiltonian.
Question: Is the following graph Hamiltonian or
Eulerian or both?
Topological Sort
• Topological sort is an ordering of vertices in a directed acyclic graph [DAG] in
which each node comes before all nodes to which it has outgoing edges.
• For example, consider the course prerequisite structure at universities. A directed
edge (v, w) indicates that course v must be completed before course w.
A topological ordering for this example is any sequence of courses that does not
violate the prerequisite requirements.
• Every DAG has at least one topological ordering and may have several.
• Topological sort is not possible if the graph has a cycle, since for two vertices v
and w on the cycle, v precedes w and w precedes v.
Topological Sort
• Initially, the indegree is computed for all vertices. We start with the vertices
whose indegree is 0, that is, the vertices which do not have any
prerequisite.
• To keep track of vertices with indegree zero we can use a queue.
• All vertices of indegree 0 are placed on the queue. While the queue is not empty, a
vertex v is removed, and all vertices adjacent to v have their indegrees decremented.
• A vertex is put on the queue as soon as its indegree falls to 0. The topological
ordering is the order in which the vertices are dequeued.
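The queue-based procedure above can be sketched as follows. The graph is assumed to be a dictionary mapping each vertex to the vertices its outgoing edges point to. The name topological_sort and the course graph are hypothetical.

```python
from collections import deque


def topological_sort(graph: dict) -> list:
    """Return one topological ordering of a DAG, or raise if there is a cycle."""
    indegree = {vertex: 0 for vertex in graph}
    for vertex in graph:
        for successor in graph[vertex]:
            indegree[successor] += 1

    queue = deque(vertex for vertex in graph if indegree[vertex] == 0)
    order = []
    while queue:
        vertex = queue.popleft()
        order.append(vertex)
        for successor in graph[vertex]:
            indegree[successor] -= 1
            if indegree[successor] == 0:
                queue.append(successor)

    if len(order) != len(graph):  # some vertices never reached indegree 0
        raise ValueError("graph has a cycle; no topological ordering exists")
    return order


# Hypothetical prerequisites: A before B and C, both before D.
courses = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(topological_sort(courses))  # prints ['A', 'B', 'C', 'D']
```

A different queue order for B and C would give the other valid ordering, A C B D.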
Example
Step 1: Write in-degree of each vertex.
Step 2:
• Remove the vertex with in-degree 0 and its associated edges.
• Add the vertex in the queue of topological sort.
• Now, update the in-degree of other vertices.
Step 3: Keep removing a vertex with in-degree 0 and repeat
the process of Step 2.
For this graph, the following 4 different topological orderings are
possible:
• 1 2 3 4 5 6
• 1 2 3 4 6 5
• 1 3 2 4 5 6
• 1 3 2 4 6 5
Problem 1:
• Find the number of different topological orderings possible for the
given graph-
Step 1:
• Write in-degree of each vertex-
Step 2:
• Vertex-A has the least in-degree.
• So, remove vertex-A and its associated edges.
• Now, update the in-degree of other vertices.
Step 3:
• Vertex-B has the least in-degree.
• So, remove vertex-B and its associated edges.
• Now, update the in-degree of other vertices.
Step 4:
• There are two vertices with the least in-degree, so the following 2 cases
are possible:
Step 5:
• Now, the above two cases are continued separately in a similar
manner.
Problem 2
Find the number of different topological orderings possible for the given
graph-
Answer:
For the given graph, the following 4 different topological orderings are
possible:
• 1 2 3 4 5 6
• 1 2 3 4 6 5
• 1 3 2 4 5 6
• 1 3 2 4 6 5
Transitive Closure
• Given a directed graph, find out whether a vertex j is reachable from another
vertex i for all vertex pairs (i, j) in the given graph.
• Here reachable means that there is a path from vertex i to vertex j.
• The reachability matrix is called the transitive closure of the graph.
• Warshall's algorithm is used to construct the transitive closure of a graph
from its adjacency matrix.
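A minimal sketch of Warshall's algorithm on a 0/1 adjacency matrix. The function name transitive_closure and the sample graph are assumptions. As written, a vertex counts as reachable from itself only if it lies on a cycle.

```python
def transitive_closure(adjacency_matrix: list) -> list:
    """Return the reachability matrix of a directed graph."""
    n = len(adjacency_matrix)
    reach = [row[:] for row in adjacency_matrix]  # do not modify the input
    for k in range(n):          # allow vertex k as an intermediate stop
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = 1
    return reach


# Hypothetical graph: 0 -> 1 -> 2, with vertex 3 isolated.
matrix = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
for row in transitive_closure(matrix):
    print(row)
```

In the result, entry (0, 2) becomes 1 because vertex 2 is reachable from vertex 0 through vertex 1.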
Example
Shortest Path Algorithms
• Given a graph G = (V, E) and a distinguished vertex s, we need to find the shortest
path from s to every other vertex in G.
• There are variations in the shortest path algorithms which depend on the type of
the input graph and are given below.
• The algorithm works by keeping the shortest distance of vertex v from the source
in the Distance table.
• The value Distance[v] holds the distance from s to v.
• The shortest distance of the source to itself is zero.
• The Distance table for all other vertices is set to –1 to indicate that those vertices
have not yet been processed.
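The Distance table can be sketched as follows. The slides do not name the algorithm at this point, so this sketch assumes Dijkstra's algorithm on a graph with non-negative edge costs. The function name shortest_distances, the (neighbor, cost) edge format, and the sample costs are all hypothetical.

```python
import heapq


def shortest_distances(graph: dict, source) -> dict:
    """Return the Distance table: shortest cost from source to each vertex.

    A value of -1 means the vertex was never reached.
    """
    distance = {vertex: -1 for vertex in graph}  # -1 marks "not processed"
    distance[source] = 0
    heap = [(0, source)]
    while heap:
        cost, vertex = heapq.heappop(heap)
        if cost > distance[vertex]:
            continue  # stale entry: a shorter path was already found
        for neighbor, weight in graph[vertex]:
            candidate = cost + weight
            if distance[neighbor] == -1 or candidate < distance[neighbor]:
                distance[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return distance


# Hypothetical weighted graph; F cannot be reached from A.
graph = {
    "A": [("B", 4), ("C", 1)],
    "B": [("E", 4)],
    "C": [("B", 2), ("D", 4)],
    "D": [("E", 4)],
    "E": [],
    "F": [],
}
print(shortest_distances(graph, "A"))
```

B is first recorded at cost 4 directly from A and then improved to 3 through C, which is the kind of table update the slides walk through.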
Example
Example contd.
• After the first step, from vertex A, we can reach B and C. So, in the
Distance table we update the reachability of B and C with their costs
and the same is shown below.