
DATA STRUCTURES AND ALGORITHMS

Semester

1. Let G be a graph whose vertices are the integers 1 through 8, and let the adjacent vertices of
each vertex be given by the table below:

Assume that, in a traversal of G, the adjacent vertices of a given vertex are returned in the
same order as they are listed in the above table.
a. Draw G.
b. Order the vertices as they are visited in a DFS traversal starting at vertex 1.
c. Order the vertices as they are visited in a BFS traversal starting at vertex 1.

Answer –

a. (The drawing of G is not reproduced in this copy.)

b. Give the sequence of vertices of G visited using a DFS traversal starting at vertex 1.

Depth First Search follows one path of unvisited vertices as deep as possible before
backtracking.

There are many possible DFS traversals. One DFS traversal is:

1, 2, 3, 4, 6, 5, 7, 8.

Starting at 1, we visit its first neighbor, 2. The first neighbor of 2 (vertex 1) has already been
visited, so we move on to 3. The earlier neighbors of 3 (1 and 2) have already been visited, so
we continue to 4. The first unvisited neighbor of 4 is 6, the first unvisited neighbor of 6 is 5,
the first unvisited neighbor of 5 is 7, and the first unvisited neighbor of 7 is 8. In DFS we keep
descending to unvisited neighbors until none remain, then backtrack one vertex at a time,
checking at each step whether there are any other unexplored paths.

c. Give the sequence of vertices visited using a BFS traversal starting at vertex 1.

BFS is a level-order traversal. There are many possible BFS traversals. One BFS traversal is 1, 2,
3, 4, 6, 5, 7, 8.

We visit 1 first, then all of its neighbors: 2, 3, and 4. Vertices 2 and 3 contribute no new
vertices at the next level, while 4 gives 6. The unvisited neighbors of 6 are 5 and then 7, and
finally the last vertex, 8, is reached from 5.
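Since the original adjacency table is not reproduced in this copy, the sketch below uses a hypothetical adjacency list that is consistent with the traversal orders described above; both traversals can then be checked in a few lines of Python.

```python
from collections import deque

def dfs(graph, start):
    """Visit neighbors in listed order, going as deep as possible."""
    order, visited = [], set()
    def visit(v):
        visited.add(v)
        order.append(v)
        for w in graph[v]:
            if w not in visited:
                visit(w)
    visit(start)
    return order

def bfs(graph, start):
    """Visit vertices level by level using a queue."""
    order, visited = [], {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in visited:
                visited.add(w)
                queue.append(w)
    return order

# Hypothetical adjacency lists, chosen to be consistent with the
# traversals above (the original table is not reproduced here).
graph = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2, 4], 4: [1, 3, 6],
         5: [6, 7, 8], 6: [4, 5, 7], 7: [5, 6, 8], 8: [5, 7]}
print(dfs(graph, 1))  # [1, 2, 3, 4, 6, 5, 7, 8]
print(bfs(graph, 1))  # [1, 2, 3, 4, 6, 5, 7, 8]
```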

2. Suppose you have analysed a file and determined that the frequency of occurrence of certain
characters is as follows:

a) Construct the Huffman tree for the characters


b) List the codes for each character
c) Use the tree to compress the following strings:
i) ‘faded’ ii) ‘feed’

Answer –
(a) Huffman Tree Construction

1. Create a priority queue with characters and their frequencies:

| Character | Frequency |
| --- | --- |
| e | 30 |
| a | 15 |
| d | 8 |
| f | 10 |
| b | 7 |
| c | 5 |

2. Combine the two nodes with the lowest frequencies:

(5, c) + (7, b) = 12 (cb)

| Character | Frequency |
| --- | --- |
| e | 30 |
| a | 15 |
| d | 8 |
| f | 10 |
| cb | 12 |

3. Continue combining nodes:

(8, d) + (10, f) = 18 (df)

| Character | Frequency |
| --- | --- |
| e | 30 |
| a | 15 |
| cb | 12 |
| df | 18 |

4. Combine again:

(12, cb) + (15, a) = 27 (cba)

| Character | Frequency |
| --- | --- |
| e | 30 |
| df | 18 |
| cba | 27 |

5. Final combinations:

(18, df) + (27, cba) = 45 (dfcba)

(30, e) + (45, dfcba) = 75 (edfcba)

Huffman Tree (left edge = 0, right edge = 1):

                edfcba (75)
               /0        \1
          e (30)      dfcba (45)
                      /0       \1
                 cba (27)     df (18)
                 /0    \1     /0   \1
            cb (12)  a (15) d (8) f (10)
            /0   \1
          c (5) b (7)
(b) Huffman Codes for Each Character

| Character | Code |
| --- | --- |
| e | 0 |
| a | 101 |
| d | 110 |
| f | 111 |
| c | 1000 |
| b | 1001 |

(c) Compression

i) 'faded'

Original: f-a-d-e-d
Huffman Codes: 111-101-110-0-110
Compressed: 1111011100110

ii) 'feed'

Original: f-e-e-d
Huffman Codes: 111-0-0-110
Compressed: 11100110

Uncompressed sizes:

- 'faded': 5 characters × 8 bits/character = 40 bits


- 'feed': 4 characters × 8 bits/character = 32 bits

Compressed sizes:

- 'faded': 13 bits
- 'feed': 8 bits

Compression ratios:

- 'faded': 40/13 ≈ 3.08


- 'feed': 32/8 = 4
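The hand construction above can be cross-checked with a small Python sketch using heapq. Exact 0/1 assignments depend on tie-breaking, so individual codes may differ from the table, but the code lengths (and hence the compressed sizes) match.

```python
import heapq

def huffman_codes(freqs):
    """Build Huffman codes from a {symbol: frequency} map."""
    # Heap items: (frequency, tie_breaker, tree) where tree is either
    # a symbol (leaf) or a (left, right) pair (internal node).
    heap = [(f, i, s) for i, (s, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"
    walk(heap[0][2], "")
    return codes

freqs = {"e": 30, "a": 15, "d": 8, "f": 10, "b": 7, "c": 5}
codes = huffman_codes(freqs)
encode = lambda s: "".join(codes[ch] for ch in s)
print(len(encode("faded")))  # 13 bits
print(len(encode("feed")))   # 8 bits
```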

3. Find the DFS traversal for the graph given below, starting at node 1. Show the stack trace at
each step of the DFS traversal and provide the DFS tree. [5 Marks]
Answer –
Graph Description
The graph has the following connections:
• Node 1 is connected to nodes 2 and 3.
• Node 2 is connected to nodes 5 and 6.
• Node 3 is connected to node 4.
• Node 4 is connected to node 7.
• Node 7 is connected to node 8.
DFS Traversal Steps and Stack Trace
1. Start at Node 1:
o Stack: [1]
o Visited: {1}
2. Move to Node 2:
o Stack: [2, 1]
o Visited: {1, 2}
3. Move to Node 5:
o Stack: [5, 2, 1]
o Visited: {1, 2, 5}
4. Node 5 has no unvisited adjacent nodes, backtrack:
o Stack: [2, 1]
o Visited: {1, 2, 5}
5. Move to Node 6:
o Stack: [6, 2, 1]
o Visited: {1, 2, 5, 6}
6. Node 6 has no unvisited adjacent nodes, backtrack:
o Stack: [2, 1]
o Visited: {1, 2, 5, 6}
7. Node 2 has no more unvisited adjacent nodes, backtrack:
o Stack: [1]
o Visited: {1, 2, 5, 6}
8. Move to Node 3:
o Stack: [3, 1]
o Visited: {1, 2, 3, 5, 6}
9. Move to Node 4:
o Stack: [4, 3, 1]
o Visited: {1, 2, 3, 4, 5, 6}
10. Move to Node 7:
o Stack: [7, 4, 3, 1]
o Visited: {1, 2, 3, 4, 5, 6, 7}
11. Move to Node 8:
o Stack: [8, 7, 4, 3, 1]
o Visited: {1, 2, 3, 4, 5, 6, 7, 8}
12. Node 8 has no unvisited adjacent nodes, backtrack:
o Stack: [7, 4, 3, 1]
o Visited: {1, 2, 3, 4, 5, 6, 7, 8}
13. Node 7 has no more unvisited adjacent nodes, backtrack:
o Stack: [4, 3, 1]
o Visited: {1, 2, 3, 4, 5, 6, 7, 8}
14. Node 4 has no more unvisited adjacent nodes, backtrack:
o Stack: [3, 1]
o Visited: {1, 2, 3, 4, 5, 6, 7, 8}
15. Node 3 has no more unvisited adjacent nodes, backtrack:
o Stack: [1]
o Visited: {1, 2, 3, 4, 5, 6, 7, 8}
16. Node 1 has no more unvisited adjacent nodes, traversal complete:
o Stack: []
o Visited: {1, 2, 3, 4, 5, 6, 7, 8}
DFS Tree
The DFS tree based on the traversal is:

1
├── 2
│ ├── 5
│ └── 6
└── 3
└── 4
└── 7
└── 8
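A recursive DFS reproduces this trace, with the call stack playing the role of the explicit stack shown above. A minimal sketch using the adjacency lists described in the answer:

```python
def dfs(graph, start):
    """Recursive DFS; the call stack mirrors the stack trace above."""
    visited, order = set(), []
    def visit(node):
        visited.add(node)
        order.append(node)
        for nbr in graph.get(node, []):
            if nbr not in visited:
                visit(nbr)
    visit(start)
    return order

graph = {1: [2, 3], 2: [5, 6], 3: [4], 4: [7], 7: [8]}
print(dfs(graph, 1))  # [1, 2, 5, 6, 3, 4, 7, 8]
```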

4. Construct the Binary search tree for values 15, 20, 10, 8, 25, 12, 16.
Write the inorder successor of 12 and 15 from the constructed tree.

Answer –
Let's construct the BST and then find the inorder successors of 12 and 15.
Step 1: Construct the BST
Insert the values one by one into the BST:
1. Insert 15: Root node.
2. Insert 20: 20 > 15, so it goes to the right of 15.
3. Insert 10: 10 < 15, so it goes to the left of 15.
4. Insert 8: 8 < 10, so it goes to the left of 10.
5. Insert 25: 25 > 20, so it goes to the right of 20.
6. Insert 12: 12 > 10, so it goes to the right of 10.
7. Insert 16: 16 < 20, so it goes to the left of 20.
The constructed BST looks like this:

15
/ \
10 20
/ \ / \
8 12 16 25

Step 2: Find the Inorder Successors


Inorder Traversal of the BST:
The inorder traversal of a BST visits nodes in ascending order. For the given tree, the inorder
traversal is: 8, 10, 12, 15, 16, 20, 25.
Inorder Successor of 12:
The inorder successor of a node is the next node in the inorder traversal. For 12, the next
node is 15.
Inorder Successor of 15:
For 15, the next node in the inorder traversal is 16.
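The construction and both successor lookups can be verified with a short BST sketch:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Standard (unbalanced) BST insertion."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def inorder(root):
    """Inorder traversal yields the keys in ascending order."""
    return inorder(root.left) + [root.key] + inorder(root.right) if root else []

root = None
for k in [15, 20, 10, 8, 25, 12, 16]:
    root = insert(root, k)

seq = inorder(root)
print(seq)                     # [8, 10, 12, 15, 16, 20, 25]
print(seq[seq.index(12) + 1])  # 15 (inorder successor of 12)
print(seq[seq.index(15) + 1])  # 16 (inorder successor of 15)
```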

5. Compress ‘BILL BEATS BEN.’ using the Huffman approach.

Answer –
Huffman Coding Compression

Step 1: Calculate Frequency

Ignoring the spaces, the string contains 13 characters:

| Character | Frequency |
| --- | --- |
| B | 3 |
| E | 2 |
| L | 2 |
| I | 1 |
| T | 1 |
| N | 1 |
| A | 1 |
| S | 1 |
| . | 1 |

Step 2: Build Huffman Tree

Combine the two nodes with the lowest frequencies at each step (note that all six
frequency-1 nodes must be merged, including the (., T) node):

(1, .) + (1, T) = 2 (.T)
(1, N) + (1, A) = 2 (NA)
(1, S) + (1, I) = 2 (SI)
(2, .T) + (2, NA) = 4 (.TNA)
(2, SI) + (2, E) = 4 (SIE)
(2, L) + (3, B) = 5 (LB)
(4, .TNA) + (4, SIE) = 8 (.TNASIE)
(5, LB) + (8, .TNASIE) = 13 (all)

Huffman Tree (left edge = 0, right edge = 1):

                    all (13)
                  /0        \1
             LB (5)      .TNASIE (8)
             /0  \1       /0       \1
          L (2) B (3)  .TNA (4)   SIE (4)
                       /0    \1    /0    \1
                    .T (2) NA (2) SI (2) E (2)
                    /0 \1  /0 \1  /0 \1
                   .    T  N   A  S   I

Step 3: Assign Huffman Codes

| Character | Code |
| --- | --- |
| L | 00 |
| B | 01 |
| . | 1000 |
| T | 1001 |
| N | 1010 |
| A | 1011 |
| S | 1100 |
| I | 1101 |
| E | 111 |

Step 4: Compress

Original: BILL BEATS BEN. (spaces dropped: BILLBEATSBEN.)

Huffman Codes:
01-1101-00-00-01-111-1011-1001-1100-01-111-1010-1000

Compressed: 0111010000011111011100111000111110101000

Uncompressed size: 13 characters × 8 bits/character = 104 bits

Compressed size: 40 bits
Compression ratio: 104/40 = 2.6

Note: This is a simple example of Huffman coding. Actual implementations may use more
complex techniques for optimal compression.
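A useful cross-check: the total compressed length equals the sum of the internal-node weights created while building the tree. A small heapq sketch for the frequencies above:

```python
import heapq

# The total encoded length equals the sum of the merged (internal-node)
# weights created while building the Huffman tree.
freqs = [3, 2, 2, 1, 1, 1, 1, 1, 1]  # B, E, L, I, T, N, A, S, .
heapq.heapify(freqs)
total_bits = 0
while len(freqs) > 1:
    a, b = heapq.heappop(freqs), heapq.heappop(freqs)
    total_bits += a + b
    heapq.heappush(freqs, a + b)
print(total_bits)  # 40
```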

6. Given the following directed acyclic graph (DAG) with vertices labelled as numbers, perform
topological sorting on the graph:
Vertices: {5, 3, 2, 8, 6}
Edges: (5, 3), (5, 2), (3, 8), (2, 8), (6, 8), (6, 3)
Show each step of the sorting process and the final sorted result. [5Marks]

Answer –

Let’s perform a topological sort on the given directed acyclic graph (DAG). Here are the
vertices and edges:
• Vertices: {5, 3, 2, 8, 6}
• Edges: (5, 3), (5, 2), (3, 8), (2, 8), (6, 8), (6, 3)
Steps for Topological Sorting:
1. Identify vertices with no incoming edges:
o Vertices 5 and 6 have no incoming edges.
2. Select one of these vertices:
o Let’s start with vertex 5.
3. Remove vertex 5 and its outgoing edges:
o Remove edges (5, 3) and (5, 2).
o Remaining vertices: {3, 2, 8, 6}
o Remaining edges: (3, 8), (2, 8), (6, 8), (6, 3)
4. Add vertex 5 to the sorted list:
o Sorted list: 5
5. Identify vertices with no incoming edges:
o Vertices 6 and 2 have no incoming edges.
6. Select one of these vertices:
o Let’s choose vertex 6.
7. Remove vertex 6 and its outgoing edges:
o Remove edges (6, 8) and (6, 3).
o Remaining vertices: {3, 2, 8}
o Remaining edges: (3, 8), (2, 8)
8. Add vertex 6 to the sorted list:
o Sorted list: [5, 6]
9. Identify vertices with no incoming edges:
o Vertices 2 and 3 have no incoming edges.
10. Select one of these vertices:
o Let’s choose vertex 2.
11. Remove vertex 2 and its outgoing edge:
o Remove edge (2, 8).
o Remaining vertices: {3, 8}
o Remaining edge: (3, 8)
12. Add vertex 2 to the sorted list:
o Sorted list: [5, 6, 2]
13. Identify vertices with no incoming edges:
o Vertex 3 has no incoming edges.
14. Select vertex 3:
o Remove vertex 3 and its outgoing edge (3, 8).
o Remaining vertex: {8}
o No remaining edges.
15. Add vertex 3 to the sorted list:
o Sorted list: [5, 6, 2, 3]
16. Identify vertices with no incoming edges:
o Vertex 8 has no incoming edges.
17. Select vertex 8:
o Add vertex 8 to the sorted list.
o Sorted list: [5, 6, 2, 3, 8]
Final Topologically Sorted Order:
[5, 6, 2, 3, 8]
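The steps above are exactly Kahn's algorithm. A minimal sketch (when several vertices have in-degree 0, the choice follows the input order, matching the walkthrough):

```python
from collections import deque

def topo_sort(vertices, edges):
    """Kahn's algorithm: repeatedly remove a vertex with in-degree 0."""
    indeg = {v: 0 for v in vertices}
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    queue = deque(v for v in vertices if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order

edges = [(5, 3), (5, 2), (3, 8), (2, 8), (6, 8), (6, 3)]
print(topo_sort([5, 3, 2, 8, 6], edges))  # [5, 6, 2, 3, 8]
```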

7. Consider the following directed graph G. Using Dijkstra’s algorithm, find the sequence of
vertices to be visited if the starting vertex is G. Show the step by step selection of each vertex
in the graph with the updated distance. At the end show the sequence of vertices visited by
following Dijkstra's algorithm. [5Marks]

Answer –

Let’s apply Dijkstra’s algorithm to the given directed graph starting from vertex G. Here are
the steps:
Step 1: Initialize Distances
• Set the distance to the starting vertex G to 0.
• Set the distance to all other vertices to infinity.
• Mark all vertices as unvisited.
Step 2: Select the Vertex with the Smallest Distance
• Initially, this is vertex G with a distance of 0.
Step 3: Update Distances of Adjacent Vertices
• Update the distances of vertices adjacent to G.
Step 4: Mark the Current Vertex as Visited
• Mark G as visited and move to the next vertex with the smallest distance.
Step-by-Step Execution
Start at Vertex G:
o Distance: G = 0, others = ∞
o Adjacent vertices: F (distance 4), E (distance 5)
o Updated distances: F = 4, E = 5
o Mark G as visited.
Select Vertex F (smallest distance 4):
o Distance: F = 4
o Adjacent vertices: C (distance 10 via F)
o Updated distances: C = 10, E remains 5 (since 5 < 4 + 6)
o Mark F as visited.
Select Vertex E (smallest distance 5):
o Distance: E = 5
o Adjacent vertices: D (distance 9 via E)
o Updated distances: D = 9, C remains 10
o Mark E as visited.
Select Vertex D (smallest distance 9):
o Distance: D = 9
o Adjacent vertices: None
o No updates needed.
o Mark D as visited.
Select Vertex C (smallest distance 10):
o Distance: C = 10
o Adjacent vertices: A (distance 11 via C)
o Updated distances: A = 11
o Mark C as visited.
Select Vertex A (smallest distance 11):
o Distance: A = 11
o Adjacent vertices: None
o No updates needed.
o Mark A as visited.
Sequence of Vertices Visited
The sequence of vertices visited by following Dijkstra’s algorithm starting from G is: G -> F -> E
-> D -> C -> A
Vertices B and H are not reachable from G based on the given graph structure.
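Since the figure is not reproduced here, the sketch below assumes edge weights inferred from the walkthrough (G→F=4, G→E=5, F→E=6, F→C=6, E→D=4, C→A=1); these are assumptions, not the original figure.

```python
import heapq

def dijkstra(graph, source):
    """Dijkstra's algorithm with a min-heap; returns the final distances
    and the order in which vertices are settled."""
    dist = {source: 0}
    order, seen = [], set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in seen:
            continue
        seen.add(u)
        order.append(u)
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist, order

# Edge weights inferred from the walkthrough above (assumed, since the
# original figure is not reproduced in this copy).
graph = {"G": [("F", 4), ("E", 5)], "F": [("C", 6), ("E", 6)],
         "E": [("D", 4)], "C": [("A", 1)], "D": [], "A": []}
dist, order = dijkstra(graph, "G")
print(order)  # ['G', 'F', 'E', 'D', 'C', 'A']
print(dist)   # {'G': 0, 'F': 4, 'E': 5, 'C': 10, 'D': 9, 'A': 11}
```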

8. Question –
Construct the AVL tree with the following numbers (by mentioning the rotations used)
12, 15, 36, 17, 34, 85, 64, 19, 3 and find the balance factor of the nodes: 64 and 34.

Answer –
Let’s construct the AVL tree with the given numbers, mentioning the rotations used.
After constructing the tree, we’ll find the balance factors of the nodes 64 and 34
(balance factor = height of left subtree - height of right subtree).
Step-by-Step Construction of the AVL Tree
Insert 12:
o Tree: 12
Insert 15:
o Tree:
12
  \
   15
o No rotation needed.
Insert 36:
o Tree (before rebalancing):
12
  \
   15
     \
      36
o Node 12 becomes unbalanced (balance factor -2, right-right case). Perform a single
left rotation at 12:
   15
  /  \
12    36
Insert 17:
o Tree:
   15
  /  \
12    36
     /
   17
o No rotation needed.
Insert 34:
o Tree (before rebalancing):
   15
  /  \
12    36
     /
   17
     \
      34
o Node 36 becomes unbalanced (balance factor +2) and its left child 17 is right-heavy:
left-right (LR) case. First perform a left rotation at 17, then a right rotation at 36:
   15
  /  \
12    34
     /  \
   17    36
Insert 85:
o Tree (before rebalancing):
   15
  /  \
12    34
     /  \
   17    36
           \
            85
o Node 15 becomes unbalanced (balance factor -2, right-right case). Perform a single
left rotation at 15:
      34
     /  \
   15    36
  /  \     \
12    17    85
Insert 64:
o Tree (before rebalancing):
      34
     /  \
   15    36
  /  \     \
12    17    85
           /
         64
o Node 36 becomes unbalanced (balance factor -2) and its right child 85 is left-heavy:
right-left (RL) case. First perform a right rotation at 85, then a left rotation at 36:
      34
     /  \
   15    64
  /  \   /  \
12   17 36   85
Insert 19:
o Tree:
      34
     /  \
   15    64
  /  \   /  \
12   17 36   85
       \
        19
o No rotation needed.
Insert 3:
o Tree:
        34
       /  \
     15    64
    /  \   /  \
  12   17 36   85
  /      \
 3        19
o No rotation needed.
Final AVL Tree
        34
       /  \
     15    64
    /  \   /  \
  12   17 36   85
  /      \
 3        19

Rotations used: a single left rotation (inserting 36), a left-right double rotation
(inserting 34), a single left rotation (inserting 85), and a right-left double rotation
(inserting 64).

Balance Factors

• Node 64: Balance factor = height of left subtree (36) - height of right subtree (85) = 1 - 1 = 0
• Node 34: Balance factor = height of left subtree (rooted at 15) - height of right subtree (rooted at 64) = 3 - 2 = 1
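The insertions and rotations can be verified with a standard AVL insertion sketch (heights stored on the nodes, the four rotation cases handled explicitly):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.h = key, None, None, 1

def height(n):
    return n.h if n else 0

def bf(n):
    """Balance factor: left height minus right height."""
    return height(n.left) - height(n.right) if n else 0

def update(n):
    n.h = 1 + max(height(n.left), height(n.right))

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y); update(x)
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x); update(y)
    return y

def insert(node, key):
    """AVL insertion with rebalancing on the way back up."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    update(node)
    b = bf(node)
    if b > 1 and key < node.left.key:    # left-left: single right rotation
        return rotate_right(node)
    if b < -1 and key > node.right.key:  # right-right: single left rotation
        return rotate_left(node)
    if b > 1:                            # left-right: double rotation
        node.left = rotate_left(node.left)
        return rotate_right(node)
    if b < -1:                           # right-left: double rotation
        node.right = rotate_right(node.right)
        return rotate_left(node)
    return node

def find(n, key):
    while n and n.key != key:
        n = n.left if key < n.key else n.right
    return n

root = None
for k in [12, 15, 36, 17, 34, 85, 64, 19, 3]:
    root = insert(root, k)

print(root.key)            # 34 (root of the final AVL tree)
print(bf(find(root, 64)))  # 0
print(bf(find(root, 34)))  # 1
```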

9. Question –
Answer the following questions
a) Give a word which is the only non-hyphenated word in the English language with three
consecutive double letters. (0.5M)
b) Give the frequency of each letter. (0.5M)
c) Explain which letter you would expect to find closest to the top of the Huffman Tree when
created for the expected word. (0.5M).
d) Determine the percentage of vowels in the word. (0.5M)

Answer –

(a) Word with three consecutive double letters

The word is: Bookkeeper

(b) Frequency of each letter

| Letter | Frequency |
| --- | --- |
| B | 1 |
| O | 2 |
| K | 2 |
| E | 3 |
| P | 1 |
| R | 1 |

(c) Letter closest to the top of the Huffman Tree

The letter 'E' would be closest to the top of the Huffman Tree since it has the highest
frequency (3) among all letters.

(d) Percentage of vowels

Vowels: O, E (total occurrences: 2 + 3 = 5)


Total letters: 10
Percentage of vowels = (5/10) × 100% = 50%

The word "Bookkeeper" is 50% vowels.
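The frequency count and the vowel percentage can be checked with collections.Counter:

```python
from collections import Counter

word = "bookkeeper"
counts = Counter(word)
print(counts["e"], counts["o"], counts["k"])  # 3 2 2

# Percentage of vowels in the word.
vowel_pct = 100 * sum(counts[v] for v in "aeiou") / len(word)
print(vowel_pct)  # 50.0
```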

Midsemester

1. Given the top pointer of a stack implemented using arrays, write an algorithm to print the
minimum element of the stack and find its running time complexity.

Answer –

Algorithm to Print Minimum Element of Stack

def print_min_element(stack, top):
    if top == -1:  # stack is empty
        return None
    min_element = stack[0]
    for i in range(1, top + 1):
        if stack[i] < min_element:
            min_element = stack[i]
    return min_element

# Alternatively, maintain a separate variable that always holds the
# current minimum, updating it on every push and pop:

def print_min_element_optimized(stack, top, min_element):
    if top == -1:  # stack is empty
        return None
    return min_element

def update_min_element(min_element, new_element):
    if new_element < min_element:
        return new_element
    return min_element

def push(stack, top, max_size, new_element, min_element):
    if top == max_size - 1:  # stack is full
        return top, min_element
    top += 1
    stack[top] = new_element
    min_element = update_min_element(min_element, new_element)
    return top, min_element

def pop(stack, top, min_element):
    if top == -1:  # stack is empty
        return top, min_element
    popped_element = stack[top]
    top -= 1
    if popped_element == min_element:
        min_element = find_new_min(stack, top)
    return top, min_element

def find_new_min(stack, top):
    if top == -1:
        return None
    min_element = stack[0]
    for i in range(1, top + 1):
        if stack[i] < min_element:
            min_element = stack[i]
    return min_element

Running Time Complexity:

- print_min_element: O(n), where n is the number of elements in the stack.


- print_min_element_optimized: O(1), since we maintain a separate variable to track
the minimum element.
- push: O(1), since we update the minimum element when pushing a new element.
- pop: O(n) in the worst case, when the popped element is the minimum and we need
to find the new minimum.

Optimization:

To achieve O(1) time complexity for print_min_element, maintain a separate variable


min_element that tracks the minimum element in the stack. Update this variable
when pushing or popping elements.

Space Complexity:
O(n), where n is the maximum size of the stack, since we use an array to implement
the stack.
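The rescan on pop is what keeps pop at O(n). A common refinement (an assumption here, not part of the original question) stores the running minimum in an auxiliary stack, making push, pop, and minimum all O(1) at the cost of O(n) extra space:

```python
class MinStack:
    """Stack with O(1) push, pop, and minimum, using an auxiliary
    stack that records the minimum alongside each element."""
    def __init__(self):
        self._items = []
        self._mins = []

    def push(self, x):
        self._items.append(x)
        self._mins.append(x if not self._mins else min(x, self._mins[-1]))

    def pop(self):
        self._mins.pop()
        return self._items.pop()

    def minimum(self):
        return self._mins[-1]

s = MinStack()
for x in [5, 2, 7, 1]:
    s.push(x)
print(s.minimum())  # 1
s.pop()
print(s.minimum())  # 2
```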

2. Sort the following array in ascending order using heap sort. Show all the steps.
20, 10, 5, 7, 9, 25, 30, 15, 12, 35

Answer –

Here are the steps to sort the array using Heap Sort:

Initial Array:
20, 10, 5, 7, 9, 25, 30, 15, 12, 35

Step 1: Build Max Heap

Using 0-based indices, the last non-leaf node is index 4 (value 9). Heapify from there
back to the root:

1. Index 4 (value 9): its only child is 35 (index 9); swap → 20, 10, 5, 7, 35, 25, 30, 15, 12, 9
2. Index 3 (value 7): children are 15 and 12; swap with 15 → 20, 10, 5, 15, 35, 25, 30, 7, 12, 9
3. Index 2 (value 5): children are 25 and 30; swap with 30 → 20, 10, 30, 15, 35, 25, 5, 7, 12, 9
4. Index 1 (value 10): children are 15 and 35; swap with 35 → 20, 35, 30, 15, 10, 25, 5, 7, 12, 9
5. Index 0 (value 20): children are 35 and 30; swap with 35 → 35, 20, 30, 15, 10, 25, 5, 7, 12, 9

Max Heap:
35, 20, 30, 15, 10, 25, 5, 7, 12, 9

Step 2: Extract Maximum Element (Root) and Heapify

1. Swap the root (35) with the last element (9).
2. Remove 35 from the heap (it is now in its final position).
3. Heapify the reduced heap.

Heap after Extraction:
30, 20, 25, 15, 10, 9, 5, 7, 12

Repeat Step 2 until the heap is empty:

Iteration 2: swap 30 with 12, remove 30; heapify → 25, 20, 12, 15, 10, 9, 5, 7
Iteration 3: swap 25 with 7, remove 25; heapify → 20, 15, 12, 7, 10, 9, 5
Iteration 4: swap 20 with 5, remove 20; heapify → 15, 10, 12, 7, 5, 9
Iteration 5: swap 15 with 9, remove 15; heapify → 12, 10, 9, 7, 5
Iteration 6: swap 12 with 5, remove 12; heapify → 10, 7, 9, 5
Iteration 7: swap 10 with 5, remove 10; heapify → 9, 7, 5
Iteration 8: swap 9 with 5, remove 9; heapify → 7, 5
Iteration 9: swap 7 with 5, remove 7 → 5

Sorted Array:
5, 7, 9, 10, 12, 15, 20, 25, 30, 35

Heap Sort has a time complexity of O(n log n) and is an efficient sorting algorithm
for large datasets.
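The walkthrough corresponds to the following in-place heap sort sketch:

```python
def heap_sort(a):
    """In-place heap sort: build a max heap, then repeatedly swap the
    root with the last element of the heap and sift down."""
    n = len(a)

    def sift_down(i, size):
        while True:
            largest, l, r = i, 2 * i + 1, 2 * i + 2
            if l < size and a[l] > a[largest]:
                largest = l
            if r < size and a[r] > a[largest]:
                largest = r
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    for i in range(n // 2 - 1, -1, -1):  # build the max heap
        sift_down(i, n)
    for end in range(n - 1, 0, -1):      # extract the maximum repeatedly
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
    return a

print(heap_sort([20, 10, 5, 7, 9, 25, 30, 15, 12, 35]))
# [5, 7, 9, 10, 12, 15, 20, 25, 30, 35]
```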

3. Question –
Your company has assigned a task for you to develop software for a small inventory
management system used by a local grocery store. The system needs to maintain an
ordered list of products based on their expiry dates, with the product closest to
expiration appearing first. The list will be updated frequently as new products arrive
and as existing products are sold. [ 5Marks]

i) Considering the requirements of the inventory management system, discuss the


suitability of using the Insertion Sort algorithm for the grocery store.
ii) Would you recommend any other algorithms? Justify your recommendation.

Answer –

(i) Suitability of Insertion Sort

Insertion Sort can be suitable for the inventory management system due to the
following reasons:

1. Frequent Updates: Insertion Sort is efficient for small, frequent updates, as it only
sorts the newly inserted element.
2. Maintaining Order: Insertion Sort preserves the existing order, ensuring the list
remains sorted by expiry dates.
3. Simple Implementation: Insertion Sort has a straightforward implementation.

However, Insertion Sort also has some limitations:

1. Time Complexity: Insertion Sort has a worst-case and average time complexity of
O(n^2), which may lead to performance issues for large inventory lists.
2. Scalability: As the inventory grows, Insertion Sort's efficiency decreases.

(ii) Alternative Algorithms


Considering the potential scalability issues with Insertion Sort, I recommend the
following algorithms:

1. Binary Search Tree (BST):


- Time Complexity: O(log n) for insertion, deletion, and search operations.
- Efficient Updates: BST maintains the sorted order while handling frequent
updates.
- Scalability: Suitable for large inventory lists.
2. Heap Data Structure:
- Time Complexity: O(log n) for insertion and deletion operations.
- Maintaining Order: Heap ensures the product closest to expiration appears first.
- Scalability: Efficient for large inventory lists.
3. Balanced Binary Search Tree (AVL/RB Tree):
- Time Complexity: O(log n) for insertion, deletion, and search operations.
- Self-Balancing: Ensures the tree remains approximately balanced, maintaining
efficiency.

These data structures and algorithms offer better scalability and performance for
large inventory lists, making them more suitable for the grocery store's inventory
management system.

Justification:

- Frequent updates and maintenance of sorted order require efficient algorithms with
low time complexity.
- Scalability is crucial to handle growing inventory lists.
- BST, Heap, and Balanced BST algorithms provide optimal performance and
scalability.

Recommendation:

- Implement a Binary Search Tree (BST) or Heap data structure to manage the
inventory list.
- Consider using a self-balancing BST (AVL/RB Tree) for optimal performance.

By choosing an efficient algorithm and data structure, the inventory management


system will maintain optimal performance, scalability, and accuracy, ensuring the
grocery store's products are always organized by expiry date.
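As a concrete illustration of the heap recommendation, a minimal sketch keyed on expiry date (the product names and dates are made-up examples):

```python
import heapq
from datetime import date

# A min-heap keyed on expiry date keeps the product closest to
# expiration at the front, with O(log n) arrivals and removals.
inventory = []
heapq.heappush(inventory, (date(2025, 3, 1), "milk"))
heapq.heappush(inventory, (date(2025, 2, 10), "yogurt"))
heapq.heappush(inventory, (date(2025, 6, 1), "pasta"))

expiry, product = heapq.heappop(inventory)  # product closest to expiry
print(product)  # yogurt
```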

4. Question –
Consider an application scenario where Bucket Sort is utilized for sorting large- scale
genomic data consisting of DNA sequences. Discuss how the choice of bucket size
impacts the efficiency and correctness of the sorting process in this context.

Answer –

Bucket Sort for Genomic Data


Bucket Sort is a suitable algorithm for sorting large-scale genomic data, particularly
DNA sequences, due to its:

1. Efficiency: Bucket Sort has a time complexity of O(n + k) in the best case, where n is
the number of elements and k is the number of buckets.
2. Stability: Bucket Sort is a stable sorting algorithm, preserving the relative order of
equal elements.
3. Parallelization: Bucket Sort can be easily parallelized, making it suitable for large-
scale genomic data.

Impact of Bucket Size

The choice of bucket size significantly impacts the efficiency and correctness of the
sorting process:

Correctness

1. Ordered buckets: correctness requires that every sequence placed in bucket i sorts
before every sequence in bucket i+1; if the bucket boundaries do not respect the sort
order, concatenating the sorted buckets yields a wrongly ordered result.
2. Per-bucket sort: as long as the boundaries are order-preserving, the number of
buckets affects efficiency rather than correctness.

Efficiency

1. Too few buckets (large buckets): many sequences share a bucket, so the per-bucket
comparison sort dominates and the running time degrades toward that of an ordinary sort.
2. Too many buckets (small buckets): memory usage and the overhead of scanning
mostly empty buckets grow unnecessarily.
3. Optimal bucket count: balancing the number of buckets (k) against the number of
elements (n), keeping the load factor (average number of elements per bucket) close
to 1, gives near-linear behavior on uniformly distributed data.

Guidelines for Choosing Bucket Size

1. Data distribution: Analyze the distribution of DNA sequences to determine an


optimal bucket size.
2. Memory constraints: Consider available memory to avoid excessive bucket
creation.
3. Computational resources: Balance computational resources with the number of
buckets.

DNA Sequence-Specific Considerations

1. Sequence length: Consider varying sequence lengths when determining bucket


size.
2. Sequence similarity: Account for similar sequences when choosing bucket size.
3. Alphabet size: Consider the size of the DNA alphabet (4 nucleotides: A, C, G, T).

Example Bucket Size Calculation

Suppose we have:

- 1 million DNA sequences (n = 1,000,000)


- Average sequence length: 1000 nucleotides
- Available memory: 16 GB
- Computational resources: 16 CPU cores

A suitable choice might be:

- 1000 buckets (k = 1000)
- Load factor: n/k = 1,000,000 / 1000 = 1000 elements per bucket, so each bucket can
be sorted independently (for example, distributing buckets across the 16 cores)

By carefully selecting the bucket size, Bucket Sort can efficiently and accurately sort
large-scale genomic data.

Real-World Applications

1. Genomic assembly: Sorting DNA sequences for genome assembly.


2. Variant detection: Identifying genetic variations through sorted DNA sequences.
3. Gene expression analysis: Analyzing gene expression levels through sorted RNA-
seq data.

In conclusion, choosing an optimal bucket size is crucial for efficient and accurate
sorting of genomic data using Bucket Sort.
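A minimal illustration of the idea, bucketing by a short prefix over the A/C/G/T alphabet (the sequences are made-up examples, and the prefix length stands in for the bucket-size choice discussed above):

```python
def bucket_sort_dna(seqs, prefix_len=2):
    """Bucket DNA sequences by a short prefix, then sort each bucket.
    The bucket count is at most 4**prefix_len (alphabet A, C, G, T)."""
    buckets = {}
    for s in seqs:
        buckets.setdefault(s[:prefix_len], []).append(s)
    result = []
    for key in sorted(buckets):          # buckets in prefix order
        result.extend(sorted(buckets[key]))  # sort within each bucket
    return result

seqs = ["GATTACA", "ACGT", "TTAGGG", "ACGA", "CCCT"]
print(bucket_sort_dna(seqs))
# ['ACGA', 'ACGT', 'CCCT', 'GATTACA', 'TTAGGG']
```

Because the prefix order agrees with lexicographic order, concatenating the sorted buckets gives a fully sorted list; a longer prefix means more, smaller buckets.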

5. Question –
Consider the following data: [ 5Marks]
Hash(x)=x mod 11, hi=(Hash(x)+F(i))%11
Where F(i)=i*h2(x) with h2(x)=7-(xmod7) and i is the ith time hashing is applied for
same value of x when collision occurs. Solve the following double hashing problem
for the input 97,30,108,33,44 clearly indicating the collisions.

Answer –

Let's solve the double hashing problem step by step.

Hash Function 1: Hash(x) = x mod 11

Hash Function 2: h2(x) = 7 - (x mod 7)

F(i) = i * h2(x)

hi = (Hash(x) + F(i)) % 11

Input Values: 97, 30, 108, 33, 44

Step 1: Hash 97

Hash(97) = 97 mod 11 = 9

h2(97) = 7 - (97 mod 7) = 7 - 6 = 1


No collision, store 97 at index 9.

Step 2: Hash 30

Hash(30) = 30 mod 11 = 8

h2(30) = 7 - (30 mod 7) = 7 - 2 = 5

No collision, store 30 at index 8.

Step 3: Hash 108

Hash(108) = 108 mod 11 = 9

Collision! (index 9 is already occupied by 97)

h2(108) = 7 - (108 mod 7) = 7 - 3 = 4

F(1) = 1 * h2(108) = 1 * 4 = 4

h1 = (9 + 4) % 11 = 2

Store 108 at index 2.

Step 4: Hash 33

Hash(33) = 33 mod 11 = 0

No collision, store 33 at index 0.

Step 5: Hash 44

Hash(44) = 44 mod 11 = 0

Collision! (index 0 is already occupied by 33)

h2(44) = 7 - (44 mod 7) = 7 - 2 = 5

F(1) = 1 * h2(44) = 1 * 5 = 5

h1 = (0 + 5) % 11 = 5

Index 5 is free, so store 44 at index 5.

Final Hash Table:

Index | Value
------|------
0 | 33
2 | 108
5 | 44
8 | 30
9 | 97

Collisions:

- 108 collided with 97 at index 9 and was placed at index 2 after one probe.
- 44 collided with 33 at index 0 and was placed at index 5 after one probe.

Double hashing resolves collisions efficiently by using a secondary hash function to
probe alternative indices.
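The whole insertion sequence can be replayed with a short sketch of the probing scheme:

```python
def double_hash_insert(table, x):
    """Insert x using Hash(x) = x % 11 and step h2(x) = 7 - (x % 7),
    probing h_i = (Hash(x) + i * h2(x)) % 11 on collisions."""
    m = len(table)
    h, step = x % m, 7 - (x % 7)
    i = 0
    while table[(h + i * step) % m] is not None:
        i += 1  # collision: try the next probe
    table[(h + i * step) % m] = x

table = [None] * 11
for x in [97, 30, 108, 33, 44]:
    double_hash_insert(table, x)

print({i: v for i, v in enumerate(table) if v is not None})
# {0: 33, 2: 108, 5: 44, 8: 30, 9: 97}
```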

6. Question –
Perform one round of partitioning using Quick sort with n/3 th element as pivot for
the given integer set {21, 28, 34, 7, 67, 31, 42, 50, 56}

Answer –

Let's perform one round of partitioning using Quick sort with the (n/3)th element as
the pivot.

Given array: {21, 28, 34, 7, 67, 31, 42, 50, 56}

n = 9

n/3 = 3

Choose the 3rd element (34) as the pivot.

Partitioning:

- Elements less than the pivot (34): {21, 28, 7, 31}
- Element equal to the pivot: {34}
- Elements greater than the pivot (34): {67, 42, 50, 56}

After partitioning:

{21, 28, 7, 31, 34, 67, 42, 50, 56}

The pivot (34) is now in its final sorted position: everything to its left is smaller and
everything to its right is larger, although neither partition is itself sorted after a
single round.

This completes one round of partitioning using Quick sort with the (n/3)th element as
the pivot.

Next steps would involve recursively sorting the left and right partitions.
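The round can be reproduced with a simple (stable) three-way partition, which keeps the relative order within each side:

```python
def partition_round(a, pivot_index):
    """One round of stable three-way partitioning around a chosen pivot."""
    pivot = a[pivot_index]
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return less + equal + greater

a = [21, 28, 34, 7, 67, 31, 42, 50, 56]
print(partition_round(a, len(a) // 3 - 1))  # pivot is the 3rd element, 34
# [21, 28, 7, 31, 34, 67, 42, 50, 56]
```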
