
DATA STRUCTURES AND ALGORITHMS

Semester

1. Let G be a graph whose vertices are the integers 1 through 8, and let the adjacent vertices of
each vertex be given by the table below:

Assume that, in a traversal of G, the adjacent vertices of a given vertex are returned in the
same order as they are listed in the above table.
a. Draw G.
b. Order the vertices as they are visited in a DFS traversal starting at vertex 1.
c. Order the vertices as they are visited in a BFS traversal starting at vertex 1.

Answer –

a. (The drawing of G is not reproduced in this copy.)

b. Give the sequence of vertices of G visited using a DFS traversal starting at vertex 1.

Depth First Search follows one path of unvisited vertices as deep as possible before
backtracking.

There are many possible DFS traversals. One DFS traversal is:

1, 2, 3, 4, 6, 5, 7, 8.

Starting at 1, we visit its first neighbor, 2. The first neighbor of 2 (vertex 1) has already been
visited, so we move on to 3. The earlier neighbors of 3 (1 and 2) have already been visited, so
we continue to 4. The first unvisited neighbor of 4 is 6, the first unvisited neighbor of 6 is 5,
the first unvisited neighbor of 5 is 7, and the first unvisited neighbor of 7 is 8. In DFS we keep
descending to unvisited neighbors until none remain, then backtrack one vertex at a time,
checking at each step whether there are any other unexplored paths.

c. Give the sequence of vertices visited using a BFS traversal starting at vertex 1.

BFS is a level-order traversal. There are many possible BFS traversals. One BFS traversal is 1, 2,
3, 4, 6, 5, 7, 8.

We visit 1 first, then all of its neighbors: 2, 3, and 4. Vertices 2 and 3 contribute no new
vertices at the next level, while 4 gives 6. The unvisited neighbors of 6 are 5 and then 7, and
finally the last vertex, 8, is reached from 5.
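Since the original adjacency table is not reproduced in this copy, the sketch below uses a hypothetical adjacency list that is consistent with the traversal orders described above; both traversals can then be checked in a few lines of Python.

```python
from collections import deque

def dfs(graph, start):
    """Visit neighbors in listed order, going as deep as possible."""
    order, visited = [], set()
    def visit(v):
        visited.add(v)
        order.append(v)
        for w in graph[v]:
            if w not in visited:
                visit(w)
    visit(start)
    return order

def bfs(graph, start):
    """Visit vertices level by level using a queue."""
    order, visited = [], {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in visited:
                visited.add(w)
                queue.append(w)
    return order

# Hypothetical adjacency lists, chosen to be consistent with the
# traversals above (the original table is not reproduced here).
graph = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2, 4], 4: [1, 3, 6],
         5: [6, 7, 8], 6: [4, 5, 7], 7: [5, 6, 8], 8: [5, 7]}
print(dfs(graph, 1))  # [1, 2, 3, 4, 6, 5, 7, 8]
print(bfs(graph, 1))  # [1, 2, 3, 4, 6, 5, 7, 8]
```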

2. Suppose you have analysed a file and determined that the frequency of occurrence of certain
characters is as follows:

a) Construct the Huffman tree for the characters


b) List the codes for each character
c) Use the tree to compress the following strings:
i) ‘faded’ ii) ‘feed’

Answer –
(a) Huffman Tree Construction

1. Create a priority queue with characters and their frequencies:

| Character | Frequency |
| --- | --- |
| e | 30 |
| a | 15 |
| d | 8 |
| f | 10 |
| b | 7 |
| c | 5 |

2. Combine the two nodes with the lowest frequencies:

(5, c) + (7, b) = 12 (cb)

| Character | Frequency |
| --- | --- |
| e | 30 |
| a | 15 |
| d | 8 |
| f | 10 |
| cb | 12 |

3. Continue combining nodes:

(8, d) + (10, f) = 18 (df)

| Character | Frequency |
| --- | --- |
| e | 30 |
| a | 15 |
| cb | 12 |
| df | 18 |

4. Combine again:

(12, cb) + (15, a) = 27 (cba)

| Character | Frequency |
| --- | --- |
| e | 30 |
| df | 18 |
| cba | 27 |

5. Final combinations:

(18, df) + (27, cba) = 45 (dfcba)

(30, e) + (45, dfcba) = 75 (edfcba)

Huffman Tree (left edge = 0, right edge = 1):

                edfcba (75)
               /0        \1
          e (30)      dfcba (45)
                      /0       \1
                 cba (27)     df (18)
                 /0    \1     /0   \1
            cb (12)  a (15) d (8) f (10)
            /0   \1
          c (5) b (7)
(b) Huffman Codes for Each Character

| Character | Code |
| --- | --- |
| e | 0 |
| a | 101 |
| d | 110 |
| f | 111 |
| c | 1000 |
| b | 1001 |

(c) Compression

i) 'faded'

Original: f-a-d-e-d
Huffman Codes: 111-101-110-0-110
Compressed: 1111011100110

ii) 'feed'

Original: f-e-e-d
Huffman Codes: 111-0-0-110
Compressed: 11100110

Uncompressed sizes:

- 'faded': 5 characters × 8 bits/character = 40 bits


- 'feed': 4 characters × 8 bits/character = 32 bits

Compressed sizes:

- 'faded': 13 bits
- 'feed': 8 bits

Compression ratios:

- 'faded': 40/13 ≈ 3.08


- 'feed': 32/8 = 4
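The hand construction above can be cross-checked with a small Python sketch using heapq. Exact 0/1 assignments depend on tie-breaking, so individual codes may differ from the table, but the code lengths (and hence the compressed sizes) match.

```python
import heapq

def huffman_codes(freqs):
    """Build Huffman codes from a {symbol: frequency} map."""
    # Heap items: (frequency, tie_breaker, tree) where tree is either
    # a symbol (leaf) or a (left, right) pair (internal node).
    heap = [(f, i, s) for i, (s, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"
    walk(heap[0][2], "")
    return codes

freqs = {"e": 30, "a": 15, "d": 8, "f": 10, "b": 7, "c": 5}
codes = huffman_codes(freqs)
encode = lambda s: "".join(codes[ch] for ch in s)
print(len(encode("faded")))  # 13 bits
print(len(encode("feed")))   # 8 bits
```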

3. Find the DFS traversal for the graph given below, starting at node 1. Show the stack trace at
each step of the DFS traversal and provide the DFS tree. [5 Marks]
Answer –
Graph Description
The graph has the following connections:
• Node 1 is connected to nodes 2 and 3.
• Node 2 is connected to nodes 5 and 6.
• Node 3 is connected to node 4.
• Node 4 is connected to node 7.
• Node 7 is connected to node 8.
DFS Traversal Steps and Stack Trace
1. Start at Node 1:
o Stack: [1]
o Visited: {1}
2. Move to Node 2:
o Stack: [2, 1]
o Visited: {1, 2}
3. Move to Node 5:
o Stack: [5, 2, 1]
o Visited: {1, 2, 5}
4. Node 5 has no unvisited adjacent nodes, backtrack:
o Stack: [2, 1]
o Visited: {1, 2, 5}
5. Move to Node 6:
o Stack: [6, 2, 1]
o Visited: {1, 2, 5, 6}
6. Node 6 has no unvisited adjacent nodes, backtrack:
o Stack: [2, 1]
o Visited: {1, 2, 5, 6}
7. Node 2 has no more unvisited adjacent nodes, backtrack:
o Stack: [1]
o Visited: {1, 2, 5, 6}
8. Move to Node 3:
o Stack: [3, 1]
o Visited: {1, 2, 3, 5, 6}
9. Move to Node 4:
o Stack: [4, 3, 1]
o Visited: {1, 2, 3, 4, 5, 6}
10. Move to Node 7:
o Stack: [7, 4, 3, 1]
o Visited: {1, 2, 3, 4, 5, 6, 7}
11. Move to Node 8:
o Stack: [8, 7, 4, 3, 1]
o Visited: {1, 2, 3, 4, 5, 6, 7, 8}
12. Node 8 has no unvisited adjacent nodes, backtrack:
o Stack: [7, 4, 3, 1]
o Visited: {1, 2, 3, 4, 5, 6, 7, 8}
13. Node 7 has no more unvisited adjacent nodes, backtrack:
o Stack: [4, 3, 1]
o Visited: {1, 2, 3, 4, 5, 6, 7, 8}
14. Node 4 has no more unvisited adjacent nodes, backtrack:
o Stack: [3, 1]
o Visited: {1, 2, 3, 4, 5, 6, 7, 8}
15. Node 3 has no more unvisited adjacent nodes, backtrack:
o Stack: [1]
o Visited: {1, 2, 3, 4, 5, 6, 7, 8}
16. Node 1 has no more unvisited adjacent nodes, traversal complete:
o Stack: []
o Visited: {1, 2, 3, 4, 5, 6, 7, 8}
DFS Tree
The DFS tree based on the traversal is:

1
├── 2
│ ├── 5
│ └── 6
└── 3
└── 4
└── 7
└── 8
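A recursive DFS reproduces this trace, with the call stack playing the role of the explicit stack shown above. A minimal sketch using the adjacency lists described in the answer:

```python
def dfs(graph, start):
    """Recursive DFS; the call stack mirrors the stack trace above."""
    visited, order = set(), []
    def visit(node):
        visited.add(node)
        order.append(node)
        for nbr in graph.get(node, []):
            if nbr not in visited:
                visit(nbr)
    visit(start)
    return order

graph = {1: [2, 3], 2: [5, 6], 3: [4], 4: [7], 7: [8]}
print(dfs(graph, 1))  # [1, 2, 5, 6, 3, 4, 7, 8]
```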

4. Construct the Binary search tree for values 15, 20, 10, 8, 25, 12, 16.
Write the inorder successor of 12 and 15 from the constructed tree.

Answer –
Let's construct the BST and then find the inorder successors of 12 and 15.
Step 1: Construct the BST
Insert the values one by one into the BST:
1. Insert 15: Root node.
2. Insert 20: 20 > 15, so it goes to the right of 15.
3. Insert 10: 10 < 15, so it goes to the left of 15.
4. Insert 8: 8 < 10, so it goes to the left of 10.
5. Insert 25: 25 > 20, so it goes to the right of 20.
6. Insert 12: 12 > 10, so it goes to the right of 10.
7. Insert 16: 16 < 20, so it goes to the left of 20.
The constructed BST looks like this:

15
/ \
10 20
/ \ / \
8 12 16 25

Step 2: Find the Inorder Successors


Inorder Traversal of the BST:
The inorder traversal of a BST visits nodes in ascending order. For the given tree, the inorder
traversal is: 8, 10, 12, 15, 16, 20, 25.
Inorder Successor of 12:
The inorder successor of a node is the next node in the inorder traversal. For 12, the next
node is 15.
Inorder Successor of 15:
For 15, the next node in the inorder traversal is 16.
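The construction and both successor lookups can be verified with a short BST sketch:

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Standard (unbalanced) BST insertion."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def inorder(root):
    """Inorder traversal yields the keys in ascending order."""
    return inorder(root.left) + [root.key] + inorder(root.right) if root else []

root = None
for k in [15, 20, 10, 8, 25, 12, 16]:
    root = insert(root, k)

seq = inorder(root)
print(seq)                     # [8, 10, 12, 15, 16, 20, 25]
print(seq[seq.index(12) + 1])  # 15 (inorder successor of 12)
print(seq[seq.index(15) + 1])  # 16 (inorder successor of 15)
```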

5. Compress ‘BILL BEATS BEN.’ using the Huffman approach.

Answer –
Huffman Coding Compression

Step 1: Calculate Frequency

Ignoring the spaces, the string contains 13 characters:

| Character | Frequency |
| --- | --- |
| B | 3 |
| E | 2 |
| L | 2 |
| I | 1 |
| T | 1 |
| N | 1 |
| A | 1 |
| S | 1 |
| . | 1 |

Step 2: Build Huffman Tree

Combine the two nodes with the lowest frequencies at each step (note that all six
frequency-1 nodes must be merged, including the (., T) node):

(1, .) + (1, T) = 2 (.T)
(1, N) + (1, A) = 2 (NA)
(1, S) + (1, I) = 2 (SI)
(2, .T) + (2, NA) = 4 (.TNA)
(2, SI) + (2, E) = 4 (SIE)
(2, L) + (3, B) = 5 (LB)
(4, .TNA) + (4, SIE) = 8 (.TNASIE)
(5, LB) + (8, .TNASIE) = 13 (all)

Huffman Tree (left edge = 0, right edge = 1):

                    all (13)
                  /0        \1
             LB (5)      .TNASIE (8)
             /0  \1       /0       \1
          L (2) B (3)  .TNA (4)   SIE (4)
                       /0    \1    /0    \1
                    .T (2) NA (2) SI (2) E (2)
                    /0 \1  /0 \1  /0 \1
                   .    T  N   A  S   I

Step 3: Assign Huffman Codes

| Character | Code |
| --- | --- |
| L | 00 |
| B | 01 |
| . | 1000 |
| T | 1001 |
| N | 1010 |
| A | 1011 |
| S | 1100 |
| I | 1101 |
| E | 111 |

Step 4: Compress

Original: BILL BEATS BEN. (spaces dropped: BILLBEATSBEN.)

Huffman Codes:
01-1101-00-00-01-111-1011-1001-1100-01-111-1010-1000

Compressed: 0111010000011111011100111000111110101000

Uncompressed size: 13 characters × 8 bits/character = 104 bits

Compressed size: 40 bits
Compression ratio: 104/40 = 2.6

Note: This is a simple example of Huffman coding. Actual implementations may use more
complex techniques for optimal compression.
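A useful cross-check: the total compressed length equals the sum of the internal-node weights created while building the tree. A small heapq sketch for the frequencies above:

```python
import heapq

# The total encoded length equals the sum of the merged (internal-node)
# weights created while building the Huffman tree.
freqs = [3, 2, 2, 1, 1, 1, 1, 1, 1]  # B, E, L, I, T, N, A, S, .
heapq.heapify(freqs)
total_bits = 0
while len(freqs) > 1:
    a, b = heapq.heappop(freqs), heapq.heappop(freqs)
    total_bits += a + b
    heapq.heappush(freqs, a + b)
print(total_bits)  # 40
```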

6. Given the following directed acyclic graph (DAG) with vertices labelled as numbers, perform
topological sorting on the graph:
Vertices: {5, 3, 2, 8, 6}
Edges: (5, 3), (5, 2), (3, 8), (2, 8), (6, 8), (6, 3)
Show each step of the sorting process and the final sorted result. [5Marks]

Answer –

Let’s perform a topological sort on the given directed acyclic graph (DAG). Here are the
vertices and edges:
• Vertices: {5, 3, 2, 8, 6}
• Edges: (5, 3), (5, 2), (3, 8), (2, 8), (6, 8), (6, 3)
Steps for Topological Sorting:
1. Identify vertices with no incoming edges:
o Vertices 5 and 6 have no incoming edges.
2. Select one of these vertices:
o Let’s start with vertex 5.
3. Remove vertex 5 and its outgoing edges:
o Remove edges (5, 3) and (5, 2).
o Remaining vertices: {3, 2, 8, 6}
o Remaining edges: (3, 8), (2, 8), (6, 8), (6, 3)
4. Add vertex 5 to the sorted list:
o Sorted list: 5
5. Identify vertices with no incoming edges:
o Vertices 6 and 2 have no incoming edges.
6. Select one of these vertices:
o Let’s choose vertex 6.
7. Remove vertex 6 and its outgoing edges:
o Remove edges (6, 8) and (6, 3).
o Remaining vertices: {3, 2, 8}
o Remaining edges: (3, 8), (2, 8)
8. Add vertex 6 to the sorted list:
o Sorted list: [5, 6]
9. Identify vertices with no incoming edges:
o Vertices 2 and 3 have no incoming edges.
10. Select one of these vertices:
o Let’s choose vertex 2.
11. Remove vertex 2 and its outgoing edge:
o Remove edge (2, 8).
o Remaining vertices: {3, 8}
o Remaining edge: (3, 8)
12. Add vertex 2 to the sorted list:
o Sorted list: [5, 6, 2]
13. Identify vertices with no incoming edges:
o Vertex 3 has no incoming edges.
14. Select vertex 3:
o Remove vertex 3 and its outgoing edge (3, 8).
o Remaining vertex: {8}
o No remaining edges.
15. Add vertex 3 to the sorted list:
o Sorted list: [5, 6, 2, 3]
16. Identify vertices with no incoming edges:
o Vertex 8 has no incoming edges.
17. Select vertex 8:
o Add vertex 8 to the sorted list.
o Sorted list: [5, 6, 2, 3, 8]
Final Topologically Sorted Order:
[5, 6, 2, 3, 8]
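The steps above are exactly Kahn's algorithm. A minimal sketch (when several vertices have in-degree 0, the choice follows the input order, matching the walkthrough):

```python
from collections import deque

def topo_sort(vertices, edges):
    """Kahn's algorithm: repeatedly remove a vertex with in-degree 0."""
    indeg = {v: 0 for v in vertices}
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    queue = deque(v for v in vertices if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order

edges = [(5, 3), (5, 2), (3, 8), (2, 8), (6, 8), (6, 3)]
print(topo_sort([5, 3, 2, 8, 6], edges))  # [5, 6, 2, 3, 8]
```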

7. Consider the following directed graph G. Using Dijkstra’s algorithm, find the sequence of
vertices to be visited if the starting vertex is G. Show the step by step selection of each vertex
in the graph with the updated distance. At the end show the sequence of vertices visited by
following Dijkstra's algorithm. [5Marks]

Answer –

Let’s apply Dijkstra’s algorithm to the given directed graph starting from vertex G. Here are
the steps:
Step 1: Initialize Distances
• Set the distance to the starting vertex G to 0.
• Set the distance to all other vertices to infinity.
• Mark all vertices as unvisited.
Step 2: Select the Vertex with the Smallest Distance
• Initially, this is vertex G with a distance of 0.
Step 3: Update Distances of Adjacent Vertices
• Update the distances of vertices adjacent to G.
Step 4: Mark the Current Vertex as Visited
• Mark G as visited and move to the next vertex with the smallest distance.
Step-by-Step Execution
Start at Vertex G:
o Distance: G = 0, others = ∞
o Adjacent vertices: F (distance 4), E (distance 5)
o Updated distances: F = 4, E = 5
o Mark G as visited.
Select Vertex F (smallest distance 4):
o Distance: F = 4
o Adjacent vertices: C (distance 10 via F)
o Updated distances: C = 10, E remains 5 (since 5 < 4 + 6)
o Mark F as visited.
Select Vertex E (smallest distance 5):
o Distance: E = 5
o Adjacent vertices: D (distance 9 via E)
o Updated distances: D = 9, C remains 10
o Mark E as visited.
Select Vertex D (smallest distance 9):
o Distance: D = 9
o Adjacent vertices: None
o No updates needed.
o Mark D as visited.
Select Vertex C (smallest distance 10):
o Distance: C = 10
o Adjacent vertices: A (distance 11 via C)
o Updated distances: A = 11
o Mark C as visited.
Select Vertex A (smallest distance 11):
o Distance: A = 11
o Adjacent vertices: None
o No updates needed.
o Mark A as visited.
Sequence of Vertices Visited
The sequence of vertices visited by following Dijkstra’s algorithm starting from G is: G -> F -> E
-> D -> C -> A
Vertices B and H are not reachable from G based on the given graph structure.
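Since the figure is not reproduced here, the sketch below assumes edge weights inferred from the walkthrough (G→F=4, G→E=5, F→E=6, F→C=6, E→D=4, C→A=1); these are assumptions, not the original figure.

```python
import heapq

def dijkstra(graph, source):
    """Dijkstra's algorithm with a min-heap; returns the final distances
    and the order in which vertices are settled."""
    dist = {source: 0}
    order, seen = [], set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in seen:
            continue
        seen.add(u)
        order.append(u)
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist, order

# Edge weights inferred from the walkthrough above (assumed, since the
# original figure is not reproduced in this copy).
graph = {"G": [("F", 4), ("E", 5)], "F": [("C", 6), ("E", 6)],
         "E": [("D", 4)], "C": [("A", 1)], "D": [], "A": []}
dist, order = dijkstra(graph, "G")
print(order)  # ['G', 'F', 'E', 'D', 'C', 'A']
print(dist)   # {'G': 0, 'F': 4, 'E': 5, 'C': 10, 'D': 9, 'A': 11}
```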

8. Question –
Construct the AVL tree with the following numbers (by mentioning the rotations used)
12, 15, 36, 17, 34, 85, 64, 19, 3 and find the balance factor of the nodes: 64 and 34.

Answer –
Let’s construct the AVL tree with the given numbers, mentioning the rotations used.
After constructing the tree, we’ll find the balance factors of the nodes 64 and 34
(balance factor = height of left subtree - height of right subtree).
Step-by-Step Construction of the AVL Tree
Insert 12:
o Tree: 12
Insert 15:
o Tree:
12
  \
   15
o No rotation needed.
Insert 36:
o Tree (before rebalancing):
12
  \
   15
     \
      36
o Node 12 becomes unbalanced (balance factor -2, right-right case). Perform a single
left rotation at 12:
   15
  /  \
12    36
Insert 17:
o Tree:
   15
  /  \
12    36
     /
   17
o No rotation needed.
Insert 34:
o Tree (before rebalancing):
   15
  /  \
12    36
     /
   17
     \
      34
o Node 36 becomes unbalanced (balance factor +2) and its left child 17 is right-heavy:
left-right (LR) case. First perform a left rotation at 17, then a right rotation at 36:
   15
  /  \
12    34
     /  \
   17    36
Insert 85:
o Tree (before rebalancing):
   15
  /  \
12    34
     /  \
   17    36
           \
            85
o Node 15 becomes unbalanced (balance factor -2, right-right case). Perform a single
left rotation at 15:
      34
     /  \
   15    36
  /  \     \
12    17    85
Insert 64:
o Tree (before rebalancing):
      34
     /  \
   15    36
  /  \     \
12    17    85
           /
         64
o Node 36 becomes unbalanced (balance factor -2) and its right child 85 is left-heavy:
right-left (RL) case. First perform a right rotation at 85, then a left rotation at 36:
      34
     /  \
   15    64
  /  \   /  \
12   17 36   85
Insert 19:
o Tree:
      34
     /  \
   15    64
  /  \   /  \
12   17 36   85
       \
        19
o No rotation needed.
Insert 3:
o Tree:
        34
       /  \
     15    64
    /  \   /  \
  12   17 36   85
  /      \
 3        19
o No rotation needed.
Final AVL Tree
        34
       /  \
     15    64
    /  \   /  \
  12   17 36   85
  /      \
 3        19

Rotations used: a single left rotation (inserting 36), a left-right double rotation
(inserting 34), a single left rotation (inserting 85), and a right-left double rotation
(inserting 64).

Balance Factors

• Node 64: Balance factor = height of left subtree (36) - height of right subtree (85) = 1 - 1 = 0
• Node 34: Balance factor = height of left subtree (rooted at 15) - height of right subtree (rooted at 64) = 3 - 2 = 1
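The insertions and rotations can be verified with a standard AVL insertion sketch (heights stored on the nodes, the four rotation cases handled explicitly):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.h = key, None, None, 1

def height(n):
    return n.h if n else 0

def bf(n):
    """Balance factor: left height minus right height."""
    return height(n.left) - height(n.right) if n else 0

def update(n):
    n.h = 1 + max(height(n.left), height(n.right))

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y); update(x)
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x); update(y)
    return y

def insert(node, key):
    """AVL insertion with rebalancing on the way back up."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    update(node)
    b = bf(node)
    if b > 1 and key < node.left.key:    # left-left: single right rotation
        return rotate_right(node)
    if b < -1 and key > node.right.key:  # right-right: single left rotation
        return rotate_left(node)
    if b > 1:                            # left-right: double rotation
        node.left = rotate_left(node.left)
        return rotate_right(node)
    if b < -1:                           # right-left: double rotation
        node.right = rotate_right(node.right)
        return rotate_left(node)
    return node

def find(n, key):
    while n and n.key != key:
        n = n.left if key < n.key else n.right
    return n

root = None
for k in [12, 15, 36, 17, 34, 85, 64, 19, 3]:
    root = insert(root, k)

print(root.key)            # 34 (root of the final AVL tree)
print(bf(find(root, 64)))  # 0
print(bf(find(root, 34)))  # 1
```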

9. Question –
Answer the following questions
a) Give a word which is the only non-hyphenated word in the English language with three
consecutive double letters. (0.5M)
b) Give the frequency of each letter. (0.5M)
c) Explain which letter you would expect to find closest to the top of the Huffman Tree when
created for the expected word. (0.5M).
d) Determine the percentage of vowels in the word. (0.5M)

Answer –

(a) Word with three consecutive double letters

The word is: Bookkeeper

(b) Frequency of each letter

| Letter | Frequency |
| --- | --- |
| B | 1 |
| O | 2 |
| K | 2 |
| E | 3 |
| P | 1 |
| R | 1 |

(c) Letter closest to the top of the Huffman Tree

The letter 'E' would be closest to the top of the Huffman Tree since it has the highest
frequency (3) among all letters.

(d) Percentage of vowels

Vowels: O, E (total occurrences: 2 + 3 = 5)


Total letters: 10
Percentage of vowels = (5/10) × 100% = 50%

The word "Bookkeeper" is 50% vowels.
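The frequency count and the vowel percentage can be checked with collections.Counter:

```python
from collections import Counter

word = "bookkeeper"
counts = Counter(word)
print(counts["e"], counts["o"], counts["k"])  # 3 2 2

# Percentage of vowels in the word.
vowel_pct = 100 * sum(counts[v] for v in "aeiou") / len(word)
print(vowel_pct)  # 50.0
```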

Midsemester

1. Given the top pointer of a stack implemented using arrays, write an algorithm to print the
minimum element of the stack and find its running time complexity.

Answer –

Algorithm to Print Minimum Element of Stack

def print_min_element(stack, top):
    if top == -1:  # stack is empty
        return None
    min_element = stack[0]
    for i in range(1, top + 1):
        if stack[i] < min_element:
            min_element = stack[i]
    return min_element

# Alternatively, maintain a separate variable that always holds the
# current minimum, updating it on every push and pop:

def print_min_element_optimized(stack, top, min_element):
    if top == -1:  # stack is empty
        return None
    return min_element

def update_min_element(min_element, new_element):
    if new_element < min_element:
        return new_element
    return min_element

def push(stack, top, max_size, new_element, min_element):
    if top == max_size - 1:  # stack is full
        return top, min_element
    top += 1
    stack[top] = new_element
    min_element = update_min_element(min_element, new_element)
    return top, min_element

def pop(stack, top, min_element):
    if top == -1:  # stack is empty
        return top, min_element
    popped_element = stack[top]
    top -= 1
    if popped_element == min_element:
        min_element = find_new_min(stack, top)
    return top, min_element

def find_new_min(stack, top):
    if top == -1:
        return None
    min_element = stack[0]
    for i in range(1, top + 1):
        if stack[i] < min_element:
            min_element = stack[i]
    return min_element

Running Time Complexity:

- print_min_element: O(n), where n is the number of elements in the stack.


- print_min_element_optimized: O(1), since we maintain a separate variable to track
the minimum element.
- push: O(1), since we update the minimum element when pushing a new element.
- pop: O(n) in the worst case, when the popped element is the minimum and we need
to find the new minimum.

Optimization:

To achieve O(1) time complexity for print_min_element, maintain a separate variable


min_element that tracks the minimum element in the stack. Update this variable
when pushing or popping elements.

Space Complexity:
O(n), where n is the maximum size of the stack, since we use an array to implement
the stack.
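The rescan on pop is what keeps pop at O(n). A common refinement (an assumption here, not part of the original question) stores the running minimum in an auxiliary stack, making push, pop, and minimum all O(1) at the cost of O(n) extra space:

```python
class MinStack:
    """Stack with O(1) push, pop, and minimum, using an auxiliary
    stack that records the minimum alongside each element."""
    def __init__(self):
        self._items = []
        self._mins = []

    def push(self, x):
        self._items.append(x)
        self._mins.append(x if not self._mins else min(x, self._mins[-1]))

    def pop(self):
        self._mins.pop()
        return self._items.pop()

    def minimum(self):
        return self._mins[-1]

s = MinStack()
for x in [5, 2, 7, 1]:
    s.push(x)
print(s.minimum())  # 1
s.pop()
print(s.minimum())  # 2
```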

2. Sort the following array in ascending order using heap sort. Show all the steps.
20, 10, 5, 7, 9, 25, 30, 15, 12, 35

Answer –

Here are the steps to sort the array using Heap Sort:

Initial Array:
20, 10, 5, 7, 9, 25, 30, 15, 12, 35

Step 1: Build Max Heap

Using 0-based indices, the last non-leaf node is index 4 (value 9). Heapify from there
back to the root:

1. Index 4 (value 9): its only child is 35 (index 9); swap → 20, 10, 5, 7, 35, 25, 30, 15, 12, 9
2. Index 3 (value 7): children are 15 and 12; swap with 15 → 20, 10, 5, 15, 35, 25, 30, 7, 12, 9
3. Index 2 (value 5): children are 25 and 30; swap with 30 → 20, 10, 30, 15, 35, 25, 5, 7, 12, 9
4. Index 1 (value 10): children are 15 and 35; swap with 35 → 20, 35, 30, 15, 10, 25, 5, 7, 12, 9
5. Index 0 (value 20): children are 35 and 30; swap with 35 → 35, 20, 30, 15, 10, 25, 5, 7, 12, 9

Max Heap:
35, 20, 30, 15, 10, 25, 5, 7, 12, 9

Step 2: Extract Maximum Element (Root) and Heapify

1. Swap the root (35) with the last element (9).
2. Remove 35 from the heap (it is now in its final position).
3. Heapify the reduced heap.

Heap after Extraction:
30, 20, 25, 15, 10, 9, 5, 7, 12

Repeat Step 2 until the heap is empty:

Iteration 2: swap 30 with 12, remove 30; heapify → 25, 20, 12, 15, 10, 9, 5, 7
Iteration 3: swap 25 with 7, remove 25; heapify → 20, 15, 12, 7, 10, 9, 5
Iteration 4: swap 20 with 5, remove 20; heapify → 15, 10, 12, 7, 5, 9
Iteration 5: swap 15 with 9, remove 15; heapify → 12, 10, 9, 7, 5
Iteration 6: swap 12 with 5, remove 12; heapify → 10, 7, 9, 5
Iteration 7: swap 10 with 5, remove 10; heapify → 9, 7, 5
Iteration 8: swap 9 with 5, remove 9; heapify → 7, 5
Iteration 9: swap 7 with 5, remove 7 → 5

Sorted Array:
5, 7, 9, 10, 12, 15, 20, 25, 30, 35

Heap Sort has a time complexity of O(n log n) and is an efficient sorting algorithm
for large datasets.
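The walkthrough corresponds to the following in-place heap sort sketch:

```python
def heap_sort(a):
    """In-place heap sort: build a max heap, then repeatedly swap the
    root with the last element of the heap and sift down."""
    n = len(a)

    def sift_down(i, size):
        while True:
            largest, l, r = i, 2 * i + 1, 2 * i + 2
            if l < size and a[l] > a[largest]:
                largest = l
            if r < size and a[r] > a[largest]:
                largest = r
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    for i in range(n // 2 - 1, -1, -1):  # build the max heap
        sift_down(i, n)
    for end in range(n - 1, 0, -1):      # extract the maximum repeatedly
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
    return a

print(heap_sort([20, 10, 5, 7, 9, 25, 30, 15, 12, 35]))
# [5, 7, 9, 10, 12, 15, 20, 25, 30, 35]
```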

3. Question –
Your company has assigned a task for you to develop software for a small inventory
management system used by a local grocery store. The system needs to maintain an
ordered list of products based on their expiry dates, with the product closest to
expiration appearing first. The list will be updated frequently as new products arrive
and as existing products are sold. [ 5Marks]

i) Considering the requirements of the inventory management system, discuss the


suitability of using the Insertion Sort algorithm for the grocery store.
ii) Would you recommend any other algorithms? Justify your recommendation.

Answer –

(i) Suitability of Insertion Sort

Insertion Sort can be suitable for the inventory management system due to the
following reasons:

1. Frequent Updates: Insertion Sort is efficient for small, frequent updates, as it only
sorts the newly inserted element.
2. Maintaining Order: Insertion Sort preserves the existing order, ensuring the list
remains sorted by expiry dates.
3. Simple Implementation: Insertion Sort has a straightforward implementation.

However, Insertion Sort also has some limitations:

1. Time Complexity: Insertion Sort has a worst-case and average time complexity of
O(n^2), which may lead to performance issues for large inventory lists.
2. Scalability: As the inventory grows, Insertion Sort's efficiency decreases.

(ii) Alternative Algorithms


Considering the potential scalability issues with Insertion Sort, I recommend the
following algorithms:

1. Binary Search Tree (BST):


- Time Complexity: O(log n) for insertion, deletion, and search operations.
- Efficient Updates: BST maintains the sorted order while handling frequent
updates.
- Scalability: Suitable for large inventory lists.
2. Heap Data Structure:
- Time Complexity: O(log n) for insertion and deletion operations.
- Maintaining Order: Heap ensures the product closest to expiration appears first.
- Scalability: Efficient for large inventory lists.
3. Balanced Binary Search Tree (AVL/RB Tree):
- Time Complexity: O(log n) for insertion, deletion, and search operations.
- Self-Balancing: Ensures the tree remains approximately balanced, maintaining
efficiency.

These data structures and algorithms offer better scalability and performance for
large inventory lists, making them more suitable for the grocery store's inventory
management system.

Justification:

- Frequent updates and maintenance of sorted order require efficient algorithms with
low time complexity.
- Scalability is crucial to handle growing inventory lists.
- BST, Heap, and Balanced BST algorithms provide optimal performance and
scalability.

Recommendation:

- Implement a Binary Search Tree (BST) or Heap data structure to manage the
inventory list.
- Consider using a self-balancing BST (AVL/RB Tree) for optimal performance.

By choosing an efficient algorithm and data structure, the inventory management


system will maintain optimal performance, scalability, and accuracy, ensuring the
grocery store's products are always organized by expiry date.
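As a concrete illustration of the heap recommendation, a minimal sketch keyed on expiry date (the product names and dates are made-up examples):

```python
import heapq
from datetime import date

# A min-heap keyed on expiry date keeps the product closest to
# expiration at the front, with O(log n) arrivals and removals.
inventory = []
heapq.heappush(inventory, (date(2025, 3, 1), "milk"))
heapq.heappush(inventory, (date(2025, 2, 10), "yogurt"))
heapq.heappush(inventory, (date(2025, 6, 1), "pasta"))

expiry, product = heapq.heappop(inventory)  # product closest to expiry
print(product)  # yogurt
```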

4. Question –
Consider an application scenario where Bucket Sort is utilized for sorting large- scale
genomic data consisting of DNA sequences. Discuss how the choice of bucket size
impacts the efficiency and correctness of the sorting process in this context.

Answer –

Bucket Sort for Genomic Data


Bucket Sort is a suitable algorithm for sorting large-scale genomic data, particularly
DNA sequences, due to its:

1. Efficiency: Bucket Sort has a time complexity of O(n + k) in the best case, where n is
the number of elements and k is the number of buckets.
2. Stability: Bucket Sort is a stable sorting algorithm, preserving the relative order of
equal elements.
3. Parallelization: Bucket Sort can be easily parallelized, making it suitable for large-
scale genomic data.

Impact of Bucket Size

The choice of bucket size significantly impacts the efficiency and correctness of the
sorting process:

Correctness

1. Ordered buckets: correctness requires that every sequence placed in bucket i sorts
before every sequence in bucket i+1; if the bucket boundaries do not respect the sort
order, concatenating the sorted buckets yields a wrongly ordered result.
2. Per-bucket sort: as long as the boundaries are order-preserving, the number of
buckets affects efficiency rather than correctness.

Efficiency

1. Too few buckets (large buckets): many sequences share a bucket, so the per-bucket
comparison sort dominates and the running time degrades toward that of an ordinary sort.
2. Too many buckets (small buckets): memory usage and the overhead of scanning
mostly empty buckets grow unnecessarily.
3. Optimal bucket count: balancing the number of buckets (k) against the number of
elements (n), keeping the load factor (average number of elements per bucket) close
to 1, gives near-linear behavior on uniformly distributed data.

Guidelines for Choosing Bucket Size

1. Data distribution: Analyze the distribution of DNA sequences to determine an


optimal bucket size.
2. Memory constraints: Consider available memory to avoid excessive bucket
creation.
3. Computational resources: Balance computational resources with the number of
buckets.

DNA Sequence-Specific Considerations

1. Sequence length: Consider varying sequence lengths when determining bucket


size.
2. Sequence similarity: Account for similar sequences when choosing bucket size.
3. Alphabet size: Consider the size of the DNA alphabet (4 nucleotides: A, C, G, T).

Example Bucket Size Calculation

Suppose we have:

- 1 million DNA sequences (n = 1,000,000)


- Average sequence length: 1000 nucleotides
- Available memory: 16 GB
- Computational resources: 16 CPU cores

A suitable choice might be:

- 1000 buckets (k = 1000)
- Load factor: n/k = 1,000,000 / 1000 = 1000 elements per bucket, so each bucket can
be sorted independently (for example, distributing buckets across the 16 cores)

By carefully selecting the bucket size, Bucket Sort can efficiently and accurately sort
large-scale genomic data.

Real-World Applications

1. Genomic assembly: Sorting DNA sequences for genome assembly.


2. Variant detection: Identifying genetic variations through sorted DNA sequences.
3. Gene expression analysis: Analyzing gene expression levels through sorted RNA-
seq data.

In conclusion, choosing an optimal bucket size is crucial for efficient and accurate
sorting of genomic data using Bucket Sort.
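A minimal illustration of the idea, bucketing by a short prefix over the A/C/G/T alphabet (the sequences are made-up examples, and the prefix length stands in for the bucket-size choice discussed above):

```python
def bucket_sort_dna(seqs, prefix_len=2):
    """Bucket DNA sequences by a short prefix, then sort each bucket.
    The bucket count is at most 4**prefix_len (alphabet A, C, G, T)."""
    buckets = {}
    for s in seqs:
        buckets.setdefault(s[:prefix_len], []).append(s)
    result = []
    for key in sorted(buckets):          # buckets in prefix order
        result.extend(sorted(buckets[key]))  # sort within each bucket
    return result

seqs = ["GATTACA", "ACGT", "TTAGGG", "ACGA", "CCCT"]
print(bucket_sort_dna(seqs))
# ['ACGA', 'ACGT', 'CCCT', 'GATTACA', 'TTAGGG']
```

Because the prefix order agrees with lexicographic order, concatenating the sorted buckets gives a fully sorted list; a longer prefix means more, smaller buckets.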

5. Question –
Consider the following data: [ 5Marks]
Hash(x)=x mod 11, hi=(Hash(x)+F(i))%11
Where F(i)=i*h2(x) with h2(x)=7-(xmod7) and i is the ith time hashing is applied for
same value of x when collision occurs. Solve the following double hashing problem
for the input 97,30,108,33,44 clearly indicating the collisions.

Answer –

Let's solve the double hashing problem step by step.

Hash Function 1: Hash(x) = x mod 11

Hash Function 2: h2(x) = 7 - (x mod 7)

F(i) = i * h2(x)

hi = (Hash(x) + F(i)) % 11

Input Values: 97, 30, 108, 33, 44

Step 1: Hash 97

Hash(97) = 97 mod 11 = 9

h2(97) = 7 - (97 mod 7) = 7 - 6 = 1


No collision, store 97 at index 9.

Step 2: Hash 30

Hash(30) = 30 mod 11 = 8

h2(30) = 7 - (30 mod 7) = 7 - 2 = 5

No collision, store 30 at index 8.

Step 3: Hash 108

Hash(108) = 108 mod 11 = 9

Collision! (index 9 is already occupied by 97)

h2(108) = 7 - (108 mod 7) = 7 - 3 = 4

F(1) = 1 * h2(108) = 1 * 4 = 4

h1 = (9 + 4) % 11 = 2

Store 108 at index 2.

Step 4: Hash 33

Hash(33) = 33 mod 11 = 0

No collision, store 33 at index 0.

Step 5: Hash 44

Hash(44) = 44 mod 11 = 0

Collision! (index 0 is already occupied by 33)

h2(44) = 7 - (44 mod 7) = 7 - 2 = 5

F(1) = 1 * h2(44) = 1 * 5 = 5

h1 = (0 + 5) % 11 = 5

Index 5 is free, so store 44 at index 5.

Final Hash Table:

Index | Value
------|------
0 | 33
2 | 108
5 | 44
8 | 30
9 | 97

Collisions:

- 108 collided with 97 at index 9 and was placed at index 2 after one probe.
- 44 collided with 33 at index 0 and was placed at index 5 after one probe.

Double hashing resolves collisions efficiently by using a secondary hash function to
probe alternative indices.
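The whole insertion sequence can be replayed with a short sketch of the probing scheme:

```python
def double_hash_insert(table, x):
    """Insert x using Hash(x) = x % 11 and step h2(x) = 7 - (x % 7),
    probing h_i = (Hash(x) + i * h2(x)) % 11 on collisions."""
    m = len(table)
    h, step = x % m, 7 - (x % 7)
    i = 0
    while table[(h + i * step) % m] is not None:
        i += 1  # collision: try the next probe
    table[(h + i * step) % m] = x

table = [None] * 11
for x in [97, 30, 108, 33, 44]:
    double_hash_insert(table, x)

print({i: v for i, v in enumerate(table) if v is not None})
# {0: 33, 2: 108, 5: 44, 8: 30, 9: 97}
```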

6. Question –
Perform one round of partitioning using Quick sort with n/3 th element as pivot for
the given integer set {21, 28, 34, 7, 67, 31, 42, 50, 56}

Answer –

Let's perform one round of partitioning using Quick sort with the (n/3)th element as
the pivot.

Given array: {21, 28, 34, 7, 67, 31, 42, 50, 56}

n = 9

n/3 = 3

Choose the 3rd element (34) as the pivot.

Partitioning:

- Elements less than the pivot (34): {21, 28, 7, 31}
- Element equal to the pivot: {34}
- Elements greater than the pivot (34): {67, 42, 50, 56}

After partitioning:

{21, 28, 7, 31, 34, 67, 42, 50, 56}

The pivot (34) is now in its final sorted position: everything to its left is smaller and
everything to its right is larger, although neither partition is itself sorted after a
single round.

This completes one round of partitioning using Quick sort with the (n/3)th element as
the pivot.

Next steps would involve recursively sorting the left and right partitions.
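The round can be reproduced with a simple (stable) three-way partition, which keeps the relative order within each side:

```python
def partition_round(a, pivot_index):
    """One round of stable three-way partitioning around a chosen pivot."""
    pivot = a[pivot_index]
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return less + equal + greater

a = [21, 28, 34, 7, 67, 31, 42, 50, 56]
print(partition_round(a, len(a) // 3 - 1))  # pivot is the 3rd element, 34
# [21, 28, 7, 31, 34, 67, 42, 50, 56]
```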
