BDA April-May 2024 Answers
Part – A
o Genetic Algorithm: Optimization technique inspired by the process of natural selection to solve problems by
evolving a set of potential solutions.
o Genetic Programming: A specialization of genetic algorithms where the solutions are computer programs
represented as tree structures, which evolve over time.
o NameNode: Manages the metadata of the file system and keeps track of block locations in the Hadoop Distributed
File System (HDFS).
o DataNode: Stores the actual data blocks and communicates with the NameNode for read/write operations.
RDBMS: Reliable for structured data and ACID (Atomicity, Consistency, Isolation, Durability) compliance, making it
suitable for critical transactional systems.
NoSQL: Better for handling unstructured or semi-structured data and scaling horizontally, but many NoSQL stores relax
strict ACID guarantees (for example, by offering eventual consistency).
print(first_row)
print(match_pos)
Part – B
In big data analytics, genetic algorithms are often used for optimization tasks such as tuning machine learning
hyperparameters, query optimization, and clustering large datasets.
Crossover recombines parts of two parent solutions, and mutation randomly alters individual genes. Together they let the
population explore the search space and evolve better solutions over successive generations.
Chromosome: 11100110
One-Point Crossover:
Two-Point Crossover:
Two crossover points are selected, say after the 2nd and 6th bits.
Parent 1: 11|1001|10
Parent 2: 00|1110|01 (example second parent)
Resulting offspring:
o Offspring 1: 11111010
o Offspring 2: 00100101
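The two-point crossover above can be reproduced with a short R sketch. The function name two_point_crossover and its arguments are illustrative assumptions, not part of the original answer; chromosomes are treated as bit strings.

```r
# Hypothetical helper: swap the middle segment of two equal-length bit strings.
# cut1 and cut2 are the positions after which the two cuts fall (cut1 < cut2).
two_point_crossover <- function(p1, p2, cut1, cut2) {
  len <- nchar(p1)
  stopifnot(nchar(p2) == len, cut1 >= 1, cut1 < cut2, cut2 < len)
  # Offspring 1: head and tail of parent 1, middle of parent 2.
  child1 <- paste0(substr(p1, 1, cut1),
                   substr(p2, cut1 + 1, cut2),
                   substr(p1, cut2 + 1, len))
  # Offspring 2: head and tail of parent 2, middle of parent 1.
  child2 <- paste0(substr(p2, 1, cut1),
                   substr(p1, cut1 + 1, cut2),
                   substr(p2, cut2 + 1, len))
  c(child1, child2)
}

# Parents from the example, with cuts after the 2nd and 6th bits.
offspring <- two_point_crossover("11100110", "00111001", 2, 6)
print(offspring)
```

With these inputs the sketch yields the same two offspring as the worked example.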
Mutation:
13 a) Moments:
The second moment (also known as the surprise number) is the sum of the squared frequencies f_i of the distinct elements in the stream. It is defined as:
F_2 = \sum_i f_i^2
Given Stream:
3, 1, 4, 1, 3, 4, 2, 1, 2
Frequency Count:
1 appears 3 times
2 appears 2 times
3 appears 2 times
4 appears 2 times
F_2 = 3^2 + 2^2 + 2^2 + 2^2
F_2 = 9 + 4 + 4 + 4 = 21
The third moment is defined analogously, using the cubes of the frequencies:
F_3 = \sum_i f_i^3
Calculation:
F_3 = 3^3 + 2^3 + 2^3 + 2^3
F_3 = 27 + 8 + 8 + 8 = 51
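The hand calculations above can be checked with a small R sketch. The helper name frequency_moment is an assumption made for illustration; it computes the exact k-th moment by counting every element, which is only practical when the whole stream fits in memory.

```r
# Hypothetical helper: exact k-th frequency moment of a stream.
frequency_moment <- function(stream, k) {
  counts <- as.vector(table(stream))  # frequency f_i of each distinct element
  sum(counts^k)                       # F_k = sum over i of f_i^k
}

stream <- c(3, 1, 4, 1, 3, 4, 2, 1, 2)
f2 <- frequency_moment(stream, 2)
f3 <- frequency_moment(stream, 3)
cat("F2 =", f2, " F3 =", f3, "\n")
```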
The DGIM algorithm is designed to efficiently estimate the number of 1s in the most recent N elements of a binary data stream
using limited memory. It is commonly used in applications like network traffic monitoring, clickstream analysis, and sensor data
processing.
Key Idea
Instead of storing every bit in the stream, DGIM groups bits into buckets and maintains only a logarithmic number of buckets,
thereby reducing memory usage while providing approximate results.
Example Explanation
Problem Statement:
1. Bucket Representation:
o Each bucket covers a segment of the stream and stores only its size (the number of 1s it contains) and the timestamp of its most recent 1.
o Merge buckets when needed to maintain the constraint that there are at most two buckets of the same size.
2. Bucket Management:
o The algorithm keeps buckets of sizes 1, 2, 4, etc. (powers of 2).
o If there are more than two buckets of the same size, merge the two oldest of them into one bucket of twice the size.
3. Counting the 1s:
o Add the sizes of all buckets that lie entirely within the N most recent elements.
o For the oldest bucket, which may only partially overlap the window, add half of its size as an estimate.
Example Calculation:
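If the three buckets in the window have sizes 2, 2, and 4 and all lie entirely within the window, the count of 1s would be:
2 + 2 + 4 = 8
If the oldest bucket (size 4) only partially overlaps the window, DGIM counts half of it, giving an estimate of 2 + 2 + 2 = 6.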
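The bucket rules can be sketched in R as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names dgim_update and dgim_estimate are hypothetical, buckets are kept newest-first in a data frame, and the estimate uses the usual DGIM convention of counting half of the oldest bucket.

```r
# Buckets are rows of a data frame, newest first:
#   time = timestamp of the bucket's most recent 1, size = number of 1s.
dgim_update <- function(buckets, bit, t, N) {
  # Drop buckets whose most recent 1 has left the window of the last N bits.
  buckets <- buckets[buckets$time > t - N, , drop = FALSE]
  if (bit == 1) {
    # A new 1 starts its own bucket of size 1 at the front.
    buckets <- rbind(data.frame(time = t, size = 1), buckets)
    size <- 1
    # Whenever three buckets share a size, merge the two oldest of them.
    while (sum(buckets$size == size) > 2) {
      idx <- which(buckets$size == size)  # ordered newest to oldest
      keep <- idx[length(idx) - 1]        # newer of the two oldest
      drop <- idx[length(idx)]            # the oldest
      buckets$size[keep] <- 2 * size      # merged bucket keeps the newer timestamp
      buckets <- buckets[-drop, , drop = FALSE]
      size <- 2 * size
    }
  }
  buckets
}

# Estimate: all bucket sizes, but only half of the oldest bucket.
dgim_estimate <- function(buckets) {
  if (nrow(buckets) == 0) return(0)
  sum(buckets$size) - buckets$size[nrow(buckets)] / 2
}

# Feed eight 1s into a window of N = 8 (illustrative input).
buckets <- data.frame(time = numeric(0), size = numeric(0))
for (t in 1:8) buckets <- dgim_update(buckets, 1, t, N = 8)
print(buckets$size)
print(dgim_estimate(buckets))
```

Storing only the timestamp and size per bucket is what keeps memory logarithmic in N; the price is that the contribution of the oldest bucket can only be approximated.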
# Performing multiplication
result_matrix <- multiplyMatrices(matrix1, matrix2)
show(result_matrix)
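The definition of multiplyMatrices is not shown in the fragment above. A minimal sketch, assuming it simply checks dimensions and wraps R's built-in %*% operator, might look like this; the sample matrices are hypothetical inputs chosen for illustration.

```r
# Assumed definition: multiply two matrices after checking that they conform.
multiplyMatrices <- function(a, b) {
  if (ncol(a) != nrow(b)) {
    stop("Number of columns of the first matrix must equal number of rows of the second.")
  }
  a %*% b
}

# Hypothetical sample inputs (R fills matrices column by column).
matrix1 <- matrix(1:4, nrow = 2)
matrix2 <- matrix(5:8, nrow = 2)
result_matrix <- multiplyMatrices(matrix1, matrix2)
print(result_matrix)
```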
# Example usage
string <- "index.html"
suffixes <- c("html", "php", "txt")
result <- check_suffix(string, suffixes)
cat("Does the string have a suffix?", result, "\n")
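The body of check_suffix is not shown in the fragment above. One plausible sketch, assuming the function should return TRUE when the string ends with any of the given suffixes, uses base R's endsWith:

```r
# Assumed definition: TRUE if `string` ends with at least one of `suffixes`.
check_suffix <- function(string, suffixes) {
  any(endsWith(string, suffixes))
}

has_html <- check_suffix("index.html", c("html", "php", "txt"))
has_none <- check_suffix("archive.zip", c("html", "php", "txt"))
cat("index.html:", has_html, " archive.zip:", has_none, "\n")
```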
16 b) ii) R Program to Find Substrings of Length 'n' Starting and Ending with the Same Character
return(substrings)
# Example usage
n <- 3
cat("Substrings of length", n, "that start and end with the same character:", result, "\n")
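Only the tail of this program survives above. A minimal sketch of the missing part, assuming a function (here given the hypothetical name find_substrings) that slides a window of length n over the string, could be:

```r
# Hypothetical reconstruction: collect every substring of length n
# whose first and last characters are the same.
find_substrings <- function(string, n) {
  len <- nchar(string)
  substrings <- character(0)
  if (n < 1 || n > len) return(substrings)
  for (start in seq_len(len - n + 1)) {
    candidate <- substr(string, start, start + n - 1)
    if (substr(candidate, 1, 1) == substr(candidate, n, n)) {
      substrings <- c(substrings, candidate)
    }
  }
  return(substrings)
}

# Illustrative input string (not from the original answer).
matches <- find_substrings("abacabad", 3)
print(matches)
```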