BDA April-May 2024 Answers

The document discusses various optimization techniques including Genetic Algorithms and Simulated Annealing, highlighting their applications and differences. It also covers data structures in Hadoop, the reliability of RDBMS vs. NoSQL, and provides R programming examples for matrix operations and string manipulations. Additionally, it introduces the DGIM algorithm for estimating binary data stream counts and explains statistical moments for data analysis.

Uploaded by

Boopathy S

BDA April-May 2024

Part – A

3. Genetic Algorithm vs. Genetic Programming

o Genetic Algorithm: Optimization technique inspired by the process of natural selection to solve problems by
evolving a set of potential solutions.
o Genetic Programming: A specialization of genetic algorithms where the solutions are computer programs
represented as tree structures, which evolve over time.

4. Purpose of Simulated Annealing:


Simulated annealing is a probabilistic optimization technique used to find an approximate global optimum for functions with
multiple local optima. It mimics the annealing process in metallurgy, where gradual cooling leads to a stable crystal structure.
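
The idea can be sketched in R as follows. This is a minimal illustration, not a reference implementation: the function name simulated_annealing, the test function f, the starting temperature, the cooling rate, and the step count are all assumptions chosen for the example.

```r
# Minimal simulated annealing sketch for minimizing a one-dimensional function.
# All parameter values below are illustrative assumptions.
simulated_annealing <- function(f, x0, temp = 10, cooling = 0.95, steps = 500) {
  x <- x0
  fx <- f(x)
  best <- x
  fbest <- fx
  for (k in seq_len(steps)) {
    cand <- x + rnorm(1)          # propose a random neighbour
    fc <- f(cand)
    # Always accept improvements; accept worse moves with a probability
    # that shrinks as the temperature falls
    if (fc < fx || runif(1) < exp((fx - fc) / temp)) {
      x <- cand
      fx <- fc
    }
    if (fx < fbest) {
      best <- x
      fbest <- fx
    }
    temp <- temp * cooling        # gradual cooling
  }
  list(x = best, value = fbest)
}

set.seed(42)
f <- function(x) x^2 + 10 * sin(3 * x)   # a function with several local minima
res <- simulated_annealing(f, x0 = 4)
print(res)
```

Accepting some worse moves while the temperature is high is what lets the search leave a local optimum; as the temperature drops, the search behaves more and more like plain hill climbing.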

5. Examples of Stream Sources:

1. Keyboard inputs (stdin)
2. Network sockets
3. Sensor data streams
4. Real-time stock price feeds

7. Namenode vs. Datanode:

o Namenode: Manages the metadata of the file system and keeps track of file locations in Hadoop Distributed
File System (HDFS).
o Datanode: Stores the actual data blocks and communicates with the Namenode for read/write operations.

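8. Which is more reliable, RDBMS or NoSQL? Why?

• RDBMS: Generally the more reliable choice for structured data, because ACID (Atomicity, Consistency, Isolation, Durability) transactions keep committed data consistent. This makes it suitable for critical transactional systems.
• NoSQL: Better for handling unstructured or semi-structured data and for scaling horizontally, but many NoSQL systems relax strict ACID guarantees (for example, by offering eventual consistency) in exchange for availability and scale.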

9. R Program to Create a 2x2 Matrix and Display Its First Row:

# Creating a 2x2 matrix (values fill column by column, so the rows are 1 3 and 2 4)
matrix_data <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)

# Displaying the first row
first_row <- matrix_data[1, ]
print(first_row)

10. Illustration of regexpr() in R Programming:

text <- "This is a test string"
pattern <- "test"

# regexpr() returns the position of the first match, or -1 if there is none
match_pos <- regexpr(pattern, text)
print(match_pos)
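
To show what the value returned by regexpr() holds, the following sketch reads the match position and length and extracts the matched text with regmatches(). The variable names are illustrative only.

```r
text <- "This is a test string"
match_pos <- regexpr("test", text)

start <- as.integer(match_pos)                 # 11: "test" starts at character 11
len <- attr(match_pos, "match.length")         # 4: the match is 4 characters long
matched <- regmatches(text, match_pos)         # "test"

# A pattern that does not occur in the text gives -1
no_match <- as.integer(regexpr("xyz", text))   # -1

print(c(start = start, length = len))
print(matched)
```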
Part – B

12 a). Genetic Algorithm:

• Genetic algorithms are often used for optimization tasks such as tuning machine learning hyperparameters, query optimization, and clustering large datasets.
• Crossover recombines parts of two parent solutions and mutation introduces small random changes; applied over many generations, they evolve better solutions without exhaustively searching a very large solution space.

Chromosome: 11100110

One-Point Crossover:

• A crossover point is selected randomly, say after the 3rd bit.
• Parent 1: 111|00110
• Parent 2: 000|11101 (example second parent)
• Resulting offspring (each child takes the head of one parent and the tail of the other):
o Offspring 1: 11111101
o Offspring 2: 00000110

Two-Point Crossover:

• Two crossover points are selected, say after the 2nd and 6th bits.
• Parent 1: 11|1001|10
• Parent 2: 00|1110|01 (example second parent)
• Resulting offspring (the segments between the two points are swapped):
o Offspring 1: 11111010
o Offspring 2: 00100101
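
Both crossover operators can be reproduced with a short R sketch. The helper names bits, one_point, and two_point are hypothetical, and the code assumes both parents have the same length and that the cut points fall strictly inside the chromosome.

```r
# Split a bit string into a character vector of single bits
bits <- function(s) strsplit(s, "")[[1]]

# One-point crossover: swap everything after position `cut`
one_point <- function(p1, p2, cut) {
  a <- bits(p1)
  b <- bits(p2)
  tail_idx <- (cut + 1):length(a)
  c(paste(c(a[1:cut], b[tail_idx]), collapse = ""),
    paste(c(b[1:cut], a[tail_idx]), collapse = ""))
}

# Two-point crossover: swap the segment between `cut1` and `cut2`
two_point <- function(p1, p2, cut1, cut2) {
  a <- bits(p1)
  b <- bits(p2)
  mid <- (cut1 + 1):cut2
  child1 <- a
  child1[mid] <- b[mid]
  child2 <- b
  child2[mid] <- a[mid]
  c(paste(child1, collapse = ""), paste(child2, collapse = ""))
}

one_pt <- one_point("11100110", "00011101", cut = 3)
two_pt <- two_point("11100110", "00111001", cut1 = 2, cut2 = 6)
print(one_pt)   # "11111101" "00000110"
print(two_pt)   # "11111010" "00100101"
```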

Mutation:

• Mutation involves flipping a bit at a random position.
• Assume the mutation point is the 5th bit.
• Original chromosome: 11100110
• After mutation: 11101110 (5th bit flipped from 0 to 1)
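
The bit flip can be sketched in R as below. The function names flip_bit and random_mutation are hypothetical; flip_bit takes the position explicitly so the worked example is reproducible, while random_mutation picks the position at random.

```r
# Flip the bit at position `pos` of a bit string
flip_bit <- function(chromosome, pos) {
  b <- strsplit(chromosome, "")[[1]]
  b[pos] <- if (b[pos] == "0") "1" else "0"
  paste(b, collapse = "")
}

# Flip one bit chosen uniformly at random
random_mutation <- function(chromosome) {
  flip_bit(chromosome, sample(nchar(chromosome), 1))
}

mutated <- flip_bit("11100110", 5)
print(mutated)   # "11101110"
```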

13 a) Moments:

• Frequency moments summarize how the elements of a large data stream are distributed.
• Higher moments grow quickly when a few elements dominate the stream, which helps detect data anomalies, model variance, and optimize machine learning models over streaming data.

1. Surprise Number (Second Moment)

The second frequency moment (also known as the surprise number) is defined as:

F_2 = \sum_{i} f_i^2

where f_i is the frequency (number of occurrences) of the i-th distinct element in the stream.

Given Stream:

3, 1, 4, 1, 3, 4, 2, 1, 2
Frequency Count:

• 1 appears 3 times
• 2 appears 2 times
• 3 appears 2 times
• 4 appears 2 times

Calculation of Second Moment:

F_2 = 3^2 + 2^2 + 2^2 + 2^2

F_2 = 9 + 4 + 4 + 4 = 21

2. Third Moment Calculation

The third moment is defined as:

F_3 = \sum_{i} f_i^3

Calculation:

F_3 = 3^3 + 2^3 + 2^3 + 2^3

F_3 = 27 + 8 + 8 + 8 = 51
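
Both calculations can be checked with a short R sketch that computes the moments exactly by counting every element. The helper name frequency_moment is hypothetical, and exact counting like this is only practical when the stream fits in memory.

```r
stream <- c(3, 1, 4, 1, 3, 4, 2, 1, 2)

# table() counts how often each distinct element occurs
freq <- table(stream)

# k-th frequency moment: sum of the k-th powers of the frequencies
frequency_moment <- function(freq, k) sum(as.numeric(freq)^k)

F2 <- frequency_moment(freq, 2)
F3 <- frequency_moment(freq, 3)
cat("F2 =", F2, " F3 =", F3, "\n")   # F2 = 21  F3 = 51
```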

13. b) Purpose of the Datar-Gionis-Indyk-Motwani (DGIM) Algorithm

The DGIM algorithm is designed to efficiently estimate the number of 1s in the most recent N elements of a binary data stream
using limited memory. It is commonly used in applications like network traffic monitoring, clickstream analysis, and sensor data
processing.

Key Idea

Instead of storing every bit in the stream, DGIM groups bits into buckets and maintains only a logarithmic number of buckets,
thereby reducing memory usage while providing approximate results.

Example Explanation

Problem Statement:

Suppose we have a binary data stream:


1, 0, 1, 1, 0, 1, 0, 1, 1, 1 (N = 10)

We want to estimate the number of 1s in the last 10 elements.


Steps:

1. Bucket Representation:
o Group the 1s into buckets and, for each bucket, store only its size (the number of 1s it contains) and the timestamp of its most recent 1.
o Merge buckets when needed to maintain the constraint that there are at most two buckets of the same size.
2. Bucket Management:
o The algorithm keeps buckets of sizes 1, 2, 4, etc. (powers of 2).
o If there are more than two buckets of the same size, merge the two oldest into one bucket of twice the size.
o Drop a bucket once its timestamp falls outside the N most recent elements.
3. Counting the 1s:
o Add the sizes of all buckets except the oldest one still in the window.
o For that oldest bucket, add only half its size, because only its most recent 1 is known to lie within the N most recent elements.

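Example Calculation:

If the buckets in the window have sizes 2, 2, and 4 (newest to oldest), the two newer buckets are counted in full and the oldest contributes half its size:

2 + 2 + 4/2 = 6

This is an approximation because only part of the oldest bucket may lie within the window; the true count here is between 5 and 8.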
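
The bucket rules can be sketched in R and run on the example stream. This is a simplified illustration of the standard DGIM rules, not a tuned implementation: the function names dgim_update and dgim_estimate and the data-frame layout (columns ts and size, newest bucket first) are assumptions made for the example.

```r
# Process one bit at time t for a window of the last N bits.
# `buckets` is a data frame, newest bucket first, with columns:
#   ts   = timestamp of the bucket's most recent 1
#   size = number of 1s in the bucket (a power of 2)
dgim_update <- function(buckets, bit, t, N) {
  # Drop buckets whose most recent 1 has left the window
  buckets <- buckets[buckets$ts > t - N, , drop = FALSE]
  if (bit == 0) return(buckets)
  # A new 1 starts its own bucket of size 1 at the front
  buckets <- rbind(data.frame(ts = t, size = 1), buckets)
  size <- 1
  repeat {
    idx <- which(buckets$size == size)
    if (length(idx) <= 2) break
    # Merge the two oldest buckets of this size; the merged bucket keeps
    # the timestamp of the more recent of the two
    older <- idx[length(idx)]
    newer <- idx[length(idx) - 1]
    buckets$size[newer] <- 2 * size
    buckets <- buckets[-older, , drop = FALSE]
    size <- 2 * size
  }
  buckets
}

# Estimate: every bucket in full except the oldest, which counts as half
dgim_estimate <- function(buckets) {
  if (nrow(buckets) == 0) return(0)
  sizes <- buckets$size
  sum(sizes) - sizes[length(sizes)] / 2
}

stream <- c(1, 0, 1, 1, 0, 1, 0, 1, 1, 1)
N <- 10
buckets <- data.frame(ts = numeric(0), size = numeric(0))
for (t in seq_along(stream)) {
  buckets <- dgim_update(buckets, stream[t], t, N)
}
print(buckets)
cat("Estimate:", dgim_estimate(buckets), " True count:", sum(stream), "\n")
```

For this stream the sketch ends with three buckets of sizes 1, 2, and 4 and estimates 1 + 2 + 4/2 = 5, while the true count is 7. The gap comes entirely from counting only half of the oldest bucket.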

Applications in Big Data:

• Network Traffic Analysis: Monitoring active connections over time.
• Clickstream Analysis: Counting recent user clicks without storing entire histories.
• IoT Systems: Tracking sensor activations in resource-constrained environments.

16 a) Create a Class for a Matrix and Perform Matrix Multiplication in R

# Define a custom S4 class, MomMatrix, that wraps a matrix
MomMatrix <- setClass("MomMatrix", slots = list(data = "matrix"))

# Method to print the matrix
setMethod("show", "MomMatrix", function(object) {
  cat("Mom Matrix:\n")
  print(object@data)
})

# Generic and method for matrix multiplication
setGeneric("multiplyMatrices", function(x, y) standardGeneric("multiplyMatrices"))
setMethod("multiplyMatrices", c("MomMatrix", "MomMatrix"), function(x, y) {
  # %*% requires ncol(x) == nrow(y) and raises an error otherwise
  result_data <- x@data %*% y@data
  MomMatrix(data = result_data)
})

# Creating two MomMatrix objects (matrix() fills column by column)
matrix1 <- MomMatrix(data = matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2))
matrix2 <- MomMatrix(data = matrix(c(2, 0, 1, 2), nrow = 2, ncol = 2))

# Displaying the matrices
show(matrix1)
show(matrix2)

# Performing multiplication
result_matrix <- multiplyMatrices(matrix1, matrix2)
show(result_matrix)

16 b) i) R Program to Check if String has a Given Suffix


# Function to check whether a string ends with any of the given suffixes
check_suffix <- function(string, suffixes) {
  # endsWith() compares literally, so characters such as "." are not treated as regex
  any(endsWith(string, suffixes))
}

# Example usage
string <- "index.html"
suffixes <- c("html", "php", "txt")
result <- check_suffix(string, suffixes)
cat("Does the string end with one of the suffixes?", result, "\n")

16 b) ii) R Program to Find Substrings of Length 'n' Starting and Ending with the Same Character

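find_substrings <- function(string, n) {
  substrings <- c()
  length_str <- nchar(string)
  # No substring of length n exists if the string is too short
  if (n < 1 || n > length_str) {
    return(substrings)
  }
  for (i in 1:(length_str - n + 1)) {
    candidate <- substr(string, i, i + n - 1)
    # Compare the first and last characters of the candidate
    if (substr(candidate, 1, 1) == substr(candidate, n, n)) {
      substrings <- c(substrings, candidate)
    }
  }
  return(substrings)
}

# Example usage (n = 3 finds nothing in "abcabc"; n = 4 finds abca, bcab, cabc)
string <- "abcabc"
n <- 4
result <- find_substrings(string, n)
cat("Substrings of length", n, "that start and end with the same character:", result, "\n")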
