Assignment (1)
Semester-I 2024–2025
M.Tech.(CS) - First Year
Assignment (Due date: 20 December, 2024)
Subject: Computing Laboratory
Total: 4×15 = 60 marks
INSTRUCTIONS
1. You may consult or use slides / programs provided to you as course material, or programs that
you have written yourself as part of classwork / homework for this course, but please do not
consult or use material from other Internet sources, your classmates, or anyone else.
2. Unless otherwise specified, all programs should take the required inputs from stdin, and print the
desired outputs to stdout. Please make sure that your programs adhere strictly to the specified
input and output format. Your program may fail the provided test cases if it violates the
input and output requirements.
3. Submissions from different students that match significantly will not be evaluated.
4. To avoid mismatches between your output and the provided output, please store all floating point
numbers in double type variables.
[Figure: panels (a) and (b) showing f (pieces F1, ..., Fm; m = 4 and m = 5) and g (pieces G1, ..., Gn; n = 4 and n = 7) on the interval [a, b].]
Irrespective of whether the two functions are separable, the lines corresponding to f and g enclose an area
between them within the interval [a, b], as shown in Figure 2.
Output format: Your program should print either separable or not separable, depending on whether
the given functions are separable or not in the given interval, followed by the area between the curves
corresponding to f and g within the interval [a, b], correct to 4 decimal places.
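The area computation can be sketched one sub-interval at a time. The function below is only an illustration under a stated assumption: on the sub-interval [x0, x1], both f and g are single straight lines, so their difference is linear and changes sign at most once. The name strip_area and its parameters are hypothetical.

```c
/* Absolute value helper, to avoid depending on the math library. */
static double absd(double v) { return v < 0.0 ? -v : v; }

/*
 * Area enclosed between f and g on [x0, x1], ASSUMING both are straight
 * lines on this sub-interval. f0, f1 are the values of f at x0, x1;
 * g0, g1 are the values of g at x0, x1.
 */
double strip_area(double x0, double x1,
                  double f0, double f1,
                  double g0, double g1)
{
    double d0 = f0 - g0;        /* signed gap at the left end  */
    double d1 = f1 - g1;        /* signed gap at the right end */
    double width = x1 - x0;

    if (d0 * d1 >= 0.0) {
        /* No crossing inside the strip: a single trapezoid. */
        return 0.5 * (absd(d0) + absd(d1)) * width;
    }
    /* The lines cross at fraction t of the strip: two triangles. */
    double t = d0 / (d0 - d1);
    return 0.5 * (absd(d0) * t + absd(d1) * (1.0 - t)) * width;
}
```

Summing strip_area over consecutive sub-intervals of [a, b] gives the total, which can then be printed with printf("%.4f\n", total) to meet the 4-decimal-place requirement.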
[Figure: board with m rows; one row is numbered 2n + 1, 2n + 2, ..., 3n − 1, 3n.]
Figure 3: An m × n Snakes-and-Ladders board, with a green arrow representing a ladder (from 2 to 3n − 1), and
a red arrow representing a snake (from n + 2 to n).
Input format: The input will consist of the following 3 parts.
(a) Six positive integers k, m, n, l, s, r representing the number of players, rows in the board, columns in
the board, ladders, snakes, and finally, the number of rounds for which the players play the game,
respectively.
(b) l + s pairs of positive integers, corresponding to the end-points of the l ladders and s snakes respectively.
The starting point of a ladder is guaranteed to be smaller than its end point; conversely, the starting
point of a snake is guaranteed to be larger than its end point.
(c) r × k integers (each between 1 and 12), representing the outcomes when the k players roll the dice in
turn, for r rounds. Once a player has reached the end, the results of her subsequent dice rolls are
ignored.
Output format: Your program should print one line of output per player, starting with the serial number
of the player (1 . . . k), the current position of the player if it is between 1 and mn − 1, or completed if the
player has reached the end (cell mn), followed by the number of rounds played by the player.
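One possible way to represent the ladders and snakes is a jump table indexed by cell number. The sketch below is illustrative only, and it rests on assumptions that the text above does not state: players start off the board at position 0, and a roll that reaches or passes cell mn counts as completing the game. The names build_jumps and apply_roll are hypothetical.

```c
#include <stdlib.h>

/*
 * Build a table where jump[c] is the cell a player ends up on after
 * landing on c. Ordinary cells map to themselves; the start of a ladder
 * or snake maps to its end. start[i] -> end[i] for i in [0, count).
 * The caller frees the returned array.
 */
int *build_jumps(int cells, const int *start, const int *end, int count)
{
    int *jump = malloc((size_t)(cells + 1) * sizeof *jump);
    if (jump == NULL)
        return NULL;
    for (int c = 0; c <= cells; c++)
        jump[c] = c;
    for (int i = 0; i < count; i++) {
        /* Ignore pairs that fall outside the board. */
        if (start[i] >= 1 && start[i] <= cells &&
            end[i] >= 1 && end[i] <= cells)
            jump[start[i]] = end[i];
    }
    return jump;
}

/*
 * Apply one dice roll. ASSUMPTION: reaching or overshooting the last
 * cell finishes the game; a finished player never moves again.
 */
int apply_roll(const int *jump, int cells, int pos, int roll)
{
    if (pos >= cells)
        return cells;
    int next = pos + roll;
    if (next >= cells)
        return cells;
    return jump[next];
}
```

With m = 3 and n = 4, the ladder and snake of Figure 3 become the pairs 2 → 11 and 6 → 4, so a player at position 0 who rolls 2 lands on cell 11.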
Q3. Many programming languages provide map() and filter() functions that work as follows.
map. Given two sets X, Y, map(L, f) takes as input a list L = [l_0, l_1, ..., l_{N-1}] of elements from X, and a
function f : X → Y, and returns f(L) ≜ [f(l_0), f(l_1), ..., f(l_{N-1})], a list of elements from Y.
filter. Given a set X, filter(L, g) takes as input a list L = [l_0, l_1, ..., l_{N-1}] of elements from X, and a function
g : X → {true, false}, and returns L′ ≜ [l_t | 0 ≤ t < N, g(l_t) = true].
Implement the map and filter functions in C. The functions should have the following prototypes:
void *map(void *L, unsigned int N,
size_t domain_elt_size, size_t range_elt_size,
void (*f)(void *input, void *output))
where
• f is a pointer to a C function that implements f (input is a pointer to an element of the domain X,
and output is a pointer to an existing chunk of memory that is just big enough to store an element of
the co-domain Y ); and
• g is a pointer to a C function that implements g (input is a pointer to an element of the domain X; g
returns a non-zero value if g(*input) = true, and 0 otherwise).
• map should allocate memory for the array containing f (L); the user of map is responsible for freeing this
memory after use.
• filter should rearrange the elements of L so that all elements l for which g(l) is true appear in their
original order before all elements l′ for which g(l′ ) is false (these elements should also appear in their
original order). filter should return the number of elements of L for which g(l) is true.
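A minimal sketch of map that follows the prototype given above. The behaviour for N = 0 and for allocation failure (returning NULL in both cases) is an assumption, and square_int is a hypothetical example of a function that could be passed as f.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Apply f to each of the N elements of L and return a newly allocated
 * array holding the results. The caller must free() the result.
 * ASSUMPTION: returns NULL when N is 0 or when allocation fails.
 */
void *map(void *L, unsigned int N,
          size_t domain_elt_size, size_t range_elt_size,
          void (*f)(void *input, void *output))
{
    if (N == 0)
        return NULL;

    char *in = L;   /* char * allows byte-wise pointer arithmetic */
    char *out = malloc((size_t)N * range_elt_size);
    if (out == NULL)
        return NULL;

    for (unsigned int i = 0; i < N; i++)
        f(in + (size_t)i * domain_elt_size,
          out + (size_t)i * range_elt_size);
    return out;
}

/* Hypothetical example f: squares an int. */
void square_int(void *input, void *output)
{
    int x = *(int *)input;
    *(int *)output = x * x;
}
```

Because map only sees element sizes, the same function works unchanged for any pair of domain and co-domain types, for example ints mapped to doubles.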
Example. Suppose X = Z (the set of integers), f(x) = x^2 and g(x) = true iff x is even. Let L =
[-1, 3, -8, 2]. Then, map(L, f) returns a new array of ints containing [1, 9, 64, 4]; filter(L, g)
changes L to [-8, 2, -1, 3] and returns 2.
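The stable rearrangement required of filter can be sketched with a temporary buffer for the rejected elements. The prototype used here is an assumption modelled on the one given for map, and is_even_int is a hypothetical example of g. On allocation failure this sketch leaves L untouched and returns 0, which is also an assumption.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Stable partition of L: elements accepted by g come first, rejected
 * elements follow, each group in its original order.
 * Returns the number of accepted elements.
 * ASSUMED prototype; returns 0 without changing L if malloc fails.
 */
unsigned int filter(void *L, unsigned int N, size_t elt_size,
                    int (*g)(void *input))
{
    if (N == 0)
        return 0;

    char *base = L;
    char *rejected = malloc((size_t)N * elt_size);
    if (rejected == NULL)
        return 0;

    unsigned int kept = 0, dropped = 0;
    for (unsigned int i = 0; i < N; i++) {
        char *src = base + (size_t)i * elt_size;
        if (g(src)) {
            /* kept <= i, so this only overwrites processed slots. */
            memmove(base + (size_t)kept * elt_size, src, elt_size);
            kept++;
        } else {
            memcpy(rejected + (size_t)dropped * elt_size, src, elt_size);
            dropped++;
        }
    }
    /* Append the rejected elements after the accepted ones. */
    memcpy(base + (size_t)kept * elt_size, rejected,
           (size_t)dropped * elt_size);
    free(rejected);
    return kept;
}

/* Hypothetical example g: non-zero iff the int is even. */
int is_even_int(void *input)
{
    return *(int *)input % 2 == 0;
}
```

Applied to the list in the example, this sketch reorders [-1, 3, -8, 2] into [-8, 2, -1, 3] and returns 2.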
Submission format: You should upload a C file (say map-filter.c) containing your implementation of
the functions. We will write a program, say tester.c, which will contain the line #include "map-filter.c".
It will also define a number of functions that can be used as f or g, and test whether your implementation
produces the same outputs that we get.
Q4. Your task in this problem is to implement and measure the performance of a Bloom filter, a data structure
used for inexact searching in sets. Suppose S is a set stored using a Bloom filter. In response to a search for
an element x, the Bloom filter returns one of two answers: x is not in S, and I am sure of that, or I think x
is in S, but I could be mistaken (this is called a false positive error when x is not actually in S). At the end
of this question, there is an example from https://en.wikipedia.org/wiki/Bloom_filter of a situation
where inexact searching using Bloom filters is useful.
How a Bloom filter works.[1] An empty Bloom filter is a bit array A of m bits, all set to 0. There are
also k different hash functions, say h_1, h_2, ..., h_k, each of which maps an element of S to one of the m array
positions in a uniformly random manner.
To add an element x to the Bloom filter, the bits A[h_1(x)], A[h_2(x)], ..., A[h_k(x)] are each set to 1. To query
for an element, say y, i.e., to test whether y ∈ S, we check A[h_1(y)], A[h_2(y)], ..., A[h_k(y)]. If any of these
bits is 0, then y ∉ S (if y ∈ S, then all the bits would have been set to 1 when y was inserted). If all are 1,
then the Bloom filter returns “y may be in S”. If these bits were set by chance to 1 during the insertion of
other elements, this would be a false positive result. Intuitively, if m and k are large, the chances of a false
positive are small; as more and more elements are inserted into the Bloom filter, the chance of a false positive
increases.
Note that the space required by a Bloom filter to store a set of n items remains fixed at m, while the space
required by exact search structures such as balanced search trees (BSTs) usually grows linearly. On the other
hand, the probability of a false positive error for a Bloom filter grows with n, whereas it is always zero for
BSTs.
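For reference, the growth with n can be made concrete with the standard estimate of the false positive probability p. It holds under the idealised assumption that the k hash functions are independent and uniformly random, after n distinct elements have been inserted into an m-bit filter:

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k}
```

For fixed m and n, this estimate is smallest near k = (m/n) ln 2, so increasing k helps only up to that point.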
Problem statement. The objective of this question is to study the false positive rate vs. space-efficiency
tradeoff for Bloom filters.
[1] https://en.wikipedia.org/wiki/Bloom_filter
(a) Given a sequence of non-negative integers (as command-line arguments), possibly with repetitions, insert
them in an AVL tree.[2] For each element, report whether it was actually inserted (print inserted), or if
it was already contained in the tree (print duplicate). Measure the time taken to process the sequence;
also, compute the total number of nodes in your final balanced search tree, and thus, estimate the total
storage space required (in bytes) for the tree.
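If a library is not used, part (a) could be approached with a hand-written AVL tree along the following lines. This is a sketch only: the node layout, the names avl_insert and avl_count, and the choice to abort() on allocation failure are all assumptions.

```c
#include <stdlib.h>

typedef struct avl_node {
    unsigned long key;
    int height;                     /* height of the subtree, leaf = 1 */
    struct avl_node *left, *right;
} avl_node;

static int height(const avl_node *n) { return n ? n->height : 0; }

static void update(avl_node *n)
{
    int hl = height(n->left), hr = height(n->right);
    n->height = 1 + (hl > hr ? hl : hr);
}

static avl_node *rotate_right(avl_node *y)
{
    avl_node *x = y->left;
    y->left = x->right;
    x->right = y;
    update(y);
    update(x);
    return x;
}

static avl_node *rotate_left(avl_node *x)
{
    avl_node *y = x->right;
    x->right = y->left;
    y->left = x;
    update(x);
    update(y);
    return y;
}

/* Insert key; *inserted is set to 1 if new, 0 if it was a duplicate.
 * Returns the (possibly new) root of the subtree. */
avl_node *avl_insert(avl_node *root, unsigned long key, int *inserted)
{
    if (root == NULL) {
        avl_node *n = malloc(sizeof *n);
        if (n == NULL)
            abort();                /* ASSUMPTION: give up on failure */
        n->key = key;
        n->height = 1;
        n->left = n->right = NULL;
        *inserted = 1;
        return n;
    }
    if (key < root->key)
        root->left = avl_insert(root->left, key, inserted);
    else if (key > root->key)
        root->right = avl_insert(root->right, key, inserted);
    else {
        *inserted = 0;
        return root;
    }
    update(root);
    int balance = height(root->left) - height(root->right);
    if (balance > 1) {              /* left-heavy */
        if (height(root->left->left) < height(root->left->right))
            root->left = rotate_left(root->left);
        return rotate_right(root);
    }
    if (balance < -1) {             /* right-heavy */
        if (height(root->right->right) < height(root->right->left))
            root->right = rotate_right(root->right);
        return rotate_left(root);
    }
    return root;
}

/* Number of nodes in the tree. */
size_t avl_count(const avl_node *n)
{
    return n ? 1 + avl_count(n->left) + avl_count(n->right) : 0;
}
```

With this layout, one rough storage estimate is avl_count(root) * sizeof(avl_node) bytes; it ignores any per-allocation overhead added by malloc.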
(b) Repeat the above exercise with the same sequence, but use the Bloom filter defined below. Note that
your output would sometimes be wrong, i.e., your program would print duplicate for some integers
that are actually appearing for the first time. As before, measure the time taken to process the sequence;
also, compute the false positive error rate as a percentage.
Bloom filter definition. For a fixed m and k, to insert a non-negative integer x in the Bloom filter,
call srand(x); then evaluate rand() % m a total of k times to get k positions. Set the bits at these k
positions to 1 in the Bloom filter.
Note: You may use a byte instead of a bit to store 0 and 1, but you will get full credit only if you store the
0 and 1 at the bit level.
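The definition above, with bit-level storage, might be sketched as follows. The type bloom_t and the functions bloom_init, bloom_test_and_set and bloom_free are hypothetical names. Combining the query and the insertion in one pass (recomputing the same k positions after srand(x)) is an assumption about how the duplicate check would be done.

```c
#include <limits.h>
#include <stdlib.h>

typedef struct {
    unsigned char *bits;   /* m bits packed CHAR_BIT per byte */
    unsigned int m;        /* number of bits */
    unsigned int k;        /* number of positions per element */
} bloom_t;

/* Allocate an all-zero filter. Returns 0 on success, -1 on failure. */
int bloom_init(bloom_t *bf, unsigned int m, unsigned int k)
{
    if (m == 0 || k == 0)
        return -1;
    bf->bits = calloc(((size_t)m + CHAR_BIT - 1) / CHAR_BIT, 1);
    if (bf->bits == NULL)
        return -1;
    bf->m = m;
    bf->k = k;
    return 0;
}

/*
 * Check whether x appears to be present, then insert it.
 * Returns 1 if all k bits were already set ("duplicate", possibly a
 * false positive) and 0 if at least one bit was clear ("inserted").
 */
int bloom_test_and_set(bloom_t *bf, unsigned int x)
{
    int seen = 1;
    srand(x);                                  /* as in the definition */
    for (unsigned int i = 0; i < bf->k; i++) {
        unsigned int pos = (unsigned int)rand() % bf->m;
        unsigned char mask = (unsigned char)(1u << (pos % CHAR_BIT));
        if ((bf->bits[pos / CHAR_BIT] & mask) == 0) {
            seen = 0;
            bf->bits[pos / CHAR_BIT] |= mask;
        }
    }
    return seen;
}

void bloom_free(bloom_t *bf)
{
    free(bf->bits);
    bf->bits = NULL;
}
```

Packed this way, the filter occupies (m + CHAR_BIT - 1) / CHAR_BIT bytes regardless of how many integers are inserted.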
Input format: The values of m and k will be provided as the first two command-line arguments. The
remaining command-line arguments comprise the sequence of input numbers.
Output format: Your program should print to stdout a sequence of inserted and duplicate that corresponds
to the given input. It should also print to stderr the number of bytes and the time taken to process
the sequence using an AVL tree, m, k, and the time taken to process the sequence using a Bloom filter.
Example: A Web server can use a Bloom filter to determine whether a Web object (a page, an image, etc.)
has been requested earlier. Only if it has been requested at least once earlier is it stored in a cache.[3] If a
cached object is requested again (i.e., thrice or more in all), it can be served quickly from the cache; otherwise,
the object is served more slowly from the hard disk where it is stored. The reason for this strategy is that a
very large proportion of Web objects are requested only once; there is no benefit to storing such objects in
the cache.
One way to implement this strategy would involve storing the identifiers of the requested objects in an exact
search structure, such as a balanced search tree (BST). Since Web servers service requests for a very large
number of objects, such a BST would be large. In response to the question: “Has object x been requested
before?”, a BST would always correctly reply yes or no.
Instead, if the identifiers were stored in a Bloom filter, the amount of space needed would be much less. Of
course, this space saving would come at a cost. In response to the above question, a no would be guaranteed
to be a correct answer, but a yes would sometimes be mistaken, i.e., occasionally, an object being requested
for the first time would appear to have been requested earlier, and would end up being cached unnecessarily.
[2] You may use libraries such as GDSL for this purpose.
[3] A cache is a faster, but much smaller, storage space that is intended to store a frequently accessed subset of the complete data stored
in a larger, but much slower, storage medium.