University of Mumbai Examination 2020 Under Cluster - 4 - (Lead College: PCE-New Panvel)
Q1.(s) Which of the following tools is designed for efficiently transferring bulk data
between Apache Hadoop and structured datastores such as relational databases?
Option A: The replication factor can be configured at the cluster level (default is 3) and
also at the file level
Option B: Block Report from each DataNode contains a list of all the blocks that are stored
on that DataNode
Option C: User data is stored on the local file system of DataNodes
Option D: DataNode is aware of the files to which the blocks stored on it belong
Option A: DataNode is the slave/worker node and holds the user data in the form of Data
Blocks
Option B: Each incoming file is broken into 32 MB blocks by default
Option C: Data blocks are replicated across different nodes in the cluster to ensure a low
degree of fault tolerance
Option D: DataNode is the master node and holds the metadata details
6.s Which of the following is a wrong statement about a document store?
Option A: Documents can contain many different key-value pairs, or key-array pairs, or
even nested documents
Option B: When compared to relational databases, Document stores are more scalable and
provide superior performance
Option C: It requires schema to be defined before you can add data
Option D: Secondary indices are available in Document store
Option A: $\sum_{i=0}^{t-1} a_t (1-c)^i$
Option B: $\sum_{i=0}^{t-1} a_t (1-c)^i$
Option C: $\sum_{i=0}^{t} a_{t-1} (1-c)^i$
Option D: $\sum_{i=0}^{t-1} a_{t-1} (1-c)^i$
10.s A Bloom filter has 5 bits, initially 0 0 0 0 0, and uses 2 hash functions
h1(x) = x mod 5 and h2(x) = (2x+3) mod 5. What are the filter bits after
9 followed by 11 is inserted?
Option A: 01001
Option B: 10001
Option C: 11001
Option D: 00001
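Worked sketch for Q10: the insertions can be traced with a few lines of Python. This is a minimal illustration, assuming the bit positions are numbered 0 to 4 from left to right (the question does not state the numbering); the helper name `bloom_insert` is hypothetical.

```python
def bloom_insert(bits, value, hash_functions):
    """Set the bit selected by each hash function for `value`."""
    for h in hash_functions:
        bits[h(value)] = 1
    return bits


# Hash functions as given in the question.
def h1(x):
    return x % 5


def h2(x):
    return (2 * x + 3) % 5


# Assumption: positions are indexed 0..4, leftmost bit first.
bits = [0] * 5
for value in (9, 11):
    bloom_insert(bits, value, (h1, h2))

filter_string = "".join(str(b) for b in bits)
print(filter_string)  # prints 11001
```

Here 9 sets positions 4 and 1, and 11 sets positions 1 and 0, so three distinct bits end up set.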
11.d A stream query that is supplied to the DSMS before any relevant data has
arrived is called a _________ .
12.s The angle between two points in Cosine Distance will range from
Option A: 0 to 90 degrees
Option B: 0 to 180 degrees
Option C: 0 to 360 degrees
Option D: 90 to 180 degrees
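Worked sketch for Q12: cosine distance is the angle between two vectors, obtained from the arccosine of their normalized dot product. The function below is an illustrative sketch, not part of the question paper; the name `cosine_angle_degrees` and the sample vectors are assumptions chosen to show the extremes of the range when components may be negative.

```python
import math


def cosine_angle_degrees(u, v):
    """Return the angle between vectors u and v in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


same_direction = cosine_angle_degrees([1, 2], [2, 4])
orthogonal = cosine_angle_degrees([1, 0], [0, 1])
opposite = cosine_angle_degrees([1, 0], [-1, 0])
print(same_direction, orthogonal, opposite)
```

Because `math.acos` returns values between 0 and pi radians, the computed angle can never fall outside 0 to 180 degrees.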
13.D Which of the following steps is not performed in the second phase of the CURE algorithm?
Option A: clustering the remaining points and output the final cluster
Option B: merge two clusters if they have a pair of representative points, one from each
cluster, that are sufficiently close.
Option C: Move each of the representative points a fixed fraction of the distance between its
location and the centroid of its cluster.
Option D: Each point P is brought from secondary storage and compared with the
representative points
14.M For a distance function, the triangle inequality guarantees that the function is
well-behaved. Which of the following shows the correct form of the distance function
for the triangle inequality?
Option A: d(x,y) = d(x,y) + d(z)
Option B: d(x,y) = d(x,y) + d(x,z)
Option C: d(x,y) = d(x,z) + d(z,y)
Option D: d(x) = d(y) + d(z)
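Worked sketch for Q14: the triangle inequality is usually stated as d(x,y) <= d(x,z) + d(z,y), with equality only when z lies "between" x and y. The check below is a minimal sketch under that standard definition; the function names and sample points are hypothetical and chosen only for illustration.

```python
import itertools
import math


def euclidean(p, q):
    """Ordinary Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def squared_euclidean(p, q):
    """Squared Euclidean distance, which is not a true distance measure."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def satisfies_triangle_inequality(points, d, tol=1e-12):
    """Check d(x, y) <= d(x, z) + d(z, y) for every ordered triple of points."""
    return all(
        d(x, y) <= d(x, z) + d(z, y) + tol
        for x, y, z in itertools.permutations(points, 3)
    )


points = [(0, 0), (1, 0), (2, 0), (3, 4)]
euclidean_ok = satisfies_triangle_inequality(points, euclidean)
squared_ok = satisfies_triangle_inequality(points, squared_euclidean)
print(euclidean_ok, squared_ok)  # prints True False
```

The squared variant fails on the collinear points (0,0), (1,0), (2,0): the direct value is 4, while the detour through (1,0) sums to 2.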
15.s Find the correct Hamming distance between X=111111101 and Y=000111111
Option A: 4
Option B: 5
Option C: 3
Option D: 2
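Worked sketch for Q15: the Hamming distance between two equal-length bit strings is the number of positions in which they differ. The helper below is an illustrative sketch; the name `hamming_distance` is an assumption, and the inputs are the strings from the question.

```python
def hamming_distance(x, y):
    """Count the positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(x, y))


X = "111111101"
Y = "000111111"
distance = hamming_distance(X, Y)
print(distance)  # prints 4
```

The strings differ in the first three positions and in the eighth position.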
16.D The process of identifying similar users and recommending what similar users like
is called _________ .
19.s The _______ consists of pages that could reach the SCC by following links, but
were not reachable from the SCC.
Option A: out-component
Option B: in-component
Option C: Tendrils
Option D: Tubes
20.D The problems of dead ends and spider traps are solved by a method called
__________