UEC718
UEC 718 — Big Data Analytics "U" grade Exam — 07.03.2022 Duration: 2 hrs
Instructors: Debayani Ghosh, Arnab Pattanayak
Note: Attempt any 5 questions out of the following (Max-45 Marks)
Question 1. (9 marks)
(a) What are the components of the Hadoop ecosystem? (3 marks)
(b) Define the functionalities of a Namenode and a Datanode. (3 marks)
(c) Describe Map-Reduce with an example. (3 marks)
Question 2. (9 marks)
Consider the above two tables in .csv format. The left table is 'employee.csv' and the right
table is 'salary.csv'.
Question 4. (9 marks)
Using item-item collaborative filtering, predict the rating of M1 by U5. As the similarity
measure, use centred cosine similarity, and use the 2 nearest neighbours of M1. (9 marks)
Question 5. (9 marks)
(a) Define RDD. (3 marks)
(b) What is a DStream? (2 marks)
(c) Given the statement — (4 marks)
rdd = sc.parallelize([('a', 1), ('b', 1), ('a', 1)])
Find a way to count the occurrences of the keys and print the following output:
[('a', 2), ('b', 1)].
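As a rough illustration of the kind of per-key aggregation part (c) asks for, the sketch below mimics a reduce-by-key step in plain Python so that it runs without a Spark installation. In PySpark itself, one common approach would be `rdd.reduceByKey(lambda x, y: x + y).collect()`. The helper name `reduce_by_key` is hypothetical, and the keys are taken to be 'a' and 'b'.

```python
from collections import defaultdict
from functools import reduce


def reduce_by_key(pairs, func):
    """Group values by key, then fold each group with func (hypothetical helper)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Sort by key so the printed order is deterministic.
    return sorted((key, reduce(func, values)) for key, values in grouped.items())


# Assumed input pairs: each key carries a count of 1.
pairs = [('a', 1), ('b', 1), ('a', 1)]
counts = reduce_by_key(pairs, lambda x, y: x + y)
print(counts)  # [('a', 2), ('b', 1)]
```

Summing the 1s per key gives the number of occurrences of each key. This is the same idea as word count in Map-Reduce.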
Question 6. (9 marks)
Consider the following two-dimensional dataset —
(1, 8), (1, 5), (4, 4), (5, 8), (8, 5), (5, 5), (4, 2), (4, 9)
Apply two iterations of the K-means clustering algorithm to the above data points to group them
into three clusters. Show the three cluster centroids and the data points belonging to each of
the three clusters. Choose the initial cluster centroids as (1, 8), (5, 8) and (4, 2), and
calculate the distance between two points using the Euclidean distance formula.
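The procedure in Question 6 can be checked with a minimal plain-Python sketch. It assumes that one "iteration" means an assignment step followed by a centroid update, and that an empty cluster keeps its previous centroid. The function name `kmeans` and both conventions are choices made for this sketch, not part of the question.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def kmeans(points, centroids, iterations):
    """Run a fixed number of assign-then-update K-means iterations."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster members.
        # An empty cluster keeps its old centroid (an assumption of this sketch).
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters


points = [(1, 8), (1, 5), (4, 4), (5, 8), (8, 5), (5, 5), (4, 2), (4, 9)]
centroids, clusters = kmeans(points, [(1, 8), (5, 8), (4, 2)], iterations=2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

With these inputs the sketch should report the centroids (1.0, 6.5), (5.5, 6.75) and (4.0, 3.0). The cluster memberships do not change between the first and second iteration.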