
Thapar Institute of Engineering and Technology

UEC 718 — Big Data Analytics, "U" Grade Exam — 07.03.2022
Duration: 2 hrs
Instructors: Debayani Ghosh, Arnab Pattanayak
Note: Attempt any 5 questions out of the following. (Max. 45 Marks)

Question 1. (9 marks)
(a) What are the components of the Hadoop ecosystem? (3 marks)
(b) Define the functionalities of a NameNode and a DataNode. (3 marks)
(c) Describe MapReduce with an example. (3 marks)
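
For part (c), a minimal sketch of the classic word-count example, written in Python in the Hadoop Streaming style. The map and reduce phases are simulated in one script for brevity; the file layout and streaming setup are assumptions, not part of the question.

    import sys
    from itertools import groupby

    def mapper(lines):
        # Map phase: emit (word, 1) for every word in the input.
        for line in lines:
            for word in line.strip().split():
                yield word, 1

    def reducer(pairs):
        # Reduce phase: after the shuffle, pairs arrive grouped by key;
        # sum the counts per word.
        for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
            yield word, sum(count for _, count in group)

    if __name__ == "__main__":
        for word, total in reducer(mapper(sys.stdin)):
            print(f"{word}\t{total}")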

Question 2. (9 marks)

Consider the following two tables in .csv format. The left table is 'employee.csv' and the right table is 'salary.csv':

employee.csv:
Employee ID | Name    | Age
1           | Alice   | 23
3           | Rose    | 25
5           | Michael | 24

salary.csv:
Employee ID | Salary
3           | 26000
5           | 30000
7           | 25000

(a) Write a program in Apache Hive to create a database 'office'. Create two tables 'employee' and 'salary' in that database from the .csv files 'employee.csv' and 'salary.csv', respectively. (3 marks)
(b) Perform a relational join on the two tables based on their common column. (4 marks)
(c) Write a query to display the details of employees with a salary of more than 25000. (2 marks)
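
A hedged sketch of one possible answer, written in Python using PySpark's Hive support so the HiveQL stays in plain strings (the same statements work verbatim in the hive shell; the column names, delimiter, and file paths are assumptions):

    from pyspark.sql import SparkSession

    # Assumes a Hadoop/Hive installation is available to Spark.
    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    # (a) Create the database and the two tables, then load the .csv files.
    spark.sql("CREATE DATABASE IF NOT EXISTS office")
    spark.sql("USE office")
    spark.sql("""CREATE TABLE IF NOT EXISTS employee (emp_id INT, name STRING, age INT)
                 ROW FORMAT DELIMITED FIELDS TERMINATED BY ','""")
    spark.sql("LOAD DATA LOCAL INPATH 'employee.csv' INTO TABLE employee")
    spark.sql("""CREATE TABLE IF NOT EXISTS salary (emp_id INT, salary INT)
                 ROW FORMAT DELIMITED FIELDS TERMINATED BY ','""")
    spark.sql("LOAD DATA LOCAL INPATH 'salary.csv' INTO TABLE salary")

    # (b) Join on the common column emp_id.
    spark.sql("""SELECT e.emp_id, e.name, e.age, s.salary
                 FROM employee e JOIN salary s ON e.emp_id = s.emp_id""").show()

    # (c) Details of employees earning more than 25000.
    spark.sql("""SELECT e.emp_id, e.name, e.age, s.salary
                 FROM employee e JOIN salary s ON e.emp_id = s.emp_id
                 WHERE s.salary > 25000""").show()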

Question 3. (9 marks) Consider the following string

"this will add the string to string constant pool"


(a) Write a MapReduce program to print the answer below (a sketch of one approach follows this question). (7 marks)
2: [to]
3: [add, the]
4: [this, will, pool]
5: [string, string]
8: [constant] (2, 3, 4, 5, 8 are the lengths of the respective words)

(b) Explain the functions of a JobTracker and a TaskTracker. (2 marks)
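
For part (a), a minimal PySpark sketch that models the map phase with map and the reduce phase with groupByKey (the SparkContext setup is an assumption; a Hadoop Streaming solution would emit the same (length, word) key/value pairs):

    from pyspark import SparkContext

    sc = SparkContext(appName="WordLengthGroups")
    text = "this will add the string to string constant pool"

    # Map phase: emit (length, word) for every word.
    pairs = sc.parallelize(text.split()).map(lambda w: (len(w), w))

    # Reduce phase: gather the words sharing each length, then print sorted by length.
    for length, words in sorted(pairs.groupByKey().mapValues(list).collect()):
        print(f"{length}: {words}")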

Question 4. (9 marks)

Consider the following matrix of 12 users' ratings of 6 movies:


[Ratings matrix of movies M1-M6 (rows) by users (columns; the source header shows U1-U10): the individual entries are garbled in the scan and are not reliably recoverable.]

Using item-item collaborative filtering, predict the rating of M1 by U5. For the similarity measure, use centred cosine similarity, and use the 2 nearest neighbours of M1. (9 marks)
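
For reference, the centred cosine similarity between the rating vectors of items i and j (each item's mean rating subtracted from its observed entries, with the sums running over the users who rated both items), and the resulting neighbourhood prediction, in LaTeX:

    \mathrm{sim}(i, j) =
      \frac{\sum_{u} (r_{ui} - \bar{r}_i)(r_{uj} - \bar{r}_j)}
           {\sqrt{\sum_{u} (r_{ui} - \bar{r}_i)^2}\,
            \sqrt{\sum_{u} (r_{uj} - \bar{r}_j)^2}}
    \qquad
    \hat{r}_{ui} =
      \frac{\sum_{j \in N(i;u)} \mathrm{sim}(i, j)\, r_{uj}}
           {\sum_{j \in N(i;u)} \mathrm{sim}(i, j)}

Here N(i; u) is the neighbourhood of item i for user u; in this question, the 2 movies most similar to M1 that U5 has rated.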

Question 5. (9 marks)
(a) Define RDD. (3 marks)
(b) What is a DStream? (2 marks)
(c) Given the statement (4 marks):
rdd = sc.parallelize([('a', 1), ('b', 1), ('a', 1)])
Find a way to count the occurrences of the keys and print the following output: [('a', 2), ('b', 1)].
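
A minimal sketch of one way to do this, assuming an existing SparkContext sc:

    # Sum the values per key, then bring the result back to the driver.
    rdd = sc.parallelize([('a', 1), ('b', 1), ('a', 1)])
    counts = rdd.reduceByKey(lambda x, y: x + y)
    print(counts.collect())  # [('a', 2), ('b', 1)] (order may vary)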

Question 6. (9 marks)
Consider the following two-dimensional dataset —

(1, 8), (1, 5), (4, 4), (5, 8), (8, 5), (5, 5), (4, 2), (4, 9)
Apply two iterations of the K-means clustering algorithm to the above data points to group them into three clusters. Show the three cluster centroids and the data points belonging to each cluster. Choose the initial cluster centroids as (1, 8), (5, 8) and (4, 2), and calculate the distance between two points using the Euclidean distance formula.
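
A plain-Python sketch of the requested two iterations, given for illustration under the stated starting centroids (the exam, of course, expects the steps worked by hand):

    from math import dist  # Euclidean distance (Python 3.8+)

    points = [(1, 8), (1, 5), (4, 4), (5, 8), (8, 5), (5, 5), (4, 2), (4, 9)]
    centroids = [(1, 8), (5, 8), (4, 2)]  # initial centroids from the question

    for iteration in range(2):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
                     for c in clusters]
        print(f"Iteration {iteration + 1}:")
        print("  centroids:", centroids)
        print("  clusters: ", clusters)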

Question 7. (9 marks) In a Hadoop environment, write the following programs in Python (Spark programming):

(a) Read and display the contents of a text file. (2 marks)
(b) Given an RDD containing the elements [100, 100, 210, 300, 300, 400, 4050, 400], display only the first occurrence of each number. (4 marks)
(c) Given the dataset [10, 12, 13, 14, 15], print only [10, 12, 14]. (3 marks)
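
Hedged sketches of possible answers, assuming an existing SparkContext sc (the file name in (a) is an assumption):

    # (a) Read a text file and display its contents line by line.
    for line in sc.textFile("input.txt").collect():
        print(line)

    # (b) Keep one occurrence of each number (distinct; output order may vary).
    rdd = sc.parallelize([100, 100, 210, 300, 300, 400, 4050, 400])
    print(rdd.distinct().collect())

    # (c) Keep only the even numbers: [10, 12, 14].
    nums = sc.parallelize([10, 12, 13, 14, 15])
    print(nums.filter(lambda n: n % 2 == 0).collect())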
