Bda

This document is a Gujarat Technological University examination paper for the subject "Big Data Analytics" (Summer 2018). It contains five questions on big data and distributed systems, and Questions 2 to 5 offer alternative (OR) parts. Question 1 covers big data processing versus distributed processing, business applications of big data, and the Hadoop architecture. Question 2 covers Avro data serialization, the characteristics of big data, and the Hadoop ecosystem, or alternatively data serialization and structured, unstructured and semi-structured data. Question 3 covers HDFS commands, the MapReduce phases, and writing MapReduce programs, with Hadoop configuration files as an alternative part. Question 4 covers ZooKeeper, Apache Pig and the HDFS architecture, or alternatively HBase, the NameNode and DataNode, and Apache Hive. Question 5 covers MongoDB concepts and NoSQL databases, or alternatively scaling and CRUD operations in MongoDB and Resilient Distributed Datasets (RDDs) in Apache Spark, including why RDDs are better than MapReduce data storage.

Uploaded by Jigar
Copyright: © All Rights Reserved

Seat No.: ________ Enrolment No.: ___________

GUJARAT TECHNOLOGICAL UNIVERSITY


BE – SEMESTER 7 (NEW SYLLABUS) EXAMINATION – SUMMER 2018

Subject Code: 2171607 Date: 28-04-2018


Subject Name: BIG DATA ANALYTICS (Department Elective-II)
Time: 02:30 pm to 05:00 pm Total Marks: 70
Instructions:
1. Attempt all questions.
2. Make suitable assumptions wherever necessary.
3. Figures to the right indicate full marks.

Q.1 (a) What is Big Data? Explain how big data processing differs from distributed processing. 03
(b) List various applications of big data. How can big data be used to improve business for a superstore? 04
(c) Explain the core architecture of Hadoop with a suitable block diagram. Discuss the role of each component in detail. 07

Q.2 (a) Explain the Avro data serialization technique in MapReduce. 03
(b) Explain the characteristics of Big Data. 04
(c) What is the Hadoop Ecosystem? Discuss the various components of the Hadoop Ecosystem. 07
OR
(c) What is data serialization? With proper examples, discuss and differentiate between structured, unstructured and semi-structured data. Make a note on how the type of data affects data serialization. 07
Q.3 (a) Explain the following commands with syntax and at least one example of each: (1) copyFromLocal (2) showing the content of an output file. 03
(b) Explain “Map Phase” and “Combiner Phase” in MapReduce. 04
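The Map and Combiner phases can be illustrated with a small single-process Python simulation. This is a sketch under stated assumptions, not Hadoop's API: the function names `map_phase` and `combine_phase` are hypothetical, each string in a split stands for one input line handed to a mapper, and a real Hadoop job would implement this logic as Java `Mapper` and `Reducer` classes, registering the latter as the combiner with `job.setCombinerClass`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(split: Iterable[str]) -> List[Tuple[str, int]]:
    """Hypothetical mapper: emit (word, 1) for every token of every line in one input split."""
    pairs: List[Tuple[str, int]] = []
    for line in split:
        for word in line.split():
            pairs.append((word, 1))
    return pairs


def combine_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Hypothetical combiner: sum values per key on the mapper's own output,
    so fewer pairs have to be sent across the network to the reducers."""
    local_totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        local_totals[key] += value
    return dict(local_totals)


if __name__ == "__main__":
    split = ["big data big", "data hadoop"]
    mapped = map_phase(split)
    combined = combine_phase(mapped)
    print(len(mapped), "pairs before combining")  # prints: 5 pairs before combining
    print(combined)  # prints: {'big': 2, 'data': 2, 'hadoop': 1}
```

Because summing is associative and commutative, the same logic can serve as both combiner and reducer. Hadoop treats the combiner as an optional optimisation that may run zero, one or several times, so a job must produce correct results without it.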
(c) Write the MapReduce steps for counting occurrences of specific numbers in the input text file(s). Also write the commands to compile and run the code. 07
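The steps for this job can be sketched as a local Python simulation of map, shuffle & sort, and reduce. Assumptions: input lines contain whitespace-separated tokens, the "specific numbers" are supplied as a set of integers called `targets`, non-numeric tokens are ignored, and all function names are hypothetical. On a real cluster the same logic would live in Java `Mapper`/`Reducer` classes (or be run through Hadoop Streaming); the compile and run commands are not shown here.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def mapper(line: str, targets: Set[int]) -> Iterator[Tuple[int, int]]:
    """Emit (number, 1) for each token that parses as one of the target numbers."""
    for token in line.split():
        try:
            number = int(token)
        except ValueError:
            continue  # skip non-numeric tokens
        if number in targets:
            yield number, 1


def shuffle_and_sort(pairs: Iterable[Tuple[int, int]]) -> Iterator[Tuple[int, List[int]]]:
    """Group values by key in sorted key order, as the framework does between map and reduce."""
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield key, [value for _, value in group]


def reducer(key: int, values: List[int]) -> Tuple[int, int]:
    """Sum the 1s emitted for one number to obtain its total count."""
    return key, sum(values)


def count_specific_numbers(lines: Iterable[str], targets: Set[int]) -> Dict[int, int]:
    mapped = [pair for line in lines for pair in mapper(line, targets)]
    return dict(reducer(key, values) for key, values in shuffle_and_sort(mapped))


if __name__ == "__main__":
    input_lines = ["10 20 30 10", "20 10 40"]
    print(count_specific_numbers(input_lines, {10, 20}))  # prints: {10: 3, 20: 2}
```

Parsing tokens with `int` means that "007" and "7" count as the same number; matching raw strings instead would treat them as different keys.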
OR
Q.3 (a) List the various configuration files used in Hadoop installation. What is the use of mapred-site.xml? 03
(b) Explain “Shuffle & Sort” phase and “Reducer Phase” in MapReduce. 04
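The Shuffle & Sort and Reducer phases can be sketched in plain Python as follows. Assumptions: the mapper outputs are given as lists of (key, value) pairs, the number of reducers is a parameter, and `partition` uses CRC32 only to mimic the idea of Hadoop's default hash partitioner deterministically; it is not Hadoop's actual hashing. All function names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def partition(key: str, num_reducers: int) -> int:
    """Choose the reducer for a key; every pair with the same key goes to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(mapper_outputs: Iterable[List[Pair]], num_reducers: int) -> List[Dict[str, List[int]]]:
    """Shuffle: route each pair to its reducer. Sort: order keys, with all values of a key together."""
    buckets: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for output in mapper_outputs:
        for key, value in output:
            buckets[partition(key, num_reducers)][key].append(value)
    # Each reducer receives its keys in sorted order.
    return [dict(sorted(bucket.items())) for bucket in buckets]


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: called once per key with the complete list of that key's values."""
    return {key: sum(values) for key, values in grouped.items()}


if __name__ == "__main__":
    outputs = [[("data", 1), ("big", 1)], [("data", 1), ("hadoop", 1)]]
    for reducer_id, grouped in enumerate(shuffle_and_sort(outputs, num_reducers=2)):
        print(reducer_id, grouped, reduce_phase(grouped))
```

Which reducer receives which key depends on the hash, but the guarantee the sketch preserves is the one that matters: all values for a given key reach exactly one reducer, already grouped and in key order.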
(c) Write the MapReduce steps for computing the sum of the numbers in the input text file(s). Also write the commands to compile and run the code. 07
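The steps for summing numbers can be sketched as a local Python simulation. Assumptions: each inner list of `splits` stands for one input split processed by its own mapper, non-integer tokens are skipped, and the single key `SUM_KEY` and all function names are hypothetical choices made for illustration. A real job would express the same logic as Java `Mapper`/`Reducer` classes; the compile and run commands are not shown.

```python
from typing import Iterable, Iterator, List, Tuple

SUM_KEY = "sum"  # hypothetical single key, so every number reaches the same reducer


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (SUM_KEY, n) for every integer token on the line."""
    for token in line.split():
        try:
            number = int(token)
        except ValueError:
            continue  # skip non-numeric tokens
        yield SUM_KEY, number


def combiner(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Optional local pre-aggregation: reduce one mapper's output to a single partial sum."""
    return [(SUM_KEY, sum(value for _, value in pairs))]


def reducer(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Add up the partial sums received for the key."""
    return key, sum(values)


def sum_numbers(splits: Iterable[Iterable[str]]) -> int:
    partial_sums: List[int] = []
    for split in splits:  # one mapper (plus combiner) per split
        mapped = [pair for line in split for pair in mapper(line)]
        partial_sums.extend(value for _, value in combiner(mapped))
    # Shuffle: every value carries SUM_KEY, so all of them go to one reducer.
    _, total = reducer(SUM_KEY, partial_sums)
    return total


if __name__ == "__main__":
    splits = [["1 2 3", "4"], ["5 x 6"]]
    print(sum_numbers(splits))  # prints: 21
```

Funnelling everything through one key means a single reducer does the final addition; the combiner keeps that affordable by shrinking each mapper's output to one pair.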
Q.4 (a) What is ZooKeeper? What are the benefits of ZooKeeper? 03
(b) Draw the architecture of APACHE PIG and explain it briefly. 04
(c) Define HDFS. Discuss the HDFS architecture and HDFS commands in brief. 07
OR
Q.4 (a) What is HBase? Write a query to create a table in HBase. 03
(b) Discuss the role of the DataNode and NameNode in HDFS. 04
(c) Draw and explain the architecture of APACHE HIVE. Explain various data insertion techniques in HIVE with examples. 07
Q.5 (a) Explain the following in brief with respect to MongoDB: 03
1) Collections and documents
2) Indexing and retrieval
(b) Write the differences between MongoDB and Hadoop. 04
(c) What is a NoSQL database? List the differences between NoSQL and relational databases. Explain in brief the various types of NoSQL databases in practice. 07
OR
Q.5 (a) Explain scaling in MongoDB. 03
(b) Explain CRUD operations in MongoDB. 04
(c) What is a Resilient Distributed Dataset (RDD) in Apache Spark? Explain in detail. Make a note on why RDDs are better than MapReduce data storage. 07

*************
