
Bda Simp-23

This document provides sample questions for an exam on big data analytics (BDA) across 5 modules: 1. The first module covers the need for big data, its types and characteristics, big data architecture design, case studies and applications, and differences between distributed computing approaches and data types. 2. The second module asks about Hadoop core components, the Hadoop ecosystem, HDFS features and commands, YARN and MapReduce, and Apache Sqoop and Flume. 3. The third module covers NoSQL databases, types of NoSQL databases, MongoDB and Cassandra features and commands, distribution models, and shared nothing architecture. 4. The fourth module asks about MapReduce execution

Uploaded by

Sana Khan
Copyright © All Rights Reserved

BDA (18CS72) - IMP and SIMP Questions

by TIE review team - RNS, BNMIT, JSSATE

Disclaimer: These questions are prepared by the TIE review team teachers/mentors by
referring to various question banks and internal question papers from more than 10
colleges. The sole purpose is to give a thorough idea of the questions in the
final assessment paper (sem-end exams).

Module-1

1. Explain why Big Data is needed in the modern world and mention its types. Explain
its evolution and mention its characteristics (the 4 V's).
2. With a neat sketch, discuss the five layers in Big Data architecture design.
3. Discuss various case studies and applications of Big Data.
4. Differentiate between the following - 5M each
(i) Distributed computing v/s Grid computing v/s Cluster computing
(ii) Horizontal scalability v/s Vertical scalability
(iii) Structured v/s Unstructured v/s Semi-structured data
5. Mention any 6 techniques used for data preprocessing. Also mention the
advantages of BDAS by understanding its future scope in the field of Big Data.
6. Define: (i) Hadoop (ii) Mesos (iii) SQL and NoSQL (with features) (iv) DDBMS
(v) In-memory column and row format data

Module-2

1. Describe the Hadoop core components with a diagram.
2. Using a diagram, explain the Hadoop ecosystem, its components, and its features.
3. What is HDFS? List the different commands in HDFS and explain its features.
4. Write a note on (i) YARN-based execution model (ii) MapReduce framework,
features, and functions, with necessary diagrams - 8M each
5. With a neat labeled diagram, explain Apache Sqoop and Flume.
6. Describe the Hadoop physical organization. Write the features of Hadoop.
Mention any 5 essential Hadoop tools.

Module-3

1. What is NoSQL? What are the advantages of NoSQL? Explain why
NoSQL should be used to manage Big Data.
2. What is a NoSQL database? List and explain any 4 types of NoSQL
databases and list any 3 differences between NoSQL and SQL.
3. Explain the components, features, data types, and various commands of
CQL and MongoDB, and state the differences between them - 16+4M
4. Explain the Master-Slave v/s Peer-to-Peer distribution models with proper
diagrams. Explain which is the right distribution model in terms of
business requirements - VBQ
5. Explain the shared-nothing architecture for Big Data tasks.
6. Write a short note on (i) CAP theorem (ii) ACID properties
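
Study aid for question 5 (not part of the original question bank): a minimal sketch of the shared-nothing idea, assuming three hypothetical in-process "nodes" that each own a private store and a simple hash-based partitioner. The names Node, owner, put, and get are illustrative only and do not correspond to any real database API.

```python
import hashlib


class Node:
    """One node with its own private storage; no state is shared with peers."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # private to this node


def owner(key, nodes):
    """Pick the single node responsible for a key using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def put(key, value, nodes):
    """Write the value only to the node that owns the key."""
    owner(key, nodes).store[key] = value


def get(key, nodes):
    """Read from the owning node; other nodes are never consulted."""
    return owner(key, nodes).store.get(key)


# Hypothetical cluster of three nodes and six user records.
nodes = [Node("node-0"), Node("node-1"), Node("node-2")]
user_ids = ["u1", "u2", "u3", "u4", "u5", "u6"]
for user_id in user_ids:
    put(user_id, {"id": user_id}, nodes)
```

Each record lives on exactly one node, so a request touches only that node's storage. Adding capacity then means adding nodes, which is the horizontal-scaling property usually associated with this architecture.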

Module-5

1. Explain in detail web content mining and the different phases of web usage mining.
2. Differentiate between (i) Linear and non-linear relationships (ii) Standard deviation
and standard error
3. Explain the Apriori algorithm for frequent itemsets and association rule mining.
4. Explain a social network as a graph and its analytics.
5. Describe regression analysis using linear and non-linear models, and explain KNN in
detail.
6. Explain the following: (i) Probability distributions and correlations (ii) PageRank
(iii) Web usage analytics
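
Study aid for question 2(ii) (not part of the original question bank): a small sketch contrasting the sample standard deviation with the standard error of the mean, using the usual definition SE = s / sqrt(n). The sample values are made up for illustration.

```python
import math
import statistics


def standard_deviation(sample):
    """Spread of the individual observations (sample standard deviation)."""
    return statistics.stdev(sample)


def standard_error(sample):
    """Uncertainty of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Illustrative data only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
sd = standard_deviation(sample)
se = standard_error(sample)
print(f"standard deviation = {sd:.4f}, standard error = {se:.4f}")
```

The standard deviation describes how much individual values vary, while the standard error shrinks as the sample size grows because it describes how precisely the mean is estimated.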

Module-4

1. Describe the MapReduce execution steps when a client submits a job, with a neat
diagram.
2. What is Hive in Big Data? List the features of Hive. Also explain the Hive architecture
with relevant diagrams.
3. Write a short note on the Pig data model (Pig architecture) along with its features.
Also list the commands used in the Pig data model and explain its data types - 12M
4. What are MapReduce tasks? Explain with examples.
5. Write a note on HiveQL and its queries.
6. Explain how Hive interacts with Hadoop (VBQ)
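
Study aid for questions 1 and 4 (not part of the original question bank): a minimal single-process sketch of the map, shuffle, and reduce stages using word count as the example. It only imitates the data flow; it does not use Hadoop, and the function names and input lines are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(line):
    """Map task: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle and sort: group all emitted values by key, ordered by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reduce task: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map over every line, shuffle the pairs, then reduce each group."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Illustrative input only.
counts = word_count(["big data needs hadoop", "hadoop runs mapreduce", "big data"])
print(counts)
```

In a real cluster the map and reduce functions run as separate tasks on different nodes and the shuffle moves data across the network, but the key/value contract between the stages is the same as in this sketch.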
