Bda End Sem
Module : 1
1. Mention four characteristics of big data and explain in detail.(5 marks/24)
2. Write three important characteristics of big data and explain any one with a real-life
example. (5 marks/24)
3. Explain how big data problems are handled by the Hadoop system. (5 marks/24)
4. Explain Hadoop ecosystem components - Hive and Pig.(5 marks/24)
5. Explain the advantages and limitations of Hadoop. (5 marks/24)
6. Mention four characteristics of big data. Elaborate these characteristics with respect to
social media websites.(5 marks/24)
7. What is the basic difference between traditional RDBMS and Hadoop?(5/23)
8. What are the 3 V’s of big data? Give two big data case studies indicating respective
V’s with justification. (5/23)
9. Why is HDFS more suited for applications having large datasets and not when there
are small files? Elaborate. (5/22)
10. What are the Core Hadoop components? Explain in detail.(10/22)
11. Give a brief overview of Hadoop core components and Hadoop ecosystem
components. (5/22)
12. Mention the 4 characteristics of big data. Elaborate these characteristics with respect to
social media websites. (5/20)
13. List at least 4 different sources of big data from different domains and justify
how they can be considered big data applications. (5/20)
14. How does NoSQL score over RDBMS when it comes to big data? (5/19)
15. Differentiate between the traditional data management and analytics approach and the
big data approach. (5/19)
16. What is Hadoop? Describe the HDFS architecture with a diagram. (10/19)
Module : 2
1. Write MapReduce pseudocode for the word count problem. Illustrate with an example
showing all the steps. (10 marks/24)
2. Explain the selection and projection relational algebra operations using MapReduce. (10
marks/24)
3. Explain MapReduce programming model in detail. (5 marks/24)
4. Discuss 1-step Matrix-Matrix Multiplication MapReduce algorithm and apply to the
Module : 3
1. List and explain the core business drivers behind the NoSQL movement. (5 marks/24)
2. Differentiate between SQL and NoSQL systems. (5 marks/24)
3. Recall all NoSQL design patterns with examples. Justify the CAP property. (10 marks/24)
4. List and explain the core business drivers behind the NoSQL movement. (5
marks/23)
5. What is a key-value store? What are the benefits of using a key-value store? (10/23)
6. Describe the four ways by which big data problems are handled by NoSQL.(10/23)
7. Demonstrate how business problems have been solved faster, cheaper
and more effectively with NoSQL, considering Google’s MapReduce case study. Also
illustrate the business drivers and the findings in it. (5/22)
8. Name the three ways that resources can be shared between computer systems. Name
the architecture used in big data solutions and describe it in detail.(10/22)
9. Compare a key-value NoSQL datastore with a document-based NoSQL datastore.
(5/22)
10. Explain in detail any two big data applications based on NoSQL. (5/22)
11. List all variations of NoSQL databases with two features of each and two examples of
each. (5/20)
12. Explain the CAP theorem for NoSQL databases. Since NoSQL databases do not fully support
ACID properties, can we adopt NoSQL for a traditional banking application? (5/20)
13. Explain different ways by which big data problems are handled by NoSQL. (10/19)
Module : 4
1. Explain the concept of bloom filter with an example.(5/24)
2. Suppose the stream is S = {4, 2, 5, 9, 1, 6, 3, 7}. Let the hash function be
h(x) = (3x + 7) mod 32, and treat the result as a 5-bit binary integer. Show how the
Flajolet-Martin algorithm will estimate the number of distinct elements in this stream. (10/24)
3. Explain DGIM algorithm for counting ones in a stream with example(10/24)
5. Explain the Flajolet-Martin (FM) algorithm. (5/24)
6. List and explain the different issues and challenges in data stream query
processing.(5/23)
7. Suppose the stream is S = {2, 1, 6, 1, 5, 9, 2, 3, 5}. Let the hash functions be of the form
h(x) = (ax + b) mod 16 for some a and b, and treat the result as a 4-bit binary integer.
Show how the Flajolet-Martin algorithm will estimate the number of distinct elements
using h(x) = (4x + 1) mod 16. (10/23)
8. With a neat sketch, explain the architecture of the data-stream management
system.(10/23)
9. List all six constraints that must be satisfied for representing a stream by
buckets using the DGIM algorithm, with examples. (5/23)
10. Suppose the stream is S = {4, 2, 5, 9, 1, 6, 3, 7}. Let the hash function be
h(x) = (x + 6) mod 32, and treat the result as a 5-bit binary integer. Show how the
Flajolet-Martin algorithm will estimate the number of distinct elements in this stream. (10/23)
11. Explain DGIM algorithm for counting ones in a stream with example.(10/23)
12. Explain the concept of bloom filter with an example(5/22)
13. Suppose the stream is 1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1. Let h(x) = (6x + 1) mod 5. Show
how the Flajolet-Martin algorithm will estimate the number of distinct elements in
this stream. (10/22)
14. With a neat sketch, explain the architecture of the data-stream management
system(10/22)
15. Why is it difficult to work with stream data?(5/21)
16. Explain the architecture of Data Stream Management Systems. How is it different
from a DBMS? (10/21)
17. Investigate the problems in the Flajolet-Martin (FM) algorithm for counting distinct elements
in a stream. (5/21)
18. Explain the DGIM algorithm and solve the following problem:
Consider the data stream shown below with N=14. 10011010101011101
i) Show one way in which the above initial stream can be divided into buckets and
count distinct 1’s.
ii) The following bits enter the window one at a time: 10101. What is the
bucket configuration in the window after this sequence of bits has been
processed by DGIM? Count distinct 1’s. (10/21)
19. Consider stock market stream data. Justify the data stream features and draw the
model of data stream management for the mentioned system. Give two examples each of
a one-time query and a continuous query on the stock market stream. (10/20)
20. Explain, with a block diagram, the architecture of a Data Stream Management System. (10/19)
21. What do you mean by counting distinct elements in a stream? Illustrate with an
example the working of the Flajolet-Martin algorithm used to count the number of distinct
elements. (10/19)
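Worked sketch for the Flajolet-Martin questions above: a minimal Python version of the basic single-hash estimator, which hashes each element, tracks the maximum number of trailing zeros R seen in any hash value, and reports 2^R. The function names `trailing_zeros` and `fm_estimate` are illustrative, and treating a hash value of 0 as having zero trailing zeros is an assumed convention (textbooks differ on this case).

```python
def trailing_zeros(value: int) -> int:
    """Count trailing zero bits of value; 0 is treated as having none (assumed convention)."""
    if value == 0:
        return 0
    count = 0
    while value & 1 == 0:
        value >>= 1
        count += 1
    return count


def fm_estimate(stream, a: int, b: int, modulus: int) -> int:
    """Basic Flajolet-Martin estimate using the single hash h(x) = (a*x + b) mod modulus."""
    max_r = 0
    for x in stream:
        h = (a * x + b) % modulus
        max_r = max(max_r, trailing_zeros(h))
    return 2 ** max_r


if __name__ == "__main__":
    stream = [4, 2, 5, 9, 1, 6, 3, 7]
    # h(x) = (3x + 7) mod 32: element 3 hashes to 16 = 10000, so R = 4
    print(fm_estimate(stream, a=3, b=7, modulus=32))
    # h(x) = (x + 6) mod 32: element 2 hashes to 8 = 01000, so R = 3
    print(fm_estimate(stream, a=1, b=6, modulus=32))
```

With a single hash function the estimate can be far from the true count (the stream above has 8 distinct elements), which is why the full algorithm combines many hash functions, typically by taking a median of group averages.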
Module : 5
1. What is a graph store? Give an example where a graph store can be used to
effectively solve a particular business problem. (10/24)
2. Determine communities for the given social network graph using Girvan-
Newman algorithm. (10/24)
10. How is recommendation done based on the properties of a product? Explain with
the help of an example. (10/23)
11. Determine communities for the given social network graph using Girvan-
Newman algorithm.(10/22)
(10/21)
17. What is the use of a recommender system? How is a classification algorithm used in
a recommendation system? (10/19)
18.
(10/19)