Question Bank - Big Data
SECTION – A
Question 1 (Unit 1)
Five marks questions
Question 2 (Unit 2)
Five marks questions
1. What is Hadoop? How do data visualization tools help in working with big data?
2. Explain Apache Spark and Apache Flink.
3. What do you understand by Apache Cassandra? Write a short note on Apache
Kafka.
4. What is Apache Pig? What do you understand by Apache Zeppelin?
5. Write a short note on Apache Storm. Which Apache technology can be used for
large-scale datasets?
6. Explain Elasticsearch. What is Apache Mahout?
7. What is Apache Drill? How is TensorFlow useful in big data?
8. How can you create a probability distribution plot in Python?
9. What is Apache Drill? Why do we use Splunk?
10. What is Databricks? What are the uses of KNIME in big data?
Question 3 (Unit 3)
Five marks questions
1. What is Hive?
2. What is the usage of Hive? What are some of the features of Hive?
3. What is a Hive variable? What do we use it for?
4. What are the limitations of Hive? How to load data into a Hive table?
5. How to query data in Hive? How to insert data into a Hive table?
6. How can you perform linear regression analysis in Python?
7. How to join tables in Hive? How to create partitions in Hive?
8. How to load data into a partition in Hive? How to create an external table in
Hive?
9. How to perform aggregations in Hive? What is the present version of Hive?
Explain ACID transactions in Hive.
10. When should we use SORT BY instead of ORDER BY?
Question 4 (Unit 4)
Five marks questions
1. What is the role of big data in understanding the genetic diversity and evolution
of species? How does big data help in tracking and studying the spread of
infectious diseases and their evolution?
2. What are some examples of how big data has contributed to our understanding
of human evolution? How does the analysis of big data contribute to the study of
evolutionary relationships among different species?
3. How has big data improved our understanding of the impact of environmental
factors on evolution?
4. What role does big data play in studying the evolution of drug resistance in
pathogens? How does big data facilitate the study of evolutionary dynamics in
complex ecosystems?
5. How does the analysis of big data contribute to understanding the role of genetic
mutations in evolutionary processes?
6. What are some ethical considerations associated with the use of big data in
evolutionary research? What is HDFS (Hadoop Distributed File System)?
7. What are the key features of HDFS? How does HDFS achieve fault tolerance?
8. How does HDFS support high throughput for data-intensive workloads? How
does HDFS handle large files?
9. What is data locality in the context of HDFS?
10. How does HDFS ensure scalability? What are the main components of HDFS
architecture?
Question 5 (Unit 5)
Five marks questions
SECTION – B
Question 1 (Unit 1)
Each question carries NINE marks
1. How does big data contribute to cybersecurity? What are some use cases of big
data in e-commerce?
2. How is big data utilized in manufacturing? Write a short note on supply chain
management.
3. What are some use cases of big data in transportation and logistics? Write a short
note on fleet management.
4. How does big data contribute to personalized healthcare? How is big data useful in
remote patient monitoring?
5. What are some use cases of big data in the entertainment industry? Write about
audience analysis.
6. How does big data support urban planning and smart cities? What are some use
cases of big data in e-commerce? How is big data utilized in manufacturing?
7. What are some use cases of big data in transportation and logistics? How does
big data contribute to personalized healthcare? What are some use cases of big
data in the entertainment industry?
Question 2
1. What is the role of Hive in a distributed system? How is a query processed in Hive?
2. What are the common uses of Hive?
3. What is ZooKeeper? What are the benefits of using ZooKeeper?
4. What is partitioning in Hive? What are the components of Apache HBase?
5. When is it appropriate to use a NoSQL database?
6. What are the advantages of Apache Spark?
7. How do we use Spark?
Question 3 (Unit 5)
1. What is dynamic partitioning and when is it used? What is indexing and why do
we need it? Explain the different types of joins in Hive.
2. How does data transfer happen from HDFS to Hive? How can you create a
temporary table in Hive?
3. How can you perform a subquery in Hive? How can you use a user-defined
function (UDF) in Hive? How can you export data from Hive to external
systems?
4. How can you monitor Hive jobs? How can you optimize Hive queries for
performance? How can you perform data transformations in Hive?
5. How can you comment in Hive scripts? How can you run a Hive script? How can
you filter data in Hive?
6. Write queries for the following:
To drop a Hive table
To display the schema of a Hive table
To perform sorting in Hive
To rename a Hive table
SECTION – C
Question 1
Each question carries TWELVE marks
1. How does big data impact industries and sectors? How did the Hadoop
framework influence the history of big data? What factors contributed to the
growth of big data? (Unit 1)
2. What are the different types of tools in big data? When do we use Apache Drill
over Apache Hive? (Unit 2)
3. How is Apache Spark different from MapReduce? Suppose that I want to
monitor all the open and aborted transactions in the system along with the
transaction id and the transaction state. Can this be achieved using Apache
Hive? (Unit 3)
4. What are the different components of a Hive architecture? (Unit 4)
Write queries to perform aggregate functions:
To calculate the minimum value of a column in Hive
To calculate the maximum value of a column in Hive
To calculate the sum of a column in Hive
To calculate the average of a column in Hive
To calculate the count of a column in Hive
5. What are the different types of tables available in Hive? What is the difference
between external and managed tables in Hive? What do you understand by a
Hive Metastore? What is the difference between local and remote metastores in
Hive? (Unit 4)
6. Is it possible to run a Unix shell command from Hive? Give an example to
demonstrate. What do you understand by bucketing in Hive? Why do we need
a bucket? Can you list a few commonly used Hive services? (Unit 4)