Sapthagiri College of Engineering: Department of Information Science and Engineering Big Data Analytics Question Bank

This document contains a question bank with modules on big data analytics topics. Module 1 covers HDFS components and architecture, MapReduce concepts, and HDFS commands and benchmarks. Module 2 discusses Apache Hadoop ecosystem projects like Pig, Hive, Sqoop, Flume, Oozie, HBase, YARN, and Tez. Module 3 is about business intelligence and data warehousing concepts. Modules 4 and 5 cover data mining and machine learning techniques like clustering, decision trees, text mining and web mining. The document provides a comprehensive set of questions to test knowledge of distributed computing and big data systems using Hadoop, as well as data warehousing, business intelligence and analytics concepts and techniques.

Uploaded by Kaushik Kaps
Copyright
© All Rights Reserved

SAPTHAGIRI COLLEGE OF ENGINEERING

(Affiliated to VTU, Belagavi, Approved by AICTE, New Delhi)


14/5, CHIKKASANDRA, HESARAGHATTA MAIN ROAD
BENGALURU-560 057
DEPARTMENT OF INFORMATION SCIENCE AND ENGINEERING
Big Data Analytics Question Bank
Module 1

1. Describe the features of HDFS.
2. Explain in detail the different components of HDFS.
3. How does HDFS block replication happen? Show it pictorially.
4. Explain how HDFS works in safe mode. Define rack awareness.
5. Explain the NameNode high availability design with a diagram.
6. Explain the Apache MapReduce parallel data flow.
7. Describe HDFS NameNode Federation with an example. Explain the HDFS NFS Gateway.
8. Explain HDFS checkpoints, backups, and snapshots.
9. List a few HDFS commands.
10. List the Hadoop benchmarks and explain the TeraSort and TestDFSIO benchmarks in detail.
11. Explain the “mapred” command in detail.
12. Explain the mapper and reducer with simple scripts.
13. Explain how the MapReduce model functions.
14. Explain the MapReduce parallel data flow in detail, with a diagram.
15. Explain process placement during MapReduce.
16. How is HDFS fault tolerant? Explain speculative execution in MapReduce.
17. Write a WordCount program in Java, C++, and Python.
18. Explain the Streaming interface. What are its limitations?
19. Explain the Pipes interface.
20. How is debugging done in HDFS? Explain Hadoop log management.
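
As a reference point for questions 12 and 17, the following is a minimal single-process sketch of the mapper and reducer roles in Python. It is not a full Hadoop job: the function names (mapper, reducer, word_count) and the sample input are illustrative assumptions, and the sorted() call only stands in for the shuffle-and-sort step that Hadoop performs between the map and reduce phases.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        # Naive tokenization (assumption): lowercase and split on whitespace.
        for word in line.strip().lower().split():
            yield word, 1


def reducer(sorted_pairs):
    """Reduce phase: sum the counts for each word.

    Assumes the pairs arrive sorted by key, as they would after the
    shuffle-and-sort step between the map and reduce phases.
    """
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run map, a local stand-in for shuffle/sort, and reduce in one process."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    sample = ["see spot run", "run spot run"]  # hypothetical input
    for word, count in sorted(word_count(sample).items()):
        print(f"{word}\t{count}")
```

In a real Streaming job the mapper and reducer would typically be separate scripts reading from standard input and writing tab-separated key/value lines to standard output; the sketch keeps them as functions so the logic can be checked locally.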

Module 2
1. Explain Apache Pig, along with its commands.
2. Explain Apache Hive, along with its commands.
3. Explain Apache Sqoop and its import and export methods.
4. Describe the Apache Flume agent components with a neat sketch. [Include the pipeline and the consolidation network.]
5. Explain Apache Oozie in detail, with a workflow DAG.
6. Explain HBase in detail.
7. Explain the structure of YARN applications.
8. Explain the following: Apache Tez, Apache Giraph, Apache Storm, Apache Spark, Apache Flink.
9. Explain the YARN architecture for two clients, with a neat diagram.

Module 3
1. How can BI be used for better decisions?
2. Explain BI tools in detail.
3. Explain any two BI applications in detail.
4. List three business intelligence applications in healthcare and wellness.
5. List three business intelligence applications in education.
6. List three business intelligence applications in customer relationship management.
7. What are the design considerations for a data warehouse?
8. Compare a data mart and a data warehouse.
9. Describe the data warehouse architecture.
10. Explain the data loading process.
11. Explain data warehouse design.
12. Explain data warehouse best practices.
13. What is data mining? What are supervised and unsupervised learning techniques?
14. What are the possible outputs of data mining?
15. How are data mining results evaluated?
16. Explain data mining techniques.
17. List data mining tools.
18. List data mining best practices.
19. What are the major mistakes to avoid when doing data mining?
20. What is a confusion matrix?
21. Why is data preparation so important and time-consuming?

Module 4 & Module 5

1. What is clustering? Explain the applications of clustering. Write the generic pseudocode for clustering.
2. Compare decision trees with table lookup.
3. Explain the K-Means algorithm for clustering with an example.
4. Explain the construction of a decision tree and give the pseudocode for building one.
5. Draw an architectural diagram for text mining and explain it. What are the applications of text mining?
6. Explain web content mining, web structure mining, and web usage mining in detail.
7. State the differences between text mining and data mining.
8. How can data be stored and accessed in big data technologies?
9. Write a note on web mining algorithms.
10. Explain the term-document matrix.
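
As a reference point for question 10, the following is a minimal sketch of building a term-document matrix in plain Python. The function name term_document_matrix, the whitespace tokenization, and the sample documents are illustrative assumptions. Rows are documents and columns are terms here; some texts use the transposed layout.

```python
from collections import Counter


def term_document_matrix(documents):
    """Return (terms, matrix), where matrix[i][j] counts terms[j] in documents[i]."""
    # Naive tokenization (assumption): lowercase and split on whitespace.
    counts = [Counter(doc.lower().split()) for doc in documents]
    # The vocabulary is the sorted union of all terms seen in any document.
    terms = sorted(set().union(*counts))
    # A Counter returns 0 for terms that a document does not contain.
    matrix = [[doc_counts[term] for term in terms] for doc_counts in counts]
    return terms, matrix


if __name__ == "__main__":
    docs = ["big data analytics", "data mining and text mining"]  # hypothetical
    terms, matrix = term_document_matrix(docs)
    print(terms)
    for row in matrix:
        print(row)
```

Practical text mining pipelines usually add steps such as stop-word removal and stemming before counting; these are left out to keep the sketch short.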
