Question Bank Big Data Analytics

This question bank on Big Data Analytics is divided into five units. It covers topics such as the importance of big data, Hadoop architecture, NoSQL databases, the MapReduce paradigm, and Hive architecture, including the differences between traditional and big data systems, the challenges of big data, and its applications.

Uploaded by

anisha01531

Big Data Analytics

Unit-I
1. Big data and its importance
2. Characteristics or 5 V’s of Big data
3. Types of big data
4. Difference between traditional data and big data
5. Challenges of big data
6. Big data analytics and classification
7. Big data technologies
8. Big data applications (all the applications)

Unit-II
1. Hadoop architecture
2. Explain how data is analyzed with Hadoop using Map and Reduce
3. Explain Hadoop Ecosystem.
4. Explain how Data is read using the Java FileSystem API.
5. Anatomy of a file read.
6. Anatomy of a file write.
7. Difference between a traditional database and Hadoop
8. List the various Hadoop filesystems.
9. Assumptions and goals of Hadoop design.
10. Define failover and fencing in Hadoop.
11. Replication policy used in Hadoop design
12. What is safemode?

Unit-III
1. Why NoSQL is used.
2. Difference between SQL and NoSQL
3. Types of NoSQL databases
4. Database impedance mismatch
5. Polyglot persistence
6. What is auto-sharding?
7. Aggregate data model with example
8. What are the key-value and document data models?
9. Graph databases and relationships
10. What is a schemaless database?
11. Explain distribution models
12. Explain sharding
13. Explain partitioning and combining in MapReduce

Unit-IV
1. Explain the working of the MapReduce paradigm
2. MapReduce architecture
3. Anatomy of a YARN MapReduce job run
4. Anatomy of a classic MapReduce job run
5. Failures in classic/YARN MapReduce
6. Explain MapReduce input formats
7. Shuffle and sort in MapReduce
8. JobTracker and TaskTracker
9. NodeManager and ResourceManager
10. Application master failure in YARN MapReduce
11. Scheduling and types of schedulers
12. Speculative execution
13. Types of input formats for Hadoop
14. Types of output formats for Hadoop
15. Benefits of MapReduce

Unit-V
1. Explain Hive architecture
2. Hive metastore configuration.
3. Hive comparison with traditional databases.
4. Difference between SQL and HQL.
5. What are Hive partitions and buckets?
6. Sorting and aggregating in Hive.
7. Hive UDF and UDAF with examples.
8. Working (workflow) of Hive UDAF.
9. Pig vs. Hive, Pig vs. SQL.
10. Apache Pig data processing operators.
11. Apache Pig REGISTER and DEFINE statements
12. Pig Latin built-in functions.
13. Types of functions in Apache Pig.
14. Apache Pig UDFs
15. Grouping and joining, combining and splitting in Pig Latin.
