
Big Data QB

The document contains a question bank for the subject Big Data Analytics. It has three parts (A, B, and C), each divided into five units, with questions on topics such as Hadoop, MapReduce, HDFS, NoSQL databases, and big data tools. The questions cover concepts such as unstructured data, the Hadoop ecosystem, Cassandra architecture, MapReduce workflows, and features of tools such as Hive and Pig.

Academic Year 2023-2024 (Odd Semester)

Department of Information Technology


_______________________________________________
Question Bank

Subject Code & Subject Name: CCS334 & Big Data Analytics
Year & Sem: III & V
Name of Faculty: M. JEBA MALAR
Designation & Department: Assistant Professor & IT

Part A

S.No Question BL CO PI MM/YY


Unit 1
1. What is unstructured data? L1 CO1 1.4.1
2. What do you mean by big data analytics? L2 CO1 1.4.1
3. What is Hadoop? L1 CO1 1.4.1
4. How is big data used in marketing? L4 CO1 1.4.1
5. Define streaming data. L1 CO1 1.4.1
6. What is data science? L1 CO1 1.4.1
7. What is a web log file? L1 CO1 1.4.1
8. What is a web crawler? L1 CO1 1.4.1
9. What are the characteristics of a firewall? L2 CO1 1.4.1
10. Compare Cloud computing and Big Data. L4 CO1 1.4.1
Unit 2
1. Define Cassandra. L1 CO2 1.4.1
2. What is the difference between sharding and replication? L4 CO2 1.4.1
3. What are schemaless databases? L2 CO2 1.4.1
4. List the advantages of graph databases. L1 CO2 1.4.1
5. What is the use of Bloom filters in Cassandra? L2 CO2 1.4.1
6. Define session consistency. L1 CO2 1.4.1
7. What is database sharding? L1 CO2 1.4.1
8. Why are NoSQL databases known as schemaless databases? L2 CO2 1.4.1
9. How is sharding different from partitioning? L4 CO2 1.4.1
10. What are write-write and read-write conflicts? L1 CO2 1.4.1
Unit 3
1. Why do we need Hadoop streaming? L4 CO3 1.4.1
2. How do HDFS services support big data? L4 CO3 1.4.1
3. Define serialization. L1 CO3 1.4.1
4. What is MapFile? L1 CO3 1.4.1
5. What is the Hadoop distributed file system? L1 CO3 1.4.1
6. What is data locality optimization? L1 CO3 1.4.1
7. What if Writable were not available in Hadoop? L4 CO3 1.4.1
8. What are Writables in Hadoop? L1 CO3 1.4.1
9. What happens if a client detects an error when reading a block in Hadoop? L2 CO3 1.4.1
10. What are Hadoop pipes? L2 CO3 1.4.1
Unit 4
1. Define MapReduce. L2 CO4 1.4.1
2. List the characteristics of MapReduce. L1 CO4 1.4.1
3. What are the major responsibilities L4 CO4 1.4.1
4. Why is YARN used? L1 CO4 1.4.1
5. What is fair scheduler? L1 CO4 1.4.1
6. List the failures of MapReduce. L1 CO4 1.4.1
7. Explain First in First out Scheduling. L2 CO4 1.4.1
8. Why does Hadoop work better with a small number of large files? L1 CO4 1.4.1
9. What is TextInputFormat? L4 CO4 1.4.1
10. What is Node Manager failure in YARN? L1 CO4 1.4.1
Unit 5
1. What is HBase? L1 CO5 1.4.1
2. What is Hive? L2 CO5 1.4.1
3. What is Hive data definition? L1 CO5 1.4.1
4. Explain the services provided by ZooKeeper in HBase. L4 CO5 1.4.1
5. What is ZooKeeper? L1 CO5 1.4.1
6. What are the responsibilities of HMaster? L1 CO5 1.4.1
7. Where should HBase be used? L1 CO5 1.4.1
8. Explain the unique features of HBase. L2 CO5 1.4.1
9. Explain the data model in HBase. L2 CO5 1.4.1
10. What is the difference between Pig Latin and the Pig engine? L2 CO5 1.4.1
Part B

S.No Question BL CO PI MM/YY


Unit 1
1. What is unstructured data? Compare structured and unstructured data. L2 CO1 1.4.1
2. Explain the applications of big data. L1 CO1 1.4.1
3. What is web analytics? Why is web analytics important? L1 CO1 1.4.1
4. Draw and explain the Hadoop ecosystem. L2 CO1 1.4.1
5. Discuss crowdsourcing and trans-firewall analytics. L2 CO1 1.4.1
Unit 2
1. Briefly discuss schemaless databases. L2 CO2 1.4.1
2. What is the CAP theorem? Explain. L1 CO2 1.4.1
3. What is sharding? Compare sharding with replication. L1 CO2 1.4.1
4. Discuss read and write quorums. L2 CO2 1.4.1
5. Explain in detail the Cassandra architecture and the Cassandra data model. L2 CO2 1.4.1
Unit 3
1. What is Hadoop streaming? Explain the features of Hadoop streaming. L2 CO3 1.4.1
2. Explain the heartbeat mechanism of HDFS. L1 CO3 1.4.1
3. Explain in detail: i) the Writable interface of Hadoop ii) Avro. L1 CO3 1.4.1
4. Explain in detail: i) data integrity in HDFS ii) the Hadoop local file system. L2 CO3 1.4.1
5. Explain Hadoop I/O in detail. L2 CO3 1.4.1
Unit 4

1. Explain MapReduce workflows in detail. L2 CO4 1.4.1
2. Explain in detail the anatomy of a MapReduce job run. L1 CO4 1.4.1
3. Write short notes on YARN. L1 CO4 1.4.1
4. Discuss the input and output formats of MapReduce. L2 CO4 1.4.1
5. What is the capacity scheduler? Compare the capacity and fair schedulers. L2 CO4 1.4.1
Unit 5
1. Explain the HBase architecture in detail. L2 CO5 1.4.1
2. Explain the difference between HDFS and HBase. L1 CO5 1.4.1
3. Write short notes on HBase clients. L1 CO5 1.4.1
4. What is Pig? Explain the features of Pig. L2 CO5 1.4.1
5. Draw the architecture of Pig. L2 CO5 1.4.1

Part C

S.No Question BL CO PI MM/YY
Unit 1
1. What is open source technology? Explain the advantages, disadvantages, and applications of open source. L2 CO1 1.4.1
2. Explain the convergence of key trends in big data. L1 CO1 1.4.1
3. Describe industry examples of big data. L1 CO1 1.4.1

Unit 2
1. Explain, with diagrams, the various aggregate data models of NoSQL. L2 CO2 1.4.1
2. Discuss distribution models. L1 CO2 1.4.1
Unit 3
1. Explain the data flow when a client reads data from HDFS. L2 CO3 1.4.1
2. Demonstrate the execution of streaming and pipes in Hadoop. L2 CO3 1.4.1
Unit 4
1. Explain job scheduling in detail. L2 CO4 1.4.1
2. Describe shuffle and sort. L2 CO4 1.4.1
3. Explain failures in classic MapReduce and YARN. L2 CO4 1.4.1
Unit 5
1. What is HBase? Draw the architecture of HBase. Explain the difference between HDFS and HBase. L1 CO5 1.4.1
2. Explain the Hive architecture in detail. L1 CO5 1.4.1
3. Explain HiveQL queries in detail. L2 CO5 1.4.1

Prepared By Verified By
(Name & Sign) (Name & Sign)

Format No : TLP 50 Rev.No : 1.0 Date : 19-07-2023

SYLLABUS:

UNIT I UNDERSTANDING BIG DATA 5 Introduction to big data – convergence of key trends –
unstructured data – industry examples of big data – web analytics – big data applications– big data
technologies – introduction to Hadoop – open source technologies – cloud and big data – mobile
business intelligence – Crowd sourcing analytics – inter and trans firewall analytics.
UNIT II NOSQL DATA MANAGEMENT 7 Introduction to NoSQL – aggregate data models – key-value
and document data models – relationships – graph databases – schemaless databases – materialized
views – distribution models – master-slave replication – consistency - Cassandra – Cassandra data
model – Cassandra examples – Cassandra clients
UNIT III BASICS OF HADOOP 6 Data format – analyzing data with Hadoop – scaling out – Hadoop
streaming – Hadoop pipes – design of Hadoop distributed file system (HDFS) – HDFS concepts – Java
interface – data flow – Hadoop I/O – data integrity – compression – serialization – Avro – file-based
data structures - Cassandra – Hadoop integration.
UNIT IV MAP REDUCE APPLICATIONS 6 MapReduce workflows – unit tests with MRUnit – test data
and local tests – anatomy of MapReduce job run – classic Map-reduce – YARN – failures in classic
Map-reduce and YARN – job scheduling – shuffle and sort – task execution – MapReduce types –
input formats – output formats.
UNIT V HADOOP RELATED TOOLS 6 HBase – data model and implementations – HBase clients –
HBase examples – praxis. Pig – Grunt – Pig data model – Pig Latin – developing and testing Pig Latin
scripts. Hive – data types and file formats – HiveQL data definition – HiveQL data manipulation –
HiveQL queries. 30 PERIODS
