Big Data QB
Subject Code & Subject Name: CCS334 & Big Data Analytics
Year & Sem: III & V
Name of Faculty: M.JEBA MALAR
Designation & Department: Assistant Professor & IT
Part A
Part C
S.No Question BL CO PI MM/YY
Unit 1
1. What is open source technology? Explain the advantages, disadvantages and applications of open source. L2 CO1 1.4.1
2. Explain the convergence of key trends in big data. L1 CO1 1.4.1
3. Describe industry examples of big data. L1 CO1 1.4.1
Unit 2
1. Explain, with a diagram, the various aggregate data models of NoSQL. L2 CO2 1.4.1
2. Discuss distribution models. L1 CO2 1.4.1
Unit 3
1. Explain the data flow when a client reads data from HDFS. L2 CO3 1.4.1
2. Demonstrate the execution of Hadoop Streaming and Hadoop Pipes. L2 CO3 1.4.1
Unit 4
1. Explain job scheduling in detail. L2 CO4 1.4.1
2. Describe shuffle and sort. L2 CO4 1.4.1
3. Explain failures in classic MapReduce and YARN. L2 CO4 1.4.1
Unit 5
1. What is HBase? Draw the architecture of HBase. Explain the differences between HDFS and HBase. L1 CO5 1.4.1
2. Explain the Hive architecture in detail. L1 CO5 1.4.1
3. Explain HiveQL queries in detail. L2 CO5 1.4.1
Prepared By Verified By
(Name & Sign) (Name & Sign)
SYLLABUS:
UNIT I UNDERSTANDING BIG DATA 5
Introduction to big data – convergence of key trends – unstructured data – industry examples of big data – web analytics – big data applications – big data technologies – introduction to Hadoop – open source technologies – cloud and big data – mobile business intelligence – crowdsourcing analytics – inter- and trans-firewall analytics.
UNIT II NOSQL DATA MANAGEMENT 7
Introduction to NoSQL – aggregate data models – key-value and document data models – relationships – graph databases – schemaless databases – materialized views – distribution models – master-slave replication – consistency – Cassandra – Cassandra data model – Cassandra examples – Cassandra clients.
UNIT III BASICS OF HADOOP 6
Data format – analyzing data with Hadoop – scaling out – Hadoop streaming – Hadoop pipes – design of Hadoop distributed file system (HDFS) – HDFS concepts – Java interface – data flow – Hadoop I/O – data integrity – compression – serialization – Avro – file-based data structures – Cassandra – Hadoop integration.
UNIT IV MAP REDUCE APPLICATIONS 6
MapReduce workflows – unit tests with MRUnit – test data and local tests – anatomy of MapReduce job run – classic MapReduce – YARN – failures in classic MapReduce and YARN – job scheduling – shuffle and sort – task execution – MapReduce types – input formats – output formats.
UNIT V HADOOP RELATED TOOLS 6
HBase – data model and implementations – HBase clients – HBase examples – praxis. Pig – Grunt – Pig data model – Pig Latin – developing and testing Pig Latin scripts. Hive – data types and file formats – HiveQL data definition – HiveQL data manipulation – HiveQL queries.
30 PER