BD Course Handout
Deemed to be University
BHUBANESWAR-751024
Course Handout
● To explore the big data stacks and the technologies associated with them.
● To evaluate the different NoSQL databases and frameworks required to handle the big data.
● To formulate the concepts, principles and techniques focusing on the applications to industry
and real world experience.
● To contextually integrate and correlate large amounts of information to gain faster insights for
real time scenarios.
8. Course Outcome:
CO # Detail
CO1 Understand the concept of big data and its analytics in the real world
CO2 Analyse various big data technology foundations
CO3 Apply filtering techniques to stream data
CO4 Apply the Hadoop ecosystem paradigm using MapReduce, YARN, Pig, Hive, Sqoop,
HBase to solve data-intensive problems
CO5 Analyse big data frameworks like Hadoop and NoSQL to efficiently store and process
big data to generate analytics
CO6 Present appropriate solutions to big data analytics frameworks and visualization.
9. Course Contents
The course focuses on basic and essential topics in Big Data.
Unit # | Unit | Detailed Area
1 | Overview of Big Data | Importance of Data, Characteristics of Data, Analysis of unstructured data, Introduction to Big Data, Challenges of conventional systems, Data analytics, Evolution of analytic scalability, Big Data Analytics, Key Big Data terminologies, Big Data analytics lifecycle, Cloud Computing and Big Data.
2 | Big Data Technology Foundations | Exploring the Big Data Stack, Data Sources Layer, Ingestion Layer, Storage Layer, Physical Infrastructure Layer, Platform Management Layer, Security Layer, Monitoring Layer, Analytics Engine, Visualization Layer, Big Data Applications, Virtualization.
3 | Streaming | Introduction to Streams Concepts – Stream data model and architecture – Stream Computing, Sampling data in a stream – Filtering streams, Counting distinct elements in a stream.
4 | Hadoop Ecosystem | Introduction to Hadoop, Hadoop Ecosystem, Hadoop Distributed File System, MapReduce, YARN, Pig and PigLatin, Hive, Sqoop, HBase
5 | Storing Data in Big Data context | Data Models, RDBMS and Hadoop, Non-Relational Database, Introduction to NoSQL, Types of NoSQL, Polyglot Persistence, Sharding
6 | Frameworks and Visualization | Distributed and Parallel Computing for Big Data, Big Data Visualizations – Visual data analysis techniques, interaction techniques, applications
● DBMS
Lecture No. Unit Topics Lesson #
6
● Cloud Computing and Big Data
● Discussion
7-11 Big Data Technology Foundations
7
● Exploring the Big Data Stack
● Data Sources Layer
● Ingestion Layer
8
● Storage Layer
● Monitoring Layer
10
● Analytics Engine
● Visualization Layer
11
● Big Data Applications, Virtualization.
12-14 Streaming
12
● Introduction to Streams Concepts
● MapReduce
17
● YARN
18
● Hive
19
● Pig and PigLatin
20
● HBase
21
● Sqoop
22
● Discussion
23-30 Storing Data in Big Data context
23
● Data Models
24
● RDBMS and Hadoop
25
● Non-Relational Database
26
● Introduction to NoSQL
27
● Types of NoSQL
28
● Types of NoSQL cont...
29
● Polyglot Persistence
30
● Sharding
● Discussion
31-36 Framework & visualization
31
● Distributed and Parallel Computing for Big Data
32
● Big Data Visualizations – Visual data analysis
techniques
33
● Interaction techniques and applications
34
● Big Data Visualizations – Visual data analysis
techniques cont...
35
● Big Data Visualizations – Visual data analysis
techniques cont...
36
● Interaction techniques and applications
● Discussions
Considering the circulated guidelines and after discussion with the faculty members, the following
activity-based teaching and learning is proposed. The component-wise distribution of the
activities is listed below.
1 Assignment 10-08-2023 5
2 Assignment 24-08-2023 5
5 Assignment 20-10-2023 5
6 Quiz 08-11-2023 5