DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

A.Y. 2024-2025 COURSE PACK FOR BIG DATA ANALYTICS (22ADE12, PE-I)

Course Title: Big Data Analytics        Course Type: PE-I
Course Code: 22ADE12    Credits: 3    Class: V Semester

Course Structure
TLP        Credits   Contact Hours   Work Load   Total No. of Classes per Semester   Assessment Weightage
                                                 (Theory / Practical)                (CIE / SEE)
Theory     3         2               2
Practice   -         -               -
Tutorial   -         1               1
Total      3         3               3           40 / -                              40 / 60
Course Lead: Dr. D. Raman (Course Coordinator)

Course Instructors (V Sem, 2024-25)
Theory: Dr. D. Raman, Dr. G. Vanitha
Practice: -

COURSE OVERVIEW:
This course covers the handling of big data: how it is stored and processed, and the tools and techniques used for both.
COURSE OBJECTIVES
This course aims to:

1. Introduce the importance of big data and the role of the Hadoop framework in analyzing large datasets by writing a mapper and reducer for a given problem.
2. Familiarize students with writing queries in Pig and Hive to process big data.
3. Present the latest big data frameworks and applications using Spark and Scala.
4. Discuss the concepts of Spark SQL and writing applications with it.
5. Investigate the integration of Kafka with other streaming frameworks such as Apache Spark and Apache Flink.
COURSE OUTCOMES (COs): After completing the course, the student will be able to:

CO1. Understand the processing of large datasets in the Hadoop framework and apply the MapReduce architecture to solve real-world problems.
     POs: PO1-PO5, PO10, PO12 | PSOs: 1, 3
CO2. Develop scripts using Pig over large datasets and query them using Hive.
     POs: PO1-PO6, PO10, PO12 | PSOs: 1, 2, 3
CO3. Understand the implementation of Spark and Scala programming.
     POs: PO1-PO8, PO10-PO12 | PSOs: 1, 3
CO4. Gain expertise in using Resilient Distributed Datasets (RDDs) to create applications in Spark and query data using Spark SQL.
     POs: PO1-PO8, PO10-PO12 | PSOs: 1, 2, 3
CO5. Apply streaming technologies in real-time data processing.
     POs: PO1-PO8, PO10-PO12 | PSOs: 1, 2, 3

BLOOM’S LEVEL OF THE COURSE OUTCOMES

Bloom’s Level
CO#   Remember (L1)   Understand (L2)   Apply (L3)   Analyze (L4)   Evaluate (L5)   Create (L6)

1 ✔ ✔
2 ✔ ✔ ✔ ✔

3 ✔ ✔ ✔ ✔
4 ✔ ✔ ✔
5 ✔ ✔ ✔
6 ✔ ✔

COURSE ARTICULATION MATRIX


Note: 1-Low, 2-Medium, 3-High
COURSE ASSESSMENT
Continuous Internal Evaluation (CIE): 40 marks
1. Theory Test-1 (T1): 1 hour, 20 marks
2. Theory Test-2 (T2): 1 hour, 20 marks
   Weightage: average of T1 and T2 (20 marks)
3. Alternate Assessments*: Slip Test-01 (S1), Slip Test-02 (S2), Slip Test-03 (S3), 5 marks each
   Weightage: average of the best two of S1, S2, S3 (5 marks)
4. Assignment-01 (A1) and Assignment-02 (A2): 10 marks each
   Weightage: average of A1 and A2 (10 marks)
5. Attendance: 5 marks
   5 marks for >= 85%, 4 marks for >= 80%, 3 marks for >= 75%, 2 marks for >= 70%, 1 mark for >= 65%
6. Practical Exam: -

Semester End Examination (SEE): 60 marks
7. Semester End Exam: 3 hours (Questions: Q, Marks: M)
   Part-A: 5Q × 3M = 15M; five questions, one from each unit.
   Part-B: 5Q × 9M = 45M; covering all five units, with internal choice.
   Questions in Part-A and Part-B may have subdivisions.

Total Marks: 100

* Assignment, quiz, class test, SWAYAM/NPTEL/MOOCs, etc.
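The CIE rules in the assessment scheme reduce to simple arithmetic: 20 (test average) + 5 (best two slip tests) + 10 (assignment average) + 5 (attendance) = 40. The Python sketch below encodes those rules as listed. The function names are hypothetical, attendance below 65% is assumed to earn 0 marks, and no rounding is applied because the course pack does not specify one.

```python
def attendance_marks(percent: float) -> int:
    """Map an attendance percentage to marks using the slabs in the table."""
    # (threshold, marks) pairs, checked from the highest slab downward
    slabs = [(85, 5), (80, 4), (75, 3), (70, 2), (65, 1)]
    for threshold, marks in slabs:
        if percent >= threshold:
            return marks
    return 0  # assumption: below 65% earns no attendance marks


def cie_total(t1, t2, slips, a1, a2, attendance_percent):
    """Combine the components into the 40-mark CIE score (no rounding applied)."""
    tests = (t1 + t2) / 2                          # average of T1, T2: out of 20
    best_two = sorted(slips, reverse=True)[:2]     # best two of S1, S2, S3
    slip = sum(best_two) / 2                       # their average: out of 5
    assignments = (a1 + a2) / 2                    # average of A1, A2: out of 10
    return tests + slip + assignments + attendance_marks(attendance_percent)


# Example: T1=16, T2=18, slip tests 3, 5, 4, assignments 9 and 10, attendance 82%
print(cie_total(16, 18, [3, 5, 4], 9, 10, 82))  # → 35.0
```

Here the 82% attendance falls in the ">= 80%" slab, so it contributes 4 marks; the remaining 31 come from 17 + 4.5 + 9.5.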


Course Prerequisites: Basic knowledge of a programming language such as Python.

COURSE CONTENT
UNIT-I
Introduction to Big Data: Introduction, Big Data Enabling Technologies, Hadoop Stack for Big Data. The Hadoop Distributed File System: Overview, The Design of HDFS, HDFS Concepts, The Command-Line Interface, Hadoop Filesystems. MapReduce: Overview, Developing a MapReduce Application, How MapReduce Works, MapReduce Types and Formats, MapReduce Features, MapReduce Examples.
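Course Objective 1 asks students to write a mapper and reducer for a given problem. As an illustration, the following is a single-process Python sketch of the classic word-count job that mimics the map, shuffle/sort, and reduce phases locally. It is not Hadoop code, and the helper names (`mapper`, `reducer`, `run_mapreduce`) are hypothetical. On a real cluster, Hadoop performs the shuffle/sort between the two phases and runs many mapper and reducer tasks in parallel.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Emit (word, 1) for every whitespace-separated token in a line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Sum all counts that were emitted for the same word."""
    yield word, sum(counts)


def run_mapreduce(lines):
    # Map phase: apply the mapper to every input record
    intermediate = [pair for line in lines for pair in mapper(line)]
    # Shuffle/sort phase: sort by key so equal keys become adjacent
    intermediate.sort(key=itemgetter(0))
    # Reduce phase: call the reducer once per distinct key
    result = {}
    for word, group in groupby(intermediate, key=itemgetter(0)):
        for key, total in reducer(word, (count for _, count in group)):
            result[key] = total
    return result


docs = ["the quick brown fox", "the lazy dog", "The fox"]
print(run_mapreduce(docs))
# → {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

Because the reducer only ever sees one key together with all of its values, the same mapper/reducer pair works unchanged no matter how the input is split across machines.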
UNIT-II
Pig: Generating Examples, Comparison with Databases, Pig Latin, User-Defined Functions, Data Processing Operators, Pig in Practice. Hive: Comparison with Traditional Databases, HiveQL, Tables, Querying Data, User-Defined Functions, Writing a User-Defined Function, Writing a User-Defined Aggregate Function.
UNIT-III
Parallel Programming with Spark: Overview of Spark, Fundamentals of Scala and Functional Programming, Spark Concepts: Resilient Distributed Datasets (RDDs), Creating RDDs, Basic Transformations, Basic Actions, Word Count Example; Spark Operations, Job Execution, Spark Applications: Cluster Computing with Working Sets. Spark SQL: What Is SQL, Big Data and SQL: Spark SQL, Creating DataFrames, DataFrame Operations, How to Run Spark SQL Queries, Tables, Views, Databases, Select Statements.
UNIT-IV
Machine Learning with Spark: Designing a Machine Learning System, Obtaining, Processing and Preparing Data with Spark, Building a Recommendation Engine with Spark, Building a Classification Model with Spark, Building a Regression Model with Spark and Building a Clustering Model with Spark. Spark GraphX & Graph Analytics: GraphX: Introduction, Graphs in the Machine Learning Landscape, Graph-structured Data, PageRank, Graph Analytics: Property Graphs, Graph Operators, Distributed Graphs, GraphX Unified Analytics; Case Study: Flight Data Analysis using Spark GraphX.
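PageRank, listed under GraphX above, scores each vertex by repeatedly redistributing rank along outgoing edges: PR(v) = (1 - d)/N + d · Σ over edges u→v of PR(u)/outdeg(u). The sketch below is plain Python, not the GraphX API. It assumes a damping factor d = 0.85, a fixed 50 iterations, and a graph with no dangling vertices (every vertex has at least one outgoing edge); library implementations, GraphX included, may scale or normalize ranks differently.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a directed graph given as (src, dst) edges.

    Assumes every vertex has at least one outgoing edge (no dangling vertices),
    so the ranks always sum to 1.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_degree = {node: 0 for node in nodes}
    for src, _ in edges:
        out_degree[src] += 1
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        # every vertex receives the random-jump share (1 - d) / N
        new_rank = {node: (1 - damping) / n for node in nodes}
        # each vertex splits its current rank evenly across its outgoing edges
        for src, dst in edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        rank = new_rank
    return rank


# Small example: a cycle A -> B -> C -> A plus an extra edge A -> C
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "C")]
ranks = pagerank(edges)
print({node: round(r, 3) for node, r in ranks.items()})
```

In this graph B ends up with the lowest rank because its only incoming edge carries half of A's rank, while C collects rank from both A and B.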
UNIT-V
Streaming: Introduction to Stream Processing, Batch Processing vs. Stream Processing, Spark Structured Streaming API, Use Case using Spark Streaming. Apache Kafka Fundamentals: Architecture, Brokers, Topics, Partitions, Producers, Consumers, Kafka Connect and Kafka Streams. Advanced Kafka Features: Exactly-Once Semantics, Kafka Transactions, Tiered Storage, Integrating Kafka with Apache Spark and Apache Flink, Integrating Kafka with Spark Streaming, Real-time Analytics Use Cases with Kafka such as Fraud Detection, Clickstream Analysis, and Real-time Monitoring.
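The batch-versus-stream distinction in Unit V can be illustrated without any framework: a batch job sees the whole dataset at once, while a streaming job updates state as each event arrives, often grouped into time windows. The plain-Python sketch below (not the Spark Structured Streaming or Kafka Streams API) counts clickstream events in 10-second tumbling windows. The event data and function names are hypothetical, and the sketch keeps every window's state indefinitely, whereas real streaming engines use watermarks to decide when a window is complete and can be emitted.

```python
from collections import Counter, defaultdict


def batch_count(events):
    """Batch style: the whole dataset is available, so count it in one pass."""
    return Counter(key for _, key in events)


def tumbling_window_counts(events, window_seconds):
    """Stream style: process events one at a time, keeping per-window state.

    Each event is (timestamp_seconds, key). A tumbling window is a fixed,
    non-overlapping interval [start, start + window_seconds).
    """
    state = defaultdict(Counter)
    for ts, key in events:  # in a real stream this loop would never end
        window_start = ts - ts % window_seconds
        state[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(state.items())}


clicks = [(0, "home"), (3, "cart"), (7, "home"), (12, "home"), (14, "pay")]
print(batch_count(clicks))
print(tumbling_window_counts(clicks, 10))
# → {0: {'home': 2, 'cart': 1}, 10: {'home': 1, 'pay': 1}}
```

Both functions see the same five events, but only the windowed version can report partial results (for example, the 0-10 s window) before the stream ends.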
Text Books:
1. Tom White, "Hadoop: The Definitive Guide", 4th Edition, O'Reilly Media Inc., 2015.
2. Bill Chambers, Matei Zaharia, "Spark: The Definitive Guide", 1st Edition, O'Reilly Media Inc., 2018.
3. Anand Rajaraman, Jeffrey David Ullman, "Mining of Massive Datasets", 2nd Edition, Cambridge University Press, 2014.
4. Neha Narkhede, Gwen Shapira, Todd Palino, "Kafka: The Definitive Guide", 2nd Edition, O'Reilly Media, 2017.
5. Viktor Gamov, "Kafka Streams in Action", 1st Edition, Manning Publications, 2018.
Suggested Reading:
1. Thilina Gunarathne, "Hadoop MapReduce v2 Cookbook", 2nd Edition, Packt Publishing, 2015.
2. Chuck Lam, Mark Davis, Ajit Gaddam, "Hadoop in Action", Manning Publications Company, 2016.
3. Alex Holmes, "Hadoop in Practice", Manning Publications Company, 2012.
4. Alan Gates, "Programming Pig", O'Reilly Media Inc., 2011.
5. Edward Capriolo, Dean Wampler, Jason Rutherglen, "Programming Hive", O'Reilly Media Inc., October 2012.
Online Resources:
1. http://www.planetcassandra.org/what-is-nosql
2. http://www.iitr.ac.in/media/facspace/patelfec/16Bit/index.html
3. https://class.coursera.org/datasci-001/lecture
4. http://bigdatauniversity

Self-Learning Exercises:

LESSON PLAN
Unit No.   Topic
Unit I
1. Introduction to Big Data: Introduction, Big Data Enabling Technologies, Hadoop Stack for Big Data.
2. The Hadoop Distributed File System: Overview
3. The Design of HDFS, HDFS Concepts, The Command-Line Interface
4. Hadoop Filesystems. MapReduce: Overview
5. Developing a MapReduce Application, How MapReduce Works
6. MapReduce Types and Formats
7. MapReduce Features, MapReduce Examples.
8. Overview of Unit-1 Concepts
Unit II
9. Overview of Unit-2; Pig: Generating Examples.
10. Comparison with Databases, Pig Latin
11. User-Defined Functions, Data Processing Operators, Pig in Practice.
12. SLIPTEST-1; Hive: Comparison with Traditional Databases
13. HiveQL, Tables, Querying Data, User-Defined Functions
14. Writing a User-Defined Function
15. Writing a User-Defined Aggregate Function
16. ASSIGNMENT-1
Unit III
17. Parallel programming with Spark: Overview of Spark
18. Fundamentals of Scala and functional programming
19. Spark concepts - Resilient Distributed Datasets (RDD)
20. Creating RDDs, Basic Transformations, Basic Actions, Word Count example; Spark operations
21. Job execution, Spark Applications: Cluster computing with working sets.
22. Spark SQL: What is SQL, Big Data and SQL: Spark SQL
23. Creating DataFrames, DataFrame Operations
24. How to Run Spark SQL Queries, Tables, Views, Databases, Select Statements.
Unit IV
25. SLIPTEST-2 Machine Learning with Spark: Designing a Machine Learning System
26. Obtaining, Processing and Preparing Data with Spark
27. Building a Recommendation Engine with Spark, Building a Classification Model with Spark
28. Building a Regression Model with Spark and Building a Clustering Model with Spark.
29. Spark GraphX & Graph Analytics: GraphX: Introduction, Graphs in the Machine Learning Landscape
30. Graph-structured data, PageRank, Graph Analytics: Property Graphs
31. Graph Operators, Distributed Graphs, GraphX Unified Analytics

Unit V
32. Streaming: Introduction to Stream Processing, Batch processing vs. stream processing
33. Spark structured streaming API, use case using Spark streaming.
34. Apache Kafka Fundamentals: Architecture, Brokers, Topics, Partitions, Producers, Consumers
35. Kafka Connect and Kafka Streams. Advanced Kafka Features: Exactly-Once Semantics
36. SLIPTEST-3; Kafka Transactions, Tiered Storage, Integrating Kafka with Apache Spark and Apache Flink
37. Integrating Kafka with Spark Streaming
38. Real-time Analytics Use Cases with Kafka such as Fraud Detection
39. Clickstream Analysis, Real-time Monitoring
40. ASSIGNMENT-2
Revision
Overall syllabus discussion aligned with the GATE exam
TOTAL HRS : 39

Signature of Course Coordinator Signature of HoD
