Big Data SYLLABUS

The document outlines the curriculum for a Big Data Analytics course (BAD601), detailing course objectives, teaching methods, and assessment criteria. It covers various modules including Hadoop, MongoDB, Hive, and Spark, along with practical components for hands-on experience. The course aims to equip students with skills in big data processing, analysis, and the use of relevant technologies and tools.

Uploaded by

sahanasaana19
Copyright
© All Rights Reserved

BIG DATA ANALYTICS
Semester
Course Code: BAD601
CIE Marks: 50
Teaching Hours/Week (L:T:P:S): 3:0:25
Total Hours of Pedagogy: 40 hours Theory + 8-10 Lab slots
Total Marks: 100
Credits
Exam Hours
Examination nature (SEE): Theory/practical

Course objectives:
1. To implement MapReduce programs for processing big data.
2. To realize storage and processing of big data using MongoDB, Pig, Hive and Spark.
3. To analyze big data using machine learning techniques.

Teaching-Learning Process (General Instructions)


These are sample strategies that teachers can use to accelerate the attainment of the various course outcomes.
1. The lecture method (L) need not be only a traditional lecture method; alternative effective teaching
methods could be adopted to attain the outcomes.
2. Use of video/animation to explain the functioning of various concepts.
3. Encourage collaborative (group) learning in the class.
4. Ask at least three HOT (Higher Order Thinking) questions in the class, which promote critical thinking.
5. Discuss how every concept can be applied to the real world; when that is possible, it helps improve the
students' understanding.
6. Use any of these methods: chalk and board, active learning, case studies.

MODULE-1
Classification of data, Characteristics, Evolution and definition of Big data, What is Big data, Why Big data,
Traditional Business Intelligence Vs Big Data, Typical data warehouse and Hadoop environment.
Big Data Analytics: What is Big data Analytics, Classification of Analytics, Importance of Big Data
Analytics, Technologies used in Big data Environments, Few Top Analytical Tools, NoSQL, Hadoop.

TB1: Ch 1: 1.1, Ch 2: 2.1-2.5, 2.7, 2.9-2.11, Ch 3: 3.2, 3.5, 3.8, 3.12, Ch 4: 4.1, 4.2

MODULE-2
Introduction to Hadoop: Introducing Hadoop, Why Hadoop, Why not RDBMS, RDBMS Vs Hadoop, History
of Hadoop, Hadoop overview, Use case of Hadoop, HDFS (Hadoop Distributed File System), Processing data
with Hadoop, Managing resources and applications with Hadoop YARN (Yet Another Resource Negotiator).
Introduction to Map Reduce Programming: Introduction, Mapper, Reducer, Combiner, Partitioner,
Searching, Sorting, Compression.

TB1: Ch 5: 5.1-5.8, 5.10-5.12, Ch 8: 8.1-8.8
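
As an illustration of how the Mapper, Combiner, Partitioner and Reducer named in this module fit together, the following is a minimal sketch in plain Python. It only simulates the data flow in a single process and does not use the Hadoop API; the function names, the byte-sum partitioning rule and the sample input are assumptions made for this example.

```python
from collections import defaultdict
from itertools import groupby


def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Pre-aggregate one mapper's output locally to reduce shuffle traffic."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partitioner(key, num_reducers):
    """Pick a reducer for a key; the same key always maps to the same reducer.

    A byte sum is used instead of hash() so the result is deterministic
    across runs (an assumption of this sketch, not Hadoop's own rule).
    """
    return sum(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Sum all partial counts received for one key."""
    return key, sum(values)


def run_job(lines, num_reducers=2):
    # Map and combine: each input line stands in for one input split.
    partitions = [[] for _ in range(num_reducers)]
    for line in lines:
        for key, value in combiner(mapper(line)):
            partitions[partitioner(key, num_reducers)].append((key, value))

    # Shuffle and sort, then reduce: sort each partition by key and group.
    result = {}
    for part in partitions:
        part.sort(key=lambda kv: kv[0])
        for key, group in groupby(part, key=lambda kv: kv[0]):
            word, total = reducer(key, (value for _, value in group))
            result[word] = total
    return result


counts = run_job(["big data big ideas", "data beats ideas"])
print(counts)
```

In a real Hadoop job the framework performs the shuffle and sort between the map and reduce phases; here the sort and groupby calls stand in for that step.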


MODULE-3
Introduction to MongoDB: What is MongoDB, Why MongoDB, Terms used in RDBMS and MongoDB, Data
Types in MongoDB, MongoDB Query Language.

TB1: Ch 6: 6.1-6.5
MODULE-4
Introduction to Hive: What is Hive, Hive Architecture, Hive data types, Hive file formats, Hive Query
Language (HQL), RC File implementation, User Defined Function (UDF).
Introduction to Pig: What is Pig, Anatomy of Pig, Pig on Hadoop, Pig Philosophy, Use case for Pig, Pig Latin
Overview, Data types in Pig, Running Pig, Execution Modes of Pig, HDFS Commands, Relational Operators,
Eval Function, Complex Data Types, Piggy Bank, User Defined Function, Pig Vs Hive.

TB1: Ch 9: 9.1-9.6, 9.8, Ch 10: 10.1-10.15, 10.22


MODULE-5
Spark and Big Data Analytics: Spark, Introduction to Data Analysis with Spark.

@HG10012025
Text, Web Content and Link Analytics: Introduction, Text Mining, Web Mining, Web Content and Web
Usage Analytics, Page Rank, Structure of Web and Analyzing a Web Graph.

TB2: Ch 5: 5.2, 5.3, Ch 9: 9.1-9.4

PRACTICAL COMPONENT OF IPCC


Sl. No.  Experiments (Java/Python/R)

1  Install Hadoop and implement the following file management tasks in Hadoop:
   Adding files and directories
   Retrieving files
   Deleting files and directories
   Hint: A typical Hadoop workflow creates data files (such as log files) elsewhere and copies them into
   HDFS using one of the above command line utilities.

2  Develop a MapReduce program to implement Matrix Multiplication.

3  Develop a MapReduce program that mines weather data and displays appropriate messages indicating
   the weather conditions of the day.

4  Develop a MapReduce program to find the tags associated with each movie by analyzing MovieLens
   data.

5  Implement Functions: Count, Sort, Limit, Skip, Aggregate using MongoDB.

6  Develop Pig Latin scripts to sort, group, join, project, and filter the data.

7  Use Hive to create, alter, and drop databases, tables, views, functions, and indexes.

8  Implement a word count program in Hadoop and Spark.

9  Use CDH (Cloudera Distribution for Hadoop) and HUE (Hadoop User Interface) to analyze data and
   generate reports for sample datasets.
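
As a starting point for the matrix multiplication experiment, the following is a minimal sketch of the usual one-pass MapReduce formulation, simulated in plain Python rather than run on a Hadoop cluster. The function names, the in-memory grouping step and the sample matrices are assumptions made for this illustration; it is not the prescribed solution.

```python
from collections import defaultdict


def map_phase(A, B):
    """Emit ((i, k), (tag, j, value)) so each output cell C[i][k] gets its inputs."""
    m, n, p = len(A), len(B), len(B[0])
    for i in range(m):
        for j in range(n):
            for k in range(p):
                # A[i][j] contributes to every cell in row i of C.
                yield (i, k), ("A", j, A[i][j])
    for j in range(n):
        for k in range(p):
            for i in range(m):
                # B[j][k] contributes to every cell in column k of C.
                yield (i, k), ("B", j, B[j][k])


def reduce_phase(key, values):
    """Pair A and B entries on the shared index j and sum the products."""
    a_vals, b_vals = {}, {}
    for tag, j, value in values:
        if tag == "A":
            a_vals[j] = value
        else:
            b_vals[j] = value
    return key, sum(a_vals[j] * b_vals[j] for j in a_vals if j in b_vals)


def multiply(A, B):
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    # Stand-in for the shuffle: group all mapper output by output cell.
    grouped = defaultdict(list)
    for key, value in map_phase(A, B):
        grouped[key].append(value)
    C = [[0] * len(B[0]) for _ in range(len(A))]
    for key, values in grouped.items():
        (i, k), total = reduce_phase(key, values)
        C[i][k] = total
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(multiply(A, B))  # [[19, 22], [43, 50]]
```

Each reducer key is one cell of the result, so the reducers can run independently of one another. The cost is that every matrix element is replicated once per output cell it feeds.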

Course outcomes (Course Skill Set)

At the end of the course, the student will be able to:

1. Identify and list various Big Data concepts, tools and applications.
2. Develop programs using HADOOP framework.
3. Make use of Hadoop Cluster to deploy Map Reduce jobs, PIG, HIVE and Spark programs.
4. Analyze the given data set and identify deep insights from the data set.
5. Demonstrate Text, Web Content and Link Analytics.

Assessment Details (both CIE and SEE)


The weightage of Continuous Internal Evaluation (CIE) is 50% and for Semester End Exam (SEE) is 50%.
The minimum passing mark for the CIE is 40% of the maximum marks (20 marks out of 50) and for the
SEE minimum passing mark is 35% of the maximum marks (18 out of 50 marks). A student shall be
deemed to have satisfied the academic requirements and earned the credits allotted to each subject/
course if the student secures a minimum of 40% (40 marks out of 100) in the sum total of the CIE
(Continuous Internal Evaluation) and SEE (Semester End Examination) taken together.
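
The passing thresholds above can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical. It assumes that fractional minimums are rounded up, which reproduces the stated 18 out of 50 for 35%, and that all three conditions must hold together.

```python
CIE_MAX = 50
SEE_MAX = 50


def minimum_marks(max_marks, percent):
    """Smallest whole mark that is at least `percent` of `max_marks` (rounds up)."""
    return -((-max_marks * percent) // 100)


def has_passed(cie, see):
    """Apply the CIE, SEE and combined minimums stated in the assessment details."""
    cie_ok = cie >= minimum_marks(CIE_MAX, 40)                    # 20 out of 50
    see_ok = see >= minimum_marks(SEE_MAX, 35)                    # 18 out of 50
    total_ok = cie + see >= minimum_marks(CIE_MAX + SEE_MAX, 40)  # 40 out of 100
    return cie_ok and see_ok and total_ok


print(minimum_marks(50, 40), minimum_marks(50, 35), minimum_marks(100, 40))
print(has_passed(22, 18))  # meets all three minimums
print(has_passed(20, 18))  # meets both individual minimums, but the total is 38
```

The second case shows why the combined 40% rule matters: a student at exactly the CIE and SEE minimums totals 38 and still falls short of 40 out of 100.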

CIE for the theory component of the IPCC (maximum marks 50)

• IPCC means practical portion integrated with the theory of the course.
• CIE marks for the theory component are 25 marks and that for the practical component is 25 marks.
• 25 marks for the theory component are split into 15 marks for two Internal Assessment Tests (Two
Tests, each of 15 Marks with 01-hour duration, are to be conducted) and 10 marks for other
assessment methods mentioned in 22OB4.2. The first test at the end of 40-50% coverage of the
syllabus and the second test after covering 85-90% of the syllabus.
• Scaled-down marks of the sum of two tests and other assessment methods will be CIE marks for the
theory component of IPCC (that is for 25 marks).
• The student has to secure 40% of 25 marks to qualify in the CIE of the theory component of IPCC.

CIE for the practical component of the IPCC

• 15 marks for the conduction of the experiment and preparation of laboratory record, and 10 marks
for the test to be conducted after the completion of all the laboratory sessions.
• On completion of every experiment/program in the laboratory, the students shall be evaluated
including viva-voce and marks shall be awarded on the same day.
• The CIE marks awarded in the case of the Practical component shall be based on the continuous
evaluation of the laboratory report. Each experiment report can be evaluated for 10 marks. Marks of
all experiments' write-ups are added and scaled down to 15 marks.
• The laboratory test (duration 02/03 hours) after completion of all the experiments shall be
conducted for 50 marks and scaled down to 10 marks.
• Scaled-down marks of write-up evaluations and tests added will be CIE marks for the laboratory
component of IPCC for 25 marks.
• The student has to secure 40% of 25 marks to qualify in the CIE of the practical component of the IPCC.
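
The scaling for the practical component can be worked through as a small calculation. This is a sketch under stated assumptions: the function names and sample marks are hypothetical, scaling is taken to be simple proportion, and no rounding is applied because the text does not specify a rounding rule.

```python
def practical_cie(writeup_marks, lab_test_marks):
    """Combine write-up marks (each out of 10) and the lab test (out of 50) into 25.

    Write-ups are summed and scaled down to 15 marks; the lab test is scaled
    down to 10 marks. Proportional scaling is an assumption of this sketch.
    """
    record = sum(writeup_marks) * 15 / (10 * len(writeup_marks))
    test = lab_test_marks * 10 / 50
    return record + test


def qualifies(cie_marks):
    """Check the 40% of 25 marks requirement for the practical component."""
    return cie_marks >= 25 * 40 / 100


# Hypothetical marks for nine experiments and a lab test scored 35 out of 50.
marks = practical_cie([8, 9, 7, 10, 6, 8, 9, 7, 8], 35)
print(marks, qualifies(marks))  # 19.0 True
```

Here the write-ups total 72 out of 90, which scales to 12 out of 15, and the lab test scales to 7 out of 10, giving 19 out of 25.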
SEE for IPCC

Theory SEE will be conducted by University as per the scheduled timetable, with common question
papers for the course (duration 03 hours)

1. The question paper will have ten questions. Each question is set for 20 marks.
2. There will be 2 questions from each module. Each of the two questions under a module (with a
maximum of 3 sub-questions), should have a mix of topics under that module.
3. The students have to answer 5 full questions, selecting one full question from each module.
4. Marks scored by the student shall be proportionally scaled down to 50 Marks.

The theory portion of the IPCC shall be for both CIE and SEE, whereas the practical portion will have
a CIE component only. Questions mentioned in the SEE paper may include questions from the
practical component.

Suggested Learning Resources:
Books:
1. Seema Acharya and Subhashini Chellappan, "Big Data and Analytics", Wiley India Publishers, 2nd Edition,
2019.
2. Rajkamal and Preeti Saxena, "Big Data Analytics, Introduction to Hadoop, Spark and Machine Learning",
McGraw Hill Publication, 2019.

Reference Books:

1. Adam Shook and Donald Miner, "MapReduce Design Patterns: Building Effective Algorithms and Analytics for
Hadoop and Other Systems", O'Reilly, 2012.
2. Tom White, "Hadoop: The Definitive Guide", 4th Edition, O'Reilly Media, 2015.
3. Thomas Erl, Wajid Khattak, and Paul Buhler, "Big Data Fundamentals: Concepts, Drivers & Techniques",
Pearson India Education Service Pvt. Ltd., 1st Edition, 2016.
4. John D. Kelleher, Brian Mac Namee, Aoife D'Arcy, "Fundamentals of Machine Learning for Predictive Data
Analytics: Algorithms, Worked Examples, and Case Studies", MIT Press, 2nd Edition, 2020.

Web links and Video Lectures (e-Resources):
• https://www.kaggle.com/datasets/grouplens/movielens-20m-dataset
• https://www.youtube.com/watch?v=bAyrOblTTYE&list=PLEIEAq2VkUUjqplk-gSWimo37urjQ0dCZ
• https://www.youtube.com/watch?v=Vm00QgPCbZY&list=PLEIEAq2VkUUjgplkg5W1imo37urjQodCZ&index=4
• https://www.youtube.com/watch?v=GG-VRm6XnNk
• https://www.youtube.com/watch?v=lglozNv 92A

Activity Based Learning (Suggested Activities in Class)/ Practical Based Learning

1. Implement MongoDB based application to store big data for data processing and analyzing the results (10
marks)
