Big Data SYLLABUS
Teaching Hours/Week (L:T:P:S): 3:0:25
Credits
Exam Hours
Examination nature (SEE): Theory/Practical
Course objectives:
1. To implement MapReduce programs for processing big data.
2. To realize storage and processing of big data using MongoDB, Pig, Hive and Spark.
3. To analyze big data using machine learning techniques.
4. Ask at least three HOT (Higher Order Thinking) questions in the class, which promote critical thinking.
5. Discuss how every concept can be applied to the real world; when that is possible, it helps improve the
students' understanding.
6. Use any of these methods: chalk and board, active learning, case studies.
MODULE-1
Classification of data, Characteristics, Evolution and definition of Big data, What is Big data, Why Big data,
Traditional Business Intelligence Vs Big Data, Typical data warehouse and Hadoop environment.
Big Data Analytics: What is Big data Analytics, Classification of Analytics, Importance of Big Data
Analytics, Technologies used in Big data Environments, Few Top Analytical Tools, NoSQL, Hadoop.
MODULE-2
Introduction to Hadoop: Introducing Hadoop, Why Hadoop, Why not RDBMS, RDBMS Vs Hadoop, History
of Hadoop, Hadoop overview, Use case of Hadoop, HDFS (Hadoop Distributed File System), Processing data
with Hadoop, Managing resources and applications with Hadoop YARN (Yet Another Resource Negotiator).
Introduction to Map Reduce Programming: Introduction, Mapper, Reducer, Combiner, Partitioner,
Searching, Sorting, Compression.
TB1: Ch 6: 6.1-6.5
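The roles of the mapper, combiner, partitioner, and reducer named in this module can be illustrated with a minimal in-memory sketch in plain Python. It does not use Hadoop. The function names, the two-reducer setup, and the sample input splits are all hypothetical, chosen only to show how records might flow between the stages of a word count job.

```python
from collections import defaultdict
import zlib


def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def combiner(pairs):
    """Pre-aggregate one map task's output locally to shrink the shuffle."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def partitioner(key, num_reducers):
    """Choose the reducer for a key; crc32 gives a stable, repeatable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """Sum all partial counts that arrived for one key."""
    return key, sum(values)


def run_job(splits, num_reducers=2):
    # One bucket per reducer: key -> list of partial counts.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:  # each split stands in for one map task
        mapped = [pair for line in split for pair in mapper(line)]
        for word, count in combiner(mapped):  # combine before the shuffle
            buckets[partitioner(word, num_reducers)][word].append(count)
    result = {}
    for bucket in buckets:  # each bucket stands in for one reduce task
        for word in sorted(bucket):  # a reducer receives its keys in sorted order
            key, total = reducer(word, bucket[word])
            result[key] = total
    return result


# Hypothetical input: two splits of a tiny text file.
splits = [["big data big ideas"], ["data beats ideas", "big wins"]]
counts = run_job(splits)
print(counts)
```

In this sketch the combiner runs once per split, so the word "big" leaves the first map task as a single pair with count 2 rather than as two separate pairs. That reduction in shuffled records is the usual reason for adding a combiner.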
MODULE-4
Introduction to Hive: What is Hive, Hive Architecture, Hive data types, Hive file formats, Hive Query
Language (HQL), RC File implementation, User Defined Function (UDF).
Introduction to Pig: What is Pig, Anatomy of Pig, Pig on Hadoop, Pig Philosophy, Use case for Pig, Pig Latin
Overview, Data types in Pig, Running Pig, Execution Modes of Pig, HDFS Commands, Relational Operators,
Eval Function, Complex Data Types, Piggy Bank, User Defined Function, Pig Vs Hive.
@HG10012025
Text, Web Content and Link Analytics: Introduction, Text Mining, Web Mining, Web Content and Web
Usage Analytics, Page Rank, Structure of Web and Analyzing a Web Graph.
TB2: Ch 5: 5.2, 5.3; Ch 9: 9.1-9.4
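As an illustration of the Page Rank topic listed above, the following is a minimal sketch of the standard power-iteration formulation in plain Python. The damping factor of 0.85, the fixed iteration count, the handling of dangling pages, and the four-page graph are assumptions made for this example. They are not taken from the prescribed textbook.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict mapping page -> list of outgoing links."""
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives the same "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, links in graph.items():
            if links:
                # Split this page's damped rank equally among its out-links.
                share = damping * rank[page] / len(links)
                for target in links:
                    new_rank[target] += share
            else:
                # Dangling page: spread its damped rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web graph: page -> pages it links to.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph page C is linked from three pages and ends up with the highest rank, while page D has no incoming links and keeps only the random-jump share.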
Hint: A typical Hadoop workflow creates data files (such as log files) elsewhere and copies them into
3 Develop a MapReduce program that mines weather data and displays appropriate messages indicating
7 Use Hive to create, alter, and drop databases, tables, views, functions, and indexes.
8 Implement a word count program in Hadoop and Spark.
9 Use CDH (Cloudera Distribution for Hadoop) and HUE (Hadoop User Interface) to analyze data and
generate reports for sample datasets.
1. Identify and list various Big Data concepts, tools and applications.
assessment methods mentioned in 22OB4.2. The first test at the end of 40-50% coverage of the
The student has to secure 40% of 25 marks to qualify in the CIE of the theory component of IPCC.
CIE for the practical component of the IPCC
15 marks for the conduction of the experiment and preparation of laboratory record, and 10 marks
for the test to be conducted after the completion of all the laboratory sessions.
evaluation of the laboratory report. Each experiment report can be evaluated for 10 marks. Marks of
all experiments' write-ups are added and scaled down to 15 marks.
Scaled-down marks of write-up evaluations and tests added will be CIE marks for the laboratory
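The scaling arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample scores are hypothetical, and it assumes the laboratory test score is already expressed out of 10, which the surviving text suggests but does not state explicitly.

```python
def lab_cie_marks(writeup_marks, test_marks):
    """Return the laboratory CIE score out of 25.

    writeup_marks: one score per experiment write-up, each out of 10.
    test_marks: laboratory test score, assumed to be out of 10.
    """
    if not writeup_marks:
        raise ValueError("at least one experiment write-up is required")
    max_total = 10 * len(writeup_marks)
    # Add all write-up marks, then scale the total down to 15 marks.
    scaled_writeups = sum(writeup_marks) * 15 / max_total
    # Laboratory CIE = scaled write-ups (max 15) + test (max 10).
    return scaled_writeups + test_marks


# Hypothetical student: five experiments and a test score of 7 out of 10.
print(lab_cie_marks([8, 9, 7, 10, 6], 7))
```

For these sample scores the write-ups total 40 out of 50, which scales to 12 out of 15. Adding the test score of 7 gives 19 out of 25.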
2019.
2. Raj Kamal and Preeti Saxena, "Big Data Analytics, Introduction to Hadoop, Spark and Machine Learning",
McGraw Hill Publication, 2019.
Reference Books:
1. Adam Shook and Donald Miner, "MapReduce Design Patterns: Building Effective Algorithms and Analytics for
Hadoop and Other Systems", O'Reilly, 2012.
2. Tom White, "Hadoop: The Definitive Guide", 4th Edition, O'Reilly Media, 2015.
3. Thomas Erl, Wajid Khattak, and Paul Buhler, "Big Data Fundamentals: Concepts, Drivers & Techniques",
Pearson India Education Service Pvt. Ltd., 1st Edition, 2016.
4. John D. Kelleher, Brian Mac Namee, Aoife D'Arcy, "Fundamentals of Machine Learning for Predictive Data
Analytics: Algorithms, Worked Examples", MIT Press, 2nd Edition, 2020.
Web links and Video Lectures (e-Resources):
• https://www.kaggle.com/datasets/grouplens/movielens-20m-dataset
• https://www.youtube.com/watch?v=bAyrOblTTYE&list=PLEIEAq2VkUUjqplk-gSWimo37urjQ0dCZ
• https://www.youtube.com/watch?v=Vm00QgPCbZY&list=PLEIEAq2VkUUjgplkg5W1imo37urjQodCZ&index=4