Big Data Syllabus

The course on Big Data Technologies aims to equip students with knowledge and skills to address challenges in storing, analyzing, and searching large datasets. It covers topics such as the Google File System, Map-Reduce Framework, NoSQL databases, and practical applications using Hadoop and Elasticsearch. Students will engage in hands-on projects to apply their learning to real-world big data problems.


BIG DATA TECHNOLOGIES

CT 765 07
Course Objectives:
The growth of information systems has given rise to enormous volumes of data that no longer fit the traditional definition of data. This scenario opens up new possibilities but at the same time poses serious challenges, which lie in the effective storage, analysis, and search of such large data sets. Fortunately, a number of technologies have been developed to answer these challenges. This course introduces the big data scenario along with these technologies and how they address the challenges.
In this context, the specific objective of the course is to introduce students to the current big data landscape and its various facets. It also gives them the opportunity to become familiar with the technologies that play a key role in this field and equips them with the knowledge necessary to apply them to big data problems in different domains.

1. Introduction to Big Data [7 hours]


1. Big Data Overview
2. Background of Data Analytics
3. Role of Distributed System in Big Data
4. Role of Data Scientist
5. Current Trend in Big Data Analytics

2. Google File System [7 hours]


1. Architecture
2. Availability
3. Fault tolerance
4. Optimization for large-scale data

3. Map-Reduce Framework [10 hours]


1. Basics of functional programming
2. Fundamentals of functional programming
3. Modeling real-world problems in a functional style
4. Map-Reduce fundamentals (see the word-count sketch after this list)
5. Data flow (architecture)
6. Real-world problems
7. Scalability goal
8. Fault tolerance
9. Optimization and data locality
10. Parallel Efficiency of Map-Reduce
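
To make the map and reduce roles above concrete, here is a minimal word-count sketch in Java against the Hadoop MapReduce API (the classic introductory example; it assumes a Hadoop 2.x/3.x installation, and the input and output paths are supplied on the command line). The mapper emits a (word, 1) pair for every token, the reducer sums the counts per word, and the same reducer is reused as a combiner for local aggregation before the shuffle, which ties into the optimization and data-locality topics listed above.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in the input line.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts for each word; also reused as a combiner.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory (must not exist)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

A sketch like this would typically be packaged as a jar and submitted to the cluster with hadoop jar wordcount.jar WordCount <input> <output>.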

4. NoSQL [6 hours]
1. Structured and Unstructured Data
2. Taxonomy of NoSQL Implementation
3. Discussion of the basic architecture of HBase, Cassandra, and MongoDB (see the HBase client sketch after this list)
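
As a small illustration of the HBase column-family data model discussed in item 3 above, the following Java sketch (assuming the HBase 1.x/2.x client API and a pre-created table named "students" with a column family "info"; both names are hypothetical) writes a single cell and reads it back. Cassandra and MongoDB expose different client APIs, but the basic read/write pattern is conceptually similar.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseHello {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("students"))) {   // hypothetical table

      // Write one cell: row key "s001", column family "info", qualifier "name".
      Put put = new Put(Bytes.toBytes("s001"));
      put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Asha"));
      table.put(put);

      // Read the same cell back.
      Result result = table.get(new Get(Bytes.toBytes("s001")));
      byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
      System.out.println("name = " + Bytes.toString(name));
    }
  }
}

The table itself would be created beforehand, for example from the HBase shell with: create 'students', 'info'.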

5. Searching and Indexing Big Data [7 hours]


1. Full-text indexing and searching
2. Indexing with Lucene (see the Lucene sketch after this list)
3. Distributed searching with Elasticsearch
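
To ground the full-text indexing and searching topics above, here is a small Lucene sketch (assuming a Lucene 5+ style API; the index directory path and the sample text are illustrative). It indexes one document into an on-disk inverted index and then runs a parsed query against it; Elasticsearch distributes the same kind of Lucene indices across a cluster as shards.

import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LuceneHello {
  public static void main(String[] args) throws Exception {
    StandardAnalyzer analyzer = new StandardAnalyzer();
    Directory dir = FSDirectory.open(Paths.get("lucene-index"));   // hypothetical on-disk index path

    // Index a single document with one analyzed, stored full-text field.
    try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
      Document doc = new Document();
      doc.add(new TextField("body", "the google file system stores large files in chunks", Field.Store.YES));
      writer.addDocument(doc);
    }

    // Search the index with a parsed full-text query.
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      Query query = new QueryParser("body", analyzer).parse("file system");
      TopDocs hits = searcher.search(query, 10);
      for (ScoreDoc hit : hits.scoreDocs) {
        System.out.println(searcher.doc(hit.doc).get("body") + "  score=" + hit.score);
      }
    }
  }
}
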
6. Case Study: Hadoop [8 hours]
1. Introduction to Hadoop Environment
2. Data Flow
3. Hadoop I/O
4. Query languages for Hadoop
5. Hadoop and Amazon Cloud

Practical
Students will have the opportunity to work with big data technologies on both dummy and real-world problems, covering all the aspects discussed in the course. This will give them practical insight into the problems encountered in practice and how to tackle them using the tools learned in the course.
1. HDFS: Set up HDFS, from a single-node to a multi-node cluster, perform basic file system operations on it using the provided commands, and monitor cluster performance (see the HDFS sketch after this list)
2. Map-Reduce: Write various MR programs dealing with the different aspects studied in the course
3. HBase: Set up HBase in single-node and distributed mode, and write a program that writes into HBase and queries it
4. Elasticsearch: Set up Elasticsearch in single-node and distributed mode, define a template, write data into it, and finally query it
5. Final Assignment: A final assignment covering all aspects studied, in order to demonstrate the students' problem-solving capability in a big data scenario.
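
For practical 1, the following sketch shows basic HDFS file operations through the Hadoop FileSystem Java API (assuming fs.defaultFS points to the cluster in core-site.xml; the directory and file names are hypothetical). The same operations are available from the command line via hdfs dfs.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHello {
  public static void main(String[] args) throws Exception {
    // Picks up fs.defaultFS from core-site.xml on the classpath,
    // e.g. an hdfs:// URI for a single-node cluster (assumed here).
    Configuration conf = new Configuration();
    try (FileSystem fs = FileSystem.get(conf)) {

      Path dir = new Path("/user/student/demo");        // hypothetical HDFS directory
      fs.mkdirs(dir);

      // Copy a local file into HDFS, then list the directory contents.
      fs.copyFromLocalFile(new Path("notes.txt"), dir); // hypothetical local file
      for (FileStatus status : fs.listStatus(dir)) {
        System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
      }
    }
  }
}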

References
1. Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters
2. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, The Google File System
3. http://wiki.apache.org/hadoop/

Evaluation Scheme:
The questions will cover all the chapters of the syllabus. The evaluation scheme will be as indicated in the
table below:

Chapters    Hours    Marks Distribution*

1           7        12
2           7        13
3           10       18
4           6        11
5           7        13
6           8        13

Total       45       80
*There could be a minor deviation in Marks distribution
