Bigdata Syllabus


Module I: Getting an Overview of Big Data Number of hours (LTP) 6 0 6
Big Data definition, History of Data Management, Structuring Big Data, Elements of Big Data, Big Data Analytics.

Exploring the Use of Big Data in a Business Context: Use of Big Data in Social Networking; Use of Big Data in Preventing Fraudulent Activities in the Insurance Sector and in the Retail Industry.
Learning Outcomes:
After completion of this unit, the student will be able to:

1. Learn various sources of data and forms of data generation. (L2)
2. Understand the evolution and elements of Big Data. (L2)
3. Explore different opportunities available in the career path. (L3)
4. Understand the role and importance of Big Data in various domains. (L2)

Module II: Handling Big Data Number of hours (LTP) 6 0 6


Distributed and parallel computing for Big Data, Introducing Hadoop, Cloud computing and
Big Data, In-memory Computing Technology for Big Data.
Understanding Hadoop Ecosystem: Hadoop Ecosystem, Hadoop Distributed File System,
MapReduce, Hadoop YARN, Introducing HBase, Combining HBase and HDFS, Hive, Pig and
Pig Latin, Sqoop, ZooKeeper, Flume, Oozie.

Learning Outcomes:
After completion of this unit, the student will be able to:

1. Identify the difference between distributed and parallel computing. (L3)
2. Learn the importance of virtualization in Big Data. (L2)
3. Learn the details of Hadoop and Cloud Computing. (L2)
4. Learn the architecture and features of HDFS. (L2)
Module III: Understanding Big Data Technology Foundations Number of hours (LTP) 6 0 6
The MapReduce Framework, Techniques to Optimize MapReduce Jobs, Uses of MapReduce,
Role of HBase in Big Data Processing.
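The map, shuffle/sort, and reduce phases of the MapReduce framework covered in this module can be sketched in plain Python, with no cluster required. This is an illustrative simulation of the three phases, not the Hadoop API; all function names here are made up for the sketch.

```python
from collections import defaultdict

def map_phase(line):
    # Like a Hadoop mapper: emit (word, 1) pairs for each input line
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Like the shuffle/sort step: group all values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Like a Hadoop reducer: aggregate all values for one key
    return key, sum(values)

lines = ["big data big ideas", "data beats opinions"]
pairs = [p for line in lines for p in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 2, 'data': 2, 'ideas': 1, 'beats': 1, 'opinions': 1}
```

In real Hadoop, the map calls run in parallel across cluster nodes and the framework performs the shuffle over the network; the per-phase logic is the same.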
Exploring the Big Data Stack, Virtualization and Big Data, Virtualization approaches.
Learning Outcomes:
After completion of this unit, the student will be able to:
1. Understand Hadoop Ecosystem, MapReduce and HBase. (L2)
2. Apply the technique in optimizing MapReduce jobs. (L3)
3. Explore the layers of Big Data Stack. (L2)
4. Learn virtualization approaches in handling Big Data operations. (L2)

Module IV: HIVE and PIG Number of hours (LTP) 6 0 6


Exploring Hive: Introducing Hive, Getting Started with Hive, Hive Services, Data Types,
Built-in Functions, Hive DDL, Data Manipulation, Data Retrieval Queries, Using Joins.
Analysing Data with Pig: Introducing Pig, Running Pig, Getting Started with Pig Latin, Working
with Operators in Pig, Debugging Pig, Working with Functions in Pig, Error Handling in Pig.

Learning Outcomes:
After completion of this unit, the student will be able to:
1. Learn the working of Hive and query execution. (L2)
2. Learn the importance of Pig. (L2)
3. Choose the operators in Pig. (L2)

Module V: SPARK Number of hours (LTP) 6 0 6


Introduction, Spark Jobs and API, Spark 2.0 Architecture, Resilient Distributed Datasets:
Internal Working, Creating RDDs, Transformations, Actions. DataFrames: Python-to-RDD
Communication, Speeding up PySpark with DataFrames, Creating DataFrames and Simple
DataFrame Queries, Interoperating with RDDs, Querying with DataFrames.
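The distinction between transformations (lazy, merely recorded) and actions (which trigger execution) is central to the RDD topics above. The toy class below is a stand-in written for this syllabus, not the pyspark API, and illustrates only the lazy-chaining idea:

```python
class ToyRDD:
    """A tiny stand-in for a Spark RDD: transformations are lazy,
    an action runs the whole recorded chain. Not the pyspark API."""
    def __init__(self, data, ops=()):
        self.data = data        # source data (a plain Python iterable here)
        self.ops = list(ops)    # deferred transformations, in order

    def map(self, f):           # transformation: just record it
        return ToyRDD(self.data, self.ops + [("map", f)])

    def filter(self, f):        # transformation: just record it
        return ToyRDD(self.data, self.ops + [("filter", f)])

    def collect(self):          # action: only now does the pipeline run
        out = list(self.data)
        for kind, f in self.ops:
            out = [f(x) for x in out] if kind == "map" else [x for x in out if f(x)]
        return out

rdd = ToyRDD(range(10))
result = rdd.map(lambda x: x * x).filter(lambda x: x % 2 == 0).collect()
print(result)  # [0, 4, 16, 36, 64]
```

In real Spark, laziness lets the engine fuse the map and filter into one pass over each partition before any data moves; the chaining style above carries over directly to pyspark's `rdd.map(...).filter(...).collect()`.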
Learning Outcomes:
After completion of this unit, the student will be able to:

1. Get an overview of Spark technology and its job organization. (L2)
2. Understand the schema-less data structures available in PySpark. (L3)
3. Get an overview of DataFrames, which bridge the gap between Scala and Python in
terms of efficiency. (L2)
4. Handle a real-time Big Data application. (L4)

Textbook(s)
1. DT Editorial Services, Big Data (Black Book), Dreamtech Press, 2016.
2. Tomasz Drabas and Denny Lee, Learning PySpark, Packt Publishing, 2017.
3. Tom White, Hadoop: The Definitive Guide, 4/e, O'Reilly, 2015.
Reference Book(s)
1. Bill Franks, Taming the Big Data Tidal Wave, 1/e, Wiley, 2012.
2. Frank J. Ohlhorst, Big Data Analytics, 1/e, Wiley, 2012.


Course Outcomes:
1. Demonstrate Big Data concepts for real-world data analysis. (L1)
2. Develop MapReduce concepts. (L2)
3. Learn how Pig Latin is used for programming in Hadoop. (L3)
4. Illustrate the Hadoop API for the MapReduce framework. (L4)
5. Develop basic programs for the MapReduce framework, particularly driver code,
mapper code, and reducer code. (L5)
6. Learn Apache Spark fundamentals: RDDs and DataFrames.
Lab Experiments for Big Data

1 Installation of a Hadoop cluster:
a. Standalone mode
b. Pseudo-distributed mode
c. Fully distributed mode
2 Perform file management tasks in Hadoop:
a. Creating directory
b. List the contents of a directory
c. Upload and download a file
d. See contents of a file
e. Copy a file from source to destination
f. Move file from source to destination.
3 MapReduce programming:
a. Wordcount program using Java
b. Wordcount program using Python
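For the Python variant, wordcount is typically run through Hadoop Streaming: the mapper and reducer are separate scripts that read stdin and write tab-separated lines to stdout, with Hadoop sorting between them. A minimal sketch, testable without a cluster (splitting the two functions into mapper.py and reducer.py files is the usual convention, not a requirement of the sketch):

```python
from itertools import groupby

def mapper(lines):
    # mapper.py: emit "word<TAB>1" for every word seen
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines):
    # reducer.py: input arrives sorted by key (Hadoop does this sort),
    # so consecutive lines with the same word can be summed with groupby
    parsed = (line.split("\t") for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Local stand-in for: cat input.txt | ./mapper.py | sort | ./reducer.py
sample = ["big data big ideas", "data beats opinions"]
for out in reducer(sorted(mapper(sample))):
    print(out)
```

The `sorted()` call plays the role of Hadoop's shuffle/sort; in a real streaming job the same scripts are passed via `-mapper` and `-reducer` and the framework handles the sort and distribution.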
4 Databases, Tables, Views, Functions and Indexes
5 Write a program to perform matrix multiplication in Hadoop with a matrix size of n×n,
where n > 1000.
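The standard MapReduce formulation of matrix multiplication keys every partial product A[i][k]·B[k][j] by its output cell (i, j) and sums in the reduce step. A pure-Python sketch of that pattern on a small matrix (the experiment itself requires n > 1000 on Hadoop; the function name is ours):

```python
from collections import defaultdict

def mapreduce_matmul(A, B):
    """Multiply two n x n matrices using the map/reduce pattern."""
    n = len(A)
    # Map: for every (i, k, j), emit ((i, j), A[i][k] * B[k][j])
    pairs = [((i, j), A[i][k] * B[k][j])
             for i in range(n) for k in range(n) for j in range(n)]
    # Reduce: sum the partial products belonging to each output cell
    cell = defaultdict(int)
    for key, value in pairs:
        cell[key] += value
    return [[cell[(i, j)] for j in range(n)] for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mapreduce_matmul(A, B))  # [[19, 22], [43, 50]]
```

In the Hadoop version, the mapper emits the same ((i, j), product) pairs and the reducer sums them; the key choice is what lets each output cell be computed independently on a different node.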
7 Given the following table schema:
Employee_table {ID: INT, Name: VARCHAR(10), Age: INT, Salary: INT}
Loan_table {LoanID: INT, ID: INT, Loan_applied: BOOLEAN, Loan_amt: INT}
a. Create a database and the above tables in Hive.
b. Insert records into the tables.
c. Write an SQL query to retrieve the details of employees who have applied for a loan.
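The CREATE TABLE / INSERT / JOIN statements for this experiment can be prototyped with Python's built-in sqlite3 before moving to Hive, since the core SELECT/JOIN syntax overlaps with HiveQL (Hive differs in details, e.g. it usually bulk-loads data rather than inserting row by row). The sample records below are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Tables match the schema given in experiment 7
cur.execute("CREATE TABLE Employee_table (ID INT, Name VARCHAR(10), Age INT, Salary INT)")
cur.execute("CREATE TABLE Loan_table (LoanID INT, ID INT, Loan_applied BOOLEAN, Loan_amt INT)")
cur.executemany("INSERT INTO Employee_table VALUES (?, ?, ?, ?)",
                [(1, "Asha", 30, 50000), (2, "Ravi", 41, 65000)])
cur.executemany("INSERT INTO Loan_table VALUES (?, ?, ?, ?)",
                [(101, 1, True, 200000)])
# Employees who have applied for a loan: join on the shared ID column
rows = cur.execute("""
    SELECT e.ID, e.Name, e.Age, e.Salary
    FROM Employee_table e JOIN Loan_table l ON e.ID = l.ID
    WHERE l.Loan_applied
""").fetchall()
print(rows)  # [(1, 'Asha', 30, 50000)]
```

The same SELECT ... JOIN ... ON query runs unchanged in the Hive shell once the tables exist there.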
8 Write a query to create a table which stores the records of employees working in the same
department together in the same sub-directory in HDFS. The schema for the table is given
below: Emp_table {id, name, dept, yoj}
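In Hive this is solved by declaring the table PARTITIONED BY (dept STRING), which makes Hive store each department's rows in its own HDFS sub-directory (e.g. .../emp_table/dept=HR/). That on-disk layout can be mimicked on a local filesystem to see what partitioning produces; the directory names and sample records below are illustrative:

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path

records = [  # Emp_table: {id, name, dept, yoj}
    (1, "Asha", "HR", 2019),
    (2, "Ravi", "IT", 2020),
    (3, "Meena", "HR", 2021),
]

warehouse = Path(tempfile.mkdtemp()) / "emp_table"
by_dept = defaultdict(list)
for rec in records:
    by_dept[rec[2]].append(rec)   # group rows by the partition column (dept)

# One sub-directory per partition value, mirroring Hive's HDFS layout
for dept, rows in by_dept.items():
    part_dir = warehouse / f"dept={dept}"
    part_dir.mkdir(parents=True)
    with open(part_dir / "part-00000", "w", newline="") as f:
        csv.writer(f).writerows(rows)

print(sorted(p.name for p in warehouse.iterdir()))  # ['dept=HR', 'dept=IT']
```

Because each department lives in its own directory, a query filtering on dept can skip every other partition entirely; this partition pruning is the point of the exercise.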
9 Given the following table schemas:
Customer table: {ID, NAME, AGE, ADDRESS, SALARY}
Order table: {OID, DATE, CUSTOMER_ID, AMOUNT}
Create the above tables in Hive and insert transaction records into them, then
write an SQL query to find the details of customers who have made an order.
10 Understanding Spark
