Big Data Analytics
COURSE OUTCOMES:
Upon successful completion of the course, the student will be able to:
1. Compare various file systems and use an appropriate file system for storing different types of
data.
2. Demonstrate the concepts of the Hadoop ecosystem for storing and processing unstructured
data.
3. Apply the knowledge of programming to process the stored data using Hadoop tools and
generate reports.
4. Connect to web data sources for data gathering and integrate data sources with Hadoop
components to process streaming data.
5. Tabulate and examine the results generated using Hadoop components.
UNIT-I
INTRODUCTION TO BIG DATA: Data and its importance, Big Data - definition, implications of Big
Data, addressing Big Data implications using Hadoop, Hadoop Ecosystem.
HADOOP ARCHITECTURE:
Hadoop Storage: HDFS
Hadoop Processing: MapReduce Framework
Hadoop Server Roles: NameNode, Secondary NameNode and DataNode, JobTracker,
TaskTracker
HDFS - HADOOP DISTRIBUTED FILE SYSTEM: Design of HDFS, HDFS Concepts, HDFS Daemons,
HDFS High Availability, Block Abstraction, FUSE (Filesystem in Userspace), HDFS Command Line
Interface (CLI), Concept of File Reading and Writing in HDFS.
UNIT-II
UNIT-III
INTRODUCTION TO PIG: Understanding Pig and the Pig platform, introduction to the Pig Latin language and
execution engine, running Pig in different modes, Pig Grunt Shell and its usage.
PIG LATIN LANGUAGE – SEMANTICS – DATA TYPES IN PIG: Pig Latin basics, keywords, Pig data
types, understanding Pig relation, bag and tuple, writing Pig relations or statements using Grunt Shell,
expressions, data processing operators, using built-in functions.
WRITING PIG SCRIPTS USING PIG LATIN: Writing Pig scripts and saving them in a text editor, running Pig
scripts from the command line.
UNIT-IV
INTRODUCTION TO HIVE: Understanding Hive Shell, running Hive, understanding schema on read
and schema on write.
HIVEQL DATA TYPES, SEMANTICS: Introduction to HiveQL (Hive Query Language), language semantics,
Hive data types.
HIVE DDL, DML AND HIVE SCRIPTS: Hive statements, understanding and working with Hive Data
Definition Language (DDL) and Data Manipulation Language (DML) statements, creating Hive scripts and running them
from the Hive terminal and the command line.
UNIT-V
SQOOP: Introduction to the Sqoop tool, commands to connect to databases and list databases and tables,
command to import data from an RDBMS into HDFS, command to export data from HDFS into the required
tables of an RDBMS.
FLUME: Introduction to the Flume agent, understanding Flume components: Source, Channel and Sink.
OOZIE: Introduction to Oozie, understanding workflow management.
TEXT BOOKS:
REFERENCE BOOKS:
1. Michael Berthold, David J. Hand, "Intelligent Data Analysis", Springer, 2007.
2. Paul Zikopoulos, Dirk DeRoos, Krishnan Parasuraman, Thomas Deutsch, James Giles, David
Corrigan, "Harness the Power of Big Data: The IBM Big Data Platform", Tata McGraw Hill
Publications, 2012.