Big Data (Hadoop)
Volume
Velocity: at high speed
Variety: in various formats
Hadoop:
Hadoop deals with:
Storage
Processing
Key Features Of Hadoop:
Open Source
Distributed Technology
Batch Processing
Fault tolerance
Replication
Scalability
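Replication is what makes fault tolerance possible: a block stored on several nodes stays readable when one of them fails. The following is a minimal sketch of that idea, not the HDFS placement policy. The class name, the round-robin placement, and the node names are all hypothetical, and a replication factor of 3 is assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of block replication. Hypothetical names; NOT the real HDFS placement policy.
class ReplicationSketch {

    // Picks `replication` nodes for a block, round-robin starting at the block index.
    // The replicas are distinct as long as replication <= nodes.size().
    static List<String> placeBlock(int blockIndex, List<String> nodes, int replication) {
        List<String> replicas = new ArrayList<>();
        for (int i = 0; i < replication; i++) {
            replicas.add(nodes.get((blockIndex + i) % nodes.size()));
        }
        return replicas;
    }

    // A block is still readable if at least one replica sits on a live node.
    static boolean isReadable(List<String> replicas, Set<String> deadNodes) {
        for (String node : replicas) {
            if (!deadNodes.contains(node)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node1", "node2", "node3", "node4");
        List<String> replicas = placeBlock(0, nodes, 3); // assumed replication factor of 3
        Set<String> dead = new HashSet<>(Arrays.asList("node1", "node2"));
        System.out.println("replicas=" + replicas + " readable=" + isReadable(replicas, dead));
    }
}
```

With two of the three replica nodes down, the block is still readable from the third copy; only losing every replica makes it unavailable.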
Commodity Hardware for Hadoop:
Inexpensive hardware.
Hadoop Distributions:
Cloudera
MapR
Hortonworks
Apache
Hadoop Cluster Nodes:
NODE
HDFS: Storage
MAPREDUCE: Process
/home/hadoop/conf/hdfs-site.xml
Hadoop Architecture:
Components of Hadoop Architecture:
Name Node
Data Node
Job Tracker
Task Tracker
Diagrammatic representation:
[Diagram: at the HDFS layer, the Name Node coordinates the Data Nodes; at the MapReduce layer, the Job Tracker coordinates the Task Trackers. Each slave node runs a Data Node and a Task Tracker.]
Name Node:
Job Tracker:
Assign Tasks
Schedule Tasks
Re-schedule Tasks
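Assigning, scheduling, and re-scheduling tasks is the coordination work the Job Tracker does for its Task Trackers in classic MapReduce. The following is a rough in-memory sketch of those three duties. The class, the round-robin policy, and the tracker and task names are hypothetical and do not reflect Hadoop's actual scheduler.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy job tracker. Hypothetical names and policy; not Hadoop's API or scheduler.
class JobTrackerSketch {
    private final List<String> trackers;
    private final Map<String, String> assignment = new LinkedHashMap<>();
    private int next = 0;

    JobTrackerSketch(List<String> trackers) {
        this.trackers = new ArrayList<>(trackers);
    }

    // Assign and schedule: hand each task to the next task tracker, round-robin.
    void schedule(List<String> tasks) {
        for (String task : tasks) {
            assignment.put(task, trackers.get(next % trackers.size()));
            next++;
        }
    }

    // Re-schedule: when a tracker fails, move its tasks to the remaining trackers.
    void trackerFailed(String failed) {
        trackers.remove(failed);
        for (Map.Entry<String, String> entry : assignment.entrySet()) {
            if (entry.getValue().equals(failed)) {
                if (trackers.isEmpty()) {
                    throw new IllegalStateException("no task trackers left");
                }
                entry.setValue(trackers.get(next % trackers.size()));
                next++;
            }
        }
    }

    Map<String, String> assignments() {
        return assignment;
    }

    public static void main(String[] args) {
        JobTrackerSketch jobTracker = new JobTrackerSketch(Arrays.asList("tt1", "tt2", "tt3"));
        jobTracker.schedule(Arrays.asList("map-0", "map-1", "map-2", "reduce-0"));
        jobTracker.trackerFailed("tt2"); // map-1 is moved to a surviving tracker
        System.out.println(jobTracker.assignments());
    }
}
```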
Task Tracker:
Hadoop Ecosystem:
HDFS
MapReduce
Hive
Pig
Sqoop
HBase
Oozie
Flume
Mahout
Impala
YARN
HDFS:
HDFS
MR
LFS
Commodity Hardware.
High Latency.
Job Tracker
Task Tracker
Phases in MapReduce:
MAPPER Phase
REDUCER Phase
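The two phases can be illustrated with the classic word count. This is a minimal in-memory imitation that does not use the Hadoop API; the class and method names are hypothetical. The mapper emits (word, 1) pairs, the framework groups them by key, and the reducer sums the values for each key.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory imitation of the two MapReduce phases. Hypothetical names; no Hadoop API.
class MapReducePhasesSketch {

    // MAPPER phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> mapPhase(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // REDUCER phase: receive all values for one key and sum them.
    static int reducePhase(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        // Between the phases, mapper output is grouped by key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : mapPhase(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reducePhase(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {big=2, data=1, hadoop=1}
        System.out.println(wordCount(Arrays.asList("big data", "big hadoop")));
    }
}
```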
FileInputFormat (FIF)
FileOutputFormat (FOF)
TextInputFormat (TIF)
TextOutputFormat (TOF)
KeyValueTextInputFormat (KVTIF)
NLineInputFormat (NLINE)
DBInputFormat (DBIF)
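The input formats differ mainly in the key/value records they hand to the mapper. The following sketch imitates three of them in plain Java, without the Hadoop classes. The class and method names are hypothetical, and newline-terminated text with a tab separator is assumed. TextInputFormat gives the byte offset of the line as key and the line as value; KeyValueTextInputFormat splits each line at the first tab; NLineInputFormat gives each mapper a fixed number of lines.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Imitates the record shapes of three input formats. Hypothetical names; no Hadoop API.
class InputFormatSketch {

    // TextInputFormat style: key = byte offset of the line start, value = the line.
    static List<Map.Entry<Long, String>> textRecords(String content) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        for (String line : content.split("\n")) {
            records.add(new SimpleEntry<>(Long.valueOf(offset), line));
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
        }
        return records;
    }

    // KeyValueTextInputFormat style: split at the first tab.
    // A line without a tab becomes the key, with an empty value.
    static Map.Entry<String, String> keyValueRecord(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, tab), line.substring(tab + 1));
    }

    // NLineInputFormat style: every split (one per mapper) holds at most n lines.
    static List<List<String>> nLineSplits(List<String> lines, int n) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            splits.add(new ArrayList<>(lines.subList(i, Math.min(i + n, lines.size()))));
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(textRecords("hello\nworld"));
        System.out.println(keyValueRecord("id1\tAlice"));
        System.out.println(nLineSplits(Arrays.asList("a", "b", "c"), 2));
    }
}
```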
Combiner:
Jobobj.setCombinerClass(<<CombinerClassName.class>>);
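A combiner pre-aggregates each mapper's output locally, so fewer records travel to the reducer while the final result stays the same. The sketch below shows that effect for word count in plain Java, without the Hadoop API. The class and method names are hypothetical, and it assumes a sum, which is safe to combine because addition is associative and commutative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Shows the effect of a combiner on word count. Hypothetical names; no Hadoop API.
class CombinerSketch {

    // Combiner: sum one mapper's (word, 1) pairs locally, before the shuffle.
    static Map<String, Integer> combine(List<String> mapperWords) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String word : mapperWords) {
            partial.merge(word, 1, Integer::sum);
        }
        return partial;
    }

    // Reducer: merge the partial sums coming from all mappers.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            for (Map.Entry<String, Integer> entry : partial.entrySet()) {
                total.merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> mapper1 = Arrays.asList("a", "a", "b");
        List<String> mapper2 = Arrays.asList("a", "c");

        List<Map<String, Integer>> partials = new ArrayList<>();
        partials.add(combine(mapper1));
        partials.add(combine(mapper2));

        int withoutCombiner = mapper1.size() + mapper2.size();
        int withCombiner = partials.get(0).size() + partials.get(1).size();
        System.out.println("records shuffled: " + withoutCombiner + " -> " + withCombiner);
        System.out.println("totals: " + reduce(partials));
    }
}
```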
PIG:
Local Mode:
Input from LFS, output to LFS
MapReduce Mode:
Input from HDFS, output to HDFS
Different flavours of PIG Execution:
Grunt Shell
Script Mode
Embedded Mode
HIVE:
External tables
SQOOP:
HBase is built on top of HDFS and is used for performing real-time
random reads/writes.
OOZIE is meant for creating workflows and scheduling them, i.e. it is the
job scheduling tool in Hadoop.
Flume is for collecting live streaming data and distributing that
data across HDFS paths.