YARN
NodeManager:
The NodeManager runs on each worker node. It monitors the resource usage
(memory, CPU, disk, network, etc.) of the containers on that node and reports it
to the ResourceManager.
Example container: Container_1 with 1 GB of memory and 6 CPU cores.
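The bookkeeping a NodeManager does can be pictured with a small toy model. This
is only a sketch in plain Python, not the YARN API: the class names
ToyNodeManager and Container, the 8 GB / 8 core node size and Container_2 are
all assumptions made up for illustration. Only the Container_1 figures (1 GB,
6 CPU) come from the notes above.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """A hypothetical container: a named slice of memory and CPU on one node."""
    name: str
    memory_gb: int
    vcores: int


@dataclass
class ToyNodeManager:
    """Toy stand-in for a NodeManager; this is not the real YARN class."""
    total_memory_gb: int
    total_vcores: int
    containers: list = field(default_factory=list)

    def used(self):
        """Return (memory_gb, vcores) currently held by running containers."""
        return (
            sum(c.memory_gb for c in self.containers),
            sum(c.vcores for c in self.containers),
        )

    def launch(self, container):
        """Start the container only if it fits in the remaining capacity."""
        used_mem, used_cores = self.used()
        fits = (
            used_mem + container.memory_gb <= self.total_memory_gb
            and used_cores + container.vcores <= self.total_vcores
        )
        if fits:
            self.containers.append(container)
        return fits

    def report(self):
        """Summarize (used, total) per resource, as a node might report it."""
        used_mem, used_cores = self.used()
        return {
            "memory_gb": (used_mem, self.total_memory_gb),
            "vcores": (used_cores, self.total_vcores),
        }


# Assumed node size: 8 GB of memory and 8 cores.
node = ToyNodeManager(total_memory_gb=8, total_vcores=8)
accepted = node.launch(Container("Container_1", memory_gb=1, vcores=6))
# Container_2 is hypothetical; it asks for more cores than remain.
rejected = node.launch(Container("Container_2", memory_gb=1, vcores=4))
print(accepted, rejected, node.report())
```

In this toy run the second request is refused because only 2 of the 8 cores
remain free, which is the kind of per-node accounting the monitoring makes
possible.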
During application execution, the client that submitted the job communicates
directly with the ApplicationMaster to get status and progress updates.
Once the application has been processed completely, the ApplicationMaster
deregisters from the ResourceManager and shuts down, allowing its own container
to be repurposed.
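The end of the application lifecycle described above can be sketched as a toy
simulation. It is plain Python and does not use any Hadoop library: the classes
ToyResourceManager and ToyApplicationMaster, the state names, and the
identifiers app_1 and Container_0 are hypothetical and exist only to show the
order of events.

```python
class ToyResourceManager:
    """Toy registry of running ApplicationMasters (hypothetical, not YARN)."""

    def __init__(self):
        self.registered = {}       # app_id -> container hosting its master
        self.free_containers = []  # containers released for reuse

    def register(self, app_id, container):
        self.registered[app_id] = container

    def deregister(self, app_id):
        # Releasing the master's own container makes it available again.
        self.free_containers.append(self.registered.pop(app_id))


class ToyApplicationMaster:
    """Toy ApplicationMaster that tracks task progress for one application."""

    def __init__(self, app_id, container, rm, total_tasks):
        self.app_id = app_id
        self.rm = rm
        self.total_tasks = total_tasks
        self.done_tasks = 0
        self.state = "RUNNING"
        rm.register(app_id, container)

    def run_task(self):
        """Complete one task; deregister and shut down after the last one."""
        if self.state != "RUNNING":
            return
        self.done_tasks += 1
        if self.done_tasks == self.total_tasks:
            self.state = "FINISHED"
            self.rm.deregister(self.app_id)

    def status(self):
        """What the client asks the master for directly: state and progress."""
        return self.state, self.done_tasks / self.total_tasks


rm = ToyResourceManager()
am = ToyApplicationMaster("app_1", "Container_0", rm, total_tasks=2)

history = [am.status()]      # client polls before any task has run
am.run_task()
history.append(am.status())  # client polls mid-run
am.run_task()
history.append(am.status())  # client polls after completion
print(history, rm.free_containers)
```

Note that the client in this sketch calls status() on the master itself rather
than on the resource manager, mirroring the direct client-to-ApplicationMaster
communication in the notes.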
HDFS: The Hadoop Distributed File System. It simply stores data files as close
to their original form as possible.
HBase: It is Hadoop’s distributed, column-oriented database. It supports
structured data storage for large tables.
Hive: It is Hadoop’s data warehouse. It enables analysis of large data sets
using a language very similar to SQL, so one can query data stored in a Hadoop
cluster by using Hive.
Pig: Pig is an easy-to-understand data flow language. It helps with the analysis
of large data sets, a routine task with Hadoop, without writing code in the
MapReduce paradigm.
Sqoop: It is used to transfer bulk data between Hadoop and structured data
stores such as relational databases.
Ambari: It is a web-based tool for provisioning, managing and monitoring Apache
Hadoop clusters.
Hadoop Common: It is the set of common utilities and libraries that support the
other Hadoop modules. Like the rest of the framework, it is built on the
assumption that hardware failures are common and should be handled automatically
by the Hadoop cluster.
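To make concrete what "writing code in the MapReduce paradigm" involves, and so
what tools like Pig and Hive spare the user, here is a minimal in-memory word
count. It is a sketch in plain Python that runs on one machine and uses no
Hadoop API. The function names map_phase, shuffle and reduce_phase and the two
sample lines are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["Hadoop stores data", "Hive queries data"])
print(counts)
```

In a real cluster the map and reduce steps would run in parallel across many
nodes. A higher-level language expresses the same grouping and counting in a
few declarative statements instead of three hand-written functions.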