Discuss Mesos and Yarn and The Relative Placement of The Two Respectively
Group Members
Dikshika Arya 19PT1-07
Jigyasa Monga 19PT1-12
Pankhuri Bhatnagar 19PT1-18
Question 1:
Discuss Mesos and Yarn and the relative placement of the two respectively.
MESOS
● open source cluster manager that handles workloads in a distributed environment
through dynamic resource sharing and isolation
● suited for the deployment and management of applications in large-scale clustered
environments
● Isolates the resources used by processes running in a cluster, such as memory, CPU,
file system, rack locality and I/O, to keep the processes from interfering with each
other. Such isolation allows Mesos to create a single, large pool of resources to offer
workloads
● brings together the existing resources of the machines/nodes in a cluster into a single
pool that a variety of workloads may draw on
● Also known as node abstraction, this removes the need to allocate specific machines for
different workloads
● Mesos also utilizes Apache ZooKeeper, originally a Hadoop subproject, to synchronize
distributed processes so that all clients receive consistent data, and to provide fault
tolerance
● Applications run on Mesos as frameworks. Each framework consists of at least two crucial
components: a scheduler and an executor. Schedulers register with the Mesos master to get
resources, and executors launch the command or program that runs tasks on the slaves
● The master offers resources to each framework, but it is the framework’s scheduler that
chooses which of those available resources to use. After a framework accepts the
resources offered by the master, it sends a description of the tasks back to the master.
The master then sends these tasks to the slave, and the executor on the slave launches
the tasks
● Mesos sits between the operating system and the application layer and acts as a
data center kernel
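The resource-offer cycle described in the bullets above can be sketched as a small simulation. This is only an illustrative toy, not the Mesos API: the class names (Offer, Task, Scheduler, Executor, Master), the method names and the resource figures are all hypothetical, and the sketch assumes a single framework and one offer per slave.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """Free resources on one slave (hypothetical structure, not the Mesos API)."""
    slave: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int


class Scheduler:
    """Framework scheduler: chooses which of the offered resources to use."""

    def __init__(self, pending):
        self.pending = list(pending)

    def resource_offers(self, offers):
        """Return {slave: [tasks]} describing the tasks to run on accepted offers."""
        accepted = {}
        for offer in offers:
            cpus, mem = offer.cpus, offer.mem_mb
            still_pending = []  # tasks that do not fit wait for a later offer
            for task in self.pending:
                if task.cpus <= cpus and task.mem_mb <= mem:
                    accepted.setdefault(offer.slave, []).append(task)
                    cpus -= task.cpus
                    mem -= task.mem_mb
                else:
                    still_pending.append(task)
            self.pending = still_pending
        return accepted


class Executor:
    """Runs on a slave and launches the tasks it is handed."""

    def __init__(self, slave):
        self.slave = slave
        self.launched = []

    def launch(self, task):
        self.launched.append(task.name)


class Master:
    """Offers slave resources to a framework and forwards the accepted tasks."""

    def __init__(self, offers):
        self.offers = offers
        self.executors = {o.slave: Executor(o.slave) for o in offers}

    def run_offer_cycle(self, scheduler):
        # 1. The master offers resources; 2. the scheduler picks and describes tasks.
        accepted = scheduler.resource_offers(self.offers)
        # 3. The master sends the tasks to each slave, where the executor launches them.
        for slave, tasks in accepted.items():
            for task in tasks:
                self.executors[slave].launch(task)
        return {slave: ex.launched for slave, ex in self.executors.items()}


master = Master([Offer("slave-1", 4.0, 8192), Offer("slave-2", 2.0, 4096)])
scheduler = Scheduler([
    Task("etl", 3.0, 4096),
    Task("web", 2.0, 2048),
    Task("big", 8.0, 1024),
])
placement = master.run_offer_cycle(scheduler)
print(placement)
print([t.name for t in scheduler.pending])
```

In this sketch the master never decides where a task goes; it only offers resources and forwards what the scheduler accepted. A task that fits no offer (here "big") simply stays pending on the framework side.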
YARN (Yet Another Resource Negotiator)
● In 2012, the Hadoop architecture was upgraded to YARN, which provided a general purpose
data processing framework
● This framework supports not just the MapReduce model but also newer data processing
frameworks
● In YARN, data processing is separated from the resource management and scheduling
components of MapReduce
● Helps in efficiently running interactive queries and streaming applications, and
supports a broader range of applications
Source: https://www.oreilly.com/content/a-tale-of-two-clusters-mesos-and-yarn/
Between YARN and Mesos, YARN is specifically designed for Hadoop workloads whereas Mesos
is designed for all kinds of workloads. YARN is an application-level scheduler and Mesos is
an OS-level scheduler. It is better to use YARN if we already run a Hadoop cluster.
Question 2:
Discuss Apache Tez and its utility. How does it fit into the Hadoop logical stack?
● The Apache Tez project is aimed at building an application framework which allows for a
complex directed acyclic graph (DAG) of tasks for processing data. It is currently built
on top of Apache Hadoop YARN, the resource management framework.
● It is a distributed parallel execution framework which is targeted towards data
processing applications.
● It is based on expressing a computation as a data flow graph.
● It negotiates resources from the Hadoop framework.
● It supports fault tolerance and recovery.
● It also supports horizontal scalability and resource elasticity.
● It has a shared library of ready-to-use components.
● It is highly customizable to meet a broad range of use cases.
Tez takes care of the hard problems of running in a distributed Hadoop environment, so
that applications can focus on solving their domain-specific problems.
Tez is built on top of YARN, which is the new resource-management framework for
Hadoop.
Tez generalizes the MapReduce paradigm to a more powerful framework based on
expressing computations as a dataflow graph.
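The idea of expressing a computation as a dataflow graph can be illustrated with a minimal sketch. This is not the Tez API: the function run_dag, the vertex names and the sample data are hypothetical, and everything runs in a single local process, whereas Tez would run the vertices as distributed tasks on YARN. The sketch assumes Python 3.9 or later for the standard-library graphlib module.

```python
from graphlib import TopologicalSorter


def run_dag(vertices, edges):
    """Run every vertex after all of its upstream vertices have finished.

    vertices: {name: function(list_of_upstream_outputs) -> output}
    edges:    {name: [upstream vertex names]}; vertices without an entry are sources
    """
    graph = {name: edges.get(name, []) for name in vertices}
    outputs = {}
    # static_order() yields upstream vertices before the vertices that consume them
    # and raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(graph).static_order():
        inputs = [outputs[up] for up in edges.get(name, [])]
        outputs[name] = vertices[name](inputs)
    return outputs


def join_users_orders(inputs):
    users, orders = inputs
    return [(users[user_id], amount) for user_id, amount in orders]


def total_per_user(inputs):
    totals = {}
    for user, amount in inputs[0]:
        totals[user] = totals.get(user, 0) + amount
    return totals


# Two sources feed a join, whose output feeds an aggregation.
vertices = {
    "read_users": lambda _: {1: "ann", 2: "bob"},
    "read_orders": lambda _: [(1, 30), (2, 5), (1, 12)],
    "join": join_users_orders,
    "total_per_user": total_per_user,
}
edges = {
    "join": ["read_users", "read_orders"],
    "total_per_user": ["join"],
}

results = run_dag(vertices, edges)
print(results["total_per_user"])
```

With plain MapReduce, a multi-stage pipeline like this is typically written as a chain of separate jobs; a DAG lets the whole flow be described and scheduled as one unit.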
Tez is not meant directly for end-users; rather, it enables developers to build end-user
applications with much better performance and flexibility.
Tez is highly customizable to meet a broad spectrum of use cases, and tools such as Hive
and Pig show a significant improvement in response time when they use Tez instead of
MapReduce for data processing.
Tez: Simple Deployment –
Tez is entirely a client-side application that leverages YARN local resources and the
distributed cache. Usually nothing needs to be deployed on the cluster for Tez. It is
enough to upload the relevant Tez libraries to HDFS and then use the Tez client to
submit jobs with those libraries.
Working of Tez:
Tez API