DATA228 Lecture Notes Week 5

The document discusses distributed computing, focusing on Hadoop's YARN as a general-purpose distributed compute system. It outlines the elements and characteristics of distributed compute systems, including task scheduling, resource management, and the differences between stateless and stateful applications. Additionally, it describes YARN's architecture, including its ResourceManager and NodeManager, and the various scheduling strategies it employs to optimize resource usage and application performance.

DATA 228

Big Data Technologies and Applications (Fall 2024)

Sangjin Lee
Hadoop: YARN, Hadoop’s distributed compute

Chapter 4, “Hadoop: The Definitive Guide”, 4th Edition, Tom White
What is distributed compute/computing?

A distributed compute system is a system which solves a computing problem utilizing a set of multiple devices over a network.
Distributed compute

• Sometimes referred to as a Scheduler or an Orchestrator
Elements of distributed compute
Fundamentals

• Run tasks in parallel across multiple machines

• Schedule tasks in an efficient and fair manner

• Monitor and account for resource usage across multiple machines

• Scale horizontally by adding more machines to the cluster

• Recover from all manner of failures: node failures, task failures, network failures, etc.
Elements of distributed compute
More advanced elements

• Support the notion of an “application”

• Support both stateless and stateful types of applications

• (Big-data-specific) Schedule tasks to be as local to data as possible

• (Stateless-app-specific) Provide support for traffic ingress for applications
Elements of distributed compute
Stateless and stateful

                Stateless                            Stateful

Coordination    No coordination required             Coordination required among workloads

Duration        Tends to run long (long-running)     Tends to complete its job and shut down

State           Tasks don’t need to maintain state   State is critical to tasks

Examples        Web services, microservices          Big data jobs, batch jobs, Cassandra, AI/ML jobs requiring GPUs
Examples of distributed compute systems

• Not quite but pointing the way: servlet containers (Tomcat, Jetty), application servers

• Mesos

• Kubernetes

  • Orchestrates container workloads

  • The leader in stateless distributed compute systems

  • Supports more complex application types (stateful, GPU, etc.)

  • Spawned an ecosystem of supporting technologies: Helm, containerd, Istio, Envoy, etc. (in CNCF)

  • Old competition: Mesos
Examples of distributed compute systems

• YARN

  • Orchestrates mostly big-data or data-related workloads

  • Can support other types of applications

  • Data-aware scheduling
Distributed compute in Hadoop
Characteristics

• Accomplish data processing over large data (TBs or PBs) within a short amount of time

• Use resources (memory, CPUs, network, and I/O) efficiently to do so

• Handle data-aware scheduling

• Handle task scheduling in bursts
YARN
Basics

• Hadoop’s general-purpose distributed compute system

• It is NOT a data computing framework itself

• Data computing frameworks (MapReduce, Spark, Tez, etc.) are YARN applications

• Data practitioners don’t interact with YARN directly for the most part

• Supports applications and containers (tasks)
YARN
Architecture

• ResourceManager and NodeManagers
YARN
ResourceManager

• “One” for a single cluster

• Processes resource requests

• Resource requests specify:

  • Amount of compute resources being requested: virtual cores (vCPUs), memory (MBs), GPUs, etc.

  • Locality constraints: a specific node, a specific rack, or off-rack (= anywhere)

• ResourceManager schedules containers based on resource requests

• ResourceManager tries to schedule based on the locality constraints (see the sketch below)
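A minimal sketch (not from the lecture) of how an application would express such a resource request through Hadoop's AMRMClient API; the node and rack names are hypothetical, and AM registration with the RM is omitted for brevity:

  import org.apache.hadoop.yarn.api.records.Priority;
  import org.apache.hadoop.yarn.api.records.Resource;
  import org.apache.hadoop.yarn.client.api.AMRMClient;
  import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
  import org.apache.hadoop.yarn.conf.YarnConfiguration;

  public class ResourceRequestSketch {
    public static void main(String[] args) {
      AMRMClient<ContainerRequest> amrmClient = AMRMClient.createAMRMClient();
      amrmClient.init(new YarnConfiguration());
      amrmClient.start();

      // Amount of compute resources: 1024 MB of memory and 1 virtual core
      Resource capability = Resource.newInstance(1024, 1);

      // Locality constraints: preferred nodes and racks (hypothetical names).
      // Passing null for both means off-rack, i.e. anywhere, is acceptable.
      String[] nodes = { "worker-1.example.com" };
      String[] racks = { "/rack-a" };

      // The RM will try to honor the locality constraints when it schedules
      // a container for this request
      amrmClient.addContainerRequest(
          new ContainerRequest(capability, nodes, racks, Priority.newInstance(1)));
    }
  }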
YARN
ResourceManager

• Maintains the state of available and allocated resources on nodes

• Maintains the state of all running applications and containers

• ResourceManager requires a large amount of memory

• ResourceManager can be a scalability bottleneck
YARN
ResourceManager high availability (HA)

• Active and standby ResourceManagers

• ResourceManager state is persisted in an RMStateStore (filesystems, ZooKeeper, etc.)

• Manual and automatic failover

• HA can recover all running applications during a failover (see the configuration sketch below)
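A sketch of the relevant yarn-site.xml settings for RM HA with a ZooKeeper-based state store; the hostnames and ZooKeeper quorum below are hypothetical, while the property names are standard YARN HA configuration:

  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>rm1.example.com</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>rm2.example.com</value>
  </property>
  <property>
    <!-- persist RM state in ZooKeeper so the standby can recover applications -->
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>zk1.example.com:2181,zk2.example.com:2181</value>
  </property>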
YARN
NodeManager

• Responsible for launching and managing containers

• Runs health checks on the node it runs on and communicates the state to the RM

• Reports resource usage status to the RM

• NodeManager can be restarted without affecting running containers (requires NM recovery; see the configuration sketch below)
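A sketch of the yarn-site.xml settings that enable this work-preserving NodeManager restart; the recovery directory path is hypothetical:

  <property>
    <name>yarn.nodemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <!-- local directory where the NM saves state so it can reconnect to
         running containers after a restart -->
    <name>yarn.nodemanager.recovery.dir</name>
    <value>/var/hadoop/yarn/nm-recovery</value>
  </property>
  <property>
    <!-- a fixed RPC port (instead of an ephemeral one) so a restarted NM
         keeps the same address -->
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:45454</value>
  </property>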
YARN
Application taxonomy

• Application

  • A single entity that represents the distributed compute job as a whole

• Containers

  • Individual compute tasks as part of the application

  • Not the same as the (Docker/Kubernetes) container

• Application master (AM)

  • A special container that manages the other containers for the application
YARN
YARN application

• Client requests an Application Master (AM) from the ResourceManager (RM)

• RM selects a node to launch the AM container

• AM may request more containers from the RM (see the sketch below)
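A minimal sketch (not the lecture's code) of the client side of this flow, using Hadoop's YarnClient API; the application name and AM launch command are hypothetical:

  import java.util.Collections;
  import org.apache.hadoop.yarn.api.records.ApplicationId;
  import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
  import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
  import org.apache.hadoop.yarn.api.records.Resource;
  import org.apache.hadoop.yarn.client.api.YarnClient;
  import org.apache.hadoop.yarn.client.api.YarnClientApplication;
  import org.apache.hadoop.yarn.conf.YarnConfiguration;

  public class SubmitAppSketch {
    public static void main(String[] args) throws Exception {
      YarnClient yarnClient = YarnClient.createYarnClient();
      yarnClient.init(new YarnConfiguration());
      yarnClient.start();

      // Step 1: the client asks the RM for a new application
      YarnClientApplication app = yarnClient.createApplication();
      ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
      ctx.setApplicationName("demo-app");

      // Resources for the AM container (2 GB, 1 vCore, as in the demo later)
      ctx.setResource(Resource.newInstance(2048, 1));

      // Command that launches the AM process (hypothetical script)
      ContainerLaunchContext amSpec = ContainerLaunchContext.newInstance(
          Collections.emptyMap(), Collections.emptyMap(),
          Collections.singletonList("/opt/demo/bin/my-app-master"),
          null, null, null);
      ctx.setAMContainerSpec(amSpec);

      // Step 2: the RM selects a node and launches the AM container there;
      // step 3 (the AM requesting more containers) happens inside the AM
      ApplicationId appId = yarnClient.submitApplication(ctx);
      System.out.println("Submitted " + appId);
    }
  }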
YARN
Schedulers

• Scheduler: the ResourceManager component/algorithm that allocates containers based on a certain policy

• A scheduler tries to optimize resource usage (utilization) and timeliness of application completions (throughput)

• The central consideration is multi-tenancy

• YARN supports 3 schedulers: the FIFO (first-in-first-out) scheduler, the Capacity scheduler, and the Fair scheduler (selection shown below)
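Which scheduler the RM uses is controlled by a single yarn-site.xml property; a sketch with the standard class names:

  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <!-- Capacity scheduler (the default):
           org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
         Fair scheduler:
           org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
         FIFO scheduler:
           org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler -->
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>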
YARN
Schedulers

• FIFO scheduler

• Simplest scheduler

  • It’s a FIFO (first-in-first-out) queue

  • Applications run in the order of submission: the next application gets scheduled after the previous applications have completed

  • Not suitable for a multi-tenant cluster
YARN
Schedulers

• Capacity scheduler (default)

  • Capacity is partitioned into multiple dedicated queues (e.g. for teams, groups of apps, etc.)

  • Queues get resource guarantees even if other queues are contended

  • Improves overall throughput compared to FIFO

  • Provides teams with predictable capacity

  • May strand resources if utilization across queues is uneven

  • Queue-sizing becomes very important (see the configuration sketch below)
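A sketch of a capacity-scheduler.xml with two hypothetical queues, prod and dev, splitting the cluster 60/40, with dev allowed to borrow idle capacity up to 75% of the cluster:

  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>60</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>40</value>
  </property>
  <property>
    <!-- cap on how much idle capacity dev may borrow beyond its guarantee -->
    <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
    <value>75</value>
  </property>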
YARN
Schedulers

• Fair scheduler

  • Dynamically balances resources between all running apps so that all apps get an equal share of resources over time: “fair share”

  • Tries to balance between having all apps make good progress and letting large apps finish in a timely manner

  • Queues are, on paper, unnecessary, but they can be used to increase predictability (see the allocation-file sketch below)

  • It can be the best of both worlds in large multi-tenanted clusters

  • Some loss of predictability
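A sketch of a Fair scheduler allocation file (loaded via the yarn.scheduler.fair.allocation.file property) with two hypothetical queues; the weights shape each queue's fair share, and the placement rules fall back to dev when no queue is specified:

  <allocations>
    <queue name="prod">
      <weight>2.0</weight>
    </queue>
    <queue name="dev">
      <weight>1.0</weight>
    </queue>
    <queuePlacementPolicy>
      <!-- use the queue named at submission time, otherwise default to dev -->
      <rule name="specified" />
      <rule name="default" queue="dev" />
    </queuePlacementPolicy>
  </allocations>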
YARN
Demo

Exploring schedulers using a sleep app (job)
YARN
Demo: sleep job

• The jar for the sleep job (hadoop-mapreduce-client-jobclient-3.4.0-tests.jar) is available on Canvas

• Sleep job parameters

  • -m: number of mappers

  • -r: number of reducers

  • -mt: mapper sleep duration (ms)

  • -rt: reducer sleep duration (ms)

• bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-tests.jar sleep -m 2 -r 1 -mt 90000 -rt 90000
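To observe scheduler behavior, one way (not prescribed by the lecture) is to submit two sleep jobs back to back, backgrounding each so the second is submitted while the first is still running:

  bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-tests.jar \
      sleep -m 2 -r 1 -mt 90000 -rt 90000 &
  bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.4.0-tests.jar \
      sleep -m 2 -r 1 -mt 90000 -rt 90000 &

  # watch how the scheduler shares the cluster between the two applications
  bin/yarn application -list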
YARN
Demo: sleep job

• AM size: 2 GB memory and 1 vCore

• MR container size: 1 GB memory and 1 vCore each

• (My) YARN cluster size: 8 GB memory and 8 vCores
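A quick back-of-the-envelope check, assuming the sizes above: while its mappers are running, one sleep job holds 2 GB (AM) + 2 × 1 GB (mappers) = 4 GB of memory and 3 vCores, so the 8 GB / 8 vCore cluster can fit two such apps side by side (8 GB, 6 vCores) but not a third.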
YARN
Demo: sleep job: first app

YARN
Demo: sleep job: first app (mappers running)

YARN
Demo: sleep job: second app submitted

YARN
Demo: sleep job: second app running mappers

YARN
Demo: sleep job: second app completes running
