IN4392 Cloud Computing Cloud Programming Models
Cloud Computing (IN4392) D. .!. "pema and #. Iosup. $it% &ontri'utions (rom Claudio Martella and )ogdan *%it. 2012-2013
1
Parallel and Distri'uted *roup+,- Del(t
,erms (or ,oda./s Dis&ussion
Programming model = language + libraries + runtime system that create a model of computation (an abstract machine) = an abstraction of a computer system Wikipedia Examples message!passing "s shared memory# data! "s task!parallelism# $ #'stra&tion le0el % What is the best abstraction le"el& = distance from physical machine Examples Assembly lo*!le"el "s Java is high le"el +any design trade!offs performance# ease!of!use# common!task optimi,ation# programming paradigm# $
'(1'!'(1) '
C%ara&teristi&s o( a Cloud Programming Model
1')234-
.ost model (Efficiency) = cost/performance# o"erheads# $ 0calability 1ault!tolerance 0upport for specific ser"ices .ontrol model# e-g-# fine!grained many!task scheduling 5ata model# including partitioning and placement# out!of! memory data access# etc6- 0ynchroni,ation model
'(1'!'(1)
#genda
12. )237ntroduction Cloud Programming in Pra&ti&e (,%e Pro'lem) 8rogramming +odels for .ompute!7ntensi"e Workloads 8rogramming +odels for 9ig 5ata 0ummary
'(1'!'(1)
,oda./s C%allenges
e0cience :he 1ourth 8aradigm :he 5ata 5eluge and 9ig 5ata 8ossibly others
'(1'!'(1)
e1&ien&e2 ,%e $%.
0cience experiments already cost '3;3(< budget
$ and perhaps incur 63< of the delays
+illions of lines of code *ith similar functionality
=ittle code reuse across pro>ects and application domains $ but last t*o decades? science is "ery similar in structure
+ost results difficult to share and reuse
.ase!in!point 0loan 5igital 0ky 0ur"ey digital map of '3< of the sky x spectra 2(:9+ sky sur"ey data '((++ astro!ob>ects (images) 1++ ob>ects *ith spectrum (spectra) @o* to make it *ork for this and the next generation of scientists&
'(1'!'(1) 4
0ource Aim Bray and Clex 0,alay#e0cience !! C :ransformed 0cientific +ethod# http //research-microsoft-com/en!us/um/people/gray/talks/DE.!.0:9Fe0cience-ppt
e1&ien&e (!o%n ,a.lor3 -4 1&i.,e&%.3 1999)
# ne5 s&ienti(i& met%od
.ombine science *ith 7: 1ull scientific process control scientific instrument or produce data from simulations# gather and reduce data# analy,e and model results# "isuali,e results +ostly compute!intensi"e# e-g-# simulation of complex phenomena
I, support
7nfrastructure =@. Brid# Gpen 0cience Brid# 5C0# DorduBrid# $ 1rom programming models to infrastructure management tools
"6amples
'(1'!'(1)
% Why is .omp0ci an example here&
6
H physics# 9ioinformatics# +aterial science# Engineering# Comp1&i
,%e 7ourt% Paradigm2 ,%e $%. (#n #ne&dotal "6ample)
,%e 80er5%elming *ro5t% o( 4no5ledge
When 1' men founded the Dumber of 1II) 1II6 Eoyal 0ociety in 144(# it *as 8ublications 1II6 '((1 possible for an educated person to encompass all of scientific kno*ledge- J$K 7n the last 3( years# such has been the pace of scientific ad"ance that e"en the best scientists cannot keep up *ith disco"eries at frontiers outside their o*n field- :ony 9lair# 8+ 0peech# +ay '((' 5ata Ling#:he scientific impact of nations#Dature?(2-
,%e 7ourt% Paradigm2 ,%e $%at
7rom
.pot%esis to Data
2
:housand years ago science *as empiri&al describing natural phenomena =ast fe* hundred years t%eoreti&al branch using models# generali,ations
. a 4G c2 a = 3 2 a
=ast fe* decades a &omputational branch simulating complex phenomena :oday (t%e 7ourt% Paradigm) data e6ploration %1 What is the 1ourth 8aradigm&
unify theory# experiment# and simulation 5ata captured by instruments or generated by simulator %' What are the dangers 8rocessed by soft*are of the 1ourth 8aradigm& 7nformation/Lno*ledge stored in computer 0cientist analy,es results using data management and statistics
'(1'!'(1)
0ource Aim Bray and :he 1ourth 8aradigm# http //research-microsoft-com/en!us/collaboration/fourthparadigm/
,%e 9Data Deluge:2 ,%e $%.
ME"ery*here you look# the Nuantity of information in the *orld is soaring- Cccording to one estimate# mankind created 13( exabytes (billion gigabytes) of data in '((3- :his year# it *ill create 1#'(( exabytes- +erely keeping up *ith this flood# and storing the bits that might be useful# is difficult enoughCnalysing it# to spot patterns and extract useful information# is harder still- :he 5ata 5eluge# :he Economist# '3 1ebruary '(1(
1(
9Data Deluge:2 ,%e Personal Meme6 "6ample
Oanne"ar 9ush in the 1I2(s record your life +7: +edia =aboratory :he @uman 0peechome 8ro>ect/:otalEecall# data mining/analysis/"isio 5eb Eoy and Eupal 8atel record practically e"ery *aking moment of their son?s first three years ('(< pri"acy time$7s this e"en legal&P 0hould it be&P) 11x1+8/12fps cameras# 12x14b!2QL@, mics# 2-2:9 EC75 + tapes# 1( computersR '((k hours audio!"ideo 5ata si,e '((B9/day# 1-389 total
11
9Data Deluge:2 ,%e *aming #nal.ti&s "6ample
E% 77 '(:9/year all logs @alo) 1-289 ser"ed statistics on player logs
1'
9Data Deluge:2 Datasets in Comp.1&i.
5ataset 0i,e http //g*a-e*i-tudelft-nl :he 1ailure :race Crchi"e 1:9/yr 1:9 1((B9 1(B9 Bam:C 8'8:C
http //fta-inria-fr
8eer!to!8eer :race Crchi"e $ 8WC# 7:C# .ECW5C5# $
1B9 T(4 T(I T1( T11 Sear
1)
1#(((s of scientists 1rom theory to practice
1)
9Data Deluge:2 ,%e Pro(essional $orld *ets Conne&ted
1eb '(1'
'(1'!'(1) 12
0ource Oincen,o .osen,a# :he 0tate of =inked7n# http //"incos-it/the!state!of!linkedin/
$%at is 9)ig Data:;
Oery large# distributed aggregations of loosely structured data# often incomplete and inaccessible Easily exceeds the processing capacity of con"entional database systems 8rinciple of 9ig 5ata When you can# keep e"erythingP :oo big# too fast# and doesn?t comply *ith the traditional database architectures
'(11!'(1' 13
,%e ,%ree 9<:s o( )ig Data
Oolume
+ore data "s- better models 5ata gro*s exponentially Cnalysis in near!real time to extract "alue 0calable storage and distributed Nueries
Oelocity
0peed of the feedback loop Bain competiti"e ad"antage fast recommendations 7dentify fraud# predict customer churn faster
Oariety
:he data can become messy text# "ideo# audio# etc 5ifficult to integrate into applications
'(11!'(1' 14
Cdapted from 5oug =aney# )5 data management# +E:C Broup/Bartner report# 1eb '((1- http //blogs-gartner-com/doug!laney/files/'(1'/(1/adI2I!)5!5ata! +anagement!.ontrolling!5ata!Oolume!Oelocity!and!Oariety-pdf
Data $are%ouse 0s. )ig Data
'(11!'(1'
16
0ource http //*ikibon-org/
#genda
1- 7ntroduction '- .loud 8rogramming in 8ractice (:he 8roblem) 3. Programming Models (or Compute-Intensi0e $or=loads 1. )ags o( ,as=s '- Workflo*s )- 8arallel 8rogramming +odels 2- 8rogramming +odels for 9ig 5ata 3- 0ummary
'(1'!'(1) 1Q
$%at is a )ag o( ,as=s ()o,); # 1.stem <ie5
9o: = set of >obs sent by a user$
$that start at most Us after the first >ob
% What is the user?s "ie*& :ime JunitsK
Why 9ag of :asks& 1rom the perspecti"e of the user# >obs in set are >ust tasks of a larger >ob C single useful result from the complete 9o: Eesult can be combination of all tasks# or a selection of the results of most or e"en a single task
'(1'!'(1) Iosup et al., The Characteristics and Performance of Groups of Jobs in Grids, Euro-Par, LNCS, ol.!"!#, pp. $%&-$'$, '((6.
1I
#ppli&ations o( t%e )o, Programming Model
8arameter s*eeps
.omprehensi"e# possibly exhausti"e in"estigation of a model Oery useful in engineering and simulation!based science
+onte .arlo simulations
0imulation *ith random elements fixed time yet limited inaccuracy Oery useful in engineering and simulation!based science
+any other types of batch processing
8eriodic computation# .ycle sca"enging Oery useful to automate operations and reduce *aste
'(1'!'(1) '(
)o,s )e&ame t%e Dominant Programming Model (or *rid Computing
(V0) :eraBrid!' D.0C (V0) .ondor V-Wisc(EV) EBEE (.C)
[email protected]: (V0) Brid) (V0) B=GW (VL) EC= (DG#0E) DorduBrid (1E) BridW3((( (D=) 5C0!'
7rom >o's ?@A (
(V0) :eraBrid!' D.0C (V0) .ondor V-Wisc(EV) EBEE (.C)
[email protected]: (V0) Brid) (V0) B=GW (VL) EC= (DG#0E) DorduBrid (1E) BridW3((( (D=) 5C0!'
'(
2(
4(
Q(
1((
7rom CP-,ime ( ?@A
'(1'!'(1)
'(
2(
4(
Q( '1
1((
Iosup and Epema( Grid Computin) *or+loads. IEEE Internet Computin) #,-&.( #'-&" -&/##.
Pra&ti&al #ppli&ations o( t%e )o, Programming Model
Parameter 15eeps in Condor ?1+4A
0ue the scientist *ants to 1ind the "alue of 1(x#y#,) for 1( "alues for x and y# and 4 "alues for , 1olution Eun a parameter s5eep# *ith 1( x 1( x 4 = 4(( parameter "alues 8roblem of the solution
0ue runs one >ob (a combination of x# y# and ,) on her lo*!end machine- 7t takes 4 hours :hat?s 1B0 da.s uninterrupted computation on 0ue?s machineP
'(1'!'(1) ''
0ource .ondor :eam# .ondor Vser?s :utorialhttp //cs-u*isc-edu/condor
Pra&ti&al #ppli&ations o( t%e )o, Programming Model
Parameter 15eeps in Condor ?2+4A
Universe Executable Input Output Error Log = = = = = = vanilla sim.exe input.txt output.txt error.txt sim.log
Requirements = OpSys == WINNT61 && .omplex 0=Cs can be specified easily Arch == INTEL && (Disk >= DiskUsage) && ((Memory * 1024)>=ImageSize)
InitialDir = run_$(Process) Clso passed as Queue 600 parameter to sim.exe
'(1'!'(1) ')
0ource .ondor :eam# .ondor Vser?s :utorialhttp //cs-u*isc-edu/condor
Pra&ti&al #ppli&ations o( t%e )o, Programming Model
Parameter 15eeps in Condor ?3+4A
% condor_submit sim.submit Submitting job(s) ................................................ ................................................ ................................................ ................................................ ................................................ ............... Logging submit event(s) ................................................ ................................................ ................................................ ................................................ ................................................ ............... 600 job(s) submitted to cluster 3.
'(1'!'(1) '2
0ource .ondor :eam# .ondor Vser?s :utorialhttp //cs-u*isc-edu/condor
Pra&ti&al #ppli&ations o( t%e )o, Programming Model
Parameter 15eeps in Condor ?4+4A
% condor_q -- Submitter: x.cs.wisc.edu : <128.105.121.53:510> : x.cs.wisc.edu ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD 3.0 frieda 4/20 12:08 0+00:00:05 R 0 9.8 sim.exe 3.1 frieda 4/20 12:08 0+00:00:03 I 0 9.8 sim.exe 3.2 frieda 4/20 12:08 0+00:00:01 I 0 9.8 sim.exe 3.3 frieda 4/20 12:08 0+00:00:00 I 0 9.8 sim.exe ... 3.598 frieda 4/20 12:08 0+00:00:00 I 0 9.8 sim.exe 3.599 frieda 4/20 12:08 0+00:00:00 I 0 9.8 sim.exe 600 jobs; 599 idle, 1 running, 0 held
'(1'!'(1)
'3
0ource .ondor :eam# .ondor Vser?s :utorialhttp //cs-u*isc-edu/condor
#genda
1- 7ntroduction '- .loud 8rogramming in 8ractice (:he 8roblem) 3. Programming Models (or Compute-Intensi0e $or=loads 1- 9ags of :asks 2. $or=(lo5s )- 8arallel 8rogramming +odels 2- 8rogramming +odels for 9ig 5ata 3- 0ummary
'(1'!'(1) '4
$%at is a $o=(lo5;
W1 = set of >obs *ith precedence (think 5irect Ccyclic Braph)
'(1'!'(1)
'6
#ppli&ations o( t%e $or=(lo5 Programming Model
.omplex applications
.omplex filtering of data .omplex analysis of instrument measurements
Cpplications created by non!.0 scientistsH
Workflo*s ha"e a natural correspondence in the real!*orld# as descriptions of a scientific procedure Oisual model of a graph sometimes easier to program
8recursor of the +apEeduce 8rogramming +odel (next slides)
'(1'!'(1) 'Q
HCdapted from .arole Boble and 5a"id de Eoure# .hapter in :he 1ourth 8aradigm# http //research-microsoft-com/en!us/collaboration/fourthparadigm/
$or=(lo5s "6isted in *rids3 'ut Did Not )e&ome a Dominant Programming Model
:races
0elected 1indings
=oose coupling Braph *ith )!2 le"els C"erage W1 si,e is )(/22 >obs 63<+ W1s are si,ed 2( >obs or less# I3< are si,ed '(( >obs or less
'I
'(1'!'(1) 0stermann et al., 0n the Characteristics of Grid *or+flo1s, CoreG2I3 Inte)rated 2esearch in Grid Computin) -CGI*., '((Q.
Pra&ti&al #ppli&ations o( t%e $7 Programming Model
)ioin(ormati&s in ,a0erna
'(1'!'(1)
)(
0ource .arole Boble and 5a"id de Eoure# .hapter in :he 1ourth 8aradigm# http //research-microsoft-com/en!us/collaboration/fourthparadigm/
#genda
1- 7ntroduction '- .loud 8rogramming in 8ractice (:he 8roblem) 3. Programming Models (or Compute-Intensi0e $or=loads 1- 9ags of :asks '- Workflo*s 3. Parallel Programming Models 2- 8rogramming +odels for 9ig 5ata 3- 0ummary
'(1'!'(1) )1
Parallel Programming Models
,-D &ourse in4049 Introdu&tion to PC
Cbstract machines ,as= (groups o( B3 B minutes)2 (5istributed) shared memory dis&uss parallel programming in &louds
5istributed memory +87
,as= (inter (inter-group dis&ussion)2 dis&uss. models .onceptual programming
+aster/*orker 5i"ide and conNuer 5ata / :ask parallelism 908
0ystem!le"el programming models
:hreads on B8Vs and other multi!cores
'(1'!'(1) )'
4arbanescu et al.( To1ards an Effecti e 5nified Pro)rammin) 6odel for 6an7-Cores. IP3PS *S &/#&
#genda
1')4. 37ntroduction .loud 8rogramming in 8ractice (:he 8roblem) 8rogramming +odels for .ompute!7ntensi"e Workloads Programming Models (or )ig Data 0ummary
'(1'!'(1)
))
"&os.stems o( )ig-Data Programming Models
% Where does +E!on!demand %fit& Where does 8regel!on!B8Vs fit&
@igh!=e"el =anguage
1lume 9ig%uery 0%= +eteor AC%= @i"e 8ig 0a*,all 0cope 5ryad=7D% C%=
8rogramming +odel
P#C, MapCedu&e Model Pregel 5ataflo* Clgebrix
Execution Engine
1lume 5remel :era C,ure Nep%ele @aloop Engine 0er"ice 5ata Engine :ree Engine adoop+ *irap% +87/ 5ryad D#CN Erlang @yracks
0torage Engine
0) B10 :era C,ure 5ata 5ata 0tore 0tore D71 Ooldemort = .osmos10 1 0 Csterix 9!tree
)2
H 8lus Yookeeper# .5D# etc-
'(1'!'(1)
Cdapted from 5agstuhl 0eminar on 7nformation +anagement in the .loud# http //***-dagstuhl-de/program/calendar/partlist/&semnr=11)'1X0VGB
#genda
1')4. 7ntroduction .loud 8rogramming in 8ractice (:he 8roblem) 8rogramming +odels for .ompute!7ntensi"e Workloads Programming Models (or )ig Data 1. MapCedu&e '- Braph 8rocessing )- Gther 9ig 5ata 8rogramming +odels 3- 0ummary
'(1'!'(1)
)3
MapCedu&e
+odel for processing and generating large data sets Enables a functional!like programming model 0plits computations into independent parallel tasks +akes efficient use of large commodity clusters @ides the details of paralleli,ation# fault!tolerance# data distribution# monitoring and load balancing
'(11!'(1' )4
MapCedu&e2 ,%e Programming Model
C programmig model# not a programming languageP 1- 7nput/Gutput 0et of key/"alue pairs '- +ap 8hase
8rocesses input key/"alue pair 8roduces set of intermediate pairs map (in_key, in_value) -> list(out_key, interm_value)
3. Reduce Phase:
Combines all intermediate values for a given key Produces a set of merged output values reduce(out_key, list(interm_value)) -> list(out_value)
'(11!'(1' )6
MapCedu&e in t%e Cloud
1acebook use case
0ocial!net*orking ser"ices Cnaly,e connections in the graph of friendships to recommend ne* connections
Boogle use case
Web!base email ser"ices# Boogle5ocs Cnaly,e messages and user beha"ior to optimi,e ad selection and placement
Soutube use case
Oideo!sharing sites Cnaly,e user preferences to gi"e better stream suggestions
'(11!'(1'
)Q
$ord&ount "6ample
1ile 1 :he big data is big- 1ile ' +apEeduce tames big data- Map 8utput2
+apper!1 (:he# 1)# (big# 1)# (data# 1)# (is# 1)# (big# 1) +apper!' (+apEeduce# 1)# (tames# 1)# (big# 1)# (data# 1)
Cedu&e Input
Eeducer!1 Eeducer!' Eeducer!) Eeducer!2 Eeducer!3 Eeducer!4 (:he# 1) (big# 1)# (big# 1)# (big# 1) (data# 1)# (data# 1) (is# 1) (+apEeduce# 1)# (+apEeduce# 1) (tames# 1)
Cedu&e 8utput
Eeducer!1 Eeducer!' Eeducer!) Eeducer!2 Eeducer!3 Eeducer!4 (:he# 1) (big# )) (data# ') (is# 1) (+apEeduce# ') (tames# 1)
)I
'(11!'(1'
Colored 1Euare Counter
'(11!'(1'
2(
MapCedu&e F *oogle2 "6ample 1
Benerating language model statistics
.ount Z of times e"ery 3!*ord seNuence occurs in large corpus of documents (and keep all those *here count [= 2)
+apEeduce solution
+ap extract 3!*ord seNuences =[ count from document Eeduce combine counts# and keep if count large enough
'(11!'(1'
21
http //***-slideshare-net/>hammerb/mapreduce!pact(4!keynote
MapCedu&e F *oogle2 "6ample 2
Aoining *ith other data
Benerate per!doc summary# but include per!host summary (e-g-# Z of pages on host# important terms on host)
+apEeduce solution
+ap extract host name from VE=# lookup per!host info# combine *ith per!doc data and emit Eeduce identity function (emit key/"alue directly)
'(11!'(1'
2'
http //***-slideshare-net/>hammerb/mapreduce!pact(4!keynote
More "6amples o( MapCedu&e #ppli&ations
5istributed Brep .ount of VE= Cccess 1reNuency Ee"erse Web!=ink Braph :erm!Oector per @ost 5istributed 0ort 7n"erted indices 8age ranking +achine learning algorithms $
'(11!'(1'
2)
"6e&ution 80er0ie5 (1+2)
% What is the performance problem raised by this step&
'(11!'(1'
22
"6e&ution 80er0ie5 (2+2)
Gne master# many *orkers example
7nput data split into + map tasks (42 / 1'Q +9) '((#((( maps Eeduce phase partitioned into E reduce tasks 2#((( reduces 5ynamically assign tasks to *orkers '#((( *orkers
+aster assigns each map / reduce task to an idle *orker
+ap *orker 5ata locality a*areness Eead input and generate E local files *ith key/"alue pairs Eeduce *orker Eead intermediate output from mappers 0ort and reduce to produce the output
'(11!'(1' 23
7ailures and )a&=-up ,as=s
'(11!'(1'
24
$%at is
adoop; # MapCedu&e "6e&. "ngine
7nspired by Boogle# supported by SahooP 5esigned to perform fast and reliable analysis of the big data =arge expansion in many domains such as
1inance# technology# telecom# media# entertainment# go"ernment# research institutions
'(11!'(1'
26
adoop F Da%oo
When you "isit yahoo# you are interacting *ith data processed *ith @adoopP
'(11!'(1'
2Q
https //open&irrus-org/system/files/8penCirrus adoop'((I-ppt
adoop -se Cases
1- 0earch Sahoo# Cma,on# Y"ents '- =og processing 1acebook# Sahoo# Aoost# =ast-fm )- Eecommendation 0ystems 1acebook 2- 5ata Warehouse 1acebook# CG= 3- Oideo and 7mage Cnalysis De* Sork :imes# Eyealike
'(11!'(1' 2I
http //cloud-berkeley-edu/data/hdfs-pdf
adoop Distri'uted 7ile 1.stem
Cssumptions and goals
5ata distributed across hundreds or thousands of machines 5etection of faults and automatic reco"ery 5esigned for batch processing "s- interacti"e use @igh throughput of data access "s- lo* latency of data access @igh aggregate band*idth and high scalability Write!once!read!many file access model +o"ing computations is cheaper than mo"ing data minimi,e net*ork congestion and increase throughput of the system
'(11!'(1'
3(
D71 #r&%ite&ture
+aster/sla"e architecture DameDode (DD)
+anages the file system namespace Eegulates access to files by clients Gpen/close/rename files or directories +apping of blocks to 5ataDodes Gne per node in the cluster +anages local storage of the node 9lock creation/deletion/replication initiated by DD 0er"e read/*rite reNuests from clients
'(11!'(1' 31
5ataDode (5D)
D71 Internals
Eeplica 8lacement 8olicy
1irst replica on one node in the local rack 0econd replica on a different node in the local rack :hird replica on a different node in a different rack impro"ed *rite performance ('/) are on the local rack) preser"es data reliability and read performance
.ommunication 8rotocols
=ayered on top of :.8/78 .lient 8rotocol client \ DameDode machine 5ataDode 8rotocol 5ataDodes \ DameDode DameDode responds to E8. reNuests issued by 5ataDodes / clients
'(11!'(1' 3'
adoop 1&%eduler
Aob di"ided into se"eral independent tasks executed in parallel
:he input file is split into chunks of 42 / 1'Q +9 Each chunk is assigned to a map task Eeduce task aggregate the output of the map tasks
:he master assigns tasks to the *orkers in 171G order
Aob:racker maintains a Nueue of running >obs# the states of the :ask:rackers# the tasks assignments :ask:rackers report their state "ia a heartbeat mechanism
5ata =ocality execute tasks close to their data 0peculati"e execution re!launch slo* tasks
'(11!'(1' 3)
MapCedu&e "0olution ?1+BA
adoop is Maturing2 Important Contri'utors
Dumber of patches
'(1'!'(1)
32
0ource http //***-theregister-co-uk/'(1'/(Q/16/communityFhadoop/
MapCedu&e "0olution ?2+BA
Gate 1&%eduler
=C:E 0cheduler
=ongest Cpproximate :ime to End 0peculati"ely execute the task that *ill finish farthest into the future
timeFleft = (1!progress0core) / progressEate
:asks make progress at a roughly constant rate Eobust to node heterogeneity
E.' 0ort running times
=C:E "s- @adoop "s- Do spec-
'(11!'(1'
33
8aharia et al.( Impro in) 6ap2educe performance in hetero)eneous en ironments. 0S3I &//%.
MapCedu&e "0olution ?3+BA
7#IC 1&%eduling
7solation and statistical multiplexing :*o!le"el architecture
1irst# allocates task slots across pools 0econd# each pool allocates its slots among multiple >obs 9ased on a max!min fairness policy
'(11!'(1'
34
8aharia et al.( 3ela7 schedulin)( a simple techni9ue for achie in) localit7 and fairness in cluster schedulin). EuroS7s &/#/. :lso T2 EECS-&//'-,,
MapCedu&e "0olution ?4+BA
Dela. 1&%eduling
5ata locality issues
@ead!of!line scheduling 171G# 1C7E =o* probability for small >obs to achie"e data locality 3Q< of >obs ] 1C.E9GGL ha"e ^ '3 maps Gnly 3< achie"e node locality "s- 3I< rack locality
'(11!'(1'
36
8aharia et al.( 3ela7 schedulin)( a simple techni9ue for achie in) localit7 and fairness in cluster schedulin). EuroS7s &/#/. :lso T2 EECS-&//'-,,
MapCedu&e "0olution ?B+BA
Dela. 1&%eduling
0olution
0kip the head!of!line >ob until you find a >ob that can launch a local task Wait :1 seconds before launching tasks on the same rack Wait :' seconds before launching tasks off!rack :1 = :' = 13 seconds =[ Q(< data locality
'(11!'(1'
3Q
8aharia et al.( 3ela7 schedulin)( a simple techni9ue for achie in) localit7 and fairness in cluster schedulin). EuroS7s &/#/. :lso T2 EECS-&//'-,,
48#G# and MapCedu&e ?1+9A
LGC=C
8lacement X allocation .entral for all +E clusters +aintains +E cluster metadata
+E!Eunner
.onfiguration X deployment +E cluster monitoring Bro*/0hrink mechanism
Gn!demand +E clusters
8erformance isolation 5ata isolation 1ailure isolation Oersion isolation
'(11!'(1'
3I
4oala and MapCedu&e ?2+9A
CesiHing Me&%anism
:*o types of nodes
Core nodes2 fully!functional nodes# *ith :ask:racker and 5ataDode (local disk access) ,ransient nodes2 compute nodes# *ith :ask:racker
8arameters
7 = Dumber of running tasks per number of a"ailable slots 8redefined 7min and 7ma6 thresholds 8redefined constant step gro51tep / s%rin=1tep , = time elapsed bet*een t*o successi"e resource offers
:hree policies
*ro5-1%rin= Poli&. (*1P)2 gro*!shrink but maintain 1 bet*een 7min and 7ma6 *reed.-*ro5 Poli&. (**P)2 gro*# shrink *hen *orkload done *reed.-*ro5-5it%-Data Poli&. (**DP)2 BB8 *ith core nodes (local disk access)
4(
'(11!'(1'
4oala and MapCedu&e ?3+9A
$or=loads
J1K J'K
IQ< of >obs ] 1acebook process 4-I +9 and take less than a minute J1K Boogle reported in '((2 computations *ith :9 of data on 1(((s of machines J'K
S- .hen# 0- Clspaugh# 5- 9orthakur# and E- Lat,# Energy Efficiency for =arge!0cale +apEeduce Workloads *ith 0ignificant 7nteracti"e Cnalysis# pp- 2)\34# '(1'A- 5ean and 0- Bhema*at# +apreduce 0implified 5ata 8rocessing on =arge .lusters# .omm- of the C.+# Ool- 31# no- 1# pp- 1(6\11)# '((Q-
'(11!'(1'
41
4oala and MapCedu&e ?4+9A
$ord&ount (,.pe 03 (ull s.stem)
.8V 5isk
1(( B9 input data 1( core nodes *ith Q map slots each Q(( map tasks executed in 6 *a"es Wordcount is .8V!bound in the map phase
'(11!'(1' 4'
4oala and MapCedu&e ?B+9A
1ort (,.pe 03 (ull s.stem)
.8V 5isk
8erformed by the +E frame*ork during the shuffling phase 7ntermediate key/"alue pairs are processed in increasing key order 0hort map phase *ith 2(<!4(< .8V utili,ation =ong reduce phase *hich is highly disk intensi"e
'(11!'(1' 4)
4oala and MapCedu&e ?I+9A
1peedup ((ull s.stem)
Gptimum
a) Wordcount
(:ype 1)
b) 0ort (:ype ')
0peedup relati"e to an +E cluster *ith 1( core nodes Close to linear speedup on &ore nodes
'(11!'(1'
42
4oala and MapCedu&e ?J+9A
"6e&ution ,ime 0s. ,ransient Nodes
7nput data of 2( B9 Wordcount output data = '( L9 0ort output data = 2( B9 Wordcount scales better than 0ort on transient nodes
'(11!'(1' 43
4oala and MapCedu&e ?K+9A
Per(orman&e o( t%e CesiHing Me&%anism
7minL0.2B 7ma6L1.2B gro51tep=3 s%rin=1tep=' ,*1P =)( s ,**(D)P =1'( s
0tream of 3( +E >obs +E cluster of '( core nodes + '( transient nodes BB8 increases the si,e of the data transferred across the net*ork B08 gro*s/shrinks based on the resource utili,ation of the cluster BB58 enables local *rites on the disks of pro"isioned nodes
'(11!'(1' 44
4oala and MapCedu&e ?9+9A
-se Case F ,-Del(t
Boal 8rocess 1(s :9 of 8'8 traces 5C0!2 constraints
:ime limit per >ob (13 min) Don!persistent storage
0olution
Eeser"e the nodes for se"eral days# import and process the data&
Gur approach
0plit the data into multiple subsets 0maller data sets =[ faster import and processing 0etup multiple +E clusters# one for each subset
46
'(11!'(1'
#genda
1')4. 7ntroduction .loud 8rogramming in 8ractice (:he 8roblem) 8rogramming +odels for .ompute!7ntensi"e Workloads Programming Models (or )ig Data 1- +apEeduce 2. *rap% Pro&essing )- Gther 9ig 5ata 8rogramming +odels 3- 0ummary
'(1'!'(1)
4Q
*rap% Pro&essing "6ample 1ingle-1our&e 1%ortest Pat% (111P)
5i>kstra?s algorithm
0elect node *ith minimal distance Vpdate neighbors G(_E_ + _O_`log_O_) *ith 1ibo@eap
7nitial dataset
C 9 . 5 --'(1'!'(1)
^(# (9# 3)# (5# ))[ ^inf# (E# 1)[ ^inf# (1# 3)[ ^inf# (9# 1)# (.# ))# (E# 2)# (1# 2)[
4I
0ource .laudio +artella# 8resentation on Biraph at :V 5elft# Cpr '(1'-
*rap% Pro&essing "6ample 111P in MapCedu&e
% What is the performance problem& +apper output distances
^9# 3[# ^5# )[# ^.# inf[# ---
9ut also graph structure
^C# ^(# (9# 3)# (5# ))[ ---
Eeducer input distances
9 ^inf# 3[# 5 ^inf# )[ ---
9ut also graph structure D >obs# *here D is the graph diameter
'(1'!'(1)
9 ^inf# (E# 1)[ --6(
0ource .laudio +artella# 8resentation on Biraph at :V 5elft# Cpr '(1'-
,%e Pregel Programming Model (or *rap% Pro&essing
9atch!oriented processing Euns in!memory Oertex!centric C87 1ault!tolerant Euns on +aster!0la"e architecture
GL# the actual model follo*s in the next slides
'(1'!'(1) 61
0ource .laudio +artella# 8resentation on Biraph at :V 5elft# Cpr '(1'-
,%e Pregel Programming Model (or *rap% Pro&essing
8rocessors
9ased on Oaliant?s 9ulk 0ynchronous 8arallel (908)
=ocal .omputation
D processing units *ith fast local memory 0hared communication medium 0eries of 1upersteps Blobal 0ynchroni,ation 9arrier .ommunication Ends *hen all "ote:o@alt
8regel executes initiali,ation# one or se"eral supersteps# shutdo*n
'(1'!'(1)
9arrier 0ynchroni,ation
6'
0ource .laudio +artella# 8resentation on Biraph at :V 5elft# Cpr '(1'-
Pregel ,%e 1uperstep
Each Oertex (execution in parallel)
Eecei"e messages from other "ertices 8erform o*n computation (user!defined function) +odify o*n state or state of outgoing messages +utate topology of the graph 0end messages to other "ertices
:ermination condition
Cll "ertices inacti"e Cll messages ha"e been transmitted
'(1'!'(1)
6)
0ource 8regel article-
Pregel2 ,%e <erte6-)ased #PI
7nput message 7mplements processing algorithm Gutput message
'(1'!'(1)
62
0ource 8regel article-
,%e Pregel #r&%ite&ture Master-$or=er
+aster assigns "ertices to Workers
Braph partitioning
+aster Worker 1
3
+aster coordinates 0upersteps +aster coordinates .heckpoints Workers execute "ertices compute() Workers exchange messages directly
1 ' 2 )
'(1'!'(1)
Worker
k
63
0ource 8regel article-
Pregel Per(orman&e 111P on 1 )illion-<erte6 )inar. ,ree
'(1'!'(1)
64
0ource 8regel article-
Pregel Per(orman&e 111P on Candom *rap%s3 <arious 1iHes
'(1'!'(1)
66
0ource 8regel article-
#pa&%e *irap% #n 8pen-1our&e Implementation o( Pregel
:asktracker
+ap 0lot +ap 0lot
:asktracker
+ap 0lot +ap 0lot
:asktracker
+ap 0lot +ap 0lot
:asktracker
+ap 0lot +ap 0lot
8ersistent computation state
Yookeeper
DD X A:
+aster
=oose implementation of 8regel 0trong community (1acebook# :*itter# =inked7n) Euns 1((< on existing @adoop clusters 0ingle +ap!only >ob
6Q
'(1'!'(1)
0ource .laudio +artella# 8resentation on Biraph at :V 5elft# Cpr '(1'-
%ttp2++in&u'ator.apa&%e.org+girap%+
Page ran= 'en&%mar=s
:iberium :an
Clmost 2((( nodes# shared among numerous groups in SahooP @adoop (-'(-'(2 (secure @adoop) 'x %uad .ore '-2B@,# '2 B9 EC+# 1x 4:9 @5
org-apache-giraph-benchmark-8ageEank9enchmark
Benerates data# number of edges# number of "ertices# Z of supersteps 1 master/YooLeeper '( supersteps Do checkpoints 1 random edge per "ertex
6I
0ource C"ery .hing presentation at @ortonWorkshttp //***-slideshare-net/a"eryching/'(111(12horton*orks/
$or=er s&ala'ilit. (2B0M 0erti&es)
3((( 23(( 2((( ,otal se&onds )3(( )((( '3(( '((( 13(( 1((( 3(( ( ( 3( 1(( 13( '(( M o( 5or=ers '3( )(( )3(
Q(
0ource C"ery .hing presentation at @ortonWorkshttp //***-slideshare-net/a"eryching/'(111(12horton*orks/
<erte6 s&ala'ilit. (300 5or=ers)
'3(( '((( ,otal se&onds 13(( 1((( 3(( ( ( 1(( '(( )(( 2(( 3(( M o( 0erti&es (in 100Ns o( millions) 4((
Q1
0ource C"ery .hing presentation at @ortonWorkshttp //***-slideshare-net/a"eryching/'(111(12horton*orks/
<erte6+5or=er s&ala'ilit.
M o( 0erti&es (100Ns o( millions) '3(( '((( ,otal se&onds 13(( )(( 1((( '(( 3(( ( ( 3( 1(( 13( '(( M o( 5or=ers '3( )(( )3(
Q'
4(( 3(( 2((
1(( (
0ource C"ery .hing presentation at @ortonWorkshttp //***-slideshare-net/a"eryching/'(111(12horton*orks/
#genda
1')4. 7ntroduction .loud 8rogramming in 8ractice (:he 8roblem) 8rogramming +odels for .ompute!7ntensi"e Workloads Programming Models (or )ig Data 1- +apEeduce '- Braph 8rocessing 3. 8t%er )ig Data Programming Models 3- 0ummary
'(1'!'(1)
Q)
1tratosp%ere
+eteor Nuery language# 0upremo operator frame*ork 8rogramming .ontracts (8C.:s) programming model
Extended set of 'nd order functions ("s +apEeduce) 5eclarati"e definition of data parallelism
Dephele execution engine
0chedules multiple dataflo*s simultaneously 0upports 7aa0 en"ironments based on Cma,on E.'# Eucalyptus
@510 storage engine
'(1'!'(1) Q2
1tratosp%ere Programming Contra&ts (P#C,s) ?1+2A
'nd!order function
Clso in +apEeduce
'(1'!'(1) Q3
0ource 8C.: o"er"ie*# https //stratosphere-eu/*iki/doku-php/*iki pactpm
1tratosp%ere Programming Contra&ts (P#C,s) ?2+2A
% @o* can 8C.:s optimi,e data processing&
'(1'!'(1)
Q4
0ource 8C.: o"er"ie*# https //stratosphere-eu/*iki/doku-php/*iki pactpm
1tratosp%ere Nep%ele
'(1'!'(1)
Q6
1tratosp%ere 0s MapCedu&e
8C.: extends +apEeduce
9oth propose 'nd!order functions (3 8C.:s "s +ap X Eeduce) 9oth reNuire from user 1st!order functions (*hat?s inside the +ap) 9oth can benefit from higher!le"el languages 8C.: ecosystem has 7aa0 support
Ley!"alue data model
'(1'!'(1)
QQ
0ource 1abian @ueske# =arge 0cale 5ata Cnalysis 9eyond +apEeduce# @adoop Bet :ogether# 1eb '(1'-
1tratosp%ere 0s MapCedu&e Pair5ise 1%ortest Pat%s ?1+3A
1loyd!Warshall Clgorithm
For k from 1 to n For i from 1 to n For j from 1 to n Di,j min( Di,j , Di,k + Dk,j )
'(1'!'(1)
QI
0ource 0tratosphere example# https //stratosphere-eu/*iki/lib/exe/detail-php/*iki all'allF spFtaskdescription-png&id=*iki<)Ca'aspexample
1tratosp%ere 0s MapCedu&e )- Aoin edges to triads ,riangle "numeration ?1+3A
7nput undirected graph edge!based Gutput triangles 1- Eead input graph '- 0ort by smaller "ertex 75
2- :riads to triangles
'(1'!'(1)
I(
0ource 1abian @ueske# =arge 0cale 5ata Cnalysis 9eyond +apEeduce# @adoop Bet :ogether# 1eb '(1' and 0tratosphere example-
1tratosp%ere 0s MapCedu&e ,riangle "numeration ?2+3A
+apEeduce 1ormulation
'(1'!'(1)
I1
0ource 1abian @ueske# =arge 0cale 5ata Cnalysis 9eyond +apEeduce# @adoop Bet :ogether# 1eb '(1' and 0tratosphere example-
1tratosp%ere 0s MapCedu&e ,riangle "numeration ?3+3A
0tratosphere 1ormulation
'(1'!'(1)
I'
0ource 1abian @ueske# =arge 0cale 5ata Cnalysis 9eyond +apEeduce# @adoop Bet :ogether# 1eb '(1' and 0tratosphere example-
#genda
1')2B. 7ntroduction .loud 8rogramming in 8ractice (:he 8roblem) 8rogramming +odels for .ompute!7ntensi"e Workloads 8rogramming +odels for 9ig 5ata 1ummar.
'(1'!'(1)
I)
Con&lusion ,a=e,a=e- ome Message
Programming model L &omputer s.stem a'stra&tion Programming Models (or Compute-Intensi0e $or=loads
+any trade!offs# fe* dominant programming models +odels bags of tasks# *orkflo*s# master/*orker# 908# $
Programming Models (or )ig Data
)ig data programming models %a0e e&os.stems +any trade!offs# many programming models +odels +apEeduce# 8regel# 8C.:# 5ryad# $ Execution engines @adoop# Loala++E# Biraph# 8C.:/Dephele# 5ryad# $
Cealit. &%e&=2 &loud programming is maturing
Gctober 1(# '(1' http //***-flickr-com/photos/dimitrisotiropoulos/2'(264421Q/ I2
Ceading Material (Ceall. #&ti0e 7ield)
$or=loads
Clexandru 7osup# 5ick @- A- Epema Brid .omputing Workloads- 7EEE 7nternet .omputing 13(') 1I!'4 ('(11)
,%e 7ourt% Paradigm :he 1ourth 8aradigm# http //research-microsoft-com/en!us/collaboration/fourthparadigm/ Programming Models (or Compute-Intensi0e $or=loads
Clexandru 7osup# +athieu Aan# Gmer G,an 0onme,# 5ick @- A- Epema :he .haracteristics and 8erformance of Broups of Aobs in BridsEuro!8ar '((6 )Q'!)I) Gstermann et al-# Gn the .haracteristics of Brid Workflo*s# .oreBE75 7ntegrated Eesearch in Brid .omputing (.B7W)# '((Qhttp //***-pds-e*i-tudelft-nl/aiosup/*ftraces(6charsFcamera-pdf .orina 0tratan# Clexandru 7osup# 5ick @- A- Epema C performance study of grid *orkflo* engines- BE75 '((Q '3!)'
Programming Models (or )ig Data
Aeffrey 5ean# 0an>ay Bhema*at +apEeduce 0implified 5ata 8rocessing on =arge .lusters- G057 '((2 1)6!13( Aeffrey 5ean# 0an>ay Bhema*at +apEeduce a flexible data processing tool- .ommun- C.+ 3)(1) 6'!66 ('(1() +atei Yaharia# Cndy Lon*inski# Cnthony 5- Aoseph# Eandy Lat,# and 7on 0toica- '((Q- 7mpro"ing +apEeduce performance in heterogeneous en"ironments- 7n 8roceedings of the Qth V0ED7b conference on Gperating systems design and implementation (G057W(Q)V0ED7b Cssociation# 9erkeley# .C# V0C# 'I!2'+atei Yaharia# 5hruba 9orthakur# Aoydeep 0en 0arma# Lhaled Elmeleegy# 0cott 0henker# 7on 0toica 5elay scheduling a simple techniNue for achie"ing locality and fairness in cluster scheduling- Euro0ys '(1( '43!'6Q :yson .ondie# Deil .on*ay# 8eter Cl"aro# Aoseph +- @ellerstein# Lhaled Elmeleegy# Eussell 0ears +apEeduce Gnline- D057 '(1( )1)!)'Q Br,egor, +ale*ic,# +atthe* @- Custern# Cart A- .- 9ik# Aames .- 5ehnert# 7lan @orn# Daty =eiser# Br,egor, .,a>ko*ski 8regel a system for large!scale graph processing- 07B+G5 .onference '(1( 1)3!124 5ominic 9attrc# 0tephan E*en# 1abian @ueske# Gde> Lao# Oolker +arkl# 5aniel Warneke Dephele/8C.:s a programming model and execution frame*ork for *eb!scale analytical processing- 0o.. '(1( 11I!1)(
'(1'!'(1) I3