Lesson 1-1: Introduction

This document provides an introduction to the multithreaded directed acyclic graph (DAG) model. It defines key DAG concepts such as vertices, edges, and paths. It explains that DAGs can be used to model processes where data flows through a network of processors. It also introduces scheduling of operations on processors and discusses cost models. An example sequential reduction DAG is provided and analyzed. The concepts of work, span, and Brent's theorem for analyzing parallel algorithms are introduced. Basic parallel programming constructs like spawn, sync, and parallel for loops are defined.


The Multithreaded DAG Model

DAG = Directed Acyclic Graph: a collection of vertices and directed edges (lines with arrows). Each edge connects two vertices. "Acyclic" means there is no way to start at some vertex A, follow a sequence of vertices along directed edges, and end up back at A.

DAGs can be used for a variety of tasks, including modeling processes in which data flows in a consistent direction through a network of processors.

Each vertex is an operation, like a function call, an addition, or a branch. Directed edges show how operations depend on one another: the sink vertex depends on the output of the source vertex. Assume there is always exactly one start vertex and one exit vertex.

Begin analysis by looking for a start vertex: a vertex whose inputs are all satisfied. Such a vertex can be assigned to any open processor.

Scheduling = taking units of work and assigning them to processors.

How long will it take to run the DAG? A cost model is needed.
Cost model assumptions:
- All processors run at the same speed.
- 1 operation = 1 unit of time.
- Edges do not have any cost associated with them.

Example: Sequential Reduction

Reduction: reduce an array to the sum of its elements. To find the cost of this reduction, we will only care about the cost of array accesses and the cost of additions.

How long will it take to execute this DAG with p processors?
- T_p(n) >= ceil(n/p): the n array accesses can be divided among the p processors.
- T_p(n) >= n: the n additions form a chain and must be done sequentially.

Both conditions must be true, and since p is always at least one, the second bound dominates: this sequential-reduction DAG takes at least n units of time on a PRAM, no matter how many processors are available.
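A minimal sketch of this sequential reduction (illustrative Python; the loop-carried dependence on s is what forces the additions into a chain):

```python
def seq_reduce(a):
    # Each iteration's addition depends on the previous iteration's s,
    # so the n additions form a chain: work = span = Theta(n).
    s = 0
    for x in a:
        s = s + x
    return s

print(seq_reduce([1, 2, 3, 4]))  # 10
```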

QUIZ: A Reduction Tree

Assume associativity: (a + b) + c = a + (b + c).
Assume n processors.
Assume additions are done in pairs.

What is the minimum time on a PRAM with P = n processors? The DAG is executed level by level, and each level takes constant time, so all that is needed to calculate the time is the number of levels: O(log n).
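A sketch of the pairwise scheme (illustrative Python; the recursion depth is the number of tree levels, about log2 n):

```python
def tree_reduce(a):
    # Pairwise reduction: the two halves are independent and could be
    # computed in parallel, one tree level per unit of time.
    if len(a) == 1:
        return a[0]
    mid = len(a) // 2
    return tree_reduce(a[:mid]) + tree_reduce(a[mid:])

print(tree_reduce([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```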

Work and Span

Work = number of vertices in the DAG = W(n)

Span = longest path through the DAG = D(n) = number of vertices on the longest path

The span is also known as the critical path.

T_1(n) = W(n)
T_infinity(n) = D(n)

QUIZ: Work and Span for Reduction

For the sequential DAG, span = O(n).
For the tree DAG, span = O(log n).

Basic Work-Span Laws

W(n)/D(n) = the amount of work per critical-path vertex = the average available parallelism in the DAG.

How many processors for the problem? About W(n)/D(n).

Span Law: T_p(n) >= D(n)
Work Law: T_p(n) >= ceil(W(n)/P)

Combining them: T_p(n) >= max{ D(n), ceil(W(n)/P) }.
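The combined lower bound can be computed directly (a sketch; work, span, and p are abstract operation counts here):

```python
import math

def time_lower_bound(work, span, p):
    # T_p(n) >= max{ D(n), ceil(W(n)/P) }: the span law and the work law
    return max(span, math.ceil(work / p))

# e.g. a tree reduction with W = 1023, D = 10 on P = 4 processors:
print(time_lower_bound(1023, 10, 4))  # 256
```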

Brent's Theorem, Part 1 (Setup)

Is there an upper bound on the time to execute the DAG? Yes, according to Brent's theorem.

Given a PRAM with P processors, break the execution into phases:
1. Each phase has exactly one critical-path vertex.
2. The non-critical-path vertices in each phase are independent. The vertices in a phase can have edges that enter or exit the phase, but they cannot depend on one another.
3. Every vertex has to be in some phase, and in only one phase.

How long will it take to execute phase k? If phase k contains W_k vertices, it takes t_k = ceil(W_k / P) units of time.

QUIZ: Brent's Theorem Aside
Use the following equivalencies.

Brent's Theorem, Part 2

The upper bound on the time to execute the DAG is:

T_P(n) <= sum over all phases k of ceil(W_k / P)

which becomes

T_P(n) <= D(n) + (W(n) - D(n)) / P.

This is Brent's theorem. It says:

The upper limit of the time to execute the DAG, using P processors, is <= the time to execute the critical path + the time to execute everything off the critical path using P processors.

**This sets the goal for any scheduler.**

This upper bound and the work-span lower bound are within a factor of 2 of each other. This implies that you may be able to execute the DAG in a faster time than Brent's bound predicts, but never faster than the lower bound.
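A quick numeric check of the two bounds (a sketch; the W, D, P values are made up to illustrate the factor-of-2 relationship):

```python
import math

def lower_bound(work, span, p):
    # from the span and work laws
    return max(span, math.ceil(work / p))

def brent_upper_bound(work, span, p):
    # T_P <= D + (W - D)/P: the critical path, plus everything off it shared by P
    return span + (work - span) / p

W, D, P = 1023, 10, 4
lo, hi = lower_bound(W, D, P), brent_upper_bound(W, D, P)
print(lo, hi)        # 256 263.25
print(hi <= 2 * lo)  # True
```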

Desiderata: Speedup, Work Optimality, and Weak Scaling

How can we tell if a DAG is good or bad?

Speedup = best sequential time / parallel time: S_P(n) = T*(n) / T_P(n)

T*(n) depends on the work done by the best sequential algorithm.
T_P(n) depends on the work, the span, n, and P.

Ideal speedup: linear in P (you want the speedup to be linear in the number of processors).

S_P(n) = Theta(P) = best sequential work / parallel time = W*(n) / T_P(n)

Use Brent's theorem to get an upper bound on the time, which gives a lower bound on the speedup (there is still a dependence on n; it is just not shown on the right-hand side):

S_P(n) >= W*(n) / ( D(n) + (W(n) - D(n))/P ) = P / ( W(n)/W*(n) + (P - 1) * D(n)/W*(n) )

P = number of processors. The denominator is the penalty: to get linear scaling, the denominator needs to be a constant.

To get a constant in the denominator:
- W(n) = O(W*(n)): work optimality.
- Weak scalability: P = O(W*/D), i.e. W*/P = Omega(D). The work per processor has to grow proportionally to the span, and the span depends on the problem size n.

Recap:

Speedup with linear scaling is the goal. To achieve linear scaling, the work of the parallel algorithm should match the best sequential algorithm, and the work per processor should grow as a function of n.
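Plugging Brent's bound into the speedup definition gives a concrete check (a sketch assuming a work-optimal tree reduction, so W = W* = n and D = log2 n; the numbers are illustrative):

```python
import math

def speedup_lower_bound(w_star, w, d, p):
    # S_P >= W* / ( D + (W - D)/P ), from Brent's upper bound on T_P
    return w_star / (d + (w - d) / p)

n, p = 2**20, 64
s = speedup_lower_bound(n, n, math.log2(n), p)
print(s)  # close to p = 64, because the work per processor far exceeds the span
```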

Basic Concurrency Primitives

The Divide-and-Conquer Scheme

This is the sequential version of the divide-and-conquer scheme. Note that the two recursive calls are independent, and will now be marked with SPAWN.

Spawn is a signal to either the compiler or the runtime system that the target is an independent unit of work. The target may be executed asynchronously from the caller.


SYNC: there is a dependence between a, b, and the return statement; these have to be combined. Sync is used to join the dependent statements.

To which spawn does a given sync apply? The sync matches any spawn in the same frame.

Nested parallelism: there is always an implicit sync before returning to the caller.

The spawn creates two independent paths: one path carries the new work, and one path continues carrying on after the spawn.
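A sketch of spawn/sync for the divide-and-conquer reduction (illustrative Python, not the lecture's Cilk-style pseudocode: spawn is simulated with a thread-pool future, and calling .result() plays the role of sync):

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor()

def reduce_dc(a, lo, hi):
    if hi - lo <= 1:
        return a[lo] if hi > lo else 0
    mid = (lo + hi) // 2
    left = pool.submit(reduce_dc, a, lo, mid)  # "spawn": left half may run asynchronously
    right = reduce_dc(a, mid, hi)              # the caller continues with the right half
    return left.result() + right               # "sync": join before combining

print(reduce_dc([1, 2, 3, 4, 5], 0, 5))  # 15
```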

QUIZ: A Subtle Point About Spawns

The above recursive reduction uses two spawns; are they both necessary? You can eliminate B but not A.

If you eliminate the A spawn, you eliminate the concurrency; this is bad.

If you eliminate the B spawn, the two subgraphs can still be executed concurrently.

Basic Analysis of Work and Span

Many of the analysis tools used on sequential algorithms can be used on parallel algorithms. We want to analyze work and span.

Assume each spawn and sync is a constant-time operation that can be ignored for analysis.

Analyzing work is counting total operations; for the divide-and-conquer reduction we end up with linear work, W(n) = O(n).

Analyzing span: a spawn creates two paths, and the critical path is the longer of the two.

Desiderata for Work and Span

The goals of a parallel algorithm designer:
1. Work optimality: achieve a degree of work that matches the best sequential algorithm.
2. Find algorithms with polylogarithmic span: D(n) = O(log^k n). This is low span, and it ensures the average available parallelism grows with n.

Concurrency Primitive: Parallel For

All iterations are independent of one another. A par-for creates n independent sub-paths, and the end of a par-for loop includes an implicit sync point.

The work of a par-for is W_parfor(n) = O(n).
The span of a par-for is D_parfor(n) = O(1) in theory, but in practice it will grow with n, especially if n is really large.

QUIZ: Implementing Par-For

If the implementation executes the spawns sequentially, one after another, the spawning itself becomes a bottleneck: the span grows linearly with n. This is bad.

Implementing Par-For, Part 2

Instead, implement par-for as a recursive procedure call (ParForT). This is a better way to implement a parallel for loop: the span now grows only logarithmically with n.

For the rest of this course, assume the ParForT implementation:

D(n) = O(log n)
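A sketch of the ParForT idea (illustrative Python, run sequentially here; in a real runtime each half of the range would be spawned, so the recursion depth, O(log n), becomes the span):

```python
def par_for_t(body, lo, hi):
    # Binary splitting: halve the range until a single iteration remains.
    if hi - lo <= 0:
        return
    if hi - lo == 1:
        body(lo)
        return
    mid = (lo + hi) // 2
    par_for_t(body, lo, mid)  # would be spawned
    par_for_t(body, mid, hi)  # implicit sync at the end of the par-for

squares = [0] * 8
par_for_t(lambda i: squares.__setitem__(i, i * i), 0, 8)
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49]
```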

QUIZ: Matrix-Vector Multiply

If a loop carries a dependence, then it cannot be parallelized with a par-for.

Data Races and Race Conditions

If we look at the nested loops, we see that in the innermost loop there are iterations of j that write to the same y[i].

Data race = at least one read and one write can happen at the same data location at the same time.

Race condition = a data race that causes an error.

**A data race does not always lead to a race condition.**

Vector Notation

t[1:n] <- A[i, 1:n] * x[1:n]    (this is a more compact form of the par-for loop)
t[:] <- A[i, :] * x[:]

This can be further reduced to: y[i] <- y[i] + reduce(t).
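In code, this race-free formulation of matrix-vector multiply looks like the following sketch (each row's temporary t is private, so the j iterations never write to the same location):

```python
def matvec(A, x):
    y = [0] * len(A)
    for i in range(len(A)):                           # rows are independent: par-for safe
        t = [A[i][j] * x[j] for j in range(len(x))]   # t[:] = A[i, :] * x[:]
        y[i] = y[i] + sum(t)                          # y[i] = y[i] + reduce(t)
    return y

print(matvec([[1, 2], [3, 4]], [5, 6]))  # [17, 39]
```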
