Lesson1-1 Introduction PDF

This document provides an introduction to the multithreaded directed acyclic graph (DAG) model. It defines key DAG concepts such as vertices, edges, and paths. It explains that DAGs can be used to model processes where data flows through a network of processors. It also introduces scheduling of operations on processors and discusses cost models. An example sequential reduction DAG is provided and analyzed. The concepts of work, span, and Brent's theorem for analyzing parallel algorithms are introduced. Basic parallel programming constructs like spawn, sync, and parallel for loops are defined.

Uploaded by

Projectlouis


Lesson 1-1: Introduction

The Multithreaded DAG Model

DAG = Directed Acyclic Graph: a collection of vertices and directed edges (lines with arrows).
Each edge connects two vertices. The final result of these connections is that there is no way to
start at some vertex A, follow a sequence of vertices along directed paths, and end up back at
A.
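
As a minimal sketch of the definition above, the following Python function checks whether a directed graph is acyclic by repeatedly removing vertices that have no incoming edges (Kahn's topological-sort algorithm). The graph representation (a dict mapping each vertex to the list of vertices its outgoing edges point to) and the name `is_dag` are assumptions chosen for illustration; they are not part of the lesson.

```python
from collections import deque


def is_dag(edges):
    """Return True if the directed graph has no cycle.

    `edges` maps each vertex to a list of the vertices it points to
    (hypothetical representation chosen for this sketch).
    """
    # Collect every vertex, including ones that only appear as targets.
    vertices = set(edges)
    for targets in edges.values():
        vertices.update(targets)

    # Count incoming edges for each vertex.
    indegree = {v: 0 for v in vertices}
    for targets in edges.values():
        for t in targets:
            indegree[t] += 1

    # Repeatedly remove vertices with no remaining incoming edges.
    ready = deque(v for v in vertices if indegree[v] == 0)
    removed = 0
    while ready:
        v = ready.popleft()
        removed += 1
        for t in edges.get(v, []):
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)

    # Vertices on a cycle never reach indegree 0, so they are never removed.
    return removed == len(vertices)
```

A diamond-shaped graph such as `{"A": ["B", "C"], "B": ["D"], "C": ["D"]}` passes the check, while `{"A": ["B"], "B": ["A"]}` does not, because following the edges from A leads back to A.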

DAGs can be used for a variety of tasks, including modeling processes in which data flows in a
consistent direction through a network of processors.

Each vertex is an operation such as a function call, addition, branch, etc.
Directed edges show how operations depend on one another.

For each edge, the sink (the vertex the edge points to) depends on the output of the source (the vertex the edge leaves).
Assume there is always one starting vertex and one exit vertex.

Begin the analysis by looking for a starting vertex: this is a vertex where all inputs are satisfied.
This vertex can be assigned to any open processor.

Scheduling: taking units of work and assigning them to processors.

How long will it take to run the DAG? A cost model is needed.
Cost Model Assumptions: All processors run at the same speed.
1 operation = 1 unit of time.
Edges do not have any cost associated with them.
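
The scheduling idea and the cost model can be sketched together in a few lines of Python. This is an illustrative greedy scheduler under the stated assumptions (unit-time operations, free edges, identical processors); the function name `greedy_schedule`, the dict-of-predecessors representation, and the example graph are all hypothetical.

```python
def greedy_schedule(preds, p):
    """Count unit time steps needed to run a DAG on p processors.

    `preds` maps each vertex to the list of vertices it depends on.
    Each step runs up to p ready vertices; edges cost nothing.
    """
    done = set()
    steps = 0
    while len(done) < len(preds):
        # A vertex is ready when all of its inputs have been produced.
        ready = [v for v in preds
                 if v not in done and all(u in done for u in preds[v])]
        if not ready:
            raise ValueError("graph has a cycle or refers to a missing vertex")
        done.update(ready[:p])  # assign at most p ready vertices this step
        steps += 1
    return steps


# Hypothetical example: one start vertex, two independent operations, one exit.
diamond = {"start": [], "a": ["start"], "b": ["start"], "exit": ["a", "b"]}
```

With two processors the diamond finishes in 3 steps (`a` and `b` run together); with one processor it takes 4 steps, one per vertex.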

Example: Sequential Reduction

Reduction: reduce an array to the sum of its elements.
To find the cost of this reduction, we will only care about the cost of array accesses and the
cost of additions.

How long will it take to execute this DAG with P processors?
Tp(n) >= ceil(n/P) (time depends on the size of the array) and
Tp(n) >= n (the n units of time for the n additions)

The additions must be done sequentially.
Both time conditions must hold.

Because P is always at least one, ceil(n/P) <= n, so the second condition is the binding one.
This means a sequential reduction takes at least n units of time on a PRAM, no matter how many processors are available:

Tp(n) >= n

QUIZ: A Reduction Tree

Assume associativity: (a + b) + c = a + (b + c).
Assume n processors.
Assume addition is done in pairs.

What is the minimum time on a PRAM with P = n processors?
The DAG is executed level by level and each level takes constant time, so all that is needed to
calculate the time is the number of levels: O(log n).
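
A short Python sketch of the pairwise reduction makes the level count concrete. It runs sequentially and only simulates the levels: within one level every addition is independent, so with enough processors each level would take one time unit. The name `tree_reduce` is an assumption for illustration, and the input is assumed to be non-empty.

```python
def tree_reduce(values):
    """Sum a non-empty sequence by adding neighbours in pairs, level by level.

    Returns (total, levels), where `levels` is the number of rounds of
    pairwise additions that were needed.
    """
    level = list(values)
    levels = 0
    while len(level) > 1:
        # All additions in this comprehension are independent of one another.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through unchanged
        level = nxt
        levels += 1
    return level[0], levels
```

For the eight values 1 through 8 this returns `(36, 3)`: three levels, matching log2(8), whereas the sequential reduction of the same array needs a chain of additions whose length grows linearly with n.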

Work and Span

Work = number of vertices in the DAG = W(n)

Span = longest path through the DAG = D(n) = number of vertices on the longest path

Span is also known as the critical path.

T1(n) = W(n)
Tinfinity(n) = D(n)
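
The two definitions translate directly into code. The sketch below assumes a DAG stored as a dict from each vertex to the list of vertices it depends on; `work_and_span` and the two example graphs are hypothetical names introduced here for illustration. It uses plain recursion, so it is only suitable for small graphs.

```python
from functools import lru_cache


def work_and_span(preds):
    """Return (work, span) of a DAG given as vertex -> list of predecessors."""

    @lru_cache(maxsize=None)
    def longest_path_ending_at(v):
        # Number of vertices on the longest path that ends at v.
        return 1 + max((longest_path_ending_at(u) for u in preds[v]), default=0)

    work = len(preds)                                        # every vertex counts once
    span = max(longest_path_ending_at(v) for v in preds)     # critical path length
    return work, span


# Summing four inputs one after another: s3 = ((x0 + x1) + x2) + x3
chain = {"x0": [], "x1": [], "x2": [], "x3": [],
         "s1": ["x0", "x1"], "s2": ["s1", "x2"], "s3": ["s2", "x3"]}

# Summing the same four inputs in pairs: total = (x0 + x1) + (x2 + x3)
tree = {"x0": [], "x1": [], "x2": [], "x3": [],
        "s01": ["x0", "x1"], "s23": ["x2", "x3"], "total": ["s01", "s23"]}
```

Both graphs have the same work (7 vertices), but the chain has span 4 while the tree has span 3; the gap widens as the number of inputs grows.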

QUIZ: Work and Span for Reduction

For the sequential DAG, span = O(n)
For the tree DAG, span = O(log(n))

Basic Work-Span Laws

W(n)/D(n) = the amount of work per critical-path vertex = the average available parallelism in the
DAG.

How many processors for the problem? About W(n)/D(n).

Span Law: Tp(n) >= D(n)
Work Law: Tp(n) >= ceil(W(n)/P)

Tp(n) >= max{Span Law, Work Law} = max{D(n), ceil(W(n)/P)}

Brent's Theorem, Part 1 (Setup)

Is there an upper bound on the time to execute the DAG? Yes, according to Brent's Theorem.

Given a PRAM with P processors.
Break the execution into phases:
1. Each phase has 1 critical path vertex.
2. Non-critical-path vertices in each phase are independent. This means the vertices in the
phase can have edges that enter or exit the phase, but they cannot depend on one
another.
3. Every vertex has to be in some phase, and only one phase.

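How long will it take to execute phase k?
If phase k contains W_k vertices, all independent of one another, then P processors finish it in t_k = ceil(W_k/P) units of time.

QUIZ: Brent's Theorem Aside
Use the following equivalencies

Brent's Theorem, Part 2

The upper bound on the time to execute the DAG is the sum of the phase times:

Tp(n) <= sum over phases k = 1..D of ceil(W_k/P)

which becomes

Tp(n) <= (W - D)/P + D

This is Brent's Theorem. It says:

The time to execute the DAG using P processors is at most the time to execute the
critical path plus the time to execute everything off the critical path using P processors.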
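
The step from the per-phase times to Brent's bound, T_P(n) <= (W - D)/P + D, can be written out as follows. This is a sketch under two assumptions that are not stated in these notes: W_k denotes the number of vertices in phase k (so the W_k sum to W over the D phases), and the equivalence the aside refers to is taken to be the integer identity ceil(a/b) = floor((a - 1)/b) + 1.

```latex
\begin{align*}
T_P(n) &\le \sum_{k=1}^{D} \left\lceil \frac{W_k}{P} \right\rceil
        = \sum_{k=1}^{D} \left( \left\lfloor \frac{W_k - 1}{P} \right\rfloor + 1 \right) \\
       &\le \sum_{k=1}^{D} \left( \frac{W_k - 1}{P} + 1 \right)
        = \frac{W - D}{P} + D
\end{align*}
```

The last equality uses the fact that there are D phases, so the W_k - 1 terms sum to W - D and the ones sum to D.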

**This sets the goal for any scheduler.**

The lower bound max{D(n), ceil(W(n)/P)} and Brent's upper bound are within a factor of 2 of each other.

This implies that you may be able to execute the DAG in less time than Brent predicts, but
never faster than the lower bound.
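
The factor-of-2 claim is easy to check numerically. The helper below is a minimal sketch, with the hypothetical name `time_bounds`, that evaluates the work-span lower bound and Brent's upper bound for given values of work W, span D, and processor count P.

```python
def time_bounds(work, span, p):
    """Return (lower, upper) bounds on the P-processor execution time.

    lower = max(D, ceil(W / P))   from the span law and the work law
    upper = (W - D) / P + D       from Brent's theorem
    """
    ceil_w_over_p = (work + p - 1) // p  # integer ceiling of W / P
    lower = max(span, ceil_w_over_p)
    upper = (work - span) / p + span
    return lower, upper


# Hypothetical example: W = 15, D = 4 on P = 4 processors.
lower, upper = time_bounds(15, 4, 4)
```

For W = 15, D = 4, P = 4 this gives a lower bound of 4 and an upper bound of 6.75. The upper bound never exceeds twice the lower bound, because (W - D)/P + D <= W/P + D <= 2 max{W/P, D}.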

Desiderata: Speedup, Work Optimality, and Weak Scaling

How can we tell if a DAG is good or bad?

Speedup = best sequential time / parallel time = Sp(n) = T*(n)/Tp(n)

T*(n) depends on the work done by the best sequential algorithm.
Tp(n) depends on the work, the span, n, and p.

Ideal Speedup: linear in P (you want the speedup to be linear in the number of processors).

Sp(n) = Theta(p) = Best Sequential Work / Parallel Time = W*(n)/Tp(n)

Use Brent's Theorem to get an upper bound on time.
In the equation below there is still a dependence on n; it is just not shown on the right
side.

Sp(n) >= W* / ((W - D)/P + D) = P / (W/W* + (P - 1)/(W*/D))

P = number of processors.
The denominator is the penalty. To get linear scaling, the denominator
needs to be a constant.

To get a constant in the denominator:
Work Optimality: W(n) = O(W*(n))

Weak Scalability:
P = O(W*/D), equivalently W*/P = Omega(D): the work per processor has to grow proportionally to the
span. Span depends on problem size n.

Recap:

Speedup: linear scaling is the goal.
To achieve linear scaling, the work of the parallel algorithm should match that of the best sequential
algorithm, and the work per processor should grow as a function of n.
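
To see the two conditions at work, the sketch below evaluates the speedup guaranteed by Brent's theorem, W*/((W - D)/P + D), for a work-optimal algorithm at two processor counts. The function name `speedup_lower_bound` and the numbers are hypothetical, chosen only to illustrate the trend.

```python
def speedup_lower_bound(best_seq_work, work, span, p):
    """Speedup guaranteed by Brent's theorem: W* / ((W - D) / P + D)."""
    return best_seq_work / ((work - span) / p + span)


# Hypothetical work-optimal algorithm: W = W* = 1,000,000 and D = 20.
few = speedup_lower_bound(10**6, 10**6, 20, 10)       # P far below W*/D
many = speedup_lower_bound(10**6, 10**6, 20, 10**5)   # P above W*/D = 50,000
```

With 10 processors the guaranteed speedup is just under 10, which is essentially linear. With 100,000 processors, more than W*/D, the guarantee falls to roughly a third of P, because the work per processor (10 operations) is now smaller than the span.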

Basic Concurrency Primitives

The Divide-and-Conquer Scheme

This is the sequential version of the divide-
and-conquer scheme.

Note that the two recursive calls are
independent, and will now be marked with SPAWN.

Spawn is a signal to either the compiler or
the runtime system that the target is an
independent unit of work. The target may be
executed asynchronously from the caller.

SYNC: there is a dependence between a and b and the return statement, because the two results have to be combined.
Sync makes the caller wait for its spawned work before the dependent statements run.

To which spawn does a given sync apply? The sync matches any spawn in the same frame.

Nested Parallelism = there is always an implicit sync before returning to the caller.

The spawn creates two independent paths: one path carries the new work, and one path
continues on after the spawn.

QUIZ: A Subtle Point About Spawns
The above recursive reduction uses two spawns. Are they both necessary? You can eliminate B
but not A.

If you eliminate the A path you eliminate
concurrency; this is bad.

If you eliminate the B path the two
subgraphs can still be executed concurrently.
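
The spawn and sync structure can be imitated with ordinary Python threads: starting a thread plays the role of spawn, and joining it plays the role of sync. This is only a structural sketch. The lesson's own pseudocode is not reproduced in these notes, the names `reduce_sum` and `cutoff` are assumptions, and in CPython the global interpreter lock means these threads will not actually speed up the additions.

```python
import threading


def reduce_sum(a, lo, hi, cutoff=4):
    """Sum a[lo:hi] by divide and conquer with one spawn per call."""
    if hi - lo <= cutoff:
        return sum(a[lo:hi])  # small ranges are summed directly

    mid = (lo + hi) // 2
    result = {}

    def left_half():
        result["a"] = reduce_sum(a, lo, mid, cutoff)

    worker = threading.Thread(target=left_half)
    worker.start()                          # spawn: left half may run asynchronously
    b = reduce_sum(a, mid, hi, cutoff)      # the caller continues with the right half
    worker.join()                           # sync: wait before combining the results
    return result["a"] + b
```

Only the first recursive call is spawned. The second one runs in the caller, which is consistent with the quiz answer: a single spawn already lets the two halves proceed concurrently.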

Basic Analysis of Work and Span
Many of the analysis tools used on sequential algorithms can be used on parallel algorithms.

We want to analyze work and span.

Assume each spawn and sync is a constant-time
operation that can be ignored in the analysis.

Analyzing work is counting total operations; we end up with
linear work, O(n).

Analyzing span: a spawn creates two paths, and the critical
path is the longer of the two paths.
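
The two analyses can be mirrored in a small recursive counter. The sketch assumes the reduction splits its input in half and charges one unit per call; under that assumption work obeys W(n) = W(n/2) + W(n/2) + 1 and span obeys D(n) = max of the two halves + 1. The name `work_span` is hypothetical.

```python
def work_span(n):
    """Return (work, span) of a halving divide-and-conquer reduction on n items.

    Each call is charged one unit; spawn and sync are treated as free.
    """
    if n <= 1:
        return 1, 1
    half = n // 2
    w_left, d_left = work_span(half)
    w_right, d_right = work_span(n - half)
    work = w_left + w_right + 1          # work adds up across both paths
    span = max(d_left, d_right) + 1      # span follows the longer path only
    return work, span
```

For n = 8 this returns `(15, 4)`: work is 2n - 1, which is linear, and span is log2(n) + 1, which is logarithmic.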

Desiderata for Work and Span
The goals of a parallel algorithm designer:
1. Work optimality: achieve work that matches the best sequential algorithm.
2. Find algorithms with polylogarithmic span, D(n) = O(log^k n). This is low span.
This ensures the average available parallelism grows with n.

Concurrency Primitive: Parallel For
All iterations are independent of one another.
A par-for creates n independent subpaths.
The end of a par-for loop includes an implicit sync point.
The work of a par-for is Wparfor(n) = O(n).
The span of a par-for is Dparfor(n) = O(1) in theory, but in practice it will grow with n, especially if n
is really large.

QUIZ: Implementing Par-For
If a par-for is implemented as a loop that spawns one iteration at a time, the DAG executes the spawns sequentially, one after another. This leads to a bottleneck. The
span grows with n. This is bad.

Implementing Par-For, Part 2
Implement par-for as a procedure call (ParForT). This is a better way to implement a parallel for
loop. The span will now grow logarithmically with n.

For the rest of this course, assume the ParForT implementation.

D(n) = O(log n)
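
One common way to obtain logarithmic span is to split the iteration range in half recursively, spawning one half and continuing with the other. The sketch below assumes that this is the idea behind ParForT; the notes do not show its body, so `par_for_tree` is a hypothetical stand-in. It runs sequentially and returns the depth of the recursion, which is what the loop control contributes to the span.

```python
def par_for_tree(lo, hi, body, depth=0):
    """Apply body(i) for lo <= i < hi by recursive halving.

    Returns the depth of the deepest leaf, a stand-in for the span added
    by the loop control (illustrative helper, not ParForT itself).
    """
    if hi <= lo:
        return depth
    if hi - lo == 1:
        body(lo)
        return depth
    mid = (lo + hi) // 2
    # In a real runtime the first half would be spawned, the second half
    # would run in the caller, and a sync would follow both.
    left = par_for_tree(lo, mid, body, depth + 1)
    right = par_for_tree(mid, hi, body, depth + 1)
    return max(left, right)


squares = [0] * 8
depth = par_for_tree(0, 8, lambda i: squares.__setitem__(i, i * i))
```

For 8 iterations the depth is 3, and for 1000 iterations it is 10, compared with a chain of 1000 sequential spawns in the loop-based implementation.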

QUIZ: Matrix-Vector Multiply
If a loop carries a dependence, then it cannot be parallelized with a par-for.

Data Races and Race Conditions
If we look at the nested loops, we
see that in the innermost loop there
are iterations of j that write to the
same y[i].

Data Race = at least one read and one write can happen at the same data location at the same
time.

Race Condition = a data race that causes an error.

**A data race does not always lead to a race condition.**

Vector Notation
t[1:n] <- A[i, 1:n] * x[1:n]   This is a more compact form of the par-for loop.
t[:] <- A[i, :] * x[:]

This can be further reduced to: y[i] <- y[i] + reduce(t)
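
The vector form can be written out in plain Python to show where the race disappears. This is a sequential sketch with hypothetical names (`matvec_accumulate`, nested lists for the matrix): each row first fills its own temporary vector t, and only the final reduction touches y[i], so no two iterations of j write to the same location.

```python
def matvec_accumulate(A, x, y):
    """Compute y[i] <- y[i] + sum over j of A[i][j] * x[j] for every row i."""
    n = len(x)
    for i in range(len(A)):                      # rows are independent: par-for candidate
        t = [A[i][j] * x[j] for j in range(n)]   # t[:] <- A[i, :] * x[:], one slot per j
        y[i] = y[i] + sum(t)                     # y[i] <- y[i] + reduce(t)
    return y


# Small hypothetical example.
result = matvec_accumulate([[1, 2], [3, 4]], [1, 1], [10, 20])
```

Here `result` is `[13, 27]`. The elementwise products are independent and could be computed by a par-for, while the sum over t is a reduction, which the reduction tree handles in logarithmic span.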
