Lesson 3-3: I/O-Avoiding Algorithms
Sense of Scale
Goal of lesson: develop a lower bound for the amount of communication needed to sort on a machine with slow and fast memory.
n is the number of data points
Z is the size of the fast memory (in words)
L is the number of words per transfer
Quiz Information:
Given:
Volume of data to sort: r*n = 1 PiB ($2^{50}$ bytes)
Record (item) size: r = 256 bytes
Fast memory size: r*Z = 64 GiB ($2^{36}$ bytes)
Memory transfer size: r*L = 32 KiB ($2^{15}$ bytes)
The number of transfer operations:
Use: $r = 2^{8}$ B, $n = 2^{42}$ records, $Z = 2^{28}$ records, $L = 2^{7}$ records
$n \log_2 n$ = 185T ops
$n \log_2 (n/L)$ = 154T ops
$n$ = 4.40T ops
$(n/L) \log_2 (n/L)$ = 1.20T ops
$(n/L) \log_2 (n/Z)$ = 0.481T ops
$(n/L) \log_{Z/L} (n/L)$ = 0.0573T ops
Note the improvements relative to the baseline ($n \log_2 n$):
A big improvement comes from reducing n to n/L. This means transferring the data in L-sized fragments.
Another big improvement comes from going from base 2 to base Z/L. This improvement comes from the capacity of fast memory (Z).
Handy log factoid:
$\log_{Z/L} x = (\log_2 x) / (\log_2 (Z/L))$
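As a rough check of the quiz estimates, the following Python sketch recomputes each expression from the given parameters, using the change-of-base factoid for the base-Z/L logarithm. The helper name `log_base` and the dictionary `estimates` are illustrative choices, not part of the lesson.

```python
import math

# Quiz parameters, converted from bytes to records.
r = 2**8            # record size in bytes
n = 2**50 // r      # number of records to sort
Z = 2**36 // r      # fast memory capacity, in records
L = 2**15 // r      # records moved per transfer

def log_base(x, base):
    """Change of base: log_base(x) = log2(x) / log2(base)."""
    return math.log2(x) / math.log2(base)

# Each entry is one of the operation counts listed in the quiz.
estimates = {
    "n log2 n": n * math.log2(n),
    "n log2 (n/L)": n * math.log2(n / L),
    "n": float(n),
    "(n/L) log2 (n/L)": (n / L) * math.log2(n / L),
    "(n/L) log2 (n/Z)": (n / L) * math.log2(n / Z),
    "(n/L) log_{Z/L} (n/L)": (n / L) * log_base(n / L, Z / L),
}

for name, ops in estimates.items():
    print(f"{name:24s} {ops / 1e12:9.4f} T ops")
```

Working in records rather than bytes keeps every quantity an exact power of two, so the logarithms come out as whole numbers or simple fractions.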
External Memory Merge Sort
Let's sort n elements in a two-level memory hierarchy.
Phase 1:
Assume the processor is sequential.
1. Divide the n elements into chunks of size fZ, where 0 < f ≤ 1 is the fraction of fast memory a chunk may occupy, so that each chunk fits into fast memory.
2. Read a chunk from slow memory to fast memory.
3. Now sort this chunk; call the result a run.
4. Write the run back to slow memory.
5. Repeat for each chunk of data.
6. You will end up with n/(fZ) sorted runs.
Phase 1:
    Partition input into n/(fZ) chunks
    for each chunk i = 1 to n/(fZ) do
        Read chunk i
        Sort it into a run
        Write run i
Phase 2:
    Merge the n/(fZ) runs into a single run
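The two phases can be sketched in Python as follows. This is only a simulation under stated assumptions: "slow memory" is an ordinary list, `chunk_size` stands in for fZ, and Phase 2 uses the standard library's `heapq.merge` as a placeholder for whatever merging scheme is chosen. The function names `form_runs` and `external_sort` are hypothetical.

```python
import heapq

def form_runs(data, chunk_size):
    """Phase 1: sort chunk_size items at a time, producing sorted runs."""
    runs = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]   # "read" a chunk into fast memory
        chunk.sort()                             # sort it into a run
        runs.append(chunk)                       # "write" the run back to slow memory
    return runs

def external_sort(data, chunk_size):
    """Phase 1 followed by Phase 2 (merge all runs into a single run)."""
    runs = form_runs(data, chunk_size)
    # Placeholder merge: heapq.merge lazily merges already-sorted inputs.
    return list(heapq.merge(*runs))

if __name__ == "__main__":
    print(external_sort([5, 3, 8, 1, 9, 2, 7], chunk_size=3))
```

The sketch ignores transfer granularity entirely; it only shows the run-formation structure that the cost analysis refers to.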
Partitioned Sorting Step Analysis
(Phase 1 from above)
Count the asymptotic cost of each step to obtain a total asymptotic cost.
Read chunk i: O(n/L) transfers
Sort it into a run: O(n log Z) comparisons
Write run i: O(n/L) transfers
To derive the values:
Assume L divides fZ, fZ divides n, and the sort is an optimal comparison sort.
Read chunk i: (fZ/L) * (n/(fZ)) = O(n/L) transfers
Sort it (the number of comparisons): O(fZ log(fZ)) * (n/(fZ)) = $O(n \log_2 Z)$
Write run i: O(n/L) transfers
Overall, Phase 1 costs a number of transfers proportional to n/L.
Two-Way External Memory Merging
Assume m runs, each of size s.
The number of items is n = m*s.
Now we must merge all of the sorted runs into a single sorted run.
One method: merge pairs of runs, then pairs of the pairs, etc.
If we follow this method, the run size doubles with each level of merging: $s, 2s, \ldots, 2^{k-1}s, 2^{k}s$.
To visualize this:
We have two runs, each of size $2^{k-1}s$, in slow memory.
Goal: merge the two runs into a new run C of size $2^{k}s$.
To do this:
In fast memory there will be three buffers, each holding L items (recall L is the transfer size).
The first two buffers, A^ and B^, will hold blocks of the two runs A and B. The third buffer, C^, will hold the combined, sorted output.
To begin:
1. Move an L-sized block from A and one from B into the fast-memory buffers A^ and B^.
2. Merge A^ and B^ into C^.
Read L-sized blocks of A, B and store in A^ and B^
while any unmerged items in A & B do
    merge A^, B^ into C^ as far as possible
    if A^ or B^ is empty then read more
    if C^ is full then flush
Flush any unmerged items in A or B
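The buffered merge described above can be simulated in Python. In this sketch the runs A and B are plain lists standing in for slow memory, each slice of up to L items counts as one read transfer, and each flush of the output buffer counts as one write transfer. The function name `two_way_merge` and the transfer counter are illustrative assumptions.

```python
def two_way_merge(A, B, L):
    """Merge sorted lists A and B with L-item buffers; return (C, transfers)."""
    C, transfers = [], 0
    a_pos = b_pos = 0                   # next unread index in each slow-memory run
    a_buf, b_buf, c_buf = [], [], []    # the three fast-memory buffers
    a_i = b_i = 0                       # cursors inside the input buffers

    while True:
        # Refill an input buffer when it is exhausted and its run has more data.
        if a_i == len(a_buf) and a_pos < len(A):
            a_buf = A[a_pos:a_pos + L]
            a_pos += len(a_buf)
            a_i = 0
            transfers += 1
        if b_i == len(b_buf) and b_pos < len(B):
            b_buf = B[b_pos:b_pos + L]
            b_pos += len(b_buf)
            b_i = 0
            transfers += 1

        a_has, b_has = a_i < len(a_buf), b_i < len(b_buf)
        if not a_has and not b_has:
            break                       # both runs fully merged

        # Move the smaller front item into the output buffer.
        if a_has and (not b_has or a_buf[a_i] <= b_buf[b_i]):
            c_buf.append(a_buf[a_i])
            a_i += 1
        else:
            c_buf.append(b_buf[b_i])
            b_i += 1

        if len(c_buf) == L:             # output buffer full: flush to slow memory
            C.extend(c_buf)
            c_buf = []
            transfers += 1

    if c_buf:                           # flush the final partial block
        C.extend(c_buf)
        transfers += 1
    return C, transfers
```

For example, `two_way_merge([1, 4, 7, 10], [2, 3, 8, 9], 2)` reads four blocks and writes four blocks, so every input item is loaded once and every output block is written once.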
What is the cost of this?
This scheme only loads items from A or B once:
Transfers = $(2^{k-1}s)/L + (2^{k-1}s)/L$
It only writes a given output block once: $(2^{k}s)/L$
So transfers = $(2^{k-1}s)/L + (2^{k-1}s)/L + (2^{k}s)/L = (2^{k+1}s)/L$
Number of comparisons: $\Theta(2^{k}s)$
Number of pairs merged at level k = $n/(2^{k}s)$
Number of levels = $\log_2 (n/s)$
Total number of transfers is: $2(n/L) \log_2 (n/s)$
Total number of comparisons is: $\Theta(n \log_2 (n/s))$
The question to ask is: is this good or bad?
External Memory Merge Sort
Merge sort in external memory:
Phase 1:
Partition the input into $\Theta(n/Z)$ chunks
Sort each chunk, producing $\Theta(n/Z)$ runs of size Z each
Phase 2:
Merge all runs using a 2-way merge
What are the asymptotic costs?
Phase 1:
Comparisons = $O(n \log_2 Z)$
Transfers = $O(n/L)$
Phase 2:
Comparisons = $O(n \log_2 (n/Z))$
Transfers = $O((n/L) \log_2 (n/Z))$
Total:
Comparisons: $O(n \log_2 Z + n \log_2 (n/Z)) = O(n \log_2 n)$ (this is good news, the scheme is work optimal)
Transfers: $O((n/L) \log_2 (n/Z))$
The lower bound is: $\Omega((n/L) \log_{Z/L} (n/L))$
What's Wrong with 2-Way Merging
The number of transfers in external memory merge sort with 2-way merging:
$Q(n; Z, L) = O((n/L) \log_2 (n/Z)) = O((n/L) [\log_2 (n/L) - \log_2 (Z/L)])$
The lower bound:
$Q(n; Z, L) = \Omega((n/L) \log_{Z/L} (n/L)) = \Omega((n/L) \cdot \log_2 (n/L) / \log_2 (Z/L))$
The difference between the two can be quite large, depending on the system.
Why doesn't 2-way merge do better at achieving lower costs?
Two-way merging is not good at utilizing the capacity of fast memory (Z). The merging procedure only works on pairs of arrays at a time and uses blocks of size L. So two-way merge is sensitive to L but not sensitive to Z.
You should be able to fix this issue.
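To get a feel for the size of this gap, here is a worked example in LaTeX using the quiz parameters from the start of the lesson, taking $n = 2^{42}$ records (1 PiB of 256-byte records), $Z = 2^{28}$ and $L = 2^{7}$. Constant factors hidden by the asymptotic notation are ignored, so the ratio is only indicative.

```latex
\begin{align*}
\text{parameters:} \quad & n/L = 2^{35}, \qquad n/Z = 2^{14}, \qquad Z/L = 2^{21} \\
\text{2-way merge sort:} \quad & (n/L)\,\log_2(n/Z) = 14\,(n/L) \\
\text{lower bound:} \quad & (n/L)\,\frac{\log_2(n/L)}{\log_2(Z/L)} = \frac{35}{21}\,(n/L) = \frac{5}{3}\,(n/L) \\
\text{ratio:} \quad & \frac{14}{5/3} = 8.4
\end{align*}
```

On this hypothetical machine, 2-way merging performs roughly eight times more transfers than the lower bound permits.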
Multiway Merging
To do better than 2-way merging, let's merge a lot of runs at once.
Assume:
K runs, stored in slow memory and sorted in ascending order
K+1 blocks will fit in fast memory: $(K+1)L \le Z$ (Z is the size of fast memory)
(this allows one block for each input run and 1 block for the output)
1. Load the fast memory with K blocks, one from each run.
2. Find the smallest value among all the blocks and move it to the output block (block K+1).
   The smallest value can be found in a number of ways; a linear scan or a min-heap are two possibilities. The linear scan will work if K is small.
   To use a priority queue (or min-heap):
   1. load the blocks
   2. build the heap (cost: O(K) operations)
   3. extractMin (cost: O(log K) operations)
   4. insert (cost: O(log K) operations)
   These are all fast memory operations, so we can count them as comparisons.
3. When the output block is filled, flush it by writing it to slow memory.
4. If an input block is empty, refill it.
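The steps above can be sketched in Python with the standard `heapq` module playing the role of the min-heap. As assumptions for illustration: runs are plain lists standing in for slow memory, each run gets one L-item input buffer, and a counter tallies block reads and writes. The name `k_way_merge` and its return shape are hypothetical.

```python
import heapq

def k_way_merge(runs, L):
    """Merge sorted runs with one L-item buffer per run; return (out, transfers)."""
    K = len(runs)
    pos = [0] * K                   # next unread index in each slow-memory run
    bufs = [[] for _ in range(K)]   # per-run input buffers in fast memory
    cur = [0] * K                   # cursor inside each input buffer
    transfers = 0
    out, out_buf = [], []

    def refill(j):
        """Load the next block of run j; return False if the run is exhausted."""
        nonlocal transfers
        bufs[j] = runs[j][pos[j]:pos[j] + L]
        pos[j] += len(bufs[j])
        cur[j] = 0
        if bufs[j]:
            transfers += 1
        return bool(bufs[j])

    # Load one block per run and build the heap from the first item of each.
    heap = []
    for j in range(K):
        if refill(j):
            heap.append((bufs[j][0], j))
            cur[j] = 1
    heapq.heapify(heap)

    while heap:
        value, j = heapq.heappop(heap)          # extractMin
        out_buf.append(value)
        if len(out_buf) == L:                   # output block full: flush it
            out.extend(out_buf)
            out_buf = []
            transfers += 1
        # Insert the next item of run j, refilling its buffer if it is empty.
        if cur[j] < len(bufs[j]) or refill(j):
            heapq.heappush(heap, (bufs[j][cur[j]], j))
            cur[j] += 1

    if out_buf:                                 # flush the final partial block
        out.extend(out_buf)
        transfers += 1
    return out, transfers
```

Storing `(value, run_index)` pairs in the heap lets the merge know which buffer to advance after each extraction, at the cost of breaking ties by run index.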
What is the cost of a K-way merge?
Transfers: read each input block once and write each output block once: 2Ks/L
Comparisons: an initial cost to build the heap, then each item is either inserted or extracted: O(K + Ks log K) for a single K-way merge
Cost of Multiway Merge
The initial input has n elements, divided into sorted runs with Z items each.
If you do K-way merges, the number of comparisons = O(n log n).
What is the total number of asymptotic memory transfers?
Assume $K = \Theta(Z/L) < Z/L$
$l = \Theta(\log_{Z/L} (n/L))$ (l is the maximum number of levels in the merge tree)
Transfers per run at level i = $\Theta(K^{i} s/L)$
# of runs at level i = $n/(K^{i} s)$
Total transfers at level i = $\Theta(n/L)$
# of levels = $\Theta(\log_{Z/L} (n/L))$
Total transfers = (total transfers at level i) * (# of levels) = $O((n/L) \log_{Z/L} (n/L))$
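To illustrate the effect of the merge fan-in on the number of levels, the Python sketch below counts merge passes for the quiz parameters from the start of the lesson ($n = 2^{42}$, $Z = 2^{28}$, $L = 2^{7}$ records). It assumes initial runs of Z items, that each pass reads and writes every block exactly once, and a fan-in of K = Z/L - 1 for the multiway case; `merge_passes` is a hypothetical helper.

```python
def merge_passes(num_runs, K):
    """Count K-way merge passes needed to reduce num_runs runs to one."""
    passes = 0
    while num_runs > 1:
        num_runs = -(-num_runs // K)   # ceiling division: runs left after one pass
        passes += 1
    return passes

n, Z, L = 2**42, 2**28, 2**7
initial_runs = n // Z                  # runs produced by Phase 1
blocks = n // L                        # L-sized blocks in the whole data set

for label, K in [("2-way", 2), ("multiway", Z // L - 1)]:
    passes = merge_passes(initial_runs, K)
    transfers = 2 * blocks * passes    # one read and one write per block per pass
    print(f"{label:9s} K={K:<8d} passes={passes:<3d} transfers={transfers:.3e}")
```

With these numbers the fan-in exceeds the number of initial runs, so the multiway merge finishes in a single pass while the 2-way merge needs one pass per doubling of the run size.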
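A Lower Bound on External Memory Sorting
The multiway merge cost is very good: it matches the lower bound, which can be argued as follows.
# of possible orderings: n!
# of orderings still possible after t-1 transfers: K(t-1), so K(0) = n!
Now, after a certain number of transfers, you read L items from slow memory into fast memory.
There are L! ways of ordering these new items.
So now you have Z-L old items (already ordered) and L new items.
How many ways can these be ordered? $\le \binom{Z}{L} L!$
After t reads, the lower bound on the number of orderings: $K(t) \ge K(t-1) / [\binom{Z}{L} L!]$, and therefore $K(t) \ge n! / [\binom{Z}{L} L!]^{t}$
This is conservative: it assumes the L items are unordered. If the L items have been read before, their relative order is already known, so the L! factor is not needed for that read.
Now answer the question: when does only 1 ordering remain?
This gives the lower bound on the number of transfers, $\Omega((n/L) \log_{Z/L} (n/L))$.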
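A sketch of how the counting argument leads to a bound of the form $(n/L) \log_{Z/L}(n/L)$ is given below in LaTeX. It relies on two assumptions that are not spelled out in the notes: only the n/L reads that bring in a block for the first time pay the L! factor, and the factorial and binomial terms are replaced by standard rough approximations with constants ignored.

```latex
\begin{align*}
K(t) &\ge \frac{n!}{\binom{Z}{L}^{t}\,(L!)^{n/L}}
  && \text{(assumed refinement: $L!$ paid only on the $n/L$ first-time reads)} \\
K(t) \le 1 &\implies t \ge \frac{\log_2 n! - (n/L)\,\log_2 L!}{\log_2 \binom{Z}{L}}
  && \text{(only one ordering remains)} \\
\log_2 n! &\approx n \log_2 n, \quad \log_2 L! \approx L \log_2 L, \quad
\log_2 \tbinom{Z}{L} \approx L \log_2 (Z/L)
  && \text{(rough approximations)} \\
t &= \Omega\!\left( \frac{n \log_2 n - n \log_2 L}{L \log_2 (Z/L)} \right)
   = \Omega\!\left( \frac{n}{L}\,\log_{Z/L} \frac{n}{L} \right)
\end{align*}
```

The last step uses $n \log_2 n - n \log_2 L = n \log_2 (n/L)$ together with the change-of-base identity for logarithms.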