0% found this document useful (0 votes)
76 views

Lesson3 3IOAvoidingAlgorithms PDF

The document describes algorithms for sorting data that exceeds the capacity of fast memory by using a two-level memory hierarchy. It proposes an external merge sort algorithm that partitions the data into chunks that fit in fast memory, sorts each chunk, then merges the sorted chunks using a multi-way merging technique. This achieves optimal comparison complexity of O(n log n) while minimizing data transfers between fast and slow memory to O(n/L log(n/L, Z/L)).

Uploaded by

Projectlouis
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
76 views

Lesson3 3IOAvoidingAlgorithms PDF

The document describes algorithms for sorting data that exceeds the capacity of fast memory by using a two-level memory hierarchy. It proposes an external merge sort algorithm that partitions the data into chunks that fit in fast memory, sorts each chunk, then merges the sorted chunks using a multi-way merging technique. This achieves optimal comparison complexity of O(n log n) while minimizing data transfers between fast and slow memory to O(n/L log(n/L, Z/L)).

Uploaded by

Projectlouis
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 8

Lesson33I/OAvoidingAlgorithms

SenseofScale

GoalofLesson:developalowerboundfortheamountofcommunicationtosortonamachine
withslowandfastmemory.

nisthenumberofdatapoints
Zisthesizeofthefastmemory(inwords)
Listhenumberofwordspertransfer

QuizInformation:

Given:
50
volumeofdatatosort:r*n=1PiB(2
Bytes)

record(item)size:r=256Bytes

36
Fastmemorysze:r*Z=64GiB(2
Bytes)

15
Memorytransfersize:r*L=32KiB(2
Bytes)

Thenumberoftransferoperations:
8
43
28
7
Use:r=2
B,n=2
records,Z=2
records,L=2
records

nlog
n=185T
2
ops
nlog
(n/L)=154T
2
ops
n=4.40T

ops
n/Llog
(n/L)=1.20T
2
ops
n/Llog
(n/Z)=0.275T
2
ops
n/Llog
(n/L)=0.0523T
Z/L
ops

Notetheimprovementsrelativetothebaseline(nlog
n):
2

Abigimprovementcomesfromreducingnton/L.ThismeanstransferringthedatainLsized
fragments.

Anotherbigimprovementcomesfromgoingfrombase2tobaseZ/L.Thisimprovementcomes
fromthecapacityoffastmemory(Z).


Handylogfactoid:
log
x=(log
x)/(log
Z/L)
Z/L
2
2

ExternalMemoryMergeSort

Letssortnelementsinatwolevelmemoryhierarchy.

Phase1:
Assumeprocessorissequential.
1. DividethenelementsintochunksofsizeZ,sothateachchunkfitsintofastmemory.
2. Readachunkfromslowmemorytofastmemory.
3. Nowsortthischunk,callitarun.
4. Writethechunkbacktoslowmemory.
5. Repeatforeachchunkofdata.
6. Youwillendupwithn/(fZ)chunksofsortedruns.

Phase1:
Partitioninputinton/(fZ)chunks
foreachchunki1ton/(fZ)do
Readchunki
Sortitintoarun
Writeruni

Phase2:
Mergethen/(fZ)runsintoasinglerun

PartitionedSortingStepAnalysis
(Phase1fromabove)

Countthenumberofasymptotictransfersateachsteptoobtainatotalasymptoticcost.
Readchunki

O(n/L)transfers
Sortitintoarun

O(n*log(Z))transfers
Writeruni

O(n/L)transfers

Toderivethevalues:
Assume:L|(fZ)and(fZ)|n+optimalcomparisonsort

Readchunki(fZ/L)*(n/(f*Z))=O(n/L)
Sortit(thenumberofcomparisons)O(fZlog(fZ))*n/fZ=O(nlog
Z)
2
WriteruniO(n/L)


Theschemeisgivingussomethingproportionalton/Ztransactions

TwoWayExternalMemoryMerging

Assumemrunsofsizes.
Thenumberofitemsisn=m*s
Nowwemustmergeallofthesortedrunsintoasinglesortedrun

Onemethod:mergerpairsofruns,thenpairsofthepairs,etc.
Ifwefollowthismethod.
k1
k
Foreachmerge,therunsizegrows.s,2s,,2
s,2
s

Tovisualizethis:

k1
Wehavetworunseachofsize2
sinslowmemory.
k
GoaltomergethetworunsintoanewrunCofsize2
s
Todothis:
Infastmemorytherewillbethreebuffers,eachholdingLitems(recallListhetransaction
size)
WellusethefirsttwobufferswillholdthetworunsAandB.Thethirdbuffer,C,willholdthe
combined,sortedrun.

Tobegin:
1. MoveanLsizeblockfromAandonefromB.StoretheminthefastmemoryA^and
B^.
2. SortA^andB^intoC^
ReadLsizedblocksofA,BandstoreinA^andB^
whileanyunmergeditemsinA&Bdo
mergeA^,B^C^aspossible
ifA^orB^emptythenreadmore
ifC^fullthenflush
FlushanyunmergedinAorB

Whatisthecostofthis?
ThisschemeonlyloadsitemsfromAorBonce
k1
k1
Transfers=(2
s)/L+(2
s)/L

k
Itonlywritesagivenoutputblockonce(2
s)/L

k1
k1
k
k+1
SoTransfers=(2
s)/L+(2
s)/L+(2
s)/L=(2
s)/L

k
Numberofcomparisons (2
s)

k
Numberofpairsmergedatlevelk=n/(2
s)
Numberoflevels=log
(n/s)
2

Totalnumberoftransfersis:2(n/L)log
(n/s)
2
Totalnumberofcomparisonsis: (nlog
n/s)
2

Thequestiontoaskisisthisgoodorbad?


ExternalMemoryMergeSort

MergeSortinExternalMemory:

Phase1:
Partitioninputinto (n/Z)chunks
Sorteachchunk,producing (n/Z)runsofsizeZeach

Phase2:
Mergeallrunsusinga2waymerge

Whataretheasymptoticcosts:

Phase1:
Comparisons=O(nlog
Z)
2
Transfers=O(n/L)

Phase2:
Comparisons=O(nlog
n/Z)
2
Transfers=O(n/Llog
n/Z)
2

Total:
Comparisons:O(nlog(n))=nlogn(thisisgoodnews,theschemeisworkoptimal)
Transfers:O(n/L*log(n/Z))

Thelowerboundis:~n/Llog
(n/L)
Z/L

WhatsWrongwith2WayMerging

Thenumberoftransfersinexternalmemorymergesortwith2waymerging:

Q(nZ,L)=O(n/Llog
(n/Z))=O(n/L[log
(n/L)log
(Z/L)])
2
2
2

Thelowerbound:

Q(nZ,L)= (n/Llog
(n/L))= (n/L(log
(n/L))/(log
(Z/L))
Z/L
2
2

Thisdifferencecanbequitehigh,dependingon
thesystem.

Whydoesnt2waymergedoabetteratachievinglowercosts?
Twowaymergingisnotgoodatutilizingfastmemory(Z)capacity.Themergingprocedureonly
worksonpairsofarraysatatimeandusesablockofsizeL.Sotwowaymergeissensitiveto
LbutnotsensitivetoZ.

Youshouldbeabletofixthisissue.

MultiwayMerging
Todobetterthan2waymerging,letsmergealotofrunsatonce.

Assume:
Kruns,storedinslowmemoryandsortedinascendingorder
K+1blockswillfitinfastmemory,(K+1)L Z(Zissizeoffastmemory)
(thiswillallowoneblockforeachinputand1blockfortheoutput)

1.Loadthefastmemorywithkblocks.
2.Nowfindthesmallestvalueofalltheblocks,moveittotheoutputblock(thek+1block).
Thesmallestvaluecanbefoundanumberofwaysalinearscanorminheaparetwo
possibilities.ThelinearscanwillworkifKissmall.
Touseapriorityqueue(orminheap)
1.loadtheblocks
2.buildtheheap(costO(K)operations)
3.extractMin(costO(logK)operations)
4.insert(costO(logk)operations)
Theseareallfastmemoryoperationssowecancountthemascomparisons.
3.Whentheoutputblockiffilled,flushitbywritingittoslowmemory.
4.Ifaninputblockisempty,refillit.

WhatisthecostofaKwaymerge?

Transfers:readinputblocksonce,writeblocksonce:2Ks/L

Comparisons:initialcosttobuildtheheap,theneachitemiseitherinsertedorextracted

O(K+KslogK)..forasinglekwaymerge

CostofMultiwayMerge

Initialinputhasnelements,dividedintosortedruns,withzitemseach.
Ifyoudokwaymerges,thenumberofcomparisons=nlog(n)

Whatisthetotalnumberofasymptoticmemorytransfers?
Assumek= (z/L)<z/L

l= (log
n/L)(lismaximumnumberofmergetrees)
z/L

i
Transfersperrunati= (K
s/L)
i
#ofrunsatleveli=n/K
s
Totaltransfersatleveli= (n/L)
#oflevels= (log
n/L)
z/L

Whatisthetotalnumberofasymptoticmemorytransfers?
Totaltransfersatleveli*#oflevels=O(n/L*log(n/L,Z/L)

ALowerBoundonExternalMemorySorting

Mergesortwith (Z/L) waymerges:Q(nZ,L)= (n/Llog


(n/L))
Z/L

Thisisverygood.

#ofpossibleorderings:n!
#orderingsaftert1transfers:K(t1)
soK(0)=n!

Now,afteracertainnumberoforderings,yourealfromslowmemorytofastmemoryLitems.
SothereareL!waysoforderingthisnewitems.
SonowyouhaveZLolditems(alreadyordered)andLnewitems.
Howmanywayscanthesebeordered? (ZchooseL) L!

t
Aftertreads,thelowerboundonthenumberoforderings:K(t) K(t1)/[(ZchooseL)L!]

Thisisconservative,itassumesLisunordered.
IfLhasbeenreadbefore.

Nowanswerthequestion,whendoesonly1orderingremain?
Thisisthelowerboundonthenumberoftransfers:

You might also like