Chapter 13: Query Processing

This document discusses different algorithms for query processing operations like selection, sorting, and joins in a database. It provides cost estimates for each algorithm in terms of disk block accesses. For selection, it describes linear scans, binary searches, and index scans. Sorting can be done with external merge sort. Join algorithms include nested loops, block nested loops, and hash and merge joins. The goal is to choose the most efficient algorithm based on statistics about the relations and available indexes.

Uploaded by

Karan Jain
Copyright
© Attribution Non-Commercial (BY-NC)

13.1
Chapter 13: Query Processing
Overview
Measures of Query Cost
Selection Operation
Sorting
Join Operation
Other Operations
Evaluation of Expressions
13.2
Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
13.3
Basic Steps in Query Processing (Cont.)
Parsing and translation
Translate the query into its internal form. This is then translated into relational algebra.
Parser checks syntax, verifies relations
Evaluation
The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.

13.4
Basic Steps in Query Processing: Optimization
A relational algebra expression may have many equivalent expressions
E.g., $\sigma_{balance<2500}(\Pi_{balance}(account))$ is equivalent to $\Pi_{balance}(\sigma_{balance<2500}(account))$
Each relational algebra operation can be evaluated using one of several different algorithms
Correspondingly, a relational-algebra expression can be evaluated in many ways.
An annotated expression specifying a detailed evaluation strategy is called an evaluation plan.
E.g., can use an index on balance to find accounts with balance < 2500, or can perform a complete relation scan and discard accounts with balance $\geq$ 2500
13.5
Basic Steps: Optimization (Cont.)
Query Optimization: Amongst all equivalent evaluation plans choose the one with lowest cost.
Cost is estimated using statistical information from the database catalog
e.g. number of tuples in each relation, size of tuples, etc.
In this chapter we study
How to measure query costs
Algorithms for evaluating relational algebra operations
How to combine algorithms for individual operations in order to evaluate a complete expression
In Chapter 14
We study how to optimize queries, that is, how to find an evaluation plan with lowest estimated cost

13.6
Measures of Query Cost
Cost is generally measured as total elapsed time for answering a query
Many factors contribute to time cost: disk accesses, CPU, or even network communication
Typically disk access is the predominant cost, and is also relatively easy to estimate. It is measured by taking into account
Number of seeks * average-seek-cost
Number of blocks read * average-block-read-cost
Number of blocks written * average-block-write-cost
Cost to write a block is greater than cost to read a block: data is read back after being written to ensure that the write was successful
13.7
Measures of Query Cost (Cont.)
For simplicity we just use the number of block transfers from disk as the cost measure
We ignore the difference in cost between sequential and random I/O for simplicity
We also ignore CPU costs for simplicity
Costs depend on the size of the buffer in main memory
Having more memory reduces the need for disk access
The amount of real memory available to the buffer depends on other concurrent OS processes, and is hard to determine ahead of actual execution
We often use worst-case estimates, assuming only the minimum amount of memory needed for the operation is available
Real systems take CPU cost into account, differentiate between sequential and random I/O, and take buffer size into account
We do not include the cost of writing output to disk in our cost formulae

13.8
Selection Operation
File scan: search algorithms that locate and retrieve records that fulfill a selection condition.
Algorithm A1 (linear search). Scan each file block and test all records to see whether they satisfy the selection condition.
Cost estimate (number of disk blocks scanned) = $b_r$
$b_r$ denotes the number of blocks containing records from relation r
If selection is on a key attribute, average cost = $b_r / 2$ (stop on finding the record)
Linear search can be applied regardless of
selection condition, or
ordering of records in the file, or
availability of indices

13.9
Selection Operation (Cont.)
A2 (binary search). Applicable if selection is an equality comparison on the attribute on which the file is ordered.
Assume that the blocks of a relation are stored contiguously
Cost estimate (number of disk blocks to be scanned):
$\lceil \log_2(b_r) \rceil$ (cost of locating the first tuple by a binary search on the blocks)
Plus the number of blocks containing records that satisfy the selection condition
Will see how to estimate this cost in Chapter 14
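The A1 and A2 estimates can be sketched in Python as follows. This is only an illustration of the formulas on the slides; the function names and the `matching_blocks` parameter (the number of blocks holding qualifying records) are assumptions made for the example.

```python
import math

def linear_search_cost(b_r, on_key=False):
    """A1: scan all b_r blocks; on a key attribute the scan stops at the
    matching record, so about half the blocks are read on average."""
    return b_r / 2 if on_key else b_r

def binary_search_cost(b_r, matching_blocks=1):
    """A2: ceil(log2(b_r)) block reads to locate the first tuple, plus the
    number of blocks containing records that satisfy the condition."""
    return math.ceil(math.log2(b_r)) + matching_blocks
```

For a relation of 400 blocks, the linear scan reads 400 blocks (200 on average for a key attribute), while the binary search needs 9 block reads to locate the first tuple.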

13.10
Selections Using Indices
Index scan: search algorithms that use an index
The selection condition must be on the search-key of the index.
A3 (primary index on candidate key, equality). Retrieve a single record that satisfies the corresponding equality condition
Cost = $HT_i + 1$ (where $HT_i$ is the height of index i)
A4 (primary index on nonkey, equality). Retrieve multiple records.
Records will be on consecutive blocks
Cost = $HT_i$ + number of blocks containing retrieved records
A5 (equality on search-key of secondary index).
Retrieve a single record if the search-key is a candidate key
Cost = $HT_i + 1$
Retrieve multiple records if search-key is not a candidate key
Cost = $HT_i$ + number of records retrieved
Can be very expensive!
Each record may be on a different block
One block access for each retrieved record
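A small Python sketch of the A3 to A5 estimates, assuming `ht_i` is the index height; the function and parameter names are illustrative, not taken from the slides.

```python
def primary_index_key_cost(ht_i):
    """A3: traverse the index, then read the single matching block."""
    return ht_i + 1

def primary_index_nonkey_cost(ht_i, matching_blocks):
    """A4: matching records lie on consecutive blocks."""
    return ht_i + matching_blocks

def secondary_index_cost(ht_i, matching_records, candidate_key=False):
    """A5: one block access per retrieved record when the search-key
    is not a candidate key."""
    return ht_i + (1 if candidate_key else matching_records)
```

With an index of height 4, fetching 500 records through a secondary index on a non-key attribute is estimated at 504 block accesses, which already exceeds a linear scan of a 400-block relation.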
13.11
Selections Involving Comparisons
Can implement selections of the form $\sigma_{A \leq V}(r)$ or $\sigma_{A \geq V}(r)$ by using
a linear file scan or binary search,
or by using indices in the following ways:
A6 (primary index, comparison). (Relation is sorted on A)
For $\sigma_{A \geq V}(r)$ use index to find first tuple $\geq$ v and scan relation sequentially from there
For $\sigma_{A \leq V}(r)$ just scan relation sequentially till first tuple > v; do not use index
A7 (secondary index, comparison).
For $\sigma_{A \geq V}(r)$ use index to find first index entry $\geq$ v and scan index sequentially from there, to find pointers to records.
For $\sigma_{A \leq V}(r)$ just scan leaf pages of index finding pointers to records, till first entry > v
In either case, retrieve records that are pointed to
Requires an I/O for each record
Linear file scan may be cheaper if many records are to be fetched!
13.12
Implementation of Complex Selections
Conjunction: $\sigma_{\theta_1 \wedge \theta_2 \wedge \ldots \wedge \theta_n}(r)$
A8 (conjunctive selection using one index).
Select a combination of $\theta_i$ and algorithms A1 through A7 that results in the least cost for $\sigma_{\theta_i}(r)$.
Test other conditions on tuple after fetching it into memory buffer.
A9 (conjunctive selection using multiple-key index).
Use appropriate composite (multiple-key) index if available.
A10 (conjunctive selection by intersection of identifiers).
Requires indices with record pointers.
Use corresponding index for each condition, and take intersection of all the obtained sets of record pointers.
Then fetch records from file
If some conditions do not have appropriate indices, apply test in memory.
13.13
Algorithms for Complex Selections
Disjunction: $\sigma_{\theta_1 \vee \theta_2 \vee \ldots \vee \theta_n}(r)$
A11 (disjunctive selection by union of identifiers).
Applicable if all conditions have available indices.
Otherwise use linear scan.
Use corresponding index for each condition, and take union of all the obtained sets of record pointers.
Then fetch records from file
Negation: $\sigma_{\neg\theta}(r)$
Use linear scan on file
If very few records satisfy $\neg\theta$, and an index is applicable to $\theta$
Find satisfying records using index and fetch from file
13.14
Sorting
We may build an index on the relation, and then use the index to read the relation in sorted order. May lead to one disk block access for each tuple.
For relations that fit in memory, techniques like quicksort can be used. For relations that don't fit in memory, external sort-merge is a good choice.
13.15
External Sort-Merge
Let M denote memory size (in pages).
1. Create sorted runs. Let i be 0 initially.
   Repeatedly do the following till the end of the relation:
   (a) Read M blocks of relation into memory
   (b) Sort the in-memory blocks
   (c) Write sorted data to run $R_i$; increment i.
   Let the final value of i be N
2. Merge the runs (N-way merge). We assume (for now) that N < M.
   1. Use N blocks of memory to buffer input runs, and 1 block to buffer output. Read the first block of each run into its buffer page
   2. repeat
      1. Select the first record (in sort order) among all buffer pages
      2. Write the record to the output buffer. If the output buffer is full write it to disk.
      3. Delete the record from its input buffer page.
         If the buffer page becomes empty then read the next block (if any) of the run into the buffer.
   3. until all input buffer pages are empty
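The two phases above can be sketched as an in-memory simulation in Python. This is a minimal illustration under stated assumptions, not a disk-based implementation: the relation is modelled as a list of blocks (each a list of records), `heapq.merge` stands in for the buffer-page merge loop, and the function name `external_sort_merge` is hypothetical. Groups of at most M - 1 runs are merged at a time, so the sketch also covers the multi-pass case.

```python
import heapq

def external_sort_merge(blocks, M):
    """Sort a relation given as a list of blocks, using M memory blocks.

    Returns the sorted records and the number of merge passes performed.
    """
    if M < 3:
        raise ValueError("need at least 3 memory blocks")
    # Phase 1: read M blocks at a time, sort them, and emit a run.
    runs = []
    for i in range(0, len(blocks), M):
        records = [rec for block in blocks[i:i + M] for rec in block]
        runs.append(sorted(records))
    # Phase 2: merge groups of up to M - 1 runs (one block is reserved
    # for output) until a single run remains.
    passes = 0
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + M - 1]))
                for i in range(0, len(runs), M - 1)]
        passes += 1
    return (runs[0] if runs else []), passes
```

With 12 one-record blocks and M = 3, the sketch creates 4 runs and needs 2 merge passes.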
13.16
External Sort-Merge (Cont.)
If N $\geq$ M, several merge passes are required.
In each pass, contiguous groups of M - 1 runs are merged.
A pass reduces the number of runs by a factor of M - 1, and creates runs longer by the same factor.
E.g. If M = 11, and there are 90 runs, one pass reduces the number of runs to 9, each 10 times the size of the initial runs
Repeated passes are performed till all runs have been merged into one.
13.17
Example: External Sorting Using Sort-Merge
[Figure omitted]
13.18
External Merge Sort (Cont.)
Cost analysis:
Total number of merge passes required: $\lceil \log_{M-1}(b_r / M) \rceil$.
Disk accesses for initial run creation as well as in each pass is $2 b_r$
For the final pass, we don't count write cost
We ignore final write cost for all operations since the output of an operation may be sent to the parent operation without being written to disk
Thus total number of disk accesses for external sorting:
$b_r (2 \lceil \log_{M-1}(b_r / M) \rceil + 1)$
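A short Python sketch of this cost formula. The function names are assumptions; the number of passes is computed by repeated ceiling division rather than a floating-point logarithm, which yields the same value as $\lceil \log_{M-1}(b_r / M) \rceil$ while avoiding rounding issues.

```python
import math

def merge_passes(b_r, M):
    """Number of merge passes: ceil(log_{M-1}(b_r / M))."""
    runs = math.ceil(b_r / M)          # runs produced by the first phase
    passes = 0
    while runs > 1:
        runs = math.ceil(runs / (M - 1))
        passes += 1
    return passes

def external_sort_cost(b_r, M):
    """Total disk accesses: b_r * (2 * passes + 1)."""
    return b_r * (2 * merge_passes(b_r, M) + 1)
```

For example, with M = 11 and 990 blocks (90 initial runs), two merge passes are needed, so the estimate is 990 * 5 = 4950 disk accesses.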

13.19
Join Operation
Several different algorithms to implement joins
Nested-loop join
Block nested-loop join
Indexed nested-loop join
Merge-join
Hash-join
Choice based on cost estimate
Examples use the following information
Number of records of customer: 10,000; depositor: 5000
Number of blocks of customer: 400; depositor: 100
13.20
Nested-Loop Join
To compute the theta join $r \bowtie_\theta s$
for each tuple $t_r$ in r do begin
    for each tuple $t_s$ in s do begin
        test pair ($t_r$, $t_s$) to see if they satisfy the join condition $\theta$
        if they do, add $t_r \cdot t_s$ to the result.
    end
end
r is called the outer relation and s the inner relation of the join.
Requires no indices and can be used with any kind of join condition.
Expensive since it examines every pair of tuples in the two relations.
13.21
Nested-Loop Join (Cont.)
In the worst case, if there is enough memory only to hold one block of each relation, the estimated cost is $n_r * b_s + b_r$ disk accesses.
If the smaller relation fits entirely in memory, use that as the inner relation. Reduces cost to $b_r + b_s$ disk accesses.
Assuming worst case memory availability, cost estimate is
5000 * 400 + 100 = 2,000,100 disk accesses with depositor as outer relation, and
10000 * 100 + 400 = 1,000,400 disk accesses with customer as the outer relation.
If smaller relation (depositor) fits entirely in memory, the cost estimate will be 500 disk accesses.
Block nested-loops algorithm (next slide) is preferable.
13.22
Block Nested-Loop Join
Variant of nested-loop join in which every block of inner relation is paired with every block of outer relation.
for each block $B_r$ of r do begin
    for each block $B_s$ of s do begin
        for each tuple $t_r$ in $B_r$ do begin
            for each tuple $t_s$ in $B_s$ do begin
                Check if ($t_r$, $t_s$) satisfy the join condition
                if they do, add $t_r \cdot t_s$ to the result.
            end
        end
    end
end
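The pseudocode above translates almost line for line into Python. In this sketch each relation is assumed to be a list of blocks, each block a list of tuples, and `theta` is a caller-supplied predicate; these representations are choices made for the example only.

```python
def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Join outer relation r with inner relation s, block by block."""
    result = []
    for B_r in r_blocks:              # each block of the outer relation
        for B_s in s_blocks:          # each block of the inner relation
            for t_r in B_r:
                for t_s in B_s:
                    if theta(t_r, t_s):
                        # Concatenate the two tuples into one result tuple.
                        result.append(t_r + t_s)
    return result
```

Each inner block is visited once per outer block, which is where the $b_r * b_s$ term in the cost estimate comes from.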
13.23
Block Nested-Loop Join (Cont.)
Worst case estimate: $b_r * b_s + b_r$ block accesses.
Each block in the inner relation s is read once for each block in the outer relation (instead of once for each tuple in the outer relation)
Best case: $b_r + b_s$ block accesses.
Improvements to nested loop and block nested loop algorithms:
In block nested-loop, use M - 2 disk blocks as blocking unit for outer relations, where M = memory size in blocks; use remaining two blocks to buffer inner relation and output
Cost = $\lceil b_r / (M-2) \rceil * b_s + b_r$
If equi-join attribute forms a key on the inner relation, stop inner loop on first match
Scan inner loop forward and backward alternately, to make use of the blocks remaining in buffer (with LRU replacement)
Use index on inner relation if available (next slide)
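The effect of the blocking unit on cost can be checked with a few lines of Python. The function name is hypothetical; the block counts used below (depositor: 100 blocks, customer: 400 blocks) are the ones given earlier in the chapter.

```python
import math

def block_nested_loop_cost(b_r, b_s, M=None):
    """Block accesses for block nested-loop join with outer relation r.

    Without M, the worst-case estimate b_r * b_s + b_r is returned;
    with M memory blocks, ceil(b_r / (M - 2)) * b_s + b_r.
    """
    if M is None:
        return b_r * b_s + b_r
    return math.ceil(b_r / (M - 2)) * b_s + b_r
```

With depositor as the outer relation, the worst case is 40,100 block accesses; with M = 12 it drops to 4,100, and once M - 2 covers the whole outer relation it reaches the best case of 500.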
13.24
Indexed Nested-Loop Join
Index lookups can replace file scans if
join is an equi-join or natural join and
an index is available on the inner relation's join attribute
Can construct an index just to compute a join.
For each tuple $t_r$ in the outer relation r, use the index to look up tuples in s that satisfy the join condition with tuple $t_r$.
Worst case: buffer has space for only one page of r, and, for each tuple in r, we perform an index lookup on s.
Cost of the join: $b_r + n_r * c$
Where c is the cost of traversing the index and fetching all matching s tuples for one tuple of r
c can be estimated as cost of a single selection on s using the join condition.
If indices are available on join attributes of both r and s, use the relation with fewer tuples as the outer relation.
13.25
Example of Nested-Loop Join Costs
Compute $depositor \bowtie customer$, with depositor as the outer relation.
Let customer have a primary B+-tree index on the join attribute customer-name, which contains 20 entries in each index node.
Since customer has 10,000 tuples, the height of the tree is 4, and one more access is needed to find the actual data
depositor has 5000 tuples
Cost of block nested loops join
400 * 100 + 100 = 40,100 disk accesses assuming worst case memory (may be significantly less with more memory)
Cost of indexed nested loops join
100 + 5000 * 5 = 25,100 disk accesses.
CPU cost likely to be less than that for block nested loops join
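The two figures in this example can be reproduced with the formula $b_r + n_r * c$. The helper below is a sketch; its name and the variable names are assumptions, while the numbers come from the example itself.

```python
def indexed_nested_loop_cost(b_r, n_r, c):
    """b_r block reads of the outer relation plus one index lookup
    (cost c) for each of its n_r tuples."""
    return b_r + n_r * c

tree_height = 4                  # B+-tree on customer-name
lookup_cost = tree_height + 1    # one more access to fetch the data
indexed_cost = indexed_nested_loop_cost(b_r=100, n_r=5000, c=lookup_cost)
block_nested_cost = 400 * 100 + 100   # worst-case block nested loops
```

Here `indexed_cost` evaluates to 25,100 and `block_nested_cost` to 40,100, matching the estimates above.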
13.26
Merge-Join
1. Sort both relations on their join attribute (if not already sorted on the join attributes).
2. Merge the sorted relations to join them
   1. Join step is similar to the merge stage of the sort-merge algorithm.
   2. Main difference is handling of duplicate values in join attribute: every pair with same value on join attribute must be matched
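A minimal in-memory sketch of merge-join in Python, showing how duplicate join values are handled. Relations are assumed to be lists of tuples, and `key_r` / `key_s` are caller-supplied functions that extract the join attribute; none of these names come from the slides.

```python
def merge_join(r, s, key_r, key_s):
    """Equi-join of r and s; both inputs are sorted on the join attribute first."""
    r = sorted(r, key=key_r)
    s = sorted(s, key=key_s)
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        k_r, k_s = key_r(r[i]), key_s(s[j])
        if k_r < k_s:
            i += 1
        elif k_r > k_s:
            j += 1
        else:
            # Find the group of s tuples that share this join value.
            j_end = j
            while j_end < len(s) and key_s(s[j_end]) == k_r:
                j_end += 1
            # Match every r tuple with this value against the whole group.
            while i < len(r) and key_r(r[i]) == k_r:
                result.extend(r[i] + t_s for t_s in s[j:j_end])
                i += 1
            j = j_end
    return result
```

The group of s tuples for one join value is kept in memory while the matching r tuples are scanned, which mirrors the assumption that all tuples for any given join value fit in memory.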

13.27
Merge-Join (Cont.)
Can be used only for equi-joins and natural joins
Each block needs to be read only once (assuming all tuples for any given value of the join attributes fit in memory)
Thus number of block accesses for merge-join is
$b_r + b_s$ + the cost of sorting if relations are unsorted.
Hybrid merge-join: If one relation is sorted, and the other has a secondary B+-tree index on the join attribute
Merge the sorted relation with the leaf entries of the B+-tree.
Sort the result on the addresses of the unsorted relation's tuples
Scan the unsorted relation in physical address order and merge with previous result, to replace addresses by the actual tuples
Sequential scan more efficient than random lookup
13.28
Hash-Join
Applicable for equi-joins and natural joins.
A hash function h is used to partition tuples of both relations
h maps JoinAttrs values to {0, 1, ..., n}, where JoinAttrs denotes the common attributes of r and s used in the natural join.
$r_0, r_1, \ldots, r_n$ denote partitions of r tuples
Each tuple $t_r \in r$ is put in partition $r_i$ where $i = h(t_r[JoinAttrs])$.
$s_0, s_1, \ldots, s_n$ denote partitions of s tuples
Each tuple $t_s \in s$ is put in partition $s_i$, where $i = h(t_s[JoinAttrs])$.

Note: In the book, $r_i$ is denoted as $H_{r_i}$, $s_i$ is denoted as $H_{s_i}$ and n is denoted as $n_h$.

13.29
Hash-Join (Cont.)
[Figure omitted]
13.30
Hash-Join (Cont.)
r tuples in $r_i$ need only to be compared with s tuples in $s_i$
Need not be compared with s tuples in any other partition, since:
an r tuple and an s tuple that satisfy the join condition will have the same value for the join attributes.
If that value is hashed to some value i, the r tuple has to be in $r_i$ and the s tuple in $s_i$.
13.31
Hash-Join Algorithm
The hash-join of r and s is computed as follows.
1. Partition the relation s using hashing function h. When partitioning a relation, one block of memory is reserved as the output buffer for each partition.
2. Partition r similarly.
3. For each i:
   (a) Load $s_i$ into memory and build an in-memory hash index on it using the join attribute. This hash index uses a different hash function than the earlier one h.
   (b) Read the tuples in $r_i$ from the disk one by one. For each tuple $t_r$ locate each matching tuple $t_s$ in $s_i$ using the in-memory hash index. Output the concatenation of their attributes.
Relation s is called the build input and r is called the probe input.
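The three steps can be sketched in Python as an in-memory simulation. The assumptions are labeled here and in the comments: relations are lists of tuples, `key_r` / `key_s` extract the join attribute, the partitioning function is Python's built-in `hash` reduced modulo n + 1, and a `dict` plays the role of the in-memory hash index built with a different hash function.

```python
from collections import defaultdict

def hash_join(r, s, key_r, key_s, n=4):
    """Join probe input r with build input s on equal join-attribute values."""
    def h(value):
        # Partitioning function mapping join values to {0, ..., n}.
        return hash(value) % (n + 1)

    s_parts, r_parts = defaultdict(list), defaultdict(list)
    for t_s in s:                       # step 1: partition s
        s_parts[h(key_s(t_s))].append(t_s)
    for t_r in r:                       # step 2: partition r similarly
        r_parts[h(key_r(t_r))].append(t_r)

    result = []
    for i in range(n + 1):              # step 3: process each partition pair
        index = defaultdict(list)       # (a) in-memory hash index on s_i
        for t_s in s_parts[i]:
            index[key_s(t_s)].append(t_s)
        for t_r in r_parts[i]:          # (b) probe with each tuple of r_i
            for t_s in index.get(key_r(t_r), ()):
                result.append(t_r + t_s)
    return result
```

Because matching tuples always hash to the same partition number, each $r_i$ is only ever probed against the index built on $s_i$.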
13.32
Hash-Join Algorithm (Cont.)
The value n and the hash function h are chosen such that each $s_i$ should fit in memory.
Typically n is chosen as $\lceil b_s / M \rceil * f$ where f is a "fudge factor", typically around 1.2
The probe relation partitions $r_i$ need not fit in memory
Recursive partitioning required if number of partitions n is greater than number of pages M of memory.
Instead of partitioning n ways, use M - 1 partitions for s
Further partition the M - 1 partitions using a different hash function
Use same partitioning method on r
Rarely required: e.g., recursive partitioning not needed for relations of 1GB or less with memory size of 2MB, with block size of 4KB.
13.33
Handling of Overflows
Hash-table overflow occurs in partition $s_i$ if $s_i$ does not fit in memory. Reasons could be
Many tuples in s with same value for join attributes
Bad hash function
Partitioning is said to be skewed if some partitions have significantly more tuples than some others
Overflow resolution can be done in build phase
Partition $s_i$ is further partitioned using a different hash function.
Partition $r_i$ must be similarly partitioned.
Overflow avoidance performs partitioning carefully to avoid overflows during build phase
E.g. partition build relation into many partitions, then combine them
Both approaches fail with large numbers of duplicates
Fallback option: use block nested loops join on overflowed partitions

13.34
Cost of Hash-Join
If recursive partitioning is not required: cost of hash join is
$3(b_r + b_s) + 2 * n_h$
If recursive partitioning is required, the number of passes required for partitioning s is $\lceil \log_{M-1}(b_s) - 1 \rceil$. This is because each final partition of s should fit in memory.
The number of partitions of probe relation r is the same as that for build relation s; the number of passes for partitioning of r is also the same as for s.
Therefore it is best to choose the smaller relation as the build relation.
Total cost estimate is:
$2(b_r + b_s) \lceil \log_{M-1}(b_s) - 1 \rceil + b_r + b_s$
If the entire build input can be kept in main memory, n can be set to 0 and the algorithm does not partition the relations into temporary files. Cost estimate goes down to $b_r + b_s$.
13.35
Example of Cost of Hash-Join: $customer \bowtie depositor$
Assume that memory size is 20 blocks
$b_{depositor} = 100$ and $b_{customer} = 400$.
depositor is to be used as build input. Partition it into five partitions, each of size 20 blocks. This partitioning can be done in one pass.
Similarly, partition customer into five partitions, each of size 80. This is also done in one pass.
Therefore total cost: 3(100 + 400) = 1500 block transfers
Ignores cost of writing partially filled blocks
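The hash-join cost formulas can be checked against this example with a short Python sketch. The function names are assumptions, and the recursive variant uses a floating-point logarithm, so it is only a rough illustration of the formula.

```python
import math

def hash_join_cost(b_r, b_s, n_h=0):
    """No recursive partitioning: 3(b_r + b_s) + 2 * n_h block transfers.
    n_h = 0 ignores the cost of partially filled blocks."""
    return 3 * (b_r + b_s) + 2 * n_h

def recursive_hash_join_cost(b_r, b_s, M):
    """With recursive partitioning:
    2(b_r + b_s) * ceil(log_{M-1}(b_s) - 1) + b_r + b_s."""
    passes = math.ceil(math.log(b_s, M - 1) - 1)
    return 2 * (b_r + b_s) * passes + b_r + b_s
```

For customer (400 blocks) as probe input and depositor (100 blocks) as build input, `hash_join_cost(400, 100)` gives 1500. With M = 20 the recursive formula needs a single partitioning pass and yields the same 1500.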
13.36
Hybrid Hash-Join
Useful when memory sizes are relatively large, and the build input is bigger than memory.
Main feature of hybrid hash join:
Keep the first partition of the build relation in memory.
E.g. With memory size of 25 blocks, depositor can be partitioned into five partitions, each of size 20 blocks.
Division of memory:
The first partition occupies 20 blocks of memory
1 block is used for input, and 1 block each for buffering the other 4 partitions.
customer is similarly partitioned into five partitions each of size 80; the first is used right away for probing, instead of being written out and read back.
Cost of 3(80 + 320) + 20 + 80 = 1300 block transfers for hybrid hash join, instead of 1500 with plain hash-join.
Hybrid hash-join most useful if $M \gg \sqrt{b_s}$
13.37
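Complex Joins
Join with a conjunctive condition:
$r \bowtie_{\theta_1 \wedge \theta_2 \wedge \ldots \wedge \theta_n} s$
Either use nested loops/block nested loops, or
Compute the result of one of the simpler joins $r \bowtie_{\theta_i} s$
Final result comprises those tuples in the intermediate result that satisfy the remaining conditions
$\theta_1 \wedge \ldots \wedge \theta_{i-1} \wedge \theta_{i+1} \wedge \ldots \wedge \theta_n$
Join with a disjunctive condition
$r \bowtie_{\theta_1 \vee \theta_2 \vee \ldots \vee \theta_n} s$
Either use nested loops/block nested loops, or
Compute as the union of the records in individual joins $r \bowtie_{\theta_i} s$:
$(r \bowtie_{\theta_1} s) \cup (r \bowtie_{\theta_2} s) \cup \ldots \cup (r \bowtie_{\theta_n} s)$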

13.38
Other Operations
Duplicate elimination can be implemented via hashing or sorting.
On sorting, duplicates will come adjacent to each other, and all but one of a set of duplicates can be deleted.
Optimization: duplicates can be deleted during run generation as well as at intermediate merge steps in external sort-merge.
Hashing is similar: duplicates will come into the same bucket.
Projection is implemented by performing projection on each tuple followed by duplicate elimination.
13.39
Other Operations: Aggregation
Aggregation can be implemented in a manner similar to duplicate elimination.
Sorting or hashing can be used to bring tuples in the same group together, and then the aggregate functions can be applied on each group.
Optimization: combine tuples in the same group during run generation and intermediate merges, by computing partial aggregate values
For count, min, max, sum: keep aggregate values on tuples found so far in the group.
When combining partial aggregates for count, add up the aggregates
For avg, keep sum and count, and divide sum by count at the end
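The partial-aggregate idea can be illustrated with a hash-based grouping sketch in Python. The tuple layout `(count, sum, min, max)` and both function names are assumptions made for this example; a `dict` is used to bring tuples of the same group together.

```python
def combine_partials(a, b):
    """Merge two partial aggregates (count, sum, min, max) of one group.
    Counts and sums are added; min and max are taken over both."""
    return (a[0] + b[0], a[1] + b[1], min(a[2], b[2]), max(a[3], b[3]))

def aggregate(rows):
    """rows: iterable of (group, value) pairs.
    Returns {group: (count, sum, min, max, avg)}."""
    partial = {}
    for group, value in rows:
        single = (1, value, value, value)
        if group in partial:
            partial[group] = combine_partials(partial[group], single)
        else:
            partial[group] = single
    # avg is derived only at the end, from the kept sum and count.
    return {g: (c, s, lo, hi, s / c) for g, (c, s, lo, hi) in partial.items()}
```

The same `combine_partials` step could be applied when merging runs in external sort-merge, since partial aggregates for a group can be combined in any order.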
13.40
Other Operations: Set Operations
Set operations ($\cup$, $\cap$ and $-$): can either use variant of merge-join after sorting, or variant of hash-join.
E.g., Set operations using hashing:
1. Partition both relations using the same hash function, thereby creating $r_0, r_1, \ldots, r_n$ and $s_0, s_1, \ldots, s_n$
2. Process each partition i as follows. Using a different hashing function, build an in-memory hash index on $r_i$ after it is brought into memory.
3. $r \cup s$: Add tuples in $s_i$ to the hash index if they are not already in it. At end of $s_i$ add the tuples in the hash index to the result.
$r \cap s$: output tuples in $s_i$ to the result if they are already there in the hash index.
$r - s$: for each tuple in $s_i$, if it is there in the hash index, delete it from the index. At end of $s_i$ add remaining tuples in the hash index to the result.
13.41
Other Operations: Outer Join
Outer join can be computed either as
A join followed by addition of null-padded non-participating tuples.
By modifying the join algorithms.
Modifying merge join to compute r ⟕ s (left outer join)
In r ⟕ s, non-participating tuples are those in $r - \Pi_R(r \bowtie s)$
Modify merge-join to compute r ⟕ s: During merging, for every tuple $t_r$ from r that does not match any tuple in s, output $t_r$ padded with nulls.
Right outer-join and full outer-join can be computed similarly.
Modifying hash join to compute r ⟕ s
If r is probe relation, output non-matching r tuples padded with nulls
If r is build relation, when probing keep track of which r tuples matched s tuples. At end of $s_i$ output non-matched r tuples padded with nulls

13.42
Evaluation of Expressions
So far: we have seen algorithms for individual operations
Alternatives for evaluating an entire expression tree
Materialization: generate results of an expression whose inputs are relations or are already computed, materialize (store) it on disk. Repeat.
Pipelining: pass on tuples to parent operations even as an operation is being executed
We study above alternatives in more detail
13.43
Materialization
Materialized evaluation: evaluate one operation at a time, starting at the lowest level. Use intermediate results materialized into temporary relations to evaluate next-level operations.
E.g., for the expression tree of $\Pi_{customer\text{-}name}(\sigma_{balance<2500}(account) \bowtie customer)$, compute and store $\sigma_{balance<2500}(account)$, then compute and store its join with customer, and finally compute the projection on customer-name.
13.44
Materialization (Cont.)
Materialized evaluation is always applicable
Cost of writing results to disk and reading them back can be quite high
Our cost formulas for operations ignore cost of writing results to disk, so
Overall cost = Sum of costs of individual operations + cost of writing intermediate results to disk
Double buffering: use two output buffers for each operation, when one is full write it to disk while the other is getting filled
Allows overlap of disk writes with computation and reduces execution time
13.45
Pipelining
Pipelined evaluation: evaluate several operations simultaneously, passing the results of one operation on to the next.
E.g., in previous expression tree, don't store result of $\sigma_{balance<2500}(account)$; instead, pass tuples directly to the join. Similarly, don't store result of join, pass tuples directly to projection.
Much cheaper than materialization: no need to store a temporary relation to disk.
Pipelining may not always be possible, e.g., sort, hash-join.
For pipelining to be effective, use evaluation algorithms that generate output tuples even as tuples are received for inputs to the operation.
Pipelines can be executed in two ways: demand driven and producer driven
13.46
Pipelining (Cont.)
In demand driven or lazy evaluation
System repeatedly requests next tuple from top level operation
Each operation requests next tuple from children operations as required, in order to output its next tuple
In between calls, operation has to maintain "state" so it knows what to return next
Each operation is implemented as an iterator implementing the following operations
open()
E.g. file scan: initialize file scan, store pointer to beginning of file as state
E.g. merge join: sort relations and store pointers to beginning of sorted relations as state
next()
E.g. for file scan: Output next tuple, and advance and store file pointer
E.g. for merge join: continue with merge from earlier state till next output tuple is found. Save pointers as iterator state.
close()
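The iterator interface can be sketched in Python as follows. The classes `FileScan` and `Select` and the driver `run` are hypothetical names for this illustration; `next()` returns `None` when no tuples remain, which is a convention chosen for the example rather than something the slides prescribe.

```python
class FileScan:
    """Leaf operator: scans an in-memory list standing in for a file."""
    def __init__(self, records):
        self.records = records
    def open(self):
        self.pos = 0                  # state: pointer to beginning of file
    def next(self):
        if self.pos >= len(self.records):
            return None               # no more tuples
        t = self.records[self.pos]
        self.pos += 1                 # advance and store the file pointer
        return t
    def close(self):
        self.pos = None

class Select:
    """Selection operator: pulls tuples from its child on demand."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def open(self):
        self.child.open()
    def next(self):
        # Request tuples from the child until one satisfies the predicate.
        while (t := self.child.next()) is not None:
            if self.predicate(t):
                return t
        return None
    def close(self):
        self.child.close()

def run(plan):
    """Top-level loop: repeatedly request the next tuple from the root."""
    plan.open()
    out = []
    while (t := plan.next()) is not None:
        out.append(t)
    plan.close()
    return out
```

For instance, `run(Select(FileScan(accounts), lambda t: t[1] < 2500))` pulls one account tuple at a time through the selection without storing an intermediate relation.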

13.47
Pipelining (Cont.)
In producer-driven or eager pipelining
Operators produce tuples eagerly and pass them up to their parents
Buffer maintained between operators, child puts tuples in buffer, parent removes tuples from buffer
If buffer is full, child waits till there is space in the buffer, and then generates more tuples
System schedules operations that have space in output buffer and can process more input tuples
13.48
Evaluation Algorithms for Pipelining
Some algorithms are not able to output results even as they get input tuples
E.g. merge join, or hash join
These result in intermediate results being written to disk and then read back always
Algorithm variants are possible to generate (at least some) results on the fly, as input tuples are read in
E.g. hybrid hash join generates output tuples even as probe relation tuples in the in-memory partition (partition 0) are read in
Pipelined join technique: Hybrid hash join, modified to buffer partition 0 tuples of both relations in-memory, reading them as they become available, and output results of any matches between partition 0 tuples
When a new $r_0$ tuple is found, match it with existing $s_0$ tuples, output matches, and save it in $r_0$
Symmetrically for $s_0$ tuples
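The pipelined join technique can be sketched in Python for the in-memory partition only. This is a simplified illustration under stated assumptions: every tuple is treated as belonging to partition 0, the input is a single stream of `('r', tuple)` or `('s', tuple)` pairs in arrival order, and the function and parameter names are hypothetical.

```python
from collections import defaultdict

def pipelined_hash_join(stream, key_r, key_s):
    """Yield joined tuples as soon as both matching inputs have arrived."""
    r_index, s_index = defaultdict(list), defaultdict(list)
    for side, t in stream:
        if side == "r":
            k = key_r(t)
            for t_s in s_index.get(k, ()):   # match with existing s tuples
                yield t + t_s
            r_index[k].append(t)             # save for later s arrivals
        else:
            k = key_s(t)
            for t_r in r_index.get(k, ()):   # symmetric case for s tuples
                yield t_r + t
            s_index[k].append(t)
```

Each result is produced exactly once, at the moment the later of its two input tuples arrives, so the parent operation can start consuming output before either input is complete.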
13.49
Complex Joins
Join involving three relations: $loan \bowtie depositor \bowtie customer$
Strategy 1. Compute $depositor \bowtie customer$; use result to compute $loan \bowtie (depositor \bowtie customer)$
Strategy 2. Compute $loan \bowtie depositor$ first, and then join the result with customer.
Strategy 3. Perform the pair of joins at once. Build an index on loan for loan-number, and on customer for customer-name.
For each tuple t in depositor, look up the corresponding tuples in customer and the corresponding tuples in loan.
Each tuple of depositor is examined exactly once.
Strategy 3 combines two operations into one special-purpose operation that is more efficient than implementing two joins of two relations.
