Physical Database Design and Tuning: Module 5, Lecture 3
Overview
After ER design, schema refinement, and the definition of views, we have the logical and external schemas for our database.
The next step is to choose indexes, make clustering decisions, and refine the conceptual and external schemas (if necessary) to meet performance goals.
We must begin by understanding the workload:
The most important queries and how often they arise.
The most important updates and how often they arise.
The desired performance for these queries and updates.
For each query in the workload:
Which relations does it access?
Which attributes are retrieved?
Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?
For each update in the workload:
Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?
The type of update (INSERT/DELETE/UPDATE), and the attributes that are affected.
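The questions above can be recorded per statement and tallied. The sketch below is a minimal illustration, assuming a hypothetical `Statement` record and a frequency unit of executions per hour; it weights each attribute by how often it is used to locate tuples (a candidate search key) and by how often it is modified (index upkeep).

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One query or update in the workload (hypothetical representation)."""
    name: str
    kind: str                 # "SELECT", "INSERT", "DELETE" or "UPDATE"
    relations: list[str]
    frequency: float          # executions per hour (assumed unit)
    where_attrs: list[str] = field(default_factory=list)    # selection/join attributes
    updated_attrs: list[str] = field(default_factory=list)  # attributes the statement writes


def tally(workload):
    """Frequency-weighted counts: how often each attribute is used to locate
    tuples (index benefit) and how often it is modified (index upkeep)."""
    lookups, writes = Counter(), Counter()
    for stmt in workload:
        for attr in stmt.where_attrs:
            lookups[attr] += stmt.frequency
        for attr in stmt.updated_attrs:
            writes[attr] += stmt.frequency
    return lookups, writes


# Hypothetical workload over the Emp/Dept schema used in the examples.
workload = [
    Statement("Q1", "SELECT", ["Emp", "Dept"], 100, ["Dept.dname", "Emp.dno", "Dept.dno"]),
    Statement("Q2", "SELECT", ["Emp"], 20, ["Emp.sal"]),
    Statement("U1", "UPDATE", ["Emp"], 50, ["Emp.eid"], ["Emp.sal"]),
]
lookups, writes = tally(workload)
print(lookups["Emp.sal"], writes["Emp.sal"])
```

Here Emp.sal is both searched (Q2) and written (U1), so an index on it would speed up Q2 but has to be maintained on every U1.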
Decisions to Make
What indexes should we create?
Which relations should have indexes? What field(s) should be the search key? Should we build several indexes?
For each index, what kind of an index should it be?
Clustered? Hash/tree? Dynamic/static? Dense/sparse?
Should we make changes to the conceptual schema?
Consider alternative normalized schemas? (Remember, there are many choices in decomposing into BCNF, etc.)
Should we "undo" some decomposition steps and settle for a lower normal form? (Denormalization.)
Horizontal partitioning, replication, views...
Choice of Indexes
One approach: consider the most important queries in turn. Consider the best plan using the current indexes, and see if a better plan is possible with an additional index. If so, create it.
Before creating an index, we must also consider its impact on the updates in the workload!
Tradeoff: indexes can make queries go faster but updates slower. They require disk space, too.
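The one-query-at-a-time approach and its update tradeoff can be sketched as a greedy loop. This is a toy under stated assumptions: `query_cost` and `upkeep` are hypothetical callbacks standing in for an optimizer's plan-cost estimate and for index maintenance cost, and the numbers in the example are invented purely for illustration.

```python
def choose_indexes(queries, candidates, query_cost, upkeep):
    """Greedy sketch: visit queries from most to least important and add the
    single extra index that most improves each one, if the frequency-weighted
    saving outweighs the index's update/space overhead.

    queries:    (name, frequency) pairs, most important first
    candidates: set of index names that could be built
    query_cost: (name, set_of_indexes) -> estimated cost of the best plan
    upkeep:     index name -> extra cost of maintaining that index
    """
    chosen = set()
    for name, freq in queries:
        current = query_cost(name, chosen)
        best, best_gain = None, 0.0
        for idx in sorted(candidates - chosen):
            saving = freq * (current - query_cost(name, chosen | {idx}))
            gain = saving - upkeep(idx)          # net benefit after update overhead
            if gain > best_gain:
                best, best_gain = idx, gain
        if best is not None:
            chosen.add(best)
    return chosen


# Toy cost model (hypothetical numbers): a query costs 1000 without a useful
# index, otherwise the cost of the cheapest plan some chosen index enables.
HELPS = {"Q1": {"Dept.dname": 10, "Emp.dno": 50}, "Q2": {"Emp.sal": 200}}


def toy_cost(name, indexes):
    return min([1000] + [HELPS[name][i] for i in indexes if i in HELPS[name]])


def toy_upkeep(idx):
    return 500 if idx == "Emp.sal" else 5        # pretend Emp.sal is updated often


print(sorted(choose_indexes([("Q1", 10), ("Q2", 1)],
                            {"Dept.dname", "Emp.dno", "Emp.sal"},
                            toy_cost, toy_upkeep)))
# ['Dept.dname', 'Emp.sal']
```

Raising the upkeep of Emp.sal above Q2's saving of 800 makes the loop skip that index, which is the query/update tradeoff in miniature.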
Attributes mentioned in a WHERE clause are candidates for index search keys.
An exact-match condition suggests a hash index.
A range query suggests a tree index.
Clustering is especially useful for range queries, although it can help on equality queries as well in the presence of duplicates.
Try to choose indexes that benefit as many queries as possible. Since only one index can be clustered per relation, choose it based on the important queries that would benefit most from clustering.
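The guidelines above can be made concrete with in-memory stand-ins, a sketch only: a dict plays the hash index (equality lookups only; a range would have to visit every bucket), and a sorted list searched with `bisect` plays the tree index. The page-count part assumes a hypothetical layout of 1000 records, 10 per page, and a buffer that keeps only the last page read; it shows why a clustered index makes a range scan cheap.

```python
import bisect
from collections import defaultdict


class HashIndex:
    """Equality-only index sketch: search-key value -> list of record ids."""
    def __init__(self, entries):                 # entries: iterable of (key, rid)
        self.buckets = defaultdict(list)
        for key, rid in entries:
            self.buckets[key].append(rid)

    def lookup(self, key):
        return list(self.buckets.get(key, []))


class SortedIndex:
    """Stand-in for a tree index: entries sorted by key, so a range query is a
    binary search for the low end plus a scan of one contiguous run."""
    def __init__(self, entries):                 # entries: iterable of (key, rid)
        self.entries = sorted(entries)
        self.keys = [key for key, _ in self.entries]

    def range(self, low, high):                  # low <= key <= high
        start = bisect.bisect_left(self.keys, low)
        end = bisect.bisect_right(self.keys, high)
        return [rid for _, rid in self.entries[start:end]]


def pages_fetched(rids, page_of):
    """Page reads needed to fetch `rids` in index order, assuming only the most
    recently read page stays in memory (a deliberately pessimistic buffer)."""
    reads, last = 0, None
    for rid in rids:
        if page_of[rid] != last:
            reads, last = reads + 1, page_of[rid]
    return reads


# Record k has search key k; 10 records per page (hypothetical layout).
clustered = [k // 10 for k in range(1000)]                    # file sorted on the key
unclustered = [((k * 37) % 1000) // 10 for k in range(1000)]  # same records, scattered
hits = SortedIndex((k, k) for k in range(1000)).range(100, 199)
print(pages_fetched(hits, clustered), pages_fetched(hits, unclustered))  # 10 100
```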
If range selections are involved, the order of attributes in a multi-attribute (composite) search key should be carefully chosen to match the range ordering.
Such composite indexes can sometimes enable index-only strategies for important queries.
For index-only strategies, clustering is not important!
When considering a join condition:
A hash index on the inner relation is very good for Index Nested Loops.
It should be clustered if the join column is not a key for the inner relation and inner tuples need to be retrieved.
A clustered B+ tree on the join column(s) is good for Sort-Merge.
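The Sort-Merge point rests on the fact that scanning a B+ tree on the join column delivers tuples already in join order, so the sort phase can be skipped. A minimal sketch of the remaining merge phase, assuming both inputs are Python lists of dicts already sorted on the join key (the Dept/Emp rows are hypothetical):

```python
def merge_join(left, right, key):
    """Merge phase of Sort-Merge join. Both inputs must already be sorted on
    `key` (e.g. produced by scanning B+ trees on the join column)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            j_end = j                            # find the run of equal keys on the right
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            while i < len(left) and left[i][key] == lk:
                out.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


dept = [{"dno": 1, "mgr": "Ann"}, {"dno": 2, "mgr": "Bob"}, {"dno": 4, "mgr": "Cy"}]
emp = [{"dno": 1, "ename": "Di"}, {"dno": 1, "ename": "Ed"},
       {"dno": 3, "ename": "Flo"}, {"dno": 4, "ename": "Gus"}]
print([(d["mgr"], e["ename"]) for d, e in merge_join(dept, emp, "dno")])
# [('Ann', 'Di'), ('Ann', 'Ed'), ('Cy', 'Gus')]
```

If the tree is clustered, the matching tuples also sit on consecutive pages, which is why the text asks for a clustered index here.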
Example 1
SELECT E.ename, D.mgr
FROM Emp E, Dept D
WHERE D.dname='Toy' AND E.dno=D.dno
A hash index on D.dname supports the 'Toy' selection.
Given this, an index on D.dno is not needed.
A hash index on E.dno allows us to get the matching (inner) Emp tuples for each selected (outer) Dept tuple.
What if the WHERE clause included "... AND E.age=25"?
We could retrieve Emp tuples using an index on E.age, then join them with the Dept tuples satisfying the dname selection. This is comparable to the strategy that used the E.dno index.
So, if an E.age index has already been created, this query provides much less motivation for adding an E.dno index.
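The plan described for Example 1 can be traced with dicts standing in for the two hash indexes. This is a sketch, not an optimizer: `hash_index` and `example1_plan` are hypothetical helpers and the rows are made up. Note that D.dno is only read from the selected Dept tuples, never looked up, which is why no index on it is needed.

```python
from collections import defaultdict


def hash_index(rows, attr):
    """Stand-in for a hash index: attr value -> list of rows with that value."""
    index = defaultdict(list)
    for row in rows:
        index[row[attr]].append(row)
    return index


def example1_plan(dept_by_dname, emp_by_dno, dname):
    """SELECT E.ename, D.mgr FROM Emp E, Dept D WHERE D.dname=:dname AND E.dno=D.dno
    Dept is the outer relation: the D.dname index yields the selected Dept
    tuples, and for each one the E.dno index yields the matching Emp tuples."""
    return [(e["ename"], d["mgr"])
            for d in dept_by_dname.get(dname, [])
            for e in emp_by_dno.get(d["dno"], [])]


dept = [{"dno": 1, "dname": "Toy", "mgr": "Ann"}, {"dno": 2, "dname": "Shoe", "mgr": "Bob"}]
emp = [{"ename": "Cy", "dno": 1}, {"ename": "Di", "dno": 2}, {"ename": "Ed", "dno": 1}]
print(example1_plan(hash_index(dept, "dname"), hash_index(emp, "dno"), "Toy"))
# [('Cy', 'Ann'), ('Ed', 'Ann')]
```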
Example 2
SELECT E.ename, D.mgr
FROM Emp E, Dept D
WHERE E.sal BETWEEN 10000 AND 20000
AND E.hobby='Stamps' AND E.dno=D.dno
Clearly, Emp should be the outer relation.
This suggests that we build a hash index on D.dno.
What index should we build on Emp?
A B+ tree on E.sal could be used, OR an index on E.hobby could be used. Only one of these is needed, and which is better depends upon the selectivity of the conditions.
As a rule of thumb, equality selections are more selective than range selections.
As both examples indicate, our choice of indexes is guided by the plan(s) that we expect an optimizer to consider for a query. We have to understand optimizers!
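One way to decide between the E.sal and E.hobby indexes is to estimate how many Emp tuples each condition matches. The sketch below assumes the common uniform-distribution estimates (1/number-of-distinct-values for equality, interval width over value range for a range); the statistics passed in are hypothetical, and real optimizers may use histograms instead.

```python
def eq_rf(n_distinct):
    """Reduction factor for col = value, assuming n_distinct equally likely values."""
    return 1.0 / n_distinct


def range_rf(low, high, col_min, col_max):
    """Reduction factor for low <= col <= high, assuming a uniform distribution."""
    lo, hi = max(low, col_min), min(high, col_max)
    return max(0.0, (hi - lo) / (col_max - col_min))


def pick_emp_index(n_emp, sal_min, sal_max, n_hobbies, sal_low=10000, sal_high=20000):
    """Choose the Emp index whose condition is estimated to match fewer tuples."""
    by_sal = n_emp * range_rf(sal_low, sal_high, sal_min, sal_max)
    by_hobby = n_emp * eq_rf(n_hobbies)
    return "B+ tree on E.sal" if by_sal < by_hobby else "index on E.hobby"


print(pick_emp_index(10000, 0, 100000, n_hobbies=50))  # ~1000 vs ~200 rows: E.hobby
print(pick_emp_index(10000, 0, 100000, n_hobbies=5))   # ~1000 vs ~2000 rows: E.sal
```

With many distinct hobbies the equality condition wins, matching the rule of thumb; with very few it does not, which is why the choice depends on the statistics.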
To retrieve Emp records with age=30 AND sal=4000, an index on <age,sal> would be better than an index on age or an index on sal.
Such indexes are also called composite or concatenated indexes.
The choice of index key is orthogonal to clustering, etc.
If the condition is: 20<age<30 AND 3000<sal<5000:
A clustered tree index on <age,sal> or <sal,age> is best.
If the condition is: age=30 AND 3000<sal<5000:
A clustered <age,sal> index is much better than a <sal,age> index!
Composite indexes are larger, and are updated more often.
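Why <age,sal> beats <sal,age> for age=30 AND 3000<sal<5000 shows up if we count the index entries each key order forces us to scan. In this sketch a sorted list of tuples stands in for the leaf level of a tree index, and the (age, sal) data is a hypothetical grid of values.

```python
import bisect
import math


def scan_composite(keys, age_eq, sal_lo, sal_hi, order):
    """keys: list of (age, sal) pairs; order: ("age", "sal") or ("sal", "age").
    Returns (entries_scanned, matches) for age = age_eq AND sal_lo < sal < sal_hi."""
    if order == ("age", "sal"):
        index = sorted(keys)                            # leaf entries in <age, sal> order
        # equality on the leading column + range on the second: one contiguous run
        start = bisect.bisect_right(index, (age_eq, sal_lo))
        end = bisect.bisect_left(index, (age_eq, sal_hi))
        run = index[start:end]
    else:
        index = sorted((s, a) for a, s in keys)         # leaf entries in <sal, age> order
        # only the range on the leading column narrows the scan
        start = bisect.bisect_right(index, (sal_lo, math.inf))
        end = bisect.bisect_left(index, (sal_hi, -math.inf))
        run = [(a, s) for s, a in index[start:end]]
    matches = [(a, s) for a, s in run if a == age_eq and sal_lo < s < sal_hi]
    return len(run), len(matches)


emp_keys = [(age, sal) for age in range(20, 40) for sal in range(1000, 10000, 500)]
print(scan_composite(emp_keys, 30, 3000, 5000, ("age", "sal")))  # (3, 3)
print(scan_composite(emp_keys, 30, 3000, 5000, ("sal", "age")))  # (60, 3)
```

With <sal,age>, every entry in the salary range is touched regardless of age and then filtered, so the extra work grows with the number of distinct ages.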
Index-Only Plans
A number of queries can be answered without retrieving any tuples from one or more of the relations involved, if a suitable index is available:
Index <E.dno>: SELECT D.mgr FROM Dept D, Emp E WHERE D.dno=E.dno
Index <E.dno,E.eid> (tree index!): SELECT D.mgr, E.eid FROM Dept D, Emp E WHERE D.dno=E.dno
Index <E.dno>: SELECT E.dno, COUNT(*) FROM Emp E GROUP BY E.dno
Index <E.dno,E.sal> (tree index!): SELECT E.dno, MIN(E.sal) FROM Emp E GROUP BY E.dno
Index <E.age,E.sal> or <E.sal,E.age> (tree!): SELECT AVG(E.sal) FROM Emp E WHERE E.age=25 AND E.sal BETWEEN 3000 AND 5000
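Two of the index-only plans listed for this slide can be sketched by reading only the search keys in the leaf level of a tree index. Assumptions: the leaf level is modeled as a sorted Python list, and the <E.dno> index holds one data entry per Emp tuple (with rid lists instead, COUNT(*) would sum list lengths). No Emp tuple is ever fetched.

```python
from itertools import groupby


def min_sal_by_dno(leaf_entries):
    """Index-only plan for SELECT E.dno, MIN(E.sal) FROM Emp E GROUP BY E.dno.
    leaf_entries: (dno, sal) search keys in <E.dno, E.sal> order. The first
    entry in each dno group carries the smallest sal."""
    return [(dno, next(group)[1])
            for dno, group in groupby(leaf_entries, key=lambda e: e[0])]


def count_by_dno(dno_keys):
    """Index-only plan for SELECT E.dno, COUNT(*) FROM Emp E GROUP BY E.dno,
    given the <E.dno> search keys in sorted order (one entry per Emp tuple)."""
    return [(dno, sum(1 for _ in group)) for dno, group in groupby(dno_keys)]


leaf = sorted([(1, 3000), (2, 2500), (1, 2000), (2, 4000), (3, 1000)])
print(min_sal_by_dno(leaf))                    # [(1, 2000), (2, 2500), (3, 1000)]
print(count_by_dno([dno for dno, _ in leaf]))  # [(1, 2), (2, 2), (3, 1)]
```

Clustering is irrelevant here because the data records are never read, as the guidelines note.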
Summary
Database design consists of several tasks: requirements analysis, conceptual design, schema refinement, physical design, and tuning.
In general, we have to go back and forth between these tasks to refine a database design, and decisions in one task can influence the choices in another task.
Understanding the nature of the workload for the application, and the performance goals, is essential to developing a good design.
What are the important queries and updates? What attributes/relations are involved?
Summary (Contd.)
Indexes must be chosen to speed up important queries (and perhaps some updates!).
Index maintenance adds overhead on updates to key fields.
Choose indexes that can help many queries, if possible.
Build indexes to support index-only strategies.
Clustering is an important decision; only one index on a given relation can be clustered!
The order of fields in a composite index key can be important.
Static indexes may have to be periodically rebuilt.
Statistics have to be periodically updated.
The choice of conceptual schema should be guided by the workload, in addition to redundancy issues:
We may settle for a 3NF schema rather than BCNF.
The workload may influence the choice we make in decomposing a relation into 3NF or BCNF.
We may further decompose a BCNF schema!
We might denormalize (i.e., undo a decomposition step), or we might add fields to a relation.
We might consider horizontal decompositions.
If such changes are made after a database is in use, this is called schema evolution; we might want to mask some of these changes from applications by defining views.
The conceptual schema should be refined by considering performance criteria and workload:
We may choose 3NF or a lower normal form over BCNF.
We may choose among alternative decompositions into BCNF (or 3NF) based upon the workload.
We may denormalize, or undo some decompositions.
We may decompose a BCNF relation further!
We may choose a horizontal decomposition of a relation.
The importance of dependency preservation depends upon the dependency to be preserved and the cost of the IC check.
We can add a relation to ensure dependency preservation (for 3NF, not BCNF!); or else, we can check the dependency using a join.
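Checking a non-preserved dependency with a join can be sketched on a hypothetical decomposition: (city, street, zip) with FDs {city, street} -> zip and zip -> city, split into ZC(zip, city) and SZ(street, zip). The FD zip -> city can be checked within ZC alone, but {city, street} -> zip spans both relations, so the check has to join them first. The helper names and rows are illustrative only.

```python
def fd_holds(rows, lhs, rhs):
    """True if the functional dependency lhs -> rhs holds in rows (list of dicts)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def natural_join(left, right, attr):
    """Join two lists of dicts on one shared attribute (simple hash join)."""
    by_value = {}
    for right_row in right:
        by_value.setdefault(right_row[attr], []).append(right_row)
    return [{**left_row, **right_row}
            for left_row in left
            for right_row in by_value.get(left_row[attr], [])]


zc = [{"zip": "10001", "city": "NYC"}, {"zip": "10002", "city": "NYC"}]
sz = [{"street": "Main", "zip": "10001"}, {"street": "Main", "zip": "10002"}]
print(fd_holds(zc, ["zip"], ["city"]))                                      # True
print(fd_holds(natural_join(sz, zc, "zip"), ["city", "street"], ["zip"]))  # False
```

Each relation looks fine on its own; only the join exposes that one (city, street) pair maps to two zips, and that join is the cost the IC check would pay on every update.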
Summary (Contd.)
Over time, indexes have to be fine-tuned (dropped, created, rebuilt, ...) for performance.
We should determine the plan used by the system, and adjust the choice of indexes appropriately.
The system may still not find a good plan:
Only left-deep plans are considered!
Null values, arithmetic conditions, string expressions, the use of ORs, etc. can confuse an optimizer.
So, we may have to rewrite the query/view:
Avoid nested queries, temporary relations, complex conditions, and operations like DISTINCT and GROUP BY.
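Determining the plan the system actually uses is system-specific. As one concrete illustration, SQLite (bundled with Python's `sqlite3` module) reports its chosen plan through `EXPLAIN QUERY PLAN`; the table and index names below are hypothetical, and the exact wording of the plan text varies across SQLite versions, so only the index name is checked.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the detail strings SQLite reports for `sql` via EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]      # the detail text is the last column


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Dept (dno INTEGER PRIMARY KEY, dname TEXT, mgr TEXT)")
sql = "SELECT mgr FROM Dept WHERE dname = ?"

print(query_plan(conn, sql, ("Toy",)))    # expected: a full scan of Dept
conn.execute("CREATE INDEX dept_dname ON Dept(dname)")
print(query_plan(conn, sql, ("Toy",)))    # should now mention the dept_dname index
```

Rerunning the same check after adding, dropping, or rebuilding an index is a simple way to confirm that a tuning change affected the plan as intended.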