Query Optimization - Wikipedia

The document summarizes query optimization in relational database management systems. It discusses how query optimizers attempt to determine the most efficient way to execute a given query by considering different query plans and choosing the plan with the lowest estimated cost. It describes some key aspects of query optimization including join ordering, cost estimation, and extensions like parametric query optimization that account for uncertainty in cost estimates.

Uploaded by Justin
Copyright © All Rights Reserved

9/8/2016

Query optimization - Wikipedia, the free encyclopedia

Query optimization
From Wikipedia, the free encyclopedia

Query optimization is a function of many relational database management systems. The query optimizer attempts to determine the most efficient way to execute a given query by considering the possible query plans. Generally, the query optimizer cannot be accessed directly by users: once queries are submitted to the database server and parsed by the parser, they are passed to the query optimizer, where optimization occurs. However, some database engines allow guiding the query optimizer with hints.
A query is a request for information from a database. It can be as simple as "finding the address of a person with SS# 123-45-6789," or more complex like "finding the average salary of all the employed married men in California between the ages of 30 and 39 who earn less than their wives." Query results are generated by accessing relevant database data and manipulating it in a way that yields the requested information. Since database structures are complex, in most cases, and especially for non-trivial queries, the data needed for a query can be collected from a database by accessing it in different ways, through different data structures, and in different orders. Each different way typically requires a different processing time. Processing times of the same query may vary widely, from a fraction of a second to hours, depending on the way selected. The purpose of query optimization, which is an automated process, is to find the way to process a given query in minimum time. The large possible variance in time justifies performing query optimization, though finding the exact optimal way to execute a query, among all possibilities, is typically very complex, can itself be time-consuming and too costly, and is often practically impossible. Thus query optimization typically tries to approximate the optimum by comparing several common-sense alternatives, producing within a reasonable time a "good enough" plan that typically does not deviate much from the best possible result.

Contents
1 General consideration
2 Implementation
2.1 Join ordering
2.2 Query planning for nested SQL queries
2.3 Cost estimation
3 Extensions
3.1 Parametric Query Optimization
3.2 Multi-Objective Query Optimization
3.3 Multi-Objective Parametric Query Optimization
4 See also
5 References
6 External links

General consideration
There is a trade-off between the amount of time spent figuring out the best query plan and the quality of the choice; the optimizer may not choose the best answer on its own. Different qualities of database management systems have different ways of balancing these two. Cost-based query optimizers evaluate the resource footprint of various query plans and use this as the basis for plan selection. They assign an estimated "cost" to each possible query plan and choose the plan with the smallest cost. Costs are used to estimate the runtime cost of evaluating the query, in terms of the number of I/O operations required, CPU path length, amount of disk buffer space, disk
storage service time, interconnect usage between units of parallelism, and other factors determined from the data dictionary. The set of query plans examined is formed by examining the possible access paths (e.g., primary index access, secondary index access, full file scan) and various relational table join techniques (e.g., merge join, hash join, product join). The search space can become quite large depending on the complexity of the SQL query. There are two types of optimization: logical optimization, which generates a sequence of relational algebra operations to solve the query, and physical optimization, which determines the means of carrying out each operation.
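A minimal sketch of cost-based plan selection, assuming a hypothetical two-table query. It enumerates every combination of one access path per table and one join technique, sums made-up cost figures, and keeps the cheapest plan. The table names, path names, and numbers are illustrative assumptions, and treating the costs as simply additive is a simplification: a real optimizer derives each cost from cardinality estimates and the data dictionary.

```python
from itertools import product

# Hypothetical per-table access-path costs (abstract units of I/O plus CPU).
ACCESS_PATHS = {
    "orders": {"full_scan": 500.0, "secondary_index": 120.0},
    "customers": {"full_scan": 80.0, "primary_index": 30.0},
}
# Hypothetical cost of each join technique for joining the two inputs.
JOIN_METHODS = {"merge_join": 90.0, "hash_join": 60.0, "product_join": 4000.0}


def enumerate_plans():
    """Yield (estimated_cost, plan) for every access-path/join-method combination."""
    for (o_path, o_cost), (c_path, c_cost), (join, j_cost) in product(
        ACCESS_PATHS["orders"].items(),
        ACCESS_PATHS["customers"].items(),
        JOIN_METHODS.items(),
    ):
        # Simplifying assumption: the component costs are independent and add up.
        yield o_cost + c_cost + j_cost, (o_path, c_path, join)


# Cost-based choice: the plan with the smallest estimated cost wins.
best_cost, best_plan = min(enumerate_plans())
print(best_cost, best_plan)  # 210.0 ('secondary_index', 'primary_index', 'hash_join')
```

Even this toy has 2 × 2 × 3 = 12 candidate plans; every additional table multiplies the count, which is why the search space grows quickly with query complexity.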

Implementation
Most query optimizers represent query plans as a tree of "plan nodes". A plan node encapsulates a single operation that is required to execute the query. The nodes are arranged as a tree, in which intermediate results flow from the bottom of the tree to the top. Each node has zero or more child nodes, whose output is fed as input to the parent node. For example, a join node will have two child nodes, which represent the two join operands, whereas a sort node would have a single child node (the input to be sorted). The leaves of the tree are nodes that produce results by scanning the disk, for example by performing an index scan or a sequential scan.
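The plan-node tree can be illustrated with a small Python sketch. The node classes are hypothetical and each exposes a `rows()` generator, so intermediate results are pulled from the leaves toward the root; this iterator style is one common execution model, assumed here rather than taken from the text. The leaves scan in-memory lists that stand in for tables on disk.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]


class SeqScan:
    """Leaf node: produces rows by scanning a stored table (an in-memory list here)."""

    def __init__(self, table: List[Row]):
        self.table = table

    def rows(self) -> Iterator[Row]:
        yield from self.table


class NestedLoopJoin:
    """Join node with two children: the left and right join operands."""

    def __init__(self, left, right, predicate: Callable[[Row, Row], bool]):
        self.left, self.right, self.predicate = left, right, predicate

    def rows(self) -> Iterator[Row]:
        inner = list(self.right.rows())  # read the right input once
        for lrow in self.left.rows():
            for rrow in inner:
                if self.predicate(lrow, rrow):
                    yield {**lrow, **rrow}  # merge the matching rows


class Sort:
    """Sort node with a single child: the input to be sorted."""

    def __init__(self, child, key: str):
        self.child, self.key = child, key

    def rows(self) -> Iterator[Row]:
        yield from sorted(self.child.rows(), key=lambda row: row[self.key])


# Hypothetical tables; results flow from the scans (leaves) up to the sort (root).
emp = [{"emp": "bob", "dept_id": 1}, {"emp": "ann", "dept_id": 2}]
dept = [{"dept_id": 1, "dept": "eng"}, {"dept_id": 2, "dept": "ops"}]
plan = Sort(
    NestedLoopJoin(SeqScan(emp), SeqScan(dept),
                   lambda e, d: e["dept_id"] == d["dept_id"]),
    key="emp",
)
result = list(plan.rows())
```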

Join ordering
The performance of a query plan is determined largely by the order in which the tables are joined. For example, when joining 3 tables A, B, C of size 10 rows, 10,000 rows, and 1,000,000 rows, respectively, a query plan that joins B and C first can take several orders of magnitude more time to execute than one that joins A and C first. Most query optimizers determine join order via a dynamic programming algorithm pioneered by IBM's System R database project. This algorithm works in stages, building plans for progressively larger sets of relations:
1. First, all ways to access each relation in the query are computed. Every relation in the query can be accessed via a sequential scan. If there is an index on a relation that can be used to answer a predicate in the query, an index scan can also be used. For each relation, the optimizer records the cheapest way to scan the relation, as well as the cheapest way to scan the relation that produces records in a particular sorted order.
2. The optimizer then considers combining each pair of relations for which a join condition exists. For each pair, the optimizer will consider the available join algorithms implemented by the DBMS. It will preserve the cheapest way to join each pair of relations, in addition to the cheapest way to join each pair of relations that produces its output according to a particular sort order.
3. Then all three-relation query plans are computed, by joining each two-relation plan produced by the previous phase with the remaining relations in the query.
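The stages above can be sketched as a simplified dynamic program in Python, in the spirit of System R but not a reproduction of it. Assumptions: only left-deep plans are built, a relation is added only when a join predicate links it to the relations already joined (so the join graph must be connected), the cost of a plan is taken to be the sum of its estimated intermediate result sizes, predicates are treated as independent, and sort orders are ignored. The sizes reuse the A, B, C example from the text; the selectivities (`1e-6` for hypothetical predicates A=C and B=C) are made up.

```python
from itertools import combinations
from math import prod


def card(rels, sizes, sel):
    """Estimated result size of joining `rels`, assuming independent predicates."""
    c = prod(sizes[r] for r in rels)
    for (a, b), s in sel.items():
        if a in rels and b in rels:
            c *= s
    return c


def linked(r, rest, sel):
    """True if some join predicate connects relation r to a relation in `rest`."""
    return any((a == r and b in rest) or (b == r and a in rest) for a, b in sel)


def best_join_order(sizes, sel):
    rels = sorted(sizes)
    # Stage 1: single relations; best[set] = (cost, join order).
    best = {frozenset([r]): (0.0, (r,)) for r in rels}
    # Later stages: extend the best plan for each smaller set by one relation.
    for k in range(2, len(rels) + 1):
        for combo in combinations(rels, k):
            s = frozenset(combo)
            for r in combo:
                rest = s - {r}
                if rest not in best or not linked(r, rest, sel):
                    continue  # no plan for `rest`, or r would need a cross product
                cost = best[rest][0] + card(s, sizes, sel)
                if s not in best or cost < best[s][0]:
                    best[s] = (cost, best[rest][1] + (r,))
    return best[frozenset(rels)]


sizes = {"A": 10, "B": 10_000, "C": 1_000_000}
sel = {("A", "C"): 1e-6, ("B", "C"): 1e-6}  # hypothetical selectivities
cost, order = best_join_order(sizes, sel)
print(order, round(cost, 1))
```

With these numbers the cheapest order joins A and C first (an intermediate result of about 10 rows) and adds B last, whereas starting with B and C would produce an intermediate result of about 10,000 rows.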
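Retaining the cheapest plan for each particular sort order pays off in two ways. First, a sort order can avoid a redundant sort operation later on in processing the query. Second, a particular sort order can speed up a subsequent join because it clusters the data in a particular way.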
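The sort-order bookkeeping from stage 1 can be sketched as follows: for one relation, keep the cheapest access path overall plus the cheapest path for each sort order, then decide whether a later operator that needs sorted input should use an ordered path or pay for an explicit sort. The `AccessPath` type, the path names, and all costs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AccessPath:
    name: str
    cost: float
    order: Optional[str]  # column the output is sorted on, or None


def prune(paths: List[AccessPath]) -> Dict[Optional[str], AccessPath]:
    """Keep the cheapest path overall (key None) and the cheapest path per sort order."""
    kept: Dict[Optional[str], AccessPath] = {}
    for p in paths:
        for key in {None, p.order}:
            if key not in kept or p.cost < kept[key].cost:
                kept[key] = p
    return kept


def cheapest_with_order(kept, required: str, sort_cost: float) -> Tuple[float, str]:
    """Cheapest way to deliver rows sorted on `required`."""
    # Option 1: take the overall cheapest path and add an explicit sort.
    options = [(kept[None].cost + sort_cost, kept[None].name + " + sort")]
    # Option 2: use a path that already produces the required order.
    if required in kept:
        options.append((kept[required].cost, kept[required].name))
    return min(options)


paths = [
    AccessPath("seq_scan", 100.0, None),
    AccessPath("index_scan_id", 150.0, "id"),
    AccessPath("index_scan_dept", 300.0, "dept_id"),
]
kept = prune(paths)
print(cheapest_with_order(kept, "id", sort_cost=80.0))  # (150.0, 'index_scan_id')
print(cheapest_with_order(kept, "id", sort_cost=20.0))  # (120.0, 'seq_scan + sort')
```

With an expensive sort, the ordered index scan wins even though it is the costlier scan on its own; with a cheap sort, the sequential scan plus an explicit sort wins.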

Query planning for nested SQL queries
A SQL query to a modern relational DBMS does more than just selections and joins. In particular, SQL queries often nest several layers of SPJ (Select-Project-Join) blocks by means of group by, exists, and not exists operators. In some cases such nested SQL queries can be flattened into a select-project-join query, but not always. Query plans for nested SQL queries can also be chosen using the same dynamic programming algorithm as used for join ordering, but this can lead to an enormous escalation in query optimization time. So some database management systems use an alternative rule-based approach that uses a query graph model.
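Flattening can be illustrated with Python's built-in `sqlite3` module. The two queries below run over a hypothetical `dept`/`emp` schema and return the same rows: the nested form uses `EXISTS`, and the flattened form rewrites it as a join plus `DISTINCT`, which reproduces semi-join semantics here because every selected row includes the department's key. This shows only that the two formulations are equivalent; it does not show which rewrite SQLite's own optimizer applies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept(id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE emp(id INTEGER PRIMARY KEY, dept_id INTEGER, salary INTEGER);
INSERT INTO dept VALUES (1, 'eng'), (2, 'ops'), (3, 'hr');
INSERT INTO emp VALUES (1, 1, 120), (2, 1, 90), (3, 2, 70), (4, 3, 85);
""")

# Nested form: departments that have at least one employee earning over 80.
nested = """
SELECT d.id, d.name FROM dept d
WHERE EXISTS (SELECT 1 FROM emp e WHERE e.dept_id = d.id AND e.salary > 80)
ORDER BY d.id
"""

# Flattened select-project-join form. DISTINCT removes the duplicates produced
# when a department has several qualifying employees ('eng' has two here).
flattened = """
SELECT DISTINCT d.id, d.name FROM dept d JOIN emp e ON e.dept_id = d.id
WHERE e.salary > 80
ORDER BY d.id
"""

print(conn.execute(nested).fetchall())     # [(1, 'eng'), (3, 'hr')]
print(conn.execute(flattened).fetchall())  # [(1, 'eng'), (3, 'hr')]
```

The same rewrite does not carry over directly to `NOT EXISTS`, which needs anti-join semantics rather than an inner join.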

Cost estimation

One of the hardest problems in query optimization is to accurately estimate the costs of alternative query plans. Optimizers cost query plans using a mathematical model of query execution costs that relies heavily on estimates of the cardinality, or number of tuples, flowing through each edge in a query plan. Cardinality estimation in turn depends on estimates of the selection factor of predicates in the query. Traditionally, database systems estimate selectivities through fairly detailed statistics on the distribution of values in each column, such as histograms. This technique works well for estimating the selectivities of individual predicates. However, many queries have conjunctions of predicates, such as select count(*) from R where R.make='Honda' and R.model='Accord'. Query predicates are often highly correlated (for example, model='Accord' implies make='Honda'), and it is very hard to estimate the selectivity of the conjunction in general. Poor cardinality estimates and uncaught correlation are among the main reasons why query optimizers pick poor query plans. This is one reason why a database administrator should regularly update the database statistics, especially after major data loads/unloads.
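The correlation problem can be reproduced with a few lines of Python. The hypothetical table `R` below is built so that every Accord is a Honda; per-column frequency counts stand in for histograms, and the conjunctive selectivity is estimated by multiplying the single-predicate selectivities, as an optimizer assuming independence would. The data are invented for illustration.

```python
from collections import Counter

# Hypothetical table R: every Accord is a Honda, so the predicates are correlated.
rows = ([("Honda", "Accord")] * 100
        + [("Honda", "Civic")] * 100
        + [("Toyota", "Camry")] * 800)
n = len(rows)

# Per-column value frequencies stand in for histograms.
make_freq = Counter(make for make, _ in rows)
model_freq = Counter(model for _, model in rows)

sel_make = make_freq["Honda"] / n     # 0.2
sel_model = model_freq["Accord"] / n  # 0.1

# Independence assumption: multiply the single-predicate selectivities.
estimated = n * sel_make * sel_model
actual = sum(1 for make, model in rows if make == "Honda" and model == "Accord")
print(f"estimated {estimated:.0f} rows, actual {actual} rows")
```

The independence estimate is about 20 rows while the true count is 100, a fivefold underestimate caused entirely by the correlation between `make` and `model`.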

Extensions
Classical query optimization assumes that query plans are compared according to one single cost metric, usually execution time, and that the cost of each query plan can be calculated without uncertainty. Both assumptions are sometimes violated in practice,[1] and the research literature has studied multiple extensions of classical query optimization that overcome those limitations. These extended problem variants differ in how they model the cost of single query plans and in their optimization goal.

Parametric Query Optimization
Classical query optimization associates each query plan with one scalar cost value. Parametric query optimization[2] assumes that query plan cost depends on parameters whose values are unknown at optimization time. Such parameters can, for instance, represent the selectivity of query predicates that are not fully specified at optimization time but will be provided at execution time. Parametric query optimization therefore associates each query plan with a cost function that maps from a multi-dimensional parameter space to a one-dimensional cost space.
The goal of optimization is usually to generate all query plans that could be optimal for any of the possible parameter value combinations. This yields a set of relevant query plans. At run time, the best plan is selected out of that set once the true parameter values become known. The advantage of parametric query optimization is that optimization (which is in general a very expensive operation) is avoided at run time.
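A minimal Python sketch of the idea, under stated assumptions: there is a single parameter `s` (the selectivity of a predicate whose constant is bound only at execution time), the four plans and their linear cost functions are invented, and the relevant set is found by sampling `s` on a grid rather than by the exact parameter-space partitioning a real parametric optimizer would compute.

```python
# Hypothetical plans whose cost depends on one parameter s in [0, 1]: the
# selectivity of a predicate whose constant is supplied only at execution time.
PLANS = {
    "index_scan": lambda s: 5 + 1000 * s,
    "bitmap_scan": lambda s: 50 + 300 * s,
    "full_scan": lambda s: 200.0,
    "sort_then_scan": lambda s: 300 + 100 * s,  # never cheapest on [0, 1]
}


def relevant_plans(plans, samples=101):
    """Optimization time: keep each plan that is cheapest for some sampled s."""
    relevant = set()
    for i in range(samples):
        s = i / (samples - 1)
        relevant.add(min(plans, key=lambda name: plans[name](s)))
    return relevant


def pick_plan(plans, relevant, s):
    """Run time: s is now known, so only the precomputed plans are compared."""
    return min(relevant, key=lambda name: plans[name](s))


relevant = relevant_plans(PLANS)
print(sorted(relevant))  # ['bitmap_scan', 'full_scan', 'index_scan']
print(pick_plan(PLANS, relevant, 0.01), pick_plan(PLANS, relevant, 0.9))
```

The dominated `sort_then_scan` plan never enters the relevant set, so the run-time choice is a cheap comparison among three precomputed plans.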

Multi-Objective Query Optimization
There are often other cost metrics in addition to execution time that are relevant to compare query plans[1] (https://www.youtube.com/watch?v=EZ9FHvOJ0Ws). In a cloud computing scenario, for instance, one should compare query plans not only in terms of how much time they take to execute but also in terms of how much money their execution costs. Similarly, in the context of approximate query optimization, it is possible to execute query plans on randomly selected samples of the input data in order to obtain approximate results with reduced execution overhead. In such cases, alternative query plans must be compared not only in terms of their execution time but also in terms of the precision or reliability of the data they generate.
Multi-objective query optimization[3] models the cost of a query plan as a cost vector where each vector component represents cost according to a different cost metric. Classical query optimization can be considered as a special case of multi-objective query optimization where the dimension of the cost space (i.e., the number of cost vector components) is one.

Different cost metrics might conflict with each other (e.g., there might be one plan with minimal execution time and a different plan with minimal monetary execution fees in a cloud computing scenario). Therefore, the goal of optimization cannot be to find a query plan that minimizes all cost metrics but must be to find a query plan that realizes the best compromise between different cost metrics. What the best compromise is depends on user preferences (e.g., some users might prefer a cheaper plan while others prefer a faster plan in a cloud scenario). The goal of optimization is therefore either to find the best query plan based on some specification of user preferences provided as input to the optimizer (e.g., users can define weights between different cost metrics to express relative importance, or define hard cost bounds on certain metrics) or to generate an approximation of the set of Pareto-optimal query plans (i.e., plans that no other plan dominates, meaning no other plan is at least as good according to every metric and strictly better according to at least one) such that the user can select the preferred cost trade-off out of that plan set.
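The two ways of handling conflicting metrics can be sketched in Python for a hypothetical cloud scenario in which each plan has a cost vector of (execution time, monetary fee). The plan names, units, and numbers are made up; `pareto_frontier` keeps the plans that no other plan dominates, while `pick_weighted` and `pick_bounded` apply user preferences given as weights or as a hard bound on the fee.

```python
from typing import Dict, Optional, Tuple

Cost = Tuple[float, float]  # (execution time in seconds, fee in dollars): hypothetical units


def dominates(a: Cost, b: Cost) -> bool:
    """a is no worse than b in every metric and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_frontier(plans: Dict[str, Cost]) -> Dict[str, Cost]:
    """Plans that no other plan dominates."""
    return {name: c for name, c in plans.items()
            if not any(dominates(other, c) for other in plans.values())}


def pick_weighted(plans: Dict[str, Cost], weights: Tuple[float, float]) -> str:
    """User preference as weights: minimize the weighted sum of the metrics."""
    return min(plans, key=lambda n: sum(w * x for w, x in zip(weights, plans[n])))


def pick_bounded(plans: Dict[str, Cost], max_fee: float) -> Optional[str]:
    """User preference as a hard bound: fastest plan whose fee stays within max_fee."""
    feasible = [n for n, (_, fee) in plans.items() if fee <= max_fee]
    return min(feasible, key=lambda n: plans[n][0]) if feasible else None


plans = {
    "many_nodes": (10.0, 5.0),  # fast but expensive
    "few_nodes": (60.0, 1.0),   # slow but cheap
    "medium": (30.0, 2.0),
    "wasteful": (40.0, 3.0),    # dominated by "medium"
}
print(sorted(pareto_frontier(plans)))  # ['few_nodes', 'many_nodes', 'medium']
print(pick_weighted(plans, (1.0, 10.0)), pick_bounded(plans, max_fee=2.0))
```

With strictly positive weights, the weighted-sum choice always lands on a Pareto-optimal plan, so the frontier is the only set a user ever needs to choose from.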

Multi-Objective Parametric Query Optimization
Multi-objective parametric query optimization[1] generalizes parametric and multi-objective query optimization. Plans are compared according to multiple cost metrics, and plan costs may depend on parameters whose values are unknown at optimization time. The cost of a query plan is therefore modeled as a function from a multi-dimensional parameter space to a multi-dimensional cost space. The goal of optimization is to generate the set of query plans that can be optimal for each possible combination of parameter values and user preferences.
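A sketch combining the two previous ideas, again with invented plans: each plan's cost vector of (time, fee) depends on one parameter `s`, and the relevant set is approximated by keeping every plan that is Pareto-optimal for at least one sampled value of `s`. Grid sampling is a simplifying assumption, not the algorithm from the cited work.

```python
# Hypothetical plans: cost vector (time, fee) as a function of a parameter s in [0, 1].
PLAN_COSTS = {
    "index_plan": lambda s: (1 + 100 * s, 1.0),
    "scan_plan": lambda s: (40.0, 2.0),
    "parallel_scan": lambda s: (10.0, 8.0),
    "bad_plan": lambda s: (50.0, 9.0),  # dominated by scan_plan for every s
}


def dominates(a, b):
    """a is no worse than b in every metric and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_relevant_plans(plan_costs, samples=101):
    """Keep every plan that is Pareto-optimal for at least one sampled value of s."""
    relevant = set()
    for i in range(samples):
        s = i / (samples - 1)
        costs = {name: f(s) for name, f in plan_costs.items()}
        for name, c in costs.items():
            if not any(dominates(other, c) for other in costs.values()):
                relevant.add(name)
    return relevant


print(sorted(pareto_relevant_plans(PLAN_COSTS)))
# ['index_plan', 'parallel_scan', 'scan_plan']
```

`bad_plan` is dominated for every `s` and is dropped; at run time, once `s` is known, a user preference such as a weighting or a fee bound selects among the remaining plans.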

See also
Join selection factor
Sargable query

References
1. Trummer, Immanuel; Koch, Christoph (2015). "Multi-Objective Parametric Query Optimization". VLDB: 221–232.
2. Ioannidis, Yannis; Ng, Raymond T.; Shim, Kyuseok; Sellis, Timos K. (1997). "Parametric Query Optimization". VLDB: 132–151.
3. Trummer, Immanuel; Koch, Christoph (2014). Approximation Schemes for Many-Objective Query Optimization. SIGMOD. pp. 1299–1310.

Chaudhuri, Surajit (1998). "An Overview of Query Optimization in Relational Systems". Proceedings of the ACM Symposium on Principles of Database Systems. pp. 34–43. doi:10.1145/275487.275492.
Ioannidis, Yannis (March 1996). "Query optimization". ACM Computing Surveys. 28 (1): 121–123. doi:10.1145/234313.234367.
Selinger, P. G.; Astrahan, M. M.; Chamberlin, D. D.; Lorie, R. A.; Price, T. G. (1979). "Access Path Selection in a Relational Database Management System". Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data. pp. 23–34. doi:10.1145/582095.582099. ISBN 089791001X.

External links
SQL query optimization tool (http://sqltuning.com)
Talk on Multi-Objective Query Optimization (https://www.youtube.com/watch?v=EZ9FHvOJ0Ws)
Retrieved from "https://en.wikipedia.org/w/index.php?title=Query_optimization&oldid=733922708"
Categories: Database management systems | Database algorithms | SQL
This page was last modified on 10 August 2016, at 23:55.
Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply.

https://en.wikipedia.org/wiki/Query_optimization

5/5

You might also like