Tips For Writing SQL Query


1 Database Coding Guidelines

1.1 Overview

It is necessary to have coding guidelines and standards for any programming language used within a corporate environment. Database programming languages such as SQL and PL/SQL are no exception. Without guidelines, each developer uses a different coding style, which results in code that is hard to understand and hard to maintain. Beyond code maintenance, database coding style has a direct relationship to your application's performance: your database coding techniques and SQL program design affect it a great deal. A SQL program that runs for more than ten hours may be rewritten to run in one hour.

These guidelines focus on SQL coding against the Oracle RDBMS engine. They apply to both client-side (such as PowerBuilder coding) and server-side (PL/SQL, Pro*C) programming environments. They are organized into five components. Numerous examples are included to demonstrate good SQL coding style and program design.

Two general format rules are used in these guidelines and are also recommended to developers:

- Keywords in SQL or PL/SQL should be uppercase; all other items (except for case-sensitive selection criteria) should be lowercase. Comments should be in sentence case.
- All column names must be qualified with the table name in front of the column name. For example: SELECT table1.column1 instead of SELECT column1.

1.2 Optimizing SQL

It isn't hard to write SQL, but it is not that easy to write high-performance SQL. Many factors may affect your SQL performance: the database server configuration, the database design, the indexes, the Oracle optimizer, and your SQL statements. This section discusses how to write SQL for better performance and how to tune your SQL using Oracle's utilities, EXPLAIN PLAN and SQL Trace/TKPROF.

1.2.1 Index Usage

Index usage affects the performance of your SQL statements, so understanding it will help you write good SQL.

Indexes provide a faster path of access to the data in the database by using pointers to the data. Indexes can be useful for avoiding table scans, establishing upper/lower bounds for reading data, avoiding a sort, and perhaps avoiding table access altogether. Indexes should be chosen based on the kinds of queries you will be running. In the GECFS environment, application developers first identify the characteristics of the most frequent queries, then follow the guidelines below to identify the candidates for index creation, and then work with the project Application DBA to create the proper indexes. You may find yourself setting up indexes during program development and going through a process of investigating their usage, dropping some, and perhaps setting up new ones.

TCS INTERNAL

The big question is: which columns should be indexed? In general, the guidelines for creating indexes are:

- Columns that are most frequently used in the WHERE clause, and used in a particular order, should be chosen as index columns. This is usually the primary key of the table.
- Columns that are used in joins, ORDER BY, and GROUP BY should be indexed.
- Columns that are accessed sequentially or by ranges in the WHERE clause should be indexed.
- Columns with low selectivity should be indexed. Selectivity here is the percentage of rows in a table that have the same value, so a low percentage means many distinct values. To figure out a column's selectivity, you have to know your data.
- Indexes defined on multiple columns will only be useful if the first column(s) in the index are specified in the WHERE clause.

For example, a unique index defined on the columns (column1, column2, column3) would only be useful if the WHERE clause contained column1. If the WHERE clause contained column2 without column1, the index would not be used. For an index to be used in that case, a secondary index would have to be created on column2, or the order of the composite index would have to be changed so that column2 is the first indexed column.

There is a trade-off between SELECT performance and INSERT, DELETE, and UPDATE performance with indexes, because indexes are updated for every insert, update, and delete performed. Weigh the overhead of an additional index against accessing the data through an existing index.

Do not index columns that are always wrapped in functions (e.g., FLOOR, ABS) or in string manipulation and conversion (e.g., SUBSTR or TO_CHAR), even if those columns contain a wide range of values.
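The leading-column rule for composite indexes can be tried out in a small experiment. The sketch below is a stand-in under stated assumptions, not Oracle itself: it uses Python's built-in sqlite3 module, a hypothetical table t with an invented payload column, and SQLite's EXPLAIN QUERY PLAN in place of Oracle's EXPLAIN PLAN. SQLite's optimizer differs from Oracle's, so read it only as an illustration of the principle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: column1..column3 mirror the example above; payload is invented.
conn.execute(
    "CREATE TABLE t (column1 INTEGER, column2 INTEGER, column3 INTEGER, payload TEXT)"
)
conn.execute("CREATE UNIQUE INDEX t_idx ON t (column1, column2, column3)")


def plan(sql):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last field is the detail text


# The leading column of the index is constrained: the index can be searched.
leading = plan("SELECT payload FROM t WHERE column1 = 1")
# Only the second column is constrained: the table is scanned instead.
non_leading = plan("SELECT payload FROM t WHERE column2 = 1")

print(leading)
print(non_leading)
```

On Oracle the equivalent check would be made with EXPLAIN PLAN against the real tables; the exact plan text differs between engines and versions.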


1.2.2 Tips for SELECT Statement

SELECT is the most widely used SQL statement. It is used for retrieving data from the database. Programmers need to keep the following tips in mind when writing SELECT statements in order to achieve the best performance at run time.

1.2.2.1 Avoid the OR operator

In some cases, Oracle will not use an index if the condition in the WHERE clause contains the OR operator. Such OR conditions can be rewritten in two ways: in certain circumstances we can replace the condition with one containing an IN operator, or we can replace the whole statement with two SELECT statements linked with UNION. For example,

    SELECT customer.customer_name
    FROM customer
    WHERE customer.customer_id = 1002
       OR customer.customer_id = 1004
       OR customer.customer_id = 100?

If there is an index on customer_id, Oracle will use it. But Oracle will choose a more efficient path if we rewrite the statement as:

    SELECT customer.customer_name
    FROM customer
    WHERE customer.customer_id IN (1002, 1004, 100?)

The same applies to UPDATE and DELETE statements. Another example,

    SELECT customer.customer_name, customer.customer_address
    FROM customer
    WHERE customer.service_date = '10-JUL-??'
       OR customer.city = 'Minneapolis'

Oracle will use a sequential search path even if indexes are present on service_date and city. We cannot rewrite this example using IN, but we can rewrite it using UNION:

    SELECT customer.customer_name, customer.customer_address
    FROM customer
    WHERE customer.service_date = '10-JUL-??'
    UNION
    SELECT customer.customer_name, customer.customer_address
    FROM customer
    WHERE customer.city = 'Minneapolis'
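Before swapping OR for UNION it helps to see how the two forms relate. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for Oracle and a handful of invented rows and ISO-style date strings. It compares results only and says nothing about Oracle's access path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_name TEXT, customer_address TEXT, "
    "service_date TEXT, city TEXT)"
)
# Invented sample rows; the dates are hypothetical.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        ("Ann", "1 Elm St", "2000-07-10", "Minneapolis"),  # matches both conditions
        ("Bob", "2 Oak St", "2000-07-10", "Chicago"),      # matches the date only
        ("Cy", "3 Ash St", "2000-01-01", "Minneapolis"),   # matches the city only
        ("Dee", "4 Fir St", "2000-01-01", "Chicago"),      # matches neither
        ("Bob", "2 Oak St", "2000-07-10", "Chicago"),      # exact duplicate row
    ],
)

or_rows = conn.execute(
    "SELECT customer_name, customer_address FROM customer "
    "WHERE service_date = '2000-07-10' OR city = 'Minneapolis'"
).fetchall()

union_rows = conn.execute(
    "SELECT customer_name, customer_address FROM customer "
    "WHERE service_date = '2000-07-10' "
    "UNION "
    "SELECT customer_name, customer_address FROM customer "
    "WHERE city = 'Minneapolis'"
).fetchall()

print(len(or_rows), len(union_rows))
```

The two queries select the same distinct rows, but the OR form keeps the duplicate row while UNION collapses it, so the rewrite is only a drop-in replacement when duplicates cannot occur or do not matter.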

UPDATE and DELETE statements like this cannot be adapted using UNION. In such cases, two separate statements are necessary. For example,

    UPDATE customer
    SET customer.service_fee = 200
    WHERE customer.service_fee = 0
       OR customer.service_date = '20-JUL-??'

Another formulation:

    UPDATE customer
    SET customer.service_fee = 200
    WHERE customer.service_fee = 0

and

    UPDATE customer
    SET customer.service_fee = 200
    WHERE customer.service_date = '20-JUL-??'

Note: UNION automatically removes duplicate rows (that is, DISTINCT is assumed).

1.2.2.2 Use the BETWEEN operator

If a WHERE clause specifies values in a particular range using the AND operator, Oracle will generally not use an index efficiently. In such cases, we can replace the AND operator with the BETWEEN operator. For example,

    SELECT customer.customer_name
    FROM customer
    WHERE customer.service_date >= '10-JUL-??'
      AND customer.service_date <= '31-DEC-??'

An index on service_date will not be used here, while it will be considered for the next statement:

    SELECT customer.customer_name
    FROM customer
    WHERE customer.service_date BETWEEN '10-JUL-??' AND '31-DEC-??'

1.2.2.3 Avoid particular forms of the LIKE operator

If the mask in the LIKE operator begins with a percent sign or an underscore character, the index cannot be used. For example,

    SELECT customer.customer_name, customer.customer_id
    FROM customer
    WHERE customer.customer_name LIKE '%N'


The index will not be used, but unfortunately there is no alternative solution for this example.

1.2.2.4 Avoid the HAVING clause, use WHERE

Always try to place as many conditions as possible in the WHERE clause and as few as possible in the HAVING clause. The reason is that indexes are not used for conditions specified in the HAVING clause. For example,

    SELECT a.contract_id, COUNT(*)
    FROM contract a
    GROUP BY a.contract_id
    HAVING a.contract_id >= ?0

The condition in HAVING can be moved to WHERE:

    SELECT a.contract_id, COUNT(*)
    FROM contract a
    WHERE a.contract_id >= ?0
    GROUP BY a.contract_id

But not every condition can be moved to the WHERE clause: a condition on an aggregate such as COUNT(*) has to stay in HAVING.
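The HAVING-to-WHERE move can be checked on a toy data set. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for Oracle; the line_no column, the sample rows, and the THRESHOLD value are all hypothetical. It confirms that both forms return the same groups and shows one condition that cannot be moved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: several rows per contract_id so that COUNT(*) varies.
conn.execute("CREATE TABLE contract (contract_id INTEGER, line_no INTEGER)")
conn.executemany(
    "INSERT INTO contract VALUES (?, ?)",
    [(10, 1), (10, 2), (60, 1), (70, 1), (70, 2), (70, 3)],
)

THRESHOLD = 50  # hypothetical cut-off value

having_rows = conn.execute(
    "SELECT a.contract_id, COUNT(*) FROM contract a "
    "GROUP BY a.contract_id HAVING a.contract_id >= ? "
    "ORDER BY a.contract_id",
    (THRESHOLD,),
).fetchall()

where_rows = conn.execute(
    "SELECT a.contract_id, COUNT(*) FROM contract a "
    "WHERE a.contract_id >= ? "
    "GROUP BY a.contract_id ORDER BY a.contract_id",
    (THRESHOLD,),
).fetchall()

# A condition on an aggregate has no WHERE equivalent and stays in HAVING.
busy_rows = conn.execute(
    "SELECT a.contract_id, COUNT(*) FROM contract a "
    "GROUP BY a.contract_id HAVING COUNT(*) > 1 "
    "ORDER BY a.contract_id"
).fetchall()

print(having_rows, where_rows, busy_rows)
```

Filtering in WHERE discards rows before grouping, which is why the rewrite is safe only for conditions on the grouping columns themselves.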

1.2.2.5 Using the INTERSECT and MINUS operators

Oracle's optimizer processes the set operators INTERSECT and MINUS efficiently, so use these operators where they fit the requirement. For example,

    SELECT DISTINCT contract.customer_id
    FROM contract
    WHERE contract.customer_id IN (SELECT customer.customer_id
                                   FROM customer)

It is better to rewrite it as:

    SELECT contract.customer_id
    FROM contract
    INTERSECT
    SELECT customer.customer_id
    FROM customer
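That the DISTINCT ... IN form and the INTERSECT form select the same values can be confirmed on sample data. The sketch below assumes Python's built-in sqlite3 module as a stand-in for Oracle, plus invented rows and an invented contract_id column. It checks results only, not the optimizer behavior described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE contract (contract_id INTEGER PRIMARY KEY, customer_id INTEGER)"
)
conn.executemany("INSERT INTO customer VALUES (?)", [(1,), (2,), (3,)])
# Customer 1 has two contracts; customer 9 does not exist in the customer table.
conn.executemany(
    "INSERT INTO contract VALUES (?, ?)",
    [(100, 1), (101, 1), (102, 2), (103, 9)],
)

in_rows = sorted(conn.execute(
    "SELECT DISTINCT contract.customer_id FROM contract "
    "WHERE contract.customer_id IN (SELECT customer.customer_id FROM customer)"
).fetchall())

intersect_rows = sorted(conn.execute(
    "SELECT contract.customer_id FROM contract "
    "INTERSECT "
    "SELECT customer.customer_id FROM customer"
).fetchall())

print(in_rows, intersect_rows)
```

INTERSECT removes duplicates on its own, which is why the DISTINCT keyword disappears in the rewritten statement.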

1.2.2.6 Avoid DISTINCT

Specifying DISTINCT in the SELECT clause removes duplicate rows from the result, which may have a negative effect on processing time. Avoid DISTINCT when it is not required or is superfluous:

- When the SELECT clause includes at least one unique key for each table specified in the FROM clause, DISTINCT is superfluous.
- When the SELECT clause results in a single row of values, DISTINCT is superfluous.


For example,

    SELECT DISTINCT a.contract_id, c.customer_name
    FROM contract a, customer c
    WHERE a.customer_id = c.customer_id

DISTINCT is unnecessary here, because the SELECT clause contains the primary key of the contract table and the WHERE clause contains a condition on the primary key of the customer table.

1.2.2.7 Avoid data type conversions

Oracle automatically converts data types for you, but the conversion affects performance, so try to avoid it. For example, a WHERE clause that compares customer_id with a quoted string literal forces a conversion when customer_id is defined as NUMBER in the DDL.

1.2.2.8 The smallest table as last

When you join tables, specify the table with the smallest number of qualifying rows last in the FROM clause. For example,

    SELECT c.customer_id, c.customer_name, t.customer_type_ind
    FROM customer_type t, customer c
    WHERE c.customer_type_code = t.customer_type_code

This FROM clause should be replaced by the following, because the customer_type table has fewer rows:

    FROM customer c, customer_type t

But if there is a condition that restricts the number of rows from the customer table, you should specify the customer_type table first (see below), because customer then becomes the table with the smallest number of qualifying rows.

    SELECT c.customer_id, c.customer_name, t.customer_type_ind
    FROM customer_type t, customer c
    WHERE c.customer_type_code = t.customer_type_code
      AND c.customer_id = 102

1.2.2.9 The ROWID

The fastest SELECT statements are those where the WHERE clause contains a condition based on the so-called ROWID.

1.2.2.10 The ROWNUM

Take advantage of ROWNUM. ROWNUM is a special pseudo-column that exists for every result set. It refers to the relative row number for a given query, before any ORDER BY clause is applied. It is quite useful for limiting the number of rows returned. For example,

    SELECT COUNT(*)
    FROM customer
    WHERE ROWNUM < 100

Oracle will read at most the first 99 rows and then the query will halt. A name search on a large table, such as one selecting WHERE customer_name LIKE 'S%', could easily return 100,000 rows or more. You can add this row-limit qualifier to end the search when the upper limit is reached:

    SELECT c.customer_name, c.customer_address
    FROM customer c
    WHERE c.customer_name LIKE 'S%'
      AND ROWNUM < 1000

This statement will return no more than 999 rows. More importantly, the query returns as soon as the upper limit is reached, before executing any sorts.

1.2.2.11 Counting Rows from Tables

If you want to count the rows in table CUSTOMER, there are three ways to do it:

    SQL 1: SELECT COUNT(customer_id) FROM customer  /* customer_id is unique index column */
    SQL 2: SELECT COUNT(*) FROM customer
    SQL 3: SELECT COUNT(1) FROM customer

SQL 1 is the fastest, SQL 2 is the second fastest, and SQL 3 is the slowest. So, when counting rows, try to use an indexed column.

1.2.2.12 Using Table Aliases

Use table aliases, and prefix all column names with their aliases wherever more than one table is involved in a query. This reduces parse time and prevents syntax errors from occurring when ambiguously named columns are added later on.
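One point about the row-counting forms in 1.2.2.11 is worth checking before choosing a column: COUNT(column) counts only rows where that column is not NULL, so it matches COUNT(*) only for columns that never contain NULL, such as a primary key. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for Oracle and three invented rows (the NULL handling of COUNT is standard SQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: customer_id is never NULL, customer_address may be.
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_address TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "1 Elm St"), (2, None), (3, "3 Ash St")],
)


def count(expr):
    """Run SELECT COUNT(<expr>) FROM customer and return the number."""
    return conn.execute(f"SELECT COUNT({expr}) FROM customer").fetchone()[0]


count_star = count("*")                     # every row
count_key = count("customer_id")            # same, because the key is never NULL
count_nullable = count("customer_address")  # rows with a NULL address are skipped

print(count_star, count_key, count_nullable)
```

Counting on a nullable column therefore changes the answer, not just the speed.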

1.2.2.13 Using NOT EXISTS in Place of NOT IN

In sub-query statements such as the following, the NOT IN clause causes an internal sort/merge.

    SELECT c.customer_name
    FROM customer c
    WHERE c.customer_type_code NOT IN (SELECT t.customer_type_code
                                       FROM customer_type t
                                       WHERE t.customer_type_ind = 'A')

To improve performance, replace this code with:

    SELECT c.customer_name
    FROM customer c
    WHERE NOT EXISTS (SELECT 'X'
                      FROM customer_type t
                      WHERE t.customer_type_code = c.customer_type_code
                        AND t.customer_type_ind = 'A')
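The NOT IN and NOT EXISTS forms are interchangeable only while the compared columns contain no NULLs. The sketch below assumes Python's built-in sqlite3 module as a stand-in for Oracle and a few invented rows. It runs both statements before and after a customer with a NULL type code is added (the three-valued logic involved is standard SQL, not specific to SQLite).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_type (customer_type_code TEXT, customer_type_ind TEXT)"
)
conn.execute("CREATE TABLE customer (customer_name TEXT, customer_type_code TEXT)")
conn.executemany("INSERT INTO customer_type VALUES (?, ?)", [("R", "A"), ("W", "B")])
conn.executemany("INSERT INTO customer VALUES (?, ?)", [("Ann", "R"), ("Bob", "W")])

NOT_IN = (
    "SELECT c.customer_name FROM customer c "
    "WHERE c.customer_type_code NOT IN "
    "(SELECT t.customer_type_code FROM customer_type t "
    " WHERE t.customer_type_ind = 'A') ORDER BY 1"
)
NOT_EXISTS = (
    "SELECT c.customer_name FROM customer c "
    "WHERE NOT EXISTS "
    "(SELECT 'X' FROM customer_type t "
    " WHERE t.customer_type_code = c.customer_type_code "
    "   AND t.customer_type_ind = 'A') ORDER BY 1"
)


def names(sql):
    """Return the customer names selected by a statement."""
    return [row[0] for row in conn.execute(sql)]


before = (names(NOT_IN), names(NOT_EXISTS))
conn.execute("INSERT INTO customer VALUES ('Cy', NULL)")  # type code unknown
after = (names(NOT_IN), names(NOT_EXISTS))

print(before, after)
```

With a NULL type code, NOT IN evaluates to unknown and drops the row, while NOT EXISTS keeps it, so check the nullability of the column before applying the rewrite.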

1.2.2.14 Using Joins in Place of EXISTS

In general, join tables rather than specifying sub-queries for them, such as the following:

    SELECT c.customer_name
    FROM customer c
    WHERE EXISTS (SELECT 'X'
                  FROM customer_type t
                  WHERE c.customer_type_code = t.customer_type_code
                    AND t.customer_type_ind = 'A')

To improve performance, specify:

    SELECT c.customer_name
    FROM customer_type t, customer c
    WHERE t.customer_type_code = c.customer_type_code
      AND t.customer_type_ind = 'A'

1.2.3 Using EXISTS in Place of DISTINCT

Avoid joins that require DISTINCT on the SELECT list when you write a query to determine information at the owner end of a one-to-many relationship (e.g., departments that have employees). An example of such a query is shown below:

    SELECT DISTINCT t.customer_type_code, t.customer_type_name
    FROM customer_type t, customer c
    WHERE t.customer_type_code = c.customer_type_code

EXISTS is a faster alternative, because the sub-query stops as soon as it has been satisfied once.

    SELECT t.customer_type_code, t.customer_type_name
    FROM customer_type t
    WHERE EXISTS (SELECT 'X'
                  FROM customer c
                  WHERE c.customer_type_code = t.customer_type_code)
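The DISTINCT join and the EXISTS form can be compared on sample data to confirm that they list the same owner rows. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for Oracle and invented customer types and customers. It checks result equivalence only, not the speed claim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_type "
    "(customer_type_code TEXT PRIMARY KEY, customer_type_name TEXT)"
)
conn.execute("CREATE TABLE customer (customer_name TEXT, customer_type_code TEXT)")
conn.executemany(
    "INSERT INTO customer_type VALUES (?, ?)",
    [("R", "Retail"), ("W", "Wholesale"), ("G", "Government")],
)
# Two retail customers, one wholesale customer, no government customers.
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Ann", "R"), ("Bob", "R"), ("Cy", "W")],
)

distinct_rows = sorted(conn.execute(
    "SELECT DISTINCT t.customer_type_code, t.customer_type_name "
    "FROM customer_type t, customer c "
    "WHERE t.customer_type_code = c.customer_type_code"
).fetchall())

exists_rows = sorted(conn.execute(
    "SELECT t.customer_type_code, t.customer_type_name "
    "FROM customer_type t "
    "WHERE EXISTS (SELECT 'X' FROM customer c "
    "              WHERE c.customer_type_code = t.customer_type_code)"
).fetchall())

print(distinct_rows, exists_rows)
```

The join produces one row per customer and needs DISTINCT to collapse them; the EXISTS form visits each customer type once and never produces the duplicates in the first place.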

1.2.3.1 Avoid Calculations on Indexed Columns

In general, avoid doing calculations on indexed columns. When the optimizer encounters a calculation on an indexed column, it will not use the index and will perform a full-table scan instead. In this example, the optimizer does not use the index:

    SELECT customer.customer_name
    FROM customer
    WHERE customer.service_fee * 10 > 2?000

Instead, code it this way:

    SELECT customer.customer_name
    FROM customer
    WHERE customer.service_fee > 2?000 / 10
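The effect of moving the arithmetic off the indexed column can be observed in a query plan. The sketch below is a stand-in under stated assumptions: it uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN rather than Oracle's EXPLAIN PLAN, and the index name, sample rows, and the 25000 threshold are hypothetical. SQLite's optimizer is not Oracle's, but it shows the same pattern here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_name TEXT, service_fee INTEGER)")
conn.execute("CREATE INDEX customer_fee_idx ON customer (service_fee)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Ann", 1000), ("Bob", 2500), ("Cy", 4000)],
)

# Hypothetical threshold; the arithmetic sits on the column in the first form
# and on the constant in the second.
CALC_ON_COLUMN = "SELECT customer_name FROM customer WHERE service_fee * 10 > 25000"
CALC_ON_CONSTANT = "SELECT customer_name FROM customer WHERE service_fee > 25000 / 10"


def plan(sql):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last field is the detail text


def names(sql):
    """Return the sorted customer names selected by a statement."""
    return sorted(row[0] for row in conn.execute(sql))


plan_column = plan(CALC_ON_COLUMN)
plan_constant = plan(CALC_ON_CONSTANT)
same_result = names(CALC_ON_COLUMN) == names(CALC_ON_CONSTANT)

print(plan_column)
print(plan_constant)
```

Both statements return the same rows, but only the second leaves the bare column on one side of the comparison, which is what lets the index be searched.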

Please note that the SQL functions MIN and MAX are exceptions to this rule and will utilize all available indexes.

1.2.3.2 Avoid NOT (!=, <>) on Indexed Columns

Avoid NOT on indexed columns, because it turns off the index and causes a full-table scan. For example,

Do not use (the index is turned off):

    SELECT customer.customer_name
    FROM customer
    WHERE customer.customer_id != 0

Use (the index can be used):

    SELECT customer.customer_name
    FROM customer
    WHERE customer.customer_id > 0

1.2.3.3 Avoid Null in Indexes

Avoid using any column that contains nulls as part of an index, because a null can never be equated or compared. Oracle can never use an index to locate rows via a predicate such as IS NULL or IS NOT NULL. For example, the index will be used if you do this:

    SELECT customer.customer_name
    FROM customer
    WHERE customer.customer_id > 0


But the index will not be used if you do this:

    SELECT customer.customer_name
    FROM customer
    WHERE customer.customer_id IS NOT NULL;

1.2.3.4 Using WHERE Instead of ORDER BY

ORDER BY clauses use an index only if they meet two rigid requirements:

1. All of the columns that make up the ORDER BY clause must be contained within a single index, in the same sequence.
2. All of the columns that make up the ORDER BY clause must be defined as NOT NULL within the table definition.

These two requirements tend to rule out most indexes. In some cases, you can rewrite the SQL by adding a dummy WHERE clause instead of the ORDER BY to trick the optimizer into using the correct index. For example, table DEPT is defined as below; note that DEPT_TYPE is defined as allowing NULL, and there is an index over this column.

Table DEPT

    DEPT_CODE         PK   NOT NULL
    DEPT_DESCRIPTION       NOT NULL
    DEPT_TYPE              NULL

    NON UNIQUE INDEX (DEPT_TYPE)

In the following statement, the index will be used:

    SELECT ... FROM DEPT WHERE DEPT_TYPE > 0

In the following statement, the index will not be used:

    SELECT ... FROM DEPT ORDER BY DEPT_TYPE

    Explain Plan Query Plan
    ----------------------------------------
    Sort Order By
      Table Access FULL


1.2.3.5 Other: Beware of the WHEREs

1) The SUBSTR function disables the index when it is used over an indexed column:

Do Not Use:

    SELECT customer.customer_name, customer.customer_id
    FROM customer
    WHERE SUBSTR(customer.customer_name, 1, 7) = 'CAPITAL'

Use:

    SELECT customer.customer_name, customer.customer_id
    FROM customer
    WHERE customer.customer_name LIKE 'CAPITAL%'

1.2.3.6 2) Function TRUNC on the left side disables the index

Do Not Use:

    SELECT customer.customer_name, customer.service_date
    FROM customer
    WHERE TRUNC(customer.service_date) = TRUNC(SYSDATE)

Use:

    SELECT customer.customer_name, customer.service_date
    FROM customer
    WHERE customer.service_date BETWEEN TRUNC(SYSDATE)
                                    AND TRUNC(SYSDATE) + .99999

1.2.3.7 3) Do not use ||, it disables the index

Do Not Use:

    SELECT customer.customer_name, customer.service_date
    FROM customer
    WHERE customer.customer_name || customer.customer_type = 'AMEX'

Use:

    SELECT customer.customer_name, customer.service_date
    FROM customer
    WHERE customer.customer_name = 'AME'
      AND customer.customer_type = 'X'

1.2.3.8 4) Do not use the same column name on both sides, which disables the index

Do Not Use:

    SELECT customer.customer_name, customer.service_date
    FROM customer
    WHERE customer.customer_name = NVL(:cust_name, customer.customer_name)

Use:

    SELECT customer.customer_name, customer.service_date
    FROM customer
    WHERE customer.customer_name LIKE NVL(:cust_name,

1.2.3.9 Cancel Query

You should enable your users to cancel queries. When a query returns multiple rows, many client tools are able to display rows as they are returned. If a user finds the desired row on the first page, he or she can cancel the query rather than waiting for the entire result set. This can save considerable network traffic and database I/O.
