
InfoWorld.com Deep Dive Series

17 ways to speed
your SQL queries
It’s easy to create database code that slows down query results or
ties up the database unnecessarily -- unless you follow these tips.
BY SEAN MCCOWN

SQL developers on every platform are struggling, seemingly stuck in a DO WHILE
loop that makes them repeat the same
mistakes again and again. That’s because the
database field is still relatively immature. Sure,
vendors are making some strides, but they
continue to grapple with the bigger issues.
Concurrency, resource management, space
management, and speed still plague SQL
developers whether they’re coding on SQL
Server, Oracle, DB2, Sybase, MySQL, or any
other relational platform.
Part of the problem is that there is no
magic bullet, and for almost every best
practice, I can show you at least one
exception. Typically, a developer finds
his or her own favorite methods
-- though usually they don’t
include any constructs for
performance or concurrency
-- and doesn’t bother
exploring other options.
Maybe that’s a symptom
of lack of education, or the developers
are just too close to the process to recognize
when they’re doing something wrong. Maybe
the query runs well on a local set of test data
but fails miserably on the production system.
I don’t expect SQL developers to become
administrators, but they must take production
issues into account when writing their code.
If they don’t do it during initial development,
the DBAs will just make them go back and do
it later -- and the users suffer in the interim.
There’s a reason why we say tuning a
database is both an art and a science. It’s
because very few hard-and-fast rules exist that apply across the board. The
problems you’ve solved on one system
aren’t issues on another, and vice versa.
There’s no right answer when it comes to
tuning queries, but that doesn’t mean
you should give up.
There are some good principles
you can follow that should yield
results in one combination or
another. I’ve encapsulated them
in a list of SQL dos and don’ts
that often get overlooked or are hard to
spot. These techniques should give you a
little more insight into the minds of your
DBAs, as well as the ability to start thinking of
processes in a production-oriented way.

1 Don’t use UPDATE instead


of CASE
This issue is very common, and though
it’s not hard to spot, many developers often
overlook it because using UPDATE has a natural
copy someone else’s code because you know
it pulls the data you need. The problem is that
quite often it pulls much more data than you
flow that seems logical. need, and developers rarely bother trimming it
Take this scenario, for instance: You’re down, so they end up with a huge superset of
inserting data into a temp table and need it to data. This usually comes in the form of an extra
display a certain value if another value exists. outer join or an extra condition in the WHERE
Maybe you’re pulling from the Customer table clause. You can get huge performance gains if
and you want anyone with more than $100,000 you trim reused code to your exact needs.
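As a rough sketch of that scenario, the label can be computed while the rows are loaded; the Customer table and CustomerRank column come from the example above, while the TotalOrders column is assumed for illustration:

-- Set CustomerRank inline with CASE while loading the temp table
SELECT CustomerID,
       CustomerName,
       TotalOrders,
       CASE WHEN TotalOrders > 100000 THEN 'Preferred' ELSE 'Standard' END AS CustomerRank
INTO #Customer
FROM dbo.Customer;

Because the rank is written correctly the first time, there is no second, fully logged pass over the table.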

2. Don't blindly reuse code

This issue is also very common. It's very easy to copy someone else's code because you know it pulls the data you need. The problem is that quite often it pulls much more data than you need, and developers rarely bother trimming it down, so they end up with a huge superset of data. This usually comes in the form of an extra outer join or an extra condition in the WHERE clause. You can get huge performance gains if you trim reused code to your exact needs.

3. Do pull only the number of columns you need

This issue is similar to issue No. 2, but it's specific to columns. It's all too easy to code all your queries with SELECT * instead of listing the columns individually. The problem again is that it pulls more data than you need. I've seen this error dozens and dozens of times. A developer does a SELECT * query against a table with 120 columns and millions of rows, but winds up using only three to five of them. At that point, you're processing so much more data than you need it's a wonder the query returns at all. You're not only processing more data than you need, but you're also taking resources away from other processes.
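The fix is mechanical; a quick sketch with made-up names:

-- Pulls all 120 columns across the wire
SELECT * FROM dbo.Orders;

-- Pulls only what the caller actually uses
SELECT OrderID, CustomerID, OrderDate FROM dbo.Orders;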

4 Don’t double-dip
Here’s another one I’ve seen more
times than I should have: A stored proce-
dure is written to pull data from a table with
5 Do know when to use
temp tables
This issue is a bit harder to get a handle on, but
it can yield impressive gains. You can use temp
hundreds of millions of rows. The developer tables in a number of situations, such as keeping
needs customers who live in California and have you from double-dipping into large tables.
incomes of more than $40,000. So he queries You can also use them to greatly decrease the
for customers that live in California and puts processing power required to join large tables.
the results into a temp table; then he queries If you must join a table to a large table and there’s
for customers with incomes above $40,000 a condition on that large table, you can improve
and puts those results into another temp table. performance by pulling out the subset of data you
Finally, he joins both tables to get the final need from the large table into a temp table and
product. joining with that instead. This is also helpful (again)
Are you kidding me? This should be done in if you have several queries in the procedure that
Don’t be a a single query; instead, you’re double-dipping a have to make similar joins to the same table.
moron: Query

6
superlarge table. Don’t be a moron: Query large
large tables tables only once whenever possible -- you’ll find
only once how much better your procedures perform.
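The single-pass version might look roughly like this (State and Income are hypothetical column names):

-- One trip over the large table, with both filters applied at once
SELECT CustomerID, CustomerName, State, Income
INTO #TargetCustomers
FROM dbo.Customers
WHERE State = 'CA'
  AND Income > 40000;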
5. Do know when to use temp tables

This issue is a bit harder to get a handle on, but it can yield impressive gains. You can use temp tables in a number of situations, such as keeping you from double-dipping into large tables. You can also use them to greatly decrease the processing power required to join large tables. If you must join a table to a large table and there's a condition on that large table, you can improve performance by pulling out the subset of data you need from the large table into a temp table and joining with that instead. This is also helpful (again) if you have several queries in the procedure that have to make similar joins to the same table.
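A sketch of the pattern, with invented table and column names:

-- Pull just the slice of the large table you need, once
SELECT OrderID, CustomerID, OrderTotal
INTO #RecentOrders
FROM dbo.BigOrders
WHERE OrderDate >= '20250101';

-- Subsequent joins in the procedure hit the small temp table instead
SELECT c.CustomerName, o.OrderTotal
FROM dbo.Customers AS c
JOIN #RecentOrders AS o ON o.CustomerID = c.CustomerID;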
6. Do pre-stage data

This is one of my favorite topics because it's an old technique that's often overlooked. If you have a report or a procedure (or better yet, a set of them) that will do similar joins to large tables, it can be a benefit for you to pre-stage the data by joining the tables ahead of time and persisting them into a table. Now the reports can run against that pre-staged table and avoid the large join.

You're not always able to use this technique, but when you can, you'll find it is an excellent way to save server resources.

Note that many developers get around this join problem by concentrating on the query itself and creating a view around the join so that they don't have to type the join conditions again and again. But the problem with this approach is that the query still runs for every report that needs it. By pre-staging the data, you run the join just once (say, 10 minutes before the reports) and everyone else avoids the big join. I can't tell you how much I love this technique; in most environments, there are popular tables that get joined all the time, so there's no reason why they can't be pre-staged.
7. Do delete and update in batches

Here's another easy technique that gets overlooked a lot. Deleting or updating large amounts of data from huge tables can be a nightmare if you don't do it right. The problem is that both of these statements run as a single transaction, and if you need to kill them or if something happens to the system while they're working, the system has to roll back the entire transaction. This can take a very long time. These operations can also block other transactions for their duration, essentially bottlenecking the system.

The solution is to do deletes or updates in smaller batches. This solves your problem in a couple ways. First, if the transaction gets killed for whatever reason, it only has a small number of rows to roll back, so the database returns online much quicker. Second, while the smaller batches are committing to disk, others can sneak in and do some work, so concurrency is greatly enhanced.

Along these lines, many developers have it stuck in their heads that these delete and update operations must be completed the same day. That's not always true, especially if you're archiving. You can stretch that operation out as long as you need to, and the smaller batches help accomplish that. If you can take longer to do these intensive operations, spend the extra time and don't bring your system down.
8. Do use temp tables to improve cursor performance

I hope we all know by now that it's best to stay away from cursors if at all possible. Cursors not only suffer from speed problems, which in itself can be an issue with many operations, but they can also cause your operation to block other operations for a lot longer than is necessary. This greatly decreases concurrency in your system.

However, you can't always avoid using cursors, and when those times arise, you may be able to get away from cursor-induced performance issues by doing the cursor operations against a temp table instead. Take, for example, a cursor that goes through a table and updates a couple of columns based on some comparison results. Instead of doing the comparison against the live table, you may be able to put that data into a temp table and do the comparison against that instead. Then you have a single UPDATE statement against the live table that's much smaller and holds locks only for a short time.

Sniping your data modifications like this can greatly increase concurrency. I'll finish by saying you almost never need to use a cursor. There's almost always a set-based solution; you need to learn to see it.
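Here's a rough sketch of that approach, with invented names: stage the comparison results in a temp table, then apply them to the live table in one short, set-based UPDATE.

-- Work out the new values against a copy of the data, not the live table
SELECT CustomerID,
       CASE WHEN LifetimeSpend > 100000 THEN 'Preferred' ELSE 'Standard' END AS NewRank
INTO #RankChanges
FROM dbo.CustomerStats;

-- A single UPDATE against the live table holds its locks only briefly
UPDATE c
SET    c.CustomerRank = r.NewRank
FROM   dbo.Customers AS c
JOIN   #RankChanges AS r ON r.CustomerID = c.CustomerID;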
9. Don't nest views

Views can be convenient, but you need to be careful when using them. While views can help to obscure large queries from users and to standardize data access, you can easily find yourself in a situation where you have views that call views that call views that call views. This is called nesting views, and it can cause severe performance issues, particularly in two ways. First, you will very likely have much more data coming back than you need. Second, the query optimizer will give up and return a bad query plan.

I once had a client that loved nesting views. The client had one view it used for almost everything because it had two important joins. The problem was that the view returned a column with 2MB documents in it. Some of the documents were even larger. The client was pushing at least an extra 2MB across the network for every single row in almost every single query it ran. Naturally, query performance was abysmal. And none of the queries actually used that column! Of course, the column was buried seven views deep, so even finding it was difficult. When I removed the document column from the view, the time for the biggest query went from 2.5 hours to 10 minutes. When I finally unraveled the nested views, which had several unnecessary joins and columns, and wrote a plain query, the time for that same query dropped to subseconds.
10. Do use table-valued functions

This is one of my favorite tricks of all time because it is truly one of those hidden secrets that only the experts know. When you use a scalar function in the SELECT list of a query, the function gets called for every single row in the result set. This can reduce the performance of large queries by a significant amount. However, you can greatly improve the performance by converting the scalar function to a table-valued function and using a CROSS APPLY in the query. This is a wonderful trick that can yield great improvements.

Want to know more about the APPLY operator? You'll find a full discussion in an excellent course on Microsoft Virtual Academy by Itzik Ben-Gan.
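As a hedged sketch (the function and tables are made up), a scalar UDF that totals a customer's orders could be rewritten as an inline table-valued function and called with CROSS APPLY:

-- Inline table-valued function in place of a scalar UDF
CREATE FUNCTION dbo.tvf_CustomerTotal (@CustomerID INT)
RETURNS TABLE
AS
RETURN
    SELECT SUM(OrderTotal) AS OrderTotal
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;
GO

-- CROSS APPLY lets the optimizer fold the function's logic into the main plan
SELECT c.CustomerID, c.CustomerName, t.OrderTotal
FROM dbo.Customers AS c
CROSS APPLY dbo.tvf_CustomerTotal(c.CustomerID) AS t;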
11. Do use partitioning to avoid large data moves

Not everyone will be able to take advantage of this tip, which relies on partitioning in SQL Server Enterprise, but for those of you who can, it's a great trick. Most people don't realize that all tables in SQL Server are partitioned. You can separate a table into multiple partitions if you like, but even simple tables are partitioned from the time they're created; however, they're created as single partitions. If you're running SQL Server Enterprise, you already have the advantages of partitioned tables at your disposal.

This means you can use partitioning features like SWITCH to archive large amounts of data from a warehousing load. Let's look at a real example from a client I had last year. The client had the requirement to copy the data from the current day's table into an archive table; in case the load failed, the company could quickly recover with the current day's table. For various reasons, it couldn't rename the tables back and forth every time, so the company inserted the data into an archive table every day before the load, then deleted the current day's data from the live table.

This process worked fine in the beginning, but a year later, it was taking 1.5 hours to copy each table -- and several tables had to be copied every day. The problem was only going to get worse. The solution was to scrap the INSERT and DELETE process and use the SWITCH command. The SWITCH command allowed the company to avoid all of the writes because it assigned the pages to the archive table. It's only a metadata change. The SWITCH took on average between two and three seconds to run. If the current load ever fails, you SWITCH the data back into the original table.

This is a case where understanding that all tables are partitions slashed hours from a data load.
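The core command is tiny; a sketch with hypothetical table names (the source and target must have matching structures, sit on the same filegroup, and the target must be empty):

-- Metadata-only move of the current day's data into the archive table
ALTER TABLE dbo.DailyLoad SWITCH TO dbo.DailyLoadArchive;

-- If the load fails, switch it straight back
ALTER TABLE dbo.DailyLoadArchive SWITCH TO dbo.DailyLoad;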
12. If you must use ORMs, use stored procedures

This is one of my regular diatribes. In short, don't use ORMs (object-relational mappers). ORMs produce some of the worst code on the planet, and they're responsible for almost every performance issue I get involved in. ORM code generators can't possibly write SQL as well as a person who knows what they're doing. However, if you use an ORM, write your own stored procedures and have the ORM call the stored procedure instead of writing its own queries. Look, I know all the arguments, and I know that developers and managers love ORMs because they speed you to market. But the cost is incredibly high when you see what the queries do to your database.

Stored procedures have a number of advantages. For starters, you're pushing much less data across the network. If you have a long query, then it could take three or four round trips across the network to get the entire query to the database server. That's not including the time it takes the server to put the query back together and run it, or considering that the query may run several -- or several hundred -- times a second.

Using a stored procedure will greatly reduce that traffic because the stored procedure call will always be much shorter. Also, stored procedures are easier to trace in Profiler or any other tool. A stored procedure is an actual object in your database. That means it's much easier to get performance statistics on a stored procedure than on an ad-hoc query and, in turn, find performance issues and draw out anomalies.

In addition, stored procedures parameterize more consistently. This means you're more likely to reuse your execution plans and even deal with caching issues, which can be difficult to pin down with ad-hoc queries. Stored procedures also make it much easier to deal with edge cases and even add auditing or change-locking behavior. A stored procedure can handle many tasks that trouble ad-hoc queries. My wife unraveled a two-page query from Entity Framework a couple of years ago. It took 25 minutes to run. When she boiled it down to its essence, she rewrote that huge query as SELECT COUNT(*) from T1. No kidding.

OK, I kept it as short as I could. Those are the high-level points. I know many .Net coders think that business logic doesn't belong in the database, but what can I say other than you're outright wrong. By putting the business logic on the front end of the application, you have to bring all of the data across the wire merely to compare it. That's not good performance. I had a client earlier this year that kept all of the logic out of the database and did everything on the front end. The company was shipping hundreds of thousands of rows of data to the front end, so it could apply the business logic and present the data it needed. It took 40 minutes to do that. I put a stored procedure on the back end and had it call from the front end; the page loaded in three seconds.

Of course, the truth is that sometimes the logic belongs on the front end and sometimes it belongs in the database. But ORMs always get me ranting.
13. Don't do large ops on many tables in the same batch

This one seems obvious, but apparently it's not. I'll use another live example because it will drive home the point much better. I had a system that suffered tons of blocking. Dozens of operations were at a standstill. As it turned out, a delete routine that ran several times a day was deleting data out of 14 tables in an explicit transaction. Handling all 14 tables in one transaction meant that the locks were held on every single table until all of the deletes were finished. The solution was to break up each table's deletes into separate transactions so that each delete transaction held locks on only one table. This freed up the other tables and reduced the blocking and allowed other operations to continue working. You always want to split up large transactions like this into separate smaller ones to prevent blocking.
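A sketch of the fix, with invented table names: each table gets its own short transaction instead of all of them sharing one.

-- Locks on OrdersArchive are released before CustomersArchive is touched
BEGIN TRAN;
    DELETE FROM dbo.OrdersArchive WHERE PurgeDate < '20200101';
COMMIT;

BEGIN TRAN;
    DELETE FROM dbo.CustomersArchive WHERE PurgeDate < '20200101';
COMMIT;

-- ...and so on for the remaining tables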
14. Don't use triggers

This one is largely the same as the previous one, but it bears mentioning. Don't use triggers unless it's unavoidable -- and it's almost always avoidable.

The problem with triggers: Whatever it is you want them to do will be done in the same transaction as the original operation. If you write a trigger to insert data into another table when you update a row in the Orders table, the lock will be held on both tables until the trigger is done. If you need to insert data into another table after the update, then put the update and the insert into a stored procedure and do them in separate transactions. If you need to roll back, you can do so easily without having to hold locks on both tables. As always, keep transactions as short as possible and don't hold locks on more than one resource at a time if you can help it.
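A sketch of the stored-procedure alternative (the objects are hypothetical): the update and the follow-on insert run as separate, short transactions instead of inside one trigger-extended transaction.

CREATE PROCEDURE dbo.UpdateOrderStatus
    @OrderID INT,
    @Status  VARCHAR(20)
AS
BEGIN
    -- First transaction: the update, so the lock on Orders is released quickly
    BEGIN TRAN;
        UPDATE dbo.Orders SET OrderStatus = @Status WHERE OrderID = @OrderID;
    COMMIT;

    -- Second transaction: the follow-on insert, locking only the audit table
    BEGIN TRAN;
        INSERT INTO dbo.OrderAudit (OrderID, OrderStatus, ChangedAt)
        VALUES (@OrderID, @Status, GETDATE());
    COMMIT;
END;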

15. Don't cluster on GUID

After all these years, I can't believe we're still fighting this issue. But I still run into clustered GUIDs at least twice a year.

A GUID (globally unique identifier) is a 16-byte randomly generated number. Ordering your table's data on this column will cause your table to fragment much faster than using a steadily increasing value like DATE or IDENTITY. I did a benchmark a few years ago where I inserted a bunch of data into one table with a clustered GUID and into another table with an IDENTITY column. The GUID table fragmented so severely that the performance degraded by several thousand percent in a mere 15 minutes. The IDENTITY table lost only a few percent off performance after five hours. This applies to more than GUIDs -- it goes toward any volatile column.
16. Don't count all rows if you only need to see if data exists

It's a common situation. You need to see if data exists in a table or for a customer, and based on the results of that check, you're going to perform some action. I can't tell you how often I've seen someone do a SELECT COUNT(*) FROM dbo.T1 to check for the existence of that data:

SET @CT = (SELECT COUNT(*) FROM dbo.T1);
If @CT > 0
BEGIN
<Do something>
END

It's completely unnecessary. If you want to check for existence, then do this:

If EXISTS (SELECT 1 FROM dbo.T1)
BEGIN
<Do something>
END

Don't count everything in the table. Just get back the first row you find. SQL Server is smart enough to use EXISTS properly, and the second block of code returns superfast. The larger the table, the bigger difference this will make. Do the smart thing now before your data gets too big. It's never too early to tune your database.

In fact, I just ran this example on one of my production databases against a table with 270 million rows. The first query took 15 seconds, and included 456,197 logical reads, while the second one returned in less than one second and included only five logical reads. However, if you really do need a row count on the table, and it's really big, another technique is to pull it from the system table. SELECT rows FROM sysindexes will get you the row counts for all of the indexes. And because the clustered index represents the data itself, you can get the table rows by adding WHERE indid = 1. Then simply include the table name and you're golden. So the final query is:

SELECT rows FROM sysindexes
WHERE object_name(id) = 'T1' AND indid = 1;

In my 270 million row table, this returned subsecond and had only six logical reads. Now that's performance.

17. Don't do negative searches

Take the simple query SELECT * FROM Customers WHERE RegionID <> 3. You can't use an index with this query because it's a negative search that has to be compared row by row with a table scan. If you need to do something like this, you may find it performs much better if you rewrite the query to use the index. This query can easily be rewritten like this:

SELECT * FROM Customers WHERE RegionID < 3
UNION ALL
SELECT * FROM Customers WHERE RegionID > 3

This query will use an index, so if your data set is large it could greatly outperform the table scan version. Of course, nothing is ever that easy, right? It could also perform worse, so test this before you implement it. There are too many factors involved for me to tell you that it will work 100 percent of the time. Finally, I realize this query breaks the No. 4 rule, "No Double Dipping," but that goes to show there are no hard and fast rules. Though we're double dipping here, we're doing it to avoid a costly table scan.
OK, there you go. You won't be able to apply all of these tips all of the time, but if you keep them in mind you'll find yourself using them as solutions to some of your biggest issues. The most important thing to remember is not to take anything I say as the gospel and implement it because I said so. Test everything in your environment, then test it again. The same solutions won't work in every situation. But these are tactics I use all the time when addressing poor performance, and they have all served me well time and again.

Sean McCown is a Certified Master in SQL Server and a SQL Server MVP with 20 years of experience in databases.