SQL Query Optimization and Indexing
Along the way
How is a db query executed
Schema optimization
Execution Plan
Indexing (Types of Indices)
Using indices
Lock Contention
Covering Indices
The DB Engine
How does a database server run a query?
Server process
SQL query analysis
If new query? Execution plan? (Optimizer)
If cpu?
Table scan? Index?
Require high performance??
A well-optimized schema
Indexes for specific queries
Tradeoffs!!
What's an Index??
A data structure
Speeds up retrieval
Slows down inserts
Also
Denormalized db = faster for some queries + slower for others
Choosing optimal Data Types
Smaller is better
Use less space
Require fewer CPU cycles
Simple is good
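Integers are cheaper to compare than characters
E.g.: use MySQL's built-in date and time types instead of strings
Avoid NULL if possible
Harder for MySQL to optimize queries that refer to nullable columns
DATETIME and TIMESTAMP store the same kind of data, but TIMESTAMP uses less space, is time zone aware, and covers a much smaller date range (1970 to 2038)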
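The guidelines above can be sketched as a table definition. The table and column names here are hypothetical and only illustrate small, simple, NOT NULL column types; they are not taken from the slides.

```sql
-- Hypothetical table: every column uses the smallest simple type that fits.
CREATE TABLE page_view (
  id        INT UNSIGNED     NOT NULL AUTO_INCREMENT,  -- integer key, cheap to compare
  user_id   INT UNSIGNED     NOT NULL,                 -- NOT NULL avoids nullable-column handling
  is_mobile TINYINT UNSIGNED NOT NULL DEFAULT 0,       -- 1 byte is enough for a flag
  viewed_at TIMESTAMP        NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- built-in time type, not a string
  PRIMARY KEY (id)
) ENGINE=InnoDB;
```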
String Types
VARCHAR and CHAR types
Their storage on disk is storage engine dependent
Usually the storage is different for disk, memory and
after retrieval from the storage engine
VARCHAR
Uses as much space as it needs
Uses 1 or 2 extra bytes to store the length
1 byte if the maximum length is up to 255 bytes, 2 bytes above that
So, with a single-byte character set, VARCHAR(10) uses up to 11 bytes and VARCHAR(1000) uses up to 1002 bytes
Improves performance because it saves space
Variable-length rows can grow on update = more work!!!
Use VARCHAR when the maximum column length is much larger than the average length and updates are rare
CHAR
Fixed length
For data that changes frequently, CHAR is better than VARCHAR
For very short columns: CHAR(1) = 1 byte, VARCHAR(1) = 2 bytes
The siblings of CHAR and VARCHAR are the BINARY and VARBINARY data types
Good when values should be compared as bytes rather than as characters
Comparing random strings
Strings produced by MD5(), SHA1() or UUID()
Each newly generated string is distributed arbitrarily over a large space
They slow INSERT queries because each value lands at a random location in the indexes
They slow some SELECT queries because logically adjacent rows are widely dispersed on disk and in memory
If you do store UUID values, you
should remove the dashes or, even
better, convert the UUID values to
16-byte numbers with UNHEX() and
store them in a BINARY(16) column.
You can retrieve the values in
hexadecimal format with the HEX()
function
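A minimal sketch of the UUID advice above, assuming a hypothetical table named api_token. UUID(), REPLACE(), UNHEX() and HEX() are standard MySQL functions; everything else is illustrative.

```sql
-- Hypothetical table storing UUIDs as 16 raw bytes instead of 36 characters.
CREATE TABLE api_token (
  token   BINARY(16)   NOT NULL,
  user_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (token)
);

-- Strip the dashes, then convert the 32 hex digits to 16 bytes.
INSERT INTO api_token (token, user_id)
VALUES (UNHEX(REPLACE(UUID(), '-', '')), 1);

-- Read the value back in hexadecimal form.
SELECT HEX(token) AS token_hex, user_id FROM api_token;
```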
IP Address
In the usual case, people use VARCHAR(15)
But an IPv4 address is really an unsigned 32-bit integer, not a string
Dotted-quad notation exists so that humans can read it easily
MySQL provides the INET_ATON() and INET_NTOA() functions to convert between the two representations
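As a sketch of the point above, the following stores an IPv4 address in an INT UNSIGNED column. The access_log table and its columns are hypothetical; INET_ATON() and INET_NTOA() are the MySQL functions named on the slide.

```sql
-- Hypothetical log table: 4 bytes per address instead of up to 15 characters.
CREATE TABLE access_log (
  ip     INT UNSIGNED NOT NULL,
  hit_at DATETIME     NOT NULL,
  KEY idx_ip (ip)
);

-- 10.0.5.9 = 10*256^3 + 0*256^2 + 5*256 + 9 = 167773449
INSERT INTO access_log (ip, hit_at)
VALUES (INET_ATON('10.0.5.9'), NOW());

-- Convert back to dotted-quad notation for display.
SELECT INET_NTOA(ip) AS ip_text, hit_at
FROM access_log
WHERE ip = INET_ATON('10.0.5.9');
```

Storing the address as an integer also lets range conditions on a subnet use the index on ip.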
The Execution Plan
Every SQL query is broken down into a series of execution steps called operators
Each operator performs a basic operation such as insertion, search, scan, update or aggregation
There are 2 kinds of operators: logical operators and physical operators
Logical operators: describe at a conceptual level how the query will be executed
Physical operators: the actual logic / routine that performs the action
PARSE: checks syntax; the query processor tree is the output of the parse step
OPTIMIZE: calculates cost and gives out an estimated plan and an actual plan
DATA STATISTICS (used by the optimizer):
1. How many rows?
2. Unique data?
3. Does the table span more than one page?
EXECUTE: execution is done as per the plan
Into Indexing
TYPES
B-Tree Indexes
Hash Indexes
B-Tree
We use the term "B-Tree" for these indexes because
that's what MySQL uses in CREATE TABLE and other
statements
All the values are stored in order, and each leaf page is
the same distance from the root
Leaf nodes have pointers to the indexed data instead
of pointers to other pages
Because B-Trees store the indexed columns in order,
they're useful for searching for ranges of data
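A short sketch of the kinds of lookups an ordered B-Tree index can serve. The people table and the index name are assumptions made for illustration.

```sql
-- Hypothetical table with a B-Tree index on last_name.
CREATE TABLE people (
  last_name  VARCHAR(50) NOT NULL,
  first_name VARCHAR(50) NOT NULL,
  dob        DATE        NOT NULL,
  KEY idx_last_name (last_name)
);

-- Exact match: walk from the root to one leaf entry.
SELECT first_name FROM people WHERE last_name = 'Allen';

-- Range: find the first matching leaf entry, then read entries in order.
SELECT first_name FROM people
WHERE last_name BETWEEN 'Allen' AND 'Barrymore';

-- Ordering: the index already holds the values in sorted order.
SELECT last_name FROM people ORDER BY last_name LIMIT 10;
```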
Hash Indexes
Built on hash tables; useful only for exact lookups that use every column in the index
In MySQL, only the Memory storage engine supports explicit hash indexes
Computes a hash code of the indexed columns and stores a pointer to each row in the hash table
E.g. :
CREATE TABLE testhash (
fname VARCHAR(50) NOT NULL,
lname VARCHAR(50) NOT NULL,
KEY USING HASH(fname)
) ENGINE=MEMORY;
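containing this data:
mysql> SELECT * FROM testhash;
+---------+----------+
| fname   | lname    |
+---------+----------+
| Darshan | Raj      |
| Bijesh  | Chandran |
| Jophin  | Joseph   |
| Vivek   | Babu     |
+---------+----------+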
Suppose the index uses a hash function f(), which returns the following values:
f('Darshan') = 2323
f('Bijesh') = 7437
f('Jophin') = 8784
f('Vivek') = 2458
The index's data structure will look like this:
Slot    Value
2323    Pointer to row 1
2458    Pointer to row 4
7437    Pointer to row 2
8784    Pointer to row 3
A hash index on a TINYINT will be the same size as a hash index on a large character column, because the index stores only the short hash values.
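To make the lookup path concrete, here is a sketch that assumes the testhash table defined above. The hash values in the comments are the illustrative ones from the slide, not values MySQL actually computes.

```sql
-- Load the four sample rows in the same order as on the slide.
INSERT INTO testhash (fname, lname) VALUES
  ('Darshan', 'Raj'),
  ('Bijesh',  'Chandran'),
  ('Jophin',  'Joseph'),
  ('Vivek',   'Babu');

-- Exact lookup: compute f('Vivek') = 2458, find slot 2458,
-- follow the pointer to row 4, and compare fname to confirm the match.
SELECT lname FROM testhash WHERE fname = 'Vivek';

-- A range condition cannot use the hash index, because hash values
-- are not stored in the sort order of fname.
SELECT lname FROM testhash WHERE fname > 'D';
```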
Non-Clustered Indexes
Data is stored in no particular order
Logical ordering is specified by the index
Typically created on columns used in JOIN, WHERE and ORDER BY clauses
Good for tables whose values are modified frequently
Clustered Indexes
Data blocks are arranged in the same order as the index
Only one clustered index is possible on a given table
Faster retrieval when data is accessed in ascending or descending order
MS SQL Server creates non-clustered indexes by default when CREATE INDEX is given.
Using indices
Indexing the primary key
Usually indexed automatically to make retrieval efficient
Most effective access path
Any other column or combination of columns can get a secondary index to improve retrieval performance
Secondary indexes
Indexes on columns other than the primary key column
Create secondary indexes on tables that have more reads than writes
Essentially a copy of the table that contains only the fields specified in the index
Rule of thumb: don't put more than 4 fields in an index or more than 5 indexes on a table. You are inviting trouble otherwise!!
Index column order does matter!!
An index is not useful if the lookup does not start from the leftmost indexed column.
You can't skip columns in the index.
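The leftmost-prefix rule can be sketched with a three-column index. The booking table, its columns and the sample values are hypothetical, and the comments describe the general behaviour; newer MySQL versions have optimizations that can relax it in some cases.

```sql
-- Hypothetical table with a composite index (city, check_in, guests).
CREATE TABLE booking (
  id       INT UNSIGNED     NOT NULL PRIMARY KEY,
  city     VARCHAR(40)      NOT NULL,
  check_in DATE             NOT NULL,
  guests   TINYINT UNSIGNED NOT NULL,
  KEY idx_city_checkin_guests (city, check_in, guests)
);

-- Starts at the leftmost column: the index can be used.
SELECT id FROM booking WHERE city = 'Kochi';

-- Uses the first two columns in order: the index can be used.
SELECT id FROM booking
WHERE city = 'Kochi' AND check_in = '2024-01-05';

-- Does not start at the leftmost column: the index is generally not useful.
SELECT id FROM booking WHERE check_in = '2024-01-05';

-- Skips check_in: only the city part of the index narrows the search.
SELECT id FROM booking WHERE city = 'Kochi' AND guests = 2;
```

Running EXPLAIN on each statement is a quick way to check which of them actually pick the index on a given server.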
Join vs. Subquery
A join tends to be faster when few tables are involved
A join tends to be faster when the tables hold little data
A subquery can be faster when many tables are involved, as joining more tables is tedious
A subquery can be faster when the tables hold huge amounts of data
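For comparison, here is the same question written both ways. The customer and purchase tables are hypothetical; which form runs faster depends on the data and the MySQL version, so it is worth checking both with EXPLAIN rather than assuming.

```sql
-- Hypothetical schema: customers and their purchases.
CREATE TABLE customer (
  id   INT UNSIGNED NOT NULL PRIMARY KEY,
  name VARCHAR(50)  NOT NULL
);
CREATE TABLE purchase (
  id          INT UNSIGNED NOT NULL PRIMARY KEY,
  customer_id INT UNSIGNED NOT NULL,
  KEY idx_customer (customer_id)
);

-- Join form: customers that have at least one purchase.
SELECT DISTINCT c.id, c.name
FROM customer AS c
JOIN purchase AS p ON p.customer_id = c.id;

-- Subquery form: returns the same set of customers.
SELECT c.id, c.name
FROM customer AS c
WHERE c.id IN (SELECT p.customer_id FROM purchase AS p);
```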
Explaining the explain ??!
A way to obtain information about how MySQL executes a SELECT statement
Syntax: EXPLAIN SELECT select_options
Returns one row of information for each table used in the SELECT statement
These are the columns that MySQL returns for each table:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
id: the select identifier
select_type: the type of SELECT (SIMPLE, PRIMARY, UNION, DEPENDENT UNION, SUBQUERY, DEPENDENT SUBQUERY etc.)
table: the table that the output row refers to
type: the join type (important)
possible_keys: the indexes that could be used for the query
key: the index actually used for the query
rows: the estimated number of rows to be examined
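A small sketch of EXPLAIN in use. The app_user table is hypothetical, and the exact output depends on the server version and on the data, so no output is shown here.

```sql
-- Hypothetical table with an index on status but none on name.
CREATE TABLE app_user (
  user_id INT UNSIGNED NOT NULL PRIMARY KEY,
  name    VARCHAR(50)  NOT NULL,
  status  TINYINT      NOT NULL,
  KEY idx_status (status)
);

-- status is indexed: expect idx_status to appear under possible_keys.
EXPLAIN SELECT name FROM app_user WHERE status = 9;

-- name is not indexed: expect no usable key and type ALL (full table scan).
EXPLAIN SELECT user_id FROM app_user WHERE name = 'Roger';
```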
Lock Contention??
1) DELETE FROM user WHERE status = 9
Fully scans the user table, deleting each row where status = 9
2) UPDATE user SET status=9 WHERE user_id = 100
user table: user_id (PK), name, status; user_id runs from 100 to 100000
What happens if query 1 does not lock the row user_id = 100?
DATA CONSISTENCY IS BROKEN!!
So query 1 has to lock every row it scans, and query 2 must wait
If the STATUS column is indexed
1) DELETE FROM user WHERE status = 9
Uses the index on status (status -> PK) to find only the matching rows
2) UPDATE user SET status=9 WHERE user_id = 100
1) and 2) can run in parallel (CONCURRENCY IMPROVED)
[Diagram: index on status mapping status to PK; user table (user_id (PK), name, status) with sample names Roger, Rafael, Andy]
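The scenario can be tried out with two client sessions. This is a sketch that assumes an InnoDB table shaped like the one on the slide; the index name is made up. Whether session B actually proceeds without waiting depends on the isolation level and on which index ranges session A has locked, so treat this as an experiment rather than a guaranteed outcome.

```sql
-- Table shaped like the slide's example, with the status column indexed.
CREATE TABLE user (
  user_id INT UNSIGNED NOT NULL PRIMARY KEY,
  name    VARCHAR(50)  NOT NULL,
  status  TINYINT      NOT NULL,
  KEY idx_status (status)
) ENGINE=InnoDB;

-- Session A: with idx_status, only the index range for status = 9
-- is visited and locked, not every row in the table.
START TRANSACTION;
DELETE FROM user WHERE status = 9;

-- Session B (second connection, while A is still open):
UPDATE user SET status = 9 WHERE user_id = 100;

-- Session A: release the locks.
COMMIT;
```

Dropping idx_status and repeating the experiment shows the contrasting case, where the DELETE has to scan and lock the whole table.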
Covering Index??
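The slide leaves this as a question. A covering index is one that contains every column a query needs, so the query can be answered from the index alone without reading the table rows. A minimal sketch, assuming a hypothetical app_order table:

```sql
-- Hypothetical table with a composite index on (customer_id, status).
CREATE TABLE app_order (
  order_id    INT UNSIGNED  NOT NULL PRIMARY KEY,
  customer_id INT UNSIGNED  NOT NULL,
  status      TINYINT       NOT NULL,
  total       DECIMAL(10,2) NOT NULL,
  KEY idx_customer_status (customer_id, status)
) ENGINE=InnoDB;

-- Both selected columns live in the index, so it covers the query;
-- EXPLAIN should report "Using index" in the Extra column.
EXPLAIN SELECT customer_id, status
FROM app_order
WHERE customer_id = 42;

-- total is not in the index, so each match needs a row lookup
-- and the index no longer covers the query.
EXPLAIN SELECT customer_id, total
FROM app_order
WHERE customer_id = 42;
```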
DB Engine ??
The underlying software component that a database management system (DBMS) uses to create, read, update and delete (CRUD) data in a database
MySQL has InnoDB and MyISAM
InnoDB = transactional
MyISAM = non-transactional
InnoDB creates a clustered index for every table. If the table has a primary key, that is the clustered index. If not, InnoDB uses the first unique index whose columns are all NOT NULL, and failing that it creates a hidden six-byte row ID and makes it the clustered index.
All InnoDB indexes are B-Trees. The leaf nodes of the primary key index are the data.
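The transactional difference between the two engines can be seen with a rollback. This is a sketch that assumes both engines are enabled on a default server configuration; the table names are hypothetical.

```sql
-- Two hypothetical tables that differ only in storage engine.
CREATE TABLE note_innodb (
  id   INT UNSIGNED NOT NULL PRIMARY KEY,
  body VARCHAR(255) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE note_myisam (
  id   INT UNSIGNED NOT NULL PRIMARY KEY,
  body VARCHAR(255) NOT NULL
) ENGINE=MyISAM;

START TRANSACTION;
INSERT INTO note_innodb (id, body) VALUES (1, 'first');
INSERT INTO note_myisam (id, body) VALUES (1, 'first');
ROLLBACK;  -- MySQL warns that non-transactional changes cannot be rolled back

SELECT COUNT(*) FROM note_innodb;  -- 0: the InnoDB insert was undone
SELECT COUNT(*) FROM note_myisam;  -- 1: the MyISAM insert stays
```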
References
High Performance MySQL - Baron Schwartz, Peter Zaitsev, Vadim Tkachenko
Mastering the art of Indexing - Yoshinori Matsunobu
http://www.codeproject.com
http://www.databasejournal.com
SQL Best Practices Video Journal by Steven Feuerstein
MySQL 5.0 Reference manual
THANK YOU