sql indexes - advanced sql _ bipp analytics
SQL Indexes
An SQL index is an independently stored data structure that belongs to a table. A table
can have zero or more indexes. When used appropriately, an SQL index can speed up
your query. SQL indexes are primarily used with WHERE, JOIN or ORDER BY clauses
to optimize access to a small number of records. Indexes are not typically used in analytical
environments where queries read large portions of the tables involved.
In these examples, there is an index on table population with a key on the
last_name column. This query uses the index:
SELECT ssn, last_name
FROM population
WHERE last_name = 'Smith1234567'
A query whose WHERE clause filters on the first_name column cannot use this index,
because first_name is not part of the index key.
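As a rough illustration of the difference, the sketch below asks a database planner how it would run each kind of query. It uses Python's built-in sqlite3 module as a stand-in for the database in this lesson, so the plan wording differs from PostgreSQL's. The index name idx_population_last_name, the first_name values and the sample rows are assumptions made for the demo.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); keep only the detail text.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE population ("
    "ssn TEXT PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
# Hypothetical index name; the key is the last_name column, as in the lesson.
conn.execute("CREATE INDEX idx_population_last_name ON population (last_name)")
conn.executemany(
    "INSERT INTO population VALUES (?, ?, ?)",
    [(f"{i:09d}", f"First{i}", f"Smith{i}") for i in range(1000)],
)

# Filtering on the indexed column: the planner can search the index.
by_last_name = query_plan(
    conn,
    "SELECT ssn, last_name FROM population WHERE last_name = ?",
    ("Smith123",),
)

# Filtering on a column that is not in the index key: full table scan.
by_first_name = query_plan(
    conn,
    "SELECT ssn, last_name FROM population WHERE first_name = ?",
    ("First123",),
)

print(by_last_name)
print(by_first_name)
```

The first plan should mention the index by name, while the second should report a scan of the whole table.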
Cost of SQL Indexes
Before creating an index, evaluate the cost-benefit relationship. You should only create
an index where the benefit exceeds the cost. To identify the indexes that have the best
benefit:
Verify Indexes are being Used
In the EXPLAIN output you see it ran a Sequential Scan instead of the expected Index
Scan on the last_name column. What happened? The problem is the lower()
function. When a condition applies a function or operation to a column, the planner
cannot use a regular index on that column for that condition.
SQL Indexes Based on Expressions
To work around this limitation, some databases offer a special functional index, which
uses an expression as the index key.
In the PostgreSQL dialect you can use CREATE INDEX to create a functional index
on the expression lower(last_name).
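The effect of an index on an expression can be sketched with Python's built-in sqlite3 module. SQLite is used here only as a stand-in: it also accepts an expression such as lower(last_name) as an index key, but its plan output is worded differently from PostgreSQL's EXPLAIN. The index names and sample rows are assumptions made for the demo.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population (ssn TEXT PRIMARY KEY, last_name TEXT)")
# Regular index on the raw column (hypothetical name).
conn.execute("CREATE INDEX idx_population_last_name ON population (last_name)")
conn.executemany(
    "INSERT INTO population VALUES (?, ?)",
    [(f"{i:09d}", f"Smith{i}") for i in range(1000)],
)

sql = "SELECT ssn, last_name FROM population WHERE lower(last_name) = 'smith123'"

# The condition wraps the column in lower(), so the regular index does not help.
plan_before = query_plan(conn, sql)

# Index whose key is the expression itself (hypothetical name).
conn.execute(
    "CREATE INDEX idx_population_lower_last_name "
    "ON population (lower(last_name))"
)
plan_after = query_plan(conn, sql)

matches = conn.execute(sql).fetchall()

print(plan_before)
print(plan_after)
print(matches)
```

The expression in the WHERE clause has to match the indexed expression for the planner to pick the new index.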
Query Refactorization
In SQL, there are many ways to write a query to return the same results. You can
refactor a slow query to use different clauses or conditions while returning the same
results.
Here are some refactoring techniques that can improve query performance.
In these examples, the query is used to obtain the first last_name in each zip_code.
Avoid Correlated Subqueries
This example uses a correlated subquery to obtain the zip_code for cities with a
population of 1000 or above. The execution time is 1 minute 37 seconds.
Here is the refactored query without the correlated subquery. The execution time
is 17.2 seconds.
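To make the idea concrete, here is a small sketch of one way such a refactor can look. These are not the lesson's original queries. The schema, the sample rows and the threshold of 2 (in place of 1000) are assumptions chosen so the result is easy to check by hand, and Python's built-in sqlite3 module stands in for the database. No timings are measured here, only that both forms return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE population (ssn TEXT PRIMARY KEY, last_name TEXT, zip_code TEXT)"
)
conn.executemany(
    "INSERT INTO population VALUES (?, ?, ?)",
    [
        ("001", "Adams", "10001"),
        ("002", "Baker", "10001"),
        ("003", "Clark", "10001"),
        ("004", "Davis", "20002"),
        ("005", "Evans", "30003"),
        ("006", "Frank", "30003"),
    ],
)

MIN_POPULATION = 2  # toy threshold standing in for 1000

# Correlated form: the inner COUNT(*) is evaluated for each outer row.
correlated_sql = """
    SELECT DISTINCT p.zip_code
    FROM population AS p
    WHERE (SELECT COUNT(*)
           FROM population AS q
           WHERE q.zip_code = p.zip_code) >= ?
    ORDER BY p.zip_code
"""

# Refactored form: one pass that groups the rows and filters the groups.
grouped_sql = """
    SELECT zip_code
    FROM population
    GROUP BY zip_code
    HAVING COUNT(*) >= ?
    ORDER BY zip_code
"""

correlated = [row[0] for row in conn.execute(correlated_sql, (MIN_POPULATION,))]
grouped = [row[0] for row in conn.execute(grouped_sql, (MIN_POPULATION,))]

print(correlated)
print(grouped)
```

Comparing the two result lists on real data before switching queries is a cheap way to confirm that the refactor did not change the meaning.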
Consider a Materialized View
For this region population example, data that is refreshed daily is still highly accurate.
The region_population view groups all zip_code values that start with the same 3 digits
and obtains the population of each region.
This is the query to obtain the population for the regions in the range City200 to
City250:
SELECT region, pop
FROM region_population
WHERE region BETWEEN 'City200' AND 'City250'
When the query is executed, the records of the region_population view are calculated as
part of the query, which adds to the execution time.
If you create a materialized view, the view records are stored and are not recalculated
each time the query is run.
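The trade-off can be sketched with Python's built-in sqlite3 module. SQLite has no materialized views, so this demo emulates one with a snapshot table that is rebuilt on demand. In PostgreSQL the equivalent statements are CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW. The view definition, the region naming ('City' plus the first 3 digits of zip_code), the snapshot table name and the sample rows are all assumptions, not the lesson's original code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population (ssn TEXT PRIMARY KEY, zip_code TEXT)")
conn.executemany(
    "INSERT INTO population VALUES (?, ?)",
    [
        ("001", "20001"),
        ("002", "20002"),
        ("003", "20099"),
        ("004", "25010"),
        ("005", "30001"),
    ],
)

# Regular view: recalculated every time it is queried.
conn.execute("""
    CREATE VIEW region_population AS
    SELECT 'City' || substr(zip_code, 1, 3) AS region, COUNT(*) AS pop
    FROM population
    GROUP BY substr(zip_code, 1, 3)
""")


def refresh_snapshot(conn):
    """Rebuild the stored copy of the view (stand-in for a materialized view)."""
    conn.execute("DROP TABLE IF EXISTS region_population_mat")
    conn.execute(
        "CREATE TABLE region_population_mat AS "
        "SELECT region, pop FROM region_population"
    )
    conn.commit()


def regions(conn, source):
    """Run the lesson's range query against the view or the snapshot."""
    sql = (
        f"SELECT region, pop FROM {source} "
        "WHERE region BETWEEN 'City200' AND 'City250' ORDER BY region"
    )
    return conn.execute(sql).fetchall()


refresh_snapshot(conn)
first_read = regions(conn, "region_population_mat")

# New data arrives: the live view sees it, the snapshot does not yet.
conn.execute("INSERT INTO population VALUES ('006', '20003')")
live = regions(conn, "region_population")
stale = regions(conn, "region_population_mat")

refresh_snapshot(conn)
refreshed = regions(conn, "region_population_mat")

print(first_read, live, stale, refreshed)
```

Reads from the snapshot skip the aggregation, at the price of seeing data only as fresh as the last refresh, which is why the daily refresh mentioned above matters.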
Add Directives to the Planner
Some database engines support adding execution directives to the query. For example,
you can specify an index to use, an index not to use, or which table to read first.
Oracle, for instance, lets a query include a directive that specifies which index to use.
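Directive syntax is specific to each engine. Oracle writes hints inside a comment of the form /*+ INDEX(table index_name) */, which cannot be run here. As an analogous sketch, SQLite offers the INDEXED BY and NOT INDEXED clauses, shown below through Python's built-in sqlite3 module. The table layout and index names are assumptions made for the demo.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE population (ssn TEXT PRIMARY KEY, last_name TEXT, zip_code TEXT)"
)
# Two candidate indexes (hypothetical names) the planner could choose from.
conn.execute("CREATE INDEX idx_population_last_name ON population (last_name)")
conn.execute("CREATE INDEX idx_population_zip_code ON population (zip_code)")

where = "WHERE last_name = 'Smith' AND zip_code = '10001'"

# Directive: use this index and no other.
forced_last_name = query_plan(
    conn,
    f"SELECT ssn FROM population INDEXED BY idx_population_last_name {where}",
)
forced_zip_code = query_plan(
    conn,
    f"SELECT ssn FROM population INDEXED BY idx_population_zip_code {where}",
)

# Directive: do not use any index for this table.
no_index = query_plan(conn, f"SELECT ssn FROM population NOT INDEXED {where}")

print(forced_last_name)
print(forced_zip_code)
print(no_index)
```

Directives override the planner's own choice, so they are usually a last resort after confirming with the plan output that the default choice is the slow one.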
Closing Words
In this lesson you learned about indexes, functional indexes, and techniques to refactor
your queries to take advantage of indexes. Keep going, learn SQL and increase your
skills!