SQL Best Practices

The document discusses SQL best practices for writing efficient queries. It covers practices for different clauses like WHERE, GROUP BY, HAVING, SELECT and ORDER BY. Key recommendations include filtering rows early with WHERE before aggregating, avoiding functions in WHERE clauses, ordering GROUP BY columns by descending cardinality, only using HAVING for filtering aggregates, specifying columns instead of '*' in SELECT, and avoiding unnecessary sorting. The document emphasizes readability practices like indentation, whitespace and using aliases. It also discusses optimizing queries for performance and cost.

Uploaded by

shivam5.aiesec

How does it function?

Execution Order of the Query


1. FROM (and JOIN) get(s) the tables referenced in the query. These tables represent
the maximum search space specified by your query. Where possible, restrict this
search space before moving forward.

2. WHERE filters data.

3. GROUP BY aggregates the data.

4. HAVING filters out aggregated data that doesn’t meet the criteria.

5. SELECT grabs the columns (then deduplicates rows if DISTINCT is invoked).

6. UNION merges the selected data into a result set.

7. ORDER BY sorts the results.

The idea is to reduce the data at the earliest possible stage, so that every later
step has less data to process.
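
The order above can be traced on a small query. This is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in for the real database; the `orders` table and its rows are hypothetical, and the numbered comments refer to the logical steps listed above rather than to what any particular optimizer physically does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, status TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        ('ann', 'paid', 50), ('ann', 'paid', 70), ('ann', 'cancelled', 500),
        ('bob', 'paid', 20), ('cat', 'paid', 90), ('cat', 'paid', 40);
""")

query = """
    SELECT customer, SUM(amount) AS total   -- 5. pick the output columns
    FROM orders                             -- 1. start from the table
    WHERE status = 'paid'                   -- 2. drop rows before grouping
    GROUP BY customer                       -- 3. aggregate what is left
    HAVING SUM(amount) >= 100               -- 4. filter on the aggregate
    ORDER BY total DESC                     -- 7. sort the final result
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('cat', 130), ('ann', 120)]
```

The cancelled 500 row never reaches the aggregation because WHERE removes it first, and bob's group is dropped afterwards by HAVING.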

SQL best practices for WHERE
Filter with WHERE before HAVING
Use a WHERE clause to filter superfluous rows, so you don’t have to compute those
values in the first place. Only after removing irrelevant rows, and after aggregating
those rows and grouping them, should you include a HAVING clause to filter out
aggregates.
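
As a rough illustration, the two queries below return the same rows, but the second discards rows before any grouping happens. This is a sketch that assumes Python's sqlite3 module as the engine; the `sales` table is hypothetical. Some optimizers may push such a predicate down on their own, so treat this as a habit for clarity and safety rather than a guaranteed speed-up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('north', 10), ('north', 30), ('south', 5), ('south', 15), ('west', 100);
""")

# Avoid: every region is grouped and summed, then most groups are discarded.
filter_late = """
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
    HAVING region = 'north'
"""

# Prefer: rows are discarded first, so only one group is ever aggregated.
filter_early = """
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE region = 'north'
    GROUP BY region
"""

late_rows = conn.execute(filter_late).fetchall()
early_rows = conn.execute(filter_early).fetchall()
print(late_rows, early_rows)  # [('north', 40)] [('north', 40)]
```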

Avoid functions on columns in WHERE clauses


Using a function on a column in a WHERE clause can really slow down your query, as the
function makes the query non-sargable (i.e., it prevents the database from using an
index to speed up the query). Instead of using the index to skip to the relevant rows, the
function on the column forces the database to run the function on each row of the table.
And remember, the concatenation operator || is also a function, so don’t get fancy
trying to concat strings to filter multiple columns. Prefer multiple conditions instead:
Avoid
SELECT hero, sidekick FROM superheros WHERE hero || sidekick = 'BatmanRobin'

Prefer
SELECT hero, sidekick FROM superheros WHERE hero = 'Batman' AND sidekick = 'Robin'
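
One way to see the difference is to ask the database for its query plan. The sketch below assumes SQLite (through Python's sqlite3 module) and a hypothetical index named `idx_superheros_hero` on the `hero` column; other engines expose plans through their own EXPLAIN variants and word them differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE superheros (hero TEXT, sidekick TEXT);
    CREATE INDEX idx_superheros_hero ON superheros (hero);  -- hypothetical index
    INSERT INTO superheros VALUES ('Batman', 'Robin'), ('Green Arrow', 'Speedy');
""")


def plan(sql):
    """Return SQLite's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


# Non-sargable: the expression hides the indexed column from the planner.
plan_concat = plan(
    "SELECT hero, sidekick FROM superheros WHERE hero || sidekick = 'BatmanRobin'"
)

# Sargable: the bare column comparison lets the planner use the index.
plan_split = plan(
    "SELECT hero, sidekick FROM superheros "
    "WHERE hero = 'Batman' AND sidekick = 'Robin'"
)

print(plan_concat)
print(plan_split)
```

In SQLite the first plan should report a full table scan and the second a search using the index; the exact wording varies between SQLite versions.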

SQL best practices for GROUP BY


Order multiple groupings by descending cardinality
Where possible, GROUP BY columns in order of descending cardinality. That is, group by
columns with more unique values first (like IDs or phone numbers) before grouping by
columns with fewer distinct values (like state or gender).
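
If the cardinalities are not known in advance, they can be measured with COUNT(DISTINCT ...). The following is a sketch only, assuming Python's sqlite3 module and a hypothetical `contacts` table. Whether the column order changes performance depends on the engine, and the grouped result is the same either way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (phone TEXT, state TEXT);
    INSERT INTO contacts VALUES
        ('555-0001', 'CA'), ('555-0002', 'CA'), ('555-0003', 'NY'),
        ('555-0004', 'NY'), ('555-0005', 'CA');
""")

# Measure how many distinct values each candidate grouping column has.
phone_card, state_card = conn.execute(
    "SELECT COUNT(DISTINCT phone), COUNT(DISTINCT state) FROM contacts"
).fetchone()

# List the higher-cardinality column first, as the guideline suggests.
columns = sorted(
    [("phone", phone_card), ("state", state_card)],
    key=lambda pair: pair[1],
    reverse=True,
)
group_by = ", ".join(name for name, _ in columns)

query = f"SELECT {group_by}, COUNT(*) AS n FROM contacts GROUP BY {group_by}"
rows = conn.execute(query).fetchall()
print(group_by)  # phone, state
```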

SQL best practices for HAVING

Only use HAVING for filtering aggregates


Before HAVING, filter out rows with a WHERE clause so that fewer values are grouped
and aggregated.

SQL best practices for SELECT

SELECT columns, not stars


Specify the columns you’d like to include in the results (though it’s fine to use * when
first exploring tables — just remember to LIMIT your results).

SQL best practices for ORDER BY

Avoid sorting where possible, especially in subqueries


Sorting is expensive. If you must sort, make sure your subqueries are not needlessly
sorting data.
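
For instance, an ORDER BY inside a subquery that feeds an aggregation does not affect the final result. The sketch below assumes Python's sqlite3 module and a hypothetical `purchases` table; it only shows that the inner sort is redundant and does not measure its cost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (customer_id INTEGER, amount INTEGER);
    INSERT INTO purchases VALUES (1, 30), (2, 10), (1, 5), (3, 20), (2, 40);
""")

# Avoid: the inner ORDER BY does work that the outer aggregation throws away.
sorted_subquery = """
    SELECT customer_id, SUM(amount) AS total
    FROM (
        SELECT customer_id, amount
        FROM purchases
        ORDER BY amount DESC
    ) AS p
    GROUP BY customer_id
    ORDER BY customer_id
"""

# Prefer: sort once, at the outermost level, and only if the consumer needs it.
single_sort = """
    SELECT customer_id, SUM(amount) AS total
    FROM purchases
    GROUP BY customer_id
    ORDER BY customer_id
"""

with_inner_sort = conn.execute(sorted_subquery).fetchall()
without_inner_sort = conn.execute(single_sort).fetchall()
print(with_inner_sort == without_inner_sort)  # True
```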

Things to take care of!


Let’s skip the tech talks!

REQUIREMENTS
The scope of the requirement should be clear.
The problem statement should be clear and should not have any ambiguity.

The schema required should be clear.


All this should be covered in the grooming stage.

FORMATTING
Formatting is an important aspect of the query that makes it readable and reusable.

The following things should be adopted:


Carefully use Indentation & White spaces

Even though it’s a basic principle, it’s a quick win that makes your code more readable.
As you would with Python, you should indent your SQL code.
Indent after a keyword, and whenever you use a subquery or a derived table.

Avoid

SELECT customers.id, customers.name, customers.age, customers.gender, customers.salary, first_purchase.date
FROM company.customers
LEFT JOIN ( SELECT customer_id, MIN(date) as date FROM company.purchases GROUP BY customer_id ) AS first_purchase
ON first_purchase.customer_id = customers.id
WHERE customers.age<=30

Prefer

SELECT customers.id,
       customers.name,
       customers.age,
       customers.gender,
       customers.salary,
       first_purchase.date
FROM company.customers
LEFT JOIN (
    SELECT customer_id,
           MIN(date) as date
    FROM company.purchases
    GROUP BY customer_id
) AS first_purchase
  ON first_purchase.customer_id = customers.id
WHERE customers.age <= 30

Also note the whitespace around the comparison operator in the WHERE clause.

Avoid

SELECT id FROM customers WHERE customers.age<=30

Prefer

SELECT id FROM customers WHERE customers.age <= 30

Use aliases when it improves readability

Aliases are a convenient way to rename tables or columns whose names don’t make
sense. Don’t hesitate to give an alias to your tables and columns when
their names aren’t meaningful, and to alias your aggregates.

Avoid

SELECT customers.id,
       customers.name,
       customers.context_col1,
       nested.f0_
FROM company.customers
JOIN (
    SELECT customer_id,
           MIN(date)
    FROM company.purchases
    GROUP BY customer_id
) ON customer_id = customers.id

Prefer

SELECT customers.id,
       customers.name,
       customers.context_col1 as ip_address,
       first_purchase.date as first_purchase_date
FROM company.customers
JOIN (
    SELECT customer_id,
           MIN(date) as date
    FROM company.purchases
    GROUP BY customer_id
) AS first_purchase
  ON first_purchase.customer_id = customers.id

TIP: Alias columns with a lowercase as, and tables with an uppercase AS.

Comments
Provide proper comments, wherever necessary.

End Objective

Once you are done with the entire query, make sure it is serving the problem statement
adequately. Do test it for accuracy.

Provide relevant filters for your stakeholders.


Once we are sure of the accuracy and results we are getting, we can move to the next
part i.e. Optimisation.

OPTIMISATION
Performance and Cost
The query should be checked for performance: it should not be too time-consuming,
and it should meet the stakeholders’ demands without timing out.
We should avoid the following in our queries for better optimisation:

1. Avoid one-to-many and many-to-many joins. Prefer joins with a one-to-one
mapping.

Such joins multiply rows, which can set off an uncontrolled chain of operations
and lead to a long run time even on a small set of data.
When stuck in such a scenario, identify the particular tables responsible for
it.

2. Avoid reading the same table in multiple subqueries or CTEs. Instead, try to
fetch all the relevant data from it in a single subquery.

3. Avoid consuming unnecessary historical data. Understand the relevant scope of
the data from your stakeholder and filter out the irrelevant part of the
tables. This also helps to optimise cost.

4. Understand how often the report will be consumed. If a large audience needs
it at a high frequency, it is better to create a scheduled query in BigQuery
that runs at the relevant frequency and to point the final report at the output
of that scheduled query.
5. Guard against unexpected duplicates with simple checks, such as comparing

count(*), count(distinct primary identifier)

If the two counts differ, some rows are duplicated.
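
Points 1 and 5 can be seen together in a small experiment. This is a sketch under stated assumptions: it uses Python's sqlite3 module in place of the warehouse, the `customers` and `purchases` tables and their rows are made up for illustration, and `duplicate_check` is a hypothetical helper rather than part of any library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchases (customer_id INTEGER, date TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO purchases VALUES
        (1, '2023-01-05'), (1, '2023-02-10'), (1, '2023-03-15'), (2, '2023-01-20');
""")


def duplicate_check(from_clause):
    """Return (COUNT(*), COUNT(DISTINCT customers.id)) for a FROM clause."""
    sql = f"SELECT COUNT(*), COUNT(DISTINCT customers.id) FROM {from_clause}"
    return conn.execute(sql).fetchone()


# One-to-many join: each customer row repeats once per purchase.
fan_out = duplicate_check(
    "customers JOIN purchases ON purchases.customer_id = customers.id"
)

# One-to-one join: purchases are collapsed to one row per customer first.
one_to_one = duplicate_check("""
    customers
    JOIN (
        SELECT customer_id, MIN(date) AS date
        FROM purchases
        GROUP BY customer_id
    ) AS first_purchase
      ON first_purchase.customer_id = customers.id
""")

print(fan_out, one_to_one)  # (4, 2) (2, 2)
```

When the two numbers in a pair differ, as in the first case, the join has duplicated the primary identifier and any sums or counts built on top of it will be inflated.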

Conclusion
Once optimisation is done, make sure accuracy was not lost in the process:
compare the pre- and post-optimisation results to confirm that the reports are
still accurate.
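
One possible way to automate that comparison is to run both versions of the query and compare their rows while ignoring order. The sketch below assumes Python's sqlite3 module; `same_result` is a hypothetical helper, and the queries and data are illustrative only. On large result sets it may be more practical to compare row counts and checksums inside the database instead.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (customer_id INTEGER, date TEXT);
    INSERT INTO purchases VALUES
        (1, '2023-01-05'), (1, '2023-02-10'), (2, '2023-01-20'), (3, '2022-12-01');
""")


def same_result(before_sql, after_sql):
    """True when both queries return the same rows, ignoring row order."""
    before = Counter(conn.execute(before_sql).fetchall())
    after = Counter(conn.execute(after_sql).fetchall())
    return before == after  # Counter keeps duplicates, so row counts matter too


before_sql = """
    SELECT customer_id, MIN(date) FROM purchases
    GROUP BY customer_id
    HAVING customer_id <= 2
"""
after_sql = """
    SELECT customer_id, MIN(date) FROM purchases
    WHERE customer_id <= 2
    GROUP BY customer_id
"""
# A deliberately wrong "optimisation" to show that the check catches it.
wrong_sql = """
    SELECT customer_id, MAX(date) FROM purchases
    WHERE customer_id <= 2
    GROUP BY customer_id
"""

matches = same_result(before_sql, after_sql)
regression_caught = not same_result(before_sql, wrong_sql)
print(matches, regression_caught)  # True True
```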

Remember:

Correctness, readability, then optimisation: in that order

Final Checklist for Review:

Description added in the MB report

Add BQ memory consumed in the Description

Add comments for Select statements where formulas are used over the raw
columns

Add comments for each table with the reason for using that table

Add comments for where clause conditions

MB reports should be in proper collection

Avoid using functions in the WHERE clause; if one is necessary, explain it with a comment

Mention the test cases used, with dates, in the description to keep a log-like
record. This helps avoid rework in the future if any discrepancy
arises
