SQL Best Practices
How does it function?
4. HAVING filters out aggregated data that doesn’t meet the criteria.
The idea is to filter the data at the earliest possible stage of the query. This reduces
the amount of data processed by every later step.
Prefer
SELECT hero, sidekick FROM superheros WHERE hero = 'Batman' AND sidekick = 'Robin'
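The same idea applies when a query aggregates. The following is a minimal sketch, assuming a hypothetical missions column on the superheros table (not in the original) and a dialect close to SQLite or PostgreSQL. A condition on a plain column goes in WHERE, so rows are discarded before grouping; HAVING is kept for conditions on the aggregated value.

```sql
-- Hypothetical sample data, for illustration only.
CREATE TABLE superheros (
    hero     TEXT,
    sidekick TEXT,
    missions INTEGER   -- assumed column, not in the original
);

INSERT INTO superheros (hero, sidekick, missions) VALUES
    ('Batman',      'Robin',   12),
    ('Batman',      'Batgirl',  7),
    ('Green Arrow', 'Speedy',   4);

-- Avoid: every hero is grouped first, then most groups are thrown away.
SELECT hero,
       SUM(missions) as total_missions
FROM superheros
GROUP BY hero
HAVING hero = 'Batman';

-- Prefer: WHERE removes rows before grouping;
-- HAVING only filters on the aggregated value.
SELECT hero,
       SUM(missions) as total_missions
FROM superheros
WHERE hero = 'Batman'
GROUP BY hero
HAVING SUM(missions) > 10;
```

Both queries return a single row for Batman with a total of 19. Some engines may rewrite the first form into the second on their own, but writing the filter in WHERE does not rely on that.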
REQUIREMENTS
The scope of the requirement should be clear.
The problem statement should be clear and unambiguous.
FORMATTING
Formatting is an important aspect of the query that makes it readable and reusable.
Even though it’s a basic principle, it’s a quick win to make your code more readable. As
you would do with Python, you should indent your SQL code.
Indent after a keyword, and when you use a subquery or a derived table.
Prefer
SELECT customers.id,
       customers.name,
       customers.age,
       customers.gender,
       customers.salary,
       first_purchase.date
FROM company.customers
LEFT JOIN (
    SELECT customer_id,
           MIN(date) as date
    FROM company.purchases
    GROUP BY customer_id
) AS first_purchase
    ON first_purchase.customer_id = customers.id
WHERE customers.age <= 30
Also, note the white space around the comparison operator in the WHERE clause:
customers.age <= 30 is easier to read than customers.age<=30.
Avoid
SELECT customers.id,
       customers.name,
       customers.context_col1,
       nested.f0_
FROM company.customers
JOIN (
    SELECT customer_id,
           MIN(date)
    FROM company.purchases
    GROUP BY customer_id
) ON customer_id = customers.id
Prefer
SELECT customers.id,
       customers.name,
       customers.context_col1 as ip_address,
       first_purchase.date as first_purchase_date
FROM company.customers
JOIN (
    SELECT customer_id,
           MIN(date) as date
    FROM company.purchases
    GROUP BY customer_id
) AS first_purchase
    ON first_purchase.customer_id = customers.id
TIP: Alias the columns with a lowercase as, and the tables with an uppercase AS.
Comments
Provide proper comments, wherever necessary.
End Objective
OPTIMISATION
Performance and Cost
Every query should be checked for performance. It should not be too time-consuming,
and it should meet the demands of the stakeholders without timing out.
We should avoid the following in our queries for better optimisation:
1. Avoid one-to-many and many-to-many joins. Try to use joins with a one-to-one
mapping.
Such joins can multiply rows in an uncontrolled way, which can result in a long
run time even for a small set of data.
When stuck in such a scenario, identify the particular tables responsible for
it.
2. Avoid using the same table in multiple subqueries or CTEs. Instead, try to get all
the relevant data from a single subquery.
3. Avoid consuming unnecessary historical data in the query. Understand the
relevant scope of the data from your stakeholder and filter out the
irrelevant part of the tables. This also helps reduce cost.
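The three points above can be sketched in a single query. This is a minimal illustration, assuming hypothetical customers and purchases tables, an amount column, and a scope cutoff of 2024-01-01 agreed with the stakeholder; none of these come from the original. The purchases table is read once, restricted to the relevant period, and collapsed to one row per customer before the join, so the join itself is one-to-one.

```sql
-- Hypothetical sample data, for illustration only.
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE purchases (customer_id INTEGER, date DATE, amount INTEGER);

INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace');

INSERT INTO purchases (customer_id, date, amount) VALUES
    (1, '2019-03-10', 50),   -- old history, outside the assumed scope
    (1, '2024-01-05', 20),
    (1, '2024-02-11', 30),
    (2, '2024-01-20', 15);

-- Avoid: purchases is read twice, once per subquery,
-- and the full history is scanned each time.
SELECT customers.id,
       first_purchase.date as first_purchase_date,
       spend.total as total_spend
FROM customers
LEFT JOIN (
    SELECT customer_id,
           MIN(date) as date
    FROM purchases
    GROUP BY customer_id
) AS first_purchase
    ON first_purchase.customer_id = customers.id
LEFT JOIN (
    SELECT customer_id,
           SUM(amount) as total
    FROM purchases
    GROUP BY customer_id
) AS spend
    ON spend.customer_id = customers.id;

-- Prefer: one subquery, filtered to the agreed period,
-- returning exactly one row per customer.
SELECT customers.id,
       purchase_summary.first_purchase_date,
       purchase_summary.total_spend
FROM customers
LEFT JOIN (
    SELECT customer_id,
           MIN(date) as first_purchase_date,
           SUM(amount) as total_spend
    FROM purchases
    WHERE date >= '2024-01-01'   -- assumed scope cutoff
    GROUP BY customer_id
) AS purchase_summary
    ON purchase_summary.customer_id = customers.id;
```

The two queries are not meant to return identical values, because the second one also applies the period filter. Joining customers directly to the raw purchases table would return one row per purchase; aggregating first keeps the result at one row per customer.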
Remember:
Add comments to SELECT statements where formulas are applied to raw
columns.
Add a comment for each table giving the reason for using it.
Avoid using functions in the WHERE clause. If one is necessary, add a proper comment.
Mention the test cases used, with dates, in the description to give log-like
visibility. This helps avoid rework if a discrepancy arises
later.
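A sketch of how these reminders might look in practice, assuming a hypothetical orders table, an assumed 18% tax rate, and placeholder test details; none of these come from the original. The filter compares the raw column against a date range instead of wrapping it in a function, which typically lets the engine use an index or partition pruning on that column.

```sql
-- Hypothetical sample data, for illustration only.
CREATE TABLE orders (
    id         INTEGER PRIMARY KEY,
    created_at TIMESTAMP,
    net_amount INTEGER
);

INSERT INTO orders (id, created_at, net_amount) VALUES
    (1, '2023-12-31 23:10:00', 100),
    (2, '2024-01-15 09:30:00', 200),
    (3, '2024-02-01 00:00:00', 300);

-- Description: gross amount per order for January 2024.
-- Test case (placeholder): YYYY-MM-DD, order count compared with the source system.

-- Avoid (SQLite syntax shown): the function runs on created_at for every row.
-- SELECT id FROM orders WHERE strftime('%Y-%m', created_at) = '2024-01';

-- Prefer: a range on the raw column, with comments on the formula and the table.
SELECT orders.id,
       -- gross = net plus an assumed 18% tax
       orders.net_amount * 1.18 as gross_amount
FROM orders   -- source of order-level amounts
WHERE orders.created_at >= '2024-01-01'
  AND orders.created_at <  '2024-02-01';
```

With the sample rows above, only order 2 falls inside the range.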