Data Manipulation

An application whose database never ran an INSERT, UPDATE or DELETE query would be barely valuable. Although some applications with only static content exist, they are the exception: you will modify data all the time. While data manipulation seems to be the most uncomplicated part of SQL, it still leaves room for improvement in your applications. Always remember that the number of write operations your disk can perform per second is very limited. If you can reduce the operations per second, your application will be much more performant.

This data manipulation chapter will teach you tricks to update rows based on information in other tables, delete duplicate rows, and make your application faster by removing lock contention. You should study the lock contention tip closely, as I have often found it to be a performance problem.

Prevent Lock Contention For Updates On Hot Rows

-- MySQL
INSERT INTO tweet_statistics (
    tweet_id, fanout, likes_count
) VALUES (
    1475870220422107137, FLOOR(RAND() * 10), 1
) ON DUPLICATE KEY UPDATE
    likes_count = likes_count + VALUES(likes_count);

-- PostgreSQL
INSERT INTO tweet_statistics (
    tweet_id, fanout, likes_count
) VALUES (
    1475870220422107137, FLOOR(RANDOM() * 10), 1
) ON CONFLICT (tweet_id, fanout) DO UPDATE
    SET likes_count = tweet_statistics.likes_count + excluded.likes_count;

In some applications, counters, e.g. the likes of a tweet, are updated constantly. During a traffic spike or for trending content, a counter may be updated countless times within a second. Due to the database's concurrency control, these updates start interfering with each other, as a row can only be locked by one transaction (query) at a time. Every update is executed one after another instead of in parallel, as would happen for independent rows.

Instead of updating a single row, the increments are fanned out to e.g. 100 different rows in a special counter table. Write throughput now scales with the number of additional rows the counter is spread across. Those values are later aggregated into a single value and saved in the original column that would otherwise have suffered lock contention.
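
The later aggregation step could look like the following sketch. The tweets table and its likes_count column are assumed names for illustration; both statements should run in a single transaction so that no increments are lost between the sum and the cleanup:

-- Sketch: collapse the fanout rows back into the real counter.
-- tweets and likes_count are illustrative names, not from the snippet above.
BEGIN;

UPDATE tweets
SET likes_count = likes_count + (
    SELECT COALESCE(SUM(likes_count), 0)
    FROM tweet_statistics
    WHERE tweet_id = 1475870220422107137
)
WHERE id = 1475870220422107137;

DELETE FROM tweet_statistics
WHERE tweet_id = 1475870220422107137;

COMMIT;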

Updates Based On A Select Query

-- MySQL
UPDATE products
JOIN categories USING(category_id)
SET price = price_base - price_base * categories.discount;

-- PostgreSQL
UPDATE products
SET price = price_base - price_base * categories.discount
FROM categories
WHERE products.category_id = categories.category_id;

Tables are often not updated in isolation; the new values are based on information stored in other tables. When discounting all products on Black Friday, for example, a different discount is applied for every product category. Instead of the naive approach of executing one update query per category, you can update the products by joining them to their categories. The manual join in the application is replaced by a more efficient one performed by the database.
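
For contrast, the naive approach would issue one statement per category from application code, e.g. (the 20% discount and the category id are illustrative):

-- Naive alternative: one UPDATE per category, executed in a loop
UPDATE products
SET price = price_base - price_base * 0.20
WHERE category_id = 1;
-- ...repeated for every other category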

Notice: I have written a more extensive text about this topic on my database-focused website SqlForDevs.com: UPDATE from a SELECT

Return The Values Of Modified Rows

-- PostgreSQL
DELETE FROM sessions
WHERE ip = '127.0.0.1'
RETURNING id, user_agent, last_access;

Many maintenance operations are based on finding particular rows, processing them (e.g. sending an email or calculating some statistics) and marking them as processed. Typically, a flag within the row is updated, or the row is deleted because it is not needed anymore. This workflow can be simplified by using the RETURNING feature to do the data manipulation and the selection of the data in one step.

This feature is available for DELETE, INSERT and UPDATE queries and always returns the data after the modification, i.e. the inserted or updated data with all triggers executed and generated values available.
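
The mark-as-processed variant of the workflow works the same way with UPDATE. A minimal sketch, assuming a jobs table with a processed flag (both illustrative names):

-- PostgreSQL: flag unprocessed rows and fetch them in one step
UPDATE jobs
SET processed = true
WHERE processed = false
RETURNING id, payload, created_at;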

Notice: This feature is only available for PostgreSQL.

Delete Duplicate Rows

-- MySQL
WITH duplicates AS (
    SELECT id, ROW_NUMBER() OVER(
        PARTITION BY firstname, lastname, email
        ORDER BY age DESC
    ) AS rownum
    FROM contacts
)
DELETE contacts
FROM contacts
JOIN duplicates USING(id)
WHERE duplicates.rownum > 1;

-- PostgreSQL
WITH duplicates AS (
    SELECT id, ROW_NUMBER() OVER(
        PARTITION BY firstname, lastname, email
        ORDER BY age DESC
    ) AS rownum
    FROM contacts
)
DELETE FROM contacts
USING duplicates
WHERE contacts.id = duplicates.id AND duplicates.rownum > 1;

After some time, most applications accumulate duplicated rows, resulting in a bad user experience, higher storage requirements and worse database performance. The cleaning process is usually implemented in application code with complex chunking behavior, as the data does not fit into memory entirely. With a Common Table Expression (CTE), the duplicate rows can be identified and ranked by which ones are most important to keep. A single delete query can then remove all duplicates except the ones you want to keep. The formerly complex logic is done by one simple SQL query.
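
Before running the delete, the same CTE can be reused as a plain SELECT to preview exactly which rows would be removed:

-- Works on MySQL and PostgreSQL: preview the rows the DELETE would remove
WITH duplicates AS (
    SELECT id, ROW_NUMBER() OVER(
        PARTITION BY firstname, lastname, email
        ORDER BY age DESC
    ) AS rownum
    FROM contacts
)
SELECT id
FROM duplicates
WHERE rownum > 1;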

Notice: I have written a more extensive text about this topic on my database-focused website SqlForDevs.com: Delete Duplicate Rows

Table Maintenance After Bulk Modifications

-- MySQL
ANALYZE TABLE users;

-- PostgreSQL
ANALYZE (SKIP_LOCKED) users;

The database needs up-to-date statistics about your tables, like the approximate number of rows, the data distribution of values and more, to calculate the most efficient way to execute your query. Unlike indexes, which are altered automatically whenever a row affecting them is created, updated or deleted, the statistics are not recalculated on every change. A recalculation is only triggered when a threshold of changes to a table is crossed.

Whenever you change a big part of a table, the number of affected rows may still be below the statistics recalculation threshold but significant enough to make the statistics incorrect. Some queries may become very slow because the database chooses a query plan based on the now-incorrect information about the table. Therefore, you should analyze a table after every significant change to trigger the statistics recalculation and ensure fast queries.
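
In practice, the analyze simply follows the bulk statement. A sketch, with an assumed orders table and an arbitrary cutoff date:

-- PostgreSQL: refresh the statistics right after a bulk modification
DELETE FROM orders WHERE created_at < '2020-01-01';
ANALYZE orders; -- MySQL equivalent: ANALYZE TABLE orders;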
