Redshift Best Practices

The document provides an overview of AWS Redshift, detailing its architecture and key features such as columnar storage and massively parallel processing. It discusses best practices for sorting and distribution styles, query writing, and performance optimization, emphasizing the importance of selecting appropriate sort keys and distribution strategies. Additionally, it outlines specific recommendations for writing efficient queries to maximize performance.

Uploaded by

Saad Durrani


Optimizing Performance:

Best Practices in Redshift

Shaheer Anjum
Senior Data Engineer
Data Platforms
AGENDA
● Overview of AWS Redshift
○ What is AWS Redshift
○ AWS Redshift Architecture

● Understanding the Sorting & Distribution Styles

○ Best Distribution Styles
○ Best Sort Keys

● Query Writing Best Practices & Performance Optimization

○ How to write an optimized query.
○ How to reduce execution time.
● Code Refactoring
○ Converting SQL Server queries to Redshift.
AWS REDSHIFT OVERVIEW:
● Amazon Redshift is a fully managed, petabyte-scale data warehouse service on
the AWS cloud, built for high-performance analysis.
● Key Features:
○ Columnar storage: stores data in columns rather than rows for faster query
performance.
○ Massively Parallel Processing (MPP): distributes queries across multiple nodes for
parallel execution.
○ Easily scales up or down based on workload and data volume.
○ Ideal for data warehousing, analytics, and business intelligence.
○ Leader Node: Coordinates queries, manages query optimization, and distributes
queries to compute nodes.
○ Compute Nodes: Store and process data in parallel, providing scalable storage and
processing power.
DATA DISTRIBUTION STRATEGIES:
● Amazon Redshift supports four data distribution styles:
○ Key Distribution
○ Even Distribution
○ All Distribution
○ AUTO Distribution

● Key Distribution:
○ Key distribution is achieved by selecting a single column as the distribution key
(DISTKEY). The values in that column determine how rows are distributed across
the compute nodes.
○ Rows with the same distribution key value are hashed to the same node slice, so
matching rows from tables joined on that key are stored together. This can improve
performance for join operations involving the distribution key.
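
A minimal sketch of KEY distribution, assuming two hypothetical tables, `customers` and `orders`, that are usually joined on `customer_id`. The table and column names are illustrative and do not come from the slides.

```sql
-- Hypothetical tables for illustration only.
-- Both tables use customer_id as the distribution key, so rows that
-- share a customer_id value are stored on the same slice.
create table customers (
    customer_id   integer not null,
    customer_name varchar(100)
)
diststyle key
distkey (customer_id);

create table orders (
    order_id    bigint  not null,
    customer_id integer not null,
    order_date  date    not null,
    amount      decimal(12,2)
)
diststyle key
distkey (customer_id);

-- A join on the shared distribution key can usually be resolved
-- without redistributing rows between nodes.
select c.customer_name, sum(o.amount) as total_amount
from orders o
join customers c on c.customer_id = o.customer_id
group by c.customer_name;
```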
DATA DISTRIBUTION STRATEGIES:
● Even Distribution:
○ Even distribution (or distribution style EVEN) spreads rows across all node slices in
round-robin fashion, without relying on a specific column for hashing.
○ Because rows are divided evenly regardless of their values, each compute node
receives a balanced share of the data and the workload.
○ Even distribution is useful when there is no clear natural key for distribution, or when
the workload is evenly spread across the entire dataset.
● ALL Distribution:
○ All distribution (or distribution style ALL) involves replicating the entire table on each
node in the cluster.
○ Each compute node holds a full copy of the table, eliminating the need for inter-node
data movement during query execution.
○ All distribution is advantageous for small sized dimension tables which are joined very
frequently and take minimum
DATA DISTRIBUTION STRATEGIES:
● Auto Distribution:
○ With AUTO distribution, Amazon Redshift assigns an optimal distribution style based
on the size of the table data.
○ For example, if the AUTO distribution style is specified, Amazon Redshift initially
assigns the ALL distribution style to a small table.
○ When the table grows larger, Amazon Redshift might change the distribution style to
KEY, choosing the primary key (or a column of the composite primary key) as the
distribution key.
○ If the table grows larger and none of the columns are suitable to be the distribution
key, Amazon Redshift changes the distribution style to EVEN. The change in
distribution style occurs in the background with minimal impact to user queries.
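
The EVEN, ALL, and AUTO styles described above could be declared as follows. This is a sketch only: the three tables and their columns are hypothetical and chosen to suggest a typical use of each style.

```sql
-- Hypothetical tables for illustration only.

-- EVEN: rows are spread across slices regardless of column values.
create table click_events (
    event_id   bigint,
    event_time timestamp,
    payload    varchar(1000)
)
diststyle even;

-- ALL: a full copy of this small lookup table is kept on every node.
create table country_dim (
    country_code char(2),
    country_name varchar(60)
)
diststyle all;

-- AUTO: Redshift picks the style and may change it as the table grows.
create table page_views (
    view_id   bigint,
    user_id   integer,
    view_time timestamp
)
diststyle auto;

-- Check which distribution style each table currently has.
select "table", diststyle
from svv_table_info
where "table" in ('click_events', 'country_dim', 'page_views');
```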
REDSHIFT SORT KEYS:
● A table can define one or more columns as sort keys. Rows are stored on disk in
the order of these columns, and the query optimizer uses that sort order when
determining optimal query plans.
● Amazon Redshift supports two kinds of Sort Keys.
○ Compound Sort Keys
○ Interleaved Sort Keys
REDSHIFT SORT KEYS:
● COMPOUND SORT KEYS:
○ These are made up of all the columns that are listed in the Redshift sort key
definition during the creation of the table, in the order that they are listed. Therefore,
it is advisable to put the most frequently used column first in the list.
COMPOUND is the default sort type. Compound sort keys might speed up joins,
GROUP BY and ORDER BY operations, and window functions that use PARTITION BY.

● INTERLEAVED SORT KEYS:

○ An interleaved sort gives equal weight to each column in the Redshift sort key. As a
result, it can significantly improve query performance where the query uses
restrictive predicates (equality operator in WHERE clause) on secondary sort columns.
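
To make the difference concrete, here is a sketch of both sort key types. The two tables and their columns are hypothetical.

```sql
-- Hypothetical tables for illustration only.

-- Compound sort key: rows are ordered by sale_date, then by region.
-- Helps most when queries filter on sale_date, or on sale_date and region.
create table sales_compound (
    sale_id   bigint,
    sale_date date,
    region    varchar(20),
    amount    decimal(12,2)
)
compound sortkey (sale_date, region);

-- Interleaved sort key: customer_id and region carry equal weight,
-- so a filter on either column alone can benefit from the sort.
create table sales_interleaved (
    sale_id     bigint,
    customer_id integer,
    region      varchar(20),
    amount      decimal(12,2)
)
interleaved sortkey (customer_id, region);
```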
REDSHIFT SORT KEYS:
● Selecting the right kind requires knowledge of the queries that will run against the table.
○ Use an interleaved sort key when queries filter on different sort key columns with
highly selective predicates in the WHERE clause, particularly on very large tables.

○ Use a compound sort key when queries JOIN, GROUP BY, ORDER BY, or
PARTITION BY on the sort columns in the order they are listed, or when the table
is small.

○ Don’t use an interleaved sort key on columns with monotonically increasing
attributes, such as identity columns, dates, or timestamps.
DIST KEY EXAMPLES:
● Look at the schema of the USERS table in the TICKIT database. USERID is defined as
the SORTKEY column and the DISTKEY column:
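
One way to inspect that schema is to query the PG_TABLE_DEF catalog table. This sketch assumes the TICKIT sample tables are loaded and that the schema containing USERS is on the current search path.

```sql
-- List each column of USERS with its encoding and whether it is the
-- distribution key or part of the sort key.
select "column", type, encoding, distkey, sortkey
from pg_table_def
where tablename = 'users';
```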
DIST KEY EXAMPLES:
● USERID is a good choice for the distribution column in this table. If you query the
SVV_DISKUSAGE system view, you can see that the table is very evenly distributed.
Column numbers are zero-based, so USERID is column 0.
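
A query along these lines shows how the rows of column 0 are spread across slices. It is a sketch that assumes the TICKIT USERS table is loaded and that you are connected as a superuser, since SVV_DISKUSAGE is visible only to superusers.

```sql
-- One row per block of column 0 (USERID) on each slice.
-- Similar value counts across slices indicate an even distribution.
select slice, col, num_values, minvalue, maxvalue
from svv_diskusage
where name = 'users'
  and col = 0
  and num_values > 0
order by slice, col;
```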
DIST KEY EXAMPLES:
● CREATE [ [ LOCAL ] { TEMPORARY | TEMP } ] TABLE [ IF NOT EXISTS ] table_name
  ( { column_name data_type [ column_attributes ] [ column_constraints ]
    | table_constraints
    | LIKE parent_table [ { INCLUDING | EXCLUDING } DEFAULTS ] }
    [, ... ] )
  [ BACKUP { YES | NO } ]
  [ table_attributes ]

  where table_attributes are:
  [ DISTSTYLE { AUTO | EVEN | KEY | ALL } ]
  [ DISTKEY ( column_name ) ]
  [ [ COMPOUND | INTERLEAVED ] SORTKEY ( column_name [, ...] ) | [ SORTKEY AUTO ] ]
  [ ENCODE AUTO ]

● ALTER TABLE tablename ALTER DISTSTYLE ALL, ALTER SORTKEY (column_list);
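
As a concrete sketch, the statements below change the distribution style and sort key of an existing table. The `orders` table and its columns are hypothetical.

```sql
-- Hypothetical table and column names.

-- Switch to KEY distribution on customer_id.
alter table orders alter diststyle key distkey customer_id;

-- Replace the sort key with a compound key on two columns.
alter table orders alter compound sortkey (order_date, customer_id);

-- Hand both decisions back to Redshift.
alter table orders alter diststyle auto;
alter table orders alter sortkey auto;
```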

AMAZON REDSHIFT BEST PRACTICES FOR WRITING QUERIES:

● To maximize query performance, follow these recommendations when creating
queries:
○ Avoid using select *. Include only the columns you specifically need.
○ Use a CASE conditional expression to perform complex aggregations instead of
selecting from the same table multiple times.
○ Don't use cross-joins unless necessary. Cross joins without a join condition result in
the Cartesian product of two tables. Cross-joins are typically run as nested-loop joins,
which are the slowest of the possible join types.
○ Use subqueries in cases where one table in the query is used only for predicate
conditions and the subquery returns a small number of rows (less than about 200).
The following example uses a subquery to avoid joining the LISTING table.
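
The subquery and CASE recommendations could look like the sketch below. It assumes the TICKIT sample schema, in which SALES has `listid`, `qtysold`, and `saletime` columns and LISTING has `listid` and `listtime`. The date literals are illustrative.

```sql
-- Subquery instead of a join: LISTING is used only to filter SALES.
select sum(qtysold)
from sales
where listid in (
    select listid
    from listing
    where listtime > '2008-12-26'
);

-- CASE instead of scanning SALES twice: both half-year totals
-- are computed in a single pass over the table.
select
    sum(case when saletime <  '2008-07-01' then qtysold else 0 end) as first_half_qty,
    sum(case when saletime >= '2008-07-01' then qtysold else 0 end) as second_half_qty
from sales
where saletime >= '2008-01-01'
  and saletime <  '2009-01-01';
```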
AMAZON REDSHIFT BEST PRACTICES FOR WRITING QUERIES:

○ Join larger tables first.

○ Use predicates to restrict the dataset as much as possible.
○ In the predicate, use the least expensive operators that you can.
○ Comparison operators are preferable to the LIKE operator.
○ =, <>, <, and > are better than LIKE.
○ LIKE is in turn preferable to SIMILAR TO.
○ Avoid using functions in query predicates. Using them can drive up the cost of the
query by requiring large numbers of rows to resolve the intermediate steps of the
query.
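
The predicate advice above can be sketched as follows, assuming the TICKIT sample schema (USERS with `state` and `city` columns, SALES with `qtysold` and `saletime`). The filter values are illustrative.

```sql
-- Cheapest: a plain comparison operator.
select count(*) from users where state = 'CA';

-- More expensive: pattern matching with LIKE. Use it only when a
-- comparison operator cannot express the condition.
select count(*) from users where city like 'San%';

-- Avoid: a function applied to the column in the predicate.
select sum(qtysold)
from sales
where date_trunc('month', saletime) = '2008-12-01';

-- Prefer: the same filter written as a range on the bare column.
select sum(qtysold)
from sales
where saletime >= '2008-12-01'
  and saletime <  '2009-01-01';
```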
AMAZON REDSHIFT BEST PRACTICES FOR WRITING QUERIES:

○ Add predicates to filter tables that participate in joins, even if the predicates apply the
same filters. The query returns the same result set, but Amazon Redshift is able to
filter the join tables before the scan step and can then efficiently skip scanning blocks
from those tables. Redundant filters aren't needed if you filter on a column that's
used in the join condition.

○ For example, suppose that you want to join SALES and LISTING to find ticket sales for
tickets listed after December, grouped by seller. Both tables are sorted by date. The
following query joins the tables on their common key and filters for listing.listtime
values greater than December 1.
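
The query described here might look like this sketch. It assumes the TICKIT sample schema, with `listid` as the common key and 2008 as the year of the sample data.

```sql
-- Filters only on listing.listtime, so every block of SALES is scanned.
select listing.sellerid, sum(sales.qtysold)
from sales
join listing on sales.listid = listing.listid
where listing.listtime > '2008-12-01'
group by listing.sellerid
order by listing.sellerid;
```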
AMAZON REDSHIFT BEST PRACTICES FOR WRITING QUERIES:

○ The WHERE clause doesn't include a predicate for sales.saletime, so the execution
engine is forced to scan the entire SALES table. If you know the filter would result in
fewer rows participating in the join, then add that filter as well. The following example
cuts execution time significantly.
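
A sketch of the improved version, under the same assumptions about the TICKIT sample schema. The extra predicate on `sales.saletime` does not change the result when no ticket sells before it is listed, but it lets Redshift skip blocks of SALES.

```sql
-- The added saletime filter restricts the scan of SALES as well.
select listing.sellerid, sum(sales.qtysold)
from sales
join listing on sales.listid = listing.listid
where listing.listtime > '2008-12-01'
  and sales.saletime > '2008-12-01'
group by listing.sellerid
order by listing.sellerid;
```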
AMAZON REDSHIFT BEST PRACTICES FOR WRITING QUERIES:

○ Use sort keys in the GROUP BY clause so the query planner can use more efficient
aggregation.
○ If you use both GROUP BY and ORDER BY clauses, make sure that you put the
columns in the same order in both. That is, use the following approach.
■ group by a, b, c
■ order by a, b, c

○ Don't use the following approach.

■ group by b, c, a
■ order by a, b, c
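
Applied to a full query, the recommended ordering could look like this sketch, which assumes the TICKIT SALES table with `sellerid`, `eventid`, and `qtysold` columns.

```sql
-- GROUP BY and ORDER BY list the same columns in the same order.
select sellerid, eventid, sum(qtysold) as tickets_sold
from sales
group by sellerid, eventid
order by sellerid, eventid;
```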
