DW SQL 201
Amazon.com Confidential 1
Introduction & Intended Audience
• This class is for anyone with a basic knowledge of
SQL.
• This class focuses on Data Warehouse SQL rather
than OLTP SQL (analytic vs. operational)
• We will cover the SELECT statement
– The DW Virt (sandbox) class will cover INSERT, UPDATE,
etc.
• At the end of this class you will:
– Be able to write optimized, efficient SQL
– Be familiar with various methods for getting your data
quickly and accurately
Topics
• OLTP vs. DW
• Left/right/outer/full outer joins
• Handling and manipulating dates
• Dealing with NULLs
• String and conversion functions like NVL, DECODE, and
COALESCE
• Case statements
• Pivoting data
• Analytic (window) functions
• Subqueries and inline views
• Tuning
OLTP vs. DW
• An OLTP (Online Transaction Processing)
database is designed to run thousands (or
even millions to billions) of small queries per
day
• A Data Warehouse database is designed to
run hundreds (or thousands) of large queries
per day
OLTP
SELECT customer_name
FROM customers
WHERE customer_id = 31337;
SELECT COUNT(*)
FROM orders
WHERE customer_id = 31337
AND order_date = TO_DATE('11/16/1998','MM/DD/YYYY');
DW
SELECT COUNT(*)
, SUM(msrp) AS total_msrp
FROM orders
WHERE order_date >= TO_DATE('11/16/1998','MM/DD/YYYY')
AND order_date <= TO_DATE('11/16/1999','MM/DD/YYYY')
AND country = 'USA';
OLTP vs. DW
A word about Formatting
SELECT
order_item.customer_id AS customer_id,
order_item.order_id AS order_id,
TO_CHAR(order_item.order_datetime, 'yyyy-mm-dd hh24:mi') AS order_date,
order_item.ASIN AS order_asin,
order_item.our_price AS item_price,
order_item.customer_order_item_id AS order_item_id,
ship_item.customer_shipment_item_id AS shipment_item_id,
ship_item.marketplace_id
FROM
d_customer_order_items order_item
left join d_customer_shipment_items ship_item
ON order_item.customer_order_item_id = ship_item.customer_order_item_id
WHERE
order_item.legal_entity_id =
(CASE WHEN order_item.marketplace_id = '4861' THEN 101
WHEN order_item.marketplace_id = '41092' THEN 109 END)
AND order_item.marketplace_id = '41092'
AND order_item.order_datetime >= SYSDATE - 7
AND order_item.order_datetime < SYSDATE - 1
ORDER BY
customer_id ASC, item_price DESC
A word about Formatting
SELECT order_item.customer_id AS customer_id
, order_item.order_id AS order_id
, TO_CHAR(order_item.order_datetime, 'yyyy-mm-dd hh24:mi') AS order_date
, order_item.ASIN AS order_asin
, order_item.our_price AS item_price
, order_item.customer_order_item_id AS order_item_id
, ship_item.customer_shipment_item_id AS shipment_item_id
, ship_item.marketplace_id
FROM d_customer_order_items order_item
left join
d_customer_shipment_items ship_item
ON order_item.customer_order_item_id = ship_item.customer_order_item_id
WHERE order_item.legal_entity_id =
(CASE WHEN order_item.marketplace_id = '4861' THEN 101
WHEN order_item.marketplace_id = '41092' THEN 109 END
)
AND order_item.marketplace_id = '41092'
AND order_item.order_datetime >= SYSDATE - 7
AND order_item.order_datetime < SYSDATE - 1
ORDER BY 1 ASC, 5 DESC
A word about Formatting
SELECT order_item.customer_id AS customer_id
, order_item.order_id AS order_id
, TO_CHAR(order_item.order_datetime, 'yyyy-mm-dd hh24:mi') AS order_date
, order_item.ASIN AS order_asin
, order_item.our_price AS item_price
, order_item.customer_order_item_id AS order_item_id
, ship_item.customer_shipment_item_id AS shipment_item_id
, ship_item.marketplace_id
FROM d_customer_order_items order_item
, d_customer_shipment_items ship_item
WHERE order_item.customer_order_item_id = ship_item.customer_order_item_id (+)
AND order_item.legal_entity_id =
(CASE WHEN order_item.marketplace_id = '4861' THEN 101
WHEN order_item.marketplace_id = '41092' THEN 109
END
)
AND order_item.marketplace_id = '41092'
AND order_item.order_datetime >= SYSDATE - 7
AND order_item.order_datetime < SYSDATE - 1
ORDER BY 1 ASC, 5 DESC
Terms and Definitions
• Partition
– A horizontal slice of a table
– Like the floors of an office building – “I’m someplace on the 3rd floor of US2”…
• Partition elimination
• Index
– A data structure used to improve the efficiency of lookups in a table
– Like the office numbers – “I’m in 327.F2”…
• Cardinality (row)
– Number of elements of a data set
• Cardinality (Relationship)
– Uniqueness of values in a column
• Predicate
– The “WHERE” clauses
• Explain plan (a verb)
• Query plan (a noun)
• Query Coordinator
• Parallel Query server (aka slave)
• Cost
Tools
• ETL Manager
– Table Explorer
– Explain Plan Tool
• VDBA - https://vdba.amazon.com/
• SQL*Plus
• SQL Developer
• TOAD
• Bookshelf - https://bookshelf.amazon.com/oracle/
Tools
• Oracle Syntax diagrams
Join Types
Equi-join (inner)
Outer Joins
SELECT coi.order_id
, coi.quantity
, promo.is_bxgy_promotion
FROM d_customer_order_items coi
, d_promotions promo
WHERE coi.promo_id = promo.promo_id (+)
SELECT coi.order_id
, coi.quantity
, promo.is_bxgy_promotion
FROM d_customer_order_items coi
left join d_promotions promo
ON promo.promo_id = coi.promo_id
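The two forms above are meant to return the same rows. As a minimal sketch of what the outer join preserves, the query below runs against a few made-up rows built from dual; the inline coi and promo data sets and their values are hypothetical stand-ins, not the real warehouse tables.

```sql
-- Hypothetical sample rows: order 3 carries a promo_id that has no match in promo.
WITH coi AS (
    SELECT 1 AS order_id, 2 AS quantity, 10 AS promo_id FROM dual UNION ALL
    SELECT 2, 1, 20 FROM dual UNION ALL
    SELECT 3, 5, 99 FROM dual
),
promo AS (
    SELECT 10 AS promo_id, 'Y' AS is_bxgy_promotion FROM dual UNION ALL
    SELECT 20, 'N' FROM dual
)
SELECT coi.order_id
     , coi.quantity
     , promo.is_bxgy_promotion
FROM   coi
       LEFT JOIN promo
       ON promo.promo_id = coi.promo_id
ORDER BY coi.order_id;
-- All three orders come back; order 3 has is_bxgy_promotion = NULL.
-- With an inner join, order 3 would be dropped.
```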
Outer Joins
SELECT coi.order_id
, coi.quantity
, promo.is_bxgy_promotion
FROM d_customer_order_items coi
, d_promotions promo
WHERE coi.promo_id (+) = promo.promo_id (+)
AND …
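Nope: Oracle does not allow (+) on both sides of a join predicate (ORA-01468). Use FULL OUTER JOIN instead.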
Outer Joins
SELECT coi.order_id
, coi.quantity
, promo.is_bxgy_promotion
FROM d_customer_order_items coi
FULL OUTER JOIN d_promotions promo
ON coi.promo_id = promo.promo_id
WHERE …
Dates in Oracle
• Dates are stored in a fixed 7-byte internal format
– Century, year, month, day, hour, minute, and second, so every DATE carries a time of day
• Subtracting one date from another results in a number (of days), not a date
• Adding a number (of days) to a date results in a date
• 12/02/2008 12:00:00 – 12/01/2008 12:00:00 = 1
• 12/02/2008 12:00:00 – 12/01/2008 00:00:00 = 1.5
• order_day = TO_DATE('{RUN_DATE_YYYY/MM/DD}', 'YYYY/MM/DD')
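The date arithmetic above can be tried directly against dual. This is a small sketch using only literal dates; the column aliases are made up for illustration.

```sql
SELECT TO_DATE('12/02/2008 12:00:00','MM/DD/YYYY HH24:MI:SS')
     - TO_DATE('12/01/2008 00:00:00','MM/DD/YYYY HH24:MI:SS')        AS days_between    -- 1.5 (a NUMBER)
     , TO_DATE('12/01/2008','MM/DD/YYYY') + 7                        AS one_week_later  -- still a DATE
     , TO_DATE('12/01/2008','MM/DD/YYYY') + 6/24                     AS six_hours_later -- fractions of a day are hours
     , TRUNC(TO_DATE('12/02/2008 12:00:00','MM/DD/YYYY HH24:MI:SS')) AS midnight        -- time of day removed
FROM   dual;
```

Because a DATE always carries a time of day, comparing a column that holds times against a date with no time (midnight) only matches rows stamped exactly at midnight; TRUNC or a range predicate avoids that surprise.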
Dates in Oracle
Function / Formula Input Output
Decode
SELECT DECODE(legal_entity_id,
101, 'Amazon.com',
102, 'Amazon.co.uk',
103, 'Amazon.de',
'Unknown Legal Entity ID')
FROM foo
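For comparison, here is a sketch that puts DECODE next to the equivalent simple CASE expression, plus NVL and COALESCE for the NULL row. The four-row foo data set is made up for illustration, and the -1 default is an arbitrary placeholder.

```sql
-- Hypothetical rows: two known IDs, one unknown ID, one NULL.
WITH foo AS (
    SELECT 101 AS legal_entity_id FROM dual UNION ALL
    SELECT 103 FROM dual UNION ALL
    SELECT 999 FROM dual UNION ALL
    SELECT CAST(NULL AS NUMBER) FROM dual
)
SELECT legal_entity_id
     , DECODE(legal_entity_id,
              101, 'Amazon.com',
              102, 'Amazon.co.uk',
              103, 'Amazon.de',
              'Unknown Legal Entity ID')     AS decode_name
     , CASE legal_entity_id
            WHEN 101 THEN 'Amazon.com'
            WHEN 102 THEN 'Amazon.co.uk'
            WHEN 103 THEN 'Amazon.de'
            ELSE 'Unknown Legal Entity ID'
       END                                   AS case_name
     , NVL(legal_entity_id, -1)              AS nvl_id       -- Oracle-specific, two arguments
     , COALESCE(legal_entity_id, -1)         AS coalesce_id  -- ANSI, any number of arguments
FROM   foo;
-- 999 and NULL both fall through to 'Unknown Legal Entity ID';
-- the NULL row shows -1 in nvl_id and coalesce_id.
```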
The Case Statement
SELECT CASE WHEN country IS NULL
THEN 'Unknown'
ELSE country
END AS country
FROM …
SELECT *
FROM d_customer_order_items
WHERE legal_entity_id IN (101,116) /* US and CA */
AND order_day = TO_DATE('1/1/2009','MM/DD/YYYY')
AND CASE WHEN legal_entity_id = 101
THEN list_price
WHEN legal_entity_id = 116
THEN associated_store_price
END >= 50
…
The Case Statement
SELECT item_name
, customer_id
, CASE WHEN cust.age < 21 AND item.no_minors_flag = 'Y'
THEN 'No'
ELSE 'Yes'
END AS can_buy_item
FROM customers cust
, order_items item
WHERE cust.customer_id = item.customer_id
;
Pivoting Data
Turn This
ASIN        Attribute           Value
B00154JDAI  Item_name           Kindle 2
B00154JDAI  Item_weight         10.2
B00154JDAI  Is_fast_track_flag  Y
B00154JDAI  Our_price           359.00
B00154JDAI  MSRP                359.00
B000WP91XK  Item_name           Blackberry Curve
B000WP91XK  Our_price           0.01
Into This
ASIN Item_name Item_weight Fast_track Our_price MSRP
Pivoting Data
ASIN        ITEM_NAME  ITEM_WEIGHT  FAST_TRACK  OUR_PRICE  MSRP
B00154JDAI  Kindle 2
B00154JDAI             10.2
B00154JDAI                          Y
B00154JDAI                                      359.00
B00154JDAI                                                 359.00
B000WP91XK                                      0.01
Pivoting Data
SELECT ASIN
, MAX(CASE WHEN attribute = 'item_name'
THEN value ELSE NULL END) AS item_name
, MAX(CASE WHEN attribute = 'item_weight'
THEN value ELSE NULL END) AS item_weight
, MAX(CASE WHEN attribute = 'is_fast_track_flag'
THEN value ELSE NULL END) AS fast_track
, MAX(CASE WHEN attribute = 'our_price'
THEN value ELSE NULL END) AS our_price
, MAX(CASE WHEN attribute = 'msrp'
THEN value ELSE NULL END) AS msrp
FROM items
GROUP BY ASIN;
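A self-contained sketch of the same pivot, with the sample rows typed in from dual so it can be run anywhere. It assumes the attribute names are stored in lower case; string comparison is case sensitive, so the literals in each CASE must match the stored values exactly.

```sql
-- Sample rows copied from the "Turn This" table, attribute names lower-cased (an assumption).
WITH items AS (
    SELECT 'B00154JDAI' AS asin, 'item_name' AS attribute, 'Kindle 2' AS value FROM dual UNION ALL
    SELECT 'B00154JDAI', 'item_weight',        '10.2'             FROM dual UNION ALL
    SELECT 'B00154JDAI', 'is_fast_track_flag', 'Y'                FROM dual UNION ALL
    SELECT 'B00154JDAI', 'our_price',          '359.00'           FROM dual UNION ALL
    SELECT 'B00154JDAI', 'msrp',               '359.00'           FROM dual UNION ALL
    SELECT 'B000WP91XK', 'item_name',          'Blackberry Curve' FROM dual UNION ALL
    SELECT 'B000WP91XK', 'our_price',          '0.01'             FROM dual
)
SELECT asin
     , MAX(CASE WHEN attribute = 'item_name'          THEN value END) AS item_name
     , MAX(CASE WHEN attribute = 'item_weight'        THEN value END) AS item_weight
     , MAX(CASE WHEN attribute = 'is_fast_track_flag' THEN value END) AS fast_track
     , MAX(CASE WHEN attribute = 'our_price'          THEN value END) AS our_price
     , MAX(CASE WHEN attribute = 'msrp'               THEN value END) AS msrp
FROM   items
GROUP BY asin
ORDER BY asin;
-- One row per ASIN. B000WP91XK has NULL for item_weight, fast_track and msrp
-- because no source row exists for those attributes.
```

MAX works here because each CASE yields at most one non-NULL value per ASIN and aggregates ignore NULLs.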
Pivoting Data
ASIN Item_name Item_weight Fast_track Our_price MSRP
Analytic Functions
[Syntax diagrams for analytic_function, analytic_clause, and query_partition_clause; source: Oracle documentation]
Analytic Functions
CUSTOMER_ID ORDER_DATE OUR_PRICE
1234 3/23/2000 15.99
8675309 12/24/2006 210.00
1337 9/15/2001 54.00
8675309 5/5/2005 112.50
1337 6/18/1999 34.24
1337 7/4/2008 77.07
1337 10/31/2004 99.99
8675309 3/4/2005 67.89
1337 5/6/2007 89.01
Analytic Functions
CUSTOMER_ID ORDER_DATE OUR_PRICE
1234 3/23/2000 15.99
1337 6/18/1999 34.24
1337 9/15/2001 54.00
1337 10/31/2004 99.99
1337 5/6/2007 89.01
1337 7/4/2008 77.07
8675309 3/4/2005 67.89
8675309 5/5/2005 112.50
8675309 12/24/2006 210.00
Analytic Functions
CUSTOMER_ID ORDER_DATE OUR_PRICE
1234 3/23/2000 15.99
1337 6/18/1999 34.24
1337 9/15/2001 54.00
1337 10/31/2004 99.99
1337 5/6/2007 89.01
1337 7/4/2008 77.07
8675309 3/4/2005 67.89
8675309 5/5/2005 112.50
8675309 12/24/2006 210.00
Grouped by customer_id:
MIN order date
MAX our_price
Analytic Functions
SELECT customer_id
, MIN(order_date) AS first_order
, MAX(our_price) AS max_price
FROM orders
GROUP BY customer_id;
Analytic Functions
SELECT customer_id
, order_date
, our_price
, row_number() OVER (PARTITION BY customer_id ORDER BY order_date) as order_number
FROM orders;
Analytic Functions
SELECT *
FROM (SELECT customer_id
, our_price
, order_day
, row_number() OVER (PARTITION BY customer_id ORDER BY order_day) AS
order_number
FROM orders
)
WHERE order_number = 1;
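To see the "first order per customer" pattern end to end, here is a sketch against a handful of rows taken from the sample tables earlier. The inline orders data set is a stand-in built from dual, not the real table.

```sql
-- A few of the sample rows, deliberately out of order.
WITH orders AS (
    SELECT 1234 AS customer_id, DATE '2000-03-23' AS order_day, 15.99 AS our_price FROM dual UNION ALL
    SELECT 1337,    DATE '2001-09-15',  54.00 FROM dual UNION ALL
    SELECT 1337,    DATE '1999-06-18',  34.24 FROM dual UNION ALL
    SELECT 1337,    DATE '2004-10-31',  99.99 FROM dual UNION ALL
    SELECT 8675309, DATE '2005-05-05', 112.50 FROM dual UNION ALL
    SELECT 8675309, DATE '2005-03-04',  67.89 FROM dual
)
SELECT customer_id
     , order_day
     , our_price
FROM  (SELECT customer_id
            , our_price
            , order_day
            , ROW_NUMBER() OVER (PARTITION BY customer_id
                                 ORDER BY order_day) AS order_number
       FROM   orders
      )
WHERE  order_number = 1
ORDER BY customer_id;
-- 1234    -> 2000-03-23, 15.99
-- 1337    -> 1999-06-18, 34.24
-- 8675309 -> 2005-03-04, 67.89
```

The filter has to sit in an outer query because analytic functions are evaluated after the WHERE clause of the query block they appear in.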
Analytic Functions
SELECT *
FROM (SELECT customer_id
, our_price
, order_day
, row_number() OVER (PARTITION BY customer_id, legal_entity ORDER BY order_day)
AS order_number
FROM orders
)
WHERE order_number = 1;
Analytic Functions - Percent_rank & TP80
SELECT customer_id, price
, percent_rank() over (PARTITION BY customer_id ORDER BY price) AS p_rank
FROM …
Customer_id Price P_rank
1234 54 0
1234 65 .14
1234 65 .14
1234 65 .14
1234 75 .57
1234 81 .71
1234 123 .85
1234 150 1
5678 12 0
5678 12 0
5678 18 .4
5678 45 .6
5678 98 .8
5678 121 1
Analytic Functions
Percent_rank & TP80
SELECT customer_id
, MIN(CASE WHEN p_rank >= .8 THEN price ELSE NULL END) AS price
FROM (SELECT customer_id, price
, percent_rank() over (PARTITION BY customer_id ORDER BY price) AS p_rank
FROM …
)
GROUP BY customer_id
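A runnable sketch of the TP80 query, using the customer/price rows from the P_rank table on the previous slide. The inline prices data set and the tp80_price alias are made up for illustration.

```sql
-- Rows copied from the percent_rank example table.
WITH prices AS (
    SELECT 1234 AS customer_id, 54 AS price FROM dual UNION ALL
    SELECT 1234,  65 FROM dual UNION ALL
    SELECT 1234,  65 FROM dual UNION ALL
    SELECT 1234,  65 FROM dual UNION ALL
    SELECT 1234,  75 FROM dual UNION ALL
    SELECT 1234,  81 FROM dual UNION ALL
    SELECT 1234, 123 FROM dual UNION ALL
    SELECT 1234, 150 FROM dual UNION ALL
    SELECT 5678,  12 FROM dual UNION ALL
    SELECT 5678,  12 FROM dual UNION ALL
    SELECT 5678,  18 FROM dual UNION ALL
    SELECT 5678,  45 FROM dual UNION ALL
    SELECT 5678,  98 FROM dual UNION ALL
    SELECT 5678, 121 FROM dual
)
SELECT customer_id
     , MIN(CASE WHEN p_rank >= .8 THEN price END) AS tp80_price
FROM  (SELECT customer_id
            , price
            , PERCENT_RANK() OVER (PARTITION BY customer_id
                                   ORDER BY price) AS p_rank
       FROM   prices
      )
GROUP BY customer_id
ORDER BY customer_id;
-- 1234 -> 123  (first price whose p_rank, 6/7, is at or above .8)
-- 5678 -> 98   (p_rank is exactly 4/5 = .8)
```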
Analytic Functions
Rank and Dense_Rank
SELECT customer_id
, price
, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY price) AS row_num
, DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY price) AS d_rank
, RANK() OVER (PARTITION BY customer_id ORDER BY price) AS rank
FROM …
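The three functions only differ when there are ties. A small sketch with one customer and a three-way tie at 65 makes the difference visible; the rows and the price_rank alias are made up for illustration.

```sql
WITH prices AS (
    SELECT 1234 AS customer_id, 54 AS price FROM dual UNION ALL
    SELECT 1234, 65 FROM dual UNION ALL
    SELECT 1234, 65 FROM dual UNION ALL
    SELECT 1234, 65 FROM dual UNION ALL
    SELECT 1234, 75 FROM dual
)
SELECT customer_id
     , price
     , ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY price) AS row_num
     , DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY price) AS d_rank
     , RANK()       OVER (PARTITION BY customer_id ORDER BY price) AS price_rank
FROM   prices
ORDER BY price, row_num;
-- price  row_num  d_rank  price_rank
--    54        1       1           1
--    65        2       2           2
--    65        3       2           2
--    65        4       2           2
--    75        5       3           5
```

ROW_NUMBER always counts 1, 2, 3 and breaks ties arbitrarily; RANK leaves a gap after a tie; DENSE_RANK does not.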
Other Useful Analytic Functions
Inline Views and Subqueries
SELECT cust.customer_id
, COALESCE(cs.shipment_count, 0) AS shipment_count
FROM d_customers cust
, (SELECT customer_id
, COUNT(*) AS shipment_count
FROM d_customer_shipments
WHERE legal_entity_id = 101
AND ship_day >= TO_DATE('20080101','YYYYMMDD')
GROUP BY customer_id
HAVING COUNT(*) >= 5
) cs
WHERE cust.customer_id = cs.customer_id (+)
The inline view cs is just like having a table with two columns:
– customer_id
– shipment_count
Inline Views and Subqueries
SELECT shipment_count
, COUNT(*) AS occurrences
FROM (SELECT COUNT(*) AS shipment_count
FROM d_customer_shipments
WHERE legal_entity_id = 101
AND ship_day >= TO_DATE('20080101','YYYYMMDD')
GROUP BY customer_id
)
GROUP BY shipment_count
Inline Views and Subqueries
Shipment_count Occurrences
1 1929
2 2301
3 1102
4 809
5 234
6 118
… …
Anti Joins
SELECT DISTINCT customer_id
FROM d_customer_order_items
WHERE legal_entity_id = 101
AND order_day > TO_DATE('01/01/2009','MM/DD/YYYY')
AND gl_product_group = 14 /* Books */
MINUS
SELECT DISTINCT customer_id
FROM d_customer_order_items
WHERE legal_entity_id = 101
AND order_day > TO_DATE('01/01/2009','MM/DD/YYYY')
AND gl_product_group = 23 /* Electronics */
;
Anti Joins
SELECT DISTINCT coi.customer_id
FROM d_customer_order_items coi
WHERE coi.legal_entity_id = 101
AND coi.order_day > TO_DATE('04/01/2009','MM/DD/YYYY')
AND coi.gl_product_group = 14 /* Books */
AND NOT EXISTS (
SELECT 'X'
FROM d_customer_order_items coi_inner
WHERE coi_inner.customer_id = coi.customer_id
AND coi_inner.legal_entity_id = 101
AND coi_inner.order_day >
TO_DATE('04/01/2009','MM/DD/YYYY')
AND coi_inner.gl_product_group = 23 /* Electronics */
)
Anti Joins
SELECT DISTINCT coi.customer_id
FROM d_customer_order_items coi
WHERE coi.legal_entity_id = 101
AND coi.order_day > TO_DATE('04/01/2009','MM/DD/YYYY')
AND coi.gl_product_group = 14 /* Books */
AND coi.customer_id NOT IN (
SELECT coi_inner.customer_id
FROM d_customer_order_items coi_inner
WHERE coi_inner.legal_entity_id = 101
AND coi_inner.order_day >
TO_DATE('04/01/2009','MM/DD/YYYY')
AND coi_inner.gl_product_group = 23 /* Electronics */
)
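One caveat when choosing between the NOT EXISTS and NOT IN forms: they are only interchangeable when the subquery cannot return NULL. A minimal sketch with made-up rows, where books and electronics are hypothetical stand-ins for the two filtered passes over d_customer_order_items:

```sql
-- Customer 1 bought only books; customer 2 bought both.
-- The electronics side also contains one row with a NULL customer_id.
WITH books AS (
    SELECT 1 AS customer_id FROM dual UNION ALL
    SELECT 2 FROM dual
),
electronics AS (
    SELECT 2 AS customer_id FROM dual UNION ALL
    SELECT CAST(NULL AS NUMBER) FROM dual
)
SELECT 'NOT EXISTS' AS method
     , b.customer_id
FROM   books b
WHERE  NOT EXISTS (SELECT 'X'
                   FROM   electronics e
                   WHERE  e.customer_id = b.customer_id)
UNION ALL
SELECT 'NOT IN'
     , b.customer_id
FROM   books b
WHERE  b.customer_id NOT IN (SELECT e.customer_id
                             FROM   electronics e);
-- Returns a single row: ('NOT EXISTS', 1).
-- The NOT IN branch returns nothing, because "1 <> NULL" is unknown rather than true.
```

If the subquery column can be NULL, either filter the NULLs out inside the subquery or use NOT EXISTS.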
Anti Joins
SELECT DISTINCT coi_books.customer_id
FROM d_customer_order_items coi_books
, d_customer_order_items coi_electronics
WHERE coi_books.customer_id = coi_electronics.customer_id (+)
AND coi_books.legal_entity_id = 101
AND coi_books.order_day > TO_DATE('04/01/2009','MM/DD/YYYY')
AND coi_books.gl_product_group = 14 /* Books */
AND coi_electronics.legal_entity_id (+) = 101
AND coi_electronics.order_day (+) >
TO_DATE('04/01/2009','MM/DD/YYYY')
AND coi_electronics.gl_product_group(+)= 23 /* Electronics */
AND coi_electronics.customer_id IS NULL
Anti Joins
SELECT customer_id
, SUM(CASE WHEN gl_product_group = 14 THEN 1 ELSE 0 END) AS num_books
, SUM(CASE WHEN gl_product_group = 23 THEN 1 ELSE 0 END) AS num_electronics
FROM d_customer_order_items
WHERE legal_entity_id = 101
AND order_day >= TO_DATE('01/01/2009','MM/DD/YYYY')
AND gl_product_group IN (14,23) /* Books, Electronics */
GROUP BY customer_id
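The query above counts both product groups in one pass over the table; to turn it into the same anti join as the earlier versions, a HAVING clause can keep only customers with books and no electronics. A sketch under that assumption, with made-up rows in an inline coi data set:

```sql
-- Customer 1: books only. Customer 2: books and electronics. Customer 3: electronics only.
WITH coi AS (
    SELECT 1 AS customer_id, 14 AS gl_product_group FROM dual UNION ALL
    SELECT 1, 14 FROM dual UNION ALL
    SELECT 2, 14 FROM dual UNION ALL
    SELECT 2, 23 FROM dual UNION ALL
    SELECT 3, 23 FROM dual
)
SELECT customer_id
     , SUM(CASE WHEN gl_product_group = 14 THEN 1 ELSE 0 END) AS num_books
     , SUM(CASE WHEN gl_product_group = 23 THEN 1 ELSE 0 END) AS num_electronics
FROM   coi
WHERE  gl_product_group IN (14,23) /* Books, Electronics */
GROUP BY customer_id
HAVING SUM(CASE WHEN gl_product_group = 14 THEN 1 ELSE 0 END) > 0
AND    SUM(CASE WHEN gl_product_group = 23 THEN 1 ELSE 0 END) = 0;
-- Only customer 1 is returned (num_books = 2, num_electronics = 0).
```

The table is scanned once instead of twice, which is the point of this variant.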
Global Temp Tables
• Make your query unnecessarily complex and
difficult to debug
• Usually can be replaced by a UNION ALL or
grouping set
Grouping Sets
SELECT coi.order_day
, coi.gl_product_group
, coi.legal_entity_id
, SUM(quantity) AS num_items
, grouping_id(coi.order_day,coi.gl_product_group,
coi.legal_entity_id) AS grp_id
FROM d_customer_order_items coi
WHERE coi.order_day = TO_DATE('07/12/2009','MM/DD/YYYY')
AND coi.legal_entity_id IN (116,108)
AND coi.gl_product_group IN (14,15)
GROUP BY grouping sets (
(coi.order_day,coi.gl_product_group,coi.legal_entity_id),
(coi.order_day,coi.gl_product_group),
(coi.order_day)
)
;
Grouping Sets
ORDER_DAY  GL_PRODUCT_GROUP  COUNTRY_CODE  NUM_ITEMS  GRP_ID
1/12/2008  14                UK            100,000    0
1/12/2008  14                DE            75,000     0
1/12/2008  14                FR            60,000     0
1/12/2008  15                UK            50,000     0
1/12/2008  15                DE            40,000     0
1/12/2008  15                FR            30,000     0
1/12/2008  14                              235,000    1
1/12/2008  15                              120,000    1
1/12/2008                                  355,000    3
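The GRP_ID values 0, 1, and 3 come from treating each argument of GROUPING_ID as one bit, set to 1 when that column is rolled up in the row. A small sketch with two grouping columns and made-up quantities shows the same pattern:

```sql
WITH coi AS (
    SELECT 14 AS gl_product_group, 'UK' AS country_code, 100 AS quantity FROM dual UNION ALL
    SELECT 14, 'DE', 75 FROM dual UNION ALL
    SELECT 15, 'UK', 50 FROM dual UNION ALL
    SELECT 15, 'DE', 40 FROM dual
)
SELECT gl_product_group
     , country_code
     , SUM(quantity)                              AS num_items
     , GROUPING_ID(gl_product_group, country_code) AS grp_id
FROM   coi
GROUP BY GROUPING SETS (
         (gl_product_group, country_code),  -- detail rows:        bits 00 -> grp_id 0
         (gl_product_group),                -- per product group:  bits 01 -> grp_id 1
         ()                                 -- grand total:        bits 11 -> grp_id 3
       )
ORDER BY grp_id, gl_product_group, country_code;
-- grp_id 0: (14, DE, 75) (14, UK, 100) (15, DE, 40) (15, UK, 50)
-- grp_id 1: (14, NULL, 175) (15, NULL, 90)
-- grp_id 3: (NULL, NULL, 265)
```

Filtering on grp_id is a reliable way to tell a subtotal row from a detail row whose column happens to be NULL.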
The HAVING Clause
The HAVING Clause
SELECT coi.order_day
, coi.gl_product_group
, coi.country_code
, num_items
FROM (SELECT coi.order_day
, coi.gl_product_group
, coi.country_code
, SUM(quantity) AS num_items
FROM d_customer_order_items coi
WHERE ...
GROUP BY coi.order_day,coi.gl_product_group,coi.country_code
) coi
WHERE num_items >= 100
SELECT coi.order_day
, coi.gl_product_group
, coi.country_code
, SUM(quantity) AS num_items
FROM d_customer_order_items coi
WHERE ...
GROUP BY
coi.order_day,coi.gl_product_group,coi.country_code
HAVING SUM(quantity) >= 100
Tuning
• What is a query (execution) plan?
– A.K.A Execution plan
– The steps the database will take to access or modify data
Explained.
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------
Plan hash value: 3513048284
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop | TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 394M| 93G| 45747 | | | | | |
| 1 | PX COORDINATOR | | | | | | | | | |
| 2 | PX SEND QC (RANDOM)| :TQ10000 | 394M| 93G| 45747 | | | Q1,00 | P->S | QC (RAND) |
| 3 | PX BLOCK ITERATOR | | 394M| 93G| 45747 | 1 | 16 | Q1,00 | PCWC | |
| 4 | TABLE ACCESS FULL| D_CUSTOMERS | 394M| 93G| 45747 | 1 | 16 | Q1,00 | PCWP | |
-----------------------------------------------------------------------------------------------------------------
SQL>
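Output like the above is what SQL*Plus prints after an EXPLAIN PLAN command followed by a query against the plan table. A minimal sketch, assuming the d_customers table from the plan above is accessible and using the standard DBMS_XPLAN package:

```sql
-- Step 1: ask Oracle to plan the query without running it.
-- SQL*Plus answers "Explained."
EXPLAIN PLAN FOR
SELECT *
FROM   d_customers;

-- Step 2: format and display the plan that was just stored in the plan table.
SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY);
```

Nothing is executed against the data in step 1, so this is safe to run even for a query that would take hours.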
ETL Manager Explain Plan
VDBA Explain Plan
SQL Developer Explain Plan
Tuning
Reading a Query Plan
• Operation order
• Access methods (and pstart/pstop)
• Join methods
• Cost
• Cardinality
Query Plans
Operation Order
[Figure: query plan with its operations numbered in execution order]
Query Plans
Access Methods
Index lookups:
– INDEX Full Scan
– INDEX Fast Full Scan
– INDEX Range Scan
– INDEX Unique Scan
Table Access:
– Table Access Full
– Table Access By RowID (Range)
Query Plans
Join Methods
Join Types:
Hash Join (Left/Right/Outer)
Nested Loops (Left/Right/Outer)
Merge Join (Outer/Cartesian)
Query Plans
All that PX junk
Partition (Range/Hash):
– All
– Single
– Iterator
– Inlist
PX Send:
– Broadcast
– Partition (Key)
– QC (Random)
PX Receive
Query Plans
In/Out
• QC (Random) – Data is sent to the parent in any order it is received; always the last step of a SELECT
• Broadcast – All data sent to all workers
Pstart/Pstop
• N..NN – Actual partition numbers being scanned
• Part (Key) – Partitions will be identified at execution time
Tuning
Plan Cost – Why it rarely matters
Based on its knowledge of the tables being joined and the
predicates (WHERE clauses) applied, Oracle’s optimizer
comes up with what it estimates is the most efficient way to
execute your SQL.
The problem is that Oracle gets much of its data from indexes,
and the DW has almost no indexes. Oracle uses indexes to
calculate how many rows it will get from each data source
(table) and how many distinct values are in the keys you
are using to join. It gets the rest of the data when the table
is analyzed (which is less accurate and less complete).
Without indexes, Oracle makes a guess, which is often
wrong. A very common bad guess is for Oracle to think it
will get fewer rows from a table than it really will.
Query Plans
Cost – Why it rarely matters
SELECT *
FROM d_customer_order_items
WHERE legal_entity_id = 101
AND order_day = TO_DATE('20061201','YYYYMMDD')
Oracle's Guess: 1.05M - Real Number: 1.3M
SELECT *
FROM d_customer_order_items
WHERE legal_entity_id = 101
AND order_day = TO_DATE('20061201','YYYYMMDD')
AND condition = 4
Oracle's Guess: 86K - Real Number: 1M
SELECT *
FROM d_customer_order_items
WHERE legal_entity_id = 101
AND order_day = TO_DATE('20061201','YYYYMMDD')
AND condition = 4
AND quantity = 1
Oracle's Guess: 1 - Real Number: 995K
In each subsequent query, I only filtered out a little data but Oracle's guess got "wronger and wronger"...
This is also why use_hash hints are key. Even if Oracle's guess is completely wrong it will (mostly) listen to you
and join the tables correctly.
Assessing the Query Plan
What a good plan often looks like
Tuning
Assessing the Query Plan
• Behind the curtain
• Partitions vs. indexes
• Partition elimination
• Best join type for data size
• Join order
• Narrowing down data early in execution
• Merge Join Cartesian
• Missing (or hitting) an index
• Serialization
Assessing the Query Plan
Behind the Curtain – DW4
• 5 HP XP24000 Arrays
– 2200 300GB drives (440 drives per array)
– 327 TB total storage
• 16 HP DL580 G5
– 4 2.4GHz quad cores
– 64GB RAM
Assessing the Query Plan
Behind the Curtain
• Throughput
– How big is the dump truck?
• IOPS (Input/Output Operations Per Second)
– How many round trips can the dump truck make?
• Parallelism
Assessing the Query Plan
Partitions vs. Indexes
• Partition – Floor of the building
• Index – Cube number
• Why does the DW not have indexes?
Assessing the Query Plan
• A Full Table Scan (FTS) rarely actually scans the full table
• Indexes tend to do small (often single-row) lookups
– The Amazon DW is built for FTS speed, not IOPS speed
– Only an INDEX FAST FULL SCAN does multi-block reads.
• Most DW tables are partitioned how they are queried
– Partition columns available in the Datanet table
explorer:
http://datanet.amazon.com/dw-platform/servlet/dwp/template/DWPExploreStructureWarehouseLevel.vm
Assessing the Query Plan
• Join type given row count
– Hash Join
• Almost always the best method
• Almost never hurts performance significantly even if both datasets
are small
– Nested Loops Join
• Works if one dataset is small
• Must have correct information on dataset row counts to join in
the correct order
– Merge Join
• Somewhat rare in analytic SQL
• Evil cousin is the merge join Cartesian
Assessing the Query Plan
Joins
– Hash Join
• Build a hash table of the keys in the smaller data set
• Probe (scan) the larger data set, checking each key against the
hash table
• HJ scales (approximately) linearly with data size
– Nested Loops Join
• For each row in the larger data set, scan the entire smaller data
set
• NL scales (approximately) with the product of the two data set sizes
– Merge Join
• Sort both data sets and “merge” them
• The merge step scales linearly, but the sorts scale as n log n
Assessing the Query Plan
• Join order
– Smallest dataset first
– The Optimizer’s row count (cardinality) information must
be correct
• Narrow it down early
– The entire table size isn’t nearly as important as the size
after the filters are applied
– Sometimes the biggest table can be joined first
Assessing the Query Plan
• Partition elimination
– Don’t scan rows you don’t need
– Always add a filter (predicate/WHERE clause) on all
partition columns if at all possible
– Don’t use bind variables, sysdate, or join keys to filter
partition columns
Assessing the Query Plan
Partition Elimination
SELECT *
FROM d_customer_order_items
WHERE legal_entity_id = 101
AND order_day = TO_DATE('04/01/2009','MM/DD/YYYY') /* Partition elimination: Yes */
SELECT *
FROM d_customer_order_items
WHERE legal_entity_id = 101
AND order_day = '4/1/2009' /* Partition elimination: No */
SELECT *
FROM d_customer_order_items
WHERE legal_entity_id = 101
AND order_day = TRUNC(SYSDATE-10) /* Partition elimination: No */
SELECT *
FROM d_customer_order_items
WHERE legal_entity_id = 101
AND order_day = :B1 /* Partition elimination: No */
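One way to check whether a predicate gets partition elimination is to look at the Pstart and Pstop columns of the plan. The sketch below uses a small throwaway table; demo_order_items, its columns, and its partition names are all hypothetical, and creating it assumes you have a sandbox schema with the partitioning option available.

```sql
-- Hypothetical demo table, range-partitioned on the column we filter by.
CREATE TABLE demo_order_items (
    order_id   NUMBER,
    order_day  DATE,
    quantity   NUMBER
)
PARTITION BY RANGE (order_day) (
    PARTITION p2009q1 VALUES LESS THAN (TO_DATE('04/01/2009','MM/DD/YYYY')),
    PARTITION p2009q2 VALUES LESS THAN (TO_DATE('07/01/2009','MM/DD/YYYY')),
    PARTITION pmax    VALUES LESS THAN (MAXVALUE)
);

-- Literal date wrapped in TO_DATE: the partition is known when the query is planned.
EXPLAIN PLAN FOR
SELECT *
FROM   demo_order_items
WHERE  order_day = TO_DATE('04/01/2009','MM/DD/YYYY');

SELECT *
FROM   TABLE(DBMS_XPLAN.DISPLAY);
-- With full elimination, Pstart and Pstop should both show a single
-- partition number (here the second partition) rather than 1 .. 3.

-- Clean up the demo table.
DROP TABLE demo_order_items;
```

Repeating the EXPLAIN PLAN step with each of the other predicate styles and comparing Pstart/Pstop is a quick way to confirm what the optimizer can and cannot resolve up front.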
Assessing the Query Plan
Partition Elimination
SELECT *
FROM d_customer_order_items coi
, d_customer_shipment_items csi
WHERE coi.order_day = csi.order_day /* Better, but not great */
AND coi.legal_entity_id = 101
AND csi.legal_entity_id = 101
AND csi.ship_day = TO_DATE('{RUN_DATE_YYYYMMDD}','YYYYMMDD')
Assessing the Query Plan
Partition Elimination
Hard-coded:
SELECT *
FROM d_inventory_costs
WHERE warehouse_id = 'RNO1'
AND snapshot_day = TO_DATE('04/06/2009','MM/DD/YYYY')
AND region_id = 1 /* NA */
Tuning
Modifying a Query Plan
• Hints
• Hint syntax
• Commonly used hints
– Use_hash
– Use_nl
– Index
– No_index
– Ordered
– Cardinality
Modifying a Query Plan
Hints
• A hint (also called an optimizer hint) asks
Oracle to change the steps it will perform to
execute the query
• Hints are sometimes ignored by Oracle
– Incorrect syntax
– Logically impossible
– Oracle just doesn’t want to behave
Modifying a Query Plan
Hint Syntax
SELECT /*+ hint1 hint2 hint3 */
FROM…
Modifying a Query Plan
Common Hints
• Use_hash (table_alias)
• Use_nl (table_alias list)
• Index (table_alias index_name)
• No_index (table_alias index_name)
• Ordered
• Cardinality (table_alias rows)
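As an illustration of the syntax, here is a sketch that combines two of the hints listed above. It reuses the d_promotions and d_customer_order_items tables from the outer join slides; the choice of hints and the query itself are assumptions for illustration, not a recommended plan for those tables.

```sql
-- ordered:       join the tables in the order they appear in the FROM clause
-- use_hash(coi): use a hash join when coi is joined to what came before it
SELECT /*+ ordered use_hash(coi) */
       promo.promo_id
     , COUNT(*) AS num_items
FROM   d_promotions promo            -- smaller data set listed first
     , d_customer_order_items coi
WHERE  coi.promo_id = promo.promo_id
AND    coi.legal_entity_id = 101
AND    coi.order_day = TO_DATE('04/01/2009','MM/DD/YYYY')
GROUP BY promo.promo_id;
```

The hint must reference the table alias (coi), not the table name, once an alias is defined. A misspelled hint is silently treated as an ordinary comment, so it is worth confirming in the query plan that the hint took effect.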
Modifying a Query Plan
Common Hints
use_hash (table_alias list)
Modifying a Query Plan
Common Hints
no_index (table_alias index_name)
Modifying a Query Plan
Common Hints
ordered
Modifying a Query Plan
Common Hints
cardinality (table_alias rows)
Remember me?
SELECT order_item.customer_id AS customer_id
, order_item.order_id AS order_id
, TO_CHAR(order_item.order_datetime, 'yyyy-mm-dd hh24:mi') AS order_date
, order_item.ASIN AS order_asin
, order_item.our_price AS item_price
, order_item.customer_order_item_id AS order_item_id
, ship_item.customer_shipment_item_id AS shipment_item_id
, ship_item.marketplace_id
FROM d_customer_order_items order_item
, d_customer_shipment_items ship_item
WHERE order_item.customer_order_item_id = ship_item.customer_order_item_id
AND order_item.legal_entity_id =
(CASE WHEN order_item.marketplace_id = '4861' THEN 101
WHEN order_item.marketplace_id = '41092' THEN 109
END
)
AND order_item.marketplace_id = '41092'
AND order_item.order_datetime >= SYSDATE - 7
AND order_item.order_datetime < SYSDATE - 1
ORDER BY 1 ASC, 5 DESC
Things to Remember
• Don’t scan the same data twice
• You’re smarter than the computer
• Narrow it down early
• Always get full partition elimination
• Use aggregates and rollup tables when
possible
Fin
• Questions