Best Practices For Query Performance in A Data Warehouse: Calisto Zuzarte

The document provides best practices for improving query performance in a data warehouse. It discusses database design considerations like parallelism, partitioning, schema design, and compression. It also covers application design best practices and performance layer techniques including indexes, statistics, constraints, and materialized views. Ongoing tuning considerations are also highlighted to keep operations optimized as data and requirements change over time.


Best Practices for Query

Performance In a Data Warehouse

Calisto Zuzarte
IBM
[email protected]
Session Code: D09
May 13, 2010, 8:30 AM - 9:30 AM
Platform: Linux, Unix and Windows

Data Warehouse Life Cycle

Database design / Application design

Application architects and database administrators work together to design the queries and schema before the application goes into production.

Database performance layer implementation

To meet SLAs, DBAs usually go through several iterations, augmenting the database with performance layer objects and setting up the initial configuration to get good performance.

Database tuning operations

During production, changing requirements and changing data call for ongoing tuning to keep operations smooth.

Motivation
Data warehouse environment characteristics:
Large volumes of data
Millions/Billions of rows involved in some tables
Large amounts of data rolled-in and rolled-out

Complex queries

Large Joins
Large Sorts
Large amounts of Aggregations
Many tables involved

Ad Hoc Queries

It is important to pay attention to query performance

Objectives
Provide recommendations so that you can improve data
warehouse query performance

Database Design considerations

Application Design considerations
Performance Layer Considerations
Ongoing Tuning Considerations

Agenda

Best Practices Database Design

Best Practices Application Design
Best Practices Performance Layer
Best Practices Configuration and Operations

Best Practices Database Design

Best Practices - Parallelism
Inter-partition Shared nothing parallelism
Intra-Query Parallelism (SMP)

Best Practices - Partitioning

Database Partitioning
Table Partitioning
Multi-Dimension Clustering
UNION ALL Views

Best Practices Schema

Best Practices - Compression

Best Practices - Parallelism

Database partition feature (DPF) is recommended
To achieve parallelism in a data warehouse
For scalability and query performance

SMP (Intra-Query Parallelism) not recommended

In concurrent multi-user environments with heavy CPU usage

SMP recommended
When CPUs are highly underutilized
When DPF is not an option

Partitioning (Complementary Strategies in DB2)

Database Partitioning (DPF): DISTRIBUTE BY HASH
Key Benefit: Better scalability and performance through parallelism

Multidimensional Clustering (MDC): ORGANIZE BY DIMENSIONS
Key Benefit: Better query performance through data clustering

Table (Range) Partitioning: PARTITION BY RANGE
Key Benefit: Better data management (roll-in and roll-out of data)

UNION ALL Views
Key Benefit: Independent branch optimization

Divide and Conquer! Distribute, Partition, Organize!

[Figure: the same data divided three ways, labeled DistributeBy, PartitionBy, and OrganizeBy]

Best Practices Database Partitioning

Collocate the fact table and the largest dimension table
Choose a distribution key that avoids significant skew across database partitions
Avoid a DATE distribution key, where active transactions for the
current date all fall on one database partition
(TIMESTAMP is good)
Possibilities for workload isolation for data marts
Different partition groups but common dimension tables
Needs replicated tables (discussed later)
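
As a rough illustration of collocation, the sketch below distributes a fact table and its largest dimension on the same key in the same table space, so the join between them can be resolved locally on each database partition. The table space name ts_dw, the table names, and the choice of product_id as the distribution key are assumptions for illustration only; ts_dw is assumed to already exist in a multi-partition database partition group.

```sql
-- Hypothetical dimension table, hashed on its primary key.
CREATE TABLE product (
    product_id  INTEGER     NOT NULL PRIMARY KEY,
    class_id    INTEGER     NOT NULL,
    pname       VARCHAR(60) NOT NULL
) IN ts_dw
  DISTRIBUTE BY HASH (product_id);

-- Hypothetical fact table, hashed on the same column and data type
-- in the same table space, so matching rows land on the same partition.
CREATE TABLE sales (
    product_id  INTEGER       NOT NULL,
    store_id    INTEGER       NOT NULL,
    date_id     DATE          NOT NULL,
    amount      DECIMAL(12,2) NOT NULL
) IN ts_dw
  DISTRIBUTE BY HASH (product_id);

-- A join on the distribution key can then be collocated,
-- with no rows shipped between partitions for the join itself.
SELECT p.class_id, SUM(s.amount) AS total_amount
FROM sales s
JOIN product p ON p.product_id = s.product_id
GROUP BY p.class_id;
```

Whether product_id is a good key in practice depends on its skew; the same pattern applies to whichever dimension key is chosen.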

Best Practices Table Partitioning

Recommend partitioning the fact tables
Typically based on DATE dimension
Works better with application key predicates applied directly

Table or Range Partitioning

Recommend table or range partitioning (V9.7: partitioned indexes)
Choose partitioning based on roll-in / roll-out granularity

UNION ALL Views

Each branch optimized independently

Use with well designed applications (Dangers of materialization)
Large number of branches require time and memory to optimize
Needs predicates with constants for branch elimination
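
A minimal sketch of a UNION ALL view follows, assuming two hypothetical yearly tables. The check constraints describe which dates each branch can hold, which is the kind of information the optimizer can combine with a constant predicate to skip a branch. All table, view, and constraint names are illustrative.

```sql
-- One table per year; the CHECK constraint documents the range of each branch.
CREATE TABLE sales_2009 (
    date_id  DATE          NOT NULL,
    amount   DECIMAL(12,2) NOT NULL,
    CONSTRAINT chk_sales_2009
        CHECK (date_id BETWEEN '2009-01-01' AND '2009-12-31')
);

CREATE TABLE sales_2010 (
    date_id  DATE          NOT NULL,
    amount   DECIMAL(12,2) NOT NULL,
    CONSTRAINT chk_sales_2010
        CHECK (date_id BETWEEN '2010-01-01' AND '2010-12-31')
);

CREATE VIEW sales_all AS
    SELECT date_id, amount FROM sales_2009
    UNION ALL
    SELECT date_id, amount FROM sales_2010;

-- The predicate uses constants, so the 2009 branch is a candidate
-- for elimination; only the 2010 branch needs to be scanned.
SELECT SUM(amount) AS march_total
FROM sales_all
WHERE date_id BETWEEN '2010-03-01' AND '2010-03-31';
```

If the date range were supplied through a parameter marker or an expression instead of constants, branch elimination at compile time would be less likely, which is the point of the last bullet above.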

Best Practices - Multidimensional Clustering (MDC)

Recommend defining MDC on the fact table
Guaranteed clustering (avoids the need to REORG for clustering)
I/O optimization
Compact block indexes (coexist with regular indexes)

Choose dimensions based on query predicates

Recommend the use of 1 to 4 dimensions
Need to ensure dimensions are chosen such that they do not waste
storage

Could choose a finer granularity of Table partitioning range

For example: Table partition range by month, MDC by date
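
The example on this slide (range partition by month, MDC by date) might look roughly like the sketch below. The table and column names are assumptions, and a single MDC dimension is used here only to keep the sketch small; the dimensions for a real fact table should follow its query predicates and storage constraints.

```sql
-- Hypothetical fact table: one range partition per month for roll-in/roll-out,
-- clustered by day within each partition through MDC.
CREATE TABLE sales (
    date_id     DATE          NOT NULL,
    product_id  INTEGER       NOT NULL,
    store_id    INTEGER       NOT NULL,
    amount      DECIMAL(12,2) NOT NULL,
    quantity    INTEGER       NOT NULL
)
PARTITION BY RANGE (date_id)
    (STARTING '2010-01-01' ENDING '2010-12-31' EVERY 1 MONTH)
ORGANIZE BY DIMENSIONS (date_id);
```

With this layout a whole month can be detached as a unit, while a query on a few days only touches the blocks for those days.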

Star Schema

SALES (fact): Product_id, Store_id, Channel_id, Date_id, Amount, Quantity
PRODUCT: Product_id, Class_id, Group_id, Family_id, Line_id, Division_id
STORE: Store_id, Region_id
TIME: Date_id, Month_id, Quarter_id, Year_id
CHANNEL: Channel_id

Dimension Hierarchy

Product Dimension: Product (Level 0), Class (Level 1), Group (Level 2), Family (Level 3), Line (Level 4), Division (Level 5)
Time Dimension: Date, Month, Quarter, Year
Store Dimension: Store, Retailer
Channel Dimension: Channel

[Figure: each dimension hierarchy shown above the Sales Fact]

Best Practices - Schema

Surrogate Keys
As far as possible use application keys themselves
allows predicates to be applied/transferred directly on the fact table
DATE is a good candidate (easier to roll-in/roll-out and for MDC )

Star Schema / Snowflakes

Separate tables for each dimension hierarchy (snowflake) may result in a large
number of joins
Flattened dimensions may contain a lot of redundancy (space)

Define Columns NOT NULL when appropriate

Many optimizations depend on columns being defined NOT NULL

Define Uniqueness when appropriate

Primary Keys / Unique Constraints / Unique Indexes

Compression
Table, Index and Temp Table compression
Huge benefits with storage savings
With table and TEMP compression 30-70%
With Index compression 30-40%

Performance gains because

Less I/O and better use of bufferpools

TEMP table compression helps operators like Hash Join, Merge Join, Sorts and Table Queues if they spill

Best Practices - Compression

Consider compression particularly with the fact table
Strongly recommend compression on the fact table when
not CPU bound
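
A small sketch of enabling row and index compression is shown below. The schema dba and the object names are assumptions; the REORG line is a DB2 command, run from the command line processor, that rebuilds the compression dictionary for a table that already holds data.

```sql
-- Hypothetical fact table created with row compression enabled.
CREATE TABLE dba.sales_c (
    date_id   DATE          NOT NULL,
    store_id  INTEGER       NOT NULL,
    amount    DECIMAL(12,2) NOT NULL
) COMPRESS YES;

-- Index compression (available from V9.7) is requested per index.
CREATE INDEX dba.ix_sales_c_date
    ON dba.sales_c (date_id)
    COMPRESS YES;

-- DB2 command: after data has been loaded, rebuild the dictionary
-- so that existing rows are compressed as well.
REORG TABLE dba.sales_c RESETDICTIONARY;
```

Actual savings depend on the data; the percentages quoted on the previous slide are the presenter's figures, not something this sketch demonstrates.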

Agenda

Best Practices Database Design

Best Practices Application Design
Best Practices Performance Layer
Best Practices Configuration and Operations

Best Practices Application Considerations

Use constants instead of expressions in the query
Example
WHERE DateCol <= CURRENT DATE - 5 DAYS
Use VALUES (CURRENT DATE - 5 DAYS) to get the resulting constant first
and use it in the query

Avoid expressions on indexed columns

Example
Avoid: WHERE DATECOL - 2 DAYS > '2009-10-22'
Prefer: WHERE DATECOL > DATE('2009-10-22') + 2 DAYS
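
The two tips on this slide can be sketched as follows, assuming a hypothetical SALES table with a DATE column date_id. The literal date in step 2 is only a stand-in for whatever value step 1 returns on the day it runs.

```sql
-- Minimal hypothetical table for the sketch.
CREATE TABLE sales (
    date_id  DATE          NOT NULL,
    amount   DECIMAL(12,2) NOT NULL
);

-- Step 1: compute the constant once, outside the main query.
VALUES (CURRENT DATE - 5 DAYS);

-- Step 2: the application substitutes the returned value as a literal,
-- so the optimizer can compare it against distribution statistics.
SELECT COUNT(*) AS recent_rows
FROM sales
WHERE date_id <= '2010-05-08';

-- Keep the column bare and move the arithmetic to the constant side,
-- so an index on date_id remains usable for the range predicate.
SELECT COUNT(*) AS later_rows
FROM sales
WHERE date_id > DATE('2009-10-22') + 2 DAYS;
```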

Best Practices Application Considerations

Avoid mixing data types in join predicates
Example
WHERE IntegerCol = DecimalCol

Use Global Temporary Tables to split a query if it contains more than 10-15 tables
Reduces optimization time
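
As a sketch of splitting a large query, the example below stages one part of a join in a declared global temporary table and finishes the query against it. Only three small hypothetical tables are used to keep it short; the technique pays off when the original query has many more. A user temporary table space is assumed to exist.

```sql
-- Hypothetical star schema tables.
CREATE TABLE product (product_id INTEGER NOT NULL PRIMARY KEY, class_id  INTEGER NOT NULL);
CREATE TABLE store   (store_id   INTEGER NOT NULL PRIMARY KEY, region_id INTEGER NOT NULL);
CREATE TABLE sales (
    product_id  INTEGER       NOT NULL,
    store_id    INTEGER       NOT NULL,
    date_id     DATE          NOT NULL,
    amount      DECIMAL(12,2) NOT NULL
);

-- Stage 1: declare a temporary table shaped like the intermediate result.
DECLARE GLOBAL TEMPORARY TABLE session.sales_by_class AS (
    SELECT p.class_id, s.store_id, s.date_id, s.amount
    FROM sales s
    JOIN product p ON p.product_id = s.product_id
) DEFINITION ONLY
  ON COMMIT PRESERVE ROWS
  NOT LOGGED;

-- Populate it with the first part of the original query.
INSERT INTO session.sales_by_class
    SELECT p.class_id, s.store_id, s.date_id, s.amount
    FROM sales s
    JOIN product p ON p.product_id = s.product_id
    WHERE s.date_id BETWEEN '2010-01-01' AND '2010-03-31';

-- Stage 2: the remaining joins run against the much smaller staged result.
SELECT st.region_id, t.class_id, SUM(t.amount) AS total_amount
FROM session.sales_by_class t
JOIN store st ON st.store_id = t.store_id
GROUP BY st.region_id, t.class_id;
```

Each stage is optimized on its own, so the optimizer never has to search the join orders of the full set of tables at once.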

Agenda

Best Practices Database Design

Best Practices Application Design
Best Practices Performance Layer
Best Practices Configuration and Operations

Best Practices Performance Layer

Indexes
Statistics
Distribution Statistics
Column Group Statistics
Statistical Views

Constraints
Referential Integrity

Materialized Query Tables

Replicated Tables

Indexes
Indexes are a vertical subset of the data in the table
Indexes provide ORDER
Indexes may allow for clustered access to the table

Index Considerations
To get Index Only Access instead of more expensive ISCANFETCH or TSCAN (Table Scan)
To avoid SORTs particularly those that spill
To promote index-ORing and index-ANDing
To promote Star Joins
When you have range join predicates
Better possibilities with Nested Loop Join

Indexes for clustering (MDC)
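
One way to aim for index-only access is to build an index that carries every column a frequent query needs. The sketch below assumes a hypothetical SALES table and query; whether the optimizer actually chooses index-only access depends on statistics and costs.

```sql
-- Hypothetical fact table.
CREATE TABLE sales (
    date_id   DATE          NOT NULL,
    store_id  INTEGER       NOT NULL,
    amount    DECIMAL(12,2) NOT NULL,
    quantity  INTEGER       NOT NULL
);

-- The index covers the predicate column, the grouping column,
-- and the aggregated column of the query below.
CREATE INDEX ix_sales_date_store_amt
    ON sales (date_id, store_id, amount);

-- All referenced columns are in the index, so the table itself
-- does not have to be fetched to answer this query.
SELECT store_id, SUM(amount) AS total_amount
FROM sales
WHERE date_id BETWEEN '2010-01-01' AND '2010-01-31'
GROUP BY store_id;
```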

Cardinality Estimation
Estimating the size of intermediate results is critical to getting
good query execution plans
Without sufficient information, the optimizer can only guess
based on some assumptions
Data skew and statistical correlation between multiple
column values introduce uncertainty
Pay attention to DATE columns

Best Practices - Statistics

Collect distribution Statistics when there is skew and
predicates use constants
Consider a high number of quantile statistics on columns
with DATE range predicates and character string columns
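
A hedged sketch of collecting distribution statistics with a larger number of quantiles on a date column is shown below. The table dba.sales and the value 100 are assumptions for illustration; a suitable number depends on how the date values are spread.

```sql
-- DB2 command, run from the command line processor while connected.
-- Basic statistics are collected for the table, and distribution
-- statistics with 100 quantiles are collected for date_id.
RUNSTATS ON TABLE dba.sales
    WITH DISTRIBUTION ON COLUMNS (date_id NUM_QUANTILES 100);
```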

Column Group Statistics

Country   City       Hotel Name
Germany   Bremen     Hilton
Germany   Bremen     Best Western
Germany   Frankfurt  InterCity
Germany   Frankfurt  Shangri-La
Canada    Toronto    Four Seasons
Canada    Toronto    Intercontinental

Example: COUNTRY = 'Germany' AND CITY = 'Frankfurt'

No CGS: Selectivity = 1/2 * 1/3 = 1/6, Estimate 1 row

With CGS: Selectivity = 1/3, Estimate 2 rows

Problem Scenario - Skew

CUST Table: 100 rows, 100 custids (CNAME values ABC, DEF, GHI, IBM, JKL, MNO, PQR, ..., XYZ)

SALES Table: 10,000,000 rows

Frequency Statistics on SALES.CUSTID (# of rows for the most frequent values):
2,000,000
700,000
500,000
300,000
100,000
50,000
20,000
...

SELECT * FROM SALES, CUST
WHERE CUST.CNAME = 'IBM' AND CUST.CUSTID = SALES.CUSTID

Cardinality Estimate with Uniformity = 100,000 (10,000,000 rows / 100 custids)
Actual Cardinality: 2,000,000

Best Practices - Statistics

Collect Column Group Statistics with multiple predicates on the
same table
WHERE Country = 'CANADA' AND City = 'TORONTO'
RUNSTATS ON TABLE <table> ON ALL COLUMNS AND COLUMNS ((country, city))

Consider Statistical Views when

There is skew on the join column
There is a significant difference in the range of values in the fact and the
dimension
CREATE VIEW dba.cust_fact AS
(SELECT C.* FROM CUST C, FACT F WHERE C.CUST_ID = F.CUST_ID)
ALTER VIEW dba.cust_fact ENABLE QUERY OPTIMIZATION
RUNSTATS ON TABLE dba.cust_fact WITH DISTRIBUTION

Referential Integrity (RI)

Facilitates aggregation push down
Example in the appendix section

Eliminates redundant joins in views

RI helps determine that queries that do not require data from a
primary key table need not do that join even if it is in the view

Helps with Materialized Query Table matching

Allows Queries to match MQTs with more dimension table joins
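
The join elimination described above can be sketched as follows. The tables, the view, and the use of an informational (NOT ENFORCED) constraint are assumptions for illustration; an informational constraint is only safe when the load process guarantees that every fact row has a matching dimension row.

```sql
-- Hypothetical dimension with a primary key.
CREATE TABLE store (
    store_id   INTEGER NOT NULL PRIMARY KEY,
    region_id  INTEGER NOT NULL
);

-- Hypothetical fact table. The foreign key is declared for the optimizer
-- but not checked by the database on insert or update.
CREATE TABLE sales (
    store_id  INTEGER       NOT NULL,
    date_id   DATE          NOT NULL,
    amount    DECIMAL(12,2) NOT NULL,
    CONSTRAINT fk_sales_store FOREIGN KEY (store_id)
        REFERENCES store (store_id)
        NOT ENFORCED ENABLE QUERY OPTIMIZATION
);

-- A reporting view that always joins fact and dimension.
CREATE VIEW sales_v AS
    SELECT s.date_id, s.amount, st.region_id
    FROM sales s
    JOIN store st ON st.store_id = s.store_id;

-- No STORE column is referenced here. With the RI constraint in place,
-- the join to STORE cannot add or remove rows, so it can be dropped.
SELECT date_id, SUM(amount) AS total_amount
FROM sales_v
GROUP BY date_id;
```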

Consider Materialized Query Tables

[Figure: query graphs for Joe's, Sue's, and Bob's queries, each a GROUP BY (GB) over joins of Fact, Dim1, and Dim2, shown alongside a single MQT built over the same Fact, Dim1, and Dim2 joins]

Best Practices - Defining Materialized Query Tables
What MQTs should I define ?
Estimate the size of the candidate MQTs by executing COUNT
queries against base tables.
Try to achieve at least a 10X reduction in size between fact and
the MQT
Build MQTs with a reasonable number of GROUP BY columns (3
to 6 dimension keys) at a time based on query patterns

As far as possible build the MQT from the fact table alone

Use Table Partitioning for the fact table and the MQTs
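
The sizing and definition steps above might be sketched like this, assuming a hypothetical SALES fact table and an MQT grouped on two dimension keys. The names and the choice of grouping columns are illustrative only.

```sql
-- Hypothetical fact table.
CREATE TABLE sales (
    date_id   DATE          NOT NULL,
    store_id  INTEGER       NOT NULL,
    amount    DECIMAL(12,2) NOT NULL
);

-- Estimate the reduction: rows in the fact table versus
-- rows the candidate MQT would contain.
SELECT COUNT(*) AS fact_rows FROM sales;

SELECT COUNT(*) AS mqt_rows
FROM (SELECT date_id, store_id
      FROM sales
      GROUP BY date_id, store_id) AS g;

-- If the ratio is around 10X or better, define the MQT
-- from the fact table alone.
CREATE TABLE sales_by_day_store AS (
    SELECT date_id,
           store_id,
           SUM(amount)  AS total_amount,
           COUNT_BIG(*) AS row_count
    FROM sales
    GROUP BY date_id, store_id
)
DATA INITIALLY DEFERRED REFRESH DEFERRED;

-- Populate the MQT, then allow deferred MQTs to be considered for routing.
REFRESH TABLE sales_by_day_store;
SET CURRENT REFRESH AGE ANY;
```

Statistics on the MQT should be collected after the refresh so the optimizer can cost the rewritten query.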

Best Practices - MQT Matching

Define Referential Integrity to help with matching MQTs
that contain more tables than the queries
Define Functional Dependencies for thinner MQTs
Use COUNT_BIG instead of COUNT for DPF MQTs
Define indexes on MQTs
Keep statistics up-to-date
Define base table columns NOT NULL as far as possible
For example we can match SUM(A + B) with SUM(A) + SUM(B)

Best Practices MQT Maintenance

REFRESH IMMEDIATE
Create an index on the GROUP BY columns
Create the index on the set of columns that form a unique key
Always keep the base table and MQT statistics up-to-date

REFRESH DEFERRED
If log space is an issue, consider NOT LOGGED INITIALLY or LOAD from
cursor
An MQT can be temporarily toggled into a regular table by using
ALTER TABLE DROP MATERIALIZED QUERY
ALTER TABLE ADD MATERIALIZED QUERY

Use ATTACH / DETACH if fact table and MQT are range partitioned tables
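
The toggle mentioned above can be sketched as follows for a hypothetical deferred MQT named sales_by_day_store over a hypothetical SALES table. The query in the ADD step has to match the columns already in the table, and the final statement assumes the data was maintained correctly by hand while the table was a regular table.

```sql
-- Turn the MQT into a regular table; its data is kept.
ALTER TABLE sales_by_day_store DROP MATERIALIZED QUERY;

-- ... maintain the table directly here, for example with INSERT or LOAD ...

-- Turn it back into an MQT with the same defining query.
ALTER TABLE sales_by_day_store ADD MATERIALIZED QUERY (
    SELECT date_id,
           store_id,
           SUM(amount)  AS total_amount,
           COUNT_BIG(*) AS row_count
    FROM sales
    GROUP BY date_id, store_id
)
DATA INITIALLY DEFERRED REFRESH DEFERRED;

-- Accept the existing contents without recomputing them.
SET INTEGRITY FOR sales_by_day_store MATERIALIZED QUERY IMMEDIATE UNCHECKED;
```

Because UNCHECKED skips verification, any mistake in the manual maintenance step will surface as wrong query results once queries are routed to the MQT.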

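Replicated Tables

[Figure: without replication, each database partition joins its SALES rows to CUST received through a broadcast table queue (BTQ); with replication, each partition joins SALES to a local CUST COPY]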

Replicate dimension tables (unless collocated with fact )

Benefit : Avoids data movement
Important : Define suitable indexes
If too large, replicate a subset of frequently used columns
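
A replicated table is defined as a materialized query table with the REPLICATED keyword. The sketch below copies a subset of columns from a hypothetical CUST dimension; the table space ts_dw is an assumption and is taken to span the same database partitions as the fact table.

```sql
-- Hypothetical dimension table.
CREATE TABLE cust (
    cust_id  INTEGER      NOT NULL PRIMARY KEY,
    cname    VARCHAR(60)  NOT NULL,
    address  VARCHAR(200)
);

-- A full copy of the frequently used columns on every database partition.
CREATE TABLE cust_copy AS (
    SELECT cust_id, cname
    FROM cust
)
DATA INITIALLY DEFERRED REFRESH DEFERRED
IN ts_dw
REPLICATED;

-- Populate the copy and give it an index that suits the join.
REFRESH TABLE cust_copy;
CREATE INDEX ix_cust_copy_id ON cust_copy (cust_id);
```

Queries keep referring to CUST; the optimizer can substitute the local copy when that avoids broadcasting the dimension.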

Agenda

Best Practices Database Design

Best Practices Application Design
Best Practices Performance Layer
Best Practices Configuration and Operations

Best Practices Configuration

Optimization Level 5
Registry Variables
DB2_ANTIJOIN=EXTEND
If slow queries have NOT EXISTS, NOT IN predicates

DB2_REDUCED_OPTIMIZATION=YES
If compile time is an issue

Configuration rules of thumb

BUFFPOOL ~= SHEAPTHRES
SORTHEAP ~= SHEAPTHRES/(# of concurrent SORT, HSJN)

Best Practices - Statistics

The DB2 Query Optimizer relies on reasonably accurate statistics to
get good query plans
User runs RUNSTATS when data changes (part of ETL)
Statistics Fabrication (unreliable)
DB2 keeps UPDATE / DELETE / INSERT counters
Fabrication is limited to a few statistics, which is not enough

Consider configuring Automatic Statistics

Automatically collects statistics on tables in need
Runs in the background as a low priority job

Consider configuring Real Time Statistics

Collects statistics on-the-fly
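
Automatic and real-time statistics are switched on through database configuration parameters. The commands below are a sketch of one way to do this from the command line processor while connected to the database; whether they suit a given system depends on its maintenance windows and workload.

```sql
-- DB2 commands. Automatic statistics collection sits under the
-- automatic maintenance and automatic table maintenance switches.
UPDATE DB CFG USING AUTO_MAINT ON AUTO_TBL_MAINT ON AUTO_RUNSTATS ON;

-- Real-time statistics: collected or fabricated at statement compile time.
UPDATE DB CFG USING AUTO_STMT_STATS ON;
```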

Summary Best Practices

Database Design :
Parallelism, Partitioning, Schema, Compression

Application Design
SQL Tips

Performance Layer
Indexes, Statistics, Referential Integrity, Materialized Query
Tables, Replicated Tables

Configuration and Operations

Configuration, Collecting Statistics

Calisto Zuzarte
[email protected]
