Netezza Best Practices

Prepared By
Sivakumar Nair/India/IBM
1. Introduction

2. Distribution

3. Datatypes

4. ZoneMaps

5. Statistics

6. Groom / Reclaim

7. ETL/ELT Guidelines
Introduction
Netezza sells itself on simplicity, so best practice should not mean hundreds of
rules and regulations to follow. It is recommended to concentrate on a few basic
principles:
> Distribution
> Datatypes
> Statistics
> Zonemaps
> Reclaim
Alongside some basic standards for ETL, these general pointers will cover 99% of
cases. Best practice means minimal effort early on for maximum gain.

Distribution
Good distribution is the fundamental element of performance. A SPU is the
individual unit of parallelism, and if all SPUs have the same amount of work to do,
a query will complete faster than if one SPU were asked to do the whole job.
> Bad distribution is called data skew.
> Skew to a single SPU is the worst-case scenario.
> Skew affects not just the query in hand but other queries too, as the skewed
SPU has more work to do.
> Skew also means that the machine will fill up sooner.
> Simple rule: good distribution, good performance.
> Never create a table without a distribution key.
> If no distribution key is specified, NPS chooses one itself, and there is no
guarantee which key it chooses. This will eventually create data skew.
When choosing the distribution key, consider the following factors:
> The more distinct the distribution key values, the better.
> The same distribution key value always goes to the same SPU.
> Tables used together should use the same columns for their distribution key
where possible.
> If a particular key is used heavily in equijoin clauses, that key is a good
choice for the distribution key.
> Check that there is no accidental processing skew even when record
distribution is good.
> If in doubt, use random distribution, which gives a near-perfect spread of rows.
> For smaller tables, random distribution is usually a good choice.
Criteria for selecting distribution keys:
> Choose columns for the distribution key that distribute table rows evenly.
> Choose columns for the distribution key based on the selection set that you use
most frequently to retrieve rows from the table.
> Choose as few columns as possible for the distribution key (maximum four columns).
> Do not choose Boolean columns as the distribution key: with only two distinct
values, all rows hash to at most two SPUs.
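As a sketch, the distribution guidance above translates into DDL like the following (the `sales` and `region_lookup` tables and their columns are hypothetical examples, not part of this document):

```sql
-- Distribute a large fact table on a high-cardinality column that is
-- also a common equijoin key, so joined tables can be co-located.
CREATE TABLE sales
(
    customer_id INTEGER       NOT NULL,
    sale_date   DATE          NOT NULL,
    amount      NUMERIC(12,2)
)
DISTRIBUTE ON (customer_id);

-- For a small table with no obvious key, random distribution spreads
-- rows evenly across the SPUs.
CREATE TABLE region_lookup
(
    region_code SMALLINT,
    region_name VARCHAR(50)
)
DISTRIBUTE ON RANDOM;
```

Any dimension table joined to `sales` on `customer_id` would ideally be distributed on the same column, so the join can run locally on each SPU.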

Data types
Picking the right data types always gives better performance.
> Columns of uniform type produce consistent results.
> Columns of uniform type ensure that data is stored efficiently.
> Columns of uniform type allow the system to process queries efficiently.
> NUMERIC datatypes with a scale of 0 are similar to INTEGER datatypes; switching
to an INTEGER datatype makes the column eligible for zonemaps.
> The INTERVAL datatype is cumbersome and hard to work with. Consider storing the
original TIME and TIMESTAMP values and calculating the interval on the fly.
> Floating-point datatypes are, by definition, imprecise. There may be a
performance hit in using them.
> Inconsistent datatypes for the same column in different tables hurt
performance, because joins then require implicit casts.
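To illustrate the NUMERIC-versus-INTEGER and INTERVAL points above, a small sketch (the `orders` table and its columns are invented for the example):

```sql
-- order_qty is declared INTEGER rather than NUMERIC(10,0): the values are
-- the same, but the integer column remains eligible for zonemaps.
CREATE TABLE orders
(
    order_id   BIGINT    NOT NULL,
    order_qty  INTEGER   NOT NULL,
    created_ts TIMESTAMP NOT NULL,
    shipped_ts TIMESTAMP
);

-- Rather than persisting an INTERVAL column, keep the original timestamps
-- and compute the duration on the fly when it is needed:
SELECT order_id,
       shipped_ts - created_ts AS ship_delay
FROM   orders
WHERE  shipped_ts IS NOT NULL;
```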

ZoneMaps
> Zonemaps improve the throughput and response time of SQL against large,
grouped, or continually augmented, nearly ordered data.
> Zonemaps are automatically generated, persistent, internal tables.
> They work with large, grouped, or nearly ordered DATE, TIMESTAMP, BYTEINT,
SMALLINT, INTEGER, and BIGINT datatypes.
> Zonemaps take advantage of the inherent ordering or grouping of data to reduce
the disk scans required to retrieve data for restricted-scan queries.

Statistics
> Netezza uses a cost-based optimizer.
> The more up-to-date and accurate table statistics are, the better the plans the
query optimizer will generate.
> Statistics generation should be built into ETL or ELT processing wherever possible.
> Regular monitoring should be deployed to detect out-of-date statistics.
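Building statistics into the load stream can be as simple as ending each ETL job with a GENERATE STATISTICS step (the table and column names below are illustrative):

```sql
-- Refresh optimizer statistics on the whole table after a large load:
GENERATE STATISTICS ON sales;

-- Or limit the work to the columns that drive joins and restrictions:
GENERATE STATISTICS ON sales (customer_id, sale_date);
```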

Groom / Reclaim
Why is groom important?
> An update or delete of a table row does not remove the old tuple.
> Over time, outdated or deleted tuples are of no interest to any transaction
and must be removed to free up space.
When should you reclaim?
> Groom tables that receive frequent updates or deletes.
> Groom tables if you cancel or abort a large load operation.
Groom best practices
> If you have a table whose contents are deleted completely, consider using
TRUNCATE rather than DELETE, which eliminates the need to run the groom
command.
> Build groom into the ETL processing wherever possible.
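A minimal sketch of the groom practices above (table names are illustrative):

```sql
-- Reclaim the space held by outdated and deleted tuples after heavy
-- update/delete activity:
GROOM TABLE sales;

-- When a table is emptied completely, TRUNCATE removes the rows and
-- frees their space in one step, so no groom is needed afterwards:
TRUNCATE TABLE staging_sales;
```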

ETL / ELT Guidelines

> Avoid many small inserts/updates, especially single-row inserts.
> Use a bulk load method wherever possible.
> Avoid cursor-based processing.
> Order data by the primary key, a date, or a common join column to optimize
zonemaps.
> Look to establish standard load and ETL methods (best practices) for the ETL
and load tools that you use.
> Minimize I/O between the host and the ETL server wherever possible.
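As one example of bulk loading instead of row-by-row inserts, Netezza can ingest a delimited file through a transient external table (the file path, delimiter options, and target table here are assumptions for the sketch):

```sql
-- Bulk load a CSV extract in a single streamed operation rather than
-- issuing many small INSERT statements:
INSERT INTO sales
SELECT *
FROM   EXTERNAL '/data/sales_extract.csv'
       SAMEAS sales
USING  (DELIMITER ',' SKIPROWS 1);
```

The nzload command-line utility wraps the same external-table mechanism and is the usual choice inside scripted ETL.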
