
7.2 Netezza Data Loading Guide

This document provides an overview and guidance for using IBM Netezza's external table functionality for loading data. It describes the components involved in data loading, supported file formats and data types, and options that can be specified. The document also provides examples for creating external tables to load data from files and best practices for using external tables.


IBM Netezza

Release 7.2

IBM Netezza Data Loading Guide




Note
Before using this information and the product it supports, read the information in “Notices” on page D-1.

Revised: September 15, 2014


This edition applies to IBM Netezza Release 7.2 and to all subsequent releases until otherwise indicated in new
editions.
© Copyright IBM Corporation 2011, 2014.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract
with IBM Corp.
Contents
Electronic emission notices . . . . . . . . . . . . . . . . . . . . . . . . . . vii

Regulatory and compliance . . . . . . . . . . . . . . . . . . . . . . . . . . . xi

About this publication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii


If you need help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
How to send your comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii

Chapter 1. Overview of data loading methods. . . . . . . . . . . . . . . . . . . 1-1


Data loading components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
Data loading formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
Decimal delimiter option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2

Chapter 2. External tables . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1


About external tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1
Privileges required . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1
Display external table information . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1
Log files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
External table usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
Parse rows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3
Back up and restore . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3
CREATE EXTERNAL TABLE command syntax . . . . . . . . . . . . . . . . . . . . . . . 2-3
Transient external tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4
Supported data types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
Integer data types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6
Fixed-point data types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7
Floating-point data types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8
Character strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9
Time data types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10
Restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12
Best practices for using external tables . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12
CREATE EXTERNAL TABLE command examples . . . . . . . . . . . . . . . . . . . . . . 2-13
Transient external table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-14
Fixed-Length format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-15
Examples of unload and reload . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-15
Back up and restore a user table . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-15

Chapter 3. External table load options . . . . . . . . . . . . . . . . . . . . . . 3-1


External table options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1
Option details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2
The BoolStyle option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2
The Compress option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3
The CRinString option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3
The CtrlChars option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3
The DataObject option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3
The DateDelim option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-5
The DateStyle option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-5
The DecimalDelim option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6
The Delimiter option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-6
The encoding option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7
The EscapeChar option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-7
The FillRecord option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-8
The Format option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-8
The IgnoreZero option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-8

The IncludeHeader option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-9
The IncludeZeroSeconds option . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-9
The Layout option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-9
The LogDir option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-9
The MaxErrors option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-9
The MaxRows options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-10
The NullValue option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-10
The QuotedValue option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-10
The RecordDelim option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-11
The RecordLength option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-11
The RemoteSource option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-12
The RequireQuotes option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-12
The SkipRows option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-12
The SocketBufSize option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-12
The TimeDelim option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-13
The TimeRoundNanos option. . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-13
The TimeStyle option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-13
The TruncString option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-13
The Y2Base option . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14
External table option processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14
Row Counts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14
Bad rows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14
Input row delineation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-15
Input fields and table columns . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-15
String and non-string fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-15
Handle the absence of a value . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-15
Load continuation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-16
Legal characters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-16
Session variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-17

Chapter 4. The nzload command . . . . . . . . . . . . . . . . . . . . . . . . 4-1


How the nzload command works. . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1
Protection and privileges. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1
Concurrency and transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1
Program invocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
Load status information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
nzload command syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3
The nzload control file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5
Configuration file example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-7

Chapter 5. Unload data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1


External table unload options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1
Unloading data to a remote client system . . . . . . . . . . . . . . . . . . . . . . . . . 5-1

Chapter 6. Fixed-length format . . . . . . . . . . . . . . . . . . . . . . . . . 6-1


Format background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1
Fixed-length format files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1
Data attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1
Format options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
Layout definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-4
Fixed-length format definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6

Appendix A. Examples and grammar . . . . . . . . . . . . . . . . . . . . . . A-1


Examples of specifying the nzload arguments . . . . . . . . . . . . . . . . . . . . . . . A-1
Specify parameters for the nzload command . . . . . . . . . . . . . . . . . . . . . . . A-1
Streaming data with named pipes . . . . . . . . . . . . . . . . . . . . . . . . . . A-1
Sample nzload usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2
Reference examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-3
Decimal delimiter examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-3
SQL grammar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-5

Fixed-length format definition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-5
Script example for loading data by using fixed-length format . . . . . . . . . . . . . . . . . . A-7

Appendix B. Troubleshooting . . . . . . . . . . . . . . . . . . . . . . . . . B-1


Tips for successful loading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-1
Create your table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-1
Determine your data format . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-1
Consider the load source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-2
Run the job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-2
Troubleshoot. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-2
Handle exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
Validate the results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
Generate statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
Test performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
Error handling for nzload . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
Error reporting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-4
The nzload log files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-4

Appendix C. Option names . . . . . . . . . . . . . . . . . . . . . . . . . . C-1


Specify options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-1

Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-1
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-3

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . X-1

Electronic emission notices
When you attach a monitor to the equipment, you must use the designated
monitor cable and any interference suppression devices that are supplied with the
monitor.

Federal Communications Commission (FCC) Statement

This equipment was tested and found to comply with the limits for a Class A
digital device, according to Part 15 of the FCC Rules. These limits are designed to
provide reasonable protection against harmful interference when the equipment is
operated in a commercial environment. This equipment generates, uses, and can
radiate radio frequency energy and, if not installed and used in accordance with
the instruction manual, might cause harmful interference to radio communications.
Operation of this equipment in a residential area is likely to cause harmful
interference, in which case the user is required to correct the interference at their
own expense.

Properly shielded and grounded cables and connectors must be used to meet FCC
emission limits. IBM® is not responsible for any radio or television interference
caused by using other than recommended cables and connectors or by
unauthorized changes or modifications to this equipment. Unauthorized changes
or modifications might void the authority of the user to operate the equipment.

This device complies with Part 15 of the FCC Rules. Operation is subject to the
following two conditions: (1) this device may not cause harmful interference, and
(2) this device must accept any interference received, including interference that
might cause undesired operation.

Industry Canada Class A Emission Compliance Statement

This Class A digital apparatus complies with Canadian ICES-003.

Avis de conformité à la réglementation d'Industrie Canada

Cet appareil numérique de la classe A est conforme à la norme NMB-003 du
Canada.

Australia and New Zealand Class A Statement

This product is a Class A product. In a domestic environment, this product might
cause radio interference in which case the user might be required to take adequate
measures.

European Union EMC Directive Conformance Statement

This product is in conformity with the protection requirements of EU Council
Directive 2004/108/EC on the approximation of the laws of the Member States
relating to electromagnetic compatibility. IBM cannot accept responsibility for any
failure to satisfy the protection requirements resulting from a nonrecommended
modification of the product, including the fitting of non-IBM option cards.

This product is an EN 55022 Class A product. In a domestic environment, this
product might cause radio interference in which case the user might be required to
take adequate measures.

Responsible manufacturer:

International Business Machines Corp.
New Orchard Road
Armonk, New York 10504
914-499-1900

European Community contact:

IBM Technical Regulations, Department M456
IBM-Allee 1, 71137 Ehningen, Germany
Telephone: +49 7032 15-2937
Email: [email protected]

Germany Class A Statement

Deutschsprachiger EU Hinweis: Hinweis für Geräte der Klasse A EU-Richtlinie
zur Elektromagnetischen Verträglichkeit

Dieses Produkt entspricht den Schutzanforderungen der EU-Richtlinie
2004/108/EG zur Angleichung der Rechtsvorschriften über die elektromagnetische
Verträglichkeit in den EU-Mitgliedsstaaten und hält die Grenzwerte der EN 55022
Klasse A ein.

Um dieses sicherzustellen, sind die Geräte wie in den Handbüchern beschrieben zu
installieren und zu betreiben. Des Weiteren dürfen auch nur von der IBM
empfohlene Kabel angeschlossen werden. IBM übernimmt keine Verantwortung für
die Einhaltung der Schutzanforderungen, wenn das Produkt ohne Zustimmung der
IBM verändert bzw. wenn Erweiterungskomponenten von Fremdherstellern ohne
Empfehlung der IBM gesteckt/eingebaut werden.

EN 55022 Klasse A Geräte müssen mit folgendem Warnhinweis versehen werden:
“Warnung: Dieses ist eine Einrichtung der Klasse A. Diese Einrichtung kann im
Wohnbereich Funk-Störungen verursachen; in diesem Fall kann vom Betreiber
verlangt werden, angemessene Maßnahmen zu ergreifen und dafür
aufzukommen.”

Deutschland: Einhaltung des Gesetzes über die
elektromagnetische Verträglichkeit von Geräten

Dieses Produkt entspricht dem “Gesetz über die elektromagnetische Verträglichkeit
von Geräten (EMVG)”. Dies ist die Umsetzung der EU-Richtlinie 2004/108/EG in
der Bundesrepublik Deutschland.

Zulassungsbescheinigung laut dem Deutschen Gesetz über die
elektromagnetische Verträglichkeit von Geräten (EMVG) (bzw. der
EMC EG Richtlinie 2004/108/EG) für Geräte der Klasse A

Dieses Gerät ist berechtigt, in Übereinstimmung mit dem Deutschen EMVG das
EG-Konformitätszeichen - CE - zu führen.

Verantwortlich für die Einhaltung der EMV Vorschriften ist der Hersteller:

International Business Machines Corp.
New Orchard Road
Armonk, New York 10504
914-499-1900

Der verantwortliche Ansprechpartner des Herstellers in der EU ist:

IBM Deutschland
Technical Regulations, Department M456
IBM-Allee 1, 71137 Ehningen, Germany
Telephone: +49 7032 15-2937
Email: [email protected]

Generelle Informationen: Das Gerät erfüllt die Schutzanforderungen nach EN 55024
und EN 55022 Klasse A.

Japan VCCI Class A Statement

This product is a Class A product based on the standard of the Voluntary Control
Council for Interference (VCCI). If this equipment is used in a domestic
environment, radio interference might occur, in which case the user might be
required to take corrective actions.

Japan Electronics and Information Technology Industries
Association (JEITA) Statement

Japan Electronics and Information Technology Industries Association (JEITA)
Confirmed Harmonics Guidelines (products less than or equal to 20 A per phase)

Japan Electronics and Information Technology Industries
Association (JEITA) Statement

Japan Electronics and Information Technology Industries Association (JEITA)
Confirmed Harmonics Guidelines (products greater than 20 A per phase)

Korea Communications Commission (KCC) Statement

This is electromagnetic wave compatibility equipment for business (Type A). Sellers
and users need to pay attention to it. This is for any areas other than home.

Russia Electromagnetic Interference (EMI) Class A Statement

People's Republic of China Class A Electronic Emission
Statement

Taiwan Class A Compliance Statement

Regulatory and compliance
Regulatory Notices

Install the NPS® system in a restricted-access location. Ensure that only those
people trained to operate or service the equipment have physical access to it.
Install each AC power outlet near the NPS rack that plugs into it, and keep it
freely accessible.

Provide approved circuit breakers on all power sources.

Product might be powered by redundant power sources. Disconnect ALL power
sources before servicing.

High leakage current. Earth connection essential before connecting supply. Courant
de fuite élevé. Raccordement à la terre indispensable avant le raccordement au
réseau.

Homologation Statement

This product may not be certified in your country for connection by any means
whatsoever to interfaces of public telecommunications networks. Further
certification may be required by law prior to making any such connection. Contact
an IBM representative or reseller for any questions.

About this publication
These topics describe the methods and commands for loading data into the IBM
Netezza® appliance. The topics are written for administrators who are transferring
or loading data into the appliance.

These topics often reference SQL commands that are used for tasks such as
creating external tables, inserting data, and running selects for reporting. IBM
Netezza SQL is the Netezza Structured Query Language (SQL), which runs on the
Netezza data warehouse appliance. Throughout this publication, the term SQL
refers to the SQL implementation by Netezza.

If you need help


If you are having trouble using the IBM Netezza appliance, follow these steps:
1. Try the action again, carefully following the instructions for that task in the
documentation.
2. Go to the IBM Support Portal at: http://www.ibm.com/support. Log in using
your IBM ID and password. You can search the Support Portal for solutions. To
submit a support request, click the Service Requests & PMRs tab.
3. If you have an active service contract maintenance agreement with IBM, you
can contact customer support teams by telephone. For individual countries,
visit the Technical Support section of the IBM Directory of worldwide contacts
(http://www.ibm.com/support/customercare/sas/f/handbook/contacts.html).

How to send your comments


You are encouraged to send any questions, comments, or suggestions about the
IBM Netezza documentation. Send an email to [email protected]
and include the following information:
• The name and version of the manual that you are using
• Any comments that you have about the manual
• Your name, address, and phone number

We appreciate your suggestions.

Chapter 1. Overview of data loading methods
Within the IBM Netezza environment, data loading is the process of transferring
data to the IBM Netezza appliance.

This section provides general information about the data loading methods that are
available for the IBM Netezza appliance. Data loading can require a significant
share of system resources, which can affect system performance. Schedule loads
during times when the system is less busy to avoid impact on user activity and
scheduled reports.

Data loading components


IBM Netezza supports several methods for loading data into the appliance.
External tables
Tables that are stored as flat files on the host or client systems and not in
the Netezza database. You can use these tables to load data into the
Netezza appliance. For more information, see Chapter 2, “External tables,”
on page 2-1.
The nzload command
A command that provides an easy method for using external tables and
getting data into the Netezza appliance. For more information, see
Chapter 4, “The nzload command,” on page 4-1.
Format options
Several file format options that you can use to format the data load to and
from external tables. Since data comes in different forms, Netezza provides
different ways of setting up the load. For more information, see Chapter 4,
“The nzload command,” on page 4-1 and “Fixed-Length format” on page
2-15.
Back up and restore
The Netezza backup and restore utilities provide different methods for
transferring data between Netezza systems. One method is to create
external tables and use nzload, described in Chapter 2, “External tables,”
on page 2-1 and Chapter 4, “The nzload command,” on page 4-1. For more
information about backups and restores, see the IBM Netezza System
Administrator’s Guide.
The nz_migrate utility
A separate tool in the general purpose scripts supplied in the Netezza
Software Support Tools package. This utility is a script that can migrate
(copy) a database or table from one Netezza appliance to another, or make
a copy of a database or table on the same server. Online help is available
for the utility using the nz_migrate -? command.

Data loading formats


In the database environment, there is always the need to load data from external
sources such as files, pipes, or sockets into a table. These external sources have
various formats to represent each of the data types individually, and together as
records or rows.

Database-like applications, such as an RDBMS, a web server, or another
structured data store, might export data into files or streams in different
formats. The following formats are used with the IBM Netezza environment:
Text-Delimited
A format commonly used for data loading, in which each field (column)
value ends with a delimiter and each record (row) ends with an
end-of-record delimiter, typically a newline character.
Fixed-Length
A loading format which allows for a more expressive form of external table
definition, thus increasing the kinds of data formats and layouts that can
be loaded.
Compressed Binary
This IBM Netezza proprietary format compresses the data before a backup
or restore to benefit performance. Also called internal format, it typically
yields smaller data files, retains information about the IBM Netezza
topology, and thus is often faster to back up and restore. The internal
format is not a documented interface and it could change between releases.
For more information on backup and restore, see the IBM Netezza System
Administrator’s Guide.

Decimal delimiter option


If you create text-delimited or fixed-length format files, you can specify a period or
a comma as a decimal separator. The period symbol is the default value. The
comma-separator value is available for external tables and for nzload, to help you
to directly load data without extra pre-load conversion.

For the text-delimited format, and for unloading data, this option is available only
at the table level.

For the fixed-length format, you can specify this option at the column level,
making it possible to mix comma and period decimal separators.

The option is available for the following data types, for both text-delimited and
fixed-length formats:
• Numeric
• Float
• Double
• Time
• Timetz
• Timestamp

Option usage for each data type is explained in each particular section that
describes that data type.
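
As a minimal sketch of table-level usage for text-delimited data, the following definition assumes a hypothetical file, /tmp/prices_eu.txt, whose rows look like 1001|12,50. The table name, column names, and path are illustrative assumptions. DECIMALDELIM is the option covered under “The DecimalDelim option” in Chapter 3, which remains the authority for its exact behavior.

```sql
-- Hypothetical table, columns, and path; only the option keywords come from this guide.
CREATE EXTERNAL TABLE ext_prices (
    item_id INTEGER,
    price   NUMERIC(10,2)
)
USING (
    DATAOBJECT ('/tmp/prices_eu.txt')
    DELIMITER '|'        -- field delimiter; it must differ from the decimal delimiter
    DECIMALDELIM ','     -- read 12,50 as the numeric value 12.50
);
```

The sketch uses a pipe as the field delimiter so that the comma inside a value cannot be mistaken for a field boundary.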
Related concepts:
Appendix A, “Examples and grammar,” on page A-1

Chapter 2. External tables
This section describes external tables and summarizes the best practices and
restrictions for using them.

In the IBM Netezza environment, there are the following types of tables:
System tables
Stored on the host
User tables
Stored on the disks in the storage arrays
External tables
Stored as flat files on the host or client systems
Related concepts:
Chapter 3, “External table load options,” on page 3-1
Appendix A, “Examples and grammar,” on page A-1

About external tables


An external table allows IBM Netezza to treat an external file as a database table.
An external table has a definition (also called a table schema), but the actual data
exists outside of the Netezza appliance database. External tables can be used to
access files that are stored on the Netezza host server. With the REMOTESOURCE
option, Netezza can also treat a file on a client system as an external table (a
remote external table).

After you create the external table definition, you can use INSERT INTO
statements to load data from the external file into a database table, or SELECT
FROM statements to query the external table.
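
The following sketch illustrates that flow under stated assumptions: orders is an existing user table, and /tmp/order.tbl is a pipe-delimited file on the Netezza host. Both names are hypothetical. The option keywords are the ones documented in Chapter 3.

```sql
-- Hypothetical names: "orders" is an existing user table and
-- /tmp/order.tbl is a pipe-delimited file on the Netezza host.
CREATE EXTERNAL TABLE ext_orders SAMEAS orders
USING (
    DATAOBJECT ('/tmp/order.tbl')
    DELIMITER '|'
);

-- Load: copy the rows of the file into the user table.
INSERT INTO orders SELECT * FROM ext_orders;

-- Query: read the file through the external table without loading it.
SELECT COUNT(*) FROM ext_orders;
```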

Privileges required
To create an external table, you must have the CREATE EXTERNAL TABLE
administration privilege and List privilege on the database where you are defining
the table. If the schema where the table is defined is not the default schema, you
must have List privilege on the schema as well.

The database user who issues the CREATE EXTERNAL TABLE command owns the
external table.

When you create an external table, you must specify the location where the
external table data object is stored. The nz operating system user must have
permission to read from the data object location to support SELECT operations
from the table, and to write to the location if you use commands such as INSERT
to add rows to the external table.

Display external table information


To display information about external tables, use the \d command from the nzsql
prompt.
• To list all external tables found in the current database, use the \dx command.
For example:
dev(admin)=> \dx
        List of relations
    Name     |   Type    | Owner
-------------+-----------+-------
 extlineitem | ext table | admin
 xlineitem   | ext table | admin
(2 rows)
• To list the options defined in an external table, use the \d <external_tablename>
command. For example:
dev(admin)=> \d extlineitem

Log files
By default, loading errors are written to the following log files:
• nzbad: <tablename>.<schema>.<dbname>.nzbad
• nzlog: <tablename>.<schema>.<dbname>.nzlog

You can override the default file names by using the following nzload options
with a file name:
• -bf <filename> for nzbad
• -lf <filename> for nzlog

External table usage


Use external tables to do the following tasks:
• Load data into the IBM Netezza appliance from an external table and structure
the loading operation to manipulate the data by using casts, joins, dropping
columns, and other features.
• Store data outside the Netezza appliance, either to transfer to another
application, or as a table backup.
• Create an external table and use data from an external table as part of a SQL
query.

The power of external tables is that the entire Extraction-Transformation-Loading
(ETL) process is mapped to plain SQL. Since a SQL-based ETL process can be
initiated from any SQL client that can talk to the Netezza appliance, it reduces or
avoids the requirement of specialized ETL tools.
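
To illustrate how a load can be combined with a transformation in one statement, the following sketch assumes a hypothetical external table, ext_sales_raw, defined over a raw file, and two hypothetical user tables, stores and sales. It shows a cast, a join, and a dropped column. It is one possible arrangement, not a prescribed pattern.

```sql
-- Hypothetical schema: ext_sales_raw is an external table over a raw file;
-- stores and sales are ordinary user tables.
INSERT INTO sales (store_id, sale_date, amount)
SELECT s.store_id,                   -- join: resolve a store code to its key
       CAST(e.sale_date AS DATE),    -- cast: normalize the column type
       e.amount                      -- other columns of the file are not selected, so they are dropped
FROM ext_sales_raw e
JOIN stores s ON s.store_code = e.store_code;
```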

To load data from an external data file into the Netezza appliance, reference the
external table in either of the following clauses:
• The FROM clause of a SELECT statement, like any normal table.
• The WHERE clause of an UPDATE or DELETE statement.

To unload data into an external data file, use the external table as the target
table in any of the following SQL statements:
• INSERT
• SELECT INTO
• CREATE TABLE AS SELECT

All references to columns in the external table can be complex SQL expressions
used for the transformation of external data during a load/unload process.
Related concepts:
“Back up and restore” on page 2-3
“Restrictions” on page 2-12

Parse rows
For loads, rows are parsed one by one from the external data file and converted
into internal records of the external table. Errors can occur during the parsing of
each row or each column. For example, an error can occur in identifying the
column value itself, as in the case of a missing delimiter. An error can also occur
during the conversion from external format to internal records of the external
table, such as when alphabetic characters are supplied for an integer column in
text-delimited format.

Each error is logged in detail in an nzlog file, and bad rows are logged in an nzbad
file. These files help you identify bad rows in the external data file and correct
them for reloading. Depending on the load options of the external table in use,
a bad row causes either that row to be skipped or the entire load to be
aborted. Similarly, a bad column in a row can cause the rest of the row to
be ignored or, if recovery is possible, the load can continue to parse subsequent
columns of the same row.

If there is an error in the project-expression on the external table columns, then the
entire load is aborted and the transaction rolled back. Errors of this nature are not
logged in nzbad or nzlog files, as they are outside of the scope of the external table
load mechanism. When the processing reaches the normal SQL engine, the external
table is treated as if it is a normal table.

Unlike an external table that has external rows in an ordered sequence, normal
user tables have no implicit row order other than hidden rowid columns. So there
is no way for a user who is not using rowids to identify the bad row in a SQL
engine. In this case, the IBM Netezza system returns an error that a particular
column caused an error, without identifying the bad row. It is as if the query was
selecting from a normal table and inserting into another normal table, with some
row that caused the error during insertion.

Back up and restore


You can use external tables to back up a table in the system database. While the
IBM Netezza appliance database backup utility, nzbackup, creates backups of the
entire database, you can use the external table backup method to create a backup
of a single table, with the ability to later restore it to the database as needed.

To back up table data by using an external table, create an external table
definition for each user table and then use SQL to insert into the external table.
To restore table data, create a table definition (if one does not exist) and then
use SQL to insert into the table from the external table.
Related concepts:
“External table usage” on page 2-2

CREATE EXTERNAL TABLE command syntax


The CREATE EXTERNAL TABLE command has the following syntax.
v To create an external table based on another table:
CREATE EXTERNAL TABLE table_name
SAMEAS table_name
USING external_table_options
v To create an external table by defining columns:

CREATE EXTERNAL TABLE table_name
({ column_name type
[ column_constraint [ ... ] ]} [, ... ]
)
[USING external_table_options]

Note: The system accepts and records PRIMARY KEY, DEFAULT, UNIQUE, and
REFERENCES clauses, but UNIQUE, PRIMARY KEY, and REFERENCES are ignored for
external tables. The system does not perform constraint checks or enforce
referential integrity; you must ensure both yourself.
Related concepts:
“Column constraint rules for empty strings” on page 2-9

Transient external tables


Transient external tables (TET) provide a way to define an external table that exists
only for the duration of a single query.

Transient external tables have the same capabilities and limitations as normal
external tables. A special feature of a TET is that the table schema does not need to
be defined when the TET is used to load data into a table or when the TET is
created as the target of a SELECT statement.

Syntax

The following is the syntax for a TET:

INSERT INTO <table> SELECT <column_list | *>
FROM EXTERNAL 'filename' [(table_schema_definition)]
[USING (external_table_options)];

CREATE EXTERNAL TABLE 'filename' [USING (external_table_options)]
AS select_statement;

SELECT <column_list | *> FROM EXTERNAL 'filename' (table_schema_definition)
[USING (external_table_options)];

Explicit table schema definition

The table schema of a transient external table can be explicitly defined in a query.
When defined this way, the table schema definition is the same as is used when
defining a table schema by using CREATE TABLE.
SELECT x, y, NVL(dt, current_date) AS dt FROM EXTERNAL '/tmp/test.txt'
( x integer, y numeric(18,4), dt date ) USING (DELIM ',');

The explicit schema definition feature can be used to specify fixed-length formats.
SELECT * FROM EXTERNAL '/tmp/fixed.txt' ( x integer, y numeric(18,4),
dt date ) USING (FORMAT 'fixed' LAYOUT (bytes 4, bytes 20, bytes 10));

The SAMEAS keyword can also be used to specify that the schema of the external
table is identical to some other table that currently exists in the database.
SELECT * FROM EXTERNAL '/tmp/test.txt' SAMEAS test_table
USING (DELIM ',');

Implicit table schema definition

If the transient external table schema is not explicitly defined, the schema is
determined based on the query that is executing. When a TET is used as a data
source for an INSERT statement, the external table uses the schema of the target
table.

In the following INSERT statement, the external table uses the schema of the
target table. The columns in the external data file must be in the same order as
in the target table, and every column in the target table must also exist in the
external data file.
INSERT INTO target SELECT * FROM EXTERNAL '/tmp/data.txt'
USING (DELIM '|');

Export data by using transient external tables

A transient external table can also be used to export data out of the database. In
this case, the schema of the external table is based on the query that is executing.
For example:
CREATE EXTERNAL TABLE '/tmp/export.csv' USING (DELIM ',') AS
SELECT foo.x, bar.y, bar.dt FROM foo, bar WHERE foo.x = bar.x;

Remote transient external tables

A session connected to IBM Netezza using ODBC, JDBC, or OLE DB from a client
system can import and export data by using a remote transient external table,
which is defined by using the REMOTESOURCE option in the USING clause.

For example, the following SQL statement loads data from a file on a Windows
system into a TEMP table on Netezza by using an ODBC connection.
CREATE TEMP TABLE mydata AS SELECT cust_id, upper(cust_name) AS name
FROM EXTERNAL 'c:\customer\data.csv' (cust_id integer, cust_name
varchar(100)) USING (DELIM ',' REMOTESOURCE 'ODBC');

Remote external table loads work by sending the contents of a file from the client
system to the Netezza server where the data is then parsed. This method
minimizes CPU usage on the client system during a remote external table load.

Supported data types


The following table describes the IBM Netezza supported data types for external
tables.
Table 2-1. Supported data types
Data type            Example              See
byteint              120                  “Integer data types” on page 2-6.
smallint             0
integer              256
bigint               1290985
numeric              -99.56               “Fixed-point data types” on page 2-7.
decimal              123.679
real                 -81293.35            “Floating-point data types” on page 2-8.
double precision
char (n)             salary               “Character strings” on page 2-9 and “Column
                                          constraint rules for empty strings” on page 2-9.
varchar (n)          this is a variable   “Character strings” on page 2-9 and “Column
                     string               constraint rules for empty strings” on page 2-9.
boolean              true                 An ASCII string that contains one of the
                                          following values:
                                          true yes 1 t y
                                          false no 0 f n
                                          “The BoolStyle option” on page 3-2.
date                 2002-02-04           The date is an exact 4-byte data type. The
                                          system recognizes a range of dates composed of
                                          year, month, and day.
                                          “The DateStyle option” on page 3-5.
time                 01:59:45             “Time” on page 2-10.
                     23:00:01
time with time zone  01:15:33 -05         “Time with time zone” on page 2-11.
timestamp            2002-02-04 01:15:33  “Timestamp” on page 2-12.

Integer data types


Integer types are exact data types. The system generates an error if a value of the
input field cannot be expressed without loss of accuracy in the target table.

The following list describes the integer syntax:


Syntax
[+|-]<digit>...
Description
v Optional leading sign
v Unlimited leading zeros
v At least one decimal digit
Limitation
v No thousands-separator commas
v No support for exponential notation

The following table describes the integer handling.

Table 2-2. Integer handling
SQL       Alias        Representation   Values
byteint   int1         1 byte, signed   min value = -128
                                        max value = 127
smallint  int2         2 bytes, signed  min value = -32768
                                        max value = 32767
integer   int or int4  4 bytes, signed  min value = -2147483648
                                        max value = 2147483647
bigint    int8         8 bytes, signed  min value = -9223372036854775808
                                        max value = 9223372036854775807
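
The integer rules above can be modeled outside the database. The following Python sketch is an illustration only, not Netezza code: the function name parse_integer_field is hypothetical, and the ranges are copied from Table 2-2. It accepts an optional sign and any number of leading zeros, and rejects thousands separators, exponents, and out-of-range values.

```python
import re

# Ranges copied from Table 2-2 (signed integers of 1, 2, 4, and 8 bytes).
INT_RANGES = {
    "byteint": (-2**7, 2**7 - 1),
    "smallint": (-2**15, 2**15 - 1),
    "integer": (-2**31, 2**31 - 1),
    "bigint": (-2**63, 2**63 - 1),
}

# Optional sign followed by at least one decimal digit; nothing else.
_INT_FIELD = re.compile(r"[+-]?[0-9]+")


def parse_integer_field(text, sql_type):
    """Return the integer for a field, or raise ValueError if it is rejected.

    Hypothetical helper that mirrors the documented rules; it is not part
    of any Netezza API.
    """
    if not _INT_FIELD.fullmatch(text):
        raise ValueError(f"not an integer literal: {text!r}")
    value = int(text)  # leading zeros are accepted and ignored
    low, high = INT_RANGES[sql_type]
    if not low <= value <= high:
        raise ValueError(f"{value} is out of range for {sql_type}")
    return value
```

For example, parse_integer_field("+007", "byteint") returns 7, while "128" for a byteint column, "1,000", and "1e3" all raise ValueError.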

Fixed-point data types


The fixed-point data types are exact data types. The system generates an error if a
value in the input field cannot be expressed without loss of accuracy in the target
table or database.

The following list describes the fixed-point syntax.


Syntax
['+'|'-']<digit>...['.'[<digit>...]]
['+'|'-']'.'<digit>...
['+'|'-']<digit>...[','[<digit>...]]
['+'|'-']','<digit>...
Description
v Optional leading sign
v Unlimited leading zeros
v At least one decimal digit
Limitation
v No thousands-separator commas
v No support for exponential notation

The syntax of fixed-point values is the same as the syntax of integer values with
the addition of an optional decimal point that can occur anywhere from before the
first decimal digit to after the last decimal digit.

If at least one decimal digit precedes the decimal point, the decimal point can be
followed by zero or more decimal digits. If no decimal digit precedes the decimal
point, it must be followed by one or more decimal digits.

If there is no explicit decimal point, the system assumes a decimal point
immediately following the last decimal digit.

You can also specify a comma as the decimal separator and use it in place of the
decimal point.

The following table describes the fixed-point precision and representation:

Precision     Representation
P ≤ 9         4 bytes, signed
9 < P ≤ 18    8 bytes, signed
18 < P ≤ 36   16 bytes, signed

The following conditions result in system errors:

Precision
Having more decimal digits before the decimal point than the declaration
allows (P-S).
Scale
Having more decimal digits that follow the decimal point than the
declared scale (S).

Note: Because the fixed-point data type is an exact data type, when there are too
many digits that follow the decimal point, the system does not round the number.
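
As an illustration of the precision and scale errors, the following Python sketch checks a field against a numeric(P,S) declaration without rounding. It is a model under stated assumptions, not Netezza code: parse_fixed_field is a hypothetical name, only the period is handled as the decimal separator, and digits are counted exactly as written because the text does not say whether trailing zeros beyond the scale are tolerated.

```python
import re
from decimal import Decimal

# Optional sign, then digits with an optional decimal point, or a decimal
# point followed by at least one digit. No commas and no exponent.
_FIXED_FIELD = re.compile(r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)")


def parse_fixed_field(text, precision, scale):
    """Parse text for a numeric(precision, scale) column, never rounding.

    Hypothetical helper; raises ValueError where the documented rules
    describe a system error.
    """
    if not _FIXED_FIELD.fullmatch(text):
        raise ValueError(f"not a fixed-point literal: {text!r}")
    whole, _, fraction = text.lstrip("+-").partition(".")
    whole = whole.lstrip("0")  # leading zeros do not count as digits
    if len(whole) > precision - scale:
        raise ValueError("too many digits before the decimal point")
    # ASSUMPTION: every written fractional digit counts toward the scale.
    if len(fraction) > scale:
        raise ValueError("too many digits after the decimal point")
    return Decimal(text)
```

With a numeric(4,2) column, "0012.34" is accepted, whereas "123.4" fails the precision check (three digits before the point, but P-S is 2) and "1.234" fails the scale check.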
Related concepts:
“Decimal delimiter examples” on page A-3

Floating-point data types


The floating-point data types are approximate data types. The system rounds the
significand if more precision is present than it can represent.

The following list describes the floating point syntax.


Syntax
[ '+' | '-' ] <digit>... [ '.' [ <digit>... ] ] [ ( 'e' | 'E' ) [ '+' | '-' ] <digit>... ]
[ '+' | '-' ] '.' <digit>... [ ( 'e' | 'E' ) [ '+' | '-' ] <digit>... ]
[ '+' | '-' ] <digit>... [ ',' [ <digit>... ] ] [ ( 'e' | 'E' ) [ '+' | '-' ] <digit>... ]
[ '+' | '-' ] ',' <digit>... [ ( 'e' | 'E' ) [ '+' | '-' ] <digit>... ]
Description
v Optional leading sign
v Unlimited leading zeros
v At least one decimal digit
v Decimal point or comma, if needed
v Optional 'e' or 'E' introducing an exponent followed by an optional sign
and one or more digits
Limitation
v No thousands-separator commas
v No support for loading exceptional values (Not a Number (NaNs) and
infinities)

The syntax of floating-point values is the same as the syntax of fixed-point values
augmented by an optional trailing exponent specification.

If at least one decimal digit precedes the decimal point, the decimal point can be
followed by zero or more decimal digits. If no decimal digit precedes the decimal
point, it must be followed by one or more decimal digits.

If there is no explicit decimal point, the system assumes a decimal point
immediately following the last decimal digit.

You can also specify a comma as the decimal separator and use it in place of the
decimal point.

The optional power-of-10 exponent consists of 'e' (lowercase or uppercase), an
optional sign, and a non-empty sequence of decimal digits.

The following table describes the floating-point precision and representation:

Table 2-3. Floating-point precision
Type                                  Real                        Double
Representation                        4-byte IEEE floating point  8-byte IEEE floating point
Approx. largest normalized value      ±3.40e+38                   ±1.79e+308
Approx. smallest normalized value     ±1.18e-38                   ±3.40e-308
Approx. smallest denormalized value   ±7.01e-46                   ±2.50e-324

The following conditions result in system errors:


Overflow
If the field exceeds the largest representable value (maximal exponent and
maximal significand)
Underflow
If the number is too small to approximate in the denormalized range
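
The following Python sketch models the floating-point rules for a 4-byte real column: syntax check, rounding of the significand, and the overflow and underflow errors. It is only an approximation under stated assumptions. The name parse_real_field is hypothetical, only the period is handled as the decimal separator, and the exact thresholds at which the Netezza loader reports overflow or underflow might differ from those of Python's struct module.

```python
import math
import re
import struct

# Fixed-point syntax plus an optional exponent; NaN and infinities never match.
_FLOAT_FIELD = re.compile(
    r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"
)


def parse_real_field(text):
    """Return the value as stored in a 4-byte IEEE float, or raise ValueError."""
    if not _FLOAT_FIELD.fullmatch(text):
        raise ValueError(f"not a floating-point literal: {text!r}")
    value = float(text)
    try:
        if math.isinf(value):
            raise OverflowError
        # Packing as 'f' rounds the significand to single precision and
        # raises OverflowError when the value is too large for 4 bytes.
        stored = struct.unpack("<f", struct.pack("<f", value))[0]
    except OverflowError:
        raise ValueError(f"overflow for real: {text!r}") from None
    mantissa = re.split("[eE]", text)[0]
    if stored == 0.0 and any(ch in "123456789" for ch in mantissa):
        # A nonzero input that collapses to zero is treated as underflow.
        raise ValueError(f"underflow for real: {text!r}")
    return stored
```

A field such as "0.1" is accepted but comes back slightly changed, because the significand is rounded to single precision; "1e39" raises an overflow error and "1e-60" an underflow error.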
Related concepts:
“Decimal delimiter examples” on page A-3

Character strings
Char(n)/nchar(n) are character strings of length n. Varchar(n)/nvarchar(n) are
variable-length character strings of maximum length n. A valid character is
between the ASCII values 32 - 255.

System handling of characters


The following table describes how the system handles char, nchar, varchar, and
nvarchar characters.
Table 2-4. Character handling
Characters   How char, nchar, varchar, and nvarchar characters are handled
Padding      char/nchar: Padded to normal length with spaces
             varchar/nvarchar: Not padded
Truncation   If the data is longer than the field:
             v The system writes the record to the nzbad file.
             v The system writes a summary of the bad records to the nzlog
               file.
             You can turn on automatic truncation with the -truncString
             option.
             Note: If you use this option for Unicode character data, it can
             truncate combined NFC characters if they exceed the specified
             column length. The switch does not attempt to keep any
             grapheme clusters; it truncates data as necessary to fit in the
             specified column size.
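
The padding and truncation behavior can be sketched in a few lines of Python. This is an illustration only: fit_char_field and its parameters are hypothetical names, and the sketch measures length in Python characters, which is an assumption that might not match how the system measures nchar and nvarchar data.

```python
def fit_char_field(text, length, varying=False, trunc_string=False):
    """Return the value as it would be stored, or raise ValueError.

    varying=False models char/nchar; varying=True models varchar/nvarchar.
    trunc_string=True models loading with the -truncString option.
    """
    if len(text) > length:
        if not trunc_string:
            # Without truncation the record is rejected (written to nzbad).
            raise ValueError("data is longer than the field")
        # Plain cut at the column length; no grapheme-cluster awareness.
        text = text[:length]
    # char/nchar values are space-padded; varchar/nvarchar values are not.
    return text if varying else text.ljust(length)
```

For a char(4) column, "ab" is stored as "ab" followed by two spaces, and "abcdef" is rejected unless truncation is turned on, in which case "abcd" is stored.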

Column constraint rules for empty strings


For all char(n) and varchar(n) data types, the result of loading an empty string
or a missing data value depends on whether the column is declared nullable (the
default) or not nullable (declared with the NOT NULL constraint). The following
table describes the different cases.

Table 2-5. Column constraint rules for empty strings
Data type               Column      Null token exists:  Null token exists:              Null token does not exist:
                        constraint  null token          '' (empty string)               '' (empty string)
Char/Nchar,             NULL        NULL                char/nchar: space filled.       NULL
Varchar/Nvarchar                                        varchar/nvarchar: zero-length
                                                        string.
                        NOT NULL    ERROR               char/nchar: space filled.       ERROR
                                                        varchar/nvarchar: zero-length
                                                        string.
Bool, Date, Int         NULL        NULL                NULL                            NULL
(1,2,4,8), Numeric(),   NOT NULL    ERROR               ERROR                           ERROR
Float (4,8), Time,
Timestamp, Timetz

If the record contains fewer data values than the actual columns defined in the
schema of the table, the system writes an error to the nzlog file and discards the
record. To override this behavior, use the -fillRecord option, which applies to the
entire load operation.

The -fillRecord option tells the system to use a null value in place of any missing
fields. You can use this option if the columns whose values are missing allow
nulls. If these columns are defined as not null, the system writes an error to the
nzlog file and discards the record. You must resolve this conflict by changing the
schema to allow null values or modifying the data file to include a valid non-null
value.
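
The effect of the -fillRecord option on short records can be sketched as follows. This Python model is an illustration only; complete_record and its parameters are hypothetical names, and None stands in for a SQL null.

```python
def complete_record(fields, not_null_flags, fill_record=False):
    """Pad a short record with None, or raise ValueError if it is discarded.

    fields         -- values parsed from one input record
    not_null_flags -- one bool per table column; True means NOT NULL
    fill_record    -- models loading with the -fillRecord option
    """
    missing = len(not_null_flags) - len(fields)
    if missing <= 0:
        return list(fields)
    if not fill_record:
        raise ValueError("record has fewer values than the table has columns")
    # Nulls can only fill columns that allow nulls.
    if any(not_null_flags[len(fields):]):
        raise ValueError("a missing value maps to a NOT NULL column")
    return list(fields) + [None] * missing
```

With three columns of which only the first is NOT NULL, the record ["1", "a"] becomes ["1", "a", None] when fill_record is True. The same record is discarded when the option is off or when the third column is NOT NULL.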
Related concepts:
“CREATE EXTERNAL TABLE command syntax” on page 2-3
Related reference:
“The NullValue option” on page 3-10

Time data types


The system supports time, timestamp, and time with time zone. These data types
are exact types, stored to an accuracy of 1 microsecond (1/1,000,000 of a second).

You can also specify a comma as the decimal separator in time data types and use
it in place of the decimal point.
Related concepts:
“Decimal delimiter examples” on page A-3

Time
The IBM Netezza appliance time is an exact, 8-byte data type stored internally as a
signed integer that represents the number of microseconds since midnight.

The system accepts both 24-hour and 12-hour (AM and PM) time values. You can
specify the format with the -timeStyle option. The default is the 24-hour format.

The time format consists of five components: hour, minute, second, fraction of a
second, and the AM or PM token. You must have hour and minute; second and
fraction of second are optional. The AM or PM token is required for the 12-hour
format and not allowed for the 24-hour format.

The time options have the following formats. The delimited examples use the
default time delimiter, which is a colon (:).
v 12-hour delimited HH:MM:SS.FFF [AM | PM] (such as 10:12 PM, or
1:02:46.12345 AM)
v 12-hour undelimited HHMMSS.FFF [AM | PM] (such as 1012 PM or
010246.12345 PM)
v 24-hour delimited HH:MM:SS.FFF (such as 19:15 or 1:15:00.1234)
v 24-hour undelimited HHMMSS.FFF (such as 1915 or 010246.12345)

In these formats, keep in mind the following:
v HH is a one-digit or two-digit hour value 1 - 12 in the 12-hour notation or 0 - 23
in the 24-hour notation. In undelimited format, you must specify two digits such
as 01, 02, and so on.
v MM is a one-digit or two-digit minute value 0 - 59. In undelimited format, you
must specify two digits such as 01.
v SS is a one-digit or two-digit seconds value 0 - 59. In undelimited format, you
must specify two digits such as 01.
v FFF specifies a fraction of a second. If you specify a fractional value, you must
precede it with a decimal point. If the value can be stored without loss of
precision, it is accepted. If the value cannot be stored without loss of precision, it
is rejected. You can use the -timeRoundNanos option to allow rounding when the
full precision of any fractional digits cannot be preserved.
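
As an illustration of the 24-hour delimited format and of the internal representation (microseconds since midnight), consider the following Python sketch. It is a model under stated assumptions, not Netezza code: time_to_microseconds and round_nanos are hypothetical names, the colon is taken as the time delimiter, the component ranges are assumed to be 0 - 23 for hours and 0 - 59 for minutes and seconds, and rounding is assumed to be half-up on the seventh fractional digit.

```python
import re

# HH:MM, optionally followed by :SS and then by .FFF (24-hour, colon-delimited).
_TIME_24H = re.compile(
    r"([0-9]{1,2}):([0-9]{1,2})(?::([0-9]{1,2})(?:\.([0-9]+))?)?"
)


def time_to_microseconds(text, round_nanos=False):
    """Convert a 24-hour delimited time to microseconds since midnight."""
    match = _TIME_24H.fullmatch(text)
    if not match:
        raise ValueError(f"not a 24-hour delimited time: {text!r}")
    hour, minute = int(match.group(1)), int(match.group(2))
    second = int(match.group(3) or 0)
    fraction = match.group(4) or ""
    # ASSUMPTION: conventional clock ranges for each component.
    if hour > 23 or minute > 59 or second > 59:
        raise ValueError(f"time component out of range: {text!r}")
    # More than six significant fractional digits cannot be stored exactly.
    if len(fraction.rstrip("0")) > 6 and not round_nanos:
        raise ValueError(f"fraction exceeds microsecond precision: {text!r}")
    digits = fraction.ljust(7, "0")
    # ASSUMPTION: round half up on the seventh digit when rounding is allowed.
    micros = int(digits[:6]) + (1 if digits[6] >= "5" else 0)
    return ((hour * 60 + minute) * 60 + second) * 1_000_000 + micros
```

The value "19:15" maps to 69,300,000,000 microseconds. The value "1:15:00.1234567" is rejected by default because its seventh fractional digit cannot be stored, and is accepted when round_nanos is True.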

Time with time zone


The IBM Netezza time with time zone (timetz) is an exact data type stored in 12
bytes. Internally the Netezza appliance stores it as time and an offset. The stored
offset has the same 1-microsecond resolution as time even though the input is
restricted to a one-minute resolution.

Syntax
<time> ( '+' | '-' ) <digit> [ <digit> [ ':' <digit> [ <digit> ] ] ]

The input format of a time with time zone value is identical to that of simple time
followed by a trailing signed offset from Coordinated Universal Time (UTC,
formerly Greenwich Mean Time, GMT). The time section must conform to the
-timeStyle and -timeDelim options in effect during the nzload job.

You must specify a signed, time-zone hour, whereas the time-zone minute is
optional. If you use the minute, separate it with a colon (the default timeDelim
character).

Note: You cannot use named time zones, such as EST.

Errors

The following are time and range errors:


v Time: The same errors as the time data type.
v Range: The time zone offset is restricted to -13:00 to +12:59.
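
The offset syntax and range check can be sketched in Python as follows. This is an illustration only: offset_to_minutes is a hypothetical name, the colon is taken as the delimiter, the grammar is followed literally (so the minute part is only accepted after a two-digit hour), and the limit of 59 for the minute part is an assumption.

```python
import re

# Sign, then one hour digit, or two hour digits optionally followed by
# a colon and one or two minute digits.
_OFFSET = re.compile(
    r"(?P<sign>[+-])(?:(?P<h1>[0-9])|(?P<h2>[0-9]{2})(?::(?P<m>[0-9]{1,2}))?)"
)


def offset_to_minutes(text):
    """Return the time-zone offset in minutes, or raise ValueError."""
    match = _OFFSET.fullmatch(text)
    if not match:
        raise ValueError(f"not a time-zone offset: {text!r}")
    hours = int(match["h1"] or match["h2"])
    minutes = int(match["m"] or 0)
    if minutes > 59:  # ASSUMPTION: minutes follow the usual 0 - 59 range
        raise ValueError(f"minute part out of range: {text!r}")
    total = hours * 60 + minutes
    if match["sign"] == "-":
        total = -total
    # Documented range: -13:00 to +12:59.
    if not -13 * 60 <= total <= 12 * 60 + 59:
        raise ValueError(f"offset out of range: {text!r}")
    return total
```

Here "-05" yields -300 and "+05:30" yields 330, while "+13", "-13:01", and a named zone such as "EST" raise ValueError.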

Timestamp
The IBM Netezza appliance timestamp is an exact data type stored as eight bytes.
The stored offset has the same 1µS resolution as the time data type.

Syntax
timestamp <date> <time>

The input format of a timestamp value is a date value followed by a time value.
You can have optional spaces between the date and the time. The date section
must conform to the -dateStyle and -dateDelim in effect during the load job.

Errors

The following are date and time errors:


v Date: The same errors as the date data type.
v Time: The same errors as the time data type.

Restrictions
The following restrictions and considerations are for use with external tables:
v Always consider your source and target systems, and whether the data is
properly formatted for loading.
v To insert into and drop an external table, use the INSERT and DROP commands.
v You cannot delete, truncate, or update an external table. After creating an
external table, you can alter and drop the table definition. (Dropping an external
table deletes the table definition, but it does not delete the data file that is
associated with the table.) You can select the rows in the table and insert rows
into the table (following a table truncation).
v While you cannot select from more than one external table at a time in a query
or subquery, you can move data from one external table to another, such as
using SELECT and INSERT. The system displays an error if you incorrectly
specify multiple external tables in a SQL query, or if you reference the same
external table more than once in a query:
ERROR: Multiple external table references in a query not allowed
To use data from more than one external table, load the data into a non-external
table and specify this table in the query.
v You cannot use a union operation that involves two or more external tables.
v Using the nzbackup command to back up external tables backs up the schema
but not the data.
v Host-side operations, such as selects and rowsetlimit user and group property
interactions, are not supported for compressed external tables.
v The DecimalDelim option is not supported for compressed external tables.
v A maximum of 300 loads can run concurrently.
Related concepts:
“External table usage” on page 2-2

Best practices for using external tables


When specifying external tables, keep in mind the following:
v An external table reference can be used as the source table of a SELECT FROM
statement. A transient external table reference in a SELECT FROM clause infers
its shape from the preceding INSERT INTO clause.
v The system catalog data types TEXT and NAME are treated as NVARCHAR. If
these types are used in the table that is referenced in the select_clause, include
the encoding option in the CREATE EXTERNAL TABLE command to specify
internal encoding. Otherwise, you can receive the error LATIN9 encoding cannot
be specified with NCHAR/NVARCHAR column definitions. For example:
create external table '/tmp/ext1' using (encoding 'internal')
as select username from _t_user;
v The CREATE EXTERNAL TABLE AS statement supports an optional table name.
If you do not provide a table name, the table is transient, which means the
external table definition does not persist in the system catalog. If you supply a
table name, the external table becomes a named object in the system catalog.
v The USING clause in the inline external statement is optional. If you omit it, the
resulting external table has the default settings. You must specify the USING
clause in the CREATE EXTERNAL TABLE SAMEAS statement, because the
SAMEAS table might be another external table.
v When you insert data into an external table that references an existing data file,
the system truncates the file before it writes the inserted data.
v You cannot use external tables in complex SQL statements. If the statement is
not supported, the system displays an error.

Before you reload an external table, verify that the destination table in the database
is empty or that it does not contain the rows in the external table that you are
about to reload. If the destination table contains the rows contained in the external
table, problems might occur. These problems can also occur if you accidentally
reload the external table more than once.

For example, loading a text-format external table into a destination table that
contains the same data creates duplicate data in the database. The rows will have
unique row IDs, but the data is duplicated. To fix this problem, you would need to
delete the duplicate rows or truncate the database table and reload the external
table again (but only once).

If you load a compressed binary format external table into a destination table that
has the same rows, you create duplicate rows with duplicate row IDs in the
database table. The system restores the rows by using the same row IDs saved in
the compressed binary format file.

Duplicate row IDs can cause incorrect query results and can lead to problems in
the database. You can check for duplicate rowIDs by using the rowid keyword as
follows:
SELECT rowid FROM employee_table GROUP BY rowid HAVING count(rowid)>1;

If the query returns any rows, the table contains duplicate row IDs. Truncate the
database table and reload the external table (but only once).

After you load data from an external table into a user table, run GENERATE
STATISTICS to update the statistics for the user table. This improves the
performance of queries that run against that table.

CREATE EXTERNAL TABLE command examples


The following examples show how to use the CREATE EXTERNAL TABLE
command.
v To create an external table, enter:
CREATE EXTERNAL TABLE ext_orders (ord_num INT, ord_dt TIMESTAMP)
USING (dataobject ('/tmp/order.tbl') DELIMITER '|');
v To create an external table that uses column definitions from an existing table,
enter:
CREATE EXTERNAL TABLE demo_ext SAMEAS emp USING (dataobject
('/tmp/demo.out') DELIMITER '|');
v To create an external table and specify the escape character ('\'), enter:
CREATE EXTERNAL TABLE extemp SAMEAS emp USING (dataobject
('/tmp/extemp.dat') DELIMITER '|' escapechar '\');
v To unload data from your database into a file by using an insert statement,
enter:
INSERT INTO demo_ext SELECT * FROM emp;
v To drop an external table, enter:
DROP TABLE extemp;
The system removes only the schema information of the external table from the
system catalog. The file defined in the dataobject option remains unaffected in
the file system.
v To back up by creating an external table, enter:
CREATE EXTERNAL TABLE '/path/extfile' USING (FORMAT 'internal'
COMPRESS true) AS SELECT * FROM source_table;
v To restore from an external table, enter:
INSERT INTO t_desttbl SELECT * FROM EXTERNAL '/path/extfile'
USING (FORMAT 'internal' COMPRESS true);

Transient external table


The following examples show how to specify the shape of a transient external
table:
v To use the schema of the target table, enter:
insert into <table> select * from external '<file>' [USING(...)]
v To use the schema of the query, enter:
create external table '<file>' [USING (...)] as <QUERY>
v To use the schema of <table>, enter:
select * from external '<file>' sameas <table> [USING(...)]
v To use the schema as defined, enter:
select * from external '<file>' (schema) [USING(...)]
v To use the schema as defined, enter:
create external table '<file>' (schema) [USING(...)]
v To make the source file FIXED format with the schema as defined, enter:
select * from external '<file>' (schema) USING (FORMAT 'FIXED'
LAYOUT (...))
v To make the source file FIXED format and the table use the schema of the target
table, enter:
insert into <table> select * from external '<file>' USING (FORMAT
'FIXED' LAYOUT (...))
v The following example does not work, because you cannot unload data into a
FIXED format external table:
create external table '<file>' [(schema)] USING (FORMAT 'FIXED'
LAYOUT ... )

Fixed-Length format
The following examples show how to use fixed-length format with external tables:
v To load data in fixed format, enter:
INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed' USING ( FORMAT
'FIXED' LAYOUT (BYTES 20, REF BYTES 3, BYTES @2) )
v To load data with different date/time delimiters for different zones, enter:
INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed' USING ( FORMAT
'FIXED' LAYOUT ( YMD '-' BYTES 15, DMY '/' BYTES 15 ) )
v To load spatial data (binary data into VARCHAR), enter:
INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed' USING ( FORMAT
'FIXED' CTRLCHARS true LAYOUT ( BYTES 100, REF BYTES 4, BYTES @2) )
v To load fixed-format data with record-length and no record-delimiter, enter:
INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed' USING ( FORMAT
'FIXED' RECORDDELIM '' RECORDLENGTH @1 LAYOUT( REF BYTES 2, BYTES
120, REF BYTES 2, BYTES @3) )
v To load data with different NULLIF clauses for different zones, enter:
INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed' USING ( FORMAT
'FIXED' LAYOUT ( BYTES 15 NULLIF '2000-10-10', BYTES 2 & = '12') )
v To load data with NULLIF clauses that refer to other zones, enter:
INSERT INTO t SELECT * FROM EXTERNAL '/data/fixed' USING ( FORMAT
'FIXED' LAYOUT ( REF BYTES 2, BYTES @1 NULLIF @1 = -1, REF BYTES 4,
BYTES 100 NULLIF &&3 = 'null' ) )
Related concepts:
“Legal characters” on page 3-16

Examples of unload and reload


The following examples unload and load a user table to an external table in
text-delimited format. Unloading is not supported for fixed-length format.
v To create a text-format external table, enter:
CREATE EXTERNAL TABLE extemp SAMEAS emp USING (DATAOBJECT
('/tmp/emp.dat'));
v To unload data in user table EMP to the external table EXTEMP, enter:
INSERT INTO extemp SELECT * FROM emp;
v To load data into user table EMP from external table EXTEMP, enter:
TRUNCATE TABLE emp;
INSERT INTO emp SELECT * FROM extemp;

Back up and restore a user table


The following examples show how to back up and restore the user table EMP to
an external table in binary compressed format.
v To create a compressed binary format external table definition called
emp_backup for the table emp, enter:
CREATE EXTERNAL TABLE emp_backup SAMEAS emp USING (
DATAOBJECT ('/tmp/emp.bck')
COMPRESS true
FORMAT 'internal');
v To back up the emp table data into emp_backup, enter:
INSERT INTO emp_backup SELECT * FROM emp;
v To restore the emp table from emp_backup, make sure that the emp table is
empty and enter:

TRUNCATE TABLE emp;
INSERT INTO emp SELECT * FROM emp_backup;

Chapter 3. External table load options
If you plan to load data using external tables, there are numerous table format
options that you can use to help control how the data is processed during the load.
Related concepts:
Chapter 2, “External tables,” on page 2-1
Appendix A, “Examples and grammar,” on page A-1
“nzload command syntax” on page 4-3
“External table unload options” on page 5-1

External table options


When you create an external table definition, you can specify options that control
processing for records (rows), for fields, and for the load operation itself. Use
these options when loading from an external table or when using the external table
directly in a SQL query.

Note: The best method to verify that the load processing has been successful is to
review the errors, if any, in the nzlog and nzbad files. Check these files occasionally
during and after the load operations.

The following table lists the external table options, their values, and data types.
The sections after the table describe each option. In the Valid formats column, Text
indicates the text-delimited format and Fixed is the fixed-length format. In the
Data type column, enumeration indicates that the system accepts a specified set of
quoted or unquoted string values.
Table 3-1. External table options
Option              Valid formats  Values                 Default         Unload (Y/N)  Data type
BoolStyle           Text, Fixed    1_0/T_F/Y_N...         NULL, 1_0       Y             enumeration
Compress            Text, Fixed    True/False             False           Y             boolean
CRinString          Text, Fixed    True/False             NULL, False     Y             boolean
CtrlChars           Text, Fixed    True/False             NULL, False     N             boolean
DataObject          Text, Fixed    Existing file path     No default      Y             file name
DateDelim           Text, Fixed    1 byte                 NULL, "-"       Y             string
DateStyle           Text, Fixed    YMD/MDY/DMY...         NULL, YMD       Y             enumeration
DecimalDelim        Text, Fixed    1 byte                 '.'             Y             string
Delimiter           Text           1 byte                 NULL, "|"       Y             string
Encoding            Text           Internal/Latin9/Utf8   NULL, Internal  Y             enumeration
EscapeChar          Text           1 byte                 NULL            Y             string
FillRecord          Text           True/False             NULL, False     N             boolean
Format              Text, Fixed    Text/Internal/Fixed    Text            Y             enumeration
IgnoreZero          Text           True/False             NULL, False     N             boolean
IncludeHeader       Text           True/False             NULL, False     N             boolean
IncludeZeroSeconds  Text           True/False             NULL, False     Y             boolean
Layout              Text, Fixed    Zone definitions       NULL, Inherit   N             none
LogDir              Text, Fixed    existing dir path      NULL, /tmp      N             string
MaxErrors           Text, Fixed    >=0                    NULL, 1         N             integer
MaxRows             Text, Fixed    >=0                    NULL, 0         N             integer
NullValue           Text, Fixed    4 bytes                NULL, "NULL"    Y             string
QuotedValue         Text           No/Yes/Single/Double   NULL, No        N             enumeration
RecordDelim         Text, Fixed    4 bytes                NULL, /newline  N             string
RecordLength        Fixed          Integer/Zone-ref expr  NULL            N             integer
RemoteSource        Text, Fixed    ODBC/JDBC/OLE-DB       NULL            Y             enumeration
RequireQuotes       Text           True/False             NULL, False     N             boolean
SkipRows            Text, Fixed    >=0                    NULL, 0         N             bigint
SocketBufSize       Text, Fixed    64 KB - 2 GB           8 MB            Y             integer
TimeDelim           Text, Fixed    1 byte                 NULL, ":"       Y             string
TimeRoundNanos      Text           True/False             NULL, False     N             boolean
TimeExtraZeros
TimeStyle           Text, Fixed    24hour/12hour          NULL, 24hour    Y             enumeration
TruncString         Text           True/False             NULL, False     N             boolean
Y2Base              Text, Fixed    >=0                    NULL, 0         N             integer

Option details
The following sections describe each of the external table load options.

The BoolStyle option


Specifies the boolean style. During a load, the loader process requires all the
boolean values to use one style only.

The following table lists the boolean styles and their values.
Table 3-2. Boolean values
Style name Value
1_0 1 or 0
T_F T or F
Y_N Y or N
YES_NO YES or NO
TRUE_FALSE TRUE or FALSE

The default style is 1_0. The values can be specified in mixed case, so you could
specify a value of true, True, TRUE, or tRuE.

If you specify the YES_NO option on the command line, the system assumes that the
data in the boolean field is in the form yes or no. If the data is any of the other
values (for example: true, false, 1, 0, t, f, y, or n), the system discards the record to
the nzbad file and logs an error with the record number in the nzlog file.
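
The single-style rule can be pictured with a small Python model. This is only a sketch of the behavior described above, not Netezza code; the function name `parse_bool` and the use of `ValueError` to stand for a rejected row are assumptions made for illustration.

```python
# Style names and accepted values, copied from Table 3-2 above.
BOOL_STYLES = {
    "1_0": ("1", "0"),
    "T_F": ("T", "F"),
    "Y_N": ("Y", "N"),
    "YES_NO": ("YES", "NO"),
    "TRUE_FALSE": ("TRUE", "FALSE"),
}


def parse_bool(field: str, style: str = "1_0") -> bool:
    """Return the boolean for `field` under one style only.

    Hypothetical helper: a ValueError stands for a row that would be
    written to the nzbad file.
    """
    true_token, false_token = BOOL_STYLES[style.upper()]
    token = field.upper()  # values can be specified in mixed case
    if token == true_token:
        return True
    if token == false_token:
        return False
    raise ValueError(f"{field!r} is not valid for boolStyle {style}")
```

With the YES_NO style in effect, `parse_bool("no", "YES_NO")` is accepted, while `parse_bool("true", "YES_NO")` is rejected even though `true` is valid under another style.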

The Compress option


Specifies whether the source data file is compressed.

The valid values are true or on, false or off. The default is false. This option can
be true only if the format is set to 'internal'.

The CRinString option


Specifies whether to allow unescaped carriage returns in char/varchar and
nchar/nvarchar fields.

Acceptable values are true or false, on or off. Do not put quotation marks around
the value.
• False is the default value and treats all CR or CRLF as end-of-record.
• True accepts unescaped CR in char/varchar fields (LF becomes the only end-of-row marker).

Note: This option is different for fixed-length format.


Related concepts:
“Format options” on page 6-2

The CtrlChars option


Specifies whether to allow an ASCII value 1 - 31 in char/varchar and
nchar/nvarchar fields.

You must escape NULL, CR, and LF characters. Acceptable values are true or
false, on or off. The default is false. Do not insert quotation marks around the
value.

Note: This option is different for fixed-length format. For more information, see
“Fixed-length option changes” on page 6-2.
Related concepts:
“Format options” on page 6-2

The DataObject option


Specifies the operating system path to the source data file (or any media that can
be treated as a file).

You must specify a value for the data object path name. There is no default value
for the external table data object. When the RemoteSource option is not set (or set
to empty string), this path must be an absolute path and not a relative path. The
file name must be a valid UTF-8 string.
• For loads, this file must be an existing file with read permission for the OS user
  that initiates the load.
• For unloads, the parent directory of this file must have read and write
  permissions for the OS user that initiates the unload, and the data file is
  overwritten if it exists. Typically, the unloads are owned by the nz user, so the
  nz user must have permission to read and write files in the target path.

As a best practice, the external table locations should not be within the /nz
directory or its subdirectories because the data object files might accidentally
interfere with IBM Netezza operations, and they might consume disk space that is
needed for the operation of the Netezza database and software.

Manage External Table Locations

Starting in Release 7.1.0.1, the admin user can specify and manage the locations on
the IBM Netezza host where users can store the external table data object files.
Users who have the Manage System privilege can also manage the locations for the
external table object files.

Note: When you change or restrict the external table locations, the restrictions
apply only to the new external tables that are created on the system. Any existing
external tables continue to use their current data object path name.

You use the SHOW EXTERNAL TABLE LOCATION command to display the
current table locations. By default, data objects can be created in any of the paths
on the Netezza host that are accessible by the nz user account.
TESTDB.ADMIN(ADMIN)=> SHOW EXTERNAL TABLE LOCATION;
ALLOWDIRECTORY
----------------
*
(1 row)

The asterisk indicates that there are no restrictions on the locations for the external
table object files.

To restrict the locations for the external table data objects, the admin user or any
privileged database user can add and remove table locations using the following
steps.
1. Connect to a Netezza database as the admin user or any database user with
Manage System privilege.
2. Use the SHOW EXTERNAL TABLE LOCATION command to review the
current table location path names.
3. Delete the '*' wildcard location to remove access to all the paths that the nz
user can access.
TESTDB.ADMIN(ADMIN)=> REMOVE EXTERNAL TABLE LOCATION '*';
REMOVE EXTERNAL TABLE LOCATION
4. Add the locations where the external table objects are allowed using the ADD
EXTERNAL TABLE LOCATION command. Any new external tables created on
the system must be stored in a permitted directory.
TESTDB.ADMIN(ADMIN)=> ADD EXTERNAL TABLE LOCATION '/export/home/nz/ext_tbl';
ADD EXTERNAL TABLE LOCATION
TESTDB.ADMIN(ADMIN)=> ADD EXTERNAL TABLE LOCATION '/tmp/ext_tbl';
ADD EXTERNAL TABLE LOCATION
The locations and the object file must exist on the system and be accessible by
the nz user account before you can insert to or read from the external table.

After you specify the external table locations, you can use the SHOW EXTERNAL
TABLE LOCATION command to review the list of supported table locations. After
you restrict the external table locations, the restrictions apply when you create new
external tables. Any existing external tables continue to use their specified data
object locations.

When a user creates an external table and specifies a data object path that is not
part of the allowed location list, the command fails with an error:
TESTDB.ADMIN(ADMIN)=> CREATE EXTERNAL TABLE my_ext_tbl SAMEAS tbl_retail
USING (DATAOBJECT ('/mydir'));
ERROR: Invalid path specified in DATAOBJECT, path not allowed ’/mydir’

When a user creates an external table and specifies a data object path that is in the
allowed locations list, but the nz user does not have read or write access to the file,
the CREATE EXTERNAL TABLE command succeeds, but commands to insert data
to the table will fail with a permission error:
TESTDB.ADMIN(ADMIN)=> CREATE EXTERNAL TABLE my_ext_tbl SAMEAS tbl_retail
USING (DATAOBJECT ('/tmp/ext_tbl'));
CREATE EXTERNAL TABLE
TESTDB.ADMIN(ADMIN)=> INSERT INTO my_ext_tbl VALUES (1,2,3,4);
ERROR: /tmp/ext_tbl : Permission denied

The DateDelim option


Specifies the delimiter character that separates the date components used with the
dateStyle option.

The default is a dash '-' for all dateStyle types except MONDY[2], where the
default is ' ' (space). This option is a single-byte string.
• If you specify the option as an empty string, which means that there is no
  delimiter between the date components, you must specify days and months as
  two-digit numbers. Single-digit months and days are not supported.
• With MonDY or MonDY2, the default dateDelim option is space.
• With days and months less than 10, use either one or two digits, or a space
  followed by a single digit.
• With the dateDelim option as a space, the system allows a comma after the day.
• With any component (day, month, year) as zero, or any day/month
  inconsistency, such as August 32 or February 30, the system returns an error.

The following table lists dateDelim option examples.


Table 3-3. The -dateDelim option
No dateDelim    -dateDelim ','    -dateDelim ' ' (space)
Jan 01 2003     Jan 01,2003       Jan 01, 2003
Jan 1 2003      Jan 1,2003        Jan 1, 2003
Jan  1 2003     Jan  1,2003       Jan  1, 2003

Note: If you are not using delimiters, the date is determined as in the following
example for June 12, 2009: 06122009

The DateStyle option


Specifies how to interpret the date format.

The possible values for the DateStyle option are shown in the following table. The
example shows how the date 21 March 2014 would be represented without a date
delimiter.
Table 3-4. DateStyle
Value    Description                                                               Example
YMD      4-digit year, 2-digit month, 2-digit day. This is the default.            20140321
DMY      2-digit day, 2-digit month, 4-digit year.                                 21032014
MDY      2-digit month, 2-digit day, 4-digit year.                                 03212014
MONDY    3-character month, 2-digit day, 4-digit year.                             Mar212014
DMONY    2-digit day, 3-character month, 4-digit year.                             21Mar2014
Y2MD     2-digit year, 2-digit month, 2-digit day. Not supported for unloads.      140321
DMY2     2-digit day, 2-digit month, 2-digit year. Not supported for unloads.      210314
MDY2     2-digit month, 2-digit day, 2-digit year. Not supported for unloads.      032114
MONDY2   3-character month, 2-digit day, 2-digit year. Not supported for unloads.  Mar2114
DMONY2   2-digit day, 3-character month, 2-digit year. Not supported for unloads.  21Mar14

The 4-digit years are in the range 0001 - 9999. There is no provision for years
before 0001 CE or after 9999 CE.

For example:
• In a control file, to specify the date format MM-DD-YY (for example, 03-21-14),
  set datestyle to MDY2 and datedelim to '-'.
• On the command line, if the data file jan-01.data contains records in the
  following format, where the second field is the date:
  14255932|30/06/2002|20238|20127|40662|157|

  then, because the date value uses the DD/MM/YYYY format, load that file by
  specifying the following nzload command:
  nzload -t agg_month -df jan-01.data -delim '|' -dateStyle DMY -dateDelim '/'
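
The way a DateStyle value and a DateDelim value combine can be sketched in Python. The snippet below only reproduces the textual layouts listed in the DateStyle table; it is not how the loader parses dates, and the names `format_date`, `_ORDER`, and `_MONTHS` are hypothetical.

```python
from datetime import date

_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Component order for each style name in the DateStyle table.
_ORDER = {
    "YMD": ("Y", "M", "D"), "DMY": ("D", "M", "Y"), "MDY": ("M", "D", "Y"),
    "MONDY": ("MON", "D", "Y"), "DMONY": ("D", "MON", "Y"),
    "Y2MD": ("Y2", "M", "D"), "DMY2": ("D", "M", "Y2"),
    "MDY2": ("M", "D", "Y2"), "MONDY2": ("MON", "D", "Y2"),
    "DMONY2": ("D", "MON", "Y2"),
}


def format_date(d: date, style: str = "YMD", delim: str = "") -> str:
    """Render `d` in the layout named by `style`, joined by `delim`."""
    parts = {
        "Y": f"{d.year:04d}",        # 4-digit year
        "Y2": f"{d.year % 100:02d}",  # 2-digit year
        "M": f"{d.month:02d}",
        "MON": _MONTHS[d.month - 1],  # 3-character month
        "D": f"{d.day:02d}",
    }
    return delim.join(parts[p] for p in _ORDER[style.upper()])
```

For instance, `format_date(date(2002, 6, 30), "DMY", "/")` gives the `30/06/2002` layout from the nzload example, and `format_date(date(2014, 3, 21), "MDY2", "-")` gives the `03-21-14` layout from the control-file example.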

The DecimalDelim option


Specifies the decimal delimiter for the following data types: float, double, numeric,
time, timetz, and timestamp. This option is supported for both the text-delimited and
fixed-length formats. The default is '.'.
Related concepts:
“Decimal delimiter examples” on page A-3

The Delimiter option


Specifies the field delimiter.

The default is the pipe character '|'. You can specify characters in the 7-bit ASCII
range by using either a quoted value (for example: delimiter '|') or by its
unquoted decimal number (delimiter 124). To specify a byte value above 127, use
the decimal number. This option is a single-byte string. This option is not
supported for fixed-length format.

Note: For nzload, the default is '\t' (tab).

The system processes an input row by identifying the successive fields within that
row. A single character field delimiter separates adjacent fields. The lack of a field
delimiter between fields is an error. You can use a trailing field delimiter that
follows the last field in a row (but it is not required).

You can specify the following delimiters:

Numeric
    0xNN or NN, where NN is a hexadecimal or decimal number.
Control characters
    ^A - ^Z (low-order 5 bits) and ^a - ^z (low-order 5 bits).
Symbols
    \b backspace (8), \t horizontal tab (9), \n line feed (10), \f form feed (12),
    \r carriage return (13), \\ backslash, \' single quotation mark, \" double
    quotation mark.
Literal
    Any character, such as c (the non-control character c).

To use a character other than a 7-bit-ASCII character as a delimiter, make sure that
you specify it as a decimal or hex number. Do not specify a character literal, which
can result in errors from encoding transformation. For example, to use the hex
value 0xe9 as a delimiter (which is é in Latin9), use -d 0xe9 as the value. Do not
use -d 'é'.

Although the system accepts alphanumeric characters, to avoid ambiguity do not
select a delimiter that conflicts with the data in a field. If you use the dateDelim
and timeDelim options, select different delimiters for each type.

Note: When you are using the nzload command, you can enter escape characters
on the command line, such as \b. If you use the CREATE EXTERNAL TABLE
command, the only special character you can specify is \t ("\t").

The Encoding option


The system supports single-byte characters in Latin9 encoding, and Unicode data
in the multi-byte UTF-8 encoding. Use the encoding option to specify the type of
data in the file.
'latin9'
    The entire file contains only Latin-9 char/varchar data and no
    nchar/nvarchar data. If the file contains any nchar/nvarchar data, it is
    rejected by the load operation.
'utf8'
    The entire file uses UTF-8 encoding and contains only nchar/nvarchar data
    and no char/varchar data. If the file contains any char/varchar data, it is
    rejected by the load operation.
'internal'
    The file can contain Latin-9 data, UTF-8 data, or both, and can use any
    combination of char, varchar, nchar, or nvarchar data types. Use
    'internal' if you are not certain of the data encoding. This is the default.

Use the nzconvert command to convert character encoding before loading data
from external tables, if necessary.

Note: The encoding option is not supported for fixed-length format.

The EscapeChar option


Specifies the use of an escape character.

The character immediately following the '\' is escaped. The only supported value
is '\', and the default is no escaping.

By default, the system expects fields to be delimited by a field-delimiter character
or by an end-of-row sequence. The system assumes that all other characters are
part of the value of the field.

Although efficient, this representation has the drawback that string fields cannot
contain instances of the field delimiter. In addition, one value typically
becomes inexpressible because you use it to convey the absence of any value (that
is, that the column is null).

One solution is to use an escape character for the delimiter. For example, the
following command line demonstrates the use of the escapeChar option.
nzload -escapeChar '\' -nullValue 'NULL' -delim '|'
• |NULL| is a null input field
• |\NULL| is a non-null input field that contains the text NULL
• |\|| is a non-null input field that contains the single character |
• |\\| is a non-null input field that contains the single character \
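
The four cases above can be reproduced with a short Python model of escape handling. This is a sketch under stated assumptions, not the loader's parser: the names `split_fields` and `_finish` are hypothetical, the null token is compared exactly (a simplification), and a field counts as non-null as soon as any character in it was escaped.

```python
from typing import List, Optional


def _finish(chars: List[str], escaped_any: bool,
            null_value: str) -> Optional[str]:
    """Turn collected characters into a field value; None marks a null."""
    text = "".join(chars)
    # Only an unescaped field equal to the null token is treated as null.
    if not escaped_any and text == null_value:
        return None
    return text


def split_fields(row: str, delim: str = "|", escape: str = "\\",
                 null_value: str = "NULL") -> List[Optional[str]]:
    """Split one input row on unescaped delimiters."""
    fields: List[Optional[str]] = []
    chars: List[str] = []
    escaped_any = False
    i = 0
    while i < len(row):
        ch = row[i]
        if ch == escape and i + 1 < len(row):
            chars.append(row[i + 1])  # the next character is taken literally
            escaped_any = True
            i += 2
            continue
        if ch == delim:
            fields.append(_finish(chars, escaped_any, null_value))
            chars, escaped_any = [], False
        else:
            chars.append(ch)
        i += 1
    fields.append(_finish(chars, escaped_any, null_value))
    return fields
```

Calling `split_fields(r"NULL|\NULL|\||\\")` yields a null, the text NULL, a single `|`, and a single backslash, in that order.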

Note: This option is not supported for fixed-length format.

The FillRecord option


Specifies whether to allow an input line with fewer columns than the table
definition. Missing trailing input fields are treated as nulls if the corresponding
columns are nullable. The default is false.

The system expects one input field for every column in the schema of a target
table, and rejects a row with fewer fields. If you specify the fillRecord option, the
system allows omitting one or more trailing (rightmost) fields if all corresponding
columns can be null.
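
As a rough illustration of this rule, the following Python sketch pads a short row with nulls only when fillRecord is on and every missing column is nullable. The function name `apply_fill_record` and the use of `ValueError` for a rejected row are assumptions for this example only.

```python
from typing import List, Optional, Sequence


def apply_fill_record(fields: Sequence[Optional[str]],
                      nullable: Sequence[bool],
                      fill_record: bool = False) -> List[Optional[str]]:
    """Return one value per column, or raise ValueError for a rejected row.

    `nullable[i]` states whether column i of the target table accepts nulls.
    """
    n_cols = len(nullable)
    if len(fields) > n_cols:
        raise ValueError("row has more fields than the table has columns")
    missing = n_cols - len(fields)
    if missing == 0:
        return list(fields)
    if not fill_record:
        raise ValueError("row has fewer fields than the table has columns")
    if not all(nullable[len(fields):]):
        raise ValueError("a missing trailing field maps to a NOT NULL column")
    return list(fields) + [None] * missing  # omitted trailing fields are null
```

For a three-column table whose last two columns are nullable, `apply_fill_record(["1"], [False, True, True], fill_record=True)` returns `["1", None, None]`; the same row is rejected when `fill_record` is left at its default.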

Note: This option is not supported for fixed-length format.

The Format option


Specifies the data format of the source file to load and unload.

The valid values are as follows:

'text' (default)
    Data is in text-delimited format
'fixed'
    Data is in new fixed-length format
'internal'
    Data is in compressed binary format. To use this value, the compress
    option must be set to true. Internal format is used by IBM Netezza
    processes such as backup and restore. It is not a documented interface and
    is subject to change between releases.

The IgnoreZero option


Specifies whether to discard byte value zero in char() and varchar() fields.

The default is false. If true, the command accepts binary value zeros in input fields
and discards them.

Note: This option is not supported for fixed-length format.

The IncludeHeader option


Includes the table column names as headers in the external table file.

By default, the setting is false, which omits the table column names from the file.
Set the option to true to add the column names as header values to the external
table file.

Note: This option is only for unloading.

The IncludeZeroSeconds option


Specifies that "00" seconds values are unloaded to the external table.

For example, a time value such as 12:34:00 or 12:34 is unloaded to the external
table in the format 12:34:00. The default is false.

Note: This option is not supported for fixed-length format, and is only for
unloading.

The Layout option


Specifies the location of fields of the input record. The format is a series of
comma-separated zone definitions within braces.

Note: This option is required for and used only with the fixed-length format. For
more information, see “Fixed-length only options” on page 6-2.
Related concepts:
“Format options” on page 6-2

The LogDir option


Specifies the directory in which the nzlog and nzbad files are generated for loads.
This option is not used for unloads.

The default value is '/tmp'. When users run remote loads from Windows clients
(through ODBC/JDBC), the default output directory is mapped to "C:\". The
directory name must be a valid UTF-8 string.

The MaxErrors option


Specifies the number of errors at which the system stops processing rows. If the
count of rejected rows reaches this threshold, the system immediately ends and
rolls back the load.

The default value is 1. This default causes the system to commit a load only if it
contains no errors. A maxErrors value n (where n is greater than 1) allows the first
n-1 row rejections to be recoverable errors, not including the number of rows
processed in the skipped row range.

Use this option to specify a different value, from 0 (unlimited errors) up to
2,147,483,647 (the largest signed 32-bit integer).
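
The threshold behavior can be modeled in a few lines of Python. This is a simplified sketch of the description above, not the loader itself: the function name `load_outcome` is hypothetical, each input row is reduced to a flag that says whether it parsed cleanly, and skipped rows are left out of the model.

```python
from typing import Iterable, Tuple


def load_outcome(row_ok: Iterable[bool], max_errors: int = 1) -> Tuple[str, int]:
    """Return ("committed", inserted_rows) or ("rolled back", 0).

    `row_ok` yields True for a good row and False for a rejected row.
    A max_errors value of 0 means that the number of errors is unlimited.
    """
    rejected = 0
    inserted = 0
    for ok in row_ok:
        if ok:
            inserted += 1
            continue
        rejected += 1
        # Reaching the threshold ends the load and rolls it back.
        if max_errors != 0 and rejected >= max_errors:
            return ("rolled back", 0)
    return ("committed", inserted)
```

With the default of 1, a single rejected row rolls the load back; with a value of 2, the same input commits because the one rejection stays below the threshold.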

This option is different for fixed-length format. For more information, see
“Fixed-length option changes” on page 6-2.
Related concepts:
“Format options” on page 6-2

The MaxRows option


Specifies that processing stops after this number of initial rows. To limit the data
that is loaded, you can use a LIMIT clause with the SELECT statement. The default is
0 (load all rows).

After processing a row (whether inserted, skipped or rejected), the system uses
these guidelines to look for another input row:
• If you did not specify the maxRows option, the system attempts to locate the next
  input row.
• If you specified the maxRows option and the input row counter is equal to the
  maxRows count, the system ends the load and commits all inserted records, not
  including the rows processed in the skipped row range. Otherwise, the system
  attempts to locate the next input row.

The NullValue option


Specifies the string to use for the null value, which can be a UTF-8 string of at
most 4 bytes. The default is 'NULL'.

You can specify a value such as a space (' ') or any string up to four characters.
Conceptually a field contains either a value or an indication that there is no value.
The system provides some flexibility for how you indicate that a field contains no
value.

The system determines the type of a field and whether it is null by inspecting the
corresponding column declaration:
• If there is no value, the system sets the corresponding value in the candidate
  binary record to null.
• If you declared the target column “not null,” then an absence of a value is an
  error.
• If a field does not indicate null, the system assumes that it contains a value. The
  system analyzes the contents of that field, converts its textual input
  representation to binary, and sets the corresponding value in the candidate
  binary record to that value.
Related concepts:
“Column constraint rules for empty strings” on page 2-9

The QuotedValue option


Specifies whether data values are quoted or not. The default is NO.

Specify SINGLE or YES to require single quotation marks, or DOUBLE to require
double quotation marks. You can precede the opening quotation mark or follow the
closing quotation mark with spaces. You can use the actual quotation characters if
you enclose them in double quotation marks. The system recognizes the end of the
field by a field-delimiter character or an end-of-row sequence.

The system recognizes a quoted value when the first non-space character is the
quotation character specified in the quotedValue option. If the first non-space
character is not the specified quotation character, then the system handles it
according to the normal rules. In particular, leading or trailing spaces in string
fields are considered part of the value of the string.

For example, the following command line demonstrates the use of the quotedValue
option.
nzload -quotedValue SINGLE -nullValue 'NULL' -delim '|'
• |NULL| is a null input field
• |'NULL'| is a null input field
• | I'm | is a non-null input field that contains the text “I’m”
• | 'I''m' | is a non-null input field that contains the text “I’m”
• | '|' | is a non-null input field that contains the single character “|”
• |' '| is a non-null input field that contains a single space
• | | is a non-null input field that contains a single space
• | '' | is a non-null input field that contains a zero-length string
• || is a non-null input field that contains a zero-length string

Unlike the escapeChar option, the quotedValue option is not able to force the
system to accept the nullValue token as a valid non-null input value. The system
overhead for processing quoted value syntax is much greater than for the default
unquoted syntax. In addition, except for strings that contain three or more field
delimiters that need to be escaped and no embedded quotation marks, using the
quotedValue option results in more bytes of input data than using the escapeChar
option. When you have a choice, use unquoted syntax.

If you expect all values in all input fields (string or otherwise) to be uniformly
enclosed in quotation marks, then use the requireQuotes option to cause the
system to enforce this usage. Using the requireQuotes option reduces the parsing
overhead and provides extra robustness.

Note: This option is not supported for fixed-length format.


Related reference:
“The RequireQuotes option” on page 3-12

The RecordDelim option


Specifies the string literal to use as the row/record delimiter.

Valid values must be a maximum 8-byte UTF-8 string.

Note: This option is used only with the fixed-length format. For more information,
see “Fixed-length only options” on page 6-2.
Related concepts:
“Format options” on page 6-2

The RecordLength option


Specifies the length of the entire record. Includes the length itself, but does not
include the RecordDelimiter.

Note: This option is used only with the fixed-length format. For more information,
see “Fixed-length only options” on page 6-2.
Related concepts:
“Format options” on page 6-2

The RemoteSource option
Specifies that the source data file is remote. The valid values are ODBC,
JDBC, OLE-DB, or the empty string.

External tables created with the remote source value set to ODBC, JDBC, or
OLE-DB are usable only through those values. External tables created with the
remote source not set (or set to empty string) are usable from any client (the source
data file path is assumed to be on the IBM Netezza host, even if the load or
unload is initiated remotely from a different host).

The nzsql command does not support remote loads or unloads to external tables.
You can only create external tables remotely. The command supports loads and
unloads locally on the host.

This option is automatically set to ODBC if the host name option is set to anything
but local host or the reserved IP address (127.0.0.1).

The RequireQuotes option


Specifies whether quotation marks are mandatory.

The default is false. If set to true, the quotedValue option must be set to YES,
SINGLE, or DOUBLE.

Note: This option is not supported for fixed-length format.


Related reference:
“The QuotedValue option” on page 3-10

The SkipRows option


Specifies the number of initial rows to skip before loading the data.

The default is 0 (none). After the system has a candidate binary record from an
input row, it determines whether to insert that record into the target table:
• If you did not specify this option, the system inserts every record.
• If you specified this option and the input row counter is less than or equal to
  the skipRows count, the system discards the candidate binary record (skipped).
  Otherwise, the system inserts the record.

Note: If you use the skipRows option, the system skips that number of rows, and
then begins the count for the maxErrors option, the maxRows option, or both if you
specify them.

This option cannot be used to skip header rows in a data file. Even the skipped
rows are processed first, so the data in the header rows must be valid with
respect to the external table definition.

This option can be helpful for testing purposes. If you set this option to a
maximum value, you can validate that the data file is correct before loading the
rows into a user table.

The SocketBufSize option


Specifies the chunk size at which to read the data from the source file, expressed in
bytes.

Valid values range from 64 KB to 800 MB, with a default value of 8 MB. Values
outside this range result in a system notice that the value is reset to the
appropriate minimum or maximum level. You can use this option to tune the
performance of loads based on the speed at which the source data is available for
loading.

The TimeDelim option


Specifies the single-byte character that separates the time components. The default
is a colon character ':'.
• If you specify the timeDelim option as an empty string, you must specify the
  hour, minutes, and optional seconds as two-digit numbers.
• If you specify the 12-hour format, you can precede the AM or PM token with a
  single space. The AM and PM tokens are not case-sensitive.

The system checks for syntax and range errors. If an error occurs, the system discards
the record to the nzbad file and logs an error with the record number in the nzlog file.

The TimeRoundNanos option


Rounds the time value to six fractional-seconds digits. You can use the
timeRoundNanos option to specify that the system allows, and rounds, non-zero
digits that are smaller than microsecond precision.
• If you do not use the timeRoundNanos option, a value is accepted if it can be
  stored without loss of precision.
• If you specify this option, the value is accepted, even when full precision of any
  fractional seconds cannot be preserved. In this case, the value is rounded.
For example, consider the following timestamps:
1999/12/31 23:59:59.9999994
1999/12/31 23:59:59.9999995
Both of these timestamps specify a smaller than microsecond resolution. Without
the option, each record would be rejected. Using the option, the first sample
timestamp would round to a 1999/12/31 23:59:59.999999 value. The second
sample would round to a 2000/01/01 00:00:00.0 value.
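
The two sample timestamps can be checked with a small Python model. This is a sketch only: the function name `load_timestamp` is hypothetical, a `ValueError` stands for a rejected record, and round-half-up is an assumption that happens to match both samples (the guide does not name the rounding mode).

```python
from datetime import datetime, timedelta
from decimal import Decimal, ROUND_HALF_UP


def load_timestamp(text: str, round_nanos: bool = False) -> datetime:
    """Parse 'YYYY/MM/DD HH:MM:SS[.fraction]' to microsecond precision."""
    whole, _, frac = text.partition(".")
    base = datetime.strptime(whole, "%Y/%m/%d %H:%M:%S")
    # Exact fractional seconds, expressed in microseconds.
    exact_micros = (Decimal("0." + frac) if frac else Decimal(0)) * 1_000_000
    micros = exact_micros.quantize(Decimal(1), rounding=ROUND_HALF_UP)
    if micros != exact_micros and not round_nanos:
        raise ValueError(f"{text!r} needs more than microsecond precision")
    # A rounded value of 1,000,000 microseconds carries into the next second.
    return base + timedelta(microseconds=int(micros))
```

With `round_nanos=True`, the first sample becomes 23:59:59.999999 on 1999/12/31 and the second carries over to midnight on 2000/01/01; without it, both raise the error.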

Note: This option is not supported for fixed-length format. It is also referred to as
the TimeExtraZeros option.

The TimeStyle option


Specifies the time format ('24HOUR', '12HOUR') used in the data file. The default
is '24HOUR'.

The TruncString option


Specifies how to process strings that are longer than their declared storage.

When a string is larger than its declared storage size, you can use this option to
define how to process records with those strings.
• A value of True causes the system to truncate any string value that exceeds its
  declared char/varchar storage.
• A value of False causes the system to report an error when a string exceeds its
  declared storage. This is the default behavior.

Note: This option is not supported for fixed-length format.

The Y2Base option
If you specify the Y2-style date, use the -y2Base option to specify the start of the
100-year range.

The following table provides some examples of date ranges and their
corresponding input values.
Table 3-5. The -y2Base option
Wanted range   1900...1999    1923...2022    1976...2075    2000...2099
Option         -y2Base 1900   -y2Base 1923   -y2Base 1976   -y2Base 2000
In Y2 input
00             1900           2000           2000           2000
01             1901           2001           2001           2001
02             1902           2002           2002           2002
...
24             1924           1924           2024           2024
25             1925           1925           2025           2025
...
76             1976           1976           1976           2076
77             1977           1977           1977           2077
...
98             1998           1998           1998           2098
99             1999           1999           1999           2099
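
The mapping in the table follows from one piece of modular arithmetic: the result is the only year in the 100-year window that starts at the base and ends with the given two digits. A minimal Python sketch, with the hypothetical function name `expand_y2`:

```python
def expand_y2(two_digit_year: int, y2_base: int) -> int:
    """Map a 2-digit year into the 100-year range that starts at y2_base."""
    if not 0 <= two_digit_year <= 99:
        raise ValueError("a Y2 year must be in the range 00 - 99")
    # Offset of the wanted year from the base, always in the range 0 - 99.
    offset = (two_digit_year - y2_base) % 100
    return y2_base + offset
```

For example, with a base of 1923, input 24 maps to 1924 and input 00 maps to 2000, which matches the second column of the table.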

External table option processing


This section contains more information about how the system processes the
external table options.

Row Counts
The system uses a line-oriented input format where one line of text is an input
row. It operates by isolating successive rows in the input stream. For each new
row, the system increments a row counter (starting at 1) and analyzes the contents
of the row.

Two kinds of errors can occur during the analysis:


• The input text might not match the expected format.
• A field value might fail to meet a requirement imposed by the target table
  schema.

If a row contains no errors, the system converts the row into a candidate binary
record.

Bad rows
When the system encounters an error processing a row, it stops analyzing the row,
appends the row to the bad rows file, writes a supporting diagnostic message to
the nzlog file that describes the position and nature of the error, and increments a
rejected rows counter.

Input row delineation
Input rows are separated by any of the common end-of-line conventions:
<CR><LF>, <LF><CR>, <CR>, or <LF>. In UNIX environments, <LF> is
commonly known as NewLine. The last row or last line does not need an
end-of-line character.

The <CR><CR> or <LF><LF> pairs are not valid end-of-line sequences. Instead
each pair encloses an empty row that contains no values. The system considers
such an empty row valid only if you specified the fillRecord option, and you
specified that every column in the target table is able to be set to null.
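
The end-of-line rules can be sketched with a regular expression in Python. This is an illustration of the conventions listed above, not the loader's implementation; the name `split_rows` is hypothetical. The two-character sequences are tried before the single characters so that a pair such as CR LF counts as one row terminator.

```python
import re
from typing import List

# Two-character conventions first, then the single-character ones.
_EOL = re.compile(r"\r\n|\n\r|\r|\n")


def split_rows(data: str) -> List[str]:
    """Split raw input text into rows; an empty string is an empty row."""
    rows = _EOL.split(data)
    # The last row does not need an end-of-line character, so a trailing
    # terminator must not produce an extra empty row.
    if rows and rows[-1] == "":
        rows.pop()
    return rows
```

In this model, `split_rows("a\n\nb")` returns three rows, the middle one empty, which corresponds to the empty row enclosed by an LF LF pair.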

Input fields and table columns


The system determines the shape of input rows by inspecting the schema of the
target table. The fields are paired-up left-to-right with the columns in the target
schema. When the system locates the start of a field, the declared type of the
corresponding target column guides further processing.

Note: It is an error for a row to contain more fields than the number of columns in
the target table.

String and non-string fields


If an input field corresponds to a column declared as char, nchar, varchar, or
nvarchar, the system considers it a string field. All other types are considered
non-string fields. This distinction is important because spaces are significant within
string fields but not in non-string fields.

Note: An empty field or a field that contains only spaces can represent a legitimate
string value, but can never be a legitimate non-string value.

The system uses the following rules based on whether the field is a string field:
• For a string field, all characters from the beginning of the field to the
  terminating delimiter or end of row sequence contribute to the value of the field.
• For a non-string field, the system skips any leading spaces, interprets or converts
  the contents of the field, and skips any trailing spaces.

The string and non-string distinction also affects the details of how a field indicates
that it is null. For more information, see “Handle the absence of a value.”
Related concepts:
“Handle the absence of a value”

Handle the absence of a value


In SQL, a record must include a value if a column is declared as not null. When a
record contains no value for a column, the column is considered to be null. The
system provides an explicit and implicit method for conveying nullness.
v The explicit method includes a specific token in the field instead of a value. By
default, this token is the word “null” (not case-sensitive). You can use the
nullValue option to change this token to any other 1 - 4 character alphabetic
token. You can precede or follow an occurrence of the explicit null token in a
non-string field with adjacent spaces. For the system to recognize an explicit null
token in a string field, the token cannot have preceding or trailing adjacent
spaces. The explicit null token method makes it impossible to express a string
that consists of exactly the text of the null token.

v The implicit method interprets an empty field as null. This method is always
available to non-string fields independent of any nullValue option setting and
works even if the non-string field contains spaces. You can use the implicit
method on string fields only if you set the nullValue option to the empty string
('').
The system considers a string field empty (potentially null) only if it contains
truly zero characters (no spaces). Setting nullValue to the empty string makes it
impossible to set any character varying (alias varchar(n)) column to an empty,
zero-length string. In other words, if the system encounters an empty string and
the nullValue is set to '', then the system treats the empty string as a null value.
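
The explicit and implicit null rules can be sketched as a small Python model. This is only an illustration of the rules described in this section, not the loader's implementation. The function name is_null_field is hypothetical, and the sketch assumes that a custom nullValue token is matched without regard to case, as the default token is.

```python
def is_null_field(raw: str, is_string: bool, null_value: str = "null") -> bool:
    """Return True if a raw input field would be treated as null (illustrative model)."""
    if is_string:
        # String fields: spaces are significant, so the token must match exactly,
        # with no leading or trailing spaces. When null_value is '', only a truly
        # zero-length field matches (the implicit method for string fields).
        return raw.lower() == null_value.lower()
    # Non-string fields: leading and trailing spaces are skipped first.
    content = raw.strip(" ")
    if content == "":
        # Implicit method: always available for non-string fields.
        return True
    # Explicit method: the token may be surrounded by spaces.
    return null_value != "" and content.lower() == null_value.lower()


print(is_null_field(" NULL ", is_string=False))          # explicit token with spaces
print(is_null_field(" null", is_string=True))            # leading space: a real string
print(is_null_field("", is_string=True, null_value=""))  # implicit null for strings
```

Under these rules the three calls print True, False, and True.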
Related concepts:
“String and non-string fields” on page 3-15

Load continuation
If you enable load continuation with the allowReplay option, or set the session
variable LOAD_REPLAY_REGION to true, the system ensures that a simple load
that uses external tables continues after the system is paused and resumed. You do
not have to stop and resubmit the load.

If no value is specified for the allowReplay option, or the option setting is 0, the
system defaults to the Postgres default setting. If the setting is a valid non-zero
number, it specifies the number of allowable restarts.

When you enable load continuation, the system holds the records to be sent to the
SPUs in the replay region in host memory. After the system sends the data in this
region to the SPUs, it does a partial commit that forces all the unwritten data to
the disks and allows the system to reuse the data buffers of the replay region. If a
SPU reboots or resets, the system rolls back to the last partial commit, and
reprocesses and resends the data.

Note: This option has a performance impact that depends on the speed of the
incoming data. In addition, system memory is used for the data buffering that
enables loads to be continued. When the buffer memory is exhausted, new loads
wait until the needed memory becomes available.

Load continuation cannot operate on any table that has one or more materialized
views in an active state. Before enabling load continuation, suspend the associated
materialized views. You can suspend active materialized views either through the
NzAdmin tool or by issuing the ALTER VIEWS command. Sample syntax for
ALTER VIEWS follows.
ALTER VIEWS ON <table> MATERIALIZE SUSPEND

After loading is completed, you can update and activate the materialized views for
the table. Sample syntax follows.
ALTER VIEWS ON <table> MATERIALIZE REFRESH
Related concepts:
“Session variables” on page 3-17

Legal characters
Input is composed of the printing characters (bytes 33-255), space (byte 32),
horizontal tab (byte 9), line feed (byte 10), and carriage return (byte 13). By default
you cannot use the nonprinting control characters.

v Specify the ctrlChars option to allow control characters (bytes 1-8, 11-12, and
14-31) to appear within strings. In this case, only bytes 0, 10, and 13 are not
allowed.
v Specify the crInString option to allow unescaped carriage returns (CR) in
char/varchar fields. If you specify the crInString option, line feed (LF) becomes
the default end-of-row indicator.
v Specify the escapeChar option so that a backslash (\) acts as the escape
character for the character that follows it. In this way, a field can contain the zero
(byte 0), line feed (byte 10), carriage return (byte 13), or the closing delimiter.
v Specify the ignoreZero option to cause the system to check every character for
zero. This causes the system to skip over each zero it finds and to consider the
next character. If you specify this option, you cannot include a zero byte in a
string.
For example, assume <nul> is a null byte, the field delimiter is '|' and you
specified ignoreZero:
..|<nul>AB<nul>CDEF<nul>|..

fills a char(6) column with 'ABCDEF'.


..|<nul>127<nul>|..

fills a byteint column with binary 01111111 (= 0x7F).
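
The two ignoreZero examples can be reproduced with a short Python sketch. It only models the described behavior (skip every zero byte, then interpret what remains); the helper name drop_zero_bytes is hypothetical and is not part of any Netezza tool.

```python
def drop_zero_bytes(field: bytes) -> bytes:
    """Model ignoreZero: skip every zero byte and keep the remaining bytes."""
    return field.replace(b"\x00", b"")


# ..|<nul>AB<nul>CDEF<nul>|..  loaded into a char(6) column
char6_value = drop_zero_bytes(b"\x00AB\x00CDEF\x00").decode("ascii")

# ..|<nul>127<nul>|..  loaded into a byteint column
byteint_value = int(drop_zero_bytes(b"\x00127\x00").decode("ascii"))

print(char6_value)                   # ABCDEF
print(format(byteint_value, "08b"))  # 01111111
```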

The following table lists the end-of-row and control characters that are allowed
with the different nzload system options. The check mark indicates that the option
is specified or allowed.
Table 3-6. Control characters and end of record characters
Options End of record Control characters allowed within strings
-crInString -ctrlChars lf cr crlf lfcr 0 1-8 ht lf 11 12 cr 14-31
U U U U U
U U U U
U U U U U U U U U U
U U U U U U U U U

Note: In fixed-length format, control characters are treated differently.


Related concepts:
“Fixed-Length format” on page 2-15

Session variables
You can use the following session variables as nzload options.
v LOAD_REPLAY_REGION
Specifies that a simple load using external tables has the ability to continue after
the system has been paused and resumed.
v MAX_QUERY_RESTARTS
Specifies the number of restarts allowed for load continuation.
v LOAD_LOG_MAX_FILESIZE
Specifies the maximum allowed size in MB for the log file.
v NZ_SCHEMA
For Netezza systems that run Release 7.0.3 or later that support multiple
schemas in a database, specifies the schema into which data should be loaded.
The NZ_SCHEMA value can be helpful for users who have to support loads into
release 7.0.3 multiple schema systems as well as systems which do not support
multiple schemas. You can keep the same nzload commands and set
NZ_SCHEMA when connecting to multiple schema systems, and unset the
variable before running the load scripts against single-schema systems.
Related concepts:
“Load continuation” on page 3-16

Chapter 4. The nzload command
This section describes the nzload command, how it works, and how to use it to
load data into a Netezza appliance.
Related concepts:
Appendix A, “Examples and grammar,” on page A-1

How the nzload command works


The nzload command is a SQL CLI client application that you can use to load data
from the local host or a remote client, on all the supported client platforms.

The nzload command processes command-line load options to send queries to the
host to create an external table definition, run the insert/select query to load data,
and when the load completes, drop the external table.

The nzload command connects to a database with a database user name and
password, like any other IBM Netezza client application. The user name specifies
an account with a particular set of privileges, and the system uses this account to
verify access.

Note: While you can use the nzload command as an ODBC client application, it
neither requires nor works with a Data Source Name (DSN). It bypasses the
ODBC Driver Manager and connects directly to the Netezza ODBC driver.

Protection and privileges


To run the nzload command, you must have the Create External Table privilege
and access privileges to that table or database (List, Insert, Select).

If you issue the nzload command from the IBM Netezza appliance host itself, and
the user who issues the command is not the user nz, you must do one of the
following tasks:
v Ensure that the GROUP nz has Read permissions for the data file to load.
v Use the -host option with the nzload command (such as nzload -host
<hostname>).

For more information, see the IBM Netezza System Administrator’s Guide.

Concurrency and transactions


You can run multiple nzload commands in parallel that add records to the same or
different tables. While loading, you can run concurrent queries, inserts, updates,
and deletes against committed records in the target tables.

The nzload command conducts all insertions into the target table within a single
transaction. The nzload command commits the transaction at the end of the job,
provided it does not detect any unrecoverable errors. Only after the nzload
command commits the transaction are the newly loaded records visible to other
queries. If a load error occurs while multiple concurrent loads are running,
only the load that encountered the error fails to complete.

While the nzload command is running, it sends records to the SPUs along with the
current transaction ID. When a SPU receives new records, it immediately allocates
resources and writes the records to the database or the table on the disk.

If the nzload command cannot commit the transaction, these storage resources
remain allocated. To free up this disk space, use the nzreclaim command on the
specific table or database. For more information about the nzreclaim command, see
the IBM Netezza System Administrator’s Guide.

If you cancel an nzload job, the nzload command does not commit the transaction.

Program invocation
The nzload command is a command-line program that accepts input values from
multiple sources. The precedence order of the input values is as follows:
v Command line
v Control file. Without a control file, you can only do one load at a time, and the
use of a control file allows multiple loads. For more information about the
control file, see “The nzload control file” on page 4-5.
v Environment variables (only used for user, password, database, and host)
v Built-in defaults
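
The precedence order can be sketched in Python as follows. This is a minimal model of the lookup order only, not how nzload is implemented. The function resolve_option and the dictionaries passed to it are hypothetical; the environment variable names are the ones listed for the nzload options (NZ_USER, NZ_PASSWORD, NZ_DATABASE, NZ_HOST), and all dictionary keys are assumed to be lowercase.

```python
import os

# Only these options can be supplied through the environment.
ENV_VARS = {
    "user": "NZ_USER",
    "password": "NZ_PASSWORD",
    "database": "NZ_DATABASE",
    "host": "NZ_HOST",
}


def resolve_option(name, cli, control, defaults, environ=None):
    """Return the effective value of an option using the documented precedence."""
    environ = os.environ if environ is None else environ
    key = name.lower()  # option names are not case-sensitive
    if key in cli:  # 1. command line
        return cli[key]
    if key in control:  # 2. control file
        return control[key]
    env_name = ENV_VARS.get(key)  # 3. environment (four options only)
    if env_name is not None and env_name in environ:
        return environ[env_name]
    return defaults.get(key)  # 4. built-in default


env = {"NZ_HOST": "nzhost1", "NZ_DATABASE": "dev"}
print(resolve_option("host", {}, {}, {"host": "localhost"}, environ=env))
print(resolve_option("Database", {"database": "sales"}, {}, {}, environ=env))
```

In this sketch the first call falls through to the environment (nzhost1), and the second is decided by the command line (sales).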

Option names are not case-sensitive. Every option has a standard name for use in
either the command line or the control file. For more information about the input
values, see Table 4-1 on page 4-3.

Many options include a token argument, which you can enclose in either single or
double quotation marks. The nzload command ignores letter casing for the
characters in option token arguments (for example -boolStyle YES_NO is equivalent
to -boolStyle yes_no).

Note: You must use quotation marks for options that require a punctuation
character as a token, and use an escape character if quotation marks are part of the
argument.

Load status information


When active loads are running on your NPS system, you can use the
_v_load_status view to display information about the load operations.

You can query the system view _v_load_status to display details about the
progress of loads that are running on the system. The view shows information
about the load operations such as the table name, database name, data file, number
of processed rows, and number of rejected rows. More information has been added
to the load log file for performance-related details about the load operation.

A sample view query follows. The output has been reformatted to fit the page
width.
SYSTEM.ADMIN(ADMIN)=> select * from _v_load_status;
PLANID | DATABASENAME | TABLENAME | SCHEMANAME | USERNAME | BYTESPROCESSED | ROWSINSERTED |
--------+--------------+-----------+------------+----------+----------------+--------------+
ROWSREJECTED | BYTESDOWNLOADED
-------------+-----------------
2932 | SYSTEM | LINEITEM | ADMIN | ADMIN | 142606226 | 1136931 |
4 | 131911476
(1 row)

The nzlog file also contains two fields, Rows/Second and Bytes/Second, that provide
more performance-related details about the load operation.

nzload command syntax


The nzload command takes options and arguments. You can accept the defaults or
specify options on the command line, in the control file, or by using environment
variables.

Syntax

The nzload command uses the following syntax:


nzload [-h|-rev] [options]

Inputs

The nzload command uses many of the options for external tables as described in
Chapter 3, “External table load options,” on page 3-1. Particular options for nzload
are shown in the following table.
Table 4-1. The nzload options
Option Description
-cf filename Specifies the control file. For more information, see “The nzload
control file” on page 4-5.
-df filename Specifies the data file to load. If you do not specify a path, the
system uses the special token <stdin> to store the file path string.
Corresponds to the DataObject external table option.
-lf filename Specifies the log file name. If the file exists, the system appends to it.
-bf filename Specifies the bad or rejected rows file name. If the file exists, the
system overwrites it.
-outputDir dir Specifies the output directory for the log and bad or rejected rows
files. Corresponds to the LogDir external table option.
-logFileSize n Session variable (LOAD_LOG_MAX_FILESIZE) that specifies the
size (in MB) of the log and bad or rejected rows files. The default is
2000 MB (2 GB).
-fileBufSize, -fileBufByteSize Specifies the chunk size (MB for fileBufSize or
bytes for fileBufByteSize) at which to read the data from the source file.
Corresponds to the SocketBufSize external table option.
-allowReplay, -allowReplay n Session variables (LOAD_REPLAY_REGION and
MAX_QUERY_RESTARTS) that specify the number of query restarts
for load continuation if a SPU is reset or failed over. If n is a valid
non-zero number, it specifies the number of allowable query
restarts. If no value is specified, or n is 0, the system defaults to the
Postgres default setting.

Additional options

The nzload command takes the following additional options:


Table 4-2. Additional options for nzload
Option Description
-u user Specifies the database user name [NZ_USER].

-pw password Specifies the password of the Netezza user [NZ_PASSWORD].
-host name Specifies the host name or IP address [NZ_HOST]. Runs on the
local host if not specified here. If you set this option to any name
but localhost or any IP address but the reserved one (127.0.0.1),
the system sets the remotesource option to ODBC.
-caCertFile path Specifies the path name of the root CA certificate file on the
client system. This argument is used by IBM Netezza clients who
use peer authentication to verify the Netezza host system. The
default value is NULL which skips the peer authentication
process.
-securityLevel level Specifies the security level that you want to use for the session.
The argument has four values:
preferredUnSecured
This value is the default value. Specify this option when
you would prefer an unsecured connection, but you
accept a secured connection if the Netezza system
requires one.
onlyUnSecured
Specify this option when you want an unsecured
connection to the Netezza system. If the Netezza system
requires a secured connection, the connection is rejected.
preferredSecured
Specify this option when you want a secured connection
to the Netezza system, but you accept an unsecured
connection if the Netezza system is configured to use
only unsecured connections.
onlySecured
Specify this option when you want a secured connection
to the Netezza system. If the Netezza system accepts
only unsecured connections, or if you are attempting to
connect to a Netezza system that is running a release
before 4.5, the connection is rejected.
Note: If you specify an invalid value for the -securityLevel
argument of the nzload command, the command defaults to the
preferredUnSecured level.
-db database Specifies the database to load [NZ_DATABASE].
-schema schema For a Netezza system that supports multiple schemas in a
database, this option specifies the schema in which to load the
table. If you do not specify the -schema option, the system uses
the value of the NZ_SCHEMA environment variable. If
NZ_SCHEMA is not set, the system uses the default schema for
the database.
-t table Specifies the table name. You can specify a fully qualified name
for this value.
-portnumber Specifies the port to use, which you can use to override the
default.
-loginTimeout <int-seconds> Specifies the login timeout, expressed in seconds. This
option overrides the default value of 30 seconds.
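
The four -securityLevel values in the preceding table can be summarized with a small decision function. This Python sketch is only a reading aid for the descriptions in the table. The host_policy values (secured_only, unsecured_only, either) are hypothetical labels for how the Netezza system is configured, and the sketch does not model the rejection of onlySecured on releases before 4.5.

```python
LEVELS = ("preferredUnSecured", "onlyUnSecured", "preferredSecured", "onlySecured")
HOST_POLICIES = ("secured_only", "unsecured_only", "either")  # hypothetical labels


def negotiate(level: str, host_policy: str) -> str:
    """Return 'secured', 'unsecured', or 'rejected' for a client level and host policy."""
    if host_policy not in HOST_POLICIES:
        raise ValueError(f"unknown host policy: {host_policy}")
    # Token arguments are not case-sensitive; an invalid value falls back
    # to the default level, preferredUnSecured.
    by_lower = {name.lower(): name for name in LEVELS}
    level = by_lower.get(level.lower(), "preferredUnSecured")
    if level == "onlyUnSecured":
        return "rejected" if host_policy == "secured_only" else "unsecured"
    if level == "onlySecured":
        return "rejected" if host_policy == "unsecured_only" else "secured"
    if level == "preferredUnSecured":
        return "secured" if host_policy == "secured_only" else "unsecured"
    # preferredSecured
    return "unsecured" if host_policy == "unsecured_only" else "secured"


for lvl in LEVELS:
    print(lvl, [negotiate(lvl, policy) for policy in HOST_POLICIES])
```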

Outputs

The nzload command exits with the following codes:


0 Successful. All input records were inserted.
1 Failed. No records were inserted because of an error or errors found during
the load.
2 Successful. Errors were found in the input but did not exceed the error
threshold (-maxErrors), and the good records were inserted.
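
A calling script can branch on these exit codes. The following Python sketch shows one possible mapping; the names EXIT_MEANINGS, describe_exit, and records_loaded are hypothetical, and only the meanings of codes 0, 1, and 2 come from the list above.

```python
EXIT_MEANINGS = {
    0: "successful: all input records were inserted",
    1: "failed: no records were inserted",
    2: "successful: good records inserted, errors stayed under -maxErrors",
}


def describe_exit(code: int) -> str:
    """Translate an nzload exit code into a short message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def records_loaded(code: int) -> bool:
    """True when the load committed at least the good records (codes 0 and 2)."""
    return code in (0, 2)


for code in (0, 1, 2):
    print(code, records_loaded(code), describe_exit(code))
```

In a real script the code would typically come from something like subprocess.run(["nzload", "-cf", "/home/nz/sample/cust_control.txt"]).returncode. A result of 2 is a cue to inspect the bad or rejected rows file.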
Related concepts:
Appendix C, “Option names,” on page C-1
Chapter 3, “External table load options,” on page 3-1

The nzload control file


With an nzload control file, you can define load operations in a text file without
having to specify the options on the nzload command line.

You can also use control files to run multiple concurrent loads, with different
options, in one command instance. Each load is a different transaction. If a load
fails, the command continues to run the other load operations in the file. The
command displays messages to inform you of the success or failure of each load
operation.

Options

Within a control file, you can specify any of the valid options for an external table.
You can specify the long format name of the option or the short format name.

You can also specify the following options:


Database
Specifies the name of the database that contains the table to which you are
loading data.
Schema
Specifies the schema in which to load the data. Use only if the Netezza
system supports multiple schemas in a database.
Table Specifies the name of the table in which to load the data.
Bad file (bf)
Specifies the name of the nzbad file, which contains any records that cannot
be loaded. The default is table.schema.database.nzbad.
Log file (lf)
Specifies the name of the nzload log file, which contains messages and
errors that occurred during the load processing. The default is
table.schema.database.nzlog.
Data file
Specifies the path name of the file that you want to load into the specified
table, schema, and database. The data file option must be the first line of
the control file, followed by a list of control file options in curly braces {}.
You can specify more than one data file, each with its own set of options,
in the control file.

Decimal delimiter
Specifies to use a comma instead of a period as a decimal delimiter. The
default delimiter is a period.

The options in a control file are not case-sensitive. For example, you can specify
the option in letter formats such as database, DataBase, Database, or DATABASE.

Command-line options take precedence over any equivalent options specified in a
control file. With this precedence, you can override any control file options as
necessary without changing the control file. If you specify a control file for the
nzload command, you cannot specify a data file argument (-df) on the command
line.

Syntax

The syntax for using a control file is as follows, where each sequence can be
another load:
DATAFILE <filename>
{
[<option name> <option value>]*
}

For example, the following control file options load the data from customer.dat
into the customer table:
DATAFILE /home/operation/data/customer.dat
{
Database dev
schema sales
TableName customer
}

If you save the control file contents as a text file (named cust_control.txt in this
example) you can specify it by using the nzload command as follows:
nzload -cf /home/nz/sample/cust_control.txt
Load session of table 'CUSTOMER' completed successfully

When you use the nzload command, you cannot specify both the -cf and -df
options in the same command. You can load from a specified data file, or load
from a control file, but not both in one command.

The following control file options define two data sets to load. The options can
vary for each data set. The examples show the schema option, but if your Netezza
system supports only one schema in a database, omit that option.
DATAFILE /home/operation/data/customer.dat
{
Database dev
Schema sales
TableName customer
Delimiter '|'
Logfile operation.log
Badfile customer.bad
}

DATAFILE /home/imports/data/inventory.dat
{
Database dev
Schema ops
TableName inventory
Delimiter '#'
Logfile importload.log
Badfile inventory.bad
}

If you save these control file contents as a text file (named import_def.txt in this
example) you can specify it by using the nzload command as follows:
nzload -cf /home/nz/sample/import_def.txt
Load session of table 'CUSTOMER' completed successfully
Load session of table 'INVENTORY' completed successfully
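
When many loads share the same shape, a control file like the ones above can be generated rather than written by hand. The following Python sketch writes the DATAFILE blocks in the syntax shown in this section. The function render_control_file is hypothetical, and option values are emitted exactly as given, so any quotation marks must already be part of the value.

```python
def render_control_file(loads):
    """Render (datafile_path, options) pairs as nzload control file text."""
    blocks = []
    for datafile, options in loads:
        lines = [f"DATAFILE {datafile}", "{"]
        # One "<option name> <option value>" pair per line, in the given order.
        lines += [f"{name} {value}" for name, value in options.items()]
        lines.append("}")
        blocks.append("\n".join(lines))
    # Separate the blocks with a blank line, as in the two-load example.
    return "\n\n".join(blocks) + "\n"


text = render_control_file([
    ("/home/operation/data/customer.dat",
     {"Database": "dev", "Schema": "sales", "TableName": "customer",
      "Delimiter": "'|'"}),
    ("/home/imports/data/inventory.dat",
     {"Database": "dev", "Schema": "ops", "TableName": "inventory",
      "Delimiter": "'#'"}),
])
print(text)
```

The resulting text can be saved to a file and passed to nzload with the -cf option.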
Related concepts:
“Error reporting” on page B-4

Configuration file example


The following is an example of a fixed-format configuration file.
{
outputdir /home/nzuser
crinstring 'true'
ctrlchars 'true'
decimaldelim '.'
format fixed
recordlength 10
maxerrors 0
tablename refnull
layout ( fld1 bool 1_0 bytes 1 , fld2 char(5) bytes 5 , fld3 char(4)
bytes 4)
}
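
In this example the zone widths (1 + 5 + 4 bytes) add up to the recordlength value of 10. A quick consistency check of that kind can be sketched in Python. The helper layout_total_bytes is hypothetical; it only understands zone lengths written as "bytes n" and assumes the record has no separate null-indicator bytes.

```python
import re


def layout_total_bytes(layout: str) -> int:
    """Sum every 'bytes <n>' length that appears in a layout definition."""
    widths = re.findall(r"\bbytes\s+(\d+)", layout, flags=re.IGNORECASE)
    return sum(int(width) for width in widths)


layout = "fld1 bool 1_0 bytes 1 , fld2 char(5) bytes 5 , fld3 char(4) bytes 4"
record_length = 10

total = layout_total_bytes(layout)
print(total, total == record_length)  # 10 True
```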

Chapter 5. Unload data
This section describes the options for unloading data.
Related concepts:
Appendix A, “Examples and grammar,” on page A-1

External table unload options


The following external table options are not supported for unloads. For a complete
list of external table options, see Chapter 3, “External table load options,” on page
3-1.
v CtrlChars
v FillRecord
v IgnoreZero
v Layout
v LogDir
v MaxErrors
v MaxRows
v QuotedValue
v RecordDelim
v RecordLength
v RequireQuotes
v SkipRows
v TimeRoundNanos/TimeExtraZeros
v TruncString
v Y2Base

The IncludeZeroSeconds external table option is used only for unloads. The
two-digit format of the DateStyle external table option is not supported for
unloads.
Related concepts:
Chapter 3, “External table load options,” on page 3-1

Unloading data to a remote client system


A special use of the CREATE EXTERNAL TABLE/INSERT INTO commands is to
stream data from an IBM Netezza database file on a Netezza host system to a
remote client. This unload does not remove rows from the database, but rather
stores the unloaded data in a flat file that is suitable for loading back into a
Netezza database.

You can unload data to any of the supported Netezza clients, which include
Windows, Linux, Solaris, AIX®, and HP-UX. You can unload all data types
(including Unicode) and file types (uncompressed and compressed formats).

Note: You must be the admin user or have the Create External Table
administration privilege to create an external table, and you must have permission
to write to the path of the data object.

Note: Unloading for fixed-length format is not supported.

To unload to a remote client:


1. Establish an ODBC or JDBC connection between the client machine and the
Netezza appliance host. For example, on a Linux or UNIX client, type isql.
2. Use the CREATE EXTERNAL TABLE command to create an external table, and
then use the INSERT INTO command to unload the data into it. An example
follows:
CREATE EXTERNAL TABLE emp_backup SAMEAS emp USING (
DATAOBJECT ('/tmp/emp.dat')
REMOTESOURCE 'ODBC');
INSERT INTO emp_backup SELECT * FROM emp;
In the example, the DATAOBJECT file specification must be a valid file on the
receiving machine. REMOTESOURCE must be ODBC, JDBC, or OLE-DB.
The ODBC, JDBC, or OLE-DB client must be connected with the corresponding
Netezza appliance library. If you do not specify a remote source, the system
unloads the data to a file on the Netezza appliance host.
3. To reload the data from the external table, you can use a SQL query such as:
INSERT INTO emp SELECT * FROM emp_backup;
Verify that emp is empty before you reload the data.

Chapter 6. Fixed-length format
This section describes the fixed-length format for loading data into external tables.

Format background
All data is a series of byte-sequences and has an associated data type, used as a
conceptual or abstract attribute of the data. Without an associated data type, a
byte-sequence can be interpreted in numerous ways.

A single data type can be represented in different forms. For example, an integer
data type can be represented or stored in various types of binary format, or in
human-readable text or character format (typically ASCII). Similarly, dates, times,
and other data types have multiple representations used by different programs,
languages, and environments. At some point, though, these data types must be
represented in readable form, so users can do something with the data. Data for
loading into the data warehouse typically is presented in either delimited format or
fixed-length format by using either ASCII or UTF-8.

Fixed-length format files


Fixed-length format files use ordinal positions, which are offsets to identify where
fields are within the record. There are no field delimiters, and there might be no
end-of-record delimiter. Data in fixed-length format files usually does not have
decimal or time delimiters because delimiters are not necessary and consume
space. Because the fields are fixed in size, the locations of delimiters are fixed, and
are specified in the layout definition, which accompanies the fixed-length format
data file.

Loading fixed-format data into the database requires that you define the target
data type for the field and the location within the record.

Not all fields in a fixed-length format file need to be loaded. You can skip a field
by using the ‘filler’ specification. The order of fields in the data file must match the
order of the columns in the target table, or you must create an external table
definition that specifies the order of the fields as database columns. An external
table definition in combination with an insert-select statement allows field order to
be changed.
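
The idea of locating fields by position and skipping filler zones can be illustrated with a short Python sketch. It is a conceptual model only, not the loader's parser. The function split_record, the zone names, and the sample record are hypothetical; a zone whose name is None stands for a filler zone.

```python
def split_record(record: bytes, zones):
    """Slice a fixed-length record into named fields.

    zones is an ordered list of (name, width_in_bytes); name None marks filler.
    """
    fields = {}
    offset = 0
    for name, width in zones:
        chunk = record[offset:offset + width]
        if len(chunk) != width:
            raise ValueError("record is shorter than the layout")
        if name is not None:  # filler zones are read past but not loaded
            fields[name] = chunk
        offset += width
    if offset != len(record):
        raise ValueError("record is longer than the layout")
    return fields


zones = [("flag", 1), (None, 5), ("code", 4)]
print(split_record(b"1hellowxyz", zones))  # {'flag': b'1', 'code': b'wxyz'}
```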

Unknown or null values are typically represented by known data patterns, which
are classified as representing null. The IBM Netezza system identifies and acts on
these values.

Data attributes
The typical data attributes in fixed-length format files are as follows:
Data type
The data at a given offset in a record is always of the same type.
Representation
The representation is constant, and each field has a fixed width. Data
within a field is always presented in the same way. Certain items such as
radix points, time separators, and date delimiters are always at the same
place and are typically implied, rather than being present in the data file.

Value The value can be an actual value or a null indicator. Data representations
that indicate a null value are specified by the layout definition. This
assumes that null is allowed.
Length
There is no length specification within the data file, as length in the file is
fixed, and the length attribute is specified by the layout definition.
Null-ness
Null-ness is identified in the layout definition as either a specific data
pattern, such as “all spaces” or as being “flagged” by a value in another
column.

Format options
The following sections describe the format options that are valid only for
fixed-length data loads, and those that have a different behavior when used for
fixed-length format loads.

Fixed-length only options

The following external table options are valid only for the fixed-length format.
Table 6-1. Fixed-length only options
Option Meaning
RecordLength The length of the entire record, including null-indicator bytes (if any) and
excluding record-delimiter (if any).
v No default value
v Constant integer
RecordDelim The row/record delimiter.
v Default is ‘\n’ (newline). The field is literally interpreted, so ‘\n’ looks for
those characters, and not ‘newline’.
v The end-of-record delimiter is entered between single quotation marks.
The end-of-record indicator can be up to a maximum of 8 bytes long.
v The omission of a record delimiter is defined by side-by-side single
quotation marks.
Layout Mandatory for fixed-length format. Used to define the location of the fields
of the input record.
v No default value
v Comma-separated zone definitions within braces

Fixed-length option changes

The following external table options have a different meaning for the fixed-length
format:

Table 6-2. Changed options for fixed-length
Option Meaning
CtrlChars Text-delimited: If False (default), unescaped control characters (except \t)
error out.

Exception: If CtrlChars is False and CrInString is True, \r (carriage return)
can be used without error.

If True, unescaped control characters \0 and \n error out (also \r if
CrInString is False).

Fixed-length: If True, all unescaped control characters are allowed.

If False (default), unescaped control characters error out.

Exceptions: \t, \n (and \r if CrInString is ON).

CrInString Text-delimited: Augments CtrlChars behaviors.

Fixed-length: Used only when CtrlChars is OFF.


MaxErrors Sets the maximum number of allowed (non-fatal) errors before stopping the
load. Because the parser reports errors for each field or zone rather than
one error for the row, multiple errors can be reported for the same row, so
set this limit accordingly. When the parser sees an error in a field or zone, it
recovers (by using the field or zone length) and continues from the next
field or zone, until the End-of-Record, an unrecoverable error, or this
maxerrors limit is reached.

Unrecoverable errors include the following:


v RecordLength mismatch
v RecordDelimiter not found
v RecordLength invalid (negative values or zero)
v Zone length invalid (negative values)
v UTF-8 initial byte is invalid
v UTF-8 continuation bytes are invalid
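
The first three unrecoverable errors relate to how records are framed by RecordLength and RecordDelim. The following Python sketch shows one way such framing checks could work. It is an assumption-laden illustration, not the parser's actual logic: the function split_records is hypothetical, and the error messages simply reuse the names from the list above.

```python
def split_records(data: bytes, record_length: int, record_delim: bytes = b"\n"):
    """Split a byte stream into fixed-length records, checking the framing."""
    if record_length <= 0:
        raise ValueError("RecordLength invalid (negative values or zero)")
    records = []
    pos = 0
    while pos < len(data):
        end = pos + record_length
        record = data[pos:end]
        if len(record) != record_length:
            raise ValueError("RecordLength mismatch")
        # An empty record_delim (side-by-side quotation marks) always matches.
        if data[end:end + len(record_delim)] != record_delim:
            raise ValueError("RecordDelimiter not found")
        records.append(record)
        pos = end + len(record_delim)
    return records


print(split_records(b"1hellowxyz\n0worldabcd\n", 10))
print(split_records(b"AAAABBBB", 4, record_delim=b""))
```

In this model, a data file whose last record is short or is missing its delimiter stops the load instead of counting against the MaxErrors limit.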

Unsupported options

The following external table options are not supported for fixed-length format, and
if set, result in an error:
v Encoding
v FillRecord
v IgnoreZero
v TimeExtraZeros
v TruncString
v AdjustDistZeroInt
v IncludeZeroSeconds
v Delimiter
v EscapeChar
v QuotedValue
v RequireQuotes

Default zone values

The following existing external table options work as default values for zone
definitions:
NullValue
Default for the ‘NULLIF’ clause of all zones.
DateStyle, DateDelim, TimeStyle, TimeDelim, BoolStyle
Default for zone style for corresponding date, time, and boolean zones.
Related reference:
“The CRinString option” on page 3-3
“The CtrlChars option” on page 3-3
“The MaxErrors option” on page 3-9
“The Layout option” on page 3-9
“The RecordDelim option” on page 3-11
“The RecordLength option” on page 3-11

Layout definitions
Layout is an ordered collection of zone (field) definitions, and is a required option
for fixed-length format. Each zone definition is made up of mutually exclusive
(non-overlapping) clauses.

These clauses must be in the following order, although some are optional and can
be empty:
Use-type
Indicates whether a zone is a normal (data) zone or a filler zone. For data
zones, this value is omitted. Filler zones can only be specified in bytes.
Other use-types exist, but are not used for fixed-length format data.
Name The name of the zone. Duplicate zone names are not allowed. This
definition is not currently used, but is typically provided to identify the
field.
Type Defines the zone type. When not specified, type is defaulted to the
corresponding type of a table column. Filler-zones have no default type.
Valid values are as follows:
v CHAR
v VARCHAR
v NCHAR
v NVARCHAR
v INT1
v INT2
v INT4
v INT8
v INT
v UINT1
v UINT2
v UINT4
v UINT8
v UINT

v FLOATING
v DOUBLE
v NUMERIC
v BOOL
v DATE
v TIME
v TIMESTAMP
v TIMETZ
Style Defines the zone representation, and is optional. If omitted, the
representation defaults based on the zone type and the ‘Format’ option. Apart
from INTERNAL, each style is valid only for its corresponding non-textual
zone type. Valid values are as follows:
INTERNAL
Valid only for textual zones (CHAR/VARCHAR/NCHAR/
NVARCHAR)
DECIMAL
Valid for integer/numeric zone types
DECIMALDELIM
Valid for numeric, float, double, and time-styles (time, timetz, and
timestamp) zone type
FLOATING
Valid for float or double zone type
SCIENTIFIC
Valid for float or double zone type
YMD <'date-delim'>
Valid for date zones, including other date-styles currently
supported in external table options DateStyle and DateDelim
12Hour <'time-delim'>
Valid for time zones, including other time-styles currently
supported in external table options TimeStyle and TimeDelim
24Hour <'time-delim'>
Valid for time zones, including other time-styles currently
supported in external table options TimeStyle and TimeDelim
YMD <'date-delim'> 24Hour <'time-delim'>
Valid for timestamp and timetz zones, including other
combinations of date and time styles currently supported for
external table options DateStyle, DateDelim, TimeStyle, and
TimeDelim.
TRUE_FALSE, Y_N, 1_0
Valid for boolean zones, including other boolean styles currently
supported for external table option BoolStyle. Style must be in
accordance with format
Length
Specified in bytes.
Nullif Defines the zone null-ness attribute. For fixed-format files, this
clause specifies a known data pattern within the field which when
present signifies the field is null. Length is equal to or less than the
column width, and maximum length is 39 bytes.



The following table shows example zone definitions, including their nullif
clauses:
Table 6-3. Layout example
Use type  Name  Type      Style     Length    Nullness
NA        f1    Int4      DECIMAL   Bytes 10  Nullif @ = 0
NA        f2    Date      YMD       Bytes 10  Nullif &= '2000-10-10'
NA        f3    Char(20)  INTERNAL  Chars 10  Nullif && ''
Filler    f4    Char(10)  NA        Bytes 10  NA

Fixed-length format definition


Fixed-length format files must have a format definition. This topic shows examples
of typical fixed-length format definitions for typical data types.

End-of-record
When fixed-format records end in a newline character, no action is required.
Newline is the default end-of-record delimiter. When there is no record separator,
use single quotation marks side by side, as in the following example:
RecordDelim ''

RecordDelim is a literal sequence of up to 8 bytes, which does not translate
common escape representations or support functions like CHAR(8).

Record Length

Record Length is optional, but can provide feedback that the format definition has
the correct length. This excludes the end-of-record delimiter. The following is an
example:
Recordlength NNN

Skip fields

The following clause skips 4 bytes:


“filler char(4) bytes 4”

However, the preferred method is to name the field that is being skipped, as in
the following example:
“filler fld_name char(4) bytes 4”

Temporal values

Temporal values in fixed-length format files often omit delimiters. The following
table shows clauses that load dates, times, and timestamps without delimiters.
Table 6-4. Temporal values
Data type  Value                     Format clause
Date       20101231                  date1 date YMD '' bytes 8
Time       231559                    time1 time(6) 24hour '' bytes 6
Timestamp  20101231231559            stamp1 timestamp(6) 24hour '' bytes 14
Timestamp  20101231231559000001      (Load as char(24), then use insert-select)
                                     to_timestamp(col,'YYYYMMDDHH24MISSUS')
Date       2010-12-31                date2 date YMD '-' bytes 10
Time       23.15.59                  time2 time(6) 24hour '.' bytes 8
Timestamp  2010-12-31 23:15:59       tms2 timestamp(6) YMD '-' 24hour ':' bytes 19
Timestamp  2010-12-31 23:15:59.0001  tms3 timestamp(6) YMD '-' 24hour ':' bytes 26
Timetz     12:30:45+03:00            Tz1 TIMETZ(6) 24HOUR ':' bytes 14
Timetz     123045+-0300              (Load as char(11) then use insert-select)
                                     (substring(col1,1,2)||':'||
                                     substring(col1,3,2)||':'||substring(col1,5,5)||':'||
                                     substring(col1,10,2))::timetz
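
Before you settle on a format clause, it can help to confirm that the
delimiter-free values in a data file parse the way you expect. The following
Python sketch is an illustration only and does not reproduce the Netezza
parser. The function names are hypothetical, and the format strings are the
Python strptime equivalents of the YMD and 24hour styles with empty delimiters.

```python
from datetime import date, datetime, time


def parse_date(raw: str) -> date:
    """Parse a YMD date with no delimiter, such as 20101231."""
    return datetime.strptime(raw, "%Y%m%d").date()


def parse_time(raw: str) -> time:
    """Parse a 24-hour time with no delimiter, such as 231559."""
    return datetime.strptime(raw, "%H%M%S").time()


def parse_timestamp(raw: str) -> datetime:
    """Parse a timestamp with no delimiters.

    Any digits after the 14-character date and time are read as
    microseconds (an assumption that mirrors the US format element).
    """
    fmt = "%Y%m%d%H%M%S%f" if len(raw) > 14 else "%Y%m%d%H%M%S"
    return datetime.strptime(raw, fmt)


print(parse_date("20101231"))
print(parse_time("231559"))
print(parse_timestamp("20101231231559000001"))
```

A value that raises ValueError in this check is a likely candidate for the
load-as-char approach shown in the table.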

Numeric values

The following table shows numeric values.


Table 6-5. Numeric values
Data type  Value             Format clause
Integer    32767             int1 int2 bytes 5
Int8       9123456789123456  int2 int8 bytes 16
Numeric    2315.59           num1 numeric(6,2) bytes 7
Numeric    231559            (Load as char(6) then use insert-select)
                             (col/100)::numeric(6,2)
Floating   1.2345678         flt1 floating bytes 9
Floating   12345678          (Load as char(8) then use insert-select)
                             (substring(col1,1,1)||'.'||substring(col1,2,7))::float
Double     1.2345678         flt1 double bytes 9
Double     12345678          (Load as char(8) then use insert-select)
                             (substring(col1,1,1)||'.'||substring(col1,2,7))::double

Logical values

The following table shows logical values.


Table 6-6. Logical values
Data type  Value           Format clause
Boolean    Y or y, N or n  BOOL Y_N BYTES 1
Boolean    1, 0            BOOL 1_0 BYTES 1
Boolean    T or t, F or f  BOOL T_F BYTES 1



Null values

Fixed-length format files typically use ‘magic’ values to represent nulls. Adding a
nullif clause to any specification allows the column to be checked for null. A nullif
clause has the following parts:
v The keyword “nullif”
v The column reference
v The test expression

As an example, a file specification where field1 is a date and is considered null if it
has the value '99991231' would have the following characteristics:
v The nullif specification would be as follows: “nullif &='99991231'”
v The entire specification would be as follows: “fld1 date YMD '' bytes 8 nullif
&='99991231'”
v All format specifications support the nullif clause.

In addition to &=, which means that the string must match exactly, the nullif
clause also supports &&=, which allows substring matching. This is useful in
cases where the string might occur anywhere in a field with space padding. For
example, nullif &&='N' matches the different values “ N ”, “N ”, and “ N”.
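
The difference between the two operators can be sketched in a few lines of
Python. This is a simplified model under stated assumptions, not the Netezza
implementation: &= is modeled as an exact comparison of the whole field, and
&&= is modeled as a comparison after the surrounding space padding is removed.
The function names are hypothetical.

```python
def nullif_exact(field: str, pattern: str) -> bool:
    """Model of nullif &='pattern': the whole field must equal the pattern."""
    return field == pattern


def nullif_padded(field: str, pattern: str) -> bool:
    """Model of nullif &&='pattern': space padding around the value is ignored.

    Assumption: only leading and trailing spaces are skipped.
    """
    return field.strip(" ") == pattern


# Compare both operators on space-padded variants of the same value.
for value in (" N ", "N ", " N", "N"):
    print(repr(value), nullif_exact(value, "N"), nullif_padded(value, "N"))
```

In this model, a 3-byte field that holds “ N ” is treated as null only by the
&&= form, which is why &&= suits fields whose magic value can be padded.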

The following table shows null values:


Table 6-7. Null values
Data type  Null value           Format clause
Boolean    ' ' (1 space)        BOOL Y_N BYTES 1 NULLIF &=' ' (1 space)
DATE       000000               DATE YMD '' BYTES 6 NULLIF &='000000'
INT        '      ' (6 spaces)  INT BYTES 6 NULLIF &='      ' (6 spaces)



Appendix A. Examples and grammar
This section provides some examples for external tables, the nzload command, SQL
grammar, and references.
Related concepts:
“Decimal delimiter option” on page 1-2
Chapter 2, “External tables,” on page 2-1
Chapter 3, “External table load options,” on page 3-1
Chapter 4, “The nzload command,” on page 4-1
Chapter 5, “Unload data,” on page 5-1

Examples of specifying the nzload arguments


These examples describe how to specify nzload arguments, how to use named
pipes, and sample ways of using nzload.

Specify parameters for the nzload command


The following examples illustrate how to specify parameters for the nzload
command:
v The table repeat_cust is delimited by vertical bars (|) and is contained in the
input file clickstream.dat. To load this table, enter:
nzload -t repeat_cust -delim '|' -df clickstream.dat
This example does not specify the -u, -pw, or -db parameters, so defaults are
used. These defaults are described in Table 4-1 on page 4-3.
v The admin user has the password production. The table areacode in the
database dev is delimited by tabs and is contained in the input file
phone-prefix.dat. To load this table, enter:
nzload -u admin -pw production -db dev -t areacode -delim '\t' -df phone-prefix.dat

Streaming data with named pipes


To load a large amount of data, use a named pipe to stream the data to external
tables or to the nzload command. The nzload command loads the data as it fills
the pipe, and does not exit until it receives the end-of-file indicator. The stdin
option is supported for nzload.

To use a named pipe to load tables with the nzload command:


1. Create a zero-length, named pipe file by using the Linux command mkfifo:
mkfifo mypipe
2. Run the following command in a background session:
nzload -db <my_db> -t my_table -delim "|" -df /export/home/<my_db>/mypipe
3. Do the following in a foreground session:
cat /export/home/nz/<my_db>/my_table.dat > mypipe
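
The streaming behavior in these steps can be illustrated outside of Netezza.
The following Python sketch is not part of the product. It assumes a Linux
host, and the function name stream_through_fifo and the temporary directory are
hypothetical. A producer thread stands in for the cat command, and the reader
loop stands in for nzload: the reader consumes rows as they arrive and stops
only when the producer closes the pipe, which signals end-of-file.

```python
import os
import tempfile
import threading


def stream_through_fifo(rows):
    """Send rows through a named pipe and return what the reader received."""
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "mypipe")
    os.mkfifo(fifo_path)  # same effect as the Linux command: mkfifo mypipe

    def producer():
        # Opening for write blocks until a reader opens the pipe.
        with open(fifo_path, "w") as pipe:
            for row in rows:
                pipe.write(row + "\n")
        # Closing the pipe delivers end-of-file to the reader.

    writer = threading.Thread(target=producer)
    writer.start()

    # The reader plays the role of the loader: it reads until end-of-file.
    with open(fifo_path) as pipe:
        received = [line.rstrip("\n") for line in pipe]

    writer.join()
    os.remove(fifo_path)
    os.rmdir(workdir)
    return received


print(stream_through_fifo(["1|Customer|12/7/2011", "2|Netezza|02/16/2010"]))
```

Because the pipe holds no data on disk, the producer and the reader must run at
the same time, which is why the nzload step runs in a background session.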



Sample nzload usage
The following provides some sample nzload usage.
v To specify the name of the load file, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -df
/tmp/daily/Import.bad
v To specify the boolean style, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -df /tmp
-boolStyle yes_no
v To specify the name of the control file, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -cf
/tmp/daily/control.file
v To allow unescaped carriage returns in char() and varchar() fields, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -df /tmp
-crinString
v To allow an ASCII value 1 - 31 in char() and varchar() fields, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -ctrlChars
v To specify the delimiter to use with the dateStyle option, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -dateDelim '/'
-dateStyle MDY
v To specify how to interpret the date format, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -dateDelim '/'
-dateStyle MDY
v To specify the field delimiter, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -delim ','
v To specify the use of an escape character, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -df /tmp
-escapeChar '\\'
v To specify an input line with fewer columns than the table definition, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -fillRecord
v To specify discarding the byte value zero in the char() and varchar() fields, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -ignoreZero no
v To specify the log file name, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -lf
/tmp/daily/import.log
v To specify the maximum number of errors, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -maxErrors 100
v To specify stopping processing when the specified number of records are in the
database, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -maxRows 100
v To specify the string to use for the null value, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -nullValue
'none'
v To specify the output directory for the log files, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -outputDir
/tmp/daily
v To specify that quotations are mandatory, except for null values, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -requireQuotes
-quotedValue YES
v To specify the delimiter to use for time formats, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -timeDelim '.'
v To allow, but round, non-zero digits with finer than microsecond
resolution, enter:
nzload -u admin -pw password -host nzhost -db emp -t name
-timeRoundNanos
v To specify the time style value in the data file, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -timeStyle
12hour
v To truncate a string so that it fits into the declared string storage, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -truncString
v To specify the first year in the YY format, enter:
nzload -u admin -pw password -host nzhost -y2Base 2000
v To enable load continuation, enter:
nzload -u admin -pw password -host nzhost -db emp -t name -allowReplay

Reference examples
The following table shows examples for references.
Table A-1. Reference examples
Reference                Meaning
BYTES &2                 Error: only the internal reference (@) is allowed in a
                         length clause (in any format or zone type).
BYTES @                  Error: a length clause cannot refer to itself.
NULLIF & = '123'         Self-reference (no number) is valid in a null clause.
                         The length must be BYTES/CHARS 3 for text styles.
                         Matches (nullif evaluates to 'true') only '123' (a row
                         in the external file that contains '123').
NULLIF && = '123'        Matches (nullif evaluates to 'true') '123' with or
                         without surrounding spaces (for example, ' 123 '), if
                         spaces are skipped.
                         Length must be at least BYTES 3 (text-styles) or
                         BYTES 4.
NULLIF @ = 123           Valid for numerical zones.
                         Matches '123', ' 123 ', and so on, in text format, with
                         spaces skipped.
NULLIF @ = '2000-01-01'  Valid for date zones.

Decimal delimiter examples


The following are examples of how to use the decimal delimiter option.

For text-delimited format for the table level:


INSERT INTO <target-table> SELECT * FROM '<external-table>' USING
(delim 'delim' decimalDelim ',');

For fixed-length format for the table level:
INSERT INTO <target-table> SELECT * FROM '<external-table>' USING
(decimalDelim ',' format 'fixed' layout (c1 int bytes 4, c2 float bytes
6, c3 numeric(10,2) bytes 11, c4 time 24HOUR ':' bytes 11) );

For fixed-length format for the column level:

v For numeric data type:
INSERT INTO <target-table> SELECT * FROM '<external-table>' USING
(format 'fixed' layout (c1 int bytes 4 , c2 float bytes 6, c3
numeric(10,2) decimal ',' bytes 11) );
INSERT INTO <target-table> SELECT * FROM '<external-table>' USING
(format 'fixed' layout (c1 int bytes 4 , c2 float bytes 6, c3
numeric(10,2) decimal decimalDelim ',' bytes 11) );
v For float data type:
INSERT INTO <target-table> SELECT * FROM '<external-table>' USING
(format 'fixed' layout (c1 int bytes 4 , c2 float floating ','
bytes 6, c3 numeric(10,2) bytes 11) );
INSERT INTO <target-table> SELECT * FROM '<external-table>' USING
(format 'fixed' layout (c1 int bytes 4 , c2 float floating
decimalDelim ',' bytes 6, c3 numeric(10,2) bytes 11) );
v For double data type:
INSERT INTO <target-table> SELECT * FROM '<external-table>' USING
(format 'fixed' layout (c1 int bytes 4 , c2 double exponential ','
bytes 6, c3 numeric(10,2) bytes 11) );
INSERT INTO <target-table> SELECT * FROM '<external-table>' USING
(format 'fixed' layout (c1 int bytes 4 , c2 double exponential
decimalDelim ',' bytes 6, c3 numeric(10,2) bytes 11) );
v For time data types (time, timetz, timestamp):
INSERT INTO <target-table> SELECT * FROM '<external-table>' USING
(format 'fixed' layout (c1 int bytes 4 , c2 time 12HOUR
decimalDelim ',' bytes 12, c3 numeric(10,2) bytes 11) );
INSERT INTO <target-table> SELECT * FROM '<external-table>' USING
(format 'fixed' layout (c1 int bytes 4 , c2 time timeDelim '-'
decimalDelim ',' bytes 12, c3 numeric(10,2) bytes 11) );
INSERT INTO <target-table> SELECT * FROM '<external-table>' USING
(format 'fixed' layout (c1 int bytes 4 , c2 time timeDelim '-' ','
bytes 12, c3 numeric(10,2) bytes 11) );
INSERT INTO <target-table> SELECT * FROM '<external-table>' USING
(format 'fixed' layout (c1 int bytes 4 , c2 time 12HOUR '-'
decimalDelim ',' bytes 12, c3 numeric(10,2) bytes 11) );
INSERT INTO <target-table> SELECT * FROM '<external-table>' USING
(format 'fixed' layout (c1 int bytes 4 , c2 time 12HOUR '-' ','
bytes 12, c3 numeric(10,2) bytes 11) );
INSERT INTO <target-table> SELECT * FROM '<external-table>' USING
(format 'fixed' layout (c1 int bytes 4 , c2 time 12HOUR timeDelim
'-' decimalDelim ',' bytes 12, c3 numeric(10,2) bytes 11) );
INSERT INTO <target-table> SELECT * FROM '<external-table>' USING
(format 'fixed' layout (c1 int bytes 4 , c2 time 12HOUR timeDelim
'-' ',' bytes 12, c3 numeric(10,2) bytes 11) );
Related concepts:
“Fixed-point data types” on page 2-7
“Floating-point data types” on page 2-8
“Time data types” on page 2-10
Related reference:
“The DecimalDelim option” on page 3-6



SQL grammar
This section provides an explanation of the SQL grammar that is used for CREATE
EXTERNAL TABLE.
[INSERT INTO <normal-table>] SELECT <col-list>
FROM EXTERNAL [name] ’<data-file>’ [USING ’(’ <Load-options>’)’]
CREATE EXTERNAL TABLE <ext-table-name><External-table-shape> (<External-table-shape>
| SAMEAS <tablename>) USING ’(’ <Load-options> ’)’
CREATE EXTERNAL TABLE [name] ’file path’ [USING ’(’ load-options ’)’] AS SELECT-statement
Load-options: Load-option
| Load-option Load-options // space separated list of USING clause options
Load-option: FORMAT TEXT | INTERNAL | FIXED
| RECORDLENGTH <n>| Length-ref-expr
| RECORDDELIM <string-literal-max-8-bytes >
| LAYOUT ( Zone-definitions )
.....
Zone-definitions: Zone-def
| Zone-def ’,’ Zone-definitions // comma-separated lists of zone definitions
Zone-def: [Zone-use-type] [Zone-name] [Zone-type] [Zone-style] [Zone-len] [Nullness]
Zone-use-type: REF | FILLER
Zone-name: Identifier
Zone-type: CHAR| VARCHAR
| NCHAR| NVARCHAR
| BOOL
| INT1 | INT2 | INT4 | INT8 | INT
| UINT1 | UINT2 | UINT4 | UINT8 | UINT
| NUMERIC
| FLOATING| DOUBLE
| DATE | TIME | TIMESTAMP | TIMETZ
Zone-style: INTERNAL
| DECIMAL [’decimal-delim’]
| FLOATING | SCIENTIFIC [’decimal-delim’]
| Date-format
| Time-format
| Date-format Time-format
Date-format:
| DateStyle [’date-delim’]
| DATE DELIM ’date-delim’
Time-format:
| TimeStyle [’time-delim’] [’decimal-delim’]
| TIME DELIM ’time-delim’ DecimalDelim ’decimal-delim’
Date-style: YMD| DMY | MDY |.. // all date styles
Time-style: 12HOUR | 24HOUR
Zone-len: BYTES <n> | <Length-ref-expr>
| CHARACTERS <n> | <Length-ref-expr>
Zone-ref: External-ref
| Isolated-ref
| Internal-ref
External-ref: &[n] // 1-based absolute position of zones; 0 and
                   // negative values for relative positions backwards
Isolated-ref: &&[n] // 1-based absolute position of zones; 0 and
                    // negative values for relative positions backwards
Internal-ref: @[n] // 1-based absolute position of zones; 0 and
                   // negative values for relative positions backwards
Length-ref-expr: Internal-ref [ Operator <n> ]
Operator: + | -

Fixed-length format definition


The following is a sample data record:
20011228YF2001122814313425 Forest St   Marlborough  MA017525083828200600

The record is defined by the following column layout:


v Columns 1-8 Date format YYYYMMDD Null when value is '99991231'



v Column 9 Boolean Y/N Null when value is space ' '
v Column 10 Boolean T/F Null when value is space ' '
v Column 11-24 Time stamp format YYYYMMDDHHMMSS Null when value is
'99991231000000'
v Column 25-39 Character Address Null when value is all spaces
v Column 40-52 Character City Null when value is '****NULL*****'
v Column 53-54 Character State Null when value is '##'
v Column 55-59 Number postal code Null when value is all zeros
v Column 60-68 Character Phone Null when value is all zeros
v Column 69-72 Number(3,2) Example 600 would be 6.00 Never Null
v Column 73 Newline end of record.
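
A quick way to catch layout mistakes is to check that the column ranges are
contiguous and that they add up to the expected record length. The following
Python sketch does this for the column list above and then slices a record into
its zones. It is an illustration only: the names COLUMNS, record_length, and
split_record are hypothetical, and the record is rebuilt with explicit space
padding on the assumption that the address and city fields are padded to 15 and
13 bytes.

```python
# (name, first byte, last byte), 1-based and inclusive, from the column list.
COLUMNS = [
    ("date", 1, 8),
    ("bool_yn", 9, 9),
    ("bool_tf", 10, 10),
    ("timestamp", 11, 24),
    ("address", 25, 39),
    ("city", 40, 52),
    ("state", 53, 54),
    ("postal_code", 55, 59),
    ("phone", 60, 68),
    ("amount", 69, 72),
]


def record_length(columns):
    """Return the total length, or raise if the zones leave gaps or overlap."""
    next_start = 1
    for name, start, end in columns:
        if start != next_start or end < start:
            raise ValueError(f"zone {name!r} should start at byte {next_start}")
        next_start = end + 1
    return next_start - 1


def split_record(record, columns):
    """Slice one fixed-length record into a dict of raw zone values."""
    return {name: record[start - 1:end] for name, start, end in columns}


# Sample record with its space padding written out explicitly (an assumption).
RECORD = "".join([
    "20011228", "Y", "F", "20011228143134",
    "25 Forest St".ljust(15), "Marlborough".ljust(13),
    "MA", "01752", "508382820", "0600",
])

print(record_length(COLUMNS), len(RECORD))
print(split_record(RECORD, COLUMNS))
```

The computed length excludes the newline in column 73, which matches how the
record length option excludes the end-of-record delimiter.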

The following is an example of the IBM Netezza external table definition for this
data:
CREATE EXTERNAL TABLE sample_ext (
Col01 DATE ,
Col09 BOOL ,
/* Skipped col10 */
Col11 TIMESTAMP,
Col26 Char(12),
Col38 Char(10),
Col48 Char(2),
Col50 Int4,
Col56 CHAR(10),
Col67 CHAR(3) /* Numeric(3,2) cannot be loaded directly */
)
USING (
dataobject('/home/test/sample.fixed')
logdir '/home/test'
recordlength 72 /* does not include end of record delimiter */
recorddelim '
' /* This is actually a newline between the single quotes, really not needed as newline is default */
format 'fixed'
layout (
Col01 DATE YMD '' bytes 8 nullif &='99991231',
Col09 BOOL Y_N bytes 1 nullif &=' ',
FILLER Char(1) Bytes 1, /* was col10 space */
Col11 TIMESTAMP YMD '' 24HOUR '' bytes 14 nullif &='99991231000000',
Col26 CHAR(15) bytes 15 nullif &='               ', /* 15 spaces */
Col38 CHAR(13) bytes 13 nullif &='****NULL*****' ,
Col48 CHAR(2) bytes 2 nullif &='##' ,
Col50 INT4 bytes 5 nullif &='00000' ,
Col56 CHAR(10) bytes 10 nullif &='0000000000',
Col67 CHAR(3) bytes 3 /* We cannot load this directly, so we use an insert-select */
) /* end layout */
); /* end external table definition. */
INSERT INTO sampleTable
SELECT
Col01,
Col09,
Col11,
Col26,
Col38,
Col48,
Col50,
Col56,
(Col67/100)::numeric(3,2) as Col67 /* convert char to numeric(3,2) */
FROM sample_ext ;



Script example for loading data by using fixed-length format
The following is an example of a script to load data by using the fixed-length
format.
LOGDIR="/tmp"
DIR="/tmp"
NZSQL="nzsql -db test -c"

function CreateDb()
{
nzsql -c "create database test"
}
function CleanUp()
{
$NZSQL "drop table textDelim_tbl"
$NZSQL "drop table textFixed_tbl"
}
function CreateTable()
{
$NZSQL "create table textDelim_tbl(col1 int, col2 char(10), col3
date)"
$NZSQL "create table textFixed_tbl(col1 int, col2 char(10), col3
date)"
}

function CreateDataFile()
{

# Create text delimited data file


cat > $DIR/delimData << EOF
1|Customer|12/7/2011
2|Netezza|02/16/2010
EOF

# Create text fixed data file


cat > $DIR/fixedData << EOF
1HelloWorld2011-12-07
2Netezza   2010-02-16
EOF
}

function LoadData()
{
# nzload using text format
nzload -t textDelim_tbl -df $DIR/delimData -db test -outputDir $LOGDIR \
-delim '|' -dateStyle MDY -dateDelim '/'
# nzload using fixed format
nzload -t textFixed_tbl -df $DIR/fixedData -db test -outputDir $LOGDIR \
-format fixed \
-layout "col1 int bytes 1, col2 char(10) bytes 10, col3 date YMD '-' bytes 10"
}

function UnloadData()
{
$NZSQL "insert into textDelim_tbl select * from external
'$DIR/delimData' using (Delimiter '|' DateStyle 'MDY' DateDelim '/');"
}

CreateDb
CleanUp
CreateTable
CreateDataFile
LoadData
UnloadData

Appendix B. Troubleshooting
This section contains examples to aid you in troubleshooting data loading.

Tips for successful loading


The following topics describe how to analyze your data, how to set up load
processes, and how to troubleshoot problems.

Create your table


Before you create your table, check the following:
v Choose a distribution key. If you know the primary key or a column that is used
frequently in joins, use that one. Use a distribution key with the highest
selectivity. For more information about distribution keys, see the IBM Netezza
System Administrator’s Guide.
v Check that any column that does not contain null data (or that should not
contain null data) is declared as not null. The system processes not null columns
more quickly.
v Check whether you have number fields. Are they declared as int8, int4, smallint,
byteint, or numeric(p,s)? The smaller the storage, the better for large tables.

Determine your data format


Consider the following when you are determining the format of your data:
v Check how many data fields there are in each input line of the data file. Are
there the same number of columns that are defined in the target-table definition?
– If there are fewer columns than fields, is it acceptable to extend the schema to
have filler columns? If not, then the load will not succeed.
– If there are more columns than fields, is it acceptable to use null values to
insert into those columns? If it is acceptable, specify the -fillRecord option.
v Check the field delimiter. It should be a character that is used to separate one
field value from another. This field delimiter must be unique and must not
appear in a field value, especially in a char or varchar string. Use the -delim
option to specify the field delimiter.
v Check whether there are any NULL values in the data source. How is the null
value expressed in the data file? The RDBMS industry convention is to use the
string “null” to represent a null value. If the data file uses a different
representation, use the -nullValue option to override the default null value. The
new value can be an empty string or a value in the range of a-z or A-Z and no
longer than four characters.
v Check whether there are any date, time, time with time zone, or timestamp data
types in the table schema. If there are, what style is the date value? The style of
these data type values must be consistent throughout the nzload job.
v Check the handling of string fields for char() or varchar() data types. Does the
longest or largest value fit into the storage of the char() or varchar() declaration?
If not, is it possible to alter the schema to accommodate the longest string?
– If schema cannot be altered, is truncating a string an acceptable solution?
– If truncation is acceptable, specify the -truncString option.



– If neither is acceptable, the nzload command treats the record with the long
string as an error record. The nzload command discards the record to the
nzbad file and logs an error with the record and column numbers in the nzlog
file.
– See whether any special characters are used in the string fields, for
example, CR, CRLF, or a character that is the same as the field delimiter.
Such characters violate the unique character rule.
– If there are special characters, can you regenerate the data file to have an
escape character added to these special characters? If so, then use the
-escapeChar '\\' option to process the strings.
– If you cannot regenerate the data file, then the load contains incomplete and
invalid records.

Consider the load source


See whether you are using pipes. If so, are they from another local feed or from
across a network? The preferred method is to read from a named pipe, rather than
to read from stdin/stdout.

Look at the file. Is the file on an NFS mounted directory? If so, remember that
your load performance is constrained by the speed of the network.

Run the job


If you are running on a production system, make a copy of your source table
before you start the load. Making a copy within the IBM Netezza appliance is
fast and is better than reloading from a backup. For example, the syntax for
making a copy is as follows:
CREATE TABLE loan_backup AS SELECT * FROM loan;

Stage the data before you move it to a production system. Create a table, load it,
validate it, then use the ALTER TABLE command to move the tables to production.
For example:
ALTER TABLE loan RENAME TO loan_lastmonth;
ALTER TABLE loan_stage RENAME TO loan;

If you are running multiple nzload jobs to load into a table, use unique names for
your nzbad files. The nzload command generates the default file name by using the
<tablename>.<databasename> format and appending the extension .nzbad. Loading
into the data table of the dev database uses the default file name data.dev.nzbad
for the nzbad file. Each instance of the nzload command overwrites the existing
file. If you want to preserve the bad records that are stored in this file, use the -bf
<file name> option to specify a different name for each nzload job.
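
The naming rule can be sketched as follows. This Python fragment is an
illustration only: the function names are hypothetical, and the per-job naming
scheme is just one possible way to build a distinct value for the -bf option.

```python
def default_load_file_names(table: str, database: str) -> dict:
    """Default nzbad and nzlog names: <tablename>.<databasename> plus extension."""
    base = f"{table}.{database}"
    return {"nzbad": base + ".nzbad", "nzlog": base + ".nzlog"}


def unique_bad_file(table: str, database: str, job_id: str) -> str:
    """Hypothetical per-job nzbad name to pass with -bf so that jobs do not collide."""
    return f"{table}.{database}.{job_id}.nzbad"


print(default_load_file_names("data", "dev"))
print(unique_bad_file("data", "dev", "job01"))
```

Two concurrent jobs that load the data table of the dev database would both
default to data.dev.nzbad, so each job needs its own name.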

Note: If your default system case is uppercase, the system displays lowercase table
names as uppercase in nzlog files, for example, DATA.DEV.nzlog and
DATA.DEV.nzbad.

Run the Linux top command on the host to monitor CPU resources. Consider
running more loads concurrently if resources are available.

Troubleshoot
If you see the error message, Too many data fields for table, use the Linux
command head -1 on the data file to get the first row, which might contain the
extracted names of the columns. Compare these names to the DDL of the table you
created and see whether their physical positions match.

If you see the error message, Data type mismatch on column 5, use the Linux
command cut -d^ -f 5 inputfile | more to look at the individual data values in
the source file. Compare these values to the DDL of the table you created and
see whether their physical positions match.

Handle exceptions
Repeat the load on the -bf file. If there are many exceptions, fix them and
re-extract from the source system. If they are few, use a text editor to change data.
To make large substitutions, use the Linux sed or awk commands.

Validate the results


After the load completes, validate the results by comparing them with the source
system.

Count the number of rows and select min/max/sum of each numeric and
min/max of each date column in the table.

Generate statistics
Remember to run the GENERATE STATISTICS command on your tables or
databases after you load new data.

Test performance
If your data is evenly distributed, you should see peak loading performance of at
least 75 percent CPU utilization on the host. You can monitor utilization by
running the Linux top command during the load. If you see lower CPU utilization,
either the data is skewed so that not all SPUs share the workload, or the
parser is waiting for data.

If your input data is skewed, that is, all records are being sent to only a few SPUs,
those SPUs become the performance bottleneck.

If your CPU utilization is less than 75 percent and the data is evenly distributed,
you might have a streaming problem:
v If the load is running from the local host, determine the source of the data.
Look for other concurrent database activities such as activities that are
SPU-to-SPU broadcast intensive or SPU disk I/O intensive.
v If the data is not locally staged or is on a SAN or NFS mount, determine
whether the bottleneck is the remote source of the data or the network.
The performance of the IBM Netezza appliance system depends on the number
of SPUs. If, however, data is being streamed across an external network, then the
performance is limited by the speed of the network.
Test the network by using the FTP command to send a file between the source
and the local host, and measure the transfer rate. Under optimal conditions, a
Gig-E network transfers at a rate of ~1000Mb/second, or ~125MB/second or
~450GB/hour.
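
The transfer-rate figures follow from simple unit conversion. The following
Python sketch shows the arithmetic so that you can repeat it for a measured
rate. The function name is hypothetical, and decimal units (1 GB = 1000 MB) are
assumed.

```python
def link_throughput(megabits_per_second: float):
    """Convert a link rate in Mb/second to MB/second and GB/hour."""
    megabytes_per_second = megabits_per_second / 8              # 8 bits per byte
    gigabytes_per_hour = megabytes_per_second * 3600 / 1000     # 3600 seconds per hour
    return megabytes_per_second, gigabytes_per_hour


mb_per_s, gb_per_h = link_throughput(1000)
print(mb_per_s, gb_per_h)
```

If the rate that you measure with FTP is well below the figure for your network
type, the network or the remote source is the likely bottleneck.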

Error handling for nzload


The nzload command does extensive error checking. This section describes how
the nzload command interprets different data types and the way it handles syntax
errors.



Error reporting
The nzload command returns one of the following exit status codes when it
completes:
0 The load was successful; all input records were inserted.
1 The load failed; no records were inserted because of an error or errors
during the load.
2 The load was successful, but the system found errors in the input that did
not exceed the error threshold (-maxErrors), so the good records were inserted.
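
A wrapper script can use these status codes to decide what to do after a load.
The following Python sketch is a minimal illustration: the names
NZLOAD_EXIT_MEANINGS, describe_exit_status, and rows_were_inserted are
hypothetical, and the code only interprets a status value that you obtained
from running nzload yourself.

```python
# Meanings of the nzload exit status codes listed above.
NZLOAD_EXIT_MEANINGS = {
    0: "load successful, all input records inserted",
    1: "load failed, no records inserted",
    2: "load successful, some records rejected (within -maxErrors)",
}


def describe_exit_status(code: int) -> str:
    """Return a readable description of an nzload exit status."""
    return NZLOAD_EXIT_MEANINGS.get(code, f"unexpected exit status {code}")


def rows_were_inserted(code: int) -> bool:
    """True when the load inserted records (status 0 or 2)."""
    return code in (0, 2)


for status in (0, 1, 2):
    print(status, describe_exit_status(status), rows_were_inserted(status))
```

A status of 2 is a reason to inspect the nzbad file even though the load
inserted the good records.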

The nzload command writes high-level errors to the terminal (stderr), nzlog file,
and nzbad file. You can specify the nzlog and nzbad file names on the command
line or by using a control file.

Periodically delete log files to free up disk space.


Related concepts:
“The nzload control file” on page 4-5

The nzload log files


The system creates an nzlog file as the result of the following command line:
nzload -u admin -pw password -t member_profile -db dev -maxErrors 10 -delim '\t'
v -maxErrors allows the nzload command to continue processing until 10 errors
are found.
v -delim '\t' specifies the TAB delimiter.

The system appends to the nzlog file for every nzload command that loads the
same table into the same database. The system names the nzlog file after the
table and the database name, with the extension .nzlog. So, in this example,
the file name is member_profile.dev.nzlog.

There is also a member_profile.dev.nzbad file that contains any record that had an
error. The system overwrites this file each time you run the nzload command for
the same table and database name (unlike the behavior of the nzlog file).



Appendix C. Option names
This section describes the different methods for specifying external table options.
Related concepts:
“nzload command syntax” on page 4-3

Specify options
The following table shows how to enter the external table options when you use
the nzload command-line method, in a control file, or as part of a SQL command.
Table C-1. Specify external table options
Option              Command line      Control file    SQL
AllowReplay         -allowreplay      Not applicable  LOAD_REPLAY_REGION
                                                      MAX_QUERY_RESTARTS
BadFile             -bf               badfile         Not applicable
BoolStyle           -boolStyle        boolstyle       BOOLSTYLE
Compress            -compress         compress        COMPRESS
CRinString          -crInString       crinstring      CRINSTRING
CtrlChars           -CtrlChars        ctrlchars       CTRLCHARS
Database            -db               database        Not applicable
Datafile            -df               datafile        DATAOBJECT
DateDelim           -dateDelim        datedelim       DATEDELIM
DateStyle           -dateStyle        datestyle       DATESTYLE
DecimalDelim        -decimaldelim     decimaldelim    DECIMALDELIM
Delimiter           -delim            delim           DELIM
                    -delimiter        delimiter       DELIMITER
Encoding            -encoding         encoding        ENCODING
EscapeChar          -escape           escape          ESCAPE
                    -escapeChar       escapechar      ESCAPECHAR
FillRecord          -fillRecord       fillrecord      FILLRECORD
Format              -format           format          FORMAT
IgnoreZero          -ignoreZero       ignorezero      IGNOREZERO
IncludeZeroSeconds  Not applicable    Not applicable  INCLUDEZEROSECONDS
Layout              -layout           layout          LAYOUT
LogDir              -outputDir        outputdir       LOGDIR
LogFile             -lf               logfile         Not applicable
LogFileSize         -logFileSize      Not applicable  LOAD_LOG_MAX_FILESIZE
MaxErrors           -maxErrors        maxerrors       MAXERRORS
MaxRows             -maxRows          maxrows         MAXROWS
NullValue           -nullValue        nullvalue       NULLVALUE
QuotedValue         -quotedValue      quotedvalue     QUOTEDVALUE
RecordDelim         -recdelim         recdelim        RECDELIM
RecordLength        -reclength        recordlength    RECLENGTH
RemoteSource        -host             Not applicable  REMOTESOURCE
RequireQuotes       -requireQuotes    requirequotes   REQUIREQUOTES
SkipRows            -skipRows         skiprows        SKIPROWS
SocketBufSize       -fileBufSize      socketbufsize   SOCKETBUFSIZE
                    -fileBufByteSize
SuspendMviews       -suspendMviews    Not applicable  Not applicable
Tablename           -t                tablename       Not applicable
TimeDelim           -timeDelim        timedelim       TIMEDELIM
TimeRoundNanos      -timeRoundNanos   timeroundnanos  TIMEROUNDNANOS
TimeExtraZeros      -timeExtraZeros   timeextrazeros  TIMEEXTRAZEROS
TimeStyle           -timeStyle        timestyle       TIMESTYLE
TruncString         -truncString      truncstring     TRUNCSTRING
Y2Base              -y2Base           y2base          Y2BASE
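
As a rough illustration of how one option is spelled in each of the three
methods, the following Python sketch looks up a few rows of Table C-1 and
builds an nzload argument list from them. It only assembles strings and does
not run nzload. The OPTION_NAMES dictionary and the nzload_args helper are
hypothetical names introduced here, and the database, table, and file values
are placeholders.

```python
# A few rows copied from Table C-1:
# option -> (command line, control file, SQL).
# None stands for "Not applicable".
OPTION_NAMES = {
    "Database": ("-db", "database", None),
    "Tablename": ("-t", "tablename", None),
    "Datafile": ("-df", "datafile", "DATAOBJECT"),
    "Delimiter": ("-delim", "delim", "DELIM"),
    "MaxErrors": ("-maxErrors", "maxerrors", "MAXERRORS"),
    "LogDir": ("-outputDir", "outputdir", "LOGDIR"),
    "IncludeZeroSeconds": (None, None, "INCLUDEZEROSECONDS"),
}


def nzload_args(settings):
    """Build an nzload argument list from {option name: value} pairs."""
    args = ["nzload"]
    for option, value in settings.items():
        flag = OPTION_NAMES[option][0]
        if flag is None:
            # The table lists this option as "Not applicable" on the command line.
            raise ValueError(f"{option} has no command-line form")
        args += [flag, str(value)]
    return args


args = nzload_args({
    "Database": "dev",
    "Tablename": "member_profile",
    "Datafile": "/tmp/member_profile.dat",  # placeholder path
    "Delimiter": "|",
})
print(args)
```

Keeping the three spellings side by side in one structure makes it harder to
pass a SQL-only option, such as IncludeZeroSeconds, on the command line by
mistake.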



Notices
This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in
other countries. Consult your local IBM representative for information on the
products and services currently available in your area. Any reference to an IBM
product, program, or service is not intended to state or imply that only that IBM
product, program, or service may be used. Any functionally equivalent product,
program, or service that does not infringe any IBM intellectual property right may
be used instead. However, it is the user's responsibility to evaluate and verify the
operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter
described in this document. The furnishing of this document does not grant you
any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785 U.S.A.

For license inquiries regarding double-byte (DBCS) information, contact the IBM
Intellectual Property Department in your country or send inquiries, in writing, to:

IBM World Trade Asia Corporation
Licensing
2-31 Roppongi 3-chome, Minato-ku
Tokyo 106-0032, Japan

The following paragraph does not apply to the United Kingdom or any other
country where such provisions are inconsistent with local law:
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS
PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS
FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or
implied warranties in certain transactions, therefore, this statement may not apply
to you.

This information could include technical inaccuracies or typographical errors.
Changes are periodically made to the information herein; these changes will be
incorporated in new editions of the publication. IBM may make improvements
and/or changes in the product(s) and/or the program(s) described in this
publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for
convenience only and do not in any manner serve as an endorsement of those Web
sites. The materials at those Web sites are not part of the materials for this IBM
product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it
believes appropriate without incurring any obligation to you.



Licensees of this program who wish to have information about it for the purpose
of enabling: (i) the exchange of information between independently created
programs and other programs (including this one) and (ii) the mutual use of the
information which has been exchanged, should contact:

IBM Corporation
Software Interoperability Coordinator, Department 49XA
3605 Highway 52 N
Rochester, MN 55901
U.S.A.

Such information may be available, subject to appropriate terms and conditions,
including in some cases, payment of a fee.

The licensed program described in this document and all licensed material
available for it are provided by IBM under terms of the IBM Customer Agreement,
IBM International Program License Agreement or any equivalent agreement
between us.

Any performance data contained herein was determined in a controlled
environment. Therefore, the results obtained in other operating environments may
vary significantly. Some measurements may have been made on development-level
systems and there is no guarantee that these measurements will be the same on
generally available systems. Furthermore, some measurements may have been
estimated through extrapolation. Actual results may vary. Users of this document
should verify the applicable data for their specific environment.

Information concerning non-IBM products was obtained from the suppliers of
those products, their published announcements or other publicly available sources.
IBM has not tested those products and cannot confirm the accuracy of
performance, compatibility or any other claims related to non-IBM products.
Questions on the capabilities of non-IBM products should be addressed to the
suppliers of those products.

All statements regarding IBM's future direction or intent are subject to change or
withdrawal without notice, and represent goals and objectives only.

All IBM prices shown are IBM's suggested retail prices, are current and are subject
to change without notice. Dealer prices may vary.

This information contains examples of data and reports used in daily business
operations. To illustrate them as completely as possible, the examples include the
names of individuals, companies, brands, and products. All of these names are
fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which
illustrate programming techniques on various operating platforms. You may copy,
modify, and distribute these sample programs in any form without payment to
IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating
platform for which the sample programs are written. These examples have not
been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or
imply reliability, serviceability, or function of these programs.



Each copy or any portion of these sample programs or any derivative work, must
include a copyright notice as follows:

© (your company name) (year). Portions of this code are derived from IBM Corp.
Sample Programs.

© Copyright IBM Corp. _enter the year or years_.

If you are viewing this information softcopy, the photographs and color
illustrations may not appear.

Trademarks
IBM, the IBM logo, ibm.com® and Netezza are trademarks or registered trademarks
of International Business Machines Corporation in the United States, other
countries, or both. If these and other IBM trademarked terms are marked on their
first occurrence in this information with a trademark symbol (® or ™), these
symbols indicate U.S. registered or common law trademarks owned by IBM at the
time this information was published. Such trademarks may also be registered or
common law trademarks in other countries. A current list of IBM trademarks is
available on the web at "Copyright and trademark information" at
http://www.ibm.com/legal/copytrade.shtml.

Adobe is a registered trademark of Adobe Systems Incorporated in the United
States, and/or other countries.

Linux is a registered trademark of Linus Torvalds in the United States, other
countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of
Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other
countries.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other
names may be trademarks of their respective owners.

Red Hat is a trademark or registered trademark of Red Hat, Inc. in the United
States and/or other countries.

D-CC, D-C++, Diab+, FastJ, pSOS+, SingleStep, Tornado, VxWorks, Wind River,
and the Wind River logo are trademarks, registered trademarks, or service marks
of Wind River Systems, Inc. Tornado patent pending.

APC and the APC logo are trademarks or registered trademarks of American
Power Conversion Corporation.

Other company, product or service names may be trademarks or service marks of
others.





Part Number: 20525-02 Rev. 1

Printed in USA
