Topics: What Is SQL Loader and What Is It Used For?
This sample control file (loader.ctl) will load an external data file containing delimited data:
load data
infile 'c:\data\mydata.csv'
Another sample control file, with in-line data formatted as fixed-length records. The trick is to
specify "*" as the name of the data file and to use BEGINDATA to start the data section in the
control file.
load data
infile *
replace
begindata
MATH MATHEMATICS
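spool oradata.txt
-- col1..col3 are placeholder column names; the select list is missing from the source
select col1 || ',' || col2 || ',' || col3
from tab1;
spool off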
declare
fp utl_file.file_type;
begin
fp := utl_file.fopen('c:\oradata','tab1.txt','w');
utl_file.fclose(fp);
end;
/
You might also want to investigate third party tools like SQLWays from Ispirer Systems, TOAD
from Quest, or ManageIT Fast Unloader from CA to help you unload data from Oracle.
INFILE *
TRAILING NULLCOLS
( data1,
data2
)
BEGINDATA
11111,AAAAAAAAAA
22222,"A,B,C,D,"
If you need to load positional data (fixed length), look at the following control file example:
LOAD DATA
INFILE *
( data1 POSITION(1:5),
data2 POSITION(6:15)
)
BEGINDATA
11111AAAAAAAAAA
22222BBBBBBBBBB
INFILE *
SKIP 5
( data1 POSITION(1:5),
data2 POSITION(6:15)
)
BEGINDATA
11111AAAAAAAAAA
22222BBBBBBBBBB
INFILE *
BEGINDATA
11111AAAAAAAAAA991201
22222BBBBBBBBBB990112
LOAD DATA
INFILE 'mail_orders.txt'
BADFILE 'bad_orders.txt'
APPEND
( addr,
city,
state,
zipcode,
mailing_state
)
Can one load data into multiple tables at once?
Look at the following control file:
LOAD DATA
INFILE *
REPLACE
Can one selectively load only the records that one needs?
Look at this example, where (01) is the first character and (30:37) are characters 30 to 37:
LOAD DATA
APPEND
WHEN (01) <> 'H' and (01) <> 'T' and (30:37) = '19991217'
( field1,
field2 FILLER,
field3
)
• CONCATENATE - use when SQL*Loader should combine the same number of physical
records together to form one logical record.
• CONTINUEIF - use if a condition indicates that multiple records should be treated as
one, for example a '#' character in column 1.
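As a rough sketch of where each clause sits in a control file, consider the two variants below. The file, table, and column names (orders.dat, orders, order_id, descr) are hypothetical, and the '#' flag simply follows the example in the text. Only one of the two clauses would appear in a given control file.

```sql
-- Variant 1: every logical record is built from exactly 3 physical records
LOAD DATA
INFILE 'orders.dat'          -- hypothetical data file
CONCATENATE 3
INTO TABLE orders            -- hypothetical table
FIELDS TERMINATED BY ','
(order_id, descr)

-- Variant 2: a '#' in column 1 of the current physical record means
-- the next physical record belongs to the same logical record
LOAD DATA
INFILE 'orders.dat'
CONTINUEIF THIS (1) = '#'
INTO TABLE orders
FIELDS TERMINATED BY ','
(order_id, descr)
```

CONCATENATE suits files where the record count per row never varies; CONTINUEIF suits files where a flag marks continuation lines.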
How can one get SQL*Loader to COMMIT only at the end of the load file?
One cannot, but by setting the ROWS= parameter to a large value, committing can be reduced.
Make sure you have big rollback segments ready when you use a high value for ROWS=.
1. A very simple but easily overlooked hint is not to have any indexes and/or constraints
(primary key) on your load tables during the load process. Indexes and constraints
significantly slow down load times, even with ROWS= set to a high value.
2. Add the following option in the command line: DIRECT=TRUE. This will effectively
bypass most of the RDBMS processing. However, there are cases when you can't use
direct load. Refer to chapter 8 of the Oracle Server Utilities manual.
3. Turn off database logging by specifying the UNRECOVERABLE option. This option can
only be used with direct data loads.
4. Run multiple load jobs concurrently.
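Tips 2 and 3 can also be written into the control file itself. The following is a minimal sketch, assuming a hypothetical data file big_load.dat and a hypothetical table big_table with two columns. Check the restrictions on direct path loads for your release before relying on it.

```sql
OPTIONS (DIRECT=TRUE)        -- tip 2: direct path load
UNRECOVERABLE                -- tip 3: no redo logging (direct path only)
LOAD DATA
INFILE 'big_load.dat'        -- hypothetical data file
APPEND
INTO TABLE big_table         -- hypothetical table
FIELDS TERMINATED BY ','
(col1, col2)
```

Because an UNRECOVERABLE load is not logged, the loaded data cannot be recovered from the redo logs; a backup after the load is the usual safeguard.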
How does one use SQL*Loader to load images, sound clips and documents?
SQL*Loader can load data from a "primary data file", an SDF (Secondary Data File, for loading
nested tables and VARRAYs) or a LOBFILE. The LOBFILE method provides an easy way to load
documents, images and audio clips into BLOB and CLOB columns. Look at this example:
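CREATE TABLE image_table (   -- table name assumed; the source omits this line
image_id NUMBER(5),
file_name VARCHAR2(30),
image_data BLOB);
Control File:
LOAD DATA
INFILE *
REPLACE
INTO TABLE image_table
FIELDS TERMINATED BY ','
(
image_id INTEGER EXTERNAL(5),
file_name CHAR(30),
image_data LOBFILE (file_name) TERMINATED BY EOF
)
BEGINDATA
001,image1.gif
002,image2.jpg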
Example:
=======
LOAD DATA
INFILE 'month.dat'
INTO TABLE register
(tx_type POSITION(1:10),
acct POSITION(13:17),
amt POSITION(20:24) ":amt/100"
)
Restrictions:
============
===============================================================================
LOADING DATABASE SEQUENCES
===============================================================================
Example-I:
=========
In the first example, all of the fields are located in the datafile based on
position, which makes this easier. Another example below covers data that is
comma delimited.
LOAD DATA
INFILE *
INTO TABLE load_db_seq_positional
(seq_number "db_seq.nextval",
data1 POSITION(1:5),
data2 POSITION(6:15)
)
BEGINDATA
11111AAAAAAAAAA
22222BBBBBBBBBB
Example-II:
==========
In this example, the data fields are comma delimited. Because the fields are
delimited, SQL*Loader will expect to find values for the field SEQ_NUMBER in
the data file. Since such entries do not exist, we must put the SEQ_NUMBER
field last in the control file and then use the TRAILING NULLCOLS clause to
tell SQL*Loader that on some lines (in this case all), there may be "trailing
columns" which are null or non-existent.
Here is a similar CREATE TABLE statement; we will use the same sequence:
LOAD DATA
INFILE *
INTO TABLE load_db_seq_delimited
FIELDS TERMINATED BY ","
TRAILING NULLCOLS
(data1,
data2,
seq_number "db_seq.nextval"
)
BEGINDATA
11111,AAAAAAAAAA
22222,BBBBBBBBBB
Restrictions:
============
===============================================================================
LOADING USERNAME OF USER RUNNING SQL*LOADER
===============================================================================
3. How do you load the username of the user running the SQL*Loader session?
Example-I:
=========
In this example, all of the fields are located in the datafile based on
position, which makes this easier. Another example below, which is slightly
more difficult, covers data that is comma delimited. Both methods take
advantage of the "USER" pseudo-variable. If you prefer to use the Oracle User
ID number, you could use "UID" instead.
LOAD DATA
INFILE *
INTO TABLE load_user_positional
(username "USER",
data1 POSITION(1:5),
data2 POSITION(6:15)
)
BEGINDATA
11111AAAAAAAAAA
22222BBBBBBBBBB
Example-II:
==========
In this example, the data fields are comma delimited. Because the fields are
delimited, SQL*Loader will expect to find values for the field USERNAME in the
data file. Since such entries do not exist, we must put the USERNAME field
last in the control file and then use the TRAILING NULLCOLS clause to tell
SQL*Loader that on some lines (in this case all), there may be "trailing
columns" which are null or non-existent.
LOAD DATA
INFILE *
INTO TABLE load_user_delimited
FIELDS TERMINATED BY ","
TRAILING NULLCOLS
(data1,
data2,
username "USER"
)
BEGINDATA
11111,AAAAAAAAAA
22222,BBBBBBBBBB
Restrictions:
============
SQL*Loader is a product for moving data from external files into tables in an ORACLE
database. SQL*Loader loads data in a variety of formats, performs filtering (selectively
loading records based upon the data values), and loads multiple tables simultaneously.
During execution SQL*Loader produces a detailed log file with statistics about the load,
and may also produce a bad file (records rejected because of incorrect data) and a discard
file (records that did not meet your selection criteria). You have control over several
loading options.
You must provide two types of input to SQL*Loader to load data from external files into
an ORACLE database: the data itself, and control information describing how to perform
the load.
You must provide a file called the control file as an input to SQL*Loader. The control
file tells SQL*Loader how to interpret the data file. For example, it describes the
following:
The control file's datatype specifications tell SQL*Loader how to interpret the fields in
the data files. SQL*Loader uses this information when working with the fields, and uses
it to describe the data that is being passed to ORACLE. ORACLE then converts the data
into the datatype specified by the table definition.
Some information is mandatory (such as where to find the data and how it corresponds to
the database tables). However, many options are also available to describe and
manipulate the file data. For example, the instructions can include directions on how to
format or filter the data, or to generate unique ID numbers.
You may load data in various formats. It is usually read from one or more data files, but
the data may also be placed in the control file after the control file information.
Data records may be in fixed or variable format. In fixed format, the data is contained in
records which all have the same (fixed) format. That is, the records have a fixed length,
and the data fields in those records have fixed length, type, and position.
In variable format (sometimes called stream format), each record is only as long as
necessary to contain the data. With character data, if the first item is shorter than the
second one, the first record is shorter. Also, the type of data in each record may vary. One
record may contain a character string, the next may contain seven integers, the third may
contain three decimals and a float, and so on. Operating systems use a record terminator
character (such as newline) to mark where variable records end.
1,1,2,3,5,8,13
"BUNKY"
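For contrast with the variable-length samples, the fixed format described above might be declared as in the sketch below. The file name, table name, and the 15-byte record length are assumptions for illustration, and the "fix 15" processing string is operating-system dependent, so confirm the exact form in the platform documentation for your release.

```sql
LOAD DATA
INFILE 'fixed.dat' "fix 15"  -- hypothetical file; every record is exactly 15 bytes
INTO TABLE load_fixed        -- hypothetical table
(data1 POSITION(1:5),
 data2 POSITION(6:15))
```

Without a processing string, the file is read in the default terminated format, where each record ends at the operating system's record terminator.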
A final distinction concerns the difference between logical and physical records. A record
or line in a file (either of fixed length or terminated) is referred to as a physical record.
A logical record, on the other hand, corresponds to a row in a database table. Sometimes
the logical and physical records are equivalent; such is the case when only a few short
columns are being loaded. However, sometimes several physical records must be
combined to make one logical record.
The examples below illustrate some of the features of SQL*Loader.
LOAD DATA
INFILE 'ulcase5.dat'
BADFILE 'ulcase5.bad'
DISCARDFILE 'ulcase5.dsc'
a. REPLACE
-- PROJ has two columns, both not null: EMPNO and PROJNO
---------------------------------------------------------------
NOTES:
(a) REPLACE indicates that if there is data in the tables to be loaded (EMP and PROJ),
that data should be deleted before new rows are loaded.
(b) Multiple INTO clauses are used to load two tables, EMP and PROJ. The same set of
records is processed three times using different combinations of columns each time, to
load table PROJ.
(c) WHEN is used to load only rows with non-blank project numbers. When PROJNO is
defined as columns 25..27, rows are inserted into PROJ only if there is a value in
those columns.
LOAD DATA
a. INFILE *
b. APPEND
projno,
e. loadseq SEQUENCE(MAX,1))
f. BEGINDATA
7782, "CLARK", "Manager", 7839, 09-June-1981, 2572.50, 10:101
---------------------------------------------------------------
NOTES:
(a) INFILE * signifies the data is found at the end of the control file.
(b) APPEND indicates that data may be loaded even if the table already contains rows;
the table need not be empty.
(c) The default terminator for the data fields is a comma, and some fields may be
enclosed by a double quote.
(d) The data to be loaded into column HIREDATE appears in the format DD-Month-YYYY.
(f) BEGINDATA signifies the end of the control information and the beginning of the
data.
Only a subset of the syntax will be explained below. For a complete explanation of the
above syntax, see chapter 6 of the "ORACLE7 Server Utilities User's Guide".
Comments
Comments may appear anywhere in the command section of the file, but they should not
appear in the data. Comments are preceded with a double dash, which may appear
anywhere on a line. All text to the right of the double dash is ignored, until the end of
line.
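The comment rules above can be sketched as follows. The file name dept.dat and the table and column names are placeholders chosen for illustration.

```sql
-- Load department rows; this whole line is a comment
LOAD DATA
INFILE 'dept.dat'            -- hypothetical data file
INTO TABLE dept              -- text after the double dash is ignored
FIELDS TERMINATED BY ','
(deptno, dname, loc)
```

Nothing after BEGINDATA should be commented this way, since a double dash there would be read as data.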
The OPTIONS clause is useful when you usually invoke a control file with the same set
of options, or when the command line and all its arguments become very long. This
clause allows you to specify runtime arguments in the control file rather than on the
command line.
Values specified on the command line override values specified in the control file. With
this precedence, the OPTIONS keyword in the control file establishes default values that
are easily changed from the command line.
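A minimal sketch of an OPTIONS clause, with illustrative values and hypothetical file, table, and column names, might look like this:

```sql
OPTIONS (ERRORS=100, ROWS=500)   -- defaults for this control file (values are illustrative)
LOAD DATA
INFILE 'dept.dat'                -- hypothetical data file
APPEND
INTO TABLE dept
FIELDS TERMINATED BY ','
(deptno, dname, loc)
```

Under the precedence rule above, supplying a different ROWS value on the command line would take effect in place of the ROWS=500 written here.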
When a load is discontinued, any data already loaded remains in the tables, and the tables
are left in a valid state. SQL*Loader's log file tells you the state of the tables and indexes
and the number of logical records already read from the input data file. Use this
information to resume the load where it left off.
For example:
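A resumed load could be sketched as follows, assuming the log file reported that 345 logical records had already been read. The count and the file, table, and column names are made up for illustration.

```sql
OPTIONS (SKIP=345)           -- skip the logical records already processed
LOAD DATA
INFILE 'emp.dat'             -- hypothetical data file
APPEND                       -- the table already holds rows from the first run
INTO TABLE emp
FIELDS TERMINATED BY ','
(empno, ename, sal)
```

APPEND is used here because the default INSERT method would stop with an error on a table that already contains rows.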
INSERT - This is the default option. It requires the table to be empty before loading.
SQL*Loader terminates with an error if the table contains rows.
APPEND - If data already exists in the table, SQL*Loader appends the new rows to it; if
data doesn't already exist, the new rows are simply loaded.
REPLACE - All rows in the table are deleted and the new data is loaded. This option
requires DELETE privileges on the table.
You can create one logical record from multiple physical records using
CONCATENATE and CONTINUEIF. See chapter 6 of the "ORACLE7 Server Utilities
User's Guide".
The INTO TABLE clause may continue with some options for loading that table. For
example, you may specify different options (INSERT, APPEND, REPLACE) for each
table in order to tell SQL*Loader what to do if data already exists in the table.
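As an illustration of per-table options, the sketch below loads two tables from one file with different methods. The file name and column positions are assumptions; EMP and PROJ are used only as example table names.

```sql
LOAD DATA
INFILE 'emp_proj.dat'        -- hypothetical data file
INTO TABLE emp APPEND        -- keep existing EMP rows, add the new ones
  (empno POSITION(1:4)  INTEGER EXTERNAL,
   ename POSITION(6:15) CHAR)
INTO TABLE proj REPLACE      -- delete existing PROJ rows, then load
  (empno  POSITION(1:4)   INTEGER EXTERNAL,
   projno POSITION(25:27) INTEGER EXTERNAL)
```

Explicit POSITION clauses let the second INTO TABLE clause read each record again from the first column.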
The WHEN clause appears after the table name and is followed by one or more field
conditions. For example, the following clause indicates that any record with the value
"q" in the fifth column position should be loaded:
WHEN (5) = 'q'
A WHEN clause can contain several comparisons as long as each is preceded by AND.
Parentheses are optional but should be used for clarity with multiple comparisons joined
by AND. For example:
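One way such a clause might look is sketched below. The field names, positions, and comparison values are hypothetical and chosen only to show two parenthesized comparisons joined by AND.

```sql
LOAD DATA
INFILE 'emp.dat'             -- hypothetical data file
APPEND
INTO TABLE emp
WHEN (deptno = '10') AND (job = 'SALES')
(empno  POSITION(1:4)   INTEGER EXTERNAL,
 job    POSITION(17:25) CHAR,
 deptno POSITION(50:51) CHAR)
```

Records that fail either comparison are not loaded into the table; they are written to the discard file if one is specified.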
To evaluate the WHEN clause, SQL*Loader first determines the values of all the fields in
the record. Then the WHEN clause is evaluated. A row is inserted into the table only if
the WHEN clause is true.
When the control file specifies more fields for a record than are present in the record,
SQL*Loader must determine whether the remaining (specified) columns should be
considered null, or whether an error should be generated. The TRAILING NULLCOLS
clause tells SQL*Loader to treat any relatively positioned columns that are not present in
the record as null columns. For example, if the following data
10 Accounting
is read with the following control file
TRAILING NULLCOLS
and the record ends after DNAME, then the remaining LOC field is set to null. Without
the TRAILING NULLCOLS clause, an error would be generated, due to missing data.
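A complete control file for this situation might look like the sketch below. The file and table names and the DEPTNO field are assumptions; DNAME and LOC are the fields named in the text.

```sql
LOAD DATA
INFILE 'dept.dat'            -- hypothetical; contains the line: 10 Accounting
INTO TABLE dept
TRAILING NULLCOLS
(deptno CHAR TERMINATED BY ' ',
 dname  CHAR TERMINATED BY WHITESPACE,
 loc    CHAR TERMINATED BY WHITESPACE)   -- absent from the record, so loaded as NULL
```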
Specifying Datatypes
The datatype specification in the control file tells SQL*Loader how to interpret the
information in the data file. The server defines the datatypes for the columns in the
database. SQL*Loader extracts data from a field in the input file, guided by the datatype
specification in the control file. SQL*Loader then sends the field to the server to be
stored in the appropriate column. The server does any data conversion necessary to store
the data in the proper internal format. The datatype of the data in the file does not
necessarily have to be the same as the datatype of the column in the ORACLE table.
ORACLE automatically performs conversions - but you need to ensure that the
conversion makes sense and does not generate errors. SQL*Loader does not contain
datatype specifications for ORACLE internal datatypes like NUMBER or VARCHAR2.
SQL*Loader's datatypes describe data that can be produced with text editors (character
datatypes) and with standard programming languages (native datatypes).
Native Datatypes
Some datatypes consist entirely of binary data, or contain binary data in their
implementation. These non-character datatypes are the native datatypes:
These datatypes will not be discussed as most of the datatypes that you will be using will
be character datatypes. For more information on SQL*Loader datatypes, see page 6-52 of
the "ORACLE7 Server Utilities User's Guide".
Character Datatypes
The character datatypes are CHAR, DATE, and the numeric EXTERNAL datatypes
(INTEGER and DECIMAL). These fields can be delimited, and can have lengths (or
maximum lengths) specified in the control file.
CHAR - This data field contains character data. The length is optional, and is taken from
the POSITION specification if it is not present here. If present, this length overrides the
length in the POSITION specification. If no length is given, CHAR data is assumed to
have a length of 1. A field of datatype CHAR may also be variable-length delimited or
enclosed.
To Load LONG Data: If the column in the database table is defined as LONG, you must
explicitly specify a maximum length either with a length-specifier on the CHAR
keyword, or with the POSITION keyword. This guarantees that a large enough buffer is
allocated for the value, and is necessary even if the data is delimited or enclosed.
DATE - This data is character data that should be converted to an ORACLE date using
the specified date mask. The length specification is optional, unless a varying-length date
mask is specified. With a mask such as "Month dd, YYYY", the mask itself is 14 characters
long, while a date string such as "September 30, 1991" is 18 characters. In this case, a
length must be specified. Similarly, a length is required for any Julian dates (date mask
"J"); a field length is required any time the length of the date string could exceed the
length of the mask. An explicit length specification, if present, overrides the length in
the POSITION clause. Either of these overrides the length derived from the mask. The mask
may be any valid ORACLE date mask. If you omit the mask, the default ORACLE date mask of
"dd-mon-yy" is used. See Chapter 6 for the Oracle date masks.
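The length rule for varying-length masks can be illustrated with a short sketch. The file name, table, column names, and positions are assumptions; the 18-character length is chosen to fit the longest month name with a two-digit day.

```sql
LOAD DATA
INFILE 'hires.dat'           -- hypothetical data file
INTO TABLE emp
(empno    POSITION(1:4)  INTEGER EXTERNAL,
 hiredate POSITION(6:23) DATE(18) "Month dd, YYYY")   -- explicit length of 18
```

Here the POSITION range and the explicit length agree; if they differed, the explicit length would take precedence, as described above.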
Numeric EXTERNAL - The numeric external datatypes are the numeric datatypes
(INTEGER, FLOAT, DECIMAL, and ZONED) specified with the EXTERNAL keyword
along with optional length and delimiter specifications. These datatypes are the human-
readable, character form of numeric data.
The data is a number in character form (not binary representation). As such, these
datatypes are identical to CHAR and are treated identically, with one exception: the use
of DEFAULTIF. If you want the default to be null, use CHAR; if you want it to be zero,
use EXTERNAL.
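The DEFAULTIF difference can be sketched with two fields that are tested for blanks. The file name, table, column names, and positions are hypothetical.

```sql
LOAD DATA
INFILE 'emp.dat'             -- hypothetical data file
INTO TABLE emp
(empno   POSITION(1:4)   INTEGER EXTERNAL,
 comm    POSITION(20:27) DECIMAL EXTERNAL DEFAULTIF comm = BLANKS,     -- blank loads as 0
 remarks POSITION(30:49) CHAR             DEFAULTIF remarks = BLANKS)  -- blank loads as NULL
```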
>>----INTEGER-----EXTERNAL--------------------------------------
   |___DECIMAL_|
   |___ZONED___|