Chapter 10: SQL*Loader

In This Chapter

- Introducing SQL*Loader
- Understanding the SQL*Loader control file
- Understanding the SQL*Loader command
- Studying SQL*Loader examples

SQL*Loader is an Oracle utility that enables you to efficiently load large amounts of data into a database. If you have data in a flat file, such as a comma-delimited text file, and you need to get that data into an Oracle database, SQL*Loader is the tool to use. This chapter introduces you to the SQL*Loader utility, discusses its control file, provides the syntax for the SQL*Loader command, and presents examples of using SQL*Loader to load data into databases.
Introducing SQL*Loader
SQL*Loader's sole purpose in life is to read data from a flat file and to place that data into an Oracle database. In spite of having such a singular purpose, SQL*Loader is one of Oracle's most versatile utilities. Using SQL*Loader, you can do the following:

- Load data from a delimited text file, such as a comma-delimited file
- Load data from a fixed-width text file
- Load data from a binary file
- Combine multiple input records into one logical record
- Store data from one logical record into one table or into several tables
- Write SQL expressions to validate and transform data as it is being read from a file
- Combine data from multiple data files into one
- Filter the data in the input file, loading only selected records
- Collect bad records (that is, records that won't load) into a separate file where you can fix them
- And more!

The alternative to using SQL*Loader would be to write a custom program each time you needed to load data into your database. SQL*Loader frees you from that, because it is a generic utility that can be used to load almost any type of data. Not only is SQL*Loader versatile, it is also fast. Over the years, Oracle has added support for direct-path loads and for parallel loads, all in an effort to maximize the amount of data that you can load in a given time period.
Figure 10-1: The control file tells SQL*Loader how to interpret the data in the flat file. (The figure shows SQL*Loader reading the flat file as directed by the control file, writing the loaded data to the database datafiles, and writing records that are not loaded to a discard file.)
Control files, such as the one illustrated in Figure 10-1, contain a number of commands and clauses describing the data that SQL*Loader is reading. Control files also tell SQL*Loader where to store that data, and they can define validation expressions for the data. Understanding control file syntax is crucial to using SQL*Loader effectively.

The control file is aptly named, because it controls almost every aspect of how SQL*Loader operates. The control file describes the format of the data in the input file and tells SQL*Loader which tables and columns to populate with that data. When you write a control file, you need to be concerned with these questions:

- What file, or files, contain the data that you want to load?
- What table, or tables, are you loading?
- What is the format of the data that you are loading?
- What do you want to do with records that won't load?

All of these items represent things that you specify when you write a SQL*Loader control file. Generally, control files consist of one long command that starts out like this:
LOAD DATA
The keyword DATA is optional. Everything else in the control file is a clause of some sort that is added onto this command.

SQL*Loader is a broad subject that's difficult to condense into one chapter. The control file clauses shown in this chapter are the ones most commonly used when loading data from text files. The corresponding examples will help you understand SQL*Loader and how it's used, and should provide enough background for you to easily use the other features of SQL*Loader as explained in the Oracle8i Server Utilities manual.

Many of the control file clauses you'll encounter in this chapter are explained by example. The task of loading data into the following table forms the basis for those examples:
CREATE TABLE animal_feeding (
   animal_id     NUMBER,
   feeding_date  DATE,
   pounds_eaten  NUMBER (5,2),
   note          VARCHAR2(80)
);
Some examples are based on loading data from a fixed-width text file into the animal_feeding table, while others are based on loading the same data from a comma-delimited file.
If you do include your data in the control file, the last clause of your LOAD command must be the BEGINDATA clause. This tells SQL*Loader where the command ends and where your data begins. SQL*Loader will begin reading data from the line immediately following BEGINDATA.
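When the data is embedded in the control file this way, you use an asterisk in place of a file name in the INFILE clause. Here is a minimal sketch (the field list and data lines are illustrative):

LOAD DATA
   INFILE *
   APPEND
   INTO TABLE animal_feeding
   TRAILING NULLCOLS
   (
      animal_id     INTEGER EXTERNAL TERMINATED BY ",",
      feeding_date  DATE "dd-mon-yyyy" TERMINATED BY ",",
      pounds_eaten  DECIMAL EXTERNAL TERMINATED BY ",",
      note          CHAR TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
   )
BEGINDATA
100,1-jan-2000,23.5,"Flipper seemed unusually hungry today."
151,1-jan-2000,55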
Placing quotes around the file name often isn't necessary, but it's a good habit to get into. If the file name happens to match a SQL*Loader keyword, contains some strange punctuation, or is case sensitive (UNIX), you could run into problems unless it's quoted. You can use either single or double quotes. If necessary, you may include a path as part of the file name. The default extension is .dat.
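To read from several files in one load, you simply list multiple INFILE clauses, one per file. For example (the file names here are illustrative):

INFILE 'animal_feeding_jan.dat'
INFILE 'animal_feeding_feb.dat'
INFILE 'animal_feeding_mar.dat'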
When you specify multiple files like this, SQL*Loader will read them in the order in which they are listed.
INSERT. Specifies that you are loading an empty table. SQL*Loader will abort the load if the table contains data to start with.
APPEND. Specifies that you are adding data to a table. SQL*Loader will proceed with the load even if preexisting data is in the table.
REPLACE. Specifies that you want to replace the data in a table. Before loading, SQL*Loader will delete any existing data.
TRUNCATE. Specifies the same as REPLACE, but SQL*Loader uses the TRUNCATE statement instead of a DELETE statement to delete existing data.
Place the keyword for whichever option you choose after the INFILE clause, as shown in this example:
LOAD DATA
   INFILE 'animal_feeding.csv'
   APPEND
   ...
INTO TABLE animal_feeding
(
   animal_id     POSITION (1:3)   INTEGER EXTERNAL,
   feeding_date  POSITION (4:14)  DATE "dd-mon-yyyy",
   pounds_eaten  POSITION (15:19) ZONED (5,2),
   note          POSITION (20:99) CHAR
)
The table name shown in this example is the animal_feeding table. The same issues apply to table names as to file names: if the table name matches a reserved word or is case sensitive, enclose it within quotes.

A big part of the INTO TABLE clause is the list of field definitions. These describe the input file format for SQL*Loader and map the data in the input file onto the appropriate columns within the table being loaded. The sections "Describing delimited columns" and "Describing fixed-width columns," later in this chapter, explain more about writing field definitions.
INTO TABLE animal_feeding
(
   animal_id     POSITION (1:3)   INTEGER EXTERNAL,
   feeding_date  POSITION (4:14)  DATE "dd-mon-yyyy",
   pounds_eaten  POSITION (15:19) ZONED (5,2)
)
INTO TABLE animal_feeding_note
(
   animal_id     POSITION (1:3)   INTEGER EXTERNAL,
   feeding_date  POSITION (4:14)  DATE "dd-mon-yyyy",
   note          POSITION (20:99) CHAR
)
In this example, animal_id and feeding_date are loaded into both tables. After that, however, the animal_feeding table gets the pounds_eaten value, while the animal_feeding_note table gets the note value.
The problem that you experience here is that SQL*Loader works through delimited fields in the order in which they are listed, and this order cuts across all the INTO
TABLE clauses. Thus, SQL*Loader would expect animal_id for the animal_feeding_note table to follow the pounds_eaten value in the input file. The second feeding_date would have to follow that, and so forth. To reset SQL*Loader to the beginning of the line where the second INTO TABLE clause is applied to a record, you need to add a POSITION clause to the first field listed for that table. The LOAD statement in Listing 10-3 would work here.
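A sketch of such a LOAD statement, reconstructed from the points discussed next (the file name is assumed), looks like this:

LOAD DATA
   INFILE 'animal_feeding.csv'
   APPEND
   INTO TABLE animal_feeding
   (
      animal_id     INTEGER EXTERNAL TERMINATED BY ",",
      feeding_date  DATE "dd-mon-yyyy" TERMINATED BY ",",
      pounds_eaten  DECIMAL EXTERNAL TERMINATED BY ","
   )
   INTO TABLE animal_feeding_note
   TRAILING NULLCOLS
   (
      animal_id     POSITION (1) INTEGER EXTERNAL TERMINATED BY ",",
      feeding_date  DATE "dd-mon-yyyy" TERMINATED BY ",",
      pounds_eaten  FILLER DECIMAL EXTERNAL TERMINATED BY ",",
      note          CHAR TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
   )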
Notice the following in the example shown in Listing 10-3:

- The second definition of animal_id contains the clause POSITION (1). This causes SQL*Loader to start scanning from the first character of the record. This is the behavior you want because you are loading the same field into two tables. Otherwise, SQL*Loader would look for another animal_id following the pounds_eaten column.
- The TRAILING NULLCOLS clause has been added to the second INTO TABLE clause because not all records in the input file contain notes.
- Even though you aren't storing it in the animal_feeding_note table, the pounds_eaten column doesn't go away. The FILLER keyword has been used to specify that SQL*Loader not load this field.

You can see that life does get a bit complex when loading a delimited file into multiple tables.
Describing fixed-width columns

When you load fixed-width data, each field definition consists of a column name, a POSITION clause, and a datatype:

column_name. The name of a column in the table that you are loading.
POSITION (start:end). The position of the column within the record. The values for start and end represent the character positions for the first and last characters of the column. The first character of a record is always position 1.
datatype. A SQL*Loader datatype (not the same as an Oracle datatype) that identifies the type of data being loaded. Table 10-1 lists some of these.

You will need to write one field list entry for each column that you are loading. As an example, consider the following record:
10010-jan-200002350Flipper seemed unusually hungry today.
This record contains a three-digit ID number, followed by a date, followed by a five-digit number, followed by a text field. The ID number occupies character positions 1 through 3 and is an integer, so its definition would look like this:
animal_id POSITION (1:3) INTEGER EXTERNAL,
The date field is next, occupying character positions 4 through 14, and its definition looks like this:
feeding_date POSITION (4:14) DATE "dd-mon-yyyy",
Notice the "dd-mon-yyyy" string following the datatype. This tells SQL*Loader the specific format used for the date field. SQL*Loader uses this in a call to Oracle's built-in TO_DATE function, so any format that works for TO_DATE may be specified for SQL*Loader DATE fields.

You could continue to use the same method to define the rest of the fields that you want to load. The complete field list would look like this:
(
   animal_id     POSITION (1:3)   INTEGER EXTERNAL,
   feeding_date  POSITION (4:14)  DATE "dd-mon-yyyy",
   pounds_eaten  POSITION (15:19) ZONED (5,2),
   note          POSITION (20:99) CHAR
)
Table 10-1
Commonly Used SQL*Loader Datatypes

CHAR. Character data.
DATE [format]. A date, optionally followed by a format mask that is passed to Oracle's TO_DATE function.
INTEGER EXTERNAL. Integer data represented in character (text) form.
DECIMAL EXTERNAL. Decimal data, including an optional decimal point, represented in character (text) form.
ZONED (precision, scale). Zoned decimal data, with an assumed decimal point determined by the scale.
Note
Be careful with the ZONED datatype. It can be handy for loading numeric values with assumed decimal places, but you have to be aware of how it expects the sign to be represented. This datatype harks back to the old card-punch days when data was stored on 80-character-wide punch cards. The sign for zoned decimal numbers was stored as an overpunch on one of the digits. The practical effect of that is that the ZONED datatype will not recognize a hyphen (-) as a negative sign. It will, however, recognize some letters of the alphabet as valid digits. If you're not loading true zoned decimal data, such as a COBOL program might create, then use ZONED only for non-negative numbers.
SQL*Loader supports a number of other datatypes, most of which are well beyond the scope of this chapter. One other that you will read about later is the LOBFILE type. An example near the end of this chapter shows you how to load files into CLOB columns.
Suppose a record is missing the three-digit animal ID number, and character positions 1 through 3 contain only spaces. Should that be interpreted as a null value? Or should the field be left alone, causing the record to be rejected because spaces do not constitute a valid number? The latter behavior is the default. If you prefer to treat a blank field as a null, you can use the NULLIF clause to tell SQL*Loader to do that. The NULLIF clause comes after the datatype and takes the following form:
NULLIF field_name=BLANKS
To define animal_id so that blank values are stored as nulls, you would use this definition:
animal_id POSITION (1:3) INTEGER EXTERNAL NULLIF animal_id=BLANKS,
You can actually have any valid SQL*Loader expression following the NULLIF clause, but comparing the column to BLANKS is the most common approach taken.
Describing delimited columns

Delimited field definitions consist of the following elements:

column_name. The name of a column in the table that you are loading.
datatype. A SQL*Loader datatype. (See Table 10-1.)
TERMINATED BY 'delimiter'. Identifies the delimiter that marks the end of the column.
OPTIONALLY ENCLOSED BY 'character'. Specifies an optional enclosing character. Many text values, for example, are enclosed by quotation marks.
When describing delimited fields, you must be careful to describe them in the order in which they occur. Take a look at this record, which contains some delimited data:
100,1-jan-2000,23.5,"Flipper seemed unusually hungry today."
The first field in the record is a three-digit number, an ID number in this case, and can be defined as follows:
animal_id INTEGER EXTERNAL TERMINATED BY ",",
The remaining fields can be defined similarly to the first. However, the note field represents a special case because it is enclosed within quotation marks. To account for that, you must add an ENCLOSED BY clause to that field's definition. For example:
note CHAR TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
The keyword OPTIONALLY tells SQL*Loader that the quotes are optional. If they are there, SQL*Loader will remove them. Otherwise, SQL*Loader will load whatever text it finds.
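Consider, for example, these two comma-delimited records (illustrative lines consistent with the chapter's sample data):

100,1-jan-2000,23.5,"Flipper seemed unusually hungry today."
151,1-jan-2000,55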
The first record contains a note, while the second does not. SQL*Loader's default behavior is to consider the second record an error because not all fields are present. You can change this behavior, and cause SQL*Loader to treat missing values at the end of a record as nulls, by using the TRAILING NULLCOLS clause. This clause is part of the INTO TABLE clause and appears as follows:
...
INTO TABLE animal_feeding
TRAILING NULLCOLS
(
   animal_id     INTEGER EXTERNAL TERMINATED BY ",",
   feeding_date  DATE "dd-mon-yyyy" TERMINATED BY ",",
   pounds_eaten  DECIMAL EXTERNAL TERMINATED BY ",",
   note          CHAR TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
)
When you use TRAILING NULLCOLS, any missing fields in the record will be saved in the database as nulls.
Error-causing records
When SQL*Loader reads a record from the input file and, for one reason or another, is unable to load that record into the database, two things happen:

- An error message is written to the log file.
- The record that caused the error is written to another file called the bad file.

Bad files have the same format as the input file from which they were created. The reason that SQL*Loader writes bad records to a bad file is to make it easy for you to find and correct the errors. Once the load is done, you can edit the bad file (assuming that it is text), correct the errors, and resubmit the load using the same control file as was originally used.

The default name for the bad file is the input file name, but with the extension .bad. You can specify an alternate bad file name as part of the INFILE clause. For example:
INFILE 'animal_feeding.csv' BADFILE 'animal_feeding_bad.bad'
Each input file gets its own bad file. If you are using multiple INFILE clauses, each of those can specify a different bad file name.
Concatenating records
SQL*Loader has the ability to combine multiple physical records into one logical record. You can do this in one of two ways: you can combine a fixed number of physical records into one logical record, or you can base that determination on the value of some field in the record.
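For the fixed-count case, you use the CONCATENATE clause. A minimal sketch, assuming that every two physical records make up one logical record and that the data file is named animal_feeding_concat.dat, looks like this:

LOAD DATA
   INFILE 'animal_feeding_concat.dat'
   CONCATENATE 2
   APPEND
   INTO TABLE animal_feeding
   ...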
In this example, every two physical records in the input file will be combined into one longer, logical record. The effect will be as if you took the second record and added it to the end of the first record. The following two records, for example:
100,1-jan-2000,23.5,
"Flipper seemed unusually hungry today."

would be read as this single logical record:

100,1-jan-2000,23.5,"Flipper seemed unusually hungry today."
The CONCATENATE clause is the appropriate choice if the number of records to be combined is always the same. Sometimes, however, you have to deal with cases where a particular field in a record determines whether the record is continued. For those cases, you must use CONTINUEIF.
CONTINUEIF THIS
Use this option if each record in your input file contains a flag indicating whether the next record should be considered a continuation of the current record.

CONTINUEIF NEXT
Use this option if the continuation flag is not in the first record to be continued, but rather in each subsequent record.

CONTINUEIF LAST
Use this option if the continuation flag is always the last nonblank character or string of characters in the record.
Once you've made this choice, your next two tasks are to specify the string that marks a continued record and to tell SQL*Loader the character positions where that string can be found. Let's say you have an input file that uses a dash as a continuation character and that looks like this:
-17510-jan-200003550
Paintuin skipped his first meal.
-19910-jan-200000050
Nosey wasn't very hungry today.
20210-jan-200002200
The hyphen (-) character in the first column of a line indicates that the record is continued to the next line in the file. Records need to be concatenated until one is encountered that doesn't have a hyphen. Because the hyphen is in the record being continued, and because it is not the last nonblank character in the record, the CONTINUEIF THIS option is the appropriate one to use. The proper CONTINUEIF clause then becomes:
CONTINUEIF THIS (1:1) = '-'
The (1:1) tells SQL*Loader that the continuation string starts in column 1 and ends in column 1. The equal sign (=) tells SQL*Loader to keep combining records as long as the continuation field contains the specified string. When concatenating records, be aware that SQL*Loader removes the continuation string when it does the concatenation. Thus, the following two records:
-17510-jan-200003550
Paintuin skipped his first meal.

would be combined to form this single logical record:

17510-jan-200003550Paintuin skipped his first meal.
Notice that the leading character from each record, the one indicating whether the record is continued, has been removed. With one exception, SQL*Loader always does this. The exception is when you use CONTINUEIF LAST. When you use CONTINUEIF LAST, SQL*Loader leaves the continuation character or characters in the record. The CONTINUEIF NEXT parameter works similarly to CONTINUEIF THIS, except that SQL*Loader looks for the continuation flag in the record subsequent to the
one being processed. The CONTINUEIF LAST parameter always looks for the continuation string at the end of the record, so you don't need to specify an exact position. The CONTINUEIF LAST parameter is ideal for delimited records, and there's an example later in this chapter showing how it's used.
The SQL*Loader executable is named sqlldr. Older releases of Oracle on Windows NT embedded part of the release number into the file name, so the command would be sqlldr80, sqlldr73, and so forth.
As with other Oracle command-line utilities, SQL*Loader can accept a number of command-line arguments. SQL*Loader can also read command-line arguments from a separate parameter file (not to be confused with the control file). The syntax for the SQL*Loader command looks like this:
sqlldr [param=value[, param=value...]]
If you invoke SQL*Loader without any parameters, a short help screen will appear. This is similar to the behavior of the Export and Import utilities. Table 10-2 documents the SQL*Loader parameters.
userid

Specifies the username and password (and, optionally, a Net8 service name) to use when connecting to the database. The syntax looks like this:

userid=username[/password][@service]

control

Passes in the control file name. Here's an example:
control=[path]filename[.ext]
The default extension for control files is .ctl.
log
log=[path]filename[.ext]
The default extension used for log files is .log. If you don't supply a file name, the log file will be named to match the control file.
bad
bad=[path]filename[.ext]
The default extension for bad files is .bad. If you don't supply a file name, the bad file will be named to match the control file. Using this parameter overrides any file name that may be specified in the control file.
data
data=[path]filename[.ext]
The default extension used for data files is .dat. Specifying a data file name on the command line overrides the name specified in the control file. If no data file name is specified anywhere, it defaults to the same name as the control file, but with the .dat extension.
discard
discard=[path]filename[.ext]
The default extension used for discard files is .dis. If you don't supply a file name, the discard file will be named to match the control file. Using this parameter overrides any discard file name that may be specified in the control file.
discardmax
Optionally places a limit on the number of discarded records that will be allowed. The syntax looks like this:
discardmax=number_of_records
If the number of discarded records exceeds this limit, the load is aborted.
skip
Allows you to skip a specified number of logical records. The syntax looks like this:
skip=number_of_records
Use the skip parameter when you want to continue a load that has been aborted and when you know how far into the file you want to go before you restart.
load
Optionally places a limit on the number of logical records to load into the database. The syntax looks like this:
load=number_of_records
Once the specified limit has been reached, SQL*Loader will stop.
errors
errors=number_of_records
SQL*Loader will stop the load if more than the specified number of errors has been received. The default limit is 50. There is no way to allow an unlimited number. The best you can do is to specify a very high value, such as 999999999.
rows
Indirectly controls how often commits occur during the load process. The rows parameter specifies the size of the bind array used for conventional-path loads in terms of rows. SQL*Loader will round that value off to be some multiple of the I/O block size. The syntax for the rows parameter looks like this:
rows=number_of_rows
The default value is 64 for conventional-path loads. By default, direct-path loads save data only when the entire load is done; however, for a direct-path load, you can use this parameter to control how frequently data saves occur.
bindsize
Specifies the maximum size of the bind array. The syntax looks like this:
bindsize=number_of_bytes
The default is 65,536 bytes (64KB). If you use bindsize, any value that you specify overrides the size specified by the rows parameter.
silent
Allows you to suppress messages displayed by SQL*Loader. You can pass one or more arguments to the silent parameter, as shown in this syntax:
silent=(keyword[, keyword...])
Valid keywords are the following:

HEADER. Suppresses the introductory (header) messages.
FEEDBACK. Suppresses the "commit point reached" messages.
ERRORS. Suppresses data-related error messages.
DISCARDS. Suppresses messages related to discarded records.
ALL. Disables all the messages described above.
direct

Controls whether SQL*Loader performs a direct-path load. The syntax looks like this:

direct={true|false}
The default is false, causing a conventional-path load to be performed.
parfile
Specifies the name of a parameter file containing command-line parameters. The syntax looks like this:
parfile=[path]filename[.ext]
When the parfile parameter is encountered, SQL*Loader opens the file and reads command-line parameters from that file.
parallel
Controls whether direct loads are performed using parallel processing. The syntax looks like this:
parallel={true|false}
The default value is false.
readsize
Controls the size of the buffer used to hold data read from the input file. The syntax looks like this:

readsize=size_in_bytes

The default value is 65,536 bytes. SQL*Loader will ensure that the readsize and bindsize values match. If you specify different values for each, SQL*Loader will use the larger value for both settings.
file
Specifies the database datafile in which the data is to be stored and may be used when doing a parallel load. The syntax looks like this:
file=datafile_name
The file must be one of the files in the tablespace for the table or partition being loaded.
The positional method allows you to pass parameters without explicitly naming them. You must pass the parameters in the exact order in which Table 10-2 lists them, and you must not skip any. Converting the previous command (which named the userid and control parameters explicitly) to use positional notation yields the following:
sqlldr system/manager animal_feeding.ctl
You can even mix the two methods, passing one or more parameters by position and the remaining parameters by name. For example:
sqlldr system/manager control=animal_feeding.ctl
Since it's conventional for Oracle utilities to accept a username and password as the first parameter to a command, this last example represents a good compromise between the two methods. With the one exception of the username and password, it is recommended that you name all your parameters. You're much less likely to make a mistake that way.
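Suppose, for example, that you place the following parameters in a text file named animal_feeding.par (the connect string and file names are just for illustration):

userid=seapark/seapark@bible_db
control=animal_feeding.ctl
log=animal_feeding.log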
You could invoke SQL*Loader, and use the parameters from the text file, by issuing this command:
sqlldr parfile=animal_feeding.par
Parameter files provide a stable place in which to record the parameters used for a load and can serve as a means of documenting loads that you perform regularly.
Among other things, the examples in this section demonstrate the following:

- Loading fixed-width, columnar data, and loading from multiple files
- Using expressions to modify data before loading it
- Loading large amounts of text into a large object column

With one exception, all the examples in this section will load data into the following table:
CREATE TABLE animal_feeding (
   animal_id     NUMBER,
   feeding_date  DATE,
   pounds_eaten  NUMBER (5,2),
   note          VARCHAR2(80)
);
The one exception involves the last example, which shows you how to load large objects. For that example, the note column is assumed to be a CLOB rather than a VARCHAR2 column. If you want to try these examples yourself, you can find the scripts on the CD in the directory sql_loader_examples.
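The first example loads comma-delimited data along these lines (a few representative records, consistent with the sample data used throughout this chapter):

100,1-jan-2000,23.5,"Flipper seemed unusually hungry today."
151,1-jan-2000,55
166,1-jan-2000,17.5,"Shorty ate Squacky."
175,1-jan-2000,35.5,"Paintuin skipped his first meal."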
This format is the typical comma-separated values (CSV) format that you might get if you had entered the data in Excel and saved it as a comma-delimited file. The fields are all delimited by commas, and the text fields are also enclosed within quotes. The following control file, named animal_feeding.ctl, would load this data:
LOAD DATA
   INFILE 'animal_feeding.csv'
   BADFILE 'animal_feeding'
   APPEND
   INTO TABLE animal_feeding
   TRAILING NULLCOLS
   (
      animal_id     INTEGER EXTERNAL TERMINATED BY ",",
      feeding_date  DATE "dd-mon-yyyy" TERMINATED BY ",",
      pounds_eaten  DECIMAL EXTERNAL TERMINATED BY ",",
      note          CHAR TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
   )
Here are some points worth noting about this control file:

- Any records that won't load because of an error will be written to a file named animal_feeding.bad. The BADFILE clause specifies the file name, and the extension .bad is used by default.
- The APPEND keyword causes SQL*Loader to insert the new data regardless of whether the table has any existing data. While not the default, the APPEND option is one you'll often want to use.
- The TRAILING NULLCOLS option is used because not all records in the input file contain a value for all fields. The note field is frequently omitted. Without TRAILING NULLCOLS, omitting a field would result in an error, and those records would be written to the bad file.
- The definition of the date field includes a format mask. This is the same format mask that you would use with Oracle's built-in TO_DATE function.

The following example shows SQL*Loader being invoked to load this data:
E:\> sqlldr seapark/seapark@bible_db control=animal_feeding

SQL*Loader: Release 8.1.5.0.0 - Production on Wed Aug 18 11:02:24 1999

(c) Copyright 1999 Oracle Corporation. All rights reserved.
The command-line parameter control is used to pass in the control file name. The extension defaults to .ctl. The same command, but with different control file names, can be used for all the examples in this section.
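Now suppose that the comma-delimited file has been reformatted so that whenever a feeding record includes a note, the note appears on a line of its own, immediately following the line that holds the numeric and date values. A few illustrative lines, based on the chapter's sample data, look like this:

100,13-jan-2000,23.5,
"Flipper seemed unusually hungry today."
151,13-jan-2000,55
166,13-jan-2000,17.5,
"Shorty ate Squacky."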
This presents an interesting problem, because sometimes you want to concatenate two records into one, and sometimes you don't. In this case, the key lies in the fact that for each logical record that contains a comment, a trailing comma (,) has been left at the end of the physical record containing the numeric and date data. You can use the control file shown in Listing 10-5, which is named animal_feeding_concat.ctl, to key off of that comma, combine records appropriately, and load the data.
Listing 10-5: A control file that combines two records into one
LOAD DATA
   INFILE 'animal_feeding_concat.csv'
   BADFILE 'animal_feeding_concat'
   APPEND
   CONTINUEIF LAST = ','
   INTO TABLE animal_feeding
   TRAILING NULLCOLS
   (
      animal_id     INTEGER EXTERNAL TERMINATED BY ",",
      feeding_date  DATE "dd-mon-yyyy" TERMINATED BY ",",
      pounds_eaten  DECIMAL EXTERNAL TERMINATED BY ",",
      note          CHAR TERMINATED BY "," OPTIONALLY ENCLOSED BY '"'
   )
There are two keys to making this approach work:

- If a line is to be continued, the last nonblank character must be a comma. Since all fields are delimited by commas, this is pretty easy to arrange.
- The CONTINUEIF LAST = ',' clause tells SQL*Loader to look for a comma at the end of each line read from the file. Whenever it finds a comma, the next line is read and appended onto the first.

You aren't limited to concatenating two lines together. You can actually concatenate as many lines as you like, as long as they each contain a trailing comma. For example, you could enter the 5-Jan-2000 feeding for animal #100 as follows:
100,
5-jan-2000,
19.5,
"Flipper's appetite has returned to normal."
All four lines can be concatenated because the first three end with a comma. The fourth line doesn't end with a comma, and that signals the end of the logical record.
You could load this data, reading from both files, using the control file shown in Listing 10-6.
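A control file along the lines of Listing 10-6 might look like this (the input file names are assumptions; the field definitions follow the fixed-width layout used earlier in the chapter):

LOAD DATA
   INFILE 'animal_feeding_1.dat' BADFILE 'animal_feeding_1'
   INFILE 'animal_feeding_2.dat' BADFILE 'animal_feeding_2'
   APPEND
   INTO TABLE animal_feeding
   TRAILING NULLCOLS
   (
      animal_id     POSITION (1:3)   INTEGER EXTERNAL,
      feeding_date  POSITION (4:14)  DATE "dd-mon-yyyy",
      pounds_eaten  POSITION (15:19) ZONED (5,2),
      note          POSITION (20:99) CHAR
   )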
Notice the following about this control file:

- Two INFILE clauses are used, one for each file. Each clause contains its own BADFILE name.
- The POSITION clause, instead of the TERMINATED BY clause, is used for each field to specify the starting and ending column for that field.
- The datatype for the pounds_eaten field has been changed from DECIMAL EXTERNAL to ZONED because the decimal point is assumed to be after the third digit and doesn't really appear in the number. For example, 123.45 is recorded in the input file as 12345. COBOL programs commonly create files containing zoned decimal data.
- The POSITION clause appears before the datatype, whereas the TERMINATED BY clause appears after the datatype. That's just the way the syntax is.

Other than the use of the POSITION clause, there really is no difference between loading fixed-width data and delimited data.
151,13-jan-2000,55
166,13-jan-2000,17.5,"Shorty ate Squacky."
145,13-jan-2000,0,"Squacky is no more."
175,13-jan-2000,35.5,"Paintuin skipped his first meal."
199,13-jan-2000,0.5,"Nosey wasn't very hungry today."
202,13-jan-2000,22.0
240,13-jan-2000,28,"Snoops was lethargic and feverish."
Imagine for a moment that you want to uppercase the contents of the note field. Imagine also that the weights in this file are in kilograms, and that you must convert those values to pounds as you load the data. You can do that by writing expressions to modify the note field and the pounds_eaten field. The following example shows you how to multiply the weight by 2.2 to convert from kilograms to pounds:
pounds_eaten DECIMAL EXTERNAL TERMINATED BY "," ":pounds_eaten * 2.2",
As you can see, the expression has been placed within quotes, and it has been added to the end of the field definition. You can use a field name within an expression, but when you do, you must precede it with a colon (:). You can use any valid SQL expression that you like, but it must be one that will work within the VALUES clause of an INSERT statement. Indeed, SQL*Loader uses the expression that you supply as part of the actual INSERT statement that it builds to load your data. With respect to the pounds_eaten example, SQL*Loader will build an INSERT statement like this:
INSERT INTO animal_feeding
   (animal_id, feeding_date, pounds_eaten, note)
VALUES
   (:animal_id, :feeding_date, :pounds_eaten * 2.2, :note)
Listing 10-7 shows a control file that will both convert from kilograms to pounds and uppercase the note field.
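A sketch of such a control file (the INFILE and BADFILE names are assumptions; the two transformed field definitions follow directly from the preceding discussion) looks like this:

LOAD DATA
   INFILE 'animal_feeding.csv'
   BADFILE 'animal_feeding'
   APPEND
   INTO TABLE animal_feeding
   TRAILING NULLCOLS
   (
      animal_id     INTEGER EXTERNAL TERMINATED BY ",",
      feeding_date  DATE "dd-mon-yyyy" TERMINATED BY ",",
      pounds_eaten  DECIMAL EXTERNAL TERMINATED BY "," ":pounds_eaten * 2.2",
      note          CHAR TERMINATED BY "," OPTIONALLY ENCLOSED BY '"' "UPPER(:note)"
   )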
Oracle has a rich library of such functions that you can draw from. These are documented in Appendix B, SQL Built-in Function Reference.
Instead of an 80-character-wide note column, this version of the table defines the note column as a character-based large object, or CLOB. CLOBs may contain up to 4GB of data, effectively removing any practical limit on the length of a note. Can you load such a column using SQL*Loader? Yes. For purposes of our example, let's assume that you have created note files for each animal, and that each file contains information similar to what you see here:
NAME: Shorty
DATE: 16-Jan-2000
TEMPERATURE: 115.2
ACTIVITY LEVEL: High
HUMOR: Predatory

Shorty caught Squacky and literally ate him for supper. Shorty should be isolated from the other animals until he can be checked out by Seapark's resident marine psychologist.
Let's also assume that you have modified the comma-delimited file so that instead of containing the note text, each line in that file contains the name of the note file to load. For example:
100,13-jan-2000,23.5,note_100.txt
105,13-jan-2000,99.45,note_105.txt
112,13-jan-2000,10,note_112.txt
151,13-jan-2000,55
166,13-jan-2000,17.5,note_166.txt
145,13-jan-2000,0,note_145.txt
175,13-jan-2000,35.5,note_175.txt
199,13-jan-2000,0.5,note_199.txt
202,13-jan-2000,22.0
240,13-jan-2000,28,note_240.txt
To load this data using SQL*Loader, do the following:

- Define a FILLER field to contain the file name. This field will not be loaded into the database.
- Define a LOBFILE field that loads the contents of the file identified by the FILLER field into the CLOB column.

The resulting control file is shown in Listing 10-8.
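A sketch of such a control file (the file names and the name of the FILLER field are assumptions) looks like this:

LOAD DATA
   INFILE 'animal_feeding_clob.csv'
   BADFILE 'animal_feeding_clob'
   APPEND
   INTO TABLE animal_feeding
   TRAILING NULLCOLS
   (
      animal_id      INTEGER EXTERNAL TERMINATED BY ",",
      feeding_date   DATE "dd-mon-yyyy" TERMINATED BY ",",
      pounds_eaten   DECIMAL EXTERNAL TERMINATED BY ",",
      note_file_name FILLER CHAR TERMINATED BY ",",
      note           LOBFILE (note_file_name) TERMINATED BY EOF
   )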
Each time SQL*Loader inserts a record into the animal_feeding table, it will also store the entire contents of the associated note file in the note field.
Summary
In this chapter, you learned:

- SQL*Loader is a versatile utility for loading large amounts of data into an Oracle database.
- SQL*Loader control files are used to describe the data being loaded and to specify the table(s) into which that data is stored.
- You can use the INFILE clause to identify the file, or files, that you want SQL*Loader to read.
- You can use the INTO TABLE clause to identify the table, and the columns within that table, that you wish to populate using the data read from the input file.
- You can use the APPEND option after the INFILE clause to tell SQL*Loader to insert data into a table that already contains data to begin with.
- You can use SQL*Loader to load delimited data, such as comma-delimited data, or you can use it to load data stored in fixed-width columns.
- SQL*Loader fully supports all of Oracle8i's datatypes, even to the point of allowing you to populate LOB columns.