Ascential DataStage Parallel Modify Stage Programming Guide
Contents
1 Keeping and Dropping Fields
2 Renaming Fields
3 Duplicating a Field and Giving It a New Name
4 Changing a Field's Data Type
5 Data Type Conversion Errors
6 Date Field Conversions
7 Decimal Field Conversions
8 Raw Field Length Extraction
9 string and ustring Conversions
10 String Conversions and Lookup Tables
11 time Type Conversions Provided by modify
12 The modify Operator and Nulls
13 Related Content
If you choose to drop a field or fields, all fields are retained except those you explicitly drop. If you choose to keep a field or fields, all fields are excluded except those you explicitly keep.
In osh you specify either the keyword keep or the keyword drop to keep or drop a field, as follows:
modify 'keep field1, field2, ... fieldn;'
modify 'drop field1, field2, ... fieldn;'
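The keep and drop rules can be modeled outside osh. The following is a minimal Python sketch, not DataStage code: it treats a record as a dictionary, and the function names apply_keep and apply_drop are hypothetical.

```python
def apply_keep(record, fields):
    """Model of 'keep': every field is excluded except those named."""
    return {name: value for name, value in record.items() if name in fields}


def apply_drop(record, fields):
    """Model of 'drop': every field is retained except those named."""
    return {name: value for name, value in record.items() if name not in fields}


record = {"id": 7, "name": "Ada", "city": "Leeds"}
kept = apply_keep(record, {"id", "name"})   # like: modify 'keep id, name;'
dropped = apply_drop(record, {"city"})      # like: modify 'drop city;'
print(kept)
print(dropped)
```

In this model, keeping id and name gives the same record as dropping city, which is the symmetry the two keywords describe.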
Renaming Fields
To rename a field, specify the attribution operator (=), as follows:
modify 'newField1=oldField1; newField2=oldField2; ... newFieldn=oldFieldn;'
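As a rough illustration of the rename rule, the sketch below applies a newField=oldField mapping to a dictionary record. It is a Python model only. The function name apply_rename and the sample field names are hypothetical, and the sketch assumes fields not mentioned in the specification pass through unchanged.

```python
def apply_rename(record, renames):
    """renames maps newField -> oldField, mirroring 'newField=oldField;'."""
    old_to_new = {old: new for new, old in renames.items()}
    # Assumption: fields that are not renamed keep their original names.
    return {old_to_new.get(name, name): value for name, value in record.items()}


renamed = apply_rename(
    {"cust_no": 42, "nm": "Ada", "city": "Leeds"},
    {"customer_id": "cust_no", "name": "nm"},
)
print(renamed)
```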
If the destination field has been defined as nullable, Orchestrate sets it to null. If the destination field has not been defined as nullable but you have directed modify to convert a null to a value, Orchestrate sets the destination field to the value. To convert a null to a value supply the handle_null conversion specification.
If the destination field has not been defined as nullable, Orchestrate issues an error message and terminates the application. However, a warning is issued at step-check time. To disable the warning specify the nowarn option.
dateField = date_from_days_since[date](int32Field)
date from days since
Converts an integer field into a date by adding the integer to the specified base date. The date must be in the format yyyy-mm-dd.
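The arithmetic behind date_from_days_since can be sketched in Python with the standard datetime module. This is an illustration of the described behavior rather than the operator itself. The function name simply mirrors the conversion name, and accepting negative integers is an assumption.

```python
from datetime import date, timedelta


def date_from_days_since(base_date, days):
    """Add an integer number of days to a base date given as yyyy-mm-dd."""
    return date.fromisoformat(base_date) + timedelta(days=days)


result = date_from_days_since("2000-01-01", 31)
print(result)  # 2000-02-01
```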
dateField = date_from_julian_day(uint32Field)
dateField = date_from_string [date_format | date_uformat] (stringField)
dateField = date_from_ustring [date_format | date_uformat] (ustringField)
date from string or ustring
Converts the string or ustring field to a date representation using the specified date_format or date_uformat. By default, the string format is yyyy-mm-dd.
dateField = date_from_timestamp(tsField)
int8Field = month_day_from_date(dateField)
int8Field = weekday_from_date[originDay](dateField)
day of week from date
originDay is a string specifying the day considered to be day zero of the week. You can specify the day using either the first three characters of the day name or the full day name. If omitted, Sunday is defined as day zero. The originDay can be either single- or double-quoted, or the quotes can be omitted.
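The effect of originDay can be modeled as below. This is a hedged Python sketch of the rule described above, not the operator's implementation. The helper names are hypothetical, and treating day names case-insensitively is an assumption.

```python
from datetime import date

DAY_NAMES = ["sunday", "monday", "tuesday", "wednesday",
             "thursday", "friday", "saturday"]


def day_index(name):
    """Accept a full day name or its first three characters, quoted or not."""
    key = name.strip("'\"").lower()  # assumption: case-insensitive match
    for index, full in enumerate(DAY_NAMES):
        if key == full or key == full[:3]:
            return index
    raise ValueError("unknown day name: " + name)


def weekday_from_date(d, origin_day="sunday"):
    """Day of week as 0..6, counted from origin_day (default Sunday = 0)."""
    sunday_based = d.isoweekday() % 7  # isoweekday(): Monday=1 ... Sunday=7
    return (sunday_based - day_index(origin_day)) % 7


monday = date(2024, 1, 1)  # a Monday
print(weekday_from_date(monday))
print(weekday_from_date(monday, "mon"))
print(weekday_from_date(monday, "'Saturday'"))
```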
int16Field = year_day_from_date(dateField)
int32Field = days_since_from_date[source_date](dateField)
days since date
Returns a value corresponding to the number of days from source_date to the contents of dateField. source_date must be in the form yyyy-mm-dd and can be quoted or unquoted.
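The day count described above is a plain date difference. The following Python sketch shows the calculation under the assumption that a dateField earlier than source_date gives a negative result. The function name mirrors the conversion but is only a model.

```python
from datetime import date


def days_since_from_date(source_date, d):
    """Number of days from source_date (yyyy-mm-dd, optionally quoted) to d."""
    base = date.fromisoformat(source_date.strip("'\""))
    return (d - base).days


elapsed = days_since_from_date("2000-01-01", date(2000, 3, 1))
print(elapsed)  # 60, because 2000 is a leap year
```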
uint32Field = julian_day_from_date(dateField)
dateField = next_weekday_from_date[day](dateField)
next weekday from date
The destination contains the date of the specified day of the week soonest after the source date (including the source date). day is a string specifying a day of the week. You can specify day by either the first three characters of the day name or the full day name. The day can be quoted in either single or double quotes, or the quotes can be omitted.
dateField = previous_weekday_from_date[day](dateField)
previous weekday from date
The destination contains the closest date for the specified day of the week earlier than the source date (including the source date). The day is a string specifying a day of the week. You can specify day using either the first three characters of the day name or the full day name. The day can be either single- or double-quoted, or the quotes can be omitted.
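Both weekday searches include the source date itself, which is easy to get wrong by one day. The sketch below models that rule in Python. The helper names are hypothetical and the case-insensitive day matching is an assumption.

```python
from datetime import date, timedelta

# Ordered to match date.weekday(), where Monday is 0.
DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]


def _day_index(day):
    key = day.strip("'\"").lower()
    for index, full in enumerate(DAYS):
        if key in (full, full[:3]):
            return index
    raise ValueError("unknown day name: " + day)


def next_weekday_from_date(d, day):
    """Soonest date on or after d that falls on the given day of the week."""
    return d + timedelta(days=(_day_index(day) - d.weekday()) % 7)


def previous_weekday_from_date(d, day):
    """Closest date on or before d that falls on the given day of the week."""
    return d - timedelta(days=(d.weekday() - _day_index(day)) % 7)


wednesday = date(2024, 1, 3)
print(next_weekday_from_date(wednesday, "fri"))
print(previous_weekday_from_date(wednesday, "Monday"))
```

The modulo 7 step is what makes a matching source date map to itself instead of to a date one week away.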
stringField = string_from_date [date_format | date_uformat] (dateField)
ustringField = ustring_from_date [date_format | date_uformat] (dateField)
strings and ustrings from date
Converts the date to a string or ustring representation using the specified date_format. By default, the string format is yyyy-mm-dd.
The time argument optionally specifies the time to be used in building the timestamp result and must be in the form hh:nn:ss. If omitted, the time defaults to midnight.
int16Field = year_from_date(dateField)
year from date
int8Field = year_week_from_date(dateField)
week of year from date
zero. Omitting fix_zero causes Orchestrate to issue a conversion error when it encounters a decimal field containing all zeros.
decimal from decimal: decimalField = decimal_from_decimal[r_type](decimalField)
decimal from dfloat: decimalField = decimal_from_dfloat[r_type](dfloatField)
decimal from string: decimalField = decimal_from_string[r_type](stringField)
decimal from ustring: decimalField = decimal_from_ustring[r_type](ustringField)
dfloat from decimal: dfloatField = dfloat_from_decimal[fix_zero](decimalField)
dfloat from decimal: dfloatField = mantissa_from_decimal(decimalField)
dfloat from dfloat: dfloatField = mantissa_from_dfloat(dfloatField)
int32 from decimal: int32Field = int32_from_decimal[r_type, fix_zero](decimalField)
int64 from decimal: int64Field = int64_from_decimal[r_type, fix_zero](decimalField)
string from decimal: stringField = string_from_decimal[fix_zero][suppress_zero](decimalField)
ustring from decimal: ustringField = ustring_from_decimal[fix_zero][suppress_zero](decimalField)
uint64 from decimal: uint64Field = uint64_from_decimal[r_type, fix_zero](decimalField)
The suppress_zero argument specifies that the returned string value will have no leading or trailing zeros. Examples: 000.100 -> 0.1; 001.000 -> 1; -001.100 -> -1.1
Rounding Type
You can optionally specify a value for the rounding type (r_type) of many conversions. The values of r_type are:
ceil: Round the source field toward positive infinity. This mode corresponds to the IEEE 754 Round Up mode. Examples: 1.4 -> 2, -1.6 -> -1
floor: Round the source field toward negative infinity. This mode corresponds to the IEEE 754 Round Down mode. Examples: 1.6 -> 1, -1.4 -> -2
round_inf: Round or truncate the source field toward the nearest representable value, breaking ties by rounding positive values toward positive infinity and negative values toward negative infinity. This mode corresponds to the COBOL ROUNDED mode. Examples: 1.4 -> 1, 1.5 -> 2, -1.4 -> -1, -1.5 -> -2
trunc_zero (default): Discard any fractional digits to the right of the rightmost fractional digit supported in the destination, regardless of sign. For example, if the destination is an integer, all fractional digits are truncated. If the destination is another decimal with a smaller scale, round or truncate to the scale size of the destination decimal. This mode corresponds to the COBOL INTEGER-PART function. Examples: 1.6 -> 1, -1.6 -> -1
Figure 19 shows the conversion of a decimal field to a 32-bit integer with a rounding mode of ceil rather than the default mode of truncate to zero:
The osh syntax for this conversion is:
'field1 = int32_from_decimal[ceil,fix_zero] (dField);'
where fix_zero ensures that a source decimal containing all zeros is treated as a valid representation.
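The four r_type values can be checked against the examples above with Python's decimal module. This is a sketch of the rounding rules only. The mapping from r_type names to Python rounding constants is this sketch's own assumption, and int_from_decimal is a hypothetical stand-in for the integer conversions.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR, ROUND_HALF_UP

# Assumed correspondence between r_type names and Python rounding modes.
R_TYPES = {
    "ceil": ROUND_CEILING,       # toward positive infinity
    "floor": ROUND_FLOOR,        # toward negative infinity
    "round_inf": ROUND_HALF_UP,  # nearest value, ties away from zero
    "trunc_zero": ROUND_DOWN,    # discard fractional digits
}


def int_from_decimal(text, r_type="trunc_zero"):
    """Round a decimal given as text to an integer using the named r_type."""
    return int(Decimal(text).quantize(Decimal("1"), rounding=R_TYPES[r_type]))


for r_type in R_TYPES:
    print(r_type, [int_from_decimal(v, r_type) for v in ("1.4", "1.5", "-1.5", "-1.6")])
```

Passing the values as text avoids binary floating-point error before the rounding step.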
rawField = raw_from_string(string)
Returns string in raw representation.
rawField = u_raw_from_string(ustring)
Returns ustring in raw representation.
int32Field = raw_length(raw)
Returns the length of the raw field.
Use the modify operator to perform the following modifications involving string and ustring fields:
Extract the length of a string.
Convert long strings to shorter strings by string extraction.
Convert strings to and from numeric values using lookup tables.
removes all leading ASCII NULL characters from the beginning of name and places the remaining characters in an output variable-length string with the same name.
removes all trailing Z characters from color, and left justifies the resulting hue fixed-length string.
Copies parts of strings and ustrings to shorter strings by string extraction. The starting_position specifies the starting location of the substring; length specifies the substring length. The arguments starting_position and length are uint16 types and must be non-negative (>= 0).
uint32 = lookup_uint32_from_string [tableDefinition](stringField)
uint32 = lookup_uint32_from_ustring [tableDefinition](ustringField)
stringField = lookup_string_from_uint32 [tableDefinition](uint32Field)
ustringField = lookup_ustring_from_uint32 [tableDefinition](uint32Field)
stringField = string_from_ustring(ustring)
Converts ustrings to strings.
ustringField = ustring_from_string(string)
Converts strings to ustrings.
decimalField = decimal_from_string(stringField)
Converts strings to decimals.
decimalField = decimal_from_ustring(ustringField)
Converts ustrings to decimals.
stringField = string_from_decimal[fix_zero] [suppress_zero] (decimalField)
Converts decimals to strings.
fix_zero causes a decimal field containing all zeros to be treated as a valid zero. suppress_zero specifies that the returned ustring value will have no leading or trailing zeros.
See string_from_decimal above for a description of the fix_zero and suppress_zero arguments.
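The suppress_zero examples given for string_from_decimal (000.100 -> 0.1; 001.000 -> 1; -001.100 -> -1.1) can be reproduced with a few string operations. The Python sketch below models the described output only. The function name is hypothetical, and returning a zero value without a sign is an assumption.

```python
def suppress_zero(text):
    """Strip leading and trailing zeros from a decimal rendered as text."""
    sign = "-" if text.startswith("-") else ""
    whole, _, fraction = text.lstrip("+-").partition(".")
    whole = whole.lstrip("0") or "0"  # keep one digit before the point
    fraction = fraction.rstrip("0")
    if whole == "0" and not fraction:
        sign = ""                     # assumption: zero carries no sign
    return sign + whole + ("." + fraction if fraction else "")


for sample in ("000.100", "001.000", "-001.100"):
    print(sample, "->", suppress_zero(sample))
```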
dateField = date_from_string [date_format | date_uformat] (stringField)
dateField = date_from_ustring [date_format | date_uformat] (ustringField)
date from string or ustring
Converts the string or ustring field to a date representation using the specified date_format or date_uformat. By default, the string format is yyyy-mm-dd.
stringField = string_from_date [date_format | date_uformat] (dateField)
ustringField = ustring_from_date [date_format | date_uformat] (dateField)
strings and ustrings from date
Converts the date to a string or ustring representation using the specified date_format or date_uformat. By default, the string format is yyyy-mm-dd.
int32Field=string_length(stringField)
int32Field=ustring_length(ustringField)
stringField=substring [startPosition,len] (stringField)
ustringField=substring [startPosition,len] (ustringField)
Converts long strings/ustrings to shorter strings/ustrings by string extraction. The startPosition specifies the starting location of the substring; len specifies the substring length. If startPosition is positive, it specifies the byte offset into the string from the beginning of the string. If startPosition is negative, it specifies the byte offset from the end of the string.
stringField=uppercase_string (stringField)
ustringField=uppercase_ustring (ustringField)
Convert strings and ustrings to all upper case.
stringField=lowercase_string (stringField)
ustringField=lowercase_ustring (ustringField)
Convert strings and ustrings to all lower case. Non-alphabetic characters are ignored in the conversion.
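The positive and negative startPosition rules for substring, described above, can be sketched in Python on a byte string. This is a model under stated assumptions, not the operator's implementation: offsets are taken as zero-based, and a negative offset that reaches past the start of the string is clamped to the start.

```python
def substring(value, start_position, length):
    """Extract up to `length` bytes starting at a byte offset.

    A non-negative start_position counts from the beginning of the string;
    a negative one counts back from the end.
    """
    if start_position < 0:
        start = max(len(value) + start_position, 0)  # assumption: clamp at 0
    else:
        start = start_position
    return value[start:start + length]


print(substring(b"DataStage", 4, 5))   # b'Stage'
print(substring(b"DataStage", -5, 3))  # b'Sta'
```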
stringField = string_from_time [time_format | time_uformat] (timeField)
ustringField = ustring_from_time [time_format | time_uformat] (timeField)
string and ustring from time
Converts the time to a string or ustring representation using the specified time_format or time_uformat. The time_format options are described below.
The following osh command converts a string field to lowercase:
$ osh "... | modify 'lname=lowercase_string(lname)' | peek"
stringField = string_from_timestamp [timestamp_format | timestamp_uformat] (tsField)
ustringField = ustring_from_timestamp [timestamp_format | timestamp_uformat] (tsField)
strings and ustrings from timestamp
Converts the timestamp to a string or ustring representation using the specified timestamp_format or timestamp_uformat. By default, the string format is %yyyy-%mm-%dd hh:mm:ss.
tsField = timestamp_from_string [timestamp_format | timestamp_uformat] (stringField)
tsField = timestamp_from_ustring [timestamp_format | timestamp_uformat] (ustringField)
timestamp from strings and ustrings
Converts the string or ustring to a timestamp representation using the specified timestamp_format or timestamp_uformat. By default, the string format is yyyy-mm-dd hh:mm:ss.
timeField = time_from_string [time_format | time_uformat] (stringField)
timeField = time_from_ustring [time_format | time_uformat] (ustringField)
time from string and ustring
Converts the string or ustring to a time representation using the specified time_format or time_uformat.
If a numeric value is unknown, an empty string is returned by default. However, you can set a default string value to be returned by the string lookup table. If a string has no corresponding value, 0 is returned by default. However, you can set a default numeric value to be returned by the string lookup table.
Here are the options and arguments passed to the modify operator to create a lookup table:
OR:
Numeric Value    String or Ustring
numVal1          string1 | ustring1
numVal2          string2 | ustring2
...              ...
numValn          stringn | ustringn
OR:
stringField = lookup_string_from_uint32[tableDefinition](source_intField);
ustringField = lookup_ustring_from_uint32[tableDefinition](source_intField);
where:
tableDefinition defines the rows of a string or ustring lookup table and has the following form:
{propertyList} ('string' | 'ustring' = value; 'string' | 'ustring' = value; ... )
where:
propertyList is one or more of the following options; the entire list is enclosed in braces, and properties are separated by commas if there is more than one:
case_sensitive: perform a case-sensitive search for matching strings; the default is case-insensitive.
default_value = defVal: the default numeric value returned for a string that does not match any of the strings in the table.
default_string = defString: the default string returned for numeric values that do not match any numeric value in the table.
string or ustring specifies a comma-separated list of strings or ustrings associated with value; enclose each string or ustring in quotes.
value specifies a comma-separated list of 16-bit integer values associated with string or ustring.
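A lookup table of this form behaves like a pair of dictionaries with defaults. The Python sketch below models the properties listed above, using the gender mapping from this section's osh example as sample rows. The class and method names are hypothetical. Returning the first listed string when several strings share one value is an assumption, since the source does not say which string is chosen.

```python
class StringLookupTable:
    """Model of a modify lookup table: strings <-> numeric values."""

    def __init__(self, rows, case_sensitive=False, default_value=0, default_string=""):
        self.case_sensitive = case_sensitive
        self.default_value = default_value    # returned for unmatched strings
        self.default_string = default_string  # returned for unmatched numbers
        self.by_string = {self._key(s): v for s, v in rows}
        self.by_value = {}
        for s, v in rows:
            self.by_value.setdefault(v, s)    # assumption: first string wins

    def _key(self, s):
        return s if self.case_sensitive else s.lower()

    def value_from_string(self, s):
        return self.by_string.get(self._key(s), self.default_value)

    def string_from_value(self, v):
        return self.by_value.get(v, self.default_string)


table = StringLookupTable(
    [("f", 0), ("female", 0), ("m", 1), ("male", 1)], default_value=2)
print(table.value_from_string("Female"), table.value_from_string("unknown"))
print(repr(table.string_from_value(1)), repr(table.string_from_value(9)))
```

The defaults of 0 and the empty string follow the behavior stated earlier in this section for unmatched strings and unknown numeric values.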
The following osh code performs the conversion:
modify 'gender = lookup_int16_from_string[{default_value = 2} ('f' = 0; 'female' = 0; 'm' = 1; 'male' = 1;)] (gender);'
In this example, gender is the name of both the source and the destination fields of the translation. In addition, the string lookup table defines a default value of 2; if gender contains a string that is not one of "f", "female", "m", or "male", the lookup table returns a value of 2.
Orchestrate performs no automatic conversions to or from the time data type. You must invoke the modify operator if you want to convert a source or destination time field. Most time field conversions extract a portion of the time, such as hours or minutes, and write it into a destination field.
int8Field = hours_from_time(timeField)
hours from time
int32Field = microseconds_from_time(timeField)
microseconds from time
int8Field = minutes_from_time(timeField)
minutes from time
dfloatField = seconds_from_time(timeField)
seconds from time
dfloatField = midnight_seconds_from_time(timeField)
seconds-from-midnight from time
stringField = string_from_time [time_format | time_uformat] (timeField)
ustringField = ustring_from_time [time_format | time_uformat] (timeField)
string and ustring from time
Converts the time to a string or ustring representation using the specified time_format or time_uformat.
timeField = time_from_midnight_seconds(dfloatField)
time from seconds-from-midnight
timeField = time_from_string [time_format | time_uformat] (stringField)
timeField = time_from_ustring [time_format | time_uformat] (ustringField)
time from string
Converts the string or ustring to a time representation using the specified time_format or time_uformat.
timeField = time_from_timestamp(tsField)
time from timestamp
tsField = timestamp_from_time [date](timeField)
timestamp from time
The date argument is required. It specifies the date portion of the timestamp and must be in the form yyyy-mm-dd.
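Several of the time conversions listed above reduce to simple arithmetic on hours, minutes, and seconds. The Python sketch below models three of them with the standard datetime module. The function names mirror the conversion names but are only illustrations, and the sketch assumes second counts below 86400.

```python
from datetime import date, datetime, time, timedelta


def midnight_seconds_from_time(t):
    """Seconds elapsed since midnight, as a float."""
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1_000_000


def time_from_midnight_seconds(seconds):
    """Inverse of the above; assumes 0 <= seconds < 86400."""
    whole = int(seconds)
    micro = round((seconds - whole) * 1_000_000)
    return (datetime.min + timedelta(seconds=whole, microseconds=micro)).time()


def timestamp_from_time(date_text, t):
    """Combine a required yyyy-mm-dd date with a time value."""
    return datetime.combine(date.fromisoformat(date_text), t)


t = time(13, 45, 30)
print(t.hour, t.minute, midnight_seconds_from_time(t))
print(time_from_midnight_seconds(49530.0))
print(timestamp_from_time("2024-01-03", t))
```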
dfloatField = seconds_since_from_timestamp [timestamp](tsField)
seconds_since from timestamp
tsField = timestamp_from_seconds_since [timestamp](dfloatField)
timestamp from seconds_since
stringField = string_from_timestamp [timestamp_format | timestamp_uformat] (tsField)
ustringField = ustring_from_timestamp [timestamp_format | timestamp_uformat] (tsField)
strings and ustrings from timestamp
Converts the timestamp to a string or ustring representation using the specified timestamp_format or timestamp_uformat. By default, the string format is %yyyy-%mm-%dd hh:mm:ss.
tsField = timestamp_from_ustring [timestamp_format | timestamp_uformat] (ustringField)
timestamp from strings and ustrings
Converts the string or ustring to a timestamp representation using the specified timestamp_format. By default, the string format is yyyy-mm-dd hh:mm:ss.
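The seconds_since conversions listed earlier in this section are inverses of each other relative to a base timestamp. The Python sketch below shows the round trip. It is a model only, and it assumes the base timestamp is written as yyyy-mm-dd hh:mm:ss.

```python
from datetime import datetime, timedelta


def seconds_since_from_timestamp(base, ts):
    """Seconds from the base timestamp to ts, as a float."""
    return (ts - datetime.fromisoformat(base)).total_seconds()


def timestamp_from_seconds_since(base, seconds):
    """Timestamp that lies the given number of seconds after the base."""
    return datetime.fromisoformat(base) + timedelta(seconds=seconds)


BASE = "2000-01-01 00:00:00"  # hypothetical base timestamp
later = datetime(2000, 1, 2, 0, 0, 30)
elapsed_seconds = seconds_since_from_timestamp(BASE, later)
print(elapsed_seconds)  # 86430.0
print(timestamp_from_seconds_since(BASE, elapsed_seconds))
```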
The time argument optionally specifies the time to be used in building the timestamp result and must be in the form hh:mm:ss. If omitted, the time defaults to midnight.
The date argument is required. It specifies the date portion of the timestamp and must be in the form yyyy-mm-dd.
Returns a timestamp from date and time. The date specifies the date portion (yyyy-mm-dd) of the timestamp. The time argument specifies the time to be used when building the timestamp. The time argument must be in the hh:nn:ss format.
It allocates a single bit to mark a field as null. This type of representation is called an out-of-band null.
It designates a specific field value to indicate a null, for example a numeric field's most negative possible value. This type of representation is called an in-band null. In-band null representation can be disadvantageous because you must reserve a field value for nulls and this value cannot be treated as valid data elsewhere.
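The drawback of in-band nulls can be seen in a small model. The Python sketch below is an illustration only: the class OutOfBandInt8 and the constant IN_BAND_NULL are hypothetical, with -128 chosen as the most negative int8 value.

```python
from dataclasses import dataclass

IN_BAND_NULL = -128  # reserved int8 value standing in for null


@dataclass
class OutOfBandInt8:
    """A value plus a separate null bit (out-of-band representation)."""
    value: int
    is_null: bool = False


def in_band_is_null(value):
    """With an in-band null, the reserved value always reads as null."""
    return value == IN_BAND_NULL


genuine = OutOfBandInt8(-128)  # -128 stored as real data, null bit clear
print(genuine.is_null)                 # False: still distinguishable
print(in_band_is_null(genuine.value))  # True: same value now reads as null
```

The out-of-band form keeps -128 usable as data, while the in-band form cannot tell a genuine -128 from a null.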
The modify operator can change a null representation from an out-of-band null to an in-band null and from an in-band null to an out-of-band null. The record schema of an operator's input or output data set can contain fields defined to support out-of-band nulls. In addition, fields of an operator's interface may also be defined to support out-of-band nulls. The next table lists the rules for handling nullable fields when an operator takes a data set as input or writes to a data set as output.
Source Field    Destination Field    Result
not_nullable    not_nullable         Source value propagates to destination.
not_nullable    nullable             Source value propagates; destination value is never null.
nullable        not_nullable         If the source value is not null, the source value propagates. If the source value is null, a fatal error occurs, unless you apply the modify operator, as described in Out-of-Band to Normal Representation.
Out-of-Band to Normal Representation
The modify operator can change a field's null representation from a single bit to a value you choose, that is, from an out-of-band to an in-band representation. Use this feature to prevent fatal data type conversion errors that occur when a destination field has not been defined as supporting nulls. To change a field's null representation from a single bit to a value you choose, use the following osh syntax:
destField[:dataType] = handle_null(sourceField, value);
where:
destField is the destination field's name.
dataType is its optional data type; use it if you are also converting types.
sourceField is the source field's name.
value is the value you wish to represent a null in the output.
The destField is converted from an Orchestrate out-of-band null to a value of the field's data type. For a numeric field, value can be a numeric value; for decimal, string, time, date, and timestamp fields, value can be a string.
While in the input fields a null takes Orchestrate's out-of-band representation, in the output a null in aField is represented by -128 and a null in bField is represented by ASCII XXXX (0x58 in all bytes). To make the output aField contain a value of -128 whenever the input contains an out-of-band null, and the output bField contain a value of 'XXXX' whenever the input contains an out-of-band null, use the following osh code:
$ modifySpec="aField = handle_null(aField, -128); bField = handle_null(bField, 'XXXX');"
$ osh " ... | modify '$modifySpec' | ... "
Notice that a shell variable (modifySpec) has been defined containing the specifications passed to the operator.
Normal to Out-of-Band Representation
The modify operator can change a field's null representation from a normal field value to a single bit, that is, from an in-band to an out-of-band representation. To change a field's null representation to out-of-band, use the following osh syntax:
destField[:dataType] = make_null(sourceField, value);
where:
destField is the destination field's name.
dataType is its optional data type; use it if you are also converting types.
sourceField is the source field's name.
value is the value of the source field when it is null.
A source field equal to value is converted to an Orchestrate out-of-band null in the destination. For a numeric field, value can be a numeric value; for decimal, string, time, date, and timestamp fields, value can be a string.
The following osh syntax causes the aField of the output data set to be set to Orchestrate's single-bit null representation if the corresponding input field contains -128 (in-band null), and the bField of the output to be set to Orchestrate's single-bit null representation if the corresponding input field contains 'XXXX' (in-band null).
$ modifySpec="aField = make_null(aField, -128); bField = make_null(bField, 'XXXX');"
$ osh " ... | modify '$modifySpec' | ... "
Notice that a shell variable (modifySpec) has been defined containing the specifications passed to the operator.
Orchestrate supplies two other conversions to use with nullable fields, called null and notnull.
The null conversion sets the destination field to 1 if the source field is null and to 0 otherwise. The notnull conversion sets the destination field to 1 if the source field is not null and to 0 if it is null.
By default, the data type of the destination field is int8. Specify a different destination data type to override this default. Orchestrate issues a warning if the source field is not nullable or the destination field is nullable.
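The four null-related conversions in this section can be summarized in a short model. In the Python sketch below, None stands in for Orchestrate's out-of-band null. That stand-in is this sketch's own assumption, and the functions imitate the described behavior without calling any DataStage API.

```python
def handle_null(source, value):
    """Out-of-band to in-band: replace a null with the chosen value."""
    return value if source is None else source


def make_null(source, value):
    """In-band to out-of-band: turn the reserved value into a null."""
    return None if source == value else source


def null(source):
    """1 if the source field is null, 0 otherwise."""
    return 1 if source is None else 0


def notnull(source):
    """1 if the source field is not null, 0 if it is null."""
    return 0 if source is None else 1


print(handle_null(None, -128), handle_null(5, -128))
print(make_null("XXXX", "XXXX"), make_null("blue", "XXXX"))
print(null(None), notnull(None))
```

In this model handle_null followed by make_null with the same value returns the original field, as long as the reserved value never occurs as real data.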