Normalize
Purpose
Normalize generates multiple output data records from each of its input records. You can
directly specify the number of output records for each input record, or the number of
output records can depend on some calculation.
Parameters
Transform (filename or string, required)
Either the name of the file containing the types and transform functions, or a transform
string.
Reject-threshold (choice, required)
The component's tolerance for reject events.
See AGGREGATE component for details.
Logging (boolean, optional)
Specifies whether or not you want certain component events logged.
See AGGREGATE component for details.
Runtime Behavior of Normalize
Normalize generates a series of output data records for each input data record as follows.
1. Input selection
If input_select is defined, the input records are filtered as follows:
If input_select returns 0 for a particular record, Normalize does not process the record
(that is, the record does not appear on any output port).
If input_select returns NULL for a particular record, Normalize writes the record to
the reject port and a descriptive error message to the error port. If you do not
connect flows to the reject or error ports, Normalize discards the information.
If input_select returns anything other than 0 or NULL for a particular record,
Normalize processes the record.
If you have not defined input_select, Normalize processes all records.
2. Number of iterations
Normalize executes the length transform function. The length function returns a
value, n, that specifies the number of times to call the normalize transform function
for each input record.
3. Temporary initialization
If temporary_type is defined, Normalize executes the initialize transform function to
assign an initial value to the temporary variable(s).
Defining temporary_type means that you are declaring one or more temporary
variables. Typically, Normalize does not need temporary variables. If you have not
defined temporary_type, Normalize does not call the initialize transform function.
4. Computation
If you defined temporary_type, Normalize calls the normalize transform function n
times with three arguments: the temporary record, the input record, and an index
value increasing from 0 to n-1. Each time it calls the transform function it produces
a new temporary record.
If you have not defined temporary_type, Normalize calls the normalize transform
function n times with only two arguments: the input record and the index value.
Each time it calls the transform function it produces an output record.
5. Finalization
If you have defined temporary_type, Normalize calls the finalize transform function
with the temporary record and the input record as arguments every time it calls the
normalize transform function.
The finalize transform function produces an output record.
If you have not defined temporary_type, Normalize does not call the finalize
transform function; instead, the normalize function produces the output record
directly.
6. Output selection
If you have defined the output_select transform function, it is now called to filter the
output records, as follows:
Normalize ignores records for which output_select returns 0; it writes all
others to the out port.
If you have not defined the output_select transform function, Normalize
writes all output records to the out port.
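The six steps above can be sketched as a small Python simulation. This is not DML and not the component's implementation: the function name run_normalize, the use of Python None as a stand-in for NULL, and treating "initialize was supplied" as "temporary_type is defined" are all assumptions made for illustration. For brevity the sketch handles a NULL result only from input_select.

```python
# Python simulation of the runtime steps above; not DML, not Ab Initio code.
NULL = None  # assumed stand-in for a DML NULL result


def run_normalize(records, length, normalize, input_select=None,
                  initialize=None, finalize=None, output_select=None):
    """Return (out, rejects) for a list of input records."""
    out, rejects = [], []
    use_temp = initialize is not None  # stands in for "temporary_type is defined"
    for rec in records:
        # Step 1: input selection (0 = skip, NULL = reject, other = process).
        if input_select is not None:
            keep = input_select(rec)
            if keep is NULL:
                rejects.append(rec)
                continue
            if keep == 0:
                continue
        # Step 2: number of iterations.
        n = length(rec)
        # Step 3: temporary initialization (only with temporaries).
        temp = initialize(rec) if use_temp else None
        for index in range(n):
            if use_temp:
                # Steps 4 and 5: update the temporary record, then finalize.
                temp = normalize(temp, rec, index)
                result = finalize(temp, rec)
            else:
                # Step 4 without temporaries: normalize yields the output record.
                result = normalize(rec, index)
            # Step 6: output selection (0 = ignore).
            if output_select is None or output_select(result) != 0:
                out.append(result)
    return out, rejects


# Usage: one output record per element of a (hypothetical) "items" field.
records = [{"id": 1, "items": ["a", "b"]}, {"id": 2, "items": []}]
out, rejects = run_normalize(
    records,
    length=lambda rec: len(rec["items"]),
    normalize=lambda rec, i: (rec["id"], rec["items"][i]),
)
print(out)  # [(1, 'a'), (1, 'b')]
```

A record whose length is 0 simply produces no output, which is why the second record does not appear.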
If any of the transform functions returns NULL, Normalize writes:
The current input record to the reject port (if connected).
A descriptive error message to the error port (if connected).
If you do not connect flows to the reject or error ports, Normalize discards the
information.
The component stops the execution of the graph when the number of reject events
exceeds the result of the following formula:
limit + (ramp * number_of_records_processed_so_far)
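As a worked illustration of the reject-threshold formula above, the following Python sketch evaluates it for sample values. The function names and the sample numbers are hypothetical; only the formula itself comes from the text.

```python
def reject_limit(limit, ramp, records_processed):
    """Evaluate limit + (ramp * number_of_records_processed_so_far)."""
    return limit + ramp * records_processed


def should_stop(reject_count, limit, ramp, records_processed):
    """True when the number of reject events exceeds the formula's result."""
    return reject_count > reject_limit(limit, ramp, records_processed)


# Sample values (hypothetical): limit = 10, ramp = 0.5, 100 records processed.
print(reject_limit(10, 0.5, 100))     # 60.0
print(should_stop(60, 10, 0.5, 100))  # False: 60 does not exceed 60.0
print(should_stop(61, 10, 0.5, 100))  # True
```

Because the tolerated count grows with the number of records processed, a ramp of 0 reduces the rule to a fixed limit.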
About Normalize Transform Functions
The functions, types, and variables you define in the transform parameter determine
exactly what Normalize does.
There are six permanent functions, as shown in the following table. Of these, only length
and normalize are required. Examples of most of these functions can be found in Simple
Normalize Example with Vectors.
There is also an optional temporary_type, which you can define if you need to use
temporary variables. For an example of this, see Normalize Example with More
Elaborate Transform.
Input_select: The input_select transform function takes a single argument, the input
record, and returns a value of 0 (false) to ignore a record or non-0 (true) to accept a
record.
Initialize: The initialize transform function initializes temporary storage. This transform
function takes a single argument, the input record, and returns a single record with type
temporary_type:
temp :: initialize(in) =
begin
  temp.count :: 0;
  temp.sum :: 0;
end;
Finalize: The finalize transform function performs the last step in a multistage transform:
out :: finalize(temp, in) =
begin
  out.key :: in.key;
  out.count :: temp.count;
  out.average :: temp.sum / temp.count;
end;
The finalize transform function takes the temporary storage record and the input record
as arguments, and produces a record that has the record format of the out port.
Output_select: If you define the output_select transform function, it performs selection
of output records:
out :: output_select(final) =
begin
  out :: final.average > 5;
end;
The output_select transform function takes a single argument, the record produced by
finalization, and returns a value of 0 (false) to ignore a record or non-0 (true) to output
a record.
Temporary_type: If you want Normalize to use temporary storage, define this storage as
a record with a type named temporary_type:
type temporary_type =
record
  int count;
  int sum;
end;
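To show how these functions interact when temporary storage is used, the following Python sketch mirrors the initialize, finalize, and output_select examples above. It is a simulation rather than DML. The length and normalize functions, the input field "values", and the helper running_averages are hypothetical, since this section gives no normalize example for this temporary type.

```python
def initialize(rec):
    # Mirrors temp.count :: 0; temp.sum :: 0;
    return {"count": 0, "sum": 0}


def length(rec):
    # Hypothetical: one iteration per element of the "values" field.
    return len(rec["values"])


def normalize(temp, rec, index):
    # Hypothetical: fold the indexed value into the temporary record.
    return {"count": temp["count"] + 1, "sum": temp["sum"] + rec["values"][index]}


def finalize(temp, rec):
    # Mirrors out.key, out.count, and out.average in the finalize example.
    return {"key": rec["key"], "count": temp["count"],
            "average": temp["sum"] / temp["count"]}


def output_select(final):
    # Mirrors out :: final.average > 5;
    return final["average"] > 5


def running_averages(rec):
    """Apply the functions to one input record, finalizing after each normalize call."""
    temp = initialize(rec)
    results = []
    for index in range(length(rec)):
        temp = normalize(temp, rec, index)
        final = finalize(temp, rec)
        if output_select(final):
            results.append(final)
    return results


print(running_averages({"key": "k1", "values": [4, 8, 9]}))
# [{'key': 'k1', 'count': 2, 'average': 6.0}, {'key': 'k1', 'count': 3, 'average': 7.0}]
```

The first iteration yields an average of 4.0, so output_select drops it; the remaining two records pass the filter.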
Normalize Example Without Vectors
Normalize can also be used to split up non-vector records, similarly to the Meta Pivot
component. An advantage Normalize has over Meta Pivot in such situations is that it can
handle nested records, which Meta Pivot cannot.
Each input record has three fields that Normalize must separately split off and
generate output records for, so the normalize function needs to be executed twice
for each record. We thus simply hard-code the value 2 for the length function's
output.
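The hard-coded length described above can be sketched in Python as follows. This is an illustration only, not the example's DML: the field names id, home_phone, and work_phone, and the helper split_record, are hypothetical because the example's record format is not reproduced here.

```python
# Hypothetical names of the fields to split off; not taken from the example.
FIELDS = ("home_phone", "work_phone")


def length(rec):
    # Hard-coded, as in the text: normalize runs twice per input record.
    return 2


def normalize(rec, index):
    # The index (0 or 1) selects which field becomes the output record's value.
    name = FIELDS[index]
    return {"id": rec["id"], "kind": name, "value": rec[name]}


def split_record(rec):
    """Produce one output record per iteration for a single input record."""
    return [normalize(rec, index) for index in range(length(rec))]


rows = split_record({"id": 7, "home_phone": "555-0100", "work_phone": "555-0199"})
for row in rows:
    print(row)
```

Using the index to choose a field, rather than a vector element, is what lets the same length and normalize pattern apply to non-vector records.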
After the graph is run, the contents of the Output File are as follows:
FILTER BY EXPRESSION
Purpose
Filter by Expression filters data records according to a DML expression.
Parameters
select_expr (expression, required)
Filter for data records.
reject-threshold (choice, required)
The component's tolerance for reject events.
For more information, see the AGGREGATE component.
FUSE
Purpose
Fuse combines multiple input flows into a single output flow by applying a transform
function to corresponding records of each flow.
NOTE: In conjunction with JOIN, Fuse supersedes MATCH SORTED.
NOTE: We recommend keeping Automatic Flow Buffering, the default, turned on for
FUSE. This component reads input from its flows in a specific order. Thus, turning off
Automatic Flow Buffering could cause deadlock.
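The idea of applying a transform function to corresponding records of each flow can be sketched in Python. This is a simulation under stated assumptions, not the component's implementation: flows are modeled as lists, the function name fuse is hypothetical, and stopping at the end of the shortest flow is an assumption, because the text above does not say what happens when flows differ in length.

```python
def fuse(flows, transform):
    """Combine the i-th record of every flow into the i-th output record.

    Assumption: processing stops when the shortest flow is exhausted.
    """
    return [transform(*records) for records in zip(*flows)]


# Two hypothetical input flows whose records correspond by position.
names = [{"name": "ann"}, {"name": "bob"}]
ages = [{"age": 31}, {"age": 45}]

fused = fuse([names, ages], lambda a, b: {"name": a["name"], "age": b["age"]})
print(fused)  # [{'name': 'ann', 'age': 31}, {'name': 'bob', 'age': 45}]
```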
Parameters
Count (integer, required)
An integer from 1 to 20 that sets the number of all of the following: