Ab Initio Components Summary

Uploaded by

anumeha.raj
Ab Initio Components

Ab Initio graph building involves two distinct sets of components:

1. Dataset Components: These components act as containers, holding and managing
data within the graph. They play a crucial role in data storage and retrieval.
o Input File: This component holds the data your program will work with.
o Output File: This component stores the processed data from your program.
o Lookup File: This component allows you to quickly find specific data records based on
certain criteria.
o Intermediate File: This component represents one or multiple serial files or a multifile of
intermediate results that a graph writes during execution and saves for your
review afterward.
2. Program Components: These components are responsible for data processing and
transformations. They manipulate the data according to your desired outcomes within
the graph.
o Partition Components: These components divide a large dataset into smaller
subsets (flow partitions) that can be processed in parallel, which can significantly
improve performance on large data volumes.
• Partition by Key: Reads records from the input port and distributes them to its
output flow partitions according to key values, so that records with the same key
value land in the same partition.
• Partition by Round Robin: Distributes blocks of data records evenly to each output
flow in a round-robin fashion, without requiring a partitioning key.
• Partition by Expression: Distributes data records based on a specified expression.
• Partition by Range: Distributes data records based on the ranges of key values
specified for each partition.
o Departition Components: These components are essentially the opposite of partitioning
components. They combine multiple flow partitions of data records into a single flow.
• Concatenate: This component appends multiple flow partitions one after another,
creating a single flow of concatenated data.
• Gather: This component collects data records from multiple flow partitions arbitrarily,
without preserving any specific order.
• Interleave: This component combines data records from multiple flow partitions in a
round-robin fashion, alternating blocks of data from each partition.
• Merge: This component combines data records from multiple flow partitions that have
been sorted according to the same key specifier, maintaining the sort order.
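
The partition and departition behaviors described above can be sketched in plain Python. This is only an illustration of the ideas, not Ab Initio code: the function names, the record layout (dicts with an integer "id" field), and the use of Python's built-in hash to pick a partition are all assumptions made for the example.

```python
from bisect import bisect_left
from heapq import merge
from itertools import chain


def partition_by_key(records, key, n):
    """Route each record by hashing its key, so equal keys share a partition."""
    parts = [[] for _ in range(n)]
    for rec in records:
        parts[hash(rec[key]) % n].append(rec)
    return parts


def partition_round_robin(records, n, block_size=1):
    """Deal blocks of block_size records to partitions 0..n-1 in turn."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[(i // block_size) % n].append(rec)
    return parts


def partition_by_range(records, key, splitters):
    """Partition i takes keys up to splitters[i]; the last one takes the rest."""
    parts = [[] for _ in range(len(splitters) + 1)]
    for rec in records:
        parts[bisect_left(splitters, rec[key])].append(rec)
    return parts


def concatenate(parts):
    """Append the partitions one after another."""
    return list(chain.from_iterable(parts))


def interleave(parts, block_size=1):
    """Take block_size records from each partition in turn until all are drained."""
    out, offset = [], 0
    while any(offset < len(p) for p in parts):
        for p in parts:
            out.extend(p[offset:offset + block_size])
        offset += block_size
    return out


def merge_sorted(parts, key):
    """Combine partitions that are each sorted on key, keeping the sort order."""
    return list(merge(*parts, key=lambda rec: rec[key]))


# Hypothetical sample data: eight records already sorted by "id".
records = [{"id": i, "amount": 10 * i} for i in range(1, 9)]

by_key = partition_by_key(records, "id", 2)
rr = partition_round_robin(records, 2, block_size=2)
by_range = partition_by_range(records, "id", [3, 6])

restored_from_rr = interleave(rr, block_size=2)
restored_from_key = merge_sorted(by_key, "id")
appended = concatenate(by_key)
```

In this sketch, Interleave with the same block size undoes the round-robin split, and Merge restores the sorted order after a key partition, while Concatenate simply places one partition after the other. Gather is omitted because its output order is arbitrary.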
Transform Components

Transform components are the workhorses of Ab Initio, responsible for manipulating
and transforming data within a graph. They perform a wide range of operations, from
simple data cleaning to complex data analysis.

• Data Manipulation Components:
o Filter: Selects data records based on specified criteria.
o Sort: Sorts data records according to a specified key.
o Join: Combines data from multiple sources based on a common key.
o Aggregate: Calculates summary statistics for groups of data records.
o Project: Selects specific columns from a dataset.
o Reformat: Converts data from one format to another (e.g., text to numeric).
o Lookup: Retrieves data from a lookup table based on a specified key.
o Replicate: Creates multiple copies of a dataset.
o Union: Combines data from multiple datasets into a single dataset.
o Set Operations: Performs set operations like intersection, union, and difference on
datasets.
• Data Transformation Components:
o Decode: Decodes data that has been encoded using a specific format.
o Encode: Encodes data into a specific format.
o Validate: Validates data against predefined rules and constraints.
o Derive: Creates new data fields based on existing data.
o Mask: Masks sensitive data to protect privacy.
o Encrypt: Encrypts data to secure it from unauthorized access.
o Decrypt: Decrypts encrypted data.
o Format: Formats data according to specific requirements (e.g., date formatting, number
formatting).
• Control Flow Components:
o Decision: Branches the data flow based on a condition.
o Loop: Repeats a set of operations until a condition is met.
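
To make the Filter, Sort, Lookup, and Join descriptions in this section concrete, here is a small Python sketch of the same operations on in-memory records. It is an analogy only, not Ab Initio syntax: the field names, the sample data, and the choice of an inner join are assumptions made for the example.

```python
# Hypothetical sample data for illustration.
orders = [
    {"order_id": 1, "cust_id": 10, "amount": 250},
    {"order_id": 2, "cust_id": 11, "amount": 40},
    {"order_id": 3, "cust_id": 10, "amount": 75},
    {"order_id": 4, "cust_id": 12, "amount": 120},
]
customers = [
    {"cust_id": 10, "name": "Ada"},
    {"cust_id": 11, "name": "Grace"},
]

# Filter: keep only the records that satisfy a condition.
large = [o for o in orders if o["amount"] >= 50]

# Sort: order the records on a key (Python's sort is stable).
large_sorted = sorted(large, key=lambda o: o["cust_id"])

# Lookup: index the reference data by key for fast retrieval.
cust_by_id = {c["cust_id"]: c for c in customers}

# Join (inner): combine each order with its matching customer record;
# orders without a matching customer are dropped.
joined = [
    {**o, "name": cust_by_id[o["cust_id"]]["name"]}
    for o in large_sorted
    if o["cust_id"] in cust_by_id
]
```

Here order 2 is removed by the filter and order 4 is dropped by the join because customer 12 has no reference record, so joined holds orders 1 and 3 with the customer name attached.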
Additional Utilities
• Filter by Expression: This component filters data records based on a specified
expression.
• Join: This component combines data from multiple sources based on a common key.
• Reformat: This component allows you to rearrange data by adding, removing,
combining, or changing fields.
• Rollup: This component summarizes data into a smaller form by grouping records and
calculating totals for each group.
• Aggregate: This component is similar to Rollup but offers less control over data
processing.
• Scan: This component generates a series of cumulative summary records, such as
running totals, for groups of data records.
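
The difference between Rollup and Scan is easiest to see side by side. The Python sketch below is an illustration under assumed names and data (the rollup and scan functions, the "region" and "amount" fields), not Ab Initio code: rollup emits one total per group, while scan emits one record per input record carrying the running total within its group.

```python
from itertools import groupby
from operator import itemgetter


def rollup(records, key, field):
    """Return one summary record per key group with the group's total."""
    ordered = sorted(records, key=itemgetter(key))
    return [
        {key: k, "total": sum(r[field] for r in group)}
        for k, group in groupby(ordered, key=itemgetter(key))
    ]


def scan(records, key, field):
    """Return one record per input record with a running total per group."""
    ordered = sorted(records, key=itemgetter(key))
    out = []
    for _, group in groupby(ordered, key=itemgetter(key)):
        running = 0
        for r in group:
            running += r[field]
            out.append({**r, "running_total": running})
    return out


# Hypothetical sample data for illustration.
sales = [
    {"region": "east", "amount": 5},
    {"region": "west", "amount": 7},
    {"region": "east", "amount": 3},
    {"region": "west", "amount": 1},
]

totals = rollup(sales, "region", "amount")
running = scan(sales, "region", "amount")
```

With this input, totals contains two records (one per region) and running contains four, one for each sale.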
