
Abinitio_Introduction:

===================
1.Input file
2.Output file
3.Reformat
4.Join
5.Normalize
6.Rollup
7.Scan
8.Filter by Expression
9.Dedup Sorted
10.Sort
11.Gather
12.Merge
13.Interleave
14.PBE (Partition by Expression)
15.PBK (Partition by Key)
16.Partition by Round-robin
17.Replicate
18.Lookup Utilities
19.Buffered Copy

====================
1.Input file : Purpose

INPUT FILE represents records read as input to a graph from one or more serial
files or from a multifile.

You can also use INPUT FILE to read files directly from a Hadoop Distributed File
System.
For more information about using the Co>Operating System with Hadoop, see the Hadoop
Integration Guide.
You can also use INPUT FILE to read files directly from Amazon S3 and Google Cloud
Storage. For more information, see “URLs for cloud storage”.

INPUT FILE is not a phased component. The phase number it displays in the GDE
refers to the phase number of the component that reads from it. (If the INPUT FILE
is read in more than one phase, the phases are shown as a range of phase numbers.)
For more information, see “Phases and checkpoints” in the Graph Developer’s Guide.

===================
2.Output file : Purpose

OUTPUT FILE represents records written as output from a graph into one or more
serial files or a multifile.

You can use OUTPUT FILE to write records to a Hadoop Distributed File System
(HDFS). In this case, the file is always written in serial. For more information
about using the Co>Operating System with Hadoop, see the Hadoop Integration Guide.

You can also use OUTPUT FILE to write files directly to Amazon S3 and Google Cloud
Storage. For more information, see “URLs for cloud storage”.

When the target of an OUTPUT FILE component is a special file such as /dev/null,
NUL, a named pipe, or some other special file, the Co>Operating System never
deletes and re-creates that file, nor does it ever truncate it.

OUTPUT FILE is not a phased component. The phase number it displays in the GDE
refers to the phase number of the component that writes to it. For more
information, see “Phases and checkpoints” in the Graph Developer’s Guide.

==================
3.Reformat : Purpose

REFORMAT changes the format of records by dropping fields, or by using DML
expressions to add fields, combine fields, or transform the data in the records.

REFORMAT performs an implicit reformat when you do not define a reformat function.
For more information, see “Implicit reformat” in the Graph Developer’s Guide.
(You cannot use aggregation functions in REFORMAT.)
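
As a rough sketch (the field names are hypothetical, not from this document), a
REFORMAT transform function in DML might add a derived field while passing the
remaining fields through unchanged:

out :: reformat(in) =
begin
  out.full_name :: string_concat(in.first_name, " ", in.last_name); /* derived field */
  out.* :: in.*;                                                     /* pass the rest through */
end;
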
==================
4.Join : Purpose

JOIN reads data from two or more input ports, combines records with matching keys
according to the transform you specify,
and sends the transformed records to the output port. Additional ports allow you
to collect rejected and unused records.

#Right outer join:
join type: explicit
record-required0: false
record-required1: true

#Left outer join:
join type: explicit
record-required0: true
record-required1: false

#Full outer join:
join type: explicit
record-required0: false
record-required1: false
or
join type: Full outer join

#Inner join:
join type: explicit
record-required0: true
record-required1: true
or
join type: Inner join
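
For illustration only (in0 and in1 are the standard JOIN input ports; the field
names are hypothetical), a simple JOIN transform keyed on a shared field might
combine matching records like this:

out :: join(in0, in1) =
begin
  out.customer_id :: in0.customer_id;   /* the join key, present on both inputs */
  out.order_amt   :: in1.order_amt;     /* a field taken from the second input */
  out.*           :: in0.*;             /* remaining fields from the first input */
end;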

==============
5.Rollup : Purpose

ROLLUP processes groups of input records that have the same key, generating one
output record for each group. Typically, the output record summarizes or aggregates
the data in some way; for example, a simple ROLLUP might calculate a sum or average
of one or more input fields. ROLLUP can select certain information from each group;
for example, it might output the largest value in a field, or accumulate a vector
of values that conform to specific criteria.

(Works like SQL GROUP BY.)
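
A minimal template-mode ROLLUP transform, assuming hypothetical fields customer_id
(the key) and amount, might aggregate each key group like this:

out :: rollup(in) =
begin
  out.customer_id  :: in.customer_id;   /* the key field */
  out.order_count  :: count(1);         /* number of records in the group */
  out.total_amount :: sum(in.amount);   /* sum over the group */
end;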

===============
6.Normalize : Purpose

NORMALIZE generates multiple output records from each of its input records. You can
directly specify the number of output records for each input record, or you can
make the number of output records dependent on a calculation.

In contrast, to consolidate groups of related records into a single record with a
vector field for each group (the inverse of NORMALIZE), you would use the
accumulation function of the ROLLUP component.
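
As a sketch (the vector field items and the other field names are hypothetical), a
NORMALIZE transform defines a length function giving the number of output records
to generate per input record, and a normalize function producing each one:

out :: length(in) =
begin
  out :: length_of(in.items);        /* one output record per vector element */
end;

out :: normalize(in, index) =
begin
  out.order_id :: in.order_id;       /* carried over from the input record */
  out.item     :: in.items[index];   /* the index-th element of the vector */
end;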

==============

7.Scan : Purpose

For every input record, SCAN generates an output record that consists of a running
cumulative summary for the group to which the input record belongs, up to and
including the current record. For example, the output records might include
successive year-to-date totals for groups of records.

SCAN is similar to ROLLUP. The difference between the two is that SCAN produces one
output record for each input record, while ROLLUP produces one output record for
each key group. (SCAN works like a cumulative ROLLUP: the group of records being
aggregated grows with each successive input record, until a key change occurs.)

Two modes to use SCAN

You can use a SCAN component in two modes, depending on how you define the
transform parameter:

##Template mode — You define a simple scan function that typically includes
aggregation functions. For more information, see “Using SCAN in template mode”.

##Expanded mode — You create a transform using an expanded scan package. This mode
allows for scans that do not necessarily use regular aggregation functions. For
more information, see “Using SCAN in expanded mode”.
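
In template mode a SCAN transform looks much like a ROLLUP transform. With
hypothetical fields account_id (the key) and amount, something like the following
would emit a running total for each account:

out :: scan(in) =
begin
  out.account_id    :: in.account_id;   /* the key field */
  out.running_total :: sum(in.amount);  /* cumulative sum up to the current record */
end;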

====================
8.Filter by Expression : Purpose (like the WHERE clause in SQL)

FILTER BY EXPRESSION filters records according to a DML expression or transform
function, which specifies the selection criteria.

FILTER BY EXPRESSION is sometimes used to create a subset, or sample, of the data.
For example, you can configure FILTER BY EXPRESSION to select a certain percentage
of records, or to select every third (or fourth, or fifth, and so on) record. Note
that if you need a random sample of a specific size, you should use the SAMPLE
component.

FILTER BY EXPRESSION supports implicit reformat. For more information, see
“Implicit reformat” in the Graph Developer’s Guide.
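
The selection expression is an ordinary DML boolean expression. Two hypothetical
examples (field names are made up; next_in_sequence is a built-in DML function that
numbers records sequentially):

amount > 1000 and region == "WEST"        /* keep high-value records from one region */

next_in_sequence() % 3 == 0               /* keep roughly every third record */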

======================
9.Dedup Sorted : Purpose (like SQL DISTINCT)

DEDUP SORTED separates one specified record in each group of records from the rest
of the records in the group.

Although it lacks a reformat transform function, DEDUP SORTED supports implicit
reformat. For more information, see “Implicit reformat” in the Graph Developer’s
Guide.
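
DEDUP SORTED is configured through parameters rather than a transform. As a sketch
(the field name is hypothetical), a typical setup deduplicates on one key and keeps
the first record of each group:

key:  { customer_id }
keep: first          /* alternatives: last, unique-only */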

====================
10.Sort : Purpose

SORT sorts and merges records. You can use SORT to order records before you send
them to a component that requires grouped or sorted records.
The in-port of the SORT component accepts fan-in and all-to-all flows (partitioned
data). SORT automatically performs a gather on its in port, so there’s no need to
gather data before sorting it.

Sort stability is not guaranteed: records with identical key values may not
maintain their relative order after being sorted.
If the sort key field contains NULL values, the NULL records are listed first with
ascending sort order and last with descending sort order.
Although it lacks a reformat transform function, SORT supports implicit reformat.
For more information, see “Implicit reformat” in the Graph Developer’s Guide.
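
The key parameter of SORT is a DML key specifier. As a hypothetical example (field
names are made up), the following sorts by account ascending and, within each
account, by date descending:

{ account_id; trans_date descending }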

=====================
11.Gather : Purpose

GATHER combines records from multiple flow partitions in an arbitrary order.


You can use GATHER to:
##Reduce data parallelism, by connecting a single fan-in flow to the in port
##Reduce component parallelism, by connecting multiple straight flows to the in
port

======================
12.Merge : Purpose

MERGE combines records from multiple flows or flow partitions that have been sorted
according to the same key specifier, and maintains the sort order.
MERGE requires sorted input data, but never sorts data itself.

=======================
13.Interleave : Purpose

INTERLEAVE combines blocks of records from multiple flow partitions in round-robin
fashion. You can use INTERLEAVE to undo the effects of PARTITION BY ROUND-ROBIN.

==================
14.Partition by Expression : Purpose

PARTITION BY EXPRESSION distributes records to its output flow partitions according
to a specified DML expression or transform function.

The output port for PARTITION BY EXPRESSION is ordered. (See “Ordered ports” in the
Graph Developer’s Guide.) Although you can use fan-out flows on the out port, we do
not recommend connecting multiple fan-out flows. You may connect a single fan-out
flow; or, preferably, limit yourself to straight flows on the out port.

PARTITION BY EXPRESSION supports implicit reformat. For more information, see
“Implicit reformat” in the Graph Developer’s Guide.
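
As a rough sketch (the field name is hypothetical), the partitioning expression is
evaluated for each record and should yield a partition number between 0 and the
number of output partitions minus 1. For example, with four output flows:

region_code % 4     /* route each record by a numeric region code */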

==================
15.Partition by Key : Purpose

PARTITION BY KEY distributes records to its output flow partitions according to key
values.
How PARTITION BY KEY interprets key values depends on the internal representation
of the key. For example, the number 4 in a field of type integer(2) is not
considered identical to the number 4 in a field of type decimal(4).

Although it lacks a reformat transform function, PARTITION BY KEY supports implicit
reformat. For more information, see “Implicit reformat” in the Graph Developer’s
Guide.
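
The point about internal representation can be seen in a DML record format. The two
hypothetical fields below could both hold the value 4, but one is a two-byte binary
integer and the other a four-character decimal, so they are not considered
identical key values:

record
  integer(2) qty_binary;    /* two-byte binary integer */
  decimal(4) qty_decimal;   /* four-character decimal */
end;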

===================
16.Partition by Round-robin : Purpose

PARTITION BY ROUND-ROBIN distributes blocks of records evenly to each output flow
in round-robin fashion. The output port for PARTITION BY ROUND-ROBIN is ordered.
(See “Ordered ports” in the Graph Developer’s Guide.)

For information on undoing the effects of PARTITION BY ROUND-ROBIN,see INTERLEAVE.

Although it lacks a reformat transform function, PARTITION BY ROUND-ROBIN supports
implicit reformat. For more information, see “Implicit reformat” in the Graph
Developer’s Guide.

===================
17. Replicate : Purpose

REPLICATE arbitrarily combines all records it receives into a single flow and
writes a copy of that flow to each of its output flows. Use REPLICATE to support
component parallelism — such as when you want to perform more than one operation on
a flow of records coming from an active component.

The input port of REPLICATE has an implicit gather. If this port receives more than
one input stream, REPLICATE does not preserve sort order.

REPLICATE is not required when you want to read data from an INPUT FILE component
on multiple flows. Rather, you can connect multiple flows directly to the INPUT
FILE. The downstream components each read the data independently, which sometimes
results in improved graph performance.

=====================
18. Lookup : From a lookup file, returns the first record that matches a specified
expression.

In dynamic lookups, the function can use either a LOOKUP TEMPLATE component or a
lookup template type to reference the lookup data.

NOTE:
Errors are generated if calls to lookup and lookup_local are made to the same file
from a single component. To avoid this error, use either the lookup or lookup_local
function, but not both, to call a single file from a single component.

Syntax for looking up data statically

record lookup(string lookup_file [ , expr ... ] )

Syntax for looking up data dynamically

record lookup(lookup_identifier_type lookup_id, string lookup_template [ , expr ... ] )
record lookup(lookup_template_type [ , expr ... ] )

When you are looking up data statically, this function returns the first record
that matches the values of expr from the lookup file lookup_file. When you are
looking up data dynamically, it returns the first record that matches the values of
expr from the lookup file referenced by lookup_id or the lookup_template_type
handle. If no matching record exists, the function returns NULL.

Since this function returns a whole record, it can be used as part of a field
extraction expression that returns just a single field of the record.
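
For example (the lookup file label and field names are hypothetical), a REFORMAT
transform might enrich each record with a field pulled from a lookup file; per the
note above, the result is NULL when no matching record exists:

out :: reformat(in) =
begin
  out.cust_name :: lookup("Customers", in.cust_id).name;   /* field extraction from the returned record */
  out.*         :: in.*;
end;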

============================
19. Buffered copy : Purpose

BUFFERED COPY regulates the flow of records in a graph by buffering data when the
downstream components experience a slowdown. This allows upstream components to
continue working while the downstream component is blocked.

You can use BUFFERED COPY to do the following:

##Regulate the flow of data from a partitioner into CPU-intensive components or
into other components with slow throughput.

##Maximize your effective CPU if your graph is locally partitioning serial to
parallel, and your average record size is larger than what can be handled on a
flow. If your average record size exceeds 64K (the size of one segment of the
memory flows), the downstream component must wait until each partition receives a
full record, starving other partitions and reducing performance. Inserting a
BUFFERED COPY downstream from the partitioner avoids this performance reduction.

============================
Ab Initio Software Introduction
============================

1.Abinitio_Introduction:
Ab Initio means 'starts from the beginning'. Ab Initio is a fourth-generation data
analysis, data manipulation, batch-processing and graphical user interface
(GUI)-based parallel processing product used to Extract, Transform and Load (ETL)
data. Ab Initio software is a widely used Business Intelligence data processing
platform. The Ab Initio Component Library is a set of reusable software modules for
sorting, data transformation, and high-speed database loading and unloading.

AbInitio is an array of applications consisting of various components. It is a
Business Intelligence platform comprised of six data processing products:
2.Co-operating System
3.Graphical Development Environment (GDE)
4.Enterprise Meta Environment (EME)
5.Conduct It
6.Data Profiler
7.The Component Library
Ab Initio offers a robust architecture model that provides a simple, fast, and
highly secure data integration application system. The tool also integrates
diverse, continuous, and complex data streams that can range from gigabytes to
terabytes.

2.Co-operating System :

This operates on top of the operating system and is the foundation for all Ab
Initio processes. This component offers the following features:

Run and manage Ab Initio graphs and control ETL processes.
ETL process debugging and monitoring.
Ab Initio extensions are provided to the operating system.
Provides interaction with the Enterprise Meta Environment (EME) and metadata
management.

Ab Initio Co>Operating System is the foundation for all Ab Initio applications. It
provides a general engine for data integration of all kinds of data processing and
communication between all the tools within the platform.
It works on a server model: in brief, the server is called the 'Co>Operating
System', and it can reside on a remote Unix machine or a mainframe. It runs on
OS/390 and z/OS on the mainframe, as well as Unix, Linux, and Windows.

3.Graphical Development Environment :

The Ab Initio Graphical Development Environment (GDE) is an IDE for creating Ab
Initio applications by dragging and dropping components onto a canvas and
configuring them using point-and-click operations.
It provides an easy-to-use front-end application for designing ETL graphs.
It facilitates running and debugging Ab Initio jobs and also provides execution
trace logs. This tool works on the client model.
In brief, the client is called the 'Graphical Development Environment'.
Compiling an Ab Initio ETL graph produces a UNIX shell script, which can be run
without the GDE installed. This component helps developers design and run Ab Initio
graphs.

4.Enterprise Meta Environment (EME):

This is an abinitio environment and repository used for storing and managing
metadata.
It has the capability to store both technical and business metadata. EME metadata
can be accessed from a web browser, the Ab Initio GDE, and the Co>Operating System
command line.
The Enterprise Meta>Environment® (EME®) product is an enterprise data catalog and
metadata management solution.
It provides broad capabilities for all metadata created and used by the
Co>Operating System product and a large number of third-party products and
databases (such as Snowflake, Teradata, DB2, and Oracle).

5.Conduct IT:
This is an environment used for creating abinitio data integration systems.
The prime focus of this environment is to create special type of graphs called
abinitio plans.
Ab Initio provides both a command-line and a graphical interface to Conduct IT.
6.Data Profiler:

This is an analytical application that runs on top of the Co>Operating System in a
graphical environment. It can determine data range, quality, distribution,
variance, and scope.

7. Control Center:

The Control>Center® product provides complete job monitoring and operational
management along with optional scheduling of Ab Initio applications.

8. Metadata Hub:

The AbInitio Metadata Hub acts as the data governance component for AbInitio's data
management platform.
It can be used as either a system of record or a system of reference, is able to
govern technical, business,
and even logical assets, and provides both business and technical lineage.
The Metadata Hub is used for handling the interchange and distribution of technical
metadata between decision-processing products.
It is designed for use primarily by technical staff during the development and
maintenance of data warehouses.

9.Express>It:

The Express>It® product enables business-oriented users to rapidly configure, test,
and deploy metadata-driven applications
that have been built using the Co>Operating System product.
A set of advanced user interface controls and dialogues is provided that can easily
be customized to address
the specific requirements of the business application.
These controls include the Business Rules Environment® (BRE®) and Easy>Data™
products for creating, managing,
and testing rules and processing logic; the Profiler results viewer; an EME
metadata browser; a data viewer/editor; and a data format editor.

10.Continuous Flow:

The Continuous>Flows® product enables applications to apply complex logic and
business rules to high-performance,
low-latency real-time processing applications.
These applications can call and implement microservices, use the Active>Data®
product for in-memory resilient stateful services,
as well as connect to industry-standard messaging services such as Kafka, MQ, and
Kinesis.
Like all Ab Initio applications, real-time applications are highly robust because
of Ab Initio’s patented checkpointing system.

11.Query>It:
The Query>It® product enables large amounts of data to be stored and then queried
via SQL.
The data can be stored in a wide range of file systems, such as S3, Hadoop (HDFS),
or Ab Initio’s own highly efficient parallel and indexed file system (ICFF).
In addition to supporting SQL query access to these data stores, the Query>It
product can also reference data held in third-party databases
(such as Snowflake, Teradata, DB2, and Oracle), allowing federated, push-down query
access to heterogeneous data.

12.Test Data Management (TDM):


Ab Initio Test Data Management is a test data management application within the
broader Ab Initio data management platform.
Said platform can be deployed on-prem or in-cloud; operates on structured,
unstructured, and semi-structured data;
and features a highly portable 'build once, run anywhere' architecture.

13.Sandbox:
A sandbox is a special directory that contains specific subdirectories to store Ab
Initio graphs and related files.
The sandbox is the user's work area.
