Ab Initio Training Slides and Documents
Confidential
Ab Initio
Co>Operating System
On a typical installation, the Co>Operating System is installed on a Unix or Windows NT server, while the GDE is installed on a Pentium PC.
The GDE …
can communicate with the Co>Operating System using several protocols, such as Telnet, …
… release
Note: During deployment, the GDE sets AB_COMPATIBILITY to the Co>Operating System version number. So, a change in the …
A Graph
A Sample Graph …
[Figure: a sample graph with its Datasets (e.g. Customers), Components, and Flows labeled]
A Sample Graph …
[Figure: the same graph with Expression, Metadata, Ports, and Layout labeled]
Files
Formats
Components
Flows
Layouts
Building with mp job
Building with mp run
Setup Command
◦ Ab Initio Host (AIH) file
◦ Sets up the environment needed to run an Ab Initio application.
Graph
End Script
◦ Local to the Graph
DML
◦ Ab Initio stores metadata in the form of record formats.
XFR
◦ Data can be transformed with the help of transform functions.
Data Types
Sample fixed-width data:
0212Sam Spade
0492Sue West
0221William Black
DML block:
record
  decimal(4) id;
  string(6) first_name;
  string(6) last_name;
end
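A minimal Python sketch (not Ab Initio DML) of how a reader slices one record of this fixed-width format: 4 characters for id, then 6 each for first_name and last_name. The helper name `parse_fixed_width` and the choice to strip trailing pad spaces are assumptions for illustration.

```python
from dataclasses import dataclass

# Widths from the DML block: decimal(4) id; string(6) first_name; string(6) last_name;
FIELD_WIDTHS = [("id", 4), ("first_name", 6), ("last_name", 6)]


@dataclass
class Customer:
    id: int
    first_name: str
    last_name: str


def parse_fixed_width(line: str) -> Customer:
    """Hypothetical helper: cut one fixed-width record into its fields."""
    raw = {}
    pos = 0
    for name, width in FIELD_WIDTHS:
        raw[name] = line[pos:pos + width]
        pos += width
    return Customer(
        id=int(raw["id"]),                      # "0212" -> 212
        first_name=raw["first_name"].rstrip(),  # strip pad spaces (assumption)
        last_name=raw["last_name"].rstrip(),
    )


print(parse_fixed_width("0212" + "Sam".ljust(6) + "Spade".ljust(6)))
```

Because every field has a fixed width, no delimiter search is needed: the reader only advances a position counter.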
DML Syntax
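Sample delimited data:
0322,17-01-00, 890.50Elvis,Jones
0492,25-12-02,1000.00Sue,West
record
  decimal(",") id;
  date("DD-MM-YY")(",") join_date;
  decimal(7,2) salary_per_day;
  string(",") first_name;
  string("\n") last_name;
end
Precision & Scale: in decimal(7,2), 7 is the field length and 2 is the number of digits after the decimal point.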
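A Python sketch (not DML) of how the mixed delimited/fixed record above is read: id and join_date end at a comma, salary_per_day is a fixed 7 characters, first_name ends at a comma and last_name at a newline. The helper names are hypothetical. Python's `%y` maps "00" to 2000; how the Co>Operating System resolves two-digit years is not shown on the slide.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal


@dataclass
class Employee:
    id: int
    join_date: date
    salary_per_day: Decimal
    first_name: str
    last_name: str


def read_until(text: str, pos: int, delim: str) -> tuple[str, int]:
    """Return the field text before delim and the position just past delim."""
    end = text.index(delim, pos)
    return text[pos:end], end + len(delim)


def parse_delimited(text: str) -> Employee:
    raw_id, pos = read_until(text, 0, ",")       # decimal(",") id
    raw_date, pos = read_until(text, pos, ",")   # date("DD-MM-YY")(",") join_date
    raw_salary = text[pos:pos + 7]               # decimal(7,2): fixed 7 characters
    pos += 7
    first, pos = read_until(text, pos, ",")      # string(",") first_name
    last, _ = read_until(text, pos, "\n")        # string("\n") last_name
    return Employee(
        id=int(raw_id),
        join_date=datetime.strptime(raw_date, "%d-%m-%y").date(),
        salary_per_day=Decimal(raw_salary.strip()),
        first_name=first,
        last_name=last,
    )


print(parse_delimited("0322,17-01-00, 890.50Elvis,Jones\n"))
```

Note how the fixed-width salary field is what lets "890.50" and "Elvis" sit next to each other with no separator.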
Input record format (sample input: 0345,090297John,Smith;)
record
  decimal(7) id;
  date("MMDDYY") join_date;
  string(",") first_name;
  string(";") last_name;
end
Reformat: Drop first_name, Reorder fields, id+1000000
Output record format (sample output: 1000345,Smith 1997-09-02)
record
  decimal(7) id;
  string(8) last_name;
  date("DD-MM-YY")(",") join_date;
end
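The Reformat step above can be sketched in Python as a parse followed by a transform that drops first_name, reorders the remaining fields, and adds 1000000 to id. The input id is read up to the comma here because the sample record is comma-separated, even though the slide's input DML shows decimal(7); that reading is an assumption, and both function names are hypothetical.

```python
from datetime import date, datetime


def parse_input(text: str) -> dict:
    """Parse: id up to ",", join_date as MMDDYY (6 chars), first_name up to ",", last_name up to ";"."""
    raw_id, rest = text.split(",", 1)
    raw_date, rest = rest[:6], rest[6:]
    first_name, rest = rest.split(",", 1)
    last_name = rest.split(";", 1)[0]
    return {
        "id": int(raw_id),
        "join_date": datetime.strptime(raw_date, "%m%d%y").date(),
        "first_name": first_name,
        "last_name": last_name,
    }


def reformat(rec: dict) -> dict:
    """Drop first_name, reorder the fields, and compute id + 1000000."""
    return {
        "id": rec["id"] + 1000000,
        "last_name": rec["last_name"],
        "join_date": rec["join_date"],
    }


out = reformat(parse_input("0345,090297John,Smith;"))
print(out)
```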
Ab Initio >
DAY 2
Filter by Expression
Reformat
Redefine Format
Sort
Join
Replicate
Dedup
Aggregate
Rollup
Scan
[Figure: for each record, evaluate expr; true? Yes or No]
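A Python sketch of the Filter by Expression decision: each record is tested against a predicate, and the Yes and No branches are collected separately (in the component the No branch typically goes to a deselect port). The field name `amount` and the threshold are hypothetical.

```python
from typing import Callable, Iterable


def filter_by_expression(
    records: Iterable[dict], expr: Callable[[dict], bool]
) -> tuple[list, list]:
    selected, deselected = [], []
    for rec in records:
        # expr true? Yes -> selected, No -> deselected
        (selected if expr(rec) else deselected).append(rec)
    return selected, deselected


rows = [{"id": 1, "amount": 50}, {"id": 2, "amount": 500}, {"id": 3, "amount": 120}]
big, small = filter_by_expression(rows, lambda r: r["amount"] > 100)
print([r["id"] for r in big], [r["id"] for r in small])
```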
REJECT
◦ Input records that caused an error
ERROR
◦ Associated error message
LOG
◦ Logging records
Count
Reject-Threshold
◦ Abort
◦ Never Abort
Limit
Ramp
Limit
◦ Number of errors to tolerate
Ramp
◦ Rate of errors to tolerate per input record
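A commonly described rule for combining the two parameters (treat it as an assumption and check it against your Co>Operating System documentation) is that the component aborts once the number of errors exceeds limit + ramp × records processed so far. A Python sketch of that check:

```python
def tolerance(limit: int, ramp: float, records_processed: int) -> float:
    """Errors tolerated so far under the assumed rule: limit + ramp * records processed."""
    return limit + ramp * records_processed


def should_abort(errors: int, records_processed: int, limit: int, ramp: float) -> bool:
    return errors > tolerance(limit, ramp, records_processed)


# limit=0, ramp=0.5: one error tolerated for every two records read
print(should_abort(2, 4, 0, 0.5), should_abort(3, 4, 0, 0.5))
```

With ramp set to 0, only the fixed limit applies; with limit set to 0, only the per-record rate applies.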
Keys
A key identifies a field or set of fields to organize a dataset
◦ Single Field: employee_number
◦ Multiple fields or composite key: {last_name; first_name}
◦ Modifiers: employee_number descending
A surrogate key is a substitute for the natural primary key.
It is simply a unique identifier or number for each record, like the ROWID of an Oracle table.
Sort Component
Reads records from the input port, sorts them by key, and writes the result to the output port
Parameters
◦ Key
◦ Max-core
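A Python sketch of why Sort needs a max-core setting: records are sorted in memory-sized runs, and when the input is larger than the budget the runs are merged afterward (in a real sort each run would be spilled to a temporary file). Here `max_in_memory` counts records and is only a stand-in for max-core, which is a byte budget; the function name is hypothetical. The composite key and the descending modifier are shown as Python sort keys.

```python
import heapq
from itertools import islice
from typing import Callable, Iterable, Iterator


def external_sort(records: Iterable, key: Callable, max_in_memory: int) -> Iterator:
    """Sort runs of at most max_in_memory records, then merge the sorted runs."""
    if max_in_memory < 1:
        raise ValueError("max_in_memory must be at least 1")
    it = iter(records)
    runs = []
    while True:
        chunk = list(islice(it, max_in_memory))
        if not chunk:
            break
        runs.append(sorted(chunk, key=key))  # a real sort would spill this run to disk
    return heapq.merge(*runs, key=key)


people = [("West", "Sue", 492), ("Spade", "Sam", 212), ("Black", "William", 221), ("Jones", "Elvis", 322)]
# Composite key {last_name; first_name}
by_name = list(external_sort(people, key=lambda p: (p[0], p[1]), max_in_memory=2))
# Descending modifier on a numeric key: negate it
by_id_desc = list(external_sort(people, key=lambda p: -p[2], max_in_memory=2))
print([p[0] for p in by_name])
print([p[2] for p in by_id_desc])
```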
PORTS: in, out, unused, reject (optional), error (optional), log (optional)
PARAMETERS: count, key, override key, transform, limit, ramp
Join Types
◦ Inner
◦ Outer
◦ Explicit
Join Methods
◦ Merge Join: uses sorted inputs
◦ Hash Join: uses in-memory hash tables to group input
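Both join methods can be sketched in Python for an inner join: the hash join builds an in-memory table on one input and probes it with the other, while the merge join walks two inputs that are already sorted on the key. Outer and explicit joins are not covered here, and the sample data is hypothetical.

```python
from collections import defaultdict
from operator import itemgetter
from typing import Callable


def hash_join(left: list, right: list, key: Callable) -> list[tuple]:
    """Inner join: build an in-memory hash table on right, then probe it with each left record."""
    table = defaultdict(list)
    for rrec in right:
        table[key(rrec)].append(rrec)
    return [(lrec, rrec) for lrec in left for rrec in table.get(key(lrec), [])]


def merge_join(left: list, right: list, key: Callable) -> list[tuple]:
    """Inner join of two inputs that are already sorted on key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of right records sharing this key ...
            j_end = j
            while j_end < len(right) and key(right[j_end]) == kl:
                j_end += 1
            # ... and pair it with every left record sharing the same key.
            while i < len(left) and key(left[i]) == kl:
                out.extend((left[i], rrec) for rrec in right[j:j_end])
                i += 1
            j = j_end
    return out


customers = [(1, "Sam"), (2, "Sue"), (3, "Elvis")]
orders = [(1, "o1"), (1, "o2"), (3, "o3"), (4, "o4")]
by_id = itemgetter(0)
print(merge_join(customers, orders, by_id))
```

The trade-off mirrors the slide: the merge join needs sorted inputs but little memory, while the hash join accepts unsorted inputs but must hold one side in memory.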
… transform function.
An example:
Aggregate/Rollup/Scan
Generate summary records for groups of input records (Scan emits a cumulative summary for each input record)
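A Python sketch contrasting the two output shapes on input already grouped by key: Rollup emits one summary record per group, while Scan emits one running total per input record. The (cust_id, amount) records are hypothetical.

```python
from itertools import accumulate, groupby
from operator import itemgetter

# (cust_id, amount) records, already sorted/grouped by cust_id
txns = [("c1", 10), ("c1", 5), ("c2", 7), ("c2", 3), ("c2", 1)]


def rollup(records):
    """One summary record per key group (here: total amount)."""
    return [(k, sum(amt for _, amt in grp)) for k, grp in groupby(records, key=itemgetter(0))]


def scan(records):
    """One output per input record: the running total within its key group."""
    out = []
    for k, grp in groupby(records, key=itemgetter(0)):
        amounts = [amt for _, amt in grp]
        out.extend((k, total) for total in accumulate(amounts))
    return out


print(rollup(txns))
print(scan(txns))
```

`groupby` only merges adjacent records, which is why both functions assume grouped (sorted) input.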
Normalize
◦ Generates multiple data records from each input data record
◦ Separates a data record with a vector field into several individual records, each containing one element of the vector
Denormalize Sorted
◦ Consolidates groups of related data records into a single output record with a vector field for each group
◦ Requires grouped input
Validate Records
◦ Separates valid data records from invalid data records
Check Order
◦ Tests whether data records are sorted according to a key-specifier
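Normalize and Denormalize Sorted are inverses of each other, which a short Python sketch makes visible: one flattens a vector field into one record per element, the other regroups consecutive records with the same key into a vector. The field names `cust_id` and `purchases` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def normalize(records):
    """One output record per element of the vector field 'purchases'."""
    return [(r["cust_id"], amt) for r in records for amt in r["purchases"]]


def denormalize_sorted(pairs):
    """Consolidate grouped (cust_id, amount) records into one vector per group."""
    return [
        {"cust_id": k, "purchases": [amt for _, amt in grp]}
        for k, grp in groupby(pairs, key=itemgetter(0))
    ]


vec = [{"cust_id": "c1", "purchases": [1.5, 2.0]}, {"cust_id": "c2", "purchases": [9.99]}]
flat = normalize(vec)
print(flat)
print(denormalize_sorted(flat))
```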
A Vector is …
… number of repeats is a constant integer or the value of another field in the record.
record
  string(20) cust_id;
  decimal(3) num_purchases;
  decimal(8,2)[num_purchases] purchase_amt;
end;
Initializing a Vector
let decimal(",")[100] values = make_constant_vector(100, 0);
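A Python sketch of reading the variable-length vector record above, assuming a fixed-width layout with no delimiters: 20 characters of cust_id, 3 of num_purchases, then num_purchases amounts of 8 characters each. The value read from num_purchases decides how many elements follow. The parser name and sample values are hypothetical.

```python
from decimal import Decimal


def parse_purchases(line: str) -> dict:
    """Hypothetical reader for the vector record format (fixed widths, no delimiters)."""
    cust_id = line[0:20].rstrip()          # string(20) cust_id
    count = int(line[20:23])               # decimal(3) num_purchases: sets the vector length
    amounts = []
    pos = 23
    for _ in range(count):                 # decimal(8,2)[num_purchases] purchase_amt
        amounts.append(Decimal(line[pos:pos + 8].strip()))
        pos += 8
    return {"cust_id": cust_id, "num_purchases": count, "purchase_amt": amounts}


line = "C-1001".ljust(20) + "002" + "   12.50" + "  100.00"
print(parse_purchases(line))

# Rough Python counterpart of make_constant_vector(100, 0): 100 elements, all zero.
values = [Decimal(0)] * 100
```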
Input Table
Output Table
Update Table
DB table
Truncate Table
Run SQL
Ab Initio >
DAY 3
Sorting Customers
Sorting Transactions
Processing Record 99
Data parallelism: data is divided into segments, or partitions, and processes run simultaneously on each partition
Expanded View
Global View
Multifile
Multifiles
mfile://host1/u/jo/mfs/mydir
◦ Control: //host1/u1/jo/mfs (holds <.mdir>)
◦ Partitions: //host1/vol4/pA/mydir, //host2/vol3/pB/mydir, //host3/vol7/pC/mydir
mfile://host1/u/jo/mfs/mydir/myfile.dat
◦ Control file: //host1/u1/jo/mfs/mydir/myfile.dat
◦ Partitions: //host1/vol4/pA/mydir/myfile.dat, //host2/vol3/pB/mydir/myfile.dat, //host3/vol7/pC/mydir/myfile.dat
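A toy Python model of the multifile layout above: one logical name, plus one serial file per partition directory. The class is hypothetical and only illustrates the two views, the expanded view (the individual partition files) and the global view (all partitions read as one dataset); it does not reproduce how the Co>Operating System stores or reads multifiles.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMultifile:
    """Hypothetical in-memory model: one logical name, one serial file per partition."""
    url: str                   # e.g. mfile://host1/u/jo/mfs/mydir/myfile.dat
    partition_dirs: list[str]  # one directory per partition
    file_name: str
    partitions: list[list[str]] = field(init=False)

    def __post_init__(self) -> None:
        self.partitions = [[] for _ in self.partition_dirs]

    def partition_paths(self) -> list[str]:
        """Expanded view: the serial files that actually hold the data."""
        return [f"{d.rstrip('/')}/{self.file_name}" for d in self.partition_dirs]

    def global_view(self) -> list[str]:
        """Global view: all partitions read as one logical dataset."""
        return [rec for part in self.partitions for rec in part]


mf = ToyMultifile(
    "mfile://host1/u/jo/mfs/mydir/myfile.dat",
    ["//host1/vol4/pA/mydir", "//host2/vol3/pB/mydir", "//host3/vol7/pC/mydir"],
    "myfile.dat",
)
for i, rec in enumerate(["r0", "r1", "r2", "r3"]):
    mf.partitions[i % len(mf.partitions)].append(rec)
print(mf.partition_paths())
print(mf.global_view())
```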
A multidirectory
A multifile
Control file
Partitions (Serial Files)
Partition by Round-robin
Broadcast
Partition by Key
Partition by Expression
Partition by Range
Partition by Percentage
Partition by Key
◦ Records with the same key value go into the same partition
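A Python sketch of Partition by Key next to Partition by Round-robin. Hashing the key guarantees that equal keys share a partition; round-robin ignores the key and only balances counts. CRC-32 is used here because Python's built-in `hash()` on strings varies between runs; the hash function Ab Initio actually uses is not stated on the slide.

```python
import zlib


def partition_by_key(records, key, n_partitions):
    """Send each record to crc32(key) % n_partitions, so equal keys always land together."""
    parts = [[] for _ in range(n_partitions)]
    for rec in records:
        parts[zlib.crc32(str(key(rec)).encode()) % n_partitions].append(rec)
    return parts


def partition_round_robin(records, n_partitions):
    """Deal records out in turn; sizes differ by at most one."""
    parts = [[] for _ in range(n_partitions)]
    for i, rec in enumerate(records):
        parts[i % n_partitions].append(rec)
    return parts


recs = list("DDABDABDFFABDEEAEDAACEFDDCGF")
print(partition_by_key(recs, key=lambda r: r, n_partitions=3))
print(partition_round_robin(recs, 3))
```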
[Figure: records with key values A to G partitioned by key; every record with a given key lands in the same partition]
Partition by Range
◦ Partitions according to the ranges of key values specified for each partition
◦ Splitters or split port
◦ Partition by Range + Sort = global ordering
Partition by Percentage
◦ Distributes a specified percentage of the total number of input data records to each output flow
◦ Pct port
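A Python sketch of both components. Range partitioning with sorted splitters, followed by a sort inside each partition, yields a global ordering when the partitions are read in order. Two simplifications are assumptions: a key equal to a splitter goes to the lower partition, and the percentage version hands out contiguous slices rather than reproducing the component's actual distribution pattern.

```python
import bisect
from typing import Callable, Iterable


def partition_by_range(records: Iterable, key: Callable, splitters: list) -> list[list]:
    """n sorted splitters give n + 1 partitions; a key equal to a splitter goes low (assumption)."""
    parts = [[] for _ in range(len(splitters) + 1)]
    for rec in records:
        parts[bisect.bisect_left(splitters, key(rec))].append(rec)
    return parts


def partition_by_percentage(records: list, percentages: list[int]) -> list[list]:
    """Simplified: contiguous slices sized by percentage; rounding leftovers go to the last partition."""
    n = len(records)
    counts = [n * p // 100 for p in percentages]
    counts[-1] += n - sum(counts)
    parts, start = [], 0
    for c in counts:
        parts.append(records[start:start + c])
        start += c
    return parts


data = [42, 7, 93, 15, 60, 3, 88, 27, 71, 50]
ranged = partition_by_range(data, key=lambda x: x, splitters=[30, 70])
# Sorting inside each range partition, then reading partitions in order, gives a global ordering.
globally_sorted = [x for part in ranged for x in sorted(part)]
print(ranged)
print(globally_sorted)
print([len(p) for p in partition_by_percentage(data, [50, 30, 20])])
```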
Partition by Expression
◦ Partitions according to a specified hash function or DML expression
Broadcast
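A Python sketch of the two remaining components: Partition by Expression routes each record to the partition number computed by an expression, and Broadcast sends a full copy of the input to every partition. The size-bucket expression is hypothetical, and raising an error on an out-of-range partition number is this sketch's choice, not documented component behavior.

```python
from typing import Callable, Iterable


def partition_by_expression(
    records: Iterable, expr: Callable[[object], int], n_partitions: int
) -> list[list]:
    """expr returns the partition number for each record."""
    parts = [[] for _ in range(n_partitions)]
    for rec in records:
        p = expr(rec)
        if not 0 <= p < n_partitions:
            raise ValueError(f"expression returned partition {p}, expected 0..{n_partitions - 1}")
        parts[p].append(rec)
    return parts


def broadcast(records: Iterable, n_partitions: int) -> list[list]:
    """Every output partition receives a full copy of the input."""
    records = list(records)
    return [list(records) for _ in range(n_partitions)]


amounts = [5, 250, 40, 1200, 90]
# Hypothetical expression: small (< 100), medium (< 1000), large
by_size = partition_by_expression(amounts, lambda a: 0 if a < 100 else 1 if a < 1000 else 2, 3)
print(by_size)
print(broadcast(["header"], 3))
```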
[Figure: records A to D spread across partitions p0 to p3; left: partitions evenly balanced; right: partitions skewed]
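One simple way to quantify the difference between balanced and skewed partitions, assumed here rather than taken from the slides, is per-partition skew = (partition size − mean size) / largest partition size. A Python sketch:

```python
def skew(sizes: list[int]) -> list[float]:
    """Per-partition skew, (size - mean) / largest size; all zeros means perfectly balanced."""
    largest = max(sizes)
    if largest == 0:
        return [0.0] * len(sizes)
    mean = sum(sizes) / len(sizes)
    return [(s - mean) / largest for s in sizes]


print(skew([4, 4, 4, 4]))  # evenly balanced: [0.0, 0.0, 0.0, 0.0]
print(skew([8, 4, 2, 2]))  # skewed: [0.5, 0.0, -0.25, -0.25]
```

A large positive value flags the partition that will finish last and hold up the whole phase.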
Concatenate
Merge
Global View
Interleave
Expanded View
Gather
◦ Reads data records from the flows connected to the input port
Concatenate
◦ … another
Merge
◦ Combines data records from multiple flow partitions that have been sorted on a key
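The departition components differ only in the order in which they read partitions, which a Python sketch over three in-memory partitions shows: Concatenate reads each partition in full before the next, Interleave takes one record from each partition in turn, and Merge preserves a key order across partitions that are already sorted. Gather (arbitrary arrival order) is not modeled; the function names are hypothetical.

```python
import heapq
from itertools import chain, zip_longest

_MISSING = object()  # filler for partitions that run out early


def concatenate(partitions):
    return list(chain.from_iterable(partitions))


def interleave(partitions):
    rows = zip_longest(*partitions, fillvalue=_MISSING)
    return [rec for row in rows for rec in row if rec is not _MISSING]


def merge(partitions, key=None):
    # Every partition must already be sorted on key.
    return list(heapq.merge(*partitions, key=key))


parts = [[1, 4, 7], [2, 5], [3, 6, 8, 9]]
print(concatenate(parts))
print(interleave(parts))
print(merge(parts))
```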
Ab Initio >
DAY 4
Serial or Multifiles
Search and retrieval are key-based and faster than reading files stored on disk
Associates key values with corresponding data values to index records and retrieve them
Lookup parameters
◦ Key
◦ Record Format
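A Python sketch of the idea behind a lookup: index the lookup records in memory by key once, then enrich each input record with a keyed retrieval instead of a scan. Keeping the first record on duplicate keys and returning None for a missing key are choices of this sketch; the record layouts are hypothetical.

```python
def build_lookup(records, key):
    """Index lookup records in memory by key; on duplicate keys the first record is kept."""
    table = {}
    for rec in records:
        table.setdefault(key(rec), rec)
    return table


customers = [{"cust_id": "c1", "name": "Sam"}, {"cust_id": "c2", "name": "Sue"}]
by_id = build_lookup(customers, key=lambda r: r["cust_id"])

transactions = [{"cust_id": "c2", "amt": 10}, {"cust_id": "c9", "amt": 3}]
enriched = [
    {**t, "name": by_id.get(t["cust_id"], {}).get("name")}  # None when the key is missing
    for t in transactions
]
print(enriched)
```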
Storage Methods
Lookup Functions
NOTE: Data must be partitioned on the same key as the lookup file before using the lookup_local functions
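The note above follows from how a local lookup works: each partition holds only its own slice of the lookup data. A Python sketch makes the failure mode visible: when records sit in the partition given by the same key function as the lookup, every key is found; when they sit anywhere else, lookups miss. The CRC-32 partitioning function and the four partitions are assumptions for illustration.

```python
import zlib

N_PARTITIONS = 4


def partition_of(key: str) -> int:
    return zlib.crc32(key.encode()) % N_PARTITIONS


def build_local_lookups(rows):
    """Split lookup rows into one in-memory table per partition, by key."""
    tables = [{} for _ in range(N_PARTITIONS)]
    for k, v in rows:
        tables[partition_of(k)][k] = v
    return tables


def count_hits(tables, placed_records):
    """placed_records: (partition_index, key) pairs; each partition sees only its own table."""
    return sum(1 for p, k in placed_records if k in tables[p])


rows = [(f"cust{i}", i) for i in range(20)]
tables = build_local_lookups(rows)
keys = [k for k, _ in rows]
aligned = [(partition_of(k), k) for k in keys]                            # same key, same partitioning
misaligned = [((partition_of(k) + 1) % N_PARTITIONS, k) for k in keys]  # deliberately wrong partition
print(count_hits(tables, aligned), count_hits(tables, misaligned))
```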
functions
next_in_sequence()
this_partition()
number_of_partitions()
last_generated_key
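A Python sketch of one common pattern for combining these functions into surrogate keys that stay unique across partitions: key = last_generated_key + (sequence − 1) × number of partitions + partition + 1. The Python arguments only stand in for the DML functions (next_in_sequence() is assumed to start at 1, this_partition() at 0); the formula itself is an assumption, not taken from the slides.

```python
def surrogate_keys(last_generated_key: int, partition: int, n_partitions: int, n_records: int) -> list[int]:
    """Keys for one partition, interleaved so that partitions never collide.

    seq stands in for next_in_sequence() (assumed to start at 1),
    partition for this_partition(), n_partitions for number_of_partitions().
    """
    return [
        last_generated_key + (seq - 1) * n_partitions + partition + 1
        for seq in range(1, n_records + 1)
    ]


all_keys = [k for p in range(3) for k in surrogate_keys(100, p, 3, 2)]
print(sorted(all_keys))
```

Each partition takes every n-th number, starting just after last_generated_key, so no coordination between partitions is needed.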
Deadlock
◦ Occurs when the graph stops progressing because of a mutual dependency of data among components
◦ Identified when the record counts do NOT change over a period of time
◦ Phasing
◦ Checkpointing
◦ Flow Buffering
64
Confidential
Ab Initio
Blocking on read
Blocking on write
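A toy Python model of the deadlock described above; it does not reflect how the Co>Operating System schedules flows. A hypothetical replicate writes each record to two bounded flows, and a hypothetical sort-like consumer must drain flow B completely before it reads flow A. With small buffers the writer blocks on write while the consumer blocks on read; a buffer at least as large as the input (standing in for a buffered flow that can spill to disk) lets the run finish.

```python
from collections import deque


def simulate(n_records: int, buffer_capacity: int) -> str:
    flow_a: deque[int] = deque()
    flow_b: deque[int] = deque()
    written = consumed_a = consumed_b = 0
    while consumed_a < n_records:
        progressed = False
        # Replicate: writes a record to both flows only when both have room (else blocked on write).
        if written < n_records and len(flow_a) < buffer_capacity and len(flow_b) < buffer_capacity:
            flow_a.append(written)
            flow_b.append(written)
            written += 1
            progressed = True
        # Consumer: must read all of flow_b before touching flow_a (else blocked on read).
        if consumed_b < n_records:
            if flow_b:
                flow_b.popleft()
                consumed_b += 1
                progressed = True
        elif flow_a:
            flow_a.popleft()
            consumed_a += 1
            progressed = True
        if not progressed:
            return "deadlock"
    return "completed"


print(simulate(5, buffer_capacity=2))  # small buffers
print(simulate(5, buffer_capacity=5))  # buffer holds the whole input
```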
Ab Initio >
Performance
◦ Source/Target files
◦ Temporary files
Phases
Checkpoints
Buffered Flows
datasets
phases
checkpoints
◦ "spilling" to disk
sort
Memory: Consumers
◦ Lookup Tables
◦ In-memory components
Join
Memory: max_core
◦ Exceeding max-core
◦ Issues
… separately
◦ If multiple graphs require the same database table, unload it to a file once and replicate it.
Performance Enhancements:
◦ Memory usage:
Performance Enhancements
of data to disks
Purpose:
◦ At job start, output datasets are copied to temporary files (in .WORK directories)
temporary files
◦ Same as phase
m_rollback -d graphname.rec
WARNING: rolling back old .rec files restores the output files to their old state
◦ Vertex: component
◦ Records: # processed
◦ Vertex: component
◦ Port: port of the component
◦ Compare open vs. closed partitions: processing is serialized when some partitions remain open long after …
Avoid Sorts
Use Lookups
Phasing
THANK YOU