BODI Training v1.1
Audience
Application Developers, Consultants, and Database Administrators working on data extraction, data warehousing, or data integration.
Assumptions
You understand your source data systems, RDBMS, business intelligence, and e-commerce messaging concepts. You are familiar with SQL (Structured Query Language). You are familiar with Microsoft Windows or UNIX platforms, so that you can use Data Integrator effectively.
Business Objects Data Integration Platform
The Data Integration Platform consists of:
Data Integrator: data movement and management server
Rapid Marts: suite of packaged data caches for speedy delivery and integration of data
Account Payable ----> FI-Finance
Account Receivable ----> FI-Finance
Cost Center ----> CO-Controlling
Human Resources ----> HR-Human Resources
Inventory ----> MM-Materials Management
Plant Maintenance ----> PM-Plant Maintenance
Production Planning ----> PP-Production Planning
Project Systems ----> PS-Project Systems
Purchasing ----> SD-Sales and Distribution
Sales ----> SD-Sales and Distribution
Data Integrator
DI is a data movement and integration platform
Data Integrator Components
DI Job Server
The DI Job Server starts the data movement engine that integrates data from multiple heterogeneous sources, performs complex data transformations, and manages extractions and transactions. It can move data in either batch or real-time mode and uses distributed query optimization, multithreading, in-memory caching, in-memory data transformations, and parallel pipelining to deliver high data throughput and scalability.
Data Integrator Components
DI Engine
When DI jobs are executed, the Job Server starts DI engine processes to perform data extraction, transformation, and movement. DI engine processes use parallel pipelining and in-memory data transformations to deliver high data throughput and scalability.
Data Integrator Components
DI Designer
The DI Designer is a development tool with a graphical user interface for defining data management applications, which consist of data mappings, transformations, and control logic. It enables developers to create objects, then drag, drop, and configure them by selecting icons in flow diagrams, table layouts, and nested workspace pages.
DI Repository
The DI Repository is a set of tables that holds user-created and predefined system objects, source and target metadata, and transformation rules. It is set up on an open client/server platform to facilitate the sharing of metadata with other enterprise tools. Each repository is stored on an existing RDBMS and is associated with one or more DI Job Servers.
Data Integrator Components
DI Service
The DI Service is installed when DI Job and Access Servers are installed. The DI Service starts Job Servers and Access Servers when you reboot your system. The Windows service name is DATA INTEGRATOR Service. The UNIX equivalent is a daemon named AL_JobService.
Data Integrator Components
DI SNMP Agent
DI error events can be communicated using SNMP-supported applications for better error monitoring. The DI SNMP agent monitors and records information about the Job Servers and jobs running on the computer where the agent is installed.
Single-use objects
Objects that are defined only within the context of a single job or data flow, e.g. scripts.
Projects
A project is a reusable object that allows you to group jobs. It is the highest level of organization offered by DI and is used to group jobs that have schedules that depend on one another or that you want to monitor together. Only one project can be open at a time, and projects cannot be shared among multiple users.
Jobs
A job is the only object that is executed. The following objects can be included in a job definition:
Data flows
Transforms
Work flows
Scripts
Conditionals
While loops
Try/catch blocks
Datastores
Datastores represent connections between DI and databases or applications, directly or through adapters. They allow DI to access metadata from a database or application and hence to read from or write to that database or application. DI datastores can connect to:
Databases and mainframe file systems
Applications that have pre-packaged or user-written DI adapters
SAP R/3, SAP BW, PeopleSoft, J.D. Edwards One World, and J.D. Edwards World
File Formats
DI can use data stored in files for data sources or data targets. File format objects can describe files in:
Delimited format: characters such as commas or tabs separate each field
Fixed-width format: the column width is specified by the user
SAP R/3 format
Data Flows
Data flows extract, transform, and load data; reading sources, transforming data, and loading targets all occur inside a data flow. A data flow can be added to a job or a work flow. From inside a work flow, a data flow can send and receive information to and from other objects through input and output parameters.
[Diagram: a data flow receives input parameters, reads from one or more sources, writes to one or more targets, and returns output parameters]
Work Flows
A work flow defines the decision-making process for executing data flows. The purpose of a work flow is to prepare for executing data flows and to set the state of the system after the data flows are complete. The following objects can be elements in work flows:
Work flows
Data flows
Scripts
Conditionals
While loops
Try/catch blocks
[Diagram: a work flow sequences control operations before and after a data flow]
Conditionals
Conditionals are single-use objects used to implement if/then/else logic in a work flow. To define a conditional, you specify a condition and two logical branches:
If: a Boolean expression that evaluates to TRUE or FALSE. You can use functions, variables, and standard operators to construct the expression.
Then: work flow elements to execute if the If expression evaluates to TRUE.
Else (optional): work flow elements to execute if the If expression evaluates to FALSE.
Conditionals
[Diagram: a conditional inside a work flow. The If expression tests whether a process was successful; if TRUE, the Then branch runs a work flow, and if FALSE, the Else branch sends an e-mail.]
While Loops
The while loop is a single-use object that you can use in a work flow. The while loop repeats a sequence of steps as long as a condition is true.
Try / Catch Blocks
A try/catch block is a combination of one try object and one or more catch objects that allow you to specify alternative work flows if errors occur while DI is executing a job. Try/catch blocks:
Catch classes of exceptions thrown by DI, the DBMS, or the operating system
Apply solutions that you provide
Continue execution
Scripts
Scripts are single-use objects used to call functions and assign values to variables in a work flow. A script can contain the following statements:
Function calls
If statements
While statements
Assignment statements
Operators
Variables
Global Variables
Global variables are global within a job. Once a name for a global variable is used in a job, that name becomes reserved for that job. Global variables are exclusive within the context of the job in which they are created. Setting parameters is not necessary when you use global variables.
Parameters
Parameters are expressions passed to a work flow or data flow when the work flow or data flow is called. Parameters can be defined to pass values into and out of work flows, data flows, and custom functions.
Transforms
The following transforms are available from the object library on the Transforms tab.
Case
Effective_Date
History_Preserving
Map_Operation
Pivot (Columns to Rows)
Reverse Pivot (Rows to Columns)
SQL
Date_Generation
Hierarchy_Flattening
Key_Generation
Merge
Query
Row_Generation
Table_Comparison
Query Transform
Retrieves a data set that satisfies conditions that you specify. A query transform is similar to a SQL SELECT statement.
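As a rough illustration (the table and column names below are hypothetical, not taken from DI), the work done in a Query transform corresponds to a SQL SELECT with column mappings, a join, a filter, and ordering:

    SELECT cust.cust_id,
           UPPER(cust.cust_name)   AS cust_name,      -- column mapping with a function
           ord.order_id,
           ord.order_amount
    FROM   customer cust
    JOIN   sales_order ord
           ON ord.cust_id = cust.cust_id              -- join condition
    WHERE  ord.order_date >= DATE '2010-01-01'        -- filter condition
    ORDER  BY ord.order_date;                         -- ordering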
[Screenshot: the Query transform editor, showing the input schema, output schema, and options areas]
Case Transform
Specifies multiple paths in a single transform (different rows are processed in different ways). Simplifies branch logic in data flows by consolidating case or decision-making logic in one transform. Paths are defined in an expression table.
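As an analogy (hypothetical table and path labels), each expression in the Case transform's expression table behaves like a WHERE clause that routes matching rows to its own output path:

    -- Path "Region_North":
    SELECT * FROM sales_order WHERE region = 'NORTH';
    -- Path "Region_South":
    SELECT * FROM sales_order WHERE region = 'SOUTH';
    -- Default path (rows that match no expression):
    SELECT * FROM sales_order WHERE region IS NULL OR region NOT IN ('NORTH', 'SOUTH');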
SQL Transform
Performs the indicated SQL query operation. Use this transform to perform standard SQL operations for things that cannot be performed using other built-in transforms.
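For example, an aggregation that is awkward to build with other transforms could be supplied to the SQL transform as a statement such as the following (hypothetical table and columns):

    SELECT   region,
             COUNT(*)          AS order_count,
             SUM(order_amount) AS total_amount
    FROM     sales_order
    GROUP BY region;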
Merge Transform
Combines incoming data sets, producing a single output data set with the same schema as the input data sets.
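In SQL terms, the effect is much like a UNION ALL of inputs that share the same schema (hypothetical tables shown; duplicate rows are kept):

    SELECT cust_id, cust_name, region FROM customer_north
    UNION ALL
    SELECT cust_id, cust_name, region FROM customer_south;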
Row_Gen Transform
Produces a data set with a single column. The column values start from zero and increment by one to a specified number of rows.
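An equivalent result set can be sketched in plain SQL with a recursive common table expression (syntax varies by DBMS; the example below produces 10 rows, 0 through 9):

    WITH RECURSIVE row_gen (num) AS (
        SELECT 0
        UNION ALL
        SELECT num + 1 FROM row_gen WHERE num < 9
    )
    SELECT num FROM row_gen;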
Key_Generation Transform
Generates new keys for new rows in a data set. The Key_Generation transform looks up the maximum existing key value from a table and uses it as the starting value to generate new keys.
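The logic can be pictured with the SQL sketch below (hypothetical staging table customer_stage and target dimension customer_dim with surrogate key cust_key): new keys continue from the current maximum key value in the target.

    INSERT INTO customer_dim (cust_key, cust_id, cust_name)
    SELECT (SELECT COALESCE(MAX(cust_key), 0) FROM customer_dim)
             + ROW_NUMBER() OVER (ORDER BY s.cust_id),   -- next key = current max + n
           s.cust_id,
           s.cust_name
    FROM   customer_stage s;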
Date_Generation Transform
Produces a series of dates incremented as you specify. Use this transform to produce the key values for a time dimension target. From this generated sequence you can populate other fields in the time dimension (such as day_of_week) using functions in a query.
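A comparable date series can be sketched in SQL with a recursive common table expression (syntax and date functions vary by DBMS; the day-of-week expression below is PostgreSQL-style and only illustrative):

    WITH RECURSIVE dates (calendar_date) AS (
        SELECT DATE '2011-01-01'
        UNION ALL
        SELECT CAST(calendar_date + INTERVAL '1' DAY AS DATE)
        FROM   dates
        WHERE  calendar_date < DATE '2011-12-31'
    )
    SELECT calendar_date,
           EXTRACT(DOW FROM calendar_date) AS day_of_week   -- derived field, as a query would add in DI
    FROM   dates;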
Table_Comparison Transform
Compares two data sets and produces the difference between them as a data set with rows flagged as INSERT or UPDATE. The Table_Comparison transform allows you to detect and forward changes that have occurred since the last time a target was updated.
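The comparison can be pictured with the following SQL sketch (hypothetical tables customer_stage and customer_dim): rows with no match in the target correspond to INSERT, and matched rows whose compared columns differ correspond to UPDATE.

    SELECT s.cust_id, s.cust_name, s.region, 'INSERT' AS op_code
    FROM   customer_stage s
    LEFT JOIN customer_dim d ON d.cust_id = s.cust_id
    WHERE  d.cust_id IS NULL
    UNION ALL
    SELECT s.cust_id, s.cust_name, s.region, 'UPDATE' AS op_code
    FROM   customer_stage s
    JOIN   customer_dim d ON d.cust_id = s.cust_id
    WHERE  s.cust_name <> d.cust_name OR s.region <> d.region;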
Map_Operation Transform
Allows conversions between data manipulation operations. The Map_Operation transform allows you to change operation codes on data sets to produce the desired output. For example, if a row in the input data set has been updated in some previous operation in the data flow, you can use this transform to map the UPDATE operation to an INSERT. The result could be to convert UPDATE rows to INSERT rows to preserve the existing row in the target.
History_Preserving Transform
The History_Preserving transform allows you to produce a new row in your target rather than updating an existing row. You can indicate in which columns the transform identifies changes to be preserved. If the value of any of those columns changes, this transform creates a new row for each row flagged as UPDATE in the input data set.
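The effect resembles the slowly-changing-dimension sketch below (hypothetical tables and columns): instead of updating the existing dimension row, a new row version is inserted when a preserved column has changed.

    INSERT INTO customer_dim (cust_id, cust_name, region, valid_from)
    SELECT s.cust_id, s.cust_name, s.region, CURRENT_DATE
    FROM   customer_stage s
    JOIN   customer_dim d ON d.cust_id = s.cust_id
    WHERE  s.region <> d.region;      -- region is the column whose history is preserved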
Functions
Functions operate on single values, such as values in specific columns in a data set. You can use functions in the following operations:
Queries
Scripts
Conditionals
You can use:
Built-in functions (DI functions)
Custom functions (user-defined functions)
Database and application functions (functions specific to the DBMS)
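For instance, a database function can be applied to a column inside a query, much like the following SQL fragment (hypothetical table and columns; function names vary by DBMS):

    SELECT UPPER(cust_name)     AS cust_name_upper,   -- database string function
           SUBSTR(phone, 1, 3)  AS area_code          -- database substring function
    FROM   customer;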
Procedures
DI supports the use of stored procedures for Oracle, Microsoft SQL Server, Sybase, and DB2 databases. You can call stored procedures from the jobs you create and run in DI.
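A minimal sketch of such a stored procedure, in rough SQL/PSM style (procedural syntax differs across Oracle, SQL Server, Sybase, and DB2; all names here are hypothetical):

    CREATE PROCEDURE get_order_count (IN p_region VARCHAR(10), OUT p_count INTEGER)
    BEGIN
        -- Count the orders for the requested region and return the value via the OUT parameter
        SET p_count = (SELECT COUNT(*) FROM sales_order WHERE region = p_region);
    END

A job can then call the procedure, supplying the input parameter and capturing the output value.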
Debugging
Execute a job in the Data Scan mode
View and analyze the output data in the Data Scan window
Compare and analyze different data samples
[Screenshot: the Data Scan window, divided into a schema area and a data area]
Design Repository ----> Test Repository ----> Production Repository
When moving objects from one phase to another, export jobs from your source repository to either a file or a database, then import them into your target repository.
You can export objects from the current repository to another repository. However, the other repository must be the same version as the current one. The export process allows you to change environment-specific information defined in datastores and file formats to match the new environment.
Exporting/Importing Objects to/from a File
You can also export objects to a file. If you choose a file as the export destination, DI does not provide options to change environment-specific information. Importing objects or an entire repository from a file overwrites existing objects with the same names in the destination repository. You must restart DI after the import process completes.
Parallel Execution
The maximum number of parallel DI engine processes can be set in the Job Server options (Tools > Options > Job Server > Environment). This helps in running transforms in parallel.