
Debug and Development Stages

The document discusses several stages used for debugging and development in data integration jobs, including row generators, column generators, peek stages, sample stages, head stages, and tail stages. It also covers normal and sparse lookup stages, differences between lookup, join, and merge stages, and common data source and target stages like sequential files, data sets, file sets, complex flat files, external sources, and external targets. Finally, it summarizes different types of data schemas including star schemas, snowflake schemas, galaxy schemas, and fact constellation schemas.

Uploaded by Ankur Virmani
Copyright © Attribution Non-Commercial (BY-NC)

Debug and development stages

Row Generator: produces a set of test data that fits the specified metadata (values can be random or cycled through a specified list). Useful for testing and development.
Column Generator: adds one or more columns to the incoming flow and generates test data for those columns.
Peek Stage: prints record column values to the job log, which can be viewed in Director. It can have a single input link and multiple output links.
Sample Stage: samples an input data set. Operates in two modes: percent mode and period mode.
Head: selects the first N rows from each partition of an input data set and copies them to an output data set.
Tail: is similar to the Head stage; it selects the last N rows from each partition.
Write Range Map: writes a data set in a form usable by the range partitioning method.
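To make the Sample stage's two modes concrete, here is a minimal Python sketch (the function names are illustrative, not DataStage APIs): period mode keeps every Nth row, while percent mode keeps each row with roughly the given probability.

```python
import random

def sample_period(rows, period):
    # Period mode: keep every Nth row (positions period, 2*period, ...).
    return [row for i, row in enumerate(rows, start=1) if i % period == 0]

def sample_percent(rows, percent, seed=0):
    # Percent mode: each row is kept with probability percent/100.
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < percent / 100.0]

data = list(range(1, 11))
print(sample_period(data, 3))  # [3, 6, 9]
print(sample_percent(data, 50))  # roughly half of the rows survive
```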

Normal and sparse lookup


Normal Lookup: whenever the job needs to perform a lookup, it loads the reference (target) table data into a memory buffer and can look up multiple records against that buffer at a time. Sparse Lookup: instead of buffering the reference data, it fires an SQL query against the database for each incoming record, one by one. Sparse lookup is appropriate when the reference table is much larger than the source data, and the reference must be a database table.

Sparse lookup sends an individual SQL statement for every incoming row (if the stream data is huge, you can imagine how many times it has to hit the database, and hence the performance cost). It can be used when you want to get the next sequence number from your database (again, an expensive overhead, as noted before). Also note that sparse lookup is only available for DB2 and Oracle.

Normal lookup can perform poorly when the reference data is huge, as it has to load that large data set into memory.
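The trade-off can be sketched in Python using sqlite3 as a stand-in for the DB2/Oracle reference table (the table, `normal_lookup`, and `sparse_lookup` are invented for this illustration, not DataStage APIs):

```python
import sqlite3

# Toy reference table standing in for the lookup target database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ref (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO ref VALUES (?, ?)",
                 [(1, "alpha"), (2, "beta"), (3, "gamma")])

def normal_lookup(stream, conn):
    # Normal lookup: pull the whole reference table into memory once,
    # then probe the in-memory buffer for every stream row.
    buffer = dict(conn.execute("SELECT id, name FROM ref"))
    return [(key, buffer.get(key)) for key in stream]

def sparse_lookup(stream, conn):
    # Sparse lookup: fire one SQL statement per incoming row.
    out = []
    for key in stream:
        hit = conn.execute("SELECT name FROM ref WHERE id = ?",
                           (key,)).fetchone()
        out.append((key, hit[0] if hit else None))
    return out

stream = [2, 3, 9]
print(normal_lookup(stream, conn))  # [(2, 'beta'), (3, 'gamma'), (9, None)]
print(sparse_lookup(stream, conn))  # same result, but one query per row
```

Both return the same answer; the difference is one bulk read versus one query per stream row, which is exactly why sparse lookup hurts when the stream is large.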

Difference b/w Lookup, Join and Merge


If the reference data sets are big enough to cause trouble, use a Join; otherwise use a Lookup stage. Lookup is used for smaller amounts of reference data because it takes the data from the source and stores it in a buffer, so every record is processed against that buffer.

Unlike Join and Lookup stages, the Merge stage allows you to specify several reject links, one for each update input link. Merge is also suited to large amounts of data. Merge has the same number of reject links as there are update input links.
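The Merge stage's reject behaviour can be sketched as follows (a simplified illustration keyed on a single column named "key", not the DataStage engine itself): update rows that match a master row are folded in, and each update link collects its own rejects.

```python
def merge(master, update_links):
    # Master rows keyed by "key"; each update link gets its own reject
    # list holding the update rows that match no master row.
    merged = {row["key"]: dict(row) for row in master}
    rejects = []                     # one reject list per update link
    for link in update_links:
        link_rejects = []
        for row in link:
            if row["key"] in merged:
                merged[row["key"]].update(row)  # fold update columns in
            else:
                link_rejects.append(row)        # unmatched -> reject link
        rejects.append(link_rejects)
    return list(merged.values()), rejects

master = [{"key": 1, "a": 10}, {"key": 2, "a": 20}]
updates = [[{"key": 1, "b": 99}, {"key": 7, "b": 7}]]
rows, rejected = merge(master, updates)
print(rows)      # [{'key': 1, 'a': 10, 'b': 99}, {'key': 2, 'a': 20}]
print(rejected)  # [[{'key': 7, 'b': 7}]]
```

Note that `rejects` has exactly one entry per update link, mirroring the one-reject-link-per-update-link rule above.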

Routines
Routines are stored in the Routines branch of the DataStage Repository, where you can create, view, or edit them using the Routine dialog box.

1) Transform functions
2) Before-after job subroutines
3) Job Control routines

Sequencers

A sequencer allows you to synchronize the control flow of multiple activities in a job sequence. It can have multiple input triggers as well as multiple output triggers. The sequencer operates in two modes:
ALL mode: all of the inputs to the sequencer must be TRUE for any of the sequencer outputs to fire.
ANY mode: output triggers can be fired if any of the sequencer inputs are TRUE.
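The two modes map directly onto Python's built-in all() and any(); a minimal sketch (the function name is invented for illustration):

```python
def sequencer(input_triggers, mode="ALL"):
    # ALL mode: every input trigger must be TRUE before outputs fire.
    # ANY mode: outputs fire as soon as any input trigger is TRUE.
    if mode == "ALL":
        return all(input_triggers)
    if mode == "ANY":
        return any(input_triggers)
    raise ValueError("mode must be 'ALL' or 'ANY'")

print(sequencer([True, True, False], mode="ALL"))  # False
print(sequencer([True, True, False], mode="ANY"))  # True
```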

Sequential File: is used to read data from or write data to one or more flat (sequential) files.
Data Set Stage: allows users to read data from or write data to a data set. Data sets are operating system files, each of which has a control file (.ds extension by default) and one or more data files (unreadable by other applications).
File Set Stage: allows users to read data from or write data to a file set. File sets are operating system files, each of which has a control file (.fs extension) and data files. Unlike data sets, file sets preserve formatting and are readable by other applications.
Complex Flat File: allows reading from complex file structures on a mainframe machine, such as MVS data sets, header-and-trailer structured files, files that contain multiple record types, and QSAM and VSAM files.

External Source: permits reading data that is output from multiple source programs.
External Target: permits writing data to one or more programs.
Lookup File Set Stage: is similar to the File Set stage. It is a partitioned hashed file which can be used for lookups.

Types of schema

Star Schema: a star schema is one in which a central fact table is surrounded by denormalized dimension tables. A star schema can be simple or complex: a simple star schema consists of one fact table, whereas a complex star schema has more than one fact table.

Snowflake Schema: a snowflake schema is an enhancement of the star schema in which dimensions are normalized into additional, related dimension tables. Snowflake schemas are useful when there are low-cardinality attributes in the dimensions.

Galaxy Schema: Galaxy schema contains many fact tables with some common dimensions (conformed dimensions). This schema is a combination of many data marts.

Fact Constellation Schema: the dimensions in this schema are segregated into independent dimensions based on the levels of hierarchy. For example, if geography has five levels of hierarchy, such as territory, region, country, state, and city, a constellation schema would have five dimensions instead of one.
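As a concrete illustration of a simple star schema, here is a small sqlite3 example (the table and column names are invented for the example): one central fact table joined to a single denormalized dimension table, queried the way star schemas typically are.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One denormalized dimension table (star style: city/state/country all in
# one table rather than normalized into separate snowflake tables).
conn.execute("""CREATE TABLE dim_geography (
    geo_id INTEGER PRIMARY KEY, city TEXT, state TEXT, country TEXT)""")
# Central fact table referencing the dimension by surrogate key.
conn.execute("""CREATE TABLE fact_sales (
    sale_id INTEGER PRIMARY KEY, geo_id INTEGER, amount REAL,
    FOREIGN KEY (geo_id) REFERENCES dim_geography (geo_id))""")
conn.executemany("INSERT INTO dim_geography VALUES (?, ?, ?, ?)",
                 [(1, "Pune", "Maharashtra", "India"),
                  (2, "Austin", "Texas", "USA")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0)])

# Typical star-schema query: aggregate facts grouped by a dimension attribute.
rows = conn.execute("""
    SELECT d.country, SUM(f.amount)
    FROM fact_sales f JOIN dim_geography d ON f.geo_id = d.geo_id
    GROUP BY d.country ORDER BY d.country""").fetchall()
print(rows)  # [('India', 150.0), ('USA', 75.0)]
```

In a snowflake variant, `dim_geography` would instead be split into linked city, state, and country tables.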
