Debug and Development Stages
Row Generator : produces a set of test data that fits the specified metadata (values can be random or cycled through a specified list). Useful for testing and development.
Column Generator : adds one or more columns to the incoming flow and generates test data for them.
Peek Stage : prints record column values to the job log, which can be viewed in Director. It can have a single input link and multiple output links.
Sample Stage : samples an input data set. It operates in two modes: percent mode and period mode.
Head : selects the first N rows from each partition of an input data set and copies them to an output data set.
Tail : is similar to the Head stage. It selects the last N rows from each partition.
Write Range Map : writes a data set in a form usable by the range partitioning method.
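The per-partition behaviour of Head, Tail and Sample can be illustrated with a small Python sketch. This is not DataStage code: the function names and the list-of-lists model of partitions are hypothetical, and the sketch assumes that period mode keeps every Nth row of a partition and that percent mode keeps each row independently with the given probability.

```python
import random
from typing import List, Sequence

Partition = List[int]  # hypothetical model: a partition is a list of rows


def head(partitions: Sequence[Partition], n: int) -> List[Partition]:
    """Keep the first n rows of each partition."""
    return [list(p[:n]) for p in partitions]


def tail(partitions: Sequence[Partition], n: int) -> List[Partition]:
    """Keep the last n rows of each partition."""
    return [list(p[-n:]) if n > 0 else [] for p in partitions]


def sample_period(partitions: Sequence[Partition], period: int) -> List[Partition]:
    """Period mode (assumed semantics): keep every period-th row of each partition."""
    if period < 1:
        raise ValueError("period must be at least 1")
    return [list(p[period - 1::period]) for p in partitions]


def sample_percent(partitions: Sequence[Partition], percent: float, seed: int = 0) -> List[Partition]:
    """Percent mode (assumed semantics): keep each row with probability percent/100."""
    rng = random.Random(seed)
    return [[row for row in p if rng.random() * 100 < percent] for p in partitions]


partitions = [[1, 2, 3, 4, 5], [10, 20, 30]]
print(head(partitions, 2))           # [[1, 2], [10, 20]]
print(tail(partitions, 2))           # [[4, 5], [20, 30]]
print(sample_period(partitions, 2))  # [[2, 4], [20]]
```

Because each function works partition by partition, a job running on several partitions returns up to N rows per partition for Head and Tail, not N rows in total.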
Sparse lookup sends an individual SQL statement to the database for every incoming row, so a large input stream means a correspondingly large number of database hits and poor performance. It can be used when you want to get the next sequence number from your database, although this carries the same per-row overhead. Also note that sparse lookup is only available for DB2 and Oracle.
Normal lookup might perform poorly when the reference data is huge, as it has to load all of that data into memory.
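The trade-off between the two lookup types can be sketched in Python, using an in-memory SQLite database as a stand-in for the reference database. The table name, column names and function names are hypothetical, and this only models the access pattern, not how DataStage implements lookups.

```python
import sqlite3


def build_reference_db() -> sqlite3.Connection:
    """Create a tiny stand-in reference table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Ann"), (2, "Bob"), (3, "Cho")],
    )
    return conn


def sparse_lookup(conn, stream_keys):
    """Sparse style: one SQL statement per incoming row."""
    results, queries = [], 0
    for key in stream_keys:
        row = conn.execute(
            "SELECT name FROM customers WHERE id = ?", (key,)
        ).fetchone()
        queries += 1
        results.append((key, row[0] if row else None))
    return results, queries


def normal_lookup(conn, stream_keys):
    """Normal style: load the whole reference table into memory once."""
    reference = dict(conn.execute("SELECT id, name FROM customers").fetchall())
    queries = 1
    results = [(key, reference.get(key)) for key in stream_keys]
    return results, queries


conn = build_reference_db()
stream = [2, 3, 2, 9]
print(sparse_lookup(conn, stream))  # 4 queries, one per input row
print(normal_lookup(conn, stream))  # 1 query, but the full table sits in memory
```

Both functions return the same matches. The sparse version pays in database round trips as the stream grows, while the normal version pays in memory as the reference table grows.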
Unlike the Join and Lookup stages, the Merge stage allows you to specify several reject links. It can have the same number of reject links as there are update input links. Merge is also suited to huge amounts of data.
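The idea of one reject link per update input can be sketched as follows. This is a simplified Python model, not the Merge stage itself: the `merge` function and the dictionary layout are hypothetical, and the sketch ignores sorting and the stage's other options. It assumes that an update row whose key has no matching master row goes to the reject output for its own update link.

```python
def merge(master, updates):
    """Merge update inputs into the master rows.

    master:  dict mapping key -> dict of columns
    updates: list of dicts (one per update input), each mapping key -> columns
    Returns (merged rows, list of rejects with one entry per update input).
    """
    merged = {key: dict(cols) for key, cols in master.items()}
    rejects = []
    for update in updates:
        rejected = {}
        for key, cols in update.items():
            if key in merged:
                merged[key].update(cols)  # matched: add the update columns
            else:
                rejected[key] = cols      # unmatched: goes to this link's rejects
        rejects.append(rejected)
    return merged, rejects


master = {1: {"name": "Ann"}, 2: {"name": "Bob"}}
updates = [
    {1: {"city": "Oslo"}, 3: {"city": "Rome"}},  # key 3 has no master row
    {2: {"age": 40}},
]
merged, rejects = merge(master, updates)
print(merged)   # {1: {'name': 'Ann', 'city': 'Oslo'}, 2: {'name': 'Bob', 'age': 40}}
print(rejects)  # [{3: {'city': 'Rome'}}, {}]
```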
Routines
Routines are stored in the Routines branch of the DataStage Repository, where you can create, view, or edit them using the Routine dialog box.
Sequencers
A sequencer allows you to synchronize the control flow of multiple activities in a job sequence. It can have multiple input triggers as well as multiple output triggers. The sequencer operates in two modes:
ALL mode : all of the inputs to the sequencer must be TRUE for any of the sequencer outputs to fire.
ANY mode : output triggers can be fired if any of the sequencer inputs is TRUE.
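The two sequencer modes amount to a logical AND and a logical OR over the input triggers. A minimal Python sketch, with a hypothetical function name and with each input trigger modelled as a boolean:

```python
from typing import Sequence


def sequencer_fires(inputs: Sequence[bool], mode: str) -> bool:
    """Return True if the sequencer's output triggers may fire."""
    if not inputs:
        raise ValueError("a sequencer needs at least one input trigger")
    if mode == "ALL":
        return all(inputs)  # every input must be TRUE
    if mode == "ANY":
        return any(inputs)  # one TRUE input is enough
    raise ValueError(f"unknown mode: {mode!r}")


print(sequencer_fires([True, False, True], "ALL"))  # False
print(sequencer_fires([True, False, True], "ANY"))  # True
print(sequencer_fires([True, True], "ALL"))         # True
```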
Sequential File : is used to read data from or write data to one or more flat (sequential) files.
Data Set Stage : allows users to read data from or write data to a dataset. Datasets are operating system files, each of which has a control file (.ds extension by default) and one or more data files (unreadable by other applications).
File Set Stage : allows users to read data from or write data to a fileset. Filesets are operating system files, each of which has a control file (.fs extension) and data files. Unlike datasets, filesets preserve formatting and are readable by other applications.
Complex Flat File : allows reading from complex file structures on a mainframe machine, such as MVS data sets, header and trailer structured files, files that contain multiple record types, QSAM and VSAM files.
External Source : permits reading data that is output from one or more source programs.
External Target : permits writing data to one or more programs.
Lookup File Set Stage : is similar to the File Set stage. It is a partitioned hashed file which can be used for lookups.
Types of schema
Star Schema: A star schema is one in which a central fact table is surrounded by denormalized dimension tables. A star schema can be simple or complex. A simple star schema consists of one fact table, whereas a complex star schema has more than one fact table.
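A simple star schema can be sketched with Python's built-in sqlite3 module. The table and column names (fact_sales, dim_product, dim_store) and the sample rows are hypothetical. The point is only the shape: one fact table holding keys and measures, joined directly to each denormalized dimension table.

```python
import sqlite3


def sales_by_category_and_city():
    """Build a tiny star schema and aggregate the fact table by two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
        CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
        -- The fact table holds foreign keys to each dimension plus the measure.
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            store_id   INTEGER REFERENCES dim_store(store_id),
            amount     REAL
        );
        INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
        INSERT INTO dim_store   VALUES (1, 'Oslo'), (2, 'Rome');
        INSERT INTO fact_sales  VALUES (1, 1, 5.0), (1, 2, 7.0), (2, 1, 30.0);
        """
    )
    rows = conn.execute(
        """
        SELECT p.category, s.city, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_id = f.product_id
        JOIN dim_store   s ON s.store_id   = f.store_id
        GROUP BY p.category, s.city
        ORDER BY p.category, s.city
        """
    ).fetchall()
    conn.close()
    return rows


print(sales_by_category_and_city())
# [('Home', 'Oslo', 30.0), ('Stationery', 'Oslo', 5.0), ('Stationery', 'Rome', 7.0)]
```

Every dimension is one join away from the fact table, which is what gives the schema its star shape.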
Snowflake Schema: A snowflake schema extends the star schema by normalizing its dimensions into additional dimension tables. Snowflake schemas are useful when there are low-cardinality attributes in the dimensions.
Galaxy Schema: Galaxy schema contains many fact tables with some common dimensions (conformed dimensions). This schema is a combination of many data marts.
Fact Constellation Schema: The dimensions in this schema are segregated into independent dimensions based on the levels of hierarchy. For example, if geography has five levels of hierarchy, such as territory, region, country, state and city, a constellation schema would have five dimensions instead of one.