DataStage Answers
3. Data Marts
5. Definitions
Data Warehouse (DWH) – A centralized repository for structured data
used for reporting and analytics.
ETL (Extract, Transform, Load) – A process for moving data from multiple
sources into a warehouse.
Fact Table – Stores measurable business events (e.g., sales, revenue).
Dimension Table – Stores descriptive attributes (e.g., product details,
customer info).
Data Mart – A subset of a data warehouse focused on a specific
business function.
OLAP (Online Analytical Processing) – Enables fast multi-dimensional
data analysis.
Schema – Defines the structure of the data warehouse (Star, Snowflake,
etc.).
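SCD (Slowly Changing Dimensions) – Techniques for handling changes to
dimension attributes over time: Type 1 overwrites the old value, Type 2 adds a
new row to keep full history, Type 3 keeps the previous value in an extra column.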
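🔹 Sketch (Python, illustrative only): one way to picture SCD Type 2 is "close the
current row, then add a new current row". The column names (customer_id, city,
valid_from, valid_to, is_current) and the function apply_scd2 are hypothetical,
chosen only for this example; they are not DataStage names.

```python
from datetime import date

def apply_scd2(dimension, key, new_attrs, change_date):
    """Close the current row for `key` if its attributes changed, then append a new current row."""
    for row in dimension:
        if row["customer_id"] == key and row["is_current"]:
            if all(row[k] == v for k, v in new_attrs.items()):
                return dimension  # nothing changed, history stays as it is
            row["valid_to"] = change_date   # close the old version
            row["is_current"] = False
    # Add the new version as the current row
    dimension.append({"customer_id": key, **new_attrs,
                      "valid_from": change_date, "valid_to": None, "is_current": True})
    return dimension

dim = [{"customer_id": 1, "city": "Pune", "valid_from": date(2023, 1, 1),
        "valid_to": None, "is_current": True}]
apply_scd2(dim, 1, {"city": "Mumbai"}, date(2024, 6, 1))
```

After the call, the dimension holds two rows for customer 1: the closed "Pune" row
and the current "Mumbai" row.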
6. ETL Process
Steps in ETL:
1. Extract – Retrieve data from sources (Databases, APIs, Files).
2. Transform – Clean, deduplicate, aggregate, and format the data.
3. Load – Store the cleaned data in the warehouse.
Example ETL Process for an E-Commerce Company:
Extract: Pull order data from MySQL, Salesforce, and Google Analytics.
Transform: Standardize product names, remove duplicates, calculate
total revenue.
Load: Store the processed data in Amazon Redshift.
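🔹 Sketch (Python, illustrative only): the three steps above can be pictured as
three small functions. The sample rows are hard-coded stand-ins for the source
systems, and a plain list stands in for the warehouse table; none of this connects
to MySQL, Salesforce, or Redshift.

```python
def extract():
    # Stand-in for pulling orders from the source systems (hypothetical rows)
    return [
        {"order_id": 1, "product": " laptop ", "price": 1000.0, "qty": 1},
        {"order_id": 1, "product": " laptop ", "price": 1000.0, "qty": 1},  # duplicate
        {"order_id": 2, "product": "mouse", "price": 20.0, "qty": 2},
    ]

def transform(rows):
    seen, cleaned = set(), []
    for row in rows:
        if row["order_id"] in seen:
            continue  # remove duplicates by order_id
        seen.add(row["order_id"])
        cleaned.append({
            "order_id": row["order_id"],
            "product": row["product"].strip().title(),  # standardize product names
            "revenue": row["price"] * row["qty"],       # revenue per order
        })
    return cleaned

def load(rows, warehouse):
    # Stand-in for writing to the warehouse table
    warehouse.extend(rows)
    return len(rows)

warehouse = []
loaded = load(transform(extract()), warehouse)
total_revenue = sum(r["revenue"] for r in warehouse)
```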
138. Sequence
✅ A sequence is a collection of jobs linked together in an execution flow.
🔹 Example: A complete ETL process that extracts, transforms, and loads data is
structured as a sequence.
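🔹 Sketch (Python, conceptual only): in DataStage a sequence is designed
graphically, so the code below is just a way to picture the idea of jobs running
in order and the flow stopping when one fails. The job names and the run_sequence
helper are hypothetical.

```python
def run_sequence(jobs):
    """Run (name, callable) pairs in order; stop at the first job that returns False."""
    completed = []
    for name, job in jobs:
        if not job():
            return completed, name  # name of the job that failed
        completed.append(name)
    return completed, None

jobs = [
    ("extract_orders", lambda: True),
    ("transform_orders", lambda: True),
    ("load_orders", lambda: True),
]
completed, failed = run_sequence(jobs)
```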
Containers in DataStage
171. Auto-Purging
✅ Automatically removes old job logs to improve performance.
🔹 How to enable auto-purge:
1️⃣ Open DataStage Administrator
2️⃣ Select a Project
3️⃣ Navigate to Auto Purge Settings
4️⃣ Set the log retention period (e.g., keep logs for 10 days)
5️⃣ Click Apply
🔹 Example: If auto-purge is set to 7 days, logs older than a week are
automatically deleted.
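🔹 Sketch (Python, conceptual only): the retention rule amounts to a date cutoff.
The log entry layout and the purge_old_logs function are hypothetical and do not
reflect how DataStage stores its logs internally.

```python
from datetime import datetime, timedelta

def purge_old_logs(log_entries, retention_days, now):
    """Return only the entries that are not older than the retention cutoff."""
    cutoff = now - timedelta(days=retention_days)
    return [entry for entry in log_entries if entry["logged_at"] >= cutoff]

logs = [
    {"job": "load_sales", "logged_at": datetime(2024, 1, 1)},
    {"job": "load_sales", "logged_at": datetime(2024, 1, 8)},
]
# With a 7-day retention on 10 January, the 1 January entry is dropped
kept = purge_old_logs(logs, retention_days=7, now=datetime(2024, 1, 10))
```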
194. Palette 🎨
The Palette contains the DataStage stages used for ETL job design, organized into
groups such as:
1️⃣ Database Stages (Oracle, DB2, SQL Server)
2️⃣ File Stages (Sequential File, Data Set)
3️⃣ Processing Stages (Transformer, Aggregator)
4️⃣ Debug Stages (Peek, Head, Tail)
🔹 Example: A developer drags a Transformer Stage from the Palette into the
job canvas.
205. Annotations
✅ Text boxes added to jobs for documentation and clarity.
🔹 Example: A job contains an annotation "This job loads daily transactions" for
future reference.
Key Takeaways 🎯
✅ Stages are the building blocks of DataStage jobs.
✅ Passive vs Active Stages – Passive stages read or write data; active stages process it.
✅ Partitioning and Multiple Instances improve performance; Debugging helps locate errors.
✅ Annotations, Job Compilation, and Batch Processing enhance efficiency.
231. Constraints
✅ Used to filter or redirect records based on conditions.
🔹 Example:
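Send valid orders to the Target link and invalid orders to a Reject file.
Pseudocode: IF Order_Amount > 0 THEN Target ELSE Reject
In a Transformer, constraints are set per output link: the Target link gets the
constraint Order_Amount > 0, and the Reject link catches the remaining rows.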
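🔹 Sketch (Python, illustrative only): the same routing rule written as a small
function. The route_orders name and the sample orders are hypothetical; only the
Order_Amount > 0 condition comes from the example above.

```python
def route_orders(orders):
    """Split orders into target and reject lists using the Order_Amount > 0 rule."""
    target, reject = [], []
    for order in orders:
        if order["Order_Amount"] > 0:
            target.append(order)
        else:
            reject.append(order)
    return target, reject

orders = [
    {"order_id": 1, "Order_Amount": 250.0},
    {"order_id": 2, "Order_Amount": 0.0},
    {"order_id": 3, "Order_Amount": -15.0},
]
target, reject = route_orders(orders)
```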
232. Derivations
✅ Used to define expressions in transformers.
🔹 Example:
Calculate the net price from price, quantity, and discount:
Net_Price = Price * Quantity - Discount;
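🔹 Sketch (Python, illustrative only): the derivation applied to one input row.
The derive_net_price function and the sample values are hypothetical; the formula
is the one given above, with the discount subtracted from the line total.

```python
def derive_net_price(row):
    """Return a copy of the row with Net_Price = Price * Quantity - Discount."""
    return {**row, "Net_Price": row["Price"] * row["Quantity"] - row["Discount"]}

row = {"Price": 10.0, "Quantity": 3, "Discount": 5.0}
result = derive_net_price(row)
```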
📌 Key Takeaways
✅ Transformer stage is the heart of DataStage.
✅ Lookup, Join, and Merge are used for combining data.
✅ Aggregator, Sort, and Remove Duplicates help in data summarization.
✅ Pivot and Surrogate Key are essential for data warehousing.