
SCD Types

Slowly changing dimensions

1. SCD types

SCD: Slowly Changing Dimensions

Examples: reference data, master data.

Please refer to the Dimensions Table ppt to understand dimension tables and their data.

Slowly Changing Dimensions (SCD):

• Dimensions that change slowly over time, rather than on a regular, time-based schedule.
• In a data warehouse there is a need to track changes in dimension attributes in order to report historical data.
• In other words, implementing one of the SCD types should enable users to assign the proper dimension attribute value for a given date.
• Examples of such dimensions: customer, geography, employee.
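
The requirement to assign the proper attribute value for a given date can be sketched as a simple range lookup. This is a minimal illustration, assuming the history is stored as effective-dated rows (as in the Type 2 approach described later); the function name `type_as_of` and the dictionary layout are hypothetical, and the values are borrowed from the examples in this document.

```python
from datetime import date

# Hypothetical effective-dated history for one customer.
customer_history = [
    {"customer_type": "Corporate",
     "start_date": date(2010, 7, 22), "end_date": date(2012, 5, 17)},
    {"customer_type": "Retail",
     "start_date": date(2012, 5, 18), "end_date": date(9999, 12, 31)},
]


def type_as_of(history, as_of):
    """Return the customer type whose effective range contains as_of, or None."""
    for row in history:
        if row["start_date"] <= as_of <= row["end_date"]:
            return row["customer_type"]
    return None


print(type_as_of(customer_history, date(2011, 1, 1)))  # Corporate
print(type_as_of(customer_history, date(2013, 1, 1)))  # Retail
```

A date before the first start_date returns None, which a real report would have to handle explicitly.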

There are many approaches to dealing with SCDs. The most popular are:

• Type 0 - The passive method
• Type 1 - Overwriting the old value
• Type 2 - Creating a new additional record
• Type 3 - Adding a new column
• Type 4 - Using a historical table
• Type 6 - Combining the approaches of types 1, 2 and 3 (1+2+3=6)

Type 0 - The passive method. In this method no special action is performed when dimension data changes. Some dimension data can remain the same as when it was first inserted; other data may be overwritten.

Type 1 - Overwriting the old value. In this method no history of dimension changes is kept in the database. The old dimension value is simply overwritten by the new one. This type is easy to maintain and is often used for data whose changes are caused by processing corrections (e.g. removing special characters, correcting spelling errors).

Before the change:

Customer_ID  Customer_Name  Customer_Type
1            Cust_1         Corporate

After the change:

Customer_ID  Customer_Name  Customer_Type
1            Cust_1         Retail
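
A minimal sketch of a Type 1 update, assuming the dimension is held as a list of dictionaries rather than a database table. The function name `apply_type1` and the lower-case column names are illustrative only.

```python
def apply_type1(dimension, customer_id, new_type):
    """Overwrite customer_type in place; the old value is not kept anywhere."""
    for row in dimension:
        if row["customer_id"] == customer_id:
            row["customer_type"] = new_type
    return dimension


customers = [
    {"customer_id": 1, "customer_name": "Cust_1", "customer_type": "Corporate"},
]
apply_type1(customers, 1, "Retail")
print(customers[0]["customer_type"])  # Retail
```

After the call there is no trace of the value 'Corporate', which is exactly why Type 1 cannot answer historical questions.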

Type 2 - Creating a new additional record. In this method the full history of dimension changes is kept in the database. You capture an attribute change by adding a new row with a new surrogate key to the dimension table. Both the prior and the new rows carry the natural key (or another durable identifier) as an attribute. 'Effective date' and 'current indicator' columns are also used in this method. There can be only one record with the current indicator set to 'Y'. For the 'effective date' columns, i.e. start_date and end_date, the end_date of the current record is usually set to 9999-12-31. Changing the dimensional model itself under Type 2 can be a very expensive database operation, so Type 2 is not recommended for dimensions where a new attribute could be added in the future.

Before the change:

Customer_ID  Customer_Name  Customer_Type  Start_Date  End_Date    Current_Flag
1            Cust_1         Corporate      22-07-2010  31-12-9999  Y

After the change:

Customer_ID  Customer_Name  Customer_Type  Start_Date  End_Date    Current_Flag
1            Cust_1         Corporate      22-07-2010  17-05-2012  N
2            Cust_1         Retail         18-05-2012  31-12-9999  Y
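
The Type 2 change above can be sketched as follows. This is an illustration under stated assumptions, not a production loader: the dimension is a list of dictionaries, customer_id plays the role of the surrogate key and customer_name the natural key (as in the tables above), and the helper name `apply_type2` is hypothetical.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)  # conventional end date for the current row


def apply_type2(dimension, customer_name, new_type, change_date):
    """Expire the current row for customer_name and append a new current row."""
    current = next(
        row for row in dimension
        if row["customer_name"] == customer_name and row["current_flag"] == "Y"
    )
    if current["customer_type"] == new_type:
        return dimension  # nothing changed, so no new version is needed
    # Close the old version the day before the change takes effect.
    current["end_date"] = change_date - timedelta(days=1)
    current["current_flag"] = "N"
    # Add the new version under a new surrogate key.
    dimension.append({
        "customer_id": max(row["customer_id"] for row in dimension) + 1,
        "customer_name": customer_name,
        "customer_type": new_type,
        "start_date": change_date,
        "end_date": HIGH_DATE,
        "current_flag": "Y",
    })
    return dimension


customers = [{
    "customer_id": 1, "customer_name": "Cust_1", "customer_type": "Corporate",
    "start_date": date(2010, 7, 22), "end_date": HIGH_DATE, "current_flag": "Y",
}]
apply_type2(customers, "Cust_1", "Retail", date(2012, 5, 18))
for row in customers:
    print(row["customer_id"], row["customer_type"], row["end_date"], row["current_flag"])
```

Closing the old row one day before the new start_date keeps the effective ranges from overlapping, so exactly one row matches any given date.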

Type 3 - Adding a new column. In this type usually only the current and the previous value of the dimension attribute are kept in the database. The new value is loaded into the 'current/new' column and the old one into the 'old/previous' column. Generally speaking, the history is limited to the number of columns created for storing historical data. This is the least commonly needed technique.

Before the change:

Customer_ID  Customer_Name  Current_Type  Previous_Type
1            Cust_1         Corporate     Corporate

After the change:

Customer_ID  Customer_Name  Current_Type  Previous_Type
1            Cust_1         Retail        Corporate
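
A minimal sketch of a Type 3 update, assuming a single row held as a dictionary with one 'previous' column; the function name `apply_type3` is illustrative only.

```python
def apply_type3(row, new_type):
    """Shift the current value into previous_type, then store the new value."""
    row["previous_type"] = row["current_type"]
    row["current_type"] = new_type
    return row


customer = {
    "customer_id": 1, "customer_name": "Cust_1",
    "current_type": "Corporate", "previous_type": "Corporate",
}
apply_type3(customer, "Retail")
print(customer["current_type"], customer["previous_type"])  # Retail Corporate
```

A second change would overwrite previous_type, so only one step of history survives with a single extra column.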

Type 4 - Using a historical table. In this method a separate historical table is used to track all historical changes to the attributes of each dimension. The 'main' dimension table keeps only the current data, e.g. customer and customer_history tables.

Current table:

Customer_ID  Customer_Name  Customer_Type
1            Cust_1         Corporate

Historical table:

Customer_ID  Customer_Name  Customer_Type  Start_Date  End_Date
1            Cust_1         Retail         01-01-2010  21-07-2010
1            Cust_1         Other          22-07-2010  17-05-2012
1            Cust_1         Corporate      18-05-2012  31-12-9999
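
A sketch of how a Type 4 change might keep the two tables in step, assuming both are lists of dictionaries and that the open history row is marked by the end date 31-12-9999. The helper name `apply_type4` is hypothetical; the starting data is the state just before the last change shown in the tables above.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)  # end date of the open (current) history row


def apply_type4(current_table, history_table, customer_id, new_type, change_date):
    """Overwrite the current table and append the change to the history table."""
    current = next(r for r in current_table if r["customer_id"] == customer_id)
    open_row = next(
        r for r in history_table
        if r["customer_id"] == customer_id and r["end_date"] == HIGH_DATE
    )
    # Close the open history row the day before the change.
    open_row["end_date"] = change_date - timedelta(days=1)
    history_table.append({
        "customer_id": customer_id,
        "customer_name": current["customer_name"],
        "customer_type": new_type,
        "start_date": change_date,
        "end_date": HIGH_DATE,
    })
    # The main table keeps only the current value.
    current["customer_type"] = new_type


customer = [{"customer_id": 1, "customer_name": "Cust_1", "customer_type": "Other"}]
customer_history = [
    {"customer_id": 1, "customer_name": "Cust_1", "customer_type": "Retail",
     "start_date": date(2010, 1, 1), "end_date": date(2010, 7, 21)},
    {"customer_id": 1, "customer_name": "Cust_1", "customer_type": "Other",
     "start_date": date(2010, 7, 22), "end_date": HIGH_DATE},
]
apply_type4(customer, customer_history, 1, "Corporate", date(2012, 5, 18))
print(customer[0]["customer_type"], len(customer_history))  # Corporate 3
```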

Type 6 - Combining the approaches of types 1, 2 and 3 (1+2+3=6). In this type the dimension table has the following additional columns:

• current_type - keeps the current value of the attribute. All history records for a given item have the same current value.
• historical_type - keeps the historical value of the attribute. The history records for a given item can have different values.
• start_date - keeps the start of the 'effective date' range of the history record.
• end_date - keeps the end of the 'effective date' range of the history record.
• current_flag - marks the most recent record.

In this method, to capture an attribute change we add a new record as in Type 2. The current_type information is overwritten with the new value as in Type 1. We store the history in the historical_type column as in Type 3.

Customer_ID  Customer_Name  Current_Type  Historical_Type  Start_Date  End_Date    Current_Flag
1            Cust_1         Corporate     Retail           01-01-2010  21-07-2010  N
2            Cust_1         Corporate     Other            22-07-2010  17-05-2012  N
3            Cust_1         Corporate     Corporate        18-05-2012  31-12-9999  Y
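
The three steps of a Type 6 change can be sketched together. This is an illustration under assumptions: the dimension is a list of dictionaries, customer_name serves as the natural key, and the helper name `apply_type6` is hypothetical. The starting rows are the state just before the last change in the table above.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)  # end date of the current record


def apply_type6(dimension, customer_name, new_type, change_date):
    """Add a new version (Type 2) and refresh current_type on all versions (Type 1)."""
    versions = [r for r in dimension if r["customer_name"] == customer_name]
    current = next(r for r in versions if r["current_flag"] == "Y")
    # Type 2 part: expire the current record.
    current["end_date"] = change_date - timedelta(days=1)
    current["current_flag"] = "N"
    # Type 1 part: every existing version now reports the new current value.
    for row in versions:
        row["current_type"] = new_type
    # Type 3 part: the new record keeps its own value in historical_type.
    dimension.append({
        "customer_id": max(r["customer_id"] for r in dimension) + 1,
        "customer_name": customer_name,
        "current_type": new_type,
        "historical_type": new_type,
        "start_date": change_date,
        "end_date": HIGH_DATE,
        "current_flag": "Y",
    })
    return dimension


customers = [
    {"customer_id": 1, "customer_name": "Cust_1", "current_type": "Other",
     "historical_type": "Retail", "start_date": date(2010, 1, 1),
     "end_date": date(2010, 7, 21), "current_flag": "N"},
    {"customer_id": 2, "customer_name": "Cust_1", "current_type": "Other",
     "historical_type": "Other", "start_date": date(2010, 7, 22),
     "end_date": HIGH_DATE, "current_flag": "Y"},
]
apply_type6(customers, "Cust_1", "Corporate", date(2012, 5, 18))
for row in customers:
    print(row["customer_id"], row["current_type"], row["historical_type"], row["current_flag"])
```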

2. Various types of TRANSFORMATIONS

I will explain this very soon. I have already provided a list of the various transformations.

3. Filters, canned reports, ad-hoc reports and the differences among them

This is more about report testing than ETL testing. Reporting comes at the end of ETL, and ETL testing is not needed to test reports. Report testing is much like application testing.
There are different types of reports available, and different reporting tools support different report types.
Reporting tools are different from ETL tools. Examples of reporting tools: SSRS, Data Reports and Cognos.
Reports are generated from the data in the data marts, repositories or DWH. Hence the source for BI (Business Intelligence) reports is the data marts, repositories or DWH. As an ETL tester you are responsible for data quality in these sources.

We will discuss more on this tomorrow.



4. Operational Data Store (ODS)

These are the source systems from which we get the data that is loaded into our data warehouse.
Please refer to the ETL diagram you have.
When these source systems are operational and functional, that is, when the source system is online, they are called ODS.
Typical ODS are OLTP applications, e.g. ERP, Siebel and Salesforce, or any online application such as ICICIBank.com or ICICIDirect.com.
There is also a possibility that we extract data from legacy sources that are not operational, for example text files, Excel files and archive tables.
