ISSN 2083-0157 IAPGOŚ 3/2013 55

HISTORY MANAGEMENT OF DATA – SLOWLY CHANGING DIMENSIONS
Marek Wancerz, Paweł Wancerz
Lublin University of Technology, Faculty of Electrical Engineering and Computer Science

Abstract: The article describes a few methods of managing data history in databases and data marts. There are many ways of dealing with the history of data. This article presents some examples, points out the advantages and disadvantages of each method, and shows possible usage scenarios.
Keywords: managing data history, databases, data marts

DATA HISTORY MANAGEMENT – SLOWLY CHANGING DIMENSIONS
Summary: The article describes how to manage the history of dimension tables in databases and data warehouses. There are several ways of archiving history. The article aims to introduce how they work, supported by examples, and to point out their advantages, disadvantages and possible usage scenarios.
Keywords: history management, databases, data warehouses

Introduction

Nowadays, almost everyone uses data in their lives. But how should we understand the word "data"? In IT, we can define it as a set of values or variables belonging to a set of items. It is very often represented in a tabular form (columns and rows), as a data tree (parent-child relationships) or in a graphical structure (table models with a graphical representation). The data we keep does not have to be text; we can also store it as a number or even an image. The data kept in our databases (or data marts), and its quality, gives us a huge advantage in data visualization and management.
However, we have to remember that dimensional data is not a stable entity and might change over time. Slowly Changing Dimensions were invented to manage this data history.

1. Slowly Changing Dimensions overview

Slowly Changing Dimensions were introduced by Ralph Kimball, who is regarded as one of the original architects of data warehousing. His methodology became a standard. Slowly Changing Dimensions are a set of methods for managing data history in dimension tables.
Data might change over time and we should take this into account while developing our system (Fig. 1):
• Source data – source data should be delivered in a unified form,
• ETL (Extract, Transform, Load) – the ETL system should have a mechanism for handling the data coming into the target system. All major ETL tools have SCD support already implemented as an option, e.g. Informatica PowerCenter, SSIS, Oracle Warehouse Builder,
• Database structure – the database structure has to be adapted to the chosen method of history management.

Fig. 1. Data mart load process

To understand how data history is managed, we should take a look at the main methods.

2. Slowly Changing Dimensions – main types

The easiest way to discuss the Slowly Changing Dimensions types is to go through all of them with some examples, pointing out their advantages, disadvantages and possible usage scenarios. There are 3 basic types of Slowly Changing Dimensions.

I. SCD Type 1 – overwriting the old values

Fig. 2 shows the change of data for an item. The Category of the Item RoboticBook was Science Fiction in 2012. In 2013 the Item changed its Category to Reality.

Fig. 2. SCD Type 1 overview

The change of Category is the result of an error or a change of structure. Overwriting the old value, however, sometimes does not meet business requirements. The big advantage of this method is the simplicity of the database structure and of the ETL system. The main disadvantage is that we lose the data history: we can only see descriptive attributes as they exist today. To see how business requirements can be violated after such a change, please have a look at the example below (Fig. 3).

Fig. 3. SCD Type 1 example

There are 2 people in the dimension Person, with their payments in a separate fact table, Payments. Then on 30/03/2013 there is a new payment from the same person, Smith, but under a new address (it was changed in the dimension table).
When we analyze the Payments by City, we will see that Texas has 500, but in fact it has only 200 (300 belonged to NewYork, which is no longer available!).
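The Type 1 problem from the Smith example can be reproduced with a minimal Python sketch. It is an illustration only, not the authors' implementation: the second person ("Jones" in "Boston"), the date of the first payment and the split of the amounts (300 paid from NewYork, 200 paid from Texas) are assumptions chosen to match the totals quoted in the text, and the function names are hypothetical.

```python
from collections import defaultdict

# Dimension Person: person_id -> attributes.
person_dim = {
    1: {"name": "Smith", "city": "NewYork"},
    2: {"name": "Jones", "city": "Boston"},  # hypothetical second person
}

# Fact table Payments: (person_id, payment_date, amount).
payments = [
    (1, "2013-01-15", 300),  # Smith pays while living in NewYork (date assumed)
    (2, "2013-02-10", 150),  # hypothetical payment
]


def scd_type1_update(dim, person_id, **changes):
    """SCD Type 1: overwrite the attributes in place; old values are lost."""
    dim[person_id].update(changes)


def payments_by_city(dim, facts):
    """Sum payments per city, using the city currently stored in the dimension."""
    totals = defaultdict(int)
    for person_id, _payment_date, amount in facts:
        totals[dim[person_id]["city"]] += amount
    return dict(totals)


before = payments_by_city(person_dim, payments)

# Smith moves to Texas and pays again on 30/03/2013.
scd_type1_update(person_dim, 1, city="Texas")
payments.append((1, "2013-03-30", 200))

after = payments_by_city(person_dim, payments)
print(before)  # {'NewYork': 300, 'Boston': 150}
print(after)   # {'Texas': 500, 'Boston': 150}
```

Because the fact rows only point at the single, overwritten dimension row, the 300 paid from NewYork is reported under Texas after the update, which is exactly the inaccuracy described above.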

peer-reviewed article/revised paper IAPGOŚ, 2013, no. 3, 55-56



II. SCD Type 2 – new record in the dimension

Let's take the same example as in Type 1. The same sets of values are assigned to the Item, but with additional fields: Effective Date, Expiration Date and FlagCurrent (Fig. 4).

Fig. 4. SCD Type 2 overview

As we can see, a Type 2 dimension has 3 additional indicators which help us control the data. The dates show the period of time during which the Item is valid. The Current flag quickly tells us whether the row is valid (Y) or not (N). The huge advantage of this approach is that we keep all the history rows in the dimension and track all the historical entries. On the other hand, this approach is more complicated for the end user (report developer). The growth of the dimension table also has to be taken into account when designing the project schema.
Let's have a look again at the example from SCD Type 1, but this time using SCD Type 2 for history data management (Fig. 5).

Fig. 5. SCD Type 2 example

This example shows correct values grouped by City. This is because we created a new row for the changed person, Smith, with the updated City.

III. SCD Type 3 – new dimension column

Let's have a look at the last primary SCD, Type 3. The same example is used to visualize the method (Fig. 6).

Fig. 6. SCD Type 3 overview

In this method, a new dimension column is created to keep the historical value of the item. This method is used relatively infrequently. Type 3 SCD is good for tracking soft changes, like an item or business reorganization. It gives a good view of the situation today and prior to the change. But in the case of more frequent and important changes this method will lose historical data, as only the current and original values are retained. The history of changes cannot be reproduced as it is in SCD Type 2.

3. Conclusion

We have analyzed 3 types of managing historical data in dimension tables. Each of them has advantages and disadvantages and can serve totally different business needs. It is up to the data modeler to set up an environment that is easy to implement, maintain and develop. The most popular of these 3 methods is definitely SCD Type 2, which gives us the full history of a dimension value and helps us build reports not only on current but also on historical data.
The types we described, together with the whole concept, were invented in the 1990s. After publishing SCD Types 1, 2 and 3, the Kimball Group started working on modifications of the methods, their fusions and new ones. The result is a new book, The Data Warehouse Toolkit (Wiley, Jun/Jul 2013), where we can find 7 types of SCD! An overview is shown in Fig. 7. They will be described in the next publication.

Fig. 7. New types of SCD [5]

Bibliography
[1] DataModelling: http://www.learndatamodeling.com.
[2] Dimensional Modeling in Depth (Kimball, Ross) – coursebook.
[3] The Data Warehouse Toolkit (Wiley, 2013).
[4] Informatica PowerCenter official: http://www.informatica.com.
[5] Kimball Group webpage: http://www.kimballgroup.com.

Marek Wancerz, Ph.D., Eng.
e-mail: [email protected]
Marek Wancerz graduated from the Faculty of Electrical Engineering of the Technical University of Lublin. He currently works in the Department of Network and Security. His research interests revolve around issues of system protection, power system security and the use of information technology and databases in the energy sector. Co-author of many national and international publications.

Paweł Wancerz, M.Sc., Eng.
e-mail: [email protected]
Paweł Wancerz is an employee of Atos IT Solutions & Services, located in Wroclaw. He is currently attending PhD studies at the University of Technology in Lublin. His major interests in IT are Business Intelligence, Data Warehousing and ETL. He has participated in many professional courses, which enabled him to acquire in-depth knowledge in this field.
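To make the Type 2 and Type 3 methods from sections II and III more concrete, here is a minimal Python sketch on the Smith example. It is an illustration under stated assumptions, not the authors' implementation: the surrogate key `person_key`, the business key `"P1"`, the start date, the open-ended expiration date 9999-12-31, the column name `original_city`, the second move to "Boston" and all function names are hypothetical. Following the article's wording, the Type 3 variant keeps the original and the current value.

```python
from collections import defaultdict
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed marker for "not expired yet"

# --- SCD Type 2: every change adds a new row with validity columns ---
person_dim = [{
    "person_key": 1,                     # surrogate key used by the fact table
    "person_id": "P1",                   # hypothetical business key
    "name": "Smith",
    "city": "NewYork",
    "effective_date": date(2013, 1, 1),  # assumed start date
    "expiration_date": OPEN_END,
    "flag_current": "Y",
}]
payments = [(1, 300)]  # (person_key, amount)


def scd_type2_update(dim, person_id, change_date, **changes):
    """Expire the current row and append a new current row; return its key."""
    current = next(r for r in dim
                   if r["person_id"] == person_id and r["flag_current"] == "Y")
    # Copy the current row first, so the new row starts as open-ended and current.
    new_row = {**current, **changes,
               "person_key": max(r["person_key"] for r in dim) + 1,
               "effective_date": change_date}
    current["expiration_date"] = change_date
    current["flag_current"] = "N"
    dim.append(new_row)
    return new_row["person_key"]


def payments_by_city(dim, facts):
    """Sum payments per city via the dimension row each fact points to."""
    city_of = {r["person_key"]: r["city"] for r in dim}
    totals = defaultdict(int)
    for person_key, amount in facts:
        totals[city_of[person_key]] += amount
    return dict(totals)


# Smith moves to Texas on 30/03/2013; the new payment uses the new row's key.
new_key = scd_type2_update(person_dim, "P1", date(2013, 3, 30), city="Texas")
payments.append((new_key, 200))
type2_totals = payments_by_city(person_dim, payments)
print(type2_totals)  # {'NewYork': 300, 'Texas': 200}


# --- SCD Type 3: one extra column keeps the original value ---
def scd_type3_update(row, attribute, new_value):
    """Store the original value once, then overwrite the current value."""
    row.setdefault("original_" + attribute, row[attribute])
    row[attribute] = new_value


smith = {"name": "Smith", "city": "NewYork"}
scd_type3_update(smith, "city", "Texas")
scd_type3_update(smith, "city", "Boston")  # hypothetical second move
print(smith)  # {'name': 'Smith', 'city': 'Boston', 'original_city': 'NewYork'}
```

In the Type 2 part, old payments keep pointing at the expired NewYork row, so the totals per City stay correct. In the Type 3 part, the intermediate value Texas is gone after the second change, which is the loss of history the article warns about.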

received: 14.05.2013 accepted for publication: 18.07.2013
