Data Warehouse Creation 1
For this practical session we will use the sales database (the sales.xlsx file is available on Moodle).
This data warehouse is in the form of a star schema where we have one fact table and several
dimensions as presented below:
Open SSIS and create a new project called “sales_Dwh”, then open SSMS and create a new database
called “sales_DWH”.
1 2023/2024
1BA
For the tables “customers”, “products” and “order_line”, no transformation is needed, so we
create the dimensions directly. However, we still have to add a surrogate key to each
dimension:
Add an OLE DB source and double-click it to add a new connection manager to the DB.
Add a new OLE DB destination and create a new connection manager to “sales_Dwh”. Create the
table, adding the surrogate key.
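The effect of adding a surrogate key can be sketched outside SSIS. The Python snippet below is only an illustration of the idea, not the SSIS procedure itself: the function name `add_surrogate_key` and the sample rows are hypothetical, and in SQL Server the same result is commonly obtained by declaring the key as an identity column in the destination table.

```python
from itertools import count


def add_surrogate_key(rows, key_name):
    """Return copies of the rows with a sequential integer key added first."""
    next_key = count(start=1)  # 1, 2, 3, ... independent of the business key
    return [{key_name: next(next_key), **row} for row in rows]


# Hypothetical source rows; the real columns come from sales.xlsx.
customers = [
    {"customer_id": "C-017", "name": "Alice"},
    {"customer_id": "C-042", "name": "Bob"},
]

dim_customers = add_surrogate_key(customers, "customer_key")
for row in dim_customers:
    print(row)
```

The business key (`customer_id` here) is kept next to the new `customer_key`, so the dimension can still be matched against the source data later.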
By adding the surrogate key, the data in the tables should be as follows:
In SSMS, define customer_key, order_line_key and product_key as primary keys (and do the same for
the rest of the tables):
For the dimension Employees we have to divide the data into “sellers” and “employees” (the rest of
the employees) using the Conditional Split component that splits data based on a specific condition.
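To make the behaviour of the Conditional Split concrete, here is a minimal Python sketch of the same logic. It is an assumption-laden illustration: the column `job_title` and the condition `job_title == "seller"` are hypothetical, since the actual condition depends on the columns of the Employees table.

```python
def conditional_split(rows, is_seller):
    """Send each row to exactly one output, depending on the condition."""
    sellers, employees = [], []
    for row in rows:
        if is_seller(row):
            sellers.append(row)
        else:
            employees.append(row)  # default output: the rest of the employees
    return sellers, employees


# Hypothetical rows; "job_title" is an assumed column name.
staff = [
    {"employee_id": 1, "job_title": "seller"},
    {"employee_id": 2, "job_title": "accountant"},
    {"employee_id": 3, "job_title": "seller"},
]

sellers, employees = conditional_split(
    staff, lambda row: row["job_title"] == "seller"
)
print(len(sellers), "sellers,", len(employees), "other employees")
```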
Now we will create “dim_order” and “dim_time” using the table “order”:
In this part of the project, we will use the Multicast component, which distributes its input to one
or more outputs. The difference between the Conditional Split component and the Multicast is that
the latter directs every row to every output, whereas the former directs each row to a single output.
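The contrast with the Conditional Split can be illustrated with a short Python sketch. The function `multicast` and the sample rows are hypothetical and only mimic the behaviour described above: every output receives a full copy of the input.

```python
def multicast(rows, n_outputs):
    """Give every output its own copy of every input row."""
    return [[dict(row) for row in rows] for _ in range(n_outputs)]


# Hypothetical rows from the "order" table.
orders = [
    {"order_id": 10, "order_date": "2023-11-05"},
    {"order_id": 11, "order_date": "2023-11-06"},
]

to_dim_order, to_dim_date = multicast(orders, 2)
print(len(to_dim_order), len(to_dim_date))
```

Both outputs contain all the rows, so one branch can feed “dim_order” while the other is transformed further.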
To create the “dim_date” dimension, we will add a Derived Column component that splits each
date into day, month, and year, as follows:
Double-click the Derived Column component to add the necessary functions (i.e., DAY, MONTH, YEAR)
that split the date.
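As a rough equivalent of what the derived columns compute, the Python sketch below extracts the three parts from a date value. The column name `order_date` and the function `derive_date_parts` are assumptions made for the example; in SSIS the same values come from the DAY, MONTH and YEAR expression functions applied to the date column.

```python
from datetime import date


def derive_date_parts(row, date_column="order_date"):
    """Return a copy of the row with day, month and year columns added."""
    d = row[date_column]
    return {**row, "day": d.day, "month": d.month, "year": d.year}


# Hypothetical order row; "order_date" is an assumed column name.
order = {"order_id": 10, "order_date": date(2023, 11, 5)}

dim_date_row = derive_date_parts(order)
print(dim_date_row["day"], dim_date_row["month"], dim_date_row["year"])
```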
Double-click the “dim_order” output to add the connection manager, and do the same for
“dim_date”.
Creation of “dim_time”:
To generate the fact table, we use “Order_line_dim” as the source, together with lookups. A
Lookup transformation performs lookups by joining data in input columns with columns in a
reference dataset.
Double-click the first lookup, specify the connection manager and the dimension (in this case
the product dimension), and then go to Columns. Choose product_key as the lookup column and
match the product column from the table “Dim_order_line” to the attribute “product_id” in the lookup.
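The matching performed by this lookup can be sketched in Python as an equality join against the dimension. The sample data and the function `lookup` are hypothetical; only the column names `product`, `product_id` and `product_key` come from the step above. Collecting unmatched rows in a separate list is a choice made for this sketch, not a statement about how the SSIS component is configured here.

```python
def lookup(rows, reference, input_column, reference_column, key_column):
    """Add key_column to each row whose input_column matches the reference."""
    # Index the reference dataset once: business key -> surrogate key.
    index = {ref[reference_column]: ref[key_column] for ref in reference}
    matched, unmatched = [], []
    for row in rows:
        if row[input_column] in index:
            matched.append({**row, key_column: index[row[input_column]]})
        else:
            unmatched.append(row)
    return matched, unmatched


# Hypothetical dimension and source rows.
dim_product = [
    {"product_key": 1, "product_id": "P-100"},
    {"product_key": 2, "product_id": "P-200"},
]
order_lines = [
    {"order_line_key": 1, "product": "P-200"},
    {"order_line_key": 2, "product": "P-999"},  # no such product
]

matched, unmatched = lookup(
    order_lines, dim_product, "product", "product_id", "product_key"
)
print(matched)
print(unmatched)
```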
Now link the first lookup with the second one as follows:
Dr. Rihab BOUSLAMA 1BA
Now add an OLE DB destination and create the new fact table as follows:
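As a rough illustration of what the chained lookups deliver to the fact table, the Python sketch below resolves two surrogate keys in sequence and keeps only the keys and a measure. Everything specific in it is an assumption: the `customer` and `quantity` columns, the sample rows, and the names `add_key` and `fact_sales` are hypothetical, and the real fact table columns depend on the star schema of the sales database.

```python
def add_key(rows, dimension, input_column, reference_column, key_column):
    """One lookup step: attach the dimension's surrogate key to each row."""
    index = {d[reference_column]: d[key_column] for d in dimension}
    return [{**row, key_column: index[row[input_column]]} for row in rows]


# Hypothetical dimensions and source rows.
dim_product = [
    {"product_key": 1, "product_id": "P-100"},
    {"product_key": 2, "product_id": "P-200"},
]
dim_customers = [{"customer_key": 1, "customer_id": "C-017"}]
order_lines = [
    {"order_line_key": 1, "product": "P-200", "customer": "C-017", "quantity": 3},
]

# The first lookup feeds the second one, as in the chained data flow.
step1 = add_key(order_lines, dim_product, "product", "product_id", "product_key")
step2 = add_key(step1, dim_customers, "customer", "customer_id", "customer_key")

# Keep only the surrogate keys and the measure for the fact table.
fact_columns = ("order_line_key", "product_key", "customer_key", "quantity")
fact_sales = [{c: row[c] for c in fact_columns} for row in step2]
print(fact_sales)
```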