Data Warehouse and Data Modelling
Data Modelling
• A data model is a diagrammatic representation showing how entities are related to each other. It is the
initial step towards database design: we first create the conceptual model, then the logical model, and
finally the physical model.
• Snowflake schema:
• Used in data warehouses. Because the warehouse is the central data store for the company, normalizing the
dimensions can save a lot of space. Otherwise a dimension table can store a lot of redundant information and grow very large.
• Star schema:
• Used in data marts. Data marts are subsets of data taken out of the central data warehouse. They are usually created for
individual departments and often do not contain the full history. In this setting, saving storage space is not a priority.
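The space argument can be illustrated with a small sketch. The product dimension, its column names, and the sample rows are hypothetical and only meant to show the idea: the star-style table repeats the category attributes on every product row, while the snowflake version stores each category once and keeps only a key on the product row.

```python
# Hypothetical star-style product dimension: category attributes are
# repeated on every product row (redundant, but needs no extra join).
star_dim_product = [
    {"product_id": 1, "product_name": "Tablet A", "category_name": "Cardiology", "category_manager": "Lee"},
    {"product_id": 2, "product_name": "Tablet B", "category_name": "Cardiology", "category_manager": "Lee"},
    {"product_id": 3, "product_name": "Syrup C", "category_name": "Respiratory", "category_manager": "Kim"},
    {"product_id": 4, "product_name": "Inhaler D", "category_name": "Respiratory", "category_manager": "Kim"},
]


def snowflake(rows):
    """Split the flat dimension into a product table and a category table."""
    category_ids = {}  # (category_name, category_manager) -> category_id
    dim_category = []
    dim_product = []
    for row in rows:
        key = (row["category_name"], row["category_manager"])
        if key not in category_ids:
            category_ids[key] = len(category_ids) + 1
            dim_category.append({
                "category_id": category_ids[key],
                "category_name": key[0],
                "category_manager": key[1],
            })
        # The product row keeps only a reference to its category.
        dim_product.append({
            "product_id": row["product_id"],
            "product_name": row["product_name"],
            "category_id": category_ids[key],
        })
    return dim_product, dim_category


dim_product, dim_category = snowflake(star_dim_product)
print(len(star_dim_product), "product rows,", len(dim_category), "category rows")
```

The price of the saved space is an extra join from product to category at query time, which is why the denormalized form tends to suit data marts.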
Types of Dimensions
• Conformed Dimensions:
• A conformed dimension is a dimension table that is shared by multiple fact tables.
• Example: suppose a pharmaceutical organization has different data marts based on field forces. A common
dim_prescriber (physician universe) and dim_time_period are shared across the data marts, so they are
called conformed dimensions.
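As a rough sketch of why a shared dimension is useful, the example below joins two fact tables through one dim_prescriber table using Python's built-in sqlite3 module. The fact table names (fact_sales_calls, fact_prescriptions), their columns, and the sample data are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_prescriber (prescriber_key INTEGER PRIMARY KEY, prescriber_name TEXT);
-- Two hypothetical fact tables that reference the same dimension.
CREATE TABLE fact_sales_calls (prescriber_key INTEGER, calls INTEGER);
CREATE TABLE fact_prescriptions (prescriber_key INTEGER, rx_count INTEGER);

INSERT INTO dim_prescriber VALUES (1, 'Dr. Rao'), (2, 'Dr. Smith');
INSERT INTO fact_sales_calls VALUES (1, 3), (2, 1), (1, 2);
INSERT INTO fact_prescriptions VALUES (1, 40), (2, 15);
""")

# Each fact table is aggregated on its own, then both results are lined up
# through the shared (conformed) prescriber dimension.
query = """
SELECT p.prescriber_name, c.total_calls, r.total_rx
FROM dim_prescriber p
JOIN (SELECT prescriber_key, SUM(calls) AS total_calls
      FROM fact_sales_calls GROUP BY prescriber_key) c
  ON c.prescriber_key = p.prescriber_key
JOIN (SELECT prescriber_key, SUM(rx_count) AS total_rx
      FROM fact_prescriptions GROUP BY prescriber_key) r
  ON r.prescriber_key = p.prescriber_key
ORDER BY p.prescriber_key
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because both fact tables use the same prescriber keys and attributes, results from different data marts can be compared row by row.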
Junk Dimensions:
• Table composed of low-cardinality columns that do not have a place in the fact table
• Initial Transaction_tbl
(transaction_id, product_id, customer_id, emp_id, order_id, payment_id, coupon_id, amount, qty)
• Final Transaction_tbl
(transaction_id, product_id, customer_id, emp_id, order_id, junk_id, amount, qty)
Junk table
Prod_typ  Cpn_typ  Pymt_typ  Junk_id
Online    Yes      Cash      1
Offline   Yes      Cash      2
Online    No       Cash      3
Offline   No       Cash      4
Online    Yes      Card      5
Offline   Yes      Card      6
Online    No       Card      7
Offline   No       Card      8
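A junk table like the one above can be generated as the cross product of the low-cardinality flags. The following is a minimal sketch: the column values and the junk_id numbering follow the junk table, while the helper name junk_id_for is hypothetical.

```python
from itertools import product

# Low-cardinality attributes taken from the junk table.
pymt_types = ["Cash", "Card"]
cpn_types = ["Yes", "No"]
prod_types = ["Online", "Offline"]

# product() varies its last argument fastest, so passing the columns in this
# order numbers the combinations the same way as the junk table.
junk_dim = {}
for junk_id, (pymt, cpn, prod) in enumerate(
    product(pymt_types, cpn_types, prod_types), start=1
):
    junk_dim[(prod, cpn, pymt)] = junk_id


def junk_id_for(prod_typ, cpn_typ, pymt_typ):
    """Return the junk_id to store in the fact row for one flag combination."""
    return junk_dim[(prod_typ, cpn_typ, pymt_typ)]


print(junk_id_for("Online", "No", "Card"))
```

The fact row then carries a single junk_id instead of separate low-cardinality keys such as payment_id and coupon_id.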
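• Role Playing Dimensions:
• A dimension used for multiple purposes in the same database.
• Example: dim_time_period, which can be referenced more than once by the same fact table (for example as
both an order date and a ship date).
• Degenerate Dimensions:
• A dimension that is not a fact (measure) but is stored in the fact table itself, typically as part of the
primary key, with no separate dimension table.
• Example: invoice number or order number.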
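Both ideas can be sketched with Python's built-in sqlite3 module. In this hypothetical fact_orders table, dim_time_period is joined twice under different aliases (role playing), and order_number sits directly in the fact table with no dimension table of its own (degenerate). All table contents and column names other than dim_time_period are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time_period (time_key INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE fact_orders (
    order_number   TEXT,     -- degenerate dimension: no lookup table behind it
    order_time_key INTEGER,  -- role 1 of dim_time_period
    ship_time_key  INTEGER,  -- role 2 of dim_time_period
    amount         REAL
);
INSERT INTO dim_time_period VALUES (1, '2024-01-05'), (2, '2024-01-09');
INSERT INTO fact_orders VALUES ('ORD-1001', 1, 2, 250.0);
""")

# The same physical dimension is joined once per role, each under its own alias.
rows = conn.execute("""
SELECT f.order_number,
       od.calendar_date AS order_date,
       sd.calendar_date AS ship_date
FROM fact_orders f
JOIN dim_time_period od ON od.time_key = f.order_time_key
JOIN dim_time_period sd ON sd.time_key = f.ship_time_key
""").fetchall()
print(rows)
```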
UNION ALL
) staged_updates
ON customers.customerId = mergeKey
WHEN MATCHED AND customers.current = true AND customers.address <> staged_updates.address THEN
  UPDATE SET current = false, endDate = staged_updates.effectiveDate -- Set current to false and endDate to source's effective date.
WHEN NOT MATCHED THEN
  INSERT (customerId, address, current, effectiveDate, endDate)
  VALUES (staged_updates.customerId, staged_updates.address, true, staged_updates.effectiveDate, null) -- Set current to true along with the
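The MERGE fragment above appears to follow the usual Type 2 slowly changing dimension pattern: when a customer's address changes, the current row is closed and a new current row is inserted. The same logic can be sketched in plain Python on in-memory rows. The function name apply_scd2 and the sample data are hypothetical, and the sketch processes updates one at a time rather than as a single set-based MERGE.

```python
def apply_scd2(customers, updates):
    """Apply address updates to a customer dimension, keeping history (Type 2)."""
    for upd in updates:
        # Find the row that is currently active for this customer, if any.
        current_row = next(
            (c for c in customers
             if c["customerId"] == upd["customerId"] and c["current"]),
            None,
        )
        if current_row is not None and current_row["address"] == upd["address"]:
            continue  # nothing changed, keep the existing row as it is
        if current_row is not None:
            # Close the old version: current = false, endDate = effective date.
            current_row["current"] = False
            current_row["endDate"] = upd["effectiveDate"]
        # Insert the new version (or a brand-new customer) as the current row.
        customers.append({
            "customerId": upd["customerId"],
            "address": upd["address"],
            "current": True,
            "effectiveDate": upd["effectiveDate"],
            "endDate": None,
        })
    return customers


customers = [
    {"customerId": 1, "address": "Old St", "current": True,
     "effectiveDate": "2023-01-01", "endDate": None},
]
updates = [
    {"customerId": 1, "address": "New Ave", "effectiveDate": "2024-06-01"},
    {"customerId": 2, "address": "Elm Rd", "effectiveDate": "2024-06-01"},
]
customers = apply_scd2(customers, updates)
for row in customers:
    print(row)
```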
Shrunken Dimensions:
Shrunken dimensions are conformed dimensions that are a subset of rows and/or columns of a base
dimension. Shrunken rollup dimensions are required when constructing aggregate fact tables. They are also
necessary for business processes that naturally capture data at a higher level of granularity, such as a forecast by
month and brand (instead of the more atomic date and product associated with sales data). Another case of
conformed dimension subsetting occurs when two dimensions are at the same level of detail, but one represents
only a subset of rows.
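As a minimal sketch of a shrunken rollup dimension, the example below derives a month-level dimension from a day-level date dimension by keeping only the month columns and dropping duplicate rows. The table contents, column names, and the helper shrink_to_month are hypothetical.

```python
# Hypothetical base date dimension (atomic grain: one row per day).
dim_date = [
    {"date_key": 20240130, "month_key": 202401, "month_name": "January", "year": 2024},
    {"date_key": 20240131, "month_key": 202401, "month_name": "January", "year": 2024},
    {"date_key": 20240201, "month_key": 202402, "month_name": "February", "year": 2024},
    {"date_key": 20240202, "month_key": 202402, "month_name": "February", "year": 2024},
]

# Columns that survive in the shrunken (month-level) dimension.
MONTH_COLUMNS = ("month_key", "month_name", "year")


def shrink_to_month(date_rows):
    """Keep only month-level columns and drop duplicate rows."""
    seen = set()
    dim_month = []
    for row in date_rows:
        month_row = {col: row[col] for col in MONTH_COLUMNS}
        if month_row["month_key"] not in seen:
            seen.add(month_row["month_key"])
            dim_month.append(month_row)
    return dim_month


dim_month = shrink_to_month(dim_date)
print(dim_month)
```

Because the month attributes are copied unchanged from the base dimension, a forecast fact at month grain and a sales fact at day grain still use the same labels and values.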
Static Dimensions:
Static dimensions are not extracted from the original data source, but are created within the context of the data
warehouse. A static dimension can be loaded manually — for example with status codes — or it can be generated
by a procedure, such as a date or time dimension.
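Both ways of loading a static dimension can be sketched briefly: a hand-maintained list of status codes, and a date dimension generated by a procedure. The status codes, column names, and the function build_date_dimension are assumptions chosen for illustration.

```python
from datetime import date, timedelta

# Manually loaded static dimension (hypothetical status codes).
dim_status = [
    {"status_key": 1, "status_code": "A", "status_desc": "Active"},
    {"status_key": 2, "status_code": "I", "status_desc": "Inactive"},
]

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def build_date_dimension(start, end):
    """Generate one date-dimension row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # e.g. 20240101
            "full_date": current.isoformat(),
            "year": current.year,
            "month": current.month,
            "day_name": DAY_NAMES[current.weekday()],
            "is_weekend": current.weekday() >= 5,
        })
        current += timedelta(days=1)
    return rows


dim_date = build_date_dimension(date(2024, 1, 1), date(2024, 1, 7))
print(len(dim_date), dim_date[0])
```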
• Late Arriving Dimensions:
• The natural key in the fact record has not yet been loaded into the related dimension, which prevents a
successful foreign key lookup to the dimension’s surrogate key.
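The failed lookup, and one common way of coping with it, can be sketched as follows. The placeholder ("inferred member") approach shown here is an assumption, not something described in the text above, and the dimension contents and function name are hypothetical.

```python
# Hypothetical customer dimension: natural key -> surrogate key.
dim_customer = {"C001": 1, "C002": 2}

# Natural keys for which only a placeholder row exists so far.
inferred_keys = set()


def lookup_surrogate_key(natural_key):
    """Return the surrogate key, creating a placeholder row if the member is late."""
    if natural_key not in dim_customer:
        # The dimension row has not arrived yet: assign the next surrogate key
        # and remember that its attributes still need to be filled in later.
        dim_customer[natural_key] = max(dim_customer.values(), default=0) + 1
        inferred_keys.add(natural_key)
    return dim_customer[natural_key]


incoming_facts = [
    {"customer_id": "C001", "amount": 100.0},
    {"customer_id": "C003", "amount": 40.0},  # dimension row not loaded yet
]

fact_rows = [
    {"customer_key": lookup_surrogate_key(f["customer_id"]), "amount": f["amount"]}
    for f in incoming_facts
]
print(fact_rows)
print(inferred_keys)
```

When the real dimension record arrives later, the placeholder row's attributes can be updated in place, so the fact rows already loaded keep a valid surrogate key.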