Data Warehouse Notes
DWH (data warehouse):- system for analytics (OLAP, Online Analytical Processing). Supports ML, AI, data mining, OLAP and reporting.
Another def:- Subject/business oriented (customer/supplier/product/sales etc.), integrated (data collected from
multiple data sources), time-variant (data collected over a period of time) and non-volatile (existing data is not changed,
new data is only appended) collection of data to support the management decision-making process.
DWH is offered as appliances, on-cloud, on-premises and hybrid (mixed) solutions by IBM, Oracle, Microsoft, Amazon, Google etc.
Data marts:- domain/user/business-function specific repository (Types- independent, dependent, hybrid). A data
repository with a specific schema, for ease of retrieval and for analytics.
Data lake:- Repository of raw data in its native form without any preprocessing. For structured, semi-structured and
unstructured data. Cons- Data duplication leads to excess storage and lower data quality.
Data lakehouse:- Aims for good data quality at lower storage cost, with schema-managed data. Combines the pros of both
DWH and data lake.
FACT- quantitative/aggregated data of business processes, contains foreign keys to dimension tables
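A minimal sketch of a fact table pointing at dimension tables, using Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_customer), columns and rows are made up for illustration and are not from the course material.

```python
import sqlite3

# Hypothetical star schema: one fact table, two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, city TEXT);

-- The fact table holds the measures (quantity, amount) plus
-- foreign keys into each dimension table.
CREATE TABLE fact_sales (
    sale_id     INTEGER PRIMARY KEY,
    product_id  INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    quantity    INTEGER,
    amount      REAL
);

INSERT INTO dim_product VALUES (1, 'pen'), (2, 'book');
INSERT INTO dim_customer VALUES (10, 'Pune'), (11, 'Delhi');
INSERT INTO fact_sales VALUES
    (100, 1, 10, 3, 30.0),
    (101, 2, 10, 1, 250.0),
    (102, 1, 11, 5, 50.0);
""")

# Typical analytical read: join fact to a dimension and aggregate the measures.
sales_by_product = conn.execute("""
    SELECT p.name, SUM(f.quantity), SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

print(sales_by_product)  # [('book', 1, 250.0), ('pen', 8, 80.0)]
```

Each fact row stays narrow (keys plus numbers); descriptive text lives only in the dimension tables.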
Data is modeled into a FLAT schema, STAR schema or SNOWFLAKE schema depending upon the storage/query processing
requirement.
Star schemas are optimized for reads and are widely used for designing data marts (query boost), whereas snowflake
schemas are optimized for writes and are widely used for transactional data warehousing (writing/size boost).
Normalization reduces redundancy and data size (five normal forms, 1NF to 5NF).
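A small sketch of how normalizing a dimension cuts redundancy, in plain Python. The product/category rows and the field names (category_manager etc.) are hypothetical; the idea is only that repeated category details move into their own table, as in a snowflake schema.

```python
# Hypothetical denormalized product dimension (star-schema style):
# the category details are repeated on every product row.
flat_products = [
    {"product_id": 1, "name": "pen", "category": "stationery", "category_manager": "Asha"},
    {"product_id": 2, "name": "pencil", "category": "stationery", "category_manager": "Asha"},
    {"product_id": 3, "name": "novel", "category": "books", "category_manager": "Ravi"},
]

# Normalized (snowflake style): category attributes get their own table
# and each product row keeps only a category_id reference.
categories = {}
products = []
for row in flat_products:
    key = row["category"]
    if key not in categories:
        categories[key] = {
            "category_id": len(categories) + 1,
            "category": key,
            "category_manager": row["category_manager"],
        }
    products.append({
        "product_id": row["product_id"],
        "name": row["name"],
        "category_id": categories[key]["category_id"],
    })

# Count how many times the manager name is stored before and after.
manager_copies_before = len(flat_products)  # one copy per product row
manager_copies_after = len(categories)      # one copy per category

print(manager_copies_before, manager_copies_after)  # 3 2
```

The trade-off is that reading a product together with its category now needs one more join.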
DWH architecture:-
Question 1
What do we call a normalized version of the star schema?
Product schema
Normalized schema
Parent dimension
Snowflake schema
Correct, the normalized version of the star schema is called a snowflake schema, due to its multiple layers of
branching, which resemble a snowflake pattern.
Question 2
Considering a general architectural model for an Enterprise Data Warehouse, which of these components is used for
holding data and developing workflows?
Enterprise data warehouse repository
Staging and sandbox areas
Data sources
Data marts
Correct, these components are used for holding data and developing workflows.
Question 3
Materialized views can be set up to have different refresh options, such as: (Select 1 answer).
Populated
Never, upon request, and immediately
Automatically
Manually refresh
Correct. Materialized views can be set up to have different refresh options, such as “never” (they are only populated when
created, which is useful if the data seldom changes), “upon request” (manual refresh, for example after changes
to the data have been made, or scheduled refresh, for example after daily data loads), and “immediately”
(automatic refresh after every statement).
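A rough sketch of the "upon request" refresh idea in Python with sqlite3. Note: SQLite has no real materialized views, so this only emulates one with an ordinary table that is rebuilt on demand; the names mv_sales_summary and refresh_sales_summary are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('east', 10.0), ('west', 20.0);
""")

def refresh_sales_summary(conn):
    """Rebuild the stored query result (emulates an 'upon request' refresh)."""
    conn.executescript("""
    DROP TABLE IF EXISTS mv_sales_summary;
    CREATE TABLE mv_sales_summary AS
        SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
    """)

def read_summary(conn):
    """Read the stored result without touching the base table."""
    return conn.execute(
        "SELECT region, total FROM mv_sales_summary ORDER BY region"
    ).fetchall()

refresh_sales_summary(conn)                    # populated when created
conn.execute("INSERT INTO sales VALUES ('east', 5.0)")

stale = read_summary(conn)                     # base data changed, stored result did not
refresh_sales_summary(conn)                    # manual / scheduled refresh
fresh = read_summary(conn)

print(stale)  # [('east', 10.0), ('west', 20.0)]
print(fresh)  # [('east', 15.0), ('west', 20.0)]
```

With a "never" option the first result would simply stay as it is; an "immediately" option would correspond to running the refresh after every change to the base table.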
Question 4
Accumulating snapshot fact tables are used to __________.
extract data
process events
load data
record events
Incorrect, please review the Facts and Dimensional Modeling video.
Question 5
To what location is data from source systems extracted?
Target systems
Operating system
Staging area
Business intelligence platform
Correct, a staging area is a separate location to which data from source systems is extracted.
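A minimal sketch of the staging-area idea in Python with sqlite3: source rows are first copied unchanged into a staging table, then cleaned and loaded into the target. The table names (stg_orders, dw_orders), the sample rows and the cleaning rules are assumptions for illustration only.

```python
import sqlite3

# Hypothetical rows as they arrive from a source system (everything is text).
source_rows = [
    ("  Alice ", "2024-01-05", "120.50"),
    ("Bob", "2024-01-06", "n/a"),      # bad amount, should not reach the target
    ("Cara", "2024-01-06", "75"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_orders (customer TEXT, order_date TEXT, amount TEXT);
CREATE TABLE dw_orders  (customer TEXT, order_date TEXT, amount REAL);
""")

# Extract: land the source data in the staging table as-is.
conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", source_rows)

def is_number(text):
    """Return True if the text can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

# Transform and load: clean the staged rows, then insert them into the target.
staged = conn.execute(
    "SELECT customer, order_date, amount FROM stg_orders"
).fetchall()
clean = [(c.strip(), d, float(a)) for c, d, a in staged if is_number(a)]
conn.executemany("INSERT INTO dw_orders VALUES (?, ?, ?)", clean)

loaded = conn.execute(
    "SELECT customer, amount FROM dw_orders ORDER BY customer"
).fetchall()
print(loaded)  # [('Alice', 120.5), ('Cara', 75.0)]
```

Because the raw rows stay in the staging table, a failed or wrong transformation can be rerun without going back to the source system.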
Question 6
Materialized views can be used to __________.
safely work without affecting source database
automatically save query results
replicate data
synchronize updates
Correct, they can be used to replicate data, for example to be used in a staging database.