Data Warehouse Data Modeling and ETL Designs.
INTRODUCTION TO DATA WAREHOUSE: DATA MODELING AND ETL DESIGNS.
Introduction.
Star Schema.
workearly.gr
INTRODUCTION
If you are working with data sets from B2B customers, the data will most likely be
diverse, which makes it difficult to anticipate every kind of query analysts will need.
Data modeling helps with this problem, and it is especially useful when designing ETL
systems.
If you start from a clear business goal and design the ETL correctly, you can achieve
the major goals of a data warehouse:
- Timeliness.
- Adaptability.
THE GOALS OF A DATA WAREHOUSE.
The main goal of building a data warehouse is to make it easy for analysts to write
analysis queries quickly and effectively. This requires:
- Descriptive columns.
- Simple joins.
- Correct aggregation.
TRANSACTIONAL DATABASES AND DATA MOVEMENTS.
Remember that your app or product is used transactionally, but the analysis you are
going to do is rarely transactional.
You have to aggregate and analyze the data to discover how the business processes
are performing.
That is why you have to ensure that no two events recorded in your transactional
database overlap (for example, the same order recorded twice); otherwise the
aggregation will be wrong, and irreversibly so.
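To make the risk concrete, here is a minimal sketch, assuming a hypothetical list of order events that each carry an `event_id`. One event was recorded twice, and a simple check finds the overlap before the numbers are aggregated. The field names and the helper functions `find_overlaps` and `total_revenue` are illustrative only.

```python
from collections import Counter

# Hypothetical order events pulled from a transactional database.
# Event 2 was accidentally recorded twice (an overlap).
events = [
    {"event_id": 1, "customer": "acme", "amount": 100},
    {"event_id": 2, "customer": "acme", "amount": 250},
    {"event_id": 2, "customer": "acme", "amount": 250},
    {"event_id": 3, "customer": "globex", "amount": 75},
]

def find_overlaps(rows, key="event_id"):
    """Return the key values that appear more than once, sorted."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)

def total_revenue(rows):
    """Sum the amount measure over all rows."""
    return sum(row["amount"] for row in rows)

overlaps = find_overlaps(events)
if overlaps:
    # Stop the load here instead of silently aggregating inflated numbers.
    print(f"Overlapping events {overlaps}; total would be inflated to "
          f"{total_revenue(events)}")
```

Here the total comes out as 675 instead of the correct 425, and nothing in the aggregated figure alone reveals the error. That is why the check belongs before aggregation, not after.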
Ideally, the collected data has primary keys, but more often than not it does not.
You then need another way to remove redundancy.
This is why we use a combination of the data's dimensions to identify unique rows.
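A minimal sketch of this idea follows. It assumes hypothetical raw rows with no primary key, where the dimension columns date, customer, and product are assumed to identify a row together. The tuple of those values acts as a natural key for removing repeated rows. The column layout and `dedupe_by_dimensions` are illustrative, not taken from any specific tool.

```python
# Hypothetical raw rows (date, customer, product, quantity) with no primary key.
raw_rows = [
    ("2024-01-05", "acme", "widget", 3),
    ("2024-01-05", "acme", "widget", 3),   # exact repeat from a re-run extract
    ("2024-01-05", "acme", "gadget", 1),
    ("2024-01-06", "globex", "widget", 2),
]

DIMENSIONS = (0, 1, 2)  # positions of date, customer, product

def dedupe_by_dimensions(rows, dims=DIMENSIONS):
    """Keep the first row for each unique combination of dimension values."""
    seen = set()
    unique = []
    for row in rows:
        natural_key = tuple(row[i] for i in dims)  # composite of dimensions
        if natural_key not in seen:
            seen.add(natural_key)
            unique.append(row)
    return unique

clean_rows = dedupe_by_dimensions(raw_rows)
print(len(raw_rows), "->", len(clean_rows))
```

This only works if the chosen dimensions are fine-grained enough. If two genuinely different events can share the same date, customer, and product, they would be collapsed, and another dimension (such as a timestamp) would need to join the key.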
< Dimension tables are sometimes described as the "descriptive context" of the facts:
the "who, what, where, when, how, and why" of each fact. >
To query the fact tables and extract business insights, you need the information
provided by the dimension tables.
That is why you must know how to leverage dimensional modeling when you query the
facts.
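The sketch below illustrates this with Python's built-in `sqlite3` module, assuming a tiny hypothetical schema. One fact table, `fact_sales`, holds only keys and measures, and two dimension tables supply the descriptive context. All table and column names are assumptions made for illustration.

```python
import sqlite3

# In-memory database with a tiny hypothetical star schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    quantity     INTEGER,
    amount       REAL
);
INSERT INTO dim_customer VALUES (1, 'Acme', 'GR'), (2, 'Globex', 'DE');
INSERT INTO dim_product  VALUES (10, 'Widget', 'Hardware'), (20, 'License', 'Software');
INSERT INTO fact_sales VALUES (1, 10, 3, 300.0), (1, 20, 1, 500.0), (2, 10, 2, 200.0);
""")

# The facts alone are just keys and numbers; joining the dimensions adds the
# "where" (customer country) and the "what" (product category).
query = """
SELECT c.country, p.category, SUM(f.amount) AS revenue
FROM fact_sales AS f
JOIN dim_customer AS c ON f.customer_key = c.customer_key
JOIN dim_product  AS p ON f.product_key  = p.product_key
GROUP BY c.country, p.category
ORDER BY c.country, p.category
"""
for row in conn.execute(query):
    print(row)
```

Each dimension needs only one join on its key, which is the "simple joins" property a dimensional model aims for.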
STAR SCHEMA
In dimensional modeling, a fact table is, in a sense, best defined by the combination
of dimensions it carries.
Each fact is identified by the unique intersection of values across its dimensions.
Although a fact table often has some identifier that could serve as a primary key,
standard data warehousing practice makes the primary key a composite of all the
dimension keys the table carries.
For any combination of dimension values there is exactly one fact record, whose
measures can be aggregated and analyzed.
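As a minimal sketch, assuming a hypothetical daily sales fact table in SQLite, the composite primary key below is built from every dimension key the table carries. A second record for the same intersection of dimension values is rejected, so each combination holds exactly one fact. The table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table: the primary key is the composite of all its
# dimension keys (date, customer, product).
conn.execute("""
CREATE TABLE fact_daily_sales (
    date_key     INTEGER NOT NULL,
    customer_key INTEGER NOT NULL,
    product_key  INTEGER NOT NULL,
    quantity     INTEGER NOT NULL,
    amount       REAL    NOT NULL,
    PRIMARY KEY (date_key, customer_key, product_key)
)
""")

conn.execute("INSERT INTO fact_daily_sales VALUES (20240105, 1, 10, 3, 300.0)")

try:
    # Same intersection of dimension values: rejected by the composite key.
    conn.execute("INSERT INTO fact_daily_sales VALUES (20240105, 1, 10, 1, 100.0)")
except sqlite3.IntegrityError as exc:
    print("Rejected duplicate fact:", exc)

# Because each combination appears only once, measures aggregate safely.
total = conn.execute("SELECT SUM(amount) FROM fact_daily_sales").fetchone()[0]
print(total)
```

Failing loudly is one design choice. An ETL job could instead merge late-arriving measures into the existing row with SQLite's `INSERT ... ON CONFLICT` upsert clause, depending on the business rules.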