Operational Data Stores vs. Data Warehouse: 8) What Is an ODS vs. a Data Warehouse?

ODS stands for operational data store and supports near real-time reporting using a short window of detailed data. A data warehouse contains entire historical data for long-term decision making and includes summarized and detailed data from multiple sources. An ODS is used at the operational level while a data warehouse is used at the managerial level.

8) What is an ODS vs. a data warehouse?

Operational Data Store (ODS) | Data Warehouse

An ODS is meant for operational reporting and supports current or near real-time reporting requirements. | A data warehouse is intended for historical and trend analysis, usually reporting on a large volume of data.

An ODS contains only a short window of data. | A data warehouse includes the entire history of data.

It typically contains detailed data only. | It contains both summarized and detailed data.

It is used for detailed decision making and operational reporting. | It is used for long-term decision making and management reporting.

It is used at the operational level. | It is used at the managerial level.

It serves as a conduit for data between operational and analytics systems. | It serves as a repository for cleansed and consolidated data sets.

It is updated often, as the transaction system generates new data. | It is usually updated in batch processing mode on a set schedule.
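To make the difference in update patterns concrete, here is a minimal Python sketch, not any product's API: the class names `OperationalDataStore` and `DataWarehouse`, the three-row window, and the order rows are all hypothetical. The ODS applies every transaction immediately but keeps only a short window; the warehouse waits for a scheduled batch and keeps the entire history.

```python
from collections import deque


class OperationalDataStore:
    """Hypothetical ODS: keeps only a short window of detailed rows,
    updated as soon as the transaction system produces them."""

    def __init__(self, window_size: int) -> None:
        # a deque with maxlen drops the oldest row once the window is full
        self.rows = deque(maxlen=window_size)

    def on_transaction(self, row: dict) -> None:
        # near real-time: every new transaction is applied immediately
        self.rows.append(row)


class DataWarehouse:
    """Hypothetical warehouse: keeps the entire history, loaded in batches."""

    def __init__(self) -> None:
        self.rows: list = []

    def batch_load(self, staged_rows: list) -> None:
        # runs on a set schedule (e.g. nightly); nothing is ever evicted
        self.rows.extend(staged_rows)


ods = OperationalDataStore(window_size=3)
dw = DataWarehouse()
staging = []

for order_id in range(1, 6):
    row = {"order_id": order_id, "amount": 10 * order_id}
    ods.on_transaction(row)   # the ODS sees the row right away
    staging.append(row)       # the warehouse waits for the next batch

dw.batch_load(staging)        # the scheduled batch run

print([r["order_id"] for r in ods.rows])  # [3, 4, 5]
print(len(dw.rows))                        # 5
```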

7.) What is a fact, and what are the types of facts?


A fact table is the central table in a star schema of a data warehouse. ... Thus, the fact table consists of two types of columns: foreign key columns, which allow joins with dimension tables, and measure columns, which contain the data being analyzed.
There are three types of facts:
- Additive: additive facts can be summed up across all of the dimensions in the fact table.
- Semi-additive: semi-additive facts can be summed up across some of the dimensions in the fact table, but not the others.
- Non-additive: non-additive facts cannot be summed up across any of the dimensions in the fact table.
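A small Python sketch of the three kinds of facts, using hypothetical sales and account-balance rows (all names and numbers are invented for illustration). Sales amounts are additive, account balances are semi-additive (they add up across accounts but not across days), and margin percentages are non-additive, so they have to be recomputed from additive parts.

```python
# Hypothetical fact rows (all values invented for illustration).
sales = [
    # (day, store, sales_amount, margin_pct)
    ("2018-01-01", "S1", 100.0, 0.25),
    ("2018-01-01", "S2", 300.0, 0.50),
    ("2018-01-02", "S1", 200.0, 0.25),
]
balances = [
    # (day, account, balance): a daily snapshot fact, rows in date order
    ("2018-01-01", "A1", 500.0),
    ("2018-01-01", "A2", 250.0),
    ("2018-01-02", "A1", 600.0),
    ("2018-01-02", "A2", 250.0),
]

# Additive: sales_amount can be summed across every dimension (day, store).
total_sales = sum(amount for _, _, amount, _ in sales)                    # 600.0

# Semi-additive: balances add up across accounts for a single day...
balance_jan_02 = sum(b for day, _, b in balances if day == "2018-01-02")  # 850.0
# ...but summing across days (500 + 250 + 600 + 250 = 1600) means nothing.
# Along the time dimension, take e.g. the closing (latest) value instead.
closing_a1 = [b for _, acct, b in balances if acct == "A1"][-1]           # 600.0

# Non-additive: a percentage cannot be summed along any dimension...
wrong_margin = sum(m for _, _, _, m in sales)                             # 1.0, i.e. "100%"
# ...so recompute it from additive parts: total profit / total sales.
profit = sum(amount * m for _, _, amount, m in sales)                     # 225.0
true_margin = profit / total_sales                                        # 0.375
```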

6.) What is an SCD, and what are its types?

Slowly Changing Dimensions (SCD): dimensions whose attribute values change slowly and irregularly over time, rather than on a regular, time-based schedule.

SCD1: No history is maintained in the target table. Changed values are overwritten, so only the most recently updated record is kept in the database.

SCD2: Full history is maintained in the target. Each change inserts a new record and updates the previous record to mark it as no longer current (for example, by setting its end date or a current-row flag).

SCD3: Only the current and the previous values are kept in the target, typically in separate columns of the same row.
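The three SCD types can be sketched in Python on a hypothetical customer dimension whose city changes. The function names, the `Scd2Row` fields (`valid_from`, `valid_to`, `is_current`), and the sample data are assumptions for illustration; ETL tools implement these patterns in their own ways.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Scd2Row:
    surrogate_key: int
    customer_id: str            # natural (business) key
    city: str
    valid_from: date
    valid_to: Optional[date]    # None means the row is still current
    is_current: bool


def scd1_update(dim: dict, customer_id: str, city: str) -> None:
    # SCD1: overwrite in place; the old value is lost.
    dim[customer_id]["city"] = city


def scd2_update(dim: list, customer_id: str, city: str, on: date) -> None:
    # SCD2: expire the current row, then insert a new current row.
    for row in dim:
        if row.customer_id == customer_id and row.is_current:
            if row.city == city:
                return              # no change, nothing to do
            row.valid_to = on
            row.is_current = False
    next_key = max((r.surrogate_key for r in dim), default=0) + 1
    dim.append(Scd2Row(next_key, customer_id, city, on, None, True))


def scd3_update(dim: dict, customer_id: str, city: str) -> None:
    # SCD3: shift the current value into the "previous" column.
    row = dim[customer_id]
    row["previous_city"] = row["current_city"]
    row["current_city"] = city


t1 = {"C1": {"city": "Kochi"}}
scd1_update(t1, "C1", "Chennai")

t2 = [Scd2Row(1, "C1", "Kochi", date(2017, 1, 1), None, True)]
scd2_update(t2, "C1", "Chennai", date(2018, 1, 1))

t3 = {"C1": {"current_city": "Kochi", "previous_city": None}}
scd3_update(t3, "C1", "Chennai")

print(t1)  # {'C1': {'city': 'Chennai'}}
print([(r.surrogate_key, r.city, r.is_current) for r in t2])
# [(1, 'Kochi', False), (2, 'Chennai', True)]
print(t3)  # {'C1': {'current_city': 'Chennai', 'previous_city': 'Kochi'}}
```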

5.) Data Warehouse Schema


In a data warehouse, a schema defines how the system is organized: all of its database entities (fact tables and dimension tables) and the logical associations between them.

Here are the different types of schemas in a DW:


1. Star Schema
2. Snowflake Schema
3. Galaxy Schema
4. Star Cluster Schema
#1) Star Schema
This is the simplest and one of the most widely used schemas in a data warehouse. In the Star Schema model, a fact table in the center, surrounded by multiple dimension tables, resembles a star.

Each dimension table has a one-to-many relationship with the fact table: one dimension row can be referenced by many fact rows. Every row in the fact table is linked to its dimension table rows through foreign key references.

Because of this, navigating among the tables in this model is easy when querying aggregated data, and end users can easily understand the structure. Hence, most Business Intelligence (BI) tools support the Star schema model well.

While designing star schemas, the dimension tables are purposefully de-normalized. They are wide, with many attributes that store the contextual data for better analysis and reporting.

Benefits Of Star Schema

- Queries use very simple joins to retrieve the data, which improves query performance.
- It is simple to retrieve data for reporting, at any point in time and for any period.
Disadvantages Of Star Schema
- If requirements change often, modifying and reusing the existing star schema is not recommended in the long run.
- Data redundancy is higher because the dimension tables are not divided into hierarchies.
An example of a Star Schema is given below.

Querying A Star Schema


An end user can request a report using Business Intelligence tools. Each such request is processed internally as a chain of SELECT queries, and the performance of these queries affects the report execution time.

Using the Star schema example above, if a business user wants to know how many Novels and DVDs were sold in the state of Kerala in January 2018, you can apply the following query to the Star schema tables:

SELECT    pdim.Name AS Product_Name,
          SUM(sfact.sales_units) AS Quantity_Sold
FROM      Product pdim,
          Sales sfact,
          Store sdim,
          Date ddim
WHERE     sfact.product_id = pdim.product_id
  AND     sfact.store_id = sdim.store_id
  AND     sfact.date_id = ddim.date_id
  AND     sdim.state = 'Kerala'
  AND     ddim.month = 1
  AND     ddim.year = 2018
  AND     pdim.Name IN ('Novels', 'DVDs')
GROUP BY  pdim.Name;
Results:
Product_Name | Quantity_Sold
Novels       | 12,702
DVDs         | 32,919
Hopefully this shows how easy it is to query a Star Schema.
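To try the star schema query end to end, the sketch below builds the four tables (Product, Store, Date, Sales) in an in-memory SQLite database using Python's built-in `sqlite3` module. The column layout follows the query, but the rows are tiny hypothetical data, so the totals differ from the results shown above; an `ORDER BY` is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (product_id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Store   (store_id   INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE Date    (date_id    INTEGER PRIMARY KEY, month INTEGER, year INTEGER);
    CREATE TABLE Sales   (product_id  INTEGER REFERENCES Product,
                          store_id    INTEGER REFERENCES Store,
                          date_id     INTEGER REFERENCES Date,
                          sales_units INTEGER);

    -- hypothetical sample rows
    INSERT INTO Product VALUES (1, 'Novels'), (2, 'DVDs'), (3, 'Toys');
    INSERT INTO Store   VALUES (10, 'Kerala'), (11, 'Goa');
    INSERT INTO Date    VALUES (100, 1, 2018), (101, 2, 2018);
    INSERT INTO Sales   VALUES
        (1, 10, 100, 5), (1, 10, 100, 7),  -- Novels, Kerala, Jan 2018
        (2, 10, 100, 4),                   -- DVDs,   Kerala, Jan 2018
        (2, 11, 100, 9),                   -- DVDs in Goa: filtered out
        (3, 10, 100, 2),                   -- Toys: filtered out
        (1, 10, 101, 8);                   -- Novels in Feb: filtered out
""")

query = """
SELECT    pdim.Name AS Product_Name,
          SUM(sfact.sales_units) AS Quantity_Sold
FROM      Product pdim, Sales sfact, Store sdim, Date ddim
WHERE     sfact.product_id = pdim.product_id
  AND     sfact.store_id   = sdim.store_id
  AND     sfact.date_id    = ddim.date_id
  AND     sdim.state = 'Kerala'
  AND     ddim.month = 1
  AND     ddim.year  = 2018
  AND     pdim.Name IN ('Novels', 'DVDs')
GROUP BY  pdim.Name
ORDER BY  pdim.Name;
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('DVDs', 4), ('Novels', 12)]
```

Each dimension needs just one join condition against the fact table, which is what keeps star schema queries simple.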

#2) SnowFlake Schema


A star schema acts as the input for designing a SnowFlake schema. Snowflaking is the process of normalizing the dimension tables of a star schema.

The arrangement of a fact table in the center, surrounded by multiple hierarchies of dimension tables, looks like a snowflake in the SnowFlake schema model. Every fact table row is associated with its dimension table rows through a foreign key reference.

While designing SnowFlake schemas, the dimension tables are purposefully normalized. A foreign key is added at each level of a dimension hierarchy to link it to its parent attribute. The complexity of a SnowFlake schema grows with the number of hierarchy levels in its dimension tables.

Benefits of SnowFlake Schema:

- Data redundancy is reduced by moving repeated attributes into new dimension tables.
- Snowflaked dimension tables use less storage space than star schema dimension tables.
- It is easy to update (or) maintain the snowflaked tables.
Disadvantages of SnowFlake Schema:
- Because the dimension tables are normalized, the ETL system has to load a larger number of tables.
- Queries may need complex joins because of the added tables, which degrades query performance.
An example of a SnowFlake Schema is given below.
The dimension tables in the SnowFlake diagram above are normalized as follows:
- The Date dimension is normalized into Quarterly, Monthly, and Weekly tables, leaving their foreign key ids in the Date table.
- The Store dimension is normalized to include a table for State.
- The Product dimension is normalized into Brand.
- In the Customer dimension, the attributes related to the city are moved into a new City table, leaving a foreign key id in the Customer table.
In the same way, a single dimension can maintain multiple levels of hierarchy.

The different levels of hierarchy in the diagram above can be described as follows:
- Quarterly id, Monthly id, and Weekly id are new surrogate keys created for the Date dimension hierarchies, and they have been added as foreign keys in the Date dimension table.
- State id is a new surrogate key created for the Store dimension hierarchy, and it has been added as a foreign key in the Store dimension table.
- Brand id is a new surrogate key created for the Product dimension hierarchy, and it has been added as a foreign key in the Product dimension table.
- City id is a new surrogate key created for the Customer dimension hierarchy, and it has been added as a foreign key in the Customer dimension table.
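Snowflaking the Store dimension can be sketched in Python as splitting one denormalized table into Store and State, with a new `state_id` surrogate key left behind as a foreign key. The store names and the `snowflake_store` helper are hypothetical; in practice this normalization is usually done in the ETL tool or in SQL.

```python
# Hypothetical denormalized star-schema Store dimension: the state name
# is repeated on every store row.
store_star = [
    {"store_id": 10, "store_name": "Kochi Central", "state": "Kerala"},
    {"store_id": 11, "store_name": "Thrissur Mall", "state": "Kerala"},
    {"store_id": 12, "store_name": "Panaji Market", "state": "Goa"},
]


def snowflake_store(rows):
    """Split the Store dimension into Store + State, assigning a new
    surrogate key (state_id) to each distinct state."""
    state_ids = {}
    store_rows = []
    for row in rows:
        # assign the next integer key the first time a state is seen
        state_id = state_ids.setdefault(row["state"], len(state_ids) + 1)
        store_rows.append({
            "store_id": row["store_id"],
            "store_name": row["store_name"],
            "state_id": state_id,  # foreign key left in the Store table
        })
    state_rows = [{"state_id": sid, "state": name} for name, sid in state_ids.items()]
    return store_rows, state_rows


store_dim, state_dim = snowflake_store(store_star)
print(state_dim)   # [{'state_id': 1, 'state': 'Kerala'}, {'state_id': 2, 'state': 'Goa'}]
print([s["state_id"] for s in store_dim])  # [1, 1, 2]
```

"Kerala" is now stored once instead of on every store row, at the cost of one extra join when querying.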
Querying A Snowflake Schema
SnowFlake schemas can generate the same kinds of reports for end users as star schemas, but the queries are a bit more complicated.

Using the SnowFlake schema example above, we will reproduce the query designed in the Star schema example: if a business user wants to know how many Novels and DVDs were sold in the state of Kerala in January 2018, you can apply the following query to the SnowFlake schema tables:

SELECT     pdim.Name AS Product_Name,
           SUM(sfact.sales_units) AS Quantity_Sold
FROM       Sales sfact
INNER JOIN Product pdim ON sfact.product_id = pdim.product_id
INNER JOIN Store sdim ON sfact.store_id = sdim.store_id
INNER JOIN State stdim ON sdim.state_id = stdim.state_id
INNER JOIN Date ddim ON sfact.date_id = ddim.date_id
INNER JOIN Month mdim ON ddim.month_id = mdim.month_id
WHERE      stdim.state = 'Kerala'
  AND      mdim.month = 1
  AND      ddim.year = 2018
  AND      pdim.Name IN ('Novels', 'DVDs')
GROUP BY   pdim.Name;
Results:
Product_Name | Quantity_Sold
Novels       | 12,702
DVDs         | 32,919
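The SnowFlake query can also be run against an in-memory SQLite database through Python's `sqlite3` module. The sketch assumes the normalized layout the query implies (Store references State, Date references Month) and uses tiny hypothetical rows, so its totals are illustrative and differ from the results shown above. `ORDER BY` is added only for a deterministic output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (product_id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE State   (state_id   INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE Store   (store_id   INTEGER PRIMARY KEY,
                          state_id   INTEGER REFERENCES State);
    CREATE TABLE Month   (month_id   INTEGER PRIMARY KEY, month INTEGER);
    CREATE TABLE Date    (date_id    INTEGER PRIMARY KEY,
                          month_id   INTEGER REFERENCES Month, year INTEGER);
    CREATE TABLE Sales   (product_id INTEGER, store_id INTEGER,
                          date_id INTEGER, sales_units INTEGER);

    -- hypothetical sample rows
    INSERT INTO Product VALUES (1, 'Novels'), (2, 'DVDs');
    INSERT INTO State   VALUES (1, 'Kerala'), (2, 'Goa');
    INSERT INTO Store   VALUES (10, 1), (11, 2);
    INSERT INTO Month   VALUES (1, 1), (2, 2);
    INSERT INTO Date    VALUES (100, 1, 2018), (101, 2, 2018);
    INSERT INTO Sales   VALUES
        (1, 10, 100, 5), (1, 10, 100, 7), (2, 10, 100, 4),
        (2, 11, 100, 9),   -- Goa: filtered out
        (1, 10, 101, 8);   -- February: filtered out
""")

rows = conn.execute("""
    SELECT     pdim.Name AS Product_Name,
               SUM(sfact.sales_units) AS Quantity_Sold
    FROM       Sales sfact
    INNER JOIN Product pdim ON sfact.product_id = pdim.product_id
    INNER JOIN Store sdim   ON sfact.store_id   = sdim.store_id
    INNER JOIN State stdim  ON sdim.state_id    = stdim.state_id
    INNER JOIN Date ddim    ON sfact.date_id    = ddim.date_id
    INNER JOIN Month mdim   ON ddim.month_id    = mdim.month_id
    WHERE      stdim.state = 'Kerala'
      AND      mdim.month  = 1
      AND      ddim.year   = 2018
      AND      pdim.Name IN ('Novels', 'DVDs')
    GROUP BY   pdim.Name
    ORDER BY   pdim.Name
""").fetchall()
print(rows)  # [('DVDs', 4), ('Novels', 12)]
```

Reaching the state and month attributes now takes two extra joins (Store to State, Date to Month), which is the query cost of snowflaking.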
Points To Remember While Querying Star (or) SnowFlake Schema Tables
Any such query can be designed with the structure below:

SELECT Clause:
- The attributes specified in the SELECT clause are shown in the query results.
- The SELECT statement also uses aggregate functions (such as SUM) to find aggregated values, so the non-aggregated attributes must be listed in a GROUP BY clause, which follows the WHERE clause.
FROM Clause:
- All the essential fact tables and dimension tables have to be chosen according to the context.
WHERE Clause:
- Appropriate dimension attributes are mentioned in the WHERE clause by joining them with the fact table attributes. Surrogate keys from the dimension tables are joined with the respective foreign keys in the fact tables to fix the range of data to be queried. Refer to the star schema query example above to understand this. If you use INNER/OUTER JOIN syntax, you can also write these join conditions in the FROM clause itself, as in the SnowFlake schema example.
- Dimension attributes are also mentioned in the WHERE clause as constraints on the data.
- Filtering the data with all of the above steps returns the appropriate data for the reports.
As per the business needs, you can add (or) remove the facts, dimensions, attributes, and
constraints to a star schema (or) SnowFlake schema query by following the above
structure. You can also add sub-queries (or) merge different query results to generate data
for any complex reports.

#3) Galaxy Schema


A galaxy schema is also known as a Fact Constellation Schema. In this schema, multiple fact tables share the same dimension tables. The arrangement of fact tables and dimension tables looks like a collection of stars in the Galaxy schema model.

The shared dimensions in this model are known as conformed dimensions.

This type of schema is used for sophisticated requirements and for aggregated fact tables that are too complex to be supported by a Star schema (or) SnowFlake schema. This schema is difficult to maintain because of its complexity.

An example of Galaxy Schema is given below.
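A galaxy schema can be sketched in SQLite with two hypothetical fact tables, `Sales` and `Shipment`, sharing the conformed `Product` and `Date` dimensions. The table names and rows are assumptions for illustration. The query uses the common "drill-across" pattern: aggregate each fact table separately, then join the results through the shared dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conformed dimensions, shared by both fact tables
    CREATE TABLE Product (product_id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Date    (date_id INTEGER PRIMARY KEY, month INTEGER, year INTEGER);

    -- Two hypothetical fact tables in the same constellation
    CREATE TABLE Sales    (product_id INTEGER, date_id INTEGER, sales_units INTEGER);
    CREATE TABLE Shipment (product_id INTEGER, date_id INTEGER, shipped_units INTEGER);

    INSERT INTO Product  VALUES (1, 'Novels'), (2, 'DVDs');
    INSERT INTO Date     VALUES (100, 1, 2018);
    INSERT INTO Sales    VALUES (1, 100, 12), (2, 100, 4);
    INSERT INTO Shipment VALUES (1, 100, 10), (1, 100, 5), (2, 100, 4);
""")

# Drill across: aggregate each fact table on its own, then join the
# results through the shared (conformed) Product dimension.
rows = conn.execute("""
    WITH s AS (SELECT product_id, SUM(sales_units) AS sold
               FROM Sales GROUP BY product_id),
         h AS (SELECT product_id, SUM(shipped_units) AS shipped
               FROM Shipment GROUP BY product_id)
    SELECT p.Name, s.sold, h.shipped
    FROM Product p
    JOIN s ON s.product_id = p.product_id
    JOIN h ON h.product_id = p.product_id
    ORDER BY p.Name
""").fetchall()
print(rows)  # [('DVDs', 4, 4), ('Novels', 12, 15)]
```

Aggregating first matters: joining `Sales` directly to `Shipment` on `product_id` would pair the single Novels sales row with both Novels shipment rows and count its 12 units twice.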

#4) Star Cluster Schema


A SnowFlake schema with many dimension tables may need more complex joins while querying, while a star schema with fewer dimension tables may have more redundancy. Hence, the star cluster schema came into the picture by combining the features of these two schemas.

A star schema is the base for designing a star cluster schema: a few essential dimension tables from the star schema are snowflaked, which in turn forms a more stable schema structure.

An example of a Star Cluster Schema is given below.

Which Is Better: Snowflake Schema or Star Schema?


The data warehouse platform and the BI tools used in your DW system play a vital role in deciding which schema to design. Star and SnowFlake are the most frequently used schemas in DW.
The Star schema is preferred if the BI tools let business users interact easily with the table structures through simple queries. The SnowFlake schema is preferred when business users do not need to interact directly with the table structures, for example when the BI tools hide its additional joins and more complex queries.

You can go ahead with the SnowFlake schema either if you want to save some storage
space or if your DW system has optimized tools to design this schema.

Star Schema Vs Snowflake Schema


Given below are the key differences between the Star schema and the SnowFlake schema.

S.No | Star Schema | SnowFlake Schema

1 | Data redundancy is higher. | Data redundancy is lower.

2 | Storage space for dimension tables is larger. | Storage space for dimension tables is comparatively smaller.

3 | Contains de-normalized dimension tables. | Contains normalized dimension tables.

4 | A single fact table is surrounded by multiple dimension tables. | A single fact table is surrounded by multiple hierarchies of dimension tables.

5 | Queries use direct joins between the fact and dimension tables to fetch the data. | Queries use complex joins between the fact and dimension tables to fetch the data.

6 | Query execution time is shorter. | Query execution time is longer.

7 | Anyone can easily understand and design the schema. | The schema is harder to understand and design.

8 | Uses a top-down approach. | Uses a bottom-up approach.

4.) Popular ETL tools:

1. Xplenty
Xplenty is a cloud-based ETL solution that requires no coding and provides a simple visual interface for performing ETL activities. It also connects with a large variety of data sources.

2. IBM DataStage
IBM DataStage is a tool for integrating data across various enterprise systems. It is part of the IBM Information Platforms Solutions suite and uses a visual notation to build ETL processes. It is a powerful data integration tool.

3. Informatica
Informatica is a market leader in data integration. Informatica's suite of data integration software includes PowerCenter, which is known for its strong automation capabilities. Informatica PowerCenter is developed by Informatica Corporation and can connect to many sources to fetch data for data integration.
Informatica PowerCenter has four client tools that are used in the development process:
- PowerCenter Designer
- Workflow Manager
- Workflow Monitor
- Repository Manager

4. Microsoft SQL Server SSIS
Microsoft offers SSIS (SQL Server Integration Services), a graphical interface for managing ETL using MS SQL Server. SSIS has a user-friendly interface that allows users to deploy integrated data warehousing solutions without having to write lots of code. SSIS is a fast and flexible data warehousing tool. Its graphical interface allows easy drag-and-drop ETL for multiple data types and warehouse destinations.

5. Talend
Talend is open-source software that integrates, cleanses, and profiles data and helps you get business insights easily. Talend has a GUI that enables managing a large number of source systems. The tool has Master Data Management (MDM) functionality. It also provides a metadata repository that lets users easily reuse work.

6. Azure Data Factory
Microsoft Azure Data Factory is a cloud-based data integration service that automates the ETL process. It can be described as SSIS in the cloud because the two share the same idea, although SSIS provides a more powerful GUI, debugging, and intelligence tools.

7. Oracle Data Integrator
Oracle Data Integrator is based on an Extract, Load, Transform (ELT) architecture, which means it loads the data first and then transforms it. This Oracle tool offers a graphical environment and is also very cost-effective.
8. Data Junction
9. Warehouse Builder
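The difference between ETL and ELT (the architecture Oracle Data Integrator is described as using) is mainly where the transformation runs. A minimal Python sketch, assuming hypothetical order rows and an in-memory SQLite database as the target: `etl` cleans rows in Python before loading, while `elt` loads the raw rows first and transforms them with SQL inside the target. Neither function reflects how any particular tool works internally.

```python
import sqlite3

# Hypothetical raw source rows; amounts arrive as untrimmed text.
SOURCE = [("o1", " 120.50 "), ("o2", "80"), ("o3", "n/a")]


def etl(target: sqlite3.Connection) -> None:
    # ETL: transform in the integration layer first, then load clean rows.
    clean = []
    for order_id, amount in SOURCE:
        try:
            clean.append((order_id, float(amount.strip())))
        except ValueError:
            pass  # reject rows that fail validation before loading
    target.execute("CREATE TABLE orders (order_id TEXT, amount REAL)")
    target.executemany("INSERT INTO orders VALUES (?, ?)", clean)


def elt(target: sqlite3.Connection) -> None:
    # ELT: load the raw rows as-is, then transform with SQL in the target.
    target.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT)")
    target.executemany("INSERT INTO raw_orders VALUES (?, ?)", SOURCE)
    target.execute("""
        CREATE TABLE orders AS
        SELECT order_id, CAST(TRIM(amount) AS REAL) AS amount
        FROM raw_orders
        WHERE TRIM(amount) GLOB '[0-9]*'   -- keep only numeric-looking values
    """)


results = {}
for load in (etl, elt):
    conn = sqlite3.connect(":memory:")
    load(conn)
    results[load.__name__] = conn.execute(
        "SELECT order_id, amount FROM orders ORDER BY order_id").fetchall()
print(results)
# {'etl': [('o1', 120.5), ('o2', 80.0)], 'elt': [('o1', 120.5), ('o2', 80.0)]}
```

Both end with the same clean table; ELT additionally keeps `raw_orders` in the target and pushes the transformation work onto the target database engine.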

3.) What is a surrogate key?

SURROGATE KEY:
A surrogate key is an artificial primary key that is generated automatically by the system. Its value is numeric and is automatically incremented for each new row.
Generally, a DBMS designer needs a surrogate key when the natural primary key is unsuitable, for example when it is missing, can change, or is not unique across source systems.
Features of the surrogate key:
- It is automatically generated by the system.
- It holds an anonymous integer.
- It contains a unique value for every record in the table.
- Its value can never be modified by the user or application.
- A surrogate key is called a factless key because it is added only to make unique values easy to identify and contains no relevant fact (or information) that is useful for the table.
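A minimal Python sketch of system-generated surrogate keys, assuming a hypothetical `CustomerDim` table: every inserted row gets the next integer from a counter (standing in for an IDENTITY column or sequence), while the natural key from the source system is kept as an ordinary column. The example shows one reason a natural key can be unsuitable as a primary key: two source systems may reuse the same value.

```python
import itertools


class CustomerDim:
    """Hypothetical dimension table keyed by a system-generated surrogate key."""

    def __init__(self) -> None:
        self.rows = []
        self._next_key = itertools.count(start=1)  # like an IDENTITY/sequence

    def insert(self, source_system: str, natural_key: str, name: str) -> int:
        # The key is generated here, never supplied by the caller, carries
        # no business meaning, and is never updated afterwards.
        customer_sk = next(self._next_key)
        self.rows.append({
            "customer_sk": customer_sk,     # surrogate (primary) key
            "source_system": source_system,
            "natural_key": natural_key,     # business key from the source
            "name": name,
        })
        return customer_sk


dim = CustomerDim()
dim.insert("crm", "C-17", "Anu")
dim.insert("billing", "C-17", "Ravi")  # same natural key, different customer
print([(r["customer_sk"], r["source_system"], r["natural_key"]) for r in dim.rows])
# [(1, 'crm', 'C-17'), (2, 'billing', 'C-17')]
```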

2.) DATA WAREHOUSE AND DATA MINING

Figure 1: Data warehouse process
Figure 2: Data mining process

Comparison between data mining and data warehousing:

Data Warehousing | Data Mining

A data warehouse is a database system designed for analytical work instead of transactional work. | Data mining is the process of analyzing data patterns.

Data is stored periodically. | Data is analyzed regularly.

Data warehousing is the process of extracting and storing data to allow easier reporting. | Data mining is the use of pattern recognition logic to identify patterns.

Data warehousing is carried out solely by engineers. | Data mining is carried out by business users with the help of engineers.

Data warehousing is the process of pooling all relevant data together. | Data mining is considered a process of extracting useful information from large data sets.

1.) Fact Table and Dimension Table

Dimension table: non-measurable (descriptive) attributes, identified by a primary key.
Fact table: measurable values (measures) and foreign keys to the dimension tables.

Difference between Fact Table and Dimension Table:

S.No | Fact Table | Dimension Table

1 | The fact table contains the measurements on the attributes of a dimension table. | The dimension table contains the attributes on which the fact table calculates the metric.

2 | A fact table has fewer attributes than a dimension table. | A dimension table has more attributes than a fact table.

3 | A fact table has more records than a dimension table. | A dimension table has fewer records than a fact table.

4 | A fact table forms a vertical table (many rows, few columns). | A dimension table forms a horizontal table (many columns, fewer rows).

5 | The attributes of a fact table are in numerical and text format. | The attributes of a dimension table are in text format.

6 | It comes after the dimension table. | It comes before the fact table.

7 | A schema has fewer fact tables than dimension tables. | A schema has more dimension tables than fact tables.

8 | It is used for analysis and decision making. | Its main task is to store information about a business and its processes.
