What Is the Difference Between Star Schema and Snowflake Schema, and When Do We Use Them?

The document discusses, among other topics, time dimensions in data warehouses: the time dimension must be loaded manually using scripts or stored procedures; it contains the lowest level of time granularity the business needs, such as day, week, or month; it is updated at the frequency of the data loads (weekly loads update it weekly, monthly loads monthly); and procedures populate it by iterating through dates and deriving attributes such as quarter, month, and year, so historical data can be loaded in one run.


1) What is the difference between star schema and snowflake schema, and when do we use each?

A) Star Schema: A star schema is a relational database schema for representing multidimensional data. It is the simplest form of data warehouse schema and contains one or more dimension tables and a fact table. It is called a star schema because the entity-relationship diagram of the fact and dimension tables resembles a star: one central fact table connected to multiple dimensions. The center of the star is a large fact table, which points outward to the dimension tables. The advantages of a star schema are easy slicing of the data, better query performance, and an easily understood structure.
Snowflake Schema: A snowflake schema is a star schema structure normalized through the use of outrigger tables, i.e., dimension table hierarchies are broken out into simpler tables.
In a star schema every dimension has a primary key, and a dimension table has no parent table. In a snowflake schema, a dimension table can have one or more parent tables.
In a star schema, the hierarchies for a dimension are stored in the dimension table itself; in a snowflake schema the hierarchies are broken out into separate tables. These hierarchies let you drill down through the data from the topmost level to the lowest level.
B) Star schema: The fact table is in normalized form and the dimension tables are denormalized. It is also known as the basic star schema.
Snowflake schema: Both the fact table and the dimension tables are in normalized form. It is also known as the extended star schema.
If you choose the snowflake form, it requires more dimension tables and more foreign keys, and it will reduce query performance, but it normalizes the records. Depending on the requirement, we choose the schema.

C) Both schemas are generally used in data warehousing.
A star schema, as the name indicates, resembles a star: there is a single fact table associated with numerous dimension tables (through foreign keys). This schema presents a highly denormalized view of the data.
In a snowflake schema, the dimension tables are further normalized into separate tables (the fact table is still single). This schema stores data in a more normalized form.
The choice depends on the scenario and on how much data the warehouse generally holds; usually the star schema is preferred.
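The structural difference above can be sketched with a toy example. This is only an illustrative sketch (all table and column names are hypothetical, not from the original): in the star form the product dimension carries its category attributes directly, while in the snowflake form the category hierarchy lives in its own table, which costs an extra join.

```python
# Star schema: one denormalized product dimension.
star_product_dim = {
    1: {"name": "Widget", "category": "Tools", "category_mgr": "Ann"},
}

# Snowflake schema: the category hierarchy is split into its own table.
snow_product_dim = {1: {"name": "Widget", "category_id": 10}}
snow_category_dim = {10: {"category": "Tools", "category_mgr": "Ann"}}

fact_sales = [{"product_id": 1, "amount": 250.0}]

def category_sales_star(facts):
    """One hop: fact row -> product dimension."""
    totals = {}
    for row in facts:
        cat = star_product_dim[row["product_id"]]["category"]
        totals[cat] = totals.get(cat, 0.0) + row["amount"]
    return totals

def category_sales_snow(facts):
    """Two hops: fact row -> product dimension -> category dimension."""
    totals = {}
    for row in facts:
        cat_id = snow_product_dim[row["product_id"]]["category_id"]
        cat = snow_category_dim[cat_id]["category"]
        totals[cat] = totals.get(cat, 0.0) + row["amount"]
    return totals
```

Both functions return the same totals; the snowflake path simply needs the extra dimension hop, which is the query-performance cost the answers above describe.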

2) Summarize the difference between OLTP, ODS and data warehouse.
A) OLTP means Online Transaction Processing; it is a database environment. Oracle, SQL Server and DB2 are typical OLTP databases. OLTP databases, as the name implies, handle real-time transactions, which inherently have some special requirements.
ODS stands for Operational Data Store. It is a final integration point in the ETL process: we load the data into the ODS before loading the values into the target.
Data warehouse: A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data used to support management decisions.

B) ODS means Operational Data Store, i.e., a near-real-time store of transactional data. In a data warehouse environment, we extract the data from the ODS, transform it in the staging area, and load it into the target data warehouse.
I think the earlier comment on the ODS is a little bit confusing.

3) What is the role of surrogate keys in a data warehouse, and how will you generate them?
A) A surrogate key is a simple primary key that maps one-to-one with a natural compound primary key. The reasons for using surrogate keys are to spare the query writer from having to know the full compound key, and to speed query processing by removing the need for the RDBMS to evaluate the full compound key when considering a join.
For example, a shipment could have a natural key of ORDER + ITEM + SHIPMENT_SEQ. By giving it a unique SHIPMENT_ID, subordinate tables can reference it with a single attribute rather than three. However, it is important to create a unique index on the natural key as well.
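One common way to generate surrogate keys can be sketched as follows. This is a minimal illustration, not the implementation of any particular tool (in practice a database sequence or identity column usually plays this role); the ORDER/ITEM/SHIPMENT_SEQ names follow the example above.

```python
class SurrogateKeyGenerator:
    """Assigns a small integer surrogate key per natural compound key."""

    def __init__(self):
        self._next_id = 1
        self._by_natural_key = {}

    def key_for(self, order, item, shipment_seq):
        natural = (order, item, shipment_seq)
        if natural not in self._by_natural_key:
            # First time we see this natural key: assign the next id.
            self._by_natural_key[natural] = self._next_id
            self._next_id += 1
        return self._by_natural_key[natural]

gen = SurrogateKeyGenerator()
k1 = gen.key_for("ORD-1", "ITEM-A", 1)  # new natural key
k2 = gen.key_for("ORD-1", "ITEM-A", 1)  # same natural key -> same surrogate
k3 = gen.key_for("ORD-1", "ITEM-B", 1)  # different natural key -> new surrogate
```

Joins against the fact table then compare a single integer instead of three columns, which is exactly the benefit the answer describes.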

4) What is data cleansing? How is it done?
A) I can simply describe it as purifying the data.
Data cleansing: the act of detecting and removing and/or correcting a database's dirty data (i.e., data that is incorrect, out-of-date, redundant, incomplete, or formatted incorrectly).
B) Data cleansing is standardizing and reformatting (encoding, decoding, data type conversion) the data before we store it in the warehouse.
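A minimal cleansing pass might look like the sketch below. The field names and rules are purely illustrative assumptions: trim whitespace, standardize a date to one format, and normalize state values to codes.

```python
from datetime import datetime

# Illustrative reference data for standardizing state values.
STATE_CODES = {"texas": "TX", "tx": "TX", "ohio": "OH", "oh": "OH"}

def cleanse(record):
    """Return a cleaned copy of one input record."""
    out = dict(record)
    out["name"] = out["name"].strip().title()
    # Accept either MM/DD/YYYY or YYYY-MM-DD; store as YYYY-MM-DD.
    for fmt in ("%m/%d/%Y", "%Y-%m-%d"):
        try:
            out["dob"] = datetime.strptime(out["dob"].strip(), fmt).strftime("%Y-%m-%d")
            break
        except ValueError:
            continue
    out["state"] = STATE_CODES.get(out["state"].strip().lower(),
                                   out["state"].strip().upper())
    return out

dirty = {"name": "  john SMITH ", "dob": "01/31/1980", "state": "texas"}
clean = cleanse(dirty)
```

Real cleansing jobs add more rules (deduplication, referential checks, rejection of unparseable rows), but the shape is the same: detect, then correct or remove.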
5) What is a junk dimension? What is a degenerate dimension?
A) A "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store these junk attributes. A degenerate dimension, by contrast, is data that is dimensional in nature but is stored in the fact table.
B) Junk dimension: columns that are used rarely or not at all, such as miscellaneous flags and codes, are grouped together to form a single dimension; that dimension is called a junk dimension.
Degenerate dimension: a dimension attribute, such as a transaction or order number, that is kept on the fact table itself rather than in a separate dimension table.
C) Junk dimension: grouping random flags and text attributes into a separate sub-dimension.
Degenerate dimension: keeping the control information on the fact table. For example, consider a dimension table with fields like order number and order line number that has a 1:1 relationship with the fact table. In this case the dimension is removed and the order information is stored directly in the fact table, in order to eliminate unnecessary joins when retrieving order information.
D) A junk dimension is a convenient grouping of flags and indicators. It's helpful, but not absolutely required, if there's a positive correlation among the values.
Their benefits:
- Provide a recognizable, user-intuitive location for related codes, indicators and their descriptors in a dimensional framework.
- Clean up a cluttered design that already has too many dimensions. There might be five or more indicators that could be collapsed into a single 4-byte integer surrogate key in the fact table.
- Provide a smaller, quicker point of entry for queries compared to constraining directly on these attributes in the fact table. If your database supports bitmapped indices, this potential benefit may be irrelevant, although the others are still valid.
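The "collapse several flags into one surrogate key" idea from answer D can be sketched as follows. The flag names are illustrative assumptions; the point is that the fact table stores one small integer instead of three low-cardinality columns.

```python
from itertools import product

# Possible values per flag; the cross-join of these defines the junk dimension.
FLAGS = {
    "payment_type": ["cash", "credit"],
    "is_gift": ["y", "n"],
    "rush_order": ["y", "n"],
}

junk_dim = {}          # surrogate key -> flag combination (the dimension rows)
junk_key_lookup = {}   # flag combination -> surrogate key
for sk, combo in enumerate(product(*FLAGS.values()), start=1):
    junk_dim[sk] = dict(zip(FLAGS.keys(), combo))
    junk_key_lookup[tuple(combo)] = sk

def junk_key(payment_type, is_gift, rush_order):
    """The single surrogate key a fact row stores instead of three flags."""
    return junk_key_lookup[(payment_type, is_gift, rush_order)]
```

With 2 x 2 x 2 possible values the whole dimension is only 8 rows, which is why it can be built up front rather than loaded incrementally.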
6) What is a lookup table?
A) When a table is used to check for the presence of some data prior to loading other data (or the same data) into another table, that table is called a lookup table.
B) When a value for a column in the target table is looked up from a table other than the source tables, that table is called the lookup table.
C) A lookup is used when we want to get a related value from some other table based on a particular value. Suppose in table A we have two columns, emp_id and name, and in table B we have emp_id and address. If in the target table we want emp_id, name and address, we take table A as the source and table B as the lookup table; by matching emp_id we get the result as three columns: emp_id, name, address.
D) A lookup table is exactly that: a 'lookup'. It supplies values to a referencing table (it is a reference), it is used at run time, and it saves joins and space in transformations. For example, a lookup table called STATES provides the actual state name ('Texas') in place of 'TX' in the output.
E) The lookup table provides detailed information about an attribute. For example, the lookup table for the quarter attribute would include a list of all the quarters available in the data warehouse, i.e., the first quarter of 2001 may be represented as "Q1 2001" or "2001 Q1".
F) If the data is not available in the source systems, we get it from reference tables present in the database; these tables are called lookup tables. For example, while loading data from OLTP to OLAP, we have only natural keys in OLTP and not the corresponding warehouse keys, so we take the target dimension table as the lookup: if the two natural keys match, we get the value of the warehouse key.
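The lookup-during-load pattern in answer G can be sketched like this. The table and column names are illustrative assumptions; the essential step is resolving each incoming natural key to a warehouse surrogate key via the target dimension.

```python
# Target dimension already loaded in the warehouse.
customer_dim = [
    {"wh_key": 101, "natural_key": "C-001", "name": "Acme"},
    {"wh_key": 102, "natural_key": "C-002", "name": "Globex"},
]

# Build the lookup once, keyed by the natural key.
dim_lookup = {row["natural_key"]: row["wh_key"] for row in customer_dim}

def load_fact(oltp_rows):
    """Resolve natural keys to warehouse keys; route misses to rejects."""
    facts, rejects = [], []
    for row in oltp_rows:
        wh_key = dim_lookup.get(row["customer"])  # the lookup step
        if wh_key is None:
            rejects.append(row)                   # no match in the dimension
        else:
            facts.append({"customer_key": wh_key, "amount": row["amount"]})
    return facts, rejects

facts, rejects = load_fact([
    {"customer": "C-002", "amount": 50.0},
    {"customer": "C-999", "amount": 10.0},
])
```

Rows whose natural key has no match in the dimension are typically rejected or sent through late-arriving-dimension handling rather than loaded with a null key.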

7) How do you load the time dimension?
A) In a data warehouse we load the time dimension manually.
B) Every data warehouse maintains a time dimension. It is kept at the most granular level at which the business runs (e.g., week, day of the month, and so on). Depending on the data loads, the time dimension is updated: a weekly process updates it every week, and a monthly process every month.
C) The time dimension in a DWH must be loaded manually; we load data into the time dimension using PL/SQL scripts.
D) Generally we load the time dimension using a sequential file as the source stage and a transformer (passive) stage, in which we manually write month and year functions to load the time dimension; for the lowest level, i.e., day, there is also a function to use when loading the time dimension.
E) Create a procedure to load data into the time dimension. The procedure needs to run only once to populate all the data. For example, the code below fills dates up to 2015; you can modify the code to suit the fields in your table.
create or replace procedure QISODS.Insert_W_DAY_D_PR as
  LastSeqID number default 0;
  loaddate  Date   default to_date('12/31/1979','mm/dd/yyyy');
begin
  Loop
    LastSeqID := LastSeqID + 1;
    loaddate  := loaddate + 1;
    INSERT into QISODS.W_DAY_D values(
      LastSeqID,
      Trunc(loaddate),
      -- half-year: Q1/Q2 -> 1, Q3/Q4 -> 2
      Decode(TO_CHAR(loaddate,'Q'),'1',1,Decode(TO_CHAR(loaddate,'Q'),'2',1,2)),
      TO_NUMBER(TO_CHAR(loaddate, 'MM')),
      TO_NUMBER(TO_CHAR(loaddate, 'Q')),
      Trunc((TO_NUMBER(TO_CHAR(loaddate,'DDD')) +
             TO_NUMBER(TO_CHAR(Trunc(loaddate,'YYYY'),'D')) + 5) / 7),
      TO_NUMBER(TO_CHAR(loaddate, 'YYYY')),
      TO_NUMBER(TO_CHAR(loaddate, 'DD')),
      TO_NUMBER(TO_CHAR(loaddate, 'D')),
      TO_NUMBER(TO_CHAR(loaddate, 'DDD')),
      1, 1, 1, 1, 1,
      TO_NUMBER(TO_CHAR(loaddate, 'J')),
      ((TO_NUMBER(TO_CHAR(loaddate,'YYYY')) + 4713) * 12) +
        TO_NUMBER(TO_CHAR(loaddate,'MM')),
      ((TO_NUMBER(TO_CHAR(loaddate,'YYYY')) + 4713) * 4) +
        TO_NUMBER(TO_CHAR(loaddate,'Q')),
      TO_NUMBER(TO_CHAR(loaddate, 'J')) / 7,
      TO_NUMBER(TO_CHAR(loaddate, 'YYYY')) + 4713,
      TO_CHAR(loaddate, 'Day'),
      TO_CHAR(loaddate, 'Month'),
      Decode(TO_CHAR(loaddate,'D'),'7','weekend','6','weekend','weekday'),
      Trunc(loaddate,'DAY') + 1,
      Decode(Last_Day(loaddate),loaddate,'y','n'),
      TO_CHAR(loaddate,'YYYYMM'),
      TO_CHAR(loaddate,'YYYY') || ' Half' ||
        Decode(TO_CHAR(loaddate,'Q'),'1',1,Decode(TO_CHAR(loaddate,'Q'),'2',1,2)),
      TO_CHAR(loaddate, 'YYYY / MM'),
      TO_CHAR(loaddate, 'YYYY') || ' Q ' || Trunc(TO_NUMBER(TO_CHAR(loaddate,'Q'))),
      TO_CHAR(loaddate, 'YYYY') || ' Week' || Trunc(TO_NUMBER(TO_CHAR(loaddate,'WW'))),
      TO_CHAR(loaddate,'YYYY'));
    If loaddate = to_date('12/31/2015','mm/dd/yyyy') Then
      Exit;
    End If;
  End Loop;
  commit;
end Insert_W_DAY_D_PR;
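The same iterate-and-derive idea as the procedure above can be sketched in Python. This is only an illustrative equivalent, not the original procedure's exact column list; the column names here are assumed for the example.

```python
from datetime import date, timedelta

def build_time_dimension(start, end):
    """Return one row per day between start and end, with derived attributes."""
    rows = []
    seq_id = 0
    day = start
    while day <= end:
        seq_id += 1
        quarter = (day.month - 1) // 3 + 1
        rows.append({
            "day_key": seq_id,
            "calendar_date": day,
            "day_of_month": day.day,
            "month": day.month,
            "month_name": day.strftime("%B"),
            "quarter": quarter,
            "half_year": 1 if quarter <= 2 else 2,     # Q1/Q2 -> 1, Q3/Q4 -> 2
            "year": day.year,
            "day_type": "weekend" if day.weekday() >= 5 else "weekday",
            "yyyymm": day.strftime("%Y%m"),
        })
        day += timedelta(days=1)
    return rows

rows = build_time_dimension(date(2015, 1, 1), date(2015, 12, 31))
```

As with the PL/SQL version, one run populates the whole history; subsequent loads only need to append new dates at the load frequency.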
8) Difference between snowflake and star schema. In what situations is the snowflake schema better?
a) Star schema and snowflake schema both serve the purpose of dimensional modeling when it comes to data warehouses.
A star schema is a dimensional model with a (large) fact table and a set of (small) dimension tables; the whole set-up is totally denormalized.
However, in cases where the dimension tables are split into many tables, the schema inclines toward normalization (reduced redundancy and dependency); that is the snowflake schema.
The nature and purpose of the data to be fed into the model is the key to the question of which is better.
b) Star schema: contains the dimension tables mapped around one or more fact tables.
It is a denormalized model.
No need to use complicated joins.
Queries return results quickly.
Snowflake schema:
It is the normalized form of the star schema.
It involves deeper joins, because the tables are split into many pieces; we can easily make modifications directly in those tables.
We have to use complicated joins, since we have more tables.
There will be some delay in processing a query.
C) Star schema: a centralized fact table surrounded by different dimensions.
Snowflake: in the same star schema, dimensions are split into further dimensions.
A star schema contains highly denormalized data; a snowflake contains partially normalized data.
In a star schema a dimension cannot have a parent table, but a snowflake schema does contain parent tables.
Why go with a star schema:
1) fewer joins
2) a simpler database
3) supports drill-up operations
Why go with a snowflake schema:
Sometimes we need to provide separate dimensions carved out of existing dimensions; in that case we go with a snowflake.
Disadvantage of the snowflake:
Query performance is lower because more joins are required.
e) Both represent a dimensional model. In a star schema the dimensions are not split, whereas in a snowflake you can see further splits within a dimension. For example, if you use more than one telephone at your desk, and each telephone is in turn shared by more than one person at the same time, then we need a further split in the table, because we need in-depth analysis.
9) What is an ODS?
a) ODS stands for Operational Data Store.
It is used to maintain and store the current, up-to-date information and transactions from the source databases, taken from the OLTP system.
It is directly connected to the source database systems rather than to the staging area.
It is further connected to the data warehouse, and moreover can be treated as part of the data warehouse database.

b) ODS stands for Operational Data Store.


It is the final integration point in the ETL process before loading
the data into the Data Warehouse.
c) ODS stands for Operational Data Store. It contains near-real-time data. In a typical data warehouse architecture, the ODS is sometimes used for analytical reporting as well as a source for the data warehouse.
d) An Operational Data Store is a hybrid structure that has some aspects of a data warehouse and other aspects of an operational system.
It contains integrated data.
It can support DSS processing.
It can also support high-volume transaction processing.
It is placed between the warehouse and the web to support web users.
e) The form that the data warehouse takes in the operational environment. Operational data stores can be updated, provide rapid and consistent response times, and contain only a limited amount of historical data.
f) An Operational Data Store presents a consistent picture of the current data stored and managed by the transaction processing systems. As data is modified in the source system, a copy of the changed data is moved into the ODS, and existing data in the ODS is updated to reflect the current status of the source system.
ODS means Operational Data Store. It stores current data arriving through transactional web applications, SAP, MQ Series, and the like. Current data here means data for a limited window, from one date to another; an ODS typically holds around 30-90 days of data.


g) An Operational Data Store is a collection of data in support of an organization's need for up-to-date, operational, integrated, collective information. The ODS is a purely operational construct to address the operational needs of a corporation. While loading data from staging to the ODS we perform data scrubbing and data validation.
10) What is SCD1, SCD2, SCD3?
-> SCD 1: complete overwrite.
SCD 2: preserve all history; add a new row.
SCD 3: preserve some history; add an additional column for the old/new values.
-> In SCD Type 1, the attribute value is overwritten with the new value, obliterating the historical attribute values. For example, when the product roll-up changes for a given product, the roll-up attribute is merely updated with the current value.
In SCD Type 2, a new record with the new attribute values is added to the dimension table. Historical fact table rows continue to reference the old dimension key with the old roll-up attribute; going forward, fact table rows reference the new surrogate key with the new roll-up, thereby perfectly partitioning history.
In SCD Type 3, attributes are added to the dimension table to support two simultaneous roll-ups, perhaps the current product roll-up as well as the current version minus one, or the current version and the original.
-> SCD: dimension values that change only rarely are called slowly changing dimensions.
There are mainly three types:
1) SCD1: the old values are overwritten by new values.
2) SCD2: additional records are created.
3) SCD3: just the previous and the most recent values are maintained.
Within SCD2 there are again three variants:
1) versioning
2) flag value
3) effective date range
Versioning: the updated dimension rows are inserted into the target along with a version number; the new dimension rows are inserted into the target along with the primary key.
Flag value: the outdated dimension rows are marked in the target with flag 0, and the new (current) rows are inserted with flag 1.
-> SCD1, SCD2 and SCD3 are also called Type I, Type II and Type III dimensions:
Type I: the changed attribute overwrites the existing one.
e.g.: If the income of a customer changes from 4000 to 5000, 4000 is simply replaced by 5000.
Type II: for the changed attribute, a new record is created.
e.g.: If the income of a customer changes from 4000 to 5000, a new record is created with income 5000 and the previous one remains as it is. This helps us record the history of the data.
Type III: a new column is added to capture the change.
e.g.: If the income of a customer increases from 4000 to 5000, a new column titled "new income" is added to the existing row, so that record has two columns, "income" and "new income".
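The three approaches can be sketched for the income example above. This is an illustrative sketch only (field names are assumptions, and the SCD2 function shows the effective-date-plus-flag variant):

```python
from copy import deepcopy

def scd_type1(dim_row, new_income):
    """Type 1: overwrite in place; history is lost."""
    row = dict(dim_row)
    row["income"] = new_income
    return [row]

def scd_type2(dim_rows, customer_id, new_income, effective_date):
    """Type 2: expire the current row, then add a new current row."""
    rows = deepcopy(dim_rows)
    for row in rows:
        if row["customer_id"] == customer_id and row["current_flag"] == 1:
            row["current_flag"] = 0            # flag variant: 1 -> current, 0 -> old
            row["end_date"] = effective_date   # effective-date variant
    rows.append({"customer_id": customer_id, "income": new_income,
                 "current_flag": 1, "start_date": effective_date, "end_date": None})
    return rows

def scd_type3(dim_row, new_income):
    """Type 3: keep the prior value in an extra column."""
    row = dict(dim_row)
    row["prev_income"] = row["income"]
    row["income"] = new_income
    return [row]

dim = [{"customer_id": 7, "income": 4000, "current_flag": 1,
        "start_date": "2020-01-01", "end_date": None}]
history = scd_type2(dim, 7, 5000, "2021-06-01")
```

Type 2 is the only variant that lets fact rows keep pointing at the dimension row that was current when they occurred, which is why it is the usual choice when history matters.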
11) What is the difference between OLTP and OLAP?
-> OLTP
Current data
Short database transactions
Online update/insert/delete
Normalization is promoted
High-volume transactions
Transaction recovery is necessary
OLAP
Current and historical data
Long database transactions
Batch update/insert/delete
Denormalization is promoted
Low-volume transactions
Transaction recovery is not necessary
-> OLTP is OnLine Transaction Processing: it consists of normalized tables and online data with frequent inserts, updates and deletes.
OLAP (OnLine Analytical Processing) contains the history of the OLTP data; it is non-volatile, acts as a decision support system, and is used for creating forecasting reports.
-> Also add this point:
Indexes
OLTP: few
OLAP: many
Joins
OLTP: many
OLAP: few
-> In OLTP systems, data can be inserted, updated and deleted; they follow ER modeling.
In OLAP systems, data cannot be inserted, updated or deleted by end users; they follow dimensional modeling.
