What Is the Difference Between Star Schema and Snowflake Schema, and When Do We Use Those Schemas?
B) ODS: this is the operational data store, which means the real-time
transactional databases. In a data warehouse, we extract the data from
the ODS, transform it in the staging area, and load it into the target data
warehouse.
I think the earlier comments on the ODS are a little bit confusing.
3) What is the role of surrogate keys in a data warehouse, and how will you
generate them?
A) A surrogate key is a simple primary key which maps one-to-one
with a natural compound primary key. The reasons for using
one are to spare the query writer from needing to know the full
compound key, and to speed up query processing by removing
the need for the RDBMS to process the full compound key when
evaluating a join.
For example, a shipment could have a natural key of ORDER +
ITEM + SHIPMENT_SEQ. By giving it a unique SHIPMENT_ID,
subordinate tables can reference it with a single attribute rather
than three. However, it's important to create a unique index on the
natural key as well.
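The shipment example can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module, not the way any particular warehouse generates keys: SQLite's automatically assigned INTEGER PRIMARY KEY stands in for a database sequence, and the table names, column names and the helper get_or_create_shipment_id are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE shipment (
    shipment_id  INTEGER PRIMARY KEY,  -- surrogate key, assigned by the database
    order_no     INTEGER NOT NULL,     -- natural key part 1 (ORDER)
    item_no      INTEGER NOT NULL,     -- natural key part 2 (ITEM)
    shipment_seq INTEGER NOT NULL      -- natural key part 3 (SHIPMENT_SEQ)
);
-- The natural key still gets its own unique index.
CREATE UNIQUE INDEX shipment_nk ON shipment (order_no, item_no, shipment_seq);
-- A subordinate table references the shipment with a single column.
CREATE TABLE shipment_event (
    shipment_id INTEGER NOT NULL REFERENCES shipment (shipment_id),
    status      TEXT NOT NULL
);
""")


def get_or_create_shipment_id(order_no, item_no, shipment_seq):
    """Return the surrogate key for a natural key, creating the row if needed."""
    natural_key = (order_no, item_no, shipment_seq)
    row = conn.execute(
        "SELECT shipment_id FROM shipment "
        "WHERE order_no = ? AND item_no = ? AND shipment_seq = ?",
        natural_key,
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO shipment (order_no, item_no, shipment_seq) VALUES (?, ?, ?)",
        natural_key,
    )
    return cur.lastrowid


first_id = get_or_create_shipment_id(1001, 5, 1)
same_id = get_or_create_shipment_id(1001, 5, 1)
other_id = get_or_create_shipment_id(1001, 5, 2)
conn.execute("INSERT INTO shipment_event VALUES (?, ?)", (first_id, "packed"))
```

The unique index is what stops the same ORDER + ITEM + SHIPMENT_SEQ from being loaded twice under two different surrogate keys.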
D) Junk dimension:
Columns that we use rarely or not at all are grouped together to
form a dimension; this is called a junk dimension.
Degenerate dimension:
The columns which we use in a dimension form a degenerate dimension.
Ex: the emp table has empno, ename, sal, job, deptno, but
we are taking only the columns empno and ename from the emp table and
forming a dimension; this is called a degenerate dimension.
D) Junk dimension: grouping random flags and text attributes that sit in
a dimension and moving them to a separate sub-dimension.
Degenerate dimension: keeping the control information on the fact
table. Ex: consider a dimension table with fields like order
number and order line number that has a 1:1 relationship with the
fact table. In this case the dimension is removed and the order
information is stored directly in the fact table, in order to
eliminate unnecessary joins while retrieving order information.
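A small sketch of the degenerate dimension idea, again with Python's sqlite3 module. The table names, column names and sample values here are invented for illustration: the order number and order line number live on the fact table itself, so reading an order's lines needs no join to a separate order dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL
);
CREATE TABLE sales_fact (
    product_key       INTEGER NOT NULL REFERENCES product_dim (product_key),
    order_number      TEXT    NOT NULL,  -- degenerate dimension: no order_dim table
    order_line_number INTEGER NOT NULL,  -- degenerate dimension
    amount            REAL    NOT NULL
);
""")
conn.executemany("INSERT INTO product_dim VALUES (?, ?)",
                 [(1, "widget"), (2, "gadget")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                 [(1, "SO-1", 1, 10.0),
                  (2, "SO-1", 2, 25.0),
                  (1, "SO-2", 1, 10.0)])


def order_total(order_number):
    """Total amount for one order, read from the fact table alone."""
    row = conn.execute(
        "SELECT SUM(amount) FROM sales_fact WHERE order_number = ?",
        (order_number,),
    ).fetchone()
    return row[0]


total_so1 = order_total("SO-1")
```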
E) A junk dimension is a convenient grouping of flags and indicators. It's helpful,
but not absolutely required, if there's a positive correlation among the values.
Their benefits:
Provide a recognizable, user-intuitive location for related codes, indicators and
their descriptors in a dimensional framework.
Clean up a cluttered design that already has too many dimensions. There might
be five or more indicators that could be collapsed into a single 4-byte integer
surrogate key in the fact table.
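One way to picture how several indicators collapse into one surrogate key is the sketch below. The flag names and values are hypothetical; the idea is simply to enumerate every combination once, give each combination a key, and store only that key on the fact row.

```python
from itertools import product

# Hypothetical low-cardinality indicators that would otherwise clutter the fact table.
FLAGS = {
    "payment_type": ["cash", "card"],
    "is_gift": ["Y", "N"],
    "is_rush": ["Y", "N"],
}


def build_junk_dimension(flags):
    """Map every combination of flag values to one integer surrogate key."""
    names = list(flags)
    combos = product(*(flags[name] for name in names))
    rows = {combo: key for key, combo in enumerate(combos, start=1)}
    return names, rows


names, junk_dim = build_junk_dimension(FLAGS)


def junk_key(**values):
    """Look up the junk-dimension key for one fact row's flag values."""
    return junk_dim[tuple(values[name] for name in names)]


# The fact row carries a single key instead of three separate flag columns.
fact_row = {
    "amount": 42.5,
    "junk_key": junk_key(payment_type="card", is_gift="N", is_rush="N"),
}
```

With three two-valued flags there are only eight rows in the junk dimension, however many fact rows refer to them.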
the target dimension table as a lookup; if the two natural keys match, then
get the value of the wh key
7)
A) In a data warehouse we load the time dimension manually.
B) Every data warehouse maintains a time dimension. It is kept at
the most granular level at which the business runs (ex: day of the
week, day of the month and so on). Depending on the data loads,
these time dimensions are updated: a weekly process updates it
every week, and a monthly process every month.
C) The time dimension in a DWH must be loaded manually. We load
data into the time dimension using PL/SQL scripts.
D) Generally we load the time dimension by using a sequential file as
the source stage, together with one passive stage. In the Transformer
stage we manually write functions, such as month and year functions, to
load the time dimension; for the lower level, i.e. day, we also
have a function to implement the loading of the time dimension.
E) Create a procedure to load data into the time dimension. The procedure
needs to run only once to populate all the data. For example, the code
below fills it up to the end of 2015. You can modify the code to suit the
fields in your table.
create or replace procedure QISODS.Insert_W_DAY_D_PR as
  LastSeqID number default 0;
  -- Start one day before the first date to load; the first row is 01/01/1980.
  loaddate  date default to_date('12/31/1979', 'mm/dd/yyyy');
begin
  loop
    LastSeqID := LastSeqID + 1;
    loaddate  := loaddate + 1;
    -- Values are positional; they must match the column order of W_DAY_D.
    insert into QISODS.W_DAY_D values (
      LastSeqID,                                            -- surrogate key
      trunc(loaddate),                                      -- calendar date
      decode(to_char(loaddate, 'Q'), '1', 1, '2', 1, 2),    -- half of the year
      to_number(to_char(loaddate, 'MM')),                   -- month number
      to_number(to_char(loaddate, 'Q')),                    -- quarter number
      trunc((to_number(to_char(loaddate, 'DDD')) +
             to_number(to_char(trunc(loaddate, 'YYYY'), 'D')) + 5) / 7),  -- week of the year
      to_number(to_char(loaddate, 'YYYY')),                 -- year
      to_number(to_char(loaddate, 'DD')),                   -- day of the month
      to_number(to_char(loaddate, 'D')),                    -- day of the week (depends on NLS_TERRITORY)
      to_number(to_char(loaddate, 'DDD')),                  -- day of the year
      1,                                                    -- five constant placeholder values
      1,
      1,
      1,
      1,
      to_number(to_char(loaddate, 'J')),                    -- Julian day number
      ((to_number(to_char(loaddate, 'YYYY')) + 4713) * 12) +
        to_number(to_char(loaddate, 'MM')),                 -- Julian month
      ((to_number(to_char(loaddate, 'YYYY')) + 4713) * 4) +
        to_number(to_char(loaddate, 'Q')),                  -- Julian quarter
      to_number(to_char(loaddate, 'J')) / 7,                -- Julian week
      to_number(to_char(loaddate, 'YYYY')) + 4713,          -- Julian year
      to_char(loaddate, 'Day'),                             -- day name (was load_date, an undeclared variable)
      to_char(loaddate, 'Month'),                           -- month name
      -- Which 'D' values fall on the weekend depends on NLS_TERRITORY.
      decode(to_char(loaddate, 'D'), '7', 'weekend', '6', 'weekend', 'weekday'),
      trunc(loaddate, 'DAY') + 1,                           -- day after the first day of the week
      decode(last_day(loaddate), loaddate, 'y', 'n'),       -- last day of the month?
      to_char(loaddate, 'YYYYMM'),
      to_char(loaddate, 'YYYY') || ' Half' ||
        decode(to_char(loaddate, 'Q'), '1', 1, '2', 1, 2),
      to_char(loaddate, 'YYYY / MM'),
      to_char(loaddate, 'YYYY') || ' Q ' || to_number(to_char(loaddate, 'Q')),
      to_char(loaddate, 'YYYY') || ' Week' || to_number(to_char(loaddate, 'WW')),
      to_char(loaddate, 'YYYY'));
    -- TO_FLOAT and TO_DECIMAL are not Oracle functions; TO_NUMBER is used throughout.
    exit when loaddate = to_date('12/31/2015', 'mm/dd/yyyy');
  end loop;
  commit;
end Insert_W_DAY_D_PR;
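For readers without an Oracle instance, the same one-row-per-day idea can be tried out in plain Python. This is only a sketch: the column names are invented, it covers a subset of the attributes in the procedure above, and it treats Saturday and Sunday as the weekend, which is an assumption rather than something the procedure guarantees.

```python
from datetime import date, timedelta


def day_rows(start, end):
    """Yield one dictionary per calendar day from start to end, inclusive."""
    current = start
    seq = 0
    while current <= end:
        seq += 1
        quarter = (current.month - 1) // 3 + 1
        next_day = current + timedelta(days=1)
        yield {
            "day_key": seq,                               # surrogate key
            "calendar_date": current,
            "year": current.year,
            "half": 1 if quarter <= 2 else 2,
            "quarter": quarter,
            "month": current.month,
            "day_of_month": current.day,
            "day_of_year": current.timetuple().tm_yday,
            "day_name": current.strftime("%A"),           # locale dependent
            "month_name": current.strftime("%B"),         # locale dependent
            # weekday() is 0 for Monday, so 5 and 6 are Saturday and Sunday.
            "day_type": "weekend" if current.weekday() >= 5 else "weekday",
            "is_month_end": "y" if next_day.month != current.month else "n",
            "year_month": current.strftime("%Y%m"),
        }
        current = next_day


# Same date range as the procedure: 1 January 1980 through 31 December 2015.
rows = list(day_rows(date(1980, 1, 1), date(2015, 12, 31)))
```

Each dictionary could then be written to the time dimension table with whatever loader the project already uses.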
8) Difference between snowflake and star schema. In what situations
is a snowflake schema better?
a) Star schema and snowflake schema both serve the purpose of
dimensional modeling when it comes to data warehouses.
A star schema is a dimensional model with a fact table (large) and
and the transactions regarding the source databases taken from the
OLTP system.
It is directly connected to the source database systems instead of to
the staging area.
It is further connected to the data warehouse and, moreover, can be treated
as a part of the data warehouse database.
Edit by Admin: ODS stands for Operational Data Store, not Online
Data Storage.
1) Versioning
2) Flag value
3) Effective date range
Versioning: Here the updated dimensions are inserted into the target along
with a version number. The new dimensions are inserted into the target
along with a primary key.
Flag value: The updated dimensions are inserted into the target with 0,
and new dimensions are inserted into the target with 1.
-> SCD1, SCD2 and SCD3 can be also
Type I, Type II, Type III dimensions:
Type I dimension
- The changed attribute overwrites the existing one.
eg: If the income of a customer changes from 4000 to 5000, it will
simply replace 4000 by 5000.
Type II dimension
- For the changed attribute a new record is created.
eg: If the income of a customer changes from 4000 to 5000, then a
new record is created with income 5000 and the previous one remains
as it is. This helps us to record the history of the data.
Type III dimension
- Here a new column is added to capture the change.
eg: If the income of a customer increases from 4000 to 5000, then a
new column titled "new income" is added to the existing row. So that
record will have two columns, "income" and "new income".
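The income example can be walked through in a few lines of Python. This is a sketch on in-memory rows, not a loader: the field names (customer_key, version, start_date, end_date, is_current) and the dates are assumptions chosen to show versioning, a current-row flag and an effective date range side by side in the Type II case.

```python
from datetime import date


def new_customer_rows():
    """One customer row with income 4000, as in the example."""
    return [{
        "customer_key": 1, "customer_id": "C1", "income": 4000,
        "version": 1, "start_date": date(2020, 1, 1),
        "end_date": None, "is_current": True,
    }]


def apply_type1(rows, customer_id, new_income):
    """Type I: overwrite the attribute in place; no history is kept."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["income"] = new_income


def apply_type2(rows, customer_id, new_income, change_date):
    """Type II: close the current row and append a new versioned row."""
    current = next(r for r in rows
                   if r["customer_id"] == customer_id and r["is_current"])
    current["is_current"] = False
    current["end_date"] = change_date
    rows.append({
        "customer_key": max(r["customer_key"] for r in rows) + 1,
        "customer_id": customer_id, "income": new_income,
        "version": current["version"] + 1, "start_date": change_date,
        "end_date": None, "is_current": True,
    })


def apply_type3(rows, customer_id, new_income):
    """Type III: keep the old value and store the change in a new column."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["new_income"] = new_income


type1_rows = new_customer_rows()
apply_type1(type1_rows, "C1", 5000)

type2_rows = new_customer_rows()
apply_type2(type2_rows, "C1", 5000, date(2021, 6, 1))

type3_rows = new_customer_rows()
apply_type3(type3_rows, "C1", 5000)
```

After these calls the Type I table has one row showing 5000, the Type II table has two rows (4000 closed, 5000 current), and the Type III table has one row carrying both values.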
11) What is the difference between OLTP and OLAP?
-> OLTP
Current data
Short database transactions
Online update/insert/delete
Normalization is promoted
High volume transactions
Transaction recovery is necessary
OLAP
Current and historical data
Long database transactions
Batch update/insert/delete
Denormalization is promoted
Low volume transactions
Transaction recovery is not necessary
-> OLTP is nothing but Online Transaction Processing, which contains
normalised tables and online data with frequent
inserts/updates/deletes.
But OLAP (Online Analytical Processing) contains the history of OLTP
data, which is non-volatile, acts as a decision support system, and is
used for creating forecasting reports.
->Hey add this point also,
Index
OLTP : FEW
OLAP : MANY
JOINS
OLTP : MANY
OLAP : FEW
Deepa
-> In OLTP systems, data can be inserted, updated and deleted. They follow ER modeling.
In OLAP systems, data cannot be inserted, updated or deleted. They follow
dimensional modeling.