Ans: (A, x), (A, y), (B, x), (B, y), (C, x), (C, y)
3) Even though we have the star schema, why is the snowflake schema used?
The star schema is one of the most commonly used styles of data warehouse schema. It consists of a few fact tables (possibly only one, justifying the name) referencing any number of dimension tables. The star schema is considered an important special case of the snowflake schema. Note that in a star schema the different dimensions are not related to one another.
Star schema illustration
The diagram above illustrates what the schema looks like; the name "star" comes from the central entity surrounded by other entities, which resembles a star. All measures in the fact table are related to all the dimensions that the fact table is related to; in other words, they all have the same level of granularity. A star schema can be simple or complex: a simple star consists of one fact table, while a complex star can have more than one. Let's look at an example. Assume our data warehouse keeps store sales data, and the dimensions are time, store, product, and customer. In this case the sales fact table sits at the center of the diagram (above), and the dimension tables time, store, product, and customer surround it, connected to the fact table through primary key-foreign key relationships.
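The store-sales example above can be sketched in code. This is an illustrative layout only (the table and column names are assumptions, not from the original), built with Python's stdlib sqlite3 to show the fact table referencing each dimension at a single grain:

```python
import sqlite3

# Hypothetical star schema for the store-sales example:
# one central fact table, four dimension tables, joined via
# primary key-foreign key relationships.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, sale_date TEXT);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_customer(customer_id INTEGER PRIMARY KEY, customer_name TEXT);

-- The fact table references every dimension; all measures share one grain.
CREATE TABLE fact_sales (
    time_id     INTEGER REFERENCES dim_time(time_id),
    store_id    INTEGER REFERENCES dim_store(store_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    amount      REAL
);
""")

cur.execute("INSERT INTO dim_time VALUES (1, '2024-01-01')")
cur.execute("INSERT INTO dim_store VALUES (1, 'Downtown')")
cur.execute("INSERT INTO dim_product VALUES (1, 'Widget')")
cur.execute("INSERT INTO dim_customer VALUES (1, 'Alice')")
cur.execute("INSERT INTO fact_sales VALUES (1, 1, 1, 1, 99.5)")

# A typical star join: join the fact to a dimension, aggregate the measure.
cur.execute("""
    SELECT s.store_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_store s ON f.store_id = s.store_id
    GROUP BY s.store_name
""")
result = cur.fetchall()
print(result)  # [('Downtown', 99.5)]
```

Note that every dimension joins directly to the fact table; no dimension joins to another dimension, which is exactly the "star" property.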
A snowflake schema is a star schema structure normalized through the use of outrigger tables, i.e., dimension table hierarchies are broken into simpler tables. This logical arrangement of tables in a multidimensional database makes the entity-relationship diagram resemble a snowflake in shape.
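Snowflaking the product dimension from the earlier example might look like the following sketch (the hierarchy product → brand → category and all names are assumptions for illustration): the hierarchy is moved into outrigger tables, at the cost of an extra join per level.

```python
import sqlite3

# Hypothetical snowflaked product dimension: the category hierarchy
# lives in outrigger tables instead of denormalized columns on
# dim_product itself.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_brand    (brand_id INTEGER PRIMARY KEY, brand_name TEXT,
                           category_id INTEGER REFERENCES dim_category(category_id));
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, product_name TEXT,
                           brand_id INTEGER REFERENCES dim_brand(brand_id));
""")
cur.execute("INSERT INTO dim_category VALUES (1, 'Hardware')")
cur.execute("INSERT INTO dim_brand VALUES (1, 'Acme', 1)")
cur.execute("INSERT INTO dim_product VALUES (1, 'Widget', 1)")

# Reaching a category attribute now takes one extra join per hierarchy level.
cur.execute("""
    SELECT p.product_name, c.category_name
    FROM dim_product p
    JOIN dim_brand b ON p.brand_id = b.brand_id
    JOIN dim_category c ON b.category_id = c.category_id
""")
result = cur.fetchall()
print(result)  # [('Widget', 'Hardware')]
```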
Where is the snowflake schema used?
The star and snowflake schemas are most commonly found in dimensional data warehouses and data marts, where the speed of data retrieval is more important than the efficiency of data manipulation. As such, the tables in these schemas are not heavily normalized and are frequently designed at a level of normalization short of third normal form. An example of a snowflake schema is shown in the diagram below.
The decision whether to employ a star schema or a snowflake schema should consider the relative strengths of the database platform in question and the query tool to be employed. The star schema should be favored with query tools that largely expose users to the underlying table structures, and in environments where most queries are simpler in nature. Snowflake schemas are often better with more sophisticated query tools that isolate users from the raw table structures, and for environments having numerous queries with complex criteria.
Benefits of snowflake schema design
1. Some OLAP multidimensional database modeling tools that use dimensional data marts as a data source are optimized for snowflake schemas.
2. If a dimension is very sparse (i.e., most of the possible values for the dimension have no data) and/or a dimension has a very long list of attributes which may be used in a query, the dimension table may occupy a significant proportion of the database, and snowflaking may be appropriate.
3. A multidimensional view is sometimes added to an existing transactional database to aid reporting. In this case, the tables which describe the dimensions will already exist and will typically be normalized, so a snowflake schema will be easier to implement.
4. A snowflake schema can sometimes reflect the way users think about data. Users may prefer to generate queries using a star schema in some cases, although this may or may not be reflected in the underlying organization of the database.
5. Some users may wish to submit queries which, using conventional multidimensional reporting tools, cannot be expressed within a simple star schema. This is particularly common in data mining of customer databases, where a common requirement is to locate common factors between customers who bought products meeting complex criteria. Some snowflaking would typically be required to permit simple query tools to form such a query, especially if these forms of query were not anticipated when the data warehouse was first designed.
Constraint-Based Loading is a session-level property. When this option is selected in the session properties, the Integration Service orders the target load on a row-by-row basis: for every row, the transformed row is loaded first into the primary key table and then into the foreign key table. Target tables in the mapping should have one active source (targets in separate pipelines are not eligible) and should have key relationships that are non-circular in nature. Also, the session option 'Treat Source Rows As' should be set to Insert; updates cannot be used with constraint-based loading.
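The ordering idea behind constraint-based loading, namely that a parent (primary key) table must be loaded before any table holding a foreign key to it, and that the key relationships must be non-circular, amounts to a topological sort of the targets. The sketch below is not how Informatica implements it, just an illustration of the ordering rule, with hypothetical table names:

```python
from collections import defaultdict, deque

def load_order(targets, fk_edges):
    """Return a load order in which every parent (primary key) table
    precedes the tables holding foreign keys to it.

    fk_edges: (parent, child) pairs; they must be non-circular,
    mirroring the requirement stated above.
    """
    indegree = {t: 0 for t in targets}
    children = defaultdict(list)
    for parent, child in fk_edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = deque(t for t in targets if indegree[t] == 0)
    order = []
    while queue:
        t = queue.popleft()
        order.append(t)
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    if len(order) != len(targets):
        raise ValueError("circular key relationships are not allowed")
    return order

# Hypothetical targets: ORDERS references CUSTOMERS,
# ORDER_ITEMS references ORDERS.
order = load_order(["ORDER_ITEMS", "ORDERS", "CUSTOMERS"],
                   [("CUSTOMERS", "ORDERS"), ("ORDERS", "ORDER_ITEMS")])
print(order)  # ['CUSTOMERS', 'ORDERS', 'ORDER_ITEMS']
```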
A Target Load Order Group is a collection of source qualifiers, transformations, and targets in a mapping. Thus a mapping containing more than one pipeline is eligible for Target Load Order, which sets the order in which the Integration Service sends rows to targets in different target load order groups.
Differences:
1. Constraint-Based Loading requires targets to have only one active source, while Target Load Order is for targets having different active sources.
2. Constraint-Based Loading cannot be used to maintain referential integrity for updates, while Target Load Order can maintain referential integrity when inserting, deleting, or updating tables that have primary key and foreign key constraints.
5) Difference between normal and bulk loading. Where are they practically used?
19) From the below table, return the city names with more than 4 purchase orders.
Step 2: In the aggregator transformation, group by the key column and add a new port, call it count_rec, to count the key column.
Step 3: Connect a router to the aggregator from the previous step. In the router make two groups, one named "original" and another "duplicate".
In "original" write count_rec = 1, and in "duplicate" write count_rec > 1.
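The aggregator-plus-router steps above can be sketched in plain Python (the key values are made-up sample data): the aggregator becomes a group-by count, and the router's two groups become the two filter conditions on that count.

```python
from collections import Counter

# Hypothetical rows keyed on a single column; mirrors the aggregator
# (group by key, count per key) followed by a router with groups
# "original" (count_rec = 1) and "duplicate" (count_rec > 1).
rows = ["Delhi", "Mumbai", "Delhi", "Chennai", "Delhi", "Mumbai"]

count_rec = Counter(rows)                                  # aggregator
original  = [k for k, c in count_rec.items() if c == 1]    # router: original
duplicate = [k for k, c in count_rec.items() if c > 1]     # router: duplicate

print(original)   # ['Chennai']
print(duplicate)  # ['Delhi', 'Mumbai']
```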
21) From the below table, how do you get only the columns from table 2?
Table1   Table2
1        4
2        5
3
4
5
Pass all ports from the source qualifier to an expression. Add an extra port fed from a sequence generator with an end value of 3 and the Cycle option enabled, and pass its values through. Then pass all ports to a router and create three groups with the conditions NEXTVAL = 1, NEXTVAL = 2, and NEXTVAL = 3, and route them to three targets.
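The mapping described above, a sequence generator cycling 1..3 tagging each row and a router splitting on NEXTVAL, behaves like a round-robin distribution across three targets. A minimal sketch with made-up row values:

```python
from itertools import cycle

# The cycling sequence generator tags rows 1, 2, 3, 1, 2, 3, ...;
# the router sends each NEXTVAL group to its own target.
rows = [10, 20, 30, 40, 50, 60, 70]
targets = {1: [], 2: [], 3: []}

for row, nextval in zip(rows, cycle([1, 2, 3])):
    targets[nextval].append(row)   # router group: NEXTVAL = 1, 2, or 3

print(targets[1])  # [10, 40, 70]
print(targets[2])  # [20, 50]
print(targets[3])  # [30, 60]
```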
These were the questions asked to me; I could remember some of them. The others have not shared their questions yet, but I think they might share them with you separately.