
Q1> If you join the below tables, what will be the output and how many records will come out?


Table-1              Table-2
Col-1   Col-2        Col-1   Col-2
A       1            1       x
B       1            1       y
C       1

Ans: 6 records (see the SQL sketch below):
A x
A y
B x
B y
C x
C y
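
A quick SQL sketch of the same join; the table and column names are only illustrative, not from the original question:

-- Illustrative tables
CREATE TABLE t1 (col1 VARCHAR2(10), col2 NUMBER);
CREATE TABLE t2 (col1 NUMBER, col2 VARCHAR2(10));

INSERT INTO t1 VALUES ('A', 1);
INSERT INTO t1 VALUES ('B', 1);
INSERT INTO t1 VALUES ('C', 1);
INSERT INTO t2 VALUES (1, 'x');
INSERT INTO t2 VALUES (1, 'y');

-- Equi-join on the key column: every row of t1 matches both rows of t2,
-- so 3 x 2 = 6 rows come out (A x, A y, B x, B y, C x, C y).
SELECT t1.col1, t2.col2
  FROM t1
  JOIN t2 ON t1.col2 = t2.col1;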

1) How to update an SCD without using a Lookup?


An alternative to the Lookup is the Joiner. Import the target's structure as a source in the Source Analyzer, bring it into the mapping and use it for the comparison, just as you would with a Lookup. In other words, SCD Type 1 can be built manually with a Joiner transformation: import the target as a source, join it to the incoming data, and use an Expression to decide whether to insert or update the rows in the target.
Or
In the session properties, enable the 'Update else Insert' option on the target.
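
As a rough SQL analogue of the same update-else-insert (SCD Type 1 overwrite) logic; the table and column names below are assumptions, not from the mapping:

MERGE INTO dim_customer tgt
USING stg_customer src
   ON (tgt.customer_id = src.customer_id)
 WHEN MATCHED THEN
   UPDATE SET tgt.customer_name = src.customer_name,  -- existing row: overwrite attributes
              tgt.city          = src.city
 WHEN NOT MATCHED THEN
   INSERT (customer_id, customer_name, city)          -- new row: insert it
   VALUES (src.customer_id, src.customer_name, src.city);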

2) Diff. b/w star and snowflake schema.


Star and snowflake are the most common types of dimensional modeling. A perennially debated question in the data warehousing context is which one works better. You will hear arguments favouring both sides; however, the question is incomplete without mentioning the system/business. The decision whether to employ a star schema or a snowflake schema should consider the relative strengths of the database platform in question and the query tool to be employed.
Star Schemas
The star schema is the simplest data warehouse schema. It is called a star schema because the diagram
resembles a star, with points radiating from a center. The center of the star consists of one or more fact
tables and the points of the star are the dimension tables.
Snowflake Schema
The snowflake schema is a variation of the star schema used in a data warehouse. The snowflake schema (sometimes called the snowflake join schema) is a more complex schema than the star schema because the tables which describe the dimensions are normalized.
Star vs Snowflake

Which data warehouse? Snowflake: good for large data warehouses. Star: good for small data warehouses/data marts.
Normalization (dimension tables): Snowflake: Third Normal Form (3NF). Star: denormalized.
Ease of use: Snowflake: more complex queries and hence less easy to understand. Star: less complex queries and easy to understand.
Ease of maintenance/change: Snowflake: no redundancy and hence easier to maintain and change. Star: has redundant data and hence less easy to maintain/change.
Query performance: Snowflake: more foreign keys and hence longer query execution time. Star: fewer foreign keys and hence shorter query execution time.


3) Even though we have the star schema, why is the snowflake schema used?
The star schema is one of the most commonly used styles of data warehouse schema. It consists of a few fact tables (possibly only one, justifying the name) referencing any number of dimension tables. The star schema is considered an important special case of the snowflake schema. Note that in a star schema the different dimensions are not related to one another.
Star schema illustration

The diagram above illustrates what the schema looks like; the name "star" comes from the central entity being surrounded by other entities, which resembles a star. All measures in the fact table are related to all the dimensions that the fact table is related to. In other words, they all have the same level of granularity.
A star schema can be simple or complex. A simple star consists of one fact table; a complex star can
have more than one fact table. Let's look at an example: Assume our data warehouse keeps store sales
data, and the different dimensions are time, store, product, and customer. In this case, the sales fact
table will be at the center of the diagram (above) and the dimension tables time, store, product and customer will surround it, connected to the fact table by primary key-foreign key relationships.

A snowflake schema describes a star schema structure that has been normalized through the use of outrigger tables, i.e. dimension table hierarchies are broken into simpler tables. This logical arrangement of tables in a multidimensional database makes the entity-relationship diagram resemble a snowflake in shape.
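
As an illustrative sketch (the table and column names are made up), the same product dimension in the two styles:

-- Star: one denormalized dimension table
CREATE TABLE dim_product_star (
  product_key   NUMBER PRIMARY KEY,
  product_name  VARCHAR2(100),
  category_name VARCHAR2(100),   -- category attributes repeated on every product row
  category_desc VARCHAR2(200)
);

-- Snowflake: the category hierarchy is normalized into an outrigger table
CREATE TABLE dim_category (
  category_key  NUMBER PRIMARY KEY,
  category_name VARCHAR2(100),
  category_desc VARCHAR2(200)
);

CREATE TABLE dim_product_snow (
  product_key   NUMBER PRIMARY KEY,
  product_name  VARCHAR2(100),
  category_key  NUMBER REFERENCES dim_category (category_key)
);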
Where is snowflake schema used?
The star and snowflake schemas are most commonly found in dimensional data warehouses and data marts, where speed of data retrieval is more important than the efficiency of data manipulations. As such, the tables in these schemas are not highly normalized and are frequently designed at a level of normalization short of third normal form. An example of a snowflake schema is shown in the diagram below.
The decision whether to employ a star schema or a snowflake schema should consider the relative
strengths of the database platform in question and the query tool to be employed. Star schema should
be favored with query tools that largely expose users to the underlying table structures, and in
environments where most queries are simpler in nature. Snowflake schemas are often better with more
sophisticated query tools that isolate users from the raw table structures and for environments having
numerous queries with complex criteria.
Benefits of snowflake schema design
Some OLAP multidimensional database modeling tools that use dimensional data marts as a data source are optimized for snowflake schemas. If a dimension is very sparse (i.e. most of the possible values for the dimension have no data) and/or a dimension has a very long list of attributes which may be used in a query, the dimension table may occupy a significant proportion of the database and snowflaking may be appropriate.
A multidimensional view is sometimes added to an existing transactional database to aid reporting. In this case, the tables which describe the dimensions will already exist and will typically be normalized. A snowflake schema will therefore be easier to implement.
A snowflake schema can sometimes reflect the way in which users think about data. Users may prefer to generate queries using a star schema in some cases, although this may or may not be reflected in the underlying organization of the database.
Some users may wish to submit queries to the database which, using conventional multidimensional reporting tools, cannot be expressed within a simple star schema. This is particularly common in data mining of customer databases, where a common requirement is to locate common factors between customers who bought products meeting complex criteria. Some snowflaking would typically be required to permit simple query tools to form such a query, especially if these forms of query were not anticipated when the data warehouse was first designed.

4) What is load order?


Constraint-Based Loading and Target Load Order
Define Informatica's constraint-based loading and target load order. What is the difference between the two?

Constraint-based loading is a session-level property. When this option is selected in the session properties, the Integration Service orders the target load on a row-by-row basis: for every row, the transformed row is loaded first to the primary key table and then to the secondary (foreign key) table. Target tables in the mapping should have one active source (targets in separate pipelines do not qualify) and should have key relationships that are non-circular in nature. Also, the session option 'Treat Source Rows As' should be set to Insert; updates cannot be used with constraint-based loading.
A target load order group is a collection of source qualifiers, transformations and targets in a mapping. A mapping containing more than one pipeline is therefore eligible for target load order. Target load order sets the order in which the Integration Service sends rows to targets in different target load order groups.
Differences:

1. Constraint-based loading requires targets to have only one active source, while target load order is for targets having different active sources.
2. Constraint-based loading cannot be used to maintain referential integrity for updates, while target load order can be used to maintain referential integrity when inserting, deleting or updating tables that have primary key and foreign key constraints (a small SQL sketch of the parent-child order follows).
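
In plain SQL terms, constraint-based loading simply guarantees the parent-first insert order that a foreign key demands. The dept/emp tables below are hypothetical:

CREATE TABLE dept (dept_id NUMBER PRIMARY KEY,
                   dept_name VARCHAR2(50));
CREATE TABLE emp  (emp_id  NUMBER PRIMARY KEY,
                   dept_id NUMBER REFERENCES dept (dept_id));

-- The parent (primary key) table must receive the row first ...
INSERT INTO dept VALUES (10, 'Sales');
-- ... otherwise this child (foreign key) insert would fail:
INSERT INTO emp VALUES (1, 10);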

5) Diff. b/w normal and bulk loading. Where are they practically used?

6) How to identify duplicates without using distinct.


Select empno, count(*) from emp group by empno having count(*) > 1;
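
To see the duplicate rows themselves (not just the duplicated keys), an analytic count can be used; this sticks with the emp/empno example above:

SELECT *
  FROM (SELECT e.*,
               COUNT(*) OVER (PARTITION BY empno) AS cnt   -- rows sharing this empno
          FROM emp e)
 WHERE cnt > 1;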

7) Diff. b/w count(*) and count(1).


Count(*) and count(1) behave the same: both count every row, because 1 is a constant and is never NULL. It is count(column_name) that ignores NULL values in that column and returns only the count of non-NULL values.
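
A quick check illustrates this; it assumes an emp table in which the comm column contains NULLs:

SELECT COUNT(*)    AS cnt_star,   -- every row
       COUNT(1)    AS cnt_one,    -- also every row; 1 is never NULL
       COUNT(comm) AS cnt_comm    -- only rows where comm IS NOT NULL
  FROM emp;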

8) What are primary and unique keys?


Both are used to maintain uniqueness in a column. A primary key does not allow any NULL values, whereas a unique key does allow NULLs (in Oracle a unique column can hold multiple NULL rows, since NULLs are not compared; some other databases allow only one).
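A hypothetical sketch of the two constraints:

CREATE TABLE customers (
  customer_id NUMBER        CONSTRAINT pk_customers       PRIMARY KEY,  -- NOT NULL + unique
  email       VARCHAR2(100) CONSTRAINT uq_customers_email UNIQUE        -- unique, NULLs allowed
);
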
9) Diff. b/w lookup and joiner. What fns are used in both of them?
Both can be used to join data, but non-equi joins (conditions using operators such as <, >, <=, >=, !=) are possible only in a Lookup; the Joiner transformation supports only equality join conditions.

10) From “nagarjuna.reddy.uk” how to extract “reddy”.


Substr('nagarjuna.reddy.uk', instr('nagarjuna.reddy.uk', '.', 1, 1) + 1,
       instr('nagarjuna.reddy.uk', '.', 1, 2) - instr('nagarjuna.reddy.uk', '.', 1, 1) - 1)
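
Run against DUAL this returns 'reddy': the first INSTR finds the first dot (position 10), the second finds the second dot (position 16), so the SUBSTR starts at position 11 for a length of 5.

SELECT SUBSTR('nagarjuna.reddy.uk',
              INSTR('nagarjuna.reddy.uk', '.', 1, 1) + 1,
              INSTR('nagarjuna.reddy.uk', '.', 1, 2)
              - INSTR('nagarjuna.reddy.uk', '.', 1, 1) - 1) AS middle_part
  FROM dual;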

11) What is performance tuning and how do you identify bottlenecks?


12) What are degenerate dimensions? Where are they located and why are they used? Practically trace them in your project.
13) What is a conformed dimension? Where is it practically used?
14) What is a factless fact table? Where is it located? Practically trace them in your project.
15) What is a target update override and why is it used?
16) If you don't have any aggregate function in an Aggregator, what will be the output?
It returns the last row of each group (and, if no group-by port is defined, only the last row of the input).
17) If you have sessions s1, s2, s3, s4 and s5, how can you run s5 only on successful completion of s2?
Use a link condition on the link into s5 (for example $s2.Status = SUCCEEDED), or tasks such as Decision, Event Wait and Event Raise.
18) What is the diff. b/w mapplets and reusable transformations? Why are they used and where are they used?

19) From the below table, return the city names that have more than 4 purchase orders.

City   Purchase Order
Pune   p1
Pune   p2
Pune   p3
Pune   p4
Pune   p5
Hyd    h1
Hyd    h2
Hyd    h3
Select city from pur_ord group by city having count(purchase_order) > 4;

20) How to route distinct and duplicates to two diff. targets.


Step 1: Drag the source into the mapping and connect it to an Aggregator transformation.

Step 2: In the Aggregator transformation, group by the key column and add a new output port, call it count_rec, to count the key column.

Step 3: Connect a Router to the Aggregator from the previous step. In the Router create two groups, one named "original" and another named "duplicate".
In the "original" group use the condition count_rec = 1 and in the "duplicate" group use count_rec > 1.
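
As a rough SQL analogue of the two router groups, reusing the emp/empno example from question 6:

-- "original" group: keys that occur exactly once
SELECT empno FROM emp GROUP BY empno HAVING COUNT(*) = 1;

-- "duplicate" group: keys that occur more than once
SELECT empno FROM emp GROUP BY empno HAVING COUNT(*) > 1;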

21) From the below tables, how do you get only the values that are in Table 2?

Table 1   Table 2
1         4
2         5
3
4
5
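
The question is ambiguous, so the sketch below is an assumption about the intent: INTERSECT returns the values common to both tables (4 and 5); if the intent is the values of Table 1 that are not in Table 2, MINUS does that instead. Table names are illustrative:

SELECT col1 FROM table1
INTERSECT
SELECT col1 FROM table2;   -- 4, 5

SELECT col1 FROM table1
MINUS
SELECT col1 FROM table2;   -- 1, 2, 3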

22) SRC              TGT
    C1  C2           C1  C2
    1   a            1   a
    1   b            1   a,b
    1   c            1   a,b,c
    2   a            2   a
    2   b            2   a,b
    3   x            3   x

From the above, produce the target result without using an Aggregator.
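
In Informatica this is typically done with a sorted source and an Expression transformation whose variable ports accumulate C2 and reset when C1 changes. Purely as a SQL sketch of the same running concatenation (Oracle LISTAGG; the table/column names and the assumption that C2 sorts alphabetically are mine):

SELECT s.c1,
       (SELECT LISTAGG(s2.c2, ',') WITHIN GROUP (ORDER BY s2.c2)
          FROM src s2
         WHERE s2.c1 = s.c1
           AND s2.c2 <= s.c2) AS c2_running   -- 'a', 'a,b', 'a,b,c', ...
  FROM src s
 ORDER BY s.c1, s.c2;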


23) SRC
    C1  C2
    1   a
    2   b
    3   c
    4   e
    5   f
    6   g
    7   h
    8   i

    Targets: TGT1, TGT2, TGT3

From the above table, route the 1st row to TGT1, the 2nd to TGT2, the 3rd to TGT3, and so on for all rows.

Srcsqexp  rout  tgt1,tgt2,tgt3


Seq_gen 

Pass all ports from sq to an exp create an extra port from seq_gen taking next value=3 and
enabling cycle option pass values to it them pass all ports to a router and there take three
groups with condition as next val=1,nextval=2 and next val=3 and pass them to three tgts…
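
A SQL way to visualise the same round-robin assignment (the source table name is illustrative):

SELECT s.*,
       MOD(ROWNUM - 1, 3) + 1 AS target_no   -- cycles 1, 2, 3, 1, 2, 3, ...
  FROM src s;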

24) How do you achieve dynamic flat file generation?


It can be achieved with a Transaction Control transformation, typically together with the FileName port on the flat-file target so that the output file name changes dynamically with the data.
25) How do you count the number of records in a file using a shell script?
$ wc -l filename

These were the questions asked to me, at least the ones I could remember. Others have not shared their questions; I think they might share them with you separately.
