SSIS Notes
==========
DB<---->DB
File<---->File
File<---->DB
DB<---->File
csv--file
excel --file
xml--db
--Data flow represents the flow of data. It is used to transform the data from
source to destination by applying business rules.
Data flow consists of sources, destinations, common transformations and
other transformations.
6)It manages all connections used by the different tasks and adapters in the package.
1.For loop Container: It repeats the control flow a specified number of
times, depending on a condition.
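As a rough illustration of the For Loop container, the Python sketch below mimics its three parts: an initialization that runs once, a condition checked before every pass, and an assignment that runs after every pass. In SSIS these correspond to the container's InitExpression, EvalExpression and AssignExpression properties. The function name `run_for_loop` and the lambdas are made up for this example.

```python
def run_for_loop(init, evaluate, assign, body):
    """Mimic a For Loop container: init once, repeat body while evaluate() is true."""
    state = init()              # InitExpression: runs once before the first pass
    passes = 0
    while evaluate(state):      # EvalExpression: checked before every pass
        body(state)             # the tasks placed inside the container
        state = assign(state)   # AssignExpression: runs after every pass
        passes += 1
    return passes


executed = []
passes = run_for_loop(
    init=lambda: 0,
    evaluate=lambda i: i < 3,
    assign=lambda i: i + 1,
    body=lambda i: executed.append(i),
)
print(passes, executed)
```

The loop stops as soon as the condition turns false, so a condition that is false at the start means the body never runs.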
3.Character map: It is used to apply string operations to character data, such as
converting between character sets (for example Simplified and Traditional Chinese)
and changing the case of a string (upper, lower).
6.Copy column: It creates a new column by copying an input column and adding the
new column to the transformation output.
7.Derived column: It is used to convert data to different data types, and to
perform mathematical and string operations with the help of predefined
functions.
8.Union all: It is used to combine multiple inputs with the same structure into one output.
9.Merge: It is used to combine two sorted data sets into a single data set.
10.Merge join: It produces an output by joining two sorted inputs
using a full, left or inner join.
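The idea behind Merge join can be sketched in Python as a two-pointer walk over two inputs that are already sorted on the join key. This is only an illustration of the technique, not how SSIS implements it. The function `merge_join` is hypothetical, and it assumes the key is unique within each input. Unmatched rows are emitted without the other side's columns.

```python
def merge_join(left, right, key, how="inner"):
    """Join two lists of dicts that are already sorted by `key` (keys unique per input)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk == rk:                      # match: emit the combined row
            out.append({**left[i], **right[j]})
            i += 1
            j += 1
        elif lk < rk:                     # left row has no partner
            if how in ("left", "full"):
                out.append(dict(left[i]))
            i += 1
        else:                             # right row has no partner
            if how == "full":
                out.append(dict(right[j]))
            j += 1
    if how in ("left", "full"):           # leftovers on the left side
        out.extend(dict(row) for row in left[i:])
    if how == "full":                     # leftovers on the right side
        out.extend(dict(row) for row in right[j:])
    return out


emp = [
    {"empno": 1001, "ename": "shiva", "deptno": 10},
    {"empno": 1002, "ename": "ramu", "deptno": 20},
    {"empno": 1003, "ename": "fuel", "deptno": 30},
]
dept = [
    {"deptno": 10, "dname": "it"},
    {"deptno": 20, "dname": "sales"},
    {"deptno": 40, "dname": "it"},
]
print(len(merge_join(emp, dept, "deptno", "inner")))
print(len(merge_join(emp, dept, "deptno", "left")))
print(len(merge_join(emp, dept, "deptno", "full")))
```

Both inputs must be sorted on the key beforehand, because the walk only ever moves forward through them.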
It is used to get the relevant information from a reference table based on the
key column.
13.OLEDB command: It is used to execute a SQL statement or stored procedure for each
row in the input.
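A minimal sketch of the "one statement per input row" behaviour, using Python's built-in `sqlite3` module as a stand-in for an OLE DB connection. The `emp` table, its columns and the sample values are assumptions made for this example.

```python
import sqlite3

# In-memory database standing in for the target of the OLE DB connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [(1001, 3000), (1002, 3000)])

# Rows arriving from the data flow (hypothetical values).
input_rows = [{"empno": 1001, "sal": 3500}, {"empno": 1002, "sal": 3200}]

# The same parameterized statement is executed once for every input row.
for row in input_rows:
    conn.execute("UPDATE emp SET sal = ? WHERE empno = ?", (row["sal"], row["empno"]))
conn.commit()

result = conn.execute("SELECT empno, sal FROM emp ORDER BY empno").fetchall()
print(result)
```

Because the statement runs once per row, this pattern gets slow on large inputs.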
15.Fuzzy grouping: It performs data cleaning tasks by identifying rows of data that
are likely to be duplicates and selecting a canonical row of data to use in
standardizing the data. In other words, it groups rows on an approximate match.
16.Fuzzy lookup: It performs data cleaning tasks such as standardizing data,
correcting data and providing missing values. In other words, it finds an
approximate match for the given row values against the rows of a lookup table.
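To make "approximate match" concrete, here is a small Python sketch built on the standard `difflib` module. It is not the matching algorithm SSIS uses. The function names, the cutoff values and the sample city names are all assumptions for illustration.

```python
import difflib


def fuzzy_lookup(value, reference, cutoff=0.6):
    """Return the closest reference value, or None when nothing is similar enough."""
    matches = difflib.get_close_matches(value, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def fuzzy_group(values, cutoff=0.8):
    """Map each value to a canonical value: the first earlier value it closely matches."""
    canonical = []
    mapping = {}
    for v in values:
        match = difflib.get_close_matches(v, canonical, n=1, cutoff=cutoff)
        if match:
            mapping[v] = match[0]   # likely duplicate of an earlier value
        else:
            canonical.append(v)     # first of its kind, becomes canonical
            mapping[v] = v
    return mapping


reference_cities = ["Hyderabad", "Bangalore", "Chennai"]
print(fuzzy_lookup("Hydrabad", reference_cities))
print(fuzzy_lookup("Tokyo", reference_cities))
print(fuzzy_group(["Hyderabad", "Hydrabad", "Chennai"]))
```

The cutoff plays the role of a similarity threshold: raising it makes matches stricter, lowering it accepts looser matches.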
17. Slowly Changing Dimension: Used to synchronize changes in the OLTP database
tables into the data warehouse dimension tables.
SCD 1: When changes occur in the source, the existing content in the destination
is simply updated and overwritten.
SCD 3: Historical data is maintained, but instead of keeping the entire record as
history, only the columns that are going to be updated are maintained.
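The difference between SCD 1 and SCD 3 can be sketched in Python on a single dimension row. The function names, the `prev_` column prefix and the sample values are assumptions for this example. A real dimension table would name its history columns however the designer chooses.

```python
def apply_scd1(dimension_row, source_row, columns):
    """Type 1: overwrite the tracked columns; no history is kept."""
    updated = dict(dimension_row)
    for col in columns:
        updated[col] = source_row[col]
    return updated


def apply_scd3(dimension_row, source_row, columns):
    """Type 3: keep the old value in a prev_<column> field, then store the new value."""
    updated = dict(dimension_row)
    for col in columns:
        if updated[col] != source_row[col]:
            updated["prev_" + col] = updated[col]   # history for this column only
            updated[col] = source_row[col]
    return updated


dim = {"empno": 1001, "ename": "shiva", "loc": "hyd"}
src = {"empno": 1001, "ename": "shiva", "loc": "pune"}
print(apply_scd1(dim, src, ["loc"]))
print(apply_scd3(dim, src, ["loc"]))
```

With type 1 the old location is lost. With type 3 it survives in one extra column, so only the previous value is available, not the full change history.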
3. Lookup
1. Data Conversion: Data type mismatch.
Ex: Passing an int where a String is expected.
2. Expression Evaluation: Expression errors raised by performing an invalid
operation.
Ex: Passing a different operator instead of '+'.
3. Lookup errors: Occur because the lookup operation fails to locate a match in the
lookup table.
Ex: Mismatch in the lookup table.
Lookup Transformation
The Lookup transformation performs exact-match lookups by joining data in input
columns with columns in a reference data set.
Note: the Lookup transformation supports the following database providers for the
OLEDB connection manager:
--SQL Server
--Oracle
--DB2
The lookups performed by the Lookup transformation are case sensitive.
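The effect of exact, case-sensitive matching can be seen in a small Python sketch. The `lookup` function, the reference dictionary and the sample rows are hypothetical. Normalizing the case of the key before the lookup, as done at the end, is one possible workaround and not the only one.

```python
def lookup(rows, reference, key):
    """Exact-match lookup: split rows into matched and unmatched by key."""
    matched, no_match = [], []
    for row in rows:
        ref = reference.get(row[key])       # dict keys compare case sensitively
        if ref is None:
            no_match.append(row)
        else:
            matched.append({**row, **ref})
    return matched, no_match


reference = {"IT": {"loc": "hyd"}, "SALES": {"loc": "hyd"}}
rows = [{"ename": "shiva", "dname": "IT"}, {"ename": "ramu", "dname": "sales"}]

matched, no_match = lookup(rows, reference, "dname")
print(matched)     # only the "IT" row matches
print(no_match)    # "sales" does not match "SALES"

# Upper-casing the key first lets both rows match.
normalized = [{**r, "dname": r["dname"].upper()} for r in rows]
matched_all, no_match_all = lookup(normalized, reference, "dname")
print(len(matched_all), len(no_match_all))
```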
Fuzzy lookup: The Fuzzy Lookup transformation uses fuzzy matching to return one
or more close matches from the reference table.
The Fuzzy Lookup transformation includes three features for customizing the lookup
it performs.
1)conditional split:
1)audit transformation:
it is used to populate audit information such as package name, machine name,
execution time, etc.
we can populate the same information using the derived column t/r.
--it adds the audit information to every row coming from the source.
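A rough Python sketch of what the audit t/r does to the rows passing through it: the same audit values are appended to every row. The function name, the column names and the package name "LoadEmp" are assumptions for this example.

```python
import socket
from datetime import datetime


def add_audit_columns(rows, package_name):
    """Append the same audit values to every row passing through."""
    audit = {
        "package_name": package_name,
        "machine_name": socket.gethostname(),
        "execution_start_time": datetime.now().isoformat(timespec="seconds"),
    }
    # The audit values are computed once and copied onto each row.
    return [{**row, **audit} for row in rows]


source_rows = [{"empno": 1001}, {"empno": 1002}]
for row in add_audit_columns(source_rows, "LoadEmp"):
    print(row)
```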
3)copy column: by using this t/r we can copy the data from an existing
column to a new column. in the same way, using the derived column t/r we can
copy the data from an existing column to a new column.
4)row count: it is used to capture the number of records into a variable in the data
flow.
6)oledb command: it is used to execute SQL statements dynamically for each and
every record in the input.
7)scd:
To process the data from granularity tables to the main table, we follow a
mechanism called slowly changing dimension.
SCD defines how changes seen in the source are maintained in the target.
each and every record inserted in the source also has to be
inserted in the target table.
9)Lookup: the look up t/r is used to get the relevant information from the reference
table based on the key field.
this look up t/r is used in the slowly changing dimension to check whether
the incoming record already exists in the target table.
Example: emp (input) and dept (reference)
empno  ename  sal   deptno   |   deptno  dname  loc
1001   shiva  3000  10       |   10      it     hyd
1002   ramu   3000  20       |   20      sales  hyd
1003   fuel   4000  30       |   40      it     hyd
Full cache: the reference table is stored in cache memory (buffer) and the database
link is then destroyed (disconnected mode). needs more memory, high performance.
run the package again after the reference data is refreshed.
No cache: connected mode. the look up table is locked and performance is low,
so it is best avoided.
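The two cache modes can be compared with a small Python sketch that uses the emp and dept sample rows from these notes. `sqlite3` stands in for the reference database. The function names are hypothetical and the code only imitates the access pattern (one read up front versus one query per row), not SSIS internals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT, loc TEXT)")
conn.executemany(
    "INSERT INTO dept VALUES (?, ?, ?)",
    [(10, "it", "hyd"), (20, "sales", "hyd"), (40, "it", "hyd")],
)

emp = [(1001, "shiva", 3000, 10), (1002, "ramu", 3000, 20), (1003, "fuel", 4000, 30)]


def lookup_full_cache(rows):
    # Read the whole reference table once, then match every row in memory.
    cache = {d: (n, l) for d, n, l in conn.execute("SELECT deptno, dname, loc FROM dept")}
    return [(r[0], cache.get(r[3])) for r in rows]


def lookup_no_cache(rows):
    # Query the reference table once for every input row.
    out = []
    for r in rows:
        hit = conn.execute(
            "SELECT dname, loc FROM dept WHERE deptno = ?", (r[3],)
        ).fetchone()
        out.append((r[0], hit))
    return out


print(lookup_full_cache(emp))
print(lookup_no_cache(emp))
```

Both functions return the same matches, and empno 1003 finds no department because deptno 30 is missing from dept. The difference is only in how often the database is touched.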
Look up:
10)Row sampling:
it is used to select a specific number of random rows from the input dataset.
it produces a sampling selected output as well as a sampling unselected
output.
11)Percentage sampling:
it is used to select a specific percentage of random rows from
the input dataset.
it produces a sampling selected output as well as a sampling unselected
output.
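Both sampling t/r can be sketched in Python with the standard `random` module. The function names and the `seed` parameter are assumptions for this example. The percentage version here decides row by row, so it returns roughly the requested share rather than an exact count.

```python
import random


def row_sampling(rows, n, seed=None):
    """Pick exactly n random rows; return (selected, unselected)."""
    rng = random.Random(seed)
    picked = set(rng.sample(range(len(rows)), n))
    selected = [r for i, r in enumerate(rows) if i in picked]
    unselected = [r for i, r in enumerate(rows) if i not in picked]
    return selected, unselected


def percentage_sampling(rows, percent, seed=None):
    """Keep each row with probability percent/100; return (selected, unselected)."""
    rng = random.Random(seed)
    selected, unselected = [], []
    for r in rows:
        (selected if rng.random() * 100 < percent else unselected).append(r)
    return selected, unselected


rows = list(range(100))
sel, unsel = row_sampling(rows, 10, seed=1)
print(len(sel), len(unsel))
sel_pct, unsel_pct = percentage_sampling(rows, 10, seed=1)
print(len(sel_pct) + len(unsel_pct))
```

In both cases every input row ends up in exactly one of the two outputs.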