
Strategies for Incremental Load

05/11/2008

Hi-Tech ISU

Manish Gupta
[email protected]
Requirement

The requirement is to migrate data from Table A in a source database to Table B in a
data warehouse.

Assumptions

Table A receives both inserts and updates. These changes must be reflected in
Table B.

Solution

• One initial load moves all the historical records from Table A to Table B.
• During each subsequent incremental load, records that are new in Table A are
inserted into Table B, and records that were updated in Table A are updated in
Table B.

TCS Public
Strategies

Four strategies can be used for data migration during incremental loads:

1. Complete Refresh of Data

The approach is to delete all rows from the data warehouse table (Table B) and copy
the entire source table (Table A) on every load.
Pros: This is the easiest approach because no logic is required to find delta records.
Also, the same code is used for the initial and delta loads.
Cons: This is not practical for large-volume databases, because every load copies the
full table.
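
A complete refresh could be sketched as follows. This is a minimal illustration, not the author's implementation: it uses Python's built-in sqlite3 module with an in-memory database, and the table names table_a and table_b and the columns id, name and amount are assumptions made for the example.

```python
import sqlite3

def full_refresh(conn: sqlite3.Connection) -> None:
    """Empty table_b, then copy every row of table_a into it."""
    with conn:  # commit the delete and the copy together
        conn.execute("DELETE FROM table_b")
        conn.execute(
            "INSERT INTO table_b (id, name, amount) "
            "SELECT id, name, amount FROM table_a"
        )

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT, amount REAL);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT, amount REAL);
    INSERT INTO table_a VALUES (1, 'alpha', 10.0), (2, 'beta', 20.0);
""")

full_refresh(conn)  # initial load
conn.execute("UPDATE table_a SET amount = 25.0 WHERE id = 2")
conn.execute("INSERT INTO table_a VALUES (3, 'gamma', 30.0)")
full_refresh(conn)  # a later load runs exactly the same code

rows = conn.execute("SELECT id, name, amount FROM table_b ORDER BY id").fetchall()
```

The second call picks up both the update and the new row without any delta logic, at the cost of rewriting the whole table.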

2. Compare All Records in Both Database and Data Warehouse and Write the Deltas
The approach is to compare each field in the source (Table A) with the corresponding
field in the data warehouse (Table B), identify the changes, and insert or update the
affected records in Table B.
Pros: None. This approach may be necessary if the source table (Table A) has no
column that identifies delta records, but even then a complete refresh of the data is
a better option.
Cons: This is not practical for large-volume databases.
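
The full comparison might look like the sketch below. It is only an illustration under assumed names (table_a, table_b, and the columns id, name and amount are hypothetical), and it assumes that id is a key shared by both tables. Note that it reads every row of both tables, which is why the approach does not scale.

```python
import sqlite3

def compare_and_apply(conn: sqlite3.Connection) -> tuple:
    """Compare every row of table_a with table_b and write only the differences."""
    query = "SELECT id, name, amount FROM {}"
    source = {row[0]: row for row in conn.execute(query.format("table_a"))}
    target = {row[0]: row for row in conn.execute(query.format("table_b"))}

    # Rows missing from the warehouse are inserts; rows whose fields differ are updates.
    inserts = [row for key, row in source.items() if key not in target]
    updates = [row for key, row in source.items()
               if key in target and target[key] != row]

    with conn:
        conn.executemany(
            "INSERT INTO table_b (id, name, amount) VALUES (?, ?, ?)", inserts)
        conn.executemany(
            "UPDATE table_b SET name = ?, amount = ? WHERE id = ?",
            [(name, amount, key) for key, name, amount in updates])
    return len(inserts), len(updates)

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT, amount REAL);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT, amount REAL);
    INSERT INTO table_a VALUES (1, 'alpha', 10.0), (2, 'beta', 25.0), (3, 'gamma', 30.0);
    INSERT INTO table_b VALUES (1, 'alpha', 10.0), (2, 'beta', 20.0);
""")

inserted, updated = compare_and_apply(conn)
rows = conn.execute("SELECT id, name, amount FROM table_b ORDER BY id").fetchall()
```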

3. Identify and Process Delta Records

The approach is to identify new and updated records in the source table (Table A) and
write only those records to the data warehouse table (Table B).
Pros: This can be the best approach if delta records (new and updated) can be
identified, because only a few records are processed in each subsequent load.
Cons: This strategy can be difficult to implement because:
• The source table (Table A) might not have any column that identifies delta
records.
• Logic must be implemented to identify delta records.
• Separate code is needed for the initial and delta loads.
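
One common way to identify delta records is a last-modified timestamp used as a watermark. The sketch below assumes that Table A has such a column, here called updated_at, which the source does not state; the table names and other columns are hypothetical as well. It shows only the delta step, and uses SQLite's INSERT OR REPLACE as a simple stand-in for insert-or-update logic.

```python
import sqlite3

def load_delta(conn: sqlite3.Connection, last_loaded_at: str) -> str:
    """Copy rows changed after last_loaded_at and return the new watermark."""
    with conn:
        # INSERT OR REPLACE inserts new ids and overwrites rows whose id already exists.
        conn.execute(
            "INSERT OR REPLACE INTO table_b (id, name, amount, updated_at) "
            "SELECT id, name, amount, updated_at FROM table_a "
            "WHERE updated_at > ?",
            (last_loaded_at,),
        )
    newest = conn.execute("SELECT MAX(updated_at) FROM table_b").fetchone()[0]
    return newest or last_loaded_at

# Hypothetical schema and sample data; updated_at is an assumed column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT, amount REAL, updated_at TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT, amount REAL, updated_at TEXT);
    INSERT INTO table_a VALUES (1, 'alpha', 10.0, '2008-05-01'), (2, 'beta', 20.0, '2008-05-01');
    -- Initial load: separate code that copies all historical records.
    INSERT INTO table_b SELECT id, name, amount, updated_at FROM table_a;
""")
watermark = "2008-05-01"

# Changes made in the source after the initial load.
conn.execute("UPDATE table_a SET amount = 25.0, updated_at = '2008-05-02' WHERE id = 2")
conn.execute("INSERT INTO table_a VALUES (3, 'gamma', 30.0, '2008-05-03')")

watermark = load_delta(conn, watermark)
rows = conn.execute("SELECT id, amount, updated_at FROM table_b ORDER BY id").fetchall()
```

The returned watermark would need to be stored between runs so that the next load starts where this one ended.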

4. Real-Time Load
The approach is to load an intermediate table with only the delta records as soon as
records are inserted or updated in the source table (Table A). The intermediate table
can be loaded by a trigger on the source table (Table A) or by some other mechanism.
Table B is then loaded from the intermediate table.
Pros: This is the best and easiest approach because only the intermediate table needs
to be processed.
Cons: A table similar to the source table (Table A) needs to be created at the source.
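
A trigger-based version of this strategy might be sketched as follows, using SQLite triggers through Python's sqlite3 module. The intermediate table table_a_delta, the trigger names, and the columns are all hypothetical. The sketch also assumes that only the latest state of each record matters, so the intermediate table keeps one row per id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT, amount REAL);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT, amount REAL);
    -- Hypothetical intermediate table with the same shape as table_a.
    CREATE TABLE table_a_delta (id INTEGER PRIMARY KEY, name TEXT, amount REAL);

    -- Capture every new or changed row as soon as it is written to table_a.
    CREATE TRIGGER trg_a_insert AFTER INSERT ON table_a BEGIN
        INSERT OR REPLACE INTO table_a_delta VALUES (NEW.id, NEW.name, NEW.amount);
    END;
    CREATE TRIGGER trg_a_update AFTER UPDATE ON table_a BEGIN
        INSERT OR REPLACE INTO table_a_delta VALUES (NEW.id, NEW.name, NEW.amount);
    END;
""")

def load_from_intermediate(conn: sqlite3.Connection) -> None:
    """Apply the captured rows to table_b, then empty the intermediate table."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO table_b (id, name, amount) "
            "SELECT id, name, amount FROM table_a_delta"
        )
        conn.execute("DELETE FROM table_a_delta")

conn.execute("INSERT INTO table_a VALUES (1, 'alpha', 10.0), (2, 'beta', 20.0)")
load_from_intermediate(conn)  # first load

conn.execute("UPDATE table_a SET amount = 25.0 WHERE id = 2")
conn.execute("INSERT INTO table_a VALUES (3, 'gamma', 30.0)")
pending = conn.execute("SELECT COUNT(*) FROM table_a_delta").fetchone()[0]
load_from_intermediate(conn)  # later load uses the same function

rows = conn.execute("SELECT id, name, amount FROM table_b ORDER BY id").fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM table_a_delta").fetchone()[0]
```

In this sketch the load step never reads Table A, and the same function serves the first and later loads as long as the triggers exist before any rows are written to Table A.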
