BI Lecture 2 - Data Warehousing - Data Integration

Uploaded by

Omar Magdy

Lecture 2

Chapter 2:
Data Warehousing – Data
Integration
What is a Data Warehouse?
• Data Warehousing is the process of
constructing and using a data warehouse.
• A data warehouse is a structured enterprise
repository of subject-oriented, integrated,
time-variant, non-volatile data used for
information retrieval and decision support.

Copyright © 2014 Pearson Education, Inc. Slide 2- 2


Data Warehouse Properties



a) Subject-Oriented
Data is categorized and stored by business subject rather than by application



b) Integrated
 The data in the data warehouse comes from several operational systems.
 In addition to data from internal operational systems, for many enterprises,
data from outside sources is likely to be very important.



c) Time-Variant
 Data is stored as a series of snapshots, each representing a period
of time.

Time      Data
Jan-97    January
Feb-97    February
Mar-97    March
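
As a minimal sketch of the time-variant idea, the snippet below stores one snapshot per period and never touches earlier periods. The table layout, the account identifiers, and the helper names (`add_snapshot`, `balance_as_of`) are hypothetical and chosen only for illustration; they are not part of the lecture.

```python
from datetime import date

# Hypothetical warehouse table: every row is tagged with the period it describes.
snapshots = []

def add_snapshot(period, balances):
    """Append one period's snapshot; rows for earlier periods are left untouched."""
    for account, balance in balances.items():
        snapshots.append({"period": period, "account": account, "balance": balance})

def balance_as_of(account, period):
    """Return the value recorded for an account in a given period, if any."""
    for row in snapshots:
        if row["account"] == account and row["period"] == period:
            return row["balance"]
    return None

# One snapshot per month, mirroring the Jan-97 / Feb-97 / Mar-97 periods.
add_snapshot(date(1997, 1, 1), {"A-100": 500.0, "A-200": 120.0})
add_snapshot(date(1997, 2, 1), {"A-100": 650.0, "A-200": 120.0})
add_snapshot(date(1997, 3, 1), {"A-100": 610.0, "A-200": 90.0})

# Because old snapshots are kept, the history of an account can be read back.
history = [row["balance"] for row in snapshots if row["account"] == "A-100"]
```

Keeping every period as its own set of rows is what lets a query ask for the state of the data in February after the March snapshot has been added.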



Changing Data
[Figure: the operational database populates the warehouse database with a
first-time load, followed by periodic refreshes.]



d) Non-volatile

Typically, data in the data warehouse is not updated or deleted.



A Generic DW Framework
[Figure: generic data warehouse framework, with these components]
 Data sources: ERP, legacy, POS, other OLTP/Web, external data
 ETL process: select, extract, transform, integrate, load
 Enterprise data warehouse, with metadata and replication
 Data marts (Marketing, Engineering, Finance, ...), or the no data marts option
 Access: API / middleware
 Applications (visualization): routine business reporting, data/text mining,
OLAP, dashboard, Web, custom built applications



Data Integration and the Extraction,
Transformation, and Load (ETL)
Process
 Data integration comprises three major processes:
o Data access (i.e., the ability to access and extract data from
any data source)
o Data federation (i.e., the integration of business views across
multiple data stores)
o Change capture (based on the identification, capture, and
delivery of the changes made to enterprise data sources).
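
Change capture can be illustrated with a small sketch. The version below assumes the simplest possible approach, comparing two full extracts keyed by primary key; production systems often rely on database logs, triggers, or timestamps instead. The function name `capture_changes` and the sample records are hypothetical.

```python
def capture_changes(previous, current):
    """Compare two extracts keyed by primary key and classify the differences."""
    inserted = {key: current[key] for key in current.keys() - previous.keys()}
    deleted = {key: previous[key] for key in previous.keys() - current.keys()}
    updated = {
        key: current[key]
        for key in current.keys() & previous.keys()
        if current[key] != previous[key]
    }
    return {"inserted": inserted, "updated": updated, "deleted": deleted}

# Yesterday's and today's extracts of a hypothetical customer table.
previous = {
    1: {"name": "Ann", "city": "Cairo"},
    2: {"name": "Bob", "city": "Giza"},
    3: {"name": "Cy", "city": "Luxor"},
}
current = {
    1: {"name": "Ann", "city": "Cairo"},
    2: {"name": "Bob", "city": "Alexandria"},
    4: {"name": "Dee", "city": "Aswan"},
}

# Only the identified changes need to be delivered downstream.
changes = capture_changes(previous, current)
```

Here customer 2 is reported as updated, customer 4 as inserted, and customer 3 as deleted, while the unchanged customer 1 is not delivered at all.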



Data Integration (Cont.)
 A major purpose of a data warehouse is to integrate data
from multiple systems.
 Various integration technologies enable data and
metadata integration:
 Enterprise application integration (EAI)
 It involves integrating application functionality and is
focused on sharing functionality (rather than data) across
systems
 Traditionally, EAI solutions use application
programming interfaces (APIs).
 Recently, EAI is accomplished by using a service-oriented
architecture (SOA). Using Web services is a specialized way
of implementing an SOA.
Data Integration (Cont.)
 Enterprise information integration (EII)
 An evolving tool space that promises real-time data
integration from a variety of sources, such as relational
or multidimensional databases, Web services, etc.
 It is a mechanism for pulling data from source systems
to satisfy a request for information
 Extract Transform Load (ETL)
 ETL consists of three main processes: extract, transform, and
load.
 The ETL process typically consumes 70 percent of the time in a
data-centric project.



Extract - List of data extraction issues
 Source identification: identify source applications and
source structures.
 Method of extraction: for each data source, define
whether the extraction process is manual or tool-based.
 Extraction frequency: for each data source, establish
how frequently the data extraction must be done (daily,
weekly, quarterly, and so on).
 Job sequencing: determine whether the beginning of
one job in an extraction job stream has to wait until the
previous job has finished successfully.
 Exception handling: determine how to handle input
records that cannot be extracted.
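
Two of these issues, job sequencing and exception handling, can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any particular ETL tool: the job names, the record fields (`id`, `amount`), and both helper functions are hypothetical.

```python
def run_job_stream(jobs):
    """Run jobs in order; each job starts only if the previous one succeeded."""
    completed = []
    for name, job in jobs:
        try:
            job()
        except Exception as exc:
            # Stop the stream: later jobs must not start after a failure.
            return completed, f"{name} failed: {exc}"
        completed.append(name)
    return completed, None

def extract_records(raw_rows):
    """Parse rows; records that cannot be extracted are set aside with a reason."""
    extracted, rejected = [], []
    for row in raw_rows:
        try:
            extracted.append({"id": int(row["id"]), "amount": float(row["amount"])})
        except (KeyError, ValueError) as exc:
            rejected.append({"row": row, "reason": str(exc)})
    return extracted, rejected

def failing_job():
    raise RuntimeError("source unavailable")

# The third job never runs because the second one fails.
completed, error = run_job_stream([
    ("extract_customers", lambda: None),
    ("extract_orders", failing_job),
    ("extract_payments", lambda: None),
])

# One good record, one with a malformed id, one with a missing field.
extracted, rejected = extract_records([
    {"id": "1", "amount": "9.50"},
    {"id": "x2", "amount": "3.00"},
    {"id": "3"},
])
```

Routing bad records to a reject list, rather than aborting the whole extraction, is one common policy; the right choice depends on how the rejected records will be reviewed and reprocessed.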



Transform - Major Transformation
Types
 Convert units of measurement
 Convert dates and times
 Convert data types
 Calculate and derive attribute values
 Check for referential integrity
 Aggregate data as needed
 Resolve missing values
 Remove duplicates
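
The transformation types listed above can be sketched on a handful of order records. Everything specific here is an assumption made for illustration: the field names, the source date format (`MM/DD/YYYY`), the rule that a missing quantity defaults to 1, and the choice to skip rows whose customer is unknown.

```python
from datetime import datetime

LB_TO_KG = 0.45359237  # exact definition of the international pound

def transform(rows, known_customers):
    """Apply several transformation types to raw order rows."""
    seen = set()
    clean = []
    for row in rows:
        if row["order_id"] in seen:  # remove duplicates
            continue
        seen.add(row["order_id"])
        if row["customer_id"] not in known_customers:  # referential integrity check
            continue
        # Resolve a missing value (assumed policy: default quantity of 1).
        quantity = int(row["quantity"]) if row.get("quantity") else 1
        unit_price = float(row["unit_price"])  # convert data type
        clean.append({
            "order_id": int(row["order_id"]),
            "customer_id": row["customer_id"],
            # Convert unit of measurement: pounds to kilograms.
            "weight_kg": round(float(row["weight_lb"]) * LB_TO_KG, 3),
            # Date conversion: text in the source format to a date value.
            "order_date": datetime.strptime(row["order_date"], "%m/%d/%Y").date(),
            # Calculate a derived attribute.
            "total": round(quantity * unit_price, 2),
        })
    return clean

def aggregate_by_customer(clean):
    """Aggregate order totals per customer."""
    totals = {}
    for row in clean:
        totals[row["customer_id"]] = totals.get(row["customer_id"], 0.0) + row["total"]
    return totals

rows = [
    {"order_id": "1", "customer_id": "C1", "weight_lb": "10",
     "order_date": "01/15/1997", "quantity": "2", "unit_price": "4.50"},
    {"order_id": "1", "customer_id": "C1", "weight_lb": "10",
     "order_date": "01/15/1997", "quantity": "2", "unit_price": "4.50"},
    {"order_id": "2", "customer_id": "C1", "weight_lb": "2.5",
     "order_date": "02/03/1997", "quantity": "", "unit_price": "10.00"},
    {"order_id": "3", "customer_id": "C9", "weight_lb": "1",
     "order_date": "02/04/1997", "quantity": "1", "unit_price": "5.00"},
]
clean = transform(rows, known_customers={"C1", "C2"})
totals = aggregate_by_customer(clean)
```

In this run the duplicate of order 1 is dropped, order 3 is skipped because customer C9 is not known, and order 2 receives the default quantity.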



Load
 Terminology:
 Initial Load — populating all the data warehouse
tables for the very first time
 Incremental Load — applying ongoing changes as
necessary in a periodic manner
 Full Refresh — completely erasing the contents of one
or more tables and reloading with fresh data (initial
load is a refresh of all the tables)
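
The three load types can be contrasted with a small sketch using Python's built-in `sqlite3` module and an in-memory database. The `sales` table, its columns, and the function names are hypothetical; the incremental load assumes that a changed row simply replaces the stored row with the same key, which is only one of several possible policies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, amount REAL)")

INSERT_SQL = "INSERT INTO sales (sale_id, amount) VALUES (?, ?)"
UPSERT_SQL = "INSERT OR REPLACE INTO sales (sale_id, amount) VALUES (?, ?)"

def initial_load(conn, rows):
    """Populate the empty table for the very first time."""
    conn.executemany(INSERT_SQL, rows)
    conn.commit()

def incremental_load(conn, changed_rows):
    """Apply ongoing changes: add new keys, replace rows whose key exists."""
    conn.executemany(UPSERT_SQL, changed_rows)
    conn.commit()

def full_refresh(conn, rows):
    """Erase the table contents completely and reload with fresh data."""
    conn.execute("DELETE FROM sales")
    conn.executemany(INSERT_SQL, rows)
    conn.commit()

def contents(conn):
    return conn.execute("SELECT sale_id, amount FROM sales ORDER BY sale_id").fetchall()

initial_load(conn, [(1, 10.0), (2, 20.0)])
after_initial = contents(conn)

incremental_load(conn, [(2, 25.0), (3, 30.0)])
after_incremental = contents(conn)

full_refresh(conn, [(7, 70.0)])
after_refresh = contents(conn)
```

After the incremental load, row 1 is untouched, row 2 carries its new amount, and row 3 is new; after the full refresh, only the freshly loaded rows remain.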



ETL (Extract, Transform, Load)
 Issues affecting the purchase of an ETL tool
 Data transformation tools are expensive
 Data transformation tools may have a long learning
curve
 Important criteria in selecting an ETL tool
 Ability to read from and write to an unlimited number of
data sources/architectures
 Automatic capturing and delivery of metadata
 A history of conforming to open standards
 An easy-to-use interface for the developer and the
functional user
Thank you



All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise,
without the prior written permission of the publisher. Printed in the
United States of America.
