1.1 Basic Concepts & Architecture
AND TECHNOLOGY
MODULE 1
Faculty - Dr.D.Prabha
Data Warehouse?
• A single, complete, and consistent store of data obtained
from a variety of different sources, made available to end
users in a way they can understand and use in a business
context. [Barry Devlin]
• Data warehouse: a database that stores analytical data
for business decisions
Data Warehouse?
• “A data warehouse is a subject-oriented, integrated, time-
variant, and nonvolatile collection of data in support of
management’s decision-making process.” — W. H. Inmon
• A decision support database that is maintained separately
from the organization’s operational database
• Supports information processing by providing a solid platform
of consolidated, historical data for analysis
Data Warehouse: Subject-Oriented
• Organized around major subjects, such as customer,
product, sales
• Focuses on the modeling and analysis of data for
decision makers, not on daily operations or transaction
processing
• Provides a simple and concise view of particular
subject issues by excluding data that are not useful in the
decision support process
Data Warehouse: Integrated
• Constructed by integrating multiple, heterogeneous data
sources
  • Relational databases, flat files, online transaction
  records
• Data cleaning and data integration techniques are
applied
  • They ensure consistency in naming conventions, encoding
  structures, attribute measures, etc. among the different
  data sources, e.g., hotel prices quoted in different
  currencies and with or without tax
  • When data is moved to the warehouse, it is converted
  to the warehouse's conventions
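The hotel price example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `HotelPrice` record, the `to_warehouse_price` helper, and the fixed exchange rates are hypothetical, and the "warehouse convention" is assumed to be USD with tax included.

```python
from dataclasses import dataclass

# Hypothetical fixed exchange rates to USD (assumption for illustration only).
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10, "INR": 0.012}


@dataclass
class HotelPrice:
    hotel: str
    amount: float
    currency: str
    tax_included: bool
    tax_rate: float  # e.g. 0.10 for 10%


def to_warehouse_price(p: HotelPrice) -> float:
    """Convert a source price to USD with tax included, rounded to cents."""
    gross = p.amount if p.tax_included else p.amount * (1 + p.tax_rate)
    return round(gross * RATES_TO_USD[p.currency], 2)


# Two sources describe the same hotel with different conventions.
source_a = HotelPrice("Hotel X", 100.0, "EUR", tax_included=True, tax_rate=0.10)
source_b = HotelPrice("Hotel X", 100.0, "USD", tax_included=False, tax_rate=0.10)

unified = [to_warehouse_price(source_a), to_warehouse_price(source_b)]
```

After conversion, both prices are expressed in the same currency and on the same tax basis, so they can be compared directly.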
Data Warehouse: Time-Variant
• The time horizon for the data warehouse is significantly
longer than that of operational systems
  • Operational database: current-value data
  • Data warehouse data: provides information from a
  historical perspective (e.g., the past 5-10 years)
• Every key structure in the data warehouse contains an
element of time, explicitly or implicitly
  • The key of operational data may or may not contain a
  time element
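A minimal Python sketch of the difference in keys, assuming a hypothetical customer-city attribute: the operational store is keyed by customer only and keeps the current value, while the warehouse key carries an explicit snapshot date and keeps history. The names `record` and `city_as_of` are illustrative, not from any library.

```python
from datetime import date

operational = {}  # customer_id -> current city (overwritten on update)
warehouse = {}    # (customer_id, snapshot_date) -> city as of that date


def record(customer_id, city, snapshot_date):
    """Apply a change to both stores."""
    operational[customer_id] = city                   # current value only
    warehouse[(customer_id, snapshot_date)] = city    # time is part of the key


def city_as_of(customer_id, when):
    """Return the most recent snapshot on or before `when`, or None."""
    dates = [d for (cid, d) in warehouse if cid == customer_id and d <= when]
    return warehouse[(customer_id, max(dates))] if dates else None


record(42, "Chennai", date(2020, 1, 1))
record(42, "Coimbatore", date(2023, 6, 1))
```

Here `operational[42]` only knows the latest city, whereas `city_as_of(42, date(2021, 1, 1))` can still answer a historical question.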
Data Warehouse: Nonvolatile
• A physically separate store of data transformed from the
operational environment
• Operational updates of data do not occur in the data
warehouse environment
  • Does not require transaction processing, recovery,
  or concurrency control mechanisms
  • Requires only two operations in data accessing:
  initial loading of data and access of data
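The two operations can be illustrated with a small append-only table in Python. This is a sketch, not how any particular warehouse product is implemented; the `WarehouseTable` class and its `load` and `query` methods are hypothetical names. Note that it deliberately offers no update or delete.

```python
class WarehouseTable:
    """Append-only table: rows are loaded and read, never updated in place."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Store each row as an immutable tuple so it cannot be altered later.
        self._rows.extend(tuple(r) for r in rows)

    def query(self, predicate=lambda row: True):
        # Read-only access: return the rows that satisfy the predicate.
        return [r for r in self._rows if predicate(r)]


sales = WarehouseTable()
sales.load([("2023-01", "north", 120), ("2023-01", "south", 80)])
sales.load([("2023-02", "north", 150)])

north_total = sum(r[2] for r in sales.query(lambda r: r[1] == "north"))
```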
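Data Warehousing Tools
1. Amazon Redshift
2. Amazon S3 (object storage, commonly used for staging or as a data lake)
3. Microsoft Azure (e.g., Azure Synapse Analytics)
4. Google BigQuery
5. Snowflake
6. Teradata
7. Informatica PowerCenter (data integration / ETL)
8. IBM InfoSphere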
Data Warehouse Applications
● Retail Industry
✔ Forecasting, Market research, Merchandising etc.
● Banks
✔ Spot market trends, Marketing, Credit cards etc.
● Insurance Companies
✔ Property and casualty fraud etc.
● Internet Companies
✔ Analyzing shopping behavior, CRM etc.
● Telecommunications
✔ Telemarketing, Product development etc.
● Sports
✔ Analyzing strategies, Winning player combinations etc.
Data Warehouse Sizes (usage)
Construction of a data warehouse requires data cleaning,
integration, and consolidation.
OLTP vs. OLAP
Feature                      OLTP    OLAP
Number of records accessed   tens    millions
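The difference in records accessed can be seen with Python's built-in `sqlite3` module. This is a toy sketch: the `sales` table and its 1,000 generated rows are made up for illustration, and real OLAP workloads run on far larger data and dedicated engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
# Made-up data: 1,000 sales of 10.0 each, alternating between two regions.
rows = [(i, "north" if i % 2 else "south", 10.0) for i in range(1, 1001)]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# OLTP-style access: look up a single record by its key.
one = conn.execute("SELECT amount FROM sales WHERE id = ?", (7,)).fetchone()

# OLAP-style access: scan every record and aggregate by region.
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
```

The first query touches one row; the second reads the whole table to produce a summary per region.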
Data Warehouse Architecture
There are three approaches:
Single-tier architecture
A single-layer architecture aims to minimize the amount of data
stored by removing data redundancy. It is rarely used in
practice.
Two-tier architecture
A two-layer architecture physically separates the available
sources from the data warehouse.
It does not scale well and cannot support a large number of
end users. It can also suffer connectivity problems because of
network limitations.
Three-tier architecture
This is the most widely used data warehouse architecture.
Data Warehouse: A Multi-Tiered Architecture
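[Figure: multi-tiered data warehouse architecture. Data sources
(operational DBs, other sources) feed the extract, transform,
load, and refresh steps, supported by a monitor and integrator
with metadata, into the data warehouse and data marts. An OLAP
server serves the front-end tools: analysis, query, reports,
and data mining.]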
• The data sourcing, transformation, and migration tools
perform all the conversions and summarizations.
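A minimal sketch of what such tools do, written as three plain Python functions. The source rows, field names, and the `extract`, `transform`, and `load` helpers are all hypothetical; the point is only to show conversion (normalizing values and types) followed by summarization (totals per region).

```python
from collections import defaultdict


def extract():
    # Hypothetical rows from operational sources with inconsistent conventions.
    return [
        {"region": "North", "amount": "100"},
        {"region": "north ", "amount": "50"},
        {"region": "SOUTH", "amount": "70"},
    ]


def transform(rows):
    # Conversion: normalize region names and cast amounts to numbers.
    return [
        {"region": r["region"].strip().lower(), "amount": float(r["amount"])}
        for r in rows
    ]


def load(rows, target):
    # Summarization: accumulate one total per region in the target store.
    for r in rows:
        target[r["region"]] += r["amount"]
    return target


summary = load(transform(extract()), defaultdict(float))
```

After the run, `summary` holds one consolidated total per region even though the sources spelled the region names differently.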