Warehousing

Data warehousing involves extracting data from multiple sources, transforming it to fit business needs, and loading it into a central data warehouse for analysis and reporting. A data warehouse contains integrated, subject-oriented data that supports decision-making. It allows users to analyze historical data across an organization. The extraction, transformation, and loading (ETL) process prepares the data to load into the warehouse by resolving inconsistencies and structuring it for easy analysis.

Uploaded by

manoraman


DATA WAREHOUSING

Introduction:
Our capabilities for both generating and collecting data have
increased rapidly over the last several decades.
Contributing factors include the widespread use of bar codes for most
commercial products, the computerization of many business, scientific
and government transactions, and advances in data collection tools
ranging from scanned text and image platforms to satellite remote
sensing systems.
The popular use of the World Wide Web as a global information system has
flooded us with a tremendous amount of data and information.
This explosive growth in stored data has generated an urgent need for
new techniques and automated tools that can intelligently assist us in
transforming the vast amounts of data into useful information and
knowledge.
Data management is one of the important objectives of computer
science.
For efficient management, data must be stored in a suitable
architecture.
Data warehousing helps in this respect by storing data in multiple
dimensions.
• Definition:
1. A Data Warehouse is a repository of information
collected from multiple sources, stored under a
unified schema, and usually residing at a
single site.

2. A Data Warehouse is a repository of subjectively
selected and adapted operational data which can
answer any ad hoc, complex, statistical or analytical
queries.

3. A Data Warehouse is a subject-oriented, integrated,
time-variant and nonvolatile collection of data in
support of management’s decision-making process.
• A Data Warehouse is a database that is
maintained separately from an organization’s
operational databases.
• Data Warehouse systems allow for the
integration of a variety of application systems.
• They support information processing by
providing a solid platform of consolidated
historical data for analysis.
• A Data Warehouse is a repository of an
organization’s electronically stored data.
• Data Warehouses are designed to facilitate
reporting & analysis.
• Features:
1. Subject-Oriented:
– Data is arranged and optimized to provide answers
to questions from diverse functional areas.
– A DW is organized around major subjects such as
customer, supplier, product and sales.
– It focuses on the modeling and analysis of data for
decision makers, not on the day-to-day operations
and transaction processing of an organization.
– A DW typically provides a simple and concise view
of particular subject issues by excluding data
that are not useful in the decision support process.
– For example, to learn more about your company's
sales data, you can build a warehouse that
concentrates on sales. Using this warehouse, you
can answer questions like "Who was our best
customer for this item last year?"
2. Integrated:
– A DW is constructed by integrating multiple,
heterogeneous data sources such as relational
databases, flat files and online transaction
records.
– Integration must resolve problems such as naming
conflicts and inconsistencies among units of
measure.
– Data cleaning and data integration techniques
are applied to ensure consistency in naming
conventions, encoding structures, attribute
measures, etc. among different data sources.
E.g., hotel prices quoted in different currencies
are converted to a single currency when the data is
moved to the warehouse.
3. Time-Variant:
– The time horizon for the data warehouse is
significantly longer than that of operational
systems.
– Operational database: current-value data.
– Data warehouse data: provides information from a
historical perspective (e.g., the past 5-10 years).
– Every important element in the data warehouse
contains time, either explicitly or implicitly.
4. Nonvolatile:
– Nonvolatile means that, once entered into the warehouse,
data should not change.
– This is logical because the purpose of a warehouse is to
enable you to analyze what has occurred.
– A DW is a physically separate store of data transformed
from the operational environment.
– Because operational updates of data do not occur in the data
warehouse environment, it does not require transaction
processing, recovery, and concurrency control
mechanisms.
– It requires only two operations for data access:
• Initial loading of data
• Access of data
5. Accessible:
– The primary purpose of a data warehouse is to provide
readily accessible information to end users.

6. Process-Oriented:
– It is important to view data warehousing as a process
for the delivery of information.
– The maintenance of a DW is ongoing and iterative in
nature.
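The integrated, time-variant and nonvolatile features can be illustrated together with a minimal Python sketch. It is not taken from the source: the alias table `COLUMN_ALIASES`, the exchange rates in `RATES_TO_USD` and the record layout are all hypothetical. Integration renames conflicting fields and converts prices to one currency, every row carries an explicit `snapshot_date`, and the store offers only the two operations named above, loading and access.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List

# Hypothetical exchange rates into one reporting currency (USD).
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10, "INR": 0.012}

# Hypothetical aliases that resolve naming conflicts between sources.
COLUMN_ALIASES = {"cust_id": "customer_id", "custno": "customer_id",
                  "price": "amount", "rate": "amount"}


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after loading
class WarehouseRow:
    customer_id: str
    amount_usd: float
    source_currency: str
    snapshot_date: date  # time is stored explicitly (time-variant)


def integrate(record: dict, snapshot_date: date) -> WarehouseRow:
    """Rename conflicting fields and convert the amount to USD (integrated)."""
    renamed = {COLUMN_ALIASES.get(key, key): value for key, value in record.items()}
    currency = renamed["currency"]
    amount_usd = round(renamed["amount"] * RATES_TO_USD[currency], 2)
    return WarehouseRow(str(renamed["customer_id"]), amount_usd, currency, snapshot_date)


class AppendOnlyWarehouse:
    """Offers only two operations, load and access (nonvolatile)."""

    def __init__(self) -> None:
        self._rows: List[WarehouseRow] = []

    def load(self, records: Iterable[dict], snapshot_date: date) -> None:
        # New rows are appended; existing rows are never updated or deleted.
        self._rows.extend(integrate(r, snapshot_date) for r in records)

    def access(self, customer_id: str) -> List[WarehouseRow]:
        return [r for r in self._rows if r.customer_id == customer_id]


dw = AppendOnlyWarehouse()
dw.load([{"cust_id": "C1", "price": 100, "currency": "EUR"}], date(2023, 1, 31))
dw.load([{"custno": "C1", "rate": 100, "currency": "USD"}], date(2023, 2, 28))
for row in dw.access("C1"):
    print(row.snapshot_date, row.amount_usd)
# 2023-01-31 110.0
# 2023-02-28 100.0
```

Making `WarehouseRow` a frozen dataclass is just one simple way to express that loaded data should not change; a real warehouse would typically enforce this through its load process and access rights.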
Characteristics:
• Smaller number of (concurrent) users.
• Instant response is less important (it matters only when composing
reports interactively).
• Read-only access by users.
• Most data access will be targeted at a small partition of the data: the
last month or quarter.
• Database access is less frequent, but it executes large and complicated
queries that access many rows per table.
• Irregular, primarily long-running and complex read-only
transactions instead of a constant high transaction rate.
• Loads from the operational data store only insert new records; existing
ones are not changed (updated).
• Bulk loads from the operational data store (at most once daily), no
single-record inserts.
• Database design is partly denormalized and redundant for better
performance, using a star or snowflake schema. Database design is
data-driven, not workflow-driven.
• Large storage capacity for historical data.
• May also contain aggregate data.
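A star schema, as mentioned above, can be sketched with Python's built-in `sqlite3` module. The tables `fact_sales`, `dim_customer`, `dim_product` and `dim_date` and all sample rows are hypothetical; the query answers the earlier subject-oriented question "Who was our best customer for this item last year?" for an assumed year of 2023.

```python
import sqlite3

# In-memory database; all table names, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- Central fact table: one row per sale, keyed to every dimension.
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    date_key     INTEGER REFERENCES dim_date(date_key),
    quantity     INTEGER,
    sale_amount  REAL
);
""")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(10, "Widget")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20230115, 2023, 1), (20230610, 2023, 6), (20240105, 2024, 1)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)", [
    (1, 10, 20230115, 5, 50.0),
    (2, 10, 20230610, 8, 80.0),
    (1, 10, 20230610, 2, 20.0),
    (1, 10, 20240105, 9, 90.0),  # outside 2023, so ignored by the query below
])

# "Who was our best customer for this item last year?"
BEST_CUSTOMER_SQL = """
SELECT c.name, SUM(f.sale_amount) AS total
FROM fact_sales AS f
JOIN dim_customer AS c ON c.customer_key = f.customer_key
JOIN dim_product  AS p ON p.product_key  = f.product_key
JOIN dim_date     AS d ON d.date_key     = f.date_key
WHERE p.name = ? AND d.year = ?
GROUP BY c.name
ORDER BY total DESC
LIMIT 1
"""
print(conn.execute(BEST_CUSTOMER_SQL, ("Widget", 2023)).fetchone())
# ('Globex', 80.0)
```

The dimension tables hold descriptive attributes and the fact table holds the measures; normalizing a dimension further (e.g., splitting product into product and category tables) would turn this into a snowflake schema.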
Benefits of data warehousing
Some of the benefits that a data warehouse provides are as follows:
• A data warehouse provides a common data model for all data of interest
regardless of the data's source.
• DW makes it easier to report and analyze information than it would be if
multiple data models were used to retrieve information such as sales
invoices, order receipts, general ledger charges, etc.
• Prior to loading data into the data warehouse, inconsistencies are
identified and resolved. This greatly simplifies reporting and analysis.
• Information in the data warehouse is under the control of data warehouse
users so that, even if the source system data is purged over time, the
information in the warehouse can be stored safely for extended periods of
time.
• Because they are separate from operational systems, data warehouses
allow data to be retrieved without slowing down operational systems.
• Data warehouses can work in conjunction with and, hence, enhance the
value of operational business applications, notably customer relationship
management (CRM) systems.
• Data warehouses facilitate decision support system applications such as
trend reports (e.g., the items with the most sales in a particular area
within the last two years), exception reports, and reports that show actual
performance versus goals.
Data Warehousing:
• Data warehousing is a process of constructing and
using data warehouses.
• The classic definition of the data warehouse
focuses on data storage.
• However, the means to retrieve and analyze data,
to extract, transform and load data, and to manage
the data dictionary are also considered essential
components of a data warehousing system.
• Many references to data warehousing use this
broader context.
• Thus, an expanded definition for data warehousing
includes business intelligence tools, tools to
extract, transform, and load data into the repository,
and tools to manage and retrieve metadata.
Extract, Transform, and Load (ETL) is a process in data
warehousing that involves:
• extracting data from outside sources,
• transforming it to fit business needs, and
• loading it into the end target, i.e., the data warehouse.
1) Extract:
– The first part of an ETL process is to extract the data from the source
systems.
– Most data warehousing projects consolidate data from different source
systems.
– Each separate system may also use a different data organization
format.
– Common data source formats are relational databases and flat files.
– Extraction converts the data into a format for transformation
processing.
• An intrinsic part of the extraction is parsing the extracted
data to check whether it meets an expected pattern or
structure. If it does not, the data may be rejected entirely.
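The parsing check described above might look like the following sketch, assuming a CSV flat-file source whose expected header (`EXPECTED_COLUMNS`) and date format are hypothetical. A file with the wrong header is rejected entirely; rejecting individual malformed rows, as also done here, is an additional assumption.

```python
import csv
import io
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical expected structure of a flat-file source.
EXPECTED_COLUMNS = ["order_id", "order_date", "qty", "unit_price"]
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # e.g. 2024-03-31


def parse_row(row: Dict[str, str]) -> Optional[dict]:
    """Return a typed record, or None if the row does not match the expected pattern."""
    if not DATE_PATTERN.match(row.get("order_date") or ""):
        return None
    try:
        return {
            "order_id": int(row["order_id"]),
            "order_date": row["order_date"],
            "qty": int(row["qty"]),
            "unit_price": float(row["unit_price"]),
        }
    except (KeyError, TypeError, ValueError):  # missing or non-numeric values
        return None


def extract(source_text: str) -> Tuple[List[dict], List[dict]]:
    """Reject the whole file if the header is wrong; otherwise split rows into accepted/rejected."""
    reader = csv.DictReader(io.StringIO(source_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected structure: {reader.fieldnames}")
    accepted, rejected = [], []
    for row in reader:
        record = parse_row(row)
        if record is None:
            rejected.append(row)
        else:
            accepted.append(record)
    return accepted, rejected


SAMPLE = ("order_id,order_date,qty,unit_price\n"
          "1,2024-03-31,2,9.50\n"
          "2,31/03/2024,1,4.00\n"   # wrong date format
          "3,2024-04-01,x,1.00\n")  # non-numeric quantity
good, bad = extract(SAMPLE)
print(len(good), len(bad))
# 1 2
```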
2) Transform:
• The transform stage applies a series of rules or functions to the
extracted data.
• Some data sources will require very little or even no manipulation of data.
• In other cases, one or more of the following types of transformation may
be required to meet the business and technical needs of the end target:
– Selecting only certain columns to load (or selecting null columns not to
load).
– Translating coded values (e.g., if the source system stores 1 for male
and 2 for female, but the warehouse stores M for male and F for female).
– Encoding free-form values (e.g., mapping "Male" and "Mr" to "M").
– Deriving a new calculated value (e.g., sale_amount = qty * unit_price).
– Filtering.
– Sorting.
– Joining together data from multiple sources.
– Aggregation.
– Transposing or pivoting (turning multiple columns into multiple rows or
vice versa).
– Splitting a column into multiple columns (e.g., putting a comma-separated
list specified as a string in one column as individual values in
different columns).
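Several of the listed transformations (selecting columns, translating coded values, encoding free-form values, deriving `sale_amount`, filtering, sorting, splitting a column and aggregation) are combined in the sketch below. The code tables and field names are hypothetical, and `"U"` for an unknown gender is an assumed default; joining and pivoting are not shown.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical code tables.
GENDER_CODES = {1: "M", 2: "F"}  # translating coded values: source 1/2 -> warehouse M/F
FREE_FORM_GENDER = {"male": "M", "mr": "M", "female": "F", "mrs": "F", "ms": "F"}


def transform(rows: List[dict]) -> List[dict]:
    out = []
    for row in rows:
        if row["qty"] <= 0:  # filtering: drop rows with no quantity
            continue
        # Translate a coded value if present, else encode the free-form title.
        gender = GENDER_CODES.get(row.get("gender_code"))
        if gender is None:
            gender = FREE_FORM_GENDER.get(str(row.get("title", "")).strip().lower(), "U")
        record = {
            "customer": row["customer"],  # selecting only certain columns
            "gender": gender,
            "sale_amount": row["qty"] * row["unit_price"],  # derived value
        }
        # Splitting: a comma-separated string becomes tag_1, tag_2, ... columns.
        for i, tag in enumerate(row["tags"].split(","), start=1):
            record[f"tag_{i}"] = tag.strip()
        out.append(record)
    out.sort(key=lambda r: r["customer"])  # sorting
    return out


def aggregate(rows: List[dict]) -> Dict[str, float]:
    """Aggregation: total sale_amount per customer."""
    totals: Dict[str, float] = defaultdict(float)
    for r in rows:
        totals[r["customer"]] += r["sale_amount"]
    return dict(totals)


source_rows = [
    {"customer": "Globex", "gender_code": 2, "qty": 3, "unit_price": 2.5,
     "tags": "web,promo", "internal_note": "not loaded"},
    {"customer": "Acme", "title": "Mr", "qty": 2, "unit_price": 10.0,
     "tags": "store", "internal_note": "not loaded"},
    {"customer": "Acme", "title": "Ms", "qty": 0, "unit_price": 10.0,
     "tags": "store", "internal_note": "filtered out"},
]
result = transform(source_rows)
print(aggregate(result))
# {'Acme': 20.0, 'Globex': 7.5}
```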
3) Load:
• The load phase loads the data into the end target,
usually the data warehouse.
• Depending on the requirements of the organization,
this process varies widely. Some data warehouses
might overwrite existing information with cumulative,
updated data every week, while other DWs (or even
other parts of the same DW) might add new data in
a historized form, e.g., hourly.
• As the load phase interacts with a database, the
constraints defined in the database schema, as well
as in triggers activated upon data load, apply (e.g.,
uniqueness, referential integrity, mandatory fields);
these also contribute to the overall data quality
performance of the ETL process.
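The role of schema constraints during loading can be sketched with Python's `sqlite3` module. The table layout is hypothetical; each batch is loaded in one transaction and stamped with a `loaded_at` value (a simple historized form), and a batch that violates a uniqueness, referential-integrity or mandatory-field constraint is rolled back as a whole. SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3
from typing import List, Optional

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE fact_sales (
    sale_id     INTEGER PRIMARY KEY,                                   -- uniqueness
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),  -- referential integrity
    sale_amount REAL NOT NULL,                                         -- mandatory field
    loaded_at   TEXT NOT NULL                                          -- historized load time
);
""")
with conn:
    conn.execute("INSERT INTO dim_product VALUES (10, 'Widget')")


def bulk_load(rows: List[dict], loaded_at: str) -> Optional[str]:
    """Load a batch in one transaction; return None on success or the error message."""
    try:
        with conn:  # commits on success, rolls back the whole batch on error
            conn.executemany(
                "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                [(r["sale_id"], r["product_key"], r["sale_amount"], loaded_at) for r in rows],
            )
    except sqlite3.IntegrityError as exc:
        return str(exc)
    return None


print(bulk_load([{"sale_id": 1, "product_key": 10, "sale_amount": 50.0}], "2024-03-31T02:00"))
# None
print(bulk_load([{"sale_id": 2, "product_key": 10, "sale_amount": 20.0},
                 {"sale_id": 3, "product_key": 99, "sale_amount": 5.0}], "2024-04-01T02:00"))
# FOREIGN KEY constraint failed  (sale 2 is rolled back too)
```

Loading each batch in a single transaction keeps a rejected batch from leaving partial data behind; a real loader might instead divert failing rows to an error table, a design choice this sketch does not cover.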
Need for a separate data warehouse:
• Why not perform online analytical processing
directly on the operational database?
• Why spend additional time and resources to
construct a separate data warehouse?
1) The major reason for such separation is to promote high
performance in both systems.
2) Running OLAP operations on the operational database reduces the
throughput of the OLTP system.
3) The separation is also based on the different structures, contents
and uses of the data in the two systems.
• Since the two systems provide quite different
functionalities and require different kinds of data, it is
necessary to maintain separate databases.
