
Data Warehouse

The document provides an overview of data warehouses, including: 1) A data warehouse is a collection of integrated, subject-oriented data used to support management decision making. It contains cleansed and summarized data from multiple sources. 2) The data warehouse architecture includes operational data stores, a warehouse manager for data loading and management, and end-user access tools like reporting, OLAP, and data mining. Dimensional modeling with fact and dimension tables is commonly used. 3) Issues in building a data warehouse include how and when to extract data, determining the schema, data cleansing, propagation of updates, and data summarization. Tools like SAS, Apertus and databases like Oracle are used.

Uploaded by Chitransh Naman
Copyright: © Attribution Non-Commercial (BY-NC)

Agenda

- What is a Data Warehouse
- Transaction System vs. Data Warehouse
- Data Warehouse Architecture
- Metadata
- Data Flows
- Issues for Building a Data Warehouse
- Warehouse Schema
- Tools & Technologies
- Advantages of Data Warehouse
- Problems
- Data Mart
- Data Mining

What Is a Data Warehouse?

A collection of integrated, subject-oriented, time-variant, and non-volatile data in support of management's decision-making process. It has been described as the "single point of truth", the "corporate memory", and the sole historical register of virtually all transactions that occur in the life of an organization.


Transaction System vs. Data Warehouse


Transaction System
- Supports day-to-day operational processes
- Contains raw, detailed data that has not been refined or cleansed
- Volatile: data changes from day to day, with frequent updates
- Technical issues drive the data structure and system design
- Disparate data structures, physical locations, query types, etc.
- Users rely on technical analysts for reporting needs
- Operational processes are impacted by queries run against the system

Data Warehouse
- Supports management analysis and decision-making processes
- Contains summarized, refined, and cleansed information
- Non-volatile: provides a data snapshot; adjustments are not permitted, or are limited
- Business analysis requirements drive the data structure and system design
- Integrated, consistent information on a single technology platform
- Users have direct, fast access via online analytical processing (OLAP) tools
- Minimal impact on operational processes
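The volatile vs. non-volatile contrast can be sketched in a few lines. This is a toy illustration under stated assumptions, not a real system: `oltp_balances`, `warehouse_snapshots`, `oltp_update`, and `load_snapshot` are hypothetical names, and taking one snapshot per load is an assumed policy. The transaction side overwrites a value in place, while the warehouse side only appends time-stamped rows, so the history survives.

```python
from datetime import date

# Hypothetical transaction-system table: one current balance per account,
# overwritten in place (volatile).
oltp_balances = {"100000": 500}

# Hypothetical warehouse table: append-only (account_id, snapshot_date, balance)
# rows that are never updated after loading (non-volatile).
warehouse_snapshots = []


def oltp_update(account_id, new_balance):
    """Day-to-day update: the previous value is lost."""
    oltp_balances[account_id] = new_balance


def load_snapshot(snapshot_date):
    """Periodic load: copy the current state as new rows; old rows stay untouched."""
    for account_id, balance in oltp_balances.items():
        warehouse_snapshots.append((account_id, snapshot_date, balance))


load_snapshot(date(1996, 1, 31))
oltp_update("100000", 750)
load_snapshot(date(1996, 2, 29))

print(oltp_balances)        # only the latest balance survives
print(warehouse_snapshots)  # both month-end snapshots survive
```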


Data Warehouse Architecture


[Figure: Data warehouse architecture. Components shown: operational data stores (ODS 1, ODS 2, ODS 3); load manager; warehouse manager; query manager; a DBMS holding meta-data, detailed data, lightly summarized data, and highly summarized data; archive/backup data; and end-user access tools (reporting, query, application development, and EIS tools; OLAP tools; data mining).]

Operational data store (ODS): a repository of current and integrated operational data used for analysis.
Load manager: performs all the operations associated with the extraction and loading of data into the warehouse.

Warehouse manager: performs all the operations associated with the management of the data in the warehouse.
Query manager: also called the backend component; performs all the operations associated with the management of user queries.


End-user access tools: can be categorized into five main groups: data reporting and query tools, application development tools, executive information system (EIS) tools, online analytical processing (OLAP) tools, and data mining tools.

Summarized data: stores all the aggregations generated by the warehouse manager. It exists to speed up the performance of queries and does not require backup.
Archive/backup data: backup ensures recovery of the data warehouse from data loss or failure. In archiving, older data is removed from the system in a format that allows it to be quickly restored if required.
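A minimal sketch of how summarized data speeds up queries, assuming a hypothetical `build_region_summary` step run by the warehouse manager and a hypothetical `region_total` lookup used by the query manager. The detailed rows reuse the month, region, and net-sales values from the Monthly_Sales_Summary_Table example later in this deck.

```python
from collections import defaultdict

# Detailed rows (month, region_id, net_sales); values from the deck's
# Monthly_Sales_Summary_Table example.
detailed = [
    ("01-1996", "SW", 30000),
    ("02-1996", "NE", 23000),
    ("03-1996", "SW", 32000),
]


def build_region_summary(rows):
    """Warehouse-manager step (sketch): pre-aggregate net sales per region."""
    totals = defaultdict(int)
    for _month, region_id, net_sales in rows:
        totals[region_id] += net_sales
    return dict(totals)


region_summary = build_region_summary(detailed)


def region_total(region_id):
    """Query-manager step (sketch): answer from the summary, not the detail rows."""
    return region_summary.get(region_id, 0)


print(region_total("SW"))  # 62000
```

Because `region_summary` can always be recomputed from `detailed`, losing it costs only recomputation time. That is one way to read the point that summarized data does not require backup.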


Importance of Meta Data


- Meta-data: data about data.
- The purpose of meta-data is to show the pathway back to where the data began, so that warehouse administrators know the history of any item in the warehouse.
- The meta-data associated with data transformation and loading must describe the source data and any changes that were made to the data.
- The meta-data associated with data management describes the data as it is stored in the warehouse.
- The meta-data is required by the query manager to generate appropriate queries, and is also associated with the users of queries.
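One way to picture "the pathway back to where the data began" is a lineage record per warehouse column. This is a sketch only: the `LineageRecord` class, the source system `order_entry`, the source column `line_amount`, and the transformation steps are all hypothetical; only the target column name comes from the deck's example.

```python
from dataclasses import dataclass, field


@dataclass
class LineageRecord:
    """Hypothetical meta-data entry: where a warehouse column came from."""
    warehouse_column: str
    source_system: str
    source_column: str
    transformations: list = field(default_factory=list)

    def add_step(self, description):
        """Record one change made to the data during transformation/loading."""
        self.transformations.append(description)

    def pathway(self):
        """Trace the column from its source, through each change, to the warehouse."""
        source = f"{self.source_system}.{self.source_column}"
        return " -> ".join([source, *self.transformations, self.warehouse_column])


record = LineageRecord(
    warehouse_column="Monthly_Sales_Summary_Table.net-sales",
    source_system="order_entry",  # hypothetical source system
    source_column="line_amount",  # hypothetical source column
)
record.add_step("remove cancelled orders")
record.add_step("sum per month, product, region")
print(record.pathway())
```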


Data flows

- Inflow: the processes associated with the extraction, cleansing, and loading of the data from the source systems into the data warehouse.
- Upflow: the processes associated with adding value to the data in the warehouse through summarizing, packaging, and distribution of the data.
- Downflow: the processes associated with archiving and backing up data in the warehouse.
- Outflow: the processes associated with making the data available to end users.
- Meta-flow: the processes associated with the management of the meta-data.
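The five flows can be mapped onto five small functions over an in-memory store. This is a deliberately simplified sketch: the `warehouse` dictionary, the function names, the region-code cleansing rule, and the month-cutoff archiving rule are all assumptions made for illustration, not a prescribed design.

```python
# Hypothetical in-memory stand-ins for the warehouse areas on the slide.
warehouse = {"detailed": [], "summarized": {}, "archive": [], "meta": []}


def month_key(month):
    """Turn 'MM-YYYY' into (year, month) so months compare chronologically."""
    mm, yyyy = month.split("-")
    return int(yyyy), int(mm)


def meta_flow(note):
    """Meta-flow: record what each process did."""
    warehouse["meta"].append(note)


def inflow(source_rows):
    """Inflow: extract, cleanse (normalize region codes), and load detailed rows."""
    for month, region_id, net_sales in source_rows:
        warehouse["detailed"].append((month, region_id.strip().upper(), net_sales))
    meta_flow(f"inflow: loaded {len(source_rows)} rows")


def upflow():
    """Upflow: add value by summarizing detailed data per region."""
    totals = {}
    for _month, region_id, net_sales in warehouse["detailed"]:
        totals[region_id] = totals.get(region_id, 0) + net_sales
    warehouse["summarized"] = totals
    meta_flow("upflow: rebuilt region totals")


def downflow(cutoff_month):
    """Downflow: move detailed rows older than cutoff_month into the archive."""
    cutoff = month_key(cutoff_month)
    old = [r for r in warehouse["detailed"] if month_key(r[0]) < cutoff]
    warehouse["detailed"] = [r for r in warehouse["detailed"] if month_key(r[0]) >= cutoff]
    warehouse["archive"].extend(old)
    meta_flow(f"downflow: archived {len(old)} rows")


def outflow(region_id):
    """Outflow: make summarized data available to an end-user tool."""
    return warehouse["summarized"].get(region_id, 0)


inflow([("01-1996", " sw", 30000), ("02-1996", "NE", 23000), ("03-1996", "SW", 32000)])
upflow()
downflow("02-1996")
print(outflow("SW"), warehouse["archive"], warehouse["meta"])
```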

[Figure: Information flows of a data warehouse. Inflow runs from operational data sources 1 to n and the operational data store (ODS) through the load manager; upflow and downflow are handled by the warehouse manager (DBMS holding meta-data, detailed data, lightly summarized data, and highly summarized data; archive/backup data); outflow runs through the query manager to the end-user access tools (reporting, query, application development, and EIS tools; OLAP tools; data mining tools); meta-flow manages the meta-data.]

Issues to Be Addressed in Building a Data Warehouse


- When and how to gather data?
- What schema to use?
- Data cleansing
- How to propagate updates?
- What data to summarize?
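Data cleansing is the most mechanical of these issues, so a small sketch helps. It assumes a hypothetical raw extract with inconsistent region codes, formatted numbers, and a sale delivered twice; `REGION_CODES` and `cleanse` are illustrative names, and the region codes follow the deck's Region_Dimension_Table.

```python
# Hypothetical raw extract: inconsistent region codes, formatted numbers,
# and the same sale delivered twice by two source systems.
raw_rows = [
    {"account_id": "100000", "region": "sw", "net_sales": "30,000"},
    {"account_id": "100000", "region": "SW ", "net_sales": "30000"},
    {"account_id": "110000", "region": "Northeast", "net_sales": "23,000"},
]

# Assumed mapping from long names to the warehouse's region codes.
REGION_CODES = {"NORTHEAST": "NE", "NORTHWEST": "NW", "SOUTHEAST": "SE", "SOUTHWEST": "SW"}


def cleanse(rows):
    """Standardize codes and numbers, then drop rows that become exact duplicates."""
    seen = set()
    clean = []
    for row in rows:
        region = row["region"].strip().upper()
        region = REGION_CODES.get(region, region)
        net_sales = int(row["net_sales"].replace(",", ""))
        key = (row["account_id"], region, net_sales)
        if key in seen:
            continue  # duplicate after standardization
        seen.add(key)
        clean.append({"account_id": row["account_id"], "region": region, "net_sales": net_sales})
    return clean


print(cleanse(raw_rows))
```

Deduplicating on (account, region, amount) is a simplifying assumption: two genuinely distinct sales with identical values would be collapsed, so a real pipeline would normally key on a source transaction identifier.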


Warehouse Schema

Fact table:
Stores the business data. The data in a fact table are called facts; fact tables contain multidimensional data.

Dimension table:
To minimize storage requirements, dimension attributes in the fact table are usually short identifiers that are foreign keys into other tables, called dimension tables.
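A minimal `sqlite3` sketch of the fact/dimension split. The table names `product_dim` and `sales_fact` are hypothetical; the column names and sample values come from the deck's sales example. The fact rows store only the short key `prod_id`, the description is stored once in the dimension table, and a join recovers it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: descriptive attributes, keyed by a short identifier.
    CREATE TABLE product_dim (prod_id INTEGER PRIMARY KEY, prod_desc TEXT);
    -- Fact table: the measure plus a short foreign key into the dimension table.
    CREATE TABLE sales_fact (
        month TEXT,
        prod_id INTEGER REFERENCES product_dim(prod_id),
        net_sales INTEGER
    );
    INSERT INTO product_dim VALUES (100, 'Power supply'), (140, 'Motherboard');
    INSERT INTO sales_fact VALUES ('01-1996', 100, 30000), ('02-1996', 140, 23000);
""")

# Resolve the short keys back to descriptions with a join.
rows = conn.execute("""
    SELECT f.month, p.prod_desc, f.net_sales
    FROM sales_fact AS f
    JOIN product_dim AS p ON f.prod_id = p.prod_id
    ORDER BY f.month
""").fetchall()
print(rows)
```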


Schema with Fact & Dimension Tables

[Figure: Example dimension tables: PRODUCT (Product Number, Name of the Product, Description of Product), AREA (Area 1, Area 2, Area 3), and DURATION (Year, Beginning Date, Completion Date).]

Star Schema

The fact table is in the center, and all the dimension tables are attached to the central fact table. Example: sales processing.
[Figure: Star schema for sales processing: fact table SALES in the center, joined to dimension tables PRODUCT, AREA, TIME, and CUSTOMER.]

Dimension Tables

Region_Dimension_Table
region_id  region_doc
NE         Northeast
NW         Northwest
SE         Southeast
SW         Southwest

Product_Dimension_Table
prod_grp_id  prod_id  prod_grp_desc   prod_desc
10           100      Fewer devices   Power supply
20           140      Circuit boards  Motherboard
30           220      Components      Co-processor

Account_Dimension_Table
account_id  account_doc
100000      ABC Electronics
110000      Midway Electric
120000      Victor Components
130000      Washburn, Inc.
140000      Zerox

Vendor_Dimension_Table
vend_id  vendor_desc
100      PowerAge, Inc.
200      Advanced Micro Devices
300      Farad Incorporated

Time_Dimension_Table
month    mo_in_fiscal_yr  month_name
01-1996  4                January
02-1996  5                February
03-1996  6                March

Fact Table: Monthly_Sales_Summary_Table
month    prod_id  region_id  account_id  vend_id  net-sales  gross_sales
01-1996  100      SW         100000      100      30,000     50,000
02-1996  140      NE         110000      200      23,000     42,000
03-1996  220      SW         100000      300      32,000     49,000
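The sales example above can be loaded into SQLite to show how star-schema queries join the fact table directly to each dimension. Assumptions to note: `net-sales` is renamed `net_sales` because a hyphen is not valid in an unquoted SQL identifier, the column types are guesses, the product-group pairing is read from the column order of the original slide, and "Fewer devices" is kept as printed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Region_Dimension_Table (region_id TEXT PRIMARY KEY, region_doc TEXT);
CREATE TABLE Product_Dimension_Table (
    prod_grp_id INTEGER, prod_id INTEGER PRIMARY KEY, prod_grp_desc TEXT, prod_desc TEXT);
CREATE TABLE Account_Dimension_Table (account_id INTEGER PRIMARY KEY, account_doc TEXT);
CREATE TABLE Vendor_Dimension_Table (vend_id INTEGER PRIMARY KEY, vendor_desc TEXT);
CREATE TABLE Time_Dimension_Table (month TEXT PRIMARY KEY, mo_in_fiscal_yr INTEGER, month_name TEXT);
-- Fact table: short keys into each dimension plus the two measures.
CREATE TABLE Monthly_Sales_Summary_Table (
    month TEXT, prod_id INTEGER, region_id TEXT, account_id INTEGER, vend_id INTEGER,
    net_sales INTEGER, gross_sales INTEGER);

INSERT INTO Region_Dimension_Table VALUES
    ('NE', 'Northeast'), ('NW', 'Northwest'), ('SE', 'Southeast'), ('SW', 'Southwest');
INSERT INTO Product_Dimension_Table VALUES
    (10, 100, 'Fewer devices', 'Power supply'),
    (20, 140, 'Circuit boards', 'Motherboard'),
    (30, 220, 'Components', 'Co-processor');
INSERT INTO Account_Dimension_Table VALUES
    (100000, 'ABC Electronics'), (110000, 'Midway Electric'), (120000, 'Victor Components'),
    (130000, 'Washburn, Inc.'), (140000, 'Zerox');
INSERT INTO Vendor_Dimension_Table VALUES
    (100, 'PowerAge, Inc.'), (200, 'Advanced Micro Devices'), (300, 'Farad Incorporated');
INSERT INTO Time_Dimension_Table VALUES
    ('01-1996', 4, 'January'), ('02-1996', 5, 'February'), ('03-1996', 6, 'March');
INSERT INTO Monthly_Sales_Summary_Table VALUES
    ('01-1996', 100, 'SW', 100000, 100, 30000, 50000),
    ('02-1996', 140, 'NE', 110000, 200, 23000, 42000),
    ('03-1996', 220, 'SW', 100000, 300, 32000, 49000);
""")

# Star join: total net sales per region description.
by_region = conn.execute("""
    SELECT r.region_doc, SUM(f.net_sales)
    FROM Monthly_Sales_Summary_Table AS f
    JOIN Region_Dimension_Table AS r ON f.region_id = r.region_id
    GROUP BY r.region_doc
    ORDER BY r.region_doc
""").fetchall()

# Star join across four dimensions, ordered by fiscal month.
monthly = conn.execute("""
    SELECT t.month_name, p.prod_desc, v.vendor_desc, a.account_doc, f.gross_sales
    FROM Monthly_Sales_Summary_Table AS f
    JOIN Time_Dimension_Table AS t ON f.month = t.month
    JOIN Product_Dimension_Table AS p ON f.prod_id = p.prod_id
    JOIN Vendor_Dimension_Table AS v ON f.vend_id = v.vend_id
    JOIN Account_Dimension_Table AS a ON f.account_id = a.account_id
    ORDER BY t.mo_in_fiscal_yr
""").fetchall()

print(by_region)
print(monthly)
```

Every dimension is one join away from the fact table, which is what keeps star-schema queries simple.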

Snowflake Schema

Consists of a fact table and normalized dimension tables.


Disadvantages:

- Unmanageable data
- Difficult to retrieve data
- Metadata becomes complex
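A small SQLite sketch of a snowflaked PRODUCT dimension, showing why retrieval gets harder. The table names `product_category`, `product`, and `sales` are hypothetical; the category, product, and net-sales values reuse the deck's sales example. Moving the category attributes into their own table means a category-level query now needs two joins instead of one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked PRODUCT dimension: category attributes move to their own table.
CREATE TABLE product_category (prod_grp_id INTEGER PRIMARY KEY, prod_grp_desc TEXT);
CREATE TABLE product (
    prod_id INTEGER PRIMARY KEY,
    prod_desc TEXT,
    prod_grp_id INTEGER REFERENCES product_category(prod_grp_id));
CREATE TABLE sales (month TEXT, prod_id INTEGER REFERENCES product(prod_id), net_sales INTEGER);

INSERT INTO product_category VALUES (20, 'Circuit boards'), (30, 'Components');
INSERT INTO product VALUES (140, 'Motherboard', 20), (220, 'Co-processor', 30);
INSERT INTO sales VALUES ('02-1996', 140, 23000), ('03-1996', 220, 32000);
""")

# Reaching the category takes two joins (fact -> product -> category),
# where a star schema with a denormalized product table would need one.
by_category = conn.execute("""
    SELECT c.prod_grp_desc, SUM(s.net_sales)
    FROM sales AS s
    JOIN product AS p ON s.prod_id = p.prod_id
    JOIN product_category AS c ON p.prod_grp_id = c.prod_grp_id
    GROUP BY c.prod_grp_desc
    ORDER BY c.prod_grp_desc
""").fetchall()
print(by_category)
```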


[Figure: Snowflake schema: fact table SALES joined to dimension tables PRODUCT, AREA, TIME, and CUSTOMER, with the PRODUCT dimension further normalized into Product Category and Product Manufacturer tables.]

Starflake Schema

A combination of the star schema and the snowflake schema. It consists of a fact table, star dimensions, and snowflake dimensions.

[Figure: Starflake schema: fact table SALES joined to a star dimension Product and a star dimension Location; a snowflake dimension Product carries Price and Weight, and the labels Location 1 and Location 2 appear under Location.]

Tools and Technologies


Tools and technologies used in the construction of a data warehouse:

- Data extraction: SAS
- Data cleansing: Apertus, Trillium
- Data storage: Oracle, Sybase


Advantages of Using a Data Warehouse

- End users can access a wide variety of data
- Supports business decision making for the future
- Increases data consistency
- Increases productivity
- Decreases computing costs
- Combines data from multiple sources


Problems

- Increased end-user demands
- High demand for resources
- High maintenance
- Extracting, cleansing, and loading data can be time-consuming.
- Data warehousing increases project scope.
- Compatibility problems with systems already in place, e.g., transaction processing systems.
- End users may be given training and still end up not using the data warehouse.
- Security can become a serious issue, especially if the data warehouse is web-accessible.


Data mart

A subset of a data warehouse that supports the requirements of a particular department or business function. The characteristics that differentiate data marts from data warehouses include:

- A data mart focuses only on the requirements of users associated with one department or business function.
- Data marts do not normally contain detailed operational data, unlike data warehouses.
- As data marts contain less data than data warehouses, they are more easily understood and navigated.
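A data mart can be as simple as a derived table that keeps one department's slice in summarized form. In this SQLite sketch the Southwest sales department and the names `sales_fact` and `sw_sales_mart` are hypothetical; the fact values come from the deck's sales example. The mart keeps only one region, drops the columns that department does not use, and stores monthly totals rather than detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Warehouse fact table (values from the Monthly_Sales_Summary_Table example).
CREATE TABLE sales_fact (month TEXT, prod_id INTEGER, region_id TEXT, net_sales INTEGER);
INSERT INTO sales_fact VALUES
    ('01-1996', 100, 'SW', 30000),
    ('02-1996', 140, 'NE', 23000),
    ('03-1996', 220, 'SW', 32000);

-- Hypothetical data mart for a Southwest sales department: only its region,
-- only the columns it uses, already summarized per month.
CREATE TABLE sw_sales_mart AS
    SELECT month, SUM(net_sales) AS net_sales
    FROM sales_fact
    WHERE region_id = 'SW'
    GROUP BY month;
""")

mart_rows = conn.execute("SELECT month, net_sales FROM sw_sales_mart ORDER BY month").fetchall()
print(mart_rows)
```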
[Figure: Data warehouse architecture with data marts. First tier: operational data sources, operational data stores (ODS 1, ODS 2, ODS 3), load manager, warehouse manager, query manager, and a DBMS holding meta-data, detailed data, lightly summarized data, and highly summarized data, plus archive/backup data and end-user access tools (reporting, query, application development, and EIS tools; OLAP tools; data mining). Second tier: data marts holding summarized data in a relational database and in a multidimensional database.]

Reasons for creating a Data Mart

- To give users access to the data they need to analyze most often
- To provide data in a form that matches the collective view of the data held by a group of users in a department or business function
- To improve end-user response time by reducing the volume of data to be accessed
- To provide appropriately structured data as dictated by the requirements of end-user access tools
- Data marts normally use less data, so tasks such as data cleansing, loading, transformation, and integration are far easier; hence implementing and setting up a data mart is simpler than establishing a corporate data warehouse

Data Mining

The process of extracting previously unknown, valid, and actionable information from large volumes of data and then using that information to make crucial business decisions. Applications: early warning systems, fraud detection, market research, direct mail.

Data mining provides techniques to:
- Detect trends or patterns and find correlations
- Data analysis
- Forecasting and business modeling
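Two of the simplest building blocks behind "find correlations" and "detect trends" are the Pearson correlation coefficient and a least-squares trend slope. The sketch below applies them to the three months of net and gross sales in the deck's example; three points are far too few for a meaningful result, so treat this purely as a demonstration of the arithmetic. `pearson` and `linear_trend` are illustrative helper names.

```python
import math

# Net and gross sales for 01-1996..03-1996 from the example fact table.
net_sales = [30000, 23000, 32000]
gross_sales = [50000, 42000, 49000]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def linear_trend(ys):
    """Least-squares slope of ys against periods 1..n (change per period)."""
    xs = range(1, len(ys) + 1)
    mx, my = sum(xs) / len(ys), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


r = pearson(net_sales, gross_sales)
slope = linear_trend(net_sales)
print(round(r, 3), slope)
```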
