Data Warehouse
Agenda
What is Data Warehouse
Transaction System vs Data Warehouse
Data Warehouse Architecture
Metadata
Data Flows
Issues for building Data Warehouse
Warehouse Schema
Tool & Technologies
Advantages of Data Warehouse
Problems
Data Mart
Data Mining
Data Warehouse
A collection of integrated, subject-oriented, time-variant, and non-volatile data in support of management's decision-making process.
It is described as the "single point of truth", the "corporate memory", and the sole historical register of virtually all transactions that occur in the life of an organization.
Supports management analysis and decision-making processes
Contains summarized, refined, and cleansed information
Non-volatile: provides a data snapshot; adjustments are not permitted, or are limited
Business analysis requirements drive the data structure and system design
Integrated, consistent information on a single technology platform
Users have direct, fast access via On-line Analytical Processing (OLAP) tools
Minimal impact on operational processes
[Figure: data warehouse architecture; surviving labels: ODS 2, ODS 3, DBMS, Warehouse Manager, OLAP tools, Data mining, Archive/backup data]
Operational data store (ODS): a repository of current and integrated operational data used for analysis.
Load manager: performs all the operations associated with the extraction and loading of data into the warehouse.
Warehouse manager: performs all the operations associated with the management of the data in the warehouse.
Query manager: also called the backend component; performs all the operations associated with the management of user queries.
End-user access tools: can be categorized into five main groups: data reporting and query tools, application development tools, executive information system (EIS) tools, online analytical processing (OLAP) tools, and data mining tools.
Summarized data: stores all the aggregations generated by the warehouse manager. It exists to speed up query performance and does not require backup.
Archive/backup data: backup ensures recovery of the data warehouse from data loss or failure. In archiving, older data is removed from the system in a format that allows it to be quickly restored if required.
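The summarized data described above can be pictured as aggregates computed from the detailed rows. The following is a minimal sketch, not a real warehouse manager: the row layout, the sales figures, and the `summarize` function are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical detailed rows: (month, region_id, gross_sales).
detailed_rows = [
    ("01-1996", "NE", 1200.0),
    ("01-1996", "NE", 800.0),
    ("01-1996", "SW", 500.0),
    ("02-1996", "SW", 700.0),
]


def summarize(rows):
    """Total gross sales per (month, region_id) pair."""
    totals = defaultdict(float)
    for month, region_id, gross_sales in rows:
        totals[(month, region_id)] += gross_sales
    return dict(totals)


# Queries can read these precomputed totals instead of scanning the detail.
summary = summarize(detailed_rows)
```

Because `summary` can be rebuilt from `detailed_rows` at any time, losing it costs only recomputation, which is one way to read the statement that summarized data does not need backup.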
Meta-data: data about data
The purpose of meta-data is to show the pathway back to where the data began, so that the warehouse administrators know the history of any item in the warehouse.
The meta-data associated with data transformation and loading must describe the source data and any changes that were made to the data.
The meta-data associated with data management describes the data as it is stored in the warehouse.
The query manager requires meta-data to generate appropriate queries; meta-data is also associated with the users of queries.
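One way to picture the "pathway back to where the data began" is a small lineage record kept per warehouse column. This is only a sketch: the `LineageRecord` class, its field names, and the example source system are hypothetical and do not come from any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class LineageRecord:
    """Hypothetical meta-data entry for one warehouse column."""
    warehouse_column: str
    source_system: str
    source_field: str
    transformations: list = field(default_factory=list)

    def trace(self):
        """Return the pathway from the source field to the warehouse column."""
        steps = [f"{self.source_system}.{self.source_field}"]
        steps.extend(self.transformations)
        steps.append(self.warehouse_column)
        return " -> ".join(steps)


# Example values are invented for illustration.
record = LineageRecord(
    warehouse_column="sales.gross_sales",
    source_system="orders_db",
    source_field="amount",
    transformations=["convert currency to USD", "round to 2 decimals"],
)
```

Calling `record.trace()` lists the source field, each recorded change, and the final warehouse column in order, which is the kind of history an administrator would consult.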
Data flows
Inflow: the processes associated with the extraction, cleansing, and loading of the data from the source systems into the data warehouse.
Upflow: the processes associated with adding value to the data in the warehouse through summarizing, packaging, and distribution of the data.
Downflow: the processes associated with archiving and backing up the data in the warehouse.
Outflow: the processes associated with making the data available to the end users.
Meta-flow: the processes associated with the management of the meta-data.
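The inflow step (extract, cleanse, load) can be sketched in a few lines. This is an illustrative assumption rather than a real load manager: the source records, the `cleanse` rules, and the `sales` table layout are all made up, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical raw records extracted from a source system.
source_records = [
    {"account": " ABC Electronics ", "region": "ne", "gross_sales": "1200.50"},
    {"account": "Midway Electric", "region": "SW", "gross_sales": "n/a"},
    {"account": "Zerox", "region": "sw", "gross_sales": "300"},
]


def cleanse(record):
    """Return a cleaned (account, region, gross_sales) tuple, or None if unusable."""
    try:
        sales = float(record["gross_sales"])
    except ValueError:
        return None  # reject rows whose sales figure cannot be parsed
    return (record["account"].strip(), record["region"].upper(), sales)


def load(conn, records):
    """Cleanse the records, insert the valid ones, and return how many were loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (account TEXT, region TEXT, gross_sales REAL)"
    )
    rows = [row for row in map(cleanse, records) if row is not None]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load(conn, source_records)
```

Here the record with an unparseable sales figure is dropped during cleansing, and the remaining rows reach the warehouse with trimmed names and upper-case region codes.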
[Figure: data flows in the warehouse; surviving labels: Upflow, DBMS, Warehouse Manager]
Issues for building Data Warehouse
When and how to gather data?
What schema to use?
Data cleansing
How to propagate updates?
What data to summarize?
Warehouse Schema
Fact Table:
Stores the business data. The data in a fact table are called facts, and they are multidimensional.
Dimension Table:
To minimize storage requirements, dimension attributes in the fact table are usually short identifiers that act as foreign keys into other tables, called dimension tables.
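The relationship between the two kinds of table can be sketched with plain Python structures. This is an illustration under assumptions: the dictionary stands in for a dimension table, the list for a fact table, and the sales figures are made-up values.

```python
# Dimension table: short identifier -> descriptive attribute.
region_dimension = {
    "NE": "Northeast",
    "NW": "Northwest",
    "SE": "Southeast",
    "SW": "Southwest",
}

# Fact rows store only the short region_id plus the measure
# (the gross_sales values are invented for illustration).
fact_rows = [
    {"region_id": "SW", "gross_sales": 500.0},
    {"region_id": "NE", "gross_sales": 1200.0},
    {"region_id": "SW", "gross_sales": 700.0},
]


def describe(fact):
    """Resolve a fact row's foreign key against the dimension table."""
    return (region_dimension[fact["region_id"]], fact["gross_sales"])


described = [describe(fact) for fact in fact_rows]
```

The long description "Southwest" is stored once in the dimension table, while each fact row repeats only the two-character key, which is where the storage saving comes from.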
[Figure: example fact and dimension tables; surviving labels: Product Number, Description Of Product, Area, Duration]
Star Schema
A fact table sits in the center, with all the dimension tables attached to it. Example: sales processing.
Dimension Tables

Region_Dimension_Table
region_id   region_doc
NE          Northeast
NW          Northwest
SE          Southeast
SW          Southwest

Product_Dimension_Table
prod_grp_id   prod_id   prod_grp_desc    prod_desc
10            100       Fewer devices    Power supply
20            140       Circuit boards   Motherboard
30            220       Components       Co-processor

Account_Dimension_Table
account_id   (account name)
100000       ABC Electronics
110000       Midway Electric
120000       Victor Components
130000       Washburn, Inc.
140000       Zerox

Vendor_Dimension_Table
vend_id   vendor_desc
100       PowerAge, Inc.
200       Advanced Micro Devices
300       Farad Incorporated

Time_Dimension_Table
month     mo_in_fiscal_yr   month_name
01-1996   4                 January
02-1996   5                 February
03-1996   6                 March

Fact Table

Monthly_Sales_Summary_Table
Columns: month, prod_id, region_id, account_id, gross_sales (sample region_id values: SW, NE, SW)
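A query against a star schema typically joins the fact table directly to each dimension it needs. The sketch below uses Python's built-in `sqlite3` module with two of the dimensions from the sales example; the lower-case table names are adapted for the sketch and the gross sales figures are invented, since the slide's fact rows are not available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region_dimension (region_id TEXT PRIMARY KEY, region_doc TEXT);
CREATE TABLE time_dimension (month TEXT PRIMARY KEY, month_name TEXT);
CREATE TABLE monthly_sales_summary (
    month TEXT REFERENCES time_dimension(month),
    region_id TEXT REFERENCES region_dimension(region_id),
    gross_sales REAL
);
INSERT INTO region_dimension VALUES ('NE', 'Northeast'), ('SW', 'Southwest');
INSERT INTO time_dimension VALUES ('01-1996', 'January'), ('02-1996', 'February');
-- The sales figures below are invented for illustration.
INSERT INTO monthly_sales_summary VALUES
    ('01-1996', 'SW', 500.0),
    ('01-1996', 'NE', 1200.0),
    ('02-1996', 'SW', 700.0);
""")

# Star join: the fact table joins directly to each dimension table.
query = """
SELECT r.region_doc, t.month_name, SUM(f.gross_sales)
FROM monthly_sales_summary AS f
JOIN region_dimension AS r ON r.region_id = f.region_id
JOIN time_dimension AS t ON t.month = f.month
GROUP BY r.region_doc, t.month_name
ORDER BY r.region_doc, t.month_name
"""
result = conn.execute(query).fetchall()
```

Every join in the query is one hop from the fact table, which is the defining shape of the star: no dimension table refers to another dimension table.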
Snowflake Schema
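A variant of the star schema in which dimension tables are normalized into several related tables.
[Figure: snowflake schema example; surviving labels: Product Category, Product Manufacturer]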
Starflake Schema
A combination of the star schema and the snowflake schema. It consists of a fact table, star dimensions, and snowflake dimensions.
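The mix of star and snowflake dimensions can be sketched with `sqlite3`, assuming the usual reading of a snowflake dimension as one that is normalized into several related tables. The table names and the single row of data are hypothetical; only the product and region values are borrowed from the sales example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflake dimension: the product group is split into its own table.
CREATE TABLE product_group (prod_grp_id INTEGER PRIMARY KEY, prod_grp_desc TEXT);
CREATE TABLE product (
    prod_id INTEGER PRIMARY KEY,
    prod_desc TEXT,
    prod_grp_id INTEGER REFERENCES product_group(prod_grp_id)
);
-- Star dimension: region stays a single denormalized table.
CREATE TABLE region (region_id TEXT PRIMARY KEY, region_doc TEXT);
CREATE TABLE sales_fact (
    prod_id INTEGER REFERENCES product(prod_id),
    region_id TEXT REFERENCES region(region_id),
    gross_sales REAL
);
INSERT INTO product_group VALUES (20, 'Circuit boards');
INSERT INTO product VALUES (140, 'Motherboard', 20);
INSERT INTO region VALUES ('NE', 'Northeast');
-- The gross_sales value is invented for illustration.
INSERT INTO sales_fact VALUES (140, 'NE', 950.0);
""")

row = conn.execute("""
SELECT g.prod_grp_desc, p.prod_desc, r.region_doc, f.gross_sales
FROM sales_fact AS f
JOIN product AS p ON p.prod_id = f.prod_id
JOIN product_group AS g ON g.prod_grp_id = p.prod_grp_id  -- extra hop (snowflake)
JOIN region AS r ON r.region_id = f.region_id             -- direct join (star)
""").fetchone()
```

The product side needs two joins to reach the group description, while the region side needs one; that trade of extra joins for less repeated data is the practical difference between the two dimension styles.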
[Figure: starflake schema example; surviving labels: Weight, Location]
Tool & Technologies
Data Extraction: SAS
Data Cleansing: Apertus, Trillium
Data Storage: ORACLE, SYBASE
Advantages of Data Warehouse
End users can access a wide variety of data
Supports business decision making for the future
Increases data consistency
Increases productivity
Decreases computing costs
Combines data
Problems
Increased end-user demands
High demand for resources
High maintenance
Extracting, cleansing, and loading data can be time consuming
Data warehousing increases project scope
Compatibility problems with systems already in place, e.g. a transaction processing system
Providing training to end users, who may end up not using the data warehouse
Security can develop into a serious issue, especially if the data warehouse is web accessible
Data mart
It is a subset of a data warehouse that supports the requirements of a particular department or business function.
The characteristics that differentiate data marts from data warehouses include:
A data mart focuses only on the requirements of users associated with one department or business function.
Data marts do not normally contain detailed operational data, unlike data warehouses.
As data marts contain less data than data warehouses, they are more easily understood and navigated.
[Figure: data warehouse architecture; surviving labels: ODS 1, ODS 2, ODS 3, Load Manager, Warehouse Manager, DBMS, Meta-data, Detailed data, Lightly summarized data, Highly summarized data, End-user access tools, OLAP tools, Data mining]
Reasons for creating a data mart:
To give users access to the data they need to analyze most often
To provide data in a form that matches the collective view of the data held by a group of users in a department or business function
To improve end-user response time by reducing the volume of data to be accessed
To provide appropriately structured data as dictated by the requirements of end-user access tools
Data marts normally use less data, so tasks such as data cleansing, loading, transformation, and integration are far easier; implementing and setting up a data mart is therefore simpler than establishing a corporate data warehouse
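A data mart as a departmental subset of the warehouse can be sketched with `sqlite3`. Everything here is an assumption for illustration: the `warehouse_sales` table, its `department` column, the invented rows, and the choice to materialize the mart as a separate summary table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse_sales (
    department TEXT, month TEXT, account_id INTEGER, gross_sales REAL
);
-- Invented rows for illustration.
INSERT INTO warehouse_sales VALUES
    ('marketing', '01-1996', 100000, 400.0),
    ('marketing', '01-1996', 110000, 600.0),
    ('finance',   '01-1996', 120000, 900.0),
    ('marketing', '02-1996', 100000, 250.0);
-- The mart keeps one department and summarizes away account-level detail.
CREATE TABLE marketing_mart AS
    SELECT month, SUM(gross_sales) AS gross_sales
    FROM warehouse_sales
    WHERE department = 'marketing'
    GROUP BY month;
""")

mart_rows = conn.execute(
    "SELECT month, gross_sales FROM marketing_mart ORDER BY month"
).fetchall()
```

The mart holds two rows where the warehouse holds four, and it carries no account-level detail, which mirrors the points about smaller volume and the absence of detailed operational data.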
Data Mining
The process of extracting previously unknown, valid, and actionable information from large data sets and then using that information to make crucial business decisions.
Applications: early warning systems, fraud detection, market research, direct mail.
Data mining provides data analysis techniques to detect trends or patterns and to find correlations.
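Finding a correlation, one of the techniques mentioned above, can be illustrated with the Pearson correlation coefficient. This is a small sketch, not a data mining tool: the `pearson` helper is written out by hand, and the advertising and sales figures are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Invented monthly figures for illustration.
ad_spend = [10.0, 20.0, 30.0, 40.0]
gross_sales = [110.0, 205.0, 310.0, 395.0]

r = pearson(ad_spend, gross_sales)
```

A value of `r` close to 1 suggests the two series rise together, and a value close to -1 suggests one falls as the other rises; neither by itself establishes cause.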