This document discusses data warehousing and related concepts. It covers:
- The need for a data warehouse to integrate data from different sources for analysis and decision making.
- Key components of a data warehouse including the data warehouse itself, data marts, extract/load/clean processes, and OLAP tools.
- Benefits of a data warehouse such as a single source of consistent data to support complex queries across business areas.
DATA WAREHOUSING
Different Business Needs
- "We need to get this modified order quickly to our European supplier."
- "Someone from XYZ Inc. wants to know the status of this order."
- "I need to find out why our sales in the south are dropping."
- "Where can I find a copy of last month's newsletter?"

The Technology Jigsaw Puzzle
- OLAP
- Data warehouse
- Intranet
- Decision support
- OLTP
- Web technology
- Flash monitoring & reporting
- Client/server
- Legacy

The InfoMotion Enterprise Architecture
(Diagram. Components shown: legacy systems, transactional Web scripts, informational Web scripts, decision support applications, flash monitoring & reporting, OLTP applications, workflow management clients, transactional Web services, informational Web services, data warehouse, operational data store, active database, workflow management services. Areas shown: virtual corporation, informational, decisional, operational.)

Business Perspective
- Decisional
- Informational
- Virtual corporation
- Operational

Technology Perspective: Operational Needs
- Legacy system
- OLTP application
- Active database
- Operational data store
- Flash monitoring and reporting
- Workflow management (groupware)

Operational Needs: Legacy Systems
- The term "legacy system" refers to any information system currently in use that was built using previous technology generations.
- Most legacy systems are operational in nature.
Operational Needs: OLTP Applications
- Online transaction processing (OLTP) refers to systems that automate and capture business transactions through the use of computer systems.
- OLTP applications produce reports that allow business users to track the status of transactions.
- OLTP applications and their related active databases compose the majority of client/server systems today.

Operational Needs: Active Database
- Databases began to take a more active role through database programming (e.g. stored procedures) and event management.
- IT professionals are now able to bulletproof the application by placing processing logic in the database itself.

Operational Needs: Operational Data Store
- An operational data store (ODS) is a collection of integrated databases designed to support the monitoring of operations.
- An ODS contains subject-oriented, volatile, and current enterprise-wide detailed information.
- An ODS is constantly refreshed so that the resulting image reflects the latest state of operations.
- (Diagram: integration and transformation feed the operational data store.)

Operational Needs: Flash Monitoring and Reporting
- These tools provide business users with a dashboard of meaningful online information on the operational status of the enterprise by making use of data in the operational data store.
Operational Needs: Workflow Management and Groupware
- Workflow management systems are tools that allow groups to communicate and coordinate their work.

Decisional Needs: Data Warehouse
- The data warehouse concept was developed as IT professionals realized that the structure of data required for transaction reporting was significantly different from the structure required to analyze data.

Decisional Needs: Decision Support Applications
- Decision support applications are also known as OLAP (online analytical processing) applications.
- These applications provide managerial users with meaningful views of past and present enterprise data.

The Producer Wants to Know ...
- Which are our lowest/highest margin customers?
- Who are my customers and what products are they buying?
- Which customers are most likely to go to the competition?
- What impact will new products/services have on revenue and margins?
- What product promotions have the biggest impact on revenue?
- What is the most effective distribution channel?

Data, Data Everywhere, Yet ...
- I can't find the data I need: data is scattered over the network; there are many versions with subtle differences.
- I can't get the data I need: I need an expert to get the data.
- I can't understand the data I found: the available data is poorly documented.
- I can't use the data I found: results are unexpected; data needs to be transformed from one form to another.

Definition
- A data warehouse is a type of contemporary database system designed to fulfill decision-support needs.
- A data warehouse differs from a conventional decision-support database in a number of ways:
  - Volume of data
  - Diverse data sources
  - Dimensional access

Inmon defines a data warehouse as "a subject-oriented, integrated, time-variant, and non-volatile collection of data used in support of management decision-making."

"A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a way they can understand and use in a business context."
[Barry Devlin]

What Is Data Warehousing?
"A process of transforming data into information and making it available to users in a timely enough manner to make a difference."
[Forrester Research, April 1996]
(Diagram: data is transformed into information.)

Benefits of Data Warehousing
- A data warehouse provides a single manageable structure for decision-support data.
- A data warehouse enables organizational users to run complex queries on data that traverses a number of business areas.
- A data warehouse enables a number of business intelligence applications such as online analytical processing and data mining.

What Are the Users Saying ...
- Data should be integrated across the enterprise.
- Summary data has real value to the organization.
- Historical data holds the key to understanding data over time.
- What-if capabilities are required.

Evolution
- 60s: Batch reports
  - hard to find and analyze information
  - inflexible and expensive; reprogram for every new request
- 70s: Terminal-based DSS and EIS (executive information systems)
  - still inflexible, not integrated with desktop tools
- 80s: Desktop data access and analysis tools
  - query tools, spreadsheets, GUIs
  - easier to use, but only access operational databases
- 90s: Data warehousing with integrated OLAP engines and tools

Very Large Data Bases
- Terabytes -- 10^12 bytes: Walmart -- 24 terabytes
- Petabytes -- 10^15 bytes: Geographic information systems
- Exabytes -- 10^18 bytes: National medical records
- Zettabytes -- 10^21 bytes: Weather images
- Yottabytes -- 10^24 bytes: Intelligence agency videos

Data Warehousing -- It Is a Process
- A technique for assembling and managing data from various sources for the purpose of answering business questions, thus enabling decisions that were not previously possible.
- A decision support database maintained separately from the organization's operational database.

Explorers, Farmers and Tourists
- Explorers: seek out the unknown and previously unsuspected rewards hiding in the detailed data.
- Farmers: harvest information from known access paths.
- Tourists: browse information harvested by farmers.

Data Warehouse Architecture
(Diagram. Sources: relational databases, legacy data, purchased data, ERP systems. Stages: extraction, cleansing, optimized loader. Core: data warehouse engine with a metadata repository. Front end: analyze, query.)

Data Mart
- From a data warehouse, data flows to various departments for their customized DSS usage. These individual departmental components are called data marts.
- A data mart is a subset of a data warehouse and is much more popular than a full data warehouse.

Example
(Figure: components of a supermarket chain application.)

Types of Data Marts
Data marts can be classified into two groups:
- Multidimensional OLAP (MDDB OLAP or MOLAP)
- Relational OLAP (ROLAP)

Loading a Data Mart
- A data mart is loaded with data from a data warehouse by means of a load program.
- Main points for a load program:
  - Frequency and schedule
  - Total or partial refreshment (or addition)
  - Customization of data from the warehouse (selection)
  - Re-sequencing and merging of data
  - Aggregation of data
  - Summarization
  - Efficiency (loading time optimization)
  - Integrity of data
  - Data relationship and integrity of data domains
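As an illustration of several of the load-program points above (total refreshment, selection, re-sequencing, aggregation, and an integrity check), here is a minimal Python sketch using the standard-library sqlite3 module. The table names `warehouse_sales` and `mart_sales`, their columns, and the sample rows are hypothetical and do not come from the slides; a real load program would also handle scheduling and load-time optimization.

```python
import sqlite3


def build_demo_warehouse():
    """Create an in-memory warehouse table and an empty mart table (toy data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE warehouse_sales (region TEXT, product TEXT, sale_date TEXT, amount REAL);
        CREATE TABLE mart_sales (region TEXT, product TEXT, month TEXT, total_amount REAL);
        """
    )
    conn.executemany(
        "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
        [
            ("south", "tea", "2024-01-05", 10.0),
            ("south", "tea", "2024-01-20", 15.0),
            ("south", "rice", "2024-02-01", 40.0),
            ("north", "tea", "2024-01-07", 99.0),
        ],
    )
    conn.commit()
    return conn


def load_sales_mart(conn, region):
    """Totally refresh the (hypothetical) departmental mart for one region."""
    with conn:  # commits on success, rolls back if the integrity check raises
        # Total refreshment: discard the previous contents of the mart.
        conn.execute("DELETE FROM mart_sales")
        # Selection (one region), aggregation (monthly SUM), re-sequencing (ORDER BY).
        conn.execute(
            """
            INSERT INTO mart_sales (region, product, month, total_amount)
            SELECT region, product, substr(sale_date, 1, 7), SUM(amount)
            FROM warehouse_sales
            WHERE region = ?
            GROUP BY region, product, substr(sale_date, 1, 7)
            ORDER BY product, substr(sale_date, 1, 7)
            """,
            (region,),
        )
        # Integrity of data: the mart total must match the selected warehouse rows.
        (source_total,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM warehouse_sales WHERE region = ?",
            (region,),
        ).fetchone()
        (mart_total,) = conn.execute(
            "SELECT COALESCE(SUM(total_amount), 0) FROM mart_sales"
        ).fetchone()
        if abs(source_total - mart_total) > 1e-9:
            raise ValueError("mart total does not match warehouse total")
    return conn.execute(
        "SELECT region, product, month, total_amount FROM mart_sales ORDER BY product, month"
    ).fetchall()
```

Calling `load_sales_mart(build_demo_warehouse(), "south")` returns the monthly totals per product for the selected region. Because the refresh is total, running the load twice leaves the mart in the same state.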
Metadata for a Data Mart
Components of metadata for a given data warehouse or data mart:
- Description of sources of data
- Description of customization that may have taken place as data passes from the data warehouse into the data mart
- Descriptive information about the data mart, its tables, attributes, relationships, etc.
- Definitions of all types
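One way to picture these metadata components is as a small record structure. The Python sketch below is only an assumption about how such metadata could be organized; the class and field names (`DataMartMetadata`, `TableInfo`, `ColumnInfo`, and so on) are hypothetical and are not taken from the slides or from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnInfo:
    name: str
    data_type: str
    definition: str  # business definition of the attribute


@dataclass
class TableInfo:
    name: str
    columns: list[ColumnInfo] = field(default_factory=list)


@dataclass
class DataMartMetadata:
    mart_name: str
    sources: list[str]         # description of sources of data
    customizations: list[str]  # changes applied between warehouse and mart
    tables: list[TableInfo] = field(default_factory=list)

    def describe(self, table_name, column_name):
        """Return the definition of a column, or None if it is not documented."""
        for table in self.tables:
            if table.name == table_name:
                for column in table.columns:
                    if column.name == column_name:
                        return column.definition
        return None


# Hypothetical metadata for a small sales mart.
sales_mart = DataMartMetadata(
    mart_name="sales_mart",
    sources=["enterprise data warehouse: sales subject area"],
    customizations=["only the south region", "amounts aggregated per month"],
    tables=[
        TableInfo(
            name="mart_sales",
            columns=[
                ColumnInfo("month", "TEXT", "calendar month of sale, YYYY-MM"),
                ColumnInfo("total_amount", "REAL", "sum of sale amounts in the month"),
            ],
        )
    ],
)
```

With this structure, `sales_mart.describe("mart_sales", "month")` looks up the documented definition of an attribute, which addresses the "available data poorly documented" complaint raised earlier in the deck.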
Data Mining Works with Warehouse Data
- Data warehousing provides the enterprise with a memory.
- Data mining provides the enterprise with intelligence.

We Want to Know ...
- Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
- Which types of transactions are likely to be fraudulent given the demographics and transactional history of a particular customer?
- If I raise the price of my product by Rs. 2, what is the effect on my returns?
- If I offer only 2,500 airline miles as an incentive to purchase rather than 5,000, how many lost responses will result?
- If I emphasize ease-of-use of the product as opposed to its technical capabilities, what will be the net effect on my revenues?
- Which of my customers are likely to be the most loyal?
Data mining helps extract such information.

Application Areas
Industry: Application
- Finance: credit card analysis
- Insurance: claims, fraud analysis
- Telecommunication: call record analysis
- Transport: logistics management
- Consumer goods: promotion analysis
- Data service providers: value-added data
- Utilities: power usage analysis

Why Separate Data Warehouse?
- Performance
  - Operational databases are designed and tuned for known transactions and workloads.
  - Complex OLAP queries would degrade performance for operational transactions.
  - Special data organization, access and implementation methods are needed for multidimensional views and queries.
- Function
  - Missing data: decision support requires historical data, which operational databases do not typically maintain.
  - Data consolidation: decision support requires consolidation (aggregation, summarization) of data from many heterogeneous sources: operational databases, external sources.
  - Data quality: different sources typically use inconsistent data representations, codes, and formats, which have to be reconciled.

Data Mining in Use
- The US Government uses data mining to track fraud
- A supermarket becomes an information broker
- Basketball teams use it to track game strategy
- Cross selling
- Warranty claims routing
- Holding on to good customers
- Weeding out bad customers

What Makes Data Mining Possible?
Advances in the following areas are making data mining deployable:
- data warehousing
- better and more data (i.e., operational, behavioral, and demographic)
- the emergence of easily deployed data mining tools
- the advent of new data mining techniques
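The data-quality point under "Why Separate Data Warehouse?" (different sources use inconsistent representations, codes, and formats that have to be reconciled) can be sketched in a few lines of Python. The code tables, field names, and date formats below are illustrative assumptions, not values taken from the slides.

```python
from datetime import datetime

# Hypothetical mapping from source-specific gender codes to one warehouse code.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}

# Hypothetical date formats used by different source systems.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def normalize_date(text):
    """Try each known source format and return an ISO 8601 date string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def reconcile(record):
    """Map one source record onto the warehouse's single representation."""
    return {
        "customer_id": str(record["customer_id"]).strip(),
        "gender": GENDER_CODES[str(record["gender"]).strip().lower()],
        "opened_on": normalize_date(record["opened_on"]),
    }
```

For example, `reconcile({"customer_id": 17, "gender": "1", "opened_on": "2001-01-05"})` and a record from another system with `"gender": "male"` and `"opened_on": "05/01/2001"` both end up with the same gender code and date format. Unknown codes raise an error instead of being loaded silently.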
What Are Operational Systems?
- They are OLTP systems
- Run mission-critical applications
- Need to work with stringent performance requirements for routine tasks
- Used to run a business!

RDBMS Used for OLTP
- Database systems have been used traditionally for OLTP:
  - clerical data processing tasks
  - detailed, up-to-date data
  - structured, repetitive tasks
  - read/update a few records
  - isolation, recovery and integrity are critical

Operational Systems
- Run the business in real time
- Based on up-to-the-second data
- Optimized to handle large numbers of simple read/write transactions
- Optimized for fast response to predefined transactions
- Used by people who deal with customers, products -- clerks, salespeople, etc.
- They are increasingly used by customers

Examples of Operational Data
Data | Industry | Usage | Technology | Volumes
Customer File | All | Track customer details | Legacy application, flat files, mainframes | Small-medium
Account Balance | Finance | Control account activities | Legacy applications, hierarchical databases, mainframe | Large
Point-of-Sale Data | Retail | Generate bills, manage stock | ERP, client/server, relational databases | Very large
Call Record | Telecommunications | Billing | Legacy application, hierarchical database, mainframe | Very large
Production Record | Manufacturing | Control production | ERP, relational databases, AS/400 | Medium

So, What's Different?

Application-Orientation vs. Subject-Orientation
- Application-orientation (operational database): Loans, Credit Card, Trust, Savings
- Subject-orientation (data warehouse): Customer, Vendor, Product, Activity

OLTP vs. Data Warehouse
- OLTP systems are tuned for known transactions and workloads, while the workload is not known a priori in a data warehouse.
- Special data organization, access methods and implementation methods are needed to support data warehouse queries (typically multidimensional queries), e.g., average amount spent on phone calls between 9AM-5PM in Pune during the month of December.
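The example query above (average amount spent on phone calls between 9AM and 5PM in Pune during December) combines three dimensions: location, time of day, and month. A minimal sketch of it in Python with the standard-library sqlite3 module follows. The `call_facts` table, its columns, and the sample rows are hypothetical; a real warehouse would more likely use a star schema with separate dimension tables.

```python
import sqlite3


def build_call_facts():
    """Create a toy fact table of phone calls (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE call_facts (city TEXT, call_start TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO call_facts VALUES (?, ?, ?)",
        [
            ("Pune", "2023-12-04 10:15:00", 12.0),
            ("Pune", "2023-12-11 16:59:00", 8.0),
            ("Pune", "2023-12-11 20:30:00", 50.0),    # outside 9AM-5PM
            ("Pune", "2023-11-30 11:00:00", 70.0),    # not December
            ("Mumbai", "2023-12-05 12:00:00", 90.0),  # other city
        ],
    )
    conn.commit()
    return conn


def average_daytime_spend(conn, city, month):
    """Average call amount for a city and month ('01'..'12'), 9AM to 5PM only."""
    row = conn.execute(
        """
        SELECT AVG(amount)
        FROM call_facts
        WHERE city = ?
          AND strftime('%m', call_start) = ?
          AND time(call_start) >= '09:00:00'
          AND time(call_start) < '17:00:00'
        """,
        (city, month),
    ).fetchone()
    return row[0]  # None when no call matches
```

A call such as `average_daytime_spend(build_call_facts(), "Pune", "12")` scans and filters the whole fact table rather than touching a few records by key, which is the access pattern that the slides contrast with OLTP workloads.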
OLTP vs. Data Warehouse
OLTP:
- Application oriented
- Used to run business
- Detailed data
- Current, up to date
- Isolated data
- Repetitive access
- Clerical user
Warehouse (DSS):
- Subject oriented
- Used to analyze business
- Summarized and refined
- Snapshot data
- Integrated data
- Ad-hoc access
- Knowledge user (manager)

OLTP vs. Data Warehouse (continued)
OLTP:
- Performance sensitive
- Few records accessed at a time (tens)
- Read/update access
- No data redundancy
- Database size 100 MB - 100 GB
Data warehouse:
- Performance relaxed
- Large volumes accessed at a time (millions)
- Mostly read (batch update)
- Redundancy present
- Database size 100 GB - few terabytes

OLTP vs. Data Warehouse (continued)
OLTP:
- Transaction throughput is the performance metric
- Thousands of users
- Managed in entirety
Data warehouse:
- Query throughput is the performance metric
- Hundreds of users
- Managed by subsets

To Summarize ...
- OLTP systems are used to run a business