Business Intelligence: Lecture # 1
Evolving BI technologies
A Data Warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process.
Subject Oriented
A data warehouse is organized around major subjects, such as
Customer
Supplier
Product
Sales.
Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
Integrated
A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and online transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.
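As a rough illustration of what integration can involve, the sketch below maps two sources onto one naming convention, one gender encoding, and one unit of measure. The source names, field names, and code table are all hypothetical and chosen only for this example.

```python
# Hypothetical records: two source systems describe customers differently.
crm_rows = [{"cust_id": 1, "gender": "M", "height_cm": 180.0}]
billing_rows = [{"customerNo": 2, "sex": "female", "height_in": 65.0}]

# Assumed target encoding for gender in the warehouse.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def from_crm(row):
    """Map a CRM record to the common warehouse layout."""
    return {
        "customer_id": row["cust_id"],
        "gender": GENDER_CODES[row["gender"].lower()],
        "height_cm": row["height_cm"],
    }


def from_billing(row):
    """Map a billing record: rename fields, recode gender, convert inches to cm."""
    return {
        "customer_id": row["customerNo"],
        "gender": GENDER_CODES[row["sex"].lower()],
        "height_cm": round(row["height_in"] * 2.54, 1),
    }


# After integration every record uses the same names, codes, and units.
integrated = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
```

Each source gets its own small mapping function, so adding a third source would not disturb the existing ones.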
Time-variant
Data are stored to provide information from a historical perspective (e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.
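One way to picture a key that carries an element of time is a price history keyed by product and effective date. This is only a sketch: the table, the product ID, and the helper `price_as_of` are hypothetical.

```python
from datetime import date

# Hypothetical price history: the key is (product_id, valid_from),
# so the time element is an explicit part of the key.
price_history = {
    ("P1", date(2020, 1, 1)): 10.0,
    ("P1", date(2022, 6, 1)): 12.5,
}


def price_as_of(product_id, as_of):
    """Return the price in effect on `as_of`, or None if there is none yet."""
    candidates = [
        (valid_from, price)
        for (pid, valid_from), price in price_history.items()
        if pid == product_id and valid_from <= as_of
    ]
    if not candidates:
        return None
    # The latest valid_from on or before `as_of` wins.
    return max(candidates)[1]
```

Because old rows are kept next to new ones, a question such as "what was the price in 2021?" can still be answered after the price changes.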
Nonvolatile
A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms.
Data in the data warehouse is never overwritten or deleted: once committed, the data is static, read-only, and retained for future reporting.
Data is loaded, but not updated.
When subsequent changes occur, a new snapshot record is written.
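The load-only behaviour can be sketched with an append-only store, shown below under the assumption of a simple in-memory list. The class `SnapshotStore` and its methods are hypothetical names invented for this example.

```python
from datetime import date


class SnapshotStore:
    """Append-only store: rows are loaded, never updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, key, attributes, snapshot_date):
        # A change in the source produces a NEW snapshot row;
        # earlier rows for the same key are left untouched.
        self._rows.append({"key": key, "snapshot_date": snapshot_date, **attributes})

    def history(self, key):
        """Return every snapshot ever loaded for `key`, in load order."""
        return [row for row in self._rows if row["key"] == key]


store = SnapshotStore()
store.load("C1", {"city": "Lahore"}, date(2023, 1, 1))
store.load("C1", {"city": "Karachi"}, date(2024, 1, 1))  # customer moved
```

After the second load, both rows for `C1` are still available, so reports on earlier periods keep seeing the earlier city.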
Components of DW
Data Staging Area
Data Extraction and Loading
The Warehouse
Analyze and Query (OLAP Tools)
Metadata
Data Presentation
The data presentation area is where data is organized, stored, and made available for direct querying by users, report writers, and other analytical applications.
Since the backroom staging area is off-limits, the presentation area is the data warehouse as far as the business community is concerned.
Data in the queryable presentation area of the data warehouse must be dimensional and must be atomic.
Metadata
Data about data.
Metadata structure the information in the data warehouse in categories, topics, groups, hierarchies, and so on.
Metadata are subject oriented and are based on abstractions of real-world entities, for example, project, customer, or organization.
Metadata define the way in which the transformed data is to be interpreted, for example, whether 5/9/99 means 5th September 1999 (British) or 9th May 1999 (US).
Metadata give information about related data in the data warehouse.
Metadata help estimate response time by showing the number of records to be processed in a query.
Metadata hold calculated fields and pre-calculated formulas to avoid misinterpretation, and contain historical changes of a view.
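The 5/9/99 ambiguity can be made concrete with a small sketch in which metadata records the date format of each source. The source names `uk_sales` and `us_sales` and the helper `parse_source_date` are hypothetical.

```python
from datetime import datetime

# Hypothetical metadata: the date format each source system uses.
SOURCE_DATE_FORMAT = {
    "uk_sales": "%d/%m/%y",  # British: day/month/year
    "us_sales": "%m/%d/%y",  # US: month/day/year
}


def parse_source_date(source, raw):
    """Interpret a raw date string according to the metadata for its source."""
    return datetime.strptime(raw, SOURCE_DATE_FORMAT[source]).date()


uk_date = parse_source_date("uk_sales", "5/9/99")  # 5th September 1999
us_date = parse_source_date("us_sales", "5/9/99")  # 9th May 1999
```

The same raw string yields two different dates, which is why the interpretation rule has to be stored with the data rather than guessed at query time.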
ODS
An operational data store (ODS) presents a consistent picture of the current data stored and managed by transaction processing systems. As data is modified in the source system, a copy of the changed data is moved into the operational data store. Existing data in the operational data store is updated to reflect the current status of the source system. Typically, the data is stored in real time and used for day-to-day management of business operations.
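A minimal sketch of the update-in-place behaviour described above, assuming a hypothetical change feed of customer status values: each change overwrites the existing entry, so the ODS only ever holds the current state.

```python
# Hypothetical change feed from a source system: (customer_id, current_status).
changes = [("C1", "active"), ("C2", "active"), ("C1", "suspended")]

# The ODS keeps one row per key and updates it in place,
# so only the latest status of each customer survives.
ods = {}
for customer_id, status in changes:
    ods[customer_id] = status
```

This is the opposite of the nonvolatile warehouse, where the earlier status of `C1` would be kept as a separate snapshot.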
Tier architectures
Popular DW architectures
Basic
With Staging
Generic Two-Level Architecture
Independent Data Mart
Dependent Data Mart and Operational Data Store
Logical Data Mart and Real-Time Data Warehouse
Three-Layer Architecture
Near real-time ETL for Data Warehouse
Data marts are NOT separate databases, but logical views of the data warehouse.
Easier to create new data marts.
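A data mart defined as a logical view can be sketched with Python's built-in `sqlite3` module. The table `warehouse_sales`, the view `north_sales_mart`, and the sample rows are hypothetical; a real warehouse would use its own DBMS and schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table holding sales for all regions.
conn.execute("CREATE TABLE warehouse_sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [("North", "Widget", 100.0), ("South", "Widget", 250.0), ("North", "Gadget", 75.0)],
)

# The "data mart" is only a view: no data is copied out of the warehouse.
conn.execute(
    "CREATE VIEW north_sales_mart AS "
    "SELECT product, amount FROM warehouse_sales WHERE region = 'North'"
)

north_rows = conn.execute(
    "SELECT product, amount FROM north_sales_mart ORDER BY product"
).fetchall()
```

Creating another mart is one more `CREATE VIEW` statement, which is the sense in which new data marts are easier to create in this architecture.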
OLTP
OLTP (OnLine Transaction Processing):
Also known as operational data, it represents day-to-day operational business activities:
Purchasing, sales, production, distribution
Typically for data entry and retrieval transaction processing
Reflects only the current state of the data
OLAP
OLAP (OnLine Analytical Processing)
OLTP:
Application oriented
Used by clerical staff for day-to-day operations
OLAP:
Subject oriented
Used by top managers for analysis
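The contrast can be sketched with two kinds of statements against the same hypothetical `orders` table, again using `sqlite3`: an OLTP-style transaction touches one current record, while an OLAP-style query summarizes many records by subject. In practice the two workloads usually run on separate systems; a single table is used here only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "Widget", 2023, 100.0), (2, "Gadget", 2023, 50.0), (3, "Widget", 2024, 70.0)],
)

# OLTP style: a short transaction that changes one current record.
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (80.0, 3))

# OLAP style: a read-only query that aggregates over many records.
totals = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
```

The update serves the clerk entering a correction; the grouped totals serve the manager comparing years.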
OLTP characteristics:
Function: day-to-day operations
DB design: ER based, application-oriented
Data: current; guaranteed up-to-date
Summarization: primitive, highly detailed
View: detailed, flat relational
Metric
Thank you