Business Intelligence: Lecture # 1

The document provides an overview of business intelligence and data warehousing concepts. It defines a data warehouse as a subject-oriented, integrated, time-variant and non-volatile collection of data used for management decision making. Key components of a data warehouse include the data staging area, extraction and loading processes, the warehouse itself, analytics and query tools, and metadata. The document also discusses architectures such as independent vs dependent data marts, and contrasts operational data stores with data warehouses.

Uploaded by

Talha Khalid
Copyright
© Attribution Non-Commercial (BY-NC)

Business Intelligence

Lecture # 1

Evolving BI technologies


Data Warehouse Definition

A Data Warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data used in support of management decision making processes.

-- Inmon & Hackathorn, 1994: viz. Hoffer, Chap 11

Subject Oriented
A data warehouse is organized around major subjects, such as customer, supplier, product, and sales.
Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.


Integrated
A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and online transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.


Time-variant
Data are stored to provide information from a historical perspective (e.g., the past 5 to 10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.


Nonvolatile
A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms.

It usually requires only two operations in data accessing: initial loading of data and access of data.



Nonvolatile
Data in the data warehouse is never overwritten or deleted. Once committed, the data is static, read-only, and retained for future reporting.
Data is loaded, but not updated.
When subsequent changes occur, a new snapshot record is written.
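The "load, never update" rule can be illustrated with a small append-only table. This is only a sketch under assumptions: the names `CustomerSnapshot` and `SnapshotTable` and the sample cities and dates are hypothetical and do not come from the lecture.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a committed snapshot cannot be modified
class CustomerSnapshot:
    customer_id: int
    city: str
    loaded_on: date  # the time element carried by every record


class SnapshotTable:
    """Append-only table: rows are loaded and read, never updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, snapshot):
        # A change in the source produces a new row instead of an overwrite.
        self._rows.append(snapshot)

    def history(self, customer_id):
        return [r for r in self._rows if r.customer_id == customer_id]

    def latest(self, customer_id):
        return max(self.history(customer_id), key=lambda r: r.loaded_on)


table = SnapshotTable()
table.load(CustomerSnapshot(1, "Lahore", date(2010, 1, 1)))
table.load(CustomerSnapshot(1, "Karachi", date(2011, 6, 1)))  # customer moved
```

Both rows remain available through `history(1)`, so a report can still show where the customer was at any earlier load date.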


Components of DW

Data Staging Area
Data Extraction and Loading
The Warehouse
Analyze and Query -- OLAP Tools
Metadata


Data Staging Area


The data staging area of the data warehouse is both a storage area and a set of processes commonly referred to as extract-transform-load (ETL). The data staging area is everything between the operational source systems and the data presentation area. The key architectural requirement for the data staging area is that it is off-limits to business users and does not provide query and presentation services.
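A minimal sketch of the three ETL steps may help fix the idea. Everything here is an assumption made for illustration: the CSV source, the column names, and the gender code mapping are hypothetical, and a real staging area would normally rely on dedicated ETL tooling.

```python
import csv
import io

# Hypothetical extract from an operational source, with inconsistent encodings.
SOURCE_CSV = "cust_id,name,gender\n1, alice ,F\n2,Bob,male\n"

# One agreed encoding for the warehouse (part of "integrated").
GENDER_CODES = {"f": "F", "female": "F", "m": "M", "male": "M"}


def extract(text):
    """Read raw rows from the source system."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Clean names and map every gender value onto one encoding."""
    cleaned = []
    for row in rows:
        cleaned.append({
            "customer_id": int(row["cust_id"]),
            "name": row["name"].strip().title(),
            "gender": GENDER_CODES[row["gender"].strip().lower()],
        })
    return cleaned


def load(rows, presentation_table):
    """Append the cleaned rows to the presentation area."""
    presentation_table.extend(rows)


presentation_table = []
load(transform(extract(SOURCE_CSV)), presentation_table)
```

Business users would only ever query `presentation_table`; the intermediate results of `extract` and `transform` stay in the staging code.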


Data Presentation
The data presentation area is where data is organized, stored, and made available for direct querying by users, report writers, and other analytical applications. Since the backroom staging area is off-limits, the presentation area is the data warehouse as far as the business community is concerned. Data in the queryable presentation area of the data warehouse must be dimensional and must be atomic.


Data Access Tools


We use the term "tool" loosely to refer to the variety of capabilities that can be provided to business users to leverage the presentation area for analytic decision making. By definition, all data access tools query the data in the data warehouse's presentation area. A data access tool can be as simple as an ad hoc query tool or as complex as a sophisticated data mining or modeling application.


Metadata
Data about data.
The metadata structures the information in the data warehouse in categories, topics, groups, hierarchies and so on.
Metadata are subject oriented and are based on abstractions of real-world entities, for example, project, customer, or organization.
Metadata define the way in which the transformed data is to be interpreted. For example, does 5/9/99 mean 5th September 1999 (British) or 9th May 1999 (US)?
Metadata give information about related data in the data warehouse.
Metadata estimate response time by showing the number of records to be processed in a query.
Metadata hold calculated fields and pre-calculated formulas to avoid misinterpretation, and contain historical changes of a view.
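The 5/9/99 ambiguity can be made concrete with a short sketch. The `DATE_FORMATS` mapping and the `interpret_date` helper are hypothetical names chosen for illustration; they stand in for whatever metadata record says which convention a source uses.

```python
from datetime import datetime

# Hypothetical metadata entry: which date convention a source system uses.
DATE_FORMATS = {
    "british": "%d/%m/%y",  # day first
    "us": "%m/%d/%y",       # month first
}


def interpret_date(raw, convention):
    """Parse a raw date string using the convention recorded in the metadata."""
    return datetime.strptime(raw, DATE_FORMATS[convention]).date()


british = interpret_date("5/9/99", "british")  # 5th September 1999
us = interpret_date("5/9/99", "us")            # 9th May 1999
```

The same string yields two different dates, which is why the interpretation has to be recorded alongside the data rather than guessed at query time.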


ODS
An operational data store (ODS) presents a consistent picture of the current data stored and managed by transaction processing systems. As data is modified in the source system, a copy of the changed data is moved into the operational data store. Existing data in the operational data store is updated to reflect the current status of the source system. Typically, the data is stored in real time and used for day-to-day management of business operations.
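As a rough sketch of this behavior, an ODS can be pictured as a keyed store in which each change from the source replaces the previous version of the record. The `apply_change` helper and the order fields are hypothetical, chosen only to illustrate update in place.

```python
def apply_change(ods, record):
    """Overwrite the stored record so the ODS mirrors the source's current state."""
    ods[record["order_id"]] = record


ods = {}
apply_change(ods, {"order_id": 7, "status": "placed"})
apply_change(ods, {"order_id": 7, "status": "shipped"})  # source row was modified
```

After the second change only the "shipped" version of order 7 remains; the earlier state is gone, which is the key difference from a nonvolatile warehouse.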


Tier architectures
Popular DW architectures

Generic Two-Tier Architecture
Independent Data Mart
Dependent Data Mart and Operational Data Store
Logical Data Mart and Active Warehouse
Three-Tier Architecture


Data Warehouse Architectures

Basic

With Staging

With Staging and Data Marts



Data Warehouse Architectures

Generic Two-Level Architecture
Independent Data Mart
Dependent Data Mart and Operational Data Store
Logical Data Mart and Real-Time Data Warehouse
Three-Layer Architecture


Generic two-level data warehousing architecture

Periodic extraction: data is not completely current in the warehouse



Independent data mart data warehousing architecture


Data marts:
Mini-warehouses, limited in scope


Separate ETL for each independent data mart

Data access complexity due to multiple data marts

Dependent data mart with operational data store: a three-level architecture

Single ETL for enterprise data warehouse (EDW)

ODS provides option for obtaining current data


Dependent data marts loaded from EDW

Logical data mart and real time warehouse architecture


ODS and data warehouse are one and the same

Near real-time ETL for Data Warehouse

Data marts are NOT separate databases, but logical views of the data warehouse
Easier to create new data marts
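One way to picture a logical data mart is as a SQL view defined over the warehouse tables. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the `sales` table, the `north_sales_mart` view, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A single warehouse table holds the data.
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "Widget", 100.0), ("South", "Widget", 250.0), ("North", "Gadget", 75.0)],
)

# The "data mart" is only a view: no data is copied and no separate ETL is needed.
conn.execute(
    "CREATE VIEW north_sales_mart AS "
    "SELECT product, amount FROM sales WHERE region = 'North'"
)

north_rows = conn.execute(
    "SELECT product, amount FROM north_sales_mart ORDER BY product"
).fetchall()
```

Adding another mart means adding another `CREATE VIEW` statement, which is why new data marts are easier to create in this architecture.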

Three-layer data architecture for a data warehouse


OLTP
OLTP (OnLine Transaction Processing):

Also known as operational data, it represents day-to-day operational business activities:
Purchasing, sales, production, distribution

Typically used for data entry and retrieval transaction processing
Reflects only the current state of the data


OLAP
Online Analytical Processing

Activities performed by end users in online systems:
Specific, open-ended query generation (SQL)
Ad hoc reports
Statistical analysis
Building DSS applications
Modeling and visualization capabilities

Special class of tools:
DSS/BI/BA front ends
Data access front ends
Database front ends
Visual information access systems


OLTP vs. Data Warehouse


OLTP                                              OLAP/DW
Holds current data                                Holds historic data
Data is dynamic                                   Data is largely static
Read/write accesses                               Read-only accesses
Repetitive processing                             Ad hoc complex queries
Transaction driven                                Analysis driven
Application oriented                              Subject oriented
Used by clerical staff for day-to-day operations  Used by top managers for analysis
Normalized data model (ER model)                  Denormalized data model (dimensional model)
Must be optimized for writes and small queries    Must be optimized for queries involving a large portion of the warehouse
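The contrast in access patterns can be sketched with two queries against the same small table, using Python's built-in sqlite3 module. The `orders` table and its rows are hypothetical, and a real warehouse would hold the analytical data in a separate, dimensionally modeled store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "North", 100.0), (2, "South", 250.0), (3, "North", 75.0)],
)

# OLTP style: a short read/write transaction touching one row by primary key.
conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (120.0, 1))
one_order = conn.execute(
    "SELECT amount FROM orders WHERE order_id = ?", (1,)
).fetchone()

# OLAP style: a read-only query that scans and summarizes many rows.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
```

The first pattern benefits from an index on the key and fast writes; the second benefits from storage optimized for scanning large portions of the data.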


OLTP vs. Data Warehouse


Feature                     OLTP                                  OLAP
Characteristic              operational processing                informational processing
Orientation                 transaction                           analysis
User                        clerk, DBA, database professional     knowledge worker (e.g., manager, executive, analyst)
Function                    day-to-day operations                 long-term informational requirements, decision support
DB design                   ER based, application-oriented        star/snowflake, subject-oriented
Data                        current; guaranteed up-to-date        historical; accuracy maintained over time
Summarization               primitive, highly detailed            summarized, consolidated
View                        detailed, flat relational             summarized, multidimensional
Unit of work                short, simple transaction             complex query
Access                      read/write                            mostly read
Focus                       data in                               information out
Operations                  index/hash on primary key             lots of scans
Number of records accessed  tens                                  millions
Number of users             thousands                             hundreds
DB size                     100 MB to GB                          100 GB to TB
Priority                    high performance, high availability   high flexibility, end-user autonomy
Metric                      transaction throughput                query throughput, response time

Thank you
