Lecture 1 - Introduction
Introduction
Data
Items that are the most elementary descriptions of
things, events, activities, and transactions
May be internal or external
Information
Organized data that has meaning and value
Knowledge
Processed data or information that conveys
understanding or learning applicable to a problem or
activity
Data, information, and knowledge
What is Business Intelligence (BI)?
All processes, techniques, and tools that support
business decision making based on information
technology.
What is a Data Warehouse (DWH)?
A decision support database that is maintained separately
from the organization’s operational database
Supports information processing by providing a solid
platform of consolidated, historical data for analysis.
Data Warehouse—Subject-Oriented
Organized around major subjects, such as customer,
product, sales.
Focuses on the modeling and analysis of data for
decision makers, not on daily operations or transaction
processing.
Provides a simple and concise view of particular
subject issues by excluding data that are not useful in
the decision support process.
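As a minimal sketch of subject orientation, the Python below projects hypothetical operational order records onto a "sales" subject and summarizes them per customer. The record layout, the field names, and the operational bookkeeping fields that get dropped (`clerk_id`, `lock_flag`) are all assumptions made for illustration.

```python
# Hypothetical operational order records: subject data mixed with
# operational bookkeeping fields (clerk_id, lock_flag) that analysts do not need.
orders = [
    {"order_id": 1, "customer": "Acme", "product": "Widget", "amount": 120.0,
     "clerk_id": 7, "lock_flag": False},
    {"order_id": 2, "customer": "Acme", "product": "Gadget", "amount": 80.0,
     "clerk_id": 3, "lock_flag": True},
    {"order_id": 3, "customer": "Bolt", "product": "Widget", "amount": 50.0,
     "clerk_id": 7, "lock_flag": False},
]

# Fields assumed to belong to the "sales" subject; everything else is excluded.
SALES_SUBJECT_FIELDS = ("customer", "product", "amount")


def subject_view(records, fields):
    """Project operational records onto the fields of one subject."""
    return [{f: r[f] for f in fields} for r in records]


def sales_by_customer(view):
    """Summarize the sales subject per customer."""
    totals = {}
    for row in view:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


sales_view = subject_view(orders, SALES_SUBJECT_FIELDS)
print(sales_by_customer(sales_view))  # {'Acme': 200.0, 'Bolt': 50.0}
```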
Data Warehouse—Integrated
Constructed by integrating multiple,
heterogeneous data sources
relational databases, flat files, on-line transaction records
Data cleaning and data integration techniques are
applied.
Ensures consistency in naming conventions, encoding
structures, attribute measures, etc., among the different data
sources
E.g., hotel price: currency, tax, whether breakfast is covered, etc.
When data is moved to the warehouse, it is converted.
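The hotel-price example can be sketched as a small conversion step applied while loading. Everything here is assumed for illustration: the two sources and their conventions, the exchange rate, the breakfast price, and the warehouse convention itself (USD, tax included, breakfast excluded).

```python
from dataclasses import dataclass

# Assumed constants, for illustration only (not real rates).
EUR_TO_USD = 1.10
BREAKFAST_USD = 15.0


@dataclass
class HotelPrice:
    """Assumed warehouse convention: USD, tax included, breakfast excluded."""
    hotel: str
    price_usd: float


def from_source_a(hotel, price_eur, tax_rate):
    # Hypothetical source A quotes EUR before tax; breakfast is never included.
    return HotelPrice(hotel, round(price_eur * (1 + tax_rate) * EUR_TO_USD, 2))


def from_source_b(hotel, price_usd, breakfast_covered):
    # Hypothetical source B quotes USD with tax; some prices bundle breakfast.
    net = price_usd - BREAKFAST_USD if breakfast_covered else price_usd
    return HotelPrice(hotel, round(net, 2))


warehouse = [
    from_source_a("Seaside", price_eur=100.0, tax_rate=0.10),
    from_source_b("Hilltop", price_usd=140.0, breakfast_covered=True),
]
print(warehouse)
```

After conversion, both rows can be compared directly because they follow one convention, regardless of how each source quoted its price.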
Data Warehouse—Time Variant
The time horizon for the data warehouse is
significantly longer than that of operational systems.
Operational database: current value data.
Data warehouse data: provides information from a
historical perspective (e.g., past 5-10 years)
Every key structure in the data warehouse contains an element
of time, explicitly or implicitly.
The key of operational data, in contrast, may or may not contain
a "time element".
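A minimal sketch of the time-variant property, assuming a hypothetical account table: the operational store is keyed by account alone and keeps only the current balance, while the warehouse adds a snapshot date to the key so earlier values remain available for analysis.

```python
from datetime import date

# Operational store: keyed by account only, so it holds just the current value.
operational = {"A-1": 500.0}

# Warehouse store: the snapshot date is part of the key (explicit time element).
warehouse = {}


def take_snapshot(as_of):
    """Copy the current operational values into the warehouse, keyed by date."""
    for account_id, balance in operational.items():
        warehouse[(account_id, as_of)] = balance


take_snapshot(date(2024, 1, 31))
operational["A-1"] = 650.0            # an OLTP update overwrites the old value
take_snapshot(date(2024, 2, 29))

print(operational)                    # only the latest balance survives
print(sorted(warehouse.items()))      # both month-end balances remain
```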
Data Warehouse—Non-Volatile
A physically separate store of data transformed
from the operational environment.
Operational update of data does not occur in the
data warehouse environment.
Does not require transaction processing, recovery, and concurrency control mechanisms.
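Non-volatility can be illustrated with a toy table that offers only two operations, loading and reading, and deliberately has no update or delete method. The class and method names are hypothetical; `MappingProxyType` from the standard library is used here only to hand out read-only rows.

```python
from types import MappingProxyType


class NonVolatileStore:
    """Toy warehouse table: rows can be bulk-loaded and read, never updated."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Append transformed rows; rows that are already loaded stay untouched.
        self._rows.extend(MappingProxyType(dict(r)) for r in rows)

    def scan(self, predicate=lambda row: True):
        # Read-only access: each row is returned as a read-only mapping.
        return [row for row in self._rows if predicate(row)]


store = NonVolatileStore()
store.load([{"month": "2024-01", "sales": 100}, {"month": "2024-02", "sales": 130}])
store.load([{"month": "2024-03", "sales": 90}])          # a later periodic load
recent = store.scan(lambda row: row["month"] >= "2024-02")
print([row["sales"] for row in recent])  # [130, 90]
```

Assigning to a returned row, e.g. `recent[0]["sales"] = 0`, raises `TypeError`, which mirrors the idea that operational updates do not happen in the warehouse.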
Data Warehouse vs. Traditional Integration in Heterogeneous DBMS
Traditional heterogeneous DB integration:
Build wrappers/mediators on top of heterogeneous databases
Query-driven approach
When a query is posed to a client site, a meta-dictionary is used
to translate the query into queries appropriate for individual
heterogeneous sites involved, and the results are integrated into a
global answer set
Requires complex information filtering, and queries compete for resources
Data warehouse: update-driven, high performance
Information from heterogeneous sources is integrated in advance
and stored in warehouses for direct query and analysis
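The contrast between the two approaches can be sketched with two hypothetical sources that use different conventions. The wrappers and the mediator translate every query at query time, whereas `build_warehouse` integrates the sources once, in advance, so later queries become simple lookups. All names and data are invented for illustration.

```python
# Two hypothetical sources with different schemas and conventions.
crm_rows = [{"cust": "Acme", "rev_k": 1.5}]   # revenue in thousands
erp_rows = [("BOLT", 800.0)]                  # (upper-case name, revenue)


# Query-driven: each wrapper translates an incoming query for its own source.
def crm_wrapper(name):
    return [r["rev_k"] * 1000 for r in crm_rows if r["cust"] == name]


def erp_wrapper(name):
    return [rev for n, rev in erp_rows if n == name.upper()]


def mediator_revenue(name):
    """Translate the query for every source at query time and merge answers."""
    return sum(crm_wrapper(name)) + sum(erp_wrapper(name))


# Update-driven: integrate all sources once, in advance, into one table.
def build_warehouse():
    table = {}
    for r in crm_rows:
        key = r["cust"].title()
        table[key] = table.get(key, 0.0) + r["rev_k"] * 1000
    for n, rev in erp_rows:
        key = n.title()
        table[key] = table.get(key, 0.0) + rev
    return table


warehouse = build_warehouse()
print(mediator_revenue("Bolt"), warehouse.get("Bolt"))
```

Both approaches return the same revenue for "Bolt" here; the difference is when the integration work is done, on every query or once ahead of time.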
Data Warehouse (OLAP) vs. Operational DBMS (OLTP)
OLTP (on-line transaction processing)
Major task of traditional relational DBMS
Day-to-day operations such as purchasing, inventory, banking,
manufacturing, payroll, registration, accounting, etc.
OLAP (on-line analytical processing)
Major task of data warehouse system
Data analysis and decision making
Distinct features (OLTP vs. OLAP):
User and system orientation: customer vs. market
Data contents: current, detailed vs. historical, consolidated
Database design: ER + application vs. star + subject
View: current, local vs. evolutionary, integrated
Access patterns: update vs. read-only but complex queries
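A rough sketch of the two access patterns using Python's built-in `sqlite3` module. The schema (an `orders` fact table plus a `dim_product` dimension, a tiny star-like layout) and the data are hypothetical, and for brevity both workloads run against the same in-memory database, whereas a real warehouse keeps the analytical data separately.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE orders (
    order_id   INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    order_year INTEGER,
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
""")

# OLTP-style work: short read/write transactions that touch a few rows by key.
with con:
    con.execute("INSERT INTO orders VALUES (1, 1, 2023, 20.0)")
    con.execute("INSERT INTO orders VALUES (2, 2, 2023, 60.0)")
    con.execute("INSERT INTO orders VALUES (3, 1, 2024, 35.0)")
with con:
    con.execute("UPDATE orders SET amount = 25.0 WHERE order_id = 1")

# OLAP-style work: one read-only query that scans and summarizes many rows.
olap_rows = con.execute("""
    SELECT p.category, o.order_year, SUM(o.amount)
    FROM orders AS o JOIN dim_product AS p USING (product_id)
    GROUP BY p.category, o.order_year
    ORDER BY p.category, o.order_year
""").fetchall()
print(olap_rows)
```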
OLTP vs. OLAP
Feature            | OLTP                                                     | OLAP
users              | clerk, IT professional                                   | knowledge worker
function           | day-to-day operations                                    | decision support
DB design          | application-oriented                                     | subject-oriented
data               | current, up-to-date; detailed, flat relational; isolated | historical; summarized, multidimensional; integrated, consolidated
usage              | repetitive                                               | ad hoc
access             | read/write; index/hash on primary key                    | lots of scans
unit of work       | short, simple transaction                                | complex query
# records accessed | tens                                                     | millions
# users            | thousands                                                | hundreds
DB size            | 100MB-GB                                                 | 100GB-TB
metric             | transaction throughput                                   | query throughput, response time
Successfully implemented data warehouses can bring benefits
to an organization.
Typical data warehouse queries (Case study: Banking industry)
• Which corporate customers are above the average account usage per month and
how does this correlate to their business?
• Who were the first hundred customers in Jan 2006 and how does this list
compare with the list for the previous three years?
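The first banking query can be sketched over a hypothetical table of monthly usage facts. "Account usage" is assumed here to mean the number of account transactions per month; relating the result to each customer's business would additionally need a customer dimension (e.g., industry), which this sketch does not model.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical fact rows: (customer, month, account transactions that month).
usage = [
    ("Acme", "2006-01", 120), ("Acme", "2006-02", 100),
    ("Bolt", "2006-01", 30), ("Bolt", "2006-02", 50),
    ("Cora", "2006-01", 80), ("Cora", "2006-02", 90),
]


def above_average_customers(facts):
    """Return customers whose mean monthly usage exceeds the all-customer mean."""
    per_customer = defaultdict(list)
    for customer, _month, count in facts:
        per_customer[customer].append(count)
    monthly_mean = {c: mean(counts) for c, counts in per_customer.items()}
    overall = mean(list(monthly_mean.values()))
    return sorted(c for c, m in monthly_mean.items() if m > overall)


print(above_average_customers(usage))  # ['Acme', 'Cora']
```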
Complexities of Creating a Data Warehouse
Incomplete errors
Missing fields
Records or fields that, by design, are not being recorded
Incorrect errors
Wrong calculations or aggregations
Duplicate records
Wrong information entered into the source system
Inconsistency errors
Inconsistent use of different codes
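A minimal sketch of automated checks for three of the error types above: missing fields, duplicate records, and inconsistent codes. The field names, the gender code mapping, and the sample rows are hypothetical; errors such as wrong calculations or wrong information entered at the source need domain-specific rules and are not covered here.

```python
from collections import Counter

REQUIRED_FIELDS = ("customer_id", "gender", "amount")
# Hypothetical mapping of the different codes sources use for the same value.
GENDER_CODES = {"M": "M", "Male": "M", "1": "M", "F": "F", "Female": "F", "2": "F"}


def quality_report(rows):
    """Flag incomplete, duplicate, and inconsistently coded source rows."""
    # Incomplete: a required field is absent or empty.
    missing = [i for i, row in enumerate(rows)
               if any(row.get(f) in (None, "") for f in REQUIRED_FIELDS)]
    # Incorrect: identical records loaded more than once.
    counts = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = [dict(key) for key, n in counts.items() if n > 1]
    # Inconsistent: several raw codes that standardize to the same value.
    raw_codes = {row["gender"] for row in rows if row.get("gender")}
    standard = {GENDER_CODES.get(code, code) for code in raw_codes}
    return {"missing": missing, "duplicates": duplicates,
            "inconsistent_codes": len(raw_codes) > len(standard)}


rows = [
    {"customer_id": "C1", "gender": "M", "amount": 10.0},
    {"customer_id": "C2", "gender": "Female", "amount": 5.0},
    {"customer_id": "C2", "gender": "Female", "amount": 5.0},
    {"customer_id": "C3", "gender": "", "amount": 7.5},
    {"customer_id": "C4", "gender": "2", "amount": None},
]
print(quality_report(rows))
```

Checks like these could run on every load, so problems are caught before the data reaches analysts.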
Best Practices
Data warehousing is a process, not a project
Complete requirements and design
Prototyping is key to business understanding
Use proper aggregations as well as detailed data
A full iterative approach is essential
Training is an on-going process
Build data integrity checks into your system
• High investment
• The initial cost of building a data warehouse is very high, and the return on investment (ROI) is not easy to demonstrate.
• Large storage
• A data warehouse stores an enterprise's useful historical data.
• Cross-selling of products
Conclusions
Building a data warehouse is good but not sufficient.
The data in a data warehouse has to be accessed by
users, and to access it, a BI tool has to be used.