1 Lecture 1-Introduction

The document outlines a course on business intelligence and data warehousing over 11 weeks. Week 1 introduces BI and data warehouses. Weeks 2-5 cover data warehouse architectures, lifecycles, dimensional modeling, and tutorials. Week 6 has a test and covers ETL. Weeks 7-9 discuss OLAP, data mining, and data warehousing on the web. Weeks 10-11 cover matching information to users.

Uploaded by signup8707
Copyright © All Rights Reserved
Course Outline

• Week 1: Introduction to Business Intelligence, data warehouses and corporate information
• Week 2: Data warehouse architectures
• Week 3: Data warehouse lifecycle and project management
• Week 4: Introduction to Dimensional Modeling
• Week 5: Building dimensional models – tutorial session
• Week 6: Test 1 and data extraction, transformation and loading (ETL)
• Week 7: Online analytical processing (OLAP)
• Week 8: Data mining
• Week 9: Data warehousing and the Web
• Week 10: Data warehousing and the Web
• Week 11: Matching information to users

Introduction
• Data: items that are the most elementary descriptions of things, events, activities, and transactions; may be internal or external
• Information: organized data that has meaning and value
• Knowledge: processed data or information that conveys understanding or learning applicable to a problem or activity
3
Data, information and knowledge
[Figure: data, information and knowledge (adopted from Shimray)]


Introduction
Over the years, storing and managing data from various operational systems has become a great challenge.
Long-term strategic planning has become increasingly important in the modern global market. For this reason, companies have worked towards:
• Access to information at all levels
• Survival and prosperity in a competitive world
The focus of technology has shifted from data input and capture through the operational systems to information access and availability.

What is Business Intelligence (BI)?
All processes, techniques, and tools based on information technology that support business decision making.

The approaches can range from a simple Excel spreadsheet to a major competitive intelligence undertaking.

Examples: data visualization, data mining, and statistical analysis using R, SPSS, etc.

What is a Data Warehouse (DWH)?
• A decision support database that is maintained separately from the organization’s operational database
• Supports information processing by providing a solid platform of consolidated, historical data for analysis

“A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process.” (W. H. Inmon)

Data Warehouse—Subject-Oriented
Organized around major subjects, such as customer,
product, sales.
Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
Provides a simple and concise view of particular subject issues by excluding data that are not useful in the decision support process.
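As a minimal sketch of the subject-oriented idea, the Python below projects hypothetical operational order records onto the fields relevant to a "sales" subject and leaves out purely operational details such as the terminal and the clerk. The record layout, the field names and the `subject_view` helper are illustrative assumptions, not part of any particular warehouse product.

```python
# Hypothetical operational order records (field names are illustrative only).
orders = [
    {"order_id": 1, "customer": "C001", "product": "P10", "amount": 120.0,
     "terminal_id": "T7", "clerk": "alice"},
    {"order_id": 2, "customer": "C002", "product": "P11", "amount": 80.0,
     "terminal_id": "T3", "clerk": "bob"},
    {"order_id": 3, "customer": "C001", "product": "P11", "amount": 40.0,
     "terminal_id": "T7", "clerk": "alice"},
]

# Fields assumed to matter for the "sales" subject; operational details
# (terminal, clerk, order number) are excluded from the decision-support view.
SALES_SUBJECT_FIELDS = ("customer", "product", "amount")


def subject_view(records, fields):
    """Project each record onto the fields relevant to one subject."""
    return [{f: r[f] for f in fields} for r in records]


sales = subject_view(orders, SALES_SUBJECT_FIELDS)
```

The same operational records could feed other subjects (for example a "customer" view) by passing a different field tuple.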

Data Warehouse—Integrated
Constructed by integrating multiple, heterogeneous data sources
  - relational databases, flat files, on-line transaction records
Data cleaning and data integration techniques are applied
  - Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
  - E.g., hotel price: currency, tax, breakfast covered, etc.
When data is moved to the warehouse, it is converted.
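To make the hotel-price example concrete, here is a small sketch of the conversion step, assuming the warehouse convention is "price per night in EUR, tax excluded, room only". The exchange rate, the breakfast price (taken to be net of tax) and the `SourcePrice` / `to_warehouse_price` names are hypothetical values chosen for illustration, not real rates or a real API.

```python
from dataclasses import dataclass

# Illustrative constants (assumptions, not real rates).
USD_TO_EUR = 0.90      # hypothetical exchange rate
BREAKFAST_EUR = 15.00  # hypothetical breakfast price, net of tax


@dataclass
class SourcePrice:
    amount: float
    currency: str             # "EUR" or "USD"
    includes_tax: bool
    tax_rate: float           # e.g. 0.10 for 10 %
    includes_breakfast: bool


def to_warehouse_price(p: SourcePrice) -> float:
    """Convert a source price to the assumed warehouse convention:
    EUR, tax excluded, room only (no breakfast)."""
    # 1. Currency: bring everything to EUR.
    if p.currency == "USD":
        amount = p.amount * USD_TO_EUR
    elif p.currency == "EUR":
        amount = p.amount
    else:
        raise ValueError(f"unsupported currency: {p.currency}")
    # 2. Tax: strip it out if the source price included it.
    if p.includes_tax:
        amount = amount / (1 + p.tax_rate)
    # 3. Breakfast: remove it if the source bundled it in.
    if p.includes_breakfast:
        amount -= BREAKFAST_EUR
    return round(amount, 2)
```

For instance, a source quoting 132 EUR with 10 % tax and breakfast included and another quoting 100 USD net, room only, would both be stored in the same units, which is what makes them comparable in later queries.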

Data Warehouse—Time Variant
The time horizon for the data warehouse is significantly longer than that of operational systems.
  - Operational database: current value data
  - Data warehouse data: provides information from a historical perspective (e.g., past 5-10 years)
Every key structure in the data warehouse contains an element of time, explicitly or implicitly, but the key of operational data may or may not contain a “time element”.
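The difference in keys can be sketched with two plain dictionaries: the hypothetical operational store is keyed by account only, so each update overwrites the previous value, while the hypothetical warehouse store adds a snapshot date to the key and therefore keeps history. The account IDs, dates and the `record_balance` helper are assumptions made for illustration.

```python
from datetime import date

# Operational store: keyed by account_id only, so each update overwrites.
operational = {}
# Warehouse store: the key includes a time element (the snapshot date),
# so earlier values are kept for historical analysis.
warehouse = {}


def record_balance(account_id: str, balance: float, as_of: date) -> None:
    operational[account_id] = balance          # current value only
    warehouse[(account_id, as_of)] = balance   # one row per point in time


record_balance("A1", 500.0, date(2024, 1, 31))
record_balance("A1", 650.0, date(2024, 2, 29))

# The warehouse can answer "how did the balance evolve?"; the operational
# store only knows the latest value.
history = sorted((d, b) for (acct, d), b in warehouse.items() if acct == "A1")
```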

Data Warehouse—Non-Volatile
A physically separate store of data transformed from the operational environment.
Operational updates of data do not occur in the data warehouse environment.
  - Does not require transaction processing, recovery, and concurrency control mechanisms
  - Requires only two operations in data accessing: initial loading of data and access of data
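A minimal sketch of a non-volatile store, assuming a hypothetical `NonVolatileStore` class whose public interface offers only the two operations listed above: loading rows and reading them. There is deliberately no update or delete method; rows are kept as immutable tuples.

```python
from typing import Callable, Iterable


class NonVolatileStore:
    """Sketch of a warehouse table that supports only two operations:
    loading data and accessing (reading) it."""

    def __init__(self) -> None:
        self._rows: list[tuple] = []

    def load(self, rows: Iterable[tuple]) -> None:
        # Bulk load: rows are appended and never updated afterwards.
        self._rows.extend(tuple(r) for r in rows)

    def query(self, predicate: Callable[[tuple], bool]) -> list[tuple]:
        # Read-only access: returns a new list of matching rows;
        # the rows themselves are tuples and cannot be modified in place.
        return [r for r in self._rows if predicate(r)]


store = NonVolatileStore()
store.load([("2024-01", "North", 100.0), ("2024-01", "South", 80.0)])
north = store.query(lambda r: r[1] == "North")
```

Because nothing is ever updated in place, this sketch needs no locking or rollback logic, which mirrors the point that a warehouse does not require the transaction, recovery and concurrency machinery of an operational DBMS.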
[Figure: operational systems and the data warehouse]
• Order processing: 2 sec response time; last 6 months of orders
• Product price/Inventory: 10 sec response time; last 10 price changes; last 20 inventory transactions
• Marketing: 30 sec response time; last 2 years of programs
• Data warehouse: response time 2 sec to 60 seconds; last 5 years of data; data is not modified
Data Warehouse vs. Traditional Integration in Heterogeneous DBMS
• Traditional heterogeneous DB integration:
  - Build wrappers/mediators on top of heterogeneous databases
  - Query-driven approach
  - When a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved, and the results are integrated into a global answer set
  - Complex information filtering; queries compete for resources
• Data warehouse: update-driven, high performance
  - Information from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis
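The contrast between the two approaches can be sketched in a few lines. Two hypothetical sources use different schemas; wrappers translate each into a common schema. In the query-driven path, every query is sent through the wrappers to every source and the partial results are merged at query time; in the update-driven path, integration happens once in advance and queries run directly against the integrated copy. All names (`wrap_a`, `mediator_query`, `warehouse_query`, etc.) are illustrative assumptions.

```python
# Two hypothetical heterogeneous sources with different schemas.
source_a = [{"cust_name": "Ann", "amt": 100.0}]   # dict rows
source_b = [("Bob", 250.0)]                        # tuple rows: (name, total)


def wrap_a(rows):
    """Wrapper: translate source A rows into the common schema."""
    return [{"name": r["cust_name"], "total": r["amt"]} for r in rows]


def wrap_b(rows):
    """Wrapper: translate source B rows into the common schema."""
    return [{"name": n, "total": t} for n, t in rows]


WRAPPERS = [(wrap_a, source_a), (wrap_b, source_b)]


def mediator_query(min_total):
    # Query-driven: each query goes to every source at query time,
    # and the partial results are merged into a global answer set.
    results = []
    for wrap, src in WRAPPERS:
        results.extend(r for r in wrap(src) if r["total"] >= min_total)
    return results


# Update-driven: integrate once, in advance...
warehouse = [row for wrap, src in WRAPPERS for row in wrap(src)]


def warehouse_query(min_total):
    # ...then answer queries directly from the integrated store.
    return [r for r in warehouse if r["total"] >= min_total]
```

Both paths return the same answer here; the difference is where the translation cost is paid, on every query in the mediator or once per load in the warehouse.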

Data Warehouse (OLAP) vs. Operational DBMS (OLTP)
• OLTP (on-line transaction processing)
  - Major task of traditional relational DBMS
  - Day-to-day operations such as purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
• OLAP (on-line analytical processing)
  - Major task of data warehouse system
  - Data analysis and decision making
• Distinct features (OLTP vs. OLAP):
  - User and system orientation: customer vs. market
  - Data contents: current, detailed vs. historical, consolidated
  - Database design: ER + application vs. star + subject
  - View: current, local vs. evolutionary, integrated
  - Access patterns: update vs. read-only but complex queries
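The two access patterns can be illustrated with Python's built-in `sqlite3` module. For brevity both workloads run against one hypothetical in-memory `sales` table, whereas the slides' point is that they normally live in separate systems. The OLTP-style step is a short transaction touching one row via its primary key; the OLAP-style step is a read-only query that scans and aggregates many rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, "
    "month TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, month, amount) VALUES (?, ?, ?)",
    [("North", "2024-01", 100.0),
     ("South", "2024-01", 80.0),
     ("North", "2024-02", 120.0)],
)

# OLTP-style work: a short, simple transaction that updates a single row
# located through the primary key; `with conn` commits it on success.
with conn:
    conn.execute("UPDATE sales SET amount = ? WHERE id = ?", (110.0, 1))

# OLAP-style work: a read-only query that scans rows and aggregates them.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```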
OLTP vs. OLAP

| Feature | OLTP | OLAP |
|---|---|---|
| users | clerk, IT professional | knowledge worker |
| function | day-to-day operations | decision support |
| DB design | application-oriented | subject-oriented |
| data | current, up-to-date; detailed, flat relational; isolated | historical; summarized, multidimensional; integrated, consolidated |
| usage | repetitive | ad hoc |
| access | read/write; index/hash on primary key | lots of scans |
| unit of work | short, simple transaction | complex query |
| # records accessed | tens | millions |
| # users | thousands | hundreds |
| DB size | 100 MB to GB | 100 GB to TB |
| metric | transaction throughput | query throughput, response time |
Successfully implemented data warehouses can bring the following benefits to an organization:

1. Organisational Performance - Operational Systems and Data Warehouse

2. Simplify - Make Complex Data from Many Systems Available in One Place

3. Accuracy - Standardize and Cleanse

4. Business Value - Provide the Foundation for the Business to Have Access to Information to Make Timely, Informed Decisions

5. Direct Use - Non-IT personnel can make reports

Typical data warehouse queries (Case study: Banking industry)

• Which corporate customers are above the average account usage per month and
how does this correlate to their business?

• Who were the first hundred customers in Jan 2006 and how does this list
compare with the list for the previous three years?

• What is the revenue by destination, by month, by business unit, by region?
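Queries such as "revenue by month, by region" are group-by aggregations over a fact table. The sketch below uses a handful of hypothetical fact rows and plain Python to show two of the question shapes above: revenue grouped by any chosen dimensions, and customers whose average monthly figure is above the overall average. Revenue stands in for "account usage" here, and the sketch assumes one row per customer per month; the data, the `revenue_by` helper and the column layout are all illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical fact rows: (customer, month, region, business_unit, revenue)
facts = [
    ("Acme",  "2024-01", "East", "Corporate", 500.0),
    ("Acme",  "2024-02", "East", "Corporate", 700.0),
    ("Beta",  "2024-01", "West", "Corporate", 200.0),
    ("Gamma", "2024-01", "East", "Retail",    300.0),
]
DIM_INDEX = {"customer": 0, "month": 1, "region": 2, "business_unit": 3}


def revenue_by(*dims):
    """Sum revenue grouped by the named dimensions."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[DIM_INDEX[d]] for d in dims)
        totals[key] += row[4]
    return dict(totals)


by_month_region = revenue_by("month", "region")

# Average monthly figure per customer (assumes one row per customer-month).
monthly = defaultdict(list)
for customer, _month, _region, _unit, revenue in facts:
    monthly[customer].append(revenue)
avg_per_customer = {c: sum(v) / len(v) for c, v in monthly.items()}
overall = sum(avg_per_customer.values()) / len(avg_per_customer)
above_average = sorted(c for c, a in avg_per_customer.items() if a > overall)
```

In a real warehouse these would typically be SQL `GROUP BY` queries over a star schema; the Python version only makes the grouping logic visible.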

Complexities of Creating a Data Warehouse
• Incomplete errors
  - Missing fields
  - Records or fields that, by design, are not being recorded

• Incorrect errors
  - Wrong calculations, aggregations
  - Duplicate records
  - Wrong information entered into the source system

• Inconsistency errors
  - Inconsistent use of different codes
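Some of these error classes can be caught mechanically before loading. The sketch below checks hypothetical source records for one example of each category: missing required fields (incomplete), exact duplicate records (incorrect), and branch codes that are not in a standard code table (inconsistent). Errors such as wrong calculations or wrong information entered at the source generally cannot be detected this way. The field names, the `BRANCH_CODES` table and the `check_quality` / `standardize_branch` helpers are assumptions for illustration.

```python
REQUIRED_FIELDS = ("customer_id", "branch", "amount")

# Hypothetical code table: several source spellings map to one standard code.
BRANCH_CODES = {"N": "NORTH", "North": "NORTH", "NORTH": "NORTH",
                "S": "SOUTH", "South": "SOUTH", "SOUTH": "SOUTH"}


def standardize_branch(code):
    """Map a source-specific branch code to the warehouse's standard code."""
    return BRANCH_CODES[code]


def check_quality(records):
    """Return a list of (record_index, problem) pairs."""
    problems = []
    seen = set()
    for i, r in enumerate(records):
        # Incomplete: a required field is missing or empty.
        for f in REQUIRED_FIELDS:
            if r.get(f) in (None, ""):
                problems.append((i, f"missing {f}"))
        # Incorrect: exact duplicate of an earlier record.
        key = tuple(sorted(r.items()))
        if key in seen:
            problems.append((i, "duplicate record"))
        seen.add(key)
        # Inconsistent: a code that is not in the standard code table.
        if r.get("branch") and r["branch"] not in BRANCH_CODES:
            problems.append((i, f"unknown branch code {r['branch']!r}"))
    return problems


records = [
    {"customer_id": "C1", "branch": "North", "amount": 10.0},
    {"customer_id": "C2", "branch": "", "amount": 5.0},
    {"customer_id": "C1", "branch": "North", "amount": 10.0},
    {"customer_id": "C3", "branch": "Nth", "amount": 7.5},
]
problems = check_quality(records)
```

Running checks like these on every load is one way to "build data integrity checks into your system".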

Best Practices
Data Warehousing is a process and not a project
Complete requirements and design
Prototyping is key to business understanding
Utilizing proper aggregations and detailed data
A full iterative approach is essential
Training is an on-going process
Build data integrity checks into your system

• High investment
  - The initial cost of building a data warehouse is very high, and the return on investment (ROI) is not easy to demonstrate.
• Large storage
  - A data warehouse stores the useful historical data of an enterprise, which requires a large amount of storage.
• Maintenance of source systems
  - If data source systems are not cleaned, dirty data automatically flows into the data warehouse. Decision makers using such data are likely to be misled, and their decisions may lead to loss of company revenue.
• Qualified staff
  - Building and maintaining a data warehouse requires skilled personnel.
• New insights into
• Customer habits
• Developing new products
• Selling more products

• Cost savings and revenue increases

• Cross-selling of products

• Identify and target the most profitable customers

Conclusions
• Building a data warehouse is good but not sufficient. The data in a data warehouse has to be accessed by users, and to access it, a BI tool has to be used.
