Lect 4

CSC-426

Business Intelligence
&
Analytics
Lecture 4
Data Warehousing
Contents
• Business intelligence and data warehousing
• History of data warehousing
• Characteristics of data warehousing
• Data warehousing architectures
• Extraction, transformation, and load (ETL) processes
Learning Objectives (1 of 2)
• Understand the basic definitions and concepts of data warehousing
• Understand data warehousing architectures
• Describe the processes used in developing and managing data warehouses
• Explain data warehousing operations
• Explain the role of data warehouses in decision support
Business Intelligence and Data Warehousing
• BI used to be everything related to the use of data for managerial decision support
• Now, it is a part of Business Analytics
– BI = Descriptive Analytics

Figure: the three types of Business Analytics

• Descriptive (Business Intelligence)
– Questions: What happened? What is happening?
– Enablers: Business reporting, dashboards, scorecards, data warehousing
– Outcomes: Well-defined business problems and opportunities
• Predictive (Advanced Analytics)
– Questions: What will happen? Why will it happen?
– Enablers: Data mining, text mining, Web/media mining, forecasting
– Outcomes: Accurate projections of future events and outcomes
• Prescriptive (Advanced Analytics)
– Questions: What should I do? Why should I do it?
– Enablers: Optimization, simulation, decision modeling, expert systems
– Outcomes: Best possible business decisions and actions


What is a Data Warehouse?
• A physical repository where relational data are specially
organized to provide enterprise-wide, cleansed data in a
standardized format
• A relational database? (so what is the difference?)
• “The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time”
A Historical Perspective to Data Warehousing
Characteristics of DWs
• Subject oriented
• Integrated
• Time-variant (time series)
• Nonvolatile
• Summarized
• Not normalized
• Metadata
• Web based, relational/multi-dimensional
• Client/server, real-time/right-time/active...
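Two of the characteristics above, time-variant and nonvolatile, can be illustrated with a minimal sketch. The `PriceFact` schema and the `WarehouseTable` class are hypothetical names chosen for illustration, not part of any DW product: rows are only ever appended, and each row carries the date it refers to, so earlier values stay available for analysis.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class PriceFact:
    """One warehouse row: a value tied to a moment in time (hypothetical schema)."""
    product: str
    as_of: date
    price: float


class WarehouseTable:
    """Append-only table: rows are added, never updated or deleted."""

    def __init__(self) -> None:
        self._rows: List[PriceFact] = []

    def load(self, row: PriceFact) -> None:
        # Nonvolatile: a change in the source becomes a new row, not an overwrite.
        self._rows.append(row)

    def history(self, product: str) -> List[PriceFact]:
        # Time-variant: every historical value stays available, ordered by date.
        rows = [r for r in self._rows if r.product == product]
        return sorted(rows, key=lambda r: r.as_of)

    def value_as_of(self, product: str, day: date) -> Optional[float]:
        # Latest recorded price on or before the requested day, if any.
        earlier = [r for r in self.history(product) if r.as_of <= day]
        return earlier[-1].price if earlier else None


table = WarehouseTable()
table.load(PriceFact("widget", date(2024, 1, 1), 9.99))
table.load(PriceFact("widget", date(2024, 6, 1), 10.99))
```

An operational (OLTP) system would typically overwrite the old price; here `table.value_as_of("widget", date(2024, 3, 1))` still answers with the January value.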
Data Mart
A departmental small-scale “DW” that stores only
limited/relevant data
• Dependent data mart
A subset that is created directly from a data warehouse
• Independent data mart
A small data warehouse designed for a strategic business unit
or a department
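As a rough illustration of a dependent data mart, the sketch below derives a small marketing mart directly from a warehouse table using Python's built-in `sqlite3` module. The table names (`warehouse_sales`, `mart_marketing`) and the sample rows are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table holding data for every department.
conn.execute(
    "CREATE TABLE warehouse_sales (region TEXT, department TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("North", "Marketing", 120.0),
        ("South", "Marketing", 80.0),
        ("North", "Finance", 300.0),
    ],
)

# Dependent data mart: a subset created directly from the warehouse,
# limited to the data relevant to one department and summarized by region.
conn.execute(
    """
    CREATE TABLE mart_marketing AS
    SELECT region, SUM(amount) AS total_amount
    FROM warehouse_sales
    WHERE department = 'Marketing'
    GROUP BY region
    """
)

mart_rows = conn.execute(
    "SELECT region, total_amount FROM mart_marketing ORDER BY region"
).fetchall()
```

An independent data mart would instead be loaded from the source systems themselves, without going through `warehouse_sales`.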
Other DW Components
• Operational data stores (ODS)
– A type of database often used as an interim area for a data
warehouse
• Oper marts
– An operational data mart
• Enterprise data warehouse (EDW)
– A data warehouse for the enterprise
• Metadata – “data about data”
– In a DW, metadata describe the contents of the data warehouse and its acquisition and use
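To make "data about data" concrete, here is a small sketch of a metadata catalog entry for one warehouse table. The `describe_table` helper and the `source_system` and `refresh` fields are hypothetical; only `PRAGMA table_info` is an actual SQLite feature, used here to read the column names and types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, sold_on TEXT, amount REAL)"
)


def describe_table(conn, table, source_system, refresh):
    """Build a catalog entry for one table (hypothetical helper).

    The table name is interpolated into the statement, so it must come
    from trusted code, not from user input.
    """
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    columns = {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}
    return {
        "table": table,                  # contents: what is stored
        "columns": columns,              # contents: column names and types
        "source_system": source_system,  # acquisition: where the data came from
        "refresh": refresh,              # acquisition: how often it is loaded
    }


catalog_entry = describe_table(conn, "sales", source_system="POS", refresh="daily")
```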
DW for Data-Driven Decision Making
• An example of a DW supporting data-driven decision making in the automotive industry

Data Warehouse: one management and analytics platform for product configuration, warranty, and diagnostic readout data
• Reduced Infrastructure Expenses: 2/3 cost reduction through data mart consolidation
• Reduced Warranty Expenses: improved reimbursement accuracy through improved claim data quality
• Improved Cost of Quality: faster identification, prioritization, and resolution of quality issues
• Accurate Environmental Performance Reporting
• IT Architecture Standardization: one strategic platform for business intelligence and compliance reporting
A Generic DW Framework

Figure: the components of a generic DW framework, from left to right
• Data sources: ERP, legacy, POS, other OLTP/Web, external data
• ETL process: select, extract, transform, integrate, load
• Enterprise data warehouse, with metadata and replication
• Data marts (Marketing, Operations, Finance, ...), or the no data marts option
• API/middleware
• Applications (visualization): routine business reporting, data/text mining, OLAP, dashboard, Web, custom-built applications
DW Architecture
• Three-tier architecture
1. Data acquisition software (back-end)
2. The data warehouse that contains the data & software
3. Client (front-end) software that allows users to access and
analyze data from the warehouse

• Two-tier architecture
– First two tiers in three-tier architecture are combined into one
DW Architectures

3-tier architecture
• Tier 1: Client workstation
• Tier 2: Application server
• Tier 3: Database server

2-tier architecture
• Tier 1: Client workstation
• Tier 2: Application & database server
Data Warehousing Architectures
• Issues to consider when deciding which architecture to use:
– Which database management system (DBMS) should be
used?
– Will parallel processing and/or partitioning be used?
– Will data migration tools be used to load the data
warehouse?
– What tools will be used to support data retrieval and
analysis?
A Web-based DW Architecture

Figure: Client (Web browser) → Internet/Intranet/Extranet → Web server (Web pages) → Application server → Data warehouse
Alternative DW Architectures (1 of 2)
(a) Independent Data Marts Architecture
Source systems → Staging area (ETL) → Independent data marts (atomic/summarized data) → End-user access and applications

(b) Data Mart Bus Architecture with Linked Dimensional Data Marts
Source systems → Staging area (ETL) → Dimensionalized data marts linked by conformed dimensions (atomic/summarized data) → End-user access and applications

(c) Hub and Spoke Architecture (Corporate Information Factory)
Source systems → Staging area (ETL) → Normalized relational warehouse (atomic data) → Dependent data marts (summarized/some atomic data) → End-user access and applications
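The idea of data marts linked by conformed dimensions in architecture (b) can be sketched as follows. All names and figures (`product_dim`, `sales_mart`, `inventory_mart`) are invented for illustration: because both marts use the same product keys and the same category definitions, their results can be combined in one report.

```python
# Conformed product dimension: one shared definition used by every mart.
product_dim = {
    "P1": {"name": "Widget", "category": "Hardware"},
    "P2": {"name": "Gadget", "category": "Hardware"},
    "P3": {"name": "Manual", "category": "Books"},
}

# Two separate data marts whose fact rows refer to the same product keys.
sales_mart = [("P1", 100.0), ("P2", 50.0), ("P3", 20.0)]  # (product_key, revenue)
inventory_mart = [("P1", 10), ("P2", 5), ("P3", 40)]      # (product_key, units on hand)


def total_by_category(facts):
    """Roll fact values up to the category level of the shared dimension."""
    totals = {}
    for product_key, value in facts:
        category = product_dim[product_key]["category"]
        totals[category] = totals.get(category, 0) + value
    return totals


def drill_across():
    """Combine results from both marts, aligned on the conformed dimension."""
    revenue = total_by_category(sales_mart)
    stock = total_by_category(inventory_mart)
    return {
        category: {
            "revenue": revenue.get(category, 0),
            "units_on_hand": stock.get(category, 0),
        }
        for category in sorted(revenue.keys() | stock.keys())
    }


report = drill_across()
```

With independent data marts, as in architecture (a), each mart may define "product" or "category" differently, so a combined report like this one is not reliable.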
Alternative DW Architectures (2 of 2)
(d) Centralized Data Warehouse Architecture
Source systems → Staging area (ETL) → Normalized relational warehouse (atomic/some summarized data) → End-user access and applications

(e) Federated Architecture
Existing data warehouses, data marts, and legacy systems → Data mapping/metadata: logical/physical integration of common data elements → End-user access and applications

• Each architecture has advantages and disadvantages!


• Which architecture is the best?
Ten Factors that Potentially Affect the
Architecture Selection Decision
1. Information interdependence between organizational units

2. Upper management’s information needs

3. Urgency of need for a data warehouse

4. Nature of end-user tasks

5. Constraints on resources

6. Strategic view of the data warehouse prior to implementation

7. Compatibility with existing systems

8. Perceived ability of the in-house IT staff

9. Technical issues

10. Social/political factors


Data Integration and the Extraction, Transformation,
and Load Process (1 of 2)
• ETL = Extract, Transform, Load
• Data integration
– Integration that comprises three major processes: data access,
data federation (allows multiple databases to function as one), and change capture (identifies
and tracks changes to data in a database).

• Enterprise application integration (EAI)


– A technology that provides a vehicle for pushing data from
source systems into a data warehouse
• Enterprise information integration (EII)
– An evolving tool space that promises real-time data integration
from a variety of sources, such as relational or multidimensional
databases, Web services, etc.
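The change capture process mentioned above under data integration can be approximated with a simple snapshot comparison. This is a simplified stand-in: production change data capture often reads database transaction logs rather than comparing snapshots, and the function name `capture_changes` and the sample records are assumptions for this sketch.

```python
def capture_changes(previous, current):
    """Compare two snapshots keyed by primary key and report the differences."""
    inserted = {k: current[k] for k in current.keys() - previous.keys()}
    deleted = {k: previous[k] for k in previous.keys() - current.keys()}
    updated = {
        k: current[k]
        for k in current.keys() & previous.keys()
        if current[k] != previous[k]
    }
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


# Hypothetical customer table as it looked yesterday and today.
yesterday = {
    1: {"name": "Ada", "city": "London"},
    2: {"name": "Bob", "city": "Paris"},
}
today = {
    1: {"name": "Ada", "city": "Leeds"},
    3: {"name": "Cy", "city": "Oslo"},
}

changes = capture_changes(yesterday, today)
```

Only the rows in `changes` need to be passed on to the warehouse load, which is usually far cheaper than reloading the whole table.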
Data Integration and the Extraction,
Transformation, and Load Process (2 of 2)

Figure: the ETL process
• Sources: packaged application, legacy system, other internal applications, transient data source
• Steps: Extract → Transform → Cleanse → Load
• Targets: data warehouse, data marts
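The extract, transform, cleanse, and load steps can be sketched end to end with Python's standard `csv` and `sqlite3` modules. The CSV text stands in for a source system, and the `orders` table, the column names, and the cleansing rules are assumptions chosen for illustration, not a prescribed design.

```python
import csv
import io
import sqlite3

# Stand-in for a source system export: note the stray spaces,
# inconsistent capitalization, and a missing amount.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,carol,7.25
"""


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform and cleanse: standardize names, convert types, drop bad rows."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # cleanse: skip rows with a missing amount
        customer = row["customer"].strip().title()
        clean.append((int(row["order_id"]), customer, float(amount)))
    return clean


def load(rows, conn):
    """Load: write the cleansed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
```

Keeping the three steps as separate functions mirrors the figure: each stage can be tested or replaced on its own, for example by swapping `extract` for a reader of a legacy system.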
ETL (Extract, Transform, Load)
• Issues affecting the purchase of an ETL tool
– Data transformation tools are expensive
– Data transformation tools may have a long learning curve
• Important criteria in selecting an ETL tool
– Ability to read from and write to an unlimited number of
data sources/architectures
– Automatic capturing and delivery of metadata
– A history of conforming to open standards
– An easy-to-use interface for the developer and the
functional user
Data Warehouse Development
• Data warehouse development approaches
– Inmon Model: EDW approach (top-down)
– Kimball Model: Data mart approach (bottom-up)
– Which model is best?
• Table 3.3 provides a comparative analysis between EDW and
Data Mart approach
• Another alternative is the hosted data warehouse
Comparing EDW and Data Mart (1 of 2)
Table 3.3 Contrasts between the DM and EDW Development Approaches

Effort | DM Approach | EDW Approach
Scope | One subject area | Several subject areas
Development time | Months | Years
Development cost | $10,000 to $100,000+ | $1,000,000+
Development difficulty | Low to medium | High
Data prerequisite for sharing | Common (within business area) | Common (across enterprise)
Sources | Only some operational and external systems | Many operational and external systems
Size | Megabytes to several gigabytes | Gigabytes to petabytes
Time horizon | Near-current and historical data | Historical data
Data transformations | Low to medium | High
Comparing EDW and Data Mart (2 of 2)
Table 3.3 [continued]

Effort | DM Approach | EDW Approach
Update frequency | Hourly, daily, weekly | Weekly, monthly
Technology
Hardware | Workstations and departmental servers | Enterprise servers and mainframe computers
Operating system | Windows and Linux | Unix, Z/OS, OS/390
Databases | Workgroup or standard database servers | Enterprise database servers
Usage
Number of simultaneous users | 10s | 100s to 1,000s
User types | Business area analysts and managers | Enterprise analysts and senior executives
Business spotlight | Optimizing activities within the business area | Cross-functional optimization and decision making
Additional DW Considerations: Hosted Data Warehouses
• Benefits:
– Requires minimal investment in infrastructure
– Frees up capacity on in-house systems
– Frees up cash flow
– Makes powerful solutions affordable
– Enables solutions that provide for growth
– Offers better quality equipment and software
– Provides faster connections
Summary
• Understand the basic definitions and concepts of data warehousing
• Understand data warehousing architectures
• Describe the processes used in developing and managing data warehouses
• Explain data warehousing operations and the role of data warehouses in decision support
