
MODULE GUIDE: Flexible Learning
A.Y. 2020-2021
MODULE 1: Introduction to Data Warehousing (6 hrs.)

Course Instructor: Alan S. Brillantes, CPA, MBA
Contact Details:
  FB Messenger: Alan Brillantes
  Email Address: [email protected]
  Phone No./s: 0932-9543932
Consultation Hours: 8:00-10:00 am MWF; 2:30-4:00 pm TTH

Part I: TARGETED COURSE OUTCOMES

Demonstrate understanding of the basic concepts of data warehousing.

Learning Objectives
At the end of this module, you must be able to:
1. Describe what a data warehouse is, including its key characteristics and
properties.
2. Discuss the value and applications of data warehousing in business.
3. Differentiate between an online transaction processing system and a data
warehousing system.
4. Describe the parts of a data warehouse environment and their
interrelationships.
5. Discuss data model concepts.
6. Describe the data warehouse design and development process, and key
principles of data warehouse administration and management.
7. Discuss the major challenges of integrating a data warehouse into the
global information environment.

This document is a property of the University of St. La Salle. Unauthorized copying and/or editing is prohibited.

Part II: ASSESSMENT/S

Learning Evidence

The following shall serve as evidence of your learning:

1. Accomplished Assignment
2. Accomplished Quiz

Rubric/Evaluation Tool

The following rubrics shall be utilized in evaluating and grading your work:

LE1: Accomplished Assignment

Completeness (weight: 60%)
• Excellent: All required contents are present.
• Above Average: 86-99% of required contents are present.
• Average: 71-85% of required contents are present.
• Passing: 50%-70% of required contents are present.
• Failure: <50% of required contents are present.

Substance (weight: 40%)
• Excellent: Depth & elaboration are exemplary.
• Above Average: Depth & elaboration are very good.
• Average: Depth & elaboration are good.
• Passing: Depth & elaboration are wanting in some parts.
• Failure: Generally lacks depth & elaboration.

LE2: Quiz

Number of Correct Answers
• Superior: 91-100% correct
• Above Average: 61-90% correct
• Average: 51-60% correct
• Below Average: 41-50% correct
• Poor: <40% correct

TEACHING-LEARNING ACTIVITIES (TLA)

What follows is a narrative discussion on the “Introduction to Data Warehousing.” At the
end of this section is a learning task composed of guide questions for you to answer to
enhance your learning of the topic(s).
INTRODUCTION TO DATA WAREHOUSING

CONTENTS:

• Problematic Business Scenarios
• What is a Data Warehouse?
• The Value of a Data Warehouse
• Applications of Data Warehousing
• Online Transaction Processing System vs Data Warehousing System
• The Data Warehouse Environment
• Data Models
• Multidimensional Data Model
• Data Warehouse Design and Development Sequence
• Administering and Managing a Data Warehouse
• Integration of Data Warehouses into the Global Information Environment: Major Challenges

Problematic Business Scenarios

“"We have heaps of data, but we cannot access it!" This shows the frustration of those
who are responsible for the future of their enterprises but have no technical tools to help them
extract the required information in a proper format.
"How can people playing the same role achieve substantially different results?" In
midsize to large enterprises, many databases are usually available, each devoted to a specific
business area. They are often stored on different logical and physical media that are not
conceptually integrated. For this reason, the results achieved in every business area are likely
to be inconsistent.
"We want to select, group, and manipulate data in every possible way!" Decision-making
processes cannot always be planned before the decisions are made. End users need a tool that
is user-friendly and flexible enough to conduct ad hoc analyses. They want to choose which
new correlations they need to search for in real time as they analyze the information retrieved.
"Show me just what matters!" Examining data at the maximum level of detail is not only
useless for decision-making processes, but is also self-defeating, because it does not allow
users to focus their attention on meaningful information.
"Everyone knows that some data is wrong!" This is another sore point. An appreciable
percentage of transactional data is not correct—or it is unavailable. It is clear that you cannot
achieve good results if you base your analyses on incorrect or incomplete data.”

Without a centralized database that allowed easy access, there was a lot of speculation:
even when people knew where the data was or how to get it, they could not actually retrieve it.
With a central database providing information at a moment's notice, gathering information is
much easier.

Even with meticulous organization, data accumulates over time. Before there were data
warehouses, people had to manually sift through stored records and hope that the
information they wanted had been kept.

Data is not always valid. Data is entered by people, so there is always a chance that
errors occur, whether accidentally or deliberately. Before data warehouses (DWs), data
validation was not guaranteed, which could lead to faulty data during information
gathering.

What is a Data Warehouse?

Bill Inmon defines the data warehouse as a collection of integrated, subject‐oriented
databases designed to support the DSS (decision support system) function, where each unit of
data is relevant to some moment in time. The data warehouse contains atomic data and
lightly summarized data.
Inmon's definition is also described and expanded by Claudia Imhoff and colleagues in
Mastering Data Warehouse Design (Wiley, 2003):

… It [the DW] is the central point of data integration for business intelligence and is the
source of data for the data marts, delivering a common view of enterprise data.
This second viewpoint is incomplete without also including their definition of a data mart.
Again expanding on Bill Inmon's definition, Imhoff states in Mastering Data
Warehouse Design:

A data warehouse (DW) is the collection of processes and data whose
overarching purpose is to support the business with its analysis and decision‐making. In
other words, it is not one thing per se, but a collection of many different parts. Before
looking more closely at the specific parts of a data warehouse environment, it is helpful
to compare the characteristics and purpose of a data warehouse with an operational
application system.

A data mart is a departmentalized structure of data feeding from the data
warehouse where data is denormalized [organized] based on the department's need for
information. It utilizes a common enterprise view of strategic data and provides business
units with more flexibility, control, and responsibility. The data mart may or may not be
on the same server or location as the data warehouse.
To bring this second viewpoint into the proper context, Mastering Data Warehouse
Design further defines business intelligence:

Business intelligence is the set of processes and data structures used to analyze
data and information used in strategic decision support. The components of business
intelligence are the data warehouse, data marts, the DSS (decision support system)
interface, and the processes to ‘get data in’ to the data warehouse and to ‘get
information out’.

Key Characteristics of a Data Warehouse

Subject Oriented

In a data warehouse, data is organized according to subject instead of applications
(Chaudhuri and Dayal, 1997; Gardner, 1998; Tryfona et al., 1999). A subject area
identifies and groups processes that relate to a logical area of the business. In a data
warehouse, the information from across functional departments or business units is
organized in a manner that is subject oriented, with an enterprise view. This subject-
oriented detailed transactional data allows corporate users to drill down into the depth of
their business operations for data mining and business intelligence activities.
The operational environment focuses on the day-to-day operations of the business. In
the data warehouse, the data is oriented differently. It is concerned with the things that
drive the transactions; for example, customer, product, employee, accounts, flight,
purchase, or billing. Each of these subject areas is physically implemented as several
related tables in the data warehouse. A particular subject may be involved in different
types of transactions. For example, a customer appearing in the accounts payable
system may also be a parts supplier in the supplier system and therefore appears in
both systems.

Integrated

The warehouse contains integrated data about a particular subject instead of the
ongoing operations of the organization (Debevoise, 1999; Inmon, 1996a; Rahm and Do,
2000). Data is integrated as the data moves from operational systems into the data
warehouse. In a data warehouse the data not only is integrated across different
functional units of the organization but also includes external entities such as customers
and suppliers. For example, feeds from the stock market may be integrated with
financial data from operational systems in a data warehouse for a comprehensive
financial analysis.

Because data warehouses are targeted for decision support, they contain consolidated
data rather than detailed, individual transactional records. Data in the warehouses is
integrated from several operational databases, over potentially long periods of time into
one repository. Data is integrated to support a corporate view of the data. Integration is
not the mere gathering of data into a single large database. Integration of data requires
several processes. The two most important processes are data transformation and data
cleansing.
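Transformation and cleansing can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the two source systems (a hypothetical billing system and CRM system), their field names, and the code mappings (for example, CRM gender 0 meaning "female") are invented for the example and do not come from any real system.

```python
from datetime import datetime

# Hypothetical extracts for the same customer from two operational systems,
# each using its own field names, gender codes, and date formats.
billing_rows = [{"cust_id": "C001", "name": " ana  cruz ", "sex": "F", "joined": "03/15/2019"}]
crm_rows = [{"customer": "C001", "full_name": "Ana Cruz", "gender": 0, "since": "2019-03-15"}]


def transform_billing(row):
    # Transformation: map billing field names and codes to one warehouse standard.
    return {
        "customer_id": row["cust_id"],
        "name": row["name"],
        "gender": {"F": "female", "M": "male"}.get(row["sex"]),
        "joined": datetime.strptime(row["joined"], "%m/%d/%Y").date(),
    }


def transform_crm(row):
    # Transformation: the (assumed) CRM codes gender as 0/1 and stores ISO dates.
    return {
        "customer_id": row["customer"],
        "name": row["full_name"],
        "gender": {0: "female", 1: "male"}.get(row["gender"]),
        "joined": datetime.strptime(row["since"], "%Y-%m-%d").date(),
    }


def cleanse(record):
    # Cleansing: collapse stray whitespace and standardize name capitalization.
    record["name"] = " ".join(record["name"].split()).title()
    return record


def integrate(*sources):
    # Integration: standardize every source, then keep one record per customer.
    merged = {}
    for transform, rows in sources:
        for raw in rows:
            record = cleanse(transform(raw))
            merged.setdefault(record["customer_id"], record)
    return list(merged.values())


integrated = integrate((transform_billing, billing_rows), (transform_crm, crm_rows))
```

After transformation and cleansing, both sources describe customer C001 identically, so the integrated result holds a single record. Using `setdefault` simply keeps the first source's version; a real integration process would need an explicit rule for choosing between conflicting sources, which this sketch does not attempt.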

Nonvolatile

Data in the data warehouse is nonvolatile. Once the data enters the data warehouse, it
remains unchanged. In an operational system, data can be changed by deleting or
modifying it. The data in the data warehouse is not updated. Any change to the
information is done by adding a new record to reflect the changed status of the data.
The existing records are not modified. For example, say a person’s contact details are
stored in the customer database as Record No. 1. In an operational system, if the
person’s telephone number changes, this change is made to the Record No. 1 in the
customer database by modifying the entry. However, in a data warehouse no change
will be made to Record No. 1. Instead, a new record (Record No. 2) will be created and
inserted into the data warehouse to reflect the changed telephone number. The
warehouse data is nonvolatile in that the data that enter the database are rarely, if ever,
changed.
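The telephone-number example above can be sketched as an append-only store. The class and field names below are hypothetical, chosen only for illustration; the point is that a change is captured as Record No. 2 while Record No. 1 stays exactly as it was loaded. Freezing the dataclass makes any accidental in-place edit raise an error.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ContactRecord:
    # frozen=True: assigning to a field after creation raises FrozenInstanceError.
    record_no: int
    customer_id: str
    telephone: str
    recorded_on: date


class ContactHistory:
    """Append-only store: records are inserted, never modified or deleted."""

    def __init__(self):
        self._records = []

    def record_change(self, customer_id, telephone, recorded_on):
        # A change is captured by adding a new record with the next record number.
        record = ContactRecord(len(self._records) + 1, customer_id, telephone, recorded_on)
        self._records.append(record)
        return record

    def current(self, customer_id):
        # The highest-numbered record for a customer reflects the latest status.
        matches = [r for r in self._records if r.customer_id == customer_id]
        return max(matches, key=lambda r: r.record_no) if matches else None

    def history(self, customer_id):
        # Every earlier record remains available, unchanged.
        return [r for r in self._records if r.customer_id == customer_id]


store = ContactHistory()
store.record_change("C001", "555-0100", date(2020, 1, 10))  # Record No. 1
store.record_change("C001", "555-0199", date(2020, 6, 2))   # Record No. 2: new telephone number
```

Here `store.current("C001")` returns Record No. 2, while `store.history("C001")` still shows the original number in Record No. 1.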

Time Variant

A major strength of the data warehouse lies in the time variance of its data (Pedersen
and Jensen, 1998; Han et al., 1998). The value of the operational data archived in the
data warehouse is a function of time and changes on the basis of time. A data
warehouse gives an accurate picture of operational data for a given time, and changes
in the data in the warehouse are based on the time-based changes in operational data.
The data from the operational systems is extracted at a specific moment in time,
creating a snapshot of the data. The data warehouse consists of snapshots of the
operational data taken at intervals of time. Data can be viewed in the data warehouse
across the field of time in different levels of detail. This time variant characteristic of the
data warehouse allows complex analysis along the time dimension, allowing patterns
and trends to be viewed over time.
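A minimal sketch of time variance, assuming month-end snapshots of a single hypothetical account balance (the account ID, dates, and amounts are invented). The operational system would hold only the latest balance. Because the warehouse keeps every dated snapshot, it can answer "what was the value as of a given date?" and can expose a trend across snapshots.

```python
from datetime import date

# Hypothetical month-end snapshots of one operational value (an account balance).
# The operational system keeps only the latest balance; the warehouse keeps every snapshot.
snapshots = []


def take_snapshot(as_of, account_id, balance):
    # Each extract is stamped with the moment in time it represents.
    snapshots.append({"as_of": as_of, "account_id": account_id, "balance": balance})


def balance_as_of(account_id, when):
    # Return the balance from the latest snapshot taken on or before `when`.
    eligible = [s for s in snapshots if s["account_id"] == account_id and s["as_of"] <= when]
    return max(eligible, key=lambda s: s["as_of"])["balance"] if eligible else None


def period_changes(account_id):
    # Compare consecutive snapshots to expose a trend along the time dimension.
    series = sorted((s for s in snapshots if s["account_id"] == account_id),
                    key=lambda s: s["as_of"])
    return [(later["as_of"], later["balance"] - earlier["balance"])
            for earlier, later in zip(series, series[1:])]


take_snapshot(date(2020, 1, 31), "A-17", 1200.0)
take_snapshot(date(2020, 2, 29), "A-17", 950.0)
take_snapshot(date(2020, 3, 31), "A-17", 1430.0)
```

With these snapshots, `balance_as_of("A-17", date(2020, 3, 15))` returns the February figure. `period_changes("A-17")` shows a drop of 250.0 followed by a rise of 480.0.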

Properties of Data Warehouses

Separation: Analytical and transactional processing should be kept apart as much as
possible.
Scalability: Hardware and software architectures should be easy to upgrade as the volume
of data to be managed and processed, and the number of user requirements to be met,
progressively increase.
As businesses grow, they accumulate more data. This data eventually makes its way to the
data warehouse, where it is kept indefinitely. As a result, the data warehouse will
eventually hold years' worth of day-to-day transactions, which is a lot of data. Therefore, a
data warehouse should be able to accommodate this growth without a loss of efficiency. The
user should be able to get an analysis within a reasonable time, whether the warehouse
holds a day's worth of data or a century's.
Extensibility: The architecture should be able to host new applications and technologies
without redesigning the whole system.
Security: Monitoring access is essential because of the strategic data stored in data
warehouses.
Administerability: Data warehouse management should not be overly difficult.

The Value of a Data Warehouse

Data warehousing is much bigger than simply delivering reports in a timely manner. It is
not the data, the technologies, or the reports that impact the business. Rather, it is the ability of
your staff to harness the information to make better, fact‐based, insightful decisions. The data
warehouse is simply a tool that enables your staff to be more effective. The types of things that
can be done using a data warehouse include the following:
Tracking and trending key performance indicators: A data warehouse can provide reports
that indicate which product lines are popular in various regions, which employees have
generated the most sales, or whether certain ad campaigns are correlated with successful
sales.

Measuring business performance: Using reports from the data warehouse, actual and
forecasted performance can be compared. For example, claims managers can see how close
they are to reaching the target of making a first payment on claims within the first ten days of
opening a new claim.

Reporting and understanding financial results: A data warehouse can help identify
departments that have exceeded their monthly budgets, highlight suppliers who have
consistently met profitability goals, and single out products that have contributed the most (or
least) to the bottom line.

Understanding customers and their behavior: Exception reports that highlight changes in
consumer purchase patterns can help identify shifts in the marketplace or erosion of brand
loyalty. For example, early identification of changes in payment patterns might indicate that a
customer is under financial pressure and could benefit from a courtesy call to prevent more
serious problems.

Identifying high‐value customers: Using the data warehouse to identify the lifetime value of
customers helps with the development of loyalty programs and improves customer service.
Some customers may generate many business transactions, but they may not actually be
profitable. Other customers may contribute consistently to the organization's profits without a lot
of hands‐on interaction or support.
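The point that transaction volume and profitability can diverge is easy to show with a little arithmetic. The sketch below is purely illustrative. The two customers, their revenue, and the cost-to-serve figures are invented, and "profit" here means only revenue minus an assumed cost per transaction. It is not a full lifetime-value model.

```python
# Hypothetical history per customer: one (revenue, cost_to_serve) pair per transaction.
transactions = {
    "Customer A": [(20.0, 18.0)] * 50,   # many small, low-margin transactions
    "Customer B": [(400.0, 250.0)] * 3,  # a few large, high-margin transactions
}


def summarize(history):
    # Put activity (transaction count) next to value (total profit) for each customer.
    return {
        name: {"transactions": len(rows), "profit": sum(rev - cost for rev, cost in rows)}
        for name, rows in history.items()
    }


summary = summarize(transactions)
by_activity = sorted(summary, key=lambda n: summary[n]["transactions"], reverse=True)
by_profit = sorted(summary, key=lambda n: summary[n]["profit"], reverse=True)
```

Ranked by activity, Customer A comes first with 50 transactions. Ranked by profit, Customer B leads (450.0 versus 100.0), which mirrors the text's observation that busy customers are not necessarily the profitable ones.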

Attracting and retaining high‐value customers: Data warehouse reports can help you to
develop a profile of high‐value customers so that initiatives can be created to seek out new
customers with a similar profile. This may mean offering low‐cost incentives early on so that the
organization has the opportunity to develop a strong long‐term relationship.

Better selection or development of new products: By holding integrated data in a common
place, the data warehouse can help streamline the product development process by enabling all
groups involved to quickly access market research test results, product packing cost scenarios,
and projected product sales.

Understanding which products should be scaled back or eliminated: Using the data
warehouse, reports can be generated to highlight products with lagging sales. Additional
analyses can be run to determine the cost effectiveness of continuing to carry these items in
stores. The data warehouse can also be used to help develop plans identifying when trendy
items should be marked down to clear out any remaining inventory.

Understanding business competitors: The data warehouse can provide reports to compare
internal sales volumes with external competitor sales figures. This can help identify fluctuations
in the overall marketplace and how well the organization is maintaining its market share.

Identifying opportunities to improve business flow and processes: The data warehouse
can be used to track how business transactions flow within the organization to identify
bottlenecks, the need for more training, or when systems capacity can no longer keep up with
demand.

Understanding the impact of highly qualified professionals: A data warehouse can also be
used effectively in not‐for‐profit scenarios. For example, data warehouse reports can help
identify teachers who meet specific criteria and track how a teacher's students perform on
educational assessments over time.

The Promise of Data Warehousing


Since its inception, data warehousing has offered the promise of helping to improve your
business. A data warehouse is expected to provide both of the following:
• A single version of the truth
• The capability to access all data whenever it is needed
Unfortunately, many organizations have invested millions of dollars in data warehousing
without realizing either of these goals. There are many reasons for these struggles and failures.
It is worth looking a little more closely at these expectations.
The idea of having a single version of the truth is appealing because so much time and
energy is wasted in chasing down discrepancies. It is reasonable to want to trust report results
so that decisions based upon those reports are sound. A data warehouse can indeed provide a
single repository for all of your data, but that alone is not enough to ensure that all reports will
be consistent. Clean, trusted data from the data warehouse is often pulled out, further
manipulated, possibly loaded into yet another database, and finally presented on reports. The
use of different formulae for calculations, and the criteria used to include or exclude data from
the result set, can cause significant differences in what is shown on a report. The effort of
loading data into a data warehouse is not enough to fulfill the promise of a single version of the
truth.
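A small, hypothetical illustration of how two reports can disagree even when both read the same clean data from a single repository. The order statuses and amounts are invented. The only difference between the two reports is the inclusion criterion each one applies to "sales".

```python
# Hypothetical order lines loaded once into the warehouse (a single repository).
orders = [
    {"order_id": 1, "amount": 500.0, "status": "shipped"},
    {"order_id": 2, "amount": 300.0, "status": "returned"},
    {"order_id": 3, "amount": 200.0, "status": "cancelled"},
    {"order_id": 4, "amount": 1000.0, "status": "shipped"},
]


def sales_report_a(rows):
    # Report A counts every order that was placed as "sales".
    return sum(r["amount"] for r in rows)


def sales_report_b(rows):
    # Report B counts only orders that were shipped and kept.
    return sum(r["amount"] for r in rows if r["status"] == "shipped")


print(sales_report_a(orders), sales_report_b(orders))  # 2000.0 1500.0
```

Both reports are "correct" for their own definition. The 500.0 gap comes entirely from the inclusion rule, which is why a shared repository alone does not guarantee a single version of the truth.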
The second big promise of data warehousing is that any data that is desired will be
available at your fingertips whenever it is needed. Indeed, a data warehouse can make data
more accessible to many different types of users across an organization, but it is too expensive
to load all of the company's data into a data warehouse. The audience and business impact of
some data is not significant enough to justify the expense of including it in the data warehouse.
An organization must determine what data is needed to help the business decision‐making
processes. Then, the most useful data can be loaded into the data warehouse over a period of
time, perhaps even a number of years. Often this is viewed as the final goal: The data is
available in the data warehouse.
However, having data in a database does not automatically make it accessible to the
business community. The data must be made available through reports, dashboards, or analysis
tools that are combined with appropriate education about how to leverage the data as part of the
day‐to‐day decision‐making processes.

Applications of Data Warehousing


Data warehousing is applied across many different industries. For example, industries that
have realized data warehousing success include the following:
• Consumer packaged goods
• Financial services
• Manufacturing
• Utilities and telecommunications
• Pharmaceuticals
• Insurance
• Healthcare
• Shipping and transportation
• Educational institutions
• Nonprofit organizations
Furthermore, data warehousing has been successfully deployed across a wide variety of
business functions, such as sales, marketing, finance, purchasing, manufacturing quality,
human resources, inventory management, customer relationship marketing, call centers, and
more.

Online Transaction Processing System vs Data Warehousing System

Applications that run the business are called online transaction processing systems
(OLTPs). These OLTPs involve transaction processing that occurs interactively with the
end user. Online transactions are familiar to most people. Examples include:
• ATM transactions such as deposits, withdrawals, inquiries, and transfers
• Supermarket payments with debit or credit cards
• Purchase of merchandise over the Internet

One of the main characteristics of a transaction system is that the interactions between
the user and the system are very short. The user will perform a complete business
transaction through short interactions, with immediate response time required for each
interaction. These systems are currently supporting mission-critical applications;
therefore, continuous availability, high performance, and data protection and integrity
are required.

OLTP systems are geared toward functions such as processing incoming orders,
getting products shipped out, and transferring funds as requested. These applications
must ensure that transactions are handled accurately and efficiently. No one wants to
wait minutes to get cash from an automated teller machine, or to enter sales orders into
a company's system.
In contrast, the purpose of a data warehousing environment is to provide data in a format
easily understood by the business community in order to support decision‐making
processes. The data warehouse (DW) supports looking at the
business data over time to identify significant trends in buying behavior, customer
retention, or changes in employee productivity. Table 1 lays out the primary differences
between these two types of systems.
The inherent differences between the functions performed in OLTP and DW systems
result in methodology, architecture, tool, and technology differences. Data warehousing
emerged as an outgrowth of necessity, but has blossomed into a full‐fledged industry
that serves a valuable function in the business community.
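The contrast can be sketched with Python's built-in `sqlite3` module. The table names, the account, and the transaction history are hypothetical. Both workloads share one in-memory database only to keep the sketch short. The Separation property described earlier recommends keeping analytical and transactional processing apart in practice. The OLTP function performs a short, atomic transaction on a single account. The warehouse-style function scans history and aggregates it to reveal a trend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
conn.execute("CREATE TABLE txn_history (account_id INTEGER, txn_month TEXT, kind TEXT, amount REAL)")
conn.execute("INSERT INTO account VALUES (1, 1000.0)")
conn.executemany(
    "INSERT INTO txn_history VALUES (?, ?, ?, ?)",
    [(1, "2020-01", "withdrawal", 200.0), (1, "2020-01", "deposit", 50.0),
     (1, "2020-02", "withdrawal", 120.0), (1, "2020-03", "withdrawal", 300.0)],
)


def withdraw(conn, account_id, amount, month):
    # OLTP style: a short, atomic transaction touching one account that needs an immediate reply.
    with conn:  # commits if both statements succeed, rolls back on error
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, account_id))
        conn.execute("INSERT INTO txn_history VALUES (?, ?, 'withdrawal', ?)",
                     (account_id, month, amount))


def monthly_withdrawals(conn):
    # DW style: a read-only scan over history, aggregated by month to show a trend.
    return conn.execute(
        "SELECT txn_month, SUM(amount) FROM txn_history "
        "WHERE kind = 'withdrawal' GROUP BY txn_month ORDER BY txn_month"
    ).fetchall()


withdraw(conn, 1, 100.0, "2020-03")
```

After the withdrawal, the account balance is 900.0. `monthly_withdrawals(conn)` returns one total per month, with March at 400.0.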

Table 1: Differences Between Operational and DW Systems [1]

Now that the differences between data warehouse and OLTP systems have been
reviewed, it is time to look deeper into the makeup of the data warehouse itself.

The Data Warehouse Environment

Figure 1: Parts of the Data Warehouse Environment (OLTP source systems; extract, transform, and load process; data organized to support the business; access and use of data)

There are many different parts of a data warehouse environment, encompassing everything
from where the data lives today to where it is ultimately used on reports and for analysis.
Each of the main parts of the data warehousing environment, shown in Figure 1, is described
in the following sections. This figure indicates how the data flows throughout the
environment.

Source systems, shown on the left side of Figure 1, are where data is created or
collected by operational application systems that run the business. These are often
large applications that have been in place for a long time. Examples of source systems
include the following:
• Order processing
• Production scheduling
• Financial trading systems
• Policy administration
• Claims handling
• Accounts payable/receivable
• Employee payroll

The entire midsection of Figure 1 is devoted to the preparation and organization of data.
First, the data must be extracted from the source systems. Next, the data needs to be
transformed to prepare it for business use. It must be cleansed, validated, integrated
together, and reorganized. Finally, the data is loaded into structures that are designed
to deliver it to the business community. The entire process is referred to as the extract,
transform, and load (ETL) process.
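The three ETL steps can be sketched as three small functions. This is a minimal example under assumed inputs. The CSV exports, column names, and the `sales_fact` table are hypothetical. Setting invalid rows aside, rather than loading them, is just one possible validation policy.

```python
import csv
import io
import sqlite3

# Hypothetical exports from two source systems, supplied as CSV text.
ORDER_SYSTEM_CSV = "order_id,region,amount\n101,north,250.00\n102,South ,abc\n103,SOUTH,75.50\n"
POS_SYSTEM_CSV = "order_id,region,amount\n201,North,40.00\n"


def extract(csv_text):
    # Extract: read raw rows from a source system's export.
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    # Transform: validate and cleanse each row; set aside rows that fail validation.
    clean, rejected = [], []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            rejected.append(row)
            continue
        clean.append((int(row["order_id"]), row["region"].strip().title(), amount))
    return clean, rejected


def load(conn, rows):
    # Load: insert the prepared rows into a structure designed for business use.
    with conn:
        conn.executemany("INSERT INTO sales_fact (order_id, region, amount) VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

all_rejected = []
for source in (ORDER_SYSTEM_CSV, POS_SYSTEM_CSV):
    clean, rejected = transform(extract(source))
    load(conn, clean)
    all_rejected.extend(rejected)

totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales_fact GROUP BY region ORDER BY region"
).fetchall()
```

After the run, the inconsistent region spellings ("north", "SOUTH", "South ") have been unified. Order 102, whose amount is not a number, is held in `all_rejected` for follow-up instead of corrupting the totals.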

The database in which the data is organized to support the business is called a data
mart. A data mart includes all of the data that is loaded into a single database and used
together for analysis. Data marts are often developed to meet the needs of a business
group such as marketing or finance. The key to a successful data mart is to create it in
an integrated manner. It is also recommended that data be loaded into only one data
mart and then shared across the organization to ensure data consistency.
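As a hypothetical sketch, a department-oriented data mart can be derived from normalized warehouse tables by denormalizing them into one table that fits the department's questions. The table names, products, and stores below are invented. SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical normalized warehouse tables.
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE store (store_id INTEGER PRIMARY KEY, area TEXT);
CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, product_id INTEGER, store_id INTEGER,
                   sale_date TEXT, amount REAL);
INSERT INTO product VALUES (1, 'Soap', 'Personal Care'), (2, 'Rice', 'Grocery');
INSERT INTO store VALUES (10, 'Downtown'), (20, 'Uptown');
INSERT INTO sale VALUES (100, 1, 10, '2020-07-01', 35.0), (101, 2, 10, '2020-07-01', 50.0),
                        (102, 1, 20, '2020-07-02', 35.0);
""")

# Marketing data mart: one denormalized table, so analysts need no joins.
conn.execute("""
CREATE TABLE marketing_mart AS
SELECT s.sale_date, p.category, p.name AS product, st.area, s.amount
FROM sale s
JOIN product p ON p.product_id = s.product_id
JOIN store st ON st.store_id = s.store_id
""")

by_category = conn.execute(
    "SELECT category, SUM(amount) FROM marketing_mart GROUP BY category ORDER BY category"
).fetchall()
```

Because the mart is built from the warehouse, the marketing figures stay consistent with any other mart loaded from the same integrated source.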

Finally, an application or reporting layer is provided to facilitate access and analysis of
the data. This is where business users access reports, dashboards, and analytical
applications. Collections of these reports and analyses are called business intelligence.

Data Models

There is one more critical concept that warrants some attention: the mechanism used to
help organize data, which is called a data model.

A data model is an abstraction of how individual data elements relate to each other. It
visually depicts how the data is to be organized and stored in a database. A data model
provides the mechanism for documenting and understanding how data is organized.

There are many different types of data modeling, each with a specific goal and purpose.
As organizations modified how data was structured to support reporting and analysis, a
new data modeling technique, now called dimensional modeling, emerged. Ralph
Kimball, a pioneer in data warehousing, can be credited with crystallizing these
techniques and publishing them for the benefit of the industry.

Figure 2: Data Models


Note from the figure above that ODS stands for “operational data store”, which is
designed to integrate data from multiple sources. The data is then passed back to
operational systems for further operations and for reporting.

Data modeling is the process used to define the data requirements needed to support business
structures.
Here is a brief overview of how the IT industry sees and defines data modeling.

“Conceptual Data Model describes the scope of the model and the business
structures. It is the first step in organizing the data requirements. A Conceptual Data
Model (CDM) is a structured business view of the data required to support current
business processes, business events, and related performance measures. It is a single
integrated data structure which reflects the structure of business functions rather than
the processing flow or the physical arrangement of data.
Characteristics:
- Represents overall logical structure of data
- Independent of software or data storage structure
- Often contains objects not implemented in physical databases
- Represents data needed to run an enterprise or a business activity

Logical Data Model describes, in technical terms, how the business users conceptualize
their data. This includes describing which tables are to be used, their relationships, and so on.
Transactional Logical Data Model: Used for transactional data modeling (transactional data
includes ledgers, sales, history logs, and data matrices).
Analytical Logical Data Model: Used for analytic/generic logical data modeling (analytical
modeling includes business strategies, data consolidation, profiling, and fact-finding
solutions).

Logical Data Model (LDM) builds upon the business requirements and includes a further
level of detail that supports both the business and system requirements. Business rules
are incorporated into the LDM, and it loses some of the “generalities” from the Enterprise
CDM.
Characteristics:
– Independent of specific software and data storage structure
– Includes more specific entities and attributes
– Includes business rules and relationships
– Includes foreign keys, alternate keys

Physical Data Model defines the physical aspects of the data warehouse, such as where all of
your records will be stored, what software will be used, and how storage will be used.
Physical Data Model (PDM) is specific to the software and performance constraints of
the specific database management system to be used in the implementation. Both
software and data storage structures are considered and the model is often modified to
meet performance or physical constraints.
Characteristics:
– Dependent on specific software and data storage structure
– Includes tables and columns
– Includes physical database objects (triggers, stored procedures, table spaces)
– Includes referential integrity rules that restrict relationships between tables
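A physical data model can be illustrated with a small DDL sketch. SQLite via Python's `sqlite3` module is an assumed target DBMS here, and the `customer` and `customer_order` tables are hypothetical. The sketch includes the items the characteristics list names: tables and columns, a physical object (an index), and a referential integrity rule. The `PRAGMA foreign_keys` line is SQLite-specific, which is exactly the kind of software dependence a physical model carries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),  -- referential integrity
    order_date  TEXT NOT NULL,
    amount      REAL NOT NULL CHECK (amount >= 0)
);
CREATE INDEX idx_order_customer ON customer_order(customer_id);  -- physical storage object
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ana Cruz')")
conn.execute("INSERT INTO customer_order VALUES (500, 1, '2020-08-01', 99.5)")


def try_insert_orphan_order():
    # An order that points at a nonexistent customer should be rejected.
    try:
        conn.execute("INSERT INTO customer_order VALUES (501, 42, '2020-08-02', 10.0)")
        return True
    except sqlite3.IntegrityError:
        return False
```

Calling `try_insert_orphan_order()` returns `False`, because customer 42 does not exist. The table therefore still holds only the valid order.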

Multidimensional Data Model

A multidimensional data model in a data warehouse represents data in the form of data cubes.
It allows data to be modeled and viewed in multiple dimensions, and it is defined by
dimensions and facts. A multidimensional data model is generally organized around a central
theme, which is represented by a fact table.

Dimension: provides the context surrounding a business process event. In simple terms,
dimensions give the who, what, and where of a fact. In the sales business process, for the
fact "quarterly sales number," the dimensions would be:
• Who – Customer Names
• Where – Location
• What – Product Name

In other words, a dimension is a window to view information in the facts.

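Fact: an enterprise-specific observation that affects decision-making processes (for example,
a sale or a shipment)

Measure: a quantitative description of a fact (for example, sales receipts or amounts shipped)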
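A single fact and its context can be sketched in Python as one record that carries dimension values (who, where, what, and when) alongside its measures. The field names, customers, locations, and figures below are hypothetical. Grouping the records by one dimension shows the sense in which a dimension is a window onto the facts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesFact:
    # Dimensions: the context (who, where, what, when) of the event.
    customer: str
    location: str
    product: str
    quarter: str
    # Measures: the quantitative description of the event.
    units_sold: int
    receipts: float


# Hypothetical facts, used only for illustration.
facts = [
    SalesFact("Customer A", "North", "Product 1", "2021-Q1", 120, 6000.0),
    SalesFact("Customer A", "South", "Product 2", "2021-Q1", 40, 3200.0),
    SalesFact("Customer B", "North", "Product 1", "2021-Q1", 75, 3750.0),
]

# Viewing the facts through one dimension ("where"): total receipts per location.
receipts_by_location: dict[str, float] = {}
for f in facts:
    receipts_by_location[f.location] = receipts_by_location.get(f.location, 0.0) + f.receipts

print(receipts_by_location)  # {'North': 9750.0, 'South': 3200.0}
```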

Figure 3: Dimensions, Facts and Measures (fact examples: sales, shipments, hospital admissions; measure examples: sales receipts, amounts shipped, hospital admission costs)

The multidimensional model begins with the observation that the factors affecting decision-
making processes are enterprise-specific facts, such as sales, shipments, hospital admissions,
surgeries, and so on. Instances of a fact correspond to events that occurred. For example,
every single sale or shipment carried out is an event. Each fact is described by the values of a
set of relevant measures that provide a quantitative description of events. For example, sales
receipts, amounts shipped, hospital admission costs, and surgery time are measures.
Perhaps the best starting point to approach the multidimensional model effectively is a definition
of the types of queries for which this model is best suited. Section 1.7 offers more details on
typical decision-making queries such as those listed here (Jarke et al., 2000):
"What is the total amount of receipts recorded last year per state and per product
category?"
"What is the relationship between the trend of PC manufacturers' shares and quarter
gains over the last five years?"
"Which orders maximize receipts?"
"Which one of two new treatments will result in a decrease in the average period of
admission?"
"What is the relationship between profit gained by the shipments consisting of less than
10 items and the profit gained by the shipments of more than 10 items?"

It is clear that using traditional languages, such as SQL, to express these types of queries can
be a very difficult task for inexperienced users. It is also clear that running these types of
queries against operational databases would result in an unacceptably long response time.
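To see why such queries are hard to write, here is the first question above expressed in SQL, run through Python's sqlite3 module against a tiny star schema. The schema (one fact table, fact_sales, joined to store, product, and date dimension tables), the sample rows, and the choice of 2020 as "last year" are all assumptions for illustration. Even this simplest question needs three joins, a filter, and a grouping.

```python
import sqlite3

# Hypothetical star schema with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales  (store_id INTEGER, product_id INTEGER,
                          date_id INTEGER, receipts REAL);

INSERT INTO dim_store   VALUES (1, 'State X'), (2, 'State Y');
INSERT INTO dim_product VALUES (1, 'Beverages'), (2, 'Snacks');
INSERT INTO dim_date    VALUES (1, 2019), (2, 2020);
INSERT INTO fact_sales  VALUES
    (1, 1, 2, 100.0), (1, 1, 2, 50.0), (1, 2, 2, 30.0),
    (2, 1, 2, 80.0),  (2, 1, 1, 999.0);
""")

# "Total receipts recorded last year per state and per product category",
# with 2020 standing in for "last year".
query = """
SELECT s.state, p.category, SUM(f.receipts) AS total_receipts
FROM fact_sales AS f
JOIN dim_store   AS s ON s.store_id   = f.store_id
JOIN dim_product AS p ON p.product_id = f.product_id
JOIN dim_date    AS d ON d.date_id    = f.date_id
WHERE d.year = 2020
GROUP BY s.state, p.category
ORDER BY s.state, p.category
"""
rows = conn.execute(query).fetchall()
print(rows)
# [('State X', 'Beverages', 150.0), ('State X', 'Snacks', 30.0), ('State Y', 'Beverages', 80.0)]
```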

Figure 4: Sales Cube and Its Dimensions (example sales cube with the dimensions dates, stores, and products)
Obviously, a huge number of events occur in typical enterprises—too many to analyze one by
one. Imagine placing them all into an n-dimensional space to help us quickly select and sort
them out. The n-dimensional space axes are called analysis dimensions, and they define
different perspectives to single out events. For example, the sales in a store chain can be
represented in a three-dimensional space whose dimensions are products, stores, and dates.
As far as shipments are concerned, products, shipment dates, orders, destinations, and terms &
conditions can be used as dimensions. Hospital admissions can be defined by the department-
date-patient combination, and you would need to add the type of operation to classify surgery
operations.
Figure 5: Cube Concepts (cubes are a metaphor for visualizing the multidimensional model; cube cells are events, cube edges are analysis dimensions, and each cube cell has a value for each measure; a cube with more than three dimensions is called a hypercube)

The concept of dimension gave life to the broadly used metaphor of cubes to represent
multidimensional data. According to this metaphor, events are associated with cube cells and
cube edges stand for analysis dimensions. If more than three dimensions exist, the cube is
called a hypercube. Each cube cell is given a value for each measure. Figure 6 shows an
intuitive representation of a cube in which the fact is a sale in a store chain. Its analysis
dimensions are store, product and date. An event stands for a specific item sold in a specific
store on a specific date, and it is described by two measures: the quantity sold and the receipts.
This figure highlights that the cube is sparse, meaning that many events did not actually take
place. Of course, you cannot sell every item every day in every store.
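The cube metaphor can be sketched in Python as a dictionary keyed by (store, product, date) coordinates, where each cell holds the two measures, quantity sold and receipts. Only cells for events that actually happened are stored, which reflects the sparsity described above. The store, item, and date values are hypothetical, and real OLAP engines use specialized storage structures; a dictionary is simply one easy way to show the idea.

```python
from itertools import product as cartesian

# Hypothetical members of the three analysis dimensions.
stores = ["S1", "S2"]
items = ["Apples", "Bread", "Milk"]
dates = ["2021-01-01", "2021-01-02"]

# Only the cells where an event actually happened are stored.
# Key: (store, item, date) -> measures (quantity sold, receipts).
cube: dict[tuple[str, str, str], tuple[int, float]] = {
    ("S1", "Apples", "2021-01-01"): (10, 50.0),
    ("S1", "Milk", "2021-01-02"): (4, 12.0),
    ("S2", "Bread", "2021-01-01"): (7, 21.0),
}

potential_cells = len(stores) * len(items) * len(dates)
# Enumerate every coordinate of the cube and count the ones with no event.
empty_cells = sum(1 for coords in cartesian(stores, items, dates) if coords not in cube)


def cell(store: str, item: str, date: str) -> tuple[int, float]:
    """Return the measures of a cell; an empty cell means no sale took place."""
    return cube.get((store, item, date), (0, 0.0))


print(potential_cells, len(cube), empty_cells)  # 12 3 9
print(cell("S2", "Milk", "2021-01-02"))         # (0, 0.0)
```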

Figure 6: Multidimensional Model
Figure 7: Events (unique events in the multidimensional model, the cube cells, do not always correspond to unique events in the application domain; for example, each cell represents the entire sales in a store in a day and does not take individual transactions into account)

To avoid any misunderstanding of the term event, you should realize that the group of
dimensions selected for a fact representation singles out a unique event in the multidimensional
model, but the group does not necessarily single out a unique event in the application domain.
To make this statement clearer, consider once again the sales example. In the application
domain, one single sales event is supposed to be a customer's purchase of a set of products
from a store on a specific date. In practice, this corresponds to a sales receipt. From the
viewpoint of the multidimensional model, if the sales fact has the product, store, and date
dimensions, an event will be the daily total amount of an item sold in a store. The two
interpretations differ because sales receipts generally include various items, and individual
items are generally sold many times every day in a store. In the following sections, we use the
terms event and fact to refer to the granularity of events and facts in the multidimensional
model.
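The difference between application-domain events and multidimensional events can be sketched in Python by rolling receipt lines up into daily cells keyed by (product, store, date). The receipt numbers, products, and amounts are hypothetical. Several receipts feed the same cell, so the cube holds fewer events than there are receipts.

```python
from collections import defaultdict

# Application-domain events: one row per item line on a sales receipt.
# Fields: (receipt_no, store, date, product, quantity, amount); all hypothetical.
receipt_lines = [
    (1001, "S1", "2021-01-01", "Milk", 2, 6.0),
    (1001, "S1", "2021-01-01", "Bread", 1, 3.0),
    (1002, "S1", "2021-01-01", "Milk", 1, 3.0),
    (1003, "S1", "2021-01-02", "Milk", 3, 9.0),
    (1004, "S1", "2021-01-02", "Milk", 1, 3.0),
]

# Multidimensional events: one cell per (product, store, date) holding the
# daily totals [quantity, amount], however many receipts contributed to it.
cells: dict[tuple[str, str, str], list] = defaultdict(lambda: [0, 0.0])
for _receipt_no, store, date, product, qty, amount in receipt_lines:
    totals = cells[(product, store, date)]
    totals[0] += qty
    totals[1] += amount

receipts = {line[0] for line in receipt_lines}
print(len(receipt_lines), len(receipts), len(cells))  # 5 4 3
print(cells[("Milk", "S1", "2021-01-02")])             # [4, 12.0]
```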

Figure 8: Dimensions and Slices

For the marketing manager, the business dimensions are product, product category,
time (day, week, month), sales district, and distribution channel. For the financial controller,
the business dimensions are budget line, time (month, quarter, year), district, and
division.
If the users of your data warehouse think in terms of business dimensions for decision
making, you should also think of business dimensions while collecting requirements.
Although the actual proposed usage of a data warehouse could be unclear, the business
dimensions used by the managers for decision making are not nebulous at all. The users
will be able to describe these business dimensions to you, so you are not totally lost in the
process of requirements definition: you can find out about the business dimensions.
Let us try to get a good grasp of the dimensional nature of business data. Figure 8 shows
the analysis of sales units along the three business dimensions of product, time, and
geography.
These three dimensions are plotted against three axes of coordinates. You will see that
the three dimensions form a collection of cubes. In each of the small dimensional cubes, you
will find the sales units for that particular slice of time, product, and geographical division.
A slice in a multidimensional array is a column of data corresponding to a single value for one or
more members of the dimension. In this case, the business data of sales units is three
dimensional because there are just three dimensions used in this analysis. If there are more
than three dimensions, we extend the concept to multiple dimensions and visualize
multidimensional cubes, also called hypercubes.
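A slice can be sketched in Python on a tiny cube of sales units keyed by (product, month, region). The hypothetical slice_cube function fixes one dimension to a single member and drops that dimension from the keys, leaving a two-dimensional view; it covers only the single-member case of the definition above. All member names and values are illustrative.

```python
# A tiny 3-D cube of sales units: (product, month, region) -> units.
# All member names and values are hypothetical.
sales_units = {
    ("TV", "Jan", "East"): 30,
    ("TV", "Jan", "West"): 20,
    ("TV", "Feb", "East"): 25,
    ("Radio", "Jan", "East"): 12,
    ("Radio", "Feb", "West"): 8,
}

DIMENSIONS = ("product", "month", "region")


def slice_cube(cube, dimension, member):
    """Fix one dimension to a single member and drop it from the keys,
    leaving a cube with one dimension fewer."""
    axis = DIMENSIONS.index(dimension)
    return {
        key[:axis] + key[axis + 1:]: value
        for key, value in cube.items()
        if key[axis] == member
    }


jan = slice_cube(sales_units, "month", "Jan")
print(jan)  # {('TV', 'East'): 30, ('TV', 'West'): 20, ('Radio', 'East'): 12}
```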

Figure 9: Cube Management (assume 100 stores, 100 items, and 3 years of roughly 1,000 days; cube size = 100 × 100 × 1,000 = 10,000,000 potential events; cubes need automatic tools to analyze, or the data must be reduced in one of two ways: restriction and aggregation)

The information in a multidimensional cube is very difficult for users to manage because of its
quantity, even if it is a concise version of the information stored in operational databases. If, for
example, a store chain includes 50 stores selling 1,000 items, and a specific data warehouse
covers three years of transactions (approximately 1,000 days), the number of
potential events totals 50 × 1,000 × 1,000 = 5 × 10^7. Assuming that each store can sell only 10
percent of all the available items per day, the number of events totals 5 × 10^6. This is still too
much data to be analyzed by users without relying on automatic tools.
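The arithmetic above can be checked with a few lines of Python: the number of potential events is simply the product of the dimension cardinalities. The figures (50 stores, 1,000 items, about 1,000 days, and 10 percent of items sold per store per day) come from the text, and the 100 × 100 × 1,000 case is the one given in Figure 9; the function name potential_events is hypothetical.

```python
import math


def potential_events(*dimension_sizes: int) -> int:
    """Number of cells in a cube: the product of the dimension cardinalities."""
    return math.prod(dimension_sizes)


# Figures from the text: 50 stores, 1,000 items, about 1,000 days.
cells = potential_events(50, 1000, 1000)
# Assumption from the text: each store sells only 10 percent of the items per day.
actual = cells * 10 // 100

print(f"{cells:,}")                              # 50,000,000 (5 x 10^7)
print(f"{actual:,}")                             # 5,000,000 (5 x 10^6)
print(f"{potential_events(100, 100, 1000):,}")  # 10,000,000 (Figure 9)
```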
You have essentially two ways to reduce the quantity of data and obtain useful information:
restriction and aggregation.
The cube metaphor offers an easy-to-use and intuitive way to understand both of these
methods.
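Both methods can be sketched in Python on a small hypothetical cube keyed by (store, product, date), with receipts as the measure. Restriction keeps only the cells whose coordinates satisfy a condition; aggregation sums the measure over the dimensions that are dropped, producing a smaller, coarser cube. The function names restrict and aggregate are hypothetical, and summing is an assumption: other aggregate operators, such as average or maximum, could be used depending on the measure.

```python
from collections import defaultdict

# Hypothetical sales cube: (store, product, date) -> receipts.
cube = {
    ("S1", "Milk", "2021-01-01"): 9.0,
    ("S1", "Bread", "2021-01-01"): 3.0,
    ("S2", "Milk", "2021-01-01"): 6.0,
    ("S2", "Milk", "2021-01-02"): 12.0,
    ("S3", "Bread", "2021-01-02"): 4.5,
}


def restrict(cube, predicate):
    """Restriction: keep only the cells whose coordinates satisfy the predicate."""
    return {coords: v for coords, v in cube.items() if predicate(*coords)}


def aggregate(cube, keep):
    """Aggregation: sum the measure over every dimension whose position
    is not listed in `keep`."""
    totals = defaultdict(float)
    for coords, value in cube.items():
        totals[tuple(coords[i] for i in keep)] += value
    return dict(totals)


# Restriction: only milk sales in stores S1 and S2.
milk = restrict(cube, lambda store, product, date: product == "Milk" and store in {"S1", "S2"})

# Aggregation: total receipts per store (product and date are summed away).
per_store = aggregate(cube, keep=(0,))

print(len(milk))   # 3
print(per_store)   # {('S1',): 12.0, ('S2',): 18.0, ('S3',): 4.5}
```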

Data Warehouse Design and Develop Sequence

Earlier in this module, you looked at how data flows through the data warehouse environment.
While this correctly illustrates how data flows in the completed environment, this is not the
recommended sequence for designing and developing a data warehouse. A better way to
design the environment is to start from the business user perspective. Business questions and
problems arise first; the relevant data is then collected from the source systems (flat files,
spreadsheets, personal folders, pictures, etc.). This source data is then processed and
organized so that it can help support the business. Once it is organized, users can access the
information they need.
Figure 10: Data Warehouse Design and Develop Sequence (labels: Business Questions and Problems; Source Systems; Input to Data Delivery Design; Organized Data; Design and Build Process; Process the Data; Access & Use to Support the Business)
This figure shows the correct order to successfully design and implement a data warehousing
environment. Both the technical and business team members play a role throughout.
An understanding of what the business is trying to accomplish and how success is measured
should be the foundation for all data warehousing initiatives. The starting point for designing the
data warehouse is the business community.

Once the business requirements are understood, the data in the underlying source systems
needs to be studied. Many business people have a vision for what they want to do, but it is not
always tied to the reality of the organization's actual data.

The foundation for successful data warehousing, now and into the future, is properly structuring
the data. Data must be organized to support the business perspective. This provides ease of
use and improved query performance. This design is created based on knowledge of the
business requirements, as well as the reality of the existing data.

After defining how the data will be organized, the design for getting the data from the source
systems to the database can be created. Decisions about the architecture and tools needed to
prepare the data can be made in the proper context. Too often these decisions are made before
you know what is to be delivered.
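The step of getting data from the source systems into the organized database can be sketched as a tiny extract-transform-load (ETL) process in Python. The flat-file source is simulated with an in-memory CSV string; the cleaning rules (trimming and standardizing codes, skipping rows with no amount) and the fact_sales table are hypothetical, and sqlite3 stands in for the warehouse database. Real ETL tools add scheduling, logging, and error handling that are omitted here.

```python
import csv
import io
import sqlite3

# Stand-in for a flat-file source system (hypothetical columns and values).
SOURCE_CSV = """store,product,date,amount
 s1 ,Milk,2021-01-01,6.00
S1,milk,2021-01-01,3.00
S2,Bread,2021-01-02,
S2,Bread,2021-01-02,4.50
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[tuple[str, str, str, float]]:
    """Transform: standardize codes and names; skip rows with no amount."""
    clean = []
    for r in rows:
        if not r["amount"].strip():
            continue  # a real process might route such rows to an error table
        clean.append((r["store"].strip().upper(), r["product"].strip().title(),
                      r["date"], float(r["amount"])))
    return clean


def load(conn: sqlite3.Connection, rows) -> None:
    """Load: write the cleaned rows into the organized (warehouse) table."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact_sales "
                 "(store TEXT, product TEXT, sale_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
result = conn.execute("SELECT store, product, SUM(amount) FROM fact_sales "
                      "GROUP BY store, product ORDER BY store").fetchall()
print(result)  # [('S1', 'Milk', 9.0), ('S2', 'Bread', 4.5)]
```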

While the data is being prepared, the data access and application layer can be designed. This
includes the design of basic reports, business intelligence, and analytical applications, and
performance dashboards or other end user tools.

Project Methodology

Many different project methodologies are available for all systems' development efforts. There
are even multiple methodologies specifically targeted toward data warehousing. These have
evolved over several decades. Most organizations already have adopted some type of project
methodology or project life cycle. It is important to understand how your organization runs
projects to ensure that the data warehouse project is adhering to the strategic direction for all
information systems. Several basic building blocks are found in any methodology. These
primary components are as follows:

• Project definition, planning and management
• Defining business requirements
• Designing the data delivery database
• Defining the architecture
• Processes for building the database
• Developing reports/analyses and providing education/support
These basic components need to be in place regardless of the chosen methodology, approach,
philosophy, or technology. The sequence outlined earlier helps to ensure overall success by
tying all other activities to business requirements. This sequence also helps build a foundation
that will withstand the test of time.

Administering and Managing a Data Warehouse

Data Warehouse and Business Strategy

A well-managed data warehouse can assist a corporation in its strategy to gain competitive
advantages. This can be achieved by using an exploration warehouse, which is a direct product
of the data warehouse, to identify environmental factors, formulate strategic plans, and determine
business-specific objectives:

Identifying Environmental Factors: Quantified analysis can be used to identify a corporation's
products and services, the market share of specific products and services, and its financial
management.
Formulating Strategic Plans: Environmental factors can be matched up against the strategic
plan by identifying current market positioning, financial goals, and opportunities.
Determining Specific Objectives: Exploration warehouse can be used to find patterns; if
found, these patterns are then compared with patterns discovered previously to optimize
corporate objectives.

While managing a data warehouse for business strategy, what needs to be taken into
consideration is the difference between companies. No one formula fits every organization.
Avoid using so called "templates" from other companies. The data warehouse is used for your
company's competitive advantages. You need to follow your company's user information
requirements for strategic advantages.

Developing a Data Warehouse

Building a data warehouse is a large system development process. Participants of data
warehouse development can range from a data warehouse administrator to a business analyst.
The data warehouse team is supposed to lead the organization into assuming their roles and
thereby bringing about a partnership with the business. A data warehouse team may have the
following roles:

Data Warehouse Administrator (DWA): responsible for integrating and coordinating
metadata and data across many different data sources, as well as data source management,
physical database design, operation, backup and recovery, security, and performance and
tuning.

Manager/Director: responsible for the overall management of the entire team to ensure that the
team follows the guiding principles, business requirements, and corporate strategic plans.

Project Manager: responsible for data warehouse project development, including matching
each team member's skills and aspirations to tasks on the project plan.

Executive Sponsor: responsible for garnering and retaining adequate resources for the
construction and maintenance of the data warehouse.

Business Analyst: responsible for determining what information is required from a data
warehouse to manage the business competitively.

System Architect: responsible for developing and implementing the overall technical
architecture of the data warehouse, from the backend hardware and software to the client
desktop configurations.

ETL Specialist: responsible for routine work on data extraction, transformation, and loading for
the warehouse databases.

Front End Developer: responsible for developing the front-end, whether it is client-server or
over the Web.

OLAP Specialist: responsible for the development of data cubes, a multidimensional view of
data in OLAP.
Data Modeler: responsible for modeling the existing data in an organization into a schema that
is appropriate for OLAP analysis.

Trainer: responsible for training the end-users to use the system so that they can benefit from
the data warehouse system.

End User: responsible for providing feedback to the data warehouse team.

Process Management

Developing a data warehouse has become a popular but exceedingly demanding and costly
activity in information systems development and management. Data warehouse vendors are
competing intensively for customers because so much of their money and prestige is at
stake. Consulting vendors have redirected their attention toward this rapidly expanding market
segment. User companies are faced with a serious question: which product should they buy?
As mentioned before, data warehouse development is a large system development process.
Process management is required in every step of the development process.

Security Management

In recent years, information technology (IT) security has become one of the hottest and most
important topics facing both users and providers. The goal of database security is the protection
of data from accidental or intentional threats to its integrity and access. The same is true for a
data warehouse. However, higher security methods, in addition to the common practices such
as view-based control, integrity control, processing rights, and DBMS security, need to be used
for the data warehouse due to the differences between a database and a data warehouse. One of
the differences that demand a higher level of security for a data warehouse is the scope and
level of detail of its data, such as financial transactions, personal medical records, and salary
information. One method that can be used to protect data requiring a high level of security in a
data warehouse is encryption and decryption.

Confidential and sensitive data can be stored in a separate set of tables where only authorized
users can have access. These data can be encrypted while they are being written into the data
warehouse. In this way, the data captured and stored in the data warehouse are secure and can
only be accessed on an authorized basis. Three levels of security can be offered by using
encryption and decryption. The first level is that only authorized users can have access to the
data in the data warehouse. Each group of users, internal or external, ranging from executives
to information consumers should be granted different rights for security reasons. Unauthorized
users are totally prevented from seeing the data in the data warehouse. The second level is
protection from unauthorized dumping and interpretation of data. Without the right key, an
unauthorized user cannot write anything into the tables, nor can the existing data in the tables
be decrypted. The third level is protection from unauthorized access during the transmission
process. Even if unauthorized access occurs during transmission, there is no harm to the
encrypted data unless the user has the decryption code.
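The first level of security (different rights for each group of users) can be sketched in Python as a simple grant table checked before every read. This sketch does not perform any encryption; encryption and decryption would require a vetted cryptographic library and key management, which are not shown. The group names, the table names (with sensitive data kept in separate secure_ tables, as the text suggests), and the read_table function are hypothetical.

```python
# Hypothetical user groups and the tables each may read.
# Sensitive data is kept in a separate set of "secure_" tables.
GRANTS: dict[str, set[str]] = {
    "executive": {"sales_summary", "secure_salaries"},
    "analyst": {"sales_summary", "sales_detail"},
    "information_consumer": {"sales_summary"},
}


class AccessDenied(Exception):
    """Raised when a user group has no right to read a table."""


def read_table(user_group: str, table: str) -> str:
    """Allow a read only if the user's group has been granted the table.
    Unknown groups get no rights at all."""
    allowed = GRANTS.get(user_group, set())
    if table not in allowed:
        raise AccessDenied(f"{user_group!r} may not read {table!r}")
    return f"rows of {table}"  # placeholder for the actual query


print(read_table("executive", "secure_salaries"))
try:
    read_table("analyst", "secure_salaries")
except AccessDenied as err:
    print("denied:", err)
```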

Integration of Data Warehouses into Global Information Environment: Major Challenges

The availability of the opportunities described will completely depend on the availability, quality,
and organization of the information in the external sources as well as on the organization's
ability to complete complicated projects and its commitment to flexible and well-grounded
decision making. That is why such a company will have to adopt new organizational approaches
as well as new software engineering solutions and technologies that could provide a solid base
for efficient and adequate accumulation of information from scattered sources in a global
information environment for its further integration into corporate decision support systems.

The problem of development of data warehouses in a global information environment has much
in common with the development of data warehouses within the scope of corporate information
systems (we have already mentioned those difficulties and tasks). But on the whole it
significantly differs from the implementation of data warehouses that use information from local
databases:
• The external information environment and the data in it (data sources, data formats, access
interfaces) may change significantly, and these changes may be made without taking
into account how they influence the behavior of the entities that consume this
information.
• Information collected from the external environment practically cannot be improved and
adjusted in accordance with the needs of organizations that consume (or potentially may
consume) it.
• Technologies of storage and presentation of data in an external information environment
may be numerous, and some of these technologies may be unavailable for the
organization that wants to include external information into corporate data warehouses.
That is why development of analytical tools and integration of information flows from a global
information environment into corporate analytical information systems will require new
approaches, skills, and technologies. In particular, developing analytical tools and decision
support systems that use external information sources will require dealing with both
organizational and technological difficulties.

LEARNING TASK (55 pts.)

A. In your own words, answer the following questions (DO NOT copy verbatim from the text of
this module; you may use Word or simply write your solution):

1. Describe what a data warehouse is, including its key attributes. (10 pts.)
2. Justify how a large organization like San Miguel Corporation (SMC) may benefit from
having a data warehouse. You may cite actual products and other real-life aspects of
SMC operations to illustrate your points (you may do some research on this). (15 pts.)
3. Identify the parts of the data warehouse environment and discuss their interrelationships.
(10 pts.)
4. Construct a multidimensional model of possible data cubes for SMC (DO NOT use the
same examples mentioned in the module). Draw the model; illustrate and label the
dimensions, facts, measures, events, and slices. (15 pts.)
5. Briefly discuss the data warehouse design and development process. (5 pts.)

B. Submit your work to the instructor via Canvas.

This document is a property of the University of St. La Salle Module 1 | Page 20


Unauthorized copying and / or editing is prohibited.

You might also like