Module - 2 Business Data Warehousing

Uploaded by

georgina918

Module - 2: Business Data Warehousing

Business Data Warehousing (MIS 6309)


Learning Objectives:
• To comprehend the purpose and nature of a data warehouse
• To examine the development of data warehouses over time
• To understand what makes up a data warehouse
• To comprehend the distinction between data and information
• To comprehend the distinction between OLTP and OLAP

What is a Data Warehouse?


A data warehouse is a subject-oriented, integrated, time-variant, non-updateable collection of data used in support of
management decision-making processes and Business Intelligence (BI).

Key Terms in Definition:


• Subject-Oriented: A data warehouse is organized around the key subjects of the enterprise. E.g.: Customer, Product,
Time, Organizational Unit Location, etc.
• Integrated: The data in the data warehouse are defined using consistent naming conventions, formats, encoding
structures, and related characteristics gathered from several internal systems of record and also often from sources
external to the organization.
• Time-Variant: Data in the data warehouse are carefully associated with a specific period of time so that they may be used
to study trends and changes.
• Non-Updateable: Data in the data warehouse are loaded and refreshed from operational systems but cannot be updated
by the end users.
A data warehouse is not just a consolidation of all the operational databases in an organization. Because of its focus on
business intelligence, external data, and time-variant data (not just current status), a data warehouse is a unique kind of
database.

A Working Definition of Data Warehousing:


It is properly described through the specific definitions of the two words that make up the term:
Data : Facts and information about something
Warehouse: A location or facility for storing goods and merchandise

A data warehouse system has the following characteristics:


• It provides centralization of corporate data assets
• It’s contained in a well-managed environment
• It has consistent and repeatable processes defined for loading data from the corporate applications
• It’s built on an open and scalable architecture that can handle future expansion of data
• It provides tools that allow its users to effectively process the data into information without a high degree of technical
support.

Is a Bigger Data Warehouse a Better Data Warehouse?


To determine the size you need for your data warehouse, follow these steps:

• Determine the mission, or the business objectives of the data warehouse


• Determine the functionality that you want the data warehouse to have
• Determine what contents (type of data) the data warehouse needs to support its functionality
• Determine, based on the content volume (which is based on the functionality, which in turn is based on the mission), how
big you need to make your data warehouse.

Need for Data Warehouse:


Data in operational systems are typically fragmented and inconsistent, so-called silos, or islands, of data. They are also
generally distributed on a variety of incompatible hardware and software platforms.

Two major factors driving data warehousing in most organizations today are:
• Integrated View of Information: A business requires an integrated, company-wide, view of high-quality information
• Informational vs. Operational Systems: The information systems department must separate informational from
operational systems to improve performance in managing company data.

Need for Data Warehouse (Cont.):


Difficulties of deriving a single corporate view. Look at these three tables from three separate systems of record, each
containing similar student data:
• The STUDENT DATA table is from the class registration system.
• The STUDENT EMPLOYEE table is from the personnel system.
• The STUDENT HEALTH table is from the health center system.
STUDENT DATA
StudentNo    LastName  MI  FirstName  Telephone  S_Status  ***
124-45-6789  De'Luke   T   Britney    883-8881   Soph
987-65-4123  Smith     M   Eric       777-1234   Jr

STUDENT EMPLOYEE
StudentNo    Address         Dept.  Hours  ***
124-45-6789  1218 Elk Drive  MIS    8
987-65-4123  256 Mesa Drive  BUAN   10

STUDENT HEALTH
StudentName        Telephone  Insurance  ID           ***
Britney T. DeLuke  883-8881   BCBS       124-45-6789
Eric M. Smith      777-1234   ?          987-65-4123

Need for Data Warehouse (Cont.):


Some of the issues that you must resolve are as follows:
• Inconsistent key structure: The primary key of the first two tables is some version of the student Social Security number,
whereas the primary key of STUDENT HEALTH is StudentName.
• Synonym: In STUDENT DATA and STUDENT EMPLOYEE, the student identifier is named StudentNo, whereas in STUDENT HEALTH
it is named ID. (You learned how to deal with synonyms.)
• Free-form fields versus structured fields: In STUDENT HEALTH, StudentName is a single field. In STUDENT DATA,
StudentName (a composite attribute) is broken into its component parts: LastName, MI, and FirstName.
• Inconsistent data values: The same student can carry different values in different systems. Here the last name is spelled
De'Luke in STUDENT DATA but DeLuke in STUDENT HEALTH, and a telephone number could differ in the same way. Is this an
error, or does the person really have two telephone numbers?
• Missing data: The value for Insurance is missing (or null) for Eric Smith in the STUDENT HEALTH table. How will this
value be located?
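
The integration issues listed above can be made concrete with a small sketch. The Python below is only an illustration, not part of the course material: it copies the sample rows from the STUDENT DATA and STUDENT HEALTH tables, treats the ID column as a synonym for StudentNo, splits the free-form StudentName field, and flags inconsistent or missing values instead of silently choosing one. The function names split_name and integrate are hypothetical, and treating "?" as a missing value is an assumption.

```python
# Rows copied from the STUDENT DATA and STUDENT HEALTH tables above.
student_data = [
    {"StudentNo": "124-45-6789", "LastName": "De'Luke", "MI": "T",
     "FirstName": "Britney", "Telephone": "883-8881"},
    {"StudentNo": "987-65-4123", "LastName": "Smith", "MI": "M",
     "FirstName": "Eric", "Telephone": "777-1234"},
]
student_health = [
    {"StudentName": "Britney T. DeLuke", "Telephone": "883-8881",
     "Insurance": "BCBS", "ID": "124-45-6789"},
    {"StudentName": "Eric M. Smith", "Telephone": "777-1234",
     "Insurance": "?", "ID": "987-65-4123"},
]


def split_name(full_name):
    """Split a free-form 'First M. Last' name into structured parts.

    Assumes exactly three space-separated parts, which holds for the sample rows only.
    """
    first, middle, last = full_name.split()
    return {"FirstName": first, "MI": middle.rstrip("."), "LastName": last}


def integrate(data_rows, health_rows):
    """Merge the two sources on the student number and record any issues found."""
    # Synonym: ID in STUDENT HEALTH holds the same value as StudentNo.
    health_by_id = {row["ID"]: row for row in health_rows}
    merged = []
    for row in data_rows:
        record = dict(row, Insurance=None, Issues=[])
        health = health_by_id.get(row["StudentNo"])
        if health is None:
            record["Issues"].append("no health record")
        else:
            # Free-form versus structured field: parse the single name field.
            parsed = split_name(health["StudentName"])
            if parsed["LastName"] != row["LastName"]:
                record["Issues"].append(
                    f"last name differs: {row['LastName']} vs {parsed['LastName']}")
            # Inconsistent data values: flag the conflict rather than pick a winner.
            if health["Telephone"] != row["Telephone"]:
                record["Issues"].append("telephone differs")
            # Missing data: treat '?' or an empty value as unknown (an assumption).
            if health["Insurance"] in ("?", "", None):
                record["Issues"].append("insurance missing")
            else:
                record["Insurance"] = health["Insurance"]
        merged.append(record)
    return merged


for record in integrate(student_data, student_health):
    print(record["StudentNo"], record["Insurance"], record["Issues"])
```

Flagging conflicts in an Issues list, rather than overwriting one value with another, leaves the decision about which source to trust to a person who knows the systems of record.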

Need for Company-Wide view (Cont.)


• No single system of record
• Multiple systems are not synchronized
• Organizations want to analyze the activities in a balanced way
• Customer relationship management
• Supplier relationship management

What’s in a Data Warehouse:


• A data warehouse is a home for your high-value data, or data assets, that originate in other corporate applications or in
data sources external to your company, such as a public database that contains sales information gathered from all your
competitors.
• If your company’s data warehouse were advertised as a product for sale, it might be described as “Contains high-quality,
refined and purified information, all of which has undergone a 25-point quality check and is offered to you with a
warranty to guarantee hassle-free ownership so that you can better monitor the performance of your business.”

Classifying Data as an Asset:


You can classify data that are managed within an enterprise in three groupings:

Run the business data: Produced by corporate applications, such as the one your company uses to fill customer orders for its
products or the one it uses to manage financial transactions. These data are the raw materials for a data warehouse.

Integrate the business data: Built to improve the quality of, and to synchronize, two or more corporate applications, such as
a master list of customers. These data are leveraged to integrate applications that weren’t designed to work with each other.

Monitor the business data: Presented to end users for reporting and decision support, such as your financial dashboard. The
data is cleaned to enable users to better understand progress and evaluate cause-and-effect relationships in the data.

Manufacturing Data Assets:


• The data warehousing team selects a focus area, such as tracking and reporting the company’s product sales activity
against that of its competitors

• The team in charge of building the data warehouse assigns a group of business users and other key individuals within the
company to play the role of subject-matter experts

• The group goes through the list of information (data assets), item by item, and figures out where the data warehouse can
obtain that particular piece of data (raw material)

• After completing the details of where the business can get each piece of information, the data warehousing team creates
extraction programs

Extraction Programs:
• Extraction programs collect data from various internal databases and files, copy certain data to a staging area (a work
area outside the data warehouse), cleanse the data to ensure that the data has no errors, and then copy the higher-
quality data (data assets) into a data warehouse

• Extraction programs are created either by hand (custom coded) or by using specialized data warehousing products: ETL
(extract, transform, and load) tools
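
A hand-coded extraction program of the kind described above might look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the source rows, the sales_fact table, and the cleansing rules are invented, the staging area is just an in-memory list, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3


def extract(source_rows):
    """Copy only the needed fields from a source system into a staging area (a list here)."""
    return [
        {"order_id": r["order_id"], "amount": r["amount"], "region": r["region"]}
        for r in source_rows
    ]


def cleanse(staged_rows):
    """Reject rows with unusable amounts or regions and standardize region codes."""
    clean = []
    for row in staged_rows:
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue  # amount is not numeric: fails the quality check
        region = (row["region"] or "").strip().upper()
        if not region:
            continue  # region is missing: fails the quality check
        clean.append((row["order_id"], amount, region))
    return clean


def load(warehouse, clean_rows):
    """Copy the cleansed rows from staging into the warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(order_id INTEGER, amount REAL, region TEXT)"
    )
    warehouse.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", clean_rows)
    warehouse.commit()


# Hypothetical rows from an order-entry system, including two bad records.
source = [
    {"order_id": 1, "amount": "19.99", "region": " east ", "clerk": "A"},
    {"order_id": 2, "amount": "n/a", "region": "west", "clerk": "B"},
    {"order_id": 3, "amount": "5", "region": None, "clerk": "C"},
    {"order_id": 4, "amount": 42, "region": "West", "clerk": "A"},
]

warehouse = sqlite3.connect(":memory:")
load(warehouse, cleanse(extract(source)))
rows = warehouse.execute(
    "SELECT order_id, amount, region FROM sales_fact ORDER BY order_id"
).fetchall()
print(rows)
```

Keeping extract, cleanse, and load as separate steps mirrors the staging-area idea: bad rows are rejected before anything reaches the warehouse. An ETL tool would typically express the same steps through configuration rather than code.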

Data Movement Mechanism:



Historical Perspective of Data Warehousing

Data Warehouse – Why?

• Single point of truth

• Consolidate data across the organization to get a consistent and agreed view

• Standardized data models

• Support decision making on all organization levels



Data Warehouse – Why? (Cont.)


• Data warehousing is a technology that aggregates structured data from one or more sources so that it can be compared
and analyzed for greater business intelligence
• Data warehouses use a different design from standard operational databases. The latter are optimized to maintain the
strict accuracy of data at the moment by rapidly updating real-time data. Data warehouses, by contrast, are designed to
give a long-range view of data over time. They trade off transaction volume and instead specialize in data aggregation.
• Unlike an operational data store, a data warehouse contains aggregate historical data, which may be analyzed to reach
critical business decisions

Benefits:
• Better data: Adding data sources to a data warehouse enables organizations to ensure that they are collecting consistent
and relevant data from that source. This ensures higher data quality and integrity for sound decision-making.
• Faster decisions: Data in a warehouse is consistent and ready to be analyzed. It provides the analytical power and a
more complete dataset to base decisions on hard facts. Thus, decision-makers no longer need to rely on hunches,
incomplete data, or poor-quality data and risk delivering slow and inaccurate results.

Data Warehouse History:


In the beginning, there were simple mechanisms for holding data.
• There were punched cards. There were paper tapes. There was core memory that was hand-beaded. In the beginning,
storage was very expensive and very limited.
• A new day dawned with the introduction and use of magnetic tape. With magnetic tape, it was possible to hold very large
volumes of data cheaply.
   - With magnetic tape, there were no major restrictions on the format of the record of data. With magnetic tape, data could be
     written and rewritten. Magnetic tape represented a great leap forward from early methods of storage.
   - But magnetic tape did not represent a perfect world. With magnetic tape, data could be accessed only sequentially. It was
     often said that to access 1% of the data, 100% of the data had to be physically accessed and read.
   - In addition, magnetic tape was not the most stable medium on which to write data. The oxide could fall off or be scratched
     off of a tape, rendering the tape useless.
• Disk storage represented another leap forward for data storage. With disk storage, data could be accessed directly. Data
could be written and rewritten. And data could be accessed en masse. There were all sorts of virtues that came with disk
storage.

Data Warehouse History: (Cont.)


• Soon disk storage was accompanied by software called a “DBMS,” or “Database Management System.”
• DBMS software existed for the purpose of managing storage on the disk itself.
• The DBMS managed such activities as
   - identifying the proper location of data
   - resolving conflicts when two or more units of data were mapped to the same physical location
   - allowing data to be deleted
   - spanning a physical location when a record of data would not fit in a limited physical space
• Online applications: Once data could be accessed directly, using disk storage and a DBMS, there soon grew what came to
be known as online applications.
   - Online applications were applications that depended on the computer to access data consistently and quickly.
   - There were many commercial applications of online processing.
   - Online applications became so powerful and popular that they soon grew into many interwoven applications.
   - Data proliferated around the corporation so that at any one point in time, people were never sure about the accuracy or
     completeness of the data that they had.

Progression of Systems:

PERSONAL COMPUTERS AND 4GL TECHNOLOGY - To placate the end user’s cry for accessing data, two technologies
emerged: personal computer technology and 4GL technology.
• Personal computer technology allowed anyone to bring his/her own computer into the corporation and to do
his/her own processing at will. Personal computer software such as spreadsheet software appeared. In addition, the
owner of the personal computer could store his/her own data on the computer.
• At about the same time, along came a technology called “4GL” (fourth-generation language) technology. The idea behind
4GL technology was to make programming and system development so straightforward that anyone could do it.

Personal Computers and 4GL Technology: (Cont.)


While the end users were now free to access data, they discovered that there was a lot more to making good decisions than
merely accessing data. The end users found that even after data had been accessed:
• if the data was not accurate, it was worse than nothing, because incorrect data can be very misleading;
• incomplete data is not very useful;
• data that is not timely is less than desirable;
• when there are multiple versions of the same data, relying on the wrong value of data can result in bad decisions;
• data without documentation is of questionable value.
It was only after the end users got access to the data that they discovered all the underlying problems with the data.

The Spider’s Web Environment


It is called the spider’s web environment because there are many lines going to so many places that they are reminiscent of a
spider’s web.
The spider’s web environment grew to be unimaginably complex in many corporate environments.

The Early Progression Led to the Spider’s Web Environment:



A Real Spider Web Environment:

• The frustration of the end user, the IT professional, and the management resulted in a movement to a different
information architecture.
• That information systems architecture was one that centered around a data warehouse.

Classifying the Data Warehouse:


Each of the below classifications of data warehouses implements various aspects of an overall data warehousing
architecture

Data warehouse lite: A relatively straightforward implementation of a modest scope (often, for a small user group or team)
in which you don’t go out on any technological limbs; almost a low-tech implementation.

Data warehouse deluxe: A standard data warehouse implementation that uses advanced technologies to solve complex
business information and analytical needs across a broader user population.

Data warehouse supreme: A data warehouse that has large-scale data distribution and advanced technologies that can
integrate various “run the business” systems, improving the overall quality of the data assets across business information
and analytical needs as well as transactional needs.

Sample Architecture of Data Warehouse Lite:



Sample Architecture of Data Warehouse Deluxe:



Sample Architecture of Data Warehouse Supreme:



SAP Business Warehouse Evolution:



SAP BW Evolution:

SAP BW Architecture:

Components of Data Warehouse:


The data warehouse is based on an RDBMS server which is a central information repository that is surrounded by some key
components to make the entire environment functional, manageable and accessible.

There are mainly five components of a Data Warehouse:


a) Data Warehouse Database:
• The central database is the foundation of the data warehousing environment.
• This database is implemented on the RDBMS technology.

b) Sourcing, Acquisition, Clean-up, and Transformation Tools (ETL):


• The data sourcing, transformation, and migration tools are used for performing all the conversions, summarizations,
and changes needed to transform data into a unified format in the data warehouse.
• They are also called Extract, Transform and Load (ETL) Tools
Components of Data Warehouse: (Cont.)
c) Metadata:
• Metadata is data about data that defines the data warehouse.
• It is used for building, maintaining, and managing the data warehouse.
• In the data warehouse architecture, metadata plays an important role, as it specifies the source, usage, values, and
features of data warehouse data. It also defines how data can be changed and processed.
• Metadata can be classified into the following categories:
   - Technical metadata: Contains information about the warehouse that is used by data warehouse designers and
     administrators.
   - Business metadata: Contains details that give end users an easy way to understand the information stored in
     the data warehouse.
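
One way to picture the two metadata categories is as two descriptions of the same warehouse column, one for administrators and one for end users. The Python sketch below is only an illustration under assumed names: the classes TechnicalMetadata and BusinessMetadata, the catalog dictionary, and the sales_fact.amount entry are all hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TechnicalMetadata:
    """What designers and administrators need: structure, source, and processing."""
    table: str
    column: str
    data_type: str
    source_system: str
    transformation: str


@dataclass(frozen=True)
class BusinessMetadata:
    """What end users need: a plain-language name and meaning."""
    business_name: str
    definition: str
    owner: str


# A one-entry metadata catalog keyed by (table, column); the entry is invented.
catalog = {
    ("sales_fact", "amount"): (
        TechnicalMetadata(
            table="sales_fact",
            column="amount",
            data_type="REAL",
            source_system="order entry",
            transformation="cast to number, rejected when not numeric",
        ),
        BusinessMetadata(
            business_name="Sale amount",
            definition="Value of one completed order",
            owner="Sales operations",
        ),
    )
}


def describe(table, column, audience):
    """Return the description suited to the audience ('end_user' or anything else)."""
    technical, business = catalog[(table, column)]
    if audience == "end_user":
        return f"{business.business_name}: {business.definition}"
    return (
        f"{technical.table}.{technical.column} ({technical.data_type}) "
        f"from {technical.source_system}; {technical.transformation}"
    )


print(describe("sales_fact", "amount", "end_user"))
print(describe("sales_fact", "amount", "administrator"))
```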

Components of Data Warehouse: (Cont.)


d) Query Tools:
• One of the primary objectives of data warehousing is to provide information to businesses to make strategic decisions.
• Query tools allow users to interact with the data warehouse system.
• These tools fall into four different categories:
   - Query and reporting tools
   - Application development tools
   - Data mining tools
   - OLAP tools

e) Data Marts:
• A data mart is an access layer that is used to get data out to the users.
• It is presented as an alternative to a large data warehouse because it takes less time and money to build.

Data Warehouse Framework and Views:



Data vs Information:

Data                                        Information

Raw facts that have not yet been            Produced by processing raw data to
processed to reveal their meaning to        reveal its meaning
the end user

Building blocks of information              Bedrock of knowledge

Data management: generation, storage,       Should be accurate, relevant, and
and retrieval of data                       timely to enable good decision making

Converting Data to Information:


The first half represents data, and the second half represents information that is retrieved from the data.
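
The same idea can be sketched in a few lines of Python: raw records are the data, and a summary computed from them is the information. The sales records and the summarize function are invented for this illustration and are not taken from the slide.

```python
from collections import defaultdict

# Data: raw facts, one tuple per sale (month, region, amount). Values are made up.
raw_sales = [
    ("2024-01", "East", 100.0),
    ("2024-01", "West", 100.0),
    ("2024-02", "East", 200.0),
]


def summarize(records):
    """Turn raw sales records into information: total and share of sales per region."""
    totals = defaultdict(float)
    for _month, region, amount in records:
        totals[region] += amount
    grand_total = sum(totals.values())
    return {
        region: {"total": total, "share": round(total / grand_total, 2)}
        for region, total in totals.items()
    }


# Information: processed data that reveals meaning to the end user.
information = summarize(raw_sales)
for region, figures in information.items():
    print(region, figures["total"], figures["share"])
```

None of the individual records says which region leads. Only the processed summary does, which is the sense in which data are the building blocks of information.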

OLTP vs OLAP:
IT systems can be divided into:
• Transactional (OLTP)
• Analytical (OLAP)
OLTP systems provide source data to data warehouses, whereas OLAP systems help to analyze it.

OLTP vs OLAP: (Cont.)


OLTP (On-Line Transaction Processing) is characterized by a large number of short online transactions (INSERT, UPDATE,
DELETE).
• The main emphasis for OLTP systems is put on very fast query processing, maintaining data integrity in multi-access
environments, and effectiveness measured by the number of transactions per second.
• In an OLTP database, there is detailed and current data, and the schema used to store transactional databases is the
entity model (usually normalized).

OLAP (On-Line Analytical Processing) is characterized by a relatively low volume of transactions.


• Queries are often very complex and involve aggregations.
• For OLAP systems, response time is the effectiveness measure.
• OLAP applications are widely used by data mining techniques.
• In the OLAP database, there is aggregated, historical data, stored in multi-dimensional schemas (usually star schema
which we will discuss later).
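
The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. This is a simplified illustration under stated assumptions: the orders table and its rows are invented, and both workloads run against one in-memory database only to keep the example short, whereas in practice the analytical queries would run against a separate warehouse loaded from the OLTP system.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, order_year INTEGER, amount REAL)"
)

# OLTP style: many short transactions, each touching a single current row.
# Using the connection as a context manager commits each transaction on success.
with db:
    db.execute("INSERT INTO orders VALUES (1, 'Ann', 2023, 100.0)")
with db:
    db.execute("INSERT INTO orders VALUES (2, 'Bob', 2023, 50.0)")
with db:
    db.execute("INSERT INTO orders VALUES (3, 'Ann', 2024, 70.0)")
with db:
    db.execute("UPDATE orders SET amount = 80.0 WHERE order_id = 3")
with db:
    db.execute("DELETE FROM orders WHERE order_id = 2")

# OLAP style: one read-only query that scans many rows and aggregates them.
summary = db.execute(
    "SELECT order_year, COUNT(*), SUM(amount) "
    "FROM orders GROUP BY order_year ORDER BY order_year"
).fetchall()
print(summary)
```

The OLTP statements are judged by how many of them complete per second. The single aggregate query is judged by how long it takes to return.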

Data Modeling and Data Models
Data modeling: Iterative and progressive process of creating a specific data model for a determined problem domain
Data models: Simple representations of complex real-world data structures; useful for supporting a specific problem domain
Model - Abstraction of a real-world object or event
Importance of Data Models -

Learning Outcomes:
• You now understand the purpose and nature of a data warehouse
• You are aware of the development of data warehouses across time
• You are familiar with a data warehouse's components
• You can list the distinctions between data and information
• You can list the distinctions between OLTP and OLAP
