
Data Warehouse: Concepts, Architecture and Components

The document defines a data warehouse as a system that contains historical and cumulative data from single or multiple sources to simplify reporting and analysis for decision making. It describes the key characteristics of a data warehouse as being subject-oriented, integrated, time-variant, and non-volatile. The document also outlines the typical three-tier architecture for a data warehouse consisting of a bottom tier database, middle tier OLAP server, and top tier front-end tools. It notes that data warehouses contain current, historical, and summarized data as well as metadata.

Uploaded by

nandini swami
Copyright
© All Rights Reserved

Data Warehouse

Concepts, Architecture and Components

Definition:

A data warehouse is an information system that contains historical and cumulative data from single or
multiple sources. It simplifies the reporting and analysis process of the organization.
It is also a single version of truth for a company's decision making and forecasting.

Characteristics of a Data Warehouse

A data warehouse has the following characteristics:


• Subject-Oriented
• Integrated
• Time-variant
• Non-volatile

Subject-Oriented
A data warehouse is subject-oriented, as it offers information regarding a theme instead of a company's
ongoing operations. These subjects can be sales, marketing, distribution, etc.
A data warehouse never focuses on the ongoing operations. Instead, it puts emphasis on modeling and
analysis of data for decision making. It also provides a simple and concise view of the specific subject
by excluding data that is not helpful in supporting the decision process.
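As a minimal sketch of the subject-oriented idea, assuming operational order records held as Python dictionaries (the field names `shipping_status`, `clerk_id`, and the `SalesFact` type are hypothetical), the warehouse keeps only the attributes useful for analyzing the sales subject and drops purely operational details:

```python
from dataclasses import dataclass

# Hypothetical operational records; field names are illustrative only.
ORDERS = [
    {"order_id": 1, "product": "Pen", "region": "North", "amount": 120.0,
     "shipping_status": "PACKED", "clerk_id": 7},
    {"order_id": 2, "product": "Book", "region": "South", "amount": 450.0,
     "shipping_status": "SHIPPED", "clerk_id": 3},
]


@dataclass(frozen=True)
class SalesFact:
    """Subject-oriented view: only attributes useful for sales analysis."""
    product: str
    region: str
    amount: float


def to_sales_subject(orders):
    # Keep analysis attributes; drop operational fields such as
    # shipping_status and clerk_id that do not support sales decisions.
    return [SalesFact(o["product"], o["region"], o["amount"]) for o in orders]


if __name__ == "__main__":
    for fact in to_sales_subject(ORDERS):
        print(fact)
```

Which attributes count as "helpful to the decision process" is a design decision made per subject; the split shown here is only an assumption for illustration.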

Integrated
In a data warehouse, integration means establishing a common unit of measure for all similar data
from dissimilar databases. The data also needs to be stored in the data warehouse in a common and
universally acceptable manner.
A data warehouse is developed by integrating data from varied sources such as mainframes, relational
databases, flat files, etc. Moreover, it must keep consistent naming conventions, formats, and coding.
This integration helps in effective analysis of data. Consistency in naming conventions, attribute measures,
encoding structure, etc. has to be ensured. Consider the following example, in which three different
applications, labeled A, B and C, all store Gender, Date, and Balance, but each application stores its data
in a different way:
• In Application A, the gender field stores values such as M or F.
• In Application B, the gender field is a numerical value.
• In Application C, the gender field is stored in the form of a character value.
• The same is true for Date and Balance.
However, after the transformation and cleaning process, all this data is stored in a common format in the data
warehouse.
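The transformation step for the three applications can be sketched as a set of per-source mapping rules. The concrete encodings below (B using 1/0, C using "male"/"female", and each source's date format) are assumptions made for illustration; the text only says that the encodings differ:

```python
from datetime import datetime

# Hypothetical source encodings (assumptions for illustration):
#   App A: gender "M"/"F",           date "YYYY-MM-DD"
#   App B: gender 1/0,               date "DD/MM/YYYY"
#   App C: gender "male"/"female",   date "MM-DD-YYYY"
GENDER_RULES = {
    "A": {"M": "M", "F": "F"},
    "B": {1: "M", 0: "F"},
    "C": {"male": "M", "female": "F"},
}
DATE_FORMATS = {"A": "%Y-%m-%d", "B": "%d/%m/%Y", "C": "%m-%d-%Y"}


def integrate(source: str, gender, raw_date: str) -> dict:
    """Transform one source record into the warehouse's common format."""
    return {
        "gender": GENDER_RULES[source][gender],  # common code: "M" or "F"
        "date": datetime.strptime(raw_date, DATE_FORMATS[source]).date(),
    }


# The same fact, recorded three different ways, becomes one common form.
records = [
    integrate("A", "F", "2023-01-15"),
    integrate("B", 0, "15/01/2023"),
    integrate("C", "female", "01-15-2023"),
]
```

Keeping the rules in lookup tables rather than in branching code makes it easier to add a new source without touching the transformation logic.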

Time-Variant
The time horizon for a data warehouse is quite extensive compared with operational systems. The data
collected in a data warehouse is identified with a particular period and offers information from a
historical point of view. It contains an element of time, explicitly or implicitly.
One place where data warehouse data displays time variance is the structure of the record key. Every
primary key contained within the DW should have, either implicitly or explicitly, an element of time, such
as the day, week, or month.
Another aspect of time variance is that once data is inserted in the warehouse, it can't be updated or changed.
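A minimal sketch of a time element in the record key, assuming monthly balance snapshots (the `BalanceKey` type and its fields are hypothetical): because the month is part of the key, a new period adds a new row instead of overwriting the previous one.

```python
from datetime import date
from typing import NamedTuple


class BalanceKey(NamedTuple):
    """Hypothetical key: the time element (month) is part of the primary key."""
    account_id: int
    month: date  # first day of the month the snapshot describes


# Each monthly snapshot gets its own key, so history is kept side by side.
balances: dict[BalanceKey, float] = {}
balances[BalanceKey(101, date(2024, 1, 1))] = 5000.0
balances[BalanceKey(101, date(2024, 2, 1))] = 5200.0


def history(account_id: int) -> list[tuple[date, float]]:
    """Return an account's balance over time, oldest first."""
    return sorted(
        (k.month, v) for k, v in balances.items() if k.account_id == account_id
    )
```

An operational system would typically keep only the current balance per account; here both months remain queryable.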

Non-volatile
A data warehouse is also non-volatile, which means that previous data is not erased when new data is entered.
Data is read-only and periodically refreshed. This also helps to analyze historical data and understand what
happened and when. It does not require transaction processing, recovery, or concurrency control mechanisms.
Activities like delete, update, and insert, which are performed in an operational application environment, are
omitted in the data warehouse environment. Only two types of data operations are performed in data
warehousing:
1. Data loading
2. Data access
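The two permitted operations can be illustrated with a small in-memory class. This is a sketch under the assumption that rows are dictionaries; the class name `NonVolatileStore` and its methods are hypothetical, not part of any warehouse product:

```python
from types import MappingProxyType


class NonVolatileStore:
    """Minimal sketch of a read-only, refreshed-by-loading warehouse table.

    Only the two operations named above are exposed: load (add a new batch)
    and access (read). There are deliberately no update or delete methods.
    """

    def __init__(self):
        self._rows = []

    def load(self, batch):
        # Periodic refresh: new rows are appended; existing rows are never erased.
        self._rows.extend(MappingProxyType(dict(r)) for r in batch)

    def access(self, predicate=lambda row: True):
        # Read-only access: rows are returned as immutable mapping views.
        return [r for r in self._rows if predicate(r)]


store = NonVolatileStore()
store.load([{"month": "2024-01", "sales": 10}])
store.load([{"month": "2024-02", "sales": 12}])
```

Attempting `store.access()[0]["sales"] = 0` raises a `TypeError`, because `MappingProxyType` views do not support item assignment.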

Data Warehouse Architectures

There are mainly three types of data warehouse architectures:

Single-tier architecture
The objective of a single-tier architecture is to minimize the amount of data stored; its goal is to remove
data redundancy. This architecture is not frequently used in practice.

Two-tier architecture
The two-tier architecture physically separates the available sources from the data warehouse. This
architecture is not expandable and does not support a large number of end users. It also has connectivity
problems because of network limitations.

Three-tier architecture
This is the most widely used architecture.
It consists of the Top, Middle and Bottom Tier.
1. Bottom Tier: The database of the data warehouse serves as the bottom tier. It is usually a relational
database system. Data is cleansed, transformed, and loaded into this layer using back-end tools.
2. Middle Tier: The middle tier in a data warehouse is an OLAP server, which is implemented using either
the ROLAP or the MOLAP model. For a user, this application tier presents an abstracted view of the
database. This layer also acts as a mediator between the end user and the database.
3. Top Tier: The top tier is a front-end client layer. It consists of the tools and APIs that you use to connect
to the data warehouse and get data out of it. These could be query tools, reporting tools, managed query
tools, analysis tools and data mining tools.
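The separation of the three tiers can be sketched in a few functions. The assumptions here are: an in-memory SQLite database stands in for the bottom-tier relational database, a small ROLAP-style class (relational OLAP, which answers analytical requests with SQL aggregation) plays the middle tier, and a plain reporting function is the top tier. All names and sample rows are hypothetical:

```python
import sqlite3


# Bottom tier: a relational database (in-memory SQLite, as an assumption).
def build_bottom_tier() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
    # Data assumed already cleansed and transformed by back-end tools.
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("North", 2023, 100.0), ("North", 2024, 150.0), ("South", 2024, 80.0)],
    )
    return conn


# Middle tier: a ROLAP-style layer that hides SQL behind an abstracted view.
class RolapServer:
    def __init__(self, conn: sqlite3.Connection):
        self._conn = conn

    def total_by(self, dimension: str) -> dict:
        if dimension not in ("region", "year"):  # whitelist guards the f-string SQL
            raise ValueError(f"unknown dimension: {dimension}")
        rows = self._conn.execute(
            f"SELECT {dimension}, SUM(amount) FROM sales "
            f"GROUP BY {dimension} ORDER BY {dimension}"
        )
        return dict(rows.fetchall())


# Top tier: a front-end report that talks only to the middle tier.
def report(server: RolapServer) -> str:
    totals = server.total_by("region")
    return "\n".join(f"{region}: {total:.1f}" for region, total in totals.items())


if __name__ == "__main__":
    print(report(RolapServer(build_bottom_tier())))
```

The top tier never issues SQL itself; replacing the middle tier with a MOLAP engine would not require changes to `report` as long as `total_by` keeps the same shape.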
Data Warehouse Components

Data is selected from various sources, integrated, and stored in a single, consistent
format.

Data warehouse contains current detailed data, historical detailed data, lightly and highly summarized data,
and metadata.

Current and historical data: these are voluminous because they are stored at the highest level of detail.
Lightly and highly summarized data: these are necessary to save processing time when users request them,
and they keep frequently needed results readily accessible.
Metadata: this is “data about data”. It is important for designing, constructing, retrieving, and controlling the
warehouse data.
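The relationship between the four kinds of content can be sketched as follows, assuming sales rows as dictionaries. The choice of "month x product" as the lightly summarized grain, "month" as the highly summarized grain, and the metadata fields are all hypothetical:

```python
from collections import defaultdict

# Current/historical detailed data: one row per sale (hypothetical sample).
detailed = [
    {"date": "2024-01-05", "product": "Pen", "amount": 10.0},
    {"date": "2024-01-20", "product": "Pen", "amount": 15.0},
    {"date": "2024-02-03", "product": "Book", "amount": 40.0},
]


def lightly_summarize(rows):
    """Monthly totals per product (lightly summarized)."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["date"][:7], r["product"])] += r["amount"]
    return dict(totals)


def highly_summarize(light):
    """Grand total per month (highly summarized), built from the light summary."""
    totals = defaultdict(float)
    for (month, _product), amount in light.items():
        totals[month] += amount
    return dict(totals)


# Metadata: "data about data" describing each structure (fields are assumed).
metadata = {
    "detailed": {"grain": "one sale", "source": "order system"},
    "light": {"grain": "month x product", "derived_from": "detailed"},
    "high": {"grain": "month", "derived_from": "light"},
}
```

Precomputing the summaries once at load time is what saves processing time later: a monthly report reads three or four stored totals instead of rescanning every detailed row.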

Benefits of Data Warehousing 


The successful implementation of a data warehouse can bring major benefits to an organization, including:
• Potential high returns on investment
Implementation of data warehousing by an organization requires a huge investment, typically from Rs 10
lakh to 50 lakh. However, a study by the International Data Corporation (IDC) in 1996 reported that the
average three-year return on investment (ROI) in data warehousing reached 401%.
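To make the percentage concrete, the usual definition of ROI can be applied (an assumption; the study's exact method is not described here). As a worked illustration only, suppose a project at the lower end of the quoted range, Rs 10 lakh, achieved the reported 401% average three-year ROI:

```latex
\text{ROI} = \frac{R - I}{I} \times 100\%,
\qquad
4.01 = \frac{R - 10}{10}
\;\Rightarrow\;
R = 10 + 40.1 = 50.1 \ \text{lakh}
```

Here I is the investment and R the total return over three years, both in lakh. A 401% ROI therefore means a net gain of about four times the original investment, not merely four times the investment returned.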
• Competitive advantage
The huge returns on investment for those companies that have successfully implemented a data warehouse are
evidence of the enormous competitive advantage that accompanies this technology. The competitive
advantage is gained by allowing decision-makers access to data that can reveal previously unavailable,
unknown, and untapped information on, for example, customers, trends, and demands.
• Increased productivity of corporate decision-makers
Data warehousing improves the productivity of corporate decision-makers by creating an integrated database
of consistent, subject-oriented, historical data. It integrates data from multiple incompatible systems into a
form that provides one consistent view of the organization. By transforming data into meaningful
information, a data warehouse allows business managers to perform more substantive, accurate, and
consistent analysis.
• More cost-effective decision-making
Data warehousing helps to reduce the overall cost of the product by reducing the number of channels.
• Better enterprise intelligence.
• Enhanced customer service.

Problems of Data Warehousing 


The problems associated with developing and managing a data warehouse are as follows:
Underestimation of resources for data loading
Sometimes we underestimate the time required to extract, clean, and load the data into the warehouse. It
may take a significant proportion of the total development time, although some tools exist that reduce
the time and effort spent on this process.
Hidden problems with source systems
Hidden problems associated with the source systems feeding the data warehouse may be identified after
years of being undetected. For example, when entering the details of a new property, certain fields may
allow nulls which may result in staff entering incomplete property data, even when available and applicable.
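One way to surface such hidden problems early is to profile the source data before loading it. This is a minimal sketch, assuming property records as dictionaries; the field names and the 25% tolerance threshold are hypothetical:

```python
def null_rates(records, fields):
    """Fraction of records in which each field is missing or None."""
    total = len(records)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {
        f: sum(1 for r in records if r.get(f) is None) / total
        for f in fields
    }


def flag_incomplete(records, fields, threshold=0.25):
    """Fields whose null rate exceeds an (assumed) tolerance threshold."""
    rates = null_rates(records, fields)
    return sorted(f for f, rate in rates.items() if rate > threshold)


# Hypothetical property records from a source system that allows nulls.
properties = [
    {"property_id": 1, "rooms": 3, "rent": 900},
    {"property_id": 2, "rooms": None, "rent": 750},
    {"property_id": 3, "rooms": None, "rent": None},
    {"property_id": 4, "rooms": 2, "rent": 800},
]
```

Here `rooms` is null in half the records and would be flagged, prompting a look at why staff leave it blank in the source system.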
Required data not captured
In some cases, data that may be very important for the data warehouse is not captured by the source
systems. For example, the date of registration for a property may not be used in the source system, but it
may be very important for analysis.
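A simple check during design is to compare the attributes the analysis requires with the columns the source system actually has. The requirement list and source schema below are hypothetical:

```python
# Attributes the warehouse needs for analysis (hypothetical requirement list).
REQUIRED_FOR_ANALYSIS = {"property_id", "rent", "registration_date"}


def missing_attributes(source_columns, required=REQUIRED_FOR_ANALYSIS):
    """Required attributes that the source system does not capture at all."""
    return sorted(set(required) - set(source_columns))


source_schema = ["property_id", "rooms", "rent"]  # assumed source columns
```

With this schema the check reports `registration_date` as missing, which means the gap has to be closed in the source system (or another source found) rather than in the warehouse's ETL.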
Increased end-user demands
After some end users' queries are satisfied, requests for support from staff may increase rather than decrease.
This is caused by an increasing awareness of the users on the capabilities and value of the data warehouse.
Another reason for increasing demands is that once a data warehouse is online, it is often the case that the
number of users and queries increase together with requests for answers to more and more complex queries.
Data homogenization
The data warehouse concept involves making data formats similar across different data sources. This
homogenization can result in the loss of some important value in the data.
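The loss can be seen whenever several source values map onto one common code. In this sketch, one source is assumed to distinguish "unknown" from "declined", while the hypothetical common format has only a single "U" code:

```python
# Hypothetical source codes: one source distinguishes "unknown" from "declined".
source_values = ["M", "F", "unknown", "declined", "F"]

# Common warehouse format with only three codes (an assumption).
TO_COMMON = {"M": "M", "F": "F", "unknown": "U", "declined": "U"}

homogenized = [TO_COMMON[v] for v in source_values]


def merged_codes(mapping):
    """Target codes that absorb more than one source value (where detail is lost)."""
    groups = {}
    for src, tgt in mapping.items():
        groups.setdefault(tgt, []).append(src)
    return {tgt: sorted(srcs) for tgt, srcs in groups.items() if len(srcs) > 1}
```

Running `merged_codes(TO_COMMON)` points out that "U" now hides the difference between a customer who declined to answer and one whose value was never recorded, which a later analysis can no longer recover.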
High demand for resources
The data warehouse stores large amounts of data and therefore demands substantial storage and processing resources.
Data ownership
Data warehousing may change the attitude of end users to the ownership of data. Sensitive data that is owned
by one department has to be loaded into the data warehouse for decision-making purposes. This sometimes
results in reluctance from that department, which may hesitate to share the data with others.
High maintenance
Data warehouses are high-maintenance systems. Any reorganization of the business processes and the
source systems may affect the data warehouse, resulting in high maintenance costs.
Long-duration projects
The building of a warehouse can take up to three years, which is why some organizations are reluctant to
invest in a data warehouse. Some organizations capture only the historical data of a particular department in
the data warehouse, resulting in data marts. Data marts support only the requirements of a particular
department and limit the functionality to that department or area only.
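A data mart can be pictured as a department-level subset of the warehouse. This is a sketch with hypothetical rows and a hypothetical `dept` column:

```python
# Hypothetical enterprise warehouse rows tagged with the owning department.
warehouse = [
    {"dept": "Sales", "month": "2024-01", "value": 100},
    {"dept": "HR", "month": "2024-01", "value": 12},
    {"dept": "Sales", "month": "2024-02", "value": 130},
]


def build_data_mart(rows, department):
    """Historical data for one department only (copies, so the source is untouched)."""
    return [dict(r) for r in rows if r["dept"] == department]


sales_mart = build_data_mart(warehouse, "Sales")
```

The limitation described above follows directly: a question that spans Sales and HR cannot be answered from `sales_mart` alone.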
Complexity of integration
The most important area for the management of a data warehouse is its integration capabilities. An
organization must spend a significant amount of time determining how well the various data
warehousing tools can be integrated into the overall solution that is needed. This can be a very difficult task,
as there are a number of tools for every operation of the data warehouse.
