Introduction To Data Warehouse
OLAP operations:
There are five basic analytical operations that can
be performed on an OLAP cube:
1. Drill down: The drill-down operation converts less detailed data into
more detailed data. It can be done by:
o Moving down in the concept hierarchy
o Adding a new dimension
In the cube given in the overview section, the drill-down operation is performed
by moving down in the concept hierarchy of the Time dimension (Quarter ->
Month).
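The drill-down along the Time hierarchy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows in SALES and the helper name aggregate are hypothetical and do not come from the cube in the overview section.

```python
from collections import defaultdict

# Hypothetical fact rows: (quarter, month, sales). The numbers are invented.
SALES = [
    ("Q1", "Jan", 100),
    ("Q1", "Feb", 120),
    ("Q1", "Mar", 90),
    ("Q2", "Apr", 110),
    ("Q2", "May", 130),
    ("Q2", "Jun", 95),
]


def aggregate(rows, level):
    """Sum sales at one level of the Time hierarchy: 'quarter' or 'month'."""
    index = {"quarter": 0, "month": 1}[level]
    totals = defaultdict(int)
    for row in rows:
        totals[row[index]] += row[2]
    return dict(totals)


by_quarter = aggregate(SALES, "quarter")  # less detailed view
by_month = aggregate(SALES, "month")      # drill-down: Quarter -> Month
```

Both views are computed from the same facts, so the monthly totals always add up to the quarterly totals; only the level of detail changes.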
4. Slice: It selects a single value of one dimension of the OLAP cube, which
results in a new sub-cube. In the cube given in the overview
section, Slice is performed on the dimension Time = “Q1”.
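A slice can be sketched as a filter that fixes one dimension and drops it from the key. The cube contents, the dimension order (time, item, location), and the function name slice_cube below are assumptions made for illustration only.

```python
# Hypothetical cube cells keyed by (time, item, location). Values are invented.
CUBE = {
    ("Q1", "Mobile", "Delhi"): 50,
    ("Q1", "Modem", "Delhi"): 20,
    ("Q1", "Mobile", "Mumbai"): 35,
    ("Q2", "Mobile", "Delhi"): 60,
    ("Q2", "Modem", "Mumbai"): 15,
}


def slice_cube(cube, time_value):
    """Fix the Time dimension to one value and return the remaining sub-cube."""
    return {
        (item, location): value
        for (time, item, location), value in cube.items()
        if time == time_value
    }


q1_slice = slice_cube(CUBE, "Q1")  # sub-cube keyed by (item, location) only
```

The result has one dimension fewer than the original cube, which is what distinguishes a slice from a general filter.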
6. OLAP vs OLTP
OLAP systems are used by knowledge workers such as executives, managers and
analysts. OLTP systems are used by clerks, DBAs, or database professionals.
It may hold multiple subject areas. It holds only one subject area, for example,
Finance or Sales.
It is a centralized system. It is a decentralized system.
The figure shows the essential elements of a typical warehouse. The Source
Data component is shown on the left. The Data Staging element serves as
the next building block. In the middle, we see the Data Storage component
that handles the data warehouse's data. This element not only stores and
manages the data; it also keeps track of the data using the metadata repository.
The Information Delivery component, shown on the right, consists of all the
different ways of making the information from the data warehouse available
to the users.
Source data coming into the data warehouse may be grouped into four broad
categories:
Production Data: This type of data comes from the different operational systems
of the enterprise. Based on the data requirements in the data warehouse, we
choose segments of the data from the various operational systems.
Archived Data: Operational systems are mainly intended to run the current
business. In every operational system, we periodically take the old data and
store it in archived files.
After we have extracted data from various operational systems and
external sources, we have to prepare the files for storing in the data
warehouse. The extracted data coming from several different sources needs to
be changed, converted, and made ready in a format that is suitable for
querying and analysis.
We will now discuss the three primary functions that take place in the staging
area.
1) Data Extraction: This function has to deal with numerous data sources. We
have to employ the appropriate technique for each data source.
First, we clean the data extracted from each source. Cleaning may involve
correcting misspellings, providing default values for missing
data elements, or eliminating duplicates when we bring in the same data
from various source systems.
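The three cleaning tasks named above (correcting misspellings, supplying defaults, removing duplicates) can be sketched as follows. The field names, the spelling table, and the default values are hypothetical; a real staging area would take them from its own cleansing rules.

```python
# Hypothetical cleansing rules, invented for this sketch.
SPELLING_FIXES = {"Dehli": "Delhi", "Mumbay": "Mumbai"}
DEFAULTS = {"country": "Unknown"}


def clean(records):
    """Fix known misspellings, fill missing values, and drop duplicate rows."""
    seen = set()
    cleaned = []
    for record in records:
        row = dict(record)  # copy so the extracted input is left untouched
        row["city"] = SPELLING_FIXES.get(row.get("city"), row.get("city"))
        for field, default in DEFAULTS.items():
            if row.get(field) in (None, ""):
                row[field] = default
        key = (row["customer_id"], row["city"], row["country"])
        if key in seen:  # same record already brought in from another source
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


extracted = [
    {"customer_id": 1, "city": "Dehli", "country": "India"},
    {"customer_id": 1, "city": "Delhi", "country": "India"},
    {"customer_id": 2, "city": "Mumbai", "country": None},
]
cleaned_rows = clean(extracted)
```

Duplicates are detected after the spelling correction, so the first two rows, which differ only by a misspelling, collapse into one.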
On the other hand, data transformation also involves purging source data that
is not useful and separating out source records into new combinations. Sorting
and merging of data take place on a large scale in the data staging area. When
the data transformation function ends, we have a collection of integrated data
that is cleaned, standardized, and summarized.
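Sorting, merging, and summarizing can be illustrated with the Python standard library. The two source extracts below and their (date, amount) layout are assumptions made for this sketch, not data from the text.

```python
import heapq
from itertools import groupby

# Hypothetical extracts from two source systems: (date, amount). Values invented.
source_a = [("2024-01-03", 40), ("2024-01-01", 25)]
source_b = [("2024-01-02", 10), ("2024-01-01", 5)]

# Sort each extract, then merge the sorted streams into one ordered stream.
merged = list(heapq.merge(sorted(source_a), sorted(source_b)))

# Summarize: one total per date across both sources.
daily_totals = {
    day: sum(amount for _, amount in rows)
    for day, rows in groupby(merged, key=lambda row: row[0])
}
```

Merging already-sorted streams avoids re-sorting the combined data, which matters when the volumes are large; groupby relies on the merged stream being ordered by date.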
3) Data Loading: Two distinct categories of tasks form the data loading function.
When we complete the structure and construction of the data warehouse and
go live for the first time, we do the initial loading of the information into the
data warehouse storage. The initial load moves high volumes of data and takes
a substantial amount of time.
Data storage for the data warehouse is a separate repository. The data
repositories for the operational systems generally include only the current
data. Also, these data repositories hold data structured in highly
normalized form for fast and efficient processing.
Metadata Component
Data Marts
It includes a subset of corporate-wide data that is of value to a specific group
of users. The scope is confined to particular selected subjects. Data in a data
warehouse should be fairly current, but not necessarily up to the minute,
although developments in the data warehouse industry have made standard and
incremental data dumps more achievable. Data marts are smaller than data
warehouses and usually contain data for one part of the organization. The
current trend in data warehousing is to develop a data warehouse with several
smaller related data marts for particular kinds of queries and reports.
The management and control elements coordinate the services and functions
within the data warehouse. These components control the data
transformation and the data transfer into the data warehouse storage. They
also moderate the data delivery to the clients. They work with the
database management systems and enable data to be correctly saved in
the repositories. They monitor the movement of information into the staging
area and from there into the data warehouse storage itself.
Types of Metadata
o Operational Metadata
o Extraction and Transformation Metadata
o End-User Metadata
Operational Metadata
As we know, data for the data warehouse comes from various operational
systems of the enterprise. These source systems contain different data
structures. The data elements selected for the data warehouse have various
field lengths and data types.
In selecting information from the source systems for the data warehouse, we
split records, combine parts of records from different source files, and
deal with multiple coding schemes and field lengths. When we deliver
information to the end-users, we must be able to tie that back to the source
data sets. Operational metadata contains all of this information about the
operational data sources.
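One way to picture operational metadata is as a lookup from each warehouse field back to the source fields it was built from. The sketch below assumes hypothetical system, file, and field names; it is not a description of any particular metadata repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceField:
    """Where a warehouse field came from, with its original type and length."""
    system: str
    file: str
    field: str
    data_type: str
    length: int


# Hypothetical operational metadata: warehouse field -> contributing source fields.
OPERATIONAL_METADATA = {
    "customer_name": [
        SourceField("billing", "CUSTMAST", "CUST_NM", "char", 30),
        SourceField("orders", "ORDCUST", "NAME", "varchar", 50),
    ],
    "order_total": [
        SourceField("orders", "ORDHDR", "TOT_AMT", "decimal", 9),
    ],
}


def trace_to_source(warehouse_field):
    """Return the fully qualified source fields behind a warehouse field."""
    sources = OPERATIONAL_METADATA.get(warehouse_field, [])
    return [f"{s.system}.{s.file}.{s.field}" for s in sources]


lineage = trace_to_source("customer_name")
```

Here customer_name is combined from two source files with different field lengths and data types, which is the kind of detail the operational metadata has to record so that delivered information can be tied back to its source.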
End-User Metadata