DWM Unit1 Solved QB
A Data Warehouse (DW) is a relational database that is designed for query and
analysis rather than transaction processing. It includes historical data derived
from transaction data, which may come from a single source or multiple sources.
1. Subject Oriented –
3. Time Variant –
The data collected in a data warehouse is identified with a particular time
period. A data warehouse provides data from a historical point of view.
4. Non-volatile –
When new data is added, the previous data is not deleted; this is what
non-volatile means. A data warehouse is kept separate from the operational
database, and hence changes made in the operational database are not reflected
in the data warehouse.
ETL stands for Extract, Transform, Load. It is a process used in data
warehousing to extract data from various sources, transform it into a format
suitable for loading into a data warehouse, and then load it into the
warehouse. The ETL process can also use pipelining: as soon as some data has
been extracted, it can be transformed, and during that period new data can be
extracted. Likewise, while the transformed data is being loaded into the data
warehouse, the already extracted data can be transformed.
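The pipelining idea can be illustrated with a minimal Python sketch. The function names, the `events` log, and the sample records are hypothetical and chosen only for illustration. Generators pass each record to the next stage as soon as it is ready, so the stages interleave record by record; a production pipeline would typically run the stages concurrently rather than in one thread.

```python
from typing import Iterable, Iterator, List

events: List[str] = []  # records the order in which the stages touch each record


def extract(source: Iterable[str]) -> Iterator[str]:
    """Yield records one at a time instead of extracting everything first."""
    for record in source:
        events.append(f"extract {record.strip()}")
        yield record


def transform(records: Iterable[str]) -> Iterator[str]:
    """Transform each record as soon as it has been extracted."""
    for record in records:
        cleaned = record.strip().upper()
        events.append(f"transform {cleaned}")
        yield cleaned


def load(records: Iterable[str], warehouse: List[str]) -> None:
    """Load each record as soon as it has been transformed."""
    for record in records:
        warehouse.append(record)
        events.append(f"load {record}")


warehouse: List[str] = []
load(transform(extract([" a ", " b "])), warehouse)
```

After this runs, `events` shows that the first record is loaded before the second one is even extracted, which is the interleaving the paragraph above describes.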
The process of ETL can be broken down into the following three stages:
1. Extraction:
The first step of the ETL process is extraction. In this step, data from various
source systems, which can be in formats such as relational databases, NoSQL,
XML, and flat files, is extracted into the staging area. It is important to
extract the data into the staging area first and not directly into the data
warehouse, because the extracted data is in various formats and may also be
corrupted. Loading it directly into the data warehouse may damage the
warehouse, and rollback will be much more difficult. Therefore, this is one of
the most important steps of the ETL process.
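As a rough sketch of this step, the Python example below extracts records from two differently formatted sources (a CSV flat file and JSON lines) into an in-memory staging list, and sets corrupted records aside instead of passing them on. The function name `extract_to_staging`, the `source` field, and the sample data are assumptions made for illustration; a real staging area would usually be a separate database or file store.

```python
import csv
import io
import json
from typing import Dict, List, Tuple


def extract_to_staging(csv_text: str, json_lines: str) -> Tuple[List[Dict[str, object]], List[str]]:
    """Collect records from both sources in staging; keep unreadable ones apart."""
    staging: List[Dict[str, object]] = []
    rejected: List[str] = []

    # Flat-file source: every CSV row becomes one staged record.
    for row in csv.DictReader(io.StringIO(csv_text)):
        staging.append({"source": "csv", **row})

    # JSON-lines source: corrupted lines are rejected, not staged.
    for line in json_lines.splitlines():
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            rejected.append(line)
            continue
        if not isinstance(record, dict):
            rejected.append(line)
            continue
        staging.append({"source": "json", **record})

    return staging, rejected


staging, rejected = extract_to_staging(
    "id,country\n1,U.S.A\n",
    '{"id": "2", "country": "America"}\n{broken',
)
```

Here the corrupted line `{broken` ends up in `rejected`, so it never reaches the warehouse.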
2. Transformation:
The second step of the ETL process is transformation. In this step, a set of
rules or functions is applied to the extracted data to convert it into a single
standard format. It may involve the following processes/tasks:
• Filtering – loading only certain attributes into the data warehouse.
• Cleaning – filling in NULL values with default values, mapping U.S.A,
United States, and America to USA, etc.
• Joining – joining multiple attributes into one.
• Splitting – splitting a single attribute into multiple attributes.
• Sorting – sorting tuples on the basis of some attribute (generally the
key attribute).
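The five tasks above can be sketched together in a short Python function. The field names (`first_name`, `order_date`, `notes`, and so on), the `UNKNOWN` default, and the sample rows are hypothetical; only the U.S.A / United States / America to USA mapping comes from the text.

```python
from typing import Dict, List

# Cleaning rule from the text: several spellings map to one standard value.
COUNTRY_ALIASES = {"U.S.A": "USA", "United States": "USA", "America": "USA"}


def transform(rows: List[Dict[str, object]]) -> List[Dict[str, object]]:
    result = []
    for row in rows:
        # Cleaning: replace NULL (None) with a default, then standardize.
        country = row.get("country") or "UNKNOWN"
        country = COUNTRY_ALIASES.get(country, country)

        # Joining: combine two attributes into one.
        full_name = f"{row['first_name']} {row['last_name']}"

        # Splitting: break one attribute into several.
        year, month, _day = str(row["order_date"]).split("-")

        # Filtering: keep only the selected attributes ("notes" is dropped).
        result.append({
            "id": row["id"],
            "full_name": full_name,
            "country": country,
            "year": int(year),
            "month": int(month),
        })

    # Sorting: order tuples by the key attribute.
    return sorted(result, key=lambda r: r["id"])


rows = [
    {"id": 2, "first_name": "Ada", "last_name": "Lovelace",
     "country": "America", "order_date": "2023-04-01", "notes": "x"},
    {"id": 1, "first_name": "Alan", "last_name": "Turing",
     "country": None, "order_date": "2022-12-31", "notes": "y"},
]
transformed = transform(rows)
```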
3. Loading:
The third and final step of the ETL process is loading. In this step, the
transformed data is finally loaded into the data warehouse. In some systems
data is loaded into the data warehouse very frequently; in others it is loaded
at longer but regular intervals. The rate and period of loading depend solely
on the requirements and vary from system to system.
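A periodic load can be sketched with Python's built-in `sqlite3` module standing in for the warehouse. The `sales` table, its columns, and the helper names are assumptions made for illustration. Each batch is inserted inside a transaction, so a failed load is rolled back as a whole, and earlier batches are left untouched.

```python
import sqlite3
from typing import Iterable, Tuple


def create_warehouse() -> sqlite3.Connection:
    """Create an in-memory database with one hypothetical fact table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        "id INTEGER PRIMARY KEY, country TEXT, amount REAL, load_date TEXT)"
    )
    return conn


def load_batch(conn: sqlite3.Connection,
               rows: Iterable[Tuple[int, str, float]],
               load_date: str) -> int:
    """Append one batch and return the total row count in the warehouse."""
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT INTO sales (id, country, amount, load_date) VALUES (?, ?, ?, ?)",
            [(rid, country, amount, load_date) for rid, country, amount in rows],
        )
    return conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]


conn = create_warehouse()
first_total = load_batch(conn, [(1, "USA", 10.0)], "2024-01-01")
second_total = load_batch(conn, [(2, "USA", 5.5), (3, "IN", 7.0)], "2024-01-02")
```

How often `load_batch` is called (every few minutes, nightly, weekly) is the "rate and period of loading" that the requirements decide.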
Advantages:
1. A data warehouse lets business users quickly access significant data from
several sources, all in one place.
2. A data warehouse provides consistent data on various cross-functional
activities.
3. It helps integrate many sources of data, reducing the time needed for
analysis and reporting.
4. A data warehouse helps reduce the total turnaround time for analysis and
reporting.
5. Restructuring and integration make the data easier to use for reporting and
analysis.
6. It saves users the time of retrieving data from multiple sources, because
critical data from a number of sources can be accessed in a single place.
Disadvantages:
Applications:
• Financial sectors
• Banking areas
• Consumer supplies
• Retail services
• Controlled industrialized manufacturing.
➢ Single-Tier Architecture
The requirement for separation plays an essential role in defining the two-
tier architecture for a data warehouse system, as shown in fig:
➢ Three-Tier Architecture
Metadata is simply defined as data about data. Data that is used to represent
other data is known as metadata. For example, the index of a book serves as
metadata for the contents of the book. In other words, metadata is the
summarized data that leads us to the detailed data.
Metadata in a data warehouse is similar to the data dictionary or the data
catalogue in a database management system.
Metadata can be broadly categorized into the following three categories:
1. Business Metadata: This metadata has the data ownership information,
business definition and changing policies.
2. Technical Metadata: Technical metadata includes database system names,
table and column names and sizes, data types and allowed values. Technical
metadata also includes structural information such as primary and foreign
key attributes and indices.
3. Operational Metadata: This metadata includes the currency of data and data
lineage. Currency of data indicates whether the data is active, archived, or
purged. Lineage of data means the history of the data's migration and the
transformations applied to it.
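One possible way to picture the three categories is as small record types. The Python sketch below uses dataclasses whose field names are assumptions drawn loosely from the descriptions above; it is not the schema of any particular metadata repository.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BusinessMetadata:
    owner: str                 # data ownership information
    business_definition: str
    change_policy: str


@dataclass
class TechnicalMetadata:
    table_name: str
    column_name: str
    data_type: str
    is_primary_key: bool = False


@dataclass
class OperationalMetadata:
    currency: str                                     # "active", "archived" or "purged"
    lineage: List[str] = field(default_factory=list)  # migrations and transformations applied

    def __post_init__(self) -> None:
        if self.currency not in {"active", "archived", "purged"}:
            raise ValueError(f"unknown currency: {self.currency}")


# Hypothetical metadata for a single warehouse column.
business = BusinessMetadata("Sales team", "Country of the customer", "Reviewed yearly")
technical = TechnicalMetadata("sales", "country", "TEXT")
operational = OperationalMetadata("active", ["extracted from CRM", "mapped America to USA"])
```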
The generation and management of metadata serves two purposes:
A. To Minimize the Efforts for Development and Administration of a Data
Warehouse
B. To Improve the Extraction of Information