DWM Unit1 Solved QB

The document provides definitions, characteristics, differences between operational databases and data warehouses, needs of data warehousing, explanation of the ETL process with diagram, advantages and disadvantages of data warehousing, applications of data warehousing, and types of data warehouse architectures. Specifically, it defines a data warehouse as a relational database designed for query and analysis using historical data from multiple sources. It also explains characteristics like being subject-oriented, integrated, time-variant, and non-volatile.

Uploaded by

Aryan Buchake


DWM Unit-1 Question Bank

1) Define data warehouse

A Data Warehouse (DW) is a relational database that is designed for query and
analysis rather than transaction processing. It includes historical data derived
from transaction data, drawn from one or more sources.

Reference book definition – A Data Warehouse is a collection of corporate
information derived directly from operational systems and some external data
sources.

(both definitions have to be written)

2) Write and explain characteristics of data warehousing

1. Subject Oriented –
A data warehouse is subject oriented because it provides information around
a subject rather than the organization's ongoing operations. These
subjects include products, clients, suppliers, sales, customers, etc.

2. Integrated –
A data warehouse is built by integrating data from different sources such
as relational databases, flat files, etc. Integration makes analysis of the
data more effective.

3. Time Variant –
The data collected in a data warehouse is identified with a
particular time period. A data warehouse provides data from a historical point
of view.

4. Non-volatile –
When new data is added, the old data is not deleted; this is what non-volatile
means. A data warehouse is kept separate from the operational database,
and hence changes made in the operational database are not reflected in the data
warehouse.
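
The time-variant and non-volatile characteristics can be illustrated with a small sketch. It assumes an in-memory SQLite database standing in for the warehouse; the table name sales_history, its columns, and the helper load_snapshot are hypothetical names chosen for the example, not part of any reference design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table: every row carries the period it belongs to.
conn.execute(
    "CREATE TABLE sales_history (snapshot_date TEXT, product TEXT, units_sold INTEGER)"
)

def load_snapshot(snapshot_date, rows):
    """Append one period's data; existing rows are never updated or deleted."""
    conn.executemany(
        "INSERT INTO sales_history VALUES (?, ?, ?)",
        [(snapshot_date, product, units) for product, units in rows],
    )

load_snapshot("2024-01-31", [("pen", 120), ("notebook", 45)])
load_snapshot("2024-02-29", [("pen", 150), ("notebook", 40)])

# Time-variant: the history of one subject (a product) is available per period.
pen_history = conn.execute(
    "SELECT snapshot_date, units_sold FROM sales_history "
    "WHERE product = 'pen' ORDER BY snapshot_date"
).fetchall()
print(pen_history)
```

The second load only adds rows, so the January figures remain available next to the February ones.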

3) Difference between operational DBMS and Data Warehouse

| Operational Database Systems | Data Warehouse |
| --- | --- |
| Operational database systems are designed to support high-volume transaction processing. | Data warehousing systems are typically designed to support high-volume analytical processing (i.e., OLAP). |
| Operational database systems are usually concerned with current data. | Data warehousing systems are usually concerned with historical data. |
| Data within operational systems is updated regularly according to need. | Non-volatile: new data may be added regularly, but once added, data is rarely changed. |
| It is designed for real-time business dealings and processes. | It is designed for analysis of business measures by subject area, categories, and attributes. |
| It is optimized for a simple set of transactions, generally adding or retrieving a single row at a time per table. | It is optimized for bulk loads and large, complex, unpredictable queries that access many rows per table. |
| It is optimized for validation of incoming information during transactions and uses validation data tables. | It is loaded with consistent, valid information and requires no real-time validation. |
| It supports thousands of concurrent clients. | It supports few concurrent clients relative to OLTP. |
| Operational database systems are widely function or process oriented. | Data warehousing systems are widely subject-oriented. |
| Operational systems are usually optimized to perform fast inserts and updates of relatively small volumes of data. | Data warehousing systems are usually optimized to perform fast retrievals of relatively large volumes of data. |
| An operational database system focuses on data in. | A data warehousing system focuses on data out. |
| A small number of records is accessed. | A large number of records is accessed. |
| Relational databases are created for OLTP. | A data warehouse is designed for OLAP. |
| Data integration in an operational database is application based. | Data integration in a data warehouse is subject based. |
| It provides a detailed, flat relational view of data. | It provides a summarized, multidimensional view of data. |
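
The difference in access pattern can be shown with a minimal sketch. It assumes a single in-memory SQLite table named orders (a hypothetical name, as are its columns) and runs both styles of query against it only for comparison; in practice the two workloads run on separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("north", 100.0), ("south", 250.0), ("north", 50.0), ("south", 25.0)],
)

# OLTP-style access: update and read a single row identified by its key.
conn.execute("UPDATE orders SET amount = 120.0 WHERE order_id = 1")
single_order = conn.execute(
    "SELECT order_id, region, amount FROM orders WHERE order_id = 1"
).fetchone()

# OLAP-style access: scan many rows and summarize them by subject area.
totals_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(single_order)
print(totals_by_region)
```

The first query touches one row ("data in"), while the second reads every row to produce a summary ("data out").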

4) Write needs of data warehousing

a. Business User: Business users require a data warehouse to view
summarized data from the past. Since these people are non-technical, the
data may be presented to them in an elementary form.
b. Store historical data: A data warehouse is required to store time-variant
data from the past. This data is used for various purposes.
c. Make strategic decisions: Some strategies depend on the
data in the data warehouse, so the data warehouse contributes to making
strategic decisions.
d. For data consistency and quality: By bringing data from different
sources to a common place, the user can effectively bring
uniformity and consistency to the data.
e. Quick response time: A data warehouse has to be ready for somewhat
unexpected loads and types of queries, which demands a significant degree of
flexibility and quick response time.

5) Explain ETL with diagram

ETL stands for Extract, Transform, Load. It is a process used in data
warehousing to extract data from various sources, transform it into a format
suitable for the data warehouse, and then load it into the
warehouse. The ETL process can also use pipelining: as soon as
some data is extracted, it can be transformed, and during that time new
data can be extracted. While the transformed data is being loaded into the
data warehouse, the already extracted data can be transformed.
The process of ETL can be broken down into the following three stages:

1. Extraction:
The first step of the ETL process is extraction. In this step, data is extracted
from various source systems, which can be in various formats such as relational
databases, NoSQL, XML, and flat files, into the staging area. It is important to
store the extracted data in the staging area first and not directly in the data
warehouse, because the extracted data is in various formats and may also be
corrupted. Loading it directly into the data warehouse may damage the warehouse,
and rollback will be much more difficult.
Therefore, this is one of the most important steps of the ETL process.

2. Transformation:
The second step of the ETL process is transformation. In this step, a set of
rules or functions is applied to the extracted data to convert it into a single
standard format. It may involve the following tasks:
• Filtering – loading only certain attributes into the data warehouse.
• Cleaning – filling NULL values with default values,
mapping U.S.A, United States, and America to USA, etc.
• Joining – joining multiple attributes into one.
• Splitting – splitting a single attribute into multiple attributes.
• Sorting – sorting tuples on the basis of some attribute (generally the key
attribute).
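
The five transformation tasks listed above can be sketched on a couple of records. This is only an illustration: the sample records, the field names, the UNKNOWN default, and the transform function are all assumptions made for the example.

```python
# Hypothetical extracted records, as they might sit in the staging area.
extracted = [
    {"id": 2, "first": "Asha", "last": "Rao", "country": "U.S.A",
     "joined": "2023-05-14", "notes": "x"},
    {"id": 1, "first": "Ben", "last": "Lee", "country": None,
     "joined": "2022-11-02", "notes": "y"},
]

COUNTRY_MAP = {"U.S.A": "USA", "United States": "USA", "America": "USA"}

def transform(record):
    country = record["country"] or "UNKNOWN"         # cleaning: default for NULL
    country = COUNTRY_MAP.get(country, country)      # cleaning: one standard spelling
    year, month, _day = record["joined"].split("-")  # splitting: one attribute into several
    return {
        "id": record["id"],                          # filtering: "notes" is not carried over
        "full_name": f'{record["first"]} {record["last"]}',  # joining: two attributes into one
        "country": country,
        "join_year": int(year),
        "join_month": int(month),
    }

# Sorting: order the tuples on the key attribute.
transformed = sorted((transform(r) for r in extracted), key=lambda r: r["id"])
for row in transformed:
    print(row)
```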

3. Loading:
The third and final step of the ETL process is loading. In this step, the
transformed data is finally loaded into the data warehouse. Sometimes
data is loaded into the data warehouse very frequently, and
sometimes it is loaded at longer but regular intervals. The rate and period of
loading depend solely on the requirements and vary from system to system.
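
The three stages, together with the pipelining idea, can be sketched with Python generators. This is a minimal sketch under stated assumptions: the source is a small CSV string held in memory, the warehouse is an in-memory SQLite database, and the names SOURCE_CSV, fact_orders, extract, transform, and load are hypothetical. The staging area is left out to keep the example short.

```python
import csv
import io
import sqlite3

# Hypothetical source: a flat file held in memory for the example.
SOURCE_CSV = "order_id,country,amount\n1,U.S.A,10.5\n2,America,4.0\n3,India,7.25\n"

def extract(text):
    """Yield raw rows one at a time from the source."""
    yield from csv.DictReader(io.StringIO(text))

def transform(rows):
    """Convert each raw row to the warehouse's standard format."""
    country_map = {"U.S.A": "USA", "United States": "USA", "America": "USA"}
    for row in rows:
        yield (
            int(row["order_id"]),
            country_map.get(row["country"], row["country"]),
            float(row["amount"]),
        )

def load(rows, conn):
    """Insert transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), warehouse)
loaded = warehouse.execute("SELECT * FROM fact_orders ORDER BY order_id").fetchall()
print(loaded)
```

Because the stages are chained generators, each row is transformed and loaded as soon as it is extracted instead of waiting for the whole extract to finish. The stages are interleaved rather than run in parallel; a real pipeline would typically use separate processes or a scheduler.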

6) Advantages, Disadvantages, Applications of Data Warehouse

Advantages:
1. A data warehouse lets business users quickly access critical
data from several sources in one place.
2. A data warehouse provides consistent information on various cross-functional activities.
3. It helps integrate many sources of data, reducing the time needed for analysis and
reporting.
4. A data warehouse helps reduce the total turnaround time for analysis and reporting.
5. Restructuring and integration make the data easier to use for reporting
and analysis.
6. It saves users the time of retrieving data from multiple sources, because
critical data from a number of sources is available in a single place.

Disadvantages:

1. It is not an ideal option for unstructured data.
2. Creating and implementing a data warehouse is a time-consuming
task.
3. A data warehouse can become outdated relatively quickly.
4. A data warehouse may appear easy to use, but it can be too complex for average users.
5. Warehouse users will sometimes develop differing business rules.
6. It is difficult to make changes to data types and ranges, the data source schema,
indexes, and queries.

Applications:

• Financial services
• Banking services
• Consumer goods
• Retail sector
• Controlled manufacturing

7) Explain types of DW architecture

➢ Single-Tier Architecture

Single-tier architecture is not frequently used in practice. Its purpose is
to minimize the amount of data stored; to reach this goal, it removes data
redundancies. The figure shows that the only layer physically available is the
source layer. In this method, data warehouses are virtual. This means that
the data warehouse is implemented as a multidimensional view of
operational data created by specific middleware, or an intermediate
processing layer.

The weakness of this architecture lies in its failure to meet the
requirement for separation between analytical and transactional
processing. Analysis queries are submitted to operational data after the
middleware interprets them. In this way, queries affect transactional
workloads.

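
A virtual warehouse of this kind can be approximated with a database view. The sketch below assumes an in-memory SQLite database in which the table sales plays the operational source and the view v_sales_by_product plays the middleware layer; both names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product TEXT, qty INTEGER)"
)
conn.executemany(
    "INSERT INTO sales (product, qty) VALUES (?, ?)",
    [("pen", 3), ("pen", 2), ("ink", 1)],
)

# The "warehouse" is only a view: no data is copied, so nothing is stored twice.
conn.execute(
    "CREATE VIEW v_sales_by_product AS "
    "SELECT product, SUM(qty) AS total_qty FROM sales GROUP BY product"
)
before = conn.execute("SELECT * FROM v_sales_by_product ORDER BY product").fetchall()

# A new transaction shows up in the analysis at once, because every
# analytical query runs directly against the operational table.
conn.execute("INSERT INTO sales (product, qty) VALUES ('ink', 4)")
after = conn.execute("SELECT * FROM v_sales_by_product ORDER BY product").fetchall()

print(before)
print(after)
```

The same property that removes redundancy is the weakness described above: each analytical query is executed on the operational data and competes with the transactional workload.
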
➢ Two-Tier Architecture

The requirement for separation plays an essential role in defining the two-tier
architecture for a data warehouse system, as shown in the figure.

Although it is typically called a two-layer architecture to highlight the
separation between physically available sources and data warehouses, it in
fact consists of four subsequent data flow stages:

1. Source layer: A data warehouse system uses heterogeneous sources of
data. That data is originally stored in corporate relational databases or legacy
databases, or it may come from an information system outside the
corporate walls.
2. Data Staging: The data stored in the sources should be extracted, cleansed
to remove inconsistencies and fill gaps, and integrated to merge
heterogeneous sources into one standard schema. The so-called
Extraction, Transformation, and Loading (ETL) tools can combine
heterogeneous schemata, extract, transform, cleanse, validate, filter, and
load source data into a data warehouse.
3. Data Warehouse layer: Information is stored in one logically centralized
single repository: a data warehouse. The data warehouse can be
directly accessed, but it can also be used as a source for creating data marts,
which partially replicate data warehouse contents and are designed for
specific enterprise departments. Meta-data repositories store
information on sources, access procedures, data staging, users, data mart
schemas, and so on.
4. Analysis: In this layer, integrated data is efficiently and flexibly accessed
to issue reports, dynamically analyze information, and simulate
hypothetical business scenarios. It should feature aggregate information
navigators, complex query optimizers, and user-friendly GUIs.

➢ Three-Tier Architecture

The three-tier architecture consists of the source layer (containing multiple
source systems), the reconciled layer, and the data warehouse layer
(containing both data warehouses and data marts). The reconciled layer
sits between the source data and the data warehouse. The main advantage
of the reconciled layer is that it creates a standard reference data model
for the whole enterprise. At the same time, it separates the problems of
source data extraction and integration from those of data warehouse
population. In some cases, the reconciled layer is also used directly to
better accomplish some operational tasks, such as producing daily reports
that cannot be satisfactorily prepared using the corporate applications, or
generating data flows that periodically feed external processes so that they
benefit from cleaning and integration. This architecture is especially useful for
extensive, enterprise-wide systems. A disadvantage of this structure is the
extra storage space used by the redundant reconciled
layer. It also moves the analytical tools a little further away from being
real-time.

8) Difference between data warehouse and data mart

| S.NO | Data Warehouse | Data Mart |
| --- | --- | --- |
| 1. | A data warehouse is a centralised system. | A data mart is a decentralised system. |
| 2. | Light denormalization takes place. | High denormalization takes place. |
| 3. | It follows a top-down model. | It follows a bottom-up model. |
| 4. | It is difficult to build. | It is easy to build. |
| 5. | Fact constellation schema is used. | Star schema and snowflake schema are used. |
| 6. | It is flexible. | It is not flexible. |
| 7. | It is data-oriented in nature. | It is project-oriented in nature. |
| 8. | It has a long life. | It has a shorter life than a warehouse. |
| 9. | Data is held in detailed form. | Data is held in summarized form. |
| 10. | It is vast in size. | It is smaller than a warehouse. |
| 11. | It collects data from various data sources. | It generally stores data from a data warehouse. |
| 12. | Processing takes a long time because of the large amount of data. | Processing takes less time because only a small amount of data is handled. |
| 13. | Designing schemas and views is a complicated process. | Designing schemas and views is an easy process. |

9) Explain DW models - Enterprise DW, Data Marts, Virtual Warehouse

Enterprise Data Warehouse – An EDW is a centralized corporate repository
that stores and manages all the historical business data of an enterprise.
Virtual Data Warehouse – A VDW provides a collective view of the complete data. It
holds no historical data. It can be considered a logical data model of the given
metadata.
Data Marts – A data mart includes a subset of corporate-wide data that is of value
to a specific group of users. The scope is confined to particular selected
subjects.
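
The idea of a data mart as a subject-specific subset can be sketched as follows. The example assumes an in-memory SQLite database; the tables edw_facts and mart_sales, their columns, and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical enterprise warehouse table covering several departments.
conn.execute("CREATE TABLE edw_facts (department TEXT, metric TEXT, value REAL)")
conn.executemany(
    "INSERT INTO edw_facts VALUES (?, ?, ?)",
    [("sales", "revenue", 900.0), ("sales", "returns", 40.0), ("hr", "headcount", 12.0)],
)

# Data mart: a copy of only the rows that one group of users needs.
conn.execute(
    "CREATE TABLE mart_sales AS "
    "SELECT metric, value FROM edw_facts WHERE department = 'sales'"
)
mart_rows = conn.execute(
    "SELECT metric, value FROM mart_sales ORDER BY metric"
).fetchall()
print(mart_rows)
```

The mart holds two of the three warehouse rows, which matches the description of a scope confined to selected subjects.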

10) Difference ETL vs ELT

| S.NO | ETL | ELT |
| --- | --- | --- |
| 1. | ETL first extracts data from a pool of data sources, which are typically transactional databases. | In ELT, data is loaded immediately after being extracted from the source data pools. |
| 2. | Data is held in a temporary staging database. Transformation operations are then performed to structure and convert the data into a suitable form for the target data warehouse system. | There is no staging database, meaning the data is immediately loaded into a single centralized repository. |
| 3. | Structured data is loaded into the warehouse, ready for analysis. | Data is transformed inside the data warehouse system for use with business intelligence tools and analytics. |
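
The ELT order of operations can be sketched in a few lines. This is an illustration under assumptions: an in-memory SQLite database stands in for the warehouse, and the tables raw_orders and clean_orders and the sample rows are hypothetical.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")

# Load: raw source values go straight into the warehouse, untransformed.
warehouse.execute("CREATE TABLE raw_orders (order_id TEXT, country TEXT, amount TEXT)")
warehouse.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("1", "U.S.A", "10.5"), ("2", "America", "4.0"), ("3", "India", "7.25")],
)

# Transform: done afterwards, inside the warehouse, using its own SQL engine.
warehouse.execute(
    """
    CREATE TABLE clean_orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           CASE WHEN country IN ('U.S.A', 'United States', 'America') THEN 'USA'
                ELSE country END AS country,
           CAST(amount AS REAL) AS amount
    FROM raw_orders
    """
)
clean = warehouse.execute("SELECT * FROM clean_orders ORDER BY order_id").fetchall()
print(clean)
```

In an ETL flow the same type conversions and country mapping would run in a staging step before anything reaches the warehouse; here the raw table stays available inside the warehouse next to the cleaned one.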

11) Define metadata repository

Metadata is simply defined as data about data. Data that is used to represent
other data is known as metadata. For example, the index of a book serves as
metadata for the contents of the book. In other words, metadata is
the summarized data that leads us to the detailed data.
Metadata in a data warehouse is similar to the data dictionary or the data
catalogue in a database management system.
Metadata can be broadly divided into the following three categories:
1. Business Metadata: This metadata holds data ownership information,
business definitions, and changing policies.
2. Technical Metadata: Technical metadata includes database system names,
table and column names and sizes, data types, and allowed values. Technical
metadata also includes structural information such as primary and foreign
key attributes and indices.
3. Operational Metadata: This metadata includes the currency of data and data
lineage. Currency of data means whether the data is active, archived, or purged.
Lineage of data means the history of the data's migration and the transformations
applied to it.
The generation and management of metadata serves two purposes:
A. To minimize the effort for development and administration of a data
warehouse
B. To improve the extraction of information
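
The three categories can be pictured as one repository entry per table. In the sketch below the technical part is read from SQLite's own catalogue with PRAGMA table_info, while the business and operational values are made-up placeholders; the customer table and the layout of metadata_entry are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# Technical metadata: PRAGMA table_info returns one row per column as
# (cid, name, type, notnull, dflt_value, pk).
technical = [
    {"column": name, "type": col_type, "primary_key": bool(pk)}
    for _cid, name, col_type, _notnull, _default, pk
    in conn.execute("PRAGMA table_info(customer)")
]

# Hypothetical repository entry combining the three metadata categories.
metadata_entry = {
    "table": "customer",
    "business": {"owner": "Sales department", "definition": "One row per customer"},
    "technical": technical,
    "operational": {"currency": "active", "lineage": ["extracted from CRM", "names trimmed"]},
}
print(metadata_entry["technical"])
```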
