DWH Question Bank
Two Mark Questions
9 List some differences between an OLTP system and a data warehouse system.
12 What types of queries do managers need to pose to the enterprise’s database systems?
13 What is an ODS used for? How does it differ from an OLTP system?
14 Give the three most important guidelines for implementing a data warehouse for a large
enterprise.
16 What is ETL?
17 Give two reasons why dirty data is extracted from source systems.
22 What are the major differences between OLTP and a data warehouse system?
26 What is OLAP?
35 List four types of aggregate queries that are possible with two variables.
37 What is a measure?
1. Explain why ETL must deal with dirty data when extracting information from the source
systems.
3. Describe the operations roll-up, drill-down, slice, dice, and pivot.
4. Suppose that you are in the market to purchase a data mining system.
(a) Regarding the coupling of a data mining system with a database and/or data warehouse
system, what are the differences between no coupling, loose coupling, semitight coupling,
and tight coupling?
(b) What is the difference between row scalability and column scalability?
(c) Which feature(s) from those listed above would you look for when selecting a data mining
system?
5. Discuss the major differences between the star schema and the snowflake schema.
6. Data warehousing is the only viable means to resolve the information crisis and to provide
strategic information. List four reasons to support this assertion and explain them.
8. Why is it important to store multiple types of data in the data warehouse? Give examples of
some nonstructured data likely to be found in the data warehouse of a health management
organization (HMO).
10. Discuss the major design issues that need to be addressed before proceeding with the data
design.
11. Discuss the differences between an OLTP system and a data warehouse, with examples.
12. You are the data design specialist on the data warehouse project team for a manufacturing
company. Design a STAR schema to track the production quantities. Production quantities
are normally analyzed along the business dimensions of product, time, parts used, production
facility, and production run. State your assumptions.
13. Discuss the three major types of metadata in a data warehouse. Briefly mention the
purpose of each type.
14. Discuss any six different methods for information delivery.
18. You are the data warehouse architect for a leading national department store chain. The data
warehouse has been up and running for nearly a year. Now the management has decided to
provide the power users with OLAP facilities. How will you alter the information delivery
component of data warehouse architecture? Make realistic assumptions and proceed.
20. The current trends in hardware/software technology make data warehousing feasible.
Explain, with some examples, how these technology trends help.
22. Describe the composition of the primary keys for the dimension and fact tables with
a suitable example.
23. You are the data analyst on the project team building a data warehouse for an insurance
company. List the possible data sources from which you will bring the data into your data
warehouse. State your assumptions.
24. Suppose that you are in the market to purchase a data mining system. Discuss how
different forms of data mining can be used in the application.
29. Describe the main theoretical foundations that have been proposed for data mining.
Comment on how they each satisfy (or fail to satisfy) the requirements of an ideal theoretical
framework for data mining.
30. Describe the top-down and bottom-up approaches for building a data warehouse and
discuss the merits and disadvantages of each approach.
31. In a STAR schema to track the shipments for a distribution company, the following
dimension tables are found: (1) time, (2) customer ship-to, (3) ship-from, (4) product, (5) type
of deal, and (6) mode of shipment. Review these dimensions and list the possible attributes
for each of the dimension tables. Also, designate a primary key for each table.
34. Explain the various steps of dimension reduction in a data warehouse with a suitable example.
36. Describe four types of charts you are likely to see in the delivery of information from a
data mart supporting the finance department.
37. Describe with examples snapshot and transaction fact tables. How are they related?
38. Describe aggregate fact tables. Why are they needed? Give an example.
40. Discuss how different forms of data mining can be used in the application. Explain with an
example.
IMPORTANT Questions and Answers
Metadata are data about data. When used in a data warehouse, metadata are the data that
define warehouse objects. Metadata are created for the data names and definitions of the
given warehouse. Additional metadata are created and captured for time stamping any
extracted data, the source of the extracted data, and missing fields that have been added by
data cleaning or integration processes.
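As a rough illustration of the extraction metadata described above, the sketch below records the source, the time stamp, and the fields filled in during cleaning. The class ExtractMetadata, the function clean_row, and the source name "billing_db" are hypothetical names chosen for this example, not part of any particular warehouse tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ExtractMetadata:
    """Hypothetical metadata record kept alongside one extracted row."""
    source_system: str                # source of the extracted data
    extracted_at: datetime            # time stamp of the extraction
    filled_fields: list = field(default_factory=list)  # fields added by cleaning


def clean_row(row, defaults, source_system, extracted_at):
    """Fill missing fields from `defaults` and record which ones were filled."""
    meta = ExtractMetadata(source_system, extracted_at)
    cleaned = dict(row)
    for name, default in defaults.items():
        if cleaned.get(name) is None:   # field absent or empty in the source
            cleaned[name] = default
            meta.filled_fields.append(name)
    return cleaned, meta


stamp = datetime(2024, 1, 15, 8, 30, tzinfo=timezone.utc)
cleaned, meta = clean_row(
    {"customer_id": 7, "city": None},
    defaults={"city": "UNKNOWN", "country": "UNKNOWN"},
    source_system="billing_db",       # assumed source name
    extracted_at=stamp,
)
print(cleaned)
print(meta)
```

Here the metadata object travels with the cleaned row, so a later user can tell which values came from the source and which were supplied by the cleaning step.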
There are four key characteristics which separate the data warehouse from other major
operational systems:
Handling of relational and complex types of data: Because relational databases and data
warehouses are widely used, the development of efficient and effective data mining systems
for such data is important.
Local- and wide-area computer networks (such as the Internet) connect many sources of
data, forming huge, distributed, and heterogeneous databases.
5. Compare OLTP and OLAP Systems.
An on-line operational database system that is used for efficient retrieval, storage, and
management of large amounts of data is called an on-line transaction processing (OLTP)
system. Data warehouse systems serve users (or knowledge workers) in the role of data
analysis and decision making. Such systems can organize and present data in various
formats. These systems are known as on-line analytical processing (OLAP) systems.
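The contrast can be sketched with two kinds of statements against the same table. This is only an illustration using Python's built-in sqlite3 module; the orders table and its columns are assumptions made for the example, and a real warehouse would keep analytical data in a separate system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, branch TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "Chennai", 2023, 120.0),
        (2, "Chennai", 2024, 80.0),
        (3, "Pune", 2023, 200.0),
        (4, "Pune", 2024, 50.0),
    ],
)

# OLTP-style work: a short transaction that reads and updates one row by key.
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (90.0, 2))
oltp_row = conn.execute(
    "SELECT branch, year, amount FROM orders WHERE order_id = ?", (2,)
).fetchone()

# OLAP-style work: a read-only query that scans and summarizes many rows.
olap_result = conn.execute(
    "SELECT branch, SUM(amount) FROM orders GROUP BY branch ORDER BY branch"
).fetchall()

print(oltp_row)
print(olap_result)
```

The first query touches a single record and changes it; the second changes nothing but aggregates over the whole table, which is the access pattern a data warehouse is organized to serve.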
The fact table contains the names of the facts (or measures) as well as keys to each of the
related dimension tables.
A dimension table is used for describing a dimension. For example, a dimension table
for item may contain the attributes item_name, brand, and type.
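A minimal sketch of these two kinds of table, using Python's built-in sqlite3 module, is given below. The table names item_dim and sales_fact, the measures units_sold and dollars_sold, and the sample rows are assumptions for illustration; only the attributes item_name, brand, and type come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension table: descriptive attributes of each item.
CREATE TABLE item_dim (
    item_key  INTEGER PRIMARY KEY,
    item_name TEXT,
    brand     TEXT,
    type      TEXT
);
-- Fact table: measures plus a key into the dimension table.
CREATE TABLE sales_fact (
    item_key     INTEGER REFERENCES item_dim(item_key),
    units_sold   INTEGER,
    dollars_sold REAL
);
INSERT INTO item_dim VALUES (1, 'Laptop A', 'Acme', 'computer');
INSERT INTO item_dim VALUES (2, 'Phone B',  'Acme', 'phone');
INSERT INTO item_dim VALUES (3, 'Phone C',  'Zen',  'phone');
INSERT INTO sales_fact VALUES (1, 2, 2000.0);
INSERT INTO sales_fact VALUES (2, 5, 1500.0);
INSERT INTO sales_fact VALUES (3, 1, 400.0);
INSERT INTO sales_fact VALUES (2, 3, 900.0);
""")

# Join the fact table to the dimension table and total a measure
# by one of the dimension's attributes.
units_by_type = conn.execute("""
    SELECT d.type, SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN item_dim AS d ON d.item_key = f.item_key
    GROUP BY d.type
    ORDER BY d.type
""").fetchall()

print(units_by_type)
```

The fact table stores only numbers and keys; every descriptive label used for grouping is looked up through the key in the dimension table.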
The dimension tables of the snowflake schema model may be kept in normalized form to
reduce redundancies. Such tables are easy to maintain and save storage space.
In data transformation, the data are transformed or consolidated into forms appropriate for
mining. Data transformation can involve the following:
The slice operation performs a selection on one dimension of the cube, resulting in
a subcube. The dice operation defines a subcube by performing a selection on two (or) more
dimensions.
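The two operations can be sketched on a tiny cube held in a Python dictionary. The dimension names (time, item, location), the cell values, and the helper names select_cells, slice_cube, and dice_cube are assumptions made for this example.

```python
DIMENSIONS = ("time", "item", "location")

# Each cell is keyed by its coordinates (time, item, location) -> units sold.
cube = {
    ("Q1", "phone", "Delhi"): 10,
    ("Q1", "laptop", "Delhi"): 4,
    ("Q1", "phone", "Mumbai"): 7,
    ("Q2", "phone", "Delhi"): 12,
    ("Q2", "laptop", "Mumbai"): 3,
}


def select_cells(cube, allowed):
    """Keep the cells whose coordinate on each listed dimension is allowed."""
    result = {}
    for coords, value in cube.items():
        if all(coords[DIMENSIONS.index(dim)] in values
               for dim, values in allowed.items()):
            result[coords] = value
    return result


def slice_cube(cube, dim, value):
    """Slice: selection on exactly one dimension."""
    return select_cells(cube, {dim: {value}})


def dice_cube(cube, allowed):
    """Dice: selection on two or more dimensions."""
    if len(allowed) < 2:
        raise ValueError("dice needs selections on at least two dimensions")
    return select_cells(cube, allowed)


q1_slice = slice_cube(cube, "time", "Q1")
delhi_phones = dice_cube(cube, {"item": {"phone"}, "location": {"Delhi"}})

print(q1_slice)
print(delhi_phones)
```

In this sketch the slice keeps every Q1 cell regardless of item or location, while the dice keeps only the cells that satisfy both the item and the location condition.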
Star schema: The most common modeling paradigm is the star schema, in which the data
warehouse contains (1) a large central table (fact table) containing the bulk of the data, with
no redundancy, and (2) a set of smaller attendant tables (dimension tables), one for each
dimension.
Snowflake schema: The snowflake schema is a variant of the star schema model, where
some dimension tables are normalized, thereby further splitting the data into additional tables.
The resulting schema graph forms a shape similar to a snowflake.
Fact Constellations: Sophisticated applications may require multiple fact tables to share
dimension tables. This kind of schema can be viewed as a collection of stars, and hence is
called a galaxy schema or a fact constellation.
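The step from a star dimension to a snowflake dimension can be sketched as a small normalization routine. The item and supplier attributes, the sample rows, and the function name snowflake_item_dim are assumptions for illustration only.

```python
def snowflake_item_dim(star_rows):
    """Split a denormalized item dimension into item and supplier tables."""
    supplier_keys = {}    # (supplier_name, supplier_city) -> supplier_key
    supplier_table = []
    item_table = []
    for row in star_rows:
        supplier = (row["supplier_name"], row["supplier_city"])
        if supplier not in supplier_keys:
            # First time this supplier is seen: give it a new surrogate key.
            supplier_keys[supplier] = len(supplier_keys) + 1
            supplier_table.append({
                "supplier_key": supplier_keys[supplier],
                "supplier_name": row["supplier_name"],
                "supplier_city": row["supplier_city"],
            })
        # The item row keeps only a key that points at the supplier table.
        item_table.append({
            "item_key": row["item_key"],
            "item_name": row["item_name"],
            "supplier_key": supplier_keys[supplier],
        })
    return item_table, supplier_table


# Star-schema form: supplier details are repeated on every item row.
star_item_dim = [
    {"item_key": 1, "item_name": "Laptop A",
     "supplier_name": "Acme", "supplier_city": "Pune"},
    {"item_key": 2, "item_name": "Phone B",
     "supplier_name": "Acme", "supplier_city": "Pune"},
    {"item_key": 3, "item_name": "Phone C",
     "supplier_name": "Zen", "supplier_city": "Delhi"},
]

item_table, supplier_table = snowflake_item_dim(star_item_dim)
print(item_table)
print(supplier_table)
```

After the split, each supplier's name and city are stored once, at the cost of one extra join whenever a query needs supplier attributes.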
11. How is a data warehouse different from a database? How are they similar?
12. List out the functions of OLAP servers in the data warehouse architecture.
The OLAP server performs multidimensional queries on the data and stores the results in its
multidimensional storage. It speeds up analysis by aggregating fact tables into cubes, stores
the cubes until needed, and then quickly returns the data to clients.
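The idea of aggregating a fact table into a cube ahead of time can be sketched as follows. This is a simplified illustration, not how any particular OLAP server is implemented; the dimensions (region, year), the measure name amount, and the function names build_cube and lookup are assumptions.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("region", "year")

facts = [
    {"region": "North", "year": 2023, "amount": 120.0},
    {"region": "North", "year": 2024, "amount": 80.0},
    {"region": "South", "year": 2023, "amount": 200.0},
    {"region": "South", "year": 2024, "amount": 50.0},
]


def build_cube(facts, dimensions, measure):
    """Precompute SUM(measure) for every subset of the dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for group in combinations(dimensions, size):
            totals = defaultdict(float)
            for fact in facts:
                key = tuple(fact[d] for d in group)
                totals[key] += fact[measure]
            cube[group] = dict(totals)
    return cube


def lookup(cube, group, key):
    """Answer a query from the stored aggregates without rescanning the facts."""
    return cube[group][key]


cube = build_cube(facts, DIMENSIONS, "amount")

grand_total = lookup(cube, (), ())
north_total = lookup(cube, ("region",), ("North",))
total_2023 = lookup(cube, ("year",), (2023,))
south_2024 = lookup(cube, ("region", "year"), ("South", 2024))

print(grand_total, north_total, total_2023, south_2024)
```

With two dimensions the cube holds four groupings: the grand total, totals by region, totals by year, and totals by region and year. Each client query then becomes a dictionary lookup instead of a scan of the fact table.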
1. Write in detail about the architecture and implementation of the data warehouse. (OR)
Diagrammatically illustrate and discuss the three-tier data warehousing architecture. (OR)
With a detailed diagram, describe the general architecture of a data warehouse. (OR)
Describe the data warehouse architecture with a neat diagram.
• Virtual Warehouse
• Data mart
• Enterprise Warehouse
Some data is denormalized for simplification and to improve performance. Large amounts of
historical data are used.
Queries often retrieve large amounts of data. Both planned and ad hoc queries are common.
3. Discuss the various types of warehouse schema with suitable example. (Nov/Dec’09) (OR)
Star Schema
Snowflake Schema
5. Enumerate the building blocks of a data warehouse. Explain the importance of metadata in
a data warehouse environment. What are the challenges in metadata management?