DWM Unit-1 Notes

A data warehouse is a system for collecting and managing business data from various sources to provide insights, characterized by being subject-oriented, integrated, time-variant, and non-volatile. It supports strategic decision-making, data consistency, and fast response times, while employing a multi-tier architecture comprising bottom, middle, and top tiers. The document also discusses the advantages and disadvantages of data warehouses, the importance of metadata repositories, and the ETL process for data integration.

Uploaded by

Gajanan Markad
Copyright
© All Rights Reserved

Unit-1: Introduction To Data Warehousing

Q.1] Define Data warehouse


Data Warehouse:
▪ Data warehousing (DW) is the process of collecting and managing
business data from varied sources to provide meaningful business insights.
▪ A data warehouse is an information system that contains historical and
cumulative data from single or multiple sources.
Q.2] Distinguish between Operational database system & Data Warehouse

Q.3] Characteristics of Data Warehousing

Fig. Characteristics of Data Warehouse


1. Subject-oriented
2. Integrated
3. Time Variant
4. Non-Volatile
1] Subject-oriented:
A data warehouse is subject-oriented, which means that
the data is organized around specific subjects, such as customers, products, or
sales. This gives easy access to the data relevant to a specific subject and
makes it possible to track that subject over time.
2] Integrated:
One of the key characteristics of a data warehouse is that it contains
integrated data. This means that the data is collected from various sources, such
as transactional systems, and then cleaned, transformed, and consolidated into a
single, unified view. This allows for easy access and analysis of the data, as well
as the ability to track data over time.
3] Time Variant:
A data warehouse is also time-variant, which means that the
data is stored with a time dimension. This allows for easy access to data for
specific time periods, such as last quarter or last year. This makes it possible to
track trends and patterns over time.
4] Non-Volatile:
A data warehouse is also non-volatile, which means that previous data is
not erased when new data is entered. Data is read-only and periodically
refreshed. This helps in analysing historical data and understanding what
happened and when. Record-level delete, update, and insert operations, which are
routine in an operational application environment, are not performed in a data
warehouse environment.
Only two types of data operations are performed in data
warehousing: data loading and data access.
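
The load-and-access-only rule can be illustrated with a small sketch. This is a minimal in-memory model, not a real warehouse product: the class name AppendOnlyStore, its methods, and the sample rows are all assumptions made for illustration. Rows can be loaded and read, but nothing updates or deletes them, so a changed value arrives as a new row and the old one stays as history.

```python
from datetime import date


class AppendOnlyStore:
    """Hypothetical in-memory stand-in for a non-volatile warehouse table."""

    def __init__(self):
        self._rows = []  # loaded rows are never modified or removed

    def load(self, rows, load_date):
        """Data loading: append a new batch, stamped with its load date."""
        for row in rows:
            self._rows.append({**row, "load_date": load_date})

    def access(self, predicate=lambda row: True):
        """Data access: return copies of matching rows (read-only)."""
        return [dict(row) for row in self._rows if predicate(row)]


store = AppendOnlyStore()
store.load([{"product": "P1", "price": 100}], date(2024, 1, 1))
# A price change arrives as a new row; the old row is kept as history.
store.load([{"product": "P1", "price": 120}], date(2024, 4, 1))

history = store.access(lambda r: r["product"] == "P1")
print([r["price"] for r in history])
```

Because each row carries its load date, the same structure also shows the time-variant property: both the old and the new price remain available for analysis.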
Q.4] Need for Data Warehousing

Need for Data Warehouse:


1) Business users: Business users need a data warehouse to view summarized
data from the past. Since these users are often non-technical, the data should be
presented to them in a simple form.
2) Store historical data: A data warehouse is required to store time-variant
data from the past, which can then be used for various purposes.
3) Make strategic decisions: Some strategies depend on the data in
the data warehouse, so the data warehouse contributes to strategic
decision-making.
4) For data consistency and quality: By bringing data from different sources to
a common place, the organization can enforce uniformity and
consistency in the data.
5) Fast response time: A data warehouse has to be ready for somewhat
unexpected loads and types of queries, which demands a significant degree of
flexibility and a quick response time.
Q.5] Multi-tiered Architecture of data warehouse
Multi-Tier Data Warehouse Architecture Components:
1. Bottom Tier
2. Middle Tier
3. Top Tier

1] Bottom Tier (Data sources and data storage):


▪ The bottom tier includes the databases or data sources.
▪ Data is collected from these sources into the data warehouse.
▪ The warehouse database server is usually a relational database system.

2] Middle Tier:
▪ The middle tier in Data warehouse is an OLAP server which is
implemented using either ROLAP or MOLAP model.
▪ This application tier presents an abstracted view of the database. This layer
also acts as a mediator between the end-user and the database.
▪ It includes summary data, raw data and metadata.
3] Top Tier:
▪ The top tier is a front-end client layer. It consists of the tools and APIs that
users work with to get useful data out of the data warehouse.
▪ These tools include query tools, reporting tools, managed query tools,
analysis tools and data mining tools.
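
The division of work between the three tiers can be sketched in a few lines. This is a simplified illustration, not a real OLAP server: an in-memory SQLite database stands in for the bottom tier, and the table name, function names and sample figures are assumptions.

```python
import sqlite3

# Bottom tier: a relational store (in-memory SQLite used as a stand-in).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", 2023, 100.0), ("North", 2024, 150.0), ("South", 2024, 80.0)],
)


def total_sales_by(dimension):
    """Middle tier: hides the SQL and returns summary data for one dimension."""
    if dimension not in ("region", "year"):  # whitelist of allowed column names
        raise ValueError(f"unknown dimension: {dimension}")
    query = (
        f"SELECT {dimension}, SUM(amount) FROM sales "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return dict(conn.execute(query).fetchall())


def print_report(dimension):
    """Top tier: a front-end reporting tool that only calls the middle tier."""
    for key, total in total_sales_by(dimension).items():
        print(f"{dimension}={key}: {total}")


print_report("region")
```

The reporting function never touches the table directly, which is the mediator role the notes assign to the middle tier.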
Q.6] Advantages and Disadvantages of Data Warehouse:
Advantages of Data Warehouse:
1. Better decision-making with consolidated data.
2. Faster queries due to optimized storage.
3. Consistent data through ETL processes.
4. Time-saving with centralized access.
5. Helps with trend forecasting.
Disadvantages of Data Warehouse:
1. High setup and maintenance costs.
2. Complex data integration from multiple sources.
3. Data latency due to periodic updates.
4. Scalability issues with growing data volumes.
5. Requires specialized knowledge and skills.
Q.7] Explain Metadata repository.
Metadata Repository:
(Metadata: data about data, repository: big container)
▪ The metadata repository is responsible for physically storing and
categorizing metadata. The data in the metadata repository should be
generic, integrated, current and historical.
▪ Metadata is information about the structures that contain the actual
data.
▪ Metadata may describe the structure of any data, of any subject, stored
in any format.
▪ The metadata repository holds the structures of all data in one place, which
provides more than enough information for decision-making.
▪ The metadata repository is used for building, maintaining and managing the
data warehouse.
Concept example: a line in a sales database may contain: 4030 KJ732 299.90. This
data is meaningless until we consult the metadata that tells us what each value is.
The metadata for this line is:
• Model number: 4030
• Sales agent ID: KJ732
• Total sales amount: $299.90
▪ Therefore, Metadata are essential ingredients in the transformation of data
into knowledge.
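
The concept example above can be expressed as a short sketch. The field names, the chosen types, and the function name interpret are assumptions for illustration; the raw line is the one from the example.

```python
# Hypothetical metadata: field name and type for each position in the record.
SALES_RECORD_METADATA = [
    ("model_number", int),
    ("sales_agent_id", str),
    ("total_sales_amount", float),
]


def interpret(raw_line, metadata):
    """Attach meaning to a raw record by pairing each value with its metadata."""
    values = raw_line.split()
    if len(values) != len(metadata):
        raise ValueError("record does not match metadata")
    return {name: cast(value) for (name, cast), value in zip(metadata, values)}


record = interpret("4030 KJ732 299.90", SALES_RECORD_METADATA)
print(record)
```

Without the metadata list the three values are just text; with it, each value gets a name and a type that later queries can rely on.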
Example: Metadata of a Book Store:
1. Name of book
2. Summary of book
3. Publication of book
4. Edition of book
5. Author of book
6. Date of publication
7. Availability of book
8. Reviews of book
The above information (metadata) helps users to search for a book, access it, etc.
Advantages of Metadata Repository:
1. Centralizes and simplifies metadata management.
2. Ensures data consistency.
3. Supports better decision-making.
4. Enhances data governance.
5. Tracks data lineage for quality.
6. Eases data integration.
Disadvantages of Metadata Repository:
1. Complex to set up and maintain.
2. High initial costs for implementation.
3. Can become overloaded with data.
4. Security risks if unprotected.
5. Requires ongoing updates.
6. Needs skilled personnel.
Q.8] Describe Extraction, Transformation and Loading.
ETL:
ETL stands for Extract, Transform, and Load. It is a data integration
process that cleans, combines and organizes data from multiple sources into
one consistent store, such as a data warehouse, data lake or other
similar system.

Fig. ETL Process


1] Extraction:
▪ The first step of the ETL process is extraction.
▪ In this step, data is extracted from various source systems, which can be in
various formats such as relational databases, NoSQL, XML and flat files, into
the staging area.
▪ Raw source data cannot be loaded directly into the data warehouse;
therefore, this is one of the most important steps of the ETL process.
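
The extraction step can be sketched as follows. The two sources here are small inline strings standing in for a flat file and a JSON export; the sample rows, the function names and the use of a plain list as the staging area are all assumptions for illustration.

```python
import csv
import io
import json

# Hypothetical source data; in practice these would be files or database exports.
CSV_SOURCE = "id,name,country\n1,Asha,U.S.A\n2,Ravi,India\n"
JSON_SOURCE = '[{"id": 3, "name": "Meera", "country": "United States"}]'


def extract_csv(text):
    """Read a flat-file source into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Read a JSON source into a list of dictionaries."""
    return json.loads(text)


# The staging area holds the raw rows, tagged with their origin, untransformed.
staging_area = []
for source_name, rows in (
    ("csv", extract_csv(CSV_SOURCE)),
    ("json", extract_json(JSON_SOURCE)),
):
    for row in rows:
        staging_area.append({"source": source_name, **row})

print(len(staging_area))
```

Note that the staged rows are not yet consistent: the CSV source yields the id as text while the JSON source yields a number, and the country is spelled differently. Resolving this is the job of the transformation step.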
2] Transformation:

The second step of the ETL process is transformation. In this step, a set of
rules or functions are applied on the extracted data to convert it into a single
standard format.

It may involve following processes/tasks:


1. Filtering – loading only certain attributes into the data
warehouse.
2. Cleaning – filling up the NULL values with some default values,
mapping U.S.A, United States, and America into USA, etc.
3. Joining – joining multiple attributes into one.
4. Splitting – splitting a single attribute into multiple attributes.
5. Sorting – sorting tuples on the basis of some attribute (generally
key-attribute).
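
The five tasks listed above can be sketched on a few sample rows. The field names, the sample values and the default value "UNKNOWN" are assumptions for illustration; the country mapping follows the cleaning example in the list.

```python
# Hypothetical extracted rows; field names are assumptions for illustration.
rows = [
    {"id": 2, "first": "Ravi", "last": "Kumar", "country": "India",
     "joined": "2023-05-10", "note": "x"},
    {"id": 1, "first": "Asha", "last": "Patil", "country": "U.S.A",
     "joined": "2022-11-02", "note": "y"},
    {"id": 3, "first": "Meera", "last": "Shah", "country": None,
     "joined": "2024-01-15", "note": "z"},
]

COUNTRY_MAP = {"U.S.A": "USA", "United States": "USA", "America": "USA"}


def transform(row):
    # Cleaning: replace NULL with a default, then map variants to one value.
    country = row["country"] or "UNKNOWN"
    country = COUNTRY_MAP.get(country, country)
    # Splitting: one date attribute becomes year and month.
    year, month, _day = row["joined"].split("-")
    return {
        "id": row["id"],
        # Joining: two attributes become one.
        "full_name": f"{row['first']} {row['last']}",
        "country": country,
        "join_year": int(year),
        "join_month": int(month),
        # Filtering: the "note" attribute is deliberately not carried over.
    }


# Sorting: order the transformed tuples by the key attribute.
transformed = sorted((transform(r) for r in rows), key=lambda r: r["id"])
for r in transformed:
    print(r)
```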
3] Loading:
The third and final step of the ETL process is loading. In this step, the
transformed data is finally loaded into the data warehouse. In some systems
data is loaded into the data warehouse very frequently, and in
others it is loaded at longer but regular intervals. The rate and period of
loading depend solely on the requirements and vary from system to
system.
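
A periodic load can be sketched with Python's built-in sqlite3 module. An in-memory SQLite database stands in for the warehouse, and the table name customer_dim, the batch labels and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse database
conn.execute(
    "CREATE TABLE customer_dim "
    "(id INTEGER, full_name TEXT, country TEXT, load_batch TEXT)"
)


def load(batch_label, rows):
    """Append one batch of transformed rows; returns the number of rows loaded."""
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT INTO customer_dim VALUES (:id, :full_name, :country, :batch)",
            [{**row, "batch": batch_label} for row in rows],
        )
    return len(rows)


# Two loads at different times, e.g. a nightly schedule (labels are illustrative).
load("2024-01-01", [{"id": 1, "full_name": "Asha Patil", "country": "USA"}])
load("2024-01-02", [{"id": 2, "full_name": "Ravi Kumar", "country": "India"}])

count = conn.execute("SELECT COUNT(*) FROM customer_dim").fetchone()[0]
print(count)
```

Each call appends a batch rather than overwriting earlier rows; how often the function is called is the scheduling decision that the paragraph above says varies from system to system.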

Q.9] Differentiate between Data Warehouse & Data Mart

Q.10] List and explain data warehouse models with suitable examples
Data warehouse models:
1) Enterprise Data Warehouse
2) Data mart
3) Virtual Warehouse
1] Enterprise Data Warehouse:
▪ Enterprise Data Warehouse is a centralized warehouse, which aggregates
the information or data automatically.
▪ It offers a unified approach for organizing and representing data. It also
provides the ability to classify data according to the subject and give access
accordingly to users.
▪ It provides decision support service across the enterprise.
Example: All Polytechnic data available at MSBTE
2] Data Marts:
▪ A data mart is a subset of the data warehouse.
▪ It is specially designed for a particular line of business, such as sales
or finance. In an independent data mart, data can be collected
directly from sources.
▪ Due to the large amount of data, a single warehouse can become overburdened.
So, to prevent the warehouse from becoming impossible to navigate,
subdivisions called data marts are created.
▪ These data marts divide the information saved in the warehouse into
categories for specific groups of users.
▪ In simple words, a data mart is a subsidiary of a data warehouse.
Example: Of the five regions of MSBTE, one region may be treated as a data mart.
3] Virtual Warehouse:
▪ A set of views over operational databases is known as a virtual
warehouse.
▪ A virtual warehouse behaves like a separate business database that
exposes only the data required from the operational system.
▪ The data seen through a virtual warehouse usually comes from multiple
sources throughout the operational system.
▪ Virtual warehouse is used to search the data quickly and without accessing
the entire system. It speeds up the overall access process.
Example: It may contain the data of only one or two polytechnics.
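
The three models can be contrasted in a small sketch using Python's built-in sqlite3 module. The table and view names, the region and institute labels, and the figures are all hypothetical; the virtual warehouse is modelled as a database view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Enterprise data warehouse: one central table covering every region.
conn.execute("CREATE TABLE edw_results (region TEXT, institute TEXT, passed INTEGER)")
conn.executemany(
    "INSERT INTO edw_results VALUES (?, ?, ?)",
    [("Pune", "Poly A", 120), ("Pune", "Poly B", 95), ("Mumbai", "Poly C", 140)],
)

# Data mart: a physically stored subset for one region.
conn.execute(
    "CREATE TABLE mart_pune AS SELECT * FROM edw_results WHERE region = 'Pune'"
)

# Virtual warehouse: a view, so no rows are stored; it is computed on each query.
conn.execute(
    "CREATE VIEW v_poly_a AS SELECT * FROM edw_results WHERE institute = 'Poly A'"
)

mart_rows = conn.execute("SELECT COUNT(*) FROM mart_pune").fetchone()[0]
view_rows = conn.execute("SELECT institute, passed FROM v_poly_a").fetchall()
print(mart_rows, view_rows)
```

The design difference is storage: the data mart holds its own copy of the subset, while the view stores nothing and always reflects the underlying table.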
Q.11] State any four Benefits of Data warehouse.
1. Delivers enhanced business intelligence:
By having access to
information from various sources in a single platform, decision makers will
no longer need to rely on limited data.
2. Saves time:
A data warehouse standardizes, preserves, and stores data
from different sources and integrates all of it in one place. So, all
critical data is available to all users simultaneously.
3. Enhances data quality and consistency:
A data warehouse converts data
from multiple sources into a consistent format. The data from different
sources can be filtered, sorted, cleaned. This will lead to more accurate
data, which will become the basis for solid decisions.
4. Generates a high Return on Investment (ROI):
Companies that invest in a data warehouse tend to experience
higher revenues and cost savings than those that have not invested in a data
warehouse.
5. Provides competitive advantage:
Data warehouses help companies get a holistic
(whole rather than partial) view of their current standing and evaluate
opportunities and risks, thus providing them with a competitive
advantage.
6. Improves the decision-making process:
Data warehousing provides
better insights (detailed understanding) to decision makers by maintaining
a related database of current and historical data.
Q.12] Applications of Data Warehouse
Applications of Data Warehouse:
1. Airlines
2. Banking
3. Healthcare
4. Public Sector
5. Telecommunication
6. Investment