
Data Warehouse Unit 1

The document provides a comprehensive overview of data warehouses, detailing their history, purpose, characteristics, benefits, and use cases across various industries. It explains the distinction between operational databases and data warehouses, emphasizing the importance of historical data for strategic decision-making. Additionally, it outlines the functions of data warehouse tools and utilities, as well as the advantages and disadvantages of implementing data warehouses.

Uploaded by asmalubnashaikh
Copyright © All Rights Reserved

Data Warehouse

History of Data Warehouse


The idea of data warehousing dates back to the late 1980s, when IBM researchers Barry Devlin and Paul Murphy introduced the concept of the "business data warehouse."
In essence, the data warehousing concept was intended to provide an architectural model for the flow of data from operational systems to decision support environments. It attempted to address the various problems associated with this flow, mainly the high costs involved.
In the absence of a data warehousing architecture, a vast amount of storage space was required to support multiple decision support environments. In large corporations, it was common for various decision support environments to operate independently.

What is a Data Warehouse?


A Data Warehouse (DW) is a relational database that is designed for query and analysis rather than transaction processing. It includes historical data derived from transaction data from single or multiple sources.
A Data Warehouse provides integrated, enterprise-wide, historical data and focuses on
providing support for decision-makers for data modeling and analysis.
A Data Warehouse is a group of data specific to the entire organization, not only to a
particular group of users.
It is not used for daily operations and transaction processing but used for making decisions.
A Data Warehouse is separate from the DBMS. It stores a huge amount of data, typically collected from multiple heterogeneous sources such as files and DBMSs. The goal is to produce statistical results that can help in decision-making. For example, a college might want to quickly see how the placement of CS students has improved over the last 10 years, in terms of salaries, number of students placed, etc.
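The college example boils down to grouping historical records by year and computing summary statistics. A minimal Python sketch, assuming a hypothetical `placements` list with `year` and `salary` fields (the values are made up for illustration, not real placement data):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical placement records for CS students; salaries are in
# illustrative units and do not come from any real data set.
placements = [
    {"year": 2015, "student": "A", "salary": 4.0},
    {"year": 2015, "student": "B", "salary": 5.0},
    {"year": 2020, "student": "C", "salary": 6.0},
    {"year": 2020, "student": "D", "salary": 8.0},
    {"year": 2020, "student": "E", "salary": 7.0},
]

def placement_summary(records):
    """Return {year: (number placed, average salary)}, ordered by year."""
    salaries_by_year = defaultdict(list)
    for record in records:
        salaries_by_year[record["year"]].append(record["salary"])
    return {
        year: (len(salaries), mean(salaries))
        for year, salaries in sorted(salaries_by_year.items())
    }

for year, (count, avg) in placement_summary(placements).items():
    print(year, count, avg)
```

A real warehouse would run the same kind of grouping as a SQL `GROUP BY` over years of stored history; the in-memory list here only stands in for that table.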
Need For Data Warehouse
1) Business User: Business users require a data warehouse to view summarized data
from the past. Since these people are non-technical, the data may be presented to them
in an elementary form.
2) Store historical data: A data warehouse is required to store time-variant data from the past, which can then be used for various purposes.
3) Make strategic decisions: Some strategies may depend on the data in the data warehouse, so the data warehouse contributes to making strategic decisions.
4) For data consistency and quality: By bringing data from different sources to a common place, users can effectively bring uniformity and consistency to the data.
5) Fast response time: A data warehouse has to be ready for somewhat unexpected loads and types of queries, which demands a significant degree of flexibility and quick response time.

Benefits of Data Warehouse


Understand business trends and make better forecasting decisions.
The structure of data warehouses is more accessible for end-users to navigate, understand,
and query.
Queries that would be complex in many normalized databases could be easier to build and
maintain in data warehouses. Data warehousing is an efficient method to manage demand for
lots of information from lots of users.
Data warehousing provides the capability to analyze large amounts of historical data.

Understanding a Data Warehouse


A data warehouse is a database, which is kept separate from the organization's operational
database.
There is no frequent updating done in a data warehouse.
It possesses consolidated historical data, which helps the organization to analyze its business.
A data warehouse helps executives organize, understand, and use their data to make strategic decisions.
Data warehouse systems help integrate a diverse range of application systems.
A data warehouse system helps in consolidated historical data analysis.
Why Is a Data Warehouse Kept Separate from Operational Databases?
An operational database is constructed for well-known tasks and workloads such as searching for particular records, indexing, etc. In contrast, data warehouse queries are often complex and present data in a general, summarized form.
Operational databases support concurrent processing of multiple transactions. Concurrency
control and recovery mechanisms are required for operational databases to ensure robustness
and consistency of the database.
An operational database query allows both read and modify operations, while an OLAP query needs only read-only access to stored data.
An operational database maintains current data. On the other hand, a data warehouse
maintains historical data.
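The contrast between the two kinds of query can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, assuming two hypothetical in-memory databases: an operational one holding only current balances and a separate warehouse that receives dated, append-only snapshots. Table names, dates, and values are invented for the example.

```python
import sqlite3

# Two in-memory databases: an operational (OLTP) store holding current data
# and a separate warehouse holding dated snapshots. Names are hypothetical.
oltp = sqlite3.connect(":memory:")
dw = sqlite3.connect(":memory:")

oltp.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
with oltp:
    oltp.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
dw.execute(
    "CREATE TABLE balance_history (snapshot_date TEXT, account_id INTEGER, balance REAL)"
)

def snapshot(date):
    """Copy the current operational state into the warehouse (append-only)."""
    rows = oltp.execute("SELECT id, balance FROM accounts").fetchall()
    with dw:
        dw.executemany(
            "INSERT INTO balance_history VALUES (?, ?, ?)",
            [(date, acc_id, bal) for acc_id, bal in rows],
        )

snapshot("2024-01-31")

# Operational query: reads and modifies specific records in one transaction.
# "with oltp:" commits on success and rolls back if an exception is raised.
with oltp:
    oltp.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
    oltp.execute("UPDATE accounts SET balance = balance + 30 WHERE id = 2")

snapshot("2024-02-29")

# The operational database keeps only the current state ...
current = oltp.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
# ... while the warehouse answers read-only questions about history.
history = dw.execute(
    "SELECT snapshot_date, balance FROM balance_history "
    "WHERE account_id = 1 ORDER BY snapshot_date"
).fetchall()
print(current)
print(history)
```

The operational side needs transactions (commit/rollback) because it modifies data; the warehouse query only reads, and it can still see the January balance of account 1 even though the operational table no longer holds it.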
Data warehouse use cases
Data warehouses have many different business applications. Their use cases may depend on
the industry they're used in. The following offers two examples:
• Health care: A data warehouse may hold patient information that health care
professionals can use to understand certain conditions or evaluate treatment methods.
For example, a health care data scientist may analyze the information in a data
warehouse to determine how often cancer patients over 25 receive chemotherapy
rather than radiation treatment and why.
• Marketing: A marketing firm may use a data warehouse to track the success of a
campaign or product launch. An organization can create and share dashboards and
reports to gauge performance, sales, and customer service interactions.

Difference Between Database System and Data Warehouse

Database System | Data Warehouse
----------------|---------------
Supports operational processes. | Supports analysis and performance reporting.
Captures and maintains the data. | Explores the data.
Current data. | Multiple years of history.
Data is balanced within the scope of this one system. | Data must be integrated and balanced from multiple systems.
Data is updated when a transaction occurs. | Data is updated by scheduled processes.
Data verification occurs when entry is done. | Data verification occurs after the fact.
100 MB to GBs. | 100 GB to TBs.
ER based. | Star/Snowflake.
Application oriented. | Subject oriented.
Primitive and highly detailed. | Summarized and consolidated.
Flat relational. | Multidimensional.
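The "Star/Snowflake" row refers to how warehouse tables are typically laid out: a central fact table of measures (such as sale amounts) joined to surrounding dimension tables (such as product and date). A minimal sketch of a star schema in SQLite; every table, column, and row here is hypothetical:

```python
import sqlite3

# A tiny star schema: one fact table surrounded by dimension tables.
# Table names, columns, and rows are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales  (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    amount     REAL
);
INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Toys');
INSERT INTO dim_date    VALUES (10, 2023), (11, 2024);
INSERT INTO fact_sales  VALUES (1, 10, 20.0), (2, 10, 15.0),
                               (1, 11, 25.0), (1, 11, 5.0), (2, 11, 40.0);
""")

# Typical analytical query: join the fact table to its dimensions,
# then aggregate the measure by dimension attributes.
rows = db.execute("""
    SELECT d.year, p.category, SUM(f.amount) AS total
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d    ON d.date_id = f.date_id
    GROUP BY d.year, p.category
    ORDER BY d.year, p.category
""").fetchall()
for row in rows:
    print(row)
```

Every analytical question follows the same shape (join facts to dimensions, then group), which is part of why such queries are easier to build than over a heavily normalized ER design. In a snowflake schema, a dimension such as `dim_product` would itself be normalized further, for example by moving `category` into its own table.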

Characteristics and Functions of Data Warehouse


1. Subject Oriented: Focuses on a specific area or subject such as sales, customers, or
inventory.
2. Integrated: Integrates data from multiple sources into a single, consistent format.
3. Read-Optimized: Designed for fast querying and analysis, with indexing and
aggregations to support reporting.
4. Summary Data: Data is summarized and aggregated for faster querying and analysis.
5. Historical Data: Stores large amounts of historical data, making it possible to
analyze trends and patterns over time.
6. Schema-on-Write: Data is transformed and structured according to a predefined
schema before it is loaded into the data warehouse.
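Schema-on-write (point 6) means each incoming record is transformed and checked against the predefined schema before it is stored, so malformed data is rejected at load time. A minimal sketch, assuming a hypothetical `SaleRow` schema and raw records with `date`, `customer`, and `amount` fields:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical predefined warehouse schema for one sales record.
@dataclass(frozen=True)
class SaleRow:
    sale_date: date
    customer_id: int
    amount: float

def to_schema(raw: dict) -> SaleRow:
    """Transform and validate a raw source record before it is loaded.

    Raises KeyError or ValueError if the record cannot be made to fit
    the schema, so bad data is rejected at write time.
    """
    return SaleRow(
        sale_date=date.fromisoformat(raw["date"]),
        customer_id=int(raw["customer"]),
        amount=round(float(raw["amount"]), 2),
    )

warehouse: list[SaleRow] = []
rejected = []
raw_records = [
    {"date": "2024-03-01", "customer": "42", "amount": "19.999"},
    {"date": "not-a-date", "customer": "7", "amount": "5"},
]
for raw in raw_records:
    try:
        warehouse.append(to_schema(raw))
    except (KeyError, ValueError) as err:
        rejected.append((raw, str(err)))

print(warehouse)
print(len(rejected))
```

Only the first record is loaded; the second fails date parsing and is set aside for review instead of entering the warehouse.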
Subject-Oriented
A data warehouse targets the modeling and analysis of data for decision-makers. Therefore, data warehouses typically provide a concise and straightforward view of a particular subject, such as customer, product, or sales, instead of the organization's ongoing global operations. This is done by excluding data that are not useful for the subject and including all data needed by the users to understand the subject.
Integrated
A data warehouse integrates various heterogeneous data sources like RDBMS, flat files, and
online transaction records. It requires performing data cleaning and integration during data
warehousing to ensure consistency in naming conventions, attribute types, etc., among
different data sources.
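The cleaning and integration step can be pictured as mapping each source's field names, codes, and formats onto one agreed convention. A minimal sketch, assuming two hypothetical sources (a CRM export and a web sign-up table) whose field names and formats are invented for the example:

```python
# Two hypothetical sources describing customers with different field names,
# codes, and date formats. The mappings below are illustrative assumptions.
crm_rows = [{"CustID": "001", "Gender": "M", "Joined": "2023-05-01"}]
web_rows = [{"customer_id": 2, "sex": "female", "signup": "01/06/2023"}]

# One agreed coding for gender in the warehouse.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

def from_crm(row):
    return {
        "customer_id": int(row["CustID"]),          # "001" -> 1
        "gender": GENDER_CODES[row["Gender"].lower()],
        "joined": row["Joined"],                    # already yyyy-mm-dd
    }

def from_web(row):
    day, month, year = row["signup"].split("/")     # source uses dd/mm/yyyy
    return {
        "customer_id": int(row["customer_id"]),
        "gender": GENDER_CODES[row["sex"].lower()],
        "joined": f"{year}-{month}-{day}",
    }

integrated = [from_crm(r) for r in crm_rows] + [from_web(r) for r in web_rows]
for row in integrated:
    print(row)
```

After integration, both sources use the same column names, the same gender codes, and the same date format, so they can be queried together.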
Time-Variant
Historical information is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even earlier from a data warehouse. This contrasts with a transaction system, where often only the most current data is kept.
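Because every record keeps its date, the same data can answer "last 3 months", "last 6 months", and "last 12 months" questions. A minimal sketch over a hypothetical list of dated sales totals; the month arithmetic is a simplification that clamps the day to 28:

```python
from datetime import date

# Hypothetical dated sales totals kept in the warehouse; a transaction
# system would typically hold only the latest figure.
dated_sales = [
    (date(2023, 6, 15), 120),
    (date(2023, 11, 2), 90),
    (date(2024, 2, 20), 150),
    (date(2024, 4, 28), 60),
]

def months_back(d: date, months: int) -> date:
    """Shift a date back by whole months, clamping the day to 28 so the
    result is always a valid date."""
    total = d.year * 12 + (d.month - 1) - months
    return date(total // 12, total % 12 + 1, min(d.day, 28))

def total_since(records, as_of: date, months: int) -> int:
    """Sum the values recorded in the last `months` months up to `as_of`."""
    start = months_back(as_of, months)
    return sum(value for day, value in records if start <= day <= as_of)

as_of = date(2024, 5, 1)
for months in (3, 6, 12):
    print(months, total_since(dated_sales, as_of, months))
```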
Non-Volatile
The data warehouse is a physically separate data store, populated with data transformed from the source operational RDBMS. Operational updates of data do not occur in the data warehouse, i.e., update, insert, and delete operations are not performed. It usually requires only two procedures for data access: the initial loading of data and access to data. Therefore, the DW does not require transaction processing, recovery, and concurrency capabilities, which allows for a substantial speedup of data retrieval. Non-volatile means that once data has entered the warehouse, it should not change.
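The two access procedures above (loading and reading) can be sketched as a class that simply offers no way to update or delete stored rows. This is a hypothetical in-memory illustration of the idea, not how any particular warehouse product enforces it:

```python
from typing import Callable, Iterable

class NonVolatileStore:
    """Hypothetical warehouse table supporting only the two access
    procedures described above: loading data and reading it. It
    deliberately has no update or delete methods."""

    def __init__(self) -> None:
        self._rows: list[tuple] = []

    def load(self, rows: Iterable[tuple]) -> None:
        # Loading only appends; rows already in the store are never modified.
        self._rows.extend(tuple(r) for r in rows)

    def query(self, predicate: Callable[[tuple], bool] = lambda row: True) -> list[tuple]:
        # Reads return a new list of tuples, so callers cannot add,
        # remove, or replace stored rows through the result.
        return [r for r in self._rows if predicate(r)]

store = NonVolatileStore()
store.load([("2024-01", "Books", 20.0)])                                 # initial load
store.load([("2024-02", "Books", 30.0), ("2024-02", "Toys", 40.0)])      # later load
print(store.query(lambda row: row[1] == "Books"))
```

Since nothing modifies existing rows, reads need no locking or rollback logic, which mirrors the speedup argument in the paragraph above.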
Optimized for Querying and Analysis
People can quickly and precisely analyze data with the help of a data warehouse. It is designed to make discovering and analyzing information easy. This is accomplished via strategies such as indexing, partitioning, and aggregation, which allow the data warehouse to search massive amounts of data quickly and return the required information.
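The three strategies named above can be illustrated in a few lines of plain Python. This is a toy sketch over hypothetical `(month, region, amount)` rows; real warehouses implement these inside the storage engine:

```python
from collections import defaultdict

# Hypothetical sales rows: (month, region, amount).
sales = [
    ("2024-01", "North", 100.0),
    ("2024-01", "South", 80.0),
    ("2024-02", "North", 120.0),
    ("2024-02", "North", 30.0),
    ("2024-02", "South", 50.0),
]

# Partitioning: group rows by month so a query for one month
# scans only that month's partition instead of the whole table.
partitions = defaultdict(list)
for row in sales:
    partitions[row[0]].append(row)

# Aggregation: precompute totals per (month, region) once, at load time,
# so repeated reports become dictionary lookups.
totals = defaultdict(float)
for month, region, amount in sales:
    totals[(month, region)] += amount

# Indexing: map each region to the positions of its rows for direct access.
region_index = defaultdict(list)
for pos, (_, region, _) in enumerate(sales):
    region_index[region].append(pos)

print(len(partitions["2024-02"]))
print(totals[("2024-02", "North")])
print([sales[i] for i in region_index["South"]])
```

The extra work happens once when data is loaded; each later query touches only a partition, a precomputed total, or the indexed rows.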
Designed for Decision Support
A data warehouse is designed to help decision-making (reporting, analysis, and data mining). It provides a centralized and consistent view of the data, which makes it easier to identify trends, patterns, and anomalies.

Advantages and Disadvantages of a Data Warehouse


Advantages
• Improved Decision-Making: By consolidating data from various sources, data
warehouses provide a comprehensive view of the business, enabling more informed
and strategic decision-making.
• Enhanced Data Quality and Consistency: The processes of cleansing and
transforming data for the warehouse improve data quality and ensure consistency
across the organization.
• Historical Intelligence: Data warehouses store historical data, enabling trend
analysis, forecasting, and strategic planning based on past performance.
• Time Savings: They provide quick access to relevant data, significantly reducing the
time needed for data retrieval and analysis.
• Increased Productivity: With easy access to data and analytical tools, employees can
focus more on analysis rather than data collection, boosting productivity.
• Competitive Advantage: The insights gained from a data warehouse can provide a
significant competitive edge by identifying market trends, customer behavior patterns,
and operational efficiencies.
Disadvantages
• High Costs: The initial setup, maintenance, and operation of data warehouses can be
expensive, requiring investment in hardware, software, and skilled personnel.
• Time-Consuming Implementation: Designing and implementing a data warehouse
can be a complex and lengthy process, requiring careful planning and execution.
• Complexity: Managing a data warehouse requires specialized skills and expertise,
which can be challenging to find and retain.
• Inflexibility: Traditional data warehouses can be inflexible when it comes to handling
unstructured data or adapting to rapidly changing business needs.

Some real-world examples of how data warehouses are used across various industries:
1. Retail
• Analyzing Customer Behavior: Retailers use data warehouses to analyze customer purchase history, browsing patterns, demographics, and preferences. This helps them understand customer segments, personalize marketing campaigns, and optimize product assortment.
• Inventory Management: By integrating data from point-of-sale systems, warehouses, and suppliers, retailers can optimize inventory levels, predict demand, and minimize stockouts or overstocking.
• Sales Performance Tracking: Data warehouses enable retailers to track sales performance across different channels, regions, and product categories, identifying top-selling items, underperforming products, and seasonal trends.

2. Healthcare
• Patient Care: Hospitals and healthcare providers use data warehouses to consolidate
patient medical records, track treatment outcomes, and identify trends in diseases and
patient populations. This helps improve patient care, optimize treatment protocols,
and conduct research.
• Operational Efficiency: Data warehouses help healthcare organizations manage
resources, optimize staffing levels, and improve operational efficiency.
• Public Health: Public health agencies use data warehouses to track disease outbreaks,
monitor public health trends, and develop effective interventions and prevention
programs.

Amazon Redshift is a cloud-based data warehouse service that allows users to analyze large amounts of data efficiently:
• Features: Amazon Redshift is a fully managed service that supports petabyte-scale data sets. It offers fast query performance and integrates with business intelligence (BI), reporting, data, and analytics tools.

Functions of Data Warehouse Tools and Utilities


The following are the functions of data warehouse tools and utilities −
• Data Extraction − Involves gathering data from multiple heterogeneous sources.
• Data Cleaning − Involves finding and correcting the errors in data.
• Data Transformation − Involves converting the data from legacy format to
warehouse format.
• Data Loading − Involves sorting, summarizing, consolidating, checking integrity,
and building indices and partitions.
• Refreshing − Involves updating the warehouse with changes from the data sources.
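The five functions above can be strung together as a small pipeline. A minimal sketch using Python's `sqlite3` as a stand-in warehouse; the sources, field names, and the replace-on-refresh choice are all assumptions for illustration (a strictly non-volatile design might append dated rows instead of replacing them):

```python
import sqlite3

# Hypothetical source extracts: CSV-style lines from one system and
# dict records from another. All names and values are illustrative.
SOURCE_A = ["1,alice,  120.5", "2,BOB,abc", "3,carol,80"]
SOURCE_B = [{"id": 4, "name": "Dave", "spend": 40}]

def extract():
    """Data extraction: gather rows from heterogeneous sources into dicts."""
    rows = [dict(zip(("id", "name", "spend"), line.split(","))) for line in SOURCE_A]
    return rows + list(SOURCE_B)

def clean(rows):
    """Data cleaning: drop rows whose spend value is not numeric."""
    cleaned = []
    for row in rows:
        try:
            float(row["spend"])
        except ValueError:
            continue  # e.g. "abc" in source A
        cleaned.append(row)
    return cleaned

def transform(rows):
    """Data transformation: convert to the warehouse format
    (typed tuples, trimmed and title-cased names)."""
    return [(int(r["id"]), str(r["name"]).strip().title(), float(r["spend"])) for r in rows]

def load(conn, rows):
    """Data loading: sort the rows, key them by id, and build an index."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, name TEXT, spend REAL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_customers_spend ON customers(spend)")
    with conn:
        # The primary key keeps one row per id; INSERT OR REPLACE overwrites
        # a row with the same id instead of failing or duplicating it.
        conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", sorted(rows))

def refresh(conn):
    """Refreshing: re-run the whole pipeline against the current sources."""
    load(conn, transform(clean(extract())))

conn = sqlite3.connect(":memory:")
refresh(conn)
refresh(conn)  # a second refresh does not create duplicate rows
print(conn.execute("SELECT * FROM customers ORDER BY id").fetchall())
```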

How Data Warehouse Works


Data Warehousing integrates data and information collected from various sources into one
comprehensive database. For example, a data warehouse might combine customer
information from an organization’s point-of-sale systems, its mailing lists, website, and
comment cards. It might also incorporate confidential information about employees, salary
information, etc. Businesses use this consolidated data to analyze their customers.
Data mining is one of the uses of a data warehouse: it involves looking for meaningful patterns in vast volumes of data and devising innovative strategies for increased sales and profits.
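As one simple example of pattern-finding, the sketch below counts pairs of items that are frequently bought together, which is roughly the first step of association-rule mining (as in the Apriori algorithm). The technique is chosen here for illustration and is not prescribed by these notes; the baskets are hypothetical:

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets drawn from the warehouse.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"milk", "bread", "chips"},
]

def frequent_pairs(baskets, min_support):
    """Count item pairs bought together and keep those that appear in at
    least `min_support` baskets."""
    counts = Counter()
    for basket in baskets:
        # Sorting makes ("bread", "milk") and ("milk", "bread") the same key.
        counts.update(combinations(sorted(basket), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}

print(frequent_pairs(baskets, min_support=2))
```

A pattern such as "bread and milk are often bought together" could then inform a strategy like product placement or bundled offers.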
