Data Warehousing

Data warehousing involves collecting and managing large volumes of operational data from various sources into a centralized system for analysis. Key characteristics include being subject-oriented, integrated, time-variant, and non-volatile, while components include source data, data staging, storage, information delivery, and management. Advantages include improved decision-making and data quality, while disadvantages encompass high setup costs and complex data integration.

Uploaded by

sahiljamwal2720

Data Warehousing

Data warehousing is the process of collecting, storing, and managing large volumes of data from
different sources into a centralized system, known as a data warehouse.

A data warehouse is a store of a large amount of operational data (data that documents the everyday
operations of an organisation) gathered from multiple sources and stored under a unified schema at a
single site.

Characteristics of a Data Warehouse

Subject-Oriented: An operational database represents a process, such as payroll or accounting. A
data warehouse, on the other hand, is used to analyze a particular subject area, for example “sales”.
“Sales” may in turn have several dimensions.

Integrated: The original data held in different source systems is not integrated. A data warehouse
integrates data from these multiple sources. For example, a customer may be identified by two
different keys in two source systems. The data warehouse must be able to integrate the two
source systems and identify each customer by a single key.
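The key-integration example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source names (crm, billing), the field names, and the cross-reference table key_map are all hypothetical and do not come from any particular product.

```python
# Hypothetical source records: the same customer appears under different keys.
crm_customers = [{"crm_id": "C-17", "name": "Asha Rao"}]
billing_customers = [{"account_no": 9041, "name": "Asha Rao", "balance": 120.0}]

# Hypothetical cross-reference table kept by the warehouse:
# (source system, source key) -> single warehouse key.
key_map = {
    ("crm", "C-17"): 1,
    ("billing", 9041): 1,
}

def integrate(crm_rows, billing_rows, key_map):
    """Merge rows from both sources under one warehouse customer key."""
    merged = {}
    for row in crm_rows:
        wk = key_map[("crm", row["crm_id"])]
        merged.setdefault(wk, {"customer_key": wk}).update(name=row["name"])
    for row in billing_rows:
        wk = key_map[("billing", row["account_no"])]
        merged.setdefault(wk, {"customer_key": wk}).update(balance=row["balance"])
    return list(merged.values())

customers = integrate(crm_customers, billing_customers, key_map)
```

After integration, both source records are represented by one row identified by customer_key.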

Time-Variant: Operational data represents only the current state, whereas a data warehouse keeps
the historical data as well. You can retrieve data for the last 3 months, 6 months, 12 months, or even
older data from a data warehouse.
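A minimal sketch of such a time-window query, assuming each warehouse row carries the date it refers to. The table sales_history, its fields, and both helper functions are hypothetical names used for illustration.

```python
from datetime import date

# Hypothetical warehouse rows: each fact keeps the date it refers to.
sales_history = [
    {"sale_date": date(2023, 1, 15), "amount": 100.0},
    {"sale_date": date(2023, 9, 3), "amount": 250.0},
    {"sale_date": date(2023, 11, 20), "amount": 75.0},
]

def months_back(d, months):
    """Return the first day of the month that lies `months` before d's month."""
    total = d.year * 12 + (d.month - 1) - months
    return date(total // 12, total % 12 + 1, 1)

def sales_since(rows, as_of, months):
    """Select the rows dated within the last `months` months up to `as_of`."""
    start = months_back(as_of, months)
    return [r for r in rows if start <= r["sale_date"] <= as_of]

recent = sales_since(sales_history, as_of=date(2023, 12, 1), months=3)
```

Because old rows are kept, the same function can look back 3, 6, or 12 months simply by changing the months argument.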

Non-volatile: The only way to add data to a data warehouse is to extract it from source systems. The
data is used only for analysis and no changes are made to it. Historical data in a data
warehouse is never altered or deleted.
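The non-volatile property can be illustrated with a toy append-only table. This is a sketch, not how a real warehouse enforces the rule; the class AppendOnlyTable and its methods are hypothetical.

```python
class AppendOnlyTable:
    """Toy fact table: rows can be loaded and read, never changed or removed."""

    def __init__(self):
        self._rows = []

    def load(self, batch):
        # Store immutable copies so later changes in the source do not leak in.
        self._rows.extend(tuple(row) for row in batch)

    def rows(self):
        # Return a tuple so callers cannot modify the stored history.
        return tuple(self._rows)

table = AppendOnlyTable()
table.load([("2023-01", "north", 100.0)])
table.load([("2023-02", "north", 140.0)])
history = table.rows()
```

The class deliberately offers no update or delete method, which mirrors the idea that data enters only through loading.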

Components of Data Warehousing:

Source Data Component

Source data coming into the data warehouse may be grouped into four broad categories:

• Production Data: Data that comes from the various operational systems of an enterprise
(all the day-to-day operational data).
• Internal Data: Includes "private" spreadsheets, reports, customer profiles, and sometimes
even departmental databases.
• Archived Data: The old or historical data. In every operational system, old data is
periodically moved out and stored in archived files.
• External Data: Includes statistical data about the organisation's industry produced by
external sources.

Data Staging Component (ETL Process)

After we have extracted data from various operational systems and external sources, we have
to prepare it for storage in the data warehouse. The extracted data, coming from several
different sources, needs to be changed, converted, and made ready in a format suitable for
querying and analysis.

We will now discuss the three primary functions that take place in the staging area.

• Data Extraction: Data is extracted from the various sources.
• Data Transformation:
o First, we clean the data extracted from each source. Cleaning may mean correcting
misspellings, providing default values for missing data elements, or eliminating
duplicates when the same data arrives from several source systems.
o Then data standardization is performed: data from many source records is combined
into a single, consistent form.
• Data Loading: Finally, the prepared data is loaded into the data warehouse storage.
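The three staging functions can be sketched as three small Python functions. This is a simplified illustration: the sample rows, the correction table SPELLING_FIXES, and the default value DEFAULT_PHONE are assumptions made for the example, and a plain list stands in for warehouse storage.

```python
# Hypothetical extracts from two source systems.
source_a = [{"id": 1, "city": "Dehli", "phone": "555-0101"}]
source_b = [
    {"id": 1, "city": "Delhi", "phone": "555-0101"},
    {"id": 2, "city": "Mumbai", "phone": None},
]

SPELLING_FIXES = {"Dehli": "Delhi"}   # assumed correction table
DEFAULT_PHONE = "unknown"             # assumed default for missing values

def extract(*sources):
    """Extraction: gather copies of the rows from every source into one list."""
    return [dict(row) for src in sources for row in src]

def transform(rows):
    """Transformation: fix misspellings, fill defaults, drop duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        row["city"] = SPELLING_FIXES.get(row["city"], row["city"])
        row["phone"] = row["phone"] or DEFAULT_PHONE
        key = (row["id"], row["city"], row["phone"])
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned

def load(rows, warehouse):
    """Loading: append the prepared rows to warehouse storage (a list here)."""
    warehouse.extend(rows)
    return warehouse

warehouse = load(transform(extract(source_a, source_b)), [])
```

Customer 1 arrives from both sources, once with a misspelled city; after cleaning, the two rows become identical and only one is loaded.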

Data Storage Components

Data storage for the data warehouse is a separate repository. Data storage is done on three levels:

• Metadata
Metadata is data that describes other data. In a data warehouse, it provides
information about the data's origin, structure, format, and how it is used.
• Data Marts
A data mart is a subset of the data warehouse that is focused on a specific business
area or department, such as sales, marketing, or finance.
• Multidimensional Database
For analysis purposes, data is stored in multidimensional databases.
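The three storage levels can be pictured with plain Python structures. This is a rough sketch: the fact rows, the field names, and the dictionary-based cube are hypothetical stand-ins for real storage engines.

```python
from collections import defaultdict

# Hypothetical warehouse facts covering several departments.
warehouse_facts = [
    {"dept": "sales", "region": "north", "month": "2023-01", "amount": 100.0},
    {"dept": "sales", "region": "south", "month": "2023-01", "amount": 60.0},
    {"dept": "sales", "region": "north", "month": "2023-02", "amount": 140.0},
    {"dept": "finance", "region": "north", "month": "2023-01", "amount": 30.0},
]

# Data mart: the subset of the warehouse relevant to one department.
sales_mart = [f for f in warehouse_facts if f["dept"] == "sales"]

# Multidimensional store: one cell per (region, month) combination.
cube = defaultdict(float)
for f in sales_mart:
    cube[(f["region"], f["month"])] += f["amount"]

# Metadata: describes where the data came from and how it is structured.
metadata = {
    "name": "sales_mart",
    "origin": "warehouse_facts",
    "dimensions": ["region", "month"],
    "measure": "amount",
}
```

Each cell of cube is addressed by its dimension values, which is the basic idea behind storing data multidimensionally for analysis.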

Information Delivery Component

The information delivery component enables data to be fetched from the data warehouse and
transferred to one or more destinations.

Some of the ways in which this can be done are:

• Data is simply fetched using a query and the result is transferred.
• We can perform data mining.
• We can use OLAP.
• We can run report queries.
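As an illustration of the OLAP option, two common operations, roll-up and slice, can be sketched over a small cube. The cube contents and the function names are hypothetical; real OLAP tools provide these operations through their own query interfaces.

```python
from collections import defaultdict

# Hypothetical cube: (region, month) -> sales amount.
cube = {
    ("north", "2023-01"): 100.0,
    ("north", "2023-02"): 140.0,
    ("south", "2023-01"): 60.0,
}

def roll_up_by_region(cube):
    """Roll-up: aggregate away the month dimension, leaving totals per region."""
    totals = defaultdict(float)
    for (region, _month), amount in cube.items():
        totals[region] += amount
    return dict(totals)

def slice_by_month(cube, month):
    """Slice: fix one value of the month dimension and keep the rest."""
    return {region: amount for (region, m), amount in cube.items() if m == month}

region_totals = roll_up_by_region(cube)
january = slice_by_month(cube, "2023-01")
```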

Management and Control Component

The management and control component coordinates the services and functions within the data
warehouse. It controls the data transformation and the data transfer into the data
warehouse storage. It also controls the delivery of data to clients. It works with the
database management system and ensures that data is correctly saved in the repositories. It
monitors the movement of information into the staging area and from there into the data
warehouse storage itself.

The complexity of this task increases with the size of the warehouse. The component also has to
check that the data is clean, correct, and authentic.


Advantages of Data Warehousing

1. Improved Decision-Making

o Data warehousing provides businesses with a consolidated, unified view of their
data, which enables informed decision-making through better insights, trend
analysis, and forecasting.

2. Historical Data Storage

o Data warehouses store historical data, allowing businesses to track trends over time,
compare past and present performance, and perform long-term data analysis.

3. Enhanced Data Quality and Consistency

o Through data cleansing and transformation processes, data warehouses improve the
quality and consistency of data, reducing errors and discrepancies between different
systems.

4. Faster Query Performance

o Optimized for analytical processing, data warehouses offer faster query
performance compared to transactional databases, especially for complex queries
and reports.

5. Data Integration

o Data warehouses consolidate data from multiple sources (e.g., ERP, CRM, social
media), offering a comprehensive view of an organization’s operations, improving
cross-functional analysis.

Disadvantages of Data Warehousing

1. High Initial Setup Costs

o Implementing a data warehouse requires substantial investment in hardware,
software, and personnel for setup, leading to high upfront costs for businesses.

2. Complex Data Integration

o Integrating data from multiple sources, especially if they are in different formats
(structured, unstructured), can be time-consuming and complex, requiring advanced
data transformation efforts.

3. Maintenance and Operating Costs

o Data warehouses need regular maintenance, updates, and scaling as data volumes
grow, leading to ongoing operational costs in terms of both resources and
personnel.

4. Data Latency

o Data in a warehouse is typically updated in batches, meaning there can be a delay
(data latency) between when data is collected and when it is available for analysis,
limiting real-time decision-making.

5. Complexity in Managing Changes

o Updating or modifying a data warehouse to reflect changes in business needs, such
as adding new data sources or accommodating new reporting requirements, can be
complex and resource-intensive.
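The data latency disadvantage listed above can be made concrete with a small calculation. The sketch assumes a single nightly batch load; the event times and the load time are invented for illustration.

```python
from datetime import datetime

# Assumed: operational events are loaded in one batch early the next morning.
events = [datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 17, 30)]
batch_load_time = datetime(2024, 3, 2, 2, 0)

# Latency: how long each event waits before it is available for analysis.
latencies_hours = [
    (batch_load_time - event).total_seconds() / 3600 for event in events
]
```

Under these assumptions, an event recorded at 09:00 waits 17 hours before analysts can see it, while one recorded at 17:30 waits 8.5 hours.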
