
Data Warehousing

Uploaded by

librandosarhj12
Copyright
© All Rights Reserved

DATA WAREHOUSING
Contents
WHAT IS DATA WAREHOUSING
HISTORY OF DATA WAREHOUSING
FACTS ABOUT DATA WAREHOUSING
CHARACTERISTICS OF DATA WAREHOUSING
USAGE AND TRENDS
ARCHITECTURE OF DATA WAREHOUSING
DBMS VS. DATA WAREHOUSING
WHAT IS DATA WAREHOUSE?
Data warehousing is a process of collecting and
storing data from various sources in a single
repository for analysis and reporting. It plays a
crucial role in extracting valuable insights from
data, enabling better decision-making across
various business functions.



Data Warehousing
Definition
Data warehousing is the practice of gathering data from
multiple sources into a central repository, called a
data warehouse.

According to William H. Inmon, a leading architect in
the construction of data warehouse systems, “A data
warehouse is a subject-oriented, integrated,
time-variant and non-volatile collection of data in
support of management’s decision-making process.”
Data Warehousing Processes

Data Cleaning: Includes filling in missing values, smoothing noisy data, and
identifying or removing inconsistencies.
Data Integration: Includes the integration of multiple databases, data cubes, or
files.
Data Transformation: Converts data from legacy or host format to data warehouse
format.
Data Loading: Sort, summarize, and consolidate; compute views; check integrity;
build indices and partitions.
Periodic Data Refreshing: Propagates updates from the data sources to the
warehouse.
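The three data cleaning tasks named above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names, the choice of mean imputation, the bin-means smoothing, and the `quantity * unit_price == total` consistency rule are all assumptions made for the example.

```python
from statistics import mean


def fill_missing(values):
    """Fill missing values (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Smooth noisy data: sort, split into equal-size bins,
    and replace every value by the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        current_bin = ordered[start:start + bin_size]
        smoothed.extend([mean(current_bin)] * len(current_bin))
    return smoothed


def find_inconsistent(rows):
    """Identify rows whose recorded total disagrees with quantity * unit_price
    (a hypothetical consistency rule chosen for this sketch)."""
    return [r for r in rows if r["quantity"] * r["unit_price"] != r["total"]]


if __name__ == "__main__":
    print(fill_missing([10, None, 20]))
    print(smooth_by_bin_means([4, 8, 15, 21, 21, 24, 25, 28, 34], 3))
```

In practice the imputation and smoothing strategies depend on the data; the point here is only that each cleaning step is a small, testable transformation applied before loading.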
HISTORY OF DATA WAREHOUSING
The concept of data warehousing dates back to the late 1980s, when IBM researchers Barry
Devlin and Paul Murphy developed the “Business Data Warehouse”.
In essence, the data warehousing concept was intended to provide an architectural model
for the flow of data from operational systems to decision support environments.
1960s - General Mills and Dartmouth College, in a joint research project, develop the terms
dimensions and facts.
1970s - ACNielsen and IRI provide dimensional data marts for retail sales.
1983 - Teradata introduces a database management system specifically designed for decision
support.
1988 - Barry Devlin and Paul Murphy publish the article “An Architecture for a Business and
Information System” in the IBM Systems Journal, where they introduce the term “Business Data
Warehouse”.
Facts about Data Warehousing
1. Issues involved in warehousing include techniques for dealing with errors and
techniques for efficient storage and indexing of large volumes of data.
2. This system is used for reporting and data analysis.
3. It usually contains historical data derived from transaction data.
4. Data warehousing is not meant for current ‘live’ data.
Benefits of Data Warehousing
Improved Decision Making
Data warehousing enables better decision making by providing a comprehensive and
unified view of data.

Competitive Advantage
By leveraging data insights, organizations can make informed decisions to stay ahead
of the competition.

Enhanced Business Intelligence
Data warehousing allows organizations to gain insights into business trends and
customer behavior.

Increased Efficiency
Data warehousing streamlines operations and improves overall business efficiency by
optimizing processes.
Key Components of a Data Warehouse

Data Source Systems
These are the original systems where data originates, such as transactional databases,
web logs, or CRM systems.

Extraction, Transformation, and Load (ETL) Process
Data is extracted from source systems, transformed into a consistent format, and
loaded into the data warehouse.

Data Warehouse Database
The central repository for storing and managing data in a structured and organized
manner.

Metadata Repository
This component stores information about the data itself, including its structure,
definitions, and relationships.
DIMENSIONAL MODELING TECHNIQUES

1. STAR SCHEMA
A simple and widely used dimensional model, featuring a central fact table
surrounded by dimension tables that represent attributes.
This structure allows for efficient querying, particularly for reporting and analysis.

2. SNOWFLAKE SCHEMA
A more complex schema, where dimension tables can be further normalized into smaller
tables, creating a snowflake-like pattern.
This approach can lead to better data integrity, but might increase query complexity.
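A star schema can be sketched with Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`), the columns, and the sample rows below are hypothetical and chosen only for illustration; a real warehouse would have its own dimensions and measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the attributes of each fact.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- The central fact table holds measures plus one key per dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    units_sold INTEGER,
    revenue    INTEGER
);

INSERT INTO dim_product VALUES
    (1, 'Pen', 'Stationery'), (2, 'Notebook', 'Stationery'), (3, 'Mug', 'Kitchen');
INSERT INTO dim_date VALUES (1, 2023, 12), (2, 2024, 1);
INSERT INTO fact_sales VALUES
    (1, 1, 10, 50), (2, 1, 4, 80), (3, 2, 2, 30), (1, 2, 6, 30);
""")

# A typical reporting query: one join per dimension, then aggregate the measure.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

print(revenue_by_category)
```

A snowflake variant of this sketch would move `category` out of `dim_product` into its own table, which removes the repeated category text at the cost of one more join in the query.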
Extract, Transform, Load (ETL) Process
1. Extraction
Data is extracted from various source systems, such as
databases, spreadsheets, or web logs.

2. Transformation
Extracted data is transformed into a consistent format,
cleansed, and validated to ensure data quality.

3. Loading
Cleansed and transformed data is loaded into the data
warehouse, ready for analysis and reporting.
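The three ETL steps can be sketched as three small Python functions. Everything specific here is an assumption for the example: the inline CSV standing in for a source system, the `orders` table, and the rule that rows with an unparseable amount are dropped during validation.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real job would read a file, API, or database.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,7
3,carol,not-a-number
"""


def extract(text):
    """Extraction: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: normalize names, convert amounts to cents,
    and skip rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount_cents = round(float(row["amount"]) * 100)
        except ValueError:
            continue  # invalid amount: reject the row
        clean.append((int(row["order_id"]), row["customer"].strip().title(), amount_cents))
    return clean


def load(rows, conn):
    """Loading: write the cleansed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, customer, amount_cents FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping the steps as separate functions makes each one testable on its own, and a production pipeline would usually log rejected rows rather than silently skipping them.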
Data Warehouse Advantages
Complete control over the four main areas of data management systems:
Clean data
Query processing: multiple options
Indexes: multiple types
Security: data and access

Data Warehousing Disadvantages
Adding new data sources takes time and carries a high associated cost.
Data owners lose control over their data, raising
ownership, security, and privacy issues.
Long initial implementation time and associated high
cost.
Difficult to accommodate changes in data types and
ranges, data source schemas, indexes and queries.
CHARACTERISTICS OF DATA WAREHOUSING

SUBJECT-ORIENTED - A data warehouse can be used to
analyze a particular subject area.
EX: “Sales” can be a particular subject.

INTEGRATED - A data warehouse integrates data from
multiple data sources.
EX: Source A and Source B may have different ways of
identifying a product, but in a data warehouse, there will be
only a single way of identifying a product.

TIME VARIANT - Historical data is kept in a data warehouse.

NON VOLATILE - Once data is in the data warehouse, it will
not change. So, historical data in a data warehouse should
never be altered.
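The "integrated" example above (two sources identifying the same product differently) can be sketched with a lookup table that maps each source-specific identifier to a single warehouse key. The source names, identifiers, and the `PRODUCT_KEY_MAP` structure are hypothetical and exist only for this illustration.

```python
# Hypothetical mapping: (source system, source-specific id) -> one warehouse product key.
PRODUCT_KEY_MAP = {
    ("source_a", "SKU-001"): 1,
    ("source_b", "P1"): 1,       # same product as SKU-001 in source A
    ("source_a", "SKU-002"): 2,
    ("source_b", "P7"): 2,       # same product as SKU-002 in source A
}


def integrate(source, records):
    """Rewrite source records so every product carries the single warehouse key."""
    return [
        {"product_key": PRODUCT_KEY_MAP[(source, r["id"])], "units": r["units"]}
        for r in records
    ]


# Records from both sources end up under the same key.
warehouse_rows = (
    integrate("source_a", [{"id": "SKU-001", "units": 3}])
    + integrate("source_b", [{"id": "P1", "units": 5}])
)

units_by_product = {}
for row in warehouse_rows:
    key = row["product_key"]
    units_by_product[key] = units_by_product.get(key, 0) + row["units"]

print(units_by_product)
```

Once both sources share one key, totals per product can be computed without knowing which system a row came from.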
Data Warehouse Usage
Three kinds of data warehouse applications

1.) Information Processing
Supports querying, basic statistical analysis,
and reporting using crosstabs, tables, charts
and graphs.

2.) Analytical Processing
Multidimensional analysis of data warehouse
data.
Supports basic OLAP operations: slice-dice,
drilling and pivoting.

3.) Data Mining
Knowledge discovery from hidden patterns.
Supports association, constructing analytical
models, performing classification and
prediction, and presenting the mining results
using visualization tools.
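The basic OLAP operations named under analytical processing can be sketched on a tiny in-memory cube. This is an illustrative sketch only: the cube contents, the dimension names, and the helper functions are assumptions, and real OLAP servers implement these operations very differently.

```python
from collections import defaultdict

# Hypothetical cube: (year, region, product) -> units sold.
DIMS = ("year", "region", "product")
CUBE = {
    (2023, "North", "Pen"): 10,
    (2023, "South", "Pen"): 5,
    (2023, "North", "Mug"): 2,
    (2024, "North", "Pen"): 7,
    (2024, "South", "Mug"): 4,
}


def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, **allowed):
    """Dice: restrict several dimensions to sets of allowed values."""
    return {
        k: v for k, v in cube.items()
        if all(k[DIMS.index(d)] in values for d, values in allowed.items())
    }


def roll_up(cube, keep):
    """Roll up (drill up): sum away every dimension not listed in keep."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for k, v in cube.items():
        totals[tuple(k[i] for i in idx)] += v
    return dict(totals)


def pivot(cube, row_dim, col_dim):
    """Pivot: present two dimensions as the rows and columns of a table."""
    table = defaultdict(dict)
    for (r, c), v in roll_up(cube, (row_dim, col_dim)).items():
        table[r][c] = v
    return dict(table)


if __name__ == "__main__":
    print(slice_cube(CUBE, "year", 2024))
    print(roll_up(CUBE, ("year",)))
    print(pivot(CUBE, "region", "year"))
```

Drill-down is the reverse of `roll_up`: moving from the yearly totals back to the more detailed cells by keeping more dimensions.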
Architecture of data warehouse
1.) Bottom Tier: The bottom tier is a warehouse database server that is almost always a
relational database system.
Back-end tools and utilities are used to feed data into the bottom tier from operational
databases or other external sources. These tools and utilities perform data extraction,
cleaning and transformation, as well as load and refresh functions to update the data
warehouse.
The data are extracted using application program interfaces known as gateways.
Examples of gateways are ODBC (Open Database Connectivity) and OLE DB (Object Linking and
Embedding, Database) by Microsoft, and JDBC (Java Database Connectivity).
This tier also contains a metadata repository, which stores information about the data
warehouse and its contents.
Architecture of data warehouse
2.) Middle Tier: The middle tier is an OLAP server that is typically implemented
using either:
a. A RELATIONAL OLAP (ROLAP) model, that is, an extended relational DBMS that
maps operations on multidimensional data to standard relational operations. It acts as
an intermediate server between a relational back-end server and client front-end tools.
b. A MULTIDIMENSIONAL OLAP (MOLAP) model, that is, a special-purpose
server that directly implements multidimensional data and operations and
supports multidimensional views.
Architecture of data warehouse
3.) Top Tier: The top tier is a front-end client layer, which contains query and
reporting tools, analysis tools and data mining tools.
OLAP - Online Analytical Processing
1. This is the major task of a data warehousing system.
2. Useful for complex data analysis and decision making.
3. Market oriented: used by managers, executives and data analysts.
4.
DBMS VS DATA WAREHOUSING

In today’s corporate world, every business enterprise, no matter how big or
small, requires a database. The more the business grows, the more urgent
the requirement for a database becomes. The database is required to keep a check on the
growth of a business in a specific period.
DBMS - At times known as the database manager, although it is the
abbreviation of database management system.
It is basically a collection of computer programs devoted to the
management of the database of an organization.
It is a complete and comprehensive methodology in use for specific
purposes,
such as overall management of digital databases, creation and
maintenance of data, searching, and serving other operations relating
to the database.



DATA WAREHOUSES

A data warehouse is usually a place where various types of databases are stored,
mainly for the purposes of security, archival analysis and storage.

The data warehouse consists of either one or several computer systems.

The data warehouse is a database of a different kind: an OLAP (online analytical
processing) database. A data warehouse exists as a layer on top of another database
or databases (usually OLTP databases).
A DBMS typically supports OLTP (Online Transaction Processing). Such data is hard
to analyze historically because it changes from day to day.

Data warehousing uses OLAP (Online Analytical Processing). It maintains
historical data, collects data from different databases such as Oracle, and
is used to perform analysis and generate reports.

The key difference between a DBMS and a data warehouse is that a data
warehouse can be treated as a type of database that provides
special facilities for analysis and reporting, while a DBMS is the overall system that
manages a given set of data.
THANK YOU
FOR LISTENING
Reporters:
Beldad Maryjoy A.
Librando, Marie Krisyha S.
Obelidor, Rochelle S.
BSIS 4A
