Unit 1

The document discusses the components and characteristics of a data warehouse. It describes the source data component, data staging component, and data storage component. It also outlines some key characteristics of a data warehouse such as being subject-oriented, integrated, and time-variant.

Uploaded by

Santhosh


CCS341 DATA WAREHOUSING    L T P C: 2 0 2 3

UNIT I INTRODUCTION TO DATA WAREHOUSE 5

Data warehouse Introduction - Data warehouse components - Operational database Vs data
warehouse - Data warehouse Architecture - Three-tier Data Warehouse Architecture -
Autonomous Data Warehouse - Autonomous Data Warehouse Vs Snowflake - Modern Data
Warehouse

Data warehouse Introduction

A Data Warehouse is a relational database management system (RDBMS) constructed to meet
the requirements of decision support rather than those of transaction processing systems. It can
be loosely described as any centralized data repository which can be queried for business benefit.
It is a database that stores information oriented to satisfy decision-making requests. Data
warehousing is a group of decision support technologies aimed at enabling the knowledge worker
(executive, manager, and analyst) to make better and faster decisions. So, data warehousing
provides architectures and tools for business executives to systematically organize, understand,
and use their information to make strategic decisions.

A Data Warehouse environment contains an extraction, transformation, and loading (ETL) solution,
an online analytical processing (OLAP) engine, client analysis tools, and other applications
that handle the process of gathering information and delivering it to business users.

What is a Data Warehouse?

A Data Warehouse (DW) is a relational database that is designed for query and analysis
rather than transaction processing. It includes historical data derived from transaction data from
one or more sources.

A Data Warehouse provides integrated, enterprise-wide, historical data and focuses on
providing support for decision-makers for data modeling and analysis.

A Data Warehouse is a group of data specific to the entire organization, not only to a
particular group of users.

It is not used for daily operations and transaction processing but used for making decisions.

A Data Warehouse can be viewed as a data system with the following attributes:

o It is a database designed for investigative tasks, using data from various applications.
o It supports a relatively small number of clients with relatively long interactions.
o It includes current and historical data to provide a historical perspective of information.

o Its usage is read-intensive.
o It contains a few large tables.

"Data Warehouse is a subject-oriented, integrated, and time-variant store of information in

support of management's decisions."

Characteristics of Data Warehouse

Subject-Oriented
A data warehouse focuses on the modeling and analysis of data for decision-makers.
Therefore, data warehouses typically provide a concise and straightforward view around a
particular subject, such as customer, product, or sales, instead of the organization's
ongoing operations as a whole. This is done by excluding data that are not useful concerning the
subject and including all data needed by the users to understand the subject.

Integrated
A data warehouse integrates various heterogeneous data sources like RDBMS, flat files,
and online transaction records. Data cleaning and integration must be performed during data
warehousing to ensure consistency in naming conventions, attribute types, etc., among different
data sources.

Time-Variant
Historical information is kept in a data warehouse. For example, one can retrieve data
from 3 months, 6 months, 12 months, or even earlier from a data warehouse. This
contrasts with a transaction system, where often only the most current data is kept.

Non-Volatile
The data warehouse is a physically separate data store, which is transformed from the
source operational RDBMS. The operational updates of data do not occur in the data warehouse,
i.e., update, insert, and delete operations are not performed. It usually requires only two
procedures in data accessing: initial loading of data and access to data. Therefore, the DW does
not require transaction processing, recovery, and concurrency capabilities, which allows for
substantial speedup of data retrieval. Non-volatile means that once data has entered the
warehouse, it should not change.
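
The time-variant and non-volatile properties described above can be illustrated with a small
sketch. It assumes Python's built-in sqlite3 module as a stand-in for warehouse storage; the
table name customer_history and the helper functions are hypothetical and chosen only for
illustration. Each periodic load appends a dated snapshot, and earlier rows are never updated
or deleted, so a query can ask what the data looked like on a past date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_history ("
    " customer_id INTEGER, city TEXT, load_date TEXT)"
)

def load_snapshot(conn, load_date, rows):
    """Append one periodic snapshot; earlier rows are never updated or deleted."""
    conn.executemany(
        "INSERT INTO customer_history VALUES (?, ?, ?)",
        [(cid, city, load_date) for cid, city in rows],
    )

def city_as_of(conn, customer_id, as_of):
    """Return the customer's city from the latest snapshot on or before as_of."""
    row = conn.execute(
        "SELECT city FROM customer_history"
        " WHERE customer_id = ? AND load_date <= ?"
        " ORDER BY load_date DESC LIMIT 1",
        (customer_id, as_of),
    ).fetchone()
    return row[0] if row else None

# Two loads for the same customer; the second does not overwrite the first.
load_snapshot(conn, "2023-01-31", [(1, "Chennai")])
load_snapshot(conn, "2023-04-30", [(1, "Madurai")])

old_city = city_as_of(conn, 1, "2023-02-15")
new_city = city_as_of(conn, 1, "2023-05-01")
history_rows = conn.execute("SELECT COUNT(*) FROM customer_history").fetchone()[0]
```

An operational system would typically overwrite the customer's city in place; here both values
remain available, which is what gives the warehouse its historical perspective.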

History of Data Warehouse

The idea of data warehousing emerged in the late 1980s, when IBM researchers Barry Devlin
and Paul Murphy developed the "Business Data Warehouse."

In essence, the data warehousing idea was intended to provide an architectural model for
the flow of information from operational systems to decision support environments. The
concept attempted to address the various problems associated with this flow, mainly the high
costs associated with it.

In the absence of a data warehousing architecture, a vast amount of space was required to
support multiple decision support environments. In large corporations, it was common for various
decision support environments to operate independently.

Goals of Data Warehousing

o To help reporting as well as analysis

o Maintain the organization's historical information
o Be the foundation for decision making.

Need for Data Warehouse

Data Warehouse is needed for the following reasons:

1) Business User: Business users require a data warehouse to view summarized data from
the past. Since these people are non-technical, the data may be presented to them in an
elementary form.
2) Store historical data: A Data Warehouse is required to store the time-variant data from
the past. This data can then be used for various purposes.
3) Make strategic decisions: Some strategies may depend upon the data in the data
warehouse. So, the data warehouse contributes to making strategic decisions.
4) For data consistency and quality: By bringing the data from different sources to a
common place, the user can effectively bring uniformity and consistency
to the data.
5) Quick response time: A data warehouse has to be ready for somewhat unexpected loads
and types of queries, which demands a significant degree of flexibility and quick response
time.

Benefits of Data Warehouse

Understand business trends and make better forecasting decisions.

1. Data warehouses are designed to perform well with enormous amounts of data.
2. The structure of data warehouses is easier for end-users to navigate, understand,
and query.
3. Queries that would be complex in many normalized databases could be easier to build and
maintain in data warehouses.
4. Data warehousing is an efficient method to manage demand for lots of information from
lots of users.
5. Data warehousing provides the capabilities to analyze a large amount of historical data.

Components or Building Blocks of Data Warehouse

Architecture is the proper arrangement of the elements. We build a data warehouse with software and
hardware components. To suit the requirements of our organizations, we arrange these building
blocks, and we may want to bolster one part or another with extra tools and services. All of these
depend on our circumstances.

The figure shows the essential elements of a typical warehouse. We see the Source Data
component shown on the left. The Data Staging element serves as the next building block.
In the middle, we see the Data Storage component that handles the data warehouse's data.
This element not only stores and manages the data; it also keeps track of data using the
metadata repository. The Information Delivery component shown on the right consists of all
the different ways of making the information from the data warehouse available to the
users.

Source Data Component
Source data coming into the data warehouses may be grouped into four broad
categories:

Production Data: This type of data comes from the different operational systems of the
enterprise. Based on the data requirements in the data warehouse, we choose segments of
the data from the various operational systems.

Internal Data: In each organization, users keep their "private" spreadsheets, reports,
customer profiles, and sometimes even departmental databases. This is the internal data, part
of which could be useful in a data warehouse.

Archived Data: Operational systems are mainly intended to run the current business. In
every operational system, we periodically take the old data and store it in archived files.

External Data: Most executives depend on information from external sources for a large
percentage of the information they use. They use statistics relating to their industry
produced by external agencies.

Data Staging Component
After we have extracted data from various operational systems and external
sources, we have to prepare the data for storing in the data warehouse. The extracted data
coming from several different sources needs to be changed, converted, and made ready in a
format that is suitable to be saved for querying and analysis.
We will now discuss the three primary functions that take place in the staging area.

1) Data Extraction: This function has to deal with numerous data sources. We have to
employ the appropriate techniques for each data source.
2) Data Transformation: As we know, data for a data warehouse comes from many
different sources. If data extraction for a data warehouse poses big challenges, data
transformation presents even greater challenges. We perform several individual tasks as
part of data transformation.

First, we clean the data extracted from each source. Cleaning may be the correction of
misspellings or may deal with providing default values for missing data elements, or
elimination of duplicates when we bring in the same data from various source systems.

Standardization of data components forms a large part of data transformation. Data
transformation contains many forms of combining pieces of data from different sources. We
combine data from a single source record or related data parts from many source records.

On the other hand, data transformation also involves purging source data that is not useful and
separating out source records into new combinations. Sorting and merging of data take place
on a large scale in the data staging area. When the data transformation function ends, we have
a collection of integrated data that is cleaned, standardized, and summarized.

3) Data Loading: Two distinct categories of tasks form the data loading function. When we
complete the structure and construction of the data warehouse and go live for the first time, we
do the initial loading of the information into the data warehouse storage. The initial load moves
high volumes of data, using up a substantial amount of time. The second category, performed
once the warehouse is in operation, is the ongoing loading of changes extracted from the source
systems.
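
The three staging functions described above can be sketched as a small pipeline. This is a
minimal illustration, not a production design: it assumes Python's built-in sqlite3 module for
warehouse storage, and the source rows, field names, and the customer_dim table are all
hypothetical. The transformation step shows the cleaning tasks named above: standardizing
values, supplying defaults for missing data, and eliminating duplicates that arrive from more
than one source.

```python
import sqlite3

# Hypothetical extracts from two source systems with inconsistent conventions.
crm_rows = [
    {"cust_name": "asha rao ", "city": "chennai"},
    {"cust_name": "Vijay Kumar", "city": None},
]
billing_rows = [
    {"customer": "Asha Rao", "town": "Chennai"},
]

def extract():
    """Extraction: read each source and map its field names to one convention."""
    for r in crm_rows:
        yield {"name": r["cust_name"], "city": r["city"]}
    for r in billing_rows:
        yield {"name": r["customer"], "city": r["town"]}

def transform(records):
    """Transformation: standardize, fill defaults, and eliminate duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        name = rec["name"].strip().title()
        city = (rec["city"] or "Unknown").strip().title()
        if (name, city) in seen:
            continue  # same customer already brought in from another source
        seen.add((name, city))
        cleaned.append((name, city))
    return cleaned

def load(conn, rows):
    """Loading: write the integrated rows into warehouse storage."""
    conn.execute("CREATE TABLE IF NOT EXISTS customer_dim (name TEXT, city TEXT)")
    conn.executemany("INSERT INTO customer_dim VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute("SELECT name, city FROM customer_dim ORDER BY name").fetchall()
```

Keeping the three functions separate mirrors the staging area: each source needs its own
extraction technique, while cleaning and loading are shared.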

Data Storage Components

Data storage for the data warehouse is a separate repository. The data repositories for the
operational systems generally include only the current data. Also, these data repositories
hold data structured in highly normalized form for fast and efficient processing.

Information Delivery Component

The information delivery element is used to enable the process of subscribing to data
warehouse files and having them transferred to one or more destinations according to some
user-specified scheduling algorithm.

Metadata Component
Metadata in a data warehouse is similar to the data dictionary or the data catalog in a database
management system. In the data dictionary, we keep the data about the logical data structures,
the data about the records and addresses, the information about the indexes, and so on.

Data Marts

A data mart includes a subset of corporate-wide data that is of value to a specific group of users.
The scope is confined to particular selected subjects. Data in a data warehouse should be fairly
current, but not necessarily up to the minute, although developments in the data warehouse
industry have made standard and incremental data dumps more achievable. Data marts are smaller
than data warehouses and usually contain data for one part of the organization. The current trend
in data warehousing is to develop a data warehouse with several smaller related data marts for
particular kinds of queries and reports.

Management and Control Component

The management and control elements coordinate the services and functions within the data
warehouse. These components control the data transformation and the data transfer into the
data warehouse storage. On the other hand, they moderate the data delivery to the clients. They
work with the database management systems and enable data to be correctly saved in the
repositories. They monitor the movement of information into the staging area and from there
into the data warehouse storage itself.

Why do we need a separate Data Warehouse?

Data Warehouse queries are complex because they involve the computation of large groups of
data at summarized levels.

It may require the use of distinctive data organization, access, and implementation methods
based on multidimensional views.

Performing OLAP queries in an operational database degrades the performance of operational tasks.

Data Warehouse is used for analysis and decision making in which extensive database is
required, including historical data, which operational database does not typically maintain.

The separation of an operational database from data warehouses is based on the different
structures and uses of data in these systems.

Because the two systems provide different functionalities and require different kinds of data,
it is necessary to maintain separate databases.

Difference between Database and Data Warehouse

1. Database: It is used for Online Transactional Processing (OLTP) but can be used for other
   objectives such as Data Warehousing. This records the data from the clients for history.
   Data Warehouse: It is used for Online Analytical Processing (OLAP). This reads the historical
   information for the customers for business decisions.

2. Database: The tables and joins are complicated since they are normalized for RDBMS. This is
   done to reduce redundant data and to save storage space.
   Data Warehouse: The tables and joins are simple since they are de-normalized. This is done to
   minimize the response time for analytical queries.

3. Database: Data is dynamic.
   Data Warehouse: Data is largely static.

4. Database: Entity-relational modeling techniques are used for RDBMS database design.
   Data Warehouse: Data modeling techniques are used for the Data Warehouse design.

5. Database: Optimized for write operations.
   Data Warehouse: Optimized for read operations.

6. Database: Performance is low for analysis queries.
   Data Warehouse: High performance for analytical queries.

7. Database: The database is the place where the data is taken as a base and managed to provide
   fast and efficient access.
   Data Warehouse: The Data Warehouse is the place where the application data is handled for
   analysis and reporting objectives.

Difference between Operational Database and Data Warehouse

The Operational Database is the source of information for the data warehouse. It includes detailed
information used to run the day-to-day operations of the business. The data frequently changes as
updates are made and reflects the current value of the last transactions.

Operational Database Management Systems, also called OLTP (Online Transaction Processing)
databases, are used to manage dynamic data in real time.

Data Warehouse Systems serve users or knowledge workers for the purpose of data analysis and
decision-making. Such systems can organize and present information in specific formats to
accommodate the diverse needs of various users. These systems are called Online Analytical
Processing (OLAP) systems.

Data Warehouse and the OLTP database are both relational databases. However, the goals of both
these databases are different.

1. Operational Database: Operational systems are designed to support high-volume transaction
   processing.
   Data Warehouse: Data warehousing systems are typically designed to support high-volume
   analytical processing (i.e., OLAP).

2. Operational Database: Operational systems are usually concerned with current data.
   Data Warehouse: Data warehousing systems are usually concerned with historical data.

3. Operational Database: Data within operational systems are updated regularly according to need.
   Data Warehouse: Non-volatile; new data may be added regularly, but once added it is rarely
   changed.

4. Operational Database: It is designed for real-time business dealings and processes.
   Data Warehouse: It is designed for analysis of business measures by subject area, categories,
   and attributes.

5. Operational Database: It is optimized for a simple set of transactions, generally adding or
   retrieving a single row at a time per table.
   Data Warehouse: It is optimized for bulk loads and large, complex, unpredictable queries that
   access many rows per table.

6. Operational Database: It is optimized for validation of incoming information during
   transactions and uses validation data tables.
   Data Warehouse: It is loaded with consistent, valid information and requires no real-time
   validation.

7. Operational Database: It supports thousands of concurrent clients.
   Data Warehouse: It supports a few concurrent clients relative to OLTP.

8. Operational Database: Operational systems are widely process-oriented.
   Data Warehouse: Data warehousing systems are widely subject-oriented.

9. Operational Database: Operational systems are usually optimized to perform fast inserts and
   updates of relatively small volumes of data.
   Data Warehouse: Data warehousing systems are usually optimized to perform fast retrievals of
   relatively high volumes of data.

10. Operational Database: Data In.
    Data Warehouse: Data Out.

11. Operational Database: A small number of records is accessed.
    Data Warehouse: A large number of records is accessed.

12. Operational Database: Relational databases are created for Online Transactional Processing
    (OLTP).
    Data Warehouse: The Data Warehouse is designed for Online Analytical Processing (OLAP).
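
The difference in access patterns listed above can be seen in a short sketch. It assumes
Python's built-in sqlite3 module, and the sales table and its values are hypothetical. The
OLTP-style statements touch a single row identified by its key, while the OLAP-style query
reads many rows and summarizes them by subject.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY,"
    " region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)",
    [("North", 2022, 100.0), ("North", 2023, 150.0),
     ("South", 2022, 80.0), ("South", 2023, 120.0)],
)

# OLTP-style work: update and read back a single row, identified by its key.
conn.execute("UPDATE sales SET amount = 110.0 WHERE sale_id = 1")
one_row = conn.execute("SELECT amount FROM sales WHERE sale_id = 1").fetchone()

# OLAP-style work: scan many rows and summarize them by subject (region).
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```

On a table this small both run instantly; the distinction matters when the aggregate has to
scan millions of rows while transactions are competing for the same tables.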

Data Warehouse Architecture

A data warehouse architecture is a method of defining the overall architecture of data
communication, processing, and presentation that exists for end-client computing within the
enterprise. Each data warehouse is different, but all are characterized by standard vital
components.

Production applications such as payroll, accounts payable, product purchasing, and inventory
control are designed for online transaction processing (OLTP). Such applications gather detailed
data from day-to-day operations.

Data Warehouse applications are designed to support the user ad-hoc data requirements, an activity
recently dubbed online analytical processing (OLAP). These include applications such as
forecasting, profiling, summary reporting, and trend analysis.

Production databases are updated continuously, either by hand or via OLTP applications. In
contrast, a warehouse database is updated from operational systems periodically, usually during
off-hours. As OLTP data accumulates in production databases, it is regularly extracted, filtered,
and then loaded into a dedicated warehouse server that is accessible to users. As the warehouse is
populated, it must be restructured: tables de-normalized, data cleansed of errors and redundancies,
and new fields and keys added to reflect the needs of the user for sorting, combining, and
summarizing data.

Data warehouses and their architectures vary depending upon the elements of an organization's
situation.

Three common architectures are:

o Data Warehouse Architecture: Basic

o Data Warehouse Architecture: With Staging Area
o Data Warehouse Architecture: With Staging Area and Data Marts

Data Warehouse Architecture: Basic

Operational System

An operational system is a term used in data warehousing to refer to a system that is used to
process the day-to-day transactions of an organization.

Flat Files

A Flat file system is a system of files in which transactional data is stored, and every file in the
system must have a different name.

Meta Data

A set of data that defines and gives information about other data.

Metadata is used in a data warehouse for a variety of purposes, including:

Metadata summarizes necessary information about data, which can make finding and working with
particular instances of data easier. For example, author, date created, date modified,
and file size are examples of very basic document metadata.

Metadata is used to direct a query to the most appropriate data source.
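
One way to picture metadata directing a query is a small catalog lookup. The sketch below is
only an assumption about how such routing might work, written in Python; the catalog entries,
table names, grains, and row counts are all hypothetical. Each entry records the level of
detail (grain) a table holds, and the function picks the smallest table that is still detailed
enough to answer the request.

```python
# Hypothetical metadata catalog: each entry describes one stored table.
catalog = [
    {"table": "sales_detail", "grain": "day", "rows": 1_000_000},
    {"table": "sales_monthly", "grain": "month", "rows": 1_200},
    {"table": "sales_yearly", "grain": "year", "rows": 100},
]

# Finer grains can answer coarser questions, but not the reverse.
GRAIN_ORDER = {"day": 0, "month": 1, "year": 2}

def choose_source(requested_grain):
    """Pick the smallest table whose grain is fine enough for the request."""
    usable = [
        entry for entry in catalog
        if GRAIN_ORDER[entry["grain"]] <= GRAIN_ORDER[requested_grain]
    ]
    return min(usable, key=lambda entry: entry["rows"])["table"]

monthly_source = choose_source("month")
daily_source = choose_source("day")
```

A monthly report is routed to the monthly summary rather than the detail table, which is the
kind of decision the metadata makes possible.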

Lightly and highly summarized data

This area of the data warehouse stores all the predefined lightly and highly summarized
(aggregated) data generated by the warehouse manager.

The goal of the summarized information is to speed up query performance. The summarized
data is updated continuously as new information is loaded into the warehouse.
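
The difference between lightly and highly summarized data can be shown with a short Python
sketch. The detail rows and the choice of daily and monthly levels are assumptions made for
illustration; a real warehouse manager would choose its own aggregation levels.

```python
from collections import defaultdict

# Hypothetical detail rows: (date, product, amount).
detail = [
    ("2023-01-05", "pen", 10.0),
    ("2023-01-05", "pen", 5.0),
    ("2023-01-20", "ink", 7.0),
    ("2023-02-02", "pen", 4.0),
]

def summarize(rows, key):
    """Aggregate amounts by the grouping key computed for each row."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)

# Lightly summarized: one total per day and product.
daily = summarize(detail, key=lambda r: (r[0], r[1]))
# Highly summarized: one total per month across all products.
monthly = summarize(detail, key=lambda r: r[0][:7])
```

A query for monthly totals can read the two-entry monthly summary instead of scanning every
detail row, which is where the gain in query performance comes from.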

End-User access Tools

The principal purpose of a data warehouse is to provide information to the business managers for
strategic decision-making. These users interact with the warehouse using end-user access
tools.

The examples of some of the end-user access tools can be:

o Reporting and Query Tools

o Application Development Tools
o Executive Information Systems Tools
o Online Analytical Processing Tools
o Data Mining Tools

Data Warehouse Architecture: With Staging Area

We must clean and process our operational data before putting it into the warehouse.

We can do this programmatically, although most data warehouses use a staging area instead (a place
where data is processed before entering the warehouse).

A staging area simplifies data cleansing and consolidation for operational data coming from
multiple source systems, especially for enterprise data warehouses where all relevant data of an
enterprise is consolidated.

The Data Warehouse Staging Area is a temporary location where records from source systems are
copied.

Data Warehouse Architecture: With Staging Area and Data Marts

We may want to customize our warehouse's architecture for multiple groups within our
organization.

We can do this by adding data marts. A data mart is a segment of a data warehouse that can
provide information for reporting and analysis on a section, unit, department, or operation in the
company, e.g., sales, payroll, production, etc.

The figure illustrates an example where purchasing, sales, and stocks are separated. In this
example, a financial analyst wants to analyze historical data for purchases and sales or mine
historical information to make predictions about customer behavior.
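
Building a data mart from the warehouse can be sketched as copying the rows for one subject
into a dedicated table. This is a minimal illustration that assumes Python's built-in sqlite3
module; the warehouse_facts table, its columns, and the build_mart helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_facts (subject TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [("sales", "pen", 15.0), ("purchasing", "ink", 7.0), ("sales", "ink", 9.0)],
)

def build_mart(conn, subject):
    """Copy the rows for one subject into a dedicated mart table."""
    table = f"{subject}_mart"  # subject is a trusted, hard-coded name here
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE {table} (item TEXT, amount REAL)")
    conn.execute(
        f"INSERT INTO {table}"
        " SELECT item, amount FROM warehouse_facts WHERE subject = ?",
        (subject,),
    )
    return table

sales_mart = build_mart(conn, "sales")
mart_rows = conn.execute(
    f"SELECT item, amount FROM {sales_mart} ORDER BY item"
).fetchall()
```

An analyst working only on sales then queries the smaller mart table and never needs to see
the purchasing rows.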

Properties of Data Warehouse Architectures

The following architecture properties are necessary for a data warehouse system:

1. Separation: Analytical and transactional processing should be kept apart as much as possible.

2. Scalability: Hardware and software architectures should be simple to upgrade as the data volume,
which has to be managed and processed, and the number of users' requirements, which have to be
met, progressively increase.

3. Extensibility: The architecture should be able to accommodate new operations and technologies
without redesigning the whole system.

4. Security: Monitoring access is necessary because of the strategic data stored in the data
warehouse.

5. Administerability: Data Warehouse management should not be complicated.

Types of Data Warehouse Architectures

Single-Tier Architecture

Single-tier architecture is not frequently used in practice. Its purpose is to minimize the amount
of data stored; to reach this goal, it removes data redundancies.

The figure shows the only layer physically available is the source layer. In this method, data
warehouses are virtual. This means that the data warehouse is implemented as a multidimensional
view of operational data created by specific middleware, or an intermediate processing layer.

The weakness of this architecture lies in its failure to meet the requirement for separation between
analytical and transactional processing. Analysis queries are submitted to operational data after
the middleware interprets them. In this way, queries affect transactional workloads.
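
A virtual warehouse of this kind can be approximated with a database view. The sketch below
assumes Python's built-in sqlite3 module, with a view standing in for the middleware layer;
the orders table and product_totals view are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Operational (source-layer) table: the only data physically stored.
conn.execute("CREATE TABLE orders (order_id INTEGER, product TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "pen", 2), (2, "pen", 3), (3, "ink", 1)],
)

# The "virtual warehouse": a view that stores no data of its own.
conn.execute(
    "CREATE VIEW product_totals AS"
    " SELECT product, SUM(qty) AS total_qty FROM orders GROUP BY product"
)

before = dict(conn.execute("SELECT product, total_qty FROM product_totals"))

# A new transaction is visible to analysis at once, because every
# analytical query runs directly against the operational table.
conn.execute("INSERT INTO orders VALUES (4, 'ink', 5)")
after = dict(conn.execute("SELECT product, total_qty FROM product_totals"))
```

No data is duplicated, which is the stated goal, but each analytical query re-reads the
operational table, which is exactly the lack of separation described above.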

Two-Tier Architecture
The requirement for separation plays an essential role in defining the two-tier architecture
for a data warehouse system, as shown in fig:

Although it is typically called two-layer architecture to highlight a separation between physically
available sources and data warehouses, it in fact consists of four subsequent data flow stages:

1. Source layer: A data warehouse system uses heterogeneous sources of data. That data is
originally stored in corporate relational databases or legacy databases, or it may come from
an information system outside the corporate walls.
2. Data Staging: The data stored in the sources should be extracted, cleansed to remove
inconsistencies and fill gaps, and integrated to merge heterogeneous sources into one
standard schema. The so-called Extraction, Transformation, and Loading (ETL) tools
can combine heterogeneous schemata, extract, transform, cleanse, validate, filter,
and load source data into a data warehouse.
3. Data Warehouse layer: Information is saved to one logically centralized
repository: a data warehouse. The data warehouse can be directly accessed, but it can also
be used as a source for creating data marts, which partially replicate data warehouse
contents and are designed for specific enterprise departments. Metadata repositories store
information on sources, access procedures, data staging, users, data mart schemas, and so
on.
4. Analysis: In this layer, integrated data is efficiently and flexibly accessed to issue reports,
dynamically analyze information, and simulate hypothetical business scenarios. It should
feature aggregate information navigators, complex query optimizers, and user-friendly
GUIs.

Three-Tier Architecture

The three-tier architecture consists of the source layer (containing multiple source systems), the
reconciled layer, and the data warehouse layer (containing both data warehouses and data marts).
The reconciled layer sits between the source data and the data warehouse.

The main advantage of the reconciled layer is that it creates a standard reference data model for
a whole enterprise. At the same time, it separates the problems of source data extraction and
integration from those of data warehouse population. In some cases, the reconciled layer is also
directly used to better accomplish some operational tasks, such as producing daily reports that
cannot be satisfactorily prepared using the corporate applications, or generating data flows to
feed external processes periodically so that they benefit from cleaning and integration.

This architecture is especially useful for the extensive, enterprise-wide systems. A disadvantage
of this structure is the extra file storage space used through the extra redundant reconciled layer. It
also makes the analytical tools a little further away from being real-time.

Three-Tier Data Warehouse Architecture

Data Warehouses usually have a three-level (tier) architecture that includes:

1. Bottom Tier (Data Warehouse Server)

2. Middle Tier (OLAP Server)
3. Top Tier (Front end Tools).

A bottom-tier that consists of the Data Warehouse server, which is almost always an RDBMS.
It may include several specialized data marts and a metadata repository.

Data from operational databases and external sources (such as user profile data provided by
external consultants) are extracted using application program interfaces called gateways. A
gateway is provided by the underlying DBMS and allows client programs to generate SQL
code to be executed at a server.

Examples of gateways include ODBC (Open Database Connectivity) and OLE DB (Object
Linking and Embedding, Database), by Microsoft, and JDBC (Java Database Connectivity).

A middle-tier which consists of an OLAP server for fast querying of the data warehouse.

The OLAP server is implemented using either

(1) A Relational OLAP (ROLAP) model, i.e., an extended relational DBMS that maps functions
on multidimensional data to standard relational operations.

(2) A Multidimensional OLAP (MOLAP) model, i.e., a special-purpose server that directly
implements multidimensional data and operations.

A top-tier that contains front-end tools for displaying results provided by OLAP, as well as
additional tools for data mining of the OLAP-generated data.
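
The ROLAP and MOLAP models described above can be contrasted on a single roll-up operation.
This is a simplified sketch in Python, not how an OLAP server is actually built: the ROLAP side
assumes the built-in sqlite3 module as the relational DBMS, the MOLAP side uses a plain
dictionary as a stand-in for a multidimensional array, and the sales facts are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical facts with two dimensions (region, year) and one measure.
facts = [("North", 2022, 100.0), ("North", 2023, 150.0),
         ("South", 2022, 80.0), ("South", 2023, 120.0)]

# ROLAP-style: a roll-up over the year dimension becomes a GROUP BY.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", facts)
rolap = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))

# MOLAP-style: cells are stored in a multidimensional structure keyed by
# dimension values, and the roll-up sums cells along one dimension.
cube = {(region, year): amount for region, year, amount in facts}
molap = defaultdict(float)
for (region, _year), amount in cube.items():
    molap[region] += amount
molap = dict(molap)
```

Both approaches give the same totals per region; they differ in where the multidimensional
logic lives, in standard relational operations or in the storage structure itself.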

The overall Data Warehouse Architecture is shown in fig:

The metadata repository stores information that defines DW objects. It includes the following
parameters and information for the middle and the top-tier applications:

1. A description of the DW structure, including the warehouse schema, dimensions,
hierarchies, data mart locations, and contents, etc.
2. Operational metadata, which usually describes the currency level of the stored data, i.e.,
active, archived or purged, and warehouse monitoring information, i.e., usage statistics,
error reports, audit, etc.
3. System performance data, which includes indices used to improve data access and retrieval
performance.
4. Information about the mapping from operational databases, which covers
source RDBMSs and their contents, cleaning and transformation rules, etc.
5. Summarization algorithms, predefined queries, and reports.
6. Business data, which includes business terms and definitions, ownership information, etc.

Principles of Data Warehousing

Load Performance

Data warehouses require incremental loading of new data on a periodic basis within narrow time
windows; performance of the load process should be measured in hundreds of millions of rows
and gigabytes per hour and must not artificially constrain the volume of data the business requires.

Load Processing

Many steps must be taken to load new or updated data into the data warehouse, including data
conversion, filtering, reformatting, indexing, and metadata update.

Data Quality Management

Fact-based management demands the highest data quality. The warehouse ensures local
consistency, global consistency, and referential integrity despite "dirty" sources and massive
database size.

Query Performance

Fact-based management must not be slowed by the performance of the data warehouse
RDBMS; large, complex queries must complete in seconds, not days.

Terabyte Scalability
Data warehouse sizes are growing at astonishing rates. Today these range in size from a few to
hundreds of gigabytes, up to terabyte-sized data warehouses.

Autonomous Data Warehouse

Oracle describes Autonomous Data Warehouse as the world's first and only autonomous database
optimized for analytic workloads, including data marts, data warehouses, data lakes, and data
lakehouses. According to Oracle, data scientists, business analysts, and nonexperts can use it
to discover business insights rapidly, easily, and cost-effectively using data of any
size and type. It is built for the cloud and optimized using Oracle Exadata; Oracle cites
faster performance and, according to an IDC report, operational costs lowered by an average
of 63%.
Autonomous Database provides the foundation for a data lakehouse—a modern, open
architecture that enables you to store, analyze, and understand all your data. The data lakehouse
combines the power and richness of data warehouses with the breadth, flexibility, and low cost
of popular open source data lake technologies. Access your data lakehouse through Autonomous
Database using the world's most powerful and open SQL processing engine.

This tutorial shows you how to load data from an Oracle Object Store into a database in
Autonomous Data Warehouse.

Oracle Autonomous Data Warehouse Cloud Service Tutorial Series

This is the third in a series of tutorials for Autonomous Data Warehouse. Perform the
tutorials sequentially.

• Provisioning Autonomous Data Warehouse Cloud

• Connecting SQL Developer and Creating a Table
• Loading Your Data Into Autonomous Data Warehouse
• Running a Query on Sample Data

• Using Oracle Machine Learning with Autonomous Data Warehouse Cloud (set of
additional tutorials)

Autonomous Data Warehouse Vs Snowflake

Main Differences

Technical Differences

Modern Data Warehouse

A Modern Data Warehouse is a cloud-based solution that gathers and stores an organization's information.
Organizations can process this data to make intelligent decisions. That's why various
organizations use a Modern Data Warehouse to improve their finance, human resources, and
operations business processes. These departments need quality information from the cloud-based
warehouse to make smarter decisions.

Modern Data Warehouse Pyramid

There are five different components of a Modern Data Warehouse.

Level 1: Data Acquisition

Data can be acquired from a variety of sources, such as:

• IoT devices
• Social media posts
• YouTube videos
• Website content
• Customer data
• Enterprise Resource Planning
• Legacy data stores

Level 2: Data Engineering

Once you have acquired the data, you need to load it into the data warehouse. Data engineering
uses pipelines and ETL (extract, transform, load) tools to do this. If you think of the data
warehouse as a factory, data engineering is the truck that brings raw materials into the
factory.
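
The extract, transform, and load steps can be illustrated with a minimal sketch. It is not tied to any particular ETL tool: the CSV text, the `sales` table, and the function names are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract; in practice this would come from a source system.
RAW_CSV = """customer,amount,country
alice, 120.50 ,in
bob,80,IN
carol,not-a-number,us
"""


def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, normalise country codes, drop unparseable amounts."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        clean.append((row["customer"].strip(), amount, row["country"].strip().upper()))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database stands in for the warehouse
load(conn, transform(extract(RAW_CSV)))
loaded = conn.execute(
    "SELECT customer, amount, country FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

Keeping the three steps as separate functions mirrors the way a pipeline is usually staged, so each step can be replaced or tested on its own.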

Level 3: Data Management and Governance

Once the data comes into the factory, you need someone to evaluate its quality. You
then need to steward that data, because security and privacy must be considered.
Data governance helps ensure the quality of the information by stewarding, prepping, and cleaning
the data so that it is ready for analysis.
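
As a rough illustration of what evaluating data quality can look like, the sketch below applies a few simple rules to incoming records and sets rejected records aside for a data steward. The rules, field names, and the `QualityReport` type are assumptions made for this example, not part of any governance product.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    passed: list
    rejected: list  # (record, reason) pairs kept for the data steward


# Hypothetical rule set: these fields must be present and non-empty.
REQUIRED_FIELDS = ("customer_id", "email", "amount")


def check_record(record):
    """Return None if the record is acceptable, otherwise the reason it failed."""
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            return f"missing {field}"
    if "@" not in record["email"]:
        return "invalid email"
    if record["amount"] < 0:
        return "negative amount"
    return None


def run_quality_checks(records):
    """Split records into passed and rejected, also rejecting duplicate ids."""
    report = QualityReport(passed=[], rejected=[])
    seen_ids = set()
    for record in records:
        reason = check_record(record)
        if reason is None and record["customer_id"] in seen_ids:
            reason = "duplicate customer_id"
        if reason is None:
            seen_ids.add(record["customer_id"])
            report.passed.append(record)
        else:
            report.rejected.append((record, reason))
    return report


records = [
    {"customer_id": 1, "email": "a@example.com", "amount": 10.0},
    {"customer_id": 1, "email": "a@example.com", "amount": 10.0},
    {"customer_id": 2, "email": "", "amount": 5.0},
    {"customer_id": 3, "email": "c@example.com", "amount": -1.0},
]
report = run_quality_checks(records)
for record, reason in report.rejected:
    print(record["customer_id"], reason)
```

Recording the reason for each rejection, rather than silently dropping the record, gives the steward something to act on.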

Level 4: Reporting and Business Intelligence

Once you prep and clean the data, you can start using factory analysis to take that raw
material (data) and turn it into a finished good (business intelligence). For our purposes, we will
use Microsoft Power BI to help you visualize the information by using advanced analytics, KPIs,
and workflow automation. When you are finished, you can see exactly what's going on with
your data.
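
To make the idea of turning cleaned data into a KPI concrete, here is a small sketch in plain Python. It does not use Power BI; the sales rows, region names, and function names are invented for illustration, and a BI tool would normally compute such measures for you.

```python
from collections import defaultdict

# Hypothetical cleaned sales rows: (region, month, revenue).
sales = [
    ("North", "2024-01", 1200.0),
    ("North", "2024-02", 1500.0),
    ("South", "2024-01", 900.0),
    ("South", "2024-02", 600.0),
]


def revenue_by_region(rows):
    """KPI 1: total revenue per region."""
    totals = defaultdict(float)
    for region, _month, revenue in rows:
        totals[region] += revenue
    return dict(totals)


def month_over_month_growth(rows, region):
    """KPI 2: percentage change between consecutive months for one region."""
    monthly = sorted((m, r) for reg, m, r in rows if reg == region)
    return {
        cur_month: round((cur - prev) / prev * 100, 1)
        for (_, prev), (cur_month, cur) in zip(monthly, monthly[1:])
    }


totals = revenue_by_region(sales)
growth = month_over_month_growth(sales, "North")
print(totals)
print(growth)
```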

Level 5: Data Science

A Modern Data Warehouse is about more than seeing the information; it's about using the data to
make smarter decisions. That's one of the key concepts you should walk away with here today.
There are several different techniques to help you leverage the data to your benefit, including:

• AI
• Deep learning
• Machine learning
• Statistical modeling
• Natural language processing (NLP)

Keep in mind that all the techniques above need data to work successfully. The more data you
provide, the smarter your decisions, and the smarter your results. If you want to understand
your reports, it is essential to leverage AI to get better answers, which leads us back to the
Modern Data Warehouse. Again, it is more than gathering and storing data. It is about making
smart decisions.
