Unit 1 Notes - DW
-Time-Variant
Historical information is kept in a data warehouse. For
example, one can retrieve data from 3 months, 6 months, or 12
months ago, or even older data, from a data warehouse. This
contrasts with a transaction system, where often only the
most current data is kept.
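The contrast can be sketched in a few lines of Python. This is only an illustration: the snapshot values and the names operational_store, warehouse_store, and balance_as_of are hypothetical and not part of the notes.

```python
from datetime import date

# Hypothetical balance snapshots for one account (illustrative values only).
snapshots = [
    (date(2023, 1, 31), 1200.0),
    (date(2023, 4, 30), 1500.0),
    (date(2023, 7, 31), 900.0),
    (date(2023, 10, 31), 1100.0),
]

# Transaction system: each new value overwrites the previous one.
operational_store = {}
for snapshot_date, balance in snapshots:
    operational_store["balance"] = balance

# Data warehouse: every snapshot is kept, keyed by its date.
warehouse_store = {}
for snapshot_date, balance in snapshots:
    warehouse_store[snapshot_date] = balance


def balance_as_of(store, as_of):
    """Return the latest snapshot taken on or before as_of, or None."""
    eligible = [d for d in store if d <= as_of]
    return store[max(eligible)] if eligible else None


current = operational_store["balance"]
earlier = balance_as_of(warehouse_store, date(2023, 5, 15))
```

Here current holds only the latest balance, while the warehouse copy can still answer a question about mid-May from the April snapshot.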
-Non-Volatile
The data warehouse is a physically separate data store whose
contents are transformed from the source operational RDBMS. The
operational updates of data do not occur in the data
warehouse, i.e., update, insert, and delete operations are not
performed. It usually requires only two operations in data
access: initial loading of data and access to data. Therefore,
the DW does not require transaction processing, recovery, and
concurrency control capabilities, which allows for a substantial
speedup of data retrieval. Non-volatile means that once data has
entered the warehouse, it should not change.
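As a rough sketch of the "load and access only" idea, the toy class below offers a load step and a read step and deliberately has no update or delete methods. The class name NonVolatileStore and the sample rows are assumptions made for illustration.

```python
class NonVolatileStore:
    """Toy warehouse table: rows can be loaded and read, never changed."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Loading step: append a batch of rows as immutable tuples."""
        self._rows.extend(tuple(row) for row in rows)

    def query(self, predicate):
        """Access step: return the rows matching predicate, unmodified."""
        return [row for row in self._rows if predicate(row)]


store = NonVolatileStore()
store.load([("2023-01", "north", 100), ("2023-01", "south", 80)])
store.load([("2023-02", "north", 120)])
north_rows = store.query(lambda row: row[1] == "north")
```

Because nothing is ever modified in place, the sketch needs no locking or rollback logic, which mirrors the point about not needing transaction processing and recovery.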
History of Data Warehouse
The idea of data warehousing emerged in the late 1980s, when
IBM researchers Barry Devlin and Paul Murphy developed the
"Business Data Warehouse."
Metadata Component
Metadata in a data warehouse is similar to the data dictionary or
the data catalog in a database management system. In the data
dictionary, we keep the data about the logical data structures,
the data about the records and addresses, the information about
the indexes, and so on.
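To make the data dictionary idea concrete, the sketch below reads catalog information from an in-memory SQLite database using Python's sqlite3 module. SQLite is only a stand-in for a real DBMS catalog, and the table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")

# Data about the logical structures: which tables exist.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# Data about the columns of one table: (name, declared type).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(sales)")]

# Information about the indexes defined on that table.
indexes = [row[1] for row in conn.execute("PRAGMA index_list(sales)")]
```

None of these queries touch the sales rows themselves; they return data about the data, which is the role metadata plays in the warehouse.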
Data Marts
A data mart includes a subset of corporate-wide data that is of
value to a specific group of users. The scope is confined to
particular selected subjects. Data in a data warehouse should be
fairly current, but not necessarily up to the minute, although
developments in the data warehouse industry have made regular and
incremental data loads more achievable. Data marts are smaller
than data warehouses and usually contain data for one part of
the organization. The current trend in data warehousing is to
develop a data warehouse with several smaller related data marts
for particular kinds of queries and reports.
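One way to picture a data mart is as a subject-specific subset carved out of the warehouse. The sketch below does this with an in-memory SQLite database; the table names warehouse_facts and sales_mart and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_facts (subject TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("sales", "north", 100.0),
        ("sales", "south", 250.0),
        ("inventory", "north", 40.0),
        ("hr", "south", 12.0),
    ],
)

# Build a mart that holds only the rows relevant to the sales subject.
conn.execute(
    "CREATE TABLE sales_mart AS "
    "SELECT region, amount FROM warehouse_facts WHERE subject = 'sales'"
)

mart_rows = conn.execute(
    "SELECT region, amount FROM sales_mart ORDER BY region"
).fetchall()
```

Users of the sales mart see two rows instead of four, which is the "confined scope" described above on a very small scale.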
Management and Control Component
The management and control components coordinate the services
and functions within the data warehouse. These components
control the data transformation and the data transfer into the
data warehouse storage. They also moderate the data delivery
to the clients. They work with the database management systems
and enable data to be correctly saved in the repositories. They
monitor the movement of information into the staging area and
from there into the data warehouse storage itself.
Operational system: Designed to support high-volume transaction processing.
Data warehouse: Typically designed to support high-volume analytical processing (i.e., OLAP).

Operational system: Usually concerned with current data.
Data warehouse: Usually concerned with historical data.

Operational system: Data are updated regularly according to need.
Data warehouse: Non-volatile. New data may be added regularly, but once added it is rarely changed.

Operational system: Designed for real-time business dealings and processes.
Data warehouse: Designed for analysis of business measures by subject area, categories, and attributes.

Operational system: Optimized for a simple set of transactions, generally adding or retrieving a single row at a time per table.
Data warehouse: Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table.

Operational system: Largely process-oriented.
Data warehouse: Largely subject-oriented.

Operational system: Usually optimized to perform fast inserts and updates of relatively small volumes of data.
Data warehouse: Usually optimized to perform fast retrievals of relatively high volumes of data.

Operational system: Relational databases are created for Online Transaction Processing (OLTP).
Data warehouse: Designed for Online Analytical Processing (OLAP).
Difference between OLTP and OLAP
OLTP System
An OLTP system handles operational data. Operational data are the data
involved in the operation of a particular system, for example, ATM transactions
and bank transactions.
OLAP System
An OLAP system handles historical or archival data. Historical data are data
that are archived over a long period. For example, if we collect the last 10 years
of information about flight reservations, the data can reveal meaningful patterns
such as trends in reservations. This may provide useful information like
the peak time of travel and what kinds of people travel in the various classes
(Economy/Business).
The major difference between an OLTP and an OLAP system is the amount of data
analyzed in a single transaction. An OLTP system manages many concurrent
customers and queries, each touching only an individual record or a limited group
of records at a time, whereas an OLAP system must be able to operate on millions
of records to answer a single query.
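The flight reservation example can be sketched with a tiny in-memory SQLite table. The schema, column names, and rows are invented for illustration; a real warehouse would hold years of data rather than six rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reservations ("
    "booking_id INTEGER PRIMARY KEY, travel_month TEXT, "
    "cabin_class TEXT, seats INTEGER)"
)
conn.executemany(
    "INSERT INTO reservations VALUES (?, ?, ?, ?)",
    [
        (1, "2023-06", "Economy", 2),
        (2, "2023-06", "Business", 1),
        (3, "2023-12", "Economy", 4),
        (4, "2023-12", "Economy", 3),
        (5, "2023-12", "Business", 2),
        (6, "2024-03", "Economy", 1),
    ],
)

# OLTP-style access: look up one record by its key.
one_booking = conn.execute(
    "SELECT travel_month, cabin_class, seats FROM reservations WHERE booking_id = ?",
    (4,),
).fetchone()

# OLAP-style access: scan and aggregate every record to find trends.
seats_by_month = conn.execute(
    "SELECT travel_month, SUM(seats) AS total FROM reservations "
    "GROUP BY travel_month ORDER BY total DESC"
).fetchall()
peak_month = seats_by_month[0][0]

seats_by_class = dict(
    conn.execute(
        "SELECT cabin_class, SUM(seats) FROM reservations GROUP BY cabin_class"
    ).fetchall()
)
```

The first query touches a single row, while the other two must read the whole table to answer one question, which is the difference in data volume per query described above.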
Feature: Data contents
OLTP: An OLTP system manages current data that are typically too detailed to be easily used for decision making.
OLAP: An OLAP system manages a large amount of historical data, provides facilities for summarization and aggregation, and stores and manages data at different levels of granularity. This makes the data easier to use in informed decision making.

Feature: Database design
OLTP: An OLTP system usually uses an entity-relationship (ER) data model and an application-oriented database design.
OLAP: An OLAP system typically uses either a star or snowflake model and a subject-oriented database design.

Feature: View
OLTP: An OLTP system focuses primarily on the current data within an enterprise or department, without referring to historical information or data in different organizations.
OLAP: An OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with data that originates from various organizations, integrating information from many data stores.

Feature: Volume of data
OLTP: Not very large.
OLAP: Because of their large volume, OLAP data are stored on multiple storage media.

Feature: Access patterns
OLTP: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery techniques.
OLAP: Accesses to OLAP systems are mostly read-only operations because data warehouses store historical data.

Feature: Inserts and updates
OLTP: Short and fast inserts and updates initiated by end users.
OLAP: Periodic long-running batch jobs refresh the data.
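The star model mentioned under database design can be sketched as one fact table surrounded by dimension tables. The example below uses an in-memory SQLite database; the table names (fact_sales, dim_date, dim_product) and all values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables describe the subjects of analysis.
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);

    -- The fact table holds the measures plus a key into each dimension.
    CREATE TABLE fact_sales (
        date_id INTEGER REFERENCES dim_date (date_id),
        product_id INTEGER REFERENCES dim_product (product_id),
        amount REAL
    );

    INSERT INTO dim_date VALUES (1, 2023, 'Q1'), (2, 2023, 'Q2');
    INSERT INTO dim_product VALUES (10, 'books'), (20, 'games');
    INSERT INTO fact_sales VALUES
        (1, 10, 50.0), (1, 10, 25.0), (1, 20, 30.0), (2, 10, 70.0), (2, 20, 20.0);
    """
)

# A typical subject-oriented query: total sales by quarter and category.
totals = conn.execute(
    """
    SELECT d.quarter, p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_id = f.date_id
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY d.quarter, p.category
    ORDER BY d.quarter, p.category
    """
).fetchall()
```

A snowflake model would further normalize the dimension tables, for example by splitting category into its own table referenced from dim_product.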
4. Security: Monitoring access is necessary because of the strategic data stored
in the data warehouse.
Data from operational databases and external sources (such as user profile data
provided by external consultants) are extracted using application program
interfaces called gateways. A gateway is provided by the underlying DBMS and
allows client programs to generate SQL code to be executed at a server.
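A rough sketch of a client program generating SQL for extraction is shown below. It uses Python's DB-API with sqlite3 as a stand-in: SQLite runs in-process rather than on a separate server, and the orders table, the ALLOWED allow-list, and both function names are assumptions made for illustration.

```python
import sqlite3

# Hypothetical allow-list of tables and columns the extract job may read.
ALLOWED = {"orders": {"order_id", "customer", "amount", "updated_on"}}


def build_extract_sql(table, columns):
    """Generate the SELECT statement the client sends through the gateway."""
    if table not in ALLOWED or not columns or not set(columns) <= ALLOWED[table]:
        raise ValueError("unknown table or column")
    return (
        f"SELECT {', '.join(columns)} FROM {table} "
        "WHERE updated_on >= ? ORDER BY order_id"
    )


def extract_rows(connection, table, columns, since):
    """Run the generated SQL and return the extracted rows."""
    sql = build_extract_sql(table, columns)
    return connection.execute(sql, (since,)).fetchall()


# Stand-in for an operational source database.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, updated_on TEXT)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "alice", 30.0, "2023-12-30"),
        (2, "bob", 45.5, "2024-01-02"),
        (3, "carol", 12.0, "2024-01-05"),
    ],
)

extracted = extract_rows(source, "orders", ["order_id", "amount"], "2024-01-01")
```

Table and column names are checked against the allow-list because identifiers cannot be bound as query parameters, while the date value is passed as a parameter.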
A middle-tier which consists of an OLAP server for fast querying of the data
warehouse.
(1) A Relational OLAP (ROLAP) model, i.e., an extended relational DBMS that
maps functions on multidimensional data to standard relational operations.
A top-tier that contains front-end tools for displaying results provided by OLAP,
as well as additional tools for data mining of the OLAP-generated data.
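The ROLAP idea of mapping multidimensional operations to standard relational operations can be sketched as follows. Roll-up becomes a GROUP BY and slice becomes a WHERE clause. The sales table, the DIMENSIONS tuple, and the function names are hypothetical, and SQLite stands in for the extended relational DBMS.

```python
import sqlite3

DIMENSIONS = ("year", "region", "product")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (2022, "north", "books", 10.0),
        (2022, "south", "books", 20.0),
        (2023, "north", "books", 30.0),
        (2023, "north", "games", 5.0),
    ],
)


def roll_up(connection, keep):
    """Roll-up: sum amount over every dimension not listed in keep."""
    if not keep or not set(keep) <= set(DIMENSIONS):
        raise ValueError("unknown dimension")
    cols = ", ".join(keep)
    sql = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"
    return connection.execute(sql).fetchall()


def slice_cube(connection, dimension, value):
    """Slice: fix one dimension to a single value."""
    if dimension not in DIMENSIONS:
        raise ValueError("unknown dimension")
    sql = (
        "SELECT year, region, product, amount FROM sales "
        f"WHERE {dimension} = ? ORDER BY year, region, product"
    )
    return connection.execute(sql, (value,)).fetchall()


by_year = roll_up(conn, ["year"])
by_region = roll_up(conn, ["region"])
north_slice = slice_cube(conn, "region", "north")
```

The front-end tool thinks in terms of dimensions and measures, while the server only ever executes ordinary SELECT statements.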
The metadata repository stores information that defines DW objects. It includes
the following parameters and information for the middle and the top-tier
applications:
Why ADW?
It is built on Oracle Database, with the data warehouse
procedures automated.
It is easy to use because all management, configuration, and
tuning tasks are fully automated. All data is automatically
compressed and encrypted.
It is fast since it is built on Exadata and Oracle Database. It also
offers instant elasticity in both compute and storage.
The user can choose the exact amount of storage and CPU
needed, and later scale up or down as CPU requirements change.
Machine learning enables continuous optimization.
ML in ADW delivers excellent query performance. Since it is
built on Oracle Database, every business intelligence and data
integration service that is compatible with Oracle Database
supports this service out of the box. For development purposes,
existing tools or a newer version of SQL Developer (which
supports ADW) can be used.
ADW patches all software online at all levels (security, OS,
network, database) while the system is running.
ADW Features
Features of Oracle Autonomous Data Warehouse:
Introduction
In today’s data-driven world, businesses need robust and
scalable data warehousing solutions to stay ahead of the
competition. Two key players in this domain are Oracle
Autonomous Data Warehouse (ADW) and Snowflake
Data Cloud. Both platforms offer unique features and
capabilities for businesses looking to leverage the power
of their data.
What Is Snowflake?
Snowflake is a data warehouse built for the cloud. It
centralizes data from multiple sources, enabling you to
derive in-depth business insights that power your teams.