Unit3 Notes
Subject-Oriented
Integrated
Non-Volatile
Data once entered into a data warehouse must remain unchanged. All
data is read-only. Previous data is not erased when current data is
entered. This helps you to analyze what has happened and when.
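The read-only, append-only behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `AppendOnlyTable`, its method names, and the sample customer data are hypothetical and do not come from any particular warehouse product.

```python
from datetime import date


class AppendOnlyTable:
    """Hypothetical append-only store: rows are added, never updated or deleted."""

    def __init__(self):
        self._rows = []  # (key, value, as_of) tuples

    def load(self, key, value, as_of):
        # New data is appended next to older data instead of overwriting it.
        self._rows.append((key, value, as_of))

    def history(self, key):
        # Every version ever loaded for this key, oldest first.
        return sorted((as_of, value) for k, value, as_of in self._rows if k == key)

    def value_as_of(self, key, when):
        # Latest version loaded on or before `when`, or None if there is none.
        versions = [(d, v) for d, v in self.history(key) if d <= when]
        return versions[-1][1] if versions else None


table = AppendOnlyTable()
table.load("customer-1", "Pune", date(2023, 1, 10))
table.load("customer-1", "Mumbai", date(2024, 3, 5))
```

Because the older row is kept, `table.value_as_of("customer-1", date(2023, 6, 1))` still answers with the value that was current at that time, which is what makes "what happened and when" analysis possible.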
Time-Variant
Other than these two categories, one more type exists that is called
"Hybrid Data Marts."
Designing
The design step is the first in the data mart process. This phase covers all
of the functions from initiating the request for a data mart through
gathering data about the requirements and developing the logical and
physical design of the data mart.
A data warehouse may hold multiple subject areas. A data mart holds only
one subject area, for example, Finance or Sales.
A data warehouse is a centralized system. A data mart is a decentralized
system.
OLAP
Characteristics of OLAP
The characteristics of OLAP are summarized by the term FASMI, which is
derived from the first letters of the following characteristics:
Fast
The system is targeted to deliver most responses to the client within
about five seconds, with the simplest analyses taking no more than one
second and very few taking more than 20 seconds.
Analysis
The system can cope with any business logic and statistical analysis
that is relevant for the application and the user, while keeping it easy
enough for the target client. Although some preprogramming may be
needed, it is not acceptable if all application definitions have to be
programmed in advance. The system must allow the user to define new
ad hoc calculations as part of the analysis and to report on the data in
any desired way, without having to program. This excludes products (like
Oracle Discoverer) that do not allow adequate end-user-oriented
calculation flexibility.
Share
The system implements all the security requirements for confidentiality
and, if multiple write access is needed, concurrent update locking at an
appropriate level. Not all applications need users to write data back,
but for the increasing number that do, the system should be able to
manage multiple updates in a timely, secure manner.
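The locking part of this requirement can be illustrated with a minimal Python sketch. It assumes a single in-memory cell guarded by one lock. `SharedCell` and `run_writers` are hypothetical names, and a real OLAP server would also need the security controls that this sketch leaves out.

```python
import threading


class SharedCell:
    """Hypothetical cube cell that several users may write back to."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def add(self, amount):
        # The lock makes the read-modify-write step atomic.
        with self._lock:
            self._value += amount

    def read(self):
        with self._lock:
            return self._value


def run_writers(cell, writers=4, updates_each=1000):
    """Start several concurrent writers and return the final cell value."""
    threads = [
        threading.Thread(target=lambda: [cell.add(1) for _ in range(updates_each)])
        for _ in range(writers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.read()
```

With the lock in place, no update is lost: four writers making 1000 increments each always leave the cell at 4000.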
Multidimensional
This is the basic requirement. An OLAP system must provide a
multidimensional conceptual view of the data, including full support for
hierarchies, as this is certainly the most logical method to analyze
business and organizations.
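A multidimensional view with hierarchies can be sketched as a roll-up over a small set of facts. The data, the city-to-state mapping, and the function names below are made up for illustration; the point is only that each dimension (location, time) has levels that measures can be aggregated along.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, month, product, sales amount).
FACTS = [
    ("Pune", "2024-01", "Laptop", 100),
    ("Pune", "2024-02", "Laptop", 150),
    ("Mumbai", "2024-01", "Phone", 200),
    ("Delhi", "2024-04", "Phone", 50),
]

# Location hierarchy: city -> state (assumed mapping for this example).
CITY_TO_STATE = {"Pune": "Maharashtra", "Mumbai": "Maharashtra", "Delhi": "Delhi"}


def quarter(month):
    """Time hierarchy: map a 'YYYY-MM' month to its 'YYYY-Qn' quarter."""
    year, mm = month.split("-")
    return f"{year}-Q{(int(mm) - 1) // 3 + 1}"


def roll_up(facts):
    """Aggregate sales from (city, month) up to (state, quarter)."""
    totals = defaultdict(int)
    for city, month, _product, amount in facts:
        totals[(CITY_TO_STATE[city], quarter(month))] += amount
    return dict(totals)


summary = roll_up(FACTS)
```

Here `summary` holds one total per (state, quarter) pair, for example 450 for Maharashtra in 2024-Q1.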
Information
The system should be able to hold all the data needed by the applications.
Data sparsity should be handled in an efficient manner.
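One common way to handle sparsity is to store only the non-empty cells. The sketch below assumes a dictionary keyed by cell coordinates. `SparseCube` is a hypothetical class for illustration, not the storage format of any specific OLAP engine.

```python
class SparseCube:
    """Hypothetical sparse cube: only non-empty cells are stored."""

    def __init__(self, shape):
        self.shape = shape   # size of each dimension
        self._cells = {}     # coordinates tuple -> value

    def set(self, coords, value):
        if value == 0:
            self._cells.pop(coords, None)  # empty cells take no space
        else:
            self._cells[coords] = value

    def get(self, coords):
        # Cells that were never stored read as empty (0).
        return self._cells.get(coords, 0)

    def density(self):
        # Fraction of all possible cells that actually hold a value.
        total = 1
        for size in self.shape:
            total *= size
        return len(self._cells) / total


cube = SparseCube((100, 12, 50))  # e.g. stores x months x products
cube.set((3, 0, 7), 120)
cube.set((42, 11, 9), 80)
```

The cube has 60,000 possible cells but keeps only two entries in memory, and reading any other coordinate simply returns 0.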
Benefits of OLAP
OLAP holds several benefits for businesses:
Advantages of OLAP:
Disadvantages of OLAP:
Snowflake Schema:
1. Star schema: Contains the fact tables and the dimension tables.
   Snowflake schema: Contains the fact tables, dimension tables, and sub-dimension tables.
4. Star schema: Takes less time to execute queries.
   Snowflake schema: Takes more time than the star schema to execute queries.
5. Star schema: Normalization is not used.
   Snowflake schema: Both normalization and denormalization are used.
9. Star schema: Has fewer foreign keys.
   Snowflake schema: Has more foreign keys.
10. Star schema: Has high data redundancy.
    Snowflake schema: Has low data redundancy.
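The differences in joins, foreign keys, and redundancy can be seen in a small sketch using Python's built-in `sqlite3` module. All table and column names are hypothetical. The same question ("total sales per category") needs one join in the star layout and two joins in the snowflake layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star schema: category is stored (redundantly) on the product dimension.
CREATE TABLE star_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);

-- Snowflake schema: category is normalized into a sub-dimension table.
CREATE TABLE snow_category (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE snow_product (product_id INTEGER PRIMARY KEY, name TEXT,
                           category_id INTEGER REFERENCES snow_category(category_id));

-- One fact table shared by both layouts to keep the example short.
CREATE TABLE fact_sales (product_id INTEGER, amount INTEGER);

INSERT INTO star_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Phone', 'Electronics');
INSERT INTO snow_category VALUES (10, 'Electronics');
INSERT INTO snow_product VALUES (1, 'Laptop', 10), (2, 'Phone', 10);
INSERT INTO fact_sales VALUES (1, 100), (2, 200), (1, 50);
""")

# Star: fact table joins directly to one denormalized dimension table.
STAR_QUERY = """
SELECT p.category, SUM(f.amount)
FROM fact_sales f
JOIN star_product p ON p.product_id = f.product_id
GROUP BY p.category
"""

# Snowflake: an extra join is needed to reach the sub-dimension table.
SNOWFLAKE_QUERY = """
SELECT c.category, SUM(f.amount)
FROM fact_sales f
JOIN snow_product p ON p.product_id = f.product_id
JOIN snow_category c ON c.category_id = p.category_id
GROUP BY c.category
"""

star_result = conn.execute(STAR_QUERY).fetchall()
snowflake_result = conn.execute(SNOWFLAKE_QUERY).fetchall()
```

Both queries return the same total. The star layout repeats the text 'Electronics' on every product row, while the snowflake layout stores it once at the cost of an extra foreign key and join.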
ETL Process
ETL, which stands for extract, transform, and load, is the process data
engineers use to extract data from different sources, transform the
data into a usable and trusted resource, and load that data into the
systems end-users can access and use downstream to solve business
problems.
Transform
The second step consists of transforming the raw data that has been
extracted from the sources into a format that can be used by
different applications. In this stage, data gets cleansed, mapped and
transformed, often to a specific schema, so it meets operational
needs. This process entails several types of transformation that
ensure the quality and integrity of data. Data is usually not loaded
directly into the target data store; instead, it is commonly
uploaded into a staging database first. This step ensures a quick
roll back in case something does not go as planned. During this
stage, you have the possibility to generate audit reports for
regulatory compliance, or diagnose and repair any data issues.
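A transformation step of this kind might look like the following sketch. The field names, the `COUNTRY_CODES` mapping, and the sample rows are assumptions made for illustration. Rows that fail cleansing are collected separately so they can feed an audit report or be repaired later.

```python
# Hypothetical mapping used to standardize country names to codes.
COUNTRY_CODES = {"india": "IN", "united states": "US"}

# Hypothetical raw rows as they might arrive from a source system.
RAW_ROWS = [
    {"name": "  alice ", "country": "India", "amount": "10.5"},
    {"name": "bob", "country": "Atlantis", "amount": "3"},
    {"name": "carol", "country": "united states", "amount": "n/a"},
]


def transform(raw_rows):
    """Cleanse and map raw rows; return (staged_rows, rejected_rows)."""
    staged, rejected = [], []
    for row in raw_rows:
        try:
            staged.append({
                # Cleanse: trim whitespace and normalize capitalization.
                "customer": row["name"].strip().title(),
                # Map: translate free-text country names to standard codes.
                "country": COUNTRY_CODES[row["country"].strip().lower()],
                # Convert: enforce the numeric type the target schema expects.
                "amount": round(float(row["amount"]), 2),
            })
        except (KeyError, ValueError) as exc:
            # Keep the bad row and the reason for later diagnosis.
            rejected.append({"row": row, "reason": repr(exc)})
    return staged, rejected


staged, rejected = transform(RAW_ROWS)
```

Only the first row passes all checks and reaches the staged output. The other two end up in `rejected`, one for an unknown country and one for a non-numeric amount.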
Load
Finally, the load function is the process of writing converted data
from a staging area to a target database, which may or may not
have previously existed. Depending on the requirements of the
application, this process may be either quite simple or intricate.
Each of these steps can be done with ETL tools or custom code.
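As an example of the custom-code route, the load step can be sketched with Python's built-in `sqlite3` module. The `sales` table and the `load` function are hypothetical. The target table is created if it does not already exist, and each batch is written inside a transaction so that a failed batch is rolled back as a whole.

```python
import sqlite3


def load(conn, staged_rows):
    """Copy staged rows into the target table, creating it if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer TEXT NOT NULL, amount REAL NOT NULL)"
    )
    try:
        # As a context manager, the connection commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.executemany(
                "INSERT INTO sales (customer, amount) VALUES (?, ?)", staged_rows
            )
    except sqlite3.IntegrityError:
        return False
    return True


conn = sqlite3.connect(":memory:")
ok = load(conn, [("Alice", 10.5), ("Bob", 3.0)])
# The second batch contains an invalid row, so none of it is kept.
failed = load(conn, [("Carol", 7.0), (None, 1.0)])
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

After both calls the target table holds only the two rows from the first batch, because the second batch violated the `NOT NULL` constraint and was rolled back.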