Unit-2 Notes DW 2021
What is ETL – ETL Vs ELT – Types of Data warehouses - Data warehouse Design and Modeling -
Delivery Process - Online Analytical Processing (OLAP) - Characteristics of OLAP - Online
Transaction Processing (OLTP) Vs OLAP - OLAP operations- Types of OLAP- ROLAP Vs MOLAP
Vs HOLAP.
What is ETL?
The mechanism of extracting information from source systems and bringing it into the data
warehouse is commonly called ETL, which stands for Extraction, Transformation and
Loading.
The ETL process requires active inputs from various stakeholders, including developers,
analysts, testers, and top executives, and is technically challenging.
To maintain its value as a tool for decision-makers, a data warehouse must change as the
business changes. ETL is a recurring activity (daily, weekly, or monthly) of a data
warehouse system and needs to be agile, automated, and well documented.
Working of ETL?
ETL consists of the following separate phases:
Extraction
Extraction is the operation of extracting information from a source system for further
use in a data warehouse environment. This is the first stage of the ETL process.
The extraction process is often one of the most time-consuming tasks in ETL.
The source systems might be complicated and poorly documented, and thus
determining which data needs to be extracted can be difficult.
The data has to be extracted several times in a periodic manner to supply all changed
data to the warehouse and keep it up-to-date.
Cleansing
The cleansing stage is crucial in a data warehouse technique because it is supposed to
improve data quality. The primary data cleansing features found in ETL tools are rectification
and homogenization. They use specific dictionaries to rectify typing mistakes and to
recognize synonyms, as well as rule-based cleansing to enforce domain-specific rules and
define appropriate associations between values.
The following examples show why data cleansing is essential:
If an enterprise wishes to contact its users or its suppliers, a complete, accurate and up-to-date
list of contact addresses, email addresses and telephone numbers must be available.
If a client or supplier calls, the staff responding should be able to quickly find the person in
the enterprise database, but this requires that the caller's name or his/her company name is
listed in the database.
If a user appears in the databases with two or more slightly different names or different
account numbers, it becomes difficult to update the customer's information.
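A minimal sketch of what this cleansing stage might look like in practice is shown below; the dictionaries, field names, and the phone-number rule are illustrative assumptions, not part of any particular ETL tool.

# Dictionary-based cleansing: rectifying known typos and homogenizing synonyms,
# plus one rule-based check. All names and values here are illustrative only.

TYPO_CORRECTIONS = {"Bombai": "Mumbai", "Nwe Delhi": "New Delhi"}
SYNONYMS = {"St.": "Street", "Rd.": "Road", "Pvt Ltd": "Private Limited"}

def cleanse_record(record: dict) -> dict:
    """Apply rectification (typo fixes), homogenization (synonym mapping), and a rule."""
    cleaned = dict(record)
    # Rectification: fix known misspellings in the city field.
    city = cleaned.get("city", "")
    cleaned["city"] = TYPO_CORRECTIONS.get(city, city)
    # Homogenization: expand abbreviations in the address field.
    address = cleaned.get("address", "")
    for short, full in SYNONYMS.items():
        address = address.replace(short, full)
    cleaned["address"] = address
    # Rule-based cleansing: a domain rule requiring a 10-digit phone number.
    phone = "".join(ch for ch in cleaned.get("phone", "") if ch.isdigit())
    cleaned["phone"] = phone if len(phone) == 10 else None  # flag invalid numbers
    return cleaned

print(cleanse_record({"city": "Bombai", "address": "12 MG Rd.", "phone": "98-76543210"}))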
Transformation
Transformation is the core of the reconciliation phase. It converts records from their operational
source format into a particular data warehouse format. If we implement a three-layer
architecture, this phase outputs our reconciled data layer.
The following points must be rectified in this phase:
Loose texts may hide valuable information. For example, "XYZ Pvt Ltd" does not
explicitly show that this is a private limited company.
Different formats can be used for individual data. For example, a date can be saved as a
string or as three integers.
Following are the main transformation processes aimed at populating the reconciled data
layer:
Conversion and normalization that operate on both storage formats and units of measure
to make data uniform.
Matching that associates equivalent fields in different sources.
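A small sketch of these two reconciliation steps, under assumed field names and unit conversions, might look like this:

# (1) Conversion/normalization of storage formats and units of measure, and
# (2) matching of equivalent fields coming from two different sources.
# Field names, formats, and the unit factor are assumptions for illustration.

from datetime import datetime

def normalize(record: dict) -> dict:
    """Convert a date stored as a string to ISO format and weights to kilograms."""
    out = dict(record)
    out["order_date"] = datetime.strptime(record["order_date"], "%d/%m/%Y").date().isoformat()
    if record.get("weight_unit") == "lb":
        out["weight_kg"] = round(record["weight"] * 0.45359237, 3)
    else:
        out["weight_kg"] = record["weight"]
    return out

def match_fields(source_a: dict, source_b: dict) -> dict:
    """Associate equivalent fields from two sources into one reconciled record."""
    field_map = {"cust_name": "customer", "custid": "customer_id"}  # source A -> reconciled
    reconciled = {field_map.get(k, k): v for k, v in source_a.items()}
    reconciled.update({"customer_id": source_b["id"], "segment": source_b["segment"]})
    return reconciled

print(normalize({"order_date": "05/03/2021", "weight": 12.5, "weight_unit": "lb"}))
print(match_fields({"cust_name": "Asha", "custid": 42}, {"id": 42, "segment": "retail"}))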
Loading
The load is the process of writing the data into the target database. During the load step, it is
necessary to ensure that the load is performed correctly and using as few resources as
possible.
Loading can be carried out in two ways:
1. Refresh: Data warehouse data is completely rewritten. This means that the older data is
replaced. Refresh is usually used in combination with static extraction to populate a
data warehouse initially.
2. Update: Only those changes applied to source information are added to the Data
Warehouse. An update is typically carried out without deleting or modifying
preexisting data. This method is used in combination with incremental extraction to
update data warehouses regularly.
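The two strategies could be sketched as follows, using SQLite from the Python standard library as a stand-in target database; the table layout and values are illustrative only.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER, amount REAL, load_date TEXT)")

def refresh_load(rows, load_date):
    """Refresh: the warehouse table is completely rewritten from a static extraction."""
    conn.execute("DELETE FROM sales")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                     [(sid, amt, load_date) for sid, amt in rows])
    conn.commit()

def update_load(changed_rows, load_date):
    """Update: only changed rows from an incremental extraction are appended;
    pre-existing data is neither deleted nor modified."""
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                     [(sid, amt, load_date) for sid, amt in changed_rows])
    conn.commit()

refresh_load([(1, 100.0), (2, 250.0)], "2021-01-01")   # initial population
update_load([(2, 260.0), (3, 75.0)], "2021-01-02")     # later incremental load
print(conn.execute("SELECT * FROM sales ORDER BY sale_id, load_date").fetchall())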
ETL Vs ELT
ETL (Extract, Transform, and Load)
Extract, Transform and Load is the technique of extracting records from sources (which may
be on-premises or external) into a staging area, transforming or reformatting them with the
business manipulations needed to fit operational needs or data analysis, and then loading
them into the destination database or data warehouse.
Strengths
Development Time: Designing from the output backwards ensures that only information
applicable to the solution is extracted and processed, potentially decreasing development,
delivery, and processing overhead.
Targeted data: Due to the targeted nature of the load process, the warehouse contains only
information relevant to the presentation. Reduced warehouse content simplifies the security
regime to be enforced and hence the administration overhead.
Tools Availability: The number of tools available that implement ETL provides the
flexibility of approach and the opportunity to identify the most appropriate tool. However,
the proliferation of tools has led to a competitive functionality war, which often results in a
loss of maintainability.
Weaknesses
Flexibility: Targeting only relevant information for output means that any future
requirements that may need data that was not included in the original design will need to be
added to the ETL routines. Due to the tight dependency between the methods developed,
this often leads to a need for fundamental redesign and development. As a result,
this increases the time and cost involved.
Hardware: Most third-party tools utilize their own engine to implement the ETL process.
Regardless of the scale of the solution, this can necessitate investment in additional
hardware to run the tool's ETL engine. The use of third-party tools to implement the ETL
process also compels the learning of new scripting languages and processes.
Learning Curve: Implementing a third-party tool that uses unfamiliar processes and languages
brings the learning curve that is implicit in any technology new to an organization, and can
often lead to blind alleys in its use due to a shortage of experience.
ELT (Extract, Load and Transform)
ELT stands for Extract, Load and Transform, and is a different way of looking at data
migration or movement. ELT involves extracting the data from the source system and
loading it into the target system, instead of transforming it between the extraction and
loading phases. Once the data is copied or loaded into the target system, the transformation
takes place.
The extract and load steps can be isolated from the transformation process. Isolating the load
phase from the transformation process removes an inherent dependency between these phases.
In addition to containing the data necessary for the transformations, the extract and load
process can include components of data that may be essential in the future. The load phase
could take the entire source and load it into the warehouse.
Separating the phases enables the project to be broken down into smaller chunks, thus
making it more specific and manageable.
Performing the data integrity analysis in the staging area enables a further phase in the
process to be isolated and dealt with at the most appropriate point. This also helps to ensure
that only cleaned and checked information is loaded into the warehouse for transformation.
Isolating the transformations from the load steps helps to encourage a more staged approach
to warehouse design and implementation.
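A minimal ELT sketch along these lines might load the raw extract into the target untouched and run the transformation later inside the target engine; the table names and the aggregation are assumptions made for illustration, again using SQLite as a stand-in target.

import sqlite3

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE raw_orders (order_id INTEGER, region TEXT, amount REAL)")
target.execute("CREATE TABLE sales_by_region (region TEXT, total REAL)")

# Extract & Load: the entire source extract is loaded without transformation.
source_rows = [(1, "north", 120.0), (2, "south", 80.0), (3, "north", 40.0)]
target.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", source_rows)

# Transform: executed later, inside the target database, isolated from the load step.
target.execute("DELETE FROM sales_by_region")
target.execute(
    "INSERT INTO sales_by_region "
    "SELECT region, SUM(amount) FROM raw_orders GROUP BY region"
)
target.commit()
print(target.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall())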
Strengths
Project Management: Being able to divide the warehouse build into specific and isolated
functions enables a project to be planned on a smaller, per-function basis; the project can
therefore be broken down into feasible chunks.
Flexible & Future Proof: In general, in an ELT implementation, all record from the sources
are loaded into the data warehouse as part of the extract and loading process. This, linked
with the isolation of the transformation phase, means that future requirements can easily be
incorporated into the data warehouse architecture.
Risk Minimization: Removing the close interdependencies between each stage of the
warehouse build enables the development of each stage to be isolated, and the individual
process designs can thus also be separated. This provides a good platform for change,
maintenance and management.
Utilize Existing Hardware: In implementing ELT as a warehouse build process, the
essential tools provided with the database engine can be used.
Utilize Existing Skill Sets: By using the functionality supported by the database engine, the
existing investment in database skills is re-used to develop the warehouse. No new skills
need to be learned, and the full weight of the experience in developing with the engine's
technology is utilized, further reducing the cost and risk in the development process.
Weaknesses
Against the Norm: ELT is a newer approach to data warehouse design and development. While
it has proven itself many times over through its abundant use in implementations throughout
the world, it does require a change in mentality and design approach compared with
traditional methods.
Tools Availability: Being an emergent technology approach, ELT suffers from the limited
availability of tools.
Compute-intensive Transformations: Since the transformations are performed inside the
target database, the processing load of complex transformations falls on the warehouse
engine itself.
Types of Data Warehouses
Host-Based (MVS) Data Warehouses
Such data warehouses reside on mainframe (MVS) systems with databases such as DB2.
Two drawbacks of this environment are:
1. These TP systems have been developed with their database design optimized for
transaction throughput. In all methods, a database is designed for optimal query or
transaction processing. A complex business query requires the joining of many normalized
tables, and as a result performance will usually be poor and the query constructs largely
complex.
2. There is no assurance that data in two or more production systems will be consistent.
Before embarking on designing, building and implementing such a warehouse, some further
considerations must be given because
1. Such databases generally have very high volumes of data storage.
2. Such warehouses may require support for both MVS and customer-based report and
query facilities.
3. These warehouses have complicated source systems.
4. Such systems need continuous maintenance since they must also be used for
mission-critical objectives.
To make such data warehouses building successful, the following phases are generally
followed:
1. Unload Phase: It involves selecting and scrubbing the operational data.
2. Transform Phase: For translating it into an appropriate form and describing the rules
for accessing and storing it.
3. Load Phase: For moving the data directly into DB2 tables, or into a particular file for
moving it into another database or a non-MVS warehouse.
An integrated metadata repository is central to any data warehouse environment. Such a
facility is required for documenting data sources, data translation rules, and user access to the
warehouse. It provides a dynamic link between the multiple data source databases and
the DB2-based data warehouse.
A metadata repository is necessary to design, build, and maintain data warehouse processes.
It should be capable of describing what data exists in both the operational system and the
data warehouse, where the data is located, the mapping of the operational data to the
warehouse fields, and the end-user access techniques. Query, reporting, and maintenance
facilities are another indispensable part of such a data warehouse, typically provided by an
MVS-based query and reporting tool for DB2.
Host-Based (UNIX) Data Warehouses
Oracle and Informix RDBMSs support the facilities for such data warehouses. Both of these
databases can extract information from MVS-based databases as well as a large number of
other UNIX-based databases. These types of warehouses follow the same stages as the host-
based MVS data warehouses. Also, data from different network servers can be used, since
file attribute consistency is common across the inter-network.
LAN-Based Workgroup Data Warehouses
Designed for the workgroup environment, a LAN-based workgroup warehouse is optimal for
any business organization that wants to build a data warehouse, often called a data mart. This
type of data warehouse generally requires a minimal initial investment and technical training.
Data Delivery: With a LAN-based workgroup warehouse, a customer needs minimal
technical knowledge to create and maintain a store of data that is customized for use at the
department, business unit, or workgroup level. A LAN-based workgroup warehouse ensures
the delivery of information from corporate resources by providing transparent access to the
data in the warehouse.
Host-Based Single Stage (LAN) Data Warehouses
Within a LAN-based data warehouse, data delivery can be handled either centrally or from
the workgroup environment, so business groups can process their data needs without
burdening centralized IT resources, enjoying the autonomy of their data mart without
compromising overall data integrity and security in the enterprise.
Limitations
Both DBMS and hardware scalability issues generally limit LAN-based warehousing
solutions.
Many LAN based enterprises have not implemented adequate job scheduling, recovery
management, organized maintenance, and performance monitoring methods to provide robust
warehousing solutions.
Often these warehouses are dependent on other platforms for source data. Building an
environment that has data integrity, recoverability, and security requires careful design and
planning.
Usually, the ODS stores only the most up-to-date records. The data warehouse stores the
historical view of the data. At first, the information in both databases will be very
similar. For example, the record for a new client will look the same. As changes to the client
record occur, the ODS will be refreshed to reflect only the most current data, whereas the data
warehouse will contain both the historical data and the new information. Thus the volume
requirements of the data warehouse will exceed the volume requirements of the ODS over
time. It is not uncommon for this ratio to reach 4 to 1 in practice.
Stationary Data Warehouses
In this type of data warehouse, the data is not moved from the sources, as shown in fig:
Instead, the customer is given direct access to the data. For many organizations, infrequent
access, volume issues, or corporate necessities dictate such an approach. This approach does
generate several problems for the customer, such as:
Identifying the location of the information for the users
Providing clients the ability to query different DBMSs as if they were all a single
DBMS with a single API.
Impacting performance since the customer will be competing with the production data
stores.
Such a warehouse will need highly specialized and sophisticated 'middleware', possibly with a
single point of interaction for the client. A facility to browse the extracted data before report
generation may also be essential. An integrated metadata repository becomes absolutely
essential in this environment.
The data within the specific warehouse itself has a particular architecture with the emphasis
on various levels of summarization, as shown in figure:
Highly summarized data is compact and directly available and can even be found outside
the warehouse.
Metadata is the final element of the data warehouse. It is of a different nature from the data
drawn from the operational environment, and it is used as:
A directory to help the DSS investigator locate the items of the data warehouse.
A guide to the mapping of data as it is transformed from the operational environment to
the data warehouse environment.
A guide to the method used for summarization between the current, accurate data and
the lightly summarized information and the highly summarized data, etc.
Data Modeling Life Cycle
The data modeling life cycle is a straightforward process of transforming the business
requirements to fulfill the goals for storing, maintaining, and accessing the data within IT
systems. The result is a logical and physical data model for an enterprise data warehouse.
The objective of the data modeling life cycle is primarily the creation of a storage area for
business information. That area comes from the logical and physical data modeling stages, as
shown in Figure:
We can see that the only things shown via the conceptual data model are the entities that
define the data and the relationships between those entities. No other detail is shown through
the conceptual data model.
5. Sources: The information for the data warehouse is likely to come from several data
sources. This step involves identifying and connecting the sources using gateways, ODBC
drivers, or other wrappers.
6. ETL: The data from the source systems will need to go through an ETL phase. The
process of designing and implementing the ETL phase may involve identifying suitable ETL
tool vendors and purchasing and implementing the tools. It may also involve customizing the
tool to suit the needs of the enterprise.
7. Populate the data warehouse: Once the ETL tools have been agreed upon, testing the
tools will be needed, perhaps using a staging area. Once everything is working adequately,
the ETL tools may be used in populating the warehouse given the schema and view
definitions.
8. User applications: For the data warehouse to be useful, there must be end-user
applications. This step involves designing and implementing the applications required by the
end users.
9. Roll-out the warehouse and applications: Once the data warehouse has been populated
and the end-client applications tested, the warehouse system and the applications may be
rolled out for the user community to use.
Delivery Process
IT Strategy: A data warehouse project must include an IT strategy for procuring and retaining funding.
Business Case Analysis: After the IT strategy has been designed, the next step is the
business case. It is essential to understand the level of investment that can be justified and to
recognize the projected business benefits which should be derived from using the data
warehouse.
Education & Prototyping: Company will experiment with the ideas of data analysis and
educate themselves on the value of the data warehouse. This is valuable and should be
required if this is the company first exposure to the benefits of the DS record. Prototyping
method can progress the growth of education. It is better than working models. Prototyping
requires business requirement, technical blueprint, and structures.
Business Requirements: These include:
The logical model for data within the data warehouse.
The source systems that provide this data (mapping rules).
The business rules to be applied to the data.
The query profiles for the immediate requirement.
Technical Blueprint: It establishes the architecture of the warehouse. The technical blueprint
stage of the delivery process produces an architecture plan which satisfies long-term
requirements. It lays out the server and data mart architecture and the essential components
of the database design.
Building the vision: It is the phase where the first production deliverable is produced. This
stage will probably create significant infrastructure elements for extracting and loading
information but limit them to the extraction and load of information sources.
History Load: The next step is one where the remainder of the required history is loaded into
the data warehouse. This means that new entities would not be added to the data
warehouse, but additional physical tables would probably be created to store the increased
data volumes.
AD-Hoc Query: In this step, we configure an ad-hoc query tool to operate against the data
warehouse.
These end-customer access tools are capable of automatically generating the database query
that answers any question posed by the user.
Automation: The automation phase is where many of the operational management processes
are fully automated within the DWH. These would include:
Extracting & loading the data from a variety of sources
systems Transforming the information into a form suitable for
analysis Backing up, restoring & archiving data
Generating aggregations from predefined definitions within the Data Warehouse.
Monitoring query profiles & determining the appropriate aggregates to maintain system
performance.
Extending Scope: In this phase, the scope of the data warehouse is extended to address a new
set of business requirements. This involves the loading of additional data sources into the
data warehouse, i.e., the introduction of new data marts.
Requirement Evolution: This is the last step of the delivery process of a data warehouse. As
we all know, requirements are not static and evolve continuously. As the business
requirements change, they must be reflected in the system.
Online Analytical Processing (OLAP)
OLAP applications are used by a variety of functions in an organization; in production, for
example, they support production planning and defect analysis.
OLAP cubes have two main purposes. The first is to provide business users with a data model
more intuitive to them than a tabular model. This model is called a Dimensional Model.
The second purpose is to enable fast query responses that are usually difficult to achieve using
tabular models.
How OLAP Works?
Fundamentally, OLAP has a very simple concept. It pre-calculates most of the queries that
are typically very hard to execute over tabular databases, namely aggregation, joining, and
grouping. These queries are calculated during a process that is usually called 'building' or
'processing' the OLAP cube. This process usually happens overnight, and by the time end
users get to work the data will have been updated.
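As an illustration of this pre-calculation idea (not of any specific OLAP server), the sketch below builds a tiny cube in which every combination of dimension values, including an 'ALL' roll-up level, is aggregated once so that later queries become simple lookups. The dimensions (product, region) and measure (amount) are assumptions.

from itertools import product as cartesian

facts = [
    {"product": "phone", "region": "north", "amount": 100},
    {"product": "phone", "region": "south", "amount": 60},
    {"product": "modem", "region": "north", "amount": 30},
]

def build_cube(rows, dims, measure):
    """Pre-aggregate the measure for every combination of dimension values,
    including the 'ALL' level of each dimension (a simple roll-up)."""
    cube = {}
    for row in rows:
        # Each fact contributes to every cell it rolls up into.
        levels = [(row[d], "ALL") for d in dims]
        for cell in cartesian(*levels):
            cube[cell] = cube.get(cell, 0) + row[measure]
    return cube

cube = build_cube(facts, dims=("product", "region"), measure="amount")
# Query time: answers come straight from the pre-calculated cells.
print(cube[("phone", "ALL")])    # total phone sales across all regions -> 160
print(cube[("ALL", "north")])    # total northern sales across all products -> 130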
OLAP Guidelines (Dr. E.F. Codd's Rules)
Dr. E.F. Codd, the "father" of the relational model, formulated a list of 12 guidelines and
requirements as the basis for selecting OLAP systems:
1) Multidimensional Conceptual View: This is the central feature of an OLAP system. By
requiring a multidimensional view, it is possible to carry out operations like slice and dice.
2) Transparency: Make the technology, underlying information repository, computing
operations, and the dissimilar nature of source data totally transparent to users. Such
transparency helps to improve the efficiency and productivity of the users.
3) Accessibility: It provides access only to the data that is actually required to perform the
particular analysis, presenting a single, coherent, and consistent view to the clients. The OLAP
system must map its own logical schema to the heterogeneous physical data stores and
perform any necessary transformations. The OLAP operations should be sitting between data
sources (e.g., data warehouses) and an OLAP front-end.
4) Consistent Reporting Performance: To make sure that the users do not feel any
significant degradation in reporting performance as the number of dimensions or the size
of the database increases. That is, the performance of OLAP should not suffer as the number
of dimensions is increased. Users must observe consistent run time, response time, or
machine utilization every time a given query is run.
5) Client/Server Architecture: Make the server component of OLAP tools sufficiently
intelligent that the various clients can be attached with a minimum of effort and integration
programming. The server should be capable of mapping and consolidating data between
dissimilar databases.
6) Generic Dimensionality: An OLAP system should treat each dimension as equivalent in
both its structure and operational capabilities. Additional operational capabilities may be
granted to selected dimensions, but such additional functions should be grantable to any
dimension.
7) Dynamic Sparse Matrix Handling: The OLAP system should adapt its physical schema to
the specific analytical model being created and loaded so that sparse matrix handling is
optimized. When encountering a sparse matrix, the system must be able to dynamically
deduce the distribution of the information and adjust the storage and access paths to obtain
and maintain a consistent level of performance.
8) Multiuser Support: OLAP tools must provide concurrent data access, data integrity, and
access security.
9) Unrestricted Cross-dimensional Operations: The system should be able to recognize
dimensional hierarchies and perform roll-up and drill-down operations within a dimension or
across dimensions.
10) Intuitive Data Manipulation: Data manipulation fundamental to the consolidation path,
such as reorientation (pivoting), drill-down and roll-up, and other manipulations, should be
accomplished naturally and precisely via point-and-click and drag-and-drop actions on the
cells of the analytical model. It avoids the use of menus or multiple trips to a user interface.
11) Flexible Reporting: It gives business users the ability to organize columns, rows, and
cells in a manner that facilitates simple manipulation, analysis, and synthesis of data.
12) Unlimited Dimensions and Aggregation Levels: The number of data dimensions should
be unlimited. Each of these common dimensions must allow a practically unlimited number
of customer-defined aggregation levels within any given consolidation path.
Characteristics of OLAP
The FASMI characteristics of OLAP systems derive the term from the first letters of the five
characteristics:
Fast
It means that the system is targeted to deliver most responses to users within about
five seconds, with the simplest analyses taking no more than one second and very few
taking more than 20 seconds.
Analysis
It means that the system can cope with any business logic and statistical analysis that is
relevant for the application and the user, while keeping it easy enough for the target user.
Although some pre-programming may be needed, it is not acceptable if all application
definitions have to be pre-programmed; the user must be able to define new ad-hoc
calculations as part of the analysis and to report on the data in any desired way, without
having to program. Products (like Oracle Discoverer) that do not allow adequate end-user-
oriented calculation flexibility are therefore excluded.
Share
It means that the system implements all the security requirements for confidentiality and, if
multiple write access is needed, concurrent update locking at an appropriate level. Not all
applications need users to write data back, but for the increasing number that do, the system
should be able to manage multiple updates in a timely, secure manner.
Multidimensional
This is the basic requirement. An OLAP system must provide a multidimensional conceptual
view of the data, including full support for hierarchies, as this is certainly the most logical
way to analyze businesses and organizations.
Information
The system should be able to hold all the data needed by the applications. Data sparsity
should be handled in an efficient manner.
The main characteristics of OLAP are as follows:
1. Multidimensional conceptual view: OLAP systems let business users have a
dimensional and logical view of the data in the data warehouse. It helps in carrying out
slice and dice operations.
2. Multi-User Support: Since OLAP systems are shared, the OLAP operations
should provide normal database operations, including retrieval, update, concurrency
control, integrity, and security.
3. Accessibility: OLAP acts as a mediator between data warehouses and front-end. The
OLAP operations should be sitting between data sources (e.g., data warehouses) and
an OLAP front-end.
4. Storing OLAP results: OLAP results are kept separate from data sources.
5. Uniform reporting performance: Increasing the number of dimensions or the
database size should not significantly degrade the reporting performance of the OLAP
system.
6. OLAP provides for distinguishing between zero values and missing values so that
aggregates are computed correctly.
7. OLAP system should ignore all missing values and compute correct aggregate values.
8. OLAP facilitate interactive query and complex analysis for the users.
9. OLAP allows users to drill down for greater details or roll up for aggregations of
metrics along a single business dimension or across multiple dimensions.
10. OLAP provides the ability to perform intricate calculations and comparisons.
11. OLAP presents results in a number of meaningful ways, including charts and graphs.
Benefits of OLAP
OLAP offers several benefits for businesses:
1. OLAP helps managers in decision-making through the multidimensional data views
that it efficiently provides, thus increasing their productivity.
2. OLAP functions are self-sufficient owing to the inherent flexibility of the underlying
organized databases.
3. It facilitates simulation of business models and problems, through extensive
management of analysis-capabilities.
4. In conjunction with data warehouse, OLAP can be used to support a reduction in the
application backlog, faster data retrieval, and reduction in query drag.
Motivations for using OLAP
1) Understanding and improving sales: For an enterprise that has many products and uses a
number of channels for selling them, OLAP can help in finding the most suitable products
and the most popular channels. In some cases, it may be feasible to find the most profitable
customers. For example, consider the telecommunication industry with only one product,
communication minutes; there is already a large amount of data if the company wants to
analyze the sales of the product for every hour of the day (24 hours), the difference between
weekdays and weekends (2 values), and calls split into 50 regions (24 × 2 × 50 = 2,400
combinations for this single product).
2) Understanding and decreasing costs of doing business: Improving sales is one way of
improving a business; the other is to analyze costs and control them as much as possible
without affecting sales. OLAP can assist in analyzing the costs related to sales. In some
cases, it may also be feasible to identify expenditures which produce a high return on
investment (ROI). For example, recruiting a top salesperson may involve high costs, but
the revenue generated by the salesperson may justify the investment.
Online Transaction Processing (OLTP) Vs OLAP
7) View: An OLTP system focuses primarily on the current data within an enterprise or
department, without referring to historical data or data in other organizations. In contrast, an
OLAP system spans multiple versions of a database schema, due to the evolutionary process
of an organization. An OLAP system also deals with information that originates from
different organizations, integrating information from many data stores. Because of their huge
volume, OLAP data are stored on multiple storage media.
8) Access Patterns: The access pattern of an OLTP system consists primarily of short, atomic
transactions. Such a system requires concurrency control and recovery techniques. However,
access to OLAP systems is mostly read-only, because these data warehouses store historical
information.
The biggest difference between an OLTP and an OLAP system is the amount of data analyzed
in a single transaction. Whereas an OLTP system handles many concurrent customers and
queries touching only a single record or a limited collection of records at a time, an OLAP
system must have the capacity to operate on millions of records to answer a single query.
OLAP Operations
Roll-Up
Example
Consider the following cube, illustrating the temperatures of certain days recorded weekly:
Temperature 64 65 68 69 70 71 72 75 80 81 83 85
Week1 1 0 1 0 1 0 0 0 0 0 1 0
Week2 0 0 0 1 0 0 1 2 0 1 0 0
Consider that we want to set up levels (hot (80-85), mild (70-75), cool (64-69)) in
temperature from the above cubes.
To do this, we have to group column and add up the value according to the concept
hierarchies. This operation is known as a roll-up.
By doing this, we obtain the following cube:
Temperature cool mild hot
Week1 2 1 1
Week2 1 3 1
The roll-up operation groups the information by levels of temperature.
The following diagram illustrates how roll-up works.
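The same roll-up can be sketched in a few lines of Python; the temperature bands and weekly counts are taken from the tables above.

# Roll-up: group the temperature columns by concept-hierarchy level and add up the values.
temperatures = [64, 65, 68, 69, 70, 71, 72, 75, 80, 81, 83, 85]
counts = {
    "Week1": [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0],
    "Week2": [0, 0, 0, 1, 0, 0, 1, 2, 0, 1, 0, 0],
}
levels = {"cool": range(64, 70), "mild": range(70, 76), "hot": range(80, 86)}

def roll_up(week_counts):
    """Group the columns by temperature level and add up the values per level."""
    rolled = {level: 0 for level in levels}
    for temp, value in zip(temperatures, week_counts):
        for level, band in levels.items():
            if temp in band:
                rolled[level] += value
    return rolled

for week, week_counts in counts.items():
    print(week, roll_up(week_counts))
# Week1 {'cool': 2, 'mild': 1, 'hot': 1}
# Week2 {'cool': 1, 'mild': 3, 'hot': 1}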
Drill-Down
The drill-down operation (also called roll-down) is the reverse operation of roll-up. Drill-
down is like zooming in on the data cube. It navigates from less detailed data to more
detailed data. Drill-down can be performed by either stepping down a concept hierarchy for
a dimension or adding additional dimensions.
Figure shows a drill-down operation performed on the dimension time by stepping down a
concept hierarchy which is defined as day, month, quarter, and year. Drill-down occurs by
descending the time hierarchy from the level of the quarter to the more detailed level of the
month.
Because a drill-down adds more details to the given data, it can also be performed by adding
a new dimension to a cube. For example, a drill-down on the central cubes of the figure can
occur by introducing an additional dimension, such as a customer group.
Example
Drill-down adds more details to the given data
Temperature cool mild hot
Day 1 0 0 0
Day 2 0 0 0
Day 3 0 0 1
Day 4 0 1 0
Day 5 1 0 0
Day 6 0 0 0
Day 7 1 0 0
Day 8 0 0 0
Day 9 1 0 0
Day 10 0 1 0
Day 11 0 1 0
Day 12 0 1 0
Day 13 0 0 1
Day 14 0 0 0
Slice
A slice is a subset of the cube corresponding to a single value for one or more members of
a dimension. For example, a slice operation is executed when the customer wants a
selection on one dimension of a three-dimensional cube, resulting in a two-dimensional slice.
So, the Slice operations perform a selection on one dimension of the given cube, thus
resulting in a subcube.
For example, if we make the selection, temperature=cool we will obtain the following cube:
Temperature cool
Day 1 0
Day 2 0
Day 3 0
Day 4 0
Day 5 1
Day 6 1
Day 7 1
Day 8 1
Day 9 1
Day 11 0
Day 12 0
Day 13 0
Day 14 0
Here, the slice is performed on the dimension "time" using the criterion time = "Q1".
It forms a new sub-cube by selecting one or more dimensions.
Dice
The dice operation defines a subcube by performing a selection on two or more dimensions.
For example, applying the selection (time = day 3 OR time = day 4) AND (temperature =
cool OR temperature = hot) to the original cube, we get the following subcube (still two-
dimensional):
Temperature cool hot
Day 3 0 1
Day 4 0 0
The dice operation on the cube, based on the following selection criteria, involves three
dimensions:
(location = "Toronto" or "Vancouver")
(time = "Q1" or "Q2")
(item =" Mobile" or "Modem")
Pivot
The pivot operation is also called rotation. Pivot is a visualization operation which rotates
the data axes in view in order to provide an alternative presentation of the data. It may
involve swapping the rows and columns, or moving one of the row dimensions into the
column dimensions.
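A small sketch of pivoting, reusing the rolled-up temperature counts from earlier, simply swaps the row and column dimensions:

# Pivot (rotation): row headers become column headers and vice versa.
table = {
    "Week1": {"cool": 2, "mild": 1, "hot": 1},
    "Week2": {"cool": 1, "mild": 3, "hot": 1},
}

def pivot(rows):
    """Rotate the view so that columns become rows and rows become columns."""
    rotated = {}
    for row_key, columns in rows.items():
        for col_key, value in columns.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated

print(pivot(table))
# {'cool': {'Week1': 2, 'Week2': 1}, 'mild': {'Week1': 1, 'Week2': 3}, 'hot': {'Week1': 1, 'Week2': 1}}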
Other OLAP operations may include ranking the top-N or bottom-N elements in lists, as well
as calculating moving averages, growth rates, interest, internal rates of return,
depreciation, currency conversions, and statistical functions.
OLAP offers analytical modeling capabilities, including a calculation engine for
deriving ratios, variances, etc. and for computing measures across multiple dimensions. It
can generate summarization, aggregation, and hierarchies at each granularity level and at
every dimension intersection. OLAP also supports functional models for forecasting, trend
analysis, and statistical analysis. In this context, the OLAP engine is a powerful data analysis
tool.
Types of OLAP
There are three main types of OLAP servers, as follows:
ROLAP stands for Relational OLAP, an application based on relational DBMSs.
MOLAP stands for Multidimensional OLAP, an application based on multidimensional DBMSs.
HOLAP stands for Hybrid OLAP, an application using both relational and multidimensional
techniques.
Relational OLAP (ROLAP) Server
ROLAP systems work primarily from the data that resides in a relational database, where the
base data and dimension tables are stored as relational tables. This model permits
multidimensional analysis of the data.
This technique relies on manipulating the data stored in the relational database to give the
appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of
slicing and dicing is equivalent to adding a "WHERE" clause to the SQL statement.
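A sketch of how a ROLAP engine might translate a slice or dice into SQL is shown below; the fact-table and column names are assumptions, not those of any particular product.

# Each slice/dice selection becomes another condition in the generated WHERE clause.
def rolap_query(measures, dimensions, selections):
    """Build the SQL a ROLAP engine might generate for a slice/dice request."""
    select_list = ", ".join(dimensions + [f"SUM({m}) AS {m}" for m in measures])
    where = " AND ".join(f"{col} = '{val}'" for col, val in selections.items())
    group_by = ", ".join(dimensions)
    return (f"SELECT {select_list} FROM sales_fact "
            f"{'WHERE ' + where + ' ' if where else ''}GROUP BY {group_by}")

# Slicing on quarter, then dicing further on region, simply extends the WHERE clause.
print(rolap_query(["amount"], ["product"], {"quarter": "Q1"}))
print(rolap_query(["amount"], ["product"], {"quarter": "Q1", "region": "north"}))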
Relational OLAP Architecture
ROLAP Architecture includes the following components
Database server.
ROLAP server.
Front-end tool.
Relational OLAP (ROLAP) is the latest and fastest-growing OLAP technology segment in
the market. This method allows multiple multidimensional views of two-dimensional
relational tables to be created, avoiding the need to structure the data around the desired view.
Some products in this segment have supported strong SQL engines to handle the complexity
of multidimensional analysis. This includes creating multiple SQL statements to handle user
requests, being 'RDBMS-aware', and being capable of generating SQL statements based on
the optimizer of the DBMS engine.
Advantages
Can handle large amounts of information: The data size limitation of ROLAP technology
depends on the data size of the underlying RDBMS. So, ROLAP itself does not restrict the
data amount.
<="" strong=""> RDBMS already comes with a lot of features. So ROLAP technologies,
(works on top of the RDBMS) can control these functionalities.
Disadvantages
Performance can be slow: Because each ROLAP report is a SQL query (or multiple SQL
queries) against the relational database, the query time can be long if the underlying data size
is large.
Limited by SQL functionalities: ROLAP technology relies on generating SQL statements to
query the relational database, and SQL statements do not suit all needs.
Multidimensional OLAP (MOLAP) Server
A MOLAP system is based on a native logical model that directly supports multidimensional
data and operations. Data are stored physically into multidimensional arrays, and positional
techniques are used to access them.
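The idea of array storage with positional access can be illustrated with a toy example; the dimension members and values are assumptions.

# MOLAP-style storage: the measure lives in a dense multidimensional array and a cell
# is reached positionally, by converting each dimension member to its array index.
products = ["phone", "modem", "router"]      # dimension 1
regions = ["north", "south"]                 # dimension 2

# Pre-summarized sales amounts, laid out as sales[product_index][region_index].
sales = [
    [100.0, 60.0],    # phone
    [30.0, 20.0],     # modem
    [0.0, 45.0],      # router
]

def cell(product, region):
    """Positional access: member name -> array index -> stored value."""
    return sales[products.index(product)][regions.index(region)]

print(cell("modem", "south"))   # 20.0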
One of the significant distinctions of MOLAP against ROLAP is that data are summarized
and stored in an optimized format in a multidimensional cube, instead of in a relational
database. In the MOLAP model, data are structured into proprietary formats according to the
client's reporting requirements, with the calculations pre-generated on the cubes.
MOLAP Architecture
MOLAP Architecture includes the following components
Database server.
MOLAP server.
Front-end tool.
MOLAP structure primarily reads the precompiled data. MOLAP structure has limited
capabilities to dynamically create aggregations or to evaluate results which have not been
pre-calculated and stored.
Applications requiring iterative and comprehensive time-series analysis of trends are well
suited for MOLAP technology (e.g., financial analysis and budgeting).
Examples include Arbor Software's Essbase, Oracle's Express Server, Pilot Software's
Lightship Server, Sniper's TM/1, Planning Science's Gentium, and Kenan Technology's
Multiway.
Some of the problems faced by clients are related to maintaining support for multiple subject
areas in an RDBMS. Some vendors can solve these problems by continuing access from
MOLAP tools to detailed data in an RDBMS.
This can be very useful for organizations with performance-sensitive multidimensional
analysis requirements and that have built or are in the process of building a data warehouse
architecture that contains multiple subject areas.
An example would be the creation of sales data measured by several dimensions (e.g.,
product and sales region) to be stored and maintained in a persistent structure. This structure
would be provided to reduce the application overhead of performing calculations and
building aggregation during initialization. These structures can be automatically refreshed at
predetermined intervals established by an administrator.
Advantages
Excellent Performance: A MOLAP cube is built for fast information retrieval, and is
optimal for slicing and dicing operations.
Can perform complex calculations: All calculations have been pre-generated when the cube
is created. Hence, complex calculations are not only possible, but they return quickly.
Disadvantages
Limited in the amount of information it can handle: Because all calculations are
performed when the cube is built, it is not possible to contain a large amount of data in the
cube itself.
Requires additional investment: Cube technology is generally proprietary and does not
already exist in the organization. Therefore, to adopt MOLAP technology, chances are other
investments in human and capital resources are needed.
Hybrid OLAP (HOLAP) Server
HOLAP incorporates the best features of MOLAP and ROLAP into a single architecture.
HOLAP systems store the larger quantities of detailed data in the relational tables,
while the aggregations are stored in the pre-calculated cubes. HOLAP can also drill through
from the cube down to the relational tables for detailed data. Microsoft SQL Server
2000 provides a hybrid OLAP server.
Advantages of HOLAP
1. HOLAP provides the benefits of both MOLAP and ROLAP.
2. It provides fast access at all levels of aggregation.
3. HOLAP balances the disk space requirement, as it only stores the aggregate
information on the OLAP server and the detail record remains in the relational
database. So no duplicate copy of the detail record is maintained.
Disadvantages of HOLAP
1. HOLAP architecture is very complicated because it supports both MOLAP and
ROLAP servers.
Other Types
There are also less popular types of OLAP upon which one may stumble every so often.
Some of the less popular variants in the OLAP industry are listed below.
Web-Enabled OLAP (WOLAP) Server
WOLAP refers to an OLAP application which is accessible via a web browser. Unlike a
traditional client/server OLAP application, WOLAP is considered to have a three-tiered
architecture consisting of three components: a client, middleware, and a database
server.
Desktop OLAP (DOLAP) Server
DOLAP permits a user to download a section of the data from the database or source, and
work with that dataset locally, or on their desktop.
Comparison of the ROLAP, MOLAP, and HOLAP storage modes:
ROLAP storage mode: causes the aggregations of the partition to be stored in indexed views
in the relational database that was specified in the partition's data source. ROLAP does not
cause a copy of the source information to be stored in the Analysis Services data folders;
instead, when the outcome cannot be derived from the query cache, the indexed views in the
data source are accessed to answer queries. Query response is frequently slower with ROLAP
storage than with the MOLAP or HOLAP storage modes, and processing time is also
frequently slower.
MOLAP storage mode: causes the aggregations of the partition and a copy of its source
information to be stored in a multidimensional structure in Analysis Services when the
partition is processed. This MOLAP structure is highly optimized to maximize query
performance. The storage area can be on the computer where the partition is defined or on
another computer running Analysis Services. Because a copy of the source information
resides in the multidimensional structure, queries can be resolved without accessing the
partition's source data. Query response times can be reduced substantially by using
aggregations; the data in the partition's MOLAP structure is only as current as the most
recent processing of the partition.
HOLAP storage mode: combines attributes of both MOLAP and ROLAP. Like MOLAP,
HOLAP causes the aggregations of the partition to be stored in a multidimensional structure
in an SQL Server Analysis Services instance, but it does not cause a copy of the source
information to be stored. For queries that access only summary data in the aggregations of a
partition, HOLAP is the equivalent of MOLAP. Queries that access source data (for example,
a drill-down to an atomic cube cell for which there is no aggregation) must retrieve data from
the relational database and will not be as fast.