Unit-2 DMDW
Now, let's discuss various OLAP operations on a data cube with examples.
1. Roll-up: Roll-up is the process of summarizing data along one or more dimensions by
moving up in the hierarchy. It involves aggregating lower-level data to higher-level
data.
2. Drill-down: Drill-down is the opposite of roll-up, where data is broken down into
more detailed levels or dimensions. It allows users to explore lower-level data to gain
more insights.
3. Slice: Slicing selects a particular value or range of values on one dimension
to create a subcube. It allows users to analyze data from a
specific perspective.
4. Dice: Dicing is similar to slicing, but it selects values or
ranges on two or more dimensions to create a subcube. It helps in analyzing data
from a more specific viewpoint.
5. Pivot (Rotate): Pivoting or rotating the data cube involves reorienting the
dimensions and measures. It allows users to view data from different angles or
perspectives.
By employing these OLAP operations, users can perform advanced
analysis on a data cube and gain valuable insights from different dimensions and
levels of detail.
Example:
Dimensions: course, student, time. Fact: aggregate marks.
Each cell gives the aggregate marks for each student, for each course, year-wise.
1. Slice: It is used to obtain a particular slice of the cube.
Example: Aggregate marks of all students for all courses in the year 2016.
2. Dice: It is used to obtain a small portion of the cube covering all the dimensions.
Example: Aggregate marks of all BE students for courses of the DB cluster for the last
five years.
3. Roll up: It is used to obtain a more generalized level of data from the current level.
Example: Cluster-wise aggregate marks for all students for every year.
4. Drill down: It is used to obtain a more detailed level of data from the current level.
Example: Theory, T/W, and viva marks for all students for every year.
5. Pivot: It is used to analyze the same data from a different perspective.
Example: Analyzing aggregate marks course-wise.
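The slice and dice examples above can be sketched in a few lines of Python. This is a minimal illustration, assuming the cube is held as a dictionary keyed by (course, student, year). The course names, student names, and marks are hypothetical sample data, not values from these notes.

```python
# Hypothetical cube: (course, student, year) -> aggregate marks.
cube = {
    ("DBMS", "Asha", 2015): 72,
    ("DBMS", "Asha", 2016): 78,
    ("DBMS", "Ravi", 2016): 65,
    ("Data Mining", "Asha", 2016): 81,
    ("Data Mining", "Ravi", 2015): 58,
    ("Networks", "Ravi", 2016): 70,
}


def slice_cube(cube, year):
    """Slice: fix the time dimension to one value, leaving a 2-D (course, student) view."""
    return {(c, s): m for (c, s, y), m in cube.items() if y == year}


def dice_cube(cube, courses, students, years):
    """Dice: keep only cells whose coordinates fall in the chosen sets on every dimension."""
    return {
        key: m
        for key, m in cube.items()
        if key[0] in courses and key[1] in students and key[2] in years
    }


slice_2016 = slice_cube(cube, 2016)
diced = dice_cube(cube, {"DBMS", "Data Mining"}, {"Asha"}, {2015, 2016})
print(slice_2016)
print(diced)
```

The slice drops the time coordinate because it is fixed, while the dice keeps all three coordinates and simply returns a smaller cube.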
Category | OLAP (Online Analytical Processing) | OLTP (Online Transaction Processing)
Definition | It is well known as an online database query management system. | It is well known as an online database modifying system.
Data source | Consists of historical data from various databases. | Consists of only current operational data.
Method used | It makes use of a data warehouse. | It makes use of a standard database management system (DBMS).
Application | It is subject-oriented. Used for data mining, analytics, decision-making, etc. | It is application-oriented. Used for business tasks.
Normalized | In an OLAP database, tables are not normalized. | In an OLTP database, tables are normalized (3NF).
Usage of data | The data is used in planning, problem-solving, and decision-making. | The data is used to perform day-to-day fundamental operations.
Task | It provides a multi-dimensional view of different business tasks. | It reveals a snapshot of present business tasks.
Purpose | It serves to extract information for analysis and decision-making. | It serves to insert, update, and delete information in the database.
Volume of data | A large amount of data is stored, typically in TB or PB. | The size of the data is relatively small, typically in MB or GB, because historical data is archived.
Queries | Relatively slow, as the amount of data involved is large. Queries may take hours. | Very fast, as the queries operate on 5% of the data.
Update | The OLAP database is not often updated. As a result, data integrity is unaffected. | The data integrity constraint must be maintained in an OLTP database.
Backup and Recovery | It only needs backup from time to time as compared to OLTP. | The backup and recovery process is maintained rigorously.
Processing time | The processing of complex queries can take a long time. | It is comparatively fast in processing because of simple and straightforward queries.
Types of users | This data is generally managed by the CEO, MD, and GM. | This data is managed by clerks and managers.
Operations | Mostly read and rarely write operations. | Both read and write operations.
Updates | Data is refreshed on a regular basis with lengthy, scheduled batch operations. | The user initiates data updates, which are brief and quick.
Nature of audience | The process is focused on the market. | The process is focused on the customer.
Database Design | Design with a focus on the subject. | Design that is focused on the application.
Productivity | Improves the efficiency of business analysts. | Enhances the user's productivity.
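The contrast between the two workloads can be illustrated with Python's built-in sqlite3 module. This is only a sketch: the marks table and its rows are hypothetical, and a single in-memory SQLite database stands in for both systems, whereas in practice the OLAP query would run against a separate data warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (student TEXT, course TEXT, year INTEGER, score INTEGER)")

# OLTP-style work: short transactions that insert or update a few rows each.
with conn:
    conn.execute("INSERT INTO marks VALUES ('Asha', 'DBMS', 2016, 78)")
    conn.execute("INSERT INTO marks VALUES ('Ravi', 'DBMS', 2016, 65)")
    conn.execute("INSERT INTO marks VALUES ('Asha', 'DBMS', 2015, 72)")
with conn:
    conn.execute("UPDATE marks SET score = 67 WHERE student = 'Ravi' AND year = 2016")

# OLAP-style work: a read-only query that scans and aggregates many rows.
yearly_avg = conn.execute(
    "SELECT year, AVG(score) FROM marks GROUP BY year ORDER BY year"
).fetchall()
print(yearly_avg)
```

Each `with conn:` block commits as one small transaction, which mirrors the brief, user-initiated updates of OLTP, while the final query only reads and summarizes.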
2. Snowflake Schema:
The Snowflake schema is an extension of the Star schema.
It organizes dimension tables into multiple levels of normalization, resulting
in a more normalized structure.
In a Snowflake schema, dimension tables may have additional tables related to
them, forming a hierarchical structure.
The normalization reduces data redundancy and improves data integrity
but adds complexity to query execution.
While the Snowflake schema offers more flexibility in terms of data
management, it can lead to slower query performance due to the
increased number of joins required.
The Snowflake schema is typically used when data integrity and
space optimization are of higher importance.
3. Fact Constellation Schema (also known as Galaxy Schema):
The Fact Constellation schema is a complex schema design used for
highly complex and diverse business scenarios.
It consists of multiple fact tables connected to multiple dimension
tables, forming a network or constellation of tables.
Each fact table represents a different grain or level of detail,
capturing different measures and dimensions.
The dimension tables are shared among the fact tables, allowing for
flexible analysis across multiple perspectives.
The Fact Constellation schema provides a high degree of flexibility and
allows for detailed and granular analysis.
However, it can be more challenging to manage and maintain due to
the increased complexity and potential redundancy.
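A fact constellation can be sketched with two fact tables that share one dimension table. The example below uses Python's sqlite3 module. The table names (dim_student, fact_marks, fact_attendance) and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_marks (student_id INTEGER, course TEXT, marks INTEGER);
CREATE TABLE fact_attendance (student_id INTEGER, month TEXT, days_present INTEGER);
INSERT INTO dim_student VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO fact_marks VALUES (1, 'DBMS', 78), (1, 'Data Mining', 81), (2, 'DBMS', 65);
INSERT INTO fact_attendance VALUES (1, 'Jan', 20), (1, 'Feb', 18), (2, 'Jan', 15);
""")

# Aggregate each fact table separately, then combine through the shared dimension.
# Joining the two fact tables row by row would multiply rows and inflate the sums.
rows = conn.execute("""
SELECT s.name, m.total_marks, a.total_days
FROM dim_student AS s
JOIN (SELECT student_id, SUM(marks) AS total_marks
      FROM fact_marks GROUP BY student_id) AS m ON m.student_id = s.student_id
JOIN (SELECT student_id, SUM(days_present) AS total_days
      FROM fact_attendance GROUP BY student_id) AS a ON a.student_id = s.student_id
ORDER BY s.name
""").fetchall()
print(rows)
```

Because the two fact tables have different grains (per course versus per month), each is summarized to the student level before the results are placed side by side.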
2B) Write detailed note on how OLAP technology helps in discovery driven
exploration of data cubes
Discovery-Driven Exploration of Data Cubes using OLAP Technology:
1. Introduction to OLAP Technology:
OLAP technology is designed to facilitate advanced analysis and decision-
making by providing multidimensional views of data.
It allows users to explore and analyze data from different
dimensions, hierarchies, and levels of detail.
2. Key Features of OLAP Technology:
a. Multidimensionality:
OLAP technology enables users to analyze data from
multiple dimensions simultaneously.
Dimensions represent various attributes or perspectives of the data,
such as time, geography, product, or customer.
b. Aggregation and Drill-Down:
OLAP technology allows users to aggregate data at higher levels
of abstraction for a summarized view.
Users can drill-down into more detailed levels to explore data at a
granular level and uncover underlying patterns or trends.
c. Slicing and Dicing:
OLAP technology supports slicing and dicing operations to
select specific subsets of data based on particular dimension
values.
Slicing involves selecting data for a specific dimension value,
while dicing involves selecting data based on multiple dimension
values.
d. Pivot (Rotation):
OLAP technology allows users to pivot or rotate the dimensions
to analyze data from different perspectives.
It enables users to rearrange dimensions to gain new insights
and explore alternative views of the data.
3. Discovery-Driven Exploration of Data Cubes:
OLAP technology plays a crucial role in discovery-driven exploration of data
cubes by providing interactive and flexible analysis capabilities.
a. Exploratory Analysis:
OLAP tools empower users to perform ad-hoc analysis and exploration
of data cubes without predefined queries.
Users can interactively navigate through dimensions, drill down into
details, and perform dynamic aggregations to discover patterns,
trends, or anomalies.
b. Hypothesis Testing:
OLAP technology supports hypothesis testing by allowing users
to define and test hypotheses on data cubes.
Users can slice, dice, and filter data to verify or refute their hypotheses
and gain insights into the underlying causes or correlations.
c. What-If Analysis:
OLAP tools enable users to perform what-if analysis by
modifying data values and observing the impact on the results.
Users can simulate scenarios, apply different assumptions,
and evaluate the potential outcomes before making decisions.
d. Visualizations and Reporting:
OLAP technology provides rich visualization capabilities to
present data in charts, graphs, or dashboards.
Users can create interactive reports and visual representations of data
to communicate insights effectively and support decision-making.
4. Benefits of Discovery-Driven Exploration using OLAP:
Enables users to gain in-depth insights and understanding of complex data
through interactive exploration.
Facilitates hypothesis testing and data-driven decision-making by providing
flexible analysis capabilities.
Supports iterative and exploratory analysis, empowering users to
uncover hidden patterns or trends.
Promotes a deeper understanding of the business, leading to improved
strategic planning and identification of opportunities or challenges.
In summary, OLAP technology plays a vital role in discovery-driven exploration of data
cubes. By providing multidimensional views, interactive analysis capabilities, and the ability
to slice, dice, drill down, and pivot data, OLAP enables users to explore data, discover
insights, and make informed decisions.
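One way to picture how a tool could point an analyst toward anomalies is to compare each cell with the value expected from its row and column averages. The sketch below is a simplified assumption, not the method of any particular OLAP product: it uses an additive expectation (row mean + column mean - grand mean) and a hypothetical threshold on a small made-up table of average marks.

```python
from statistics import mean

# Hypothetical 2-D view of a cube: average marks by course (rows) and year (columns).
courses = ["DBMS", "Data Mining", "Networks"]
years = [2014, 2015, 2016]
marks = [
    [70, 72, 74],
    [68, 70, 45],  # the 2016 value is unusually low
    [75, 77, 79],
]


def flag_exceptions(table, threshold):
    """Return (row, col, residual) for cells far from an additive row + column expectation."""
    grand = mean(v for row in table for v in row)
    row_means = [mean(row) for row in table]
    col_means = [mean(col) for col in zip(*table)]
    flagged = []
    for i, row in enumerate(table):
        for j, actual in enumerate(row):
            expected = row_means[i] + col_means[j] - grand
            residual = actual - expected
            if abs(residual) > threshold:
                flagged.append((i, j, residual))
    return flagged


exceptions = flag_exceptions(marks, threshold=10)
for i, j, residual in exceptions:
    print(courses[i], years[j], round(residual, 1))
```

A flagged cell is a hint about where to drill down next, which is the kind of guided navigation that discovery-driven exploration aims for.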
3A) Describe in detail about COGNOS IMPROMPTU
Star Schema: The star schema is a type of multidimensional model used for data
warehouses. It contains fact tables and dimension tables, and it uses fewer
foreign-key joins than the snowflake schema. The fact table and its surrounding
dimension tables form a star shape.
Snowflake Schema: The snowflake schema is also a type of multidimensional model
used for data warehouses. It contains fact tables, dimension tables, and
sub-dimension tables, which together form a snowflake shape.
No. | Star Schema | Snowflake Schema
1 | It contains fact tables and dimension tables. | It contains fact tables, dimension tables, and sub-dimension tables.
4 | It takes less time to execute queries. | It takes more time than the star schema to execute queries.
10 | It has high data redundancy. | It has low data redundancy.
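The difference in joins and redundancy between the two schemas can be sketched with Python's sqlite3 module. The table and column names below are hypothetical. The star version keeps the cluster name inside the course dimension, while the snowflake version moves it to a sub-dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the course dimension stores the cluster name directly (repeated per course).
CREATE TABLE star_dim_course (course_id INTEGER PRIMARY KEY, course TEXT, cluster TEXT);
-- Snowflake: the cluster is normalized into its own sub-dimension table.
CREATE TABLE snow_dim_cluster (cluster_id INTEGER PRIMARY KEY, cluster TEXT);
CREATE TABLE snow_dim_course (course_id INTEGER PRIMARY KEY, course TEXT, cluster_id INTEGER);
CREATE TABLE fact_marks (course_id INTEGER, marks INTEGER);

INSERT INTO star_dim_course VALUES (1, 'DBMS', 'DB'), (2, 'Data Mining', 'DB'), (3, 'Networks', 'Systems');
INSERT INTO snow_dim_cluster VALUES (10, 'DB'), (20, 'Systems');
INSERT INTO snow_dim_course VALUES (1, 'DBMS', 10), (2, 'Data Mining', 10), (3, 'Networks', 20);
INSERT INTO fact_marks VALUES (1, 78), (2, 81), (3, 70), (1, 65);
""")

# Star schema: one join from the fact table to the dimension.
star = conn.execute("""
SELECT c.cluster, SUM(f.marks) FROM fact_marks f
JOIN star_dim_course c ON c.course_id = f.course_id
GROUP BY c.cluster ORDER BY c.cluster
""").fetchall()

# Snowflake schema: an extra join to reach the sub-dimension.
snowflake = conn.execute("""
SELECT k.cluster, SUM(f.marks) FROM fact_marks f
JOIN snow_dim_course c ON c.course_id = f.course_id
JOIN snow_dim_cluster k ON k.cluster_id = c.cluster_id
GROUP BY k.cluster ORDER BY k.cluster
""").fetchall()
print(star, snowflake)
```

Both queries return the same totals. The star version repeats the text 'DB' in every course row, and the snowflake version stores it once at the cost of one more join.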
4A) List and explain the OLAP operations in multidimensional data model.
Same as 1A
4B) Explain the implementation of data warehouse system with data cubes.
Discuss the problems and the ways to handle them
7B) What is an ODS used for? How does it differ from an OLTP system?
Operational Data Store (ODS) and its Difference from OLTP Systems:
1. Operational Data Store (ODS):
An Operational Data Store (ODS) is a centralized database that integrates data
from various operational systems within an organization.
It acts as a repository for near-real-time or near-line data that is used for
operational reporting, data integration, and decision support.
2. Purpose and Uses of an ODS:
Integration of Data: The primary purpose of an ODS is to integrate data from
multiple sources and provide a unified view of operational data.
Near-Real-Time Reporting: An ODS allows for near-real-time reporting,
providing up-to-date information for operational decision-making.
Data Consistency: It ensures consistent and reconciled data by resolving data
conflicts and enforcing data quality standards.
Data Transformation and Aggregation: An ODS can perform data transformation
and aggregation to support reporting and analytics requirements.
Data Integration: It serves as a staging area for data integration processes,
enabling the consolidation and cleansing of data from disparate sources.
Simplified Reporting: An ODS provides a simplified and user-friendly interface
for reporting and accessing operational data.
3. Differences from OLTP Systems:
a. Data Purpose and Focus:
OLTP (Online Transaction Processing) systems are designed for transactional
processing, capturing and processing day-to-day business operations in real-
time.
ODS, on the other hand, focuses on data integration, consolidation, and near-
real-time reporting for operational decision support.
b. Data Structure and Granularity:
OLTP systems typically have a normalized data structure optimized for
efficient transaction processing and data consistency.
ODS often follows a denormalized or hybrid structure to support faster
reporting, data integration, and data transformation.
c. Data Volume and Historical Data:
OLTP systems handle high transaction volumes and store current data
for active operations.
ODS may store a larger volume of data by including historical data,
facilitating trend analysis and retrospective reporting.
d. Reporting and Analytics:
OLTP systems are not optimized for complex reporting and analytical
queries.
ODS provides a more flexible and optimized environment for
reporting, analytics, and decision support by consolidating data from
multiple sources.
e. Latency and Update Frequency:
OLTP systems provide real-time or near-real-time updates to ensure
immediate transaction processing.
ODS offers near-real-time or near-line updates, capturing data from
operational systems periodically or at predefined intervals.
f. Data Governance and Data Quality:
OLTP systems prioritize data integrity and enforce business rules and
validations for transactional data.
ODS incorporates data cleansing, reconciliation, and data quality
measures to ensure consistent and accurate data integration.
In summary, an ODS serves as a central repository for integrated operational data, supporting
near-real-time reporting, data integration, and decision support. It differs from OLTP systems
in terms of data purpose, structure, volume, reporting capabilities, latency, and data
governance.
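The integration and reconciliation role of an ODS can be sketched in plain Python. The two source systems (billing and CRM), their field names, and the "most recent update wins" rule are all assumptions chosen for illustration. Real ODS loads apply whatever reconciliation rules the organization defines.

```python
# Hypothetical extracts from two operational systems with different field names.
billing = [
    {"cust_id": "C001", "name": "asha rao", "updated": "2024-03-01"},
    {"cust_id": "C002", "name": "Ravi Kumar", "updated": "2024-03-02"},
]
crm = [
    {"customer": "c001", "full_name": "Asha Rao", "updated": "2024-03-05"},
]


def load_ods(billing_rows, crm_rows):
    """Integrate both feeds into one record per customer; the most recent update wins."""
    # Step 1: map each source to a common layout and clean the key and name.
    unified = [
        {"id": r["cust_id"].upper(), "name": r["name"].title(),
         "updated": r["updated"], "source": "billing"}
        for r in billing_rows
    ] + [
        {"id": r["customer"].upper(), "name": r["full_name"].title(),
         "updated": r["updated"], "source": "crm"}
        for r in crm_rows
    ]
    # Step 2: reconcile conflicts (ISO dates compare correctly as strings).
    ods = {}
    for rec in unified:
        current = ods.get(rec["id"])
        if current is None or rec["updated"] > current["updated"]:
            ods[rec["id"]] = rec
    return ods


ods = load_ods(billing, crm)
print(ods)
```

The result holds one consistent record per customer, which is the unified operational view that reporting tools would read instead of querying each OLTP system directly.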
No. | ROLAP | MOLAP
1 | ROLAP stands for Relational Online Analytical Processing. | MOLAP stands for Multidimensional Online Analytical Processing.
2 | ROLAP is used for large data volumes. | MOLAP is used for limited data volumes.
5 | In ROLAP, data is fetched from the data warehouse. | In MOLAP, data is fetched from multidimensional databases (MDDBs).
6 | In ROLAP, complicated SQL queries are used. | In MOLAP, a sparse matrix is used.
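The two storage approaches can be contrasted in a small Python sketch. It assumes hypothetical sample rows, uses an in-memory SQLite table for the ROLAP side, and uses a dictionary of non-empty cells as a stand-in for a sparse multidimensional array on the MOLAP side.

```python
import sqlite3

rows = [("DBMS", 2015, 72), ("DBMS", 2016, 78), ("Networks", 2016, 70)]

# ROLAP-style: keep the data relational and answer each request with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (course TEXT, year INTEGER, score INTEGER)")
conn.executemany("INSERT INTO marks VALUES (?, ?, ?)", rows)
rolap_total = conn.execute(
    "SELECT SUM(score) FROM marks WHERE course = ?", ("DBMS",)
).fetchone()[0]

# MOLAP-style: map dimension members to array positions and store only
# the non-empty cells (a sparse representation of the multidimensional array).
course_pos = {"DBMS": 0, "Networks": 1}
year_pos = {2015: 0, 2016: 1}
cells = {(course_pos[c], year_pos[y]): s for c, y, s in rows}


def molap_total(course):
    """Sum one row of the array by direct cell lookup; empty cells count as 0."""
    i = course_pos[course]
    return sum(cells.get((i, j), 0) for j in year_pos.values())


print(rolap_total, molap_total("DBMS"))
```

Only three of the four possible cells are stored here, which is why sparse handling matters as the number of dimensions grows.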
Metadata Management:
MQE maintains metadata about the data sources, query definitions, and other relevant
information.
It enables users to discover and explore available data sources, understand their
structure, and navigate the data model.
Metadata management in MQE facilitates easier query development and ensures
consistency in data interpretation.
OLAP (Online Analytical Processing) tools can be accessed and utilized on the Internet through
web-based applications or cloud-based platforms. Here's an explanation of how OLAP tools
are used on the Internet:
Web-based OLAP (WebOLAP) Tools:
WebOLAP tools provide OLAP functionality through web-based interfaces.
Users can access OLAP capabilities using a web browser without the need for client-side
installations.
WebOLAP tools offer features like multidimensional analysis, data slicing and dicing, drill-
down, and data visualization through interactive web interfaces.
Accessibility: OLAP tools on the Internet provide access to data and analysis from
anywhere, as long as there is an internet connection.
Scalability: Cloud-based OLAP platforms can handle large volumes of data and scale
resources based on demand.
Cost Efficiency: By utilizing cloud-based OLAP services, organizations can avoid
upfront infrastructure costs and pay for usage on a subscription or pay-as-you-go
basis.
Collaboration: Internet-based OLAP tools enable collaboration and sharing of
insights across geographically distributed teams.
Integration: OLAP tools on the Internet can integrate with various data sources,
including cloud storage, databases, and data warehouses.
Security: Data security and privacy measures must be in place to protect sensitive
information when using OLAP tools over the internet.
Data Transfer: Efficient data transfer mechanisms should be employed to ensure fast
and reliable access to data stored in remote locations.
Network Connectivity: Reliable and high-speed internet connectivity is crucial for
seamless interaction with OLAP tools on the internet.
Vendor Selection: When choosing web-based or cloud-based OLAP tools,
organizations should consider factors such as features, performance, scalability,
pricing models, and vendor reputation.
In summary, OLAP tools can be accessed and utilized on the Internet through web-based
applications and cloud-based platforms. They offer benefits like accessibility, scalability, cost
efficiency, and collaboration. However, organizations should consider security, data transfer,
network connectivity, and vendor selection factors when using OLAP tools on the Internet.
10A) Compare relational data model and multidimensional data model.
Comparison of Relational Data Model and Multidimensional Data Model:
1. Structure:
Relational Data Model: The relational data model represents data
as tables with rows and columns. It uses the concept of entities,
attributes, and relationships to organize data. Data is stored in
normalized form, eliminating data redundancy.
Multidimensional Data Model: The multidimensional data model
organizes data in a multidimensional structure known as a data cube.
It represents data as dimensions, hierarchies, and measures. Data
cubes provide a compact and efficient representation for analytical
processing.
2. Data Representation:
Relational Data Model: Relational databases represent data as
individual tables with rows and columns. Data is organized into
normalized structures, ensuring data integrity and reducing
redundancy. Relationships between tables are established using
primary and foreign keys.
Multidimensional Data Model: Multidimensional databases
represent data as multi-dimensional cubes, where dimensions
represent different aspects of the data, hierarchies represent levels
of detail, and measures represent numerical values to be analyzed.
Data cubes provide a more intuitive representation for analytical
queries.
3. Data Operations:
Relational Data Model: Relational databases support complex data
operations through Structured Query Language (SQL). Operations
such as joins, projections, selections, and aggregations are used to
manipulate and retrieve data. Relational databases are well-suited
for transactional processing (OLTP) and ad-hoc querying.
Multidimensional Data Model: Multidimensional databases
support OLAP (Online Analytical Processing) operations,
specifically designed for analytical processing. Operations include
slicing and dicing, drilling down, rolling up, and pivoting data
along dimensions. Multidimensional databases provide faster query
response times for analytical queries.
4. Data Analysis:
Relational Data Model: Relational databases are designed for
transactional processing and support basic analytical operations.
However, complex analytical tasks may require multiple table joins
and aggregations, which can be time-consuming.
Multidimensional Data Model: Multidimensional databases are
specifically designed for analytical processing. They provide fast
and efficient analysis capabilities, allowing users to navigate
through data along different dimensions and hierarchies, perform
aggregations, and generate meaningful insights.
5. Schema Design:
Relational Data Model: Relational databases use entity-relationship
modeling for schema design. Normalization techniques are applied
to eliminate data redundancy and ensure data integrity.
Relationships between tables are defined through primary and
foreign keys.
Multidimensional Data Model: Multidimensional databases use
dimension modeling for schema design. Dimensions represent the
characteristics of data, hierarchies define the levels of detail within
dimensions, and measures represent the numerical values to be
analyzed. Schema design focuses on providing a flexible and
intuitive structure for analytical processing.
6. Use Cases:
Relational Data Model: Relational databases are suitable for
transactional systems where data integrity and consistency are
critical. They are commonly used for online transaction processing
(OLTP) applications such as banking systems, e-commerce
platforms, and inventory management.
Multidimensional Data Model: Multidimensional databases are
ideal for analytical systems where fast and complex data analysis is
required. They are commonly used for decision support systems,
business intelligence applications, and data warehousing, where
analysis and reporting are crucial.
In summary, the relational data model and the multidimensional data model differ
in terms of structure, data representation, data operations, data analysis
capabilities, schema design, and use cases. Relational databases excel in
transactional processing and ad-hoc querying, while multidimensional databases
provide efficient analytical processing and faster response times for complex
queries.
10B) The college wants to record the Marks for the courses completed by
students using the dimensions: i) Course, ii) Student, iii) Time & a measure
Aggregate marks. Create a Cube and describe following OLAP operations:
(i) Slice (ii) Dice (iii) Roll up (iv) Drill down (v) Pivot
(i) Slice:
Slicing involves selecting a specific value or range of values on one
dimension to create a subset of the cube.
Example: Slicing the cube by selecting the "Course" dimension as "Mathematics"
would show the aggregate marks for all students in the Mathematics course across
different time periods.
Example: Aggregate marks of all students for all courses in the year 2016.
(ii) Dice:
Dicing involves selecting specific values or ranges of values along
multiple dimensions to create a more focused subset of the cube.
Example: Dicing the cube by selecting the "Course" dimension as "Mathematics"
and the "Time" dimension as "2022" would show the aggregate marks for all
students in the Mathematics course specifically in the year 2022.
Example: Aggregate marks of all BE students for courses of the DB cluster for the last
five years.
(iii) Roll up:
Roll up involves aggregating data from a lower, more detailed level to a higher,
more summarized level within a dimension.
Example: Rolling up the cube by the "Time" dimension from "Monthly" to
"Quarterly" would combine the monthly marks into one value per quarter, providing
a higher-level summary of the data.
Example: Cluster-wise aggregate marks for all students for every year.
(iv) Drill down:
Drill down involves exploring data at a lower, more detailed level within a dimension.
Example: Drilling down the cube by the "Time" dimension from "Yearly" to
"Monthly" would break down the aggregate marks into monthly values, providing
a more detailed view of the data.
Example: Theory, T/W, and viva marks for all students for every year.
(v) Pivot:
Pivoting involves rotating the cube to view the data from different
perspectives, allowing for cross-tabulation and rearrangement of dimensions.
Example: Pivoting the cube to view the "Student" dimension horizontally and the
"Course" dimension vertically would provide a tabular view of the aggregate
marks for each student and course combination.
These OLAP operations provide different ways to analyze and explore data within the cube.
Slicing and dicing help in filtering and selecting specific subsets of the data. Roll up and drill
down facilitate summarization and detailed exploration within dimensions. Pivot enables the
rearrangement and cross-tabulation of dimensions for better analysis and reporting.
By applying these OLAP operations to the cube with the Course, Student, Time dimensions,
and Aggregate Marks measure, the college can gain valuable insights into the marks recorded
for courses completed by students across different dimensions and levels of detail.
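Roll up, drill down, and pivot on the Course, Student, Time cube can be sketched in plain Python. The sample marks and the course-to-cluster hierarchy below are hypothetical, and summing marks is just one possible aggregate.

```python
from collections import defaultdict

# Hypothetical cube: (course, student, year) -> aggregate marks.
cube = {
    ("DBMS", "Asha", 2016): 78,
    ("DBMS", "Ravi", 2016): 65,
    ("Data Mining", "Asha", 2016): 81,
    ("Networks", "Ravi", 2016): 70,
}
# Assumed concept hierarchy on the Course dimension: course -> cluster.
cluster_of = {"DBMS": "DB", "Data Mining": "DB", "Networks": "Systems"}


def roll_up(cube):
    """Roll up Course to Cluster by summing marks per (cluster, student, year)."""
    out = defaultdict(int)
    for (course, student, year), marks in cube.items():
        out[(cluster_of[course], student, year)] += marks
    return dict(out)


def drill_down(cube, cluster):
    """Drill down from one cluster back to its course-level cells."""
    return {key: m for key, m in cube.items() if cluster_of[key[0]] == cluster}


def pivot(cube, year):
    """Pivot one year's data into a student-by-course cross-tab."""
    table = defaultdict(dict)
    for (course, student, y), marks in cube.items():
        if y == year:
            table[student][course] = marks
    return dict(table)


rolled = roll_up(cube)
detail = drill_down(cube, "DB")
crosstab = pivot(cube, 2016)
print(rolled)
print(detail)
print(crosstab)
```

Roll up and drill down move along the same hierarchy in opposite directions, while pivot changes only how the cells are laid out, not their values.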